Can your model be fed with multiple images at once, such as different frames of a video? Or can it be modified so that the input to the language model is the tokens of multiple images at once?
Thank you for your interest in our work. The current GLaMM model is designed to work with a single image only. However, it could be modified to accept multiple images. On the LLM side, this would be relatively simple: we can treat the multiple images as video frames and concatenate their tokens. On the grounding side, we would likely need to introduce special tokens to indicate whether a generated <seg> token refers to the first or the second image. Alternatively, we could design a segmentation encoder-decoder architecture that works with multiple images directly.
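To illustrate the LLM-side idea, here is a minimal sketch of concatenating per-frame visual tokens with one special frame-marker embedding per image (e.g. a hypothetical `<img_1>`, `<img_2>`). All names, shapes, and the marker scheme are illustrative assumptions, not part of the GLaMM codebase:

```python
import numpy as np

# Hypothetical sketch: feed multiple images to the LLM by concatenating
# per-frame visual tokens, prefixing each frame with a special marker
# embedding so the model (and a <seg>-style grounding head) can tell
# which frame a token belongs to. Shapes are illustrative only.
NUM_FRAMES = 2
TOKENS_PER_IMAGE = 256   # e.g. a 16x16 grid of patch tokens per image
HIDDEN_DIM = 64          # LLM embedding width (kept small for the demo)

rng = np.random.default_rng(0)

# Stand-ins for the vision encoder + projector output, one per frame.
frame_tokens = [rng.standard_normal((TOKENS_PER_IMAGE, HIDDEN_DIM))
                for _ in range(NUM_FRAMES)]

# One special frame-index embedding per image (e.g. <img_1>, <img_2>),
# analogous to the special tokens suggested for disambiguating <seg>.
frame_markers = [rng.standard_normal((1, HIDDEN_DIM))
                 for _ in range(NUM_FRAMES)]

# Interleave: <img_1> tokens_1 <img_2> tokens_2 ...
sequence = np.concatenate(
    [np.concatenate([marker, tokens], axis=0)
     for marker, tokens in zip(frame_markers, frame_tokens)],
    axis=0)

print(sequence.shape)  # (NUM_FRAMES * (TOKENS_PER_IMAGE + 1), HIDDEN_DIM)
```

The resulting sequence would then be fed to the LLM in place of the single-image token span; the grounding decoder would still need a way to map each `<seg>` back to the correct frame.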
Please do share if you make any progress in this interesting research direction. Good luck!