Hi,
I’m working with the Gemini 2.0 Flash-Lite model and have run into unexpected behavior in how image input size affects token count.
Here’s what I’m seeing:
- When I pass a single image along with a prompt, the image (e.g., one larger than 384×384) takes up ~1400 tokens, which makes sense.
- However, when the prompt also includes sample/reference images (e.g., as in-context examples), all images, including my main input image, are tokenized as 258 tokens each, regardless of their dimensions.
- I tried manually resizing the images to reduce token usage, but in the second case (with sample images), resizing seems to have no effect on the token count.
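
For reference, here is a minimal sketch of the per-image count I expected to see. It rests on assumptions: that an image up to 384×384 costs a flat 258 tokens, and that larger images are split into tiles at 258 tokens each. The crop-unit formula and the function name `estimate_image_tokens` are my own guesses for illustration, not something I have confirmed for this model.

```python
import math

TOKENS_PER_TILE = 258  # per-image figure I observe in the multi-image case
SMALL_IMAGE_MAX = 384  # assumed size up to which an image counts as one tile


def estimate_image_tokens(width: int, height: int) -> int:
    """Hypothetical estimate of image tokens under an assumed tiling rule."""
    if width <= SMALL_IMAGE_MAX and height <= SMALL_IMAGE_MAX:
        return TOKENS_PER_TILE
    # Assumption: the crop unit is roughly the shorter side divided by 1.5.
    crop_unit = math.floor(min(width, height) / 1.5)
    # Assumption: tiles are counted per dimension, rounding up.
    tiles = math.ceil(width / crop_unit) * math.ceil(height / crop_unit)
    return tiles * TOKENS_PER_TILE


if __name__ == "__main__":
    for size in [(384, 384), (960, 540)]:
        print(size, estimate_image_tokens(*size))
```

Under these assumptions a 960×540 image comes out at 6 tiles (1548 tokens), whereas in the multi-image case I get 258 for every image no matter what size I send.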
I couldn’t find anything in the Gemini docs that describes this behavior. Is this expected?
Any guidance or clarification would be appreciated.
Thanks!