This question is about Vertex AI, so this may not be the right place, but perhaps someone here has experience with it.
I'm trying to understand the pricing for image inputs to Gemini Pro, and the documentation seems to contradict itself.
On this page, it says that the token cost of an image is calculated by splitting the image into tiles, each costing 258 tokens. So a larger image is likely to cost more tokens than a smaller one. By my rough calculation, a maximum-sized image would need 16 tiles × 258 tokens/tile = 4,128 tokens.
Here’s how tokens are calculated for images:
- Gemini 1.0 Pro Vision: Each image accounts for 258 tokens.
- Gemini 1.5 Flash and Gemini 1.5 Pro:
  - If both dimensions of an image are less than or equal to 384 pixels, then 258 tokens are used.
  - If one dimension of an image is greater than 384 pixels, then the image is cropped into tiles. Each tile size defaults to the smallest dimension (width or height) divided by 1.5. If necessary, each tile is adjusted so that it's not smaller than 256 pixels and not greater than 768 pixels. Each tile is then resized to 768x768 and uses 258 tokens.
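To check my own numbers, here is a minimal sketch of the quoted Gemini 1.5 rule. It is my reading of the docs, not Google's implementation. Assumptions: "one dimension greater than 384" means "at least one", and the crop grid covers the whole image, so partial tiles are rounded up (`math.ceil`). The function name `image_tokens` is my own.

```python
import math

TOKENS_PER_TILE = 258        # per the quoted docs
SMALL_IMAGE_LIMIT = 384      # px; both sides at or below this -> one 258-token charge
MIN_TILE, MAX_TILE = 256, 768  # px; clamp range for the tile size


def image_tokens(width: int, height: int) -> int:
    """Estimate Gemini 1.5 image tokens from the documented tiling rule.

    Assumption: partial tiles at the edges count as full tiles.
    """
    if width <= SMALL_IMAGE_LIMIT and height <= SMALL_IMAGE_LIMIT:
        return TOKENS_PER_TILE
    # Tile size defaults to the smaller side divided by 1.5 ...
    tile = min(width, height) / 1.5
    # ... then is clamped to the range [256, 768] pixels.
    tile = max(MIN_TILE, min(MAX_TILE, tile))
    # Assumed: the tile grid covers the whole image (round partial tiles up).
    tiles = math.ceil(width / tile) * math.ceil(height / tile)
    return tiles * TOKENS_PER_TILE


print(image_tokens(384, 384))    # 258
print(image_tokens(1024, 768))   # 4 tiles of 512 px -> 1032
print(image_tokens(3072, 3072))  # 16 tiles of 768 px -> 4128
```

Under these assumptions, a 3072x3072 image reaches the 16-tile, 4,128-token figure above, while anything up to 384x384 stays at 258 tokens.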
But here, the price is a flat $0.001315 per image (current Gemini Pro pricing), with no mention of tiles or maximum dimensions.
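To show how far apart the two readings can be, here is a rough comparison. The flat price is the $0.001315 quoted above. The per-tile reading is purely hypothetical: it assumes the flat price covers one 258-token tile and that larger images are charged once per tile. Nothing in either page confirms that.

```python
PRICE_PER_IMAGE_USD = 0.001315  # flat per-image price quoted on the pricing page


def flat_reading(num_images: int) -> float:
    """Cost if the per-image price applies regardless of image size."""
    return num_images * PRICE_PER_IMAGE_USD


def per_tile_reading(num_images: int, tiles_per_image: int) -> float:
    """HYPOTHETICAL: cost if the flat price covers a single 258-token tile
    and each additional tile is charged at the same rate."""
    return num_images * tiles_per_image * PRICE_PER_IMAGE_USD


print(f"{flat_reading(1000):.4f}")          # 1000 images, flat price
print(f"{per_tile_reading(1000, 16):.4f}")  # 1000 images of 16 tiles each
```

For 1,000 maximum-sized images, that is about $1.32 under the flat reading versus about $21.04 under the per-tile reading, a 16x difference.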
These two pages seem to contradict each other. Does anybody know which one is correct?