Hi, I’m currently using gemini-2.0-flash-exp-image-generation via the API to generate new images from a combination of a short text prompt and an input image. I use the output images to make short animations.
My current calculations make it seem like you can generate ~3000 images for 40 cents. Can someone clarify whether my understanding of the costs of this model is correct? In some ways it seems to be treated differently from the other models (for example, a dramatically different rate limit of 10 requests per minute regardless of tier).
Example
My image is 390x508, with a 220-character text prompt. Before passing them to the API for generation, I run model.count_tokens and get 303 tokens. The usage info on the response shows that the prompt token count is indeed 303, matching what I saw, but weirdly candidates_token_count is None. If I take the output image and run it through count_tokens, I get 259.
Does that mean that for this call, if I pay the quoted 10 cents per 1M input tokens and 40 cents per 1M output tokens, the costs are:
input cost: 303 tokens @ 10c/1M → 10c for ~3300 inputs
output cost: 259 tokens @ 40c/1M → 40c for ~3800 outputs
That suggests I’d be able to generate around 3300~3800 transformed output images for 50 cents! Not complaining but it’s cheaper than expected. Is this right?
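Here is the same arithmetic as a quick Python sketch, in case I've slipped somewhere. It assumes the quoted prices (10c per 1M input tokens, 40c per 1M output tokens) really do apply to this model and that the output image is billed at the 259 tokens that count_tokens reports, which is exactly what I'm unsure about. The function and constant names are just my own helpers, not part of any SDK.

```python
# Assumed prices in USD per 1M tokens (the quoted rates; may not apply to this model).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40


def cost_per_call(input_tokens, output_tokens,
                  input_price=INPUT_PRICE_PER_M,
                  output_price=OUTPUT_PRICE_PER_M):
    """Return the estimated USD cost of one call (input plus output tokens)."""
    input_cost = input_tokens * input_price / 1_000_000
    output_cost = output_tokens * output_price / 1_000_000
    return input_cost + output_cost


def images_for_budget(budget_usd, input_tokens, output_tokens):
    """Return how many whole calls fit in the budget at the estimated cost."""
    return int(budget_usd / cost_per_call(input_tokens, output_tokens))


if __name__ == "__main__":
    per_call = cost_per_call(303, 259)
    print(f"Estimated cost per call: ${per_call:.7f}")
    print("Images for 40 cents:", images_for_budget(0.40, 303, 259))
    print("Images for 50 cents:", images_for_budget(0.50, 303, 259))
```

With my token counts this comes out to about 0.013 cents per image once input and output are added together, so roughly 3,000 images for 40 cents or 3,700 for 50 cents.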
Input Image Size: (390, 508)
Input Text Size (chars): 220
Input Manual Tokens Count: total_tokens=303 cached_content_token_count=None
-------
# Response Candidates: 1
Response Usage cached_content_token_count=None candidates_token_count=None prompt_token_count=303 total_token_count=303
Output Image Size: (833, 1024)
Output Manual Token Count total_tokens=259 cached_content_token_count=None
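Since candidates_token_count comes back as None, my estimate has to fall back on the manual count of the output image. A minimal sketch of that fallback is below. The SimpleNamespace object is only a stand-in that mirrors the field names in the log above, and output_tokens_or_fallback is a hypothetical helper of mine, not an SDK function. Whether billing actually uses the manual count is an assumption.

```python
from types import SimpleNamespace


def output_tokens_or_fallback(usage, fallback):
    """Return the reported output token count, or the fallback if it is None."""
    reported = getattr(usage, "candidates_token_count", None)
    return reported if reported is not None else fallback


# Stand-in for the usage metadata shown in the log above (not a real SDK object).
usage = SimpleNamespace(
    prompt_token_count=303,
    candidates_token_count=None,
    total_token_count=303,
)

# 259 is the manual count_tokens result for the output image.
output_tokens = output_tokens_or_fallback(usage, fallback=259)
print("Input tokens:", usage.prompt_token_count)
print("Output tokens used for the estimate:", output_tokens)
```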
Extra details: I’ve added billing details and started the free trial over 24 hours ago, but while my number of API calls continues to climb as people try out making animations, the cost section says it’s not available yet. I’d like to get this figured out before I get a monster bill.