This is a 9x difference between the two counting methods, nearly an order of magnitude. In both cases the input is exactly the same; the only change was from count_tokens to generate_content.
Which one is correct? Can someone please clarify which one is the one I will be billed for?
Also, several similar issues have been raised here (including last year), and none of them got a proper response: they were either marked as solved for no good reason (with an unhelpful answer) or simply ignored.
A 9x difference in token count translates to a 9x difference in cost for us. How does one plan any reasonably heavy runs with this? Hopefully this deserves some attention and a response.
Hi @fat_panda, I tried counting the tokens using the count_tokens method and also checked the usage metadata of the generated response after passing the same image and prompt that were used with count_tokens. I got the same token count in both cases.
Hi @fat_panda, while trying to reproduce the issue with the code given in the image, I got the same token count mismatch. I will escalate this to the team. Thank you.
Hi @fat_panda, this discrepancy with large images is due to how the model processes them. With Gemini 2.0 Flash, image inputs with both dimensions <=384 pixels are counted as 258 tokens. Images larger in one or both dimensions are cropped and scaled as needed into tiles of 768x768 pixels, each counted as 258 tokens. Thank you.
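For what it's worth, the tiling rule described above can be turned into a back-of-the-envelope estimate. The sketch below uses simple ceiling division to count tiles, which is an assumption on my part; the exact crop/scale steps the model applies are not spelled out in this thread.

```python
import math

TOKENS_PER_TILE = 258
SMALL_IMAGE_LIMIT = 384   # both dimensions <= 384 px -> flat 258-token charge
TILE_SIZE = 768           # larger images are split into 768x768 tiles

def estimate_image_tokens(width: int, height: int) -> int:
    """Rough token estimate for one image input under the tiling rule
    described above. The ceiling-division tile count is an assumption;
    the model's actual crop/scale logic may differ."""
    if width <= SMALL_IMAGE_LIMIT and height <= SMALL_IMAGE_LIMIT:
        return TOKENS_PER_TILE
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return tiles * TOKENS_PER_TILE
```

Under this rough model, a 2048x2048 image would yield 3x3 = 9 tiles, i.e. 9 * 258 = 2322 tokens, which would explain exactly a 9x gap against a count_tokens result that still charges a flat 258.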
Your response is extremely unhelpful and does not even try to address the problem at hand.
The original post asked a straightforward question:
There is a token counting API (it takes a model name and input data)
There is a token count returned in the response metadata (also for a model name and input data)
There is a 9x discrepancy between the two counts: the token counting API shows 9x fewer tokens than the model response metadata.
Which one is correct? How do I get correct count before running the model?
My assumption was that these two counts should match, so that the user can know in advance how many tokens their input is.
So what is the point of the token counting API? Is Google's official response that the output of the token counting API has nothing to do with the actual token count the user will be billed for (which might be off by 9x)?
This is from the token counting API documentation. It would indeed be helpful to know how to count tokens. Also, if you scroll further down, you see:
Call count_tokens with the input of the request.
This returns the total number of tokens in the input only. You can make this call before sending the input to the model to check the size of your requests.
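To make the comparison concrete, here is a minimal sketch of running both counts on identical input with the google-genai SDK, as the posts in this thread do. The model name and the idea of comparing the two calls come from this thread; the helper function itself is just illustrative, and image parts can be added to contents the same way.

```python
def count_both(client, model, contents):
    """Return (pre-flight count from count_tokens,
    prompt token count reported in usage_metadata) for identical input."""
    pre = client.models.count_tokens(model=model, contents=contents)
    resp = client.models.generate_content(model=model, contents=contents)
    return pre.total_tokens, resp.usage_metadata.prompt_token_count

if __name__ == "__main__":
    from google import genai  # pip install google-genai
    client = genai.Client()   # reads the API key from the environment
    pre, billed = count_both(client, "gemini-2.0-flash",
                             ["How many tokens is this?"])
    print(f"count_tokens: {pre}, usage_metadata: {billed}")
```

If the documentation's claim held, the two numbers should match; the thread's complaint is precisely that for large images they differ by 9x.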
So if your response is the official response, then why does the documentation make it sound like the token counting API can be used to calculate the input token count?
Hi @Kiran_Sai_Ramineni, so which of the two methods returns the real token count for images larger than 384 pixels: client.models.count_tokens() or response.usage_metadata?
The issue has been reported to the team and a fix is in progress; response.usage_metadata has the correct number of tokens used. From Gemini 2.0 onwards, image tokens are calculated differently as per the docs, but this has not yet been reflected in the client.models.count_tokens() API. Sorry for the delay.