Token count mismatch - 9x discrepancy!


Hi,

I am seeing a big difference between the count from the token counting API and the actual token count I get in the metadata of the generation response.

Token counts:

  • count_tokens - 261 tokens
  • generate_content - 2324 tokens

This is a 9x difference between the two counting methods - nearly an order of magnitude. In both cases the input is exactly the same; the only change is swapping count_tokens for generate_content.

Which one is correct? Can someone please clarify which count I will actually be billed for?

Also, several issues of this kind have been raised here (including last year), and none of them got a proper response - each was either marked as solved for no good reason (with an unhelpful answer) or simply ignored.

A 9x difference in token count translates to a 9x difference in cost for us. How does one plan any reasonably heavy runs with this? Hopefully this deserves some attention and a response.

Thanks.

Hi @fat_panda, I tried counting the tokens with the count_tokens method and also checked the metadata of the generated response after passing the same image and prompt that was used with count_tokens. I got the same token count in both cases.

If possible, could you please share the image you are using so we can reproduce the issue?
Thank you.

@Kiran_Sai_Ramineni
I think you are using a very small image.

Here is an example I get with a random image of reasonable size. Should be simple to reproduce.
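In outline, the comparison is just this (a sketch using the google-genai Python SDK; the key and file name are placeholders, and the printed numbers are the ones from my original post):

from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Any image well over 384px in one or both dimensions triggers tiling.
contents = ["Describe this image.", Image.open("random_large_image.jpg")]

# Pre-flight estimate from the token counting API
count = client.models.count_tokens(model="gemini-2.0-flash", contents=contents)
print("count_tokens:", count.total_tokens)  # 261

# Actual prompt tokens reported in the generation response metadata
response = client.models.generate_content(model="gemini-2.0-flash", contents=contents)
print("usage_metadata:", response.usage_metadata.prompt_token_count)  # 2324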

Let me know if you have questions. Thanks.

@Kiran_Sai_Ramineni

Any updates on this? Can you confirm that you see the same issue? Were you able to reproduce?

Hi @fat_panda, while trying to reproduce the issue with the code given in the image, I got the same token count mismatch. I will escalate this to the team. Thank you.

Hi @fat_panda, this discrepancy with large images is due to how the model processes them. With Gemini 2.0 Flash, image inputs with both dimensions <= 384 pixels are counted as 258 tokens. Images larger in one or both dimensions are cropped and scaled as needed into tiles of 768x768 pixels, each counted as 258 tokens. Thank you.
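For what it's worth, that rule lines up with the counts reported earlier in the thread. A back-of-the-envelope check (the 9-tile figure is inferred from the reported numbers, not taken from the docs):

TOKENS_PER_TILE = 258  # documented cost per 768x768 tile on Gemini 2.0 Flash

# generate_content reported 2324 prompt tokens: 9 tiles covers nearly
# all of it, with the small remainder being the text prompt.
print(9 * TOKENS_PER_TILE)  # 2322

# count_tokens reported 261: the flat small-image rate plus a few text
# tokens, i.e. it apparently never applies the tiling rule.
print(TOKENS_PER_TILE)      # 258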

Hi Kiran,

Your response is extremely unhelpful and does not even try to address the problem at hand.

The original post asked a straightforward question:

  • There is a token counting API (takes model name and input data)
  • There is a token count returned in the response metadata (the request likewise takes model name and input data)
  • There is a 9x discrepancy between the two: the token counting API reports 9x fewer tokens than the model response metadata.
  • Which one is correct? How do I get the correct count before running the model?

My assumption was that these two counts should match, so that a user can know in advance how many tokens their input contains.

So what is the point of the token counting API? Is Google’s official response that its output has nothing to do with the actual token count the user will be billed for (which might differ by 9x)?

This is from the token counting API documentation. It would indeed be helpful to know how to count tokens. Also, if you scroll further down, you see:

  • Call count_tokens with the input of the request.
    This returns the total number of tokens in the input only. You can make this call before sending the input to the model to check the size of your requests.

So if your response is the official one, why does the documentation make it sound like the token counting API can be used to calculate the input token count?

Eagerly awaiting a response. Thanks.


Hi @Kiran_Sai_Ramineni, so which of the two methods returns the real token count for images larger than 384 pixels: client.models.count_tokens() or response.usage_metadata?

Thanks

Hi @mberta, @fat_panda,

The issue has been reported to the team and a fix is in progress; response.usage_metadata has the correct number of tokens used. From 2.0 onward, image tokens are calculated differently, as per the docs, but this change has not yet been reflected in the client.models.count_tokens() API. Sorry for the delay.

Thank you!


I’m having the same problem. When processing the request using this URL:
https://generativelanguage.googleapis.com/v1/models/gemini-2.0-flash:generateContent?key=my-key

and a 1501x115 image, this is my token count:
{
  "modality": "IMAGE",
  "tokenCount": 3354
},

But when using this URL to check the token count:
https://generativelanguage.googleapis.com/v1/models/gemini-2.0-flash:countTokens?key=my-key

this is the new count:
{
  "modality": "IMAGE",
  "tokenCount": 258
}

Which looks about correct.
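For reference, 3354 is exactly 13 x 258 (13 tiles), which matches the tiling explanation above. The two calls differ only in the endpoint; roughly, this is a sketch of the comparison using Python's requests library (the key and file name are placeholders):

import base64
import requests

API_KEY = "my-key"  # placeholder, as in the URLs above
BASE = "https://generativelanguage.googleapis.com/v1/models/gemini-2.0-flash"

with open("image_1501x115.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

body = {
    "contents": [{
        "parts": [
            {"text": "Describe this image."},
            {"inline_data": {"mime_type": "image/png", "data": image_b64}},
        ]
    }]
}

# Actual usage, as billed (per the reply above)
gen = requests.post(f"{BASE}:generateContent?key={API_KEY}", json=body).json()
print(gen["usageMetadata"]["promptTokensDetails"])  # IMAGE: 3354

# Pre-flight estimate (apparently not yet updated for the 2.0 tiling rules)
count = requests.post(f"{BASE}:countTokens?key={API_KEY}", json=body).json()
print(count["totalTokens"])  # ~258 plus text tokens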

My problem is: how do I know that Google is going to charge me for the correct count?

We are getting ready to process 100,000,000+ requests, and at 13x the cost, Google will bankrupt us.

Watching the usage in the Google Cloud Console is not a very accurate way to know what is going on.

This is very urgent for us as we are under tremendous pressure to begin processing as much and as fast as possible.