I’m using gemini-2.5-flash-preview-05-20 (as I started my current project before the 06-17 version and want consistent results) with the Gemini API. However, I’m having trouble understanding my billing.
According to Google AI Studio, the input/output pricing for non-thinking mode should be $0.15/$0.60 per 1M tokens.
Based on my estimates, I’ve processed approximately 48M input tokens and 30M output tokens, so I would expect to be charged around $26. However, my current billing is close to $60.
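For reference, the estimate can be reproduced with a few lines of Python. This is just a sketch of the arithmetic: the token counts are the approximate figures above, the prices are the non-thinking rates quoted from Google AI Studio, and the function name is made up for illustration.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the expected charge in USD, given prices per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Approximate usage from my own estimates (not from the billing console).
expected = estimate_cost(
    input_tokens=48_000_000,
    output_tokens=30_000_000,
    input_price_per_m=0.15,   # USD per 1M input tokens, non-thinking
    output_price_per_m=0.60,  # USD per 1M output tokens, non-thinking
)
print(f"Expected charge: ${expected:.2f}")
```

With these inputs the result is a little over $25, which is roughly where my figure of around $26 comes from.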
I’m pretty sure I’m missing something, but I can’t identify what it is.
Would it be possible to know (or confirm) the exact pricing for this model and also the actual usage (# tokens)?
Thank you!
Hey @Cheshire_Cat, that does sound confusing. If your token estimates are accurate, the billing seems off. It’s best to check the detailed usage breakdown in your Google Cloud Console and confirm whether any extra charges (like thinking mode or retries) are being applied. Reaching out to support might also help clarify the pricing for your specific model version.
Thanks!
Hi @Deepakishore, thank you for your reply.
I checked the detailed pricing breakdown in the Google Cloud Console and found that (with some minor discrepancies remaining) I’m probably being charged according to the gemini-2.5-flash pricing, even though I’m using the preview version.
I suppose that the 05-20 version has been replaced by the stable release. However, in Google AI Studio I now see only the preview-04-17 version, even though the preview-05-20 version is still mentioned here.
In any case, I didn’t expect to be charged stable pricing while using a preview version.
Hello,
You should only be charged for the model you are using. Just to verify: are you using only gemini-2.5-flash-preview-05-20, or did you switch to any other version?
Hi @Lalit_Kumar, thank you for your reply.
I confirm that I’ve always been using gemini-2.5-flash-preview-05-20. However, based on the detailed pricing, it seems that I’ve been charged for gemini-2.5-flash.
I’m not entirely sure about this, as the pricing breakdown in the Google Cloud Console doesn’t specify the exact model version; the SKU column only says “Generate content output token count gemini 2.5 flash short output text non-thinking”.
Dividing the charged amount (~$62) by the output token count (~28.5M) gives ~$2.2 per 1M tokens, which is closer to the pricing for gemini-2.5-flash (i.e., $2.5) than to the current preview version (i.e., $0.6).
Hello!
To help me understand, could you please clarify if the token count you provided includes both the input and output tokens, or just the output tokens?
Hi, here’s the detailed pricing breakdown:
| SKU | Usage | Cost |
|---|---|---|
| Generate content output token count gemini 2.5 flash short output text non-thinking | 28.511.784 count | ~$62 |
| Generate content input token count gemini 2.5 flash short input text | 57.630.552 count | ~$16 |
If I calculate the price per 1M tokens, I get ~$2.2 for output and ~$0.3 for input, which seems more aligned with the pricing for gemini-2.5-flash (i.e., $2.5 for output and $0.3 for input) than with the current preview version (i.e., $0.6 for output and $0.15 for input), even though I’ve always been using gemini-2.5-flash-preview-05-20 in my experiments.
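The per-million figures can be reproduced with a short script. It is only a sketch: the token counts and charges are the rounded values from the breakdown, the two price lists are the ones quoted in this thread (not taken from an official price sheet), and the function names are made up for illustration.

```python
def effective_rate(cost_usd, tokens):
    """Effective price in USD per 1M tokens for one billing row."""
    return cost_usd / tokens * 1_000_000


def closest_price_list(rate, kind, price_lists):
    """Return the name of the price list whose `kind` rate is nearest to `rate`."""
    return min(price_lists, key=lambda name: abs(price_lists[name][kind] - rate))


# Prices per 1M tokens as quoted in this thread (assumed, non-thinking).
PRICE_LISTS = {
    "gemini-2.5-flash-preview-05-20": {"input": 0.15, "output": 0.60},
    "gemini-2.5-flash": {"input": 0.30, "output": 2.50},
}

# Rows from the Cloud Console breakdown: (kind, token count, approximate cost).
ROWS = [
    ("output", 28_511_784, 62.0),
    ("input", 57_630_552, 16.0),
]

for kind, tokens, cost in ROWS:
    rate = effective_rate(cost, tokens)
    match = closest_price_list(rate, kind, PRICE_LISTS)
    print(f"{kind}: ~${rate:.2f} per 1M tokens, closest to {match}")
```

Both rows land nearer the gemini-2.5-flash rates than the preview rates. Since the charged amounts are rounded, the output rate in particular is only an approximation.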