Cost Accounting for Gemini 2.5 Pro

Ricky_Wang · October 9, 2025, 7:34am

Hi everyone,

Our team is building a commercial, multi-tenant AI Agent service powered by Gemini 2.5 Pro. We’ve run into a specific challenge while designing our granular, pay-as-you-go billing system for our clients and would appreciate your insights.

Business and Technical Context:

Core Model: Gemini 2.5 Pro (currently for text-only I/O).
Service Model: We serve multiple clients through a single Google Cloud project’s API and need to perform accurate, independent cost attribution for each.
Features Not Used: We are currently not using explicit Context Caching Storage or the Batch API.

Our Current Billing Logic and Questions: We’ve observed that in the usage metadata of the Gemini 2.5 Pro API response, total_token_count is not the sum of prompt_token_count and candidates_token_count.

Consequently, we’ve designed the following cost calculation process for each API call:

Get API Response Data: We log prompt_token_count, candidates_token_count, and total_token_count from the usage metadata.
Calculate “Thinking Tokens”: We calculate this portion using the formula: thinking_tokens = total_token_count - prompt_token_count - candidates_token_count.
Calculate Per-Call Cost: We use the following formula for billing: Call Cost = (prompt_token_count * input_price) + ((candidates_token_count + thinking_tokens) * output_price)
Aggregate Client Bills: We sum the costs of individual calls to generate the final bill.
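
The four steps above can be sketched in Python roughly as follows. This is a minimal illustration rather than an official billing implementation: the `Usage` dataclass, the function names, and the `(client_id, usage)` call-log shape are hypothetical, and the prices are placeholders passed in per token, so substitute the current published rates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Token counts logged from one response's usage metadata (step 1)."""
    prompt_token_count: int
    candidates_token_count: int
    total_token_count: int


def thinking_tokens(u: Usage) -> int:
    # Step 2: infer thinking tokens by subtraction, as described in the post.
    return u.total_token_count - u.prompt_token_count - u.candidates_token_count


def call_cost(u: Usage, input_price: float, output_price: float) -> float:
    # Step 3: thinking tokens are billed together with candidates
    # at the output price. Prices here are per token (placeholders).
    billed_output = u.candidates_token_count + thinking_tokens(u)
    return u.prompt_token_count * input_price + billed_output * output_price


def aggregate(calls, input_price: float, output_price: float) -> dict:
    # Step 4: sum per-call costs for each client.
    # `calls` is an iterable of (client_id, Usage) pairs (hypothetical shape).
    bills = defaultdict(float)
    for client_id, usage in calls:
        bills[client_id] += call_cost(usage, input_price, output_price)
    return dict(bills)
```

For real invoices, a decimal type or integer micro-units would avoid floating-point rounding drift when summing many small per-call amounts.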

Our Questions for the Community:

Confirmation on total_token_count: Is our understanding of total_token_count correct? Is it indeed composed of prompt, candidates, and thinking tokens? Is inferring thinking_tokens via subtraction an officially recognized or community-accepted method?
Reasonableness of the Billing Model: Is it reasonable to group “thinking tokens” with “output tokens” (candidates) and charge them at the same output price? How do you explain and bill for this “intermediate processing” cost to your own clients?
Best Practices: Are there official recommendations or community best practices for granular, multi-tenant billing specifically for Gemini 2.5 Pro that you could share?
Quota Details: As a side note, we are still looking for documentation on the detailed rate limits (TPM/RPM) for different billing tiers. If you have a link, we’d greatly appreciate it.

Our primary goal is to create a billing system that is transparent and fair to our clients while accurately covering our own costs.

Thanks in advance for sharing your experience and advice.

Krish_Varnakavi1 · October 9, 2025, 6:58pm

Hi @Ricky_Wang,

The usageMetadata in the Gemini API response provides a detailed breakdown of token consumption. While your method of calculating thinking_tokens by subtraction is logically sound, the API may provide a more direct metric.
The grouping strategy is not only reasonable but is also aligned with Google’s official pricing for Gemini 2.5 Pro.
I'd also be interested to hear what the community recommends here.
Rate limits doc
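
As a rough sketch of the "more direct metric" idea in the first point: the usage metadata is understood to expose a thinking-token count directly (`thoughts_token_count` in the Python SDK naming), but treat that field name as an assumption to verify against the current API reference. The helper below is hypothetical. It reads the reported value when present and otherwise falls back to the subtraction method from the original post. It works on a plain mapping so it does not depend on any particular SDK object.

```python
from typing import Mapping, Optional


def billed_thinking_tokens(meta: Mapping[str, Optional[int]]) -> int:
    """Return thinking tokens for one call from a usage-metadata mapping.

    Assumption: `thoughts_token_count` is the directly reported field.
    If it is missing or None, fall back to subtraction.
    """
    reported = meta.get("thoughts_token_count")
    if reported is not None:
        return reported

    # Fallback: total minus prompt minus candidates, clamped at zero
    # so a missing field can never produce a negative charge.
    prompt = meta.get("prompt_token_count") or 0
    candidates = meta.get("candidates_token_count") or 0
    total = meta.get("total_token_count") or 0
    return max(total - prompt - candidates, 0)
```

Preferring the reported value matters if the total ever includes token categories other than prompt, candidates, and thinking, since subtraction would then attribute those extra tokens to thinking. Logging both values per call makes any discrepancy easy to spot.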

I hope your app reaches the maximum number of users. Good luck!
