Hi everyone,
Our team is building a commercial, multi-tenant AI Agent service powered by Gemini 2.5 Pro. We’ve run into a specific challenge while designing our granular, pay-as-you-go billing system for our clients and would appreciate your insights.
Business and Technical Context:
- Core Model: Gemini 2.5 Pro (currently for text-only I/O).
- Service Model: We serve multiple clients through a single Google Cloud project’s API and need to perform accurate, independent cost attribution for each.
- Features Not Used: We are currently not using explicit Context Caching Storage or the Batch API.
Our Current Billing Logic and Questions:
We’ve observed that in the usage metadata of the Gemini 2.5 Pro API response, total_token_count is not the sum of prompt_token_count and candidates_token_count.
Consequently, we’ve designed the following cost calculation process for each API call:
- Get API Response Data: We log prompt_token_count, candidates_token_count, and total_token_count from the usage metadata.
- Calculate “Thinking Tokens”: We calculate this portion using the formula:
thinking_tokens = total_token_count - prompt_token_count - candidates_token_count.
- Calculate Per-Call Cost: We use the following formula for billing:
Call Cost = (prompt_token_count * input_price) + ((candidates_token_count + thinking_tokens) * output_price)
- Aggregate Client Bills: We sum the costs of individual calls to generate the final bill.
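For reference, the process above can be sketched in Python roughly as follows. This is a minimal sketch of our own logic, not an official method: the Usage class, the function names, and the client IDs are hypothetical, the prices are per-token placeholders rather than real Gemini prices, and the three counts are assumed to have already been logged from each response's usage metadata.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Token counts logged from one API response (hypothetical container)."""
    prompt_token_count: int
    candidates_token_count: int
    total_token_count: int


def thinking_tokens(usage: Usage) -> int:
    # Inferred by subtraction; this assumes the remainder of the total
    # consists only of thinking tokens, which is the open question here.
    return (usage.total_token_count
            - usage.prompt_token_count
            - usage.candidates_token_count)


def call_cost(usage: Usage, input_price: float, output_price: float) -> float:
    # Thinking tokens are grouped with candidates and billed at the output price.
    billed_output = usage.candidates_token_count + thinking_tokens(usage)
    return usage.prompt_token_count * input_price + billed_output * output_price


def aggregate_bills(calls, input_price: float, output_price: float) -> dict:
    # calls is an iterable of (client_id, Usage) pairs; sum costs per client.
    bills = defaultdict(float)
    for client_id, usage in calls:
        bills[client_id] += call_cost(usage, input_price, output_price)
    return dict(bills)
```

For example, with placeholder prices of 1.0 per input token and 2.0 per output token, a call with counts (100, 50, 180) yields 30 inferred thinking tokens and a cost of 100 * 1.0 + 80 * 2.0 = 260.0. A production version would probably use decimal arithmetic instead of floats for the money values.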
Our Questions for the Community:
- Confirmation on total_token_count: Is our understanding of total_token_count correct? Is it indeed composed of prompt, candidates, and thinking tokens? Is inferring thinking_tokens via subtraction the officially recognized or a community-accepted method?
- Reasonableness of the Billing Model: Is it reasonable to group “thinking tokens” with “output tokens” (candidates) and charge them at the same output price? How do you explain and bill for this “intermediate processing” cost to your own clients?
- Best Practices: Are there official recommendations or community best practices for granular, multi-tenant billing specifically for Gemini 2.5 Pro that you could share?
- Quota Details: As a side note, we are still looking for documentation on the detailed rate limits (TPM/RPM) for different billing tiers. If you have a link, we’d greatly appreciate it.
Our primary goal is to create a billing system that is transparent and fair to our clients while accurately covering our own costs.
Thanks in advance for sharing your experience and advice.