How does AI Studio calculate "Token Usage" vs. Actual Context Window?

Hi everyone,

I’m trying to understand the exact mechanics of the “Token Usage” (Cost Estimation) panel in the Google AI Studio chat interface (specifically using Gemini 3.1 Pro Preview). I’ve run into some confusing discrepancies between what the UI displays and how the actual context window and stateless API requests seem to work.

Here are the objective facts from my recent tests:

Observation 1: The UI Input counter does not append conversation history.

  1. I sent an initial prompt of exactly 15 tokens. The model generated a 1,756-token output. The UI panel updated to show → Input tokens: 15, Output tokens: 1756.
  2. I then sent a follow-up prompt of exactly 2 tokens (“Continue”). The model replied with 1,580 tokens.
  3. After this second turn, the panel updated to show → Input tokens: 17 (just 15 + 2) and Output tokens: 3336 (1756 + 1580).
    This indicates that the Input tokens metric in the UI is just a cumulative sum of the raw text I manually typed, not the actual payload (history + new prompt) that a stateless API would have to receive on each turn.
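To make the discrepancy concrete, here is a minimal sketch (plain Python, using the token counts from my two turns above; the variable names are my own) contrasting the UI’s cumulative counter with the payload a stateless API would actually have to carry on the second turn:

```python
# Token counts observed in the two turns above: (user input, model output).
turns = [(15, 1756), (2, 1580)]

# What the UI panel appears to show: a running sum of typed input only.
ui_input_counter = sum(user for user, _ in turns)   # 15 + 2 = 17

# What a stateless API would actually receive on turn 2:
# the full history (turn-1 prompt + turn-1 reply) plus the new prompt.
history = turns[0][0] + turns[0][1]                 # 15 + 1756 = 1771
real_turn2_payload = history + turns[1][0]          # 1771 + 2 = 1773

print(ui_input_counter)    # 17
print(real_turn2_payload)  # 1773
```

So for turn 2 the UI reports 17 input tokens while the real request payload would be roughly 1,773 tokens, a two-orders-of-magnitude gap that only widens as the conversation grows.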

Observation 2: Massive Output tokens and crossing the 1M limit without errors.
In a much longer session, the panel showed Input tokens: ~355k and Output tokens: ~703k, for a Total tokens: ~1.05M. The UI progress bar turned red, indicating I had exceeded the 1,048,576-token maximum context window. Yet I was able to keep chatting without ever hitting a “context window exceeded” error.
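For reference, here is the arithmetic behind the red bar, a quick sanity check using the approximate figures from the panel:

```python
MAX_CONTEXT = 1_048_576       # advertised maximum context window
input_tokens = 355_000        # approximate panel reading
output_tokens = 703_000       # approximate panel reading

total = input_tokens + output_tokens
print(total, total > MAX_CONTEXT)   # 1058000 True
```

So by the panel’s own accounting the session is ~10k tokens past the limit, yet requests keep succeeding, which is what makes me doubt that this total reflects what is actually sent per request.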

The massive Output count seems to include not just the visible text but also the large hidden Chain-of-Thought (CoT) traces (the “Thoughts” feature).

Based on these observations, I have a few specific questions for the community or the dev team:

  1. How is the actual conversation history sent to the backend? Since the UI’s Input tokens counter clearly ignores previous AI outputs, what is the actual size of the payload being sent?
  2. How are CoT / “Thoughts” handled in the context history? Given that the historical Output tokens (including raw CoT) are huge, sending them all back would instantly blow up the 1M context limit. Does the backend completely discard the raw CoT after generation?
  3. Are thought summaries or digital signatures used? To maintain reasoning continuity without passing back hundreds of thousands of raw CoT tokens, does the frontend only pass back the visible text? Or does it pass back a lightweight “thought summary” or some kind of encrypted digital signature/state token to the backend?
  4. How can I track the real context size? For developers doing long-form work (e.g. writing novels), how can we accurately monitor the true context window usage per request and avoid silent truncation, given that this UI panel seems to act only as a cumulative billing ledger?
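For question 4, the workaround I’m currently considering is to rebuild the payload client-side and count it myself before each request. A minimal sketch (the `count_tokens` stub is a placeholder; in practice you would substitute the API’s own token-counting endpoint so the numbers match the server-side tokenizer, since a local heuristic will drift):

```python
MAX_CONTEXT = 1_048_576

def count_tokens(text: str) -> int:
    """Placeholder heuristic (~4 chars/token). Replace with the API's real
    token-counting call; a local estimate will not match the server exactly."""
    return max(1, len(text) // 4)

def payload_size(history: list[tuple[str, str]], new_prompt: str) -> int:
    """Tokens a stateless request would carry: full history + new prompt."""
    total = count_tokens(new_prompt)
    for user_msg, model_msg in history:
        total += count_tokens(user_msg) + count_tokens(model_msg)
    return total

def check_budget(history, new_prompt, reserve_for_output=8_192):
    """Return (tokens used, tokens remaining after reserving output room)."""
    used = payload_size(history, new_prompt)
    remaining = MAX_CONTEXT - used - reserve_for_output
    if remaining < 0:
        print(f"Over budget by {-remaining} tokens; trim or summarize history.")
    return used, remaining
```

The key point is that the number to watch is `payload_size` per request, not the cumulative sum the UI panel shows; but this only works if we know whether hidden CoT tokens are part of the replayed history, which is exactly what questions 2 and 3 are asking.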

Any insights into the actual engineering behind this UI and the API payload construction would be greatly appreciated!