How are “short input”, “long input”, and “cached input” token costs calculated for Gemini 2.5 Flash?

I’m using the Gemini 2.5 Flash model via the Gemini API (Google Cloud billing) and I’m trying to understand how my usage is being broken down and billed.

In the Billing → Reports view I see multiple SKUs for the same day, for example:

  • Generate content input token count gemini 2.5 flash short input text

  • Generate content output token count gemini 2.5 flash short input text

  • Generate content input token count gemini 2.5 flash long input text

  • Generate content cached input token count gemini 2.5 flash short input text

  • Generate content cached input token count gemini 2.5 flash long input text

I have two questions:

  1. What is the exact threshold that decides whether a request is billed as “short input text” vs “long input text”?

    • Is it based on total input tokens per request?

    • If yes, what is the cutoff number of tokens for Gemini 2.5 Flash?

  2. How are the “cached input token count” SKUs calculated?

    • Under what conditions are tokens counted as cached input?

    • Are cached tokens billed at a different rate, and how can I estimate that cost from my side when calling the API?

My goal is to reproduce these costs on my end (given input/output token counts per request) and to understand when my prompts will fall into “short”, “long”, and “cached” buckets.

If there’s an official doc or example that explains this mapping in detail, a link to that would be very helpful.

Thanks in advance!

Hello,

Here is a clarification regarding input types and caching:

  • Short vs. Long Input: The distinction between “short input” and “long input” for Gemini models is determined by the total input token count of the request: prompts of 200,000 tokens or fewer are billed under the “short input” SKUs, and larger prompts under the “long input” SKUs. Depending on the model, the two buckets may be billed at different rates; for gemini-2.5-flash, however, the price is the same for both short and long inputs.
  • Cached Input: This refers to tokens that the model does not need to re-process because they were stored from a previous call (either automatically or manually).
    • Implicit Caching: This automatically reduces costs when your requests share a long common prefix with earlier requests. There are no storage fees associated with this.
    • Explicit Caching: This involves manually creating a cache. You typically pay a one-time “initialization” fee (at the standard input rate) and a per-hour storage fee; on subsequent calls, the cached tokens are then charged at a discounted rate.
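To reproduce the per-request cost on your side, here is a minimal sketch. The per-million-token rates below are placeholders I am assuming for illustration — substitute the current values from the official pricing page. The cached token count corresponds to the `cached_content_token_count` field the API returns in each response’s usage metadata; since gemini-2.5-flash charges short and long input at the same rate, no 200,000-token branch is needed here:

```python
# Placeholder USD rates per 1M tokens -- ASSUMED values, check the
# official Gemini API pricing page before relying on them.
PRICE_PER_M_INPUT = 0.30    # non-cached input tokens
PRICE_PER_M_CACHED = 0.075  # cached input tokens (discounted rate)
PRICE_PER_M_OUTPUT = 2.50   # output tokens

def estimate_request_cost(prompt_tokens: int,
                          output_tokens: int,
                          cached_tokens: int = 0) -> float:
    """Estimate the cost of one generateContent call in USD.

    cached_tokens is the portion of prompt_tokens served from the
    context cache (reported as cached_content_token_count in the
    response's usage metadata). Those tokens are billed at the cached
    rate; the remainder at the standard input rate. Storage fees for
    explicit caches are billed separately per hour and not included.
    """
    fresh_input = prompt_tokens - cached_tokens
    return (fresh_input * PRICE_PER_M_INPUT
            + cached_tokens * PRICE_PER_M_CACHED
            + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

# Example: a 10,000-token prompt of which 8,000 tokens hit the cache,
# producing 500 output tokens.
cost = estimate_request_cost(10_000, 500, cached_tokens=8_000)
```

With the assumed rates, that example works out to (2,000 × 0.30 + 8,000 × 0.075 + 500 × 2.50) / 1,000,000 ≈ $0.00245, versus $0.00425 with no cache hit — which is how the discounted cached rate shows up as a separate SKU in your billing report.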

