How are “short input”, “long input”, and “cached input” token costs calculated for Gemini 2.5 Flash?

I’m using the Gemini 2.5 Flash model via the Gemini API (Google Cloud billing) and I’m trying to understand how my usage is being broken down and billed.

In the Billing → Reports view I see multiple SKUs for the same day, for example:

  • Generate content input token count gemini 2.5 flash short input text

  • Generate content output token count gemini 2.5 flash short input text

  • Generate content input token count gemini 2.5 flash long input text

  • Generate content cached input token count gemini 2.5 flash short input text

  • Generate content cached input token count gemini 2.5 flash long input text

I have two questions:

  1. What is the exact threshold that decides whether a request is billed as “short input text” vs “long input text”?

    • Is it based on total input tokens per request?

    • If yes, what is the cutoff number of tokens for Gemini 2.5 Flash?

  2. How are the “cached input token count” SKUs calculated?

    • Under what conditions are tokens counted as cached input?

    • Are cached tokens billed at a different rate from regular input tokens, and how can I estimate that cost on my side from the token counts returned by the API?

My goal is to reproduce these costs on my end (given input/output token counts per request) and to understand when my prompts will fall into “short”, “long”, and “cached” buckets.
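
For reference, here is a minimal sketch of the calculation I am trying to reproduce. Everything numeric in it is a placeholder I made up for illustration: the 200,000-token cutoff, the per-million rates, and the names `Rates`, `SHORT`, `LONG`, and `estimate_cost` are all hypothetical and not taken from any official pricing page. It also assumes that the bucket is chosen by total input tokens per request and that the cached count is a subset of the input count, which is exactly what I would like confirmed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens. All values used below are placeholders, not official prices."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


# Hypothetical cutoff between "short" and "long" input (the real value is question 1).
LONG_INPUT_THRESHOLD = 200_000

# Placeholder rate cards, one per bucket.
SHORT = Rates(input_per_m=1.0, cached_input_per_m=0.25, output_per_m=4.0)
LONG = Rates(input_per_m=2.0, cached_input_per_m=0.5, output_per_m=8.0)


def estimate_cost(prompt_tokens, cached_tokens, output_tokens,
                  threshold=LONG_INPUT_THRESHOLD):
    """Return (bucket, cost_usd) for a single request.

    Assumptions (unconfirmed): the bucket depends on total prompt tokens,
    and cached_tokens is already included in prompt_tokens.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")

    bucket = "long" if prompt_tokens > threshold else "short"
    rates = LONG if bucket == "long" else SHORT

    # Bill cached tokens at the cached rate and the rest at the normal input rate.
    uncached_tokens = prompt_tokens - cached_tokens
    cost_usd = (
        uncached_tokens * rates.input_per_m
        + cached_tokens * rates.cached_input_per_m
        + output_tokens * rates.output_per_m
    ) / 1_000_000
    return bucket, cost_usd
```

With the real threshold and rates filled in, I would call `estimate_cost(prompt_tokens, cached_tokens, output_tokens)` per request and sum the results per bucket to compare against the SKUs in the billing report.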

If there’s an official doc or example that explains this mapping in detail, a link to that would be very helpful.

Thanks in advance!