Token counting mismatch between AI Studio Playground and API usageMetadata when using Function Calling

I am building a service using the Gemini API and its Function Calling feature.
While testing token usage, I noticed that the token counts shown in the AI Studio Playground
do not match the usageMetadata values returned from the API.
This makes it unclear which values are actually used for billing.

Here is the behavior I am seeing:

  1. In AI Studio Playground

    • I send a user query.
    • The model returns a function call JSON.
    • I execute the function separately and paste the function result (JSON) back into the Playground.
    • The larger the function result JSON becomes, the more the Playground’s “Output Tokens” (under “Token Usage”) and “Output token cost” (under “Cost Estimation”) increase.
    • This suggests that the function result is being counted as output tokens.
  2. In the actual API (generateContent)

    • Step 1: Model generates the function call JSON → counted as candidatesTokenCount (expected).
    • Step 2: I send the function result JSON back as input.
    • In this step, increasing the size of the function result only increases the promptTokenCount, not the output count.
    • This matches the billing model as I understand it:
      function results should be counted as input tokens (see the sketch after this list).
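
For reference, here is a minimal sketch of this two-step flow, assuming the Python google-genai SDK; the model name and the hard-coded member list are placeholders, not my production code. It prints usageMetadata after each step so the two counts can be compared directly:

# Minimal sketch of the two-step Function Calling flow, assuming the
# google-genai Python SDK (pip install google-genai). The model name and
# the hard-coded member list are placeholders, not my real setup.
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

get_team_members_decl = {
    "name": "getTeamMembers",
    "description": "Returns the list of members in a team",
    "parameters": {
        "type": "object",
        "properties": {"teamName": {"type": "string"}},
        "required": ["teamName"],
    },
}
config = types.GenerateContentConfig(
    tools=[types.Tool(function_declarations=[get_team_members_decl])]
)

prompt = (
    "What is the number of members in Team 'Red'? "
    "Please fetch the member list using the function and tell me only the final count."
)

# Step 1: the model responds with a functionCall part.
first = client.models.generate_content(
    model="gemini-2.0-flash", contents=prompt, config=config
)
print("step 1 usage:", first.usage_metadata)  # function call -> candidatesTokenCount
fn_call = first.candidates[0].content.parts[0].function_call

# Execute the function locally (placeholder result).
result = {"teamName": fn_call.args["teamName"], "members": ["Alice", "Bob", "Carol"]}

# Step 2: send the function result back as a functionResponse part.
history = [
    types.Content(role="user", parts=[types.Part.from_text(text=prompt)]),
    first.candidates[0].content,  # the model's functionCall turn
    types.Content(
        role="user",
        parts=[types.Part.from_function_response(name=fn_call.name, response=result)],
    ),
]
second = client.models.generate_content(
    model="gemini-2.0-flash", contents=history, config=config
)
print("step 2 usage:", second.usage_metadata)  # function result -> promptTokenCount
print("answer:", second.text)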

Based on the documentation, I believe:

  • Everything passed into a generateContent call (system_instruction, tools, history, and function results)
    should count as input tokens (promptTokenCount).
  • Only model-generated content should count as output tokens (candidatesTokenCount).

Because the Playground attributes more tokens to “Output” and increases the “Estimated Cost” when I enlarge the function result, its behavior seems inconsistent with the API’s accounting.
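
As a rough cross-check on the input side, countTokens shows how many tokens a function result of this size contributes; a sketch, assuming the google-genai Python SDK. Here the result JSON is counted as plain text, which only approximates how a functionResponse part is tokenized, and the synthetic member list is a placeholder:

# Rough cross-check: how many tokens does a large function result contribute?
# Assumes the google-genai Python SDK; counting the JSON as plain text is an
# approximation of how a functionResponse part is tokenized.
import json

from google import genai

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

fn_result = {"teamName": "Red", "members": [f"member_{i:04d}" for i in range(1000)]}
count = client.models.count_tokens(
    model="gemini-2.0-flash",  # placeholder model name
    contents=json.dumps(fn_result),
)
print(count.total_tokens)  # scales with the size of the function result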


My questions:

  1. Which values are authoritative for billing:
    Playground’s “Estimated Cost” or the API’s usageMetadata?

  2. In Function Calling:
    Is the function result JSON always counted as input tokens (promptTokenCount) in the next model call?

  3. Is the Playground UI misclassifying or double-counting tokens, especially across multi-step Function Calling flows?

  4. If this mismatch is unintended, is it a known issue?

I can provide full logs and screenshots, and I have also included a simplified, reproducible scenario below.


Optional: Detailed Reproduction Steps (for anyone who wants deeper context)

To test token behavior more clearly, I prepared a minimal setup:

Function definition

A function that takes a team name and returns a list of members:

  • Input: team name (e.g., "Red")
  • Output: JSON array listing team members

Example (simplified):

{
  "name": "getTeamMembers",
  "description": "Returns the list of members in a team",
  "parameters": {
    "type": "object",
    "properties": {
      "teamName": { "type": "string" }
    },
    "required": ["teamName"]
  }
}
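
For completeness, the local implementation behind this declaration is trivial; a sketch (the helper name, the synthetic member names, and the member_count knob are mine, added so the result size can be varied for the experiment below):

# Sketch of the local function behind the getTeamMembers declaration.
# member_count is a knob added to vary the size of the returned JSON.
def get_team_members(team_name: str, member_count: int = 10) -> dict:
    """Return a JSON-serializable result listing member_count synthetic members."""
    return {
        "teamName": team_name,
        "members": [f"member_{i:04d}" for i in range(member_count)],
    }

# Example: the payload that gets sent back to the model as the function result.
print(get_team_members("Red", member_count=10))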

Prompt used

What is the number of members in Team 'Red'? 
Please fetch the member list using the function and tell me only the final count.

Experiment

I executed the Function Calling flow twice (a sketch of how I scripted the comparison follows the two cases):

  1. Case A: The function returns a small list (~10 members)

  2. Case B: The function returns a large list (1000 members)
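
A sketch of how the comparison can be scripted, assuming the google-genai Python SDK; the model name, the synthetic member lists, and the helper name are placeholders. To isolate the variable, it hand-writes the model’s functionCall turn from step 1, so the only thing that changes between the two cases is the size of the function result:

# Sketch: compare promptTokenCount for a small vs. large function result,
# assuming the google-genai Python SDK. Member lists and model name are placeholders.
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

DECLARATION = {
    "name": "getTeamMembers",
    "description": "Returns the list of members in a team",
    "parameters": {
        "type": "object",
        "properties": {"teamName": {"type": "string"}},
        "required": ["teamName"],
    },
}
CONFIG = types.GenerateContentConfig(
    tools=[types.Tool(function_declarations=[DECLARATION])]
)
PROMPT = (
    "What is the number of members in Team 'Red'? "
    "Please fetch the member list using the function and tell me only the final count."
)


def second_turn_usage(member_count: int):
    """Send a function result with member_count members and return usageMetadata."""
    result = {
        "teamName": "Red",
        "members": [f"member_{i:04d}" for i in range(member_count)],
    }
    history = [
        types.Content(role="user", parts=[types.Part.from_text(text=PROMPT)]),
        # Hand-written stand-in for the model's functionCall turn from step 1.
        types.Content(
            role="model",
            parts=[
                types.Part.from_function_call(
                    name="getTeamMembers", args={"teamName": "Red"}
                )
            ],
        ),
        types.Content(
            role="user",
            parts=[
                types.Part.from_function_response(
                    name="getTeamMembers", response=result
                )
            ],
        ),
    ]
    response = client.models.generate_content(
        model="gemini-2.0-flash", contents=history, config=CONFIG
    )
    return response.usage_metadata


print("Case A (10 members):  ", second_turn_usage(10))
print("Case B (1000 members):", second_turn_usage(1000))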

Observed results

  • Playground: Case B showed a noticeably larger “Output Tokens” value and a higher “Output token cost” than Case A, even though the extra content was the pasted function result.
  • API: the same increase from Case A to Case B appeared only in promptTokenCount; candidatesTokenCount stayed roughly the same.

This mismatch is why I’m trying to confirm the intended behavior.

Thank you very much!
Happy to provide more data if needed.

Hello,

Thank you for using the forum. We were able to reproduce the issue with the help of the instructions you provided, and we will pass this information on to the Gemini development team.