Hi - I wonder if anyone Google-side can help with a roadblock here. The Gemini API documentation indicates that response_schema is rendered into the prompt, so it counts toward input tokens and has an associated per-token cost, which makes perfect sense. We need to model the cost of each API call accurately, which we are doing (assuming no cached tokens and no function calls, for simplicity) as prompt_token_count * input_token_cost + candidates_token_count * output_token_cost.
The problem is that the input tokens added when a response_schema is specified in the generate_content API call do not seem to be counted anywhere. Does this mean we are not being charged for the response_schema tokens? And if we are being charged, the generate_content response does not appear to include these tokens in usage_metadata, so there is no way to know how much we are being billed for the call.
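For concreteness, this is roughly how we compute the cost of a call (the per-token prices below are placeholders, not actual Gemini rates):

# Placeholder prices (USD per token) - substitute the real rates for your model.
INPUT_TOKEN_COST = 0.10 / 1_000_000
OUTPUT_TOKEN_COST = 0.40 / 1_000_000

def call_cost(usage_metadata) -> float:
    # Assumes no cached tokens and no function calls, as noted above.
    return (usage_metadata.prompt_token_count * INPUT_TOKEN_COST
            + usage_metadata.candidates_token_count * OUTPUT_TOKEN_COST)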
Example:
from pydantic import BaseModel, Field
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

class TestSchema(BaseModel):
    output: str = Field(description="model output")
    test1: str = Field(description="always output: 'one'")
    test2: str = Field(description="always output: 'two'")
    test3: str = Field(description="always output: 'three'")
resp_with_schema = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    config=types.GenerateContentConfig(
        temperature=0,
        candidate_count=1,
        thinking_config=types.ThinkingConfig(thinking_budget=0),
        response_mime_type="application/json",
        response_schema=TestSchema,
    ),
    contents=["Say hello."],
)
print(resp_with_schema.usage_metadata, resp_with_schema.candidates[0].content.parts[0].text)
resp_without_schema = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    config=types.GenerateContentConfig(
        temperature=0,
        candidate_count=1,
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
    contents=["Say hello."],
)
print(resp_without_schema.usage_metadata, resp_without_schema.candidates[0].content.parts[0].text)
You’ll see this test code runs fine - the schema is respected when response_schema is specified. But the input side of usage_metadata is identical in both cases (prompt_token_count = 4), which clearly doesn’t include the response schema tokens.
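To make the discrepancy explicit:

# Both calls report the same input-side usage, so the schema tokens
# are not reflected anywhere in usage_metadata.
assert (resp_with_schema.usage_metadata.prompt_token_count
        == resp_without_schema.usage_metadata.prompt_token_count)  # both 4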
Is this an API bug? In any case, how do we know the total input token count, including the response_schema tokens, so we can multiply by the input token price to compute the correct cost?
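The only workaround I can think of is to count tokens on the serialized schema ourselves, but that is just a rough guess at the overhead, since we have no visibility into how the backend actually renders the schema into the prompt:

import json

# Rough, hypothetical estimate: tokenize the JSON schema text directly.
# The backend may render the schema differently before injecting it,
# so this is a guess, not the billed amount.
schema_text = json.dumps(TestSchema.model_json_schema())
schema_tokens = client.models.count_tokens(
    model="gemini-2.5-flash-lite",
    contents=[schema_text],
)
print(schema_tokens.total_tokens)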
Thank you!
-Adrian