Python SDK Support for Detecting Output Length Overrun

Does the Python SDK support raising exceptions when the output is cut off due to the current 8192 token limit?

I’m trying to convert PDFs to text. Some files are large; although they fit within the input context window, the output gets truncated. I expect the SDK to raise an exception in such cases.

Hi @farhanhubble, at present I think Gemini truncates the output when the output token count exceeds the max limit instead of raising an exception. If you want an exception to be raised, you can write code to handle this case yourself. For example:

from google import genai

client = genai.Client(api_key=api)  # `api` holds your API key

response = client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents=["explain about ai"],
)
for chunk in response:
    print(chunk.text, end="")
    # usage_metadata reports token counts; 100 here is just an example threshold
    total_tokens = chunk.usage_metadata.total_token_count
    if total_tokens > 100:
        raise RuntimeError("token limit exceeded")

Thank You.

Hi @farhanhubble

Welcome to the forum.

Look for the finishReason in the response object; a value of MAX_TOKENS means the output was cut off at the max output token limit.
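A minimal sketch of that check using the same google-genai client as the snippet above (the model name and prompt are just placeholders); in the Python SDK the field is exposed as finish_reason and can be compared against types.FinishReason.MAX_TOKENS:

from google import genai
from google.genai import types

client = genai.Client(api_key=api)  # `api` holds your API key

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=["explain about ai"],
)

# finish_reason tells you why generation stopped; MAX_TOKENS means the
# output was truncated at the max output token limit
if response.candidates[0].finish_reason == types.FinishReason.MAX_TOKENS:
    raise RuntimeError("Output truncated: max output token limit reached")

print(response.text)

With streaming, the same check can be applied to the last chunk's candidate instead of raising on a token-count threshold.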

Cheers