"finishReason" : "MAX_TOKENS" - But Text is Empty

The text here is empty, but the finish reason is MAX_TOKENS:

{
  "candidates" : [ {
    "content" : {
      "parts" : [ {
        "text" : ""
      } ],
      "role" : "model"
    },
    "finishReason" : "MAX_TOKENS",
    "index" : 0
  } ],
  "usageMetadata" : {
    "promptTokenCount" : 25090,
    "totalTokenCount" : 25090,
    "promptTokensDetails" : [ {
      "modality" : "TEXT",
      "tokenCount" : 25090
    } ]
  },
  "modelVersion" : "models/gemini-2.5-flash-preview-04-17"
}

I’ve set maxOutputTokens to 65536 in the request,

so why is it finishing with MAX_TOKENS when the total is only around 25K tokens?
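
For reference, a minimal sketch of how the request is configured (shown with the google-genai Python SDK; the ~25K-token prompt itself is a placeholder here):

from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

prompt_text = "<the ~25K-token prompt goes here>"

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents=prompt_text,
    config=types.GenerateContentConfig(max_output_tokens=65536),
)
print(repr(response.text))                    # comes back empty
print(response.candidates[0].finish_reason)   # MAX_TOKENS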

Hey @Yasar_Arafath
Welcome to the community!
We are aware of the issue. We will look into it and provide an update here.
Appreciate your patience!
Thank you!

Any updates, @Sangeetha_Jana?

@Yasar_Arafath
Thank you for your patience. We don’t have any updates to share at the moment. Rest assured, we will post any new information here as soon as it becomes available.
Thank you!

Facing a similar issue: the model starts spitting out the same token or the same series of tokens. In this case it’s whitespace that just repeats until it hits the limit, but I’ve also had base64 output start repeating.

chunk: candidates=[Candidate(content=Content(parts=[Part(video_metadata=None, thought=None, code_execution_result=None, executable_code=None, file_data=None, function_call=None, function_response=None, inline_data=None, text='                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             ')], role='model'), citation_metadata=None, finish_message=None, token_count=None, avg_logprobs=None, finish_reason=<FinishReason.MAX_TOKENS: 'MAX_TOKENS'>, grounding_metadata=None, index=0, logprobs_result=None, safety_ratings=None)] model_version='models/gemini-2.5-flash-preview-05-20' prompt_feedback=None usage_metadata=GenerateContentResponseUsageMetadata(cached_content_token_count=None, candidates_token_count=1698, prompt_token_count=4856, total_token_count=11364) automatic_function_calling_history=None parsed=None

@Sangeetha_Jana, any updates?

Hey Users,
The issue should be resolved now. Please let us know if you are still facing it.
Thank you!

Hi, I still get the same error using the Python SDK. This is the raw response; it shows "finish_reason": "MAX_TOKENS", which causes response.text to be None.

{
  "candidates": [
    {
      "content": {
        "parts": null,
        "role": "model"
      },
      "citation_metadata": null,
      "finish_message": null,
      "token_count": null,
      "finish_reason": "MAX_TOKENS",
      "url_context_metadata": null,
      "avg_logprobs": null,
      "grounding_metadata": null,
      "index": null,
      "logprobs_result": null,
      "safety_ratings": null
    }
  ],
  "create_time": "2025-06-05T03:24:28.726915Z",
  "response_id": "7A1BaIOvLOadz7sP3fbcqQk",
  "model_version": "gemini-2.5-pro-preview-03-25",
  "prompt_feedback": null,
  "usage_metadata": {
    "cache_tokens_details": null,
    "cached_content_token_count": null,
    "candidates_token_count": null,
    "candidates_tokens_details": null,
    "prompt_token_count": 2882,
    "prompt_tokens_details": [
      {
        "modality": "TEXT",
        "token_count": 2882
      }
    ],
    "thoughts_token_count": 1023,
    "tool_use_prompt_token_count": null,
    "tool_use_prompt_tokens_details": null,
    "total_token_count": 3905,
    "traffic_type": "ON_DEMAND"
  },
  "automatic_function_calling_history": [],
  "parsed": null
}
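
For what it’s worth, the error on my side comes from reading response.text directly; a minimal sketch of how I reproduce it and surface the finish reason instead of passing None around (client setup and prompt are placeholders):

from google import genai
from google.genai import types

client = genai.Client()  # API key taken from the environment
response = client.models.generate_content(
    model="gemini-2.5-pro-preview-03-25",
    contents="<prompt that triggers the empty completion>",
)

candidate = response.candidates[0]
if candidate.finish_reason == types.FinishReason.MAX_TOKENS and not candidate.content.parts:
    # parts is None, so response.text is None as well.
    raise RuntimeError(f"Empty completion, finish_reason={candidate.finish_reason}")
print(response.text)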

Hey @Mochammad_Zava_Abbiy
It would be great if you could share the code snippet along with the prompt so that I can replicate it.
Thank you!

Hi, this issue is still present. I’m using the following code with the new Google GenAI Python SDK:

from google.genai import types  # google-genai SDK types

gen_config = types.GenerateContentConfig(
    system_instruction=instruction,
    top_p=0.9,
    top_k=60,
    temperature=0.7,
    safety_settings=disabled_safety,
    response_mime_type="application/json",
    response_schema=groups_schema,
)

# The call goes through an application-level async wrapper around the SDK client.
groups_fragment_response = await gemini_client.generate_content(
    contents=json_str,
    genconfig=gen_config,
    gemini_model_type=self.gemini_model_type
)

I’ve tried disabling JSON mode and manually setting the max output tokens to 1,000,000, but I still get this error from time to time:

candidates=[Candidate(content=Content(parts=None, role='model'), citation_metadata=None, finish_message=None, token_count=None, finish_reason=<FinishReason.MAX_TOKENS: 'MAX_TOKENS'>, avg_logprobs=None, grounding_metadata=None, index=0, logprobs_result=None, safety_ratings=None)] create_time=None response_id=None model_version='models/gemini-2.5-flash-preview-05-20' prompt_feedback=None usage_metadata=GenerateContentResponseUsageMetadata(cache_tokens_details=None, cached_content_token_count=None, candidates_token_count=None, candidates_tokens_details=None, prompt_token_count=194494, prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=194494)], thoughts_token_count=65535, tool_use_prompt_token_count=None, tool_use_prompt_tokens_details=None, total_token_count=260029, traffic_type=None) automatic_function_calling_history=[] parsed=None
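
Worth noting: thoughts_token_count in that trace is 65535, which is exactly the output cap, so the thinking tokens alone appear to exhaust the budget before any visible text is produced. A minimal sketch of capping the thinking budget with the plain google-genai client (assuming the wrapper forwards GenerateContentConfig unchanged; the budget value is illustrative):

from google import genai
from google.genai import types

client = genai.Client()  # API key taken from the environment

gen_config = types.GenerateContentConfig(
    max_output_tokens=8192,
    response_mime_type="application/json",
    # Cap thinking so it cannot consume the whole output budget.
    thinking_config=types.ThinkingConfig(thinking_budget=1024),
)

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-05-20",
    contents=json_str,  # same payload as in the call above
    config=gen_config,
)
print(response.candidates[0].finish_reason, len(response.text or ""))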

I had a similar problem with LangChain. It works well with 2.0 Flash, but when I switch to 2.5 Flash, it returns the following:
content='' additional_kwargs={} response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'MAX_TOKENS', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []} id='run--f31d267c-8c64-4f2e-9e06-feb8af8737a7-0' usage_metadata={'input_tokens': 31765, 'output_tokens': 0, 'total_tokens': 34836, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 3071}}

llm = ChatGoogleGenerativeAI(
    model='gemini-2.5-flash',
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    },
    api_key=google_api_key,
    temperature=0.3, top_p=0.7, max_output_tokens=3072,
    timeout=40, max_retries=2,
)

def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"input": state["question"], "context": docs_content})
    try:
        start_time = time.time()
        response = llm.invoke(messages)  
        print(1)
        print(messages)
        print(2)
        print(response)  # returns empty text (finish_reason MAX_TOKENS)
        print(3)
        end_time = time.time()
        elapsed_time = end_time - start_time
        print(elapsed_time)
        print(llm.invoke('explain yourself in 500 words'))  # returns text as expected
        
    except Exception as e:
        print(f"Error: {e}")

    return {"answer": response.content}

Other info:
python 3.10
langchain 0.3.26
langchain-community 0.3.26
langchain-core 0.3.68
langchain-google-genai 2.1.6
google-ai-generativelanguage 0.6.18
langgraph 0.5.0

Same issue, using langchain-google-genai 2.1.7

My debug logging looks like this:

Generation[0][0] - text_length: 1149, generation_info: {'safety_ratings': [], 'finish_reason': 'MAX_TOKENS', 'model_name': 'gemini-2.5-flash'}

Try thinking_budget=0 and include_thoughts=False with version 2.1.8; a sketch is below.

It looks like that works better.
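
In code, that maps onto the ChatGoogleGenerativeAI constructor (a minimal sketch; parameter names taken from the suggestion above, available from langchain-google-genai 2.1.8):

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    max_output_tokens=3072,
    # Keep reasoning tokens from eating the visible-output budget.
    thinking_budget=0,
    include_thoughts=False,
)

print(llm.invoke("explain yourself in 500 words").content)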