"finishReason" : "MAX_TOKENS" - But Text is Empty

The text here is empty, but the finish reason is MAX_TOKENS:

{
  "candidates" : [ {
    "content" : {
      "parts" : [ {
        "text" : ""
      } ],
      "role" : "model"
    },
    "finishReason" : "MAX_TOKENS",
    "index" : 0
  } ],
  "usageMetadata" : {
    "promptTokenCount" : 25090,
    "totalTokenCount" : 25090,
    "promptTokensDetails" : [ {
      "modality" : "TEXT",
      "tokenCount" : 25090
    } ]
  },
  "modelVersion" : "models/gemini-2.5-flash-preview-04-17"
}

I’ve set maxOutputTokens to 65536 in the request,

so why is it finishing with MAX_TOKENS when the total is only around 25K tokens?
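
For reference, a minimal sketch of how the request is configured (shown with the google-genai Python SDK; the ~25K-token prompt itself is a placeholder here):

from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

prompt_text = "<the ~25K-token prompt goes here>"

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents=prompt_text,
    config=types.GenerateContentConfig(max_output_tokens=65536),
)
print(repr(response.text))                    # comes back empty
print(response.candidates[0].finish_reason)   # MAX_TOKENS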

Hey @Yasar_Arafath
Welcome to the community!
We are aware of the issue. We will look into it and provide an update here.
Appreciate your patience!
Thank you!

Any updates, @Sangeetha_Jana?

@Yasar_Arafath
Thank you for your patience. We don’t have any updates to share at the moment. Rest assured, we will post any new information here as soon as it becomes available.
Thank you!

Facing a similar issue: the model starts spitting out the same token or the same series of tokens. In this case it’s whitespace that just repeats until it hits the limit, but I’ve also had base64 output start repeating.

chunk: candidates=[Candidate(content=Content(parts=[Part(video_metadata=None, thought=None, code_execution_result=None, executable_code=None, file_data=None, function_call=None, function_response=None, inline_data=None, text='                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             ')], role='model'), citation_metadata=None, finish_message=None, token_count=None, avg_logprobs=None, finish_reason=<FinishReason.MAX_TOKENS: 'MAX_TOKENS'>, grounding_metadata=None, index=0, logprobs_result=None, safety_ratings=None)] model_version='models/gemini-2.5-flash-preview-05-20' prompt_feedback=None usage_metadata=GenerateContentResponseUsageMetadata(cached_content_token_count=None, candidates_token_count=1698, prompt_token_count=4856, total_token_count=11364) automatic_function_calling_history=None parsed=None

@Sangeetha_Jana, any updates?

Hey Users,
The issue should be resolved now. Please let us know if you are still facing it.
Thank you!

Hi, I still get the same error using the Python SDK. This is the raw response; it shows "finish_reason": "MAX_TOKENS", which causes response.text to be None.

{
  "candidates": [
    {
      "content": {
        "parts": null,
        "role": "model"
      },
      "citation_metadata": null,
      "finish_message": null,
      "token_count": null,
      "finish_reason": "MAX_TOKENS",
      "url_context_metadata": null,
      "avg_logprobs": null,
      "grounding_metadata": null,
      "index": null,
      "logprobs_result": null,
      "safety_ratings": null
    }
  ],
  "create_time": "2025-06-05T03:24:28.726915Z",
  "response_id": "7A1BaIOvLOadz7sP3fbcqQk",
  "model_version": "gemini-2.5-pro-preview-03-25",
  "prompt_feedback": null,
  "usage_metadata": {
    "cache_tokens_details": null,
    "cached_content_token_count": null,
    "candidates_token_count": null,
    "candidates_tokens_details": null,
    "prompt_token_count": 2882,
    "prompt_tokens_details": [
      {
        "modality": "TEXT",
        "token_count": 2882
      }
    ],
    "thoughts_token_count": 1023,
    "tool_use_prompt_token_count": null,
    "tool_use_prompt_tokens_details": null,
    "total_token_count": 3905,
    "traffic_type": "ON_DEMAND"
  },
  "automatic_function_calling_history": [],
  "parsed": null
}
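
For what it’s worth, the error on my side comes from reading response.text directly; a minimal sketch of how I reproduce it and surface the finish reason instead of passing None around (client setup and prompt are placeholders):

from google import genai
from google.genai import types

client = genai.Client()  # API key taken from the environment
response = client.models.generate_content(
    model="gemini-2.5-pro-preview-03-25",
    contents="<prompt that triggers the empty completion>",
)

candidate = response.candidates[0]
if candidate.finish_reason == types.FinishReason.MAX_TOKENS and not candidate.content.parts:
    # parts is None, so response.text is None as well.
    raise RuntimeError(f"Empty completion, finish_reason={candidate.finish_reason}")
print(response.text)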

Hey @Mochammad_Zava_Abbiy
It would be great if you could share the code snippet along with the prompt so that I can replicate it.
Thank you!

Hi, this issue is still present. I’m using the following code with the new Google GenAI Python SDK:

from google.genai import types  # google-genai SDK types

gen_config = types.GenerateContentConfig(
    system_instruction=instruction,
    top_p=0.9,
    top_k=60,
    temperature=0.7,
    safety_settings=disabled_safety,
    response_mime_type="application/json",
    response_schema=groups_schema,
)

# The call goes through an application-level async wrapper around the SDK client.
groups_fragment_response = await gemini_client.generate_content(
    contents=json_str,
    genconfig=gen_config,
    gemini_model_type=self.gemini_model_type
)

I’ve tried disabling JSON mode and manually setting the max output tokens to 1,000,000, but I still get this error from time to time:

candidates=[Candidate(content=Content(parts=None, role='model'), citation_metadata=None, finish_message=None, token_count=None, finish_reason=<FinishReason.MAX_TOKENS: 'MAX_TOKENS'>, avg_logprobs=None, grounding_metadata=None, index=0, logprobs_result=None, safety_ratings=None)] create_time=None response_id=None model_version='models/gemini-2.5-flash-preview-05-20' prompt_feedback=None usage_metadata=GenerateContentResponseUsageMetadata(cache_tokens_details=None, cached_content_token_count=None, candidates_token_count=None, candidates_tokens_details=None, prompt_token_count=194494, prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=194494)], thoughts_token_count=65535, tool_use_prompt_token_count=None, tool_use_prompt_tokens_details=None, total_token_count=260029, traffic_type=None) automatic_function_calling_history=[] parsed=None
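
Worth noting: thoughts_token_count in that trace is 65535, which is exactly the output cap, so the thinking tokens alone appear to exhaust the budget before any visible text is produced. A minimal sketch of capping the thinking budget with the plain google-genai client (assuming the wrapper forwards GenerateContentConfig unchanged; the budget value is illustrative):

from google import genai
from google.genai import types

client = genai.Client()  # API key taken from the environment

gen_config = types.GenerateContentConfig(
    max_output_tokens=8192,
    response_mime_type="application/json",
    # Cap thinking so it cannot consume the whole output budget.
    thinking_config=types.ThinkingConfig(thinking_budget=1024),
)

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-05-20",
    contents=json_str,  # same payload as in the call above
    config=gen_config,
)
print(response.candidates[0].finish_reason, len(response.text or ""))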

I had a similar problem with LangChain. It works well with 2.0 Flash, but when I switch to 2.5 Flash, it returns the following:
content='' additional_kwargs={} response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'MAX_TOKENS', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []} id='run--f31d267c-8c64-4f2e-9e06-feb8af8737a7-0' usage_metadata={'input_tokens': 31765, 'output_tokens': 0, 'total_tokens': 34836, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 3071}}

llm = ChatGoogleGenerativeAI(
    model='gemini-2.5-flash',
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    },
    api_key=google_api_key,
    temperature=0.3, top_p=0.7, max_output_tokens=3072,
    timeout=40, max_retries=2,
)

def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"input": state["question"], "context": docs_content})
    try:
        start_time = time.time()
        response = llm.invoke(messages)  
        print(1)
        print(messages)
        print(2)
        print(response)  # returns empty text (finish_reason MAX_TOKENS)
        print(3)
        end_time = time.time()
        elapsed_time = end_time - start_time
        print(elapsed_time)
        print(llm.invoke('explain yourself in 500 words'))  # returns text as expected
        
    except Exception as e:
        print(f"Error: {e}")

    return {"answer": response.content}

Other info:
python 3.10
langchain 0.3.26
langchain-community 0.3.26
langchain-core 0.3.68
langchain-google-genai 2.1.6
google-ai-generativelanguage 0.6.18
langgraph 0.5.0

Same issue, using langchain-google-genai 2.1.7

My debug logging looks like this:

Generation[0][0] - text_length: 1149, generation_info: {'safety_ratings': [], 'finish_reason': 'MAX_TOKENS', 'model_name': 'gemini-2.5-flash'}

Try thinking_budget=0 and include_thoughts=False with version 2.1.8; a sketch is below.

It looks like that works better.
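
In code, that maps onto the ChatGoogleGenerativeAI constructor (a minimal sketch; parameter names taken from the suggestion above, available from langchain-google-genai 2.1.8):

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    max_output_tokens=3072,
    # Keep reasoning tokens from eating the visible-output budget.
    thinking_budget=0,
    include_thoughts=False,
)

print(llm.invoke("explain yourself in 500 words").content)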