I’d like to chime in as well. My observation is that this seems to be related to sending similar/identical prompts in close succession, rather than a MAX_TOKENS issue. I suspect it might be due to how the system handles highly repetitive requests, perhaps involving caching or deduplication. My temporary solution is to modify the prompt prefix for each request, by adding a small, unique identifier (e.g., ‘Hello, I’m 01/02/03’) to avoid these failures.
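For anyone wanting to try this, here is a minimal sketch of the cache-busting idea. The prefix format is just an illustration (any short unique string should work), and the function name is mine, not from any SDK:

```python
import uuid

def add_unique_prefix(prompt: str) -> str:
    """Prepend a short unique marker so near-identical prompts are
    no longer byte-identical across requests."""
    return f"[req:{uuid.uuid4().hex[:8]}] {prompt}"

a = add_unique_prefix("Summarize this article.")
b = add_unique_prefix("Summarize this article.")
assert a != b  # same prompt text, but distinct request payloads
```

If the failure really is triggered by deduplication or caching of identical requests, this makes each payload unique while leaving the actual instruction unchanged.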
This bug is infuriating and ridiculous. I’m 100% sure they are not solving this, since it has been happening for MONTHS and basically makes it impossible to deploy a functional chat agent with Gemini 2.5.
This issue is still persistent, really unpredictable, sometimes 500 INTERNAL SERVER ERROR, sometimes empty content parts.
I have implemented retry with exponential backoff and jitter but still gemini-2.5-pro “Modified by moderator” to use in production.
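For anyone implementing the same thing, here is a sketch of the backoff timing I mean (full-jitter variant; the wrapper around the actual API call is omitted, and the defaults are just illustrative):

```python
import random

def backoff_delays(attempts: int, base: float = 1.0, cap: float = 32.0,
                   rng: random.Random = random) -> list[float]:
    """Full-jitter exponential backoff: delay i is drawn uniformly
    from [0, min(cap, base * 2**i)]."""
    return [rng.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]

delays = backoff_delays(5, rng=random.Random(0))
assert len(delays) == 5
assert all(0 <= d <= 32 for d in delays)
```

You would sleep for `delays[i]` before attempt `i + 1`; the jitter spreads retries out so a burst of clients doesn’t hammer the API in lockstep.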
It seems this issue happens when the Gemini API receives a blocked or finished message with empty content in the conversation history. Although this is a problem they should solve (it makes no sense for the API to choke on an empty message it generated itself), one possibility beyond exponential backoff is to remove any empty-content message with a block or finish reason before resending. I am trying this today, hoping it works…
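A sketch of that filtering step, assuming the conversation history is held as a list of plain dicts whose field names mirror the REST response shape (adjust to your SDK’s types):

```python
def prune_history(history: list[dict]) -> list[dict]:
    """Drop turns with no usable content, e.g. blocked or truncated
    model messages that came back with empty parts."""
    pruned = []
    for msg in history:
        parts = msg.get("parts", [])
        if any(p.get("text") for p in parts):
            pruned.append(msg)
    return pruned

history = [
    {"role": "user", "parts": [{"text": "Hi"}]},
    {"role": "model", "parts": []},  # e.g. a SAFETY/MAX_TOKENS turn
    {"role": "model", "parts": [{"text": "Hello!"}]},
]
assert len(prune_history(history)) == 2
```

Run this over the history before every send, so a failed turn never gets echoed back to the model.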
@Wing_zeng Does having a random prefix with each request reliably fix the issue for you?
I found this worked for me, thanks for the rec.
I also encountered the same issue these days. After testing, I found that setting the ‘thinking budget’ (Gemini 2.5 Pro) to 128 significantly reduces how often this happens. However, the generated content is still occasionally incomplete (at least it’s not empty).
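For reference, the budget is set in the request body’s `generationConfig` when calling the REST API directly; this sketch just builds the payload dict (in the Python SDK the equivalent is a `ThinkingConfig` on `GenerateContentConfig`):

```python
# Minimal generateContent request body; only the thinking-budget
# part is the point here, the prompt text is a placeholder.
payload = {
    "contents": [{"role": "user", "parts": [{"text": "..."}]}],
    "generationConfig": {
        "thinkingConfig": {"thinkingBudget": 128},  # cap thinking tokens
    },
}
```

A budget of 128 leaves almost all of the output allowance for the visible answer instead of internal reasoning, which is consistent with it reducing the empty-response rate.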
Been running something for a while. Increasingly empty responses with no error. All the time today. Great model, but api is too unreliable atm, yet status page is green.
Thanks, man! It was returning empty responses, but now I’m using unique identifiers (timestamps, request IDs, session IDs) and it is working perfectly. I think Google is trying to block spam/similar requests as well. Once again, it really helped.
is it accurate though?
so many failures. It is horrible to watch. I wonder whether AI can succeed.
There will most likely be some performance degradation, but that’s unavoidable. The 2.5 Pro API seems to be in a state of generating random stops right now, so the only thing I can do is try to minimize irrelevant output as much as possible.
Been using the same 2.5 Pro setup for weeks with no issues, but now I keep getting empty responses with finish_reason: STOP. My retry loop fails more often than not, even after adding exponential backoff. Tried using unique identifiers (timestamps, request IDs, session IDs) and the unique-prefix fix, but neither worked. Still burning money on failed calls.
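One thing that helped me debug this: don’t trust the finish reason alone, check whether any part actually contains text before counting an attempt as successful. A sketch against the raw response dict (field names mirror the REST shape, not any particular SDK):

```python
def needs_retry(resp: dict) -> bool:
    """True when the response carries no usable text, even if the
    finish reason looks benign (e.g. STOP with empty parts)."""
    for cand in resp.get("candidates", []):
        parts = cand.get("content", {}).get("parts", [])
        if any(p.get("text") for p in parts):
            return False  # got real content
    return True

empty_stop = {"candidates": [{"content": {"role": "model"},
                              "finishReason": "STOP"}]}
ok = {"candidates": [{"content": {"parts": [{"text": "hi"}]},
                      "finishReason": "STOP"}]}
assert needs_retry(empty_stop) and not needs_retry(ok)
```

This catches exactly the failure mode described here: `finishReason` is `STOP`, so a naive success check passes, but the content is empty.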
Same here, this problem only occurs when using Gemini 2.5 Pro.
Yes, same here. It suddenly started happening over the past few days. Last week it showed some signs of failure, but today it is not usable at all, even with retries and all the optimizations. I hope the team fixes the bug soon.
same for me, nothing seems to work, at least 50% of outputs are empty.
I have been seeing this issue on and off.
With structured output, I was seeing empty string, incomplete JSON and then full response for the same payload (in that order) for requests coming close to one after another.
With tools, I mostly saw response on the first request and subsequent requests for the same payload were returning either MALFORMED_FUNCTION_CALL or nothing in the content for the parts section.
As people mentioned here within the last 24 hours retries do not work as well as they did previously and there are more empty responses.
Same issue here with Gemini-2.5-Flash. Retrying the request in a loop doesn’t work either; even after 10 attempts I’m still getting the same blank response. I’m using structured outputs, not sure if that’s related.
Here’s the response object if that helps:
automatic_function_calling_history=[],
candidates=[
  Candidate(
    content=Content(
      role='model'
    ),
    finish_reason=<FinishReason.MAX_TOKENS: 'MAX_TOKENS'>
  ),
],
create_time=datetime.datetime(2025, 8, 14, 17, 40, 4, 803533, tzinfo=TzInfo(UTC)),
model_version='gemini-2.5-flash',
response_id='dB-eaM2FMfOY698PuuzLqA0',
sdk_http_response=HttpResponse(
  headers=<dict len=10>
),
usage_metadata=GenerateContentResponseUsageMetadata(
  prompt_token_count=3891,
  prompt_tokens_details=[
    ModalityTokenCount(
      modality=<MediaModality.TEXT: 'TEXT'>,
      token_count=3891
    ),
  ],
  thoughts_token_count=8191,
  total_token_count=12082,
  traffic_type=<TrafficType.ON_DEMAND: 'ON_DEMAND'>
)
The error is finish_reason=<FinishReason.MAX_TOKENS: 'MAX_TOKENS'>. Note there are thoughts_token_count=8191 but no candidate tokens at all.
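The usage_metadata above actually explains the empty candidate: thinking tokens count against the output allowance, so an 8191-token thinking run can leave essentially no room for the visible answer. A quick sanity check (the 8192 cap here is an assumption that happens to match these numbers):

```python
def tokens_left_for_answer(max_output_tokens: int, thoughts_tokens: int) -> int:
    """Visible output and thinking tokens share max_output_tokens, so a
    long thinking run can leave zero room for the actual answer."""
    return max(0, max_output_tokens - thoughts_tokens)

# Numbers from the response above: 8191 thinking tokens vs an assumed 8192 cap.
assert tokens_left_for_answer(8192, 8191) == 1  # effectively nothing left
```

So for this particular failure, raising max_output_tokens or lowering the thinking budget is the fix, rather than retrying.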
Free-tier keys now have a TPM limit of 250,000 tokens per minute; that may be the reason.
