Now the disaster continues…
One finding is that the problem appears, intensifies, or recovers right after the quota reset time (12:00 PM Pacific Time). This holds for users in different regions: it always triggers at 12:00 PM Pacific Time.
I’m providing a crucial update to my previous posts regarding persistent issues with gemini-2.5-pro. My apologies for any confusion in my earlier descriptions; with new, precise logs, the situation is now unequivocally clear and represents a critical regression.
Original Problem (confirmed still active):
When attempting to use gemini-2.5-pro with a system_instruction parameter (even a very short one: the 126-character prompt from "Ассистент.txt", as shown in the log below), the model consistently returns a GenerateContentResponse object where:
finish_reason is "STOP".content for the candidate is present ("content": { "role": "model" }), but it contains no parts array with actual generated text.Workaround Failure (new critical issue):
My application previously employed a workaround for this bug: injecting the system prompt directly into the contents (message history) as a dummy user/model message pair, thus bypassing the system_instruction parameter. This workaround was effective for gemini-2.5-pro until recently.
However, this workaround now also results in the exact same “empty response” behavior: finish_reason: "STOP" with no content.parts in the GenerateContentResponse. This indicates that a change on Google’s side has effectively neutralized the workaround.
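For context, the workaround pattern looked roughly like this (a minimal sketch, assuming the google-generativeai Python SDK; the dummy acknowledgement text is illustrative):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Workaround: instead of the system_instruction parameter, prepend the
# system prompt to the history as a dummy user/model exchange.
system_prompt = "Ты умный и честный ассистент. ..."

model = genai.GenerativeModel("gemini-2.5-pro")  # no system_instruction
contents = [
    {"role": "user", "parts": [system_prompt]},  # system prompt as a user turn
    {"role": "model", "parts": ["OK."]},         # dummy model acknowledgement
    {"role": "user", "parts": ["Привет!"]},      # the actual user message
]
response = model.generate_content(contents)
```

Until recently this reliably produced normal text responses; now it hits the same empty-response path as system_instruction.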
Summary of gemini-2.5-pro’s current state (for my setup):
- Using system_instruction: leads to an empty response (original bug, confirmed not fixed).
- Using the workaround (system prompt injected into contents): now also leads to an empty response (new regression / workaround neutralization).

In other words, there is currently no way to use gemini-2.5-pro with a system persona or initial context and receive any meaningful text response. This renders gemini-2.5-pro completely unusable for conversational AI tasks requiring a system prompt, and it is a critical blocking regression for applications relying on gemini-2.5-pro with system prompts.
Could the Google AI team please urgently investigate this? We need a clear, reliable, and functional method to provide system instructions to gemini-2.5-pro. Is this behavior intentional, or is it a known service degradation? If intentional, what is the new recommended pattern for robust system prompting?
Here is a log from a run with gemini-2.5-pro using a short system prompt (“Ассистент.txt”, 126 characters) and with the workaround DISABLED, clearly showing the finish_reason: "STOP" with no generated parts:
--- DEBUG: Starting streaming response for Gemini model 'gemini-2.5-pro' ---
DEBUG: GeminiProvider.get_chat_response_stream: Config: {
"api_key": "XXXXXXXXXXXXXXXXXXXX",
"model": "gemini-2.5-pro",
"temperature": 1.0,
"top_p": 0.95,
"selected_prompt": "\u0410\u0441\u0441\u0438\u0441\u0442\u0435\u043d\u0442.txt"
}
DEBUG: GeminiProvider.get_chat_response_stream: Raw messages count: 1
DEBUG: GeminiProvider.get_chat_response_stream: System Prompt: 'Ты умный и честный ассистент.
Отвечай чётко, подробно и по теме.
Отвечай только на основе фактов.
Не выдумывай ничего от себя....' (Length: 126)
DEBUG: GeminiProvider._prepare_messages: Preparing 1 raw messages for Gemini API.
DEBUG: GeminiProvider._prepare_messages: Message 0 part 0 is text.
DEBUG: GeminiProvider._prepare_messages: Finished. Prepared 1 messages for Gemini API.
DEBUG: GeminiProvider.get_chat_response_stream: Prepared messages count: 1. First message: {
"role": "user",
"parts": [
"\u041f\u0440\u0438\u0432\u0435\u0442!"
]
}
DEBUG: GeminiProvider.get_chat_response_stream: Generation Config: GenerationConfig(candidate_count=None, stop_sequences=None, max_output_tokens=None, temperature=1.0, top_p=0.95, top_k=None, response_mime_type=None, response_schema=None, presence_penalty=None, frequency_penalty=None)
DEBUG: GeminiProvider.get_chat_response_stream: Safety Settings: {'HARM_CATEGORY_HARASSMENT': 'BLOCK_NONE', 'HARM_CATEGORY_HATE_SPEECH': 'BLOCK_NONE', 'HARM_CATEGORY_SEXUALLY_EXPLICIT': 'BLOCK_NONE', 'HARM_CATEGORY_DANGEROUS_CONTENT': 'BLOCK_NONE'}
DEBUG: GeminiProvider.get_chat_response_stream: Calling model.generate_content(stream=True)...
DEBUG: GeminiProvider.get_chat_response_stream: generate_content (streaming) call initiated. Iterating through chunks...
DEBUG: GeminiProvider.get_chat_response_stream: Received raw chunk 0: response:
GenerateContentResponse(
done=True,
iterator=None,
result=protos.GenerateContentResponse({
"candidates": [
{
"content": {
"role": "model"
},
"finish_reason": "STOP",
"index": 0
}
],
"usage_metadata": {
"prompt_token_count": 46,
"total_token_count": 46
},
"model_version": "gemini-2.5-pro"
}),
)
DEBUG: GeminiProvider.get_chat_response_stream: Chunk 0 has no parts. Skipping. Full chunk: response:...
DEBUG: GeminiProvider.get_chat_response_stream: Chunk 0 candidate finish reason: STOP
DEBUG: GeminiProvider.get_chat_response_stream: Chunk 0 candidate safety ratings: []
--- GEMINI WARNING: The model did not return a response. Finish reason: STOP (handled). ---
DEBUG: GeminiProvider.get_chat_response_stream: Streaming response iteration completed.
--- DEBUG: Finished streaming response for Gemini model 'gemini-2.5-pro' ---
Attempting to generate a title using model: gemini-2.5-flash-lite
DEBUG: GeminiProvider.__init__: Gemini API configured successfully.
--- DEBUG: Starting non-streaming response for Gemini model 'gemini-2.5-flash-lite' ---
DEBUG: GeminiProvider.get_chat_response: Config: {
"model": "gemini-2.5-flash-lite",
"temperature": 1.0,
"top_p": 0.95,
"max_output_tokens": 150
}
DEBUG: GeminiProvider.get_chat_response: Raw messages count: 1
DEBUG: GeminiProvider.get_chat_response: System Prompt: None
DEBUG: GeminiProvider._prepare_messages: Preparing 1 raw messages for Gemini API.
DEBUG: GeminiProvider._prepare_messages: Message 0 part 0 is text.
DEBUG: GeminiProvider._prepare_messages: Finished. Prepared 1 messages for Gemini API.
DEBUG: GeminiProvider.get_chat_response: Prepared messages count: 1. First message: {
"role": "user",
"parts": [
"\u0421\u043e\u0437\u0434\u0430\u0439 \u043e\u0447\u0435\u043d\u044c \u043a\u043e\u0440\u043e\u0442\u043a\u043e\u0435, \u043b\u0430\u043a\u043e\u043d\u0438\u0447\u043d\u043e\u0435 \u043d\u0430\u0437\u0432\u0430\u043d\u0438\u0435 \u0434\u043b\u044f \u0447\u0430\u0442\u0430 (3-5 \u0441\u043b\u043e\u0432), \u043e\u0441\u043d\u043e\u0432\u0430\u043d\u043d\u043e\u0435 \u043d\u0430 \u0441\u043b\u0435\u0434\u0443\u044e\u0449\u0435\u043c \u0437\u0430\u043f\u0440\u043e\u0441\u0435 \u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u0435\u043b\u044f. \u041e\u0442\u0432\u0435\u0442\u044c \u0422\u041e\u041b\u042C\u041a\u041e \u043d\u0430\u0437\u0432\u0430\u043d\u0438\u0435\u043c, \u0431\u0435\u0437 \u043a\u0430\u0432\u044b\u0447\u0435\u043a \u0438 \u043b\u0438\u0448\u043d\u0438\u0445 \u0441\u043b\u043e\u0432.\n\n\u0417\u0410\u041f\u0420\u041e\u0421 \u041f\u041e\u041b\u042C\u0417\u041e\u0412\u0410\u0422\u0415\u041b\u042F: \"\u041f\u0440\u0438\u0432\u0435\u0442!\"\n\u0422\u0412\u041e\u0419 \u041e\u0422\u0412\u0415\u0422:"
]
}
DEBUG: GeminiProvider.get_chat_response: Generation Config: GenerationConfig(candidate_count=None, stop_sequences=None, max_output_tokens=None, temperature=1.0, top_p=0.95, top_k=None, response_mime_type=None, response_schema=None, presence_penalty=None, frequency_penalty=None)
DEBUG: GeminiProvider.get_chat_response: Safety Settings: {'HARM_CATEGORY_HARASSMENT': 'BLOCK_NONE', 'HARM_CATEGORY_HATE_SPEECH': 'BLOCK_NONE', 'HARM_CATEGORY_SEXUALLY_EXPLICIT': 'BLOCK_NONE', 'HARM_CATEGORY_DANGEROUS_CONTENT': 'BLOCK_NONE'}
DEBUG: GeminiProvider.get_chat_response: Request Options: {'timeout': 120}
DEBUG: GeminiProvider.get_chat_response: Calling model.generate_content(stream=False)...
DEBUG: GeminiProvider.get_chat_response: generate_content (non-streaming) call completed. Full response object: response:
GenerateContentResponse(
done=True,
iterator=None,
result=protos.GenerateContentResponse({
"candidates": [
{
"content": {
"parts": [
{
"text": "\u041f\u0440\u0438\u0432\u0435\u0442\u0441\u0442\u0432\u0438\u0435"
}
],
"role": "model"
},
"finish_reason": "STOP",
"index": 0
}
],
"usage_metadata": {
"prompt_token_count": 70,
"candidates_token_count": 2,
"total_token_count": 72
},
"model_version": "gemini-2.5-flash-lite"
}),
)
DEBUG: GeminiProvider.get_chat_response: Received non-empty response. Text length: 11. Full response text: 'Приветствие...'
--- DEBUG: Finished non-streaming response for Gemini model 'gemini-2.5-flash-lite' ---
Title generated successfully: 'Приветствие'
Chat saved to D:\DATA\FletChat Development V2 (New TTS)\chats\Приветствие.json
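For anyone comparing against their own handler: the "Chunk 0 has no parts. Skipping." lines above come from a defensive check along these lines (a simplified sketch, not my exact code; accessing chunk.text directly raises when a candidate has no parts):

```python
def stream_text(model, contents):
    """Yield text from a streaming call, skipping candidates that arrive
    with finish_reason STOP but no parts (the failure mode shown above)."""
    for chunk in model.generate_content(contents, stream=True):
        for candidate in chunk.candidates:
            if not candidate.content.parts:
                continue  # empty candidate: nothing was generated
            for part in candidate.content.parts:
                if part.text:
                    yield part.text
```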
Here is the latest log, showing gemini-2.5-pro failing with an immediate StopIteration even when system_prompt is explicitly None:
--- DEBUG: Starting streaming response for Gemini model 'gemini-2.5-pro' ---
DEBUG: GeminiProvider.get_chat_response_stream: Config: {
"api_key": "ХХХХХХХХХ",
"model": "gemini-2.5-pro",
"temperature": 1.0,
"top_p": 0.95,
"selected_prompt": "\u0414\u0430\u0448\u0430.txt"
}
DEBUG: GeminiProvider.get_chat_response_stream: Raw messages count: 1
DEBUG: GeminiProvider.get_chat_response_stream: System Prompt: None
DEBUG: GeminiProvider._prepare_messages: Preparing 1 raw messages for Gemini API.
DEBUG: GeminiProvider._prepare_messages: Message 0 part 0 is text.
DEBUG: GeminiProvider._prepare_messages: Finished. Prepared 1 messages for Gemini API.
DEBUG: GeminiProvider.get_chat_response_stream: Prepared messages count: 1. First message: {
"role": "user",
"parts": [
"\u041f\u0440\u0438\u0432\u0435\u0442!"
]
}
DEBUG: GeminiProvider.get_chat_response_stream: Generation Config: GenerationConfig(candidate_count=None, stop_sequences=None, max_output_tokens=None, temperature=1.0, top_p=0.95, top_k=None, response_mime_type=None, response_schema=None, presence_penalty=None, frequency_penalty=None)
DEBUG: GeminiProvider.get_chat_response_stream: Safety Settings: {'HARM_CATEGORY_HARASSMENT': 'BLOCK_NONE', 'HARM_CATEGORY_HATE_SPEECH': 'BLOCK_NONE', 'HARM_CATEGORY_SEXUALLY_EXPLICIT': 'BLOCK_NONE', 'HARM_CATEGORY_DANGEROUS_CONTENT': 'BLOCK_NONE'}
DEBUG: GeminiProvider.get_chat_response_stream: Calling model.generate_content(stream=True)...
--- CRITICAL ERROR INSIDE GEMINI API CLIENT (streaming) ---
Traceback (most recent call last):
File "D:\DATA\FletChat Development V2 (New TTS)\ai_client.py", line 130, in get_chat_response_stream
response_stream = model.generate_content(
^^^^^^^^^^^^^^^^^^^^^^^
File "D:\DATA\FletChat Development V2 (New TTS)\python_embeded\Lib\site-packages\google\generativeai\generative_models.py", line 329, in generate_content
return generation_types.GenerateContentResponse.from_iterator(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\DATA\FletChat Development V2 (New TTS)\python_embeded\Lib\site-packages\google\generativeai\types\generation_types.py", line 634, in from_iterator
response = next(iterator)
^^^^^^^^^^^^^^
File "D:\DATA\FletChat Development V2 (New TTS)\python_embeded\Lib\site-packages\google\api_core\grpc_helpers.py", line 116, in __next__
return next(self._wrapped)
^^^^^^^^^^^^^^^^^^^
File "D:\DATA\FletChat Development V2 (New TTS)\python_embeded\Lib\site-packages\grpc\_channel.py", line 543, in __next__
return self._next()
^^^^^^^^^^^^
File "D:\DATA\FletChat Development V2 (New TTS)\python_embeded\Lib\site-packages\grpc\_channel.py", line 950, in _next
raise StopIteration()
StopIteration
Error type: StopIteration, Message:
--- DEBUG: Finished streaming response for Gemini model 'gemini-2.5-pro' ---
Provider error: Gemini API error (streaming):
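Until this is addressed, I wrapped the streaming call so this low-level StopIteration surfaces as a normal, catchable error (a minimal sketch, assuming the same google-generativeai SDK; the error message is illustrative):

```python
def safe_stream(model, contents):
    """Delegate to the streaming API, converting the bare StopIteration
    that the gRPC layer raises when the server closes the stream before
    sending any chunk (the failure in the log above)."""
    try:
        stream = model.generate_content(contents, stream=True)
    except StopIteration:
        raise RuntimeError("Gemini returned an empty stream") from None
    yield from stream
```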
I kindly ask Google representatives to answer.
What is the specific reason why the model does not work?
Should we expect improvements or look for a replacement and revise our implementation plans?
We’re seeing this constantly too. I had to put a retry loop around every API call that checks whether the response is empty and keeps resubmitting the request until it isn’t. This is very clearly broken and really needs more attention.
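Roughly what that retry wrapper looks like (a sketch; the attempt limit and backoff values are illustrative, not a recommendation):

```python
import time

def generate_with_retry(model, contents, max_attempts=5, backoff=2.0):
    """Resubmit the request until the response actually contains parts."""
    for attempt in range(1, max_attempts + 1):
        response = model.generate_content(contents)
        if response.candidates and response.candidates[0].content.parts:
            return response
        # Empty response (finish_reason STOP, no parts): wait and retry.
        time.sleep(backoff * attempt)
    raise RuntimeError(f"Empty response after {max_attempts} attempts")
```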
I have a feeling they’re only doing this for free tier accounts.
The situation is exactly the same for projects linked to billing.
It seems like the problem is completely random: everyone is suggesting different solutions, from trimming system prompts to disabling function calls, et cetera.
The only guaranteed-to-work option here is to migrate to Vertex AI or something like OpenRouter. We did, and we are now getting not only zero errors but also higher TPS on the model.
I’ve been having this problem for about two months. Moving to the Vertex API fixed it, even though I had to rewrite my codebase, since Vertex AI doesn’t support some features like the File API.
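For anyone attempting the same migration, the basic Vertex AI setup looks roughly like this (a sketch; the project and location values are placeholders, and file inputs go through Cloud Storage URIs rather than the File API):

```python
import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholders: substitute your own GCP project and region.
vertexai.init(project="my-gcp-project", location="us-central1")

model = GenerativeModel(
    "gemini-2.5-pro",
    system_instruction="You are a smart and honest assistant.",
)
response = model.generate_content("Привет!")
print(response.text)
```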
Yep, I can confirm this too.
The Gemini 2.5 Pro model is returning None.
This issue used to occur before, but within three retries the model would return output. Today I’ve noticed the model returns None consistently, even after several retries (5-6 attempts).
model usage {'prompt_token_count': 17083, 'candidates_token_count': None, 'total_token_count': 17157}
[get_niosh_inputs] attempt 1: got non-string raw=None, retrying…
video mime type: video/mp4
model usage {'prompt_token_count': 17083, 'candidates_token_count': None, 'total_token_count': 17161}
attempt 2: got non-string raw=None, retrying…
Switching to the Vertex API is not a solution for everyone.
The application I’m working on assumes the user brings their own API key. And if Google says it provides free quotas for using its models, then the end user should decide for themselves whether to connect their key to billing or not.
Connecting via the Vertex API is such a hassle for the average user that I simply don’t consider it.
And let’s not forget that free quotas are also important for the average developer.
100% agree with this. My application also requires users to work with their own keys, free or paid. Going through the Vertex API doesn’t work for me either.
I tried switching to Vertex and hit the exact same issue. Is there something that must be done differently to avoid it?
The fact that the Vertex API has the same problems suggests that for 2.5 Pro they are intentionally limiting traffic for some regions and accounts.
I have no other explanation.
And the fact that Google representatives are silent only adds fuel to this theory.
I ran into this issue a few days ago and found this thread.
Any updates? Or are the responses from Gemini’s engineering team also empty or truncated?
No, still no updates. Really disappointing.
It’s been completely broken for the past two weeks. I’ve been using the Gemini API with Roo and Cline as my main workflow since 2.5 was released, but lately it always fails with this error:
“Unexpected API Response: The language model did not provide any assistant messages. This may indicate an issue with the API or the model’s output.”
I’ll ask here, because in my opinion it’s related.
Has the free quota for Gemini 2.5 Pro been halved for all users?
Please look into this issue, @Logan_Kilpatrick. A lot of people are reporting it.