I’m experiencing a persistent issue with Gemini 2.5 Pro where the API returns HTTP 200 OK responses with finishReason: "STOP", but the content.parts array is completely missing, leaving no usable output.
Problem Details:
- Model: gemini-2.5-pro
- SDK: @google/genai v1.7.0
- Frequency: very frequent (70-80% of requests) for almost 2 weeks now
- Context: multi-modal requests with documents + text prompts
Example Response:
{
  "candidates": [
    {
      "content": { "role": "model" },
      "finishReason": "STOP",
      "index": 0,
      "safetyRatings": [
        { "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "NEGLIGIBLE" },
        { "category": "HARM_CATEGORY_HATE_SPEECH", "probability": "NEGLIGIBLE" },
        { "category": "HARM_CATEGORY_HARASSMENT", "probability": "NEGLIGIBLE" },
        { "category": "HARM_CATEGORY_DANGEROUS_CONTENT", "probability": "NEGLIGIBLE" }
      ]
    }
  ],
  "modelVersion": "gemini-2.5-pro",
  "usageMetadata": {
    "promptTokenCount": 12875,
    "totalTokenCount": 12888,
    "promptTokensDetails": [
      { "modality": "TEXT", "tokenCount": 5651 },
      { "modality": "DOCUMENT", "tokenCount": 7224 }
    ],
    "thoughtsTokenCount": 13
  }
}
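For reference, this is roughly how we detect the failure mode in our code. A minimal sketch assuming the @google/genai v1.x types; `isEmptyStopResponse` is a hypothetical helper name, not an SDK function:

```ts
import { FinishReason, GenerateContentResponse } from "@google/genai";

// Hypothetical helper: detects the case above, where the call "succeeds"
// (finishReason STOP) but the candidate carries no content parts at all.
function isEmptyStopResponse(response: GenerateContentResponse): boolean {
  const candidate = response.candidates?.[0];
  return (
    candidate?.finishReason === FinishReason.STOP &&
    (candidate.content?.parts?.length ?? 0) === 0
  );
}
```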
Observations:
- Safety ratings are all "NEGLIGIBLE", so it’s not a content-filtering issue
- finishReason is "STOP", indicating normal completion, not truncation
- Token usage looks normal: around 12k tokens, well within limits
- thoughtsTokenCount is present (though very low at 13), so the model is "thinking" but not emitting output
- The same prompt works occasionally, suggesting an intermittent issue rather than a prompt problem
What I’ve tried:
- Disabled all safety settings (BLOCK_NONE)
- Adjusted thinkingConfig with different thinkingBudget values (0, -1, 1000)
- Modified generation parameters (temperature, topP, topK)
- Set an explicit maxOutputTokens
- Tested the same prompts in Google AI Studio; they work inconsistently there too, sometimes failing with "You’ve reached your rate limit. Please try again later." even though I’m on Paid Tier 1
Request Configuration:
temperature: 0.3,
topP: 0.95,
topK: 40,
candidateCount: 1,
safetySettings: [/* all set to BLOCK_NONE */]
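In code, the failing request looks roughly like this. A sketch against @google/genai v1.x; `base64Pdf` and the prompt text are placeholders standing in for our real document content:

```ts
import { GoogleGenAI, HarmCategory, HarmBlockThreshold } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-2.5-pro",
  contents: [
    {
      role: "user",
      parts: [
        // Placeholder document; the real requests attach various PDFs.
        { inlineData: { mimeType: "application/pdf", data: base64Pdf } },
        { text: "Summarize the attached document." }, // placeholder prompt
      ],
    },
  ],
  config: {
    temperature: 0.3,
    topP: 0.95,
    topK: 40,
    candidateCount: 1,
    maxOutputTokens: 8192, // also tried without an explicit cap
    thinkingConfig: { thinkingBudget: -1 }, // also tried 0 and 1000
    safetySettings: [
      {
        category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold: HarmBlockThreshold.BLOCK_NONE,
      },
      // ...same BLOCK_NONE threshold for the other three categories
    ],
  },
});
```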
This issue has been happening consistently for about 2 weeks across different types of content and different prompts. The same prompts work fine with gemini-2.5-flash, but we need the reasoning capabilities of the Pro model.
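In the meantime, the only mitigation I can think of is a retry-then-fallback wrapper along these lines. A sketch only: `generateWithFallback` is a hypothetical helper that reuses `isEmptyStopResponse` from above:

```ts
import { GoogleGenAI, GenerateContentParameters } from "@google/genai";

// Hypothetical stopgap: try 2.5-pro twice, then fall back to 2.5-flash.
async function generateWithFallback(
  ai: GoogleGenAI,
  params: Omit<GenerateContentParameters, "model">,
) {
  for (const model of ["gemini-2.5-pro", "gemini-2.5-pro", "gemini-2.5-flash"]) {
    const response = await ai.models.generateContent({ ...params, model });
    if (!isEmptyStopResponse(response)) {
      return response; // got real parts back
    }
    await new Promise((resolve) => setTimeout(resolve, 1000)); // brief pause before retrying
  }
  throw new Error("All attempts returned an empty content.parts");
}
```

But silently downgrading to Flash defeats the purpose of paying for Pro’s reasoning, so this is not a real fix.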
Is this a known issue with Gemini 2.5 Pro? Are there any recommended workarounds or configurations that might help ensure consistent content generation?
UPDATE:
From the latest posts on this forum, it’s clear that this is a widespread issue, and one that keeps getting worse by the day. I want to stress that, like many others, we run production software that depends on AI models such as 2.5 Pro. It’s frankly unacceptable that Google’s most stable model has been inconsistent for weeks, when issues of this kind should be resolved within hours at most. The situation is directly affecting our paying customers, who are unable to access the services they’ve purchased.