We are experiencing significant latency issues with the gemini-2.5-pro model when using the generate_content endpoint.
Configuration:

genai_config = {
    'automatic_function_calling': {'disable': True},
    'tool_config': {'function_calling_config': {'mode': 'auto'}},
    'thinking_config': {'thinking_budget': -1, 'include_thoughts': True},
}
Example Call:

response = await client.aio.models.generate_content(
    model="gemini-2.5-pro",
    config=genai_config,
    contents=messages,
)
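To rule out client-side overhead, we time the await directly with a monotonic clock. A minimal, self-contained sketch of that timing wrapper (fake_generate_content here is a placeholder standing in for the real client.aio.models.generate_content call, so the pattern runs without the SDK):

```python
import asyncio
import time

async def timed(coro):
    """Await a coroutine and return (result, elapsed_seconds)."""
    start = time.monotonic()
    result = await coro
    return result, time.monotonic() - start

async def fake_generate_content():
    # Placeholder for client.aio.models.generate_content(...);
    # sleeps briefly instead of making a network call.
    await asyncio.sleep(0.1)
    return "response"

async def main():
    result, elapsed = await timed(fake_generate_content())
    print(f"elapsed: {elapsed:.2f}s")
    return elapsed

elapsed = asyncio.run(main())
```

The ~7-minute figure below was measured this way, so it reflects the full round trip as seen by our application.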
Observed Latency:
- Request duration: ~7 minutes
- Usage metadata: prompt_token_count: 3736, candidates_token_count: 132, thoughts_token_count: 150, total_token_count: 4018
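For context, the token counts make the throughput striking: only ~4k total tokens over ~7 minutes. A quick back-of-the-envelope check (assuming the ~7-minute duration above):

```python
# Throughput implied by the usage metadata above.
total_tokens = 4018
generated_tokens = 132 + 150  # candidates + thoughts
duration_s = 7 * 60           # ~7-minute request duration

overall_tps = total_tokens / duration_s        # ≈ 9.6 tokens/sec end to end
generated_tps = generated_tokens / duration_s  # ≈ 0.67 generated tokens/sec
print(overall_tps, generated_tps)
```

Under one generated token per second seems far below what this model normally delivers, which is why we suspect something beyond normal generation time.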
Dashboard Metrics:
- Average latency: ~35 seconds
- 99th percentile latency: 8 minutes
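As a stopgap, we are considering a client-side deadline so that tail requests fail fast and can be retried instead of hanging for minutes. A sketch using asyncio.wait_for (slow_request is a stand-in for the real generate_content call, and the 0.1 s timeout is illustrative only):

```python
import asyncio

async def slow_request():
    # Stand-in for the real generate_content call; simulates a
    # request stuck in the latency tail.
    await asyncio.sleep(10)
    return "response"

async def call_with_deadline(timeout_s: float):
    """Return the response, or None if the deadline is exceeded."""
    try:
        return await asyncio.wait_for(slow_request(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None  # caller can retry or surface the error

result = asyncio.run(call_with_deadline(0.1))
print(result)
```

This caps the worst case on our side, but it obviously doesn't address the underlying server-side latency.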
These latency issues have persisted for the past 3–4 days and are affecting nearly all requests at the tail end. Is this a known issue, and is any mitigation currently underway? Has anyone else observed similar behavior?