Gemini 3.0 Pro TTFT issue(?)

I’m building a project with Gemini 3 (Gemini API, not on Vertex), and Time To First Token (TTFT) seems effectively infinite in most cases.

My 60s streaming timeout (no chunks received within 60s) hits on about 2/3 of all requests. Removing the timeout just leaves the request hanging indefinitely, never reaching the first token.

Is that really normal with 3 Pro? Am I just not patient enough to receive a single thought summary within 60s? To be clear, this was never an issue with 2.5 Pro, where a 60s timeout was fine.


Hi, this sounds very similar to what we’ve found.

Some requests to the Generative Language API for Gemini 3 Pro Preview hang and never return. Not all of them; many go through correctly.

We started noticing it on Monday. We had been using the same API last week without any problems.

In addition, it often never hits the timeout we’ve set for it. One hypothesis is that the streamed response keeps arriving in fragments, and each fragment restarts the timeout.

Hi @reid @Timotei_Molnar
Gemini 3 Pro uses dynamic thinking by default to reason through prompts. You can use the thinking_level parameter, which controls the maximum depth of the model’s internal reasoning process before it produces a response. Gemini 3 treats these levels as relative allowances for thinking rather than strict token guarantees.

If thinking_level is not specified, Gemini 3 Pro will default to high. For faster, lower-latency responses when complex reasoning isn’t required, you can constrain the model’s thinking level to low.
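For reference, here is a minimal sketch of setting thinking_level, assuming the google-genai Python SDK (`pip install google-genai`) and a `GEMINI_API_KEY` in the environment; the prompt and model string are placeholders:

```python
# Sketch: constraining thinking depth with the google-genai Python SDK.
# Assumes GEMINI_API_KEY is set in the environment.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Summarize the plot of Hamlet in two sentences.",
    config=types.GenerateContentConfig(
        # Gemini 3 Pro defaults to "high"; "low" trades reasoning
        # depth for lower latency.
        thinking_config=types.ThinkingConfig(thinking_level="low"),
    ),
)
print(response.text)
```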
Thanks

Hi, thank you for your reply.

My timeout logic already accounts for thinking: the timeout resets as soon as any streaming chunk is received, so the thinking level doesn’t matter.

The issue here is that I don’t receive any chunks within 60s (or 120s, or 300s…). Changing the thinking level does not solve the issue; I’ve already tested that.
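For clarity, this is the kind of per-chunk (idle) timeout I mean: the clock resets on every chunk, so slow total generation is fine as long as something keeps arriving. A self-contained sketch with a fake stream; the function names are illustrative, not from any SDK:

```python
# Per-chunk idle timeout: raise asyncio.TimeoutError only if no chunk
# arrives within `idle_timeout` seconds of the previous one.
import asyncio


async def consume_with_idle_timeout(stream, idle_timeout=60.0):
    """Collect chunks from an async stream, resetting the clock per chunk."""
    chunks = []
    it = stream.__aiter__()
    while True:
        try:
            chunk = await asyncio.wait_for(it.__anext__(), timeout=idle_timeout)
        except StopAsyncIteration:
            return chunks
        chunks.append(chunk)


# Demo with a fake stream that emits three chunks quickly.
async def fake_stream():
    for part in ["thinking…", "Hello", " world"]:
        await asyncio.sleep(0.01)
        yield part


print(asyncio.run(consume_with_idle_timeout(fake_stream(), idle_timeout=1.0)))
# → ['thinking…', 'Hello', ' world']
```

With the real API you would pass the SDK’s streaming iterator instead of `fake_stream()`; a stalled stream then raises `asyncio.TimeoutError` after `idle_timeout` seconds of silence.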

Hi @Pannaga_J,

Appreciate your reply! In our case, we’re aware of how the thinking_level parameter works and don’t think that’s the problem.

We’re talking about API calls that don’t return anything for more than 10 minutes, even when the requests are simple. But with LLMs you never know.

The additional problem is that our timeout was set to 5 minutes but was never triggered: even when the API doesn’t return an answer, something restarts the timer (our hypothesis is that streaming keeps resetting it).

ATM we’re bypassing the problem by hardening the timeout criteria: we force the request to abort 5 minutes after it was sent, no matter what, and simply send a new one. But I do believe this is still a problem with the API.
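The hard-deadline workaround can be sketched like this: cap the whole streamed request at an absolute wall-clock budget, regardless of whether chunks keep trickling in. Self-contained demo with a simulated stream; names are illustrative:

```python
# Absolute deadline: abort the entire stream `deadline` seconds after the
# request was sent, even if chunks are still arriving.
import asyncio


async def consume_with_deadline(stream, deadline=300.0):
    """Collect all chunks, or raise asyncio.TimeoutError at the deadline."""
    async def _drain():
        return [chunk async for chunk in stream]
    # wait_for cancels the drain task once the deadline passes.
    return await asyncio.wait_for(_drain(), timeout=deadline)


async def slow_stream(n_chunks, delay):
    for i in range(n_chunks):
        await asyncio.sleep(delay)
        yield f"chunk-{i}"


# A short stream finishes well within the budget:
print(asyncio.run(consume_with_deadline(slow_stream(3, 0.01), deadline=1.0)))
# → ['chunk-0', 'chunk-1', 'chunk-2']
```

Unlike a per-chunk idle timeout, this cuts off even a stream that emits chunks forever, which is exactly what makes it a workaround for the timer-reset behaviour described above.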


Hey all, I’m currently facing the same issue with Roocode. Whenever I make an API call through it, it hangs mid-response, with or without high-level reasoning, which is quite annoying. I am a paid-tier customer and I don’t understand why I can’t use a product I pay for over a basic API call.