Gemini 3.0 Pro TTFT issue(?)

I’m building a project with Gemini 3 (Gemini API, not on Vertex), and Time To First Token (TTFT) seems effectively infinite in most cases.

My 60s streaming timeout (no chunks received within 60s) hits on about 2/3 of all requests. Removing the timeout just leaves the request hanging indefinitely, never reaching the first token.

Is that really normal with 3 Pro? Am I just not patient enough to receive a single thought summary within 60s? To be clear, this was never an issue with 2.5 Pro, where a 60s timeout was fine.


Hi, this sounds very similar to what we’ve found.

Some requests to the Generative Language API for Gemini 3 Pro Preview hang and never return. Not all of them; many go through correctly.

We started noticing it on Monday. We had been using the same API last week without any problems.

In addition, it often never hits the timeout we’ve set for it. One hypothesis is that the streamed response keeps arriving in fragments, and each fragment restarts the timeout.

Hi @reid @Timotei_Molnar
Gemini 3 Pro uses dynamic thinking by default to reason through prompts. You can use the thinking_level parameter, which controls the maximum depth of the model’s internal reasoning process before it produces a response. Gemini 3 treats these levels as relative allowances for thinking rather than strict token guarantees.

If thinking_level is not specified, Gemini 3 Pro will default to high. For faster, lower-latency responses when complex reasoning isn’t required, you can constrain the model’s thinking level to low.
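For reference, here is a minimal sketch of setting thinking_level, assuming the google-genai Python SDK (`pip install google-genai`) and a `GEMINI_API_KEY` in the environment; the prompt and model string are placeholders:

```python
# Sketch: constraining thinking depth with the google-genai Python SDK.
# Assumes GEMINI_API_KEY is set in the environment.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Summarize the plot of Hamlet in two sentences.",
    config=types.GenerateContentConfig(
        # Gemini 3 Pro defaults to "high"; "low" trades reasoning
        # depth for lower latency.
        thinking_config=types.ThinkingConfig(thinking_level="low"),
    ),
)
print(response.text)
```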
Thanks

Hi, thank you for your reply.

My timeout logic already accounts for thinking: the timeout resets as soon as any streaming chunk is received, so the thinking level doesn’t matter.

The issue here is that I don’t receive any chunks within 60s (or 120s, or 300s…). Changing the thinking level does not solve the issue; I’ve already tested that.
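For clarity, this is the kind of per-chunk (idle) timeout I mean: the clock resets on every chunk, so slow total generation is fine as long as something keeps arriving. A self-contained sketch with a fake stream; the function names are illustrative, not from any SDK:

```python
# Per-chunk idle timeout: raise asyncio.TimeoutError only if no chunk
# arrives within `idle_timeout` seconds of the previous one.
import asyncio


async def consume_with_idle_timeout(stream, idle_timeout=60.0):
    """Collect chunks from an async stream, resetting the clock per chunk."""
    chunks = []
    it = stream.__aiter__()
    while True:
        try:
            chunk = await asyncio.wait_for(it.__anext__(), timeout=idle_timeout)
        except StopAsyncIteration:
            return chunks
        chunks.append(chunk)


# Demo with a fake stream that emits three chunks quickly.
async def fake_stream():
    for part in ["thinking…", "Hello", " world"]:
        await asyncio.sleep(0.01)
        yield part


print(asyncio.run(consume_with_idle_timeout(fake_stream(), idle_timeout=1.0)))
# → ['thinking…', 'Hello', ' world']
```

With the real API you would pass the SDK’s streaming iterator instead of `fake_stream()`; a stalled stream then raises `asyncio.TimeoutError` after `idle_timeout` seconds of silence.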

Hi @Pannaga_J,

Appreciate your reply! In our case, we’re aware of how the thinking_level parameter works and don’t think that’s the problem.

We’re talking about API calls that don’t return anything for more than 10 minutes, even when the requests are simple. But with LLMs you never know.

The additional problem is that our timeout was set to 5 minutes but was never triggered: even when the API doesn’t return an answer, something restarts the timer (our hypothesis is that streaming keeps resetting it).

ATM we’re bypassing the problem by hardening the timeout criteria: we force the request to abort 5 minutes after it was sent, no matter what, and simply send a new one. But I do believe this is still a problem with the API.
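The hard-deadline workaround can be sketched like this: cap the whole streamed request at an absolute wall-clock budget, regardless of whether chunks keep trickling in. Self-contained demo with a simulated stream; names are illustrative:

```python
# Absolute deadline: abort the entire stream `deadline` seconds after the
# request was sent, even if chunks are still arriving.
import asyncio


async def consume_with_deadline(stream, deadline=300.0):
    """Collect all chunks, or raise asyncio.TimeoutError at the deadline."""
    async def _drain():
        return [chunk async for chunk in stream]
    # wait_for cancels the drain task once the deadline passes.
    return await asyncio.wait_for(_drain(), timeout=deadline)


async def slow_stream(n_chunks, delay):
    for i in range(n_chunks):
        await asyncio.sleep(delay)
        yield f"chunk-{i}"


# A short stream finishes well within the budget:
print(asyncio.run(consume_with_deadline(slow_stream(3, 0.01), deadline=1.0)))
# → ['chunk-0', 'chunk-1', 'chunk-2']
```

Unlike a per-chunk idle timeout, this cuts off even a stream that emits chunks forever, which is exactly what makes it a workaround for the timer-reset behaviour described above.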


Hey all, I’m currently facing the same issue with Roocode. Whenever I make an API call through it, it hangs mid-response, with or without high-level reasoning, which is quite annoying. I am a paid-tier customer and I don’t understand why I can’t use a product I pay for over a basic API call.