Speed-bug with flash models (?)

Truans · May 20, 2026, 4:51am

This topic has been bothering me for a long time, ever since the release of 3.0 Flash, but with the release of 3.5 Flash it began to worry me even more.
3.1 Pro and 3.1 Flash Lite deliver the first thinking token with minimal latency (10-20 seconds for Pro, 2-5 seconds for Flash Lite), even with a huge context. But 3.0 Flash and 3.5 Flash specifically just freeze for 3-5 minutes in the playground before the “thinking” tool even appears.
I haven’t tested it on smaller contexts, but with 3.0 and 3.5 Flash the delay is about 3 minutes at a context of 400,000 tokens and about 5-6 minutes at 650,000.
Has anyone else encountered this? Are there any solutions other than the obvious “start a new chat”? Is this how it’s supposed to be, or is it a bug? I’ve heard that API users on other services don’t encounter this, so it seems to me that this is a problem with the platform itself.

Truans · May 20, 2026, 4:52am

However, when I transfer the same context to the Gemini app itself, I don’t encounter anything similar.

Mahesh_Sutar · May 27, 2026, 7:38am

Hello @Truans,
Would it be possible to share the prompt and any attached files you’re working with? A Google Drive link or similar would work great.
