Gemini 3.0 Flash latency spikes

Hi, my Gemini 3.0 Flash (paid tier) call latency isn’t behaving reliably.

I’m using the model via an HTTP request:
https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent

My generationConfig is: { temperature: 0.7, maxOutputTokens: 10000, thinkingConfig: { thinkingLevel: "MINIMAL" } }
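For reference, here is roughly how I build the request (a minimal sketch, not my exact production code; the `x-goog-api-key` header is from the public docs, and the system/user text is placeholder):

```python
import json

# Endpoint copied from my setup above.
API_URL = ("https://generativelanguage.googleapis.com/v1beta/models/"
           "gemini-3-flash-preview:generateContent")

# Request body with the generationConfig I described.
payload = {
    "system_instruction": {"parts": [{"text": "<system instructions here>"}]},
    "contents": [{"role": "user", "parts": [{"text": "<document + prompt here>"}]}],
    "generationConfig": {
        "temperature": 0.7,
        "maxOutputTokens": 10000,
        "thinkingConfig": {"thinkingLevel": "MINIMAL"},
    },
}

# Sent e.g. with the requests library (API_KEY is my key):
# resp = requests.post(API_URL, headers={"x-goog-api-key": API_KEY},
#                      json=payload, timeout=120)

print(json.dumps(payload["generationConfig"], sort_keys=True))
```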

I am using it for document processing, where each input is around 2,000 tokens, plus roughly 4,000 tokens of user prompt and system instructions.

Most of the time, processing takes 6 to 7 seconds for my inputs, but sometimes it spikes to around 6 minutes even when the input is the same.

I then changed the temperature to 1.0 and tested 10 inputs.
Processing stayed at the expected 6 to 7 seconds, but I’m worried the latency spike could happen again.
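The timing test above was essentially a loop like this (a sketch; `call_gemini` is a stand-in for the real HTTP request, not an actual library function):

```python
import time
import statistics

def call_gemini(text):
    # Placeholder for the real generateContent request shown earlier.
    time.sleep(0.01)

# Time 10 calls with the same kind of input and look at median vs. worst case.
latencies = []
for i in range(10):
    start = time.perf_counter()
    call_gemini(f"document {i}")
    latencies.append(time.perf_counter() - start)

print(f"median: {statistics.median(latencies):.2f}s  max: {max(latencies):.2f}s")
```

With temperature 1.0, the max stayed in the same 6 to 7 second range as the median, but 10 samples is a small run, so I can't tell if the spike is really gone.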

I’m using this in a production workflow where speed is important. Can I rely on it if I keep the temperature at 1.0? I’d prefer not to switch to Grok 4.1 Fast.
