I’m seeing very large latency variance with gemini-3.1-flash-lite-preview on identical/similar small classification requests via the direct Gemini API.
Config:
- generateContent
- thinkingConfig.thinkingLevel: "minimal"
- ~300-400 input tokens
- ~46-50 output tokens
- thoughtsTokenCount: 0
Results:
gp_none attempt 1: 20.27s input=306 output=47 total=353
gp_none attempt 2: 0.60s input=306 output=50 total=356
gp_none attempt 3: 0.64s input=306 output=47 total=353
gp_basic attempt 1: 0.68s input=384 output=46 total=430
gp_basic attempt 2: 0.63s input=384 output=46 total=430
gp_basic attempt 3: 15.25s input=384 output=46 total=430
In the AI Studio playground I consistently get about 0.6s, so that seems to be the true latency. For some reason, the same prompt sometimes takes 15-20s via the API (around 25% of the time in my tests).
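
For anyone who wants to reproduce this, here is a minimal sketch of how timings like these can be collected and summarized. It is not my exact test script: `time_calls`, `summarize`, and the 5-second slow threshold are illustrative choices, and the callable passed to `time_calls` is a placeholder for whatever code sends the actual generateContent request.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(fn: Callable[[], object], attempts: int) -> List[float]:
    """Call fn() `attempts` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()
        fn()  # placeholder: the real request call goes here
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float], slow_threshold_s: float = 5.0) -> Dict[str, float]:
    """Report median, max, and how many calls exceeded the (assumed) slow threshold."""
    slow = [d for d in durations if d >= slow_threshold_s]
    return {
        "n": len(durations),
        "median_s": statistics.median(durations),
        "max_s": max(durations),
        "slow_count": len(slow),
    }


# The six timings reported above (gp_none and gp_basic, three attempts each).
reported = [20.27, 0.60, 0.64, 0.68, 0.63, 15.25]
summary = summarize(reported)
print(summary)
```

Using the median rather than the mean keeps the two outliers from hiding the typical ~0.6s response, while `max_s` and `slow_count` make the slow calls visible.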