Gemini 3.1 Flash-Lite is very slow and inconsistent

I’m seeing very large latency variance with gemini-3.1-flash-lite-preview on identical/similar small classification requests via the direct Gemini API.

Config:

  • generateContent

  • thinkingConfig.thinkingLevel: “minimal”

  • ~300-400 input tokens

  • ~46-50 output tokens

  • thoughtsTokenCount: 0

Results:
gp_none attempt 1: 20.27s input=306 output=47 total=353
gp_none attempt 2: 0.60s input=306 output=50 total=356
gp_none attempt 3: 0.64s input=306 output=47 total=353

gp_basic attempt 1: 0.68s input=384 output=46 total=430
gp_basic attempt 2: 0.63s input=384 output=46 total=430
gp_basic attempt 3: 15.25s input=384 output=46 total=430

In the Ai Studio playground I consistently get 0.6s so this seems to be the true latency. For some reason the same prompt sometimes takes 15-20s via API (around 25% of the time in my tests).