Hello! I am using the Gemini API through the Gen AI Python SDK (python-genai) and I’m experiencing a weird latency issue that I’m having trouble getting to grips with.
I recently updated the lib from version 1.56.0 to 1.72.0 and immediately noticed that the latency of all my inference requests increased significantly (by at least 70%), even though the prompts were unchanged (typically ~3000 tokens of text + 1-3 images). I tried downgrading to a version in between (1.64.0), but the issue remained, so in the end I went back to 1.56.0 and the latency dropped back down immediately.
Note that this happens regardless of the choice of LLM (tested gemini-3.1-flash-lite-preview, gemini-2.5-flash, gemini-2.0-flash) and regardless of reasoning effort.
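In case it helps anyone reproduce the comparison, here is a minimal timing sketch. It is not my exact code: `time_call` and `fake_request` are hypothetical names for illustration, and `fake_request` is only a stub that would need to be replaced with the real SDK request (same prompt and images) before the numbers mean anything.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_call(fn: Callable[[], object], runs: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Call fn repeatedly and return wall-clock latency stats in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed (connection setup, lazy imports)

    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    return {
        "runs": float(len(samples)),
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
    }


def fake_request() -> str:
    # Hypothetical stand-in for the real request; swap in the actual
    # SDK call when measuring for real.
    time.sleep(0.01)
    return "ok"


if __name__ == "__main__":
    print(time_call(fake_request, runs=5))
```

Running the same script in two separate virtual environments, one per SDK version, and comparing the medians should show whether the difference comes from the package itself rather than from network noise.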
Has anyone else experienced an issue like this? I’m really struggling to understand the root cause, and I can’t really find a related GitHub issue.