Extreme Latency Spikes in Gemini 2.5 Flash Video Inference (15s vs 60s+)

  1. Input: 7-second MP4, 720p @ 30fps (approx. 210 frames).
  2. Task: Content moderation/Safety check.
  3. Expected Latency: 10–15 seconds (typical).
  4. The Issue: Intermittently, the same video with the same prompt takes more than 60 seconds to process.
  5. Staging Overhead: Is there a significant latency penalty for sending raw bytes inline versus uploading through the File API for a 7-second video?
  6. Internal Reasoning: Does the model trigger a more expensive “verification” loop for moderation tasks that causes this 4x jump in latency?
  7. Cold Starts / Regional Capacity: Are these spikes indicative of cold starts on the inference nodes, and is there a way to reserve warm capacity for low-latency production workloads?
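To make the spike pattern concrete for whoever triages this, a latency histogram is more useful than anecdotes: if p50 sits near 15s but p95/max jump past 60s, that points to per-request variance (cold nodes, verification loops) rather than steady overload. A minimal sketch of such a harness, with the latencies stubbed since the real `generate_content` call needs credentials:

```python
import statistics
import time


def measure_latency(call, n=20):
    """Time n invocations of `call` and return a p50/p95/max summary (seconds).

    In production, `call` would wrap the real request, e.g.
    client.models.generate_content(model="gemini-2.5-flash", contents=[...])
    from the google-genai SDK; it is left pluggable here.
    """
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return summarize(samples)


def summarize(samples):
    """Summary stats: a bimodal spread (p50 ~15s, max 60s+) suggests
    intermittent per-request overhead rather than uniform slowness."""
    qs = statistics.quantiles(samples, n=20)  # 19 cut points in 5% steps
    return {
        "p50": statistics.median(samples),
        "p95": qs[18],  # 95th percentile
        "max": max(samples),
    }


if __name__ == "__main__":
    # Stubbed samples imitating the reported pattern: mostly ~15s, rare 60s+.
    fake = [15.0] * 18 + [62.0, 75.0]
    print(summarize(fake))
```

Attaching a summary like this (plus timestamps and region) to the report should make it much easier to correlate the outliers with server-side events.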
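On question 5, one measurable difference between the two paths is transport size: inline video is base64-encoded into every JSON request (roughly +33%), while a File API upload happens once and later requests carry only a short file URI. A small sketch of that overhead; the 5 MB figure for a 7s 720p MP4 is an assumption for illustration:

```python
import base64


def inline_payload_size(video_bytes: bytes) -> int:
    """Size of the base64 text that an inline-data part embeds in each request."""
    return len(base64.b64encode(video_bytes))


# Assumed size for a 7s 720p clip at a typical bitrate (~5 MB).
video = bytes(5 * 1024 * 1024)
inline = inline_payload_size(video)
print(f"raw: {len(video):,} B, inline (base64): {inline:,} B "
      f"(+{inline / len(video) - 1:.0%})")
# With the File API, repeated runs of the same clip re-send ~0 bytes of video,
# so any per-request staging cost should be paid once at upload time.
```

This only accounts for wire overhead; whether server-side staging differs between the two paths is exactly what the question asks the Gemini team to confirm.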