Extreme Latency Spikes in Gemini 2.5 Flash Video Inference (15s vs 60s+)

  1. Input: 7-second MP4, 720p @ 30fps (approx. 210 frames).
  2. Task: Content moderation/Safety check.
  3. Expected Latency: 10–15 seconds (typical).
  4. The Issue: Intermittently, the same video with the same prompt takes more than 60 seconds to process.
  5. Staging Overhead: Is there a significant latency penalty for sending raw bytes inline versus uploading through the File API for a 7-second video?
  6. Internal Reasoning: Does the model trigger a more expensive “verification” loop for moderation tasks that causes this 4x jump in latency?
  7. Cold Starts / Regional Capacity: Are these spikes indicative of cold starts on the inference nodes, and is there a way to reserve warm capacity for low-latency production workloads?
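To make the spike pattern concrete for whoever triages this, a latency histogram is more useful than anecdotes: if p50 sits near 15s but p95/max jump past 60s, that points to per-request variance (cold nodes, verification loops) rather than steady overload. A minimal sketch of such a harness, with the latencies stubbed since the real `generate_content` call needs credentials:

```python
import statistics
import time


def measure_latency(call, n=20):
    """Time n invocations of `call` and return a p50/p95/max summary (seconds).

    In production, `call` would wrap the real request, e.g.
    client.models.generate_content(model="gemini-2.5-flash", contents=[...])
    from the google-genai SDK; it is left pluggable here.
    """
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return summarize(samples)


def summarize(samples):
    """Summary stats: a bimodal spread (p50 ~15s, max 60s+) suggests
    intermittent per-request overhead rather than uniform slowness."""
    qs = statistics.quantiles(samples, n=20)  # 19 cut points in 5% steps
    return {
        "p50": statistics.median(samples),
        "p95": qs[18],  # 95th percentile
        "max": max(samples),
    }


if __name__ == "__main__":
    # Stubbed samples imitating the reported pattern: mostly ~15s, rare 60s+.
    fake = [15.0] * 18 + [62.0, 75.0]
    print(summarize(fake))
```

Attaching a summary like this (plus timestamps and region) to the report should make it much easier to correlate the outliers with server-side events.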
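On question 5, one measurable difference between the two paths is transport size: inline video is base64-encoded into every JSON request (roughly +33%), while a File API upload happens once and later requests carry only a short file URI. A small sketch of that overhead; the 5 MB figure for a 7s 720p MP4 is an assumption for illustration:

```python
import base64


def inline_payload_size(video_bytes: bytes) -> int:
    """Size of the base64 text that an inline-data part embeds in each request."""
    return len(base64.b64encode(video_bytes))


# Assumed size for a 7s 720p clip at a typical bitrate (~5 MB).
video = bytes(5 * 1024 * 1024)
inline = inline_payload_size(video)
print(f"raw: {len(video):,} B, inline (base64): {inline:,} B "
      f"(+{inline / len(video) - 1:.0%})")
# With the File API, repeated runs of the same clip re-send ~0 bytes of video,
# so any per-request staging cost should be paid once at upload time.
```

This only accounts for wire overhead; whether server-side staging differs between the two paths is exactly what the question asks the Gemini team to confirm.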