- Input: 7-second MP4, 720p @ 30fps (approx. 210 frames).
- Task: Content moderation/Safety check.
- Expected Latency: 10–15 seconds (typical).
- The Issue: Intermittently, the same video with the same prompt takes more than 60 seconds to process.
- Staging Overhead: Is there a significant penalty for direct byte uploads vs. using the File API for 7s videos?
- Internal Reasoning: Does the model trigger a more expensive “verification” loop for moderation tasks that would explain this 4x jump in latency?
- Cold Starts: Are these spikes indicative of “cold starts” on the inference nodes, and is there a way to reserve “warm” capacity for low-latency production workloads?
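To make the spike pattern concrete before filing a report, it can help to time repeated identical requests and flag outliers against the median. The sketch below is generic and hedged: `call` is a placeholder for whatever issues the actual moderation request (e.g., a direct byte upload vs. a File API upload followed by a generate call), and the 4x `spike_factor` mirrors the jump described above; neither is tied to any specific SDK.

```python
import time
import statistics

def measure_latencies(call, n=20):
    """Time n invocations of `call` and return per-call latencies in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        call()  # placeholder: substitute the real moderation request here
        latencies.append(time.perf_counter() - start)
    return latencies

def summarize(latencies, spike_factor=4.0):
    """Report median/max latency and how many calls exceeded spike_factor x median."""
    median = statistics.median(latencies)
    spikes = [t for t in latencies if t > spike_factor * median]
    return {"median_s": median, "max_s": max(latencies), "spikes": len(spikes)}
```

Running this once with the direct-bytes path and once with the File API path, on the same 7-second clip and prompt, would also answer the staging-overhead question empirically rather than by guesswork.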