Problem Summary
I’m seeing consistent timecode jumps when using Gemini 2.5 Pro in Google AI Studio for video transcription and image description. Timestamps in the output don’t align with the actual video content, which makes the transcriptions unusable for time-sensitive work.
Technical Details
Model: Gemini 2.5 Pro via Google AI Studio
Task: Audio transcription + image description of video files
File formats: MP4 and MOV
Input length: Originally longer videos, now restricted to 25-minute segments
What I’ve Tried
Length restriction: Limited videos to 25 minutes maximum - issue persists
Timecode burn-in: Added visible timecode overlay to video frames and explicitly instructed Gemini to reference the burn-in - no improvement in accuracy
Video splitting: Split longer content into segments, but this produces worse results because of timecode offsets (e.g., a segment starting at the 25:00 mark shows very poor timestamp correlation)
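A possible way to sidestep the offset problem is to ask the model only for segment-relative timestamps (every segment counted from 00:00) and add each segment's start offset afterwards in code. The following is a minimal sketch, assuming the model returns timestamps as MM:SS or HH:MM:SS strings. The function names are hypothetical helpers, not part of any Gemini API, and this only remaps timestamps; it does not correct drift inside a segment.

```python
def to_seconds(timestamp: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' to a number of seconds."""
    parts = [int(p) for p in timestamp.split(":")]
    if len(parts) == 2:
        parts = [0] + parts  # no hours field given, assume 0 hours
    if len(parts) != 3:
        raise ValueError(f"unsupported timestamp format: {timestamp!r}")
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds


def to_timecode(total_seconds: int) -> str:
    """Format a number of seconds as 'HH:MM:SS'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def shift_timestamp(timestamp: str, segment_offset_s: int) -> str:
    """Map a segment-relative timestamp onto the full video's timeline."""
    return to_timecode(to_seconds(timestamp) + segment_offset_s)


# A segment that starts at the 25:00 mark has an offset of 1500 seconds.
print(shift_timestamp("03:15", 25 * 60))
```

With this approach the second segment's prompt never mentions 25:00 at all; the offset is applied only after the model has responded.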
Specific Issues
Timecodes jump erratically and don’t match actual content timing
Model appears to ignore explicit instructions to use visual timecode burn-ins
Temporal offset handling is particularly poor when processing video segments that don’t start at 00:00
Output timestamps sometimes extend beyond actual video duration
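The last two symptoms (erratic jumps and timestamps past the end of the video) can at least be detected automatically before the output is used. This is a rough sketch, assuming the timestamps have already been parsed into seconds and that the real duration is known from the file's metadata. The function name and the problem labels are my own, and the check only flags suspicious entries; it cannot tell which timestamp is actually correct.

```python
def find_timestamp_problems(timestamps_s, duration_s):
    """Return (index, value, reason) for each suspicious timestamp.

    timestamps_s: timestamps in seconds, in the order the model emitted them.
    duration_s:   actual video duration in seconds.
    """
    problems = []
    previous = None  # last in-range timestamp seen so far
    for index, t in enumerate(timestamps_s):
        if t < 0 or t > duration_s:
            # Timestamp lies outside the video entirely.
            problems.append((index, t, "out_of_range"))
            continue
        if previous is not None and t < previous:
            # Timestamp is earlier than the one before it.
            problems.append((index, t, "goes_backwards"))
        previous = t
    return problems


# Example: a 25-minute (1500 s) segment with one backward jump
# and one timestamp beyond the end of the video.
print(find_timestamp_problems([0, 65, 40, 130, 1600], 1500))
```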
Questions
Has anyone found reliable workarounds for timestamp accuracy in Gemini 2.5 Pro?
Are there specific prompt engineering techniques that improve temporal correlation?
Would downgrading to Gemini 2.0 provide better timestamp accuracy?
Are there alternative approaches for handling video segments with temporal offsets?
Expected Outcome
Accurate timestamps that correspond to actual video content, enabling reliable time-based navigation and referencing.