Documentation: Video understanding | Gemini API | Google AI for Developers
File API processing: When using the File API, videos are sampled at 1 frame per second (FPS) and audio is processed at 1Kbps (single channel). Timestamps are added every second.
Can someone please verify that this statement is correct?
Is it really 1Kbps and not 16Kbps?
1Kbps would be pretty awful quality.
Hi @Flipp_Fuzz,
Yes, the File API in the Gemini API processes the audio track of an uploaded video at 1 Kbps. That stream is used for content understanding, not playback: it is optimized for efficient analysis and tokenization by AI models, prioritizing speed and resource efficiency. At 1 Kbps the audio would sound poor if played back, but it is sufficient for AI models to extract speech or simple audio cues for understanding tasks.
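For a sense of scale, here is a quick back-of-the-envelope sketch comparing the two bitrates from the question. It is plain arithmetic, not anything from the Gemini SDK: it assumes a constant bitrate and that 1 Kbps means 1,000 bits per second, and the helper name `audio_bytes` is made up for illustration.

```python
def audio_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Bytes needed for `seconds` of audio at a constant bitrate.

    Assumes 1 Kbps = 1,000 bits per second and 8 bits per byte.
    """
    return bitrate_kbps * 1000 * seconds / 8


# Compare the documented 1 Kbps with the 16 Kbps asked about above.
for kbps in (1, 16):
    per_second = audio_bytes(kbps, 1)
    per_minute = audio_bytes(kbps, 60)
    print(f"{kbps} Kbps: {per_second:.0f} bytes/s, {per_minute:.0f} bytes/min")
```

Under those assumptions, 1 Kbps works out to 125 bytes per second of audio, versus 2,000 bytes per second at 16 Kbps.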
Thank you!