Is there a specific reason why audio inputs are significantly more expensive than video inputs?
Hi @kenryu,
Audio input costs more than video in Gemini 2.0 Flash because audio is billed at ~25 tokens per second, while video is billed at a much lower rate (~1 token/sec). Audio also requires more processing, such as speech decoding, which adds to the cost. Overall, audio consumes more tokens and resources, which makes it pricier. If cost is a concern, try Gemini Flash Lite, or transcribe the audio to text before sending it as input.
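
To see how those per-second rates add up, here is a rough sketch of the arithmetic. It uses the approximate figures quoted in this reply (~25 tokens/sec for audio, ~1 token/sec for video) as-is; they are not checked against the current pricing page. The helper names and the per-million-token price are hypothetical placeholders, so substitute the real rates for your model.

```python
# Approximate per-second token rates as quoted in this reply (not verified here).
AUDIO_TOKENS_PER_SEC = 25
VIDEO_TOKENS_PER_SEC = 1

# Hypothetical placeholder price per 1M input tokens; replace with the real rate.
PRICE_PER_MILLION_TOKENS = 1.0


def estimate_tokens(duration_sec: float, tokens_per_sec: float) -> int:
    """Estimate the input tokens billed for a clip of the given length."""
    return round(duration_sec * tokens_per_sec)


def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Convert a token count into a cost at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


clip_sec = 60  # a one-minute clip
audio_tokens = estimate_tokens(clip_sec, AUDIO_TOKENS_PER_SEC)
video_tokens = estimate_tokens(clip_sec, VIDEO_TOKENS_PER_SEC)

print(f"audio: {audio_tokens} tokens, video: {video_tokens} tokens")
print(f"audio cost: {estimate_cost(audio_tokens, PRICE_PER_MILLION_TOKENS):.6f}")
print(f"video cost: {estimate_cost(video_tokens, PRICE_PER_MILLION_TOKENS):.6f}")
```

With these rates, a one-minute clip comes to 1500 audio tokens versus 60 video tokens, so the gap scales linearly with clip length.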