Gemini 2.0 Flash Audio Input Pricing

Is there a specific reason why audio inputs are significantly more expensive than video inputs?


Hi @kenryu,

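In Gemini 2.0 Flash, audio costs more mainly because of its per-token price, not because it uses more tokens. At the time of writing, audio input is priced at $0.70 per 1M tokens, while text, image, and video input is $0.10 per 1M tokens. The token counts actually run the other way. Audio is tokenized at about 32 tokens per second. Video is sampled at about 1 frame per second and comes to roughly 263 tokens per second. Google hasn't published why audio tokens are priced higher. The extra work of processing speech is a plausible factor, but that is speculation. If cost is a concern, try Gemini 2.0 Flash-Lite, or transcribe the audio to text before sending it.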
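To compare the two modalities, multiply the tokens consumed per second by the price per token. The sketch below does that arithmetic. The rates in it are assumptions: they are the Gemini 2.0 Flash figures Google listed at the time of writing (32 tokens/s and $0.70 per 1M tokens for audio, 263 tokens/s and $0.10 per 1M tokens for video). Check the current pricing and token-counting pages and substitute their numbers. `input_cost_usd` is a hypothetical helper written for this illustration, not part of any SDK.

```python
# Rough input-cost comparison for audio vs. video.
# ASSUMPTION: these rates are illustrative Gemini 2.0 Flash figures from
# Google's published docs at the time of writing. Verify them before use.
AUDIO_TOKENS_PER_SECOND = 32
VIDEO_TOKENS_PER_SECOND = 263  # roughly 1 sampled frame per second
AUDIO_USD_PER_1M_TOKENS = 0.70
VIDEO_USD_PER_1M_TOKENS = 0.10


def input_cost_usd(seconds: float, tokens_per_second: float, usd_per_1m_tokens: float) -> float:
    """Hypothetical helper: the tokens a clip consumes times the price per token."""
    tokens = seconds * tokens_per_second
    return tokens * usd_per_1m_tokens / 1_000_000


if __name__ == "__main__":
    minutes = 10
    seconds = minutes * 60
    audio = input_cost_usd(seconds, AUDIO_TOKENS_PER_SECOND, AUDIO_USD_PER_1M_TOKENS)
    video = input_cost_usd(seconds, VIDEO_TOKENS_PER_SECOND, VIDEO_USD_PER_1M_TOKENS)
    print(f"{minutes} min audio: {seconds * AUDIO_TOKENS_PER_SECOND:,} tokens, ${audio:.4f}")
    print(f"{minutes} min video: {seconds * VIDEO_TOKENS_PER_SECOND:,} tokens, ${video:.4f}")
```

With these assumed figures, 10 minutes of audio uses 19,200 tokens and costs about $0.013. The same length of video uses 157,800 tokens and costs about $0.016. Audio's higher per-token price is therefore largely offset by its much lower token count. The sketch ignores output tokens and any audio track contained in the video.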