About Gemini audio input

woodstick · March 10, 2025, 1:24pm

Hello, I’m trying to integrate Gemini Flash 2.0 with audio input into my application.
I noticed that the audio data is downsampled to 16Kbps here.

Given this, would it be okay to downsample the audio on the client side to reduce network overhead?
Would doing so affect the quality of the model’s response?

Also, does “16kbps” correspond to the following audio specifications?

mono audio channel
8 bit depth
2kHz sample rate
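
For reference, the arithmetic behind the question can be checked with a short sketch. It assumes uncompressed PCM, where the bitrate is simply channels × bit depth × sample rate. The function name `pcm_bitrate_bps` is made up for illustration, and nothing here confirms which format Gemini actually uses internally.

```python
def pcm_bitrate_bps(channels: int, bit_depth: int, sample_rate_hz: int) -> int:
    """Bitrate of uncompressed PCM audio, in bits per second."""
    return channels * bit_depth * sample_rate_hz


# The combination asked about above: mono, 8-bit, 2 kHz.
print(pcm_bitrate_bps(channels=1, bit_depth=8, sample_rate_hz=2000))  # 16000
```

The listed combination does come out to 16,000 bit/s, but so do others (for example mono, 16-bit, 1 kHz), and a compressed codec running at 16 kbps does not map to any single depth and rate, so the bitrate alone does not pin down the format.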

Govind_Keshari · March 11, 2025, 6:57am

Hi @woodstick, welcome to the forum!

It’s fine to downsample the audio on the client side. It won’t affect the quality of the model’s response. And yes, multiple channels will be combined into a single channel.
I am not sure about the bit depth and sample rate; I will get back to you on this.

Thanks.
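
A minimal sketch of what client-side downmixing and downsampling could look like, assuming the input is interleaved 16-bit little-endian PCM. The function name `downmix_and_decimate` and the `factor` parameter are hypothetical, and the thread does not confirm a target sample rate, so the factor has to be chosen by the caller. Block averaging is a crude low-pass filter; a proper resampler would give better quality.

```python
import struct


def downmix_and_decimate(pcm: bytes, channels: int, factor: int) -> bytes:
    """Average interleaved s16le channels to mono, then average each group
    of `factor` consecutive mono samples into one output sample."""
    count = len(pcm) // 2  # number of 16-bit samples in the buffer
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])

    # Downmix: average the samples of each complete frame.
    mono = [
        sum(samples[i:i + channels]) // channels
        for i in range(0, count - channels + 1, channels)
    ]

    # Decimate: average each complete block of `factor` mono samples.
    out = [
        sum(mono[i:i + factor]) // factor
        for i in range(0, len(mono) - factor + 1, factor)
    ]
    return struct.pack(f"<{len(out)}h", *out)
```

For example, stereo 48 kHz input with `channels=2, factor=3` would yield mono 16 kHz output, shrinking the payload by a factor of six.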

woodstick · March 19, 2025, 12:31pm

Thanks for your reply. It’ll be very helpful.

Zockerpanda · September 29, 2025, 12:20pm

So, what is the actual format? My guess would be either 8 kHz, mono, s16le (2 bytes/sample) or 16 kHz, mono, s8.
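
As a quick check of these guesses, here is a sketch that computes the raw data rate of each candidate format, assuming uncompressed PCM. The `candidates` table and the helper `bits_per_second` are illustrative only; the actual format is not confirmed anywhere in this thread.

```python
# (sample rate in Hz, channels, bits per sample) for each candidate format.
candidates = {
    "8 kHz, mono, s16le": (8000, 1, 16),
    "16 kHz, mono, s8": (16000, 1, 8),
    "2 kHz, mono, 8-bit": (2000, 1, 8),
}


def bits_per_second(rate_hz: int, channels: int, bits: int) -> int:
    """Raw PCM data rate in bits per second."""
    return rate_hz * channels * bits


for name, fmt in candidates.items():
    bps = bits_per_second(*fmt)
    print(f"{name}: {bps} bit/s = {bps // 8} bytes/s")
```

Both guessed formats come out to 128,000 bit/s, which is 16,000 bytes per second, so they would match the documented figure only if "16 Kbps" were read as kilobytes per second. Only the 2 kHz, 8-bit format from the original question gives 16,000 bit/s as raw PCM.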
