Different behavior on two audio files with the same prompt

Honglei_Zhang · June 19, 2024, 1:54pm

A prompt that works well on one audio file is NOT guaranteed to work on the other audio files in the same test set.

My prompt: “Listen carefully to the following audio file. Transcribe it verbatimly. Output should be display format with Chinese Punctuation.”
audio1 response: “1月15日，75%浓度，1号的液体，是吗？”
audio2 response: “嗯嗯对但是到了 1617 年的时候开始事情发生一些转变”

The second response has no punctuation at all. Why is the behavior inconsistent on such an important feature (punctuation)?
Temperature was set to 0.
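
To flag this kind of failure across a whole test set, a small check along these lines may help. It is only a sketch: the helper name `has_chinese_punctuation` and the set of marks are illustrative assumptions, and the two strings are the responses quoted above.

```python
# Marks a "display format" Chinese transcript would normally contain.
# This set is an assumption for illustration; extend it as needed.
CHINESE_PUNCTUATION = set("，。？！、；：“”‘’（）《》…")


def has_chinese_punctuation(text: str) -> bool:
    """Return True if the text contains at least one Chinese punctuation mark."""
    return any(ch in CHINESE_PUNCTUATION for ch in text)


# The two model responses quoted in this post.
responses = {
    "audio1": "1月15日，75%浓度，1号的液体，是吗？",
    "audio2": "嗯嗯对但是到了 1617 年的时候开始事情发生一些转变",
}

# Collect the audio files whose transcript came back without punctuation.
unpunctuated = [name for name, text in responses.items()
                if not has_chinese_punctuation(text)]
print(unpunctuated)  # ['audio2']
```

A check like this only detects that punctuation is missing; it does not say whether the punctuation that is present is correct.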

Thanks.

Siva_Malasani · June 19, 2024, 9:43pm

Hi @Honglei_Zhang,

To help us investigate this issue further, would it be possible to share the audio files you mentioned? Having access to the audio samples would allow us to replicate the problem and diagnose the cause more effectively.
Thank you.

user113 · June 20, 2024, 5:11am

I think a chain-of-thought prompt that guides the model through multiple steps, combined with a JSON-formatted response, will work better in this situation. Additionally, I suggest not setting the temperature to 0.
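
A rough sketch of what a multi-step prompt with a JSON response could look like. The step wording, the JSON keys `raw` and `punctuated`, and the helper names are all hypothetical, and the example response is hand-written rather than real model output. Sending the prompt and audio to the model is left out.

```python
import json


def build_prompt() -> str:
    """Build a multi-step prompt that asks for transcription first, punctuation second."""
    steps = [
        "Step 1: Listen carefully to the audio file and transcribe it verbatim.",
        "Step 2: Add Chinese punctuation to the transcript without changing any words.",
        'Step 3: Return only a JSON object of the form {"raw": "...", "punctuated": "..."}.',
    ]
    return "\n".join(steps)


def parse_response(response_text: str) -> str:
    """Extract the punctuated transcript from the model's JSON reply.

    Raises json.JSONDecodeError or KeyError if the reply does not match
    the requested format, so malformed replies are easy to spot.
    """
    data = json.loads(response_text)
    return data["punctuated"]


# Hand-written stand-in for a model reply, used only to show the parsing step.
example_response = '{"raw": "是吗", "punctuated": "是吗？"}'
print(build_prompt())
print(parse_response(example_response))
```

Separating the raw transcript from the punctuated one makes it possible to see whether a failure comes from the transcription step or from the punctuation step.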

Wu_Kun · June 20, 2024, 9:30am

A temperature of 0 is suggested by the Google docs: https://ai.google.dev/gemini-api/docs/models/generative-models#model-parameters

Honglei_Zhang · June 20, 2024, 10:14am

Thanks a lot.
But I can’t find a way to share the audio files. Can you give me some tips?

Honglei_Zhang · June 21, 2024, 2:19am

Hello @Siva_Malasani, you can find the audio files here:
audio1.wav
audio2.wav

Thanks a lot!

user113 · June 22, 2024, 1:17pm

Thank you for pointing that out. I apologize for my oversight. I answered from personal experience: when I tried a temperature of 0, the responses were often rigid and “not very smart”.

Honglei_Zhang · June 25, 2024, 5:59am

Hello @Siva_Malasani, is there any update I should know about?
I urgently need your response.
Thanks.
