Different behavior on two audio files with the same prompt

A prompt that works well on one audio file is NOT guaranteed to work on the other audio files in the same test set.

My prompt: “Listen carefully to the following audio file. Transcribe it verbatimly. Output should be display format with Chinese Punctuation.”
audio1 response: “1月15日,75%浓度,1号的液体,是吗?”
audio2 response: “嗯嗯 对 但是 到 了 1617 年 的 时 候 开始 事情 发生 一 些 转变”

The second response above has no punctuation. Why is the behavior inconsistent for such an important feature (punctuation)?
I set temperature=0.
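
To spot this kind of inconsistency across a whole test set, a small check like the one below might help. It is only a sketch: the function names and the punctuation set are my own assumptions, and it counts both full-width and ASCII marks because the audio1 response above uses ASCII commas.

```python
# Punctuation marks to look for. This set is an assumption, not an official list.
# It includes full-width Chinese marks and their ASCII counterparts.
PUNCTUATION = set(",。?!;:、,.?!;:")


def has_punctuation(text: str) -> bool:
    """Return True if the text contains at least one mark from PUNCTUATION."""
    return any(ch in PUNCTUATION for ch in text)


def flag_unpunctuated(responses: dict[str, str]) -> list[str]:
    """Return the names of responses that contain no punctuation at all."""
    return [name for name, text in responses.items() if not has_punctuation(text)]


# The two responses quoted in this post.
responses = {
    "audio1": "1月15日,75%浓度,1号的液体,是吗?",
    "audio2": "嗯嗯 对 但是 到 了 1617 年 的 时 候 开始 事情 发生 一 些 转变",
}

print(flag_unpunctuated(responses))  # prints ['audio2']
```

Running this over every response in a test set gives a quick count of how often the punctuation instruction is ignored.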

Thanks.

Hi @Honglei_Zhang,

To help us investigate this issue further, would it be possible to share the audio files you mentioned? Having access to the audio samples would allow us to replicate the problem and diagnose the cause more effectively.
Thank you.

I think a chain-of-thought prompt that guides the model through multiple steps, combined with a JSON-formatted response, would work better in this situation. I also suggest not setting the temperature to 0.
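
As a rough illustration of what I mean, here is a sketch of a multi-step prompt plus a parser for a JSON reply. Everything in it is hypothetical: the step wording, the field names `raw_text` and `display_text`, and the sample reply (a hand-written stand-in, not real model output). It makes no API call, so how the prompt and audio are sent to the model is left out.

```python
import json

# Hypothetical step-by-step instructions; adjust the wording to your own task.
STEPS = [
    "Listen carefully to the audio file.",
    "Transcribe it verbatim as plain text without punctuation.",
    "Rewrite the transcript in display format with Chinese punctuation.",
    'Return only a JSON object with the keys "raw_text" and "display_text".',
]


def build_prompt(steps: list[str]) -> str:
    """Join the steps into one numbered prompt string."""
    return "\n".join(f"Step {i}: {step}" for i, step in enumerate(steps, start=1))


def parse_reply(reply: str) -> str:
    """Extract the display transcript from a JSON reply, or raise ValueError."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError("reply is not valid JSON") from exc
    text = data.get("display_text") if isinstance(data, dict) else None
    if not isinstance(text, str) or not text.strip():
        raise ValueError('reply has no usable "display_text" field')
    return text


prompt = build_prompt(STEPS)

# Hand-written stand-in for a model reply, used only to exercise the parser.
sample_reply = '{"raw_text": "你好 世界", "display_text": "你好,世界。"}'
print(parse_reply(sample_reply))
```

Separating the raw transcript from the display version gives the model an explicit punctuation step, and a reply that fails to parse can be retried instead of being accepted silently.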

The Google documentation suggests a temperature of 0: https://ai.google.dev/gemini-api/docs/models/generative-models#model-parameters

Thanks a lot.
But I can’t find a way to share the audio. Can you give me some tips?

Hello @Siva_Malasani, you can find the audio files here:
audio1.wav
audio2.wav

Thanks a lot !

Thank you for pointing that out. I apologize for my oversight. I answered based only on my personal experience: when I tried a temperature of 0, the responses were often rigid and “not very smart”.

Hello @Siva_Malasani, is there any update I should know about?
I urgently need your response.
Thanks.