Is Gemini Capable of Detecting Audio Artifacts?

CuriousCoder · November 14, 2024, 1:22pm

Hi all,

I’m trying to use Gemini 1.5 Pro to detect artifacts in an audio file (echo, noise, distortion, etc.), but I get very different results for the same file after only a slight tweak to the prompt. So now I’m not sure whether it’s a prompting issue and I need to iterate more on the prompt, or whether it’s a limitation of Gemini: maybe it isn’t really capable of handling the task and is just giving me random results.

What do you think?

OrangiaNebula · November 14, 2024, 4:27pm

Welcome to the forum. I think your task is probably OOD (out of distribution); it’s unlikely the model was trained on pops and clicks. One thing you might try if you haven’t already: few-shot prompting. Give the model brief, labeled samples of the kinds of artifacts you are looking for. It might work.
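To make the few-shot suggestion concrete, here is a minimal sketch of how labeled example clips and the clip under test could be interleaved into one prompt. The function name `build_few_shot_parts` and the prompt wording are hypothetical, and the clips are stand-in strings; in a real request each clip would be whatever audio object your SDK accepts. Whether Gemini handles several audio parts in one request is not confirmed anywhere in this thread, so treat that as an assumption to test.

```python
from typing import Any, List, Sequence, Tuple


def build_few_shot_parts(
    examples: Sequence[Tuple[Any, str]],
    query_clip: Any,
    instruction: str,
) -> List[Any]:
    """Interleave labeled example clips with the clip to classify.

    `examples` is a sequence of (clip, label) pairs. The clips are passed
    through untouched, so they can be any object (assumption: your SDK
    accepts a mixed list of text and audio parts).
    """
    parts: List[Any] = [instruction]
    for i, (clip, label) in enumerate(examples, start=1):
        parts.append(f"Example {i}:")
        parts.append(clip)
        parts.append(f"Label: {label}")
    # The unlabeled clip goes last, followed by an open "Label:" slot.
    parts.append("Now classify this clip:")
    parts.append(query_clip)
    parts.append("Label:")
    return parts


# Placeholder strings stand in for real audio objects.
demo = build_few_shot_parts(
    [("<echo sample>", "echo"), ("<noise sample>", "noise")],
    "<clip under test>",
    "Identify the dominant artifact in the final clip.",
)
for part in demo:
    print(part)
```

With the google-generativeai Python SDK, I believe each clip could be a handle returned by `genai.upload_file(...)` and the resulting list could be passed to `GenerativeModel.generate_content(...)`, but that wiring is not shown or tested here.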

CuriousCoder · November 15, 2024, 11:42am

Thanks a lot for your reply, @OrangiaNebula. Yeah, I also think this task might be outside what Gemini is trained on, but I wanted to get others’ opinions because there is nothing in the docs about it (all the audio use cases demonstrated are about summarization and transcription, which probably says something about what Gemini can currently do in terms of audio processing). Still, I was hoping it could help identify audio artifacts, such as the presence of echo or noise in a speech recording.

The thing is, sometimes it gives correct results, and its description of the artifact reads as if it actually understood the audio and the issues in it. But other times, with the exact same file and prompt, it fails to identify an issue it identified before. That is what confuses me: it isn’t clear whether it can or cannot do this task.
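One way to turn "sometimes right, sometimes not" into a number is to repeat the identical request several times and measure how often the answers agree. The sketch below assumes a hypothetical zero-argument callable `classify` that sends the same file and prompt each time and returns a short label; the name `measure_consistency` is also made up for illustration. It says nothing about whether the majority answer is correct, only how stable it is.

```python
from collections import Counter
from typing import Callable, Dict, Tuple


def measure_consistency(
    classify: Callable[[], str], runs: int = 10
) -> Tuple[str, float, Dict[str, int]]:
    """Call `classify` repeatedly and report how stable its answer is.

    Returns (most common label, fraction of runs that gave that label,
    raw counts per label). Labels are normalized by stripping whitespace
    and lowercasing so "Echo " and "echo" count as the same answer.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    counts = Counter(classify().strip().lower() for _ in range(runs))
    top_label, top_count = counts.most_common(1)[0]
    return top_label, top_count / runs, dict(counts)


# Stub standing in for a real model call (hypothetical answers).
_fake_answers = iter(["echo", "noise", "echo", "echo", "none"])
label, agreement, counts = measure_consistency(lambda: next(_fake_answers), runs=5)
print(label, agreement, counts)
```

If agreement stays low even with a fixed prompt, lowering the sampling temperature (where the API exposes it) is a common first thing to try before concluding the model cannot do the task.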

As for few-shot prompting, I haven’t tried that yet, but it seems promising. However, the docs state that you can include a maximum of one audio file per prompt request, so I’m not sure whether it will be able to handle it. In my earlier tests it didn’t complain when I supplied multiple audio files at once (I was testing its ability to compare them), but with further testing it didn’t seem able to actually tell them apart.
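A simple check for whether the model really tells two clips apart is an order-swap test: ask which of two clips contains the artifact, then swap their order and ask again. If the model is reading the audio, the answer should flip from "first" to "second" (or the reverse); if it sticks to one position, it is probably not distinguishing the files. This is a sketch under assumptions: `ask` is a hypothetical callable that sends both clips in the given order and returns "first" or "second".

```python
from typing import Any, Callable


def order_swap_check(
    ask: Callable[[Any, Any], str], clip_a: Any, clip_b: Any
) -> bool:
    """Return True if the answer follows the clip rather than the position.

    `ask(x, y)` is expected to return "first" or "second", naming which
    of the two clips (in the order given) contains the artifact.
    """
    answer_ab = ask(clip_a, clip_b).strip().lower()
    answer_ba = ask(clip_b, clip_a).strip().lower()
    flipped = {"first": "second", "second": "first"}
    # Any answer outside the two expected words counts as a failure.
    return answer_ab in flipped and answer_ba == flipped[answer_ab]


# Stub that always picks the first slot, i.e. ignores the audio entirely.
position_biased = lambda x, y: "first"
print(order_swap_check(position_biased, "<echo clip>", "<clean clip>"))
```

Running the check over several clip pairs gives a rough pass rate, which is more informative than a single pair.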
