Hi all,
I’m trying to use Gemini Pro 1.5 to detect artifacts in an audio file (echo, noise, distortion, etc.), but I’m getting very different results for the same file after only a slight tweak of the prompt. So now I’m not sure whether it’s a prompting issue and I need to iterate more on the prompt, or whether it’s a limitation of Gemini, meaning it isn’t really capable of handling the task and just gives me random results.
What do you think?
Welcome to the forum. I think your task is probably out of distribution (OOD); it’s unlikely the model was trained on pops and clicks. One thing you might try, if you haven’t already, is few-shot prompting: give the model brief samples of the kinds of artifacts you are looking for, each with a label. It might work.
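A rough sketch of what I mean by a few-shot layout, in plain Python. Everything here is an assumption for illustration: `AudioClip`, `build_few_shot_parts`, and the file names are hypothetical, and nothing below calls a real Gemini API. It only shows the order of the parts (instruction, then labeled clips, then the unlabeled clip). Whether a single request accepts several audio parts is something I haven’t verified.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class AudioClip:
    """Hypothetical placeholder for an audio part; holds only a file path."""
    path: str


def build_few_shot_parts(
    examples: List[Tuple[str, str]], query_path: str
) -> List[Union[str, AudioClip]]:
    """Interleave labeled example clips with text, ending on the unlabeled clip."""
    parts: List[Union[str, AudioClip]] = [
        "You will hear short speech clips. Name the artifact in each one: "
        "echo, noise, distortion, or none."
    ]
    for i, (path, label) in enumerate(examples, start=1):
        parts.append(f"Example {i}:")
        parts.append(AudioClip(path))
        parts.append(f"Artifact: {label}")
    # The clip to classify comes last, with the answer slot left open.
    parts.append("Now classify this clip:")
    parts.append(AudioClip(query_path))
    parts.append("Artifact:")
    return parts


# Hypothetical file names, only to show the resulting structure.
parts = build_few_shot_parts(
    [("echo_sample.wav", "echo"), ("clean_sample.wav", "none")],
    "unknown.wav",
)
```

The resulting list would then be mapped onto whatever content format the client library expects; that mapping is left out on purpose.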
Thanks a lot for your reply, @OrangiaNebula. Yeah, I also think this task might be outside what Gemini is trained on, but I wanted to get others’ opinions because there is nothing in the docs about it (all the audio use cases demonstrated are about summarization and transcription, which probably says something about what Gemini is currently capable of in terms of audio processing). However, I was hoping it could help identify audio artifacts, such as echo or noise in a speech recording.
The thing is, sometimes it gives correct results, and its description of the artifact reads as if it actually understood the audio and the issues in it. But other times, using the exact same file and prompt, it fails to identify an issue it was able to identify before. This is what confuses me: it’s no longer clear whether it can or cannot do this kind of task.
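To put a number on how inconsistent it is, something like the sketch below could help. It is only a sketch: `agreement_rate` is a hypothetical helper, and `classify` stands for any function that sends the same file and prompt to the model and returns its one-word label (that call is not shown). It repeats the request and reports the most common answer and the fraction of runs that agree with it.

```python
from collections import Counter
from typing import Callable, Tuple


def agreement_rate(classify: Callable[[], str], runs: int = 10) -> Tuple[str, float]:
    """Call `classify` `runs` times; return the most common label and its share."""
    # Normalize case and whitespace so "Echo " and "echo" count as the same answer.
    labels = [classify().strip().lower() for _ in range(runs)]
    top_label, top_count = Counter(labels).most_common(1)[0]
    return top_label, top_count / runs


# Stand-in for the real model call: replays canned answers for illustration.
canned = iter(["echo", "Echo ", "noise", "echo"])
label, rate = agreement_rate(lambda: next(canned), runs=4)
```

A rate close to 1.0 across many runs would suggest the answers are stable and the prompt wording is what matters; a low rate on an identical prompt points more toward the model guessing. Sampling settings such as temperature can also cause run-to-run variation, so it may be worth fixing those before comparing.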
As for few-shot prompting, I haven’t tried that yet, but it seems promising. However, the docs state that you can include a maximum of 1 audio file in a prompt request, so I’m not sure whether it will be able to handle it. From previous experience, it didn’t complain when I supplied multiple audio files at once (I was testing its ability to compare them), but with further testing it didn’t seem able to actually distinguish them.