The audio documentation states that “Gemini can only infer responses to English-language speech.” However, when I try it with Norwegian speech (even in a dialect), it works nicely! Is there a caveat or limitation I should be aware of before relying too much on this capability?
Norwegian is a supported language, as listed here, so I am a bit confused by the conflicting documentation.
My use case, which currently works, is this: I upload audio in Norwegian, with both the system prompt and the prompt in Norwegian, and ask Gemini 1.5 to transcribe and summarize the audio.
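For context, a setup like this could look roughly like the sketch below. It assumes the `google-generativeai` Python SDK and an API key in a `GOOGLE_API_KEY` environment variable. The file name, the model name, and the exact prompt wording are placeholders for illustration, not my real values.

```python
import os

# Hypothetical placeholders, chosen only for illustration.
AUDIO_PATH = "opptak.mp3"
MODEL_NAME = "gemini-1.5-flash"

# Norwegian system prompt: "You are an assistant that transcribes and
# summarizes Norwegian speech."
SYSTEM_PROMPT = "Du er en assistent som transkriberer og oppsummerer norsk tale."

# Norwegian prompt: "Transcribe the audio recording and then write a short summary."
USER_PROMPT = "Transkriber lydopptaket og skriv deretter et kort sammendrag."


def build_request(audio_file):
    """Return the content list for the model: the text prompt, then the audio."""
    return [USER_PROMPT, audio_file]


def transcribe_and_summarize(audio_path=AUDIO_PATH):
    """Upload the audio and ask the model for a transcript plus summary."""
    # Imported lazily so build_request() can be used without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

    # Upload the audio through the File API and pass the handle to the model.
    audio_file = genai.upload_file(path=audio_path)

    model = genai.GenerativeModel(MODEL_NAME, system_instruction=SYSTEM_PROMPT)
    response = model.generate_content(build_request(audio_file))
    return response.text
```

Calling `transcribe_and_summarize()` returns the model's text response, which in this setup would contain the transcript followed by the summary. The quality of that output for Norwegian is exactly what I am unsure about relying on.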