Hi Team,
We are trying to pass example audio clips to the Gemma 3n 4B model as few-shot examples for low-resource languages.
Our Uvicorn server takes about 16 seconds to return output, even on GPU. The bigger issue is that the example audio is not being accepted through the prompt: the audio is ignored and we end up running pure text generation. The warning we get is:
Keyword argument audios is not a valid argument for this processor and will be ignored.
Is there a way to get this working when serving through Uvicorn?
Hi @Dibyajyoti_Mishra,
Gemma 3n is a multimodal model that supports audio input, but audio for few-shot prompting has to be passed in the model’s multimodal chat format through its dedicated processor, not as a generic keyword argument like audios (which is why that argument is being ignored).
To provide audio as part of your prompt (for few-shot examples or direct transcription/translation), you must format the input according to the Gemma 3n multimodal chat template.
An example structure (following the Hugging Face docs for Gemma 3n) is:
messages = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": "your_audio_data_or_path"},
            {"type": "text", "text": "Please transcribe this audio into English."},
        ],
    },
    {
        "role": "assistant",
        "content": "Transcription result for the user's audio.",
    },
    # Add more few-shot pairs here using the same structure,
    # then finish with the final user query (audio + instruction).
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,  # needed so the audio features are returned along with the token ids
    return_tensors="pt",
)
For few-shot examples, you would place multiple alternating user (with audio and instructions) and assistant (with the correct text output) messages within this messages list before the final query. This is how the model learns the pattern from the examples.
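For concreteness, a few-shot version of the messages list might look like the sketch below (the audio file names and transcriptions are just placeholders; substitute your own low-resource-language examples):

few_shot_messages = [
    # Example 1: input audio + instruction, followed by the correct transcription
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": "example_1.wav"},
            {"type": "text", "text": "Transcribe this audio."},
        ],
    },
    {"role": "assistant", "content": "Reference transcription for example 1."},
    # Example 2
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": "example_2.wav"},
            {"type": "text", "text": "Transcribe this audio."},
        ],
    },
    {"role": "assistant", "content": "Reference transcription for example 2."},
    # Final query: the audio you actually want transcribed
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": "query.wav"},
            {"type": "text", "text": "Transcribe this audio."},
        ],
    },
]

This list then goes through processor.apply_chat_template exactly as shown above; with add_generation_prompt=True, the model’s next turn should be the transcription of the final query.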
Thanks.