[BUG] Gemini 2.5 Flash Native Audio outputs control characters (`<ctrl##>`) instead of audio, causing silent responses

bunny1 · January 12, 2026, 12:38pm

Summary

When using the gemini-2.5-flash-native-audio-preview-12-2025 model via the Live API, the model occasionally outputs control characters (e.g., <ctrl46><ctrl46>) to the transcript stream instead of generating audio. During these episodes, no audio is produced, leaving users in complete silence for 10-15+ seconds.

This is a critical production issue affecting voice agent deployments where users have no indication that the system is working, leading to poor user experience and abandoned sessions.

Affected Model(s)

gemini-2.5-flash-native-audio-preview-12-2025 (confirmed)
Potentially other native audio preview models in the same family

Environment

| Component | Value |
|-----------|-------|
| API | Gemini Live API via WebSocket |
| Model | gemini-2.5-flash-native-audio-preview-12-2025 |
| Response Modality | AUDIO only |
| Language | Romanian (but likely affects all languages) |
| Use Case | Production voice agent for appointment confirmations |
| SDK | google-genai Python SDK |

Expected Behavior

1. User speaks to the voice agent
2. Model processes the input
3. Model generates audio response
4. User hears the response immediately

Actual Behavior

1. User speaks to the voice agent
2. Model processes the input
3. Model outputs control characters (<ctrl46><ctrl46>) to the transcript stream
4. No audio is generated - the audio stream contains silence
5. User hears nothing for 10-15+ seconds
6. User asks “Can you hear me?” multiple times (confirming they heard nothing)
7. Eventually, model may recover and produce normal audio

Evidence

Session Timeline (Anonymized)


Production Session - January 9, 2026

Model: gemini-2.5-flash-native-audio-preview-12-2025

32.7s User: "Nu, nu pot veni. As vrea sa reprogramez."

(Translation: "No, I can't come. I'd like to reschedule.")

33.7s Agent transcript output: "<ctrl46><ctrl46>"

Audio output: NONE (verified via audio recording)

[Internal tool call triggered - searching for appointment slots]

[~13 SECONDS OF COMPLETE SILENCE - NO AUDIO GENERATED]

45.1s User: "M-ati auzit?"

(Translation: "Can you hear me?" - confirms no audio was heard)

47.3s Agent: "Va rog sa asteptati putin..."

(Translation: "Please wait a moment..." - audio resumes normally)

Key Observations

Transcript explicitly contains <ctrl46><ctrl46> - These characters appear in the output_transcription stream where normal text should be
Audio recording confirms complete silence - The OGG recording of the session contains zero audio during this 13-second period
User confirmation of silence - The user’s “Can you hear me?” at 45.1s proves they received no audio
Model eventually recovers - After the silence, normal audio generation resumes
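
For reference, the length of the silent period can be derived from the timeline timestamps. The following is a minimal sketch, assuming the agent line at 47.3s marks the moment audio resumed; the event kinds (`user`, `agent_ctrl`, `agent_audio`) and the function name are hypothetical labels introduced only for this illustration.

```python
from typing import List, Optional, Tuple

# (seconds since session start, event kind). The kinds are hypothetical labels.
Event = Tuple[float, str]


def silence_after_ctrl(events: List[Event]) -> Optional[float]:
    """Seconds from the first 'agent_ctrl' event to the next 'agent_audio' event."""
    ctrl_at = None
    for ts, kind in events:
        if kind == "agent_ctrl" and ctrl_at is None:
            ctrl_at = ts
        elif kind == "agent_audio" and ctrl_at is not None:
            return round(ts - ctrl_at, 1)
    return None  # no control-token event, or audio never resumed


# Timestamps copied from the session timeline above.
timeline = [
    (32.7, "user"),
    (33.7, "agent_ctrl"),
    (45.1, "user"),
    (47.3, "agent_audio"),
]
gap = silence_after_ctrl(timeline)
print(gap)
```

With these timestamps the gap comes out at 13.6 seconds, which matches the roughly 13 seconds of silence noted in the timeline.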

Control Character Details

The control characters observed follow the pattern <ctrl##> where ## is a number. Examples seen:

<ctrl46> (most common)
Multiple consecutive occurrences: <ctrl46><ctrl46>
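
A minimal sketch of how an application might spot these tokens in transcript text. The regular expression is inferred from the examples in this report, not from any official specification, and the function names are hypothetical.

```python
import re
from typing import List

# Matches tokens shaped like <ctrl46>. Pattern inferred from the observed output.
CTRL_TOKEN = re.compile(r"<ctrl(\d+)>")


def find_ctrl_tokens(transcript: str) -> List[int]:
    """Return the numeric IDs of every <ctrl##> token found in the text."""
    return [int(num) for num in CTRL_TOKEN.findall(transcript)]


def is_only_ctrl_tokens(transcript: str) -> bool:
    """True when the text has at least one token and nothing else but whitespace."""
    has_token = CTRL_TOKEN.search(transcript) is not None
    remainder = CTRL_TOKEN.sub("", transcript).strip()
    return has_token and remainder == ""


print(find_ctrl_tokens("<ctrl46><ctrl46>"))
print(is_only_ctrl_tokens("<ctrl46><ctrl46>"))
```

Checking for a transcript that consists only of such tokens separates the failure described here from a turn where a token appears alongside normal text.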

Questions:

What do these control characters represent internally?
Why are they leaking into the transcript output instead of being processed?
Why does their presence correlate with audio generation failure?

Reproduction

Trigger Conditions (Observed)

This issue appears to occur:

After processing user speech that requires a substantive response
More frequently when tool/function calls are involved (but not exclusively)
Inconsistently - the same input may work sometimes and fail other times

Steps to Reproduce

1. Set up a Gemini Live API session with gemini-2.5-flash-native-audio-preview-12-2025
2. Configure for AUDIO-only response modality
3. Engage in multi-turn conversation
4. At some point (unpredictable), the model will output <ctrl##> instead of audio
5. Observe silence in the audio stream
6. Check transcript to see control characters

Note: Due to the inconsistent nature of the bug, reproduction may require multiple attempts.
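
To give a rough sense of how many attempts "multiple" might mean, here is a small sketch that estimates how many model turns are needed to see at least one failure with a given confidence. It assumes each turn fails independently with the same probability, which may not hold in practice. The failure rate used in the example is purely illustrative; the actual rate has not been measured in this report.

```python
import math


def turns_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - failure_rate)**n >= confidence."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


# Illustrative assumption only: 2% of turns hit the bug.
print(turns_needed(0.02))
```

Under that assumed 2% rate, about 149 turns would be needed for a 95% chance of observing the bug at least once.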

Impact

User Experience

Users hear nothing for 10-15+ seconds
Users assume the system is broken or didn’t hear them
Users repeatedly ask “Can you hear me?”
Sessions are abandoned due to perceived failure

Business Impact

Voice agents appear unreliable in production
Customer frustration and support burden
Cannot deploy native audio models with confidence

Workaround Attempts

No reliable workaround has been found
The issue occurs at the model level before any application-layer processing

Related GitHub Issues

This appears related to other reported audio generation issues with native audio models:

google-gemini/live-api-web-console#117 - Audio cutoff mid-speech
googleapis/python-genai#1725 - Audio generation inconsistency (Closed - Not Planned)
google-gemini/cookbook#977 - LiveAPI stops talking in a second
googleapis/js-genai#707 - Responses cut off with turnComplete

The control character output may be a specific manifestation of the broader audio generation failure pattern described in these issues.

Technical Analysis

Hypothesis

Based on the evidence, it appears that:

The model’s internal audio generation pipeline sometimes fails
When audio generation fails, the model outputs control characters as a fallback or error state
These control characters leak into the transcript stream
The audio stream remains empty/silent
Eventually, the model’s internal state recovers and normal operation resumes

Why This Differs from Audio Cutoff

This is distinct from the “audio cutoff mid-sentence” issue:

Audio cutoff: Audio starts playing, then stops early
This issue: No audio is ever generated - complete silence from the start

Requests

1. Priority Escalation

This issue is currently tracked as P2 (Moderately-important). Given the production impact on voice agent deployments, we respectfully request escalation to P1.

Justification:

Affects production systems with real users
No workaround available
Causes significant user experience degradation
Undermines confidence in native audio models

2. Timeline and Acknowledgment

We request:

Confirmation that this issue is being tracked
An estimated timeline for a fix or patch
Any interim workarounds while a fix is developed

3. Technical Clarification

We would appreciate understanding:

What the <ctrl##> characters represent
Why audio generation fails when these characters appear
Whether there’s a way to detect and handle this state
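
As a starting point for discussion, this is a sketch of what application-side detection could look like. It is not a fix for the model behavior and has not been validated as a workaround. It assumes the application receives transcript text and audio chunks as separate events and can play a locally stored filler prompt. The class, its methods, and the 3-second threshold are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

CTRL_TOKEN = re.compile(r"<ctrl\d+>")  # pattern inferred from this report


@dataclass
class TurnMonitor:
    """Tracks one model turn and flags when a filler prompt may be warranted."""
    max_silence_s: float = 3.0  # hypothetical threshold
    user_done_at: Optional[float] = None
    audio_bytes: int = 0
    saw_ctrl_token: bool = False

    def on_user_turn_end(self, now: float) -> None:
        # Reset per-turn state when the user stops speaking.
        self.user_done_at = now
        self.audio_bytes = 0
        self.saw_ctrl_token = False

    def on_transcript(self, text: str) -> None:
        if CTRL_TOKEN.search(text):
            self.saw_ctrl_token = True

    def on_audio(self, chunk: bytes) -> None:
        self.audio_bytes += len(chunk)

    def should_play_filler(self, now: float) -> bool:
        # Never interrupt once real audio has arrived for this turn.
        if self.user_done_at is None or self.audio_bytes > 0:
            return False
        waited = now - self.user_done_at
        return self.saw_ctrl_token or waited >= self.max_silence_s


monitor = TurnMonitor()
monitor.on_user_turn_end(now=33.0)
monitor.on_transcript("<ctrl46><ctrl46>")
print(monitor.should_play_filler(now=33.7))
```

The monitor reacts either to a control token in the transcript or to a plain timeout, because the token may not always be present when audio fails. Whether locally played filler audio is acceptable is a product decision outside the scope of this report.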

Additional Information

Session Configuration


    # Import assumed from the google-genai Python SDK named in the Environment table
    from google.genai import types as genai_types

    # Model configuration used
    model = "gemini-2.5-flash-native-audio-preview-12-2025"

    # Response modality
    response_modalities = [genai_types.Modality.AUDIO]

    # VAD configuration
    automatic_activity_detection = genai_types.AutomaticActivityDetection(
        start_of_speech_sensitivity=genai_types.StartSensitivity.START_SENSITIVITY_HIGH,
        end_of_speech_sensitivity=genai_types.EndSensitivity.END_SENSITIVITY_LOW,
        prefix_padding_ms=20,
        silence_duration_ms=300,
    )

We Can Provide

If helpful for debugging:

Full anonymized session logs
Audio recordings showing the silence
Transcript files with control characters
Additional test sessions

Summary

The gemini-2.5-flash-native-audio-preview-12-2025 model occasionally outputs <ctrl##> control characters instead of generating audio, causing complete silence for 10-15+ seconds. This is a critical issue for production voice agents with no available workaround.

We request P1 prioritization, acknowledgment of the issue, and an estimated fix timeline.

Report prepared: January 12, 2026

Observed in production: January 9, 2026

Srikanta_K_N · January 13, 2026, 9:30am

Hi @bunny1, welcome to the community!

I tried multiple times to reproduce the issue, but each time the transcript and audio generation were fine. That is expected, since you mentioned that the issue only happens occasionally.
Could you please send additional logs and any relevant configuration details that we can use to try to reproduce this issue with the model?

Thanks!

Jon_Baek_Bomme · January 14, 2026, 9:50am

Occasionally, I experience the same issue with the Danish language.

Arun_A_S · February 25, 2026, 7:31am

I am also experiencing this issue with the gemini-2.5-flash-native-audio-preview-12-2025 model. I don’t have screenshots to share right now, but customers using our application have reported it twice this week already and a few times earlier this month. <ctrl46><ctrl46><ctrl46><ctrl46> is what we usually receive in the output audio transcription. It doesn’t always happen, so it is hard to reproduce.

My config is

languages - en-US or ar-EG

Input modality - AUDIO

Output Modality - AUDIO

Voice - Leda or Fenrir

I also have multiple tools configured

Using SlidingWindow Context Window Compression

"automatic_activity_detection": {
    "disabled": false,
    "start_of_speech_sensitivity": START_SENSITIVITY_HIGH,
    "end_of_speech_sensitivity": END_SENSITIVITY_HIGH,
    "prefix_padding_ms": 5,
    "silence_duration_ms": 300
}

I would appreciate this being looked into, since the same issue is being reported elsewhere, with other models and tools as well (google-gemini/gemini-cli issue #5957, "label into the response ( &<ctrl46>)", and google-gemini/gemini-cli issue #4486, "Model pushing special tokens to Gemini CLI").

Let me know if any further info is needed
