Summary
When using the `gemini-2.5-flash-native-audio-preview-12-2025` model via the Live API, the model occasionally outputs control characters (e.g., `<ctrl46><ctrl46>`) to the transcript stream instead of generating audio. During these episodes, no audio is produced, leaving users in complete silence for 10-15+ seconds.
This is a critical production issue affecting voice agent deployments where users have no indication that the system is working, leading to poor user experience and abandoned sessions.
Affected Model(s)
- `gemini-2.5-flash-native-audio-preview-12-2025` (confirmed)
- Potentially other native audio preview models in the same family
Environment
| Component | Value |
|-----------|-------|
| API | Gemini Live API via WebSocket |
| Model | gemini-2.5-flash-native-audio-preview-12-2025 |
| Response Modality | AUDIO only |
| Language | Romanian (but likely affects all languages) |
| Use Case | Production voice agent for appointment confirmations |
| SDK | google-genai Python SDK |
Expected Behavior
- User speaks to the voice agent
- Model processes the input
- Model generates an audio response
- User hears the response immediately
Actual Behavior
- User speaks to the voice agent
- Model processes the input
- Model outputs control characters (`<ctrl46><ctrl46>`) to the transcript stream
- No audio is generated; the audio stream contains silence
- User hears nothing for 10-15+ seconds
- User asks “Can you hear me?” multiple times (confirming they heard nothing)
- Eventually, the model may recover and produce normal audio
Evidence
Session Timeline (Anonymized)
```
Production Session - January 9, 2026
Model: gemini-2.5-flash-native-audio-preview-12-2025
32.7s User: "Nu, nu pot veni. As vrea sa reprogramez."
(Translation: "No, I can't come. I'd like to reschedule.")
33.7s Agent transcript output: "<ctrl46><ctrl46>"
Audio output: NONE (verified via audio recording)
[Internal tool call triggered - searching for appointment slots]
[~13 SECONDS OF COMPLETE SILENCE - NO AUDIO GENERATED]
45.1s User: "M-ati auzit?"
(Translation: "Can you hear me?" - confirms no audio was heard)
47.3s Agent: "Va rog sa asteptati putin..."
(Translation: "Please wait a moment..." - audio resumes normally)
```
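To quantify the silence from a session log rather than by ear, one option is to measure the time from each user utterance to the next agent audio event. The sketch below is hypothetical: the `Event` record, its `source`/`kind` labels, and `response_latencies` are our own illustration, not a format produced by the Live API or the SDK. The timestamps are those of the anonymized timeline above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    t: float     # seconds since session start, as logged
    source: str  # "user" or "agent" (hypothetical labels)
    kind: str    # "speech", "transcript" or "audio" (hypothetical labels)


def response_latencies(events: List[Event]) -> List[Tuple[float, float]]:
    """For each user speech event, return (t_user, seconds until the next
    agent audio event). User turns never answered with audio are skipped."""
    ordered = sorted(events, key=lambda e: e.t)
    result = []
    for i, ev in enumerate(ordered):
        if ev.source != "user" or ev.kind != "speech":
            continue
        for later in ordered[i + 1:]:
            if later.source == "agent" and later.kind == "audio":
                result.append((ev.t, later.t - ev.t))
                break
    return result


# The anonymized timeline above, encoded as events.
timeline = [
    Event(32.7, "user", "speech"),
    Event(33.7, "agent", "transcript"),  # "<ctrl46><ctrl46>", no audio
    Event(45.1, "user", "speech"),       # "M-ati auzit?"
    Event(47.3, "agent", "audio"),       # "Va rog sa asteptati putin..."
]

for t_user, gap in response_latencies(timeline):
    print(f"user turn at {t_user:.1f}s -> first agent audio after {gap:.1f}s")
```

Measured this way from the user's turn at 32.7s, the gap is about 14.6s; counting instead from the `<ctrl46><ctrl46>` output at 33.7s gives about 13.6s.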
Key Observations
- Transcript explicitly contains `<ctrl46><ctrl46>`: these characters appear in the `output_transcription` stream where normal text should be
- Audio recording confirms complete silence: the OGG recording of the session contains zero audio during this 13-second period
- User confirmation of silence: the user’s “Can you hear me?” at 45.1s shows they received no audio
- Model eventually recovers: after the silence, normal audio generation resumes
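The silence observation can also be checked programmatically on the agent's output audio. This is a minimal sketch, assuming the audio has already been decoded (for example, from the OGG recording) to 24 kHz, 16-bit, little-endian mono PCM, which we understand to be the Live API's native output format. The function name, window size, RMS threshold, and minimum span length are hypothetical choices to tune.

```python
import array
import math
import sys
from typing import List, Tuple

SAMPLE_RATE = 24_000  # assumption: 24 kHz, 16-bit, little-endian mono PCM


def silent_spans(pcm: bytes, window_s: float = 0.1, rms_threshold: float = 100.0,
                 min_span_s: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans of at least min_span_s seconds in which
    every window's RMS amplitude is below rms_threshold."""
    samples = array.array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a dangling odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # array uses native order; the PCM is little-endian
    win = max(1, round(SAMPLE_RATE * window_s))
    spans: List[Tuple[float, float]] = []
    start = None
    for w in range(math.ceil(len(samples) / win)):
        chunk = samples[w * win:(w + 1) * win]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        t = w * win / SAMPLE_RATE  # window start time in seconds
        if rms < rms_threshold:
            if start is None:
                start = t
        elif start is not None:
            if t - start >= min_span_s:
                spans.append((start, t))
            start = None
    end = len(samples) / SAMPLE_RATE
    if start is not None and end - start >= min_span_s:
        spans.append((start, end))
    return spans


# Synthetic check: 1 s of non-zero samples, 2 s of zeros, 1 s of non-zero samples.
loud = array.array("h", [1000] * SAMPLE_RATE)
quiet = array.array("h", [0] * (2 * SAMPLE_RATE))
print(silent_spans((loud + quiet + loud).tobytes()))  # [(1.0, 3.0)]
```

If the recording mixes both parties into one track, user speech inside the gap (such as "M-ati auzit?") would split the span, so the agent's output should be analyzed on its own where possible.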
Control Character Details
The control characters observed follow the pattern `<ctrl##>`, where `##` is a number. Examples seen:
- `<ctrl46>` (most common)
- Multiple consecutive occurrences: `<ctrl46><ctrl46>`
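Because the tokens follow a fixed textual pattern, they can be detected (and, if desired, stripped) in the output transcription with a regular expression. This is a minimal sketch, assuming the tokens always appear literally as `<ctrl`, digits, `>`, as in the examples above; `find_ctrl_tokens` and `strip_ctrl_tokens` are hypothetical helper names.

```python
import re
from typing import List

# Matches the <ctrl##> pattern described above, capturing the number.
CTRL_TOKEN = re.compile(r"<ctrl(\d+)>")


def find_ctrl_tokens(text: str) -> List[int]:
    """Return the numeric IDs of all <ctrl##> tokens in a transcript chunk."""
    return [int(n) for n in CTRL_TOKEN.findall(text)]


def strip_ctrl_tokens(text: str) -> str:
    """Remove <ctrl##> tokens, e.g. before displaying or storing a transcript."""
    return CTRL_TOKEN.sub("", text)


print(find_ctrl_tokens("<ctrl46><ctrl46>"))         # [46, 46]
print(repr(strip_ctrl_tokens("<ctrl46><ctrl46>")))  # ''
```

Stripping only hides the symptom in stored transcripts; detection is the more useful half, since it marks the moment audio is likely to be missing.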
Questions:
- What do these control characters represent internally?
- Why are they leaking into the transcript output instead of being processed?
- Why does their presence correlate with audio generation failure?
Reproduction
Trigger Conditions (Observed)
This issue appears to occur:
- After processing user speech that requires a substantive response
- More frequently when tool/function calls are involved (but not exclusively)
- Inconsistently: the same input may succeed at times and fail at others
Steps to Reproduce
- Set up a Gemini Live API session with `gemini-2.5-flash-native-audio-preview-12-2025`
- Configure the AUDIO-only response modality, with output audio transcription enabled so the transcript stream is available
- Engage in a multi-turn conversation
- At some unpredictable point, the model outputs `<ctrl##>` instead of audio
- Observe silence in the audio stream
- Check the transcript for control characters
Note: Due to the inconsistent nature of the bug, reproduction may require multiple attempts.
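When reproducing, it helps to log per turn whether a `<ctrl##>` token appeared, how much audio arrived, and whether a tool call was involved, so incidents can be correlated with the trigger conditions above. This sketch assumes the google-genai `LiveServerMessage` shape as we understand it (`tool_call`, `data`, `server_content.output_transcription.text`, `server_content.turn_complete`); verify these names against your SDK version. `new_turn_state` and `inspect_message` are hypothetical helpers, and the demo uses `SimpleNamespace` stand-ins so it runs without a live session.

```python
import re
from types import SimpleNamespace
from typing import Optional

CTRL_TOKEN = re.compile(r"<ctrl\d+>")


def new_turn_state() -> dict:
    """Hypothetical per-turn counters."""
    return {"audio_bytes": 0, "ctrl_seen": False, "tool_call": False}


def inspect_message(msg, state: dict) -> Optional[dict]:
    """Fold one server message into `state`; return a summary when the turn
    completes. Attribute names are assumed from google-genai's
    LiveServerMessage; getattr() defaults tolerate absent or None fields."""
    if getattr(msg, "tool_call", None) is not None:
        state["tool_call"] = True
    data = getattr(msg, "data", None)  # inline audio bytes, if any
    if data:
        state["audio_bytes"] += len(data)
    content = getattr(msg, "server_content", None)
    if content is None:
        return None
    transcription = getattr(content, "output_transcription", None)
    text = getattr(transcription, "text", None) or ""
    if CTRL_TOKEN.search(text):
        state["ctrl_seen"] = True
    if getattr(content, "turn_complete", False):
        summary = dict(state)
        state.update(new_turn_state())  # reset for the next turn
        return summary
    return None


# Stand-in messages mimicking the timeline: tool call, ctrl tokens, late audio.
msgs = [
    SimpleNamespace(tool_call=SimpleNamespace(), data=None, server_content=None),
    SimpleNamespace(tool_call=None, data=None, server_content=SimpleNamespace(
        output_transcription=SimpleNamespace(text="<ctrl46><ctrl46>"),
        turn_complete=False)),
    SimpleNamespace(tool_call=None, data=b"\x00" * 3200, server_content=SimpleNamespace(
        output_transcription=SimpleNamespace(text="Va rog sa asteptati putin..."),
        turn_complete=True)),
]

state = new_turn_state()
for m in msgs:
    summary = inspect_message(m, state)
    if summary:
        print(summary)  # {'audio_bytes': 3200, 'ctrl_seen': True, 'tool_call': True}
```

In a real session, the same function would be called for each message yielded by `session.receive()`, with the summaries written to the session log.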
Impact
User Experience
- Users hear nothing for 10-15+ seconds
- Users assume the system is broken or did not hear them
- Users repeatedly ask “Can you hear me?”
- Sessions are abandoned due to perceived failure
Business Impact
- Voice agents appear unreliable in production
- Customer frustration and increased support burden
- Native audio models cannot be deployed with confidence
Workaround Attempts
- No reliable workaround has been found
- The issue occurs at the model level, before any application-layer processing
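Even without a fix, an application can detect the state and avoid leaving the caller in silence. This is a mitigation sketch, not a fix, assuming the application forwards user turn ends, transcript chunks, and audio chunks to it and polls `should_fallback()` from its event loop. `SilenceWatchdog`, its method names, and the 3-second default are hypothetical; what to do on fallback (for example, playing a pre-recorded "one moment please" clip) is left to the application.

```python
import re
import time
from typing import Callable, Optional

CTRL_TOKEN = re.compile(r"<ctrl\d+>")


class SilenceWatchdog:
    """Hypothetical helper: flags a user turn that is left without agent audio."""

    def __init__(self, timeout_s: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s  # assumed threshold; tune per use case
        self.clock = clock          # injectable for testing
        self.waiting_since: Optional[float] = None
        self.ctrl_seen = False
        self.fired = False

    def on_user_turn_end(self) -> None:
        """Start waiting for agent audio."""
        self.waiting_since = self.clock()
        self.ctrl_seen = False
        self.fired = False

    def on_transcript(self, text: str) -> None:
        if CTRL_TOKEN.search(text):
            self.ctrl_seen = True

    def on_audio(self, chunk: bytes) -> None:
        if chunk:
            self.waiting_since = None  # audio arrived; nothing to mitigate

    def should_fallback(self) -> bool:
        """True at most once per user turn, if still waiting and either a
        <ctrl##> token was seen or the timeout has elapsed."""
        if self.fired or self.waiting_since is None:
            return False
        overdue = self.clock() - self.waiting_since >= self.timeout_s
        if self.ctrl_seen or overdue:
            self.fired = True
            return True
        return False


now = [0.0]  # fake clock for the demo
wd = SilenceWatchdog(timeout_s=3.0, clock=lambda: now[0])
wd.on_user_turn_end()
now[0] = 1.0
wd.on_transcript("<ctrl46><ctrl46>")
print(wd.should_fallback())  # True
print(wd.should_fallback())  # False: already fired for this turn
```

Reacting to the `<ctrl##>` token, rather than waiting for the timeout alone, would in the session above have triggered about 1 s after the user's turn instead of leaving roughly 13 s of silence. The timeout path may also fire during legitimately slow tool calls, so the threshold needs tuning.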
Related GitHub Issues
This appears related to other reported audio generation issues with native audio models:
- google-gemini/live-api-web-console#117: Audio cutoff mid-speech
- googleapis/python-genai#1725: Audio generation inconsistency (Closed - Not Planned)
- google-gemini/cookbook#977: LiveAPI stops talking in a second
- googleapis/js-genai#707: Responses cut off with turnComplete
The control character output may be a specific manifestation of the broader audio generation failure pattern described in these issues.
Technical Analysis
Hypothesis
Based on the evidence, it appears that:
- The model’s internal audio generation pipeline sometimes fails
- When audio generation fails, the model outputs control characters as a fallback or error state
- These control characters leak into the transcript stream
- The audio stream remains empty/silent
- Eventually, the model’s internal state recovers and normal operation resumes
Why This Differs from Audio Cutoff
This is distinct from the “audio cutoff mid-sentence” issue:
- Audio cutoff: audio starts playing, then stops early
- This issue: no audio is ever generated; there is complete silence from the start
Requests
1. Priority Escalation
This issue is currently tracked as P2 (Moderately-important). Given the production impact on voice agent deployments, we respectfully request escalation to P1.
Justification:
- Affects production systems with real users
- No workaround available
- Causes significant user experience degradation
- Undermines confidence in native audio models
2. Timeline and Acknowledgment
We request:
- Confirmation that this issue is being tracked
- An estimated timeline for a fix or patch
- Any interim workarounds while a fix is developed
3. Technical Clarification
We would appreciate understanding:
- What the `<ctrl##>` characters represent
- Why audio generation fails when these characters appear
- Whether there is a way to detect and handle this state
Additional Information
Session Configuration
```python
from google.genai import types as genai_types

# Model configuration used
model = "gemini-2.5-flash-native-audio-preview-12-2025"

# Response modality: audio only
response_modalities = [genai_types.Modality.AUDIO]

# Automatic voice activity detection (VAD) configuration
automatic_activity_detection = genai_types.AutomaticActivityDetection(
    start_of_speech_sensitivity=genai_types.StartSensitivity.START_SENSITIVITY_HIGH,
    end_of_speech_sensitivity=genai_types.EndSensitivity.END_SENSITIVITY_LOW,
    prefix_padding_ms=20,
    silence_duration_ms=300,
)
```
We Can Provide
If helpful for debugging:
- Full anonymized session logs
- Audio recordings showing the silence
- Transcript files with control characters
- Additional test sessions
Summary
The `gemini-2.5-flash-native-audio-preview-12-2025` model occasionally outputs `<ctrl##>` control characters instead of generating audio, causing complete silence for 10-15+ seconds. This is a critical issue for production voice agents with no available workaround.
We request P1 prioritization, acknowledgment of the issue, and an estimated fix timeline.
Report prepared: January 12, 2026
Observed in production: January 9, 2026