Live API - PTT with external STT & Interruptions

Peter_Kruck · July 18, 2025, 6:27pm

The Main Issue

I need to implement push-to-talk (PTT) with interruption support using:

External ASR (separate Speech-to-Text model) for audio processing
Live API for text-based conversation with audio responses

The core problem is handling interruptions properly. Users need to be able to interrupt mid-response and immediately continue the conversation, but I’m facing WebSocket protocol violations.

Why External ASR / STT Instead of Full Live API with VAD?

I specifically want PTT with interruption over full Live API with VAD because of various reasons.

Proposed Architecture

User Audio → External ASR → Live API Text Request → Audio Response Stream
                                     ↑
                              INTERRUPTION PROBLEM

The Interruption Problem

The Live API documentation shows interruption patterns using manual activity detection with raw audio:

config = {
    "response_modalities": ["AUDIO"],
    "realtime_input_config": {"automatic_activity_detection": {"disabled": True}},
}

# Send raw audio with activity signals for interruption
await session.send_realtime_input(activity_start=types.ActivityStart())
await session.send_realtime_input(audio=audio_blob)
await session.send_realtime_input(activity_end=types.ActivityEnd())  # For interruption

But this requires sending raw audio to Live API, not text from external ASR.

My Failed Attempt

I tried implementing interruption support by sending explicit signals to the Live API session.

Normal conversation works fine: I send transcribed text from external ASR to Live API, it responds with streaming audio.

# This works perfectly
await session.send_client_content(
    turns=types.Content(role="user", parts=[types.Part(text=transcribed_text)]),
    turn_complete=True
)

For interruptions: When users interrupt mid-response, I attempted to signal this by sending just turn_complete=True, thinking it would tell the API to stop generating.

# For interruption, I tried this
await session.send_client_content(turn_complete=True)  # ❌ Causes WebSocket error 1007

Result: This immediately corrupts the session with 1007 (invalid frame payload data) error. The entire session becomes unusable and must be recreated.

The issue: Sending turn_complete=True without proper content violates the Live API’s message format, but I can’t find documentation on the correct interruption approach for text-based conversations.

Core Questions About Interruption

Can I implement proper PTT interruption with external ASR + Live API text requests?
How should interruptions be handled in text-based Live API conversations?
- Can I send activityEnd signals even when using text requests?
- Is there a proper way to signal turn completion for interruption?
- Should I rely on natural interruption when sending new requests?
Am I forced to use Live API’s built-in ASR for proper interruption support?

Environment

SDK: google-genai Python SDK v1.24.0
Model: gemini-2.5-flash-preview-native-audio-dialog
Use Case: Real-time conversational AI with interruption

Request for Guidance

The documentation seems to assume raw audio input for interruption handling. Is it possible to implement proper PTT interruption with external ASR + Live API text requests? If so, what’s the correct approach?

I specifically want PTT with interruption over full Live API with VAD due to my architecture requirements. Any insights would be greatly appreciated!

References:

Topic		Replies	Views
Handling user interruptions with gemini-live-2.5-flash vertex ai model Gemini API models , audio	3	42	July 25, 2025
Disable interruptions for audio streaming for multimodal live api Gemini API api	5	409	June 24, 2025
How do I prevent the Live API from discarding audio when it's given audio while it speaks? Gemini API api , gemini-api	10	236	June 24, 2025
Received 1007 invalid payload using Gemini Live API Gemini API api , text	6	532	June 19, 2025
Will it be possible to receive text and audio data in the multimodal API? Gemini API models , gemini-api	13	785	July 22, 2025