Live API - PTT with external STT & Interruptions

The Main Issue

I need to implement push-to-talk (PTT) with interruption support using:

  • External ASR (separate Speech-to-Text model) for audio processing
  • Live API for text-based conversation with audio responses

The core problem is handling interruptions properly. Users need to be able to interrupt mid-response and immediately continue the conversation, but I’m facing WebSocket protocol violations.

Why External ASR / STT Instead of Full Live API with VAD?

I specifically want PTT with interruption over full Live API with VAD because of various reasons.

Proposed Architecture

User Audio → External ASR → Live API Text Request → Audio Response Stream
                                     ↑
                              INTERRUPTION PROBLEM

The Interruption Problem

The Live API documentation shows interruption patterns using manual activity detection with raw audio:

config = {
    "response_modalities": ["AUDIO"],
    "realtime_input_config": {"automatic_activity_detection": {"disabled": True}},
}

# Send raw audio with activity signals for interruption
await session.send_realtime_input(activity_start=types.ActivityStart())
await session.send_realtime_input(audio=audio_blob)
await session.send_realtime_input(activity_end=types.ActivityEnd())  # For interruption

But this requires sending raw audio to Live API, not text from external ASR.

My Failed Attempt

I tried implementing interruption support by sending explicit signals to the Live API session.

Normal conversation works fine: I send transcribed text from external ASR to Live API, it responds with streaming audio.

# This works perfectly
await session.send_client_content(
    turns=types.Content(role="user", parts=[types.Part(text=transcribed_text)]),
    turn_complete=True
)

For interruptions: When users interrupt mid-response, I attempted to signal this by sending just turn_complete=True, thinking it would tell the API to stop generating.

# For interruption, I tried this
await session.send_client_content(turn_complete=True)  # ❌ Causes WebSocket error 1007

Result: This immediately corrupts the session with 1007 (invalid frame payload data) error. The entire session becomes unusable and must be recreated.

The issue: Sending turn_complete=True without proper content violates the Live API’s message format, but I can’t find documentation on the correct interruption approach for text-based conversations.

Core Questions About Interruption

  1. Can I implement proper PTT interruption with external ASR + Live API text requests?

  2. How should interruptions be handled in text-based Live API conversations?

    • Can I send activityEnd signals even when using text requests?
    • Is there a proper way to signal turn completion for interruption?
    • Should I rely on natural interruption when sending new requests?
  3. Am I forced to use Live API’s built-in ASR for proper interruption support?

Environment

  • SDK: google-genai Python SDK v1.24.0
  • Model: gemini-2.5-flash-preview-native-audio-dialog
  • Use Case: Real-time conversational AI with interruption

Request for Guidance

The documentation seems to assume raw audio input for interruption handling. Is it possible to implement proper PTT interruption with external ASR + Live API text requests? If so, what’s the correct approach?

I specifically want PTT with interruption over full Live API with VAD due to my architecture requirements. Any insights would be greatly appreciated!


References:

I am also facing the interruption issue. I am using gemini model VAD which is default feature.
i get a interruption event but ideally model should stop generating content further which is not happening . how can i stop that

Hello,

There is no official support for external ASR with Gemini live API. But you are always welcome to experiment.

For posterity, I do this with text:

Disable VAD and set activityHandling = START_OF_ACTIVITY_INTERRUPTS
Send whatever to get the “conversation” started and start receiving
Send activityStart to interrupt, stop playing and clear local buffer
Send text
Send activityEnd to get the server to continue
Start receiving and playing again