I am currently working with the https://github.com/google-gemini/multimodal-live-api-web-console repository.
The ClientContentMessage type allows sending text data to the API. Setting its turnComplete flag to false signals to the API that the client's turn is not finished, so the data can be sent without the model generating a response to that message (useful for sending contextual data).
After sending this ClientContentMessage, I would like to continue the voice conversation using RealtimeInputMessage messages for the audio data as usual.
I just want to have the voice conversation and pipe some contextual data in between, without the model explicitly responding to the contextual data.
Expected behaviour: After a ClientContentMessage with turnComplete set to false, the model does NOT respond, but I can send follow-up RealtimeInputMessages to which the model will respond.
Actual behaviour: After a ClientContentMessage with turnComplete set to false, the model does NOT respond, and it also does NOT respond to follow-up RealtimeInputMessages carrying subsequent voice data.
I suspect the server needs a signal that the turn is complete after it was set to false, but RealtimeInputMessage currently does not provide such a field, so the only way to end the turn is another ClientContentMessage with turnComplete: true. That is not compatible with a subsequent voice conversation, because the voice data is sent as a stream over the websocket and the server itself needs to determine the end of speech.
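For reference, here is a minimal sketch of the two raw wire messages involved, based on the message shapes in the repo's type definitions (the base64 audio payload is a placeholder):

```typescript
// Sketch of the two raw JSON messages sent over the websocket, based on the
// message shapes in the repo's type definitions (ClientContentMessage and
// RealtimeInputMessage). The base64 audio payload is a placeholder.

// 1) Contextual text with the turn left open, so the model should not respond yet.
const contextMessage = {
  clientContent: {
    turns: [{ role: "user", parts: [{ text: "some contextual data" }] }],
    turnComplete: false, // signal: the client is not finished
  },
};

// 2) Follow-up voice data. Note that RealtimeInputMessage carries only media
// chunks; there is no field here that could close the still-open turn.
const audioMessage = {
  realtimeInput: {
    mediaChunks: [{ mimeType: "audio/pcm;rate=16000", data: "<base64 chunk>" }],
  },
};

console.log(JSON.stringify(contextMessage));
console.log(JSON.stringify(audioMessage));
```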
Here’s one possible approach for handling mixed text and voice input:
// First send context without completing the turn
client.send([{ text: contextualData }], false);

// Then send a special marker message, still without completing the turn
client.send([{ text: "[CONTEXT_COMPLETE]" }], false);

// Now stream voice data as usual; sendRealtimeInput wraps the chunks
// in a RealtimeInputMessage
client.sendRealtimeInput([
  { mimeType: "audio/pcm;rate=16000", data: audioData },
]);
Consider submitting a feature request to add a turnComplete flag to RealtimeInputMessage. This would enable seamless transitions between text context and voice streaming.
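To make the feature request concrete, a hypothetical shape for the extended message could look like this (the turnComplete field does NOT exist in the current RealtimeInputMessage; everything below is illustrative only):

```typescript
// Hypothetical only: what RealtimeInputMessage could look like if the feature
// request were accepted. The optional turnComplete field does NOT exist in the
// current API; the type below is purely an illustration for the request.
type GenerativeContentBlob = { mimeType: string; data: string };

type RealtimeInputMessageProposed = {
  realtimeInput: {
    mediaChunks: GenerativeContentBlob[];
    turnComplete?: boolean; // proposed addition
  };
};

// With such a flag, the client could close the dangling text turn together
// with a chunk of audio (though deciding *which* chunk ends the turn would
// require client-side end-of-speech detection, as noted below):
const chunk: RealtimeInputMessageProposed = {
  realtimeInput: {
    mediaChunks: [{ mimeType: "audio/pcm;rate=16000", data: "<base64>" }],
    turnComplete: true,
  },
};

console.log(JSON.stringify(chunk));
```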
I fear a problem with adding a turnComplete flag to RealtimeInputMessage is that it would require VAD to be handled on the client side instead of the server side, in order to pinpoint the last RealtimeInputMessage on which to set the flag.
/**
 * send normal content parts such as { text }
 */
send(parts: Part | Part[], turnComplete: boolean = true) {
  parts = Array.isArray(parts) ? parts : [parts];
  const content: Content = {
    role: "user",
    parts,
  };

  const clientContentRequest: ClientContentMessage = {
    clientContent: {
      turns: [content],
      turnComplete,
    },
  };

  this._sendDirect(clientContentRequest);
  this.log(`client.send`, clientContentRequest);
}
/**
 * Sends a context completion marker message
 */
sendContextComplete() {
  const contextMarker: Part = {
    text: "[CONTEXT_COMPLETE]",
  };
  const content: Content = {
    role: "user",
    parts: [contextMarker],
  };

  const clientContentRequest: ClientContentMessage = {
    clientContent: {
      turns: [content],
      turnComplete: false,
    },
  };

  this._sendDirect(clientContentRequest);
  this.log(`client.contextComplete`, clientContentRequest);
}
// Send the text input to Gemini with full context
client.send([{
  text: `context....`
}], false);

client.sendContextComplete();
I tried implementing it like this, matching the format of the existing ClientContentMessage send() function, but it did not make subsequent RealtimeInputMessages trigger a response for me.
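As an untested stopgap idea of my own (not something from the repo): instead of leaving the turn open, complete the context turn normally with turnComplete: true and rely on a system instruction to keep the model from answering context messages out loud. How reliably the model obeys such an instruction is not guaranteed:

```typescript
// Untested workaround sketch: send context as a normal, completed turn, but
// instruct the model up front to ignore bracketed context messages. Whether
// the model actually stays silent depends on instruction following, not on
// the protocol, so this is only a fallback.

// Somewhere in the session setup (the exact config shape depends on the
// repo's setup message):
const systemInstruction = {
  parts: [
    {
      text:
        "Messages wrapped in [CONTEXT]...[/CONTEXT] are background " +
        "information only. Never respond to them; wait for the user's voice.",
    },
  ],
};

// Context would then be sent as a completed turn, e.g.:
//   client.send([{ text: "[CONTEXT] ...contextual data... [/CONTEXT]" }], true);

console.log(systemInstruction.parts[0].text);
```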