Gemini Flash Live API: How to ensure the model always uses the latest user-provided context after a sequence of context + audio turns?

I’m building an application on the Google Gemini Flash Live API (genai). The user can send updated context (for example, a new code snippet or document text) and then ask a question about it over live audio.

My goal is for Gemini to always base its response on the most recent context. For example, if the user asks “Can you see my latest version?” right after sending an update, the model’s answer should accurately reflect the latest content.

Problem:
I send the updated context with sendClientContent (as a user turn). When the user then speaks (audio streamed live to Gemini), the model sometimes replies with hallucinated, stale, or unrelated content, as if it never received the latest context.

What I’m doing:

  • On context update (e.g., new code or document text):

    await sessionInstance.sendClientContent({
        turns: [{
            role: 'user',
            parts: [{ text: `Here is my latest update:\n\`\`\`\n${latestContent}\n\`\`\`\n` }]
        }],
        turnComplete: false
    });
    
  • On audio (raw audio buffer from the client):

    await sessionInstance.sendRealtimeInput({
        media: {
            data: audioData.toString('base64'),
            mimeType: 'audio/pcm;rate=16000'
        }
    });
    
  • (I’ve also tried sending the context again as a sendClientContent turn immediately before each audio input, but that doesn’t seem to work reliably.)

Relevant Backend Handler:

ws.on('message', async (data: WebSocket.RawData) => {
    try {
        const messageStr = data.toString();

        if (messageStr.trim().startsWith('{')) {
            const message = JSON.parse(messageStr);
            if (message.type === 'context_update') {
                session.context = message.content;
                await sessionInstance.sendClientContent({
                    turns: [{
                        role: 'user',
                        parts: [{ text: `Here is my latest update:\n\`\`\`\n${message.content}\n\`\`\`\n` }]
                    }],
                    turnComplete: false
                });
            }
        } else {
            // AUDIO: (I've also tried sending the context here, see below)
            await sessionInstance.sendRealtimeInput({
                media: {
                    data: data.toString('base64'),
                    mimeType: 'audio/pcm;rate=16000'
                }
            });
        }
    } catch (e) {
        // error handling...
    }
});

What I’ve Tried:

  • Sending the context as a sendClientContent turn immediately before each audio input (with turnComplete: false or true).
  • Waiting for Gemini’s response after just a context update: Gemini does not reply until audio or a user question is sent.
  • Changing how much context I include (full content, diffs, etc).
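
To make the first attempt above concrete, here is a minimal TypeScript sketch of the kind of "binding" I have in mind. It is not an official pattern. LiveSessionLike and ContextBinder are hypothetical names I made up for illustration, and the interface only mirrors the two calls shown earlier in this post. The helper pushes every send through one promise chain and flushes the newest context once before the next audio chunk. Whether the Live API itself preserves ordering between sendClientContent and sendRealtimeInput is an assumption I have not been able to confirm.

```typescript
// Hypothetical minimal shape of the live session: only the two calls used in this post.
interface LiveSessionLike {
  sendClientContent(params: {
    turns: { role: string; parts: { text: string }[] }[];
    turnComplete: boolean;
  }): void | Promise<void>;
  sendRealtimeInput(params: {
    media: { data: string; mimeType: string };
  }): void | Promise<void>;
}

// Hypothetical helper: serializes all sends and flushes the newest context
// once before the next audio chunk is forwarded.
class ContextBinder {
  private session: LiveSessionLike;
  private latest: string | null = null;
  private dirty = false;
  private queue: Promise<void> = Promise.resolve();

  constructor(session: LiveSessionLike) {
    this.session = session;
  }

  // Remember only the newest context; older unsent versions are dropped.
  updateContext(content: string): void {
    this.latest = content;
    this.dirty = true;
  }

  // Queue one base64-encoded PCM chunk. If the context changed since the
  // last flush, it is sent first, within the same serialized step.
  sendAudio(base64Pcm: string): Promise<void> {
    const task = this.queue.then(async () => {
      const content = this.latest;
      if (this.dirty && content !== null) {
        this.dirty = false;
        try {
          await this.session.sendClientContent({
            turns: [{ role: 'user', parts: [{ text: 'Here is my latest update:\n' + content }] }],
            turnComplete: false, // same setting as in my handler
          });
        } catch (err) {
          this.dirty = true; // retry the flush before the next chunk
          throw err;
        }
      }
      await this.session.sendRealtimeInput({
        media: { data: base64Pcm, mimeType: 'audio/pcm;rate=16000' },
      });
    });
    // Keep the chain usable even if one send fails.
    this.queue = task.catch(() => undefined);
    return task;
  }
}
```

In the handler, a context_update message would call binder.updateContext(message.content), and each audio frame would call binder.sendAudio(...) with the base64 payload. This only controls the order in which my server issues the calls; it does not by itself prove the model reads them in that order.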

No matter what I do, Gemini sometimes answers from content that is not the latest, or hallucinates.

Question:
How can I reliably ensure that the Gemini Flash Live API always uses the latest user-provided context for the next audio/user question turn?
Is there an official/recommended pattern to “bind” a context update and an audio question together, or to always force the model to answer about the latest user content?

Relevant API Docs:

Any help or relevant code pattern is appreciated!