Audio Input Cannot Trigger History Recall in Gemini Live API (Only Text Input Works)

Linming · December 10, 2025, 8:47am

Issue Description

When using the Gemini Live API (including the streaming “native audio” mode), loading prior conversation history via context or history only works when the user sends text input.
Audio/voice input never triggers the model to recall previously provided history, even though the same context works correctly when queried via text.

This issue appears across both the LiveKit agent integration and direct Gemini Live API calls, indicating that the problem originates from the model or API behavior itself rather than the client implementation.

Reproduction Steps

1. Prepare a History Context

Example context loaded via load_context(history) or history / messages:

User: Where does XX work?
Assistant: XX works at YYY company.

2. Start a new Live session

Model tested:

gemini-2.5-flash-native-audio-preview-09-2025
gemini-2.0-flash-live-001

3. Ask the same question using audio input**

(Audio) “Where does XX work?”

→ Model responds: “I don’t know.”

4. Without resetting the session, send the same question as text input**

(text) "Where does XX work?"

→ Model responds correctly: “XX works at YYY company.”

5. Repeat the same test using:

LiveKit Agent (audio)
LiveKit Agent Playground (audio → text)
Gemini official Live API sample code (audio → text)

All environments reproduce the same behavior:

Audio question → history not recalled
Text question → history recalled correctly

Expected Behavior

Audio input should behave the same as text input:
When prior conversation history is loaded into the session, both audio and text queries should equally be able to access and recall that history.

Actual Behavior

Text queries can successfully retrieve information from loaded history.
Audio queries consistently fail to recall any historical information, responding as if no history exists.

Environment

Tested across:

Gemini Live API — official sample code
Gemini 2.5 Flash Native Audio Preview — streaming mode
Gemini 2.0 Flash Live
LiveKit Agent (same behavior reproduced)
LiveKit Agent Playground (audio → fail, text → success)

The issue is consistent and model-independent.

Additional Notes

This behavior strongly suggests that audio inputs are not currently integrated into the history/context attention path, or the audio encoder does not consider preloaded history.
The issue is reproducible across all environments, which eliminates LiveKit or client-side problems.
A temporary workaround is to inject important history into the system prompt, but this is only a partial solution.

Sonali_Kumari1 · December 22, 2025, 7:08am

Hi @Linming , Thank you for bringing this to our attention.

Apologies for the delayed response. Could you please confirm if you are still facing the same issue?

Arsh-PV · January 19, 2026, 5:08pm

Facing the same problem,

  model = "gemini-2.5-flash-native-audio-preview-12-2025"
  config={
    "response_modalities": ["AUDIO"],
    "system_instruction": VOICE_MODE,
    "output_audio_transcription": {},
    "input_audio_transcription": {},
    "thinking_config": {
      "thinking_budget": 0
    },
    "realtime_input_config": {
      "automatic_activity_detection": {
        "disabled": True
      }
    },
    "tools": [{'google_search': {}}]
  }

  if mode == "auto":
    config["realtime_input_config"]["automatic_activity_detection"] = {
      "disabled": False,
      "start_of_speech_sensitivity": types.StartSensitivity.START_SENSITIVITY_HIGH,
      "end_of_speech_sensitivity": types.EndSensitivity.END_SENSITIVITY_LOW,
      "prefix_padding_ms": 20,
      "silence_duration_ms": 200,
    }

  try:
    history = await pg.fetch("SELECT role, content FROM messages WHERE session_id = $1 AND content_type = 'text' ORDER BY id ASC LIMIT 20;", session_id)

    async with client.aio.live.connect(model=model, config=config) as session:

      turns = [{"role": turn["role"], "parts": [{"text": turn["content"]}]} for turn in history]
      if turns: await session.send_client_content(turns=turns, turn_complete=False if turns[-1]["role"] == "model" else True)

      CLIENTS.pop(session_id, None)
      
      await socket.connect(websocket)
      receiver_task = asyncio.create_task(socket.receive(websocket, session, mode))
      sender_task = asyncio.create_task(socket.send_live(websocket, session, session_id, valid_session["name"]))

      await asyncio.wait([sender_task, receiver_task], return_when=asyncio.FIRST_COMPLETED)

  except Exception as e:
    socket.disconnect(websocket)
    logger.exception(e)

when asked anything related to the previous content, the model responds with something like this is first time or start of the conversion …

I am fairly certain this was not an issue before

Shrestha_Basu_Mallic · January 28, 2026, 8:06am

Hi @Linming we are looking into this. Are you still facing this issue?

Topic		Replies	Views
Live API Hidden context Gemini API api	1	126	June 11, 2025
Adding Chat history problem Documentation api , prompt , gemini-flash-2-5	1	81	November 3, 2025
Gemini Live API (Native Audio): Response Latency Gradually Increases During Long Sessions Gemini API gemini-api , gemini , audio	1	270	March 6, 2026
Gemini Live Model Regression Gemini API model , gemini-flash	2	151	April 2, 2025
Gemini Flash Live API: How to ensure the model always uses the latest user-provided context after a sequence of context + audio turns? Gemini API model-code , gemini-flash	0	188	May 19, 2025