Significant delay with Gemini Live 2.5 Flash (native audio)

We’re seeing noticeable end-to-end latency with Gemini Live (real-time voice), even with gemini-live-2.5-flash-native-audio and thinking disabled. We’re running in the us-central1 region on Vertex AI. We’d like to share our setup and ask whether this matches others’ experience, or whether there are recommended changes.

Use case

  • Real-time voice assistant (OB-GYN clinical assistant): user speaks, model replies with native audio.

  • Flow: WebSocket Live API → we send audio and receive audio plus transcriptions; we also use function/tool calling during the conversation.

const config = {
  systemInstruction: "<string>",  // one system prompt; we use a single text systemInstruction
  tools: [geminiTools],           // function declarations for tool/function calling
  generationConfig: {
    maxOutputTokens: 4096,
    thinkingConfig: {
      thinkingBudget: 0           // thinking disabled
    },
    temperature: 0.5
  },
  responseModalities: [Modality.AUDIO],
  outputAudioTranscription: {},
  realtimeInputConfig: {
    activityHandling: ActivityHandling.NO_INTERRUPTION
  },
  contextWindowCompression: { slidingWindow: {} },
  sessionResumption: resumptionHandle ? { handle: resumptionHandle } : {}
};

ai.live.connect({
  model: 'gemini-live-2.5-flash-native-audio',
  config,
  callbacks: { ... }
});

Without any function calls involved, we’re seeing 10–15 seconds from the end of user speech to the first audio response. Is this level of latency expected for this model with tools attached, or are there settings that typically help reduce the delay?
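For anyone trying to reproduce or compare numbers: here is a minimal sketch of how we could timestamp the gap between the last user audio chunk sent and the first model audio chunk received. The `LatencyProbe` class and its method names are our own helpers, not part of the SDK; the wiring comments at the bottom describe where we would call them from the Live API callbacks.

```javascript
// Hypothetical helper (not an SDK API): measures time from the end of the
// user's speech (last audio chunk sent) to the first audio chunk received
// back from the model.
class LatencyProbe {
  // `now` is injectable for testing; defaults to wall-clock milliseconds.
  constructor(now = () => Date.now()) {
    this.now = now;
    this.sendEndMs = null;
    this.firstAudioMs = null;
  }

  // Call after sending the final user audio chunk for a turn.
  markSendEnd() {
    this.sendEndMs = this.now();
  }

  // Call on every received audio message; only the first one is recorded.
  markFirstAudio() {
    if (this.firstAudioMs === null) this.firstAudioMs = this.now();
  }

  // Returns the measured delay in ms, or null if a mark is missing.
  elapsedMs() {
    if (this.sendEndMs === null || this.firstAudioMs === null) return null;
    return this.firstAudioMs - this.sendEndMs;
  }
}

// Wiring sketch (assumed, adapt to your callback shape):
// - call probe.markSendEnd() right after the last sendRealtimeInput(...) of a turn;
// - call probe.markFirstAudio() in the onmessage callback when the first
//   server message containing audio data for that turn arrives;
// - log probe.elapsedMs() at end of turn, then reset the probe.
```

Logging this per turn would also make it easier to tell whether the delay is in the model response itself or elsewhere in the pipeline (network, audio buffering, playback).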
