We’re seeing noticeable end-to-end latency with Gemini Live (real-time voice), even when using gemini-live-2.5-flash-native-audio with thinking disabled. We are using the us-central1 region on Vertex AI. We’d like to share our setup and ask whether this matches others’ experience, or whether there are recommended changes.
Use case
- Real-time voice assistant (OB-GYN clinical assistant): the user speaks, and the model replies with native audio.
- Flow: WebSocket Live API → we send audio, receive audio + transcriptions; we also use function/tool calling during the conversation.
```js
const config = {
  systemInstruction: "<string>", // one system prompt; we use a single text systemInstruction
  tools: [geminiTools],          // function declarations for tool/function calling
  generationConfig: {
    maxOutputTokens: 4096,
    thinkingConfig: {
      thinkingBudget: 0 // thinking disabled
    },
    temperature: 0.5
  },
  responseModalities: [Modality.AUDIO],
  outputAudioTranscription: {},
  realtimeInputConfig: {
    activityHandling: ActivityHandling.NO_INTERRUPTION
  },
  contextWindowCompression: { slidingWindow: {} },
  sessionResumption: resumptionHandle ? { handle: resumptionHandle } : {}
};

ai.live.connect({
  model: 'gemini-live-2.5-flash-native-audio',
  config,
  callbacks: { ... }
});
```
Even on turns with no function calls, we see 10–15 seconds from the end of user speech to the first audio response. Is this level of latency expected for this model with tools attached, or are there settings that usually help reduce the delay?
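For context, here is roughly how we arrive at that number. This is a minimal sketch of a timing helper with hypothetical names (`TurnLatencyTracker` is ours, not part of the SDK): we call `markTurnEnd()` when the last user audio chunk is sent for a turn, and `markFirstResponse()` when the first model audio chunk arrives in the callback.

```typescript
// Hypothetical turn-latency tracker; records wall-clock time between the end
// of user speech and the first model audio chunk, one sample per turn.
class TurnLatencyTracker {
  private turnEndMs: number | null = null;
  private samples: number[] = [];

  // Call when the final user audio chunk for a turn has been sent.
  markTurnEnd(nowMs: number = Date.now()): void {
    this.turnEndMs = nowMs;
  }

  // Call when the first model audio chunk for the reply arrives.
  markFirstResponse(nowMs: number = Date.now()): void {
    if (this.turnEndMs !== null) {
      this.samples.push(nowMs - this.turnEndMs);
      this.turnEndMs = null; // ignore subsequent chunks of the same reply
    }
  }

  // Median response latency over all recorded turns, in milliseconds.
  medianMs(): number {
    const sorted = [...this.samples].sort((a, b) => a - b);
    return sorted.length ? sorted[Math.floor(sorted.length / 2)] : 0;
  }
}
```

With this, the 10–15 s figure is the median over real conversation turns, so it should not be explained by client-side audio buffering alone.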