Hey all,
I’m building a real-time meeting assistant where the goal is to track the progress of predefined agenda items during ongoing conversations.
Setup:
- The meeting audio is captured as a raw studio stream (PCM 16kHz).
- There can be 50–100 participants, though only a few speak at any given time, as in any typical active discussion.
- Right now, we send the stream to Deepgram for live transcription, and every 10 seconds we pass the transcribed text to GPT-4.1 Nano (or another LLM) to:
  - Detect which agenda item is being discussed
  - Determine its status: Not Started, In Progress, or Completed
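For context, here is a minimal sketch of that 10-second polling loop. The agenda items are illustrative, the prompt wording is simplified, and `drain_transcript_buffer` is a hypothetical helper standing in for our Deepgram websocket plumbing:

```python
import json
import time

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

AGENDA = ["Budget review", "Q3 roadmap", "Hiring plan"]  # illustrative items
STATUSES = ["Not Started", "In Progress", "Completed"]

def classify_window(transcript_window: str) -> dict:
    """Ask the model which agenda item the snippet covers and its status."""
    resp = client.chat.completions.create(
        model="gpt-4.1-nano",
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "You track meeting agenda progress. Given a transcript snippet, "
                    f"pick one agenda item from {AGENDA} and one status from "
                    f'{STATUSES}. Reply as JSON: {{"item": ..., "status": ...}}'
                ),
            },
            {"role": "user", "content": transcript_window},
        ],
    )
    return json.loads(resp.choices[0].message.content)

while True:
    # drain_transcript_buffer() is a hypothetical helper fed by Deepgram's
    # websocket callbacks; it returns whatever text arrived since the last call.
    window = drain_transcript_buffer()
    if window.strip():
        print(classify_window(window))
    time.sleep(10)
```

The 10-second batching is where almost all of our latency comes from, which is what motivates the question below.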
What We Want to Achieve:
We’re aiming for real-time agenda tracking: ideally sub-1-second updates instead of waiting 10 seconds.
To reduce this lag, I’m exploring whether we can skip Deepgram entirely and use the Gemini Live API for both:
- transcription, and
- natural language understanding (i.e., tracking agenda progress in real time)
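To make the question concrete, here is roughly what we imagine the one-hop version looking like with the google-genai Python SDK. This is an untested sketch based on our reading of the Live API docs: the model name is a placeholder, and the exact call shapes (`connect`, `send_realtime_input`, `receive`) may not match the current SDK, so corrections are very welcome.

```python
import asyncio

from google import genai
from google.genai import types  # pip install google-genai

client = genai.Client()  # reads the API key from the environment

MODEL = "gemini-2.0-flash-live-001"  # placeholder; use whatever Live-capable model is current

CONFIG = {
    "response_modalities": ["TEXT"],
    "system_instruction": (
        "You are a meeting agenda tracker. As audio arrives, emit updates of the "
        'form {"item": <agenda item>, "status": "Not Started" | "In Progress" | "Completed"}.'
    ),
}

async def track_agenda(pcm_chunks):
    """Stream raw 16 kHz 16-bit PCM into a Live session and read back agenda updates.

    pcm_chunks is assumed to be an async iterator of bytes from our studio capture.
    """
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:

        async def feed_audio():
            async for chunk in pcm_chunks:
                await session.send_realtime_input(
                    audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")
                )

        feeder = asyncio.create_task(feed_audio())
        async for message in session.receive():
            if message.text:
                print("agenda update:", message.text)
        feeder.cancel()
```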
My Questions:
- Can the Gemini Live API handle raw 16 kHz PCM audio directly?
- If not, what preprocessing is needed to make the stream consumable?
- Can it transcribe and understand intent simultaneously, so we don’t need a separate transcription layer?
- Is there a way to continuously stream live context (like the current agenda list and previous discussion state) to the Gemini Live API? (A rough sketch of what we mean follows this list.)
- Has anyone tried this kind of “one-hop” LLM streaming architecture before?
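For the context-streaming question, our working idea is to push the agenda list and the last-known statuses into the same session as ordinary text turns alongside the audio. Again an untested sketch under the same assumptions as above: `send_client_content` is the call we believe the SDK uses for out-of-band text, `session` is the Live session from the previous sketch, and the payload shape is our own invention:

```python
import json

from google.genai import types

async def push_context(session, agenda: list[str], state: dict[str, str]):
    """Inject the agenda and last-known statuses as a text turn alongside the audio.

    session is the Live session from the sketch above; state maps item -> status.
    """
    payload = json.dumps({"agenda": agenda, "state": state})
    await session.send_client_content(
        turns=types.Content(
            role="user",
            parts=[types.Part(text="Current agenda context: " + payload)],
        ),
        turn_complete=True,
    )
```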
Any pointers, success stories, or even architecture sketches would be incredibly helpful. We’re happy to consider hybrid or fallback options too if a complete replacement isn’t practical yet.
Thanks in advance!