I’ve been trying out NotebookLM’s audio overview feature for a while and listening to actual podcasts/interviews with Steve Johnston (VP) and Raiza Martin (Senior PM) of NotebookLM at Google. They keep saying they use Gemini for the AI, but I’ve never heard them name the model used to create the audio voices. Does anyone have any idea?
I can’t figure out how the audio manages to feel like a real conversation. Sure, Gemini could be used to generate the script, but the conversation feels genuinely spontaneous: one speaker jumps in right after the other finishes. It doesn’t sound like two separate voices produced by TTS APIs; it’s far too natural for that.
I’ve tried various TTS offerings (OpenAI’s, AWS Polly, Google Cloud TTS), but nothing replicates this conversation-like behavior. Is the model publicly available? Is it an open-source project?
Gemini’s APIs don’t have audio output, so what is it? Very curious!
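To be concrete about what I mean by “two different voices using TTS APIs”: here’s roughly the pipeline I tried, sketched with the Google Cloud TTS Python client and two of the Journey voices. The script lines are placeholders (in practice you’d generate them with Gemini or another LLM), and the voice names are just the ones I happened to test with:

```python
# Naive two-voice "podcast" pipeline: an LLM writes the script,
# then each turn is synthesized separately and the clips are
# concatenated. This is the approach that fails to sound like
# NotebookLM -- the speakers never react to each other.
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

# Placeholder script; in practice generated by an LLM.
script = [
    ("A", "So today we're digging into the paper you sent over."),
    ("B", "Right, and honestly the results surprised me."),
    ("A", "Same here. Let's start with the setup."),
]

# Two of Google's Journey voices, one per host.
voices = {
    "A": texttospeech.VoiceSelectionParams(
        language_code="en-US", name="en-US-Journey-D"
    ),
    "B": texttospeech.VoiceSelectionParams(
        language_code="en-US", name="en-US-Journey-F"
    ),
}

audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

clips = []
for speaker, line in script:
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=line),
        voice=voices[speaker],
        audio_config=audio_config,
    )
    clips.append(response.audio_content)

# Crude concatenation of MP3 frames; good enough to hear the problem.
with open("fake_podcast.mp3", "wb") as f:
    f.write(b"".join(clips))
```

Each turn is synthesized in isolation, so you get clean alternation between two good voices, but none of the interruptions, backchannels (“right,” “mm-hmm”), or shared pacing that make the NotebookLM hosts sound like they’re actually listening to each other.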
I listened to the deep dive podcast where the two hosts are the two NotebookLM voices. The way they talk, intonate, and emphasize is so natural that I wouldn’t have realized it was AI if I hadn’t known. The way they go back and forth, ask each other questions, and so on. It feels a level above the TTS I use for my multimodal, voice-enabled agent submission. My agent’s responses are fine, but not as vibrant and alive; they don’t feel as engaging. I wouldn’t call them robotic, because the TTS is quite good, but next to NotebookLM they sound as robotic as a decade-old TTS sounds next to mine.
Again, the magic of NotebookLM isn’t just that the voices sound like really high-quality TTS (like this Journey model). I suspect there’s some additional AI model that weaves two high-quality voices together in a way that feels natural and conversational. It’s hard to explain unless you’ve tried it yourself: https://notebooklm.google.com/
They already have millions of users for a reason… and no, it’s not because Google is marketing it; they barely are.