I’m modeling cost for the Live API model gemini-3.1-flash-live-preview and want to know whether input audio transcription is billed.
What I measured: on identical audio, prompt token usage is identical whether or not input transcription is requested. Input transcription appears to add no tokens; it seems to be derived from the already-tokenized audio.
Can someone confirm that this is correct, i.e. that input transcription is not separately billed (no hidden token or per-character charge)?
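For context, here is a minimal sketch of the kind of cost model this question feeds into. The function name, the parameters, and the idea of a separate transcript rate are all hypothetical; no real prices are assumed. It only shows how the two hypotheses (transcription free vs. separately billed) would differ in the estimate.

```python
def estimate_input_cost(
    prompt_tokens: int,
    usd_per_million_input: float,
    transcript_tokens: int = 0,
    usd_per_million_transcript: float = 0.0,
) -> float:
    """Estimate the input-side cost of one turn in USD.

    All rates are caller-supplied placeholders, not published prices.
    transcript_tokens stays 0 under the "transcription is free" hypothesis.
    """
    audio_cost = prompt_tokens * usd_per_million_input / 1_000_000
    transcript_cost = transcript_tokens * usd_per_million_transcript / 1_000_000
    return audio_cost + transcript_cost


# Hypothesis A: transcription adds nothing beyond the measured prompt tokens.
free = estimate_input_cost(prompt_tokens=296, usd_per_million_input=1.0)

# Hypothesis B: a hypothetical extra charge on some number of transcript tokens.
billed = estimate_input_cost(
    prompt_tokens=296,
    usd_per_million_input=1.0,
    transcript_tokens=20,
    usd_per_million_transcript=1.0,
)
```

If the measurement holds, only hypothesis A applies and the transcript terms drop out of the model entirely.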
Side finding: it can’t be disabled anyway
While testing, we found that server_content.input_transcription is returned on every turn regardless of how input_audio_transcription is set: omitting the key, setting it to None, or using the typed config object all leave it on (by contrast, output_audio_transcription is respected). input_audio_transcription=False is rejected by the SDK, and AudioTranscriptionConfig has no enable/disable field, so there appears to be no way to opt out.
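The variants described above can be sketched as plain config dicts, each of which would be passed as config= to client.aio.live.connect in the same way as in the repro. BASE and input_transcription_variants are illustrative names, not part of the SDK; the typed-object variant is only noted in a comment so the snippet runs without the SDK installed.

```python
# Shared settings used for every variant (mirrors the repro config).
BASE = {"response_modalities": ["AUDIO"], "output_audio_transcription": {}}


def input_transcription_variants() -> dict:
    """Return the config variants tried, keyed by a human-readable label."""
    return {
        # Key left out entirely.
        "key omitted": dict(BASE),
        # Key present but explicitly set to None.
        "explicit None": {**BASE, "input_audio_transcription": None},
        # Key present with an empty config, i.e. transcription requested.
        # The typed equivalent would be types.AudioTranscriptionConfig().
        "empty dict": {**BASE, "input_audio_transcription": {}},
    }


for label, cfg in input_transcription_variants().items():
    print(label, "->", sorted(cfg))
```

In our tests, all of these still produced server_content.input_transcription on every turn.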
Repro
import asyncio
from google import genai
from google.genai import types

MODEL = "gemini-3.1-flash-live-preview"
# PCM_16K_MONO = raw 16kHz mono PCM16 bytes of any short utterance

async def trial(label, request_input):
    cfg = {"response_modalities": ["AUDIO"], "output_audio_transcription": {}}
    if request_input:
        cfg["input_audio_transcription"] = {}  # else: omitted
    client = genai.Client()
    prompt_tok = n_in = 0
    async with client.aio.live.connect(model=MODEL, config=cfg) as s:
        await s.send_realtime_input(audio=types.Blob(data=PCM_16K_MONO, mime_type="audio/pcm;rate=16000"))
        sil = b"\x00" * (16000 * 2 // 10)
        for _ in range(15):
            await s.send_realtime_input(audio=types.Blob(data=sil, mime_type="audio/pcm;rate=16000"))
        await s.send_realtime_input(audio_stream_end=True)
        async for resp in s.receive():
            if resp.usage_metadata:
                prompt_tok += resp.usage_metadata.prompt_token_count or 0
            sc = resp.server_content
            if sc and sc.input_transcription and sc.input_transcription.text:
                n_in += 1
            if sc and sc.turn_complete:
                break
    print(f"[{label}] prompt_tokens={prompt_tok} input_transcripts={n_in}")

async def main():
    await trial("input transcription requested", True)
    await trial("input transcription omitted", False)

asyncio.run(main())
Output (identical audio, deterministic). Token counts are the same either way, and input transcription is returned even when the key is omitted:
[input transcription requested] prompt_tokens=296 input_transcripts=1
[input transcription omitted] prompt_tokens=296 input_transcripts=1
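To repeat this comparison over several runs without eyeballing the logs, the printed lines can be parsed and diffed. This is a minimal sketch that assumes the exact print format used in the repro; parse_trials and token_delta are illustrative helper names.

```python
import re

# Matches lines such as: [some label] prompt_tokens=296 input_transcripts=1
LINE_RE = re.compile(
    r"\[(?P<label>.+?)\] prompt_tokens=(?P<tokens>\d+) input_transcripts=(?P<n>\d+)"
)


def parse_trials(text: str) -> dict:
    """Map each trial label to (prompt_tokens, input_transcripts)."""
    return {
        m["label"]: (int(m["tokens"]), int(m["n"]))
        for m in LINE_RE.finditer(text)
    }


def token_delta(results: dict, with_label: str, without_label: str) -> int:
    """Prompt-token difference between the two trials (0 means no extra tokens)."""
    return results[with_label][0] - results[without_label][0]


log = """\
[input transcription requested] prompt_tokens=296 input_transcripts=1
[input transcription omitted] prompt_tokens=296 input_transcripts=1
"""
results = parse_trials(log)
delta = token_delta(
    results, "input transcription requested", "input transcription omitted"
)
print(f"extra prompt tokens when requested: {delta}")
```

A delta of zero across repeated runs is the evidence behind the "no extra tokens" claim; it says nothing about charges that are not reflected in usage_metadata.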
Questions
- Is input transcription billed? Our data shows no extra tokens; can you confirm there’s no separate charge?
- Is there any supported way to disable input transcription for this model?
Environment
google-genai 1.70.0 · Python 3.13.12 · response_modalities=["AUDIO"], 16 kHz mono PCM16, output_audio_transcription enabled.