Gemini-3.1-flash-live-preview: does input transcription add to token cost? (and can it be disabled?)

I’m modeling cost for the Live API model gemini-3.1-flash-live-preview and want to know whether input audio transcription is billed separately.

What I measured: on identical audio, prompt token usage is identical whether or not input transcription is requested. Input transcription appears to add no tokens; it seems to be derived from the already-tokenized audio.

Can someone confirm that this is correct, i.e. that input transcription is not separately billed (no hidden token or per-character charge)?

Side finding: it can’t be disabled anyway

While testing, we found that server_content.input_transcription is returned on every turn regardless of how input_audio_transcription is set: omitting the key, setting it to None, or passing the typed config object all leave it on (output_audio_transcription, by contrast, is respected). input_audio_transcription=False is rejected by the SDK, and AudioTranscriptionConfig has no enable/disable field, so we found no way to opt out.
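
For anyone reproducing this, here is a rough sketch of the config variants we mean, written as plain dicts so it runs without the SDK. build_config and the variant names are ours, made up for illustration; they are not SDK terms. We are assuming the empty dict is handled the same way as a default AudioTranscriptionConfig().

```python
import copy

# Base config shared by all variants (same as in the repro).
BASE = {"response_modalities": ["AUDIO"], "output_audio_transcription": {}}

def build_config(variant: str) -> dict:
    """Return a Live API config dict for one input-transcription variant.

    Variant names are hypothetical labels for this post, not SDK terms.
    """
    cfg = copy.deepcopy(BASE)
    if variant == "omitted":
        pass  # key absent entirely
    elif variant == "none":
        cfg["input_audio_transcription"] = None
    elif variant == "empty":
        # Assumed equivalent to passing a default AudioTranscriptionConfig()
        cfg["input_audio_transcription"] = {}
    elif variant == "false":
        # In our tests the SDK rejected this value before connecting
        cfg["input_audio_transcription"] = False
    else:
        raise ValueError(f"unknown variant: {variant}")
    return cfg
```

Each returned dict can be passed as config= to client.aio.live.connect, the same way cfg is passed in the repro.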

Repro

import asyncio
from google import genai
from google.genai import types

MODEL = "gemini-3.1-flash-live-preview"
# PCM_16K_MONO = raw 16kHz mono PCM16 bytes of any short utterance

async def trial(label, request_input):
    cfg = {"response_modalities": ["AUDIO"], "output_audio_transcription": {}}
    if request_input:
        cfg["input_audio_transcription"] = {}   # else: omitted
    client = genai.Client()
    prompt_tok = n_in = 0
    async with client.aio.live.connect(model=MODEL, config=cfg) as s:
        await s.send_realtime_input(audio=types.Blob(data=PCM_16K_MONO, mime_type="audio/pcm;rate=16000"))
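        # 100 ms of PCM16 silence: 16,000 samples/s * 2 bytes/sample / 10
        sil = b"\x00" * (16000 * 2 // 10)
        # 15 chunks = 1.5 s of trailing silence
        for _ in range(15):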
            await s.send_realtime_input(audio=types.Blob(data=sil, mime_type="audio/pcm;rate=16000"))
        await s.send_realtime_input(audio_stream_end=True)
        async for resp in s.receive():
            if resp.usage_metadata:
                prompt_tok += resp.usage_metadata.prompt_token_count or 0
            sc = resp.server_content
            if sc and sc.input_transcription and sc.input_transcription.text:
                n_in += 1
            if sc and sc.turn_complete:
                break
    print(f"[{label}] prompt_tokens={prompt_tok}  input_transcripts={n_in}")

async def main():
    await trial("input transcription requested", True)
    await trial("input transcription omitted",   False)

asyncio.run(main())

Output (identical audio, deterministic). Prompt tokens are the same either way, and input transcription is returned even when omitted:

[input transcription requested] prompt_tokens=296  input_transcripts=1
[input transcription omitted]   prompt_tokens=296  input_transcripts=1
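
The repro leaves PCM_16K_MONO undefined. One way to fill it in, as a minimal sketch using only the standard library: read the frames from a WAV file and refuse anything that is not 16 kHz mono 16-bit. load_pcm_16k_mono and duration_seconds are hypothetical helper names for this post, and the file path is whatever recording you have on hand.

```python
import wave

def load_pcm_16k_mono(path: str) -> bytes:
    """Return the raw PCM16 frames of a WAV file, validating its format first."""
    with wave.open(path, "rb") as wf:
        ok = (
            wf.getnchannels() == 1
            and wf.getsampwidth() == 2      # 2 bytes per sample = 16-bit
            and wf.getframerate() == 16000
        )
        if not ok:
            raise ValueError("expected 16 kHz mono 16-bit PCM WAV")
        return wf.readframes(wf.getnframes())

def duration_seconds(pcm: bytes, rate: int = 16000, bytes_per_sample: int = 2) -> float:
    """Length of a mono PCM buffer in seconds."""
    return len(pcm) / (rate * bytes_per_sample)
```

With this, PCM_16K_MONO = load_pcm_16k_mono("utterance.wav") (file name is a placeholder), and duration_seconds gives a quick sanity check on how much audio each trial actually sends.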

Questions

  1. Is input transcription billed? Our data shows no extra prompt tokens. Can you confirm there’s no separate charge?
  2. Is there any supported way to disable input transcription for this model?

Environment

google-genai 1.70.0 · Python 3.13.12 · response_modalities=["AUDIO"], 16 kHz mono PCM16, output_audio_transcription enabled.