Authentication for Gemini Live API Raw WebSocket Connection (BidiGenerateContent)

Hello,

I am working on integrating with the Gemini Live API using direct WebSockets, specifically targeting the BidiGenerateContent service endpoint (e.g., wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent).

My goal is to establish and manage the WebSocket connection directly in my application without relying on the official client SDKs (Python, JS, etc.).

I have reviewed the “Live API - WebSockets API reference” documentation. It clearly details the expected JSON message structures for BidiGenerateContentSetup, BidiGenerateContentRealtimeInput (including audio, activityStart, activityEnd), and server messages like BidiGenerateContentServerMessage. However, I cannot find any documentation specifying the authentication mechanism required for the initial WebSocket connection handshake.
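
For context, this is roughly how I am building the client messages once the socket is open. It is a minimal sketch, not a confirmed implementation: the camelCase field names (setup, realtimeInput, audio, data, mimeType) are my reading of the reference, the helper names are my own, and the model name and MIME type in the usage lines are placeholders I am assuming.

```python
import base64
import json


def setup_message(model: str) -> str:
    """Build the first client message (BidiGenerateContentSetup).

    Assumption: the setup payload is wrapped in a top-level "setup" field.
    """
    return json.dumps({"setup": {"model": model}})


def audio_message(pcm_bytes: bytes, mime_type: str = "audio/pcm;rate=16000") -> str:
    """Build a BidiGenerateContentRealtimeInput message carrying one audio chunk.

    Assumption: audio is sent base64-encoded with a mimeType; the default
    MIME type here is a placeholder.
    """
    encoded = base64.b64encode(pcm_bytes).decode("ascii")
    return json.dumps(
        {"realtimeInput": {"audio": {"data": encoded, "mimeType": mime_type}}}
    )


def activity_message(kind: str) -> str:
    """Build an activityStart or activityEnd marker as an empty object."""
    if kind not in ("activityStart", "activityEnd"):
        raise ValueError(f"unexpected activity kind: {kind!r}")
    return json.dumps({"realtimeInput": {kind: {}}})


if __name__ == "__main__":
    # "models/your-model-name" is a hypothetical placeholder.
    print(setup_message("models/your-model-name"))
    print(activity_message("activityStart"))
    print(audio_message(b"\x00\x01\x02\x03"))
    print(activity_message("activityEnd"))
```

None of this covers authentication, which is the part I cannot find documented.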

I’ve seen examples for the standard REST API endpoints where authentication is done via the ?key=YOUR_API_KEY query parameter in the URL. However, WebSocket connections often use different methods during the handshake (such as Authorization headers or specific subprotocols), and the documentation doesn’t confirm whether the API key query parameter is applicable or sufficient for the wss:// endpoint.

Could you please provide guidance or point to documentation explaining the correct way to authenticate a raw WebSocket connection request to the BidiGenerateContent service?

Specifically:

  • How should credentials (an API key, or a token such as an OAuth 2.0 access token) be passed during the WebSocket connection handshake?
  • Are specific HTTP headers required (e.g., Authorization: Bearer <token>)?
  • Is the ?key=API_KEY query parameter method supported for the WebSocket endpoint?
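
To make the question concrete, here is a minimal Python sketch of the two handshake variants I am considering. It only builds the URL and the header dictionary and does not open a connection. The helper names and placeholder credentials are hypothetical, and I have not confirmed that the endpoint accepts either form.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

LIVE_WS_URL = (
    "wss://generativelanguage.googleapis.com/ws/"
    "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent"
)


def url_with_api_key(base_url: str, api_key: str) -> str:
    """Variant A (unconfirmed): API key as a ?key= query parameter,
    mirroring the REST examples."""
    parts = urlsplit(base_url)
    query = urlencode({"key": api_key})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def bearer_headers(access_token: str) -> dict:
    """Variant B (unconfirmed): token sent in an Authorization header
    on the HTTP upgrade request."""
    return {"Authorization": f"Bearer {access_token}"}


if __name__ == "__main__":
    # Placeholder credentials, not real values.
    print(url_with_api_key(LIVE_WS_URL, "YOUR_API_KEY"))
    print(bearer_headers("YOUR_ACCESS_TOKEN"))
```

Variant A would be passed as the connection URL. Variant B would go through whatever extra-header option the WebSocket client library exposes for the handshake request.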

Any examples or clarification on the expected authentication flow for a direct WebSocket connection would be extremely helpful.

Thank you!