Hitting input token limits that are way lower than advertised in Gemini 2.0

Hey folks,

I’m trying to send about 40k tokens, far less than the 1M permitted for Gemini 2.0, and it seems to break with the following websocket exception:

E       websockets.exceptions.ConnectionClosedError: received 1007 (invalid frame payload data) Request trace id: ffa37544583b21f9, [ORIGINAL ERROR] generic::invalid_argument: Input request contains (44599) tokens, whic; then sent 1007 (invalid frame payload data) Request trace id: ffa37544583b21f9, [ORIGINAL ERROR] generic::invalid_argument: Input request contains (44599) tokens, whic

The code works: with the same logic and less content, I get exactly what I want.

The logic is fairly simple: it expects audio output, which we then stream and play as needed. Code below. Any thoughts as to why this is failing?

import numpy as np
import sounddevice as sd
from google import genai

config = genai.types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    system_instruction=genai.types.Content(
        parts=[genai.types.Part(text=system_prompt)]
    ),
    generation_config=genai.types.GenerationConfig(
        temperature=self.settings.genai_model_temperature,
        max_output_tokens=8192,
    ),
    speech_config=genai.types.SpeechConfig(
        voice_config=genai.types.VoiceConfig(
            prebuilt_voice_config=genai.types.PrebuiltVoiceConfig(
                voice_name=VOICES[0]
            ),
        ),
    ),
)

async with self.client.aio.live.connect(
    model=self.settings.genai_model_name, config=config
) as session:
    # Send the whole prompt as a single turn, then collect the streamed audio chunks.
    await session.send(combined_prompt, end_of_turn=True)
    audio_data = []

    async for response in session.receive():
        if not response.server_content.turn_complete:
            for part in response.server_content.model_turn.parts:
                if part.inline_data and part.inline_data.data:
                    audio_data.append(
                        np.frombuffer(part.inline_data.data, dtype="int16")
                    )


# Play back the concatenated audio (24 kHz mono, 16-bit PCM).
with sd.OutputStream(samplerate=24000, channels=1, dtype="int16") as stream:
    stream.write(np.concatenate(audio_data))
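
For reference, a quick way to sanity-check the size of combined_prompt before connecting. This is only a rough sketch using the standard count_tokens endpoint; I'm assuming the live API tokenizes the input in roughly the same way, which may not be exact:

# Rough sketch: measure combined_prompt with the non-live count_tokens endpoint
# before connecting. Assumes the live API tokenizes similarly (unverified).
token_check = self.client.models.count_tokens(
    model=self.settings.genai_model_name,
    contents=combined_prompt,
)
print(f"combined_prompt is roughly {token_check.total_tokens} tokens")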


The goal is to improve the audio for CustomPod, as the audio Gemini 2.0 produces is incredible.

Audio generation is only available for “early access”, as noted in the experimental model details.

Regardless, have you been able to run the example notebook?

Yes, the notebook works, and the code above does as well for a smaller input size.

It’s only when I send it a larger amount of content that I get that error message, consistently.

Is that part of the early access limitations?

received 1007 (invalid frame payload data) Request trace id: fc1a2c4181400b5d, [ORIGINAL ERROR] generic::invalid_argument: Input request contains (95200) tokens, whic; then sent 1007 (invalid frame payload data) Request trace id: fc1a2c4181400b5d, [ORIGINAL ERROR] generic::invalid_argument: Input request contains (95200) tokens, whic

I get this same error, but the website mentions that:

The following rate limits apply:

  • 3 concurrent sessions per API key
  • 4M tokens per minute

Is there any workaround?
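
The only thing I can think of trying is splitting the prompt across several send() calls inside one turn, marking only the last one as end of turn. I have no idea whether the limit applies per message or to the accumulated turn, so this is just a rough sketch (chunk_chars is an arbitrary size I picked, not tied to any documented limit):

# Rough, untested sketch: send the prompt in pieces within a single turn,
# with end_of_turn=True only on the last piece. Assumes the `session` and
# `combined_prompt` from the code earlier in the thread.
chunk_chars = 20_000
chunks = [
    combined_prompt[i : i + chunk_chars]
    for i in range(0, len(combined_prompt), chunk_chars)
]
for i, chunk in enumerate(chunks):
    await session.send(chunk, end_of_turn=(i == len(chunks) - 1))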
