We’re hitting a generic 400 from the Interactions API on long-running stateful chats
and want to understand whether there’s an undocumented per-chain content/size
limit, since the error message is opaque and the published 1M token context window
does not seem to apply to this code path.
Setup
- API: Gemini Interactions API (/v1beta/interactions), the stateful endpoint, not generateContent
- Model: gemini-3-flash-preview
- SDK: google-genai Python (1.63+)
- Mode: Stateful, using previous_interaction_id to chain conversations
- Tier: Paid (AI Studio API key)
The error
After many turns in a single chat session (typically 25-30+ turns of tool calls and
agent responses), a follow-up interactions.create() call returns:
google.genai._interactions.BadRequestError: Error code: 400 - { 'error': { 'message': 'Invalid input received.', 'code': 'invalid_request' } }
The error happens specifically on the call that submits function_results after the
model has emitted function_calls. The current call’s payload is not large (~130KB in our case). The session is, though.
The response body has no further detail. Response headers (when authenticating via
API key) don’t include x-goog-request-id or any of the trace IDs typically available on Vertex AI; the only diagnostic header present is server-timing: gfet4t7;dur=.
We replayed the failing payload in isolation — it works
The strongest evidence pointing at chain accumulation rather than content: we
extracted the exact 4 function_result payloads from the failing call (~126KB
combined) and submitted them as the only function_results in a fresh stateful chain
(no previous_interaction_id). Gemini accepted them cleanly with a normal text
response.
So:
- The current call’s payload is structurally fine
- The current call’s payload is not too large in isolation
- The combination of the existing chain state + this payload is what trips the limit
What we ruled out by testing
We built a synthetic harness using client.interactions.create() directly to isolate
variables. None of these reproduced the error:
- Single-call payload size up to ~1.4MB of realistic content (5MB+ for repetitive ASCII, fails at ~10MB)
- Multiple “dangling” tool calls (function_calls without matching function_results from prior turns): tested with 10 accumulated
- Tool schema bloat: tested with 30 tools defined
- Long chain length alone: tested with 30 clean buildup turns
- Combination of dangling calls + 30-turn chain + actual production result content (~126KB): also passed
- Specific content shape: production results scan clean (no NaN/Infinity, no control chars, valid UTF-8, all ASCII)
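For anyone who wants to repeat the content-shape check on their own tool results, a scan of this kind can be sketched as below. This is an illustrative sketch, not the harness described above: scan_payload and scan_json_text are hypothetical names, and the choice to allow tab, LF, and CR is an assumption about what counts as harmless in tool output.

```python
import json
import math
import unicodedata


def scan_payload(value, path="$"):
    """Return a list of (path, problem) pairs found in a decoded JSON-like value."""
    problems = []
    if isinstance(value, float):
        # NaN and +/-Infinity are not valid JSON numbers.
        if math.isnan(value) or math.isinf(value):
            problems.append((path, "non-finite float"))
    elif isinstance(value, str):
        for ch in value:
            # "Cc" is the Unicode control-character category.
            # Assumption: tab, LF and CR are treated as harmless.
            if unicodedata.category(ch) == "Cc" and ch not in "\t\n\r":
                problems.append((path, f"control char U+{ord(ch):04X}"))
                break
        if not value.isascii():
            problems.append((path, "non-ASCII text"))
        try:
            value.encode("utf-8")
        except UnicodeEncodeError:
            # Lone surrogates cannot be encoded as UTF-8.
            problems.append((path, "not encodable as UTF-8"))
    elif isinstance(value, dict):
        for key, item in value.items():
            problems.extend(scan_payload(item, f"{path}.{key}"))
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            problems.extend(scan_payload(item, f"{path}[{index}]"))
    return problems


def scan_json_text(raw):
    """Decode a JSON string and scan it; json.loads accepts NaN/Infinity by default."""
    return scan_payload(json.loads(raw))
```

An empty list means the payload passed every check; each entry otherwise names the JSON path of the first problem found in that value.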
The published 1M token window appears to apply to model context, not this
gateway-level rejection. We hit the 400 well below 50% of 1M tokens by any
reasonable measure.
What we suspect
Cumulative chain state stored server-side via previous_interaction_id reaches a
threshold the API’s gateway enforces before invoking the model. The gemini-cli
project compresses chat history at 50% of context, suggesting Google’s own tooling
treats this as a known constraint — but it’s not documented for the public
Interactions API.
We’re guessing reasoning tokens (thought outputs from Gemini-3) accumulate alongside tool results and agent text, contributing significantly to chain size that the user can’t directly measure or control.
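Until the server-side size is visible, the closest approximation is a client-side ledger of everything sent and received on one chain. The sketch below is a rough proxy under stated assumptions: ChainLedger and json_size are hypothetical names, sizes are measured as compact JSON bytes (which may not match how the service counts), and thought_tokens is whatever the caller reads from the usage data, for example interaction.usage.total_thought_tokens if that field is populated.

```python
import json
from dataclasses import dataclass, field


def json_size(obj) -> int:
    """Size in bytes of obj serialized as compact UTF-8 JSON."""
    text = json.dumps(obj, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


@dataclass
class ChainLedger:
    """Client-side running totals for one previous_interaction_id chain."""

    turns: int = 0
    input_bytes: int = 0
    output_bytes: int = 0
    thought_tokens: int = 0
    # One (turn, total_bytes, thought_tokens) snapshot per recorded turn.
    history: list = field(default_factory=list)

    @property
    def total_bytes(self) -> int:
        return self.input_bytes + self.output_bytes

    def record(self, sent, received, thought_tokens: int = 0) -> None:
        """Add one turn: what we sent, what came back, and reported thought tokens."""
        self.turns += 1
        self.input_bytes += json_size(sent)
        self.output_bytes += json_size(received)
        self.thought_tokens += thought_tokens
        self.history.append((self.turns, self.total_bytes, self.thought_tokens))
```

Logging the last history entry whenever the 400 occurs, across several failing sessions, would show whether the failures cluster around a particular byte or thought-token total.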
Questions
1. Is there a documented or undocumented per-chain content/token limit on stateful
interactions? If so, what is it?
2. Do thought outputs (reasoning tokens) get stored in the chain and replayed on
every previous_interaction_id continuation? They show up in interaction.outputs and
we suspect they contribute to chain growth.
3. Why is the error message generic (“Invalid input received.”) instead of
specifying which limit was exceeded? Standard Gemini API errors usually include
“context length exceeded” or similar.
4. What’s the recommended pattern for long-running chats? Should harnesses always
implement client-side summarize-and-restart, or is there an API-side compression
mechanism we’ve missed (e.g., thinking_level impacting stored vs replayed thoughts)?
5. Is interaction.usage.total_thought_tokens queryable on retained interactions?
That’d let us measure how much of the chain is reasoning.
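To make question 4 concrete, the client-side summarize-and-restart pattern we have in mind looks roughly like the sketch below. Everything here is hypothetical: RestartingChat is an illustrative name, create stands in for a thin wrapper around client.interactions.create() that returns (interaction_id, output), summarize is whatever produces a summary string, and max_turns=20 is a guess. Turn count is only a proxy, since long chains alone did not reproduce the error.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical signatures, injected so the pattern is independent of the SDK:
#   create(input, previous_interaction_id) -> (interaction_id, output)
#   summarize(transcript) -> summary text
CreateFn = Callable[[Any, Optional[str]], Tuple[str, Any]]
SummarizeFn = Callable[[List[Tuple[Any, Any]]], str]


class RestartingChat:
    """Chains turns via a previous id and restarts from a summary past max_turns."""

    def __init__(self, create: CreateFn, summarize: SummarizeFn, max_turns: int = 20):
        self._create = create
        self._summarize = summarize
        self._max_turns = max_turns  # Assumption: arbitrary threshold, tune per workload.
        self._previous_id: Optional[str] = None
        self._transcript: List[Tuple[Any, Any]] = []

    def send(self, turn_input: Any, allow_restart: bool = True) -> Any:
        """Send one turn. Pass allow_restart=False when submitting function
        results, so a restart never separates a call from its result."""
        if allow_restart and len(self._transcript) >= self._max_turns:
            summary = self._summarize(self._transcript)
            seed = f"Summary of the conversation so far:\n{summary}"
            # Fresh chain: no previous id, seeded only with the summary.
            self._previous_id, _ = self._create(seed, None)
            self._transcript = []
        self._previous_id, output = self._create(turn_input, self._previous_id)
        self._transcript.append((turn_input, output))
        return output
```

The open question is whether this should be necessary at all, or whether the API offers a supported way to bound the stored chain.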
Why this matters
Generic 400s on a stateful API with no actionable error message make these
production failures hard to diagnose — we spent significant time bisecting before
getting close to the cause. Better error messaging (specifying which limit was hit)
would be a major DX improvement, even if the limit itself isn’t raised.
Happy to share more details, additional repro context, or test against suggested
workarounds.