Proposal: Adjustable Token Streaming Speed (Output Throttling) for Better AI Control

Hi everyone,

I wanted to bring up a topic that has become especially relevant since the release of Antigravity 2.0. Many of us noticed a major shift in our workflow: without the familiar IDE environment visible in real time, it often feels like we are losing tangible control over the codebase. And this isn't just about old habits; it's about human physiology and resource management.

Modern models (like Gemini 3.5) deliver incredible speeds, pumping out around 289 tokens per second. You can see exactly how overwhelming this looks in real life using this simulator: Token Speed Simulator (289 t/s, Code Mode)

The Problem

This output speed is far beyond what a human can read. Our cognitive “interface” (eyes and brain) simply cannot parse and review code at this rate in real time. This creates two distinct issues:

  1. High-Speed Hallucinations: The model hallucinates and makes mistakes at that exact same blinding speed of 289 t/s. By the time a human realizes the context has gone off the rails, the agent has already burned through the budget and modified or broken dozens of files due to an initially misaligned prompt.

  2. Loss of Control and Fatigue: Developers inherently need to feel in control of their execution environment. Watching text fly by like a minigun is mentally exhausting. During intense 5-hour sessions, trying to track this chaotic flow leads to rapid cognitive burnout.

The Solution: Output Throttling Configuration

I propose adding a straightforward token output speed slider in the application settings.

Letting users cap the streaming speed at a comfortable 30–60 tokens per second would bring the output much closer to a pace at which a human can actually read and follow along.
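To make the idea concrete, here is a minimal sketch of what a throttle like this could look like on the client side. This is not how Antigravity implements streaming; the `throttle` function and its `max_tokens_per_second` parameter are hypothetical names I made up for illustration, and it assumes the tokens arrive as a plain Python iterable.

```python
import time
from typing import Callable, Iterable, Iterator


def throttle(
    tokens: Iterable[str],
    max_tokens_per_second: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield tokens no faster than max_tokens_per_second (hypothetical setting)."""
    if max_tokens_per_second <= 0:
        raise ValueError("max_tokens_per_second must be positive")

    interval = 1.0 / max_tokens_per_second  # minimum gap between two tokens
    next_release = clock()                  # earliest time the next token may go out

    for token in tokens:
        delay = next_release - clock()
        if delay > 0:
            sleep(delay)                    # wait until the token is allowed out
        yield token
        # Schedule from "now" if the consumer was slow, so no burst follows a pause.
        next_release = max(next_release, clock()) + interval


if __name__ == "__main__":
    # Assumed example: stream a short snippet at 45 t/s, the middle of the 30-60 range.
    for tok in throttle("def add(a, b): return a + b".split(" "), 45):
        print(tok, end=" ", flush=True)
    print()
```

Note that a display-only throttle like this just paces what you see. For the Stop button to actually save tokens, the cap would also have to slow down or pause generation itself, which is a design decision for the Antigravity team.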

Implementing output throttling addresses two major pain points simultaneously:

  • Restoring True Control: The developer can actually scan the incoming code, catch a flawed logical direction early, and hit Stop before the agent messes up the project infrastructure.

  • Optimizing Limits and UX: Users won’t burn through their strict rate limits and quotas within the first thirty minutes, because a run that has gone wrong can be stopped after a few dozen tokens instead of several hundred. Long sessions will become much more manageable and less stressful, which should also cut down on complaints about running out of daily token allowances.
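A quick back-of-the-envelope calculation shows why the speed cap matters for the Stop button. The 2-second reaction time below is purely my assumption for illustration, not a measured value, and `tokens_before_stop` is a hypothetical helper. It also assumes the cap limits generation itself, not just the display.

```python
def tokens_before_stop(tokens_per_second: float, reaction_seconds: float) -> int:
    """Tokens emitted between noticing a problem and hitting Stop."""
    return round(tokens_per_second * reaction_seconds)


REACTION_SECONDS = 2.0  # assumed human reaction time, for illustration only

for rate in (289, 60, 30):
    wasted = tokens_before_stop(rate, REACTION_SECONDS)
    print(f"{rate:>3} t/s -> {wasted} tokens before Stop takes effect")
# 289 t/s -> 578 tokens before Stop takes effect
#  60 t/s -> 120 tokens before Stop takes effect
#  30 t/s -> 60 tokens before Stop takes effect
```

Under that assumption, the unthrottled stream emits roughly five to ten times more output before you can intervene.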

Sometimes, to move faster and more reliably, we just need the option to slow things down. What do you think?