I now know why Gemini 3.5 is called flash!

buffos · May 22, 2026, 1:47pm

I am on an Ultra Workspace subscription. Gemini 3.5 Flash consumes tokens faster than any other model. The quota has 5 bars, one for each hour. I use a ticketing system for issues to be resolved: one issue costs one bar.

It’s the Flash of models in token consumption.

Using Opus 4.6, a more expensive model, gets me 5 or more issues resolved per bar.
Flash is faster, yes. But the token consumption makes it unusable.

The models are pretty good nowadays, and for normal, non-critical tasks they are pretty much equivalent. I do not care about one-shotting a macOS replica or a compiler. I want to be able to get through a 5-hour work interval one task at a time.

If Flash were 5x faster, consuming 5x the tokens of Opus would make some sense. But it's far from that.

BReal · May 22, 2026, 3:00pm

No, it would not make sense even if it were faster. The speed is irrelevant: you can output a billion tokens in 1 second, but if the result equals what another model does with 1/20th of the tokens at half the cost, it still makes no sense.
A Flash model should NOT be so expensive, no matter what point of view you see it from. If it cost 5x less, it'd be an amazing model, but as it is, it's just unusable for most people. Maybe very rich people can afford to play with it, but most normal people are not even close to being able to afford paying so much. It's not much better than the previous 3 Flash, and it takes something like 20x more tokens to complete the same tasks. It's just a nonsense model.

buffos · May 22, 2026, 3:15pm

I am not talking about token speed but “ticket completion speed.”
Model A completes task A perfectly in X time, costing N.
Model B completes task A perfectly in 5X time, costing N.

From a cost-per-task perspective, they are equivalent. You could even say the fast model is better, because it gives you the option to do more in the same time (but spend more).
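The Model A / Model B comparison above can be written out as a small calculation. This is a hypothetical sketch: the `Model` class, the function names, and the unit values chosen for X and N are illustrative only and do not come from any tool or API.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    hours_per_task: float  # wall-clock time to finish one task (X or 5X)
    cost_per_task: float   # quota or money spent on one task (N)


def cost_of(model: Model, tasks: int) -> float:
    """Total spend for a batch of tasks: depends only on cost per task."""
    return model.cost_per_task * tasks


def tasks_in(model: Model, hours: float) -> float:
    """Tasks finished within a time budget, ignoring any quota limit."""
    return hours / model.hours_per_task


X, N = 1.0, 1.0  # hypothetical unit time and unit cost
a = Model("Model A", hours_per_task=X, cost_per_task=N)
b = Model("Model B", hours_per_task=5 * X, cost_per_task=N)

print(cost_of(a, 10), cost_of(b, 10))  # same spend for the same 10 tasks
print(tasks_in(a, 5), tasks_in(b, 5))  # A fits 5x more tasks into 5 hours
```

With the thread's own numbers (about 1 ticket per bar on Flash versus about 5 per bar on Opus), Flash's `cost_per_task` would be roughly 5x higher, so `cost_of` stops matching regardless of how fast it runs.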

The problem is that this analogy (or counter-analogy) does not hold between Opus and Gemini Flash 3.5

BReal · May 22, 2026, 3:52pm

I think token speed is irrelevant. Maybe it matters for very rich people, but the vast majority of users do not have the money to afford "spending more to do something faster". Most people have a set budget and want to do as much as possible, as well as possible, in whatever time it takes. The difference in speed is maybe only relevant for companies running a critical, speed-sensitive business, but what critical business would ever use a Flash model? I don't think there are any.
Anyway, the point is that for coding specifically, people are more than OK waiting 5 more minutes for a task if they pay 5x less.
Completing the same task 5x faster should not result in a higher price, even if that applied to 3.5 Flash, and as you said, it does not.

buffos · May 22, 2026, 5:23pm

That is definitely not true. We all care about speed. Otherwise, they could give us a slow Gemini that does the work 5x slower, and there would be no quota issue anymore. Everything would be back to normal.
Or you could simply work for 1 hour and rest for the next 4, since the result is the same as with a "slow" Gemini.

The number of tokens generated during reasoning is a very important metric for AI agents.

BReal · May 22, 2026, 5:30pm

Only if you have endless money. If you're a normal guy who can only afford a $20 subscription, you do not care about speed. They can offer the option to pay 5x more for speed if they want, but I'm sure everyone was OK with the speed of the 3.0 Flash model. It had no speed problems whatsoever, unlike 3.5, which is a little faster but ends up costing 15x more for the same task. You wrote that you're on Ultra Workspace, so you paid 100 or 200 dollars for this. Well, let me bring you up to speed with the real world: most people can't afford to pay for $200 AI subscriptions.

buffos · May 22, 2026, 7:13pm

It's not about money. Let me give you an analogy.
You can drive a car that gets you from point A to point B in 10 minutes and costs 100 dollars.
Or you can take another car that goes from A to B in 1 hour and also costs 100 dollars.

With the fast car you get the same job done, just faster, but then you wait 50 minutes before you continue.

With the slow car, you just have the feeling you are doing something all the time. It's not about money; both cost the same. With the first car, you drive 10 minutes and wait 50. With the second, you drive all 60 minutes.

If you have money, you can keep going with the fast car and produce 6x more, BUT if you don't, the result is the same (plus free time).

Abraham2 · May 22, 2026, 7:32pm

That is assuming your model listens to you and doesn't go off like a model on a mission, changing everything before you know what has happened. Then I'm going back to my git history and starting from scratch, but now with 0 credits. Not just for Flash, but for all models, because they pooled them together.

BReal · May 22, 2026, 7:49pm

Yes, of course speed at no additional cost is great, but I don't think these models give you real speed. They are fast at producing tokens, but those tokens are complete garbage compared to proper models, so in the end it's actually much slower than it looks. AND at the same time, this is not speed at no additional cost, because the cost here is 15-20x that of the previous Flash model, for a model that is maybe only slightly better on some occasions. It's a model made to max out benchmarks and throw smoke in people's eyes with fast token output. I'm pretty sure nobody at Google is using 3.5 Flash to output any production code, let alone using Antigravity. They use Codex.

buffos · May 22, 2026, 9:31pm

Two things happened.

1. They changed the token estimation to compute, and that is a dramatic increase overall.
2. Flash generates too many tokens while reasoning.

Both combined => current problem

If this is not resolved soon, I will cancel my Ultra and move on to another platform

Mark_Ariail · May 22, 2026, 10:09pm

Before the update disaster today, I was using Gemini Pro and Claude Sonnet for "brain work" and Gemini Flash as the workhorse for more basic processing, since it refreshed every 5 hours. What do all of you think is the workhorse now? Reading the messages above leads me to believe that Gemini 3.5 Flash (High and Low) may use more tokens than Gemini 3.1 Pro (High or Low). As a programmer, my brain thinks in a logical flow, and maybe I'm just a dumbazz, but I can't even figure out the best way to spread usage across models to keep project workflows going. I'm a Google AI Pro user, and I truly want to use AI to enhance my workflow, not necessarily to do all of the work. Any comments would be great! Thank you ~ Mark

buffos · May 22, 2026, 11:21pm

What I can tell you now is that even simple vibe coding, with no multi-agent tasks, on an Ultra subscription, will exhaust all models. No model lasts more than 2 hours of work. Gemini 3.5 Flash depletes faster than all the other models.

I am strongly considering switching. I am not sure where yet: Cursor? Kimi? DeepSeek?

DrQwertySilence · May 23, 2026, 12:01am

You know the worst part of this model? It's fast, but it gets most things wrong! We have custom MCP tools that are properly described, but this model is incapable of using them correctly. Old Gemini 3 Flash had no problem with them. Claude has no problems with them. I wasted hours of my life today teaching an "intelligent" model how to use basic tools… and for some reason I don't think it's a model issue; it smells like the problem is Antigravity not passing info to the model properly. It happened with Antigravity 1.0 (the difference in token consumption between it and Gemini CLI was noticeable) and it is happening with 2.0 again. Get rid of this product and let the community handle it.

BReal · May 23, 2026, 1:08am

I heard Cursor has nearly no limits, and their new Composer 2.5 is ultra cheap and really good. I'm considering switching too, either to Cursor or Codex.

Mark_Ariail · May 23, 2026, 3:16am

Of course, there's a catch with the new speed: see the small tooltip, "Limited time". It's another way to get our hopes up with real performance before telling us that it costs more. Getting very, very frustrated, Google!

UnitBuilds · May 25, 2026, 10:55am

Strange, I'm on an Ultra personal account (upgraded to the $100 plan over the weekend) and I'm struggling to hit my 5h quota (in a good way). And that's across 3 PCs and 6 projects at the same time.

UnitBuilds · May 25, 2026, 11:01am

The real issue with Antigravity, and Google in general, shows up if you look at their terms of service. Gemini and Antigravity both state that Google has full rights to your prompts and the model's responses. Not just for model training, in general! Which means if you're working on anything groundbreaking, Google has every right to literally copy and paste the code your agents produce. Even on a paid account.

I used to use Gemini to scope a task before passing it to Antigravity. After 3 months of building an AI agent orchestrator, Google released an oddly similar clone of it called Gemini Builds. Obviously a coincidence; why wouldn't they make something like that? But it prompted me to look into the Terms of Service.

Theoretically, they can scrape all prompts for anything 'interesting' and pass it along to an internal dev to have a look at. That dev can say 'yes, this is cool' and use it to build their own app.

BReal · May 31, 2026, 11:04am

Nah, you're being .. man. A company like Google has very predictable product ideas. Of course they were going to build an AI agent orchestrator; basically every AI company is doing it. It is true that their TOS are problematic and complex, but it's not just Google; almost all these AI companies have really complex TOS. If you want to build something with AI that needs their APIs, you first need to become a lawyer to understand the TOS of all their products, then you can be a software architect/engineer and build the thing, and pray that in the meantime they don't change their TOS again.
