I am on an Ultra workspace subscription. Gemini 3.5 consumes tokens faster than any other model. The quota has 5 bars, one for each hour. I have a ticketing system for issues to be resolved. One issue → one bar.
It’s the Flash of models in token consumption.
Using Opus 4.6, a more expensive model, gets me 5 issues resolved per bar (or more).
Flash is faster, yes. But the token consumption makes it unusable.
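To put rough numbers on the bar arithmetic above, here is a minimal sketch. It assumes the figures from this post hold on average (5 bars per window, one issue per bar on Gemini 3.5, five issues per bar on Opus 4.6). The function name `issues_per_window` is made up for illustration, and nothing here reads real quota data.

```python
def issues_per_window(bars: int, issues_per_bar: float) -> float:
    """Issues resolved before the quota window runs out."""
    return bars * issues_per_bar


BARS = 5  # bars in one 5-hour window, as described above

flash = issues_per_window(BARS, 1)  # Gemini 3.5: one issue uses up one bar
opus = issues_per_window(BARS, 5)   # Opus 4.6: at least five issues per bar

print(f"Flash: {flash} issues, Opus: {opus} issues, ratio: {opus / flash}x")
```

Under those assumptions the same window yields 5 issues on one model and 25 on the other, which is the whole complaint in one line.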
The models are pretty good nowadays, and for normal, non-critical tasks, they are pretty much equivalent. I do not care about one-shotting a macOS replica or a compiler. I want to be able to complete a 5-hour work interval one task at a time.
If Flash were 5x faster, then consuming 5x the tokens of Opus would make some sense. But it’s far from that.
No, it would not make sense even if it were faster. The speed is irrelevant. You can output a billion tokens in 1 second, but if the result equals what another model does with 1/20th of the tokens, at half the cost, it still makes no sense.
A Flash model should NOT be so expensive, no matter how you look at it. If it cost 5x less, it’d be an amazing model, but as it is, it’s just unusable for most people. Maybe very rich people can afford to play with it, but most normal people are not even close to being able to afford paying so much. It’s not that much better than the previous 3 Flash, and to complete the same tasks it costs like 20x more tokens. It’s just a nonsense model.
I am not talking about token speed but “ticket completion speed.”
Model A completes task A perfectly in X time, costing N.
Model B completes task A perfectly in 5X time, costing N.
From a cost-per-task-completion standpoint, they are equivalent. You could even say the fast model is better, because it gives you the option to do more in the same time (but spend more).
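The Model A / Model B comparison can be sketched like this. The values chosen for X and N are placeholders I picked so the arithmetic is easy to follow, not measurements of any real model, and the `Model` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    minutes_per_task: float
    cost_per_task: float

    def tasks_per_hour(self) -> float:
        return 60 / self.minutes_per_task

    def cost_per_hour(self) -> float:
        # What you spend if you keep the model busy for the whole hour.
        return self.tasks_per_hour() * self.cost_per_task


X = 12.0  # assumed: minutes Model A needs per task
N = 1.0   # assumed: cost units per task, identical for both models

model_a = Model("A", minutes_per_task=X, cost_per_task=N)
model_b = Model("B", minutes_per_task=5 * X, cost_per_task=N)

for m in (model_a, model_b):
    print(m.name, m.cost_per_task, m.tasks_per_hour(), m.cost_per_hour())
```

Cost per task is the same for both. The fast model only costs more per hour because it finishes more tasks in that hour, which is the "do more, but spend more" option.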
The problem is that this analogy (or counter-analogy) does not hold between Opus and Gemini 3.5 Flash.
I think token speed is irrelevant. Maybe it is relevant for very rich people, but the vast majority of users do not have the money to afford “spending more to do something faster”. Most people have a set budget and want to be able to do as much as possible, as well as possible, in whatever time it takes. The difference in speed is maybe only relevant for companies running a critical, speed-sensitive business, but what critical business would ever use a Flash model? I don’t think there is any.
Anyway, the point is that for coding specifically, people are more than OK with waiting 5 more minutes for a task if they pay 5x less.
Completing the same task in 5x time should not result in a higher price, even if that applied to 3.5 Flash, and like you said, it does not.
That is definitely not true. We all care about speed. Otherwise, they could give us a slow Gemini that does the work 5x slower, and there would be no quota issue anymore. Everything would be back to normal.
OR you can simply work for 1 hour and rest the next 4, since the result is the same as with a “slow” Gemini.
The number of tokens generated during reasoning is a very important metric for AI agents.
Only if you have endless money. If you’re a normal guy who can only afford a 20-dollar subscription, you do not care about speed. They can give the option to pay 5x more for speed if they want, but I’m sure everyone was OK with the speed of the 3.0 Flash model. It had no speed problems whatsoever, unlike 3.5, which is a little faster but ends up costing 15x more for the same task. You wrote that you’re on Ultra workspace, so you paid 100 or 200 dollars for this. Well, let me bring you up to speed with the real world: most people can’t afford to pay for 200-dollar AI subscriptions.
It’s not about money. Let me give you an analogy.
You can drive a car that gets you from point A to point B in 10 minutes and costs 100 dollars.
Or you can take another car that goes from A to B in 1 hour and costs 100 dollars.
You get the same job done, just faster, but you wait 50 minutes until you continue.
With the slow car, you just have the feeling you are doing something all the time. It’s not about money. Both cost the same. With the first car you drive 10 minutes and wait 50. With the second you drive all 60 minutes.
If you have money, you can keep going with the fast car and produce 6x more, BUT if you don’t, the result is the same (plus free time).
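The car analogy comes down to which limit you hit first, time or money. A rough sketch, using the 10-minute, 60-minute and 100-dollar figures from the analogy; the hourly budgets and the helper name `trips_per_hour` are assumptions added for illustration.

```python
def trips_per_hour(minutes_per_trip: float, cost_per_trip: float,
                   hourly_budget: float) -> int:
    """Trips completed in one hour, capped by both time and money."""
    by_time = int(60 // minutes_per_trip)
    by_budget = int(hourly_budget // cost_per_trip)
    return min(by_time, by_budget)


# Tight budget: 100 dollars per hour (assumed).
fast_tight = trips_per_hour(10, 100, hourly_budget=100)
slow_tight = trips_per_hour(60, 100, hourly_budget=100)

# Deep pockets: 600 dollars per hour (assumed).
fast_rich = trips_per_hour(10, 100, hourly_budget=600)
slow_rich = trips_per_hour(60, 100, hourly_budget=600)

print(fast_tight, slow_tight, fast_rich, slow_rich)
```

With the tight budget both cars make one trip per hour, so the fast car only buys you 50 minutes of waiting. With the larger budget the fast car makes six trips and the slow car still makes one.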
That is assuming your model listens to you and doesn’t just go off like a model on a mission, changing everything before you know what has happened. Then I’m going back to my git and starting from scratch, but now with 0 credits. Not just for Flash, but for all models, because they pooled them together.
Yes, of course speed at no additional cost is great, but I don’t think these models give you real speed. It is fast at producing tokens, but those tokens are complete garbage compared to proper models, so in the end it’s actually much slower than it looks. AND at the same time, this is not speed at no additional cost, because it is 15-20x more expensive than the previous Flash model, and maybe only slightly better on some occasions. It’s a model made to max out benchmarks and throw smoke in people’s eyes with fast token output. Pretty sure nobody at Google is using 3.5 Flash to output any production code, let alone using Antigravity. They use Codex.
Before the update disaster today, I was using Gemini Pro and Claude Sonnet for “brain work” and Gemini Flash as the workhorse for more basic processing, as it refreshed every 5 hours. What do all of you believe is the workhorse now? Reading the messages above leads me to believe that maybe Gemini 3.5 Flash (High and Low) may use more tokens than Gemini 3.1 Pro (High or Low). As a programmer, my brain thinks in a logical flow, and maybe I’m just a dumbazz, but I can’t even figure out the best way to spread the usage between models to keep a project workflow going continuously. I’m a Google AI Pro user and I truly want to use AI to enhance my workflow, not necessarily to do all of the work. Any comments would be great! Thank you ~ Mark
What I can tell you now is that even simple vibe coding, with no multi-agent tasks, on an Ultra subscription, will exhaust all models. No model lasts more than 2 hours of work. Gemini 3.5 depletes faster than all the other models.
I am strongly considering switching. I am not sure where to: Cursor? Kimi? DeepSeek? I am not sure yet.
You know the worst part of this model? It is fast, but it gets most things wrong! We have custom MCP tools that are properly described, but this model is incapable of using them correctly. The old Gemini 3 Flash had no problem with them. Claude has no problems with them. I wasted hours of my life today teaching an “intelligent” model how to use basic tools… and for some reason I think it isn’t a model issue. It smells like the problem is Antigravity, which cannot properly pass info to the model. It happened with Antigravity 1.0 (the difference in token consumption between it and Gemini CLI was noticeable) and it is happening with 2.0 again. Get rid of this product and let the community handle it.
Of course, there’s a hook to the new speed…see the small tooltip…“Limited time”. Another way to get our hopes up with real performance before telling us that it costs more. Getting very, very frustrated, Google!