Is Gemini 3.5 Flash Actually an Improvement?

DIANAxAKKO · May 26, 2026, 3:51am

Well… I was really looking forward to using Gemini 3.5 Flash. Before this, Gemini 3 Flash had already helped me solve a lot of small problems perfectly.

But 3.5—even though you claim it’s better—doesn’t feel that way in practice. It couldn’t even fix something as simple as a list width, and instead kept changing unrelated parts of my project.

As a result, my remaining token quota dropped from 100% to 60% just from trying to fix a single list width issue.

Is it really better than Gemini 3 Flash?

Andrew_Royal · May 26, 2026, 10:42am

Your feedback will be deleted very soon. Yesterday I also posted that Flash 3.5 is terrible: on the same issues it does much worse than Cursor Composer 2.5. They can't solve the model's problems, but they can solve the people who raise the problems.

BReal · May 26, 2026, 1:24pm

Agree. Gemini 3 Flash was much better: it made fewer mistakes, and even when it did make mistakes you could keep using it to fix them. Now with 3.5, it makes mistakes, then you hit the rate limit, and good luck, you don't have an app anymore LOL… I don't even know what to say, other than: don't use 3.5 Flash for anything other than basic, non-coding stuff.

Marcelo64 · May 26, 2026, 2:08pm

In my opinion, and for the way I work, it's much faster and the results are much better overall. The new 3.5 is very agent-oriented; that's probably why it performs so well with my optimized harness. It's super fast. I use (medium) for 90% of tasks and (high) for deep planning.

HUTAOSHUSBAND1 · May 26, 2026, 5:14pm

Simple answer: No it is not better.

Usage Limits - Worse

Unrelated Changes - More

Mistakes - More

It is just faster, but that does not matter when the price is much higher and the outputs are not better.

In my projects it even lies about having finished something, and it often does not follow instructions.

Yasin_Hassanien · May 26, 2026, 9:00pm

Yes, it's better. I am using Claude and Codex, and I noticed I use Flash 3.5 more now because of its speed. I didn't like v1; it was the worst product ever.

I think 2.0 is better, though it's still buggy: it hangs and I have to close it. It's also still missing too many features that other coding agents handle better, especially something like fork.

Andrew_Royal · May 27, 2026, 12:33am

I often open the same project in different IDEs and have different AIs solve the same problem. Without a direct comparison, you really won't notice the differences. I compared Gemini 3.5 Flash and Cursor Composer 2.5 on the exact same issue. Cursor Composer 2.5's solution was excellent, while Gemini 3.5 Flash's was very poor: it stayed on the surface and didn't understand the essence of the problem at all. The gap between the two is enormous, not to mention the gap compared with GPT 5.5 and Claude 3.7.
You can also try it this way yourself, and you’ll notice the difference.

tty · May 27, 2026, 2:28am

It feels average. The quality is so good that Google may not be able to withstand heavy use.

DIANAxAKKO · May 27, 2026, 3:16am

After using it for a while, I did notice that Gemini 3.5 Flash feels faster.
With the old Gemini 3 version, I used to ask it separately to do things like commit, push, and create merge requests. But with 3.5, it seems to remember that I often ask for all three actions together, so now even if I only say “commit,” it sometimes goes ahead and does the push and merge request as well.

For this kind of behavioral change, I can just adapt to it over time, and I understand that in some situations it can actually be more convenient.

However, when I simply want it to modify my code, it actually feels worse than the old Gemini 3 version. It keeps making endless unnecessary changes to unrelated parts of the code. So from my experience, compared to Gemini 3, the 3.5 version actually feels worse at writing code, while also becoming more overprotective and meddlesome.

At the very least, its coding ability shouldn’t be worse than Gemini 3. My main use case isn’t just commit, push, and merge requests.

pwy1984 · May 27, 2026, 9:00am

Right now, I’m spending half an hour working, and then taking a break for four and a half hours.

nishtyack · May 27, 2026, 12:19pm

Count me in. I’ve come to the same conclusion: 3.5 Flash is practically useless.
At first, I thought I just needed to get used to it. But no matter what I try—even the “Low” variant that’s supposed to be smarter for simple tasks—it just burns through my Pro plan limits without actually solving anything. It acts busy but delivers very little.
Meanwhile, I switched back to Gemini 3.1 Pro in Antigravity. The difference is night and day. 3.1 Pro is much more careful with token usage and actually helps me solve real problems. It doesn’t pretend to work—it just works.
And the rate limits are just the final insult. The move to 3.5 Flash introduced extremely low limits that completely disrupt workflows. I’ve had days where I couldn’t work at all because my quotas were drained by a model that couldn’t even fix a simple issue.
For any serious work, I now rely on other models in the browser. And when I do use Antigravity, it’s 3.1 Pro all the way. It’s more expensive per token on paper, but in practice, it’s actually cheaper because it gets the job done in a fraction of the attempts.
Bottom line: 3.5 Flash just drains quotas and pretends to work. 3.1 Pro gets things done.

BReal · May 27, 2026, 2:18pm

Composer 2.5 in Cursor is ultra cheap and works 10x better than Gemini 3.5 Flash; the difference is night and day. And Composer 2.5 costs something like 20 times less than Gemini 3.5 Flash… How is this even possible, Google?? I really don't understand.

DrQwertySilence · May 27, 2026, 5:37pm

Maybe the next step for them is Gemini 3.5 Flash Lite (I hope it is somewhat like the old Gemini 3 Flash).

Aditya_Pagare · May 27, 2026, 6:17pm

Honestly, it totally depends on your use case. What I have observed is that Gemini Flash models are really good at rapid writing and rapid code execution, with some hallucinations, because they aren't supposedly built for heavy reasoning. The model itself is good, but the rate limits users get are poor.
Honestly, how you rate it depends on what you compare it against. I mostly use it to improve my prompts, for code corrections, and for smaller tasks, and it performs really well on those, where you don't need top-tier reasoning!
And I see that what most people actually find terrible is the ecosystem or the infrastructure, like rate limits and quotas. The model isn't always the problem; honestly, it's Google's poor management, and they should take action to improve things.
So yes, there are obviously improvements in the model, but it's still weak on infrastructure and on its long-term relationship with the user base.
So overall, 7.5/10.

DIANAxAKKO · May 28, 2026, 1:22am

I’ve always used the Gemini Flash series for simple tasks. So when I was using version 3, I was very clear about what kinds of tasks it could handle perfectly.

Today I upgraded to 3.5, and I expected it to be at least as good as version 3, or even better. But I only asked it to modify the list width of the ejs-dropdown-tree component in the EJ2 package in my old project. My requirement was very simple: I just didn’t want it to stretch across the entire div.

I even clearly pointed out the exact location in the code and explicitly described my requirement. If it were version 3, I would at least see some visible changes on the UI. Even if it didn’t get it right the first time, I could usually feel some improvement after the second attempt.

But in the case of 3.5, I asked it to fix this width issue 3–4 times, and I still couldn’t see any real change. I even started to suspect that I had pointed to the wrong part of the code, but unfortunately, I hadn’t. In the end, I had to switch to another model to get it done.

From my understanding, this kind of task should be considered a small, straightforward one. I didn’t even ask it to review or refactor all ejs-dropdown-tree instances in the entire project—just a single location.

So from my experience, it didn’t feel like it had better coding ability than version 3. In fact, it felt like a downgrade in this specific case. That said, it might have improved in other areas that I simply didn’t notice, since it does seem to “think” faster on the surface.

Aditya_Pagare · May 28, 2026, 5:05am

You actually touched on something very important here, and honestly I think many people misunderstand how these newer “fast” models behave internally.

When you said it was an old project, that already changes the difficulty a lot.

Modern coding models do not only look at the exact line you mention. Most of them silently build context from:

nearby files
imports
component hierarchy
dependency usage
existing patterns across the repo
UI structure
sometimes even inferred architecture decisions

So even if your actual edit is tiny, the model may still be processing a surprisingly large contextual graph behind the scenes.

That is where the difference between “fast optimized models” and “deep reasoning models” starts becoming very noticeable.

Models like Flash are usually optimized around:

lower latency
fast token generation
lower compute cost
responsiveness at scale

And because of that, they often prioritize reaching a plausible answer quickly rather than deeply exploring multiple implementation paths internally.

So what happens in real-world coding tasks is:
the model sometimes locks onto the first “likely” interpretation of the issue and keeps making shallow adjustments around it instead of re-evaluating the deeper UI/component behavior.

That is why you felt like:
“it is changing things, but not actually solving the problem.”

And honestly, that feeling is valid.

Because for coding workflows, especially in older projects, developers care less about flashy speed and more about:

stability
correct context understanding
low hallucination rate
respecting existing architecture
and iterative improvement quality

A coding assistant does not need to be “innovative.”
It needs to be dependable.

That is why many people still prefer stronger reasoning models for development work even if they are slower.

I also think Google should focus more on this balance instead of only pushing “faster” experiences. Flash-class models should ideally become:

fast
lightweight
reliable
low-hallucination
and context-aware

because that is exactly the category most developers will use for daily practical coding tasks.

Right now, sometimes it feels like speed is dominating the optimization target more than understanding depth.

One suggestion that genuinely helps with Flash models though:
instead of only saying “fix this width issue,” try forcing the reasoning boundaries tighter.

For example:

specify expected final CSS behavior
ask it to inspect only one component path
tell it NOT to modify unrelated logic
ask for root-cause analysis before code generation
or ask it to explain why the current width behavior is happening first

Fast models usually perform much better when the search space is aggressively constrained.

St_Michael_the_Archa · June 1, 2026, 4:52pm

I honestly have not had any issues. It has been doing well for me when I have used it in AI Studio.

jt_d · June 1, 2026, 5:35pm

I never used Gemini 3, so I guess I don't have a benchmark, but 3.5 Flash is the main model I use, and it does everything I ask pretty well in my opinion.

And I'm not trying to be mean here, but please don't tell me we are using AI to change the width of a list or to commit/push, etc.; that seems like a waste of usage.

DIANAxAKKO · June 2, 2026, 1:36am

Yeah, it’s a task that’s simple to the point where I could even do it manually—just adjusting a layout.

But on the other hand, if it can’t even handle tasks that simple correctly, how am I supposed to trust it with something like going through the entire project and standardizing all the list styles later on? Especially since Flash is originally designed for these kinds of small, simple tasks anyway. For more complex work, I’d use a different model.

Also, with commit/push, it’s actually pretty helpful because it writes a very detailed summary of the changes for me. For someone like me who’s too lazy to write commit messages, that’s a huge benefit. Sorry about that—my laziness leads to not writing proper change logs, which probably wastes usage.

That said, when I used it yesterday, 3.5 actually felt smarter. I gave it some slightly more difficult tasks and it handled them perfectly.

jt_d · June 2, 2026, 6:56am

I .. at UI stuff, so when I went back to my old project, now using AI, I asked Gemini to update my UI and add light mode/dark mode, and to be fair, it did a good job in my opinion. The only issue I had was when I went to a different model (I can't remember exactly which) and asked it to update another section, and I ended up with some mismatched styling. Maybe that's to be expected.
