Is Gemini 3.5 Flash Actually an Improvement?

Well… I was really looking forward to using Gemini 3.5 Flash. Before this, Gemini 3 Flash already helped me solve a lot of small problems perfectly.

But 3.5—even though you claim it’s better—doesn’t feel that way in practice. It couldn’t even fix something as simple as a list width, and instead kept changing unrelated parts of my project.

As a result, my remaining token quota dropped from 100% to 60%, all from trying to fix a single list width issue.

Is it really better than Gemini 3 Flash?

Your feedback will be deleted very soon. Yesterday I also posted that Flash 3.5 is terrible: it handles the same issues much worse than Cursor Composer 2.5. They can’t solve the model’s problems, but they can solve the people who raise the problems.

Agreed, Gemini 3 Flash was much better. It made fewer mistakes, and even after it made mistakes you could at least keep using it to fix them. Now 3.5 makes mistakes, then you hit the rate limit, and good luck, you don’t have an app anymore LOL… I don’t even know what to say, other than: don’t use 3.5 Flash for anything other than basic non-coding stuff.

In my opinion, and for the way I work, it’s much faster and the results are much better overall. The new 3.5 is very agent-oriented, which is probably why it performs very well with my optimized harness. It’s super fast. I use (medium) for 90% of tasks and (high) for deep planning.

Simple answer: No it is not better.

Usage Limits - Worse

Unrelated Changes - More

Mistakes - More

It is just faster, but that does not matter when the price is much higher and the outputs are not better.

In my projects it even lies about having finished something, and it often does not follow instructions.

Yes, it’s better. I am using Claude and Codex, and I noticed that I use Flash 3.5 more now because of its speed. I didn’t like v1; it was the worst product ever.

I think 2.0 is better. It is still buggy: it hangs and I have to close it. But it is better, even though it is missing too many features that other coding agents handle better,

especially a missing feature like fork.

I often open the same project in different IDEs and have different AIs solve the same problem. Without direct comparison, you really won’t notice the differences. I compared Gemini 3.5 Flash and Cursor Composer 2.5 on the exact same issue. Cursor Composer 2.5’s solution was excellent, while Gemini 3.5 Flash was very poor: it only stayed on the surface and didn’t understand the essence of the problem at all. The gap between the two is enormous, not to mention when compared to GPT 5.5 and Claude 3.7.
You can also try it this way yourself, and you’ll notice the difference.

It feels average. The quality is so good that Google may not be able to withstand heavy use.

After using it for a while, I did notice that Gemini 3.5 Flash feels faster.
With the old Gemini 3 version, I used to ask it separately to do things like commit, push, and create merge requests. But with 3.5, it seems to remember that I often ask for all three actions together, so now even if I only say “commit,” it sometimes goes ahead and does the push and merge request as well.

For this kind of behavioral change, I can just adapt to it over time, and I understand that in some situations it can actually be more convenient.

However, when I simply want it to modify my code, it actually feels worse than the old Gemini 3 version. It keeps making endless unnecessary changes to unrelated parts of the code. So from my experience, compared to Gemini 3, the 3.5 version actually feels worse at writing code, while also becoming more overprotective and meddlesome.

At the very least, its coding ability shouldn’t be worse than Gemini 3. My main use case isn’t just commit, push, and merge requests. :sweat_smile:

Right now, I’m spending half an hour working, and then taking a break for four and a half hours.

Count me in. I’ve come to the same conclusion: 3.5 Flash is practically useless.
At first, I thought I just needed to get used to it. But no matter what I try—even the “Low” variant that’s supposed to be smarter for simple tasks—it just burns through my Pro plan limits without actually solving anything. It acts busy but delivers very little.
Meanwhile, I switched back to Gemini 3.1 Pro in Antigravity. The difference is night and day. 3.1 Pro is much more careful with token usage and actually helps me solve real problems. It doesn’t pretend to work—it just works.
And the rate limits are just the final insult. The move to 3.5 Flash introduced extremely low limits that completely disrupt workflows. I’ve had days where I couldn’t work at all because my quotas were drained by a model that couldn’t even fix a simple issue.
For any serious work, I now rely on other models in the browser. And when I do use Antigravity, it’s 3.1 Pro all the way. It’s more expensive per token on paper, but in practice, it’s actually cheaper because it gets the job done in a fraction of the attempts.
Bottom line: 3.5 Flash just drains quotas and pretends to work. 3.1 Pro gets things done.
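
The "cheaper in practice" point is just arithmetic: what matters is cost per solved task, not cost per token. A rough sketch of that comparison follows. Every number in it is a made-up placeholder chosen for illustration, not a real Gemini price or a measured result, and the function name is hypothetical.

```python
def cost_per_solved_task(price_per_1k_tokens: float,
                         tokens_per_attempt: int,
                         attempts_needed: int) -> float:
    """Total spend needed to get one task actually solved."""
    return price_per_1k_tokens * (tokens_per_attempt / 1000) * attempts_needed


# Hypothetical values purely for illustration; not real pricing or benchmarks.
cheap_model = cost_per_solved_task(
    price_per_1k_tokens=1.0, tokens_per_attempt=20_000, attempts_needed=5
)
pricey_model = cost_per_solved_task(
    price_per_1k_tokens=3.0, tokens_per_attempt=20_000, attempts_needed=1
)

# With these placeholder inputs the model that is 3x the price per token
# still ends up cheaper, because it needs fewer attempts.
print(cheap_model, pricey_model)
```

Plug in your own token counts and attempt counts from a real session to see which side of the trade-off you are on.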

Composer 2.5 in Cursor is ultra cheap and works 10x better than Gemini 3.5 Flash; the difference is night and day. And Composer 2.5 costs like 20 times less than Gemini 3.5 Flash. Like, how is this even possible, Google?? I really don’t understand.

Maybe the next step for them is Gemini 3.5 Flash Lite (I hope it is somewhat like the old Gemini 3 Flash)

Honestly, it totally depends on your use case. What I have observed is that Gemini Flash models are really good at rapid writing and rapid code execution, maybe with some hallucinations, because they aren’t built for heavy reasoning. The model itself is good, but the rate limits users get are poor.
Honestly, how you rate it depends on what you are comparing it against. I mostly use it to improve my prompts, for code corrections, and for smaller tasks, and it performs really well on those, where you don’t actually need top-tier reasoning!
And I see that what most people actually find terrible is the ecosystem or the infrastructure, like rate limits and quotas. The model isn’t the problem all the time; honestly, it’s Google’s poor management, and they should take action to improve things.
But yes, obviously there are improvements in the model, though it is still weak in infrastructure and in its long-term relationship with the user base.
So overall, 7.5/10.

I’ve always used the Gemini Flash series for simple tasks. So when I was using version 3, I was very clear about what kinds of tasks it could handle perfectly.

Today I upgraded to 3.5, and I expected it to be at least as good as version 3, or even better. But I only asked it to modify the list width of the ejs-dropdown-tree component in the EJ2 package in my old project. My requirement was very simple: I just didn’t want it to stretch across the entire div.

I even clearly pointed out the exact location in the code and explicitly described my requirement. If it were version 3, I would at least see some visible changes on the UI. Even if it didn’t get it right the first time, I could usually feel some improvement after the second attempt.

But in the case of 3.5, I asked it to fix this width issue 3–4 times, and I still couldn’t see any real change. I even started to suspect that I had pointed to the wrong part of the code, but unfortunately, I hadn’t. In the end, I had to switch to another model to get it done.

From my understanding, this kind of task should be considered a small, straightforward one. I didn’t even ask it to review or refactor all ejs-dropdown-tree instances in the entire project—just a single location.

So from my experience, it didn’t feel like it had better coding ability than version 3. In fact, it felt like a downgrade in this specific case. That said, it might have improved in other areas that I simply didn’t notice, since it does seem to “think” faster on the surface. :sweat_smile:

You actually touched on something very important here, and honestly I think many people misunderstand how these newer “fast” models behave internally.

When you said it was an old project, that already changes the difficulty a lot.

Modern coding models do not only look at the exact line you mention. Most of them silently build context from:

  • nearby files

  • imports

  • component hierarchy

  • dependency usage

  • existing patterns across the repo

  • UI structure

  • sometimes even inferred architecture decisions

So even if your actual edit is tiny, the model may still be processing a surprisingly large contextual graph behind the scenes.

That is where the difference between “fast optimized models” and “deep reasoning models” starts becoming very noticeable.

Models like Flash are usually optimized around:

  • lower latency

  • fast token generation

  • lower compute cost

  • responsiveness at scale

And because of that, they often prioritize reaching a plausible answer quickly rather than deeply exploring multiple implementation paths internally.

So what happens in real-world coding tasks is:
the model sometimes locks onto the first “likely” interpretation of the issue and keeps making shallow adjustments around it instead of re-evaluating the deeper UI/component behavior.

That is why you felt like:
“it is changing things, but not actually solving the problem.”

And honestly, that feeling is valid.

Because for coding workflows, especially in older projects, developers care less about flashy speed and more about:

  • stability

  • correct context understanding

  • low hallucination rate

  • respecting existing architecture

  • and iterative improvement quality

A coding assistant does not need to be “innovative.”
It needs to be dependable.

That is why many people still prefer stronger reasoning models for development work even if they are slower.

I also think Google should focus more on this balance instead of only pushing “faster” experiences. Flash-class models should ideally become:

  • fast

  • lightweight

  • reliable

  • low-hallucination

  • and context-aware

because that is exactly the category most developers will use for daily practical coding tasks.

Right now, sometimes it feels like speed is dominating the optimization target more than understanding depth.

One suggestion that genuinely helps with Flash models though:
instead of only saying “fix this width issue,” try forcing the reasoning boundaries tighter.

For example:

  • specify expected final CSS behavior

  • ask it to inspect only one component path

  • tell it NOT to modify unrelated logic

  • ask for root-cause analysis before code generation

  • or ask it to explain why the current width behavior is happening first

Fast models usually perform much better when the search space is aggressively constrained.
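
To make that concrete, here is a minimal sketch of a constrained prompt, assuming you assemble the prompt text yourself before pasting or sending it to the model. The names `EditRequest` and `build_constrained_prompt` and the example file path are hypothetical; nothing here is part of any Gemini or Antigravity API.

```python
from dataclasses import dataclass, field


@dataclass
class EditRequest:
    """Hypothetical container for one narrowly scoped edit request."""
    target_path: str          # the only file the model may touch
    symptom: str              # what is wrong right now
    expected_behavior: str    # the final CSS/UI behavior you want
    forbidden: list[str] = field(default_factory=list)  # things it must not change


def build_constrained_prompt(req: EditRequest) -> str:
    """Render the request as plain prompt text with explicit boundaries."""
    lines = [
        f"Scope: inspect and modify ONLY {req.target_path}.",
        f"Symptom: {req.symptom}",
        f"Expected result: {req.expected_behavior}",
        "Step 1: explain why the current behavior happens (root cause).",
        "Step 2: only then propose the smallest change that fixes it.",
    ]
    lines += [f"Do NOT change: {item}" for item in req.forbidden]
    return "\n".join(lines)


if __name__ == "__main__":
    request = EditRequest(
        target_path="src/app/settings/settings.component.html",  # hypothetical path
        symptom="the ejs-dropdown-tree list stretches across the entire div",
        expected_behavior="the list no longer stretches across the entire div",
        forbidden=["unrelated components", "other ejs-dropdown-tree instances"],
    )
    print(build_constrained_prompt(request))
```

The point of the structure is that each bullet above becomes one explicit line in the prompt, so the model has less room to wander into unrelated files.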

I honestly have not had any issues. It has been doing well for me when I have used it in AI Studio.

I never used Gemini 3 so I guess I don’t have a benchmark but 3.5 flash is the main model I use, it does everything I ask pretty well in my opinion.

And I’m not trying to be mean here but please don’t tell me we are using AI to change the width of a list or commit/push etc, that seems like a waste of usage.

Yeah, it’s a task that’s simple to the point where I could even do it manually—just adjusting a layout.

But on the other hand, if it can’t even handle tasks that simple correctly, how am I supposed to trust it with something like going through the entire project and standardizing all the list styles later on? Especially since Flash is originally designed for these kinds of small, simple tasks anyway. For more complex work, I’d use a different model.

Also, with commit/push, it’s actually pretty helpful because it writes a very detailed summary of the changes for me. For someone like me who’s too lazy to write commit messages, that’s a huge benefit. Sorry about that—my laziness leads to not writing proper change logs, which probably wastes usage.

That said, when I used it yesterday, 3.5 actually felt smarter. I gave it some slightly more difficult tasks and it handled them perfectly.

I .. at UI stuff, so when I went back to my old project, now using AI, I asked Gemini to update my UI and add light mode/dark mode, and to be fair, it did a good job in my opinion. The only issue I had came when I switched to a different model (can’t remember exactly which) and asked it to update another section, where I ended up with some mismatched styling. Maybe that’s to be expected.