Two weeks later - And What I Did in the Interim - An Ode to Code Quality

I finally got fed up with antigrav a couple of weeks ago - the rate-limit issues were intolerable. I've popped back to ask: have things improved?

**What I've been doing in the meantime**
This is not a promo for any service, just a little feedback. For context: I used to be a coder, but I no longer have the mental capacity for it, so I'm writing an app to sort out family photos for my father-in-law for fun, relying totally on AI to do the grunt work.

I've got a ChatGPT account, and they've released a product called Codex which uses my existing, underutilised credit. It was really easy to point it at my antigrav folder and pick up the project. But I found it quite slow at doing fairly simple changes. So I got it to scan over the whole project that antigrav had produced and look for ways to improve the speed.
The results were not great.
It was at that point I realised that neither antigrav nor Codex produces code out of the box with any measure of quality. It's quite astonishing: we now have the potential to write really high-quality code everywhere, for everything, and we're choosing to shrug our shoulders and not. And don't think that adding 'always write good quality code' to your custom instructions will fix it. The intelligence might be artificial, but it's also pretty dumb.

I then spent a couple of days getting it to rework the project to be more 'AI-friendly'.

I then thought about how to stop things getting in such a mess again. I installed the Sonar extension and everything lit up like a Christmas tree. So I delved into linting.

I found a couple of Rust-based linters that are much faster than ESLint - great - but with weaker coverage - boo. Then I found config for ESLint (and other JS linters) - hooray! - but that made the JS linting sloooow - boo!

One of the main tools for making things better for AI seems to be putting limits on file/function size and complexity. That seems to help because the AI can be more targeted when making changes and, how shall I put this, not have a fit and randomly change stuff that's nothing to do with the current change.
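
To make that concrete, here's roughly what those limits look like in an ESLint flat config. The rule names are real ESLint rules; the thresholds are just illustrative, not a recommendation:

```js
// eslint.config.js - illustrative size/complexity caps (tune the numbers)
export default [
  {
    files: ["**/*.js"],
    rules: {
      "max-lines": ["error", { max: 300, skipBlankLines: true, skipComments: true }],
      "max-lines-per-function": ["error", { max: 50, skipComments: true }],
      "complexity": ["error", { max: 10 }], // cyclomatic complexity cap
      "max-depth": ["error", 4],            // nesting depth cap
      "max-params": ["error", 4],
    },
  },
];
```
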
I diligently applied the linting across the project, and after another couple of days of Codex finding and fixing, I had the 'perfect project'.

Then I started adding functionality. That's when things got nasty. At the heart of my app is a workflow system. It's there so I can have very slow-running jobs like face detection running in the background, while keeping me up to date with what's happening. As I asked Codex to produce a nice visual interface for the workflow, it became apparent that what antigrav had previously cooked up was, to be tactful, subpar.
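
To give a sense of the shape I mean (a simplified sketch, not my actual code): each slow job emits progress events that the UI subscribes to.

```js
// Simplified sketch: long-running jobs emit progress events so the UI
// can stay up to date while they grind away in the background.
import { EventEmitter } from "node:events";

class Job extends EventEmitter {
  constructor(name, work) {
    super();
    this.name = name;
    this.work = work; // async (report) => result
  }
  async run() {
    this.emit("start", { job: this.name });
    const result = await this.work((pct, note) =>
      this.emit("progress", { job: this.name, pct, note })
    );
    this.emit("done", { job: this.name, result });
    return result;
  }
}

// Usage: a slow "face detection"-style job reporting as it goes.
const detect = new Job("detect-faces", async (report) => {
  for (let i = 1; i <= 10; i++) {
    await new Promise((r) => setTimeout(r, 100)); // stand-in for real work
    report(i * 10, `processed batch ${i}`);
  }
  return { faces: 42 };
});
detect.on("progress", (e) => console.log(`${e.job}: ${e.pct}% (${e.note})`));
detect.run().then((r) => console.log("done:", r));
```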

Another couple of days went into building a new workflow system that didn't totally suck. Another day went into getting it to remove the old workflow code that still hadn't been removed. I finally get back to a full, working workflow and press GO. Doesn't work. It failed to migrate the functionality from the old workflow to the new, but it is able to go back through the repo, pull out the deleted code, and resurrect it. Press GO - doesn't work. But I've now got visibility of the workflow, yes? Nope.

So I've replaced one truly exasperating experience with another. After yet another two weeks, and consuming the water and power of a small town, it's no further forward. I mean, it is, but the goal I originally had is still not done. I discovered SKILLS in the meantime - they're fun!

Maybe I've come at this all wrong. I was offered the ability to build apps without getting my hands dirty, and yet the last few months have been 10% fast, exciting building and 90% tedious unpicking of what the AIs have done. Am I doing something wrong, or does nobody else care about code quality any more?
More than that, when you look at the way the AIs build stuff, it's just so dumb. It's painful to watch. They generate code, they lint it, they rework it, they lint it, rework, lint… We shouldn't be accepting that. These things should be able to produce amazing code quality out of the box, without linting or special instructions. They should be better than the best of us, not worse than the worst of us*. (*That's me. If an AI is a worse coder than me, that's a big red flag.) To be useful and efficient, an AI should understand our codebases - like, really understand them. It doesn't need every line of code to do that, just an awareness of the structure, the conventions, architecture, design, intention and vocab. It shouldn't be throwing stuff at a wall to see what sticks.

It scares me that companies are using this stuff and sacking coders as a result. Code quality was never great, but what's the landscape going to look like in a couple of years' time, when this drivel is everywhere?

I'm not writing this to have a moan, but out of genuine concern. We have a moment to raise the bar on coding, forever. But we need to choose that, demand that. I don't care if it's Google, OpenAI or somebody else who gets us there.


honestly this is a really well-put writeup and i relate to pretty much all of it

the linting loop problem is something i’ve noticed too — the ai will lint, rewrite, lint again, and somehow end up further from where it started. it’s like watching someone reorganize a drawer by dumping everything on the floor first

the thing about file/function size limits is genuinely one of the more useful things i’ve landed on. smaller, focused files = more predictable AI edits. big sprawling files are basically asking for chaos

the workflow system frustration… yeah. i’ve had the same experience where the AI builds something that technically works once, then you add one thing and the whole underlying structure falls apart because it was never really designed, just assembled

to your original question — things have gotten somewhat better on antigravity with the rate limits recently, but code quality hasn’t meaningfully improved. the intelligence is definitely more capable now but it still doesn’t “think architecturally.” it solves the immediate task at the cost of the bigger picture

i don’t think we’re doing it wrong. i think we’re just early and the tooling hasn’t caught up to the capability yet

I've been working with Antigravity itself over the last few days, trying to tighten my own project design to improve agentic development. I specifically asked it how I should structure my project. Here are the rules we eventually settled on, which I added to my GEMINI.md file:

## Agent-First Development Principles

- **Atomic Modularity**: High-granularity, decoupled modules with single, verifiable responsibilities.

- **Context Encapsulation**: Minimize external dependencies; use descriptive structures.

- **Weightless Documentation**: Use **Active Metadata** and **Mermaid diagrams** instead of prose. A diagram is worth a thousand tokens.

- **Interface Contracts**: Document the "Contract" (Inputs, Outputs, Error States) at the boundary. If the contract is broken, the system is heavy.

- **Intent over Prose**: Prioritize explaining the *rationale* behind a design choice over describing what the code does.

- **Read-on-Demand**: Structure code to favor shallow execution paths. Avoid "Mega-files."

- **Agentic State Management**: Use directory-local rules and context files to provide "just-in-time" instructions relative to specific subsystems.
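
To make a couple of these concrete, here is what "Interface Contracts" plus "Intent over Prose" might look like at a module boundary. The function below is invented purely for illustration:

```js
// Hypothetical module boundary (invented for illustration) documenting the
// "Contract" and the intent, rather than narrating the code line by line.
import { readFile } from "node:fs/promises";

/**
 * loadPhotoMeta - Contract
 * Inputs:  metaPath (string; path to an existing JSON sidecar file)
 * Output:  parsed metadata object, e.g. { takenAt, tags }
 * Errors:  rejects with ENOENT if metaPath is missing;
 *          SyntaxError if the file is not valid JSON
 * Intent:  metadata lives beside each photo rather than in a database,
 *          so the library can be rebuilt from the files alone.
 */
export async function loadPhotoMeta(metaPath) {
  const raw = await readFile(metaPath, "utf8"); // ENOENT propagates to caller
  return JSON.parse(raw); // SyntaxError propagates to caller
}
```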

Note that the Claude models and Gemini models have substantially different communication styles, with Gemini being the more verbose. Verbosity uses more tokens, so ask Claude to provide a prompt to reduce Gemini's verbosity. Here's what it gave me (which I also added to GEMINI.md):

## Response Economy

- **Lead with substance**: First sentence answers the question. Zero preambles.

- **Format hierarchy**: Table > bullet list > prose. Never use prose paragraphs for multi-point content.

- **No restatement**: Never echo the user's question or summarize what was just said in a closing paragraph.

- **Hard length check**: If a response can be 3 sentences, it MUST NOT be 6.

- **Apply, don't explain**: Never narrate why a rule exists. Apply it.

I’ve also been working to refine my rules and workflows:

> As an elite Antigravity AI prompt engineer, audit .agents/rules and .agents/workflows and propose changes to minimize agent context and cognitive burden and token use. No sacred cows. Reply only, make no changes.

Then (repeat until satisfied):

> As an elite Antigravity AI prompt engineer, audit .agents/rules and .agents/workflows for gaps and propose fixes. No sacred cows. Reply only, make no changes.

Then:

> As an elite Antigravity AI prompt engineer, review .agents/rules and .agents/workflows. Propose tools to offload work from the agent. No sacred cows.


Hello,

Thank you for bringing these concerns to our attention. Please be assured that I have shared your feedback with our internal team for further review.
We appreciate your continued patience as we work to enhance the Antigravity experience.

i've been using, like, multiple different services so i can leverage their different strengths.

starting projects - building a rough idea of the major tools, scaffolding, brainstorming: gemini and copilot and grok are great for that. they can all give you rough starting points that can be saved as a thick "notes" file for bringing fresh root context to a future builder ide. but - they're doing it one character at a time, not really acing the "edit" portion of refactors for true builds. so i usually don't even scaffold unless i LOVE a UI they toss out there. when they get a project's diffing bootstrap monitor started and a working CLI, the project is off to a great start and usually has high efficiency no matter what else comes later.

antigravity and gemini flash are a powerhouse and FAST, but not really a closer. the last 5-10% of any project's context gets lost, and it has a tendency to accidentally CTRL-A DELETE entire sheets haha .. ha.. (it still hurts to think about). anyways - anti is great for the boring cruise-control section of the roadtrip, the long-haul stuff, and getting it done quickly. i'll say pro kinda thinks too much, and in a bad way - overthinks a lot… needs a lot of handholding, and a lot of freedom too. it's like having a cat haha.

very similarly but differently - google's jules is really cool! though uploading is broken for me right now, i can connect a github or google drive link to get it moving :smiley: it's a long-haul trucker; it can grind for days - but no warnings! if it maxes out on context, the whole line is gone. and it's common that it may not even pop up a download button, so a git connection is almost necessary. it's more like "the slow cooker": good for when you want a build cleaned up, refined, or evaluated, or just need to back off a project because you're going crazy - a good way to pause and let it do its thing.

claude and kimi - amazing closers. they are really good at absorbing a project, understanding its real context, and they are incredible at that final piece of "uh oh, i messed up the ui 2 weeks ago, let's fix that before we move on". i haven't tried starting from scratch with them - i save those uses for when i need a real mixture of experts to swarm a completion and stability is at risk.

trae has been a pretty good ide too - though i haven't tried it much since they dropped gemini3 around the 3.1 update. they do have kimi in there now, so my trust goes up - if i had to token-burn to get something done by just throwing money at it, trae is the club for that, with great credit-usage tracking and monetary stats like that. for general-purpose stuff it's great for newbies, but for large projects i think it falls a bit short. their ide is great, and i really need to check out how well it works as a standalone ide as a free user, but that will come when i get my desk and inbox cleared, as i'm still digging myself out of the delay from early this month.

so lastly, when it comes to debugging, i mostly enjoy classic LLM. really - even those silly romance llm setups on android will break character to code stuff :smiley: local is great for debugging, but when it comes to doing any repairs from this stage onward, i'm doing it by hand or with claude or kimi. but i do still utilize classic LLM like google ai mode on the homepage, or grok, or copilot. it's great to get fresh eyes, but it can be tough: if they are TOO supportive they will help you destroy a project O_O so always get a second opinion and screen it through another ai :smiley:

that's just kinda what i've found works for me - leveraging weaknesses to become assets and leveraging strengths for the most efficient, effective, economical use of tokens/context/credits/requests in the workflow. it's essentially necessary. this also has an added benefit: just because one ai is down, the whole workflow isn't disrupted! i've been really happy with gemini as a swiss army knife for getting everything done, but if i need an impact wrench i'm probably going to rely on a higher-context model, and if i need a bird's-eye view, i'm asking them all, just as LLMs, for their review and feedback!

EDIT:
OH i almost forgot to say! SAVE OFTEN - and i don't mean save, i mean MAKE BACKUPS REGULARLY. that's one of the perks of using multiple different ai in a workflow: by repackaging a project to move it to a new folder for a new AI, you get a great active changelog, so if any portions of code are missing, or you need to diff "what changed from v1 to v9 that broke this", you'll have it already.


thanks Steve, I’ll have a look through the detail of this soon. Isn’t gemini chatty though?! It’s worse than my nan!

I started on v1 of this project back when antigrav was first released - no rules, I just outlined what I wanted and let it do its thing. To start with it was good, but it got to a point where it suddenly started really struggling - changes became slow and unreliable. It's behaviour I've seen in ai.studio and Windsurf too. I guess once the codebase becomes more than trivial - bigger than the AI's context window - the AI rapidly loses its overall sense of the project.

While using Codex, I've come across a couple of things that could potentially transfer back into Antigrav to help:

- **Skills** - I've not noticed direct Antigrav support, but the docs say it follows the same standard. They're tuned prompts that kick in to make standard tasks more structured. There are several libraries of these around, with skills covering everything from design to security.
- **Semantic understanding** - I'm currently looking at CodeGrock, but there are other projects too. It establishes, via MCP, a local index of the project: code, docs, etc. The aim is to cut the AI searching the whole codebase - and reduce inbound tokens. My laptop is currently struggling to do the indexing because it's old and decrepit (I know that feeling!), but I'm really interested in the prospect.
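
I don't know how CodeGrock actually builds its index, but the general shape of a local symbol index is easy to sketch: map names to the files that define them, so lookups don't require scanning the whole tree. A deliberately naive toy version (regex where a real tool would parse an AST):

```js
// toy-index.js - purely illustrative, NOT how CodeGrock works.
// Maps exported names to the files that define them.
import { readFileSync, readdirSync, statSync } from "node:fs";
import { join } from "node:path";

function indexDir(dir, index = {}) {
  for (const entry of readdirSync(dir)) {
    const p = join(dir, entry);
    if (statSync(p).isDirectory()) {
      indexDir(p, index);
    } else if (p.endsWith(".js")) {
      const src = readFileSync(p, "utf8");
      for (const m of src.matchAll(/export\s+(?:async\s+)?(?:function|const|class)\s+(\w+)/g)) {
        (index[m[1]] ??= []).push(p); // symbol -> defining files
      }
    }
  }
  return index;
}

// Usage: node toy-index.js ./src
console.log(indexDir(process.argv[2] ?? "."));
```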

For v2 of the project, I started afresh and added rules based on what I'd learned in v1. I also got antigrav to rewrite my rules so they'd be most likely to be understood by itself. Initially the results were better, but again we hit the cliff edge.



Thanks. Really interesting to read how you are working.

I can see a future where we’re assembling teams of autonomous agents from different providers and watching over them work on the bits of projects they are good at.

I've also had disasters in the last 10% - I suspect it's the cliff edge I mentioned above, where project size/complexity overwhelms the AI and it melts down.

I must have a proper look at Claude and Kimi too (not heard of that one).


Ask the AI how to structure your codebase for agent-first development.

in welding, there's a term called a "jig". a jig is a custom tool made just for accomplishing one task. technically the term exists in a lot of skilled trades - but welding is the bluntest example.

like if a welder needs a wrench at such a strange angle that normal solutions don't work, they will just weld a few pieces of metal into a custom handle to reach that tricky spot. sometimes more specific, sometimes more vague - but the point is, it's the practice of creating tools to work with existing tools (or nonexistent tools). ai coding llms have a tendency to do this a lot anyways, so i will often encourage them to create a one-time tool to accomplish a specific task, and sometimes that ends up turning into a whole thing that has wider use on any project.

this is one of my diffwares, for example - it has the classic New/Same/Lost of a normal diff, but it also checks for "void" code, where a codeblock is just a lonely island not directly connected to any chain/flow. i built this when i was using the raw gemini chat llm to do coding in conversation mode without an ide, so i could keep track of when the project starts truncating or changing things. stuff like this is absolutely super necessary, especially for non-IDE llm code tracking.


(screenshot: diff map, folding tree, assembles by tracing and nesting runtime)
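
just to show the idea, here's a toy version of that void check - nothing like the real tool, regex where a proper parser belongs:

```js
// toy "void code" check: flag top-level functions that are declared but
// never referenced elsewhere in the same file. deliberately naive - a
// real tool would parse an AST instead of using regex.
// Usage: node void-check.js somefile.js
import { readFileSync } from "node:fs";

const src = readFileSync(process.argv[2], "utf8");
const declared = [...src.matchAll(/function\s+([A-Za-z_$][\w$]*)/g)].map((m) => m[1]);

for (const name of declared) {
  // count references, excluding the declaration itself
  const refs = [...src.matchAll(new RegExp(`\\b${name}\\b`, "g"))].length - 1;
  if (refs === 0) console.log(`void island: ${name}() is never referenced`);
}
```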

this one is a bootstrap: it loads the integrity monitor first on launch, so i can see the diff stats of what's unchanged (blue bar), newly added or changed and expanded (green bar), and removed or changed with negative loss (red bar).


(screenshot: bootstrap example - known elements tracked, unknown values noted, simple on-launch overview and higher-detail view for debugging specific changes)

and this last one is a massive diff for working with all sorts of different documents. it's got its own self-test to verify its javascript and html haven't changed from the last programmed "bill of good health". it's got a lot of other stuff too that's still being made for different uses, like version checking, patch rollbacks on the target files, a changelog generator, even some light code assist, as well as unique code evaluations and statistics.


(screenshot: full custom diffing and editing, with different stats for code-context awareness)