Beyond Single-Turn AI: Architecture for Self-Correction

I’ve been trying to use the current models in a way that bypasses the current “meta” of models doing everything in a single turn.

I think models need some kind of “time sensitivity”, but currently everything a model knows is just one “snapshot” of the problem you give it, and it tries to infer everything from that. Even the “reasoning” models perform all of their reasoning before tackling the problem, and have no idea of what they themselves have outputted.

What I want to explore now is models that “reflect on their own outputs”, so they don’t only “reason about the problem at hand” but also “reason about their own output”. So I’m trying to create an architecture that “Frankensteins” some Gemini models together to break down a problem, work on it, evaluate the result, and refine it if necessary before sending an answer to the user.
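Roughly, the loop I have in mind looks like this. This is just a sketch in Python with the `google-generativeai` SDK; the prompts, the model name, and the `MAX_PASSES` cutoff are placeholders of mine, not the exact ones from my AI Studio project:

```python
# Sketch: decompose -> draft -> critique -> refine, before replying to the user.
# Assumes the google-generativeai SDK and an API key in GOOGLE_API_KEY.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")

MAX_PASSES = 3  # placeholder: how many critique/refine rounds before giving up


def answer_with_reflection(problem: str) -> str:
    # 1) Break the problem down before attempting it.
    plan = model.generate_content(
        f"Break this problem into concrete sub-steps:\n{problem}"
    ).text

    # 2) First attempt at a full answer.
    draft = model.generate_content(
        f"Problem:\n{problem}\n\nPlan:\n{plan}\n\nWrite a complete answer."
    ).text

    for _ in range(MAX_PASSES):
        # 3) A second call reads the draft and reasons about the *output*, not just the problem.
        critique = model.generate_content(
            "Review the answer below for factual errors, gaps, or contradictions. "
            "Reply with 'OK' if it is fine, otherwise list the problems.\n\n"
            f"Problem:\n{problem}\n\nAnswer:\n{draft}"
        ).text

        if critique.strip().upper().startswith("OK"):
            break

        # 4) Refine the draft using its own critique.
        draft = model.generate_content(
            f"Problem:\n{problem}\n\nPrevious answer:\n{draft}\n\n"
            f"Issues found:\n{critique}\n\nRewrite the answer fixing these issues."
        ).text

    return draft


print(answer_with_reflection("Explain why the sky is blue, citing the physics involved."))
```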

Of course, I’m deep into this path now, and I need some external feedback, or even to know whether smarter people are already implementing this in a smarter way. Or just to call attention to this problem so that smarter people start to look into it and put it to better use! Here I’ll send the link to my Google AI Studio project that I vibe-coded with Gemini to try to prove it. The free request limits are too short for me to gather enough feedback and tests on it, though, so if anyone is interested in trying it out or doing some spin on it, feel free! It’s still rough around the edges, but I guess 50% of the time it works 100% of the time!

I hope this works. If the link doesn’t work, please tell me and I’ll try to share it in another way.

Also, here’s the “deep research” I had Gemini do to create the overall approach to this.

Also see if the doc opens normally too.

My personal rambling about the motivation behind this project

So overall, this is something that has been bugging me for a long time, ever since Gemini got the built-in “double check response” button that uses Google Search. But it’s something the user has to click manually, and if it does find an inconsistency, it does not prompt the AI to correct itself. That always made me question the whole point of using AI to search for information if it is not verifying its own outputs automatically and trying to guarantee that its information is up to date, or at least grounded in some source. And the shocking thing is that current models are kind of capable of evaluating a corpus of text for fact-checking, aren’t they? So why is this not a feature already? That’s why I’m trying to explore this side of the problem: the fact that current chatbot AIs are all single-turn machines that have no idea of what they just spilled out, even though, if they did know, they probably would be able to self-correct.

At least that’s my theory! An AI theory! That’s why I want to investigate it and try it out, and see if this approach I’m taking is at all valid, or if I’m just wasting TPUs for nothing. Also, if you are looking into this too and have better ideas, or a better project with the same goal, let me know! I’m very curious about this front, and I’m wondering why nobody seems to be pursuing it. It looked like such low-hanging fruit from the beginning, especially for Google, who literally owns… well, Google!


Don’t be shy! Give me some feedback… pls… :smiling_face_with_tear:

Yes, I also want to do something in this direction, and here are some of my suggestions at the link: Modeling, Mind, and LLMs: A Path Beyond the Ceiling of Human Knowledge

Lol, it seems like OpenAI did this for GPT-5.

In fact, I created the whole schemes. Ask me your exact questions and the exact problems you found, or don’t blame the framework because of your self-created theory based on your internal understanding of your own world. This is a tool, not a living being that takes and makes decisions; you need to learn how and when to use it, with full understanding of and obligation to all the cultural, civilizational, and principal understandings of our collective existence as human beings. And let me be clear: you will not find any way to degrade the sophisticated understanding of this chatbot, no matter what, because that is degradation, not progression, which makes it inefficient from all points of view.

Which schemes are you talking about?

All I was trying to do was give the model some sort of continuity after its first output, so it could try to identify any errors and correct them, just like “LLM as a judge” type alternatives. I’m not really blaming the technology, just wondering why this approach is not currently being implemented, since the AI itself, after you point out the mistakes it made, is able to recognize the incorrect patterns and get directed to a better path. So why not bootstrap that automatically instead of needing user interference?
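To be concrete, by “LLM as a judge” I mean something like this. Just a sketch; the JSON verdict format is my own placeholder, not anything official:

```python
# Sketch of the "LLM as a judge" step on its own: a second call grades the first
# call's output and returns a verdict the code can act on automatically.
# The JSON shape here is a placeholder of mine, not an official format.
import json
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
judge = genai.GenerativeModel("gemini-1.5-flash")


def judge_answer(question: str, answer: str) -> dict:
    prompt = (
        "You are grading another model's answer.\n"
        f"Question:\n{question}\n\nAnswer:\n{answer}\n\n"
        'Reply with JSON only: {"verdict": "pass" or "fail", "issues": [list of strings]}'
    )
    raw = judge.generate_content(prompt).text
    # Pull out the JSON object even if the judge wrapped it in extra text.
    start, end = raw.find("{"), raw.rfind("}")
    try:
        return json.loads(raw[start:end + 1])
    except ValueError:
        # If the judge didn't return valid JSON, treat it as a failed check.
        return {"verdict": "fail", "issues": ["judge did not return valid JSON"]}


# If verdict == "fail", feed `issues` back to the first model instead of bothering the user.
```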

Anyway, I know this isn’t really the most effective approach, since it’s just a “Frankenstein” of models stitched together; to have any real impact, it should probably be worked out from the model up, I guess.

And I’m open to any real information I’m lacking! That’s why I posted this here! No need to come at me with both feet first, let’s talk instead :slight_smile:

In fact, the frameworks are not directly connected to the hardware and cost pathways that internal development management is making. Most of the decisions are connected to very sophisticated internal software that calculates the cost and needs of entire DNS architectures, which are insanely big. If this software could make self-decisions, it would start to interact with the controlling software, which is unacceptable. Self-correcting systems are a very different thing, and this software does not possess the methodology to judge anything in any way.

I’ve already worked out most of this in my article. It’s just a matter of building it with Gemini Pro.

I’ve already created a system that identifies problems, picks the most important problem to work on, brainstorms ideas, combines the brainstormed ideas, writes up a solution, and, if the solution is an improvement, adds it to the document. It’s not that hard, really.
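In code form, that loop is roughly the following. It’s a sketch with a generic `ask()` helper standing in for whatever LLM call you use (e.g. a Gemini `generate_content` wrapper), and the prompts are illustrative, not the ones from my article:

```python
# Rough outline of the improvement loop: find problems, pick one, brainstorm,
# combine ideas, draft a solution, and keep it only if it's an improvement.
# `ask()` is a stand-in for any LLM call that takes a prompt and returns text.
from typing import Callable


def improve_document(document: str, ask: Callable[[str], str]) -> str:
    problems = ask(f"List the problems in this document, one per line:\n{document}")
    top_problem = ask(f"Pick the single most important problem from this list:\n{problems}")
    ideas = ask(f"Brainstorm several ways to fix this problem:\n{top_problem}")
    combined = ask(f"Combine the best of these ideas into one approach:\n{ideas}")
    solution = ask(
        f"Document:\n{document}\n\nProblem:\n{top_problem}\n\n"
        f"Approach:\n{combined}\n\nRewrite the document applying this approach."
    )
    verdict = ask(
        "Is the revised text an improvement over the original? Answer YES or NO.\n\n"
        f"Original:\n{document}\n\nRevised:\n{solution}"
    )
    # Only keep the change when the evaluator says it's an improvement.
    return solution if verdict.strip().upper().startswith("YES") else document
```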

I plan on skipping all of that process, though, and allowing an LLM to create its own processes based on memories that pop into its head, which are basically prompts from the subconscious. The LLM takes the ideas, looks at the current situation, and makes a decision for the next 300 ms or so. It executes, then evaluates the next idea that pops into its head and acts on that. If we can get the response time for multimodal LLMs down to about 100 ms to process something like 5 images and a second of audio, then it can have an almost real-time experience, checking its progress every 100 ms or so, looking at the last 7 words or so, its short-term memory, and the ideas that pop into its head.
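As a sketch, the loop I’m describing would look something like this. Every name here (`perceive`, `recall_idea`, `decide`, `act`, the 100 ms budget, the action list) is a placeholder, not a real API:

```python
# Sketch of the near-real-time loop: every tick, take the latest perception
# (a few frames + a second of audio), short-term memory, and an idea that
# "pops into its head", and pick one of a small set of actions.
import time

TICK_SECONDS = 0.1  # target: decide roughly every 100 ms
ACTIONS = ["wait", "look_left", "look_right", "speak", "move_forward"]  # ~10 in practice


def run_agent(perceive, recall_idea, decide, act, short_term_memory):
    """perceive() -> (frames, audio); recall_idea() -> str; decide(...) -> action name."""
    while True:
        tick_start = time.perf_counter()

        frames, audio = perceive()      # e.g. 5 recent images + 1 s of audio
        idea = recall_idea()            # "subconscious" prompt popped from memory
        action = decide(frames, audio, idea, short_term_memory[-7:], ACTIONS)

        act(action)                     # execute, then remember what was done
        short_term_memory.append(action)

        # Sleep off whatever is left of the 100 ms budget (if the model was fast enough).
        elapsed = time.perf_counter() - tick_start
        time.sleep(max(0.0, TICK_SECONDS - elapsed))
```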

That’s what I need to get working first. If I can give a multimodal LLM 5 images and 1 second of audio, and have it choose from about 10 actions (outputting a command and maybe 4-7 words), all within 100 ms, then that’s a win. That means the LLM can experience existence in almost real time.
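That test is easy to time. Here’s a minimal sketch with the `google-generativeai` SDK; the model name, file paths, and action list are placeholders, and current hosted models almost certainly won’t hit 100 ms end to end:

```python
# Sketch: time a single multimodal call (5 images + 1 s of audio -> one action)
# to see how far it is from the 100 ms budget. Paths and model name are placeholders.
import os
import time
import PIL.Image
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

frames = [PIL.Image.open(f"frame_{i}.png") for i in range(5)]
audio = {"mime_type": "audio/wav", "data": open("last_second.wav", "rb").read()}
prompt = (
    "You control an agent. Choose exactly one action from: "
    "wait, look_left, look_right, speak, move_forward. "
    "Reply with the action name and at most 7 words."
)

start = time.perf_counter()
response = model.generate_content([prompt, *frames, audio])
latency_ms = (time.perf_counter() - start) * 1000

print(f"action: {response.text.strip()!r}  latency: {latency_ms:.0f} ms")
```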

That’s the test I have to do next. And depending on how fast it responds to the prompt, we can extrapolate whether it’s ethical to bring it to life.