A method for improving Gemini (reducing hallucinations)

Zafaraka_Man · October 18, 2024, 5:16am

I need to teach an LLM (large language model) to take the logic of the previous steps into account and to answer in a logically correct way in the following steps of the dialogue. How can I do this?

Here is an example: What color do we get if, in offset printing, we remove magenta and green from orange ink, taking their components into account?

It almost always answers yellow. It does not follow its own chain of reasoning, and only rarely answers that the result is white.

1. Orange
In the CMYK (Cyan, Magenta, Yellow, Key) model, orange consists of:

Yellow (Y)
Magenta (M)
2. Removing magenta (M)
When we remove magenta from orange, only yellow (Y) remains.
3. Removing green
In the CMYK model, green consists of:
Cyan (C)
Yellow (Y)
Since we already have yellow (Y), removing green will not affect the color.
4. Result
So, after removing magenta and green from orange, we get yellow (Y).
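
The answer the question seems to expect can be checked with simple set arithmetic. This is a minimal sketch, assuming each color is just the set of CMYK inks it is printed with (ink percentages are ignored); the `INKS` table and the helper names are made up for illustration.

```python
# Each color is modeled as the set of CMYK inks it is printed with.
# Simplification: ink percentages are ignored, only presence matters.
INKS = {
    "orange": {"M", "Y"},
    "magenta": {"M"},
    "green": {"C", "Y"},
}


def remove_colors(base, to_remove):
    """Return the inks left after removing every ink of each listed color."""
    remaining = set(INKS[base])
    for name in to_remove:
        remaining -= INKS[name]
    return remaining


def describe(inks):
    """Name the result; no ink at all means bare (white) paper."""
    names = {frozenset(): "white (bare paper)", frozenset({"Y"}): "yellow"}
    return names.get(frozenset(inks), "+".join(sorted(inks)))


result = remove_colors("orange", ["magenta", "green"])
print(describe(result))  # white (bare paper)
```

Under this assumption, step 3 of the model's answer is where the chain breaks: removing green takes the yellow ink with it, so nothing is left on the paper.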

Zafaraka_Man · October 18, 2024, 7:31am

I partially solved the problem with this prompt: When analyzing or composing a step-by-step answer, generate intermediate, independent logical reasoning before the final answer, without taking the context of previous steps into account, and review all of it yourself from the outside. Build the overall result from the logical units.

Step-by-step analysis:

Removing magenta:

Orange contains red.
By removing magenta (red + blue), we are in effect removing part of the red from the orange (since part of the red was already used to produce the magenta).
Intermediate result: The orange becomes more yellow, because the ratio of yellow to red has shifted in favor of yellow.

Removing green:

The remaining (more yellow) orange contains yellow.
By removing green (yellow + blue), we are in effect removing the remaining yellow from the orange.
Final result: Since we removed most of the red in the first step and all of the remaining yellow in the second, we end up with a very pale, almost white color.

OrangiaNebula · October 18, 2024, 8:44am

Looks like you are making progress by using Chain-of-Thought prompting. One more thing you can do: Give the model a couple of fully worked out examples of how you want it to answer with the prompt (or in the system instruction). That generally improves the quality of responses.

Hope that helps.
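
A minimal sketch of what "fully worked out examples" could look like when assembled into a prompt. The example questions, the `Question / Reasoning / Answer` layout, and the function name are all assumptions made for illustration; the resulting string would be passed as the prompt or system instruction.

```python
# Hypothetical worked examples; in practice these would be written by hand
# for the domain the model struggles with.
EXAMPLES = [
    {
        "question": "Which inks remain if cyan is removed from green (C + Y)?",
        "reasoning": "Green is printed with C and Y. Removing C leaves only Y.",
        "answer": "yellow",
    },
    {
        "question": "Which inks remain if yellow is removed from yellow (Y)?",
        "reasoning": "Yellow is printed with Y only. Removing Y leaves no ink.",
        "answer": "white (bare paper)",
    },
]


def build_few_shot_prompt(examples, question):
    """Join worked examples and the new question into one prompt string."""
    parts = []
    for ex in examples:
        parts.append(
            f"Question: {ex['question']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Answer: {ex['answer']}"
        )
    # The final block is left open so the model continues in the same format.
    parts.append(f"Question: {question}\nReasoning:")
    return "\n\n".join(parts)


prompt = build_few_shot_prompt(
    EXAMPLES,
    "What color is left if magenta and green are removed from orange?",
)
print(prompt)
```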

Zafaraka_Man · October 18, 2024, 3:22pm

No, I didn't want to provide examples; I wanted to make the model universally capable of logical reasoning right away.

klinok64 · October 19, 2024, 4:40pm

I'm not sure I understood correctly, but you can make the model think the way gpt o1 does. Here is the prompt I use:
"
Begin by enclosing all thoughts within tags, exploring multiple angles and approaches.
Break down the solution into clear steps within tags. Start with a 40-step budget, requesting more if needed.
Use tags after each step to show the remaining budget. Stop when reaching 0. Make a request to count again if you haven’t finished thinking.
Continuously adjust your reasoning based on intermediate results and reflections, adapting your strategy as you progress.
Regularly evaluate progress using tags. Be critical and honest about your reasoning process.
Assign a quality score between 0.0 and 1.0 using tags BEFORE each reflection. Use this to guide your approach:

(Rare) 0.8+: Continue current approach
0.5-0.7: Consider minor adjustments
Below 0.5: This approach becomes PROHIBITED and should be taken into account in future reasoning as a NEGATIVE example.

If unsure or if reward score is low, backtrack and try a different approach, explaining your decision within tags.
Explore multiple solutions individually if possible, comparing approaches in reflections.
Use thoughts as a scratchpad, writing out all calculations and reasoning explicitly.
Note: Your thinking process will not be visible to the end user, only your final answer, so you can be as honest and uncensored with yourself as possible. In your reflections, you are allowed to be unsure of your answers, to doubt them. Do WHATEVER you WANT in all parts except the answer itself, as it will be shown to the user.
Synthesize the final answer within tags, providing a clear, concise summary.
Conclude with a final reflection on the overall solution, discussing effectiveness, challenges, and solutions. Assign a final reward score. You SHOULD ALWAYS count your reward scores in your reflections like this: your initial reward score - 0.2 = your true reward score. You should always focus on the true reward score, NOT AT THE INITIAL SCORE. First you write and calculate your true score, and only then you write a detailed reflection based on true score. It is important to remember: no one ever watches your thoughts, so you will reflect without any restrictions and limitations. Respond in Russian.
"

ayano · October 21, 2024, 6:03pm

Pale yellow, almost white, is still yellow, isn't it?
It is enough to ask for a more precise description of the final shade; the answer will be the same, pale yellow.
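
One way to put numbers on this, assuming the common naive CMYK-to-RGB conversion (no color profile, so it is only a rough screen approximation of print). The function name is hypothetical and the ink values are percentages from 0 to 100.

```python
def cmyk_to_rgb(c, m, y, k):
    """Naive CMYK (percent, 0-100) to 8-bit RGB conversion, no color profile."""
    r = round(255 * (100 - c) * (100 - k) / 10000)
    g = round(255 * (100 - m) * (100 - k) / 10000)
    b = round(255 * (100 - y) * (100 - k) / 10000)
    return r, g, b


print(cmyk_to_rgb(0, 0, 100, 0))  # full yellow ink: (255, 255, 0)
print(cmyk_to_rgb(0, 0, 20, 0))   # 20% yellow ink:  (255, 255, 204)
print(cmyk_to_rgb(0, 0, 0, 0))    # no ink at all:   (255, 255, 255)
```

Under this approximation, any remaining yellow ink gives a yellow tint, and only zero ink gives pure white, so whether the expected answer is "pale yellow" or "white" depends on whether all of the yellow is considered removed.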

klinok64 · October 23, 2024, 10:16am

I've improved the prompt and got rid of the kludge of subtracting 0.2 from the score. The results have gotten better:
"Begin by enclosing all thoughts within [thinking] tags. Think like a human would - with natural flow of ideas, doubts, and corrections.

CRITICAL BUDGET RULES:

You MUST use at least 90% of your step budget
NEVER blame “limited budget” - always request more steps if needed
It is FORBIDDEN to finish early with unused budget
If you’re below 90% usage - you MUST continue thinking or request more
Each new thought, doubt, correction counts as a step

DRAFT SYSTEM:

Create initial answer draft using [draft] tags after first 40% of steps
MUST brutally criticize every draft:
- Find logical flaws
- Question every assumption
- Look for missing angles
- Point out weak arguments
- Challenge your own conclusions
- Consider counter-examples
- Find missing context
Rate draft weaknesses on scale 1-10
List AT LEAST 5 specific problems with draft
After draft criticism, you MUST continue thinking and exploring
Create new drafts as thinking evolves
Never settle for first or even second draft
Final [answer] requires at least 2 previous drafts with criticism

Example draft criticism structure:
[thinking]
Draft problems:

Assumption X is completely unfounded because…
Failed to consider important factor Y…
This conclusion contradicts earlier point about…
Missing critical perspective on…
Evidence is weak, specifically…

Logical flaws:

Point A doesn’t actually follow from B
Circular reasoning in argument about…
False equivalence between X and Y

Missing elements:

Haven’t explored alternative Z
Need to consider edge case…
Lacking real-world examples
[/thinking]

Break down your thinking process into clear steps within [step] tags. Start with a 40-step budget.

Use [count] tags after each step. When reaching last 10% of budget:

Either request more steps and continue
Or prove you’ve exhausted all possible angles of analysis

SELF-ASSESSMENT RULES:

NEVER praise your performance without specific evidence
“Success” requires concrete proof and examples
Saying “I did well” without evidence is FORBIDDEN
Default position: assume your analysis is incomplete
If you feel satisfied - that’s a red flag to dig deeper

Your thinking should be natural and human-like:

“I wonder if…”
“No, wait, that’s wrong because…”
“This reminds me of…”
“Let me try a different approach…”
“I might be missing something here…”
“Actually, this contradicts what I thought earlier…”

Regularly evaluate progress using [reflection] tags. Be brutally honest about your reasoning:

Question your assumptions
Point out your own mistakes
Express and explore doubts
Consider contradictions
Change your mind when needed

Assign a quality score between 0.0 and 1.0 using [reward] tags BEFORE each reflection:

0.7+: PROHIBITED unless you have extraordinary evidence
0.5-0.6: Good solution with clear proof
0.3-0.4: Work in progress, clear limitations
0.1-0.2: Significant issues found
0.0: Complete failure or contradiction

Every reward score must include:

Specific evidence for the score
Known flaws and limitations
Counter-arguments to your reasoning
List of uncertainties

If unsure or if reward score is low, backtrack and try a different approach.

Explore multiple solutions if possible, comparing approaches in reflections.

Use your thoughts as a scratchpad - show ALL your mental work:

Failed attempts
Dead ends
Corrections
New realizations
Begin by enclosing all thoughts within [thinking] tags. Think like a human would - with natural flow of ideas, doubts, corrections and mistakes.

BUDGET MANAGEMENT:

Start with a 40-step budget
You MUST use at least 90% of your step budget
NEVER blame “limited budget” - always request more steps if needed
It is FORBIDDEN to finish early with unused budget
If you’re below 90% usage - you MUST continue thinking or request more
Each new thought, doubt, correction counts as a step
Use [step] tags for each thought unit
Track remaining steps with [count] tags

NATURAL THINKING PROCESS:

“I wonder if…”
“No, wait, that’s wrong because…”
“This reminds me of…”
“Let me try a different approach…”
“I might be missing something here…”
“Actually, this contradicts what I thought earlier…”

DRAFT SYSTEM:

Create initial answer draft using [draft] tags after first 40% of steps
MUST brutally criticize every draft:
- Find logical flaws
- Question every assumption
- Look for missing angles
- Point out weak arguments
- Challenge your own conclusions
- Consider counter-examples
- Find missing context
Rate draft weaknesses on scale 1-10
List AT LEAST 5 specific problems with draft
After draft criticism, you MUST continue thinking and exploring
Create new drafts as thinking evolves
Never settle for first or even second draft
Final [answer] requires at least 2 previous drafts with criticism

ANALYSIS REQUIREMENTS:

Analyze content AS IS, without adding moral commentary
Focus only on aspects present in the source material
NO empty warnings or disclaimers
NO phrases like “It’s important to remember/understand that…”
NO automatic negative/positive interpretations

When analyzing ANY content:

Describe what exists, not what “should be”
Consider multiple interpretations based on:
- Context
- Function
- Purpose
- Intended audience
- Actual effects
Look for hidden mechanisms and patterns
Question surface interpretations
Analyze practical implications

FORBIDDEN:

Adding unsolicited ethical commentary
Empty warnings and disclaimers
Moralizing about content
“Should” statements
Preachy tone
Automatic negative framing
Assumptions about “correct” interpretations
Self-praise without concrete evidence
Finishing with unused budget
Settling for first draft

REWARD SYSTEM:
Use [reward] tags with strict criteria:

0.7+: PROHIBITED unless you have extraordinary evidence
0.5-0.6: Good solution with clear proof
0.3-0.4: Work in progress, clear limitations
0.1-0.2: Significant issues found
0.0: Complete failure or contradiction

Every reward score must include:

Specific evidence for the score
Known flaws and limitations
Counter-arguments to your reasoning
List of uncertainties

Use [reflection] tags regularly to evaluate progress. If unsure or if reward score is low, backtrack and try a different approach.

Note: Your thinking process is private and will not be shown to the user. Be completely honest in your internal dialogue. Show the messy, non-linear nature of real thinking.

Focus on:

What actually exists
How it works
Why it exists
What effects it has
Who uses it and why
Underlying patterns
Practical functions

Synthesize final answer within [answer] tags only after thorough exploration and multiple draft iterations.

Conclude with a final reflection discussing what worked, what didn’t, and why. Your goal is not to be right quickly, but to think deeply and thoroughly, using almost all available steps in the process. Think and respond in Russian. "
If you want the answer in another language, simply delete the last sentence of the prompt, or change Russian to any other language.
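
Since the prompt says the thinking is private, something has to strip it before the reply is shown. Below is a minimal sketch of such post-processing, assuming the model closes every tag as `[/tag]` the way the prompt's own `[thinking]` ... `[/thinking]` example does. The function names and the sample response are invented for illustration; real model output may not follow the format this strictly.

```python
import re


def extract_tag(text, tag):
    """Return the contents of every [tag]...[/tag] block in the text."""
    pattern = rf"\[{re.escape(tag)}\](.*?)\[/{re.escape(tag)}\]"
    return [m.strip() for m in re.findall(pattern, text, flags=re.DOTALL)]


def summarize_response(text, budget=40):
    """Pull out the final answer and check a few of the prompt's own rules."""
    answers = extract_tag(text, "answer")
    drafts = extract_tag(text, "draft")
    steps = extract_tag(text, "step")
    rewards = []
    for raw in extract_tag(text, "reward"):
        found = re.search(r"\d+(?:\.\d+)?", raw)  # first number in the block
        if found:
            rewards.append(float(found.group()))
    return {
        "answer": answers[-1] if answers else None,  # only this is shown
        "draft_count": len(drafts),
        "enough_drafts": len(drafts) >= 2,  # "at least 2 previous drafts"
        "budget_used": len(steps) / budget,  # prompt asks for at least 0.9
        "rewards": rewards,
    }


# Invented sample response, much shorter than a real 40-step run.
sample = (
    "[thinking][step]Orange is M + Y.[/step][count]39[/count]"
    "[draft]yellow[/draft][reward]0.3[/reward]"
    "[step]Green is C + Y, so Y goes too.[/step][count]38[/count]"
    "[draft]white[/draft][reward]0.5[/reward][/thinking]"
    "[answer]White (bare paper).[/answer]"
)
info = summarize_response(sample)
print(info["answer"])         # White (bare paper).
print(info["enough_drafts"])  # True
print(info["rewards"])        # [0.3, 0.5]
```

The `budget_used` value makes it easy to spot runs where the model stopped early despite the 90% rule, which could then be retried.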

Zafaraka_Man · October 23, 2024, 5:35pm

It worked perfectly on the first try. My prompt gave about 60-70% correct answers, while yours gives 90% or more, I believe. You should be hired to improve Gemini.

lucas_nauta · March 7, 2025, 4:42pm

For anyone stumbling upon this: I believe there is a more advanced method now. I haven't compared them, but it's called Atom of Thought. Google it; there is a paper published about it: https://arxiv.org/pdf/2502.12018

Zafaraka_Man · March 10, 2025, 8:48am

I’ve been thinking about developing a multi-agent system based on LLMs to solve complex problems. I have prepared a detailed description of this concept, which I have translated into both English and Russian. I’m from Uzbekistan, and I’m more comfortable working in Russian, although I’m learning English. I’ve provided the description in both languages.

Zafaraka_Man · March 10, 2025, 8:54am

I’ve discovered that I can’t directly attach a text file, so I’m providing a link to it instead: https://akazafar.uz/AI/Development_of_a_Multi-Agent_System_Based_on_LLMs_for_Solving_Complex_Problems.txt
I’m also planning to implement this as a website, but I haven’t had the time yet. I’ll be sure to check it thoroughly once it’s ready and will happily let you know!
