Feedback: system instructions

Hello. My topic is system instructions. When using AI models, the aspects that matter most to me are system instructions and the context window. I prefer Gemini over Claude and GPT because Gemini offers both for free.

Many people on YouTube have their standard model tests: the riddle about the number of murderers, asking whether 9.9 or 9.11 is larger, and counting the letter "R" in the word "strawberry". I have my own test: a text-based role-playing game with a HUD. This way, I check whether the model correctly updates the HUD (e.g., the amount of gold, elapsed time, nearby characters and their stats, etc.).
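As an illustration, here is a minimal sketch of how such a HUD test could be driven through the Gemini Python SDK. The model name, HUD fields, and regex check are my own assumptions, not a fixed benchmark:

```python
import re
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Hypothetical HUD contract: the system instruction demands a HUD line at
# the end of every reply, so adherence can be checked mechanically per turn.
HUD_RULES = (
    "You are the game master of a text RPG set in medieval times. "
    "End EVERY reply with a HUD line in exactly this format:\n"
    "[HUD] Gold: <int> | Time: <HH:MM> | Nearby: <comma-separated NPCs>"
)

model = genai.GenerativeModel("gemini-1.5-pro", system_instruction=HUD_RULES)
chat = model.start_chat(history=[])

reply = chat.send_message("I sell my sword to the blacksmith for 20 gold.")
# Pass/fail: is the HUD still present and well-formed after this turn?
print(bool(re.search(r"\[HUD\] Gold: \d+ \| Time: \d{2}:\d{2}", reply.text)))
```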

To the point. Your 1.5-pro model and the other experimental-weight versions impose limitations on system instructions relative to the conversation. There are two boundaries: from 0 to 8k tokens of context, system instructions are the most important; from 8k to 32k they still carry some weight; and beyond 32k only the conversation context matters.
Unfortunately, I consider this a huge drawback, because even if I create good guidelines in the system instructions for the role-playing game (the laws of the world, the world's economy, NPCs, etc.), past that point they no longer matter. Additionally, each of your base and experimental models reacts differently. Sometimes, past 32k, the model completely disregards the instructions: I can rewrite the system instructions into a language translator, and it will still continue the role-playing game, because the conversation context is about knights.
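One client-side workaround I can sketch here (my own mitigation, not something the API promises) is to periodically repeat the instruction text inside the conversation itself, so it stays within the recent context the model still attends to:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

RULES = "You are the game master of a text RPG. End every reply with a [HUD] line."
REINJECT_EVERY = 10  # arbitrary turn interval; tune per model

model = genai.GenerativeModel("gemini-1.5-pro", system_instruction=RULES)
chat = model.start_chat(history=[])

def send(turn: int, message: str) -> str:
    # Every REINJECT_EVERY turns, restate the rules as an ordinary message
    # so they survive past the 32k+ range where the instructions fade.
    if turn > 0 and turn % REINJECT_EVERY == 0:
        chat.send_message("Rule reminder, acknowledge with 'OK' only: " + RULES)
    return chat.send_message(message).text

# Example: print(send(0, "I enter the tavern."))
```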

Your models also react differently to sensitive instructions. Sometimes the model reacts to a character taking a phone out of a pocket in the 1400s-1500s by interrupting the role-playing game and asking for clarification; sometimes it tries to weave it into the story by calling it a magical artifact; and sometimes it sees no problem at all, and a phone in those years is a normal thing. It all depends on when, and at what length of conversation, this happens.

My advice:
System instructions should ALWAYS be the most important, regardless of whether it's the beginning of the conversation or we already have 100k tokens behind us. In my opinion, this would significantly improve the model's performance. The GPT and Claude-3.5-sonnet models put slightly more emphasis on instructions, but they ALSO have this problem.

1 Like

It would be so great if the System Instructions bug were finally fixed, so I'd never have to see "a testament" generated again.

Can you elaborate on what you see as the bug? And what does this have to do with "a testament"?

2 Likes

Well… you see, I keep sending feedback that never arrives at the Google team: a black hole, just as you described. As for the bug: my System Instructions include a long blocklist meant to stop repetitive phrases the model usually writes, such as "the air is thick", "a testament to", "challenges", "with a malevolent grin", etc., and it does not work, even with the experimental model.

I can assure you that you are not being discriminated against: the feedback is an equal-opportunity black hole, fair to all participants, and everyone gets equally ignored. To be fair, some Google engineers have recently started answering questions on this forum.

As to the substance of the issue: a blocklist and instructions like "do not generate foo" are negative prompting, and negative prompting just doesn't work.
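Since the model won't honor "do not write X", one workaround is to keep the system instruction positive ("write varied, concrete prose") and enforce the blocklist client-side: scan each response and regenerate on a hit. A rough sketch; the phrase list and retry count are placeholders:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

BLOCKLIST = ["a testament to", "the air is thick", "with a malevolent grin"]

# Positive framing in the instruction; the blocklist lives only in client code.
model = genai.GenerativeModel(
    "gemini-1.5-pro",
    system_instruction="Write varied, concrete prose. Avoid stock phrases.",
)

def generate_clean(prompt: str, max_retries: int = 3) -> str:
    """Regenerate until no blocklisted phrase appears (or retries run out)."""
    text = ""
    for _ in range(max_retries):
        text = model.generate_content(prompt).text
        if not any(phrase in text.lower() for phrase in BLOCKLIST):
            break
    return text
```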

1 Like

Which is why I label it a bug; it's for the Google team to FIX IT. And I will NOT stop until it's perfect.

I don't know if anyone from Google reads these topics. However, they should care, because Gemini currently barely exists in the awareness of the average user. Most people reach for GPT, and Claude is the next choice.

Google should do something about the form of system instructions. Currently there are many methods, but unfortunately the results they achieve differ only slightly. In Claude, by contrast, XML tags work very well and can significantly improve the quality of generated responses. After thousands of tests on Gemini, I conclude that XML tags not only failed to improve my results but actively worsened them. This is a significant shortcoming of Gemini.

Currently, I achieve the best results with this form of instruction:

You are Alpha, never leave this role.

-Alpha, do X.
-Alpha, do Y.
-Alpha, remember A and B.
-Alpha, you are forbidden to do C and D.
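For what it's worth, an instruction in exactly this shape drops straight into the Gemini SDK's system_instruction parameter. A minimal sketch; the persona text is the example above, and X, Y, A-D are placeholders standing for real content:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# The plain-text format above, passed verbatim as the system instruction.
ALPHA_RULES = """You are Alpha, never leave this role.

-Alpha, do X.
-Alpha, do Y.
-Alpha, remember A and B.
-Alpha, you are forbidden to do C and D.
"""

model = genai.GenerativeModel("gemini-1.5-pro", system_instruction=ALPHA_RULES)
print(model.generate_content("Introduce yourself.").text)
```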

If Google cares about attracting more advanced users, who will then create materials on YouTube and promote Gemini, they need to do something about system instructions.

1 Like

Since this is my topic and I wrote about system instructions, it's fitting to comment on the new 002 model. There is slight progress in adhering to instructions; the difference is cosmetic but noticeable in daily use. Unfortunately, the progress is so small that at this rate Gemini will never catch up to the models from OpenAI and Anthropic. I assume that when Anthropic releases Opus-3.5 or a new Sonnet, it might be the end of the competition.

The Gemini model has a large context window, but what's the point of it if the window disrupts the instructions? Very often, when I ask Gemini for help with code, with instructions about how to format the code and how to converse with me, after a few messages Gemini completely disregards the instructions, because what's in the conversation becomes more important to it.

Additionally, I have the impression that Google focused so much on increasing the size of the context window that they forgot about its quality. What's the use of 2 million tokens if the model poorly understands the content of the conversation and falls into loops? In extreme cases, it gave me the same code three times without the slightest correction, while of course explaining that this version would certainly work.

Advice:

1. System instructions should be like a god for the model.
2. Focus on the quality of the context window, as Sonnet does: remembering details and using them on its own. Gemini only uses details when we point them out.
3. Increase usage limits in AI Studio and via the API. In the current state, users will always choose GPT first and then Claude.
4 Likes