Hello. My topic is system instructions. When using AI models, the aspects that matter most to me are system instructions and the context window. I prefer Gemini over Claude and GPT because Gemini offers these for free.
Many people on YouTube have their standard model tests, such as riddles about the number of murderers, the 9.9 vs. 9.11 comparison, or the question about the number of “R” letters in the word “strawberry”. I have my own test: a text-based role-playing game with a HUD. This way, I check whether the game correctly updates the HUD (e.g., the amount of gold, elapsed time, nearby characters and their stats, etc.).
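To make the test concrete, here is a minimal sketch of the kind of HUD-consistency check I do by hand. Every name in it (the HUD fields, `apply_event`, the event format) is hypothetical and chosen only for illustration; it is not part of any real game or API.

```python
def apply_event(hud: dict, event: dict) -> dict:
    """Return a new HUD state after applying a single game event."""
    updated = dict(hud)
    updated["gold"] = hud["gold"] + event.get("gold_delta", 0)
    updated["minutes_elapsed"] = hud["minutes_elapsed"] + event.get("minutes", 0)
    if "npcs" in event:
        updated["nearby_npcs"] = event["npcs"]
    return updated

# Starting HUD, then the player buys a 30-gold item over 15 in-game minutes.
hud = {"gold": 100, "minutes_elapsed": 0, "nearby_npcs": ["blacksmith"]}
hud = apply_event(hud, {"gold_delta": -30, "minutes": 15,
                        "npcs": ["blacksmith", "guard"]})
```

A model that tracks state correctly should now report 70 gold; when the HUD drifts from the story, the test fails.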
To the point. Your 1.5-pro model and the other experimental variants impose limitations on how system instructions interact with the conversation. There are two thresholds, which create three zones: from 0 to 8k tokens of context, where system instructions carry the most weight; from 8k to 32k, where they still have some significance; and beyond 32k, where only the conversation context matters.
Unfortunately, I consider this a huge drawback. Even if I write good guidelines in the system instructions for the role-playing game (the laws of the world, the world’s economy, NPCs, etc.), they stop mattering past that point. Additionally, each of your base and experimental models reacts differently. Sometimes, beyond 32k tokens, the model completely disregards the instructions: I can replace them with instructions for a language translator, and it will still continue the role-playing game, because the conversation context is a role-playing game about knights.
Your models also react inconsistently to instructions about sensitive or anachronistic content. Sometimes, when a character takes a phone out of a pocket in a 1400s–1500s setting, the model interrupts the role-playing game and asks for clarification. Sometimes it tries to weave the phone into the story by calling it a magical artifact, and sometimes it sees no problem at all, treating a phone in that era as a normal thing. It all depends on when this happens and how long the conversation is at that point.
My advice:
System instructions should ALWAYS take top priority, regardless of whether we are at the beginning of the conversation or already 100k tokens in. In my opinion, this would significantly improve the model’s performance. The GPT and Claude-3.5-sonnet models put slightly more emphasis on instructions, but they ALSO have this problem.
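Until providers fix this, the workaround I know of is to periodically re-inject the system instructions into the conversation so they stay in the recent part of the context window. Below is a hedged sketch of that idea, not an official API feature: the message format, the `REINJECT_EVERY` interval, and the “Reminder:” prefix are all my own arbitrary choices.

```python
REINJECT_EVERY = 10  # re-send the instructions every 10 user turns (arbitrary)

def build_messages(system_instructions: str, history: list[dict]) -> list[dict]:
    """Interleave reminders of the system instructions into the chat history.

    `history` is a list of {"role": ..., "content": ...} dicts, as in most
    chat APIs. The instructions go first, then a reminder is inserted after
    every REINJECT_EVERY user turns so they never drift out of recency.
    """
    messages = [{"role": "system", "content": system_instructions}]
    user_turns = 0
    for msg in history:
        messages.append(msg)
        if msg["role"] == "user":
            user_turns += 1
            if user_turns % REINJECT_EVERY == 0:
                messages.append({"role": "system",
                                 "content": "Reminder: " + system_instructions})
    return messages
```

This doesn’t change the model’s internal weighting, but in long role-playing sessions it keeps the world rules physically close to the latest turns, which is where the models evidently look.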