Cannot get Gemini models to follow prompt instructions

I have a text consisting of Biblical scriptures: https://s3.us-west-2.amazonaws.com/docs.scbbs.com/docs/test/text_with_line_nums.txt

I have prepended line numbers to make it easy for the model to identify individual lines.

I want the model to create an outline of the verses in the
text. Only verse numbers which begin a line can
be used as title segments.

However, both models, Gemini 1.5 Flash and Gemini 1.5 Pro, fail to follow my
instructions.

Lines that begin like this are accepted as title segments:

line0001: 1 13 …
line0015: 4 8 …

But lines which do NOT begin with a verse number are not accepted.
This verse number cannot be used to begin a title segment:

line0005: …in the verse will be fulfilled in him ( Talmidei Rabbeinu Yona ). 2 9 …

Yet, the model always lists 2 9 in the returned outline.
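
The rule itself is mechanical, so it can be checked without the model. Here is a minimal sketch in Python of what I mean by "begins a line". It assumes every line has the shape "line0001: ..." with a four-digit marker, and that a verse number is two integers separated by a space. The names VERSE_AT_START and valid_start_lines are hypothetical, not part of my actual pipeline:

```python
import re

# Assumed line shape: "line0001: 1 13 ..." (marker, colon, then two integers).
VERSE_AT_START = re.compile(r"^(line\d{4}):\s*(\d+)\s+(\d+)\b")

def valid_start_lines(text):
    """Return {line marker: verse number} for lines that begin with a verse number."""
    starts = {}
    for raw in text.splitlines():
        m = VERSE_AT_START.match(raw)
        if m:
            starts[m.group(1)] = f"{m.group(2)} {m.group(3)}"
    return starts

# The three example lines from this post.
sample = "\n".join([
    "line0001: 1 13 ...",
    "line0005: ...in the verse will be fulfilled in him ( Talmidei Rabbeinu Yona ). 2 9 ...",
    "line0015: 4 8 ...",
])

print(valid_start_lines(sample))
# {'line0001': '1 13', 'line0015': '4 8'}
```

Only line0001 and line0015 qualify; line0005 is skipped because its verse number is not the first content after the marker. This is exactly the behavior I expect from the model.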

Here is an example from the playground. Note that when I call the model
out on this, it will correct that one line, but continue to make the same
mistake with other lines:

https://aistudio.google.com/app/prompts?state={"ids":["1qItVmzPE2slFY30zeabFL-mqOElQoRAw"],"action":"open","userId":"110672342279148587006","resourceKeys":{}}&usp=sharing

Here are the prompt instructions:

Task: Analyze the following line-numbered Talmud scripture and generate a JSON representation of its semantic structure, organizing content by ideas rather than strict verse divisions.

Output Format:

The JSON output should be an array of objects, where each object represents a segment of the document’s semantic structure and has the following properties:

  • title: The verse number followed by a concise summary of the main idea or topic of the segment (maximum 10 words total, including the verse number). If the summary exceeds the allowed length, include only the first words that fit within the limit followed by “…”. A title segment must begin a line. The verse number must be the first characters on the line.
  • level: Always set to 1, as there will only be one level in the hierarchy.
  • start_line: The value of the “linenoxxxx:” marker for the line where the segment begins. This must be the line with the verse number that begins the line.
  • has_children: Always set to “N”, as there will be no child segments.
  • children: An empty array, as there will be no child segments.

Segmentation Guidelines:

  1. A new segment can ONLY begin at the start of a new line where a verse number (e.g., “1 13” or “16 16”) appears as the very first content on that line. This is critical and must be strictly followed.
  2. If a verse number appears in the middle or at the end of a line, it cannot be used as the start of a new segment. The next valid segment will begin with the next verse number that appears at the start of a line. If a verse number starts anywhere except the beginning of the line, then the semantic idea of that verse must be included with the previous verse.

For example, these verse numbers can begin title segments:

line0001: 1 13 ...
line0015: 4 8 ...

But this verse number cannot be used to begin a title segment:

line0011: and continue to live unaffected by them. 3 13 ...

By “start of line,” we mean that the verse number MUST be the first characters after the line0000: marker.
3. Group verses together based on shared ideas or topics. A segment may include one or more verses if they discuss the same concept.
4. Create new segments when there’s a significant shift in topic, argument, or speaker, but only if condition #1 is met.
5. Ensure that each segment represents a coherent idea or discussion point.

Additional Instructions:

  • Pay close attention to where verse numbers appear. Only those at the very beginning of a line can start a new segment.
  • If a verse number appears mid-line or at the end of a line, it should be considered part of the current segment, not the start of a new one.
  • Each proposed segment must be verified to ensure it starts with a verse number at the beginning of the line. If it does not, exclude it from the output and report the discrepancy.
  • Ensure that the entire input text is accounted for in the output JSON, with no content omitted.

Example Output Structure:

[
  {
    "title": "1 13 Rabbi Yosei on synagogue blessings",
    "level": 1,
    "start_line": "line0001",
    "has_children": "N",
    "children": []
  },
  {
    "title": "3 13 Discussion on Divine Presence",
    "level": 1,
    "start_line": "line0068",
    "has_children": "N",
    "children": []
  },
  ...
]

Only return the JSON object. Do not include any explanatory text or additional formatting.

Important Note: The verse number must the first characters in the linexxxx: in order to be used as a title segment.

End System Message

This is what is consistently returned:

[
  {
    "title": "1 13 Rabbi Yosei on synagogue blessings",
    "level": 1,
    "start_line": "line0001",
    "has_children": "N",
    "children": []
  },
  {
    "title": "2 9 Discussion on the power of sight",
    "level": 1,
    "start_line": "line0005",
    "has_children": "N",
    "children": []
  },
  {
    "title": "3 13 Abaye on demon abundance",
    "level": 1,
    "start_line": "line0011",
    "has_children": "N",
    "children": []
  },
  {
    "title": "4 8 Rav Huna on the protection from demons",
    "level": 1,
    "start_line": "line0015",
    "has_children": "N",
    "children": []
  },
  {
    "title": "5 30 Rava summarizing effects of demons",
    "level": 1,
    "start_line": "line0020",
    "has_children": "N",
    "children": []
  },

etc…
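
As a stopgap, entries like the "2 9" one above could be rejected in code after the model responds. This is a minimal sketch, not a fix for the model itself. It assumes a map of valid start lines has already been built by scanning the input text, and that every title begins with the two-integer verse number. The names TITLE_VERSE, filter_segments, and valid_starts are hypothetical:

```python
import json
import re

# Assumed title shape: two integers (the verse number) at the start of the title.
TITLE_VERSE = re.compile(r"^(\d+)\s+(\d+)\b")

def filter_segments(segments, valid_starts):
    """Split segments into (kept, rejected) using a {start_line: verse} map."""
    kept, rejected = [], []
    for seg in segments:
        m = TITLE_VERSE.match(seg.get("title", ""))
        verse = f"{m.group(1)} {m.group(2)}" if m else None
        # Keep a segment only if its start line really begins with its verse number.
        if verse is not None and valid_starts.get(seg.get("start_line")) == verse:
            kept.append(seg)
        else:
            rejected.append(seg)
    return kept, rejected

# Hypothetical inputs: the two valid start lines from my examples, and three
# entries shaped like the model output above (other fields omitted for brevity).
valid_starts = {"line0001": "1 13", "line0015": "4 8"}
model_output = json.loads("""[
  {"title": "1 13 Rabbi Yosei on synagogue blessings", "start_line": "line0001"},
  {"title": "2 9 Discussion on the power of sight", "start_line": "line0005"},
  {"title": "4 8 Rav Huna on the protection from demons", "start_line": "line0015"}
]""")

kept, rejected = filter_segments(model_output, valid_starts)
print([s["start_line"] for s in kept])      # ['line0001', 'line0015']
print([s["start_line"] for s in rejected])  # ['line0005']
```

Because each segment is identified only by its start line, dropping a rejected entry leaves its text inside the preceding segment, which matches guideline 2 of the prompt. The summary titles of the kept segments would still not reflect the merged content, so I would prefer that the model follow the instructions directly.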

Any suggestions?

I checked your source file and found that it has character-encoding problems. Which language and which encoding does it use, and how can I open the file so that I can read the original?

Thank you for the response. This is hardly a Drupal issue. It is a Google Gemini Flash 1.5 issue. The problem, as I have hopefully outlined, is that the model will NOT follow the instructions given.

The text is Hebrew, but the characters do not display correctly once the file is uploaded to an AWS S3 bucket. The characters are correct in the text files I am sending to the model.
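
I have not confirmed the cause, but one common way this kind of corruption arises is UTF-8 bytes being decoded as a single-byte encoding. The sketch below assumes the file is UTF-8 and that the viewer decodes it as Latin-1. Both are assumptions on my part, and the sample word is only an illustration:

```python
# Assumption: the file is UTF-8 but the viewer decodes it as Latin-1.
original = "שלום"  # illustrative Hebrew word, not taken from the file

# Decoding UTF-8 bytes as Latin-1 produces garbled characters (mojibake).
garbled = original.encode("utf-8").decode("latin-1")

# If that is what happened, reversing the two steps recovers the text.
recovered = garbled.encode("latin-1").decode("utf-8")

print(recovered == original)  # True
```

If this is the cause, saving the downloaded file and opening it explicitly as UTF-8 should show the Hebrew correctly.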

However, the instructions to the model are to not create title segments for verses that begin in the middle of a line. All the verses are in English, and the Hebrew characters have little to do with the instructions, unless the model is somehow confused by them.

Indeed, I have observed how artificial intelligence handles the sacred letters of the Hebrew alphabet, and it does not always work correctly, especially when I translated with Google Translate or OpenAI's models. I don't know how it is with Gemini, but I will probably have to retrain the Gemini model by giving it the necessary data set so that it understands Hebrew and is not confused. This is what came to my mind, and I decided to share it with you.

I understand, but I hardly understand a lot.

As I said, I did not ask the model to translate the Hebrew. Only to locate the verse numbers, all of which are in English. I shared the prompt and the model response here: https://aistudio.google.com/app/prompts?state={"ids":["1qItVmzPE2slFY30zeabFL-mqOElQoRAw"],"action":"open","userId":"110672342279148587006","resourceKeys":{}}&usp=sharing