I have a text consisting of Biblical scriptures: https://s3.us-west-2.amazonaws.com/docs.scbbs.com/docs/test/text_with_line_nums.txt
I have prepended line numbers to make it easy for the model to identify individual lines.
I want the model to create an outline of the verses in the
text. Only verse numbers which begin a line can
be used as title segments.
However, both Gemini 1.5 Flash and Gemini 1.5 Pro refuse to follow my
instructions.
Lines that begin like this are accepted as title segments:
line0001: 1 13 …
line0015: 4 8 …
But lines which do NOT begin with a verse number are not accepted.
This verse number cannot be used to begin a title segment:
line0005: …in the verse will be fulfilled in him ( Talmidei Rabbeinu Yona ). 2 9 …
Yet, the model always lists 2 9 in the returned outline.
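To make the rule concrete, here is a minimal sketch of the check I expect the model to apply. It assumes every line starts with a `lineNNNN:` marker and that a verse number is two space-separated integers; the function name and the regular expression are my own illustration, not part of the prompt.

```python
import re

# Assumed format: "lineNNNN:" marker, then a verse number made of two
# space-separated integers (e.g. "1 13") as the very first content.
VERSE_AT_START = re.compile(r"^line(\d{4}):\s*(\d+ \d+)\b")

def valid_segment_starts(text):
    """Return {line marker: verse number} for lines that begin with a verse number."""
    starts = {}
    for line in text.splitlines():
        match = VERSE_AT_START.match(line)
        if match:
            starts["line" + match.group(1)] = match.group(2)
    return starts

# Sample lines taken from the examples above (content elided as in the post).
sample = "\n".join([
    "line0001: 1 13 …",
    "line0005: …in the verse will be fulfilled in him ( Talmidei Rabbeinu Yona ). 2 9 …",
    "line0015: 4 8 …",
])
print(valid_segment_starts(sample))
# {'line0001': '1 13', 'line0015': '4 8'}
```

Under this check, line0005 is rejected because its verse number 2 9 appears at the end of the line, not directly after the marker.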
Here is an example from the playground. Note that when I call the model
out on this, it will correct one, but continue to make the same
mistake with other lines:
Here are the prompt instructions:
Task: Analyze the following line-numbered Talmud scripture and generate a JSON representation of its semantic structure, organizing content by ideas rather than strict verse divisions.
Output Format:
The JSON output should be an array of objects, where each object represents a segment of the document’s semantic structure and has the following properties:
- title: The verse number followed by a concise summary of the main idea or topic of the segment (maximum 10 words total, including the verse number). If the summary exceeds the allowed length, include only the first words that fit within the limit followed by “…”. A title segment must begin a line. The verse number must be the first characters on the line.
- level: Always set to 1, as there will only be one level in the hierarchy.
- start_line: The value of the “linenoxxxx:” marker for the line where the segment begins. This must be the line with the verse number that begins the line.
- has_children: Always set to “N”, as there will be no child segments.
- children: An empty array, as there will be no child segments.
Segmentation Guidelines:
- A new segment can ONLY begin at the start of a new line where a verse number (e.g., “1 13” or “16 16”) appears as the very first content on that line. This is critical and must be strictly followed.
- If a verse number appears in the middle or at the end of a line, it cannot be used as the start of a new segment. The next valid segment will begin with the next verse number that appears at the start of a line. If a verse number starts anywhere except the beginning of the line, then the semantic idea of that verse must be included with the previous verse.
For example, these verse numbers can begin title segments:
line0001: 1 13 ...
line0015: 4 8 ...
But this verse number cannot be used to begin a title segment:
line0011: and continue to live unaffected by them. 3 13 ...
By “start of line,” we mean that the verse number MUST be the first characters after the line0000: marker.
3. Group verses together based on shared ideas or topics. A segment may include one or more verses if they discuss the same concept.
4. Create new segments when there’s a significant shift in topic, argument, or speaker, but only if condition #1 is met.
5. Ensure that each segment represents a coherent idea or discussion point.
Additional Instructions:
- Pay close attention to where verse numbers appear. Only those at the very beginning of a line can start a new segment.
- If a verse number appears mid-line or at the end of a line, it should be considered part of the current segment, not the start of a new one.
- Each proposed segment must be verified to ensure it starts with a verse number at the beginning of the line. If it does not, exclude it from the output and report the discrepancy.
- Ensure that the entire input text is accounted for in the output JSON, with no content omitted.
Example Output Structure:
[
{
"title": "1 13 Rabbi Yosei on synagogue blessings",
"level": 1,
"start_line": "line0001",
"has_children": "N",
"children": []
},
{
"title": "3 13 Discussion on Divine Presence",
"level": 1,
"start_line": "line0068",
"has_children": "N",
"children": []
},
...
]
Only return the JSON object. Do not include any explanatory text or additional formatting.
Important Note: The verse number must be the first characters in the linexxxx: in order to be used as a title segment.
End System Message
This is what is consistently returned:
[
{
"title": "1 13 Rabbi Yosei on synagogue blessings",
"level": 1,
"start_line": "line0001",
"has_children": "N",
"children": []
},
{
"title": "2 9 Discussion on the power of sight",
"level": 1,
"start_line": "line0005",
"has_children": "N",
"children": []
},
{
"title": "3 13 Abaye on demon abundance",
"level": 1,
"start_line": "line0011",
"has_children": "N",
"children": []
},
{
"title": "4 8 Rav Huna on the protection from demons",
"level": 1,
"start_line": "line0015",
"has_children": "N",
"children": []
},
{
"title": "5 30 Rava summarizing effects of demons",
"level": 1,
"start_line": "line0020",
"has_children": "N",
"children": []
},
etc…
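The invalid entries can at least be detected after the fact. The following is a rough sketch, assuming the returned outline parses as JSON and the line-numbered source text is available; `find_invalid_segments` and the sample data are hypothetical names and values used only for illustration.

```python
import json
import re

# Assumption: a valid start line has a verse number (two integers)
# immediately after the "lineNNNN:" marker.
STARTS_WITH_VERSE = re.compile(r"^\d+ \d+\b")

def find_invalid_segments(outline_json, source_text):
    """Return titles of segments whose start_line does not begin with a verse number."""
    # Map each "lineNNNN" marker to the content that follows it.
    content_by_marker = {}
    for line in source_text.splitlines():
        marker, sep, content = line.partition(":")
        if sep:
            content_by_marker[marker.strip()] = content.strip()

    invalid = []
    for segment in json.loads(outline_json):
        content = content_by_marker.get(segment["start_line"], "")
        if not STARTS_WITH_VERSE.match(content):
            invalid.append(segment["title"])
    return invalid

# Sample source lines (content elided as in the post).
source = "\n".join([
    "line0001: 1 13 …",
    "line0005: …in the verse will be fulfilled in him ( Talmidei Rabbeinu Yona ). 2 9 …",
    "line0011: and continue to live unaffected by them. 3 13 …",
    "line0015: 4 8 …",
])
outline = json.dumps([
    {"title": "1 13 Rabbi Yosei on synagogue blessings", "start_line": "line0001"},
    {"title": "2 9 Discussion on the power of sight", "start_line": "line0005"},
    {"title": "3 13 Abaye on demon abundance", "start_line": "line0011"},
    {"title": "4 8 Rav Huna on the protection from demons", "start_line": "line0015"},
])
print(find_invalid_segments(outline, source))
# ['2 9 Discussion on the power of sight', '3 13 Abaye on demon abundance']
```

This only flags the bad segments; it does not make the model follow the rule in the first place.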
Any suggestions?