Processing multiple text excerpts with Gemini API

I am evaluating the Gemini API for the following task: given a number of text excerpts, I would like the API to analyze each excerpt and then produce a response that contains the results of each analysis, one result per excerpt. For example, if I supply 100 texts (“Text1”, “Text2”, …, “Text100”), I want back a list of 100 results, one for each text.

I supply my instructions via system_instruction, and the model generally seems to understand them: it produces correct results for at least some of the text excerpts. However, it also returns the wrong number of results: if I supply 100 texts, I may get back fewer, or sometimes more(!), than 100 results. The model seems to work better when I limit each request to 10 texts.
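Since smaller requests seem to behave better, one workaround is to split the input into batches of 10 and send one request per batch. The sketch below shows only the batching step; `batched` is a hypothetical helper name (Python 3.12's `itertools.batched` does something similar but yields tuples), and the batch size of 10 simply mirrors the observation above rather than any documented limit.

```python
from typing import Iterator, List, Sequence


def batched(texts: Sequence[str], size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` texts, preserving order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


texts = [f"Text{i}" for i in range(1, 101)]
for chunk in batched(texts):
    # One model request per chunk would go here (not shown).
    pass
```

With 100 texts this yields 10 chunks of 10, so each request only has to keep track of 10 results.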

My text excerpts are generally short (single sentences); the results are requested in application/json format and fit within the max_output_tokens limit (8192). The model I am currently evaluating is gemini-2.0-flash, which I use via the Python SDK (google-genai==1.11.0).

I do not know whether the model simply loses count or whether I am misusing it in some way. I have tried supplying the text excerpts both as individual parts and as a single text with an ad-hoc separator (e.g. the text ---BOUNDARY---) between excerpts.

Any guidance greatly appreciated.

Hi @billziss, welcome to the forum.

I just tried providing the text excerpts in a single prompt and requesting the response in a JSON schema. I've attached a Colab gist for reference. Hope it helps.
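A schema constrains the shape of each result, but it may not by itself guarantee that the array has exactly one entry per excerpt, so it is worth checking the count on the client side as well. Below is a minimal sketch, assuming the dict form of a schema is passed as `response_schema` together with `response_mime_type="application/json"` in the google-genai `GenerateContentConfig`. The field names `id` and `result` and the helper `parse_results` are hypothetical, chosen for illustration.

```python
import json
from typing import Any, Dict, List

# Hypothetical schema: an array of objects, one per excerpt.
# Uppercase type names follow Gemini's schema type enum; the
# property names "id" and "result" are illustrative only.
RESULTS_SCHEMA: Dict[str, Any] = {
    "type": "ARRAY",
    "items": {
        "type": "OBJECT",
        "properties": {
            "id": {"type": "STRING"},
            "result": {"type": "STRING"},
        },
        "required": ["id", "result"],
    },
}


def parse_results(response_text: str, expected_count: int) -> List[Dict[str, Any]]:
    """Parse the model's JSON output and fail loudly on a count mismatch."""
    results = json.loads(response_text)
    if not isinstance(results, list):
        raise ValueError("expected a JSON array")
    if len(results) != expected_count:
        raise ValueError(f"expected {expected_count} results, got {len(results)}")
    return results
```

A mismatch raises immediately instead of silently shifting results onto the wrong excerpts.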

Thank you for your response.

I had problems making this work until I introduced an “ID” for each of my text excerpts. Here is the prompt that works for me:

Input format:

You are given a series of text excerpts. Each excerpt is prefixed by a single line that
contains the string "----ITEM:" followed by a unique identifier, which serves as the
excerpt ID. For example:

    ----ITEM: X1
    This is the text of excerpt X1.
    This is another line of text from excerpt X1.
    ...

    ----ITEM: AnotherExcerpt
    Even more text.

Task:

[snip]

Output format:

[snip]
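The excerpt IDs also make it possible to match results back to inputs on the client side, so missing or extra results can be detected (and, for example, only the missing excerpts resent). A minimal sketch follows; `build_input` and `reconcile` are hypothetical helpers, and since the actual output format is snipped above, it assumes each returned result carries the excerpt ID in an `id` field.

```python
from typing import Dict, List, Mapping, Optional, Tuple

ITEM_PREFIX = "----ITEM:"


def build_input(excerpts: Mapping[str, str]) -> str:
    """Format excerpts as '----ITEM: <id>' header lines followed by the text."""
    blocks = []
    for excerpt_id, text in excerpts.items():
        # IDs must be non-empty and whitespace-free to stay on one header line.
        if not excerpt_id or any(c.isspace() for c in excerpt_id):
            raise ValueError(f"invalid excerpt ID: {excerpt_id!r}")
        blocks.append(f"{ITEM_PREFIX} {excerpt_id}\n{text.strip()}")
    return "\n\n".join(blocks)


def reconcile(
    excerpt_ids: List[str], results: List[Dict[str, str]]
) -> Tuple[Dict[str, Dict[str, str]], List[str], List[Optional[str]]]:
    """Match results to excerpts by ID; report missing and unexpected IDs."""
    known = set(excerpt_ids)
    by_id: Dict[str, Dict[str, str]] = {}
    unexpected: List[Optional[str]] = []
    for item in results:
        rid = item.get("id")
        if rid in known and rid not in by_id:
            by_id[rid] = item
        else:
            unexpected.append(rid)  # unknown or duplicate ID
    missing = [eid for eid in excerpt_ids if eid not in by_id]
    return by_id, missing, unexpected
```

Keying on IDs rather than list position means an extra or dropped result affects only that one excerpt instead of misaligning every result after it.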