Processing multiple text excerpts with Gemini API

billziss · May 19, 2025, 3:29pm

I am evaluating the Gemini API for the following task: given a number of text excerpts, I would like the API to analyze each excerpt and then produce a response that contains the result of each analysis, one result per excerpt. For example, if I supply 100 texts (“Text1”, “Text2”, …, “Text100”), I want back a list of 100 results, one for each text.

I supply my instructions in the system_instruction, and the model generally seems to understand them, in that it produces correct results for some of the text excerpts. However, it also produces the wrong number of results: if I supply 100 texts, I may get back fewer or sometimes more(!) than 100 results. The model seems to work better when I limit my input to 10 texts.

My text excerpts are generally small (sentences) and the results are expected in the application/json format and fit within the max_output_tokens limit (8192). The model that I am currently evaluating is gemini-2.0-flash. I am using it via the Python SDK (google-genai==1.11.0).

I do not know if the model simply loses count or if I am misusing it in some way. I have tried to supply the text excerpts as individual parts and also as a single text with an ad-hoc separator (e.g. the text ---BOUNDARY---) between excerpts.
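Since smaller inputs reportedly behave better, one option is to split the excerpts into batches and compare the number of results against the number of inputs for each batch. The following is a minimal sketch under stated assumptions, not a tested recipe: the helper names `batched` and `count_matches` are hypothetical, the batch size of 10 is simply the value mentioned above, and the actual API call is left out.

```python
from typing import Iterator, List, Sequence


def batched(excerpts: Sequence[str], size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` excerpts (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(excerpts), size):
        yield list(excerpts[start:start + size])


def count_matches(batch: Sequence[str], results: Sequence[object]) -> bool:
    """Return True when the model returned exactly one result per excerpt."""
    return len(batch) == len(results)


if __name__ == "__main__":
    texts = [f"Text{i}" for i in range(1, 101)]
    for batch in batched(texts, size=10):
        # Assumption: each batch would be sent to the model here, and the
        # parsed JSON list of results checked with count_matches(batch, results).
        print(len(batch), batch[0], batch[-1])
```

A batch whose count does not match can then be retried or split further, instead of discarding the whole run.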

Any guidance greatly appreciated.

GUNAND_MAYANGLAMBAM · May 20, 2025, 2:28pm

Hi @billziss, welcome to the forum.

I just tried providing the text excerpts as a single prompt and requesting the response according to a JSON schema. I am attaching a Colab gist for reference. Hope it helps.

billziss · May 20, 2025, 9:59pm

Thank you for your response.

I had problems making this work until I introduced an “ID” for each of my text excerpts. Here is the prompt that works for me:

Input format:

You are given a series of text excerpts. Each excerpt is prefixed by a single line that
contains the string "----ITEM:" followed by a unique identifier, which serves as the
excerpt ID. For example:

    ----ITEM: X1
    This is the text of excerpt X1.
    This is another line of text from excerpt X1.
    ...

    ----ITEM: AnotherExcerpt
    Even more text.

Task:

[snip]

Output format:

[snip]
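For illustration, here is a minimal sketch of how the input described above could be assembled, and how the returned IDs could be checked against the IDs that were sent. The output format is snipped in the prompt, so the shape used here (a JSON list of objects, each with an "id" field) is purely an assumption, and the helper names `build_input` and `check_ids` are hypothetical.

```python
import json
from typing import Dict, Set, Tuple

ITEM_PREFIX = "----ITEM: "  # marker line taken from the prompt above


def build_input(excerpts: Dict[str, str]) -> str:
    """Prefix each excerpt with its '----ITEM: <id>' line and join the blocks."""
    blocks = [f"{ITEM_PREFIX}{item_id}\n{text}" for item_id, text in excerpts.items()]
    return "\n\n".join(blocks)


def check_ids(excerpts: Dict[str, str], response_text: str) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare the IDs that were sent with the IDs that came back.

    Assumption: the response is a JSON list of objects with an "id" field.
    Returns (missing, unexpected, duplicated) sets of IDs.
    """
    results = json.loads(response_text)
    returned = [result["id"] for result in results]
    missing = set(excerpts) - set(returned)
    unexpected = set(returned) - set(excerpts)
    duplicated = {item_id for item_id in returned if returned.count(item_id) > 1}
    return missing, unexpected, duplicated


if __name__ == "__main__":
    excerpts = {
        "X1": "This is the text of excerpt X1.\nThis is another line of text from excerpt X1.",
        "AnotherExcerpt": "Even more text.",
    }
    print(build_input(excerpts))
    # A made-up response, used only to exercise the check.
    fake_response = '[{"id": "X1"}, {"id": "X1"}, {"id": "Z9"}]'
    print(check_ids(excerpts, fake_response))
```

With explicit IDs, a wrong result count becomes diagnosable: the check shows which excerpts were skipped, repeated, or invented, so only those need to be resubmitted.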
