Structured output in gemini-2.5-flash-lite batch mode (input file)

Hello,
I’m trying to set up structured output in gemini-2.5-flash-lite batch mode using an input file (Python). With inline input you can pass a Pydantic model with 'response_schema': list[Recipe], which works, but what is the proper way to pass the structured output schema when using an input file? I tried passing a JSON Schema, but it didn’t seem to work. The docs state that it should work with gemini-2.5; does that include gemini-2.5-flash-lite?

Here is what I’m trying to achieve:

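from typing import List, Literal

from pydantic import BaseModel

CATEGORIES = ["Polityka", "Gospodarka", "Świat", "Kultura", "Sport", "Nauka", "Technologia", "Kraj", "Edukacja", "Inne"]

class Summary(BaseModel):
    title: str
    snippet: str
    bulletPoints: List[str]
    # Literal needs the individual values, not the list object itself
    categories: List[Literal[tuple(CATEGORIES)]]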
. . .

inline_requests.append(
            {
                "key": cluster.get("cluster_id"),
                "request": {
                    "contents": [
                        {
                            "parts": [{"text": prompt}],
                            "role": "user",
                        }
                    ]
                },
                "generationConfig": {
                    "response_mime_type": "application/json",
                    "response_schema": Summary.model_json_schema(),
                }
            }
        )
. . .

As I stated, I tried different approaches:

  1. Passing the Pydantic model directly
  2. Passing Summary.model_json_schema() to the responseJsonSchema field
  3. Passing Summary.model_json_schema() to the regular response_schema field
  4. Passing a plain dictionary

Also, I couldn’t find the name of the field that needs to be set in the request. For responseSchema the field is response_schema, but I only found that in the provided examples, and there is no example for responseJsonSchema. Is it response_json_schema? Neither spelling worked, though.

I have now noticed that generationConfig should perhaps be generation_config instead? That still didn’t work.

Hello,

For batch processing, the response schema must be included directly within each JSON object in your .jsonl file. Each line represents a single request and should follow the structure shown below. Note that generation_config is nested inside "request", not placed next to it:

json_line = {
    "key": f"request_{i}",
    "request": {
        "contents": [{"parts": [{"text": "CONTENT"}]}],
        "system_instruction": {"parts": [{"text": "SYSTEM_INSTRUCTION"}]},
        "generation_config": {
            "response_mime_type": "application/json",
            "response_schema": response_schema
        }
    }
}
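
To make the layout concrete, here is a minimal sketch of serializing such lines with the standard library only. The build_line and write_jsonl helpers, the prompts list, and the tiny placeholder schema are assumptions for illustration and not part of the API. Uploading the file and creating the batch job are not shown.

```python
import json

# Placeholder schema for illustration only; substitute your real response_schema.
response_schema = {
    "type": "OBJECT",
    "properties": {"title": {"type": "STRING"}},
    "required": ["title"],
}


def build_line(key: str, prompt: str, schema: dict) -> dict:
    """Build one batch request; generation_config sits inside "request"."""
    return {
        "key": key,
        "request": {
            "contents": [{"parts": [{"text": prompt}], "role": "user"}],
            "generation_config": {
                "response_mime_type": "application/json",
                "response_schema": schema,
            },
        },
    }


def to_jsonl(prompts: list, schema: dict) -> str:
    """Serialize one JSON object per line, keeping non-ASCII text readable."""
    lines = [
        json.dumps(build_line(f"request_{i}", prompt, schema), ensure_ascii=False)
        for i, prompt in enumerate(prompts)
    ]
    return "\n".join(lines) + "\n"


def write_jsonl(path: str, prompts: list, schema: dict) -> None:
    """Write the batch input file as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(prompts, schema))


# Hypothetical prompts, one per request.
prompts = ["Podsumuj artykuł A.", "Podsumuj artykuł B."]
jsonl_text = to_jsonl(prompts, response_schema)
```

Calling write_jsonl("batch_requests.jsonl", prompts, response_schema) would then produce the input file (the file name is only an example). Each line must be a complete JSON object on its own, so the schema is repeated in every request.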

Based on your example, the response_schema would be defined as follows:

response_schema = {
    "type": "OBJECT",
    "properties": {
        "title": {"type": "STRING", "description": "Tytuł podsumowania."},
        "snippet": {"type": "STRING", "description": "Krótki fragment/streszczenie."},
        "bulletPoints": {
            "type": "ARRAY",
            "description": "Lista kluczowych punktów w formie punktorów.",
            "items": {"type": "STRING"}
        },
        "categories": {
            "type": "ARRAY",
            "description": "Lista kategorii, do których pasuje treść.",
            "items": {
                "type": "STRING",
                "enum": [
                    "Polityka",
                    "Gospodarka",
                    "Świat",
                    "Kultura",
                    "Sport",
                    "Nauka",
                    "Technologia",
                    "Kraj",
                    "Edukacja",
                    "Inne"
                ]
            }
        }
    },
    "required": [
        "title",
        "snippet",
        "bulletPoints",
        "categories"
    ]
}

The primary difference between this format and the output of the .model_json_schema() method is that the type values must be uppercase (e.g., "OBJECT", "STRING") to ensure REST API compatibility, whereas Pydantic emits them in lowercase ("object", "string").
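
If you would rather not write the schema by hand, a rough sketch of the conversion could look like the following. It only uppercases the "type" values of a schema dictionary and does nothing else; the uppercase_types helper is hypothetical, and the sample dictionary is hand-written in the general shape of a Pydantic schema rather than real output. Other things Pydantic may emit (for example $defs and $ref for nested models) are not handled here and might need separate treatment.

```python
def uppercase_types(node):
    """Return a copy of a schema dict with every "type" value uppercased."""
    if isinstance(node, list):
        return [uppercase_types(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key == "type" and isinstance(value, str):
            out[key] = value.upper()
        elif key == "properties" and isinstance(value, dict):
            # Keys here are field names (a field may itself be called "type"),
            # so only the sub-schemas are converted.
            out[key] = {name: uppercase_types(sub) for name, sub in value.items()}
        else:
            out[key] = uppercase_types(value)
    return out


# Hand-written sample in the lowercase JSON Schema style (assumption, not real output).
lowercase_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "categories": {
            "type": "array",
            "items": {"type": "string", "enum": ["Sport", "Inne"]},
        },
    },
    "required": ["title", "categories"],
}

converted = uppercase_types(lowercase_schema)
```

The function builds a new dictionary instead of modifying its input, so the original schema stays usable for inline requests.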