Gemini 2.0: use a list of Pydantic objects as response schema

Vincent_Garcia · December 19, 2024, 2:49pm

I have an image and I want to use Gemini 2.0 Flash to detect the bounding boxes of the different objects in the image. I’d like the output to be a list of objects (see the Object class in the code below).

The problem is that I cannot figure out how to use Pydantic alongside response_schema to make this work. Here is the code:

import os
from typing import List

from google import genai
from google.genai import types
from PIL import Image
from pydantic import BaseModel, RootModel


class Object(BaseModel):
    label: str
    box_2d: list[int]


class ObjectsList(RootModel):
    root: List[Object]


image = Image.open(SOME_IMAGE_PATH)

prompt = "Detect objects box 2d."

client = genai.Client(api_key=os.environ["API_KEY"])

response = client.models.generate_content(
    contents=[image, prompt],
    model="gemini-2.0-flash-exp",
    config=types.GenerateContentConfig(
        response_mime_type= 'application/json',
        response_schema=ObjectsList,
    ),
)

This results in the following error: Extra inputs are not permitted [type=extra_forbidden, input_value={'Object': {'properties':...2d'], 'type': 'OBJECT'}}, input_type=dict]

Instead of passing ObjectsList to response_schema, I first tried passing list[Object] or List[Object] with no luck.
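One way to see where this error likely comes from is to compare the JSON schemas Pydantic generates. A flat model such as Object produces a self-contained schema, while both ObjectsList and list[Object] (via TypeAdapter) move the item definition into a "$defs" section and point to it with "$ref". The input_value in the error above is exactly such a {'Object': ...} definitions dict, which suggests the SDK version used here rejected that block. A minimal offline check, assuming Pydantic v2 (no API call involved):

```python
import json
from typing import List

from pydantic import BaseModel, RootModel, TypeAdapter


class Object(BaseModel):
    label: str
    box_2d: list[int]


class ObjectsList(RootModel):
    root: List[Object]


# A single flat model: everything is inline, there is no "$defs" section.
object_schema = Object.model_json_schema()

# A list of models: the item schema moves to "$defs" and is referenced via "$ref".
root_schema = ObjectsList.model_json_schema()
list_schema = TypeAdapter(list[Object]).json_schema()

for name, schema in [
    ("Object", object_schema),
    ("ObjectsList", root_schema),
    ("list[Object]", list_schema),
]:
    print(name, "->", json.dumps(schema, indent=2))
```

Both list variants produce the same "$defs"/"$ref" structure, which would explain why switching between ObjectsList, list[Object] and List[Object] makes no difference.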

Is it possible to use Pydantic to have a list of objects as output?

camadi · December 20, 2024, 9:09pm

Pydantic does work with Gemini 2.0 via response_schema. Please check the code below, which works for me. The error message is not coming from the Gemini 2.0 model itself.

import os

from google import genai
from google.genai.types import GenerateContentConfig

from PIL import Image
from pydantic import BaseModel

os.environ["API_KEY"] = "<YOUR_API_KEY>"  # placeholder: replace with your own key

client = genai.Client(api_key=os.environ["API_KEY"])


image = Image.open(SOME_IMAGE_PATH)
prompt = "Detect objects box 2d."

class Object(BaseModel):
    label: str
    box_2d: list[int]
    
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=[image, prompt],
    config=GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Object,
    ),
)

print(response.text)

Vincent_Garcia · December 21, 2024, 1:02pm

Your solution does not work. Maybe I was not clear in my initial message / question.

To be more precise, your solution runs without error, but it does something different from what I asked. What I asked was:

Is it possible to use Pydantic to have a list of objects as output?

The Object class has 2 fields: label (a string) and box_2d (a list of integers). By using response_schema=Object in the GenerateContentConfig, you’re asking Gemini to return only one Object. Here is the output of your solution:

{
  "box_2d": [
    53,
    285,
    283,
    436
  ],
  "label": "wallet"
}

Gemini’s output is a JSON object describing a single detection, which is exactly what your code asked for.
But it’s not a list of objects, so it does not solve my problem.
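To make the distinction concrete, the output above can be parsed back into the model offline. This sketch assumes Pydantic v2 and simply copies the response text shown above into a string; it shows that the top-level JSON value is a single object (a dict once parsed), so it has no room for several detections.

```python
import json

from pydantic import BaseModel


class Object(BaseModel):
    label: str
    box_2d: list[int]


# The JSON text returned with response_schema=Object (copied from the output above).
response_text = """
{
  "box_2d": [53, 285, 283, 436],
  "label": "wallet"
}
"""

# model_validate_json parses and validates in one step.
obj = Object.model_validate_json(response_text)
print(obj.label, obj.box_2d)

# The top-level value is a single JSON object, not an array.
payload = json.loads(response_text)
print(type(payload).__name__)  # dict
```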

camadi · January 3, 2025, 9:08pm

@Vincent_Garcia Thank you for the clarification.

It appears this is not supported out of the box in the SDK. Please refer to this GitHub issue, which has a workaround. Alternatively, you can use the code below:

import os
import json

from pydantic import BaseModel, RootModel, TypeAdapter
from typing import List

from google import genai
from google.genai import types
from PIL import Image

class Object(BaseModel):
    label: str
    box_2d: list[int]

class ObjectsList(RootModel):
    root: List[Object]

def get_schema(cls: type[BaseModel]) -> dict:
    """
    Converts a Pydantic model to a JSON schema dictionary, inlining the
    "$defs" references and dropping "title" keys.
    """
    schema = cls.model_json_schema()
    if "$defs" not in schema:
        return schema

    defs = schema.pop("$defs")

    def _resolve(schema):
        if "$ref" in schema:
            ref = schema.pop("$ref")
            schema.update(defs[ref.split("/")[-1]])
        if "properties" in schema:
            for prop in schema["properties"].values():
                _resolve(prop)
        if "items" in schema:
            _resolve(schema["items"])
        schema.pop("title",None)

    _resolve(schema)
    return schema

SOME_IMAGE_PATH = "Cajun_instruments.jpg"

image = Image.open(SOME_IMAGE_PATH)

prompt = "Detect objects box 2d."

client = genai.Client(api_key=os.environ["API_KEY"])

response = client.models.generate_content(
    contents=[image, prompt],
    model="gemini-2.0-flash-exp",
    config=types.GenerateContentConfig(
        response_mime_type='application/json',
        response_schema=get_schema(ObjectsList),  # Use the get_schema function here
    ),
)

# Load JSON string and validate with Pydantic
obj = TypeAdapter(ObjectsList).validate_python(json.loads(response.text))
print(obj)
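The get_schema helper above only follows "properties" and "items", so a "$ref" nested anywhere else (for example inside the "anyOf" that Pydantic emits for an Optional[...] field) would be left unresolved. Below is a sketch of a more general variant, assuming Pydantic v2; the name inline_refs is hypothetical (not part of Pydantic or the SDK), and it assumes local "#/$defs/<Name>" refs and no recursive models. It also validates a made-up sample response offline with ObjectsList.model_validate_json, which parses the JSON text directly instead of going through json.loads and TypeAdapter.

```python
import copy
import json
from typing import Any, List

from pydantic import BaseModel, RootModel


class Object(BaseModel):
    label: str
    box_2d: list[int]


class ObjectsList(RootModel):
    root: List[Object]


def inline_refs(model: type[BaseModel]) -> dict[str, Any]:
    """Hypothetical helper: return the model's JSON schema with every local
    "$ref" inlined and string "title" annotations removed.

    Assumes refs of the form "#/$defs/<Name>" and non-recursive models.
    """
    schema = model.model_json_schema()
    defs = schema.pop("$defs", {})

    def _walk(node: Any) -> Any:
        if isinstance(node, dict):
            if "$ref" in node:
                # Replace the reference with a copy of the definition it points to,
                # keeping any sibling keys (e.g. "description") next to the "$ref".
                name = node["$ref"].split("/")[-1]
                siblings = {k: v for k, v in node.items() if k != "$ref"}
                return _walk({**copy.deepcopy(defs[name]), **siblings})
            # Drop only string-valued "title" annotations, so a field that is
            # itself named "title" (a dict under "properties") is preserved.
            return {
                k: _walk(v)
                for k, v in node.items()
                if not (k == "title" and isinstance(v, str))
            }
        if isinstance(node, list):
            return [_walk(item) for item in node]
        return node

    return _walk(schema)


schema = inline_refs(ObjectsList)
print(json.dumps(schema, indent=2))

# Hypothetical response text, standing in for response.text.
sample_text = (
    '[{"label": "wallet", "box_2d": [53, 285, 283, 436]},'
    ' {"label": "phone", "box_2d": [10, 20, 30, 40]}]'
)

# A RootModel parses the JSON array directly; the validated list lives in .root.
objects = ObjectsList.model_validate_json(sample_text)
for item in objects.root:
    print(item.label, item.box_2d)
```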

Output:

Vincent_Garcia · January 14, 2025, 1:39pm

I see that you are using a JSON schema here (which is the solution I’m using too). I understand that Pydantic cannot be used for this simple use case. Thanks for your answer.
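For comparison, a hand-written JSON schema for the same output can look like the dict below. This is a sketch: it mirrors the array-of-objects shape that get_schema(ObjectsList) produces in the code above, uses the same lower-case type names, and the commented-out config assumes the types module imported there. The sample response is made up; without Pydantic, the parsed result is plain dicts and lists.

```python
import json

# Hand-written schema: a list of objects, each with a label and a 2D box.
OBJECTS_LIST_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "label": {"type": "string"},
            "box_2d": {"type": "array", "items": {"type": "integer"}},
        },
        "required": ["label", "box_2d"],
    },
}

# It would be passed the same way as the Pydantic-derived dict, e.g.:
# config=types.GenerateContentConfig(
#     response_mime_type="application/json",
#     response_schema=OBJECTS_LIST_SCHEMA,
# )

# Hypothetical response text; json.loads yields a list of dicts.
sample_text = '[{"label": "wallet", "box_2d": [53, 285, 283, 436]}]'
detections = json.loads(sample_text)
for det in detections:
    print(det["label"], det["box_2d"])
```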
