Gemini 2.0: use a list of Pydantic objects as response schema

I have an image and I want to use Gemini 2 Flash to detect the bounding boxes of the different objects in the image. I’d like the output to be a list of objects (see the Object class in the code below).

The problem is that I cannot figure out how to use Pydantic alongside response_schema to make it work. Here is the code:

import os

from pydantic import BaseModel, RootModel

from typing import List

from google import genai
from google.genai import types
from PIL import Image


class Object(BaseModel):
    label: str
    box_2d: list[int]


class ObjectsList(RootModel):
    root: List[Object]


image = Image.open(SOME_IMAGE_PATH)

prompt = "Detect objects box 2d."

client = genai.Client(api_key=os.environ["API_KEY"])

response = client.models.generate_content(
    contents=[image, prompt],
    model="gemini-2.0-flash-exp",
    config=types.GenerateContentConfig(
        response_mime_type= 'application/json',
        response_schema=ObjectsList,
    ),
)

This results in an Extra inputs are not permitted [type=extra_forbidden, input_value={'Object': {'properties':...2d'], 'type': 'OBJECT'}}, input_type=dict] error.

Instead of passing ObjectsList to response_schema, I first tried passing list[Object] or List[Object] with no luck.

Is it possible to use Pydantic to have a list of objects as output?


Pydantic actually works with Gemini 2.0 (response_schema). Please check the code below which works for me. The error message is not coming from Gemini 2.0.

import os

from google import genai
from google.genai.types import GenerateContentConfig

from PIL import Image
from pydantic import BaseModel

os.environ["API_KEY"] = "<YOUR_API_KEY>"  # replace the placeholder with your key

client = genai.Client(api_key=os.environ["API_KEY"])


image = Image.open(SOME_IMAGE_PATH)
prompt = "Detect objects box 2d."

class Object(BaseModel):
    label: str
    box_2d: list[int]
    
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=[image, prompt],
    config=GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Object,
    ),
)

print(response.text)

Your solution does not work. Maybe I was not clear in my initial message / question.

To be more precise, your solution runs without error, but it does something different from what I asked, which was:

Is it possible to use Pydantic to have a list of objects as output?

The Object has 2 fields: label (which is a string) and box_2d (which is a list of integers). By using response_schema=Object in the GenerateContentConfig, you’re asking Gemini to return only one Object. Here is the output of your solution:

{
  "box_2d": [
    53,
    285,
    283,
    436
  ],
  "label": "wallet"
}

Gemini’s output is a JSON document with one object, which is exactly what you asked for.
But it’s not a list of objects, hence it does not solve my problem.
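
To make the difference concrete, here is a minimal sketch using only the standard library. The first payload is the output shown above; the second is an illustrative payload (not real model output) with the shape I am after. The helper name is_object_list is made up for this example.

```python
import json

# Shape returned when response_schema is a single model (values from the output above).
single = json.loads('{"box_2d": [53, 285, 283, 436], "label": "wallet"}')

# Shape the question asks for: a JSON array of such objects.
# Illustrative payload only, not real model output.
wanted = json.loads('[{"box_2d": [53, 285, 283, 436], "label": "wallet"}]')


def is_object_list(payload) -> bool:
    """Return True if payload is a list of dicts that each have 'label' and 'box_2d'."""
    return isinstance(payload, list) and all(
        isinstance(item, dict) and {"label", "box_2d"} <= set(item)
        for item in payload
    )


print(is_object_list(single))  # False
print(is_object_list(wanted))  # True
```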

@Vincent_Garcia Thank you for the clarification.

It appears this is not supported out-of-the-box in the SDK. Please refer to this GitHub issue which has a workaround. Alternatively, you can refer to this code below:

import os
import json

from pydantic import BaseModel, RootModel, TypeAdapter
from typing import List

from google import genai
from google.genai import types
from PIL import Image

class Object(BaseModel):
    label: str
    box_2d: list[int]

class ObjectsList(RootModel):
    root: List[Object]

def get_schema(cls: type[BaseModel]) -> dict:
    """
    Converts a Pydantic model class to a JSON schema dictionary,
    inlining every "$ref" and dropping "title" keys.
    """
    schema = cls.model_json_schema()
    if "$defs" not in schema:
        return schema

    defs = schema.pop("$defs")

    def _resolve(schema):
        if "$ref" in schema:
            ref = schema.pop("$ref")
            schema.update(defs[ref.split("/")[-1]])
        if "properties" in schema:
            for prop in schema["properties"].values():
                _resolve(prop)
        if "items" in schema:
            _resolve(schema["items"])
        schema.pop("title",None)

    _resolve(schema)
    return schema

SOME_IMAGE_PATH = "Cajun_instruments.jpg"

image = Image.open(SOME_IMAGE_PATH)

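prompt = "Detect objects box 2d."

# The client was not defined in this snippet; create it as in the earlier examples.
client = genai.Client(api_key=os.environ["API_KEY"])
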
response = client.models.generate_content(
    contents=[image, prompt],
    model="gemini-2.0-flash-exp",
    config=types.GenerateContentConfig(
        response_mime_type='application/json',
        response_schema=get_schema(ObjectsList),  # Use the get_schema function here
    ),
)

# Load JSON string and validate with Pydantic
obj = TypeAdapter(ObjectsList).validate_python(json.loads(response.text))
print(obj)
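
For readers who want to see what the flattening step does without calling the API, here is a minimal standard-library sketch. The input dict is a hand-written stand-in for what ObjectsList.model_json_schema() returns (the exact Pydantic output may differ, so treat it as an assumption), and inline_refs is a hypothetical, non-mutating variant of get_schema above.

```python
import copy
import json

# Hand-written stand-in for ObjectsList.model_json_schema();
# the exact output of Pydantic may differ (assumption for illustration).
schema = {
    "$defs": {
        "Object": {
            "properties": {
                "label": {"title": "Label", "type": "string"},
                "box_2d": {
                    "items": {"type": "integer"},
                    "title": "Box 2D",
                    "type": "array",
                },
            },
            "required": ["label", "box_2d"],
            "title": "Object",
            "type": "object",
        }
    },
    "items": {"$ref": "#/$defs/Object"},
    "title": "ObjectsList",
    "type": "array",
}


def inline_refs(schema: dict) -> dict:
    """Return a copy of schema with each "$ref" replaced by its definition."""
    schema = copy.deepcopy(schema)  # leave the caller's dict untouched
    defs = schema.pop("$defs", {})

    def resolve(node: dict) -> None:
        if "$ref" in node:
            name = node.pop("$ref").split("/")[-1]
            node.update(copy.deepcopy(defs[name]))
        for prop in node.get("properties", {}).values():
            resolve(prop)
        if "items" in node:
            resolve(node["items"])
        node.pop("title", None)  # drop "title" keys, as get_schema does

    resolve(schema)
    return schema


flat = inline_refs(schema)
print(json.dumps(flat, indent=2))
```

The result is a plain array schema whose items entry holds the full Object definition, with no "$defs" or "$ref" left.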


I see that you are using a JSON schema here (which is the solution I’m using too). I understand that Pydantic cannot be used for this simple use case. Thanks for your answer.