How to get structured output for image input

There’s no such example in the API docs for this case.

Hey @Muhammad_Zafar, welcome to the forum.

Please refer to the sample code below, which gets structured output from an image input:

import os

from google import genai
from pydantic import BaseModel


# Schema that the model's JSON response must follow.
class ImageDescription(BaseModel):
    title: str
    description: str


# Read the key from the environment instead of hard-coding it.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Upload the image with the Files API.
sample_file = client.files.upload(file="sample.jpg")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=["Give me a title and description of the image.", sample_file],
    config={
        "response_mime_type": "application/json",
        "response_schema": ImageDescription,
    },
)

# Use the response as a JSON string.
print(response.text)
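`response.text` is a JSON string. The google-genai SDK also exposes `response.parsed`, which holds an instance of the schema class when `response_schema` is a Pydantic model. If you would rather validate the raw text yourself, here is a minimal stdlib-only sketch; the `ImageInfo` dataclass and `parse_image_description` helper are hypothetical names, and the sample JSON is a made-up stand-in for a real model reply.

```python
import json
from dataclasses import dataclass


# Hypothetical stdlib stand-in for the Pydantic schema in the example above.
@dataclass
class ImageInfo:
    title: str
    description: str


def parse_image_description(text: str) -> ImageInfo:
    """Parse the JSON string from response.text and check it matches the schema."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in ("title", "description"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"missing or non-string field: {key!r}")
    return ImageInfo(title=data["title"], description=data["description"])


# Usage with a made-up reply in the shape the schema requests.
sample_text = '{"title": "Harbor at dusk", "description": "Boats moored along a pier."}'
info = parse_image_description(sample_text)
print(info.title)
```

Raising `ValueError` on a missing field keeps a malformed reply from silently flowing into the rest of your program.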

Thank you very much for your response. Is this possible with JavaScript as well?

Yes, the JavaScript SDK supports this too.

I mean, could you share an example for it as well? I couldn't find one in the docs. Thanks.

Still waiting, help please :slightly_smiling_face:

You can try the code below:

import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";
import fs from "fs";
import dotenv from "dotenv";

// Load GEMINI_API_KEY from a local .env file.
dotenv.config();

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

// Converts a local file into an inline, base64-encoded part.
function fileToGenerativePart(path, mimeType) {
  return {
    inlineData: {
      data: fs.readFileSync(path).toString("base64"),
      mimeType,
    },
  };
}

async function run() {
  // Response schema: an array of objects, each with a required "city" string.
  const schema = {
    description: "List of cities",
    type: SchemaType.ARRAY,
    items: {
      type: SchemaType.OBJECT,
      properties: {
        city: {
          type: SchemaType.STRING,
          description: "Name of the city",
          nullable: false,
        },
      },
      required: ["city"],
    },
  };

  const model = genAI.getGenerativeModel({
    model: "gemini-1.5-pro",
    generationConfig: {
      responseMimeType: "application/json",
      responseSchema: schema,
    },
  });

  const prompt = "List all the cities from the given image";
  const imageParts = [fileToGenerativePart("sample.png", "image/png")];

  // Spread the image parts so the request is a flat array of parts.
  const generatedContent = await model.generateContent([prompt, ...imageParts]);

  // The response text is a JSON string matching the schema.
  console.log(generatedContent.response.text());
}

run().catch(console.error);
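For comparison, here is a sketch of the same request expressed as the JSON body for the REST `generateContent` endpoint: the image travels as a base64 `inline_data` part (the Python analogue of `fileToGenerativePart`), and the schema goes in `generationConfig`. It assumes the REST field names `inline_data`/`mime_type` and `responseMimeType`/`responseSchema`; `file_to_inline_part`, `build_city_request`, and `cities_from_response` are hypothetical helpers, and the image bytes are placeholders rather than a real PNG.

```python
import base64
import json


def file_to_inline_part(data: bytes, mime_type: str) -> dict:
    """Base64-encode raw bytes into an inline part (mirrors fileToGenerativePart)."""
    return {"inline_data": {"mime_type": mime_type, "data": base64.b64encode(data).decode("ascii")}}


# Same schema as the JavaScript example, written with plain JSON type names.
CITY_SCHEMA = {
    "description": "List of cities",
    "type": "ARRAY",
    "items": {
        "type": "OBJECT",
        "properties": {
            "city": {"type": "STRING", "description": "Name of the city", "nullable": False},
        },
        "required": ["city"],
    },
}


def build_city_request(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Hypothetical helper: assemble a request body that pairs an image with a response schema."""
    return {
        "contents": [{
            "parts": [
                {"text": "List all the cities from the given image"},
                file_to_inline_part(image_bytes, mime_type),
            ]
        }],
        "generationConfig": {
            "responseMimeType": "application/json",
            "responseSchema": CITY_SCHEMA,
        },
    }


def cities_from_response(text: str) -> list[str]:
    """Pull the city names out of a JSON reply shaped like CITY_SCHEMA."""
    return [item["city"] for item in json.loads(text)]


body = build_city_request(b"placeholder image bytes")
print(json.dumps(body)[:80])
```

Keeping the schema as plain data makes it easy to reuse across SDKs or to log exactly what was sent.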

line 1: import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";
ERROR: Module '"@google/generative-ai"' has no exported member 'SchemaType'.

line 38: responseMimeType: "application/json",
ERROR: Object literal may only specify known properties, and 'responseMimeType' does not exist in type 'GenerationConfig'.

line 47: const generatedContent = await model.generateContent([prompt, imageParts]);
ERROR: Type '{ inlineData: { data: string; mimeType: any; }; }' is not assignable to type 'string | Part'.

Can you try updating to the latest version and see if it works?

npm install @google/generative-ai@0.24.0

Fixed! Thank you so much; I've been trying to do this for weeks. Can we get structured output for audio/video input as well?

Absolutely, you can get structured output for both audio and video as well. :grinning_face:
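For audio and video, the request keeps the same `response_schema`; what mainly changes is the media's MIME type and, for large files, whether to send it inline or through the Files API. Below is a small stdlib sketch of those two decisions. `media_part_mime_type` and `should_upload` are hypothetical helpers, the accepted prefixes reflect this thread rather than an official list of supported formats, and the 20 MB inline threshold is an assumption to check against the current Gemini API docs.

```python
import mimetypes

# Assumption: image, audio, and video inputs can all be paired with response_schema.
SUPPORTED_PREFIXES = ("image/", "audio/", "video/")

# Assumption: inline requests should stay under roughly 20 MB; larger media goes through the Files API.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def media_part_mime_type(path: str) -> str:
    """Guess a file's MIME type from its extension and reject non-media files."""
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith(SUPPORTED_PREFIXES):
        raise ValueError(f"not an image, audio, or video file: {path}")
    return mime_type


def should_upload(size_bytes: int) -> bool:
    """Return True when a file is too large to send inline under the assumed limit."""
    return size_bytes > INLINE_LIMIT_BYTES


for name in ("sample.jpg", "interview.mp3", "clip.mp4"):
    print(name, media_part_mime_type(name))
```

Video files are often large enough that uploading them first, as in the Python example with `client.files.upload`, is the simpler default.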