How to get structured output for image input

Muhammad_Zafar · April 1, 2025, 6:03am

There’s no such example in the API docs for this case.

GUNAND_MAYANGLAMBAM · April 1, 2025, 9:15am

Hey @Muhammad_Zafar, welcome to the forum.

Please refer to the sample code below for structured output with image input:

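import os

from google import genai
from pydantic import BaseModel


# Pydantic model describing the JSON object the model must return.
class ImageDescription(BaseModel):
    title: str
    description: str


# Read the API key from the environment instead of hard-coding it.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Upload the image(s) with the File API; the returned File objects go into contents.
files = [
    client.files.upload(file="sample.jpg"),
]
response = client.models.generate_content(
    model="gemini-2.0-flash",
    # Unpack the list so contents is a flat list: prompt first, then the files.
    contents=["Give me title and description of the image.", *files],
    config={
        "response_mime_type": "application/json",
        "response_schema": ImageDescription,
    },
)
# Use the response as a JSON string.
print(response.text)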
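`response.text` is a JSON string that follows the schema. Recent google-genai releases also expose `response.parsed` when a pydantic model is passed as `response_schema`, but if you only have the text, here is a minimal stdlib-only sketch of checking it and turning it into a typed object. The `ImageDescription` dataclass and `parse_description` helper are hypothetical names chosen for illustration, and the sketch assumes the reply contains the `title` and `description` fields from the schema.

```python
import json
from dataclasses import dataclass


@dataclass
class ImageDescription:
    """Hypothetical mirror of the pydantic model used as response_schema."""
    title: str
    description: str


def parse_description(response_text: str) -> ImageDescription:
    """Parse the JSON string from response.text and check the expected fields."""
    data = json.loads(response_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"title", "description"} - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Reject non-string values so downstream code can rely on the types.
    for key in ("title", "description"):
        if not isinstance(data[key], str):
            raise ValueError(f"field {key!r} must be a string")
    return ImageDescription(title=data["title"], description=data["description"])


# Example with a hand-written string shaped like the model's JSON output.
sample = '{"title": "Sunset", "description": "An orange sky over the sea."}'
print(parse_description(sample))
```

Validating by hand like this is mainly useful when the text is stored or passed between services; inside the same script, the SDK's own parsing is simpler.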

Muhammad_Zafar · April 2, 2025, 9:42am

Thank you very much for your response. Is it possible with JavaScript as well?

GUNAND_MAYANGLAMBAM · April 2, 2025, 12:59pm

Yes, the JavaScript SDK supports it as well.

Muhammad_Zafar · April 2, 2025, 1:11pm

I mean, could we get an example for it as well? I couldn't find one in the docs. Thanks.

Muhammad_Zafar · April 4, 2025, 4:01pm

Still waiting, help please

GUNAND_MAYANGLAMBAM · April 4, 2025, 4:29pm

You can try the code below:

import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";
import fs from "fs";
import dotenv from "dotenv";

dotenv.config();

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

// Reads a local file and wraps it as an inline base64 part.
function fileToGenerativePart(path, mimeType) {
  return {
    inlineData: {
      data: fs.readFileSync(path).toString("base64"),
      mimeType,
    },
  };
}

async function run() {
  // Response schema: an array of objects, each with a required "city" string.
  const schema = {
    description: "List of cities",
    type: SchemaType.ARRAY,
    items: {
      type: SchemaType.OBJECT,
      properties: {
        city: {
          type: SchemaType.STRING,
          description: "Name of the city",
          nullable: false,
        },
      },
      required: ["city"],
    },
  };

  const model = genAI.getGenerativeModel({
    model: "gemini-1.5-pro",
    generationConfig: {
      responseMimeType: "application/json",
      responseSchema: schema,
    },
  });

  const prompt = "List all the cities from the given image";

  // Path to your local image.
  const imageParts = [fileToGenerativePart("./sample.png", "image/png")];

  // Spread the image parts so the request is a flat array of string | Part.
  const generatedContent = await model.generateContent([prompt, ...imageParts]);

  console.log(generatedContent.response.text());
}

run().catch(console.error);
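The JavaScript example sends the image inline as base64 rather than uploading it. For readers comparing the two SDKs, here is a rough Python sketch of the request body such a call corresponds to on the REST `generateContent` endpoint. It only builds the dictionary and sends nothing; the field names (`inlineData`, `mimeType`, `generationConfig`, `responseMimeType`, `responseSchema`) and the uppercase schema type names are assumptions based on the public REST documentation, and `image_part` and `build_request` are hypothetical helpers.

```python
import base64
import json


def image_part(image_bytes: bytes, mime_type: str) -> dict:
    """Counterpart of fileToGenerativePart: wrap raw bytes as inline base64 data."""
    return {
        "inlineData": {
            "mimeType": mime_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    }


# Same schema as the JavaScript example, using the REST API's uppercase type names (assumed).
CITY_SCHEMA = {
    "type": "ARRAY",
    "description": "List of cities",
    "items": {
        "type": "OBJECT",
        "properties": {
            "city": {"type": "STRING", "description": "Name of the city"},
        },
        "required": ["city"],
    },
}


def build_request(prompt: str, parts: list, schema: dict) -> dict:
    """Assemble a generateContent body: the text prompt first, then each image part."""
    return {
        "contents": [{"parts": [{"text": prompt}, *parts]}],
        "generationConfig": {
            "responseMimeType": "application/json",
            "responseSchema": schema,
        },
    }


# Usage with placeholder bytes instead of a real PNG file.
body = build_request(
    "List all the cities from the given image",
    [image_part(b"placeholder", "image/png")],
    CITY_SCHEMA,
)
print(json.dumps(body, indent=2))
```

Putting the text before the image mirrors `[prompt, ...imageParts]` in the JavaScript call; to use a real file, replace the placeholder bytes with `Path("sample.png").read_bytes()` from `pathlib`.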

Muhammad_Zafar · April 4, 2025, 11:50pm

line 1: import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";
ERROR: Module '"@google/generative-ai"' has no exported member 'SchemaType'.

line 38: responseMimeType: "application/json",
ERROR: Object literal may only specify known properties, and 'responseMimeType' does not exist in type 'GenerationConfig'.

line 47: const generatedContent = await model.generateContent([prompt, imageParts]);
ERROR: Type '{ inlineData: { data: string; mimeType: any; }; }' is not assignable to type 'string | Part'.

GUNAND_MAYANGLAMBAM · April 5, 2025, 3:10am

Can you try updating to the latest version and see if it works?

npm install @google/generative-ai@0.24.0

Muhammad_Zafar · April 5, 2025, 7:00am

Fixed!! Thank you so much, I have been wanting to do this for weeks. Can we get structured output for audio/video input as well?

GUNAND_MAYANGLAMBAM · April 7, 2025, 4:19pm

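Absolutely. Structured output works the same way for audio and video input: pass the audio or video file with its MIME type and keep the same response_mime_type and response_schema settings.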
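For audio or video, the input-side change is the MIME type of the part; the JSON output configuration stays the same. Below is a small sketch, assuming inline base64 parts as in the JavaScript example, that guesses the MIME type from the file extension with Python's `mimetypes` module and rejects anything that is not image, audio or video. `media_part` is a hypothetical helper. For large recordings, the File API upload shown in the Python example (`client.files.upload`) is the usual route, since inline data counts against the request size limit.

```python
import base64
import mimetypes

# Top-level media types this sketch accepts (an assumption, not an exhaustive API list).
ALLOWED_TYPES = ("image/", "audio/", "video/")


def media_part(filename: str, data: bytes) -> dict:
    """Wrap raw bytes as an inline part, guessing the MIME type from the file name."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith(ALLOWED_TYPES):
        raise ValueError(f"unsupported or unknown media type for {filename!r}: {mime_type}")
    return {
        "inlineData": {
            "mimeType": mime_type,
            "data": base64.b64encode(data).decode("ascii"),
        }
    }


# Usage with placeholder bytes; in practice pass the file's real contents.
for name in ("clip.mp4", "voice.mp3", "photo.jpg"):
    part = media_part(name, b"placeholder")
    print(name, part["inlineData"]["mimeType"])
```

Guessing from the extension keeps the sketch short; if files arrive without reliable names, passing the MIME type explicitly (as `fileToGenerativePart` does) is the safer choice.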
