Structured output from the API using ResponseSchema: am I doing it correctly?

Hi guys,

I’m new to coding and need to use Gemini for my work. I’m trying to get JSON output in which the position is only ever CMO, COO, CEO, or CFO. Is this code good enough, or are there ways I can improve it?

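import enum
import time

import google.generativeai as genai
import pandas as pd
from pydantic import BaseModel

# configure() sets the API key globally and returns None, so there is no client object to keep
genai.configure(api_key='xxxxxx')
model = genai.GenerativeModel("gemini-1.5-pro-latest")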

# Define the status options
class Status(enum.Enum):
    PROMOTION = "Promotion"
    DEMOTION = "Demotion"
    NO_CHANGE = "No Change"
    LATERAL_MOVE = "Lateral Move"

# Define the Pydantic model for structured output
class JobChange(BaseModel):
    Status: Status
    Position: str | None  # Only CMO, CFO, COO, or CEO for promotions, otherwise null

# Define the function to query Gemini
def analyze_job_change_with_limit(prev, curr):
    if pd.isna(prev) or pd.isna(curr):
        return {"Status": "No Change", "Position": None}
    if prev == curr:
        return {"Status": "No Change", "Position": None}

    prompt = (
        "Detect hierarchical changes between two job titles. Identify if the change is a promotion, demotion, lateral move, or no change. "
        "For promotions, return only 'CMO', 'CFO', 'COO', or 'CEO' in the 'Position' field. Otherwise, return null.\n\n"
        f"Previous Title: {prev}\n"
        f"Current Title: {curr}"
    )

    response = model.generate_content(
        prompt,
        generation_config=genai.GenerationConfig(
            response_mime_type="application/json",
            response_schema=JobChange,
        ),
    )

    # Enforce rate limiting to comply with 15 RPM
    time.sleep(4)

    # response.text is a JSON string, whereas the early exits above return dicts
    return response.text

# Example usage:
result = analyze_job_change_with_limit("Vice President of Marketing and President of Operations", "Chief Operations Officer")
print(result)
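Since `Position` is declared as a plain string, nothing in the schema itself limits it to the four titles; only the prompt does. One way to check the returned text after the fact is sketched below, using only the standard library. The `Position` enum and the `parse_job_change` helper are hypothetical names that are not part of the original code, and the sample strings stand in for `response.text`.

```python
import enum
import json


class Status(enum.Enum):
    PROMOTION = "Promotion"
    DEMOTION = "Demotion"
    NO_CHANGE = "No Change"
    LATERAL_MOVE = "Lateral Move"


class Position(enum.Enum):  # hypothetical: not in the original code
    CMO = "CMO"
    CFO = "CFO"
    COO = "COO"
    CEO = "CEO"


def parse_job_change(raw: str) -> dict:
    """Parse the JSON text and reject values outside the two enums."""
    data = json.loads(raw)
    status = Status(data["Status"])  # ValueError on an unknown status
    raw_position = data.get("Position")
    if status is not Status.PROMOTION or raw_position is None:
        # The prompt asks for null outside promotions, so drop anything else.
        position = None
    else:
        position = Position(raw_position).value  # ValueError on e.g. "VP"
    return {"Status": status.value, "Position": position}


# Sample string standing in for response.text
print(parse_job_change('{"Status": "Promotion", "Position": "COO"}'))
```

A `ValueError` here means the reply did not follow the prompt, which is a signal to retry or log that row rather than store it.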

Hi @UnlockMatrix, welcome to the forum.

Your code looks fine. You can comment out the lines of code below; your code will handle that automatically.

Thanks a lot for your response!

Also, I have a few questions. I need to run this over 41,000 rows, with around 100 input tokens and around 30 output tokens per row.

Total tokens used: 5,330,000

Input cost: 0.41 USD (for 4,100,000 tokens at $0.10 per 1M)
Output cost: 0.492 USD (for 1,230,000 tokens at $0.40 per 1M)

Will the total cost really be only 0.902 USD, or am I making a mistake?
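The arithmetic above can be reproduced with a short script. This is only a sketch: the per-million rates are the ones quoted in this post and may not match the model used in the code, and `estimate_run` is a hypothetical helper name. It also adds the run time implied by the 4-second sleep per call.

```python
def estimate_run(rows, in_tokens, out_tokens, in_price_per_m, out_price_per_m, seconds_per_call):
    """Return (input_cost, output_cost, total_cost, hours) for one API call per row."""
    input_cost = rows * in_tokens / 1_000_000 * in_price_per_m
    output_cost = rows * out_tokens / 1_000_000 * out_price_per_m
    hours = rows * seconds_per_call / 3600
    return input_cost, output_cost, input_cost + output_cost, hours


# Figures from the post: 41,000 rows, 100 input and 30 output tokens per row,
# $0.10 and $0.40 per 1M tokens, one call every 4 seconds.
input_cost, output_cost, total_cost, hours = estimate_run(41_000, 100, 30, 0.10, 0.40, 4)
print(f"input ${input_cost:.3f}, output ${output_cost:.3f}, total ${total_cost:.3f}")
print(f"run time at one call per 4 s: {hours:.1f} hours")
```

At these rates the total is about 0.90 USD, but the sleep alone adds up to roughly 45.6 hours for 41,000 sequential calls.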

Second thing:
Would you recommend finding a way to reduce the number of API calls, or is it fine given the cost?

This seems to be correct. For exact pricing for each model, you can follow this link.
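On the question of reducing API calls: if many rows share the same pair of titles (an assumption about the data, not something stated in the thread), caching results per unique pair avoids repeat requests. A minimal sketch follows, where `classify_titles` is a hypothetical stand-in for the real Gemini request and `calls` only counts how often it runs.

```python
from functools import lru_cache

calls = 0  # counts how many times the (stand-in) API is actually hit


def classify_titles(prev: str, curr: str) -> str:
    """Hypothetical stand-in for the function that sends the Gemini request."""
    global calls
    calls += 1
    return "No Change" if prev == curr else "Changed"


@lru_cache(maxsize=None)
def classify_cached(prev: str, curr: str) -> str:
    # Each distinct (prev, curr) pair reaches classify_titles only once.
    return classify_titles(prev, curr)


rows = [
    ("VP Marketing", "CMO"),
    ("VP Marketing", "CMO"),
    ("Analyst", "Analyst"),
    ("VP Marketing", "CMO"),
]
results = [classify_cached(prev, curr) for prev, curr in rows]
print(len(results), "rows,", calls, "API calls")
```

Because the sleep sits inside the request function in the original code, cached hits would skip the 4-second wait as well, so the saving is mainly in run time rather than in cost.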