Token Count Differences between google-generativeai and OpenAI API for Gemini in Python

I’m testing Gemini via both the google-generativeai Python package and the OpenAI-compatible API for Gemini. I noticed that reported token usage differs between the two methods, and I’m trying to understand why.

Has anyone looked into how token counting is handled in these two implementations? I know the two snippets below send the image differently (a base64 data URL vs. a PIL image), but the gap still seems too large: with google-generativeai the token count is around 1,300, while with the OpenAI-compatible endpoint it is about 3,000!

This is the code using the openai package:

import os
import json
import tempfile
import static.prompts as p


def encode_image(image_path):
    import base64

    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')


def create_embeddings_for_image_description(s3_client, image_path, message_type):
    from openai import OpenAI

    try:
        op_client = OpenAI(
            api_key=os.environ.get("GOOGLE_API_KEY"),
            base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
        )

        with tempfile.NamedTemporaryFile(delete=False) as temp_file:
            temp_file_path = temp_file.name
            s3_client.download_file('mybucket', image_path, temp_file_path)

        try:
            base64_image = encode_image(temp_file_path)

            response = op_client.chat.completions.create(
                model="gemini-2.0-flash",
                messages=[
                    {
                        "role": "system",
                        "content": p.GENERATE_PERSPECTIVE_IMAGES
                    },
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "text": "What is in this image?",
                            },
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/jpeg;base64,{base64_image}"
                                },
                            },
                        ],
                    }
                ],
                temperature=1,
            )
            print(f"Response: {response}")
            return

        except Exception:
            raise
        finally:
            os.unlink(temp_file_path)
    except Exception:
        raise

This is the code using google-generativeai:

import os
import json
import tempfile
import static.prompts as p


def create_embeddings_for_image_description(s3_client, image_path, message_type):
    import PIL.Image
    import google.generativeai as genai
    from google.generativeai.types import GenerationConfig

    try:
        generation_config = GenerationConfig(
            temperature=1,
            top_p=0.95,
            top_k=40,
            max_output_tokens=8192,
            response_mime_type="application/json"
        )

        genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
        model = genai.GenerativeModel(
            model_name='gemini-1.5-flash',
            generation_config=generation_config,
            system_instruction=p.GENERATE_PERSPECTIVE_IMAGES
        )

        with tempfile.NamedTemporaryFile(delete=False) as temp_file:
            temp_file_path = temp_file.name
            s3_client.download_file('mybucket', image_path, temp_file_path)

        try:
            img = PIL.Image.open(temp_file_path)

            response = model.generate_content([message_type, img])
            response.resolve()

            print(f"Response: {response}")
            return

        except Exception:
            raise
        finally:
            os.unlink(temp_file_path)
    except Exception:
        raise

You are sending images as part of user content.

Each provider uses its own technique for converting images into tokens and for splitting large images into tiles, so the reported counts are naturally going to differ.
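Before comparing providers, it's worth confirming that both numbers come from the comparable field on each response object. A minimal sketch (the attribute names follow each SDK's documented response schema; the response objects themselves would come from the calls in your snippets):

```python
def prompt_tokens_openai(response):
    """Prompt token count from an OpenAI-style chat completion:
    reported under response.usage.prompt_tokens."""
    return response.usage.prompt_tokens


def prompt_tokens_genai(response):
    """Prompt token count from a google-generativeai response:
    reported under response.usage_metadata.prompt_token_count."""
    return response.usage_metadata.prompt_token_count
```

Comparing the prompt-side counts in isolation (rather than total tokens) also rules out the gap coming from differently sized generated outputs.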

OpenAI in particular downscales any image it receives so that the shorter dimension is at most 768 pixels.
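Given that, one way to make the token cost more predictable is to downscale client-side before base64-encoding, so any server-side resize never kicks in. A sketch assuming Pillow and the 768-pixel shorter-side limit mentioned above (the helper name and limit parameter are illustrative):

```python
import base64
import io

from PIL import Image


def downscale_for_tokens(image_path, max_short_side=768):
    """Resize so the shorter dimension is at most max_short_side pixels,
    then return the image as a base64-encoded JPEG string.
    max_short_side=768 is an assumption based on the limit noted above."""
    img = Image.open(image_path)
    short = min(img.size)
    if short > max_short_side:
        scale = max_short_side / short
        img = img.resize((round(img.width * scale), round(img.height * scale)))
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG")
    return base64.b64encode(buf.getvalue()).decode("utf-8")
```

The result can be dropped straight into the `data:image/jpeg;base64,…` URL in the openai snippet above.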


Thank you.
Also, I notice that google-genai (the new and recommended SDK) consumes a lot more tokens than google-generativeai (the code suggested in Google AI Studio uses the latter) for the same tasks. Do you know why?