Token Count Differences between google-generativeai and OpenAI API for Gemini in Python

I’m testing Gemini via both the google-generativeai Python package and the OpenAI-compatible API for Gemini. I noticed that reported token usage differs between the two methods, and I’m trying to understand why.

Has anyone looked into how token counting is handled in these two implementations? I know the two snippets below send the image differently (a base64 data URL vs. a PIL image), but the gap still seems too large: with google-generativeai the token count is around 1,300, while with the OpenAI-compatible endpoint it is about 3,000!

This is the code using the openai package:

import os
import json
import tempfile
import static.prompts as p


def encode_image(image_path):
    import base64

    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')


def create_embeddings_for_image_description(s3_client, image_path, message_type):
    from openai import OpenAI

    try:
        op_client = OpenAI(
            api_key=os.environ.get("GOOGLE_API_KEY"),
            base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
        )

        with tempfile.NamedTemporaryFile(delete=False) as temp_file:
            temp_file_path = temp_file.name
            s3_client.download_file('mybucket', image_path, temp_file_path)

        try:
            base64_image = encode_image(temp_file_path)

            response = op_client.chat.completions.create(
                model="gemini-2.0-flash",
                messages=[
                    {
                        "role": "system",
                        "content": p.GENERATE_PERSPECTIVE_IMAGES
                    },
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "text": "What is in this image?",
                            },
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/jpeg;base64,{base64_image}"
                                },
                            },
                        ],
                    }
                ],
                temperature=1,
            )
            print(f"Response: {response}")
            return

        except Exception:
            raise
        finally:
            os.unlink(temp_file_path)
    except Exception:
        raise

This is the code using google-generativeai:

import os
import json
import tempfile
import static.prompts as p


def create_embeddings_for_image_description(s3_client, image_path, message_type):
    import PIL.Image
    import google.generativeai as genai
    from google.generativeai.types import GenerationConfig

    try:
        generation_config = GenerationConfig(
            temperature=1,
            top_p=0.95,
            top_k=40,
            max_output_tokens=8192,
            response_mime_type="application/json"
        )

        genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
        model = genai.GenerativeModel(
            model_name='gemini-1.5-flash',
            generation_config=generation_config,
            system_instruction=p.GENERATE_PERSPECTIVE_IMAGES
        )

        with tempfile.NamedTemporaryFile(delete=False) as temp_file:
            temp_file_path = temp_file.name
            s3_client.download_file('mybucket', image_path, temp_file_path)

        try:
            img = PIL.Image.open(temp_file_path)

            response = model.generate_content([message_type, img])
            response.resolve()

            print(f"Response: {response}")
            return

        except Exception:
            raise
        finally:
            os.unlink(temp_file_path)
    except Exception:
        raise

You are sending images as part of user content.

Each provider uses its own technique for converting images into tokens and for splitting large images into tiles, so the reported counts are naturally going to differ.
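Before comparing providers, it's worth confirming that both numbers come from the comparable field on each response object. A minimal sketch (the attribute names follow each SDK's documented response schema; the response objects themselves would come from the calls in your snippets):

```python
def prompt_tokens_openai(response):
    """Prompt token count from an OpenAI-style chat completion:
    reported under response.usage.prompt_tokens."""
    return response.usage.prompt_tokens


def prompt_tokens_genai(response):
    """Prompt token count from a google-generativeai response:
    reported under response.usage_metadata.prompt_token_count."""
    return response.usage_metadata.prompt_token_count
```

Comparing the prompt-side counts in isolation (rather than total tokens) also rules out the gap coming from differently sized generated outputs.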

OpenAI in particular downscales any image it receives so that the shorter dimension is at most 768 pixels.
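Given that, one way to make the token cost more predictable is to downscale client-side before base64-encoding, so any server-side resize never kicks in. A sketch assuming Pillow and the 768-pixel shorter-side limit mentioned above (the helper name and limit parameter are illustrative):

```python
import base64
import io

from PIL import Image


def downscale_for_tokens(image_path, max_short_side=768):
    """Resize so the shorter dimension is at most max_short_side pixels,
    then return the image as a base64-encoded JPEG string.
    max_short_side=768 is an assumption based on the limit noted above."""
    img = Image.open(image_path)
    short = min(img.size)
    if short > max_short_side:
        scale = max_short_side / short
        img = img.resize((round(img.width * scale), round(img.height * scale)))
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG")
    return base64.b64encode(buf.getvalue()).decode("utf-8")
```

The result can be dropped straight into the `data:image/jpeg;base64,…` URL in the openai snippet above.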


Thank you.
Also, I notice that google-genai (the new and recommended SDK) consumes a lot more tokens than google-generativeai (the code suggested in Google AI Studio uses the latter) for the same tasks. Do you know why?