Upload token count and input token count are not the same

When I upload a PDF file (~33k tokens per count_tokens()), generate_content() fails with an INVALID_ARGUMENT error claiming the input token count exceeds the model limit (~1.2M tokens).
It seems the SDK is serializing or expanding the uploaded file differently between count_tokens() and generate_content().
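For reference, a minimal script along the lines of the one in the traceback might look like this (the model name and log output are from above; the file name, prompt, and exact flow are assumptions):

```python
from google import genai

client = genai.Client()  # reads the API key from the environment

# Upload the PDF via the Files API
uploaded_file = client.files.upload(file="report.pdf")

# count_tokens() reports ~33k tokens for the uploaded file...
count = client.models.count_tokens(
    model="gemini-2.5-flash-lite",
    contents=[uploaded_file],
)
print("Uploaded file tokens:", count)

# ...but generate_content() rejects the same contents with INVALID_ARGUMENT
response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents=[uploaded_file, "Summarize this document."],
)
print(response.text)
```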

2025-10-25 00:09:21,699 - httpx - INFO - HTTP Request: POST https://generativelanguage.googleapis.com/upload/v1beta/files "HTTP/1.1 200 OK"
2025-10-25 00:09:24,932 - httpx - INFO - HTTP Request: POST "HTTP/1.1 200 OK"
2025-10-25 00:09:27,655 - httpx - INFO - HTTP Request: POST "HTTP/1.1 200 OK"
Uploaded file tokens: total_tokens=33541 cached_content_token_count=None
2025-10-25 00:09:27,658 - google_genai.models - INFO - AFC is enabled with max remote calls: 10.
2025-10-25 00:10:03,628 - httpx - INFO - HTTP Request: POST "HTTP/1.1 400 Bad Request"
Traceback (most recent call last):
  File "/Users/anshulkumar/backfin/tet.py", line 22, in <module>
    response = client.models.generate_content(model="gemini-2.5-flash-lite",
                                              contents=[uploaded_files, prompt])
  File "/Users/anshulkumar/backfin/.venv/lib/python3.13/site-packages/google/genai/models.py", line 5202, in generate_content
    response = self._generate_content(
        model=model, contents=contents, config=config
    )
  File "/Users/anshulkumar/backfin/.venv/lib/python3.13/site-packages/google/genai/models.py", line 4178, in _generate_content
    response_dict = self._api_client.request(
        'post', path, request_dict, http_options
    )
  File "/Users/anshulkumar/backfin/.venv/lib/python3.13/site-packages/google/genai/_api_client.py", line 755, in request
    response = self._request(http_request, stream=False)
  File "/Users/anshulkumar/backfin/.venv/lib/python3.13/site-packages/google/genai/_api_client.py", line 684, in _request
    errors.APIError.raise_for_response(response)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/Users/anshulkumar/backfin/.venv/lib/python3.13/site-packages/google/genai/errors.py", line 101, in raise_for_response
    raise ClientError(status_code, response_json, response)
google.genai.errors.ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'The input token count exceeds the maximum number of tokens allowed 1237083.', 'status': 'INVALID_ARGUMENT'}}


Hello! Welcome to the forum!

The two APIs handle PDFs differently. generate_content() processes the full text, structure, and layout of the PDF via native vision, which adds a large number of tokens on top of the raw text. count_tokens() doesn't appear to account for that structural overhead, hence the much lower estimate.
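As a back-of-the-envelope check, the docs state that each PDF page costs a fixed number of tokens when processed as a page image (258 tokens per page at standard resolution), so for long documents the vision cost alone can dwarf the text token count:

```python
# Rough estimate of vision-token cost for a PDF, assuming the
# documented rate of 258 tokens per page at standard resolution.
TOKENS_PER_PAGE = 258

def estimated_page_tokens(num_pages: int) -> int:
    """Token cost of rendering each PDF page as an image."""
    return num_pages * TOKENS_PER_PAGE

print(estimated_page_tokens(100))   # 100 pages -> 25,800 tokens
print(estimated_page_tokens(5000))  # 5,000 pages -> ~1.29M tokens, over the limit
```

This is only an illustration of why the two counts can diverge, not the exact accounting the API performs.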

For context, here are the official docs:

Understand and Count Tokens: confirms that count_tokens() gives an estimate, while generate_content() reports actual consumption including all overhead.

Document Understanding guide: details how PDFs are parsed via native vision to extract structure and layout.

Gemini API developer guide: explains how parameters like media_resolution affect token usage for capturing fine details.
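If page-image detail is driving the cost, lowering media_resolution in the request config may reduce the per-page token count. A sketch, assuming the google-genai SDK's types.GenerateContentConfig and types.MediaResolution (lower resolution trades off fine-detail extraction, so verify output quality for your documents):

```python
from google import genai
from google.genai import types

client = genai.Client()

# Request lower media resolution to cut per-page token cost.
config = types.GenerateContentConfig(
    media_resolution=types.MediaResolution.MEDIA_RESOLUTION_LOW,
)

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents=[uploaded_file, prompt],  # uploaded_file/prompt as in the original script
    config=config,
)
```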

Hope this helps!