Problem Description
I noticed a discrepancy between the total_token_count value returned by the API and the calculation I would expect from the usage metadata, particularly when using context caching (CreateCachedContent).
Expected Behavior
I would have expected total_token_count to be calculated as the sum of the non-cached prompt tokens (prompt_token_count - cached_content_token_count) and the generated tokens (candidates_token_count). This would lower the reported total for requests served from the cache, reflecting the cost savings.
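In terms of the usage_metadata fields, the expected relationship would be:

expected_total = (prompt_token_count - cached_content_token_count) + candidates_token_count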
Current Behavior
Instead, total_token_count appears to be calculated as the full sum of prompt_token_count and candidates_token_count, counting the cached portion in full. This is misleading from a cost perspective, as it does not reflect the benefit of the cache.
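That is, the value actually returned matches:

actual_total = prompt_token_count + candidates_token_count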
Reproduction Steps
1. Make a first API call with a large prompt.
2. Make a second call with the same prompt, ensuring the content is served from the cache.
3. Analyze the usage metadata from the second call (a minimal sketch of these steps is shown below).
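A minimal end-to-end sketch of the reproduction, assuming the google-genai Python SDK; the model name, TTL, and prompt contents here are placeholders:

from google import genai
from google.genai import types

client = genai.Client()  # assumes GOOGLE_API_KEY is set in the environment

# Placeholder large prompt; the cached content must exceed the model's
# minimum token threshold for caching.
large_prompt = "lorem ipsum " * 2000

# Step 1: cache the large shared content (CreateCachedContent).
cache = client.caches.create(
    model="gemini-1.5-flash-001",  # placeholder model name
    config=types.CreateCachedContentConfig(
        contents=[large_prompt],
        ttl="300s",
    ),
)

# Step 2: make a call that reuses the cached content.
response = client.models.generate_content(
    model="gemini-1.5-flash-001",
    contents=["Summarize the cached text."],
    config=types.GenerateContentConfig(cached_content=cache.name),
)

# Step 3: inspect the usage metadata from the cached call.
print(response.usage_metadata)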
Technical Details
API Call (cached):
response = self.client.models.generate_content(
    model=self.model,
    contents=[dynamic_prompt],
    config=types.GenerateContentConfig(
        cached_content=cache_id,
        **self.model_config,
    ),
)
Example of USAGE METADATA (without cache, i.e. with the cached_content=cache_id line omitted from the code above):
cache_tokens_details=None
cached_content_token_count=None
candidates_token_count=194
candidates_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=194)]
prompt_token_count=5202
prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=5202)]
thoughts_token_count=None
tool_use_prompt_token_count=None
tool_use_prompt_tokens_details=None
total_token_count=5396
traffic_type=None
Example of USAGE METADATA (with cache):
cache_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=4115)]
cached_content_token_count=4115
candidates_token_count=199
candidates_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=199)]
prompt_token_count=5332
prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=5332)]
thoughts_token_count=None
tool_use_prompt_token_count=None
tool_use_prompt_tokens_details=None
total_token_count=5531
traffic_type=None
Expected vs. Actual Count (for the cached call):
Expected Count: (5332 - 4115) + 199 = 1217 + 199 = 1416
Actual Count: 5332 + 199 = 5531
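The same arithmetic in Python, plugging in the numbers reported above:

# Values from the cached call's usage metadata.
prompt_token_count = 5332
cached_content_token_count = 4115
candidates_token_count = 199

expected_total = (prompt_token_count - cached_content_token_count) + candidates_token_count
actual_total = prompt_token_count + candidates_token_count

print(expected_total)  # 1416
print(actual_total)    # 5531 -- matches the reported total_token_count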
Impact
This behavior makes it difficult to accurately estimate the cost of API calls that use the cache, since the caching benefit is not visible in the usage data. This can lead to incorrect cost calculations and wasted resources.