The input TPM limit on the models (except 2.0 Flash, which is 1M) is 250K on the free tier.
Does that mean that context length via the API on the free tier can never exceed 250K? Does the history in a request count towards the TPM quota?
Further testing showed that I can go over 250K context without triggering the input TPM limit.
How is the TPM limit calculated, then? I'm really confused.
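To make the history part of the question concrete: in a multi-turn chat the full history is re-sent with every request, so if history counted towards TPM, the prompt token count should grow each turn. A minimal sketch of how one could check (the API key and model name are placeholders):

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Each send_message() re-sends the prior turns as part of the request.
chat = client.chats.create(model="gemini-2.5-flash")

for turn in ["First message.", "Second message.", "Third message."]:
    response = chat.send_message(turn)
    # prompt_token_count covers everything sent, including earlier turns,
    # so it should climb from turn to turn if history is counted.
    print(turn, "->", response.usage_metadata.prompt_token_count)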
Hello,
As you mentioned, the free tier limit is 250K tokens per minute, as specified in the rate limit documentation. You should get an error once this limit is exceeded.
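For reference, the limit surfaces as an API error on the request; a minimal sketch of catching it (assuming the SDK's error classes; the key, model, and contents are placeholders):

from google import genai
from google.genai import errors

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

try:
    response = client.models.generate_content(
        model="gemini-2.5-flash", contents=["..."]
    )
except errors.ClientError as e:
    # e.code would be 429 if a rate limit such as TPM was hit.
    print(e.code, e.message)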
Yes, thanks, I’ve read the documentation.
However, I can process well over 250K tokens with the 2.5 models on the free key (which, according to the docs, is limited to 250K TPM) in a single request.
Clearly, that means that either the 250K TPM limit is not enforced as documented, or TPM is calculated differently than I assume.
The screenshot shows that the video (the museum example from AI Studio) well exceeds 250K tokens and was prompted in a single request.
Would you mind using the token counting methods below to compare the results? I see you’re already using method 2.
Method 1:
client.models.count_tokens(
    model="model_name", contents=[your_content]
)

Method 2:
response = client.models.generate_content(
    model="model_name", contents=[your_content]
)
print(response.usage_metadata)
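For example, both methods can be run on the same contents and compared side by side (a sketch; the key, model name, and prompt are placeholders):

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key
contents = ["Explain tokenization in one sentence."]

# Method 1: count tokens without generating anything.
count = client.models.count_tokens(model="gemini-2.5-flash", contents=contents)
print(count.total_tokens)

# Method 2: generate, then read the token usage reported by the API.
response = client.models.generate_content(model="gemini-2.5-flash", contents=contents)
print(response.usage_metadata.prompt_token_count)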
Sure.
Just as before, the 30-minute example video from AI Studio was used (American Museum of Natural History Tour - 30m - Google for Developers (360p, h264).mp4).
Explore a natural history museum: towering dinosaur skeletons, detailed animal dioramas, and diverse exhibits on evolution and geology.
Usage metadata:
cache_tokens_details=None
cached_content_token_count=None
candidates_token_count=23
candidates_tokens_details=None
prompt_token_count=531016
prompt_tokens_details=[
    ModalityTokenCount(modality=<MediaModality.TEXT: 'TEXT'>, token_count=16),
    ModalityTokenCount(modality=<MediaModality.VIDEO: 'VIDEO'>, token_count=473400),
    ModalityTokenCount(modality=<MediaModality.AUDIO: 'AUDIO'>, token_count=57600)
]
thoughts_token_count=304
tool_use_prompt_token_count=None
tool_use_prompt_tokens_details=None
total_token_count=531343
traffic_type=None
Token counting: total_tokens=531016 cached_content_token_count=None
Code used:
# Run in a Colab notebook.
# %pip install -U -q "google-genai>=1.16.0"
# !wget [link truncated] -O huge1.mp4 -q
from google import genai
from IPython.display import Markdown
import time
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
MODEL_ID = "gemini-2.5-flash"
client = genai.Client(api_key=GOOGLE_API_KEY)

def upload_video(video_file_name):
    # Upload via the Files API, then poll until processing finishes.
    video_file = client.files.upload(file=video_file_name)
    while video_file.state == "PROCESSING":
        print('Waiting for video to be processed.')
        time.sleep(10)
        video_file = client.files.get(name=video_file.name)
    if video_file.state == "FAILED":
        raise ValueError(video_file.state)
    print('Video processing complete: ' + video_file.uri)
    return video_file

huge_video = upload_video('huge1.mp4')

prompt = "Summarize this video, be short, up to 20 words."
video = huge_video
response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        video,
        prompt,
    ]
)
tokens = client.models.count_tokens(model=MODEL_ID, contents=[video, prompt])
Markdown(f"{response.text}\n\nUsage metadata: {response.usage_metadata}\n\nToken counting: {tokens}")
Hello,
We reproduced your code and had the same observations as you. We will discuss this with our internal team and get back to you with more information.
Thank you for your patience.