Is there a library or an online demo site that shows, for each Gemini model, how a prompt is segmented into tokens and how many tokens it uses? Or is this on the development roadmap?
Background:
As a Korean speaker, I find token counting for Korean prompts relatively complex.
I am preparing a presentation for Build with AI, and I would like to create a sample that compares the tiktoken approach with Gemini's Korean tokenization. (The cost and performance characteristics of tokenization for each language matter a great deal to regional user communities.)
Thanks for your question. You can use the model.count_tokens method to measure the context length. Please see the Count tokens documentation to learn more.
If you are using Gemini on Vertex AI (that is, the google-cloud-aiplatform package), you can also get the total_billable_characters [ref]. This is useful because Gemini charges based on the number of characters, not tokens.
In summary: use count_tokens (or total_tokens on Vertex AI) to control the context size.
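Since billing is character-based, a rough cost estimate can be sketched without calling the API at all. The helper below assumes that billable characters exclude whitespace, which mirrors how total_billable_characters is commonly described; verify this against the current Vertex AI pricing documentation before relying on it:

```python
# Offline estimate of billable characters for a prompt.
# Assumption: whitespace is not billed (check current pricing docs).
def estimate_billable_characters(text: str) -> int:
    """Count characters in `text`, excluding whitespace."""
    return sum(1 for ch in text if not ch.isspace())

print(estimate_billable_characters("안녕하세요 Gemini"))  # 11 non-space characters
```

Note that Korean and English sentences of similar meaning can differ substantially in character count, which is what makes the per-language cost comparison interesting.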
Thank you both for your responses! To summarize: both the Gemini API and Vertex AI offer token-counting APIs, and billing is based on the number of characters, so separate token calculations can be omitted when estimating cost.
Now my interest turns to the response time of the Gemini API. Structurally, response time could be influenced by token count. To understand this influence, I will need to compare response times for equivalent sentences in Korean and English. (Since this topic is not related to this post, I will open a new post for any further questions on the matter.)
Thank you for your information.
I noticed that model.count_tokens takes some time to finish. Is it possible to get the input and output token counts after obtaining the model's prediction, i.e., response = model.generate_content(img)?
It would be inconvenient to keep tracking the current token count before and after that line.
The API specification for Candidate (Candidate | Google AI for Developers) includes the field tokenCount, of type integer. This field is not marked optional, so the API as specified should tell the client how many tokens the model's reply represents.
The current v1beta implementation does not populate this field. Since that is a difference between the specification and the implementation, it is a bug; according to at least one Google engineer, it is a known bug. Tagging @Josh_Gordon_Google, who can likely confirm.
Hopefully the bug will be addressed before GA. As it stands, to properly keep track of tokens, a client has to send the last response back to the server and issue a countTokens operation to find out how many tokens were added to the history. Clients have started implementing wonderfully imaginative workarounds to avoid having the extra server traffic count against their rate quota.
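The round-trip workaround described above can be sketched as a small helper. The count_tokens and generate_content calls are from the google-generativeai package; the model name and prompt in the usage comment are illustrative:

```python
# Workaround sketch: since candidate.tokenCount is not populated in
# v1beta, re-submit the reply text to count_tokens to learn how many
# tokens the reply added to the history (costs one extra server call).
def tokens_in_reply(model, response) -> int:
    """Return the token count of a generate_content reply by calling
    count_tokens on the reply text."""
    return model.count_tokens(response.text).total_tokens

# Usage (requires google-generativeai and a configured API key):
# import google.generativeai as genai
# genai.configure(api_key="YOUR_API_KEY")
# model = genai.GenerativeModel("gemini-pro")
# response = model.generate_content("Explain tokenization in one sentence.")
# print(tokens_in_reply(model, response))
```

Once the bug is fixed, the same number should be readable directly from the candidate's tokenCount field without the extra request.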