I am using the Google Generative AI package in my Flutter app to interact with Gemini models. I send messages with `_chat.sendMessage(Content.text(...))`. However, I have noticed an issue with the token usage metadata, specifically the prompt token count.
Issue Explanation:
When I send a message, the prompt token count should reflect only the tokens in my latest input. However, instead of counting just the new prompt tokens, it keeps accumulating previous prompt and candidate token counts.
Here’s an example of what happens:
- First message:
  - User: "Hi" → prompt token count: 1
  - Model response: "Hello, how are you?" → candidate token count: 5
- Second message:
  - User: "I am fine"
  - Expected prompt token count: 3 (since "I am fine" has 3 tokens)
  - Actual prompt token count: 9 (previous prompt: 1 + previous candidates: 5 + current prompt: 3)
This means the prompt token count is increasing incorrectly, as it includes tokens from previous messages instead of just the latest prompt.
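Here is a minimal sketch that reproduces this (model name and API key handling are simplified; the counts in the comments are the ones from my example above):

```dart
import 'package:google_generative_ai/google_generative_ai.dart';

Future<void> main() async {
  final model = GenerativeModel(
    model: 'gemini-1.5-flash', // the model I happen to use
    apiKey: 'YOUR_API_KEY',
  );
  final chat = model.startChat();

  // First turn.
  final first = await chat.sendMessage(Content.text('Hi'));
  print(first.usageMetadata?.promptTokenCount);     // 1
  print(first.usageMetadata?.candidatesTokenCount); // 5

  // Second turn: I would expect promptTokenCount to be 3 here,
  // but it comes back as 9 (1 + 5 + 3).
  final second = await chat.sendMessage(Content.text('I am fine'));
  print(second.usageMetadata?.promptTokenCount);    // 9
}
```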
Why is this a problem?
- Inefficient token usage: The prompt token count should reflect only the current input, not past messages.
- Increased costs: Since prompt tokens contribute to billing, this leads to unnecessary extra charges.
- Unintended context behavior: It seems like the entire conversation history is being resent automatically, which I do not want (a quick way to check this is shown below).
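On that last point: as far as I can tell, the `ChatSession` keeps the whole conversation and resends it on every call. A quick check, extending the sketch above (this assumes the session's `history` getter, which I believe the package exposes):

```dart
// If the history grows by two entries per turn (my message plus the
// model's reply), that would explain why the next prompt includes
// everything sent so far.
print(chat.history.length); // 2 after the first turn, 4 after the second
```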
Questions I Need Answers To:
- Why is the `_chat.sendMessage(Content.text())` method accumulating previous tokens?
- Is this expected behavior or a bug in the Google Generative AI package?
- How can I ensure that only the current prompt's tokens are counted?
- Is there a way to use context efficiently without resending previous messages? (I have sketched the only workaround I can think of below.)
- Will I be charged for the full accumulated prompt token count (9 in the example above), or only for the new prompt tokens (which should be 3)?
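To frame the last two questions: the only workaround I can think of is to skip the chat session and send single-turn requests, using `countTokens` to see what a message costs on its own. A sketch (this gives up the model-managed context, so it is only viable if I manage context myself):

```dart
import 'package:google_generative_ai/google_generative_ai.dart';

Future<void> main() async {
  final model = GenerativeModel(
    model: 'gemini-1.5-flash',
    apiKey: 'YOUR_API_KEY',
  );

  // Count the tokens of just this message, independent of any history.
  final count = await model.countTokens([Content.text('I am fine')]);
  print(count.totalTokens); // should be 3

  // Single-turn call: no ChatSession, so nothing from earlier turns
  // is resent, but the model also receives no earlier context.
  final response = await model.generateContent([Content.text('I am fine')]);
  print(response.usageMetadata?.promptTokenCount);
}
```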
Would appreciate any insights or workarounds!
Thanks in advance!