Historically, with multi-turn conversations (and with other providers), I've passed the entire conversation history (all assistant, user, and tool outputs) back to the model when starting a new turn.
With Gemini, there are thought signatures (https://ai.google.dev/gemini-api/docs/thought-signatures), which suggest that some amount of memory and reasoning context is retained inside these opaque tokens.
I've noticed that I can omit the tool outputs from the history and Gemini can still answer questions about data returned in prior turns, presumably by referencing the thought signatures I sent back (although it does seem slower when doing so).
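For context, here's a minimal sketch of what I mean by trimming the history. It uses plain dicts in the REST-style content format; the field names (`role`, `parts`, `thoughtSignature`, `functionResponse`) follow the public docs, but the helper itself and the sample values are just mine for illustration, not SDK code:

```python
def strip_tool_outputs(history):
    """Drop functionResponse parts (tool outputs) from prior turns,
    keeping text/functionCall parts and any attached thought signatures."""
    trimmed = []
    for content in history:
        parts = [p for p in content["parts"] if "functionResponse" not in p]
        if parts:  # skip turns that contained only tool outputs
            trimmed.append({"role": content["role"], "parts": parts})
    return trimmed

# Hypothetical three-turn history: user question, model function call
# (carrying a thought signature), then the tool's response.
history = [
    {"role": "user", "parts": [{"text": "What's the weather in Paris?"}]},
    {"role": "model", "parts": [
        {"functionCall": {"name": "get_weather", "args": {"city": "Paris"}},
         "thoughtSignature": "opaque-token-from-api"},
    ]},
    {"role": "user", "parts": [
        {"functionResponse": {"name": "get_weather",
                              "response": {"temp_c": 18}}},
    ]},
]

trimmed = strip_tool_outputs(history)
print(len(trimmed))                                   # tool-output turn dropped
print("thoughtSignature" in trimmed[1]["parts"][0])   # signature preserved
```

So the signature on the model's function-call part is sent back verbatim, but the raw tool payload is not.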
Can anyone from the Google team recommend the correct approach here? Should I rely on thought signatures plus a memory tool, so the model can read from memory when it decides it needs to?
Are there any undocumented tradeoffs to relying on thought signatures this way?
Thanks!