We are working on optimizing costs for our voice agent utilizing gemini-2.5-flash via the livekit.plugins.google plugin. Our system prompts are large (5,000+ tokens), so we are leveraging Gemini’s Explicit Context Caching by passing a pre-warmed cached_content ID.
However, because this is an inbound voice bot, every single phone call contains dynamic runtime variables unique to that user session (e.g., customer_name, account_balance, loan_eligibility).
If we bake these variables into the cached content, we would need a separate cache for every call, which defeats the purpose of caching, and we risk the model attributing one caller's values to another. To avoid this, we tried passing the dynamic variables in the system_instruction field at agent initialization, alongside the cached_content ID, expecting the two to be merged.
Instead, the plugin drops the system_instruction parameter entirely and logs this warning:
{
"message": "dropping ['system_instruction'] from Gemini request because cached_content='projects/225719900046/locations/asia-south1/cachedContents/119712627008995328' is set; these fields must be baked into the CachedContent resource",
"level": "WARNING",
"name": "livekit.plugins.google"
}
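The behaviour this warning describes can be modelled in a few lines. This is only an illustrative sketch, not the plugin's actual code: `strip_conflicting_fields` and the dict-based request are hypothetical, and only `system_instruction` is listed because it is the only field named in the warning.

```python
import logging

logger = logging.getLogger("sketch")

# Hypothetical: the warning only names system_instruction.
CONFLICTING_FIELDS = ("system_instruction",)


def strip_conflicting_fields(request: dict) -> dict:
    """Return a copy of the request without fields that clash with cached_content."""
    cached = request.get("cached_content")
    if not cached:
        # No cache reference, so nothing conflicts and nothing is dropped.
        return dict(request)
    dropped = [f for f in CONFLICTING_FIELDS if f in request]
    if dropped:
        logger.warning("dropping %s because cached_content=%r is set", dropped, cached)
    return {k: v for k, v in request.items() if k not in dropped}
```

Under this model, anything placed in system_instruction at runtime never reaches the model once a cache ID is present, which matches what we observe.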
Questions:
- Is it a strict limitation of the Gemini API or the LiveKit integration that prevents passing runtime-appended system_instruction rules on top of an explicit cached_content resource?
- What is the recommended LiveKit pattern to utilize explicit context caching for the static instruction layout while still declaring dynamic session metadata safely on a per-job basis?
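
To make the second question concrete, here is a minimal sketch of one shape this could take: the static prompt stays in the CachedContent resource, and the per-call values are rendered into a short context block that would be sent as an ordinary conversation turn rather than as system_instruction. The helper `render_session_context` and the example values are hypothetical, and whether the plugin offers a supported way to inject such a turn per job is an assumption, not something confirmed here.

```python
import json


def render_session_context(session_vars: dict) -> str:
    """Render per-call variables as a compact, clearly delimited text block."""
    payload = json.dumps(session_vars, sort_keys=True, ensure_ascii=False)
    return (
        "Session context for this call (do not read aloud verbatim):\n"
        f"<session>{payload}</session>"
    )


# Example values only; real values would come from the job metadata.
context_turn = render_session_context(
    {"customer_name": "Asha", "account_balance": 1250.0, "loan_eligibility": True}
)
```

If something like this is viable, the cache ID would stay identical across callers while the variable part travels with each call.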