Hi @Volodya_Bochek,
A “persistent session” usually refers to implicit caching, which happens automatically, whereas passing content once and referring to it later is explicit caching, which you set up manually.
Explicit caching is also distinct from a chat session: explicit caching references stored content through a cached_content ID, while a chat session carries the conversation history along with each request.
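To make that distinction concrete, here is a rough toy model in plain Python. It does not call the Gemini API: `tokens`, `session_request_sizes`, `cached_request_sizes`, and the `"cache-1"` ID are hypothetical names invented for illustration, and word counts stand in for real token counts.

```python
# Toy model only: none of these names belong to the Gemini SDK.

def tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def session_request_sizes(big_context: str, questions: list[str]) -> list[int]:
    """Chat-session style: the whole history, including the big context,
    travels with every request."""
    history = [big_context]
    sizes = []
    for question in questions:
        history.append(question)
        sizes.append(sum(tokens(message) for message in history))
        history.append("answer")  # placeholder model reply (1 "token")
    return sizes


def cached_request_sizes(big_context: str, questions: list[str]) -> list[int]:
    """Explicit-caching style: the context is stored once and each request
    carries only a reference to it plus the new question."""
    store = {"cache-1": big_context}  # hypothetical server-side cache
    sizes = []
    for question in questions:
        assert "cache-1" in store  # the request only names the cache entry
        sizes.append(tokens(question))
    return sizes


context = "word " * 1000
questions = ["What is X?", "And Y?"]
print(session_request_sizes(context, questions))
print(cached_request_sizes(context, questions))
```

In the toy session every request grows with the history, while in the cached variant each request stays the size of the new question. With the real API, cached tokens still count toward the prompt size; the saving is in not resending and re-processing them at full cost.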
With the Gemini API’s explicit caching feature, you pass large content to the model once to create a cached object. Subsequent requests then reference that cached object by name, so the model reuses the stored tokens instead of receiving and re-processing the full content each time.
For example:
from google import genai
from google.genai import types
client = genai.Client(api_key="YOUR_API_KEY")
# Explicit caching on Gemini 1.5 models expects a fixed version suffix such as -001
model_name = "gemini-1.5-flash-001"
# 1. Create the cache (Pass content once)
cache = client.caches.create(
    model=model_name,
    config=types.CreateCachedContentConfig(
        display_name="system_instruction_cache",
        system_instruction="INSERT_YOUR_LARGE_TEXT_CONTENT_HERE",
        ttl="3600s",  # Cache lives for 1 hour (ttl is a duration string in seconds)
    ),
)
# 2. Use the cache (Refer to it in subsequent requests)
response = client.models.generate_content(
    model=model_name,
    contents="Based on the cached instructions, please analyze this.",
    config=types.GenerateContentConfig(
        cached_content=cache.name  # This refers to the stored tokens
    ),
)
print(f"Tokens reused from cache: {response.usage_metadata.cached_content_token_count}")
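If you want to check how much of a prompt was actually served from the cache, a small helper like the one below may be handy. This is only a sketch: `cache_hit_ratio` is a hypothetical name, and it assumes that `prompt_token_count` includes the cached tokens and that `cached_content_token_count` can be `None` when nothing was reused.

```python
from typing import Optional


def cache_hit_ratio(prompt_tokens: Optional[int], cached_tokens: Optional[int]) -> float:
    """Return the fraction of prompt tokens served from the cache.

    Falls back to 0.0 when the prompt size is missing or zero, and treats a
    missing cached count as zero reused tokens.
    """
    if not prompt_tokens:
        return 0.0
    return (cached_tokens or 0) / prompt_tokens


# Plain numbers are used here so the snippet runs without an API key.
print(f"Cache hit ratio: {cache_hit_ratio(1000, 750):.0%}")
```

With a real response you would pass `response.usage_metadata.prompt_token_count` and `response.usage_metadata.cached_content_token_count` as the two arguments.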