It seems the current OpenAI-compatible API does not support explicit caching. What about implicit caching?
Hi @jy_z, welcome to the forum.
Currently, implicit context caching is not supported in the OpenAI-compatible API, though you can submit it as a feature request.
Thanks
I just tested it with the following bash script:
#!/bin/bash
# --- Configuration ---
API_ENDPOINT="https://generativelanguage.googleapis.com/v1beta/openai/chat/completions" # Gemini OpenAI-compatible chat completions endpoint
API_KEY="Your API Key"                      # Replace with your actual API key
FILE_PATH="report.md"                       # Replace with the path to your large text file
MODEL_NAME="gemini-2.5-flash-preview-04-17" # Replace with the Gemini 2.5 model supported by your endpoint

# --- Construct the JSON payload with jq and pipe it directly to curl ---
# --rawfile reads the file content and embeds it as a string within the JSON
jq -n \
  --arg model "$MODEL_NAME" \
  --arg user "$1" \
  --rawfile file_content "$FILE_PATH" \
  '{
    "model": $model,
    "messages": [
      {
        "role": "user",
        "content": $file_content
      },
      {
        "role": "user",
        "content": $user
      }
    ],
    "max_tokens": 4096
  }' | curl -X POST "$API_ENDPOINT" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $API_KEY" \
    -d @-
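For anyone reproducing this, here is roughly how I invoke it (the filename ask_report.sh and the questions are just examples). Since implicit caching matches a repeated prompt prefix, and the large file is sent as the first message, it is the second and later calls that should report a nonzero cached token count:

# Minimal usage sketch; ask_report.sh is an assumed name for the script above.
chmod +x ask_report.sh
./ask_report.sh "Summarize the report in one paragraph."
# A second call with the same file prefix is where a cache hit is expected:
./ask_report.sh "What are the main risks identified in the report?"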
The following output from the script indicates that implicit caching does work (note the cached_tokens field under prompt_tokens_details in usage):
{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Based on the analysis, the report concludes that neither full-time nor part-time entrepreneurship is advisable at this time due to high risks and unfavorable conditions, recommending waiting for a better opportunity.","role":"assistant"}}],"created":1747123256,"model":"gemini-2.5-flash-preview-04-17","object":"chat.completion","usage":{"completion_tokens":44,"prompt_tokens":116482,"prompt_tokens_details":{"cached_tokens":114667},"total_tokens":117488}}
Hey @limcheekin, thank you for confirming. By the way, did you try the 2.5 Pro model? It sometimes seems to have issues with implicit context caching and can be inconsistent.