Does the OpenAI-compatible format support implicit caching?

As far as I can tell, the OpenAI-compatible format does not currently support explicit caching. What about implicit caching?

Hi @jy_z, welcome to the forum.

Implicit context caching is not currently supported in the OpenAI-compatible API, though you can submit it as a feature request.

Thanks

I just tested this with the following bash script:

#!/bin/bash
# Sends a large file plus a question to Gemini through the OpenAI-compatible endpoint.
# Usage: pass the question about the file as the first argument.

# --- Configuration ---
API_ENDPOINT="https://generativelanguage.googleapis.com/v1beta/openai/chat/completions"  # Gemini OpenAI-compatible endpoint
API_KEY="Your API Key"                       # Replace with your actual API key
FILE_PATH="report.md"                        # Replace with the path to your large text file
MODEL_NAME="gemini-2.5-flash-preview-04-17"  # Replace with the Gemini 2.5 model you want to test

# --- Construct the JSON payload with jq and pipe it directly to curl ---
# --rawfile reads the file content and embeds it as a JSON string.
# The file is sent as the first message and the question ($1) as the second.
jq -n \
  --arg model "$MODEL_NAME" \
  --arg user "$1" \
  --rawfile file_content "$FILE_PATH" \
  '{
    "model": $model,
    "messages": [
      {
        "role": "user",
        "content": $file_content
      },
      {
        "role": "user",
        "content": $user
      }
    ],
    "max_tokens": 4096
  }' | curl -X POST "$API_ENDPOINT" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer $API_KEY" \
     -d @-

The output below suggests that it is supported: the usage block reports 114667 cached_tokens out of 116482 prompt_tokens.

{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Based on the analysis, the report concludes that neither full-time nor part-time entrepreneurship is advisable at this time due to high risks and unfavorable conditions, recommending waiting for a better opportunity.","role":"assistant"}}],"created":1747123256,"model":"gemini-2.5-flash-preview-04-17","object":"chat.completion","usage":{"completion_tokens":44,"prompt_tokens":116482,"prompt_tokens_details":{"cached_tokens":114667},"total_tokens":117488}}
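To make the cached share easier to read off a response like the one above, here is a minimal sketch of two helper functions. The names usage_counts and cache_hit_percent are hypothetical (they are not part of any API), and the sketch assumes the response has the same usage shape as the output above, with usage.prompt_tokens and usage.prompt_tokens_details.cached_tokens.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, and the JSON shape is assumed
# to match the "usage" block shown in the response above.

# Read a chat.completion response on stdin and print
# "<prompt_tokens> <cached_tokens>". cached_tokens falls back to 0
# when prompt_tokens_details is missing from the response.
usage_counts() {
  jq -r '.usage | "\(.prompt_tokens) \(.prompt_tokens_details.cached_tokens // 0)"'
}

# Print the share of prompt tokens served from cache, as a percentage
# with one decimal place. Prints 0.0 when prompt_tokens is 0.
cache_hit_percent() {
  local prompt_tokens=$1 cached_tokens=$2
  awk -v p="$prompt_tokens" -v c="$cached_tokens" \
    'BEGIN { if (p > 0) printf "%.1f\n", 100 * c / p; else print "0.0" }'
}
```

Piping the response above into usage_counts would print 116482 114667, and cache_hit_percent 116482 114667 prints 98.4, so roughly 98% of the prompt was served from cache in that run. A first call with a new file would normally be expected to show few or no cached tokens, so it is worth running the same request at least twice before comparing.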

Hey @limcheekin, thank you for confirming. By the way, did you try the 2.5-pro model? It sometimes seems to have issues with implicit context caching and can be inconsistent.