Implicit caching is not working as expected

Context: I ran a batch job consisting of 100 requests for text extraction.

Setup Details:

  • System Prompt: Identical and very long across all 100 requests.

  • Output Schema: Identical and large across all 100 requests.

  • User Input: The only component that varies (a different text to process in each request).

Expected Behavior: Because the system prompt and schema are identical in every request, I expected a high cache hit rate (well above 75%), so that the fixed components would not be reprocessed and billed in full each time.

Observed Result:

  • Total Requests: 100

  • Cache Hits: 25 (a 25% hit rate).

This hit rate is far lower than expected and suggests that the cache is not recognizing the repeated portion of each request (the system prompt and schema).
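For anyone who wants to reproduce the measurement, here is a minimal sketch of how a hit rate like the one above could be tallied. It assumes each response exposes a prompt token count and a cached token count; the `RequestUsage` record and its field names are hypothetical and not taken from any specific API. It also reports the share of prompt tokens served from cache, which may be more informative than a per-request hit count.

```python
from dataclasses import dataclass


@dataclass
class RequestUsage:
    # Hypothetical per-request usage record; field names are assumptions.
    prompt_tokens: int   # total input tokens sent in the request
    cached_tokens: int   # input tokens reported as served from cache


def cache_stats(usages):
    """Summarize cache utilization over a list of RequestUsage records."""
    total = len(usages)
    # A request counts as a hit if any of its input tokens came from cache.
    hits = sum(1 for u in usages if u.cached_tokens > 0)
    prompt = sum(u.prompt_tokens for u in usages)
    cached = sum(u.cached_tokens for u in usages)
    return {
        "requests": total,
        "hits": hits,
        "hit_rate": hits / total if total else 0.0,
        "cached_token_share": cached / prompt if prompt else 0.0,
    }
```

Calling `cache_stats` on the 100 usage records from the batch would give both the request-level hit rate and the token-level share, which helps distinguish "few requests hit the cache" from "requests hit the cache but only for a short prefix".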

Request for Clarification:

  1. What are the exact caching rules when the System Prompt and Schema are constant, but the User Input is variable?

  2. Why did only 25 requests hit the cache when the costly, fixed components were repeated in all 100 requests?
