Context: I ran a batch job consisting of 100 requests for text extraction.
Setup Details:
- System Prompt: Identical and very long across all 100 requests.
- Output Schema: Identical and large across all 100 requests.
- User Input: The only variable part (a different text to process in each request).
Expected Behavior: Because the prompt and schema are identical and complex, I expected a high cache hit rate (well above 75%), so that the fixed components would not have to be reprocessed and paid for on every request.
Observed Result:
- Total Requests: 100
- Cache Hits: Only 25 (a 25% hit rate).
This low hit rate is unexpected and wasteful. It suggests that the caching mechanism is not recognizing the work repeated in every request (the prompt and schema overhead).
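For anyone reproducing the measurement, here is a minimal sketch of how the batch results could be tallied. It assumes each result reports how many prompt tokens were served from cache. The `RequestUsage` record and its field names are hypothetical and are not any provider's real response schema. It also assumes a request counts as a "hit" if it reused at least one cached token. Reporting the token-level share alongside the request-level rate shows whether the hits actually covered the long prompt and schema.

```python
from dataclasses import dataclass


@dataclass
class RequestUsage:
    # Hypothetical per-request usage record; field names are illustrative only.
    prompt_tokens: int
    cached_tokens: int


def cache_hit_rate(usages: list[RequestUsage]) -> float:
    """Fraction of requests that reused at least one cached prompt token."""
    if not usages:
        return 0.0
    hits = sum(1 for u in usages if u.cached_tokens > 0)
    return hits / len(usages)


def cached_token_share(usages: list[RequestUsage]) -> float:
    """Fraction of all prompt tokens across the batch that came from cache."""
    total = sum(u.prompt_tokens for u in usages)
    if total == 0:
        return 0.0
    return sum(u.cached_tokens for u in usages) / total


if __name__ == "__main__":
    # Illustrative numbers only: 25 requests with 4000 of 5000 prompt tokens cached,
    # 75 requests with no cached tokens.
    batch = [RequestUsage(5000, 4000)] * 25 + [RequestUsage(5000, 0)] * 75
    print(cache_hit_rate(batch))      # request-level hit rate
    print(cached_token_share(batch))  # token-level cached share
```

With these made-up numbers, the request-level rate is 25%, while only 20% of all prompt tokens came from cache.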
Request for Clarification:
- What are the exact caching rules when the System Prompt and Schema are constant but the User Input varies?
- Why did only 25 requests hit the cache when the costly, fixed components were repeated in all 100 requests?
