503 "The service is currently unavailable" when using the context caching feature

I’m trying to create a cache by reading the contents of multiple PDF files, but when the total token count across the files exceeds approximately 500,000, I receive a 503 error (Service Unavailable) from Google API Core.

The error isn’t returned immediately; it arrives after about 40 to 50 seconds. This might indicate that a timeout is occurring in Google API Core.
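
One way to confirm the 40 to 50 second delay, and to ride out transient 503s, is to time each attempt and retry with exponential backoff. The sketch below is only an illustration under assumptions, not a fix for the underlying issue: `ServiceUnavailable` is a local stand-in for whatever 503 exception your client library raises, and `call_with_retry` and its parameters are hypothetical names. Retrying may not help at all if the failure is deterministic for large caches.

```python
import time


class ServiceUnavailable(Exception):
    """Local stand-in (assumption) for the 503 error raised by the client."""


def call_with_retry(fn, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fn(), retrying on ServiceUnavailable with exponential backoff.

    Returns (result, timings), where timings holds the elapsed seconds of
    every attempt, including the failed ones.
    """
    timings = []
    for attempt in range(1, max_attempts + 1):
        start = time.monotonic()
        try:
            result = fn()
            timings.append(time.monotonic() - start)
            return result, timings
        except ServiceUnavailable:
            timings.append(time.monotonic() - start)
            if attempt == max_attempts:
                raise  # out of attempts: surface the 503 to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

To use it, wrap the cache creation call in a zero-argument function and pass it as `fn`. The returned timings show how long each attempt ran before failing or succeeding, which makes a consistent 40 to 50 second cutoff easy to spot.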

For more details, please refer to the following GitHub issue.

Yes, I ran into this issue as well. For me, Context Caching currently only works reliably with fewer than 500k tokens. I’m using 1.5 Flash.

This issue is back for me today…

Same here with gemini-2.0-flash: I get either ‘Server disconnected without sending a response.’ or an HTTP 503 error.

It’s now May 20, 2025, and I’m getting this 503 error during video processing.