### Description of the bug:
I'm trying to create a cache from the contents of multiple PDF files, but when the total number of tokens across the files exceeds approximately 500,000, I receive a 503 error (Service Unavailable) from Google API Core.
The error is not returned immediately, but only after about 40 to 50 seconds, which might indicate that a timeout is occurring inside Google API Core.
### Code
```
import google.generativeai as genai
import os
gemini_api_key = os.environ.get("GEMINI_API_KEY")
genai.configure(api_key=gemini_api_key)
documents = []
file_list = ["xxx.pdf", "yyy.pdf", ...]
for file in file_list:
    gemini_file = genai.upload_file(path=file, display_name=file)
    documents.append(gemini_file)
gemini_client = genai.GenerativeModel("models/gemini-1.5-flash-001")
total_token = gemini_client.count_tokens(documents).total_tokens
print(f"total_token: {total_token}")
# total_token: 592403
gemini_cache = genai.caching.CachedContent.create(model="models/gemini-1.5-flash-001", display_name="sample", contents=documents)
```
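For what it's worth, UNAVAILABLE is normally treated as a retryable status, so I also show a minimal retry sketch below (the `create_cache_with_retry` helper is purely illustrative, not part of the SDK, and assumes the `documents` list from the snippet above). It could rule out a transient outage, although if the failure is tied to input size, each attempt would presumably fail the same way.
```
import time
from google.api_core import exceptions as gapi_exceptions

def create_cache_with_retry(documents, attempts=3, backoff=10):
    # Illustrative helper: retry cache creation on 503 UNAVAILABLE with a
    # fixed backoff, to check whether the error is merely transient.
    for attempt in range(attempts):
        try:
            return genai.caching.CachedContent.create(
                model="models/gemini-1.5-flash-001",
                display_name="sample",
                contents=documents,
            )
        except gapi_exceptions.ServiceUnavailable:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff)
```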
### Version
- Python 3.9.19
- google==3.0.0
- google-ai-generativelanguage==0.6.6
- google-api-core==2.19.0
- google-api-python-client==2.105.0
- google-auth==2.29.0
- google-auth-httplib2==0.2.0
- google-generativeai==0.7.2
- googleapis-common-protos==1.63.0
### Actual vs expected behavior:
### Actual behavior
```
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
return callable_(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1176, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1005, in _end_unary_response_blocking
raise _InactiveRpcError(state) # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "The service is currently unavailable."
debug_error_string = "UNKNOWN:Error received from peer ipv4:172.217.175.234:443 {created_time:"2024-08-06T13:37:03.077186006+09:00", grpc_status:14, grpc_message:"The service is currently unavailable."}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.9/site-packages/google/generativeai/caching.py", line 219, in create
response = client.create_cached_content(request)
File "/usr/local/lib/python3.9/site-packages/google/ai/generativelanguage_v1beta/services/cache_service/client.py", line 874, in create_cached_content
response = rpc(
File "/usr/local/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.ServiceUnavailable: 503 The service is currently unavailable.
```
### Expected behavior
```
gemini_cache = genai.caching.CachedContent.create(model="models/gemini-1.5-flash-001", display_name="sample", contents=documents)
print(gemini_cache)
# CachedContent(
# name='cachedContents/l5ataay9naq2',
# model='models/gemini-1.5-flash-001',
# display_name='sample',
# usage_metadata={
# 'total_token_count': 592403,
# },
# create_time=2024-08-08 01:21:44.925021+00:00,
# update_time=2024-08-08 01:21:44.925021+00:00,
# expire_time=2024-08-08 02:21:43.787890+00:00
# )
```
### Any other information you'd like to share?
- https://ai.google.dev/gemini-api/docs/caching?lang=python#considerations
> The minimum input token count for context caching is 32,768, and the maximum is the same as the maximum for the given model. (For more on counting tokens, see the [Token guide](https://ai.google.dev/gemini-api/docs/tokens)).
Upon reviewing the Gemini API documentation, I noticed a mismatch regarding token limits. The maximum token count is described as depending on the specific model in use; in my case I'm using the `models/gemini-1.5-flash-001` model, which has a maximum input token limit of 1,048,576. Based on this, I initially assumed that caching around 500,000 tokens should work without any issues.
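As a sanity check, that limit can also be read from the model metadata returned by the API (small sketch below, using `genai.get_model` with the client configured as in the code above):
```
# Confirm the input token limit the API reports for this model.
model_info = genai.get_model("models/gemini-1.5-flash-001")
print(model_info.input_token_limit)
# 1048576
```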
Moreover, I was able to successfully create a cache with token counts exceeding 800,000 when the cached contents were a plain string. This leads me to suspect that there might be a bug specific to creating caches from uploaded files with high token counts, as opposed to string-based caching.
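For comparison, the string-based case that succeeds looks roughly like this (simplified sketch; the actual text is elided and `long_text` is just a placeholder name):
```
# Sketch of the string-based caching that succeeds in my environment.
long_text = "..."  # plain text amounting to roughly 800k tokens
text_cache = genai.caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    display_name="text-sample",
    contents=[long_text],
)
print(text_cache.usage_metadata)
```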