How to manage prompt/context complexity vs. 504 Deadline errors

Hi,

I observed that for prompts of a certain complexity (a few hundred words) applied to large contexts (500K tokens), I frequently received 504 Deadline errors. I asked how to manage the deadline, but got no answer. As I experimented, I realized that reducing either prompt complexity or context size made the 504s go away.

There must be some rough mathematics that can be applied so we can figure out whether a given inference is likely to complete within the deadline. (I say “rough” because prompt length != inferential complexity.)

Can someone provide insight on this?

Hi @Fred_Zimmerman

I’d recommend testing with a higher timeout via the `request_options` argument, e.g.:

import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(prompt,  # your prompt + large context
                                  request_options={"timeout": 600})
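
As for the "rough mathematics": one back-of-envelope approach is to model latency as prefill time (proportional to context tokens) plus decode time (proportional to output tokens), then compare against your timeout with a safety margin. The throughput numbers below are illustrative assumptions, not published figures; measure your own from a few timed requests:

```python
def estimate_inference_seconds(context_tokens: int,
                               output_tokens: int,
                               prefill_tps: float = 5_000,   # assumed prefill throughput
                               decode_tps: float = 50) -> float:
    """Rough latency model: prefill the context, then decode the output."""
    return context_tokens / prefill_tps + output_tokens / decode_tps


def likely_within_deadline(context_tokens: int,
                           output_tokens: int,
                           timeout_s: float = 600,
                           safety_factor: float = 2.0) -> bool:
    """True if the padded estimate fits inside the request timeout."""
    estimate = estimate_inference_seconds(context_tokens, output_tokens)
    return estimate * safety_factor <= timeout_s


# 500K-token context, ~2K-token answer: estimate is 140 s,
# so with a 2x safety margin it fits a 600 s timeout but not a 60 s one.
print(likely_within_deadline(500_000, 2_000, timeout_s=600))
print(likely_within_deadline(500_000, 2_000, timeout_s=60))
```

Calibrate `prefill_tps` and `decode_tps` against your own timed runs; they vary by model and load, which is why the safety factor is there.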