This might be one of those questions that’s so simple to answer it’s blown completely over my head, but alas I’m drawing a blank.
How exactly can I ping any of the API endpoints (via REST) to ensure they return a status code 200 before I do stuff?
There have been several instances now where I’ve received an error code 500 (internal server error), which mucks with my game logic. It would be nice to check for this code before I send anything. That way, I can lock features until the ping returns 200s again.
I’m also assuming I don’t get charged per ping, right? And that I can send a reasonable amount of pings without actually causing issues? Ideally, I’d like to call it once per frame in my game, hence why I’m asking.
A practical approach: when you get a 500, resubmit the exact same request that caused it. Treat it like a 429: you can retry the call. Chances are, your request will be randomly routed to a different, healthy server that can service it.
Put a cap on the number of automatic retries, of course.
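A minimal sketch of that capped retry, with `send` as a placeholder for whatever actually makes your API call (it just needs to return the status code):

```python
import time

def call_with_retry(send, max_retries=3, delay=0.5):
    """Resubmit the exact same request when the server returns a 500.

    `send` is a stand-in for your real request function; swap in
    whatever call failed. Non-500 statuses are returned as-is so the
    caller can handle them (200, 429, 400, ...).
    """
    for attempt in range(max_retries + 1):
        status = send()
        if status != 500:
            return status  # success, or an error that retrying won't fix
        if attempt < max_retries:
            time.sleep(delay)  # brief pause before resubmitting
    return status  # still 500 after the cap; surface it to the caller
```

The cap matters: if every retry comes back 500, the outage is real and you should lock features rather than hammer the endpoint.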
And you don’t get charged for requests that didn’t work: the 400s, 403s, and 500s don’t get billed.
Ah, that might actually work. It feels like more of a bandaid, but for now, it should do the trick.
Lol, I love you boo, but imagine running that @ 1000 FPS. Or even just 100 FPS. Even if I scaled my game to force it into 60 FPS, that’s a lot. Now imagine this per player. It won’t scale well.
In a perfect world, it would still be nice to have a constant run-time check, but retrying the call will do for now. Thanks!
A ping at those rates is pretty unreasonable, and wouldn’t improve your gameplay a whole lot. If you assume a ping response time of 10ms (which is generous for a UDP ping, and the API is TCP), that’s at most 100 round trips per second, so you’re down to 100 FPS just from waiting. You absolutely don’t want to limit your gameplay to an external feature check.
@OrangiaNebula has the right answer - attempt a request and use exponential backoff if you get an error situation. And don’t tie this to the main game activity thread.
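A sketch of both ideas together: exponential backoff, run on a worker thread so the game loop never blocks. `send` and `on_result` are hypothetical hooks standing in for your real request and your feature-lock logic.

```python
import threading
import time

def backoff_delays(base=0.5, cap=30.0, retries=5):
    """Yield exponentially growing waits: 0.5s, 1s, 2s, ... capped."""
    for attempt in range(retries):
        yield min(base * (2 ** attempt), cap)

def check_api(send, on_result, delays=None):
    """Probe the API with backoff on a daemon thread.

    `send` returns a status code; `on_result` receives True when a 200
    comes back, False when all retries are exhausted. Neither call
    happens on the thread that called check_api, so the game's main
    loop keeps running while this waits.
    """
    def worker():
        for delay in (delays if delays is not None else backoff_delays()):
            if send() == 200:
                on_result(True)
                return
            time.sleep(delay)
        on_result(False)  # gave up; lock features until the next probe

    threading.Thread(target=worker, daemon=True).start()
```

The key design point is that `on_result` fires asynchronously: the game flips a "features locked" flag when it hears back, instead of stalling a frame on the network.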
For general uptime checks, you can run a cheap “Hi” prompt every minute or so. For output-rate measurements, you would run prompts that actually generate tokens; a good performance metric is output tokens per minute, and the approximation is better with small input prompts that generate a lot of output tokens.
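A sketch of that monitoring idea, with `send_hi` as a placeholder for the cheap probe request and `record` for wherever you log the result:

```python
import time

def tokens_per_minute(output_tokens, elapsed_seconds):
    """Output-rate metric: generated tokens scaled to a per-minute figure.
    E.g. 450 output tokens in 18 seconds is 1500 tokens/minute."""
    return output_tokens * 60.0 / elapsed_seconds

def health_loop(send_hi, record, interval=60.0, max_checks=None):
    """Fire the cheap probe once per interval and record whether it
    returned 200. `send_hi` and `record` are hypothetical hooks;
    max_checks=None runs forever (put this on its own thread)."""
    checks = 0
    while max_checks is None or checks < max_checks:
        record(send_hi() == 200)
        checks += 1
        time.sleep(interval)
```
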
But for errors, you will have to retry with exponential backoff as said above and then wait, or you will have to accept it and “drop a frame”, like actual video games do.
But I think the delay from making the call through an API and getting a response back is multiple seconds at least. So you are hundreds or thousands of frames behind. This is true even for local inference.
You would have to build a small, efficient local model to get the kind of frame rates you want. I know NVIDIA had a demo of something like that a little while back, but that’s local stuff we’re talking about.
As for making the API call with “Hi”: this is the only way to check that there isn’t an internal model error, or some other internal issue. Most of the time, the API endpoint/network layer is just fine; it’s the internal code that crashes. I know this tangentially because I see the same thing with my own internal API endpoints … it always surfaces as some “internal server error”, which for me is a bug in the code.
But if you still want to see if the general endpoint is alive, without getting into the internals, you can try something like an empty GET request. But not all APIs are REST, or have a GET method associated with them.
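A sketch of that liveness probe using only the stdlib. The trick is in the interpretation: any HTTP reply at all (even a 404 or 405, which a bare GET to a POST-only endpoint will likely get) proves the endpoint and network layer are up; only a connection failure means they’re not. The `url` is whatever base URL your API lives at, and `opener` is injectable purely so the sketch is testable:

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

def endpoint_alive(url, timeout=5.0, opener=urlopen):
    """Cheap liveness probe via a bare GET.

    Returns True if the server answered with *any* HTTP status,
    False only if the host could not be reached at all.
    """
    try:
        resp = opener(url, timeout=timeout)
        if hasattr(resp, "close"):
            resp.close()
        return True
    except HTTPError:
        return True   # server replied with an error status -> still alive
    except URLError:
        return False  # could not reach the host at all
```

Note the except order: `HTTPError` subclasses `URLError`, so it must be caught first to distinguish “answered with an error” from “unreachable”.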