By Lorcan Made It…
I’m seeing ongoing instability in AI Studio Build where a percentage of agent requests hang instead of returning a structured error.
A failed request is manageable.
A request that never resolves is not.
If roughly 10 percent of agent calls are stalling, the model is unlikely to be the cause. Stalls at that rate point to the orchestration layer.
Here’s the practical fix path.
First, enforce a global hard timeout.
The entire agent execution lifecycle must have a strict cap, for example 20 seconds. If the cap is exceeded, terminate execution and return a 504 or 503. No request should remain open indefinitely.
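A minimal sketch of that cap in Python, assuming the agent run is an asyncio coroutine. The names `run_with_deadline` and `agent_coro_factory` and the response shape are illustrative, not part of AI Studio Build.

```python
import asyncio

GLOBAL_TIMEOUT_S = 20.0  # the example cap from the text above


async def run_with_deadline(agent_coro_factory, timeout_s=GLOBAL_TIMEOUT_S):
    """Run one agent execution under a hard cap and always return a structured result."""
    try:
        # wait_for cancels the inner task when the timeout expires
        body = await asyncio.wait_for(agent_coro_factory(), timeout=timeout_s)
        return {"status": 200, "body": body}
    except asyncio.TimeoutError:
        return {"status": 504, "error": "agent_timeout", "timeout_s": timeout_s}
    except Exception as exc:
        # Any other failure still resolves to a structured error
        return {"status": 500, "error": type(exc).__name__}
```

One caveat: asyncio cancellation is cooperative, so code that blocks the event loop synchronously will not be interrupted by this alone.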
Second, add a worker watchdog.
If a request stays open beyond the threshold and its worker is idle or blocked, kill the container. Zombie workers should never stay in rotation.
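A rough sketch of the watchdog bookkeeping, under simplifying assumptions: it judges a worker only by the age of its oldest open request (it does not inspect whether the worker is idle or blocked), and `kill` is a hypothetical callback that would stop the container.

```python
import time


class Watchdog:
    """Tracks in-flight requests and flags workers holding one for too long."""

    def __init__(self, threshold_s, clock=time.monotonic):
        self.threshold_s = threshold_s
        self._clock = clock
        self._inflight = {}  # request_id -> (worker_id, started_at)

    def start(self, request_id, worker_id):
        self._inflight[request_id] = (worker_id, self._clock())

    def finish(self, request_id):
        self._inflight.pop(request_id, None)

    def stuck_workers(self):
        now = self._clock()
        return sorted(
            {w for w, started in self._inflight.values() if now - started > self.threshold_s}
        )

    def sweep(self, kill):
        """Kill every stuck worker and forget the requests it was holding."""
        stuck = self.stuck_workers()
        for worker_id in stuck:
            kill(worker_id)
            for rid in [r for r, (w, _) in self._inflight.items() if w == worker_id]:
                del self._inflight[rid]
        return stuck
```

In practice `sweep` would run on a timer outside the worker processes, so that a blocked worker cannot block its own watchdog.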
Third, enforce tool-level timeouts.
Most hangs happen inside tool chains: the agent calls a tool, the tool calls the model, and the model calls a tool again. Every tool invocation must have its own timeout, catch block, and guaranteed resolution path. No unresolved promises.
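One way to sketch a tool wrapper with a guaranteed resolution path, assuming tools are async callables. `call_tool`, the result shape, and the optional `deadline` argument (a `time.monotonic()` value for the overall request) are illustrative assumptions.

```python
import asyncio
import time


async def call_tool(name, tool_fn, *args, timeout_s=5.0, deadline=None):
    """Invoke one tool; always resolve to a result dict, never hang or raise."""
    budget = timeout_s
    if deadline is not None:
        # Never give a tool more time than the request as a whole has left
        budget = min(timeout_s, deadline - time.monotonic())
    if budget <= 0:
        return {"tool": name, "ok": False, "error": "deadline_exceeded"}
    try:
        value = await asyncio.wait_for(tool_fn(*args), timeout=budget)
        return {"tool": name, "ok": True, "value": value}
    except asyncio.TimeoutError:
        return {"tool": name, "ok": False, "error": "timeout"}
    except Exception as exc:
        return {"tool": name, "ok": False, "error": f"{type(exc).__name__}: {exc}"}
```

Because every outcome is a plain result, the agent loop can pass a tool failure back to the model or abort, instead of waiting on a call that never settles.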
Fourth, implement proper load shedding.
If queue depth exceeds safe capacity, return 503 immediately. Do not accept work that cannot be processed. Accepting and hanging is worse than rejecting.
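A small sketch of reject-on-full admission. `AdmissionQueue`, `max_depth`, and the response dicts are hypothetical names; the safe depth itself has to come from measured capacity.

```python
from collections import deque


class AdmissionQueue:
    """Bounded queue that rejects new work immediately once it is full."""

    def __init__(self, max_depth):
        self.max_depth = max_depth
        self._pending = deque()

    def submit(self, request):
        if len(self._pending) >= self.max_depth:
            # Shed load now rather than accept work that would sit and stall
            return {"status": 503, "error": "overloaded"}
        self._pending.append(request)
        return {"status": 202, "queue_depth": len(self._pending)}

    def take(self):
        """Hand the oldest queued request to a worker, or None if empty."""
        return self._pending.popleft() if self._pending else None
```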
Fifth, add a circuit breaker around downstream model calls.
If latency spikes or failure rates increase, trip the breaker. Stop routing traffic to the unstable dependency and return a degraded error instead of cascading stalls.
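A simplified circuit breaker sketch. It trips on consecutive failures only; treating a slow call as a failure (to cover latency spikes) is left to the caller. `CircuitBreaker`, `call_model`, the thresholds, and the status codes are assumptions for illustration.

```python
import time


class CircuitBreaker:
    """Opens after N consecutive failures; allows trial calls after a cool-down."""

    def __init__(self, failure_threshold=5, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is closed

    def allow(self):
        if self._opened_at is None:
            return True
        # Half-open: once the cool-down has passed, let calls probe the dependency
        return self._clock() - self._opened_at >= self.reset_after_s

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def call_model(breaker, model_fn, prompt):
    """Route a model call through the breaker and return a structured result."""
    if not breaker.allow():
        return {"status": 503, "error": "model_circuit_open"}
    try:
        out = model_fn(prompt)
    except Exception as exc:
        breaker.record_failure()
        return {"status": 502, "error": type(exc).__name__}
    breaker.record_success()
    return {"status": 200, "body": out}
```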
Sixth, log the full request lifecycle.
Track ingress, queue time, worker assignment, model start, model end, and response return. If a request hangs, at least one of those timestamps will be missing, and the first missing one marks the layer that failed.
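A sketch of per-request stage tracking. The stage names in `STAGES` are my own labels for the six points listed above, and `RequestTrace` is a hypothetical helper; a real system would emit these as structured log events.

```python
import time

# Ordered lifecycle stages (labels are illustrative)
STAGES = (
    "ingress",
    "queued",
    "worker_assigned",
    "model_start",
    "model_end",
    "response_sent",
)


class RequestTrace:
    """Records a timestamp per lifecycle stage for a single request."""

    def __init__(self, request_id, clock=time.monotonic):
        self.request_id = request_id
        self._clock = clock
        self.stamps = {}

    def mark(self, stage):
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.stamps[stage] = self._clock()

    def first_missing(self):
        """Return the earliest stage with no timestamp, or None if complete."""
        for stage in STAGES:
            if stage not in self.stamps:
                return stage
        return None
```

For a hung request, `first_missing()` names the stage it never reached, which narrows the search to the step just before it.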
Seventh, remove unhealthy workers aggressively.
Fail health checks early. Do not allow partially degraded containers to continue receiving traffic.
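A strict health check could look roughly like this: any single failing or crashing probe marks the whole worker unhealthy. `health_status` and the probe names in the usage note are hypothetical.

```python
def health_status(checks):
    """Evaluate named probes; one failure is enough to report the worker unhealthy."""
    failed = []
    for name, probe in checks.items():
        try:
            ok = bool(probe())
        except Exception:
            # A probe that crashes counts as a failed probe
            ok = False
        if not ok:
            failed.append(name)
    if failed:
        return {"status": 503, "healthy": False, "failed": failed}
    return {"status": 200, "healthy": True, "failed": []}
```

For example, `health_status({"event_loop": loop_ok, "model_reachable": model_ok})` would return a 503 as soon as either probe fails, so the load balancer can pull the worker from rotation.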
Bottom line.
Hanging requests are an execution layer design flaw.
Errors are acceptable.
Time-boxed failures are acceptable.
Indefinite silence is not.
The fastest stabilisation move is simple:
Implement a global execution timeout and guarantee a structured error response on failure. That alone eliminates hanging behaviour while deeper issues are investigated.
Sharing this constructively from someone building production systems on top of this stack.