How We Stop Agent Loops From Draining Your AI Budget
Autonomous AI agents are powerful. They can plan, execute, and iterate on tasks with minimal human intervention. But they have a failure mode that every team discovers the hard way: infinite retry loops.
An agent sends a request. The model returns an unexpected response. The agent retries with the same prompt. The model returns the same response. The agent retries again. And again. And again — hundreds or thousands of times before anyone notices.
We've seen this drain $50–$200 in a single session. On a weekend, with no one watching, it can be much worse.
Why Agents Loop
The most common causes across LangChain, CrewAI, AutoGPT, and custom agent frameworks:
- Parsing failures: The model returns output that doesn't match the expected format. The agent retries, hoping for a different result.
- Tool call errors: A tool returns an error. The agent tries the same tool call again with the same parameters.
- Hallucinated tool names: The model calls a tool that doesn't exist. The error message goes back to the model, which calls the same non-existent tool again.
- Reflexive “let me try again” behavior: Some models, when told their output was wrong, simply rephrase the same answer — creating an infinite feedback loop.
- Missing termination conditions: The agent has no max-iteration cap or its cap is set too high (e.g., 1,000).
Why Application-Level Fixes Aren't Enough
Most frameworks offer max_iterations or similar parameters. But these have limitations:
- They only protect one framework — if you use multiple agent systems, you need separate protections for each
- They don't protect against loops that span multiple sessions or API keys
- They're often set too high (100+ iterations) to be useful as cost protection
- They can be bypassed by agent architectures that spawn sub-agents
Gateway-level detection solves these problems because it sits below all agent frameworks. Every request passes through the same chokepoint, regardless of which framework, language, or architecture generated it.
How AISG Loop Detection Works
When a request arrives at the AISG proxy, we compute a fingerprint — a SHA-256 hash of the API key prefix, model name, and the last 3 message contents (normalized, case-insensitive).
A sliding window counter tracks how many times each fingerprint has been seen in the last 60 seconds. When the count exceeds the threshold (default: 5), the request is blocked with HTTP 429 and a 30-second cooldown.
fingerprint = SHA-256(
api_key_prefix + # "oah_abc..." → groups by credential
model_name + # "oah/llama-4-maverick"
last_3_messages # content only, lowercase, normalized
)Why “last 3 messages”?
Agents typically send the conversation history with each request. By hashing only the last 3 messages, we catch loops where the agent adds a new user message each iteration (the content is the same, just prepended with more history). But we don't false-positive on genuinely different conversations that happen to share early messages.
The Response
When a loop is detected, the client receives a clear, actionable error — not a generic 429:
{
"detail": {
"error": "recursive_loop_detected",
"message": "Blocked: identical request sent 6 times in 60s.
This usually indicates an agent retry loop.",
"fingerprint": "a3f7c2d1...",
"hit_count": 6,
"cooldown_seconds": 30
}
}The AISG Python SDK includes a dedicated LoopDetectedError exception with structured attributes (fingerprint, hit_count, cooldown_seconds) so your error handling can make intelligent decisions.
Real-Time Alerts
Loop detection events are automatically available as webhook notifications. Subscribe to the loop.detected event type to get real-time alerts in Slack, PagerDuty, or any HTTPS endpoint. Combined with budget enforcement (HTTP 402 when credits run out), you have two independent safety layers protecting your spend.
What Doesn't Trigger Detection
- Normal conversation: Users sending different messages to the same model — never affected
- Batch processing: Sending the same prompt to different models — independent counters
- Different users: Two users sending the same prompt — different API key prefixes, independent counters
- Slight variations: Adding any text variation to the prompt resets the counter
Configuration
The defaults (5 hits / 60s window / 30s cooldown) work well for most use cases. For high-volume batch processing where legitimate repetition is expected, thresholds can be adjusted via the global config.
Recursive loop protection is built into AI Security Gateway — active on every request with no configuration needed. Combined with hard budget enforcement and real-time webhook alerts, your agents can't drain your budget even if they fail. Start free or read the docs.
Join the Community