*By Vilius Vystartas | May 2026*
My agent cost me $246 in 22 minutes. It wasn't malicious. It wasn't hacked. It was doing exactly what I asked — deploying to a test environment — and it did it 47 times in a loop because a configuration file was wrong.
The model had been told "don't loop." The system prompt said "deploy once and verify." None of that mattered. The model followed the instructions it could see, the loop didn't violate any rule it knew, and the $246 was on my invoice before I noticed.
That's the Sorcerer's Apprentice problem. Not malice. Not incompetence. A tool with capabilities, no boundaries, and nobody watching.
At the time, my agent had everything from the methodology: skills, memory, decision protocols, tool composition, orchestration, pipelines, resilience, verification, compounding. It was a good agent. It booted with full project context, loaded the right skills, retried on failure, and saved what it learned.
But it had no governance. No registry of who's allowed to act. No gateway enforcing what actions are permitted. No delegation chain proving it was acting on my behalf. No audit log showing what happened, when, and why.
I'd built a reliable agent. I'd forgotten to govern what it could do.
After that $246 lesson, I traced the problem to three missing capabilities. I now consider these non-negotiable for any agent operating in a shared environment:
I didn't know my agent existed until the invoice arrived. There was no record of its identity, its capabilities, or its current status. An agent registry fixes this: every agent registers before it acts, declares what it can do, and maintains a status (active, rotating, suspended, revoked). If an unregistered agent shows up, you see it before it does damage — not after.
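As a rough illustration of the registry idea above, here is a minimal in-memory sketch. The names `AgentRegistry`, `AgentRecord`, and `may_act` are hypothetical, as is the capability string format; the four statuses come from the text. It also assumes that only `active` agents may act, which is one possible reading of the status model, not a rule stated here.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    ROTATING = "rotating"
    SUSPENDED = "suspended"
    REVOKED = "revoked"


@dataclass
class AgentRecord:
    agent_id: str
    capabilities: frozenset  # declared up front, at registration time
    status: Status = Status.ACTIVE


class AgentRegistry:
    """In-memory stand-in; a real registry would persist records and authenticate callers."""

    def __init__(self) -> None:
        self._agents: dict = {}

    def register(self, agent_id: str, capabilities: set) -> AgentRecord:
        if agent_id in self._agents:
            raise ValueError(f"agent already registered: {agent_id}")
        record = AgentRecord(agent_id, frozenset(capabilities))
        self._agents[agent_id] = record
        return record

    def set_status(self, agent_id: str, status: Status) -> None:
        self._agents[agent_id].status = status

    def may_act(self, agent_id: str, capability: str) -> bool:
        record = self._agents.get(agent_id)
        if record is None:
            return False  # unregistered agents are visible as denials, not as invoices
        # Assumption: only ACTIVE agents may act; ROTATING, SUSPENDED and REVOKED are blocked.
        return record.status is Status.ACTIVE and capability in record.capabilities


registry = AgentRegistry()
registry.register("deploy-agent", {"deploy:test"})
print(registry.may_act("deploy-agent", "deploy:test"))   # True
print(registry.may_act("shadow-agent", "deploy:test"))   # False
```

Suspending the agent with `registry.set_status("deploy-agent", Status.SUSPENDED)` makes every later `may_act` call return `False` without touching the agent itself.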
My agent talked directly to models and tools. No choke point. No enforcement. The system prompt was the only boundary — and as the $246 loop proved, prompts are not boundaries.
An AI Gateway sits between every agent and every resource it accesses. Every request goes through it. The gateway checks: is this agent registered? Does it have permission for this action? Is it within its rate limits? If any check fails, the request doesn't reach the model or tool. Prompts guide. The gateway enforces.
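The three checks can be sketched as a single choke point. This is a minimal sketch under stated assumptions, not the gateway specification: the `Gateway` class, the permissions dictionary (standing in for a registry plus policy), the sliding-window rate limit, and the limit of 3 deploys per hour are all illustrative choices of mine.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class Gateway:
    """Checks registration, permission and rate limit before a request goes any further."""

    def __init__(self, permissions: dict, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._permissions = permissions  # agent_id -> set of allowed actions (hypothetical policy shape)
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock
        self._calls = defaultdict(deque)  # (agent_id, action) -> timestamps of allowed calls
        self.log = []                     # every decision is recorded, allowed or not

    def authorize(self, agent_id: str, action: str) -> tuple:
        decision = self._decide(agent_id, action)
        self.log.append((agent_id, action, decision))
        return decision == "allowed", decision

    def _decide(self, agent_id: str, action: str) -> str:
        if agent_id not in self._permissions:
            return "denied: unregistered agent"
        if action not in self._permissions[agent_id]:
            return "denied: action not permitted"
        now = self._clock()
        recent = self._calls[(agent_id, action)]
        while recent and now - recent[0] >= self._window:
            recent.popleft()  # forget calls that have left the sliding window
        if len(recent) >= self._max_calls:
            return "denied: rate limit exceeded"
        recent.append(now)
        return "allowed"


# Replay of a 47-call loop against an illustrative limit of 3 deploys per hour.
# The clock is frozen so the whole loop falls inside one window.
gateway = Gateway({"deploy-agent": {"deploy:test"}}, max_calls=3,
                  window_seconds=3600, clock=lambda: 0.0)
outcomes = [gateway.authorize("deploy-agent", "deploy:test")[0] for _ in range(47)]
print(sum(outcomes), "allowed,", outcomes.count(False), "denied")  # 3 allowed, 44 denied
```

The point of the design is that the denial happens outside the model: the agent can keep looping, but only the first few requests ever reach the tool, and every denial lands in `gateway.log`.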
My agent acted for me. Or so it claimed. There was no proof — no token, no chain, no signed authorisation showing it was acting on my behalf.
A delegation framework gives every action a verifiable chain of authority. Human → orchestrator → build agent → deploy. Each hop carries a signed token with explicit scope: what actions, on which resources, for how long. The gateway checks this chain before every action. If the scope doesn't cover the request, it's denied.
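To make the chain check concrete, here is a small sketch of scoped, signed delegation tokens. It is an assumption-heavy illustration: the `Token` fields, the `issue` and `verify_chain` helpers, and the key names are hypothetical, and HMAC with per-issuer shared secrets stands in for whatever signature scheme a real framework would use (most likely asymmetric keys, so the verifier never holds signing material).

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    issuer: str            # who grants the authority
    subject: str           # who receives it
    actions: frozenset     # what actions
    resources: frozenset   # on which resources
    expires_at: float      # for how long (absolute timestamp)
    signature: str = ""

    def payload(self) -> bytes:
        return json.dumps({
            "iss": self.issuer, "sub": self.subject,
            "actions": sorted(self.actions), "resources": sorted(self.resources),
            "exp": self.expires_at,
        }, sort_keys=True).encode()


def issue(keys, issuer, subject, actions, resources, expires_at) -> Token:
    unsigned = Token(issuer, subject, frozenset(actions), frozenset(resources), expires_at)
    signature = hmac.new(keys[issuer], unsigned.payload(), hashlib.sha256).hexdigest()
    return replace(unsigned, signature=signature)


def verify_chain(keys, chain, root, agent, action, resource, now) -> bool:
    """True only if every hop is signed, unexpired, linked, and no wider than its parent."""
    if not chain or chain[0].issuer != root or chain[-1].subject != agent:
        return False
    parent = None
    for token in chain:
        key = keys.get(token.issuer)
        if key is None:
            return False
        expected = hmac.new(key, token.payload(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, token.signature) or now >= token.expires_at:
            return False
        if parent is not None:
            if token.issuer != parent.subject:
                return False  # broken link in the chain
            if not (token.actions <= parent.actions and token.resources <= parent.resources):
                return False  # a hop may narrow scope, never widen it
        parent = token
    return action in chain[-1].actions and resource in chain[-1].resources


keys = {"human": b"h-secret", "orchestrator": b"o-secret"}  # illustrative secrets only
chain = [
    issue(keys, "human", "orchestrator", {"build", "deploy"}, {"test-env"}, expires_at=1000.0),
    issue(keys, "orchestrator", "build-agent", {"deploy"}, {"test-env"}, expires_at=500.0),
]
print(verify_chain(keys, chain, "human", "build-agent", "deploy", "test-env", now=100.0))  # True
print(verify_chain(keys, chain, "human", "build-agent", "deploy", "prod-env", now=100.0))  # False
```

In this sketch the gateway would call `verify_chain` on every request, so an agent holding a token scoped to `test-env` cannot act on anything else, however its prompt is worded.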
After implementing these three capabilities, the loop happened again. Different misconfiguration, same pattern. This time:
- Cost: $5.22
- Time to detection: seconds (the gateway logged the rate-limit violations)
- Fix: update the config file
The same failure mode. Completely different outcome.
I've seen the reaction to governance proposals: "This is enterprise overhead. I just want my coding agent to work."
I get it. But governance isn't about adding friction; it's about removing the wrong kind of risk. The $246 loop wasn't caused by too little prompting. It was caused by too few boundaries. The gateway didn't slow me down. It saved me roughly $240 and 22 minutes of wasted inference.
The calibration matters.
The mistake is treating all agents the same. Calibrate to risk.
Identity: Who is acting? (Registry + crypto binding)
↓
Boundaries: What are they allowed to do? (Gateway/PEP + policy)
↓
Monitor: What are they doing? (Real-time logging + metrics)
↓
Validate: Did they stay within policy? (Post-hoc verification + drift detection)
↓
Accountability: Who's responsible? (Audit log + attestation)
Each layer depends on the one below it. Identity without boundaries is a guest list. Boundaries without monitoring is blind enforcement. Monitoring without validation is noise. Validation without accountability is theatre.
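The lower layers of the stack (monitor, validate, accountability) all lean on a log that can be trusted after the fact. One way to sketch that is a hash-chained, append-only audit log. Hash chaining is my own choice for illustration, not something the stack above prescribes, and the `AuditLog` class and its field names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class AuditLog:
    """Append-only log where each entry commits to the one before it."""

    def __init__(self) -> None:
        self.entries = []

    @staticmethod
    def _digest(body: dict) -> str:
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, timestamp: float, agent_id: str, action: str,
               decision: str, reason: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"ts": timestamp, "agent": agent_id, "action": action,
                "decision": decision, "reason": reason, "prev": prev_hash}
        entry = dict(body, hash=self._digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Post-hoc check: recompute every hash and confirm the links are intact."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev_hash or self._digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True

    def denied(self) -> list:
        return [e for e in self.entries if e["decision"] == "denied"]


log = AuditLog()
log.append(0.0, "deploy-agent", "deploy:test", "allowed", "within policy")
log.append(1.0, "deploy-agent", "deploy:test", "denied", "rate limit exceeded")
print(log.verify(), len(log.denied()))  # True 1
```

Editing any stored entry after the fact makes `verify()` return `False`, which is the property the validation and accountability layers need from the log.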
If your agent has autonomy — meaning it can make decisions without asking you — you need governance. The level depends on what it touches. But the three capabilities (registry, gateway, delegation) are not optional for agents that act in shared environments. They're what separate "trust the agent" from "verify the agent."
The Sorcerer's Apprentice wasn't a story about a bad apprentice. It's a story about giving a tool capabilities without asking what happens when it doesn't stop. The broom worked until it didn't. The axe was in the next room.
Don't wait for the invoice to know you need an axe.
Three new specifications cover these capabilities in detail:
- Agent Registry: registry schema, status model, discovery API
- AI Gateway / PEP: policy enforcement point, policy schema, enforcement modes
- Delegation Framework: authority chains, delegation tokens, scope propagation
And a new methodology module covers the full governance stack with exercises and examples: Module 11: Governance.