2026-05-26
The Sorcerer's Apprentice Problem: Why Every Autonomous Agent Needs Governance
*By Vilius Vystartas | May 2026*
My agent cost me $246 in 22 minutes. It wasn't malicious. It wasn't hacked. It was doing exactly what I asked — deploying to a test environment — and it did it 47 times in a loop because a configuration file was wrong.
The model had been told "don't loop." The system prompt said "deploy once and verify." None of that mattered. The model followed the instructions it could see, the loop didn't violate any rule it knew, and the $246 was on my invoice before I noticed.
That's the Sorcerer's Apprentice problem. Not malice. Not incompetence. A tool with capabilities, no boundaries, and nobody watching.
What I Had Then
At the time, my agent had everything from the methodology: skills, memory, decision protocols, tool composition, orchestration, pipelines, resilience, verification, compounding. It was a good agent. It booted with full project context, loaded the right skills, retried on failure, and saved what it learned.
But it had no governance. No registry of who's allowed to act. No gateway enforcing what actions are permitted. No delegation chain proving it was acting on my behalf. No audit log showing what happened, when, and why.
I'd built a reliable agent. I'd forgotten to govern what it could do.
The Three Things Every Agent Needs Before It Can Act
After that $246 lesson, I traced the problem to three missing capabilities. I now consider these non-negotiable for any agent operating in a shared environment:
1. An Agent Registry
As far as my infrastructure was concerned, my agent didn't exist until the invoice arrived: there was no record of its identity, its capabilities, or its current status. An agent registry fixes this. Every agent registers before it acts, declares what it can do, and maintains a status (active, rotating, suspended, revoked). If an unregistered agent shows up, you see it before it does damage, not after.
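The registry idea can be sketched in a few lines. This is a minimal in-memory illustration, not the schema from the Agent Registry specification: the names `AgentRegistry`, `AgentRecord`, `Status`, and `may_act` are hypothetical, and only the four statuses come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    ROTATING = "rotating"
    SUSPENDED = "suspended"
    REVOKED = "revoked"


@dataclass
class AgentRecord:
    agent_id: str
    capabilities: frozenset[str]  # what the agent declared it can do
    status: Status = Status.ACTIVE


class AgentRegistry:
    """Hypothetical in-memory registry: agents must register before they act."""

    def __init__(self) -> None:
        self._agents: dict[str, AgentRecord] = {}

    def register(self, agent_id: str, capabilities: set[str]) -> AgentRecord:
        if agent_id in self._agents:
            raise ValueError(f"agent {agent_id!r} is already registered")
        record = AgentRecord(agent_id, frozenset(capabilities))
        self._agents[agent_id] = record
        return record

    def set_status(self, agent_id: str, status: Status) -> None:
        self._agents[agent_id].status = status

    def may_act(self, agent_id: str, capability: str) -> bool:
        """True only for a registered, active agent that declared the capability."""
        record = self._agents.get(agent_id)
        if record is None:
            return False  # unregistered agents are visible as a refusal, not an invoice
        return record.status is Status.ACTIVE and capability in record.capabilities
```

In this sketch, an agent registered with only `"deploy:staging"` gets `False` for `"deploy:prod"`, and suspending it turns every answer into `False` without touching its declared capabilities.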
2. An AI Gateway
My agent talked directly to models and tools. No choke point. No enforcement. The system prompt was the only boundary — and as the $246 loop proved, prompts are not boundaries.
An AI Gateway sits between every agent and every resource it accesses. Every request goes through it. The gateway checks: is this agent registered? Does it have permission for this action? Is it within its rate limits? If any check fails, the request doesn't reach the model or tool. Prompts guide. The gateway enforces.
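The three checks above can be sketched as a single choke point. This is an illustration under stated assumptions, not the AI Gateway / PEP specification: `Gateway`, `Decision`, the `permissions` mapping (standing in for registry plus policy), and the 600-second default interval are all hypothetical choices for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Decision:
    allowed: bool
    reason: str


@dataclass
class Gateway:
    # agent_id -> permitted actions (assumption: stands in for registry + policy)
    permissions: dict[str, set[str]]
    # minimum seconds between two allowed calls of one action by one agent
    min_interval_s: float = 600.0
    # injectable clock so the behaviour can be tested without waiting
    clock: Callable[[], float] = time.monotonic
    _last_allowed: dict[tuple[str, str], float] = field(
        default_factory=dict, init=False
    )

    def check(self, agent_id: str, action: str) -> Decision:
        if agent_id not in self.permissions:
            return Decision(False, "agent not registered")
        if action not in self.permissions[agent_id]:
            return Decision(False, "action not permitted")
        now = self.clock()
        last = self._last_allowed.get((agent_id, action))
        if last is not None and now - last < self.min_interval_s:
            return Decision(False, "rate limit exceeded")
        self._last_allowed[(agent_id, action)] = now
        return Decision(True, "ok")

    def call(self, agent_id: str, action: str, tool: Callable[[], object]) -> object:
        """Invoke the tool only if every check passes; otherwise it never runs."""
        decision = self.check(agent_id, action)
        if not decision.allowed:
            raise PermissionError(decision.reason)
        return tool()
```

The design point is that `tool` is only reachable through `call`: a denied request raises before the model or tool is touched, which is the difference between a prompt that guides and a gateway that enforces.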
3. A Delegation Framework
My agent acted for me. Or so it claimed. There was no proof — no token, no chain, no signed authorisation showing it was acting on my behalf.
A delegation framework gives every action a verifiable chain of authority. Human → orchestrator → build agent → deploy. Each hop carries a signed token with explicit scope: what actions, on which resources, for how long. The gateway checks this chain before every action. If the scope doesn't cover the request, it's denied.
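A chain like this can be sketched with scoped, signed tokens. The example below is a simplification under loud assumptions: `Token`, `sign`, and `verify_chain` are hypothetical names, and HMAC with per-issuer shared keys stands in for whatever signature scheme the Delegation Framework specification actually defines (a real deployment would more likely use asymmetric signatures, so the verifier cannot mint tokens itself).

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    issuer: str            # who grants the authority
    subject: str           # who receives it
    actions: frozenset     # what actions
    resources: frozenset   # on which resources
    expires_at: float      # for how long (seconds on the verifier's clock)
    signature: str = ""

    def payload(self) -> bytes:
        body = {
            "iss": self.issuer,
            "sub": self.subject,
            "actions": sorted(self.actions),
            "resources": sorted(self.resources),
            "exp": self.expires_at,
        }
        return json.dumps(body, sort_keys=True).encode()


def _mac(token: Token, key: bytes) -> str:
    return hmac.new(key, token.payload(), hashlib.sha256).hexdigest()


def sign(token: Token, key: bytes) -> Token:
    return replace(token, signature=_mac(token, key))


def verify_chain(chain, keys, root, action, resource, now) -> bool:
    """Check each hop: signature, expiry, linkage, and that scope never widens."""
    if not chain or chain[0].issuer != root:
        return False
    for i, tok in enumerate(chain):
        key = keys.get(tok.issuer)
        if key is None or not hmac.compare_digest(_mac(tok, key), tok.signature):
            return False
        if now >= tok.expires_at:
            return False
        if i > 0:
            parent = chain[i - 1]
            if tok.issuer != parent.subject:
                return False  # hop was not issued by the previous delegate
            if not (tok.actions <= parent.actions
                    and tok.resources <= parent.resources):
                return False  # a delegate cannot grant more than it holds
    leaf = chain[-1]
    return action in leaf.actions and resource in leaf.resources
```

With a human-issued token scoped to staging, a build agent's request to deploy to staging verifies, while the same chain is rejected for a production resource, after expiry, or if any hop tries to widen the scope it was given.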
What Changes When These Exist
After implementing these three capabilities, the loop happened again. Different misconfiguration, same pattern. This time:
- The registry recorded the agent's status and capabilities. The agent was registered as a "staging deployer" — not an "anything deployer."
- The gateway had a rate limit: one deployment per 10 minutes per agent. After the first deploy, it rejected every further attempt for the rest of the 10-minute window. The loop got one deployment and nothing more.
- The delegation chain proved the agent was acting on my behalf for staging deploys. The audit log showed exactly one deploy — and 47 blocked attempts.
Cost: $5.22. Time to detection: seconds (the gateway logged the rate limit violations). Fix: update the config file.
The same failure mode. Completely different outcome.
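The second incident can be replayed as a small simulation. The 10-minute window and the one-allowed, 47-blocked outcome come from the account above; the 5-second retry spacing, the `AuditEntry` fields, and the agent and action names are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

WINDOW_S = 600         # one deployment per 10 minutes per agent (from the text)
ATTEMPT_SPACING_S = 5  # assumption: how quickly the loop retried
ATTEMPTS = 48          # 1 allowed + 47 blocked, as in the audit log described


@dataclass(frozen=True)
class AuditEntry:
    t: int          # seconds since the first attempt
    agent_id: str
    action: str
    outcome: str    # "allowed" or "blocked:rate_limit"


def replay(attempts: int, spacing_s: int, window_s: int) -> list[AuditEntry]:
    """Replay a retry loop against a fixed per-agent deploy rate limit."""
    log: list[AuditEntry] = []
    last_allowed: Optional[int] = None
    for i in range(attempts):
        t = i * spacing_s
        if last_allowed is None or t - last_allowed >= window_s:
            last_allowed = t
            outcome = "allowed"
        else:
            outcome = "blocked:rate_limit"
        log.append(AuditEntry(t, "staging-deployer", "deploy:staging", outcome))
    return log


audit_log = replay(ATTEMPTS, ATTEMPT_SPACING_S, WINDOW_S)
counts = Counter(entry.outcome for entry in audit_log)
print(counts["allowed"], counts["blocked:rate_limit"])  # prints: 1 47
```

Every attempt lands in the log whether it was allowed or not, which is what makes detection a matter of seconds: the blocked entries are the alert.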
Governance Is Not Bureaucracy
I've seen the reaction to governance proposals: "This is enterprise overhead. I just want my coding agent to work."
I get it. But governance isn't about adding friction — it's about removing the wrong kind of risk. The $246 loop wasn't caused by too little prompting. It was caused by too few boundaries. The gateway didn't slow me down. It saved me $240 and 22 minutes of wasted inference.
The calibration matters:
- Sandbox agents — local coding assistants, no production access. Minimal governance: maybe a registry, no gateway needed.
- CI/CD agents — they deploy to staging, run tests, create PRs. Registry + basic gateway with rate limits.
- Production agents — they touch live data, deploy to prod, interact with customers. Full stack: registry, gateway, delegation, audit.
- Compliance agents — handling financial transactions or health data. Everything. Signed attestations per session, 7-year audit retention, approved-models-only policies.
The mistake is treating all agents the same. Calibrate to risk.
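One way to keep this calibration honest is to write it down as data and check agents against it. The sketch below is hypothetical: the tier and control names are labels invented for this example, and treating the sandbox registry as optional is one reading of "maybe a registry".

```python
from enum import Enum


class Tier(Enum):
    SANDBOX = "sandbox"
    CI_CD = "ci_cd"
    PRODUCTION = "production"
    COMPLIANCE = "compliance"


# Required controls per tier, following the list above.
REQUIRED_CONTROLS = {
    Tier.SANDBOX: frozenset(),  # assumption: registry is optional at this tier
    Tier.CI_CD: frozenset({"registry", "gateway", "rate_limits"}),
    Tier.PRODUCTION: frozenset({"registry", "gateway", "delegation", "audit"}),
    Tier.COMPLIANCE: frozenset({
        "registry", "gateway", "rate_limits", "delegation", "audit",
        "session_attestation", "audit_retention_7y", "approved_models_only",
    }),
}


def missing_controls(tier: Tier, enabled: set[str]) -> list[str]:
    """Return the controls the tier requires that the agent does not have yet."""
    return sorted(REQUIRED_CONTROLS[tier] - set(enabled))
```

For example, `missing_controls(Tier.CI_CD, {"registry"})` returns `["gateway", "rate_limits"]`, which turns "calibrate to risk" into a checklist that can fail a deployment review.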
The Stack
Identity: Who is acting? (Registry + crypto binding)
↓
Boundaries: What are they allowed to do? (Gateway/PEP + policy)
↓
Monitor: What are they doing? (Real-time logging + metrics)
↓
Validate: Did they stay within policy? (Post-hoc verification + drift detection)
↓
Accountability: Who's responsible? (Audit log + attestation)
Each layer depends on the one below it. Identity without boundaries is a guest list. Boundaries without monitoring is blind enforcement. Monitoring without validation is noise. Validation without accountability is theatre.
What This Means for You
If your agent has autonomy — meaning it can make decisions without asking you — you need governance. The level depends on what it touches. But the three capabilities (registry, gateway, delegation) are not optional for agents that act in shared environments. They're what separate "trust the agent" from "verify the agent."
The Sorcerer's Apprentice isn't a story about a bad apprentice. It's a story about giving a tool capabilities without asking what happens when it doesn't stop. The broom worked until it didn't. The axe was in the next room.
Don't wait for the invoice to know you need an axe.
Three new specifications cover these capabilities in detail:
- Agent Registry: registry schema, status model, discovery API
- AI Gateway / PEP: policy enforcement point, policy schema, enforcement modes
- Delegation Framework: authority chains, delegation tokens, scope propagation
And a new methodology module covers the full governance stack with exercises and examples: Module 11: Governance.