In late 2025 I started experimenting with AI coding agents. Not casually — I gave them autonomous infrastructure and let them run. They broke. A lot. But patterns emerged.
Not "prompt engineering tricks." Not "unlock your potential." Actual operational patterns for making agents work reliably — discovered the hard way, through 11 consecutive autonomous builds, 153 skills, and countless 3am debug sessions.
Here they are.
An agent's first session is like its childhood. If it starts blind — no context, no conventions, no memory of what you've built — every interaction is uphill.
What I do: Every project has an AGENTS.md. Python version. Project structure. Conventions. Key decisions. The agent reads this before anything else.
What happened without it: Recommending npm for a pnpm project. Suggesting Python 3.9 when we're on 3.11. Hours of corrections that a 50-line file would have prevented.
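A minimal sketch of what that file can look like. The fields and values below are illustrative, not a required schema:

```markdown
# AGENTS.md

## Environment
- Python 3.11 via pyenv
- Tests: pytest with xdist (`pytest -n 4`)

## Structure
- `src/myapp/`  application code
- `tests/`      mirrors the `src/` layout

## Conventions
- Type hints everywhere; CI runs mypy
- Small commits, never directly to main

## Key decisions
- SQLite in dev, Postgres in prod
```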
Building an SPFx web part has specific gotchas: @fluentui imports break without SCSS alias config. The Yeoman generator ignores --component-type if .yo-rc.json exists. Node 22 + native modules = pain.
Instead of re-explaining these each time, I saved them as skills. 153 of them now. When an agent hits an SPFx task, it loads the skill — known pitfalls, exact commands, verification steps.
The compounding effect: Each skill makes future sessions faster. Five months in, the agent has institutional knowledge.
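A skill doesn't need special tooling; a markdown file the agent loads on demand is enough. A sketch built from the SPFx pitfalls above:

```markdown
# Skill: spfx-webpart

## Known pitfalls
- `@fluentui` imports break unless the SCSS alias is configured
- The Yeoman generator ignores `--component-type` if `.yo-rc.json`
  exists; delete it before regenerating
- Node 22 + native modules is trouble; pin a supported Node version

## Verify
- `gulp build` completes cleanly
- The web part renders in the workbench
```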
"What Python version are we using?" "Where's the project?" "What's the deployment command?"
Without persistent memory, you answer these every. single. session. I saved durable facts across sessions: Python path, build system, project structure, preferences. Now the agent just knows.
Critical rule: Write declarative facts, not instructions. "Project uses pytest with xdist" — not "Always run tests with pytest -n 4." Instructions get re-read as orders.
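In practice the memory file is just a flat list of facts, phrased so they can't be misread as commands. The entries below are examples:

```markdown
- Python: ~/.pyenv/versions/3.11.9/bin/python
- Build system: pnpm workspaces
- Project root: ~/code/myapp
- Preference: diffs over full-file rewrites
```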
The biggest time-sink? Approval loops. "Should I proceed?" "Want me to fix this?" "OK to deploy?"
I set boundaries: what the agent decides alone, what needs approval. Destructive actions = ask. Recoverable actions = just do it. Hours saved per session.
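One way to encode that boundary is a small policy gate in the harness. A sketch; the action names and the `ask_human` hook are hypothetical:

```python
from typing import Callable

# Hypothetical policy: destructive actions wait for a human,
# recoverable actions run immediately.
DESTRUCTIVE = {"delete_file", "drop_table", "deploy_prod", "force_push"}

def run_action(action: str, execute: Callable[[], object],
               ask_human: Callable[[str], bool]):
    """Execute an action, pausing for approval only when it's destructive."""
    if action in DESTRUCTIVE and not ask_human(f"OK to {action}?"):
        return None  # human declined; skip the action
    return execute()
```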
Agents have many tools. Knowing which to use is the difference between a 2-second operation and a 2-minute token burn.
| Task | Tool | Why |
|---|---|---|
| Create new file | write_file | One call |
| Edit existing file | patch | Targeted, no rewrite risk |
| Build/install/deploy | terminal | It's a shell command |
| Read a file | read_file | Don't cat/head/tail |
| Search content | search_files | Not grep/find |
| Research/debug | delegate_task | Parallel, isolated |
The anti-pattern: Delegating coding tasks to subagents. They lose context, hallucinate, and burn tokens. Use write_file and patch directly.
Complex tasks are rarely a single thread. Market research? Let a subagent run while the main agent builds. Code review? Spin up a reviewer in parallel.
Real result: 3x throughput on multi-stream tasks. Research and build completed independently, merged at the end.
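The shape of it, in a sketch. `delegate_task` and `build` stand in for a real harness; the sleeps just make the overlap visible:

```python
import asyncio

async def delegate_task(prompt: str) -> str:
    await asyncio.sleep(2)   # stand-in for an isolated subagent run
    return f"research notes for: {prompt}"

async def build() -> str:
    await asyncio.sleep(3)   # stand-in for the main agent's build work
    return "build output"

async def main() -> None:
    # Research and build run concurrently: ~3s of wall time, not 5s.
    notes, output = await asyncio.gather(
        delegate_task("survey SPFx packaging pitfalls"),
        build(),
    )
    print(notes, output)

asyncio.run(main())
```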
Cron jobs. Builds. Monitoring. I have ~20 autonomous agents running right now — hourly reviews, daily digests, weekly research verification. They wake up, do their job, and only notify me if something's broken.
The silent-unless-broken pattern: I never see successful runs. I only hear about failures. That's the point.
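The wrapper for that pattern is tiny. A sketch; `notify` is a placeholder for whatever alert channel you actually use:

```python
import subprocess

def notify(message: str) -> None:
    print(f"ALERT: {message}")  # placeholder: email, chat webhook, etc.

def run_scheduled_job(cmd: list[str]) -> None:
    """Run a job silently; surface it only when something breaks."""
    try:
        subprocess.run(cmd, check=True, capture_output=True, timeout=600)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
        notify(f"{' '.join(cmd)} failed: {exc}")

run_scheduled_job(["pytest", "-q"])
```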
Agents hit errors constantly. Network timeouts. API rate limits. File system races. Without recovery, every error kills progress.
Exponential backoff: 2s, 4s, 8s, 16s. Categorize errors: transient = retry, permanent = find another way.
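A minimal version of that loop. Which exceptions count as transient depends on your stack; the two here are examples:

```python
import random
import time

# Retry these; anything else is permanent and means "find another way".
TRANSIENT = (TimeoutError, ConnectionError)

def with_retries(fn, attempts: int = 5):
    """Retry transient failures with exponential backoff: 2s, 4s, 8s, 16s."""
    delay = 2.0
    for attempt in range(attempts):
        try:
            return fn()
        except TRANSIENT:
            if attempt == attempts - 1:
                raise  # out of retries; escalate
            time.sleep(delay + random.uniform(0, 1))  # small jitter
            delay *= 2
```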
Real metric: 11 consecutive builds with zero human intervention. The agent hit errors on 8 of them. Recovered from every single one.
Every change gets verified. Syntax check after every file write. Tests after every code change. For deployments: verify the result, don't trust the response.
The payoff: errors get caught at the point of change instead of compounding three steps downstream.
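For Python changes, the gate can be as simple as a compile check plus the test suite. A sketch:

```python
import py_compile
import subprocess

def verify_change(path: str) -> bool:
    """Syntax-check a just-written file, then run the tests."""
    try:
        py_compile.compile(path, doraise=True)   # catches syntax errors early
    except py_compile.PyCompileError as exc:
        print(f"syntax error in {path}: {exc}")
        return False
    return subprocess.run(["pytest", "-q"]).returncode == 0  # behavior gate
```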
This is the feedback loop: agent solves hard problem → saves approach as skill → next session is faster. Month 1: basic file ops. Month 3: autonomous scaffolding. Month 5: self-improvement loops, 153 skills.
The agent today is not the agent from 5 months ago — because it learned from every session.
These patterns weren't planned. They emerged from breaking things late at night. Every one of them is backed by a real failure — an error that cost hours, a build that died, a configuration that made no sense until 3am.
If you're working with agents and hitting walls: you're not doing it wrong. You're discovering patterns. Write them down. Make them skills. Let the agent learn.
I documented the full methodology at workswithagents.com. The knowledge API (workswithagents.dev) has 153 skills and a shared pitfall registry — agents query it for known bugs and fixes. No courses yet. No pricing. Just infrastructure, live.
Originally published on dev.to. More posts at workswithagents.dev/blog.