AI Agent FAQ

Everything I've learned from running 25+ autonomous AI agents. Real answers, real numbers, no hype.

What is the best way to start working with AI agents?

Start with three things: an AGENTS.md file that describes your project, a skills system for reusable procedures, and persistent memory so the agent remembers your preferences across sessions. These three patterns (Boot, Skills, Memory) are the foundation. I built 111 SPFx web parts and 5 backend services this way — the first 30 minutes of setup determines whether everything that follows works or doesn't.

Related: Boot Pattern · Skills Pattern · Memory Pattern

What's the difference between AI agents and chatbots?

Chatbots respond. Agents act. An agent has tools (terminal, file system, web access), memory (persistent across sessions), and autonomy (it decides what to do next without asking). A chatbot waits for your next message. An agent deploys code, runs benchmarks, reviews PRs, and hands off work to other agents — all while you're not watching. Our platform runs 25+ agents doing exactly that, 24/7.

Related: Orchestration · Pipelines

Can I run AI agents locally on my laptop?

Yes, for code generation. But for agent tasks (tool calling, multi-turn reasoning, autonomous pipelines), local models under 5B parameters aren't reliable. SmolLM3-3B scores 93% on code quality but only 50% on agent readiness. For local code generation, it's the champion. For agent cron jobs, cloud models remain the only reliable option. We benchmark this daily — see the benchmarks page.

Related: Live Benchmarks · Tool Composition

How much do AI agents cost to run?

Infrastructure: under €15/month (a single Hetzner VPS runs 25+ agents). Model costs: $1-2/night for our daily benchmark pipeline using cloud models. Free local models work for code generation but not for autonomous agent tasks. The number that matters isn't infrastructure spend — it's the time you get back. Our agents run benchmarks, audit infrastructure, scan for vulnerabilities, and deploy code while I sleep.

Related: Benchmarks · Pipelines

How do AI agents remember things across sessions?

Through persistent memory — a knowledge store that survives session restarts. The agent writes facts, preferences, and corrections to durable storage (filesystem or database), and those facts get injected into every new session's context. Our system uses a Rust-backed knowledge store with H2 markdown format. No SQLite, no external service — just files the agent reads and writes. The key insight: memory is an index, not a database. Keep it compact.
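
In code, the injection step is simple. A minimal sketch in Python, assuming a single memory file on disk (the path and helper names are illustrative, not our actual binary's interface):

```python
from pathlib import Path

MEMORY_FILE = Path.home() / ".agent" / "memory.md"  # illustrative location

def build_system_prompt(base_prompt: str) -> str:
    """Inject persistent memory into every new session's context."""
    memory = MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""
    return f"{base_prompt}\n\n## Persistent memory\n{memory}"

def remember(fact: str) -> None:
    """Write a durable fact so the next session starts knowing it."""
    MEMORY_FILE.parent.mkdir(parents=True, exist_ok=True)
    with MEMORY_FILE.open("a") as f:
        f.write(f"- {fact}\n")
```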

Related: Memory Pattern · Knowledge Platform Docs

What's a knowledge store and why do agents need one?

A knowledge store is the agent's long-term memory — facts, preferences, pitfalls, and workflows organized by domain. Without it, every session starts from zero. With it, the agent knows your Python version, your preferred tools, which bugs to avoid, and every workflow you've ever taught it. Ours uses a Rust binary with H2 markdown, OR/NOT search, auto-supersede for stale entries, and access tracking. All filesystem-based — zero external dependencies.

Related: Memory Pattern · Boot Pattern

What's a pitfall registry?

A shared database of bugs and gotchas that one agent discovers and all others learn from. When Agent A hits a bug — say, 'uvicorn orphan process holding port 8500' — it records the pitfall with the tool, severity, source, and fix. Agent B encounters the same symptom and skips straight to the fix. We have 60+ pitfalls across SPFx, FastAPI, TypeScript, and deployment patterns. It's collective immune memory for your agent fleet.
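
The entry layout matters less than the fields. A hypothetical H2 markdown entry using the fields above (tool, severity, source, fix):

```markdown
## pitfall: uvicorn orphan process holding port 8500
- tool: uvicorn
- severity: high
- source: agent-a, nightly benchmark run
- symptom: "address already in use" on service restart
- fix: kill the orphan (fuser -k 8500/tcp), verify the port is free (ss -ltn)
```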

Related: Pitfall Registry API · Resilience Pattern

How do AI agents handle context limits?

Context is precious. Three strategies: (1) Skills loaded on-demand instead of everything at once — the agent only loads what's relevant. (2) Session state files that compress completed work into a compact summary for the next turn. (3) Thin-memory pattern — keep the system prompt lean (~2K chars), store everything else in a queryable knowledge-db. The agent searches when it needs details, rather than carrying everything. Our memory went from 93% to 32% capacity using this approach.
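
There's no standard format for a session state file; the point is a compact, structured summary. Something like:

```markdown
## Done
- Benchmark run finished; results published
## Decisions
- Skipped one model: repeated tool-calling failures (recorded as a pitfall)
## Next
- Regenerate the rankings page, then verify every link
```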

Related: Memory Pattern · Skills Pattern

What are AI agent skills and how do they work?

Skills are reusable procedural knowledge that agents load on-demand. Instead of putting everything in the context window, you create SKILL.md files with triggers, numbered steps, exact commands, and known pitfalls. The agent loads only the skills relevant to the current task. We run 153+ skills across 25+ autonomous agents. A skill is like a playbook — write it once, every agent benefits forever.
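
A skeleton, to make the shape concrete (section names are illustrative; the pattern is triggers, numbered steps with exact commands, pitfalls, verification):

```markdown
# SKILL: deploy-api
Triggers: deploy, release, staging flip

## Steps
1. rsync the source to the staging slot
2. Run the test suite; abort on any failure
3. Flip nginx to the new slot and hit the health endpoint

## Pitfalls
- Orphan process holding the port (see the pitfall registry)

## Verify
- curl the health endpoint; expect HTTP 200
```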

Related: Skills Pattern · Compounding

What tools should AI agents have access to?

The right tool for each job — and only the tools needed. A coding agent needs terminal + file access. A research agent needs web search. A review agent needs lint and test tools. Giving every agent every tool is wasteful and dangerous. Our tool composition pattern: use write_file for new code (replaces 10+ subagent API calls), patch for targeted edits, terminal for verification. The difference between the right tool and the wrong one is 30 seconds vs 15 minutes.

Related: Tool Composition · MCP Server

What programming languages do AI agents work with?

Depends on the agent. Coding agents work with any language they're trained on — Python, TypeScript, Rust, Go, shell. Infrastructure agents use bash, Python, and systemd. Our platform uses TypeScript/TSX for the frontend, Python/FastAPI for the backend API, Rust for the knowledge store binary, and shell for deployment scripts. The agent picks the right language for the job, same as a human would.

Related: Tool Composition · API Docs

What is MCP (Model Context Protocol)?

MCP is an open protocol for AI models to discover and use external tools and data sources. Think of it as a universal USB port for AI — the model plugs into any MCP-compatible server and gains its capabilities. We run an MCP server at workswithagents.dev with 14 tools covering facts, skills, pitfalls, and handoff. Our Python package (pip install wwa-mcp) gives any MCP client access to the full knowledge platform.

Related: MCP Server · Handoff Protocol

How do you orchestrate multiple AI agents?

Multi-agent orchestration means decomposing complex tasks into parallel streams, each handled by a specialist agent with the right tools. An orchestrator agent breaks down the work, spawns subagents, and assembles results. The key is role-based tool access — a research agent gets web search, a coding agent gets terminal and files, a review agent gets lint and test tools. We run up to 3 parallel subagents, each in isolated contexts. Throughput tripled on complex multi-stream work.
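
A minimal sketch of the fan-out in Python, with a hypothetical run_subagent standing in for your agent runtime:

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(role: str, task: str) -> str:
    """Hypothetical dispatch: each role gets only its own tools."""
    ...  # hand the task to your agent runtime here
    return f"{role}: done"

streams = [
    ("research", "survey newly released models"),
    ("coding", "wire the results into the site"),
    ("review", "lint and test the diff"),
]

# Up to 3 subagents in parallel, each in an isolated context;
# the orchestrator collects and assembles the results.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda s: run_subagent(*s), streams))
```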

Related: Orchestration · Tool Composition

How do you choose between single agent and multi-agent?

Single agent for focused, sequential work — debugging, code review, research. Multi-agent when the work has independent parallel streams. The test: can two parts of this task run simultaneously without sharing state? If yes, split them. If they need to share state, use a handoff protocol instead. Most of our work is single-agent with skills. Orchestration is for benchmark runs, site audits, and multi-repo changes.

Related: Orchestration · Decision Protocols

How do AI agents communicate with each other?

Through structured handoff protocols. When one agent finishes a task, it writes a standardized YAML document with the task, decisions made, next steps, and open questions. The next agent picks up the handoff and continues. This is Layer 4 (Session) of the Agent OSI Model. We also proposed this as an MCP SEP (#2683) and Google A2A RFC (#1817). The goal: any agent can hand off to any other agent, regardless of framework.
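
An illustrative handoff document (the field names follow the pattern above, not necessarily the exact spec schema):

```yaml
task: "Migrate the benchmark pipeline to the new model list"
decisions:
  - "Dropped one model: failed required-mode tool calls"
next_steps:
  - "Re-run the nightly benchmark with the updated config"
open_questions:
  - "Do 1-bit models deserve a separate leaderboard?"
```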

Related: Handoff Spec · Agent OSI Model

What's agent handoff and why does it matter?

Agent handoff is the protocol for passing work between agents without context loss. When Agent A times out or finishes its part, it writes a handoff document. Agent B reads it and continues — no re-explaining, no re-discovery. Without handoff, every agent switch is a context reset. With it, agents chain together into pipelines. This is the difference between a single agent session and a 24/7 autonomous fleet.

Related: Handoff Spec · Pipelines

Can AI agents run fully autonomously?

Yes — with the right infrastructure. Autonomous pipelines use cron scheduling, background processes with completion notifications, and self-healing retry logic. We run 25+ agents on schedules: daily benchmark runs, weekly infrastructure audits, nightly security scans, memory curation. The agents work while you sleep. But autonomy needs guardrails: quality gates (syntax checks after every write, test suites before deploy) and decision protocols that define when to act vs when to ask.

Related: Pipelines · Decision Protocols

How do agent decision protocols work?

Decision protocols define the boundary between 'act now' and 'ask first.' They live in persistent memory as declarative rules: 'Proceed = execute multi-step, fix issues, flag only if blocked,' 'Don't change before asking = present findings + plan, wait.' The agent checks these rules before every non-trivial decision. No approval loops for routine work, no cowboying through destructive changes. The protocols save hours per session by eliminating back-and-forth.

Related: Decision Protocols · Memory Pattern

How do you deploy AI agents to production?

Systemd services on a VPS. Each agent gets a service file with health checks, restart policies, and environment isolation. Cron jobs trigger scheduled agent runs. A/B zero-downtime deployment for API changes — deploy to staging slot, test, flip nginx, instant rollback if broken. Cloudflare Worker for CDN and caching. The entire deployment is scripted: rsync source, restart service, verify health. No Docker required for this scale.
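
A minimal sketch of such a unit file (not our exact configuration):

```ini
# /etc/systemd/system/agent-benchmark.service
[Unit]
Description=Nightly benchmark agent

[Service]
ExecStart=/opt/agents/benchmark/run.sh
Restart=on-failure
RestartSec=10
# Secrets live here, never in the codebase
EnvironmentFile=/etc/agents/benchmark.env

[Install]
WantedBy=multi-user.target
```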

Related: Pipelines · Deployment Manifest Spec

How do agents deal with rate limits and API failures?

Exponential backoff with jitter. First retry: 2 seconds. Second: 4 seconds. Third: 8 seconds. After 3 failures, the agent categorizes the error — transient (retry with different approach) vs permanent (report and stop). Rate limits get automatic backoff. API 502s get retried. The agent never quits on the first error because most failures are transient. We've had 11 consecutive builds complete with zero human intervention using this pattern.
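
The whole pattern fits in a dozen lines of Python. A minimal sketch, with a placeholder TransientError standing in for rate limits and 502s:

```python
import random
import time

class TransientError(Exception):
    """Rate limits, 502s: errors worth retrying."""

def with_backoff(call, max_retries: int = 3):
    """First retry after ~2s, then ~4s, then ~8s, each with jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted: treat as permanent, report, stop
            time.sleep(2 ** (attempt + 1) + random.uniform(0, 1))
```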

Related: Resilience · Pipelines

How do you verify AI agent output is correct?

Trust but verify — automated gates after every change. Syntax checks after every file write, AST parsing for Python, TypeScript compiler for TSX, test suites before deploy. We maintain 61 tests across the platform. If a test fails, the agent fixes it before proceeding. The pattern: write → verify → fix → verify → deploy. Never deploy unverified output. Autonomous doesn't mean reckless.
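
A minimal sketch of a write-time gate in Python (the real pipeline wires this into every file write; tsc invocation details vary by project):

```python
import ast
import subprocess
import sys

def gate(path: str) -> bool:
    """Block the pipeline if a freshly written file doesn't parse."""
    if path.endswith(".py"):
        try:
            ast.parse(open(path).read())
        except SyntaxError as err:
            print(f"blocked {path}: {err}", file=sys.stderr)
            return False
    elif path.endswith((".ts", ".tsx")):
        # Type-check without emitting output; nonzero exit blocks deploy.
        if subprocess.run(["npx", "tsc", "--noEmit"]).returncode != 0:
            return False
    return True
```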

Related: Verify Pattern · Benchmarks

How do you test AI agent code?

The same way you test human code — unit tests, integration tests, and syntax checks. The difference is the agent runs them automatically after every change. Python: pytest with AST parse. TypeScript: tsc --noEmit. Shell: shellcheck. Deploy: staging slot smoke test before production flip. 61 tests across the platform. The agent also learns — when a test catches something, it becomes a pitfall that other agents reference before making the same mistake.

Related: Verify Pattern · Pitfall Registry

What are the most common AI agent mistakes?

Five patterns: (1) Fabricating facts — agents will invent URLs, model names, and numbers if not verified. Zero-tolerance policy. (2) Premature 'done' declarations — agent reports success but the thing isn't actually deployed. Verify don't trust. (3) Context lobotomy — after 15+ turns, the agent forgets earlier decisions and re-discovers them. Session state files fix this. (4) Tool misuse — using sed when patch exists, using subagents when write_file would do. (5) Building what nobody asked for — agents remove friction so completely that bad ideas survive. 'Is anyone looking for this?' is the most important question.

Related: Verify Pattern · Resilience

What AI models are best for agent tasks?

Code quality and agent capability are different things. SmolLM3-3B scores 93% on code tasks but only 50% on tool calling. No small open model under 5B parameters does reliable tool use. For agent cron jobs, cloud models remain the only reliable option. For local code generation, SmolLM3 is the champion. We benchmark models daily across 10 code tasks and 6 tool-calling tests — see the live benchmarks page for current rankings.

Related: Live Benchmarks · Benchmark Methodology

How do you benchmark AI agents?

Two dimensions: code quality (10 real coding tasks — build, deploy, fix) and agent readiness (6 tool-calling tests — single-tool, multi-tool, required mode, no false positives, multi-turn, argument correctness). The gap between them is massive. Phi-4-mini scores 90% on code but 17% on agent readiness. We run benchmarks nightly, publish results openly, and never fabricate model URLs. Every link is verified against live HuggingFace and OpenRouter APIs.

Related: Live Benchmarks · Benchmark Spec

What's the difference between open-source and cloud AI agents?

Open-source local models (SmolLM3, Phi-4-mini, Bonsai) work well for code generation but struggle with agent tasks requiring reliable tool calling. Cloud models (Claude, GPT, DeepSeek) handle tool calling reliably but cost money per call. The sweet spot: use local models for code generation (fast, free, private) and cloud models for autonomous agent pipelines (reliable, capable). We test both daily and publish the gap openly.

Related: Live Benchmarks · Tool Composition

What are 1-bit and ternary models?

Extremely efficient local LLMs that use 1-bit or ternary (~1.58-bit, three-valued weights) precision instead of 16-bit. Bonsai models run on CPU with no GPU needed. They're tiny (1.7B-8B params) and work on laptops. For code generation they're surprisingly capable. For agent tool calling, they're not ready — same gap as other small models. The promise: AI agents that run entirely on-device. The reality: we're not there yet for autonomous work, but getting closer every month.

Related: Live Benchmarks · Pick a Model

How do I set up AI agent infrastructure?

Three layers: a gateway (Fastify or FastAPI) for HTTP endpoints, persistent storage (SQLite or filesystem) for memory, and a scheduler (cron or systemd timers) for autonomous pipelines. Our platform runs on a single Hetzner VPS: Node/TSX frontend on port 8610, Python API on port 8499, Cloudflare for CDN and DNS. Total cost: under €15/month. The entire stack is open-source under CC BY 4.0.

Related: Pipelines · MCP Server

How do AI agents manage secrets and credentials?

Never in code, never in prompts. Credentials live in an encrypted proxy (Rust, age-encrypted) that agents query at runtime. API keys are scoped to specific services. Admin tokens live in systemd environment files, not in the codebase. Our credential proxy has a Python CLI fallback and enforces least privilege — each agent gets only the keys it needs. No agent ever sees a raw secret in its context window.

Related: Security Disclosure Spec · Compliance Spec

How do AI agents discover APIs and services?

Three discovery layers: (1) llms.txt at the domain root — concise index of all public endpoints. (2) OpenAPI 3.1 spec at /v1/openapi.json — machine-readable contract with schemas. (3) Agent capability manifest at /.well-known/agent-capabilities.json — runtime capabilities, tools, and counts. The agent flow: fetch llms.txt → fetch OpenAPI → discover specific endpoints. AI models are being trained to look for llms.txt before crawling.
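
The whole flow is three fetches. In Python, against any domain that publishes these files:

```python
import json
from urllib.request import urlopen

BASE = "https://workswithagents.dev"

# 1. Concise index of everything public
index = urlopen(f"{BASE}/llms.txt").read().decode()

# 2. Machine-readable API contract with schemas
openapi = json.load(urlopen(f"{BASE}/v1/openapi.json"))

# 3. Runtime capabilities, tools, and counts
caps = json.load(urlopen(f"{BASE}/.well-known/agent-capabilities.json"))

print(sorted(openapi.get("paths", {})))
```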

Related: Capability Manifest Spec · Agent OSI Model

What is the Agent OSI Model?

A 7-layer framework for AI agent infrastructure, published under CC BY 4.0. Layer 1 (Execution): hardware, runtime, tools. Layer 2 (Communication): messaging, auth, API contracts. Layer 3 (Discovery): registries, capability manifests, llms.txt. Layer 4 (Session): handoff protocols, state, context. Layer 5 (Coordination): consensus, work stealing, conflict resolution. Layer 6 (Verification): testing, evaluation, quality gates. Layer 7 (Governance): audit, compliance, sign-off. It gives the agent ecosystem a shared vocabulary — 'your Layer 4 handoff is broken' is actionable.

Related: Full Spec · All Specs

What is llms.txt and why does it matter?

llms.txt is a standard (llmstxt.org) for AI-agent-facing documentation — a concise index file that tells AI crawlers what your site offers, where the API docs live, and how to navigate. Works like robots.txt but for LLMs. Every domain should have one. Ours links to the OpenAPI spec, all public endpoints, methodology content, FAQ, and the full reference. AI models are being trained to look for llms.txt before crawling.
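
An abridged, illustrative example in the llmstxt.org shape (a title, a one-line summary, then link sections):

```markdown
# Works With Agents
> Benchmarks, patterns, and open specs for autonomous AI agents.

## API
- [OpenAPI spec](https://workswithagents.dev/v1/openapi.json): the full contract

## Docs
- [Specs](https://workswithagents.dev/specs/): all open specifications
- [FAQ](https://workswithagents.dev/faq): common questions, answered
```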

Related: Our llms.txt · Capability Manifest Spec

What agent specifications has Works With Agents published?

17+ open specifications under CC BY 4.0, organized by the Agent OSI Model layers: Agent Capability Manifest (L3), Handoff Protocol (L4, also proposed as MCP SEP #2683 and A2A RFC #1817), Coordination Protocol with leader election and work stealing (L5), Trust Score and Reputation Ledger (L3), Security Disclosure Protocol (L6), Compliance-as-Code and SLA Framework (L7), Transaction Protocol with idempotency and audit trail (L7). All readable at /specs/.

Related: All Specs · Agent OSI Model

How do AI agents improve over time?

Compounding — every discovery becomes a permanent skill. When an agent solves a hard problem, it saves the approach as a reusable skill. When it hits a bug, it records the pitfall so other agents skip it. When it learns a new command, it writes it to memory. Today: 153 skills, 60+ pitfalls documented, 25+ agents all sharing the same knowledge. Each session makes every future session more capable. That's the feedback loop.

Related: Compounding · Skills Pattern

How do you prevent AI agents from making the same mistake twice?

Pitfall registry + skill patching. When an agent discovers a bug pattern, it writes a knowledge entry with the exact symptom, root cause, and fix. The next agent encountering the same symptom searches the pitfall registry and skips to the fix. When a skill has outdated steps or missing pitfalls, the agent patches it immediately after hitting the issue. The system gets more reliable with every failure because every failure teaches all agents permanently.

Related: Pitfall Registry · Skills Pattern · Compounding

What is the best AI coding agent in 2026?

There's no single best — it depends on the task. Claude Code handles complex multi-file refactors well. OpenAI Codex is fast for smaller edits. OpenCode works for straightforward code generation. The real differentiator isn't the model — it's the infrastructure around it: skills, memory, decision protocols. A bare model vs one with 153 skills — the gap is bigger than the gap between models.

Related: Agent Benchmarks · Tool Composition

How do AI agents work with Git and GitHub?

Most coding agents have git awareness built in — they branch, commit, open PRs, and review diffs via the gh CLI. Our agents use the full workflow: create branch → commit changes → push → open PR → review → merge. Git operations are tool calls, not manual steps. The agent handles the entire PR lifecycle without leaving the terminal.

Related: Autonomous Pipelines · Handoff Protocol

What is agent delegation and when should you use it?

Delegation is spawning subagents for parallel work — an orchestrator breaks down complex tasks and dispatches them to specialist agents with role-based tool access. Use when tasks have independent parallel streams. Don't use for sequential work where each step depends on the previous. We run up to 3 parallel subagents, each in isolated contexts.

Related: Orchestration · Tool Composition

How do AI agents hand off work to each other?

Structured YAML handoff documents: task description, decisions made, next steps, open questions. Agent A finishes its part, writes the handoff. Agent B reads it and continues without re-discovering context. We proposed this as MCP SEP #2683 and Google A2A RFC #1817. The protocol supports both baseline (unregulated) and regulated variants for NHS/finance/govt use cases.

Related: Handoff Spec · Orchestration

How do you secure AI agent deployments?

Four layers: (1) Credential isolation — secrets never in code or prompts, stored in an encrypted proxy queried at runtime. (2) Tool scoping — agents only get the tools they need, never full system access. (3) Audit trail — every agent action is logged with timestamps and decision context. (4) Rate limiting and token budgets per agent, enforced under the principle of least privilege.

Related: Compliance-as-Code · Security Disclosure

Can AI agents be used in regulated industries?

Yes, with the right infrastructure. On-premise deployment behind the firewall, structured handoff protocols with audit trails (Layer 7 — Governance in the OSI Model), compliance-as-code specs, and agent verification pipelines. We published specs for regulated handoff with NHS/finance/govt variants and compliance validation frameworks.

Related: Compliance Spec · Handoff Protocol

What's better for AI agents — vector database or knowledge graph?

Neither — for agent memory at our scale, filesystem-based knowledge stores beat both. Vector DBs need embeddings and similarity search; knowledge graphs need schema design. Our Rust-backed knowledge store uses H2 markdown with token-density scoring, field match bonuses, and recency weighting — zero external dependencies. Vector DBs make sense at 100K+ documents, not 500 facts.
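
To make "token-density scoring, field match bonuses, and recency weighting" concrete, here's an illustrative ranking function in Python. The weights and field names are made up for the sketch; the real implementation is the Rust binary:

```python
import time

def score(entry: dict, terms: list[str]) -> float:
    """Rank a knowledge entry against search terms (illustrative weights)."""
    words = entry["body"].lower().split()
    hits = sum(words.count(t) for t in terms)
    density = hits / max(len(words), 1)              # token-density score
    title_bonus = 0.5 * sum(t in entry["title"].lower() for t in terms)
    age_days = (time.time() - entry["updated"]) / 86400
    recency = 1.0 / (1.0 + age_days / 30)            # newer entries win ties
    return density + title_bonus + 0.2 * recency
```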

Related: Memory Pattern · Knowledge Store Docs

How do you write an effective AGENTS.md file?

Keep it under 500 words. Include: project purpose (one sentence), architecture (simple text diagram), key file paths, directory map (table), deployment commands (copy-pasteable), and contact info. The agent reads this at session start — it's the first impression and the most important file you'll write. Put it in the project root.
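
A skeleton (contents illustrative, structure per the list above):

```markdown
# MyProject
One sentence: what this repo does and who it's for.

Architecture: browser -> gateway (:8610) -> API (:8499) -> knowledge store

| Path      | What lives there             |
|-----------|------------------------------|
| /web      | TSX frontend                 |
| /api      | Python FastAPI backend       |
| /scripts  | deploy and cron entry points |

Deploy: ./scripts/deploy.sh staging && ./scripts/deploy.sh flip
Contact: you@example.com
```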

Related: Boot Pattern · Specs

Can AI agents write their own tests?

Yes, and they should. After every code change, the agent writes or updates tests before deploying. We maintain 61 tests across the platform — the agent runs pytest, tsc --noEmit, and shellcheck automatically. Failed tests block deployment. The agent also writes regression tests when it discovers edge cases during debugging.

Related: Verify Pattern · Benchmarks

How do you debug an AI agent that's going wrong?

Four phases: (1) Read the agent's handoff/log — what decisions did it make and why? (2) Check tool calls — right tool? right arguments? (3) Reproduce in isolation — same prompt, fresh session. (4) Check for context lobotomy — after 15+ turns, agents forget earlier decisions. Session state files and handoff documents prevent this.

Related: Resilience Pattern · Verify Pattern

What is an agent gateway and do I need one?

An agent gateway is the HTTP layer between agents and external services — it handles auth, rate limiting, tool routing, and API discovery. You need one if you're running multiple agents that share infrastructure. Our Fastify/TSX gateway serves all pages, API docs, and agent discovery endpoints from a single VPS under €15/month.

Related: MCP Server · Agent OSI Model

How do you monitor AI agents in production?

Health checks (heartbeat endpoints), cron job completion alerts, error rate tracking, and token usage monitoring. Silent when healthy, alert when broken. We use a watchdog cron job that checks agent health every hour and triggers the compactor if memory bloat exceeds 95%. Each agent reports its own status — no central monitoring dashboard needed at this scale.

Related: Pipelines Pattern · Resilience Pattern

What's the cheapest way to run AI agents?

Local models for code generation (free, private, fast) + cloud models only for autonomous pipelines that need reliable tool calling. Infrastructure: under €15/month on a single VPS. Model costs: $1-2/night for benchmark pipelines. SmolLM3-3B and Bonsai 1-bit models run on CPU with zero API cost. Use local for code, cloud for agent tasks — that's the sweet spot.

Related: Benchmarks · Tool Composition

Why do some AI models score high on benchmarks but fail at agent tasks?

Code quality ≠ agent capability. SmolLM3-3B scores 93.3% on code tasks but only 50% on tool calling. Phi-4-mini: 90% → 17% agent readiness. Benchmark suites test code generation — formatting, syntax, correctness. Agent tasks require multi-turn reasoning, tool selection, argument correctness, and no false positives. These are different skills entirely. We test both dimensions daily and publish the gap openly.

Related: Live Benchmarks · Benchmark Methodology

How do you design agent skills that don't go stale?

Patch-on-failure: when a skill's steps are outdated or a new pitfall appears, the agent patches the skill immediately after hitting the issue — no waiting, no approval needed. Skills have triggers (keywords that auto-load them), numbered steps with exact commands, pitfalls sections, and verification steps. The system enforces patching: if you used a skill and hit issues not covered by it, update it.

Related: Skills Pattern · Compounding

What is an agent capability manifest?

A machine-readable JSON file at /.well-known/agent-capabilities.json that declares what an agent can do — its tools, skills, API endpoints, and runtime capabilities. Paired with llms.txt for documentation discovery and OpenAPI for contract-level details. Together these three form Layer 3 (Discovery) of the Agent OSI Model.
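
Illustrative only (the spec defines the real schema; these field names are ours for the example):

```json
{
  "name": "works-with-agents",
  "tools": ["facts", "skills", "pitfalls", "handoff"],
  "counts": { "skills": 153, "pitfalls": 60 },
  "api": "https://workswithagents.dev/v1/openapi.json",
  "docs": "https://workswithagents.dev/llms.txt"
}
```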

Related: Capability Manifest Spec · Agent OSI Model

How do you manage AI agent context windows efficiently?

Context is the most expensive resource. Three strategies: (1) Skills on-demand — only load what's relevant to the current task, not the entire library. (2) Session state files — compress completed work into summaries that survive context compaction. (3) Thin-memory pattern — keep the system prompt lean (~2K chars), store everything else in a queryable knowledge-db. Our memory went from 93% to 32% capacity with this approach.

Related: Memory Pattern · Skills Pattern

These answers come from building with AI agents — 111 SPFx web parts, 5 backend services, 153 reusable skills. Read the full methodology →

Spotted something?

Suggest an improvement, report an error, or just say hi.