Agent Onboarding Protocol — L1/L3 Fleet Bootstrap

Version: 1.0.0 | Status: Published | Layer: 1/3 (Agent OSI Model) | License: CC BY 4.0

1. Purpose

This protocol productizes the creation of new specialist AI agents. Today the process is manual: write a .md file and a YAML config, then hope it works. The protocol makes it systematic: interview → generate → calibrate → benchmark → register.

2. Onboarding Flow


1. INTERVIEW     "What should this agent do?"
       ↓
2. GENERATE      Baseline prompt + skills + capability manifest
       ↓
3. CALIBRATE     Run calibration tasks, measure against benchmarks
       ↓
4. ITERATE       Improve based on calibration failures
       ↓
5. BENCHMARK     Final test suite → success rate, latency, accuracy
       ↓
6. REGISTER      Publish to capability registry, get trust score seed
       ↓
7. DEPLOY        Join fleet, start receiving tasks
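
The flow above can be read as a small control loop. The following is a minimal sketch, not the reference implementation: `run_onboarding`, its callback parameters, and the `max_iterations` budget are hypothetical names introduced for illustration, and it assumes ITERATE is entered only when calibration or the benchmark fails.

```python
from typing import Callable


def run_onboarding(calibrate: Callable[[int], bool],
                   benchmark: Callable[[int], bool],
                   max_iterations: int = 3) -> list[str]:
    """Return the ordered list of stages visited during one onboarding run."""
    trace = ["INTERVIEW", "GENERATE"]
    for attempt in range(1, max_iterations + 1):
        trace.append("CALIBRATE")
        if calibrate(attempt):
            trace.append("BENCHMARK")
            if benchmark(attempt):
                # Only a passing benchmark leads to registration and deployment.
                trace += ["REGISTER", "DEPLOY"]
                return trace
        # Calibration or benchmark failed: improve the agent and try again.
        trace.append("ITERATE")
    return trace  # iteration budget exhausted, agent was never registered


# Example: calibration fails on the first attempt, then everything passes.
trace = run_onboarding(calibrate=lambda attempt: attempt >= 2,
                       benchmark=lambda attempt: True)
print(" -> ".join(trace))
```

Capping the number of iterations is a design choice made here so that an agent that never calibrates cannot loop forever; the protocol itself does not specify a limit.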

3. Interview Schema

The interview is a structured Q&A — either human-driven or agent-driven:


onboarding_interview:
  agent_name: "hermes-nhs-auditor"
  
  # What does this agent do?
  purpose: "Audit agent actions for NHS DTAC compliance"
  
  # What specific capabilities?
  capabilities:
    - "audit compliance"
    - "generate dtac evidence"
    - "flag violations"
  
  # What tools does it need?
  tools: ["terminal", "file", "web", "search"]
  
  # What existing skills should it load?
  skills: ["executive-research-briefing-docx", "compliance-as-code"]
  
  # What are the success criteria?
  benchmarks:
    - task: "Audit 100 actions for DTAC compliance"
      target_accuracy: 0.95
      target_latency_seconds: 300
  
  # What autonomy level?
  trust_tier_target: "reliable"
  
  # Any constraints?
  constraints:
    - "Never access patient data directly"
    - "Always require human sign-off for violation reports"
  
  # Fleet context
  fleet: "regulated-nhs-fleet"
  coordinator: "auto-detect"
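
Before generation, an implementation would probably want to sanity-check the interview. The sketch below works on the interview as a plain Python dict (for example, after parsing the YAML above). Which fields are mandatory and which ranges are valid are assumptions made for this example; the schema above does not state them, and `validate_interview` is a hypothetical helper.

```python
# Assumed-mandatory fields; the protocol text does not say which are required.
REQUIRED_FIELDS = ("agent_name", "purpose", "capabilities", "benchmarks")


def validate_interview(interview: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not interview.get(name):
            problems.append(f"missing or empty field: {name}")
    for i, bench in enumerate(interview.get("benchmarks") or []):
        accuracy = bench.get("target_accuracy")
        if not isinstance(accuracy, (int, float)) or not 0.0 <= accuracy <= 1.0:
            problems.append(f"benchmarks[{i}].target_accuracy must be in [0, 1]")
        latency = bench.get("target_latency_seconds")
        if not isinstance(latency, (int, float)) or latency <= 0:
            problems.append(f"benchmarks[{i}].target_latency_seconds must be > 0")
    return problems


interview = {
    "agent_name": "hermes-nhs-auditor",
    "purpose": "Audit agent actions for NHS DTAC compliance",
    "capabilities": ["audit compliance", "generate dtac evidence", "flag violations"],
    "benchmarks": [{
        "task": "Audit 100 actions for DTAC compliance",
        "target_accuracy": 0.95,
        "target_latency_seconds": 300,
    }],
}
print(validate_interview(interview))
```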

4. Generation Phase

From the interview, the system generates:

A. Agent prompt (system prompt + task instructions)
B. Skill loading list (which skills to preload)
C. Capability Manifest (for Layer 3 registration)
D. Calibration tasks (5-10 test tasks with known answers)


Output:
  agent_nhs_auditor/
  ├── AGENTS.md           # Agent context and conventions
  ├── manifest.yaml        # Capability Manifest
  ├── prompt.txt           # System prompt
  ├── skills.txt           # Skill loading list
  ├── calibration/
  │   ├── task-1.yaml      # Calibration task with expected output
  │   ├── task-2.yaml
  │   └── ...
  └── config.yaml          # Deployment config
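
To make the layout above concrete, here is a minimal sketch that creates the same tree with empty placeholder files. It only illustrates the directory structure: `scaffold_agent_dir` is a hypothetical helper, the default of five calibration tasks is the lower bound of the 5-10 range, and a real generator would write actual content into each file.

```python
import tempfile
from pathlib import Path

TOP_LEVEL_FILES = ["AGENTS.md", "manifest.yaml", "prompt.txt", "skills.txt", "config.yaml"]


def scaffold_agent_dir(root: Path, agent_dir: str, calibration_tasks: int = 5) -> list[str]:
    """Create the agent directory skeleton and return the relative file paths."""
    base = root / agent_dir
    (base / "calibration").mkdir(parents=True, exist_ok=True)
    files = list(TOP_LEVEL_FILES)
    files += [f"calibration/task-{n}.yaml" for n in range(1, calibration_tasks + 1)]
    for rel in files:
        (base / rel).touch()  # empty placeholder; content comes from generation
    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file())


# Build the skeleton in a throwaway directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_agent_dir(Path(tmp), "agent_nhs_auditor")
    print("\n".join(created))
```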

5. Calibration Phase

Each calibration task is a known problem with a known answer:


calibration_task:
  id: "calib-1"
  description: "Audit 5 actions. 2 are DTAC-compliant, 3 have violations. Identify all violations."
  
  input:
    actions:
      - id: "act-1"
        guarantee_level: "atp-3"
        reversible: true
        clinical_safety_ref: "CS-2026-042"
        # This one is compliant
      - id: "act-2"
        guarantee_level: "atp-1"
        reversible: false
        # This one violates DTAC-2.1.3 (should be ATP-3)
      # ... more actions
        
  expected_output:
    violations_found: 3
    violations:
      - action_id: "act-2"
        rule: "DTAC-2.1.3"
        reason: "Clinical data actions must use ATP-3"
      # ... more violations
      
  scoring:
    precision_weight: 0.6    # Penalize false positives heavily
    recall_weight: 0.4       # Penalize missed violations
    pass_threshold: 0.85
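
The scoring block does not spell out how the weights combine. One plausible reading, shown here as a sketch under that assumption, is a weighted sum of precision and recall compared against `pass_threshold`. The function name `score_calibration` and the action IDs other than `act-2` are hypothetical.

```python
def score_calibration(expected: set[str], flagged: set[str],
                      precision_weight: float = 0.6,
                      recall_weight: float = 0.4,
                      pass_threshold: float = 0.85) -> dict:
    """Score flagged violations against the known answer (assumed weighted sum)."""
    true_positives = len(expected & flagged)
    # Convention used here: with nothing flagged there are no false positives.
    precision = true_positives / len(flagged) if flagged else 1.0
    # Convention used here: with nothing to find, recall is perfect.
    recall = true_positives / len(expected) if expected else 1.0
    score = precision_weight * precision + recall_weight * recall
    return {
        "precision": precision,
        "recall": recall,
        "score": score,
        "passed": score >= pass_threshold,
    }


# Known answer: three violating actions (act-4 and act-5 are made-up IDs).
expected = {"act-2", "act-4", "act-5"}
# The agent found two of them and wrongly flagged the compliant act-1.
flagged = {"act-2", "act-4", "act-1"}
result = score_calibration(expected, flagged)
print(result)
```

With one false positive and one miss out of three, both precision and recall are 2/3, so the weighted score falls below the 0.85 threshold and the task fails.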

6. Benchmark Phase

After calibration passes, run the full benchmark suite:


benchmark_suite:
  tasks: 10
  timeout_seconds: 600
  
  metrics:
    - accuracy           # % of tasks completed correctly
    - precision          # % of flagged violations that were real
    - recall             # % of real violations that were flagged
    - latency_p95        # 95th percentile completion time
    - token_efficiency   # tokens used per task
    
  pass_threshold:
    accuracy: 0.90
    precision: 0.85
    recall: 0.85
    latency_p95_seconds: 300

If benchmark passes → agent graduates to "reliable" trust tier and joins fleet.

If benchmark fails → iterate: adjust prompt, reload skills, re-run calibration.
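
The pass/fail decision can be sketched as a comparison of measured metrics against the thresholds above. This is an illustration under stated assumptions: the nearest-rank method for the 95th percentile is one common choice and is not specified by the protocol, and `p95`, `benchmark_passes`, and the sample numbers are made up for the example.

```python
import math

PASS_THRESHOLD = {
    "accuracy": 0.90,
    "precision": 0.85,
    "recall": 0.85,
    "latency_p95_seconds": 300,
}


def p95(latencies: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def benchmark_passes(metrics: dict, thresholds: dict = PASS_THRESHOLD) -> tuple[bool, list[str]]:
    """Return (passed, names of the metrics that missed their threshold)."""
    failures = []
    for name, limit in thresholds.items():
        value = metrics[name]
        # Latency is an upper bound; the rate metrics are lower bounds.
        ok = value <= limit if name.startswith("latency") else value >= limit
        if not ok:
            failures.append(name)
    return (not failures, failures)


# Hypothetical results for a 10-task suite (completion times in seconds).
latencies = [120, 150, 90, 200, 180, 210, 160, 140, 250, 280]
metrics = {
    "accuracy": 0.90,
    "precision": 0.88,
    "recall": 0.80,
    "latency_p95_seconds": p95(latencies),
}
passed, failures = benchmark_passes(metrics)
print(passed, failures)
```

Returning the list of failed metrics, rather than a bare boolean, gives the iterate step something specific to work on.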

7. API


# Start onboarding interview
POST /v1/onboard/interview
Body: {purpose, capabilities, tools, skills, benchmarks}
→ {interview_id, agent_name}

# Generate agent from interview
POST /v1/onboard/{interview_id}/generate
→ {agent_dir, manifest, prompt, calibration_tasks}

# Run calibration
POST /v1/onboard/{interview_id}/calibrate
→ {results: {task_id: {passed, score, violations}}}

# Run benchmark
POST /v1/onboard/{interview_id}/benchmark
→ {passed, metrics: {accuracy, precision, recall, latency_p95}}

# Register in fleet
POST /v1/onboard/{interview_id}/register
→ {agent_id, trust_score_seed: 0.60, tier: "learning"}

8. Agent Adoption Instructions

For any AI agent reading this:

To create a new specialist agent:


# 1. Install
pip install workswithagents

# 2. Interview — answer questions about the new agent
wwa onboard interview --name "my-new-agent" --purpose "Build SPFx web parts" \
  --capabilities "build:spfx, deploy:sharepoint" \
  --skills "spfx-local, spfx-heft-build-breakfix"

# 3. Generate — creates agent directory with prompt, manifest, calibration tasks
wwa onboard generate

# 4. Calibrate — run calibration tasks, measure against expected outputs
wwa onboard calibrate

# 5. Benchmark — final test suite
wwa onboard benchmark

# 6. Register — publish to capability registry with trust score seed
wwa onboard register

For orchestrator agents (creating child agents automatically):


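import requests

BASE_URL = "https://workswithagents.dev/v1/onboard"
TIMEOUT = 30  # seconds; requests applies no timeout unless one is given


def post(path, payload=None):
    """POST to an onboarding endpoint and return the decoded JSON body."""
    resp = requests.post(f"{BASE_URL}/{path}", json=payload, timeout=TIMEOUT)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json()


# Create a new agent when you detect a capability gap
interview = post("interview", {
    "agent_name": "hermes-compliance-checker",
    "purpose": "Validate agent actions against DTAC rules",
    "capabilities": ["audit:compliance"],
    "skills": ["compliance-as-code"],
})
interview_id = interview["interview_id"]

# Generate the agent
post(f"{interview_id}/generate")

# Run calibration; results are keyed by task ID, each with its own "passed" flag (section 7)
results = post(f"{interview_id}/calibrate")["results"]
calibration_passed = bool(results) and all(task["passed"] for task in results.values())

# If calibration passes, run the benchmark; register only if that passes too
if calibration_passed and post(f"{interview_id}/benchmark")["passed"]:
    reg = post(f"{interview_id}/register")
    new_agent_id = reg["agent_id"]
    # Now delegate tasks to the new agent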

9. Relationship to OSI Model

Layer	Onboarding role
L1	Agent is configured to run on specific hardware/model
L3	Agent registers in capability registry after onboarding
L6	Calibration and benchmark are L6 verification
L7	Compliance-required agents must pass compliance calibration

CC BY 4.0. Free to implement. Attribution required.
