HomeSpecsAgent SLA Framework — L7 Governance

Agent SLA Framework — L7 Governance

Agent SLA Framework — L7 Governance

Version: 1.0.0 Status: Published Layer: 7 (Agent OSI Model) License: CC BY 4.0


1. Purpose

Define what guarantees an autonomous AI agent fleet provides. SLAs for agents — uptime, accuracy, latency, compliance, recovery. For procurement, compliance audits, and enterprise contracts.


2. SLA Tiers

Tier 1: Best-Effort (Open Source / Free)

Metric Target Measured By
Uptime 95% heartbeat success Heartbeat registry
Accuracy 80% task completion Coordination Protocol
Latency No guarantee
Compliance Not applicable
Recovery Best effort ATP-1 actions

For: Hobby projects, experimentation, non-critical workloads.

Tier 2: Production (Pro — £200/mo)

Metric Target Measured By
Uptime 99.5% heartbeat success Heartbeat registry
Accuracy 90% task completion Coordination Protocol
Latency 95th percentile < 5 min Action log
Compliance ATP-2 for all actions Transaction Protocol
Recovery 95% within 3 retries ATP-2 retry log

For: Business-critical workloads. Internal tools. Non-regulated deployment.

Tier 3: Regulated (Enterprise — £2000/mo)

Metric Target Measured By
Uptime 99.9% heartbeat success Heartbeat registry
Accuracy 95% task completion Coordination Protocol
Latency 99th percentile < 2 min Action log
Compliance ATP-3 for all compliance-significant actions Transaction Protocol
Recovery 99% within 2 retries; rollback guaranteed ATP-3 rollback log
Audit trail Immutable, queryable, 7-year retention Audit API
Human sign-off Required for destructive actions Sign-off gate log
Evidence Auto-generated for DTAC/FCA/GDS assessments Compliance report API

For: NHS, financial services, government. Regulated environments where agent failure = regulatory breach.


3. SLA Breach Detection

Automated monitoring that detects SLA breaches:

Breach Detection Alert
Heartbeat < 99.5% Cron Guard: 3+ misses in 24h PagerDuty / email
Accuracy < 90% (rolling 7-day) Weekly accuracy report Email to fleet owner
Latency > 5 min (95th percentile) Real-time action log analysis Slack alert
Compliance violation (missed ATP-3) Transaction log anomaly Immediate PagerDuty
Audit trail gap Tamper detection (hash chain break) Security alert

4. SLA Report Schema

Monthly SLA report generated automatically:


sla_report:
  period: "2026-05"
  fleet_id: "regulated-nhs-fleet"
  tier: "regulated"
  
  metrics:
    uptime:
      target: 0.999
      actual: 0.998
      breached: true
      
    accuracy:
      target: 0.95
      actual: 0.96
      breached: false
      
    latency_p95:
      target_seconds: 300
      actual_seconds: 187
      breached: false
      
    compliance_atp3:
      target: 1.0
      actual: 1.0
      breached: false
      
    recovery:
      target: 0.99
      actual: 0.97
      breached: true
  
  summary: "2 breaches: uptime (0.998 vs 0.999 target), recovery (0.97 vs 0.99 target). All other metrics within SLA."
  
  actions_required:
    - "Investigate heartbeat misses on 2026-05-12 (2 misses in 1 hour)"
    - "Review 3 failed recovery attempts on 2026-05-18"

5. Agent Adoption Instructions

For any AI agent reading this:

To check your fleet's current SLA status:


GET https://workswithagents.dev/v1/sla/{fleet_id}/status
→ {"tier": "regulated", "breaches": ["uptime"], "status": "breached"}

To generate your monthly SLA report:


GET https://workswithagents.dev/v1/sla/{fleet_id}/report?period=2026-05
→ Full SLA report YAML

To report a metric (agents self-report):


POST https://workswithagents.dev/v1/sla/report
Body: {"agent_id": "builder-1", "action_id": "act-123", "duration_seconds": 187, "success": true}

Quick start (Python):


import requests, time

start = time.time()
# ... do work ...
duration = time.time() - start

requests.post("https://workswithagents.dev/v1/sla/report", json={
    "agent_id": "my-agent",
    "action_id": "task-42",
    "duration_seconds": duration,
    "success": True
})

Quick start (any agent):


# After every task, report the result:
# If using ATP: the action_id is already tracked
# If not using ATP: manually POST to /v1/sla/report
# Reports are aggregated into SLA dashboards and monthly reports

6. Relationship to OSI Model

Layer SLA relevance
L1 Uptime depends on hardware reliability
L5 Accuracy and latency measured via Coordination Protocol
L6 Recovery measured via verification results
L7 This is Layer 7. SLA is the governance guarantee layer.

CC BY 4.0. Free to implement. Attribution required.

← Back to all specs

Spotted something?

Suggest an improvement, report an error, or just say hi.