HomeSpecsAgent Transaction Protocol (ATP) — Layer 7 Governance

Agent Transaction Protocol (ATP) — Layer 7 Governance

Agent Transaction Protocol (ATP) — Layer 7 Governance

Version: 1.0.0 Status: Published Layer: 7 — Governance (Agent OSI Model) License: CC BY 4.0


1. Purpose

The Agent Transaction Protocol defines guarantees for autonomous agent actions. When an agent says "I'll deploy this," what actually happens? Did it deploy? Did it deploy only once? Can it undo? Who's accountable?

Message queues solved this for distributed systems (at-least-once, at-most-once, exactly-once delivery). This protocol solves it for agent actions — the things agents DO, not just the messages they send.


2. The Problem

Without guarantees:

Scenario What happens today What should happen
Agent deploys to production "I think it deployed" — no confirmation Confirmation with deployment ID, rollback available
Agent charges a customer No idempotency — double charge possible Idempotency key ensures single charge
Agent deletes data No undo — data is gone Soft delete with recovery window
Agent A and Agent B both act No trace of who did what Every action attributed to a specific agent and session
Compliance audit "Trust us, the agents did the right thing" Immutable audit trail of every action

3. Design Principles

  • Intent before action. Every action is logged BEFORE execution (intent) and AFTER (confirmation). If the agent crashes mid-action, we know it intended to do something.
  • Idempotency by default. Every action carries a unique idempotency key. Replaying the same key produces the same result.
  • Rollback where possible. Every action declares whether it's reversible and provides a rollback hook if so.
  • Attribution always. Every action carries agent_id, session_id, and human_approver (if required).
  • Compliance-readable. Actions are logged in a format that auditors can query without understanding agent internals.

4. Action Lifecycle


INTENDED → APPROVED (if required) → EXECUTING → COMPLETED | FAILED | ROLLED_BACK
                                                       ↓
                                                  ROLLING_BACK → ROLLED_BACK
State Meaning
INTENDED Agent declared intent. Logged. Not yet executed.
APPROVED Human or compliance gate approved. Ready to execute.
EXECUTING Action is in progress.
COMPLETED Action succeeded. Confirmation logged.
FAILED Action failed. Error logged. May be retried with same idempotency key.
ROLLING_BACK Undo in progress.
ROLLED_BACK Undo complete. System returned to pre-action state.

5. Guarantee Levels

Actions declare one of three guarantee levels:

5.1 Best-Effort (ATP-1)

"Try to do this. If it fails, log it and move on."

Use for: Non-critical actions. Research queries. Health checks. Logging.

Characteristics:

  • No idempotency required (but recommended)
  • No rollback
  • Fire-and-forget
  • Logged but not guaranteed

Example: Agent queries a search API. If it fails, it tries a different API.

5.2 At-Least-Once (ATP-2)

"This must happen. Retry until it does."

Use for: Deployments. Data writes. Notifications. Any action where doing it twice is safe (idempotent) or where NOT doing it is worse than doing it twice.

Characteristics:

  • Idempotency key REQUIRED
  • Automatic retry with exponential backoff
  • Rollback optional
  • Confirmation logged on success

Example: Agent deploys a web part to SharePoint. If the deploy fails (network timeout), retry with same idempotency key.

5.3 Exactly-Once (ATP-3)

"This must happen exactly once. If it can't, roll back everything."

Use for: Billing. Data deletion. Compliance-significant actions. Regulated industry operations.

Characteristics:

  • Idempotency key REQUIRED
  • Pre-action snapshot (state before change)
  • Rollback hook REQUIRED
  • Human approval REQUIRED (Layer 7 governance gate)
  • Immutable audit trail entry
  • SLO: 99.9% exactly-once guarantee

Example: Agent processes a payment. Before charging: snapshot balance. After charging: log transaction with idempotency key. If anything fails: rollback via refund hook.


6. Action Schema


atp_version: "1.0.0-draft"
action_id: "act-20260505-001"
idempotency_key: "deploy-project-dashboard-v3"
guarantee_level: "atp-2"        # atp-1 | atp-2 | atp-3

# Who
agent_id: "hermes-spfx-builder"
session_id: "sess-abc123"
fleet_id: "works-with-agents-fleet-1"
human_approver: null             # required for ATP-3

# What
action:
  verb: "deploy"
  target: "sharepoint-app-catalog"
  description: "Deploy SPFx web part 'ProjectDashboard' v3.2.1 to production app catalog"
  parameters:
    package_path: "/build/project-dashboard.sppkg"
    tenant: "contoso.sharepoint.com"
    site: "/sites/appcatalog"
    
# Safety
reversible: true
rollback:
  verb: "remove"
  description: "Remove deployed package from app catalog"
  parameters:
    package_id: "project-dashboard"
    tenant: "contoso.sharepoint.com"

# Timing
created_at: "2026-05-05T21:00:00Z"
deadline: "2026-05-05T21:05:00Z"  # action fails if not completed by deadline
retry_policy:
  max_retries: 3
  backoff: "exponential"          # exponential | linear | fixed
  initial_delay_seconds: 2
  max_delay_seconds: 60

# Compliance (Layer 7)
compliance:
  audit_trail_id: "audit-20260505-001"
  regulation_refs: []             # e.g., ["GDPR-Art-17", "DTAC-2.1.3"]
  data_classification: "internal" # public | internal | confidential | restricted
  sign_off_required: false
  sign_off_by: null
  evidence_required: false

# Result (populated after execution)
result:
  state: "COMPLETED"
  completed_at: "2026-05-05T21:01:15Z"
  duration_seconds: 75
  retry_count: 1
  confirmation:
    package_id: "project-dashboard-v3.2.1"
    deployed_to: "https://contoso.sharepoint.com/sites/appcatalog"
  error: null

7. Audit Trail

Every action produces an immutable audit trail entry:


audit_entry:
  audit_id: "audit-20260505-001"
  action_id: "act-20260505-001"
  timestamp: "2026-05-05T21:01:15Z"
  agent_id: "hermes-spfx-builder"
  session_id: "sess-abc123"
  verb: "deploy"
  target: "sharepoint-app-catalog"
  guarantee_level: "atp-2"
  result_state: "COMPLETED"
  duration_seconds: 75
  human_approver: null
  regulation_refs: []
  hash: "sha256:a1b2c3d4..."    # hash of entire action record for tamper detection
  previous_hash: "sha256:..."   # chain to previous entry (append-only log)

Queryable by:

  • Agent ID (what did agent X do?)
  • Time range (what happened between 2pm and 3pm?)
  • Verb (show all "deploy" actions)
  • Regulation ref (show all GDPR-relevant actions)
  • Guarantee level (show all ATP-3 actions that required human approval)

8. Integration with Other Layers

Layer How ATP is used
L1 (Execution) Actions execute on Layer 1 resources
L2 (Communication) ATP actions may trigger L2 messages (notify on complete)
L3 (Discovery) Agent identity and capability from L3
L4 (Session) Session ID links action to the handoff chain
L5 (Coordination) Coordination decisions logged as ATP actions
L6 (Verification) Test results logged as ATP actions for audit
L7 (Governance) This is Layer 7. ATP is the governance execution layer.

9. Regulated Industry Extension (ATP-R)

For NHS, financial services, and government:

Additional fields:


compliance:
  regulation_refs: ["DTAC-2.1.3", "GDPR-Art-32"]
  dpia_ref: "DPIA-2026-001"           # Data Protection Impact Assessment
  clinical_safety_ref: "CS-2026-042"   # NHS clinical safety case
  smr_accountable: "[email protected]" # Senior Managers Regime (FCA)
  gds_assessment_ref: "GDS-Alpha-2026"  # Government Digital Service
  retention_policy: "7_years"           # How long audit trail is kept
  deletion_policy: "hard_delete_30d"    # When action data is purged

10. Implementation Notes

Minimum viable implementation

  1. Intent log — simple SQLite table. Log every action BEFORE execution.
  2. Idempotency keys — UUID per action. Check before executing.
  3. Confirmation log — update intent record after execution.
  4. Query endpoint — GET /v1/audit?agent_id=X&time_from=Y

That's 200 lines of Python. Full protocol is 500 lines.

Integration with existing systems

  • Hermes agents: Add --audit flag to terminal() calls. Actions logged automatically.
  • Cron jobs: Each cron run is an ATP action. Success/failure logged.
  • Deployments: Deploy actions get ATP-2 guarantees with idempotency.
  • Regulated environments: ATP-3 for all compliance-significant actions.

11. Status & Roadmap

Component Status
Protocol spec Draft 1.0.0
Action schema Complete
Guarantee levels (ATP-1,2,3) Defined
Audit trail schema Defined
Regulated extension (ATP-R) Draft
Reference implementation Not yet built
Audit trail API Not yet built

Next: Build the audit trail as a SQLite table + query API. Integrate with Hermes terminal() tool for automatic action logging.


This document is part of the Agent OSI Model framework. See the OSI Model document for the full 7-layer architecture.

← Back to all specs

Spotted something?

Suggest an improvement, report an error, or just say hi.