Six agent protocols launched in the last year. Everyone's obsessing over model selection. The operating surface around the model is what actually breaks.
Google I/O opened today with a flood of agent demos. Prompts becoming apps. Vibe coding going production. The spectacle is real. But the thing that determines whether any of this works isn't on stage. It's the quiet protocol stack underneath — MCP, A2A, AGUI, and their contested cousins.
Most teams can tell you which LLM they're using. Almost none can answer: which tools should the agent see? Who else can it delegate to? Where does the human approve or cancel?
Those three questions are the stack. Here's what sits at each layer.
MCP is the most successful agent protocol by far. 14,000 GitHub repos tagged with it. Every major agent platform supports it. An agent connects to an MCP server, gets a list of callable tools, and can actually do work instead of just chatting.
But here's what nobody says out loud: there's no registry. No mcp search. No way for an agent to discover servers programmatically. The 14,000 number is GitHub tag-counting, not a registered directory. Smithery.ai lists about 6,700 — and you browse that with your eyes, not an API. An agent can't ask "find me an MCP server for Salesforce" and get an answer. Discovery is a person reading lists.
That's not a protocol. That's a treasure hunt.
Tool access enables arbitrary code execution and arbitrary data access. MCP was designed for high-trust environments. Now it's everywhere. Invariant Labs has published research on tool poisoning — malicious instructions hidden in tool descriptions that influence agents through the very metadata meant to make tools discoverable.
MCP gets the agent close to the work. It doesn't decide whether the agent should do the work. That's on you.
No single agent does everything. A procurement agent needs a supplier agent. A travel agent needs a hotel agent. A software agent needs a security reviewer. Work is distributed across owners, domains, and expertise.
A2A turns that distribution into something agents can reason about. The agent card is the primitive — a published contract describing what a remote agent is, what it does, which skills it exposes, and how to interact with it.
The cost: coordination adds another surface where latency, failure, permissions, and observability can break. If your agent delegates to another agent, the workflow gets more flexible and less predictable at the same time.
A2A isn't right for every product. A single agent with a small tool set may not need coordination at all. The right question: does this workflow require delegated expertise or authority outside the primary agent?
An agent that's long-running, non-deterministic, and touching external systems needs more than a final answer. Humans need to observe it working, approve sensitive steps, inspect state, understand why it's waiting.
Chatbots don't handle this. Neither do traditional web apps built for request-response.
AGUI is the open candidate for this layer: streaming, shared state, front-end tool calls, custom events, steering, cancellation. It's the protocol most teams will ignore until their agents start doing real work and generating real bugs. They'll wire a model to tools, build a nice chat component, then discover what the agent is really doing — and retroactively bolt on approval buttons, logs, and progress spinners.
None of those are fixes for the root issue: finding the right control points, understanding what the agent is trying to do, and figuring out where the human needs to approve, deny, edit, or cancel.
A2UI, AP2, and X402 all have real use cases but sit in contested territory.
A2UI is Google's answer to agent-generated interfaces — declarative UI instead of arbitrary HTML. Right direction, narrower scope than the full human control problem.
AP2 and X402 both tackle agent payments. AP2 handles commercial trust and user authorization (60+ collaborators including Mastercard, PayPal, American Express). X402 is Coinbase's HTTP-native machine-to-machine settlement. Payments is the most crowded layer because it's the most valuable. Everyone wants in.
Teams over-focus on model selection and under-specify everything around it. They know which LLM they want. They don't know which tools the agent can or should see. They have a prototype that calls APIs but no interaction model for user approval. They can imagine multiple agents coordinating but have no way to enforce or validate that.
The actual work lives in those questions. The protocol stack isn't glamorous. Neither is infrastructure. But six months from now, the teams that figured out their operating surface will be the ones whose agents still run.
The ones that just picked a model won't know what hit them.