Eveningside Labs
Notes from the build floor · May 2026
On May 6, 2026, at Code with Claude in San Francisco, Anthropic shipped multi-agent orchestration in Claude Managed Agents — alongside Dreaming and Outcomes. It went out as a public beta. No new model. No benchmark drop. Just scaffolding. And the scaffolding is the point.
What actually shipped
Multi-agent orchestration lets a lead agent break a complex job into pieces and delegate each piece to a specialist sub-agent with its own model, prompt, and tools. Sub-agents run in parallel on a shared filesystem and feed results back into the lead agent's context. The lead agent can check in on sub-agents mid-workflow. Every step is visible and replayable in the Claude Console.
If you have built anything past a single-prompt workflow, you have probably hand-rolled this pattern already — routing, shared state, failure handling, retries. Anthropic is now absorbing that layer as a first-class platform primitive.
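That hand-rolled layer is easy to picture. Here is a minimal sketch in plain Python of the fan-out, retry, and shared-state pattern, with `run_subagent` standing in for whatever model or tool call your stack actually makes. Everything below is illustrative glue code, not the Managed Agents API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_subagent(task: str) -> str:
    # Placeholder for a real model/tool call (e.g. a Claude API request).
    return f"findings for {task}"

def with_retries(task: str, attempts: int = 3) -> str:
    # Hand-rolled failure handling: retry a flaky sub-agent a few times.
    last_error = None
    for _ in range(attempts):
        try:
            return run_subagent(task)
        except Exception as exc:
            last_error = exc
    raise RuntimeError(f"{task} failed after {attempts} attempts") from last_error

def lead_agent(tasks: list[str]) -> dict[str, str]:
    # The lead fans work out in parallel and collects results into shared state.
    shared_state: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(with_retries, t): t for t in tasks}
        for future in as_completed(futures):
            shared_state[futures[future]] = future.result()
    return shared_state

results = lead_agent(["deploy logs", "error traces", "metrics", "support tickets"])
```

This is roughly the layer Anthropic is now absorbing: routing, parallelism, retries, and a shared result store, minus all the durability and audit features that come with the managed runtime.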
How Claude's multi-agent orchestration runs
[Diagram: a claude-haiku lead agent plans and delegates to four parallel sub-agents (deploy logs, error traces, metrics, support tickets) over a shared filesystem with persistent, auditable events.]
The mechanics, in plain English
A few details that matter more than the marketing copy:
- Per-agent model selection. The lead can run on Haiku for cheap orchestration while sub-agents that need depth run Opus. Every's Spiral writing tool already does exactly this — Haiku lead, Opus drafters.
- Shared filesystem, not just shared text. Sub-agents write artifacts other agents can read. Closer to how a real engineering team operates than a chat thread passing strings around.
- Persistent events. The lead agent can return to a sub-agent and ask "what did you find?" because the sub-agent's history is durable, not stuck in volatile context.
- Auditability in Claude Console. Each sub-agent's reasoning and tool calls show up step-by-step. For enterprise teams, this is the part that unblocks the legal review — not the throughput numbers.
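The shared-filesystem and persistent-events ideas can be approximated today with artifact files plus an append-only event log. A toy sketch (file names, event shapes, and the artifact content are invented for illustration):

```python
import json
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())          # stand-in for the shared filesystem
events_path = workdir / "events.jsonl"      # append-only, durable event log

def log_event(agent: str, kind: str, payload: str) -> None:
    # Every agent action lands in the log, so history survives the run.
    with events_path.open("a") as f:
        f.write(json.dumps({"agent": agent, "kind": kind, "payload": payload}) + "\n")

def history(agent: str) -> list[dict]:
    # Replaying the log answers "what did you find?" for any agent.
    with events_path.open() as f:
        return [e for line in f if (e := json.loads(line))["agent"] == agent]

# A sub-agent writes an artifact other agents can read, and logs what it did.
artifact = workdir / "error_traces.md"
artifact.write_text("## Top error clusters\n- TimeoutError in checkout\n")
log_event("error-traces", "artifact_written", str(artifact))
log_event("error-traces", "summary", "One dominant error cluster found")

# Later, the lead agent replays the sub-agent's durable history.
found = history("error-traces")
```

The Console's step-by-step audit view is essentially this log with a UI on top: because every event is written down rather than held in volatile context, both the lead agent and a human reviewer can reconstruct what happened.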
Who is actually using this
Anthropic's stage examples were specific enough to be worth pointing at:
Netflix · Platform team
A lead agent analyzes build logs across hundreds of source repositories. Sub-agents scan in parallel and surface only the patterns worth acting on.
Every · Spiral
A Haiku lead fields incoming writing requests and delegates drafting to multiple Opus sub-agents. Drafts only return to the user after passing an Outcomes rubric scored against Every's editorial standards.
Harvey · Legal AI
Reported task completion rates climbed roughly 6× after combining Dreaming with the orchestration changes. Worth flagging: that benchmark is Anthropic's customer telling Anthropic. We are watching for independent numbers.
Why this is the right thing for Anthropic to ship
The frontier model race has cooled. The competition that actually matters now is harness versus harness — what happens around the model, not inside it. Codex versus Claude Code is a more interesting fight in 2026 than GPT versus Opus.
A single agent with a single context window hits limits fast on real work. A lead-and-delegate structure does not have the same ceiling. The architectural question — how do you get useful work out of a model that runs for hours without a human watching — turns out to be the production bottleneck for almost every team trying to ship agentic AI. Multi-agent orchestration is Anthropic's answer to that question, and it is a serious one.
What we are porting at Eveningside Labs
We have spent the last several months building agentic workflows for clients across ERP, sales intelligence, and content automation — mostly Claude API plus n8n plus Supabase, with custom orchestration glue holding it together. A few patterns we are now redesigning around Managed Agents:
- Sales intelligence pipelines. A lead agent triages an inbound brief; sub-agents enrich the target company, the founder's profile, the recent funding history, and the current tech stack in parallel. The lead synthesizes a single, sourced brief.
- Content operations. One agent generates a draft; a critic agent runs it through a brand-voice rubric; a final agent ships to the CMS. Outcomes plus multi-agent orchestration in one workflow.
- ERP reconciliation. For our manufacturing-sector ERP work, longer-running cases like monthly reconciliation are a natural fit — a lead agent walks the books and dispatches sub-agents to verify line items against POs, GST records, and bank statements.
[YASH — drop a specific anecdote here from a real Eveningside client engagement. One paragraph. Numbers, names if you can share them, the messy bits. This is the part that makes the post un-replicable.]
What we are watching, with eyes open
This is a public beta and we treat it that way:
- Vendor lock-in is real. Managed Agents is a great runtime. It is also a moat. Keep your agent logic — prompts, tool definitions, evals — portable across runtimes. The safer assumption in 2026 is that the runtime layer commoditizes.
- The benchmarks are still Anthropic's. Harvey's 6× and Wisedocs' 50% are real customer numbers, but they are first-party. We will trust them more when independent third parties run the same evals.
- Costs compound. Multi-agent means more tokens, more parallel runs, more per-second compute. Model the economics per workload before you put this in front of a paying client — the cost profile of "lead Haiku + four Opus sub-agents" is not the same as "one Sonnet call".
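A quick back-of-envelope comparison makes the cost point concrete. The per-million-token prices and the token counts below are illustrative placeholders, not quoted rates; check current pricing before modeling a real workload.

```python
# Assumed (input, output) prices in $ per million tokens - illustrative only.
PRICE = {
    "haiku": (0.80, 4.00),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}

def call_cost(model: str, in_tok: int, out_tok: int) -> float:
    # Cost of one call at the assumed rates.
    p_in, p_out = PRICE[model]
    return (in_tok * p_in + out_tok * p_out) / 1_000_000

# Shape 1: one Sonnet call (8k tokens in, 2k out).
single = call_cost("sonnet", 8_000, 2_000)

# Shape 2: Haiku lead (planning + synthesis) plus four Opus sub-agents.
multi = call_cost("haiku", 12_000, 3_000) + 4 * call_cost("opus", 6_000, 2_000)
```

Under these assumptions the multi-agent shape costs well over ten times the single call, and that gap widens as sub-agents loop on tools. The point is not the exact ratio; it is that the two shapes live in different cost regimes and should be priced per workload, not per feature.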
The takeaway
Multi-agent orchestration is not a flashy launch. It is the layer that makes long-horizon agent work actually shippable. If you are building production AI workflows in 2026, this is the foundation worth designing on. If you are still hand-rolling orchestration glue, this is the week to start the migration plan.
EVENINGSIDE LABS
Building agentic systems on Claude?
We design and ship multi-agent workflows for founders and teams who do not have months to figure this out themselves. If your roadmap depends on getting Claude to do real work, talk to us.
hello@eveningsidelabs.com →
Written by Yash Bharwad, founder of Eveningside Labs. Notes from the build floor.