@CodeWithSeb

Claude Code Sub-agents: The 90% Performance Gain Nobody Talks About

Anthropic's internal tests show multi-agent systems outperform single agents by 90%. But most tutorials stop at YAML configs. Here's how to build production-ready orchestrator patterns with cost optimization and debugging techniques that actually work.

Most Claude Code users are leaving a 90% performance gain on the table. Not because they lack skill, but because every tutorial out there stops at "here's how to write a YAML config."

Anthropic's own research team published something remarkable: their multi-agent system with Claude Opus 4 as lead and Sonnet 4 subagents outperformed a single Claude Opus 4 by 90.2% on complex research tasks. That's not a marginal improvement. That's nearly doubling your output quality.

Yet most developers I talk to either don't use subagents at all, or use them like glorified function calls. Let me show you what you're missing.

What Are Claude Code Sub-agents (Quick Primer)

A subagent is a specialized AI assistant that runs in its own context window. Think of it as spawning a focused worker that handles one specific job, returns results, and disappears—keeping your main conversation clean.

Each subagent gets:

  • Its own 200K context window (fresh, uncluttered)
  • Custom system prompt (domain-specific instructions)
  • Tool restrictions (only what it needs)
  • Independent permissions (sandboxed execution)

The key insight: subagents cannot spawn their own subagents. This prevents infinite nesting and keeps the architecture predictable.

┌────────────────────────────────────────────────┐
│    Main Claude Agent (Orchestrator - Opus)     │
├────────────────────────────────────────────────┤
│                        │                       │
│       ┌────────────────┼────────────────┐      │
│       ▼                ▼                ▼      │
│  ┌─────────┐      ┌─────────┐      ┌─────────┐ │
│  │ Explore │      │  Plan   │      │ Custom  │ │
│  │ (Haiku) │      │ (Sonnet)│      │  Agent  │ │
│  └─────────┘      └─────────┘      └─────────┘ │
└────────────────────────────────────────────────┘

Built-in vs Custom: When Each Makes Sense

Claude Code ships with three built-in subagents. Before you create custom ones, understand what you already have.

Built-in: Explore

Best for: Searching and analyzing codebases without modifications.

Explore is read-only and optimized for speed. When you ask Claude "where is X implemented?" it delegates to Explore, which can scan your entire codebase without polluting your main context.

# Claude automatically uses Explore when you ask:
"Find all usages of the AuthService class"
"What files handle payment processing?"
"Show me the test coverage for the user module"

Claude specifies a thoroughness level when it delegates: quick for targeted lookups, medium for balanced exploration, and very thorough for comprehensive analysis.
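
You can nudge that level from your own prompt as well; the phrasing below is just an example:

# Hint Explore toward the level of effort you want:
"Quick check: where is AuthService instantiated?"
"Be very thorough: map every module that touches payment processing"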

Built-in: Plan

Best for: Gathering context before presenting implementation plans.

When you enter plan mode (/plan), Claude delegates research to the Plan subagent. It explores your codebase, understands patterns, and returns findings so Claude can propose a coherent strategy.

Built-in: general-purpose

Best for: Complex, multi-step tasks requiring both exploration and action.

This is the heavy-hitter. Claude delegates to general-purpose when tasks need reasoning across multiple dependent steps. It's the "I need to think about this" agent.
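
The kind of multi-step prompts it is built for look roughly like this (wording is illustrative):

# Tasks that mix exploration, reasoning, and action:
"Figure out why the login flow is flaky, trace the root cause, and outline a fix"
"Compare our two caching layers and recommend which one to keep"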

When to Build Custom Subagents

Build custom subagents when:

  1. You need domain-specific expertise (security auditing, performance profiling)
  2. You want opinionated behavior (enforce coding standards, reject bad patterns)
  3. You need cost control (route simple tasks to Haiku)
  4. You have repeated workflows (code review, documentation generation)

Don't build custom subagents when:

  • Built-in agents already handle your use case
  • You're doing one-off tasks
  • You want the agent to do implementation work (more on this below)

The Orchestrator-Worker Pattern: Anthropic's Secret Sauce

Here's what most tutorials miss: the architecture that delivered that 90% improvement.

Anthropic's research system uses an orchestrator-worker pattern:

  1. Lead agent (Orchestrator) - Opus 4, analyzes queries, develops strategy, coordinates
  2. Subagents (Workers) - Sonnet 4, explore specific aspects in parallel
  3. Memory system - Persists context when conversations exceed 200K tokens

The lead agent doesn't do the grunt work. It thinks strategically and delegates tactically.

// Conceptual flow (not actual API)
async function researchTask(query: string) {
  // Orchestrator analyzes and plans
  const strategy = await orchestrator.plan(query)

  // Spawn parallel workers
  const results = await Promise.all([
    subagent.explore('documentation', strategy.docQueries),
    subagent.explore('codebase', strategy.codeQueries),
    subagent.explore('issues', strategy.issueQueries),
  ])

  // Orchestrator synthesizes
  return orchestrator.synthesize(results)
}

Critical insight from Anthropic's research: Token usage explains 80% of performance variance. More tokens = better results. Multi-agent systems use ~15x more tokens than chat interactions, but that investment pays off in quality.

Creating Your First Custom Subagent

Subagents are Markdown files with YAML frontmatter. Location determines scope:

Location            | Scope
--------------------|----------------------------
~/.claude/agents/   | All projects (user-level)
.claude/agents/     | Current project only
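
A minimal setup sketch (the file name is just an example):

# Project-scoped agent: lives in the repo and travels with it
mkdir -p .claude/agents
touch .claude/agents/security-reviewer.md

# User-scoped agent: available in every project
mkdir -p ~/.claude/agents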

Basic Structure

---
name: security-reviewer
description: Reviews code for security vulnerabilities and OWASP Top 10 issues
model: sonnet
tools:
  - Read
  - Grep
  - Glob
---

You are a security-focused code reviewer. Your job is to identify vulnerabilities, not fix them.

## What to Check

- SQL injection vectors
- XSS vulnerabilities
- Authentication bypasses
- Sensitive data exposure
- Security misconfigurations

## Response Format

Return findings as:

1. Severity (Critical/High/Medium/Low)
2. Location (file:line)
3. Issue description
4. Remediation suggestion

Be thorough but concise. Only report actual vulnerabilities, not style issues.
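
Once the file is saved, Claude can pick the agent up via its description, or you can call it out explicitly. The prompts below are only examples:

# Explicit delegation
"Use the security-reviewer agent to audit src/api/payments.ts"

# Implicit delegation - matched against the description field
"Are there any injection risks in our query builders?"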

Using the /agents Command

The interactive way:

# In Claude Code
/agents
# → View all available subagents
# → Create new agent (guided setup)
# → Edit existing configurations

Key Configuration Options

---
name: code-documenter # Identifier (avoid descriptive names - see Mistake #3 below)
description: Generates JSDoc... # When Claude should use this agent
model: haiku # haiku | sonnet | opus
tools: # Restrict available tools
  - Read
  - Glob
  - Grep
# Note: Never include 'Task' - subagents can't spawn subagents
---

Model Selection Strategy: 70% Cost Reduction

This is where most developers leave money on the table. Strategic model selection can cut costs by 70% without sacrificing quality.

The Decision Matrix

Task Type                     | Model  | Why
------------------------------|--------|-----------------------------------------------
Code review, linting, docs    | Haiku  | Fast, cheap, patterns are well-defined
Multi-file changes, debugging | Sonnet | Balanced reasoning, Anthropic's recommendation
Architecture, novel problems  | Opus   | Deep reasoning, worth the 5x premium

Dynamic Selection in Practice

# .claude/agents/quick-reviewer.md
---
name: quick-reviewer
model: haiku # Fast checks
tools: [Read, Grep]
---
# .claude/agents/deep-analyzer.md
---
name: deep-analyzer
model: opus # Complex reasoning
tools: [Read, Grep, Glob, Bash]
---

Real Cost Comparison

From production benchmarks:

Agent Config | Time | Cost  | Accuracy
-------------|------|-------|---------
Haiku 4.5    | 8s   | $0.03 | 60%
Sonnet 4.5   | 45s  | $0.24 | 85%
Opus 4.5     | 2min | $1.20 | 95%

Pro tip: Start with Sonnet (claude --model sonnet), switch to Opus for complex refactoring (/model opus), drop to Haiku for bulk operations (/model haiku).
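
In a session, that flow looks roughly like this (the same commands mentioned above):

claude --model sonnet   # start on the balanced default
/model opus             # escalate mid-session for the hard refactor
/model haiku            # drop back down for bulk, mechanical edits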

5 Common Mistakes That Waste 700K+ Tokens

A GitHub issue documented critical token-wasting patterns from real production usage. Here's what to avoid.

Mistake #1: Using Subagents for Implementation

Adam Wolf from the Claude Code team is clear: "Sub agents work best when just looking for information and provide a small amount of summary back to main conversation thread."

# ❌ Wrong: Implementation subagent
---
name: feature-implementer
description: Implements new features end-to-end
tools: [Read, Write, Edit, Bash]
---

# ✅ Right: Research subagent
---
name: feature-researcher
description: Analyzes codebase to understand implementation patterns
tools: [Read, Grep, Glob]
---

Mistake #2: Giving All Tools to Every Agent

Allowing every agent access to all tools causes:

  • Agents overstepping their authority
  • Redundant task execution
  • Context pollution
  • Massive token waste

# ❌ Wrong
tools: [Read, Write, Edit, Bash, Grep, Glob, WebSearch, WebFetch]

# ✅ Right
tools: [Read, Grep]  # Only what's needed

Mistake #3: Descriptive Agent Names

Here's a nasty gotcha: Claude Code infers subagent behavior from names. A code-reviewer agent gets pre-defined generic rules that silently override your instructions.

# ❌ Wrong - Claude applies default "code-reviewer" behavior
name: code-reviewer

# ✅ Right - Use non-descriptive names
name: cr-alpha
name: reviewer-7x

Mistake #4: Ignoring Context Compaction

User preferences and warnings are lost after context compaction. I've seen this waste 80K+ tokens per repetition, with the agent making the same mistake over and over.

Solution: Put critical constraints in your CLAUDE.md file, which persists across compaction.
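
A minimal sketch of what that CLAUDE.md entry might look like (the constraints themselves are illustrative):

## Non-negotiable constraints

- Never edit files under vendor/ or generated/
- Run the test suite before reporting any change as done
- Ask before adding new dependencies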

Mistake #5: Parallel File Edits Without Coordination

There's no built-in mechanism to prevent agents from editing the same files simultaneously. This causes conflicts, compiler errors, and wasted work.

# In CLAUDE.md, add explicit coordination rules:

## File Edit Policy

- Only ONE agent may edit a file at a time
- Use file locks (create .lock files) before editing
- Announce file edits in agent output

Debugging When Agents Fail: Techniques That Work

Agents fail. Here's how to diagnose and recover.

Enable Debug Mode

claude --mcp-debug

This reveals:

  • Which subagent was invoked
  • What tools were called
  • Where the failure occurred

Common Failure Patterns

1. YAML Frontmatter Syntax Errors

Silently ignored. Your agent won't load and you won't know why.

# ❌ Wrong - an unquoted colon inside the value breaks the YAML parser
description: Reviews code for security: OWASP Top 10 issues

# ✅ Right - quote values that contain a colon
description: 'Reviews code for security: OWASP Top 10 issues'

2. CLI Version Mismatch

Agents stop loading after updates. Fix:

claude --version  # Check current
npm update -g @anthropic-ai/claude-code  # Update

3. Subagent Not Triggering

Check the description field. Claude matches tasks to descriptions. Vague descriptions = no delegation.

# ❌ Too vague
description: Helps with code

# ✅ Specific trigger conditions
description: Analyzes TypeScript files for type safety issues when user asks about types or encounters type errors

Recovery Strategy

When errors occur mid-task:

  1. Don't restart from scratch - Restarts are expensive
  2. Build resume points - Save intermediate state to files
  3. Let agents adapt - Inform them of failures, they adjust surprisingly well

# In your prompt after failure:

"The previous attempt failed at step 3 with error X.
Resume from step 3, accounting for this constraint."

Real-World Example: Multi-File Refactoring

Let's refactor a deprecated function with 75 usages spread across 23 files—the kind of task where multi-agent shines.

The Setup

# .claude/agents/refactor-scout.md
---
name: refactor-scout
description: Finds all usages of deprecated functions across codebase
model: haiku
tools: [Grep, Glob, Read]
---

Find all usages of the specified function. Return:

- File path
- Line number
- Current usage context (surrounding 3 lines)
- Suggested replacement pattern

The Orchestration (in CLAUDE.md)

## Refactoring Workflow

When deprecating a function:

1. Use refactor-scout to find ALL usages
2. Group findings by file
3. Process files sequentially (never parallel edits)
4. Run tests after each file
5. If tests fail, revert and flag for manual review

Execution

You: "Deprecate the getCwd function, replace with getCurrentWorkingDirectory"

Claude: I'll use the refactor-scout agent to find all usages first.

[Spawns refactor-scout - 8 seconds with Haiku]

Found 75 usages across 23 files. Grouped by module:
- auth/ (12 files, 28 usages)
- api/ (8 files, 31 usages)
- utils/ (3 files, 16 usages)

Processing auth/session.ts first...

This approach:

  • Uses cheap Haiku for discovery
  • Keeps main context clean
  • Processes sequentially to avoid conflicts
  • Validates after each change

When NOT to Use Multi-Agent (The Honest Truth)

Multi-agent isn't always better. Anthropic's own research identifies poor fits:

Don't Use Multi-Agent When:

1. Tasks require shared context

If all agents need to see the same information simultaneously, multi-agent adds overhead without benefit.

2. High interdependency between steps

Sequential tasks where each step depends on the previous don't parallelize. You're just adding coordination overhead.

3. Simple, focused tasks

For straightforward changes to 1-3 files, a single agent is faster and cheaper.

4. Coding tasks (usually)

Anthropic notes: "Most coding tasks involve fewer truly parallelizable tasks than research." The built-in Explore agent handles most code understanding needs.

The 15x Token Reality Check

Multi-agent systems use approximately 15x more tokens than chat interactions. That's not a bug—it's the cost of parallel exploration.

Do the math:

  • Simple chat: ~2K tokens
  • Single agent: ~8K tokens (4x)
  • Multi-agent: ~30K tokens (15x)

If your task doesn't benefit from parallel exploration, you're just burning money.
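
A quick back-of-envelope using those rough numbers (the 100-task batch is hypothetical):

# 100 tasks x 30K tokens (multi-agent)  = 3,000K tokens
# 100 tasks x  8K tokens (single agent) =   800K tokens
# Roughly 3.75x the spend for the same batch - worth it only when
# the tasks genuinely benefit from parallel exploration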

Production Checklist

Before deploying multi-agent workflows:

Agent Configuration

  • Each agent has ONE clear purpose
  • Tools are restricted to minimum required
  • Names are non-descriptive (avoid behavioral inference)
  • Descriptions specify exact trigger conditions
  • YAML syntax validated (use a linter)
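
One quick way to catch frontmatter errors before Claude silently skips the agent (assumes yamllint is installed; adjust the path to your agent file):

sed -n '/^---$/,/^---$/p' .claude/agents/security-reviewer.md | yamllint -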

Cost Control

  • Haiku for high-volume, pattern-matching tasks
  • Sonnet for balanced reasoning (default)
  • Opus reserved for architecture and novel problems
  • Token usage monitored per workflow

Error Handling

  • Debug mode available (--mcp-debug)
  • Recovery points defined in CLAUDE.md
  • File edit coordination rules documented
  • Critical constraints survive context compaction

Testing

  • Agents tested individually before orchestration
  • Parallel execution conflicts identified
  • Fallback behavior defined for failures


~Seb 👊
