Reference Architectures (as of 2026-05)

6 canonical Claude patterns. Hand-drawn diagrams — meant to be re-drawn on a whiteboard, not pasted into a slide. See feature-inventory.md for canonical feature status.
Legend: User / client · Claude API · Tools / MCP · Storage / data · Output / sink

1. RAG copilot

Sonnet 4.6 · Files API · prompt caching · citations · (optional MCP for live data)
User "What's our refund policy?" App / API retrieve → prompt build → Claude call Cached prefix: system + tool defs Fresh: retrieved chunks + Q Files API policies, PDFs, manuals Search index retrieval over Files Claude API Sonnet 4.6 + extended cache + citations on Answer + cites paragraph w/ source spans → rendered to user ~70% cache hit ⇒ 60-80% cost cut return to user

Cost band

Low — Sonnet + caching dominates.

Latency

1–3s typical end-to-end.

Governance

Citations mandatory. Log retrieved chunks for audit.

Time to disruption

0–3 mo · ship now.

User question hits the app, which retrieves relevant chunks from a search index over content uploaded to Files API. The static system prompt + tool definitions stay cached on Claude's side; only the retrieved context + question are fresh per request. Sonnet 4.6 returns a grounded answer with citation spans pointing back to the source documents.
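The split between cached prefix and fresh context can be sketched as a request payload, assuming the Anthropic Messages API with prompt caching; the model ID and index interface are illustrative placeholders, not canonical names.

```python
# Sketch of the per-request shape for this pattern. Marking the last
# static block with cache_control caches everything up to it (tools +
# system), so only retrieved chunks + question are billed fresh.

MODEL = "claude-sonnet-4-6"  # placeholder model ID

def build_rag_request(system_prompt, tool_defs, chunks, question):
    """Static prefix (system + tool defs) is cache-marked; retrieved
    context and the user question stay fresh per request."""
    return {
        "model": MODEL,
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": system_prompt,
             "cache_control": {"type": "ephemeral"}},
        ],
        "tools": tool_defs,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "\n\n".join(chunks)},  # fresh
                {"type": "text", "text": question},             # fresh
            ],
        }],
    }

req = build_rag_request(
    system_prompt="You answer policy questions with citations.",
    tool_defs=[],
    chunks=["Refunds are honored within 30 days of purchase."],
    question="What's our refund policy?",
)
```

The ~70% cache-hit figure above assumes the system prompt and tool definitions really are byte-stable across requests; any per-request variation in that prefix breaks the cache.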

Use when

Domain Q&A over a stable corpus. Compliance requires source attribution. Latency budget allows 1–3s.

Don't use when

Answer requires multi-step reasoning across systems (use agentic). Corpus changes per-request (rebuild as live tool use).
Operationalize with: eval-starter-pack.md — grounding + format-compliance evals are how you catch citation drift and untethered answers · mcp-starter-pack.md — internal-docs server is the live-data MCP for this pattern · governance-overlay.md §9 — what to log per request when citations are mandatory.

2. Agentic workflow

Sonnet 4.6 (Opus 4.7 escalation) · Agent SDK · MCP · Skills · memory tool · plugins for distribution
Flow: trigger (user / cron / event) → Agent SDK loop: plan/think (Sonnet 4.6 + thinking) → tool call via MCP servers (CRM · ticketing · search) → observe (parse + decide next) → repeat. Skills hold domain procedures; memory tool carries cross-session state; escalate to Opus 4.7 when stuck or high-stakes. Terminal state emits an action/report. Package commands + skills + hooks + MCP so the agent distributes as ONE plugin install.

Cost band

Mid–High. Loop fan-out matters; cap iterations.

Latency

10s–10m. Pattern is async-tolerant.

Governance

Sandbox tool runtime. Audit every tool call.

Time to disruption

3–12 mo · eval discipline first.

The Agent SDK runs the plan-act-observe loop. Sonnet 4.6 is the default planner; escalate to Opus 4.7 when the agent flags a hard step. Tools come through MCP servers (one connector per system, reused across agents). Skills hold the domain playbook so the agent has expert procedures, not just primitives. Memory tool persists state across runs for long-running agents. Distribute the whole bundle as a plugin so other teams install one thing.
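The loop above can be sketched as a minimal plan→act→observe skeleton with the iteration cap the cost note calls for. The planner and tool dispatch here are stubs standing in for model calls and MCP servers; the names are illustrative, not Agent SDK API.

```python
# Minimal plan → act → observe loop with a hard iteration cap.
# plan_step and tools are injected stubs in this sketch.

MAX_ITERATIONS = 8  # cap fan-out: loop cost scales with iterations

def run_agent(task, plan_step, tools, audit_log):
    """plan_step(task, observations) -> ("call", tool_name, args)
                                      | ("done", result)
    tools: dict of tool name -> callable (one MCP connector each)."""
    observations = []
    for _ in range(MAX_ITERATIONS):
        action = plan_step(task, observations)      # plan / think
        if action[0] == "done":
            return action[1]                        # terminal state
        _, tool, args = action
        result = tools[tool](**args)                # act (sandbox in prod)
        audit_log.append({"tool": tool, "args": args, "result": result})
        observations.append(result)                 # observe
    return {"status": "escalate"}  # cap hit: hand off (e.g. to Opus)
```

Returning an explicit escalate state when the cap is hit, rather than looping forever, is what makes the "escalate when stuck" arrow in the diagram enforceable.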

Use when

Multi-step task that decomposes naturally into tool calls. Domain has stable procedures worth encoding as Skills. Async-tolerant.

Don't use when

Task is single-shot. Latency budget is < 3s. Tools don't exist or have no API surface (use computer use instead).
Operationalize with: mcp-starter-pack.md — 7 read-only server templates populate the MCP layer in the diagram · eval-starter-pack.md — tool-call-accuracy + grounding + cost-per-task evals are non-negotiable for agentic loops · claude-code-starter-skills.md — Skills template structure (when-to-use / failure-mode / owner) is portable beyond Claude Code · governance-overlay.md §14 — sandbox tool runtime, treat tool returns as untrusted.

3. Batch enrichment

Haiku 4.5 · Batch API · prompt caching · Files API for inputs
Flow: source data (100k–10M records in DB · S3 · GCS · queue) → job builder assembles N requests sharing a cached schema prefix → Batch API submits the job (Haiku 4.5 × N; poll until done, ≤24h) → enriched records land in the output sink (DB · warehouse · index) plus an audit log (job ID + cost + counts). 50% off pricing · 24h SLA · cache the schema prefix once.

Cost band

Lowest. Haiku + batch + cache compounds.

Latency

Up to 24h SLA per job.

Governance

Sample audit on output. Track per-job cost.

Time to disruption

0–3 mo · ship now.

Built for document classification, extraction, summarization, eval sets, and scheduled enrichment. Build N requests with the same cached schema/instruction prefix; submit them as one Batch job; poll for completion. Haiku 4.5 handles bulk extraction work cheaper than any other tier. Output goes to a sink plus an audit log.
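Building the N requests can be sketched as follows, assuming the Message Batches request shape ({custom_id, params}); the model ID and schema prompt are illustrative placeholders.

```python
# Sketch of a Batch API job builder for this pattern. Every request
# shares the same cache-marked schema prefix, so the instructions are
# paid for roughly once across the whole job.

MODEL = "claude-haiku-4-5"  # placeholder model ID
SCHEMA_PROMPT = "Extract {vendor, amount, date} as JSON."  # shared prefix

def build_batch(records):
    """One request per record; identical system prefix is cache-marked
    so requests after the first read it at cached-input rates."""
    return [
        {
            "custom_id": f"rec-{rec['id']}",
            "params": {
                "model": MODEL,
                "max_tokens": 512,
                "system": [{"type": "text", "text": SCHEMA_PROMPT,
                            "cache_control": {"type": "ephemeral"}}],
                "messages": [{"role": "user", "content": rec["text"]}],
            },
        }
        for rec in records
    ]

requests = build_batch([
    {"id": 1, "text": "Invoice: Acme, $90, 2026-04-01"},
    {"id": 2, "text": "Invoice: Beta, $15, 2026-04-02"},
])
```

The list then goes to the Batch API in one submission, and the job is polled until it completes; per-record results come back keyed by custom_id, which is what ties enriched output rows back to source records in the sink.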

Use when

Latency-tolerant. High volume. Schema/instructions reused across records. Eval suites overnight.

Don't use when

Real-time response required. Per-record customization is high. Extraction needs deep reasoning (use Sonnet, not Haiku).
Operationalize with: cost-calculator.html — model the Haiku + batch + cache compounding before you commit a job size · eval-starter-pack.md — format-compliance + regression evals run cheaply on the Batch API itself · governance-overlay.md §9 + §11 — per-job audit trail and retention policy for batch outputs.

4. Domain expert assistant

Sonnet 4.6 + thinking · Skills · MCP · Files · citations · plugins for org distribution
Flow: practitioner (analyst · clinician · adviser) → plugin (single install) bundling Skills (domain procedures, style, edge cases), MCP servers (case mgmt · pricing · EHR · ledger), and Files (policies, regs, manuals) → Sonnet 4.6 + thinking (cached system + skill prefix, citations on) → recommendation with citations + audit trail. Domain knowledge lives in the Skills layer; every answer is grounded, cited, auditable.

Cost band

Mid. Cache amortizes large skill/policy prefix.

Latency

3–8s w/ thinking on hard cases.

Governance

Citations mandatory. Per-recommendation audit log.

Time to disruption

0–3 mo · ship now.

The pattern that earns the highest ROI for regulated verticals (legal, finance, clinical, compliance, claims). Skills carry the domain procedures + style + decision rules. MCP exposes the systems of record. Files holds the policy/regulatory corpus. Sonnet 4.6 with extended thinking returns cited recommendations. Distribute the whole bundle as one plugin so a new region or business unit installs it in minutes.
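The cited-recommendation request can be sketched as a payload, assuming the Messages API citations feature (document content blocks with citations enabled). The model ID, skill text, and document contents are illustrative placeholders.

```python
# Sketch of the per-request shape for a domain expert assistant:
# cached prefix = system + skill procedures; each policy document is
# a citable document block so the answer carries source spans.

MODEL = "claude-sonnet-4-6"  # placeholder model ID

def build_expert_request(skill_prefix, policy_docs, question):
    """policy_docs: list of {"title": str, "text": str}."""
    return {
        "model": MODEL,
        "max_tokens": 2048,
        "system": [{"type": "text", "text": skill_prefix,
                    "cache_control": {"type": "ephemeral"}}],
        "messages": [{
            "role": "user",
            "content": [
                *[{"type": "document",
                   "source": {"type": "text", "media_type": "text/plain",
                              "data": doc["text"]},
                   "title": doc["title"],
                   "citations": {"enabled": True}}  # source spans on
                  for doc in policy_docs],
                {"type": "text", "text": question},
            ],
        }],
    }

req = build_expert_request(
    skill_prefix="Claims adjudication procedures and decision rules.",
    policy_docs=[{"title": "Refund policy",
                  "text": "Refunds are payable within 30 days."}],
    question="Is a 45-day refund claim payable?",
)
```

Logging this request alongside the returned citation spans is what satisfies the per-recommendation audit requirement above: every recommendation can be traced to the exact document passage it leaned on.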

Use when

Vertical with proprietary procedures and high-value decisions. Citation/audit is mandatory. Multiple teams will adopt.

Don't use when

Generic Q&A — the customization stack is overkill. Decisions are low-stakes — simpler RAG copilot suffices.
Operationalize with: claude-code-starter-skills.md — Skills template shape (when-to-use / failure-mode / owner) ports directly to domain Skills · mcp-starter-pack.md — read-only servers (issue tracker, internal docs, observability, API catalog) populate the systems-of-record layer · eval-starter-pack.md — grounding + refusal-calibration evals are mandatory for high-stakes verticals · governance-overlay.md §7 + §9 — EU AI Act high-risk deployer obligations + audit log requirements.

5. Code automation

Claude Code · Opus 4.7 + Sonnet 4.6 · plugins (commands + skills + hooks + MCP) · sub-agents · computer use 2.0 (optional)
Flow: engineer in CLI / IDE → Claude Code (Sonnet 4.6 default, Opus 4.7 escalation, extended thinking) with built-in tools (edit · bash · grep · read · web fetch · git) and sub-agents (Task tool, parallel investigators). The team plugin bundles /commands, Skills, hooks, and MCP servers (repo conventions · style · CI · ticketing · docs). Headless / CI mode covers scheduled runs, cron, and review bots; computer use 2.0 (optional) covers browser debugging and UI tests. Output: commits + tests + reviews against the repo / PR. Repo-aware loop · automatic caching · teams install ONE plugin.

Cost band

Per-engineer subscription or API metered. Cache + Sonnet default keeps it predictable.

Latency

Interactive in CLI. Headless in CI matches build time.

Governance

Hooks enforce policy at the tool boundary. Keep settings.json under source control.

Time to disruption

0–3 mo · ship now.

Engineering teams use Claude Code (CLI + IDE) day-to-day. Sonnet 4.6 is the default; Opus 4.7 handles hard refactors. Sub-agents (Task tool) parallelize investigation work. The team plugin bundles slash commands, Skills (repo conventions, language patterns), hooks (PreToolUse policy enforcement), and MCP servers (issue tracker, internal docs) so a new engineer's setup is a one-line install. Headless mode runs the same loop in CI. Detail in claude-code-adoption-guide.md.
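The PreToolUse policy enforcement mentioned above can be sketched as a hook script in the shape Claude Code hooks use (proposed tool call as JSON on stdin; a blocking exit code stops the call). The blocked-command list is an illustrative example, not a recommended policy.

```python
# Sketch of a PreToolUse policy gate. decide() holds the policy;
# main() is the entry point a hook command in settings.json would run.

import json
import sys

BLOCKED_PATTERNS = ("rm -rf", "git push --force", "DROP TABLE")

def decide(tool_name, tool_input):
    """Return (allow: bool, reason: str) for one proposed tool call."""
    if tool_name == "Bash":
        cmd = tool_input.get("command", "")
        for pat in BLOCKED_PATTERNS:
            if pat in cmd:
                return False, f"blocked pattern: {pat!r}"
    return True, "ok"

def main():
    # Hook contract (assumed shape): event JSON on stdin with
    # tool_name + tool_input; exit code 2 blocks the tool call and
    # feeds the stderr reason back to the model.
    event = json.load(sys.stdin)
    allow, reason = decide(event.get("tool_name", ""),
                           event.get("tool_input", {}))
    if not allow:
        print(reason, file=sys.stderr)
        sys.exit(2)
```

Because the policy lives in a script referenced from settings.json, it version-controls with the repo, which is exactly the governance posture the pattern calls for.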

Use when

Engineering teams shipping software. Repo-aware tasks. Refactors, migrations, code review, scheduled maintenance.

Don't use when

Pattern is API-only (use Agent SDK directly). User isn't a developer.
Operationalize with: claude-code-adoption-guide.md — phased rollout, settings hierarchy, sub-agents · claude-code-starter-skills.md — 8 team-grade Skill templates (PR review, test gen, migration guard, refactor scout, etc.) · hooks-starter-pack.md — 10 hook templates with phased Phase 1→4 rollout matrix · mcp-starter-pack.md — 7 read-only server templates (issue tracker, docs, CI logs, code search) · eval-starter-pack.md — Phase 3 governance gate before plugin promotion.

6. Embedded copilot

Sonnet 4.6 (Haiku 4.5 for triage) · cached app context · MCP to host app · memory tool for personalization
Flow: user prompt in-app (/ai · ⌘K · sidebar) inside the host app (CRM · ticketing · IDE panel) → copilot panel attaches app context (user role, current record) → Claude API: Haiku 4.5 triages/classifies; simple questions answer fast, complex ones route to Sonnet 4.6 with tool use + extended thinking. Prompt cache covers system + tool defs + app schema (90% off cached input). tool_use goes to the host-app MCP server (read records · search · draft actions, gated, scoped to the current user) and the memory tool (per-user, per-tenant prefs; retention policy applies). Response renders back in the host UI (inline · sidebar · suggested action).

Cost band

Mid — Haiku triage + 90% cached input on system + app schema compounds. ~$0.001–$0.01 per interaction typical.

Latency

Sub-2s for triaged answers (Haiku). 2–6s when Sonnet handles tool use. Streaming UI hides the rest.

Governance

Memory tool retention + per-user/per-tenant isolation. App-context redaction before send. MCP server scoped to current user.

Time to disruption

3–12 mo · memory tool beta + app integration cost.

The copilot lives inside an existing app — CRM record sidebar, ticketing detail page, IDE panel — not as a standalone chat. Haiku 4.5 triages: simple lookups answer immediately; complex reasoning routes to Sonnet 4.6 with tool use. The system prompt + tool definitions + app schema are cached (90% off cached input on hit). An MCP server scoped to the host app reads the current record, searches related records, and drafts (but does not commit) actions. Memory tool persists per-user preferences across sessions. The response renders inline — never as a popup the user has to context-switch to.
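The triage decision can be sketched as a small router. The model IDs are hypothetical placeholders, and the classifier is injected as a callable here for clarity; in production it would itself be a cheap triage-tier call.

```python
# Sketch of the Haiku-triage routing decision: simple lookups stay on
# the fast/cheap tier, anything complex or tool-using escalates.

TRIAGE_MODEL = "claude-haiku-4-5"   # placeholder: fast triage tier
HEAVY_MODEL = "claude-sonnet-4-6"   # placeholder: tool-use tier

def route(question, classify, needs_tools):
    """classify(question) -> "simple" | "complex"
    needs_tools(question) -> bool (does answering touch host-app data?)"""
    if classify(question) == "simple" and not needs_tools(question):
        return {"model": TRIAGE_MODEL, "tools": False}
    return {"model": HEAVY_MODEL, "tools": True}
```

The routing split is what produces the latency profile above: sub-2s answers stay on the triage tier, and only the minority of requests that genuinely need tool use pay the 2–6s heavy-tier cost.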

Use when

Existing app with workflow context (CRM, ticketing, IDE, internal tools). Users want help inside their flow, not a separate chat tab. Personalization across sessions matters.

Don't use when

No host app yet (build the app first). Pure document Q&A (use Pattern 1 RAG). Multi-step autonomous execution (use Pattern 2 agentic).
Operationalize with: claude-code-starter-skills.md — Skills package in-app procedures (drafting, summarizing, classifying) · mcp-starter-pack.md — host-app MCP server (read-only by default, gated mutate via Phase 4) · eval-starter-pack.md — refusal-calibration + grounding evals matter most for in-app responses · governance-overlay.md — §1 data flow when prompts include app context, §11 memory tool retention.

Before you build any of these

Picking the architecture is the easy part; picking the wrong first use case is what stalls 80% of pilots. Score 2–6 candidate use cases on 5 weakest-link axes (value, time-to-signal, data readiness, risk, sponsor clarity) before committing to any pattern above.

Start here: pilot-selection-worksheet.html — Week 0 use-case scorer · adoption-playbook.md — 90-day rollout arc that wraps the pattern you pick · build-vs-buy-worksheet.html — 5-axis Claude-vs-alternatives scorer for the same use case.