Read this first. Most readers land in the README's audience map and pick a row. That works once you know which row you are. The spine is for the case where you don't — or where your decision spans multiple rows. Each branch resolves with: 1-line frame → linked artifact(s) → next decision after this one. The order matters: question 1 (anti-use) is always first. If a candidate use case fails the anti-use filter, no other branch is worth running.
Legend: You start here · Reject filter (kill or re-route) · Decision branch · Ship gate
Premise check — read before you walk the spine
Six premises that derail Claude pilots before branch 1 even runs. Each is a real belief I've heard from CIOs, CDOs, and CTOs sizing Claude adoption. The corrective is short; the linked artifact carries the full reasoning. If your team holds any of these, fix the premise before scoring use cases — wrong premise, right method, wrong answer.
For the deeper, fact-level skeptic-disarmer (context window, sandbox denyRead, caching ROI, batch, refusal, rate limits, Computer Use), see claude-misconceptions.md — same myth → reality → what-you'd-mis-decide → cite shape, ~15 entries, all primary sources.
Myth 1: "Enterprise contract = governance done."
Reality: BAA + zero-retention + data residency cover the data-flow surface. They do not cover bias testing, audit trail, refusal calibration, prompt versioning, or eval discipline. Procurement is one of seven branches — not the whole spine.
Myth 2: "Prompt cache makes long context free."
Reality: Cache TTL is 5 minutes. Idle break = full re-pay. Cached input ≈ 10% of fresh input price, not 0%. Long-context calls at scale still cost real money — and idle workflows are the silent budget-killer.
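If you want to pressure-test that claim against your own workload, here is a minimal sketch of the cache math; the prices are placeholders, so pull current numbers from anthropic.com/pricing before trusting the output.

```python
# Rough prompt-cache economics sketch. All prices are illustrative
# placeholders, not current list prices. Substitute your own.
FRESH_INPUT_PER_MTOK = 3.00    # $/million input tokens (placeholder)
CACHED_READ_PER_MTOK = 0.30    # ~10% of fresh input price
CACHE_WRITE_PER_MTOK = 3.75    # cache writes carry a premium (placeholder)

def monthly_input_cost(calls, context_tokens, hit_rate):
    """Cost of a shared long context under a given cache hit rate.

    hit_rate is the fraction of calls that land inside the 5-minute TTL;
    idle gaps longer than the TTL force a fresh cache write.
    """
    hits = calls * hit_rate
    misses = calls - hits
    cost = (hits * context_tokens / 1e6) * CACHED_READ_PER_MTOK
    cost += (misses * context_tokens / 1e6) * CACHE_WRITE_PER_MTOK
    return cost

# A bursty workflow (90% hits) vs. an idle one (20% hits): same volume,
# same 100k-token context, roughly 5x difference in monthly spend.
print(monthly_input_cost(50_000, 100_000, 0.90))
print(monthly_input_cost(50_000, 100_000, 0.20))
```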
Myth 3: "Pick the smartest model (Opus) and tune later."
Reality: Sonnet 4.6 wins ~80% of production patterns at ~5× lower cost. Opus 4.7 earns its keep on agentic + deep-reasoning paths; Haiku 4.5 wins high-volume classification and routing. Model mix beats model max — and the mix decision happens at architecture-time, not at "we'll downgrade later."
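A rough back-of-envelope of what "model mix beats model max" looks like; the prices, tiers, and traffic split below are illustrative assumptions, not list prices.

```python
# Illustrative "model mix vs. model max" comparison. Prices and the
# traffic split are placeholders; substitute your own volumes and
# current list prices before using this for budgeting.
PRICE_PER_MTOK = {"haiku": 1.00, "sonnet": 3.00, "opus": 15.00}  # input $/MTok (placeholder)

monthly_mtok = 1_000   # 1B input tokens/month across all workloads
all_opus = monthly_mtok * PRICE_PER_MTOK["opus"]

# Hypothetical mix: 50% routing/classification, 40% copilot, 10% deep reasoning.
blended_price = (0.5 * PRICE_PER_MTOK["haiku"]
                 + 0.4 * PRICE_PER_MTOK["sonnet"]
                 + 0.1 * PRICE_PER_MTOK["opus"])
mixed = monthly_mtok * blended_price

print(all_opus, mixed, round(all_opus / mixed, 1))   # ~4.7x cheaper under this split
```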
Myth 4: "We can swap Claude for OpenAI later if we need to."
Reality: True for plain chat completion. False the moment you depend on Skills, MCP, computer use, prompt-caching semantics, thinking budgets, or the Files API. Portability is an architecture-time decision, not a model-swap-time decision. If lock-in matters, score it on the build-vs-buy worksheet now.
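One way to make the lock-in surface visible at architecture time is to draw an explicit provider boundary. This is a hypothetical sketch, not a prescribed interface; the names are mine.

```python
# Hypothetical provider-boundary sketch. The point of Myth 4: plain chat
# completion sits above this line and ports easily; anything you build
# below it (Skills, MCP wiring, cache breakpoints, thinking budgets,
# Files API dependencies) is provider-specific surface you are choosing
# to depend on.
from typing import Protocol

class ChatProvider(Protocol):
    def complete(self, system: str, user: str) -> str:
        """The portable surface: plain prompt in, text out."""
        ...

# Tool schemas, cache layout, and agent orchestration live outside this
# interface and do not swap. Score that dependency on the build-vs-buy
# worksheet now, not at migration time.
```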
Myth 5: "We'll wire evals after the pilot ships."
Reality: "Evals exist but nobody runs them" is the #2 named failure mode in the adoption playbook. Without a regression eval at pilot launch, you cannot tell quality drift caused by model updates apart from drift caused by prompt changes or data changes. Wire evals before pilot launch — the cost of retrofitting them is 3–5× the cost of standing them up at Week 0.
Myth 6: "AI agents will replace headcount fast — plan accordingly."
Reality: Process-innovation patterns (RAG, batch, copilot) compress cycle time in 0–3 months. Product-innovation patterns (autonomous agents owning end-to-end workflows) are 12+ months out for most enterprises. Time-to-disruption mix matters more than aggregate "AI replaces humans" framing — and the mix is uneven across the pattern catalog.
Branch detail
1. Should we use Claude here at all?
Anti-use filter — runs before any other decision. Kills bad pilots in Week 0.
Who's asking: Risk / compliance reviewer · transformation lead in Week 0 · CTO scoping a candidate use case.
If the candidate is the sole decider on a regulated decision (credit denial, sentencing, hiring rejection), needs sub-100ms latency, must clear a sub-cent unit-cost bar, exposes a prompt-injection surface without sandboxing, or arrives without an eval baseline / rollback path / named owner — stop. Re-route, narrow scope, or kill. The reject filter is upstream of every other branch.
If you fail this filter: stop here. Don't proceed to branch 2. Document the rejection reason; come back when the blocker is gone.
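If it helps to make the filter concrete, here is the reject checklist as a sketch; the field names mirror the criteria above, and the thresholds are illustrative, not policy.

```python
# Week-0 anti-use filter as an executable checklist. Thresholds are
# illustrative; the anti-use artifact remains the source of truth.
from dataclasses import dataclass

@dataclass
class Candidate:
    sole_decider_on_regulated_decision: bool   # credit denial, sentencing, hiring rejection
    latency_budget_ms: int
    unit_cost_budget_usd: float
    untrusted_input_without_sandbox: bool
    has_eval_baseline: bool
    has_rollback_path: bool
    has_named_owner: bool

def reject_reasons(c: Candidate) -> list[str]:
    reasons = []
    if c.sole_decider_on_regulated_decision:
        reasons.append("sole decider on a regulated decision")
    if c.latency_budget_ms < 100:
        reasons.append("needs sub-100ms latency")
    if c.unit_cost_budget_usd < 0.01:
        reasons.append("sub-cent unit-cost budget")
    if c.untrusted_input_without_sandbox:
        reasons.append("prompt-injection surface without sandboxing")
    if not (c.has_eval_baseline and c.has_rollback_path and c.has_named_owner):
        reasons.append("missing eval baseline, rollback path, or named owner")
    return reasons   # empty list = proceed to branch 2; otherwise document and stop
```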
2. Which pattern fits?
Pattern + feature-stack selection — what shape is the solution?
Who's asking: Architect · platform engineer · tech lead.
Six canonical patterns cover the steady-state Claude surface: RAG copilot · agentic workflow · batch enrichment · domain expert assistant · code automation · embedded copilot. Each has a default stack (Sonnet 4.6 vs Opus 4.7 vs Haiku 4.5; caching · thinking · tool use · Files API · MCP · Skills · memory tool · batch). Pick the closest pattern; the matrix names which features earn their keep for that pattern.
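For a feel of the matrix's shape (not its contents), here is the pattern-to-default-stack lookup as a sketch; the entries are illustrative, and the pattern matrix artifact remains the source of truth.

```python
# Shape of the pattern -> default-stack matrix as a lookup table.
# Entries are illustrative placeholders, not the canonical defaults.
PATTERN_STACK = {
    "rag_copilot":      {"model": "sonnet", "features": ["caching", "tool use", "Files API"]},
    "agentic_workflow": {"model": "opus",   "features": ["tool use", "MCP", "thinking", "memory tool"]},
    "batch_enrichment": {"model": "haiku",  "features": ["batch", "caching"]},
    "domain_expert":    {"model": "sonnet", "features": ["caching", "thinking", "Skills"]},
    "code_automation":  {"model": "sonnet", "features": ["tool use", "MCP", "Skills"]},
    "embedded_copilot": {"model": "haiku",  "features": ["caching", "tool use"]},
}

def default_stack(pattern: str) -> dict:
    return PATTERN_STACK[pattern]   # KeyError = not one of the six canonical patterns
```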
After this: go to branch 3 — build vs buy to pick the procurement path for the chosen pattern.
3. Build vs buy?
Procurement path — direct API, hyperscaler, OSS, or packaged SaaS?
Who's asking: CIO / CTO sizing TCO · procurement · architect defending the choice.
Score the use case on 6 axes (regulated data · latency · customization depth · scale · internal expertise · strategic moat). Verdict: Claude direct (fastest model availability) · Claude via Bedrock or Vertex (hyperscaler procurement, model-version lag) · OpenAI · open-source self-hosted · packaged SaaS. The worksheet outputs a ranked list with TCO band — not a single answer. Anchor heuristic: build where you have a moat, buy where you don't.
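A minimal sketch of how the worksheet turns axis scores into a ranked list; the weights, scores, and TCO bands here are placeholders.

```python
# Sketch of the 6-axis build-vs-buy worksheet. Scores and TCO bands are
# placeholders; the output is a ranked list, not a single verdict.
AXES = ["regulated_data", "latency", "customization", "scale", "expertise", "moat"]

def score_option(option_name: str, axis_scores: dict[str, int], tco_band: str) -> dict:
    """axis_scores: 1-5 per axis, higher = better fit for this procurement path."""
    assert set(axis_scores) == set(AXES)
    return {"option": option_name, "score": sum(axis_scores.values()), "tco_band": tco_band}

options = [
    score_option("claude_direct",  dict(zip(AXES, [3, 4, 5, 4, 3, 5])), "$$"),
    score_option("bedrock_vertex", dict(zip(AXES, [5, 3, 4, 5, 3, 4])), "$$$"),
    score_option("packaged_saas",  dict(zip(AXES, [4, 4, 2, 3, 5, 1])), "$$"),
]
ranked = sorted(options, key=lambda o: o["score"], reverse=True)
```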
After this: go to branch 4 — cost to size the unit economics for the chosen path.
4. What will it cost?
Unit economics — does the math survive contact with real volume?
Who's asking: CIO / CFO budgeting the pilot · architect validating the cost-per-task target · transformation lead defending the business case.
Calculator inputs: monthly volume × avg input/output tokens × model mix (Haiku 4.5 / Sonnet 4.6 / Opus 4.7) × cache hit rate × batch eligibility. Outputs: monthly $, per-request cost, savings vs a naive baseline. Treat cost as a constraint, not a curiosity — define a $/task gate and a kill-switch volume cap before pilot launch. If the numbers don't survive at projected scale, branch back to 2 or revisit the wrong-economics rows of the anti-use list.
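Here is a stripped-down version of that calculator as a sketch; the prices and the cache and batch discounts are assumptions to verify against current pricing terms.

```python
# Minimal unit-economics calculator. Prices are illustrative placeholders;
# pull current numbers from anthropic.com/pricing before trusting the output.
PRICES = {  # $ per million tokens: (input, output), placeholders
    "haiku":  (1.00, 5.00),
    "sonnet": (3.00, 15.00),
    "opus":   (15.00, 75.00),
}

def monthly_cost(volume, in_tok, out_tok, model_mix, cache_hit_rate=0.0, batch_share=0.0):
    """model_mix: {"haiku": 0.5, "sonnet": 0.4, "opus": 0.1}; shares sum to 1.

    Simplifications to verify against current terms: cached input charged
    at ~10% of fresh input, batch-eligible traffic at ~50% of list price.
    """
    total = 0.0
    for model, share in model_mix.items():
        in_price, out_price = PRICES[model]
        calls = volume * share
        fresh_in = calls * in_tok * (1 - cache_hit_rate) / 1e6 * in_price
        cached_in = calls * in_tok * cache_hit_rate / 1e6 * in_price * 0.10
        out_cost = calls * out_tok / 1e6 * out_price
        total += (fresh_in + cached_in + out_cost) * (1 - 0.5 * batch_share)
    return total

cost = monthly_cost(500_000, 4_000, 800, {"haiku": 0.5, "sonnet": 0.4, "opus": 0.1},
                    cache_hit_rate=0.6, batch_share=0.3)
per_request = cost / 500_000   # compare against the $/task gate before pilot launch
```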
5. How do we ship safely?
90-day rollout with guardrails — pilot → guardrails → scale.
Who's asking: Transformation lead running the pilot · risk / compliance reviewer · COE owner.
Three sub-decisions, in order: (a) Score candidate pilots on 5 axes (value · time-to-signal · data readiness · risk · sponsor clarity); pick the strongest. (b) Run the 90-day arc — Week 0 pre-flight, Weeks 1–4 pilot, 5–8 guardrails, 9–13 scale, 8 named failure modes to watch. (c) Map governance posture — BAA / DPA / data residency / no-train / EU AI Act + NIST AI RMF mapping / audit trail / prompt versioning / vendor concentration mitigation.
Parallel: if engineering is also adopting Claude Code, run branch 6 alongside this. Don't wait for Week 13 to start the CLI rollout. Always wire branch 7 (measurement) before pilot launch, not after.
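Sub-decision (a) reduces to a small scoring exercise; a sketch, with illustrative axis weights and candidate scores.

```python
# Pick the strongest pilot by scoring the five axes. Candidates and
# scores are illustrative; use your own shortlist.
PILOT_AXES = ("value", "time_to_signal", "data_readiness", "risk", "sponsor_clarity")

def pilot_score(scores: dict[str, int]) -> int:
    """scores: 1-5 per axis; 'risk' is scored so that 5 = lowest risk."""
    return sum(scores[a] for a in PILOT_AXES)

candidates = {
    "support_triage":  dict(zip(PILOT_AXES, [4, 5, 4, 4, 5])),
    "contract_review": dict(zip(PILOT_AXES, [5, 2, 3, 2, 4])),
}
strongest = max(candidates, key=lambda name: pilot_score(candidates[name]))
```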
6. How do engineers adopt the CLI?
Engineering-team rollout for Claude Code — separate from the executive pilot track.
Who's asking: Engineering manager rolling out Claude Code · platform engineer wiring MCP · security / privacy engineer hardening the CLI surface.
Five-part rollout: pilot selection (which team, which use case) · hooks (block-secrets, PII scrub, branch guard, audit log) · skills (team-grade Skill templates, not per-engineer plugins) · MCP servers (read-only by design; mutate variants deferred to Phase 4) · evals (regression, format, tool-call accuracy, cost-per-task). Phased Phase 1→4 rollout matrix prevents the "ship 12 hooks on day 1" failure mode.
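To make the hooks row concrete, here is a hypothetical block-secrets hook body; the stdin and exit-code contract is an assumption, so check the Claude Code hooks documentation for the actual wiring before using it.

```python
#!/usr/bin/env python3
# Hypothetical block-secrets hook body. Assumes the hook runner passes the
# candidate content on stdin and treats a non-zero exit as "block"; verify
# the real contract in the Claude Code hooks docs before wiring this in.
import re
import sys

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                         # AWS access key id
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}"),
]

def main() -> int:
    text = sys.stdin.read()
    for pattern in SECRET_PATTERNS:
        if pattern.search(text):
            print(f"blocked: content matches secret pattern {pattern.pattern}", file=sys.stderr)
            return 2
    return 0

if __name__ == "__main__":
    sys.exit(main())
```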
After this: wire branch 7 — evaluation for the CLI surface (eval-trigger hook + eval-starter-pack), then converge with the executive track at "Pilot → Guardrails → Scale."
7. How do we measure?
Evaluation — the discipline that prevents silent quality regression.
Who's asking: ML / platform lead setting up CI for prompts + Skills · transformation lead defining success metrics · engineering lead hardening the CLI rollout.
Eight eval categories: regression · format compliance · tool-call accuracy · grounding · adversarial · cost-per-task · latency · refusal calibration. Each framed by what it catches / failure-mode of the eval itself / owner. Blocking-vs-advisory matrix prevents eval-bypass culture. Wire evals before pilot launch, not after — "evals exist but nobody runs them" is the #2 failure mode in the adoption playbook.
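A bare-bones regression eval (the first category) can be this small; run_prompt is a stand-in for however you invoke the model, and the cases file and threshold are illustrative.

```python
# Bare-bones regression eval. The golden-case file format and the pass
# threshold are illustrative; run_prompt is whatever callable wraps your
# model invocation.
import json

def run_regression(cases_path: str, run_prompt, pass_threshold: float = 0.95) -> bool:
    """cases: [{"input": ..., "must_contain": ...}, ...] checked on every prompt or model change."""
    with open(cases_path) as f:
        cases = json.load(f)
    passed = 0
    for case in cases:
        output = run_prompt(case["input"])
        if case["must_contain"].lower() in output.lower():
            passed += 1
    pass_rate = passed / len(cases)
    # Blocking mode: fail CI when the golden set regresses below threshold.
    return pass_rate >= pass_threshold
```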
After this: branches converge at "Pilot → Guardrails → Scale." All seven branches are required; skipping any is a named failure mode.
When to re-walk the spine
Trigger 1 — Anthropic ships a new capability that changes the cost / latency / governance math (new model tier, new BAA-by-default region, deterministic mode). Re-walk branches 2 → 4.
Trigger 2 — A regulator publishes a new framework (EU AI Act delegated act, US state law, NIST AI RMF revision). Re-walk branch 1 to confirm anti-use list is still complete; re-walk branch 5 governance overlay.
Trigger 3 — A pilot fails for a reason not in the playbook's 8 failure modes. Add the new failure mode to adoption-playbook.md; if the failure was a missed reject, add the row to anti-use-cases.md.
Trigger 4 — Monthly refresh. Scheduled remote agent runs first Monday of each month; opens a PR if drift detected. Manual fallback: run /bump-as-of; verify feature-inventory.md against docs.claude.com and anthropic.com/pricing; bump downstream artifacts on any drift.