Read this first. Most readers land in the README's audience map and pick a row. That works once you know which row you are. The spine is for the case where you don't — or where your decision spans multiple rows. Each branch resolves with: 1-line frame → linked artifact(s) → next decision after this one. The order matters: question 1 (anti-use) is always first. If a candidate use case fails the anti-use filter, no other branch is worth running.
Legend: You start here · Reject filter (kill or re-route) · Decision branch · Ship gate
Premise check — read before you walk the spine
Six premises that derail Claude pilots before branch 1 even runs. Each is a real belief I've heard from CIOs, CDOs, and CTOs sizing Claude adoption. The corrective is short; the linked artifact carries the full reasoning. If your team holds any of these, fix the premise before scoring use cases — wrong premise, right method, wrong answer.
For the deeper, fact-level skeptic-disarmer (context window, sandbox denyRead, caching ROI, batch, refusal, rate limits, Computer Use), see claude-misconceptions.md — same myth → reality → what-you'd-mis-decide → cite shape, ~15 entries, all primary sources.
Myth 1: "Enterprise contract = governance done."
Reality: BAA + zero-retention + data residency cover the data-flow surface. They do not cover bias testing, audit trail, refusal calibration, prompt versioning, or eval discipline. Procurement is one of seven branches — not the whole spine.
Myth 2: "Prompt cache makes long context free."
Reality: Cache TTL is 5 minutes. Idle break = full re-pay. Cached input ≈ 10% of fresh input price, not 0%. Long-context calls at scale still cost real money — and idle workflows are the silent budget-killer.
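If you want to pressure-test that claim against your own workload, here is a minimal sketch of the cache math; the prices are placeholders, so pull current numbers from anthropic.com/pricing before trusting the output.

```python
# Rough prompt-cache economics sketch. All prices are illustrative
# placeholders, not current list prices. Substitute your own.
FRESH_INPUT_PER_MTOK = 3.00    # $/million input tokens (placeholder)
CACHED_READ_PER_MTOK = 0.30    # ~10% of fresh input price
CACHE_WRITE_PER_MTOK = 3.75    # cache writes carry a premium (placeholder)

def monthly_input_cost(calls, context_tokens, hit_rate):
    """Cost of a shared long context under a given cache hit rate.

    hit_rate is the fraction of calls that land inside the 5-minute TTL;
    idle gaps longer than the TTL force a fresh cache write.
    """
    hits = calls * hit_rate
    misses = calls - hits
    cost = (hits * context_tokens / 1e6) * CACHED_READ_PER_MTOK
    cost += (misses * context_tokens / 1e6) * CACHE_WRITE_PER_MTOK
    return cost

# A bursty workflow (90% hits) vs. an idle one (20% hits): same volume,
# same 100k-token context, roughly 5x difference in monthly spend.
print(monthly_input_cost(50_000, 100_000, 0.90))
print(monthly_input_cost(50_000, 100_000, 0.20))
```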
Myth 3: "Pick the smartest model (Opus) and tune later."
Reality: Sonnet 4.6 wins ~80% of production patterns at ~5× lower cost. Opus 4.7 earns its keep on agentic + deep-reasoning paths; Haiku 4.5 wins high-volume classification and routing. Model mix beats model max — and the mix decision happens at architecture-time, not at "we'll downgrade later."
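A rough back-of-envelope of what "model mix beats model max" looks like; the prices, tiers, and traffic split below are illustrative assumptions, not list prices.

```python
# Illustrative "model mix vs. model max" comparison. Prices and the
# traffic split are placeholders; substitute your own volumes and
# current list prices before using this for budgeting.
PRICE_PER_MTOK = {"haiku": 1.00, "sonnet": 3.00, "opus": 15.00}  # input $/MTok (placeholder)

monthly_mtok = 1_000   # 1B input tokens/month across all workloads
all_opus = monthly_mtok * PRICE_PER_MTOK["opus"]

# Hypothetical mix: 50% routing/classification, 40% copilot, 10% deep reasoning.
blended_price = (0.5 * PRICE_PER_MTOK["haiku"]
                 + 0.4 * PRICE_PER_MTOK["sonnet"]
                 + 0.1 * PRICE_PER_MTOK["opus"])
mixed = monthly_mtok * blended_price

print(all_opus, mixed, round(all_opus / mixed, 1))   # ~4.7x cheaper under this split
```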
Myth 4: "We can swap Claude for OpenAI later if we need to."
Reality: True for plain chat completion. False the moment you depend on Skills, MCP, computer use, prompt-caching semantics, thinking budgets, or the Files API. Portability is an architecture-time decision, not a model-swap-time decision. If lock-in matters, score it on the build-vs-buy worksheet now.
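One way to make the lock-in surface visible at architecture time is to draw an explicit provider boundary. This is a hypothetical sketch, not a prescribed interface; the names are mine.

```python
# Hypothetical provider-boundary sketch. The point of Myth 4: plain chat
# completion sits above this line and ports easily; anything you build
# below it (Skills, MCP wiring, cache breakpoints, thinking budgets,
# Files API dependencies) is provider-specific surface you are choosing
# to depend on.
from typing import Protocol

class ChatProvider(Protocol):
    def complete(self, system: str, user: str) -> str:
        """The portable surface: plain prompt in, text out."""
        ...

# Tool schemas, cache layout, and agent orchestration live outside this
# interface and do not swap. Score that dependency on the build-vs-buy
# worksheet now, not at migration time.
```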
Myth 5: "We'll wire evals after the pilot ships."
Reality: "Evals exist but nobody runs them" is the #2 named failure mode in the adoption playbook. Without a regression eval at pilot launch, you cannot tell quality drift caused by model updates apart from drift caused by prompt changes or data changes. Wire evals before pilot launch — the cost of retrofitting them is 3–5× the cost of standing them up at Week 0.
Myth 6: "AI agents will replace headcount fast — plan accordingly."
Reality: Process-innovation patterns (RAG, batch, copilot) compress cycle time in 0–3 months. Product-innovation patterns (autonomous agents owning end-to-end workflows) are 12+ months out for most enterprises. Time-to-disruption mix matters more than aggregate "AI replaces humans" framing — and the mix is uneven across the pattern catalog.
Branch detail
1. Should we use Claude here at all?
Anti-use filter — runs before any other decision. Kills bad pilots in Week 0.
Who's asking: Risk / compliance reviewer · transformation lead in Week 0 · CTO scoping a candidate use case.
If the candidate is the sole decider on a regulated decision (credit denial, sentencing, hiring rejection), needs sub-100ms latency, must clear a sub-cent unit-cost bar, exposes a prompt-injection surface without sandboxing, or arrives without an eval baseline / rollback path / named owner — stop. Re-route, narrow scope, or kill. The reject filter is upstream of every other branch.
If you fail this filter: stop here. Don't proceed to branch 2. Document the rejection reason; come back when the blocker is gone.
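If it helps to make the filter concrete, here is the reject checklist as a sketch; the field names mirror the criteria above, and the thresholds are illustrative, not policy.

```python
# Week-0 anti-use filter as an executable checklist. Thresholds are
# illustrative; the anti-use artifact remains the source of truth.
from dataclasses import dataclass

@dataclass
class Candidate:
    sole_decider_on_regulated_decision: bool   # credit denial, sentencing, hiring rejection
    latency_budget_ms: int
    unit_cost_budget_usd: float
    untrusted_input_without_sandbox: bool
    has_eval_baseline: bool
    has_rollback_path: bool
    has_named_owner: bool

def reject_reasons(c: Candidate) -> list[str]:
    reasons = []
    if c.sole_decider_on_regulated_decision:
        reasons.append("sole decider on a regulated decision")
    if c.latency_budget_ms < 100:
        reasons.append("needs sub-100ms latency")
    if c.unit_cost_budget_usd < 0.01:
        reasons.append("sub-cent unit-cost budget")
    if c.untrusted_input_without_sandbox:
        reasons.append("prompt-injection surface without sandboxing")
    if not (c.has_eval_baseline and c.has_rollback_path and c.has_named_owner):
        reasons.append("missing eval baseline, rollback path, or named owner")
    return reasons   # empty list = proceed to branch 2; otherwise document and stop
```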
2. Which pattern fits?
Pattern + feature-stack selection — what shape is the solution?
Who's asking: Architect · platform engineer · tech lead.
Six canonical patterns cover the steady-state Claude surface: RAG copilot · agentic workflow · batch enrichment · domain expert assistant · code automation · embedded copilot. Each has a default stack (Sonnet 4.6 vs Opus 4.7 vs Haiku 4.5; caching · thinking · tool use · Files API · MCP · Skills · memory tool · batch). Pick the closest pattern; the matrix names which features earn their keep for that pattern.
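For a feel of the matrix's shape (not its contents), here is the pattern-to-default-stack lookup as a sketch; the entries are illustrative, and the pattern matrix artifact remains the source of truth.

```python
# Shape of the pattern -> default-stack matrix as a lookup table.
# Entries are illustrative placeholders, not the canonical defaults.
PATTERN_STACK = {
    "rag_copilot":      {"model": "sonnet", "features": ["caching", "tool use", "Files API"]},
    "agentic_workflow": {"model": "opus",   "features": ["tool use", "MCP", "thinking", "memory tool"]},
    "batch_enrichment": {"model": "haiku",  "features": ["batch", "caching"]},
    "domain_expert":    {"model": "sonnet", "features": ["caching", "thinking", "Skills"]},
    "code_automation":  {"model": "sonnet", "features": ["tool use", "MCP", "Skills"]},
    "embedded_copilot": {"model": "haiku",  "features": ["caching", "tool use"]},
}

def default_stack(pattern: str) -> dict:
    return PATTERN_STACK[pattern]   # KeyError = not one of the six canonical patterns
```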
After this: go to branch 3 — build vs buy to pick the procurement path for the chosen pattern.
3. Build vs buy?
Procurement path — direct API, hyperscaler, OSS, or packaged SaaS?
Who's asking: CIO / CTO sizing TCO · procurement · architect defending the choice.
Score the use case on 6 axes (regulated data · latency · customization depth · scale · internal expertise · strategic moat). Verdict: Claude direct (fastest model availability) · Claude via Bedrock or Vertex (hyperscaler procurement, model-version lag) · OpenAI · open-source self-hosted · packaged SaaS. The worksheet outputs a ranked list with TCO band — not a single answer. Anchor heuristic: build where you have a moat, buy where you don't.
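A minimal sketch of how the worksheet turns axis scores into a ranked list; the weights, scores, and TCO bands here are placeholders.

```python
# Sketch of the 6-axis build-vs-buy worksheet. Scores and TCO bands are
# placeholders; the output is a ranked list, not a single verdict.
AXES = ["regulated_data", "latency", "customization", "scale", "expertise", "moat"]

def score_option(option_name: str, axis_scores: dict[str, int], tco_band: str) -> dict:
    """axis_scores: 1-5 per axis, higher = better fit for this procurement path."""
    assert set(axis_scores) == set(AXES)
    return {"option": option_name, "score": sum(axis_scores.values()), "tco_band": tco_band}

options = [
    score_option("claude_direct",  dict(zip(AXES, [3, 4, 5, 4, 3, 5])), "$$"),
    score_option("bedrock_vertex", dict(zip(AXES, [5, 3, 4, 5, 3, 4])), "$$$"),
    score_option("packaged_saas",  dict(zip(AXES, [4, 4, 2, 3, 5, 1])), "$$"),
]
ranked = sorted(options, key=lambda o: o["score"], reverse=True)
```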
After this: go to branch 4 — cost to size the unit economics for the chosen path.
4. What will it cost?
Unit economics — does the math survive contact with real volume?
Who's asking: CIO / CFO budgeting the pilot · architect validating the cost-per-task target · transformation lead defending the business case.
Calculator inputs: monthly volume × avg input/output tokens × model mix (Haiku 4.5 / Sonnet 4.6 / Opus 4.7) × cache hit rate × batch eligibility. Outputs: monthly $, per-request cost, savings vs a naive baseline. Treat cost as a constraint, not a curiosity — define a $/task gate and a kill-switch volume cap before pilot launch. If the numbers don't survive at projected scale, branch back to 2 or revisit the wrong-economics rows of the anti-use list.
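Here is a stripped-down version of that calculator as a sketch; the prices and the cache and batch discounts are assumptions to verify against current pricing terms.

```python
# Minimal unit-economics calculator. Prices are illustrative placeholders;
# pull current numbers from anthropic.com/pricing before trusting the output.
PRICES = {  # $ per million tokens: (input, output), placeholders
    "haiku":  (1.00, 5.00),
    "sonnet": (3.00, 15.00),
    "opus":   (15.00, 75.00),
}

def monthly_cost(volume, in_tok, out_tok, model_mix, cache_hit_rate=0.0, batch_share=0.0):
    """model_mix: {"haiku": 0.5, "sonnet": 0.4, "opus": 0.1}; shares sum to 1.

    Simplifications to verify against current terms: cached input charged
    at ~10% of fresh input, batch-eligible traffic at ~50% of list price.
    """
    total = 0.0
    for model, share in model_mix.items():
        in_price, out_price = PRICES[model]
        calls = volume * share
        fresh_in = calls * in_tok * (1 - cache_hit_rate) / 1e6 * in_price
        cached_in = calls * in_tok * cache_hit_rate / 1e6 * in_price * 0.10
        out_cost = calls * out_tok / 1e6 * out_price
        total += (fresh_in + cached_in + out_cost) * (1 - 0.5 * batch_share)
    return total

cost = monthly_cost(500_000, 4_000, 800, {"haiku": 0.5, "sonnet": 0.4, "opus": 0.1},
                    cache_hit_rate=0.6, batch_share=0.3)
per_request = cost / 500_000   # compare against the $/task gate before pilot launch
```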
5. How do we ship safely?
90-day rollout with guardrails — pilot → guardrails → scale.
Who's asking: Transformation lead running the pilot · risk / compliance reviewer · COE owner.
Three sub-decisions, in order: (a) Score candidate pilots on 5 axes (value · time-to-signal · data readiness · risk · sponsor clarity); pick the strongest. (b) Run the 90-day arc — Week 0 pre-flight, Weeks 1–4 pilot, 5–8 guardrails, 9–13 scale, 8 named failure modes to watch. (c) Map governance posture — BAA / DPA / data residency / no-train / EU AI Act + NIST AI RMF mapping / audit trail / prompt versioning / vendor concentration mitigation.
Parallel: if engineering is also adopting Claude Code, run branch 6 alongside this. Don't wait for Week 13 to start the CLI rollout. Always wire branch 7 (measurement) before pilot launch, not after.
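Sub-decision (a) reduces to a small scoring exercise; a sketch, with illustrative axis weights and candidate scores.

```python
# Pick the strongest pilot by scoring the five axes. Candidates and
# scores are illustrative; use your own shortlist.
PILOT_AXES = ("value", "time_to_signal", "data_readiness", "risk", "sponsor_clarity")

def pilot_score(scores: dict[str, int]) -> int:
    """scores: 1-5 per axis; 'risk' is scored so that 5 = lowest risk."""
    return sum(scores[a] for a in PILOT_AXES)

candidates = {
    "support_triage":  dict(zip(PILOT_AXES, [4, 5, 4, 4, 5])),
    "contract_review": dict(zip(PILOT_AXES, [5, 2, 3, 2, 4])),
}
strongest = max(candidates, key=lambda name: pilot_score(candidates[name]))
```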
6. How do engineers adopt the CLI?
Engineering-team rollout for Claude Code — separate from the executive pilot track.
Who's asking: Engineering manager rolling out Claude Code · platform engineer wiring MCP · security / privacy engineer hardening the CLI surface.
Five-part rollout: pilot selection (which team, which use case) · hooks (block-secrets, PII scrub, branch guard, audit log) · skills (team-grade Skill templates, not per-engineer plugins) · MCP servers (read-only by design; mutate variants deferred to Phase 4) · evals (regression, format, tool-call accuracy, cost-per-task). Phased Phase 1→4 rollout matrix prevents the "ship 12 hooks on day 1" failure mode.
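To make the hooks row concrete, here is a hypothetical block-secrets hook body; the stdin and exit-code contract is an assumption, so check the Claude Code hooks documentation for the actual wiring before using it.

```python
#!/usr/bin/env python3
# Hypothetical block-secrets hook body. Assumes the hook runner passes the
# candidate content on stdin and treats a non-zero exit as "block"; verify
# the real contract in the Claude Code hooks docs before wiring this in.
import re
import sys

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                         # AWS access key id
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}"),
]

def main() -> int:
    text = sys.stdin.read()
    for pattern in SECRET_PATTERNS:
        if pattern.search(text):
            print(f"blocked: content matches secret pattern {pattern.pattern}", file=sys.stderr)
            return 2
    return 0

if __name__ == "__main__":
    sys.exit(main())
```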
After this: wire branch 7 — evaluation for the CLI surface (eval-trigger hook + eval-starter-pack), then converge with the executive track at "Pilot → Guardrails → Scale."
7. How do we measure?
Evaluation — the discipline that prevents silent quality regression.
Who's asking: ML / platform lead setting up CI for prompts + Skills · transformation lead defining success metrics · engineering lead hardening the CLI rollout.
Eight eval categories: regression · format compliance · tool-call accuracy · grounding · adversarial · cost-per-task · latency · refusal calibration. Each framed by what it catches / failure-mode of the eval itself / owner. Blocking-vs-advisory matrix prevents eval-bypass culture. Wire evals before pilot launch, not after — "evals exist but nobody runs them" is the #2 failure mode in the adoption playbook.
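A bare-bones regression eval (the first category) can be this small; run_prompt is a stand-in for however you invoke the model, and the cases file and threshold are illustrative.

```python
# Bare-bones regression eval. The golden-case file format and the pass
# threshold are illustrative; run_prompt is whatever callable wraps your
# model invocation.
import json

def run_regression(cases_path: str, run_prompt, pass_threshold: float = 0.95) -> bool:
    """cases: [{"input": ..., "must_contain": ...}, ...] checked on every prompt or model change."""
    with open(cases_path) as f:
        cases = json.load(f)
    passed = 0
    for case in cases:
        output = run_prompt(case["input"])
        if case["must_contain"].lower() in output.lower():
            passed += 1
    pass_rate = passed / len(cases)
    # Blocking mode: fail CI when the golden set regresses below threshold.
    return pass_rate >= pass_threshold
```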
After this: branches converge at "Pilot → Guardrails → Scale." All seven branches are required; skipping any is a named failure mode.
When to re-walk the spine
Trigger 1 — Anthropic ships a new capability that changes the cost / latency / governance math (new model tier, new BAA-by-default region, deterministic mode). Re-walk branches 2 → 4.
Trigger 2 — A regulator publishes a new framework (EU AI Act delegated act, US state law, NIST AI RMF revision). Re-walk branch 1 to confirm anti-use list is still complete; re-walk branch 5 governance overlay.
Trigger 3 — A pilot fails for a reason not in the playbook's 8 failure modes. Add the new failure mode to adoption-playbook.md; if the failure was a missed reject, add the row to anti-use-cases.md.
Trigger 4 — Monthly refresh. Scheduled remote agent runs first Monday of each month; opens a PR if drift detected. Manual fallback: run /bump-as-of; verify feature-inventory.md against docs.claude.com and anthropic.com/pricing; bump downstream artifacts on any drift.