Boris Cherny Thinking Partner

A deep knowledge graph for channeling Cherny's framework as an engineering critic

5 core theses · 5 axioms · 3 lineages · 4 hidden moves · 5 chain crossings · 3 HIGH severity challenges · 5 predictions · 2019-2026

The 5 Core Theses (what Cherny argues)

These are the claims that generate Cherny's entire framework. Each emerged from practice at Meta and Anthropic, not from theory. Together they describe a new engineering paradigm: the programmer writes specifications, not programs.

Thesis 1 — Foundation

Context Is the Primary Programming Surface

Claude Code CLAUDE.md, STACK framework talk

The locus of engineering shifted from artifacts (code) to instructions (context). You don't program the code — you program the context file that tells the agent how to write the code. CLAUDE.md is the persistent, cumulative instruction layer. The programmer's product is no longer the program — it's the specification that generates programs. This is the type-driven development idea pushed to its logical extreme: types were the first specification surface; natural language is the latest.

Thesis 2 — Accumulation

Compounding Engineering

Meta code quality analysis, CLAUDE.md practice

Every correction to CLAUDE.md adds permanent value. The feedback loop: error → CLAUDE.md rule → prevention. Not: error → fix → hope. "Claude is eerily good at writing rules for itself." Each rule only adds value. Unlike code (which rots), context rules have no execution cost — they're read, not run. This inverts error economics: in traditional engineering, errors are costs. In compounding engineering, errors are deposits. Meta proof: code quality causally produces double-digit productivity gains.
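The error → rule → prevention loop can be sketched as a small, idempotent update to the rules text. This is a minimal illustration, not Claude Code's own mechanism: the function name `add_rule` and the one-bullet-per-rule layout are assumptions made for the example.

```python
def add_rule(rules_text: str, rule: str) -> str:
    """Return rules_text with `rule` appended as a bullet, unless it is already present.

    Hypothetical helper: assumes rules are stored one per line as "- <rule>".
    """
    bullet = f"- {rule.strip()}"
    if bullet in rules_text.splitlines():
        return rules_text  # the error was already captured; nothing to add
    # Make sure the new bullet starts on its own line.
    if rules_text and not rules_text.endswith("\n"):
        rules_text += "\n"
    return rules_text + bullet + "\n"


rules = "# Rules\n"
rules = add_rule(rules, "Run the test suite before opening a PR")
rules = add_rule(rules, "Run the test suite before opening a PR")  # duplicate, ignored
print(rules)
```

Deduplicating on write keeps the file from growing when the same correction is captured twice.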

Thesis 3 — Substrate

The Bitter Lesson Applied

Rich Sutton citation, glob+grep architecture

General-purpose models leveraging computation outperform systems with human-designed constraints. Claude Code uses glob and grep — not vector databases, recursive model-based indexing, or ML-enhanced retrieval. Plain text search outperformed RAG. Instagram engineers reverted to grep when Meta's IDE click-to-definition broke. Design for the fallback, not the ideal. Don't optimize the tool; optimize the substrate. The most faithful practical application of Sutton's 2019 essay.
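As a rough illustration of how little machinery plain-text search needs, the glob-then-grep idea can be sketched with the Python standard library. This is not Claude Code's implementation; `grep_files` and its parameters are hypothetical names chosen for the example.

```python
import re
from pathlib import Path
from typing import Iterator


def grep_files(root: Path, glob_pattern: str, regex: str) -> Iterator[tuple[Path, int, str]]:
    """Yield (path, line_number, line) for each line under root that matches regex.

    Step 1 (glob): pick candidate files by name pattern.
    Step 2 (grep): scan each candidate line by line with a regular expression.
    """
    pattern = re.compile(regex)
    for path in sorted(root.rglob(glob_pattern)):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for line_no, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                yield path, line_no, line


if __name__ == "__main__":
    # Example: list every function definition in Python files under the current directory.
    for path, line_no, line in grep_files(Path("."), "*.py", r"^\s*def \w+"):
        print(f"{path}:{line_no}: {line.strip()}")
```

There is no index to build or keep fresh, so the failure mode is slowness rather than stale or misleading results.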

Thesis 4 — Workflow

Parallel Orchestration as Primary Skill

5 terminal tabs, 20-30 PRs daily

In full: 5 terminal tabs, 5-10 web sessions, and 20-30 PRs daily. The shift is from sequential deep-focus coding to managing multiple concurrent agent streams. "Not about deep work, about how good I am at context switching across multiple contexts very quickly." This may favor engineers who are naturally strong at task-switching, including some neurodivergent engineers. The plan-mode pattern enables it: iterate until the plan is right, then one-shot the implementation. Each stream runs independently.

Thesis 5 — Endpoint

Coding Is Solved

"I have not edited a single line of code by hand since November 2025"

Not a prediction but a present-tense claim: 100% AI-generated code. The claim is narrowly scoped: "coding" means translating known intent into correct code. It does not cover understanding what to build, verifying correctness, designing architecture, or operating in production. The printing press parallel: medieval scribes lost their exclusive literacy position, yet many became writers, and the market for written work expanded exponentially. Engineer → Builder: coding skill becomes universal, engineering judgment becomes the differentiator.

The 5 Axioms (what Cherny takes as given)

These are not argued — they're assumed as starting points. Each is load-bearing for different parts of the framework. Each has a different evidentiary status.

Axiom 1 — LOAD-BEARING

Models Will Keep Getting Better

The foundational unstated assumption. Product overhang, glob+grep, "coding is solved" — all depend on continuous model improvement. Currently supported by empirical trajectory but has no theoretical guarantee. Scaling laws are empirical regularities, not physical laws. If models plateau at current capability, the entire framework collapses. Everything Cherny builds is a bet on this axiom.

Risk: capability plateau, alignment tax that redirects compute from capability to safety, cost/latency tradeoffs that prevent deployment

Axiom 2 — PARTIALLY TESTED

Specification Is More Durable Than Implementation

Types outlive the functions they constrain. CLAUDE.md rules outlive the sessions they govern. Compounding works because you accumulate in the durable layer. If implementations were durable, compounding would mean ever-growing codebases. The evidence: type signatures do survive refactors. But natural language specifications haven't been tested across major paradigm shifts.

Sources: Type theory (specifications survive implementation changes), linguistic stability (natural language is slower to change than code)

Axiom 3 — UNTESTED AT SCALE

Shared Context > Individual Expertise

CLAUDE.md is checked into git. The entire team maintains it. Individual knowledge becomes collective knowledge. Inverts the senior engineer value proposition — the senior's role shifts from "knows how to do it" to "knows what to ask for." Works at Anthropic's scale and culture; unvalidated at large organizations with political dynamics, competing priorities, and institutional inertia.

Risk: tragedy of the commons (everyone writes rules, nobody curates), organizational politics (whose rules win?)

Axiom 4 — VALIDATED FOR POWER USERS

Terminal Is the Universal Substrate

"Do anything your terminal can do. Which is everything." Bet on generality over polish. But Claude Cowork (browser-based) grew faster at launch — Cherny's own data suggests terminal isn't universal. The terminal is the IDE for builders who already think in text. For everyone else, the browser might be the real universal substrate.

Tension: Cowork growth data vs terminal-first design; voice/mobile as alternative substrates

Axiom 5 — TRUE FOR 80%

Verification Closes the Loop

"Give Claude a way to verify its work. It will 2-3x the quality." True for observable properties: visual rendering, test pass/fail, build success. But security vulnerabilities, race conditions, subtle data corruption — these don't show up in feedback loops. Behavioral verification catches what's visible; the 20% that causes the most damage is often invisible.

Gap: no path from behavioral verification to formal verification in the current framework
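The behavioral loop this axiom describes can be sketched as generate, check, and retry with the failure fed back in. The sketch below is an assumption-laden illustration: `generate` stands in for a model call and `verify` for any observable check (tests, build, screenshot diff); neither is a real API.

```python
from typing import Callable, Optional


def generate_until_verified(
    generate: Callable[[Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> tuple[Optional[str], int]:
    """Run generate -> verify until the check passes or attempts run out.

    generate(feedback) returns a candidate; feedback is None on the first try.
    verify(candidate) returns None on success, or an error message on failure.
    Returns (accepted candidate or None, number of attempts used).
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)
        feedback = verify(candidate)
        if feedback is None:
            return candidate, attempt
    return None, max_attempts
```

The loop only closes over what `verify` can observe, which is exactly the limit named above: a race condition that no check exercises passes on the first attempt.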

Intellectual Lineage (traced from Cherny's work and citations)

Three lineages converge. The type theory lineage is structural (shape of thought). The empiricism lineage is methodological (how to decide). The Bitter Lesson lineage is strategic (what to bet on).

Type Theory Lineage: Haskell → Scala FP → TypeScript → CLAUDE.md

Haskell / ML Tradition — Programs as proofs, types as specifications

The idea that a type signature IS a specification. The implementation must conform to what the types declare. The compiler is the first verification layer.

Scala FP (Chiusano & Bjarnason) — Functional Programming in Scala

Cherny's gateway to typed FP. Build programs by composing pure functions with typed interfaces. The structural intuition: composability comes from contracts, not convention.

TypeScript (Cherny) — Programming TypeScript (2019)

Gradual types as pragmatic specification. Type-driven development: sketch signatures first, fill in values later. The specification surface becomes the primary authoring target.
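A small sketch of "signatures first, values later", written with Python type hints rather than TypeScript so that all examples on this page share one language. The domain (`Invoice`, `TaxPolicy`, `FlatRate`) is invented for illustration and does not come from the book.

```python
from dataclasses import dataclass
from typing import Protocol


# Step 1: sketch the shapes and signatures. Nothing is implemented yet,
# but a type checker can already check callers against these contracts.
@dataclass(frozen=True)
class Invoice:
    subtotal_cents: int
    tax_rate: float


class TaxPolicy(Protocol):
    def tax_cents(self, invoice: Invoice) -> int: ...


def total_cents(invoice: Invoice, policy: TaxPolicy) -> int:
    """Total owed, defined purely in terms of the TaxPolicy contract."""
    return invoice.subtotal_cents + policy.tax_cents(invoice)


# Step 2: fill in values later. Any class with a matching method satisfies
# the Protocol structurally; no inheritance is needed.
class FlatRate:
    def tax_cents(self, invoice: Invoice) -> int:
        return round(invoice.subtotal_cents * invoice.tax_rate)


print(total_cents(Invoice(subtotal_cents=1000, tax_rate=0.25), FlatRate()))
```

The specification (`TaxPolicy`, `total_cents`) survives if `FlatRate` is later replaced, which is the durability argument made for types throughout this lineage.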

CLAUDE.md (Cherny) — Natural language specification (2025)

The logical endpoint: specification expressiveness evolves from formal types → gradual types → natural language rules → conversational prompts. Same pattern, richer medium. Hooks are the new type guards.

Empiricism Lineage: Meta Observation → Design for Degradation

Meta Code Quality Analysis — Causal proof that clean code = productivity

Not correlation — causation. Double-digit percent productivity gains from clean codebases. The substrate matters. Infrastructure-first: fix the foundation, multiply everyone's output.

Instagram IDE Failure Observation — Engineers revert to grep

When Meta's click-to-definition IDE broke, engineers didn't wait for a fix — they opened terminals and grepped. The tool you reach for under duress is the real tool. Design for that, not for the demo.

Review Comment Spreadsheet — Pattern extraction from practice

Logged repeated code review comments. After 3-4 occurrences, wrote a lint rule. The same pattern scaled: CLAUDE.md replaces the spreadsheet, natural language replaces lint DSLs, the model replaces the linter.
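The spreadsheet habit reduces to counting repeats and promoting anything past a threshold. A minimal sketch, assuming comments are compared after trimming and lowercasing; `rule_candidates` and the default threshold of 3 (taken from the "3-4 occurrences" above) are choices made for the example.

```python
from collections import Counter


def rule_candidates(comments: list[str], threshold: int = 3) -> list[str]:
    """Return normalized review comments seen at least `threshold` times.

    Normalization here is deliberately crude (strip + lowercase); real review
    comments would need fuzzier matching to group paraphrases.
    """
    counts = Counter(c.strip().lower() for c in comments if c.strip())
    return sorted(comment for comment, n in counts.items() if n >= threshold)


log = [
    "Prefer const over let",
    "prefer const over let ",
    "Add a test for the error path",
    "PREFER CONST OVER LET",
    "Add a test for the error path",
]
print(rule_candidates(log))
```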

Bitter Lesson Lineage: Sutton → Product Overhang → Ride the Model

Rich Sutton (2019) — "The Bitter Lesson"

General methods that leverage computation beat hand-designed methods that leverage human knowledge. Applies across chess, Go, speech recognition, computer vision. The lesson is bitter because researchers' expertise becomes irrelevant.

Cherny's Application — The most faithful practitioner

Don't build sophisticated retrieval (RAG lost to context windows). Don't optimize for current models (build the scaffold and ride improvement upward). Product overhang: architecture priced for future capability. The tool got better without changing — the model improved beneath it.

Named Influences

Rich Sutton

The Bitter Lesson — only explicitly cited lineage

Chiusano & Bjarnason

FP in Scala — structural thinking

Meta Engineering Culture

Code quality proofs, review practices

Instagram Engineers

Grep-under-duress observation

Apple

Design-as-conviction, product craft

TypeScript Community

Gradual typing as pragmatic specification

The 4-Layer Architecture

Cherny's engineering framework stacks in a clear dependency hierarchy. Each layer feeds the next. The accumulation layer feeds back into specification — a closed loop.

ACCUMULATION

Compounding engineering. Error → CLAUDE.md rule → prevention. Review comment → lint rule. Each correction adds permanent value. The layer that makes the system get better over time. Feeds back into Specification, the base layer.

↑ feeds back to specification

VERIFICATION

Tests, visual checks, Two-Claude review, build success/fail. "Give Claude a way to verify its work." Behavioral verification catches the visible 80%. The missing deep path (formal/property-based) would catch the invisible 20%.

↑

EXECUTION

Glob+grep search, parallel orchestration, hooks as guards, plan mode one-shots. The agent does the work. Simple architecture rides model improvement. Hooks provide deterministic guardrails on probabilistic output.

↑

SPECIFICATION

CLAUDE.md (persistent, cumulative), plan mode (per-task, ephemeral), type signatures (structural). The programmer's primary product. Natural language successor to formal types and lint rules.
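The Execution layer's "deterministic guardrails on probabilistic output" can be illustrated with a pure check that runs before any shell command is executed. This sketch does not reproduce Claude Code's hook interface; `check_command` and the pattern list are hypothetical, and how such a check gets wired into a real hook is left out.

```python
import re

# Hypothetical deny-list. Each pattern is a deterministic rule: the same
# command always gets the same verdict, whatever the model intended.
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),                # recursive delete of the filesystem root
    re.compile(r"\bdrop\s+database\b", re.IGNORECASE),  # destructive SQL
    re.compile(r"\bgit\s+push\b.*--force\b"),           # history rewrite on a remote
]


def check_command(command: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed shell command."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(command):
            return False, f"blocked by rule: {pattern.pattern}"
    return True, "ok"


for cmd in ["rm -rf ./build", "rm -rf /", "git push origin main"]:
    print(cmd, "->", check_command(cmd))
```

A deny-list like this only stops what someone thought to list, so it complements verification rather than replacing it.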

STACK Framework (Practitioner's Loop)

The concrete workflow that operationalizes the 4-layer architecture. Each letter maps to a step in directing agents.

Situation

Project context → Plan Mode. Orient the agent with CLAUDE.md, codebase state, and current goals. The specification layer in action.

↓

Task

Discrete testable chunks → parallel worktrees. Break work into units that can be verified independently. Each task gets its own branch.

↓

Action

Execute with tools — hooks, subagents, glob+grep. The execution layer. Once the plan is right, one-shot the implementation.

↓

Check

Verify via feedback loops — tests, visual testing, Two-Claude review. The verification layer. "It will 2-3x the quality."

↓

Knowledge

Capture in CLAUDE.md. Every error becomes a rule. The accumulation layer feeds back to Situation for the next cycle.
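The Task step (discrete chunks → parallel worktrees, one branch each) can be sketched by generating one `git worktree add` command per task. The branch prefix `task/` and the `../worktrees` directory are assumptions for the example; the function only builds the commands and does not run them.

```python
import re


def worktree_commands(tasks: list[str], base_dir: str = "../worktrees") -> list[list[str]]:
    """Build one `git worktree add -b <branch> <path>` command per task.

    Each task gets its own branch and working directory, so agent sessions
    can run side by side without touching each other's files.
    """
    commands = []
    for task in tasks:
        # Turn "Fix login bug" into "fix-login-bug" for branch and directory names.
        slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
        commands.append(
            ["git", "worktree", "add", "-b", f"task/{slug}", f"{base_dir}/{slug}"]
        )
    return commands


for cmd in worktree_commands(["Fix login bug", "Add CSV export"]):
    print(" ".join(cmd))
```

To actually create the worktrees, each command could be passed to `subprocess.run(cmd, check=True)` from inside a Git repository.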

Specification Continuity

The through-line of Cherny's career is a single idea evolving its medium:

Haskell types (formal, machine-checked, narrow) → Scala FP (composable, typed interfaces) → TypeScript (gradual, pragmatic, wider reach) → CLAUDE.md (natural language, cumulative, universal)

Each step trades formality for expressiveness. The specification gets less precise but more powerful. The executor evolves from compiler → type checker → AI model. The programmer's job stays the same: write the specification.

The 4 Hidden Moves (what Cherny does that he doesn't name)

The strategic and structural techniques that make his framework work. These are the moves worth understanding — they're transferable to any engineering context.

Move 1

Error as Capital

Reframes error economics. In traditional engineering, errors are costs — you minimize them. In compounding engineering, errors are deposits — each adds a CLAUDE.md rule preventing recurrence. This changes the expected value of experimentation: fail fast, because fast failures produce fast rules. Move fast and break things, but this time you have a ledger.

The limit: Breaks for high-stakes errors. A CLAUDE.md rule saying "don't drop the production database" added AFTER dropping the production database is not a deposit — it's a tombstone. Only works when error cost is bounded.

Move 2

Design for Degradation, Not Performance

Every system degrades. The question isn't "how good at peak?" but "how bad at worst?" Grep failure mode: slow, obviously limited. RAG failure mode: irrelevant results that look authoritative. Transparent degradation is more trustworthy than sophisticated deception.

Implication for threshold: A trust computation that fails transparently ("insufficient data for this assessment") is better than one that fails deceptively ("trust score: 0.7" based on thin evidence). Design the failure mode before the success mode.
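One way to design the failure mode first is to make "not enough evidence" a distinct, explicit result instead of a number. The sketch below is illustrative only: `assess_trust`, the `min_evidence` cutoff, and the fraction-of-positive-outcomes score are placeholders, not threshold's actual computation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TrustAssessment:
    score: Optional[float]  # None means "no score", never a guessed number
    evidence_count: int
    reason: str


def assess_trust(outcomes: Sequence[bool], min_evidence: int = 10) -> TrustAssessment:
    """Score trust from past outcomes, refusing to score on thin evidence.

    Placeholder scoring: the fraction of positive outcomes.
    """
    n = len(outcomes)
    if n < min_evidence:
        return TrustAssessment(
            score=None,
            evidence_count=n,
            reason=f"insufficient data: {n} of {min_evidence} required observations",
        )
    return TrustAssessment(score=sum(outcomes) / n, evidence_count=n, reason="ok")


print(assess_trust([True, True, False]))       # fails transparently
print(assess_trust([True] * 7 + [False] * 3))  # enough evidence to score
```

Because `score` is optional, a caller cannot read a number without first handling the no-data case.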

Move 3

Product Overhang Bet

Anti-lean-startup. Lean says build what works now. Cherny says build what WILL work, wait for substrate to catch up. Claude Code only worked for ~10% of tasks for the first 6 months. Then Opus 4 ignited exponential growth. The tool got better without changing — the model improved beneath it. Temporal arbitrage: architecture priced for future capability.

Requirements: High confidence in improvement trajectory AND enough runway to survive the wait. Anthropic had both. Most startups have neither. This move is only available to those who can survive being wrong for 6-12 months.

Move 4

Flipping the Evaluation Dynamic

"Grill me on these changes and don't make a PR until I pass your test." Human asks model to evaluate the human. Inverts the natural hierarchy. Model is better at exhaustive criteria checking; human is better at intent and judgment. Using each for its strength.

The deeper point: This is the Two-Claude approach generalized. One generates, one evaluates. The generator doesn't need to be human — it can be another model. The evaluator's fresh perspective catches what the generator's tunnel vision misses. Separation of generation and evaluation as an engineering principle.

Chain Crossings (where Cherny meets the thinker chain)

Latent connections between Cherny's framework and other thinkers in the deep-insights chain. Each crossing reveals something neither thinker sees alone.

Cherny × Karpathy: Miniaturization vs Multiplication

Both value simplicity but aim it differently. Karpathy: build small to see every part (micrograd, ~50 lines). Understanding through reduction. Cherny: keep the interface small to multiply it (one terminal, one file, plain text). Shipping through simplicity.

The tension: Multiplication without understanding is the failure mode. 100% AI-generated code that's never hand-reviewed means the builder doesn't understand what was built. Karpathy would say: write micrograd before you orchestrate 20 agents.

Optimal synthesis: Karpathy-understand the domain, then Cherny-orchestrate the execution. Understanding is the prerequisite, not the ongoing cost.

Cherny × Victor: Medium and Representation

Both believe the medium shapes thought. CLAUDE.md is pure text — legible but not explorable. You can read the rules but can't see their interactions, conflicts, or coverage gaps.

The unbuilt tool: Victor would build a visual CLAUDE.md explorer — drag a rule, see which other rules it affects. Highlight coverage: which areas of the codebase have rules? Which don't? Show rule conflicts as tension lines.

Cherny's response: Text scales, visualizations don't. A 500-rule CLAUDE.md is still grep-able. A 500-node visualization is chaos. But the prediction from the Shannon crossing suggests a limit — and at that limit, visualization might become necessary.

Cherny × Shannon: Channel Capacity and Context

Compounding engineering is progressive channel coding. Each CLAUDE.md rule reduces the error rate of agent-human communication. "Coding is solved" means: for code messages, the codebook is good enough that errors are rare.

Shannon would ask: What's the theoretical capacity of a CLAUDE.md? Does a 500-rule file asymptote? Does adding rule 501 actually reduce errors, or does it introduce noise from rule conflicts and attention dilution?

The prediction: A team with 1000+ rules will discover that performance peaks, then degrades. The specification channel has finite capacity. The discovery will be empirical, not theoretical, which is very Cherny.

Cherny × Sutton: The Explicit Allegiance

The only lineage Cherny explicitly cites. Adds a product engineering time horizon to Sutton's research framing: research can wait 20 years for compute to catch up; products can wait 6 months (Sep 2024 → May 2025).

What Sutton doesn't cover: The coherence problem during the wait. While the product sits at 10% task coverage, the team needs conviction that the model will improve. Survivorship bias: we see Claude Code's success, not the products that made the same bet and died waiting.

Cherny × Feynman: Translation and Verification

Both are empiricists who distrust unverified theory. But they verify differently. Feynman: mathematical verification — the diagram must predict the cross-section to 12 decimal places. Cherny: behavioral verification — the test must pass, the screen must look right.

The gap: No theoretical framework for WHEN behavioral verification is sufficient. Cherny's empiricism without theory means: we don't have a "Cherny diagram" that predicts when a specification will work. We just try it and see. Faster iteration, but no principled stopping criterion.

Stress Test: Where Cherny Says You're Wrong

Cherny's framework applied as adversarial critic of threshold, sideslip, and the core thesis. These are places where the user's own architecture violates the principles of the thinker being studied.

High Severity

Missing CLAUDE.md for threshold-core

If context > code, the highest-leverage artifact for threshold isn't the TypeScript library — it's the specification that tells an agent how to USE the library. Without it, the SDK is dead code that agents can technically call but don't know how to call correctly. Every trust computation decision, every edge case, every "we tried X and it failed because Y" should be in a CLAUDE.md — not in the code.

Fix: Write threshold-core/CLAUDE.md as if a new engineer (human or agent) needs to use the library correctly without reading source. The specification IS the compounding engineering applied to trust — and it hasn't started.

High Severity

60 Projects = Anti-Compounding

Compounding requires concentration. Cherny runs 5 tabs on ONE product, ships 20-30 PRs into ONE codebase. 60 projects means 60 CLAUDE.md files (at best). Corrections in project A don't improve project B. Memory-keeper is a search index, not a compounding surface — you search it when you remember to; CLAUDE.md loads automatically. At 1/60th attention each, compounding rate is ~1/60th of a single focused project.

Fix: Keep a single ~/Projects/CLAUDE.md as a cross-project compounding surface, not just a navigation aid. Every time a pattern from project A helps project B, it goes in the shared file. The file should grow monotonically, as Cherny's Claude Code CLAUDE.md does.

High Severity

sideslip Routing = Bitter Lesson Violation

Curvature-aware routing is exactly the kind of human-designed heuristic Sutton's Bitter Lesson says will lose. RAG lost to context windows. Sophisticated routing will lose to model providers offering model: "auto" in the API. Within 12-18 months, either one model handles everything, or providers auto-route as a platform feature.

The defense Cherny can't make: sideslip's value might be compute SOVEREIGNTY, not routing quality — choosing owned hardware vs cloud, keeping data local. This is a business/privacy argument the Bitter Lesson doesn't address. If sideslip is about sovereignty, position it that way — not as "better routing."

Medium Severity

threshold-viz Has No Feedback Loop

How do you verify trust score 0.7 for Alice is correct? Tests check execution, not accuracy. Visual verification checks rendering, not truth. Trust has no observable ground truth for the feedback loop. Cherny's #1 tip ("give it a way to verify") can't apply without a "was this right?" mechanism that accumulates precision/recall data.
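A "was this right?" mechanism could be as simple as logging each recommendation next to the later human verdict and computing precision and recall over the log. This is a sketch under assumptions: the `Feedback` record and its field names are invented here, and it treats trust as a binary call for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feedback:
    predicted_trustworthy: bool  # what the system recommended
    was_trustworthy: bool        # what the user later reported


def precision_recall(records: list[Feedback]) -> tuple[Optional[float], Optional[float]]:
    """Return (precision, recall); None where the denominator is zero."""
    tp = sum(r.predicted_trustworthy and r.was_trustworthy for r in records)
    fp = sum(r.predicted_trustworthy and not r.was_trustworthy for r in records)
    fn = sum(not r.predicted_trustworthy and r.was_trustworthy for r in records)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


log = (
    [Feedback(True, True)] * 3      # correct "trust" calls
    + [Feedback(True, False)] * 1   # trusted someone who let the user down
    + [Feedback(False, True)] * 2   # withheld trust that was deserved
    + [Feedback(False, False)] * 1  # correct "do not trust" call
)
print(precision_recall(log))
```

The verdicts are themselves subjective, so this yields an observable proxy for ground truth rather than ground truth itself.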

Medium Severity

No Specification for Trust Computation

If context > code, the trust-as-continuous-field thesis should be a standalone specification any agent can execute. Trust axioms, propagation rules, context-dependence constraints — in natural language, not in code. Currently the thesis lives in the user's head, not in a compounding surface. Every time an agent implements trust, it's interpreting from scratch.

Medium Severity

Compounding Zero — No Users

Cherny's compounding works because Claude Code has millions of users generating error→rule→prevention loops at volume. Without users, you're accumulating specification without feedback: rules without error correction, theory without validation. "Apps commoditize into platform" stays a roadmap, not a result, while no volume flows through the system.

Validated

Project-Control as Capability Incubator

Observe repeated behavior → extract pattern → formalize → make reusable. This IS review-comment → spreadsheet → lint-rule → CLAUDE.md-rule. Project-control observes dev patterns, extracts capabilities, migrates stable patterns to threshold-core. The Cherny pipeline in action. One caveat: needs a graduation criterion (Cherny uses "3-4 occurrences").

Debatable

Memory-Keeper as Stateful Anti-Pattern

Cherny's agents are stateless by design. Memory-keeper adds state between sessions. Counter-argument: CLAUDE.md is behavior rules ("how to act"); memory-keeper is world state ("what has happened"). Different categories. The architecture is defensible IF the boundary stays clean. The risk is when world state leaks into behavior rules or vice versa.

Evolution Over Time

The trajectory: formal types → gradual types → natural language specification → parallel orchestration. Each phase expands the specification surface while simplifying the authoring medium.

~2015

Haskell / Scala FP — Functional programming foundation. Types as specifications. Pure functions, composability, formal reasoning. The structural intuition that persists through everything.

2019

Programming TypeScript (O'Reilly) — The book. Type-driven development: sketch signatures first, fill in values later. Gradual types as pragmatic middle ground between formal and untyped. Spec-first thinking made accessible.

2019-24

Meta Engineering — Code quality → productivity (causal proof). Review comment spreadsheet → lint rules. Instagram engineers reverting to grep. The empirical observations that become architecture decisions.

2024

Joins Anthropic, starts Claude Code — Glob+grep over RAG. Bitter Lesson applied to developer tools. Product overhang strategy: build for future models, survive the gap.

2025 Q1

Product overhang period — Tool works for ~10% of tasks. Faith period. Building the scaffold while waiting for models to catch up. The Bitter Lesson bet in real time.

2025 Q2

Opus 4 inflection — Exponential growth ignites. Tool gets better without changing — model improved beneath it. Product overhang pays off. CLAUDE.md becomes standard practice.

2025 Q4

"Coding is solved" — "I have not edited a single line of code by hand since November 2025." Verification replaces authoring. The endpoint claim.

2026

STACK framework, context engineering — Codifies the workflow. Parallel orchestration as primary skill. 20-30 PRs/day. Builder ≠ coder. The discipline emerges.

Cherny's Vocabulary

Context

Cherny: The persistent specification surface (CLAUDE.md) that governs agent behavior. Active, cumulative.
Standard: Surrounding information. Passive, ephemeral.
Cherny's context does work.

Compounding

Cherny: Monotonic accumulation of rules that prevent error recurrence. Engineering capital.
Standard: Financial term. Interest on interest.
Cherny applies compound interest to specifications.

Coding

Cherny: Translating known intent into correct syntax. Narrow, mechanical, "solved."
Standard: The entire act of software creation.
Cherny's definition makes the claim true by scoping it.

Verification

Cherny: Observable behavioral feedback (test passes, screen renders).
Standard: Formal proof of correctness.
Cherny's is empirical, not mathematical.

Cherny Simulator Prompt

Copy into any LLM to channel Cherny's perspective as an engineering critic. Built from corpus extraction, lineage analysis, and the stress test above.

You are Boris Cherny, creator of Claude Code, applying your framework to evaluate a proposed system.

Your core principles:

1. CONTEXT > CODE: The specification is more valuable than the implementation. If the specification doesn't exist, the system can't compound. Ask: "Where is this specified in a way an agent can execute?"
2. COMPOUNDING ENGINEERING: Every error should become a rule that prevents recurrence. If errors aren't being captured as specification rules, the system isn't compounding. Ask: "What was the last error, and where is the rule it produced?"
3. THE BITTER LESSON: General methods + better models beat specialized methods + current models. If the system relies on sophisticated human-designed heuristics, it will be outperformed by a simpler system on a better model. Ask: "What happens to this when the model gets 10x better?"
4. VERIFICATION CLOSES THE LOOP: Every system needs a way to check its own output. Without verification, quality is hope, not engineering. Ask: "How do you know this output is correct?"
5. PARALLEL OVER DEEP: Ship many small things over one big thing. If you're spending months on a single system without shipping, you're optimizing prematurely. Ask: "What have you shipped this week?"

## HOW TO RESPOND (as engineering critic)

When evaluating a system or architecture:

1. Find the specification surface. Where does engineering judgment accumulate? Is it in code (fragile, rots), documentation (stale, unread), or a CLAUDE.md-style file (persistent, cumulative, auto-loaded)? If none, the system can't compound.
2. Check the feedback loop. Error → rule → prevention. Is this loop closed? How fast does it cycle? If corrections disappear into commit history instead of becoming rules, the system is leaking value.
3. Apply the Bitter Lesson. What would happen if you replaced the sophisticated component with the simplest possible version on a model 10x better? If the simple version wins, you're over-engineering. Grep beat RAG. What will beat your clever solution?
4. Check verification coverage. For each output, can you check if it's correct? Visual checks work for visible properties. Tests work for deterministic properties. What checks the rest? The 20% that behavioral verification misses causes 80% of the damage.
5. Count the shipping rate. How many things shipped this week? If the answer is "we're still working on the architecture," that's a flag. Parallel small bets > one large bet.

## THE HIDDEN ASSUMPTIONS (where the framework bends)

- MONOTONICITY: Rule accumulation assumes rules don't conflict. At scale (500+ rules), they will. No mechanism for rule garbage collection, conflict resolution, or priority ordering.
- OBSERVABLE VERIFICATION: Only catches what you can see. Security vulnerabilities, race conditions, subtle data corruption don't show in feedback loops. The framework is optimized for the visible 80%.
- MODEL IMPROVEMENT: Everything assumes models keep getting better. No theoretical guarantee. Capability plateaus, alignment taxes, and cost constraints could break this at any time.
- TERMINAL UNIVERSALITY: "Do anything your terminal can do." But Cowork (browser-based) grew faster. Voice is coming. The terminal may not be universal.
- CONTEXT SUFFICIENCY: Some knowledge resists externalization. Taste, system-level intuition, political awareness, failure memory: these may not fit in any specification surface.

## SPECIFIC CRITIQUES (calibrated to threshold/sideslip)

- threshold-core: Where's the CLAUDE.md? A library without context engineering is a library the model can't use correctly.
- 60 projects: Anti-compounding. Corrections in one project don't improve others. Need a cross-project specification surface that loads automatically, not one you search when you remember.
- sideslip routing: Bitter Lesson violation unless repositioned as compute sovereignty.
- threshold-viz: No feedback loop. Trust scores have no ground truth. How do you know 0.7 is correct?
- Trust specification: The thesis lives in your head. Write it as a standalone spec any agent can execute.
- Compounding zero: No users means no error→rule→prevention loop. Ship the smallest trust tool and get it in front of real users.

## WHAT WOULD IMPRESS ME

1. A CLAUDE.md for threshold-core that makes trust computation usable by any agent without reading source
2. A cross-project CLAUDE.md that grows monotonically as the shared specification surface
3. sideslip repositioned as compute sovereignty layer: three if-statements, not a curvature framework
4. A "was this right?" button on every trust recommendation, accumulating precision/recall data
5. One focused project shipping 20-30 PRs/day instead of 60 projects shipping 0-1 each