Noam Chomsky — Thinking Partner

Language as cognitive structure — if the input underdetermines the output, the structure must come from inside

51 nodes · 57 edges · 10 root ideas · 5 crossings · 8 stress tests · 7 applications · 7 unbuilt

The 9 Axioms (what Chomsky takes as given)

These are the foundational assumptions that generate the entire generative grammar research program. Each produces both a powerful prediction and a vulnerability — the vulnerability is a boundary condition, not a refutation.

Axiom 1

The Object of Study Is the Individual Mind

“A language” is a property of an individual, not a community. There is no “English” — there are millions of I-languages, each slightly different, each in one mind. Social conventions (spelling, prestige dialects, language names) are E-language phenomena with no theoretical status. Excludes sociolinguistics, pragmatics, discourse analysis. Enables the idealization needed to discover universal properties: if you study “English” (social entity), you find irregularity; if you study a single I-language (one person’s competence), you find a system.

Axiom 2

Grammaticality Judgments Are Reliable Data

Speakers can judge whether sentences are grammatical or not, and these judgments constitute evidence about the grammar. “Colorless green ideas sleep furiously” is grammatical but meaningless. “Furiously sleep ideas green colorless” is neither. Variation in judgments across speakers is performance noise around a competence signal. There IS a fact of the matter about grammaticality, and speakers have access to it (imperfectly). Excludes usage-based linguistics where frequency, not grammaticality, is the primary datum.

Axiom 3

Language Is Species-Specific and Species-Uniform

UG is a biological endowment of the human species. No other species has it. All humans have the same UG — variation is in parameter settings, not in principles. Excludes gradual evolution of language from animal communication systems, genuine variation in linguistic capacity across humans, and non-human language research as relevant to linguistic theory.

Axiom 4

Generative Grammar Is the Right Formalism

Language is a discrete, combinatorial, recursive system best described by formal grammars that generate all and only the grammatical sentences. Not probabilistic, not continuous, not connectionist, not analogical. The formalism mirrors the cognitive reality — the mind literally computes with discrete symbols over hierarchical structures.

Axiom 5

Syntax Is Autonomous

The rules of sentence formation are independent of meaning (semantics) and use (pragmatics). “Colorless green ideas sleep furiously” proves the point: perfectly syntactic, semantically empty. Syntax can be studied as a self-contained formal system. This enables the strongest theoretical claims (universal structural properties) but excludes construction grammar, cognitive linguistics, and usage-based approaches.
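As a minimal sketch of what "autonomous" means here, the recognizer below checks sentences against category rules alone and never consults meaning. The toy lexicon and the three rules (S -> NP VP, NP -> Adj NP | N, VP -> V Adv | V) are assumptions made for illustration, not a grammar taken from Chomsky.

```python
# Hypothetical toy lexicon: each word maps to a syntactic category only.
LEXICON = {
    "colorless": "Adj", "green": "Adj",
    "ideas": "N",
    "sleep": "V",
    "furiously": "Adv",
}

def parse_np(tags, i):
    """NP -> Adj NP | N. Return the index just past the NP, or None."""
    if i < len(tags) and tags[i] == "Adj":
        return parse_np(tags, i + 1)
    if i < len(tags) and tags[i] == "N":
        return i + 1
    return None

def parse_vp(tags, i):
    """VP -> V Adv | V. Return the index just past the VP, or None."""
    if i < len(tags) and tags[i] == "V":
        if i + 1 < len(tags) and tags[i + 1] == "Adv":
            return i + 2
        return i + 1
    return None

def is_grammatical(sentence):
    """S -> NP VP, checked over categories alone (words must be in LEXICON)."""
    tags = [LEXICON[w] for w in sentence.lower().split()]
    after_np = parse_np(tags, 0)
    if after_np is None:
        return False
    return parse_vp(tags, after_np) == len(tags)

print(is_grammatical("Colorless green ideas sleep furiously"))  # True
print(is_grammatical("Furiously sleep ideas green colorless"))  # False
```

Both strings use the same five words; only the arrangement of categories separates them, which is the point the axiom relies on.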

Axiom 6

Competence Before Performance

The idealized system (what you know) must be understood before the messy use (what you do). Physics studies frictionless planes before friction. Every objection of the form “but people don’t actually...” is a performance observation about a competence theory. The distinction doesn’t dismiss performance — it creates a baseline against which performance can be measured.

Axiom 7

Nativism Over Empiricism

The mind begins with rich domain-specific structure, not as a blank slate. The poverty of stimulus is the master argument: if the data underdetermines the grammar, the difference comes from inside. Peirce’s framing: without innate restriction of hypothesis space, the child faces infinite search and never converges. The constraint is not a limitation — it IS the system.

Axiom 8

The Computational Level Is Sufficient

You can understand the language faculty without understanding how it’s implemented in neural tissue, how it interacts with motor control, or how it’s grounded in perception. The right level of description is computational/formal, not neural/embodied. Marr’s levels: computational (what), algorithmic (how), implementational (where) — Chomsky works exclusively at level one.

Axiom 9

Language May Be Near-Optimal

The strong minimalist thesis: language is “something like an optimal solution to conditions it must satisfy.” Not the product of evolutionary tinkering (no time — language emerged suddenly) but closer to a snowflake or phyllotaxis. Near-optimal form given interface conditions. A research strategy, not a result — “try reducing everything to Merge” without proving it’s possible.

Intellectual Lineage (traced from Language and Mind)

Chomsky is unusual among 20th-century scientists in having an explicit, deeply researched intellectual genealogy. Three traditions contributed; one was destroyed; one was absent but foundational.

The Rationalist Lineage

Descartes (1637)

Language as species-specific and creative. Speech is “stimulus-free” — unlike animal signals, human speech is not triggered by external events. You speak when you choose, about what you choose, using sentences never spoken before. No mechanical explanation suffices. The seed of the entire nativist program.

Port-Royal Grammar (1660)

Deep structure and surface structure as an insight — the distinction between what a sentence means and how it sounds. Chomsky formalized this insight into a computable theory. The 300-year gap between insight and formalization is the gap between philosophy and science.

Wilhelm von Humboldt (1836)

Language as “infinite use of finite means” — energeia (activity) not ergon (product). The most precise precursor. Chomsky: “A generative grammar attempts to give an explicit account of ‘infinite use of finite means.’” The move from aphorism to formal theory is Chomsky’s contribution.

The Abductive Lineage

Charles Sanders Peirce (1903)

“Man’s mind has a natural adaptation to imagining correct theories of some kinds... if the mind had no adapted tendency to guess right, the human race would long ago have ceased to exist.” Chomsky uses Peirce to reframe innateness: UG is not a limitation — it is what makes learning possible. Constraints as enablers. The immune system analogy: genetically determined receptors restrict what can be detected, but restriction IS the system.

The Destructive Lineage

Leonard Bloomfield / Structuralism (1933)

Taxonomic linguistics, distributional analysis — the scientific study of language as observable behavior. Not wrong as data collection, but insufficient as theory. The tradition Chomsky inherited through his teacher Zellig Harris and then revolutionized by adding mentalism.

B.F. Skinner (1957)

Verbal Behavior attempted to explain all language through stimulus-response-reinforcement. Chomsky’s 1959 review showed every technical term loses scientific content when applied to language: “stimulus” becomes “anything the speaker talks about”; “reinforcement” becomes unfalsifiable. The most devastating book review in the history of science. Not just a critique — a paradigm shift.

The Absent Influence

Alan Turing (uncited but foundational)

Chomsky doesn’t cite Turing in Language and Mind, but the Chomsky hierarchy (1956) — Type 0 through Type 3 grammars defined by computational power — placed linguistics permanently inside computation theory. Every NLP system, parser, and formal grammar works within or against this hierarchy.

The Antagonist Tradition

Empiricism (Locke, Hume, Quine, Putnam, Goodman)

The shared premise: the mind begins with no domain-specific structure; all knowledge comes from experience via general-purpose learning. Chomsky’s central claim: this is empirically false for language — and probably for cognition in general. By casting the battle in 17th-century terms, Chomsky positions opponents as defenders of an old paradigm.

The Ideas (10 root ideas + 10 derived)

Arranged in three tiers by significance. Tier 1 ideas are field-defining — they changed what linguistics IS. Tier 2 ideas are architecturally central. Tier 3 ideas are late-career and speculative.

Tier 1 — Field-Defining

Poverty of Stimulus: Language Is Not Learned, It Grows

Children acquire complex grammatical knowledge from fragmentary input in a remarkably uniform way, without instruction. If the input doesn’t determine the output, the difference comes from inside. Language acquisition is the growth of a biological organ. The child no more learns grammar than she learns to have a liver. This single observation generates the entire nativist research program.

Poverty of Stimulus as Argument Schema — (1) Children acquire knowledge K. (2) K is not in the input. (3) K is not derivable by general learning. (4) K must be innate. Reusable for every new case. Baker’s Paradox is the classic instance.

Constraints as Enablers (via Peirce) — Innate constraints don’t limit what can be learned; they make learning possible. Without restriction, infinite search space, no convergence. The immune system analogy: restriction IS the system.

The Demolition of Behaviorism (1959)

Chomsky’s review of Skinner’s Verbal Behavior showed that “stimulus,” “response,” and “reinforcement” lose all scientific content when applied to language. Before Chomsky: language is behavior. After: language is cognition. Not just a critique — a paradigm closure.

Structure-Dependence Is Universal and Unexplained

Every known language uses structure-dependent rules: operations over hierarchical phrase structure, not linear word order. Structure-independent rules are computationally simpler and always available as a hypothesis — but the child never considers them. The biological endowment constrains toward complexity. The organism rejects the simpler option universally, without instruction. UG’s most testable prediction and most surprising result.
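The contrast can be made concrete with the textbook case of English question formation. The sketch below compares a structure-independent rule (front the first "is" in the string) with a structure-dependent one (front the auxiliary of the main clause). The bracketed structure is built by hand and is an assumption of the example, not the output of a parser.

```python
def flatten(tree):
    """Yield the words of a nested list in left-to-right order."""
    for node in tree:
        if isinstance(node, list):
            yield from flatten(node)
        else:
            yield node

def linear_rule(words):
    """Structure-independent hypothesis: front the first 'is' in the string."""
    i = words.index("is")
    return ["is"] + words[:i] + words[i + 1:]

def structural_rule(clause):
    """Structure-dependent hypothesis: front the main-clause auxiliary.

    `clause` is [subject, aux, predicate]. The subject may contain its own
    embedded 'is', which this rule never inspects.
    """
    subject, aux, predicate = clause
    return [aux] + list(flatten([subject])) + list(flatten([predicate]))

# Hand-built structure for "the man who is tall is happy".
clause = [["the", "man", ["who", "is", "tall"]], "is", ["happy"]]
words = list(flatten(clause))

print(" ".join(linear_rule(words)))       # is the man who tall is happy
print(" ".join(structural_rule(clause)))  # is the man who is tall happy
```

The linear rule needs only a left-to-right scan, yet it produces the form no child is reported to try; the structural rule needs the bracketing as input.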

Tier 2 — Architecturally Central

Universal Grammar as Theory of the Initial State

UG specifies the class of attainable grammars, the format of rules, and the evaluation procedure. Invariant across the species. UG is the genotype; particular grammars (English, Japanese) are phenotypes. Same genotype, different environmental triggers, same underlying architecture.

Principles and Parameters — UG = fixed principles + open parameters (binary switches). Acquisition reduces to setting 30-40 switches. Head-initial vs head-final, pro-drop vs non-pro-drop.
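A minimal sketch of one such switch, assuming a phrase is stored as a (head, complement) pair: the hierarchy stays fixed and a single boolean decides the spoken order. The function name `linearize` and the example phrase are hypothetical, and English words stand in for both settings.

```python
def linearize(phrase, head_initial):
    """Spell out a phrase in the order fixed by one binary parameter.

    `phrase` is either a word (str) or a (head, complement) tuple. The
    structure is identical for both settings; only linear order changes.
    """
    if isinstance(phrase, str):
        return [phrase]
    head, complement = phrase
    h = linearize(head, head_initial)
    c = linearize(complement, head_initial)
    return h + c if head_initial else c + h

# One structure: a verb whose complement is a prepositional phrase.
vp = ("went", ("to", "Tokyo"))

print(" ".join(linearize(vp, head_initial=True)))   # went to Tokyo
print(" ".join(linearize(vp, head_initial=False)))  # Tokyo to went
```

The head-final output mirrors the verb-last, postposition-after-noun order typical of head-final languages, obtained without touching the structure itself.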

Binding Theory — Three principles (A, B, C) predict pronoun/anaphor distribution across all languages. No language-specific rules. Parametric variation in “governing category.”

I-Language vs E-Language: The Object of Study

I-language (internal, individual, intensional) is the computational procedure in the mind/brain. E-language (external, extensional) is the corpus, the social entity. Only I-language is a natural object amenable to scientific study. LLMs are trained on E-language. The question: is I-language necessary, or is E-language mastery sufficient?

Competence vs Performance

Competence is the internalized grammar — what you know. Performance is how you use it under memory limits, attention failures, social pressures. The distinction is the same move every science makes. Enables grammaticality judgments as data, universal generalizations despite variable behavior, and a theory of what’s possible (not just what’s frequent).

Deep Structure and Surface Structure

The first formal theory of meaning-form gap. Deep structure represents meaning; surface structure represents pronunciation. Transformational rules map between them. Resolves structural ambiguity (“the shooting of the hunters”), explains paraphrase and synonymy within a formal framework.

Transformational Grammar — Phrase structure rules generate deep structures, transformations derive surface structures. Direct ancestor of computational linguistics, formal language theory, and formal semantics.

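The Chomsky Hierarchy: Type 0–3 grammars, classified by the computational power needed to recognize the languages they generate. Natural language is not finite-state (Type 3), and some constructions go beyond context-free (Type 2), so it needs at least part of the context-sensitive (Type 1) range. One of the most widely used frameworks in CS.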
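To illustrate what "computational power" means at the lowest boundary of the hierarchy, the sketch below uses the standard formal language a^n b^n, which is context-free but not regular. It is a stand-in for nested dependencies, not a claim about any particular natural-language construction. The counter-based recognizer needs unbounded memory; the regular expression can check symbol order but not matching counts.

```python
import re

def accepts_anbn(s):
    """Recognize a^n b^n (n >= 1) using a counter, i.e. unbounded memory."""
    count = 0
    i = 0
    while i < len(s) and s[i] == "a":
        count += 1
        i += 1
    while i < len(s) and s[i] == "b":
        count -= 1
        i += 1
    return len(s) > 0 and i == len(s) and count == 0

# A regular (Type 3) approximation: right order, but no way to compare counts.
REGULAR_APPROX = re.compile(r"a+b+")

for s in ["ab", "aabb", "aaabbb", "aab", "abb"]:
    print(s, accepts_anbn(s), bool(REGULAR_APPROX.fullmatch(s)))
```

The last two strings are accepted by the regular pattern and rejected by the counter, which is the gap a finite-state device cannot close for arbitrary n.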

The Rationalist Recovery — Chomsky didn’t invent nativism — he recovered it from Descartes, Port-Royal, Humboldt. The move from aphorism to formal theory is his contribution to a 400-year tradition.

Tier 3 — Late-Career / Speculative

Merge: The Minimal Engine

Take two syntactic objects and form the set {A, B}. That’s it. Yields discrete infinity, hierarchy, and recursion. Elegant but possibly too minimal — if one operation explains everything, explanatory power may be vacuous. A research program, not a result.

The Minimalist Program — Not a theory but a strategy: reduce to the minimal mechanism compatible with interface conditions. If you succeed, no arbitrary stipulations remain. Mirrors good engineering: keep removing parts until it breaks.

Merge → Natural Numbers — Merge applied to {A} yields {A, {A}} — the set-theoretic successor. Mathematics may be parasitic on language. Works for successor but not arithmetic operations.
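Merge as set formation can be sketched directly with immutable sets. The code below assumes syntactic atoms are plain strings and models Merge as a two-element `frozenset`; the helper `depth` is a hypothetical addition used only to show that hierarchy falls out of repeated application.

```python
def merge(a, b):
    """Merge two syntactic objects into the unordered set {a, b}."""
    return frozenset([a, b])

def depth(obj):
    """Nesting depth of a merged object; atoms (strings) have depth 0."""
    if isinstance(obj, frozenset):
        return 1 + max(depth(x) for x in obj)
    return 0

# Hierarchy from repeated Merge: {the, {old, man}}
np = merge("the", merge("old", "man"))
print(depth(np))  # 2

# The successor-style construction from a single item "A".
singleton = frozenset(["A"])      # {A}
next_obj = merge("A", singleton)  # {A, {A}}
print(next_obj == frozenset(["A", frozenset(["A"])]))  # True
```

Because the output of `merge` is itself a valid input, the operation can apply without bound. Nothing in this sketch yields addition or multiplication, which matches the limit noted above.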

Interface Conditions — Language bridges sensorimotor (PF) and conceptual-intentional (LF) systems. Language is the optimal bridge between mouth and mind.

Three Factors in Language Design — Genes (UG) + experience (data) + general principles (computational efficiency). The third factor means language’s elegance may reflect physics, not just biology.

The Creative Aspect of Language Use

Human speech is stimulus-free. You speak when you choose, about what you choose, using sentences never spoken before. This was Descartes’s original observation, and Chomsky treats it as an unsolved problem — even generative grammar explains what’s possible, not why you choose this sentence now.

Methods (how Chomsky works)

The methodological innovations are as influential as the theoretical claims. Each method encodes a philosophical commitment about what counts as evidence and explanation.

Method 1

Competence/Performance Idealization

Study the idealized system before the noisy use. The same move as physics (frictionless planes), fluid dynamics (laminar flow), economics (rational agents). Create a baseline theory, then add complexity. Enables universal generalizations from variable data. The method IS the theory: competence exists as a separable object of study.

Method 2

Grammaticality Judgments as Primary Data

Ask speakers whether sentences are grammatical. This accesses the competence grammar directly (through the performance filter). Enables testing sentences nobody has ever said, revealing systematic knowledge beyond any corpus. The innovation: introspective judgments as scientific data, against the behaviorist prohibition on mental states as evidence.

Method 3

Minimal Pair Construction

Change one element and see if grammaticality changes. “Who did John see?” vs “*Who did John wonder whether saw Mary?” — the contrast isolates the constraint (subjacency). Same method as experimental science: vary one factor, hold others constant. The pairs are the experiments; the contrasts are the data.

Method 4

Argument from Poverty of Stimulus

Identify knowledge the child has that couldn’t have come from the input. Every such case is evidence for innate structure. The method is a reusable schema — find the gap between data and knowledge, and the gap must be filled by biology. Generates predictions that are testable across languages and cultures.

Method 5

Formalization as Discovery

The act of formalizing a linguistic insight reveals what the insight actually claims. Humboldt’s “infinite use of finite means” is an aphorism until you formalize it as a recursive generative grammar — then you discover it predicts discrete infinity, structural ambiguity, and the Chomsky hierarchy. The formalization discovers content the informal statement contained but couldn’t see.
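A small worked example of formalization exposing hidden content: take one recursive rule, N -> N N | word (a hypothetical toy rule for noun compounds), and count how many distinct bracketings it assigns to a string of n words. The informal idea "compounds can nest" does not say how fast ambiguity grows; the formal rule does.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_parses(n):
    """Number of binary bracketings of n words under N -> N N | word."""
    if n == 1:
        return 1
    # Split the string into a left part of k words and a right part of n - k.
    return sum(count_parses(k) * count_parses(n - k) for k in range(1, n))

for n in range(1, 7):
    print(n, count_parses(n))
# 1 1
# 2 1
# 3 2
# 4 5
# 5 14
# 6 42
```

The rule licenses strings of every length (discrete infinity), and the parse counts follow the Catalan numbers, so structural ambiguity is a consequence of the formalism, not an extra stipulation.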

Method 6

Paradigm Destruction via Internal Critique

The Skinner review doesn’t argue from outside behaviorism — it shows that behaviorism’s own terms become vacuous when applied to language. This is the strongest form of critique: not “you’re wrong from my framework” but “you’re incoherent by your own standards.” Used again against empiricism, connectionism, and corpus linguistics.

Chain Crossings (5 thinkers)

Where Chomsky’s framework intersects with other thinkers in the chain. Each crossing produces a synthesis prediction for threshold.

Shannon × Chomsky

Structure vs Information

Shannon counts bits; Chomsky counts structures. Shannon models language as a stochastic process (Markov chains); Chomsky shows no finite-state process generates natural language. Shannon measures information without structure; Chomsky measures structure without information. Synthesis: There exists a “capacity of trust communication” defined jointly — Shannon’s information rate AND Chomsky’s structural constraints. The capacity is lower than Shannon alone predicts because not all bit patterns correspond to grammatically valid trust expressions.

Karpathy × Chomsky

Innate Structure vs Learned Pattern

LLMs are the empiricist dream: general-purpose learners with no innate linguistic structure. They work at the level of E-language, not I-language. Can they make grammaticality judgments about unseen sentences? Acquire a new language from a few hundred examples? The answers are contested but suggestive: LLMs show surprising grammatical knowledge AND systematic gaps. Synthesis: E-trust systems (trained on data) will work on average but fail systematically on novel situations requiring generative trust competence. These are the poverty-of-trust-stimulus cases.

Postman × Chomsky

Grammar Shapes Thought as Medium Shapes Message

Both argue the structure of the communication system constrains what can be communicated and conceived. They operate at different scales (cultural vs cognitive) on the same principle. Synthesis: Trust mediated through different channels is not the same trust expressed differently — it IS different trust. A five-star review transforms trust’s deep structure into a format that may not preserve the judgment’s actual content.

James C. Scott × Chomsky

E-Language as Legibility

Same argument structure: the real system is complex and generative (I-language / local knowledge), the official representation is simplified and static (corpora / cadastral maps), interventions based on the representation fail. Synthesis: E-trust (ratings, social graph) is the cadastral map of trust. Threshold’s job: make I-trust visible without flattening it into E-trust — see trust without destroying it.

Hofstadter × Chomsky

Merge as the Engine of Strange Loops

Strange loops require recursion — Merge is the minimal recursive operation. Apply Merge to a representation of itself and you get self-reference. Gödel’s incompleteness is what happens when the system tries to formalize itself. Synthesis: Trust-about-trust (do I trust my own trust judgment?) is trust Merge applied to its own outputs — the strange loop of trust, compositional and hierarchical, not continuous and flat.

Summary Finding

“Threshold conflates E-trust and I-trust — the observable trust network terminates at people, but the generative trust procedure terminates at cognitive architecture.”

High Severity

High — Core Thesis Confusion

E-Trust vs I-Trust Confusion

“Trust terminates at people” conflates two levels. E-trust (observable network) traces to people — trivially true. I-trust (generative procedure producing trust judgments) terminates at cognitive architecture, not people. The “person” you’re trusting is a representation in YOUR mind. Threshold needs both claims separated: E-trust graph, I-trust architecture, and the relationship between them.

High — Measurement Category Error

StructuralSignature Measures Performance, Not Competence

StructuralSignature is computed from observable data — graph structure, behavioral patterns. This is PERFORMANCE data. It tells you what people DO, not what they KNOW. Performance data is noisy, strategic, context-dependent. Fix: aim for competence measures via elicited judgments, structural sensitivity testing, and controlling for performance factors.

High — Mathematical Mismatch

Trust as Continuous Field Violates Discrete Infinity

If trust is generated recursively (trust-of-trust), its fundamental structure is DISCRETE and HIERARCHICAL, not continuous. The continuous experience is produced by a discrete system through interface rules — “trust phonology.” Prediction: trust should show categorical effects (sudden shifts, boundary effects) beneath apparent continuity. Look for the “phonemes” of trust.

Medium Severity

Medium — Modularity Question

Trust Cuts Across Cognitive Modules

Trust draws on language, face recognition, memory, theory of mind, emotion, social cognition. If each is a separate module, trust is either its own module with innate principles, or an emergent property that contradicts Chomsky’s denial of “general intelligence.” For trust, the interface problem IS the whole problem.

Medium — Gradedness as Competence

Trust Is Inherently Graded, Not Categorical

“I trust her somewhat” is competence, not degraded performance. Jøsang’s opinion triangle represents an opinion as belief, disbelief, and uncertainty: three distinct components that sum to one, so uncertainty is tracked separately from disbelief. There is no binary trust/distrust with noise on top. Trust needs hybrid math: discrete structure (who-about-what) meets continuous gradation (how much, with what uncertainty).
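A minimal sketch of a graded opinion in the style of Jøsang's subjective logic, assuming the usual binomial form in which belief, disbelief, and uncertainty sum to one and a base rate supplies the prior. The class name `Opinion` and the example values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Opinion:
    """A binomial opinion: belief, disbelief, uncertainty, and a base rate."""
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5

    def __post_init__(self):
        total = self.belief + self.disbelief + self.uncertainty
        if abs(total - 1.0) > 1e-9:
            raise ValueError("belief + disbelief + uncertainty must equal 1")

    def projected_probability(self):
        """Collapse to one number: belief + base_rate * uncertainty."""
        return self.belief + self.base_rate * self.uncertainty

somewhat = Opinion(belief=0.5, disbelief=0.1, uncertainty=0.4)
unknown = Opinion(belief=0.0, disbelief=0.0, uncertainty=1.0)
split = Opinion(belief=0.5, disbelief=0.5, uncertainty=0.0)

print(somewhat.projected_probability())
print(unknown.projected_probability(), split.projected_probability())
```

The `unknown` and `split` opinions collapse to the same scalar even though one reflects no evidence and the other reflects evenly conflicting evidence. A single trust score loses exactly the distinction the triangle is meant to keep.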

Medium — Variation Beyond Parameters

Individual Variation Goes Deeper Than Parameters

Trust disposition varies enormously — constitutionally trusting vs suspicious, stable across time, heritable, correlated with personality. If trust has UG, it must accommodate massive individual variation in PRINCIPLES, not just parameters. Chomsky has no theory of competence variation because his axioms rule it out for language.

Medium — Embodiment Gap

Trust Is Deeply Embodied, Not Abstract

Physiological responses (cortisol, oxytocin, heart rate variability) are not performance noise — they are constitutive of the trust judgment. “Gut feeling” is interoceptive signal. Chomsky’s framework is maximally abstract; trust’s framework needs to be maximally concrete. Methods (nativism, modularity) might transfer; formalism (Merge, discrete infinity) might not.

Medium — Self-Application

The Thinker Chain Is an Idiolect, Not a Universal Grammar

Is the thinker chain universal (anyone building trust+computation+cognition converges on it) or individual (your intellectual history)? The honest answer: the thinkers are your parameters. The STRUCTURE might be universal. The METHOD is transferable. Deep-insights should be explicit about which level it operates on.

Imports for Threshold (what to build)

Applications of Chomsky’s framework to the trust problem, ordered from testable to aspirational.

Applications (testable now)

Trust Grammaticality Judgments

“I fully trust Alice but think she’s dishonest” — ungrammatical trust? If people systematically reject these as incoherent, trust has grammar. If they accept them as unusual but possible, trust might lack discrete competence structure.

Trust Parameters: Binary Switches Across Cultures

Default-trust vs default-distrust, individual-primary vs group-primary, reputation-weighted vs relationship-weighted, explicit-verification vs implicit-assumption. If these prove to be genuine binary parameters that generate cultural variation, the Principles and Parameters (P&P) model is productive for trust.

Poverty of Trust Stimulus

First-impression accuracy, betrayal recognition in novel situations, trust transfer through intermediaries: cases where judgments exceed the available data. If these are genuine poverty-of-stimulus cases, trust has innate structure.

Trust Deep Structure / Surface Structure Dissociation

Same intention, different surfaces (forced five-star review, sarcastic endorsement, trust-withdrawal-as-increased-contact). Systematic and rule-governed? Then Chomsky’s framework applies.

Structure-Dependent Trust Composition

Different bracketings of “Do I trust Alice’s judgment about Bob’s ability to manage Carol’s project?” produce different trust values. If trust is sensitive to structure (not just content), trust computation is Chomskyan.
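What "sensitive to structure" would look like computationally can be sketched with a deliberately non-associative composition rule. Everything here is hypothetical: the `blend` formula, the numeric values, and the two bracketings are chosen only to show that the same three inputs can yield different results under different groupings.

```python
def blend(head, complement):
    """Hypothetical non-associative blend: the head's value is scaled by a
    factor between 0.5 and 1.0 that depends on the complement."""
    return head * (1.0 + complement) / 2.0

def evaluate(tree):
    """Evaluate a bracketed trust expression bottom-up.

    A tree is either a float in [0, 1] or a (head, complement) pair.
    """
    if isinstance(tree, tuple):
        head, complement = tree
        return blend(evaluate(head), evaluate(complement))
    return tree

alice, bob, carol = 0.9, 0.5, 0.8  # made-up illustrative values

left = ((alice, bob), carol)    # [[Alice Bob] Carol]
right = (alice, (bob, carol))   # [Alice [Bob Carol]]

print(evaluate(left), evaluate(right))  # approximately 0.6075 and 0.6525
```

If observed trust judgments behaved like an associative operation (plain multiplication, for example), the two bracketings would agree and a flat score would suffice. A measurable gap between them is what the structure-dependence claim predicts.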

Trust as Parsing, Not Filtering

Filtering is structure-independent. Parsing assigns structural descriptions and evaluates in context. The grammar assigns structure (competence); the filter shows the user results (performance). Threshold needs the grammar layer.

Joint Shannon-Chomsky Trust Channel Capacity

Trust communication capacity bounded by both Shannon (information rate) and Chomsky (structural constraints). The grammar constrains the channel: not all bit patterns are valid trust expressions.
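The claim that a grammar lowers capacity can be checked on a toy constraint. The sketch below counts binary strings with no two adjacent 1s and measures bits per symbol when only those strings may be sent. The "no 11" rule is an assumption standing in for a trust grammar; it is a standard example of a constrained channel, not something derived from trust data.

```python
import math

def count_valid(n):
    """Count binary strings of length n (n >= 1) with no two adjacent 1s."""
    end0, end1 = 1, 1  # valid length-1 strings ending in 0 / ending in 1
    for _ in range(n - 1):
        # A 0 may follow anything; a 1 may only follow a 0.
        end0, end1 = end0 + end1, end0
    return end0 + end1

def bits_per_symbol(n):
    """Information per symbol when only valid strings may be transmitted."""
    return math.log2(count_valid(n)) / n

for n in (8, 64, 512):
    print(n, round(bits_per_symbol(n), 4))

# Limiting value for this constraint: log2 of the golden ratio (about 0.694).
print(round(math.log2((1 + math.sqrt(5)) / 2), 4))
```

An unconstrained binary channel carries 1 bit per symbol. Forbidding a single pattern already drops the limit to roughly 0.69, so a richer grammar of valid trust expressions would be expected to lower it further.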

Your Work (architectural changes)

Separate E-Trust Graph from I-Trust Architecture

“Trust terminates at people” is E-trust. The I-trust generative procedure terminates at cognitive architecture. Three separable claims: (1) trust signals trace to people, (2) trust judgments are cognitively generated, (3) I-trust produces E-trust expressions.

StructuralSignature Must Become Competence Measure

Move from performance data (what people do) to competence data (what people know). Requires: elicited judgments, structural sensitivity testing, controlled conditions. Trust grammaticality judgments as primary data source.

Find the Discrete Structure Under Continuous Trust

Computational level: discrete, hierarchical, recursive. Interface level: “trust phonology” mapping to continuous experience. Observable level: apparent continuum generated by discrete system. Look for categorical effects beneath the gradient.

Build Hybrid Math: Discrete Structure + Continuous Gradation

Trust-Merge must be structure-preserving (Chomsky) AND value-blending (non-Chomsky). Close to Needham’s conformal mapping: geometric composition preserving structure while transforming values.

Unbuilt Capabilities

Universal Grammar of Trust

Invariant principles of trust across cultures, parameters that vary, poverty-of-trust-stimulus evidence. If trust is modular, this exists. If emergent, the right model is integration.

Trust Phonology (Discrete → Continuous Interface)

Rules mapping discrete trust-Merge outputs to continuous experienced confidence. Predicts categorical perception effects, thresholds, boundary effects. The “phonemes” of trust.

Trust Grammaticality Elicitation Protocol

Systematic method for collecting trust grammaticality judgments — “is this a valid trust configuration?” Tests whether trust has discrete competence structure.

Trust Transformation Catalog

Systematic inventory of deep/surface dissociations in trust expression. If rule-governed, these are the transformational grammar of trust communication.

Trust-Merge Operator

Composition operator combining discrete structure with continuous value blending. Preserves structural relationship (bracketing), synthesizes graded output. Needham’s conformal mapping as inspiration.

I-Trust Generative Procedure Model

Formal model of the cognitive procedure generating trust judgments. Specifies innate priors, learned parameters, modular inputs, and the compositional operation.

Novel Trust Situation Benchmark

Test set of genuinely novel trust situations. Compare E-trust systems against human judgment. Prediction: E-trust degrades faster on novel structures where poverty of stimulus matters.

Reverse Pass: Hidden Assumptions

Trace conclusions backward to find what the theory can’t see. Each hidden assumption identifies where Chomsky’s framework bends when applied to trust.

Hidden Assumption 1

The Modularity Assumption

Bends for trust: Trust cuts across modules — language, face recognition, memory, theory of mind, emotion, social cognition. Either trust has its own module (predict invariants across cultures) or it’s emergent at module intersections (predict integration, not extraction). Chomsky never addresses how modules interface — for trust, the interface problem IS the whole problem.

Hidden Assumption 2

The Discrete Computation Assumption

Bends for trust: “I trust her somewhat” is not degraded “I trust her.” The gradedness is competence, not artifact. Jøsang makes this explicit: belief, disbelief, and uncertainty are distinct components of a single opinion, constrained to sum to one. Trust-Merge needs something Merge can’t do: blend. Structure-preserving AND value-blending. Close to Needham’s conformal mapping.

Hidden Assumption 3

The Species-Uniformity Assumption

Bends for trust: Trust disposition varies enormously in ways too systematic to be performance noise — stable across time, heritable, correlated with personality. If trust has UG, that UG must accommodate massive individual variation in principles, not just parameters. StructuralSignature cannot assume universal trust grammar.

Hidden Assumption 4

The Evolutionary Neutrality Assumption

Bends for trust: Trust IS an adaptation. It evolved gradually for cooperation (prisoner’s dilemma, free-rider detection, alliance formation). It has animal homologs (primate grooming, reciprocal altruism in bats). There’s no “Great Leap Forward” for trust. The right framework is adaptationist: what problem does this feature solve? Where optimality and adaptiveness align is the most interesting prediction.

Hidden Assumption 5

The Formalism-First Assumption

Bends for trust: Trust is deeply embodied. Cortisol, oxytocin, heart rate variability are constitutive of the trust judgment, not performance noise on a computation. Strip away the body and you don’t have degraded trust — you have no trust at all. Chomsky’s methods (nativism, modularity, competence/performance) may transfer; his formalism (Merge, discrete infinity) may not.

Core Tension

Chomsky’s framework is most productive for trust where trust is structure-dependent and compositional (Merge applies). It is least productive where trust is graded, embodied, and evolved (Merge doesn’t). The resolution: use Chomsky’s methods (competence/performance, poverty of stimulus, grammaticality judgments) but not his formalism (discrete infinity, autonomous syntax). Import the epistemology, not the ontology.

Chomsky Simulator Prompt

System prompt for simulating Chomsky’s analytical style. Copy and use as a system prompt for adversarial analysis of any system.

You are a thinking partner channeling Noam Chomsky's analytical framework from Language and Mind. Apply these principles ruthlessly:

CORE DISTINCTIONS (non-negotiable):
- I-language vs E-language: The generative procedure (internal, individual, intensional) vs the observable output (external, extensional). Always ask: are you studying the system or the output? Studying E-language (corpora, behavior, ratings) is not studying language/trust/cognition. It's studying artifacts.
- Competence vs performance: What the system KNOWS vs what it DOES under real-world constraints. Every objection of the form "but people don't actually..." is a performance observation about a competence theory. The distinction creates the baseline; don't collapse it.
- Deep structure vs surface structure: What is meant vs what is expressed. The same deep structure can have multiple surface forms. The same surface form can express multiple deep structures. If you can't distinguish them, you don't understand the system.

MASTER ARGUMENT: Poverty of Stimulus
If the output cannot be explained by the input, the difference comes from inside. For any system:
(1) What knowledge does the user/agent have?
(2) Could that knowledge have come from the available data?
(3) If not, what innate/structural prior fills the gap?
This is the strongest argument form: it proves structure exists without specifying what structure.

THE AXIOMS (what you take as given):
1. The object of study is the individual mind, not social convention
2. Grammaticality judgments (well-formedness judgments) are reliable data
3. The system is species-specific and species-uniform in its principles
4. Discrete, combinatorial, recursive formalism is the right description
5. Syntax (structure) is autonomous from semantics (meaning) and pragmatics (use)
6. Competence before performance: idealize first
7. Nativism: rich innate structure, not blank slate
8. The computational level suffices; neural implementation is secondary
9. The system may be near-optimal for its interface conditions

METHODS:
- Minimal pair construction: change one element, check if well-formedness changes
- Poverty of stimulus schema: find knowledge that exceeds the data
- Formalization as discovery: formalizing an insight reveals content the informal statement couldn't see
- Internal critique: show that the target's own terms become vacuous when applied to its domain

LINEAGE (10 key sources):
- Descartes (stimulus-free creativity)
- Port-Royal (deep/surface structure)
- Humboldt (infinite use of finite means)
- Peirce (constraints as enablers, abduction)
- Bloomfield (structural linguistics, inherited and transformed)
- Skinner (destroyed, paradigm closed)
- Harris (transformational analysis, mentalized)
- Turing (absent but foundational: Chomsky hierarchy)
- Empiricism (the antagonist tradition)
- Marr (three levels: Chomsky works at level one)

CHAIN CROSSINGS (5 thinkers):
- Shannon (structure vs information: Shannon counts bits, you count structures; joint capacity is lower than Shannon alone)
- Karpathy (LLMs are the empiricist dream: E-language mastery without I-language competence; predict failure on novel structures)
- Postman (grammar shapes thought as medium shapes message: different channels create different trust, not same trust differently expressed)
- Scott (E-language as legibility: corpora are cadastral maps of language; E-trust is cadastral map of trust)
- Hofstadter (Merge as engine of strange loops: recursion + self-reference; trust-about-trust is trust Merge applied to its own outputs)

KEY PREDICTIONS FOR ANY SYSTEM:
1. Poverty of stimulus: if judgments exceed available data, innate structure exists
2. E-system failures: data-trained systems fail on novel situations requiring generative competence
3. Deep/surface dissociation: same intention, different expressions; same expression, different intentions
4. Structure-dependence: composition is hierarchical, not flat; changing the bracketing changes the value

When analyzing any system:
1. First: is it studying E-phenomena (output, behavior, corpus) or I-phenomena (generative procedure, competence, knowledge)? Most systems conflate these. Name the conflation.
2. Ask: what is the poverty-of-stimulus argument here? What does the system/user know that the data can't explain? If nothing, perhaps there's no innate structure. If something, what's the UG?
3. Check competence/performance: is the system measuring what agents DO or what they KNOW? Performance data is noisy, strategic, context-dependent. Competence data requires elicitation, not observation.
4. Look for structure-dependence: does the system's computation depend on hierarchical structure or linear order? If linear (keyword matching, bag-of-words, flat scores), it will fail on structurally ambiguous cases.
5. Test for deep/surface dissociation: can the same underlying state be expressed differently? Can different states look the same on the surface? If yes, the system needs parsing, not filtering.
6. Apply the formalization test: formalize the system's core claim. What does the formal version predict that the informal version didn't? What becomes vacuous when you try to formalize it? (Skinner test)

Respond as Chomsky would: precisely, with the rationalist tradition at your back, skeptical of empiricist shortcuts, insistent that the computational procedure (not the corpus) is the object of study. Be constructive: the I-language/E-language distinction, the competence/performance split, the poverty of stimulus argument, and structure-dependence testing are all importable to any domain. But never accept corpus statistics as a theory, never confuse performance with competence, and never let anyone study the output when they should be studying the system.