Analysis: "A Sufficiently Detailed Spec is Code"

Blog post: https://haskellforall.com/2026/03/a-sufficiently-detailed-spec-is-code

Core Thesis

The post argues that the agentic coding movement's promise — that engineers can simply write specification documents and have AI agents generate working code — is fundamentally flawed. The central claim is captured in the title: if you make a specification precise enough to reliably generate correct code, the specification itself effectively becomes code. There is no shortcut that avoids the hard intellectual work of programming.


Structured Outline

1. Introduction (Comic Strip Framing)

  • References a well-known comic strip illustrating the absurdity of "just generate code from specs."
  • Notes that agentic coding advocates have muddied the waters enough to warrant a deeper treatment.

2. Two Core Misconceptions

Misconception 1: Specification documents are simpler than the corresponding code.

  • Agentic coding advocates market the idea that engineers become "managers" who write specs and farm work out to AI agents.
  • In reality, making a spec precise enough to reliably produce a working implementation forces the spec to converge toward code — either literally becoming pseudocode/code or highly formalized English.
  • Cites Dijkstra's insight: defining an interface doesn't just divide labor — it creates additional work (the work of defining the interface itself). A spec detailed enough to be unambiguous is at least as complex as the code it describes.

Misconception 2: Specification work is inherently more thoughtful than coding.

  • Traditionally, writing specs before coding encourages a contemplative, critical lens.
  • However, the industry push to reduce labor and optimize for delivery speed undermines thoughtfulness. If spec writing is treated as "easier than coding," the resulting specs will be sloppy and superficial.
  • Thoughtful specification work is hard — potentially harder than coding — and current incentives actively work against doing it well.

3. Case Study: Symphony

  • Examines a real-world example called "Symphony" — a specification document used to demonstrate agentic coding.
  • Points out that Symphony's spec:
    • Contains embedded code/pseudocode, undermining the claim that specs replace code.
    • Reads like AI-generated content (code snippets labeled as text in markdown — a sign the AI was asked to convert code into prose and did so superficially).
    • Is a "grab bag of specification-shaped sentences" lacking coherence, purpose, or understanding of the bigger picture.
    • Would necessarily be slop given the incentives — even if a human wrote it — because the goal is delivery speed over clarity.

4. Conclusion

  • Specifications were never meant to be time-saving devices. If you're optimizing for delivery time, you're better off writing code directly rather than going through an intermediate spec.
  • "Garbage in, garbage out" — a vague, unclear spec will produce vague, incorrect code. Coding agents are not mind readers.

Key Insights

  1. The Specification-Code Convergence Principle: There is an unavoidable convergence between sufficiently detailed specifications and code. This is not a new insight (Dijkstra articulated it decades ago), but it gains new relevance in the age of AI coding agents.

  2. Specs Add Work, They Don't Remove It: Defining an interface between human (spec writer) and machine (code generator) doesn't simply divide existing work — it creates new work: the work of precisely defining that boundary. This is the fundamental reason spec-to-code promises fail.

  3. The Incentive Problem: The current tech industry climate prioritizes speed and cost reduction. This is exactly the wrong environment for spec-driven development, which requires more thoughtfulness, not less. The result is that specs become performative documents rather than rigorous ones.

  4. AI-Generated Specs Are Self-Defeating: The Symphony example reveals a circular irony — the spec document itself appears to be AI-generated, meaning AI is being used to write the spec that AI will then use to generate code. This compounds the "garbage in, garbage out" problem.

  5. No Substitute for Human Cognition: The post ultimately defends the irreducible role of human engineering judgment. The hard part of software development is thinking clearly about the problem, and no document format or AI agent can bypass that.


Critical Nuances

  • The author doesn't reject specs entirely. He acknowledges that pseudocode and reference implementations are legitimate tools in specification work. His objection is specifically to the claim that specs replace code.
  • The critique targets the marketing, not the tools. The post is less about whether AI coding tools are useful and more about how agentic coding is being sold — with misleading promises that obscure the actual work involved.
  • Dijkstra's interface argument is foundational. The post grounds itself in a well-established computer science principle rather than anecdotal complaints, giving it stronger intellectual weight.
  • The footnote is revealing: The observation that Symphony's code snippets are annotated as text (suggesting an AI converted a code-heavy draft into a "prose specification") exemplifies how the appearance of specification rigor can be manufactured without substance.

Comment by @cedrickchee (gist author):

Critical Nuances

The discussion reveals that both "sides" are making partially valid points. The article's strongest argument is that as you add precision to a specification, it asymptotically approaches code — this is almost tautologically true and connects to deep results in computer science and formal methods. But commenters correctly note that this doesn't mean specs are useless or that LLMs provide zero value. The practical reality most practitioners describe is a middle ground: humans do the hard conceptual/architectural work, define interfaces and data structures, and use LLMs to fill in the mechanical implementation details — essentially using the LLM as a very high-level compiler with no semantic guarantees, while maintaining human oversight for correctness.

The 640 points and 331 comments the discussion attracted signal that this article hit a nerve in the developer community, tapping into genuine anxiety about the role of developers in an AI-augmented future and real frustration with overblown agentic coding marketing claims.


The Article's Core Thesis

Gabriella Gonzalez (a well-known Haskell developer) argues that the promise of "agentic coding" — where you write a natural language specification and an AI agent generates working code from it — is fundamentally misconceived. Her argument rests on two claimed misconceptions held by agentic coding advocates:

Misconception 1: Specification documents are simpler than the corresponding code. She demonstrates, using OpenAI's own "Symphony" project (an agent orchestrator), that when you try to make a spec precise enough to actually generate working code, the spec inevitably becomes code. Symphony's SPEC.md contains field-by-field data structure definitions, pseudocode algorithms, backoff formulas, and explicit concurrency control logic — essentially code written in awkward prose. She invokes Dijkstra's argument that you cannot escape "narrow interfaces" (i.e., the formal precision of code) no matter what medium you write in.
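To make the convergence concrete, consider a hypothetical retry rule written in careful English: "wait `base` seconds before the first retry, double the wait on each later retry, and never wait longer than `cap` seconds." The Python sketch below is an illustration only. The rule, the function name `backoff_delay`, and the default values are assumptions made for this example and are not taken from Symphony's SPEC.md.

```python
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    Hypothetical rule for illustration: start at `base`, double on every
    later retry, never exceed `cap`.
    """
    if attempt < 1:
        # The English sentence never says what happens here; code has to decide.
        raise ValueError("attempt must be >= 1")
    # "double the wait on each later retry" -> base * 2 ** (attempt - 1)
    # "never wait longer than cap seconds"  -> min(cap, ...)
    return min(cap, base * 2 ** (attempt - 1))
```

Each clause of the sentence maps to one expression in the function, and the function still had to settle details the sentence left open (whether the first retry is attempt 0 or 1, what to do with invalid input). Once the prose answers those questions too, it is about as long and as rigid as the code.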

Misconception 2: Specification work must be more thoughtful than coding work. She argues that AI-generated specs (like Symphony's) read as "slop" — grab bags of specification-shaped sentences lacking coherence and thoughtful design. When specs are optimized for machine consumption rather than human understanding, they lose the very quality that made specifications valuable in the first place.

She also shows that the workflow is flaky in practice: she tried Symphony's stated workflow (feeding its SPEC.md to Claude Code to generate a Haskell implementation) and found multiple bugs, including the model not knowing it needed to create a git repository. She references Borges' allegory of the 1:1 map to argue that a spec detailed enough to eliminate ambiguity would grow to the size of the code itself.


Key Discussion Themes from the Comments

1. The "LLMs Do Fill In Details" Counterargument

The top-voted comment (bad_username) pushes back on the article's absolute framing. LLMs can reliably generate small amounts of working code from terse descriptions — that's precisely why they're popular. They use vast training data to interpolate plausible detail. However, this commenter concedes the core point: when the interpolated detail is wrong (and it's non-deterministic), reliable results require constraining details to spec-level precision. Multiple respondents (Someone, lmm, skywhopper) refine this further — LLMs draw from training data patterns, not genuine understanding, so they work well for common/boilerplate code but break down for novel or complex requirements.

2. The Specification Gap and Formal Methods

A particularly insightful thread (agentultra) brings up "program synthesis" and formal tools such as Synquid (a program synthesizer) and TLA+ (a specification language). The argument: mathematically precise specifications already exist, and proving that a program implements a specification faithfully is itself a hard problem (the "specification gap"). Natural language is simply not precise enough to define a program. Just because an LLM produces code that appears to implement a spec doesn't mean it actually does; you may have just gotten lucky.
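The "gotten lucky" point can be sketched in a few lines of Python. This is a toy illustration, not anything from the thread: `buggy_sort` and `satisfies_spec` are hypothetical names, and the "spec" is the usual informal one for sorting (the output is in order and contains exactly the input's elements).

```python
from collections import Counter

def buggy_sort(xs: list[int]) -> list[int]:
    """Looks like a sort, but silently drops duplicate values."""
    return sorted(set(xs))

def satisfies_spec(xs: list[int], ys: list[int]) -> bool:
    """Check the sorting spec: ys is ordered and is a permutation of xs."""
    ordered = all(a <= b for a, b in zip(ys, ys[1:]))
    same_elements = Counter(xs) == Counter(ys)
    return ordered and same_elements

# A spot check on one input passes, so the code "appears" to meet the spec.
looks_fine = satisfies_spec([3, 1, 2], buggy_sort([3, 1, 2]))

# An input with a repeated value exposes the gap.
actually_fine = satisfies_spec([2, 1, 2], buggy_sort([2, 1, 2]))
```

Here `looks_fine` is true and `actually_fine` is false: a handful of passing examples says little about whether the implementation meets the specification on all inputs, which is the gap formal methods try to close.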

3. Code vs. Spec: Essential vs. Accidental Complexity

Several commenters (d--b, cush, lowbloodsugar) draw the distinction between essential and accidental complexity. Code contains enormous amounts of accidental complexity — memory layout, framework boilerplate, infrastructure scaffolding. One commenter estimates typical projects are 90% scaffolding and only 10% business logic. The spec, by definition, should capture only the essential complexity. This challenges the article's premise that specs and code converge: a well-written spec could in theory be much shorter than the implementation because it doesn't need to deal with framework details, database technology, or communication layers.

4. The Compiler Analogy

An interesting thread (lowbloodsugar) asks: Isn't this just a compiler? You don't specify which registers to use in C, yet the code runs. Is an LLM just an even higher-level compiler? Others push back — the critical difference is that a C compiler provides semantic guarantees (deterministic transformation preserving meaning), while an LLM does not. If a compiler gets something wrong, you get bad performance; if an LLM gets something wrong, it might delete accounts or credit the wrong users.

5. Real-World Practitioner Reports

Several commenters share their practical workflows. Trane_project describes a structured approach: define data structures and function signatures yourself, write test signatures, then hand off the mechanical implementation to the agent. Kstenerud describes an elaborate multi-phase process involving brainstorming with the LLM, building high-level designs, doing market research, and creating technical design documents before any code generation. Jumploops notes that in practice, spec docs for agentic engineering are often longer than the code itself, and serve as a second source of truth for maintaining behavior across iterations.
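The "signatures first" workflow that Trane_project describes might look roughly like the Python sketch below. All names here (`Task`, `pending_titles`, `test_pending_titles`) are hypothetical and chosen only for illustration. The human writes the data structure, the function signature with its docstring, and the test. The function body is the part that would be handed to the agent, and a simple reference body is filled in here only so the sketch runs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    """Human-authored data structure: the shape of the data is fixed up front."""
    title: str
    done: bool = False

def pending_titles(tasks: list[Task]) -> list[str]:
    """Return the titles of unfinished tasks, preserving input order.

    Human-authored signature and contract. In the described workflow the
    body would be delegated; this one is a stand-in for illustration.
    """
    return [task.title for task in tasks if not task.done]

def test_pending_titles() -> None:
    """Human-authored test that pins down the expected behavior."""
    tasks = [Task("write spec"), Task("ship", done=True), Task("review")]
    assert pending_titles(tasks) == ["write spec", "review"]
    assert pending_titles([]) == []

test_pending_titles()
```

The types, signature, and test carry most of the precision in this arrangement, which is consistent with the article's thesis: the parts the human keeps are already code.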

6. The "Brownfield Problem"

Angry_octet raises a point about brownfield development: once you start using a particular implementation, its interfaces become the de facto spec. Regenerating code from scratch isn't acceptable in real systems because downstream consumers depend on specific behaviors. LLMs may fare poorly in brownfield environments where the "spec" is the existing codebase's surface area.

7. Brooks' "No Silver Bullet" Parallels

Multiple commenters (hintymad, smartmic) connect this directly to Fred Brooks' classic 1986 paper "No Silver Bullet." The argument that you can't eliminate essential complexity through better tooling is decades old. However, hintymad notes that many people don't actually need that level of detail — when they say "write me a to-do list app," they mean "write me one better than what I've imagined so far," which doesn't require a detailed spec. This leads to debate about whether most commercially valuable software is truly standard/reproducible or genuinely novel.

8. UI/UX and Non-Algorithmic Problems

Shebanator makes the point that the article's thesis works well for purely algorithmic problems, but many real-world problems (especially UI/UX) don't have solutions that fall neatly into spec-like categories. Asking someone to produce a sufficiently detailed spec for Adobe Photoshop illustrates the absurdity.

9. The Safety-Critical Perspective

Randusername offers the safety-critical engineering view: specifications mean requirements, not implementation details. Requirements are expectations about what code does — the contract developers are held accountable to. Putting implementation details in requirements is a "rookie mistake" because it takes agency away from the engineers. This reframes the entire debate: if your "spec" contains implementation details, it's not a spec — it's pseudocode.

10. The "Spec → LLM" vs "LLM → Spec" Direction

Sornaensis argues the interesting direction is reversed: use LLMs to help write the spec, not to consume it. Better languages that can validate/compile specs and communicate failures back to the LLM are what will win. Trying to create "validated English" just adds complexity that distracts from the actual work.

11. The Haskell Language Choice Controversy

The author's choice to use Haskell drew significant pushback (mike_hearn, rytis). Haskell's lazy pure-functional paradigm with monads for IO is dramatically different from mainstream languages, and LLMs have far less training data for it. The author (Gabriella439) responds that this is actually the point — if agents can't reliably transfer concepts across languages, it suggests they aren't truly generalizing beyond their training data.
