Literate programming tools are mostly obsolete

Literate programming tools are utopian technology, like Plan 9 and Lisp machines: they appear often in descriptions of how the world ought to be, but rarely in actual use.

There's a reason for this. Literate programming tools traditionally have four features:

  1. Comments by default: text is interpreted as comments unless explicitly marked as code.
  2. Reordering: there is a macro system that allows writing code in a different order than that required by the language.
  3. Markup: comments can contain more than just text — they may contain markup in some language like Markdown or HTML, perhaps with additional language-specific features as in Javadoc.
  4. Typesetting: source and comments can be cross-referenced and printed neatly in HTML (or occasionally in print formats like TeX or PDF, but at the moment HTML is the one that matters).

Only one of these features is of much use.

In the literate ideal, programs have more comments than actual code, so making them the default saves typing. This level of commenting might be plausible for APL or hairy assembly code or mathematical wizardry, but none of these are common. Real code, even very good real code, usually has much more code than comments, so while making comments the default might be justified as a device to encourage comments, it doesn't make programs shorter.
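To make "comments by default" concrete, here is a minimal sketch in Python. The `"> "` code marker and the name `split_literate` are assumptions chosen for illustration, not the syntax of any particular tool: every line is treated as commentary unless it starts with the marker.

```python
def split_literate(source: str, marker: str = "> ") -> tuple[list[str], list[str]]:
    """Split comments-by-default text into (code_lines, comment_lines).

    The marker is a hypothetical convention: a line is code only if it
    starts with it; every other non-blank line is commentary.
    """
    code, comments = [], []
    for line in source.splitlines():
        if line.startswith(marker):
            # Explicitly marked as code: strip the marker, keep indentation.
            code.append(line[len(marker):])
        elif line.strip():
            # Unmarked, non-blank text is a comment by default.
            comments.append(line)
    return code, comments


example = """\
Compute the area of a circle.
The radius is assumed to be non-negative.

> def area(r):
>     return 3.14159 * r * r
"""

code, comments = split_literate(example)
```

In a file like this one, where prose and code are about equal, the marker costs little. In a file that is mostly code, nearly every line would carry it, which is the sense in which the default doesn't save typing.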

The original literate programming system, Knuth's WEB, was for Pascal, which is strict about the order of code; in particular it expects all definitions to appear before they're used. It also has no macros and no lambda, so it's limited in its ability to factor out code even when order is not a problem. WEB's macros address both problems, making Pascal much more flexible. Most modern languages, however, are much less rigid and more expressive, so they don't need the help.
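As a rough illustration of what a reordering macro system does, here is a small "tangle" sketch in Python. The chunk notation (`<<name>>=` to start a named chunk, `@` to end it, `<<name>>` to reference it) is an assumption loosely modeled on noweb-style tools and is not WEB's actual syntax. The function names are likewise hypothetical.

```python
import re

# Hypothetical notation, chosen for this sketch only.
CHUNK_DEF = re.compile(r"^<<(.+)>>=$")      # starts a named chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>$")  # references a chunk by name


def parse_chunks(source: str) -> dict[str, list[str]]:
    """Collect the lines of each named chunk; text outside chunks is prose."""
    chunks: dict[str, list[str]] = {}
    current = None
    for line in source.splitlines():
        m = CHUNK_DEF.match(line)
        if m:
            current = m.group(1)
            chunks.setdefault(current, [])
        elif line == "@":
            current = None  # end of chunk, back to prose
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks: dict[str, list[str]], name: str, indent: str = "") -> list[str]:
    """Expand chunk references recursively (no cycle detection in this sketch)."""
    out = []
    for line in chunks[name]:
        m = CHUNK_REF.match(line)
        if m:
            out.extend(tangle(chunks, m.group(2), indent + m.group(1)))
        else:
            out.append(indent + line)
    return out


source = """\
The main program is presented first, for the reader's benefit.
<<main>>=
<<helpers>>
print(double(21))
@
The helper is explained afterwards, though it must be defined earlier.
<<helpers>>=
def double(x):
    return 2 * x
@
"""

program = "\n".join(tangle(parse_chunks(source), "main"))
```

Here the document presents `main` before `helpers`, but the tangled `program` puts the helper definition first, which is the order a define-before-use language would require.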

Markup sounds like it would help with readability of comments. Unfortunately, most of what we have to say in comments is simple prose, which markup can't improve much. When it is used, markup (unless it's very lightweight, like Markdown) adds enough clutter to make comments less readable in source form — and almost all reading of comments is in that form. The architects of Utopia may wish we saw our code in typeset form, but we mostly see it in our editors.

Typesetting, at least, has been a great success — so much so that editors have absorbed it. Today, most programmers consider an editor primitive if it doesn't have language-aware autoindent and syntax-coloring, and many also expect name completion, type hints and the ability to jump to definitions. Though originally intended for fancy printouts, typesetting and cross-referencing are now considered too important to postpone to compile time; we expect them while editing. Only documentation is still generated by a separate pass.

I don't see literate programming as a failure; it explored the important question of how tools can make programs more readable. But the best answers to that question turned out to be embodied in languages, not literate programming tools. It wasn't obvious a priori that readability matters more when editing code than when publishing it, nor that it was simpler to make languages more expressive than to add flexibility in another layer of tools. These things are obvious in retrospect, but it took experience with tools to discover them. Rest in peace, WEB: you taught us not to need you any more.

1 comment:

  1. From what I understand, (semi-)literate Haskell is quite important in the Haskell community. The simple "Bird style" (any line beginning with ">" is code, anything else is comments) and the fact that there is compiler support for both Bird style and LaTeX (where code is wrapped in \begin{code}...\end{code} blocks) probably make the difference. Since Haskell has few ordering restrictions, there's no need to support reordering, but it has the rest.
