Why is breadth-first numbering hard?

John Launchbury gave Chris Okasaki an annoying puzzle:

Given a tree T, create a new tree of the same shape, but with the values at the nodes replaced by the numbers 1 .. |T| in breadth-first order.

Go ahead and solve it. I'll wait.

If you want to solve it functionally, I'll wait longer.

Chris posed this puzzle to many functional programmers, and found that they had a surprisingly hard time with it. They took a long time to solve it, and their solutions were seldom elegant. He came up with various hypotheses as to why: did the programmers not know breadth-first traversal or queues? Did they prematurely commit to lists or pattern matching? He didn't seem to find any of them convincing. Neither do I.

One hypothesis he didn't mention is that most functional programmers see a recursive data structure and immediately try to process it by straighforward structural recursion, with a call tree isomorphic to the data structure. When you have many tools, and you encounter a nail, you reach for your hammer, right? But in this case structural recursion is the wrong tool, and it takes a while for programmers to backtrack far enough to notice.

It may take even longer for them to identify the right tool. Queues, like hashtables, are a little awkward for functional programmers, because their most natural implementations are stateful, as are many of their applications. They're almost always used linearly (i.e. there's only one version of the queue at a time), so eschewing state buys no useful flexibility, and incurs the extra hassle of explicitly passing the updated queue around. It also prevents using the efficient circular-buffer representation, just as it usually prevents using hashtables.

They're also a little awkward to use in functional languages, because none of the most familiar and widely implemented functional data structures (lists, tree dictionaries, tree sets, tries) is easily used as a queue, so would-be queue users must look up a queue library, or build one, or use pairs of lists (if they know this trick), or use some inappropriate data structure, or give up and use some other algorithm. Which is what most of Chris's subjects did.

Meanwhile, Java users use its ordinary LinkedList class (which is a doubly-linked list, and thus a reasonably efficient deque) to win contests without having to worry about any of this. Can your functional language do as well?

3 comments:

  1. Finger trees are another possibility for functional deques. David Wortman and I will be publishing a SRFI that provides deque, set, and map facades over finger trees.

    ReplyDelete
    Replies
    1. I would also mention finger trees after reading this blog article. The Data.Sequence from Haskell (a finger-tree sequence) is the first structure I reach for whenever I want array-like or deque-like properties.

      I grant the author's claim that "none of the most familiar and widely implemented functional data structures (lists, tree dictionaries, tree sets, tries) is easily used as a queue" might have been true at some point prior to 2009. These days the finger tree sequence is available in every major functional language, and is much nicer than the doubly linked list for slicing, splits, access to the middle, etc.

      Delete
  2. I went ahead and attempted the puzzle before reading Okasaki's elegant solution. While queues make for easy BFS traversals, it wasn't immediately clear to me how to rebuild the tree after. I ended up using a different approach: transpose the tree into a sparse vector (represented via IntMap) such that nodes are numbered in a level ordering, then accumulate over the level ordered representation, then transpose back.

    ReplyDelete

It's OK to comment on old posts.