Functional programmers know all too well that left-associated concatenation of cons-lists can be a source of inefficiency. For example, if we perform a left-associated concatenation of n singleton cons-lists, it would take O(n²) to force the entire result list, and O(n) just to force the first element. To illustrate the problem, recall that head and ++ are defined as

head :: [a] -> a
head (x:_) = x
head [] = error "empty list"

(++) :: [a] -> [a] -> [a]
[]     ++ ys = ys
(x:xs) ++ ys = x : xs ++ ys  -- (++) has a higher precedence than (:)

Therefore

  head ((([1] ++ [2]) ++ [3]) ++ [4])
= head (((1 : ([] ++ [2])) ++ [3]) ++ [4])
= head ((1 : (([] ++ [2]) ++ [3])) ++ [4])
= head (1 : ((([] ++ [2]) ++ [3]) ++ [4]))
= 1

This shows that it takes 4 steps to force the first element of the left-associated concatenation of 4 singleton lists. To force the next element, we’d need to continue to force the head of ((([] ++ [2]) ++ [3]) ++ [4]). It follows that forcing the entire list takes O(n²).

One might be wondering, why not use right-associated concatenation instead, which gives O(1) access to the first element and O(n) access to the whole list? Because left-associated concatenation is not always avoidable. If you have a list of lists to concatenate, you’d obviously choose to use right-associated concatenation. But any time you concatenate two lists, the first list may be the result of another concatenation, and that concatenation may have been done somewhere else.

We obviously would like to bring down the cost of left-associated concatenation to O(n). As to accessing the first element, it is not possible to do better than O(n) in the worst case, since in a left-associated concatenation, the first element of the result is in the deepest layer of the expression tree which has O(n) layers. However, it is desirable if accessing the first element takes O(1) amortized time. This is already the case with regular list concatenation: although accessing the head of (([1] ++ [2]) ++ [3]) ++ [4] takes 4 steps, it also takes as many steps to construct this expression. The cost of the former can thus be amortized to O(1). And once we reduce the above expression to 1 : ((([] ++ [2]) ++ [3]) ++ [4]), accessing the first element again takes constant time. In bringing down the cost of concatenation, it is desirable to avoid compromising the performance of inspecting the front of the list.

To this end, two approaches are worth discussing: difference lists, and catenable lists. As we shall see, difference lists support efficient concatenation, but not efficient inspection. Inspecting the head of a difference list is an O(n) operation. Catenable lists, on the other hand, supports both efficient concatenation and efficient inspection. Catenable lists are first presented in the famous book, Purely Functional Data Structures.

The name “catenable list” is slightly misleading, especially when introduced together with difference lists, because catenable lists are not just catenable, but also inspectable, and being inspectable is an important distinction against difference lists. The reason it is called “catenable list” is probably because lists are by default inspectable. I would probably call it “catspectable list”. Anyway, I’ll stick to the standard name.

Both approaches can be generalized to structures other than lists, such as binary trees and monads, which will also be discussed in this post.

Difference Lists

Difference lists, often referred to as DList, is a fairly well known technique for fast list concatenation. There’s a good number of articles introducing difference lists, so I won’t go into the details. From a high level, difference lists can be understood in a number of ways:

Difference lists effectively turn list concatenation into function composition, and empty list into the identity function. Functions can be composed in constant time.
Difference lists delay the concatenation by storing the to-be-concatenated lists as functions, and only perform the actual concatenation at the end, when a difference list is converted to a regular list.
Difference lists build up right-associated concatenation expressions, rather than actually concatenating the lists.
Difference lists is an example of continuation-passing style (CPS). It turns a regular list, [a], into a continuation, [a] -> [a]. At the term level, a regular list xs is turned into a continuation, (xs ++). A continuation can be applied to the empty list to recover the regular list.

Choose whichever one (or ones) that works the best for you.

An Issue with Difference Lists

Difference lists are extremely useful, but it suffers from an issue hinted earlier: a difference list built up via left-associated function composition isn’t efficient if we need to repeatedly inspect the front of the list. For instance, consider the following difference list:

((([1] ++) . ([2] ++)) . ([3] ++)) . ([4] ++)

To inspect the head of this difference list, we need to apply this function to the empty list, which gives us a regular list, and then reduce it to weak head normal form, i.e., 1 : _. This takes O(n) steps. The problem is that if we need to inspect the head again, we’d have to repeat these steps, which takes another O(n) steps. Of course, if this is the final difference list we want (i.e., we are done with concatenating to it), we can convert it into a regular list, and then inspecting the head repeatedly wouldn’t be a problem. But it can be a problem if we alternate between concatenation and inspection.

Here’s an extreme example:

import qualified Data.DList as D
import Data.List (foldl')

longList :: [Int]
longList = [2..1000000]

f :: Int -> [Int]
f n = take n . D.toList . foldl' g (D.singleton 1) $ map D.singleton longList
  where
    g xs ys
      | D.head xs == 1 = xs <> ys
      | otherwise = ys <> xs  -- unreachable

f n concatenates a million singleton difference lists, and take the first n elements of the result. At each step, it inspects the head of the current result, and if it is 1, it appends the next singleton list to the result (xs <> ys), otherwise it prepends the singleton list to the result (ys <> xs). This has very poor performance even for n = 1.

This is of course a silly example, because the head of the result is always 1, and we know that, so there’s no reason to keep checking it for a million times. It’s meant to be an extreme example. In practice it may not be as bad as this, but whenever you inspect the same head more than once, you are paying a price due to repeating the same steps.

Granted, this is quite unlikely to cause a real problem in practice, because for it to be a problem, you’d have to repeatedly inspect the front of a left-associated difference list many times, which is often possible to avoid. After all, if a difference list remains left-associated, the front of the list will remain the same, and there’s probably no need to repeatedly inspect it. Therefore difference lists remain extremely useful in practice, as long as you don’t do silly things with it like the example above. That being said, it is possible to overcome this issue using catenable lists.

Catenable Lists

The reason difference lists are inefficient for repeated inspection is because a difference list is basically a function, which needs to be applied to an argument (empty list) to get a value, before it can be inspected. And the reason to convert each list to a function is because we want to delay the actual concatenation. This may make one wonder: why even bother converting the lists to functions? If our goal is to delay the concatenation, why do we need to do anything at all to the lists when concatenating them? Can’t we just store the lists in some data structure, and concatenate them only when needed?

If this is what you are thinking, congratulations, you are spot on. Let’s try to implement this idea and see how it goes. We’ll use the following CList data type:

data CList a = Cnil | Ccons a [CList a]

A non-empty CList a has a head element a, and a tail. Unlike regular lists, the tail is not a CList a but a [CList a], containing a number of to-be-concatenated CLists. Let’s try to implement the append operator, <+>, as well as functions that convert a CList to and from a regular list.

data CList a = Cnil | Ccons a [CList a]

(<+>) :: CList a -> CList a -> CList a
Cnil <+> ys = ys
Ccons x xss <+> ys = Ccons x (snoc xss ys)

toRegularList :: CList a -> [a]
toRegularList Cnil = []
toRegularList (Ccons x xss) = x : concatMap toRegularList xss

fromRegularList :: [a] -> CList a
fromRegularList [] = Cnil
fromRegularList [x] = Ccons x []
fromRegularList (x:xs) = Ccons x [fromRegularList xs]

cHead :: CList a -> a
cHead Cnil = error "cHead: empty list"
cHead (Ccons x _) = x

cTake :: Int -> CList a -> CList a
cTake n = fromRegularList . take n . toRegularList

where snoc is defined as

snoc xs x = xs ++ [x]

and for now, pretend that snoc is O(1) (it is in fact O(n) for cons-lists).

As we can see, the <+> operator simply appends ys, the second CList, to the end of xss, the tail of the first CList. Since we are assuming snoc is O(1), it follows that <+> is an O(1) operation. It is easy to see that both toRegularList and fromRegularList take O(n), and both functions allow the result to be consumed lazily (since they both use guarded recursion).

So everything appears to be good, except that snoc is actually an O(n) operation for cons-lists. Thus cons-list is not a good choice for the tail of CLists. We can’t simply use snoc-lists either, because in order to convert CList a to [a] and access the front of [a] efficiently, the sequence type obviously needs to support efficient access to its front elements. Therefore, we need a sequence type that supports efficient snoc and uncons.

If this sounds like a queue, it indeed is. There exist purely functional queues that offer O(1) queue operations (even in the worst-case). Details can be found in Purely Functional Data Structures. Another good candidate is the Seq data type, which is backed by finger trees.

If we run the same example as the one that shows the inefficiency of DList, replacing DList with the following CList type:

data CList a = Cnil | Ccons a (Seq (CList a))

it should be much faster thanks to CList allowing efficient repeated inspections.

Although CList supports both efficient concatenation and efficient inspection, I personally wouldn’t consider it a terribly useful thing in practice for the following reasons:

As mentioned earlier, I think DList is almost always sufficient. As long as you don’t run into pathological use cases like the one shown above, DList should be faster than CList due to lower overhead. And if you do, it is often possible to rewrite it in a more efficient way.
Even if you do have a use case that relies on both concatenation and inspection being fast, you can simply use Seq. Asymptotically, Seq is slightly slower than CList for concatenation, requiring O(log(min(n₁,n₂))), as opposed to O(1) for an optimal implementation of CList, but it should be hardly noticeable in practice. Seq also supports other efficient operations not supported by CList, such as O(1) unsnoc. Furthermore, Seq is from the containers package and the implementation is highly optimized.

I’m not aware of a catenable list implementation on Hackage. Even if there is one, it certainly has much less adoption compared to DList and Seq.

The real value of CList lies in the observation that the same idea can be extended to speed up a variety of data types and operations, such as binary tree concatenation and monadic bind. In the next section I’ll discuss generalizing CList to binary trees.

On a side note, it is worth mentioning that a regular list may have multiple corresponding CLists with different structures. For example, converting [1,2,3] to CList could result in either of the following two CLists:

Ccons 1 [ Ccons 2 [ Ccons 3 [] ] ]
Ccons 1 [ Ccons 2 [], Ccons 3 [] ]

The fromRegularList function returns the former, but it’s an arbitrary choice. The choice does not affect performance, at least asymptotically.

Generalizing to Binary Trees

Suppose we have the following binary tree data type, and a function to append a tree onto another tree, which replaces all Null nodes in the first tree with the second tree:

data Tree a = Null | Fork (Tree a) a (Tree a)

(××) :: Tree a -> Tree a -> Tree a
Null ×× t = t
Fork l x r ×× t = Fork (l ×× t) x (r ×× t)

Since ×× traverses the first tree, it has exactly the same problem as ++ when left-associated.

Both DList and CList can be generalized to binary trees. DList can be easily generalized to any monoid:

type DMonoid a = a -> a

fromMonoid :: Semigroup a => a -> DMonoid a
fromMonoid = (<>)

toMonoid :: Monoid a => DMonoid a -> a
toMonoid = ($ mempty)

append :: Semigroup a => DMonoid a -> DMonoid a -> DMonoid a
append = (.)

concat :: Semigroup a => [DMonoid a] -> DMonoid a
concat = foldr append id

It is also easy to generalize CList to binary trees:

data CTree a = Cnull | Cfork [CTree a] a [CTree a]

The implementation of the append operator <×>, and the functions toRegularTree and fromRegularTree, are left as exercises.

Monadic Binds

Both DList and CList can also be generalized to speed up left-associated monadic binds. The counterpart of DList for monadic binds is Codensity, and the counterpart of CList is type-aligned sequences, introduced in Reflection without Remorse. I won’t go into these since this post is already getting quite long. With a good grasp of DList and CList, it is not too hard to understand Codensity and type-aligned sequences by making analogies.

Exercises

Implement <×>, toRegularTree and fromRegularTree for CTree.
We generalized DList to DMonoid for any monoid. Can a similar generalization be made to CList?
Modify DList and CList for non-empty lists.

Ziyang Liu's blog, mostly about Haskell

Efficient Concatenation and Inspection