Functional programmers know all too well that left-associated concatenation of cons-lists can
be a source of inefficiency. For example, if we perform a left-associated concatenation of n
singleton cons-lists, it would take O(n2) to force the entire result list, and
O(n) just to force the first element. To illustrate the problem, recall that
++ are defined as
head :: [a] -> a head (x:_) = x head  = error "empty list" (++) :: [a] -> [a] -> [a]  ++ ys = ys (x:xs) ++ ys = x : xs ++ ys -- (++) has a higher precedence than (:)
head ((( ++ ) ++ ) ++ ) = head (((1 : ( ++ )) ++ ) ++ ) = head ((1 : (( ++ ) ++ )) ++ ) = head (1 : ((( ++ ) ++ ) ++ )) = 1
This shows that it takes 4 steps to force the first element of the left-associated
concatenation of 4 singleton lists. To force the next element, we’d need to
continue to force the head of
((( ++ ) ++ ) ++ ). It follows that forcing the
entire list takes O(n2).
One might be wondering, why not use right-associated concatenation instead, which gives O(1) access to the first element and O(n) access to the whole list? Because left-associated concatenation is not always avoidable. If you have a list of lists to concatenate, you’d obviously choose to use right-associated concatenation. But any time you concatenate two lists, the first list may be the result of another concatenation, and that concatenation may have been done somewhere else.
We obviously would like to bring down the cost of left-associated concatenation to O(n). As to
accessing the first element, it is not possible to do better than O(n) in the worst case, since
in a left-associated concatenation, the first element of the result is
in the deepest layer of the expression tree which has O(n) layers. However, it is desirable if accessing the first
element takes O(1) amortized time. This is already the case with regular
list concatenation: although accessing the head of
(( ++ ) ++ ) ++  takes
4 steps, it also takes as many steps to construct this expression. The cost of the former can thus
be amortized to O(1). And once we reduce the above expression to
1 : ((( ++ ) ++ ) ++ ),
accessing the first element again takes constant time. In bringing down the cost of concatenation, it is
desirable to avoid compromising the performance of inspecting the front of the list.
To this end, two approaches are worth discussing: difference lists, and catenable lists. As we shall see, difference lists support efficient concatenation, but not efficient inspection. Inspecting the head of a difference list is an O(n) operation. Catenable lists, on the other hand, supports both efficient concatenation and efficient inspection. Catenable lists are first presented in the famous book, Purely Functional Data Structures.
The name “catenable list” is slightly misleading, especially when introduced together with difference lists, because catenable lists are not just catenable, but also inspectable, and being inspectable is an important distinction against difference lists. The reason it is called “catenable list” is probably because lists are by default inspectable. I would probably call it “catspectable list”. Anyway, I’ll stick to the standard name.
Both approaches can be generalized to structures other than lists, such as binary trees and monads, which will also be discussed in this post.
Difference lists, often referred to as
DList, is a fairly well known technique for fast
list concatenation. There’s a good number of articles introducing difference lists, so I won’t go
into the details. From a high level, difference lists can be understood in a number of
- Difference lists effectively turn list concatenation into function composition, and empty list into the identity function. Functions can be composed in constant time.
- Difference lists delay the concatenation by storing the to-be-concatenated lists as functions, and only perform the actual concatenation at the end, when a difference list is converted to a regular list.
- Difference lists build up right-associated concatenation expressions, rather than actually concatenating the lists.
- Difference lists is an example of continuation-passing style (CPS). It turns a regular list,
[a], into a continuation,
[a] -> [a]. At the term level, a regular list
xsis turned into a continuation,
(xs ++). A continuation can be applied to the empty list to recover the regular list.
Choose whichever one (or ones) that works the best for you.
An Issue with Difference Lists
Difference lists are extremely useful, but it suffers from an issue hinted earlier: a difference list built up via left-associated function composition isn’t efficient if we need to repeatedly inspect the front of the list. For instance, consider the following difference list:
((( ++) . ( ++)) . ( ++)) . ( ++)
To inspect the head of this difference list, we need to apply this function to the empty list, which gives
us a regular list, and then reduce it to weak head normal form, i.e.,
1 : _. This takes O(n) steps. The problem
is that if we need to inspect the head again, we’d have to repeat these steps, which takes another O(n) steps.
Of course, if this is the final difference list we want (i.e., we are done with concatenating to it), we can convert it into
a regular list, and then inspecting the head repeatedly wouldn’t be a problem. But it can be a problem if we alternate between
concatenation and inspection.
Here’s an extreme example:
import qualified Data.DList as D import Data.List (foldl') longList :: [Int] longList = [2..1000000] f :: Int -> [Int] f n = take n . D.toList . foldl' g (D.singleton 1) $ map D.singleton longList where g xs ys | D.head xs == 1 = xs <> ys | otherwise = ys <> xs -- unreachable
f n concatenates a million singleton difference lists, and take
n elements of the result. At each step, it inspects the head of the current result, and if it is 1,
it appends the next singleton list to the result (
xs <> ys), otherwise it prepends
the singleton list to the result (
ys <> xs). This has very poor performance
even for n = 1.
This is of course a silly example, because the head of the result is always 1, and we know that, so there’s no reason to keep checking it for a million times. It’s meant to be an extreme example. In practice it may not be as bad as this, but whenever you inspect the same head more than once, you are paying a price due to repeating the same steps.
Granted, this is quite unlikely to cause a real problem in practice, because for it to be a problem, you’d have to repeatedly inspect the front of a left-associated difference list many times, which is often possible to avoid. After all, if a difference list remains left-associated, the front of the list will remain the same, and there’s probably no need to repeatedly inspect it. Therefore difference lists remain extremely useful in practice, as long as you don’t do silly things with it like the example above. That being said, it is possible to overcome this issue using catenable lists.
The reason difference lists are inefficient for repeated inspection is because a difference list is basically a function, which needs to be applied to an argument (empty list) to get a value, before it can be inspected. And the reason to convert each list to a function is because we want to delay the actual concatenation. This may make one wonder: why even bother converting the lists to functions? If our goal is to delay the concatenation, why do we need to do anything at all to the lists when concatenating them? Can’t we just store the lists in some data structure, and concatenate them only when needed?
If this is what you are thinking, congratulations, you are spot on. Let’s try to implement this idea
and see how it goes. We’ll use the following
CList data type:
data CList a = Cnil | Ccons a [CList a]
CList a has a head element
a, and a tail. Unlike regular lists, the tail is
CList a but a
[CList a], containing a number of to-be-concatenated
CLists. Let’s try to
implement the append operator,
<+>, as well as functions that convert a
CList to and from a regular
data CList a = Cnil | Ccons a [CList a] (<+>) :: CList a -> CList a -> CList a Cnil <+> ys = ys Ccons x xss <+> ys = Ccons x (snoc xss ys) toRegularList :: CList a -> [a] toRegularList Cnil =  toRegularList (Ccons x xss) = x : concatMap toRegularList xss fromRegularList :: [a] -> CList a fromRegularList  = Cnil fromRegularList [x] = Ccons x  fromRegularList (x:xs) = Ccons x [fromRegularList xs] cHead :: CList a -> a cHead Cnil = error "cHead: empty list" cHead (Ccons x _) = x cTake :: Int -> CList a -> CList a cTake n = fromRegularList . take n . toRegularList
snoc is defined as
snoc xs x = xs ++ [x]
and for now, pretend that
snoc is O(1) (it is in fact O(n) for cons-lists).
As we can see, the
<+> operator simply appends
ys, the second
CList, to the end of
xss, the tail of the first
CList. Since we are assuming
snoc is O(1), it follows that
<+> is an
O(1) operation. It is easy to see that both
O(n), and both functions allow the result to be consumed lazily (since they both use guarded recursion).
So everything appears to be good, except that
snoc is actually an O(n) operation for cons-lists.
Thus cons-list is not a good choice for the tail of
CLists. We can’t simply use
snoc-lists either, because in order to convert
CList a to
[a] and access the front of
[a] efficiently, the sequence type obviously needs to support efficient access to its front
elements. Therefore, we need a sequence type that supports efficient
If this sounds like a queue, it indeed is. There exist purely functional
queues that offer O(1) queue operations (even in the worst-case). Details
can be found in Purely Functional Data Structures. Another good candidate
Seq data type, which is backed by finger trees.
If we run the same example as the one that shows the inefficiency of
DList with the following
data CList a = Cnil | Ccons a (Seq (CList a))
it should be much faster thanks to
CList allowing efficient repeated inspections.
CList supports both efficient concatenation and efficient inspection,
I personally wouldn’t consider it a terribly useful thing in practice for the following reasons:
- As mentioned earlier, I think
DListis almost always sufficient. As long as you don’t run into pathological use cases like the one shown above,
DListshould be faster than
CListdue to lower overhead. And if you do, it is often possible to rewrite it in a more efficient way.
- Even if you do have a use case that relies on both concatenation and inspection being fast, you can simply use
Seqis slightly slower than
CListfor concatenation, requiring O(log(min(n1,n2))), as opposed to O(1) for an optimal implementation of
CList, but it should be hardly noticeable in practice.
Seqalso supports other efficient operations not supported by
CList, such as O(1)
Seqis from the
containerspackage and the implementation is highly optimized.
I’m not aware of a catenable list implementation on Hackage. Even if there is one, it certainly has
much less adoption compared to
The real value of
CList lies in the observation that the same idea can be extended to
speed up a variety of data types and operations, such as binary tree concatenation and
monadic bind. In the next section I’ll discuss generalizing
CList to binary trees.
On a side note, it is worth mentioning that a regular list may have multiple corresponding
CLists with different structures. For example, converting [1,2,3] to
CList could result in either of the following two
Ccons 1 [ Ccons 2 [ Ccons 3  ] ] Ccons 1 [ Ccons 2 , Ccons 3  ]
fromRegularList function returns the former, but it’s an arbitrary choice.
The choice does not affect performance, at least asymptotically.
Generalizing to Binary Trees
Suppose we have the following binary tree data type, and a function to append
a tree onto another tree, which replaces all
Null nodes in the first tree
with the second tree:
data Tree a = Null | Fork (Tree a) a (Tree a) (××) :: Tree a -> Tree a -> Tree a Null ×× t = t Fork l x r ×× t = Fork (l ×× t) x (r ×× t)
×× traverses the first tree, it has exactly the same problem as
CList can be generalized to binary trees.
DList can be easily generalized
to any monoid:
type DMonoid a = a -> a fromMonoid :: Semigroup a => a -> DMonoid a fromMonoid = (<>) toMonoid :: Monoid a => DMonoid a -> a toMonoid = ($ mempty) append :: Semigroup a => DMonoid a -> DMonoid a -> DMonoid a append = (.) concat :: Semigroup a => [DMonoid a] -> DMonoid a concat = foldr append id
It is also easy to generalize
CList to binary trees:
data CTree a = Cnull | Cfork [CTree a] a [CTree a]
The implementation of the append operator
<×>, and the functions
fromRegularTree, are left as exercises.
CList can also be generalized to speed up left-associated monadic binds.
The counterpart of
DList for monadic binds is Codensity, and the counterpart of
CList is type-aligned sequences, introduced
in Reflection without Remorse. I won’t go
into these since this post is already getting quite long.
With a good grasp of
CList, it is not too hard to understand Codensity
and type-aligned sequences by making analogies.
- We generalized
DMonoidfor any monoid. Can a similar generalization be made to
CListfor non-empty lists.