Data.List.genericLength and Space Leaks

The codebase I regularly work with uses in many places integer types that are not Int, including Integer, Natural, Word32, Word64, etc. A number of functions in Data.List, however, only works with Int, such as length, and so there’s a lot of fromIntegrals floating around, which isn’t pretty. I was once tempted to get rid of all these fromIntegrals by using genericLength, and a coworker reviewing the PR pointed out that the doc of genericLength mentions the following:

It is, however, less efficient than length.

My first reaction was, how can genericLength be slower than fromIntegral . length? If that is the case, why wouldn’t it be implemented as fromIntegral . length?

So I took a look at the actual implementation:

genericLength           :: (Num i) => [a] -> i
{-# NOINLINE [1] genericLength #-}
genericLength []        =  0
genericLength (_:l)     =  1 + genericLength l

{-# RULES
  "genericLengthInt"     genericLength = (strictGenericLength :: [a] -> Int);
  "genericLengthInteger" genericLength = (strictGenericLength :: [a] -> Integer);
 #-}

strictGenericLength     :: (Num i) => [b] -> i
strictGenericLength l   =  gl l 0
                        where
                           gl [] a     = a
                           gl (_:xs) a = let a' = a + 1 in a' `seq` gl xs a'

It is apparent from the above that genericLength is basically implemented as a right fold, except that when i is either Int or Integer, it uses strictGenericLength instead, which is essentially a strict left fold. The default right fold implementation can be extremely inefficient because it builds up a thunk,

1 + (1 + (1 + ...

that is as large as the list itself, before starting to reduce it. For a detailed explanation on why foldr (and the lazy foldl) can cause space leaks, and why the strict foldl' is a much better option for such cases, see “foldr foldl foldl’”. What this means is that genericLength is as efficient as length for Int and Integer, but much less efficient for many other integer types like Word and Natural.

This begs the question: why is genericLength implemented this way? Why not always use strictGenericLength? I asked this very question on Stack Overflow, and the answer is that it is implemented this way so that it can work with lazy natural numbers. This is also mentioned in this GHC wiki page. The wiki page also mentions another argument against genericLength: code that uses it and does not give the result an explicit signature will get Integer by default, which is usually not what one wants. In this case, though, GHC will give you a “type defaults” warning, so as long as you enable all warnings, you shouldn’t get an Integer unknowingly.

The other generic functions in Data.List, including genericTake, genericDrop, genericSplitAt, genericIndex and genericReplicate, are safe to use as they don’t have the same issue as genericLength. To sum up, it is a good idea to almost always avoid genericLength unless you really know what you are doing.