Stage Fright Peeled Away: Writing the 'peel' Function with Template Haskell

I was recently updating a work project’s user guide, and added a page that involves Template Haskell (TH). A reviewer noted that TH can be challenging to grasp and suggested adding pointers to learning resources. I’ve always found the trickiest part of TH to be the stage errors, which often confuse even experienced Haskell developers. I couldn’t find an existing TH tutorial that covers stage errors with the level of detail I was hoping for, so I decided to fill that gap myself.

Understanding stage errors in TH not only helps make sense of GHC compilation errors, but also helps one write correct TH code that produces the intended output. Moreover, this knowledge translates well to other languages with similar metaprogramming constructs.

This post starts with an explanation of stages and stage errors, and then applies that knowledge to demonstrate the implementation of the peel function, which unrolls a given number of layers from an arbitrary recursive function at compile time.

Stage Errors Explained

TH revolves around two key concepts: quotations ([| ... |] and look-alikes) convert code into abstract syntax trees (ASTs) for manipulation, while splices ($(...) or $$(...)) do the reverse - turning ASTs back into code that can be combined with the surrounding code. This post focuses on typed expression quotations, [|| ... ||], which turns code into CodeQ a, and typed expression splices, $$(...), which convert CodeQ a back into code. All of the following also applies to untyped expression quotations ([| ... |]) and untyped splices ($(...)). The rules are similar but slightly different for other kinds of quotations - type, pattern and declaration quotations. We won’t cover these to keep things short.

To understand stage errors, keep the following four rules about quotations and splices in mind:

1. Without TH, all expressions in the code are evaluated at runtime. For example, writing x = 3 + 5 means 3 + 5 is computed at runtime1, when x is evaluated.

2. With TH, expressions can be evaluated at different stages. Quotations delay the evaluation of quoted code relative to the surrounding code, while splices do the opposite.

Consider a top-level binding x = g $$(f 2). Here f 2 is evaluated at compile time - earlier than evaluating the application of g. The compiler evaluates f 2, producing an AST, which the splice operator converts into code. It then proceeds to compile x = g <code generated by f 2>. This is the whole point of TH: generating and manipulating code at compile time.

As a contrasting example, consider x = g [|| f 2 ||]. Now f 2 is inside a quotation, meaning it is evaluated later than the surrounding code. Indeed, when the application of g is evaluated, f 2 is not - the argument g receives is, in effect, the AST representing f 2. f 2 is only evaluated if and when x is eventually used in a splice.

3. Quotations and splices cancel each other out. For example, foo f = [|| $$(f 2) ||] is equivalent to foo f = f 2, as is foo f = $$([|| f 2 ||]).

4. Generally, a variable can only be used at the same level where it is bound. The level is determined by the number of surrounding quotations or splices, after adjacent ones cancel out.

Without quotations or splices, the code is at level 0. Each quotation increases the level, while each splice decreases it2. For instance, this is valid: foo f = [|| 3 + $(f 2) ||], because f is bound at level 0 and used at level 0 (assuming no additional surrounding quotations or splices). But the following aren’t valid:

  • foo f = $$(f 2), because f is bound at level 0 but used at level -1, meaning it is used before it is bound. The compiler needs to evaluate f 2 first, splice the resulting code into foo’s definition, and then continue compiling foo f = <code generated by f 2>. But f is a parameter that isn’t determined until foo is called, making this impossible.
  • foo = [|| \f -> $$(f 2) ||], because f is bound at level 1 but used at level 0.
  • foo x = unsafeCodeCoerce (varE 'x). This is usually equivalent to foo x = [|| x ||], but I’m avoiding [|| x ||] here because there are cases where it is valid to write foo x = [|| x ||], as I’ll explain shortly. This has the opposite problem: x is bound at level 0 but used at level 1. It is problematic because when the result of foo x is spliced into the surrounding code, x no longer exists as a variable, so varE 'x no longer makes sense. Instead, x has been replaced by a concrete argument, effectively becoming a literal3.

GHC has several variations of the stage error message. It may say “the non-top-level quoted name ‘x must be used at the same stage at which it is bound”, or “‘x’ is used in a top-level splice, quasi-quote, or annotation, and must be imported, not defined locally”, or “‘x’ is bound at stage 2 but used at stage 1”. These all result from breaking rule #4.

Exceptions to Rule #4

Rule #4 has two exceptions. First, it applies only within a single module. If an identifier is defined in another module and imported into the current module, then the restriction does not apply, and it can be used at any level.

In theory, an identifier should be usable at any level if its value is statically known. This holds for top-level identifiers (both defined in the same module and imported) and certain local identifiers. However, in practice, while top-level identifiers defined in the same module can be used at a later level (e.g., level 1), only imported identifiers can be used at an earlier level (e.g., level -1). This limitation is known as the stage restriction.

Second, and this explains why foo x = [|| x ||] is sometimes allowed: it doesn’t compile this code into foo x = unsafeCodeCoerce (varE 'x), but instead into foo x = liftTyped x. Essentially, when foo is applied to a concrete argument, the argument itself, rather than the variable x, is converted into AST nodes. For example, if foo is applied to 2, the generated AST contains LitE 2 instead of VarE 'x.

This only works if the type of x has a Lift instance, which is usually true for types that don’t contain functions. However, functions generally cannot be lifted.

An Example: The peel Function

Equipped with the understanding of stages and stage errors, we are ready to implement the peel function. It takes a number k, and the step function of a recursive function, and generates a new recursive function with the first k layers unrolled. This is similar to loop unrolling, except we only unroll once, covering the first k steps.

We’ll use the length function as our running example, which computes the length of a list. The standard way to define length using a step function is:

length :: forall a. [a] -> Int
length = fix step
  where
    step = \self xs -> case xs of [] -> 0; y:ys -> 1 + self ys

If we peel 3 layers off, the definition should look like this:

-- Desired definition of `lengthPeeled`
lengthPeeled :: forall a. [a] -> Int
lengthPeeled xs =
  case xs of
    [] -> 0
    y:ys -> 1 +
      case ys of
        [] -> 0
        z:zs -> 1 +
          case zs of
            [] -> 0
            w:ws -> 1 + fix step ws
  where
    step = -- same as above

lengthPeeled is what we aim to generate using TH, because writing it by hand becomes tedious quickly for larger values of k.

Attempt 1: no TH

Before diving into TH, let’s first question why we really need it. Could we just write it like this instead?

-- Attempt 1 - doomed to fail
peel_1 :: Int -> ((a -> b) -> a -> b) -> a -> b
peel_1 0 f = fix f
peel_1 k f = f (peel_1 (k - 1) f)

lengthPeeled_1 :: forall a. [a] -> Int
lengthPeeled_1 = peel_1 3 step
  where
    step = -- same as above

No, this of course won’t work. Remember rule #1 above: without TH, all evaluation happens at runtime. This means peel_1 is compiled as is into the target code, and the “peeling” takes place at runtime. This is certainly not what we want, and it has no advantage over the regular length, only additional overhead. We need lengthPeeled to be generated at compile time.

Attempt 2: put peel’s result in a splice

How do we ensure the peeling happens at compile time? Recall rule #2: code inside a splice evaluates earlier than the surrounding code, while code inside a quotation evaluates later. Therefore, to make the peeling happen at compile time (which of course is earlier than runtime), the application of peel must be placed inside a splice, as in

lengthPeeled_2 :: forall a. [a] -> Int
lengthPeeled_2 = $$(peel_2 3 step)

And this means peel_2 must return CodeQ (a -> b), rather than a -> b. But what should the argument types of peel_2 be? Can it still take Int and (a -> b) -> a -> b? Let’s give it a try. The only way to write peel_2 with such a type would be

-- Attempt 2 - doesn't even compile
peel_2 :: Int -> ((a -> b) -> a -> b) -> CodeQ (a -> b)
peel_2 0 f = [|| fix f ||]

Well, we don’t even need to continue to write the peel_2 k f case. This already violates rule #4: f is bound at level 0 but used at level 1, and since it has a function type, it can’t be lifted. Note that there’s no problem with fix being used at level 1, since it is imported from another module.

Attempt 3: also put peel’s second argument in a splice

It should be clear by now that since f is neither a top-level variable nor can it be lifted, to avoid violating rule #4, f must be used at level 0. The more obvious way to achieve this is to make peel take CodeQ ((a -> b) -> a -> b) as the second argument:

-- Attempt 3 - still not quite right
peel_3 :: Int -> CodeQ ((a -> b) -> a -> b) -> CodeQ (a -> b)
peel_3 0 f = [|| fix $$f ||]
peel_3 n f = [|| $$f $$(peel_3 (n - 1) f) ||]

This compiles! In this definition, f, n and peel_3 are all used at level 0 (as each usage is surrounded by one splice and one quotation) - no stage violation! Now we can proceed to define lengthPeeled_3 as follows. Note that due to rule #4, lengthPeeled_3 must be placed in a different module than the one that defines peel_3.

lengthPeeled_3 :: forall a. [a] -> Int
lengthPeeled_3 =
  $$( let step :: CodeQ (([a] -> Int) -> [a] -> Int)
          step = [|| \self xs -> case xs of [] -> 0; y:ys -> 1 + self ys ||]
       in peel_3 3 step
    )

GHC is happy with this, but does it do what we want? Sadly, no. If we use -ddump-splices to inspect the generated code, we’ll see that it produces something equivalent to the following:

-- Generated by TH
lengthPeeled_3 :: forall a. [a] -> Int
lengthPeeled_3 = step (step (step (fix step)))
  where
    step = \self xs -> case xs of [] -> 0; y:ys -> 1 + self ys

This isn’t quite what we want. What we’ve done is expand fix step into step (step (step (fix step))). While this may not be totally useless, it definitely falls short of the desired lengthPeeled. The desired lengthPeeled is equivalent to taking step (step (step (fix step))), and applying beta reduction 3 times. Specifically, applying one beta reduction gives

step (step (\zs -> case zs of [] -> 0; w:ws -> 1 + fix step ws))

and applying two more beta reductions results in what we want. This is why the desired lengthPeeled is better: not only do fewer beta reductions need to be performed at runtime, but performing them at compile time could expose additional optimization opportunities for the compiler. To generate it correctly, we need to rethink our approach.

Attempt 4: evaluate f (peel (n-1) f) earlier

The problem with attempt 3 is that the code isn’t sufficiently evaluated at compile time. In particular, in the definition of peel_3, expression peel_3 (n - 1) f is at level 0 (because it is surrounded by one splice and one quotation), which means peel_3 (n - 1) f is evaluated when peel_3 n f is evaluated. This is desirable, because when compiling lengthPeeled_3, expression peel_3 3 step is evaluated at level -1 (compile time), and as a result, all applications of peel_3 are evaluated at compile time. This is what expands fix step into step (step (step (fix step))).

But on the other hand, the application of f - $$f $$(peel_3 (n - 1) f) - is evaluated too late, because it is at a later level - level 1 (since it is only surrounded by a quotation). This means it is not evaluated at compile time when compiling lengthPeeled_3. This application is exactly where the desired beta reductions take place, so we need to find a way to put it at a lower level - level 0.

Recall that level 0 means something is either not surrounded by quotations or splices, or surrounded by an equal number of quotations or splices. This suggests that in the definition of peel_4, the peel_4 k f case could simply be

peel_4 k f = f (peel_4 (k-1) f) -- no quotation or splice - everything is at level 0

This appears identical to attempt 1, but only at the term level. The types are very different, since peel_4 needs to return CodeQ (a -> b). In order for the types to align correctly, peel_4’s type needs to be

peel_4 :: Int -> (CodeQ (a -> b) -> CodeQ (a -> b)) -> CodeQ (a -> b)

What about the peel_4 0 f case? This turns out to be a bit more involved, as it requires nested quotations, but guided by the types and the knowledge of stage errors, it isn’t too difficult to get to the correct definition:

peel_4 0 f = [|| fix (\self -> $$(f [|| self ||])) ||]

This is well-staged: f is bound and used at level 0, self is bound and used at level 1, and fix is imported. All is well. And once again, guided by the types and the knowledge of stage errors, it is relatively straightforward to arrive at the correct definition of lengthPeeled_4:

lengthPeeled_4 :: forall a. [a] -> Int
lengthPeeled_4 =
  $$( let step :: CodeQ ([a] -> Int) -> CodeQ ([a] -> Int)
          step = \self -> [|| \xs -> case xs of [] -> 0; y:ys -> 1 + $$self ys ||]
       in peel_4 3 step
    )

This produces exactly the code we want.


Update (03/11/2025) : a reader (AndrasKovacs on Reddit) pointed out another alternative, using a custom fix rather than the standard one from base:

peel_5 :: Int -> ((CodeQ a -> CodeQ b) -> CodeQ a -> CodeQ b) -> CodeQ (a -> b)
peel_5 k f = [||\a -> $$(go k f [||a||])||]
  where
    go :: Int -> ((CodeQ a -> CodeQ b) -> CodeQ a -> CodeQ b) -> CodeQ a -> CodeQ b
    go 0 f a = [||let body a = $$(f (\a -> [||body $$a||]) [||a||]) in body $$a||]
    go k f a = f (go (k - 1) f) a

lengthPeeled_5 :: forall a. [a] -> Int
lengthPeeled_5 =
  $$( let step :: (Code Q [a] -> Code Q Int) -> Code Q [a] -> Code Q Int
          step = \self xs -> [|| case $$xs of [] -> 0; y:ys -> $$(self [|| ys ||]) ||]
       in peel 3 step
    )

This generates more optimized code that performs additional beta reductions when compiling with -O0. It would be a good exercise to try to figure out why.


  1. Assuming the compiler does not perform constant folding. 

  2. Single-quoting an identifier, e.g., 'x, is equivalent to placing it in a quotation ([|| x ||]) for the purpose of calculating levels. For instance, in f x = 'x, x is bound at level 0 and used at level 1. 

  3. On the other hand, foo x = [p| x |] is always valid, because the x in the quotation does not refer to the function parameter x. Instead, [p| ... |] quotes patterns, meaning the x within it is a fresh variable introduced in a variable pattern that always succeeds.