Haskell is a great, if not the best language for embedding DSLs in. Thanks to Haskell’s modern type system and elegant syntax, devising embedded domain-specific languages (EDSL) that are both low friction and type safe is often fairly achievable. Let’s take a simple EDSL as a running example, and evolve it in a series of steps to make it more user-friendly and safer. The main machineries involved are the indexed state monad (IxState) and some moderate type-level programming.

The EDSL we are going to build is for constructing record data types, particularly those for
holding configuration settings. As a starter, the following data type called `X`

will be used:

```
data X = X
{ aField :: A
, bField :: B
, csField :: NonEmpty C
}
type A = Int
type B = String
type C = Double
```

This `X`

data type is just a toy example. You can substitute for it whatever similar data types you
work with in practice. For example, `X`

could be a configuration for a service, a script with a non-empty
list of commands, a user account where `csField`

is the operations the user
is authorized to perform, or the description of a data migration job with a number of steps to be executed.
It can literally be anything that is a record containing a number of fields.

Now, suppose we want to equip our users with a DSL to construct terms of type `X`

. What are our options?

# Version 1: Plain Record Syntax

(Code: https://github.com/zliu41/dsl/blob/master/src/DSL/Version1.hs)

This is, of course, not really a DSL. It’s just an L – the plain vanilla Haskell record syntax. To construct
a term of type `X`

, we can write something like

```
import Data.List.NonEmpty as NonEmpty
x :: X
x = X
{ aField = 42
, bField = "Hello"
, csField = NonEmpty.fromList [1.0, 2.0]
}
```

This looks reasonably good. So why bother doing anything else? Becuase there are a number of things not so desirable about this record syntax:

- Not everybody loves the record syntax. The more fields
`X`

has, and the more nested the fields are, the more aggravating it becomes. - It is often useful to have partially constructed
`X`

values which can be reused in constructing other`X`

s. With this approach it is not clear how you can partially construct an`X`

. None of the three fields has a`Maybe`

type, so you can’t use`Nothing`

to signal that a field is not yet set. - Related to the above point, there is no decent way (that I know of) to provide default values to some fields.
- The record syntax becomes difficult to use when the fields are more complex than simple values like numbers,
strings and lists. For example, trees. Later in this post, we are going to extend
`X`

such that each item in`csField`

can be associated with a label; labels are potentially nested and thus form a tree structure. Using the record syntax directly would become quite cumbersome in that case.

Another common way to construct a record is lens, so let’s see how that works.

# Version 2: Lens

(Code: https://github.com/zliu41/dsl/blob/master/src/DSL/Version2.hs)

To use lens, we make the field types of `X`

abstract, and make lenses for each field:

```
{-# LANGUAGE TemplateHaskell #-}
import qualified Control.Lens as Lens
data X_Abs a b cs = X_Abs
{ _aField :: a
, _bField :: b
, _csField :: cs
}
Lens.makeLenses ''X_Abs
```

Then we can write:

```
type X_Empty = X_Abs () () ()
type X = X_Abs A B (NonEmpty C)
x_empty :: X_Empty
x_empty = X_Abs () () ()
x :: X
x = Lens.set aField 42
. Lens.set bField "hello"
. Lens.set csField (fromList [1.0, 2.0])
$ x_empty
```

By making the field types of `X`

abstract, the second issue mentioned in Version 1
is addressed: we can now have partially constructed `X`

values. But it has a new problem:
now it is possible to accidentally set the same field multiple times. The following code typechecks:

```
x :: X
x = Lens.set aField 42
. Lens.set aField 43
. Lens.set bField "hello"
. Lens.set cField (fromList [1.0, 2.0])
$ emptyX
```

This isn’t much of a worry for our toy data type `X`

, but the larger number of fields, the more
unsafe it is. The problem is that `Lens.set`

is too polymorphic. We need to restrict `Lens.set aField`

to only be able to alter the type of `_aField`

from `()`

to `Int`

, and not from `Int`

to `Int`

. There are at least
two ways to do so. The first is to specialize the type parameters of `Lens.set`

for each field, for example:

```
set_aField :: a -> X_Abs () b cs -> X_Abs a b cs
set_aField = Lens.set aField
```

With this specialized type, calling `set_aField`

twice no longer type checks.

The second and fancier way is to specialize the constraints, rather than the type parameters, of `Lens.set`

,
by making use of a type family that returns `Constraint`

:

```
type family ExactlyOnceConstraints field a a' b b' cs cs' :: Constraint where
ExactlyOnceConstraints A a a' b b' cs cs' = (a ~ (), b ~ b', cs ~ cs')
ExactlyOnceConstraints B a a' b b' cs cs' = (a ~ a', b ~ (), cs ~ cs')
ExactlyOnceConstraints (NonEmpty C) a a' b b' cs cs' = (a ~ a', b ~ b', cs ~ ())
set :: ExactlyOnceConstraints y a a' b b' cs cs'
=> Lens.ASetter (X_Abs a b cs) (X_Abs a' b' cs') x y
-> y -> X_Abs a b cs -> X_Abs a' b' cs'
set = Lens.set
x :: X
x = set aField 42
. set bField "Hello"
. set csField (NonEmpty.fromList [1.0, 2.0])
$ x_empty
```

With `ExactlyOnceConstraints`

in place, setting any field twice is illegal. This approach
requires that each field of `X`

have a distinct type. This can be achieved by using newtypes
to distinguish fields with the same type, and it is usually considered good practice to do so.

# Version 3: Enabling do-notation via IxState

(Code: https://github.com/zliu41/dsl/blob/master/src/DSL/Version3.hs)

This version of the DSL takes advantage of the indexed state monad (IxState) to convert regular
function composition (`.`

) into composition of monadic actions (`>>`

), allowing us to construct
`X`

terms in do-notations, for instance

```
x :: X
x = mkX $ do
set aField 42
set bField "Hello"
set csField (NonEmpty.fromList [1.0, 2.0])
```

The idea is straightforward: `X_Abs`

is now the state in the (indexed) state monad, and
`set`

modifies the state behind the scenes. This makes our DSL look like imperative programming
with a bunch of `set`

statements, and in my opinion, nicer.

The reason to use the indexed state monad, rather than the regular state monad, is to ensure
each field is set exactly once, same as the guarantee that Version 2 achieves. The initial state
would be `X_Empty`

(which is `X_Abs () () ()`

). After setting `aField`

, the state becomes
`X_Abs A () ()`

, and so on. Since each call to `set`

changes the type of the state, it is
necessary to use the indexed state monad.

Basically the only difference in the implementation compared to Version 2 is that
`set`

now returns `IxState (AbsX a b cs) (AbsX a' b' cs') ()`

, in contrast to
`X_Abs a b cs -> X_Abs a' b' cs'`

. The transition from `X_Abs a b cs`

to `AbsX a' b' cs'`

happens
in an `IxState`

action, as opposed to a function. To enable do-notation for `IxState`

, we need to turn on
`RedindableSyntax`

, and rebind the `(>>)`

operator using `(>>>=)`

from `Control.Monad.Indexed`

.
In Version 3 it is sufficient just to rebind `(>>)`

. In the next version and later you’ll also
need to rebind `(>>=)`

.

Like Version 2, the order in which the fields are set doesn’t matter. The code is well-typed as long as each field is set exactly once.

# Version 4: Adding Nested Labels

(Code: https://github.com/zliu41/dsl/blob/master/src/DSL/Version4.hs)

Now, let’s make things more interesting by adding labels to `X`

. We want to be able
to attach labels to each value in `_csField`

, and a label can be nested under
another. Ideally, we want to construct `X`

terms with the following code, which uses
nested do-blocks for nested labels:

```
x :: X
x = mkX $ do
set aField 42
set bField "Hello"
new_c 1.0 -- 1.0 has no label
label "label-foo" $ do
new_c 2.0 -- 2.0 is labeled ["label-foo"]
label "label-bar" $ do
new_c 3.0 -- 3.0 is labeled ["label-foo", "label-bar"]
label "label-baz" $ do
new_c 4.0 -- 4.0 is labeled ["label-foo", "label-bar", "label-baz"]
new_c 5.0 -- 5.0 is labeled ["label-foo", "label-bar"]
new_c 6.0 -- 6.0 is labeled ["label-foo"]
new_c 7.0 -- 7.0 is labeled ["label-foo"]
new_c 8.0 -- 8.0 has no label
```

For this to work, every time we create a new `C`

value using `new_c`

, we need to know
the label we currently have, so that we can associate it with the new `C`

we create.
This can be pulled off using the Reader monad. Whenever we create a new label using
`label`

, we can use `local`

to locally add the new label to the environment.

So now our indexed state monad is composed with the reader monad, and we give it an alias `X_M`

:

```
type X_M i j a = IxStateT (Reader Label) i j a
type Component = String
type Label = [Component]
type C_Labeled = (Label, C)
```

Corresponding changes are made to the definitions of `X`

and `X_Empty`

to add labels:

```
type X_Empty = X_Abs () () [C_Labeled] -- Instead of "AbsX () () ()"
type X = X_Abs A B (NonEmpty C_Labeled) -- Instead of "AbsX A B (NonEmpty C)"
```

Type `X_Empty`

becomes `AbsX () () [C_Labeled]`

as opposed to `AbsX () () ()`

, because we
are no longer setting the entire `_csField`

in one fell swoop, but via multiple calls to `new_c`

.
Each time we call `new_c`

, a new `C_Labeled`

is added, and it needs to be added to *somewhere*.
In other words, there needs to be something that we can add each new `C_Labeled`

to. Since we can’t
add a `C_Labeled`

to `()`

, we need to use an empty list initially. The `new_c`

function is defined as:

```
new_c :: (IsList cs, Item cs ~ C_Labeled)
=> C
-> X_M (X_Abs a b cs) (X_Abs a b (NonEmpty C_Labeled)) ()
new_c c = do
labels <- ilift ask
let labeled = (labels, c)
imodify $ Lens.over csField ((labeled :|) . toList)
```

`IsList`

is a typeclass that both regular lists and non-empty lists are instances of. Before
any `new_c`

is called, `cs ~ [C_Labeled]`

, and after the first `new_c`

, `cs ~ NonEmpty C_Labeled`

.

The `label`

function uses `ilocal`

(which lifts `local`

into `IxStateT`

) to change the environment
by prepending the given label component to the label in the environment:

```
label :: Component -> X_M i j () -> X_M i j ()
label = ilocal . (:)
ilocal :: (Label -> Label) -> X_M i j a -> X_M i j a
ilocal f m = IxStateT $ local f . runIxStateT m
```

Function `set`

has exactly the same implementation as Version 3, except that its return type
becomes `X_M (AbsX a b cs) (AbsX a' b' cs') ()`

, as opposed to `IxState (AbsX a b cs) (AbsX a' b' cs') ()`

.
Similarly, type alias `MkX`

becomes `X_M X_Empty X a`

instead of `IxState X_Empty X a`

.

# Version 5: Guarding Against Incorrect Indentations

(Code: https://github.com/zliu41/dsl/blob/master/src/DSL/Version5.hs)

One problem with the Version 4 DSL is that, since it is indentation-sensitive (as Haskell itself is),
it is prone to bugs caused by accidentally using the incorrect indentations. If, for instance,
`label "label-baz"`

is indented with two fewer spaces, the code would still be well-typed, but
“label-baz” would no longer be nested under “label-bar”.

To make this class of errors less likely, we can augment the `label`

function with type level
natural numbers from `GHC.TypeLits`

. Whenever a new label component is added by calling `label`

,
we annotate it with the current indentation level, and if the annotation is inconsistent with the level,
it would be a type error. Version 5 of our DSL looks like the following:

```
x :: X
x = mkX $ do
set aField 42
set bField "Hello"
new_c 1.0
label @0 "label-foo" $ do
new_c 2.0
label @1 "label-bar" $ do
new_c 3.0
label @2 "label-baz" $ do
new_c 4.0
new_c 5.0
new_c 6.0
new_c 7.0
new_c 8.0
```

In the above code, if we replace any `label @n`

with `label @m`

where `m ≠ n`

, it should fail to compile.

To accomplish this, we need to keep track of the current indentation level in the types, and
to do so, we add a new type parameter of kind Nat
to `X_M`

. Alternatively, we can add the new type parameter to `X_Abs`

, but this is unnecessary
since neither `set`

nor `new_c`

changes the indentation level.

To add the new type parameter, we make `X_M`

a newtype in lieu of a type alias:

```
newtype X_M (n :: Nat) i j a = X_M
{ runX_M :: IxStateT (Reader Label) i j a }
deriving (IxFunctor, IxPointed, IxApplicative, IxMonad, IxMonadState)
-- Instead of: type X_M i j a = IxStateT (Reader Label) i j a
```

Other than that, the implementation is almost identical to Version 4, except that a few type signatures are changed:

```
type MkX a = X_M 0 X_Empty X a
set :: ExactlyOnceConstraints y a a' b b' cs cs'
=> Lens.ASetter (X_Abs a b cs) (X_Abs a' b' cs') x y
-> y -> X_M n (X_Abs a b cs) (X_Abs a' b' cs') ()
set = -- omitted, same as Version 4
new_c :: (IsList cs, Item cs ~ C_Labeled)
=> C
-> X_M n (X_Abs a b cs) (X_Abs a b (NonEmpty C_Labeled)) ()
new_c = -- omitted, same as Version 4
label :: forall n i j. Component -> X_M (n+1) i j () -> X_M n i j ()
label = -- omitted, same as Version 4
ilocal :: (Label -> Label) -> X_M n i j a -> X_M n' i j a
ilocal f m = -- omitted, same as Version 4
```

The key thing to note here is `label`

’s type, where the nested action has type `X_M (n+1) i j ()`

.
For example, In `label @2`

, the nested action has type `X_M 3 i j ()`

, and so any call to
`label`

immediately nested under `label @2`

must be annotated with `@3`

. The `0`

in `MkX`

ensures that the top-level calls to `label`

are annotated with `@0`

.

# Version 6: At Most Two Occurrences per Label

(Code: https://github.com/zliu41/dsl/blob/master/src/DSL/Version6.hs)

In our final version of the DSL, to make things even more entertaining, let’s say there’s a requirement, for whatever reason, that the same label cannot be used more than twice, and we want this property statically enforced.

For example, the code above in Version 5 should now become ill-typed, as
there are three `C`

values (2.0, 6.0 and 7.0) labeled `["label-foo"]`

.

Since we now want to statically enforce a property about labels, we have to lift labels
to the type level via `GHC.TypeLits.Symbol`

. At each step during the construction of `X`

, we need
to be aware of all labels that have been used so far, so that we can determine whether the next `C`

value we want to create is legal.

To keep track of all labels that have been used so far, we add a type parameter `xss`

of kind `[TLabel]`

to
`X_Abs`

, where `TLabel ~ [Symbol]`

stands for type-level labels. To determine whether
each `C`

to be created is legal, we add another type parameter `xs`

of kind
`TLabel`

, which is the current label in the environment. The first type parameter `xss`

must be added to `X_Abs`

rather than `X_M`

, since it can be changed by `new_c`

.
On the other hand, `xs`

can be added to `X_M`

.

So the definition of `X_Abs`

and `X_M`

becomes

```
data X_Abs a b cs (xss :: [TLabel]) = X_Abs
{ _aField :: a
, _bField :: b
, _csField :: cs
}
Lens.makeLenses ''X_Abs
newtype X_M (n :: Nat) (xs :: TLabel) i j a = X_M
{ runX_M :: IxStateT (Reader Label) i j a }
deriving (IxFunctor, IxPointed, IxApplicative, IxMonad, IxMonadState)
```

They each have one additional type parameter compared to Version 5.

We also need to add the type parameter `xss`

to `X`

, but we don’t really want to, as once we
are done constructing `X`

, we don’t care about `xss`

any more. The purpose of `xss`

is just to enforce
the property that no label is used more than twice. For this reason, we create a type `X'`

with the
additional type parameter, and in the definition of `X`

we existentialize it away:

```
type X_Empty = X_Abs () () [C_Labeled] '[]
type X' xss = X_Abs A B (NonEmpty C_Labeled) xss
data X = forall xss. X (X' xss)
```

Other than that, there are two main differences in the implementation compared to Version 5.

*The first difference* is the type signature of `new_c`

, which now has an additional
constraint, `AtMostOnce xs xss`

, and adds the label associated with the new `C`

to `xss`

,
via `ConsNE xs xss`

:

```
new_c :: (IsList cs, Item cs ~ C_Labeled, AtMostOnce xs xss)
=> C
-> X_M n xs (X_Abs a b cs xss) (X_Abs a b (NonEmpty C_Labeled) (ConsNE xs xss)) ()
new_c c = -- omitted, same as Version 5
-- Prepend xs to xss unless x is empty.
type family ConsNE (xs :: TLabel) (xss :: [TLabel]) :: [TLabel] where
ConsNE '[] xss = xss
ConsNE xs xss = xs ': xss
type family Count x xs :: Nat where
Count x '[] = 0
Count x (x ': xs) = 1 + Count x xs
Count x (y ': xs) = Count x xs
type family AtMostOnce x xs :: Constraint where
AtMostOnce '[] _ = ()
AtMostOnce x xs = Count x xs <= 1
```

This is what guarantees that no labels is used more than twice. When `new_c`

is called, if the current
label is already used more than once, the constraint is violated and it wouldn’t compile.
If the constraint is satisfied, `xs`

is added to `xss`

unless `xs ~ '[]`

.

*The second difference* is the type signature of the `label`

function, which now reads

```
label :: forall n comp a b cs cs' xs xss xss'. (KnownSymbol comp)
=> X_M (n+1)
(comp ': xs)
(X_Abs a b cs xss)
(X_Abs a b cs' xss')
()
-> X_M n
xs
(X_Abs a b cs xss)
(X_Abs a b cs' xss')
()
label m = -- omitted, same as Version 5
```

`(comp ': xs)`

and `xs`

in the type signature serves a similar purpose to `(n+1)`

and `n`

.
It ensures that the child action’s current label carries the new component `comp`

. For example,
any call to `label`

immediately nested under `label @n @"label-foo"`

(for some `n`

) has its
current label instantiated to `"label-foo" ': xs`

for some `xs`

.

The child action is allowed to change type parameter `xss`

into `xss'`

, since it may create an
arbitrary number of new `C`

values, whose labels (unless empty) are added to `xss`

. Here `xss'`

must contain
`xss`

as a suffix, and it is possible to reflect this fact using another constraint. Doing so would
help guard against certain classes of bugs in the DSL implementation. It doesn’t, however,
make our DSL any safer to *use*. So, for simplicity, my implementation just uses two completely
independent type parameters, `xss`

and `xss'`

.

# Conclusions

This is the end of our series of improvements to this particular EDSL. As a matter of fact, I’m not really advocating for this approach, since the benefit-cost ratio is not necessarily good enough to justify it. The implementation of this DSL is not exactly Simple Haskell, in fact, far from it (just look at the number of GHC extensions required in the code for Version 6).

An argument for it, though, is that most of the complexity lies in the implementation of the DSL, which the users of the DSL do not necessarily need to be concerned about. This is true, but on the other hand, when a user of such a DSL makes a mistake leading to a type error, the error message could be a bit of a mouthful. This can be alleviated to some degree by customizing the ErrorMessages, but could still lead to friction. Therefore, the usual advice for type-level programming in Haskell applies: it is very helpful to understand the power it holds, but use it in moderation, and find a good tradeoff that maximizes your team’s productivity.

We are not finished just yet - here are some fun exercises to work on if you are in the mood.

# Exercise 1 (Easy)

(Answer: https://github.com/zliu41/dsl/blob/master/src/DSL/Exercise1.hs)

In Version 5, we added a type parameter `n :: Nat`

to `X_M`

, which can be used to guard against accidentally
using the wrong indentations. An alternative approach is to add a type parameter `xs :: TLabel`

instead of
`n :: Nat`

, and require that each call to `label`

supply the full label. The DSL would look like
the following (apologies for the syntax highlighting failure):

```
x :: X
x = mkX $ do
set aField 42
set bField "Hello"
new_c 1.0
label @'["label-foo"] $ do
new_c 2.0
label @'["label-foo", "label-bar"] $ do
new_c 3.0
label @'["label-foo", "label-bar", "label-baz"] $ do
new_c 4.0
new_c 5.0
new_c 6.0
-- new_c 7.0
new_c 8.0
```

Modify the code for Version 5 to use the above approach. A nested call to `label`

must be
annotated with a label that consists of the parent label as a prefix, plus exactly
one new component. The code should otherwise fail to compile.

# Exercise 2 (Medium)

(Answer: https://github.com/zliu41/dsl/blob/master/src/DSL/Exercise2.hs)

Like Exercise 1, but for Version 6. Enforce the property that there are no more than two occurrences
per label. The above code in Exercise 1 should be ill-typed if we un-comment `new_c 7.0`

.

# Exercise 3 (Easy)

(Answer: https://github.com/zliu41/dsl/blob/master/src/DSL/Exercise3.hs)

Modify Version 6 such that we can optionally annotate the nested level when calling `new_c`

, which further reduces
the chance of incorrect indentations. The DSL would look like this:

```
x :: X
x = mkX $ do
set aField 42
set bField "Hello"
new_c 1.0
label @0 @"label-foo" $ do
new_c 2.0
label @1 @"label-bar" $ do
new_c 3.0
label @2 @"label-baz" $ do
new_c 4.0
new_c @2 5.0
new_c @1 6.0
-- new_c 7.0
new_c @0 8.0
```

# Exercise 4 (Medium)

(Answer: https://github.com/zliu41/dsl/blob/master/src/DSL/Exercise4.hs)

Modify Version 6 such that `_aField`

has a default value (say 42). When constructing
a term of type `X`

, `set aField`

should now be called *at most once*.

# Exercise 5 (Hard)

(Answer: https://github.com/zliu41/dsl/blob/master/src/DSL/Exercise5.hs)

In this exercise we modularize the DSL so that we can construct an `X`

term in separate pieces.
For example, let’s say we want to move a large part of the block nested under “label-foo” out
into a separate value `cs`

:

```
x :: X
x = mkX $ do
set aField 42
set bField "Hello"
new_c 1.0
label @0 @"label-foo" $ do
cs
-- new_c 7.0
new_c 8.0
cs = do
new_c 2.0
label @n @"label-bar" $ do
new_c 3.0
label @(n+1) @"label-baz" $ do
new_c 4.0
new_c 5.0
new_c 6.0
```

Note that the first type parameter passed to `label`

in `cs`

becomes `@n`

and `@(n+1)`

. This makes
it possible to insert `cs`

at any level when constructing `X`

.

The code above currently doesn’t compile because `n`

is not in scope, and even if it were in scope,
GHC would still be unable to infer the type of `cs`

(to see this, change `n`

and `n+1`

to some concrete
numbers like 1 and 2).

Make it compile. Hint: in addition to adding an appropriate type signature for `cs`

, consider
making a new version of `new_c`

.

# Further Reading

- Ting-Yan Lai, Tyng-Ruey Chuang, Shin-Cheng Mu: Type Safe Redis Queries: A Case Study of Type-Level Programming in Haskell. This paper uses a similar approach involving indexed monads and type families to build a safe EDSL for querying Redis.