Haskell is a great, if not the best language for embedding DSLs in. Thanks to Haskell’s modern type system and elegant syntax, devising embedded domain-specific languages (EDSL) that are both low friction and type safe is often fairly achievable. Let’s take a simple EDSL as a running example, and evolve it in a series of steps to make it more user-friendly and safer. The main machineries involved are the indexed state monad (IxState) and some moderate type-level programming.
The EDSL we are going to build is for constructing record data types, particularly those for
holding configuration settings. As a starter, the following data type called X
will be used:
data X = X
{ aField :: A
, bField :: B
, csField :: NonEmpty C
}
type A = Int
type B = String
type C = Double
This X
data type is just a toy example. You can substitute for it whatever similar data types you
work with in practice. For example, X
could be a configuration for a service, a script with a non-empty
list of commands, a user account where csField
is the operations the user
is authorized to perform, or the description of a data migration job with a number of steps to be executed.
It can literally be anything that is a record containing a number of fields.
Now, suppose we want to equip our users with a DSL to construct terms of type X
. What are our options?
Version 1: Plain Record Syntax
(Code: https://github.com/zliu41/dsl/blob/master/src/DSL/Version1.hs)
This is, of course, not really a DSL. It’s just an L – the plain vanilla Haskell record syntax. To construct
a term of type X
, we can write something like
import Data.List.NonEmpty as NonEmpty
x :: X
x = X
{ aField = 42
, bField = "Hello"
, csField = NonEmpty.fromList [1.0, 2.0]
}
This looks reasonably good. So why bother doing anything else? Becuase there are a number of things not so desirable about this record syntax:
- Not everybody loves the record syntax. The more fields
X
has, and the more nested the fields are, the more aggravating it becomes. - It is often useful to have partially constructed
X
values which can be reused in constructing otherX
s. With this approach it is not clear how you can partially construct anX
. None of the three fields has aMaybe
type, so you can’t useNothing
to signal that a field is not yet set. - Related to the above point, there is no decent way (that I know of) to provide default values to some fields.
- The record syntax becomes difficult to use when the fields are more complex than simple values like numbers,
strings and lists. For example, trees. Later in this post, we are going to extend
X
such that each item incsField
can be associated with a label; labels are potentially nested and thus form a tree structure. Using the record syntax directly would become quite cumbersome in that case.
Another common way to construct a record is lens, so let’s see how that works.
Version 2: Lens
(Code: https://github.com/zliu41/dsl/blob/master/src/DSL/Version2.hs)
To use lens, we make the field types of X
abstract, and make lenses for each field:
{-# LANGUAGE TemplateHaskell #-}
import qualified Control.Lens as Lens
data X_Abs a b cs = X_Abs
{ _aField :: a
, _bField :: b
, _csField :: cs
}
Lens.makeLenses ''X_Abs
Then we can write:
type X_Empty = X_Abs () () ()
type X = X_Abs A B (NonEmpty C)
x_empty :: X_Empty
x_empty = X_Abs () () ()
x :: X
x = Lens.set aField 42
. Lens.set bField "hello"
. Lens.set csField (fromList [1.0, 2.0])
$ x_empty
By making the field types of X
abstract, the second issue mentioned in Version 1
is addressed: we can now have partially constructed X
values. But it has a new problem:
now it is possible to accidentally set the same field multiple times. The following code typechecks:
x :: X
x = Lens.set aField 42
. Lens.set aField 43
. Lens.set bField "hello"
. Lens.set cField (fromList [1.0, 2.0])
$ emptyX
This isn’t much of a worry for our toy data type X
, but the larger number of fields, the more
unsafe it is. The problem is that Lens.set
is too polymorphic. We need to restrict Lens.set aField
to only be able to alter the type of _aField
from ()
to Int
, and not from Int
to Int
. There are at least
two ways to do so. The first is to specialize the type parameters of Lens.set
for each field, for example:
set_aField :: a -> X_Abs () b cs -> X_Abs a b cs
set_aField = Lens.set aField
With this specialized type, calling set_aField
twice no longer type checks.
The second and fancier way is to specialize the constraints, rather than the type parameters, of Lens.set
,
by making use of a type family that returns Constraint
:
type family ExactlyOnceConstraints field a a' b b' cs cs' :: Constraint where
ExactlyOnceConstraints A a a' b b' cs cs' = (a ~ (), b ~ b', cs ~ cs')
ExactlyOnceConstraints B a a' b b' cs cs' = (a ~ a', b ~ (), cs ~ cs')
ExactlyOnceConstraints (NonEmpty C) a a' b b' cs cs' = (a ~ a', b ~ b', cs ~ ())
set :: ExactlyOnceConstraints y a a' b b' cs cs'
=> Lens.ASetter (X_Abs a b cs) (X_Abs a' b' cs') x y
-> y -> X_Abs a b cs -> X_Abs a' b' cs'
set = Lens.set
x :: X
x = set aField 42
. set bField "Hello"
. set csField (NonEmpty.fromList [1.0, 2.0])
$ x_empty
With ExactlyOnceConstraints
in place, setting any field twice is illegal. This approach
requires that each field of X
have a distinct type. This can be achieved by using newtypes
to distinguish fields with the same type, and it is usually considered good practice to do so.
Version 3: Enabling do-notation via IxState
(Code: https://github.com/zliu41/dsl/blob/master/src/DSL/Version3.hs)
This version of the DSL takes advantage of the indexed state monad (IxState) to convert regular
function composition (.
) into composition of monadic actions (>>
), allowing us to construct
X
terms in do-notations, for instance
x :: X
x = mkX $ do
set aField 42
set bField "Hello"
set csField (NonEmpty.fromList [1.0, 2.0])
The idea is straightforward: X_Abs
is now the state in the (indexed) state monad, and
set
modifies the state behind the scenes. This makes our DSL look like imperative programming
with a bunch of set
statements, and in my opinion, nicer.
The reason to use the indexed state monad, rather than the regular state monad, is to ensure
each field is set exactly once, same as the guarantee that Version 2 achieves. The initial state
would be X_Empty
(which is X_Abs () () ()
). After setting aField
, the state becomes
X_Abs A () ()
, and so on. Since each call to set
changes the type of the state, it is
necessary to use the indexed state monad.
Basically the only difference in the implementation compared to Version 2 is that
set
now returns IxState (AbsX a b cs) (AbsX a' b' cs') ()
, in contrast to
X_Abs a b cs -> X_Abs a' b' cs'
. The transition from X_Abs a b cs
to AbsX a' b' cs'
happens
in an IxState
action, as opposed to a function. To enable do-notation for IxState
, we need to turn on
RedindableSyntax
, and rebind the (>>)
operator using (>>>=)
from Control.Monad.Indexed
.
In Version 3 it is sufficient just to rebind (>>)
. In the next version and later you’ll also
need to rebind (>>=)
.
Like Version 2, the order in which the fields are set doesn’t matter. The code is well-typed as long as each field is set exactly once.
Version 4: Adding Nested Labels
(Code: https://github.com/zliu41/dsl/blob/master/src/DSL/Version4.hs)
Now, let’s make things more interesting by adding labels to X
. We want to be able
to attach labels to each value in _csField
, and a label can be nested under
another. Ideally, we want to construct X
terms with the following code, which uses
nested do-blocks for nested labels:
x :: X
x = mkX $ do
set aField 42
set bField "Hello"
new_c 1.0 -- 1.0 has no label
label "label-foo" $ do
new_c 2.0 -- 2.0 is labeled ["label-foo"]
label "label-bar" $ do
new_c 3.0 -- 3.0 is labeled ["label-foo", "label-bar"]
label "label-baz" $ do
new_c 4.0 -- 4.0 is labeled ["label-foo", "label-bar", "label-baz"]
new_c 5.0 -- 5.0 is labeled ["label-foo", "label-bar"]
new_c 6.0 -- 6.0 is labeled ["label-foo"]
new_c 7.0 -- 7.0 is labeled ["label-foo"]
new_c 8.0 -- 8.0 has no label
For this to work, every time we create a new C
value using new_c
, we need to know
the label we currently have, so that we can associate it with the new C
we create.
This can be pulled off using the Reader monad. Whenever we create a new label using
label
, we can use local
to locally add the new label to the environment.
So now our indexed state monad is composed with the reader monad, and we give it an alias X_M
:
type X_M i j a = IxStateT (Reader Label) i j a
type Component = String
type Label = [Component]
type C_Labeled = (Label, C)
Corresponding changes are made to the definitions of X
and X_Empty
to add labels:
type X_Empty = X_Abs () () [C_Labeled] -- Instead of "AbsX () () ()"
type X = X_Abs A B (NonEmpty C_Labeled) -- Instead of "AbsX A B (NonEmpty C)"
Type X_Empty
becomes AbsX () () [C_Labeled]
as opposed to AbsX () () ()
, because we
are no longer setting the entire _csField
in one fell swoop, but via multiple calls to new_c
.
Each time we call new_c
, a new C_Labeled
is added, and it needs to be added to somewhere.
In other words, there needs to be something that we can add each new C_Labeled
to. Since we can’t
add a C_Labeled
to ()
, we need to use an empty list initially. The new_c
function is defined as:
new_c :: (IsList cs, Item cs ~ C_Labeled)
=> C
-> X_M (X_Abs a b cs) (X_Abs a b (NonEmpty C_Labeled)) ()
new_c c = do
labels <- ilift ask
let labeled = (labels, c)
imodify $ Lens.over csField ((labeled :|) . toList)
IsList
is a typeclass that both regular lists and non-empty lists are instances of. Before
any new_c
is called, cs ~ [C_Labeled]
, and after the first new_c
, cs ~ NonEmpty C_Labeled
.
The label
function uses ilocal
(which lifts local
into IxStateT
) to change the environment
by prepending the given label component to the label in the environment:
label :: Component -> X_M i j () -> X_M i j ()
label = ilocal . (:)
ilocal :: (Label -> Label) -> X_M i j a -> X_M i j a
ilocal f m = IxStateT $ local f . runIxStateT m
Function set
has exactly the same implementation as Version 3, except that its return type
becomes X_M (AbsX a b cs) (AbsX a' b' cs') ()
, as opposed to IxState (AbsX a b cs) (AbsX a' b' cs') ()
.
Similarly, type alias MkX
becomes X_M X_Empty X a
instead of IxState X_Empty X a
.
Version 5: Guarding Against Incorrect Indentations
(Code: https://github.com/zliu41/dsl/blob/master/src/DSL/Version5.hs)
One problem with the Version 4 DSL is that, since it is indentation-sensitive (as Haskell itself is),
it is prone to bugs caused by accidentally using the incorrect indentations. If, for instance,
label "label-baz"
is indented with two fewer spaces, the code would still be well-typed, but
“label-baz” would no longer be nested under “label-bar”.
To make this class of errors less likely, we can augment the label
function with type level
natural numbers from GHC.TypeLits
. Whenever a new label component is added by calling label
,
we annotate it with the current indentation level, and if the annotation is inconsistent with the level,
it would be a type error. Version 5 of our DSL looks like the following:
x :: X
x = mkX $ do
set aField 42
set bField "Hello"
new_c 1.0
label @0 "label-foo" $ do
new_c 2.0
label @1 "label-bar" $ do
new_c 3.0
label @2 "label-baz" $ do
new_c 4.0
new_c 5.0
new_c 6.0
new_c 7.0
new_c 8.0
In the above code, if we replace any label @n
with label @m
where m ≠ n
, it should fail to compile.
To accomplish this, we need to keep track of the current indentation level in the types, and
to do so, we add a new type parameter of kind Nat
to X_M
. Alternatively, we can add the new type parameter to X_Abs
, but this is unnecessary
since neither set
nor new_c
changes the indentation level.
To add the new type parameter, we make X_M
a newtype in lieu of a type alias:
newtype X_M (n :: Nat) i j a = X_M
{ runX_M :: IxStateT (Reader Label) i j a }
deriving (IxFunctor, IxPointed, IxApplicative, IxMonad, IxMonadState)
-- Instead of: type X_M i j a = IxStateT (Reader Label) i j a
Other than that, the implementation is almost identical to Version 4, except that a few type signatures are changed:
type MkX a = X_M 0 X_Empty X a
set :: ExactlyOnceConstraints y a a' b b' cs cs'
=> Lens.ASetter (X_Abs a b cs) (X_Abs a' b' cs') x y
-> y -> X_M n (X_Abs a b cs) (X_Abs a' b' cs') ()
set = -- omitted, same as Version 4
new_c :: (IsList cs, Item cs ~ C_Labeled)
=> C
-> X_M n (X_Abs a b cs) (X_Abs a b (NonEmpty C_Labeled)) ()
new_c = -- omitted, same as Version 4
label :: forall n i j. Component -> X_M (n+1) i j () -> X_M n i j ()
label = -- omitted, same as Version 4
ilocal :: (Label -> Label) -> X_M n i j a -> X_M n' i j a
ilocal f m = -- omitted, same as Version 4
The key thing to note here is label
’s type, where the nested action has type X_M (n+1) i j ()
.
For example, In label @2
, the nested action has type X_M 3 i j ()
, and so any call to
label
immediately nested under label @2
must be annotated with @3
. The 0
in MkX
ensures that the top-level calls to label
are annotated with @0
.
Version 6: At Most Two Occurrences per Label
(Code: https://github.com/zliu41/dsl/blob/master/src/DSL/Version6.hs)
In our final version of the DSL, to make things even more entertaining, let’s say there’s a requirement, for whatever reason, that the same label cannot be used more than twice, and we want this property statically enforced.
For example, the code above in Version 5 should now become ill-typed, as
there are three C
values (2.0, 6.0 and 7.0) labeled ["label-foo"]
.
Since we now want to statically enforce a property about labels, we have to lift labels
to the type level via GHC.TypeLits.Symbol
. At each step during the construction of X
, we need
to be aware of all labels that have been used so far, so that we can determine whether the next C
value we want to create is legal.
To keep track of all labels that have been used so far, we add a type parameter xss
of kind [TLabel]
to
X_Abs
, where TLabel ~ [Symbol]
stands for type-level labels. To determine whether
each C
to be created is legal, we add another type parameter xs
of kind
TLabel
, which is the current label in the environment. The first type parameter xss
must be added to X_Abs
rather than X_M
, since it can be changed by new_c
.
On the other hand, xs
can be added to X_M
.
So the definition of X_Abs
and X_M
becomes
data X_Abs a b cs (xss :: [TLabel]) = X_Abs
{ _aField :: a
, _bField :: b
, _csField :: cs
}
Lens.makeLenses ''X_Abs
newtype X_M (n :: Nat) (xs :: TLabel) i j a = X_M
{ runX_M :: IxStateT (Reader Label) i j a }
deriving (IxFunctor, IxPointed, IxApplicative, IxMonad, IxMonadState)
They each have one additional type parameter compared to Version 5.
We also need to add the type parameter xss
to X
, but we don’t really want to, as once we
are done constructing X
, we don’t care about xss
any more. The purpose of xss
is just to enforce
the property that no label is used more than twice. For this reason, we create a type X'
with the
additional type parameter, and in the definition of X
we existentialize it away:
type X_Empty = X_Abs () () [C_Labeled] '[]
type X' xss = X_Abs A B (NonEmpty C_Labeled) xss
data X = forall xss. X (X' xss)
Other than that, there are two main differences in the implementation compared to Version 5.
The first difference is the type signature of new_c
, which now has an additional
constraint, AtMostOnce xs xss
, and adds the label associated with the new C
to xss
,
via ConsNE xs xss
:
new_c :: (IsList cs, Item cs ~ C_Labeled, AtMostOnce xs xss)
=> C
-> X_M n xs (X_Abs a b cs xss) (X_Abs a b (NonEmpty C_Labeled) (ConsNE xs xss)) ()
new_c c = -- omitted, same as Version 5
-- Prepend xs to xss unless x is empty.
type family ConsNE (xs :: TLabel) (xss :: [TLabel]) :: [TLabel] where
ConsNE '[] xss = xss
ConsNE xs xss = xs ': xss
type family Count x xs :: Nat where
Count x '[] = 0
Count x (x ': xs) = 1 + Count x xs
Count x (y ': xs) = Count x xs
type family AtMostOnce x xs :: Constraint where
AtMostOnce '[] _ = ()
AtMostOnce x xs = Count x xs <= 1
This is what guarantees that no labels is used more than twice. When new_c
is called, if the current
label is already used more than once, the constraint is violated and it wouldn’t compile.
If the constraint is satisfied, xs
is added to xss
unless xs ~ '[]
.
The second difference is the type signature of the label
function, which now reads
label :: forall n comp a b cs cs' xs xss xss'. (KnownSymbol comp)
=> X_M (n+1)
(comp ': xs)
(X_Abs a b cs xss)
(X_Abs a b cs' xss')
()
-> X_M n
xs
(X_Abs a b cs xss)
(X_Abs a b cs' xss')
()
label m = -- omitted, same as Version 5
(comp ': xs)
and xs
in the type signature serves a similar purpose to (n+1)
and n
.
It ensures that the child action’s current label carries the new component comp
. For example,
any call to label
immediately nested under label @n @"label-foo"
(for some n
) has its
current label instantiated to "label-foo" ': xs
for some xs
.
The child action is allowed to change type parameter xss
into xss'
, since it may create an
arbitrary number of new C
values, whose labels (unless empty) are added to xss
. Here xss'
must contain
xss
as a suffix, and it is possible to reflect this fact using another constraint. Doing so would
help guard against certain classes of bugs in the DSL implementation. It doesn’t, however,
make our DSL any safer to use. So, for simplicity, my implementation just uses two completely
independent type parameters, xss
and xss'
.
Conclusions
This is the end of our series of improvements to this particular EDSL. As a matter of fact, I’m not really advocating for this approach, since the benefit-cost ratio is not necessarily good enough to justify it. The implementation of this DSL is not exactly Simple Haskell, in fact, far from it (just look at the number of GHC extensions required in the code for Version 6).
An argument for it, though, is that most of the complexity lies in the implementation of the DSL, which the users of the DSL do not necessarily need to be concerned about. This is true, but on the other hand, when a user of such a DSL makes a mistake leading to a type error, the error message could be a bit of a mouthful. This can be alleviated to some degree by customizing the ErrorMessages, but could still lead to friction. Therefore, the usual advice for type-level programming in Haskell applies: it is very helpful to understand the power it holds, but use it in moderation, and find a good tradeoff that maximizes your team’s productivity.
We are not finished just yet - here are some fun exercises to work on if you are in the mood.
Exercise 1 (Easy)
(Answer: https://github.com/zliu41/dsl/blob/master/src/DSL/Exercise1.hs)
In Version 5, we added a type parameter n :: Nat
to X_M
, which can be used to guard against accidentally
using the wrong indentations. An alternative approach is to add a type parameter xs :: TLabel
instead of
n :: Nat
, and require that each call to label
supply the full label. The DSL would look like
the following (apologies for the syntax highlighting failure):
x :: X
x = mkX $ do
set aField 42
set bField "Hello"
new_c 1.0
label @'["label-foo"] $ do
new_c 2.0
label @'["label-foo", "label-bar"] $ do
new_c 3.0
label @'["label-foo", "label-bar", "label-baz"] $ do
new_c 4.0
new_c 5.0
new_c 6.0
-- new_c 7.0
new_c 8.0
Modify the code for Version 5 to use the above approach. A nested call to label
must be
annotated with a label that consists of the parent label as a prefix, plus exactly
one new component. The code should otherwise fail to compile.
Exercise 2 (Medium)
(Answer: https://github.com/zliu41/dsl/blob/master/src/DSL/Exercise2.hs)
Like Exercise 1, but for Version 6. Enforce the property that there are no more than two occurrences
per label. The above code in Exercise 1 should be ill-typed if we un-comment new_c 7.0
.
Exercise 3 (Easy)
(Answer: https://github.com/zliu41/dsl/blob/master/src/DSL/Exercise3.hs)
Modify Version 6 such that we can optionally annotate the nested level when calling new_c
, which further reduces
the chance of incorrect indentations. The DSL would look like this:
x :: X
x = mkX $ do
set aField 42
set bField "Hello"
new_c 1.0
label @0 @"label-foo" $ do
new_c 2.0
label @1 @"label-bar" $ do
new_c 3.0
label @2 @"label-baz" $ do
new_c 4.0
new_c @2 5.0
new_c @1 6.0
-- new_c 7.0
new_c @0 8.0
Exercise 4 (Medium)
(Answer: https://github.com/zliu41/dsl/blob/master/src/DSL/Exercise4.hs)
Modify Version 6 such that _aField
has a default value (say 42). When constructing
a term of type X
, set aField
should now be called at most once.
Exercise 5 (Hard)
(Answer: https://github.com/zliu41/dsl/blob/master/src/DSL/Exercise5.hs)
In this exercise we modularize the DSL so that we can construct an X
term in separate pieces.
For example, let’s say we want to move a large part of the block nested under “label-foo” out
into a separate value cs
:
x :: X
x = mkX $ do
set aField 42
set bField "Hello"
new_c 1.0
label @0 @"label-foo" $ do
cs
-- new_c 7.0
new_c 8.0
cs = do
new_c 2.0
label @n @"label-bar" $ do
new_c 3.0
label @(n+1) @"label-baz" $ do
new_c 4.0
new_c 5.0
new_c 6.0
Note that the first type parameter passed to label
in cs
becomes @n
and @(n+1)
. This makes
it possible to insert cs
at any level when constructing X
.
The code above currently doesn’t compile because n
is not in scope, and even if it were in scope,
GHC would still be unable to infer the type of cs
(to see this, change n
and n+1
to some concrete
numbers like 1 and 2).
Make it compile. Hint: in addition to adding an appropriate type signature for cs
, consider
making a new version of new_c
.
Further Reading
- Ting-Yan Lai, Tyng-Ruey Chuang, Shin-Cheng Mu: Type Safe Redis Queries: A Case Study of Type-Level Programming in Haskell. This paper uses a similar approach involving indexed monads and type families to build a safe EDSL for querying Redis.