Algebraic Data Types in Scala

What a delightful idea: coming back from vacation with batteries fully charged and some wacky ideas in our minds to write about. The best of these came under the very influence of the moon.


An Algebraic Data Type (ADT from now on, so we can save money on every word in WordPress) is just a way to express a data type (Cat, Dog, Prevarication) based on an algebra. And when we say ‘algebra’, we mean type sums and products (of Integers, Cats, Cars, Prevarications, …). For example:

Train = Locomotive + Wagon * Train

How does one read that? A train may be: a locomotive OR a wagon AND another train (which may itself be a wagon and another train, which may itself be a …).
Take a look at both the disjunction and the conjunction: the sum represents an OR, and the product represents an AND (as in Boolean algebra).

It’s also worth noticing that, from this type definition, you can infer a recursive pattern. With the Train type, the base case is definitely the Locomotive and, in the recursive case, we have a wagon and another train. As we’ll see later, this pattern is very frequent and makes type definitions easier.

And how are sum and product represented in Scala?

The easiest way to represent the type sum (also called co-product), in any paradigm with polymorphism support (in general) and in Scala (in particular), is simply inheritance. If we have the following case:

sealed trait Animal
case object Cat extends Animal
case object Dog extends Animal

we’re indeed expressing a type co-product:

Animal = Cat + Dog

that is, an Animal can only be either a Cat or a Dog.
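Exhaustiveness is the practical payoff of a sealed co-product: the compiler knows Animal has exactly these two constructors and warns if a match misses one. A minimal sketch (the sound function is our own addition):

```scala
sealed trait Animal
case object Cat extends Animal
case object Dog extends Animal

// Because Animal is sealed, the compiler checks this match is exhaustive
def sound(animal: Animal): String = animal match {
  case Cat => "meow"
  case Dog => "woof"
}

println(sound(Cat)) // meow
```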

Regarding the product, we could define it as the set of attributes that compose a certain type’s instances. For example,

case class Student(name: String, age: Int)

expressed as a type product, would be as follows:

Student = String * Int

So, for building a Student instance, you need a String and an Int.

If we now try to materialize the previously exposed train model (with some additions), we’ll notice that

Wagon = String * Int
Train = Locomotive + Wagon * Train

is translated into Scala as

sealed trait Train
case object Locomotive extends Train
case class Wagon(model: String, passengers: Int)
case class Nexus(wagon: Wagon, next: Train) extends Train
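The recursive pattern noted earlier falls out naturally: a function over Train recurses on the Nexus case and bottoms out at Locomotive. A minimal sketch (the wagonCount function is our own addition):

```scala
sealed trait Train
case object Locomotive extends Train
case class Wagon(model: String, passengers: Int)
case class Nexus(wagon: Wagon, next: Train) extends Train

// Count wagons by walking the recursive structure
def wagonCount(train: Train): Int = train match {
  case Locomotive => 0
  case Nexus(_, next) => 1 + wagonCount(next)
}

val train: Train = Nexus(Wagon("A", 20), Nexus(Wagon("B", 30), Locomotive))
println(wagonCount(train)) // 2
```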

So what is it good for?


…absolutely nothing, listen to me♩♪♫♬.
If you think, my fellow, that this is stuff that nobody uses, you haven’t thought about which scala.Predef structures are defined this way. Lists, for example, are defined as:

trait List[+T]
case object Nil extends List[Nothing]
case class ::[T](head: T, tail: List[T]) extends List[T]

That is, a List can be either an empty list or an element followed by another list.
If we express that idea in terms of products and co-products:

List[T] = EmptyList[T] + NonEmptyList[T]
NonEmptyList[T] = T * List[T]

Please notice that the case of the empty list (Nil) has a bizarre but beautiful implementation in Scala.

If we tried to define an empty list for eeeeeeeeeevery single existing type, we would have to instantiate a Nil[Cat], a Nil[Dog], …
In order to avoid this, and to have a single Nil, we make it extend List[Nothing]; as you’ll probably remember from other posts, Nothing extends eeeeeeeeevery single existing type (both primitive and programmer-defined). If we add the fact that List[T] is covariant in T, we’ll have a single object Nil that represents the empty list for all types. Awesome, right?
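With the mini List above, covariance plus Nothing lets the single Nil serve every element type — a sketch (renaming the cons class to Cons to avoid clashing with Scala’s own ::):

```scala
trait List[+T]
case object Nil extends List[Nothing]
case class Cons[T](head: T, tail: List[T]) extends List[T]

// The same Nil value typechecks as a list of any element type,
// thanks to covariance: List[Nothing] <: List[Int], List[String], ...
val noInts: List[Int] = Nil
val noStrings: List[String] = Nil

val xs: List[Int] = Cons(1, Cons(2, Nil))
```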


Example: Even numbers

In order to settle into this new way of thinking, let’s take on the following challenge: how could we represent even numbers in Scala?


If we’re not sophisticated enough and we trust runtime assertions a lil’ bit, we could say that even numbers are defined as:

case class Even(value: Int) {
  require(value % 2 == 0, "it's not even")
}

But, if we try to create an Even with an odd integer, we’ll get a giant NOPE:

java.lang.IllegalArgumentException: requirement failed: it's not even
	at scala.Predef$.require(Predef.scala:233)
	at Even.<init>(<console>:7)

However, this assertion won’t be verified until runtime, the moment when require is executed. Thus, our code could compile without being correct…
We can do much better…


Another option is to assume (and we won’t discuss it) that zero is an even number, that we have infinite memory installed in our machine, that overflow errors don’t exist…


In that not-so-far-fetched case (…), we could define even numbers as:

sealed abstract class Even(val value: Int)
case object Zero extends Even(0)
case class Next(previousEven: Even) 
  extends Even(2 + previousEven.value)

So, if we have a method that generates reservations for the Love Boat, which requires an even number of participants, we can use our brand-new Even type:

def loveBoatReservation(
  peopleAmount: Even): Reservation = ???

Given that there’s no way to build an Even from an integer that is not even, we avoid uncomfortable situations at runtime where the number of people that get on the Love Boat is odd. Otherwise, someone could be…
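A quick sanity check that values built from the two constructors really are even:

```scala
sealed abstract class Even(val value: Int)
case object Zero extends Even(0)
case class Next(previousEven: Even) extends Even(2 + previousEven.value)

// Each application of Next adds 2, so only even values are representable
val four = Next(Next(Zero))
println(four.value)          // 4
println(four.value % 2 == 0) // true
```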


Recursive ADTs and its techniques

Once the data type is defined, let’s suppose we want to implement the sum of even numbers:

def sum(e1: Even, e2: Even): Even = ???

We can think of several alternatives. One of them could be the quick-and-dirty one:

def sum(e1: Even, e2: Even): Even =
  new Even(e1.value + e2.value){}

But notice that we’re totally ignoring the constructors we’ve defined. If we want to use pattern matching over the result:

val four = new Even(4){}
sum(Zero, four) match {
  case Zero =>
    //it never gets inside this case!
  case Next(Next(Zero)) =>
    //...
}

scala.MatchError: $anon$1@649f2009 (of class $anon$1)


The other technique (much more sophisticated, by the way) consists of invoking a recursive method that, on each call, decreases the second even number while increasing the first one. For doing so, we make use of Next’s apply (constructor) and unapply (extractor) methods:

def sum(e1: Even, e2: Even): Even = {
  def rSum(ev1: Even, ev2: Even): (Even, Even) =
    ev2 match {
      case Zero => (ev1, Zero)
      case Next(e) => rSum(Next(ev1), e)
    }
  val (result, _) = rSum(e1, e2)
  result
}

Undeniably beautiful 🙂
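Putting it together, a self-contained run of the recursive sum:

```scala
sealed abstract class Even(val value: Int)
case object Zero extends Even(0)
case class Next(previousEven: Even) extends Even(2 + previousEven.value)

def sum(e1: Even, e2: Even): Even = {
  // Move 2 from the second number to the first until the second is Zero
  def rSum(ev1: Even, ev2: Even): (Even, Even) =
    ev2 match {
      case Zero => (ev1, Zero)
      case Next(e) => rSum(Next(ev1), e)
    }
  val (result, _) = rSum(e1, e2)
  result
}

val two = Next(Zero)
val four = Next(two)
println(sum(two, four).value) // 6
```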


Apart from becoming a lil’ bit crazier when reading back-from-vacation posts, we can extract several main conclusions from what we’ve read:

  • As we always say, every assertion we can check at compile time instead of at runtime saves time and headaches hunting bugs in production software (which is more expensive and more likely to make heads roll).
  • Constructors are THE key. If we define an ADT, we can ensure that generated values of that type are correct by defining the proper constructors: Zero and Next. In both cases, we can be sure that the even-number rules are satisfied.
  • Methods that operate over recursive data types tend to be recursive as well. Apart from that, to generate values of the mentioned type (Even in our example), they should only use the existing constructor methods.

In a future post, we’ll talk about the relation between data types algebra and definition of formal grammars…or not.

Peace out!


Scala, Scala and nothing more

After so many posts on Scala, many snippets of code, and many weird concepts, here comes a post that had been pending almost since the beginning of this blog. What resources exist for learning Scala? Well… now we present our favorites.


Books: a good starting point 🙂

  • Programming in Scala: a classic. Written by Odersky himself, with the third edition hot off the press.
  • Functional Programming in Scala (also known, in an effort of originality and simplicity, as the red book). A book that opens the mind to the functional paradigm; it is advisable not only for people who want to program in Scala, but for anyone interested in functional programming.

Courses: to deepen a bit more.

  • As for online courses, Coursera and the new Functional Programming in Scala specialization hold the monopoly. It consists of 4 courses + 1 project and covers in depth the most important aspects of Scala.
  • Furthermore, scala-exercises allows us to practice in a faster, but no less effective, way. Fully recommended.

Events: not only to drink beer  😉

  • Scala Days: undoubtedly the most important annual convention about Scala. Since last year, there are two editions per year: one in America and one in Europe. In addition, all the videos of the talks are published 🙂
  • Scala World: UK convention with a warm welcome and where you can hear and see some of the biggest Scala gurus.
  • Scala eXchange: another of the best-known conventions. It takes place in December in London.

These are the resources that we know and find interesting. Without doubt, there are many more that we have left unmentioned. If you miss some resource on this list, any contribution is welcome 🙂

Scala: Code interpretation at runtime

With today’s post, we’ll dive into Scala code generation on the fly: at runtime. We have to be careful not to confuse this with Scala macros, which generate code at compile time. Macros make use of Scala’s type system, which is much safer than generating code at runtime.


When is it useful to use this mechanism, then? We’ll try to shed light on this by following a very simple example, abstracting away from the real implementation (which you may find at Scalera’s GitHub).

The problem: “da” serializer

Let’s suppose a not-so-wild case, which consists of communicating two services via an event bus (we’ll abstract away its implementation: it could be a message queue like Kafka, Akka Streams, …).

The main idea is the following:


The producer knows how to send and the consumer has an associated callback for message arrivals:

import scala.util.Try

trait Producer {
  def produce(message: Any): Try[Unit]
}

trait Consumer {
  val consume: Any => Unit
}

Both producer and consumer services know the message types that may arrive. In our example, they could be one of these:

case class Foo(att1: Int, att2: String)
case class Bar(att1: String)

If the producer wants to send a message using the event bus, it will have to serialize it somehow (JSON, byte array, XML, …) so that, by the time it reaches the opposite end, the consumer can start the inverse process (deserialization) and get the original message.

…nothing weird so far.


If we have a JSON serializer …

trait JsonSer[T] {
  def serialize(t: T): String
  def deserialize(json: String): T
}

and we serialize our message …

implicit val fooSerializer: JsonSer[Foo] = ???
val foo: Foo = ???
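To make the setup concrete, here is a hand-rolled JsonSer[Foo] (a naive regex-based sketch, illustration only — a real implementation would use a JSON library):

```scala
case class Foo(att1: Int, att2: String)

trait JsonSer[T] {
  def serialize(t: T): String
  def deserialize(json: String): T
}

implicit val fooSerializer: JsonSer[Foo] = new JsonSer[Foo] {
  def serialize(t: Foo): String =
    s"""{"att1":${t.att1},"att2":"${t.att2}"}"""

  // Naive parsing, just enough for the round-trip demo
  private val pattern = """\{"att1":(-?\d+),"att2":"(.*)"\}""".r
  def deserialize(json: String): Foo = json match {
    case pattern(a1, a2) => Foo(a1.toInt, a2)
  }
}

val foo = Foo(42, "hello")
val json = implicitly[JsonSer[Foo]].serialize(foo)
println(json) // {"att1":42,"att2":"hello"}
```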

How do we know which deserializer to use when the consumer gets the message?

Option 1: Try every possible serializer until one of them works

In our consumer, we’d have:

lazy val consumer = new Consumer {
  override val consume: Any => Unit = {
    case message: String =>
      Seq(barSerializer, fooSerializer).flatMap { ser =>
        Try(ser.deserialize(message)).toOption
      }.headOption.fold(ifEmpty = println("Couldn't deserialize")) {
        case bar: Bar => println("it's a bar!")
        case foo: Foo => println("it's a foo!")
        case _ => println("it's ... something!")
      }
  }
}

A lil’ bit coarse, right? If the proper serializer were the last in a list of 100, we would have tried and failed with 99 serializers first (such a waste of CPU!).


Besides, we should also consider the case of a deserializer for a different type that nonetheless fits the received message, partially or wrongly deserializing it.

Option 2: Add the message type to the message itself

We could add an extra layer wrapping the message to indicate the type of message it contains. This way, when receiving it, we can determine which serializer we have to use.


For writing the type, we’ll make use of Scala’s TypeTags, getting info about the contained message type T.

//Wrapper for the message (and its serializer)

import scala.reflect.runtime.universe.{TypeTag, typeTag}

case class Message[T: TypeTag](content: T){
  val messageType: Message.Type = typeTag[T].tpe.toString
}

object Message {

  type Type = String

  def typeFrom(msg: String): Message.Type = ???

  implicit def messageSer[T: TypeTag : JsonSer]: JsonSer[Message[T]] = ???
}


//We'll make use of it for sending


//And we redefine the consumer

lazy val consumer = new Consumer {
  override val consume: Any => Unit = {
    case message: String =>
      Message.typeFrom(message) match {

        case "org.scalera.reflect.runtime.Bar" =>
          println("it's a bar!")
          val value = messageSer[Bar].deserialize(message).content

        case "org.scalera.reflect.runtime.Foo" =>
          val value = messageSer[Foo].deserialize(message).content
          println("it's a foo!")

        case _ =>
          println("it's ... something!")
      }
  }
}

As you can see, we don’t have to try every possible serializer anymore. Instead, from the message type we’ve extracted from the Message wrapper we added, we’re able to use the proper deserializer.

But it is also true that we have to add an extra case for each string that represents a message type. Wouldn’t it be nice to deserialize somehow and have cases defined only for the already deserialized objects? (Something similar to what we first tried, but without trying every possible serializer.) Something like:

lazy val consumer = new Consumer {
  override val consume: Any => Unit = { msg =>
    genericDeserialize(msg) match {
      case f: Foo => //...
      case b: Bar => //...
      case _ => //...
    }
  }
}

Option 2-cool: Serializers ‘under the hood’

To achieve something similar, we have to focus on that genericDeserialize method: which signature should it have? Initially, something like this:

def genericDeserialize(msg: String): Any

An Any? Seriously? My fellows, at runtime we have no idea about the type we may get. We just know that, from a String, we’ll get ‘some….thing’. The match applied to that Any will allow us to go from something totally abstract to more concrete types.

This is where both the reflect library and the Scala compiler come in.


The Toolbox API allows parsing strings and getting the resulting AST (abstract syntax tree). From that AST, it can evaluate the expression, returning an instance of Any as well.

To instantiate a Toolbox and use type references, we have to add the following SBT dependencies:

libraryDependencies ++= Seq(
  "org.scala-lang" % "scala-compiler" % "2.11.8",
  "org.scala-lang" % "scala-reflect" % "2.11.8")

For example, if we wanted to parse the "2".toInt + 4 operation,

import scala.reflect.runtime.{universe => ru}
import scala.tools.reflect.ToolBox

//  Scala compiler tool box
val tb = ru.runtimeMirror(getClass.getClassLoader).mkToolBox()

println(ru.showRaw(tb.parse(""""2".toInt + 4""")))

we would get the abstract syntax tree generated as a String (by using showRaw):

Apply(Select(Select(Literal(Constant("2")), TermName("toInt")), TermName("$plus")), List(Literal(Constant(4))))

If we use the toolbox for evaluating the parsed expression,

println(tb.eval(tb.parse(""""2".toInt + 4""")))

we’ll get an Any that represents the resulting value of the sum: 6.


“Da” serializer

Once we’ve seen how it generally works, we apply the same principle to our serializer, so the expression we’re going to try to interpret is similar to:

  import scala.reflect._;
  import spray.json._;
  import org.scalera.reflect.runtime._;
  import MySprayJsonImplicits._;
  import MyJsonSerImplicits._;
  implicitly[JsonSer[Message[$messageType]]]


where MySprayJsonImplicits and MyJsonSerImplicits represent the objects that contain both the Spray implicits for JsonFormat and the JsonSer implicits that we have defined before.

$messageType represents the concrete type to deserialize that we would have got by using the TypeTag (as seen before).

If we adapt it to our code, we’ll get something similar to:

object GenSer {

  import scala.reflect.runtime.{universe => ru}
  import scala.tools.reflect.ToolBox

  //  Scala compiler tool box
  private val tb = ru.runtimeMirror(this.getClass.getClassLoader).mkToolBox()

  def genericDeserialize(msg: String)(serContainers: Seq[AnyRef]): Any = {

    val messageType = Message.typeFrom(msg)

    val serContainersImport = =>
      "import " + container.toString.split("\\$").head + "._").mkString(";\n")

    val expr =
      s"""
         |  import scala.reflect._;
         |  import spray.json._;
         |  import org.scalera.reflect.runtime._;
         |  $serContainersImport;
         |  implicitly[JsonSer[Message[$messageType]]]
       """.stripMargin

    tb.eval(tb.parse(expr))
      .asInstanceOf[JsonSer[Message[Any]]]
      .deserialize(msg)
      .content
  }
}


If you take a look, we’ve extended the generic deserialization method’s signature to take a sequence of objects to import, so we don’t have to hard-code which object contains the Spray implicits and which one contains our JsonSer‘s.

val serContainersImport = =>
  "import " + container.toString.split("\\$").head + "._").mkString(";\n")

It is also noteworthy that, when deserializing, we get a Message[Any]; so we’ll have to access the content field, which holds the raw Any-typed value of the deserialized message.

The result

So finally, we can make use of our function to deserialize generically and let our consumer code be ‘swaggy’:

lazy val consumer = new Consumer {
  override val consume: Any => Unit = {
    case msg: String =>
      genericDeserialize(msg)(Seq(case3, Message)) match {
        case bar: Bar => println("it's a bar!")
        case foo: Foo => println("it's a foo!")
        case _ => println("it's ... something!")
      }
  }
}


Being able to evaluate code at runtime lets us do a lot of interesting things when we want to interpret String-based types. However, this kind of technique bypasses the worthy and safe Scala type system, so these are tools to be used with a lil’ bit of care.

Evaluating these expressions is also rather expensive in terms of time. In this case, I would recommend using a type cache. Something really simple, like a map:

type TypeName = String
var cache: Map[TypeName, JsonSer[Message[_]]] = Map.empty

And when invoking the method, we’ll check whether the type evidence (JsonSer) we’re looking for is already stored in the map. If so, we’ll use it; otherwise, we’ll create it, store it in the cache, and use it as the result.
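A sketch of such a cache, with a plain function standing in for the expensive toolbox-based lookup (the names here are our own, not the post’s code):

```scala
import scala.collection.mutable

type TypeName = String

// Stand-in for the expensive tb.eval-based serializer lookup
var lookups = 0
def expensiveLookup(tpe: TypeName): String = {
  lookups += 1
  s"serializer-for-$tpe"
}

val cache = mutable.Map.empty[TypeName, String]

// Check the cache first; only fall back to evaluation on a miss
def cachedLookup(tpe: TypeName): String =
  cache.getOrElseUpdate(tpe, expensiveLookup(tpe))

cachedLookup("Foo")
cachedLookup("Foo")
println(lookups) // 1: the second call hit the cache
```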


Easy peasy…
Peace out!

Graffiti Rules: playing with JSON [Snow]

A few months ago we talked about Spray, a toolkit that allows us to build REST APIs in an easy way with a pretty complete DSL.

One of the components belonging to this toolkit is spray-json. This module allows us to serialize and deserialize our objects to/from a JSON format. Today we’ll see how to work with it quickly and easily.

How do we create serializers for different classes? Well, there are two options, depending on how the class we want to serialize is defined.

Easy option: when we have a case class

Well, that’s a piece of cake. The only thing we need to use is the jsonFormatN() method where N represents the number of arguments of the case class apply method. Let’s see it with an example:

case class Character(name: String, family: String, isDead: Boolean)

object MyJsonProtocol extends DefaultJsonProtocol {

  implicit val characterFormat = jsonFormat3(Character.apply)
}

As you can see, in order to create a serializer for the Character case class, we create an implicit value with the help of jsonFormat3 (as it has 3 attributes). Such an implicit value is defined inside an object extending DefaultJsonProtocol, where the serializers for Scala basic types (Int, String, Boolean…) are defined… Easy peasy.


Less easy option: when we don’t have a case class or we want to serialize a case class in a different way.

In such a case, we’ll need to implement how the serialization and deserialization of the given type should work. To do so, we need to create an implicit object extending RootJsonFormat[T], where T is the type we want to serialize. Such a trait contains two methods that need to be implemented. On the one hand, there is the write method, which converts a value of type T into JSON; on the other, the read method, which performs the reverse process. Let’s see an example with the same type as before:

class Character(val name: String, val family: String, val isDead: Boolean)

object MyJsonProtocol extends DefaultJsonProtocol {

  implicit object CharacterJsonFormat extends RootJsonFormat[Character] {

    def write(c: Character) =
      JsArray(JsString(, JsString(, JsBoolean(c.isDead))

    def read(value: JsValue) = value match {
      case JsArray(Vector(JsString(name), JsString(family), JsBoolean(isDead))) =>
        new Character(name, family, isDead)
      case _ => throw new DeserializationException("Character expected")
    }
  }
}
As can be observed, it is a bit more tedious but not too complicated.

And how do we use it? We just have to import the object where we have defined the implicits and call the methods toJson or convertTo[T].

import MyJsonProtocol._

val json = Character("Jon Snow", "Stark", ???).toJson //....You know nothing!!!!
// Returns {"name": "Jon Snow", "family": "Stark", "isDead": ???}
val jonSnow = json.convertTo[Character]


Besides, if we use Spray’s REST API, the transformation will be done transparently and it won’t be necessary to call toJson or convertTo explicitly. But we’ll leave that for another post. Next week we’ll be talking about this library, so we’d better see something of it before 🙂

See you all!

Shapeless: polymorphic functions

Some time ago, our friend Javier Fuentes illustrated us with an introduction to Shapeless.
Some months after that, at the Scala Madrid meetup, he gave a pretty interesting talk about structural induction with Shapeless and HLists. We couldn’t help it and we caught the enthusiasm flu 😛

What we want to achieve

Let’s set as our study case something I think more than one of us has wondered about before: how can I compose different types in the same for-comprehension? Something like:

import scala.util.Try

for {
  v1 <- List(1,2,3)
  v2 <- Try("hi, person ")
} yield s"$v2 $v1"

which usually comes hand in hand with the following frustrating error:

<console>:15: error: type mismatch;
 found   : scala.util.Try[String]
 required: scala.collection.GenTraversableOnce[?]
         v2 <- Try("hi, person ")

Therefore, we need a way to transform these data types (Future, Try) into iterable ‘stuff’ (GenTraversable[T] might work). In our example, we won’t take into account the information that we lose about errors, for example, when a certain Try or Future expression has failed. To better understand the problem, let’s go over some theory.


Monomorphism vs polymorphism

We say a method is monomorphic when you can only invoke it with parameters whose concrete types are explicitly declared in the method signature, whilst polymorphic methods can take parameters of any type as long as they fit the signature (in Scala’s case: type parameters). Speaking proper English:

def monomorphic(parameter: Int): String

def polymorphic[T](parameter: T): String


It’s also good to know that a method can be polymorphic due to type parameters or due to parameter subtyping. E.g.:

def parametricallyPolymorphic[T](parameter: T): String

def subtypedPolymorphic(parameter: Animal): String

subtypedPolymorphic(new Cat)

If we use type parameters and we have NO information at all about those types, we are facing a parametric polymorphism case.

If we use type parameters but we need some extra view / context bound for that type (T <: Whatever or T: TypeClass), then we are talking about ad-hoc polymorphism.

Problem: Function values

There’s no such problem when using parametric polymorphism with methods but, what about function values? In Scala, this cannot be achieved, and it therefore produces some lack of expressiveness:

val monomorphic: Int => String = _.toString

val anotherMonomorphic: List[Int] => Set[Int] =
Please notice the definition of the function that transforms a List into a Set. It could be totally independent of the list’s element type, but Scala syntax doesn’t allow us to define anything like that. We could try assigning the method to a val (eta expansion):

def polymorphic[T](l: List[T]): Set[T] = l.toSet

val sadlyMonomorphic = polymorphic _

But the compiler (as clever as usual) will infer that the list’s contained type is Nothing: a special type, but as concrete as any other.
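A workaround sketch: pin the type parameter at the point of eta-expansion, which buys back a usable (though still monomorphic) function value:

```scala
def polymorphic[T](l: List[T]): Set[T] = l.toSet

// Without the explicit [Int], the compiler would infer T = Nothing
val intVersion: List[Int] => Set[Int] = polymorphic[Int] _

println(intVersion(List(1, 1, 2))) // Set(1, 2)
```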


Shapeless parametric polymorphism

How does Shapeless solve this problem? If we had to define a transformation function from Option to List in Scala, we’d have the previously mentioned limitation on function values and could only achieve it by defining a method:

def toList[T](o: Option[T]): List[T] = o.toList

However, Shapeless, using its alchemy, provides us with some ways to do so. In category theory, transformations between type constructors are called natural transformations. The first notation is the following:

import shapeless.poly._

val polyFunction = new (Option ~> List){
  def apply[T](f: Option[T]): List[T] = f.toList
}

If you have a closer look at it, the parametric polymorphism is moved to the method inside the object. Using this function is as simple as:

val result: List[Int] = polyFunction(Option(2))

The other possible notation consists of defining the function’s behavior case by case; in other words, if we want the function to be defined only for Int, String and Boolean, we’ll add a case for each of them.

import shapeless.Poly1

object polymorphic extends Poly1 {

  implicit val optionIntCase =
    at[Option[Int]]( + 1))

  implicit val optionStringCase =
    at[Option[String]]( + " mutated"))

  implicit val optionBooleanCase =
    at[Option[Boolean]]( => !b)) //e.g., negate the contained value
}

As you can see, if we want our function to be defined for the case where the input parameter is an Option[Int], we define that 1 is added to the contained element.

This expression returns a this.Case[Option[Int]], where this refers to polymorphic object that we are defining:

implicit val optionIntCase =
  at[Option[Int]]( + 1))

The good part of this? In case we wanted to use the function on an input type that doesn’t have a defined case in the function, we’d get a compile-time error (awesome, right?).

The result

Applying this last (case-based) approach, we get the expected result that we mentioned in the introductory section: being able to use a for-comprehension to compose differently-typed values: iterables, Try, Future…

The proposed solution can be found in the following file.

In our function, we have a case for GenTraversable, and others for Try and Future (this last one needs its expression evaluated to be able to iterate over it, so we need a timeout to wait for):

object values extends Poly1 {

  implicit def genTraversableCase[T,C[_]](implicit ev: C[T] => GenTraversable[T]) =
    at[C[T]](c => ev(c).toStream)

  implicit def tryCase[T,C[_]](implicit ev: C[T] => Try[T]) =
    at[C[T]](c => ev(c).toOption.toStream)

  implicit def futureCase[T,C[_]](implicit ev: C[T] => Future[T], atMost: Duration = Duration.Inf) =
    at[C[T]](f => Try(Await.result(f,atMost)).toOption.toStream)
}


Now we can use it in our controversial code snippet:


case class User(name: String, age: Int)

val result: Stream[_] = for {
  v1 <- values(List(1,2,3))
  v2 <- values(Set("hi","bye"))
  v3 <- values(Option(true))
  v4 <- values(Try(2.0))
  v5 <- values(Future(User("john",15)))
} yield (v1,v2,v3,v4,v5)

The sole solution?

Not at all! You can always implement it using traditional Scala type classes, though that implies defining a trait that represents the iterable ADT. You can take a look at the example in the following gist.

Peace out!

Scala: One language to rule them all (II)

You wouldn’t put a layman at the controls of a brand new Airbus 380. Such a powerful tool requires its users to be trained; something similar happens with Scala when it is used to cast new rings of power, I mean, DSLs. Some theory and lots of learning by doing. We’ve already seen a bit of theory, and now it’s time to learn by building a DSL from scratch.

Our DSL’s target

Our brand new DSL is intended to serve as a didactic exercise. However, despite its purpose, it needs to have a target. That is, one of:

  • A process or system to govern.
  • Another language to serve as proxy of.

In our example we’ll tackle the second case.

Introducing AWK

Are you kidding? Does it need an introduction?
Let your Unix terminal introduce it for me:

man awk | head -n 6

GAWK(1) Utility Commands GAWK(1)

gawk – pattern scanning and processing language

OK, Wikipedia seems a little more verbose:

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

The AWK language is a data-driven scripting language consisting of a set of actions to be taken against streams of textual data – either run directly on files or used as part of a pipeline

Consider these two heroes:


Gnu has a belt you can’t see in this drawing and there he hides a powerful weapon: AWK.
It is so powerful because it allows many scripts, running upon (and used to build) GNU/Linux/BSD distributions, to transform, filter and aggregate data from other commands’ outputs. Let’s see an example:
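A representative pipeline of the kind described next (a reconstruction for illustration, not necessarily the post’s original snippet):

```shell
# Sum the second column (module size in bytes) of lsmod's output,
# skipping the header line, and print the total in kilobytes
lsmod | awk 'NR > 1 { total += $2 } END { print total / 1024 }'
```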


Here, the output generated by lsmod is piped to AWK, whereby each line is processed by extracting its second column value, which is accumulated in a total byte counter. This way, when the output has been exhausted, total is printed as the total amount of kilobytes of memory used by Linux kernel modules.

A hard nut to crack?

The applications of AWK are innumerable as well as the amount of time you can save once you have a grasp of it. However, for many people, it is more like…


… than a helping tool. Its 1475 lines of man pages aren’t precisely light reading.

It seems, therefore, that guiding a user through the composition of AWK programs could be of great help. Does this phrase ring a bell?

DSLs are small and concise which implies that they guide their users in the process of describing actions, entities or relations within their domain of application.

Yes! A Scala internal DSL could be used to build such a tool!

Hands on the DSL Scalawk construction

The most important thing in the programming language is the name…
Donald Ervin Knuth

First things first: if we take the argument from authority as valid, we should start by giving a name to our DSL.

To be brief: We’ll compose AWK programs by the use of a Scala internal DSL,

Scala + AWK = Scalawk

So far so good!

You can already clone Scalawk source code GitHub

git clone

Divide & Conquer

In the last episode, we agreed that the safest way to design a DSL is by using the state machine model. These machines are easily decomposed into:

  • States (one of them is the machine initial state).
  • Transitions:
    • Inputs triggering the transition.
    • Transition effects:
      • The machine changes its current state.
      • Some output could be generated besides the state transition.
  • Machine alphabet: Machine input entities.

By drawing the state machine diagram, which is no different from designing an interactive guide, we’ll complete the whole creative process of the DSL’s creation. The remaining work is nothing but beautiful Scala boilerplate.


All possible user interactions with Scalawk are represented in the previous graph, e.g.:

lines splitBy ";" arePresentedAs ('c1, 'c2)


This machine modelling and decomposition leads to the following Scala packages structure:


The nuts & bolts or: How I Learnt to Stop Worrying and Love the Building Blocks that Scala Provides

States, transitions and auxiliary elements are entities contained by the packages listed above. In fact, they are nothing but Scala objects, classes, methods and traits.

Initial State


As we already know, states are objects: either instances of classes or singleton objects. On the other hand, we’ve also seen that the right way to implement state machines is to make these states immutable, with transitions responsible for generating new states.

The initial state isn’t generated by any transition; it exists as-is from the beginning of the program. That is a good indicator of its singleton nature, which is definitely confirmed by the fact that no other instance of this initial state can exist:

object lines extends ToCommandWithSeparator

From the DSL user standpoint, this initial state should be just a word indicating the start of a phrase in the language. That’s another reason supporting the singleton object approach.

The initial state needs to transition to the next one; that’s why lines extends ToCommandWithSeparator. Don’t rush, but keep in mind that ToCommandWithSeparator is a transition-set trait.

Transitory and final states

Yeah, states are objects… is that all? No! There are different kinds of states; besides, many states are quite alike and could be built from templates. Let’s review some tips and tricks.

The top-level classification of non-initial states should be this one: transitory and final. Conceptually, the former can’t be used to generate a result whereas the latter can. In the concrete case of Scalawk, that implies that transitory states can’t generate valid AWK programs but final states can.

Non-final state
Final state

In Scalawk, any entity able to generate valid AWK code should mix in AwkElement:

trait AwkElement {
  def toAwk: String
}
By doing so, we are adding the toAwk method to that entity: the entry point for client code to request AWK code.

Despite their differences, almost all states share a common set of attributes from which AWK commands can be composed:

  • Command-line options, e.g. the token separator.
  • Initial program: instructions to be run by AWK before line processing starts, e.g. initializing a counter.
  • Line program: instructions to be executed for each line of the AWK input, e.g. printing a line, or adding a value to an accumulator initialized in the initial program.
  • End program: instructions to be executed after all lines have been processed, that is, after the line program has been run with each input line, e.g. printing counter values.
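For reference, here is a plain AWK one-liner exercising the three program sections listed above (a generic example, not one generated by Scalawk):

```shell
# BEGIN: initialize a counter; per-line block: accumulate field 1; END: print the total
printf '1\n2\n3\n' | awk 'BEGIN{c = 0}{c = c + $1}END{print c}'
# prints 6
```

The command-line options (such as `-F` for the field separator) would precede the quoted program.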

At each state, these fields might be empty or not and when a final state is asked to produce an AWK program, they will be used to generate the result string value.

abstract class AwkCommand protected(
  private[scalawk] val commandOptions: Seq[String] = Seq.empty,
  private[scalawk] val linePresentation: Seq[AwkExpression] = Seq.empty,
  private[scalawk] val lineProgram: Seq[SideEffectStatement] = Seq.empty,
  private[scalawk] val initialProgram: Seq[SideEffectStatement] = Seq.empty,
  private[scalawk] val endProgram: Seq[SideEffectStatement] = Seq.empty
) {
  // Build a state from the previous one, propagating all its attributes
  def this(prev: AwkCommand) = this(
    prev.commandOptions,
    prev.linePresentation,
    prev.lineProgram,
    prev.initialProgram,
    prev.endProgram
  )
}

So far, we know that non-initial states:

  • For sure, contain the building blocks of a result and propagate the previous state’s contents for these fields: then they should extend the AwkCommand abstract class.
  • Most probably, add or change some piece of information in the previous state’s AwkCommand attributes: then they should override those attributes.
  • Optionally, can transition to another state: if that is the case, they should have a method whose return type is the target state, or mix in a transition-family trait.

You might be thinking: why is AwkCommand an abstract class and not a trait?
Well, AwkCommand‘s goal is to provide reusable code for continuity. That is, it provides the constructor to build a state from another state (the prev parameter). This way, each state’s code is reduced to just its transitions and the AwkCommand attribute overrides, and only for those attributes whose information changes in the new state.

Obviously, the only way to provide a constructor in a class hierarchy is by providing a class; if this class shouldn’t be instantiated directly, make it abstract.
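A minimal illustration of this point, with made-up names (Scala 2, where traits cannot declare constructor parameters):

```scala
// A trait cannot take constructor parameters in Scala 2:
// trait Base(val x: Int)  // does not compile

// An abstract class can, including an auxiliary "copy" constructor
// in the style of AwkCommand's this(prev: AwkCommand):
abstract class Base(val x: Int) {
  def this(prev: Base) = this(prev.x)
}

// Subclasses get state propagation for free through that constructor:
class Child(prev: Base) extends Base(prev)
```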


class CommandWithLineProgram(
                              statements: Seq[SideEffectStatement]
                            )(prev: AwkCommand) extends AwkCommand(prev)
  with ToSolidCommand {

  override private[scalawk] val lineProgram: Seq[SideEffectStatement] = statements
}


CommandWithLineProgram is a non-final state, hence it doesn’t mix in the AwkElement trait.

// This is the first state which can be used to obtain an AWK command string via `toAwk`
class SolidCommand(val lineResult: Seq[AwkExpression], prevSt: AwkCommand) extends AwkCommand(prevSt)
  with AwkElement
  with ToCommandWithLastAction {

SolidCommand, on the contrary, does mix it in, and therefore needs to provide an implementation of the toAwk method:

// AWK program sections

// BEGIN
protected def beginBlock: String = programToBlock(initialProgram)

// Per-line
protected def eachLineActionBlock: String =
  programToBlock(lineProgram ++ Option(linePresentation).filter(_.nonEmpty).map(lp => Print(lp)))

// END
protected def endBlock: String = programToBlock(endProgram)

protected def programToBlock(program: Seq[SideEffectStatement]): String =
  Option(program mkString "; ").filter(_.nonEmpty).map(_ + "; ").getOrElse("")

protected def optionsBlock: String =
  Option(commandOptions mkString " ").filter(_.nonEmpty).map(_ + " ").getOrElse("")

override def toAwk: String =
  s"""|awk ${optionsBlock}'
      |${identifyBlock("BEGIN", beginBlock)}
      |${identifyBlock("", eachLineActionBlock)}
      |${identifyBlock("END", endBlock)}'""".stripMargin.replace("\n", "")

// Auxiliary methods
private[this] def identifyBlock(blockName: String, blockAwkCode: String): String =
  Option(blockAwkCode).filter(_.nonEmpty).map(code => s"$blockName{$code}").getOrElse("")

This class hierarchy enables code reuse. For example, SolidCommandWithLastAction is almost an exact copy of SolidCommand, and nothing prevents us from extending the latter in order to define the former:

class SolidCommandWithLastAction(lastAction: Seq[SideEffectStatement])(prevSt: SolidCommand)
  extends SolidCommand(prevSt) { ... }

At this point, you should be able to start exploring the repository, as well as to associate each state from the graph diagram with a state class in the code. Just in case, the following table collects these associations:

Graph Node             Is final state?   Entity Kind
lines (initial state)  No                Singleton Object
with line program      No                Class
with initial program   No                Class
solid command          Yes               Class
with last action       Yes               Class

Transitions between states are the easy part: they are as simple as methods returning new states. Thanks to Scala’s infix notation, they create the illusion of natural-language expressions, at least to some degree…

Some states might share transitions, so it seems a good idea to create traits containing them. By the use of mixins, states can then combine them like LEGO pieces in order to build their own transition sets.

There are two special cases which deserve particular attention: groups of transitions and empty-input transitions.

Group of transitions…

… are composed of transitions which are always present together, or which are different versions of the same transition. These are normally defined in the same trait, named following the pattern To<STATE_CLASS_NAME>.

trait ToCommandWithSeparator {

  def splitBy(separator: String) = new CommandWithSeparator(separator)
  def splitBy(separator: Regex) =  new CommandWithSeparator(separator)

}

The example above is a rather clear case of two versions of the same transition: One receiving a string input and the other receiving a regular expression.

Note that, in relation to the abstract state machine, the machine input is both the method name and its parameters.

Empty input transitions

Consider the following transition, extracted from our state machine:


State machines can move from one state to another when the input is an empty string. It might seem bizarre, but it can be done with our object modelling of state machines thanks to implicit conversions.

An implicit conversion from one state (Source) to another (Target), triggered by merely accessing one of Target’s methods on a Source instance, will be perceived by the DSL user as an empty transition. As simple as that.

What is more, by defining the implicit conversion in the companion object of either the Source or the Target class/trait, it will be available in the scope of the sentence where the empty transition occurs. No need for imports, which means ABSOLUTE TRANSPARENCY for the user.

Thus, the following code:

object ToCommandWithSeparator {
  implicit def toCommandWithSep(x: ToCommandWithSeparator) = new CommandWithSeparator()
}

… enables the transition described in the diagram below:



– If ToCommandWithSeparator is a transition-family trait, isn’t its equally named companion object the companion object of that transition family? Didn’t we establish that the implicit conversion should be defined within the Source’s or Target’s companion object and, therefore, within a state class?

– Exactly! And what is ToCommandWithSeparator‘s fate, if not to be mixed into a state class definition?

In Scala, implicit conversions defined in a trait’s companion object also apply to classes extending or mixing in that trait, and they’ll be available in any scope where the class is available. This feature, besides being extremely useful, seems a very rational one: a class mixing in or extending the trait can be regarded as a kind-of that trait, a subtype, so its instances also have the type given by the trait, and it should be expected that any conversion applying to that type also applies to those instances.

Take, for example, the traits T and S, both having companion objects where implicit conversions to D and E are respectively defined:

case class E(x: Int)
case class D(x: Int)

trait T
object T { implicit def toD(from: T): D = D(1) }

trait S
object S { implicit def toE(from: S): E = E(2) }

Mix both of them into the definition of the class C…

case class C() extends T with S

… and check how a C instance is implicitly converted into instances of E or D.

scala> val c: C = C()
c: C = C()

scala> val d: D = c
d: D = D(1)
scala> val e: E = c
e: E = E(2)


Most Scalawk transition inputs fit the pattern transition name + basic-type value. However, some of them receive expressions, identifiers or sentences. These are particular types created to express instructions and values within the AWK program; hence they shouldn’t be part of a generic guide on how to create a DSL. Yet the constructs behind them are not uncommon in Scala internal DSLs, so we’ll take a brief look at some of them.

Identifiers in Scalawk (Internal Identifiers)

Some DSL expressions, such as arePresentedAs, need to reference AWK variables declared by some other DSL expression. You could use strings to represent these internal identifiers, but having to surround our DSL identifiers with double quotes throws a bucket of cold water on the user experience, making users conscious of the fact that they are actually using Scala instead of a crafted-from-scratch domain language.

Scala offers a mechanism to get unique objects for equal strings. That is exactly what a good DSL identifier needs.

If anyone writes:

'counter

… he/she will obtain a reference to a Symbol instance. The Symbol class has a name attribute whereby you can recover the character string used to obtain the instance.

The user just writes ‘counter, and the DSL developer can obtain the string counter and use it as the internal representation of, in this case, the AWK variable.
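In the REPL, for instance (standard Scala 2 Symbol behavior):

```scala
val id = 'counter                     // shorthand for Symbol("counter")
assert( == "counter")         // the original string is recoverable
assert('counter eq Symbol("counter")) // equal strings yield the very same instance
```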


By combining internal identifiers with ad-hoc classes and implicit conversions, it isn’t difficult to express assignment sentences, even algebraic operations:

's := 's + 1

This article is already far too long; with what has been hinted at, plus the tricks of previous sections, it is actually easy to understand the code that builds this kind of expression. That code is located under the entities package. Take that package as if it were an embedded DSL within Scalawk: yes, a DSL embedded in a DSL which is, in turn, embedded in Scala.
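Still, here is a minimal sketch of the kind of machinery behind such an expression, using simplified, hypothetical AST classes (the real ones in Scalawk’s entities package differ):

```scala
import scala.language.implicitConversions

// Simplified, hypothetical AST classes (illustrative only):
sealed trait Expr
case class Id(name: String) extends Expr
case class Lit(v: Int) extends Expr
case class Plus(l: Expr, r: Expr) extends Expr
case class Assign(target: Id, value: Expr)

// Lift Int literals into the AST so they can appear in expressions:
implicit def intToLit(i: Int): Lit = Lit(i)

// Enrich Symbol with the DSL operators, so 's becomes an identifier node:
implicit class SymbolSyntax(s: Symbol) {
  def +(r: Expr): Expr = Plus(Id(s.name), r)
  def :=(value: Expr): Assign = Assign(Id(s.name), value)
}

// ':=' gets assignment-operator (lowest) precedence, so this parses as 's := ('s + 1)
val stmt = 's := 's + 1
// stmt == Assign(Id("s"), Plus(Id("s"), Lit(1)))
```

Note how precedence does the heavy lifting: `+` binds tighter than `:=`, so no parentheses are needed in the DSL sentence.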

Some final thoughts

Developing internal DSLs isn’t precisely a piece of cake. If forced to fall back to host-language constructs, users will easily wake up from the dream of using a language built for them from the ground up. Nobody likes to be reminded that he or she is not that special.


You’ll encounter many pitfalls when trying to faithfully reproduce the state machine, and the temptation to abandon the model is huge. Trust me, this is complex and, if you leave the path of the state machine, the ogres in the forest will have your heart for breakfast sooner than you imagine. Scala has proven it can offer a solution to any bump in the state-machine road; outside that road, you are free to face a minefield.

As a last piece of advice, I’d recommend you buy a whiteboard or blackboard, a paper notebook and a good pen… whatever you feel comfortable drawing on.

These are two examples of the early stages of design of Scalawk:



Think! Draw! Think! And draw again! So you become a better DSL architect than this guy…

… so your Neo(s) wouldn’t wake so easily.