Scala, Scala and nothing more

After so many posts on Scala, many code snippets, and many weird concepts, here comes a post that has been pending almost since the beginning of this blog: what resources exist to learn Scala? Well… now we show our favorites.


Books: a good starting point 🙂

  • Programming in Scala: a classic. Written by Odersky himself, with the third edition hot off the press.
  • Functional Programming in Scala (also known, in an effort of originality and simplicity, as "the red book"). A book that opens the mind to the functional paradigm; advisable not only for people who want to program in Scala, but for anyone interested in functional programming.

Courses: to deepen a bit more.

  • As for online courses, Coursera and the new Functional Programming in Scala specialization hold a near monopoly. It consists of 4 courses + 1 capstone project and discusses in depth the most important aspects of Scala.
  • Furthermore, scala-exercises allows us to practice in a faster, but no less effective, way. Fully recommended.

Events: not only to drink beer  😉

  • Scala Days: undoubtedly the most important annual Scala convention. Since last year, there are two editions per year: one in America and one in Europe. In addition, all the videos of the talks are published 🙂
  • Scala World: UK convention with a warm welcome and where you can hear and see some of the biggest Scala gurus.
  • Scala eXchange: another of the best-known conventions. It takes place in December in London.

These are the resources that we know and find interesting. Without doubt, there are many more that we left out. If you miss some resource on this list, any contribution is welcome 🙂

Scala: Code interpretation at runtime

With today’s post, we’ll dive into Scala code generation on the fly: at runtime. We have to be careful not to confuse this with Scala macros, which generate code at compile time: macros make use of Scala’s type system and are therefore much safer than generating code at runtime.


When is it useful to make use of this mechanism, then? We’ll try to shed some light on this by following a very simple example, abstracting away from the real implementation (which you may find at Scalera’s GitHub).

The problem: “da” serializer

Let’s suppose a not-so-wild case, which consists of communicating two services via an event bus (we’ll abstract away from its implementation: it could be a message queue like Kafka, Akka Streams, …).

The main idea is the following:


The producer knows how to send and the consumer has an associated callback for message arrivals:

trait Producer {
  def produce(message: Any): Try[Unit]
}

trait Consumer {
  val consume: Any => Unit
}

Both producer and consumer services know the message types that may arrive. In our example, they could be one of these:

case class Foo(att1: Int, att2: String)
case class Bar(att1: String)

If the producer wants to send a message using the event bus, it will have to serialize it somehow (JSON, byte array, XML, …) so that, when it reaches the opposite end, the consumer can perform the inverse process (deserialization) and get the original message back.

…nothing weird so far.


If we have a JSON serializer …

trait JsonSer[T] {
  def serialize(t: T): String
  def deserialize(json: String): T
}

and we serialize our message …

implicit val fooSerializer: JsonSer[Foo] = ???
val foo: Foo = ???

fooSerializer.serialize(foo)
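For illustration, a hand-rolled JsonSer[Foo] might look like the following sketch (purely illustrative: a real instance would delegate to a JSON library, and the regex-based parsing here is only good for this exact shape):

```scala
trait JsonSer[T] {
  def serialize(t: T): String
  def deserialize(json: String): T
}

case class Foo(att1: Int, att2: String)

// Illustrative only: a real instance would use a JSON library
implicit val fooSerializer: JsonSer[Foo] = new JsonSer[Foo] {
  def serialize(t: Foo): String =
    s"""{"att1":${t.att1},"att2":"${t.att2}"}"""

  def deserialize(json: String): Foo = {
    val pattern = """\{"att1":(\d+),"att2":"(.*)"\}""".r
    json match {
      case pattern(a1, a2) => Foo(a1.toInt, a2)
    }
  }
}

// Round trip: serialize, then deserialize back to the original value
val roundTrip = implicitly[JsonSer[Foo]].deserialize(
  implicitly[JsonSer[Foo]].serialize(Foo(42, "hi")))
```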

How do we know which deserializer to use when the consumer gets the message?

Option 1: Try every possible serializer until one of them works

In our consumer, we’d have:

lazy val consumer = new Consumer {
  override val consume: Any => Unit = {
    case message: String =>
      Seq(barSerializer, fooSerializer).flatMap { ser =>
        Try(ser.deserialize(message)).toOption
      }.headOption.fold(ifEmpty = println("Couldn't deserialize")) {
        case bar: Bar => println("it's a bar!")
        case foo: Foo => println("it's a foo!")
        case _ => println("it's ... something!")
      }
  }
}

A lil’ bit coarse, right? If the proper serializer were the last in a list of 100, we would have tried and failed with 99 serializers first (such a waste of CPU!).


Besides, a deserializer for a different type might still happen to fit the received message, so it would partially or wrongly deserialize it.
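To make that danger concrete, here is a sketch with a hypothetical lenient Bar deserializer (the name and the regex are made up for this example) that happily "parses" a Foo payload:

```scala
case class Foo(att1: Int, att2: String)
case class Bar(att1: String)

// Hypothetical lenient parsing: grabs att1 no matter what else is in the JSON
def lenientBarDeserialize(json: String): Option[Bar] = {
  val field = """"att1"\s*:\s*"?([^,"}]*)"?""".r
  field.findFirstMatchIn(json).map(m => Bar(

val fooJson = """{"att1":1,"att2":"hi"}"""

// Succeeds, although the payload was never meant to be a Bar
val wrong = lenientBarDeserialize(fooJson)
```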

Option 2: Add the message type to the message itself

We could add an extra layer wrapping the message, indicating the type of the message it contains. This way, when receiving it, we could determine which deserializer we have to use.


For writing the type, we’ll make use of Scala’s TypeTags, getting info about the contained message type T.

//Wrapper for the message (and its serializer)

import scala.reflect.runtime.universe.{TypeTag, typeTag}

case class Message[T: TypeTag](content: T) {
  val messageType: Message.Type = typeTag[T].tpe.toString
}

object Message {

  type Type = String

  def typeFrom(msg: String): Message.Type = ???

  implicit def messageSer[T: TypeTag : JsonSer]: JsonSer[Message[T]] = ???
}
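As a quick, standalone check of what those TypeTag-based names look like (the helper name typeNameOf is ours; for classes living in a package, the string is fully qualified, as in the cases below):

```scala
import scala.reflect.runtime.universe.{TypeTag, typeTag}

// Render the type name the same way Message does: typeTag[T].tpe.toString
def typeNameOf[T: TypeTag]: String = typeTag[T].tpe.toString

println(typeNameOf[Int])        // "Int"
println(typeNameOf[List[Int]])  // parameterized types include their arguments
```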


//We'll make use of it for sending

producer.produce(messageSer[Foo].serialize(Message(foo)))

//And we redefine the consumer

lazy val consumer = new Consumer {
  override val consume: Any => Unit = {
    case message: String =>
      Message.typeFrom(message) match {

        case "org.scalera.reflect.runtime.Bar" =>
          println("it's a bar!")
          val value = messageSer[Bar].deserialize(message).content

        case "org.scalera.reflect.runtime.Foo" =>
          val value = messageSer[Foo].deserialize(message).content
          println("it's a foo!")

        case _ =>
          println("it's ... something!")
      }
  }
}

As you can see, we don’t have to try every possible serializer anymore. Instead, from the message type we’ve extracted from the Message wrapper we added, we’re able to pick the proper deserializer.

But it is also true that we have to add an extra case for each string that represents a message type. Wouldn’t it be nice to deserialize somehow and have cases defined only over the already-deserialized objects? (Something similar to what we first tried, but without trying every possible serializer.) Something like:

lazy val consumer = new Consumer {
  override val consume: Any => Unit = { msg =>
    genericDeserialize(msg) match {
      case f: Foo => println("it's a foo!")
      case b: Bar => println("it's a bar!")
      case _      => println("it's ... something!")
    }
  }
}

Option 2-cool: Serializers ‘under the hood’

To achieve something like that, we have to focus on that genericDeserialize method: which signature should it have? Initially, something like this:

def genericDeserialize(msg: String): Any

An Any? Seriously? My fellows, at runtime we have no idea which type we may get. We just know that, from a String, we’ll get ‘some….thing’. The match applied to that Any lets us go from something totally abstract to more concrete types.

This is the point where both the reflect library and the Scala compiler come into play.


The Toolbox API allows parsing strings and getting the resulting AST (abstract syntax tree). From that AST, it is able to evaluate the expression, returning an instance of an Any as well.

For instantiating a Toolbox and using type references, we have to add the following SBT dependencies:

libraryDependencies ++= Seq(
  "org.scala-lang" % "scala-compiler" % "2.11.8",
  "org.scala-lang" % "scala-reflect" % "2.11.8")

For example, if we wanted to parse the "2".toInt + 4 operation,

import scala.reflect.runtime.{universe => ru}
import scala.tools.reflect.ToolBox
import ru._

//  Scala compiler tool box
val tb = ru.runtimeMirror(this.getClass.getClassLoader).mkToolBox()

println(ru.showRaw(tb.parse(""" "2".toInt + 4 """)))

we would get the abstract syntax tree generated as a String (by using showRaw):

Apply(Select(Select(Literal(Constant("2")), TermName("toInt")), TermName("$plus")), List(Literal(Constant(4))))

If we use the toolbox for evaluating the parsed expression,

println(tb.eval(tb.parse(""" "2".toInt + 4 """)))

we’ll get an Any that represents the resulting value of the sum:

6

“Da” serializer

Once we’ve seen how it generally works, we apply the same principle to our serializer, so the expression we’re going to try to interpret is similar to:

  import scala.reflect._;
  import spray.json._;
  import org.scalera.reflect.runtime._;
  import MySprayJsonImplicits._;
  import MyJsonSerImplicits._;
  implicitly[JsonSer[Message[$messageType]]]

where MySprayJsonImplicits and MyJsonSerImplicits represent the objects that contain both the Spray implicits for JsonFormat and the JsonSer implicits that we have defined before.

$messageType represents the concrete type to deserialize that we would have got by using the TypeTag (as seen before).

If we adapt it to our code, we’ll get something similar to:

object GenSer {

  import scala.reflect.runtime.{universe => ru}
  import scala.tools.reflect.ToolBox
  import ru._

  //  Scala compiler tool box
  private val tb = ru.runtimeMirror(this.getClass.getClassLoader).mkToolBox()

  def genericDeserialize(msg: String)(serContainers: Seq[AnyRef]): Any = {

    val messageType = Message.typeFrom(msg)

    val serContainersImport = =>
      "import " + container.toString.split("\\$").head + "._").mkString(";\n")

    val expr =
      s"""|  import scala.reflect._;
          |  import spray.json._;
          |  import org.scalera.reflect.runtime._;
          |  $serContainersImport;
          |  implicitly[JsonSer[Message[$messageType]]]""".stripMargin

    tb.eval(tb.parse(expr))
      .asInstanceOf[JsonSer[Message[Any]]]
      .deserialize(msg)
      .content
  }
}


If you take a look, we’ve extended the generic deserialization method to receive a sequence of objects to import, so we don’t have to hard-code which object contains the Spray implicits and which one contains our JsonSer‘s.

val serContainersImport = =>
  "import " + container.toString.split("\\$").head + "._").mkString(";\n")

It is also noteworthy that, when deserializing, we get a Message[Any]; so we’ll have to access the content field, which holds the raw Any-typed value of the deserialized message.

The result

So finally, we can make use of our function to deserialize generically and let our consumer code be ‘swaggy’:

lazy val consumer = new Consumer {
  override val consume: Any => Unit = {
    case msg: String =>
      genericDeserialize(msg)(Seq(case3, Message)) match {
        case bar: Bar => println("it's a bar!")
        case foo: Foo => println("it's a foo!")
        case _ => println("it's ... something!")
      }
  }
}


Being able to evaluate code at runtime lets us do a lot of interesting things when we have to interpret String-based types. However, these kinds of techniques bypass the worthy and safe Scala type system, so they are tools to be used with a lil’ bit of care.

Evaluating these expressions is also rather expensive in terms of time, so I would recommend using a type cache in this case. Something really simple, like a map:

type TypeName = String
var cache: Map[TypeName, JsonSer[Message[_]]] = Map.empty

When invoking the method, we’ll check whether the type evidence (JsonSer) we’re looking for is already stored in the map. If so, we’ll use it; otherwise, we’ll create it, store it in the cache, and use it as the result.
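A sketch of that cache idea (here compileSerializer is a hypothetical stand-in for the expensive toolbox evaluation, and the cached value is a plain String so the snippet stays self-contained):

```scala
import scala.collection.concurrent.TrieMap

type TypeName = String

var compilations = 0

// Hypothetical stand-in for the expensive toolbox-based implicit lookup
def compileSerializer(t: TypeName): String = {
  compilations += 1
  s"serializer-for-$t"
}

val cache = TrieMap.empty[TypeName, String]

// Compile on the first request only; later requests hit the cache
def serializerFor(t: TypeName): String =
  cache.getOrElseUpdate(t, compileSerializer(t))

serializerFor("org.scalera.reflect.runtime.Foo")
serializerFor("org.scalera.reflect.runtime.Foo")
```

With a real JsonSer[Message[_]] as the value type, the structure stays exactly the same.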


Easy peasy…
Peace out!

Graffiti Rules: playing with JSON [Snow]

A few months ago we talked about Spray, a toolkit that allowed us to build REST APIs in an easy way with a pretty complete DSL.

One of the components belonging to this toolkit is spray-json. This module allows us to serialize and deserialize our objects to/from a JSON format. Today we’ll see how to work with it quickly and easily.

How do we create serializers for different classes? Well, there are two options, depending on how the class we want to serialize is defined.

Easy option: when we have a case class

Well, that’s a piece of cake. The only thing we need to use is the jsonFormatN() method where N represents the number of arguments of the case class apply method. Let’s see it with an example:

import spray.json._

case class Character(name: String, family: String, isDead: Boolean)

object MyJsonProtocol extends DefaultJsonProtocol {

  implicit val characterFormat = jsonFormat3(Character.apply)
}

As you can see, in order to create a serializer for the Character case class, we create an implicit value with the help of jsonFormat3 (as it has 3 attributes). Such an implicit value is defined inside an object extending DefaultJsonProtocol, the trait where the serializers for Scala basic types (Int, String, Boolean…) are defined… Easy peasy.


Less easy option: when we don’t have a case class, or we want to serialize a case class in a different way.

In such a case, we’ll need to implement how the serialization and deserialization of the given type should work. To do so, we create an implicit object extending RootJsonFormat[T], where T is the type we want to serialize. This trait contains two methods that need to be implemented: on the one hand, the write method, which converts a value of type T into JSON; on the other, a read method performing the reverse process. We’ll now see an example with the same type as before:

import spray.json._

class Character(val name: String, val family: String, val isDead: Boolean)

object MyJsonProtocol extends DefaultJsonProtocol {

  implicit object CharacterJsonFormat extends RootJsonFormat[Character] {

    def write(c: Character) =
      JsArray(JsString(, JsString(, JsBoolean(c.isDead))

    def read(value: JsValue) = value match {
      case JsArray(Vector(JsString(name), JsString(family), JsBoolean(isDead))) =>
        new Character(name, family, isDead)
      case _ => throw new DeserializationException("Character expected")
    }
  }
}

As can be observed, it is a bit more tedious but not too complicated.

And how do we use it? We just have to import the object where we have defined the implicits and call the methods toJson or convertTo[T].

import MyJsonProtocol._

val json = Character("Jon Snow", "Stark", ???).toJson //....You know nothing!!!!
// Returns {"name": "Jon Snow", "family": "Stark", "isDead": ???}
val jonSnow = json.convertTo[Character]


Besides, if we use Spray’s REST API, the transformation will be done transparently and it won’t be necessary to call toJson or convertTo explicitly. But we’ll leave that for another post. Next week we’ll be talking about this library, so we’d better see something of it beforehand 🙂

See you all!

Shapeless: polymorphic functions

Some time ago, our friend Javier Fuentes illustrated us with an introduction to Shapeless.
Some months after that, at the Scala Madrid meetup, he gave a pretty interesting talk about structural induction with Shapeless and HLists. We couldn’t help it and we caught the enthusiasm flu 😛

What we want to achieve

Let’s set as our study case something that I think more than one of us has wondered about before: how can I compose different types in the same for-comprehension? Something like:

import scala.util.Try

for {
  v1 <- List(1,2,3)
  v2 <- Try("hi, person ")
} yield s"$v2 $v1"

which usually comes from the hand of the following frustrating error:

<console>:15: error: type mismatch;
 found   : scala.util.Try[String]
 required: scala.collection.GenTraversableOnce[?]
         v2 <- Try("hi, person ")

Therefore we need a way to transform these data types (Future, Try) into iterable ‘stuff’ (GenTraversable[T] might work). In our example, we won’t mind losing the error information when a certain Try or Future expression has failed. To better understand the problem, let’s go over some theory.


Monomorphism vs polymorphism

We say a method is monomorphic when it can only be invoked with parameters whose concrete type is explicitly declared in the method signature, whilst polymorphic methods can take parameters of any type as long as it fits the signature (in Scala’s case: type parameters). In plain English:

def monomorphic(parameter: Int): String

def polymorphic[T](parameter: T): String


It’s also good to know that a method can be polymorphic via type parameters or just via parameter subtyping. E.g.:

def parametricallyPolymorphic[T](parameter: T): String

def subtypedPolymorphic(parameter: Animal): String

subtypedPolymorphic(new Cat)

If we use type parameters and we have NO information at all about those types, we are facing a case of parametric polymorphism.

If we use type parameters but we need some extra view / context bound for that type (T <: Whatever or T: TypeClass), then we are talking about ad-hoc polymorphism.
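Side by side, the two flavors might look like this (the method names are illustrative):

```scala
// Parametric polymorphism: we know nothing at all about T
def firstOf[T](l: List[T]): Option[T] = l.headOption

// Ad-hoc polymorphism: T must come with an Ordering instance (context bound)
def maxOf[T: Ordering](l: List[T]): Option[T] =
  if (l.isEmpty) None else Some(l.max)

val first = firstOf(List("a", "b"))  // works for any T
val max   = maxOf(List(3, 1, 2))     // works because Ordering[Int] exists
```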

Problem: Function values

There’s no such problem when using parametric polymorphism with methods but, what about function values? In Scala, it cannot be achieved, and that produces some lack of expressiveness:

val monomorphic: Int => String = _.toString

val anotherMonomorphic: List[Int] => Set[Int] = _.toSet

Please notice the definition of the function that transforms a List into a Set. It could be totally independent of the list’s element type, but Scala syntax doesn’t allow us to define anything like that. We could try assigning the method to a val (eta expansion):

def polymorphic[T](l: List[T]): Set[T] = l.toSet

val sadlyMonomorphic = polymorphic _

But the compiler (as clever as usual) will infer that the type contained in the list is Nothing: a special type, but as concrete as any other.
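We can see both behaviors in a small sketch:

```scala
def polymorphic[T](l: List[T]): Set[T] = l.toSet

// The method itself works for any element type:
val ints: Set[Int] = polymorphic(List(1, 2, 2))

// But the eta-expanded value fixes T, and without further context
// the compiler picks Nothing (List[Nothing] => Set[Nothing]):
val sadlyMonomorphic = polymorphic _
// sadlyMonomorphic(List(1, 2))  // would not compile: Int is not Nothing
```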


Shapeless parametric polymorphism

How does Shapeless solve this problem? If we had to define a transformation function from Option to List in Scala, we’d have the previously mentioned limitation for using function values and we could only achieve it by defining a method:

def toList[T](o: Option[T]): List[T] = o.toList

However, Shapeless, using its alchemy, provides us with some ways to do so. In category theory, transformations between type constructors are known as natural transformations. The first of them has the following notation:

import shapeless.poly._

val polyFunction = new (Option ~> List) {
  def apply[T](f: Option[T]): List[T] = f.toList
}

If you have a closer look at it, the parametric polymorphism is moved to the apply method inside the object. Using this function is as simple as:

val result: List[Int] = polyFunction(Option(2))

The other possible notation consists of defining the function’s behavior based on cases; in other words, if we want the function to be defined only for Int, String and Boolean, we’ll add a case for each of them.

import shapeless.Poly1

object polymorphic extends Poly1 {

  implicit val optionIntCase =
    at[Option[Int]]( + 1))

  implicit val optionStringCase =
    at[Option[String]]( + " mutated"))

  implicit val optionBooleanCase =
    at[Option[Boolean]](!_))
}

As you can see, if we want our function to be defined for the case where the input parameter is an Option[Int], we define that the contained element (if any) gets 1 added to it.

This expression returns a this.Case[Option[Int]], where this refers to the polymorphic object that we are defining:

implicit val optionIntCase =
  at[Option[Int]]( + 1))

The good part of this? In case we wanted to use the function on an input type that doesn’t have a case defined at the function, we’d get a compile-time error (awesome, right?).

The result

Applying this last way (based on cases), we get the expected result that we mentioned in the introductory section: being able to use a for-comprehension for composing differently-typed values: iterables, Try, Future…

The proposed solution can be found in the following file.

In our function we have a case for GenTraversable and others for Try and Future (this last one needs its expression to be evaluated before we can iterate over it, so we need a timeout to wait for):

import scala.collection.GenTraversable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.util.Try
import shapeless.Poly1

object values extends Poly1 {

  implicit def genTraversableCase[T, C[_]](implicit ev: C[T] => GenTraversable[T]) =
    at[C[T]](_.toStream)

  implicit def tryCase[T, C[_]](implicit ev: C[T] => Try[T]) =
    at[C[T]](_.toOption.toStream)

  implicit def futureCase[T, C[_]](implicit ev: C[T] => Future[T], atMost: Duration = Duration.Inf) =
    at[C[T]](f => Try(Await.result(f, atMost)).toOption.toStream)
}


Now we can use it in our controversial code snippet:


case class User(name: String, age: Int)

val result: Stream[_] = for {
  v1 <- values(List(1,2,3))
  v2 <- values(Set("hi","bye"))
  v3 <- values(Option(true))
  v4 <- values(Try(2.0))
  v5 <- values(Future(User("john",15)))
} yield (v1,v2,v3,v4,v5)

The sole solution?

Not at all! You can always implement it using traditional Scala type classes, though that implies defining a trait that represents the ‘iterable’ ADT. You can take a look at the example in the following gist.

Peace out!

Scala: One language to rule them all (II)

You wouldn’t put a layman at the controls of a brand new Airbus 380. Such a powerful tool requires its users to be trained; something similar happens with Scala when it is used to cast new rings of power, I mean, DSLs. Some theory and lots of learning by doing. We’ve already seen a bit of theory, and now it’s time to learn by building a DSL from scratch.

Our DSL’s target

Our brand new DSL is intended to serve as a didactic exercise. However, despite its purpose, it needs to have a target. That is, either:

  • A process or system to govern.
  • Another language to serve as a proxy for.

In our example we’ll tackle the second case.

Introducing AWK

Are you kidding? Does it need an introduction?
Let your unix terminal introduce it for me:

man awk | head -n 6

GAWK(1) Utility Commands GAWK(1)

gawk – pattern scanning and processing language

OK, Wikipedia seems a little more verbose:

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

The AWK language is a data-driven scripting language consisting of a set of actions to be taken against streams of textual data – either run directly on files or used as part of a pipeline

Consider these two heroes:


Gnu has a belt you can’t see in this drawing, and there he hides a powerful weapon: AWK.
It is so powerful because it allows many scripts, running upon (and used to build) GNU/Linux/BSD distributions, to transform, filter and aggregate data from other commands’ outputs. Let’s see an example:


Here, the output generated by lsmod is piped to AWK, whereby each line is processed by extracting its second column value, which is accumulated in a total byte counter. This way, when the output has been exhausted, total is printed as the total amount of kilobytes of memory used by Linux kernel modules.
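The screenshot of that pipeline is not reproduced here, but following the description it might look roughly like the following sketch (fed with a fake lsmod-style sample so the snippet is self-contained; the second column holds the module size in bytes):

```shell
# Fake two-module output in lsmod's format: name, size (bytes), used-by count
printf 'snd_hda_intel 40960 6\nvideo 45056 2\n' |
  awk '{ total += $2 } END { print total }'
# prints 86016
```

Piping the real lsmod through the same awk program accumulates the actual module sizes.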

A hard nut to crack?

The applications of AWK are innumerable as well as the amount of time you can save once you have a grasp of it. However, for many people, it is more like…


… than a helping tool. Its 1475 lines of man pages aren’t precisely light reading.

It seems, therefore, that guiding a user through the composition of AWK programs could be of great help. Does this phrase ring a bell?

DSLs are small and concise which implies that they guide their users in the process of describing actions, entities or relations within their domain of application.

Yes! A Scala internal DSL could be used to build such a tool!

Hands on: the Scalawk DSL construction

The most important thing in the programming language is the name…
Donald Ervin Knuth

First things first and, if we take the argument from authority as a valid one, we should start by giving a name to our DSL.

To be brief: We’ll compose AWK programs by the use of a Scala internal DSL,

Scala + AWK = Scalawk

So far so good!

You can already clone the Scalawk source code from GitHub:

git clone

Divide & Conquer

In the last episode we agreed that the safest way to design a DSL is by using the state machine model. These machines are easily decomposed into:

  • States (one of them is the machine initial state).
  • Transitions:
    • Inputs triggering the transition.
    • Transition effects:
      • The machine changes its current state.
      • Some output could be generated besides the state transition.
  • Machine alphabet: Machine input entities.

By drawing the state machine diagram, which is no different from designing an interactive guide, we’ll complete the whole creative process in the creation of the DSL. The remaining work is little more than beautiful Scala boilerplate.
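The whole pattern fits in a few lines (a generic sketch, not Scalawk code): each state is an immutable object, each transition is a method returning the next state, and only the final state produces output.

```scala
// Non-final state: accumulates data, can't produce a result yet
case class Order(items: List[String]) {
  def add(item: String): Order = Order(item :: items)      // transition to itself
  def checkout: PlacedOrder = PlacedOrder(items.reverse)   // transition to final state
}

// Final state: the only one able to emit a result
case class PlacedOrder(items: List[String]) {
  def toSummary: String = items.mkString(", ")
}

val summary = Order(Nil).add("tea").add("milk").checkout.toSummary
```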


All possible user interactions with Scalawk are represented in the previous graph, e.g.:

lines splitBy ";" arePresentedAs ('c1, 'c2)


This machine modelling and decomposition leads to the following Scala packages structure:


The nuts & bolts or: How I Learnt to Stop Worrying and Love the Building Blocks that Scala Provides

States, transitions and auxiliary elements are entities contained by the packages listed above. In fact, they are nothing but Scala objects, classes, methods and traits.

Initial State


As we already know, states are objects: either instances of classes or singleton objects. On the other hand, we’ve also seen that the right way to implement state machines is to make these states immutable, with transitions being responsible for generating new states.

The initial state isn’t generated by any transition; it exists as-is from the beginning of the program. That is a good indicator of its singleton nature, which is definitely confirmed by the fact that no other instance of this initial state can exist:

object lines extends ToCommandWithSeparator

From the DSL user standpoint, this initial state should be just a word indicating the start of a phrase in the language. That’s another reason supporting the singleton object approach.

The initial state needs to transit to the next one; that is why lines extends ToCommandWithSeparator. Don’t rush, but keep in mind that ToCommandWithSeparator is a transition-set trait.

Transitory and final states

Yeah, states are objects… is that all? No! There are different kinds of states; besides, many states are quite alike and could be built from templates. Let’s review some tricks and tips.

The top-level classification of non-initial states is this: transitory and final. Conceptually, the former can’t be used to generate a result whereas the latter can. In the concrete case of Scalawk, that implies that transitory states can’t generate valid AWK programs but final states can.


In Scalawk, any entity able to generate valid AWK code should mix in AwkElement:

trait AwkElement {
  def toAwk: String
}

By doing so, we add the toAwk method to that entity: the entry point for client code to ask for AWK code.

Despite their differences, almost all states share a common set of attributes from which AWK commands can be composed:

  • Command line options, e.g: Token separator
  • Initial program: Instructions to be run by AWK before starting line processing. e.g: Initialize a counter value.
  • Line program: Instructions to be executed for each line of the AWK input. e.g: Printing a line; adding a value to an accumulator initialized at the Initial Program.
  • End program: Instructions to be executed after all lines have been processed, that is, after Line Program has been run using each single input line as its own input. e.g: Printing counters values.

At each state, these fields might be empty or not and when a final state is asked to produce an AWK program, they will be used to generate the result string value.

abstract class AwkCommand protected(
  private[scalawk] val commandOptions: Seq[String] = Seq.empty,
  private[scalawk] val linePresentation: Seq[AwkExpression] = Seq.empty,
  private[scalawk] val lineProgram: Seq[SideEffectStatement] = Seq.empty,
  private[scalawk] val initialProgram: Seq[SideEffectStatement] = Seq.empty,
  private[scalawk] val endProgram: Seq[SideEffectStatement] = Seq.empty
) {
  def this(prev: AwkCommand) = this(
    prev.commandOptions,
    prev.linePresentation,
    prev.lineProgram,
    prev.initialProgram,
    prev.endProgram
  )
}

So far, we know that non-initial states:

  • For sure, contain the building blocks of a result and propagate the previous state contents for these fields:  Then they should extend AwkCommand abstract class.
  • Most probably, add or change some piece of information from the previous state AwkCommand attributes: Then they should override AwkCommand attributes.
  • Optionally, can transit to another state: If it is the case, they should have a method with the type of the target state as return value or mix  a transition family trait.

You might be thinking: why is AwkCommand an abstract class and not a trait?
Well, AwkCommand‘s goal is to provide re-usable code for continuity. That is, it provides the constructor to build a state from another state (the prev parameter). This way, a state’s code is reduced to just its transitions and the AwkCommand attribute overrides, but only for those attributes whose information changes in the new state.

Obviously, the only way to provide a constructor in a class hierarchy is by providing a class, if this class can’t be instantiated: Make it abstract.


class CommandWithLineProgram(
  statements: Seq[SideEffectStatement]
)(prev: AwkCommand) extends AwkCommand(prev)
  with ToSolidCommand {

  override private[scalawk] val lineProgram: Seq[SideEffectStatement] = statements
}

CommandWithLineProgram is a non-final state, hence it doesn’t mix in the AwkElement trait.

//This is the first state which can be used to obtain an AWK command string via `toAwk`
class SolidCommand(val lineResult: Seq[AwkExpression], prevSt: AwkCommand) extends AwkCommand(prevSt)
  with AwkElement
  with ToCommandWithLastAction {
  // ...
}

On the contrary, SolidCommand does mix it in, and therefore needs to provide an implementation for the toAwk method:

// AWK Program sections

protected def beginBlock: String = programToBlock(initialProgram)

// Per-line
protected def eachLineActionBlock: String =
  programToBlock(lineProgram ++ => Print(linePresentation)))

// END
protected def endBlock: String = programToBlock(endProgram)

// Join statements with "; ", appending a trailing separator only when non-empty
protected def programToBlock(program: Seq[SideEffectStatement]) =
  ( mkString "; ") + => "; ").getOrElse("")

protected def optionsBlock: String =
  (commandOptions mkString " ") + => " ").getOrElse("")

override def toAwk: String =
  s"""|awk ${optionsBlock}'
      |${identifyBlock("BEGIN", beginBlock)}
      |${identifyBlock("", eachLineActionBlock)}
      |${identifyBlock("END", endBlock)}'""".stripMargin.replace("\n", "")

// Auxiliary methods
private[this] def identifyBlock(blockName: String, blockAwkCode: String): String =
  Option(blockAwkCode).filter(_.nonEmpty)
    .map(_ => s"$blockName{$blockAwkCode}")
    .getOrElse("")

This class hierarchy enables code re-utilization; for example, SolidCommandWithLastAction is almost an exact copy of SolidCommand, and nothing prevents us from extending the latter in order to define it:

class SolidCommandWithLastAction(lastAction: Seq[SideEffectStatement])(prevSt: SolidCommand)
  extends SolidCommand(prevSt) { ... }

At this point, you should be able to start exploring the repository as well as to associate each state from the graph diagram with a state class in the code. Just in case, these are the associations:

  • lines: singleton object; not a final state.
  • State with line program: class; not a final state.
  • State with initial program: class; not a final state.
  • Solid command: class; final state.
  • Solid command with last action: class; final state.

Transitions between states are the easy part: they are as simple as methods returning new states. Thanks to Scala’s infix notation, they create the illusion of natural language expressions, at least to some degree…

Some states might share transitions, so it seems a good idea to create traits containing them. By means of mixins, states can thus use them as LEGO pieces in order to build their own transition set.

There are two special cases which deserve special attention: Groups of transitions and Empty input transitions.

Group of transitions…

… are composed of transitions which are always present together, or which are different versions of the same transition. These are normally defined in the same trait, named following the pattern To<STATE_CLASS_NAME>.

trait ToCommandWithSeparator {

  def splitBy(separator: String) = new CommandWithSeparator(separator)
  def splitBy(separator: Regex) =  new CommandWithSeparator(separator)
}


The example above is a rather clear case of two versions of the same transition: One receiving a string input and the other receiving a regular expression.

Note that, in relation with the abstract state machine, the machine input is both the method name and its parameters.

Empty input transitions

Consider the following transition, extracted from our state machine:


State machines can move from one state to another when the input is an empty string. It might seem bizarre but it can be done with our object modelling of state machines thanks to implicit conversions.

An implicit conversion from one state (Source) to another (Target), triggered by just trying to access one of Target's methods on a Source instance, will be perceived by the DSL user as an empty transition. As simple as it sounds.

What is more, just by defining the implicit conversion in the companion object of either the Source or Target class/trait, it will be available in the scope of the sentence where the empty transition occurs. No need for imports, which means: ABSOLUTE TRANSPARENCY for the user.

Thus, the following code:

object ToCommandWithSeparator {
  implicit def toCommandWithSep(x: ToCommandWithSeparator) = new CommandWithSeparator()
}

… enables the transition described in the diagram below:



– If ToCommandWithSeparator is a transition-family trait, isn’t its equally named companion object the companion object of that transition family? Didn’t we say that the implicit conversion should be defined within Source or Target‘s companion object and, therefore, within a state class?

– Exactly! And what is ToCommandWithSeparator‘s fate, if not to be mixed into a state class definition?

In Scala, implicit conversions defined in a trait's companion object also apply to the classes extending or mixing in that trait, and they'll be available in any scope where that class is available. This feature, besides being extremely useful, seems quite rational: a class mixing in or extending the trait can be regarded as a kind-of that trait, a subtype, so any of its instances are also of the type given by the trait, and any conversion applying to that type should be expected to apply to these instances as well.

Take, for example, the traits T and S, both having companion objects where implicit conversions to D and E are respectively defined:

case class E(x: Int)
case class D(x: Int)

trait T
object T { implicit def toD(from: T): D = D(1) }

trait S
object S { implicit def toE(from: S): E = E(2) }

Mix both of them into the definition of the class C …

case class C() extends T with S

… and check how a C instance is implicitly converted into instances of E or D.

scala> val c: C = C()
c: C = C()

scala> val d: D = c
d: D = D(1)
scala> val e: E = c
e: E = E(2)


Most Scalawk transition inputs fit the pattern transition name + basic-type value. However, some of them receive expressions, identifiers or sentences. These are particular types created to express instructions and values within the AWK program, hence they should not be part of a generic guide on how to create a DSL. Yet the constructs behind them are not uncommon in many Scala internal DSLs, so we'll take a brief look at some of them.

Identifiers in Scalawk (Internal Identifiers)

Some DSL expressions, such as arePresentedAs, need to reference AWK variables declared by some other DSL expression. You could use strings to represent these internal identifiers, but having to surround DSL identifiers with double quotes throws a bucket of cold water on the user experience, making users conscious of the fact that they are actually using Scala instead of a crafted-from-scratch domain language.

Scala offers a mechanism to get unique objects for equal strings. That is exactly what a good DSL identifier needs.

If anyone writes:

'counter

… he/she will obtain a reference to a Symbol instance. The Symbol class has a name attribute whereby you can recover the character string used to obtain the instance.

The user would just write ‘counter and the DSL developer can obtain the string counter and use it for the internal representation of, in this case, the AWK variable.
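For instance, this is all the Symbol machinery a DSL developer needs:

```scala
// Symbol literals are interned: equal strings yield the very same instance.
val id = 'counter
val varName: String = id.name  // the original string, recovered via `name`
assert(varName == "counter")
assert('counter eq 'counter)   // reference equality: one unique object per name
```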


By combining internal identifiers with ad-hoc classes and implicit conversions, it isn't difficult to express assignment sentences, even algebraic operations.

's := 's + 1

This article is already far too long and, with what has been hinted at and the tricks of previous sections, it is actually easy to understand the code that builds this kind of expression. That code is located under the entities package. Take that package as if it were an embedded DSL within Scalawk: yes, a DSL embedded in a DSL which is in turn embedded in Scala.
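As an illustration, here is a hypothetical mini-DSL (not the actual entities package) showing how an ad-hoc expression type plus implicit conversions from Symbol and Int can support the assignment sentence above:

```scala
import scala.language.implicitConversions

// Hypothetical expression types for illustration only.
sealed trait Expr { def render: String }
case class Var(name: String) extends Expr {
  def render: String = name
  def +(other: Expr): Expr = BinOp(this, "+", other)
  def :=(value: Expr): String = s"$name = ${value.render}"
}
case class Num(n: Int) extends Expr { def render: String = n.toString }
case class BinOp(l: Expr, op: String, r: Expr) extends Expr {
  def render: String = s"${l.render} $op ${r.render}"
}

object Expr {
  // Symbols and Ints silently become expressions:
  implicit def symToVar(s: Symbol): Var = Var(s.name)
  implicit def intToNum(n: Int): Num = Num(n)
}
import Expr._

val sentence = 's := 's + 1  // "s = s + 1"
```

The user writes 's := 's + 1 and never sees a double quote.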

Some final thoughts

Developing internal DSLs isn't precisely a piece of cake. If forced to fall back on the host language's constructs, users will easily wake up from the dream of using a language built for them from the ground up. Nobody likes to be reminded that he/she is not that special.


You’ll encounter many pitfalls when trying to faithfully reproduce the state machine, and the temptation to abandon the model is huge. Trust me, this is complex and, if you leave the path of the state machine, the ogres in the forest will have your heart for breakfast sooner than you imagine. Scala has proven it can offer a solution to any bump in the state machine road; outside that road, you are free to face a minefield.

As a last piece of advice, I’d recommend you buy a whiteboard, a blackboard, a paper notebook and a good pen… whatever you feel comfortable drawing on.

These are two examples of the early stages of design of Scalawk:



Think! Draw! Think! And draw again! So you become a better DSL architect than this guy…

… so your Neo(s) wouldn’t wake so easily.

Recursion recursive recursively.

Today we will talk about recursion. Don’t you know what recursion is? Don’t worry, now we are going to talk about recursion. Don’t you know what recursion is? Don’t worry, now we are going to talk about recursion. Don’t you know what recursion is? Don’t worry, now we are going to talk about recursion. Do…. Well, as a joke it’s OK 🙂

Since Scala, despite being object-oriented, draws much of its true potential from its functional side, it is natural that recursion has a special importance. However, even more important is making the recursive calls properly, so as not to cause a stack overflow.

Stack overflow: besides being a pretty famous website, it is what happens when we perform too many recursive calls and, in consequence, the stack memory overflows.


In order to make recursive calls properly, the function must be tail-recursive. This means that the function doesn't need to keep the result of the recursive call in memory, so a stack overflow is not triggered. It can also be seen this way: a function is tail-recursive when it only calls itself with different arguments, as its very last action. Examples of functions that do not meet the tail-recursive condition are those that perform operations with the results of the recursive calls. For example, the factorial function encoded this way:

def factorial(n: Int): Int =
  if (n <= 1) 1
  else n * factorial(n - 1)

Because the result of the recursive call is multiplied by n, this is not tail recursion.

And how do you know if a function is tail-recursive? Scala makes the whole thing easy. We can add the @tailrec annotation (from scala.annotation.tailrec) to the function so that, if it has not been written in a tail-recursive way, Scala returns a compilation error. The compiler does the work for us 🙂
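As a quick sketch, here is the earlier factorial rewritten so that the recursive call is the last action, with the annotation asking the compiler to verify it:

```scala
import scala.annotation.tailrec

// Tail-recursive factorial: the partial product travels in an accumulator,
// so the recursive call is the very last action and can be compiled to a loop.
def factorial(n: Int): Int = {
  @tailrec
  def loop(n: Int, acc: Int): Int =
    if (n <= 1) acc
    else loop(n - 1, n * acc)
  loop(n, 1)
}

factorial(5) // 120
```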

Let’s see how we can build a function which returns the nth element of the Fibonacci sequence (a typical example). First, without tail recursion:

import scala.annotation.tailrec

@tailrec
def fibonacci(n: Int): Int = n match {
  case 0 | 1 => n
  case _ => fibonacci(n - 1) + fibonacci(n - 2)
}

This returns a compilation error like this:

“could not optimize @tailrec annotated method fibonacci: it contains a recursive call not in tail position”

However, if we think a little more, we can get a much more efficient function 🙂

import scala.annotation.tailrec

def fibonacci(n: Int): Int = {

  @tailrec
  def loop(i: Int, previous: Int, current: Int): Int =
    if (i == 0) previous
    else loop(i - 1, current, previous + current)

  loop(n, 0, 1)
}

In this case there will be no compilation error.


And that’s all. And that’s all. And that’s …

Scalera tip: Why ‘scala.util.Try’ doesn’t have ‘finally’ clause?

Some days ago, a work colleague raised the million-dollar question.

If we use the traditional Java try, we could be handling some code similar to this:

val connection = database.getConnection()
var data: Seq[Data] = Seq()
try {
  val results = connection.query("select whatever")
  data = results // assuming query returns Seq[Data]
} catch {
  case t: Throwable => logger.error("Oh noes!")
} finally {
  connection.close()
}

In Scala, we have a more functional version of this mechanism: scala.util.Try.
The same example could be implemented by using this data type:

val connection = database.getConnection()
val data: Seq[Data] = Try {
  val results = connection.query("select whatever")
  results
} recover {
  case t: Throwable =>
    logger.error("Oh noes!")
    Seq.empty[Data]
} get

The question is, why doesn’t scala.util.Try even consider a finally clause like Java’s try?

Side effects….side effects everywhere…

If you remember the post where David talked about the Try[T] data type, it's a type that may hold one of two possible values: Success(t: T) or Failure(t: Throwable).
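As a quick refresher, using only the standard library:

```scala
import scala.util.{Try, Success, Failure}

val ok: Try[Int] = Try(42 / 2)  // Success(21)
val ko: Try[Int] = Try(42 / 0)  // Failure(java.lang.ArithmeticException)

// Pattern matching covers both outcomes:
ok match {
  case Success(v) => println(s"got $v")
  case Failure(t) => println(s"boom: ${t.getMessage}")
}
```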

On the other hand, if you remember another post where we talked about vals and vars, we mentioned referential transparency as the principle that must be followed for a function to be considered pure.

So, if we test this principle on the previously described snippet, we should be able to replace the Try[Seq[Data]] expression with the value of the same type that we would have got by evaluating the expression, and retrieve the same result. I.e.:

val connection = database.getConnection()
val data: Seq[Data] = resultValue // hypothetical: the value the Try expression evaluated to

We can see, though, that it hasn't closed the connection that we opened before…


For this reason, it makes more sense to code something like this:

val connection = database.getConnection()
val data: Seq[Data] = Try {
  val results = connection.query("select whatever")
  connection.close() // the connection is closed as part of computing the value
  results
} recover {
  case t: Throwable =>
    logger.error("Oh noes!")
    connection.close()
    Seq.empty[Data]
} get

This way, the data value can be replaced easily, without any extra side effect implication.

…And for this reason, fellows, it doesn’t make sense to think about a finally clause for Try[T]! 🙂

Peace out!

Transforming the Future

A few weeks ago we talked about the type Future and its use to create asynchronous calls.

We saw how to work with blocking calls to obtain the value of the future, and we also used callbacks in order to obtain the result of the future asynchronously. However, some issues were left unsaid. And by that I'm referring to transforming the Future without blocking the execution.

Future transformations

In order to transform futures, as with other basic Scala types, two methods are mainly used: map and flatMap.

Map method

The map method allows us to change the content of a future by applying a function. For instance, if we have a method that gets the first million prime numbers, but we want to transform it to return just the first hundred, we can apply map in the following way:

def getFirstMillionOfPrimes(): Future[List[Int]] = ???

getFirstMillionOfPrimes().map(
  (list: List[Int]) => list.take(100)
) // Future[List[Int]]

This way we will be transforming the inside of the future without breaking the asynchrony.
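A self-contained sketch of the same idea, using Future.successful as a stand-in for an already-completed getFirstMillionOfPrimes (the Await at the end is only there to show the example's result, not something you'd normally do in asynchronous code):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Stand-in for getFirstMillionOfPrimes, already completed:
val primes: Future[List[Int]] = Future.successful(List(2, 3, 5, 7, 11, 13))

// map transforms the eventual value without blocking:
val firstThree: Future[List[Int]] = primes.map(_.take(3))

println(Await.result(firstThree, 1.second)) // List(2, 3, 5)
```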


FlatMap method

On the other hand, the flatMap method allows us to apply, to the content of the future, a function that in turn returns a future. After that, a flatten operation converts the Future[Future[A]] into a simple Future[A]. What the f…? Better explained with an example.

Imagine we want to concatenate the first million prime numbers in a string. To do so, we’ll use a new method:

def concatenate(l: List[Int]): Future[String] = ???

and now we perform a flatMap:

getFirstMillionOfPrimes().flatMap(
  (list: List[Int]) => concatenate(list)
) // Future[String]

And how can we do all this in a more simple way?

Easy question: for comprehensions to the rescue! With a spoonful of syntactic sugar we can write much more readable code.

for {
  primes <- getFirstMillionOfPrimes()
  primesString <- concatenate(primes)
} yield primesString

This way, the concatenation operation won't be applied until the prime numbers are obtained with the getFirstMillionOfPrimes method.

This allows us to keep an order when composing asynchronous calls. Besides, if the first asynchronous call fails, the second won't be executed.
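Putting it all together in a runnable sketch (with Future.successful stand-ins for the real methods):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

def primes(): Future[List[Int]] = Future.successful(List(2, 3, 5))
def concatenate(l: List[Int]): Future[String] = Future.successful(l.mkString(","))

// Desugars into primes().flatMap(ps => concatenate(ps).map(str => str)):
val result: Future[String] = for {
  ps  <- primes()
  str <- concatenate(ps)
} yield str

println(Await.result(result, 1.second)) // 2,3,5
```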

And that’s all for today. Now you know how to change the future. What a shame not to be able to change the past 😦


See you soon!

Spark Streaming: stateful streams (Android vs IOS?)

If you remember the post where we talked about how to connect Spark Streaming and Twitter, we said that the limit for performing analytics was your imagination.

In this post we're going to propose an example that illustrates the idea of keeping a state based on the defined stream feed. Speaking plainly, we'll see how popular one Twitter topic is compared to another one: Android vs IOS.



The main idea is to have some S entity whose state is fed by the V-typed elements that are received and processed in each batch of the stream.

stateful stream

The way the state is updated at a given instant Tn is by applying a user-defined function that takes as parameters both the state at instant Tn-1 and the set of values provided by the batch received at instant Tn:

updateState function (2)

A more casual notation for a common ‘scalero’ would be:

type StateUpdate[S, V] =
  (Option[S], Seq[V]) => Option[S]

Why Option[S]? For one good reason: initially, when we first start listening on the stream, we don't have any S state, so a (S, Seq[V]) => S function wouldn't make sense.

And in practice?

Spark’s API for pair DStreams (PairDStreamFunctions) provides the following method for doing so:

def updateStateByKey[S]
  (updateFunc: (Seq[V], Option[S]) ⇒ Option[S])
  (implicit arg0: ClassTag[S]): DStream[(K, S)]

So for a DStream which is able to classify its elements using a function that provides a key (we'll see an example later on), we'll be able to apply this method and get an S state for each key (in most cases, this state will be some aggregation over the values).

The method is overloaded, so you can specify a partitioner other than HashPartitioner, indicate a custom partition number, or set an initial state (RDD[(K, S)]; remember that, otherwise, our initial state for all K keys will be None).
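To see the semantics without firing up Spark, here is a plain-Scala simulation (hypothetical data, no Spark APIs) of how updateStateByKey folds each batch into the per-key state:

```scala
// The user-defined update function: per-key tweet count as the state.
def updateState(newTweets: Seq[String], previous: Option[Long]): Option[Long] =
  Some(previous.getOrElse(0L) + newTweets.size)

// Two hypothetical batches of keyed tweets:
val batches: Seq[Map[String, Seq[String]]] = Seq(
  Map("android" -> Seq("t1", "t2"), "ios" -> Seq("t3")),
  Map("android" -> Seq("t4"))
)

// Fold every batch into the accumulated per-key state:
val finalState: Map[String, Long] =
  batches.foldLeft(Map.empty[String, Long]) { (state, batch) =>
    val keys = state.keySet ++ batch.keySet
    keys.map { k =>
      k -> updateState(batch.getOrElse(k, Seq()), state.get(k)).get
    }.toMap
  }
// finalState: Map(android -> 3, ios -> 1)
```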

The example

Let's suppose we want to measure how strong the rivalry between Android and IOS is, and to know which of them is the top trending topic on Twitter (in this example we won't distinguish between mentions against and in favour).

Using the same project we proposed for the previous Spark + Twitter example, we'll change the Boot.scala file so it looks more like the following gist contents.

First, we have to set the checkpointing dir and the tweet filters that we are interested in:

  //  Set checkpoint dir (assuming `ssc` is the app's StreamingContext)
  ssc.checkpoint("checkpoint")

  //  Add filters ...

  val Android = "android"
  val IOS = "ios"

We'll next group our tweets based on whether they contain the 'Android' or 'IOS' filter (if a tweet contains both, it will be counted on both sides). The result we get is another DStream, one that contains key-value pairs (filter-tweet):

val groupedTweets = stream.flatMap{content =>
  List(Android, IOS).flatMap(key =>
    if (content.getText.contains(key)) 
      Option(key -> content)
    else None)
}

Once we have grouped the tweets, we create a new DStream from the previous one by using updateStateByKey, the function we described at the beginning of this post, so that the S state we were talking about is the sum of tweets for each key word:

val aggregatedTweets: DStream[(String, Long)] =
  groupedTweets.updateStateByKey[Long]{
    (newTweets, previousState) =>
      val newTweetsAmount = newTweets.size.toLong
      previousState.fold(
        Some(newTweetsAmount))(previousSize =>
        Some(previousSize + newTweetsAmount))
  }

The only 'tricky' part to understand could be the fold, but it's simple: it indicates that, in case of having a previous amount (previous state), we just add the new amount to it; otherwise, we use the new amount.
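The behaviour of Option.fold can be checked in isolation:

```scala
val newTweetsAmount = 5L
val update: Option[Long] => Option[Long] =
  previous => previous.fold(Option(newTweetsAmount))(p => Some(p + newTweetsAmount))

val updated = update(Some(10L)) // previous state present: 10 + 5
val initial = update(None)      // no previous state: just the new amount
// updated == Some(15L), initial == Some(5L)
```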

Apart from this, we make our snippet work by printing these figures, and we start listening to the stream:

//  And add actions to perform (like printing the aggregatedTweets) ...

aggregatedTweets.foreachRDD{ results =>
  results.foreach{
    case (team, amount) => logger.info(s">>> $team : $amount")
  }
}

// ... and begin listening (assuming `ssc` is the StreamingContext)
ssc.start()

Can you think about another way to make the stream more interesting? Playing with tweets geo-location on a heat map, for example? 😉

Easy peasy!