Closed for vacations

As we’re getting used to it and, given how much prolific we are becoming when writting Scala posts, we think we deserve some rest.

friends-tv-show-bad-lessons-8

That’s why we will be off for some weeks, to recharge our batteries and get back full of energy for going on our labor 🙂

giphy

Peace out!

Anuncios

Scala, Scala and nothing more

After so many posts on Scala, many snippets of code, and many weird concepts, there comes a post that had been pending since almost the beginning of this blog. What resources exist to learn Scala? Well … now we show our favorites.

yoda_meme

Books: a good starting point 🙂

  • Programming in Scala: a classic. Written by the very Odersky and with the third edition hot off the press.
  • Funcional Programming in Scala (also known, in an effort of originality and simplicity, as the red book). A book that opens the mind to the functional paradigm and it is advisable not only for people who want to program in Scala, but for people who are interested in functional programming.

Courses: to deepen a bit more.

  • On the online courses, Coursera and the new specialization of functional programming in Scala they have all the monopoly. It consists of 4 courses + 1 project and discusses in depth the most important aspects of Scala.
  • Furthermore, scala-exercises allows us to practice in a faster way, but no less effective. Fully recommended.

Events: not only to drink beer  😉

  • Scala Days : Undoubtedly the most important annual convention about Scala. Since a year, there are two conventions at year: one in America and one in Europe. In addition, all the videos of the talks are published 🙂
  • Scala World: UK convention with a warm welcome and where you can hear and see some of the biggest Scala gurus.
  • Scala eXchange: another of the best known conventions. It takes place on December in London.

These are the resources that we know and that we found interesting. Without doubt, there are many more that we left unsaid. If you missing some resource on this list, any contribution is welcome 🙂

Scala: Code interpretation at runtime

With today’s post, we’ll dive into Scala code generation on the fly: at runtime. We have to be cautious of not mixing concepts with Scala macros, which generate code at compile-time. These make use of Scala type system, which is much safer than generating code at runtime.

House-of-cards-but-why

When is it useful to make use of this mechanism then? We’ll try to shed light on this by following a very simple example, getting abstract of real implementation (that you may find at Scalera’s Github).

The problem: “da” serializer

Let’s suppose a not-so-wild case, which consists on communicating two services via an event bus (we’ll get abstract of its implementation: it could be a message queue like Kafka, Akka streams, …).

The main idea is the following:

Sender-receiver-schema

The producer knows how to send and the consumer has an associated callback for message arrivals:

trait Producer{
  def produce(message: Any): Try[Unit]
}
trait Consumer{
  val consume: Any => Unit
}

Both producer and consumer services know the message types that may arrive. In our example, they could be one of these:

case class Foo(att1: Int, att2: String)
case class Bar(att1: String)

If the producer wants to send a message using the event bus, it will have to serialize it somehow (JSON, Byte array, XML, …) so, by the time it reaches the opposite end, the consumer will start the inverse process (deserialization) and will get the original message.

…nothing weird so far.

Whydoeseverythinghavetobesocomplicated

If we have a JSON serializer …

trait JsonSer[T] {
  def serialize(t: T): String
  def deserialize(json: String): T
}

and we serialize our message …

implicit val fooSerializer: JsonSer[Foo] = ???
val foo: Foo = ???
producer.send(implicitly[JsonSer[Foo]].serialize(foo))

How do we know which deserializer to use when the consumer gets the message?

Option 1: Try every possible serializer until one of them works

In our consumer, we’d have:

lazy val consumer = new Consumer {
  override val consume: Any => Unit = {
    case message: String =>
      Seq(barSerializer, fooSerializer).flatMap { ser =>
        Try(ser.deserialize(message)).toOption
      }.headOption.fold(ifEmpty = println("Couldn't deserialize")) {
        case bar: Bar => println("it's a bar!")
        case foo: Foo => println("it's a foo!")
        case _ => println("it's ... something!")
      }
  }
}

A lil’ bit coarse, right? If the proper serializer is the last of a 100 list, we would have tried and failed with 99 serializers before (Such a waste of CPU!).

4d8

Besides, we could also consider the case of having a different type deserializer, but it fits with the received message, so it would partially or wrongly deserialize the message.

Option 2: Add the message type to the message itself

We could add an extra layer wrapping the message for indicating the message type that it contains. This way, when receiving it, we could determine the serializer type we have to use.

MessageWrapper

For writing the type, we’ll make use of Scala’s TypeTags, getting info about the T contained message type.


//Wrapper for the message (and its serializer)

import scala.reflect.runtime.universe.{TypeTag, typeTag}

case class Message[T: TypeTag](content: T){
  val messageType: Message.Type = typeTag[T].tpe.toString
}
object Message {

  type Type = String

  def typeFrom(msg: String): Message.Type = ???

  implicit def messageSer[T:TypeTag:JsonSer]: JsonSer[Message[T]] = ???

}

//We'll make use of it for sending

producer.produce(messageSer[Foo].serialize(Message(foo)))

//And we redefine the consumer

lazy val consumer = new Consumer {
    override val consume: Any => Unit = {
      case message: String =>
        Message.typeFrom(message) match {

          case "org.scalera.reflect.runtime.Bar" =>
            println("it's a bar!")
            val value = messageSer[Bar].deserialize(message).content
            println(value.att1)

          case "org.scalera.reflect.runtime.Foo" =>
            val value = messageSer[Foo].deserialize(message).content
            println("it's a foo!")
            println(value.att2)

          case _ =>
            println("it's ... something!")
        }
    }
  }

As you can see, we don’t have to try every possible serializer anymore. Instead of that, from the message type we’ve extracted from the Message wrapper we have added, we’re able to use the proper deserializer.

But it is also true that we have to add an extra case for each string that represents the message type. Wouldn’t it be nice to deserialize somehow and to have the defined case only for the already deserialized objects? (Something similar what we first tried but without trying all posible serializers). Something like:

lazy val consumer = new Consumer {
  override val consume: Any => Unit = { msg =>
    genericDeserialize(msg) match {
      case f: Foo =>
      case b: Bar =>
      case _ =>
    }
  }
}

Option 2-cool: Serializers ‘under the hood’

For achieving something similar, we have to get focused on that genericDeserialize method: Which signature should it have? Initially, something like this:

def genericDeserialize(msg: String): Any

An Any? Seriously? My fellows, at runtime, we have no idea about the type we can get. We just know that, from a String, we’ll get ‘some….thing’. The match that applies to that Any will allow us to go from something totally abstract to more concrete types.

At this point is where both reflect library and Scala compiler appear.

reflect.Toolbox

The Toolbox API allows parsing strings and getting the resulting AST (abstract syntax tree). From that AST, it is able to evaluate the expression and returns an instance of an Any as well.

For instantiating a Toolbox and use type references, we hace to add as SBT dependencies the following:

libraryDependencies ++= Seq(
  "org.scala-lang" % "scala-compiler" % "2.11.8",
  "org.scala-lang" % "scala-reflect" % "2.11.8")

For example, if we wanted to parse the "2".toInt + 4 operation,

import scala.tools.reflect.ToolBox
import scala.reflect.runtime.{universe => ru}
import ru._

//  Scala compiler tool box
val tb = ru.runtimeMirror(
  this.getClass.getClassLoader).mkToolBox()

println(ru.showRaw(tb.parse("2".toInt + 4")))

we would get the abstract syntax tree generated as a String (by using showRaw):

Apply(Select(Select(Literal(Constant("2")), TermName("toInt")), TermName("$plus")), List(Literal(Constant(4))))

If we use the toolbox for evaluating the parsed expression,

println(tb.eval(tb.parse("2".toInt + 4")))

we’ll get an Any that represents the resulting value of the sum:

6

“Da” serializer

Once we’ve seen how it generally works, we apply the same principle to our serializer, so the expression we’re going to try to interpret is similar to:

{
  import scala.reflect._;
  import spray.json._;
  import org.scalera.reflect.runtime._;
  import MySprayJsonImplicits._;
  import MyJsonSerImplicits._;

  implicitly[JsonSer[Message[$messageType]]]
}

where MySprayJsonImplicits and MyJsonSerImplicits represent the objects that contain both the Spray implicits for JsonFormat and the JsonSer implicits that we have defined before.

$messageType represents the concrete type to deserialize that we would have got by using the TypeTag (as seen before).

If we adapt it to our code, we’ll get something similar to:

object GenSer {

  import scala.tools.reflect.ToolBox
  import scala.reflect.runtime.{universe => ru}
  import ru._

  //  Scala compiler tool box
  private val tb = ru.runtimeMirror(this.getClass.getClassLoader).mkToolBox()

  def genericDeserialize(msg: String)(serContainers: Seq[AnyRef]): Any = {

    val messageType = Message.typeFrom(msg)

    val serContainersImport = serContainers.map(container =>
      "import " + container.toString.split("\\$").head + "._").mkString(";\n")

    val expr =
      s"
         |{
         |  import scala.reflect._;
         |  import spray.json._;
         |  import org.scalera.reflect.runtime._;
         |  $serContainersImport;
         |
         |  implicitly[JsonSer[Message[$messageType]]]
         |}
        ".stripMargin

    tb.eval(tb.parse(expr))
      .asInstanceOf[JsonSer[Message[Any]]]
      .deserialize(msg).content
  }

}

If you take a look, we’ve empowered the generic deserialization method notation to hold a sequence of objects to import, so we won’t make explicit which object contains the Spray implicits and which one contains our JsonSer‘s.

val serContainersImport = serContainers.map(container =>
  "import " + container.toString.split("\\$").head + "._").mkString(";\n")

It is also noteworthy that, when deserializing, we get a Message[Any]; so we’ll have to get the ‘content’ field that represents the raw value of Any type that holds the deserialized message.

The result

So finally, we can now make use of our function for generically deserialize and let our consumer code be ‘swaggy’:

lazy val consumer = new Consumer {
  override val consume: Any => Unit = { 
    case msg: String =>
      genericDeserialize(msg)(Seq(case3,Message)) match {
        case bar: Bar => println("it's a bar!")
        case foo: Foo => println("it's a foo!")
        case _ => println("it's ... something!")
      }
  }
}

Conclusions

Being able to evaluate code at runtime let us do a lot of interesting things when we want to interpret String based types. However, these kind of techniques won’t care about the so worthy and safe Scala type system, so these are tools to be used with a lil’ bit of care.

It is also kind of expensive in time terms to evaluate these expressions. I would recommend to use a type cache in this case. Something really simple like a map:

type TypeName = String
var cache: Map[TypeName, JsonSer[Message[_]]]

And when invoking the method, we’ll check if the type evidence (JsonSer) we’re looking for is already stored in the map. If so, we’ll use it, otherwise, we’ll create and store it in the cache, using it as result.

post-28553-Steve-Jobs-mind-blown-gif-HD-T-pVbd

Easy peasy…
Peace out!

Graffiti Rules: playing with JSON [Snow]

A few months ago we talked about Spray, a toolkit that allowed us to build REST APIs in an easy way with a pretty complete DSL.

One of the components belonging to this toolkit is spray-json. This module allows us to serialize and deserialize our objects to/from a JSON format. Today we’ll see how to work with it quickly and easily.

How to create serializers to different classes? Well, there are two options depending on how the class we want to serialize is defined.

Easy option: when we have a case class

Well, that’s a piece of cake. The only thing we need to use is the jsonFormatN() method where N represents the number of arguments of the case class apply method. Let’s see it with an example:

case class Character(name: String, family: String, isDead: Boolean)

object MyJsonProtocol extends DefaultJsonProtocol {

 implicit val characterFormat = jsonFormat3(Character.apply)

}

As you can see, in order to create a serializer for the Character case class, we create an implicit value with the help of jsonFormat3 (as it has 3 attributes). Such an implicit object will be defined inside a trait which will extend from DefaultJsonProtocol. In this trait, the serializers for Scala basic types (Int, String, Boolean…) are defined… Easy peasy.

tumblr_mfuulmem1p1qem4feo1_250

Less easy option: when we don’t have a case class or we want to serialize a case class in a different way.

In such a case, we’ll need to implement how the serialization and deserialization of the given type should be. To do so, we may need to create an implicit object extending from RootJsonFormat[T] where T is the type we want to serialize. Such a trait contains two methods that need to be implemented. On the one hand, there is the write method, which converts a typeT in a Json, and on the other a method read, performing the reverse process. We’ll now see an example with the same type as before:

class Character(val name: String, val family: String, val isDead: Boolean)

object MyJsonProtocol extends DefaultJsonProtocol {

  implicit object CharacterJsonFormat extends RootJsonFormat[Character] {

    def write(c: Character) =
      JsArray(JsString(c.name), JsString(c.family), JsBoolean(c.isDead))

    def read(value: JsValue) = value match {
      case JsArray(Vector(JsString(name), JsString(family), JsBoolean(isDead))) =>
        new Character(name, family, isDead.toBoolean)
      case _ => throw new DeserializationException("Character expected")
    }
  }
}

As can be observed, it is a bit more tedious but not too complicated.

And how do we use it? We just have to import the object where we have defined the implicits and call the methods toJson or convertTo[T].

import MyJsonProtocol._

val json = Character("Jon Snow", "Stark", ???).toJson //....You know nothing!!!!
// Returns {"name": "Jon Snow", "family": "Stark", "isDead": ???}
val jonSnow = json.convertTo[Character]

giphy

Besides, if we use Spray’s REST API, the transformation will be done in a transparent way and it won’t be necessary to call  toJson or convertTo explicitely. But we’ll leave that to other post. Next week we’ll be talking about this library so we’d better see something of it before 🙂

See you all!

Shapeless: polymorphic functions

Time ago, our friend Javier Fuentes illustrated us with an introduction to Shapeless.
Some months after that, at Scala Madrid meetup, he offered a pretty interesting speech about structural induction with Shapeless and HLists. We couldn’t avoid it and we got the enthusiasm flu 😛

What we want to achieve

Let’s set as our study case what I think more than one has thought before: how can I compose in the same for-comprehension different types. Something like:

import scala.util.Try

for {
  v1 <- List(1,2,3)
  v2 <- Try("hi, person ")
} yield s"$v2 $v1"

which usually comes from the hand of the following frustrating error:

<console>:15: error: type mismatch;
 found   : scala.util.Try[String]
 required: scala.collection.GenTraversableOnce[?]
         v2 <- Try("hi, person ")
            ^

Therefore we need a way to transform these data types (Future, Try) into iterable ‘stuff’ (GenTraversable[T] might work). In our example we won’t have in mind the information that we loose about the error that might happen, for example, if certain Try or Future expression has failed. For have a better understanding about the problem we present, let’s talk about some theory.

53132063

Monomorphism vs polymorphism

We say a method is monomorphic when you can only invoke it with parameters whose concrete type is explicitly declared in the method signature, whilst the polymorphic methods can take parameters of any type while it fits in the signature (in case of Scala language: parameter types). Speaking proper English:

def monomorphic(parameter: Int): String

def polymorphic[T](parameter: T): String

Polimorphism

It’s also good to know that a method can be polymorphic due to parameter types or just to parameter subtyping. E.g.:

def parametricallyPolymorphic[T](parameter: T): String

def subtypedPolymorphic(parameter: Animal): String

subtypedPolymorphic(new Cat)

If we use parameter types and we have NO information at all about those types, we are in front of a parametric polymorphism case.

If we use parameter types but we need any extra view / context bound for that type (T <: Whatever o T:TypeClass), then we are talking about ad-hoc polymorphism.

Problem: Function values

There’s not such a problem when using parametric polymorphism and methods but, what about function values? In Scala, it cannot be achieved and therefore it produces some lack of expressiveness:

val monomorphic: Int => String = _.toString

val anotherMonomorphic: List[Int] => Set[Int] = 
  _.toSet

Please, notice the definition of the function that trasforms a List into a Set. It could be totally independant of the list element type, but Scala syntax doesn’t allow to define something similar. We could try asigning the method to a val (Eta expansion):

def polymorphic[T](l: List[T]): Set[T] = l.toSet

val sadlyMonomorphic = polymorphic _

But the compiler (as clever as usual) wil infer that the list contained type is Nothing: a special one, but concrete as the most.

64331666

Shapeless parametric polymorphism

How does Shapeless solve this problem? If we had to define a transformation function from Option to List in Scala, we’d have the previously mentioned limitation for using function values and we could only achieve it by defining a method:

def toList[T](o: Option[T]): List[T] =
  o.toList

However, Shapeless, using its alchemy, provides us some ways to do so. In category theory, when talking about type constructors transformations, it’s so called natural transformations. First of them has the following notation:

import shapeless.poly._

val polyFunction = new (Option ~> List){
  def apply[T](f: Option[T]): List[T] = f.toList
}

If you have a closer look at it, the parametric polymorphism is moved to the method inside the object. Using this function is as simple as:

val result: List[Int] = polyFunction(Option(2))

The other possible notation consists on defining the function behavior based on cases, in other words, if we want the function to be defined only for Int, String and Boolean, we’ll add a case for each of them.

import shapeless.Poly1

object polymorphic extends Poly1 {

  implicit optionIntCase = 
    at[Option[Int]](_.toList.map(_ + 1))

  implicit optionStringCase = 
    at[Option[String]](_.toList.map(_ + " mutated"))

  implicit optionBooleanCase = 
    at[Option[Boolean]](_.toList.map(!_))

}

As you can see, if we want our function to be defined at the case where the input parameter is an Option[Int], we define that all contained elements in the list are added to 1.

This expression returns a this.Case[Option[Int]], where this refers to polymorphic object that we are defining:

implicit optionIntCase = 
  at[Option[Int]](_.toList.map(_ + 1))

The good part of this? In case we wanted to use the funcion on a input type that doesn’t have a defined case at the function, we’ll get a compile-time error (Awesome, right?):

The result

Applying this last way (based on cases), we get the expected result that we mentioned in the introductory section: to be able to use a for-comprehension for composing different typed values: iterables, Try, Future…

The proposed solution can be found in the following file.

In our function we have a case for GenTraversable, another for Try and Future (this last one needs to have its expression evaluated for being able to iterate over it, so we need a timeout for waiting):

object values extends Poly1 {

  implicit def genTraversableCase[T,C[_]](implicit ev: C[T] => GenTraversable[T]) = 
    at[C[T]](_.toStream)

  implicit def tryCase[T,C[_]](implicit ev: C[T] => Try[T]) = 
    at[C[T]](_.toOption.toStream)

  implicit def futureCase[T,C[_]](implicit ev: C[T] => Future[T], atMost: Duration = Duration.Inf) =
    at[C[T]](f => Try(Await.result(f,atMost)).toOption.toStream)

}

Now we can use it in our controversial code snippet:

import scala.concurrent.ExecutionContext.Implicits.global

case class User(name: String, age: Int)

val result: Stream[_] = for {
  v1 <- values(List(1,2,3))
  v2 <- values(Set("hi","bye"))
  v3 <- values(Option(true))
  v4 <- values(Try(2.0))
  v5 <- values(Future(User("john",15)))
} yield (v1,v2,v3,v4,v5)

The sole solution?

At all! you can always implement it using traditional Scala type classes, though it implies defining a trait that represent the ADT iterable. You can take a look at the example at the following gist content.

Peace out!

Scala: One language to rule them all (II)

You wouldn’t put a layman under the controls of a brand new Airbus 380. Such a powerful tool requires its users to be trained; something similar happens with Scala when it is used to cast new rings of power, I mean, DSLs. Some theory and lots of learning by doing.  We’ve already seen a bit of theory and now is time to learn by building a DSL from scratch.

Our DSL’s target

Our brand new DSL is intended to serve as an didactic exercise. However, despite of its purpose, it needs to have a target. That is (or):

  • A process or system to govern.
  • Another language to serve as proxy of.

In our example we’ll tackle the second case.

Introducing AWK

Are you kidding? does it need introduction?
Let your unix terminal introduce it for me:

man awk | head -n 6

GAWK(1) Utility Commands GAWK(1)

NAME
gawk – pattern scanning and processing language

OK, Wikipedia seems a little more verbose:

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

The AWK language is a data-driven scripting language consisting of a set of actions to be taken against streams of textual data – either run directly on files or used as part of a pipeline

Consider these two heroes:

Gnu-and-penguin-color

Gnu has a belt you can’t see in this drawing and there he hides a powerful weapon: AWK.
It is so powerful because it allows many scripts, running  upon (and used to build) GNU/Linux/BSD distributions, to transform, filter, aggregate,… data from other commands outputs. Let’s see an example:

awk_sample

Here, the output generated by lsmod is piped to AWK whereby each line is processed by extracting its second column value which will be accumulated in a total byte counter. This way, when the output has been exhausted, total will be printed as the total amount of Kilo Bytes of memory used by Linux Kernel modules.

A hard nut to crack?

The applications of AWK are innumerable as well as the amount of time you can save once you have a grasp of it. However, for many people, it is more like…

6bkFb7B

… than a helping tool. Its 1475 lines of man pages aren’t precisely light reading.

It seem therefore that guiding an user through the composition of AWK programs could be of great help. Does this phrase ring a bell to you?

DSLs are small and concise which implies that they guide their users in the process of describing actions, entities or relations within their domain of application.

Yes! A Scala internal DSL could be used to build such a tool!

Hands on the DSL Scalawk construction

The most important thing in the programming language is the name…
Donald Erving Knuth

First things first and, if we take the argument from authority as a valid argument, we should start by giving a name to our DSL.

To be brief: We’ll compose AWK programs by the use of a Scala internal DSL,

Scala + AWK = Scalawk

So far so good!

You can already clone Scalawk source code GitHub

git clone https://github.com/pfcoperez/scalawk.git

Divide & Conquer

In the last episode we agreed that the safest way to design a DSL is by the use of the state machine model. These machines are easily decomposed into:

  • States (one of them is the machine initial state).
  • Transitions:
    • Inputs triggering the transition.
    • Transition effects:
      • The machine changes its current state.
      • Some output could be generated besides the state transition.
  • Machine alphabet: Machine input entities.

By drawing the state machine diagram, which is not different from designing an interactive guide, we’ll complete the whole creative process in the creation of the DSL. The remaining work isn’t more than beautiful Scala boilerplate.

machine

All possible users interaction with Scalawk are represented in the previous graph, e.g:

lines splitBy ";" arePresentedAs ('c1, 'c2)

machine_walkthrough

This machine modelling and decomposition leads to the following Scala packages structure:

packages

The nuts & bolts or: How I Learnt to Stop Worrying and Love the Building Blocks that Scala Provides

States, transitions and auxiliary elements are entities contained by the packages listed above. In fact, they are nothing but Scala objects, classes, methods and traits.

Initial State

initial_state

As we already know,  states are objects. Either they are instances of classes or singleton objects. On the other hand, we’ve also seen that the right way to implement state machines is to make these states immutable, being transitions responsible for new states generation.

The initial state ins’t generated by any transition, it exists as it is from the beginning of the program. That is a good indicator of its singleton nature which is definitely confirmed by the fact that no other instance of this initial state can exist:

object lines extends ToCommandWithSeparator

From the DSL user standpoint, this initial state should be just a word indicating the start of a phrase in the language. That’s another reason supporting the singleton object approach.

The initial state need to transit to the next one, that’s is why lines is extending ToCommandWithSeparator . Don’t rush, but keep in mind that ToCommandWithSeparator is transition set trait.

Transitory and final states

Yeah, states are objects… is that all? No!  There are different kinds of states, besides, many states are quite alike and they could be built from templates. Lets review some tricks and tips.

The top-level classification of non-initial states should be this one: Transitory and final. Conceptually, the former can’t be used to generate a result whereas the latter can. In the concrete case of Scalawk that implies that transitory states can’t generate valid AWK programs but final states can.

with_initialprogram_st
Non-final state
solidcomand_st
Final state

In Scalawk, any entity able to generate valid AWK code should mix AwkElement

trait AwkElement {
  def toAwk: String
}

By doing so, we are adding toAwk method to that entity, the entry point to ask for AWK code from client code.

Despite of their differences, almost all states share a common set of attributes from which AWK commands can be composed:

  • Command line options, e.g: Token separator
  • Initial program: Instructions to be run by AWK before starting line processing. e.g: Initialize a counter value.
  • Line program: Instructions to be executed for each line of the AWK input. e.g: Printing a line; adding a value to an accumulator initialized at the Initial Program.
  • End program: Instructions to be executed after all lines have been processed, that is, after Line Program has been run using each single input line as its own input. e.g: Printing counters values.

At each state, these fields might be empty or not and when a final state is asked to produce an AWK program, they will be used to generate the result string value.

abstract class AwkCommand protected(
  private[scalawk] val commandOptions: Seq[String] = Seq.empty,
  private[scalawk] val linePresentation: Seq[AwkExpression] = Seq.empty,
  private[scalawk] val lineProgram: Seq[SideEffectStatement] = Seq.empty,
  private[scalawk] val initialProgram: Seq[SideEffectStatement] = Seq.empty,
  private[scalawk] val endProgram: Seq[SideEffectStatement] = Seq.empty
) {
  def this(prev: AwkCommand) = this(
    prev.commandOptions,
    prev.linePresentation,
    prev.lineProgram,
    prev.initialProgram,
    prev.endProgram
  )
}

So far, we know that non-initial states:

  • For sure, contain the building blocks of a result and propagate the previous state contents for these fields:  Then they should extend AwkCommand abstract class.
  • Most probably, add or change some piece of information from the previous state AwkCommand attributes: Then they should override AwkCommand attributes.
  • Optionally, can transit to another state: If it is the case, they should have a method with the type of the target state as return value or mix  a transition family trait.

You might be thinking: Why is AwkCommand an abstract class and not a trait?
Well, AwkCommand‘s goal is to provide a re-usable code for continuity. That is, it provides the constructor to build a state from another state (prev parameter). This way, states code is reduced to just their transitions and AwkCommand attribute overrides but only for those attributes whose information is changing in the new state.

Obviously, the only way to provide a constructor in a class hierarchy is by providing a class, if this class can’t be instantiated: Make it abstract.

abstract_with_constructor

class CommandWithLineProgram(
                              statements: Seq[SideEffectStatement]
                            )(prev: AwkCommand) extends AwkCommand(prev)
  with ToSolidCommand {

  override private[scalawk]val lineProgram: Seq[SideEffectStatement] = statements

}

CommandWithLineProgram is a non-final state hence it doesn’t mix AwkElement trait.

//This is the first state which can be used to obtain an AWK command string `toAwk`
class SolidCommand(val lineResult: Seq[AwkExpression], prevSt: AwkCommand) extends AwkCommand(prevSt)
  with AwkElement
  with ToCommandWithLastAction {
 ...
 ...
 ...
}

On the contrary, SolidCommand does, therefore needs to provide an implementation to toAwk method:

 // AWK Program sections

// BEGIN
protected def beginBlock: String = programToBlock(initialProgram)

// Per-line
protected def eachLineActionBlock: String =
programToBlock(lineProgram ++ linePresentation.headOption.map(_ => Print(linePresentation)))


// END
protected def endBlock: String = programToBlock(endProgram)

protected def programToBlock(program: Seq[SideEffectStatement]) =
{program.map(_.toAwk) mkString "; "} +
program.headOption.map(_ => "; ").getOrElse("")

protected def optionsBlock: String =
{commandOptions mkString " "} + commandOptions.headOption.map(_ => " ").getOrElse("")

override def toAwk: String =
s"""|awk ${optionsBlock}'
|${identifyBlock("BEGIN", beginBlock)}
|${identifyBlock("", eachLineActionBlock)}
|${identifyBlock("END", endBlock)}'""".stripMargin.replace("\n", "")

//Auxialiary methods
private[this] def identifyBlock(blockName: String, blockAwkCode: String): String =
blockAwkCode.headOption.map(_ => s"$blockName{$blockAwkCode}").getOrElse("")

This class hierarchy enables code re-utilization, for example, SolidCommandWithLastAction is almost an exact copy of SolidCommand and nothing prevents us from extending it in order to define SolidCommandWithLastAction:

class SolidCommandWithLastAction(lastAction: Seq[SideEffectStatement])(prevSt: SolidCommand)
extends SolidCommand(prevSt) {...}

At this point, you should be able to start exploring the repository as well as to associate each state from the graph diagram with a state class in the code. Just in case, the following table collect these associations:

Graph Node

Entity

Is final state?

Entity Kind

init

lines

No

Singleton Object

command

CommandWithSeparator

No

Class

with line program

CommandWithLineProgram

No

Class

with initial program

CommandWithInitialProgram

No

Class

solid command

SolidCommand

Yes

Class

with last action

SolidCommandWithLastAction

Yes

Class

Transitions

Transitions between states are the easy part, they are as simple as methods returning new states. Thanks to Scala infix notation they create the illusion of natural language expressions, at least to some degree…

Some states might share transitions so it seems a good a idea to create traits containing them. By the use of mixing, states can thus use them as LEGO pieces in order to build their own transition set.

There are two special cases which deserve special attention: Groups of transitions and Empty input transitions.

Group of transitions…

… are composed by transitions which are always present together or which are different versions of the same transition. These are normally defined at the same trait named following the pattern To<STATE_CLASS_NAME>.

trait ToCommandWithSeparator {

  def splitBy(separator: String) = new CommandWithSeparator(separator)
  def splitBy(separator: Regex) =  new CommandWithSeparator(separator)

}

The example above is a rather clear case of two versions of the same transition: One receiving a string input and the other receiving a regular expression.

Note that, in relation with the abstract state machine, the machine input is both the method name and its parameters.

Empty input transitions

Consider the following transition, extracted from our state machine:

empty_transition

State machines can move from one state to another when the input is an empty string. It might seem bizarre but it can be done with our object modelling of state machines thanks to implicit conversions.

An implicit conversion from a state (Source) to another (Target) by just trying to access one of Target methods having a Source instance will be perceived by the DSL user as an empty transition. As simply as it sounds.

What is more, by just defining the implicit conversion at the companion object of either the Source or Target class/trait, it will be available in the scope of the sentence where the empty transition occurs. No need of imports which means: ABSOLUTELY TRANSPARENCY for the user.

Thus, the following code:

object ToCommandWithSeparator {
  implicit def toCommandWithSep(x: ToCommandWithSeparator) = new CommandWithSeparator()
}

… enables the transition described in the diagram below:

empty_transition2

VvqNk6o

– If ToCommandWithSeparator is a transition family trait, isn’t its equally named companion object the companion object of that transition family? Didn’t we set that the implicit conversion should be defined within Source or Target‘s companion object and, therefore, within a state class?

– Exactly! And what’s ToCommandWithSeparator‘s fate if not to be mixed at a state class definition?

In Scala, implicit conversions defined at a trait companion object will also apply for those classes extending or mixing that trait and they’ll be available on any scope where that class is available. This feature, besides being extremely useful, seems to be a very rational one: The class mixing/extending the trait can be regarded as a kind-of that trait, a subtype, so any of its instances are also of the type given by the trait and it should be expected that any conversion applying to that type could also apply to these instances.

Take, for example, the traits T and , both having companion objects where implicit conversions to and are respectively defined:

case class E(x: Int)
case class D(x: Int)

trait T
object T { implicit def toD(from: T): D = D(1) }

trait S
object S { implicit def toE(from: S): E = E(2) }

Mix both of them in the definition of the class C

case class C() extends T with S

… and check how a C instance is implicitly converted into instances of E or D.

scala> val c: C = C()
c: C = C()

scala> val d: D = c
d: D = D(1)
scala> val e: E = c
e: E = E(2)

Expressions

Most Scalawk transition inputs fit into the pattern transition name + basic type value. However, some of them receive expressions, identifiers or sentences. These are particular types created to express instructions and values within the AWK program. Hence they should not be part of a generic guide on how to create a DSL. Yet, the constructs behind them are not uncommon in many Scala Internal DSLs so we’ll take a brief look at some of them.

Identifiers in Scalawk (Internal Identifiers)

Some DSL expressions, such as arePresentedAs, need to make reference to AWK variables, declared by some other DSL expression. You could use strings to represent these internal identifiers. But having to surround our DSL identifiers in double quotes throws a bucket of cold water on its user experience, making the user conscious of the fact that she/he is actually using Scala instead of a crafted-from-scratch domain language.

Scala offers a mechanism to get unique objects for equal strings. That is exactly what a good DSL identifier needs.

If anyone writes:

'counter

… her/he will be obtaining a reference to a Symbol instance.  Symbol class has a name attribute whereby you can recover the character string used to obtain the instance.

The user would just write ‘counter and the DSL developer can obtain the string counter and use it for the internal representation of, in this case, the AWK variable.

Sentences

By combining internal identifiers with ad-hoc classes and implicit conversions it isn’t difficult to express assignation sentences, event algebraic operations.

's := 's + 1

This article is already far too long and with what has been hinted and the tricks of previous sections it is actually easy to understand the code to build this kind of expressions. That code is located under entities package.  Take that package as if it were a embedded DSL within Scalawk, yes, a DSL embedded in a DSL which is in turn embedded in Scala.

Some final thoughts

Developing internal DSLs isn’t precisely a piece of cake. If being forced to fall back to the host language constructs, users will easily wake up from the dream of being using a language built for them from the ground up. Nobody likes to be reminded he/her is not that special.

scala_has_you

You’ll encounter many pitfalls when trying to faithfully reproduce the state machine, the temptation to abandon the model is huge. Trust me, this is complex and, if you leave the path of the state machine, the ogres in the forest will have your heart for breakfast sooner than you imagine. Scala has proven it can offer a solution to any bump of the state machine road, outside that road you are free to face a minefield.

As a last piece of advice, I’d recommend you to buy a white/black board, a paper notebook and a good pen… whatever you feel confortable to draw on.

These are two examples of the early stages of design of Scalawk:

whiteboard

notebook

Think! draw! think! and draw again! so you are a better DSL architect than this guy…
BE-21-architect

… so your Neo(s) wouldn’t wake so easily.

Recursion recursive recursively.

Today we will discuss about recursion. Don’t you know what is the recursion? Don’t worry, now we are going to talk about recursion. Don’t you know what is the recursion? Don’t worry, now we are going to talk about recursion. Don’t you know what is the recursion?  Don’t worry, now we are going to talk about recursion. Do …. Well, as a joke it’s ok 🙂

Because Scala is a language that, despite being object-oriented, its true potential lies largely in its functional part, it is normal that recursion has a special importance. However, even more important, it is to generate the recursive calls properly to not have a stack overflow.

Stack Overflow: plus a pretty famous page, is what happens when we perform many recursive calls and, in consequence, the stack memory is overflowed.

szpjhwz

In order to generate recursive calls properly it is necessary that the function will be tail-recursive. This means that the function needn’t to stored in memory the result of the recursive call. Thus, a stack overflow is not triggered. It can also be seen as a recursive function when the function only calls itself with different arguments. Examples of functions that do not meet the tail-recursive condition are those in which operations are performed with the results of the recursive calls. For example, the factorial function encoded on this way:

def factorial(n: Int): Int =
  if (n <= 1) 1
  else n * factorial(n - 1)

because the call is multiplied by n, it will not perform a tail-recursive recursion.

And how do you know if a function is tail-recursive? Scala makes the whole thing easy. We can add the tailRecursive annotation in this function so that if it has not been written on a tail-recursive way, Scala return us a compilation error. The compiler makes the work for us 🙂

Let’s see how we can build a function which returns the element n of the Fibonacci sequence (a tipical example). First without tail-recursive:

@annotation.tailrec
def fibonacci(n : Int) : Int = n match {
  case 0 | 1 => n
  case _ => fibonacci( n-1 ) + fibonacci( n-2 )
}

This returns a compilation error like this:

“could not optimize @tailrec annotated method fibonacci: it contains a recursive call not in tail position”

However, if we think a little more we can get it a function much efficient 🙂

def fibonacci(n: Int): Int = {

  @annotation.tailrec
  def loop(i: Int, previous: Int, current: Int): Int =
    if (i == 0) previous
    else loop(i - 1, current, previous + current)

  loop(n, 0, 1)
}

In this case there will be no compilation error.

28b5865ebe59280d5c3ed18fc1147964309d9d0c81663c0c3f81d42fc5979c8f

And that’s all. And that’s all. And that’s …