Scala: One language to rule them all (II)

You wouldn’t put a layman under the controls of a brand new Airbus 380. Such a powerful tool requires its users to be trained; something similar happens with Scala when it is used to cast new rings of power, I mean, DSLs. Some theory and lots of learning by doing.  We’ve already seen a bit of theory and now is time to learn by building a DSL from scratch.

Our DSL’s target

Our brand new DSL is intended to serve as an didactic exercise. However, despite of its purpose, it needs to have a target. That is (or):

  • A process or system to govern.
  • Another language to serve as proxy of.

In our example we’ll tackle the second case.

Introducing AWK

Are you kidding? does it need introduction?
Let your unix terminal introduce it for me:

man awk | head -n 6

GAWK(1) Utility Commands GAWK(1)

NAME
gawk – pattern scanning and processing language

OK, Wikipedia seems a little more verbose:

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

The AWK language is a data-driven scripting language consisting of a set of actions to be taken against streams of textual data – either run directly on files or used as part of a pipeline

Consider these two heroes:

Gnu-and-penguin-color

Gnu has a belt you can’t see in this drawing and there he hides a powerful weapon: AWK.
It is so powerful because it allows many scripts, running  upon (and used to build) GNU/Linux/BSD distributions, to transform, filter, aggregate,… data from other commands outputs. Let’s see an example:

awk_sample

Here, the output generated by lsmod is piped to AWK whereby each line is processed by extracting its second column value which will be accumulated in a total byte counter. This way, when the output has been exhausted, total will be printed as the total amount of Kilo Bytes of memory used by Linux Kernel modules.

A hard nut to crack?

The applications of AWK are innumerable as well as the amount of time you can save once you have a grasp of it. However, for many people, it is more like…

6bkFb7B

… than a helping tool. Its 1475 lines of man pages aren’t precisely light reading.

It seem therefore that guiding an user through the composition of AWK programs could be of great help. Does this phrase ring a bell to you?

DSLs are small and concise which implies that they guide their users in the process of describing actions, entities or relations within their domain of application.

Yes! A Scala internal DSL could be used to build such a tool!

Hands on the DSL Scalawk construction

The most important thing in the programming language is the name…
Donald Erving Knuth

First things first and, if we take the argument from authority as a valid argument, we should start by giving a name to our DSL.

To be brief: We’ll compose AWK programs by the use of a Scala internal DSL,

Scala + AWK = Scalawk

So far so good!

You can already clone Scalawk source code GitHub

git clone https://github.com/pfcoperez/scalawk.git

Divide & Conquer

In the last episode we agreed that the safest way to design a DSL is by the use of the state machine model. These machines are easily decomposed into:

  • States (one of them is the machine initial state).
  • Transitions:
    • Inputs triggering the transition.
    • Transition effects:
      • The machine changes its current state.
      • Some output could be generated besides the state transition.
  • Machine alphabet: Machine input entities.

By drawing the state machine diagram, which is not different from designing an interactive guide, we’ll complete the whole creative process in the creation of the DSL. The remaining work isn’t more than beautiful Scala boilerplate.

machine

All possible users interaction with Scalawk are represented in the previous graph, e.g:

lines splitBy ";" arePresentedAs ('c1, 'c2)

machine_walkthrough

This machine modelling and decomposition leads to the following Scala packages structure:

packages

The nuts & bolts or: How I Learnt to Stop Worrying and Love the Building Blocks that Scala Provides

States, transitions and auxiliary elements are entities contained by the packages listed above. In fact, they are nothing but Scala objects, classes, methods and traits.

Initial State

initial_state

As we already know,  states are objects. Either they are instances of classes or singleton objects. On the other hand, we’ve also seen that the right way to implement state machines is to make these states immutable, being transitions responsible for new states generation.

The initial state ins’t generated by any transition, it exists as it is from the beginning of the program. That is a good indicator of its singleton nature which is definitely confirmed by the fact that no other instance of this initial state can exist:

object lines extends ToCommandWithSeparator

From the DSL user standpoint, this initial state should be just a word indicating the start of a phrase in the language. That’s another reason supporting the singleton object approach.

The initial state need to transit to the next one, that’s is why lines is extending ToCommandWithSeparator . Don’t rush, but keep in mind that ToCommandWithSeparator is transition set trait.

Transitory and final states

Yeah, states are objects… is that all? No!  There are different kinds of states, besides, many states are quite alike and they could be built from templates. Lets review some tricks and tips.

The top-level classification of non-initial states should be this one: Transitory and final. Conceptually, the former can’t be used to generate a result whereas the latter can. In the concrete case of Scalawk that implies that transitory states can’t generate valid AWK programs but final states can.

with_initialprogram_st
Non-final state
solidcomand_st
Final state

In Scalawk, any entity able to generate valid AWK code should mix AwkElement

trait AwkElement {
  def toAwk: String
}

By doing so, we are adding toAwk method to that entity, the entry point to ask for AWK code from client code.

Despite of their differences, almost all states share a common set of attributes from which AWK commands can be composed:

  • Command line options, e.g: Token separator
  • Initial program: Instructions to be run by AWK before starting line processing. e.g: Initialize a counter value.
  • Line program: Instructions to be executed for each line of the AWK input. e.g: Printing a line; adding a value to an accumulator initialized at the Initial Program.
  • End program: Instructions to be executed after all lines have been processed, that is, after Line Program has been run using each single input line as its own input. e.g: Printing counters values.

At each state, these fields might be empty or not and when a final state is asked to produce an AWK program, they will be used to generate the result string value.

abstract class AwkCommand protected(
  private[scalawk] val commandOptions: Seq[String] = Seq.empty,
  private[scalawk] val linePresentation: Seq[AwkExpression] = Seq.empty,
  private[scalawk] val lineProgram: Seq[SideEffectStatement] = Seq.empty,
  private[scalawk] val initialProgram: Seq[SideEffectStatement] = Seq.empty,
  private[scalawk] val endProgram: Seq[SideEffectStatement] = Seq.empty
) {
  def this(prev: AwkCommand) = this(
    prev.commandOptions,
    prev.linePresentation,
    prev.lineProgram,
    prev.initialProgram,
    prev.endProgram
  )
}

So far, we know that non-initial states:

  • For sure, contain the building blocks of a result and propagate the previous state contents for these fields:  Then they should extend AwkCommand abstract class.
  • Most probably, add or change some piece of information from the previous state AwkCommand attributes: Then they should override AwkCommand attributes.
  • Optionally, can transit to another state: If it is the case, they should have a method with the type of the target state as return value or mix  a transition family trait.

You might be thinking: Why is AwkCommand an abstract class and not a trait?
Well, AwkCommand‘s goal is to provide a re-usable code for continuity. That is, it provides the constructor to build a state from another state (prev parameter). This way, states code is reduced to just their transitions and AwkCommand attribute overrides but only for those attributes whose information is changing in the new state.

Obviously, the only way to provide a constructor in a class hierarchy is by providing a class, if this class can’t be instantiated: Make it abstract.

abstract_with_constructor

class CommandWithLineProgram(
                              statements: Seq[SideEffectStatement]
                            )(prev: AwkCommand) extends AwkCommand(prev)
  with ToSolidCommand {

  override private[scalawk]val lineProgram: Seq[SideEffectStatement] = statements

}

CommandWithLineProgram is a non-final state hence it doesn’t mix AwkElement trait.

//This is the first state which can be used to obtain an AWK command string `toAwk`
class SolidCommand(val lineResult: Seq[AwkExpression], prevSt: AwkCommand) extends AwkCommand(prevSt)
  with AwkElement
  with ToCommandWithLastAction {
 ...
 ...
 ...
}

On the contrary, SolidCommand does, therefore needs to provide an implementation to toAwk method:

 // AWK Program sections

// BEGIN
protected def beginBlock: String = programToBlock(initialProgram)

// Per-line
protected def eachLineActionBlock: String =
programToBlock(lineProgram ++ linePresentation.headOption.map(_ => Print(linePresentation)))


// END
protected def endBlock: String = programToBlock(endProgram)

protected def programToBlock(program: Seq[SideEffectStatement]) =
{program.map(_.toAwk) mkString "; "} +
program.headOption.map(_ => "; ").getOrElse("")

protected def optionsBlock: String =
{commandOptions mkString " "} + commandOptions.headOption.map(_ => " ").getOrElse("")

override def toAwk: String =
s"""|awk ${optionsBlock}'
|${identifyBlock("BEGIN", beginBlock)}
|${identifyBlock("", eachLineActionBlock)}
|${identifyBlock("END", endBlock)}'""".stripMargin.replace("\n", "")

//Auxialiary methods
private[this] def identifyBlock(blockName: String, blockAwkCode: String): String =
blockAwkCode.headOption.map(_ => s"$blockName{$blockAwkCode}").getOrElse("")

This class hierarchy enables code re-utilization, for example, SolidCommandWithLastAction is almost an exact copy of SolidCommand and nothing prevents us from extending it in order to define SolidCommandWithLastAction:

class SolidCommandWithLastAction(lastAction: Seq[SideEffectStatement])(prevSt: SolidCommand)
extends SolidCommand(prevSt) {...}

At this point, you should be able to start exploring the repository as well as to associate each state from the graph diagram with a state class in the code. Just in case, the following table collect these associations:

Graph Node

Entity

Is final state?

Entity Kind

init

lines

No

Singleton Object

command

CommandWithSeparator

No

Class

with line program

CommandWithLineProgram

No

Class

with initial program

CommandWithInitialProgram

No

Class

solid command

SolidCommand

Yes

Class

with last action

SolidCommandWithLastAction

Yes

Class

Transitions

Transitions between states are the easy part, they are as simple as methods returning new states. Thanks to Scala infix notation they create the illusion of natural language expressions, at least to some degree…

Some states might share transitions so it seems a good a idea to create traits containing them. By the use of mixing, states can thus use them as LEGO pieces in order to build their own transition set.

There are two special cases which deserve special attention: Groups of transitions and Empty input transitions.

Group of transitions…

… are composed by transitions which are always present together or which are different versions of the same transition. These are normally defined at the same trait named following the pattern To<STATE_CLASS_NAME>.

trait ToCommandWithSeparator {

  def splitBy(separator: String) = new CommandWithSeparator(separator)
  def splitBy(separator: Regex) =  new CommandWithSeparator(separator)

}

The example above is a rather clear case of two versions of the same transition: One receiving a string input and the other receiving a regular expression.

Note that, in relation with the abstract state machine, the machine input is both the method name and its parameters.

Empty input transitions

Consider the following transition, extracted from our state machine:

empty_transition

State machines can move from one state to another when the input is an empty string. It might seem bizarre but it can be done with our object modelling of state machines thanks to implicit conversions.

An implicit conversion from a state (Source) to another (Target) by just trying to access one of Target methods having a Source instance will be perceived by the DSL user as an empty transition. As simply as it sounds.

What is more, by just defining the implicit conversion at the companion object of either the Source or Target class/trait, it will be available in the scope of the sentence where the empty transition occurs. No need of imports which means: ABSOLUTELY TRANSPARENCY for the user.

Thus, the following code:

object ToCommandWithSeparator {
  implicit def toCommandWithSep(x: ToCommandWithSeparator) = new CommandWithSeparator()
}

… enables the transition described in the diagram below:

empty_transition2

VvqNk6o

– If ToCommandWithSeparator is a transition family trait, isn’t its equally named companion object the companion object of that transition family? Didn’t we set that the implicit conversion should be defined within Source or Target‘s companion object and, therefore, within a state class?

– Exactly! And what’s ToCommandWithSeparator‘s fate if not to be mixed at a state class definition?

In Scala, implicit conversions defined at a trait companion object will also apply for those classes extending or mixing that trait and they’ll be available on any scope where that class is available. This feature, besides being extremely useful, seems to be a very rational one: The class mixing/extending the trait can be regarded as a kind-of that trait, a subtype, so any of its instances are also of the type given by the trait and it should be expected that any conversion applying to that type could also apply to these instances.

Take, for example, the traits T and , both having companion objects where implicit conversions to and are respectively defined:

case class E(x: Int)
case class D(x: Int)

trait T
object T { implicit def toD(from: T): D = D(1) }

trait S
object S { implicit def toE(from: S): E = E(2) }

Mix both of them in the definition of the class C

case class C() extends T with S

… and check how a C instance is implicitly converted into instances of E or D.

scala> val c: C = C()
c: C = C()

scala> val d: D = c
d: D = D(1)
scala> val e: E = c
e: E = E(2)

Expressions

Most Scalawk transition inputs fit into the pattern transition name + basic type value. However, some of them receive expressions, identifiers or sentences. These are particular types created to express instructions and values within the AWK program. Hence they should not be part of a generic guide on how to create a DSL. Yet, the constructs behind them are not uncommon in many Scala Internal DSLs so we’ll take a brief look at some of them.

Identifiers in Scalawk (Internal Identifiers)

Some DSL expressions, such as arePresentedAs, need to make reference to AWK variables, declared by some other DSL expression. You could use strings to represent these internal identifiers. But having to surround our DSL identifiers in double quotes throws a bucket of cold water on its user experience, making the user conscious of the fact that she/he is actually using Scala instead of a crafted-from-scratch domain language.

Scala offers a mechanism to get unique objects for equal strings. That is exactly what a good DSL identifier needs.

If anyone writes:

'counter

… her/he will be obtaining a reference to a Symbol instance.  Symbol class has a name attribute whereby you can recover the character string used to obtain the instance.

The user would just write ‘counter and the DSL developer can obtain the string counter and use it for the internal representation of, in this case, the AWK variable.

Sentences

By combining internal identifiers with ad-hoc classes and implicit conversions it isn’t difficult to express assignation sentences, event algebraic operations.

's := 's + 1

This article is already far too long and with what has been hinted and the tricks of previous sections it is actually easy to understand the code to build this kind of expressions. That code is located under entities package.  Take that package as if it were a embedded DSL within Scalawk, yes, a DSL embedded in a DSL which is in turn embedded in Scala.

Some final thoughts

Developing internal DSLs isn’t precisely a piece of cake. If being forced to fall back to the host language constructs, users will easily wake up from the dream of being using a language built for them from the ground up. Nobody likes to be reminded he/her is not that special.

scala_has_you

You’ll encounter many pitfalls when trying to faithfully reproduce the state machine, the temptation to abandon the model is huge. Trust me, this is complex and, if you leave the path of the state machine, the ogres in the forest will have your heart for breakfast sooner than you imagine. Scala has proven it can offer a solution to any bump of the state machine road, outside that road you are free to face a minefield.

As a last piece of advice, I’d recommend you to buy a white/black board, a paper notebook and a good pen… whatever you feel confortable to draw on.

These are two examples of the early stages of design of Scalawk:

whiteboard

notebook

Think! draw! think! and draw again! so you are a better DSL architect than this guy…
BE-21-architect

… so your Neo(s) wouldn’t wake so easily.

Anuncios

Responder

Introduce tus datos o haz clic en un icono para iniciar sesión:

Logo de WordPress.com

Estás comentando usando tu cuenta de WordPress.com. Cerrar sesión / Cambiar )

Imagen de Twitter

Estás comentando usando tu cuenta de Twitter. Cerrar sesión / Cambiar )

Foto de Facebook

Estás comentando usando tu cuenta de Facebook. Cerrar sesión / Cambiar )

Google+ photo

Estás comentando usando tu cuenta de Google+. Cerrar sesión / Cambiar )

Conectando a %s