Scala: Code interpretation at runtime

With today’s post, we’ll dive into Scala code generation on the fly: at runtime. We have to be cautious of not mixing concepts with Scala macros, which generate code at compile-time. These make use of Scala type system, which is much safer than generating code at runtime.

House-of-cards-but-why

When is it useful to make use of this mechanism then? We’ll try to shed light on this by following a very simple example, getting abstract of real implementation (that you may find at Scalera’s Github).

The problem: «da» serializer

Let’s suppose a not-so-wild case, which consists on communicating two services via an event bus (we’ll get abstract of its implementation: it could be a message queue like Kafka, Akka streams, …).

The main idea is the following:

Sender-receiver-schema

The producer knows how to send and the consumer has an associated callback for message arrivals:

trait Producer{
  def produce(message: Any): Try[Unit]
}
trait Consumer{
  val consume: Any => Unit
}

Both producer and consumer services know the message types that may arrive. In our example, they could be one of these:

case class Foo(att1: Int, att2: String)
case class Bar(att1: String)

If the producer wants to send a message using the event bus, it will have to serialize it somehow (JSON, Byte array, XML, …) so, by the time it reaches the opposite end, the consumer will start the inverse process (deserialization) and will get the original message.

…nothing weird so far.

Whydoeseverythinghavetobesocomplicated

If we have a JSON serializer …

trait JsonSer[T] {
  def serialize(t: T): String
  def deserialize(json: String): T
}

and we serialize our message …

implicit val fooSerializer: JsonSer[Foo] = ???
val foo: Foo = ???
producer.send(implicitly[JsonSer[Foo]].serialize(foo))

How do we know which deserializer to use when the consumer gets the message?

Option 1: Try every possible serializer until one of them works

In our consumer, we’d have:

lazy val consumer = new Consumer {
  override val consume: Any => Unit = {
    case message: String =>
      Seq(barSerializer, fooSerializer).flatMap { ser =>
        Try(ser.deserialize(message)).toOption
      }.headOption.fold(ifEmpty = println("Couldn't deserialize")) {
        case bar: Bar => println("it's a bar!")
        case foo: Foo => println("it's a foo!")
        case _ => println("it's ... something!")
      }
  }
}

A lil’ bit coarse, right? If the proper serializer is the last of a 100 list, we would have tried and failed with 99 serializers before (Such a waste of CPU!).

4d8

Besides, we could also consider the case of having a different type deserializer, but it fits with the received message, so it would partially or wrongly deserialize the message.

Option 2: Add the message type to the message itself

We could add an extra layer wrapping the message for indicating the message type that it contains. This way, when receiving it, we could determine the serializer type we have to use.

MessageWrapper

For writing the type, we’ll make use of Scala’s TypeTags, getting info about the T contained message type.


//Wrapper for the message (and its serializer)

import scala.reflect.runtime.universe.{TypeTag, typeTag}

case class Message[T: TypeTag](content: T){
  val messageType: Message.Type = typeTag[T].tpe.toString
}
object Message {

  type Type = String

  def typeFrom(msg: String): Message.Type = ???

  implicit def messageSer[T:TypeTag:JsonSer]: JsonSer[Message[T]] = ???

}

//We'll make use of it for sending

producer.produce(messageSer[Foo].serialize(Message(foo)))

//And we redefine the consumer

lazy val consumer = new Consumer {
    override val consume: Any => Unit = {
      case message: String =>
        Message.typeFrom(message) match {

          case "org.scalera.reflect.runtime.Bar" =>
            println("it's a bar!")
            val value = messageSer[Bar].deserialize(message).content
            println(value.att1)

          case "org.scalera.reflect.runtime.Foo" =>
            val value = messageSer[Foo].deserialize(message).content
            println("it's a foo!")
            println(value.att2)

          case _ =>
            println("it's ... something!")
        }
    }
  }

As you can see, we don’t have to try every possible serializer anymore. Instead of that, from the message type we’ve extracted from the Message wrapper we have added, we’re able to use the proper deserializer.

But it is also true that we have to add an extra case for each string that represents the message type. Wouldn’t it be nice to deserialize somehow and to have the defined case only for the already deserialized objects? (Something similar what we first tried but without trying all posible serializers). Something like:

lazy val consumer = new Consumer {
  override val consume: Any => Unit = { msg =>
    genericDeserialize(msg) match {
      case f: Foo =>
      case b: Bar =>
      case _ =>
    }
  }
}

Option 2-cool: Serializers ‘under the hood’

For achieving something similar, we have to get focused on that genericDeserialize method: Which signature should it have? Initially, something like this:

def genericDeserialize(msg: String): Any

An Any? Seriously? My fellows, at runtime, we have no idea about the type we can get. We just know that, from a String, we’ll get ‘some….thing’. The match that applies to that Any will allow us to go from something totally abstract to more concrete types.

At this point is where both reflect library and Scala compiler appear.

reflect.Toolbox

The Toolbox API allows parsing strings and getting the resulting AST (abstract syntax tree). From that AST, it is able to evaluate the expression and returns an instance of an Any as well.

For instantiating a Toolbox and use type references, we hace to add as SBT dependencies the following:

libraryDependencies ++= Seq(
  "org.scala-lang" % "scala-compiler" % "2.11.8",
  "org.scala-lang" % "scala-reflect" % "2.11.8")

For example, if we wanted to parse the "2".toInt + 4 operation,

import scala.tools.reflect.ToolBox
import scala.reflect.runtime.{universe => ru}
import ru._

//  Scala compiler tool box
val tb = ru.runtimeMirror(
  this.getClass.getClassLoader).mkToolBox()

println(ru.showRaw(tb.parse("2".toInt + 4")))

we would get the abstract syntax tree generated as a String (by using showRaw):

Apply(Select(Select(Literal(Constant("2")), TermName("toInt")), TermName("$plus")), List(Literal(Constant(4))))

If we use the toolbox for evaluating the parsed expression,

println(tb.eval(tb.parse("2".toInt + 4")))

we’ll get an Any that represents the resulting value of the sum:

6

«Da» serializer

Once we’ve seen how it generally works, we apply the same principle to our serializer, so the expression we’re going to try to interpret is similar to:

{
  import scala.reflect._;
  import spray.json._;
  import org.scalera.reflect.runtime._;
  import MySprayJsonImplicits._;
  import MyJsonSerImplicits._;

  implicitly[JsonSer[Message[$messageType]]]
}

where MySprayJsonImplicits and MyJsonSerImplicits represent the objects that contain both the Spray implicits for JsonFormat and the JsonSer implicits that we have defined before.

$messageType represents the concrete type to deserialize that we would have got by using the TypeTag (as seen before).

If we adapt it to our code, we’ll get something similar to:

object GenSer {

  import scala.tools.reflect.ToolBox
  import scala.reflect.runtime.{universe => ru}
  import ru._

  //  Scala compiler tool box
  private val tb = ru.runtimeMirror(this.getClass.getClassLoader).mkToolBox()

  def genericDeserialize(msg: String)(serContainers: Seq[AnyRef]): Any = {

    val messageType = Message.typeFrom(msg)

    val serContainersImport = serContainers.map(container =>
      "import " + container.toString.split("\\$").head + "._").mkString(";\n")

    val expr =
      s"
         |{
         |  import scala.reflect._;
         |  import spray.json._;
         |  import org.scalera.reflect.runtime._;
         |  $serContainersImport;
         |
         |  implicitly[JsonSer[Message[$messageType]]]
         |}
        ".stripMargin

    tb.eval(tb.parse(expr))
      .asInstanceOf[JsonSer[Message[Any]]]
      .deserialize(msg).content
  }

}

If you take a look, we’ve empowered the generic deserialization method notation to hold a sequence of objects to import, so we won’t make explicit which object contains the Spray implicits and which one contains our JsonSer‘s.

val serContainersImport = serContainers.map(container =>
  "import " + container.toString.split("\\$").head + "._").mkString(";\n")

It is also noteworthy that, when deserializing, we get a Message[Any]; so we’ll have to get the ‘content’ field that represents the raw value of Any type that holds the deserialized message.

The result

So finally, we can now make use of our function for generically deserialize and let our consumer code be ‘swaggy’:

lazy val consumer = new Consumer {
  override val consume: Any => Unit = { 
    case msg: String =>
      genericDeserialize(msg)(Seq(case3,Message)) match {
        case bar: Bar => println("it's a bar!")
        case foo: Foo => println("it's a foo!")
        case _ => println("it's ... something!")
      }
  }
}

Conclusions

Being able to evaluate code at runtime let us do a lot of interesting things when we want to interpret String based types. However, these kind of techniques won’t care about the so worthy and safe Scala type system, so these are tools to be used with a lil’ bit of care.

It is also kind of expensive in time terms to evaluate these expressions. I would recommend to use a type cache in this case. Something really simple like a map:

type TypeName = String
var cache: Map[TypeName, JsonSer[Message[_]]]

And when invoking the method, we’ll check if the type evidence (JsonSer) we’re looking for is already stored in the map. If so, we’ll use it, otherwise, we’ll create and store it in the cache, using it as result.

post-28553-Steve-Jobs-mind-blown-gif-HD-T-pVbd

Easy peasy…
Peace out!

5 comentarios en “Scala: Code interpretation at runtime

    • Hi Mario. First of all, thanks for reading!
      In this post we just focused on the mkToolBox but you’re totally right.
      A malicious message type like

      «com.my.company.MessageType1;while(true){/* Let’s keep this thread busy …*/};implicitly[JsonSer[Message[com.my.company.MessageType1»

      could be used. For avoiding that, I would include some previous validation on the message type (not containing semi-colons, CRs, …)

      Thanks for your comment 🙂

      Me gusta

Deja un comentario