Parser combinators (briefly explained)

Parser combinators allow you to create complex parsers from simpler ones, by combining them as if they were arithmetic operands.

As very well defined at Wikipedia:

In functional programming, a parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as output, typically a parse tree or a set of indices representing locations in the string where parsing stopped successfully. Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing. This parsing technique is called combinatory parsing.

The arithmetic operators you can use are:

  • a ~ b parse a sequence made of a then b
  • a | b introduce an alternative parser that parses a or b
  • a? introduce an optional parser
  • a* introduce on optional and repeatable parser
  • a+ introduce a repeatable parser
  • a ~> b like ~ but ignore the left member (a)
  • a <~ b like ~ but ignore the right member (b)

And that’s pretty much everything you need to know about parser combinators!
A very simple parser could be:

import scala.util.parsing.combinator._
import scala.util.matching.Regex
val parser = new RegexParsers {
  def mySuperSimpleParser= """\s*_end_$""".r ^^ { _.toString }
}

(which would recognice any string that ends with zero or more blank spaces and _end_, and return that string)

Here, what is enclosed between the curly brackets is a semantic action that is going to be executed.

So basically

a ^^ f

will parse a, and apply function f to the result (semantic action)

To parse a string we would simply do this:

parse(myParser, "my input string           _end_") match {
  case Success(matched,_) =>
    matched match{
      case "hello"=>
        println "hello world"
      case listOfArgs@h::t=> 
        MyParserHelper.executeAction(listOfArgs)
      case _=> 
        println( s"string $matched has been recognized, but I don't know what to do with it" )
    }
  case Failure(msg,_) =>   
    println("FAILURE PARSING STRING " + msg)
}

And will define myParser right now.

So far, we haven’t combined mySuperSimpleParser with anything, so let’s make it a little bit more complicated:
A still simple example coud be somethig like this:

def myParser = 
  (pHelp | pThreeArgs) <~ mySuperSimpleParser

where mySuperSimpleParser would (as we saw before) match any string ending in zero or more blanks plus the “_end_” string at the end of the input, but wouldn’t do anything with it, and would continue on the left side on the input either parsing pHelp or pThreeArgs:
pHelp could be something like this:

def pHelp=  
  "(help|helpMe)".r ^^ { _=> HelpCommand }

HelpCommand is a case class that, for the sake of simplicity we will not define here, but we could also do something like:

def pHelp = 
  "(help|helpMe)".r ^^ { _=>"the recogniced command is help" }

Or even:

def pHelp = 
  "(help|helpMe)".r

We don’t even need to take any semantic action on it if we do not want to.

The only thing left now is the pThreeArgs parser. This parser will recognice a secuence of three words (first one word, then another, and then another) and put the three of them, sequencially into a list, and that list is what will be returned:

def pThreeArgs: Parser[List[Any]] = {
  var paramsCtx=mutable.ListBuffer[String]()
  pListParam(paramsCtx) ~> pListParam(paramsCtx) ~> pListParam(paramsCtx) ^^ {_.toList}
}

Note, that I created a mutable ListBuffer, which will act as a “global” context to each of the three parsers that we are combining, so that they only worry about parsing one parameter and storing it on the global context that they have received as a function parameter (these parsers are functions).
At the end, the whole List, containing the three parameters, will be returned 🙂

Can you think of a better (or more functional way of doing it? 😉 )
pListParam would be something like that:

def pListParam(l:mutable.ListBuffer[String])= 
  """\S+""".r ^^ {l+=_.toString}

For a more complete example, please look here

Anuncios

Responder

Introduce tus datos o haz clic en un icono para iniciar sesión:

Logo de WordPress.com

Estás comentando usando tu cuenta de WordPress.com. Cerrar sesión / Cambiar )

Imagen de Twitter

Estás comentando usando tu cuenta de Twitter. Cerrar sesión / Cambiar )

Foto de Facebook

Estás comentando usando tu cuenta de Facebook. Cerrar sesión / Cambiar )

Google+ photo

Estás comentando usando tu cuenta de Google+. Cerrar sesión / Cambiar )

Conectando a %s