In a previous post I described the basic principles of PetitParser and gave some introductory examples. In this blog post I am going to present a way to define more complicated grammars. We continue where we left off the last time, with the expression grammar.
Writing parsers as a script as we did in the previous post can be cumbersome, especially if grammar productions that are mutually recursive and refer to each other in complicated ways. Furthermore a grammar specified in a single script makes it unnecessary hard to reuse specific parts of that grammar. Luckily there is
PPCompositeParser to the rescue.
As an example let’s create a composite parser using the same expression grammar we built in the last blog post:
PPCompositeParser subclass: #ExpressionGrammar
Again we start with the grammar for an integer number. Define the method
ExpressionGrammar as follows:
^ #digit asParser plus token trim ==> [ :token | token value asNumber ]
Every production in
ExpressionGrammar is specified as a method that returns its parser. Productions refer to each other by reading the respective instance variable of the same name. This is important to be able to create recursive grammars. The instance variables themselves are typically not written to as PetitParser takes care to initialize them for you automatically.
Next we define the productions
prim. Contrary to our previous implementation we do not define the production actions yet; and we factor out the parts for addition (
add), multiplication (
mul), and parenthesis (
parens) into separate productions. This will give us better reusability later on. We let Pharo automatically add the necessary instance variables as we refer to them for the first time.
^ add / prod
^ prod , $+ asParser trim , term
^ mul / prim
^ prim , $* asParser trim , prod
^ parens / number
^ $( asParser trim , term , $) asParser trim
Last but not least we define the starting point of the expression grammar. This is done by overriding
start in the
^ term end
ExpressionGrammar gives us an expression parser that returns a default abstract-syntax tree:
parser := ExpressionGrammar new.
parser parse: '1 + 2 * 3'. " --> #(1 $+ #(2 $* 3)) "
parser parse: '(1 + 2) * 3'. " --> #(#($( #(1 $+ 2) $)) $* 3) "
Now that we have defined a grammar we can reuse this definition to implement an evaluator. To do this we create a subclass of
ExpressionGrammar subclass: #ExpressionEvaluator
and we redefine the implementation of
parens with our evaluation semantics:
^ super add ==> [ :nodes | nodes first + nodes last ]
^ super mul ==> [ :nodes | nodes first * nodes last ]
^ super parens ==> [ :nodes | nodes second ]
The evaluator is now ready to be tested:
parser := ExpressionEvaluator new.
parser parse: '1 + 2 * 3'. " --> 7 "
parser parse: '(1 + 2) * 3'. " --> 9 "
Similarly — as an exercise — a pretty printer can be defined by subclassing
ExpressionGrammar and by redefining a few of its productions:
parser := ExpressionPrinter new.
parser parse: '1+2 *3'. " --> '1 + 2 * 3' "
parser parse: '(1+ 2 )* 3'. " --> '(1 + 2) * 3' "