Composite Grammars with PetitParser

In a previous post I described the basic principles of PetitParser and gave some introductory examples. In this blog post I am going to present a way to define more complicated grammars. We continue where we left off the last time, with the expression grammar.

Writing parsers as a script as we did in the previous post can be cumbersome, especially if grammar productions that are mutually recursive and refer to each other in complicated ways. Furthermore a grammar specified in a single script makes it unnecessary hard to reuse specific parts of that grammar. Luckily there is PPCompositeParser to the rescue.

Defining the Grammar

As an example let’s create a composite parser using the same expression grammar we built in the last blog post:

PPCompositeParser subclass: #ExpressionGrammar
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'

Again we start with the grammar for an integer number. Define the method number in ExpressionGrammar as follows:

ExpressionGrammar>>number
^ #digit asParser plus token trim ==> [ :token | token value asNumber ]

Every production in ExpressionGrammar is specified as a method that returns its parser. Productions refer to each other by reading the respective instance variable of the same name. This is important to be able to create recursive grammars. The instance variables themselves are typically not written to as PetitParser takes care to initialize them for you automatically.

Next we define the productions term, prod, and prim. Contrary to our previous implementation we do not define the production actions yet; and we factor out the parts for addition (add), multiplication (mul), and parenthesis (parens) into separate productions. This will give us better reusability later on. We let Pharo automatically add the necessary instance variables as we refer to them for the first time.

ExpressionGrammar>>term
^ add / prod
ExpressionGrammar>>add
^ prod , $+ asParser trim , term
ExpressionGrammar>>prod
^ mul / prim
ExpressionGrammar>>mul
^ prim , $* asParser trim , prod
ExpressionGrammar>>prim
^ parens / number
ExpressionGrammar>>parens
^ $( asParser trim , term , $) asParser trim

Last but not least we define the starting point of the expression grammar. This is done by overriding start in the ExpressionGrammar class:

ExpressionGrammar>>start
^ term end

Instantiating the ExpressionGrammar gives us an expression parser that returns a default abstract-syntax tree:

parser := ExpressionGrammar new.
parser parse: '1 + 2 * 3'. " --> #(1 $+ #(2 $* 3)) "
parser parse: '(1 + 2) * 3'. " --> #(#($( #(1 $+ 2) $)) $* 3) "

Defining the Evaluator

Now that we have defined a grammar we can reuse this definition to implement an evaluator. To do this we create a subclass of ExpressionGrammar called ExpressionEvaluator

ExpressionGrammar subclass: #ExpressionEvaluator
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'

and we redefine the implementation of add, mul and parens with our evaluation semantics:

ExpressionEvaluator>>add
^ super add ==> [ :nodes | nodes first + nodes last ]
ExpressionEvaluator>>mul
^ super mul ==> [ :nodes | nodes first * nodes last ]

ExpressionEvaluator>>parens
^ super parens ==> [ :nodes | nodes second ]

The evaluator is now ready to be tested:

parser := ExpressionEvaluator new.
parser parse: '1 + 2 * 3'. " --> 7 "
parser parse: '(1 + 2) * 3'. " --> 9 "

Similarly — as an exercise — a pretty printer can be defined by subclassing ExpressionGrammar and by redefining a few of its productions:

parser := ExpressionPrinter new.
parser parse: '1+2 *3'. " --> '1 + 2 * 3' "
parser parse: '(1+ 2 )* 3'. " --> '(1 + 2) * 3' "
Posted by Lukas Renggli at 27 November 2010, 3:14 pm with tags tutorial, petitparser, smalltalk, pharo link