In a previous post I described the basic principles of PetitParser and gave some introductory examples. In this blog post I am going to present a way to define more complicated grammars. We continue where we left off the last time, with the expression grammar.
Writing parsers as a script as we did in the previous post can be cumbersome, especially if grammar productions that are mutually recursive and refer to each other in complicated ways. Furthermore a grammar specified in a single script makes it unnecessary hard to reuse specific parts of that grammar. Luckily there is PPCompositeParser
to the rescue.
Defining the Grammar
As an example let’s create a composite parser using the same expression grammar we built in the last blog post:
PPCompositeParser subclass: #ExpressionGrammar
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'
Again we start with the grammar for an integer number. Define the method number
in ExpressionGrammar
as follows:
ExpressionGrammar>>number
^ #digit asParser plus token trim ==> [ :token | token value asNumber ]
Every production in ExpressionGrammar
is specified as a method that returns its parser. Productions refer to each other by reading the respective instance variable of the same name. This is important to be able to create recursive grammars. The instance variables themselves are typically not written to as PetitParser takes care to initialize them for you automatically.
Next we define the productions term
, prod
, and prim
. Contrary to our previous implementation we do not define the production actions yet; and we factor out the parts for addition (add
), multiplication (mul
), and parenthesis (parens
) into separate productions. This will give us better reusability later on. We let Pharo automatically add the necessary instance variables as we refer to them for the first time.
ExpressionGrammar>>term
^ add / prod
ExpressionGrammar>>add
^ prod , $+ asParser trim , term
ExpressionGrammar>>prod
^ mul / prim
ExpressionGrammar>>mul
^ prim , $* asParser trim , prod
ExpressionGrammar>>prim
^ parens / number
ExpressionGrammar>>parens
^ $( asParser trim , term , $) asParser trim
Last but not least we define the starting point of the expression grammar. This is done by overriding start
in the ExpressionGrammar
class:
ExpressionGrammar>>start
^ term end
Instantiating the ExpressionGrammar
gives us an expression parser that returns a default abstract-syntax tree:
parser := ExpressionGrammar new.
parser parse: '1 + 2 * 3'. " --> #(1 $+ #(2 $* 3)) "
parser parse: '(1 + 2) * 3'. " --> #(#($( #(1 $+ 2) $)) $* 3) "
Defining the Evaluator
Now that we have defined a grammar we can reuse this definition to implement an evaluator. To do this we create a subclass of ExpressionGrammar
called ExpressionEvaluator
ExpressionGrammar subclass: #ExpressionEvaluator
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'
and we redefine the implementation of add
, mul
and parens
with our evaluation semantics:
ExpressionEvaluator>>add
^ super add ==> [ :nodes | nodes first + nodes last ]
ExpressionEvaluator>>mul
^ super mul ==> [ :nodes | nodes first * nodes last ]
ExpressionEvaluator>>parens
^ super parens ==> [ :nodes | nodes second ]
The evaluator is now ready to be tested:
parser := ExpressionEvaluator new.
parser parse: '1 + 2 * 3'. " --> 7 "
parser parse: '(1 + 2) * 3'. " --> 9 "
Similarly — as an exercise — a pretty printer can be defined by subclassing ExpressionGrammar
and by redefining a few of its productions:
parser := ExpressionPrinter new.
parser parse: '1+2 *3'. " --> '1 + 2 * 3' "
parser parse: '(1+ 2 )* 3'. " --> '(1 + 2) * 3' "