Talking Meta

Meta talk about Smalltalk, Seaside, Magritte, Pier and related things.

Composite Grammars with PetitParser

In a previous post I described the basic principles of PetitParser and gave some introductory examples. In this blog post I am going to present a way to define more complicated grammars. We continue where we left off the last time, with the expression grammar.

Writing parsers as a script as we did in the previous post can be cumbersome, especially if grammar productions that are mutually recursive and refer to each other in complicated ways. Furthermore a grammar specified in a single script makes it unnecessary hard to reuse specific parts of that grammar. Luckily there is PPCompositeParser to the rescue.

Defining the Grammar

As an example let’s create a composite parser using the same expression grammar we built in the last blog post:

PPCompositeParser subclass: #ExpressionGrammar
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'

Again we start with the grammar for an integer number. Define the method number in ExpressionGrammar as follows:

ExpressionGrammar>>number
^ #digit asParser plus token trim ==> [ :token | token value asNumber ]

Every production in ExpressionGrammar is specified as a method that returns its parser. Productions refer to each other by reading the respective instance variable of the same name. This is important to be able to create recursive grammars. The instance variables themselves are typically not written to as PetitParser takes care to initialize them for you automatically.

Next we define the productions term, prod, and prim. Contrary to our previous implementation we do not define the production actions yet; and we factor out the parts for addition (add), multiplication (mul), and parenthesis (parens) into separate productions. This will give us better reusability later on. We let Pharo automatically add the necessary instance variables as we refer to them for the first time.

ExpressionGrammar>>term
^ add / prod
ExpressionGrammar>>add
^ prod , $+ asParser trim , term
ExpressionGrammar>>prod
^ mul / prim
ExpressionGrammar>>mul
^ prim , $* asParser trim , prod
ExpressionGrammar>>prim
^ parens / number
ExpressionGrammar>>parens
^ $( asParser trim , term , $) asParser trim

Last but not least we define the starting point of the expression grammar. This is done by overriding start in the ExpressionGrammar class:

ExpressionGrammar>>start
^ term end

Instantiating the ExpressionGrammar gives us an expression parser that returns a default abstract-syntax tree:

parser := ExpressionGrammar new.
parser parse: '1 + 2 * 3'. " --> #(1 $+ #(2 $* 3)) "
parser parse: '(1 + 2) * 3'. " --> #(#($( #(1 $+ 2) $)) $* 3) "

Defining the Evaluator

Now that we have defined a grammar we can reuse this definition to implement an evaluator. To do this we create a subclass of ExpressionGrammar called ExpressionEvaluator

ExpressionGrammar subclass: #ExpressionEvaluator
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'PetitTutorial'

and we redefine the implementation of add, mul and parens with our evaluation semantics:

ExpressionEvaluator>>add
^ super add ==> [ :nodes | nodes first + nodes last ]
ExpressionEvaluator>>mul
^ super mul ==> [ :nodes | nodes first * nodes last ]

ExpressionEvaluator>>parens
^ super parens ==> [ :nodes | nodes second ]

The evaluator is now ready to be tested:

parser := ExpressionEvaluator new.
parser parse: '1 + 2 * 3'. " --> 7 "
parser parse: '(1 + 2) * 3'. " --> 9 "

Similarly — as an exercise — a pretty printer can be defined by subclassing ExpressionGrammar and by redefining a few of its productions:

parser := ExpressionPrinter new.
parser parse: '1+2 *3'. " --> '1 + 2 * 3' "
parser parse: '(1+ 2 )* 3'. " --> '(1 + 2) * 3' "
Posted by Lukas Renggli at 27 November 2010, 3:14 pm with tags tutorial, petitparser, smalltalk, pharo comment link

PhD Defense Video & Slides

For those that missed my PhD defense, below the video recording of the event. Thanks go to Tudor Gîrba and Jorge Ressia for recording it. The slides can be found on SlideShare.

Posted by Lukas Renggli at 22 October 2010, 7:34 am with tags phd, helvetia, pharo, smalltalk comment link

PhD Defense on Dynamic Language Embedding

I would like to invite to my PhD defense:

Dynamic Language Embedding With Homogeneous Tool Support

Domain-specific languages (DSLs) are increasingly used as embedded languages within general-purpose host languages. DSLs provide a compact, dedicated syntax for specifying parts of an application related to specialized domains. Unfortunately, such language extensions typically do not integrate well with existing development tools. Editors, compilers and debuggers are either unaware of the extensions, or must be adapted at a non-trivial cost. Furthermore, these embedded languages typically conflict with the grammar of the host language and make it difficult to write hybrid code; few mechanisms exist to control the scope and usage of multiple tightly interconnected embedded languages.

In this dissertation we present Helvetia, a novel approach to embed languages into an existing host language by leveraging the underlying representation of the host language used by these tools. We introduce Language Boxes, an approach that offers a simple, modular mechanism to encapsulate (i) compositional changes to the host language, (ii) transformations to address various concerns such as compilation and syntax highlighting, and (iii) scoping rules to control visibility of fine-grained language changes. We describe the design and implementation of Helvetia and Language Boxes, discuss the required infrastructure of a host language enabling language embedding, and validate our approach by case studies that demonstrate different ways to extend or adapt the host language syntax and semantics.

The defense will take place on Wednesday, October 20, 2010 at 16h15 in Room 001, Engehaldenstrasse 8, Bern. For directions please see http://www.iam.unibe.ch/en/contact.

After the defense, there will be an apéro.

Posted by Lukas Renggli at 17 October 2010, 5:31 pm with tags phd, helvetia, pharo, smalltalk 5 comments link

New OmniBrowser Completion Dialog

The latest version of OmniBrowser includes a completion dialog that is used at various places to ease the use of the keyboard. This new dialog is currently used for the following features:

  • Finding Classes (Ctrl+F)
  • Finding Implementors (Ctrl+M) and Senders (Ctrl+N)
  • Creating and Renaming Categories and Protocols
  • Triggering Commands (Ctrl+Shift+T) (new)
  • Selecting Windows (Ctrl+Shift+W) (new)

The following video gives a quick overview of these features:

Posted by Lukas Renggli at 25 September 2010, 6:14 pm with tags omnibrowser, smalltalk 2 comments link

How does Monticello merge?

There is a repeated confusion on how merging in Monticello works. This post tries to clear up the situation. In the following examples we always start from the same situation displayed below. P is the name of the package we are working with and the number behind P.1 signifies a specific version of that package. The Working Copy is the code we currently have in our Smalltalk image.

 P.0 --- P.1 --- P.2 --- P.3 --- Working Copy
              \- P.6 --- P.7

This shows us that we currently have all the changes from P.0, P.1, P.2 and P.3 in our image. P.6 and P.7 is a separate branch that we do not have in our image.

Load

What happens if we load P.7? If there are any changes in our Working Copy we lose those changes after a warning. Afterwards P.7 is loaded and a new Working Copy is ready to be changed. After this load operation we end up with the following situation:

 P.0 --- P.1 --- P.2 --- P.3 
              \- P.6 --- P.7 --- Working Copy

Merge

What happens if we merge P.7? First of all Monticello tries to figure out the closest common ancestor of the Working Copy and P.7. In our examples this is obviously P.1.

Before performing the actual merge Monticello needs to load the code in P.1 into memory. This can cause an error if P.1 has been moved or deleted from its original repository. If you run into troubles, you need to make sure that P.1 is in a repository known to Monticello, otherwise an automatic merge cannot be performed.

To perform the actual merge Monticello calculates the delta between the common ancestor and the two versions to be merged:

  • D.1 is the delta from P.1 to Working Copy. D.1 is also called the local change, because it is the delta from the common ancestor to the working copy.
  • D.2 is the delta from P.1 to P.7. D.2 is also called the remote change, because it is the delta from the common ancestor to the version you merge.

Monticello then inspects the two deltas D.1 and D.2. Source elements (class definitions, method definitions, variable definitions, etc.) that are added, removed or changed in both deltas are a conflict. In the merge browser the user needs to resolve these conflicts before proceeding:

  • The Keep button chooses the remote change from D.2 and ignores the change in D.1.
  • The Reject button chooses the local change from D.1 and ignores the change in D.2.

Next Monticello creates a patch-set of D.2 with the conflicting elements resolved as specified by the user. It then applies this patch-set onto the working copy and adopts the ancestry as follows:

 P.0 --- P.1 --- P.2 --- P.3 --- Working Copy
              \- P.6 --- P.7 -/

The Working Copy has now two ancestors P.3 and P.7. In most cases the user will then commit the merged version to the repository. Afterwards the ancestry looks like this:

 P.0 --- P.1 --- P.2 --- P.3 --- P.8 --- Working Copy
              \- P.6 --- P.7 -/

The merge operation performed by Monticello is called a three-way-merge, because it takes three versions into account: the working copy, the version to merge, and the common ancestor of the two versions.

Manual Merge

What happens if we start from the same situation, but the version of the closest common ancestor P.1 cannot be found. Being able to load P.0 doesn’t help here, because that would cause all the changes from P.0 to P.1 to conflict. Furthermore, if any of the changes between P.0 to P.1 were undone in either of the branches these changes would be lost.

What can we do if the common ancestor is missing? The simples thing is to do a manual merge. This means we calculate the changes from the Working Copy to P.7. This gives us a combined delta D.3 that undoes the changes from P.1 to Working Copy and that adds the changes from P.1 to P.7. Unfortunately Monticello cannot tell us from which branch these changes are coming from, so we have to go through D.3 one-by-one and apply the changes that we think belong to the delta of P.1 to P.7. At the end of the manual merge we can tell Monticello to adopt the version P.7, so that the ancestry looks again like this:

 P.0 --- P.1 --- P.2 --- P.3 --- Working Copy
              \- P.6 --- P.7 -/
Posted by Lukas Renggli at 24 March 2010, 9:16 pm with tags monticello, pharo, smalltalk comment link
<< 1 2 3 4 5 6 7 8 9 10 >>