Meta talk about Smalltalk, Seaside, Magritte, Pier and related things.

Composite Grammars with PetitParser

In a previous post I described the basic principles of PetitParser and gave some introductory examples. In this blog post I am going to present a way to define more complicated grammars. We continue where we left off the last time, with the expression grammar.

Writing parsers as a script as we did in the previous post can be cumbersome, especially if grammar productions are mutually recursive and refer to each other in complicated ways. Furthermore a grammar specified in a single script makes it unnecessarily hard to reuse specific parts of that grammar. Luckily there is PPCompositeParser to the rescue.

Defining the Grammar

As an example let’s create a composite parser using the same expression grammar we built in the last blog post:

PPCompositeParser subclass: #ExpressionGrammar
   instanceVariableNames: ''
   classVariableNames: ''
   poolDictionaries: ''
   category: 'PetitTutorial'

Again we start with the grammar for an integer number. Define the method number in ExpressionGrammar as follows:

ExpressionGrammar>>number
   ^ #digit asParser plus token trim ==> [ :token | token value asNumber ]

Every production in ExpressionGrammar is specified as a method that returns its parser. Productions refer to each other by reading the respective instance variable of the same name. This is important to be able to create recursive grammars. The instance variables themselves are typically not written to as PetitParser takes care to initialize them for you automatically.

Next we define the productions term, prod, and prim. Contrary to our previous implementation we do not define the production actions yet, and we factor out the parts for addition (add), multiplication (mul), and parentheses (parens) into separate productions. This will give us better reusability later on. We let Pharo automatically add the necessary instance variables as we refer to them for the first time.

ExpressionGrammar>>term
   ^ add / prod

ExpressionGrammar>>add
   ^ prod , $+ asParser trim , term

ExpressionGrammar>>prod
   ^ mul / prim

ExpressionGrammar>>mul
   ^ prim , $* asParser trim , prod

ExpressionGrammar>>prim
   ^ parens / number

ExpressionGrammar>>parens
   ^ $( asParser trim , term , $) asParser trim

Last but not least we define the starting point of the expression grammar. This is done by overriding start in the ExpressionGrammar class:

ExpressionGrammar>>start
   ^ term end

Instantiating the ExpressionGrammar gives us an expression parser that returns a default abstract-syntax tree:

parser := ExpressionGrammar new.
parser parse: '1 + 2 * 3'.       " --> #(1 $+ #(2 $* 3)) "
parser parse: '(1 + 2) * 3'.     " --> #(#($( #(1 $+ 2) $)) $* 3) "

Defining the Evaluator

Now that we have defined a grammar we can reuse this definition to implement an evaluator. To do this we create a subclass of ExpressionGrammar called ExpressionEvaluator

ExpressionGrammar subclass: #ExpressionEvaluator
   instanceVariableNames: ''
   classVariableNames: ''
   poolDictionaries: ''
   category: 'PetitTutorial'

and we redefine the implementation of add, mul and parens with our evaluation semantics:

ExpressionEvaluator>>add
   ^ super add ==> [ :nodes | nodes first + nodes last ]

ExpressionEvaluator>>mul
   ^ super mul ==> [ :nodes | nodes first * nodes last ]

ExpressionEvaluator>>parens
   ^ super parens ==> [ :nodes | nodes second ]

The evaluator is now ready to be tested:

parser := ExpressionEvaluator new.
parser parse: '1 + 2 * 3'.       " --> 7 "
parser parse: '(1 + 2) * 3'.     " --> 9 "

Similarly — as an exercise — a pretty printer can be defined by subclassing ExpressionGrammar and by redefining a few of its productions:

parser := ExpressionPrinter new.
parser parse: '1+2 *3'.          " --> '1 + 2 * 3' "
parser parse: '(1+ 2 )* 3'.      " --> '(1 + 2) * 3' "
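One possible solution to this exercise is sketched below. It assumes that ExpressionPrinter subclasses ExpressionGrammar and that every production answers a string; the method bodies are an illustration and not part of the original grammar:

```smalltalk
ExpressionGrammar subclass: #ExpressionPrinter
   instanceVariableNames: ''
   classVariableNames: ''
   poolDictionaries: ''
   category: 'PetitTutorial'

"Turn the parsed integer back into its printed form."
ExpressionPrinter>>number
   ^ super number ==> [ :value | value printString ]

"Join both operands with a normalized operator."
ExpressionPrinter>>add
   ^ super add ==> [ :nodes | nodes first , ' + ' , nodes last ]

ExpressionPrinter>>mul
   ^ super mul ==> [ :nodes | nodes first , ' * ' , nodes last ]

"Keep the parentheses, but drop any whitespace around them."
ExpressionPrinter>>parens
   ^ super parens ==> [ :nodes | '(' , nodes second , ')' ]
```

As in the evaluator, only the actions change; the grammar itself is inherited untouched.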

Posted by Lukas Renggli at 27 November 2010, 3:14 pm with tags tutorial, petitparser, smalltalk, pharo

More Filesystems

In my previous post I mentioned that the Filesystem library can work on different kinds of filesystems. In this post I am going to walk you through the supported filesystems one by one.

Before you proceed with this hands-on blog post, please update to a patched and unofficial version of the Filesystem package. In the meantime some bugs and minor issues got fixed that are not integrated into the official release yet.

 Gofer new
    renggli: 'fs';
    package: 'Filesystem';
    load.

Disk Filesystem

The Disk Filesystem is implemented in FSDiskFilesystem and its platform specific subclasses. As we have seen in the last post the singleton filesystem instance for the current platform can be retrieved using:

 disk := FSDiskFilesystem current.

Subclasses of FSFilesystem implement many methods, but we should refrain from calling most of them directly. These methods implement the low-level behavior of the respective filesystems and are private to the framework.

As we have learned in the previous post we should only work with references. Filesystem instances know two methods that return an FSReference object. This is true not only for the disk filesystem, but also for all other filesystem types presented later on:

 disk root.                               " a reference to the root directory "
 disk working.                            " a reference to the working directory "

Given a reference we can navigate to another reference in the filesystem with the method #resolve:. Resolving works similarly to the command cd (change directory) on Unix and Windows and returns a new reference to a file or directory. Print the result of evaluating the following expressions:

 disk working resolve: '/'.               " the same as 'disk root' "
 disk working resolve: '.'.               " the same as 'disk working' "
 disk working resolve: '/home/renggli'.   " an absolute path to a directory or file "
 disk working resolve: '../bar'.          " a relative path from the working directory "

Note that the message #resolve: is also understood by FSFilesystem itself. Do not call this method though: it is private and does not return an FSReference as you might expect.

Memory Filesystem

The memory filesystem is another simple filesystem type. Think of it as a virtual in-memory filesystem, very much like a RAM disk that lives in your Smalltalk image. The use of a memory filesystem can be very convenient for testing your filesystem code, because it does not pollute your hard disk and is garbage collected as soon as you do not reference it any longer. To instantiate a memory filesystem evaluate:

 memory := FSMemoryFilesystem new.

On this filesystem you can do everything you learned before. A memory filesystem is initially empty:

 memory root children size.               " --> 0 "

To create a file we can use the same techniques we learned previously:

 (memory root / 'foo.txt')
     writeStreamDo: [ :stream | stream nextPutAll: 'Hey Memory' ].
 (memory root / 'foo.txt') exists.        " --> true "

We can also copy files from a different filesystem to our memory filesystem:

 cache := disk working / 'package-cache'.
 cache copyAllTo: memory root.

The above code copies all the files in the package cache of your Pharo installation to the memory filesystem. Before you try it out make sure that you don’t have your MP3 collection in that directory, otherwise your image might blow up.

In my case I now have 64 files in the memory filesystem:

 memory root children size.               " --> 64 "

As you would expect we can perform other operations on our virtual filesystem, for example delete all the files that start with the letter F:

 memory root children do: [ :reference |
    reference basename first = $F
       ifTrue: [ reference delete ] ].

As you see, there is nothing special about a memory filesystem. It behaves the same way and understands exactly the same messages as the disk filesystem does.
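To illustrate why this is handy for tests, here is a minimal round-trip sketch that never touches the hard disk. It only uses messages shown in these posts; the file name greeting.txt and the variable names are made up for the example:

```smalltalk
"A fresh, empty filesystem that lives only in the image."
scratch := FSMemoryFilesystem new.
file := scratch root / 'greeting.txt'.

"Write some contents and read them back through the same reference."
file writeStreamDo: [ :stream | stream nextPutAll: 'Hey Memory' ].
contents := file readStreamDo: [ :stream | stream contents ].

"Clean up explicitly, although dropping the reference to scratch would do as well."
file delete.
```

Inspect contents after evaluating the snippet to see what was read back.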

ZIP Filesystem

The ZIP filesystem represents a ZIP archive that resides on another filesystem. To create a new archive instantiate the ZIP filesystem with a reference to a ZIP archive:

 zip := FSZipFilesystem atReference: disk working / 'cache.zip'.

Contrary to other filesystems a ZIP filesystem needs to be opened (and closed) explicitly:

 zip open.

Apart from that, the ZIP filesystem behaves exactly the same way as the other filesystems we learned up to now. To copy the contents of the memory filesystem to the ZIP archive we can evaluate the following code:

 memory root copyAllTo: zip root.

To enumerate the contents we use:

 zip root children
    do: [ :reference | Transcript show: reference basename; cr ].

To flush the ZIP archive to the underlying filesystem we simply close it:

 zip close.

This is a convenient way to access archives. Again your code does not have to worry about the details of this particular filesystem, but transparently accesses and modifies it using references.
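Because the archive is only flushed when it is closed, it can make sense to guard the work with the standard #ensure: message. The following is a sketch under the assumption that disk and memory are the filesystems created earlier in this post:

```smalltalk
zip := FSZipFilesystem atReference: disk working / 'cache.zip'.
zip open.

"The ensure block runs even if copying signals an error,
 so the archive does not stay open by accident."
[ memory root copyAllTo: zip root ]
   ensure: [ zip close ].
```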

cURL Filesystem

The cURL filesystem is an experimental extension to the Filesystem framework. It uses the cURL plugin written by Danil Osipchuk to work with filesystems that can be accessed through FTP, FTPS, HTTP, HTTPS, SCP, SFTP, and TFTP.

First, we need to load the extension packages:

 Gofer new
    renggli: 'fs';
    package: 'Curl';
    package: 'FS-Curl';
    load.

Note that the cURL filesystem also requires the latest version of the CurlPlugin. Make sure that your VM is up-to-date before you proceed:

 Curl curlVersion.                        " --> 'libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3' "

What about downloading the latest cURL plugin for the Mac VM from within Pharo? To do this we can connect to the directory with the latest experimental code of John McIntosh:

 ftp := FSCurlFilesystem url: 'ftp://ftp.smalltalkconsulting.com/experimental'.

With the resulting filesystem you can do all things you already know. If you are not authenticated however, it is unlikely that you are allowed to write (or upload) to the server. Note that currently enumerating the contents of a directory only works for FTP and SFTP servers. Due to limitations of the CurlPlugin it is furthermore not possible to create directories, delete or rename files. Hopefully that will be fixed sometime soon in the plugin code.

 ftp working children.

To download the curl plugin and save it to your hard disk you can use:

 ftp working / 'CurlPlugin.1.1.0.bundle.zip' copyTo: disk working / 'CurlPlugin.1.1.0.bundle.zip'.

Unpacking the zip archive should be a breeze.

Posted by Lukas Renggli at 9 March 2010, 10:08 pm with tags filesystem, pharo, smalltalk, tutorial

Writing Parsers with PetitParser

After the announcement in the Moose mailing list and after various people have asked me to provide some introduction to PetitParser I decided to write a short tutorial.

I originally wrote PetitParser as part of my work on the Helvetia system. PetitParser is a parsing framework that differs from many other popular parser generators. For example, it is not table based such as SmaCC or ANTLR. Instead it uses a unique combination of four alternative parser methodologies: scannerless parsers, parser combinators, parsing expression grammars and packrat parsers. As such PetitParser is more powerful in what it can parse and it arguably fits the dynamic nature of Smalltalk better. Let’s have a quick look at these four parser methodologies:

Scannerless Parsers combine what is usually done by two independent tools (scanner and parser) into one. This makes writing a grammar much simpler and avoids common problems when grammars are composed.
Parser Combinators are building blocks for parsers modeled as a graph of composable objects; they are modular and maintainable, and can be changed, recomposed, transformed and reflected upon.
Parsing Expression Grammars (PEGs) provide ordered choice. Unlike in parser combinators, the ordered choice of PEGs always follows the first matching alternative and ignores other alternatives. Valid input always results in exactly one parse-tree; the result of a parse is never ambiguous.
Packrat Parsers give linear parse time guarantees and avoid common problems with left-recursion in PEGs.

Loading PetitParser

Enough theory, let’s get started. PetitParser is developed in Pharo, but is also available on other Smalltalk platforms. A ready made image can be downloaded here. To load PetitParser into an existing image evaluate the following Gofer expression:

Gofer new
   renggli: 'petit'; 
   package: 'PetitParser';
   package: 'PetitTests';
   load.

There are other packages in the same repository that provide additional features, for example PetitSmalltalk is a Smalltalk grammar, PetitXml is an XML grammar, PetitJson is a JSON grammar, PetitAnalyzer provides functionality to analyze and transform grammars, and PetitGui is a Glamour IDE for writing complex grammars. We are not going to use any of these packages for now.

More information on how to get PetitParser can be found on the website of the project.

Writing a Simple Grammar

Writing grammars with PetitParser is as simple as writing Smalltalk code. For example, a grammar that parses identifiers starting with a letter followed by zero or more letters or digits is defined as follows. In a workspace we evaluate:

identifier := #letter asParser , #word asParser star.

If you inspect the object identifier you’ll notice that it is an instance of PPSequenceParser. This is because the #, operator created a sequence of a letter parser and a parser for zero or more word characters. If you dive further into the object you notice the following simple composition of different parser objects:

PPSequenceParser (this parser accepts a sequence of parsers)
    PPPredicateObjectParser (this parser accepts a single letter)
    PPRepeatingParser (this parser accepts zero or more instances of another parser)
       PPPredicateObjectParser (this parser accepts a single word character)

Parsing Some Input

To actually parse a string (or stream) we can use the method #parse::

identifier parse: 'yeah'.          " --> #($y #($e $a $h)) "
identifier parse: 'f12'.           " --> #($f #($1 $2)) "

While it seems odd to get these nested arrays with characters as a return value, this is the default decomposition of the input into a parse tree. We’ll see in a while how that can be customized.

If we try to parse something invalid we get an instance of PPFailure as an answer:

identifier parse: '123'.           " --> letter expected at 0 "

Instances of PPFailure are the only objects in the system that answer with true when you send the message #isPetitFailure. Alternatively you can also use #parse:onError: to throw an exception in case of an error:

identifier
   parse: '123'
   onError: [ :msg :pos | self error: msg ].
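The first of the two approaches above, testing the answer with #isPetitFailure, could look like the following sketch. The variable name result and the Transcript output are only there for illustration:

```smalltalk
identifier := #letter asParser , #word asParser star.
result := identifier parse: '123'.

"A PPFailure is an ordinary answer, so we have to check for it explicitly."
result isPetitFailure
   ifTrue: [ Transcript show: 'Failed: ' , result printString; cr ]
   ifFalse: [ Transcript show: 'Parsed: ' , result printString; cr ].
```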

If you are only interested in whether a given string (or stream) matches or not you can use the following constructs:

identifier matches: 'foo'.         " --> true "
identifier matches: '123'.         " --> false "

Furthermore to find all matches in a given input string (or stream) you can use:

identifier matchesIn: 'foo 123 bar12'.

Similarly, to find all the matching ranges in the given input string (or stream) you can use:

identifier matchingRangesIn: 'foo 123 bar12'.

Different Kinds of Parsers

PetitParser provides a large set of ready-made parsers that you can compose to consume and transform arbitrarily complex languages. The terminal parsers are the simplest ones. We’ve already seen a few of those:

Terminal Parsers	Description
`$a asParser`	Parses the character `$a`.
`'abc' asParser`	Parses the string `'abc'`.
`#any asParser`	Parses any character.
`#digit asParser`	Parses the digits 0..9.
`#letter asParser`	Parses the letters a..z and A..Z.

The class side of PPPredicateObjectParser provides a lot of other factory methods that can be used to build more complex terminal parsers.

The next set of parsers are used to combine other parsers together:

Parser Combinators	Description
`p1 , p2`	Parses `p1` followed by `p2` (sequence).
`p1 / p2`	Parses `p1`, if that doesn’t work parses `p2` (ordered choice).
`p star`	Parses zero or more `p`.
`p plus`	Parses one or more `p`.
`p optional`	Parses `p` if possible.
`p and`	Parses `p` but does not consume its input.
`p not`	Parses `p` and succeeds when `p` fails, but does not consume its input.
`p end`	Parses `p` and succeeds at the end of the input.

So instead of using the #word predicate we could have written our identifier parser like this:

identifier := #letter asParser , (#letter asParser / #digit asParser) star.

To attach an action or transformation to a parser we can use the following methods:

Action Parsers	Description
`p ==> aBlock`	Performs the transformation given in `aBlock`.
`p flatten`	Creates a string from the result of `p`.
`p token`	Creates a token from the result of `p`.
`p trim`	Trims whitespaces before and after `p`.

To return a string of the parsed identifier, we can modify our parser like this:

identifier := (#letter asParser , (#letter asParser / #digit asParser) star) flatten.
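The action parsers can also be chained. As a small sketch, the following variant flattens the identifier, ignores surrounding whitespace and converts the string to a symbol; the conversion with #asSymbol is just an example of a transformation:

```smalltalk
"flatten answers the matched string, trim skips whitespace around it,
 and the block turns that string into a symbol."
identifier := (#letter asParser , #word asParser star) flatten trim
   ==> [ :string | string asSymbol ].

identifier parse: '  foo12  '.     " --> #foo12 "
```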

These are the basic elements to build parsers. There are a few more well documented and tested factory methods in the operations protocol of PPParser. Browse that protocol if you want to learn more.

Writing a More Complicated Grammar

Now we are able to write a more complicated grammar for evaluating simple arithmetic expressions. Within a workspace we start with the grammar for a number (actually an integer):

number :=  #digit asParser plus token trim ==> [ :token | token value asNumber ].

Then we define the productions for addition and multiplication in order of precedence. Note that we instantiate the productions as PPUnresolvedParser upfront, because they recursively refer to each other. The method #def: resolves this recursion using the reflective facilities of the host language:

term := PPUnresolvedParser new.
prod := PPUnresolvedParser new.
prim := PPUnresolvedParser new.
 
term def: (prod , $+ asParser trim , term ==> [ :nodes | nodes first + nodes last ])
   / prod.
prod def: (prim , $* asParser trim , prod ==> [ :nodes | nodes first * nodes last ])
   / prim.
prim def: ($( asParser trim , term , $) asParser trim ==> [ :nodes | nodes second ])
   / number.

To make sure that our parser consumes all input we wrap it with the end parser into the start production:

start := term end.

That’s it, now we can test our parser and evaluator:

start parse: '1 + 2 * 3'.       " --> 7 "
start parse: '(1 + 2) * 3'.     " --> 9 "

As an exercise we could extend the parser to also accept negative numbers and floating point numbers, not only integers. Furthermore it would be useful to add support for subtraction and division as well. All these features can be added with a few lines of PetitParser code.
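As a hint for the second part of the exercise, here is one possible sketch for subtraction and division. Simply adding more alternatives to the right-recursive productions above would group 10 - 2 - 3 as 10 - (2 - 3), so this sketch swaps in a different technique: it parses a repetition of operator and operand pairs and folds them from left to right. The structure of the action blocks is an assumption of this example, not something prescribed by PetitParser:

```smalltalk
number := #digit asParser plus token trim ==> [ :token | token value asNumber ].

term := PPUnresolvedParser new.
prod := PPUnresolvedParser new.
prim := PPUnresolvedParser new.

"A product followed by zero or more '+ product' or '- product' pairs."
term def: (prod , (($+ asParser / $- asParser) trim , prod) star
   ==> [ :nodes |
      nodes second
         inject: nodes first
         into: [ :result :each |
            each first = $+
               ifTrue: [ result + each second ]
               ifFalse: [ result - each second ] ] ]).

"The same pattern one precedence level down, for * and /."
prod def: (prim , (($* asParser / $/ asParser) trim , prim) star
   ==> [ :nodes |
      nodes second
         inject: nodes first
         into: [ :result :each |
            each first = $*
               ifTrue: [ result * each second ]
               ifFalse: [ result / each second ] ] ]).

prim def: ($( asParser trim , term , $) asParser trim ==> [ :nodes | nodes second ])
   / number.

start := term end.
start parse: '10 - 2 - 3'.      " --> 5 "
start parse: '2 * 3 + 4'.       " --> 10 "
```

Note that dividing two integers in Smalltalk answers a fraction when the division is not exact.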

Posted by Lukas Renggli at 25 February 2010, 5:34 pm with tags tutorial, petitparser, smalltalk, pharo

Disk Filesystem

A while ago Colin Putney announced the Filesystem framework, a nice and extensible replacement for the ugly FileDirectory class in Pharo. While all core classes are well commented, a quick start that explains how end users are supposed to adopt the framework is missing. This blog post should fill that gap.

First we need to load the package:

 Gofer new
     wiresong: 'mc';
     package: 'Filesystem';
     load.

The framework supports different kinds of filesystems that can be used interchangeably and that can transparently work with each other. The most obvious one is the filesystem on your hard disk. We are going to work with that one for now:

 working := FSDiskFilesystem current working.

Put the above code into a workspace and evaluate it. It assigns a reference to the current working directory to the variable working. References are the central object of the framework and provide the primary mechanism of working with files and directories. All code below works on FSReference instances.

Navigating the Filesystem

Now let’s do some more interesting things. To list all children of your working directory evaluate the following expression:

 working children.

To iterate over all children recursively evaluate:

 working allChildren.

To get a reference to a specific file or directory within your working directory use the slash operator:

 cache := working / 'package-cache'.

Navigating back to the parent is easy:

 cache parent.

You can check for various properties of the cache directory by evaluating the following expressions:

 cache exists.             "--> true"
 cache isFile.             "--> false"
 cache isDirectory.        "--> true"
 cache basename.           "--> 'package-cache'"

To get additional information about the filesystem entry evaluate:

 cache entry creation.     "--> 2010-02-14T10:34:31+00:00"
 cache entry modification. "--> 2010-02-14T10:34:31+00:00"
 cache entry size.         "--> 0 (directories have size 0)"
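These messages combine naturally with the usual collection protocol. As a small sketch, the following prints the name and size of every plain file in the working directory; the Transcript formatting is only an illustration:

```smalltalk
working := FSDiskFilesystem current working.

"Keep only the files, then print one line per file."
(working children select: [ :each | each isFile ])
   do: [ :each |
      Transcript
         show: each basename;
         show: ' ';
         show: each entry size printString;
         cr ].
```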

The framework also supports locations, late-bound references that point to a file or directory. When asking to perform a concrete operation, a location behaves the same way as a reference. Currently the following locations are supported:

 FSLocator desktop.
 FSLocator home.
 FSLocator image.
 FSLocator vmBinary.
 FSLocator vmDirectory.

If you save a location with your image and move the image to a different machine or operating system, a location will dynamically adapt and always point to the place you would expect.

Opening Read- and Write-Streams

To open a file-stream on a file ask the reference for a read- or write-stream:

 stream := (working / 'foo.txt') writeStream.
 stream nextPutAll: 'Hello World'.
 stream close.

 stream := (working / 'foo.txt') readStream.
 stream contents.
 stream close.

Please note that #writeStream overwrites any existing file and #readStream throws an exception if the file does not exist. There are also short forms available:

 working / 'foo.txt' writeStreamDo: [ :stream | stream nextPutAll: 'Hello World' ].

 working / 'foo.txt' readStreamDo: [ :stream | stream contents ].

Have a look at the streams protocol of FSReference for other convenience methods.

Renaming, Copying and Deleting Files and Directories

You can also copy and rename files by evaluating:

 (working / 'foo.txt') copyTo: (working / 'bar.txt').

To create a directory evaluate:

 backup := working / 'cache-backup'.
 backup createDirectory.

And then to copy the contents of the complete package-cache to that directory simply evaluate:

 cache copyAllTo: backup.

Note that the target directory would have been created automatically if it had not existed before.

To delete a single file evaluate:

 (working / 'bar.txt') delete.

To delete a complete directory tree use the following expression. Be careful with that one though.

 backup deleteAll.

That’s the basic API of the Filesystem library. If there is interest we can have a look at other features and other filesystem types in a next iteration.

Posted by Lukas Renggli at 14 February 2010, 2:10 pm with tags filesystem, pharo, smalltalk, tutorial

Programmatically Run Lint

While most people prefer to run SmallLint (Smalltalk Code Critics) from within OmniBrowser, the question arises from time to time on how to do the same from within a workspace script:

1. Select one or more rules to run from the class hierarchy below RBLintRule. For example, the following expression would instantiate a single rule that searches for questionable message sends:

rule := RBBadMessageRule new

The following expression could be used to search for both classes that implement #= but not #hash and methods that do float equality comparisons:

rule := RBCompositeLintRule rules: (Array
    with: RBDefinesEqualNotHashRule new
    with: RBFloatEqualityComparisonRule new)

2. Next select the scope to run the rules in. In the simplest case this is the complete image:

environment := BrowserEnvironment new

You can however restrict the scope further, for example to the collection hierarchy:

environment := BrowserEnvironment new
    forClasses: Collection withAllSubclasses

If you have the OmniBrowser refactoring tools installed you can easily browse a restricted environment by evaluating:

environment open

3. Finally you perform the actual search by evaluating the following expression:

SmalllintChecker runRule: rule onEnvironment: environment

Note that this might take a while depending on the size of the environment and the number of rules you run.

4. If you have the OmniBrowser refactoring tools installed you can have a look at the result by evaluating the following expression:

rule open

If you have an image without OmniBrowser loaded, open an Inspector on the result to see the matching code. If you are running a transformation rule, use the following script to perform the change in your system:

change := CompositeRefactoryChange new.
change changes: rule changes.
change execute
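Put together, steps 1 to 3 form a short workspace script. The sketch below reuses the two example rules and the collection hierarchy from above and opens an Inspector on the rule, which is what an image without OmniBrowser would do:

```smalltalk
"1. The rules to run."
rule := RBCompositeLintRule rules: (Array
    with: RBDefinesEqualNotHashRule new
    with: RBFloatEqualityComparisonRule new).

"2. The scope to run them in."
environment := BrowserEnvironment new
    forClasses: Collection withAllSubclasses.

"3. Run the search and look at the result."
SmalllintChecker runRule: rule onEnvironment: environment.
rule inspect
```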

Posted by Lukas Renggli at 19 November 2009, 6:04 pm with tags lint, refactoring, smalltalk, tutorial
