Dynamic Language Embedding With Homogeneous Tool Support
Domain-specific languages (DSLs) are increasingly used as embedded languages within general-purpose host languages. DSLs provide a compact, dedicated syntax for specifying parts of an application related to specialized domains. Unfortunately, such language extensions typically do not integrate well with existing development tools. Editors, compilers and debuggers are either unaware of the extensions, or must be adapted at a non-trivial cost. Furthermore, these embedded languages typically conflict with the grammar of the host language and make it difficult to write hybrid code; few mechanisms exist to control the scope and usage of multiple tightly interconnected embedded languages.
In this dissertation we present Helvetia, a novel approach to embed languages into an existing host language by leveraging the underlying representation of the host language used by these tools. We introduce Language Boxes, an approach that offers a simple, modular mechanism to encapsulate (i) compositional changes to the host language, (ii) transformations to address various concerns such as compilation and syntax highlighting, and (iii) scoping rules to control visibility of fine-grained language changes. We describe the design and implementation of Helvetia and Language Boxes, discuss the required infrastructure of a host language enabling language embedding, and validate our approach by case studies that demonstrate different ways to extend or adapt the host language syntax and semantics.
The defense will take place on Wednesday, October 20, 2010 at 16h15 in Room 001, Engehaldenstrasse 8, Bern. For directions please see http://www.iam.unibe.ch/en/contact.
The latest version of OmniBrowser includes a completion dialog that is used in various places to make the browser easier to drive from the keyboard. This new dialog is currently used for the following features:
Finding Classes (Ctrl+F)
Finding Implementors (Ctrl+M) and Senders (Ctrl+N)
Creating and Renaming Categories and Protocols
Triggering Commands (Ctrl+Shift+T) (new)
Selecting Windows (Ctrl+Shift+W) (new)
The following video gives a quick overview of these features:
There is repeated confusion about how merging in Monticello works. This post tries to clear up the situation. In the following examples we always start from the same situation, displayed below. P is the name of the package we are working with, and the number in a version name like P.1 identifies a specific version of that package. The Working Copy is the code we currently have in our Smalltalk image.
This shows us that we currently have all the changes from P.0, P.1, P.2 and P.3 in our image. P.6 and P.7 form a separate branch that we do not have in our image.
Load
What happens if we load P.7? If there are any changes in our Working Copy we lose those changes after a warning. Afterwards P.7 is loaded and a new Working Copy is ready to be changed. After this load operation we end up with the following situation:
What happens if we merge P.7? First of all Monticello tries to figure out the closest common ancestor of the Working Copy and P.7. In our examples this is obviously P.1.
Before performing the actual merge Monticello needs to load the code of P.1 into memory. This can cause an error if P.1 has been moved or deleted from its original repository. If you run into trouble, make sure that P.1 is in a repository known to Monticello, otherwise an automatic merge cannot be performed.
To perform the actual merge Monticello calculates the delta between the common ancestor and the two versions to be merged:
D.1 is the delta from P.1 to Working Copy. D.1 is also called the local change, because it is the delta from the common ancestor to the working copy.
D.2 is the delta from P.1 to P.7. D.2 is also called the remote change, because it is the delta from the common ancestor to the version you merge.
Monticello then inspects the two deltas D.1 and D.2. Source elements (class definitions, method definitions, variable definitions, etc.) that are added, removed or changed in both deltas cause a conflict. In the merge browser the user needs to resolve these conflicts before proceeding:
The Keep button chooses the remote change from D.2 and ignores the change in D.1.
The Reject button chooses the local change from D.1 and ignores the change in D.2.
Next Monticello creates a patch-set of D.2 with the conflicting elements resolved as specified by the user. It then applies this patch-set onto the working copy and adopts the ancestry as follows:
The Working Copy has now two ancestors P.3 and P.7. In most cases the user will then commit the merged version to the repository. Afterwards the ancestry looks like this:
The merge operation performed by Monticello is called a three-way merge, because it takes three versions into account: the working copy, the version to merge, and the common ancestor of the two versions.
Manual Merge
What happens if we start from the same situation, but the version of the closest common ancestor P.1 cannot be found? Being able to load P.0 does not help here, because that would cause all the changes from P.0 to P.1 to conflict. Furthermore, if any of the changes between P.0 and P.1 were undone in either of the branches, these changes would be lost.
What can we do if the common ancestor is missing? The simplest thing is to do a manual merge. This means we calculate the changes from the Working Copy to P.7. This gives us a combined delta D.3 that undoes the changes from P.1 to the Working Copy and adds the changes from P.1 to P.7. Unfortunately Monticello cannot tell us which branch these changes come from, so we have to go through D.3 one by one and apply the changes that we think belong to the delta from P.1 to P.7. At the end of the manual merge we can tell Monticello to adopt the version P.7, so that the ancestry again looks like this:
In my recent post I’ve mentioned that the Filesystem library can work on different kinds of filesystems. In this post I am going to walk you through the supported filesystems one by one.
Before you proceed with this hands-on blog post, please update to a patched and unofficial version of the Filesystem package. It fixes some bugs and minor issues that are not yet integrated into the official release.
The Disk Filesystem is implemented in FSDiskFilesystem and its platform specific subclasses. As we have seen in the last post the singleton filesystem instance for the current platform can be retrieved using:
disk := FSDiskFilesystem current.
Subclasses of FSFilesystem implement many methods, but we should refrain from calling most of them directly. These methods implement the low-level behavior of the respective filesystems and are private to the framework.
As we have learned in the previous post, we should only work with references. Filesystem instances respond to two messages that return an FSReference object. This is true not only for the disk filesystem, but also for all other filesystem types presented later on:
disk root. " a reference to the root directory "
disk working. " a reference to the working directory "
Given a reference we can navigate to another reference in the filesystem with the method #resolve:. Resolving works similarly to the command cd (change directory) on Unix and Windows and returns a new reference to a file or directory. Print the result of evaluating the following expressions:
disk working resolve: '/'. " the same as 'disk root' "
disk working resolve: '.'. " the same as 'disk working' "
disk working resolve: '/home/renggli'. " an absolute path to a directory or file "
disk working resolve: '../bar'. " a relative path from the working directory "
Note that the message #resolve: is also understood by FSFilesystem itself. Do not call this method though, it is private and does not return an FSReference as you might expect.
Memory Filesystem
The memory filesystem is another simple filesystem type. Think of it as a virtual in-memory filesystem, very much like a RAM disk that lives in your Smalltalk image. The use of a memory filesystem can be very convenient for testing your filesystem code, because it does not pollute your hard disk and is garbage collected as soon as you do not reference it any longer. To instantiate a memory filesystem evaluate:
memory := FSMemoryFilesystem new.
On this filesystem you can do everything you learned before. A memory filesystem is initially empty:
memory root children size. " --> 0 "
To create a file we can use the same techniques we learned previously:
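The exact snippet from the original post is not preserved here; a sketch along these lines copies the package cache into the memory filesystem (the directory name 'package-cache' and the messages #children, #copyTo: and #/ are assumptions about the Filesystem API):

```smalltalk
"Copy every file from the Pharo package cache into the memory filesystem.
 'package-cache' and the reference protocol used below are assumptions."
cache := FSDiskFilesystem current working resolve: 'package-cache'.
cache children do: [ :file |
	file copyTo: memory root / file basename ]
```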
The above code copies all the files in the package cache of your Pharo installation to the memory filesystem. Before you try it out make sure that you don’t have your MP3 collection in that directory, otherwise your image might blow up.
In my case I have now 64 files in the memory filesystem:
memory root children size. " --> 64 "
As you would expect we can perform other operations on our virtual filesystem, for example delete all the files that start with the letter F:
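The deletion snippet was lost from this copy of the post; a sketch (assuming the reference messages #basename and #delete) could look like this:

```smalltalk
"Delete every file at the root of the memory filesystem whose
 name starts with the letter F."
memory root children
	select: [ :file | file basename beginsWith: 'F' ]
	thenDo: [ :file | file delete ]
```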
As you see, there is nothing special about a memory filesystem. It behaves and understands exactly the same messages as the disk filesystem does.
ZIP Filesystem
The ZIP filesystem represents a ZIP Archive that resides on another filesystem. To create a new archive instantiate the ZIP filesystem with a reference of a ZIP archive:
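The instantiation snippet is missing here; a sketch, assuming a class named FSZipFilesystem with a constructor #atReference: (both names are assumptions about the framework), might read:

```smalltalk
"Create a ZIP filesystem stored in the file 'cache.zip' on disk.
 The class name FSZipFilesystem and the constructor #atReference:
 are assumed, not verified against the framework."
zip := FSZipFilesystem atReference: FSDiskFilesystem current working / 'cache.zip'.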
Unlike the other filesystems, a ZIP filesystem needs to be opened (and closed) explicitly:
zip open.
Apart from that, the ZIP filesystem behaves exactly the same way as the other filesystems we learned up to now. To copy the contents of the memory filesystem to the ZIP archive we can evaluate the following code:
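The copying code is not preserved here; a sketch (again assuming #children, #copyTo: and #/ on references) could be:

```smalltalk
"Copy every file from the memory filesystem into the open ZIP archive."
memory root children do: [ :file |
	file copyTo: zip root / file basename ]
```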
To flush the ZIP archive to the underlying filesystem we simply close it:
zip close.
This is a convenient way to access archives. Again your code does not have to worry about the details of this particular filesystem, but transparently accesses and modifies it using references.
cURL Filesystem
The cURL filesystem is an experimental extension to the Filesystem framework. It uses the cURL plugin written by Danil Osipchuk to work with filesystems that can be accessed through FTP, FTPS, HTTP, HTTPS, SCP, SFTP, and TFTP.
What about downloading the latest cURL plugin for the Mac VM from within Pharo? To do this we can connect to the directory with the latest experimental code of John McIntosh:
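The connection snippet is missing from this copy; a sketch might look as follows, where both the class name FSCurlFilesystem and the server URL are hypothetical placeholders, not the actual values from the original post:

```smalltalk
"Connect to an FTP directory through the cURL filesystem.
 FSCurlFilesystem and the URL below are hypothetical; substitute
 the actual class and the server hosting the experimental plugins."
ftp := FSCurlFilesystem url: 'ftp://ftp.example.com/experimental/'.
```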
With the resulting filesystem you can do all the things you already know. If you are not authenticated, however, it is unlikely that you are allowed to write to (or upload to) the server. Note that currently enumerating the contents of a directory only works for FTP and SFTP servers. Due to limitations of the CurlPlugin it is furthermore not possible to create directories, or to delete or rename files. Hopefully that will be fixed sometime soon in the plugin code.
ftp working children.
To download the curl plugin and save it to your hard disk you can use:
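The download snippet is also missing; a sketch, with a hypothetical file name and assuming #copyTo: streams the remote contents to a local reference:

```smalltalk
"Download the plugin archive and save it next to the image on disk.
 The file name 'CurlPlugin.zip' is a hypothetical placeholder."
(ftp working / 'CurlPlugin.zip')
	copyTo: FSDiskFilesystem current working / 'CurlPlugin.zip'
```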
After the announcement on the Moose mailing list, and after various people asked me to provide some introduction to PetitParser, I decided to write a short tutorial.
I originally wrote PetitParser as part of my work on the Helvetia system. PetitParser is a parsing framework different from many other popular parser generators. For example, it is not table based like SmaCC or ANTLR. Instead it uses a unique combination of four alternative parser methodologies: scannerless parsers, parser combinators, parsing expression grammars and packrat parsers. As such PetitParser is more powerful in what it can parse, and it arguably fits the dynamic nature of Smalltalk better. Let’s have a quick look at these four parser methodologies:
Scannerless Parsers combine what is usually done by two independent tools (scanner and parser) into one. This makes writing a grammar much simpler and avoids common problems when grammars are composed.
Parser Combinators are building blocks for parsers modeled as a graph of composable objects; they are modular and maintainable, and can be changed, recomposed, transformed and reflected upon.
Parsing Expression Grammars (PEGs) provide ordered choice. Unlike in parser combinators, the ordered choice of PEGs always follows the first matching alternative and ignores the other alternatives. Valid input always results in exactly one parse tree; the result of a parse is never ambiguous.
Packrat Parsers give linear parse time guarantees and avoid common problems with left-recursion in PEGs.
Loading PetitParser
Enough theory, let’s get started. PetitParser is developed in Pharo, but is also available on other Smalltalk platforms. A ready made image can be downloaded here. To load PetitParser into an existing image evaluate the following Gofer expression:
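The Gofer expression itself was lost from this copy of the post; a sketch, assuming Gofer's #renggli: repository shortcut points at the author's public repository, would be:

```smalltalk
"Load the core PetitParser package via Gofer.
 The #renggli: shortcut and the 'petit' project name are assumptions."
Gofer new
	renggli: 'petit';
	package: 'PetitParser';
	load
```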
There are other packages in the same repository that provide additional features, for example PetitSmalltalk is a Smalltalk grammar, PetitXml is an XML grammar, PetitJson is a JSON grammar, PetitAnalyzer provides functionality to analyze and transform grammars, and PetitGui is a Glamour IDE for writing complex grammars. We are not going to use any of these packages for now.
More information on how to get PetitParser can be found on the website of the project.
Writing a Simple Grammar
Writing grammars with PetitParser is as simple as writing Smalltalk code. For example, a grammar that parses identifiers (a letter followed by zero or more letters or digits) is defined as follows. In a workspace we evaluate:
identifier := #letter asParser , #word asParser star.
If you inspect the object identifier you’ll notice that it is an instance of PPSequenceParser. This is because the #, operator creates a sequence of a letter parser and a zero-or-more word-character parser. If you dive further into the object you will notice the following simple composition of parser objects:
PPSequenceParser (this parser accepts a sequence of parsers)
    PPPredicateObjectParser (this parser accepts a single letter)
    PPRepeatingParser (this parser accepts zero or more instances of another parser)
        PPPredicateObjectParser (this parser accepts a single word character)
Parsing Some Input
To actually parse a string (or stream) we can use the method #parse:.
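With the identifier parser defined above, this looks as follows (the input string is my own example; the nested-array result is PetitParser's default for a sequence containing a starred parser):

```smalltalk
identifier parse: 'foo'. " --> #($f #($o $o)) "
```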
While it seems odd to get these nested arrays with characters as a return value, this is the default decomposition of the input into a parse tree. We’ll see in a while how that can be customized.
If we try to parse something invalid we get an instance of PPFailure as an answer:
identifier parse: '123'. " --> letter expected at 0 "
Instances of PPFailure are the only objects in the system that answer with true when you send the message #isPetitFailure. Alternatively you can also use #parse:onError: to throw an exception in case of an error:
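For example, assuming the error block of #parse:onError: takes the failure message and position as arguments:

```smalltalk
"Raise a regular Smalltalk error instead of answering a PPFailure."
identifier
	parse: '123'
	onError: [ :msg :pos | self error: msg ]
```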
Furthermore to find all matches in a given input string (or stream) you can use:
identifier matchesIn: 'foo 123 bar12'.
Similarly, to find all the matching ranges in the given input string (or stream) you can use:
identifier matchingRangesIn: 'foo 123 bar12'.
Different Kinds of Parsers
PetitParser provides a large set of ready-made parsers that you can compose to consume and transform arbitrarily complex languages. The terminal parsers are the simplest ones. We’ve already seen a few of those:
Terminal Parsers    Description
$a asParser         Parses the character $a.
'abc' asParser      Parses the string 'abc'.
#any asParser       Parses any character.
#digit asParser     Parses the digits 0..9.
#letter asParser    Parses the letters a..z and A..Z.
The class side of PPPredicateObjectParser provides a lot of other factory methods that can be used to build more complex terminal parsers.
The next set of parsers are used to combine other parsers together:
Parser Combinators    Description
p1 , p2               Parses p1 followed by p2 (sequence).
p1 / p2               Parses p1; if that fails, parses p2 (ordered choice).
p star                Parses zero or more p.
p plus                Parses one or more p.
p optional            Parses p, if possible.
p and                 Parses p but does not consume its input.
p not                 Succeeds when p fails, but does not consume its input.
p end                 Parses p and succeeds at the end of the input.
So instead of using the #word predicate we could have written our identifier parser like this:
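Using only the combinators from the table above, the equivalent definition reads:

```smalltalk
identifier := #letter asParser , (#letter asParser / #digit asParser) star.
```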
These are the basic elements to build parsers. There are a few more well-documented and tested factory methods in the operations protocol of PPParser; browse that protocol if you want to learn more.
Writing a More Complicated Grammar
Now we are able to write a more complicated grammar for evaluating simple arithmetic expressions. Within a workspace we start with the grammar for a number (actually an integer):
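The number production was lost from this copy; a sketch using PetitParser's #flatten, #trim and the ==> transformation operator could be:

```smalltalk
"One or more digits, surrounding whitespace trimmed, converted to
 a number. The use of #flatten here is my reconstruction; the
 original post may have used the token protocol instead."
number := #digit asParser plus flatten trim ==> [ :str | str asNumber ].
```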
Then we define the productions for addition and multiplication in order of precedence. Note that we instantiate the productions as PPUnresolvedParser upfront, because they recursively refer to each other. The method #def: resolves this recursion using the reflective facilities of the host language:
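The production definitions themselves are missing here; a sketch of mutually recursive productions in order of precedence, referring to the number parser defined above (the variable names and the final start production are my own):

```smalltalk
"Forward-declare the productions, then resolve them with #def:.
 This sketch keeps the grammar right-recursive and ignores
 associativity concerns."
term := PPUnresolvedParser new.
prod := PPUnresolvedParser new.
prim := PPUnresolvedParser new.

term def: (prod , $+ asParser trim , term
		==> [ :nodes | nodes first + nodes last ])
	/ prod.
prod def: (prim , $* asParser trim , prod
		==> [ :nodes | nodes first * nodes last ])
	/ prim.
prim def: ($( asParser trim , term , $) asParser trim
		==> [ :nodes | nodes second ])
	/ number.

start := term end.
start parse: '1 + 2 * 3'. " --> 7 "
```

Multiplication binds tighter than addition because prod is defined below term in the precedence chain, so `1 + 2 * 3` evaluates to 7, not 9.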
As an exercise we could extend the parser to also accept negative numbers and floating point numbers, not only integers. Furthermore it would be useful to add support for subtraction and division as well. All these features can be added with a few lines of PetitParser code.