Thursday, October 29, 2015

Using Antlr with jEdit

I've written a number of parsers in the past with javacc, but my recent struggles in modifying my Java 7 grammar to support Java 8 got me looking at other options. I'd been reading good things about Antlr 4 (Antlr is at www.antlr.org), so I gave it a try. I've been very impressed. It's easy to use, well documented, and well architected. The online documentation is great, but it's still worth buying the book.

Of course, I'm creating and editing Antlr grammars with jEdit, so I wrote a few things for jEdit to make it easier to work with Antlr.

First is an edit mode for the *.g4 grammar files:
https://sourceforge.net/p/jedit/svn/HEAD/tree/jEdit/trunk/modes/antlr4.xml

This lets jEdit provide syntax highlighting for the grammar files. Here's an example using the Java 8 grammar:


grammar Java8;

@lexer::members {
    public static final int WHITESPACE = 1;
    public static final int COMMENTS = 2;
}

/*
 * Productions from §3 (Lexical Structure)
 */

literal
    :   IntegerLiteral
    |   FloatingPointLiteral
    |   BooleanLiteral
    |   CharacterLiteral
    |   StringLiteral
    |   NullLiteral
    ;

/*
 * Productions from §4 (Types, Values, and Variables)
 */

type
    :   primitiveType
    |   referenceType
    ;

primitiveType
    :   annotation* numericType
    |   annotation* 'boolean'
    ;

Antlr produces a couple of *.tokens files, one for the parser and one for the lexer. It turns out these are just properties files, which jEdit already has good support for. To get jEdit to recognize these token files as properties files, go to Utilities - Global Options - Editing, choose the Properties edit mode, and add "tokens" as a file name extension to the existing list of extensions. Now the token files look like this:


THROW=45 
STATIC=39 
INTERFACE=29 
AND_ASSIGN=96 
BREAK=5 
BYTE=6 
ELSE=16 
IF=23 
ENUM=17 
SUB=83 
BANG=70 
LPAREN=58 
DOT=66 
CASE=7

Notice that these entries are in no particular order, so the JavaSideKick plugin, which supports properties files, is very useful for navigating them.
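Outside of jEdit, the same ordering can be recovered with a few lines of Java. This is only a rough sketch, not how the Sidekick plugin does it: the class name SortTokens and the method sortByType are made up for illustration. It splits each line on the last "=" rather than using java.util.Properties, on the assumption that a tokens file may also contain quoted literal entries that themselves include "=" or ":".

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Antlr or jEdit.
class SortTokens {

    /** Parses NAME=number lines and returns the entries ordered by token type. */
    static List<Map.Entry<String, Integer>> sortByType(List<String> lines) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Split on the LAST '=' so a quoted literal name such as '=' survives.
            int eq = trimmed.lastIndexOf('=');
            if (eq <= 0) {
                continue; // blank or malformed line, skip it
            }
            String name = trimmed.substring(0, eq);
            Integer type = Integer.valueOf(trimmed.substring(eq + 1).trim());
            entries.add(new SimpleEntry<>(name, type));
        }
        entries.sort(Map.Entry.comparingByValue());
        return entries;
    }

    public static void main(String[] args) {
        // A few lines taken from the sample above, trailing spaces included.
        List<String> sample = Arrays.asList(
                "THROW=45 ", "STATIC=39 ", "BREAK=5 ", "BYTE=6 ", "CASE=7");
        for (Map.Entry<String, Integer> e : sortByType(sample)) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Sorting by the numeric value rather than by name keeps the tokens in the order Antlr numbered them, which is usually the order that matters when debugging a lexer.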

Even more useful is an Antlr Sidekick. The parser for this Sidekick is written in Antlr. I packaged Antlr itself in a separate library plugin so it can be updated independently of the Antlr Sidekick plugin.


Notice that the lexer and parser rules are in separate nodes in the tree, and the usual Sidekick features such as sorting and filtering are supported.

The Antlr Sidekick also has one action that is really useful: if a *.g4 file is the current file in jEdit, then going to Plugins - AntlrSidekick - Generate files will run Antlr on the g4 file and generate (or regenerate) the parser, lexer, and listener files, so there is no need to go to the command line to do so.
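For comparison, here is roughly what that command-line step looks like when driven from Java. This is a sketch under assumptions, not the plugin's actual code: the class name RunAntlr and the method buildCommand are invented for illustration, and the path to the Antlr "complete" jar is whatever you pass in. It relies on org.antlr.v4.Tool being the Antlr 4 tool's main class and on its -o option for the output directory.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher, not the AntlrSidekick implementation.
class RunAntlr {

    /** Builds the command that runs the Antlr tool on one grammar file. */
    static List<String> buildCommand(String antlrJar, File grammar, File outputDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(antlrJar);              // assumed: path to an Antlr 4 complete jar
        cmd.add("org.antlr.v4.Tool");   // main class of the Antlr 4 tool
        cmd.add("-o");
        cmd.add(outputDir.getPath());   // where generated files are written
        cmd.add(grammar.getPath());
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: RunAntlr <antlr-complete.jar> <grammar.g4>");
            return;
        }
        File grammar = new File(args[1]);
        // Write the generated files next to the grammar.
        File outDir = grammar.getAbsoluteFile().getParentFile();
        Process p = new ProcessBuilder(buildCommand(args[0], grammar, outDir))
                .inheritIO()            // show Antlr's warnings and errors directly
                .start();
        System.exit(p.waitFor());
    }
}
```

Antlr 4 generates listener classes by default, so no extra flag is needed to get the parser, lexer, and listener files described above.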

