BNF Converter

Java Mode


By Michael Pellauer

2003

One of the best features of the LBNF file format is that it is completely independent of the target programming language. The new Java mode allows the BNF Converter to output a compiler front end in Java using JLex and CUP.

BNFC Java-mode was developed for JLex version 1.2.6 and CUP version 0.10k. The generated Java code requires JDK version 1.2 or greater.

BNF Converter Homepage:
http://www.cs.chalmers.se/~markus/BNFC/

JLex Homepage:
http://www.cs.princeton.edu/~appel/modern/java/JLex/

CUP Homepage:
http://www.cs.princeton.edu/~appel/modern/java/CUP/

USAGE

bnfc [-m] -java FILENAME.cf

To access the java mode simply pass the -java command to bnfc. It uses FILENAME as the package name of the generated Java code. The result will be the following files:

Filename: Description:
Absyn/*.java
Visitor.java
Visitable.java
FILENAME.lex
FILENAME.cup
PrettyPrinter.java
Skeleton.java
VisitSkel.java
Test.java
Makefile
Abstract Syntax tree package (in subdirectory Absyn).
Visitor interface
Visitable interface
JLex file (lexer specification)
CUP file (parser specification)
A Pretty Printer for the abstract syntax tree.
A code skeleton for traversing the syntax tree.
A skeleton that uses the Visitor Design Pattern.
A testbench to test the final result.
A Makefile (generated only with the -m flag).

Please ensure that your $CLASSPATH correctly points to JLex.Main and java_cup.Main. It may also be useful to include the current working directory in the $CLASSPATH.

Compiling the Compiler Front End

You can use the generated Makefile to compile the generated Java Code with "make" ("gmake" on some systems).

If all goes well the following files should be generated:

File: Description:
Absyn/*.class
FILENAME.lex.java
Yylex.class
parser.java
sym.java
parser.class, sym.class,
  CUP$parser$actions.class
PrettyPrinter.class
Test.class
Abstract Syntax class files
Lexer generated by JLex
Compiled Lexer class
Parser generated by CUP
CUP token symbols

Compiled parser classes
Compiled Pretty Printer
Compiled Test Bench

Testing the Compiler Front End

The Java class Test.java may be used to test the result of the generation. To use it simply give it the name of a file written in the target language. If parsing is correct, the abstract syntax tree (in Haskell-like syntax) and linearized pretty-printed result will be shown.

For example, if we had created a subset of Java called Javalette, in Javalette.cf:

> java Javalette.Test helloworld.jl

Parse Succesful!

[Abstract Syntax]

(PDefList [(FuncDef TInt(FFuncName "main")[][(SPrintString "Hello World"), (SReturn NoExpression)]NoSemicolon)])

[Linearized Tree]

int main ()
{
  printString ("Hello World") ;
  return ;
 
}

If no argument is given then it attempts to read from stdin.

Note that the current directory may need to be included in your $CLASSPATH.

Packages in the Generated Code

The BNF Converter generates code that makes use of Java packages. For example, if your file name was Javalette.cf, then it will generate code with a package name of "Javalette". Because of this your Java compiler may expect that the current directory name matches the package name. So in our example above the Java compiler probably will expect the current directory to be named "Javalette" and be included in the $CLASSPATH somehow.

Generation of the Abstract Syntax tree can result in hundreds of classes (one for each rule in your LBNF file). Therefore they are placed in a subpackage called "Absyn". A subdirectory will be created if one does not already exist.

The Abstract Syntax Tree

The Abstract Syntax tree is generated following the method Andrew Appel outlines in
"Modern Compiler Construction in Java"
http://www.cs.princeton.edu/~appel/modern/java/

The generated code follows Appel's "non-Object Oriented method." In addition they implement the Visitor Design Pattern.

Here is an example. Let us say that we have made a simple LBNF file to describe lists of boolean expressions, called BoolExp.cf:

--Begin LBNF file

PROG.           PROG   ::= [EXP] ;

OrExp.          EXP     ::= EXP "||" EXP1 ;
AndExp.         EXP1    ::= EXP1 "&&" EXP2 ;
TVal.           EXP2    ::= "true" ;
FVal.           EXP2    ::= "false" ;

separator EXP ";" ;
coercions EXP 2 ;

--LBNF file ends here


The Absyn package will contain the following files:

AndExp.java   FVal.java     OrExp.java    TVal.java
EXP.java      ListEXP.java  PROG.java

Note that there is one Java class for each Label in the LBNF file. The top of the tree is PROG, which is just a pointer to the top of a list:

package BoolExp.Absyn; //Java Package generated by BNF converter

public class PROG implements BoolExp.Visitable
{
  public ListEXP listexp_;
  public PROG(ListEXP p1) { listexp_ = p1; }
  public void accept(BoolExp.Visitor v) { v.visitPROG(this); }
}


The class ListEXP is automatically generated to represent the list of EXP. This is significantly different than the standard BNF Haskell mode, which usesHaskell's built-in lists. Currently for each class NAME that can be held in a list, a class ListNAME is generated. These are straightforward null-terminated linked lists and should be familiar to all Java programmers. In the future this feature could be extended to use Java generic types.

For example, here is ListEXP:

package BoolExp.Absyn; //Java Package generated by BNF converter

public class ListEXP implements BoolExp.Visitable
{
  public EXP exp_;
  public ListEXP listexp_;

  public ListEXP(EXP p1, ListEXP p2) { exp_ = p1; listexp_ = p2; }
  public void accept(BoolExp.Visitor v) { v.visitListEXP(this); }
  public ListEXP reverse()
  {
    ...
  }
  public ListEXP reverse(ListEXP prev)
  {
    ...
  }
}


Note that all Lists include a generated "reverse" function used during BNFC's optimization of Left-recursive lists. If there is sufficient programmer demand more standard generated functions (such as delete, lookup, etc) could be added.

It is interesting to note that ListEXP has a pointer to EXP. But EXP is an abstract class:

package SimpleTest.Absyn; //Java Package generated by BNF converter

public abstract class EXP {}

Abstract classes represent LBNF Categories, such as EXP. If a category does not have a label with the same name (as PROG did in our example) then it is declared abstract. All Labels of a category are represented as a subclass of the abstract category. Label names should not match category names if that category has more than one label.

Here are EXP's subtypes in our example:

public class AndExp extends EXP
{
  public EXP exp_1, exp_2;
  public AndExp(EXP p1, EXP p2) { exp_1 = p1; exp_2 = p2; }
}

public class OrExp extends EXP
{
  public EXP exp_1, exp_2;
  public OrExp(EXP p1, EXP p2) { exp_1 = p1; exp_2 = p2; }
}

public class TVal extends EXP
{
  public TVal() { }
}

public class FVal extends EXP
{
  public FVal() { }
}

The programmer can then traverse through the tree using the generated Skeleton file (see below). The PrettyPrinter is an example of this. It contains twostatic methods that traverse the tree and return Strings. The "show" functionprints a Haskell-like view of the data type names. The other function "print" is a pretty-printer that uses simple heuristics (easily changed in the function "render") to linearize the syntax tree.

We can see the PrettyPrinter class at work in the automatically generated class Test.java. Continuing our early example, let us write a simple input file in our language of boolean expressions:

true && (false || true);
true;
false || false

We can now test our parser using the Test class:

> java BoolExp.Test testfile

Parse Succesful!

[Abstract Syntax]

(PROG [(AndExp TVal(OrExp FValTVal)), TVal, (OrExp FValFVal)])

[Linearized Tree]

true && (false || true) ;
true ;
false || false


Using the Skeleton File

Now that we have sucessfully constructed a parser, it's time to do something useful with it. This generally involves traversing the Abstract Syntax tree and producing a new data structure, such as translating from a high-level language to assembly code. To assist the programmer with this process the BNF Converter's Java Mode provides two alternative methods, found in the files Skeleton.java, and VisitSkel.java.

Skeleton.java is an empty skeleton that traverses the abstract syntax tree (currently doing nothing interesting). It uses the general algorithm outlined by Appel. That is to say, it generates functions for each major abstract class (categories in the LBNF file). It then uses Java's instanceof operator to determine which non-abstract subclass it is dealing with, then recurses onto that subclass's members.

To use Skeleton.java the programmer must generally do the following:
From this point it should be fairly straightforward to implement the desired algorithm.

Appel's method is very efficient in practice, but it may not be familiar to all Java programmers. More familiar is the Visitor Design Pattern. A Visitor interface specific to the language is generated in the file Visitor.java. The class VisitSkel.java implements the interface and gives the programmer a template skeleton of all classes in the Abstract Syntax. The VisitSkel class is not static, and must be instantiated to be used.

To use VisitSkel.java:

Many Visitor Design Patterns use a Visitee-traversal algorithm. That is to say, visiting the top member of a List will automatically visit all the members of the List. However, the BNFC-generated pattern uses Visitor-traversal. This means that it is the Visitor's responsibility, when visiting a list, to visit all the members in turn. This is because certain algorithms that compilers want to implement are not compositional. That is to say, that performing a transformation on a single member may be quite different than performing that transformation on a certain pattern of nodes. For example, during peephole analysis a compiler may wish to merge to subsequent additions into a single operation, but may want to leave single additions unchanged. This is easier to implement if it is the Visitor doing the traversal itself.

Known Issues

Here is a list of known issues with the BNF Converter's Java Mode.

Keywords

Java contains many more keywords than Haskell, and currently there is nothing to prevent a programmer from using these keywords in their LBNF grammar. For example:

Static.   Package   ::= Sychronized "." Switch ;

While this will be accepted by BNF it might result in un-compilable code, as the Java occaisionally uses lowercase names for instance variables. As such programmers should generally avoid matching Label names with Java keywords, or the name of already existing packages.

Reuse of Label Names

Reusing label names currently results in a warning from BNFC. However, for Java this is a much more serious issue, as they are used for class names. For example:

public class FOO extends BAR
{
 ...
}

public class FOO extends BAZ //OOPS! That's two classes named FOO
{
 ...
}


The best solution is for the programmer to ensure that all label names are unique and not ignore the BNFC warning.

JLex Regular Expressions

BNFC's regular expressions (for user-defined token types) are exactly isomorphic to those of Alex for Haskell. Unfortunately JLex regular expressions are not as expressive as those of Alex. Specifically the following operators are not supported:

-Set subtraction (-)
-Epsillon (eps)

Currently using those operators in a regular expression will cause generation of the lexer by JLex to fail. In the future a solution might be found to this by transforming the regular expression, but currently the programmer should avoid those operators if they wish to generate Java code.

Shift-Reduce Conflicts in CUP

Certain grammar constructs result in shift-reduce conflicts in CUP that do not result in them in Happy. We are currently investigating exactly what these differences are and how to resolve them, however initial investigations seem to indicate that CUP's default policy of Shifting is sufficient. In the meantime, the generated Makefile is currently set to tell CUP to expect 100 conflicts before it aborts parser generation. Depending on the needs of your project this number could be made bigger or smaller.