Versions: 22/3 improve lexer. 8/3 remove some obsolete code, improve pretty-printer, add examples. 21/2 add comment pragma. 20/2 add comments to BNF files. 17/2 generate a pretty-printer (experimental). 14/2 generate a case skeleton. 12/2 add grammar checking, simplify coercion and list conventions. 9/2 add convention about non-parsing rules, filter out non-existing literal types from lexer. 8/2 generate Latex files. 5/2 generate Alex files, indicate line numbers at parse errors. 4/2 add support for lists, extend the token type. 1/2/2002 first version.
C4, a fragment of C (see "What it does" above).
Moreover, even though Happy does a great job in generating parsers LALR(1), writing a Happy parser by hand is unnecessarily low-level if all you want the parser to return is an abstract syntax tree. Only wanting this is part of the modern multi-phase compilation ideology: you should not define your compiler back-end in the semantic actions of Happy.
Of course, if you do want to use the semantic actions of Happy to do more than just construct abstract syntax trees, you will find the BNF converter too restricted.
This program is a spin-off of a much larger tool, GF, which has a grammar formalism richer than BNF and was particularly designed for natural languages. With the addition of a GF-to-Happy converter, in combination with an earlier BNF-to-GF converter, it turned out that GF is almost usable as a compiler tool, as well - but only almost, if one wants to create components that fit seamlessly into the traditional way of writing compilers in Haskell (e.g. use Haskell's lists and integers in syntax trees, and to get rid of some concrete syntax clutter). Rather that adding such ad hoc things to GF itself, we wrote this ad hoc tool where such facilities are provided by ad hoc conventions.
$ hugs ... Prelude> :l CFTop ... Main> makeAll "C4" 34 rules accepted wrote file AbsC4.hs wrote file LexC4.x wrote file ParC4.y wrote file LatexC4.tex wrote file SkelC4.hs wrote file PrintC4.hs Executing alex Executing happy unused terminals: 1 Executing latex Executing xdvi to get started, import the parser in hugs by :l ParC4 Main> :l ParC4 ... ParC4> readFile "koe2.c" >>= print . pProgr . myLexer Ok (Program [Decl TInt [Ident "i"],Decl TInt [Ident "k"]] [SAss (Ident "i") (EInt 1),SAss (Ident "k") (EVar (Ident "i")), SWhile (EOp (EVar (Ident "i")) OLt (EInt 13)) (SBlock [SAss (Ident "k") (EOp (EVar (Ident "k")) OTimes (EVar (Ident "i"))),SIncr (Ident "i"), SPrint (EVar (Ident "k"))]),SReturn (EInt 0)])
Notice that you need hugs, alex, happy, latex, and xdvi. The Alex runtime file Alex.hs must be on your Hugs path when running the BNF converter.
When run on the file Foo.cf, makeAll writes six files, AbsFoo.hs, SkelFoo.hs, PrintFoo.hs, LexFoo.x, LatexFoo.tex and ParFoo.y.
The files LexFoo.x and ParFoo.y are compiled into a production-quality lexer and parser by using alex and happy. If this goes well, it produces two huge Haskell files, LexFoo.hs and ParFoo.hs, which define a lexer and an LALR(1) parser corresponding to your BNF grammar. The file can then be compiled (or hugs-interpreted) in the usual way. The most important functions they define are
The file SkelFoo.hs defines case skeletons for the abstract syntax in AbsFoo.hs. This file defines a function:
The file PrintFoo.hs defines a pretty-printer class Print and instantiates it for every datatype defined in AbsFoo.hs. The top-level function is
The documentation, describing the language defined by your BNF grammar, is produced by compiling LatexFoo.tex with latex. If everything works fine, latex outputs a file LatexFoo.dvi that can be viewed with xdvi. If you prefer postscript, convert the output with the command dvips LatexFoo.dvi -o LatexFoo.ps.
Main> cfTop "Foo.cf" 27 rulesThis drops you into a subshell where you can enter strings and get them parsed in the value category of the first rule of your grammar, in this case Prog.
Prog? <koe1.c Program [] [] Prog? >Exp Exp? 2 + 2 Sum Two Two Exp? . Main>The example illustrates the four available commands:
Within Hugs, it is easy to use the Happy parser interactively, as well, since parsers are exported for every category. For example:
ParC4> return "2*(x+3)" >>= print . pExp . myLexer Ok (EOp (EInt 2) OTimes (EOp (EVar (Ident "x")) OPlus (EInt 3)))
Ident"." Ident "::=" (Ident | Quoted)*The first identifier is a rule label, which comes out as a constructor in the generated abstract syntax. The second identifier is a category symbol, which is any identifier. Then comes the symbol ::=, and finally a list of items separated by blanks. An item is either a category symbol (nonterminal) or a quoted string (terminal). For example, the following rule defines C-like if-else statements.
SIf. Stm ::= "if" "(" Exp ")" Stm "else" StmComments in a BNF grammar are like in Haskell: --... for single-line comments and {- ... -} for possibly multiple-line comments.
-- This is a single-line comment. {- This is a multiple-line comment. -}
Rule names and category symbols can be chosen ad libitum, with the restrictions imposed by Haskell (if you want to use the generated Happy and Haskell files), and some other conventions:
data Stm = ... | SIf Exp Stm Stm | ...Certain identifiers are treated in special ways, as explained below: Char, Double, Int, String, Ident.
To type-control the usage of coercions, we introduce the notion of the category skeleton of a BNF rule. The category skeleton is a pair
(C, C1...Cn)where C is the undigited left-hand-side nonterminal and C1...Cn is the sequence of undigited right-hand-side nonterminals of the rule. The category skeleton is the structure that is used for producing a constructor in the abstract syntax, with value type C and argument types C1,...,Cn.
Precedence classes usually require semantically dummy transitions, e.g. from Exp1 to Exp3 by the addition of parentheses, and in the opposite direction for free. Since we don't want to use constructors constructors for such transitions, we introduce the following convention:
(C,C).
Digited variants of categories can be used for other purposes than precedence, provided that the type-skeleton is correct. This is OK for the parser, but the pretty-printer may get confused, since it asssumes that digits are used for precedence only.
Sometimes it is necessary to use a constructor in many BNF rules, between different precedence classes. This is accepted: it is only required that all uses of the constructor have the same category skeleton.
data Ident = Ident StringFor example, the BNF rules
EVar. Exp ::= Ident EInt. Exp ::= Intgenerate the following abstract syntax:
data Exp = ... | EVar Ident | EInt Int | ...
The set of reserved words is the set of terminals appearing in the grammar. Those reserved words that consist of non-letter characters are treated in a different way from those that are similar to identifiers. The lexer tries to follow normal rules familiar from languages like Haskell, C, and Java, including longest match and spacing conventions.
Comments are like in Haskell: {- ... -} and --... are treated as comments in the lexers produced. To change this, you have to define a comment pragma in your BNF grammar file (currently the only pragma in the system): put anywhere in the code either of
The strings in enclosed comments must be, besides the quotes, no longer than two character. If this is to restrictive, then edit the LexFoo.x file.
So if we would like to make our Haskell comment convention explict, we could write:
# comment "--"Generate XML.
Generate syntax editors?