How to design a compiler
Hi,
I want to write a simple java compiler and I wondered what would be a good design for that project.
I need a tokenizer and a parser, that much is obvious.
I thought about having a Token class and then create one instance for each token. Of course that would result in a lot of objects but that's no problem as long as I only compile small projects with it.
The advantage would be that my Token class can then have methods like isOperator() or isIdentifier(). Then I could have a TokenReader class and in the parser do something like
Token token=tokenReader.nextToken();
if(token.isIdentifier()) {
...
}
Do you think this is a bad idea?
Does someone have some better suggestions?
http://www.antlr.org http://www.javacc.org- Saish
Saisha at 2007-7-15 23:18:13 >

Excellent suggestions from Saish.
Parsing is just the first part of the equation.
The second part is code generation. You'll have to understand the JVM and its byte code very well.
How well do you understand compiler design? Do you have a copy of The Dragon Book? If not, go get one.
Google for other resources:
http://gcc.gnu.org/java/
http://www.thefreecountry.com/compilers/java.shtml
http://www.cs.cmu.edu/~jch/java/compilers.html
Maybe another idea is to start by writing a compiler for a simpler language than Java. Work your way up from a simple assembler. The Deitel and Deitel book has a fine example of how to do this. I've done one for PDP-8 assembly language. It was a good first step, because the instruction set is small and easy to understand.
%
Thanks for the suggestions. I will look into it.But what about my question?Is it considered a good design to create tons of objects in such a case?
If "tons" is thousands of objects (which I expect a largish source file might contain) then don't worry about it. A regular PC should be able to create and GC a hundred million smallish objects per second.
Another approach is not to have each token a field in a Lexer class. So you have something like
Lexer lexer = new Lexer(sourceFileName);
...
lexer.nextToken();
if (lexer.currentToken() == Lexer.IDENTIFIER) {
String identifier = lexer.currentTokenText();
....
Depends on what kind of a parser you use. If you use a ready-made parser generator it will have its own lexer interface and you don't get to choose. I like handcrafted recursive descent parsers if the source language allows it but that's just me.
Despite the time compiler construction books spend on the subject, lexing is and parsing are a relatively straightforward processes, and not a terribly big parts of a compiler. Especially if you write an optimizing compiler. And assuming a reasonably regular language (C++ == parsing nightmare).
javacc is a great way to go, because it already has a java grammar defined. you'll have to learn javacc and jtree well enough to create and manipulate the parse tree.%
> But what about my question?
> Is it considered a good design to create tons of objects in such a case?
If you're concerned about the number of tokens, have a look at the
flyweight pattern; hint: a simple ')' doesn't need a new object every
time that character is seen by the lexer.
kind regards,
Jos
JosAHa at 2007-7-15 23:18:13 >

I looked up the definition of flyweight pattern.
http://en.wikipedia.org/wiki/Flyweight_pattern
Judging from this Java is already using this pattern i.e. in Java you don't have pointers for the methods stored in the objects and the second thing mentioned on that website is usually true too throughout the API.
Anyway, I think I can pool all Token objects in a hashmap like the Sun JVM does with strings as long as I don't store specific things like the line number in them. Then the amount of objects will remain very small.
But I also think you are right about the number of objects, even a very large amount of objects shouldn't be a problem.
There is a new version Dragon book being realest any daynow. Its called '21st Century Compilers'.
parza at 2007-7-15 23:18:13 >

Hi,I think that you could see javacc or jlex and CUP. But if you need designed a compiler you must buy a book like "Modern Compiler Implementation in Java" or similar (Dragon book ).thanks