In this assignment you will add a checking pass to your compiler - an algorithm that traverses the syntax tree and decides if it is legal code.
For this assignment, you will be downloading a compressed "tar" (tape-archive) file which contains all the files you will need. Keep all the files in a single directory.
Once you've downloaded the compressed tar file, execute the following commands:
% gunzip pa4.tar.gz
% tar xf pa4.tar
This will create a directory called pa4 and will put in it all the files you need. You can then remove the pa4.tar file.
You'll notice that we have now moved to a structure where each Java class is in its own source file. This is because of the growing interdependence between classes. For example, in this project, the classes SymTable and Entry refer to Node, and most of the Node classes will refer to those classes.
There is only one part to this assignment; you don't have to do it using both the top-down and bottom-up parsers. We will continue with the top-down parser. Included in the tar file is a correct solution to Programming Assignment 3. It has some added features that will be discussed below.
There are two equally reasonable ways to proceed with the project: the procedural approach and the object-oriented approach. We will be using the OO approach. At the end of this document is a discussion of how these approaches differ.
The program you turn in should check an input Decaf program for the following properties:
class MyClass
{
    static int MyClass() { }      // Decaf error! (Java actually allows this.)
    static int f(int MyClass)     // Decaf error! (Java allows this too.)
    {
        int MyClass;              // Decaf error!
        int f;                    // Decaf error!
        int g;                    // This is OK...
        int g;                    // ...but this isn't.
        {                         // A nested block.
            int g;                // Decaf error! (Java error too.)
        }
    }
}
}
[...] Algebra.decaf file, isPrime() could call gcd().
[...] int or boolean. But since this is already checked by the Parser, you don't need to do anything about it here.
[...] return statement. This means there must be an unconditional return statement, or there must be an if statement that has an else part, and all branches of the if statement have return statements.
[...] int or boolean in the current scope or higher.
[...] int or boolean in the current scope or higher.
[...] if statement or the requirement of a while statement must be boolean.
[...] ArithmeticExpr node, both its left and right child must be expressions of integer type.
[...] ConditionalExpr node, both its left and right child must be expressions of boolean type.
[...] EqualityExpr node, its left and right child must have the same type, and that type must be integer or boolean.

That seems like a lot, but using a few basic ideas about symbol tables and stacks, it can be done relatively painlessly. Notice that once parsing is successful for an input program, your code should report all errors.
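The three expression rules above can be pictured as small functions over type codes. The following is only a sketch: the class name TypeRuleSketch, the constants INT, BOOLEAN and ERROR, and the method names are all made up for illustration and are not taken from the provided Type.java. It also assumes that arithmetic expressions produce an integer and that conditional and equality expressions produce a boolean.

```java
// Hypothetical sketch of the expression typing rules; not part of the handout code.
class TypeRuleSketch {
    static final int INT = 1;      // assumed stand-in for the integer type code
    static final int BOOLEAN = 2;  // assumed stand-in for the boolean type code
    static final int ERROR = -1;   // assumed marker for an ill-typed expression

    // ArithmeticExpr: both children must be integers; the result is an integer.
    static int arithmetic(int left, int right) {
        return (left == INT && right == INT) ? INT : ERROR;
    }

    // ConditionalExpr: both children must be boolean; the result is boolean.
    static int conditional(int left, int right) {
        return (left == BOOLEAN && right == BOOLEAN) ? BOOLEAN : ERROR;
    }

    // EqualityExpr: children must share one type, which must be int or boolean.
    static int equality(int left, int right) {
        boolean ok = left == right && (left == INT || left == BOOLEAN);
        return ok ? BOOLEAN : ERROR;
    }

    public static void main(String[] args) {
        System.out.println("int + int ok?      " + (arithmetic(INT, INT) != ERROR));
        System.out.println("int + boolean ok?  " + (arithmetic(INT, BOOLEAN) != ERROR));
        System.out.println("int == boolean ok? " + (equality(INT, BOOLEAN) != ERROR));
    }
}
```

In the real checker, each node would compute this in its own check() method and report an error instead of returning a marker value.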
Download the tar file and un-tar it as described above. Now execute the following instructions:
% javac *.java
% java Main Algebra.decaf
Both commands should return with no errors - in fact no messages of any kind.
OK, so what do you have to do? The actual code in Main.java looks like this:
Scanner S = new Scanner(args[0]);
Lexer L = new Lexer(S);
Parser P = new Parser(L);
ClassDecl CD = P.parseClassDecl();
CD.check(); // This is the only new line of code!
So we're sending the check message to the top node of the syntax tree created by the parser. You will be adding to each node class a method that checks that node for semantic correctness. In the case of the ClassDecl node, that method has been completed for you, and looks like this:
Checker.table.openScope();
name.register(this, Type.CLASS);
methods.registerNames();
methods.check();
Checker.table.closeScope();
First we tell the symbol table that we're entering a new scope. Then we tell it to register the name of the class itself. Now if any subsequent variables use this name, it will already be in the table and you should report an error. Then we register the names of each method in the symbol table. This is necessary because you can call a method that gets declared later in the file. Then we send the check message to the methods, and close the scope.
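To make the open/register/check/close sequence concrete, here is a small stand-alone sketch of a scoped table. It is not the provided SymTable.java: the class ScopeSketch, its use of plain int type codes, and the boolean result of register() are assumptions made for illustration. It follows the Decaf rule that a name may not be redeclared if it is visible in the current scope or any enclosing one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a scoped symbol table; NOT the provided SymTable.java.
class ScopeSketch {
    // One map per open scope, innermost scope at the front of the deque.
    private final Deque<Map<String, Integer>> scopes = new ArrayDeque<>();

    void openScope()  { scopes.push(new HashMap<>()); }
    void closeScope() { scopes.pop(); }  // forgets every name local to that scope

    // Returns the type code of the innermost visible declaration, or null.
    Integer lookup(String name) {
        for (Map<String, Integer> scope : scopes) {  // iterates innermost first
            Integer type = scope.get(name);
            if (type != null) return type;
        }
        return null;
    }

    // Registers a name in the current scope (a scope must already be open).
    // Returns false if the name is already visible, i.e. a redeclaration error.
    boolean register(String name, int type) {
        if (lookup(name) != null) return false;
        scopes.peek().put(name, type);
        return true;
    }

    public static void main(String[] args) {
        final int CLASS = 0, METHOD = 1, INT = 2;  // assumed type codes
        ScopeSketch table = new ScopeSketch();
        table.openScope();
        table.register("MyClass", CLASS);
        // Register every method name before checking any method body...
        table.register("isPrime", METHOD);
        table.register("gcd", METHOD);
        // ...so a call to gcd from inside isPrime resolves even though gcd comes later.
        System.out.println("gcd visible: " + (table.lookup("gcd") != null));
        table.openScope();
        System.out.println("first g ok: " + table.register("g", INT));
        System.out.println("second g ok: " + table.register("g", INT));
        System.out.println("MyClass as variable ok: " + table.register("MyClass", INT));
        table.closeScope();
        table.closeScope();
    }
}
```

Registering all method names before checking any body is what lets a method call another one that is declared later in the file.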
This is the pattern: for each type of node in the syntax tree, you check its children, and do some checking on the node itself. So to start the assignment, you now have to look at the code for MethodDecl. In fact, that class' check() method is also already finished for you. But you have to look at it to see what other check() methods it calls. Continuing this way you will eventually look at each node class and fill in its check() methods.
Some basic rules to follow when writing your checking code:
[...] check() method defined, but many of them need finishing. In every method body are comments telling you what you need to do there.
[...] FormalParams, MethodDecls, etc.) have already implemented the check() method to just pass on the call to each of the elements in their list. You don't need to add anything.
Expression has an extra field, called type. At the end of the check() method you set this field with a number from the Type.java file. After that you can use the value in this field to do type checking.
Statement defines a method returnExists() that returns the value false. You need to override this method in subclasses of Statement. This method is supposed to tell you whether that statement is, or contains, a return of the right return type. The right return type is given as an argument to the returnExists() method when it is first called in the check() method of the MethodDecl class.
[...] Id node in a declaration, as the name of something, you have to call its register() method. Later, when you see an Id in an expression, you instead call its check() method. The first one wants there not to be a matching entry already in the symbol table; the second one wants there to be one.

You are also provided with various helper classes. For convenience while browsing this HTML document, the source code for these classes exists in an un-tarred form. But don't download these classes from the links; you already get them when you download the tar file.
Type.java just contains some integer definitions that we will use to describe the types of variables and expressions in Decaf code.
SymTable.java contains a simple symbol table class. The table keeps a stack of entries; that is, it stores them in last-in first-out order. This corresponds to the nesting of scopes in a Decaf program. As you traverse the syntax tree of your Decaf program, each time you find a new method or variable declaration you enter it into the symbol table. When you enter a new scope, you call the openScope() method of the symbol table. When you leave a scope, you call closeScope(), which pops from the stack all the variable names local to that scope. If you want to know if a name has been declared, and what its node is, you call the lookup() method of the table, which returns the entry if it exists, or null if not. This table helps you check most of the conditions described above.
Entry.java describes the structure of a single entry in the symbol table. These are the things that get pushed and popped from the table. An entry keeps track of the spelling of an identifier, the Node where it is declared, and its type (a number from Type.java). If the identifier is the name of a method, then the returntype field stores its return type.
Checker.java is just a place to keep what might be globals if we were working in C or C++. The class keeps our symbol table, and provides a few methods for formatting error messages.

There are a few additions to the Parser we built in the previous assignment. The most significant is that Nodes now have a field that keeps their range (location in the source program). This makes the Checker's error messages a lot more useful. To fill these values in, the Parser has to do some more work too, saving the locations of Tokens as it parses.
There is also a new method in the NodeList class called length() that tells how many elements are in the list. This will be useful when you have to count parameters in parameter lists.
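The returnExists() rule described earlier (an unconditional return, or an if with an else part where every branch returns) could be sketched roughly as follows. The classes Stmt, Return, If, Block and Other are simplified, hypothetical stand-ins for the real node classes, and the int type codes are assumed. Treating a block as returning when any statement inside it returns is also an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of returnExists(); the real node classes differ.
class ReturnSketch {
    static final int INT = 1, BOOLEAN = 2;  // assumed type codes

    // Default: a statement does not contain a return.
    static class Stmt {
        boolean returnExists(int wanted) { return false; }
    }

    // A return statement counts only if its expression has the wanted type.
    static class Return extends Stmt {
        final int type;
        Return(int type) { this.type = type; }
        @Override boolean returnExists(int wanted) { return type == wanted; }
    }

    // An if statement counts only if it has an else part and both branches return.
    static class If extends Stmt {
        final Stmt thenPart, elsePart;  // elsePart is null when there is no else
        If(Stmt thenPart, Stmt elsePart) { this.thenPart = thenPart; this.elsePart = elsePart; }
        @Override boolean returnExists(int wanted) {
            return elsePart != null
                && thenPart.returnExists(wanted)
                && elsePart.returnExists(wanted);
        }
    }

    // A block counts if any statement directly inside it counts (assumption).
    static class Block extends Stmt {
        final List<Stmt> body;
        Block(Stmt... stmts) { body = Arrays.asList(stmts); }
        @Override boolean returnExists(int wanted) {
            for (Stmt s : body) if (s.returnExists(wanted)) return true;
            return false;
        }
    }

    // Any other statement (assignment, call, ...) keeps the default answer.
    static class Other extends Stmt { }

    public static void main(String[] args) {
        Stmt noElse = new Block(new Other(), new If(new Return(INT), null));
        Stmt withElse = new Block(new Other(), new If(new Return(INT), new Return(INT)));
        System.out.println("if without else returns: " + noElse.returnExists(INT));
        System.out.println("if with else returns: " + withElse.returnExists(INT));
    }
}
```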
The following argument compares the procedural and OO programming styles. You do not have to read it to do the project. It should give you a good insight on different ways of implementing a type checker.
The procedural approach resembles closely things we've done before. (Recall that we won't be using this way; it's just for illustration.) You would use a framework just like the one used in the top-down parser. That is, you would have a class called Checker, which has methods like checkClassDecl(), checkReturnStmt(), checkEqualityExpr(), and so forth. You could almost just replace every parseXXX() method in the parser with a checkXXX method in the checker. It would look something like this:
class Checker
{
    SymTable table;
    // ... lots of methods omitted.
    void checkMethodDecl(MethodDecl md)
    {
        table.openScope();
        checkFormalParams(md.params);
        checkStatement(md.body);
        table.closeScope();
    }
    void checkMethodDecls(MethodDecls mds)
    {
        for (MethodDecl md = mds.first; md != null; md = md.next)
            checkMethodDecl(md);
    }
    void checkClassDecl(ClassDecl cd)
    {
        table.openScope();
        registerName(cd.name, cd, Type.CLASS);
        checkMethodDecls(cd.methods);
        table.closeScope();
    }
}
At each step you just check each part of the node you're passed as a parameter. So what's the drawback? Consider the routine for checking statements:
void checkStatement(Statement s)
{
    if (s instanceof Block)
        checkBlock((Block) s);
    else if (s instanceof LocalVarDecl)
        checkLocalVarDecl((LocalVarDecl) s);
    else ... etc. ... four more cases
}
Whenever you see code like this, alarm bells should go off in your head. This is the sort of thing that polymorphism is supposed to take care of in object-oriented programming. Using that approach, you define a check() method for each type of syntax node. In other words, each type of node knows how to check itself. Then when you have a statement s that you want to check, you just say
s.check();
That takes the thirteen lines of code in checkStatement() and reduces them to one line. At runtime, when this call is made, the appropriate sort of checking will be done for each type of node, even though at compile time all you know is that s is some sort of Statement.
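A minimal, self-contained sketch of this dispatch is shown below. The node classes here are stripped-down, hypothetical versions (the real Block and LocalVarDecl do actual checking), and the log list exists only so the order of calls can be observed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of polymorphic check(); the real node classes do real checking.
class DispatchSketch {
    // Records which check() bodies ran, purely for demonstration.
    static final List<String> log = new ArrayList<>();

    static abstract class Statement {
        abstract void check();  // each kind of node knows how to check itself
    }

    static class Block extends Statement {
        final List<Statement> body = new ArrayList<>();
        @Override void check() {
            log.add("Block");
            for (Statement s : body) s.check();  // no instanceof needed
        }
    }

    static class LocalVarDecl extends Statement {
        @Override void check() { log.add("LocalVarDecl"); }
    }

    public static void main(String[] args) {
        Block b = new Block();
        b.body.add(new LocalVarDecl());
        Statement s = b;  // at compile time, just "some sort of Statement"
        s.check();        // at runtime, Block.check() is selected
        System.out.println(log);  // prints [Block, LocalVarDecl]
    }
}
```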
Comparing the tradeoffs between the two approaches:
[...] instanceof operations.

The problem with node classes growing in size can be solved by using the Visitor design pattern. But that's for a more advanced class.