CS160: Project 3 - Abstract Syntax Tree for CSimple (20% of project score)
Project Goals
The goals of this project are:
- to build an abstract syntax tree (AST)
- to become familiar with the project's AST classes and with bison actions
Administrative Information
This is an individual project.
The project is due on Thursday, November 16, 2023, 23:59:59 PST.
Project Introduction
At this point, you should have a parser for CSimple programs! To
make our future tasks (such as semantic analysis and code
generation) easier, we want to build an abstract syntax tree
(AST). The AST resembles a parse tree, but it keeps only the
essential nodes; redundant parts are removed.
To make your life easier, we have included all the support
classes you will need to build the AST. Your task is to read
the files that define them THOROUGHLY and to understand them. In
particular, the classes for the AST nodes are in ast.hpp and
ast.cpp.
Tour of the Code
Again, we provide code that you should use as a starting point
for your project. You can find the
files here.
- Makefile - Your makefile. You don't need to edit this file.
- main.cpp - The main C++ file. You don't need to edit this
file. The difference from the previous project is that main
invokes the ast2dot visitor. The task of this visitor class is
to traverse your AST and produce a dot file that allows us to
draw the tree that you have generated.
- ast2dot.cpp - This is a VISITOR class that we have included.
This visitor walks through the AST and prints out a .dot file,
which we can use to draw the tree (just like in Project 1). Do
not modify this file.
- primitive.hpp and primitive.cpp - These files contain the
definitions of two special classes. One class (called
Primitive) stores the integer values of all the simple literals
(character, integer, and boolean). The other class (called
StringPrimitive) stores the values of string literals. These
classes are used in the constructors of a few AST classes. You
don't need to edit these files.
- symtab.hpp and symtab.cpp - These files contain the code for
the symbol table. The symbol table is not used in this project,
but you will need the SymName class to store symbol names
(e.g., variable names, function names, ...). For example, when
encountering an identifier in your parser, you might need
something similar to new Ident(new SymName($1.u_base_charptr));
in your bison action. Do not modify these files.
- lexer.l - The flex file that contains the regular expressions
for recognizing your tokens. As a first step, edit this file and
add the expressions that you wrote for Project 2.
- parser.ypp - The bison file that contains your grammar
rules/productions. As a first step, copy into this file the
grammar that you developed for Project 2. Also make sure that
this file includes "symtab.hpp", which you will need in the
later steps.
- ast.cdef - The file that defines the classes of the AST data
structure. We use this file to generate the classes of the AST.
If you look at this file carefully, you will see that it closely
resembles your grammar (or at least it should). Do not modify
this file.
- astbuilder.gawk - The script that generates the AST classes
(using ast.cdef as input). You don't really need to know this
file in depth. All you need to know is that this file creates
the AST classes when you do a "make ast". Do not modify this
file. In some cases (when you get an error after typing make),
you need to make this file executable (chmod u+x
astbuilder.gawk) after unpacking our files. Note that we
already supply the necessary AST classes, but if you want, you
can rebuild them from scratch with this script.
Steps to Solve the Challenge
- First, let's have a look at the AST classes in more detail.
Open ast.hpp and locate the YYSTYPE data structure, which
defines the union of all types that bison can pass between
nodes in the parse tree. Also, look at some of the class
definitions, such as ProcImpl, Assignment, or Plus. Check the
member variables of these classes. You will, for example,
notice that ProcImpl looks like this:
class ProcImpl : public Proc
{
public:
    SymName *m_symname;
    list<Decl_ptr> *m_decl_list;
    Type *m_type;
    Procedure_block *m_procedure_block;
    // ... (other members omitted)
    ProcImpl(SymName *p1, list<Decl_ptr> *p2, Type *p3, Procedure_block *p4);
    // ... (other members omitted)
};
This means that the AST node that represents a procedure has
four children: one that points to the name of the procedure
(SymName *m_symname), one for the variable declarations
(list<Decl_ptr> *m_decl_list), one for the return type
(Type *m_type), and one for the body of the procedure
(Procedure_block *m_procedure_block). Furthermore, observe that
this matches the constructor of the class. That is, to create
an AST node that represents a procedure, you first need four
other AST nodes that represent a name, a list of variable
declarations, a return type, and the body of the procedure.
Once these four nodes are available, you can create a new
ProcImpl node, which represents the procedure in the source
code, by simply invoking the constructor with the right
arguments.
- Understand the AST and all the classes that compose it. Really
understand it. Read them thoroughly. Do not take this part lightly.
- Once you understand the AST classes, it is time to build your
AST. We have said previously that the AST should probably be
built from the bottom up, since the constructor of an AST class
typically requires as arguments the pointers to the objects
that represent the child nodes in the tree. This fits very
well with the way that bison builds a parse tree, which is from
the bottom up. Thus, to build an abstract syntax tree, you have
to augment your grammar (in the file parser.ypp) with
actions that build the abstract syntax tree. A bison action
is a snippet of code that is invoked when the parser
reduces by a production, that is, when it replaces the
production's right-hand side with its left-hand side.
You will find that you can create the necessary AST nodes and
"pass" them up the tree as you parse the stream of tokens
coming from the scanner. To this end, bison offers special
variables: $$ and $1 ... $N. $$ is the value of the rule's
left-hand side (the value the rule "returns"). $1 through $N
are the values associated with each terminal/non-terminal on
the right-hand side of the rule, from left to right. For
example:
A : B '+' C { $$ = $1 + $3; }
This little chunk of code (the action) takes whatever was passed
up through B and C ($1 and $3; $2 is the '+' token), adds the
two values, and associates the result with A. When A later
appears on the right-hand side of a rule higher up in the
grammar, that value is available there through the
corresponding $k. In the same way, you can create your AST
objects, since many of their constructors take pointers to AST
objects built lower in the tree.
- For this assignment, you should build your AST slowly. DO NOT
try to do everything at once. Somewhat non-intuitively, it
might be easier to work from the top down (although the final
AST will be built bottom-up). That is, you should first add AST
nodes to your grammar that are high up in the tree (such
as ProgramImpl and ProcImpl). Why is that?
Because you can then immediately and easily start to print out
the top parts of the AST, using the ast2dot visitor (which
starts at the root node of the tree). For example, when you
have added code to instantiate the classes ProgramImpl and
ProcImpl, you should see an abstract syntax tree that has one
root node (ProgramImpl) and, for each procedure, one ProcImpl
child. Note that many AST classes require arguments for their
constructors. When you start top-down, these arguments might
not be available yet (since you haven't added the necessary
code to the actions of all grammar rules). In this case, just
pass a null pointer to the constructor of the class. However,
you then have to modify the class's visit_children method to
make sure that it does not dereference a null pointer and crash
the program. Of course, if you want, you can also start building
your AST in a bottom-up fashion.
- Besides adding actions to your grammar (in parser.ypp), you
also need to make small changes to your scanner (in lexer.l).
More specifically, you need to pass the values (i.e., the
content) of your primitives (integers, chars, booleans, and
strings) and IDs up to the parser. To do this, you must assign
each value (the content of an ID or a primitive) to one of the
yylval union members: yylval.u_base_int (of type int) or
yylval.u_base_charptr (of type char*). To get the values, read
yytext, the character array that holds the text matched by the
current rule. Boolean values should be given an int value: 0
for false and 1 for true. For characters, use their ASCII
values.
What Your Compiler Has to Do!
- Your compiler must successfully parse any valid input file.
- Your compiler must generate the correct AST. If you have the
correct associativity and precedence, your tree (the .pdf file
generated from the .dot file that ast2dot produces) will look
exactly like ours. In main.cpp, you will see that we call a
function ast2dot(). It walks through the AST you just generated
and prints out a .dot file.
- We have made available a few test files (test1.lang,
test2.lang, and test3.lang) and the corresponding trees, as
PDF, produced by our compiler (test1.pdf, test2.pdf, and
test3.pdf). Please check your program's results against these
outputs. If you find a discrepancy, first convince yourself that
your tree is correct, and then please let us know so that we can
fix potential bugs in our reference implementation.
Deliverables
Like for the previous project, we are using Gradescope (and its
auto-grader feature) to grade this assignment and your submissions.
- Once you are done with your scanner/parser, go to the third assignment and submit your code.
- For this project, please just submit your "lexer.l" and "parser.ypp" files. We supply the rest and build your project.
- We do not show you the test cases and the expected output, but you should get some feedback about the types of tests that your submission passes and where it fails.
- You can make a new submission once every hour. Make sure you thoroughly test your program locally, and don't (ab)use the auto-grader as a test harness.