CS160: Project 3 - Abstract Syntax Tree for CSimple (20% of project score)

Project Goals


The goals of this project are:

  • to build an abstract syntax tree (AST)
  • to get familiar with the AST classes of the project and bison's actions

Administrative Information


This is an individual project.

The project is due on Thursday, November 16, 2023, 23:59:59 PST.

Project Introduction


At this point, you should have a parser for CSimple programs! To make our future tasks (such as semantic analysis and code generation) easier, we want to build an abstract syntax tree (AST). The AST resembles a parse tree, but it contains only the essential nodes; redundant parts are removed.

To make your life easier, we have included all the support classes you will need to build the AST. Your task is to read these files THOROUGHLY and to understand them. In particular, the classes for the AST nodes are defined in ast.hpp and ast.cpp.

Tour of the Code


Again, we provide code that you should use as a starting point for your project. You can find the files here.

  • Makefile - Your makefile. You don't need to edit this file.
  • main.cpp - The main C++ file. You don't need to edit this file. The difference from the previous project is that main invokes the ast2dot visitor. The task of this visitor class is to traverse your AST and produce a dot file that allows us to draw the tree that you have generated.
  • ast2dot.cpp - This is a VISITOR class that we have included. This visitor walks through the AST and prints out a .dot file, which we can use to generate a tree (just like in Project 1). Do not modify this file.
  • primitive.hpp and primitive.cpp - These files contain the definition of two special classes. One class (called Primitive) stores the integer values of all the simple literals (character, integer, and boolean). The other class (called StringPrimitive) stores the values of string literals. These classes are used in the constructor of a few AST classes. You don't need to edit these files.
  • symtab.hpp and symtab.cpp - These files contain the code for the symbol table. The symbol table is not used in this project, but you will need the SymName class to store the names of symbols (e.g., variable names, function names, ...). For example, when encountering an identifier in your parser, you might need something similar to new Ident(new SymName($1.u_base_charptr)); in your bison action. Do not modify these files.
  • lexer.l - The flex file that contains the regular expressions for recognizing your tokens. As a first step, edit this file and add the expressions that you wrote for Project 2.
  • parser.ypp - The bison file that contains your grammar rules/productions. As a first step, copy the grammar that you developed for Project 2 into this file. Also make sure that you include "symtab.hpp", which you will need later.
  • ast.cdef - The file that defines the classes of the AST data structure. We use this file to generate the classes of the AST. If you look at this file carefully, you will see that it closely resembles your grammar (or it should at least). Do not modify this file.
  • astbuilder.gawk - The script that generates the AST classes (using ast.cdef as input). You don't really need to know this file in depth. All you need to know is that this file creates the AST classes when you do a "make ast". Do not modify this file. In some cases (when you get an error after typing make), you need to make this file executable (chmod u+x astbuilder.gawk) after unpacking our files. Note that we already supply the necessary AST classes, but if you want, you can rebuild them from scratch with this script.

Steps to Solve the Challenge


  1. First, let's have a look at the AST classes in more detail. Open ast.hpp and locate the YYSTYPE data structure, which defines the union of all types that bison can pass between nodes in the parse tree. Also, look at some of the class definitions such as ProcImpl, Assignment, or Plus. Check the member variables of these classes. You will, for example, notice that ProcImpl looks like this:
       class ProcImpl : public Proc
       {
         public:
           SymName *m_symname;
           list<Decl_ptr> *m_decl_list;
           Type *m_type;
           Procedure_block *m_procedure_block;
           // ...
           ProcImpl(SymName *p1, list<Decl_ptr> *p2, Type *p3, Procedure_block *p4);
           // ...
       };
    
    This means that the AST node that represents a procedure has four children, one that points to the name of the procedure (SymName *m_symname), one for the variable declarations (list<Decl_ptr> *m_decl_list), one for the return type (Type *m_type), and one for the body of the procedure (Procedure_block *m_procedure_block). Furthermore, observe that this fits nicely together with the constructor of this class. That is, when you want to create an AST node that represents a procedure, you first need four other AST nodes that represent a name, a list of variable declarations, a return type, and the body of the procedure. When these four nodes are available, you can create a new AST ProcImpl node by simply invoking the constructor and passing the right arguments. This AST node represents a procedure in the source code.
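
The construction order described in this step can be sketched in plain C++. This is only an illustration: every class below is a simplified, hypothetical stand-in (the real definitions are in ast.hpp and symtab.hpp and have more members), and make_example_proc is a made-up helper name.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-ins for the course classes; the real definitions
// live in ast.hpp and symtab.hpp and have more members than shown here.
struct SymName {
    std::string name;
    explicit SymName(const char *s) : name(s) {}
};
struct Decl {};
typedef Decl *Decl_ptr;
struct Type {};
struct Procedure_block {};

struct ProcImpl {
    SymName *m_symname;
    std::list<Decl_ptr> *m_decl_list;
    Type *m_type;
    Procedure_block *m_procedure_block;

    ProcImpl(SymName *p1, std::list<Decl_ptr> *p2, Type *p3, Procedure_block *p4)
        : m_symname(p1), m_decl_list(p2), m_type(p3), m_procedure_block(p4) {}
};

// Build the four children first, then hand them to the parent constructor.
// The nodes are deliberately not freed here: in this sketch the tree owns them.
ProcImpl *make_example_proc() {
    SymName *name = new SymName("main");
    std::list<Decl_ptr> *decls = new std::list<Decl_ptr>();
    decls->push_back(new Decl());
    Type *ret = new Type();
    Procedure_block *body = new Procedure_block();
    return new ProcImpl(name, decls, ret, body);
}
```

In parser.ypp the same pattern would appear inside an action, with the children coming from the values of the right-hand-side symbols instead of being created by hand.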

  2. Understand the AST and all the classes that compose it. Really understand it. Read them thoroughly. Do not take this part lightly.

  3. Once you understand the AST classes, it is time to build your AST. We have said previously that the AST should probably be built from the bottom up, since the constructor of an AST class typically requires as arguments the pointers to the objects that represent the child nodes in the tree. This fits very well with the way that bison builds a parse tree, which is from the bottom up. Thus, to build an abstract syntax tree, you have to augment your grammar (in the file parser.ypp) with actions that build the abstract syntax tree. A bison action is a snippet of code that is invoked when the parser reduces a right-hand side of a production.

    You can create the necessary AST nodes and "pass" them up the tree as you parse the stream of tokens coming from the scanner. To this end, bison offers special variables: $$ and $1 ... $N. $$ is the semantic value of the non-terminal on the left-hand side of the rule. $1 ... $N are the values associated with each terminal/non-terminal on the right-hand side of the rule, counted from left to right. For example:
         A : B '+' C { $$ = $1 + $3; }
    
    This little chunk of code (the action) takes whatever was passed up through B ($1) and C ($3), adds them, and associates the sum with A. Note that $2 refers to the '+' token and is not used. If A is used again (higher up the tree/grammar), then that value will be attached to A. As you can see, this allows you to create your AST objects, since many of them require other, lower AST objects as constructor arguments.
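
To make the role of $$, $1, and $3 more concrete, here is a small C++ model of what happens to the parser's value stack when the example rule is reduced. It is a hypothetical simplification written for this handout, not bison's actual implementation, and reduce_plus is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the parser's value stack for the rule
//   A : B '+' C { $$ = $1 + $3; }
// Each entry is the semantic value of one grammar symbol.
int reduce_plus(std::vector<int> &values) {
    assert(values.size() >= 3);
    const std::size_t n = values.size();
    int d1 = values[n - 3];    // $1: value of B
    // values[n - 2] is $2, the '+' token; its value is unused here
    int d3 = values[n - 1];    // $3: value of C
    int result = d1 + d3;      // $$ = $1 + $3
    values.resize(n - 3);      // pop the three right-hand-side values
    values.push_back(result);  // push $$ as the value of A
    return result;
}
```

In your parser the values are usually pointers to AST nodes rather than integers, so a typical action constructs a new node from $1 and $3 and stores it in $$ instead of adding numbers.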

  4. For this assignment, you should build your AST slowly. DO NOT try to do everything at once. Somewhat non-intuitively, it might be easier to work from the top down (although the final AST will be built bottom-up). That is, you should first add AST nodes to your grammar that are high up in the tree (such as ProgramImpl and ProcImpl). Why is that? Because you can then immediately and easily start to print out the top parts of the AST, using the ast2dot visitor (which starts at the root node of the tree). For example, when you have added code to instantiate the classes ProgramImpl and ProcImpl, you should see an abstract syntax tree that has one root node (ProgramImpl) and, for each procedure, one ProcImpl child. Note that many AST classes require arguments for their constructors. When you start top-down, these arguments might not be available yet (since you haven't added the necessary code to the actions of all grammar rules). In this case, just pass a null pointer to the constructor of the class. However, you then have to modify the class's visit_children method to make sure that it does not dereference a null pointer and crash the program. Of course, if you want, you can also start building your AST in a bottom-up fashion.
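
A null-safe visit_children could look roughly like the sketch below. The node and visitor types are hypothetical and heavily simplified (the real ones in ast.hpp use different names and signatures); only the null checks are the point.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical, simplified node and visitor types; the real ones are in
// ast.hpp and use different names and signatures.
struct CountingVisitor;

struct Node {
    virtual ~Node() {}
    virtual void accept(CountingVisitor *v) = 0;
};

struct CountingVisitor {
    int visited = 0;
    void visit(Node *) { ++visited; }
};

struct Leaf : Node {
    void accept(CountingVisitor *v) override { v->visit(this); }
};

struct ProcNode : Node {
    Node *m_type;                    // may be NULL while the grammar is incomplete
    std::list<Node *> *m_decl_list;  // may be NULL as well

    ProcNode(Node *type, std::list<Node *> *decls)
        : m_type(type), m_decl_list(decls) {}

    void accept(CountingVisitor *v) override {
        v->visit(this);
        visit_children(v);
    }

    // Visit only the children that have actually been built.
    void visit_children(CountingVisitor *v) {
        if (m_type != NULL) m_type->accept(v);
        if (m_decl_list != NULL) {
            for (Node *d : *m_decl_list) {
                if (d != NULL) d->accept(v);
            }
        }
    }
};
```

With these checks, a node created as ProcNode(NULL, NULL) can be visited without crashing, and the missing children simply do not appear in the output.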

  5. Besides adding actions to your grammar (in parser.ypp), you also need to make small changes to your scanner (in lexer.l). More specifically, you need to pass the values (i.e., the content) of your primitives (integers, chars, booleans, and strings) and IDs up to the parser. To do this, you must assign the value (the content of an ID or a primitive) to one of the yylval union members: yylval.u_base_int, which is of integer type, or yylval.u_base_charptr, which is of type char*. To get the value, read yytext, the character array that holds the text matched by the current rule. Boolean values should be given an int value: 0 for false and 1 for true. For characters, use their ASCII values.
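
The conversions from the matched text to a value can be sketched as small C++ helpers. The helper names are hypothetical, and the literal formats are assumptions (decimal integers, the keywords "true" and "false", character literals of the form 'a' without escapes); adapt them to the token definitions you wrote for Project 2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helpers; in lexer.l the same expressions would sit in the
// rule actions and assign to yylval.u_base_int or yylval.u_base_charptr.

// ASSUMPTION: a decimal integer literal such as "42".
int int_value(const char *text) {
    return static_cast<int>(std::strtol(text, NULL, 10));
}

// Boolean literal: 1 for "true", 0 for anything else.
// ASSUMPTION: the keywords are spelled "true" and "false".
int bool_value(const char *text) {
    return std::strcmp(text, "true") == 0 ? 1 : 0;
}

// Character literal such as "'a'": the ASCII code of the middle character.
// ASSUMPTION: exactly one character between single quotes, no escapes.
int char_value(const char *text) {
    assert(std::strlen(text) == 3 && text[0] == '\'' && text[2] == '\'');
    return static_cast<unsigned char>(text[1]);
}

// Identifier: copy the lexeme, because yytext is overwritten by the next match.
char *ident_value(const char *text) {
    char *copy = static_cast<char *>(std::malloc(std::strlen(text) + 1));
    assert(copy != NULL);
    std::strcpy(copy, text);
    return copy;
}
```

In a scanner rule, the argument would be yytext and the result would be stored in the matching yylval member before the token is returned. Copying identifiers matters because the parser may read the value only after the scanner has already moved on to the next token.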

What Your Compiler Has to Do!


  1. Your compiler must successfully parse any valid input file.
  2. Your compiler must generate the correct AST. If you have the correct associativity and precedence, your tree (the .pdf file generated from the .dot file, produced by ast2dot) will look exactly like ours. In main.cpp, you will see that we call a function ast2dot(). This will walk through the AST you just generated and print out a .dot file.
  3. We have made available a few test files (test1.lang, test2.lang, and test3.lang) and the corresponding trees - as PDF - produced by our compiler (test1.pdf, test2.pdf, and test3.pdf). Please check your program results against these outputs. If you find that there is a discrepancy, convince yourself that your tree is correct first, and then please let us know so that we can fix potential bugs in our reference implementation.

Deliverables


As with the previous project, we are using Gradescope (and its auto-grader feature) to grade this assignment and your submissions.

  1. Once you are done with your scanner/parser, go to the third assignment and submit your code.
  2. For this project, please just submit your "lexer.l" and "parser.ypp" files. We supply the rest and build your project.
  3. We do not show you the test cases and the expected output, but you should get some feedback about the types of tests that your submission passes and where it fails.
  4. You can make a new submission once every hour. Make sure you thoroughly test your program locally, and don't (ab)use the auto-grader as a test harness.