Project Goals
The goals of this project are:
- to learn how to write a simple parser
- to develop a simple calculator
Administrative Information
This is an individual project.
The project is due on Tuesday, October 24, 2023, 23:59:59 PST. There will be no deadline extensions.
Project Introduction
In this assignment, we are developing a simple calculator. The
goal of this project is to demystify the front-end work
that a compiler does. Everything you need is right
in calc.cpp: a single file contains it all. The other files are
just there to help you develop and test your calculator. We have not
used any complex tools to do things automatically; everything is
in very vanilla C++, and the code should be fully commented and
pretty readable. You _need_ to know what everything
in calc.cpp does before you can start writing
code. There should be a "WRITEME" at every place in the code that
needs your assistance. However, the version that we provide
to start with will compile and even run (although it does not
actually scan a file, and the grammar it parses is trivial). The
grammar we want our simple calculator to recognize is as
follows:
List -> List Expr T_period
      | Expr T_period
Expr -> Expr T_times Expr
      | Expr T_plus Expr
      | Expr T_minus Expr
      | T_minus Expr
      | T_num
      | T_openparen Expr T_closeparen
      | T_bar Expr T_bar
While the grammar above is ambiguous (in the technical sense) in
many ways, it will recognize any program written in our
calc language (in other words, it can tell whether a program
is syntactically correct or not). You will need to change the
grammar so that it recognizes the same language but is no
longer ambiguous. Furthermore, it should be LL(1) so that we can
parse it with recursive descent. Of course, your parser must also
correctly handle associativity and precedence. Associativity
in particular can be a little tricky.
Tour of the Code
The tarball with a skeleton of calc.cpp (and some
additional files) can be
downloaded here. You
will see that the code in calc.cpp is divided into four
major parts and three classes. The first part contains some
enums and helper functions to aid in dealing with tokens,
non-terminals, and printing out all that stuff. Once you figure
out the grammar that you are going to use, it should be pretty
straightforward to add the other non-terminals into the
appropriate enum and helper functions.
The second chunk of code is the scanner (which is the C++ class
scanner_t). The scanner should handle reading
the input from stdin and identifying the appropriate tokens. The
interface listed should be supported because that is how the
recursive descent parser will actually be getting tokens. It does
not have to be a big state machine or a regular expression
... just something that is coded by you and that works. Make
sure you test your scanner well before you move on. The last
thing you want to be doing is trying to track down a weird
scanner problem in your parser. In addition to finding the
tokens, you should also keep track of newlines so that you can
find any syntax errors your parser says it finds (it does this
by calling get_line(), which should return the
number of the line of input that is now being scanned). The code
that is in there to start is just stub code so that everything
compiles and you actually get some visible output (the scanner
is just returning either a "+" or "eof" randomly). For the basic
part of the assignment, you do not need to handle any attributes
(such as the actual value of a number token).
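To make the scanner description more concrete, here is a minimal sketch of a hand-written scanner that skips whitespace, counts newlines, and classifies one token per call. It is not the scanner_t interface from calc.cpp: the names sketch_scanner, tok, next() and scan_all() are hypothetical, the mapping of characters such as '*' and '.' to tokens is an assumption, and the sketch reads from a std::istream rather than stdin so that it can be tried on strings.

```cpp
#include <cctype>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token names; the enum in calc.cpp may use different ones.
enum class tok { num, plus, minus, times, period, openparen, closeparen, bar, eof, error };

// Hypothetical stand-in for scanner_t. It reads from any std::istream so it
// can be tried on strings; the real scanner reads from stdin.
class sketch_scanner {
public:
    explicit sketch_scanner(std::istream& in) : in_(in) {}

    // Skips whitespace (counting newlines), then consumes and returns one token.
    tok next() {
        int c = in_.get();
        while (c != EOF && std::isspace(c)) {
            if (c == '\n') ++line_;
            c = in_.get();
        }
        if (c == EOF) return tok::eof;
        if (std::isdigit(c)) {
            // Consume the rest of the number; its value is ignored here.
            while (std::isdigit(in_.peek())) in_.get();
            return tok::num;
        }
        switch (c) {  // assumed character-to-token mapping
            case '+': return tok::plus;
            case '-': return tok::minus;
            case '*': return tok::times;
            case '.': return tok::period;
            case '(': return tok::openparen;
            case ')': return tok::closeparen;
            case '|': return tok::bar;
            default:  return tok::error;
        }
    }

    // Line number (starting at 1) of the input scanned so far.
    int get_line() const { return line_; }

private:
    std::istream& in_;
    int line_ = 1;
};

// Convenience helper for experiments: scan a whole string into a token list,
// stopping after the first eof or error token.
std::vector<tok> scan_all(const std::string& text) {
    std::istringstream in(text);
    sketch_scanner scanner(in);
    std::vector<tok> tokens;
    for (tok t = scanner.next(); ; t = scanner.next()) {
        tokens.push_back(t);
        if (t == tok::eof || t == tok::error) break;
    }
    return tokens;
}
```

Calling scan_all("1 + 2.") on this sketch yields a num, a plus, a num, a period and an eof token. The sketch merges looking at a token and consuming it into a single call; the interface in calc.cpp may separate those two operations.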
The third chunk of code is the class that draws a parse tree
(called parsetree_t). The class parsetree_t
need not and should not be modified. All it does is print out
the parse tree as you discover it. It prints the tree out in a
format readable by the program "dot" (from the graphviz
package), which can turn it into a PDF file (the makefile shows
how to do that). The output is really a bunch of nodes and
edges. When you start processing a non-terminal, you push it on
a stack (this will draw an edge from that newly pushed node to
the current node on the top of the stack, which is its
parent). When you finish, just pop it. The parse tree that you
generate should be super helpful for debugging purposes.
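The push/pop idea can be illustrated with a small stand-alone sketch. It is not the parsetree_t class from calc.cpp, whose interface and exact output format are not reproduced here: the class name sketch_tree, its methods, the node names n0, n1, ... and the choice to draw edges from parent to child are all assumptions made for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical miniature of the idea behind parsetree_t: a stack of open
// nodes, with one dot edge emitted per push.
class sketch_tree {
public:
    sketch_tree() { out_ << "digraph G {\n"; }

    // Start a grammar symbol: declare a node, connect it to the node on top
    // of the stack (its parent, if any), then make it the new top.
    void push(const std::string& label) {
        int id = next_id_++;
        out_ << "  n" << id << " [label=\"" << label << "\"];\n";
        if (!stack_.empty()) {
            out_ << "  n" << stack_.back() << " -> n" << id << ";\n";
        }
        stack_.push_back(id);
    }

    // Finish the symbol that is currently on top of the stack.
    void pop() {
        assert(!stack_.empty());
        stack_.pop_back();
    }

    // The graph collected so far, as text in dot syntax.
    std::string dot() const { return out_.str() + "}\n"; }

private:
    std::ostringstream out_;
    std::vector<int> stack_;
    int next_id_ = 0;
};
```

Pushing "List", then pushing and popping "Expr" and "T_period" in turn, produces a graph in which both children hang off the List node, because List is still on top of the stack when each child is pushed.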
The fourth chunk is the parser itself
(class parser_t). There are already some helper
methods provided for you, but you need to figure out how to
structure your grammar such that it can be written as a
recursive descent parser (more on that later). The code that is
there now will parse the grammar "List -> '+' List | EOF". If
you run parse(), it will call List(),
which recursively calls itself. As List() is
executed, it calls the scanner to get new tokens, and it
calls parsetree_t to actually print out the
parse tree.
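As a rough illustration of how one grammar rule becomes one function, the following sketch parses the starter grammar "List -> '+' List | EOF". It is not the code in calc.cpp: the class name sketch_parser is hypothetical, tokens are simply the characters of a string, reaching the end of the string plays the role of EOF, and no parse tree is printed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical sketch of a recursive descent parser for
//   List -> '+' List | EOF
class sketch_parser {
public:
    explicit sketch_parser(std::string input) : input_(std::move(input)) {}

    // Returns how many '+' tokens were consumed; throws on a syntax error.
    int parse() { return List(); }

private:
    int List() {
        if (pos_ == input_.size()) {
            return 0;                 // EOF alternative
        }
        if (input_[pos_] == '+') {    // '+' List alternative
            ++pos_;                   // eat the '+'
            return 1 + List();        // then parse the rest as a List
        }
        throw std::runtime_error("syntax error at offset " + std::to_string(pos_));
    }

    std::string input_;
    std::size_t pos_ = 0;
};
```

With this sketch, sketch_parser("+++").parse() consumes three tokens, while any character other than '+' raises an exception. The decision between the two alternatives is made by looking at a single upcoming token, which is the property an LL(1) grammar is meant to guarantee.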
Steps to Solve the Challenge
- Get the scanner working:
Implement the scanner class and pass in some of the test inputs
(call make test_parse). You will need more test
inputs; the ones we provide are just examples. When
testing the scanner, we suggest instantiating and calling the scanner
directly from main, so that there is no parsing
involved. Then, check that the tokens returned are correct by
printing them out (call token_to_string on them
and printf them).
- Modify the grammar to handle precedence correctly:
The grammar presented above is ambiguous and requires
modification. While you need to modify the grammar by hand, we
have included a second set of files to help you test your modified
grammars. calc_def.l is a lexer
and calc_def.y is a parser, written for flex
and bison respectively (you cannot use them in your
final code, but you can use them to help you write and
understand your grammar). If you look at
calc_def.y, you will see the ambiguous
grammar. If you call make, it will compile it
to calc_def, which you can then run on input files.
You will see which expressions parse and which cause syntax
errors. It works even though it is ambiguous (the
"shift/reduce warnings" are warning you that the grammar is
ambiguous). You can modify that grammar and then test that it
still recognizes the exact same set of programs (and finds a
syntax error on the same set too). If your new grammar is
unambiguous, you should see no shift/reduce
warnings (or any other types of warnings). However, just
because your grammar produces no shift/reduce warnings does
not mean that it correctly handles precedence.
  - We use the standard precedence for operators (same as for C):
    multiplication has a higher precedence than addition and
    subtraction, addition and subtraction have the same
    precedence, and all three operators are left associative.
  - The bar characters are used like parentheses, but they compute
    the absolute (non-negative) value of the expression between the
    "opening" and the corresponding "closing" bar. That is, |2| =
    2, and |-2| = 2.
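Since the binary operators follow the C rules, ordinary C++ arithmetic can serve as a reference for the values a correct calculator should produce. The sketch below records a few such cases; the helper name bar() is hypothetical and simply stands in for the |...| notation.

```cpp
#include <cstdlib>

// Reference values for a few calc expressions, written as the equivalent
// C++ expressions. These hold at compile time.
static_assert(2 + 3 * 4 == 14, "times binds tighter than plus");
static_assert(2 * 3 - 4 == 2, "times binds tighter than minus");
static_assert(2 - 3 - 4 == -5, "minus is left associative: (2 - 3) - 4");
static_assert(2 - 3 + 4 == 3, "plus and minus share one level: (2 - 3) + 4");
static_assert(2 * (3 + 4) == 14, "parentheses override precedence");

// Hypothetical helper standing in for the bars: bar(v) corresponds to |v|.
int bar(int v) { return std::abs(v); }
```

For example, bar(-2) and bar(2) both give 2, matching |-2| = 2 and |2| = 2. Cases like 2 - 3 - 4 are useful test inputs because a grammar that groups to the right would give 3 instead of -5.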
- Modify the grammar to be LL(1):
Again, you should test your grammar with bison (the .y file)
to make sure you did not break anything in the process (you
want to start Step 4 with the correct grammar instead of
finding problems there).
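One standard ingredient of an LL(1) rewrite is removing left recursion: a rule of the form A -> A x | y is replaced by A -> y A' and A' -> x A' | epsilon. The sketch below shows this on a toy grammar for comma-separated letters, deliberately unrelated to the calc grammar; the class name items_parser and all of its methods are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Toy grammar (not the calc grammar):
//   left-recursive:  Items -> Items ',' letter | letter
//   LL(1) rewrite:   Items -> letter Rest
//                    Rest  -> ',' letter Rest | epsilon
class items_parser {
public:
    explicit items_parser(std::string input) : input_(std::move(input)) {}

    // True if the whole input is a valid Items list.
    bool parse() { return Items(); }

private:
    bool at_end() const { return pos_ == input_.size(); }

    bool letter() {
        if (at_end() || !std::isalpha(static_cast<unsigned char>(input_[pos_]))) {
            return false;
        }
        ++pos_;
        return true;
    }

    bool Items() { return letter() && Rest(); }

    bool Rest() {
        if (!at_end() && input_[pos_] == ',') {
            ++pos_;                      // eat the ','
            return letter() && Rest();
        }
        // Epsilon alternative: only acceptable if the next token may follow
        // Rest, which in this toy grammar means the end of the input.
        return at_end();
    }

    std::string input_;
    std::size_t pos_ = 0;
};
```

Here items_parser("a,b,c").parse() succeeds, while "a,,b", "a,b," and "ab" are rejected. Each function picks its alternative by inspecting only the next character, which is what makes the rewritten grammar suitable for recursive descent.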
- Get the parser written:
Now that you have a grammar ready to go, start writing the
parser (by adding new methods to parser_t). This step
should actually be very easy if you did the previous three
steps correctly. Do not skip this part! The whole point of
this assignment is to get you familiar with scanning AND
parsing. Solving the calculator problem is just an
exercise.
- Make sure that you check for errors:
You will need to
return an error as soon as possible. That means that if epsilon
can be derived from the non-terminal you are working on, you
need to check the following token to make sure that it is
allowed to appear after the non-terminal that you are currently
examining. To ensure that our automated grading system correctly
handles your submission, please follow these
required guidelines for your program's output:
  - When you detect a scanner or a parser error, print to
    stderr the line where the error is detected. Your
    line counter must start at 1. Also, the program must exit
    with an exit code of 1 for a scanner error and 2 for a
    parser error.
  - If there is no error and your parser can correctly process
    the input, you must exit with an exit code of 0. The
    generated dot file should be printed to stdout.
  - If you have implemented the full calculator (that is, you evaluate
    expressions; see below), the results for all calculations
    (one for each "Expr") should be printed to stderr,
    one result per line.
  - Nothing else may be printed to stderr or
    stdout.
Getting to this point gives you full credit on
this assignment.
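The exit-code rules above could be centralized in one small helper, sketched below. Everything here is an assumption made for illustration: the names error_kind, describe() and fail() are hypothetical, and the wording of the message is invented, since the exact text the grading system expects is not specified here. If the skeleton in calc.cpp already provides error-reporting helpers, those should take precedence.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical error categories; the values are the required exit codes.
enum class error_kind { scanner = 1, parser = 2 };

int exit_code_for(error_kind kind) { return static_cast<int>(kind); }

// Builds a one-line message that names the line of the error.
// The wording is an assumption, not a required format.
std::string describe(error_kind kind, int line) {
    const char* what = (kind == error_kind::scanner) ? "scan" : "parse";
    return std::string(what) + " error on line " + std::to_string(line);
}

// Reports the error on stderr and terminates with the matching exit code.
[[noreturn]] void fail(error_kind kind, int line) {
    std::cerr << describe(kind, line) << "\n";
    std::exit(exit_code_for(kind));
}
```

A parser function would then call something like fail(error_kind::parser, line) as soon as it sees a token that cannot appear at that point, where line is the value reported by the scanner. Routing every error through one function makes it harder to print stray output or to return the wrong exit code.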
- Make it work:
If you want the +5% extra credit, and if you have Steps 1 through
5 done and rock solid, Step 6 is to finish the calculator (so
that it really does calculations). The calculator should
simply print out the signed integer that the expression
evaluates to (you can assume that the value stays within
the range that can be represented in a 32-bit
integer). Note that handling associativity is the
trickiest part here, since your LL(1) grammar likely does not
produce the correct associativity. Thus, the parser must do
some intelligent things to compensate for that. If you don't
do this part at all, we won't be insulted in the least; it is
more closely related to later material, and we won't cover it for
this project. However, we are reserving this extra credit for
those students who want to figure it out for themselves and
build something that is actually functional.
Deliverables
We are using Gradescope (and its
auto-grader feature) to grade this assignment and your submissions.
- As a first step, make sure that you received the invitation email and can properly log in.
- Once you are done with your scanner/parser, go to the first assignment and submit your code. For this, just submit your "calc.cpp" file.
- We do not show you the test cases and the expected output, but you should get some feedback about the types of tests that your submission passes and where it fails.
- You can make a new submission once every hour. Make sure you thoroughly test your program locally, and don't (ab)use the auto-grader as a test harness.
- Once you have the scanner and parser working, consider adding the calculator functionality and go for the extra credit!