CS160: Project 2 - Scanner/Parser for CSimple (20% of project score)

Project Goals


The goals of this project are:

  • to implement the grammar for our "simple" programming language
  • to get familiar with front-end generators such as Lex and Yacc

Administrative Information


This is an individual project.

The project is due on Friday, October 28 2016, 23:59:59 PST (extended).

Project Introduction


We want you to build the first part of your compiler: the scanner and parser. To do this, we will be using lex (flex) and yacc (bison), common tools to build LR parsers. For the rest of this class, we will be focusing on a new language, which we call CSimple. You will be building a compiler for this new programming language. The language manual can be found here. This manual is going to be used for the rest of this quarter (and might be updated frequently). Always use this as the first and last authority on what your grammar should be.

Tour of the Code


Again, we provide code that you should use as a starting point for your project. You can find the files here.

  • Makefile - Your make file. You don't need to edit this file.
  • main.cpp - The main C++ file. You don't need to edit this file.
  • parser.y - The yacc file that contains your grammar rules/productions. At this point, it contains the grammar from Project 1. You will need to edit it and take that out. Replace that grammar with your own grammar for the language that we have defined for you in the manual.
  • lexer.l - The lex file that contains the regular expressions for recognizing your tokens. It now contains the tokens from Project 1. Once again, you will have to edit this file and replace these expressions with your own.
  • test.good.calc - A test file just so you can compile and run this project as it is. This is not a valid CSimple program; add your own test cases as you go along.

Steps to Solve the Challenge


  1. READ the manual for the language. Understand this language and its specifications.
  2. Go over the small example we have included. Make sure you understand how Lex and Yacc work together in the example. To familiarize yourself with Lex and Yacc, you should read up on them. A good starting point is this link: Dinosaur Compiler Tools. Also, use Google and read the man pages.
  3. From the language specification, create a grammar which accepts all valid programs for our language. This is the crucial part. You must get the grammar correct here in this first part of the core compiler project. You will be building on top of this project, and your grammar must be correct. Test this thoroughly.
  4. Implement the scanner (in Lex) and make sure you account for all of the lexical patterns. Ensure that your scanner gives an error for dangling comments (comments not terminated before the EOF is reached), and make sure that it handles characters and strings correctly.
  5. Implement your grammar (in Yacc). Save time for this part, since you will likely have to correct errors iteratively.
  6. TEST: You can find some test files here. Test your parser thoroughly against these good and bad files, and create your own test files as well. Make them as complete and complex as possible. To test your lexer, put a printf statement before each return. The output shows where the scanner stopped working and which token it failed on. To test your parser, put a printf statement in the action at the end of each rule, so you can trace what the parser is doing. To run your program, use:
    ./csimple < test.lang
    where test.lang is a test file. If you run Yacc with the -v flag, it writes the file y.output (bison names the file after the grammar, e.g. parser.output, unless it is run with -y). This file contains a readable description of the parsing tables (more specifically, the LALR(1) states and the items they contain) and reports where conflicts or other problems in the grammar appear. Make sure you get the lexer working perfectly first! Lex lets you execute C code after it matches a rule, so simply print to stdout as you did for the previous project. You should get a stream of tokens.
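
Step 4 asks for an error on dangling comments. As a minimal sketch of that check, written as a plain C helper rather than as a Lex rule, the function below reports the line on which an unterminated comment starts. The function name and the delimiter parameters are hypothetical. The actual comment delimiters come from the language manual, and the sketch ignores comment openers that appear inside string or character literals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the 1-based line on which an unterminated comment starts,
 * or 0 if every comment in src is closed before the end of input.
 * open_delim and close_delim are placeholders (assumption): take the
 * real comment delimiters from the CSimple manual. */
int unterminated_comment_line(const char *src,
                              const char *open_delim,
                              const char *close_delim)
{
    size_t open_len = strlen(open_delim);
    size_t close_len = strlen(close_delim);
    int line = 1;

    /* Empty delimiters would match everywhere, so reject them. */
    assert(open_len > 0 && close_len > 0);

    while (*src != '\0') {
        if (strncmp(src, open_delim, open_len) == 0) {
            int start_line = line;
            src += open_len;
            /* Consume the comment body, still counting newlines. */
            while (*src != '\0' &&
                   strncmp(src, close_delim, close_len) != 0) {
                if (*src == '\n')
                    line++;
                src++;
            }
            if (*src == '\0')
                return start_line;  /* reached EOF inside the comment */
            src += close_len;
        } else {
            if (*src == '\n')
                line++;
            src++;
        }
    }
    return 0;
}
```

In a Lex scanner the same idea is usually expressed with an exclusive start condition for the comment body and an <<EOF>> rule inside that condition. The helper only makes the logic explicit.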

What Your Parser Has to Do!


  1. Your parser should be able to parse any valid input file from our language.
  2. You will need to catch ALL syntax errors.
  3. You will need to catch ALL program structure errors. By this I mean, for example, that your parser has to know that the keyword "procedure" ALWAYS precedes a procedure_id in a procedure declaration.
  4. You will NOT have to check that procedures and variables have been declared before you use them.
  5. You will NOT have to check that there is one and only one Main(). Remember that Main() is just a special procedure. At this point we don't care that it is special.
  6. You will NOT have to check whether procedure_ids and variable_ids are declared multiple times. So you could declare variable A multiple times and it would be okay at this point.
  7. You will NOT have to check the return types of procedures.
  8. In a nutshell, your parser looks at each line of code individually. It does not have global knowledge of variables or procedures... yet.

Deliverables / Turnin


Please follow the instructions below exactly!

  1. Your files must be in a directory named "parse".
  2. All files must be included (makefiles, everything!) in that folder.
  3. Your project must compile on a CSIL machine. If you worked on a Windows machine or your laptop at home, then make sure it still works on CSIL or modify it appropriately!
  4. Include a README with this project. Explain what you did in the README. If you had problems, tell us what they were and why they occurred.
  5. All errors (ALL OF THEM) go to stderr.
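
For the stderr requirement, one possible shape for the error hook is sketched below. yyerror is the function a Yacc-generated parser calls on a syntax error. The names format_error and current_line and the message layout are assumptions made for illustration. In a real scanner the line number would come from wherever you track it (for example flex's yylineno), and the provided parser.y may already define its own yyerror.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical line counter (assumption): a real scanner would update
 * it on every newline, or use flex's yylineno instead. */
int current_line = 1;

/* Builds the diagnostic text in buf and returns what snprintf returns:
 * the length the full message needs, excluding the terminating NUL. */
int format_error(char *buf, size_t size, int line, const char *msg)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "error: line %d: %s\n", line, msg);
}

/* Called by the Yacc-generated parser when it hits a syntax error.
 * Everything goes to stderr, so stdout stays free for normal output. */
void yyerror(const char *msg)
{
    char buf[256];
    format_error(buf, sizeof buf, current_line, msg);
    fputs(buf, stderr);
}
```

Keeping the formatting separate from the printing is only a convenience for checking the message text. Writing directly with fprintf(stderr, ...) inside yyerror works just as well.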

Use this command to submit your work: turnin proj2@cs160 parse

Grading


We will run your program against a number of test files that check for correct parsing of individual language features, in increasing levels of complexity. Your grade derives from the fraction of test files that you parse correctly. Remember that parsing correctly also means rejecting invalid input files and reporting appropriate error messages.

Important Note: No README == No partial credit if the project does not work 100%