This manual describes the syntax and (some) semantics of CSimple, a simple procedural language. Note that some of the examples in this description are syntactically legal but semantically invalid. Be mindful of those cases.

Lexical Description

Keyword Lexemes

Operator Lexemes

We support the following operators, which follow the operator precedence table from the language C:

Literal Lexemes

Other Lexemes

Lexeme  Use                                             Example
;       Each statement ends with a semicolon            i = 0;
:       Variable declarations use colons                var i : integer;
,       Used in parameter lists                         procedure foo(i, j : integer) {}
|       For integer expressions: absolute value of i    |i|
|       For strings: declared length of string s        |s|
{       Start block of code
}       End block of code
(       Begin parameter list
)       End parameter list
[       Begin string (character array) index
]       End string (character array) index

Description of Program Structure

Comments

Comments in this language are block comments (C-style). The form is /% comments %/.

/% this is my comment %/

Incorrect (illegal):

/* wrong language */
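
As an illustration of the comment form above, here is a minimal Python sketch of how a scanner might discard comments before tokenizing. The function name strip_comments is hypothetical, and the sketch assumes that comments do not nest and may span several lines; the manual does not say either way.

```python
import re

# Assumption: comments do not nest and may span multiple lines.
COMMENT_RE = re.compile(r"/%.*?%/", re.DOTALL)


def strip_comments(source: str) -> str:
    """Replace every /% ... %/ comment with a single space."""
    return COMMENT_RE.sub(" ", source)


if __name__ == "__main__":
    print(strip_comments("i = 0; /% this is my comment %/ j = 1;"))
    # C-style /* ... */ text is not a CSimple comment, so it is left untouched.
    print(strip_comments("/* wrong language */"))
```

Replacing a comment with a space (rather than nothing) keeps the lexemes on either side of it separate.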

Programs

A program is composed of procedures, listed one after another. Every legal program must contain exactly one procedure named 'Main()'. The name is case sensitive, so main() is incorrect. A program can also have user-defined procedures. Every procedure must be defined before the point at which it is called. The return value of a procedure call must be assigned to a variable of the appropriate type.

procedure foo() return integer
{
    return 0;
}

procedure Main() return integer
{
    var a : integer;
    a = foo();
    return 0;
}

Incorrect (illegal): foo's return value is not assigned to a variable.

procedure foo() return integer
{
    return 0;
}

procedure Main() return integer
{
    foo();
    return 0;
}

Incorrect (illegal): foo is used before it is defined.

procedure Main() return integer
{
    var a : integer;
    a = foo();
    return 0;
}

procedure foo() return integer
{
    return 0;
}

Procedures

Procedures are declared as:

"procedure" procedure_id "(" parameter_list ")" "return" type "{" body_of_procedure "}"

Note the placement of the "()" and "{}" symbols; they must appear exactly as shown.

procedure_id is the name of the procedure and must follow the keyword "procedure". Details are given below.

parameter_list lists the parameters declared for the procedure. This list can be empty. Each parameter must be of type boolean, char, integer, charptr, or intptr.

type is the type of the return value and must be boolean, char, integer, charptr, or intptr.

body_of_procedure contains procedure declarations, variable declarations, and statements.

You may declare one or more procedures inside the body of a procedure, so nested procedures are possible in this language. The last statement in a procedure must be a return statement, and it cannot be within a code block (and, as a result, not inside an if, if/else, or while block). Consequently, a procedure body must contain at least one statement.

procedure foo(i, j, k : integer) return integer
{
    procedure fee(l, m, n : integer) return boolean
    {
        return true;
    }
    return 0;
}

The procedure_id can be any string that starts with an alpha character (an uppercase or lowercase letter) and continues with any number of alpha characters, digits, or "_".

procedure foo() return integer { return 0; }
procedure foo_2() return integer { return 0; }
procedure f234() return integer { return 0; }

Incorrect (illegal):

procedure 9foo() return integer { return 0; } /% cannot start with a digit %/
procedure _rip() return integer { return 0; } /% cannot start with an underscore %/
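
The naming rule above can be expressed as a regular expression. The following Python sketch is only an illustration; the function name is hypothetical, and it does not reject keywords, which a real scanner would presumably also have to do.

```python
import re

# A letter, followed by any number of letters, digits, or underscores.
IDENTIFIER_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_procedure_id(name: str) -> bool:
    """Return True if name follows the procedure_id naming rule."""
    return IDENTIFIER_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["foo", "foo_2", "f234", "9foo", "_rip"]:
        print(candidate, is_valid_procedure_id(candidate))
```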

A parameter_list is somewhat complicated. You can declare parameters of several types, and as many parameters as you want. However, parameters of the same type must be listed together, separated by commas, and groups of different types must be separated by semicolons. The correct syntax is as follows:

"("  IDA_0  ","  ...  ","  IDA_N  ":"  TYPE1  ";"  IDB_0  ","  ...  ","  IDB_N  ":"  TYPE2  ";"  ...  ":"  TYPEN  ")"

Notice that the last type does not have a semicolon after it. If all parameters share a single type, no semicolon is needed, and adding one should produce an error.

procedure foo(i, j, k : integer; l, m, n : boolean) return integer { return 0; }
procedure fee(a, b : integer) return integer { return 0; }
procedure fei(a, b, c : integer; d, e, f : boolean; g, h : integer) return integer { return 0; }

Incorrect (illegal):

procedure foo(i, j, k) return integer { return 0; } /% no type defined %/
procedure foo(i j k : integer) return integer { return 0; } /% IDs must be separated by comma %/
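
To make the grouping rules concrete, here is a rough Python sketch that splits the text between "(" and ")" into (name, type) pairs. The function name parse_parameter_list and the choice to signal errors with ValueError are assumptions made for this illustration; an actual parser for the language would work on tokens rather than raw text.

```python
import re

IDENTIFIER_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
PARAM_TYPES = {"boolean", "char", "integer", "charptr", "intptr"}


def parse_parameter_list(text: str) -> list[tuple[str, str]]:
    """Parse the text between "(" and ")" into (name, type) pairs.

    Raises ValueError if the list is malformed.
    """
    text = text.strip()
    if not text:
        return []  # an empty parameter list is legal
    params = []
    # Groups of different types are separated by semicolons.
    for group in text.split(";"):
        names_part, colon, type_part = group.partition(":")
        type_name = type_part.strip()
        if not colon or type_name not in PARAM_TYPES:
            raise ValueError(f"missing or unknown type in {group.strip()!r}")
        # Names that share a type are separated by commas.
        for name in names_part.split(","):
            name = name.strip()
            if IDENTIFIER_RE.fullmatch(name) is None:
                raise ValueError(f"bad identifier {name!r}")
            params.append((name, type_name))
    return params


if __name__ == "__main__":
    print(parse_parameter_list("i, j, k : integer; l, m, n : boolean"))
```

A trailing semicolon produces an empty final group, which this sketch rejects, matching the rule that the last type is not followed by a semicolon.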

Procedure Body

The body_of_procedure can contain nested procedure declarations, variable declarations, and statements (they MUST appear in that order!). In this respect our language is very much like C, because everything must be declared first.

procedure foo(i, j, k : integer) return integer
{
    procedure square(t : integer) return integer   /% procedure declarations %/
    {
        var temp : integer;
        temp = t * t;
        return temp;
    }

    var total : integer;          /% variable declarations %/

    total = 1;                    /% statements %/
    return 0;
}

Variable Declarations

Variables are declared in the following syntax:

"var"  ID1  ","  ID2  ","  ID3  ","  ...  ","  IDN  ":"  TYPE  ";"

Variables must be declared before they can be assigned, and this is the only way to declare them. In particular, a declaration cannot also initialize the variable.

var i : integer;
var m, n : boolean;
var c : char;
var s : string[20];

Incorrect (illegal):

var i = 5 : integer;
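
The declaration syntax can be approximated with a single regular expression. The Python sketch below is illustrative only: the function name is hypothetical, the set of type names is taken from the types mentioned in this manual, and the string size is assumed to be a decimal integer literal.

```python
import re

IDENT = r"[A-Za-z][A-Za-z0-9_]*"
# Assumption: the size in string[N] is written as a decimal literal.
VAR_DECL_RE = re.compile(
    r"var\s+"
    rf"(?P<names>{IDENT}(?:\s*,\s*{IDENT})*)"
    r"\s*:\s*"
    r"(?P<type>boolean|char|integer|charptr|intptr|string\s*\[\s*\d+\s*\])"
    r"\s*;"
)


def parse_var_declaration(line: str):
    """Return (names, type) for a legal declaration, or None otherwise."""
    match = VAR_DECL_RE.fullmatch(line.strip())
    if match is None:
        return None
    names = [name.strip() for name in match.group("names").split(",")]
    return names, match.group("type")


if __name__ == "__main__":
    print(parse_var_declaration("var m, n : boolean;"))
    print(parse_var_declaration("var i = 5 : integer;"))  # illegal, so None
```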

Strings (character arrays)

Strings are declared with the following syntax:

"var"  ID1  ","  ID2  ","  ID3  ","  ...  ","  IDN  ":"  "string"  "["  INTEGER_LITERAL  "]"  ";"

Strings can be assigned like normal variables. You can also assign string literals to string variables. Individual string elements can be assigned character values, or they can be used as part of an expression. The index of a string element is itself an expression. The bar operator, as in |s|, yields the length of string s as it was declared.

var a, b : string[100];
var c : char;
var i : integer;
c = 'e';
a[19] = 'f';
a[4+2] = 'g';
b = a;
b[3] = c;
a = "test";   /% basically equivalent to a[0] = 't'; a[1] = 'e'; a[2] = 's'; a[3] = 't'; a[4] = '\0'; %/
i = |a|;      /% this assigns 100 to variable i, since the length operator returns the declared size of the character array %/

Essentially, a string element is exactly like a character type and the string variable itself is simply a new type. The following are not legal uses of strings:

Incorrect (illegal):

var a, b : string[100];
var c : char;
    c = 'e';             /% everything up to this is OK %/
    c = a;               /% type mismatch, can't assign string type to character type %/
    (a + 4)[0] = 'e';    /% cannot add anything to a string variable - strings are not pointers %/
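
The difference between the declared length and the current contents of a string can be modelled in a few lines of Python. This is a toy model, not part of the language definition: the class name is hypothetical, the initial fill with '\0' is an assumption, and rejecting a literal that does not fit is also an assumption, since the manual does not say what happens in that case.

```python
class CSimpleString:
    """Toy model of a CSimple variable declared as string[N]."""

    def __init__(self, declared_length: int):
        # Assumption: elements start out as '\0'; the manual does not specify this.
        self.chars = ["\0"] * declared_length

    def assign_literal(self, text: str) -> None:
        """Model a = "text": copy the characters and append a terminator."""
        # Assumption: a literal (plus terminator) that does not fit is an error.
        if len(text) + 1 > len(self.chars):
            raise ValueError("string literal does not fit")
        for index, char in enumerate(text):
            self.chars[index] = char
        self.chars[len(text)] = "\0"

    def __len__(self) -> int:
        """Model |s|: the declared size, not the length of the contents."""
        return len(self.chars)


if __name__ == "__main__":
    a = CSimpleString(100)
    a.assign_literal("test")
    print(len(a))  # declared size, independent of the assigned literal
```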

Statements

Statements can be many things: an assignment statement, a procedure call statement, an if statement, an if-else statement, a while statement, or a code block.

The syntax for an assignment statement is:

lhs "=" expression ";"
lhs "=" STRING_LITERAL ";"

Here, lhs -- which stands for left-hand side (of the assignment) -- specifies all legal targets of assignments. Specifically, our grammar accepts three different lhs items:

   x = expr;                /% lhs is variable identifier %/
   string[expr] = expr;     /% lhs is string element %/
   ^ptr = expr;             /% lhs is dereferenced pointer %/

We cannot assign values to arbitrary expressions. After all, what sense would a statement such as the following make?

(5+2) = x;

Thus, we have to limit the possible elements that can appear on the left-hand side of the assignment as discussed above.

The right-hand side of assignments is less restrictive. It includes expressions, as well as string literals (we have to mention string literals explicitly, since strictly speaking, they are not expressions).

A code block starts with a "{" and ends with a "}". It may contain variable declarations and statements (again, in this specific order). Both variable declarations and statements are optional. Thus, a code block can be empty. Of course, since a code block is a statement, code blocks can be nested within code blocks.

procedure foo() return integer
{
    var x : integer;
    { 
        var y : integer;
        x = 1;
        y = 2;
        {
            x = 2;
        }
        y = 3;
    }
    return 0;
}

procedure foo() return integer
{
    {
        {}   /% empty code blocks are okay, although not very useful %/
    }
    return 0;
}

Incorrect (illegal):

procedure foo() return integer
{
    var x : integer;
    { 
        x = 1;
        var y : integer;  /% must declare all variables before any statement %/
    }
    return 0;
}

The syntax for a procedure call statement is:

lhs "=" procedure_id "(" expression0 "," expression1 "," ... "," expressionN ")" ";"

The argument list may be empty, as in a = foo();.

The syntax for if, if/else, and while statements is shown below. Note that for if, if/else, and while blocks of code, the "{" and "}" are REQUIRED, unlike in C. Treat these blocks of code the same as a block of code for a procedure declaration. The only difference is that such a block can be empty, whereas a procedure body cannot.

"if"  "("  expression  ")"  "{"  body_of_nested_statement  "}"

"if"  "("  expression  ")"  "{"  body_of_nested_statement  "}"
"else"  "{"  body_of_nested_statement  "}"

"while"  "("  expression  ")"  "{"  body_of_nested_statement  "}"

Here, body_of_nested_statement is similar to a code block, in that it may contain variable declarations and statements (in this specific order). The body of a nested statement can be empty.

Return Statement

The last statement in a procedure must be a return statement. The syntax for the return statement is:

"return" expression ";"

As mentioned previously, a return statement cannot occur within an if, if/else, or while block. The type of the expression of the return statement must match the return type declared for the procedure.

procedure foo() return integer { return 0; }
procedure foo_2() return integer { var a : integer; a = 2; return a; }

Incorrect (illegal):

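procedure foo_3() return integer { return foo(); }                 /% foo's return value must first be assigned to a variable %/
procedure foo_3() return integer { return true;  }                 /% type mismatch: boolean returned from an integer procedure %/
procedure foo_3() return integer { if (true) { return foo(); } }   /% return inside an if block, and no return as the last statement %/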

Expressions

An expression's syntax is as follows:

expression operator expression
   OR
operator expression

This implies that expressions are recursive by nature. An expression can also be just an identifier or certain literals (integer, character, boolean, or the null pointer). Statements and expressions are not equivalent; in particular, an assignment statement is NOT an expression (compare with how statements are defined above). Operators have the same precedence as in C/C++. The examples below assume that every variable and procedure has already been declared.

Expressions

3 || 2
(3 + 2) / 3 - 5 * 2
true && false || false
5
0x012
true
-5
^x
^(p+5)
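!false

Incorrect (illegal):

a = b            /% assignment is a statement, not an expression %/
i = j = k = 2    /% assignments cannot be chained %/
&(x + y)         /% cannot take the address of an arbitrary expression %/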

Procedure Call

a = foo(i, j); /% 'a' has been declared already %/

if/else/while statements

if(3 > 2)
{
    /%...statements...%/
    i = 5; /% i has been declared above %/
}
if(true) { j = 3; }
else { k = 4; }
while(true) { l = 2; }

Incorrect (illegal):

if(true) i = 5;             /% braces are required %/
if(true) { j = 3; }
else x = x - 1;             /% braces are required %/
while(false) x = x + 1;     /% braces are required %/

Pointers

Note that pointers require some special attention: you cannot take the address of just any expression. This is the case because an expression might not actually have a memory address where it is stored. For instance, &(5+3) is undefined, because the result 8 does not have to be stored in memory but could be stored in a register instead.

Therefore, we allow the use of the address-of operator (&) only on variable identifiers and string (character array) elements. When you take the address of a variable, you can use the result in an expression. However, you cannot take the address of an arbitrary expression.

When taking the address of a string, indexing is required (&string is illegal, but &string[0] is legal). Note that the type of &string[0] is charptr.

Our language also supports some pointer arithmetic for char pointers: you can add to and subtract from a pointer. Adding to a char pointer advances it toward the following characters, and subtracting from it moves it back toward the preceding characters. We do not support pointer arithmetic for pointers to integer. Also, you cannot multiply a pointer by a value or a variable. When you add the result of an expression to a charptr (or subtract an expression from a charptr), the resulting type is still a charptr.

If you perform pointer arithmetic and you point outside of your allocated string, then the behavior is undefined.

Assigning null to a pointer gives it the value 0. Note that this is very different from assigning the value 0 to an integer that an intptr might reference! Instead, it means that the pointer does not point to any legal variable or value. When you dereference the null pointer, the result is undefined, and your program will likely crash (with a null pointer exception).

You can compare two pointers. In this case, you don't compare the values that the pointers reference. Instead, you compare the memory addresses that they point to. When two pointers reference the same variable (the same memory location), an equality comparison yields true; otherwise it yields false. You can also compare a pointer with null to check whether it is valid.

var x : charptr;    /% x is a pointer to a char variable %/

var x : integer;
var y : intptr;
x = 5;
y = &x;
x = 6;

var x : charptr;
var y : string[10];
var z : char;
y = "foobar";
x = &y[5];          /% x points to 'r' %/
z = ^(x - 5);       /% z is 'f' %/
y = "barfoo";       /% z is still 'f', but x now points to 'o' %/

Incorrect (illegal):

var x : booleanptr; /% no such pointer type %/

x = &(1+3);         /% cannot take the address of an arbitrary expression %/

var x : char;
var y : intptr;
y = &x;             /% address of x is of type charptr %/

var x : charptr;
var y : char;
x = &(&y);          /% can only take the address of variable or array element, and (&y) is an expression %/
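
The typing rules for the address-of operator and for pointer arithmetic described in this section can be summarized as two small functions. The Python sketch below is only one possible reading of those rules. The function names and the operand_kind labels are invented for this illustration, and two points are assumptions because the manual leaves them open: the pointer is required to be the left operand of + and -, and taking the address of a pointer variable is rejected.

```python
ADDRESS_RESULT = {"char": "charptr", "integer": "intptr"}


def address_of_type(operand_kind: str, operand_type: str) -> str:
    """Type of &operand.

    operand_kind is 'variable', 'string_element', or 'expression'.
    """
    if operand_kind == "string_element":
        return "charptr"  # e.g. &s[0]
    if operand_kind == "variable" and operand_type in ADDRESS_RESULT:
        return ADDRESS_RESULT[operand_type]
    # Covers &(1+3), &(&y), &s for a whole string, and boolean variables
    # (there is no booleanptr type). Pointer variables are rejected by assumption.
    raise TypeError(f"cannot take the address of {operand_kind} of type {operand_type}")


def pointer_arithmetic_type(op: str, left: str, right: str) -> str:
    """Type of (left op right) when left is a pointer."""
    # Assumption: the pointer must be the left operand.
    if op in ("+", "-") and left == "charptr" and right == "integer":
        return "charptr"
    raise TypeError(f"unsupported pointer arithmetic: {left} {op} {right}")


if __name__ == "__main__":
    print(address_of_type("variable", "integer"))
    print(pointer_arithmetic_type("-", "charptr", "integer"))
```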