Assignment 5
GALA: GVM Assembly LAnguage
The purpose of this programming assignment is to give you programming experience with:
- Text file
- String processing
- object-oriented design (OOD)
- stepwise refinement of procedure definition
- the Map interface & the HashMap class
This assignment is more challenging than those that have come before. There are several reasons why this is so: I am not giving you any of the OOD; you will design it completely. I am not giving you any skeletons, as doing so would implicitly be giving design information. The logic, at points, is somewhat involved.
You will need to think carefully about both the OOD & the stepwise refinement of your methods. As a starting point for the former, the nouns describing the project are candidate objects; the verbs are candidate methods; the adjectives are candidate attributes. This however is only a starting point. Assigning to each object those responsibilities that are most suited to it materially simplifies your implementation. Try to discover the "nature" of the system you are designing. If it feels unduly complicated, it probably is.
Methods similarly should be carefully refined. Each method should be a short sequence of actions that all are at approximately the same level of abstraction. If a method is starting to feel long, you probably are rendering some of it in too much detail. Factor lower level details into helper methods. Method design, in this sense, is recursive. It is unlikely that your 1st design is the best. Stay flexible.
When you successfully complete this assignment, you will have earned the pride you will feel.
From Wikipedia's Assembly language page:
An assembly language is a low-level programming language for a computer, or other programmable device, in which there is a very strong (generally one-to-one) correspondence between the language and the architecture's machine code instructions. Each assembly language is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple architectures, but require interpreting or compiling.
Assembly language is converted into executable machine code by a utility program referred to as an assembler; the conversion process is referred to as assembly, or assembling the code.
Assembly language uses a mnemonic to represent each low-level machine instruction or operation. Typical operations require one or more operands in order to form a complete instruction, and most assemblers can therefore take labels, symbols and expressions as operands to represent addresses and other constants, freeing the programmer from tedious manual calculations.
GALA Specification
A GALA program is a sequence of GALA statements. GALA statements comprise sequences of strings from 3 syntactic categories:
-
opcode
- There is a unique GALA assembler opcode for each GVM instruction type (e.g., "set" for the machine set instruction whose GVM code is the integer 1). The set of GALA assembler opcodes is:- stop
- set
- load
- store
- add
- goto
- zero
- setcolor
- drawline
- drawrect
- drawoval
- filllrect
- filloval
-
number
- an integer represented as a finite string of 1 or more digits, possibly preceded by a minus sign ("-"). -
identifier
- a finite string of 1 or more letters (upper or lower case), digits, or underscore ("_"), whose first character cannot be a digit (e.g., "17" and "7x" are not identifiers; "_3B" is a valid identifier).
Each statement is of the following general form, where items within square brackets may or may not occur:
[<statement label>] <opcode> [<operand> [<operand>]]
The define instruction
This is the only assembler instruction (as opposed to machine instruction), & the only instruction with 2 operands.
The 1st operand is an identifier;
the 2nd is a number.
The meaning: associate that number with that identifier.
This is analogous to a Java static final int
constant.
For example, the analog of the Java statement
static final int IMAGE_SIZE = 600;
define IMAGE_SIZE 600;
An identifier, used as a symbolic constant, can be defined only once. If it occurs in another define statement, the duplicate use is reported by the assembler as an error. The statement that defines a symbolic constant must occur before any statement that refers to it.
define INCREMENT 20
define INCREMENT DECREMENT
Opcodes with 1 operand
The following opcodes have 1 operand: set, load, store, add, goto, & zero. The operand for each can be a number or an identifier.
If it is a number, it has the same meaning as the number operand of its corresponding GVM instruction.
For example, store 17
means store the contents of the accumulator (data memory location 0) into data memory location 17.
If the operand is an identifier, its meaning depends on the opcode. If the opcode is neither a goto nor a zero, then the identifier must be a previously defined symbolic constant; otherwise, it is an invalid statement. For example,
add INCREMENT
If the opcode is goto or zero, the identifier must refer to a statement label (i.e., it refers to the location of an instruction in the program memory). The referenced label may occur on a statement that comes after a statement that references it. Your GALA assembler is a 1-pass assembler: When it finishes reading & processing all the lines in the text file, it does not need to rescan them. The implication for forward statement label references is that, when a statement label is referenced (in a goto or zero statement), if it is not already defined, then the instruction that refers to it must be put in a set or list of such instructions. When the statement label is defined, the unresolved reference list is accessed. Every instruction on the list, if any, then can have its operand patched with the correct value: the program memory location of the instruction with the statement label.
Opcodes with 0 operands
The following opcodes have no operand: stop, setcolor, drawline, drawrect, drawoval, fillrect, & filloval. They simply translate into the corresponding machine instruction.
GALA Predefined symbolic names
Gala has the following predefined symbolic constants that reference standard data memory cells:
data memory cell | symbolic name |
---|---|
0 | ACC |
1 | X |
2 | Y |
3 | WIDTH |
4 | HEIGHT |
5 | RED |
6 | GREEN |
7 | BLUE |
Blank lines
Blank lines are ignored by the assembler.
The GALA Comment
Whenever the character "#" appears on a line, that character & all that follow are ignored by the assembler.
GUI controls
In addition to the controls on the GVM, now there is a Load JButton, a program JTextArea, & a console JTextArea. The program text area is where the file to be assembled is displayed. The console text area is used by the assembler to report its conclusions: success, failure, error messages, & assembly time.
The load button is how the user initiates the assembler. The procedure is as follows:
- Use JFileChooser to let the user select a text file to be assembled (see the course web page on Sample Code);
- The assembler processes each text line; it keeps track of the time to assemble the file, starting immediately after the user selects the file to be assembled.
- If the assembly is error-free:
- indicate this on the console text area, including the time to assemble the file;
- load the assembler's output, the program, into the GVM.
The Map interface & the HashMap class
You probably will want to create what often is referred to as a symbol table: a function from a string to its value, which, in the case of GALA, always is an integer. The Java Map is ideal for this purpose. Please look in your textbook, the Java tutorial, & the Javadoc for this interface & the HashMap class that implements the Map interface.
The String & Pattern classes
The process of doing this assignment will give you considerable experience with string processing in Java. Please look at the String class's methods to see which may be of use. I found, for example, both the trim & replaceFirst methods useful when deleting non-meaningful white space & comments, as preparation for subsequent processing.
Sample GALA test files
Below, 2 test files are given. The 1st is of a correct GALA program for producing a 17 X 17 checkerboard (with alternating squares & circles); the 2nd is a sequence of erroneous GALA statements that you can use to test your assembler's error-checking ability. We will use these files to test your implementation.GALA Checker board program
# GVM data memory addresses define DECREMENT 20 define ROW_COUNTER 21 define COLUMN_COUNTER 22 define DELTA 23 # GVM program constants define N 17 define DELTA_VALUE 40 set -1 store DECREMENT set DELTA_VALUE store DELTA store WIDTH store HEIGHT set N store ROW_COUNTER store COLUMN_COUNTER FillShape load RED zero IsRed # is black set 255 # make it red store RED setcolor filloval goto IfAdvanceColumn IsRed set 0 # make it black store RED setcolor fillrect IfAdvanceColumn load COLUMN_COUNTER add DECREMENT store COLUMN_COUNTER zero AdvanceColumn # Finished columns. Advance row? load ROW_COUNTER add DECREMENT store ROW_COUNTER zero AdvanceRow # Finished rows. stop AdvanceColumn load X add DELTA store X goto FillShape AdvanceRow load Y add DELTA store Y # reset column set 0 store X set N store COLUMN_COUNTER goto FillShape
Here is a screenshot of my [current] GALA assembler, in response to this file:
GALA program to test error-handling
define # too few operands define oneOperand # too few operands define 17 # not an identifier define SYMBOL 17 18 # too many operands define TYMBOL 17 xx # too many operands define 17 UYMBOL # 17 not an identifier 17xh # invalid label abs # no opcode xyz yzx # invalid opcode label add 20 # valid label store 16 # duplicate label 17x # neither an identifier nor an opcode set NotDefined # operand symbol is not defined goto UndefinedLabel # undefined label define xyzz 8g # 8g is not a number
You do not need to report all errors in a statement, only 1. Here is what my [current] GALA assembler puts to the console, in response to this file:
Starting assembly ... 1: The number of operands for this opcode is incorrect. 2: The number of operands for this opcode is incorrect. 3: The number of operands for this opcode is incorrect. 4: The number of operands for this opcode is incorrect. 5: The number of operands for this opcode is incorrect. 6: define opcode's 1st operand must be an identifier. 7: 17xh is an invalid statement label. 8: labeled statement is missing an opcode. 9: yzx is an invalid opcode. 11: label has been defined previously. 12: 17x is an invalid statement label. 13: NotDefined is undefined. 15: 8g is not interpretable as a number. UndefinedLabel is undefined in statements with the following numbers:14 BUILD FAILED. Assembly time: 0.01 sec. Number of invalid statements: 13
Implementation notes
- You may wish to prepopulate the symbol table with opcodes & predefined symbolic names (e.g., HEIGHT); these are invalid as statement labels & symbolic constants.
- You may find it useful to use place some GUI components in a JScrollPane, which gives the user scroll bars, when they are needed.
- When a load action starts, clear the JTextArea of the program listing by invoking
setText( "" )
.
Discussion of possible enhancements
- Disable Run/Step buttons until a load (assembly) has completed successfully. Disable Load button while running.
-
There is an easy way to test your regular expressions before plugging them into your program, exemplified by the main method below.
public static void main( String[] args ) { System.out.println("identifierRegex: '<your regex goes here>' " + Pattern.matches( <your regex goes here>, "identifiertest73" ) ); }
Please post any enhancements to the assembler that would you like to see on Discussion forum.
Rubric
Value | Aspect |
---|---|
5 | Correctly assembles the "checkerboard" program (the operational definition of correct: when the program runs, it produces the correct out) |
3 | Correctly assembles & diagnoses the errors of the error-handling test program |
2 | Javadoc & Style |
10 | Total |
For style, we want:
- Javadoc is tastefully authored according to standard guidelines; the web pages are sitting in your GitHub repository, in their own directory. Ensure that your @author, @param, @returns, & @throws annotations are completed properly.
- Consistent application of an indentation policy;
- Carefully chosen, meaningful identifier names;
- Use of symbolic program constants in place of literals;
- Thoughtful comments, where needed, expressed in concise, precise English.
- Well chosen OOD & assignments of responsibility.
- Methods are short, readable, & clear, using private helper methods to achieve this.