Thursday, August 29, 2013

Program Code – Internal Format

The internal program code of a BASIC program will consist of 16-bit instruction words.  Each instruction word will consist of two parts: the instruction code (command, operator, internal function, etc.) to perform, and the sub-code information, most of which will only be used to recreate the original program text (the Parentheses, Colon and Let sub-codes).  The run-time module will ignore the sub-code information, with a few exceptions (the Question and Keep sub-codes on the various INPUT statement codes).
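As a rough sketch of how a code and its sub-codes could share one 16-bit word, the Python below packs and unpacks an instruction word.  The bit layout is purely an assumption for illustration (a 10-bit instruction code in the low bits, one flag bit per sub-code above it); the names CODE_BITS, SUB_PAREN, SUB_COLON, SUB_LET, pack_word and unpack_word are hypothetical and not taken from the actual design.

```python
# Assumed layout: low 10 bits hold the instruction code, upper 6 bits hold sub-code flags.
CODE_BITS = 10
CODE_MASK = (1 << CODE_BITS) - 1   # 0x03FF

# Hypothetical sub-code flags, one bit each above the code field.
SUB_PAREN = 1 << 10
SUB_COLON = 1 << 11
SUB_LET = 1 << 12


def pack_word(code, sub_code=0):
    """Combine an instruction code and sub-code flags into one 16-bit word."""
    if not 0 <= code <= CODE_MASK:
        raise ValueError("instruction code does not fit in the code field")
    if not 0 <= sub_code <= 0xFFFF or sub_code & CODE_MASK:
        raise ValueError("sub-code flags must lie above the code field")
    return code | sub_code


def unpack_word(word):
    """Split a 16-bit word back into (instruction code, sub-code flags)."""
    return word & CODE_MASK, word & ~CODE_MASK & 0xFFFF


word = pack_word(37, SUB_LET | SUB_COLON)
code, sub_code = unpack_word(word)
```

With a layout like this, the run-time module could mask off the sub-code bits and dispatch on the code alone, while the recreator would look at both parts.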

Some instruction words will have a second 16-bit operand word, which will contain one of three types of information depending on the instruction code.  For instructions that are variables, arrays, constants, remarks, define functions, user functions, etc., this second word will be an index into one of the dictionaries.  For single line structure statements (an IF statement for example), the second word will contain the offset of the jump destination.  For example, in an IF statement followed by a set of commands to execute upon a true expression, the offset will tell the IF command how many words to skip when the expression is false.
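The offset operand of a single line IF can be illustrated with a toy interpreter loop.  This is only a sketch under several assumptions: the instruction codes (PUSH_CONST, IF_THEN, PRINT_TOP, END) are invented, the constant operand holds the value directly instead of a dictionary index, and the offset is counted from the word following the IF operand.

```python
# Hypothetical instruction codes for this sketch only.
PUSH_CONST, IF_THEN, PRINT_TOP, END = 1, 2, 3, 4


def run(program):
    """Execute a flat list of words and return the values that were printed."""
    output = []
    stack = []
    pc = 0
    while True:
        code = program[pc]
        pc += 1
        if code == PUSH_CONST:
            stack.append(program[pc])   # operand word: the value itself (simplification)
            pc += 1
        elif code == IF_THEN:
            offset = program[pc]        # operand word: number of words to skip
            pc += 1
            if not stack.pop():
                pc += offset            # false: skip the commands after THEN
        elif code == PRINT_TOP:
            output.append(stack.pop())
        elif code == END:
            return output


def example(condition):
    # Roughly: IF condition THEN PRINT 7, followed by PRINT 9 on the next line.
    return [PUSH_CONST, condition, IF_THEN, 3,
            PUSH_CONST, 7, PRINT_TOP,   # the three words skipped when false
            PUSH_CONST, 9, PRINT_TOP, END]
```

Running example(1) prints both values, while example(0) skips the three words after the IF operand and prints only the second.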

The final type of information in the operand word is a block number, which will be used on multiple line structure statements.  For example, in an IF/END IF structure over several lines, both the IF and END IF commands will have the same block number.  Structured blocks will probably also have a dictionary, so technically this operand type is also an index.  The dictionary entry for a block will contain the locations of the IF and END IF statements.  When running, if the IF expression is false, it will go to the dictionary for the block number to find out where the associated END IF is located and jump to the instruction after it.
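One way the block dictionary could work is sketched below.  Everything here is an assumption for illustration: the Block record, the codes IF_BLOCK, END_IF and OTHER, and the choice to build the dictionary by scanning an already encoded program (an encoder might just as well fill it in while encoding each line).

```python
from dataclasses import dataclass

# Hypothetical instruction codes; only IF_BLOCK and END_IF carry an operand word here.
IF_BLOCK, END_IF, OTHER = 1, 2, 3
HAS_OPERAND = {IF_BLOCK, END_IF}


@dataclass
class Block:
    if_index: int = -1       # location of the IF instruction word
    end_if_index: int = -1   # location of the matching END IF instruction word


def build_block_dictionary(program):
    """Record where each block's IF and END IF are located, keyed by block number."""
    blocks = {}
    pc = 0
    while pc < len(program):
        code = program[pc]
        if code in HAS_OPERAND:
            block = blocks.setdefault(program[pc + 1], Block())
            if code == IF_BLOCK:
                block.if_index = pc
            else:
                block.end_if_index = pc
            pc += 2              # skip the instruction word and its operand word
        else:
            pc += 1
    return blocks


def resume_after_false_if(blocks, block_number):
    """Return the location of the instruction after the block's END IF."""
    return blocks[block_number].end_if_index + 2


# Block 1 is nested inside block 0.
program = [IF_BLOCK, 0, IF_BLOCK, 1, OTHER, END_IF, 1, END_IF, 0, OTHER]
blocks = build_block_dictionary(program)
```

Because both ends of a block carry the same number, nested blocks need no special handling: each lookup goes straight to its own entry.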

Encoder – Introduction

As mentioned a while ago (March 25, 2011), the translation of more BASIC commands is being postponed so that the other modules can be developed.  The translation of enough commands with expressions has been implemented (INPUT, LET, and PRINT) to make a useful, though very limited, BASIC program (limited by the lack of conditionals and loops).

These modules include the encoder to convert a translated program line into the internal program code, the recreator to convert the internal program code back to program text, and the run-time module to execute the internal program code.  Once initial versions of these three modules are complete and connected to the GUI, additional commands will be implemented one at a time for each of the four modules (the translator plus these three).

Only certain elements of the BASIC language will be implemented initially to simplify development of the remaining modules.  This includes just simple variables and constants, with arrays, defined functions and user functions implemented later.  Variables come from the identifier with no parentheses token type, which could also be user functions (either a call to a function with no arguments, or an assignment of the function return value inside the function).  For now this token type will be assumed to be a variable until functions are implemented.

A major part of encoder development is to define the internal program code format, the code that will be stored in the program and executed by the run-time module.  The other major part is the dictionaries that will hold the information about variables, arrays, constants, remarks, functions, etc., which will be referenced from the program code.  For example, the actual names (variables, functions, etc.) will not be stored in the internal program, but in a dictionary, and the program code will contain references to these dictionary entries.
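The idea of keeping names out of the program code can be sketched with a minimal dictionary that hands out an index per name.  The Dictionary class and its add and name methods are hypothetical, and details such as case handling, reference counting or reuse of freed entries are left out.

```python
class Dictionary:
    """Maps each distinct name to a small integer index and back."""

    def __init__(self):
        self._index_of = {}   # name -> index, used by the encoder
        self._names = []      # index -> name, used when recreating text

    def add(self, name):
        """Return the index for name, creating a new entry on first use."""
        index = self._index_of.get(name)
        if index is None:
            index = len(self._names)
            self._index_of[name] = index
            self._names.append(name)
        return index

    def name(self, index):
        """Return the name stored at index."""
        return self._names[index]


variables = Dictionary()
# Operand words for the variables in something like: Total = Count + Total
operands = [variables.add(n) for n in ("Total", "Count", "Total")]
```

Here the operand words would be 0, 1 and 0: the second use of Total reuses the existing entry, and the recreator can turn any index back into the original name.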

A minor part of encoder development (not needed by the final application) is the conversion of individual instructions of the program code into text.  This is similar to the conversion of the translated tokens into text for output by the command line test mode, or in the GUI program view.  The GUI program view translator output will be replaced by the encoder output, which will have the same reverse polish notation layout as the translator output.