Sunday, May 23, 2010

Internal Code Format

The encoded BASIC program will be stored in memory in a very compact form. It will not be any kind of linked list of tokens like come out of the Translator. A linked list would have way too much overhead during execution and would waste a lot of memory. Instead, the program will be very simple format where each code will take up a single 16-bit word.

A 16-bit word has plenty of space to support all the codes, which currently stands at 137 (there will be lot more, but far less than the 65,536 numbers possible in a 16‑bit word). This handles the actual codes (operators, internal functions, and commands), but what about entered identifiers like variables, arrays, define and user functions, constants, and so on?

Entered identifiers will have an index value into a table, known as the Dictionary, which will contain the information about each identifier (like name, data type, number of array dimensions, size of array dimensions, number of arguments, argument data types, constant values, etc.). The index to the Dictionary entry will be stored in the program. During run-time, much of the information in the Dictionary will not be used.

These indexes will be preceded by a code. For example, one of these codes will push a double variable value onto the evaluation stack. The routine for this code will know to get the next 16-bit word that will contain the Dictionary index. Using a 16-bit word for the index implies a maximum of 65,536 Dictionary entries. This should be sufficient since subroutines and functions will each have their own Dictionary, with each having a maximum 65,536 Dictionary entries.

The 16-bit code word has more bits than are needed for the code value. The extra space can be user for other things. One use would be for information like the presence of unnecessary parentheses or the optional LET keyword. This information will be called a sub-code and ignored by run-time module, but would be used by the Recreator. The detail of the internal program doesn't need to defined right now, just the knowledge that there will be a sub-code present in the internal code. Next, what this means for the Translator...

Translator – Commands (Introduction)

Now that expressions have been fully implemented in the Translator (at least until more language features are added), it's time to start implementing the BASIC commands. In fact, one command has already been implemented, the assignment statement without the optional LET keyword.

It was mentioned some time ago that there was a way to eliminate the need for the dummy close parentheses codes. These dummy codes are put into the translated output where unnecessary parentheses were entered into an expression, so that the Recreator would know to reproduce them. This will prevent confusion from having entered parentheses just disappear. During run-time, these dummy codes would be skipped.

It turns the same method to get rid of these dummy parentheses codes will also be used for the LET command, first command to be implemented. There are already 8 assignment codes, and there will be 4 more once temporary strings are fully implemented in the Encoder. All of these assignment codes are for without the optional LET. Either another dummy code is needed for the LET or each of the assignment operator needs to duplicated when the optional LET is included – that's 12 more codes.

Neither of these two alternatives is desirable. There is a third alternative that will help eliminate the need for these dummy codes, but some look ahead planning is required into the design of the format for how the BASIC program code will be stored internally. In fact, as each BASIC command is implemented, so forward planning is necessary into how the command will be stored in memory and executed during run-time. Next a preliminary design of the internal code...