The initial implementation of the encoder is complete, which includes preparing the tokens for encoding. Version v.0.5.0 has been tagged. Work will now begin on encoding the tokens into the internal program code format.
[commit 6ed93f833c]
Saturday, September 7, 2013
Encoder Test Mode – Blank Lines
When the new line was added to encoder test #1, a blank line was added before it, but the blank line was ignored. The test run routine was modified to allow a blank line only when testing the encoder. The expected results were updated accordingly.
[commit c84fbfa08b]
Encoder – Assigning Positional Indexes
The next step in preparing the tokens in the RPN list for encoding is assigning a position index to each token. This is the position, or offset, that the token will have in the encoded program code within the line. This index will be used later for calculating offsets within single-line statements, such as from the IF token to the ELSE, END IF or last token on the line.
For now, however, this index is not needed. Once the indexes have been assigned to all the tokens, though, the final count, or size, of the encoded program code for the line will be known. This value will be used for allocating a program word array that will be filled in during the next step of encoding.
It was more efficient to assign the position index right after assigning a code to each token, within the same loop, instead of having a separate routine with another loop, so the assign codes routine was renamed to the prepare tokens routine. A count variable, initialized to zero, was added to the routine. After the token type switch, the index of the token is set to the count and the count is incremented. If the token will have an operand (the token code has the has operand flag), the count is incremented again for the operand word.
The prepare tokens routine was also changed to return the size required for the encoded program line instead of a success/fail boolean status. On an error, -1 is returned. The calling encode routine was updated accordingly for the new return value.
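The index assignment and the new return value described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the Token structure, the HasOperandFlag name, the supported member (standing in for a token type that is not yet implemented) and the prepareTokens signature are all hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the project's real types and flags.
enum TableFlag : unsigned { NoFlag = 0, HasOperandFlag = 1u << 0 };

struct Token {
    unsigned flags = NoFlag;  // flags of the table entry for the token's code
    bool supported = true;    // false models a "not yet implemented" token type
    int index = -1;           // position of the code word within the encoded line

    bool hasOperand() const { return (flags & HasOperandFlag) != 0; }
};

// Assigns a position index to each token and returns the number of program
// words the encoded line will need, or -1 if a token could not be prepared.
int prepareTokens(std::vector<Token> &rpnList)
{
    int count = 0;
    for (Token &token : rpnList) {
        // The switch on the token type (code assignment) would go here.
        if (!token.supported) {
            return -1;  // the real routine would also record the error
        }
        token.index = count++;  // one word for the code itself
        if (token.hasOperand()) {
            ++count;            // one more word for the operand
        }
    }
    return count;
}
```

With a list of three tokens where only the first two have the has operand flag, the tokens would receive indexes 0, 2 and 4, and the returned size would be 5 program words.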
The RPN list, item and token text routines were modified to optionally output the index of each token. The token text routine was also modified to output an index for the operand word, to make sure that the sub-codes are output after the code word (not after the operand word), and to treat the comment of a remark as the operand word (when index output is selected). Another test line with multiple statements was added to encoder test #1 and the expected results were updated for the indexes now output on each token.
[commit 9e0e442171]
Encoder – Variable Reference Fix
In testing the next step, a small problem was discovered with variable references, where the table entry for the main variable reference had the constant associated code array. This was corrected and the expected results for encoder test #1 were updated.
[commit 88913ed121]
Initial Encoder – Assigning Codes
The new Encoder class was implemented initially with just the first step, where tokens without codes are assigned codes. For now, only variables (identifier with no parentheses token type) and constants are handled. The other token types will be implemented once the recreator and run-time modules are implemented for the entire initial set of commands (LET, PRINT, INPUT and REM). The first phase of the Encoder class implementation (steps 1 and 2 described on August 30) only prepares the RPN output list from the translator (which will be referred to as the RPN input list for the encoder). The second phase (step 3) involves the generation of the program code.
The initial Encoder class contains two member variables: a reference to the table instance and a pointer to the RPN input list. The class includes a single public encode routine and a private assign codes routine that the encode routine calls after saving the pointer to the input list. For now, both routines simply return a success or fail status as a boolean value. The assign codes routine loops through the input list and does a switch on the token type.
For a constant token type, the appropriate code is found for the data type of the constant from the new base code Const (with associated codes ConstInt and ConstStr). An identifier with no parentheses token type is assumed to be a variable (until functions are implemented later). If the reference flag is set, then the appropriate code is found for the data type from the new base code VarRef (with associated codes VarRefInt and VarRefStr). For non-reference tokens, the appropriate code is found from the new base code Var (with associated codes VarInt and VarStr).
For the command, operator, and internal function (with or without parentheses) token types, no action is performed since these token types already have a code. For the other token types (identifiers with parentheses, and defined functions with and without parentheses), the token is set as an error in the RPN input list with a "not yet implemented" error, the input list is cleared and false is returned.
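The switch on the token type described above might look something like the following sketch. The code names (Const, ConstInt, ConstStr, Var, VarInt, VarStr, VarRef, VarRefInt, VarRefStr) come from the description, but everything else is an assumption made for illustration: the TokenType and DataType enumerators, the Token layout, the Add code and the codeFor helper are hypothetical, and the code lookup is done directly here instead of through the table.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical enumerations standing in for the project's real ones.
enum class TokenType { Constant, NoParen, Command, Operator,
                       IntFuncN, IntFuncP, Paren, DefFuncN, DefFuncP };
enum class DataType { Double, Integer, String };
enum class Code { None, Const, ConstInt, ConstStr, Var, VarInt, VarStr,
                  VarRef, VarRefInt, VarRefStr, Add };

struct Token {
    TokenType type;
    DataType dataType;
    bool reference;     // set when the token is a variable reference
    Code code;
    std::string error;  // empty unless the token is in error
};

// Picks the base code (double) or one of its associated codes by data type.
Code codeFor(DataType dataType, Code base, Code forInt, Code forStr)
{
    switch (dataType) {
    case DataType::Integer: return forInt;
    case DataType::String:  return forStr;
    default:                return base;
    }
}

// Assigns codes to tokens that do not have one yet; returns false on error.
bool assignCodes(std::vector<Token> &rpnList)
{
    for (Token &token : rpnList) {
        switch (token.type) {
        case TokenType::Constant:
            token.code = codeFor(token.dataType, Code::Const,
                                 Code::ConstInt, Code::ConstStr);
            break;
        case TokenType::NoParen:  // assumed to be a variable for now
            token.code = token.reference
                ? codeFor(token.dataType, Code::VarRef,
                          Code::VarRefInt, Code::VarRefStr)
                : codeFor(token.dataType, Code::Var,
                          Code::VarInt, Code::VarStr);
            break;
        case TokenType::Command:
        case TokenType::Operator:
        case TokenType::IntFuncN:
        case TokenType::IntFuncP:
            break;  // these token types already have a code
        default:
            // The routine described above also clears the input list;
            // this sketch only marks the offending token.
            token.error = "not yet implemented";
            return false;
        }
    }
    return true;
}
```

In this sketch, a string variable reference receives VarRefStr, an integer constant receives ConstInt, and an operator token passes through with its code unchanged.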
A pass-through find code routine was added to the Table class that takes a single token and a base code. The token is set to the base code and its type is set to the type for the base code. This type replaces the token type present (constant or identifier with no parentheses). The full find code routine is then called with the token passed as both the main token and the operand token, since the token is both the token to be modified (for an associated code as needed) and the token that contains the information (data type) needed to set the code.
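The relationship between the pass-through routine and the full find code routine can be illustrated with a simplified sketch. The real Table class is not shown in this post, so the entry layout, the map of associated codes, the enumerator names and both findCode signatures below are assumptions made only to show the idea of passing the same token as main token and operand token.

```cpp
#include <cassert>
#include <map>

// Hypothetical enumerations; IntFuncN stands for the internal function
// with no parentheses token type.
enum class TokenType { Constant, NoParen, IntFuncN };
enum class DataType { Double, Integer, String };
enum class Code { Const, ConstInt, ConstStr, Var, VarInt, VarStr };

struct Token {
    TokenType type;
    DataType dataType;
    Code code;
};

// Simplified table entry: a token type plus associated codes by data type.
struct TableEntry {
    TokenType type;
    std::map<DataType, Code> associated;
};

class Table {
public:
    Table()
    {
        m_entries[Code::Const] = TableEntry{TokenType::IntFuncN,
            {{DataType::Integer, Code::ConstInt},
             {DataType::String, Code::ConstStr}}};
        m_entries[Code::Var] = TableEntry{TokenType::IntFuncN,
            {{DataType::Integer, Code::VarInt},
             {DataType::String, Code::VarStr}}};
    }

    // Full routine (simplified): changes the main token to an associated
    // code if one matches the data type of the operand token.
    void findCode(Token &token, const Token &operand) const
    {
        const TableEntry &entry = m_entries.at(token.code);
        auto it = entry.associated.find(operand.dataType);
        if (it != entry.associated.end()) {
            token.code = it->second;
        }
    }

    // Pass-through routine: sets the base code and its type on the token,
    // then uses the token as both main token and operand token.
    void findCode(Token &token, Code baseCode) const
    {
        token.code = baseCode;
        token.type = m_entries.at(baseCode).type;
        findCode(token, token);
    }

private:
    std::map<Code, TableEntry> m_entries;
};
```

Under these assumptions, calling the pass-through routine on a string constant token with the Const base code leaves the token with the ConstStr code and the type of the table entry, while a double constant keeps the base code.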
Table entries for the new codes were added, each given the internal function with no parentheses token type (same as for the hidden convert codes). Each of these entries was also given the new has operand flag. This flag will be used to determine if the code has a second operand program word. The token text routine was modified for internal function token types: if the has operand flag is set, the string of the token is output like a separate token representing a separate program word.
An initial encoder test file was created containing an assignment statement for a variable of each data type assigned to a constant, along with a PRINT statement for each variable. This contains each of the nine tokens that need a code assigned (constant, variable, and variable reference for each data type). Encoder command line options were added to the Tester class with an encode input routine that first translates the line and, if there is no error, encodes the line. For now, the tokens of the RPN list are output as with translator testing. The test scripts and batch file were updated to handle the encoder tests.
[commit 94ca2692d0]
Pre-Encoder Issues – More Refactoring
As the initial encoder class was being implemented, the need for some more minor refactoring was noticed. The first was that the token mode enumeration, which was needed by the old translator routines, had not been removed. The second was that the member variable and access function for the double value of a constant token were renamed from valueDbl to just value for consistency. Since double is the default data type, the 'Dbl' part is generally not included in names (just 'Int' and 'Str').
[commit 0ec09e02e4] [commit 98e22bda3f]