Interactive BASIC Compiler Project

Wednesday, January 21, 2015

Table – Add To Table Refactoring

The add to table function is large and hard to read, so it will be refactored. Refactoring this function will require several passes of changes. The first round was to break the function into seven smaller functions. The add expected data type function was already separate since it is called from three places. The result was a simpler main function, but it is still rather involved. The separated functions are also simpler, but less than readable (and one is still quite large).

I won't detail the results since this is not the final form (and if successful, no explanation should be necessary as the code should be easy to read).

[branch table commit 23515be87b]

For the next round, I wanted to clean up the main add to table function. There were a number of return statements that were cluttering up the function. The first change was to restructure two of the if statements to eliminate two of the return statements. For the others, there was a pattern with three of the separated functions returning a null pointer indicating that no further action was necessary. The main function then checked for this null pointer and returned.

To clean this up, an empty Done structure was added with no members to be used as an exception. Instead of these three functions returning a null, they instead throw an instance of this empty structure. The main function catches this exception and does nothing (just returns). This is not really an exception condition, but it cleaned up the main function quite a bit. This mechanism allows lower-level functions to cause a return of the add to table function.
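
The mechanism can be sketched roughly as follows. This is only an illustration: apart from the Done structure, the names (Entry, requireName, addToTable) and the boolean return value are assumptions made for the example and are not taken from the project.

```cpp
#include <string>
#include <vector>

// Empty structure thrown to signal "nothing more to do" (not an error).
struct Done {};

// Hypothetical stand-in for a table entry.
struct Entry {
    std::string name;
};

// Hypothetical lower-level step: throws Done when the entry needs no
// further processing (in this sketch, when it has no name).
void requireName(const Entry &entry)
{
    if (entry.name.empty()) {
        throw Done{};
    }
}

// Hypothetical main function: no null pointer checks after each step.
// Returns true if the entry was added, false if a step ended early
// (the bool is only here so the sketch can be checked).
bool addToTable(const Entry &entry, std::vector<std::string> &names)
{
    try {
        requireName(entry);
        names.push_back(entry.name);
        return true;
    }
    catch (const Done &) {
        return false;  // a lower-level function requested the return
    }
}
```

The trade-off is the one noted above: an exception is being used for ordinary control flow, in exchange for a main function without repeated check-and-return blocks.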

[branch table commit 69bd606994]

For the third round, further attempts were made to make the main add to table function and some of the separated functions more readable. This involved renaming functions (again) and adding access functions to make what is being tested in if statements more self-explanatory.

One of the new access functions was named has operands, which was too similar to the has operand access functions. These were used to determine if the code has an operand (variable, constant, array, etc.) by checking if the code has an operand text function. These functions were renamed to is code with operand. The last operands access functions were corrected to return an integer instead of a boolean (though this didn't appear to affect its functionality).

[branch table commit 8b9b3f9a4b]

Tuesday, January 20, 2015

Table – Single Base Class

The Table and Table Entry classes are going to be combined into a single class, which will be the base class in the new table model class hierarchy. The only members left in the Table class are static members and functions except for the constructor and entry member pointer, which is just pointing to the static array of initialized table entries.

The entry pointer member was removed and the static array is accessed directly. The entry pointer and size arguments were removed from the constructor. The constructor loops through each entry in the array calling the add function to add the necessary info to the static table members. The alternate info initializer list is then iterated to add other alternates not automatically set up by the table entries in the add function.

Previously, when the add function threw an error, the constructor caught the error and added it to a list. At the end of the constructor, if this list was not empty, the errors were output and the application was aborted. With the new table model, it will not be possible to collect all errors in a list since errors thrown by static constructors abort the application immediately. The add function was modified to catch its own error, output it and abort. This is only a diagnostic tool during development, so reporting all errors would be nice but is not a necessity.

Both the add and add expected data type functions operated on a single table entry and therefore were moved to the Table Entry class. These functions will be called by the constructors of the new table class hierarchy. The add function was renamed to add to table, but the entire function is too large and needs to be refactored (in other words, broken up into smaller functions). This is certainly not the only function with this problem.

[branch table commit 07064dbde4]

Monday, January 19, 2015

Table – Code Type Enumeration

The code enumeration where enumerators were used as table indexes was replaced with a code type C++11 enumeration class. Code types are only needed for codes that are referenced, for example, special operators (open parentheses, close parentheses, equal, comma, semicolon, and colon), some commands (LET and REM), hidden codes (convert to double or integer), and codes with operands (constant, variable, array, etc.). Several codes are assigned to the same code type enumerator (for example double, integer, string, non-reference and reference variables).

The code type enumeration was given the same name as the old Code enumeration because this makes the code read better where the enumerators are used. The old Code enumeration was moved to the table source file for the temporary alternate code initializer lists and was renamed to Code Index to avoid a conflict with the new Code enumeration class. These enumerators will eventually be removed. In the rest of the classes, the old enumerators were changed to the new enumerators.

The table entry type member was changed to the code (new code type enumeration). Table entry initializers were updated with new enumerators. For most of the entries, this was just the default code type enumerator. The type access functions were removed from the table entry and token classes, and the code access functions were updated to use the new code member.

A new code to entry map static member was added to the table class. In the add function, if the code is not the default code type and the code type of the entry is not already in the map, then the pointer to the entry is added to the map for the code type. The arguments of the entry static access functions were changed from an index to a code type enumerator. An entry access function taking an index is still needed for converting program code back to tokens.
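
A rough sketch of how such a map could be filled is shown below. The enumerator list, the Entry structure and the member names are assumptions for illustration only; the rule that only the first entry of a non-default code type is stored follows the description above.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical subset of code types; Null stands in for the default.
enum class Code { Null, Let, Rem, Equal, Comma, Variable };

struct Entry {
    const char *name;
    Code code;
};

// std::hash for enumeration types is only guaranteed since C++14, so a
// small hash functor is supplied to keep the sketch valid for C++11.
struct CodeHash {
    std::size_t operator()(Code code) const
    {
        return static_cast<std::size_t>(code);
    }
};

class Table {
public:
    // Record only the first entry seen for each non-default code type.
    static void add(const Entry &entry)
    {
        if (entry.code != Code::Null
                && s_codeToEntry.find(entry.code) == s_codeToEntry.end()) {
            s_codeToEntry[entry.code] = &entry;
        }
    }

    // Entry access by code type; returns null if no entry was recorded.
    static const Entry *entry(Code code)
    {
        auto iterator = s_codeToEntry.find(code);
        return iterator == s_codeToEntry.end() ? nullptr : iterator->second;
    }

private:
    static std::unordered_map<Code, const Entry *, CodeHash> s_codeToEntry;
};

std::unordered_map<Code, const Entry *, CodeHash> Table::s_codeToEntry;
```

With several entries sharing one code type (for example the double, integer and string variable entries), a lookup by code type returns the first of them.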

The switch statements on the token type in the translator and tester class routines were updated to switch on code type. This included the test token type name function, where the names output were also changed to match the code type. This affected the parser tests output, which were updated accordingly.

[branch table commit f19b061fe3]

Sunday, January 18, 2015

Table – Internal Function Type

The next type replaced with a table flag was the Internal Function type. There were no complications here either. A new Function table flag was added (the "internal" nomenclature will no longer be used). An is function table entry access function was added that just calls the has flag function with the Function flag. A token is function table pass-through access function was also added. The Function table flag was added to all table entries with the Internal Function type, which were then changed to the default type.
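
The flag check and the token pass-through could look something like the following sketch. The bit values, class layout and member names are assumptions; only the Function and Operator flag names and the has flag and is function access functions come from the description above.

```cpp
// Hypothetical flag bits; the actual values in the project are not known.
enum TableFlag : unsigned {
    Function_Flag = 1u << 0,
    Operator_Flag = 1u << 1
};

class Entry {
public:
    Entry(const char *name, unsigned flags) : m_name{name}, m_flags{flags} {}

    const char *name() const { return m_name; }
    bool hasFlag(unsigned flag) const { return (m_flags & flag) != 0; }

    // Access function that just calls has flag with the Function flag.
    bool isFunction() const { return hasFlag(Function_Flag); }

private:
    const char *m_name;
    unsigned m_flags;
};

class Token {
public:
    explicit Token(const Entry *entry) : m_entry{entry} {}

    // Pass-through so callers do not need to go through the table entry.
    bool isFunction() const { return m_entry->isFunction(); }

private:
    const Entry *m_entry;
};
```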

[branch table commit eaa6e6d7f5]

Table – Operator Type

The next type replaced with a table flag was the Operator type. There were no complications. A new Operator table flag was added. An is operator table entry access function was added that just calls the has flag function with the Operator flag. A token is operator table pass-through access function was also added. The Operator table flag was added to all table entries with the Operator type, which were then changed to the default type.

[branch table commit e064047030]

Table – Command Type

The only uses of code enumerators remaining are those that will become code type enumerators and for the temporary initializers of additional alternate codes. The replacement of the type enumerators can now begin. As mentioned previously (see January 3), table flags will be used to identify commands, operators and functions since some of these codes will need their own code type enumerator (for example, LET, Equal, Comma and Semicolon).

The command type was the first to be replaced with a table flag. A Command table flag already exists and was being used for the Comma, Semicolon, and assignment codes. The purpose of this flag is to identify that these codes can have the Colon sub-code like command codes and not the Parentheses sub-code that operators can have. Both of these sub-codes share the same sub-code bit.

For the purposes of checking for a command type code, these other codes needed to be excluded. An is command table entry access function was added to do this check, which first checks if the Command table flag is set. The assignment codes all have the Reference table flag, so these are excluded by checking if this flag is not set. Comma and Semicolon (used as commands for PRINT statements) are treated as special operators, allowing them in expressions where an operator is expected. These are excluded by checking if the code is not an operator, which for now checks for an Operator type but will be changed to check the new Operator flag.
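
As a sketch, the check described above amounts to three conditions. The flag values and the Entry structure are assumptions, and the operator test here already uses an Operator flag rather than the Operator type that was still in use at this commit.

```cpp
// Hypothetical flag bits for illustration.
enum TableFlag : unsigned {
    Command_Flag   = 1u << 0,
    Reference_Flag = 1u << 1,
    Operator_Flag  = 1u << 2
};

struct Entry {
    unsigned flags;

    bool hasFlag(unsigned flag) const { return (flags & flag) != 0; }
    bool isOperator() const { return hasFlag(Operator_Flag); }

    // A command has the Command flag, is not an assignment (Reference
    // flag) and is not a special operator such as Comma or Semicolon.
    bool isCommand() const
    {
        return hasFlag(Command_Flag)
            && !hasFlag(Reference_Flag)
            && !isOperator();
    }
};
```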

A token is command table pass-through access function was also added. When checking for the Colon and Parentheses sub-codes (in the program word stream insert operator and recreator operator functions), just the Command table flag is checked, otherwise the is command function is used (throughout the translator and tester classes).

For all of the command code table entries where the Command table flag was added, the current type initializer was changed to the default type (null). This was necessary since the Command type was removed. The type initializers will be replaced with code type enumerators.

[branch table commit 098009de24]

Saturday, January 17, 2015

Table – Assign Code Recreation

A small part of a change made a few commits ago (see January 10) modified the assign string recreate function where the name of the Equal code is used instead of the name of the assign code. This was done to remove the use of the Assign code enumerator (the Equal code enumerator is used in many places and will be one of the enumerators in the new code type enumeration).

The assign recreate function was modified to also get the name from the equal code table entry. Since both assign recreate functions use the name from the equal code table entry, their table entries no longer need a name, so their names were set to a blank string. The expected test results were updated accordingly.

[branch table commit 0c107e08f9]

Table – Code Indexes

Indexes of table entries are used in the program code to represent the table entry of a code. When the program code is read, these indexes should be converted back to a table entry pointer, which is then used to access information about the code. These code indexes should only be used in the program code. However, the code indexes have been used throughout, mainly within tokens.

As of the last change, they are no longer used within tokens. The only remaining uses are a few code enumerators to get a table entry and some uses within the tester class. The few code enumerators still used will become code type enumerators when these code enumerators are combined with the type enumerators.

The tester class was using code indexes to determine if defined functions contain parentheses or not. When defined functions are implemented, their dictionary entries will contain the number of arguments. No arguments will mean that there are no parentheses. Since this is not yet implemented, the code indexes were used temporarily to determine if parentheses were present. The tester class should not be using code indexes.

A new Define Function No Arguments type was added for the three defined function codes without parentheses. The tester routines were modified to use this new type instead of checking for code indexes. There will be an equivalent temporary code type enumerator for this same purpose once the type enumeration and code index enumeration are removed. A case for this new type was added to the translator get operand function at the current Define Function case. With a new type, the expected results for parser test #2 (identifiers) needed to be updated.

[branch table commit c9c4ddeb44]

Friday, January 16, 2015

Token – Table Entry Member

The code member of the token class was replaced with a table entry pointer member. The index member gets set to the offset within the program line when the tokens of the line are encoded. These offsets will be used later as more BASIC commands are implemented (for example IF-THEN statements). Accordingly, this member along with its access functions was renamed to offset.

The reason for this change was that the index name is more appropriate for the code index stored in the program code. The code access function was renamed to index. This function formerly returned the value of the code member and was changed to call the index table access function, itself also being renamed from code. The return types of these functions were changed from the Code enumeration to an integer. This table function still returns the index of the table entry, but a type cast is not needed anymore.

Accesses to the code member were changed to use the table entry pointer member. Uses of the table access function internally within the token class were changed to use the new member directly. There were a number of uses of the table access function external to the token class. Sufficient pass-through table access functions were added to the token class making the use of this access function unnecessary. Callers were updated to remove this extra call. Pass-through functions were not added for the table entry function pointer members as these will be changed to virtual member functions.
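
The resulting shape of the token class might be sketched as below. The Entry structure, the static array, the precedence values and all member names are assumptions for illustration; in particular, computing the index by pointer subtraction is only one plausible way for the index function to work.

```cpp
// Hypothetical table entry and static entry array.
struct Entry {
    const char *name;
    int precedence;  // made-up values, for illustration only
};

static const Entry s_entries[] = {
    {"LET", 4},
    {"PRINT", 4},
    {"+", 40}
};

class Token {
public:
    explicit Token(const Entry *entry, int column = -1, int length = -1) :
        m_entry{entry}, m_column{column}, m_length{length}, m_offset{-1} {}

    // The token holds a table entry pointer instead of a code.
    const Entry *tableEntry() const { return m_entry; }
    void setTableEntry(const Entry *entry) { m_entry = entry; }

    // Index of the entry (as stored in program code), returned as an integer.
    int index() const { return static_cast<int>(m_entry - s_entries); }

    // Pass-through access functions, so callers do not need tableEntry().
    const char *name() const { return m_entry->name; }
    int precedence() const { return m_entry->precedence; }

    int column() const { return m_column; }
    int length() const { return m_length; }

    // Offset within the encoded program line (formerly named index).
    int offset() const { return m_offset; }
    void setOffset(int offset) { m_offset = offset; }

private:
    const Entry *m_entry;
    int m_column;
    int m_length;
    int m_offset;
};
```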

The table access function was renamed to table entry to be consistent with the set table entry access function. The token access functions were rearranged to match the order of the members, and extra comments were removed.

[branch table commit 2a0a6f2fa3]

Thursday, January 15, 2015

Token – Set Table Entry

The code member of the token class will be replaced with a table entry pointer. This means that the access functions for the code need to be changed. It is easy to change a table entry pointer to a code or a code to a table entry pointer since the code is currently just an index of the table entry. There was already a table entry access function, which for now calls the static table entry access function that just returns the pointer to the entry in the table entry array.

Once the code member is replaced, the table access function will simply return the table entry pointer member. The set code access function was replaced with a matching set table entry access function and callers were updated accordingly. The getter should be named table entry, but use of this function is going to be largely replaced with additional pass-through table access functions in the token class (like those that already exist for the type, data type, has flag, precedence and last operand functions). This will wait until the table entry pointer member is added.

There was also the token constructor that took a code enumerator. Uses of this constructor were replaced with the token constructor taking a table entry pointer (where default values were added for the column and length arguments). Most calls to the former constructor had a table entry pointer and were obtaining the code from it, so the call to the code table access function was simply removed from these.

The program word class was also using the code enumeration type for setting codes and returning codes. This required C-style type casts (though C++ static casts could have been used) to change to and from unsigned 16-bit integers. The code and token types will be replaced with a code type and a code index. Code indexes will only be used in the program code. The code enumeration in the program word access functions was changed to unsigned integers and their callers updated accordingly.

[branch table commit 0dc88434d2]

Wednesday, January 14, 2015

Table – Single Instance

The find code function was the last function in the translator using the table reference member of the class. The recreator and program model classes also no longer used their table reference members. Both the translator and recreator classes contained a table access function for use by the various translate and recreate functions of the codes. These functions were no longer using this access function, so these access functions were removed. The table reference members were removed from these classes. (The table reference member has already been removed from the parser class.)

The table reference members were initialized from the static table instance function. Upon the first call to this function, the table instance is created and initialized. With the removal of all of the table reference members, there were no calls left to initialize the table. This function (along with the static table instance member) was replaced with a static table instance in the table source file. This required the table constructor to be made public.
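
A minimal sketch of the static instance approach follows, assuming a hypothetical Table class that only collects names. The member and variable names are not from the project.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

class Table {
public:
    // Public so that a file-scope instance can be defined outside the class.
    Table()
    {
        for (const char *name : {"LET", "PRINT", "REM"}) {
            s_names.push_back(name);
        }
    }

    static std::size_t size() { return s_names.size(); }

private:
    static std::vector<std::string> s_names;
};

// Within one source file, objects are initialized in definition order, so
// the static member must be defined before the instance that fills it.
std::vector<std::string> Table::s_names;

// The single instance: constructed during static initialization, so no
// call to an instance function is needed to set up the table.
static Table s_tableInstance;
```

The design point is that initialization no longer depends on some caller reaching an instance function first; the cost is that construction order across different source files has to be kept in mind.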

[branch table commit 2c22674e61]

Tuesday, January 13, 2015

Translator – First/Last Operands Handling

After refactoring the token convert function and changing its caller, the process done stack top function, I noticed this latter function could be better implemented, specifically in how the first and last operands of the tokens were being returned. If the callers wanted these tokens returned, they would pass pointers to where to put the tokens; otherwise null pointers were passed (also the default). Using arguments as outputs is not the best design. Changing this caused a cascade of additional refactoring.

The process done stack top function was changed to return the first and last operands as a standard pair of token pointers (now referred to appropriately as the first and second operands, the members of the pair). If the caller doesn't need them, then it ignores the return value. An Operands alias was added for this pair. The local first and last variables were replaced with an operands pair, which is returned at the end of the function. The comments to this function were removed as they were now outdated and restated what the function did.
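
The change from output arguments to a returned pair can be sketched as follows. The Token and DoneItem definitions are greatly simplified assumptions, and the function body does far less than the real one; only the Operands alias and the pair return come from the description above.

```cpp
#include <memory>
#include <stack>
#include <string>
#include <utility>

// Hypothetical minimal token.
struct Token {
    std::string text;
};
using TokenPtr = std::shared_ptr<Token>;

// Alias for the first/second operand pair.
using Operands = std::pair<TokenPtr, TokenPtr>;

// Simplified item of the done stack.
struct DoneItem {
    TokenPtr token;
    Operands operands;
};

// Pops the top done item and returns its operands. A caller that does not
// need the operands simply ignores the return value.
Operands processDoneStackTop(std::stack<DoneItem> &doneStack)
{
    Operands operands = std::move(doneStack.top().operands);
    doneStack.pop();
    return operands;
}
```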

The process final operand function was changed to use a first/second operands pair. The operand index argument was used to determine if an operator token was unary. The token itself can be used to determine this. This argument was also passed to the process done stack top function, but it was always the last operand of the token, which can be obtained from the token. This argument was removed. The generic token2 argument was renamed more appropriately to first since it is the first operand of the token, and was made an r-value reference to force the caller to move its token to this function. The comments to this function were also removed.
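
The effect of the r-value reference argument can be illustrated with a small sketch. The token type, the use of a shared pointer and the function body are assumptions; the point is only that the caller has to write std::move, since passing a named variable directly would not compile.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal token.
struct Token {
    std::string text;
};
using TokenPtr = std::shared_ptr<Token>;

// The r-value reference parameter only binds to temporaries or to the
// result of std::move, so the caller must give up its token explicitly.
std::string processFinalOperand(TokenPtr &&first)
{
    TokenPtr owned = std::move(first);  // the caller's pointer is now empty
    return owned->text;
}
```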

The process operator function, another caller of the process done stack top function, was also modified to use an operands pair (only the first operand returned is used). The final caller, the LET translate function, does not use the returned operands and didn't need to be modified.

The first and last members of the internal Done Item structure (the values of the done stack member) were replaced with an operands pair. The constructors were updated to initialize the first and second members of the operands pair and a new constructor was added with an operands pair argument. The replace first last function (a poor name) was renamed to the more appropriate replace operands.

[branch table commit 240f967d4a]

Sunday, January 11, 2015

Token – Conversion Handling

The table find code function was the one function taking a token pointer argument that was not modified with the rest of the table functions with a token argument since it did more than just access the code of the token. It had two token arguments, the token being processed (operator, assignment, print item, and input assignment) and its operand token.

This function took no action if the operand data type matched the expected data type of the token. If the operand was a constant, an attempt was made to change the constant to the expected data type if it was the last operand and the token did not have the Use Constant As Is flag set. If there was an alternate code of the token that matched the data type of the operand, the token was changed to that code. Otherwise a hidden conversion code was obtained. An error was thrown if the operand could not be converted.

Since this function is acting on a token (or possibly two tokens), it made more sense for this function to be a member of the token class. After moving it to the token class, it was renamed to the convert function. With a lot going on in this function, some refactoring was performed to improve the readability and clarity of the code:

Added the is last operand access function to determine if an operand index is the last operand of the code in the token.
Renamed the convert constant function to change constant. Also renamed its argument.
Added the change constant ignore error function, which calls the change constant function and catches an error that may be thrown.
Renamed the convert code function to convert code entry. Also renamed its argument.
Removed most of the comments as they were only restating what appears in the code (this included the callers in the translator routines).
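
The change constant ignore error wrapper from the list above might look roughly like this. The token structure, the error type and the conversion rules are assumptions made up for the example; the project's actual rules for which constants can be changed are not described here.

```cpp
#include <climits>
#include <stdexcept>

enum class DataType { Double, Integer, String };

// Hypothetical constant token holding a numeric value.
struct ConstantToken {
    DataType dataType;
    double value;
};

// Changes the constant to the requested data type, throwing if it cannot
// be changed (illustrative rules: no string conversions, and the value
// must fit in an int to become an integer).
void changeConstant(ConstantToken &token, DataType dataType)
{
    if (token.dataType == dataType) {
        return;
    }
    if (token.dataType == DataType::String || dataType == DataType::String) {
        throw std::runtime_error{"cannot convert to or from a string"};
    }
    if (dataType == DataType::Integer
            && (token.value > INT_MAX || token.value < INT_MIN)) {
        throw std::runtime_error{"constant out of integer range"};
    }
    token.dataType = dataType;
}

// Calls change constant and catches the error; returns whether the
// constant was changed so the caller can fall back to another conversion.
bool changeConstantIgnoreError(ConstantToken &token, DataType dataType)
{
    try {
        changeConstant(token, dataType);
        return true;
    }
    catch (const std::runtime_error &) {
        return false;
    }
}
```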

[branch table commit ce6adffc15]

Saturday, January 10, 2015

Table – Entry Access Functions

There were a number of table access functions with a code argument, which was used as an index into the table entry array to access the desired entry member. There were also a number of access functions with a token pointer argument, but the token argument was only used to obtain the code (index). All of these functions (except one) were replaced with in-line table entry access functions.

The table access function of the token is used to access the table access functions. Like the existing token type and data type access functions that access the table, has flag and precedence functions were added to the token class so that the intermediate table function call is not needed. Since all of the functions are in-line members, the resulting code is just as efficient.

The table class was also made a friend class of the table entry class so that the static table members are directly accessible. The table entry class was already made a friend class of the table class. Eventually these two classes will be combined as the new table model takes shape.

There were a number of other locations where use of a code was replaced with a table entry pointer. The table entries were made private since all are now covered with access functions. However, this required a table entry constructor (to initialize all of the members) so that the current static table entry array can still be initialized. This is temporary until the new table entry class hierarchy is implemented. See the commit log for other changes that were made.

[branch table commit 2f013da528]

Table – Functions Returning Codes

There were several table (and token) functions that returned a code enumerator (which was really an index). These functions were changed to return a table entry pointer. In the short term this means in some cases calling the temporary code access function to get the index of the entry (the code is still stored in a token).

The functions modified included the find code and token convert code functions. Several new table entry alternate functions replaced equivalent table set token code functions. The replacement of code enumerators with table entry pointers has been started, which is necessary before the code and type enumerations are replaced with the code type enumeration.

[branch table commit 3df697fa40]

Wednesday, January 7, 2015

Token – Data Type Member

The data type member of the token class was removed since its value should always be the same as that of the table entry for the code in the token (and all tokens now have a code). The data type, is data type, and is data type compatible access functions were modified to get their return values from the table entry for the code. The set data type access function was removed since it is no longer used.

A data type access function was added to the Table Entry class. To access the data type of the table entry, which is within the expression info member, the Expression Info structure definition was moved from the table source file to the table header file so that its members are available from the in-line table entry functions.

These changes caused a problem with the not yet implemented array, defined function and user function codes. There was only one table entry for each of these codes and their data type was Double. This previously worked since the correct data type was in the token. The problem was corrected by adding the missing table entries for these codes for each of the other data types.

There were several locations (translator and tester routines) that checked for these codes by looking for the one code. Now with additional codes, the checks needed to be expanded to include the additional alternate codes. This is not ideal, but will be resolved once the code type enumeration is implemented (there will be one code type each for variable, array, defined function and user function codes).

[branch table commit f8d338ae5e]