Interactive BASIC Compiler Project: January 2015

Thursday, January 29, 2015

Table – Encode Virtual Function

The encode mechanism needed to be improved. The encode function in the program model converts an RPN list to a program line. For each token in the RPN list, the first program word is generated from the code index and sub-codes of the token and added to the program line. If the code in the token has an operand (checked using the is code with operand access function), the second program word is generated by calling the encode function of the table entry to get an index, which is added to the program line.

This mechanism was not making effective use of polymorphism. A better way is to call the encode function for the table entry, which generates the one or two program words depending on the requirements of the code in the token (via its table entry). This was accomplished with several changes.

An encode function was added to the token class since the contents of the token are used for encoding. The first argument is a pointer to the program unit (an instance of the program model class) needed for the codes with operands to access the dictionaries. This function also needed access to the program line being encoded. This was handled by passing a C++11 STL back inserter iterator as the second argument. This iterator is used to add words to the program line, such as:

*backInserter = programWord;  // append a word to the program line

The token encode function then calls the encode function for the table entry. The arguments include the program unit pointer, the back inserter iterator, and a pointer to the token. A plain token pointer is passed (the this pointer). The token argument of the entry encode functions was changed from a standard shared token pointer to a plain pointer (a shared pointer is not necessary here anyway). These changes propagated to the dictionary add functions and the dictionary info add element and set element functions.

With the table entry encode functions adding the operand words to the program line directly (via the back inserter iterator), a return value is no longer needed. The default encode function does nothing since most codes don't require a second operand program word. For codes with operands (REM, constants, variables, etc.), their encode functions call the dictionary add function whose index return value is added to the program line.
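A minimal sketch of the resulting arrangement, assuming simplified stand-ins: `ProgramUnit`, `Token`, `Table` and `RemTable` here are hypothetical reductions of the real classes, program words are assumed to be 16-bit, and the remark dictionary is just a vector. The token always writes its code word, and the virtual entry function decides whether an operand word follows.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the project's classes.
using ProgramWord = std::uint16_t;
using ProgramLine = std::vector<ProgramWord>;
using BackInserter = std::back_insert_iterator<ProgramLine>;

struct ProgramUnit {
    std::vector<std::string> remarks;  // stand-in for the REM dictionary

    ProgramWord addRemark(const std::string &text)
    {
        remarks.push_back(text);
        return static_cast<ProgramWord>(remarks.size() - 1);  // index
    }
};

struct Token;

struct Table {
    virtual ~Table() = default;
    // Default: most codes have no operand word, so nothing is added.
    virtual void encode(ProgramUnit *, BackInserter, Token *) const {}
};

struct Token {
    const Table *entry;
    ProgramWord codeWord;  // code index plus sub-codes
    std::string text;      // operand text (used only by codes with operands)

    void encode(ProgramUnit *programUnit, BackInserter backInserter)
    {
        *backInserter = codeWord;                        // first word
        entry->encode(programUnit, backInserter, this);  // optional operand
    }
};

// Hypothetical entry for a code with an operand (REM).
struct RemTable : Table {
    void encode(ProgramUnit *programUnit, BackInserter backInserter,
                Token *token) const override
    {
        *backInserter = programUnit->addRemark(token->text);
    }
};
```

Because a `std::back_insert_iterator` is copied by value but still refers to the same container, each function can append to the program line without being handed the line itself.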

[branch table commit 0e039c280f]

Wednesday, January 28, 2015

Table – Entry Virtual Functions

The implemented commands (LET, REM, PRINT, INPUT, and INPUT PROMPT) each have different initialization values (a code for LET and REM, an option name for the INPUTs, a translate function pointer for all except REM, a recreate function pointer for all but LET, and encode, operand text, and remove function pointers for only REM). Creating a single command sub-class constructor to cover these would be messy, and so would creating a different constructor for each.

The intention was to replace the function pointers with virtual functions, so it was not worth the effort to try to handle these. Virtual functions were added to the base table class, which temporarily call the function pointer if set (translate and recreate), or just call the function pointer (encode, operand text, and remove). For the command sub-class, which is only used for commands not yet implemented, the translate virtual function overrides the base class function and throws a "Not Yet Implemented" error.
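A rough sketch of the transitional arrangement, using a hypothetical `TranslateFunction` signature and a stand-in `Translator` (the real translate functions take different arguments): the base class virtual function forwards to the old pointer when one is set, and the command sub-class overrides it to throw.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

struct Translator {};  // hypothetical stand-in for the real translator

class Table {
public:
    // Hypothetical signature for the old translate function pointer.
    using TranslateFunction = std::string (*)(Translator &);

    explicit Table(TranslateFunction translate = nullptr)
        : m_translate {translate} {}
    virtual ~Table() = default;

    // Temporary: forward to the old function pointer if one was set.
    virtual std::string translate(Translator &translator)
    {
        if (m_translate) {
            return m_translate(translator);
        }
        return {};
    }

private:
    TranslateFunction m_translate;
};

// Only used for commands that are not yet implemented.
class Command : public Table {
public:
    std::string translate(Translator &) override
    {
        throw std::runtime_error {"Not Yet Implemented"};
    }
};
```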

The is code with operand access function checked if the operand text function pointer was set. Since the function pointers will be removed, another method for checking if the code has an operand was needed. A new Operand table flag was added to the REM, REM operator, three constant, and six variable table entries. This access function was modified to check this flag.

The users of the function pointers (translator and program model) were modified to call the virtual functions. For the encode, operand text, and remove functions, the program model needs to call the is code with operand function before calling the function. The resulting code is a bit messy and needs to be improved.

[branch table commit c24cf75c90]

Tuesday, January 27, 2015

Table – Command Sub-Class (Initial)

An initial command table entry derived class was implemented consisting of a single constructor taking name and second name string arguments with the second name defaulting to a blank string. This constructor can be used with both one-word and two-word commands. It calls the base table constructor passing the two names along with values needed by most commands (Command table flag, precedence of 4 and a pointer to the null expression info instance).

This command constructor is only sufficient for the not yet implemented commands. For now it also sets the default code (no code), blank option string, and null function pointers. This class definition was put into a new header file. In the corresponding source file, instances were created for all of the not yet implemented commands, for example:

static Command Dim("DIM");

The entries for these commands were removed from the static table entry array and the code index enumerators for these commands were removed from the code index enumeration.

The type of the string arguments to the table constructor (name, second name, and option) was changed from a standard string to a C-style string (constant character array pointer) because otherwise, passing a blank string to this constructor required a std::string("") instance instead of just a blank "" string.

The first time these changes were run, a crash occurred during initialization when trying to add the first of the new command instances because the static table members had not been initialized yet. The new command source file appeared before the table source file in the source file list in the CMake build file, so the command instances were initialized before the static table members they depend on. To resolve this, the basic sub-directory source files (which will contain the entry instances) were moved to the end of the source file list so that the static table members in the table source file are initialized first.
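The underlying cause is that C++ leaves the order of dynamic initialization of namespace-scope objects in different source files unspecified, so reordering the CMake source list works only because the toolchain in practice tends to follow link order. A common alternative (not what was done here) is the construct-on-first-use idiom, sketched below with hypothetical names: the registry is a function-local static, which is initialized the first time any constructor asks for it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: entries register themselves in a registry that is
// created on first use rather than as a namespace-scope static member.
class Table {
public:
    explicit Table(const char *name) : m_name {name}
    {
        registry().push_back(this);
    }
    static const std::vector<Table *> &entries() { return registry(); }
    const std::string &name() const { return m_name; }

private:
    // Initialized the first time control passes through this function,
    // even when that happens during another object's static initialization.
    static std::vector<Table *> &registry()
    {
        static std::vector<Table *> s_registry;
        return s_registry;
    }
    std::string m_name;
};

// In the real project these would be spread across several source files.
static Table Dim {"DIM"};
static Table Rem {"REM"};
```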

[branch table commit 97ff351060]

Monday, January 26, 2015

Table – Automatic Two Flag

The table is no longer dependent on the static table entry array so the implementation of the table class hierarchy can begin. The first sub-class will be for commands. While investigating the requirements for this sub-class, it was noticed that the Two table flag could be set automatically. The Two table flag indicates if a command has a two-word form (for example the INPUT and INPUT PROMPT commands). It is also used for symbol operators that have a two-character form (for example the <, <= and <> operators).

The erector class functions were modified to set the Two table flag automatically for these commands and operators. After a two-word command is added to the name to entry map, if the first word of the two-word command has an entry, then the Two flag of the one-word entry is set. Similarly, after a new primary entry is added to the name to entry map, if the entry is a two-character operator, the Two flag is set on the entry for the first character if there is one. Some additional table entry access functions were added.
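A simplified sketch of the two rules, assuming a hypothetical `Entry` structure, a hypothetical `Two_Flag` bit value, and a name to entry map keyed by the full name. In this sketch the one-word or one-character entry must already be in the map when its longer form is added for the flag to be set.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical flag value and entry layout.
enum TableFlag : unsigned { Two_Flag = 1u << 0 };

struct Entry {
    std::string name;
    std::string name2;  // second word of a two-word command, else empty
    unsigned flags;
};

using NameMap = std::map<std::string, Entry *>;

// Set the Two flag on the shorter form, if it has an entry.
void setTwoFlag(NameMap &nameToEntry, const std::string &shortName)
{
    auto it = nameToEntry.find(shortName);
    if (it != nameToEntry.end()) {
        it->second->flags |= Two_Flag;
    }
}

bool isTwoCharacterOperator(const std::string &name)
{
    return name.size() == 2
        && !std::isalnum(static_cast<unsigned char>(name[0]));
}

void addEntry(NameMap &nameToEntry, Entry *entry)
{
    if (!entry->name2.empty()) {  // two-word command, e.g. INPUT PROMPT
        nameToEntry[entry->name + ' ' + entry->name2] = entry;
        setTwoFlag(nameToEntry, entry->name);
    } else {
        nameToEntry[entry->name] = entry;
        if (isTwoCharacterOperator(entry->name)) {  // e.g. <= or <>
            setTwoFlag(nameToEntry, entry->name.substr(0, 1));
        }
    }
}
```

Only the shorter forms (INPUT and <) end up with the flag, which matches the observation that the <> operator never needed it.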

The Two table flag was removed from all of the entries. The Two flag was set on the <> operator, which was not necessary (only the one-character form required this flag). Not all of the command entries with the flag were covered by parser tests, so additional tests were added to parser test #2 (identifiers). Not all forms of all operators were covered by the tests. Translator test #11 (temporary strings) contained an extra set of <= tests, which should have been >= tests. New translator test #18 was added to cover all operators with all possible data types.

[branch table commit 937c5e3a57]

Sunday, January 25, 2015

Table – Entry Indexes

Up to now, the index for a code that is stored in the program is the index of the table entry in the static table entry array defined in the table source file. The new table model will not have an array of table entries, but instead will be separate table instances spread among several source files. The common connection will be through the constructor of the base table class. This constructor will put pointers to the entries into a standard vector. As each entry is put into this vector, its index within the vector will be put into an index member of the table entry.

A static index to entry vector was added to the table class along with the index instance member. The set index and add entry access function was added to set the index of an entry and add the pointer to the entry to this vector. A call to this function was added to the erector function. The index access function was changed to return the value of the index member. The entry access function was changed to access the index to entry vector instead of the static table entries array. Both functions were made inline.
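A sketch of the index mechanism with hypothetical member names (`m_index`, `s_indexToEntry`). The index is simply the position of the entry in the vector at the moment its constructor runs, and both accessors are defined inside the class, which makes them implicitly inline.

```cpp
#include <cassert>
#include <vector>

class Table {
public:
    Table() { setIndexAndAddEntry(); }
    Table(const Table &) = delete;  // a copy would share the index

    int index() const { return m_index; }
    static Table *entry(int index) { return s_indexToEntry[index]; }

private:
    // Hypothetical helper mirroring the set index and add entry function.
    void setIndexAndAddEntry()
    {
        m_index = static_cast<int>(s_indexToEntry.size());
        s_indexToEntry.push_back(this);
    }

    static std::vector<Table *> s_indexToEntry;
    int m_index;
};

std::vector<Table *> Table::s_indexToEntry;
```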

The comments were removed from the instance members since they were only stating the obvious. In addition, the comment for the expression info pointer member stated that a null pointer indicates no expression info, which no longer applies because all entries are now assigned an expression info instance, even if it is the null instance. This is an example of comments going stale and lying, a good reason for eliminating comments when the code explains itself.

[branch table commit 9d74aacf42]

Table – Static Member Access

While refactoring the add to table function and sub-functions into the Erector class functions, one of the desired changes to improve readability was to provide access functions to the static table members. All of these statements contained the table class scope (Table::) to get to the static member and a table entry pointer. This meant that table entry access functions could be added that would then access the static table member.

Ideally these access functions would be inline for efficiency. These changes were not made because, in order to be inline, the table class needs to be defined before the table entry class, but the table entry class needs to be defined before the table class. Using forward declarations may have solved this issue. This would not be necessary if the two classes were merged, and since they were going to be merged, no effort was spent solving this issue.

With the two classes merged, these access functions could be implemented. The necessary access functions for the expected data type, alternate, name to entry, and code to entry maps were added and callers were updated accordingly. The table class scope was eliminated from the erector class functions. The erector add to code map and two add to expected data type functions were moved back to the table class since they were all passed entry pointers.

Several of the existing access functions were renamed for improved readability. One of the access functions, the alternate count function, was only called from one location in the translator and was used to determine if the unary operator in the token has an alternate binary operator. This section of code looked like this (note the comments and magic numbers):

// check if code has a binary operator
if (m_token->alternateCount(1) > 0) {
    m_token->setFirstAlternate(1); // change to binary operator
}

This function was renamed to the has binary operator function. Several constant expressions (integers) were added for predefined operand indexes for use with the various alternate access functions so magic numbers are not used. With the renamed access function and constant expressions, the resulting code is much more readable (note the absence of comments):

if (m_token->hasBinaryOperator()) {
    m_token->setToFirstAlternate(BinaryOperator);
}
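A minimal sketch of how these pieces might fit together, assuming a hypothetical layout where each table entry keeps its alternates in a vector indexed by operand. Only the names `hasBinaryOperator`, `setToFirstAlternate` and `BinaryOperator` come from the code above; the constant's value of 1 matches the magic number it replaced.

```cpp
#include <cassert>
#include <vector>

// Hypothetical operand index constant replacing the magic number 1.
constexpr int BinaryOperator {1};

// Hypothetical layout: alternates grouped by operand index.
struct Table {
    std::vector<std::vector<Table *>> alternates;

    int alternateCount(int operand) const
    {
        return operand < static_cast<int>(alternates.size())
            ? static_cast<int>(alternates[operand].size()) : 0;
    }
    bool hasBinaryOperator() const
    {
        return alternateCount(BinaryOperator) > 0;
    }
};

class Token {
public:
    explicit Token(Table *entry) : m_entry {entry} {}
    bool hasBinaryOperator() const { return m_entry->hasBinaryOperator(); }
    void setToFirstAlternate(int operand)
    {
        m_entry = m_entry->alternates[operand].front();
    }
    Table *tableEntry() const { return m_entry; }

private:
    Table *m_entry;
};
```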

[branch table commit 03140858b0]

Saturday, January 24, 2015

Table – Single Class

The table class, with static members, and the table entry class, with entry (instance) members, were combined into a single class, named the Table class. The former table entry constructor (which had no body) was moved to the table source file and a call to an Erector instance was added to add the needed info to the static table members.

The iteration of the entries in the static table entry array that called an Erector instance for each entry was removed from the default constructor since the table entry constructor now does this. The only statements remaining in this constructor are the initialization of the other alternates. This constructor along with the single table instance will remain to initialize these other alternates until this can be reimplemented in the new class hierarchy.

With these constructors in place, the class hierarchy can be implemented one derived class at a time. As each is implemented, its entries will be removed from the static array along with their code index enumerators.

The method previously used to disable the copy constructor and copy assignment was to make them private so they are not accessible, at least from outside the class. C++11 has a better method of accomplishing this using the new deleted function feature (= delete). C++11 also has move constructors and move assignments, so these were also deleted. For example, the copy constructor and copy assignment are disabled like this:

Table &operator=(const Table &) = delete;
Table(const Table &) = delete;
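A compile-time check of the result, using a hypothetical minimal class. Strictly speaking, once the copy constructor is user-declared (even as deleted) the compiler no longer declares the move operations implicitly, so rvalues already fall back to the deleted copy operations; deleting the move operations explicitly mainly documents the intent. The `static_assert` lines verify all four with the standard type traits.

```cpp
#include <cassert>
#include <type_traits>

class Table {
public:
    Table() = default;
    // C++11 deleted functions: any use is a compile-time error.
    Table(const Table &) = delete;
    Table &operator=(const Table &) = delete;
    Table(Table &&) = delete;
    Table &operator=(Table &&) = delete;
};

static_assert(!std::is_copy_constructible<Table>::value, "no copying");
static_assert(!std::is_copy_assignable<Table>::value, "no copy assignment");
static_assert(!std::is_move_constructible<Table>::value, "no moving");
static_assert(!std::is_move_assignable<Table>::value, "no move assignment");
```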

Some other minor changes were made. All instances of variables named operand index (mostly arguments of access functions) were renamed operand. This was already done in the Erector class. Several redundant comments were also removed.

[branch table commit 9153b17e5e]

Table – Add To Table Refactoring (Part 3)

The add to table function and all the functions it calls share several variables, which required them to be passed between the functions. Enter the new Erector class to contain these functions and shared variables. This class is a function object class with a constructor that takes a table entry pointer and a function call operator that adds the entry to the table (its static member variables). This class was made a friend class of the table and table entry classes to make their private member functions accessible.

The shared members of the Erector class include the table entry pointer being added, a table entry pointer to the primary entry, a table entry pointer to an alternate entry, and an operand index used for iterating by multiple functions. Most of the member functions were defined in the header file after the class definition so as not to clutter the class definition. Each of these was declared inline since each is called only once. Two functions called multiple times were put into the source file.

Many of the functions were renamed again to better explain what they do. There was additional refactoring, moving statements up and down the functions during this renaming. Not widely known (at least it wasn't to me) is that C++ contains alternative representations for the logical operators, for example and for &&, or for ||, and not for !. To make the code a little more readable, the not operator was used since the ! operator can be hard to see.
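A small illustration of the alternative tokens, with hypothetical function names. They are ordinary keywords in standard C++, so no header is needed, and `not` can be used anywhere `!` can.

```cpp
#include <cassert>
#include <string>

// Hypothetical checks written with the alternative tokens; `not`, `and`
// and `or` are standard C++ keywords, equivalent to !, && and ||.
bool isTwoWordCommand(bool isCommand, const std::string &name2)
{
    return isCommand and not name2.empty();
}

bool needsEntry(bool found, bool isAlternate)
{
    return not found or isAlternate;
}
```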

The main function was restructured to eliminate the need for the Done structure exception. This unusual mechanism was used as a sort of goto to cause an exit from anywhere. Eliminating this exception-exit mechanism became possible because the sub-functions no longer return table entry pointers (these are now member variables set directly), which opened up the use of boolean return values.
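A reduced sketch of the function object shape, with hypothetical names and a single step: shared state lives in members, the helper reports through a boolean instead of throwing `Done`, and `std::map::emplace` both finds an existing primary entry and inserts a new one.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, reduced entry and table static member.
struct Entry {
    std::string name;
    int precedence;
};

struct Table {
    static std::map<std::string, Entry *> nameToEntry;
};
std::map<std::string, Entry *> Table::nameToEntry;

// Function object: state shared by the helpers is held in members
// instead of being passed (and returned) between them.
class Erector {
public:
    explicit Erector(Entry *entry) : m_entry {entry} {}

    void operator()()
    {
        if (not addToNameMap()) {
            return;  // name already present: nothing more in this sketch
        }
        // further steps would use m_entry and m_primary here
    }
    Entry *primary() const { return m_primary; }

private:
    bool addToNameMap()
    {
        auto result = Table::nameToEntry.emplace(m_entry->name, m_entry);
        m_primary = result.first->second;  // existing or newly added entry
        return result.second;              // false if the name existed
    }

    Entry *m_entry;
    Entry *m_primary {};
};
```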

[branch table commit f199936e70]

Friday, January 23, 2015

Table – Add To Table Refactoring (Part 2)

Refactoring the remaining large function called from the add to table function was painful, but was finally achieved, though the result is far from ideal. The problem is that there are several variables that need to be shared among the functions. Some functions take a variable and then return the same variable, possibly modified. Another needs to modify two arguments and cannot return both, so reference arguments were used.

Having arguments used as both input and output can be confusing when reading code (arguments should be used for inputs, return values should be used for outputs). Sharing several variables between functions implies that the functions should be wrapped up into another class. This will be the subject of the next refactoring change.

[branch table commit a9cd8e9a1a]

Wednesday, January 21, 2015

Table – Add To Table Refactoring

The add to table function is large and hard to read so it will be refactored. Refactoring this function will require several passes of changes. The first round was to break the function into seven smaller functions. The add expected data type function was already separate since it is called from three places. The result was a simpler main function, but it is still rather involved. The separated functions are also simpler, but less than readable (and one is still quite large).

I won't detail the results since this is not the final form (and if successful, no explanation should be necessary as the code should be easy to read).

[branch table commit 23515be87b]

For the next round, I wanted to clean up the main add to table function. There were a number of return statements that were cluttering up the function. The first change was to restructure two of the if statements to eliminate two of the return statements. For the others, there was a pattern with three of the separated functions returning a null pointer indicating that no further action was necessary. The main function then checked for this null pointer and returned.

To clean this up, an empty Done structure was added with no members to be used as an exception. Instead of these three functions returning a null, they instead throw an instance of this empty structure. The main function catches this exception and does nothing (just returns). This is not really an exception condition, but it cleaned up the main function quite a bit. This mechanism allows lower-level functions to cause a return of the add to table function.
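A minimal sketch of this exception-as-exit mechanism, with hypothetical helper and counter names; the Part 3 refactoring later replaced it with boolean return values.

```cpp
#include <cassert>

struct Done {};  // empty type, thrown only to unwind back to addToTable()

int g_steps = 0;  // hypothetical: counts how far the add got

// Hypothetical sub-function that may decide nothing more is needed.
void checkSomething(bool finished)
{
    if (finished) {
        throw Done {};  // not an error: just leave addToTable() early
    }
    ++g_steps;
}

void addToTable(bool finishEarly)
{
    try {
        checkSomething(finishEarly);
        ++g_steps;  // remaining steps
    } catch (const Done &) {
        // nothing to do, simply return
    }
}
```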

[branch table commit 69bd606994]

For the third round, there was a further attempt to make the main add to table function and some of the separated functions more readable. This involved renaming functions (again) and adding access functions to make what is being tested in if statements more self-explanatory.

One of the new access functions was named has operands, which was too similar to the has operand access functions. These were used to determine if the code has an operand (variable, constant, array, etc.) by checking if the code has an operand text function. These functions were renamed to is code with operand. The last operands access functions were corrected to return an integer instead of a boolean (though this didn't appear to affect its functionality).

[branch table commit 8b9b3f9a4b]

Tuesday, January 20, 2015

Table – Single Base Class

The Table and Table Entry classes are going to be combined into a single class, which will be the base class in the new table model class hierarchy. The only members left in the Table class are static members and functions except for the constructor and entry member pointer, which is just pointing to the static array of initialized table entries.

The entry pointer member was removed and the static array is accessed directly. The entry pointer and size arguments were removed from the constructor. The constructor loops through each entry in the array calling the add function to add the necessary info to the static table members. The alternate info initializer list is then iterated to add other alternates not automatically set up by the table entries in the add function.

Previously when the add function threw an error, the constructor caught the error and added it to a list. At the end of the constructor, if this list is not empty, then the errors are output and the application is aborted. With the new table model, it will not be possible to catch all errors in a list since errors thrown by static constructors abort the application immediately. The add function was modified to catch its own error, output it and abort. This is only a diagnostic tool during development, so trying to report all errors would be nice but not a necessity.

Both the add and add expected data type functions operated on a single table entry and therefore were moved to the Table Entry class. These functions will be called by the constructors of the new table class hierarchy. The function was renamed to add to table, but the entire function is too large and needs to be cleaned up (refactored, in other words, broken up into smaller functions; and this is certainly not the only function with this problem).

[branch table commit 07064dbde4]

Monday, January 19, 2015

Table – Code Type Enumeration

The code enumeration where enumerators were used as table indexes was replaced with a code type C++11 enumeration class. Code types are only needed for codes that are referenced, for example, special operators (open parentheses, close parentheses, equal, comma, semicolon, and colon), some commands (LET and REM), hidden codes (convert to double or integer), and codes with operands (constant, variable, array, etc.). Several codes are assigned to the same code type enumerator (for example double, integer, string, non-reference and reference variables).

The code type enumeration was named the same as the old Code enumeration because this makes the code read better where they are used. The old Code enumeration was moved to the table source file for the temporary alternate code initializer lists and was renamed to Code Index to avoid conflicting with the new Code enumeration class. These enumerators will eventually be removed. In the rest of the classes, the old enumerators were changed to the new enumerators.

The table entry type member was changed to the code (new code type enumeration). Table entry initializers were updated with new enumerators. For most of the entries, this was just the default code type enumerator. The type access functions were removed from the table entry and token classes, and the code access functions were updated to use the new code member.

A new code to entry map static member was added to the table class. In the add function, if the code is not the default code type and the code type of the entry is not already in the map, then the pointer to the entry is added to the map for the code type. The arguments of the static entry access functions were changed from an index to a code type enumerator. An entry access function taking an index is still needed for converting program code back to tokens.
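A sketch of the code to entry map using a C++11 enumeration class as the key, with hypothetical enumerators and a reduced `Table`. `std::map::emplace` does not overwrite an existing key, which gives the "only if not already in the map" behavior described above, and scoped enumerations support `<`, so they work as `std::map` keys directly.

```cpp
#include <cassert>
#include <map>

// Hypothetical code type enumerators; several codes share Variable.
enum class Code { Null, Let, Rem, Equal, Comma, Variable };

struct Table {
    Code code;
    int index;  // still needed to convert program code back to tokens
};

std::map<Code, Table *> codeToEntry;

void addToCodeMap(Table *entry)
{
    if (entry->code != Code::Null) {
        codeToEntry.emplace(entry->code, entry);  // keeps the first entry
    }
}

Table *entry(Code code)
{
    return codeToEntry.at(code);
}
```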

The switch statements on the token type in the translator and tester class routines were updated to switch on code type. This included the test token type name function, where the names output were also changed to match the code type. This affected the parser test output, which was updated accordingly.

[branch table commit f19b061fe3]

Sunday, January 18, 2015

Table – Internal Function Type

The next type replaced with a table flag was the Internal Function type. There were no complications here either. A new Function table flag was added (the "internal" nomenclature will no longer be used). An is function table entry access function was added that just calls the has flag function with the Function flag. A token is function table pass-through access function was also added. The Function table flag was added to all table entries with the Internal Function type, which were then changed to the default type.

[branch table commit eaa6e6d7f5]

Table – Operator Type

The next type replaced with a table flag was the Operator type. There were no complications. A new Operator table flag was added. An is operator table entry access function was added that just calls the has flag function with the Operator flag. A token is operator table pass-through access function was also added. The Operator table flag was added to all table entries with the Operator type, which were then changed to the default type.

[branch table commit e064047030]

Table – Command Type

The only uses of code enumerators remaining are those that will become code type enumerators and for the temporary initializers of additional alternate codes. The replacement of the type enumerators can now begin. As mentioned previously (see January 3), table flags will be used to identify commands, operators and functions since some of these codes will need their own code type enumerator (for example, LET, Equal, Comma and Semicolon).

The command type was first to be replaced with use of a table flag. A Command table flag already exists and was being used for the Comma, Semicolon, and the assignment codes. The purpose of this flag is to identify that these codes can have the Colon sub-code like command codes and not the Parentheses sub-code that operators can have. Both of these sub-codes share the same sub-code bit.

For the purposes of checking for a command type code, these other codes needed to be excluded. An is command table entry access function was added to do this check, which first checks if the Command table flag is set. The assignment codes all have the Reference table flag, so these are excluded by checking that this flag is not set. Comma and Semicolon (used as commands in PRINT statements) are treated as special operators, allowing them in expressions where an operator is expected. These are excluded by checking that the code is not an operator, which for now checks for an Operator type but will be changed to checking the new Operator flag.
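A sketch of these flag checks, with hypothetical flag names and values and a reduced `Table`. It shows the flag-based form of the operator check (which the Operator type check was planned to become): `isOperator` and `isFunction` test a single flag, while `isCommand` excludes the assignment codes (Reference flag) and the Comma and Semicolon special operators.

```cpp
#include <cassert>

// Hypothetical flag values.
enum TableFlag : unsigned {
    Command_Flag   = 1u << 0,
    Operator_Flag  = 1u << 1,
    Function_Flag  = 1u << 2,
    Reference_Flag = 1u << 3,
};

struct Table {
    unsigned flags;

    bool hasFlag(TableFlag flag) const { return (flags & flag) != 0; }
    bool isOperator() const { return hasFlag(Operator_Flag); }
    bool isFunction() const { return hasFlag(Function_Flag); }
    // The Command flag alone also marks Comma, Semicolon and assignments.
    bool isCommand() const
    {
        return hasFlag(Command_Flag) && !hasFlag(Reference_Flag)
            && !isOperator();
    }
};
```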

A token is command table pass-through access function was also added. When checking for the Colon and Parentheses sub-codes (in the program word stream insert operator and recreator operator functions), just the Command table flag is checked, otherwise the is command function is used (throughout the translator and tester classes).

For all of the command code table entries where the Command table flag was added, the current type initializer was changed to the default type (null). This was necessary since the Command type was removed. The type initializers will be replaced with code type enumerators.

[branch table commit 098009de24]

Saturday, January 17, 2015

Table – Assign Code Recreation

A small part of a change made a few commits ago (see January 10) modified the assign string recreate function where the name of the Equal code is used instead of the name of the assign code. This was done to remove the use of the Assign code enumerator (the Equal code enumerator is used in many places and will be one of the enumerators in the new code type enumeration).

The assign recreate function was modified to also get the name from the equal code table entry. Since both assign recreate functions use the name from the equal code table entry, their table entries no longer need a name, so their names were set to a blank string. The expected test results were updated accordingly.

[branch table commit 0c107e08f9]

Table – Code Indexes

Indexes of table entries are used in the program code to represent the table entry of a code. When the program code is read, these indexes should be converted back to a table entry pointer, which is then used to access information about the code. These code indexes should only be used in the program code. However, the code indexes have been used throughout the code, mainly within tokens.

As of the last change, they are no longer used within tokens. The only remaining uses are a few code enumerators to get a table entry and some uses within the tester class. The few code enumerators still in use will become code type enumerators when the code enumerators are combined with the type enumerators.

The tester class was using code indexes to determine if define functions contain parentheses or not. When define functions are implemented, their dictionary entries will contain the number of arguments. No arguments will mean that there are no parentheses. Since this is not yet implemented, the code indexes were used temporarily to determine if parentheses were present. The tester class should not be using code indexes.

A new Define Function No Arguments type was added for the three defined function codes without parentheses. The tester routines were modified to use this new type instead of checking for code indexes. There will be an equivalent temporary code type enumerator for this same purpose once the type enumeration and code index enumeration are removed. A case for this new type was added to the translator get operand function at the current Define Function case. With a new type, the expected results for parser test #2 (identifiers) needed to be updated.

[branch table commit c9c4ddeb44]

Friday, January 16, 2015

Token – Table Entry Member

The code member of the token class was replaced with a table entry pointer member. The index member holds the offset within the program line and is set when the tokens of the line are encoded. These offsets will be used later as more BASIC commands are implemented (for example IF-THEN statements). Accordingly, this member along with its access functions were renamed to offset.

The reason for this renaming was that the index name is more appropriate for the code index stored in the program code. The code access function was renamed to index. This function formerly returned the value of the code member and was changed to call the index table access function, which itself was renamed from code. The return types of these functions were changed from the Code enumeration to an integer. This table function still returns the index of the table entry, but a type cast is no longer needed.

Accesses to the code member were changed to use the table entry pointer member. Uses of the table access function internally within the token class were changed to use the new member directly. There were a number of uses of the table access function external to the token class. Sufficient pass-through table access functions were added to the token class making the use of this access function unnecessary. Callers were updated to remove this extra call. Pass-through functions were not added for the table entry function pointer members as these will be changed to virtual member functions.
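A minimal sketch of a token holding a table entry pointer with pass-through access functions. The members, the default argument values for column and length, and the accessor set are hypothetical, chosen only to mirror the names mentioned in these entries (index, offset, set table entry, precedence).

```cpp
#include <cassert>

// Hypothetical minimal table entry.
struct Table {
    int index;
    int precedence;
};

// Hypothetical token: the code member is replaced by a table entry pointer.
class Token {
public:
    explicit Token(Table *entry, int column = -1, int length = 1)
        : m_entry {entry}, m_column {column}, m_length {length} {}

    Table *tableEntry() const { return m_entry; }
    void setTableEntry(Table *entry) { m_entry = entry; }
    int column() const { return m_column; }
    int length() const { return m_length; }
    int offset() const { return m_offset; }
    void setOffset(int offset) { m_offset = offset; }

    // Pass-through access functions, so callers need not go through
    // tableEntry() themselves.
    int index() const { return m_entry->index; }
    int precedence() const { return m_entry->precedence; }

private:
    Table *m_entry;
    int m_column;
    int m_length;
    int m_offset {-1};  // set when the line is encoded
};
```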

The table access function was renamed to table entry to be consistent with the set table entry access function. The token access functions were rearranged into the same order as the members, and extra comments were removed.

[branch table commit 2a0a6f2fa3]

Thursday, January 15, 2015

Token – Set Table Entry

The code member of the token class will be replaced with a table entry pointer. This means that the access functions for the code need to be changed. It is easy to change a table entry pointer to a code or a code to a table entry pointer since the code is currently just an index of the table entry. There was already a table entry access function, which for now calls the static table entry access function that just returns the pointer to the entry in the table entry array.

Once the code member is replaced, the table access function will simply return the table entry pointer member. The set code access function was replaced with a matching set table entry access function and callers were updated accordingly. The getter should be named table entry, but use of this function is going to be largely replaced with additional pass-through table access functions in the token class (like those that already exist for the type, data type, has flag, precedence and last operand functions). This will wait until the table entry pointer member is added.

There was also the token constructor that took a code enumerator. Uses of this constructor were replaced with the token constructor taking a table entry pointer (where default values were added for the column and length arguments). Most calls to the former constructor had a table entry pointer and were obtaining the code from it, so the call to the code table access function was simply removed from these.

The program word class was also using the code enumeration type for setting codes and returning codes. This required C-style type casts (though C++ static casts could have been used) to change to and from unsigned 16-bit integers. The code and token types will be replaced with a code type and a code index. Code indexes will only be used in the program code. The code enumeration in the program word access functions was changed to unsigned integers and their callers updated accordingly.
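Here is a minimal sketch of program word access functions that take and return plain unsigned integers. The function names `setInstruction` and `instructionCode` are hypothetical, and so is the small `Code` enumeration. The sketch also ignores whatever sub-code bits the real program word packs alongside the code. An unscoped enumerator converts implicitly to `unsigned`, so callers passing a code need no cast. Only the reverse direction needs a `static_cast`.

```cpp
#include <cstdint>

// Hypothetical code enumeration standing in for the one being phased out.
enum Code { Rem_Code = 1, Let_Code, Print_Code };

// Simplified 16-bit program word (sub-code bits of the real class not shown).
class ProgramWord {
public:
    // accepts a plain code index; an unscoped enumerator converts implicitly
    void setInstruction(unsigned code) { m_word = static_cast<std::uint16_t>(code); }

    // returns the code index as an unsigned integer, not an enumerator
    unsigned instructionCode() const { return m_word; }

private:
    std::uint16_t m_word{};
};
```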

[branch table commit 0dc88434d2]

Wednesday, January 14, 2015

Table – Single Instance

The find code function was the last function in the translator using the table reference member of the class. The recreator and program model classes also no longer used their table reference members. Both the translator and recreator classes contained a table access function for use by the various translate and recreate functions of the codes. Since these functions no longer used these access functions, the access functions were removed. The table reference members were also removed from these classes. (The table reference member had already been removed from the parser class.)

The table reference members were initialized from the static table instance function. Upon the first call to this function, the table instance is created and initialized. With the removal of all of the table reference members, there were no calls left to initialize the table. This function (along with the static table instance member) was replaced with a static table instance in the table source file. This required the table constructor to be made public.
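The replacement can be sketched as a file-scope static object whose constructor performs the one-time initialization. The name index, the `indexOf()` function, and the three entry names are all hypothetical illustrations, not the project's actual table members.

```cpp
#include <map>
#include <string>

class Table {
public:
    Table();  // now public so the source file can define the instance

    // hypothetical static lookup used by other classes
    static int indexOf(const std::string &name)
    {
        auto it = s_nameIndex.find(name);
        return it == s_nameIndex.end() ? -1 : it->second;
    }

private:
    static std::map<std::string, int> s_nameIndex;
};

// defined before s_table below, so it is initialized first
std::map<std::string, int> Table::s_nameIndex;

// the constructor builds the static lookup data from the entries
Table::Table()
{
    const char *names[] = {"LET", "PRINT", "REM"};
    for (int i = 0; i < 3; ++i) {
        s_nameIndex[names[i]] = i;
    }
}

// single static instance in the table source file, replacing the former
// static instance function; constructing it initializes the table
static Table s_table;
```

Within one source file, static objects are initialized in definition order, so the map is ready before the constructor fills it. One caveat applies to this design: static objects in *other* source files should not use the table during their own static initialization, because initialization order across files is unspecified.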

[branch table commit 2c22674e61]

Tuesday, January 13, 2015

Translator – First/Last Operands Handling

After refactoring the token convert function and changing its caller, the process done stack top function, I noticed that the latter function could be better implemented, specifically in how the first and last operands of the tokens were returned. If the callers wanted these tokens returned, they would pass pointers to where to put the tokens; otherwise null pointers were passed (also the default). Using arguments as outputs is not the best design. Changing this caused a cascade of additional refactoring.

The process done stack top function was changed to return the first and last operands as a standard pair of token pointers (now referred to appropriately as the first and second operands, the members of the pair). If the caller doesn't need them, then it ignores the return value. An Operands alias was added for this pair. The local first and last variables were replaced with an operands pair, which is returned at the end of the function. The comments to this function were removed as they were now outdated and restated what the function did.
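Here is a minimal sketch of returning the operands instead of writing through output pointers. It assumes a one-member `Token` and a vector-based done stack of hypothetical `DoneItem` values. The real function also handles data type conversion of the top item, which is omitted here.

```cpp
#include <memory>
#include <utility>
#include <vector>

struct Token {                 // hypothetical minimal token
    int column;
};
using TokenPtr = std::shared_ptr<Token>;
using Operands = std::pair<TokenPtr, TokenPtr>;  // first and last operand

// simplified done stack item: the token plus its first/last operands
struct DoneItem {
    TokenPtr token;
    Operands operands;
};

// stand-in for the process done stack top function: pops the top item and
// returns its operands; callers that don't need them ignore the result
Operands processDoneStackTop(std::vector<DoneItem> &doneStack)
{
    Operands operands = std::move(doneStack.back().operands);
    doneStack.pop_back();
    return operands;
}
```

A caller that needs only one operand can write `processDoneStackTop(stack).first`. A caller that needs neither just discards the returned pair.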

The process final operand function was changed to use a first/second operands pair. The operand index argument was used to determine if an operator token was unary. The token itself can be used to determine this. This argument was also passed to the process done stack top function, but it was always the last operand of the token, which can be obtained from the token. This argument was removed. The generic token2 argument was renamed more appropriately to first since it is the first operand of the token, and was made an r-value reference to force the caller to move its token to this function. The comments to this function were also removed.
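The effect of the r-value reference argument can be shown with a small sketch. The function name and the one-member token are hypothetical. Passing a named shared pointer without `std::move` does not compile. Binding to the `&&` parameter does not by itself move anything; the move happens when the function takes ownership.

```cpp
#include <memory>
#include <utility>

struct Token { int column; };          // hypothetical minimal token
using TokenPtr = std::shared_ptr<Token>;

// stand-in for the process final operand function: the caller must write
// processFinalOperand(std::move(token)); processFinalOperand(token) is an error
TokenPtr processFinalOperand(TokenPtr &&first)
{
    TokenPtr stored = std::move(first);  // take ownership; caller's pointer is now empty
    return stored;
}
```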

The process operator function, another caller of the process done stack top function, was also modified to use an operands pair (only the first operand returned is used). The final caller, the LET translate function, does not use the returned operands and didn't need to be modified.

The first and last members of the internal Done Item structure (the values of the done stack member) were replaced with an operands pair. The constructors were updated to initialize the first and second members of the operands pair and a new constructor was added with an operands pair argument. The replace first last function (a poor name) was renamed to the more appropriate replace operands.

[branch table commit 240f967d4a]

Sunday, January 11, 2015

Token – Conversion Handling

The table find code function was the one function with a token pointer argument that was not modified along with the rest of the table functions taking a token argument, since it did more than just access the code of the token. It had two token arguments: the token being processed (operator, assignment, print item, or input assignment) and its operand token.

This function took no action if the operand data type matched the expected data type of the token. If the operand was a constant, an attempt was made to change the constant to the expected data type if at the last operand and the token did not have the Use Constant As Is flag set. If there was an alternate code of the token that matched the data type of the operand, the token was changed to that code. Otherwise a hidden conversion code was obtained. An error was thrown if the operand could not be converted.

Since this function is acting on a token (or possibly two tokens), it made more sense for this function to be a member of the token class. After moving it to the token class, it was renamed to the convert function. With a lot going on in this function, some refactoring was performed to improve the readability and clarity of the code:

Added the is last operand access function to determine if an operand index is the last operand of the code in the token.
Renamed the convert constant function to change constant. Also renamed its argument.
Added the change constant ignore error function, which calls the change constant function and catches any error that may be thrown.
Renamed the convert code function to convert code entry. Also renamed its argument.
Removed most of the comments as they were only restating what appears in the code (this included the callers in the translator routines).
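The change constant / change constant ignore error pairing can be sketched as follows. `Constant`, `StatusError` and the integer range check are hypothetical simplifications of the token and its error status, and the sketch ignores how non-whole values are handled. The ignore-error variant leaves the constant unchanged on failure, so an alternate code taking a double can still be chosen.

```cpp
#include <limits>
#include <stdexcept>

enum class DataType { Integer, Double };

// hypothetical error status thrown by changeConstant()
struct StatusError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct Constant {          // hypothetical stand-in for a constant token
    DataType dataType;
    double value;

    // change the constant to the requested data type; a double outside
    // the int range cannot become an integer constant
    void changeConstant(DataType to)
    {
        if (to == DataType::Integer
            && (value < std::numeric_limits<int>::min()
                || value > std::numeric_limits<int>::max())) {
            throw StatusError{"Expected Valid Integer Constant"};
        }
        dataType = to;
    }

    // same, but swallows the error and leaves the constant unchanged
    void changeConstantIgnoreError(DataType to)
    {
        try {
            changeConstant(to);
        } catch (const StatusError &) {
            // ignored: the caller falls back to alternates or conversion codes
        }
    }
};
```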

[branch table commit ce6adffc15]

Saturday, January 10, 2015

Table – Entry Access Functions

There were a number of table access functions with a code argument, which was used as an index into the table entry array to access the desired entry member. There were also a number of access functions with a token pointer argument, but the token argument was only used to obtain the code (index). All of these functions (except one) were replaced with in-line table entry access functions.

The table access function of the token is used to reach the table entry access functions. Like the existing token type and data type access functions that access the table, has flag and precedence functions were added to the token class so that the intermediate table function call is not needed. Since all of these functions are in-line members, the resulting code is just as efficient.

The table class was also made a friend class of the table entry class so that the static table members are directly accessible. The table entry class was already made a friend class of the table class. Eventually these two classes will be combined as the new table model takes shape.

There were a number of other locations where use of a code was replaced with a table entry pointer. The table entry members were made private since all are now covered by access functions. However, this required a table entry constructor (to initialize all of the members) so that the current static table entry array can still be initialized. This is temporary until the new table entry class hierarchy is implemented. See the commit log for other changes that were made.

[branch table commit 2f013da528]

Table – Functions Returning Codes

There were several table (and token) functions that returned a code enumerator (which was really an index). These functions were changed to return a table entry pointer. In the short term this means in some cases calling the temporary code access function to get the index of the entry (the code is still stored in a token).

The functions modified included the find code and token convert code functions. Several new table entry alternate functions replaced equivalent table set token code functions. Replacing the use of code enumerators with table entry pointers has been started, which is necessary before the code and type enumerations are replaced with the code type enumeration.

[branch table commit 3df697fa40]


Wednesday, January 7, 2015

Token – Data Type Member

The data type member of the token class was removed since its value should always be the same as the table entry for the code in the token (and all tokens now have a code). The data type, is data type, and is data type compatible access functions were modified to get their return values from the table entry for the code. The set data type access function was removed since it is no longer used.

A data type access function was added to the Table Entry class. To access the data type of the table entry, which is within the expression info member, the Expression Info structure definition was moved from the table source file to the table header file so that its members are available from the in-line table entry functions.

These changes caused a problem with the not yet implemented array, defined function and user function codes. There was only one table entry for each of these codes, and its data type was Double. This previously worked since the correct data type was in the token. The problem was corrected by adding the missing table entries for these codes for each of the other data types.

There were several locations (translator and tester routines) that checked for these codes by looking for the one code. Now with additional codes, the checks needed to be expanded to include the additional alternate codes. This is not ideal, but will be resolved once the code type enumeration is implemented (there will be one code type each for variable, array, defined function and user function codes).

[branch table commit f8d338ae5e]

Tuesday, January 6, 2015

Invalid and Null Code Enumerators

The Invalid code enumerator was used by the table find code function to indicate an invalid conversion and within a token to indicate that no code had been set yet. Neither use exists any longer, so this enumerator was removed along with any uses of it (it is no longer necessary to check if a token has a valid code), including the token has valid code access function.

The Null code enumerator was also removed, replaced with the use of the default code enumerator. The first code enumerator was set to 1 so as not to conflict with the default code enumerator. While making this change, it was noticed that the table unary code function was not being used any more, so it was removed.
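A short sketch shows why starting the first enumerator at 1 matters. It assumes that "the default code enumerator" means a value-initialized `Code{}` (which is zero). The enumerators and the `isSet()` helper are hypothetical. With no real code using zero, `Code{}` can never be confused with a valid code.

```cpp
// hypothetical excerpt: the first real code starts at 1
enum Code {
    Rem_Code = 1,
    Let_Code,
    Print_Code
};

// a value-initialized Code{} is zero, which no real code now uses
inline bool isSet(Code code) { return code != Code{}; }
```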

[branch table commit 7d503805f4]

Token – Data Type Conversion

Hidden conversion codes are inserted into the program when a numeric operand needs to be converted to either a double or an integer. However, for numeric constants, no conversion is needed since both the integer and double representations of the constant are available (except for large values that cannot be converted to an integer).

The exception mentioned in the last post, where the data type of the token is set other than at token creation, was when a constant is changed, or cannot be used because an integer is required but the double value is too large. In that case the data type was temporarily changed to the default, which was then checked for by the callers that convert a constant and return a hidden conversion code for an operand.

The convert constant function was modified to throw an "Expected Valid Integer Constant" status error when an integer is required but the token contains an unconvertible large double constant. Previously this was up to the caller to check if the expected data type was an integer and a double constant wasn't converted. Also, instead of setting the data type before calling the table set token code function, the set token code function with the data type arguments is used.

The convert code function was modified to throw "Expected Type Expression" errors when the token cannot be converted to the desired data type. Only the error status is thrown, not a Token Error, which allows the caller to construct the Token Error. Previously, this function returned the Invalid code enumerator, which callers checked for.

The table find code function was modified for this change. The "Expected Valid Integer Constant" errors thrown from the convert constant function are caught and ignored because the operator or function may have an alternate that takes a double argument (where a large double constant would be acceptable). If the operator or function only accepts an integer, then an error will be thrown, as with other tokens that cannot be converted.

The translator get expression function was modified to catch error statuses thrown from the convert code function and construct a Token Error to throw (by calling the done stack top token error function). Previously this routine had to check for an unconvertible double constant. The process done stack top function was similarly modified to catch errors from the table find code function (it also previously checked for an unconvertible constant).
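The division of labor (throw only a status, let the caller add the position) can be sketched like this. The `Status` and `TokenError` types, the `checkOperand()` caller, and the string/numeric compatibility rule are hypothetical simplifications. Here `convertCode()` returns whether a hidden conversion code would be needed rather than returning the code itself.

```cpp
#include <stdexcept>

// hypothetical status error: just the message, no position
struct Status : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// hypothetical token error: the status message plus where it occurred
struct TokenError : std::runtime_error {
    TokenError(const Status &status, int column, int length)
        : std::runtime_error{status.what()}, column{column}, length{length} {}
    int column;
    int length;
};

enum class DataType { Double, Integer, String };

// stand-in for convertCode(): returns whether a hidden conversion code is
// needed, or throws only a Status when the operand cannot be converted
bool convertCode(DataType operand, DataType expected)
{
    if ((operand == DataType::String) != (expected == DataType::String)) {
        throw Status{expected == DataType::String ? "Expected String Expression"
                                                  : "Expected Numeric Expression"};
    }
    return operand != expected;
}

// stand-in for the translator caller, which knows the token's position
// and so builds the Token Error from the caught status
bool checkOperand(DataType operand, DataType expected, int column, int length)
{
    try {
        return convertCode(operand, expected);
    } catch (const Status &status) {
        throw TokenError{status, column, length};
    }
}
```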

The translator get operand function also called the convert code function to check the data type of references. This was not appropriate since no conversion code was needed, and the modified convert code function was not throwing appropriate errors for references anyway. Therefore, a new token is data type compatible access function was added to check only the data type.

The set token code function without a data type argument was an adapter to the set token code function, passing the data type contained in the token. Besides the convert constant function, the only other user of this function was the token constructor for string constants, which was modified to use the function with the data type argument directly. The initializer for the code member was also removed since it gets initialized by the set token code function. The set token code adapter function was removed.

[branch table commit c2f2420a89]

Sunday, January 4, 2015

Token – Type Member

The token constructors each contain calls to the table static instance function so that the set token or set token code function can be called. These two table functions set the code, type and data type of the token. Now that tokens always contain a code (and later a table entry pointer), the type and data type members will be the same as the table entry (with one exception for data type). So it is not necessary for the token to contain these members.

The type member was removed from the token class (the data type member will have to wait until that one exception is eliminated). The type and is type access functions were modified to read from the table using a new table access function (which for now uses the code to get the table entry pointer). The set type access function was removed as it was only used when the token was created. A type access function was added to the table entry class.

These changes required some refactoring of header files. The token header file include statement in the table file needed to be removed since the token header file needs to access the table entry. The table header file include statement was removed from the token header file. This required the Type enumeration to be moved from the token class to the table header file. This enumeration was not renamed since it will soon be replaced with the new code type enumeration.

[branch table commit 68593c2dde]

Parser – Table Instance Member

From the last change, most uses of the table instance reference member of the parser class were removed, replaced with use of static table functions. The last two uses were for setting the code (plus type and data type) for constant tokens. For string constant tokens, the setting of the token code was moved to the token constructor for string constants.

The other use of the table instance was for number constants, also in the main function operator function of the parser, along with some statements that determined from the desired data type what the data type of the number constant should be and whether to set the Integer Constant sub-code. These statements and the setting of the code of the token were moved to the token constructors for integer and double constants.

For integer constant tokens, the data type is set to integer unless the desired data type is double. There is no need for the Integer Constant sub-code since the constant is already an integer. For double constant tokens, if the value is within the integer range, the Integer Constant sub-code is only set if the desired data type is neither integer nor double. If the desired data type is indeterminate (Number or Any) then the data type of the token is set to Double. For values outside the integer range, the data type is set to Double.
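The rules just described can be written as two small decision functions. This is a sketch only. `ConstantType` is hypothetical, and the integer range is assumed to be that of `int`. Two cases are not stated in the rules, and the sketch makes assumptions for them: a desired Integer for an in-range double is assumed to yield Integer, and non-whole values are not considered.

```cpp
#include <limits>

enum class DataType { Double, Integer, String, Number, Any };

struct ConstantType {
    DataType dataType;
    bool integerConstant;   // the Integer Constant sub-code
};

// integer constants: Integer unless Double is desired; no sub-code needed
ConstantType integerConstantType(DataType desired)
{
    return {desired == DataType::Double ? DataType::Double : DataType::Integer, false};
}

// double constants
ConstantType doubleConstantType(double value, DataType desired)
{
    bool inIntegerRange = value >= std::numeric_limits<int>::min()
        && value <= std::numeric_limits<int>::max();
    if (!inIntegerRange) {
        return {DataType::Double, false};
    }
    if (desired == DataType::Integer) {
        return {DataType::Integer, false};  // assumption: not stated in the text
    }
    if (desired == DataType::Double) {
        return {DataType::Double, false};
    }
    // indeterminate (Number or Any): Double, with the Integer Constant sub-code
    return {DataType::Double, true};
}
```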

The table instance reference member was removed since it was no longer being used. The remaining parser constructor was only initializing the input string stream member, so it was moved to and made in-line in the header file.

[branch table commit b89a1b663b]

Parser – Table Entry Pointers

Before the code enumeration (and token type enumeration) can be replaced with the new code type enumeration, uses of the code enumeration type need to be replaced with the use of table entry pointers. This will be done one part at a time, adding table entry class access functions as needed.

This process was started by changing the table find functions from returning a code enumerator to a table entry pointer (returning a null pointer to indicate no table entry was found). Only the parser routines were using the find functions, so the get identifier and get operator functions were updated to use table entry pointers instead of a code enumerator.
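A find function returning an entry pointer (or a null pointer) might look like the following minimal sketch. The entry layout, the array contents and the exact-match comparison are hypothetical. The real table search presumably differs, for example in case handling and in which entries are searched.

```cpp
#include <cstring>

struct TableEntry {            // hypothetical minimal entry
    const char *name;
};

// hypothetical static entry array
TableEntry tableEntries[] = {{"LET"}, {"PRINT"}, {"REM"}};

// returns the matching entry, or nullptr when no entry was found
// (in place of returning a code enumerator)
TableEntry *find(const char *word)
{
    for (TableEntry &entry : tableEntries) {
        if (std::strcmp(entry.name, word) == 0) {
            return &entry;
        }
    }
    return nullptr;
}
```

Callers then test `if (TableEntry *entry = find(word))` instead of comparing against a special code value.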

A few table entry access functions were added to support the parser changes including the code, is code, name, has flag and alternate functions. For now the code function simply returns an index value of the entry by subtracting the base of the table entries array. This function is temporary. The is code function checks if the table entry is for a particular code. For now it compares to the code function return value, but eventually will compare to the code member that will be added to the table entry. The alternate function is similar to the alternate code function but returns a table entry pointer.
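The temporary code function amounts to pointer subtraction against the start of the entry array. In this sketch the entry array and the `Code` enumerators are hypothetical, with the enumerators listed in the same order as the entries. The `isCode()` function compares against that index for now.

```cpp
// hypothetical codes, in the same order as the entry array
enum Code { Let_Code, Print_Code, Rem_Code };

class TableEntry {
public:
    // temporary: the code is just this entry's index in the array
    Code code() const;
    bool isCode(Code c) const { return code() == c; }
};

TableEntry tableEntries[3];  // hypothetical static entry array

inline Code TableEntry::code() const
{
    // subtracting the base of the array gives the entry's index
    return static_cast<Code>(this - tableEntries);
}
```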

The code arguments of the two token constructors for codes were changed to table entry pointers. For now they just access the code function of the table entry. The immediate goal is to change the interfaces and later to change the underlying code when the token code member is replaced with a table entry pointer. The token constructor taking a code was temporarily left (though the unneeded arguments were removed) for use by the translator routines.

The table entry class was made a friend class of the table class (specifically so the alternate member function can access the static alternate member of the table class). Eventually, the table entry and table classes will be combined into a single class.

[branch table commit 78f0b39780]

Saturday, January 3, 2015

Parser – Parentheses Token Handling

With the forthcoming change from the token type and code enumerations to the code type enumeration, it will be advantageous if all similar codes have the same code type. Table flags will be used for command, operator and function codes, since some will need their own code type (for example, the LET command and equal operator codes).

The codes being discussed are the codes with operands: constants, variables, arrays, defined functions and user functions. Each has a separate code for each of the three data types, and each (except constants) has an additional three codes, one per data type, for references. Currently only constants and variables are fully implemented. For variables, there will be a single Variable code type for all six of its codes. The translator will only need to check this single code type for a variable code.

There was a distinction between token types with parentheses and without (internal functions, defined functions, and generic tokens). For defined and internal functions, this distinction was removed (the token type enumerators were combined for each). For internal functions, instead of checking the token type to determine if there are parentheses, the number of operands is now checked. Eventually defined functions (and later user functions) will have a similar check as the number of operands will be stored in their associated dictionaries.

The parser get word helper function was modified to only check if an open parenthesis is present and add it to the identifier string as before, but not take it from the input stream. The parenthesis is still needed for functions when searching the table. The get identifier function already removed the parenthesis if a table entry was not found, but was modified to remove the parenthesis from the input for internal functions only. A new get parentheses access function was added to check if the next character in the stream is an open parenthesis, remove it if it is, and return whether it was.
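The get parentheses function is essentially a peek followed by a conditional get on the input stream. This minimal sketch assumes a parser holding a `std::istringstream`. The names `getParen()` and `input()` are hypothetical.

```cpp
#include <sstream>
#include <string>

// hypothetical minimal parser holding the input stream
class Parser {
public:
    explicit Parser(const std::string &input) : m_input{input} {}

    // if the next character is an open parenthesis, remove it from the
    // stream and return true; otherwise leave the stream untouched
    bool getParen()
    {
        if (m_input.peek() == '(') {
            m_input.get();
            return true;
        }
        return false;
    }

    std::istringstream &input() { return m_input; }

private:
    std::istringstream m_input;
};
```

The get word helper can use the same `peek()` to decide whether to append the parenthesis to the identifier without consuming it.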

The translator get operand function was modified accordingly. For the single internal function token type, the process internal function routine is only called if the function has operands. For the single defined function token type, the process parentheses token routine is only called if the next character in the parser is an open parenthesis (calling the new get parentheses function to remove it). For the generic parentheses token type (array or user function), the get parentheses function is called to remove the parenthesis.

The two separate table entries for defined functions with and without parentheses remain for now with their associated code enumerators and are used to determine if parentheses are present. Once defined functions are fully implemented, these two codes will be replaced with six codes (as described above) and the defined function dictionary will contain whether there should be parentheses (if there are operands).

The test token stream inserter and print token functions were modified for the change in token type enumerators. The token has parentheses access function was only used by the print token function, which used a static map member. The print token function was modified to not require this access function, so it and the static map member were removed. There was also a static map member for precedence. Since all tokens now have a code assigned, all precedences can be obtained from the table, so this static member and its access function were also removed. The expected parser results for tests #2, #4 and #5 were updated for the change in the token types.

[branch table commit 3dd768f604]

Friday, January 2, 2015

Token – Header File Refactoring

With the new table model, the token class will contain a pointer to the table entry of the code instead of the code enumerator (currently used as an index). The table header file currently includes the token header file for the many table functions with a token pointer argument. This is no longer necessary since only token pointer and token pointer references are used (forward declarations are sufficient).

When the token class contains a pointer to the table entry, optimal access to the table entry members will be achieved if the entry access functions are defined in-line. Similarly, optimal indirect access to the table entry members through the token access functions is desired. This will require the table header file to be included in the token header file.

Therefore, some header file refactoring was performed where the token header include statements were moved from all the other header files to the associated source files, with the exception of the table header file, which uses the token type enumeration. When this enumeration is replaced with the code type enumeration, this statement can be removed. Forward declarations were added as needed.
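Why a forward declaration is enough can be shown in one file, with comments marking which part would live in which header or source file. The `isOperator()` functions are hypothetical. A header that only names `Token` through a shared pointer (or a reference to one) does not need the full class definition. Only code that calls token members does.

```cpp
// --- table header (hypothetical excerpt) ---
#include <memory>

class Token;                                  // forward declaration suffices
using TokenPtr = std::shared_ptr<Token>;

class Table {
public:
    // only a reference to a token pointer appears here,
    // so the complete Token type is not needed yet
    static bool isOperator(const TokenPtr &token);
};

// --- token header (hypothetical excerpt) ---
class Token {
public:
    explicit Token(bool isOperator) : m_isOperator{isOperator} {}
    bool isOperator() const { return m_isOperator; }
private:
    bool m_isOperator;
};

// --- table source file: the complete type is needed only here ---
bool Table::isOperator(const TokenPtr &token)
{
    return token->isOperator();
}
```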

[branch table commit 0777959464]

Parser – Token Creation

A side effect of the last change was that tokens for operator, function and command codes and for codes with operands (constants, variables, arrays, defined functions and user functions) were all using the same token constructor. This token constructor searched through alternate codes for the code with the appropriate return data type.

This was unnecessary for operator, function and command codes. The table new token function called for these codes passed in the return data type from the table entry of the code. The token constructor then called the new table set token code function. Since the data type matched the return data type (which was just passed in), no alternates were checked and the code, type and data type of the token were set. This was unnecessary extra work.

A new token constructor was added for operator, function and command codes, which only required arguments for the code, column, length and string. The string argument is only used for the REM and REM operator codes. This constructor replaces the table new token function. This constructor calls the table set code function which just sets the code, type and data type of the token from the table entry of the code. For consistency the code argument was put first in the other token constructor for codes with operands.

While looking at the creation of tokens, I decided that using the standard unique pointer within the parser was unnecessary. The parser can just allocate a token and return its pointer. The translator can then put the allocated tokens into a standard shared pointer. The parser was changed to use plain token pointers. The translator routines were changed to use the new token constructor directly via the standard make shared function. The translator get operand function was changed to use the reset function to set the token member since a plain pointer cannot be assigned directly to a shared pointer.
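This ownership hand-off can be sketched with hypothetical `parserToken()` and `Translator` stand-ins. The parser returns a plain pointer from `new`. The translator adopts it with `reset()`, because `std::shared_ptr` has no assignment operator taking a raw pointer. Tokens the translator creates itself come from `std::make_shared`.

```cpp
#include <memory>

struct Token {                     // hypothetical minimal token
    explicit Token(int column) : column{column} {}
    int column;
};

// parser side: allocate and return a plain pointer (caller takes ownership)
Token *parserToken(int column) { return new Token{column}; }

class Translator {
public:
    void getOperand(int column)
    {
        // m_token = parserToken(column);   // would not compile
        m_token.reset(parserToken(column)); // shared pointer adopts the token
    }

    // tokens created in the translator itself
    void makeToken(int column) { m_token = std::make_shared<Token>(column); }

    const std::shared_ptr<Token> &token() const { return m_token; }

private:
    std::shared_ptr<Token> m_token;
};
```

The trade-off of this design is that ownership is implicit between the `new` in the parser and the `reset()` in the translator, so nothing in between should be able to throw or return early.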

[branch table commit 1bfb76ae0a]

Parser – Codes With Operands

The last tokens not being fully set in the parser were those for codes with operands (constants, variables, arrays, defined functions and user functions). Constant tokens were corrected with the last change. Arrays, defined functions and user functions are not fully implemented and so did not need to be changed. Variables, however, were only partially set in the parser (only to the base Variable or Variable Reference code) and weren't set for the data type of the variable until the translator.

The parser get identifier function was modified to set the data type to Double if the word obtained from the input does not have a type. This applies to all identifiers not found in the table. The token constructor for codes is used for commands, operators, functions and codes with operands. The type argument was unnecessary since that is set from the table entry. However, an issue was found with how codes were found in the table.

For operators and functions, the [return] data type of the token is set from the table entry. (This issue doesn't affect commands since commands don't have a return data type.) For codes with operands, the data type of the identifier is used to find the appropriate table entry (for example, Variable, Variable Integer, or Variable String) by looking at the data types of alternate codes. The current table set token code function did not work correctly because it searches alternate codes by operand data type. For this instance, the alternate codes need to be searched by return data type.

A new set token code function was added without an operand index argument to search by return data type. If the data type (of the identifier) does not match that of the code passed, then the alternate codes are searched for a matching return data type. If there are no alternates or none match, then the code passed is set in the token along with the token type of the code. The data type is set to the data type of the identifier and not from the table entry (which may not match for codes like arrays that are not fully implemented yet).
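The search by return data type can be sketched as a function that returns the chosen entry rather than setting the token. The entry layout (a name, a return data type and a list of alternate entries) is a hypothetical simplification of the real table.

```cpp
#include <vector>

enum class DataType { Double, Integer, String };

// hypothetical minimal table entry with its alternate entries
struct TableEntry {
    const char *name;
    DataType returnDataType;
    std::vector<const TableEntry *> alternates;
};

// pick the entry whose return data type matches the identifier's data type,
// falling back to the entry passed when there is no matching alternate
const TableEntry *findByReturnDataType(const TableEntry *entry, DataType dataType)
{
    if (entry->returnDataType != dataType) {
        for (const TableEntry *alternate : entry->alternates) {
            if (alternate->returnDataType == dataType) {
                return alternate;
            }
        }
    }
    return entry;
}
```

The caller would then set the token's code and type from the returned entry, but take the data type from the identifier, as described above.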

The type argument was removed from the token constructor for codes. The type from the table entry of the code was passed (and the new set token code now does this). A call to the new set token code was added to the body of the constructor (previously empty).

Since codes for constants, variables and variable references were found in the table incorrectly by operand data type, these table entries contained operand data types so that the search would work. These codes do not have operands (in the sense that operators and functions do within expressions; not to be confused with the program, where these codes do have an operand index). These table entries were corrected with expression info instances containing no operands.

The translator get operand previously set the default data type of the token just obtained (set to Double if None and not a function). This was removed since the parser now does this. The token set default data type function called to do this was removed. The call to set the code for a no parentheses (variable) token was also no longer needed. With the parser now setting the default data type to Double, the expected results to the parser tests (#2, #3 and #5) needed to be updated.

[branch table commit acc37f0650]