Interactive BASIC Compiler Project: 2014

Wednesday, December 31, 2014

Constant Token Codes

The code and token type enumerations will be combined into a single code type enumeration. Before proceeding, the parser needs to return all tokens with their codes assigned. This has mostly been accomplished; the one exception was constant tokens, which were still being assigned codes in the translator. This is complicated because the type of a numerical constant may not be known when the constant is parsed. Consider these statements:

A = B + 5
A% = B% + 5
A% = B% + 5.4

For numerical constants, both the integer and double representations of the constant are stored in the constant dictionary, except when a double constant does not fit into a 32-bit signed integer. Ideally, the required representation is used directly, without a hidden conversion code that unnecessarily converts the constant. For the first statement above, the double value of the constant is used. For the other two statements, the integer value of the constant is used. Number constant tokens have three states:

An integer (no decimal point or exponent; fits into 32 bits)
A small double (has a decimal point or an exponent; fits into 32 bits when converted)
A large double (does not fit into 32 bits; cannot be converted)

The token was set to the integer data type for an integer and small double, and the double data type for a large double. For small doubles, the Double sub-code was set. This required many [somewhat complicated] checks in the translator. To simplify these checks, the data type is now set to integer for integers only and double for all doubles. For small doubles, a new Integer Constant sub-code is set (which does not survive past the translator and therefore does not use one of the available sub-code bits).

Instead of passing the parser a flag indicating whether a number is allowed, the requested data type is now passed. If the data type is integer or double, the code of a constant token is fixed (the Integer Constant sub-code is not needed and is cleared if set). For other requested data types, either the Constant or Constant Integer code is set as described above, with the Integer Constant sub-code set for small doubles. The parser also now sets the Constant String code for string constants. The parser makes no attempt to report any errors for data type mismatches.
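The three constant states and the requested-data-type rule can be sketched as below. This is a minimal illustration, not the project's parser: the `NumberToken` struct, the `parseNumber` and `constantCode` functions, and the enumerator names are hypothetical, and truncation as the double-to-integer conversion is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical enumerators; the project's real names and values may differ.
enum class DataType { Double, Integer, String, Number, Any };
enum class Code { Constant, ConstInteger };

struct NumberToken {
    DataType dataType;
    bool integerConstantSubCode;  // set only for a small double
    double doubleValue;
    int32_t intValue;             // meaningful only when the value fits in 32 bits
};

// Classify numeric text into the three states: integer, small double,
// or large double.
NumberToken parseNumber(const std::string &text) {
    NumberToken token{};
    token.doubleValue = std::stod(text);
    bool isDouble = text.find_first_of(".eE") != std::string::npos;
    bool fits = token.doubleValue >= std::numeric_limits<int32_t>::min()
        && token.doubleValue <= std::numeric_limits<int32_t>::max();
    if (fits) {
        token.intValue = static_cast<int32_t>(token.doubleValue);  // truncation assumed
    }
    token.dataType = (isDouble || !fits) ? DataType::Double : DataType::Integer;
    token.integerConstantSubCode = isDouble && fits;
    return token;
}

// Pick the constant code from the data type requested by the caller.
// Mismatches (e.g. a large double requested as an integer) are not reported.
Code constantCode(NumberToken &token, DataType requested) {
    switch (requested) {
    case DataType::Integer:
        token.integerConstantSubCode = false;  // code is fixed, sub-code not needed
        return Code::ConstInteger;
    case DataType::Double:
        token.integerConstantSubCode = false;
        return Code::Constant;
    default:
        // Code follows the token's own data type; a small double keeps its sub-code.
        return token.dataType == DataType::Integer ? Code::ConstInteger : Code::Constant;
    }
}
```

For `A% = B% + 5.4`, the constant parses as a small double; with an Integer request its code is fixed to the integer constant and the sub-code is cleared.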

The decimal flag argument of the token constructor for double constants was removed, since the data type is set to Double and the Integer Constant sub-code is set if the value is within the integer range. The convert code was cleaned up by making the desired data type the primary switch; secondary switches on the token data type were no longer needed, since only one of two data types needs to be checked for each desired data type. A convert constant helper function was added to handle changing constant token codes.
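A minimal sketch of a convert routine switched on the desired data type, with a constant helper that changes the constant's code instead of adding a hidden conversion. The `Token` layout, the `convertCode` and `convertConstant` names, and the `Invalid` result are hypothetical stand-ins, not the project's actual code.

```cpp
#include <cassert>

// Hypothetical enumerators and token layout for illustration only.
enum class DataType { Double, Integer, String, Number, Any };
enum class Code { Null, CvtInt, CvtDbl, Invalid, Constant, ConstInteger };

struct Token {
    bool isConstant;
    DataType dataType;
    bool integerConstantSubCode;
    Code code;
};

// Helper: change a constant token's code in place instead of adding a
// hidden conversion. Returns false if the constant cannot be converted.
bool convertConstant(Token &token, DataType desired) {
    if (desired == DataType::Integer && token.integerConstantSubCode) {
        token.code = Code::ConstInteger;  // small double used as an integer
    } else if (desired == DataType::Double && token.dataType == DataType::Integer) {
        token.code = Code::Constant;      // integer constant used as a double
    } else {
        return false;                     // e.g. a large double wanted as an integer
    }
    token.dataType = desired;
    token.integerConstantSubCode = false;
    return true;
}

// The desired data type is the primary switch; each case only checks the
// token data types that matter to it.
Code convertCode(Token &token, DataType desired) {
    switch (desired) {
    case DataType::Double:
        if (token.dataType == DataType::Double) return Code::Null;
        if (token.dataType != DataType::Integer) return Code::Invalid;
        if (token.isConstant && convertConstant(token, desired)) return Code::Null;
        return Code::CvtDbl;
    case DataType::Integer:
        if (token.dataType == DataType::Integer) return Code::Null;
        if (token.dataType != DataType::Double) return Code::Invalid;
        if (!token.isConstant) return Code::CvtInt;
        return convertConstant(token, desired) ? Code::Null : Code::Invalid;
    case DataType::String:
        return token.dataType == DataType::String ? Code::Null : Code::Invalid;
    case DataType::Number:
        return token.dataType == DataType::String ? Code::Invalid : Code::Null;
    default:  // Any
        return Code::Null;
    }
}
```

Treating a large double constant requested as an integer as an error is an assumption of this sketch, based on it being described as unconvertible.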

The table find code function was simplified due to the change in how constants are represented. The first argument of the set token and set token code functions was changed from a standard shared token pointer reference to a plain token pointer so that they can be called from the parser (with a standard unique pointer), the translator (with a standard shared pointer), or a token member function (with just a pointer). This simply required calling the get access function of the unique or shared pointers.
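The reason a plain pointer parameter works for all three callers can be shown with a minimal sketch. The `Token` struct and `setTokenCode` function here are hypothetical simplifications; `get()` is the standard access function on both `std::unique_ptr` and `std::shared_ptr`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical minimal token for illustration.
struct Token {
    int code = 0;
    void setOwnCode(int newCode);
};

// A plain pointer parameter works for every kind of owner.
void setTokenCode(Token *token, int code) {
    token->code = code;
}

// A member function can simply pass `this`.
void Token::setOwnCode(int newCode) { setTokenCode(this, newCode); }
```

The parser would call `setTokenCode(uniqueToken.get(), code)` and the translator `setTokenCode(sharedToken.get(), code)`; ownership stays with the caller, since the function neither stores nor deletes the pointer.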

The translator get operand function no longer sets the code for constants. The get expression and process internal function functions no longer need to look for and set the codes for constants (the latter needs to clear the Integer Constant sub-code for functions taking both number argument types, specifically ABS, SGN and STR$). The get token function now only needs to pass the data type to the parser. The token convert and table find code functions are used by the translator and will finalize constants not set by the parser once the final data type is known.

[branch table commit 3099f8850f]

Saturday, December 27, 2014

Table – More Alternate Codes

Since it is desirable to reduce the number of unique code type enumerators needed, several uses of the current code enumerators were eliminated. Most of these were eliminated by assigning the needed codes as alternate codes of the commands that need them in the Alternate Info initializer list and obtaining the codes using the alternate map:

Assign → Let (alternate 0)
Input Begin → Input (alternate 0)
Input Begin String → Input Prompt (alternate 0)
Input Assign → Input (alternate 1)
Input Assign → Input Prompt (alternate 1)
Print Double → Print (alternate 0)
Print → Semicolon (alternate 0)
Variable Reference → Variable (alternate 1)

Since the command code is available in the translate functions needing them (LET, INPUT, or PRINT), it can be used to find the alternate code needed. The Print code is assigned as an alternate to the Semicolon code for recreation (see below). For a PRINT statement ending with a semicolon, no PRINT command code is stored in the program code, so the semicolon recreate also needs to recreate the PRINT keyword.

The Input code enumerator was used in the common INPUT translate routine to determine whether the routine was called for the INPUT or INPUT PROMPT command. This test was changed to check whether the second name of the command token's code is empty, which it is for INPUT but not for INPUT PROMPT.

The Constant String code enumerator was used in the token equality operator function to determine whether token string comparison should be case sensitive (for REM, REM operator and string constants) or case insensitive (variables, arrays, etc.). This test was changed to check whether the token type is a constant and the data type is a string.
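A minimal sketch of the revised equality test: string constants compare case sensitively, everything else case insensitively. The `Token` layout and enumerators are hypothetical, and the REM cases are left out to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical token layout for illustration.
enum class TokenType { Command, Operator, Constant, NoParen, Paren };
enum class DataType { Double, Integer, String };

struct Token {
    TokenType type;
    DataType dataType;
    std::string string;
};

static bool equalNoCase(const std::string &a, const std::string &b) {
    return a.size() == b.size()
        && std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               return std::toupper(static_cast<unsigned char>(x))
                   == std::toupper(static_cast<unsigned char>(y));
           });
}

bool operator==(const Token &lhs, const Token &rhs) {
    if (lhs.type != rhs.type || lhs.dataType != rhs.dataType) return false;
    // Test on token type and data type instead of a Constant String code.
    if (lhs.type == TokenType::Constant && lhs.dataType == DataType::String) {
        return lhs.string == rhs.string;  // case sensitive
    }
    return equalNoCase(lhs.string, rhs.string);  // identifiers ignore case
}
```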

The Print code enumerator was used in the print recreate function instead of the name from the table entry. This was done because the print semicolon recreate function called this function after appending a semicolon; if the table entry name were used, a semicolon would have been output for the command instead of PRINT. The function was changed to create a temporary RPN item holding a temporary token with the Print code, obtained as the alternate code of the Semicolon code rather than from the code enumerator. This set of statements is convoluted, but will be far simpler once the new table model is implemented.

Unrelated to removing the use of code enumerators, it was noticed that the print comma recreate and print semicolon recreate functions were using the strings for a comma and semicolon directly but technically should have been using the name from the table entry. Both of these were changed to use the table entry name.

[branch table commit b6850c7c7f]

Table – New Code Type Enumeration

With the current table model, each entry contains a token type that identifies the type of token that is created for the table entry. Each entry has a code enumerator, which is simply used as an index to the entry. The code (index) is put into the program code. The code enumeration was originally automatically generated from comments next to the table entries. This avoided mismatches between the code enumerators and the table entries (though this was a poor design choice). Only some entries were identified using these code enumerators.

For the new table model, each table entry will be a unique class with a unique instance. Each table entry instance will be assigned a unique index by the base table entry constructor. Some table entries will still need to be found by some means by the parser and translator. This could be done directly by referencing the table entry instance, but this would require exposing the derived table entry classes. The plan is to expose only the base table entry class definition.

To find this small number of table entries, an enumeration will still be used. This enumeration will be similar to the token type enumeration currently given to each table entry. Therefore a single Code Type enumeration will be defined and will also replace the token type enumeration. Unlike the current Code enumeration, table entries could have the same Code Type enumerator (for example, all six variable codes will be assigned the Variable code type). Only the first table entry assigned to a code type will be returned for a code type enumerator; the others will be assigned as alternate codes of the first.
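The planned lookup rule (first entry with a code type is returned, later ones become its alternates) could look like the following minimal sketch. `CodeTypeIndex`, the `TableEntry` members, and the enumerators are hypothetical; the real plan keeps this information in the table class itself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical code types; most table entries would have none.
enum class CodeType { None, Variable, Let, Equal };

struct TableEntry {
    std::string debugName;
    CodeType codeType;
};

class CodeTypeIndex {
public:
    void add(TableEntry *entry) {
        if (entry->codeType == CodeType::None) return;
        auto result = m_first.emplace(entry->codeType, entry);
        if (!result.second) {
            // Code type already taken: make this entry an alternate of the first.
            m_alternates[result.first->second].push_back(entry);
        }
    }
    // Only the first entry assigned to a code type is returned.
    TableEntry *find(CodeType codeType) const {
        auto it = m_first.find(codeType);
        return it == m_first.end() ? nullptr : it->second;
    }
    std::size_t alternateCount(TableEntry *primary) const {
        auto it = m_alternates.find(primary);
        return it == m_alternates.end() ? 0 : it->second.size();
    }
private:
    std::map<CodeType, TableEntry *> m_first;
    std::map<TableEntry *, std::vector<TableEntry *>> m_alternates;
};
```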

Code types will generally not be assigned to table entries for commands, operators and functions (which will no longer be referred to as internal functions) except for a few cases (for example, the LET command and the equal operator). New table flags will be assigned instead. There is already a Command table flag, and there will be Operator and Function table flags. Each of the codes with operands (variables, arrays, constants, defined functions, user functions, and subroutines) will have a code type. There will not be separate enumerators for tokens with and without parentheses (more on this later).

Friday, December 26, 2014

Sub-Code Enumeration Refactoring

When the Double sub-code enumerator value was changed to support double identifiers with the double data type character (#), it was noticed that the Parentheses and Colon sub-codes would never be used with the same codes. The Parentheses sub-code will only be set for operators and operands (variables, constants, etc.), and the Colon sub-code will only be set for commands (with some exceptions).

The available sub-code bits are limited: there are only six. Several sub-codes were already combined with the Option sub-code ('LET' for assignment codes, 'Question' for the Input Begin String code, and 'Keep' for the INPUT and INPUT PROMPT command codes). The same was done for the Parentheses and Colon sub-codes, but instead of using a single sub-code enumerator for both (creating an appropriate name was impossible), the two existing enumerators were simply given the same value.

There was one problem with this scheme. Two codes that could have the Colon sub-code are the Comma and Semicolon codes, which are only present in a PRINT statement and will be commands at run-time. These codes are defined as operators. The assignment codes are defined with no type, but are commands and could have the Colon sub-code. To indicate that these are commands, a new Command table flag was added and their table entries were given this flag.

Therefore, the shared bit is interpreted as the Colon sub-code if the code is a command type or has the Command table flag; otherwise it is interpreted as the Parentheses sub-code. Checks for the Command table flag were added to the recreator function operator, the stream insert operator for program words, and the test stream insert operator for tokens. A few other minor sub-code related changes were also made.
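A minimal sketch of one bit serving as two sub-codes, disambiguated by the table entry. The bit values, the `CommandFlag` value, and the helper names are hypothetical; only the sharing and the command test come from the description above.

```cpp
#include <cassert>

// Hypothetical bit values; only the sharing of one bit matters here.
namespace SubCode {
constexpr unsigned Paren = 0x01;
constexpr unsigned Colon = Paren;  // same bit: never needed on the same codes
}

enum class TokenType { Command, Operator, NoType };
constexpr unsigned CommandFlag = 0x01;  // hypothetical table flag value

struct TableEntry {
    TokenType type;
    unsigned flags;
};

// A command-type entry, or one with the Command flag (Comma, Semicolon,
// assignments), interprets the shared bit as the Colon sub-code.
bool isCommandLike(const TableEntry &entry) {
    return entry.type == TokenType::Command || (entry.flags & CommandFlag) != 0;
}

bool hasColonSubCode(const TableEntry &entry, unsigned subCode) {
    return (subCode & SubCode::Colon) != 0 && isCommandLike(entry);
}

bool hasParenSubCode(const TableEntry &entry, unsigned subCode) {
    return (subCode & SubCode::Paren) != 0 && !isCommandLike(entry);
}
```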


Thursday, December 25, 2014

Double Identifier Problem

An existing problem was discovered when the parser was modified to not store the data type character of identifiers. The issue was with double identifiers when using the optional # data type character. The identifiers Variable and Variable# were incorrectly added to the dictionary as separate entries when they should have been the same entry.

The parser get identifier function was modified to not store the data type character in the token. This caused a problem when recreating double identifiers, because an entered # character would disappear. Recreating all double identifiers with a # character was also not desirable. This was corrected by adding the Double sub-code to the token. A sub-code argument was added to the token constructor for identifiers. This sub-code is encoded into the program code so that the # character is recreated only when it was entered.

The Double sub-code was only being used for constants. When the value of a constant is within the integer range, its data type is set to integer, and if a decimal point is present, the Double sub-code is set. The translator uses this sub-code to determine if a constant can be used as a double even though the data type is integer (see post from October 28 for details). This sub-code does not survive past the translator (not put into the program code).

A new string with data type access function was added to the token to add the data type character (# for doubles, % for integers, and $ for strings) to the token string returned. A # character is only added if the Double sub-code is set. This function replaced the string access function in the test token stream insert operator, tester print token, and several recreate functions.

A sub-code argument was added to the table entry operand text functions. The variable operand text functions were modified to add the data type character to the variable name. For double variables, the character is only added if the Double sub-code is set. An Ignore sub-code enumerator was added, and when passed to the operand text function, no data type character is added to the variable name. This option was needed for the program model decode function that uses the operand text function to set the string of the token (since tokens no longer store the data type character).
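The rules for adding the data type character can be condensed into one function. This is a minimal sketch: `withDataType` is a hypothetical free function standing in for the token and operand text functions, and the sub-code bit values are made up.

```cpp
#include <cassert>
#include <string>

enum class DataType { Double, Integer, String };

// Hypothetical sub-code bit values.
namespace SubCode {
constexpr unsigned None = 0;
constexpr unsigned Double = 0x01;  // '#' was entered
constexpr unsigned Ignore = 0x80;  // suppress the data type character
}

// Append the data type character to a name stored without it.
std::string withDataType(const std::string &name, DataType type, unsigned subCode) {
    if (subCode & SubCode::Ignore) return name;  // e.g. when decoding into a token
    switch (type) {
    case DataType::Integer:
        return name + '%';
    case DataType::String:
        return name + '$';
    case DataType::Double:
        // '#' is optional, so only recreate it if it was entered.
        return (subCode & SubCode::Double) ? name + '#' : name;
    }
    return name;
}
```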

The value of the Double sub-code was changed so that its bit value was within the range of the sub-code bits (not necessary before since this sub-code was not used in the program code). The return type of program code instruction sub-code access function was changed to the Sub-Code enumeration type (from an integer). The expected encoder test results were updated, specifically the dictionaries output since the data type characters are no longer present in the entries.

[branch table commit e97057efca]

Parser – Identifier Codes

The parser previously set the code for an identifier token only when the word was found in the table (command, operator or function). The codes for other identifiers were set in the translator: defined functions with no parentheses and variables (get operand); arrays, functions, and defined functions with parentheses (process parentheses tokens). This was changed to set all codes in the parser.

To do this in the parser, the parser needed to know if a reference operand was being requested. For now, identifiers without parentheses are set to variables, and those with parentheses are set to arrays unless they start with an F (a temporary check for testing). Defined functions are identifiers that start with FN. Eventually the parser will need access to the program dictionaries to fully determine which code to assign to an identifier token.

The get identifier function was modified to set the code as described above for identifiers not found in the table. A reference argument was added, which was also added to the parser function operator. (The Reference enumeration was moved from the translator class to the main header file so that its enumerators are accessible.) The token constructors for codes and identifiers were combined into a single constructor with default arguments for the string and reference members.

For variables, the reference argument is used to determine whether the code is a variable or a variable reference. Only the base code is set, since the translator still changes the code for the data type. In the case of a variable reference, the reference member of the token is not set (the translator did not previously set it either).
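The temporary identifier rules can be sketched as a single function. The `Code` enumerators and `identifierCode` are hypothetical, and a plain bool stands in for the project's Reference enumeration.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical code enumerators for illustration.
enum class Code { Variable, VariableRef, Array, Function, DefFuncNoParen, DefFuncParen };

static char upper(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Temporary rules from the text: FN prefix = defined function; otherwise a
// parenthesized identifier is an array unless it starts with F (function).
// Without parentheses, the reference argument picks variable vs. reference.
Code identifierCode(const std::string &name, bool hasParen, bool reference) {
    bool startsWithF = !name.empty() && upper(name[0]) == 'F';
    if (startsWithF && name.size() > 1 && upper(name[1]) == 'N') {
        return hasParen ? Code::DefFuncParen : Code::DefFuncNoParen;
    }
    if (hasParen) {
        return startsWithF ? Code::Function : Code::Array;
    }
    return reference ? Code::VariableRef : Code::Variable;
}
```

Only base codes are returned; as described above, the data type specific variable code is still chosen later.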

Several token type cases in the translator get operand function were modified. For defined functions with no parentheses, the token reference and code members no longer need to be set. For no parentheses tokens (variables), the code is still updated for the data type. The parser will do this once the new table model is implemented. For parentheses tokens (arrays), the token reference member no longer needs to be set.

The translator process parentheses token function no longer does the check for functions (temporarily identifiers starting with F), or set the code of the token. For determining an array (to set the expected expression types to integer for the subscripts), the Array code is checked for. This check will need to be modified when arrays are implemented since there will be different array codes for each data type, which will be set by the parser.

[branch table commit 69dff18e26]

Sunday, December 21, 2014

Table – Entry Pointers

The new table implementation will have a single Table class that will represent a table entry for a single code and will serve as the base class to all of the derived table entry classes. Global table information (name to entry map, alternate code map, etc.) will be static members of this class. This is equivalent to the current single table instance.
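A minimal sketch of such a base class, where global information lives in static members and the constructor assigns each instance a unique index. The member names, the `find` and `entry` functions, and the two derived classes are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Base class for every table entry; global table information lives in
// static members instead of a single table instance.
class Table {
public:
    explicit Table(std::string name)
        : m_name(std::move(name)), m_index(s_entries.size()) {
        s_entries.push_back(this);           // unique index = registration order
        s_nameToEntry.emplace(m_name, this);
    }
    Table(const Table &) = delete;           // entries are unique instances
    Table &operator=(const Table &) = delete;
    virtual ~Table() = default;

    std::size_t index() const { return m_index; }
    const std::string &name() const { return m_name; }

    static Table *find(const std::string &name) {
        auto it = s_nameToEntry.find(name);
        return it == s_nameToEntry.end() ? nullptr : it->second;
    }
    static Table *entry(std::size_t index) { return s_entries.at(index); }

private:
    std::string m_name;
    std::size_t m_index;
    static std::vector<Table *> s_entries;
    static std::unordered_map<std::string, Table *> s_nameToEntry;
};

std::vector<Table *> Table::s_entries;
std::unordered_map<std::string, Table *> Table::s_nameToEntry;

// Hypothetical derived entries; only the base class would be exposed.
class LetEntry : public Table { public: LetEntry() : Table("LET") {} };
class PrintEntry : public Table { public: PrintEntry() : Table("PRINT") {} };
```

The index returned by `index()` plays the role of the value put into the program code, and `entry()` maps it back to the instance.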

Currently the Table class and Table Entry structure are separate definitions, but eventually the table entry members will be members of the new base table class. The current table class has many access functions where their first argument is a code enumerator (which is currently used as an index). There are also many access functions that have a token pointer as their first argument, but this is mostly used to get the code from the token.

The new table model will access the table entries by a pointer to the entry instead of by a code enumerator used as an index. The code argument access functions will become simply access functions to the table entry. The next step in this transition to the new table model will be to use pointers to table entries instead of code enumerators. The token code member will become a table entry pointer.

To start this transition, the Table Entry structure was moved from the table source file to the table header file. The members were also renamed with the member (m_) prefix except for the function pointer members (which will be replaced with virtual functions in the new table model).

[branch table commit 94decad26c]

Table – Internal Code Token Types

There were several internal code table entries (null, assignment, print item, input assign and input parse) that were assigned to either an Operator or Internal Function token type. These internal codes do not require a token type because they are not produced by the parser (the token type is only used for tokens from the parser). These table entries were changed to the default token type (in other words, no type).

The reason for this change will become more evident when the new table class hierarchy is implemented. One of its goals is to reduce the number of unnecessary initialization values. It may even turn out that the token type member of all table entries will be unnecessary, but this is not clear yet.

These changes did cause a minor issue in test output. By default the test output stream insert operator for a token outputs nothing for a token without a type, which caused the above modified codes not to produce their debug name. This function was modified to output the debug name in the default case instead of doing nothing. Since the internal function types also only output the debug name, these cases were removed to let the default case handle these types.

[branch table commit 232a97f6d6]

Table – New Token Consolidation

The next major change to the table class will be to start transitioning to using table entry pointers instead of code enumerators and to remove the use of code enumerators as indexes. The code and index values will be separate members of table entries. All table entries will have an index (which is put into the program code), but only a few table entries will have a code (only those that require specific lookup, like some of the special symbols including comma, parentheses, colon, etc.). Before proceeding with this, a small simplification was made first.

There were two new token functions in the table class, one taking a single code argument and the other taking column, length and code arguments. The single argument version relied on the default token constructor. This was one of three uses of the default token constructor. The default token constructor contained optional column and length arguments (defaulting to -1, indicating unset), but there were no callers of the default token constructor that used these arguments.

The single code argument new token function was removed along with the default token constructor. The code argument was made the first argument of the other version of the new token function with default -1 values provided for the optional column and length arguments. This second version does not use the default token constructor. All callers of this function were in the parser and were modified for the reordering of the arguments.
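The consolidated signature, with the code first and optional column and length defaulting to -1, can be sketched as follows. This `newToken` is a hypothetical free function returning a `std::unique_ptr`; in the project it is a member of the table class and its return type may differ.

```cpp
#include <cassert>
#include <memory>

enum class Code { Null, Comma, Semicolon };  // hypothetical subset

struct Token {
    Token(Code code, int column, int length)
        : code(code), column(column), length(length) {}
    Code code;
    int column;  // -1 means unset
    int length;  // -1 means unset
};

// Code first; column and length are optional, so no default token
// constructor is needed for callers that only have a code.
std::unique_ptr<Token> newToken(Code code, int column = -1, int length = -1) {
    return std::unique_ptr<Token>(new Token(code, column, length));
}
```

A caller with no source position, such as the decode function or the INPUT translate function needing an input assign token, would write `newToken(code)`.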

The second use of the default token constructor was by the decode function in the program model class, which created a default token and then set its code with the token set code access function. This function was changed to use the new token function.

The third use of the default token constructor was in the INPUT translate function where a new token is needed for an input assign code and another token (comma or semicolon) is not available for reuse. This was changed to use the new token function with a Null code.

[branch table commit b5dd96c272]

Saturday, December 20, 2014

Table – Operand Arrays

There was a macro used for generating two arguments to the Expression Info constructor, which took an argument identifying the array less its suffix. This macro was removed (the last such macro) and the predefined operand data type arrays were replaced with standard initializer lists. With an initializer list, the size of the list is available.

The operand count and operand data type array pointer arguments of the Expression Info constructor were replaced with a standard initializer list of data types (with a default of a blank list). The operand count member is initialized to the size of the operands list. A standard initializer list is implemented as an array internally. The begin access function is used to access the beginning of the initializer list to initialize the operand data type array. The arguments of the expression info instances were modified to the initializer lists.

Another set of related changes were also made. A null expression info instance was added (with no return value or operands). The table entries that previously had their expression info pointer member initialized to a null pointer were changed to point to this null expression info instance. This allowed the removal of the check for a null expression info pointer member in several of the access functions and from the add function.
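A minimal sketch of an Expression Info constructor taking an initializer list of operand data types, plus a shared null instance so access functions need no null check. Member names are hypothetical; this sketch copies the list into a `std::vector` rather than keeping a pointer into the list's underlying array, which sidesteps lifetime questions.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

enum class DataType { None, Double, Integer, String };

struct ExpressionInfo {
    explicit ExpressionInfo(DataType returnDataType = DataType::None,
                            std::initializer_list<DataType> operands = {})
        : m_returnDataType(returnDataType),
          m_operandCount(static_cast<int>(operands.size())),  // count comes from the list
          m_operandDataType(operands) {}
    DataType m_returnDataType;
    int m_operandCount;
    std::vector<DataType> m_operandDataType;
};

// Shared "null" instance: no return value and no operands.
const ExpressionInfo NullExprInfo;

struct TableEntry {
    const ExpressionInfo *exprInfo;  // never a null pointer
};

// No null check needed: every entry points at a real ExpressionInfo.
int operandCount(const TableEntry &entry) { return entry.exprInfo->m_operandCount; }
```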

[branch table commit fe5801c227]

Table – Associated Code Removal

Now that use of the associated code variables and access functions has been replaced with the alternate code map, the associated code members could be removed from the Expression Info structure and their access functions removed from the table class. The predefined associated code arrays and the associated code macros were also removed.

Table entries that created their own expression info instance using one of the associated code macros were replaced with the appropriate pointer to a predefined expression info instance. A few additional predefined expression info instances were needed.

[branch table commit 18cb29c185]

Friday, December 19, 2014

Table – Expected Data Type

The expected data type table entry member was recently moved from the Expression Info structure (because codes using the same return and operand data types could have different expected data types). There were several issues with the expected data types initialization implementation (which were initialized automatically to prevent programming mistakes):

Every table entry had an expected data type even if it was never used (for example, commands). Even when it was in the Expression Info structure, it was not used for many codes (no argument functions, assignment codes, etc.).
A separate iteration loop was needed to initialize the expected data types because the alternate code information was used, and so the alternate code map needed to be initialized first.
The expected data type initialization took into account whether a code could have all three data types (Double, Integer, and String), in which case the expected data type would be set to Any, even though there were actually no such codes.

This implementation was replaced with a new table entry pointer to expected data type static map member. Only table entries requiring an expected data type are added to this map. This includes all primary codes and any alternate primary code. An alternate primary code is the primary code for the second operand, for example, a binary operator with two integer operands (where the primary has two double operands). Entries are added to the expected data type map in the add function when:

A new entry is added to the name to entry map (a new primary code).
An entry is replaced in the name to entry map (a new primary when the operand count is less than the current primary; and in this case the old entry is removed for an internal function).
A new secondary primary is added to the alternate code map (a binary operator to a unary operator).
A replacement alternate primary is found (one that has the same operands, see last post; the old alternate primary entry is removed).
A new alternate primary is found (the alternate is added for the first operand of a binary operator, otherwise the current entry of the primary is modified).

An add expected data type private support function, taking table entry pointer and data type arguments, adds or modifies an entry in the expected data type map. If there is currently no entry, a new entry is added. If there is an entry and its data type is Double or Integer, the entry is changed to Number (the new data type will be either Integer or Double). Otherwise, the entry is left unchanged.
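The add-or-widen rule is short enough to sketch directly. The map and function names are hypothetical, and the empty `TableEntry` only provides distinct addresses to use as keys.

```cpp
#include <cassert>
#include <unordered_map>

enum class DataType { Double, Integer, String, Number, Any };
struct TableEntry {};  // hypothetical stand-in; only its address is used

std::unordered_map<TableEntry *, DataType> s_expectedDataType;

void addExpectedDataType(TableEntry *entry, DataType dataType) {
    auto iterator = s_expectedDataType.find(entry);
    if (iterator == s_expectedDataType.end()) {
        s_expectedDataType.emplace(entry, dataType);  // first data type seen
    } else if (iterator->second == DataType::Double
               || iterator->second == DataType::Integer) {
        // A second numeric type means either number type is acceptable.
        iterator->second = DataType::Number;
    }
    // Otherwise (String or Number) the entry is left unchanged.
}
```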

The expected data type access function was modified to use the new map. The expected data type member was removed from the Table Entry structure, and its initializer values were removed from the table entries. The separate iteration loop in the table constructor to initialize the expected data types was removed.

A problem was found in the set token code function (used to set the code in a token, possibly an alternate code, depending on its data type) where it could incorrectly add a new blank element to the alternate code map. This did not appear to cause a problem, but was corrected by checking whether the code is present before iterating over its alternate codes. The issue was that for a standard map, the bracket operator adds a blank element if the key does not exist.
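The bracket operator behavior is easy to demonstrate with the alternate map definition from the project and two hypothetical lookup functions: `operator[]` default-constructs and inserts a value for a missing key, while `find` leaves the map unchanged.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct TableEntry {};  // hypothetical stand-in
using EntryVectorArray = std::array<std::vector<TableEntry *>, 3>;
std::unordered_map<TableEntry *, EntryVectorArray> s_alternate;

// Problematic: operator[] inserts a blank element when the key is missing.
std::size_t countWithBrackets(TableEntry *entry, int operand) {
    return s_alternate[entry][operand].size();
}

// Fixed: find() checks for the key without modifying the map.
std::size_t countWithFind(TableEntry *entry, int operand) {
    auto iterator = s_alternate.find(entry);
    return iterator == s_alternate.end() ? 0 : iterator->second[operand].size();
}
```

Both return 0 for an entry with no alternates, but only the bracket version leaves a new blank element behind.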

[branch table commit 1374657d7e]

Thursday, December 18, 2014

Table – Other Alternate Codes

There were many alternate codes that couldn't be initialized automatically like with operators and internal functions. These codes include the assignment, sub-string assignment, internal command (for INPUT and PRINT) and codes with operators. For now these other alternate codes need to be initialized manually.

An Alternate Information structure was added containing the primary code, the array index for the alternate codes, and an initializer list of alternate codes. An initializer list of these structures was added containing the information for all of these other alternate codes. After iterating through the list of entries, the constructor iterates through this initializer list to add these other alternate codes to the alternate code map.
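A minimal sketch of the manual alternate list and the loop that loads it into the map. The code enumerators and list contents are a hypothetical subset, the map is keyed by code enumerator rather than entry pointer, and `std::vector` is used at both levels to avoid `std::initializer_list` lifetime subtleties.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical code enumerators.
enum class Code { Let, Assign, Input, InputBegin, InputAssign, Print, PrintDbl };

// One record per primary code and operand array index.
struct AlternateInfo {
    Code primary;
    int index;                     // element of the alternate array to fill
    std::vector<Code> alternates;
};

// Hypothetical subset of the manually maintained list.
const std::vector<AlternateInfo> alternateInfo = {
    {Code::Let, 0, {Code::Assign}},
    {Code::Input, 0, {Code::InputBegin}},
    {Code::Input, 1, {Code::InputAssign}},
    {Code::Print, 0, {Code::PrintDbl}},
};

using CodeVectorArray = std::array<std::vector<Code>, 3>;
std::map<Code, CodeVectorArray> s_alternate;

// Run after the automatic operator/function pass over the entries.
void addOtherAlternates() {
    for (const AlternateInfo &info : alternateInfo) {
        // operator[] intentionally creates the primary's element if missing.
        std::vector<Code> &codes = s_alternate[info.primary][info.index];
        codes.insert(codes.end(), info.alternates.begin(), info.alternates.end());
    }
}
```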

The rest of the uses of the associated code arrays were changed to use the alternate code map including the set token code function (used to set the code in a token, possibly an alternate code, depending on its data type), the LET translate function (for setting a sub-string assignment, string keep assignment and list assignment codes), and the INPUT translate function (for setting an input parse code).

The check in the constructor for validating the second associated code index was removed. The section for setting the expected data type of an operator or internal function was modified to use the alternate code map. This required a separate entry iteration loop since the alternate code map needs to be initialized completely before looking at the alternate codes.

There was a problem with the automatic alternate code map initialization because of the current order of entries. The issue was that the binary operators with two integer operands were being made alternate codes of the operators with a first integer and second double operand. This was different from how the associated code arrays were initialized. Instead of moving all of these entries (and their enumerators), a check was added to the alternate map initialization to detect this situation and swap the entries.

Two additional checks were added to the alternate map initialization to make sure the primary binary operator code has operands with the same data type, throwing an error if not. The check for a multiple internal function code with more operands appearing before the code with fewer operands also had to set the Multiple flag. The code with fewer operands was already being made the primary code.

[branch table commit d38f79b8a1]

Sunday, December 14, 2014

Table Alternate Codes – Operators/Functions (Use)

With the new alternate codes map implemented and partially filled with all the alternate codes for operators and internal functions, the translator was modified to start using this map instead of the associated codes array.

Two access functions were added to the table class: the alternate code and alternate code count functions. Both take code enumerator and operand index arguments. These functions have temporary implementations. When the new table model is fully implemented, these functions won't need the code argument, as the this pointer will be used as the key to the map. They will also return an entry pointer instead of a code enumerator.
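A minimal sketch of the two temporary accessors keyed by code enumerator and operand index. The enumerators, the `addAlternate` helper, and returning the first alternate from `alternateCode` are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical code enumerators for illustration.
enum class Code { Neg, Sub, Abs, AbsInt };

using CodeVectorArray = std::array<std::vector<Code>, 3>;
std::map<Code, CodeVectorArray> s_alternate;

void addAlternate(Code primary, int operandIndex, Code alternate) {
    s_alternate[primary][operandIndex].push_back(alternate);
}

// Temporary-style accessors keyed by code enumerator and operand index.
std::size_t alternateCodeCount(Code code, int operandIndex) {
    auto it = s_alternate.find(code);  // find() avoids inserting blank elements
    return it == s_alternate.end() ? 0 : it->second[operandIndex].size();
}

// Returns the first alternate; callers check the count first.
Code alternateCode(Code code, int operandIndex) {
    return s_alternate.at(code)[operandIndex].front();
}
```

With the negate code as primary for the minus operator, a binary operator check could test `alternateCodeCount(Code::Neg, 1) > 0` and then use `alternateCode(Code::Neg, 1)` for the binary subtract code.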

The binary operator check for a unary operator in the translator get expression routine was modified to use the new access functions. In the process internal function routine, the new access function is used to get the alternate code for a function when an operand has a different data type than that of the primary function code, and when an extra argument is found for a function with multiple forms.

There was an issue with the subtract code table entries. The current associated code arrays are still being used to process operands of operators, because the find code routine is still being used to get associated codes for codes that don't yet have entries in the new alternate map, so it couldn't be modified to use the new access functions.

The problem was caused by the change to make the main binary subtract code (two double operands) the second associated code of the negate code, which was made the primary code for the minus operator. This subtract code was located after the subtract code with the first integer operand. Since that code came first, it was made the main binary code and the primary binary alternate to the negate code. That code did not have the correct associated codes in the current associated code array, so hidden conversion operators were incorrectly added to the output list.

The order of the alternates in the table does not matter for the automatic alternate generation, but for the moment, the new alternate map and the associated code arrays need to agree. This problem was corrected by moving the main subtract code to before the subtract with first integer operand. Since the code enumeration is not automatic, the subtract enumerator also had to be moved to match the table. This is a temporary situation.

[branch table commit 01db8002ba]

Table Alternate Codes – Operators/Functions

The alternate codes map will be implemented in a number of steps including adding alternate codes automatically for operators and internal functions, using the alternate map for operators and functions, removing the associated codes for operators and functions, manually adding alternate codes for internal codes, using the alternate map for internal codes, and adding additional alternate codes (to further reduce the need for code enumerators).

First, the declaration of the alternate code static map member was added to the table class, along with its definition in the table source file. The standard array, new in C++11, is just as efficient as a built-in array but adds conveniences such as a size() member function, iterators, and copy assignment. Since the declaration is quite lengthy, it was broken into two parts, a type alias and the map itself. There was no reason to define a constant for the 3 (up to three operands or arguments), since this is the only place it is needed:

using EntryVectorArray = std::array<std::vector<TableEntry *>, 3>;
static std::unordered_map<TableEntry *, EntryVectorArray> s_alternate;

The add function was modified to add an entry as an alternate code where appropriate. Alternate codes are based on having the same name as another code. If the entry's name is newly added to the name to entry static map, then the routine returns immediately; in other words, the code is a primary code. If the name is already in this map, and the entry has an expression information structure, has operands, and does not have its Reference flag set, then the entry can be added as an alternate code.

If the operand count of the entry is less than that of the primary code, then the entry should be the primary code instead. In this case, the pointer to the entry replaces the value in the name to entry map, the previous primary is made an alternate code of the entry (added to the array element for the previous primary's operand count minus one), and the routine returns. This was a better solution than reporting an error.

The routine then compares the operand data types of the entry with those of the primary to identify which code the entry should be added to as an alternate. If all the operand data types of the primary code match those of the entry, and the entry is an internal function with more operands, then it is made an alternate of the primary in the array element one less than its operand count. This is a multiple argument entry (ASC, INSTR or MID$), so the Multiple flag is set on the primary code. If all the operand data types match but the entry is not such a function, the entry duplicates the operands of the primary and an error is thrown.
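A minimal sketch of the add logic described above follows. It is not the project's code: the simplified TableEntry, the s_nameToEntry and s_alternate globals, and the isFunction and multiple members are hypothetical stand-ins, the expression information and Reference flag checks are omitted, and it assumes a data type alternate goes into the element of the first operand whose data type differs from the primary (the actual comparisons are more involved).

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

enum class DataType { Double, Integer, String };

// Hypothetical, simplified table entry: name, operand data types,
// whether it is an internal function, and a stand-in for the Multiple flag.
struct TableEntry {
    std::string name;
    std::vector<DataType> operands;
    bool isFunction {false};
    bool multiple {false};
};

using EntryVectorArray = std::array<std::vector<TableEntry *>, 3>;

std::unordered_map<std::string, TableEntry *> s_nameToEntry;
std::unordered_map<TableEntry *, EntryVectorArray> s_alternate;

// Add an entry; if its name is already present, make it an alternate.
// Assumes at most three operands, matching the size of the array.
void add(TableEntry &entry)
{
    auto result = s_nameToEntry.emplace(entry.name, &entry);
    if (result.second) {
        return;  // name not seen before: this is a primary code
    }
    TableEntry *primary = result.first->second;
    if (entry.operands.size() < primary->operands.size()) {
        // fewer operands: the entry becomes the primary and the previous
        // primary becomes its alternate (element = its operand count - 1)
        result.first->second = &entry;
        s_alternate[&entry][primary->operands.size() - 1].push_back(primary);
        return;
    }
    // find the first operand whose data type differs from the primary
    std::size_t i = 0;
    while (i < primary->operands.size()
           && entry.operands[i] == primary->operands[i]) {
        ++i;
    }
    if (i < primary->operands.size()) {
        s_alternate[primary][i].push_back(&entry);  // differs at operand i
    } else if (entry.isFunction
               && entry.operands.size() > primary->operands.size()) {
        // same leading operands plus an extra argument (ASC, INSTR, MID$)
        s_alternate[primary][entry.operands.size() - 1].push_back(&entry);
        primary->multiple = true;
    } else {
        throw std::runtime_error{"duplicate operands for " + entry.name};
    }
}
```

For example, adding a two-argument MID$( entry and then a three-argument one leaves the two-argument entry as the primary, with the three-argument entry in element 2 and the Multiple flag set on the primary.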

Since the Multiple flag is now set automatically, it no longer needs to be specified in the table entry array initialization. And since this is automatic, and the requirement that a multiple entry immediately follow its primary entry in the table was eliminated, the validation of multiple non-assignment entries was removed from the table constructor.

Using the debugger within QtCreator, the alternate map was verified to be set up correctly. At this point, however, the new alternate map is not being used. That will be the subject of the next change, which will use the alternate map for the operators and internal functions instead of the associated code arrays.

[branch table commit b7021a78ae]

Table – Alternate Codes Map

Alternate codes (formerly known as associated codes) will be handled differently in the new table model. In the current table model, the expression information structure of each code contains a single array of associated codes along with a count and an index to a second set of associated codes within the array. In the new table model, these will be removed from the expression information structure.

The new table model will contain the information for a code in a single table entry instance, which will be referenced by a pointer. The table class will contain some static data members (members shared by all instances). The alternate code information will be stored in one of these new static data members, specifically a map from a primary code table entry pointer (the key) to its alternate codes (the value).

The value of this alternate map will contain an array (a standard array will be used) of three elements. Each element represents the alternate codes for a particular operand. Generally, the first element (index of 0) will have alternate codes where the data type of the first operand is different from the primary code. The second element will have alternate codes where the data type of the second operand is different from the primary code. This was roughly the purpose of the second associated codes.

The third element of the array is applicable only to three-argument internal functions, which is new. There are currently no planned internal functions that have different data types in the third argument, so this element will be used to associate three-argument functions with their primary code, which has two arguments. This applies to the MID$ and INSTR functions, which have two- and three-argument versions. This similarly applies to the ASC function, but its second form has two arguments, so the second element of the array is used.

Each element of this array will contain a vector of alternate code table entry pointers. An element may have an empty vector, indicating there are no alternate codes with a different data type for that operand or argument. The first step will be to generate this map automatically from the operand data types of the operator and internal function table entries.
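To illustrate how such a map might be queried, here is a hypothetical sketch with two lookups: one for an alternate whose operand at a given index has a different data type, and one for the form of a multiple-argument function with a given argument count. The function names alternateForType and alternateForCount and the simplified TableEntry are assumptions for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

enum class DataType { Double, Integer, String };

// Hypothetical, simplified table entry: a debug name and operand data types.
struct TableEntry {
    std::string debugName;
    std::vector<DataType> operands;
};

// One vector of alternates per operand or argument position (up to three).
using EntryVectorArray = std::array<std::vector<TableEntry *>, 3>;
std::unordered_map<TableEntry *, EntryVectorArray> s_alternate;

// Alternate of primary whose operand at index has dataType, else nullptr.
TableEntry *alternateForType(TableEntry *primary, std::size_t index,
                             DataType dataType)
{
    auto it = s_alternate.find(primary);
    if (it == s_alternate.end() || index >= primary->operands.size()) {
        return nullptr;
    }
    for (TableEntry *entry : it->second[index]) {
        // same operand count, so an extra-argument form is not matched here
        if (entry->operands.size() == primary->operands.size()
            && entry->operands[index] == dataType) {
            return entry;
        }
    }
    return nullptr;
}

// Form of a multiple-argument function taking count arguments, else nullptr.
TableEntry *alternateForCount(TableEntry *primary, std::size_t count)
{
    auto it = s_alternate.find(primary);
    if (it == s_alternate.end() || count == 0 || count > it->second.size()) {
        return nullptr;
    }
    for (TableEntry *entry : it->second[count - 1]) {
        if (entry->operands.size() == count) {
            return entry;
        }
    }
    return nullptr;
}
```

Separating the two lookups by operand count keeps the ASC case unambiguous, since its two-argument form shares the second element with any second-operand data type alternates.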

Saturday, December 13, 2014

Table – Code Enumeration Increment Functions

Before the code enumeration can be changed to a C++11 enumeration class, the increment operators need to be removed (they could be made to work with an enumeration class by using static casts, but this is not desirable). There were only two uses of the increment operator.
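For reference, a prefix increment for an enumeration class would look something like the following hypothetical sketch (the Code enumerators shown are placeholders). Every increment has to cast through the underlying type, and nothing stops it from stepping past the last enumerator.

```cpp
#include <type_traits>

// Placeholder enumerators, for illustration only.
enum class Code { Asc, Asc2, Mid2, Mid3 };

// Prefix increment: cast to the underlying type, add one, cast back.
// There is no bounds check; incrementing Mid3 yields an unnamed value.
Code &operator++(Code &code)
{
    using Underlying = std::underlying_type<Code>::type;
    code = static_cast<Code>(static_cast<Underlying>(code) + 1);
    return code;
}
```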

One use of the code increment operator was in the assign string recreate function for sub-string assignments, where the name of the function was used to find the original sub-string function code. If the sub-string function has multiple entries (MID$), then the sub-string code was incremented. For functions with a variable number of arguments (ASC, INSTR, and MID$), the second code with the additional argument followed the first code, which was required for the increment method to work. The sub-string code was used to recreate the sub-string assignment because it had the correct number of operands (the sub-string assignments did not).

The assignment, list assignment and string keep assignment code entries were given one operand for the data type of the value being assigned. Only this first operand data type was used, and it did not matter if there were more operands (the count of operands was not used), so the sub-string assignments were given the same operand data types as the sub-string functions (the sub-string keep assignments already had these operands). With the correct operands, the sub-string assignments can be recreated directly without having to look up the sub-string function code, eliminating this need for the code increment operator.

The set token code function is used to set the correct code for the operand data type. One use is for operators, to set the correct code for the data type of an operand; the second associated codes are used for the second operand. A negative second associated code index indicated no second associated codes. A negative index was only used by the sub-string functions, to prevent them from using any of their associated codes. Since the set token code function is no longer called for functions, this check was unnecessary, and the negative index was removed from the sub-string code entries.

The other use of the code increment operator was in the translator process internal function routine, to move to the next code for a function with a variable number of arguments (ASC, INSTR, and MID$ mentioned above). The token next code access function was used to increment the code. The second code of each of these functions was associated with the first code, so now when a comma is processed for the next argument instead of a closing parenthesis, the second code is obtained by getting the associated code of the first code. The next code function was removed along with the code increment operator functions.

[branch table commit e5b55fa271]

Friday, December 12, 2014

Table – Code Enumeration

With the new table model, the code enumeration will be a subset of the current code enumeration and will only include codes that are referenced (for example, Comma, Equal, Semicolon, Open Parentheses, Closing Parentheses, etc.). With the bracketing codes removed, all the remaining code enumerators represent actual codes. The current auto-generated code enumeration was copied to the main header file.

Temporarily, the code enumeration definition must match the table entry array, and there are no checks to ensure this (which was the purpose of auto-generating the code enumeration). Since the code enumerators are still used as indexes, this enumeration was not changed to a C++11 enumeration class yet. There are also some inline functions for incrementing a code enumerator, which is possible since plain enumerators convert implicitly to integers, but more difficult with an enumeration class. Notes were added indicating which enumerators will be removed.

The enumerations awk script was removed. The CMake build file was modified to remove the auto-generation of the code enumeration header file, and since the awk program is no longer used, the check for this program was also removed. The next goal will be to remove the use of code enumerators as indexes so that this enumeration can be changed to an enumeration class.

[branch table commit 9a3c075ba0]

Thursday, December 11, 2014

Table – Name Lookup Mechanism

In order to eliminate the auto-generated code enumeration, unnecessary code enumerators will be removed. Table bracketing entries were used to break the table into search groups. There were three of these groups, which included plain words (including two-word commands), words with parentheses, and symbols.

When the table was initialized, the start and end of each group was determined and stored in the range member indexed by the search type enumerator. This enumeration was not changed to a C++11 enumeration class because its enumerators were used as indexes. When a caller wanted to search the table for a name, it passed one of these search types and the search looked only within that range (by sequentially iterating over the entries).

The new name lookup mechanism uses a standard unordered map where the key is the name of the code and the value is a pointer to a table entry. For two-word codes, the two words are combined with a space separator between them. The key hash and key equal function operators defined in the dictionary class were moved to the utility header file and renamed to case optional. These are used for this map, which is defined as a static member, so that searches are case insensitive.

The search functions were renamed find to mirror the standard library names and were made static. The primary find function calls the map find function with the string argument. If the string was found then the code index is calculated by subtracting the base table entry pointer from the table entry pointer, otherwise the Invalid code is returned. After the new table is fully implemented, the table entry pointer itself will be returned or the default table entry pointer. The find function for two words combines the words with a space separator and calls the primary find function.
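The case-insensitive lookup could be sketched as follows. The functor names CaseInsensitiveHash and CaseInsensitiveEqual, the simplified TableEntry, and returning a null pointer when a name is not found (the eventual design, rather than the current Invalid code) are assumptions; the project's own case optional functors live in its utility header.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hash an upper-cased copy so names differing only in case hash alike.
struct CaseInsensitiveHash {
    std::size_t operator()(const std::string &s) const
    {
        std::string upper;
        upper.reserve(s.size());
        for (unsigned char c : s) {
            upper += static_cast<char>(std::toupper(c));
        }
        return std::hash<std::string>{}(upper);
    }
};

// Compare character by character, ignoring case.
struct CaseInsensitiveEqual {
    bool operator()(const std::string &a, const std::string &b) const
    {
        if (a.size() != b.size()) {
            return false;
        }
        for (std::size_t i = 0; i < a.size(); ++i) {
            if (std::toupper(static_cast<unsigned char>(a[i]))
                != std::toupper(static_cast<unsigned char>(b[i]))) {
                return false;
            }
        }
        return true;
    }
};

struct TableEntry { std::string name; };

class Table {
public:
    // The first entry with a name is kept (duplicate reporting omitted).
    static void add(TableEntry &entry) { s_nameToEntry.emplace(entry.name, &entry); }

    static TableEntry *find(const std::string &name)
    {
        auto it = s_nameToEntry.find(name);
        return it == s_nameToEntry.end() ? nullptr : it->second;
    }

    // Two-word names are stored with a single space between the words.
    static TableEntry *find(const std::string &word1, const std::string &word2)
    {
        return find(word1 + ' ' + word2);
    }

private:
    using NameMap = std::unordered_map<std::string, TableEntry *,
                                       CaseInsensitiveHash, CaseInsensitiveEqual>;
    static NameMap s_nameToEntry;
};

Table::NameMap Table::s_nameToEntry;
```

Because both functors fold case, "input prompt" and "INPUT PROMPT" hash to the same value and compare equal, which is what an unordered map requires of a custom hash and key equality pair.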

The add function was added to add an entry to the table, which now just consists of adding to the name to entry pointer map. This function is called at the beginning of the entry iteration loop in the table constructor and throws an exception (a string) if an error is found (a two-word code is already in the map). Eventually this code will be put into or called from the base table constructor.

The search type enumeration was removed along with the bracketing table entries (including their code enumerators) and the range member used to hold the indexes of the bracketing entries. The End Plain Word bracketing code was used by the unary operator recreate function to determine whether a space should be added after the operator, namely when it is a name and not a symbol. This check was changed to test whether the last character of the operator name is a letter. A match function that was no longer used was also removed.
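The replacement check is small; here is a sketch with a hypothetical appendUnaryOperator helper standing in for the relevant part of the unary operator recreate function.

```cpp
#include <cctype>
#include <string>

// Append a unary operator name, adding a space only when the name ends
// in a letter (a word such as NOT rather than a symbol such as -).
void appendUnaryOperator(std::string &output, const std::string &name)
{
    output += name;
    if (!name.empty() && std::isalpha(static_cast<unsigned char>(name.back()))) {
        output += ' ';
    }
}
```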

[branch table commit f5f563cad2]

Wednesday, December 10, 2014

Table – Initialization (Errors)

Before creating the name and alternate code maps, the table constructor was modified to record table errors in a standard vector of standard strings. Any errors found are then output to the standard error stream, and the standard abort function is called to terminate the program. The call to translate the error messages was also removed, since these errors are programming bugs that do not require translation.
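A sketch of this collect-then-abort pattern, under stated assumptions: the duplicate two-word name check and the checkEntries and reportAndAbort names are hypothetical, chosen only to show errors accumulating in a vector of strings before being written to standard error.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical check: collect every problem instead of stopping at the first.
std::vector<std::string> checkEntries(const std::vector<std::string> &twoWordNames)
{
    std::vector<std::string> errors;
    std::unordered_set<std::string> seen;
    for (const auto &name : twoWordNames) {
        if (!seen.insert(name).second) {
            errors.push_back("duplicate two-word name '" + name + "'");
        }
    }
    return errors;
}

// Output all errors and terminate; these are programming bugs, so the
// messages are not translated.
void reportAndAbort(const std::vector<std::string> &errors)
{
    if (errors.empty()) {
        return;
    }
    for (const auto &error : errors) {
        std::cerr << "Table Error: " << error << '\n';
    }
    std::abort();
}
```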

[branch table commit 70f9125512]

Table – Name and Alternate Code Maps

Since the table source file is going to be receiving a lot of changes, the immediate goal is to eliminate the auto-generated code enumeration header file so that the entire project doesn't need to be recompiled for each change to the table source file. This will be accomplished by changing how code enumerators (indexes) are handled.

A code enumeration will still be required for a limited number of codes that are referenced throughout, for example, the special operators like Comma, Equal, Semicolon, End-of-Line, etc. So there will still be a code enumeration, but it will not be used as an index to codes. Currently however, there are many more code enumerators that are used, specifically the range code enumerators and the code enumerators used for the associated code arrays.

The range code enumerators will be removed first. To accomplish this, the search mechanism will be changed, which currently searches for a name within three different ranges (plain words, parentheses words, and symbols) of the table entry array. The new mechanism will have one-word names in a name to table entry pointer map. The two-word names will be in a separate two-word names map. These maps will be static table members. The table entry structure members will eventually be in the new table class, so these table entry pointers will become table instance pointers.

When the table consists of many code table instances, the base table class constructor will set up these maps when it is called by the constructors of the code classes. For now, these maps will be set up in the current single instance table constructor as it iterates over the table entry array.

Similarly, there will be maps from code table instance pointers to vectors of table instance pointers for alternate codes (the new name for associated codes). These static member maps will also be set up by the new base table class constructor, but will temporarily be set up in the current table constructor.

Tuesday, December 9, 2014

Table – New Model

One of the major goals for the new table design (missed in the December 3 post) is to eliminate all the standalone code work functions (translate, encode, recreate, etc.), which must be defined so that pointers to them can be put into the table entries. Many codes do not have several of these work functions (for example, only commands have translate functions), and many codes share work functions (for example, all binary operators have the same recreate function). Once all the codes were implemented, there would be an explosion of these work functions, especially considering that each code will need a unique run function.

Therefore, a class hierarchy will be the basis for the new table model, where the base class holds the information members and the virtual functions for these work functions. Using virtual functions allows common work functions to be defined in an upper class. Unfortunately, there will be an explosion of classes since every code needs a unique run virtual function. Fortunately, there is a way to define these classes without requiring every source file to know about them, so only the base table class, which will contain the interface for all of these derived classes, needs to be known globally.

There will be a table instance for each code containing the information for that code only. Instead of identifying a code by an enumerator (essentially an index), a code will be identified by a base table class pointer. Code information will be accessed by inline access functions, and the work functions accessed using virtual functions. The index of the code will only be accessed when a token is finally encoded into the program. Each code will be assigned a unique index during initialization (more on this later). The token will therefore contain a code table instance pointer instead of a code enumerator (index).
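The following is only a rough sketch of the idea, not the planned design: the class names, the recreate and run signatures, and assigning each code's index as it is constructed are all assumptions. It shows a base class with information members and virtual work functions, a recreate function shared by binary operators, and a code class supplying its own run function, with a base class pointer standing in for the code enumerator a token would otherwise hold.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical base table class: information members, inline access
// functions, and virtual work functions (only two shown).
class Table {
public:
    explicit Table(std::string name) : m_name{std::move(name)}
    {
        // assumption: each code gets its index when it is constructed
        m_index = static_cast<int>(s_codes.size());
        s_codes.push_back(this);
    }
    Table(const Table &) = delete;
    Table &operator=(const Table &) = delete;
    virtual ~Table() = default;

    const std::string &name() const { return m_name; }
    int index() const { return m_index; }  // needed only when encoding

    virtual std::string recreate(const std::vector<std::string> &operands) const = 0;
    virtual void run(std::vector<double> &stack) const = 0;

private:
    std::string m_name;
    int m_index;
    static std::vector<Table *> s_codes;
};

std::vector<Table *> Table::s_codes;

// One recreate function shared by every binary operator.
class BinaryOperator : public Table {
public:
    using Table::Table;
    std::string recreate(const std::vector<std::string> &operands) const override
    {
        return operands.at(0) + ' ' + name() + ' ' + operands.at(1);
    }
};

// Each code still supplies its own run function.
class AddCode : public BinaryOperator {
public:
    AddCode() : BinaryOperator{"+"} {}
    void run(std::vector<double> &stack) const override
    {
        double rhs = stack.back();
        stack.pop_back();
        stack.back() += rhs;
    }
};
```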

Though there will no longer be a monolithic table instance, there will be some static table class members and functions. For example, the search functions will be static since these will not require a table instance (they will return an instance for a code). The search functions will use a static map member for looking up a name to get a code table instance. Each of these static members will be described as they are created.

Table – Current Model

The model of the current table design is a monolithic single instance where the information for each code is obtained with a code enumerator that is used within the table as an index into an array of plain structures where each element contains the information for a single code. There is a series of access functions taking a code enumerator as an argument. There is a similar series of access functions taking a token as an argument, where the code enumerator stored in the token is used. Finally there are a couple of functions for searching the table for a code by a name.

A singleton pattern was used for the table instance so that a global table instance would not be used. However, the table entry array was global (though only within the table source file) along with a static pointer to the instance (within the table class and defined in the table source file). Is this really a singleton? In any case, this singleton pattern will be temporarily replaced with a single global table instance (until the new table model starts to get implemented).

One problem with the current table is adding a new variable to the table entry structure, like the just added expected data type member. If a value for one of the entries is missed, the compiler reports the error, but the error is reported against the end of the array, giving no clue as to which entry has the problem. Finding the problem entry is very time consuming. (This actually occurred.)

Another problem is the auto-generation of the code enumeration. The actual code enumerators were defined as comments in the table source file, which an awk script combed through to generate a header file containing the code enumeration definition. This design was an attempt to eliminate the problem of keeping the code enumeration matched with the entries.

The issue was that this auto-generated header file depends on the table source file, the main header file includes this header file, and all source files depend on the main header file. So every time the table source file is modified, the entire project needs to be rebuilt, which is becoming a nuisance. Therefore, the first goal will be to eliminate this.

Sunday, December 7, 2014

Pre-Table – Expected Data Type

The expected data type member of the Expression Info structure was initialized during table initialization based on the type (operator or function), number of operands and data types of the operands. This variable was moved to the table entry so that the preset expression info instances are not modified. This resolved the issue with the REPEAT$ function. An initialization value needed to be added to all of the table entries, a drawback of the current flat table entry array (though this will be changed in time). The default data type value was used.

This is about the end of the preliminary changes for the table. The next post will begin to describe the new table design. Since the changes will be significant, an attempt will be made to transition to the new design in a series of smaller changes. The movement of the expected data type could be considered the first of these changes.

[branch table commit 81992b2d51]

Pre-Table – Expression Return Data Type

The data type table entry member was originally used for both the next expected token data type for a command and the return data type for operators and functions. The next expected data type was used with the now replaced token centric translator. Since the translator is now command centric, a next expected data type for commands is not needed. Since the return data type is expression related, it was moved into the Expression Info structure.

The preset expression info structure instances were updated to include a return data type. The table entries were updated where the data type initialization value was removed and the expression info structure pointer name updated or the return data type value added to the expression structure constructor call.

For the REPEAT$ function entry, the same preset expression info structure instance that is also used by several assign operators could not be used since the REPEAT$ function required a different expected data type.

The data type access function was replaced with a return data type function that checks if the expression info structure is present before attempting to access the return data type member. If the structure is not present, the None data type is returned.
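A minimal sketch of such a return data type access function, assuming a simplified ExprInfo structure and an entry that holds a possibly null pointer to it (both names hypothetical):

```cpp
enum class DataType { None, Double, Integer, String };

// Hypothetical, simplified expression information structure.
struct ExprInfo {
    DataType returnDataType;
};

// Entries without expression information (e.g. commands) hold nullptr.
struct TableEntry {
    const ExprInfo *exprInfo;
};

// Return the expression's return data type, or None if there is no
// expression information structure.
DataType returnDataType(const TableEntry &entry)
{
    return entry.exprInfo ? entry.exprInfo->returnDataType : DataType::None;
}
```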

[branch table commit 0186dcb76b]

Pre-Table – Multiple Member

The multiple table entry member identified codes that either could be the first word of a two-word command (for example INPUT), was a two-word command (INPUT PROMPT), could be the first character of a two-character operator (<), or was a two-character operator (<=). Having an entire member for just this was unnecessary since a flag could be used for this purpose. In addition, it was not necessary to identify two-character operators.

Therefore, the multiple table entry member was replaced with a new Two table flag (a Multiple table flag already existed) and its access function was removed. The Multiple enumeration contained separate enumerators for characters and words, though they were assigned the same values, and even contained enumerators for three characters or words, though no codes used these. This enumeration was removed.

Some changes were made to the Table Flag enumeration, including assigning it an underlying type of an unsigned integer (32 bits). Instead of assigning the enumerators to hard to read hexadecimal constants, they were assigned values using the shift operator where an unsigned one value is shifted by a unique number of bits. The shift operation is calculated during compilation. The Null table flag enumerator was removed; the default table flag value (TableFlag{}) is used instead. The flag table entry member type was also changed to an unsigned integer. The has flag access functions were changed to return a boolean value.
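A sketch of a flag enumeration along these lines; the enumerator names here are hypothetical placeholders rather than the project's actual names.

```cpp
#include <cstdint>

// Hypothetical flag names; each is an unsigned one shifted by a unique
// number of bits, evaluated at compile time.
enum TableFlag : std::uint32_t {
    Multiple_TableFlag  = 1u << 0,
    Reference_TableFlag = 1u << 1,
    Two_TableFlag       = 1u << 2,
    Keep_TableFlag      = 1u << 3
};

static_assert(Keep_TableFlag == 8u, "shifts are evaluated at compile time");

struct TableEntry {
    std::uint32_t flags;  // combination of TableFlag values

    // Returns a boolean rather than the masked bits.
    bool hasFlag(TableFlag flag) const { return (flags & flag) != 0; }
};
```

The default value TableFlag{} is zero, so it serves as "no flags" without a separate Null enumerator.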

The parser get identifier and get operator routines were updated to use the has flag access function with the Two flag instead of using the multiple access function. In the table header file, there were a few remaining previously missed uses of a Qt type (quint16) on function pointer definitions that were replaced with the standard equivalent type (uint16_t).

[branch table commit 2f5edfc30e]

Saturday, December 6, 2014

Pre-Table – Unary Operator Detection

The expression information structure within the table entry contained a unary code member, which was used to determine if a code was or could be a unary operator. This member contained the code of the unary operator or a null code. Only four codes contained a non-null code, which included the two negate codes (double and integer), the NOT operator, and the main subtract code. The subtract code contained the first negate code, but the others contained their own code.

This was not an efficient use of the member, as it was used by only four codes, especially considering there was another way to determine a unary operator, namely checking if the token type is Operator and the number of operands is one.

This required a change in how the negate and subtract operators are associated with each other. Originally, these codes were not associated, since the unary code member was used to get from the subtract code to the negate code. Without the unary code member, an association was needed. It did not make sense, and would have been problematic, to associate the negate code with the subtract code, which already had a number of associated codes.

Therefore the subtract code was associated with the negate code as a secondary associated code (the negate code already has the integer negate code associated to it). Making it the second associated code makes sense as the subtract has two operands. This change required some changes with how the translator handles unary operators.

When getting an operand, the translator get expression routine checked if the current token could be a unary operator, and if it could, changed it to the unary operator code. Now it just checks that the current token is not a unary operator before calling the get operand routine. When getting an operator, if the operator was a unary operator, an error was thrown immediately. This was changed to check whether the unary operator has a secondary associated code (a binary operator), and if it does, the token is changed to that code; otherwise, an error is thrown as before.
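The new unary operator handling could be sketched like this. The Code and Token structures, the secondaryAssociated member, and the makeBinary function are hypothetical simplifications of the translator and table; the sketch only shows the operand count test and the switch from a unary code to its binary alternate.

```cpp
#include <stdexcept>

enum class TokenType { Operator, Constant };

// Hypothetical code information: name, operand count, and the secondary
// associated code (the binary form of a unary operator, if any).
struct Code {
    const char *name;
    int operandCount;
    const Code *secondaryAssociated;
};

struct Token {
    TokenType type;
    const Code *code;
};

// A unary operator is an Operator token whose code has one operand.
bool isUnaryOperator(const Token &token)
{
    return token.type == TokenType::Operator && token.code->operandCount == 1;
}

// Where a binary operator is expected, switch a unary operator to its
// secondary associated code (e.g. negate to subtract), or report an error.
void makeBinary(Token &token)
{
    if (!isUnaryOperator(token)) {
        return;
    }
    if (token.code->secondaryAssociated == nullptr) {
        throw std::runtime_error{"expected binary operator"};
    }
    token.code = token.code->secondaryAssociated;
}
```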

The table initialization that sets the expected data type member was changed to look only through the primary associated codes for unary operators instead of the secondary associated codes. An is unary operator function declared in the table class definition was never defined or used, so it was removed. The unary code argument was removed from all of the expression info constructor calls.

[branch table commit b472f6a36d]

Pre-Table – Sub-String Assignments

In researching the requirements for the new table design, specifically how to implement associated codes (which will be renamed alternate codes), there was an issue with the associated codes for sub-string assignments (LEFT$, MID$ and RIGHT$).

The sub-string assignment codes are the first associated code for the sub-string codes. The sub-string assignment-keep codes are the first associated code of the sub-string assignment codes. For recreation, the original sub-string code was the second associated code for each of these assignment codes. This was necessary since the sub-string code contained the actual number of arguments as the sub-string assignment codes contain only two arguments (one for the value and one for the reference being assigned).

This circular association back to the original sub-string code was going to be a problem with the new table design, so these associations to the sub-string codes were removed. The first change made to the sub-string assignment table entries was to put back the original sub-string keyword name. Since the debug name is the combination of the primary and secondary names, the secondary names were changed to remove redundancy; for example, the debug name for AssignLeft changes to LEFT$(Assign. The affected expected test outputs were updated accordingly.

The assign string recreate function was modified to use the keyword name to look up the original sub-string code instead of using the second associated code for the sub-string assignment codes. This caused a problem with the MID3 codes as the MID2 code was found, which contained the wrong number of arguments. This was resolved by adding the Multiple table flag to the MID3 codes. When this flag is set, the sub-string code found is incremented to the next code. This change caused a problem during table initialization checking, which was modified to only check Multiple flagged entries if the Reference flag is not also set.

The assign string recreate function was also using the fact that the second associated index was set to zero to detect an assignment-keep code (used in string list assignment statements), for which the string built so far needs to be put back onto the recreation stack. With the above changes, the assignment-keep codes no longer have associated codes. This was resolved by adding a new Keep table flag to identify the assignment-keep codes.

[branch table commit f0c7ebdacd]

Pre-Table – Maximum Checks

There were two constants defined in the table source file, one for the maximum number of operands and one for the maximum number of associated codes. During table initialization, while it is iterating over all of the table entries, it looks for the largest operand count and associated code count for any entry. If these largest values are larger than the maximum, then a table initialization error occurs (the application then aborts).

I was not able to determine why these constants and checks were put in. There is nothing that necessarily limits the number of operands or associated codes. These maximum constants were only used for these checks. Therefore, these constants and checks were removed.

[branch table commit 489c59904c]

Friday, December 5, 2014

Pre-Table – Standard String Members

The string members of the table entries were changed to standard strings, including the return values of their access functions. These members and their uses are:

name - the main keyword name of the code; blank for an internal code
name2 - the second keyword name of a two-word command; blank for a one-word command; also used as the debug name for other codes like operators and internal functions
option - the debug name for codes that support the option sub-code (to reduce the number of sub-codes, this single sub-code has different meanings for different codes, for example Question for the input begin string code, LET for assignment codes, and Keep for the intermediate sub-string assignment codes)

Besides the access functions for these members, there is also a debug name access function that returns the debug name for a code. The purpose of this name is to represent each code with a unique name. For example, there are several add codes depending on the data types of the operands, which includes the +, +%1, +%2, +%, and +$ strings. For all but the primary add code (taking double operands), these strings were put into the name2 member. When the name2 was empty, the name member would be returned for the debug name.

This mechanism was modified where the name2 member now contains only the second part of the debug name. So, for the add operators, all contain + for their name member, and the name2 members contain the "", %1, %2, %, and $ strings. The debug name function now always combines these two strings. This also works for internal codes that have a blank name member.

However, the resulting debug names for internal functions are now a little different. For example, the debug names of the ASC, MID$, and STR$ functions (first line below, where the 2 and 3 represent the number of arguments and the % represents an integer argument) become those on the second line (where the name2 members are shortened to "", 2, 3, and %):

ASC( ASC2( MID2$( MID3$( STR$( STR%$(
ASC( ASC(2 MID$(2 MID$(3 STR$( STR$(%
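Since the debug name is now always the concatenation of the two names, a sketch needs little more than this (the simplified TableEntry is hypothetical):

```cpp
#include <string>

// Hypothetical, simplified entry holding only the two name members.
struct TableEntry {
    std::string name;   // main keyword or operator; blank for internal codes
    std::string name2;  // second part of the debug name; may be ""
};

// The debug name always combines the two names, with no separator.
std::string debugName(const TableEntry &entry)
{
    return entry.name + entry.name2;
}
```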

For the non-sub-string assignments, the name member is set to = and the name2 member contains the name of the code (Assign, Assign%, Assign$, etc.). The = is necessary so that the assignment statements are recreated correctly. The result of combining these two names is the strings =Assign, =Assign%, =Assign$, etc. This was left as is instead of trying to code around it, since it only affects test output.

Obviously the change in debug names affected the expected outputs of many of the tests. These were updated accordingly. Since standard strings cannot be initialized to a null pointer, all blank strings in the table were changed to the "" empty string.

[branch table commit 4f13d11a05]

Thursday, December 4, 2014

Pre-Table – Removed Test Names Header

The test names header file contained an array of C-style strings indexed by a code enumerator each with the name of that code. This file was automatically generated by an awk script that extracted the information from the table class source file. This array was only used by the tester print token function (itself only called from the tester parse input function).

There was really no reason that the code name itself couldn't be used for this test output, specifically the debug name for the code, which was the secondary name if set, otherwise the primary name. The print token function was modified to use the debug name instead. The auto-generating test names awk script was removed and the CMake build file was updated accordingly. The expected parser test output files were updated.

A problem was found in the table debug name access function. For a two-word command, this function was only returning the secondary name. This only affected the INPUT PROMPT command, which was being output as just PROMPT. This function was modified to also get the primary name if the multiple table entry member is set to Two Word (with no space between the two words). This affected two of the expected encoder test results, which were updated.
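The corrected two-word handling can be sketched as follows, assuming a hypothetical CommandEntry struct and Multiple enumeration standing in for the real table entry and its multiple member; the secondary-name-else-primary-name rule is the one described above:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the multiple table entry member.
enum class Multiple { OneWord, TwoWord };

struct CommandEntry {
    std::string name;   // primary name, e.g. "INPUT"
    std::string name2;  // secondary name, e.g. "PROMPT"
    Multiple multiple;
};

// Returns the secondary name when set, otherwise the primary name; for a
// two-word command both names are combined with no space between them.
std::string debugName(const CommandEntry &entry)
{
    if (entry.name2.empty()) {
        return entry.name;
    }
    if (entry.multiple == Multiple::TwoWord) {
        return entry.name + entry.name2;
    }
    return entry.name2;
}
```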

[branch table commit 34c45230f6]

Wednesday, December 3, 2014

New Table Implementation

The Table class is the only remaining class to be modified to utilize C++ and the STL. The table was poorly designed and was implemented in a way that was more appropriate for C than it was for C++. Goals of the new implementation include:

Eliminating the auto-generated header files, one containing the code enumeration and one containing a code names array of C-style strings.
Eliminating some unnecessary table entry variables.
Eliminating the use of the untyped C-style macros.
Reducing the number of values needed to define each code table entry. Currently many entries have default values. This was somewhat alleviated by the Expression Information structure, where some entries (like commands) don't have an instance of one of these. This may possibly be accomplished using inheritance and templates.
Making setup of the table entries easier and less error prone especially when it comes to the associated codes (which will be renamed alternate codes).

I have been investigating how to redesign the table, and along the way I discovered some things about the table that could be simplified. So before beginning the new design, these simplifications will be made, which may make the transition to the new design a little easier. This work will begin in a new table topic branch.

Tuesday, December 2, 2014

Miscellaneous Minor Changes

A number of miscellaneous minor changes had been accumulating during this topic branch and were finally taken care of, including:

Adding the C++11 override keyword to functions in derived dictionary information classes that implement virtual functions from the base abstract dictionary class.
Changing the dictionary information array functions to return a reference to the vector instead of a pointer to the array contained within the vector. The pointer to the array was originally returned as it was thought this would allow more efficient access to the array elements during run time, but the vector bracket operator should be essentially the same.
Adding the noexcept keyword to all parser functions that don't throw exceptions.
Replacing several Qt forever macros with an empty for (;;) loop (to remove the dependency on the Qt header files).
Removing two c_str function calls from recreator functions that were previously missed (these were originally added to work with QString, but were later replaced with standard strings).
Correcting some minor code formatting issues.
Removing two unused table search functions.
Changing two newline '\n' character outputs in tester class functions to std::endl, which outputs the newline character plus flushes the output stream. This was helpful during debugging to know which input line was being processed when some sort of crash occurred (otherwise the output was still in a buffer in memory making it difficult to know which line it was on).
Changing the header argument of the tester translate input function from a constant C-style character string to an rvalue reference to a standard string. An rvalue reference binds to a temporary value, which is what calls to this function that don't use the blank string default provide.
Setting the underlying type of the sub-code enumeration to an unsigned 16-bit integer (uint16_t), something made possible by C++11. The enumerator values were changed to 16-bit values (the four leading zero digits were removed).
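Several of these C++11 features can be illustrated in one small sketch. The class names, member names, and enumerator values below are hypothetical and only show the shape of each change (a fixed underlying enumeration type, override, noexcept, and returning a vector by reference):

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

// Sub-code flags with a fixed 16-bit underlying type (C++11).
// Names and values are illustrative only.
enum SubCode : std::uint16_t {
    None_SubCode  = 0x0000,
    Paren_SubCode = 0x0001,
    Colon_SubCode = 0x0002,
};

// Hypothetical abstract dictionary information base class.
class AbstractInfo {
public:
    virtual ~AbstractInfo() = default;
    virtual void addElement() = 0;
};

// Derived class: override lets the compiler verify the base signature.
class IntegerInfo : public AbstractInfo {
public:
    void addElement() override { m_array.emplace_back(); }

    // Return a reference to the vector instead of a pointer to its data;
    // indexing through the vector is essentially as fast as a raw array.
    const std::vector<int> &array() const noexcept { return m_array; }

private:
    std::vector<int> m_array;
};
```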

[branch misc-cpp-stl commit 69ae2f0b15]

This concludes this development topic and the misc-cpp-stl branch was merged to the develop branch and deleted. The Table class is the only remaining class that has not been transitioned from Qt to the STL and will also be reimplemented for better C++ utilization.

[branch develop merge commit 7f303f1bb1]

Sunday, November 30, 2014

Translator – Better Token Handling

A common pattern in the translator functions was a reference to a token pointer argument, used to pass a token into and out of the function. The token returned was the next token that the function could not process (a terminating token). Several of the functions allowed an unset token pointer, in which case they would get a token; otherwise they would use the token passed in. This was a strange pattern and somewhat difficult to work with.

The translator functions were modified to a better pattern where a member token pointer was added to the translator class to hold the current token. The translator functions were modified to use this current token member. The token is moved out of this member variable when successfully processed (consumed). The token pointer reference argument was removed from the translator functions.

The get token function was modified to put the token obtained from the parser into this new member, and the token argument was removed. If the current token member already has a token, then no action is taken.

When modified, the process command function was reduced to only a few lines, and since it was only called once, its code was moved into the get commands function. The command token and token arguments were removed from the LET, PRINT, and INPUT translate functions, which were modified to get the current token from the translator. Several access functions for the current token were added to the translator, including getting a constant reference to the token pointer member, resetting the token pointer member, and moving the token pointer out of the member. The latter two force a new token to be obtained upon the next get token function call.

The get operand function was modified to leave the current token pointer member empty if a valid operand was processed (added to the output list and pushed to the done stack). This forced the next call to the get token function to obtain a new token from the parser.
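A minimal sketch of the current token member pattern, assuming a shared pointer token type and a parser stand-in that returns canned tokens (Token, Parser, nextToken, and the Translator member names are all hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Token {
    std::string text;
};
using TokenPtr = std::shared_ptr<Token>;

// Hypothetical parser stand-in that returns canned tokens in order.
class Parser {
public:
    explicit Parser(std::vector<std::string> texts) : m_texts(std::move(texts)) {}

    // Returns the next token, or an "EOL" token once the input is used up.
    TokenPtr nextToken()
    {
        std::string text = m_index < m_texts.size() ? m_texts[m_index++] : "EOL";
        return std::make_shared<Token>(Token{text});
    }

private:
    std::vector<std::string> m_texts;
    std::size_t m_index {0};
};

class Translator {
public:
    explicit Translator(Parser parser) : m_parser(std::move(parser)) {}

    // Fetch a token from the parser only if no current token is held.
    void getToken()
    {
        if (!m_token) {
            m_token = m_parser.nextToken();
        }
    }

    // Access functions for the current token member.
    const TokenPtr &token() const noexcept { return m_token; }
    void resetToken() noexcept { m_token.reset(); }
    TokenPtr moveToken() noexcept { return std::move(m_token); }  // leaves member empty

private:
    Parser m_parser;
    TokenPtr m_token;  // current token; empty once consumed
};
```

Because getToken does nothing while a token is held, a function that does not consume the current token leaves it in place for its caller; consuming it (moveToken or resetToken) is what causes the next call to fetch a new one.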

For sub-string assignments, the get operand function set the reference flag of the token if the token was a sub-string function (LEFT$, MID$, or RIGHT$) and a reference was requested. The process internal function function used this flag to identify a sub-string assignment and to request a string variable for the first argument of the function. Upon return, this reference flag was cleared. This toggling of the reference flag was replaced by passing the reference enumerator as an argument.

There was some code in the get operand function that set the code of the token to the Define Function with Parentheses code enumerator. These statements were moved to the process parentheses token function, where the similar statements reside for setting the Array and Function codes.

Some other minor changes were made including changing the RPN list token access function to return a reference to the token pointer instead of the token pointer itself (to prevent the copying of the token pointer and updating the shared pointer use counts), reorganizing some code in the various routines, renaming some local token variables, and updating comments for changes made.
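The benefit of returning a reference to the token pointer can be seen from the shared pointer use count. A hypothetical RpnItem class is assumed here in place of the real RPN list item:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Token {
    std::string text;
};
using TokenPtr = std::shared_ptr<Token>;

// Hypothetical RPN list item holding a token pointer.
class RpnItem {
public:
    explicit RpnItem(TokenPtr token) : m_token(std::move(token)) {}

    // Returning by value copies the shared pointer, incrementing the use
    // count (and decrementing it again when the copy is destroyed).
    TokenPtr tokenCopy() const { return m_token; }

    // Returning a reference avoids both the copy and the use-count update.
    const TokenPtr &token() const noexcept { return m_token; }

private:
    TokenPtr m_token;
};
```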

[branch misc-cpp-stl commit 828f210f18]

Friday, November 28, 2014

Translator – Some Minor Improvements

The done stack pop error token function was used to create an error token from the item on top of the done stack. An item on the done stack could consist of an entire expression, so in addition to a token, it contained the first and last tokens of the expression, which would be null if the item only contained a single token.

There is no need to create an error token since exceptions are now thrown for errors, which do not contain tokens. All callers to this function only used the error token to create a token error (status with the column and length from the error token). This function was modified to return a token error and was more appropriately renamed to the done stack top token error function. There is also no need to pop the done item from the stack since when an error is thrown, the translator instance goes out of scope and all members are deleted including the done stack.
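A sketch of what the done stack top token error function might look like, assuming hypothetical DoneItem, Token, and TokenError types. The rule for computing the column and length from the first and last tokens is an assumption, since the text above only states that they come from the item on top of the stack:

```cpp
#include <cassert>
#include <memory>
#include <stack>

enum class Status { ExpNumExpr = 1, ExpStrExpr };

struct Token {
    int column;
    int length;
};
using TokenPtr = std::shared_ptr<Token>;

// An item on the done stack: the operand token plus, for an expression,
// its first and last tokens (null when the item is a single token).
struct DoneItem {
    TokenPtr rpnToken;
    TokenPtr first;
    TokenPtr last;
};

// Hypothetical exception type: a status and the source span to highlight.
struct TokenError {
    Status status;
    int column;
    int length;
};

// Assumed span rule: from the start of the first token through the end of
// the last token, or just the single token when first/last are null.
// The item is not popped; the done stack is destroyed with the translator.
TokenError doneStackTopTokenError(const std::stack<DoneItem> &doneStack, Status status)
{
    const DoneItem &top = doneStack.top();
    const Token &first = top.first ? *top.first : *top.rpnToken;
    const Token &last = top.last ? *top.last : *top.rpnToken;
    return TokenError{status, first.column, last.column + last.length - first.column};
}
```

A caller would then simply write throw doneStackTopTokenError(doneStack, status); with no error token involved.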

Several other minor changes were also made, including removing the token set through function since it was no longer used, replacing the uses of the Qt forever macro with a plain empty for (;;), renaming the expected error status function (which had "Error" abbreviated as "Err"), adding the C++ noexcept keyword to translator functions that do not throw an exception, and updating a few comments missed for changes made earlier.

[branch misc-cpp-stl commit 625195f18d]

Thursday, November 27, 2014

Translator Improvements – Getting Tokens

While modifying the translator routines to throw exceptions, it was noticed that some minor improvements could be made. The first of these is with calls to the get token function. Most of the calls to this function followed a similar pattern: the call was in a try block, and a caught error was replaced with an appropriate error.

An error status argument was added to the get token function. The caller passes the error status it wants thrown when the parser returns an Unknown Token. With this change, the caller no longer needs to catch errors from the get token function call. Several of the callers need the parser error to be thrown as is, so if the error status argument is a null status, the parser error is thrown unchanged. The first status enumerator was assigned a value of one so that a null status (zero) is not one of the existing enumerators.
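The error status argument can be sketched as below. The parser call is abstracted as a callable, and the Status and TokenError types are hypothetical simplifications; the point is that a value-initialized Status{} (zero) acts as the null status because the real enumerators start at one:

```cpp
#include <cassert>
#include <functional>

// Enumerators start at one, so Status{} (zero) is never a real status.
enum class Status { UnknownToken = 1, ExpCmd };

struct TokenError {
    Status status;
    int column;
    int length;
};

// 'parse' stands in for the parser call. When the parser reports an Unknown
// Token and the caller supplied a non-null status, that status is thrown
// instead (same column and length); otherwise the error is rethrown as is.
void getToken(const std::function<void()> &parse, Status errorStatus = Status{})
{
    try {
        parse();
    }
    catch (const TokenError &error) {
        if (errorStatus != Status{} && error.status == Status::UnknownToken) {
            throw TokenError{errorStatus, error.column, error.length};
        }
        throw;
    }
}
```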

[branch misc-cpp-stl commit 950d3cf9ed]

Translator Exceptions – Expressions

The get expression function was modified to throw errors. The return type was removed since a no exception return now indicates success. The temporary error tokens for errors no longer need to be created. The Done status that was previously returned for success was no longer used, so this status enumerator was removed.

All callers of the get expression function were modified for an error being thrown, which for the most part meant catching errors instead of looking for an error return status. However, there was an issue with how errors were processed when an unexpected unary operator token appeared.

When an unexpected unary operator appeared, get expression returned an "expected binary operator or end-of-statement" error along with the unary operator token. When another error was more appropriate, the caller essentially ignored this error if the token was a unary operator and reported the appropriate error instead. Now that get expression throws an error, no token is returned, so callers cannot look for a unary operator token. The callers were instead modified to look for this error and then throw the appropriate error.
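A minimal sketch of a caller substituting its own error, assuming hypothetical Status enumerators (ExpOpOrEnd for the "expected binary operator or end-of-statement" error, ExpOpOrComma for a caller-specific error) and a callable standing in for get expression:

```cpp
#include <cassert>
#include <functional>

enum class Status { UnknownToken = 1, ExpOpOrEnd, ExpOpOrComma };

struct TokenError {
    Status status;
    int column;
    int length;
};

// Hypothetical caller (e.g. an argument list where a comma may follow):
// the generic error thrown by get expression is replaced with the error
// appropriate to this context; any other error passes through unchanged.
void getArgument(const std::function<void()> &getExpression)
{
    try {
        getExpression();
    }
    catch (const TokenError &error) {
        if (error.status == Status::ExpOpOrEnd) {
            throw TokenError{Status::ExpOpOrComma, error.column, error.length};
        }
        throw;
    }
}
```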

[branch misc-cpp-stl commit 697a0d25d8]

Wednesday, November 26, 2014

Translator Exceptions – Tokens

The get token function used to get the next token from the parser was modified to throw errors. The return type was removed since a no exception return now indicates success. The temporary error token for an error no longer needs to be created. The Good status that was previously returned for success was no longer used, so this status enumerator was removed.

All the callers of the get token function were modified for errors being thrown, which mostly involved catching the error and throwing the appropriate error in place of an Unknown Token error. The get operand function just passes the error on to its caller (in this case, in addition to an Unknown Token error, a number constant error could be returned for non-reference token requests).

The get expression function was modified to catch the error from the get token function, obtain the error status, and create an error token. The error token won't be necessary once this function is modified to throw errors (it is now the only function not yet modified to throw errors).

In the LET translate function, when the token after a reference token is not a comma or equal character, the token on top of the done stack was checked to see if it has the sub-string flag set, and if it does, this item (a sub-string assignment function) was popped. This is not necessary since once an error is thrown out of the translator, the translator instance is deleted along with the done stack.

[branch misc-cpp-stl commit 217de6d02c]

Tuesday, November 25, 2014

Translator Exceptions – Operators

The process operator function called by the get expression function was modified to throw errors. This function returned a Done status for a token that is not an operator, a Good status for a successfully processed unary or binary operator, or an error status. Since errors are now thrown, the return type was changed to a boolean: true for success and false for a token that is not an operator.
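A sketch of the new process operator contract, assuming simplified hypothetical Token, Status, and TokenError types; the numeric operand check is only a placeholder for the real operand checks:

```cpp
#include <cassert>
#include <string>

enum class Status { ExpNumExpr = 1 };

struct TokenError {
    Status status;
    int column;
    int length;
};

struct Token {
    std::string text;
    int column;
    bool isOperator;
};

// Returns true when the token was processed as an operator, false when it
// is not an operator (so the caller ends the expression), and throws a
// TokenError for a real error. The operand check is a placeholder.
bool processOperator(const Token &token, bool operandIsNumeric)
{
    if (!token.isOperator) {
        return false;
    }
    if (!operandIsNumeric) {
        throw TokenError{Status::ExpNumExpr, token.column,
                         static_cast<int>(token.text.size())};
    }
    // ... push the operator onto the hold stack (omitted) ...
    return true;
}
```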

The process done stack top function is used to pop an item from the done stack and either add a conversion code to the output if needed or report an error if the item can't be converted. This function, used by the process final operand, INPUT translate, and LET translate functions, was modified to throw the error, and its return type was removed. These callers were modified to just call the function, and any error thrown will be passed to their caller.

Similarly, the process operator function was changed to just call the process final operand function without catching its error, so any error thrown is passed up to its caller, the get expression function.

The process first operand function was called to process the first operand of a binary operator and push the unary or binary operator to the hold stack. At first it was also modified to throw errors (again from the process done stack top function), but this function ended up being only a few lines, and since it is only called by the process operator function, its code was moved into the process operator function, eliminating the separate function.

The get expression function was temporarily modified to catch errors from the process operator function: the status is obtained from the thrown error, and an error token is created from the error column and length. The error status and the token are returned as before. This won't be necessary once get expression is modified to throw errors.
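The temporary bridge in get expression can be sketched as an adapter that converts a thrown error back into the old status plus error token convention. The function name callAndConvert and the types used are hypothetical:

```cpp
#include <cassert>
#include <functional>
#include <memory>

enum class Status { Good = 1, ExpNumExpr };

struct TokenError {
    Status status;
    int column;
    int length;
};

struct Token {
    int column;
    int length;
};
using TokenPtr = std::shared_ptr<Token>;

// Runs a step that may throw; on a TokenError, creates an error token from
// the error's column and length and returns the error's status, otherwise
// returns Good and leaves errorToken untouched.
Status callAndConvert(const std::function<void()> &step, TokenPtr &errorToken)
{
    try {
        step();
    }
    catch (const TokenError &error) {
        errorToken = std::make_shared<Token>(Token{error.column, error.length});
        return error.status;
    }
    return Status::Good;
}
```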

[branch misc-cpp-stl commit cf45a8e393]