After refactoring the token convert function and changing its caller, the process done stack top function, I noticed that the latter could be better implemented, specifically in how the first and last operands of the token were being returned. If callers wanted these tokens returned, they passed pointers to where to put the tokens; otherwise null pointers were passed (also the default). Using arguments as outputs is not the best design. Changing this caused a cascade of additional refactoring.
The process done stack top function was changed to return the first and last operands as a standard pair of token pointers (now referred to appropriately as the first and second operands, the members of the pair). If the caller doesn't need them, then it ignores the return value. An Operands alias was added for this pair. The local first and last variables were replaced with an operands pair, which is returned at the end of the function. The comments to this function were removed as they were now outdated and restated what the function did.
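The change from output arguments to a returned pair can be sketched roughly as follows. This is a minimal illustration, not the project's code: the Token struct, the TokenPtr alias, the function signatures and the firstOperandColumn caller are all assumptions made for the example, and the operands are passed in directly rather than taken from a done stack.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the project's token class.
struct Token {
    int column;
};

using TokenPtr = std::shared_ptr<Token>;

// Alias for the pair holding the first and last operands of a token.
using Operands = std::pair<TokenPtr, TokenPtr>;

// The operands are the return value; a caller that does not need them
// simply ignores it instead of passing null output pointers.
Operands processDoneStackTop(const TokenPtr &first, const TokenPtr &last)
{
    assert(first != nullptr && last != nullptr);
    Operands operands{first, last};
    return operands;
}

// A caller that only uses the first operand of the returned pair.
int firstOperandColumn(const TokenPtr &first, const TokenPtr &last)
{
    Operands operands = processDoneStackTop(first, last);
    return operands.first->column;
}
```

With a pair as the return value, the function signature no longer needs defaulted pointer arguments, and the members are reached as operands.first and operands.second.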
The process final operand function was changed to use a first/second operands pair. Its operand index argument was used to determine whether an operator token was unary, but the token itself can be used to determine this. The argument was also passed to the process done stack top function, but it was always the last operand of the token, which can be obtained from the token, so the argument was removed. The generic token2 argument was renamed more appropriately to first, since it is the first operand of the token, and was made an r-value reference to force the caller to move its token into this function. The comments on this function were also removed.
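The effect of the r-value reference argument can be shown with a small sketch. The Token struct and the function body here are hypothetical; only the idea of taking the first operand by r-value reference comes from the text above.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the project's token class.
struct Token {
    int code;
};

using TokenPtr = std::shared_ptr<Token>;

// An lvalue TokenPtr will not bind to this parameter, so the caller
// has to write std::move(token) and give up its own pointer.
TokenPtr processFinalOperand(TokenPtr &&first)
{
    TokenPtr taken = std::move(first);  // the caller's pointer is now empty
    assert(taken != nullptr);
    return taken;
}
```

A call such as processFinalOperand(token) with a named token fails to compile, which is what makes the transfer of ownership explicit at the call site.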
The process operator function, another caller of the process done stack top function, was also modified to use an operands pair (only the first operand returned is used). The final caller, the LET translate function, does not use the returned operands and didn't need to be modified.
The first and last members of the internal Done Item structure (the values of the done stack member) were replaced with an operands pair. The constructors were updated to initialize the first and second members of the operands pair and a new constructor was added with an operands pair argument. The replace first last function (a poor name) was renamed to the more appropriate replace operands.
[branch table commit 240f967d4a]
Showing posts with label Translator. Show all posts
Tuesday, January 13, 2015
Tuesday, January 6, 2015
Token – Data Type Conversion
Hidden conversion codes are inserted into the program when a numeric operand needs to be converted to either a double or an integer. However, for numeric constants, no conversion is needed since both the integer and double representations of the constant are available (except for large values that cannot be converted).
The exception mentioned in the last post, where the data type of a token is set other than at token creation, was when a constant is changed or cannot be used because an integer is required but the double is too large. In that case the data type was temporarily changed to the default, which was then checked for by the callers of the function that converts a constant and returns a hidden conversion code for an operand.
The convert constant function was modified to throw an "Expected Valid Integer Constant" status error when an integer is required but the token contains an unconvertible large double constant. Previously it was up to the caller to check whether the expected data type was an integer and the double constant had not been converted. Also, instead of setting the data type before calling the table set token code function, the set token code function with the data type argument is used.
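A minimal sketch of this behavior, assuming a simplified constant token and a Status enumeration with a hypothetical enumerator name (the real function also sets the token code through the table, which is omitted here):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

enum class Status { ExpValidIntegerConstant };  // hypothetical enumerator name
enum class DataType { Double, Integer };

// Hypothetical simplified constant token.
struct ConstantToken {
    double value;
    DataType dataType;
};

// True if the double value fits into a 32-bit signed integer.
bool fitsInInteger(double value)
{
    return value >= std::numeric_limits<std::int32_t>::min()
        && value <= std::numeric_limits<std::int32_t>::max();
}

// Throws only the status when an integer is required but the constant
// is too large; the caller no longer checks for this case itself.
void convertConstant(ConstantToken &token, DataType desired)
{
    if (desired == DataType::Integer && !fitsInInteger(token.value)) {
        throw Status::ExpValidIntegerConstant;
    }
    token.dataType = desired;
}
```

The token is left unchanged when the exception is thrown, so there is no temporary data type for callers to test.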
The convert code function was modified to throw "Expected Type Expression" errors when the token cannot be converted to the desired data type. Only the error Status is thrown and not a Token Error, which allows the caller to construct the Token Error. Previously, this function returned the Invalid Code enumerator, which callers checked for.
The table find code function was modified for this change. "Expected Valid Integer Constant" errors thrown from the convert constant function are caught and ignored because the operator or function may have an alternate that takes a double argument (where a large double constant would be acceptable). If the operator or function only accepts an integer, then an exception will be thrown, as it is for other tokens that cannot be converted.
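The catch-and-ignore logic might look something like the sketch below. It is an assumption-laden simplification: alternates are reduced to a list of accepted operand data types, the operand is a bare double constant, and the function returns the index of the accepted alternate rather than a code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Status { ExpValidIntegerConstant };  // hypothetical enumerator name
enum class DataType { Double, Integer };

// Throws when an integer is required but the constant is too large.
void convertConstant(double value, DataType desired)
{
    if (desired == DataType::Integer
            && (value < -2147483648.0 || value > 2147483647.0)) {
        throw Status::ExpValidIntegerConstant;
    }
}

// Returns the index of the first alternate whose operand data type
// accepts the constant.
std::size_t findCode(const std::vector<DataType> &alternates, double constant)
{
    for (std::size_t i = 0; i < alternates.size(); ++i) {
        try {
            convertConstant(constant, alternates[i]);
            return i;
        } catch (Status) {
            // Ignored: a later alternate may take a double argument.
        }
    }
    // No alternate accepted the constant, so the error goes to the caller.
    throw Status::ExpValidIntegerConstant;
}
```

Only when every alternate has been tried does the error reach the caller, which matches the case of an operator or function that accepts integers only.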
The translator get expression function was modified to catch error statuses thrown from the convert code function and construct a Token Error to throw (by calling the done stack top token error function). Previously this routine had to check for an unconvertible double constant. The process done stack top function was similarly modified to catch errors from the table find code function (it also previously checked for an unconvertible constant).
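The split between throwing a bare status and building the full error can be illustrated as follows. The Token and TokenError layouts, the enumerator name and the numeric flag are assumptions for the sketch; the real code builds the error through the done stack top token error function.

```cpp
#include <cassert>

enum class Status { ExpNumericExpression };  // hypothetical enumerator name

// Hypothetical error type carrying the status plus a source position.
struct TokenError {
    Status status;
    int column;
    int length;
};

// Hypothetical simplified token.
struct Token {
    int column;
    int length;
    bool numeric;
};

// Throws only the status; it knows nothing about error positions.
void convertCode(const Token &token)
{
    if (!token.numeric) {
        throw Status::ExpNumericExpression;
    }
}

// Catches the status and rethrows it as a TokenError with the position.
void getExpression(const Token &token)
{
    try {
        convertCode(token);
    } catch (Status status) {
        throw TokenError{status, token.column, token.length};
    }
}
```

Keeping the position out of the low-level function means it can be reused by callers that report the error against a different token.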
The translator get operand function also called the convert code function to check the data type of references. This was not appropriate since no conversion code was needed. The modified convert code function was not throwing appropriate errors for references anyway. Therefore, a new token access function, is data type compatible, was added to check the data type only.
The set token code function without a data type argument was an adapter that called the set token code function with the data type contained in the token. Besides the convert constant function, the only other user of this function was the token constructor for string constants, which was modified to use the function with the data type argument directly. The initializer for the code member was also removed since it gets initialized by the set token code function. The set token code adapter function was removed.
[branch table commit c2f2420a89]
Saturday, January 3, 2015
Parser – Parentheses Token Handling
With the forthcoming change from the token type and code enumerations to the code type enumeration, it will be advantageous if all similar codes have the same code type. Table flags will be used for command, operator and function codes since some will need their own code type (for example the LET command and equal operator codes).
The codes being discussed are the codes with operands: constants, variables, arrays, defined functions and user functions. Each has separate codes for each of the three data types, and each (except constants) has an additional three reference codes, one for each data type. Currently only constants and variables are fully implemented. For variables, there will be a single Variable code type for all six of its codes. The translator will only need to check this single code type for a variable code.
There was a distinction between token types with parentheses and without (internal functions, defined functions, and generic tokens). For defined and internal functions, this distinction was removed (the token type enumerators were combined for each). For internal functions, instead of checking the token type to determine if there are parentheses, the number of operands is now checked. Eventually defined functions (and later user functions) will have a similar check as the number of operands will be stored in their associated dictionaries.
The parser get word helper function was modified to only check if an open parenthesis is present and add it to the identifier string as before, but not take it from the input stream. The parenthesis is still needed for functions when searching the table. The get identifier function already removed the parenthesis if a table entry was not found, but was modified to remove the parenthesis from the input for internal functions only. A new get parentheses access function was added to check if the next character in the stream is an open parenthesis, remove it if so, and return whether it was a parenthesis.
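The new access function amounts to a peek followed by a conditional read. A minimal sketch, assuming the parser reads from a standard input stream (the free function and its signature are illustrative, not the project's member function):

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Removes an open parenthesis from the stream if it is the next
// character and reports whether one was found.
bool getParentheses(std::istream &input)
{
    if (input.peek() == '(') {
        input.get();  // consume the parenthesis
        return true;
    }
    return false;  // the stream is left untouched
}
```

For example, with a std::istringstream holding "(1)" the call returns true and leaves the stream at the 1, while with "+1" it returns false and the + is still the next character.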
The translator get operand function was modified accordingly. For the single internal function token type, the process internal function routine is only called if the function has operands. For the single defined function token type, the process parentheses token function is only called if the next character in the parser is a parenthesis (it calls the new get parentheses function to remove the parenthesis). For the generic parentheses token type (array or user function), the get parentheses function is called to remove the parenthesis.
The two separate table entries for defined functions with and without parentheses remain for now with their associated code enumerators and are used to determine if parentheses are present. Once defined functions are fully implemented, these two codes will be replaced with six codes (as described above) and the defined function dictionary will contain whether there should be parentheses (if there are operands).
The test token stream inserter and print token functions were modified for the change in token type enumerators. The token has parentheses access function was only used by the print token function, which used a static map member. The print token function was modified to not require this access function, so it and the static map member were removed. There was also a static map member for precedence. Since all tokens now have a code assigned, all precedences can be obtained from the table, so this static member and its access function were also removed. The expected parser results for tests #2, #4 and #5 were updated for the change in the token types.
[branch table commit 3dd768f604]
Friday, January 2, 2015
Parser – Token Creation
A side effect of the last change was that tokens for both kinds of codes, those for operators, functions and commands and those with operands (constants, variables, arrays, defined functions and user functions), were using the same token constructor. This token constructor searched through alternate codes for the code with the appropriate return data type.
This was unnecessary for operator, function and command codes. The table new token function called for these codes passed in the return data type from the table entry of the code. The token constructor then called the new table set token code function. Since the data type matched the return data type (which was just passed in), no alternates were checked and the code, type and data type of the token were set. This was unnecessary extra work.
A new token constructor was added for operator, function and command codes, which only requires arguments for the code, column, length and string. The string argument is only used for the REM and REM operator codes. This constructor replaces the table new token function. It calls the table set code function, which just sets the code, type and data type of the token from the table entry of the code. For consistency, the code argument was put first in the other token constructor for codes with operands.
While looking at the creation of tokens, I decided that using the standard unique pointer within the parser was unnecessary. The parser can just allocate a token and return its pointer. The translator can then put the allocated tokens into a standard shared pointer. The parser was changed to use plain token pointers. The translator routines were changed to use the new token constructor directly via the standard make shared function. The translator get operand function was changed to use the reset function to set the token member, since a plain pointer cannot be assigned directly to a shared pointer.
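The need for reset can be seen in a short sketch. The Token struct, parseToken function and Translator class here are hypothetical stand-ins; the point is only that std::shared_ptr has no assignment operator taking a raw pointer.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the project's token class.
struct Token {
    std::string text;
};

// Parser side: allocates the token and hands ownership to the caller.
Token *parseToken(const std::string &text)
{
    return new Token{text};
}

// Translator side: keeps the current token in a shared pointer.
struct Translator {
    std::shared_ptr<Token> token;

    void getOperand(const std::string &text)
    {
        // "token = parseToken(text);" would not compile because the
        // constructor taking a raw pointer is explicit; reset() takes
        // ownership of the new token and releases the previous one.
        token.reset(parseToken(text));
    }
};
```

Tokens that the translator creates itself can still go through std::make_shared, which avoids the raw pointer entirely.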
[branch table commit 1bfb76ae0a]
Parser – Codes With Operands
The last tokens not being fully set in the parser were those for codes with operands (constants, variables, arrays, defined functions and user functions). Constant tokens were corrected with the last change. Arrays, defined functions and user functions are not fully implemented and so did not need to be changed. Variables, however, were only partially set in the parser (only to the base Variable or Variable Reference code) and weren't set for the data type of the variable until the translator.
The parser get identifier function was modified to set the data type to Double if the word obtained from the input does not have a type. This applies to all identifiers not found in the table. The token constructor for codes is used for commands, operators, functions and codes with operands. The type argument was unnecessary since that is set from the table entry. However, an issue was found with how codes were found in the table.
For operators and functions, the [return] data type of the token is set from the table entry. (This issue doesn't affect commands since commands don't have a return data type.) For codes with operands, the data type of the identifier is used to find the appropriate table entry (for example, Variable, Variable Integer, or Variable String) by looking at the data types of alternate codes. The current table set token code function did not work correctly because it searches alternate codes by operand data type. For this instance, the alternate codes need to be searched by return data type.
A new set token code function was added without an operand index argument to search by return data type. If the data type (of the identifier) does not match the code passed, then the alternate codes are searched for a matching return data type. If there are no alternates or none were found, then the code passed is set in the token along with the token type of the code. The data type is set to the data type of the identifier and not from the table entry (which may not match for codes like arrays that are not fully implemented yet).
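The search by return data type could be sketched along these lines. The table layout, the Code enumerator names and the Token struct are assumptions made for the example; the real table entries hold much more, and the token type is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class DataType { Double, Integer, String };
enum class Code { Variable, VariableInteger, VariableString };  // hypothetical names

// Hypothetical simplified table entry.
struct Entry {
    DataType returnDataType;
    std::vector<Code> alternates;
};

// Hypothetical table indexed by code.
const Entry &entry(Code code)
{
    static const std::vector<Entry> table{
        {DataType::Double, {Code::VariableInteger, Code::VariableString}},
        {DataType::Integer, {}},
        {DataType::String, {}},
    };
    return table[static_cast<std::size_t>(code)];
}

struct Token {
    Code code;
    DataType dataType;
};

// Sets the code whose return data type matches the identifier's data
// type, falling back to the code passed if no alternate matches.
void setTokenCode(Token &token, Code code, DataType dataType)
{
    token.code = code;
    if (entry(code).returnDataType != dataType) {
        for (Code alternate : entry(code).alternates) {
            if (entry(alternate).returnDataType == dataType) {
                token.code = alternate;
                break;
            }
        }
    }
    // The data type comes from the identifier, not from the table entry.
    token.dataType = dataType;
}
```

Passing the base Variable code with an Integer data type selects the Variable Integer alternate, while a code with no matching alternate is kept as passed.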
The type argument was removed from the token constructor for codes. Callers had been passing the type from the table entry of the code, which the new set token code function now sets itself. A call to the new set token code function was added to the body of the constructor (previously empty).
Since codes for constants, variables, and variable references were found in the table incorrectly by operand data type, these table entries contained operand data types so that the search would work. These codes do not have operands in the sense that operators and functions do within expressions (not to be confused with the fact that, in the program, these codes do have an operand index). These table entries were corrected with expression info instances containing no operands.
The translator get operand function previously set the default data type of the token just obtained (set to Double if None and not a function). This was removed since the parser now does this. The token set default data type function called to do this was removed. The call to set the code for a no parentheses (variable) token was also no longer needed. With the parser now setting the default data type to Double, the expected results for the parser tests (#2, #3 and #5) needed to be updated.
[branch table commit acc37f0650]
Wednesday, December 31, 2014
Constant Token Codes
The code and token type enumerations will be combined into a single code type enumeration. Before proceeding, the parser needs to return all tokens assigned to a code. This has been mostly accomplished, though one exception is constant tokens, which were still being assigned codes in the translator. This is complicated because the type of numerical constants may not be known when the constant is parsed. Consider these statements:
A = B + 5
A% = B% + 5
A% = B% + 5.4
For numerical constants, both the integer and double representations of the constant is stored in the constant dictionary except for the case where a double constant does not fit into a 32-bit signed integer. Optimally the representation required is used without a hidden conversion code to unnecessarily convert the constant. For the first statement above, the double value of the constant is used. With the other two statements, the integer value of the constant is used. Number constant tokens have three states:
- An integer (no decimal point or exponent; fits into 32 bits)
- A small double (has a decimal point or an exponent; fits into 32 bits when converted)
- A large double (does not fit into 32 bits; cannot be converted)
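The three states above can be expressed as a small classification function. This is only a sketch working on the constant's text: the enumeration, the function name and the use of strtod are assumptions for illustration, not how the parser actually scans numbers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <string>

enum class ConstantState { Integer, SmallDouble, LargeDouble };

// Classifies a numeric constant by its text and its converted value.
ConstantState classifyConstant(const std::string &text)
{
    const double value = std::strtod(text.c_str(), nullptr);
    const bool fits = value >= std::numeric_limits<std::int32_t>::min()
        && value <= std::numeric_limits<std::int32_t>::max();
    if (!fits) {
        return ConstantState::LargeDouble;  // cannot be converted
    }
    // A decimal point or exponent makes it a double that is convertible.
    const bool hasDecimalOrExponent
        = text.find_first_of(".eE") != std::string::npos;
    return hasDecimalOrExponent ? ConstantState::SmallDouble
                                : ConstantState::Integer;
}
```

Under these assumptions, 5 is an integer, 5.4 and 1e3 are small doubles, and 3000000000 is a large double even though it has no decimal point.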
Instead of passing a flag for whether a number is allowed to the parser, the requested data type is now passed. If the data type is integer or double, the code of a constant token is fixed (the Integer Constant sub-code is not needed and is cleared if set). For other requested data types, either the Constant or Constant Integer code is set as described above, with the Integer Constant sub-code set for small doubles. The parser also now sets the Constant String code for string constants. The parser makes no attempt to report any errors for data type mismatches.
The decimal flag argument of the token constructor for double constants was removed, as the data type is set to Double and the Integer Constant sub-code is set if the value is within the integer range. The convert code function was cleaned up by making the desired data type the primary switch; there was no need for secondary switches on the token data type since only one of two data types needs to be checked for each desired data type. A convert constant helper function was added to handle changing constant token codes.
The table find code function was simplified due to the change in how constants are represented. The first argument of the set token and set token code functions was changed from a standard shared token pointer reference to a plain token pointer so that they can be called from the parser (with a standard unique pointer), translator (with a standard shared pointer) or token member function (with just a pointer). This simply required calling the get access function of the unique or shared pointers.
The translator get operand function no longer sets the code for constants. The get expression and process internal function functions no longer need to look for and set the codes for constants (the latter needs to clear the Integer Constant sub-code for functions taking both number argument types, specifically ABS, SGN and STR$). The get token function now only needs to pass the data type to the parser. The token convert and table find code functions are used by the translator and will finalize constants not set by the parser once the final data type is known.
[branch table commit 3099f8850f]
Thursday, December 25, 2014
Parser – Identifier Codes
The parser previously set the code for an identifier token only when the word was found in the table (command, operator or function). The codes for other identifiers were set in the translator: defined functions with no parentheses and variables (get operand); arrays, functions, and defined functions with parentheses (process parentheses tokens). This was changed to set all codes in the parser.
To do this in the parser, the parser needed to know if a reference operand was being requested. For now identifiers with no parentheses are set to variables, and with parentheses are set to arrays unless they start with an F (temporary check for testing). Defined functions are identifiers that start with an FN. Eventually the parser will need access to the program dictionaries to fully determine which code to assign to an identifier token.
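The temporary rules described above can be summarized in a small function. The Kind enumeration and function names are hypothetical, and matching the F and FN prefixes without regard to case is an assumption of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

enum class Kind { Variable, Array, Function, DefinedFunction };

// True if the name starts with the given upper-case prefix, ignoring case.
bool startsWith(const std::string &name, const std::string &prefix)
{
    if (name.size() < prefix.size()) {
        return false;
    }
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        if (std::toupper(static_cast<unsigned char>(name[i])) != prefix[i]) {
            return false;
        }
    }
    return true;
}

// Chooses the kind of code for an identifier not found in the table.
Kind classifyIdentifier(const std::string &name, bool hasParentheses)
{
    assert(!name.empty());
    if (startsWith(name, "FN")) {
        return Kind::DefinedFunction;
    }
    if (!hasParentheses) {
        return Kind::Variable;
    }
    if (startsWith(name, "F")) {
        return Kind::Function;  // temporary check for testing
    }
    return Kind::Array;
}
```

Once the parser has access to the program dictionaries, a lookup would replace the prefix checks in a function like this.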
The get identifier function was modified to set the code as described above for identifiers not found in the table. A reference argument was added, which was also added to the parser function operator. (The Reference enumeration was moved from the translator class to the main header file so that its enumerators are accessible.) The token constructors for codes and identifiers were combined into a single constructor with default arguments for the string and reference members.
For variables, the reference argument is used to determine if the code is a variable or a variable reference. Only the base code is set since the translator still changes the code for the data type. In the case of a variable reference, the reference member of the token is not set (the translator did not previously set it either).
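The temporary rules described above can be sketched as a small function. The enumerator names, the `hasParen` flag and the `identifierCode` function are assumptions made for illustration, and the sketch assumes the identifier has already been converted to upper case.

```cpp
#include <string>

// Hypothetical subsets of the real enumerations.
enum class Reference { None, Variable };
enum class Code { Variable, VariableRef, Array, Function, DefFunc, DefFuncParen };

// Picks a code for an identifier that was not found in the table.
inline Code identifierCode(const std::string &name, bool hasParen,
    Reference reference)
{
    // Defined functions are identifiers that start with FN.
    if (name.compare(0, 2, "FN") == 0) {
        return hasParen ? Code::DefFuncParen : Code::DefFunc;
    }
    if (hasParen) {
        // Temporary test rule: a leading F means function, otherwise array.
        return name.compare(0, 1, "F") == 0 ? Code::Function : Code::Array;
    }
    // No parentheses: a variable, or a variable reference when requested.
    return reference == Reference::None ? Code::Variable : Code::VariableRef;
}
```

Once the parser has access to the program dictionaries, a lookup would replace the name-prefix checks.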
Several token type cases in the translator get operand function were modified. For defined functions with no parentheses, the token reference and code members no longer need to be set. For no parentheses tokens (variables), the code is still updated for the data type. The parser will do this once the new table model is implemented. For parentheses tokens (arrays), the token reference member no longer needs to be set.
The translator process parentheses token function no longer does the check for functions (temporarily identifiers starting with F), or sets the code of the token. For determining an array (to set the expected expression types to integer for the subscripts), the Array code is checked for. This check will need to be modified when arrays are implemented since there will be different array codes for each data type, which will be set by the parser.
[branch table commit 69dff18e26]
Sunday, November 30, 2014
Translator – Better Token Handling
A common pattern in the translator functions was a reference to a token pointer argument through which a token was passed into and out of the function. The token returned was the next token that the function could not process (a terminating token). Several of the functions allowed an unset token pointer, in which case they would get a token; otherwise they would use the token passed in. This was a strange pattern and somewhat difficult to work with.
The translator functions were modified to a better pattern where a member token pointer was added to the translator class to hold the current token. The translator functions were modified to use this current token member. The token is moved out of this member variable when successfully processed (consumed). The token pointer reference argument was removed from the translator functions.
The get token function was modified to put the token obtained from the parser into this new member, and the token argument was removed. If the current token member already has a token, then no action is taken.
The process command function when modified was reduced to only a few lines and since it was only called once, its code was moved to the get commands function. The command token and token arguments were removed from the LET, PRINT and INPUT translate functions, which were modified to get the current token from the translator. Several access functions for the current token were added to the translator including getting a constant reference to the token pointer member, resetting the token pointer member, and moving the token pointer out of the member. The latter two force a new token to be obtained upon the next get token function call.
The get operand function was modified to leave the current token pointer member empty if a valid operand was processed (added to the output list and pushed to the done stack). This forced the next call to the get token function to obtain a new token from the parser.
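The current token pattern can be sketched as follows. This is a simplified illustration under assumptions: the `Translator` class here is a toy, a queue of strings stands in for the parser, and the member and function names are hypothetical.

```cpp
#include <memory>
#include <queue>
#include <string>
#include <utility>

struct Token {
    std::string text;
};
using TokenPtr = std::shared_ptr<Token>;

class Translator {
public:
    explicit Translator(std::queue<std::string> input) :
        m_input{std::move(input)} {}

    // Obtains a token only when no token is currently pending.
    void getToken()
    {
        if (!m_token && !m_input.empty()) {
            m_token = std::make_shared<Token>(Token{m_input.front()});
            m_input.pop();
        }
    }

    // Access functions for the current token.
    const TokenPtr &token() const { return m_token; }
    void resetToken() { m_token.reset(); }
    TokenPtr moveToken() { return std::move(m_token); }

private:
    std::queue<std::string> m_input;  // stands in for the parser
    TokenPtr m_token;                 // current token, empty once consumed
};
```

Calling `getToken()` twice in a row leaves the same token in place; only after `moveToken()` or `resetToken()` does the next call fetch a new one.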
For sub-string assignments, the get operand function set the reference flag of the token if the token was a sub-string function (LEFT$, MID$, or RIGHT$) and a reference was requested. The process internal function function used this flag to identify a sub-string assignment and to request a string variable for the first argument of the function. Upon return, this reference flag was cleared. This reference flag toggling was replaced by passing the reference enumerator as an argument.
There was some code in the get operand function that set the code of the token to the Define Function with Parentheses code enumerator. These statements were moved to the process parentheses token function where the similar statements reside for setting the Array and Function codes.
Some other minor changes were made including changing the RPN list token access function to return a reference to the token pointer instead of the token pointer itself (to prevent the copying of the token pointer and updating the shared pointer use counts), reorganizing some code in the various routines, renaming some local token variables, and updating comments for changes made.
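The effect of returning a reference to the token pointer can be seen in the shared pointer use count. The sketch below uses a hypothetical `RpnItem` structure with both styles of access function side by side; the real class has only one.

```cpp
#include <memory>

struct Token {
    int code = 0;
};
using TokenPtr = std::shared_ptr<Token>;

// Hypothetical stand-in for an RPN list item.
struct RpnItem {
    TokenPtr m_token = std::make_shared<Token>();

    // Returns a copy, which increments the shared use count.
    TokenPtr tokenByValue() const { return m_token; }

    // Returns a reference, so no copy is made and the count is untouched.
    const TokenPtr &token() const { return m_token; }
};

inline long useCountHeldByValue(const RpnItem &item)
{
    TokenPtr copy = item.tokenByValue();
    return copy.use_count();
}

inline long useCountHeldByReference(const RpnItem &item)
{
    const TokenPtr &reference = item.token();
    return reference.use_count();
}
```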
[branch misc-cpp-stl commit 828f210f18]
Friday, November 28, 2014
Translator – Some Minor Improvements
The done stack pop error token function was used to create an error token from the item on top of the done stack. An item on the done stack could consist of an entire expression so in addition to a token, it contained the first and last tokens of the expression, which would be null if the item only contained a single token.
There is no need to create an error token since exceptions are now thrown for errors, which do not contain tokens. All callers to this function only used the error token to create a token error (status with the column and length from the error token). This function was modified to return a token error and was more appropriately renamed to the done stack top token error function. There is also no need to pop the done item from the stack since when an error is thrown, the translator instance goes out of scope and all members are deleted including the done stack.
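One way such a function could build the token error is sketched below. How the column and length are derived from the first and last tokens is an assumption here (the error is taken to span from the start of the first token to the end of the last), and all of the type and function names are hypothetical.

```cpp
#include <memory>
#include <stack>

enum class Status { ExpNumExpr = 1 };  // hypothetical enumerator

struct Token {
    int column;
    int length;
};
using TokenPtr = std::shared_ptr<Token>;

struct TokenError {
    Status status;
    int column;
    int length;
};

// first and last are null when the item holds only a single token.
struct DoneItem {
    TokenPtr token;
    TokenPtr first;
    TokenPtr last;
};

// Builds the error from the top item without popping it.
inline TokenError doneStackTopTokenError(const std::stack<DoneItem> &doneStack,
    Status status)
{
    const DoneItem &top = doneStack.top();
    const Token &begin = top.first ? *top.first : *top.token;
    const Token &end = top.last ? *top.last : *top.token;
    return TokenError{status, begin.column,
        end.column + end.length - begin.column};
}
```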
Several other minor changes were also made including removing the token set through function since it was no longer used, replacing the uses of the Qt forever macros with a plain empty for (;;), renaming the expected error status function (had "Error" abbreviated as "Err"), adding the C++ noexcept keyword to translator functions that do not throw an exception, and updating a few comments missed for changes made earlier.
[branch misc-cpp-stl commit 625195f18d]
Thursday, November 27, 2014
Translator Improvements – Getting Tokens
While modifying the translator routines to throw exceptions, it was noticed that some minor improvements could be made. The first of these is with calls to the get token function. There was a similar pattern to most of the calls to this function, where the call was in a try block and for caught errors, the error was set to an appropriate error.
An error status argument was added to the get token function. The caller passes the error status it wants thrown when the parser returns an Unknown Token. With this change, the caller no longer needs to catch errors from the get token function call. Several of the callers need the parser error to be thrown as is, so if the error status argument is a null status, the parser error is thrown as is. The first status enumerator was assigned a value of one so that a null status is not one of the existing enumerators.
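A minimal sketch of this idea follows, with a toy parser that treats a question mark as an unknown token. The enumerator names, the `TokenError` layout and both function signatures are assumptions for illustration only.

```cpp
#include <cstddef>
#include <string>

// The first enumerator is one, so a value-initialized Status{} (zero)
// can serve as the null status.
enum class Status { UnknownToken = 1, ExpCmd, ExpExpr };

struct TokenError {
    Status status;
    std::size_t column;
    std::size_t length;
};

// Toy parser stand-in: a '?' character is an unknown token.
inline char parseNext(const std::string &input, std::size_t column)
{
    if (input.at(column) == '?') {
        throw TokenError{Status::UnknownToken, column, 1};
    }
    return input[column];
}

// Replaces an Unknown Token error with the caller's status, unless the
// caller passed the null status, in which case the error is thrown as is.
inline char getToken(const std::string &input, std::size_t column,
    Status errorStatus = Status{})
{
    try {
        return parseNext(input, column);
    }
    catch (TokenError &error) {
        if (errorStatus != Status{} && error.status == Status::UnknownToken) {
            error.status = errorStatus;
        }
        throw;  // rethrows the same (possibly modified) error
    }
}
```

With this shape, a caller that wants an "expected expression" error writes a single call with that status and needs no try block of its own.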
[branch misc-cpp-stl commit 950d3cf9ed]
Translator Exceptions – Expressions
The get expression function was modified to throw errors. The return type was removed since a no exception return now indicates success. The temporary error tokens for errors no longer need to be created. The Done status that was previously returned for success was no longer used, so this status enumerator was removed.
All callers of the get expression function were modified for an error being thrown, which for the most part meant catching errors instead of looking for an error return status. However, there was an issue with how errors were processed when an unexpected unary operator token appeared.
When an unexpected unary operator appeared, get expression returned an "expected binary operator or end-of-statement" error along with the unary operator token. When another error was appropriate, the caller essentially ignored this error when the token was a unary operator, and threw the appropriate error. Now that get expression throws an error, there is no token returned, so callers cannot look for a unary operator token. The callers were modified to look for this error and then throw the appropriate error.
[branch misc-cpp-stl commit 697a0d25d8]
Wednesday, November 26, 2014
Translator Exceptions – Tokens
The get token function used to get the next token from the parser was modified to throw errors. The return type was removed since a no exception return now indicates success. The temporary error token for an error no longer needs to be created. The Good status that was previously returned for success was no longer used, so this status enumerator was removed.
All the callers of the get token function were modified for errors being thrown, which mostly included catching the error and throwing the appropriate error for the Unknown Token error. The get operand function just passes the error on to its caller (in this case, in addition to an Unknown Token error, a number constant error could be returned for non-reference token requests).
The get expression function was modified to catch the error from the get token function, obtain the error status and create an error token. The error token won't be necessary once this function is modified to throw errors (it is now the only function remaining not yet modified to throw errors).
In the LET translate function where the token after a reference token is not a comma or equal character, the token on top of the done stack was checked to see if it has the sub-string flag set, and if it does, this item (a sub-string assignment function) is popped. This is not necessary since once an error is thrown out of the translator, the translator instance is deleted along with the done stack.
[branch misc-cpp-stl commit 217de6d02c]
Tuesday, November 25, 2014
Translator Exceptions – Operators
The process operator function called by the get expression function was modified to throw errors. This function returned Done status for a token that is not an operator, Good status for a successfully processed unary or binary operator, or an error status. Since errors are now thrown, the return type was changed to a boolean, true for success and false for not an operator.
The process done stack top function is used to pop an item from the done stack and add a conversion code to the output if needed or report an error if the item can't be converted. This function, used by the process final operand, INPUT translate, and LET translate functions, was modified to throw the error and its return type was removed. These callers were modified to just call the function and any error thrown will be passed to their caller.
Similarly, the process operator function was changed to just call the process final operand function without catching its error, so any error thrown is passed up to its caller, which is the get expression function.
The process first operand function was called to process the first operand of a binary operator, and push the unary or binary operator to the hold stack. At first it was also modified to throw errors (again from the process done stack top function), but this function ended up being only a few lines, and since it is only called by the process operator function, its code was just moved into the process operator function, eliminating the separate function.
The get expression function was temporarily modified to catch errors from the process operator function where the status is obtained from the thrown error, and an error token is created from the error column and length. The error status and the token are returned as before. This won't be necessary once get expression is modified to throw errors.
[branch misc-cpp-stl commit cf45a8e393]
Sunday, November 23, 2014
Translator Exceptions – Parentheses Tokens
The process parentheses token function used to translate tokens with parentheses (arrays and user functions) for the get operand function was modified to throw errors. This function only returned a Good status or an error status, so a return value was no longer needed. Like the process internal function function, the get expression call was changed to throw an error if an error status is returned, which is immediately caught. This simulates the get expression function throwing an error, which has not been modified yet to throw errors.
Unrelated to this function, the comments for the other functions modified were updated to reflect the changes made so far, which was missed when these functions were changed. The new expression error status function added for the process internal function function was moved to the support functions section of the source file.
[branch misc-cpp-stl commit 4f03c8bd77]
Saturday, November 22, 2014
Translator Exceptions – Internal Functions
The process internal function function used to translate internal function tokens for the get operand function was modified to throw errors. Some of the changes made in the failed attempt to add exceptions throughout the translator routines around the get operand call used for sub-string assignments ended up in the code during the last commit. This did not affect the functionality (since the regression tests had passed, it wasn't noticed).
The expression error status private helper function was added to determine the error status depending on whether the translator is at the last operand of the internal function, whether the internal function has multiple arguments, and whether the bad token is a unary operator. This function is called in three locations.
Since all errors are thrown, there was no longer a need for a return value. The unary operator local variable was no longer needed and was removed. The status and expected data type local variable declarations were moved closer to where they are used. The status variable can be removed once the get expression function is modified to throw errors.
[branch misc-cpp-stl commit 640255514d]
Translator Exceptions – Operands
The get operand function used to get an operand was modified to throw errors. The command routines either returned an error or a Done status, so once they were modified to throw errors, there was no need to return anything. In addition to errors, the get operand function returned a Good status representing successfully getting an operand or a Done status representing no operand token (an operator or command).
The return value of the get operand function was changed to a boolean where true represents successfully getting an operand and false represents no operand. When a reference operand is requested, this function will not return a false status; it will just throw an error. So for the process internal function, LET translate, and INPUT translate routines, it is not necessary to test the return value since these only request reference operands.
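The boolean return convention might look something like the following sketch. It is heavily simplified: the token is reduced to a type, column and length, the real function handles many more cases, and the enumerator and function names are hypothetical.

```cpp
enum class Reference { None, Variable };
enum class Type { Operand, Operator, Command };
enum class Status { ExpVar = 1 };  // hypothetical enumerator

struct TokenError {
    Status status;
    int column;
    int length;
};

struct Token {
    Type type;
    int column;
    int length;
};

// Returns true for an operand and false for a non-operand token.
// When a reference is requested, a non-operand is an error instead.
inline bool getOperand(const Token &token, Reference reference)
{
    if (token.type == Type::Operand) {
        return true;
    }
    if (reference != Reference::None) {
        throw TokenError{Status::ExpVar, token.column, token.length};
    }
    return false;
}
```

A caller that only requests reference operands can therefore call the function and ignore the result, since it either succeeds or throws.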
There were three locations in the get operand function where errors were returned that popped the token on top of the hold stack before returning. This was intended to prevent memory leaks. This wasn't necessary since the translator cleans up the stack when the translator goes out of scope, so these pop calls were removed.
One of the errors for define function identifiers with parentheses tokens added the length of the token to the column to point the error to the parentheses of the token. The token class add length to column access function was used for this. Instead of using this access function (the only caller), the token length is added to the column when the token error instance is created and thrown. This access function was removed.
[branch misc-cpp-stl commit 640255514d]
Translator Exceptions – Commands
The routines that process commands were the next modified to throw exceptions. This included the get commands routine used to get command statements separated by colons. For now, if the two calls to the get token function return an error (not good status), the appropriate error is thrown. The get token function will also be modified to throw errors soon. Since get commands now throws errors, it no longer needs to return a status so the return type was removed (along with the local status variable).
The process command routine handles the processing of one command and was modified to throw exceptions. It calls a specific translate function for a command or the LET translate function if the first token is not a command. The LET, PRINT and INPUT translate functions were also modified to throw exceptions. The status return type was also removed and the if statement surrounding the call to process command in get commands was removed (which will cause thrown exceptions to be passed up to its caller, the translator function operator function).
In the translator function operator function, a status is no longer returned from the get commands function, so assignment of the local status variable was removed. And thrown exceptions are passed up to the caller of the function operator function (which will catch the errors). For a successful return, the local status variable is set to Done. This is temporary until the get expression function is also modified to throw exceptions.
The LET, PRINT and INPUT translate functions were restructured a bit. Local variable declarations were moved to where the variables are first used. Some of the error checking if statements were rearranged to ease the handling of errors (when the lower functions are modified to throw exceptions). These changes came from the failed total translator exception changes. The checks for the end-of-statement were also moved to a more logical place in each of these routines.
[branch misc-cpp-stl commit 7217ce58c6]
Friday, November 21, 2014
Translator Exceptions – Top-Level
Several convenience functions were added to the Token Error structure to make it syntactically easier to use. These include a function operator function with no arguments for returning the status of the error, a function operator function with a status argument for checking if the error status is the passed status, and an assignment operator function taking a status value to assign to the error status.
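The three convenience functions could be sketched like this. The member names and the `Status` enumerators are assumptions; only the shape of the operators follows the description above.

```cpp
enum class Status { ExpExpr = 1, ExpBinOpOrEnd };  // hypothetical enumerators

struct TokenError {
    Status m_status;
    int m_column;
    int m_length;

    TokenError(Status status, int column, int length) :
        m_status{status}, m_column{column}, m_length{length} {}

    // error() returns the status of the error.
    Status operator()() const { return m_status; }

    // error(status) checks if the error has the passed status.
    bool operator()(Status status) const { return m_status == status; }

    // error = status assigns a new status, keeping column and length.
    TokenError &operator=(Status status)
    {
        m_status = status;
        return *this;
    }
};
```

In a catch block this allows something like `if (error(Status::ExpBinOpOrEnd)) error = Status::ExpExpr;` before the error is rethrown.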
The translator function operator function was modified to throw errors instead of setting a local status variable, which is then used to throw the error at the end of the function. Eventually the get expressions and get commands functions will throw exceptions for errors and the local status variable won't be necessary.
The try block in the tester parse input function was reformatted: the try-catch blocks were moved outside of the forever loop to the function block. The return statement in the catch block could then be removed.
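One reading of moving the try-catch blocks "to the function block" is a C++ function try block, where the try keyword precedes the function body and the handler follows it. The sketch below assumes that reading; the ParseError type, the parseInput signature and the "?" trigger are all hypothetical and only stand in for the real parser.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical error type standing in for the real thrown error.
struct ParseError
{
    std::string message;
};

// The whole function body is the try block, so the forever loop no longer
// needs its own try-catch, and the handler needs no return statement because
// control reaches the end of the (void) function after the handler runs.
void parseInput(std::istream &in, std::ostream &out)
try
{
    for (;;) {
        std::string word;
        if (!(in >> word)) {
            return;  // normal end of input
        }
        if (word == "?") {
            throw ParseError {"unexpected '?'"};  // hypothetical error case
        }
        out << word << '\n';
    }
}
catch (const ParseError &error)
{
    out << "error: " << error.message << '\n';
}
```

Falling off the end of the handler is only well defined here because the function returns void; a value-returning function would still need a return statement in the handler.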
[branch misc-cpp-stl commit c639347b07]
Translator Exceptions – Development Strategy
It was initially difficult to see what the next incremental change should be for adding exception throws throughout the translator routines. Starting at the bottom by changing the get token function to throw exceptions was problematic because all callers would have to catch the errors and create an error token to hold the column and length of the error. Starting at the top by changing both the get expressions (used for testing) and get commands functions to throw exceptions was problematic because all functions in between would also need to be modified. An attempt to change the entire translator was made.
After the code was modified and corrected for compile errors, the tests were run, but there were many problems, which was not unexpected considering the large number of changes made. Instead of trying to debug, the changes were committed to a temporary work branch (though not pushed to the official GitHub repository). This is a scheme to use git to temporarily save work:
git checkout -b work (create a temporary work branch)
git commit -a (stages all changed files and commits them)
git checkout misc-cpp-stl (restore all files before changes)
This saved the changes of the failed attempt and restored the original files. Changes from the work branch could now be transferred to the working directory piecemeal. For example, several changes were made to the Token Error structure in the token header file (details in the next post).
Sunday, November 16, 2014
Translator – Exceptions (Top-Level)
The goal of the next step is to modify the translator to throw errors instead of returning an RPN list containing an error. This will be done in steps starting at the top-level of the translator in the function operator function with a couple of other minor improvements to the code.
The first minor improvement was made to the Error Item class that holds information about an error in the program (includes type, line number, column, length and error status). The is empty function that checks if the error type is none (no error) was replaced with the explicit operator bool function, which allows an error item to be checked for an error without a named function (this function is called when a boolean is expected and an error item instance is provided):
if (!errorItem.isEmpty()) → if (errorItem)
At the end of the translator function operator function, if the status is not Done, the RPN list was cleared and its error member variables were set (column, length and status). This was changed to throw an Error structure with the status, token column and token length. The RPN list no longer needs to be cleared since it will be cleared when the temporary translator instance goes out of scope.
The tester translate input routine was modified to catch an error in a try block. This routine no longer needs to check if RPN list has an error, and if it did, retrieve the error information to create a local error instance and print the error. Now upon a caught error, it simply prints the error. It also returns an empty RPN list that the caller uses to detect an error.
The program model update line routine was modified to catch an error in a try block. For an error, the local error item is set to an error with the info in the thrown error. If there is no error then the error item will not contain an error (the default constructor sets the error type to none).
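The pieces described so far (a thrown Error structure, an error item whose default constructor means "no error", and the explicit operator bool check) might fit together as in the sketch below. All names and members here are assumptions for illustration: the Status values, the Error and ErrorItem layouts, and the translate stand-in that throws on an empty line are not the project's actual code.

```cpp
#include <string>

enum class Status { Done, ExpectedExpression };  // hypothetical values

// Hypothetical structure thrown by the translator on an error.
struct Error
{
    Status status;
    int column;
    int length;
};

// Hypothetical error item; a default-constructed item holds no error.
class ErrorItem
{
public:
    enum class Type { None, Input };

    ErrorItem() = default;
    ErrorItem(Type type, int lineNumber, const Error &error)
        : m_type {type}, m_lineNumber {lineNumber}, m_column {error.column},
          m_length {error.length}, m_status {error.status} {}

    // Replaces an isEmpty() style check: true when the item holds an error.
    explicit operator bool() const
    {
        return m_type != Type::None;
    }

    int column() const { return m_column; }
    int length() const { return m_length; }

private:
    Type m_type {Type::None};
    int m_lineNumber {-1};
    int m_column {-1};
    int m_length {-1};
    Status m_status {Status::Done};
};

// Stand-in for the translator: throws for an empty line (assumption).
void translate(const std::string &line)
{
    if (line.empty()) {
        throw Error {Status::ExpectedExpression, 0, 1};
    }
}

// Sketch of the update line logic: catch the thrown error and record it.
ErrorItem updateLine(int lineNumber, const std::string &line)
{
    ErrorItem errorItem;  // no error unless one is caught below
    try {
        translate(line);
    }
    catch (const Error &error) {
        errorItem = ErrorItem {ErrorItem::Type::Input, lineNumber, error};
    }
    return errorItem;
}
```

Because the conversion is explicit, an error item can be tested in an if statement or with the ! operator, but it will not silently convert to an integer in arithmetic or comparisons.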
To detect a change, the update line routine compared the new RPN list to the current RPN list decoded from the program. The lists were not equal if either contained an error. The current RPN list is empty for a line with an error, but the decode routine does not set the error variables. Now that the RPN list does not have an error status, the error item also needs to be checked when checking if the line changed. The RPN error variables are no longer used and were removed along with their access functions. The check for errors in the RPN list equality operator function was also removed.
The final improvement made was to change the RPN list argument of the encode routine from a constant reference to an rvalue reference. None of the callers of the encode routine need the RPN list after the call, so it can be moved. Since the arguments of these calls were not temporary instances, the local RPN list variable needs to be moved using std::move.
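A minimal sketch of the rvalue reference argument follows. The RpnList alias, the Program structure and its lines member are hypothetical stand-ins; the real encode routine does considerably more than store the list.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the RPN list class.
using RpnList = std::vector<std::string>;

// Hypothetical stand-in for the program model.
struct Program
{
    std::vector<RpnList> lines;

    // An rvalue reference argument only binds to temporaries or to
    // variables the caller has explicitly passed through std::move.
    void encode(RpnList &&rpnList)
    {
        lines.push_back(std::move(rpnList));  // take over the contents
    }
};
```

A caller holding a local variable would then write program.encode(std::move(rpnList)); passing rpnList without std::move does not compile, which makes it visible at the call site that the list should not be used afterwards.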
[branch misc-cpp-stl commit 52359711e7]