Thursday, December 25, 2014

Double Identifier Problem

A pre-existing problem was discovered when the parser was modified to no longer store the data type character of identifiers. The issue involved double identifiers, for which the # data type character is optional. The identifiers Variable and Variable# were incorrectly added to the dictionary as separate entries when they should have shared a single entry.

The parser get identifier function was modified to not store the data type character in the token. This caused a problem when recreating double identifiers: a # character that had been entered would disappear. Recreating all double identifiers with a # character was also not desirable. This was corrected by adding the Double sub-code to the token. A sub-code argument was added to the token constructor for identifiers. This sub-code is encoded into the program code so that the # character is recreated only if it was entered.
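The idea can be illustrated with a minimal sketch. It is not the project's code: the Token, Dictionary and getIdentifier names, the enumerator values, and the dictionary key of name plus data type are all assumptions made for the example. It only shows how stripping the character and recording a sub-code lets Variable and Variable# share one entry.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for the project's types (names and values assumed).
enum SubCode { None_SubCode = 0, Double_SubCode = 1 };
enum class DataType { Double, Integer, String };

struct Token {
    std::string name;    // identifier WITHOUT its data type character
    DataType dataType;
    SubCode subCode;     // remembers whether an optional '#' was entered
};

// Strip a trailing data type character; an entered '#' becomes a sub-code.
Token getIdentifier(const std::string &input)
{
    assert(!input.empty());
    Token token{input, DataType::Double, None_SubCode};
    switch (input.back()) {
    case '#': token.subCode = Double_SubCode; token.name.pop_back(); break;
    case '%': token.dataType = DataType::Integer; token.name.pop_back(); break;
    case '$': token.dataType = DataType::String; token.name.pop_back(); break;
    default: break;      // no character entered: plain double identifier
    }
    return token;
}

// Toy dictionary keyed on name and data type (an assumption of this sketch).
class Dictionary {
public:
    // Returns the index of the entry, adding it if not already present.
    int add(const Token &token)
    {
        auto key = std::make_pair(token.name, token.dataType);
        auto it = m_index.find(key);
        if (it != m_index.end())
            return it->second;
        int index = static_cast<int>(m_index.size());
        m_index.emplace(key, index);
        return index;
    }
    std::size_t size() const { return m_index.size(); }

private:
    std::map<std::pair<std::string, DataType>, int> m_index;
};
```

Because the sub-code is not part of the key in this sketch, it only affects how the identifier is recreated, not which entry it maps to.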

The Double sub-code was only being used for constants.  When the value of a constant is within the integer range, its data type is set to integer, and if a decimal point is present, the Double sub-code is set.  The translator uses this sub-code to determine if a constant can be used as a double even though the data type is integer (see post from October 28 for details).  This sub-code does not survive past the translator (not put into the program code).

A new string with data type access function was added to the token. It appends the data type character (# for doubles, % for integers, and $ for strings) to the token string returned. A # character is only added if the Double sub-code is set. This function replaced the string access function in the test token stream insert operator, tester print token, and several recreate functions.

A sub-code argument was added to the table entry operand text functions.  The variable operand text functions were modified to add the data type character to the variable name.  For double variables, the character is only added if the Double sub-code is set.  An Ignore sub-code enumerator was added, and when passed to the operand text function, no data type character is added to the variable name.  This option was needed for the program model decode function that uses the operand text function to set the string of the token (since tokens no longer store the data type character).
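The rules for adding the data type character might look like the sketch below. The function name variableOperandText, its signature, and the enumerator values are assumptions for illustration; the actual table entry functions are not shown in this post.

```cpp
#include <cassert>
#include <string>

// Hypothetical enumerations (names and values assumed for this sketch).
enum SubCode { None_SubCode = 0, Double_SubCode = 1, Ignore_SubCode = 2 };
enum class DataType { Double, Integer, String };

// Build the text of a variable operand from its name, data type and sub-code.
std::string variableOperandText(const std::string &name, DataType dataType,
                                SubCode subCode)
{
    assert(!name.empty());
    if (subCode == Ignore_SubCode)
        return name;                 // caller wants the bare name only
    switch (dataType) {
    case DataType::Double:
        // '#' is optional, so it is added only if it was originally entered
        return subCode == Double_SubCode ? name + '#' : name;
    case DataType::Integer:
        return name + '%';
    case DataType::String:
        return name + '$';
    }
    return name;                     // not reached for valid data types
}
```

With the Ignore sub-code, a decode step can use the same function to get the bare name for the token string, while recreation passes the stored sub-code to get the text as entered.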

The value of the Double sub-code was changed so that its bit value is within the range of the sub-code bits (this was not necessary before since this sub-code was not used in the program code). The return type of the program code instruction sub-code access function was changed to the Sub-Code enumeration type (from an integer). The expected encoder test results were updated, specifically the dictionaries output, since the data type characters are no longer present in the entries.
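To show why the bit value matters, here is a small sketch of packing a code and sub-code bits into one program word. The 16-bit word, the masks, and the value 0x0400 are assumptions chosen for the example and are not the project's actual layout.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (illustration only): upper 6 bits hold sub-codes,
// lower 10 bits hold the instruction code.
constexpr std::uint16_t SubCodeMask = 0xFC00;
constexpr std::uint16_t CodeMask = 0x03FF;

enum SubCode : std::uint16_t {
    None_SubCode = 0x0000,
    Double_SubCode = 0x0400   // inside SubCodeMask, so it survives encoding
};

static_assert((Double_SubCode & ~SubCodeMask) == 0,
              "sub-code must lie within the sub-code bits");

// Pack a code and its sub-code bits into a single program word.
std::uint16_t encode(std::uint16_t code, std::uint16_t subCodes)
{
    assert((code & ~CodeMask) == 0);   // code must fit in the code bits
    return static_cast<std::uint16_t>(code | (subCodes & SubCodeMask));
}

// Access functions; the sub-code one returns the enumeration type.
std::uint16_t instructionCode(std::uint16_t word)
{
    return static_cast<std::uint16_t>(word & CodeMask);
}

SubCode instructionSubCode(std::uint16_t word)
{
    return static_cast<SubCode>(word & SubCodeMask);
}
```

In this layout a sub-code value outside the mask (for example 0x0001) would be silently dropped by encode, which is the kind of loss the value change avoids.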

[branch table commit e97057efca]

Parser – Identifier Codes

The parser previously set the code for an identifier token only when the word was found in the table (command, operator or function).  The codes for other identifiers were set in the translator: defined functions with no parentheses and variables (get operand); arrays, functions, and defined functions with parentheses (process parentheses tokens).  This was changed to set all codes in the parser.

To do this in the parser, the parser needed to know if a reference operand was being requested.  For now, identifiers with no parentheses are set to variables, and identifiers with parentheses are set to arrays unless they start with an F, in which case they are set to functions (a temporary check for testing).  Defined functions are identifiers that start with FN.  Eventually the parser will need access to the program dictionaries to fully determine which code to assign to an identifier token.

The get identifier function was modified to set the code as described above for identifiers not found in the table.  A reference argument was added, which was also added to the parser function operator.  (The Reference enumeration was moved from the translator class to the main header file so that its enumerators are accessible.)  The token constructors for codes and identifiers were combined into a single constructor with default arguments for the string and reference members.

For variables, the reference argument is used to determine if the code is a variable or a variable reference.  Only the base code is set since the translator still changes the code for the data type.  In the case of a variable reference, the reference member of the token is not set (the translator did not previously set it either).
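The temporary rules for identifiers not found in the table can be summarized in a short sketch. The Code and Reference enumerators, the identifierCode function, and the case-insensitive prefix check are assumptions made for this example; only the rules themselves come from the description above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical enumerations (the real ones have more members).
enum class Code { Variable, VariableRef, Array, Function, DefFunc, DefFuncParen };
enum class Reference { None, Variable };

// Case-insensitive prefix test; prefix is expected in upper case.
bool startsWith(const std::string &word, const std::string &prefix)
{
    if (word.size() < prefix.size())
        return false;
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        if (std::toupper(static_cast<unsigned char>(word[i])) != prefix[i])
            return false;
    }
    return true;
}

// Choose the base code for an identifier that was not found in the table.
Code identifierCode(const std::string &word, bool hasParen, Reference reference)
{
    assert(!word.empty());
    if (startsWith(word, "FN"))              // defined function
        return hasParen ? Code::DefFuncParen : Code::DefFunc;
    if (hasParen)                            // temporary F check for testing
        return startsWith(word, "F") ? Code::Function : Code::Array;
    return reference == Reference::None ? Code::Variable : Code::VariableRef;
}
```

Only the base code is chosen here; adjusting the code for the data type is left to a later step, as it is in the translator for now.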

Several token type cases in the translator get operand function were modified.  For defined functions with no parentheses, the token reference and code members no longer need to be set.  For no parentheses tokens (variables), the code is still updated for the data type.  The parser will do this once the new table model is implemented.  For parentheses tokens (arrays), the token reference member no longer needs to be set.

The translator process parentheses token function no longer checks for functions (temporarily identifiers starting with F) or sets the code of the token.  To determine whether a token is an array (so that the expected expression types for the subscripts can be set to integer), the function checks for the Array code.  This check will need to be modified when arrays are implemented since there will be different array codes for each data type, which will be set by the parser.

[branch table commit 69dff18e26]