Saturday, May 29, 2010

Parser – Bug / Test Updates

During testing of the LET command implementation, some problems were discovered. The first problem was in the Parser's get number token function: a single 0 digit caused an “invalid leading zero in number constant” error. This error check was meant to prevent a constant like 01 from being accepted, while still allowing a leading zero followed by a decimal point, as in 0.1. The fix was to check whether the character after the zero is a digit. If it is not, the function stops looking for more characters; otherwise, it reports the error.
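The leading zero check can be sketched roughly as follows. This is only an illustration, not the actual get number token function: the names scanIntegerPart and ZeroCheck are made up for this example, and it assumes the check covers just the digits in front of any decimal point.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch (not from the actual Parser).
enum class ZeroCheck { Ok, LeadingZeroError };

// Scans the digits in front of any decimal point, starting at pos.
// On success, pos is left on the first character after those digits.
ZeroCheck scanIntegerPart(const std::string &input, std::size_t &pos)
{
    if (pos < input.size() && input[pos] == '0') {
        ++pos;  // consume the zero
        if (pos < input.size()
            && std::isdigit(static_cast<unsigned char>(input[pos]))) {
            return ZeroCheck::LeadingZeroError;  // a constant like 01
        }
        return ZeroCheck::Ok;  // a lone 0, or 0 followed by '.', an operator, etc.
    }
    // no leading zero: collect digits as usual
    while (pos < input.size()
        && std::isdigit(static_cast<unsigned char>(input[pos]))) {
        ++pos;
    }
    return ZeroCheck::Ok;
}
```

With this arrangement, 0 and 0.1 both pass the check, while 01 is still rejected.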

Because of this error, it seemed a good idea to re-run all of the parser tests, since it had been a while since these were checked. Lo and behold, they all failed or miscompared. The print token function used by these tests hadn't been updated for the changes to the code and data type enumerations, so it was updated. Once corrected, there were still miscompares, but these were due either to the decimal code values changing (because of all the new codes that have been added) or to the data type of many operators and internal functions having been changed from None to their appropriate data type. Additional 0 constant test inputs were added to the Parser number test inputs.

The regression test scripts were updated to also include the parser tests. The Windows batch file uses the comp command, which has a nice feature: wild cards can be used for both file names, so it can compare all sets of files with one command (like all 8 translator test output files). However, it also has an irritating feature: after it is done comparing files, it asks if there are more files to compare, and No has to be entered to continue. There is no option to prevent this. Since there are two compares (one for the parser files, one for the translator files), No has to be entered after comparing the parser files before the translator files are compared.

One last problem was discovered that affects both the print token and print small token functions. Numeric constants were not being output as intended. The code was outputting integers using the C %d format specifier and doubles using the %g specifier. The problem is that if the constant 1.0 (a double) was entered, it would be output as 1, making it impossible to know whether it was an integer or a double. The raw strings entered for the numbers were supposed to be output, which is the reason these strings were saved in the first place: to preserve the original string for later output by the Recreator.
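The %g behavior is easy to reproduce in isolation. The small sketch below is not the print token code; NumberToken, formatWithG and formatToken are hypothetical names used only to contrast the two ways of producing the output.

```cpp
#include <cstdio>
#include <string>

// Formats a double with the %g specifier, as the old output code did.
std::string formatWithG(double value)
{
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%g", value);
    return buffer;
}

// Hypothetical token holding both the converted value and the text entered.
struct NumberToken {
    double value;     // converted value
    std::string raw;  // characters exactly as they were entered
};

// Intended output: the raw string, so 1.0 stays distinguishable from 1.
std::string formatToken(const NumberToken &token)
{
    return token.raw;
}
```

Here formatWithG(1.0) gives "1" because %g drops trailing zeros and the decimal point, whereas formatToken returns the text as entered.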

Translator – Token Sub-Codes (Implementation)

The sub-code flag was implemented, which consisted of adding the sub-code memory to the Token class, adding the sub-code flag value definitions and modifying the print token test routine to output the flags.
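As a rough picture of what these three pieces amount to, here is a minimal sketch. The flag names, bit values, member names and the output format are all assumptions made for illustration; the real Token class has many more members.

```cpp
#include <string>

// Hypothetical sub-code flag values (names and bit positions are assumed).
enum SubCode : unsigned {
    None_SubCode  = 0x0,
    Paren_SubCode = 0x1,  // unnecessary parentheses were entered
    Comma_SubCode = 0x2   // assignment list was entered with commas
};

// Stripped-down stand-in for the Token class with the added sub-code member.
struct Token {
    int code = 0;
    unsigned subcode = None_SubCode;

    bool hasSubCode(unsigned flag) const { return (subcode & flag) != 0; }
    void setSubCode(unsigned flag) { subcode |= flag; }
};

// Builds the text a test print routine might output for the flags that are set.
std::string subCodeText(const Token &token)
{
    std::string text;
    if (token.hasSubCode(Paren_SubCode)) {
        text += ")";
    }
    if (token.hasSubCode(Comma_SubCode)) {
        text += ",";
    }
    return text.empty() ? text : "'" + text + "'";
}
```

Keeping the flags as bits in one member means several sub-codes can be set on the same token without adding a member for each.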

The setting of the parentheses sub-code flag was handled in the do pending parentheses function, which checks whether unnecessary parentheses were entered and, if so, appended a dummy parentheses token. The code was changed to instead set the parentheses sub-code flag of the last token appended to the output. Two issues were discovered.

The first issue found was that if two sets of unnecessary parentheses are entered, for example, A=((B)), the parentheses sub-code flag can only be set once. Upon reproducing the original source, the Recreator would not know that there were two sets of unnecessary parentheses. So for this case, a dummy parentheses token will still be appended for each additional set of unnecessary parentheses entered.

Curiously, with this change, if three (or any odd number of) unnecessary sets of parentheses are entered, then for the third (fifth, etc.) set, the parentheses sub-code flag gets set in the second (fourth, etc.) set's dummy parentheses token. This is something the Recreator will need to handle.

The second issue found was that if the last token appended was a hidden conversion code, the conversion code token's parentheses sub-code flag gets set. It is anticipated that this would cause a problem for the Recreator. The Recreator should be able to safely ignore the conversion codes, but if it needs to look at them for sub-code flags, they can't be ignored. To avoid this, the code was modified so that if the token's table entry has the new Hidden flag set, then the item previous to the conversion code has its parentheses sub-code flag set.
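The behavior described for both issues can be modeled with a short sketch. This is not the do pending parentheses function itself: OutputItem, ItemKind and markUnnecessaryParens are hypothetical names, the output list is reduced to a plain vector, and the function is assumed to be called once for each set of unnecessary parentheses.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical kinds of items on the translator's output list.
enum class ItemKind { Operand, HiddenConversion, DummyParen };

struct OutputItem {
    ItemKind kind;
    bool parenFlag;  // stands in for the parentheses sub-code flag
};

// Records one set of unnecessary parentheses on the output list.
void markUnnecessaryParens(std::vector<OutputItem> &output)
{
    if (output.empty()) {
        return;
    }
    std::size_t index = output.size() - 1;
    // a hidden conversion code is skipped so the flag lands on the item before it
    if (output[index].kind == ItemKind::HiddenConversion && index > 0) {
        --index;
    }
    if (!output[index].parenFlag) {
        output[index].parenFlag = true;  // first set: just set the flag
    } else {
        // flag already used: fall back to appending a dummy parentheses token
        output.push_back({ItemKind::DummyParen, false});
    }
}
```

Calling this three times on a single operand sets the operand's flag, appends a dummy token, and then sets the flag on that dummy token, which is the odd-count case the Recreator will need to handle.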

The setting of the comma sub-code flag was handled in the new equal token handler function. When an equal token is received, if the mode was a multiple comma assignment, then the comma sub-code flag is set when the token's code is set to the assign list operator. The comma sub-code flag will not be set if the mode was a multiple equal assignment.

Translator – Token Handlers (Implementation)

A TokenHandler typedef was needed to define the pointer to token handler function. I was not able to define the function pointer directly on the variables. This always proves difficult with more complex types, especially involving pointers. Fortunately, using typedef simplifies the issue. Here is the definition that was inserted before the TableEntry class:
class Translator;   // forward reference to Translator class
typedef TokenStatus (*TokenHandler)(Translator &p, Token *&token);
The TokenHandler type definition is then used for defining the token handler function pointers. Each of the token handlers was created using the existing code in the switch statement, where the code was modified to add the Translator parameter in front of all the Translator variables, and the Translator scope (Translator::) was added to the Translator enumeration values.

The code that handles operators (which was after the switch statement for the special codes) was also put into a handler function. This greatly simplified the code at the end of the add token function. For processing the token, a temporary token handler function pointer is set to the value in the code's table entry. If the value is not set (is NULL), then the temporary pointer is set to the default operand token handler function. The function is called using the temporary pointer and its status return value is immediately returned.
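The dispatch at the end of the add token function then looks something like the sketch below. Only the TokenHandler typedef comes from the actual code; the status values, the two-entry table, the handler names and the counters are stand-ins invented to keep the example self-contained.

```cpp
#include <cstddef>

// Hypothetical status values and a stripped-down Token for this sketch.
enum TokenStatus { Good_TokenStatus, Error_TokenStatus };
struct Token { int code; };

class Translator;   // forward reference to Translator class
typedef TokenStatus (*TokenHandler)(Translator &p, Token *&token);

class Translator {
public:
    int operandCount = 0;   // counters only so the sketch has visible effects
    int operatorCount = 0;
    TokenStatus addToken(Token *&token);
};

// Default handler used when a table entry has no handler set.
TokenStatus Operand_Handler(Translator &p, Token *&token)
{
    (void)token;
    ++p.operandCount;
    return Good_TokenStatus;
}

TokenStatus Operator_Handler(Translator &p, Token *&token)
{
    (void)token;
    ++p.operatorCount;
    return Good_TokenStatus;
}

// Hypothetical table: code 0 has no handler, code 1 is an operator.
struct TableEntry { TokenHandler handler; };
static const TableEntry table[] = { { NULL }, { Operator_Handler } };

TokenStatus Translator::addToken(Token *&token)
{
    TokenHandler handler = table[token->code].handler;
    if (handler == NULL) {
        handler = Operand_Handler;  // fall back to the default operand handler
    }
    return (*handler)(*this, token);  // status is returned immediately
}
```

Because the handlers are plain functions rather than member functions, they reach the Translator members through the reference parameter, which is why every Translator variable in the moved code needed a prefix.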

The program was compiled several times while making the changes. I got tired of running all of the test cases (the output is redirected to a file in the base directory) and comparing the results to the official test output files (in the test directory), so I wrote an MSYS (bash) script to do it automatically and check all the test cases. An equivalent Windows batch file was also written, but it does not work nearly as nicely. Both will be included in the next release.