Sunday, June 30, 2013

New Translator – Simple Expressions (Tagged)

The implementation for simple expressions is now complete in the new translator and version v0.4.0 has been tagged.  Note that expression test #1 currently fails with the regression and memory test scripts because of problems with the expression mode in the old translator routines.  The new translator routines run this test successfully, but should not yet be used on any of the other expression tests.  Also, parser errors are not yet handled correctly by the new translator routines, but this is going to require a major change to how token errors are handled.

[commit 552aa6176a]

New Translator – Memory Leaks

Several of the error conditions caused a memory leak because the token at which the error occurred was not deleted.  The error token can't simply be deleted because it may already be in the RPN output list, and tokens in the output list are deleted when the list is cleared after an error is detected.  A method was needed to determine when an error token should be deleted separately.

This was accomplished by adding a new UnUsed sub-code.  At each location where an error is detected, the routine setting the error needs to set this sub-code if the token has not been added to the output list.  This sub-code is set in two places: in the get operand routine when there is a command or operator token, and in the translate routine, which calls the get expression routine, when the terminating token is not the end-of-line token.
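The clean-up rule can be sketched roughly as follows.  This is only an illustration under assumptions: the Token structure, the UnUsed_SubCode value, the live-token counter and the function names are all hypothetical stand-ins, not the project's actual code.

```cpp
#include <cassert>
#include <list>

// Hypothetical sub-code flag; the project's real name and value may differ.
constexpr int UnUsed_SubCode = 0x01;

struct Token {
    static int liveCount;  // tokens currently allocated (for this demo only)
    int subCodes = 0;
    Token() { ++liveCount; }
    ~Token() { --liveCount; }
};
int Token::liveCount = 0;

// Delete every token owned by the output list, then empty the list.
void clearOutput(std::list<Token *> &output)
{
    for (Token *token : output)
        delete token;
    output.clear();
}

// Clean up after an error: the output list always deletes its own tokens,
// and the error token is deleted separately only if it was marked unused.
void cleanUpAfterError(std::list<Token *> &output, Token *errorToken)
{
    assert(errorToken != nullptr);
    // Read the flag first; the token may be deleted by clearOutput().
    bool unused = (errorToken->subCodes & UnUsed_SubCode) != 0;
    clearOutput(output);
    if (unused)
        delete errorToken;
}

// Demo helper: returns the number of tokens still allocated after an error.
int leakedTokensAfterError(bool errorTokenInOutput)
{
    std::list<Token *> output;
    output.push_back(new Token);
    Token *errorToken = new Token;
    if (errorTokenInOutput)
        output.push_back(errorToken);            // list owns the token
    else
        errorToken->subCodes |= UnUsed_SubCode;  // caller marks it unused
    cleanUpAfterError(output, errorToken);
    return Token::liveCount;
}
```

In this sketch, both cases (error token in the output list, or marked unused) end with no tokens left allocated and no token deleted twice.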

One other problem, which caused an uninitialized variable error from valgrind, was also in the translate routine where the terminating token is checked for the end-of-line token.  The check also needed to test whether the token has a table entry before checking the token for the end-of-line code (non-table entry tokens don't have a code).
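A minimal sketch of the guarded check, assuming a token structure where only command and operator tokens carry a valid code.  The type names, the code names and the choice of which token types have table entries are assumptions for illustration only.

```cpp
#include <cassert>

enum class TokenType { Command, Operator, Constant };
enum Code { EOL_Code, Add_Code };

struct Token {
    TokenType type;
    Code code;  // only meaningful when hasTableEntry() is true

    // Assumption: only command and operator tokens have table entries.
    bool hasTableEntry() const
    {
        return type == TokenType::Command || type == TokenType::Operator;
    }

    // The code member is read only when the token has a table entry,
    // because && short-circuits when hasTableEntry() is false.
    bool isCode(Code wanted) const
    {
        return hasTableEntry() && code == wanted;
    }
};
```

With this arrangement a constant token never matches the end-of-line code, whatever value happens to be in its code member.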

[commit a9b45327d6]

New Translator – Expression Error Checks

There were several error conditions that were not being caught correctly by the new translator routines.  The get expression routine was also rewritten with a loop instead of recursively calling itself, which should be slightly more efficient and use slightly less stack space for complex expressions.
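The loop form of the get expression routine might look something like the sketch below.  It is a simplified stand-in and not the project's code: tokens are plain strings that are already split, only a unary minus and four binary operators are handled, the precedence values are made up, and every name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Translator {
    std::vector<std::string> input;   // pre-split tokens standing in for the parser
    std::size_t pos = 0;
    std::vector<std::string> output;  // RPN output list
    std::vector<std::string> hold;    // hold stack of pending operators

    // Get the next token; an empty string stands for the end-of-line token.
    std::string getToken()
    {
        return pos < input.size() ? input[pos++] : std::string();
    }

    static int precedence(const std::string &op)
    {
        if (op == "neg") return 3;
        if (op == "*" || op == "/") return 2;
        if (op == "+" || op == "-") return 1;
        return 0;  // not an operator, so it terminates the expression
    }

    // Move held operators of equal or higher precedence to the output,
    // then hold the new operator (a terminating token is not held).
    void processOperator(const std::string &op)
    {
        while (!hold.empty() && precedence(hold.back()) >= precedence(op)) {
            output.push_back(hold.back());
            hold.pop_back();
        }
        if (precedence(op) > 0)
            hold.push_back(op);
    }

    // Get a simple operand, holding any leading unary minus operators.
    bool getOperand()
    {
        std::string token = getToken();
        while (token == "-") {
            hold.push_back("neg");
            token = getToken();
        }
        if (token.empty() || precedence(token) > 0)
            return false;  // an operand was expected here
        output.push_back(token);
        return true;
    }

    // Loop instead of recursion: one operand and one following token per pass.
    bool getExpression(std::string &terminator)
    {
        for (;;) {
            if (!getOperand())
                return false;
            terminator = getToken();
            processOperator(terminator);
            if (precedence(terminator) == 0)
                return true;  // hold stack was flushed by processOperator()
        }
    }
};

// Demo helper: translate a token list and join the RPN output with spaces.
std::string toRpn(const std::vector<std::string> &tokens)
{
    Translator translator;
    translator.input = tokens;
    std::string terminator;
    if (!translator.getExpression(terminator))
        return "error";
    std::string rpn;
    for (const std::string &token : translator.output)
        rpn += (rpn.empty() ? "" : " ") + token;
    return rpn;
}
```

Each pass through the loop reuses the same stack frame, so a long expression does not add a call level per operator the way a recursive version would.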

The first error condition occurred when a command or operator token is found when getting an operand.  When this occurs, the proper "expected XX expression" error needs to be returned, which is why the get operand routine has a data type argument.

The second error condition occurred with the token after an operand in the end of expression token check.  This check tested whether the token was an operator; however, both operator and command type tokens are reported as operators.  The test was changed to specifically check for the operator token type.

The third error condition occurred when the hold stack is cleared of higher precedence operators.  The check included obtaining the precedence of the current token and testing whether the token was a unary operator.  However, both assumed the token had a valid code (was in the table), whereas the token could be of any type.  A different precedence function is now used (one that takes a token instead of a code argument and handles any token type), and a new is unary operator function was implemented that takes a token argument and also handles any token type.  Precedences also needed to be assigned for all the token types (only some of them were assigned precedences previously).
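The token-based versions of these two functions could be sketched as below.  The token types, codes and precedence numbers are invented for illustration, and the function names are hypothetical; only the idea (check the token type before trusting the code) comes from the description above.

```cpp
#include <cassert>

enum class TokenType { Command, Operator, Constant, NoParen, Paren };
enum Code { Neg_Code, Not_Code, Add_Code, Mul_Code, Print_Code };

struct Token {
    TokenType type;
    Code code;  // valid only for Command and Operator tokens in this sketch
};

bool hasTableEntry(const Token &token)
{
    return token.type == TokenType::Command
        || token.type == TokenType::Operator;
}

// Per-code lookup standing in for the table; the values are made up.
int codePrecedence(Code code)
{
    switch (code) {
    case Neg_Code:   return 40;
    case Mul_Code:   return 30;
    case Add_Code:   return 25;
    case Not_Code:   return 20;
    case Print_Code: return 4;
    }
    return -1;
}

// Precedence for any token: the code is consulted only for table tokens,
// and every other token type gets a fixed precedence (value assumed).
int precedence(const Token &token)
{
    if (hasTableEntry(token))
        return codePrecedence(token.code);
    return 2;
}

// Safe for any token: the type is tested before the code is read, and
// command tokens are not mistaken for operators.
bool isUnaryOperator(const Token &token)
{
    return token.type == TokenType::Operator
        && (token.code == Neg_Code || token.code == Not_Code);
}
```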

Several more tests were added to expression test #1 for testing many of the error conditions.  Note that the old translator expression mode has trouble with some of these error tests because the old routine is not as robust as the new routines.  The problems only occur in the expression mode.  These problems will not be corrected, so for the time being expression test #1 will fail when using the old translator (including with the regression and memory test scripts).

[commit fab9d675a6] [commit fc7398879f]

New Translator – Simple Expressions

To be able to use the new translator routines, two new temporary command line options were added, namely ‑n and ‑ne, which do the same thing as ‑t and ‑te except that the new translator routine is called.  These will be removed once the new translator has been fully implemented.

Several new expression tests were added to expression test #1 (simple expressions), including a test of the "‑E" expression (which previously reported an error), several more unary operator tests (including multiple unary operators), many with mixed data types (double and integer) and some with invalid mixed data types (number and string).

When running the old translator, it was discovered that a negative constant at the beginning of the expression was interpreted as a negate operator and a positive constant instead of a negative constant.  This occurred because the parser was not set to the operand state while the translator was in its initial state, so a check was added to set the parser operand state if the translator is in the initial state and expression mode is selected.
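The effect of the operand state can be shown with a toy tokenizer.  This is a sketch under assumptions, not the project's parser: it only understands unsigned digits and single-character operators, and the tokenize function and its operandState argument are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split the input into constants and single-character operators.  A '-'
// followed by a digit is read as part of the constant only while the
// tokenizer is in the operand state (an operand is expected next).
std::vector<std::string> tokenize(const std::string &input, bool operandState)
{
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < input.size()) {
        char c = input[pos];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++pos;
            continue;
        }
        std::size_t start = pos;
        bool signedConstant = operandState && c == '-'
            && pos + 1 < input.size()
            && std::isdigit(static_cast<unsigned char>(input[pos + 1]));
        if (signedConstant || std::isdigit(static_cast<unsigned char>(c))) {
            ++pos;  // consume the sign or the first digit
            while (pos < input.size()
                   && std::isdigit(static_cast<unsigned char>(input[pos])))
                ++pos;
            tokens.push_back(input.substr(start, pos - start));
            operandState = false;  // an operator is expected next
        } else {
            tokens.push_back(std::string(1, c));
            ++pos;
            operandState = true;   // an operand is expected next
        }
    }
    return tokens;
}
```

Starting in the operand state, "-5+3" yields the constant -5 first; starting outside it, the same input yields a separate minus operator followed by 5, which is the behavior the fix avoids at the start of an expression.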

Currently for simple expressions in the new translator routines, parser errors are not being handled correctly and therefore not reported correctly.  Expression test #1 does not contain any expressions demonstrating this (some will be added once this issue is resolved).  The old translator routines crash on these when using expression mode (command mode does not have a problem).  The new translator is also not handling invalid operands and operators correctly.

[commit 58a03cb51b]

New Translator – Get Expression Design

Five new functions were implemented to support simple expressions (simple operands, unary and binary operators) in the new translator, and two existing translator functions are utilized.  The names of new translator routines that conflict with existing functions are being temporarily suffixed with a '2' character.  These functions will replace the existing functions once the new translator is fully implemented and working.

The new routines consist of the top level translate routine (currently only supports expression mode); the get expression routine for getting an expression (to be utilized by any command needing an expression), which will return the token terminating the expression; the process operator routine for handling the precedence of operators; the get operand routine for getting an operand (simple operands to start); and the get token routine for getting a token from the parser.

New Translator – Expected Data Type

The old translator routines had a somewhat complex method to detect and report data type errors.  The new translator will have a simpler method.  The new translator will have a routine for getting an expression from the input line and will have an argument for the desired data type.  At the end of the expression, if the data type does not match this, then the appropriate hidden conversion code (CvtDbl or CvtInt) will be added or an "expected XX expression" error will be reported.

For example, an IF command will call get expression for an integer expression, the INPUT PROMPT command will call for a string expression and the PRINT command will call for any type of expression.  The same method will apply to the arguments of internal functions and operands of operators.  Two new data types were implemented to support this, the Number and Any data types.
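The end-of-expression check might be sketched as follows.  The DataType and Action enumerations, the function names and the exact error strings are assumptions made for illustration; only the CvtDbl and CvtInt conversion codes and the general "expected XX expression" form come from the description above.

```cpp
#include <cassert>
#include <string>

enum class DataType { Double, Integer, String, Number, Any };
enum class Action { None, AddCvtDbl, AddCvtInt, Error };

// Decide what to do when an expression of type 'actual' was found where
// 'expected' was requested.  'actual' is always a concrete type.
Action checkDataType(DataType expected, DataType actual)
{
    assert(actual == DataType::Double || actual == DataType::Integer
           || actual == DataType::String);
    switch (expected) {
    case DataType::Any:
        return Action::None;
    case DataType::Number:
        return actual == DataType::String ? Action::Error : Action::None;
    case DataType::String:
        return actual == DataType::String ? Action::None : Action::Error;
    case DataType::Double:
        if (actual == DataType::Double) return Action::None;
        return actual == DataType::Integer ? Action::AddCvtDbl : Action::Error;
    case DataType::Integer:
        if (actual == DataType::Integer) return Action::None;
        return actual == DataType::Double ? Action::AddCvtInt : Action::Error;
    }
    return Action::Error;
}

// Hypothetical wording for the "expected XX expression" errors.
std::string expectedError(DataType expected)
{
    switch (expected) {
    case DataType::Double:  return "expected double expression";
    case DataType::Integer: return "expected integer expression";
    case DataType::String:  return "expected string expression";
    case DataType::Number:  return "expected numeric expression";
    case DataType::Any:     break;
    }
    return "expected expression";
}
```

Under these assumptions, an IF command would request DataType::Integer, so a double expression gets a hidden CvtInt and a string expression produces the integer error.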

Since the initial new translator implementation will only handle unary and binary operators, the get expression routine needs to know the expected data type that will follow a unary operator.  For the minus (negate) operator, this is a numeric expression (either double or integer).  For the NOT operator, this is an integer expression.

For binary operators, the first operand has already been processed and the appropriate associated code of the operator will already have been found.  The get expression routine will need to know the expected data type of the second operand for the selected associated code.

Therefore, a new expected data type member was added to the expression information structure stored in the table.  Code was added to the table setup and check routine to automatically generate the values for this new member by looking at the table entry of the code and its associated codes, specifically at the data type of the last operand of each.  If multiple data types are found, then the expected data type is set to the Number or Any data type as appropriate.
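One way the automatic generation could work is sketched below.  The rule used here for combining types (double and integer give Number, any mix including string gives Any) is an assumption based on the description, and the function name and the None value are hypothetical.

```cpp
#include <cassert>
#include <vector>

enum class DataType { None, Double, Integer, String, Number, Any };

// Combine the last-operand data types of a code and its associated codes
// into a single expected data type.
DataType expectedDataType(const std::vector<DataType> &lastOperandTypes)
{
    bool hasDouble = false;
    bool hasInteger = false;
    bool hasString = false;
    for (DataType type : lastOperandTypes) {
        if (type == DataType::Double) hasDouble = true;
        else if (type == DataType::Integer) hasInteger = true;
        else if (type == DataType::String) hasString = true;
    }
    if (hasString)
        return (hasDouble || hasInteger) ? DataType::Any : DataType::String;
    if (hasDouble && hasInteger)
        return DataType::Number;
    if (hasDouble)
        return DataType::Double;
    if (hasInteger)
        return DataType::Integer;
    return DataType::None;  // no operand types were given
}
```

For example, a negate operator with double and integer forms would get Number, while a NOT operator with only an integer form would get Integer, matching the expectations described for unary operators.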

[commit aa7d7bec92]