Saturday, July 27, 2013

New Translator – Assignment Error Reporting

The remaining issues with translator tests #7 (Errors), #8 (More Errors), and #9 (Semicolon Errors) were caused by incorrect errors being reported for a number of the test statements.  When choosing which error to return, the new get operand routine did not take into account which type of reference was being requested.  This was only a problem for the string type, where the error depends on whether the requested reference type is Variable ("expected string variable" error) or All ("expected string item for assignment" error, signifying that sub-string assignments are allowed).

There were translator functions that returned the expected expression error and the expected variable error for a given data type.  Instead of adding a third function for the All reference type, all three cases were combined into the expectedErrStatus() function, which was given a reference argument (defaulting to the None reference type) in addition to the existing data type argument.
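To make the combined function concrete, here is a minimal C++ sketch of how one function can select the error from both a data type and a reference type.  The enumerations (DataType, Reference, ErrStatus) and their members are assumptions for illustration, not the translator's actual definitions.  The behavior for numeric types with the All reference (falling back to the variable error) is also a guess, since only the string type is described as needing a distinct error.

```cpp
#include <cassert>

// Hypothetical enumerations for illustration; the translator's own names
// and members may differ.
enum class DataType { Double, Integer, String, None, Number, Any };
enum class Reference { None, Variable, All };

enum class ErrStatus {
    ExpNumExpr, ExpStrExpr, ExpExpr,  // "expected ... expression"
    ExpNumVar, ExpStrVar, ExpVar,     // "expected ... variable"
    ExpStrItem                        // "expected string item for assignment"
};

// Replaces separate expected-expression and expected-variable helpers.
// The reference argument defaults to None, meaning an expression is expected.
ErrStatus expectedErrStatus(DataType dataType, Reference reference = Reference::None)
{
    const bool numeric = dataType == DataType::Double
        || dataType == DataType::Integer || dataType == DataType::Number;
    switch (reference) {
    case Reference::None:
        if (numeric) return ErrStatus::ExpNumExpr;
        if (dataType == DataType::String) return ErrStatus::ExpStrExpr;
        return ErrStatus::ExpExpr;
    case Reference::Variable:
        if (numeric) return ErrStatus::ExpNumVar;
        if (dataType == DataType::String) return ErrStatus::ExpStrVar;
        return ErrStatus::ExpVar;
    case Reference::All:
        // Sub-string assignments are allowed, so a string gets the "item"
        // error; other types are assumed to fall back to the variable error.
        if (dataType == DataType::String) return ErrStatus::ExpStrItem;
        return expectedErrStatus(dataType, Reference::Variable);
    }
    return ErrStatus::ExpExpr;  // not reached; keeps compilers quiet
}
```

A call such as expectedErrStatus(DataType::String, Reference::All) selects the sub-string-aware error, while the one-argument form keeps the old expected-expression behavior.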

In the get operand function, the new expectedErrStatus() function was called with both arguments for parser errors, for command and operand token types, for function with no parentheses tokens, and for function with parentheses tokens that are not sub-string functions.  The new function was also used in the LET translate routine when a reference with the wrong data type is returned for the All reference type.

Finally, for define functions with parentheses, the get operand routine should have reported an "expected equal or comma" error pointing to just the open parenthesis when a reference was requested.  This is because define functions with no parentheses tokens are valid in assignments, so the parenthesis is the first invalid character.

All the tests containing only assignment statements (tests #1 to #5, #7 to #11, and #13) now pass with the new translator routines, except for tests #5, #11, and #13, each of which contains a lone PRINT statement that reports a not yet implemented error.

[commit bbe3b01e37]

New Translator – Operator Processing

One of the remaining major issues was how operators were being processed in the new process operator routine.  This routine pops tokens from the hold stack and processes them (processing their final operands and adding them to the RPN output list) as long as the token on top of the hold stack has a higher or the same precedence as the incoming token.  However, incoming unary operators do not force tokens off of the hold stack regardless of precedence (since they have only one operand, they are pushed directly onto the hold stack).
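The basic precedence rule, including the unary exception, can be illustrated with a small hold-stack converter.  The Tok structure, the toRpn() function, and the precedence values are all hypothetical and invented for this sketch, which handles only operands and operators (no parentheses or functions).  With an exponent operator that binds tighter than negation, an expression like A^-B shows why the unary exception matters: if the incoming negation were allowed to force ^ off the stack, ^ would be output before its second operand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token for illustration; the translator's token is richer.
struct Tok {
    std::string text;
    int precedence;  // higher binds tighter
    bool unary;      // unary operators take a single operand
    bool operand;    // identifiers and constants go straight to the output
};

// Convert an infix token list to RPN using a hold stack.
std::vector<std::string> toRpn(const std::vector<Tok>& tokens)
{
    std::vector<std::string> output;
    std::vector<Tok> hold;
    for (const Tok& tok : tokens) {
        if (tok.operand) {
            output.push_back(tok.text);
            continue;
        }
        if (!tok.unary) {
            // A binary operator forces off operators of higher or equal precedence.
            while (!hold.empty() && hold.back().precedence >= tok.precedence) {
                output.push_back(hold.back().text);
                hold.pop_back();
            }
        }
        // Unary operators are pushed without forcing anything off the stack.
        hold.push_back(tok);
    }
    // End of expression: everything left on the hold stack is output.
    while (!hold.empty()) {
        output.push_back(hold.back().text);
        hold.pop_back();
    }
    return output;
}
```

With these precedences, A^-B converts to A B NEG ^, and A+B*C converts to A B C * +.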

The incoming token can be any token type, including operands, commands, functions, and operators that are neither unary nor binary (like comma or colon).  These token types have a low precedence, indicate the end of the expression, and force all unary and binary operators off of the hold stack.  These token types are considered terminating tokens, and it is up to the caller to determine their validity.

The problem was that the token on the hold stack is not necessarily a unary or binary operator, since open parentheses, internal functions, define functions, and identifiers with parentheses are also pushed onto the hold stack.  If the incoming token was also one of these tokens (or another token with the same precedence, like an identifier with no parentheses), it would incorrectly force the token off of the hold stack, causing a malfunction.

This problem was corrected by only forcing unary and binary operators off of the hold stack to be processed.  A new isUnaryOrBinaryOperator() table access function was added that accepts any token type and returns true only if the token type is an operator that has operands (operators with zero operands, like a comma, do not count).  The new process operator routine was also optimized a bit: the incoming token precedence is now obtained once before entering the precedence check loop, and a pointer to the top token is obtained once at the beginning of each pass through the loop.  The end of the routine was modified to use the new table access function as well, to determine whether the incoming token's first operand should be processed (only for unary and binary operators); otherwise, the incoming token is a terminator and the done status is returned.
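The corrected loop can be sketched as follows.  Token, TokType, Status, and the precedence values are hypothetical stand-ins, and the sketch simplifies the real routine: "processing" an operator just appends its text to the RPN list, and pushing the incoming operator stands in for processing its first operand.  The key point is that the loop stops at the first hold stack token that is not a unary or binary operator, so an open parenthesis survives a terminating comma or identifier.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token types; the real translator has more.
enum class TokType { Operator, OpenParen, IntFunc, DefFuncP, ParenIdent, NoParenIdent, Constant };

struct Token {
    TokType type;
    std::string text;
    int precedence;
    int operandCount;  // only meaningful for operators; a comma has zero
};

// Stand-in for the table access function: true only for operators
// that take one or two operands (so not a comma or colon).
bool isUnaryOrBinaryOperator(const Token& token)
{
    return token.type == TokType::Operator && token.operandCount > 0;
}

enum class Status { Good, Done };

Status processOperator(std::vector<Token>& hold, const Token& incoming,
                       std::vector<std::string>& rpn)
{
    // Incoming unary operators never force tokens off the hold stack.
    const bool incomingUnary = isUnaryOrBinaryOperator(incoming)
        && incoming.operandCount == 1;
    const int incomingPrecedence = incoming.precedence;  // obtained once
    while (!incomingUnary && !hold.empty()) {
        const Token& top = hold.back();  // top token, once per pass
        // Stop at anything that is not a unary/binary operator (open
        // parenthesis, function, etc.) or at a lower precedence operator.
        if (!isUnaryOrBinaryOperator(top) || top.precedence < incomingPrecedence) {
            break;
        }
        rpn.push_back(top.text);  // "process" the operator
        hold.pop_back();
    }
    if (!isUnaryOrBinaryOperator(incoming)) {
        return Status::Done;  // terminator; the caller decides if it is valid
    }
    hold.push_back(incoming);  // stands in for processing the first operand
    return Status::Good;
}
```

For example, with ( and + on the hold stack, an incoming comma pops only the +, leaves the ( in place, and returns Done for the caller to judge.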

The expected results for translator tests #7 and #11 were updated to fix an incorrect error message (#7 only) and to reflect the sub-string assignment translation changes (the old result files were saved).  With this change, there are still some issues with translator tests #7 (Errors), #8 (More Errors), and #9 (Semicolon Errors).  Also, any tests with PRINT statements still report not yet implemented errors.

An unrelated change was made to the data type enumeration: the numberof value was removed as a separate value and is now set to the None data type.  The numberof value is used as the number of actual data types, which excludes the None, Number, and Any data types.  There is no need for numberof to be a separate value, and having it separate required dummy values to be put into the various arrays subscripted by data type.
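A sketch of the resulting enumeration layout follows.  It assumes that the actual data types are listed first so that the value of None equals their count; the member list and the dataTypeName array are illustrative, not the project's actual definitions.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical layout: the actual data types come first, so the value
// of None equals the number of actual data types.
enum DataType {
    Double,
    Integer,
    String,
    None,            // first value that is not an actual data type
    Number,          // generic numeric (Double or Integer)
    Any,
    numberof = None  // number of actual data types; not a separate value
};

// An array subscripted by data type needs exactly numberof entries,
// with no dummy placeholder for a separate numberof member.
const std::array<std::string, numberof> dataTypeName {"Double", "Integer", "String"};
```

Because numberof shares its value with None, adding an actual data type before None automatically increases the count.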

[commit 016c120804] [commit c6012c5600]