Saturday, November 10, 2012

Parser Errors – New Test

There are seven parser errors that can occur, but they fall into only two major types - an unrecognizable character error and errors in a numerical constant.  To test each of these seven errors, a new translator test (#14) was created covering all the places during translation where each error can occur.  If no instances were missed, there are 16 such places for each of the seven errors, for a total of 112 test inputs.

The translator was temporarily modified to output the string "PARSER:" in front of the parser errors so they can easily be seen.  This does not affect any of the current expected test results, since none of the current translator tests have parser errors (an obvious oversight).

For now, the expected results for this new test contain the current output, but they will be updated to the desired "expected such-and-such" error messages as the errors are corrected.  The numeric errors are only appropriate if the translator was expecting a numerical constant (in other words, an operand), so the existing parser errors will be changed to the "expected such-and-such" format, with the possible exception of the "constant is out of range" error.

Note that because of the recent changes to the regression test scripts (which now automatically detect tests), no modifications were needed to add this new test.

[commit  aa9271b8c9]

Translator Class Usage

Just like the Parser class, a single instance of the Translator class is created and a reference to it is passed around the program.  Unlike the Parser class, the Translator class has many more member variables, many of which are complex types (mainly stacks).  So that all these members do not need to be initialized for each new instance, the single instance design will remain for now.

There is a loop that performs the translation of an input line - the test translate input routine - which also prints the results of the translation.  This translation loop should really be handled within the Translator class, which should either return the translated output or an error.  The test routine should call this and then handle printing the results.

Therefore, a new setInput() function was implemented in the Translator class (the function name was chosen to mirror the Parser class).  The functionality of the start() and getOperateState() functions was moved to this new function, which also creates its own Parser instance.  A boolean success/fail flag is returned.

Upon success, the caller can obtain the output using the Translator output() function, renamed from getResults().  If an error is found, the Translator will clean up its internal variables and save the token at which the error was found along with an error message.  The caller can access these through the new errorToken() and errorMessage() access functions.
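The calling sequence described above can be sketched roughly as follows.  This is only an illustration, not the actual Translator code: the Token layout, the toy "translation" (copying characters and rejecting unknown ones), the translateLine() helper and the use of std::unique_ptr are all assumptions made to keep the example self-contained.  Only the names setInput(), output(), errorToken() and errorMessage() come from the text.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in; the real Token class carries much more.
struct Token {
    int column;         // position in the input line where the token starts
    std::string text;   // the offending text
};

class Translator {
public:
    // Translates one line; returns true on success, false on error.
    bool setInput(const std::string &input) {
        m_output.clear();
        m_errorToken.reset();
        m_errorMessage.clear();
        const std::string operators = "+-*/()";
        for (std::size_t i = 0; i < input.size(); ++i) {
            unsigned char c = static_cast<unsigned char>(input[i]);
            if (std::isspace(c)) continue;
            if (!std::isalnum(c) && operators.find(input[i]) == std::string::npos) {
                m_output.clear();   // clean up partial results on error
                m_errorToken.reset(
                    new Token{static_cast<int>(i), std::string(1, input[i])});
                m_errorMessage = "unrecognizable character";
                return false;
            }
            m_output.push_back(input[i]);
        }
        return true;
    }

    const std::string &output() const { return m_output; }
    const Token *errorToken() const { return m_errorToken.get(); }
    const std::string &errorMessage() const { return m_errorMessage; }

private:
    std::string m_output;
    std::unique_ptr<Token> m_errorToken;   // owned by the Translator
    std::string m_errorMessage;
};

// Caller side (hypothetical helper): translate, then report output or error.
std::string translateLine(Translator &translator, const std::string &line) {
    if (translator.setInput(line)) {
        return "Output: " + translator.output();
    }
    const Token *token = translator.errorToken();
    assert(token != nullptr);   // an error always leaves a token behind
    return "Error at column " + std::to_string(token->column) + ": "
        + translator.errorMessage();
}
```

The point of the arrangement is that the test routine no longer drives the translation loop itself; it only decides how to print whichever result comes back.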

The Translator will handle releasing the memory used by the error token.  There is a new internal function to set the error token.  If the error token pointer is already set, the old error token is freed.  The error token is also freed in the Translator's destructor if the pointer is set.  There is one problem with this scheme - Parser errors are returned directly, and these are not in the form "expected such-and-such" so errors could be confusing to the user.  This will be tackled next.
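A minimal sketch of the ownership scheme just described, under stated assumptions: the Token here is a stub, the liveCount counter exists only to make leaks visible, and reportError() is a hypothetical public entry point standing in for wherever the real Translator detects an error.  Only the idea of an internal set-error-token function and the destructor cleanup is taken from the text.

```cpp
// Stub token with a counter so that allocations and frees can be observed.
struct Token {
    static int liveCount;   // hypothetical, for demonstration only
    int column;
    explicit Token(int col) : column(col) { ++liveCount; }
    ~Token() { --liveCount; }
};
int Token::liveCount = 0;

class Translator {
public:
    Translator() : m_errorToken(nullptr) {}
    // The destructor frees the error token if one is still set.
    ~Translator() { delete m_errorToken; }

    // Copying would lead to a double free, so it is disabled.
    Translator(const Translator &) = delete;
    Translator &operator=(const Translator &) = delete;

    // Hypothetical entry point; takes ownership of the token.
    void reportError(Token *token) { setErrorToken(token); }

    const Token *errorToken() const { return m_errorToken; }

private:
    // Internal setter: frees any previous error token before storing the new one.
    void setErrorToken(Token *token) {
        delete m_errorToken;   // deleting a null pointer is a no-op
        m_errorToken = token;
    }

    Token *m_errorToken;
};
```

Because the caller only ever receives a pointer it does not own, it can read the token after a failed translation without having to remember to free it.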

One other minor change was made.  There was a setDefaultDataType() function in the Translator for setting the default data type of a token.  This was the last remaining in-line access function in the Translator header file, and it worked by examining the various members of the token.  No Translator variables were used, so it is more appropriate for this to be a Token class function.  It was therefore moved to the Token class as a new setDataType() function (taking no arguments, overloading the current function that takes a data type argument).
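The overload might look something like the sketch below.  The rule used to choose the default (a trailing '$' means string, a trailing '%' means integer, anything else is double) is purely an assumption for illustration, as are the DataType values and the Token members; the post only says that the function examines members of the token and needs nothing from the Translator.

```cpp
#include <string>
#include <utility>

enum class DataType { None, Double, Integer, String };   // hypothetical set

class Token {
public:
    explicit Token(std::string name, DataType dataType = DataType::None)
        : m_name(std::move(name)), m_dataType(dataType) {}

    DataType dataType() const { return m_dataType; }

    // Existing form: the caller supplies the data type.
    void setDataType(DataType dataType) { m_dataType = dataType; }

    // New overload: choose a default using only the token's own members.
    // The suffix rule below is an assumed example, not the real logic.
    void setDataType() {
        if (m_dataType != DataType::None) {
            return;   // already has a data type, leave it alone
        }
        if (!m_name.empty() && m_name.back() == '$') {
            m_dataType = DataType::String;
        } else if (!m_name.empty() && m_name.back() == '%') {
            m_dataType = DataType::Integer;
        } else {
            m_dataType = DataType::Double;
        }
    }

private:
    std::string m_name;
    DataType m_dataType;
};
```

Since the no-argument form reads nothing outside the token, it belongs with the Token class and the two forms can share one name.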

[commit 0dc86d1a4] [commit a88ed2f2bb]

Parser Class Usage

Currently, an instance of the Parser class is created for the program and a reference to it is passed around between the various routines.  The Parser class only has a few member variables and except for the reference to the table instance, these members are initialized for each new line that is parsed.  Therefore, it is not necessary to have a single parser instance for the program - a parser instance can be created as needed and destroyed when no longer needed.

The Parser class can be thought of as a function call, though a complex one.  The parser is given an input line and returns one token at a time until either the end of the line is reached or an error is found.  A new function could be implemented to return a list of tokens for the line being parsed.  There are two locations where the parser is currently being used: the test parse input and test translate input routines.

These two callers work slightly differently.  The test parse input routine just gets tokens until the line end or an error is found.  However, the test translate routine sets the Parser operand state from its own operand state (whether looking for an operand or not) before getting each token.  This is done so that the Parser can determine when it should be looking for a negative sign on a constant (operand state) or a minus operator (not operand state).
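To illustrate why the operand state has to be supplied before each token, here is a toy sketch.  It is not the real Parser: the token types, the integer-only constants, the setOperandState() name and the parseWithOperandState() driver are assumptions.  The tokens are collected into a vector only so the result can be inspected; the point is that the caller feeds the state in one token at a time.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class TokenType { Constant, Operator, End };   // hypothetical

struct Token {
    TokenType type;
    std::string text;
};

class Parser {
public:
    explicit Parser(const std::string &input)
        : m_input(input), m_pos(0), m_operandState(false) {}

    void setOperandState(bool operandState) { m_operandState = operandState; }

    Token getToken() {
        while (m_pos < m_input.size() && isSpace(m_pos)) ++m_pos;
        if (m_pos >= m_input.size()) return {TokenType::End, ""};
        std::size_t start = m_pos;
        // A '-' belongs to a constant only when an operand is expected
        // and a digit follows; otherwise it is the minus operator.
        if (m_input[m_pos] == '-' && m_operandState
            && m_pos + 1 < m_input.size() && isDigit(m_pos + 1)) {
            ++m_pos;
        }
        if (isDigit(m_pos)) {
            while (m_pos < m_input.size() && isDigit(m_pos)) ++m_pos;
            return {TokenType::Constant, m_input.substr(start, m_pos - start)};
        }
        ++m_pos;   // any other single character is treated as an operator here
        return {TokenType::Operator, m_input.substr(start, 1)};
    }

private:
    bool isSpace(std::size_t i) const {
        return std::isspace(static_cast<unsigned char>(m_input[i])) != 0;
    }
    bool isDigit(std::size_t i) const {
        return std::isdigit(static_cast<unsigned char>(m_input[i])) != 0;
    }

    std::string m_input;
    std::size_t m_pos;
    bool m_operandState;
};

// Hypothetical driver playing the translator's role: it creates a Parser
// for the line and sets the operand state before requesting each token.
std::vector<Token> parseWithOperandState(const std::string &line) {
    Parser parser(line);   // created when needed, destroyed on return
    std::vector<Token> tokens;
    bool operandState = true;   // a line starts by expecting an operand
    for (;;) {
        parser.setOperandState(operandState);
        Token token = parser.getToken();
        if (token.type == TokenType::End) break;
        tokens.push_back(token);
        // After an operator an operand is expected; after an operand, not.
        operandState = (token.type == TokenType::Operator);
    }
    return tokens;
}
```

With the input "3 - -4", the first '-' arrives while no operand is expected and becomes an operator, while the second arrives in operand state and becomes part of the constant "-4".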

This implies an intimate use of the parser while a line is being translated, so a function to return a list of tokens was not implemented.  Instead, the code was modified to create a Parser instance when needed - in the two test routines - rather than passing a reference to a single Parser instance around.  This leads to the usage of the Translator class...