Thursday, November 6, 2014

Parsing Identifiers – Standard Library

The get identifier function was changed to use a standard library input stream (again through a temporary input string stream, like the previous functions).  This function used the scan word support function to look for a word; that support function was renamed more appropriately to get word.  Previously the REM command was checked for first because, unlike other commands, a space is not required after the command.  Now the get word function is called first to get a word, and if no valid word is found, an empty token pointer is returned.

The first check on a valid word is whether the word starts with the letters of the REM command name, using the std::equal function with the no-case compare lambda function.  Since the REM command name is still in the table as a QString, it is temporarily converted to a standard string.  For a remark, the input position is set to the beginning of the remark string, and the word string is then replaced with the rest of the characters on the line, from which the token is created and returned.
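The REM prefix check described above can be sketched as follows.  This is only an illustration under assumptions: the helper name startsWithNoCase and the body of the lambda are hypothetical, not the project's actual code, and the command name is passed in as a standard string rather than converted from a QString.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns true if 'word' begins with 'name',
// ignoring the case of the letters.
bool startsWithNoCase(const std::string &word, const std::string &name)
{
    // Case-insensitive character comparison (assumed form of the
    // "no case compare" lambda mentioned in the text).
    auto noCaseCompare = [](char a, char b) {
        return std::toupper(static_cast<unsigned char>(a))
            == std::toupper(static_cast<unsigned char>(b));
    };

    // The length check keeps std::equal from reading past the end
    // of a word that is shorter than the command name.
    return word.size() >= name.size()
        && std::equal(name.begin(), name.end(), word.begin(), noCaseCompare);
}
```

With this sketch, a word such as "Remark" matches "REM" because no space is required after the command, while "RE" and "PRINT" do not match.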

An issue was discovered with the parsing of defined function tokens (identifiers that start with "FN").  A valid defined function name should have a letter after the "FN", but there was no check for a letter, or even a check that there were any characters after the "FN", so identifiers like FN and FN1 were incorrectly accepted as defined function tokens.  Instead of rejecting these names as invalid defined function names, the decision was made to allow these names and treat them as regular names (variables and arrays).
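A minimal sketch of the corrected check, assuming the word string holds only the name characters (no data type character or parenthesis).  The function name isDefinedFunctionName is hypothetical and is not taken from the project.

```cpp
#include <cctype>
#include <string>

// Hypothetical check: a defined function name is "FN" (in any case)
// followed by at least one more character, the first being a letter.
// Anything else (for example FN or FN1) is treated as a regular name.
bool isDefinedFunctionName(const std::string &word)
{
    if (word.size() < 3) {
        return false;  // nothing after the "FN" prefix
    }
    const bool hasPrefix =
        std::toupper(static_cast<unsigned char>(word[0])) == 'F'
        && std::toupper(static_cast<unsigned char>(word[1])) == 'N';
    return hasPrefix
        && std::isalpha(static_cast<unsigned char>(word[2])) != 0;
}
```

Under this sketch FNA and fnTest are defined function names, while FN, FN1 and plain names such as F or X are left as regular identifiers.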

The get word support function was modified in the same way (by using a temporary input stream).  It previously returned three values: the position after the word found, the data type of the word, and whether the word has a parenthesis.  Two of these were returned through reference arguments.  The position is no longer needed since it can be obtained from the input stream.  However, a string for the word is now needed because the word is read from the stream.  A new Word structure was added to hold the word string, data type and parentheses flag, and this structure is now returned.  An empty word string indicates that no valid word was found.

To simplify the handling of two-word commands in the get identifier function, the check that the second word does not have a data type or parenthesis was moved to the get word function.  If the second word does have one, an empty word string is returned and the input stream is repositioned back to the beginning of the word.  A word type argument was added with the values first and second to enable this second word checking.
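The Word structure and the second word checking might look something like the sketch below.  Everything here is an assumption made for illustration: the enumerator names, the member names, the rule that a word is a letter followed by letters and digits, and the data type characters (%, $ and #) are not taken from the project's source.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

enum class DataType { None, Double, Integer, String };  // assumed values
enum class WordType { First, Second };

struct Word {
    std::string string;                  // empty means no valid word found
    DataType dataType = DataType::None;
    bool paren = false;                  // word was followed by '('
};

Word getWord(std::istream &input, WordType wordType)
{
    Word word;
    const std::istream::pos_type start = input.tellg();

    // A word must begin with a letter (assumed rule).
    int c = input.peek();
    if (c == std::char_traits<char>::eof() || !std::isalpha(c)) {
        input.clear();  // peek() at end of input sets eofbit
        return word;
    }
    while ((c = input.peek()) != std::char_traits<char>::eof()
            && std::isalnum(c)) {
        word.string += static_cast<char>(input.get());
    }

    // Optional data type character (assumed set of characters).
    c = input.peek();
    if (c == '%') { word.dataType = DataType::Integer; input.get(); }
    else if (c == '$') { word.dataType = DataType::String; input.get(); }
    else if (c == '#') { word.dataType = DataType::Double; input.get(); }

    // Optional opening parenthesis.
    if (input.peek() == '(') {
        word.paren = true;
        input.get();
    }
    input.clear();  // drop eofbit so tellg() and seekg() keep working

    // A second word of a command may not have a data type or parenthesis;
    // if it does, back up to where the word started and report no word.
    if (wordType == WordType::Second
            && (word.dataType != DataType::None || word.paren)) {
        input.seekg(start);
        return Word{};
    }
    return word;
}
```

Clearing the stream state before returning matters in this sketch because peeking at the end of the input sets the end-of-file flag, after which tellg() would report a failure instead of the position.  A caller handling a two-word command would read the first word, skip the space, and then call getWord again with WordType::Second.  An empty result leaves the stream at the start of the second word so that it can be parsed as something else.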

The get identifier function uses the two-word table search function, which was modified to take two standard strings.  The token constructor for identifiers was modified to take a standard string argument, which is temporarily converted to a C-style string to initialize the QString token member.  Several invalid defined function names were added to parser test #2 (identifiers) to verify that these names are treated as plain identifiers (with and without parentheses).

[branch parser commit dbbd9fe054]