Wednesday, January 6, 2010

Parsing Remarks

Special Note: While thinking about the Parser, I realized that I had forgotten to consider comments, or in BASIC terminology, “remarks” (i.e. the REM command). I have updated the posts Parser and Parser – Token Identification with additional information about remarks. Remarks will be their own tokens, which will include the string of the comment. There will be two forms of remarks: the REM command and the single-quote.

Parsing remarks will require special handling. When a single-word command is found and it is REM (REM will be a single-word command), the Parser will read the rest of the line and return the REM remark token with the string of the comment. If the REM is not at the beginning of the line, the Translator will check that there is a colon in front of it (ignoring white space).
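The REM handling described above might be sketched as follows. This is only an illustration in Python, not the project's actual code: the `Token` class, its field names, and the `parse_command_word` function are all hypothetical, and the colon check is left out because it belongs to the Translator.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str       # hypothetical token kind: "command" or "remark"
    code: str       # hypothetical internal code, e.g. "REM" or "PRINT"
    text: str = ""  # comment string, used only by remark tokens
    end: int = 0    # index just past the token in the line


def parse_command_word(line: str, start: int) -> Token:
    """Read one word at `start`; if it is REM, take the rest of the line."""
    pos = start
    while pos < len(line) and line[pos].isalpha():
        pos += 1
    word = line[start:pos].upper()
    if word == "REM":
        # The remark token carries everything after REM as the comment.
        return Token("remark", "REM", line[pos:], len(line))
    return Token("command", word, "", pos)
```

For example, `parse_command_word("REM set up totals", 0)` gives a remark token whose `text` is the remainder of the line, while `parse_command_word("PRINT A", 0)` gives an ordinary command token that stops after the word.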

For the single-quote form of remarks, the single-quote will be treated as an operator symbol character and will be marked as a single-character operator. When a single-character operator is found and it is a single-quote, the Parser will read the rest of the line and return the single-quote operator token, marked as a remark, with the string of the comment. The Translator will treat the comment as the end of the line and, if the single-quote was not at the beginning of the line, will make sure the command up to that point is valid. A single-quote in the middle of a string constant will not be treated as a comment, because at that point the Parser will be collecting the characters of the string constant up to the closing double-quote. In other words, the Parser will not be looking for an operator.
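The reason a single-quote inside a string constant is safe can be shown with a small sketch. It is a simplification in Python: the real Parser works one token at a time, whereas this hypothetical `split_trailing_remark` function scans a whole line and only tracks whether it is currently inside a string constant.

```python
def split_trailing_remark(line: str):
    """Split a line into (code, comment); comment is None if there is no remark.

    A single-quote only starts a remark when the scan is outside a
    string constant, i.e. outside a pair of double-quotes.
    """
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            # Entering or leaving a string constant.
            in_string = not in_string
        elif ch == "'" and not in_string:
            # Everything after the single-quote is the comment string.
            return line[:i], line[i + 1:]
    return line, None
```

With this sketch, the line `PRINT "it's" ' note` splits at the second single-quote, since the first one is seen while a string constant is still open.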

Parsing Operators

When neither an identifier nor a constant is found, the Parser will assume that the token is an operator. The Operator Table is searched for the single character. If it is not found in the table, a syntax error is reported. Each table entry will have a two-character flag indicating whether the operator can also start a two-character operator (for example, < and > will have this flag set). If the entry does not have the two-character flag set, the internal code for the operator is returned.

For table entries with the two-character flag set, if the next character is not white space, the Operator Table is searched for the two-character operator. If it is found, the internal code for the two-character operator is returned. Otherwise, the internal code for the single-character operator is returned (assuming the single character is itself a valid operator; if not, a syntax error is reported).
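The operator lookup described above could look roughly like this. It is a minimal sketch in Python under stated assumptions: the contents of `OPERATOR_TABLE`, the internal code names, and the `parse_operator` function are invented for illustration, and a code of `None` is used here to stand for a character that may only start a two-character operator.

```python
# Hypothetical Operator Table: symbol -> (internal code, two-character flag).
OPERATOR_TABLE = {
    "+": ("ADD", False),
    "-": ("SUB", False),
    "=": ("EQ", False),
    "<": ("LT", True),   # may start <= or <>
    ">": ("GT", True),   # may start >=
    "<=": ("LE", False),
    ">=": ("GE", False),
    "<>": ("NE", False),
}


def parse_operator(line: str, pos: int):
    """Return (internal code, index past the operator) for the operator at pos."""
    entry = OPERATOR_TABLE.get(line[pos])
    if entry is None:
        raise SyntaxError(f"unrecognized operator {line[pos]!r}")
    code, two_char = entry
    # Only look for a two-character operator when the flag is set and
    # the next character exists and is not white space.
    if two_char and pos + 1 < len(line) and not line[pos + 1].isspace():
        pair = OPERATOR_TABLE.get(line[pos:pos + 2])
        if pair is not None:
            return pair[0], pos + 2
    if code is None:
        # The character is only valid as the start of a two-character operator.
        raise SyntaxError(f"invalid operator {line[pos]!r}")
    return code, pos + 1
```

In this sketch, `parse_operator("A<=B", 1)` finds the two-character entry, while `parse_operator("A< =B", 1)` falls back to the single-character `<` because the next character is white space.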