Sunday, June 27, 2010

Translator – New Token Modes

Translating INPUT statements will require additional token modes. The current token modes are:
Command – Translator is expecting a command token (or start of an assignment)
Assignment – Translator is expecting an item for an assignment statement
EqualAssignment – An equal token was received when the mode was Command or Assignment; another equal token would indicate a multiple assignment statement (commas are not permitted)
CommaAssignment – A comma token was received when the mode was Command or Assignment; another comma token would indicate continuation of a multiple assignment statement (an equal token would indicate the end of the list and the beginning of the expression)
Expression – Translator is expecting operands or operators depending on the current state
When a semicolon appears at the end of an INPUT statement, no further tokens should be received except for an end-of-statement token (EOL, colon, ELSE, or ENDIF). A new mode is required so the Translator can make sure that no other tokens are received:
EOS – Translator is expecting an end of statement token only
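The mode list with the new EOS mode can be pictured as a simple enumeration plus one check. The following is only a minimal sketch in Python for illustration: the names TokenMode and check_eos_mode, the string token kinds, and the error text "expected end of statement" are all assumptions and are not taken from the Translator source.

```python
from enum import Enum, auto


class TokenMode(Enum):
    """Token modes described above (Python names are hypothetical)."""
    COMMAND = auto()
    ASSIGNMENT = auto()
    EQUAL_ASSIGNMENT = auto()
    COMMA_ASSIGNMENT = auto()
    EXPRESSION = auto()
    EOS = auto()  # new: only an end-of-statement token is acceptable


# Assumed token kinds that end a statement: EOL, colon, ELSE and ENDIF.
END_OF_STATEMENT_TOKENS = {"EOL", ":", "ELSE", "ENDIF"}


def check_eos_mode(mode, token):
    """Return an error message if token is not allowed in EOS mode.

    Returns None when the token is acceptable or when the mode is not EOS.
    The error text is an assumption made for this sketch.
    """
    if mode is TokenMode.EOS and token not in END_OF_STATEMENT_TOKENS:
        return "expected end of statement"
    return None
```

With this sketch, a variable token received after the trailing semicolon of an INPUT statement is rejected, while an EOL or colon is accepted.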
An INPUT statement contains the variable(s) that are to be input. Expressions are not allowed (except for the string expression after the PROMPT keyword, or within subscripts of array variables). The INPUT translation could be implemented to check whether the token on top of the done stack has the reference flag set, and if not, report an “expected variable” error. But that could lead to strange errors being reported. Consider this example (shown with the translation of the expression):
INPUT A*B+C  A B * C +
The + will be on top of the done stack after this expression is translated (being the result of the translated expression). The + token will not have the reference flag set since it is an operator. The INPUT is expecting a reference, so it would report “expected variable” pointing to the + token. This would be very confusing: why would a variable be expected at the +? The correct error should be “expecting comma or end of statement” pointing to the * token. A new mode is required so the Translator can make sure no operators (except for the end of expression operators comma, semicolon and EOL) are received:
Reference – Translator is only expecting reference tokens and end of expression operator
Reference tokens include tokens without parentheses and tokens with parentheses. However, these types of tokens could be variables, arrays or functions, and the Translator is not able to determine which. Therefore, the Encoder could still find errors if a user function was placed in an INPUT statement. Lastly, sub-string functions (while valid in assignment statements) are not valid in INPUT statements, so the Reference mode needs to check for these and report an error.
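To illustrate why the check has to happen as each token arrives rather than on the done stack afterwards, here is a minimal Python sketch of a Reference mode check. It is not the Translator's code: the tokenizer, the function names and the zero-based column reporting are assumptions, and for brevity the sketch ignores tokens with parentheses and the sub-string function check.

```python
import re

# Simplified tokenizer: a name (optionally ending in $ or %) or one operator.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<name>[A-Za-z][A-Za-z0-9]*[$%]?)|(?P<op>[-+*/,;]))"
)


def tokenize(text):
    """Return (column, kind, text) tuples, ending with an 'eol' token."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unrecognized character near column {pos}")
        kind = match.lastgroup  # 'name' or 'op'
        tokens.append((match.start(kind), kind, match.group(kind)))
        pos = match.end()
    tokens.append((len(text), "eol", ""))
    return tokens


def check_reference_list(text):
    """Check the operands of an INPUT statement in Reference mode.

    Returns None if the list is acceptable, otherwise a tuple of
    (zero-based column, error message) for the first offending token.
    """
    expecting_reference = True
    previous = None
    for column, kind, token in tokenize(text):
        if expecting_reference:
            if kind == "eol" and previous == ";":
                return None  # trailing semicolon ends the statement
            if kind != "name":
                return column, "expected variable"
            expecting_reference = False
        else:
            if kind == "eol":
                return None
            if token not in (",", ";"):
                # any other operator is rejected where it appears
                return column, "expecting comma or end of statement"
            expecting_reference = True
        previous = token
    return None
```

For the operand text A*B+C, this sketch reports “expecting comma or end of statement” at column 1, which is the * token, rather than “expected variable” at the +.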

Translator – INPUT command

Similar to the translation of the PRINT statement, a lot of the translation work of the INPUT statement will take place in the semicolon and comma token handlers, with the INPUT and INPUT PROMPT command handlers being called at the end of the statement. The codes have been renamed for consistency and clarity: the InputGet code will now be InputBegin, and InputPromptStr and InputPromptTmp will now be InputBeginStr and InputBeginTmp (the Tmp versions are not handled by the Translator).

There will be two command flags to keep track of the INPUT statement translation. The first is the InputBegin command flag, which indicates whether an InputBegin has been appended to the output yet. The semicolon and comma token handlers will use this flag to determine if the InputBegin code has been appended (INPUT) and whether a prompt string expression result is expected (INPUT PROMPT).

The semicolon token handler will also use the InputBegin command flag to determine if it is at the end of the statement, which determines whether to set the second command flag, InputKeep. The command handlers, called at the end of the statement, will use the InputKeep flag to determine whether the InputKeep sub-code flag should be set in the Input and InputPrompt codes that end the translated statement.
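The interaction of the two command flags can be sketched as follows. This is only an illustration in Python covering the INPUT PROMPT case, under the assumption that a semicolon received after the begin code has been appended is the one at the end of the statement. The class and function names, the list used for the output, and the (code, keep) tuples are hypothetical, and the translation of the input variables themselves is left out.

```python
from enum import Flag


class CommandFlag(Flag):
    """Command flags for the INPUT translation (Python names assumed)."""
    INPUT_BEGIN = 1  # the begin code has been appended to the output
    INPUT_KEEP = 2   # a semicolon was found at the end of the statement


def handle_semicolon(flags, output):
    """Sketch of the semicolon token handler for INPUT PROMPT."""
    if CommandFlag.INPUT_BEGIN in flags:
        # Begin code already appended, so this semicolon is assumed to be
        # at the end of the statement: remember it with InputKeep.
        return flags | CommandFlag.INPUT_KEEP
    # First semicolon: the prompt string expression is complete, so
    # append the begin code and record that it is now in the output.
    output.append(("InputBeginStr", False))
    return flags | CommandFlag.INPUT_BEGIN


def handle_input_prompt_command(flags, output):
    """Sketch of the command handler called at the end of the statement."""
    keep = CommandFlag.INPUT_KEEP in flags
    # The second tuple element stands in for the InputKeep sub-code flag.
    output.append(("InputPrompt", keep))
```

Starting from CommandFlag(0), calling handle_semicolon twice and then handle_input_prompt_command leaves the output holding the begin code followed by an InputPrompt code with the keep indication set.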