The colon token will work very similarly to the end-of-line token and so will have the end statement flag in its table entry. The difference from the end-of-line token is that instead of terminating the translation of a line, the token mode will be set to command for the next statement.
There will be two error conditions. First, two consecutive colons will not be allowed. While a second colon would not affect anything and is allowed in other BASICs, it will be considered an error here since there is no point in allowing it. Second, a colon will not be allowed at the end of the line with no command after it.
When a colon token is received, the colon sub-code will be set in the command token on top of the command stack. The command token will be appended to the end of the statement when the command has been processed (by the command handler).
However, there is an issue with print statements. The actual print command token may not be appended to the output if there is a semicolon, comma or print function at the end of the print statement. In this case, the print command handler will have to transfer the colon sub-code to the last print code that was appended to the output, which may be a print type (when a semicolon is not needed), comma, semicolon, SPC or TAB.
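The colon rules above can be sketched roughly as follows. This is a minimal Python illustration, not the project's code: `Command`, `COLON_SUBCODE`, `TranslateError`, `mark_colons` and the error messages are hypothetical names, and each statement is reduced to a single string so that only the colon handling is shown. Rejecting a colon at the very start of a line is also an assumption, since only the double-colon and trailing-colon cases are named above.

```python
from dataclasses import dataclass

COLON_SUBCODE = 0x01  # hypothetical bit value for the colon sub-code


class TranslateError(Exception):
    """Raised for the colon error conditions (hypothetical error type)."""


@dataclass
class Command:
    text: str
    subcode: int = 0


def mark_colons(tokens):
    """Turn a list of statements and ':' separators into commands.

    A command followed by a colon gets the colon sub-code set; the colon
    itself is not stored as a separate item.
    """
    commands = []
    expect_command = True  # token mode "command"
    for tok in tokens:
        if tok == ":":
            if expect_command:
                # two colons in a row (or a colon with no statement before it)
                raise TranslateError("expected command")
            commands[-1].subcode |= COLON_SUBCODE
            expect_command = True  # next statement starts in command mode
        else:
            commands.append(Command(tok))
            expect_command = False
    if expect_command and commands:
        # the line ended right after a colon
        raise TranslateError("expected command after colon")
    return commands
```

For example, `mark_colons(["PRINT B", ":", "A=B"])` sets the sub-code on the first command only, while `["A=B", ":", ":", "B=0"]` and `["A=B", ":"]` both raise the error.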
Wednesday, March 30, 2011
Tuesday, March 29, 2011
Colon – Statement Separator
The Colon is used to separate statements in the BASIC language. Though it is not part of ANSI BASIC, it is part of the more common BASICs (GW-Basic, QBasic, FreeBasic, etc.). Before moving to the translation of colon tokens, like everything else, the action during run-time must be defined. However, the Colon does not actually do anything at run-time.
Colons don't actually need to be stored separately in the program; however, for them to be reproduced, something needs to be put into the internal program. The assumption that there will be a colon at the end of each statement except at the end of the line is not sufficient. Consider this statement:
IF A>B THEN PRINT A ELSE PRINT B:A=B:B=0.0
There is no colon after the first print statement. To reproduce colons properly, there will be a colon sub-code set for a command token that has a colon following the statement. For the statements after the ELSE in the example above, the colon sub-code will be set as shown in this translation:
B PrintDbl'Colon' A<ref> B Assign'Colon' B<ref> 0.0 Assign
When a line is reproduced, the Recreator will add a colon after a statement that has the colon sub-code set.
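How the Recreator might use the sub-code can be sketched as below. This is only an illustration in Python: the `(text, has_colon)` pairs stand in for already recreated statements and the colon sub-code of their command tokens, and `recreate_statements` is a hypothetical name.

```python
def recreate_statements(statements):
    """Join recreated statements, adding a colon only where the sub-code is set."""
    line = ""
    for text, has_colon in statements:
        line += text
        if has_colon:
            line += ":"
    return line


# The statements after the ELSE in the example above.
after_else = [("PRINT B", True), ("A=B", True), ("B=0.0", False)]
print(recreate_statements(after_else))  # prints PRINT B:A=B:B=0.0
```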
Sunday, March 27, 2011
Translator – A Unary Operator Curiosity
One of the test statements created for testing the unary operator fix was:
A = -B^NOT C% + -D*NOT E%
The intention of this statement was to test the NOT unary operator in front of the second operand of both the exponentiation and multiplication operators. The translation of this statement was expected to be as follows, with B C% NOT ^* Neg being the first operand and D Neg E% NOT *%2 being the second operand of the addition:
A<ref> B C% NOT ^* Neg D Neg E% NOT *%2 + Assign
However, what was produced was this unexpected translation:
A<ref> B C% D Neg E% NOT *%2 +%1 Cvtint NOT ^* Neg Assign
Upon reviewing the precedence of the operators (see Translator – Operator Precedence) and the code, it turns out that this translation was correct. The ADD is higher precedence than the NOT, so the operands of the ADD are C% and the -D*NOT E% expression (with MUL (*%2) being higher precedence than ADD, its operands are -D and NOT E%).
So while the exponentiation is highest precedence, giving NOT a low precedence allowed the ADD to bind the rest of the expression to the NOT, which becomes the second operand of the exponentiation (NOT applied to C% + -D*NOT E%), with the first negation being the final operator. In C/C++, the not (!) and negation (-) operators are very high precedence (and there is no exponentiation operator). But here, NOT was given a low precedence just above the other logical operators but below the math operators (see Translator – Operator Precedence for reasoning). Normally the NOT operator would probably not be used in the same expression as exponentiation like above.
Translator – Unary Operator Problem
While testing the negative constant changes, a new problem was discovered with unary operators, specifically this statement:
A = ---B
This produced a “done stack empty” bug error at the first negation token. The problem occurred because the second negation operator forced the first negation operator from the hold stack because it was greater or equal precedence, and when checking the operand of the first negation operator, there was nothing on the done stack. Here are some additional examples:
A = B*-C
A = B^-C
The first statement translated correctly because negation is higher precedence than multiplication, leaving multiplication on the hold stack. However, the second statement failed because negation is lower precedence than exponentiation, forcing exponentiation from the hold stack with only one operand on the done stack, which generated the done stack empty bug error. A new rule was needed for unary operators.
Basically, unary operators should not force any tokens (unary operators, binary operators, arrays, or functions) from the hold stack regardless of their precedence because not all of the operands of those tokens have been received yet (the unary operator and its operand will be one of their operands, and it has not been fully received yet). As currently implemented, other non-unary operators should still force unary operators from the hold stack if the unary operator has higher precedence.
The check to force tokens from the hold stack was changed to require both that the precedence of the operator on the hold stack is higher than that of the current operator and that the current token is not a unary operator. Unary operators will now not force other tokens from the hold stack, but other tokens will still force unary operators from the hold stack if higher in precedence. While testing this change, a curious result was produced from one of the test statements...
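The effect of the new rule can be seen in a small hold stack / done stack sketch. This is a simplified Python illustration under several assumptions, not the Translator itself: the numeric precedence values are made up (only their relative order follows the text), the done stack is reduced to an operand count, data type suffixes and the assignment are left out, the expression must be given as space-separated tokens, and the comparison uses greater-or-equal precedence as in the failure description above. The names `translate`, `unary_rule` and `DoneStackEmpty` are hypothetical.

```python
# Hypothetical precedence values; only the relative order is from the text.
PRECEDENCE = {"^": 50, "Neg": 48, "*": 46, "/": 46, "+": 40, "-": 40, "NOT": 20}
BINARY = {"^", "*", "/", "+", "-"}
UNARY = {"Neg", "NOT"}


class DoneStackEmpty(Exception):
    """Stands in for the "done stack empty" bug error."""


def translate(expr, unary_rule=True):
    """Translate a space-separated expression to RPN text."""
    output, hold = [], []
    done = 0              # number of complete operands on the done stack
    operand_state = True  # an operand (or unary operator) is expected

    def emit(op):
        nonlocal done
        needed = 1 if op in UNARY else 2
        if done < needed:
            raise DoneStackEmpty(f"done stack empty at {op}")
        done -= needed - 1  # operands are replaced by a single result
        output.append(op)

    for tok in expr.split():
        if operand_state and tok in ("-", "NOT"):
            op = "Neg" if tok == "-" else "NOT"  # minus in operand state is negate
        elif not operand_state and tok in BINARY:
            op = tok
            operand_state = True
        elif operand_state and tok not in BINARY:
            output.append(tok)  # plain operand
            done += 1
            operand_state = False
            continue
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        # New rule: a unary operator never forces tokens from the hold stack.
        blocked = unary_rule and op in UNARY
        while hold and not blocked and PRECEDENCE[hold[-1]] >= PRECEDENCE[op]:
            emit(hold.pop())
        hold.append(op)
    while hold:
        emit(hold.pop())
    return " ".join(output)
```

With `unary_rule=False` the sketch reproduces the failures described above: `"- - - B"` and `"B ^ - C"` raise `DoneStackEmpty`, while `"B * - C"` still gives `B C Neg *`. With the rule enabled, they give `B Neg Neg Neg` and `B C Neg ^`.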
Parser – Negative Constants
Negative constants were previously not considered by the Parser, which interpreted a minus as the subtract operator. The Translator then changed it to a negate operator when it appeared in the operand state. Consider these two examples (along with their current translations):
A = B-1.5      A B 1.5 Sub Assign
A = -1.5+B     A 1.5 Neg B Sub Assign
The reason for the Parser to not look for signs on numerical constants can be seen in the first example. If the Parser produced the four tokens A = B -1.5, the Translator would generate an “expected operator” error at the -1.5 token since a second operand token was received when it was expecting a binary operator. The second example produces an unnecessary negate token after the constant. While perfectly valid, this is not desirable.
In order for the Parser to correctly interpret negative signs on numerical constants, it needs to be aware of whether the Translator is in operand state or not. If in operand state, the Parser can look for a negative sign in front of a number constant, otherwise a minus should be interpreted as an operator.
A new operand state flag was added to the Parser with an access function to set its value (which is initialized to off). The Parser get number routine was modified to have a new sign flag used to determine if a negative sign was found first. This flag will also prevent multiple negative signs. However, the routine will only check for a negative sign if no digits or decimal point have been seen yet and if the new operand state flag is on.
An access function was added to the Translator to get the current operand state (either operand or operand-or-end state). Before calling the Parser get token routine, the Parser operand state is set from the Translator's current operand state. While testing, a problem was discovered with unary operators...
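The sign handling in the get number routine can be sketched as follows. This is a minimal Python illustration, assuming a much simpler number format than the real Parser (no exponents); the class and method names are hypothetical, and only the operand state flag, the sign flag and the "no digits or decimal point yet" condition are taken from the description above.

```python
class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.operand_state = False  # initialized to off

    def set_operand_state(self, state):
        """Access function used by the caller before getting a token."""
        self.operand_state = state

    def get_number(self):
        """Return the number text at the current position, or None."""
        i = self.pos
        sign = digits = decimal = False
        while i < len(self.text):
            ch = self.text[i]
            if ch.isdigit():
                digits = True
            elif ch == ".":
                if decimal:
                    break  # a second decimal point ends the number
                decimal = True
            elif (ch == "-" and self.operand_state
                  and not sign and not digits and not decimal):
                sign = True  # leading negative sign, allowed only once
            else:
                break
            i += 1
        if not digits:
            return None  # not a number (for example a lone minus)
        token = self.text[self.pos:i]
        self.pos = i
        return token
```

In operand state, `-1.5+B` yields the constant `-1.5`; outside operand state the same text yields no number, so the minus is left to be read as an operator.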
Saturday, March 26, 2011
Parser – Tokens With Parentheses
While correcting issues with define function tokens, it was noticed that it is not necessary to also store the opening parenthesis in the string field of the token. This also includes the generic tokens with parentheses. The parenthesis is not necessary because there are separate token types to identify tokens with and without parentheses.
It is also advantageous to not store the parenthesis so that an array or define function name can be found in the dictionary. For define functions, take the code snippet:
DEF FNHypot(X,Y)
FNHypot=SQR(X*X+Y*Y)
END DEF
Z = FNHypot(3,4)
Both the FNHypot( and FNHypot tokens appear, which represent the same function. If the parenthesis were stored in the dictionary with the function name, it would require complicated string comparisons to figure out that the FNHypot token is the same function. This same issue applies to regular function names.
A similar issue also applies to arrays. When functions and subroutines are implemented, there will be a feature to allow an entire array to be passed to a function or subroutine. Exactly how this will work has not been defined yet, but this code snippet shows how it would look:
DIM Array(10)
CALL subroutine(A)
The changes for removing the parentheses from these tokens were very simple. In the Parser get identifier routine, when creating the string for these tokens, the length provided to the string constructor was changed to be one less than the actual length so the parenthesis would not be included. In the Translator, when reporting an error at the open parenthesis of a define function with parentheses token, the minus one was removed since the length is now one less. Finally, in the token test output routines, an open parenthesis was added to the output of these tokens.
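The benefit for dictionary lookups can be illustrated with a short sketch. This is not the Parser's get identifier routine: `get_identifier`, the token type names and the rule for recognizing define function names (an `FN` prefix) are assumptions made for the example, written in Python. The point is only that the stored name excludes the parenthesis while the token type records whether one was present.

```python
def get_identifier(text, pos=0):
    """Return (token_type, name); the open parenthesis is not part of name."""
    end = pos
    while end < len(text) and (text[end].isalnum() or text[end] in "$%"):
        end += 1
    name = text[pos:end]
    has_paren = text[end:end + 1] == "("
    is_def_func = len(name) > 2 and name[:2].upper() == "FN"
    if is_def_func:
        token_type = "DefFuncP" if has_paren else "DefFuncN"
    else:
        token_type = "Paren" if has_paren else "NoParen"
    return token_type, name


# Every appearance of the function maps to one dictionary entry.
dictionary = {}
for source in ("FNHypot(X,Y)", "FNHypot=SQR(X*X+Y*Y)", "FNHypot(3,4)"):
    token_type, name = get_identifier(source)
    dictionary.setdefault(name, len(dictionary))
```

After the loop the dictionary holds the single key `FNHypot`, with no string comparison needed to match `FNHypot(` against `FNHypot`.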
Project – Tokens Status Enumeration
Each time a new error (token status) was added, renamed or deleted, two changes were needed. Both the token status enumeration (include file) and the message array (source file) needed to be changed. Both changes had to be made correctly or the two files would be out of sync. To check for problems, code had been added to initialization to check for duplicates and missing entries.
Similar to automatic generation of the code enumeration from the table entries, the token status enumeration will also be generated automatically. Each message array element was a structure containing a token status value and a pointer to a message string. The elements were changed to just the message string with the name of the token status in a comment at the end of the line.
The codes awk script was renamed to enums and was modified to also read the token status message array to generate the token status enumeration automatically by reading the name in the comment after the message string. The name of the output file was changed from codes.h to autoenums.h to be more generic and allow for additional automatic enumerations.
During initialization, in addition to checking for duplicates and missing entries, a translation index array was built to translate from token status value to index. Both the checking and the translation array were removed since they are no longer necessary. The awk script will check for duplicates and the token status is now the same as the index into the message array.
The error type template class is no longer needed for token status errors (but is still needed for the table entry errors). Also, the duplicate and missing errors were no longer needed and were removed. Some problems were found in this template class for table entries where the range error was not working because the wrong constructor was being called. Instead of storing indexes to the errors, the variables were changed to be the type of the template. This made the range error constructor unique.
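The generation step can be sketched as follows. The project's script is written in awk; this Python version is only an illustration of the idea, and the layout of the message array lines, the status names in the sample and the function name `generate_enum` are all assumptions. Each message line is expected to hold a quoted string followed by a comment naming the token status, so the position in the array becomes the enumeration value.

```python
import re

# A quoted message string, an optional comma, then "// Name" at end of line.
ENTRY = re.compile(r'^\s*"(?:[^"\\]|\\.)*"\s*,?\s*//\s*(?P<name>\w+)\s*$')


def generate_enum(lines, enum_name="TokenStatus"):
    """Return (enum source text, list of names) built from message array lines."""
    names = []
    for line in lines:
        match = ENTRY.match(line)
        if not match:
            continue  # not a message entry (declaration, brace, blank line)
        name = match.group("name")
        if name in names:
            raise ValueError(f"duplicate token status name: {name}")
        names.append(name)
    body = "".join(f"\t{name},\n" for name in names)
    return "enum " + enum_name + " {\n" + body + "};\n", names


sample = [
    'static const char *message_array[] = {',
    '\t"expected operator",  // ExpOp',
    '\t"done stack empty",  // DoneStackEmpty',
    '};',
]
enum_text, enum_names = generate_enum(sample)
```

Because the names are emitted in array order, `enum_names.index(name)` is both the enumeration value and the index of the message, which is why no translation array is needed.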
Friday, March 25, 2011
Translation – INPUT Command (Release)
The remaining problem was due to the input command handler deleting the token passed in (the token terminating the invalid string prompt expression). However, the caller (the call command handler routine) was also deleting the token since the command handler changed the token to point to the error token. The extra token delete was removed from the input command handler.
The INPUT command is fully working and ibcp_0.1.15-src.zip has been uploaded at Sourceforge IBCP Project along with the binary for the program. To support the INPUT command, several other changes were needed including making the token codes and table entry indexes one and the same, correcting print function issues, correcting sub-string assignment issues, handling assignment token mode differently, handling define function tokens correctly, implementing an end statement Translator state, and implementing the reference token mode.
Next up, a slight change to the direction of this project to make it a little more interesting. Instead of plodding along with the translation of more commands, work will begin on the other components including the encoder, dictionary, recreator, program (maintaining the internal program and the program editor), and the run-time module.
The goal is to get this BASIC working as there are (almost) enough commands to make a very simple BASIC program (input, assignments and output). Once this is working, more commands can be implemented. But first a couple of minor things will be implemented before proceeding with this new direction.
Thursday, March 24, 2011
Translation – INPUT Error Debugging
The main problem with the wrong tokens being reported for errors in the INPUT command was that the token with the error was not put into the command item structure passed to the INPUT command, which is a requirement of command handlers reporting an error. Once this was added, most of the errors were pointing to the correct token.
A new “expected operator, semicolon or comma” error was added for when an end statement token (for example the EOL in the incomplete statement INPUT PROMPT A$) is received after a valid string expression because this is a little more accurate than just the “expected semicolon or comma” error.
The two remaining errors were “invalid mode” bug errors that were occurring in the end expression error routine that is called when an end statement token is received during operand state. Support for the reference mode needed to be added to this routine, which needed to return the “expected variable” error.
One problem remains. The error test statement INPUT PROMPT A+B*C is causing extra token deletes, which was detected by the memory leak detection mechanism that was implemented a little while ago. To be continued...
Wednesday, March 23, 2011
Translation – INPUT PROMPT Debugging
The first problem found with the INPUT PROMPT command was that reference mode was not being set after the string prompt expression was processed. The next problem was that the Translator was still in binary operator state following the comma or semicolon after the string prompt expression was processed, so this required setting the state back to operand state, which led to the discovery of some other minor state issues.
The first operand state is very similar to the operand state except that end expression tokens (like comma, semicolon, and EOL) are also acceptable (normally considered operators). This state was implemented for the PRINT command since these tokens are allowed when an operand is expected (for example the PRINT,,A statement). This state is also set after a command is received, and it is up to the command handler to decide if an immediate end expression is allowed (which it currently is only for the PRINT command).
The equal token handler was found to be incorrectly setting the first operand state in an assignment statement. This did not cause a problem because end expression operators that appeared incorrectly in expressions were being caught elsewhere (by their respective token handlers). Anyway, calling it the first operand state was a little confusing, so it was renamed more appropriately to the operand or end state. The equal token handler was corrected to set only operand state.
The valid INPUT PROMPT test statements are now working. Now on to the invalid INPUT statements, which for the most part are not reporting the correct token where an error is detected or are just not reporting the correct error, including some bug errors...
Monday, March 21, 2011
Translation – INPUT Debugging
Several minor issues were found and corrected. When the end of the INPUT command occurs, the last input parse code has to be marked with the end sub-code. Support for reference mode also needed to be added to the comma token handler.
When the EOL token was received by the INPUT command handler, it reused the token for the input assign code for the variable. Upon return from the command handler, the EOL token handler proceeded to delete the EOL token (which was not the EOL token anymore). To prevent this, a check was added so that if the token no longer contains an EOL code, it is assumed to have been used by the command handler and will not be deleted.
The InputBegin code was just being appended to the output, but the first variable had already been added to the output, so it was after the variable instead of at the start of the statement. This code could be inserted at the beginning of the output list. However, this would only work if the INPUT command was at the beginning of the line, which may not be the case (multiple statements per line will be supported).
So instead, the element pointer in the command item will be set to the current last item in the output list when a command is pushed to the command stack. Since this pointer can no longer be checked for null to determine if an input begin code has been added, a new input begin command flag was added, which is set once an input begin code has been added to the output.
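The idea of remembering the output position can be sketched as follows. This is a simplified Python illustration in which the output list is a plain list and the element "pointer" is an index; the class and function names and the output item strings are hypothetical.

```python
class CommandItem:
    def __init__(self, name, output):
        self.name = name
        # Last item in the output list when the command is pushed
        # (-1 when the output is still empty).
        self.element = len(output) - 1
        # Flag replacing the former null-pointer check.
        self.input_begin = False


def add_input_begin(cmd, output):
    """Insert InputBegin at the start of this statement, only once."""
    if not cmd.input_begin:
        output.insert(cmd.element + 1, "InputBegin")
        cmd.input_begin = True


# An earlier statement on the same line is already in the output.
output = ["A<ref>", "B", "Assign'Colon'"]
cmd = CommandItem("INPUT", output)
output.append("C<ref>")       # first input variable arrives first
add_input_begin(cmd, output)  # still lands before the variable
```

The InputBegin code ends up after the previous statement and before `C<ref>`, which would not be the case if it were simply appended or inserted at the start of the list.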
Another problem found was that none of the input variables had their reference flag set once added to the output. The find code routine was not checking for a reference (not important) but was clearing the reference flag (a problem) since the token being checked (the input assign code) did not have its reference flag set. The reference flag of the input assign token was set before calling the process final operand routine to make the find code routine work as desired, and then cleared upon return.
The valid INPUT test statements are now working. Now on to the INPUT PROMPT statements, which are not working...
Sunday, March 20, 2011
INPUT Translation – Variable Handling and Ending
To look up the appropriate input assign code, the process final operand routine is used, which calls the find code routine that checks the token on the done stack to see if the reference flag is set since the input assign codes have the reference flag (this should not be necessary since the INPUT command uses the recently implemented reference mode). The reference variable will be popped from the done stack and the input assign code will be appended to the output before returning.
The InputAssignInt and InputAssignStr are associated codes for the InputAssign code. Once the input assign code has been appended to the output, the appropriate input parse code needs to be inserted after the input begin code or last input parse code. The easiest way to get the appropriate input parse code was to have each (InputParse, InputParseInt, and InputParseStr) be associated codes to the input assign codes. The second associated code will be used for these.
When a final semicolon token is received, the stay on line command flag is set (the same flag used for the PRINT command) and the state is set to end statement. When an end statement token is received, the INPUT command handler checks whether the stay flag is set and, if so, sets the keep sub-code of the INPUT command token, which is then appended to the output.
If an end statement token is received with no semicolon, the INPUT command token is immediately appended to the output without the keep sub-code. The INPUT command handler has now been implemented and the code compiles, so debugging can begin...
Saturday, March 19, 2011
Translator – End Statement State
Once a semicolon is received at the end of an INPUT statement, no more tokens should be received except for an end-of-statement token. If another token is received, then an error should be reported. To accomplish this, a new end statement state similar to the end expression state is needed. While the end expression state only needs to be checked when operators are expected, the end statement state needs to be checked for all tokens.
The end statement state check was added just before the check for operand or first operand state in the main translator add token routine. When in end statement state, if the token does not have the end statement flag set in its table entry, then an “expected end-of-statement” error is reported against the token.
The access function for getting the table entry flags for a token already checks if the token has a table entry, and if there is no table entry, then zero (no flags) is returned. Currently only the EOL code has the end statement flag set, but eventually the Colon, ELSE and ENDIF tokens will also have it, and possibly other codes.
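The end statement check can be sketched roughly like this. It is a minimal Python illustration with hypothetical names (`TABLE_FLAGS`, `table_flags`, `Translator.add_token`) and an invented flag value; returning to operand state after an end statement token is also an assumption made only to keep the example complete.

```python
END_STATEMENT_FLAG = 0x01  # hypothetical flag bit

# Hypothetical table: only codes that have a table entry appear here.
TABLE_FLAGS = {"EOL": END_STATEMENT_FLAG, "Comma": 0, "SemiColon": 0}


def table_flags(code):
    """Return the table entry flags, or zero if there is no table entry."""
    return TABLE_FLAGS.get(code, 0)


class Translator:
    def __init__(self):
        self.state = "operand"

    def add_token(self, code):
        """Return an error message, or None if the token is accepted."""
        if self.state == "end_statement":
            # Checked for every token, before any operand state checks.
            if not table_flags(code) & END_STATEMENT_FLAG:
                return "expected end-of-statement"
            self.state = "operand"  # assumption: ready for the next statement
            return None
        # ... operand and operator state handling would follow here ...
        return None
```

A token without a table entry (a variable, for example) gets zero flags and is therefore rejected in end statement state, while EOL is accepted.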
Friday, March 18, 2011
Translation – Define Function Token Issues
For now in the process operand routine, define function with parentheses tokens will not be allowed in command or assignment mode. This will need to be changed later when the DEF command is implemented. This check was also made in the check assignment list item routine.
However, since a define function without a parentheses token is allowed in assignments, the error was set to point to the open parentheses as an "expected equal or comma for assignment" error. The open parentheses is at the end of the token, so to get the error to point to it, the column of the token was incremented by the length of the token minus one.
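The column adjustment is simple arithmetic. A minimal sketch, assuming zero-based columns and a hypothetical helper name `open_paren_column`:

```python
def open_paren_column(column, text):
    """Column of the trailing open parentheses of a token such as 'FNHypot('.

    column is the (zero-based, assumed) column where the token starts and
    text is the token's text, which ends with the open parentheses.
    """
    return column + len(text) - 1
```

For a line starting with `FNHypot(X,Y)=1`, the token `FNHypot(` starts at column 0 and has length 8, so the error points at column 7.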
Previously in the close parentheses token handler, the reference flag was being set for token with parentheses and define function with parentheses tokens. This check was modified to only set the reference flag for token with parentheses.
The reference flag for define function without parentheses tokens was already being set, so no change was needed for these tokens.
Thursday, March 17, 2011
Language – Define Functions
Before defining what needs to be done with define function tokens in the Translator, a quick review of their syntax is required. There will be two forms of define functions that will be supported, a single line and a multiple line. An example of a single line define function would be:
DEF FNHypot(X,Y)=SQR(X*X+Y*Y)
Notice that this form has the same format as an assignment except for the DEF command at the beginning. The define function token in this statement is FNHypot( - in other words, a define function with parentheses. This implies that a define function with parentheses token could appear in assignment mode (assuming the DEF command sets this mode), but only for the DEF command. An example of the multiple line form of the same function would be:
DEF FNHypot(X,Y)
FNHypot=SQR(X*X+Y*Y)
END DEF
The assignment of the define function name (a define function without a parentheses) returns the value for the function. Here a define function without a parentheses can appear in an assignment statement, but only inside a DEF/END DEF block. Since the Translator is not aware of blocks, it will permit an assignment of define function without a parentheses token. It will be the Encoder's job to verify if the assignment is valid.
Wednesday, March 16, 2011
Translation – Assignment Token Issues
Finally, some issues were discovered when making the change from command mode to assignment after processing an operand token. There are two main types of operand tokens, ones with parentheses and ones without parentheses.
The operand tokens with parentheses include internal functions, defined user functions (DEF FN) and generic tokens with parentheses (which can be arrays or user functions). Internal functions were already invalid for command or assignment mode except for sub-string functions. A check was added for defined functions with parentheses.
The operand tokens with no parentheses include internal functions with no arguments (currently only RND), constants, define user functions with no arguments, and generic tokens with no parentheses (which can be variables or user functions with no arguments). Internal functions and constants were already invalid because they didn't have the reference flag set when the comma or equal token looked for the reference flag. That left define function tokens...
Tuesday, March 15, 2011
Translation – Command Token Issues
The last problem statements related to unexpected command tokens were:
A PRINT B
MID$(A$ PRINT,4)=""
The first statement gave an “expected operator or end-of-statement” error at the B token. The second statement was actually accepted, but with a strange translation. The problems were caused because when the PRINT command token was received, it was immediately pushed to the command stack because the mode was still set to command.
Both statements start as assignment statements, but assignment mode was not being set (unless preceded by the LET keyword). When an equal token is received expression mode is set, or when a comma token is received assignment list mode is set. The “unexpected command” error was only occurring when the mode was not set to command, which didn't occur with the statements above. Also, this message again does not fit with the “expected ...” type of message.
Once a command token is received in command mode, the mode is set according to the token mode in the command's table entry. A change was made to the process operand routine so that once an operand token is processed, if in command mode, the operand is assumed to be the beginning of an assignment statement and so the mode is changed to assignment.
To remove the “unexpected command” error and report a more appropriate error, the command token has to be passed through the rest of the Translator. This will occur when the Translator is not in command mode. So the main add token routine was changed to not report this error if a command token is received and the mode is not command.
Command tokens received in operand state were already being reported correctly since commands are also considered operators, which are not valid operands (unless the operator is a unary operator, which commands are not). Commands are considered operators because some commands can be found where an operator is expected, for example, THEN and ELSE.
Command tokens received in operator state are only valid if they have a token handler. In the process operator routine, when an operator token does not have a token handler, a default operator token handler is called. Before the default operator token handler is called, a check was added to return an appropriate error if the token is a command.
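The mode changes discussed in this post can be summarized with a small sketch. It is a simplification written in Python for illustration only: the token classification strings (`"operand"`, `"equal"`, `"comma"`), the function names, and the exact conditions on the equal and comma tokens are assumptions, not the Translator's actual logic.

```python
from enum import Enum, auto


class Mode(Enum):
    COMMAND = auto()
    ASSIGNMENT = auto()
    ASSIGNMENT_LIST = auto()
    EXPRESSION = auto()


def mode_after(mode, kind):
    """Return the token mode after a token of the given (hypothetical) kind."""
    if kind == "operand" and mode is Mode.COMMAND:
        # An operand at the start of a statement is assumed to begin an assignment.
        return Mode.ASSIGNMENT
    if kind == "comma" and mode is Mode.ASSIGNMENT:
        return Mode.ASSIGNMENT_LIST
    if kind == "equal" and mode in (Mode.ASSIGNMENT, Mode.ASSIGNMENT_LIST):
        return Mode.EXPRESSION
    return mode


def command_goes_to_stack(mode):
    """A command token is only pushed to the command stack in command mode;
    otherwise it is passed on so a more specific error can be reported."""
    return mode is Mode.COMMAND
```

With the statement A PRINT B, the A operand changes the mode from command to assignment, so the PRINT token is no longer pushed to the command stack and instead continues through the rest of the Translator.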
Sunday, March 13, 2011
Translation – Other Errors
After changing the “item cannot be assigned” error to the “expecting item for assignment” error, there were several other errors that didn't fit the “expecting ...” type of message. It turned out that most of these were not actually being used, so they were removed.
Another remaining message was the “missing open parentheses” error that occurs when there is a closing parentheses with no matching open parentheses, function or array. After some consideration of possibly leaving this message as is, it was decided to change this to an “expected operator or end-of-expression” error since the problem could also be a missing function or array, or even that the closing parentheses was just a mistake.
Again assuming that everything is correct up to the problem, this change seemed appropriate, and “...expression” was used instead of “...statement” because the next token could be a comma or semicolon in a PRINT statement or a THEN in an IF statement.
The last message was the “unexpected command” error that occurs when there is a command token when not in command mode. However, there were a number of additional problems with command tokens received when not expected...
Translation – Parentheses Issue
The next problem statement was:
MID$((A$),4)=""
This was reported as an “item cannot be assigned” error and then crashed. Again, this error didn't fit the “expected ...” messages. This error is also returned for statements like 3=A and 1,A=B and was renamed to the “expecting item for assignment” error. For the statement above, the “expecting string variable” error should be returned.
The crash occurred because the open parentheses token was returned for the error with its range extended to the closing parentheses to report the entire (A$). The caller deleted the error token since it was an open parentheses to prevent a memory leak. However, in this case, the A$ token was still on top of the done stack with the open parentheses attached as the first operand. When the Translator clean up routine (called upon an error) was emptying the done stack, it deletes each item's first and last operand – the open parentheses was getting deleted twice causing the crash.
Initially to fix this problem, when an error occurs and the first through last operand is returned, the first operand pointer for the item on the done stack was set to null to prevent it from being deleted a second time. While this fix was sufficient for the statement above, this statement was still not being reported correctly:
MID$(-A$,4)=""
The error was “expected numeric expression” pointing to the A$. Both the open parentheses and the minus are initially processed in the process unary operator routine. So a check was added to this routine to return an error if there is a sub-string function on top of the hold stack with its reference flag set (sub-string assignment) and it is at the first operand. Both of these statements then correctly reported “expected string variable” at the open parentheses and minus. The initial fix was not necessary since the error was being caught sooner.
Saturday, March 12, 2011
Translation – Another Print Function Issue
While the new reference mode was being implemented, some problems were discovered in the code that carefully constructed statements would exploit giving incorrect results. The first problem statement was:
TAB(10)=A
This caused a “command stack empty for expression” bug error. This should have been an “expecting command” error. The check was if the command stack was empty or there was a PRINT command on top of the command stack or if the top of the hold stack was not empty (except for the null token).
The error occurred when trying to get the type of expression because the hold stack was empty (the first item used to figure out the expression type), so it went to the command stack, which was also empty. This check should not have caught this because there was a later check for catching all internal functions in command mode. The check was modified to: if the command stack is not empty and if there is a PRINT on top of the command stack or the hold stack is not empty.
Thursday, March 10, 2011
Translator – Reference Mode
A new token mode is needed for the INPUT commands. The current token modes are command, assignment, assignment list and expression. Consider these invalid INPUT statements:
INPUT A+B*C
INPUT A+B*C$
The INPUT commands must have variables, not expressions. If expression mode was used, at each comma and the end of the statement, the INPUT command handler would need to check the token on top of the done stack to see if its reference flag is set. For the first statement above, the INPUT command handler will see the multiplication token on top of the stack. Its first token will be A and last token will be C. An “expecting variable” error would be reported pointing to the whole A+B*C expression. This would be acceptable.
However, in the second statement, an error occurs before the INPUT command handler gets a chance to report an error. When the multiplication token checks its second operand, it will report an “expecting numeric expression” error pointing at the C$ token. This would make no sense. The proper error for both of these statements would be an “expecting semicolon, comma or end-of-statement” error pointing at the add token.
The new reference mode will only accept variables and array elements, specifically tokens with no parentheses and tokens with parentheses. If these tokens turn out to be user functions, which the Translator cannot determine, the error will be reported by the Encoder. Reference mode will be set when the INPUT command is received and after the comma or semicolon of the prompt string expression of the INPUT PROMPT command. It will also be used later for the READ command.
In reference mode, internal functions, define functions, and unary operators will be reported as invalid (“expecting variable” error). After the variable or array element token is pushed to the done stack, the state will be set to end expression so that only end expression tokens are valid. Binary operators will then be correctly reported as invalid.
While reference mode was being implemented, some more problems were found in the Translator code...
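To illustrate how reference mode produces the desired error for the two INPUT statements above, here is a rough Python sketch. It is not the Translator's code: tokens are reduced to hypothetical `(kind, text)` pairs, array subscripts and the final semicolon are left out, and the function name is made up for the example.

```python
# Hypothetical token kinds: "no_paren" (variable candidate), "operator",
# "unary", "function", "comma", "semicolon", "eol".
END_EXPRESSION = {"comma", "semicolon", "eol"}


def check_reference_list(tokens):
    """Return (error, index) for the first bad token, or None if all are valid."""
    expecting_variable = True
    for index, (kind, _text) in enumerate(tokens):
        if expecting_variable:
            if kind != "no_paren":
                return ("expecting variable", index)
            # Variable accepted: now in the end expression state.
            expecting_variable = False
        else:
            if kind not in END_EXPRESSION:
                return ("expecting semicolon, comma or end-of-statement", index)
            if kind == "eol":
                return None
            expecting_variable = True
    return None


# INPUT A+B*C$ : the error points at the add token, not at C$.
statement = [("no_paren", "A"), ("operator", "+"), ("no_paren", "B"),
             ("operator", "*"), ("no_paren", "C$"), ("eol", "")]
error = check_reference_list(statement)
```

Since the walk stops at the first token that is not an end expression token, the multiplication operator never gets a chance to complain about C$.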
Sunday, March 6, 2011
Translation – PRINT Function Problem
Due to the problems found with the print codes, some additional error tests were added to translator test 10 (PRINT statement tests) including these statements:
PRINT (TAB(10))
PRINT INT(TAB(10))
PRINT A(TAB(10))
PRINT A+TAB(10)
PRINT TAB(10)+A
The first three test the situation of a print function inside parentheses, an internal function, and an array or user function, which were caught by adding a count stack is not empty check when a print function token is received.
In the fourth statement, the error can be caught by checking if the hold stack is not empty (an empty hold stack has only a null token). In this case, an operator will be on top of the hold stack. In fact, this check can replace the count stack is not empty check because the open parentheses, internal function and array/user function will be on top of the hold stack.
The last statement was more difficult to check for - the expression should end after the print function. When the operator was received and checked its operand, a bug occurred because the done stack is empty since the print function didn't get pushed to the done stack. The error should be “expected semicolon, comma or end-of-statement” and point to the operator.
To catch this error, a new end expression state was added. The closing parentheses is received when in binary operator state. After the closing parentheses, the state was left at binary operator since another binary operator is expected (an end of expression token is acceptable as a binary operator). For the end expression state, only operators with the end expression flag are acceptable, which currently include the semicolon, comma and end-of-line tokens – other operators will generate the error.
In resolving these issues, the “invalid used of print function” error did not match the other “expecting...” errors (remember the goal is to help the user by suggesting what is expected at the location of an error). Therefore, this error was changed to be one of the “expecting xxx expression” errors depending on the current expression type (xxx would be blank, numeric or string).
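The hold stack check and the choice of “expecting xxx expression” message can be sketched together. This is a Python illustration under assumptions: the hold stack is modeled as a list whose bottom entry is a hypothetical null token sentinel, and `print_function_error` is an invented name.

```python
NULL_TOKEN = "<null>"  # hypothetical sentinel at the bottom of the hold stack


def print_function_error(hold_stack, expr_type=""):
    """Return an error if a print function appears inside an expression.

    An "empty" hold stack holds only the null token. Anything else on top
    (an operator, open parentheses, function or array) means the print
    function is nested. expr_type is "", "numeric" or "string".
    """
    if hold_stack and hold_stack[-1] != NULL_TOKEN:
        words = ("expecting", expr_type, "expression")
        return " ".join(word for word in words if word)
    return None
```

For PRINT A+TAB(10) the add operator is on top of the hold stack when TAB( arrives, so with a numeric expression type the result is “expecting numeric expression”; for PRINT TAB(10) the hold stack holds only the null token and no error is returned.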
Saturday, March 5, 2011
Translation – PRINT Code Issues
The process final operand routine currently doesn't push print codes to the done stack. This needs to be expanded to include the input begin prompt codes. It would be inconvenient to keep expanding this test to include additional codes that don't need to be pushed to the done stack. The check could be changed to see if the code does not have a return data type (it is set to none). This would include the print codes, the input begin codes, the input parse type codes, the input assign type codes, and probably many more.
However, when this change was made, it did not work for the TAB and SPC print functions because the token data type for these had been incorrectly set to double. This occurred in the set default data type function when the token was received. This function was fixed to not set the data type for internal functions (these are set from their table entry data type).
Now when the token's data type is none, it won't be pushed to the done stack. The check if the command on top of the command stack is the PRINT command only applies to print functions, but is not necessary here as this check was already made when the print function was first received, so the check was removed. The check if the token has the print flag remains (to set the stay and print function command flags, which are used by the print command handler to determine if the final print code should be appended to the output).
For the print type codes, when the process final operand routine is called by the add print code routine, the second token passed is a null, not a closing parentheses. The process final operand routine was deleting the second token for print functions assuming that it was a closing parentheses (which only applies to TAB and SPC). For some reason, doing a delete with a null argument was not causing a problem. Nonetheless, a check was added to only delete the second token if it is not a null.
Another problem was discovered for print functions. The situation where a TAB or SPC was contained within parentheses, an internal function, or an array/user function was not caught since it was only checking if there was a print command. Therefore, an additional check was added to make sure the count stack is also empty.
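The data type rule for the done stack can be shown in a few lines. This Python sketch is only an approximation: tokens are plain dictionaries, `None` stands in for the "none" data type, and the code names `Tab` and `Add` are hypothetical.

```python
def process_final_operand(token, done_stack, output):
    """Append the token's code to the output; push the token to the done
    stack only when it returns a value (its data type is not none)."""
    output.append(token["code"])
    if token["data_type"] is not None:
        done_stack.append(token)


done_stack, output = [], []
# A print function returns nothing, so nothing is pushed to the done stack.
process_final_operand({"code": "Tab", "data_type": None}, done_stack, output)
# An operator returning a double is pushed for the next operator to consume.
process_final_operand({"code": "Add", "data_type": "double"}, done_stack, output)
```

Testing the data type instead of a list of specific codes means the input begin, input parse and input assign codes are covered without further changes to the check.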
Thursday, March 3, 2011
INPUT Translation – Some Issues
As the INPUT command handler was being implemented, some issues were found. When the new element pointer was added to the command item, it was noticed that there was a code member. This member is no longer needed because the token now contains the code (replacing the index member, though the code is now an index), so the code member was removed from the command item.
There was no table entry for the two word INPUT PROMPT command, so one was added along with the three input begin codes. It was noticed that some table entries still had the string flag set. This is not necessary because the string flag is now set automatically during table initialization if there are any string arguments, so these string flags were removed.
To look up which input begin code to append for the INPUT PROMPT command, the process final operand routine will be called, which in turn will call the find code routine that will pick the correct code based on the type (string or temporary string) that is on the done stack. The input begin codes will not push anything to the done stack (accomplished by setting the done push flag to false). Some additional issues were found in these routines...
Wednesday, March 2, 2011
INPUT Translation – Variables
Processing variables is a bit more complicated. With the PRINT statement, the appropriate print value type code was simply appended to the output. However, with the INPUT statement, an input parse code needs to be inserted after the begin code or after the last parse code (before all of the variables), and an input assign code needs to be appended to the end of the output (after the variable).
There needs to be a way to point to the location where the input parse codes are to be inserted. This will be accomplished with a new output list element pointer member to the command stack item. This pointer will be initialized to null when a new command token is pushed to the command stack.
When an input begin code is appended to the output, this new element pointer will be set to the input begin code element. This pointer can also be used to indicate if any input variables have been received yet (when it does not contain a null).
To insert an input parse code, the input parse token will be appended to the list at (after) this element pointer. The element pointer will then be set to the output list element of the input parse token just inserted so that the next input parse token will be inserted after this input parse token.
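The insertion scheme can be sketched with a Python list, using a list index in place of the output list element pointer. This is an illustration under assumptions, not the actual implementation: the `CommandItem` class, the function names and the `InputBegin` code name are invented for the example, while the parse and assign code names follow the ones mentioned for the INPUT command.

```python
class CommandItem:
    """Hypothetical command stack item with the new element pointer."""

    def __init__(self, token):
        self.token = token
        self.element = None  # None until an input begin code has been appended


def append_begin(output, item, code="InputBegin"):
    """Append the input begin code and point the element at it."""
    output.append(code)
    item.element = len(output) - 1


def add_variable(output, item, variable, parse_code, assign_code):
    """Append variable and assign code, then insert the parse code after
    the begin code or the last parse code."""
    output.append(variable)
    output.append(assign_code)
    item.element += 1
    output.insert(item.element, parse_code)


output = []
item = CommandItem("INPUT")
append_begin(output, item)
add_variable(output, item, "A", "InputParse", "InputAssign")
add_variable(output, item, "B$", "InputParseStr", "InputAssignStr")
```

For INPUT A,B$ the sketch leaves the output as InputBegin InputParse InputParseStr A InputAssign B$ InputAssignStr, with all parse codes grouped ahead of the variables.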