Now that the execution of the INPUT command has been defined, and therefore the form of the translation, the actual process of the translation can be defined. The translation begins with one of the InputBegin codes, which will be triggered by a comma or semicolon token. The INPUT or INPUT PROMPT token will already be on top of the command stack.
For the INPUT command, when the first comma, a semicolon or an end-of-line token is received, an InputBegin token will be appended to the output. The token received can be converted to the InputBegin token (more efficient to change the token than to delete the token not needed and then create a new one).
For the INPUT PROMPT command, when a comma or semicolon token is received, there must be a string on top of the done stack (the prompt string expression). Whether this string is temporary determines whether an InputBeginStr or an InputBeginTmp will be appended to the output. For InputBeginStr, the string will be attached since the translator will not know if it is a variable, array or a user function. The token received can be converted to this token. An end-of-line token at this point would produce an “expected comma or semicolon” error.
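The selection of the begin code described above could be sketched roughly as follows. This is only an illustration: the Token structure, the token types, the status values and the function names are all hypothetical, and only the three InputBegin code names come from the text.

```cpp
#include <cassert>

// Hypothetical stand-ins; only the three begin code names come from the text.
enum Code { InputBegin_Code, InputBeginStr_Code, InputBeginTmp_Code };
enum TokenType { Comma_Token, SemiColon_Token, EOL_Token };
enum Status { Good_Status, ExpCommaOrSemi_Status };

struct Token {
    TokenType type;   // what the parser delivered
    Code code;        // only meaningful once the token has been converted
    bool converted;   // true once the token has become a begin code
};

// INPUT: the first comma, semicolon or end-of-line token is converted
// in place to InputBegin instead of being deleted and replaced.
Status beginInput(Token &token)
{
    token.code = InputBegin_Code;
    token.converted = true;
    return Good_Status;
}

// INPUT PROMPT: a comma or semicolon is converted to InputBeginStr or
// InputBeginTmp depending on whether the prompt string is temporary;
// an end-of-line token is an error at this point.
Status beginInputPrompt(Token &token, bool promptIsTemporary)
{
    if (token.type == EOL_Token)
        return ExpCommaOrSemi_Status;
    token.code = promptIsTemporary ? InputBeginTmp_Code : InputBeginStr_Code;
    token.converted = true;
    return Good_Status;
}
```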
Monday, February 28, 2011
Sunday, February 27, 2011
Automatic Code Enumeration Generation
To have the Code enumeration generated automatically from the table entries, the table source was structured so the awk scripts can read it. Since the Code enumeration value will now be the same as the table entry index, the code member of the table entry is not necessary and was removed. The code name initializers in the table entries were moved to a comment on the line of each entry's open brace.
The awk scripts were rewritten to read the table source file instead of the main include file. The awk scripts were also changed to read the table source file directly and write the output files directly. This eliminates the requirement to redirect the input and output to the correct files when running the awk scripts. Logic was also added to the codes awk script to check for duplicate code names.
The code to index conversion array that was initialized in the table class constructor, along with the check for duplicate and missing codes, was removed. The code and index access functions in the table class were also removed. The token class index member was replaced with a code member. All the code was updated to use the code enumeration value instead of the index, though the code will be used as an index.
One problem with using an enumeration value instead of an integer index is that the normal arithmetic operators, like add and increment, cannot be used. These operators are needed, so operator functions were created for the code enumeration, which include the add, prefix increment and postfix increment operators. These functions type cast to integer to add and then type cast back to the code enumeration value.
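A minimal sketch of such operator functions is shown here. The shortened Code enumeration is a hypothetical stand-in for the generated one, and the actual functions in the project may differ in detail.

```cpp
#include <cassert>

// Hypothetical, shortened stand-in for the generated Code enumeration.
enum Code { Null_Code, Add_Code, Sub_Code, Mul_Code, sizeof_Code };

// Add an integer offset to a code: cast to integer, add, cast back.
inline Code operator+(Code code, int offset)
{
    return static_cast<Code>(static_cast<int>(code) + offset);
}

// Prefix increment: advance the code, then return the new value.
inline Code &operator++(Code &code)
{
    code = code + 1;
    return code;
}

// Postfix increment: return the old value, then advance the code.
inline Code operator++(Code &code, int)
{
    Code previous = code;
    code = code + 1;
    return previous;
}
```

With these in place, a loop such as `for (Code code = Null_Code; code < sizeof_Code; code++)` compiles without explicit casts at each use.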
The null code entry at the end of the table was moved to the beginning so that the null code enumeration value (index) would be zero. The table search function for searching for an immediate command previously assumed that the immediate commands were at the beginning of the table entry array. Moving the null code to the beginning of the array complicated this. Therefore, immediate command bracketing codes were put around these entries for this search function.
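The bracketing idea can be sketched as below. The code names, the table contents and the search function are all hypothetical; the point is only that the search scans between the two bracketing entries rather than from the start of the array.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical codes: the null code is first (value zero) and the
// immediate commands sit between two bracketing codes.
enum Code {
    Null_Code, BegImmCmd_Code, List_Code, Edit_Code, EndImmCmd_Code,
    Let_Code, sizeof_Code
};

struct TableEntry {
    const char *name;
};

// Hypothetical table; the index of each entry equals its code value.
static const TableEntry table[sizeof_Code] = {
    { "" },     // Null_Code
    { "" },     // BegImmCmd_Code (bracket only, never matched)
    { "L" },    // List_Code
    { "E" },    // Edit_Code
    { "" },     // EndImmCmd_Code (bracket only, never matched)
    { "LET" }   // Let_Code
};

// Search only the entries between the bracketing codes; return
// Null_Code if the name is not an immediate command.
Code searchImmediate(const char *name)
{
    for (int i = BegImmCmd_Code + 1; i < EndImmCmd_Code; ++i) {
        if (std::strcmp(table[i].name, name) == 0)
            return static_cast<Code>(i);
    }
    return Null_Code;
}
```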
Due to these table entry changes, the parser test output files were updated since all the code indexes changed. This would be a good time to make another pre-release, but since there has been no download activity for recent pre-releases, there will not be a pre-release at this time. Now the translation of the INPUT command can begin...
Saturday, February 26, 2011
Code Enumeration vs. Table Entry Indexes
While designing the error handling mechanism for the INPUT command, a thought occurred related to program codes. For efficient program execution at run-time, the index of the table entry will be stored in the internal program code, not the code enumeration value.
If the code enumeration value were used, the code would first need to be converted to the table entry index (needed to get the run-time handler function pointer) by going through the code-to-index array set up during table initialization. The intention all along has been to use table entry indexes in the internal program code.
For the INPUT command's error recovery, when it is backing up execution and checking for input parse codes, it will need to convert the table entry indexes to a code before it can check if it is an input parse code. This would not be efficient during program execution. For the INPUT command, execution time is not critical since it is about to stop and wait for user input, so a few extra program cycles won't matter much. But this problem could occur for other more critical commands.
Therefore, it is desirable for the code enumeration values to be the same as the table entry indexes. One simple solution is to make sure the code enumeration values match the table entries. Unfortunately, this relies on the programmer to keep the two in sync, is very error prone and is just a general pain to begin with.
There is a better way where the code enumeration is generated automatically from the table entries using an awk script. This method would be similar to how the test_codes.h file (used by the test_ibcp.cpp source file) is generated automatically by scanning for codes in the ibcp.h file.
INPUT Execution – Error Handling
Errors can occur while parsing the input, that is, in one of the input parse codes. When an error occurs, after it is reported, the already parsed values in the temporary input values stack need to be thrown away and execution needs to resume at the beginning of the INPUT statement. Instead of issuing the prompt again, the cursor will be positioned at the beginning of the input, which will contain the previously entered erroneous input. This allows the user to correct the input instead of reentering it (the traditional “redo from start”).
The temporary input values stack can't simply be reset (setting the internal index to -1, the empty stack indicator) because elements may contain allocated string values, which need to be deleted to prevent memory leaks. Like the evaluation stack, the elements in the temporary input values stack won't have any indicator what data type they are. Consider the basic format of the INPUT statement (only up to the parsing codes is shown):
InputBegin InputParseType1 InputParseType2 InputParseType3'End' ...
Say an error occurs on the second parse code. The program execution pointer will be pointing at the next code, InputParseType3; in other words, the pointer is incremented after reading each program code word, then the code read is executed by calling its run-time handler. Execution needs to be backed up until the InputBegin is reached (reading the program codes in reverse).
As each code is passed, one element will be popped from the temporary input values stack. If the code passed is an InputParseStr code, then the element popped is a string that needs to be deleted. The beginning is reached when a non input parse code is reached (it could be InputBegin, InputBeginStr or InputBeginTmp). The stack will now be empty.
The INPUT begin code will be executed again and the begin code will call the get input routine. The get input routine will normally allocate the temporary input values stack. For error recovery, it will see that the stack is already allocated, so instead of outputting the prompt and saving the cursor position to the beginning of the input, it will restore the saved cursor position and get the input starting with the previously entered erroneous input. Execution will then resume with the corrected input.
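The backing-up procedure could look something like the following sketch. The program representation (a vector of codes), the InputValue union and the function names are assumptions made for illustration. It also assumes that the parse code that failed did not push a value, so popping starts with the parse code before it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical codes; the three parse codes are kept contiguous.
enum Code {
    InputBegin_Code, InputBeginStr_Code, InputBeginTmp_Code,
    InputParseInt_Code, InputParseDbl_Code, InputParseStr_Code, Input_Code
};

// Stack elements carry no type indicator, just like the evaluation stack.
union InputValue {
    int intValue;
    double dblValue;
    std::string *strValue;
};

bool isInputParse(Code code)
{
    return code >= InputParseInt_Code && code <= InputParseStr_Code;
}

// 'pc' is the index of the code after the parse code that failed.
// Reads the program in reverse, popping one value per parse code passed
// and deleting the strings, and returns the index of the begin code.
// Assumption: the failed parse code itself pushed nothing.
std::size_t unwindInput(const std::vector<Code> &program, std::size_t pc,
                        std::vector<InputValue> &values)
{
    std::size_t i = pc - 1;  // the parse code that failed
    while (i > 0 && isInputParse(program[i - 1])) {
        --i;
        InputValue value = values.back();
        values.pop_back();
        if (program[i] == InputParseStr_Code)
            delete value.strValue;  // prevent a memory leak
    }
    return i > 0 ? i - 1 : 0;  // the first non input parse code
}
```

The type of each popped element is recovered from the program code being passed, which is why the check for an input parse code needs to be cheap.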
Friday, February 25, 2011
INPUT Execution – Temporary Values
The values parsed from the entered input to be assigned to the input variables need to be stored somewhere other than the evaluation stack. The logical place is another temporary input values stack. This stack will be allocated and initialized in the get input routine and will be removed by the final INPUT command code.
After a value is parsed by one of the input parse codes, it will be saved (pushed) to a temporary input values stack. At the last input parse code (with the 'End' sub-code), an index to be used to access this stack will be set to zero. As each value is assigned by an input assign code, this index will be incremented. In other words, this stack is being used as a First-In-First-Out list instead of a standard Last-In-First-Out stack.
The SimpleStack (to be renamed to just Stack since there is no other class named Stack) does not currently have this mechanism. This will be added when the run-time code is implemented, but it is not needed currently for translating the INPUT command, so this addition will wait.
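One possible shape for this mechanism is sketched below. This is not the project's SimpleStack; the class layout and the rewind and next member names are hypothetical, chosen only to show the separate read index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stack with an added First-In-First-Out read index.
template <typename T>
class Stack {
    std::vector<T> m_items;
    std::size_t m_next = 0;  // index of the next value to hand out
public:
    // Standard Last-In-First-Out operations.
    void push(const T &item) { m_items.push_back(item); }
    T pop()
    {
        T item = m_items.back();
        m_items.pop_back();
        return item;
    }
    bool empty() const { return m_items.empty(); }

    // Set the read index to zero (done at the 'End' parse code).
    void rewind() { m_next = 0; }

    // Return the value at the read index and increment the index
    // (done by each input assign code). The value is not removed.
    const T &next() { return m_items[m_next++]; }
};
```

Since next does not remove anything, the values remain on the stack until the final INPUT command code cleans up.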
Thursday, February 24, 2011
INPUT Execution Codes – Ending
There will be the final INPUT command code at the end of the input statement. Besides cleaning up, the only action that needs to be performed is to advance to the next line if the 'Keep' sub-code is not set (this sub-code is set when there is a semicolon at the end of the INPUT statement). In summary, the “INPUT I%,A(I%)” statement will be translated as:
InputBegin InputParseInt InputParseDbl'End' I%<ref> InputAssignInt I% A(<ref> InputAssignDbl Input
As usual for RPN format, the command is at the end of the translation. What remains to be designed is where the temporary input values will be stored and how errors will be handled. An error can occur in the parsing codes. When an error occurs, execution must go back to the InputBegin; however, the prompt does not need to be output again. After the error is reported, the cursor only needs to be positioned back where it was after the prompt was output.
Wednesday, February 23, 2011
INPUT Execution Codes - Assigning
The assigning of the input values must be done separately from the parsing of the values entered due to the two input rules. For the example, these are steps 7 through 11. Some of these steps are standard expression codes: pushing a reference to an integer variable (step 7), pushing the value of an integer variable (step 9), and calculating a reference to an array element by popping an integer subscript value and pushing the reference to the element (step 10).
There will be a code to assign an input value to an input variable for each data type: InputAssignInt, InputAssignDbl and InputAssignStr. For InputAssignStr, the InputParseStr will have created a string from the input value. This string will be assigned to the string variable, replacing the previous string value, which will be deleted. Therefore, there will be no need to deal with temporary strings.
The values being assigned will need to be stored temporarily somewhere other than the evaluation stack.
Tuesday, February 22, 2011
INPUT Execution Codes - Parsing
The parsing of the entered input values must be done separately from the assignment of the values entered to the input variables. This is a departure from the design previously described and is due to the two input rules. For the example, these are steps 3 through 6. Notice that after a value is parsed (steps 3 and 5), a check is made for the next character. This character must be a comma after each value except for the last value, where an end-of-line (no character) is expected.
There will be a code to parse an input value for each data type: InputParseInt, InputParseDbl and InputParseStr. An 'End' sub-code will be set on the last parse code. If this sub-code is not set, then the next character must be a comma, otherwise an end-of-line (no character) is expected.
The values that are parsed need to be put somewhere. If input values were pushed on to the evaluation stack, then when the references to the input variables are pushed, the input values would be down the stack and would not be easily accessible. Therefore, the evaluation stack can't be used.
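The check performed after each value might be sketched like this for the integer case. The function name and signature are hypothetical, the atEnd argument stands in for the 'End' sub-code, and the standard strtol function is used in place of whatever number parsing the project will actually have (range checking is omitted).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Parse an integer from 'input' starting at 'pos'. 'atEnd' mirrors the
// 'End' sub-code: when set, an end-of-line must follow the value,
// otherwise a comma must follow (and is skipped). On failure, 'pos' is
// left at the offending character so an error could point at it.
bool parseInt(const char *input, std::size_t &pos, bool atEnd, int &value)
{
    char *stop;
    long parsed = std::strtol(input + pos, &stop, 10);
    if (stop == input + pos)
        return false;  // no number found at this position
    pos = static_cast<std::size_t>(stop - input);
    if (atEnd) {
        if (input[pos] != '\0')
            return false;  // expected end-of-line
    } else {
        if (input[pos] != ',')
            return false;  // expected comma
        ++pos;  // skip the comma
    }
    value = static_cast<int>(parsed);
    return true;
}
```

In the real design the parsed value would be pushed to the temporary input values stack rather than returned through an argument.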
Monday, February 21, 2011
INPUT Execution Codes - Prompting
Execution and translation of the INPUT command was previously described mostly in posts on June 24, 2010, June 25, 2010 and June 27, 2010. This now needs to be revised since the execution broke the two rules listed at the end of the February 16, 2011 post. Steps 1 and 2 handle issuing the prompt and getting input from the user. This is almost the same as previously defined, using these codes:
InputBegin – output default prompt and get input
InputBeginStr – output string prompt and get input
InputBeginTmp – output string prompt, get input and delete temporary string
During execution, the run-time handlers for each of these codes will call a common routine for getting input. This common get input routine can also handle outputting the prompt and can simply use the string on top of the evaluation stack. There will be an argument for whether to output the prompt string on top of the stack and InputBegin will set this argument to false.
There will be another argument for whether to output the default prompt: InputBegin will set this to true and the other two will set it to true if the 'Question' sub-code is set (that is, if the prompt string expression was followed by a comma instead of a semicolon).
For InputBeginTmp, the temporary string can be deleted upon returning from the get input routine since it will no longer be needed with the improvement in execution described in last Saturday's posts.
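A rough sketch of the common routine and its two arguments is given here. The function names and signatures are hypothetical, the stackTop argument stands in for the string on top of the evaluation stack, and plain line input replaces the edited input the real routine would provide.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical common get input routine shared by the begin codes.
std::string getInput(std::istream &in, std::ostream &out,
                     const std::string &stackTop,
                     bool outputPrompt, bool outputDefault)
{
    if (outputPrompt)
        out << stackTop;   // prompt string expression
    if (outputDefault)
        out << "? ";       // default prompt
    std::string line;
    std::getline(in, line);
    return line;
}

// InputBegin: no prompt string, always the default prompt.
std::string runInputBegin(std::istream &in, std::ostream &out)
{
    return getInput(in, out, std::string(), false, true);
}

// InputBeginStr: prompt string, with the default prompt appended only
// when the 'Question' sub-code is set. InputBeginTmp would do the same
// and then delete the temporary string after getInput returns.
std::string runInputBeginStr(std::istream &in, std::ostream &out,
                             const std::string &prompt, bool question)
{
    return getInput(in, out, prompt, true, question);
}
```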
Saturday, February 19, 2011
INPUT Command – Execution Procedure
To determine how the INPUT command will be encoded into internal memory, the internal codes need to be arranged according to how the INPUT command will be executed including handling errors. Using the same “INPUT I%,A(I%)” example statement from before, here is the procedure (for the moment without taking into account error handling):
1. Issue prompt (for this statement, the default “? ” prompt)
2. Get the input from the user (allowing for editing like backspace, cursor left/right, insert/overwrite, delete, and terminated by enter)
3. Parse an integer value from the input for I% and save it (I% can't actually be assigned yet)
4. Check if the next character in the input is a comma
5. Parse a double value from the input for A(I%) and save it (A(I%) can't be assigned yet since I% hasn't been assigned)
6. Check if there are no more characters
7. Push reference of I% to evaluation stack
8. Assign saved integer value to reference on top of stack – I%
9. Push value of I% to evaluation stack (I% has now been assigned)
10. Calculate reference for array A by popping index from evaluation stack, push calculated reference, A(I%), to stack
11. Assign saved double value to reference on top of stack – A(I%)
12. End of INPUT command, advance to next line of output
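The ordering of the parsing and assigning steps can be illustrated with a small worked example for this one statement. Everything here is a simplification for illustration: the Variables structure and function name are hypothetical, sscanf stands in for the input parse codes, and an out-of-range subscript simply returns false where the real run-time would raise an error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical storage for the variables I% and A().
struct Variables {
    int i;
    std::vector<double> a;
};

// Worked example for "INPUT I%,A(I%)" on an already entered line.
bool inputExample(const char *input, Variables &vars)
{
    int savedInt;
    double savedDbl;
    char extra;
    // Parse and validate the whole input first: exactly an integer,
    // a comma and a double, with nothing following.
    if (std::sscanf(input, "%d,%lf%c", &savedInt, &savedDbl, &extra) != 2)
        return false;  // nothing has been assigned
    // Assign I% before its value is used as a subscript.
    vars.i = savedInt;
    // Calculate the reference A(I%) using the new I%, then assign.
    if (vars.i < 0 || static_cast<std::size_t>(vars.i) >= vars.a.size())
        return false;  // simplification of a subscript error
    vars.a[vars.i] = savedDbl;
    return true;
}
```

Entering `2,3.5` sets I% to 2 and then A(2) to 3.5, while entering `2,x` leaves both variables untouched because no assignment happens until the whole line has been validated.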
INPUT Command – Execution Improvement
An improvement can be made to the execution of the INPUT command that will simplify the execution and will work nicer from the user standpoint. The traditional implementation of the INPUT command was designed to work with Teletypes. This has no place on modern computers. This has to do with what happens when the input entered by the user is invalid.
A survey of several BASIC implementations shows that all do a form of “Redo from start” on a new line, then reissue the prompt on another new line and force the user to start their input from the beginning. Many of the implementations don't even check for the presence of all the values requested, or even accept string values, and then simply set the non-entered or invalid values to zero. This is very sloppy programming and forces the programmer to do more work validating input.
A better alternative is to properly parse and validate the input entered. If something isn't valid, then output a temporary error message, point to where the error is (so the user doesn't have to guess what was wrong) and then allow the user to edit their input. Using extra output lines is unnecessary. The error message will be removed from the screen when INPUT is done. This also simplifies run-time in that the prompt string does not need to be saved until the end of the INPUT statement (deleting it if it is a temporary string) since it now only needs to be output once.
There are other enhancements that can be made to the INPUT command, like having a fixed length input field with an optional template and accepting special exit keys (function keys, escape, page up/down, arrow up/down, etc.) that the programmer can check for, but this will come later once the project is up and running. For now, the translation of the current INPUT command needs to be implemented. As always, before the translation can be implemented, some idea of how the INPUT command will work at run-time is needed to determine what tokens need to be put into the RPN output list.
Wednesday, February 16, 2011
Translator – INPUT Command – New Problem
A few new ideas for the INPUT command have been developed that may simplify the execution of the INPUT command and work better than the traditional implementation (more on this in a bit). Upon reviewing the previous posts on the INPUT command and reading the ANSI Standard document for the INPUT command again, a problem was discovered with the previous design. The problem can be shown with this INPUT statement:
INPUT I%,A(I%)
This is probably bad programming: if the value entered is outside the range of the array, an exception occurs. In any case, it is allowed. Both GW-Basic and QBASIC act as expected, where the first value entered becomes the index of the element that is assigned the second value. However, the design laid out before won't act as expected. Consider the planned translation for this statement:
InputGet I%<ref> InputInt I% A(<ref> InputDbl Input
This will not work. Consider the items pushed to the evaluation stack upon reaching the Input code at the end:
- Input information (prompt is specified, question flag, and location)
- I%<ref>
- Parsed integer value from input
- Pointer to the integer assign routine
- A(I%)<ref> (after I% pushed, popped and reference to A(I%) calculated)
- Parsed double value from input
- Pointer to the double assign routine
- Can't put references to the evaluation stack until previous values have been assigned.
- Can't assign any variables to the input values until the entire input has been parsed and validated.
Monday, February 14, 2011
Translator – Restructure Pre-Release
Since the restructuring of the code was significant, it is probably a good time to make a pre-release of the code before commencing with the INPUT command. The file ibcp_0.1.15-pre-1-src.zip has been uploaded at Sourceforge IBCP Project along with the binary for the program. When uploading these files, it was discovered that the ibcp_0.1.14-src.zip was missing the complete set of test files, and so was uploaded again. The shell script used to automatically generate a release was using the wrong list to generate the source zip file. Now on to the INPUT command (finally)...
Sunday, February 13, 2011
Translator – More Code Restructuring
It turns out there were really no more issues, just some minor bugs that needed to be corrected. However, in looking at the error messages, the “expected statement” just didn't seem to be as clear as it could be and so was changed to “expected command” along with the name of the corresponding token status.
The size of the add token routine has been getting out of hand for some time. It's not really a good idea to let a function get so big because it becomes much harder to understand and maintain. So it is time to break it up into smaller functions. Generally, no variables need to be passed between these functions except for a reference to the token pointer and token status (usually the return value). The add token function was broken up into these functions:
process operand – handles operands when in operand status and token is not an operator
end expression error – gets the error when an expression is ended prematurely (an end expression token is received in operand state) and is only called when the state is not first operand
process unary operator – handles the checking if an operator token is a unary operator received in operand state including processing open parentheses tokens
process binary operator – handles tokens when in binary operator state
process_operator – empties higher precedence operators from hold stack adding them to the output list (functionality merged with the add operator function, which was removed since it only contains a few lines), then calls the token handler for the operator token
Now the code will be a bit more manageable as the INPUT and other commands are implemented.
Saturday, February 12, 2011
Translator – PRINT and Assign Restructuring
Only the PRINT and Assign commands are implemented so far. These were restructured so that the related code for these commands contained in the comma and semicolon token handlers was relocated into the command handlers. These token handlers will now call the command handlers just like the end-of-line token handler does. A switch was added to the command handlers on the code of the token passed in. Previously this token was only the end-of-line token.
The code from the end-of-line token handler that called a command handler was moved to a new call command handler routine that does as described in the last post except at the beginning, a check was added if the command stack is empty. This condition can occur in an assignment statement before a comma or equal token has been received. A null token status will be returned for the caller to report the appropriate error.
For the comma token handler, a null token status can only occur in expression only test mode because this mode does not put a command on the command stack. For the semicolon token handler, a null token status can occur if the semicolon is at the beginning of the line (where a command is expected) or after a variable (where a comma or equal is expected). The same conditions don't occur at a comma because there are specific checks depending on the current mode (command, assignment list or expression), whereas the semicolon doesn't check the mode.
For the PRINT command handler, the related code was moved from these token handlers into the appropriate cases of the switch. Since no other tokens are currently expected, the default case was set to return an unexpected token bug error.
The Assign command handler only expects an end-of-line token, so for the default case, the assignment list mode returns an equal or comma expected error and expression mode returns an operator or end of statement expected error when expecting a binary operator, or a numeric or string expression expected error when expecting an operand. Before returning the error, the command item token needed to be set to the token passed in to point the error to the token causing the error.
A few other issues were discovered after the code was restructured and tested...
Wednesday, February 9, 2011
Translator – Token/Command Handler Restructuring
While looking over the design notes for the INPUT and INPUT PROMPT commands, it became clear that, besides a new INPUT command handler, parts would also need to be implemented in the comma and semicolon token handlers, just as for the PRINT command. As further commands are implemented, these token handlers will increase in size, and the code to handle a command will be spread across several different routines. A better design would be to process all of a command in a single routine: the command handler.
The EOL token handler already calls the command handler, each of which currently assumes that it is called for an EOL token. The code that would be in the comma and semicolon token handlers could be moved to the command handler, which would have a switch on the code of the token passed in to perform the appropriate action. Then the comma and semicolon token handlers would also call a command handler. If the command doesn't support the passed token, then it would return an error.
There is a sequence of code to call a command handler: get the command item from the top of the command stack; get the command token's command handler from the table; return an error if there is no handler in the table; call the command handler; and, if it returns an error, check whether the command handler changed the token (for the error), delete the original token passed if it has, and set the error token to return. This sequence will be put into a new call command handler routine so the code is not repeated in each token handler.
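The calling sequence described above might look something like the sketch below. It is an illustration under assumptions, not the project's code: the type names, the std::map standing in for the table lookup, and the sample printHandler are hypothetical, and an empty command stack is simply asserted against here.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical names throughout.
enum class Code { Print, Let, Comma, EOL };
enum class Status { Good, Done, NoHandlerBug, Error };

struct Token { Code code; int column; };
struct CommandItem { Token *token; };    // command token held on the command stack

using Handler = Status (*)(CommandItem &, Token *&);

// Sample handler: on an unsupported token it redirects the error
// to the command token by changing the token reference.
Status printHandler(CommandItem &item, Token *&token) {
    switch (token->code) {
    case Code::Comma: return Status::Good;
    case Code::EOL:   return Status::Done;
    default:
        token = item.token;
        return Status::Error;
    }
}

struct Translator {
    std::vector<CommandItem> cmdStack;
    std::map<Code, Handler> handlers;    // stand-in for the table lookup
    Token *errorToken = nullptr;

    Status callCommandHandler(Token *&token) {
        assert(token != nullptr && !cmdStack.empty());
        CommandItem &item = cmdStack.back();            // top of the command stack
        auto it = handlers.find(item.token->code);      // handler from the table
        if (it == handlers.end())
            return Status::NoHandlerBug;                // no handler in the table
        Token *original = token;
        Status status = it->second(item, token);        // call the command handler
        if (status == Status::Error) {
            if (token != original)
                delete original;                        // handler replaced the token
            errorToken = token;                         // token to report the error at
        }
        return status;
    }
};
```

Each token handler then reduces to a single call to callCommandHandler plus whatever reporting the caller does with the returned status.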
Tuesday, February 8, 2011
Translator – Expression Type Release
The changes in getting the expression type code working correctly are completed. The plan was that the next official release would include support for INPUT, but that was before the issue of checking expression types came up when considering the reporting of errors with the string operand of the INPUT PROMPT command.
This is also the first release since development was moved from VIDE2 (using Insight, a GUI for gdb, for debugging) to NetBeans (which includes a GUI front end to gdb). A make file was developed so that NetBeans is not required to build the program for the project. The program is built by simply issuing the make command in the main IBCP directory.
All of the test programs were also modified to work using the same make file. The VIDE2 project files were removed. The test programs are built using the make tests command. A specific test can also be built; for example, the stack test program is built with the make test_stack command. The parser and table test programs were removed as their functionality is now built into the main IBCP program.
The ibcp_0.1.14-src.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the program. This release contains the complete current source including the test programs and a single make file for building the main program and optionally the test programs. It is recommended that the source be extracted into a clean directory. Now the INPUT command can be implemented...
Monday, February 7, 2011
Translator – Assignments and INSTR
While debugging assignments of temporary strings, specifically the assignment of a string that needs to be attached to the assignment operator, done stack empty errors were occurring. This was caused because it was trying to attach two strings, only one of which was on the done stack. Somehow the number of strings member in the table entry was set to two – it appeared to be counting both operands even though a check was put in not to count the first operand of an assignment operator (and was working).
The problem was finally determined to be caused by the INSTR table entries. The assignment operator table entries and the INSTR table entries were sharing the same expression information structure instance. When the assignment operator entry was initialized, it properly set the number of strings to one. But when it processed the INSTR table entries, it changed the number of strings to two, since it was the same instance. This was corrected by giving the assignment operator entries their own instances of the expression information structure.
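The effect of the shared instance can be reproduced in a few lines. This is a simplified sketch with hypothetical names (ExprInfo, TableEntry, initEntry); it assumes both entries take two string operands and that initialization writes the count into the structure the entry points to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified versions of the table structures.
enum class DataType { Double, Integer, String, TmpStr };

struct ExprInfo {
    std::vector<DataType> operands;
    int nStrings;            // computed during table initialization
};

struct TableEntry {
    const char *name;
    ExprInfo *exprInfo;
    bool reference;          // set only for assignment operators
};

// Count string operands, skipping the first operand of a reference entry.
void initEntry(TableEntry &entry) {
    assert(entry.exprInfo != nullptr);
    int count = 0;
    for (std::size_t i = 0; i < entry.exprInfo->operands.size(); ++i) {
        if (entry.reference && i == 0)
            continue;        // the variable being assigned is not attached
        if (entry.exprInfo->operands[i] == DataType::String)
            ++count;
    }
    entry.exprInfo->nStrings = count;
}

// Both entries point at one ExprInfo: the later INSTR pass overwrites the count.
int assignCountWithSharedInfo() {
    ExprInfo shared{{DataType::String, DataType::String}, 0};
    TableEntry assign{"AssignStr", &shared, true};
    TableEntry instr{"Instr2", &shared, false};
    initEntry(assign);       // sets 1
    initEntry(instr);        // overwrites with 2
    return assign.exprInfo->nStrings;
}

// Each entry has its own ExprInfo: the assignment count stays at one.
int assignCountWithOwnInfo() {
    ExprInfo assignInfo{{DataType::String, DataType::String}, 0};
    ExprInfo instrInfo{{DataType::String, DataType::String}, 0};
    TableEntry assign{"AssignStr", &assignInfo, true};
    TableEntry instr{"Instr2", &instrInfo, false};
    initEntry(assign);
    initEntry(instr);
    return assign.exprInfo->nStrings;
}
```

In this sketch the shared version reports two strings for the assignment entry and the separate version reports one, which matches the symptom and the fix described above.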
This corrected the assignment operator problem, but the INSTR functions were giving an “expected operator or close parentheses” error at the second comma of the three argument form of INSTR. This occurred because the associated INSTR codes (Instr2T1, Instr2T2 and Instr2TT) did not have the multiple flag set. This was corrected by setting the multiple flag in the T1, T2 and TT forms of the Instr2 code entries, and the Instr3 associated code entries were placed after each of their corresponding Instr2 codes.
Sunday, February 6, 2011
Translator – Assignment Table Entries
The assignment operator table entries previously only had one operand, which was used for both the variable(s) being assigned and for the expression being assigned, since they're both the same data type. This did not work for temporary strings because the variable operand needed to be a string and the operand being assigned needed to be a temporary string. So, the assignment operators were given two operands, the first for the variable(s) being assigned and the second for the operand of the assignment.
This change caused a problem with the main assign and assign list operators, which contained the list of all the related associated assignment operator codes (one for each data type). When find code is processing the assignment operand (the second operand) for a double assignment, it does not need to search for associated codes. To stop this, the index to the second set of associated codes needed to be set to the number of associated codes, but this caused the Table initialization check to fail. So this initialization was modified to accept this condition.
The Table initialization was also modified to not count the first string operand for assignment operators (the entry's reference flag is set) because the string variable(s) being assigned are not attached to the assign string operator.
One strange problem remained with assignments, which involved the INSTR table entries...
Translator – Assignments of Temporary Strings
A problem with assignments of temporary strings was that the assignment command handler was not performing all the actions it needed to. These additional tasks were found by comparing it to the process final operand routine, which it turned out could be used with some special allowances for assignment operators.
The code in the assignment command handler was replaced with a call to the process final operand routine except for the check if the done stack is empty (needed otherwise the find code routine would flag it as a bug with its empty done stack check) and the clearing of the reference flag of the assignment operator token.
In the process final operand routine, a check was added if the token's table entry has the reference flag set (set only for assignment operators), then the assignment operator token will not be put onto the done stack, though it will still get operands attached (strings, but not temporary strings), and any parentheses in the first and last operands of the token being assigned are deleted.
There were additional problems with the Table entries for the assignment operators...
Saturday, February 5, 2011
Translator – Temporary String Support - Debugging
The Parser tests' expected results needed to be updated because the EOL code had changed (due to extra table entries being added). Several of the Translator tests' expected results needed to be updated for the new temporary string codes. For example, the convention for the operators is CatStr: +$, CatStrT1: +$1, CatStrT2: +$2, and CatStrTT: +$T. The convention for functions is a T added to the function name, for example Len: LEN( and LenTmp: LENT(. The INSTR functions with two string arguments use the convention T1, T2 and TT.
Some minor table entry problems were fixed, but one significant problem that needed correcting was in the number of strings member in the expression information structure in each table entry. This member is automatically calculated during table initialization, but was counting both string and temporary string operands. This field is used to determine how many strings need to be popped from the done stack and attached. Temporary strings are not left on the done stack, so the number of strings should have included only strings, not temporary strings.
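The corrected count and its use can be sketched as below. The function names and the use of plain strings for done stack items are assumptions made for illustration; the real Translator works with tokens.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical data types; only String operands remain on the done stack.
enum class DataType { Double, Integer, String, TmpStr };

// Number of operands that must be popped from the done stack and attached.
// Temporary strings are deliberately excluded.
int countStrings(const std::vector<DataType> &operands) {
    return static_cast<int>(
        std::count(operands.begin(), operands.end(), DataType::String));
}

// Pop nStrings items off the done stack and return them in operand order.
// Over-counting would trip the assertion, the analogue of a
// "done stack empty" error.
std::vector<std::string> popAttached(std::vector<std::string> &doneStack,
                                     int nStrings) {
    assert(nStrings >= 0 &&
           static_cast<std::size_t>(nStrings) <= doneStack.size());
    std::vector<std::string> attached(doneStack.end() - nStrings,
                                      doneStack.end());
    doneStack.resize(doneStack.size() - static_cast<std::size_t>(nStrings));
    return attached;
}
```

For a code whose first operand is a temporary string and whose second is a string, countStrings gives one, so only the single item actually on the done stack is popped.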
There was one other issue with constant strings. Technically these are already known to be strings, unlike tokens with and without parentheses that may be variables/arrays or user functions. Therefore, constants don't need to be attached to operands. But there is no allowance in the current implementation for this, so to keep things simple, they will continue to be attached. The proper code is already set (for a string operand), and the Encoder will see that they are constants and leave the code as is. Normally, if the Encoder determines that an operand is a user function, meaning it produces a temporary string, it will change the code appropriately to one that has a temporary string operand.
There is still a problem with assignments...
Translator – Full Temporary String Support
The main work for adding full support for temporary strings includes adding all the new codes for temporary strings and adding all the table entries for these new codes. Part of this was adding new operand arrays that contain the temporary string type(s), adding new associated arrays and increasing the maximum size of the associated array (the assign and assign list associated arrays contain five associated codes).
For example, previously there was the CatStr code (both operands are strings). This was expanded to the CatStrT1 (first operand is a temporary string), CatStrT2 (second operand is a temporary string), and CatStrTT (both operands are temporary strings) codes. This is similar for functions. For example, previously there was the Len code (argument is a string). This was expanded to the LenTmp code (argument is a temporary string). For the INSTR function that takes two string arguments, the T1/T2/TT format was used.
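As a rough illustration of how the expanded codes relate to operand types, the sketch below maps operand types directly to a code. The real Translator selects codes through the associated code arrays in the table, so the catCode, lenCode and debugName functions here are hypothetical; the debug names follow the test output convention noted in the debugging post (+$, +$1, +$2, +$T, LEN( and LENT().

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of the data types and codes.
enum class DataType { String, TmpStr };
enum class Code { CatStr, CatStrT1, CatStrT2, CatStrTT, Len, LenTmp };

// Pick the concatenation code from the two operand types.
Code catCode(DataType first, DataType second) {
    bool t1 = first == DataType::TmpStr;
    bool t2 = second == DataType::TmpStr;
    if (t1 && t2) return Code::CatStrTT;   // both operands temporary
    if (t1)       return Code::CatStrT1;   // only the first is temporary
    if (t2)       return Code::CatStrT2;   // only the second is temporary
    return Code::CatStr;                   // both are plain strings
}

// Pick the LEN code from its single argument type.
Code lenCode(DataType arg) {
    return arg == DataType::TmpStr ? Code::LenTmp : Code::Len;
}

// Names as they appear in the translator test output.
std::string debugName(Code code) {
    switch (code) {
    case Code::CatStr:   return "+$";
    case Code::CatStrT1: return "+$1";
    case Code::CatStrT2: return "+$2";
    case Code::CatStrTT: return "+$T";
    case Code::Len:      return "LEN(";
    case Code::LenTmp:   return "LENT(";
    }
    assert(false && "unknown code");
    return "?";
}
```

The separate codes let the run-time routine for each variant know which operands are temporary strings without having to test for it.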
The sub-string functions don't need to be expanded, because their operands are transferred to the next operator or function. If their operand is a string, then the next operator or function will be one that takes a string operand. If their operand is a temporary string, then the next operator or function will be one that takes a temporary string (that would need to be deleted at run-time when it is no longer needed). Therefore, there do not need to be separate codes (one that takes a string and one that takes a temporary string).
Thursday, February 3, 2011
Translator – No Array Assignments
Tokens with parentheses can be either an array or a user function, but the Translator can't determine which since it doesn't have access to the Dictionary (the repository for all elements of the program including variables, arrays, defined functions, user functions, constants, etc. with information about each). The Dictionary will be developed alongside the Encoder. For now, the Translator will simply translate from the input code entered to reverse polish notation.
It was thought that since a token with parentheses on the left side of an assignment can only be an array, that the Translator could assume that it was an array and handle the subscripts accordingly. Currently the Translator just attaches all the operands (subscripts) to these tokens. The Encoder (by looking up the identifier in the Dictionary to determine what it is) will handle arrays and user functions. For arrays, all the subscripts need to be numerical (doubles would get a CvtInt added) and for functions, the arguments must match the function definition.
There is one additional issue with arrays, though, that the Encoder will handle, which is to check whether the number of subscripts is correct for the array. This information will be in the Dictionary, which the Translator does not have access to. If arrays were assumed, the current attachment of subscripts, including the number of subscripts attached, would not be included. The Encoder would then not know how many subscripts were found (which needs to be checked) unless this number was somehow passed through. Therefore, the current mechanism will be left in place. The only thing left for the expression type processing is to add full support for temporary strings...
Wednesday, February 2, 2011
Translator – Array Assignments
There's been a change of plan for the Translator handling arrays on the left side of assignment statements, which will be explained in the next post. What was needed to handle array subscripts was a stripped-down routine similar to the find code routine, to handle integers and doubles (for which a hidden CvtInt code would be added). As the new customized function was being implemented, some unnecessary code was found.
In the find code routine, when the token with an error was not a reference, there was a check if the last token added to the output was different from the token on the done stack, and if it was, it was assumed that the last token was a sub-string function (which was not put on the done stack because its operand was left on the done stack). So this sub-string token was returned for the error instead of the token on top of the done stack.
In addition, if the last token was not the expected sub-string function, a bug error was returned since it was assumed to be an invalid condition. However, this is a valid condition if a dummy close parentheses token was added to the output, which would then be the last token added to the output.
With the first and last operand implementation, these checks are no longer necessary, since for sub-string functions, the first operand left on the done stack acquires the sub-string function token as its first operand and this first operand token is returned as the location of the error. Therefore, these extra checks were removed.
Tuesday, February 1, 2011
Translator – Correcting More Token Leaks
There were a few more token leaks that needed to be corrected, one with the first and last operand changes and the rest with the PRINT command. Details of the rest of the token leaks found and corrected are after the continue. All token leaks have now been corrected; time to handle array assignments...
