Interactive BASIC Compiler Project

Sunday, April 4, 2010

Translator – Internal Functions (Release)

During testing of the “wrong number of arguments” error, it was noticed that the error was reported on the closing parentheses. While this is acceptable, it would be better if the error pointed to the actual function. However, the caller always pointed to the token that caused the error. Therefore, in order to report an error at any token, the token pointer argument of the add_token() function was changed to a reference, so that it could be changed to any token so that caller will point to this token instead when the error is reported.

The name2 member is being used for the name to output during testing, so for example, for Mid2_Code and Mid3_Code, name2 is set to “MID2$(” and “MID3$(” so that is can easily be seen which code is in the token. This was needed when outputting internal function tokens in addition to operator tokens. So instead of copying these lines of code, a new access function debug_name() was added to get the appropriate name for debug test output.

I also realized there were some memory leaks in the code - the EOL token was not deleted; when an error occurs, the token of the error was not deleted; and when the token is replaced (see above), the original token was not deleted. Therefore, the code was cleaned up to prevent these memory leaks. Nothing was done when the “BUGs” occur since these should not happen once the code is debugged.

To make the code clearer, “Error_” was added to all the error and “BUG_” was added to all the diagnostic error Translator status enumeration names. By the way, each of theses codes is only used in one place so that it is easy to figure where an error occurs (if one “stack empty” was used instead of five unique codes, then if one occurs, there would be confusion which place in the code it came from).

The code now checks the number of arguments for internal functions and ibcp_0.1.7-src.zip has been uploaded at Sourceforge IBCP Project along with the binary for the program. Next handling assignment statements and the concept of references...

Translator – Internal Functions (Implementation)

The number of arguments member needs to be added to the TableEntry structure and values to the table entries along with an access function to this member in the Table class. New entries are needed for the MID$, ASC and INSTR functions with the appropriate number of arguments. A new multiple entry flag is needed to the first entry for each of these functions to indicate that more entries exist with different number of arguments.

The check for the number of arguments for internal functions will be added to the closing parentheses token processing in the array or function section. If the token popped off of the hold stack is an internal function and the number of operands doesn't match the number of arguments in the table, then if the multiple entry flag is set it needs to search for another entry of the function that matches the number of operands. If found, the the token's index will be changed to the new entry's index. Otherwise a “wrong number of arguments” error will occur.

A new table search function is needed that will search for a function with a specific number of arguments. The index of the first entry for the function will be specified assuming that it has already been checked for the number of arguments (this is the index the Parser originally returned). The search will proceed with the entry after the index and continue until the end of the internal functions (when an entry with a NULL name if found). The additional entries must be in the same internal functions section (words with parentheses).

Saturday, April 3, 2010

Translator – Internal Functions

The Translator is unable to check the number of operands for arrays or functions without more information – information that will be contained in the Dictionary, so this checking will be left for the Encoder. However, the number of arguments for internal functions are fixed and can be checked within the Translator.

The number of arguments cannot be checked with knowing the number, therefore a new number of arguments member needs to be added to the Table entries. However, some internal functions may have more than one form. Several of these types or functions are currently planned, which include MID$, ASC and INSTR.

This means that there will need to be multiple Table entries, like for the existing Sub_Code and Neq_Code for the minus operator. For example with MID$, there will be Mid2_Code and Mid3_Code. Initially the Parser will set the code to Mid2_Code since that will be the first entry found in the table.

When the MID$ token is about to be added to the output list, the Translator will have the number of operands that needs to be checked. If the number matches what's in the table, nothing further needs to be done. If the number doesn't match, the Translator either needs to return an error or go looking for another Table entry with the correct number of operands found. Therefore, there needs to be a flag in the Table that indicates that there are more entries. If this flag is set then the Translator needs to look for it.

Friday, April 2, 2010

Translator – Arrays and Functions (Release)

The SimpleStack class is in the new source file stack.h. A simple stack test program was written to test the SimpleStack class, especially the automatic expanding feature of SimpleStack. There is no VIDE2 project file for with program, it was compiled with the command line “g++ test/test_stack2.cpp -o teststack2.exe” via MSYS.

Several test expressions were added for testing arrays and functions including multiple levels of identifiers with parentheses, defined functions and internal functions. At this point there is no difference between arrays and user functions – both are identifiers with parentheses. The last several expressions are ones that were used to test error conditions (these actually caused problems during testing).

A change was made to how the test output is generated specifically related to internal operator codes. Right now there is only one of these, the Negate code. Originally the name string was set to “Neg” to distinguish between “-” (binary operator subtraction) and “-” (unary operator negation) in the test output. However, the name will eventually need to be “-” for the Recreator. There will be many more of these operators. Therefore, name2 will now be used for the test output and name will be for the actual output. For test output, if name2 is not set (NULL) like it is for all operators, name will be used for the test output, otherwise name2 will be used.

The code now handles arrays and functions and ibcp_0.1.6-src.zip has been uploaded at Sourceforge IBCP Project along with the binary for the program (the test stack program is not included). Next the checking for the correct number of arguments for internal functions will be added to the Translator...

Translator – Arrays and Functions (Implementation)

For the new counter stack in the Translator class, the SimpleStack class was implemented. This class is a very simple stripped down version of the List class implementation that uses simple allocated array (expanded as needed) instead of using a double linked list. The template contains arguments for the initial size of the array and the size the array is increased when it gets filled (both arguments default to 10). A key feature of this class is that the top() function can be used to manipulate the item on top of the stack (in this case a comma counter) directly.

When an array or function token is encountered (any token that has a parentheses), it is pushed onto the hold stack. The Translator state is left set to Operand since the next token must be an operand (or optional unary operator). The pending parentheses check needs to be made before the token that has a parentheses is handled.

For comma handling, a 0 is pushed onto the counter stack for an open parentheses token. If a comma token is encountered and the counter stack is either empty or the top of the counter stack is 0, then an “unexpected comma” error occurs. For a token that has a parentheses, a 1 is pushed onto the counter stack. For each comma token encountered, the counter on top of the stack is incremented.

Now when a closing parentheses token is encountered, if the counter stack is empty, then a “missing opening parentheses” occurs. The top counter is popped off the stack. If this counter value is zero, then there was an opening parentheses, so an open parentheses token is expected on top of the hold stack. The pending parentheses token pointer only needs to be set for closing parentheses tokens.

When the value popped from the top of the counter stack is not zero, it contains the number of operands for an array or function, which is expected on top of the hold stack (specifically a token that has a parentheses). For now, the operands are just popped from the done stack. The array or function token is popped from the hold stack, added to the output list and pushed on the done stack.

Other changes needed was the precedence check when emptying the hold stack to work with tokens that don't have table entries (array and function tokens); at the EOL processing, adding a token has parentheses to the missing closing parentheses check; and emptying the counter stack in the clean up routine.

Sunday, March 28, 2010

Translator – Arrays and Functions (Design Issues)

The first issue is that the number of operands counter cannot be a simple variable because operands of arrays and functions can be other arrays and functions, so there needs to be a counter variable for each nested array or function. Therefore a counter stack is needed for multiple nested levels of arrays or functions. Using the existing List class for this simple counter stack would be overkill, so a new simple stack class will be developed with a similar interface (so it would be easy to switch from one to the other for stacks).

The second issue is when an array or function token is pushed on the stack. The Paren and DefFuncP token types do not have a table index and therefore a precedence can't be obtained from a table entry. To resolve this, a new precedence() function is needed in the Token class, which will work the same as the is_operator() and has_paren() functions using a static precedence array. A new precedence() function will be added to the Table class with a Token pointer argument that will use the new Token precedence() function.

The values in the static precedence array within the Token class will be initialized to -1 if the token type has an associated table entry, otherwise the value is set to the desired precedence. For the Command, Operator and IntFuncP token types that have table entries, the token precedence will be set to -1 and for the DefFuncP and Paren token types, the precedence will be set to 2 (the same as the open parentheses precedence). The other token types won't be pushed to the hold stack so no value is needed at the moment.

Saturday, March 27, 2010

Translator – Commas

The commas that separate the subscripts or arguments will be processed as operators. The precedence of the comma operator will be the same as the closing parentheses, which will empty the hold stack up to but not including the array or function token. The comma will be processed as a token requiring special processing, where the comma will simply be counted and the comma token will not be pushed onto the hold stack.

This will require a number of operands counter. This counter is initialized to 0. When a parentheses token is pushed onto the hold stack, this variable is set to 1 indicating there is an array or function token on the stack. Each comma operator processed increments this counter. If no commas tokens are seen, the counter remains set to 1. If the counter is 0 when a comma token is seen, then an “unexpected comma” error will be reported.

So now, upon reaching the closing parentheses, if the counter is 0, then an open parentheses token is expected to be on top of the hold stack as previously implemented. If the counter is 1 or more, then the counter contains the number of operands added to the output list and an array or function argument is expected to be on top of the hold stack. The array or function token is then popped off of the hold stack and added to the output list.

Translator – Arrays and Functions

As mentioned, the Translator will not know the number or subscripts that arrays should have. It will also not know (except for internal functions) the number of arguments that functions should have. Worse, the Translator will not even know if an identifier with a parentheses refers to an array or a function. (The Translator will be able to identify a define function since the Parser has already identified these by the “FN” with their own token type.)

As it turns out, translation wise, arrays and functions can be are handled identically – one has subscripts and the other has arguments. But these can also be thought as operands just like for operators except where operators have one (unary) or two (binary) operands, arrays and functions can have one or more operands. Functions with no arguments (including the IntFuncN and DefFuncN token types) are already being handled as simple operands – the Translator didn't need to distinguish between variables and functions with no arguments.

To translate arrays and functions, a token that has a parentheses will be pushed onto the hold stack. The precedence of these tokens will the same as the open parentheses (between Null and close parentheses for the same reason) to keep them on the hold stack. Upon reaching the matching closing parentheses token, the number of operands will be counted (and validated for an internal function). The array or functions token will then be popped off of the hold stack and added to the output list. But exactly how will the operands be counted...

Translator – Functions

Now that the translation arrays have been defined, it's time to define the translation of functions. Functions (whether Internal, one-line Defined or User) have arguments separated by commas. Take the example MID$(A$+B$,INT(A+0.5),I+5) containing two internal function calls where the tokens will be parsed as:

MID$( A$ + B$ , INT( A + 0.5 ) , I + 5 )

Again there are tokens for commas and closing parentheses. This expression will be translated to RPN as:

A$ B$ + A 0.5 INT( I 5 + MID$(

The functions are found after their associated arguments. Functions will pop their already evaluated arguments off of the run-time stack (which will be in reverse order), perform there operations, and then push their result back onto the run-time for the next part of the expression.

Though this example contains only internal functions, one-line defined functions and user functions work the same way. The big difference is that Translator (using Table) will know the number of arguments that internal functions have and can perform error checking for the number of arguments. However, this can't be done for one-line defined (FN) functions and user functions. This job will be left for the Encoder (which will access the Dictionary for this information).

Translator – Arrays

Arrays have subscripts as in A(I) and B(3,X). For now there will be no limitation on the number of subscripts that an array can have (there will be a limitation once the internal language is defined, but this limitation will be sufficiently large). Take the example A(X+5,Y*2,Z) for a three dimension array element where the tokens will be parsed as:

A( X + 5 , Y * 2, Z )

This includes tokens for the commas separating the subscripts and the closing parentheses. Also note that in this example, the subscripts can be expressions. This array element expression will be translated to RPN as:

X 5 + Y 2 * Z A(

Note that the subscript expressions are first with the array at the end. When this array element is processed at run-time, the expressions are processed as discussed before. The run-time stack will contain the values X+5, Y*2 and Z (Z on top of the stack) when execution reaches the array name.

The instruction code for the array will contain the number of subscripts, so when the array element offset is evaluated, it will know that there are 3 subscripts on the stack (in reverse order) that need to be popped. Once the offset is calculated, the value of the array element will be obtained from memory and pushed back onto the stack. This value will then be the operand for the rest of expression the array element is in.

There is a problem with arrays during translation, and that is that the Translator will not know the actual number of subscripts that a particular array will have. Therefore, the subscript checking will be delayed for the Encode, which will use the Dictionary to obtain this information. Remember that the purpose of the Translator is to rearrange the incoming program code into RPN format, doing as much error checking as possible.

Friday, March 26, 2010

Translator – Parentheses (Release)

The problem with some parentheses expressions was that the last precedence was set at the open parentheses token, but this caused a problem when another open parentheses is processed before the next operand because the last precedence would be reset to the highest precedence before the last last precedence value could be used. The solution was to set the last precedence value at a close parentheses token just before the stack is emptied up to the open parentheses.

Also discovered that the code needed to check if there was pending parentheses token at an open parentheses and a closing parentheses in addition to when added an operand or operator to the output list. These additional locations were discovered when more test expressions were added. Since the code for checking and adding a dummy token was needed in four locations, this code was replaced with a function call to the new function do_pending_paren() where the code was located. This function does last precedence is higher than current operator check and the same precedence only when the state is Operand (for operands and open parentheses). Since this check is also done for the EOL operator, the function call was added there also and because the top of the stack contain a Null operator (lowest precedence), the dummy token is always added.

One other problem wes found. At the EOL, the token popped off the stack (which should be the Null token at the bottom of the stack, but could be something else if there is an error), was not deleted if the token wasn't the Null token.

The code now appears to be handling parentheses correctly and ibcp_0.1.5-src.zip has been uploaded at Sourceforge IBCP Project along with the binary for the program. Next array and function handling needs to be added to the Translator...

Thursday, March 25, 2010

Translator – Parentheses (Debugging)

A minor issue was discovered during debugging. During the initial detection for unary operators (in the operator section) when the last precedence variable is set to the highest precedence value, the opening parentheses token needed to be added directly to the hold stack, the state set to Operand and Good status returned. I initially last the code fall through to let the operator code process the opening parentheses unary operator, but this code malfunctioned for this token (the hold stack was emptied because of the low precedence open parentheses).

In the clean up function, which cleans up the Translator after an error occurs (empties the stacks), code needed to be added to also delete the pending parentheses token if it had been set. A bunch of test expressions were added with all various parentheses expression combinations previously discussed plus some more complex ones. Two expressions were also added to test each of the missing parentheses errors.

However, during testing it was found that when the expression (A+B)*(C+D) is translated, it incorrectly contains a dummy token: A B + ) C D + *. The problem is that at the point that it is processing the C operand, it sees that there is a parentheses pending. But, the parentheses before the C set the last precedence to the highest precedence, so it doesn't matter what operator is on the stack (in this case the *), it adds the dummy token.

Since most of the code is working, the ibcp_0.1.5-dev-1.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the program. I had to remove the windows shortcut from the binary because it only works on my system. The shortcut will need to be created on each target system. (Simple right-click on the ibcp.exe and select Create Shortcut, then Edit Properties of the shortcut and add “ -t i” to the Target field.)

Translator – Parentheses (Implementation)

While the previous several posts seem to suggest that the implementation of parentheses will be rather involved, this is not the case. First the unary code field of the opening parentheses needs to be set (to the opening parentheses code) so that it will be processed correctly.

The precedence of the parentheses operators need to be set in the Table. The precedence of Null is zero and the precedence of EOL was set to 4. The precedence of the closing parentheses needs to the same as EOL – both perform the same hold stack emptying functionality. The precedence of the opening parentheses needs to be between Null and EOL/opening parentheses and therefore will be set to 2 so that neither EOL or an closing parentheses will remove it automatically during the emptying of the hold stack (of higher precedence).

The difference between processing a closing parentheses and an EOL is that a closing parentheses expects to find an opening parentheses on top of the hold stack after it is emptied and an EOL expects the null token at the bottom of the hold stack. If these conditions are not the case, the appropriate missing parentheses error occurs.

Continued... »

Wednesday, March 24, 2010

Translator – Parentheses (End of Dummy Tokens)

There's a few cases of adding dummy parentheses tokens that need to be considered to make sure that the checks for before adding an operand and before adding an operator are handled properly. Consider the expression:

(A Op1 B Op2 C)

Again the decision point occurs when processing the EOL (as an operator). If this case however, there is no operator on top of the hold stack, just the Null token at the bottom of the stack. Fortunately the Null token has a precedence (lowest), so in this case, the precedence of the last operator will always be higher than the Null token, so the dummy token will always be added.

Similarly if the expression only contains (A). The Null token is lowest precedence but there is no last operator. Again the check covers this case with the if there is no last operator, the dummy token is always added. Finally, expressions with double (or more) parentheses need to be handled, take the expressions:

((A Op1 B)) Op2 C
A Op1 ((B)) Op2 C

When the second closing parentheses is being processed (an operator), if there is already a closing parentheses token pending and another closing parentheses is being processed (and no operator was on the hold stack that would have triggered the add dummy token check), then the pending closing parentheses token (the first one) needs to be added directly to the output list. The pending closing parentheses token is then set to the second closing parentheses.

The next token, Op2, will then be pushed onto the hold stack. At the C operand token, since there is still a closing parentheses pending (the second one), this will trigger the add dummy token check, where the last operator added (Op1) will be checked against the top of the hold stack (Op2) and the dummy token will only be added if the precedence of Op1 is higher than or same as Op2. (There will be one or two dummy tokens in the translation depending on the precedence of the operators.)

For the second double parentheses expression, the first dummy token will automatically be added to the output list. The second dummy token will also be added to the output list because there is no last operator, therefore the dummy token is always added.

This should handle all cases for adding dummy tokens for unnecessary parentheses. Now how all this is going to be implemented...

Tuesday, March 23, 2010

Translator – Parentheses (More on Dummy Tokens)

Not all cases of adding dummy parentheses tokens occur when an operand is about to be added to the output list. Next consider the expression:

A Op1 (B Op2 C)

This expression will be translated to A B C Op2 Op1. When this is recreated, the parentheses will only be added if the precedence of Op2 is lower than or the same as Op1 (in other words, when the parentheses are required). The dummy token is only required after Op2 when the precedence of Op2 is higher than Op1.

This decision, whether to add the dummy token, needs to be made before the Op1 operator is added to the list. At this point, Op1 is on the top of the hold stack and the EOL is being processed (the stack is about to be emptied). Therefore, in the code at the point that an operator is about to be added to the output list from the top of the hold stack, there needs to be the check if a closing parentheses was just processed and the precedence of the last operator added (Op2) during the processing of closing parentheses (handling the case that the parentheses is just around an operand) is higher than the operator on top of the hold stack (Op1) then add the closing parentheses token to the list. Now consider the extension on the above expression:

A Op1 (B Op2 C) Op3 D

At the point of the decision, Op3 is being processed and Op1 is on top of the hold stack. Either Op1 will be popped off the stack and added to the list (Op1 is higher precedence than Op3), or Op3 will be pushed on the stack (Op3 is higher precedence than Op1). For the former case, the precedence check for the dummy token is made between Op2 (the last operator added during the parentheses processing) and Op1. For the later case, the precedence check is made between Op2 and Op3, which is on top of the stack at the EOL processing (Op1 would be popped and added last).

Remember that EOL is processed as a low precedence operator. Ditto for a closing parentheses, which is also processed as a low precedence operator. Therefore the dummy token check is made whenever an operator is going to be added to the output list. Almost done, but a few more expressions need to be considered...

Translator – Parentheses (Dummy Tokens)

To determine when a dummy parentheses token should be added to the code so that the Recreator knows to include the unnecessary (but entered by the programmer) parentheses, consider the following generic expression:

(A Op1 B) Op2 C

This expression will be translated to A B Op1 C Op2. When this is recreated, the parentheses will only be added if the precedence of Op1 is lower than Op2 (in other words, when the parentheses are required). The dummy token is only required after Op1 when the precedence of Op1 is higher than or the same as Op2.

This decision, whether to add the dummy token, needs to be made before the C operand is added to the list. At this point, Op2 is on the top of the hold stack. Therefore, in the code at the point that an operand is about to be added to the output list, there needs to the check if a closing parentheses was just processed and the precedence of the last operator added (Op2) is higher than or the same as the operator on top of the hold stack (Op1) then add the closing parentheses token to the list. Now consider a variation of the above expression:

A Op1 (B) Op2 C

This is similar to the above except except at the decision, the last operator added is outside the parentheses (Op1). The test condition needs to be modified to say the last operator added during the processing of the closing parentheses. When there is no last operator, then the parentheses always needs to be added.

This takes care of the processing necessary when adding an operator to the output list and a closing parentheses was just processed. However, there are other expressions with unnecessary parentheses to consider...