Interactive BASIC Compiler Project

Thursday, March 25, 2010

Translator – Parentheses (Debugging)

A minor issue was discovered during debugging. During the initial detection for unary operators (in the operator section) when the last precedence variable is set to the highest precedence value, the opening parentheses token needed to be added directly to the hold stack, the state set to Operand and Good status returned. I initially last the code fall through to let the operator code process the opening parentheses unary operator, but this code malfunctioned for this token (the hold stack was emptied because of the low precedence open parentheses).

In the clean up function, which cleans up the Translator after an error occurs (empties the stacks), code needed to be added to also delete the pending parentheses token if it had been set. A bunch of test expressions were added with all various parentheses expression combinations previously discussed plus some more complex ones. Two expressions were also added to test each of the missing parentheses errors.

However, during testing it was found that when the expression (A+B)*(C+D) is translated, it incorrectly contains a dummy token: A B + ) C D + *. The problem is that at the point that it is processing the C operand, it sees that there is a parentheses pending. But, the parentheses before the C set the last precedence to the highest precedence, so it doesn't matter what operator is on the stack (in this case the *), it adds the dummy token.

Since most of the code is working, the ibcp_0.1.5-dev-1.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the program. I had to remove the windows shortcut from the binary because it only works on my system. The shortcut will need to be created on each target system. (Simple right-click on the ibcp.exe and select Create Shortcut, then Edit Properties of the shortcut and add “ -t i” to the Target field.)

Translator – Parentheses (Implementation)

While the previous several posts seem to suggest that the implementation of parentheses will be rather involved, this is not the case. First the unary code field of the opening parentheses needs to be set (to the opening parentheses code) so that it will be processed correctly.

The precedence of the parentheses operators need to be set in the Table. The precedence of Null is zero and the precedence of EOL was set to 4. The precedence of the closing parentheses needs to the same as EOL – both perform the same hold stack emptying functionality. The precedence of the opening parentheses needs to be between Null and EOL/opening parentheses and therefore will be set to 2 so that neither EOL or an closing parentheses will remove it automatically during the emptying of the hold stack (of higher precedence).

The difference between processing a closing parentheses and an EOL is that a closing parentheses expects to find an opening parentheses on top of the hold stack after it is emptied and an EOL expects the null token at the bottom of the hold stack. If these conditions are not the case, the appropriate missing parentheses error occurs.

Continued... »

Wednesday, March 24, 2010

Translator – Parentheses (End of Dummy Tokens)

There's a few cases of adding dummy parentheses tokens that need to be considered to make sure that the checks for before adding an operand and before adding an operator are handled properly. Consider the expression:

(A Op1 B Op2 C)

Again the decision point occurs when processing the EOL (as an operator). If this case however, there is no operator on top of the hold stack, just the Null token at the bottom of the stack. Fortunately the Null token has a precedence (lowest), so in this case, the precedence of the last operator will always be higher than the Null token, so the dummy token will always be added.

Similarly if the expression only contains (A). The Null token is lowest precedence but there is no last operator. Again the check covers this case with the if there is no last operator, the dummy token is always added. Finally, expressions with double (or more) parentheses need to be handled, take the expressions:

((A Op1 B)) Op2 C
A Op1 ((B)) Op2 C

When the second closing parentheses is being processed (an operator), if there is already a closing parentheses token pending and another closing parentheses is being processed (and no operator was on the hold stack that would have triggered the add dummy token check), then the pending closing parentheses token (the first one) needs to be added directly to the output list. The pending closing parentheses token is then set to the second closing parentheses.

The next token, Op2, will then be pushed onto the hold stack. At the C operand token, since there is still a closing parentheses pending (the second one), this will trigger the add dummy token check, where the last operator added (Op1) will be checked against the top of the hold stack (Op2) and the dummy token will only be added if the precedence of Op1 is higher than or same as Op2. (There will be one or two dummy tokens in the translation depending on the precedence of the operators.)

For the second double parentheses expression, the first dummy token will automatically be added to the output list. The second dummy token will also be added to the output list because there is no last operator, therefore the dummy token is always added.

This should handle all cases for adding dummy tokens for unnecessary parentheses. Now how all this is going to be implemented...

Tuesday, March 23, 2010

Translator – Parentheses (More on Dummy Tokens)

Not all cases of adding dummy parentheses tokens occur when an operand is about to be added to the output list. Next consider the expression:

A Op1 (B Op2 C)

This expression will be translated to A B C Op2 Op1. When this is recreated, the parentheses will only be added if the precedence of Op2 is lower than or the same as Op1 (in other words, when the parentheses are required). The dummy token is only required after Op2 when the precedence of Op2 is higher than Op1.

This decision, whether to add the dummy token, needs to be made before the Op1 operator is added to the list. At this point, Op1 is on the top of the hold stack and the EOL is being processed (the stack is about to be emptied). Therefore, in the code at the point that an operator is about to be added to the output list from the top of the hold stack, there needs to be the check if a closing parentheses was just processed and the precedence of the last operator added (Op2) during the processing of closing parentheses (handling the case that the parentheses is just around an operand) is higher than the operator on top of the hold stack (Op1) then add the closing parentheses token to the list. Now consider the extension on the above expression:

A Op1 (B Op2 C) Op3 D

At the point of the decision, Op3 is being processed and Op1 is on top of the hold stack. Either Op1 will be popped off the stack and added to the list (Op1 is higher precedence than Op3), or Op3 will be pushed on the stack (Op3 is higher precedence than Op1). For the former case, the precedence check for the dummy token is made between Op2 (the last operator added during the parentheses processing) and Op1. For the later case, the precedence check is made between Op2 and Op3, which is on top of the stack at the EOL processing (Op1 would be popped and added last).

Remember that EOL is processed as a low precedence operator. Ditto for a closing parentheses, which is also processed as a low precedence operator. Therefore the dummy token check is made whenever an operator is going to be added to the output list. Almost done, but a few more expressions need to be considered...

Translator – Parentheses (Dummy Tokens)

To determine when a dummy parentheses token should be added to the code so that the Recreator knows to include the unnecessary (but entered by the programmer) parentheses, consider the following generic expression:

(A Op1 B) Op2 C

This expression will be translated to A B Op1 C Op2. When this is recreated, the parentheses will only be added if the precedence of Op1 is lower than Op2 (in other words, when the parentheses are required). The dummy token is only required after Op1 when the precedence of Op1 is higher than or the same as Op2.

This decision, whether to add the dummy token, needs to be made before the C operand is added to the list. At this point, Op2 is on the top of the hold stack. Therefore, in the code at the point that an operand is about to be added to the output list, there needs to the check if a closing parentheses was just processed and the precedence of the last operator added (Op2) is higher than or the same as the operator on top of the hold stack (Op1) then add the closing parentheses token to the list. Now consider a variation of the above expression:

A Op1 (B) Op2 C

This is similar to the above except except at the decision, the last operator added is outside the parentheses (Op1). The test condition needs to be modified to say the last operator added during the processing of the closing parentheses. When there is no last operator, then the parentheses always needs to be added.

This takes care of the processing necessary when adding an operator to the output list and a closing parentheses was just processed. However, there are other expressions with unnecessary parentheses to consider...

Monday, March 22, 2010

Translator – Parentheses (Recreation Issue)

When expressions with parentheses are translated to RPN, parentheses get removed as they are not necessary. Instead, RPN is dependent on the order of the expression. This is why RPN calculators have no parentheses keys like regular calculators. Take the expressions (A*B)+C and A*B+C – both would be translated to A B * C + (in other words, the parentheses are unnecessary in the original expression).

One of my pet peeves with programmers are the addition of unnecessary parentheses into expressions when they are not needed (my opinion: learn and use the language properly). It would be easy to just let the parentheses disappear as occurs automatically when an expression is translated to RPN, as would happen when the expression above is recreated since the expression is equivalent without parentheses. Unlike, for the expression (A+B)*C, that would be translated to A B + C *; the parentheses would be added back when the expression is recreated, because the parentheses are needed or the meaning of the expression would be changed.

However, it would unexpected to a programmer for the parentheses to just disappear. This is the reason that the original strings of numeric constants are being kept in tokens. It would not be nice for a number to be entered as 0.000001, but show up as 1e-6 in the program. To keep these unnecessary parentheses in the code so that they will be in the recreated expression, something needs to be added to the internal code. However, it would just waste memory and run-time to just throw in parentheses everywhere they are entered. So for these unnecessary cases, a dummy token will be added to the RPN list and since the closing parentheses token is present, this token will be the token added. During run-time, the closing parentheses code will be a no-operation code.

Note that no-operation does not mean no execution time, some processing will be required to skip over the dummy code; which is why dummy parentheses tokens won't be put into the code everywhere there are parentheses in the original expressions. Later an option will be added to allow the user the option of having parentheses removed automatically. This will give the user the choice. Now the trick is to decide when to add one of these dummy tokens and when not to...

Sunday, March 21, 2010

Translator – Parentheses

Parentheses are used to override the precedence of operators in expressions. For the Translator, parentheses will be handled as operators themselves.

The opening parentheses will be considered a unary operator as it's placement in expressions is the same as for other unary operators – expected when an operand or optional unary operator is expected. The precedence of the opening parentheses will very low, above the Null operator, but below the EOL operator. Above the Null operator because the Null operator should never be removed from the hold stack until the very end. However, below the EOL operator because the opening parentheses token should not be removed from the hold stack at the end of the line – this would be a “missing closing parentheses” error.

The closing parentheses will be considered an ordinary binary operator as it's placement in expressions is the same as for other binary operators, in other words, after operands where a binary operator is expected. The precedence of the closing parentheses will be the same as the EOL operator. This will cause the hold stack to be emptied down to either the Null token at the bottom of the stack or an opening parentheses token.

A closing parentheses operator will require a little extra processing than other operators, similar to the EOL operator. Also, the closing parentheses token will not actually be pushed onto the hold stack. After the hold stack is emptied, if the Null token is on top of hold stack, then a “missing opening parentheses” error would occur. Otherwise, there must be an opening parentheses token on the top of the hold stack which would be popped off of the stack. Nothing needs to be put into the output RPN list...

Translator – Simple Expressions (Release)

The three possible errors were also validated (and the diagnostic errors did not occur as they were no suppose to). Finally, one of the test expressions is a string expression A$ = "this" + "test", which is correctly translated as A$ "this" "test" + =. However, technically this is not being translated as an assignment statement. Unlike C, in BASIC the “=” operator could be either assignment or equality depending on where it is found. Assignment statements will be handled later.

I discovered that in input mode, the up and down arrows can be used to recall previously entered lines in a session. This must also be built into the gets() standard library function. I also discovered that the ibcp.exe program does not work from a DOS window (Windows XP), but does run from Windows Explorer. And of course runs fine from an MSYS command window or with the Run in Console option from VIDE – the two ways I have been running the programs. Therefore, with the binary, there is now an included shortcut that starts the program in Translator input test mode.

The ibcp_0.1.4-src.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the program. I realized that the codes.txt files has not been updated for the recent code additions. Instead of creating that file again (I did it manually and it was a pain), the codes.awk awk script was created to generate this file automatically from the ibcp.h include file. Assuming that MSYS is installed, this script can be run using the command “awk −f codes.awk <ibcp.h >codes.txt”.

Now that simple expressions are being translated into RPN format, it's time to add the handling of parentheses...

Translator – Simple Expressions (Problem Solved)

I suspected the problem with the expression “A-B” had something to do with the fact the “-” could be a unary operator (the expression “A+B” worked fine). The first problem discovered was that the Parser was seeing this as the Auto command; “-” was valid for the start of a line number range, but the “B” was not valid, so an “unexpected error” was returned. However, the Translator test code did not check for an Error token and passed it to the Translator, which did not know what to do with it. Therefore, the test code was modified to output a Parser error and terminate the processing of the input line.

To correct the problem with this expression being treated as a invalid immediate command, this error was temporarily removed. Temporary because no valid BASIC line would have just “A-B” on it, but for now only expressions are being tested. Most of the Translator was handling the “-” as a binary operator correctly until it got to the add operator routine. There is a check that only pops one operand from the done stack for a unary operator and it was this check that was wrong.

To simplify the check for a unary operator (which should check if the Table entry code is the same as the unary code, not if the unary code is not set to Null), an is_unary_operator() function was added to the Table class that does this check. This should eliminate possible future dumb coding errors. Now it's time to package this up and release it...

Translator – Simple Expressions (Debugging)

After correcting some minor problem, I finally discovered the problem with the improper termination. I don't fully understand the mechanism, but I'm guessing the crash/lock-up occurred in the automatic destructor code of the Translator and/or List classes. In any case, the problem was due to the done stack not being empty.

When the last operator in the expression was processed (when the EOL token emptied the hold stack), it pushed it's output list element pointer on the done stack as it was suppose to. But nothing popped this off of the stack. Eventually, when the command of the line is processed, the done stack will be examined to checked for the arguments of the command. For now however, this last item needs to popped from the done stack. As a result of this correction, two more diagnostic errors needed to be added:

Stack Empty 3 – Occurs if the done stack is empty when popping the final result of the expression.

Stack Not Empty 2 – Occurs at end of line processing if the done stack is not empty after popping the final result.

With these corrections in place, additional test inputs were added expressions and discovered that the Translator could not handle a blank line. Therefore a check was added to the Initial state code to return Done if the first token contains an EOL operator (the special Null token is not added to the hold stack since processing is done). One more problem was found, while more complex expressions worked correctly, the simple expression “A-B” did not. More debugging required...

Saturday, March 20, 2010

Translator – Adding Token Errors

Several errors could occur when calling the add_token() function. Each of these errors need to be tested. These errors are:

Expected Operand – Occurs if the token contains a binary operator when the state of the Translator is Operand indicating an operand or a unary operator was expected.

Expected Operator – Occurs if the token contains an operand when the state of the Translator is Operator indicating an operator was expected.

Expected Binary Operator – Occurs if the token contains a unary operator that cannot be a binary operator when the state of the Translator is Operator indicating a binary operator was expected.

During development of the Translator, several other diagnostic errors can occur. These errors are the result of checks put into the code to verify that the Translator is working correctly, in other words, to report on situations that should not occur. These checks will be removed once the code is debugged and tested. These errors are:

Continued... »

Translator – Simple Expressions (Implemented)

Handling of simple expressions has been implemented and appears to be working. Not all the functionality or error handling has be tested yet. There is a bug in the code where the program is not terminating correctly (must hit Ctrl+C to abort the program; or crashes when using the debugger). I suspect there is a memory allocation error causing some sort of memory corruption. I wanted to release what what there is since these kind of problems take a while to figure out.

The Parser routines have been updated to set the length for non-string tokens – these are tokens for operators, commands and internal functions. I thought this would be helpful in highlighting the entire token when outputting errors.

Other changes include the Table class where the unary_code and precedence fields have been added with the associated initializer data. More access functions have been added to the Table class for the two new members and also for the name members (so that these can be output).

The translator test code has also been updated. Instead of printing all the tokens like was done when testing the parser, I decided to just output the RPN list as a list of tokens separated by spaces. The outputting of the full tokens will be added as a debug option.

The ibcp_0.1.4-dev-2.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the program (test stack files have been removed). The ibcp.exe program doesn't terminate properly when done testing the Translator, but appears to work. Not everything has been tested and it only includes three simple expressions.

Translator – Adding Operators

Adding operator tokens will be handled by the separate function add_operator(). Some extra processing must be performed in addition to adding the token to the output list and pushing the output list's element pointer onto the done stack like is performed for simple operands.

Eventually this function will verify that the data types of operand(s) of the operator are the correct data type. Also for the case of operators taking two operands of the double data type, if either of the operands is an integer, then the special hidden CvtDbl operator must be inserted into the output list after the operand. This is the purpose of having the done stack and why it contains pointers to the output list elements. The pointers will be used to insert tokens after an element when necessary.

For now, the operands will simply be popped off of the done stack. For unary operators, only one operand is popped off of the done stack. This function will return the status of adding the operator, which will be an error or Good upon success.

Translator – Adding Tokens

Adding tokens to the output RPN list for simple expressions consists of four major parts. But upon the first call, the master element for the output list needs to be allocated and the special Null token pushed onto the hold stack (to prevent popping tokens past the bottom of the stack). The state of the Translator is then changed from Initial to Operand.

When the state is Operand, for simple operand tokens (tokens that not operators and do not have a parenthesis), the token is appended to the end of the output list and the pointer to the output list's element is pushed onto the done stack. The state is changed to BinOp since a binary operator is expected next. A Good status is returned.

Also when the state is Operand, the token may contain a unary operator. The token is determined to contain a unary operator if the operator's table entry's unary_code is not the Null code. The token's index is replaced with the table index for the unary operator code.

When the state is Operator, the token must be an operator and must not be a unary operator. An operator could be only a unary operator, for example the NOT operator. The operator is determined to be only a unary operator if the operator's table entry has the same code and unary_code.

For all operators, all higher or the same precedence tokens on the hold stack are popped and added to the output list via another function. Once the precedence of the token on the top of the hold stack is less than the current operator, if the token is not the EOL (end of line) token, the operator token is pushed onto the hold stack. The state is set to Operand since an operand is expected next (for a unary operator, the state stays at Operand). A Good status is returned.

If the token is the EOL token, then end of line processing is performed. The hold stack has already been emptied because the EOL token is lower precedence than all the other operators except of course for the special Null token that was put on the bottom of the stack. The Null token is popped off the stack. For simple expressions, nothing else needs to be done so a Done status is returned.

Friday, March 19, 2010

Translator Class – Data Member Changes

I decided to rename the data member in the Translator to make more sense. The rpn_list will simply be named output. The op_stack will actually be holding more than just operators (as will be seen, functions and array will be placed on this stack also). Calling it an “operator” stack is not accurate. The operand_stack also will be holding more than strictly operands, though technically “operand” is correct (operands of an operator could be the result of another operator). The tokens placed in this stack will be “operands” to operators, but could be operands of operators.

I wanted to come up with better names (“op” is ambiguous and “operator” and “operand” are too similar). It would be a bonus if they had the same number of characters. Therefore, I came up this the name hold_stack for operators and other tokens on “hold” or pending to be placed in the output list. The other stack will named done_stack for tokens that are “done” being processed and have been placed into the output list. By the way, this parallels the result stack at run-time, which will hold values of operands and results of operators (to be used as operands for the next operator or command).

Translator – Operator Precedence

The precedence member needs to be added to the Table, which the Translator will use to rearrange the input to the output RPN list. After considering the operator precedence in several BASICs and in C, the operator precedence that will be used for this project is (from highest to lowest):

^ (exponentiation)
- (negation)
*, /    (multiplication, division)
\ (integer division)
MOD (modulus)
+, -    (addition, subtraction)
<, >, <=, >= (relational operators)
=, <> (equivalence operators)
NOT (logical not/1's compliment)
AND (logical/bit wise and)
XOR (logical/bit wise exclusive or)
OR    (logical/bit wise or)
IMP   (implication)
EQV (equivalence)

Continued... »

Interactive BASIC Compiler Project

Thursday, March 25, 2010

Translator – Parentheses (Debugging)

Translator – Parentheses (Implementation)

Wednesday, March 24, 2010

Translator – Parentheses (End of Dummy Tokens)

Tuesday, March 23, 2010

Translator – Parentheses (More on Dummy Tokens)

Translator – Parentheses (Dummy Tokens)

Monday, March 22, 2010

Translator – Parentheses (Recreation Issue)

Sunday, March 21, 2010

Translator – Parentheses

Translator – Simple Expressions (Release)

Translator – Simple Expressions (Problem Solved)

Translator – Simple Expressions (Debugging)

Saturday, March 20, 2010

Translator – Adding Token Errors

Translator – Simple Expressions (Implemented)

Translator – Adding Operators

Translator – Adding Tokens

Friday, March 19, 2010

Translator Class – Data Member Changes

Translator – Operator Precedence

Email

Source and Downloads

Labels

Blog Archive