Tuesday, March 23, 2010

Translator – Parentheses (More on Dummy Tokens)

Not all cases of adding dummy parentheses tokens occur when an operand is about to be added to the output list.  Next consider the expression:
A Op1 (B Op2 C)
This expression will be translated to A B C Op2 Op1. When this is recreated, the parentheses will only be added if the precedence of Op2 is lower than or the same as Op1 (in other words, when the parentheses are required). The dummy token is only required after Op2 when the precedence of Op2 is higher than Op1.

This decision, whether to add the dummy token, needs to be made before the Op1 operator is added to the list. At this point, Op1 is on the top of the hold stack and the EOL is being processed (the stack is about to be emptied). Therefore, in the code at the point that an operator is about to be added to the output list from the top of the hold stack, there needs to be the check if a closing parentheses was just processed and the precedence of the last operator added (Op2) during the processing of closing parentheses (handling the case that the parentheses is just around an operand) is higher than the operator on top of the hold stack (Op1) then add the closing parentheses token to the list.  Now consider the extension on the above expression:
A Op1 (B Op2 C) Op3 D
At the point of the decision, Op3 is being processed and Op1 is on top of the hold stack. Either Op1 will be popped off the stack and added to the list (Op1 is higher precedence than Op3), or Op3 will be pushed on the stack (Op3 is higher precedence than Op1). For the former case, the precedence check for the dummy token is made between Op2 (the last operator added during the parentheses processing) and Op1. For the later case, the precedence check is made between Op2 and Op3, which is on top of the stack at the EOL processing (Op1 would be popped and added last).

Remember that EOL is processed as a low precedence operator. Ditto for a closing parentheses, which is also processed as a low precedence operator. Therefore the dummy token check is made whenever an operator is going to be added to the output list.  Almost done, but a few more expressions need to be considered...

Translator – Parentheses (Dummy Tokens)

To determine when a dummy parentheses token should be added to the code so that the Recreator knows to include the unnecessary (but entered by the programmer) parentheses, consider the following generic expression:
(A Op1 B) Op2 C
This expression will be translated to A B Op1 C Op2. When this is recreated, the parentheses will only be added if the precedence of Op1 is lower than Op2 (in other words, when the parentheses are required). The dummy token is only required after Op1 when the precedence of Op1 is higher than or the same as Op2.

This decision, whether to add the dummy token, needs to be made before the C operand is added to the list. At this point, Op2 is on the top of the hold stack. Therefore, in the code at the point that an operand is about to be added to the output list, there needs to the check if a closing parentheses was just processed and the precedence of the last operator added (Op2) is higher than or the same as the operator on top of the hold stack (Op1) then add the closing parentheses token to the list.  Now consider a variation of the above expression:
A Op1 (B) Op2 C
This is similar to the above except except at the decision, the last operator added is outside the parentheses (Op1). The test condition needs to be modified to say the last operator added during the processing of the closing parentheses. When there is no last operator, then the parentheses always needs to be added.

This takes care of the processing necessary when adding an operator to the output list and a closing parentheses was just processed. However, there are other expressions with unnecessary parentheses to consider...

Monday, March 22, 2010

Translator – Parentheses (Recreation Issue)

When expressions with parentheses are translated to RPN, parentheses get removed as they are not necessary. Instead, RPN is dependent on the order of the expression. This is why RPN calculators have no parentheses keys like regular calculators. Take the expressions (A*B)+C and A*B+C – both would be translated to A B * C + (in other words, the parentheses are unnecessary in the original expression).

One of my pet peeves with programmers are the addition of unnecessary parentheses into expressions when they are not needed (my opinion: learn and use the language properly). It would be easy to just let the parentheses disappear as occurs automatically when an expression is translated to RPN, as would happen when the expression above is recreated since the expression is equivalent without parentheses. Unlike, for the expression (A+B)*C, that would be translated to A B + C *; the parentheses would be added back when the expression is recreated, because the parentheses are needed or the meaning of the expression would be changed.

However, it would unexpected to a programmer for the parentheses to just disappear. This is the reason that the original strings of numeric constants are being kept in tokens. It would not be nice for a number to be entered as 0.000001, but show up as 1e-6 in the program. To keep these unnecessary parentheses in the code so that they will be in the recreated expression, something needs to be added to the internal code. However, it would just waste memory and run-time to just throw in parentheses everywhere they are entered. So for these unnecessary cases, a dummy token will be added to the RPN list and since the closing parentheses token is present, this token will be the token added. During run-time, the closing parentheses code will be a no-operation code.

Note that no-operation does not mean no execution time, some processing will be required to skip over the dummy code; which is why dummy parentheses tokens won't be put into the code everywhere there are parentheses in the original expressions.  Later an option will be added to allow the user the option of having parentheses removed automatically. This will give the user the choice. Now the trick is to decide when to add one of these dummy tokens and when not to...

Sunday, March 21, 2010

Translator – Parentheses

Parentheses are used to override the precedence of operators in expressions. For the Translator, parentheses will be handled as operators themselves.

The opening parentheses will be considered a unary operator as it's placement in expressions is the same as for other unary operators – expected when an operand or optional unary operator is expected. The precedence of the opening parentheses will very low, above the Null operator, but below the EOL operator. Above the Null operator because the Null operator should never be removed from the hold stack until the very end. However, below the EOL operator because the opening parentheses token should not be removed from the hold stack at the end of the line – this would be a “missing closing parentheses” error.

The closing parentheses will be considered an ordinary binary operator as it's placement in expressions is the same as for other binary operators, in other words, after operands where a binary operator is expected. The precedence of the closing parentheses will be the same as the EOL operator. This will cause the hold stack to be emptied down to either the Null token at the bottom of the stack or an opening parentheses token.

A closing parentheses operator will require a little extra processing than other operators, similar to the EOL operator. Also, the closing parentheses token will not actually be pushed onto the hold stack. After the hold stack is emptied, if the Null token is on top of hold stack, then a “missing opening parentheses” error would occur. Otherwise, there must be an opening parentheses token on the top of the hold stack which would be popped off of the stack. Nothing needs to be put into the output RPN list...

Translator – Simple Expressions (Release)

The three possible errors were also validated (and the diagnostic errors did not occur as they were no suppose to). Finally, one of the test expressions is a string expression A$ = "this" + "test", which is correctly translated as A$ "this" "test" + =. However, technically this is not being translated as an assignment statement. Unlike C, in BASIC the “=” operator could be either assignment or equality depending on where it is found. Assignment statements will be handled later.

I discovered that in input mode, the up and down arrows can be used to recall previously entered lines in a session. This must also be built into the gets() standard library function. I also discovered that the ibcp.exe program does not work from a DOS window (Windows XP), but does run from Windows Explorer. And of course runs fine from an MSYS command window or with the Run in Console option from VIDE – the two ways I have been running the programs. Therefore, with the binary, there is now an included shortcut that starts the program in Translator input test mode.

The ibcp_0.1.4-src.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the program. I realized that the codes.txt files has not been updated for the recent code additions. Instead of creating that file again (I did it manually and it was a pain), the codes.awk awk script was created to generate this file automatically from the ibcp.h include file. Assuming that MSYS is installed, this script can be run using the command “awk −f codes.awk <ibcp.h >codes.txt”.

Now that simple expressions are being translated into RPN format, it's time to add the handling of parentheses...

Translator – Simple Expressions (Problem Solved)

I suspected the problem with the expression “A-B” had something to do with the fact the “-” could be a unary operator (the expression “A+B” worked fine). The first problem discovered was that the Parser was seeing this as the Auto command; “-” was valid for the start of a line number range, but the “B” was not valid, so an “unexpected error” was returned. However, the Translator test code did not check for an Error token and passed it to the Translator, which did not know what to do with it. Therefore, the test code was modified to output a Parser error and terminate the processing of the input line.

To correct the problem with this expression being treated as a invalid immediate command, this error was temporarily removed. Temporary because no valid BASIC line would have just “A-B” on it, but for now only expressions are being tested. Most of the Translator was handling the “-” as a binary operator correctly until it got to the add operator routine. There is a check that only pops one operand from the done stack for a unary operator and it was this check that was wrong.

To simplify the check for a unary operator (which should check if the Table entry code is the same as the unary code, not if the unary code is not set to Null), an is_unary_operator() function was added to the Table class that does this check. This should eliminate possible future dumb coding errors. Now it's time to package this up and release it...

Translator – Simple Expressions (Debugging)

After correcting some minor problem, I finally discovered the problem with the improper termination. I don't fully understand the mechanism, but I'm guessing the crash/lock-up occurred in the automatic destructor code of the Translator and/or List classes. In any case, the problem was due to the done stack not being empty.

When the last operator in the expression was processed (when the EOL token emptied the hold stack), it pushed it's output list element pointer on the done stack as it was suppose to. But nothing popped this off of the stack. Eventually, when the command of the line is processed, the done stack will be examined to checked for the arguments of the command. For now however, this last item needs to popped from the done stack. As a result of this correction, two more diagnostic errors needed to be added:
Stack Empty 3 – Occurs if the done stack is empty when popping the final result of the expression.
Stack Not Empty 2 – Occurs at end of line processing if the done stack is not empty after popping the final result.
With these corrections in place, additional test inputs were added expressions and discovered that the Translator could not handle a blank line. Therefore a check was added to the Initial state code to return Done if the first token contains an EOL operator (the special Null token is not added to the hold stack since processing is done).  One more problem was found, while more complex expressions worked correctly, the simple expression “A-B” did not.  More debugging required...

Saturday, March 20, 2010

Translator – Adding Token Errors

Several errors could occur when calling the add_token() function. Each of these errors need to be tested. These errors are:
Expected Operand – Occurs if the token contains a binary operator when the state of the Translator is Operand indicating an operand or a unary operator was expected.
Expected Operator – Occurs if the token contains an operand when the state of the Translator is Operator indicating an operator was expected.
Expected Binary Operator – Occurs if the token contains a unary operator that cannot be a binary operator when the state of the Translator is Operator indicating a binary operator was expected.
During development of the Translator, several other diagnostic errors can occur. These errors are the result of checks put into the code to verify that the Translator is working correctly, in other words, to report on situations that should not occur. These checks will be removed once the code is debugged and tested. These errors are:

Translator – Simple Expressions (Implemented)

Handling of simple expressions has been implemented and appears to be working. Not all the functionality or error handling has be tested yet. There is a bug in the code where the program is not terminating correctly (must hit Ctrl+C to abort the program; or crashes when using the debugger). I suspect there is a memory allocation error causing some sort of memory corruption. I wanted to release what what there is since these kind of problems take a while to figure out.

The Parser routines have been updated to set the length for non-string tokens – these are tokens for operators, commands and internal functions. I thought this would be helpful in highlighting the entire token when outputting errors.

Other changes include the Table class where the unary_code and precedence fields have been added with the associated initializer data. More access functions have been added to the Table class for the two new members and also for the name members (so that these can be output).

The translator test code has also been updated. Instead of printing all the tokens like was done when testing the parser, I decided to just output the RPN list as a list of tokens separated by spaces. The outputting of the full tokens will be added as a debug option.

The ibcp_0.1.4-dev-2.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the program (test stack files have been removed). The ibcp.exe program doesn't terminate properly when done testing the Translator, but appears to work. Not everything has been tested and it only includes three simple expressions.

Translator – Adding Operators

Adding operator tokens will be handled by the separate function add_operator(). Some extra processing must be performed in addition to adding the token to the output list and pushing the output list's element pointer onto the done stack like is performed for simple operands.

Eventually this function will verify that the data types of operand(s) of the operator are the correct data type. Also for the case of operators taking two operands of the double data type, if either of the operands is an integer, then the special hidden CvtDbl operator must be inserted into the output list after the operand. This is the purpose of having the done stack and why it contains pointers to the output list elements. The pointers will be used to insert tokens after an element when necessary.

For now, the operands will simply be popped off of the done stack. For unary operators, only one operand is popped off of the done stack. This function will return the status of adding the operator, which will be an error or Good upon success.

Translator – Adding Tokens

Adding tokens to the output RPN list for simple expressions consists of four major parts. But upon the first call, the master element for the output list needs to be allocated and the special Null token pushed onto the hold stack (to prevent popping tokens past the bottom of the stack). The state of the Translator is then changed from Initial to Operand.

When the state is Operand, for simple operand tokens (tokens that not operators and do not have a parenthesis), the token is appended to the end of the output list and the pointer to the output list's element is pushed onto the done stack. The state is changed to BinOp since a binary operator is expected next. A Good status is returned.

Also when the state is Operand, the token may contain a unary operator. The token is determined to contain a unary operator if the operator's table entry's unary_code is not the Null code. The token's index is replaced with the table index for the unary operator code.

When the state is Operator, the token must be an operator and must not be a unary operator. An operator could be only a unary operator, for example the NOT operator. The operator is determined to be only a unary operator if the operator's table entry has the same code and unary_code.

For all operators, all higher or the same precedence tokens on the hold stack are popped and added to the output list via another function. Once the precedence of the token on the top of the hold stack is less than the current operator, if the token is not the EOL (end of line) token, the operator token is pushed onto the hold stack. The state is set to Operand since an operand is expected next (for a unary operator, the state stays at Operand). A Good status is returned.

If the token is the EOL token, then end of line processing is performed. The hold stack has already been emptied because the EOL token is lower precedence than all the other operators except of course for the special Null token that was put on the bottom of the stack. The Null token is popped off the stack. For simple expressions, nothing else needs to be done so a Done status is returned.

Friday, March 19, 2010

Translator Class – Data Member Changes

I decided to rename the data member in the Translator to make more sense. The rpn_list will simply be named output. The op_stack will actually be holding more than just operators (as will be seen, functions and array will be placed on this stack also). Calling it an “operator” stack is not accurate. The operand_stack also will be holding more than strictly operands, though technically “operand” is correct (operands of an operator could be the result of another operator). The tokens placed in this stack will be “operands” to operators, but could be operands of operators.

I wanted to come up with better names (“op” is ambiguous and “operator” and “operand” are too similar). It would be a bonus if they had the same number of characters. Therefore, I came up this the name hold_stack for operators and other tokens on “hold” or pending to be placed in the output list. The other stack will named done_stack for tokens that are “done” being processed and have been placed into the output list.  By the way, this parallels the result stack at run-time, which will hold values of operands and results of operators (to be used as operands for the next operator or command).

Translator – Operator Precedence

The precedence member needs to be added to the Table, which the Translator will use to rearrange the input to the output RPN list. After considering the operator precedence in several BASICs and in C, the operator precedence that will be used for this project is (from highest to lowest):
^             (exponentiation)
-             (negation)
*, /          (multiplication, division)
\             (integer division)
MOD           (modulus)
+, -          (addition, subtraction)
<, >, <=, >=  (relational operators)
=, <>         (equivalence operators)
NOT           (logical not/1's compliment)
AND           (logical/bit wise and)
XOR           (logical/bit wise exclusive or)
OR            (logical/bit wise or)
IMP           (implication)
EQV           (equivalence)

Thursday, March 18, 2010

Translator – Developmental Release (Compile Test)

There's a lot of different syntax being used that I never used before and I wanted to make sure it compiles before continuing. The Translator also uses lists of pointers, something that the test_stack program didn't actually test. With these lists, the items (in this case Tokens) are allocated elsewhere (the Parser), used by the Translator, and will eventually be deallocated by the Encoder.

I want to verify that list of pointers worked, so the test_stack was modified to test these kinds of lists, where the pointer values of the structures are output to verify everything was working correctly. Fortunately these pointer lists worked as expected, so no changes to the List class were necessary.

The Translator class was implemented in the main include file and the basic structure of the Translator's functions were implemented in a new source file. The add_token() doesn't actually process the tokens yet. Test Translator routines were added to the test_ibcp.cpp file. The goal was to make sure the basic structure would compile.  Trying to run the Translator with input mode will crash the program.

The ibcp_0.1.4-dev-1.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the updated test_stack program (now moved to a test subdirectory). The ibcp.exe program doesn't do anything more than the last version except will print translator command line options, so it was not included in the binary .zip file.

Translator – Data Members

The Translator will have these data members:

    Table *table                                  – pointer to Table object
    List<Token *> *rpn_list                       – translated tokens list
    List<Token *> op_stack                        – pending operators stack
    List<List<Token *>::Element *> operand_stack  – processed operands stack

The members table, op_stack and operand_stack members will be initialized by argument or automatically by the constructor.  Upon calling the start() function, the rpn_list will be initialized (master item allocated).

Simple expressions have the general form of operand bin-op operand bin-op operand. Also each operand could be a lone operand or the form unary-op operand (where there may be more than one unary operator). The Translator will be either expecting an operand or unary operator(s) in front of the operand or a binary operator. Therefore, there will be a data member indicating what state the Translator is currently in along with it's enumeration definition:

    enum State:
        Initial   nothing processed yet
        BinOp    – expecting a binary operator
        Operand  – expecting a unary operator or operand
    State state  – current state of the translator

There may be states as more of the Translator is implemented. The state will be set to Initial by the start() function. Upon the first call to add_token(), the dummy Null token will be pushed to the op_stack and the state will be set to Operand to expect the first unary operator or operand.

Using The Translator

The Translator needs to be instanced, where there will be table argument pointer to the Table object, which will be needed by the Translator just like for the Parser.  The constructor will initialize the Translator's local Table point and will set it's RPN list pointer to a NULL. The input line is received from the user will be a C zero-terminated character array. This array needs to be initially fed to the Translator:

    Translator translator(table);
    char *input;
    . . .
    translator.start(input);

The start function will initialize internal variables and lists in preparation for translating the line into RPN format.  As Tokens are obtained from the parser, they will be fed to the Translator:

    Token *token;
    Translator::Status status;
    . . .
    do
    {
        token = parser.get_token();
        status = translator.add_token();
    }
    while (status == Translator::Good);

When the add_token() returns either a Done status or an error status, the loop will terminate.  If the status is Done, then the RPN list can be obtained from the Translator:

    List<Token *> *rpn_list = translator.get_result();

If the status is an error, the error will be reported. The token pointer will contain the part of the input line that contains the error (including it's starting column so the user can be informed where the error is).