Sunday, March 28, 2010

Translator – Arrays and Functions (Design Issues)

The first issue is that the number of operands counter cannot be a simple variable because operands of arrays and functions can be other arrays and functions, so there needs to be a counter variable for each nested array or function. Therefore a counter stack is needed for multiple nested levels of arrays or functions. Using the existing List class for this simple counter stack would be overkill, so a new simple stack class will be developed with a similar interface (so it would be easy to switch from one to the other for stacks).

The second issue is when an array or function token is pushed on the stack. The Paren and DefFuncP token types do not have a table index and therefore a precedence can't be obtained from a table entry. To resolve this, a new precedence() function is needed in the Token class, which will work the same as the is_operator() and has_paren() functions using a static precedence array. A new precedence() function will be added to the Table class with a Token pointer argument that will use the new Token precedence() function.

The values in the static precedence array within the Token class will be initialized to -1 if the token type has an associated table entry, otherwise the value is set to the desired precedence. For the Command, Operator and IntFuncP token types that have table entries, the token precedence will be set to -1 and for the DefFuncP and Paren token types, the precedence will be set to 2 (the same as the open parentheses precedence). The other token types won't be pushed to the hold stack so no value is needed at the moment.

Saturday, March 27, 2010

Translator – Commas

The commas that separate the subscripts or arguments will be processed as operators. The precedence of the comma operator will be the same as the closing parentheses, which will empty the hold stack up to but not including the array or function token. The comma will be processed as a token requiring special processing, where the comma will simply be counted and the comma token will not be pushed onto the hold stack.

This will require a number of operands counter. This counter is initialized to 0. When a parentheses token is pushed onto the hold stack, this variable is set to 1 indicating there is an array or function token on the stack. Each comma operator processed increments this counter. If no commas tokens are seen, the counter remains set to 1. If the counter is 0 when a comma token is seen, then an “unexpected comma” error will be reported.

So now, upon reaching the closing parentheses, if the counter is 0, then an open parentheses token is expected to be on top of the hold stack as previously implemented. If the counter is 1 or more, then the counter contains the number of operands added to the output list and an array or function argument is expected to be on top of the hold stack. The array or function token is then popped off of the hold stack and added to the output list.

Translator – Arrays and Functions

As mentioned, the Translator will not know the number or subscripts that arrays should have.  It will also not know (except for internal functions) the number of arguments that functions should have. Worse, the Translator will not even know if an identifier with a parentheses refers to an array or a function. (The Translator will be able to identify a define function since the Parser has already identified these by the “FN” with their own token type.)

As it turns out, translation wise, arrays and functions can be are handled identically – one has subscripts and the other has arguments. But these can also be thought as operands just like for operators except where operators have one (unary) or two (binary) operands, arrays and functions can have one or more operands. Functions with no arguments (including the IntFuncN and DefFuncN token types) are already being handled as simple operands – the Translator didn't need to distinguish between variables and functions with no arguments.

To translate arrays and functions, a token that has a parentheses will be pushed onto the hold stack. The precedence of these tokens will the same as the open parentheses (between Null and close parentheses for the same reason) to keep them on the hold stack. Upon reaching the matching closing parentheses token, the number of operands will be counted (and validated for an internal function). The array or functions token will then be popped off of the hold stack and added to the output list. But exactly how will the operands be counted...

Translator – Functions

Now that the translation arrays have been defined, it's time to define the translation of functions. Functions (whether Internal, one-line Defined or User) have arguments separated by commas. Take the example MID$(A$+B$,INT(A+0.5),I+5) containing two internal function calls where the tokens will be parsed as:
MID$( A$ + B$ , INT( A + 0.5 ) , I + 5 )
Again there are tokens for commas and closing parentheses. This expression will be translated to RPN as:
A$ B$ + A 0.5 INT( I 5 + MID$(
The functions are found after their associated arguments. Functions will pop their already evaluated arguments off of the run-time stack (which will be in reverse order), perform there operations, and then push their result back onto the run-time for the next part of the expression.

Though this example contains only internal functions, one-line defined functions and user functions work the same way. The big difference is that Translator (using Table) will know the number of arguments that internal functions have and can perform error checking for the number of arguments. However, this can't be done for one-line defined (FN) functions and user functions. This job will be left for the Encoder (which will access the Dictionary for this information).

Translator – Arrays

Arrays have subscripts as in A(I) and B(3,X). For now there will be no limitation on the number of subscripts that an array can have (there will be a limitation once the internal language is defined, but this limitation will be sufficiently large). Take the example A(X+5,Y*2,Z) for a three dimension array element where the tokens will be parsed as:
A( X + 5 , Y * 2, Z )
This includes tokens for the commas separating the subscripts and the closing parentheses. Also note that in this example, the subscripts can be expressions. This array element expression will be translated to RPN as:
X 5 + Y 2 * Z A(
Note that the subscript expressions are first with the array at the end. When this array element is processed at run-time, the expressions are processed as discussed before. The run-time stack will contain the values X+5, Y*2 and Z (Z on top of the stack) when execution reaches the array name.

The instruction code for the array will contain the number of subscripts, so when the array element offset is evaluated, it will know that there are 3 subscripts on the stack (in reverse order) that need to be popped. Once the offset is calculated, the value of the array element will be obtained from memory and pushed back onto the stack. This value will then be the operand for the rest of expression the array element is in.

There is a problem with arrays during translation, and that is that the Translator will not know the actual number of subscripts that a particular array will have. Therefore, the subscript checking will be delayed for the Encode, which will use the Dictionary to obtain this information. Remember that the purpose of the Translator is to rearrange the incoming program code into RPN format, doing as much error checking as possible.

Friday, March 26, 2010

Translator – Parentheses (Release)

The problem with some parentheses expressions was that the last precedence was set at the open parentheses token, but this caused a problem when another open parentheses is processed before the next operand because the last precedence would be reset to the highest precedence before the last last precedence value could be used.  The solution was to set the last precedence value at a close parentheses token just before the stack is emptied up to the open parentheses.

Also discovered that the code needed to check if there was pending parentheses token at an open parentheses and a closing parentheses in addition to when added an operand or operator to the output list. These additional locations were discovered when more test expressions were added.  Since the code for checking and adding a dummy token was needed in four locations, this code was replaced with a function call to the new function do_pending_paren() where the code was located. This function does last precedence is higher than current operator check and the same precedence only when the state is Operand (for operands and open parentheses). Since this check is also done for the EOL operator, the function call was added there also and because the top of the stack contain a Null operator (lowest precedence), the dummy token is always added.

One other problem wes found. At the EOL, the token popped off the stack (which should be the Null token at the bottom of the stack, but could be something else if there is an error), was not deleted if the token wasn't the Null token.

The code now appears to be handling parentheses correctly and ibcp_0.1.5-src.zip has been uploaded at Sourceforge IBCP Project along with the binary for the program. Next array and function handling needs to be added to the Translator...

Thursday, March 25, 2010

Translator – Parentheses (Debugging)

A minor issue was discovered during debugging. During the initial detection for unary operators (in the operator section) when the last precedence variable is set to the highest precedence value, the opening parentheses token needed to be added directly to the hold stack, the state set to Operand and Good status returned. I initially last the code fall through to let the operator code process the opening parentheses unary operator, but this code malfunctioned for this token (the hold stack was emptied because of the low precedence open parentheses).

In the clean up function, which cleans up the Translator after an error occurs (empties the stacks), code needed to be added to also delete the pending parentheses token if it had been set. A bunch of test expressions were added with all various parentheses expression combinations previously discussed plus some more complex ones. Two expressions were also added to test each of the missing parentheses errors.

However, during testing it was found that when the expression (A+B)*(C+D) is translated, it incorrectly contains a dummy token: A B + ) C D + *. The problem is that at the point that it is processing the C operand, it sees that there is a parentheses pending. But, the parentheses before the C set the last precedence to the highest precedence, so it doesn't matter what operator is on the stack (in this case the *), it adds the dummy token.

Since most of the code is working, the ibcp_0.1.5-dev-1.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the program. I had to remove the windows shortcut from the binary because it only works on my system. The shortcut will need to be created on each target system. (Simple right-click on the ibcp.exe and select Create Shortcut, then Edit Properties of the shortcut and add “ -t i” to the Target field.)

Translator – Parentheses (Implementation)

While the previous several posts seem to suggest that the implementation of parentheses will be rather involved, this is not the case.  First the unary code field of the opening parentheses needs to be set (to the opening parentheses code) so that it will be processed correctly.

The precedence of the parentheses operators need to be set in the Table. The precedence of Null is zero and the precedence of EOL was set to 4. The precedence of the closing parentheses needs to the same as EOL – both perform the same hold stack emptying functionality. The precedence of the opening parentheses needs to be between Null and EOL/opening parentheses and therefore will be set to 2 so that neither EOL or an closing parentheses will remove it automatically during the emptying of the hold stack (of higher precedence).

The difference between processing a closing parentheses and an EOL is that a closing parentheses expects to find an opening parentheses on top of the hold stack after it is emptied and an EOL expects the null token at the bottom of the hold stack. If these conditions are not the case, the  appropriate missing parentheses error occurs.

Wednesday, March 24, 2010

Translator – Parentheses (End of Dummy Tokens)

There's a few cases of adding dummy parentheses tokens that need to be considered to make sure that the checks for before adding an operand and before adding an operator are handled properly. Consider the expression:
(A Op1 B Op2 C)
Again the decision point occurs when processing the EOL (as an operator). If this case however, there is no operator on top of the hold stack, just the Null token at the bottom of the stack. Fortunately the Null token has a precedence (lowest), so in this case, the precedence of the last operator will always be higher than the Null token, so the dummy token will always be added.

Similarly if the expression only contains (A). The Null token is lowest precedence but there is no last operator. Again the check covers this case with the if there is no last operator, the dummy token is always added. Finally, expressions with double (or more) parentheses need to be handled, take the expressions:
((A Op1 B)) Op2 C
A Op1 ((B)) Op2 C
When the second closing parentheses is being processed (an operator), if there is already a closing parentheses token pending and another closing parentheses is being processed (and no operator was on the hold stack that would have triggered the add dummy token check), then the pending closing parentheses token (the first one) needs to be added directly to the output list. The pending closing parentheses token is then set to the second closing parentheses.

The next token, Op2, will then be pushed onto the hold stack. At the C operand token, since there is still a closing parentheses pending (the second one), this will trigger the add dummy token check, where the last operator added (Op1) will be checked against the top of the hold stack (Op2) and the dummy token will only be added if the precedence of Op1 is higher than or same as Op2. (There will be one or two dummy tokens in the translation depending on the precedence of the operators.)

For the second double parentheses expression, the first dummy token will automatically be added to the output list. The second dummy token will also be added to the output list because there is no last operator, therefore the dummy token is always added.

This should handle all cases for adding dummy tokens for unnecessary parentheses. Now how all this is going to be implemented...

Tuesday, March 23, 2010

Translator – Parentheses (More on Dummy Tokens)

Not all cases of adding dummy parentheses tokens occur when an operand is about to be added to the output list.  Next consider the expression:
A Op1 (B Op2 C)
This expression will be translated to A B C Op2 Op1. When this is recreated, the parentheses will only be added if the precedence of Op2 is lower than or the same as Op1 (in other words, when the parentheses are required). The dummy token is only required after Op2 when the precedence of Op2 is higher than Op1.

This decision, whether to add the dummy token, needs to be made before the Op1 operator is added to the list. At this point, Op1 is on the top of the hold stack and the EOL is being processed (the stack is about to be emptied). Therefore, in the code at the point that an operator is about to be added to the output list from the top of the hold stack, there needs to be the check if a closing parentheses was just processed and the precedence of the last operator added (Op2) during the processing of closing parentheses (handling the case that the parentheses is just around an operand) is higher than the operator on top of the hold stack (Op1) then add the closing parentheses token to the list.  Now consider the extension on the above expression:
A Op1 (B Op2 C) Op3 D
At the point of the decision, Op3 is being processed and Op1 is on top of the hold stack. Either Op1 will be popped off the stack and added to the list (Op1 is higher precedence than Op3), or Op3 will be pushed on the stack (Op3 is higher precedence than Op1). For the former case, the precedence check for the dummy token is made between Op2 (the last operator added during the parentheses processing) and Op1. For the later case, the precedence check is made between Op2 and Op3, which is on top of the stack at the EOL processing (Op1 would be popped and added last).

Remember that EOL is processed as a low precedence operator. Ditto for a closing parentheses, which is also processed as a low precedence operator. Therefore the dummy token check is made whenever an operator is going to be added to the output list.  Almost done, but a few more expressions need to be considered...

Translator – Parentheses (Dummy Tokens)

To determine when a dummy parentheses token should be added to the code so that the Recreator knows to include the unnecessary (but entered by the programmer) parentheses, consider the following generic expression:
(A Op1 B) Op2 C
This expression will be translated to A B Op1 C Op2. When this is recreated, the parentheses will only be added if the precedence of Op1 is lower than Op2 (in other words, when the parentheses are required). The dummy token is only required after Op1 when the precedence of Op1 is higher than or the same as Op2.

This decision, whether to add the dummy token, needs to be made before the C operand is added to the list. At this point, Op2 is on the top of the hold stack. Therefore, in the code at the point that an operand is about to be added to the output list, there needs to the check if a closing parentheses was just processed and the precedence of the last operator added (Op2) is higher than or the same as the operator on top of the hold stack (Op1) then add the closing parentheses token to the list.  Now consider a variation of the above expression:
A Op1 (B) Op2 C
This is similar to the above except except at the decision, the last operator added is outside the parentheses (Op1). The test condition needs to be modified to say the last operator added during the processing of the closing parentheses. When there is no last operator, then the parentheses always needs to be added.

This takes care of the processing necessary when adding an operator to the output list and a closing parentheses was just processed. However, there are other expressions with unnecessary parentheses to consider...

Monday, March 22, 2010

Translator – Parentheses (Recreation Issue)

When expressions with parentheses are translated to RPN, parentheses get removed as they are not necessary. Instead, RPN is dependent on the order of the expression. This is why RPN calculators have no parentheses keys like regular calculators. Take the expressions (A*B)+C and A*B+C – both would be translated to A B * C + (in other words, the parentheses are unnecessary in the original expression).

One of my pet peeves with programmers are the addition of unnecessary parentheses into expressions when they are not needed (my opinion: learn and use the language properly). It would be easy to just let the parentheses disappear as occurs automatically when an expression is translated to RPN, as would happen when the expression above is recreated since the expression is equivalent without parentheses. Unlike, for the expression (A+B)*C, that would be translated to A B + C *; the parentheses would be added back when the expression is recreated, because the parentheses are needed or the meaning of the expression would be changed.

However, it would unexpected to a programmer for the parentheses to just disappear. This is the reason that the original strings of numeric constants are being kept in tokens. It would not be nice for a number to be entered as 0.000001, but show up as 1e-6 in the program. To keep these unnecessary parentheses in the code so that they will be in the recreated expression, something needs to be added to the internal code. However, it would just waste memory and run-time to just throw in parentheses everywhere they are entered. So for these unnecessary cases, a dummy token will be added to the RPN list and since the closing parentheses token is present, this token will be the token added. During run-time, the closing parentheses code will be a no-operation code.

Note that no-operation does not mean no execution time, some processing will be required to skip over the dummy code; which is why dummy parentheses tokens won't be put into the code everywhere there are parentheses in the original expressions.  Later an option will be added to allow the user the option of having parentheses removed automatically. This will give the user the choice. Now the trick is to decide when to add one of these dummy tokens and when not to...

Sunday, March 21, 2010

Translator – Parentheses

Parentheses are used to override the precedence of operators in expressions. For the Translator, parentheses will be handled as operators themselves.

The opening parentheses will be considered a unary operator as it's placement in expressions is the same as for other unary operators – expected when an operand or optional unary operator is expected. The precedence of the opening parentheses will very low, above the Null operator, but below the EOL operator. Above the Null operator because the Null operator should never be removed from the hold stack until the very end. However, below the EOL operator because the opening parentheses token should not be removed from the hold stack at the end of the line – this would be a “missing closing parentheses” error.

The closing parentheses will be considered an ordinary binary operator as it's placement in expressions is the same as for other binary operators, in other words, after operands where a binary operator is expected. The precedence of the closing parentheses will be the same as the EOL operator. This will cause the hold stack to be emptied down to either the Null token at the bottom of the stack or an opening parentheses token.

A closing parentheses operator will require a little extra processing than other operators, similar to the EOL operator. Also, the closing parentheses token will not actually be pushed onto the hold stack. After the hold stack is emptied, if the Null token is on top of hold stack, then a “missing opening parentheses” error would occur. Otherwise, there must be an opening parentheses token on the top of the hold stack which would be popped off of the stack. Nothing needs to be put into the output RPN list...

Translator – Simple Expressions (Release)

The three possible errors were also validated (and the diagnostic errors did not occur as they were no suppose to). Finally, one of the test expressions is a string expression A$ = "this" + "test", which is correctly translated as A$ "this" "test" + =. However, technically this is not being translated as an assignment statement. Unlike C, in BASIC the “=” operator could be either assignment or equality depending on where it is found. Assignment statements will be handled later.

I discovered that in input mode, the up and down arrows can be used to recall previously entered lines in a session. This must also be built into the gets() standard library function. I also discovered that the ibcp.exe program does not work from a DOS window (Windows XP), but does run from Windows Explorer. And of course runs fine from an MSYS command window or with the Run in Console option from VIDE – the two ways I have been running the programs. Therefore, with the binary, there is now an included shortcut that starts the program in Translator input test mode.

The ibcp_0.1.4-src.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the program. I realized that the codes.txt files has not been updated for the recent code additions. Instead of creating that file again (I did it manually and it was a pain), the codes.awk awk script was created to generate this file automatically from the ibcp.h include file. Assuming that MSYS is installed, this script can be run using the command “awk −f codes.awk <ibcp.h >codes.txt”.

Now that simple expressions are being translated into RPN format, it's time to add the handling of parentheses...

Translator – Simple Expressions (Problem Solved)

I suspected the problem with the expression “A-B” had something to do with the fact the “-” could be a unary operator (the expression “A+B” worked fine). The first problem discovered was that the Parser was seeing this as the Auto command; “-” was valid for the start of a line number range, but the “B” was not valid, so an “unexpected error” was returned. However, the Translator test code did not check for an Error token and passed it to the Translator, which did not know what to do with it. Therefore, the test code was modified to output a Parser error and terminate the processing of the input line.

To correct the problem with this expression being treated as a invalid immediate command, this error was temporarily removed. Temporary because no valid BASIC line would have just “A-B” on it, but for now only expressions are being tested. Most of the Translator was handling the “-” as a binary operator correctly until it got to the add operator routine. There is a check that only pops one operand from the done stack for a unary operator and it was this check that was wrong.

To simplify the check for a unary operator (which should check if the Table entry code is the same as the unary code, not if the unary code is not set to Null), an is_unary_operator() function was added to the Table class that does this check. This should eliminate possible future dumb coding errors. Now it's time to package this up and release it...

Translator – Simple Expressions (Debugging)

After correcting some minor problem, I finally discovered the problem with the improper termination. I don't fully understand the mechanism, but I'm guessing the crash/lock-up occurred in the automatic destructor code of the Translator and/or List classes. In any case, the problem was due to the done stack not being empty.

When the last operator in the expression was processed (when the EOL token emptied the hold stack), it pushed it's output list element pointer on the done stack as it was suppose to. But nothing popped this off of the stack. Eventually, when the command of the line is processed, the done stack will be examined to checked for the arguments of the command. For now however, this last item needs to popped from the done stack. As a result of this correction, two more diagnostic errors needed to be added:
Stack Empty 3 – Occurs if the done stack is empty when popping the final result of the expression.
Stack Not Empty 2 – Occurs at end of line processing if the done stack is not empty after popping the final result.
With these corrections in place, additional test inputs were added expressions and discovered that the Translator could not handle a blank line. Therefore a check was added to the Initial state code to return Done if the first token contains an EOL operator (the special Null token is not added to the hold stack since processing is done).  One more problem was found, while more complex expressions worked correctly, the simple expression “A-B” did not.  More debugging required...

Saturday, March 20, 2010

Translator – Adding Token Errors

Several errors could occur when calling the add_token() function. Each of these errors need to be tested. These errors are:
Expected Operand – Occurs if the token contains a binary operator when the state of the Translator is Operand indicating an operand or a unary operator was expected.
Expected Operator – Occurs if the token contains an operand when the state of the Translator is Operator indicating an operator was expected.
Expected Binary Operator – Occurs if the token contains a unary operator that cannot be a binary operator when the state of the Translator is Operator indicating a binary operator was expected.
During development of the Translator, several other diagnostic errors can occur. These errors are the result of checks put into the code to verify that the Translator is working correctly, in other words, to report on situations that should not occur. These checks will be removed once the code is debugged and tested. These errors are:

Translator – Simple Expressions (Implemented)

Handling of simple expressions has been implemented and appears to be working. Not all the functionality or error handling has be tested yet. There is a bug in the code where the program is not terminating correctly (must hit Ctrl+C to abort the program; or crashes when using the debugger). I suspect there is a memory allocation error causing some sort of memory corruption. I wanted to release what what there is since these kind of problems take a while to figure out.

The Parser routines have been updated to set the length for non-string tokens – these are tokens for operators, commands and internal functions. I thought this would be helpful in highlighting the entire token when outputting errors.

Other changes include the Table class where the unary_code and precedence fields have been added with the associated initializer data. More access functions have been added to the Table class for the two new members and also for the name members (so that these can be output).

The translator test code has also been updated. Instead of printing all the tokens like was done when testing the parser, I decided to just output the RPN list as a list of tokens separated by spaces. The outputting of the full tokens will be added as a debug option.

The ibcp_0.1.4-dev-2.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the program (test stack files have been removed). The ibcp.exe program doesn't terminate properly when done testing the Translator, but appears to work. Not everything has been tested and it only includes three simple expressions.

Translator – Adding Operators

Adding operator tokens will be handled by the separate function add_operator(). Some extra processing must be performed in addition to adding the token to the output list and pushing the output list's element pointer onto the done stack like is performed for simple operands.

Eventually this function will verify that the data types of operand(s) of the operator are the correct data type. Also for the case of operators taking two operands of the double data type, if either of the operands is an integer, then the special hidden CvtDbl operator must be inserted into the output list after the operand. This is the purpose of having the done stack and why it contains pointers to the output list elements. The pointers will be used to insert tokens after an element when necessary.

For now, the operands will simply be popped off of the done stack. For unary operators, only one operand is popped off of the done stack. This function will return the status of adding the operator, which will be an error or Good upon success.

Translator – Adding Tokens

Adding tokens to the output RPN list for simple expressions consists of four major parts. But upon the first call, the master element for the output list needs to be allocated and the special Null token pushed onto the hold stack (to prevent popping tokens past the bottom of the stack). The state of the Translator is then changed from Initial to Operand.

When the state is Operand, for simple operand tokens (tokens that not operators and do not have a parenthesis), the token is appended to the end of the output list and the pointer to the output list's element is pushed onto the done stack. The state is changed to BinOp since a binary operator is expected next. A Good status is returned.

Also when the state is Operand, the token may contain a unary operator. The token is determined to contain a unary operator if the operator's table entry's unary_code is not the Null code. The token's index is replaced with the table index for the unary operator code.

When the state is Operator, the token must be an operator and must not be a unary operator. An operator could be only a unary operator, for example the NOT operator. The operator is determined to be only a unary operator if the operator's table entry has the same code and unary_code.

For all operators, all higher or the same precedence tokens on the hold stack are popped and added to the output list via another function. Once the precedence of the token on the top of the hold stack is less than the current operator, if the token is not the EOL (end of line) token, the operator token is pushed onto the hold stack. The state is set to Operand since an operand is expected next (for a unary operator, the state stays at Operand). A Good status is returned.

If the token is the EOL token, then end of line processing is performed. The hold stack has already been emptied because the EOL token is lower precedence than all the other operators except of course for the special Null token that was put on the bottom of the stack. The Null token is popped off the stack. For simple expressions, nothing else needs to be done so a Done status is returned.

Friday, March 19, 2010

Translator Class – Data Member Changes

I decided to rename the data member in the Translator to make more sense. The rpn_list will simply be named output. The op_stack will actually be holding more than just operators (as will be seen, functions and array will be placed on this stack also). Calling it an “operator” stack is not accurate. The operand_stack also will be holding more than strictly operands, though technically “operand” is correct (operands of an operator could be the result of another operator). The tokens placed in this stack will be “operands” to operators, but could be operands of operators.

I wanted to come up with better names (“op” is ambiguous and “operator” and “operand” are too similar). It would be a bonus if they had the same number of characters. Therefore, I came up this the name hold_stack for operators and other tokens on “hold” or pending to be placed in the output list. The other stack will named done_stack for tokens that are “done” being processed and have been placed into the output list.  By the way, this parallels the result stack at run-time, which will hold values of operands and results of operators (to be used as operands for the next operator or command).

Translator – Operator Precedence

The precedence member needs to be added to the Table, which the Translator will use to rearrange the input to the output RPN list. After considering the operator precedence in several BASICs and in C, the operator precedence that will be used for this project is (from highest to lowest):
^             (exponentiation)
-             (negation)
*, /          (multiplication, division)
\             (integer division)
MOD           (modulus)
+, -          (addition, subtraction)
<, >, <=, >=  (relational operators)
=, <>         (equivalence operators)
NOT           (logical not/1's compliment)
AND           (logical/bit wise and)
XOR           (logical/bit wise exclusive or)
OR            (logical/bit wise or)
IMP           (implication)
EQV           (equivalence)

Thursday, March 18, 2010

Translator – Developmental Release (Compile Test)

There's a lot of different syntax being used that I never used before and I wanted to make sure it compiles before continuing. The Translator also uses lists of pointers, something that the test_stack program didn't actually test. With these lists, the items (in this case Tokens) are allocated elsewhere (the Parser), used by the Translator, and will eventually be deallocated by the Encoder.

I want to verify that list of pointers worked, so the test_stack was modified to test these kinds of lists, where the pointer values of the structures are output to verify everything was working correctly. Fortunately these pointer lists worked as expected, so no changes to the List class were necessary.

The Translator class was implemented in the main include file and the basic structure of the Translator's functions were implemented in a new source file. The add_token() doesn't actually process the tokens yet. Test Translator routines were added to the test_ibcp.cpp file. The goal was to make sure the basic structure would compile.  Trying to run the Translator with input mode will crash the program.

The ibcp_0.1.4-dev-1.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the updated test_stack program (now moved to a test subdirectory). The ibcp.exe program doesn't do anything more than the last version except will print translator command line options, so it was not included in the binary .zip file.

Translator – Data Members

The Translator will have these data members:

    Table *table                                  – pointer to Table object
    List<Token *> *rpn_list                       – translated tokens list
    List<Token *> op_stack                        – pending operators stack
    List<List<Token *>::Element *> operand_stack  – processed operands stack

The members table, op_stack and operand_stack members will be initialized by argument or automatically by the constructor.  Upon calling the start() function, the rpn_list will be initialized (master item allocated).

Simple expressions have the general form of operand bin-op operand bin-op operand. Also each operand could be a lone operand or the form unary-op operand (where there may be more than one unary operator). The Translator will be either expecting an operand or unary operator(s) in front of the operand or a binary operator. Therefore, there will be a data member indicating what state the Translator is currently in along with it's enumeration definition:

    enum State:
        Initial   nothing processed yet
        BinOp    – expecting a binary operator
        Operand  – expecting a unary operator or operand
    State state  – current state of the translator

There may be states as more of the Translator is implemented. The state will be set to Initial by the start() function. Upon the first call to add_token(), the dummy Null token will be pushed to the op_stack and the state will be set to Operand to expect the first unary operator or operand.

Using The Translator

The Translator needs to be instanced, where there will be table argument pointer to the Table object, which will be needed by the Translator just like for the Parser.  The constructor will initialize the Translator's local Table point and will set it's RPN list pointer to a NULL. The input line is received from the user will be a C zero-terminated character array. This array needs to be initially fed to the Translator:

    Translator translator(table);
    char *input;
    . . .
    translator.start(input);

The start function will initialize internal variables and lists in preparation for translating the line into RPN format.  As Tokens are obtained from the parser, they will be fed to the Translator:

    Token *token;
    Translator::Status status;
    . . .
    do
    {
        token = parser.get_token();
        status = translator.add_token();
    }
    while (status == Translator::Good);

When the add_token() returns either a Done status or an error status, the loop will terminate.  If the status is Done, then the RPN list can be obtained from the Translator:

    List<Token *> *rpn_list = translator.get_result();

If the status is an error, the error will be reported. The token pointer will contain the part of the input line that contains the error (including it's starting column so the user can be informed where the error is).

Wednesday, March 17, 2010

Translator – Developmental Release

Another change is needed in the Token class - the “code” field needs to be changed to “index” because it is really an index of a Table entry and not the code enumeration. It was originally named code because it was going to be the code within the program in memory, which will make the program easy to execute (the index is still going to be used in the program memory). The table entry will have a function pointer to the routine that will execute the command or operator. If the code from the Code enumeration was used, then the run-time module would have to first index through the code-to-index conversion array to get the index of the table and then index to the table entries array.

Changes have be made to the Parser (special EOL  token and changing the Token member code to index), Table (added unary_code field and entries for Null, Neg, and EOL codes) and the test code (renamed test_parser.cpp to test_ibcp.cpp because it's going to contain all of the test code; separated code that prints token contents into its own function so it can be called from other function like from the test_translator() function to be written; modified to work with the new EOL token; and was setup to allow for additional test functions where the command line arguments have been changed in that now “-p” activates the Parser test code).

The ibcp_0.1.3-src.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the program. The code essentially does what it did before, but contains the recent changes mentioned. The reason for the release was to follow the spirit of open source code (release early, release often) and I wanted to verify that the changes so far compiled and worked. Also, I discovered the source for Release 0.1.2 was missing the test_parser.cpp source file.

Translator Class – Table/Parser Changes

The new unary operator Neg_Code for Sub_Code needs to be added, which will represent negation, along with it's corresponding table entry. To other two codes also need to be added, Null_Code and EOL_Code.

The Null_Code will be used for two things. One being a very low (lowest) precedence operator that will be pushed onto the operator stack before processing tokens. This will simplify the Translator when it is checking and popping off high precedence operators as it won't need to check if the stack is empty first. The second will be for the new unary_code field that needs to be added to the Table entries to indicate an operator's associated unary operator with Null_Code indicating that there is no unary operator.

The Parser needs to be modified to return a token with an EOL_Code at the end of the line instead of returning a NULL pointer. The reason for this changes is so that this EOL token can be passed into the Translator. The EOL will be treated as an operator and it's precedence will also be very low, but higher than the Null operator. The EOL operator will force the Translator to empty the operator stack of any operators except for the Null operator. Also by using the EOL operator, the Translator can make sure an end of line is allowed. In other words, if the Translator is expecting an operand (as after it has just seen an operator), but sees EOL (an operator), an error will be reported (“operand expected”).

Tuesday, March 16, 2010

Translator Class – Design

There is more that the Translator will handle like parentheses, arrays and functions, commas and semicolons, commands, data type handling and references. Before getting too involved, it's time to layout the Translator class and get basic functionality working with simple expressions and then build the rest upon it. First up is how the Translator class will be used, which will have these member functions:

    start()       - initializes the RPN list, stacks and other variables for a new line
    add_token()   - adds a token obtained from the Parser
    get_result()  - gets the RPN list after the last token has been added

The get_token() function will return an enumeration status code on the result of the adding the token with values of Good (token accepted), Done (last token has been added) and various return error codes. If an error occurs, the caller will use the information in the current token to output the error (like was done with error tokens from the Parser).

The get_result() function will be nothing more than an access function that will return the RPN list.

Sunday, March 14, 2010

Translator – Unary Operators

Up to now, the subject of operators have referred to binary operators or operators with two operands. Unary operators only have only one operand. Only two unary operators are currently planned, - (negate) and NOT. Unary operators are found in expressions preceding an operand. Examples are -A + B, A + -B, and -(A+B). Two unary operators appearing together will be allowed, for example A + --B, but the translator will make no effort to try and optimize this (it's up to the programmer to make an effort to not do stuff like this).

When the current token contains a unary operator, it will be pushed directly onto the operator stack, since there are no operators currently planned that have the highest precedence. With the highest precedence, they will come off the stack whenever a binary operator is processed.

There will be two negate operators, one for integers (NegInt) and one for doubles (NegDbl), so the unary – operator will not need to insert any conversion operator. However, the NOT operator will only work with integers, so if the operand is a double, a hidden double to integer conversion operator (CvtInt) needs to be added to the RPN list before adding the NOT operator.

Translator – Operand Stack

When processing operators, it is necessary check the data types of it's operands already in the RPN list and it may be necessary to insert hidden conversion operators into the RPN list. The location of the first operand may not be the second to the last item added to the RPN list. Take the example A% * B% + C * D; the + operator will be the last operator to be processed, at which point the RPN list will contain A% B% MulInt C D MulDbl. For the + operator, which needs to be the add double operator also needs a CvtDbl inserted after the * operator, which has been changed to an multiple integer operator.

The Translator needs to save the location of where the previous operands are located, for both checking data types and for inserting conversion operators. This will be accomplished with another stack named the operand stack. When an operand is added to the RPN list, it's the pointer to element containing the operand in the RPN list will also be pushed onto the operand stack. When an operator is processed, it will pop it's operands off of the operand stack and use them to check data types and insert necessary conversion operator. After the operator is added to the RPN list, it's element pointer into the RPN list is pushed onto the operand stack for a future operator.

The operand stack parallels the run-time stack except instead of holding results of operands/operators, it will hold pointers to the items the will create the results during run-time.

Looking back at the example, when the + operator is processed, on the operand stack, there will be a pointer to the MulInt operator and a pointer to the MulDbl operator. It will insert a CvtDbl after the MulInt operator,  add an AddDbl to the RPN list and push it's element pointer onto the operand stack. The resulting RPN list will contain  A% B% AddInt CvtDbl C D MulDbl AddDbl. When this expression is executed during run-time, no time wasting decisions or checks need to be made.

Saturday, March 13, 2010

Translator – Operators

As mentioned, operator tokens are pushed onto an operator stack after pulling higher precedence operators off of the operator stack and adding them to the RPN list. However, there is some handling that needs to be performed before an operator can be added to the RPN list.  For each operator, the Translator needs to check the data type of the operator's operand to be sure they are compatible with the operator and/or to add the proper operator dependent on the data types of the operands.

Take the + operator, which can have integer, double or string operands. If both operands are strings, then the concatenation operator needs to be added. For numeric values, it's a bit more involved. If both the operands are integers, then an add integer operator is added. If both the operand are doubles, then an add double operator is added. For an integer and a double, the result will be a double, but the operand that is an integer (which could be the first or the second operand) needs to be converted to a double before performing an add double operator.

The goal is for optimal execution, so having the add double operator check to see if the one of the operands is an integer that needs to be converted first during run time is not ideal. This check should be done before hand. If a conversion is needed, an internal integer to double conversion operator will be inserted into the code.  So, for the expression A% + B, the internal code will be A% CvtDbl B +. During run time, the CvtDbl will operate like a function or unary operator, when A% is pushed onto the run-time stack, the CvtDbl will pop it off the stack, convert it to double and push the double value back on the stack.

The Translator will insert these hidden operators into the RPN list after the appropriate operand as needed by the operator.  (This is why the List class was modified to have append and insert functions – see post List Class – Updated.) When the internal code is output by the Recreator, these hidden operators will not produce any output.

Table and Token Updates

Since the Translator needs to know if functions have parentheses (arguments) or not, the IntFunc value in the TokenType needs to be replaced with the two types IntFuncN (function with no parentheses) and IntFuncP (function with parentheses). Currently there is one function with no arguments (RND). The table entries for the internal functions were updated.

The Translator needs to determine two things about tokens, whether the token has a parentheses and whether the token is an operator. This will be accomplished with two Boolean member functions in the Token structure, is_operator() and has_paren().  To implement these, Token will have two static arrays paren[] and op[] dimension of the size of the TokenType enumeration. To initialize these arrays, Token will have a static initialize() function, which will initialize the appropriate values in the array. This function needs to be called during initialization.

New source file ibcp.cpp has been added to the project, which will hold the new Token static variables and function. This file also contains the main function for the project. To test these minor changes, the test_parser code was used. The main() function in test_parser.cpp was renamed test_parser() with a call to it from main() that is in the new ibcp.cpp. The test_parser routines were modified to output “Op” if the token is an operator (is_operator() returns true); “()” if the token has parentheses (has_paren() returns true); and “??” if both are true, which should never be the case.

There is a new project file ibcp.vpj, which will be the name of the program for now. When a test_translator routine is written, it will be called from ibcp.cpp like with the test_parser routine. The ibcp_0.1.2-src.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the program, which is now ibcp.exe, but works the same as the previous test_parser.exe except for the minor output changes for the new Token functions (the parserX.txt output files were also updated).

Friday, March 12, 2010

Translator – Operands

Operands are the objects that operators act upon. Basic operands consist of variables, constants and functions with no arguments. More complex operands consist of array elements, functions with arguments, and parenthetical expressions. Technically, the results of operators can be operands to other operators, but this will not be the definition of an operand for the Translator.

Only the basic operands will be added directly to the RPN list. Basic operands are any tokens that are not an operator and do not have a parentheses. I realized that this means that the Internal Function token type needs to be split into Internal Function with and with out parentheses (i.e. IntFuncP and IntFuncN) just like for the one-line define functions (DefFuncN and DefFuncP).

In other words, basic operands are items the don't require arguments. The Translator won't be able to tell the difference between functions with no arguments and variables – both of these will be of the NoParen token type. Therefore, the basic operands are NoParen, Constant, IntFuncN, DefFuncN, ImmCmd (will be the only token on a line), and Remark.

Thursday, March 11, 2010

Translator

The job of the Translator is to convert the order of the tokens parsed from an input line into reverse polish notation (RPN), which will make for optimal execution of the program. See the post Internal Language for an explanation.

Essentially, the Translator will obtain tokens from the Parser expecting either an operand or an operator token type. Operands will be added directly to the RPN list, and operators will be added on an operator stack. Before putting an operator token on the operator stack, any operators on the operator stack that have higher precedence than the current operator are pulled from the stack and added to the RPN list. After the higher precedence operators are pulled from the stack, the current operator token is pushed on the stack. At the end, all operators from the operator stack are pulled and added to the RPN list.

Parser Test Revisited

I was thinking that it would be nice if the test_parser program had a mode where it would ask for input lines from the user so that additional items could be tested without having to recompile the program. Therefore a “-i” option was added to test_parser. This will tell the test_parser program to ask for input lines instead of running the canned tests. Once entered, the line will be parsed into tokens. The program will continue asking for input lines and will end when a blank line is entered (control+C also works). The skip_whitespace() function in the Parser class was modified to accept tab characters in addition to space characters.

The standard library gets() function is used for line input, which supports backspace, and simple editing including moving the cursor with the arrow keys, insert mode can be toggled with the insert key, and characters can be deleted with the delete key. None of these characters end up in the input buffer. The ibcp_0.1.1-src.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the test program.  Now, really moving on to the Translator...

Wednesday, March 10, 2010

Parser Class – Operators (Testing/Release)

Additional tests lines were added into the fifth test input group for testing string constants. Everything was testing good until the one error condition was tested (invalid two character operator). This error was for the case where the character is not valid as a single character operator and the two characters at the position is not found in the table.

The first problem found was that the operators <, <= and <> had their multiple field set to OneChar instead of TwoChar. Then I realized that there were no two character operators where the first character isn't also a valid single character operator. I thought I could add one to test, for example &&, but then realized there would also have to be an & single character operator in the table with a token type set to Error for the code to work right. This would be silly, so the get_operator() code needed to be rewritten.

One rewritten, the “invalid two character” operator error no longer exists. First the current character is checked for a valid one character operator by searching the table. If valid then the token is setup for the operator. If the character can only be a one character operator, the token is returned. The single-quote remark operator is checked for before returning.

If the character can also be a two character operator, the table is searched again for the two characters. If not found then if the first character was a valid operator, the token previously setup is returned. Otherwise false is returned indicating there is not a valid operator at the current position. (In other words, let the next parser routine handle it, which for now there is none, so the get_token() returns an “unrecognized character” error.)  If the two characters are a valid operator, the token is setup for the operator and returned.

The temporary && operator was added to the table to test the two character operator where the first character is not a valid operator. Once tested, this test entry was removed. The ibcp_0.1.0-src.zip file has been uploaded at Sourceforge IBCP Project. The binary for the test program has also been uploaded. The Parser has been tested and is now complete. Next up is the Translator, now the fun begins...

Monday, March 8, 2010

Parser Class – Strings (Testing/Release)

Additional tests lines were added into he fourth test input group for testing string constants. There's not much to strings, but the one issue found was that the check in get_string() for a no closing quote needed to be removed because the scan_string() was modified during the testing of the immediate commands to accept a no closing quote condition. The ibcp_0.0.8-src.zip file has been uploaded at Sourceforge IBCP Project. The binary for the test program has also been uploaded.

Sunday, March 7, 2010

Parser Class – Numbers (Testing/Release)

Additional tests lines were added into the third test input group for testing numerical constants. Numbers were also added to test each of the errors. After seeing how the errors were identified, I thought it would be nice if the entire number was identified for errors like “constant is out of range” for example:

    A = 3.45e343
        ^^^^^^^^-- constant is out of range

This required a new length value be added to the Token structure. Two new set_error() functions were implemented that also take a length argument. The existing two set_error() functions were modified to set length to 1. The length value was added to the anonymous union holding the integer and double values since none of these will be used at the same time. Having this error length value will be very useful when the GUI is implemented to the whole constant can be highlighted in red instead of just the first character. All the existing callers of set_error() were checked to see if a length would be help, but only a few in the get_immediate() for immediate commands was changed.

One error message could not be verified. Towards the end of get_number() there is a check to see if the current pointer (which will be pointing to the first character not part of the constant) matches the end pointer value returned from the strtol() or strtod() standard library conversion functions (which will point to first character not part of the conversion). This check was added to make sure the result was expected. I couldn't think of case where this would actually happen, so if it does, it would be due to a bug in the get_number() function, therefore, “BUG:” was added to the error message.

The ibcp_0.0.7-src.zip file has been uploaded at Sourceforge IBCP Project. The parser1.txt was updated for the change in the error output. The binary for the test program has also been uploaded.

Saturday, March 6, 2010

Parser Class – Identifiers (Testing/Release)

The test_parser program was modified to break the tests into groups, otherwise one long stream of output would be generated – there's no reason to see all the Immediate Commands test output while testing Identifiers. Therefore, a single digit argument was added to the test_parser program where 1 is for the Immediate Command tests, 2 is for the Identifier tests, an so on. If anything in the core Parser code is changed because of bugs found, the older tests can be rerun and compared to the previous output as a regression test.

The first problem discovered was that it is was necessary to have distinct Token Types for Defined Functions with parentheses and without. Therefore DefFunc was replaced with DefFuncP and DefFuncN. The second issue discovered was that the internal string functions (CHR$, LEFT$, etc.) had the wrong data type of None in the table, so these were changed to the correct String data type.

Two things that could not be tested are search for words with a data type (there are none of these currently in the table) and the error if found the first word of a two word command and the second word of the command is not valid (currently all two commands are valid as a single word command, e.g. END, DO, LOOP, etc.).

The ibcp_0.0.6-src.zip file has been uploaded at Sourceforge IBCP Project. The binary for the test program has also been uploaded.

Monday, March 1, 2010

Parser Class – Immediate Commands (Testing/Release)

The first major problem discovered was that Immediate Commands can either have line number arguments (all argument types including Blank but not String) or a string argument. There needed to be a way to determine if the token string is pointing to the immediate commands argument structure or to an actual string. To solve this problem, the new data type CmdArgs was added. The immediate command's token's data type will either be set to CmdArgs or String.

The next problem discovered was that the scan_command() function could not return Null_Flag and have the token type set to an Error because it may have been set to Error from a previous call since it was not initialized or reset in scan_command() - this confused the get_command() function, which incorrectly returned errors when there wasn't an error. There needed to be a way to distinguish between invalid immediate commands that could be a valid BASIC command (like L500=10 or L500,L1000=8) and commands that are not a valid BASIC command either (like A100,-5 or L100-200=4). To solve this, the Error_Flag = -1 constant definition was added, which is returned if the immediate command has an error.

Many different test commands were added to the test input array, sufficient to test all of the different forms of command arguments, plus many errors like commands with arguments forms that is not acceptable for some commands. At the end of the array, there are many invalid commands to test every single error exit point in the scan_command() function.

The ibcp_0.0.5-src.zip file has been uploaded at Sourceforge IBCP Project. The binary for the test program has also been uploaded. Note that all the previous test programs and binaries are not included in this release in order to reduce the size of the file – these files can still be obtained from the 0.0.4 release.