An old, previously undiscovered bug was found: the pending parentheses token was not deleted when the parentheses sub-code of the previous token added to the output list was set, that is, when it was determined that the parentheses were not necessary (the parentheses sub-code allows the parentheses to be reproduced). This again emphasizes the need for memory leak detection (more on this soon).
There was a conflict between a closing parentheses token being used as a pending parentheses and as a last operand. It's possible for a pending parentheses to become a token added to the output list (this occurs when there are two sets of unnecessary parentheses), so when this token is no longer needed as a last operand, it can't simply be deleted. The solution was to add two new sub-codes: one indicating the closing parentheses token is being used as a pending parentheses (Used) and one indicating it is being used as a last operand (Last).
When the pending parentheses is set, the token's Used sub-code is set. When a closing parentheses token is set as a last operand, the token's Last sub-code is set. When the pending parentheses token is no longer needed, if its Last sub-code is set, the Used sub-code will be cleared, otherwise the token will be deleted. If the pending parentheses is added to the output, its Used sub-code will be left set to prevent it from getting deleted. Similarly, when a closing parentheses token is no longer needed as a last operand, if its Used sub-code is set, its Last sub-code will be cleared, otherwise the token will be deleted.
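The release logic above can be sketched as a pair of symmetric routines. This is a minimal sketch of the scheme, not the actual project code; the names (`SubCode`, `releasePending`, `releaseLast`) are assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the Used/Last sub-code scheme described above;
// all names here are assumptions, not the actual project code.
enum SubCode { Used = 0x01, Last = 0x02 };

struct Token {
    int subcode = 0;
};

// When the pending parentheses token is no longer needed: if its Last
// sub-code is set it is still referenced as a last operand, so only the
// Used sub-code is cleared; otherwise the token is deleted.
bool releasePending(Token *&token) {
    if (token->subcode & Last) {
        token->subcode &= ~Used;
        return false;           // token still alive
    }
    delete token;
    token = nullptr;
    return true;                // token deleted
}

// Symmetric routine for when the token is no longer needed as a last operand.
bool releaseLast(Token *&token) {
    if (token->subcode & Used) {
        token->subcode &= ~Last;
        return false;
    }
    delete token;
    token = nullptr;
    return true;
}
```

The token is deleted only by whichever side releases it last, which avoids both the double delete and the leak.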
In the closing parentheses handler, when a closing parentheses is processed for a parenthetical expression, the opening parentheses (previously deleted) will now be set as the first operand, and the closing parentheses (also saved as the pending parentheses) will now be set as the last operand of the token on top of the done stack. The closing parentheses token's Last and Used sub-codes will be set.
Tuesday, January 25, 2011
Saturday, January 22, 2011
Translator – First/Last Operand – Parentheses
Previously it was thought that parentheses could be ignored for the first and last operands because parentheses are always valid in a numeric expression or a string expression (though parentheses would be strange in a string expression, since there is currently only one string operator, concatenate, they are still valid). However, consider this statement:
Z$ = A$ + -(B+C)

The entire -(B+C) sub-expression should be reported as an “expecting string expression” error. However, only the B+C was being reported before the unary operator fix, and -(B+C after this fix. In both cases, the C was the last operand of the negate (where the expression type was invalid); the parentheses were removed. Had the statement been:
Z$ = A$ + INT(B+C)

The error would have been correctly reported since the closing parentheses is being attached to the last operand of the function. Therefore, both the open and closing parentheses also need to be carried through the operands.
Currently, when a closing parentheses is processed, the open parentheses token is deleted and the closing parentheses is temporarily saved in the pending parentheses variable, which is checked later to see if it was necessary or not (the parentheses sub-code of the previous operand is set if the parentheses were unnecessary). To be continued...
Translator – First/Last Operand – Unary Operators
Expression type errors involving unary operators were not pointing to the correct tokens. This was because no operands were being attached to unary operators pushed to the done stack. Consider this statement:
Z$ = A$ + -B

The -B tokens should be reported as an “expecting string expression” error. However, only the B token was being reported because the first and last operands of the negate operator were being set to the first and last operands of its operand from the done stack, which was the B token.
The solution was to attach only the last operand from the operand popped from the done stack, leaving the first operand of the unary operator token set to nothing, which will cause the code to use the negate token itself as the first operand.
Friday, January 21, 2011
Translator – First/Last Operand – Initial Debugging
Initially, all the Translator tests were failing. The source of the crash was the new delete close parentheses routine, which also needed to check whether the token type has a table entry. The crash occurred when a token didn't have a table entry, because the routine was trying to access the table entry to check for a close parentheses code.
Next, a very old bug that had gone undiscovered was occurring in the expression-only test mode used for the first four Translator tests. The problem was that the final result was left on the done stack and the Null token was left on the hold stack. The crash occurred on the first error test when the cleanup routine was called and it tried checking for a close parentheses in the last operand – the token it was checking no longer existed. The End-of-line handler was modified to clean up the stacks in expression mode and to check that there is only the result item on the done stack.
A series of problems were then discovered relating to unary operators and parenthetical expressions. Also, a realization was made that the first and last operands need to be saved for operands of arrays and user functions so that the Encoder can report errors correctly for array subscripts and function arguments. More on these problems to follow...
Sunday, January 16, 2011
Translator – First/Last Operand Implementation
As the hold stack was being changed from holding just a token pointer to a structure containing a token pointer, a first operand token pointer, and a last operand token pointer, it was noticed that the ends of the Operator and Equal (for the equality operator) token handlers contained almost identical code, both needing to be modified for the hold stack item change. The code was moved and replaced with a call to a new process first operand routine.
A second token pointer argument was added to the process final operand routine, which is called to add a token to the output list by the add operator routine (to add an operator from the hold stack), the add print code routine (to add a hidden print code for a specific data type), and the close parentheses token handler (to add a function or array from the hold stack). This second token pointer will contain the first operand token (from the add operator routine), nothing (from the add print code routine), or the close parentheses token (from the close parentheses token handler).
The find code routine was modified to return the first and last operand token pointers from the operand on the done stack that is being checked. These arguments are pointers to token pointers with a null pointer default value – a caller passes them only when it needs them. The find code routine uses local pointer variables when the caller doesn't pass any, since the first and last operand tokens will still be needed if an error occurs.
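The optional output-argument pattern described above can be sketched as follows. The name `findCode` and the trivial lookup are placeholders; the point is the pointer-to-pointer arguments with null defaults and the local fallback variables.

```cpp
#include <cassert>
#include <cstddef>

struct Token { int column; };

// Sketch of the optional output-argument pattern described above; the
// real routine pulls the values from the done stack operand instead.
void findCode(Token *operand, Token **first = nullptr, Token **last = nullptr) {
    Token *localFirst = nullptr;   // fallbacks when the caller passes nothing;
    Token *localLast = nullptr;    // still needed internally if an error occurs
    if (first == nullptr) {
        first = &localFirst;
    }
    if (last == nullptr) {
        last = &localLast;
    }
    *first = operand;              // stand-in for the real done stack lookup
    *last = operand;
}
```

Callers that don't need the operand range simply call `findCode(token)` and pay nothing for the extra arguments.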
The line in the close parentheses token handler that deletes the close parentheses token was removed (since this token is now being passed to the process final operand routine to be stored temporarily as the last token pointer for arrays and functions). Several locations need to delete this token when it is no longer needed. This procedure first needs to check whether the last operand token pointer is set and whether the token is a close parentheses token, and then delete the token. A new delete close parentheses inline function was implemented that performs this procedure.
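A minimal sketch of that helper, under assumed names (the real function and token type enumeration may differ):

```cpp
#include <cassert>
#include <cstddef>

// Assumed token type enumeration for this sketch only.
enum TokenType { CloseParen_TokenType, Other_TokenType };

struct Token { TokenType type; };

// Sketch of the delete-close-parentheses helper described above: delete
// the last operand token only when the pointer is set and the token is
// actually a close parentheses token.
inline void deleteCloseParen(Token *&last) {
    if (last != nullptr && last->type == CloseParen_TokenType) {
        delete last;
        last = nullptr;
    }
}
```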
Finally, a through inline function was implemented in the Token class that takes a second token argument, calculates the total length for a token that extends up through this second token as described here, and returns a pointer to the first token so that it can be conveniently assigned to the token being returned. Now for debugging and testing...
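The through function might look like this sketch (the member layout is a simplification of the real Token class):

```cpp
#include <cassert>

// Simplified Token with only the members the sketch needs.
struct Token {
    int column;   // position of the token in the line
    int length;   // length of the token's text

    // Grow this token's length to span up through a second token, and
    // return this token so the result can be assigned directly.
    Token *through(const Token *last) {
        length = last->column - column + last->length;
        return this;
    }
};
```

Returning `this` is what makes the convenient assignment possible, e.g. `return token->through(lastToken);` in an error path.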
Saturday, January 15, 2011
Translator – Sub-String Functions
Sub-string functions add a wrinkle to storing the first and last token because these are not actually stored on the done stack (their string operand is kept on the stack). Consider this statement:
Z$ = LEFT$(A$, 1) + B$ > C$

When the +$ token is processed, the operand on the stack will be the A$ left there by the LEFT$( function, which will be considered the first operand of the +$ token. This would then be passed on as the first operand of the >$ token. When the error is reported, the range will incorrectly be from A$ to C$ instead of from LEFT$( to C$.
To resolve this, when a sub-string function is processed (popped from the hold stack and added to the output list, but not put on the done stack where its string operand remains), the sub-string token will be put into its operand's first token on the done stack. If this token already has a first token, it will be overwritten.
In the statement above, the first operand of A$ will be set to the LEFT$( token. The +$ token will then inherit the LEFT$( as its first token, followed by the >$ token. When the error is finally reported, the range of tokens will correctly be from LEFT$( to C$. Now on to the implementation of the first and last operand...
Friday, January 14, 2011
Translator – Functions and Arrays
Reporting a range of tokens (sub-expressions) gets a little more involved when arrays and functions (internal or user) are in the expression. Consider this statement:
Z$ = A$(1) + B$(2) > C$(3)

The range of tokens would be A$( to C$(, and reporting only up to C$( would not be correct since the 3 and ) tokens should be included. Therefore, the closing parentheses of an array or function should be assigned as the last token for the array or function token. This implies that closing parentheses tokens cannot be immediately deleted.
When the array or function token is popped from the done stack, if the last token is a closing parentheses, it will be deleted unless it is being assigned as the last operand of an operator token, as would happen in the statement above: the closing parentheses of C$( would be assigned to the >$ token as its last token. When the error gets reported, the range would be from the A$( token to the final ) token of C$(.
There is no issue with regular parenthetical expressions, as these are valid in any expression. Therefore, if the statement above included parentheses around the expression, the error would be reported as follows (in other words, the parentheses are valid, but not the expression inside):
Z$ = (A$(1) + B$(2) > C$(3))
      ^^^^^^^^^^^^^^^^^^^^^-- expected string expression

This attachment of closing parentheses to tokens will take place in the closing parentheses token handler and will only apply to arrays and functions, not simple parenthetical expressions. Also worth noting is that these tokens (arrays and functions) will only have a last token assigned; the first token will be blank, in which case the token itself will be considered the first token. Finally, sub-string functions add another wrinkle to this scheme...
Thursday, January 13, 2011
Translator – First and Last Operand
When an error occurs, the inclusive range between the first and last tokens of the offending expression will be reported. Take the previous example statement Z$=A$+B$>C$. The >$ token will be on top of the done stack when the expression type error is detected (the >$ operator returns an integer and cannot be assigned to a string variable). The entire expression from A$ to C$ should be highlighted.
This will be accomplished by saving the first and last operand for each operator as the expression is processed. When an expression type error is detected, these operands will be available. When assigning the first operand to an operator, if the operand on done stack has a first operand assigned to it, then that operand will be used, otherwise the operand token on the done stack will be assigned. The same for the last operand.
In the statement above, the +$ operator will have A$ for its first operand (since A$ does not have a first operand assigned to it) and B$ for its last operand (B$ does not have a last operand). At the >$ operator, its first operand will be assigned to A$ (which is the +$ operand's first operand) and C$ for its last operand (C$ does not have a last operand).
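The inheritance rule above can be sketched as a pair of small helpers. `DoneItem` and the helper names are assumptions for illustration, not the actual done stack structures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed done stack item for this sketch: the operand token plus any
// first/last operand it inherited from a sub-expression.
struct DoneItem {
    const char *token;   // the operand token itself
    const char *first;   // first operand inherited from a sub-expression
    const char *last;    // last operand inherited from a sub-expression
};

// Prefer the operand's own first/last operand; fall back to the operand
// token itself when none has been assigned.
const char *firstOf(const DoneItem &item) {
    return item.first != nullptr ? item.first : item.token;
}

const char *lastOf(const DoneItem &item) {
    return item.last != nullptr ? item.last : item.token;
}
```

For Z$=A$+B$>C$, the +$ item would carry A$ and B$, so the >$ operator inherits A$ as its first operand and takes the plain C$ operand as its last.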
Since the first operand is no longer saved on the done stack except for string operands (a specific type operator has already been selected), the first operand needs to be attached to the token on the hold stack, where the operator resides until it is put into the output list. Therefore, the hold stack needs to hold structures containing a token pointer and a first operand token pointer instead of just a token pointer – a new hold stack item will be defined to hold these.
The done stack currently holds output list element pointers (the output list consists of RPN items each containing a token pointer, number of operands and attached operand array, if any). The done stack will also need to hold the first and last operand token pointers. Therefore, the done stack needs to hold structures containing an output list element item pointer along with the first and last token pointers – a new done stack item will be defined to hold these.
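The two new stack items described above might look like this minimal sketch; the member names are assumptions based on the description.

```cpp
#include <cassert>

// Stand-ins for the real classes, for illustration only.
struct Token {};
struct RpnItem {};   // output list element: token, operand count, operands

// Hold stack item: the operator plus its attached first operand.
struct HoldStackItem {
    Token *token;    // the operator waiting on the hold stack
    Token *first;    // first operand, attached until output
};

// Done stack item: the output list element plus the sub-expression range.
struct DoneStackItem {
    RpnItem *element;   // pointer to the output list element
    Token *first;       // first token of the sub-expression
    Token *last;        // last token of the sub-expression
};
```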
Wednesday, January 12, 2011
Parser – Token Length
In researching for the last post, some issues were found in the Parser with respect to the token length. The token class contained a length member, but it was contained in the union along with the double value and integer value members used for constants. The thinking was that length would never be used with number constants; the length would be obtained from the string of the constant (which is being kept so that the constant can be reproduced exactly as entered).
The test routine contained code for determining the length of string constants. This required adding two for the surrounding double quotes (which are not stored) and adding one for each double quote contained in the constant (which is entered with two double quotes). It turned out that the length of the original string constant, including the surrounding double quotes was already being stored in the length member of the token. The code in the test routine was redundant and unnecessary (and was removed).
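The removed calculation amounted to the following sketch (the helper name is hypothetical): two is added for the surrounding quotes, plus one for each embedded quote, since a quote inside a string constant is entered as two double quotes.

```cpp
#include <cassert>
#include <string>

// Sketch of the (now removed) redundant calculation: recover the
// original source length of a string constant from its stored value.
int sourceLength(const std::string &value) {
    int length = static_cast<int>(value.size()) + 2;   // surrounding quotes
    for (char c : value) {
        if (c == '"') {
            length++;   // entered as "" in the source
        }
    }
    return length;
}
```

Since the token already stored the original source length, this recomputation was unnecessary.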
Generally the length member was being used for the length of the token, but it wasn't being set for identifiers (variables, arrays, and user functions) and number constants where the length could be obtained from the string member.
It will be easier if the length member always contains the length of the token, so that it is not required to obtain the length from the string member for certain token types. The length member was moved out of the union so it can also be used for number constants, and in the Parser, the length member is now set for identifiers and number constants, so all token types are covered.
Tuesday, January 11, 2011
Translator – Errors with a Range of Tokens
Currently when the Translator detects an error, the error code is returned along with the token where the error was detected. The caller uses the column and length values in the token to point to the token. (Later the token will be highlighted in a different color.)
Now it is desired that a range of tokens be highlighted to show an entire sub-expression that contains an error. While it is possible to modify all the functions within the Translator for a second return token, there is an easier solution.
The column and length members of the token are used to output the error – where the token starts and how long it is. The Translator, upon detecting an error with a range of tokens, only needs to modify the length of the token where the beginning of the error is detected to include all the tokens. The Translator can do this using the column of the first token, the column of the last token, and the length of the last token. The equation would be:
first->length = last->column - first->column + last->length

So when an error is detected and a range of tokens is involved, the length of the token being returned will be set using the equation above.
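A worked instance of the equation, assuming zero-based columns: in Z$=A$+B$>C$, A$ starts at column 3 and C$ at column 9, each with length 2, so the first token's length becomes 9 - 3 + 2 = 8, exactly covering the eight characters of A$+B$>C$.

```cpp
#include <cassert>

// Simplified token with just the members the equation uses.
struct Token { int column; int length; };

// The range-length equation from the post.
int rangeLength(const Token &first, const Token &last) {
    return last.column - first.column + last.length;
}
```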
Monday, January 10, 2011
Translator – First Operand (Updated Design)
Upon considering the first operand design (see posts starting on July 8, 2010), there could still be some confusion between the reported error message and the token at which it is pointing. Consider this simple statement (and the error produced currently):
Z$ = A$ + B$ > C$
             ^-- expected string expression

This does not make sense because, first, the integer expression is valid; the error occurs because this integer expression cannot be assigned to a string variable. It could also be confusing because a string expression cannot be placed where the greater than operator is. Alternatively, it could indicate that a string operator is expected at the greater than (more later on why this was ruled out). The previously proposed first operand design would produce this error:
Z$ = A$ + B$ > C$
     ^^-- expected string expression

The error is now pointing to the beginning of the integer expression. But one could say, wait a minute, A$+B$ is a string expression, why is this an error? A better solution would produce this error:
Z$ = A$ + B$ > C$
     ^^^^^^^^^^^^-- expected string expression

This is much better as it highlights the entire expression, an integer expression, and says that a string expression is expected here. This can be accomplished by, in addition to keeping track of the first operand (the A$), also keeping track of the last operand (the C$). The error would point to all tokens between the first and last of the invalid expression. Now, how can these two tokens be passed back where currently only one token is returned when an error occurs? Next, the solution to this...
Sunday, January 9, 2011
Translator – Expression Type Pre-Release
There are two main problems with the Twelfth (more error statements) test, namely the handling of the first operand of operators first mentioned on July 8, 2010, and recognizing tokens with parentheses before assignment operators as arrays only (implying the operands are numeric subscripts).
All of the other tests are succeeding, so it's probably a good idea to kick out another pre-release first. This is the first release since development was switched to NetBeans, so there are some changes in the released files. NetBeans is not required for building the project. There is now a full Makefile for building the program. NetBeans was used to initially generate this file, but it was modified to simplify it. The VIDE2 project file ibcp.vpj will no longer be part of the source files.
Also no longer part of the source files are codes.txt and test_codes.h. Both of these files are now generated by the new Makefile using the codes.awk and test_codes.awk awk scripts. The MinGW and MSYS packages both need to be installed in order to use the Makefile. The program build has only been tested with the GCC 4.4.0 version for MinGW. Also, the awk program is needed for the awk scripts (I don't remember if this is part of MSYS or whether it needs to be installed separately).
The Makefile is currently configured for building the release program with no symbols or debugging information. Debugging can be enabled by changing BASICOPTS from "-O -s" to "-g" in the Makefile. The file ibcp_0.1.14-pre-3-src.zip has been uploaded at Sourceforge IBCP Project along with the binary for the program. Now on to implementing the first operand design...
Saturday, January 8, 2011
Translator – More Expression Type Debugging
The next problem was occurring when there was an unexpected operand when an operator, comma or closing parentheses was expected (for example: Z$=MID$(A$,B C). This was only occurring for functions with multiple arguments (ASC, MID$ and INSTR) when at the last argument of the shorter form. The “expected operator or closing parentheses” error was produced, but a comma could follow for the optional argument.
A check was added to the parentheses status routine for when processing is at the last argument. If the function has the Multiple flag set, indicating there is another form with more arguments, then the “expected operator, comma or closing parentheses” error is produced instead.
One problem remained: the statement Z$=MID$(A$,B, produced an “expected string expression” error at the end of the line where it should have been an “expected numeric expression” error. The problem also occurred on the last optional argument of the INSTR and ASC functions. It was caused because the top of the count stack contained the index to the first form of these functions instead of to the second form with the optional argument.
When the code went to get the argument's data type, it ran off the end of the operand data type array, incorrectly picking up a string data type. The coding error was on the line that incremented the index of the token on the hold stack to the next form of the function and assigned it to the index on top of the count stack. A post-increment C operator was used instead of the pre-increment operator, so the index on the count stack was not actually changed.
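The bug boils down to the difference between post- and pre-increment in an assignment, illustrated here in simplified form (the function names are invented for the example):

```cpp
#include <cassert>

// Post-increment yields the OLD value: the count stack would receive
// the index of the current form, not the next one.
int buggyNextForm(int index) {
    int countTop = index++;   // countTop gets the old index
    return countTop;
}

// Pre-increment yields the advanced value: the count stack receives
// the index of the next form, as intended.
int fixedNextForm(int index) {
    int countTop = ++index;   // countTop gets the next form's index
    return countTop;
}
```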
Now all the test statements in the Eleventh Translator test are working (and one expected error needed to be corrected in the Third test because of the last fix – it was missed previously). On to the Twelfth (more error statements) test...
Translator – More Expression Type Debugging
Now that the data type has been moved from the expression information structure back to the main table entry structure (so that it is available for commands as well as operators and functions) and the code is working again the way it was, it was back to debugging the Eleventh (error tests) Translator test (after committing the working code to CVS) to continue fine-tuning the error messages produced.
The next problem to tackle was the generic “expected expression” error being produced when a binary operator occurred in an expression when only a unary operator was expected. This error needed to be changed to the more specific “expected <numeric/string> expression” errors. This was accomplished by calling the new get expression data type routine to get the current data type and producing the appropriate error.
Next, there were several error test statements containing an unexpected comma in an expression, which were producing “expected operator or end-of-statement” errors. This was correct for some statements (for example: Z=A+B, and Z=A,B=1 and Z=A,), but not for others (for example: Z=, and Z,Y=, and Z$=,) where the correct error should have been an “expected <numeric/string> expression” error.
These errors are being handled in the comma token handler. These conditions can be determined by looking at the done stack. If the done stack is not empty, the expression up to the comma is valid, so expecting another operator or end-of-statement is appropriate. If the done stack is empty, then there is no valid expression yet for the assignment, so expecting an expression is appropriate. Again, to determine the type of expression, the new get expression data type routine is used.
Thursday, January 6, 2011
Translator – Command Expression Type
There was a problem getting the data type for the token on top of the command stack. There is no issue with assignment tokens as they contain a data type (they are just assignment operators put on the command stack).
A while back, a new field was added to the Table entries for the expression type with possible values of None, Numeric, String and Any, to be used with command entries. The idea at the time was that this would be used for the expression type checking (for the generation of the correct error messages). However, this field was not being used (the original expression type code added at the time was abandoned because it had too many problems – it is currently commented out).
The first attempt at the current problem was to use this expression type and convert it into a data type (None/Any to None, Numeric to Double, and String to String). The only place this was used was in the new get expression data type routine for commands. But this didn't work for assignment commands, which are operators, so their entries didn't have the expression type set (the data type in the expression information structure held the return value).
So it was concluded that the expression type field should be removed and the data type moved from the expression information structure back into the main entry structure (where it was originally). When the Parser finds and creates a command token, it now sets the data type of the token to the data type from the table entry. Previously, the Parser got the data type for all tokens from the expression information structure, or none if there was no structure (as is the case for commands).
Wednesday, January 5, 2011
Translator – Expression Type Implementation
A new routine was implemented to handle getting the current data type of the expression processed so far. This routine is called when the unexpected end of expression occurs to get the data type so that the correct “expected ...” error can be reported, and is also called before pushing an opening parentheses token to the hold stack to set its data type (so that it can be carried through the rest of the expression).
Once the routine was working, it was simplified. First, the token on top of the hold stack is checked for an operator before looking for the other conditions (parentheses, internal functions, and tokens with parentheses). This reduced the number of tests overall. An empty hold stack causes a bug error since at least the Null token should be on the stack.
If the top hold stack token is the Null token, then the data type is obtained from the token on top of the command stack. This caused another problem with commands (more on this in the next post). If the top token is an opening parentheses, then its data type is used (which could be none). Otherwise the data type of the operator's operand is used (first operand for a unary operator, second operand for binary operator).
If the top token on the hold stack is not an operator, then the top of the count stack is used to determine where to get the data type. If the count stack is empty, a bug error is returned since this situation should not happen. If the top is an internal function, then the data type of the current operand of the function is used. If the top is a parentheses, a bug error is returned since this situation should also not happen (the parentheses token should have been on top of the hold stack). Otherwise, the top contains a token with parentheses (an array or non-internal function) where the data type cannot be determined, so it is set to None.