Interactive BASIC Compiler Project: April 2010

Friday, April 30, 2010

Translator – Data Types On Assignments

The assignment operators need slightly different handling than regular operators because they require reference tokens for the operands being assigned. The reference tokens will have a specific data type and cannot be converted between types. Only the value being assigned can be converted to the type of the reference operands. An added complication with the assignment list operators is that they have no set number of reference operands, so the fixed number of operands and the operand data types in the table entries, which the match code routine relies on, will not work.

To support the assignment operator, the match code routine needs to be modified so that when it sees a reference operand, it only checks whether the data type matches the data type in the table entry exactly. The main assignment operator (Assign, which will handle the double data type) will have two associated codes, AssignInt and AssignStr. The second operand for Assign and AssignInt can be converted.

For the assignment list operators, it has already been decided that all of the variables being assigned must be the same data type, in other words, no mixing of doubles and integers – mainly for efficiency during run-time. The match code routine was not designed to handle a variable number of operands for a code. However, the match code routine can be used, at least to check the two operands on top of the done stack, which would be the value being assigned and the last reference operand in the list being assigned.

In the add operator routine, upon returning from the match code routine successfully, it can then check if the operator is an assignment list operator. This can be accomplished with a new assignment list operator flag in the table entry. If this flag is set, then the add routine would proceed to check the rest of the operands on the done stack for the reference flag (which it does now) and check to see if the data type is the same as the assignment list operator returned from the match code routine.
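The extra check for assignment list operators could look something like the following. This is only a rough sketch of the idea, not the project's code: the Operand structure, the Status names and the check_assign_list() function are made up for illustration, and the done stack is reduced to a plain vector holding the operands that remain after the match code routine has already validated the top two.

```cpp
#include <cstddef>
#include <vector>

enum DataType { Double, Integer, String };
enum Status { Good, NotReference, DataTypeMismatch };  // hypothetical names

struct Operand {          // simplified stand-in for a done stack item
    DataType datatype;
    bool reference;       // set when the token can be assigned to
};

// Check the operands left on the done stack once the match code routine
// has accepted the value and the last reference.  'list_type' is the data
// type of the assignment list code that was matched.
Status check_assign_list(const std::vector<Operand> &remaining,
                         DataType list_type)
{
    // walk from the top of the stack (back of the vector) downward
    for (std::size_t i = remaining.size(); i-- > 0; ) {
        if (!remaining[i].reference)
            return NotReference;
        if (remaining[i].datatype != list_type)
            return DataTypeMismatch;   // no mixing of doubles and integers
    }
    return Good;
}
```

With this arrangement the per-operand work stays in the Translator, so nothing needs to be checked again at run-time.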

Wednesday, April 28, 2010

Translator – Data Types (Release)

The remaining minor bugs were found by simply tracing through using the debugger. There are minor differences in the previous test input sets due to the addition of the internal CvtDbl and CvtInt codes. Many test inputs were added for the sixth set of the test inputs for the data type handling including several error tests.

Now that the source tree is under CVS, the full source tree is being released with all the test sub-directory files. All the test output files have also been located in the test sub-directory. In the future, the full source tree will only be released for major releases (which means for the interim releases, only the main project files, current test output files and any new test program source files will be included).

The code now handles data types in expressions (but not the assignment operators) and ibcp_0.1.9‑src.zip has been uploaded at Sourceforge IBCP Project along with the binary for the program (which now will run from the Windows command line and without MinGW being installed). Next the data type handling for the assignment operators...

Monday, April 26, 2010

Translator – Data Types (Implementation)

The final implementation of the data type handling was completed except for assignment operators – these will take a little extra processing because of the references (which are not convertible), plus the assignment list operator does not have a fixed number of operands. The assignment operators will be handled later.

The new find_code() function returns a Status enumeration value, either Good upon success or one of the three new error codes for the “expected <data type>” errors. A reference to the token is passed so that it can be changed to point to the error. I realized that the comma and close parentheses tokens were not being deleted in all cases. There needs to be some sort of memory leak checking added to the program. The new and delete operators can probably be overloaded, so some checking can be added. This is a side-project for another day, but it is needed.
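As a rough idea of what the leak checking could look like, here is a minimal sketch that overloads new and delete for a token class and keeps a count of the tokens still allocated. The Token structure here is a stripped down stand-in and the outstanding counter is a hypothetical name, not something in the project.

```cpp
#include <cstddef>
#include <new>

struct Token {
    int code;                  // placeholder member for illustration

    static long outstanding;   // hypothetical count of live tokens

    // class-specific allocation: count every token that is created
    static void *operator new(std::size_t size)
    {
        void *ptr = ::operator new(size);  // may throw std::bad_alloc
        ++outstanding;
        return ptr;
    }

    // class-specific deallocation: count every token that is deleted
    static void operator delete(void *ptr) noexcept
    {
        if (ptr != nullptr)
            --outstanding;
        ::operator delete(ptr);
    }
};

long Token::outstanding = 0;
```

The test code could then report an error if Token::outstanding is not zero after each line has been translated and cleaned up.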

Testing began with the existing Translator test inputs (1 through 5) – some differences were expected (new conversion codes). What follows are some of the changes that were needed as the debugging progressed:

When the operands are being popped off of the done stack (the operand array needs to be filled in reverse order), the reference flags needed to be cleared (no need to check if it is set first), just as was previously done for operators and internal functions.

When checking the associated codes and a convertible match is found, it is only recorded if no convertible match has been found so far (in other words, find only the first convertible match).

When changing the token to a new associated code, in addition to changing the index in the token, the data type also needed to be set to the new code's data type, which might be different than the main code's (for example Abs returns Double, but AbsInt returns Integer).

Quite a few of the expressions in the existing test inputs are not working correctly, or worse, causing the program to crash. One test input VAL(STR$("1.23")) was written incorrectly in the first place, which should have been VAL(STR$(1.23)) or STR$(VAL("1.23")), but at least the code properly detected this error. Debugging continues...

Sunday, April 25, 2010

Project Executable Issue Resolved

The reason why the executable would not run in the Windows (XP) command window was finally identified. During the transition to CVS, it was discovered that the older executables had no problem running in the command window. The problem started with release 0.1.2 (release 0.1.1 worked). Some time was finally spent investigating what changed between 0.1.1 and 0.1.2 to cause this problem.

During the investigation, it was also discovered that the executables require libgcc_s_dw2‑1.dll to run. This library comes with MinGW. The GCC GNU GPL (General Public License) prevents this library from being distributed with executables without also distributing the source code for it – something not desirable. However, there is the linker option “‑static‑libgcc” that will statically link this library into the executable, which is permitted under the GCC GPL, so future executables will be linked this way. VIDE complains about this option upon loading the project indicating libraries should be entered in the project library tab – but it does correctly use this option on linking. This will be left as is for the moment. Unfortunately, this library issue was not the cause of the command window problem.

After further investigation, the problem was determined to be caused by the transition from the test_parser program to the ibcp program (the test_parser.cpp source file to the ibcp.cpp and test_ibcp.cpp source files). One of the changes made was in the GPL header print function to print the actual name of the executable, not a fixed string. This simply involved printing the first command line argument. Under MSYS, Insight (GDB) and apparently when run from Windows Explorer, this first argument (argv[0]) contains the full path of the program, which was not desired. A string function was used to find the last back-slash in this full path and only print the string that comes after this character.

The problem was that from the command window, only the program name entered is passed as the first argument (the path or the “.exe” is not included unless entered on the command line). In any case, the strrchr() library function used to get a pointer to the last back-slash returned a NULL because a back-slash was not found, and using this NULL caused the crash. The code was corrected to allow for a lack of a path – problem solved.
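The corrected logic amounts to something like the sketch below. The program_name() helper is a made up name for illustration; the actual code in ibcp.cpp may be arranged differently.

```cpp
#include <cstring>

// Return the part of argv[0] after the last back-slash, or all of
// argv[0] when it contains no back-slash (hypothetical helper).
const char *program_name(const char *arg0)
{
    const char *slash = std::strrchr(arg0, '\\');
    // strrchr() returns a null pointer when the character is not found,
    // which is the case when only the program name was entered
    return slash != nullptr ? slash + 1 : arg0;
}
```

Calling program_name(argv[0]) then gives a usable name whether the program is started with a full path or with just the name typed at the command prompt.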

Translator – Data Type Matching

When determining which code (main or associated) to use from the operand's data types, there can be an exact match, no match or a convertible match. A convertible match can be made to be an exact match when conversion code(s) are added after the operand that doesn't match exactly. An exact match is preferred.

There will be a match routine that will determine the type of match there is between the current operand's data type(s) and a code's required data type(s). In addition to returning the type of match found (exact, none, or convertible), the routine will also return the conversion code needed for each operand, or Null if a particular operand does not need conversion.

This match routine will use a table holding the conversion code for each possible operand (have) data type and required (need) data type pair. For the pairs where the two data types are the same, the conversion code will be set to the Null code. For the pairs with an Integer or Double vs. a String, the conversion code will be set to an Invalid code. The only pairs with actual conversion codes are the have Integer need Double pair (CvtDbl) and the have Double need Integer pair (CvtInt).
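The conversion code table described above could be laid out as a small two dimensional array. This is a sketch only: the enumeration names, the array name and the conversion_code() function are assumptions made for illustration.

```cpp
// hypothetical enumerations; the last DataType value gives the array size
enum DataType { Double_DataType, Integer_DataType, String_DataType,
                sizeof_DataType };
enum Code { Null_Code, Invalid_Code, CvtDbl_Code, CvtInt_Code };

// conversion code indexed by [have data type][need data type]
static const Code cvt_code_have_need[sizeof_DataType][sizeof_DataType] = {
    // need:  Double        Integer       String
    { Null_Code,    CvtInt_Code,  Invalid_Code },  // have Double
    { CvtDbl_Code,  Null_Code,    Invalid_Code },  // have Integer
    { Invalid_Code, Invalid_Code, Null_Code    }   // have String
};

// look up the code needed to turn a 'have' operand into a 'need' operand
Code conversion_code(DataType have, DataType need)
{
    return cvt_code_have_need[have][need];
}
```

A table lookup keeps the match routine free of nested if statements, and adding another data type later would only mean adding a row and a column.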

The routine for finding the code for the data types of the operand(s) will be called from both the add operator routine and the internal function routine (after the number of arguments is checked). The operands will be pulled off of the done stack to get their data types. Using the match routine, the operands will be checked against the main code for a data type match. If an exact match is found, then no further action is needed. If a convertible match was found, the main code is saved in case no further exact match is found.

When the main code is not an exact match, each of the associated codes (if any) of the main code table entry are checked for a match. For each, if an exact match is found, then no further action is needed. If a convertible match was found, the associated code is saved in case no further exact match is found.

If a convertible match was found, then the token's code is changed and the conversion codes are inserted into the output list after the operands. If no match was found, then one of three “expected <data type>” errors (one for each data type) will be reported against the first operand with an Invalid code in the conversion codes returned from the match routine.
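The search described above, where an exact match wins and a convertible match is only a fallback, can be sketched as follows. The Entry structure, the Match enumeration and the function names are simplified assumptions; the conversion codes and error reporting are left out to keep the example short.

```cpp
#include <cstddef>
#include <vector>

enum DataType { Double, Integer, String };
enum Match { NoMatch, Convertible, Exact };

struct Entry {                        // hypothetical, simplified table entry
    const char *name;
    std::vector<DataType> operand_types;
};

// classify how the operand data types on hand fit one table entry
Match match(const Entry &entry, const std::vector<DataType> &have)
{
    if (have.size() != entry.operand_types.size())
        return NoMatch;
    Match result = Exact;
    for (std::size_t i = 0; i < have.size(); ++i) {
        if (have[i] == entry.operand_types[i])
            continue;                 // this operand needs no conversion
        if (have[i] == String || entry.operand_types[i] == String)
            return NoMatch;           // numbers and strings never convert
        result = Convertible;         // Integer <-> Double can be converted
    }
    return result;
}

// entries[0] is the main code, the rest are its associated codes;
// returns the index of the entry to use or -1 if nothing matches
int find_code(const std::vector<Entry> &entries,
              const std::vector<DataType> &have)
{
    int convertible = -1;
    for (std::size_t i = 0; i < entries.size(); ++i) {
        Match m = match(entries[i], have);
        if (m == Exact)
            return static_cast<int>(i);        // exact match is preferred
        if (m == Convertible && convertible < 0)
            convertible = static_cast<int>(i); // keep the first one only
    }
    return convertible;
}
```

For the plus operator with entries Add, AddInt and CatStr, two integer operands select AddInt (exact), while a double and an integer fall back to Add (convertible).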

Saturday, April 24, 2010

Translator – Data Type (Table Updates)

Before working on the design of the routines that will handle the matching of the data types and finding the correct code, I decided to make the changes to the table entries for the new operand data type (operand_datatype) array and associated code (assoc_code) array and make sure these compile. The number of arguments member (nargs) was also renamed to the number of operands (noperands). New access functions for these members were added to the Table class.

All the table entries for the operators and internal functions were updated with the new values. The number of operands (previously nargs) for all of the operators was previously set to zero, so these were changed to 1 for the unary or 2 for the binary operators. The data type for many operators was also incorrectly set to None, so these were changed to the appropriate data type.

New entries for all the associated codes were added to the table and their codes were added to the Code enumeration. An entry for the CDBL function was missing and was added. An entry for a new function FRAC was also added (this function will return the fractional part of a floating point value).

The integer division operator (“\” or IntDiv) is intended to be used with double operands, which will be rounded and converted to integers internally before the division. The operand data types could have been set to the integer data type, but then the CvtInt would always be inserted for both operands. This operator only has one code and should not be used with integer operands as the regular division should be used, but this will be allowed (CvtDbl will be inserted, but only to be converted back to integers internally).

The Power operator will have three forms, the standard double/double (Power) and integer/integer (PowerInt), but there will also be a double/integer code (PowerMul). The PowerInt will use integer multiplication internally. The idea for the PowerMul code is for it to also use multiplication internally instead of calling the standard C pow() function for speed. However, the pow() function may already have these optimizations, so this may be unnecessary. Some experiments will be needed to determine which is more efficient. The goal is to allow the programmer to use an expression like A^2 instead of A*A and not be penalized by slow execution.
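One way the PowerMul code could avoid pow() is repeated squaring, which needs only a handful of multiplications even for large exponents. This is just one candidate for the experiments mentioned above, not a decision; the power_mul() name is made up for the sketch.

```cpp
// Raise a double to an integer power using only multiplication
// (exponentiation by squaring); hypothetical sketch of PowerMul.
double power_mul(double base, int exponent)
{
    bool negative = exponent < 0;
    // work with the unsigned magnitude so the most negative int is safe
    unsigned int n = negative ? 0u - static_cast<unsigned int>(exponent)
                              : static_cast<unsigned int>(exponent);
    double result = 1.0;
    while (n != 0) {
        if (n & 1u)
            result *= base;    // this bit of the exponent is set
        base *= base;          // square for the next bit
        n >>= 1;
    }
    return negative ? 1.0 / result : result;
}
```

For A^2 the loop reduces to a single useful multiplication of A by itself, which is the case the programmer should not be penalized for.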

Translator – Associated Codes

Operators and some internal functions allow different operand data types, and each combination will have its own code (e.g. Add, AddInt, and CatStr), but only one name in the source (e.g. “+”). The Parser will only find the first one in the table since the Parser is only responsible for breaking the source into tokens, not looking at data types and determining appropriate codes. It is the Translator's responsibility to examine and validate the operands of operators and internal functions and to set the token to the appropriate code.

To accomplish this, the first entry in the table (that the Parser finds, the default entry), which for many will be the code that handles the double data type, will contain the other codes associated to the main code. These codes will be put into an associated code array in the table entry.

After obtaining the data types of the operands from the done stack, the Translator will check to see if there is a match to the code by checking the required operand data types in the code's table entry. If there is no match, then each of the associated code table entries will be checked for a match.

In addition to an exact match, there will also be a convertible match. With this type of match, hidden conversion codes can be inserted to obtain the desired data types. The exact match will be preferred so that the correct code is used. For example, if the first exact match or convertible match is used, then for Integer + Integer, the Translator would use the default Add code (for doubles) and add two CvtDbl codes instead of using the preferred AddInt code. Next, how a convertible match will be detected...

Friday, April 23, 2010

Translator – Operand Data Types

Now it has been established that operators and internal functions can be handled the same way, except that the number of arguments (operands) is checked first for internal functions. For internal functions, the number of arguments check also determines which code to use (MID2 vs. MID3, INSTR2 vs. INSTR3, or ASC vs. ASC2).

Each code (operator or internal function) has a return data type, a fixed number of operands and an expected data type for each operand. The word operand will be used instead of argument from now on, which is more appropriate when applied to operators. Taking the plus operator as an example, which will have three codes (one main and two associated codes):

Add     Double   (2)  Double   Double
AddInt  Integer  (2)  Integer  Integer
CatStr  String   (2)  String   String

The code is listed first along with the return data type. The number of operands is in parentheses followed by the data type of each operand. If there is an add with a double and an integer operand, the Add code will be used with a CvtDbl inserted after the integer operand.

The table entries already contain the code and data type values and the number of arguments will be renamed to the number of operands. A new operand data type array needs to be added to the table entries. The size of this array will be set to three since there are currently no planned functions containing more than three arguments. Next, how associated codes will be handled...
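A table entry with the new operand data type array might be declared along these lines. The member names noperands, operand_datatype and assoc_code follow the names used in the table update post; everything else (the enumeration values, the size of the associated code array, the plus_entries array) is an assumption for illustration.

```cpp
enum DataType { None_DataType, Double_DataType, Integer_DataType,
                String_DataType };
enum Code { Null_Code, Add_Code, AddInt_Code, CatStr_Code };

const int max_operands = 3;     // no planned function has more than three
const int max_assoc_codes = 2;  // assumed size, enough for the plus operator

struct TableEntry {
    Code code;                                // code of this entry
    const char *name;                         // name seen in the source
    DataType datatype;                        // return data type
    int noperands;                            // number of operands
    DataType operand_datatype[max_operands];  // expected operand data types
    Code assoc_code[max_assoc_codes];         // associated codes (main only)
};

// the three codes for the plus operator; the main code comes first
static const TableEntry plus_entries[] = {
    { Add_Code, "+", Double_DataType, 2,
      { Double_DataType, Double_DataType, None_DataType },
      { AddInt_Code, CatStr_Code } },
    { AddInt_Code, "+", Integer_DataType, 2,
      { Integer_DataType, Integer_DataType, None_DataType },
      { Null_Code, Null_Code } },
    { CatStr_Code, "+", String_DataType, 2,
      { String_DataType, String_DataType, None_DataType },
      { Null_Code, Null_Code } }
};
```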

Thursday, April 22, 2010

Translator – Operators and Internal Functions

Unary operators have one operand. When the translation to RPN of unary operators is compared to the translation of functions with one argument, it can be seen that the translations are the same (one operand followed by the code):

-A → A Neg
ABS(A) → A Abs

Binary operators have two operands. When the translation to RPN of binary operators is compared to the translation of functions with two arguments, it can be seen that the translations are the same (two operands followed by the code):

A$+B$ → A$ B$ CatStr
LEFT$(A$,2) → A$ 2 Left

There are no ternary or higher operators, but functions with three or more arguments have a similar format except there would be more operands before the code. So, once the source code for operators and internal functions is translated to RPN, there is no difference between them - operand(s) followed by a code. Therefore, for data type handling, both operators and functions can be handled the same way. Next, how data types will be handled...

Wednesday, April 21, 2010

Translator – Data Types and Internal Functions

The internal functions have a fixed number of arguments, though some functions have multiple forms with different numbers of arguments (MID$, INSTR and ASC). Each argument has a specific expected data type; however, integer and double data types are interchangeable as the necessary conversion will be performed, as it is for operators.

Most of the math functions (e.g. INT, SQR, LOG, COS, etc.) have one argument, which is expected to be a double. These functions return a double. Only one code is required for these functions. If the argument is an integer, then a hidden CvtDbl code will be inserted after the integer operand. Here are some RPN examples of math functions:

A Sqr
B% CvtDbl Log

A few of the math functions (ABS and SGN) will return the same type as their operand. Two codes are required for each of these functions: Abs and AbsInt, and Sgn and SgnInt. The reason for having two codes is so that these functions can be used in integer expressions without any wasteful conversions to and from double precision. This is not necessary for the other math functions because they will be calculated in double precision internally.

The conversion functions (CDBL and CINT) return the opposite data type of their operand. These functions work the same as the hidden CvtDbl and CvtInt codes. It would appear that these functions are unnecessary since the hidden conversion codes will be inserted as needed. However, there may be cases where they are necessary, like in function and subroutine calls.

The string functions deal with the string data type (implementation will be delayed along with the string data). Some string functions take an integer operand (CHR$ and SPACE$) or double operand (STR$) and produce a string, some take a string operand and produce an integer (ASC and LEN) or double (VAL), and some take both string and integer operands and produce a string (LEFT$, MID$, RIGHT$, and REPEAT$) or an integer (ASC and INSTR).

Next, the similarity between operators and internal functions...

Tuesday, April 20, 2010

Translator – Data Types and Operators

Three data types are currently planned: double precision (the default), integers and strings. Several operators accept all three as operands (plus, equality, relational and assignment). Consider the different possible operand combinations for the plus operator (the default data type is double, % is for integers and $ is for strings):

A + B
A + J%
I% + B
I% + J%
S$ + T$

Any combination of a number and a string is invalid and an error needs to be reported. However, doubles and integers may be mixed in an expression. The expression will remain integer for efficiency as long as all of the operands are integers. In a double expression, integers will be promoted (converted) to double.

If the plus operator was implemented as a single routine, at run-time it would have to look at the operands and then decide what action to take, whether to add two doubles, whether to convert one operand or the other from integer to double and then add two doubles, whether to add two integers, or whether to concatenate two strings. All this decision making will slow execution.

These decisions will of course be made before run-time by the Translator. The Translator will know which operation will be needed at run-time, so it will add the appropriate code to the RPN list. Since there are five different possible combinations, that would mean five different add operator codes. If this same thing was implemented for every operator, then the number of codes and the number of execution routines would be excessive.

There are really only three operations, adding doubles, adding integers and concatenating strings. For the other two combinations, one or the other operand needs to be converted from integer to double. To accomplish this, there will be special hidden conversion codes (which have already been mentioned in several earlier posts). With these hidden codes, the above expressions would be converted to RPN as:

A B Add
A J% CvtDbl Add
I% CvtDbl B Add
I% J% AddInt
S$ T$ CatStr

The Add code handles doubles, AddInt handles integers, and CatStr (concatenate) handles strings. The CvtDbl code converts its integer operand to double. For now, the focus will be on doubles and integers; strings require more involved handling and will be added later. Next, data types and internal functions...
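The decisions made by the Translator for the plus operator can be illustrated with a small function that produces the RPN text for two operands. This is only a toy sketch with made up names (translate_plus() and the DataType values); the real Translator works with tokens and table entries, not strings.

```cpp
#include <string>

enum DataType { Double, Integer, String };

// Build the RPN text for "left + right"; returns an empty string for
// the invalid number/string combinations (hypothetical helper).
std::string translate_plus(const std::string &left, DataType left_type,
                           const std::string &right, DataType right_type)
{
    if ((left_type == String) != (right_type == String))
        return "";                             // number mixed with string
    if (left_type == String)
        return left + " " + right + " CatStr";
    if (left_type == Integer && right_type == Integer)
        return left + " " + right + " AddInt";

    // at least one double: promote any integer operand with CvtDbl
    std::string rpn = left;
    if (left_type == Integer)
        rpn += " CvtDbl";
    rpn += " " + right;
    if (right_type == Integer)
        rpn += " CvtDbl";
    return rpn + " Add";
}
```

For example, translate_plus("A", Double, "J%", Integer) gives "A J% CvtDbl Add", matching the second line of the list above.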

Sunday, April 18, 2010

Project – Processes and Insights

The transition to CVS is almost complete (two releases remaining to be put into the repository). While it would have been far simpler just to throw the whole project into CVS as is, I wanted the complete history for each release in the CVS repository as if CVS had been used from the beginning. So this would be a good time to give some insight into the development process of this project.

Most days there is only an hour or two to work on this project, either on the code or on writing blog entries; more on the weekends, sometimes less depending on the demands of my full time paying job and my family. There is an attempt to post at least once a day on what is being worked on or on the design of upcoming components, but some days the code is being worked on and there is no time left to write and proofread a post. Sometimes several entries are written at a time (because there is too much for one post), but each entry is posted separately to give some time to work on the code.

The design is ongoing, and is actually being worked on ahead of what is being worked on in the actual code. A lot of notes were made on the design almost six months prior to the first post last December, when the decision was made to actually write code and document the effort in a blog. Many of these notes (like on the Recreator design) have yet to be put to code. It is very helpful to explain the design and the reasons behind decisions in the posts. Many problems are identified during this process. The design notes for future components continue to be made (now in a bound notebook instead of loose pieces of paper).

The parallel development (code and future design) process is the reason why there are few major changes to what has been developed so far. This will become even more evident when commands are implemented in the Translator. Thought needs to be given to how the command will be executed at run-time, the format of the command when it is encoded into the internal code, and what is needed to recreate the source code from the internal code for editing. Now back to the CVS transition and on to data type handling...

Saturday, April 17, 2010

Translator – Assignments (Release)

The test code was modified to output “<ref>” after tokens that have the reference flag set. For the third set of test inputs used for testing arrays and functions, there are some differences because the reference flag is now set for array subscripts and function arguments. The unexpected comma error was also changed (replaced with separate unexpected comma errors). For the assignment statement test inputs (fifth set), a lot of different statements were needed to test the many situations that can occur. Many have already been discussed. As bugs were discovered, new test statements were added. There are 9 new possible assignment related errors, so there are statements for each of these.

Some more comma related bugs were discovered: assigning a multiple dimension array element as in the statement “A(B,C)=D” added an assign list operator instead of an assign operator, and the statement “A(B+C,D=E)=F” generated an error. The bottom line was that the counter stack not empty check also needed to be added to the comma operator code. This resulted in a new error, “unexpected comma in parentheses”, being added.

I'm changing the way the changes made to the code are dated. Previously, all the dates were changed to the date of the release – like all these changes were made the same day, which may not have been the case. The changes are becoming rather involved over many days and this method is not efficient. From this release on, the date will be when the changes are actually made. The time stamp of the file may be more recent because of going back and editing the change history at the top of the file.

I have also decided to put this project under software version control, specifically CVS – with the next release. I have working knowledge of CVS and I discovered that either MSYS came with it, or I installed an MSYS/CVS package. The current version numbers in the source files will be removed and replaced with the CVS revision ID tag. These versions were my attempt at version control anyway. Maintaining all the different versions as zip files and directories is also not efficient.

The code now handles assignment statements and ibcp_0.1.8-src.zip has been uploaded at Sourceforge IBCP Project along with the binary for the program. The release notes now contain a Planned Roadmap to show upcoming development. Next the handling of data types...

Translator – Assignments and Parentheses

Simply changing the mode to Expression when an open parentheses occurs is insufficient. Statements like “A(B,C)=D” no longer worked correctly because the mode was not set to Command when the equal was processed. Also, statements like “A(B+C)=D” and especially “A(B=C)=D” need to have the mode temporarily set to Expression to process the operators in the subscripts correctly. Simply checking if the counter stack is not empty is sufficient for detecting this situation. The logic for open parentheses needs to be:

Counter Stack Not Empty: No need to check mode; push open parentheses token on hold stack and push a 0 on the counter stack (which prevents commas).

Command: Return an “unexpected parentheses in command” error. Statements like “(A=B)” and “(A+B)” are not valid.

Equal: Set mode to Expression. This is the start of the expression after an equal (assignment). This will handle statements like “A=(B=3)” and “A=(B)=3” where the expression starts after the open parentheses.

Comma: Return an “unexpected parentheses in assignment list” error. Statements like “A,(B),C=4” are not valid.

Expression: Same as if counter stack is not empty (push token and 0).
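The open parentheses logic above can be sketched as a single function. The Translator structure here is heavily simplified and the enumeration names are made up; the hold stack just stores a character instead of a token. It is also assumed that the Equal case goes on to push the token and a 0 like the Expression case does.

```cpp
#include <vector>

enum Mode { Command, Equal, Comma, Expression };
enum Status { Good, UnexpectedParenInCommand, UnexpectedParenInAssignList };

struct Translator {                 // simplified, hypothetical stand-in
    Mode mode = Command;
    std::vector<int> counter_stack;
    std::vector<char> hold_stack;   // holds '(' in place of a real token

    Status open_paren()
    {
        // the mode only matters when not already inside parentheses
        if (counter_stack.empty()) {
            switch (mode) {
            case Command:
                return UnexpectedParenInCommand;     // "(A=B)"
            case Comma:
                return UnexpectedParenInAssignList;  // "A,(B),C=4"
            case Equal:
                mode = Expression;   // expression starts after the equal
                break;
            case Expression:
                break;
            }
        }
        hold_stack.push_back('(');
        counter_stack.push_back(0);  // a 0 count prevents commas
        return Good;
    }
};
```

Checking the counter stack first is what lets “A(B,C)=D” keep its mode while the subscripts are processed.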

When processing a closing parentheses for an open parentheses (not an array or function), the reference flag of the last token added to the output list needs to be cleared. A pointer to this token is on top of the done stack. If this token is an operator like in the expression “(A+B)” then clearing the reference flag has no effect, but in statements like “A=(B)=3” or “A=Function((B),C)” the reference flag of B is cleared.

The counter stack is not empty check also needs to be made when processing operators before checking the mode. In other words, if the counter stack is not empty, then it is assumed that the Translator is within an expression. This will handle statements like “A(B+C)=D” and “A(B=C)=D” correctly. So this check is needed at both the equal operator section and the no special operator section. Looks like everything is working correctly, so almost ready to release...

Friday, April 16, 2010

Translator – Assignments (Testing)

While testing comma separated assignment statements, an “unexpected character” error for statements like “A,B,C=4” was occurring from the Parser. This occurred because the Parser was seeing the line as an immediate command and expected a number for B. The Parser was modified to not return errors for these since these can be valid immediate statements.

Before testing the previous Translator tests that consisted of only expressions (which are not normally valid by themselves and would now cause “unexpected operator” errors), the test code needed to be modified for a special expression test mode that would be set for the first four sets of test expressions only. The Translator start() function was modified to optionally initialize the mode to Expression instead of Command.

The reference flags also need to be cleared for the arguments of internal functions as values are needed during run-time. This does not apply to defined or user functions as the arguments are planned to be passed by reference by default. Surrounding a variable with parentheses will override this and pass the variable by value. For array subscripts, values are needed during run-time. Remember that the Translator does not know the difference between arrays and user functions, therefore, the reference flag will be left set for array subscripts along with function arguments. The Encoder can clear the reference flags for array subscripts once an array is identified.

While testing, I realized that the mode needed to be changed to Expression when an open parentheses occurs. Statements like “A=(B=3)” and “A=(B)=3” are single assignments and the second equals are the equality operator. Testing continues...

Thursday, April 15, 2010

Translator – Assignment Operator Implementation

The reference flag for the first operand of an assignment operator or the list of operands of an assignment list operator will be checked in the add_operator() function before these operators are added to the output list. These operators remain on the hold stack until the end of statement processing since the precedence of these operators is low.

When the expected reference flag is not set, a “cannot be assigned” error will occur. The error needs to point to the token that is not a reference, not to the assignment operator token. Therefore, the add_operator() function's argument was changed to be a reference to the token pointer so that a different token can be returned upon an error.

For the assignment list operator, the error should point to the first token in the line that is not a reference. There could be more than one token in the list that is not a reference, but only the first one will be reported. Because the tokens are pulled from the done stack in reverse order, the last one pulled that is not a reference is the token that needs to be reported. There will be a bad token pointer that will be set for each non-reference token. Once the stack is empty, if there was a non-reference, the bad token pointer will be pointing to the first non-reference token in the statement.
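The bad token pointer idea can be sketched as below. The Token structure is reduced to what the example needs, the done stack is a plain vector holding only the operands being assigned, and the first_non_reference() name is made up for illustration.

```cpp
#include <vector>

struct Token {              // simplified stand-in for the real token
    const char *text;
    bool reference;
};

// Pop every operand off of the done stack (top is the back of the vector)
// and return the first token in the statement that is not a reference,
// or a null pointer if all of them are references.
Token *first_non_reference(std::vector<Token *> &done_stack)
{
    Token *bad_token = nullptr;
    while (!done_stack.empty()) {
        Token *token = done_stack.back();
        done_stack.pop_back();
        // tokens come off in reverse order, so each later hit is earlier
        // in the statement and replaces the previous one
        if (!token->reference)
            bad_token = token;
    }
    return bad_token;
}
```

No extra bookkeeping is needed to find the first offender; simply overwriting the pointer on each non-reference leaves it on the right token once the stack is empty.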

Upon modifying the code to return the bad token, I realized there was a problem doing this. Previously the code that calls the add_token() function deleted the token with the error since it had not been added to the output list. The tokens in the output list are deleted by the clean_up() function called after an error occurs. Before the Translator returned a different token on an error, it deleted the token passed to the add_token() function. This is where the problem occurred; the caller would delete the error token coming back unconditionally. If it was a token in the output list, it would be deleted twice (once by the caller, once by the clean_up function). This was corrected in the caller by checking if the original token was passed back and only then deleting the token.

Monday, April 12, 2010

Translator – Reference Flag Implementation

For non-parentheses tokens, only the NoParen and DefFuncN token types should have the reference flag set. The NoParen token type includes a possible variable or user function with no arguments. The reference flag should not be set for internal functions with no arguments (IntFuncN) or constants. At the moment it's unclear if the reference flag should be set for defined functions with no arguments (DefFuncN). Intuition says that the reference flag should be set since the DEF command line is similar to an assignment statement.

For parentheses tokens, only the Paren (an array or user function with arguments) and DefFuncP token types should have the reference flag set. For now the reference flag will be set for defined functions with arguments (same reason as above). The reference flag should not be set for internal functions with arguments (IntFuncP). However, with that said (and this is jumping ahead a bit), some internal functions, namely string functions, could be references. Specifically the MID$ function, which could be used to assign a part of a string. This will be implemented later when data type handling and string handling are added.

Lastly, in the test code that outputs the tokens, a “<ref>” will be output after the token if the reference flag of a token is set. This check will be made for all token types just to make sure it is not getting set when it's not supposed to.
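A minimal sketch of these rules, assuming an enumeration of token types. The NoParen, DefFuncN, IntFuncN, Paren, DefFuncP and IntFuncP names come from the text above; the Constant and Operator enumerators and both function names are hypothetical.

```cpp
#include <string>

// Token types; Constant and Operator are assumed names for this sketch.
enum TokenType {
    NoParen, DefFuncN, IntFuncN, Constant,
    Paren, DefFuncP, IntFuncP, Operator
};

// Returns whether a new token of this type starts with its reference flag set.
bool initial_reference_flag(TokenType type)
{
    switch (type) {
    case NoParen:   // possible variable or user function with no arguments
    case DefFuncN:  // defined function with no arguments
    case Paren:     // array or user function with arguments
    case DefFuncP:  // defined function with arguments
        return true;
    default:        // internal functions, constants, operators
        return false;
    }
}

// Test output for a token: the flag is shown for every token type so that
// a wrongly set flag is visible.
std::string test_output(const std::string &text, bool reference)
{
    return reference ? text + "<ref>" : text;
}
```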

Sunday, April 11, 2010

Translator – Single Vs. Multiple Assignments

Some issues came up during the implementation of the assignment handling. A new Assign code is needed for the assignment operator. The original plan was that this operator would handle both single and multiple assignments. At run-time, it would simply keep popping references off of the evaluation stack and assigning the value until the stack is empty. However, this requires a check if the stack is empty. The goal is to reduce the number of checks that are needed at run-time because the more checks made, the more time wasted doing these checks during program execution.

Since a single assignment will be the common case and multiple assignment the exception, the single assignment execution time would be penalized by the extra stack empty check. To prevent this, two assignment codes are needed, Assign for single assignment and AssignList for multiple assignment.

To implement two assignment codes in the Translator, at the first equal, the token will be changed to Assign as previously planned. If the mode is Equal (indicating another equal in a multiple equal assignment), the Assign token on top of the hold stack will be changed to AssignList. If the mode is Comma, then the token will be changed to AssignList before pushing the token onto the hold stack. At the time the assignment token is added to the output list, the Assign operator will expect two operands (the first a reference) and the AssignList operator will expect multiple reference operands and the value to assign operand.
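The handling of an equal token under this plan might look like the sketch below. The State struct, the code enumerator names and process_equal() are assumptions made for illustration, and the hold stack is reduced to a vector of codes.

```cpp
#include <vector>

enum Mode { Command, Equal, Comma, Expression };
enum Code { Eq_Code, Assign_Code, AssignList_Code };  // assumed names

// Hypothetical translator state reduced to what this sketch needs.
struct State {
    Mode mode = Command;
    std::vector<Code> hold_stack;
};

// Processes one equal token according to the current mode.
void process_equal(State &state)
{
    switch (state.mode) {
    case Command:     // first equal: single assignment until proven otherwise
        state.hold_stack.push_back(Assign_Code);
        state.mode = Equal;
        break;
    case Equal:       // another equal: upgrade the operator already on hold
        state.hold_stack.back() = AssignList_Code;
        break;
    case Comma:       // end of a comma separated list of references
        state.hold_stack.push_back(AssignList_Code);
        state.mode = Expression;
        break;
    case Expression:  // inside the expression: a regular equality operator
        state.hold_stack.push_back(Eq_Code);
        break;
    }
}
```

Only one assignment code ever sits on the hold stack, so the run-time sees either Assign or AssignList and never has to decide between them.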

This brings up another issue. The Recreator will need to distinguish if the AssignList is from a multiple equal or comma separated multiple assignment so that the original code is recreated as it was entered. There could be two different AssignList codes for each (for a total of three assignment operators). Though not implemented yet, there will also need to be unique codes for each assignment if the optional LET command word is entered (that's now 6 unique assignment codes). All these unique codes would not be required at run-time (only two are needed, single or multiple). There's actually a better way to prevent this multiplying of codes, but this will wait until the internal code is developed (and it will also eliminate the need for the dummy parentheses codes in the program, but not all the code needed to determine when it is needed).

Translator – More Mode Processing

There is some more mode processing needed. When a non-assignment operator is added to the output, the mode will determine how the operator token is processed:

Command: An operator is not expected so an “unexpected operator” error occurs.

Equal: Within an assignment (possibly multiple). The mode will be changed to Expression indicating the start of the expression to assign.

Comma: Within a comma separated list of a multiple assignment, an operator is not expected so an “expected comma or equal” error occurs.

Expression: Within an expression, operator is processed as previously implemented.

At the end of the line, some checks need to be made based on mode to see if any error has occurred:

Comma: Within a comma separated list of a multiple assignment, but there was no assignment operator, so an “incomplete assignment” error occurs.

No errors are detected for the other modes at the current time. Now on to implementation of assignment statements...
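The two mode checks above can be sketched as a pair of small functions. The status enumerator names are hypothetical (they only borrow the “Error_” prefix convention), as are the function names.

```cpp
enum Mode { Command, Equal, Comma, Expression };

// Assumed status names for this sketch.
enum Status {
    Good,
    Error_UnexpectedOperator,
    Error_ExpectedCommaOrEqual,
    Error_IncompleteAssignment
};

// Called when a non-assignment operator is about to be processed.
Status process_operator(Mode &mode)
{
    switch (mode) {
    case Command:
        return Error_UnexpectedOperator;
    case Equal:
        mode = Expression;  // the expression being assigned starts here
        return Good;
    case Comma:
        return Error_ExpectedCommaOrEqual;
    case Expression:
        return Good;        // processed as previously implemented
    }
    return Good;
}

// Called at the end of the line.
Status check_end_of_line(Mode mode)
{
    return mode == Comma ? Error_IncompleteAssignment : Good;
}
```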

Saturday, April 10, 2010

Translator – Comma Token Processing

Some comma token processing has already been implemented – commas separating subscripts of arrays or arguments of functions. Commas outside of arrays or functions caused an “unexpected comma” error. This processing needs to be extended to include comma separated multiple assignment statements. Again the current mode will determine how the Translator processes comma operator tokens:

Command: The beginning of a comma separated multiple assignment. The mode will be changed to Comma indicating a comma separated multiple assignment. The comma token is not needed, so it can be deleted. Eventually an assignment operator will be pushed on the hold stack.

Equal: Continuation of a multiple equal assignment statement, so a comma token at this point causes an “unexpected comma in assignment” error.

Comma: Continuation of a comma separated list of a multiple assignment. No further action is needed so the comma token can be deleted.

Expression: Within an expression, so proceeds with the previously implemented comma processing. When the counter stack is empty or the top counter has a zero value, an “unexpected comma in expression” error occurs (changed from “unexpected comma” to differentiate it from the “unexpected comma in assignment” error).

The comma token has the same low precedence as closing parentheses and assignment operators so most tokens will be removed from the hold stack. Comma tokens are not pushed to the hold stack so they won't be removed. Closing parentheses are also not pushed to the hold stack. And there will never be an assignment operator pushed to the stack before a comma token is processed.
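As a rough sketch, the four comma cases could be written like this. The status names and process_comma() are assumptions, and the counter stack is modeled with a plain vector rather than the SimpleStack class.

```cpp
#include <vector>

enum Mode { Command, Equal, Comma, Expression };

// Assumed status names for this sketch.
enum Status {
    Good,
    Error_UnexpectedCommaInAssignment,
    Error_UnexpectedCommaInExpression
};

// Processes one comma token; the token itself is never pushed anywhere.
Status process_comma(Mode &mode, std::vector<int> &counter_stack)
{
    switch (mode) {
    case Command:     // start of a comma separated multiple assignment
        mode = Comma;
        return Good;
    case Equal:       // mixing the two multiple assignment forms
        return Error_UnexpectedCommaInAssignment;
    case Comma:       // another reference in the list, nothing to do
        return Good;
    case Expression:  // must be inside an array or function
        if (counter_stack.empty() || counter_stack.back() == 0) {
            return Error_UnexpectedCommaInExpression;
        }
        ++counter_stack.back();
        return Good;
    }
    return Good;
}
```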

Translator – Equal Token Processing

The Parser returns an equality operator token for an equal. The current mode will determine how the Translator processes equality operator tokens:

Command: The token will be changed to an assignment operator and the mode will be changed to Equal indicating a possible multiple equal assignment and to prevent a comma separated assignment. The assignment operator will be pushed onto the hold stack.

Equal: Continuation of a multiple equal assignment statement. There is already an assignment operator on the hold stack, so this token can be deleted. No further action is needed. During run-time, the single assignment operator will handle multiple variable references.

Comma: The end of a comma separated list of a multiple assignment. The token will be changed to an assignment operator and the mode will be changed to Expression indicating the start of the expression being assigned. The assignment operator will be pushed onto the hold stack.

Expression: Within an expression, equals are equality operators, so the token (already an equality operator) will be processed as a regular binary operator.

The assignment operator will have a very low precedence, which will keep it on the hold stack until the end of the statement. Its precedence can be the same as the closing parentheses. The assignment token will be first on the hold stack, so there won't be a closing parentheses token to remove. The closing parentheses only empties the hold stack to an opening parentheses or internal function token (with a lower precedence), so it won't see an assignment token.

Friday, April 9, 2010

Translator – Multiple Assignment Vs. Equality

There are two types of multiple assignment statements being supported – multiple equals and comma separated. Having a two value “in command” flag is sufficient for determining whether an equal is an assignment or equality operator. However, it's not sufficient for knowing which type of multiple assignment is present. For this, two more values are needed, whether in a multiple equal assignment or in a comma separated assignment.

Instead of calling this the “in command” flag, it will be called mode. There will be four modes: in command, in multiple equal assignment, in comma separated assignments, or in expression.

In addition to knowing if the Translator is in command mode or expression mode, the Translator also needs to know if it is in a multiple equal assignment or a comma separated assignment. At the beginning of the line, the mode will be set to command.

For equal tokens, comma tokens and other operator tokens, this mode variable will be used to determine what action to take or what error needs to occur. One of the error conditions to detect is if the two types of multiple assignment statements are mixed, something that is not allowed.

Thursday, April 8, 2010

Translator – Assignment Vs. Equality

The Translator needs to decide when to add an equality operator to the output list and when to add an assignment operator. For now, commands won't be considered since support for commands has not been added to the Translator yet. Consider these assignment statements:

A = X * 5
A = B = C + D
A = B + (C = 5) * D
A = B = C + (D = 5)

There needs to be a flag for when an equal should be an assignment operator and when it should be the equality operator. At the beginning of a line, a command is expected, so when there is an equal, it should be interpreted as an assignment operator (which is technically an assignment command). Once the expression starts, any equals should be interpreted as equality operators.

In the second example statement, the first and second equals are assignment operators since the expression to be assigned hasn't started yet. In the third example, the plus begins the expression, so the second equal is an equality operator. In the last example, the plus again begins the expression, so the third equal is an equality operator.

The flag will indicate whether the Translator is within a command and will be initialized to on at the beginning of the line. If a non-assignment operator is added to the output list this “in command” flag will be turned off. At each equal, if the flag is on then an assignment operator will be put on the hold stack; and if the flag is off, then an equality operator will be put on the hold stack.
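To make the flag concrete, here is a deliberately simplified sketch that scans the characters of a statement instead of tokens. It assumes simple variables only and treats +, -, * and / as the non-assignment operators; classify_equals() is a hypothetical name.

```cpp
#include <string>

// Returns one letter per '=' in the line: 'A' for an assignment operator,
// 'E' for an equality operator.
std::string classify_equals(const std::string &line)
{
    bool in_command = true;  // on at the beginning of the line
    std::string kinds;
    for (char c : line) {
        if (c == '=') {
            kinds += in_command ? 'A' : 'E';
        } else if (c == '+' || c == '-' || c == '*' || c == '/') {
            in_command = false;  // the expression has started
        }
    }
    return kinds;
}
```

For the four example statements this gives "A", "AA", "AE" and "AAE", matching the interpretation described above.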

Wednesday, April 7, 2010

Translator – References

When processing operands, the Translator needs to distinguish between the values for and references to variables and arrays. This will be accomplished with a flag in Token to indicate whether it contains a reference or not. When a variable or array token is added to the output list, its reference flag will be set. When the Encoder processes tokens from the output list to generate the internal code, it will generate a push reference instruction if the reference flag is set and a push value instruction if the reference flag is not set.

In the Translator, when a non-assignment operator is processed and it pops its operands off of the done stack, it will clear the reference flag in the token of the operand since it needs values and not references. For an assignment operator, the reference flag of only the second operand will be cleared; however, the reference flag of the first operand needs to be checked to make sure that it is set, so that expressions like “5 = A” or “A+B = 4” will cause a “non-reference cannot be assigned” error.

As previously mentioned, the Translator does not know the difference between a variable and function with no arguments or an array and a function with arguments. Therefore, it could be setting the reference flag for a token that refers to a function and not a variable or array. This will not cause a problem however. Only tokens on the left side of an assignment operator will have the reference flag set and there should be no functions on the left side (except in this case of setting the return value of a function with its own function name – therefore setting the reference flag makes sense, but functions are much later).
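A small sketch of how the flag might be cleared, checked and later used by the Encoder. The Token struct, the function names and the "PushRef"/"PushVal" instruction names are all hypothetical.

```cpp
#include <string>

// Hypothetical minimal token for this sketch.
struct Token {
    std::string name;
    bool reference;
};

// Non-assignment binary operator: both operands are used as values.
void pop_operands(Token &first, Token &second)
{
    first.reference = false;
    second.reference = false;
}

// Assignment operator: only the second operand becomes a value. Returns
// false when the first operand is not a reference (the "cannot be
// assigned" case).
bool pop_assign_operands(const Token &first, Token &second)
{
    second.reference = false;
    return first.reference;
}

// Encoder side: picks the push instruction from the flag.
std::string encode_operand(const Token &token)
{
    return std::string(token.reference ? "PushRef " : "PushVal ") + token.name;
}
```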

Tuesday, April 6, 2010

Assignments and References

Assignment statements will be processed like any expression with binary operators; however, the first operand of an assignment is handled differently than the second operand (which is handled the same as operands of other binary operators). Consider the expression with its RPN translation:

A + B          A B +

During run-time, the value for the variable A is pushed onto the evaluation stack followed by the value of variable B. When the add operator is executed, it pops the two values off of the stack, adds them, and then pushes the resulting value back onto the stack. Now consider an assignment expression with its RPN translation:

A = B          A B =

During run-time, the assignment operator needs to know where to store the value being assigned. So, for the A operand, a reference to the variable value needs to be pushed onto the evaluation stack instead of its value. The assignment operator will not be pushing anything back onto evaluation stack.

The assignment operator will always be the last operator to be processed, so to handle multiple assignments, the multiple references will be pushed onto the evaluation stack. The assignment operator will simply keep popping references and assigning the value until the stack is empty.
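The run-time behavior described here can be sketched with a tiny evaluation stack. This is only an illustration limited to doubles; the Item struct and all function names are hypothetical, and each function assumes the stack holds the operands it expects.

```cpp
#include <vector>

// One evaluation stack item: either a value or a reference to a variable.
struct Item {
    double value;       // used when reference is nullptr
    double *reference;  // non-null when the item refers to a variable
};

void push_value(std::vector<Item> &stack, double value)
{
    stack.push_back({value, nullptr});
}

void push_reference(std::vector<Item> &stack, double &variable)
{
    stack.push_back({0.0, &variable});
}

// Add operator: pops two values and pushes the result.
void run_add(std::vector<Item> &stack)
{
    double right = stack.back().value;
    stack.pop_back();
    double left = stack.back().value;
    stack.pop_back();
    push_value(stack, left + right);
}

// Assignment operator: pops the value, then assigns it through every
// reference left on the stack. Nothing is pushed back.
void run_assign(std::vector<Item> &stack)
{
    double value = stack.back().value;
    stack.pop_back();
    while (!stack.empty()) {
        *stack.back().reference = value;
        stack.pop_back();
    }
}
```

For A = B = C + D the RPN is A B C D + =, so two references and two values are pushed before the add and assignment operators run.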

Monday, April 5, 2010

Assignment Statements

In BASIC, the “=” character represents both the equality operator and the assignment operator – it all depends on where the “=” is found. This is different from C, which has unique operators for equality (“==”) and assignment (“=”). Consider these examples:

IF A = 5 THEN
A = 5
A = B = 5

The first example contains the equality operator and the second example has the assignment operator. In the third example, does the statement contain two assignment operators (like C) or does it contain an assignment operator and an equality operator? In GW-Basic, SmallBASIC, FreeBASIC and QuickBASIC, it is the latter, assigning A a true or false value depending on whether B is equal to 5 or not.

Multiple assignments can be convenient so they will be supported like in the third example above. But it can also be convenient to be able to use the equality operator (or any relational operator) in expressions beyond conditional expressions (in IF, WHILE, UNTIL, etc. statements). Two forms of multiple assignments will be supported:

Variable1 = Variable2 = Variable3 = Expression
Variable1 , Variable2 , Variable3 = Expression

For the first form, the “=” characters will be interpreted as assignment operators until the expression starts, in other words, once there is another operator, including an expression in parentheses:

Variable1 = Variable2 = (Variable3 = Expression)
Variable1 = Variable2 = Variable3 + Expression1 = Expression2

Here Variable1 and Variable2 will be assigned a true or false value depending on whether Variable3 is equal to Expression in the first example, or whether Variable3 + Expression1 is equal to Expression2 in the second example (the third “=” in both examples is the equality operator).

Sunday, April 4, 2010

Translator – Internal Functions (Release)

During testing of the “wrong number of arguments” error, it was noticed that the error was reported on the closing parentheses. While this is acceptable, it would be better if the error pointed to the actual function. However, the caller always pointed to the token that caused the error. Therefore, in order to report an error at any token, the token pointer argument of the add_token() function was changed to a reference, so that it can be changed to any token and the caller will point to that token instead when the error is reported.

The name2 member is being used for the name to output during testing, so for example, for Mid2_Code and Mid3_Code, name2 is set to “MID2$(” and “MID3$(” so that it can easily be seen which code is in the token. This was needed when outputting internal function tokens in addition to operator tokens. So instead of copying these lines of code, a new access function debug_name() was added to get the appropriate name for debug test output.

I also realized there were some memory leaks in the code - the EOL token was not deleted; when an error occurs, the token of the error was not deleted; and when the token is replaced (see above), the original token was not deleted. Therefore, the code was cleaned up to prevent these memory leaks. Nothing was done when the “BUGs” occur since these should not happen once the code is debugged.

To make the code clearer, “Error_” was added to all the error and “BUG_” was added to all the diagnostic error Translator status enumeration names. By the way, each of these codes is only used in one place so that it is easy to figure out where an error occurs (if one “stack empty” was used instead of five unique codes, then if one occurs, there would be confusion about which place in the code it came from).

The code now checks the number of arguments for internal functions and ibcp_0.1.7-src.zip has been uploaded at Sourceforge IBCP Project along with the binary for the program. Next handling assignment statements and the concept of references...

Translator – Internal Functions (Implementation)

The number of arguments member needs to be added to the TableEntry structure and values to the table entries along with an access function to this member in the Table class. New entries are needed for the MID$, ASC and INSTR functions with the appropriate number of arguments. A new multiple entry flag is needed on the first entry for each of these functions to indicate that more entries exist with different numbers of arguments.

The check for the number of arguments for internal functions will be added to the closing parentheses token processing in the array or function section. If the token popped off of the hold stack is an internal function and the number of operands doesn't match the number of arguments in the table, then if the multiple entry flag is set it needs to search for another entry of the function that matches the number of operands. If found, the token's index will be changed to the new entry's index. Otherwise a “wrong number of arguments” error will occur.

A new table search function is needed that will search for a function with a specific number of arguments. The index of the first entry for the function will be specified assuming that it has already been checked for the number of arguments (this is the index the Parser originally returned). The search will proceed with the entry after the index and continue until the end of the internal functions (when an entry with a NULL name is found). The additional entries must be in the same internal functions section (words with parentheses).
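A minimal sketch of the check and the search, assuming a flat table terminated by an entry with a null name. The TableEntry layout here, the function names and the argument counts given for ASC and INSTR are assumptions made for illustration; only the two and three argument forms of MID$ come from the text.

```cpp
#include <cstring>

// Hypothetical reduced table entry.
struct TableEntry {
    const char *name;
    int nargs;      // number of arguments
    bool multiple;  // set on the first entry when more entries exist
};

// Illustrative internal functions section, ended by a null name.
static const TableEntry table[] = {
    {"MID$(", 2, true},
    {"MID$(", 3, false},
    {"ASC(", 1, true},
    {"ASC(", 2, false},
    {"INSTR(", 2, true},
    {"INSTR(", 3, false},
    {nullptr, 0, false},
};

// Searches the entries after index for the same function name with the
// requested number of arguments; returns the new index or -1.
int search_function(int index, int nargs)
{
    for (int i = index + 1; table[i].name != nullptr; ++i) {
        if (std::strcmp(table[i].name, table[index].name) == 0
            && table[i].nargs == nargs) {
            return i;
        }
    }
    return -1;
}

// Check made at the closing parentheses: returns the index to use for the
// token, or -1 for a "wrong number of arguments" error.
int match_arguments(int index, int noperands)
{
    if (table[index].nargs == noperands) {
        return index;
    }
    return table[index].multiple ? search_function(index, noperands) : -1;
}
```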

Saturday, April 3, 2010

Translator – Internal Functions

The Translator is unable to check the number of operands for arrays or functions without more information – information that will be contained in the Dictionary, so this checking will be left for the Encoder. However, the number of arguments for internal functions is fixed and can be checked within the Translator.

The number of arguments cannot be checked without knowing the number, therefore a new number of arguments member needs to be added to the Table entries. However, some internal functions may have more than one form. Several of these types of functions are currently planned, which include MID$, ASC and INSTR.

This means that there will need to be multiple Table entries, like for the existing Sub_Code and Neg_Code for the minus operator. For example with MID$, there will be Mid2_Code and Mid3_Code. Initially the Parser will set the code to Mid2_Code since that will be the first entry found in the table.

When the MID$ token is about to be added to the output list, the Translator will have the number of operands that needs to be checked. If the number matches what's in the table, nothing further needs to be done. If the number doesn't match, the Translator either needs to return an error or go looking for another Table entry with the correct number of operands. Therefore, there needs to be a flag in the Table that indicates that there are more entries. If this flag is set then the Translator needs to look for it.

Friday, April 2, 2010

Translator – Arrays and Functions (Release)

The SimpleStack class is in the new source file stack.h. A simple stack test program was written to test the SimpleStack class, especially the automatic expanding feature of SimpleStack. There is no VIDE2 project file for this program; it was compiled with the command line “g++ test/test_stack2.cpp -o teststack2.exe” via MSYS.

Several test expressions were added for testing arrays and functions including multiple levels of identifiers with parentheses, defined functions and internal functions. At this point there is no difference between arrays and user functions – both are identifiers with parentheses. The last several expressions are ones that were used to test error conditions (these actually caused problems during testing).

A change was made to how the test output is generated specifically related to internal operator codes. Right now there is only one of these, the Negate code. Originally the name string was set to “Neg” to distinguish between “-” (binary operator subtraction) and “-” (unary operator negation) in the test output. However, the name will eventually need to be “-” for the Recreator. There will be many more of these operators. Therefore, name2 will now be used for the test output and name will be for the actual output. For test output, if name2 is not set (NULL) like it is for all operators, name will be used for the test output, otherwise name2 will be used.

The code now handles arrays and functions and ibcp_0.1.6-src.zip has been uploaded at Sourceforge IBCP Project along with the binary for the program (the test stack program is not included). Next the checking for the correct number of arguments for internal functions will be added to the Translator...

Translator – Arrays and Functions (Implementation)

For the new counter stack in the Translator class, the SimpleStack class was implemented. This class is a very simple stripped down version of the List class implementation that uses a simple allocated array (expanded as needed) instead of using a doubly linked list. The template contains arguments for the initial size of the array and the size the array is increased when it gets filled (both arguments default to 10). A key feature of this class is that the top() function can be used to manipulate the item on top of the stack (in this case a comma counter) directly.
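A class along these lines might look like the sketch below. It is not the contents of stack.h, just an illustration of the described features; the member names are assumptions, and pop() and top() assume the stack is not empty.

```cpp
// Sketch of an expanding array stack with the two template size arguments.
template <typename T, int initial_size = 10, int increase_size = 10>
class SimpleStack {
public:
    SimpleStack()
        : m_items(new T[initial_size]), m_capacity(initial_size), m_count(0) {}
    ~SimpleStack() { delete[] m_items; }
    SimpleStack(const SimpleStack &) = delete;
    SimpleStack &operator=(const SimpleStack &) = delete;

    void push(const T &item)
    {
        if (m_count == m_capacity) {
            // Array is full: allocate a larger one and copy the items over.
            T *larger = new T[m_capacity + increase_size];
            for (int i = 0; i < m_count; ++i) {
                larger[i] = m_items[i];
            }
            delete[] m_items;
            m_items = larger;
            m_capacity += increase_size;
        }
        m_items[m_count++] = item;
    }
    T pop() { return m_items[--m_count]; }
    // Returns a reference so the top item can be modified in place.
    T &top() { return m_items[m_count - 1]; }
    bool empty() const { return m_count == 0; }

private:
    T *m_items;
    int m_capacity;
    int m_count;
};
```

Returning a reference from top() is what allows a comma counter to be incremented with something like ++stack.top() without popping and pushing it again.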

When an array or function token is encountered (any token that has a parentheses), it is pushed onto the hold stack. The Translator state is left set to Operand since the next token must be an operand (or optional unary operator). The pending parentheses check needs to be made before the token that has a parentheses is handled.

For comma handling, a 0 is pushed onto the counter stack for an open parentheses token. If a comma token is encountered and the counter stack is either empty or the top of the counter stack is 0, then an “unexpected comma” error occurs. For a token that has a parentheses, a 1 is pushed onto the counter stack. For each comma token encountered, the counter on top of the stack is incremented.

Now when a closing parentheses token is encountered, if the counter stack is empty, then a “missing opening parentheses” error occurs. The top counter is popped off the stack. If this counter value is zero, then there was an opening parentheses, so an open parentheses token is expected on top of the hold stack. The pending parentheses token pointer only needs to be set for closing parentheses tokens.

When the value popped from the top of the counter stack is not zero, it contains the number of operands for an array or function, which is expected on top of the hold stack (specifically a token that has a parentheses). For now, the operands are just popped from the done stack. The array or function token is popped from the hold stack, added to the output list and pushed on the done stack.
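The counter stack rules from the last few paragraphs can be sketched as four small functions. The function and status names are hypothetical, and a plain vector stands in for the SimpleStack class.

```cpp
#include <vector>

// Assumed status names for this sketch.
enum Status { Good, Error_UnexpectedComma, Error_MissingOpenParen };

// Plain opening parentheses: a zero counter marks "not a function".
void open_paren(std::vector<int> &counters) { counters.push_back(0); }

// Token that has a parentheses (array or function): starts at one operand.
void paren_token(std::vector<int> &counters) { counters.push_back(1); }

// Comma: only valid directly inside an array or function.
Status comma(std::vector<int> &counters)
{
    if (counters.empty() || counters.back() == 0) {
        return Error_UnexpectedComma;
    }
    ++counters.back();
    return Good;
}

// Closing parentheses: pops the counter into noperands, where zero means a
// plain opening parentheses and non-zero is the number of operands.
Status close_paren(std::vector<int> &counters, int &noperands)
{
    if (counters.empty()) {
        return Error_MissingOpenParen;
    }
    noperands = counters.back();
    counters.pop_back();
    return Good;
}
```

For an expression shaped like F(A, (B), C), the counter for F ends at three while the inner plain parentheses pops a zero.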

Other changes needed were the precedence check when emptying the hold stack to work with tokens that don't have table entries (array and function tokens); at the EOL processing, adding a token that has parentheses to the missing closing parentheses check; and emptying the counter stack in the clean up routine.