Wednesday, July 31, 2013

Memory Testing / Minor Memory Leak

Since all of the tests now work with the new translator (excluding the commands not yet implemented), it seemed appropriate to change the memtestn script to run all of the tests.  After changing this script, memory leaks were discovered in translator tests #7 (Errors) and #9 (Semicolon Errors).  The leak was determined to occur with sub-string assignment statements that contained an error.

The memory leak occurred because an RPN item was allocated for the sub-string assignment token, which is not appended immediately to the RPN output list.  The RPN item is left on the done stack; the LET translate routine normally pops it, pushes its token to the LET stack, and then deletes the RPN item.  However, if the next token is not the expected comma or equal token, or a parser error occurred, this does not happen.  The error clean up code assumes that all RPN items on the done stack have been added to the RPN output list, so only the items in the output list are deleted.

This problem was corrected by slightly rearranging the code in the LET translate routine: if there is an error with the comma or equal token, and the top of the done stack contains a sub-string assignment token (that has not been added to the RPN output list), then the done item on top of the done stack is popped and deleted, which deletes the RPN item and its token(s).  With this change, all of the tests with the new translator run with no memory errors.

[commit 229af22a78]

Parser And Unary Operator Errors

While looking at the results of all the translator tests to make sure at least the assignment tests were working (the PRINT, INPUT, REM statements currently report "not yet implemented" errors), it was noticed that many of the parser errors in translator test #14 (Parser Errors) were not being reported correctly.

While working on this issue, another problem was found with unary operators occurring where a binary operator was expected.  The error should include the word "binary" in front of the word "operator" to indicate that a binary operator was expected and the unary operator was not.  This avoids any confusion, since a unary operator is still an operator.

See the commit log for details of the changes, but basically the get expression routine needs to return a parser error and the caller needs to determine the appropriate error.  The caller also needs to check if the token causing the error is a unary operator and use the appropriate error with the additional "binary" word.  The get token routine was modified as to which errors are reported, by also checking the expected data type; specifically, if the current data type is string, then a number parser error should not be returned.

Previously, the only error with the word "binary" was the "expected binary operator or end-of-statement" error, which is not appropriate for a number of cases like when a comma or closing parentheses is expected, not an end-of-statement (for instance, inside parentheses of a parenthetical expression, an internal function or a parentheses token).  Therefore, several new errors were added.

Because of the unary operator issues, a number of new test statements with unary operator errors were added to translator test #14.  Many of these don't pass with the old translator routines.  All of them pass with the new translator routines (excluding those with commands not yet implemented).

[commit 87e4072ba7]

Sunday, July 28, 2013

New Translator – Data Type Checking

Up until now, the data type passed to the new get expression and get operand routines was only used to return the appropriate error when a problem was detected.  The intention was that this data type be used to check the resulting expression before returning to the caller of the get expression routine.  This simplifies the design in that the callers don't have to check if the expression is the correct type - the checking is done in one place.  As part of this, any hidden conversion codes are added (to convert from integer to double or double to integer).

The get operand routine cannot check the data type of the operand obtained against the data type passed.  Consider the valid expression A%+(B$=C$).  After the open parentheses, the expected data type for the second operand of the add operator would be a number.  An error can't be reported against the B$ operand.  This checking will be handled when the operator is processed.  However, when the caller of the get operand routine requests a reference, the data type of the reference operand can be checked.

These routines were modified to check the data type.  The get expression routine required a new level argument, which is incremented for each level of parentheses that recursively calls the get expression routine.  A simple flag could have been used instead of a level value, but an actual level value could be useful for debugging.  The data type can only be checked at the end of the expression at the first level.  For the Any data type, no checking is done, and for the Number data type, the data type can be either Double or Integer (no conversion code is added).  Otherwise, hidden conversion codes are added to the RPN output list as needed or an error is reported.

Callers of these routines no longer need to check the resulting expression or reference operand.  Previously, for internal functions and tokens with parentheses, the find code routine was called for each argument (for internal functions, at each comma) and the process final operand routine was called for the final argument or sub-script (at the closing parentheses).  These routines no longer need to be called (but are still used for processing operators).

All of the translator tests that only contain assignment statements (though three have a single PRINT statement) still pass successfully.  One statement in test #8 contained an array assignment with a string sub-script.  The old translator did not catch this error (the capability was not implemented).  The new translator without this latest data type checking also did not catch it: even though a numeric expression was requested, the data type was not actually checked.  The expected results for test #8 were updated for this (and the old results saved).  For more details of all the changes needed, see the commit log.

[commit b4a908b60d]

Saturday, July 27, 2013

New Translator – Assignment Error Reporting

The remaining issues with translator tests #7 (Errors), #8 (More Errors), and #9 (Semicolon Errors) were related to incorrect errors being reported for a number of the test statements.  The new get operand routine was not taking into account which type of reference was being requested when determining the appropriate error to return.  This was only a problem for the string type, where different errors were needed depending on whether the reference type being requested was Variable ("expected string variable" error) or All ("expected string item for assignment" error, signifying that sub-string assignments are allowed).

There were translator functions for returning the expected expression error and the expected variable error from a given data type.  Instead of adding a third function for the All reference type, the three were combined into the expectedErrStatus() function, which was given a reference argument in addition to the existing data type argument, defaulting to the None reference type.

In the get operand function, the new expectedErrStatus() function with both arguments was used for parser errors, command and operand token types, functions with no parentheses tokens, and functions with parentheses tokens that are not sub-string functions.  The new function was also used in the LET translate routine, using the All reference type, when a reference with the wrong data type is returned.

Finally, for define functions with parentheses, the get operand routine should have reported an "expected equal or comma" error pointing to just the open parentheses when a reference was requested, since define functions with no parentheses tokens are valid in assignments.

All the tests containing only assignment statements (tests #1 to #5, #7 to #11, and #13) now pass with the new translator routines except for tests #5, #11 and #13 that each contain a lone PRINT statement that is reporting a not yet implemented error.

[commit bbe3b01e37]

New Translator – Operator Processing

One of the remaining major issues was how operators were being processed in the new process operator routine.  This routine pops tokens from the hold stack and processes them (processing their final operands and adding them to the RPN output list) if the token on top of the hold stack is of higher or the same precedence as the incoming token.  However, incoming unary operators do not force tokens off of the hold stack regardless of precedence (since they only have one operand, they get pushed right onto the hold stack).

The issue was that the incoming token can be any token type, including operands, commands, functions, and operators that are neither unary nor binary (like comma or colon).  These token types have a low precedence, indicate the end of the expression, and force all unary and binary operators off of the hold stack.  These token types are considered terminating tokens, and it is up to the caller to determine their validity.

The problem was that the token on the hold stack is not necessarily a unary or binary operator, as in the case of an open parentheses, internal function, define function or identifier with parentheses, which are also pushed onto the hold stack.  If the incoming token was also one of these tokens (or another token with the same precedence, like an identifier with no parentheses), it would incorrectly force the token off of the hold stack (causing a malfunction).

This problem was corrected by only forcing unary and binary operators from the hold stack to be processed.  A new isUnaryOrBinaryOperator() table access function was added that supports all token types and only returns true if the token type is an operator and the operator has operands (operators with zero operands like a comma do not count).  The new process operator routine was optimized a bit by setting the incoming token precedence once before entering the precedence check loop, and a pointer to the top token is obtained once at the beginning of the loop.  The end of the routine was also modified, also using the new table access function to determine if the incoming token's first operand should be processed (only unary and binary operators), otherwise the incoming token is a terminator and the done status is returned.

The expected results for translator tests #7 and #11 were updated to fix an incorrect error message (#7 only) and for sub-string assignment translation changes (the old result files were saved).  With this change, there are still some issues with translator tests #7 (Errors), #8 (More Errors), and #9 (Semicolon Errors).  Also any tests with PRINT statements report not yet implemented errors.

An unrelated change was made to the data type enumeration, where the numberof value was removed as a separate value and is now set to the None data type.  The numberof value is the number of real data types, which don't include the None, Number and Any data types.  It is not necessary for numberof to be a separate value, and it was requiring dummy values to be put into the various arrays sub-scripted by the data type.

[commit 016c120804] [commit c6012c5600]

Friday, July 26, 2013

New Translator – Remaining Issues

There are at least two major issues remaining in the new translator routines that are causing the failures in the other LET tests, though test #10 (Expression Errors) passes.  The first issue is how operators are detected, specifically that some tokens are considered operators but are not expression operators (for example, open and closing parentheses, commas, semicolons, colons, the remark operator and end-of-line tokens).
The second issue is that part of the design of the new translator has not yet been realized, namely that when getting an expression, if given a particular data type, it should check that the expression is of that type, or can be converted to that type via a hidden conversion code, else return an error.  Currently the data type is only being used for reporting errors with operands.  Correcting and implementing these issues is a major undertaking, so a number of preliminary changes were made leading up to them.

Previously, all tokens with a command or operator type were considered operators.  This was necessary for the token-centric old translator, because some commands like THEN and ELSE needed to be considered operators since they can come at the end of expressions.  The first change made was to remove the Token::isOperator() access function along with the static Token::s_op[] array used by the access function.  Uses of the access function were replaced with the Token::isType() access function.  Not all uses required checking for both operator and command token types.  [commit ffd1ba6462]

Since the number of old translator expected results files is growing (because of corrections and changes to translations), the regtest script was modified to look for an old expected results file (ending with an 'o') and compare to that file if found instead.  An old expected results file for translator test #3 was also added (print functions used in expressions do not report errors through the closing parentheses).  [commit 7aef54c63e]

The fact that the open parentheses operator token was configured as a unary operator in the table was going to cause issues with the new translator routines.  Therefore, it is now not configured as a unary operator, and the old translator routine was modified to look for an open parentheses token before checking if the operator was unary (the check was simply moved from after to before).  [commit e75adeddcb]

The segmentation fault on expression test #3 was becoming a nuisance, and was only caused by the addition of more error tests for the new translator.  Therefore, the problems with the old translator routines were corrected, including considering the initial state an operand state, allowing for an empty command stack in the case of expression mode, and assuming the Any type at the beginning of an expression before any operands or unary operators are received.  The memtest script was also updated to allow for comparing old translator expected results files.  [commit f0f6cb57f7]

Wednesday, July 24, 2013

New Translator – Array Assignments

The next problem occurred in an incomplete array assignment statement (a single identifier with parentheses at the beginning of the line).  This type of token at the beginning of the line implies that the identifier is an array as opposed to a function, since these (function names with parentheses) can't be at the beginning of the line (can't be assigned).  An array at the beginning of the line also implies that the operands are subscripts and must be numeric.

The first change to correct this issue was made in the get operand routine.  The reference flag of the token was previously set after calling the get parentheses token routine.  This was moved to before the call so that the get parentheses token routine, which gets the operands (subscripts in this case), knows when the token is an array being assigned (as opposed to not knowing whether it is an array or a function).

The second change was made in the get parentheses token routine.  Previously the Any data type was passed to the get expression routine, since the translator is not aware of whether the identifier with parentheses is an array or a function (and if a function, it doesn't know the types of the arguments, at least not right now).  However, for an array assignment, it knows that the subscripts must be integers.  So if the reference flag of the token is set, it will pass the Number data type (a double can be converted to an integer).

In making this change, I realized there was another issue.  The get expression routine doesn't do all it should with the passed data type.  It gets passed to the sub-expression of parentheses, but generally it is only being used when an error is detected to report the correct "expected XX expression" error.  This issue will be addressed shortly.

[commit 688816adb9]

New Translator – Additional Issues

As it so happens, translator test #5 was not the last of the LET tests.  There are also tests #7 (Error Tests), #8 (More Errors), #9 (Semicolon Errors), #10 (Expression Errors), #11 (Temporary Strings), #13 (Negative Constants), and #14 (Parser Errors, which also contains PRINT and INPUT statements).  Unfortunately, all of these tests fail, three with glibc double free or corruption errors and one with a segmentation fault.

The first problem in test #7 was caused when handling the expression inside parentheses.  When an error occurred, the get expression routine just continued on to the next token.  The rest of the parentheses processing was skipped (popping the open parentheses token off of the hold stack, replacing the first and last operands of the done item on top of the done stack, setting the last precedence, and saving the pending parentheses token).  This problem was corrected by returning when get expression returns an error for the sub-expression.

[commit 9d87ab2c03]

Tuesday, July 23, 2013

New Full Done-Stack Class

The final improvement seen while implementing the LET translation was in giving the done stack inside the translator more functionality, including automatic handling of the first and last operand token pointers stored in each done stack item with the RPN output list item pointer.  Control of the open and close parentheses tokens was given to the done item structure since it owns the first and last tokens that may contain parentheses tokens.  The done item structure and done stack class were also put into their own header and source files.  Click Continue... for details of the changes.

[commit 98f6ed9ae9]

Old Translator – INPUT PROMPT Problem

While updating the uses for a more functional done stack class (see next post for details), a bug was discovered in the old translator INPUT command handler when an error is detected with the INPUT PROMPT string expression because the type of the expression is not a string.  A segmentation fault would occur for certain types of expressions, particularly if the first or last operands contained parentheses or were not set, because the first and last tokens were not being handled correctly.  It has been the policy not to fix bugs in the old translator routines (not worth the time and effort), but these lines needed to be changed anyway and the changes were simple.

This code previously set the error token to the first operand token pointer through the last operand token of the top done stack item.  It then proceeded to delete any closing parentheses token in the RPN output list item token, which was obviously wrong (only the last operand token would have a closing parentheses).  This code also did not allow the first and last token pointers to contain null values, which caused the segmentation fault.

The code was changed to set the error token to the RPN item's token if the first pointer is null, set it through the last pointer if the last pointer is not null, and delete any closing parentheses token in the last operand token.  Statements were added to translator test #12 (INPUT tests) that cause the issues (these will be useful for testing the new INPUT command translation once implemented).

[commit 1f360f08fb]

Sunday, July 21, 2013

Memory Testing Issue

I spent the day yesterday installing a new SSD (Solid State Drive).  After installing the OS (the same Linux Mint 13 KDE 64-bit) and transferring my previous configuration (home directory), the version of Mint 13's KDE was upgraded from version 4.8.5 to 4.10.5 using the Kubuntu backports repository.  Along with KDE 4.10.5 came a slightly new version of the Qt libraries (from 4.8.1 to 4.8.2) and CMake (2.8.7 to 2.8.9).  While the CMake change had no effect, the new Qt libraries caused the memory tests to fail.

The reason for the failures was that the error suppression file contained specific references to Qt 4.8.1, which obviously didn't match the errors produced when using Qt 4.8.2.  Changing all the "4.8.1" strings to "4.8.2" allowed the memory test to pass.  Instead of having separate suppression files for each version of Qt, the suppression file was changed to a CMake configure input file where CMake fills in the current Qt version.

Testing this change with an installed version of Qt 4.8.4 (Qt 4.8.5 is now the latest version of the Qt 4.8 series) did not work because of two issues.  The first was that this version of Qt was located in a different path, so the suppression input file was modified to also have the Qt directory filled in by CMake.  The second was that Qt 4.8.4 produced a few additional memory errors.  These errors were added to the suppression input file and do not cause any issues when testing with Qt 4.8.2.
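The configure-input approach can be sketched as follows; the file names and the suppression entry are assumptions, not the project's actual files (QT_VERSION and QT_LIBRARY_DIR are real variables set by CMake's FindQt4 module):

```cmake
# CMakeLists.txt (sketch): fill the detected Qt version and directory into
# the suppression template so one file serves every installed Qt.  A line
# in the hypothetical template valgrind.sup.in such as
#   obj:@QT_LIBRARY_DIR@/libQtCore.so.@QT_VERSION@
# becomes a concrete path and version in the generated valgrind.sup.
configure_file(${CMAKE_SOURCE_DIR}/valgrind.sup.in
               ${CMAKE_BINARY_DIR}/valgrind.sup @ONLY)
```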

In addition to these changes, a temporary regtestn script was added, similar to memtestn, that tests the expression tests and the translator tests that are working with the new translator routines.  A temporary batch file regtestn.bat was also added, however, because of the limitations of DOS batch files, it couldn't easily be restricted to only test the first five translator tests.

[commit 3bb45e3ced]

New Translator – Minor Code Improvements

During the implementation of the LET command translation using the new translator routines (which is now complete), some areas of improvement were seen in the code.  These changes were kept separate from the latest sub-string assignment implementation commit.

The first was really a correction in the LET translate routine that will be needed once the ability for multiple statements per line, separated by colons, is implemented.  The issue was when an error is detected at the very beginning of a LET statement.  To detect if the error was at the beginning, the column of the token was checked to see if it was zero.  This was used to determine which error to return.  However, this only works for a LET statement at the very beginning of a line.  To correct this, before entering the get references loop, the column of the first token (the command token if there was one, otherwise the first token) is saved.  This saved column is then used to detect if the token with an error is at the beginning of the statement.

An improvement was made in how a flag is accessed from a table entry.  There were flags() functions (taking either a code or a token pointer) that returned the flags for the table entry (if the code has an entry, otherwise the null flag was returned).  The returned value was then ANDed with the desired flag to see if the result was non-zero (flag set) or zero (flag not set).  These were changed to hasFlag() functions that take a second argument for the desired flag, and return non-zero (flag is set) or zero (flag is not set).

While testing the LET translation, a lot of time was spent chasing down token memory leaks.  The solution was to set the UnUsed sub-code in a token if it was not used.  Care was needed not to set this sub-code if the token was used (for instance, if the token was in the RPN output list or on the hold stack).  There were quite a few of these set statements.  As an alternative solution, this sub-code is now set once when a token is obtained from the parser by the get token routine.  When the token is added to the output list, this sub-code is cleared.  A simple output append routine was implemented to do this for all locations appending to the output list.

[commit a67b759432] [commit 6193ef5550] [commit 887cfc06e1]

Friday, July 19, 2013

New Translator – Sub-String Assignments

Sub-string assignments were implemented as described on the two posts from Sunday.  In the LET translate routine, a flag was added, which is set if any of the assignments are a sub-string assignment.  Before converting the comma or equal token to an assign code, the token on top of the done stack is checked to see if its code's table entry has the sub-string flag.  If it does, the comma or equal token is deleted since it will not be used and the sub-string function is popped from the done stack, converted to the appropriate assign sub-string code and pushed to the local token stack.

After the equal token is received and the tokens on the local token stack are processed, if the stack is not empty after popping the last token (indicating a multiple assignment) and there is a sub-string assignment (the new flag is set), then starting with the first token, each assign token is converted to an AssignKeep code (including a regular AssignStr code, which becomes AssignKeepStr) and appended to the RPN output list.  This continues until the last token is popped, which is appended to the output as a regular assign code by the process final operand routine.

Translator tests #4 (sub-string assignments) and #5 (LET commands) now work with the new translator routines except for a single PRINT command at the end of test #5 (which for now reports a "not yet implemented" bug error).  Because of the change in sub-string assignment translations, the expected results were updated.  As a result, the old translator will fail with these tests.  The old results files for these tests were temporarily saved for reference.  The temporary memory test script was updated to include these tests.  See the commit log for more details of the changes made to implement sub-string assignments.

[commit e56db8f188]

Tuesday, July 16, 2013

Print Functions Correction

A segmentation fault was occurring with the new translator routines when a print function (TAB or SPC) was used incorrectly in an expression.  The problem was in the existing process final operand routine, which contained a check for print-only internal functions.  For these functions, it set flags for the current command on the command stack.  However, the command stack is not used for the new translator routines and was empty causing the crash.  This section of code will be removed once the old translator routines are removed, but to temporarily correct the problem with the new translator, a check was added to make sure the command stack is not empty.

The second problem occurred in the existing find code routine.  The change above allowed print functions with a none data type to be placed on the done stack.  When the data type of the done stack top item was checked against the expected data type, the conversion code table did not have any entries for the none data type.  The missing entries were added, set to the invalid code.

Two additional statements with print functions used incorrectly in expressions were added to translator test #3.  The old translator routines report the error incorrectly; the entire function through the closing parentheses should be reported, not just the function token.  The new translator routines do report the error correctly.

[commit 250b6ffd15]

Sunday, July 14, 2013

Multiple String Assignments – With Sub-Strings

Multiple string assignments will be handled by the AssignListStr code.  At run-time, this code (like the other assign list codes) will pop the value to be assigned from the stack and then begin popping variable references from the stack, assigning the value, and continuing until the stack is empty.  This will not work if any of the references to assign is a sub-string.  With the old design, mixed-string assignments were handled with the AssignListMixStr code.  This was detailed in the post on May 22, 2010, but no details were given on how this would be handled at run-time.

For the new design, if a multiple string assignment contains at least one sub-string, then there will be a specific assign code for each assignment instead of a single assign list code.  The specific assign codes will keep the value being assigned on the stack for the next assign code.  Only the last code will be a regular assign code.  Consider this mixed string assignment and its translation:
A$, LEFT$(B$,5), RIGHT$(C$,2) = D$
A$ B$ 5 C$ 2 D$ AssignKeepRight AssignKeepLeft AssignStr
The assign keep codes will pop the value to be assigned from the stack, pop the reference to assign, assign the value to the reference and push the value back to the stack for the next assign code.  The final regular assign code will not push the value to be assigned back to the stack leaving the stack empty.

There will be five assign keep codes: AssignKeepStr, AssignKeepLeft, AssignKeepMid2, AssignKeepMid3 and AssignKeepRight.  In the table, these codes will be the second associated code for the AssignStr, Left, Mid2, Mid3 and Right code entries.

Sub-Strings – New Design

Previously with the original String class, as an optimization, sub-string functions were handled differently than the other string functions.  Instead of returning a temporary string, they simply adjusted what part of the string they referred to, and would therefore work for either temporary strings (results of other string functions or operators) or reference strings (from variables).  Sub-string assignments would be handled with an assign sub-string code that would work with the result of a sub-string of a variable reference.  This was detailed by the series of posts in May, 2010.

With the change to the QString class, this optimization will not work, and is not necessary.  The sub-string functions (LEFT$, MID$, and RIGHT$) will work like the other string functions and operators where they will return a temporary string.  However, sub-string assignments will need to be handled differently.  There will need to be specific new codes for handling sub-string assignments, to be named AssignLeft, AssignMid2, AssignMid3, and AssignRight.

Consider the following sub-string assign along with the old translation and the proposed new translation:
LEFT$(A$,5)=B$
Old: A$<ref> 5 LEFT$(<ref> B$ AssignSub$
New: A$<ref> 5 B$ AssignLeft
With the old translation, the sub-string reference would be on the stack (along with the value to assign) for the generic AssignSub$ to process.  The new translation is simpler where the new AssignLeft code will expect a regular variable reference, the length argument of the LEFT$ function, and the value to assign.  Internally all the sub-string assignment codes will use the QString::replace() function.

New Translator – Testing and a Correction

Now that all four expression tests along with the first three translator tests are working with the new translator routines, it is becoming somewhat time consuming to test each individually including checking for memory errors.

Therefore, temporarily a new memory test script (memtestn) was added that is basically identical to the current memory test script (memtest) except that the new translator is used and only the first three translator tests are run.  As more of the new translator is implemented and more tests are working, the script will be updated.  This change was put into its own commit, so that it can be reverted once the new translator implementation is complete and the old translator routines are removed.

While using the new memory test script, a problem was discovered with translator test #3 on one of the error tests, which was causing a segmentation fault, but only when compiled for Release.  The segmentation fault did not occur when compiled for Debug (as used for development), which made finding the problem difficult.

The debugging method used was the insertion of qDebug() calls until the location of the crash was found.  The problem was in the new outputLastToken() access function added to the Translator class so that the command translate routines can access the last token added to the RPN output list.  This function did not actually have a return statement.  When compiled for Debug, the correct pointer happened to be returned, but when compiled for Release, this was optimized out and a null was returned.

[commit 06bd286162] [commit 8c872f68c6]

Saturday, July 13, 2013

LET Command – Single/Multiple Assignments

The design of each command will be reconsidered with the new translator design.  Several designs for the LET command were considered, but in the end, the current design seems to be the most efficient at run time, with the minor exception of multiple sub-string assignments.  Excluding sub-string assignments, LET statements are translated as follows:
A = 5.0              A<ref> 5.0 Assign
A,B,C = 5.0          A<ref> B<ref> C<ref> 5.0 AssignList
For multiple assignments, all variables being assigned must be the same data type.  The data type of the value being assigned must match the variables being assigned; however, for numeric types, an appropriate hidden conversion code will be added as needed.  If the optional LET keyword was specified, the hidden LET sub-code is set in the final assignment token.

Click Continue... for details of the implementation of the LET translation.  See the commit log for other minor changes made.  Translator tests #1 through #3 (various assignment tests) now pass with the new translator routines.

[commit f965e0f649]

Saturday, July 6, 2013

New Translator – LET Translation (Begin)

Before beginning the implementation of the LET command translation routine, some thought was given to how the project should be organized.  The old token centric translator design had the translation of commands embedded throughout the translator, specifically in the token handling functions.  With the new translator design being command centric, the various routines (translate, recreate and execute) for each command can be organized into their own files.

A decision was also made not to clutter up the main project directory with all the various command source files, so these files will be put into a sub-directory.  The name "basic" was chosen for this sub-directory since all the source files will be related to the BASIC language.  This sub-directory will also contain the execute routines for all of the operators and internal functions of the BASIC language.  The various command function prototypes (and any other command related definitions needed) will be put into the commands.h header file in this sub-directory.

The let.cpp source file was created in the basic sub-directory to hold LET command routines.  An initial translate routine was implemented for the LET command.  For now, this routine just checks if the LET keyword was specified and returns two different BUG Debug statuses to distinguish between the two forms.  Finally, a function prototype for this routine was added to the commands.h header file.

Some changes were also needed to the CMake build configuration file, starting with adding the let.cpp source file to the list of source files.  So that the various files in the basic sub-directory can access the various header files, the main project source directory was added to the list of include directories.  It turns out that it is not necessary to make the executable dependent on the list of project header files, as CMake automatically figures out all the dependent header files for each source file, so this list was removed.

The initial LET function was tested with translator test #5 (LET command tests) to make sure the temporary BUG Debug errors were reported correctly.

[commit 3330b28c9e]

Friday, July 5, 2013

New Translator – Command Translation

To start the translation of a command, a new get command routine was implemented.  It starts by getting a token, taking into account that an assignment statement may not have the LET keyword.  If the first token has an error, the "expected command" error is returned since it does not matter what type of parser error was detected.

If the token obtained is a command token, the pointer to the translate function for the command is obtained from the table.  Otherwise an assignment statement is assumed and the pointer to the translate function for the LET command is obtained.  The token will be passed to the LET translate function.

The interface of the translate functions contains a reference to the translator (so the command can access the various translator routines, such as the get token and get expression routines), a pointer to the command token (so the command can add it to the output list), and a reference to a token pointer used to return the token that terminated the command or where an error was detected (it will also be used to pass the first token to the LET translate function for an implied assignment statement).  The token status is returned.
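
This interface can be sketched as below.  The names and the TokenStatus values are illustrative, not the project's exact declarations:

```cpp
#include <cassert>

class Translator {};   // stand-in for the real translator class
class Token {};        // stand-in for the real token class

enum class TokenStatus { Good, NotYetImplemented };

// Each command's translate routine has this shape; the table holds a
// pointer to it (null until the command is implemented).
typedef TokenStatus (*TranslateFunction)(
    Translator &translator,   // access to get token / get expression
    Token *commandToken,      // the command's own token, for the output list
    Token *&token);           // in/out: first token for implied LET,
                              // terminating or error token on return

// A stub LET translate routine with the expected shape.
TokenStatus translateLet(Translator &, Token *, Token *&)
{
    return TokenStatus::NotYetImplemented;
}
```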

If the translate function pointer is not set, the token is marked unused and a "not yet implemented" error is returned.  For now, no translate function pointers have been set in the table (none have been implemented).  The translate functions will replace the command handlers.  The token handlers are not needed with the new translator.

The new translator routine was modified to call either the get expression or the new get command routine depending on the expression mode argument.  The expression mode argument does not need to be saved with the new translator routines.  A temporary '-nt' test option was added to access the new translator routines for statements, and similarly the '-n' test option was expanded to support translator test files.  Obviously none of the translator tests succeed with the new translator routines.

[commit de01f48ebb]

New Translator – Parentheses Expressions (Tagged)

The implementation of expressions for all tokens with parentheses including open and closing parentheses is now complete.  The new translator routines now fully support expressions and version v0.4.1 has been tagged.  Note that expression test #1 currently still fails with the regression and memory test scripts because of problems with the expression mode in the old translator routines.  The new translator routines run all expression tests successfully.  Implementation of command translation can now commence in the new translator.

[commit ac6658f16f]

New Translator – Arrays/User Functions

There are four token types used for arrays and functions: identifiers with and without parentheses (which could be a variable, an array or a user function, to be determined by the encoder), and define functions with and without parentheses.  A define function without parentheses simply gets added to the output list and pushed to the done stack, like constants, identifiers without parentheses, and internal functions without parentheses.  To support the other two types (identifiers and define functions with parentheses), the get operand routine was updated to call the newly implemented get parentheses token routine.

The new get parentheses token routine starts by pushing the parentheses token to the hold stack, which, being of low precedence, will create a border as the expressions of the arguments are processed.  A counter for the number of operands is initialized and a loop is entered for each operand, starting with a call to the get expression routine.  If the parentheses token is an identifier with parentheses, it could be a user function.  Arguments of user functions are passed by reference, so any operand that could be a variable or array element has its reference flag set.

The terminating token is then checked.  For a comma, the token is deleted and the operand is counted.  For a closing parentheses, the existing process final operand function is called, which upon success attaches all the operands, appends the token with parentheses to the RPN output list, and pushes the token to the done stack.  The token with parentheses is then dropped from the hold stack.  For other terminating tokens, the appropriate error is returned.

One expression in test #3 had a different result from the old translator routines due to two issues.  First, an array in the expression incorrectly had its reference flag set by the old routines (the new routines correctly did not).  Second, the argument of a define function, along with the define function itself, incorrectly had their reference flags set.  The old routines assumed that define function arguments would be passed by reference (and so set their reference flags).  This is no longer the case (see last post).  These minor issues were corrected in the old routines and the results for expression test #3 were updated.

[commit 847c5d078e]

New Translator – User Functions

User functions can take two forms, a full user function (using the FUNCTION syntax) or a simple define function (using the DEF FN syntax).  A full user function will pass arguments by reference.  This only applies to variables and array elements.  Results of expressions will be passed by value.  To keep things consistent internally, a temporary value will be allocated and a pointer to it will be passed, so all arguments will be a reference.  To force a single variable or array element to be passed by value, it can be surrounded by a set of parentheses.

Define functions will have two forms: a single line form that looks like an assignment, and a multiple line form (which will end with an END DEF statement and have one or more assignments for the return value).  Unlike full user functions, arguments to define functions will only be passed by value.  This makes sense for the single line form since the arguments can't be assigned inside the function, and to keep the internal code consistent between the two forms, the multiple line form will also have arguments passed by value.  Arguments will be local variables, so any assignments will not affect the original variables.

This is the same method of argument passing used by QBASIC, and seems to be a reasonable design choice, so will also be used for this project.  Subroutines (the SUB syntax) will use the same pass by reference scheme as full user functions.  The passing of entire arrays will be dealt with later.  QBASIC allows this by listing the array name followed by an opening and closing parentheses with no subscripts.  This same syntax may be used for this project.

To handle function calls in the translator, the reference flag of an operand is set if it is an identifier with or without parentheses.  These tokens could end up being function calls, but this will be handled by the encoder.  Identifiers with parentheses could also end up being an array, where its arguments are integer subscripts.  Again this will be handled by the encoder, which will add any needed internal convert to integer codes (for double subscripts) or report errors for string subscripts.  The reference flag in any subscripts will be ignored.  Regardless of whether the identifier is an array or function, the translator will attach a pointer to each argument for the encoder.

Thursday, July 4, 2013

New Translator – Internal Functions

To support internal functions, both with arguments (parentheses) and without (no parentheses), the get operand function was updated.  Internal functions without arguments are treated the same as constants and identifiers with no parentheses where the tokens are simply added to the RPN output list and pushed to the done stack.

For internal functions with parentheses, a new get internal function routine was implemented and called.  This function starts by pushing the internal function token to the hold stack, which, being of low precedence, will create a border as the expressions of the arguments are processed.  After getting the number of arguments, it loops for each argument by first calling the get expression routine and then checking the terminating token for a comma or closing parentheses.

For a comma, if at the last argument, then an error is returned if the internal function code does not have multiple entries (for example, the MID$ function with two or three arguments).  Otherwise, the code is changed to the code with an additional argument.  The comma token is deleted and the existing find code routine is called to process and check the argument.

For a closing parentheses, if not at the last argument, then an error is returned.  Otherwise the existing process final operand function is called to process the final argument, which appends the internal function token to the RPN output list upon success.  The internal function token is then dropped from the hold stack.

If the terminating token is neither a comma nor a closing parentheses, then the appropriate error is returned depending which argument it is at taking into account whether the internal function has multiple entries.
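
The "multiple entries" idea for functions like MID$ can be illustrated as follows.  The codes and table layout here are invented for the sketch, not the project's actual table:

```cpp
#include <map>

// Hypothetical table entries for a function with a variable argument
// count: when a comma follows what the current code treats as the last
// argument, the token's code is switched to the entry taking one more
// argument; if no such entry exists, an error is reported instead.
enum class FuncCode { Mid2, Mid3 };

struct TableEntry {
    int nArgs;            // number of arguments for this code
    bool hasMoreEntry;    // is there a code with one more argument?
    FuncCode moreCode;    // that code, if so
};

const std::map<FuncCode, TableEntry> table = {
    { FuncCode::Mid2, { 2, true,  FuncCode::Mid3 } },
    { FuncCode::Mid3, { 3, false, FuncCode::Mid3 } },
};

// Returns false when a comma appears but no larger entry is available
// (the caller would report the appropriate error).
bool acceptAnotherArgument(FuncCode &code)
{
    const TableEntry &entry = table.at(code);
    if (!entry.hasMoreEntry)
        return false;
    code = entry.moreCode;   // switch to the larger-argument-count entry
    return true;
}
```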

Expression test #3 now passes with the new translator routines except for the three lines that contain identifiers with parentheses (an array or a user function), which have not yet been implemented in the new translator.  Expression test #4 also now passes successfully.  Since none of the expression tests contained a function with no arguments, a new expression was added to test #4 containing the RND function.

[commit 81d3137531]

New Translator – Checking Token Codes

A minor coding issue was discovered in code that needs to check the code in a token.  This is more an issue with the new translator routines because they will be checking tokens that could be of any token type.  The old translator routines already knew the token type before checking the code.

Since it is desirable to make the code as easy to implement as possible, the Token class isCode() function was modified to make sure the token is the correct type before checking the code.  The correct type is any type that has a table entry.  If the calling code already knows the token is a type that has a valid code, then it can use the Token class code() access function and compare to the code directly.
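
The idea can be sketched as below; the types and member names are illustrative, not the project's actual Token class:

```cpp
// Hypothetical sketch of isCode(): verify that the token's type has a
// table entry before comparing codes, so callers never need to check the
// type first.
enum class Code { None, Plus, Minus };
enum class Type { Constant, Operator };   // only Operator has codes here

class Token {
public:
    explicit Token(Type type, Code code = Code::None)
        : m_type(type), m_code(code) {}
    bool hasTableEntry() const { return m_type == Type::Operator; }
    bool isCode(Code code) const { return hasTableEntry() && m_code == code; }
    Code code() const { return m_code; }  // for callers that know the type
private:
    Type m_type;
    Code m_code;
};
```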

[commit 9bbd1e319f]

Wednesday, July 3, 2013

New Translator – Parentheses

To support simple parenthetical expressions, the get expression routine was modified to look for an open parentheses token just before checking for a unary operator.  The open parentheses token is pushed to the hold stack so that when the expression is processed by recursively calling get expression, only operators from inside the expression will be popped from the hold stack (open parentheses has a very low precedence).  After processing the closing parentheses, get expression continues on by getting and processing a binary operator or end-of-expression token.

The recursive get expression will terminate upon a closing parentheses token.  In the table, the end expression flag had to be added to the closing parentheses code to cause this termination.  Upon return, the terminating token is checked for a closing parentheses, otherwise an "expected operator or closing parentheses" error is returned.  The open parentheses token is popped from the hold stack and the open and closing parentheses tokens are assigned to the first and last tokens of the item on top of the done stack (used for error reporting; details of this design start with posts on January 16, 2011).

The closing parentheses is saved as a pending parentheses token for later checking.  This token is marked as being used as a last operand and as the pending parentheses (to prevent it from being prematurely deleted).  The precedence of the last token added to the output list (which is on top of the done stack) is saved for when the pending parentheses token is later checked.  If this token is an operator, the precedence is obtained from the table; otherwise it is set to the highest precedence (parentheses around non-operators are never necessary).

A new check pending parentheses routine was added, starting from the existing do pending parentheses routine (renamed more appropriately instead of adding a '2', but it will still eventually replace the old routine).  These routines check whether the parentheses entered were unnecessary, in which case a flag is set so that the parentheses will be reproduced when the statement is recreated.  Necessary parentheses are implied by the translated RPN format.  Parentheses necessity is determined by the precedence of the operators (see posts starting on March 21, 2010 for details).

The new check pending parentheses routine is called from two locations in the new process operator routine: once when an operator is popped from the hold stack before its final operand is processed, and once for a new operator before it is checked for end-of-expression or its first operand is processed.  The routine determines that parentheses are not necessary when the precedence of the last token added is greater than that of the current operator, or of equal precedence if the operator was just popped from the hold stack.  There is a new popped argument to determine which case applies (the old routine used the current translator state).
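
The necessity test itself reduces to a precedence comparison; a sketch with invented names:

```cpp
// Hypothetical sketch of the parentheses-necessity test: parentheses are
// unnecessary when the last token added inside them binds at least as
// tightly as the operator outside them.
bool parenthesesUnnecessary(int lastPrecedence, int operatorPrecedence,
                            bool popped)
{
    // When the operator was just popped from the hold stack, equal
    // precedence also makes the parentheses unnecessary.
    return popped ? lastPrecedence >= operatorPrecedence
                  : lastPrecedence > operatorPrecedence;
}
```

For example, in (a * b) + c the '*' inside binds more tightly than the '+' outside, so the parentheses are unnecessary; in (a + b) * c they are necessary.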

Expression test #2 (parentheses tests) now passes with the new translator routines, but one change was needed for the unexpected closing parentheses test, which reported the "expected operator or end-of-expression" error (internally named NoOpenParen).  When the wording of this error was changed, it should have been removed and the "expected operator or end-of-statement" error used instead.  This error needs to be replaced by the caller as appropriate for the command in which it is detected (see post from Tuesday).

[commit d3d57f2235]

Tuesday, July 2, 2013

New Translator – Unary Operator Issue

An issue was discovered where, if the token after an operand was a unary operator (this token should be a binary operator or a token terminating the expression), the process operator routine incorrectly processed the unary operator as if it were a binary operator, causing the code to malfunction.

A check was added after getting this token where if it is a unary operator, the "expected binary operator or end-of-statement" error is returned.  This error, like the "expected operator or end-of-statement" error, will also need to be changed to an appropriate error by the caller.  A test for this was added to expression test #1.

[commit 261c9647df]

New Translator – Parser Errors

There are currently two types of parser errors, an unrecognizable character and an incorrect number constant (of which there are five different ones).  The number constant errors may have an alternate column (for example, when there is an error in exponent of a floating point number, the alternate column points to the error and the column points to the beginning of the number) or a length more than one (for example, two consecutive decimal points at the beginning).

When the translator is expecting an operand, the number constant errors should be reported as is (for example, pointing to the bad exponent or to both decimal points).  However, when the translator is expecting some other token (like an operator), a different error appropriate for the situation (for example, an "expecting an operator or end-of-statement" error) should be reported, and it should point to only the first character of the bad token.

The get token routine was modified to take a desired data type argument instead of an operand flag.  A none data type indicates that the desired token is not an operand.  If the token obtained has an error, it is marked as unused (so that it gets deleted).  If not getting an operand and the data type of the token is double (indicating a number constant error), the token length is set to one so that the error will point to only the first character of the token.  For the error to return: if getting an operand and the data type of the token is not double (indicating a generic parser error), the error is set to the appropriate "expecting XX expression" error for the desired data type; otherwise a generic parser error is returned (so the caller will use the number constant error message in the token).
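
These branches can be sketched as follows.  All names here are invented for illustration; in this sketch a double data type on an error token marks a number constant error, and any other data type marks an unrecognizable-character error:

```cpp
// Hypothetical sketch of the get token error handling described above.
enum class DataType { None, Double, Integer, String };
enum class Status { Good, ExpectedNumExpr, ExpectedStrExpr, Parser };

struct Token {
    DataType dataType;   // Double marks a number constant error
    int length;          // length of the bad token in the input
};

Status handleBadToken(Token &token, DataType desired)
{
    if (desired == DataType::None) {        // an operand was not wanted
        if (token.dataType == DataType::Double)
            token.length = 1;               // point at first character only
        return Status::Parser;              // caller substitutes its error
    }
    if (token.dataType != DataType::Double) // not a number constant error
        return desired == DataType::String ? Status::ExpectedStrExpr
                                           : Status::ExpectedNumExpr;
    return Status::Parser;  // caller uses the token's number constant message
}
```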

The loop in the get expression routine was modified to eliminate duplicate code before the loop and at the end of the loop.  At the end of the loop, the token pointer is set to null to force getting a new token at the beginning of the loop (I wish there were a way to prevent having this extra check, but there isn't without resorting to a goto statement and a label).  Each of the return statements were changed to break statements with the final return at the end outside of the loop.

Also in this routine, the statement for getting the operand and next token was broken into two so that the error can be changed to an "expecting operator or end-of-statement" error when get token returns an error (any parser error).  This error will need to be changed to the appropriate error by the caller.  For example, in an IF statement when it calls to get its expression, if this error occurs, the error needs to be changed to an "expected operator or THEN" error.

In the new translate routine, when setting the error message in the RPN output list, the error string is obtained from the token for a parser error instead of from the error status.  This also needed to be done before an unused token is deleted (tokens with parser errors are marked as unused).

Finally, the set operand state function of the parser was removed.  It was only ever called immediately before the token routine, so an operand state argument was added to that function instead.  Several parser error tests were added to expression test #1.  It was also discovered that the autoenums.h include file was not always being regenerated when the Token class source file was modified, which turned out to be caused by a wrong dependency listed in the CMake build file.

[commit a194cdbfb6] [commit 348ec2230b]