Interactive BASIC Compiler Project

Friday, July 26, 2013

New Translator – Remaining Issues

There are at least two major issues remaining in the new translator routines that are causing the failures in the other LET tests, though test #10 (Expression Errors) passes. The first issue is how operators are detected: some tokens are considered operators but are not expression operators (for example, open and closing parentheses, commas, semicolons, colons, the remark operator, and end-of-line tokens).
The second issue is that part of the design of the new translator has not yet been realized: when getting an expression for a particular data type, the translator should check that the expression is of that type, or can be converted to that type with a hidden conversion code, and otherwise return an error. Currently the data type is only used for reporting errors with operands. Correcting these issues has been a major undertaking, so several preliminary changes were made first.
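The intended check could look something like the following minimal C++ sketch. The enumerations and the CvtInt/CvtDbl conversion codes are hypothetical stand-ins, not the project's actual table. Given the type of an expression and the type requested by the caller, it decides whether the expression is accepted as-is, needs a hidden conversion code, or is an error (the None type of print functions is never acceptable in an expression).

```cpp
// Simplified data types; Number and Any are only used as *expected* types.
enum class DataType { Double, Integer, String, None, Number, Any };

// Hypothetical outcome of checking an expression against an expected type.
enum class Conversion { NotNeeded, CvtInt, CvtDbl, Invalid };

// Decide what must follow an expression of type 'actual' so that it
// satisfies the 'expected' type.
Conversion checkType(DataType actual, DataType expected)
{
    if (actual == DataType::None) {
        return Conversion::Invalid;  // e.g. TAB or SPC used in an expression
    }
    switch (expected) {
    case DataType::Any:
        return Conversion::NotNeeded;
    case DataType::Number:  // any numeric type is acceptable
        return actual == DataType::String ? Conversion::Invalid
                                          : Conversion::NotNeeded;
    case DataType::Double:
        if (actual == DataType::Double) return Conversion::NotNeeded;
        if (actual == DataType::Integer) return Conversion::CvtDbl;
        return Conversion::Invalid;
    case DataType::Integer:
        if (actual == DataType::Integer) return Conversion::NotNeeded;
        if (actual == DataType::Double) return Conversion::CvtInt;
        return Conversion::Invalid;
    case DataType::String:
        return actual == DataType::String ? Conversion::NotNeeded
                                          : Conversion::Invalid;
    default:
        return Conversion::Invalid;  // None is never a valid expected type
    }
}
```

In this sketch an Integer expression where a Double is expected gets a hidden CvtDbl, while a String where a number is expected is reported as an error.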

Previously, all tokens with a command or operator type were considered operators. This was necessary for the token-centric old translator, because some commands like THEN and ELSE needed to be considered operators since they can come at the end of expressions. The first change made was to remove the Token::isOperator() access function along with the static Token::s_op[] array used by the access function. Uses of the access function were replaced with the Token::isType() access function. Not all uses required checking for both operator and command token types. [commit ffd1ba6462]

Since the number of old translator expected results files is growing (because of corrections and changes to translations), the regtest script was modified to look for an old expected results file (ending with an 'o') and, if one is found, to compare against that file instead. An old expected results file for translator test #3 was also added (with the old translator, print functions used in expressions do not report the error through the closing parentheses). [commit 7aef54c63e]

Because the open parentheses operator token was configured as a unary operator in the table, it was going to cause issues with the new translator routines. Therefore, it is no longer configured as a unary operator, and the old translator routine was modified to check for an open parentheses token before checking whether the operator is unary (the check was simply moved from after to before). [commit e75adeddcb]

The segmentation fault on expression test #3 was becoming a nuisance, and was only caused by the addition of more error tests for the new translator. Therefore, the problems in the old translator routines were corrected, including treating the initial state as an operand state, allowing for an empty command stack in expression mode, and assuming the Any type at the beginning of an expression before any operands or unary operators are received. The memtest script was also updated to allow comparing against old translator expected results files. [commit f0f6cb57f7]

Wednesday, July 24, 2013

New Translator – Array Assignments

The next problem occurred in an incomplete array assignment statement (a single identifier with parentheses at the beginning of the line). This type of token at the beginning of the line implies that the identifier is an array as opposed to a function, since a function name with parentheses cannot be assigned and therefore cannot be at the beginning of the line. An array at the beginning of the line also implies that the operands are subscripts and must be numeric.

The first change to correct this issue was made in the get operand routine. The reference flag of the token was being set after calling the get parentheses token routine. This was moved to before the call so that the get parentheses token routine, which gets the operands (subscripts in this case), knows when the token is an array being assigned (as opposed to not knowing whether it is an array or a function).

The second change was made in the get parentheses token routine. Previously the Any data type was passed to the get expression routine, since the translator does not know whether an identifier with parentheses is an array or a function (and, for a function, does not know the types of the arguments, at least not yet). However, for an array assignment, it knows that the subscripts must be integers. So if the reference flag of the token is set, it now passes the Number data type (a double can be converted to an integer).
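The effect of the two changes can be sketched as follows, assuming a hypothetical Token with only a reference flag and a stub getParenToken() that merely reports which data type it would pass to the get expression routine for its operands; the real routines do far more.

```cpp
// Simplified data types used when getting an expression.
enum class DataType { Double, Integer, String, Number, Any };

// Hypothetical token with only the field needed here.
struct Token {
    bool reference = false;  // set when the token is being assigned
};

// Stand-in for the get parentheses token routine: returns the data type
// that would be passed to the get expression routine for each operand.
DataType getParenToken(const Token &token)
{
    // An identifier with parentheses being assigned must be an array, so
    // its operands are subscripts and must be numeric.  Otherwise it may be
    // a function whose argument types are not known yet.
    return token.reference ? DataType::Number : DataType::Any;
}

// Stand-in for the relevant part of the get operand routine.
DataType getOperand(Token &token, bool beingAssigned)
{
    // The reference flag is set *before* the call so that the operand
    // processing knows the token is an array being assigned.
    token.reference = beingAssigned;
    return getParenToken(token);
}
```

If the flag were set after the call, as before the fix, getParenToken() would always see an unset flag and pass Any.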

In making this change, I realized there was another issue. The get expression routine doesn't do all it should with the passed data type. The data type is passed on to the sub-expression in parentheses, but otherwise it is only used when an error is detected, to report the correct "expected XX expression" error. This issue will be addressed shortly.

[commit 688816adb9]

New Translator – Additional Issues

As it so happens, translator test #5 was not the last of the LET tests. There are also tests #7 (Error Tests), #8 (More Errors), #9 (Semicolon Errors), #10 (Expression Errors), #11 (Temporary Strings), #13 (Negative Constants), and #14 (Parser Errors, which also contains PRINT and INPUT statements). Unfortunately, all of these tests fail, three with glibc "double free or corruption" errors and one with a segmentation fault.

The first problem in test #7 was caused when handling the expression inside parentheses. When an error occurred, the get expression routine just continued on to the next token. The rest of the parentheses processing was skipped (popping the open parentheses token off of the hold stack, replacing the first and last operands of the done item on top of the done stack, setting the last precedence, and saving the pending parentheses token). This problem was corrected by returning when get expression returns an error for the sub-expression.

[commit 9d87ab2c03]

Tuesday, July 23, 2013

New Full Done-Stack Class

The final improvement identified while implementing the LET translation was to give the done stack inside the translator more functionality, including automatic handling of the first and last operand token pointers stored in each done stack item along with the RPN output list item pointer. Control of the open and close parentheses tokens was given to the done item structure, since it owns the first and last tokens that may contain parentheses tokens. The done item structure and done stack class were also moved into their own header and source files.

[commit 98f6ed9ae9]


Old Translator – INPUT PROMPT Problem

While updating the code to use the more functional done stack class (see the New Full Done-Stack Class post for details), a bug was discovered in the old translator INPUT command handler when an error is detected with the INPUT PROMPT string expression because the expression is not a string. A segmentation fault would occur for certain types of expressions, particularly if the first or last operands contained parentheses or were not set, because the first and last tokens were not being handled correctly. It has been the policy not to fix bugs in the old translator routines (not worth the time and effort), but these lines needed to be changed anyway and the changes were simple.

This code previously set the error token to the first operand token pointer through the last operand token of the top done stack item. It then proceeded to delete any closing parentheses token in the RPN output list item token, which was obviously wrong (only the last operand token would have a closing parentheses). This code also did not allow the first and last token pointers to contain null values, which caused the segmentation fault.

The code was changed so that the error token is set to the RPN item's token if the first pointer is null; otherwise it is set to the first operand token, extended through the last operand token if the last pointer is also not null, and any closing parentheses token in the last operand token is deleted. Statements that cause these issues were added to translator test #12 (INPUT tests); these will be useful for testing the new INPUT command translation once it is implemented.
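A minimal sketch of the corrected range logic, assuming a hypothetical Token that stores only a column and length, a hypothetical setThrough() helper that extends a token to end where another ends, and a simplified done item; deleting the closing parentheses token is not modeled here.

```cpp
// Hypothetical token holding only its position within the line.
struct Token {
    int column;
    int length;
    // Extend this token so that it ends where 'last' ends.
    void setThrough(const Token *last)
    {
        length = last->column + last->length - column;
    }
};

// Hypothetical done stack item: the RPN item's token plus optional
// first and last operand token pointers (either may be null).
struct DoneItem {
    Token *rpnToken;
    Token *first;
    Token *last;
};

// Build the token used to report an error for the whole expression.
Token errorToken(const DoneItem &item)
{
    if (item.first == nullptr) {
        return *item.rpnToken;  // no operand range: use the item's token
    }
    Token token = *item.first;
    if (item.last != nullptr) {
        token.setThrough(item.last);  // report through the last operand
    }
    return token;
}
```

The key point is that neither pointer is dereferenced unless it has been checked, which is what the old code failed to do.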

[commit 1f360f08fb]

Sunday, July 21, 2013

Memory Testing Issue

I spent the day yesterday installing a new SSD (Solid State Drive). After installing the OS (the same Linux Mint 13 KDE 64-bit) and transferring my previous configuration (home directory), Mint 13's KDE was upgraded from version 4.8.5 to 4.10.5 using the Kubuntu backports repository. Along with KDE 4.10.5 came slightly newer versions of the Qt libraries (4.8.1 to 4.8.2) and CMake (2.8.7 to 2.8.9). While the CMake change had no effect, the new Qt libraries caused the memory tests to fail.

The failures were caused by the error suppression file containing specific references to Qt 4.8.1, which did not match the errors produced when using Qt 4.8.2. Changing all the "4.8.1" strings to "4.8.2" allowed the memory tests to pass. Instead of having separate suppression files for each version of Qt, the suppression file was changed to a CMake configure input file in which CMake fills in the current Qt version.

Testing this change with an installed version of Qt 4.8.4 (Qt 4.8.5 is now the latest version of the Qt 4.8 series) did not work because of two issues. The first was that this version of Qt was located in a different path, so the suppression input file was modified to also have the Qt directory filled in by CMake. The second was that Qt 4.8.4 produced a few additional memory errors. These errors were added to the suppression input file and do not cause any issues when testing with Qt 4.8.2.

In addition to these changes, a temporary regtestn script was added, similar to memtestn, that runs the expression tests and those translator tests that are working with the new translator routines. A temporary batch file regtestn.bat was also added; however, because of the limitations of DOS batch files, it could not easily be restricted to only the first five translator tests.

[commit 3bb45e3ced]

New Translator – Minor Code Improvements

During the implementation of the LET command translation using the new translator routines (which is now complete), some possible improvements to the code were identified. These changes were kept separate from the latest sub-string assignment implementation commit.

The first was really a correction in the LET translate routine that will be needed once multiple statements per line (separated by colons) are implemented. The issue was detecting an error at the very beginning of a LET statement, which determines which error to return. This was done by checking whether the column of the token was zero, but that only works for a LET statement at the very beginning of a line. To correct this, the column of the first token of the statement (the LET command token if present, otherwise the first token) is saved before entering the get references loop. This saved column is then used to detect whether the token with the error is at the beginning of the statement.
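The difference between the two checks can be sketched like this, assuming a hypothetical Token with just a column field and hypothetical function names; in the project the saved column comes from the LET command token or, for an implied assignment, the first token of the statement.

```cpp
// Hypothetical token with only its column within the line.
struct Token {
    int column;
};

// Old check: only correct when the statement starts at column zero.
bool atStatementStartOld(const Token &errorToken)
{
    return errorToken.column == 0;
}

// New check: compare against the column saved before the get references
// loop, so it also works for a statement that follows a colon.
bool atStatementStartNew(const Token &errorToken, int statementColumn)
{
    return errorToken.column == statementColumn;
}
```

For a second statement beginning at column 9 of a line, only the new check recognizes an error on its first token.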

An improvement was made in how a flag is accessed from a table entry. There were flags() functions (taking either a code or a token pointer) that returned the flags of the table entry (if the code has an entry; otherwise the null flag was returned). The returned value was then ANDed with the desired flag to see whether the result was non-zero (flag set) or zero (flag not set). These were changed to hasFlag() functions that take a second argument for the desired flag and return non-zero (flag set) or zero (flag not set).
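A sketch of the interface change, assuming a hypothetical Table class that stores one flag bit mask per code; the flag names and values are illustrative only, and only the code-argument overloads are shown.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative flag bits (not the project's actual values).
enum TableFlag : std::uint32_t {
    Null_Flag = 0x0,
    SubStr_Flag = 0x1,
    Reference_Flag = 0x2,
};

// Hypothetical table holding the flags of each code's entry.
class Table {
public:
    explicit Table(std::vector<std::uint32_t> flags) : m_flags(std::move(flags)) {}

    // Old style: return all flags; callers had to AND the desired flag.
    std::uint32_t flags(int code) const
    {
        return code >= 0 && code < static_cast<int>(m_flags.size())
            ? m_flags[code] : Null_Flag;
    }

    // New style: the desired flag is passed in; non-zero means it is set.
    std::uint32_t hasFlag(int code, std::uint32_t flag) const
    {
        return flags(code) & flag;
    }

private:
    std::vector<std::uint32_t> m_flags;
};
```

A caller that used to write `table.flags(code) & SubStr_Flag` now writes `table.hasFlag(code, SubStr_Flag)`, which keeps the AND in one place.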

While testing the LET translation, a lot of time was spent chasing down token memory leaks. The solution was to set the UnUsed sub-code in the token if it was not used. Care was needed to not set this sub-code if the token was used (for instance, the token was in the RPN output list or on the hold stack). There were quite a few of these set statements. As an alternative solution, this sub-code is now set once when a token is obtained from the parser by the get token routine. When the token is added to the output list, this sub-code is cleared. A simple output append routine was implemented to do this for all locations appending to the output list.
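A minimal sketch of the set-on-get and clear-on-append approach, assuming a hypothetical Token with a sub-code bit field, a hypothetical UnUsed_SubCode value, and a Translator whose getToken() creates tokens itself in place of the parser.

```cpp
#include <memory>
#include <vector>

// Illustrative sub-code bit (the project's value may differ).
constexpr unsigned UnUsed_SubCode = 0x8000;

struct Token {
    unsigned subCode = 0;
};

class Translator {
public:
    // Every token obtained from the "parser" starts out marked unused.
    Token *getToken()
    {
        m_tokens.push_back(std::make_unique<Token>());  // parser stand-in
        Token *token = m_tokens.back().get();
        token->subCode |= UnUsed_SubCode;
        return token;
    }

    // The single routine used to append to the RPN output list clears it.
    void outputAppend(Token *token)
    {
        token->subCode &= ~UnUsed_SubCode;
        m_output.push_back(token);
    }

    // Tokens still marked unused are the ones that must be deleted.
    int unusedCount() const
    {
        int count = 0;
        for (const auto &token : m_tokens) {
            if (token->subCode & UnUsed_SubCode) {
                ++count;
            }
        }
        return count;
    }

private:
    std::vector<std::unique_ptr<Token>> m_tokens;  // owns all tokens here
    std::vector<Token *> m_output;                 // RPN output list
};
```

Routing every append through one function means no individual call site has to remember to clear (or avoid setting) the sub-code.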

[commit a67b759432] [commit 6193ef5550] [commit 887cfc06e1]

Friday, July 19, 2013

New Translator – Sub-String Assignments

Sub-string assignments were implemented as described in the two posts from Sunday. In the LET translate routine, a flag was added that is set if any of the assignments is a sub-string assignment. Before converting the comma or equal token to an assign code, the token on top of the done stack is checked to see if its code's table entry has the sub-string flag. If it does, the comma or equal token is deleted since it will not be used, and the sub-string function is popped from the done stack, converted to the appropriate assign sub-string code, and pushed onto the local token stack.

After the equal token is received and the tokens on the local token stack are processed, if the stack is not empty after popping the last token (indicating a multiple assignment) and there is a sub-string assignment (the new flag is set), then starting with the first token popped, each assign token is converted to an AssignKeep code (including a regular AssignStr code) and appended to the RPN output list. This continues until the last token is popped, which is appended to the output as a regular assign code by the process final operand routine.
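The conversion at the end of a multiple string assignment can be sketched as below. Codes are represented as strings for readability, and the table lookup that maps each code to its keep code is replaced by a string prefix; both are assumptions of this sketch. Targets are pushed in statement order, so popping produces the output for the last target first.

```cpp
#include <string>
#include <vector>

// 'targets' holds the assign code base names in statement order,
// e.g. {"Str", "Left", "Right"} for A$, LEFT$(B$,5), RIGHT$(C$,2) = D$.
std::vector<std::string> finishAssignments(std::vector<std::string> targets,
                                           bool hasSubStr)
{
    std::vector<std::string> output;
    if (targets.size() > 1 && !hasSubStr) {
        output.push_back("AssignListStr");  // one list code handles all
        return output;
    }
    // Pop the local token stack; every token except the last one popped
    // becomes a keep code so the value stays on the stack.
    while (!targets.empty()) {
        std::string code = targets.back();
        targets.pop_back();
        output.push_back(targets.empty() ? "Assign" + code
                                         : "AssignKeep" + code);
    }
    return output;
}
```

For the targets of A$, LEFT$(B$,5), RIGHT$(C$,2) = D$ this yields AssignKeepRight, AssignKeepLeft, AssignStr.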

Translator tests #4 (sub-string assignments) and #5 (LET commands) now work with the new translator routines except for a single PRINT command at the end of test #5 (which for now reports a "not yet implemented" bug error). Because of the change in sub-string assignment translations, the expected results were updated. As a result, the old translator will fail with these tests. The old results files for these tests were temporarily saved for reference. The temporary memory test script was updated to include these tests. See the commit log for more details of the changes made to implement sub-string assignments.

[commit e56db8f188]

Tuesday, July 16, 2013

Print Functions Correction

A segmentation fault was occurring with the new translator routines when a print function (TAB or SPC) was used incorrectly in an expression. The problem was in the existing process final operand routine, which contained a check for print-only internal functions. For these functions, it set flags for the current command on the command stack. However, the command stack is not used by the new translator routines and was empty, causing the crash. This section of code will be removed once the old translator routines are removed, but to temporarily correct the problem with the new translator, a check was added to make sure the command stack is not empty.

The second problem occurred in the existing find code routine. The change above allowed print functions, which have the None data type, to now be placed on the done stack. When the data type of the done stack top item was checked against the expected data type, the conversion code table did not have any entries for the None data type. The missing entries were added and set to the invalid code.

Two additional statements with print functions used incorrectly in expressions were added to translator test #3. The old translator routines report the error incorrectly: the entire function through the closing parentheses should be reported, not just the function token. The new translator routines report the error correctly.

[commit 250b6ffd15]

Sunday, July 14, 2013

Multiple String Assignments – With Sub-Strings

Multiple string assignments will be handled by the AssignListStr code. At run-time, this code (like the other assign list codes) will pop the value to be assigned from the stack, then pop variable references from the stack one at a time, assigning the value to each, until the stack is empty. This will not work if any of the references to assign is a sub-string. With the old design, mixed string assignments were handled with the AssignListMixStr code. This was detailed in the post on May 22, 2010, but no details were given on how this would be handled at run-time.

For the new design, if a multiple string assignment contains at least one sub-string, then there will be a specific assign code for each assignment instead of a single assign list code. The specific assign codes will keep the value being assigned on the stack for the next assign code. Only the last code will be a regular assign code. Consider this mixed string assignment and its translation (AssignKeepRight processes C$ and 2, AssignKeepLeft processes B$ and 5, and AssignStr processes A$):

A$, LEFT$(B$,5), RIGHT$(C$,2) = D$
A$ B$ 5 C$ 2 D$ AssignKeepRight AssignKeepLeft AssignStr

The assign keep codes will pop the value to be assigned from the stack, pop the reference to assign, assign the value to the reference and push the value back to the stack for the next assign code. The final regular assign code will not push the value to be assigned back to the stack leaving the stack empty.

There will be five assign keep codes: AssignKeepStr, AssignKeepLeft, AssignKeepMid2, AssignKeepMid3 and AssignKeepRight. In the table, these codes will be the second associated code for the AssignStr, Left, Mid2, Mid3 and Right code entries.
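To make the stack behavior concrete, here is a small C++ simulation of the example translation. It assumes a hypothetical run-time stack of variant items (a string value, an integer argument, or a pointer standing in for a string variable reference) and assumes each sub-string assignment replaces the selected characters with the whole value; it is a sketch, not the project's run-time code.

```cpp
#include <algorithm>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stack item: a string value, an integer argument,
// or a reference to a string variable.
using Item = std::variant<std::string, int, std::string *>;
using Stack = std::vector<Item>;

template <typename T>
T pop(Stack &stack)
{
    T item = std::get<T>(stack.back());
    stack.pop_back();
    return item;
}

// AssignStr / AssignKeepStr: pop value and reference, assign, and push
// the value back only for the keep variant.
void execAssignStr(Stack &stack, bool keep)
{
    std::string value = pop<std::string>(stack);
    *pop<std::string *>(stack) = value;
    if (keep) stack.push_back(value);
}

// AssignLeft / AssignKeepLeft: the length argument sits below the value.
void execAssignLeft(Stack &stack, bool keep)
{
    std::string value = pop<std::string>(stack);
    auto length = static_cast<std::string::size_type>(pop<int>(stack));
    std::string *ref = pop<std::string *>(stack);
    ref->replace(0, length, value);  // count is clamped to the string size
    if (keep) stack.push_back(value);
}

// AssignRight / AssignKeepRight.
void execAssignRight(Stack &stack, bool keep)
{
    std::string value = pop<std::string>(stack);
    auto length = static_cast<std::string::size_type>(pop<int>(stack));
    std::string *ref = pop<std::string *>(stack);
    length = std::min(length, ref->size());
    ref->replace(ref->size() - length, length, value);
    if (keep) stack.push_back(value);
}
```

Running the codes in the translated order (keep right, keep left, then the regular AssignStr) leaves the stack empty, as described above.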

Sub-Strings – New Design

Previously, with the original String class, sub-string functions were handled differently than the other string functions as an optimization. Instead of returning a temporary string, they simply adjusted what part of the string they referred to, and would therefore work for either temporary strings (results of other string functions or operators) or reference strings (from variables). Sub-string assignments would be handled with an assign sub-string code that would work with the result of a sub-string of a variable reference. This was detailed in the series of posts in May 2010.

With the change to the QString class, this optimization will not work, and is not necessary. The sub-string functions (LEFT$, MID$, and RIGHT$) will work like the other string functions and operators where they will return a temporary string. However, sub-string assignments will need to be handled differently. There will need to be specific new codes for handling sub-string assignments, to be named AssignLeft, AssignMid2, AssignMid3, and AssignRight.

Consider the following sub-string assign along with the old translation and the proposed new translation:

LEFT$(A$,5)=B$
Old: A$<ref> 5 LEFT$(<ref> B$ AssignSub$
New: A$<ref> 5 B$ AssignLeft

With the old translation, the sub-string reference would be on the stack (along with the value to assign) for the generic AssignSub$ code to process. The new translation is simpler: the new AssignLeft code will expect a regular variable reference, the length argument of the LEFT$ function, and the value to assign. Internally, all the sub-string assignment codes will use the QString::replace() function.
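The following sketch shows how the four sub-string assignments might map onto a replace call. It uses std::string::replace(pos, count, str) as a stand-in for QString::replace(position, n, after), which takes its arguments in the same order. It assumes the whole value replaces the selected characters (so the string length may change) and that MID$ start positions are 1-based as in BASIC; both are assumptions, not confirmed behavior.

```cpp
#include <algorithm>
#include <string>

using Size = std::string::size_type;

// LEFT$(s,n) = value
void assignLeft(std::string &s, Size n, const std::string &value)
{
    s.replace(0, std::min(n, s.size()), value);
}

// MID$(s,start) = value: replaces from start to the end (start >= 1)
void assignMid2(std::string &s, Size start, const std::string &value)
{
    Size pos = std::min(start - 1, s.size());
    s.replace(pos, std::string::npos, value);
}

// MID$(s,start,n) = value (start >= 1)
void assignMid3(std::string &s, Size start, Size n, const std::string &value)
{
    Size pos = std::min(start - 1, s.size());
    s.replace(pos, n, value);  // count is clamped to the remaining length
}

// RIGHT$(s,n) = value
void assignRight(std::string &s, Size n, const std::string &value)
{
    n = std::min(n, s.size());
    s.replace(s.size() - n, n, value);
}
```

Each function only needs the variable reference, the sub-string arguments, and the value, which matches the simpler operand layout of the new translation.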

New Translator – Testing and a Correction

Now that all four expression tests along with the first three translator tests are working with the new translator routines, testing each one individually, including checking for memory errors, is becoming somewhat time-consuming.

Therefore, temporarily a new memory test script (memtestn) was added that is basically identical to the current memory test script (memtest) except that the new translator is used and only the first three translator tests are run. As more of the new translator is implemented and more tests are working, the script will be updated. This change was put into its own commit, so that it can be reverted once the new translator implementation is complete and the old translator routines are removed.

While using the new memory test script, a problem was discovered with one of the error tests in translator test #3, which was causing a segmentation fault, but only when compiled for Release. Because the segmentation fault did not occur when compiled for Debug (as used for development), finding the problem was difficult.

The debugging method used was the insertion of qDebug() calls until the location of the crash was found. The problem occurred in the new outputLastToken() access function added to the Translator class so that the command translate routines can access the last token added to the RPN output list. The problem was that this function was missing its return statement. In C++, flowing off the end of a non-void function is undefined behavior: when compiled for Debug, the correct pointer happened to be returned, but when compiled for Release, the optimized code returned a null pointer.
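A simplified illustration of the bug and its fix, using a hypothetical Translator with only an output vector. With GCC, compiling with -Wall (which enables -Wreturn-type) produces a warning for the buggy form, which would have pointed straight at the problem.

```cpp
#include <vector>

struct Token {
    int column = 0;
};

class Translator {
public:
    void outputAppend(Token *token) { m_output.push_back(token); }

    // Buggy form (shown only as a comment): the expression is evaluated but
    // never returned, so the result is undefined and may differ between
    // Debug and Release builds.
    //
    //     Token *outputLastToken() const { m_output.back(); }

    // Corrected form.
    Token *outputLastToken() const
    {
        return m_output.back();
    }

private:
    std::vector<Token *> m_output;  // RPN output list (simplified)
};
```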

[commit 06bd286162] [commit 8c872f68c6]

Saturday, July 13, 2013

LET Command – Single/Multiple Assignments

The design of each command will be reconsidered with the new translator design. Several designs for the LET command were considered, but in the end, the current design seems to be the most efficient at run-time, with the minor exception of multiple sub-string assignments. Excluding sub-string assignments, LET statements are translated as follows:

A = 5.0
A<ref> 5.0 Assign

A,B,C = 5.0
A<ref> B<ref> C<ref> 5.0 AssignList

For multiple assignments, all variables being assigned must be the same data type. The data type of the value being assigned must match the variables being assigned; however, for numeric types, an appropriate hidden conversion code will be added as needed. If the optional LET keyword was specified, the hidden LET sub-code is set in the final assignment token.

See the commit log for other minor changes made. Translator tests #1 through #3 (various assignment tests) now pass with the new translator routines.

[commit f965e0f649]


Saturday, July 6, 2013

New Translator – LET Translation (Begin)

Before beginning the implementation of the LET command translation routine, some thought was given to how the project should be organized. The old token-centric translator design had the translation of commands embedded throughout the translator, specifically in the token handling functions. With the new translator design being command-centric, the various routines (translate, recreate and execute) for each command can be organized into their own files.

A decision was also made not to clutter up the main project directory with all the various command source files, so these files will be put into a sub-directory. The name "basic" was chosen for this sub-directory since all the source files will be related to the BASIC language. This sub-directory will also contain the execute routines for all of the operators and internal functions of the BASIC language. The various command function prototypes (and any other command-related definitions needed) will be put into the commands.h header file in this sub-directory.

The let.cpp source file was created in the basic sub-directory to hold LET command routines. An initial translate routine was implemented for the LET command. For now, this routine just checks if the LET keyword was specified and returns two different BUG Debug statuses to distinguish between the two forms. Finally, a function prototype for this routine was added to the commands.h header file.

Some changes were also needed to the CMake build configuration file, starting with adding the let.cpp source file to the list of source files. So that the various files in the basic sub-directory can access the project header files, the main project source directory was added to the list of include directories. It turns out that it is not necessary to make the executable dependent on the list of project header files, since CMake automatically determines the header files each source file depends on, so this list was removed.

The initial LET function was tested with translator test #5 (LET command tests) to make sure the temporary BUG Debug errors were reported correctly.

[commit 3330b28c9e]

Friday, July 5, 2013

New Translator – Command Translation

To start the translation of a command, a new get command routine was implemented, which starts by getting a token, taking into account an assignment statement that does not have the LET keyword. If the first token has an error, the "expected command" error is returned, since it does not matter what type of parser error was detected.

If the token obtained is a command token, the pointer to the translate function for the command is obtained from the table. Otherwise an assignment statement is assumed and the pointer to the translate function for the LET command is obtained. The token will be passed to the LET translate function.

The interface of the translate functions contains a reference to the translator (so the command can access the various translator routines like the get token and get expression routines), a pointer to the command token (so the command can add it to the output list), and a reference to a token pointer used to return the token that terminated the command or where an error was detected (which will also be used to pass the first token to the LET translate function for an implied assignment statement). The token status is returned.

If the translate function pointer is not set, the token is marked unused and a "not yet implemented" error is returned. For now, no translate function pointers have been set in the table (none have been implemented). The translate functions will replace the command handlers. The token handlers are not needed with the new translator.
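The interface and dispatch described above can be sketched as follows. All types are simplified stand-ins: the table is a std::map keyed by keyword, an implied LET receives a null command token, and the token pointer is cleared for a command so that its translate function gets the next token itself. These details are assumptions of the sketch, not the project's actual code.

```cpp
#include <map>
#include <string>

// Simplified stand-ins for project types (hypothetical).
enum class TokenStatus { Good, ExpCommand, NotYetImplemented };
struct Token {
    bool isCommand = false;
    std::string name;  // command keyword, e.g. "LET"
    bool hasError = false;
    bool unused = false;
};
class Translator;

// Translator reference, command token pointer, and a reference to a token
// pointer used to pass in/return a token; the token status is returned.
using TranslateFunction = TokenStatus (*)(Translator &translator,
                                          Token *commandToken, Token *&token);

class Translator {
public:
    std::map<std::string, TranslateFunction> table;  // table stand-in

    TokenStatus getCommand(Token *&token)
    {
        if (token->hasError) {
            return TokenStatus::ExpCommand;  // parser error type irrelevant
        }
        Token *commandToken = nullptr;
        TranslateFunction translate = nullptr;
        if (token->isCommand) {
            commandToken = token;
            translate = lookup(token->name);
            token = nullptr;  // assumed: translate function gets next token
        } else {
            translate = lookup("LET");  // implied assignment statement
        }
        if (translate == nullptr) {
            (commandToken != nullptr ? commandToken : token)->unused = true;
            return TokenStatus::NotYetImplemented;
        }
        return translate(*this, commandToken, token);
    }

private:
    TranslateFunction lookup(const std::string &name) const
    {
        auto it = table.find(name);
        return it == table.end() ? nullptr : it->second;
    }
};
```

Because commands without a translate function simply fall through to "not yet implemented", commands can be converted to the new design one at a time.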

The new translator routine was modified to call either the get expression or the new get command routine depending on the expression mode argument. The expression mode argument does not need to be saved with the new translator routines. A temporary '-nt' test option was added to access the new translator routines for statements, and similarly the '-n' test option was expanded to support translator test files. Obviously none of the translator tests succeed with the new translator routines.

[commit de01f48ebb]

New Translator – Parentheses Expressions (Tagged)

The implementation of expressions for all tokens with parentheses including open and closing parentheses is now complete. The new translator routines now fully support expressions and version v0.4.1 has been tagged. Note that expression test #1 currently still fails with the regression and memory test scripts because of problems with the expression mode in the old translator routines. The new translator routines run all expression tests successfully. Implementation of command translation can now commence in the new translator.

[commit ac6658f16f]
