Interactive BASIC Compiler Project

Tuesday, October 29, 2013

Recreator – Design Considerations

Originally, there was going to be another module, the decoder, which would convert internal program code into an RPN (Reverse Polish Notation) token list (like the translator produces). The recreator would then convert the RPN list back into the program text (close to the originally entered code). Very early on (see December 19, 2009) this step was considered simple and unnecessary and so was combined into the recreator.

The program model needs to detect whether a line reported as changed has actually changed. The edit box sometimes reports lines as changed when they have not actually changed. The user could also have simply added spaces to the line (which are not stored) or changed the case of a keyword, neither of which would result in a change to the internal program code. As previously mentioned, comparing the internal code of the current line to that of the new line is problematic.

The new line would first have to be encoded, which will affect the dictionaries. Either this would need to be undone, or the old line removed first to dereference dictionary entries only to have them referenced again when putting in the new line. This is acceptable for simple dictionaries (variables, constants, remarks, etc.), but is much more involved with the blocking commands (IF-END IF, FOR-NEXT, etc.), where the block will probably be kept in a block dictionary.

A better alternative is to convert the current program line into an intermediate RPN token list (decoded). The translated RPN list of the new line can then be compared to the decoded RPN list of the current line. Since the decode operation can be contained in a single routine like the encode operation, the program model will own this decode routine also.

Since the decode routine will be present, it makes sense for the recreator to take a decoded RPN list as input to convert to the program text. This method has another advantage for testing. Currently, the test code translates expressions and statements into RPN lists. These RPN lists can then be passed to the recreator for testing. Therefore, there will be a new test mode that takes these existing tests, translates them to RPN lists, and then recreates them back to text.

Monday, October 28, 2013

Class Definition Consistency

In preparing to create the recreator class, I noticed that the class definitions were not all consistent: some had the private members at the beginning and some had them at the end. Having the private members at the beginning allows access functions, for instance, to use them; at least this was the case with early C++ compilers (or at least was my understanding when learning C++ over two decades ago). But this does not appear to be a requirement with modern compilers.

It appears the Qt developers like to put the private member variables at the end of the class. The public function definitions start at the beginning, followed by the private section, which starts with the private function definitions. So, before embarking on creating the recreator class definition, the non-conforming classes were changed to this style.

[commit 8fc5c92519]

Sunday, October 27, 2013

Encoder/Program – Release

This concludes the integration of the encoder with the program model and dictionaries for the initial BASIC features being implemented. The program model still keeps all of the translation RPN lists, but these are only used to detect when a line changes. Once the recreator is implemented, these lists will no longer need to be kept.

Version 0.5.3 has been released (branch0.5 was merged to the master branch and tagged v0.5.3). Archive files containing the source and binaries (with test files) for both Windows and Linux have been uploaded to SourceForge. For Windows, there is also the ibcp-libs.zip file that contains all the dynamic linked libraries required to run the program if the MinGW and QtSDK packages have not been installed (extract into the ibcp directory). Linux should already have the required libraries installed. This concludes the 0.5 development series.

Implementation of the recreator will now begin with the 0.6 development series. The recreator will convert the internal program code back to a reasonable facsimile of the originally entered program text. This text will then replace the original text in the edit box.

[commit d984a70c6b]

GUI Program View – Code Output

The program view contents were changed from the text of the translated RPN list to the debug text output of the program code, using the debug text routine of the program model. The program model data function is used by the program view widget for getting its contents. Unfortunately, this was not as simple a change as it sounds.

The data function is a constant function (const). Because this function is constant, the debug text function also needed to be a constant function. Making this function constant required several variables in the function to be constant, and the operand text function also needed to be changed to constant. Changing the operand text function to constant required the table operand text function pointers to be constant. Changing these functions required their program model pointer argument to also be changed to constant.
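The way const propagates through this call chain can be illustrated with a minimal sketch. It uses standard C++ only, and every name in it (ProgramModel, OperandTextFunction, variableOperandText, the "Var:" prefix) is hypothetical and chosen for illustration, not taken from the actual source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class ProgramModel;

// Hypothetical table entry type: a pointer to a function that returns the
// text of an operand. It takes a pointer to a const model so that it can be
// called from const member functions.
using OperandTextFunction = std::string (*)(const ProgramModel *model, int operand);

class ProgramModel
{
public:
    explicit ProgramModel(std::vector<std::string> names)
        : m_names(std::move(names)) {}

    // Const access function used by the operand text functions.
    const std::string &variableName(int index) const
    {
        return m_names.at(static_cast<std::size_t>(index));
    }

    // Must be const because it is called from data(), which is const.
    std::string debugText(int operand, OperandTextFunction operandText) const
    {
        assert(operandText != nullptr);
        // 'this' is a 'const ProgramModel *' here, so the function pointer
        // must accept a pointer to a const model.
        return "Var:" + operandText(this, operand);
    }

    // Stand-in for the model's data function, which is const.
    std::string data(int operand, OperandTextFunction operandText) const
    {
        return debugText(operand, operandText);
    }

private:
    std::vector<std::string> m_names;
};

// Operand text function for variables; it only reads from the model.
inline std::string variableOperandText(const ProgramModel *model, int operand)
{
    return model->variableName(operand);
}
```

If the function pointer type took a non-const ProgramModel pointer, the call inside debugText would not compile, which is the chain of changes described above.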

[commit f089c78b59]

Dictionaries – Key Case Sensitivity

The BASIC language is traditionally not case sensitive. The keyword lookup of the BASIC commands and operators is already case insensitive. However, the dictionary lookup was not, so variables entered as VAR, Var, and var would be three different variables. This is incorrect. However, only one form can be stored in the dictionary. The first instance of the variable name seen will be the one stored. Later there will be a facility for renaming variables.

The key map in the dictionary was changed from the QMap class to the QHash class. For use in the dictionary class, these two classes are very similar, with a few differences. The QMap class stores the key values in order and the QHash class stores them in an arbitrary order, and QHash provides faster lookups. Since the keys do not need to be sorted (they can always be sorted from the key list in the dictionary), the faster lookup feature is desirable.

To make the lookup of the key value case insensitive, the key string is first converted to all upper case. This upper case string is used to look up the index for a key and is the string stored as the hash key. Upon the initial add of the key string to the key list, the original string as entered is stored, and this is the string returned for all instances.

However, the remark and string constants dictionaries must be case sensitive. Therefore a case sensitivity option argument was added to the dictionary class add routine with a default value for a case insensitive lookup. The remark and string constant encode routines use the case sensitive option.
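The lookup scheme described above could be sketched roughly as follows. This is a simplified illustration in standard C++, with std::unordered_map standing in for QHash; the class layout, the CaseSensitivity enum and the function names are assumptions, not the project's actual interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical option type; the default gives a case insensitive lookup.
enum class CaseSensitivity { Insensitive, Sensitive };

class Dictionary
{
public:
    // Returns the index of the entry, adding it if it is not present yet.
    std::size_t add(const std::string &text,
                    CaseSensitivity cs = CaseSensitivity::Insensitive)
    {
        // The hash key is the upper case form unless case matters.
        const std::string key =
            cs == CaseSensitivity::Sensitive ? text : upper(text);
        const auto it = m_keyHash.find(key);
        if (it != m_keyHash.end())
            return it->second;          // already present, reuse its index
        m_keyList.push_back(text);      // keep the string as first entered
        m_keyHash.emplace(key, m_keyList.size() - 1);
        return m_keyList.size() - 1;
    }

    // Returns the string as it was first entered.
    const std::string &text(std::size_t index) const
    {
        assert(index < m_keyList.size());
        return m_keyList[index];
    }

private:
    static std::string upper(std::string s)
    {
        std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
            return static_cast<char>(std::toupper(c));
        });
        return s;
    }

    std::unordered_map<std::string, std::size_t> m_keyHash;  // key to index
    std::vector<std::string> m_keyList;                      // index to text
};
```

With this sketch, adding Var, VAR and var returns the same index and the stored text stays Var, while a dictionary used with the case sensitive option keeps Hello and HELLO as separate entries.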

New encoder test #3 was added to test various forms of variables, remarks, string constants and exponential numeric constants. Exponential numeric constants can have either an upper or lower case 'E' for the exponent. Again, the first instance of the constant seen will be stored, so the user can decide which form is desirable.

[commit da56095d1a]

Saturday, October 26, 2013

Program – Enhanced Line Debug Output

The program view will get the same output as produced at the end of the encoder test output, which includes the offset range, and the debug text or the error information (column, length and message) of the line. Currently a program line with an error does not have any code associated with it (an input error). However, for a code error, the program line will have been successfully encoded and stored into the program. An example of a code error is a missing END IF for an IF.

The generation of the line offset range and error output was moved from the tester class to the program model debug text routine so that it can also be used for the program view. This code was also modified to output both the debug text and the error information instead of one or the other. Since input errors have no code, this works as before.

Since the debug text routine is also used by the tester class encode input routine to just obtain the debug text for a line, the debug text routine was given a flag argument for whether to return the full information (offset range, debug text and error information) or just the debug text.

The error information is handled differently by the encode input routine, where the error column and length are used to point to the error. This was modified to get a pointer to the error item for the line instead of the RPN list using the new error item access function added to the program model class, which returns a null pointer if the line does not have an error.

[commit 3fde3da2e0]

Program – Delay Line Encoding

The edit box class has an issue where sometimes unmodified lines are reported as being changed. There is currently a check when replacing a program line where if the line has not changed, no action is taken. Currently the translated RPN list is compared to the stored RPN list for the line. Eventually however, the RPN lists will not be stored and this line change detection will have to be changed. More on this later, but this change will be made once the recreator is implemented.

A newly translated line can't be encoded until it has been determined that the line has changed because the process of encoding adds or updates references in the dictionaries. If the line then hasn't changed, this would need to be undone, which would be unnecessarily complicated. Therefore, the encoding of the line was delayed until after it is determined that the line changed. Since this does not affect new line insertions, the line also needs to be encoded for the insert operation. The line is only encoded in both places if there was no translation error.

[commit b5c9020cc8]

Program – Removing RPN List Dependency

The pointer to the RPN list of a program line from the translator is currently being held in the line information list for the program (along with offset and size of the line and an index to the error list if the line has an error). This pointer will eventually be removed since the RPN list is not needed after a line is encoded and stored in the program. There were two dependencies on the RPN list that needed to be removed.

The program model update error routine used the RPN list from a line information list item to determine if the line has an error. If it did, an error item was created from the RPN list (retrieving the error column, length and message) and stored in the error list. The index of the error item in this list was then stored in the line information item for the line.

Since the RPN list pointer is going to be removed from the line information list, the update error routine was modified to obtain the error information differently. Instead of creating the error item in this routine, the error item is now created in the calling update line routine just after the line is translated and checked for an error. The error item with the error information is passed to the update error routine, which was modified to use it instead of the RPN list.

So that an empty error item can be indicated, a new none error type was added to the error item class. An is empty access function was added to return if the error item does not contain an error. A default constructor was added to create an empty error item. Also for clarity, the translator and encoder error types were renamed to the input and code error types.

Even though currently no encoder errors can occur, the error item constructor was modified from having an RPN list pointer argument to having arguments for the error column, length and message. This will allow setting errors from encoding without an RPN list (the error item class is no longer dependent on the RPN list class).
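A rough sketch of an error item along these lines is shown below. The member names, the Type enum spelling and the choice to pass the type as a constructor argument are assumptions made for illustration; the text only states that the constructor takes the column, length and message.

```cpp
#include <cassert>
#include <string>
#include <utility>

class ErrorItem
{
public:
    // None marks an empty item; Input and Code are the renamed
    // translator and encoder error types.
    enum class Type { None, Input, Code };

    // Default constructor creates an empty error item.
    ErrorItem() : m_type(Type::None), m_column(0), m_length(0) {}

    // No RPN list is needed: the caller passes the error details directly.
    // (Passing the type here is an assumption of this sketch.)
    ErrorItem(Type type, int column, int length, std::string message)
        : m_type(type), m_column(column), m_length(length),
          m_message(std::move(message))
    {
        assert(column >= 0 && length >= 0);
    }

    // Returns whether this item does not contain an error.
    bool isEmpty() const { return m_type == Type::None; }

    Type type() const { return m_type; }
    int column() const { return m_column; }
    int length() const { return m_length; }
    const std::string &message() const { return m_message; }

private:
    Type m_type;
    int m_column;
    int m_length;
    std::string m_message;
};
```

A caller such as the update line routine could then create a default (empty) item when the translation succeeds and pass it along unchanged, letting the receiving routine test isEmpty() instead of inspecting an RPN list.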

[commit 0a10ddae84]

Friday, October 25, 2013

Program – Dictionary Debug Output

The program debug output shows the indexes of dictionary entries, but this is insufficient for showing whether the dictionary entries were removed correctly and placed on the free stack of the dictionary for reuse. Code was added to output the contents of each dictionary.

A debug text routine was added to the dictionary class that takes a header string as an argument. After appending the header string to the output string, it loops through the dictionary entries and appends the index, use count and string of every entry with a non-zero use count. After the entries, the indexes in the free stack are appended. If any free stack item contains a non-zero use count, the use count is appended after the index. Also, if the item has a non-empty string, the string is also appended. The strings of deleted entries should be cleared.

The debug text dictionaries routine was added to the program model class, which calls the debug text routine of each dictionary and appends each to the output string. A call to this routine was added to the tester class run routine after the program model debug text function is called to output the program code. The expected results for encoder test #1 and #2 were updated for the additional dictionary debug output.

[commit 11337cb673]

Program – Dereference Removed Lines

When a line of code is replaced or removed, the use counts of any dictionary entries referenced on the line need to be decremented. If a use count becomes zero, the entry is no longer being used and needs to be deleted from the dictionary (the entry becomes available for use by another item upon the next add).

The dereference routine was added to the program model to scan a line that is about to be replaced or removed. This routine loops through each program word of the line and removes the reference for any code that has an operand, which is determined by whether the code has a remove function. In the update line routine, this routine is called before the line is replaced or removed.

Table entry remove functions were added for the various REM, constant and variable codes. Each remove function calls the remove routine of the appropriate dictionary (just like the encode function calls the add routine of the appropriate dictionary to add a reference).
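The use count and free stack behavior can be sketched as below. This is a simplified model in standard C++, not the project's code: CountedDictionary, ProgramWord and its hasOperand flag are hypothetical, and the flag stands in for "the code has a remove function in the table".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

class CountedDictionary
{
public:
    // Adds a reference to the entry, creating it if needed; a new entry
    // reuses an index from the free stack when one is available.
    std::size_t add(const std::string &text)
    {
        const auto it = m_index.find(text);
        if (it != m_index.end()) {
            ++m_entries[it->second].useCount;
            return it->second;
        }
        std::size_t index;
        if (!m_freeStack.empty()) {
            index = m_freeStack.back();
            m_freeStack.pop_back();
        } else {
            index = m_entries.size();
            m_entries.push_back(Entry{});
        }
        m_entries[index].text = text;
        m_entries[index].useCount = 1;
        m_index.emplace(text, index);
        return index;
    }

    // Removes one reference; the entry is freed when the count reaches zero.
    void remove(std::size_t index)
    {
        assert(index < m_entries.size() && m_entries[index].useCount > 0);
        Entry &entry = m_entries[index];
        if (--entry.useCount == 0) {
            m_index.erase(entry.text);
            entry.text.clear();            // strings of deleted entries are cleared
            m_freeStack.push_back(index);  // index is available for the next add
        }
    }

    int useCount(std::size_t index) const { return m_entries.at(index).useCount; }
    const std::string &text(std::size_t index) const { return m_entries.at(index).text; }

private:
    struct Entry { std::string text; int useCount = 0; };

    std::vector<Entry> m_entries;
    std::vector<std::size_t> m_freeStack;
    std::unordered_map<std::string, std::size_t> m_index;
};

// Hypothetical program word: hasOperand models a code with a remove function.
struct ProgramWord { bool hasOperand; std::size_t operand; };

// Removes the references held by a line that is about to be replaced or removed.
inline void dereference(const std::vector<ProgramWord> &line, CountedDictionary &dictionary)
{
    for (const ProgramWord &word : line) {
        if (word.hasOperand)
            dictionary.remove(word.operand);
    }
}
```

In this sketch, an entry whose count drops to zero is cleared and its index is handed out again by the next add, which is why indexes in the expected test results can shift once dereferencing is in place.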

Encoder test #2 contains replace and remove operations, but previously the use counts of dictionary entries were not being decremented. Now that they are, several dictionary entries are removed from the dictionary because their use counts become zero. This allowed new items to be added in the unused entries, which affected the index of several dictionary entries on some of the program lines; therefore, the expected results were updated.

[commit a310dc458e]

Monday, October 21, 2013

Program – Error Handling Issues

When checking for memory errors, a use of uninitialized memory error was detected with new encoder test #2. The problem occurred in the remove error routine, which was always adjusting the rest of the errors after the current line. It should have only been doing this if the line was not deleted and did not have an error. Because this routine did not return for this condition, the error index loop variable was not initialized causing the memory error.

When the update error routine was modified back to not returning the status of whether the line has an error, the routine was not changed back to its original code correctly. This caused errors not to be added to the errors list and therefore did not show up when running the GUI. The routine was put back to its original code from an earlier commit.

When a line was replaced with an empty line (for example, a line with an error), the replace line routine of the new LineInfoList class was supposed to call the remove line routine and then return. Instead it was calling the base QList class remove routine and not returning. This caused extra code to be removed from the program.

A line with an error was added as a replacement line to encoder test #2 to verify the corrections described above. The offset for a line with an error was added to the test output to verify that error lines are added to the program correctly. The offset is needed for when the line with the error is replaced with good code.

[commit d8ef155e5e]

Sunday, October 20, 2013

Program – Operation Testing

With the program line operations (insert, replace and remove) implemented, some automated mechanism was needed to test them using the command line test mode. The encoder test mode was modified to accept a special syntax at the beginning of each test line to indicate a program operation.

The syntax starts with an optional '+' for inserting a line or '-' for deleting a line. This is followed by a line index number indicating the line that should be inserted or deleted. If there is just a line index number, the line is replaced. The number is followed by optional spaces (though the number ends when there are no more digits). The number must be within the valid range for the lines currently in the program. The number is optional after a '+', in which case the line is appended to the end of the program, the same as if the line does not contain this syntax. After a '-' and its number, there must be no statement.

When using this syntax, the normal "Input:" and "Output:" lines are suppressed. This is the difference between using a lone '+' (output suppressed) and no syntax (output not suppressed) for appending a line. After processing this syntax, the characters are removed from the line, and the appropriate call to the update slot routine is made for the specified operation to translate, encode and perform the program operation on the line. However, if a line does have an error, the outputs are not suppressed to report the error.
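The operation syntax described above could be parsed along the following lines. This is only a sketch under stated assumptions: the names (Operation, ParsedLine, parseTestLine) are hypothetical, a '-' without a number is treated as an ordinary line, range checking against the program is left to the caller, and the rule that outputs are not suppressed for a line with an error is not modeled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

enum class Operation { Append, Insert, Replace, Remove };

struct ParsedLine
{
    Operation operation;    // program operation requested by the prefix
    bool hasIndex;          // whether a line index number was given
    int index;              // line index (valid only if hasIndex)
    bool suppressOutput;    // true when the operation syntax was used
    std::string statement;  // rest of the line with the prefix removed
};

ParsedLine parseTestLine(const std::string &line)
{
    // Default: no syntax, so append with the normal output lines.
    ParsedLine result{Operation::Append, false, 0, false, line};

    std::size_t pos = 0;
    char sign = '\0';
    if (!line.empty() && (line[0] == '+' || line[0] == '-'))
        sign = line[pos++];

    // The number ends when there are no more digits.
    const std::size_t digitsStart = pos;
    int index = 0;
    while (pos < line.size() && std::isdigit(static_cast<unsigned char>(line[pos])))
        index = index * 10 + (line[pos++] - '0');
    const bool hasIndex = pos > digitsStart;

    // Only '+' may omit the number; anything else is an ordinary line.
    if ((sign == '\0' || sign == '-') && !hasIndex)
        return result;

    // Skip the optional spaces after the prefix.
    while (pos < line.size() && line[pos] == ' ')
        ++pos;
    assert(pos <= line.size());

    if (sign == '+')
        result.operation = hasIndex ? Operation::Insert : Operation::Append;
    else if (sign == '-')
        result.operation = Operation::Remove;
    else
        result.operation = Operation::Replace;
    result.hasIndex = hasIndex;
    result.index = index;
    result.suppressOutput = true;
    result.statement = line.substr(pos);
    return result;
}
```

The suppressOutput flag captures the difference between a lone '+' and no syntax at all: both append the line, but only the former suppresses the "Input:" and "Output:" lines.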

Encoder test #1 still operates the way it did before since none of the lines contain this additional operation syntax. This test was copied into new encoder test #2 where a lone '+' was added to every line. Several additional lines were added to test #2 to test various program operations. (Note: this test currently produces a memory error that needs to be resolved.)

[commit e6ac67e6b7]

Saturday, October 19, 2013

Additional Memory Issues

Some memory errors were reported when performing memory testing on the current source. Checking previous commits back to the last tag reported the same memory errors, which was strange because the previous commits successfully passed the memory tests. The memory errors were reported in libglib2.0.

I remembered that there was just an update for this library within the past week, which explained why these memory errors were previously not reported. There must be some interaction between the Qt library and the new version of this library. These errors were added to the error suppression file so that they will no longer be reported. These extra errors will not affect the memory tests if the update for this library is not applied.

[commit 7ccd9ac05f]

Program – Replace and Remove Lines

So far, only the insertion of program lines into the program had been implemented. The removal and replacement of program lines were implemented to complete the operations needed. Some additional support functions were also needed. The remove line routine just calls the remove routine of the QVector base class with the offset and size of the line if the size of the line is greater than zero (otherwise, no code needs to be deleted).

The replace line routine was much more involved. If the replacement line size is zero, then the current line is deleted by calling the new remove routine. If the replacement line is larger, then the program code vector is first resized for the net increase in the size of the line. The code after the current line is then moved up by the net increase. If the new line is smaller, then the code after the current line is moved down by the net decrease. The program code vector is then resized by the net decrease in the line. Finally, the contents of the replacement line are copied into the program.

The standard library memmove() function is used to move the program code, which is also used by the base QVector class. The address of the code to move is obtained using the data() function of QVector (the data in the vector is guaranteed to be in contiguous memory). However, a word of caution learned while debugging: the data() function must be called after a call to the resize() function since resize() may relocate the actual data.
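The replace steps above can be sketched as follows, with std::vector standing in for QVector (both keep their elements in contiguous memory and may relocate them on resize). The Word type and the function signatures are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Word = std::uint16_t;  // hypothetical program word type

// Removes a line's code; a zero size line has no code to delete.
void removeLine(std::vector<Word> &code, std::size_t offset, std::size_t size)
{
    assert(offset + size <= code.size());
    if (size > 0) {
        const auto first = code.begin() + static_cast<std::ptrdiff_t>(offset);
        code.erase(first, first + static_cast<std::ptrdiff_t>(size));
    }
}

// Replaces the oldSize words at offset with the contents of line.
void replaceLine(std::vector<Word> &code, std::size_t offset,
                 std::size_t oldSize, const std::vector<Word> &line)
{
    assert(offset + oldSize <= code.size());
    const std::size_t newSize = line.size();
    if (newSize == 0) {
        removeLine(code, offset, oldSize);
        return;
    }
    // Number of words after the current line that need to be moved.
    const std::size_t tail = code.size() - (offset + oldSize);
    if (newSize > oldSize) {
        // Grow first, then move the tail up by the net increase.
        code.resize(code.size() + (newSize - oldSize));
        Word *data = code.data();  // obtained AFTER resize, which may relocate
        std::memmove(data + offset + newSize, data + offset + oldSize,
                     tail * sizeof(Word));
    } else if (newSize < oldSize) {
        // Move the tail down by the net decrease, then shrink.
        Word *data = code.data();
        std::memmove(data + offset + newSize, data + offset + oldSize,
                     tail * sizeof(Word));
        code.resize(code.size() - (oldSize - newSize));
    }
    // Finally copy the replacement line into place.
    std::memcpy(code.data() + offset, line.data(), newSize * sizeof(Word));
}
```

Taking the data pointer before the resize in the growing case would leave it dangling if the vector reallocates, which is the pitfall mentioned above.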

When a line is inserted, removed or replaced with a different size line, the offset of every line after the line needs to be adjusted for the net change in program size. To accomplish this, a new LineInfoList class was implemented based on the QList class. The replace, insert and remove functions were reimplemented to adjust the offset of all lines after the affected line for the net change in program size.
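The offset adjustment can be illustrated with a small sketch. The actual class is based on QList; here a std::vector is wrapped instead, and the LineInfo fields and function signatures are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LineInfo
{
    int offset;  // offset of the line within the program code
    int size;    // size of the line (zero for a blank or error line)
};

class LineInfoList
{
public:
    // Inserts a line of the given size before index (or appends at the end).
    void insert(std::size_t index, int size)
    {
        assert(index <= m_lines.size() && size >= 0);
        const int offset = index < m_lines.size() ? m_lines[index].offset : endOffset();
        m_lines.insert(m_lines.begin() + static_cast<std::ptrdiff_t>(index),
                       LineInfo{offset, size});
        adjust(index + 1, size);
    }

    // Changes the size of a line and shifts every line after it.
    void replace(std::size_t index, int size)
    {
        assert(index < m_lines.size() && size >= 0);
        const int change = size - m_lines[index].size;
        m_lines[index].size = size;
        adjust(index + 1, change);
    }

    // Removes a line and shifts every line after it down by its size.
    void remove(std::size_t index)
    {
        assert(index < m_lines.size());
        const int change = -m_lines[index].size;
        m_lines.erase(m_lines.begin() + static_cast<std::ptrdiff_t>(index));
        adjust(index, change);
    }

    const LineInfo &at(std::size_t index) const { return m_lines.at(index); }
    std::size_t count() const { return m_lines.size(); }

private:
    // Offset just past the last line, where an appended line goes.
    int endOffset() const
    {
        return m_lines.empty() ? 0 : m_lines.back().offset + m_lines.back().size;
    }

    // Adds the net change in program size to the offsets from first onward.
    void adjust(std::size_t first, int change)
    {
        for (std::size_t i = first; i < m_lines.size(); ++i)
            m_lines[i].offset += change;
    }

    std::vector<LineInfo> m_lines;
};
```

A line with an error fits the same scheme: it is inserted with a size of zero, so it records where it belongs in the program without shifting the lines after it.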

[commit 49257e119d]

Program – Lines With Errors

For a line with a translator error, there is nothing to insert into the program for the line. Even though there is no code to insert, the offset of where the line belongs in the program still needs to be recorded along with a size of zero. However, if the line had an error, the offset and size for the line were not being set. This was corrected, and as a result, the update error routine no longer needs to return whether the line has an error, which was being used as the condition for whether to set the offset and size.

[commit a871ca953e]

Monday, October 14, 2013

Encoder Testing – Program Output

Now that encoder testing is inserting the input test lines into an actual program model, it would be helpful to know if the lines are being inserted correctly. The encoded line is output after each input line, and while this is extracted from the program model, it is not known if the lines are actually in the correct location within the program.

The offset and size could have been added to the output line, but this is insufficient for testing if lines are replaced and removed correctly. These operations have not been implemented yet, but will be shortly. Therefore, at the end of testing, all of the lines currently in the program model are output. If a line contains an error, the column, length and message of the error are output.

Included in the output are the index number of the line, the offset range of the line (to verify that lines are inserted in the correct place with no gaps between lines), and the debug text output for the line. Blank lines take no space in the program code and so have no offset range; only the offset is output. The offset is needed in case the blank line is replaced. The expected results for encoder test #1 were updated for the new output.

[commit 96060db9b7]