Interactive BASIC Compiler Project: April 2013

Monday, April 29, 2013

Error List Changes Finalized

The method of keeping a list of change operations does not work when multiple lines of the program are changed and more than one error is affected, because each line of the multiple line change is processed individually. This caused errors with the same index to be repeatedly set, leading to many extra change operations being appended to the change list. Even though this was a failed attempt, it did lead to a clue on how to get the start and end indexes of the changes method working correctly.

The index to the start of the changes only needs to be set the first time since a lower index will not be changed. The index to the end of the changes will still be set if a higher index is changed, but with a slight modification.

The clue to the solution came from the debug output in the edit box slot that receives the errors changed signal. As the list of changes was processed, the debug code simply incremented the index for each change operation except for a remove operation. The index was not incremented for a remove operation since an error was removed from the list and the next operation applied to the same index.

Therefore, before checking if the end change index needs to be set to the current higher index, the index is first decremented for a remove operation. The edit box determines the number of errors that have changed by subtracting the start index from the end index and adding one. For example, for a single remove operation, the end index will be one less than the start index, so the number of errors changed will calculate to zero. The number of errors removed (or inserted) is determined by the change in the size of the error list.
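The bookkeeping just described can be sketched in a few lines. This is only an illustration under assumptions: the class name ChangeRange, its member names and the use of -1 as the "not set" value are hypothetical and are not taken from the project source.

```cpp
// Hypothetical stand-in for the start/end change tracking in the error list.
class ChangeRange {
public:
    void reset() { m_start = -1; m_end = -1; }
    bool hasChanged() const { return m_start != -1; }

    // Record that the error at 'index' was inserted or replaced.
    void changed(int index) { set(index, false); }
    // Record that the error at 'index' was removed.
    void removed(int index) { set(index, true); }

    int start() const { return m_start; }
    int end() const { return m_end; }

    // Number of errors changed in place: end - start + 1.
    int changedCount() const { return hasChanged() ? m_end - m_start + 1 : 0; }

private:
    void set(int index, bool isRemove) {
        if (m_start == -1) {
            m_start = index;  // the start is only set the first time
        }
        if (isRemove) {
            --index;          // the next operation applies to the same index
        }
        if (index > m_end) {
            m_end = index;    // the end only ever moves to a higher index
        }
    }

    int m_start = -1;  // -1 means no change recorded (assumed convention)
    int m_end = -1;
};
```

With this sketch, a single removed(3) leaves the start at 3 and the end at 2, so changedCount() is zero; a following changed(3) moves the end to 3 and the count becomes one.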

The debug output code in the edit box was updated to allow for an end index to be one less than the start index. The code still outputs asterisks for errors that changed (because the index is within the start and end indexes of the changes), but now outputs a minus if no errors were changed (only removed).

[commit 9ea17018cc]

Sunday, April 28, 2013

Error List Changes Revisited

Using a start and an end index to track changes to the error list is not sufficient to be able to reproduce the changes in the edit box. A simple example of this is if one error is inserted and one error is removed. The net effect is that one error is changed. However, the start and end indexes will indicate that two lines were changed.

A new scheme was implemented where there is just a start index (renamed to simply change index) and a list of change operations performed on the error list (insert, change and remove). This will allow the edit box to repeat the operations that were performed starting at the change index.

This scheme works because, as the change list is built while a program update is processed, the index of each change will be sequential. However, while this works for single line changes (or a group of lines with a single error change), it doesn't work when multiple lines are changed (like with a multiple line insert or delete). Too many change operations are appended to the change list because each line of the program update change is processed individually.
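As a rough illustration of how a receiver could repeat such a change list, the sketch below walks the operations from the change index and advances the index for every operation except a remove (the rule the April 29 entry describes for the debug output). The names ChangeOp and replay are hypothetical, not the project's own.

```cpp
#include <utility>
#include <vector>

// Hypothetical operation codes for the change list.
enum class ChangeOp { Insert, Change, Remove };

// Returns the error list index that each operation applies to, in order.
std::vector<std::pair<int, ChangeOp>> replay(int changeIndex,
                                             const std::vector<ChangeOp>& ops)
{
    std::vector<std::pair<int, ChangeOp>> applied;
    int index = changeIndex;
    for (ChangeOp op : ops) {
        applied.push_back({index, op});
        if (op != ChangeOp::Remove) {
            ++index;  // a remove leaves the next operation at the same index
        }
    }
    return applied;
}
```

For a change index of 2 and the operations insert, remove, change, the operations land on indexes 2, 3 and 3.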

[commit 28af130fef]

Saturday, April 27, 2013

RPN List – Error Storage

The RPN list class was storing the token where an error was detected, but it only really needed the column and length. Therefore, the code was changed to only store the column and length of the error.

While these changes were being made and tested, translator test #14 (parser errors) failed because there were extra blank lines after error messages. This extra blank line was left in so that the result files would not need to be changed, but this did not affect parser errors. The output of the extra blank line was removed and expected results files were updated.

[commit fdb798281c]

Friday, April 26, 2013

Program Model – Error List Changes

The program model is now keeping a list of errors in the program. This list will be used to create the extra selections in the edit box to highlight the errors in the program. A signal was added to the program model to emit the error list when it changes and a slot was added to the edit box to receive the error list. For now, this slot just outputs the error list with indicators of the start and end errors that were changed.

The error list class was modified to capture the start and end indexes of the changes to a list when the program model receives a lines changed signal. New start and end change indexes were added to the error list class along with access functions - one for resetting these indexes called at the start of the program model update slot, one for checking if there was a change to the error list called at the end of the update slot (to determine if the error list change signal should be emitted), and two to access each index called by the edit box to update the error highlighting.

To catch when errors in the list are modified, the insert, remove and replace functions in the error list class were overloaded. Each calls the QList base function and then calls a new set change index private function, which checks if the index being modified is lower or higher than the current start and end change indexes. New increment and decrement line number functions were also implemented, which call the error item functions and then the set change index function.
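The idea of wrapping each modifying function so that it also records the changed index might look roughly like the following. This is a sketch under assumptions: it wraps a std::vector by composition instead of deriving from QList, and every name in it (TrackedErrorList, setChangeIndex, the ErrorItem fields) is made up for illustration. It shows only this first version of the scheme, which later entries revise.

```cpp
#include <vector>

// Hypothetical error item; the real class holds more information.
struct ErrorItem { int lineNumber; int column; int length; };

class TrackedErrorList {
public:
    void insert(int index, const ErrorItem& item) {
        m_items.insert(m_items.begin() + index, item);
        setChangeIndex(index);
    }
    void replace(int index, const ErrorItem& item) {
        m_items.at(static_cast<std::size_t>(index)) = item;
        setChangeIndex(index);
    }
    void removeAt(int index) {
        m_items.erase(m_items.begin() + index);
        setChangeIndex(index);
    }

    void resetChange() { m_start = m_end = -1; }  // start of an update
    bool hasChanged() const { return m_start != -1; }
    int changeStart() const { return m_start; }
    int changeEnd() const { return m_end; }
    int size() const { return static_cast<int>(m_items.size()); }

private:
    // Widen the recorded range so that it includes 'index'.
    void setChangeIndex(int index) {
        if (m_start == -1 || index < m_start) m_start = index;
        if (m_end == -1 || index > m_end) m_end = index;
    }

    std::vector<ErrorItem> m_items;
    int m_start = -1;  // -1 means no change recorded (assumed convention)
    int m_end = -1;
};
```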

The new update errors slot function for the edit box class receives the error list change signal. For now, the function just outputs the error list with indicators on the errors that changed, which was used for debugging.

[commit 666b15b531]

Thursday, April 25, 2013

Another Undo/Redo Problem

Now that the program model is keeping a list of errors in the program, the next incremental change is to report changes to this list to the edit box so that the errors can be highlighted. While testing these changes, another problem was found with the detection of line changes when using redo.

The problem occurs when the current line is modified (specifically from a redo) and another redo causes a change to occur in another part of the program. The line that was modified does not get reported. This can also occur with an undo, but only when going up and down through the undo/redo stack. The line is not being reported because for undo and redo, the document change signal occurs before the cursor moved signal (where a modified line gets reported).

A check was added towards the beginning of the document change slot after gathering information about the change but before acting upon it. If there is a modified line and the modified line is not at the line of the change, then the modified line is captured before processing the current change.

A complication occurs when the modified line is after the line of the change. When reporting the modified line, the capture routine needs to report the line number contained in the modified line variable. However, since the document has already been modified by the undo or redo, the line number to use when retrieving the text of this line is not necessarily the same if the undo or redo inserted or deleted lines.

Fortunately, the offset of where the actual program line is (after the undo or redo change) is the same as the net line count change caused by the undo or redo. The capture modified line routine was modified to take an optional offset. The net line count change is passed if the modified line is after the line of the change, otherwise, no offset is needed if the modified line is before the change.
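The offset rule can be shown with a small stand-alone function. This is a simplified sketch, not the edit box code: the document is modeled as a vector of strings, and the names CapturedLine and captureModifiedLine along with the parameter names are assumptions.

```cpp
#include <string>
#include <vector>

// What gets reported to the program model: the original line number and text.
struct CapturedLine {
    int lineNumber;
    std::string text;
};

// document:       the lines after the undo or redo has been applied
// modifiedLine:   line number of the modified line before the undo or redo
// changeLine:     line where the undo or redo change occurred
// netLineChange:  lines added (positive) or removed (negative) by the change
CapturedLine captureModifiedLine(const std::vector<std::string>& document,
                                 int modifiedLine, int changeLine,
                                 int netLineChange)
{
    // Only a modified line after the change has moved in the document.
    int offset = modifiedLine > changeLine ? netLineChange : 0;
    return {modifiedLine,
            document.at(static_cast<std::size_t>(modifiedLine + offset))};
}
```

For example, if a redo inserts two lines at line 1 and the modified line was line 2, the text is read from line 4 of the updated document but is still reported as line 2.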

[commit 0049ab8883]

Saturday, April 20, 2013

Program Model – Error List

The next not so small increment to implement the error highlighting was the addition of an error list to the program model that will hold information about each error in the program. A new ErrorList class was implemented based on the QList class of the new ErrorItem class, which will contain the type of error (translator or encoder), line number, column, length and error message.

The errors in the list will be kept in order by line number, which allows finding an error by line number quickly by using a binary search. This class contains a find function with the binary search routine, which returns the index of the line number if found or the nearest error for a line number less than the line number being searched for. The nearest error location is used to insert a new error into the list.
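A binary search over a list ordered by line number can be sketched with std::lower_bound. Note the assumption: this sketch returns the insertion point directly (the index of the first error at or after the line), which may differ from the return convention of the project's own find function described above. The names ErrorItem, FindResult and findError are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct ErrorItem {
    int lineNumber;
    int column;
    int length;
    std::string message;
};

struct FindResult {
    int index;   // index of the error if found, else where it would be inserted
    bool found;
};

// 'errors' must already be ordered by ascending line number.
FindResult findError(const std::vector<ErrorItem>& errors, int lineNumber)
{
    auto it = std::lower_bound(
        errors.begin(), errors.end(), lineNumber,
        [](const ErrorItem& item, int line) { return item.lineNumber < line; });
    int index = static_cast<int>(it - errors.begin());
    bool found = it != errors.end() && it->lineNumber == lineNumber;
    return {index, found};
}
```

When found is false, inserting the new error at the returned index keeps the list ordered.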

The program model update function, which receives program updates, was expanded into several new functions. The update function now calls the new update line function for changed, deleted and inserted lines. The remaining purpose of the update function is to call the appropriate triggers to update the program view.

The new update line function first compiles the line (change and insert), which for now just translates the line. For a change, if the line has not changed, the function returns an indication that no change occurred; otherwise the line is replaced and the new set error function is called. For an insert, the new set error function is called and the line is inserted into the program. For a delete, the new remove error function is called and the program line is deleted.

The new set error function (called for change and insert) first determines if the line has an error. If the line does not have an error, any current error (change only) is removed by the new remove error function. If a changed line had an error and still has an error, the old error is replaced with the new error. If the line has a new error, the error is inserted into the error list. The rest of the errors in the list are also adjusted - if a new error was inserted, the error index of the program line for each error is incremented; and if a new line is being inserted, the line number of each error is incremented.

The new remove error function (called for change and delete) determines if the line has an error, which is removed from the error list if it does. The rest of the errors in the list are also adjusted - if an error was deleted, the error index of the program line for each error is decremented; and if a line is being deleted, the line number of each error is decremented.
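The line number adjustments can be illustrated in isolation. The sketch below covers only the renumbering of errors when a program line is inserted or deleted; it leaves out the error index stored with each program line, and the function names lineInserted and lineDeleted are assumptions made for this example.

```cpp
#include <vector>

// Hypothetical error item reduced to the one field needed here.
struct ErrorItem { int lineNumber; };

// A program line was inserted at 'lineNumber': every error on that line or
// a later one now belongs to the next line down.
void lineInserted(std::vector<ErrorItem>& errors, int lineNumber)
{
    for (ErrorItem& item : errors) {
        if (item.lineNumber >= lineNumber) {
            ++item.lineNumber;
        }
    }
}

// The program line at 'lineNumber' was deleted: drop its error (if any) and
// move the errors on later lines up by one. 'errors' is ordered by line.
void lineDeleted(std::vector<ErrorItem>& errors, int lineNumber)
{
    auto it = errors.begin();
    while (it != errors.end() && it->lineNumber < lineNumber) {
        ++it;
    }
    if (it != errors.end() && it->lineNumber == lineNumber) {
        it = errors.erase(it);
    }
    for (; it != errors.end(); ++it) {
        --it->lineNumber;
    }
}
```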

[commit 7c94dd4630]

Wednesday, April 10, 2013

Program Model – Line Info List

As good practice, the changes to implement the error highlighting will be made in small increments. The first was to replace the program model's list of translated lines (pointers to RPN lists) with the list of line information items, each containing a pointer to the RPN list for the line (for now until the encoder is implemented) and an index to the error list if the line has an error (for now not used).

This change was trivial and just amounted to adding the definition of the LineInfo structure, private to the ProgramModel class since it will be the only user of this structure, and changing the uses of the translated lines list member to the line information list member.

[commit aee559d041]

Tuesday, April 9, 2013

Highlighting Errors – List Of Errors

The program code will be stored in a continuous array of words. This will make the program easy to run by simply starting at the beginning of the array and executing instructions until an END or STOP is encountered. There will always be an END at the end of the array. There will be no line markers in this array, so that the run-time module can just execute the code without dealing with line separators. Subroutines and functions will be stored in their own code arrays.

In order to maintain the code as individual program lines, there will be another array containing information about each line. Each element of this line information array will contain an index into the code array to where the line begins. The run-time module will not need to access this array. If the line has an error, the index to the code array value will be invalid. For a line with a warning, the index to the code array will be valid.

There will also be a list of errors (and warnings). This list will correspond to and mirror the list of extra selections that will be used to highlight the errors in the edit box. Each item in this list will have the line number (index) back to the line information array, and the line information array elements with errors (and warnings) will contain an index to this error list. The program can only be run when this list is empty.
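The three structures described above and how they point at each other might be laid out as follows. Since this is only a plan at this point, everything in the sketch is an assumption: the 16-bit word size, the use of -1 for an invalid or absent index, and all of the type and member names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One entry per program line.
struct LineInfo {
    int codeIndex;   // where the line begins in the code array, -1 if invalid
    int errorIndex;  // index into the error list, -1 if no error or warning
};

// One entry per error or warning, mirroring the extra selections list.
struct ErrorEntry {
    int lineNumber;  // index back into the line information array
    bool isWarning;
    std::string message;
};

struct Program {
    std::vector<std::uint16_t> code;  // continuous code, no line markers
    std::vector<LineInfo> lines;
    std::vector<ErrorEntry> errors;

    // The program can only be run when there are no errors or warnings.
    bool canRun() const { return errors.empty(); }
};
```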

Monday, April 8, 2013

Highlighting Errors – Extra Selections

The QPlainTextEdit class, the base class for the EditBox class, contains a feature called Extra Selections, which allows for temporarily marking certain regions of the document with a given format (color and/or style). Each extra selection is specified by the QTextEdit::ExtraSelection structure that consists of a cursor (QTextCursor) and a format (QTextCharFormat). A list (QList) of all the extra selections is given to the plain text edit widget using the setExtraSelections() function.

The selection, the characters to temporarily format, is specified by the cursor, which is initialized from the text cursor of the edit widget. The cursor is then set to the beginning position of the selection to temporarily mark and moved to the end of the selection using the keep anchor position option.

The temporary format is specified by the format member. The format has many options including colors (foreground and background), and font style like italicized, bold, underlined, and curvy underline. The underlines can even be made a different color.

For translator errors, a simple red background will be used. Later, another color, possibly yellow or light blue, will be used for warnings. There will be no warnings from the translator, but there will be warnings from the encoder, for example, when the beginning of an IF statement or FOR loop is entered before the END IF or NEXT is entered, or vice versa. These are not errors as the line is valid, but these warnings will also prevent the program from being run until corrected.

Saturday, April 6, 2013

REM Operator (For BASIC Comments)

Implementing the REM operator (single quote) turned out to be simpler than anticipated. The operator form of REM always occurs at the end of a line; therefore, it can be treated as the end of the line, except that the REM operator token (with comment string) will be added to the end of the RPN output list.

A new REM operator token handler was implemented, which first checks if the command stack is not empty or if the current token mode is not command (which occurs for assignment statements without the LET keyword). In either case, there is a current command that needs to be processed. Otherwise, the REM operator being processed occurred at the beginning of the line and no current command needs to be processed.

To process the current command, the end of line token handler is called since it contains all the needed functionality for processing the command when the end of a line is reached. A new end of line token is created and passed to the end of line token handler. A new token was needed in case the command changes the end of line token into another token to add to the RPN output list (as the INPUT command does). If the command does not use this token, the end of line token handler will delete it. If an error is returned, then the end of line token is deleted and the error is returned. Otherwise, the REM operator token is appended to the end of the RPN output list.

Since the REM operator token acts as the end of the line, the table entry for the REM operator code was changed to include the end expression and end statement flags. The pointer to the new REM operator token handler was also added. Several new REM tests were added to translator test #15 for the REM operator on various types of commands. Some error tests were also added.

[commit da69b3ba07]

REM Command (For BASIC Comments)

Before continuing with highlighting errors in the edit box, I noticed (when implementing the routine that converts the token contents to text for display in the program view) that the Remark token type was not actually being used. The parser is handling two types of remarks, the REM command and the single quote comment method, which is treated as an operator since it can appear at the end of any line and does not need to be preceded by the colon statement separator.

However, these two tokens (REM command and REM operator) were not being handled in the translator. The REM command token was returning a "Not Yet Implemented" error message. For the REM command token to be handled and not return this error message, the REM code in the table needed to be assigned a token mode. It turned out that the specific token mode assigned was not important as long as it wasn't the default NULL token type, which triggers the error message, because the REM command will always be the last command on a line.

To process the REM command token, a new REM command handler was implemented that simply adds the REM command token (which contains the actual comment text in the string of the token) to the RPN output list. A new translator test (#15) was added for various REM command tests. The REM operator token is a little trickier to handle, which will be implemented next.

[commit 258f3a9df0]

Friday, April 5, 2013

Program Model Translator Integration Complete

The translator (with the parser) has now been integrated with the program model, though for now, the program model is only holding the translated RPN lists of the program lines (or error information). Eventually, the program model will hold the compiled program lines, but the encoder that will do this compiling has not yet been implemented.

The next step is to highlight any translator errors in the edit box. But first, this is a good point to make a development release. The release related files were updated, along with some minor file clean up, for a new release and the repository was given the tag v0.3.4.

[commit 2a19163e3c]

Thursday, April 4, 2013

Comparing RPN Lists With Errors

RPN lists may also contain errors, and this needs to be taken into account when comparing RPN lists. The RpnList compare operator function was modified for this by first checking if one list has an error and the other does not. If both have errors, the column and the length of the error token are compared along with the error message. If neither list has an error, it proceeds as before, comparing the lists.
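The order of these checks can be sketched with a simplified stand-in for the list class. This is not the actual RpnList class: the struct RpnListSketch, its members, and the use of strings for the items are assumptions. The != operator is written as the opposite of the == operator.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an RPN list that may hold an error.
struct RpnListSketch {
    std::vector<std::string> items;  // stand-in for the RPN items
    bool hasError = false;
    int errorColumn = 0;
    int errorLength = 0;
    std::string errorMessage;
};

bool operator==(const RpnListSketch& a, const RpnListSketch& b)
{
    if (a.hasError != b.hasError) {
        return false;  // one list has an error and the other does not
    }
    if (a.hasError) {
        // Both have errors: compare where the error is and what it says.
        return a.errorColumn == b.errorColumn
            && a.errorLength == b.errorLength
            && a.errorMessage == b.errorMessage;
    }
    return a.items == b.items;  // no errors: compare the lists as before
}

bool operator!=(const RpnListSketch& a, const RpnListSketch& b)
{
    return !(a == b);
}
```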

[commit d810bba51f]

Comparing RPN Lists

Eventually, the program model will hold the program lines in an internal incrementally compiled format that will be ready to run. To detect when a line has changed, either the new line needs to be fully encoded into this internal format, the internal line needs to be converted (recreated) back into text, or both need to be converted to an intermediate comparable format.

I'm not sure at this moment which is best, but recreating the internal line back into text will not work because the recreated lines will not necessarily match the original lines. For example, the lines "C=3" and "C = 3" don't match as text, but the internal lines are identical. Therefore, using text to compare is not an option unless the new line is compiled and then recreated (which is wasteful).

For now, the RPN lists will be compared. This required comparison operator functions (for the == and != operators) to be implemented for the Token, RpnItem and RpnList classes. Only the == operator function was fully implemented, with the != operator implemented as the opposite of the == operator. The program line string list member of the program model was removed since it is no longer needed (second commit).

[commit f33bb20df2] [commit 3a2317910c]

Monday, April 1, 2013

Colon Precedence Correction

The segmentation fault mentioned in the last post that was occurring with some files (like an accidentally loaded expected translator test file) was caused by the presence of a colon. The colon will be a statement separator or will indicate a label at the beginning of a line. In the translator, the colon is considered an operator. Eventually there will be a special colon token handler, but this has not yet been implemented.

The problem occurred when the colon operator token was processed as a regular operator. The precedence of the colon was set to zero, which is the same as the NULL operator that is put on the hold stack as a blocker to prevent it from being popped (because all operators are supposed to have higher precedences). However, since the colon also had a zero precedence, the translator popped the NULL operator from the hold stack. The segmentation fault occurred because the NULL operator had no expression information (a NULL pointer).

To correct this problem, the precedence of the colon operator was changed to four, which is the same precedence as the End-of-Line operator, since a colon also indicates the end of a statement.
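The effect of the precedence value can be reproduced with a toy hold stack. This sketch assumes that operators are popped while the top of the stack has a precedence greater than or equal to that of the incoming operator, which is consistent with the behavior described above but is not confirmed by it. The names HoldItem and wouldPopBlocker are hypothetical.

```cpp
#include <vector>

// Hypothetical hold stack entry: just a precedence and a blocker flag.
struct HoldItem {
    int precedence;
    bool isBlocker;  // true for the NULL operator at the bottom of the stack
};

// Returns true if processing an operator with 'incomingPrecedence' would pop
// the blocker (the bug); the stack is passed by value so the caller's copy
// is left alone.
bool wouldPopBlocker(int incomingPrecedence, std::vector<HoldItem> holdStack)
{
    while (!holdStack.empty()
           && holdStack.back().precedence >= incomingPrecedence) {
        if (holdStack.back().isBlocker) {
            return true;
        }
        holdStack.pop_back();
    }
    return false;
}
```

With a blocker of precedence zero on the stack, an incoming precedence of zero reaches the blocker, while an incoming precedence of four stops above it.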

[commit d971d6d436]
