Interactive BASIC Compiler Project: October 2013

Thursday, October 31, 2013

Table – Has Operand Determination

During the development of the encoder before all of the routines were implemented, it was necessary to have a has operand flag in the table entries for codes that have an operand. This is no longer necessary as the equivalent can now be determined by whether the code has an operand text or an encode function pointer.

The has operand flag was removed from the table entries of these codes. A has operand table access function was added that returns whether the table entry of the code has an operand text function. The tests for the has operand flag were replaced with this new function or use the presence of the operand text or encode function pointer (by using the table access functions for these).

The program model operand text function is also no longer needed, which had special allowance for whether there was no operand text function for a code with the has operand flag. The encode routine also had special allowance for this same condition, but this is also not necessary since all codes with operands have an encode function.

[commit 1e8b92e47c]

Wednesday, October 30, 2013

Program – Line Change Detection

The translated RPN lists for program lines is currently being saved in the line information list. This was only temporary until the RPN lists could be encoded into program code and stored. This mechanism was left in place since it is still being used for line change detection. As described in the last post, RPN lists will still be compared to detect line changes except that the program code will be decoded into an RPN list.

However, the form of the decoded RPN list will be slightly different then a translated RPN list. The method of RPN list comparison was to compare each RPN item, where the token, the attached token count, and if non-zero, each attached token was compared. For the token comparison, the token type, the data type for constants, the code for commands and operators except the REM command and operator, the code for internal functions, the string for other token types, the reference flag, and the sub-code.

The tokens in a decoded RPN list will only have the code, sub-code (only program sub-codes), and a string (for all types). Therefore, only these members of the token are compared. For the RPN items, it is unnecessary to compare the attached tokens since these only refer to other tokens in the list, and the token comparisons are sufficient to catch differences, so only the attached token count needs to be compared. In fact, a decoded RPN list will not have attached tokens, only the count, since these will not be necessary for recreation.

Token comparison boils down to comparing the code, sub-code masked by the program only sub-codes, and the string. There is one other issue when comparing the strings that would cause the previous comparison to incorrect detect a change. A non-case sensitive comparison must be used except for the REM, REM operator and string constant codes.

Where testing these changes in the GUI, a token (memory) leak was discovered, which occurred when the line did not change. The program was due to the RPN list of the line to replace was not used in this case, but was not deleted. This was corrected in additional to updating the RPN item and tokens comparison routines.

[commit 16c85dfe79]

Tuesday, October 29, 2013

Recreator – Design Considerations

Originally, there was going to be another module, the decoder, which would convert internal program code into an RPN (Reverse Polish Notation) token list (like the translator produces). The recreator would then convert the RPN list back into the program text (close to the originally entered code). Very early on (see December 19, 2009) this step was considered simple and unnecessary and so was combined into the recreator.

The program model needs to detect when a changed line has actually been changed. The edit box sometimes reports changes lines when the line has not actually change. The user could also have simply added spaces to the line (which are not stored) or changed the case of a keyword, which would not result in a change to the internal program code. As previously mentioned, comparing the internal code of the current line with to the new line is problematic.

The new line would first have to be encoded, which will affect the dictionaries. Either this would need to be undone, or the old line removed first to dereference dictionary entries only to have them referenced again when putting in the new line. This is acceptable for simple dictionaries (variables, constants, remarks, etc.), but is much more involved with the blocking commands (IF-END IF, FOR-NEXT, etc.), where the block will probably be kept in a block dictionary.

A better alternative is to convert the program line into an intermediate RPN token list (decoded). The translated RPN list of the new line and be compared to the decoded RPN list of the current line. Since the decode operation can be contained in a single routine like the encode operation, the program model will own this decode routine also.

Since the decode routine will be present, then it makes sense for the recreator to take a decoded RPN list as input to convert to the program text. This method has another advantage for testing. Currently, the test code translates expressions and statements into RPN lists. These RPN lists can then be passed to the recreator for testing, therefore there will be a new test mode for taking these existing tests, translate them to RPN lists, and then recreate them back to text.

Monday, October 28, 2013

Class Definition Consistency

In preparing to create the recreator class, I noticed that all the class definitions were not consistent - some had the private members at the beginning and some had them at the end. Having the member functions at the beginning allow access functions for instance to use them, at least this was the case with early C++ compilers (or at least was my understanding when learning C++ over two decades ago). But this does not appear to be a requirement with modern compilers.

It appears the Qt developers like to put the private member variables at the end of the class. The public function definitions start at the beginning followed by the private section, which start with the private function definitions. So, before embarking on creating of the recreate class definition, the non-conforming classes were changed to this style.

[commit 8fc5c92519]

Sunday, October 27, 2013

Encoder/Program – Release

This concludes the integration of the encoder with the program model and dictionaries for the initial BASIC features being implemented. The program model still keeps all of the translation RPN lists, but these are only used to detect when a line changes. Once the recreator is implemented, these lists will no longer need to be kept.

Version 0.5.3 has been released (branch0.5 was merged to the master branch and tagged v0.5.3). Archive files containing the source and binaries (with test files) for both Windows and Linux have been uploaded to SourceForge. For Windows, there is also the ibcp-libs.zip file that contains all the dynamic linked libraries required to run the program if the MinGW and QtSDK packages have not been installed (extract into the ibcp directory). Linux should already have the required libraries installed. This concludes the 0.5 development series.

Implementation of the recreator will now begin with the 0.6 development series. The recreator will convert the internal program code back to a reasonable facsimile of the originally entered program text. This text will then replace the original text in the edit box.

[commit d984a70c6b]

GUI Program View – Code Output

The program view contents was changed from the text of the translated RPN list to the debug text output of the program code using the debug text routine of the program model. The program model data function is used by the program view widget for getting its contents. Unfortunately, this was not as simple as change as it sounds.

The data function is a constant function (const). Because this function is constant, the debug text function also needed to be a constant function. Making this function constant required several variables in the function to be constant, and the operand text function also needed to be changed to constant. Changing the operand text function to constant required the table operand text function pointers to be constant. Changing these functions required their program model pointer argument to also be change to constant.

[commit f089c78b59]

Dictionaries – Key Case Sensitivity

The BASIC language is traditionally not case sensitive. The keyword lookup of the BASIC commands and operators are already case insensitive. However, the dictionary lookup was not, so variables entered as VAR, Var, and var would be three different variables. This is incorrect. However, only one form can be stored in the dictionary. The first instance of the variable name seen, will be the one stored. Later there will be a facility for renaming variables.

The key map in the dictionary was changed from the QMap class to the QHash class. These two classes are very similar for use in the dictionary class with a few differences. The QMap class stores the key values in order and the QHash class stores them in an arbitrarily order, and QHash provides faster lookups. Since the keys do not need to be sorted (they can always be sorted from the key list in the dictionary), the faster lookup feature is desirable.

To make the lookup of the key value case insensitive, the key string is first converted to all upper case and this all upper case string is use to lookup an index for a key and is the string stored as the hash key. Upon the initial add of the key string to the key list, the original string entered is stored and is the string returned for all instances.

However, the remark and string constants dictionaries must be case sensitive. Therefore a case sensitivity option argument was added to the dictionary class add routine with a default value for a case insensitive lookup. The remark and string constant encode routines use the case sensitive option.

New encoder test #3 was added to test various forms of variables, remarks, string constants and exponential numeric constants. Exponential numeric constants can have either an upper or lower case 'E' for the exponent. Again, the first instance of the constant seen will be stored, so the user can decide which form is desirable.

[commit da56095d1a]

Saturday, October 26, 2013

Program – Enhanced Line Debug Output

The program view will get the same output as produced at the end of encoder test output, which includes the offset range, and the debug text or the error information (column, length and message) of the line. Currently a program line with an error does not have any code associated with it (an input error). However, for a code error, the program line will have been successfully encoded and stored into the program. An example of a code error is a missing ENDIF to an IF.

The generation of the line offset range and error output was moved from the tester class to the program model debug text routine so that it can also be used for the program view. This code was also modified to output both the debug text and the error information instead one or the other. Since input errors have no code, this works as before.

Since the debug text routine is also used by the tester class encode input routine to just obtain the debug text for a line, the debug text routine was given an flag argument for whether to return the full information (offset range, debug text and error information) or just the debug text.

The error information is handled differently by the encode input routine, where the error column and length are used to point to the error. This was modified to get a pointer to the error item for the line instead of the RPN list using the new error item access function added to the program model class, which returns a null pointer if the line does not have an error.

[commit 3fde3da2e0]

Program – Delay Line Encoding

The edit box class has an issue where sometimes unmodified lines are reported as being changed. There is currently a check when replacing a program line where if the line has not changed, no action is taken. Currently the translated RPN list is compared to the stored RPN list for the line. Eventually however, the RPN lists will not be stored and this line change detection will have to be changed. More on this later, but this change will be made once the recreator is implemented.

A newly translated line can't be encoded until it has been determined that the line has changed because the process of encoding adds or updates references in the dictionaries. If the line then hasn't changed, this would need to be undone, which would be unnecessarily complicated. Therefore, the encoding of the line was delayed until after it is determined that the line changed. Since this does not affect new line insertions, the line also needs to be encoded for the insert operation. The line is only encoded in both places if there was no translation error.

[commit b5c9020cc8]

Program – Removing RPN List Dependency

The pointer to the RPN list of a program line from the translator is currently being held in the line information list for the program (along with offset and size of the line and an index to the error list if the line has an error). This pointer will eventually be removed since the RPN list is not needed after a line is encoded and stored in the program. There were two dependencies on the RPN list that needed to be removed.

The program model update error routine used the RPN list from an line information list item to determine if the line has an error. If it did, an error item is created from the RPN list (retrieving the error column, error and message) and stored in the error list. The index of the error item in this list is then stored in the line information item for the line.

Since the RPN list pointer is going to be removed from the line information list, the update error routine was modified to obtain the error information differently. Instead of creating the error item in this routine, the error item is now created in the calling update line routine just after the line is translated and checked for an error. The error item with the error information is passed to the update error routine, which was modified to use it instead of the RPN list.

So that an empty error item can be indicated, a new none error type was added to the error item class. An is empty access function was added to return if the error item does not contain an error. A default constructor was added to create an empty error item. Also for clarity, the translator and encoder error types were renamed to the input and code error types.

Even though currently no encoder errors can occur, the error item constructor was modified from having an RPN list pointer argument to having arguments for the error column, length and message. This will allow setting errors from encoding without an RPN list (the error item class is no longer dependent on the RPN list class).

[commit 0a10ddae84]

Friday, October 25, 2013

Program – Dictionary Debug Output

The program debug output shows the indexes of dictionary entries, but this is insufficient for showing if the dictionary entries were removed correctly and are placed on the free stack of the dictionary for reused. Code was added to output the contents of each dictionary.

The debug text routine was added to the dictionary class that takes a header string as an argument. After appending the header string to the output string, it loops through the dictionary entries and appends the index, use count and string of every entry with a non-zero use count. After the entries, the indexes in the free stack are appended. If any free stack item contains a non-zero use count, the use count is appended after the index. Also, if the item has a non-empty string, the string is also append. The strings of deleted entries should be cleared.

The debug text dictionaries routine was added to the program model class, which calls the debug text routine of each dictionary and appends each to the output string. A call to this routine was added to the tester class run routine after the program model debug text function is called to output the program code. The expected results for encoder test #1 and #2 were updated for the additional dictionary debug output.

[commit 11337cb673]

Program – Dereference Removed Lines

When a line of code is replaced or removed, the use counts of any dictionary entries referenced on the line need to be decremented. If the use counts becomes zero, the entry is no longer being used and needs to be deleted from the dictionary (the entry becomes available for use by another item upon the next add).

The dereference routine was added to the program model to scan a line that is about to be replaced or removed. This routine loops through each program word of the line and removes the reference for any code that has an operand, which is determined by whether the code has a remove function. In the update line routine, this routine is called before the line is replaced or removed.

Table entry remove functions were added for the various REM, constant and variable codes. Each remove function calls the remove routine of appropriate dictionary (just like the encode function calls the add routine of appropriate dictionary to add a reference).

Encoder test #2 contains replace and remove operations, but previously the use counts of dictionary entries were not being decremented. Now that they are, several dictionary entries are now removed since their uses counts become zero and are removed from the dictionary. This allowed new items to be added in the unused entries, which affected the index of several dictionary entries on some of the program lines, therefore the expected results were updated.

[commit a310dc458e]

Monday, October 21, 2013

Program – Error Handling Issues

When checking for memory errors, a use of uninitialized memory error was detected with new encoder test #2. The problem occurred in the remove error routine, which was always adjusting the rest of the errors after the current line. It should have only been doing this if the line was not deleted and did not have an error. Because this routine did not return for this condition, the error index loop variable was not initialized causing the memory error.

When the update error routine was modified back to not returning the status of whether the line has an error, the routine was not changed back to its original code correctly. This caused errors not to be added to the errors list and therefore did not show up when running the GUI. The routine was put back to its original code from an earlier commit.

When a line was replaced with an empty line (for example, a line with an error), the replace line routine of the new LineInfoList class was supposed to call the remove line routine and then return. Instead if was calling the base QList class remove routine and not returning. This caused extra code to be removed from the program.

A line with an error was added as a replacement line to encoder test #2 to verify the corrections described above. The offset for a line with an error was added to the test output to verify that errors lines are added to the program correctly. The offset is needed for when the line with the error is replaced with good code.

[commit d8ef155e5e]

Sunday, October 20, 2013

Program – Operation Testing

With the program line operations (insert, replace and remove) implemented, some automated mechanism was needed to test them using the command line test mode. The encoder test mode was modified to accept a special syntax at the beginning of each test line to indicate a program operation.

The syntax starts with an optional '+' for insert line and '-' for delete line. This is followed by a line index number indicating the line that should be inserted or deleted. If there is just a line index number, the line is replaced. The number is followed by optional spaces (though the number ends when there are no more digits). The number must be within the valid range for the lines currently in the program. The number is optional after a '+' in which case, the line is appended to the end of the program, the same as if the line does not contain this syntax. After a '-' and its number, there must be no statement.

When using this syntax, the normal "Input:" and "Output:" lines are suppressed. This is the difference between using a lone '+' (output suppressed) and no syntax (output not suppressed) for appending a line. After processing this syntax, the characters are removed from the line, and the appropriate call to the update slot routine is made for the specified operation to translate, encode and perform the program operation on the line. However, if a line does have an error, the outputs are not suppressed to report the error.

Encoder test #1 still operates the way it did before since none of the lines contain this additional operation syntax. This test was copied into new encoder test #2 where a lone '+' was added to every line. Several additional lines were added to test #2 to test various program operations. (Note: this test currently produces a memory error that needs to be resolved.)

[commit e6ac67e6b7]

Saturday, October 19, 2013

Additional Memory Issues

Some memory errors were reported when performing memory testing on the current source. Checking previous commits back to the last tag reported the same memory errors, which was strange because the previous commits successfully passed the memory tests. The memory errors were reported in libglib2.0.

I remembered that there was just an update for this library within the past week, which explained why these memory errors were previously not reported. There must be some interaction between the Qt library and the new version of this library. These errors were added to the error suppression file so that they will no longer be reported. These extra errors will not affect the memory tests if the update for this library is not applied.

[commit 7ccd9ac05f]

Program – Replace and Remove Lines

So far, only the insertion of program lines into the program had been implemented. The removal and replacement of program lines was implemented to complete the operations needed. Some additional support functions were also needed. The remove line routine just calls the remove routine of the QVector base class with the offset and size of the line if the size of the line if greater than zero (otherwise, no code needs to be deleted).

The replace line routine was much more involved. If the replacement line size is zero, then the current line is deleted by calling the new remove routine. If the replacement line is larger, then the program code vector is first resized for the net increase in the size of the line. The code after the current line is then moved up by the net increase. If the new line is smaller, then the code after the current line is moved down by the net decrease. The program code vector is then resized by the net decrease in the line. Finally, the contents of the replacement line is copied into the program.

The standard library memmove() function is used to move that program code, which is also used by the base QVector class. The address of the code to move is obtained using the data() function of QVector (the data in the vector is guaranteed to be in continuous memory). However, a word of caution learned while debugging: the data() function must be called after a call to the resize() function since this function may relocate the actual data.

When a line is inserted, removed or replaced with a different size line, the offset of every line after the line needs to be adjusted for the net change in program size. To accomplish this, a new LineInfoList class was implemented based on the QList class. The replace, insert and remove functions were reimplemented to adjust the offset of all lines after the affected line for the net change in program size.

[commit 49257e119d]

Program – Lines With Errors

For a line with a translator error, there is nothing to insert into the program for the line. Even though there is no code to insert, the offset into where the line belongs in the program still needs to be recorded along with a size of zero. However, if the line had an error, the offset and size for the line was not being set. This was corrected, and as a result, the update error routine no longer needs to return whether the line has an error, which was being used as the condition whether to set the offset and size.

[commit a871ca953e]

Monday, October 14, 2013

Encoder Testing – Program Output

Now that encoder testing is inserting the input test lines into an actual program model, it would be helpful to know if the lines are being inserted correctly. The encoded line is output after each input line, and while this is extracted from the program model, it is not known if the lines are actually in the correct location within the program.

The offset and size could have been added to the output line, but this is insufficient for testing if lines are replaced and removed correctly. These operations have not been implemented yet, but will be shortly. Therefore, at the end of testing, all of the lines currently in the program model are output. If a line contains an error, the column, length and message of the error are output.

Included in the output is the index number of the line, the offset range of the line to verify that lines are inserted in the correct place with no gaps between lines, and the debug text output for the line. Blank lines take no space in the program code and so no offset range, so only the offset is output. The offset is needed in case the blank line is replaced. The expected results for encoder test #1 were updated for the new output.

[commit 96060db9b7]

Sunday, October 13, 2013

Program – Single Code Vector

The program model was modified to contain a single code vector of program words so that during execution, no handling of individual lines is needed. Program execution will simply flow from one line to the next as if it was a single block of code. This will allow for fast execution. The temporary line code vector that was added to the line info list was replaced with an offset into the single code vector and the code size of the line.

The code vector was previously just a QVector of the program words. However, the QVector class does not have an existing function for inserting another vector into a vector. Therefore, a new ProgramCode class was created based on the QVector class. This was similar to the ProgramLine that was previously implemented and then removed. This time, a new insert line routine was implemented to insert a vector (code for a line) into another vector (the single code for the program).

The new insert line routine does nothing if the line to insert is empty. Otherwise the program is first resized to allow for the new line. A hole is made for the new line if the insert point is not at the end of the program. Finally, the code of the line is copied into the program code.

The update line routine was modified (insert operation only) to insert the code for a line that does not have an error. The offset for the new line is determined. If the new line will be at the end of the program, the offset is set to the current size of the program, otherwise the offset is set to the offset of the line being inserted before. The size is set to the size of the line code, and the line is inserted into the program.

The update error routine was modified to return if the line being inserted has an error. This is used to determine if the code for the line is to be inserted into the program. If the line has an translator error, the line was not encoded, so there is no code to insert into the program, though information for the line is still inserted into the line info list.

The debug text routine previously just retrieved the code vector inserted into the line info list for a specified line. Since this code vector was removed, this routine was modified to calculate the address of the line by adding the offset for the line to the base address of the program code vector. The size is obtained from the new size added to the line info list.

[commit dd94ed0ef9]

Program Model Update – Encoding

The program model needs to encode lines when an update signal is received once each line has been successfully translated. The program model update slot routine receives this signal from the edit box, which includes the line number, number of lines deleted, number of lines inserted, and a list of strings of the lines. The type of operations needed on the program is then determined (change, insert or remove lines) and calls the update line routine to perform the operation for each line affected.

The update line routine was modified to encode the line after a successful translate. A temporary program word vector was added to the line info list to the code for the line. The line info list currently holds a pointer to the translated RPN list (temporary), and an index to the error list if the line has an error. The program model will contain a single program word vector for the entire program unit with the line info list containing offsets and sizes of the lines within this vector. This will allow execution through the program without using time dealing with lines. Only the insert operation was modified to store the line code vector at this time.

The tester class encode input routine was modified to let the program model translate and encode the line, by calling the update slot routine directly to insert a single line to the end of the program. An access function to the RPN list pointer was added to the program model to get the RPN list for a line to determine if there was a translation error and to get the information about the error (column, length and message). The debug text routine was modified to take a line index argument instead of a program word pointer and a word count, which uses the line index to get the program line code and work count. Because the program model still possesses the RPN list, the encode input routine can no longer delete it.

As a result, a token memory leak occurred (which could previously be seen when running the GUI and then exiting) because the program model was not deleting the RPN lists contained in the line info list when the application exited. This was corrected by deleting all of the RPN lists in the program model destructor. The tester class run routine was modified not to report token errors after each line, which will be reported when the application terminates.

[commit 49b3231acb]

Saturday, October 12, 2013

Encoder Class – Removal

The next step is to implement the code to insert the encoded lines into the program (currently the translated RPN lists are stored). Part of this process that is already implemented is to increment the use counts for dictionary items when a line is inserted into the program. The encoder handled this part. Related to this is when a line is removed from the program - the use counts need to be decremented for dictionary items that were on the line (deleting a dictionary item when it is no longer used).

This remove line process appeared to be mirror to the encode (add) line process and therefore the question arose whether the encoder class should also contain a remove routine. Looking at the current encoder class, it currently contained the single encode routine and no member variables (except for a reference to the table instance). It also required a pointer to the current program unit, which is only used to pass to the encode functions for each code with an operand. A reference to the encoder was not passed to the encode function since the encoder class did not contain anything that was needed.

So the conclusion was made that the encoder class was not really necessary and that the encode routine should be in the program model (which contains all the dictionaries needed for encoding) along with the remove routine that will be implemented. Therefore, the encoder class was removed and the encode routine was moved to the program model class.

[commit 85827d49e8]

Friday, October 11, 2013

Program – Unit Vs. Model

The original plan was for program unit class to contain the code and dictionaries for a single unit (main routine, subroutine, or function) and the program model to contain all the program units. The program model class contained the functions for manipulating the program (inserting, replacing and removing lines), which needed to be moved to the program unit class. However, these functions are triggered by signals from the edit box, and also generate signals for updating the program view dock widget. The program model class still needs to be connected to receive and generate these signals.

Eventually when subroutines and functions are implemented, additional edit box windows can be created for each program unit opened. Each edit box opened will be connected to a program unit. This implies that the program model should only contain a single program unit that will be connected to an edit box when opened for editing, and there will be a program model for each program unit. Something else will need to contain a list of all the program units.

Currently, the main window class contains the edit box and program model instances. It is logical that when subroutines and functions are implemented, the main window will contain the list of program units, which are really program unit models. As each program unit is opened for editing, the main window will create an additional edit box window.

Therefore, the program unit class members were merged into the program model class. While doing this, all the member variables were placed at the end of the class definition to follow the same style that Qt uses. This was also done with the program word class. The other classes that don't currently follow this convention will be modified at some future point.

The constructor of the program model class was modified to get the table instance directly instead of being passed the table instance as an argument. The table instance is then used to create the translator instance before proceeding to create the dictionaries. Eventually the constructor will need to be aware if a subroutine or function program unit is being created and get the translator instance along with pointers to the global dictionaries (like the constant and remark dictionaries) from the main routine program unit.

[commit f3b76d057c]

Sunday, October 6, 2013

Encoding – Second Phase (Tagged)

This concludes the implementation of the encoder for the planned components needed to getting all modules up to the run-time working. The next step is fully integrating the encoder into the program model, where lines are inserted into, placed and removed from the program. Before continuing, the source was tagged version v0.5.2.

[commit cf579cd389]

Constant String Dictionary

The constant string dictionary was a little tricky to get working because of a difficult to resolve memory leak. The constant string dictionary will also be an information dictionary, but was not defined as an information template dictionary directly. The information required for this dictionary contains a pointer to a QString instance. At run-time, an array of QString pointers will be used for the string constants (though the pointers will be wrapped within a class syntactically).

As for number constants, a constant string information class was added with a lone QString pointer to the string constant value. Even though the information will contain a new instance of a QString, due to the implicit sharing feature of Qt, the underlying string will reference the original string in the token. The original string will also similarly be referenced by the key list and key map members of the dictionary. After the token is deleted, the string is still referenced three times (information element, key list and key map). The string is not deallocated until all three instances are removed.

When the constant string information element is created for a new element, a new QString instance is created from the string in the token. To prevent a memory leak, these QString instances need to be deleted. A destructor was needed for the constant string dictionary, so a new constant string dictionary class was created. This new class overloads the destructor, which deletes all the QString instances in the information elements. It will be necessary to nullify this QString pointer when the string constant is removed from the dictionary. The destructor then just deletes all the instance pointers and if one is null then no action is taken.

Initially, an information class instance was created locally in the encode routines and a reference to it was passed to the information dictionary add routine. For constant strings, a QString instance was created in this local instance. This was not an issue when the copy of the instance was put into the information vector for a new or reused instance as the instance pointer was copied into the vector. However, a memory leak occurred when the constant was already in the dictionary and the instance was left hanging (the QString instance was not deleted). This was previously resolved in the information dictionary template class by having its add routine create the instance only when adding a new item.

To complete the implementation, a constant string dictionary was added to the program unit class with an access function. The constant string encode and operand text routines were added that simply call the dictionary add and string routines. Finally, pointers to these functions were added to the associated table entry. The expected results file for encoder test #1 was updated for the numeric constants that are now in the output. A new statement was added with a string containing an embedded double quote character.

[commit 8a85668458]

Constant Number Dictionary

The information template dictionary will be used for the constant number dictionary. The additional information will hold the double and integer value of the constant, and is obtained from the same members in the token. Even through for double constants that can't be converted to an integer, the integer value (which is not set) is still copied to the information element, but it won't actually be used at run-time.

A constant number information class was defined containing the double and integer values. This class was given a blank default constructor (required for the QVector class), and a constructor taking a token pointer to initialize the values (required for the information template dictionary).

To complete the implementation, a constant number dictionary was added to the program unit with allocation and an access function. The constant number encode and operand text routines were added that simply call the dictionary add and string routines. Finally, pointers to these functions were added to the associated table entries. The expected results file for encoder test #1 was updated for the numeric constants that are now in the output.

[commit babfb758ec]

Information Template Dictionary

The numeric and string constant dictionaries have slightly different requirements and so will be handled separately. Both of these dictionaries will have additional information beyond what is in the base dictionary class, namely the constant values themselves. For both of these dictionaries, this additional information will be stored in a QVector. At run-time, the constant data for the QVector will be obtained as an array. This will allow for fast access of the constant values by the indexes stored in the program code.

The Qt QList and QVector classes are almost identical and both can access elements by index. The big difference is that the elements in the QVector are guaranteed to be stored in consecutive memory addresses. This allows the data to be access like a C/C++ array. If this consecutive access is not required, then the QList class should be used, though the only advantage appears to be in the time it takes to prepend items to the beginning of the list (QList is faster).

An information dictionary template class was created based on the dictionary class, which adds a vector of generic information class elements. The add routine was overloaded, which first calls the base dictionary add routine. The base dictionary add routine was modified from just returning whether a new item was added or not, to returning the status of whether a new entry was appended to the end of the list, an freed entry was reused, or if the entry already exists. This status is then used to either append a new information element, replace an information element, or do nothing if the item already exists.

The QVector class was chosen over the QList class for the information so that an array can be used at run-time for constants. There will be other dictionaries with information that won't have this run-time array requirement, but as noted above, the only disadvantage to using QVector instead of QList is prepending to the beginning of the list, and that operation will not be needed.

One requirement for the generic information class is that it must have a constructor that takes a pointer to a token. For constants, the information is obtained from the token. Eventually for other dictionaries, other information beyond what is contained in a token may be needed, at which time, additional argument(s) will be added to the constructors and to the information dictionary add routine, which now just takes a pointer to a token.

[commit 6ffb6901a1]

Saturday, October 5, 2013

Variable Dictionaries

When the program is run, the values for the variables need to be initialized (numbers to zero, and strings to empty). If all three data types of variables are stored in the same dictionary, the initialization routine would need to check the data type for each variable before initializing its value. This is not necessary if a variable dictionary contains a single data type, in which case it just has to loop through and initialize a single value type. The variable dictionaries will not hold the actual variable values. The run-time module will allocate and own the arrays with the values.

For now, no other information is needed in the variable dictionaries. The variable name is stored as the dictionary key, and the data type is implied by which dictionary contains the variable. Therefore at this moment, the base dictionary class is sufficient for the variable dictionaries. However, eventually addition information will need to be stored with the variable.

To complete the implementation of the variable dictionaries, three dictionaries, one for each data type, were added to the program unit class with the associated allocation and access functions. Encode and operand text routines were implemented for each of the three data types. These routines call the add (for encode) or string (for operand text) routines of the variable dictionary for the associated data type. These functions were placed in a the new basic.cpp source file, which will also contain the run-time routines for the variable and variable reference codes along with other miscellaneous functions (the two rem functions were also moved into this file).

Finally, pointers to these functions were placed in the six table entries for these codes (variable and variable reference times the three data types). Several new statements were added to encoder test #1 so that there are more than one variable of each data type. The expected results file for this test was updated for the variable names that are now in the output along with the output of the new statements.

[commit 2843b44761]

Operand Text / Remark Operands

Next up was to retrieve the text for the operand (comment string) of the remark command or operator from the dictionary using the index that is stored in the program code. The recreator also needs to do this, therefore, a mechanism was implemented to allow for the getting the text for the debug output, which will later be used for the recreator.

A new operand text function pointer was added to the table entry. This function is only needed for codes that have an operand, and an access function was added for this pointer. These functions have arguments for the pointer to the current program unit (to access the dictionaries) and the operand. The rem operand text function was implemented to get the text of the remark for an operand (an index) by passing it to the new dictionary string access function (gets the text from the key list member).

The program line text routine was renamed to debug text since it is only used for generating text for test output. Also, since this routine needs access the program unit for the dictionaries to look index values, it was moved to the program unit class. Since the program line class was only created to add the text routine to a QVector, this class was removed and QVector<ProgramWord> is now used.

The operand debug text (renamed) routine outputs the index of the operand along with the value of the operand, and needed to output the text of the operand. To get the pointer to the operand text function, it would need to be passed the code for the operand and would need the table instance. Instead, this functionality is handled by the debug text routine and the resulting text is passed to this routine to add the word index and operand index values.

The debug text routine also needs the table instance, so a table reference was added to the program unit class. This routine will also be called when encoded debug text is integrated into the program view of the GUI. The test routine called this routine with a program line (now a program word vector), however, when used for the program view, a program word vector would need to be extracted from the program unit code vector. This would require a vector to be copied from part of the total program code vector. To prevent this copy, the debug text was modified to take a program word pointer and a count of the number of words in the line.

Some minor cleanup of the table entries were performed on a separate commit (placed function pointers on their own lines, corrected the camel casing of the rem encode routine, and fixed a comment). The expected results file of encoder test #1 was updated for the remark comments that are now in the output.

[commit f5fe1efa4f] [commit bf9264bafb]

Thursday, October 3, 2013

Remark Dictionary

The remark dictionary is the first dictionary implemented. This is the simplest dictionary as it only needs to store the string of a comment, so the base dictionary is sufficient for this dictionary. The ProgramUnit class was added to contain the remark dictionary, which is defined as a pointer and allocated in the ProgramUnit constructor. A pointer was used since all the other program units will use the same remark dictionary.

Previously, the encoder encode routine just set the operand to zero and was modified to call the encode function for the code with an operand. For now, if the code does not have an encode function, the operand is still set to zero. The encode functions contain arguments for the pointer to the current program unit and the pointer to the token being encoded. Additional arguments will probably be needed, but these are sufficient for now.

The rem encode function was implemented, which simply passes the token to the add function of the remark dictionary and returns the index of the entry to be set as the operand word. The pointer to an encode function was added to the table entry structure and a table access function was added to return this pointer. A pointer to the rem encode function was added to the REM command and REM operator code table entries.

For testing, a temporary program unit was added to the main tester run routine and a pointer to it is passed to the encode input routine, which passed to the encode routine. Additional remarks were added to encoder test #1 including one set of duplicate comments to test that a single entry is stored in the dictionary. Currently only the index value is output as the string (of the remark) is not yet retrieved, which will be the subject of the next change. For now, the current output of this test (though incomplete) was copied for the expected results. Though this makes the test pass, the purpose for changing this file was detect changes as more of the encode routines are implemented.

[commit d2222df1d8]

Base Dictionary Class

A dictionary will hold a list of strings (keys) for the program for a particular type of item (remark, constants, variables, arrays, etc.). The index of the name will be stored in the program. When the program is running, the index will be used to access an item (constant, variable value, etc.). When the program is converted back to text (recreated) for display in the GUI, the index will be used to recover the originally entered string.

There are several member variables and functions that every dictionary will have. The variables include a stack of index values of entries that have been removed sp they can be reused, a list of entry keys, a map of entry keys to index values, and a entry use count. The functions include adding and removing entries to and from the dictionary.

The key list variable (QStringList) holds the list of keys. The index of the key within this list is the index stored in the program. The index is converted back to a string by using the key list. The purpose of the key map variable is for an efficient way of looking up a string in the dictionary for its index. The string could be looked up in the key list but would require a slow sequential search since it is unsorted. The key map variable (QMap<QString,&nbspint>) where the map stores the string as the key and the index as the value and allows for fast look up of keys. Since the QString class supports the implicit sharing feature of Qt, there is only one actual copy of each string.

The use count variable (QList<quint16>) hold the number of times the string is used in the program. On first use, the use count entry for the key is set to one. For each additional use, the use count for the key is incremented, and for each time it is removed from the program, the use count is decremented. When the use count for a key drops to zero, the key is removed from the dictionary. The use count list mirrors the key list.

The add function starts by looking up the string of a token in the key map to get its index. If not in the key map, the string is added to the dictionary. If the free stack is empty, then the index for the string is set to the size of the key list and the string is appended to it. A value of one is appended the use count list. Otherwise, an index of popped from the free stack and the token string is set at that location in the key list and a one is set in the use count list. The token string is added to the key map. If already in the dictionary, the use count entry for the string is incremented. The status of whether a new entry was added to the dictionary is returned, which will be used by derived dictionaries to know when to add additional information to the dictionary.

The remove function decrements the use count for index of an entry. If the use count is zero, the key is obtained from the key list and removed from the key map. The string in the key list is cleared (freeing the string). The now unused index is pushed to the free stack for reuse when a new string is added.

Tuesday, October 1, 2013

Program Components And Organization

The dictionaries are needed for the next step of the encoder to be implemented, which means that the encoder needs access to the dictionaries. Temporary dictionaries could be implemented in the tester class and maid available to its encoder instance. Eventually though, the program model will be the owner of the dictionaries. Some design was needed for what components the program model will contain for a program. The current plan follows...

The program will be organized into program units. The program model will contain a list of program units. Each unit will represent the main routine (only one), a subroutine (zero or more) or a function (zero or more) and will contain these components (only the main routine program unit with the components in bold are being implemented at this time):

Name (only subroutine and function program units)
ID - index within the program unit list (main routine will be the first)
Code - array of program words
Line Information - information about each line in the program
Block Information - information about multi-line statements (IF, FOR, etc.)
Remark Dictionary
Number Constant Dictionary
String Constant Dictionary
Subroutine Dictionary
Function Dictionary
Common Variable Dictionary
Common Array Dictionary
Common Define Function Dictionary
Variable Dictionary
Array Dictionary
Define Function Dictionary
Argument Dictionary (only subroutine and function program units)
Return Value (only function program units)

There will be only one instance of the italicized components, which will belong to the main routine program unit, but each of the subroutine and function program units will contain a pointer to these dictionaries. The common dictionaries are for COMMON statements, which will indicate variables, arrays, or defined functions that are accessible outside a given program unit (similar to C/C++ global variables). Further details of each component will be described as each is implemented.