Sunday, October 6, 2013

Encoding – Second Phase (Tagged)


This concludes the implementation of the encoder for the planned components needed to get all modules working up to the run-time.  The next step is fully integrating the encoder into the program model, where lines are inserted into, replaced in, and removed from the program.  Before continuing, the source was tagged as version v0.5.2.

[commit cf579cd389]

Constant String Dictionary

The constant string dictionary was a little tricky to get working because of a memory leak that was difficult to resolve.  The constant string dictionary is also an information dictionary, but it was not defined directly as an information template dictionary.  The information required for this dictionary contains a pointer to a QString instance.  At run-time, an array of QString pointers will be used for the string constants (though syntactically the pointers will be wrapped within a class).

As with number constants, a constant string information class was added, containing a lone QString pointer to the string constant value.  Even though the information will contain a new instance of a QString, due to the implicit sharing feature of Qt, the underlying string will reference the original string in the token.  The original string will similarly be referenced by the key list and key map members of the dictionary.  After the token is deleted, the string is still referenced three times (information element, key list and key map).  The string is not deallocated until all three instances are removed.

When the constant string information element is created for a new entry, a new QString instance is created from the string in the token.  To prevent a memory leak, these QString instances need to be deleted.  This required a destructor, so a new constant string dictionary class was created.  Its destructor deletes all the QString instances in the information elements.  It will be necessary to nullify the QString pointer when a string constant is removed from the dictionary; the destructor can then simply delete every instance pointer, since deleting a null pointer takes no action.
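The ownership rule described above can be sketched as follows.  This is only an illustration under stated assumptions, not the project's code: std::string and std::vector stand in for QString and QVector, the names ConstStrInfo and ConstStrDictionary and their members are hypothetical, and the key list and key map of the real dictionary are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical information element: a lone pointer to the constant's value
// (std::string stands in for QString).
struct ConstStrInfo {
    std::string *value = nullptr;
};

// Hypothetical dictionary that owns the string instances in its elements.
class ConstStrDictionary {
public:
    ConstStrDictionary() = default;
    // Copying would lead to double deletes, so it is disabled in this sketch.
    ConstStrDictionary(const ConstStrDictionary &) = delete;
    ConstStrDictionary &operator=(const ConstStrDictionary &) = delete;

    // Delete every instance pointer; deleting a null pointer does nothing,
    // so removed entries need no special handling here.
    ~ConstStrDictionary() {
        for (ConstStrInfo &info : m_info) {
            delete info.value;
        }
    }

    // Append a new constant and return its index (no duplicate lookup here).
    std::size_t add(const std::string &text) {
        m_info.push_back(ConstStrInfo());
        m_info.back().value = new std::string(text);
        return m_info.size() - 1;
    }

    // Remove a constant: free its string and nullify the pointer.
    void remove(std::size_t index) {
        assert(index < m_info.size());
        delete m_info[index].value;
        m_info[index].value = nullptr;
    }

    // Pointer to the constant's value, or null if it was removed.
    const std::string *value(std::size_t index) const {
        assert(index < m_info.size());
        return m_info[index].value;
    }

private:
    std::vector<ConstStrInfo> m_info;
};
```

Because remove() nullifies the pointer, the destructor can loop over every element without tracking which entries are still in use.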

Initially, an information class instance was created locally in the encode routines and a reference to it was passed to the information dictionary add routine.  For constant strings, a QString instance was created in this local instance.  This was not an issue when the copy of the instance was put into the information vector for a new or reused instance as the instance pointer was copied into the vector.  However, a memory leak occurred when the constant was already in the dictionary and the instance was left hanging (the QString instance was not deleted).  This was previously resolved in the information dictionary template class by having its add routine create the instance only when adding a new item.

To complete the implementation, a constant string dictionary was added to the program unit class with an access function.  The constant string encode and operand text routines were added that simply call the dictionary add and string routines.  Finally, pointers to these functions were added to the associated table entry.  The expected results file for encoder test #1 was updated for the string constants that are now in the output.  A new statement was added with a string containing an embedded double quote character.

[commit 8a85668458]

Constant Number Dictionary

The information template dictionary will be used for the constant number dictionary.  The additional information will hold the double and integer values of the constant, which are obtained from the same members in the token.  Even for double constants that can't be converted to an integer, the integer value (which is not set) is still copied to the information element, but it won't actually be used at run-time.

A constant number information class was defined containing the double and integer values.  This class was given an empty default constructor (required for the QVector class), and a constructor taking a token pointer to initialize the values (required for the information template dictionary).
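A minimal sketch of such a class is shown below.  The names Token, ConstNumInfo and their members are assumptions made for illustration, and std::vector stands in for QVector.  Unlike the blank constructor described above, the sketch zero-initializes the members so that default-constructed elements are well defined.

```cpp
#include <cassert>
#include <vector>

// Hypothetical token: only the two value members mentioned in the text.
struct Token {
    double value;
    int valueInt;
};

// Hypothetical constant number information element.
class ConstNumInfo {
public:
    // Default constructor, needed so the container can create elements.
    ConstNumInfo() {}

    // Constructor used by the dictionary: copy both values from the token.
    explicit ConstNumInfo(const Token *token) {
        assert(token != nullptr);
        m_value = token->value;
        m_valueInt = token->valueInt;
    }

    double value() const { return m_value; }
    int valueInt() const { return m_valueInt; }

private:
    double m_value = 0.0;  // double value of the constant
    int m_valueInt = 0;    // integer value (meaningful only if convertible)
};
```

A container of these elements can then be sized up front and filled by index, for example `std::vector<ConstNumInfo> table(3); table[1] = ConstNumInfo(&token);`.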

To complete the implementation, a constant number dictionary was added to the program unit with allocation and an access function.  The constant number encode and operand text routines were added that simply call the dictionary add and string routines.  Finally, pointers to these functions were added to the associated table entries.  The expected results file for encoder test #1 was updated for the numeric constants that are now in the output.

[commit babfb758ec]

Information Template Dictionary

The numeric and string constant dictionaries have slightly different requirements and so will be handled separately.  Both of these dictionaries will have additional information beyond what is in the base dictionary class, namely the constant values themselves.  For both of these dictionaries, this additional information will be stored in a QVector.  At run-time, the constant data in the QVector will be obtained as an array.  This will allow fast access to the constant values by the indexes stored in the program code.

The Qt QList and QVector classes are almost identical and both can access elements by index.  The big difference is that the elements in a QVector are guaranteed to be stored at consecutive memory addresses.  This allows the data to be accessed like a C/C++ array.  If this consecutive storage is not required, then the QList class should be used, though the only advantage appears to be in the time it takes to prepend items to the beginning of the list (QList is faster).
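The array-style access can be illustrated with a short sketch.  It uses std::vector, which gives the same contiguous-storage guarantee, as a stand-in for QVector (QVector exposes its storage through data() and constData() in the same way); the function name constantAt is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Run-time style lookup: the constants are seen only as a plain array
// and are addressed by the index stored in the program code.
double constantAt(const double *constants, std::size_t count,
                  std::size_t index) {
    assert(index < count);
    return constants[index];
}
```

Given `std::vector<double> constants = {1.5, 2.0, 3.25};`, the call `constantAt(constants.data(), constants.size(), 2)` reads the third constant directly from the vector's storage without going through the container interface.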


An information dictionary template class was created based on the dictionary class, which adds a vector of generic information class elements.  The add routine was overloaded, which first calls the base dictionary add routine.  The base dictionary add routine was modified from just returning whether a new item was added or not, to returning a status indicating whether a new entry was appended to the end of the list, a freed entry was reused, or the entry already exists.  This status is then used to either append a new information element, replace an information element, or do nothing if the item already exists.

The QVector class was chosen over the QList class for the information so that an array can be used at run-time for constants.  There will be other dictionaries with information that won't have this run-time array requirement, but as noted above, the only disadvantage to using QVector instead of QList is prepending to the beginning of the list, and that operation will not be needed.

One requirement for the generic information class is that it must have a constructor that takes a pointer to a token.  For constants, the information is obtained from the token.  Eventually for other dictionaries, other information beyond what is contained in a token may be needed, at which time, additional argument(s) will be added to the constructors and to the information dictionary add routine, which now just takes a pointer to a token.
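The design described in this post can be sketched roughly as follows.  This is an illustration under assumptions rather than the actual implementation: std::vector and std::unordered_map stand in for the Qt containers, and the names Token, EntryStatus, Dictionary, InfoDictionary and LengthInfo are hypothetical.  Removal is also simplified to free an entry immediately.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical token: only its string is modeled.
struct Token {
    std::string text;
};

// Status reported by the base dictionary add routine.
enum class EntryStatus { New, Reused, Exists };

// Simplified base dictionary: key list, key map and a stack of freed indexes.
class Dictionary {
public:
    std::size_t add(const Token &token, EntryStatus *status) {
        auto found = m_keyMap.find(token.text);
        if (found != m_keyMap.end()) {
            *status = EntryStatus::Exists;
            return found->second;
        }
        std::size_t index;
        if (!m_freeStack.empty()) {
            index = m_freeStack.back();  // reuse a freed entry
            m_freeStack.pop_back();
            m_keyList[index] = token.text;
            *status = EntryStatus::Reused;
        } else {
            index = m_keyList.size();    // append to the end of the list
            m_keyList.push_back(token.text);
            *status = EntryStatus::New;
        }
        m_keyMap[token.text] = index;
        return index;
    }

    // Assumes the index currently holds a live entry.
    void remove(std::size_t index) {
        assert(index < m_keyList.size());
        m_keyMap.erase(m_keyList[index]);
        m_keyList[index].clear();
        m_freeStack.push_back(index);
    }

private:
    std::vector<std::string> m_keyList;
    std::unordered_map<std::string, std::size_t> m_keyMap;
    std::vector<std::size_t> m_freeStack;
};

// Information dictionary: adds a vector of Info elements.  Info must be
// constructible from a token pointer.
template <typename Info>
class InfoDictionary : public Dictionary {
public:
    std::size_t add(const Token &token) {
        EntryStatus status;
        std::size_t index = Dictionary::add(token, &status);
        if (status == EntryStatus::New) {
            m_info.push_back(Info(&token));  // append a new element
        } else if (status == EntryStatus::Reused) {
            m_info[index] = Info(&token);    // replace the freed element
        }                                    // Exists: nothing is created
        return index;
    }

    const Info &info(std::size_t index) const {
        assert(index < m_info.size());
        return m_info[index];
    }

private:
    std::vector<Info> m_info;
};

// Example information class: records the length of the token's string.
struct LengthInfo {
    std::size_t length;
    explicit LengthInfo(const Token *token) : length(token->text.size()) {}
};
```

Because the information element is only constructed in the New and Reused branches, adding a key that already exists creates nothing that would later need to be cleaned up.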

[commit 6ffb6901a1]