Friday, November 29, 2013

Information Dictionary Issues

The information dictionary class extended the base dictionary class by adding a vector for additional information and was implemented as a class template (see post from October 6).  The additional information in the constant string dictionary contained a pointer to a string instance (see post from October 6).

The memory leak in the constant string dictionary was caused by how the information dictionary template and constant string information classes were implemented.  The problem occurred when a string in an entry of the information vector was replaced with the same string.  A new information instance was created with a new string pointer, which was put into the information vector, and the old string instance was lost (a memory leak).

When an old program line was dereferenced after the new replacement line was encoded, a string being replaced by the same string had its reference count incremented in the dictionary from one to two by the encode, and the dereference then decremented the count back to one.  However, once the dereference was moved to before the encode, the reference count of the string went from one to zero and the dictionary entry was freed, but the entry in the information vector was not.  When the new line was encoded, a new string instance was created and its pointer overwrote the old string instance pointer.
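The sequence described above can be reproduced with a small model.  This is only a sketch and not the project's code: the StringDictionary and TrackedString classes, the freeInfoOnRemove flag, and the leakedAfterReplace helper are all hypothetical names, and std::string stands in for QString.  With the flag off, removing the last reference frees the dictionary slot but leaves the pointer in the information vector, so the next add overwrites it.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a heap-allocated string; it registers itself so
// that instances nobody deleted can be counted afterwards.
struct TrackedString {
    static std::set<TrackedString *> live;
    std::string text;
    explicit TrackedString(const std::string &s) : text(s) { live.insert(this); }
    ~TrackedString() { live.erase(this); }
};
std::set<TrackedString *> TrackedString::live;

// Hypothetical reference-counted dictionary with one information vector.
class StringDictionary {
public:
    explicit StringDictionary(bool freeInfoOnRemove) : m_freeInfo(freeInfoOnRemove) {}
    StringDictionary(const StringDictionary &) = delete;
    StringDictionary &operator=(const StringDictionary &) = delete;
    ~StringDictionary() { for (TrackedString *p : m_info) delete p; }

    // Encode side: add a reference, creating the entry if it is not present.
    std::size_t add(const std::string &key) {
        auto it = m_index.find(key);
        if (it != m_index.end()) {
            ++m_useCount[it->second];
            return it->second;
        }
        std::size_t index;
        if (!m_freeSlots.empty()) {          // reuse a freed slot
            index = m_freeSlots.back();
            m_freeSlots.pop_back();
        } else {                             // otherwise grow the dictionary
            index = m_useCount.size();
            m_useCount.push_back(0);
            m_info.push_back(nullptr);
        }
        m_index[key] = index;
        m_useCount[index] = 1;
        m_info[index] = new TrackedString(key);  // overwrites whatever was there
        return index;
    }

    // Dereference side: drop a reference and free the slot at zero.
    void remove(std::size_t index) {
        assert(index < m_useCount.size() && m_useCount[index] > 0);
        if (--m_useCount[index] > 0) return;
        m_index.erase(m_info[index]->text);
        m_freeSlots.push_back(index);
        if (m_freeInfo) {                    // the step missing in the leaky case
            delete m_info[index];
            m_info[index] = nullptr;
        }
    }

private:
    bool m_freeInfo;
    std::unordered_map<std::string, std::size_t> m_index;
    std::vector<int> m_useCount;
    std::vector<TrackedString *> m_info;
    std::vector<std::size_t> m_freeSlots;
};

// Replace a line containing "Hello" with a line containing the same string,
// dereferencing before encoding, and report how many strings were orphaned.
int leakedAfterReplace(bool freeInfoOnRemove) {
    {
        StringDictionary dict(freeInfoOnRemove);
        std::size_t oldIndex = dict.add("Hello");  // old line
        dict.remove(oldIndex);                     // dereference old line first
        dict.add("Hello");                         // encode replacement line
    }
    int leaked = static_cast<int>(TrackedString::live.size());
    while (!TrackedString::live.empty())           // tidy up the demonstration
        delete *TrackedString::live.begin();
    return leaked;
}
```

In this model, leakedAfterReplace(false) reports one orphaned string and leakedAfterReplace(true) reports none.  Releasing the information entry together with the dictionary entry is one possible correction, not necessarily the one used in the project.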

While this problem was not difficult to correct, another issue was discovered, this time with the constant number dictionary, where the additional information consisted of a double value and an integer value contained in a structure (see post from October 6).  Each double value is aligned on a double boundary (eight bytes), and because an integer is half the size of a double (four bytes), the compiler inserts four bytes of padding after each element in the vector (wasted memory).
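The padding can be measured directly with sizeof.  The sketch below uses a hypothetical NumberInfo structure (the member names are assumptions) and a fixed-width 32-bit integer.  The exact numbers depend on the platform's alignment rules: on typical 64-bit targets the structure occupies 16 bytes for 12 bytes of data.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout of the constant number information: a double and an
// integer stored together in one vector element.
struct NumberInfo {
    double value;
    std::int32_t intValue;
};

// Bytes of actual data per element versus bytes the element occupies.
constexpr std::size_t usefulBytes = sizeof(double) + sizeof(std::int32_t);
constexpr std::size_t paddingPerElement = sizeof(NumberInfo) - usefulBytes;

// Padding accumulated across a vector holding count elements.
constexpr std::size_t wastedBytes(std::size_t count) {
    return count * paddingPerElement;
}
```

Where the double is aligned on eight bytes, paddingPerElement is four, so a quarter of each element is padding.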

The only way to correct this is to separate the two sets of values by having a double value vector and an integer value vector.  Unfortunately, the information dictionary template class only allows for a single information vector.  A new design is needed for information dictionaries.
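Splitting the values into two parallel vectors might look like the following.  This is a sketch of the idea only and not the new design: the NumberInfoColumns class and its member names are hypothetical.  Both vectors share the same index, so entry i is the pair (doubleValue(i), intValue(i)), and neither vector needs padding between elements.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical storage for constant number information using one vector per
// value type instead of a vector of structures.
class NumberInfoColumns {
public:
    // Append one entry to both vectors and return its shared index.
    std::size_t append(double value, std::int32_t intValue) {
        m_doubles.push_back(value);
        m_ints.push_back(intValue);
        return m_doubles.size() - 1;
    }

    double doubleValue(std::size_t index) const { return m_doubles.at(index); }
    std::int32_t intValue(std::size_t index) const { return m_ints.at(index); }
    std::size_t size() const { return m_doubles.size(); }

    // Bytes of element data held by the two vectors (capacity overhead excluded).
    std::size_t payloadBytes() const {
        return m_doubles.size() * sizeof(double)
            + m_ints.size() * sizeof(std::int32_t);
    }

private:
    std::vector<double> m_doubles;
    std::vector<std::int32_t> m_ints;
};
```

The cost of this layout is that every operation that adds, replaces, or frees an entry has to keep the two vectors in step.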

Program – Dereferencing Replaced Lines

When a line is replaced, references to dictionary entries in the old line must be removed.  This was taking place after the replacement line was encoded.  When a dictionary entry is dereferenced, the entry may no longer be in use, in which case its slot is made available for another entry.  The new line may add new dictionary entries, but with the encode before the dereferencing, a new entry is added to the end of the dictionary if there are no free slots.

It is desirable for new dictionary entries to use slots that may be freed with the old line being replaced.  This will help keep the dictionary from growing larger than it needs to be.  Therefore, the dereference call was moved to before the encode call.  With this change, the results for encoder test #2 changed slightly, but only with respect to the indexes of a couple of dictionary entries.
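The effect of the call order on dictionary size can be seen in a small model.  The Dictionary class and the slotsAfterReplace helper below are hypothetical and only approximate the behavior described here: entries are reference counted, and a slot whose count reaches zero goes onto a free list for reuse.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical reference-counted dictionary that reuses freed slots.
class Dictionary {
public:
    // Encode side: add a reference, creating the entry if it is not present.
    std::size_t add(const std::string &key) {
        auto it = m_index.find(key);
        if (it != m_index.end()) {
            ++m_useCount[it->second];
            return it->second;
        }
        std::size_t index;
        if (!m_freeSlots.empty()) {      // take a freed slot when one exists
            index = m_freeSlots.back();
            m_freeSlots.pop_back();
            m_keys[index] = key;
        } else {                         // otherwise append to the end
            index = m_keys.size();
            m_keys.push_back(key);
            m_useCount.push_back(0);
        }
        m_useCount[index] = 1;
        m_index[key] = index;
        return index;
    }

    // Dereference side: drop a reference and free the slot at zero.
    void remove(std::size_t index) {
        assert(index < m_useCount.size() && m_useCount[index] > 0);
        if (--m_useCount[index] == 0) {
            m_index.erase(m_keys[index]);
            m_freeSlots.push_back(index);
        }
    }

    std::size_t slotCount() const { return m_keys.size(); }

private:
    std::vector<std::string> m_keys;
    std::vector<int> m_useCount;
    std::vector<std::size_t> m_freeSlots;
    std::unordered_map<std::string, std::size_t> m_index;
};

// Replace a line that references "A" with a line that references "B" and
// return how many slots the dictionary ends up holding.
std::size_t slotsAfterReplace(bool dereferenceFirst) {
    Dictionary dict;
    std::size_t oldIndex = dict.add("A");  // old line
    if (dereferenceFirst) {
        dict.remove(oldIndex);             // frees slot 0
        dict.add("B");                     // reuses slot 0
    } else {
        dict.add("B");                     // no free slot, goes to slot 1
        dict.remove(oldIndex);             // slot 0 freed too late to help
    }
    return dict.slotCount();
}
```

In this model, dereferencing first leaves the dictionary at one slot, while encoding first grows it to two.  The entry also lands at a different index, which matches the kind of index changes seen in encoder test #2.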

A previously undiscovered memory error was reported on encoder test #2 when running the memory test script.  The problem occurred in the constant string dictionary with the allocation of the string pointers for the QString instances.  While investigating this problem, another issue was discovered in the constant number dictionary, though this issue is much less serious and only results in wasted memory.  The conclusion was that the information dictionary class (currently defined as a template) needs to be redesigned.

[commit f284a33ac8]