Wednesday, October 8, 2014

Dictionary – Improved Case Handling

Three of the dictionary functions converted the key to look up to upper case if the dictionary is not case sensitive (four of the six dictionaries).  Also, essentially a duplicate of the key was stored in the key list vector and the key map, except that the key was converted to uppercase in the key map (excluding two of the six dictionaries) and so the key map could not be used to reproduce the string of the dictionary entry (hence the duplicate key list).

The std::unordered_map class by default uses the standard hash function to put keys with their value into buckets and the standard equal function for comparing keys already in a bucket (to determine if the key is already in the map).  By providing case insensitive hash and equal functions, unconverted strings can be stored for the keys.  The Key Map alias was modified to add key hash and equal structures:
using KeyMap = std::unordered_map<std::string, EntryValue, KeyHash, KeyEqual>;
Both of these private structures contain a case sensitive member.  The function operator function of the key hash checks this member, and if case insensitivity is not selected, the standard hash function is called.  Otherwise the key string is converted to upper case into a temporary string, which is then passed to the standard hash function:
struct KeyHash {
    CaseSensitive caseSensitive;
    size_t operator()(const std::string &s) const
    {
        if (caseSensitive != CaseSensitive::No) {
            return std::hash<std::string>{}(s);
        }
        std::string s2;
        std::transform(s.begin(), s.end(), std::back_inserter(s2), toupper);
        return std::hash<std::string>{}(s2);
    }
};
The key equal structure is setup similarly except the function operator function has two constant string references arguments (to compare) and returns a boolean value.  If case insensitivity is not selected, the strings are compared with the equality operator.  Otherwise the strings are not equal if their sizes are not equal.  For equal length strings, the characters of the strings are looped through converting each to upper case before comparing.  A mismatch indicates the strings are not equal.  If the end of the loop is reached, the strings are equal.

By default, std::unordered_map constructs defaults for the hash and equal structures.  Both of these structures do not initialize the case sensitive member.  Instances of these structures can be given to the map during initialization, which is done in the constructor (the 10 is the number of hash buckets, which is the default if not specified):
Dictionary(CaseSensitive caseSensitive = CaseSensitive::No) :
    m_keyMap {10, KeyHash {caseSensitive}, KeyEqual {caseSensitive}} {}
The key list vector was removed.  However, a way to look up dictionary entry keys by index was still required, so a vector of iterators was added in its place.  Each iterator points to a key/value pair in the key map.  When a new entry is added, its iterator is put into this vector.  When an entry is removed, the end iterator of the key map (points to one past the end) is put into this vector.

In the add function, the map emplace functions return the iterator of the key/value inserted.  This iterator is put into the iterator vector.  The remove function no longer needs to find the iterator for the index, which is now obtained from the iterator vector.  After removing the key/value, the iterator for the index is set to the key map end iterator.  The debug text function also no longer needs to find the iterator for the index, and it checks if the iterator for the index is the end iterator to determine if there is no entry for the index (instead of a blank key).

For some reason, there is a sporadic memory error occurring when memory testing with these changes.  It is another error from Qt, but it doesn't occur with the same tests each time.  It is also only occurring with Qt 4.8.2, and not with Qt 4.8.4 or Qt 4.8.6.  For now, this problem will be monitored.

[branch stl commit 1f2483e82c]