Interactive BASIC Compiler Project

Wednesday, October 8, 2014

Dictionary – Improved Case Handling

Three of the dictionary functions converted the key being looked up to upper case if the dictionary was not case sensitive (four of the six dictionaries). Also, the key was essentially stored twice, once in the key list vector and once in the key map, except that the key was converted to upper case in the key map (excluding two of the six dictionaries). As a result, the key map could not be used to reproduce the string of the dictionary entry (hence the duplicate key list).

The std::unordered_map class by default uses the standard hash function to put keys with their value into buckets and the standard equal function for comparing keys already in a bucket (to determine if the key is already in the map). By providing case insensitive hash and equal functions, unconverted strings can be stored for the keys. The Key Map alias was modified to add key hash and equal structures:

using KeyMap = std::unordered_map<std::string, EntryValue, KeyHash, KeyEqual>;

Both of these private structures contain a case sensitive member. The function operator function of the key hash checks this member, and if case insensitivity is not selected, the standard hash function is called. Otherwise the key string is converted to upper case into a temporary string, which is then passed to the standard hash function:

struct KeyHash {
    CaseSensitive caseSensitive;
    size_t operator()(const std::string &s) const
    {
        if (caseSensitive != CaseSensitive::No) {
            return std::hash<std::string>{}(s);
        }
        std::string s2;
        std::transform(s.begin(), s.end(), std::back_inserter(s2), toupper);
        return std::hash<std::string>{}(s2);
    }
};

The key equal structure is set up similarly, except that its function operator takes two constant string reference arguments (to compare) and returns a boolean value. If case insensitivity is not selected, the strings are compared with the equality operator. Otherwise the strings are not equal if their sizes are not equal. For equal length strings, the characters of the strings are looped through, converting each to upper case before comparing. A mismatch indicates the strings are not equal. If the end of the loop is reached, the strings are equal.
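
The key equal structure is not listed in this post, so the following is only a minimal sketch of what the description above amounts to. The CaseSensitive enumeration is assumed to be a simple two-value enumeration class, and the cast to unsigned char is an addition of this sketch (std::toupper is not defined for negative character values):

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Assumed definition: the posts only describe a simple yes/no enumeration class.
enum class CaseSensitive { No, Yes };

struct KeyEqual {
    CaseSensitive caseSensitive;

    bool operator()(const std::string &s1, const std::string &s2) const
    {
        if (caseSensitive != CaseSensitive::No) {
            return s1 == s2;  // case sensitive: plain comparison
        }
        if (s1.size() != s2.size()) {
            return false;  // different lengths can never match
        }
        for (std::size_t i = 0; i < s1.size(); ++i) {
            // compare the upper case form of each character pair
            if (std::toupper(static_cast<unsigned char>(s1[i]))
                    != std::toupper(static_cast<unsigned char>(s2[i]))) {
                return false;  // mismatch found
            }
        }
        return true;  // reached the end without a mismatch
    }
};
```

The hash and equal functions have to agree: two keys that compare equal must also hash to the same value, which is why both convert to upper case when the dictionary is not case sensitive.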

By default, std::unordered_map default constructs the hash and equal structures, which would leave the case sensitive member uninitialized since neither structure initializes it. Instead, instances of these structures can be given to the map during initialization, which is done in the constructor (the 10 is the number of hash buckets to start with, which is also the default if not specified with the library in use, though the C++ standard leaves that default implementation-defined):

Dictionary(CaseSensitive caseSensitive = CaseSensitive::No) :
m_keyMap {10, KeyHash {caseSensitive}, KeyEqual {caseSensitive}} {}

The key list vector was removed. However, a way to look up dictionary entry keys by index was still required, so a vector of iterators was added in its place. Each iterator points to a key/value pair in the key map. When a new entry is added, its iterator is put into this vector. When an entry is removed, the end iterator of the key map (points to one past the end) is put into this vector.

In the add function, the map emplace functions return the iterator of the key/value inserted. This iterator is put into the iterator vector. The remove function no longer needs to find the iterator for the index, which is now obtained from the iterator vector. After removing the key/value, the iterator for the index is set to the key map end iterator. The debug text function also no longer needs to find the iterator for the index, and it checks if the iterator for the index is the end iterator to determine if there is no entry for the index (instead of a blank key).
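
The bookkeeping described above can be sketched roughly as follows. The IndexedKeys class and its member names are hypothetical, the free stack for reusing indexes is left out, and the map value is reduced to just the index. One assumption is worth stating loudly: the standard says a rehash of an unordered map invalidates its iterators, so this sketch reserves room up front and is only meant for a small number of entries.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the dictionary; not the project's actual class.
class IndexedKeys {
public:
    using KeyMap = std::unordered_map<std::string, std::size_t>;

    // Reserve buckets so that no rehash (which invalidates iterators)
    // occurs while this sketch holds 64 entries or fewer.
    IndexedKeys() { m_keyMap.reserve(64); }

    std::size_t add(const std::string &key)
    {
        // emplace returns a pair: iterator to the entry, and whether it was inserted
        auto result = m_keyMap.emplace(key, m_iterators.size());
        if (result.second) {
            m_iterators.push_back(result.first);  // remember where the new entry is
        }
        return result.first->second;
    }

    void remove(std::size_t index)
    {
        m_keyMap.erase(m_iterators[index]);    // no find needed, iterator is at hand
        m_iterators[index] = m_keyMap.cend();  // end iterator marks an unused index
    }

    bool used(std::size_t index) const
    {
        return m_iterators[index] != m_keyMap.cend();
    }

    const std::string &key(std::size_t index) const
    {
        return m_iterators[index]->first;  // original, unconverted key string
    }

private:
    KeyMap m_keyMap;
    std::vector<KeyMap::const_iterator> m_iterators;
};
```

Erasing an entry only invalidates the iterator to that entry, so the other stored iterators remain usable after a remove.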

For some reason, there is a sporadic memory error occurring when memory testing with these changes. It is another error from Qt, but it doesn't occur with the same tests each time. It is also only occurring with Qt 4.8.2, and not with Qt 4.8.4 or Qt 4.8.6. For now, this problem will be monitored.

[branch stl commit 1f2483e82c]

Saturday, October 4, 2014

Dictionary – Entry Use Counts

As mentioned in the last post, the use count vector member was removed and a use count was put into the value of the map. A new private Entry Value structure was created to hold the index and the use count of the entry. Both were defined as unsigned 16-bit integers. A constructor was added that takes a single index argument and initializes the use count to 1.

For the add function, the entry in the use count vector no longer needs to be added or initialized (in the case of a reused entry). Instead of assigning a key to an index using the bracket operator, the map emplace function is used with a single index. The use count gets set to 1 by the constructor of the Entry Value structure whether it is a new or a reused entry. For an existing entry, the iterator is used to increment the use count.
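
As a rough illustration of the Entry Value structure and the emplace call, here is a minimal sketch. The member names and the addKey helper are assumptions made for this example, and the handling of the free stack and key list is omitted:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Sketch of the value stored in the key map (member names are assumed).
struct EntryValue {
    EntryValue(std::uint16_t entryIndex) : index {entryIndex}, useCount {1} {}

    std::uint16_t index;
    std::uint16_t useCount;
};

using KeyMap = std::unordered_map<std::string, EntryValue>;

// Hypothetical helper: adds the key with the given index if it is new,
// otherwise bumps the use count of the existing entry.
std::uint16_t addKey(KeyMap &keyMap, const std::string &key, std::uint16_t newIndex)
{
    // the single index argument is forwarded to the EntryValue constructor
    auto result = keyMap.emplace(key, newIndex);
    if (!result.second) {
        ++result.first->second.useCount;  // key already present
    }
    return result.first->second.index;
}
```

Unlike the bracket operator, emplace does not overwrite an existing entry, so the index of an existing key is left alone and only its use count changes.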

For the remove function, since the use count is no longer available by index, the index is used to get the key from the key list member. The key is converted to upper case for case insensitive dictionaries and used to find an iterator for the key. While the index is available in the iterator, it is used to clear the entry in the key list and pushed to the free stack. The iterator is used to erase the entry from the key map (entries can be erased either by key value or iterator).

For the debug text function, since the use count is no longer available by index, a non-empty entry in the key list is used to identify if an entry is used. The key is used to find an iterator to the entry. An error check was added if the key was not found (shouldn't happen). The iterator is used to get the use count. When outputting the indexes of the free stack, the check for a non-zero use count was removed since the use count is no longer available (the entry in the map with the use count was erased, and an entry is not pushed to the free stack unless the use count was zero).

[branch stl commit e79c179ad5]

Dictionary – Case Sensitive Member

The next changes for the Dictionary class are some improvements in its implementation. One of these changes is to integrate the use count into the value of the key map instead of using a separate vector for the use counts of each entry. Putting the use count into the key map value will require the use of iterators to get to the use count since they will no longer be in an indexed vector.

The debug text function (which will be changed to a put stream operator function) will need to use an iterator to look up a key to get the use count. When using iterators, the debug text function will also need to convert the key to upper case for case insensitive dictionaries, which implies that it would also need a case sensitive flag argument like the add and remove functions.

This made me realize that passing a case sensitive flag to all of these functions was not the best design. A better design is to have a case sensitive flag as a member variable initialized once when the dictionary is constructed. Using arguments allows different case sensitivities on a dictionary and this will cause problems. A case sensitive member was added, the constructors were added or modified, and the case sensitive arguments were removed.

[branch stl commit b78217c5a4]

Dictionary – Key Map

The remaining member of the Dictionary class to change to a standard class is the key hash, which was defined as a QHash. This class is equivalent to std::unordered_map, which also stores keys using a hash. The member was renamed to key map and a Key Map alias was added since the map type is rather lengthy:

using KeyMap = std::unordered_map<std::string, int>;

The add routine was changed to use the key as a standard string. The standard string class does not have a member function for converting the string to upper case. For case insensitive dictionaries, the string is converted to upper case using the standard transform function:

std::transform(key.begin(), key.end(), key.begin(), toupper);

The first two arguments specify the range of the string to transform (the entire string). The third argument specifies the destination (back into the string). The last argument is a function that takes and returns a character. To find a key value in the map, the map find function is used, which returns an iterator. A key not found is detected if the iterator is the end iterator of the map. The value in the map (the index of the entry) is obtained using the iterator:

auto iterator = m_keyMap.find(key);
if (iterator == m_keyMap.end()) // key not found?
...
index = iterator->second;

Similar changes were made in the remove function for converting the key to upper case for case insensitive dictionaries. The map erase function is used to remove a key from the map.
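
Put together, the conversion and look up described above might look something like this. The findIndex helper, its boolean flag and the -1 return value are inventions of this sketch, not the project's functions. The lambda (instead of passing toupper directly) is also a choice made here, to avoid handing negative character values to std::toupper:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

using KeyMap = std::unordered_map<std::string, int>;

// Hypothetical helper: returns the index stored for the key, or -1 if not found.
// The key is taken by value so it can be converted in place.
int findIndex(const KeyMap &keyMap, std::string key, bool caseSensitive)
{
    if (!caseSensitive) {
        // transform the entire string back into itself as upper case
        std::transform(key.begin(), key.end(), key.begin(),
            [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    }
    auto iterator = keyMap.find(key);
    if (iterator == keyMap.end()) {  // key not found?
        return -1;
    }
    return iterator->second;  // the value of the key/value pair
}
```

This assumes the keys were stored in upper case when added to a case insensitive dictionary, as the add routine does.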

[branch stl commit 66561d4532]

Thursday, October 2, 2014

Dictionary – Start STL Transition

The Dictionary class will be next to have its debug text function changed to a put stream operator function. Unlike the RPN List and Token classes, access to some of the private variables will be necessary, so the put stream operator function will need to be a friend function. First, these private variables need to be changed from Qt classes to STL classes. Other dependencies on Qt will also be removed. This was started with these changes:

Changed the quint16 type to the standard uint16_t type.
Changed the use count list member to a standard vector.
Changed the free stack member to a standard stack.
Replaced the use of the Q_UNUSED macro with a plain (void) cast, which is just as simple.
Replaced the use of Qt case sensitivity enumerators with a simple yes/no case sensitive enumeration class. The enumerators were never passed to Qt functions, so there was really no reason to use the Qt enumerators.
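
For illustration only, the changes listed above might look roughly like this in code. The structure, function and member names are hypothetical, and the enumerator names are assumed from the CaseSensitive::No value that appears in a later post:

```cpp
#include <cstdint>
#include <stack>
#include <vector>

// Simple yes/no enumeration class in place of the Qt case sensitivity enumerators.
enum class CaseSensitive { No, Yes };

// Hypothetical grouping of the converted members.
struct DictionaryMembers {
    std::vector<std::uint16_t> useCount;  // was a Qt list of quint16
    std::stack<std::uint16_t> freeStack;  // was a Qt stack of quint16
};

// Hypothetical function with an argument that is intentionally unused.
std::uint16_t reuseIndex(DictionaryMembers &members, CaseSensitive caseSensitive)
{
    (void)caseSensitive;  // takes the place of Q_UNUSED(caseSensitive)

    std::uint16_t index = members.freeStack.top();
    members.freeStack.pop();      // std::stack::pop() does not return the value
    members.useCount[index] = 1;  // a reused entry starts with one use
    return index;
}
```

One small difference from the Qt stack: with std::stack, the top value has to be read with top() before calling pop().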

The next change was to replace the QStringList for the list of keys with a standard vector of standard strings. The string access function was changed to return a standard string, which required changing the return value of the various operator text functions accessing dictionaries to a standard string. Several conversions from and to QString values were temporarily added as needed until more variables are changed to standard strings (though some will be needed when interfacing to the GUI).

[branch stl commit 84875530db]

Wednesday, October 1, 2014

Token – Put Stream Operator

The Token class was next to have its text function changed to a put stream operator function. Again, it was not necessary to make the new function a friend of the Token class since there are sufficient access functions. The put stream operator function for the RPN List class was changed to use the new token put stream operator.

[branch stl commit 2b5c581d31]

Tuesday, September 30, 2014

RPN List – Put Stream Operator

There are a number of member functions in various classes that create text from an instance for outputting while running tests. A better implementation of this is to overload the put stream operator (<<). The RPN List class was the first class changed. An RPN list instance is now output like this:

std::cout << rpnList;

The text member function was moved from the RPN List class source file to the Tester class source file (the only current caller) and renamed to the put operator (operator<<). The return value and first argument were changed to an output stream reference. A second argument was added for a constant reference to the RPN list instance. Normally these put stream operator functions are made friend functions of the class so that private members can be accessed, but the RPN List class already provides the necessary access functions.

A few changes were made to the new function. The output stream argument is now used for output instead of the local string stream variable, which was removed. The local index variable (used to create an RPN item pointer to index map) is used to detect the first item in the list instead of the number of characters written to the output stream. And the output stream argument is returned instead of the contents of the local string stream.

If there is a future need for this operator beyond the Tester class, the function can be moved and a function prototype provided in a header file. If this future need is for a string, the string can be obtained by using an output string stream and getting its string:

std::ostringstream oss;
oss << rpnList;
std::string string = oss.str();
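
To make the shape of such an operator concrete, here is a small sketch using a made-up ItemList class in place of the RPN List class (the class, its access functions and the toString helper are all hypothetical). It is not a friend function since the access functions are sufficient, and it uses the index to detect the first item:

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical list class with public access functions only.
class ItemList {
public:
    void append(const std::string &item) { m_items.push_back(item); }
    std::size_t count() const { return m_items.size(); }
    const std::string &at(std::size_t index) const { return m_items[index]; }

private:
    std::vector<std::string> m_items;
};

// Put stream operator: takes and returns the output stream reference.
std::ostream &operator<<(std::ostream &os, const ItemList &list)
{
    for (std::size_t index = 0; index < list.count(); ++index) {
        if (index > 0) {
            os << ' ';  // separator before every item except the first
        }
        os << list.at(index);
    }
    return os;  // allows chaining with further << operations
}

// Getting a string when one is needed, using an output string stream.
std::string toString(const ItemList &list)
{
    std::ostringstream oss;
    oss << list;
    return oss.str();
}
```

Because the operator works on any std::ostream, the same function serves std::cout, std::cerr, files and string streams.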

[branch stl commit 365d20b2b]

Monday, September 29, 2014

Tester – Standard Output Stream

The non-GUI classes use many strings that will be changed from QString to std::string. In the Tester class, these strings are output using Qt text output streams. The next step in the STL transition was to change the Tester class to use std::ostream instead of QTextStream. This also required changes to the Command Line class, which provides the tester instance with the output stream to use for output.

The text stream member of the Command Line class (where either stdout or stderr was opened) was changed to a pointer to an std::ostream (which is now set to a pointer to either std::cout or std::cerr). The cout() member function was changed to return a reference to the output stream and to allow the output stream member to be set from its argument (defaulting to std::cout). Since nothing is opened, nothing needs to be closed, so the coutClose() member function and the destructor were removed.

Wherever a QString is output to the output stream, the toStdString() function was added since QString is not supported by std::ostream. This is temporary until most of the QString instances are changed to std::string. The usage string member was changed to a local std::string since no other code used it, and its access function was removed.

The text stream member of the Tester class was changed to an output stream reference. The toStdString() was added to QString instances (temporary). All uses of endl were changed to the new line character ('\n'). The endl manipulator (both Qt and STL) not only outputs the new line character, it also flushes the stream and this flush was not needed.
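
The arrangement described in this post could be sketched along these lines. The CommandLine class here is cut down to the one member discussed, and the report function is a hypothetical stand-in for the Tester class output:

```cpp
#include <iostream>
#include <ostream>
#include <sstream>

// Hypothetical reduction of the Command Line class to its output stream member.
class CommandLine {
public:
    // Sets the stream to use (std::cout unless told otherwise) and returns it.
    std::ostream &cout(std::ostream *stream = &std::cout)
    {
        m_cout = stream;
        return *m_cout;
    }

private:
    std::ostream *m_cout {&std::cout};  // nothing opened, so nothing to close
};

// Stand-in for tester output: writes through whatever stream it is given.
void report(std::ostream &os, int errorCount)
{
    // '\n' outputs a new line without the flush that std::endl would add
    os << "errors: " << errorCount << '\n';
}
```

A side benefit of taking a std::ostream is that the output can be redirected to an std::ostringstream, for example by calling report(commandLine.cout(&oss), 2) with a local string stream named oss.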

[branch stl commit 5cc2438dd0]

Saturday, September 27, 2014

Error List – Use Standard Vector

Like other classes, the Error List class was publicly derived from the QList class. It was changed to contain the list as a private std::vector member. Several access functions were added for access to the list (since the list is no longer public) including the bracket operator (constant and non-constant), constant at, clear and count. The at and count names were used so that callers didn't need to be changed.

A binary search was previously used to find an error by a line number or the closest error not greater than the line number. The std::lower_bound() function performs a similar operation except that it returns an iterator to an element instead of an index. There are two forms of this function, one that assumes the elements being searched have the less than operator defined, and the other where a function object is passed defining the comparison.

The Error Item class does not have a less than operator defined so the second form of std::lower_bound() was used. A C++11 lambda function was defined for comparing the line numbers of error items. This lambda function definition and the call to std::lower_bound() was put into a new private find iterator function, which returns the resulting iterator. Since only line numbers are used for searching, a new constructor was added to the Error Item class for initializing just the line number member to pass to std::lower_bound().
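
A minimal sketch of the find iterator function follows. The ErrorItem class is reduced to a hypothetical line number only version, and the function is a free function here rather than a private member:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical reduction of the Error Item class.
class ErrorItem {
public:
    // constructor initializing just the line number, for use as a search value
    explicit ErrorItem(int lineNumber) : m_lineNumber {lineNumber} {}
    int lineNumber() const { return m_lineNumber; }

private:
    int m_lineNumber;
};

using ErrorVector = std::vector<ErrorItem>;

// Returns an iterator to the first item whose line number is not less than
// the given line number (the end iterator if there is no such item).
ErrorVector::const_iterator findIterator(const ErrorVector &errors, int lineNumber)
{
    return std::lower_bound(errors.cbegin(), errors.cend(), ErrorItem {lineNumber},
        [](const ErrorItem &item, const ErrorItem &value) {
            return item.lineNumber() < value.lineNumber();
        });
}
```

As with any binary search, this assumes the vector is kept sorted by line number.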

The find function returns the index of the error item found for a line number or the closest error to the line number. This index is used as an insert point for a new error and is also passed to the edit box instance for maintaining the extra selections list (used for highlighting the errors). This function was modified to call the new find iterator function and to convert the resulting iterator to an index by subtracting the begin iterator of the vector.

In several places in the code, the error index is compared to the size of the error list. Unlike with the Qt classes, the size of STL classes is returned as a size_t type, which is an unsigned integer. Throughout the code, the type of the error index variable was changed from an integer to either size_t or, where possible, auto.

The find index function is used to only return an index of an error item for a line number, or a value indicating the line does not have an error (a -1 value was used). This function was also modified to call the new function, but a -1 value could not be used since the index returned is unsigned. Instead, an index one past the end (in other words, the size of the vector) is returned to indicate a line number without an error. The callers of this function were modified accordingly.

The std::vector class uses iterators instead of indexes to insert and remove items from the vector. Similar to converting an iterator to an index, an index is converted to an iterator by adding it to the begin iterator of the vector. The insert and remove at access functions were modified accordingly.
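
The conversions between indexes and iterators can be sketched as below. To keep it short, the error items are reduced to a plain vector of line numbers, and the function names are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Lines = std::vector<int>;  // sorted line numbers that have errors (simplified)

// Returns the index of the line number, or the size of the vector
// (one past the end) if the line does not have an error.
std::size_t findIndex(const Lines &lines, int lineNumber)
{
    auto iterator = std::lower_bound(lines.begin(), lines.end(), lineNumber);
    if (iterator == lines.end() || *iterator != lineNumber) {
        return lines.size();  // unsigned, so -1 cannot be used here
    }
    // iterator to index: subtract the begin iterator
    return static_cast<std::size_t>(iterator - lines.begin());
}

// index to iterator: add the index to the begin iterator
void insertAt(Lines &lines, std::size_t index, int lineNumber)
{
    lines.insert(lines.begin() + static_cast<Lines::difference_type>(index), lineNumber);
}

void removeAt(Lines &lines, std::size_t index)
{
    lines.erase(lines.begin() + static_cast<Lines::difference_type>(index));
}
```

A caller then checks for the not found case by comparing the returned index against the size of the vector instead of testing for a negative value.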

[branch stl commit 22e8013fb4]

Wednesday, September 24, 2014

New Status Message Class

The final lower-level class with translate functions was the token class, which contained a static function for converting a status code to an error message. This function was moved to a new status message class as there was no other logical class to put it in. A plain function could not be used since the easiest way to use the translate functions is to wrap them inside of a class using the Q_DECLARE_TR_FUNCTIONS() macro.

This new status message class only contains this lone static text function. To prevent an instance of this class from being created, the default constructor was deleted using the C++11 delete feature (previous to C++11, this was accomplished by making the default constructor private). To prevent this class from being used as a base class, the C++11 final keyword was added to the class definition:

class StatusMessage final
{
    Q_DECLARE_TR_FUNCTIONS(StatusMessage)

    StatusMessage() = delete;
public:
    static const QString text(Status status);
};

The three callers of the text function are the main window class (by the status bar update slot function), the program model class (by the debug text function used by the temporary program view widget), and the tester class (by the print error function). Each of these was changed to use the new status message text function.

Currently no translation is loaded, so no translation occurs. When translation gets added, the tester class should not do any translation, otherwise the expected results will not match (because they are in the default English). Therefore, when translation does get added, if a test option is selected from the command line, no translations will be loaded.

After making these changes, two minor issues were found in the table class header file. The first was that this header was relying on the token header file to include the Qt core application header (which contains the translate functions macro), so this include was added. The second was that the argument to this macro incorrectly contained the context name Test instead of Table. This context is used by the translate utility to identify what the translatable strings belong to. This did not cause a compile error, but was corrected. The table class will be redesigned to not require the translation functions (used for a few error messages).

[branch err-msgs commit 1047df7065]

This concludes the changes to use status codes throughout until an actual error message is needed. The err-msgs branch was merged into the develop branch and deleted. A new branch will be created for the next set of C++11 related changes, which will be the replacement of more Qt with the STL in the non-GUI classes.

[branch develop merge commit 6d5ea9367f]

Tuesday, September 23, 2014

Error Item – Error Messages

The error item contains information about an error, which includes the type (none, input or code), line number, column, length and previously the error message. Continuing with the change from error messages to error status codes, the error message member was changed to an error status code. This required a few additional changes.

The cursor changed signal is emitted from the edit box instance and is connected to the status bar update slot of the main window instance. Previously the argument of this signal and slot was the error message string. These arguments were changed to the status code. The slot was changed to obtain the error message for the error status code. If the status code was the default (same as the good status), a blank message is displayed on the status line.

The cursor changed signal previously obtained the error message from the error message access function of the program model. This function obtained the error message for a line number by accessing the item in the error list in the program model. If there was no error on the line a blank string was returned. Since the items in the error list now contain error status codes, the status code is returned. If the line does not have an error, the default status code (good) is returned.

Finally, in the tester class, the error string argument of the print error function was changed to an error status code since that's what all the callers now have. The error message is now obtained from the status code argument in the print error function instead of by each of the callers obtaining the error message.

[branch err-msgs commit 0050184ac7]

Sunday, September 21, 2014

RPN List – Error Messages

If a line being translated contains an error, the error column and length members of the RPN list were set to the error. Previously the translator also set the error message member to the error message. Continuing with the change from error messages to error status codes, this error message was changed to an error status code. The translator was modified to set the error status instead of the error message. The receivers of an RPN list, the program model and tester classes, were modified to take the error status code from the RPN list and get the error message for the status code.

[branch err-msgs commit c9bad9f36b]

Parser – Error Messages

Previously the parser returned detected parse errors by setting the token type to an error and setting the string member (normally used to hold the string of the token) to the error message. Since these messages need to be translated, the parser required the Qt translate functions. The parser was modified to use status error codes instead of messages.

The token does not have a status member, and only the parser would need to use it for errors. Instead, an error status member was added to the parser class with an access function. If the token has an error, the caller obtains the status error code by calling the access function. The set error access functions of the token class were also moved to the parser class, since only the parser class uses them. These were modified to set the parser error status code instead of setting the token string to an error message.

The parser error messages were changed to status codes and appropriate enumerators were added to the status enumeration, and the messages were added to the switch statement in the static token error message function. The result is that the parser class is no longer dependent on the Qt translate functions.

The two users of the parser class, the translator and the tester classes, now retrieve the error status from the parser when the token has an error and this error status is used to get the error message. Previously they obtained the error message from the token where the parser put it.
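
The general pattern, stripped of everything specific to the project, might look like the sketch below. The Status enumerators, the Token structure, the parseNumber function and the message text are all hypothetical and much simpler than the real classes. A plain std::string is used for the message, whereas the real function returns a translated QString:

```cpp
#include <cctype>
#include <string>

// Hypothetical status enumeration (the real one has many more enumerators).
enum class Status { Good, UnrecognizedCharacter };

// Hypothetical token: holds only an error flag and its string.
struct Token {
    bool error {false};
    std::string string;
};

class Parser {
public:
    // Toy parse function: accepts only digits, otherwise reports an error.
    Token parseNumber(const std::string &input)
    {
        Token token;
        m_errorStatus = Status::Good;
        for (unsigned char c : input) {
            if (!std::isdigit(c)) {
                setError(token, Status::UnrecognizedCharacter);
                return token;
            }
        }
        token.string = input;
        return token;
    }

    Status errorStatus() const { return m_errorStatus; }  // access function

private:
    // Marks the token as an error and records the status in the parser.
    void setError(Token &token, Status status)
    {
        token.error = true;
        m_errorStatus = status;
    }

    Status m_errorStatus {Status::Good};
};

// The user of the parser converts a status code to a message only when needed.
std::string statusText(Status status)
{
    switch (status) {
    case Status::Good:
        return "";
    case Status::UnrecognizedCharacter:
        return "unrecognized character";
    }
    return "";
}
```

With this split, the parser itself contains no message strings, so nothing in it needs a translate function.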

[branch err-msgs commit 84c52f0e9b]

Saturday, September 20, 2014

Error Status And Message Refactoring

The next effort is to replace the Qt classes and Qt dependencies with STL classes in the lower level non-GUI related classes (Token, Parser, Translator, Recreator, RPN List, Error List, Table, and Dictionary). Besides using Qt classes for some of their members, the Token and Parser classes also use the locale translate function. These are used for error messages. (The Table class also uses the translate function for error messages, but this class will be handled separately, and may not be necessary when this class is redesigned.)

In order to remove the dependency on the translate function, the Token, Parser, RPN List, and Error List classes will be modified to only use status codes instead of holding error messages. It will be the responsibility of the user of these classes to convert the status code to an error message when needed. The Translator class is the user of the Token and Parser classes and it will pass the status code along in the RPN list that it returns.

Currently, the Token class contains a static member function for converting a status code to an error message. There is no reason for this function, along with the status enumeration, to be part of the Token class (which doesn't have a status member variable).

The Parser class does not currently use status codes to return errors. Instead, it sets the token type to Error and the string member to the error message. This mechanism needs to be changed so that the Parser uses status codes. This will remove the dependency on the translate function.

The first step was to move the Status enumeration from the Token class to the main application header file, since the Status enumeration shouldn't be part of the Token class. This main header file does not have any dependencies on Qt. The error status and error message refactoring will take place on the new err-msgs branch. The goal of this branch is to remove direct handling of error messages by these lower classes.

[branch err-msgs commit 837f085074]

Thursday, September 18, 2014

End of the Initial C++11 Changes

First, a comment about the single remaining naked new and delete operations in the constant string dictionary mentioned in the last post. An attempt was made to use a vector of standard unique pointers, but it appears that the QString and QVector classes don't play nicely with std::unique_ptr (or perhaps the problem is unrelated to the Qt classes). This issue will be revisited when the dictionaries are transitioned to STL classes.

To end the initial C++11 transition, a number of additional changes were made, each of them minor. These included:

Replacing all uses of the untyped NULL macro with the C++11 nullptr typed null pointer.
Replacing tests against the untyped NULL macro with testing the pointer directly as described recently (though for QString instances, the isNull() function was required to check if the instance contained a null string, which is not the same as an empty string).
Replacing unnamed enumerators to define integer constants with C++11 constexpr statements.
Removing empty constructors and destructors (the compiler generates these by default).
Moving empty constructors that have only member initializations to the header file.
Removing unnecessary include statements (for Qt classes that are no longer used).
Changing to the C++11 universal initializer list syntax throughout (except for when a specific constructor needs to be called, like giving a size to a container, or with a reference variable, which is apparently not allowed to be initialized this way).
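
A few of the items in this list are easier to see in code. The names below (MaxOperands, Node, length, makeOperands) are made up for this sketch and do not come from the project:

```cpp
#include <string>
#include <vector>

// constexpr constant in place of an unnamed enumerator: enum { MaxOperands = 3 };
constexpr int MaxOperands = 3;

// Hypothetical linked node with a typed null pointer as the default.
struct Node {
    std::string name;
    Node *next {nullptr};  // nullptr instead of the untyped NULL macro
};

int length(const Node *node)
{
    int count {0};  // brace initialization
    while (node) {  // pointer tested directly instead of node != NULL
        ++count;
        node = node->next;
    }
    return count;
}

std::vector<int> makeOperands()
{
    // Parentheses are needed here to call the size constructor; with braces,
    // the vector would instead hold a single element with the value 3.
    return std::vector<int>(MaxOperands);
}
```

The last function shows the container size exception mentioned in the final list item: for a vector, braces select the initializer list constructor.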

With the conclusion of the initial C++11 changes, the cpp11 branch was merged into the develop branch and deleted. A new branch will be created for the next set of C++11 related changes.

[branch cpp11 commit 299f71ab5c]
[branch develop merge commit 53175d69b5]

Tuesday, September 16, 2014

Unique Smart Pointers

Another C++11 STL smart pointer class is the unique pointer (std::unique_ptr). This smart pointer is for a single scope and deletes its resource when it goes out of scope (like when a function returns). If used for a class member variable, it deletes its resource when the class instance is deleted or goes out of scope.

Most of the rest of the allocated resources were changed to these unique pointers. Two of these were in local blocks of functions. The rest were members of various classes. For all the classes involved, after the delete operators were removed from the destructors, there was no code left, so the destructors were removed. A default destructor is now generated, which calls the destructors for the unique pointers, causing their resources to be deleted.

One drawback to using unique pointers this way is that forward references can no longer be used in the header files for the classes involved. Forward references could previously be used since only a pointer to the class was used. However, with the destructors now generated by the compiler, the complete definition of the class is needed in the header so that the unique pointer can delete the instance. Therefore, the forward references were replaced with include statements for the headers of the classes.
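
As a small demonstration of a unique pointer member taking over the work of a destructor, consider this sketch. The Resource and Owner classes are hypothetical, and the live count exists only so the effect can be observed:

```cpp
#include <memory>

// Hypothetical resource that counts how many instances currently exist.
struct Resource {
    static int liveCount;
    Resource() { ++liveCount; }
    ~Resource() { --liveCount; }
};

int Resource::liveCount = 0;

// Hypothetical owner: no destructor is written, the generated one destroys
// the unique pointer member, which in turn deletes the resource.
class Owner {
public:
    Owner() : m_resource {new Resource} {}

private:
    std::unique_ptr<Resource> m_resource;
};
```

Creating an Owner in a local block raises the live count to one, and leaving the block brings it back to zero without any delete operator appearing in the code. Note that Resource is fully defined before Owner here, which is the same requirement that forced the include statements in the headers.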

These changes covered all the remaining naked new and delete operations except for one. This remaining naked resource is the string (QString) pointers kept in a vector (QVector) for the constant string information used in the constant string dictionary (that holds the constant strings in the BASIC program). There is probably a better way to implement this, so it will be handled separately.

[branch cpp11 commit 95c680e107]