Saturday, November 8, 2014

Recreator – Separator Member

Since there were a number of standard string to c-style string conversions needed to create a QString in the recreator functions, the Recreator class is the next to be transitioned to the STL.  This was started by changing the separator member from QChar to char, along with its access functions.  An initializer for the separator member was added since the plain char type does not have a default constructor like the QChar class.

Two uses of the separator access function required special handling.  These were in the input assign recreate and assign string recreate functions.  Previously the separator (a QChar) was added to other literal character constants, which produced a string.  Now that the separator is the plain char type, the compiler would treat the plus operator as integer addition of the two characters instead of concatenation.  The separator is therefore temporarily converted to a QChar so that the plus gets compiled as string concatenation instead of character addition.
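The difference between the two meanings of the plus operator can be seen in a small sketch.  To keep it free of Qt, std::string stands in for the QChar conversion described above, and both function names are hypothetical (they are not from the recreator code):

```cpp
#include <string>

// Adding two plain chars promotes both to int, so the result is a number
// (the sum of the two character codes), not a two-character string.
int addChars(char separator, char other)
{
    return separator + other;
}

// Converting the separator to a string type first makes the plus operator
// resolve to concatenation (std::string is a stand-in for QChar/QString).
std::string concatenate(char separator, char other)
{
    return std::string(1, separator) + other;
}
```

With a separator of ',' and a space as the other character, the first function returns an integer while the second returns the two-character string ", ".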

[branch recreator commit 3b5b789198]

Token – Standard String Member

It turned out that changing the token string member to a standard string did not require a lot of other changes.  The string access functions were modified and the c_str function call was removed from the constructor initializers.  The equality operator function was modified to use the plain equality operator for case sensitive comparison (REM, REM operator and string constants) and the new no case string equal function for case insensitive comparison.

In the rest of the code, the c_str function call was removed where a standard string was present, and the toStdString function call was removed from the token string access function where the result is put into a standard string or output in a standard output stream.  In several places in the recreator functions, a c_str function call was added where a QString is needed (the c-style string returned is implicitly converted).  Finally, one line in the translator using the QString::startsWith function with the case insensitive option was changed to convert the first character to upper case before comparing it (only the first character was being compared).

[branch token commit ee7b926daa]

Since there is no more work for the token class, the short-lived token branch was merged to the develop branch.

[branch develop merge commit ee12854e36]

Utility – Case Insensitive Comparisons

The token equal operator function uses case insensitive string comparisons that need to be replaced with standard string equivalents.  There is no direct equivalent, so the solution used so far was to use the std::equal function, passing a case insensitive character comparison lambda function.  Since the std::equal function assumes the second container is at least as long as the first, the sizes of the strings need to be compared first to make sure the strings are the same length (or at least that the primary string is not shorter than the secondary string when doing a string begins with comparison).

This pattern of comparing the string lengths before comparing the strings was being repeated, therefore two inline functions were added to do this.  The first, named no case string equal, checks that the strings are the same length and are equal.  The second, named no case string begins with, checks that the length of the primary string is greater than or equal to the length of the string being compared and that the primary string begins with that string.  The lambda function was renamed to no case character equal for consistency, and was moved from the main header file to the utility header file along with the two new inline functions.
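A minimal sketch of what these three helpers could look like follows.  The camel case spellings of the names, the use of std::toupper, and the argument order are assumptions; the actual utility header may differ:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Case insensitive comparison of two characters.
const auto noCaseCharEqual = [](char c1, char c2) {
    return std::toupper(static_cast<unsigned char>(c1))
        == std::toupper(static_cast<unsigned char>(c2));
};

// True if both strings have the same length and match ignoring case.
inline bool noCaseStringEqual(const std::string &s1, const std::string &s2)
{
    return s1.size() == s2.size()
        && std::equal(s1.begin(), s1.end(), s2.begin(), noCaseCharEqual);
}

// True if s1 is at least as long as s2 and starts with s2 ignoring case.
inline bool noCaseStringBeginsWith(const std::string &s1,
    const std::string &s2)
{
    return s1.size() >= s2.size()
        && std::equal(s2.begin(), s2.end(), s1.begin(), noCaseCharEqual);
}
```

In both functions the size check comes first, so std::equal never reads past the end of the shorter string.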

[branch token commit 1dde64fb7b]

Parser – Standard Input Stream

The parser functions have been modified to use a standard input string stream, so the input member could now be changed to the std::istringstream class, and the input position member removed.  The current input position can be obtained directly from the input stream using the tellg member function.  The skip white space function was removed since this can be done directly on the input stream (in other words, extract white space):
m_input >> std::ws;
In several places, a temporary integer variable is used to hold the current position (tellg) or length (length) because these functions return pos_type and size_type values, which are 64-bit integers.  The token constructors and error structure only accept integers (32-bit).  There is no reason to change the member variable types to 64-bit integers as there will never be input lines or strings long enough to require them.
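As a rough illustration of the two points above, the following sketch skips white space on the stream and then narrows the stream position to a plain int.  The function name is hypothetical, and the explicit cast is just one way to do the narrowing:

```cpp
#include <sstream>
#include <string>

// Skip leading white space, then report the column of the next character.
// The cast narrows the stream's pos_type value to a plain int.
int nextTokenColumn(std::istringstream &input)
{
    input >> std::ws;
    return static_cast<int>(input.tellg());
}
```

For an input line of three spaces followed by a command, this reports column 3.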

When using an input stream, care must be taken when using the tellg function to obtain the current input position.  This function returns an EOF value (-1) once the input stream has been read past the end.  So this function can't be used if a previous operation could possibly have read past the end.
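The tellg behavior can be demonstrated with a short sketch.  Both function names are hypothetical and exist only for this illustration:

```cpp
#include <sstream>
#include <string>

// Returns the position reported by tellg after trying to read one
// character more than the input contains.
int positionAfterOverread(const std::string &text)
{
    std::istringstream input(text);
    char c;
    while (input.get(c)) {
        // consume every character; the final get fails at the end of input
    }
    return static_cast<int>(input.tellg());  // position is no longer valid
}

// Returns the position after reading exactly n characters, where n is
// assumed to be less than the size of the text, so the end is not reached.
int positionAfterReading(const std::string &text, int n)
{
    std::istringstream input(text);
    char c;
    for (int i = 0; i < n; ++i) {
        input.get(c);
    }
    return static_cast<int>(input.tellg());
}
```

The first function yields -1 for any input, while the second yields the number of characters read.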

There are times when the input position must be reset (like when the second word of a possibly two-word command is not valid).  The seekg function is used to reset the input position.  However, this function does not work once the input stream has been read past the end.  This is because the EOF flag is set.  To clear this condition, the clear function needs to be called, so this call precedes all seekg function calls except one where an EOF cannot have occurred.
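The clear and seekg sequence might be sketched as follows.  The function name and the word-at-a-time reading are assumptions made for the illustration; the parser itself reads its input differently:

```cpp
#include <sstream>
#include <string>

// Tries to read `word` as the next word of the input.  If something else
// is found, the input position is restored so the caller can parse it
// another way.  Assumes the stream is in a good state when called, so the
// position returned by tellg is valid.
bool acceptWord(std::istringstream &input, const std::string &word)
{
    const std::streampos start = input.tellg();
    std::string next;
    input >> next;          // may run into the end of the input and set EOF
    if (next == word) {
        return true;
    }
    input.clear();          // reset the EOF/fail flags before seeking
    input.seekg(start);
    return false;
}
```

After a failed match, the stream is back where it was before the call, so the same word can be read again by a different part of the parser.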

In the get string function, the characters read are counted so that the length of the string in the input is known when the token is constructed and returned (pairs of double quotes count as one character in the string, but take two characters in the input string, so must be counted as two).  The ending input position cannot be used to determine the length in the input string because the position is not valid if the string constant is at the end of the line (see issue with tellg function above).
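The counting approach could look something like the sketch below.  The result structure, the function signature, and the choice to include the enclosing quotes in the count are all assumptions and are not taken from the parser source:

```cpp
#include <sstream>
#include <string>

struct StringConstant {   // hypothetical result type
    std::string value;    // contents with each pair of quotes collapsed to one
    int inputLength;      // characters consumed from the input
};

// Assumes the next input character is the opening double quote.
StringConstant getString(std::istringstream &input)
{
    StringConstant result{"", 0};
    char c;
    input.get(c);                       // opening quote
    result.inputLength = 1;
    while (input.get(c)) {
        ++result.inputLength;
        if (c == '"') {
            if (input.peek() != '"') {
                break;                  // closing quote
            }
            input.get(c);               // second quote of a pair
            ++result.inputLength;       // a pair counts as two input chars
        }
        result.value += c;
    }
    return result;
}
```

For the six input characters "A""B" this yields the three-character value A"B with an input length of 6, without ever calling tellg.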

The constructor of the parser was changed to take a standard string input.  Both callers were modified accordingly - the tester class already had a standard string, but the translator needs to convert from its QString to a standard string (until the translator is modified).  Dependency on Qt has almost been removed from the parser except for one call to obtain the name for the REM command (which will be handled when the table is modified).

[branch parser commit 8e71a71fd5]

One outstanding item remains - the token string member is still a QString though its constructors have been modified to take standard string arguments.  This will not be a trivial change since many users of the token string still expect a QString.  Therefore, this work will take place in a new development branch.  This concludes work on the parser, so the parser branch was merged to the develop branch and deleted.

[branch develop merge commit 2cafb22a8e]