Interactive BASIC Compiler Project: October 2014

Thursday, October 30, 2014

Parser – Unique Pointers

The parser routines create a token held in a shared pointer upon return. The main function operator routine returns this shared pointer. The token is not actually being shared, just moved until it reaches the caller. There is no reason to use a shared pointer in the parser as a standard unique pointer is sufficient. The parser routines were changed to return a unique pointer. Another alias was added for a unique token pointer:

using TokenUniquePtr = std::unique_ptr<Token>;

Unfortunately, C++11 has no std::unique_ptr equivalent of the std::make_shared() function for std::shared_ptr (std::make_unique() was added in C++14), so unique pointers must be initialized by passing the result of the new operator to the unique token pointer alias constructor:

return TokenUniquePtr{new Token {pos, len, type, dataType, m_input}};

The callers of the parser operator function did not need to be modified since std::shared_ptr has a constructor that takes a unique pointer as an argument (the shared pointer takes ownership from the unique pointer). The table new token function was also modified to return a unique token pointer.
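The ownership hand-off described above can be sketched as follows. The Token structure and the makeToken() and callerSide() functions are simplified, hypothetical stand-ins rather than the project's real classes; only the two pointer aliases come from the post.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified token; the real Token class has more members.
struct Token {
    int column;
    int length;
    std::string string;
    Token(int col, int len, std::string str)
        : column{col}, length{len}, string{std::move(str)} {}
};

using TokenPtr = std::shared_ptr<Token>;
using TokenUniquePtr = std::unique_ptr<Token>;

// Parser-style routine: the token has a single owner, so a unique pointer
// is enough. C++11 has no std::make_unique, so new is wrapped directly.
TokenUniquePtr makeToken(int col, int len, const std::string &str)
{
    return TokenUniquePtr{new Token {col, len, str}};
}

// Caller that still stores a shared pointer: the returned unique pointer is
// an rvalue, so std::shared_ptr's converting constructor takes ownership.
TokenPtr callerSide()
{
    TokenPtr token = makeToken(0, 3, "REM");
    return token;
}
```

No std::move() is needed at the call site because the function's return value is already a temporary.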

One other small, unrelated change was made to the get identifier routine where the REM command token is created. This code was simplified because it was not necessary to copy the comment string from the input into a temporary string before creating the token; the string can be passed directly when the token is created, and the new position can simply be set to the length of the input string. This was already done for the remark operator in the get operator routine.

[branch parser commit 336ad07bf8]

Wednesday, October 29, 2014

Parser – Operator Tokens

The get operator routine was modified to create a new token upon returning when a valid token is found. If the first character is not the start of an operator, a default token pointer is returned. The existing table new token function is used to create the new token upon return. The flow of the function was cleaned up by checking for an invalid operator first, a remark operator next and finally for a two-character operator.
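The check order described above (invalid operator first, then the remark operator, then two-character operators) might look roughly like this sketch. It is not the project's code: the real parser looks operators up in the table, whereas here a fixed character set stands in for that lookup, and the single quote is assumed to be the remark operator. The Token structure is a hypothetical simplification.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical, simplified token holding only the operator text.
struct Token {
    std::string text;
};

using TokenUniquePtr = std::unique_ptr<Token>;

// Hypothetical get-operator routine; a fixed character set stands in for the
// table lookup, and the remark operator is assumed to be a single quote.
TokenUniquePtr getOperator(const std::string &input, std::size_t &pos)
{
    static const std::string operatorStart = "+-*/^=<>(),;:'";
    static const char *const twoCharOperators[] = {"<=", ">=", "<>"};

    // 1. Not the start of an operator: return a default (null) pointer.
    if (pos >= input.size()
        || operatorStart.find(input[pos]) == std::string::npos) {
        return TokenUniquePtr{};
    }
    // 2. Remark operator: the rest of the line belongs to the token, and the
    //    new position is simply the length of the input.
    if (input[pos] == '\'') {
        TokenUniquePtr token{new Token {input.substr(pos)}};
        pos = input.size();
        return token;
    }
    // 3. Two-character operators are checked before single characters.
    for (const char *twoChar : twoCharOperators) {
        if (input.compare(pos, 2, twoChar) == 0) {
            pos += 2;
            return TokenUniquePtr{new Token {twoChar}};
        }
    }
    return TokenUniquePtr{new Token {std::string(1, input[pos++])}};
}
```

The token is created only on a return statement, so nothing is allocated when the input does not start with an operator.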

In the main function operator routine, the call to get operator was changed like the other get function calls, and the member token initialization was finally removed along with the token member.

[branch parser commit 632ce89f80]

Parser – Constant String Tokens

The get string routine was modified to create a new token upon returning when a valid token is found. If the first character is not the start of a string constant (a double quote), a default token pointer is returned. A token constructor was added to support creating a string constant token, which in addition to the column and length takes the string constant without the surrounding double quotes.

This routine was changed from setting characters into the token string (using a length index counter that was not otherwise used) to simply appending the characters to a local string, since the token is not created until the return statement. The former approach would not work with standard strings. This local string is moved into the constructor; this has no effect with a QString (a class without move support is copied instead), but will with standard strings.
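A minimal sketch of the append-then-move approach with a standard string is shown below. The Token structure and getString() signature are hypothetical simplifications, and for brevity the sketch assumes a string constant ends at the next double quote or at the end of the input; escape handling is not shown.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified token for a string constant (quotes stripped).
struct Token {
    int column;
    int length;
    std::string string;
    Token(int col, int len, std::string str)
        : column{col}, length{len}, string{std::move(str)} {}
};

using TokenUniquePtr = std::unique_ptr<Token>;

// Hypothetical get-string routine over a std::string input. Assumes the
// constant ends at the next double quote or at the end of the input.
TokenUniquePtr getString(const std::string &input, std::size_t &pos)
{
    if (pos >= input.size() || input[pos] != '"') {
        return TokenUniquePtr{};  // not the start of a string constant
    }
    std::size_t start = pos++;  // column of the opening quote
    std::string value;          // characters are appended, not indexed
    while (pos < input.size() && input[pos] != '"') {
        value += input[pos++];
    }
    if (pos < input.size()) {
        ++pos;  // skip the closing quote
    }
    int column = static_cast<int>(start);
    int length = static_cast<int>(pos - start);
    // The local string is moved (not copied) into the new token.
    return TokenUniquePtr{new Token {column, length, std::move(value)}};
}
```

The token length covers the surrounding quotes while the stored string does not, matching the constructor description above.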

The set string character token access function was removed since this routine was the only caller. In the main function operator routine, the call to get string was changed like the other get function calls with the member token initialization moved below this.

[branch parser commit 87e4ed4fb8]

Tuesday, October 28, 2014

Parser – Constant Number Tokens

The get number routine was next to be modified to create a new token upon returning when a valid token is found. The character parsing part of the routine was left intact except that the two places where no number is found were changed to return a default token pointer. The token creation lines at the end were replaced with return statements creating a token in a shared pointer. Two more constructors were added to the token class to support these return statements.

The first, in addition to the column and length, takes the string of the number and the integer value of the number, and automatically sets the type to constant and the data type to integer. The integer value member is initialized to the integer value; however, the double value member is set to the integer value in the body to do the conversion from integer to double (which can't be done with a brace initializer, since converting an int to a double inside braces is a narrowing conversion).

The other constructor also takes the string of the number, the double value and a flag for whether a decimal point was present, and sets the type to constant. The body checks if the double value is within the range of an integer, and if it is, sets the data type to integer, and sets the double sub-code only if there was a decimal point. The translator uses this sub-code to determine if a constant can be used as a double even though the data type is integer (so a hidden conversion from integer to double code is not needed). For values outside the integer range, the data type is set to double (indicating conversion to an integer is not possible).

The body of the second constructor was taken from the get number routine because this code primarily sets token members (via access functions), and it seemed appropriate to do this within the token class. Since the body was not trivial, the constructor was put into the token source file and not the header file. Another reason was that the C-style integer minimum and maximum constants were replaced with the C++ standard numeric limits from the limits header file (there is no reason to burden source files including the token header file with another header file).
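The two number constructors described above can be sketched like this. The class is a hypothetical subset of the Token class: the member names, the boolean standing in for the double sub-code, and the truncating conversion to the integer value are all assumptions. For brevity everything is inline here, whereas the post places the second constructor in the source file.

```cpp
#include <limits>
#include <string>
#include <utility>

enum class Type { Constant };
enum class DataType { Integer, Double };

// Hypothetical subset of the Token class; member names are illustrative.
class Token {
public:
    // Integer constant: type and data type are set automatically.
    Token(int column, int length, std::string string, int value)
        : m_column{column}, m_length{length}, m_type{Type::Constant},
          m_dataType{DataType::Integer}, m_string{std::move(string)},
          m_valueInt{value}, m_doubleSubCode{false}
    {
        // m_value{value} in the initializer list would be ill-formed
        // (int to double inside braces is a narrowing conversion).
        m_value = value;
    }

    // Constant parsed as a double; decimal is true if a '.' was present.
    Token(int column, int length, std::string string, double value,
          bool decimal)
        : m_column{column}, m_length{length}, m_type{Type::Constant},
          m_dataType{DataType::Double}, m_string{std::move(string)},
          m_value{value}, m_valueInt{0}, m_doubleSubCode{false}
    {
        if (value >= std::numeric_limits<int>::min()
            && value <= std::numeric_limits<int>::max()) {
            m_dataType = DataType::Integer;
            m_valueInt = static_cast<int>(value);  // truncation: an assumption
            // Usable as a double without a hidden conversion.
            m_doubleSubCode = decimal;
        }
        // Otherwise the data type stays Double (no integer conversion).
    }

    Type type() const { return m_type; }
    DataType dataType() const { return m_dataType; }
    double value() const { return m_value; }
    int valueInt() const { return m_valueInt; }
    bool hasDoubleSubCode() const { return m_doubleSubCode; }

private:
    int m_column;
    int m_length;
    Type m_type;
    DataType m_dataType;
    std::string m_string;
    double m_value;
    int m_valueInt;
    bool m_doubleSubCode;
};
```

The initializer lists follow the member declaration order, so compilers will not warn about reordering.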

In the main function operator routine, the call to get number was changed like the call to get identifier, with the member token initialization moved below it. It may appear redundant to declare an if-scoped token pointer at each if statement, but if there were a single token pointer for the entire routine, it would first be initialized to a default value, then reassigned at each if statement. The if-scoped variable is initialized directly with the return value of the get routine.
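A minimal sketch of the if-scoped pattern follows. The getIdentifier(), getNumber() and nextToken() functions here are hypothetical stand-ins that only look at the first character; each returns a default (null) pointer when it does not match.

```cpp
#include <cctype>
#include <memory>
#include <string>

// Hypothetical, simplified token recording only what kind it is.
struct Token {
    std::string kind;
};

using TokenPtr = std::shared_ptr<Token>;
using TokenUniquePtr = std::unique_ptr<Token>;

// Hypothetical get routines: each returns a default (null) pointer when the
// input does not start with its kind of token.
TokenUniquePtr getIdentifier(const std::string &input)
{
    if (input.empty() || !std::isalpha(static_cast<unsigned char>(input[0]))) {
        return TokenUniquePtr{};
    }
    return TokenUniquePtr{new Token {"identifier"}};
}

TokenUniquePtr getNumber(const std::string &input)
{
    if (input.empty() || !std::isdigit(static_cast<unsigned char>(input[0]))) {
        return TokenUniquePtr{};
    }
    return TokenUniquePtr{new Token {"number"}};
}

// Each if-scoped pointer is initialized directly from a get routine (the
// unique pointer converts to a shared pointer) and is tested as a boolean.
TokenPtr nextToken(const std::string &input)
{
    if (TokenPtr token = getIdentifier(input)) {
        return token;
    }
    if (TokenPtr token = getNumber(input)) {
        return token;
    }
    return TokenPtr{};  // nothing matched
}
```

Each token variable exists only inside its own if statement, so no default-constructed pointer has to be reassigned along the way.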

[branch parser commit d328c0a720]

Monday, October 27, 2014

Parser – Create Token As Needed

The parser will be modified to create a token only when a valid token is found in the input string and is returned directly. This means that the token will be created on a return statement, which is automatically moved to the caller since the created token (in a shared pointer) is temporary and going out of scope.

Once all the get routines in the parser are changed, it will no longer be necessary to have a member variable to hold the token, and there will be no worry of a token being left allocated for an error. For now the returned token will be a shared pointer, though this is not necessary. The return pointer will later be changed to a unique pointer, which can be assigned to a shared pointer.

The get identifier routine was the first to be modified. Most of the token creation lines were replaced with return statements creating a token in a shared pointer:

return std::make_shared<Token>(pos, len, type, dataType, m_input);

To support this, two new constructors were added to the token class. The first, shown above, takes the column, length, type and data type values plus the input string (from which the token string is created using the column and length values). The second takes a code and an optional string, and is used by a new token function added to the table class, which uses the table to set the type and data type values of the new token.

Once all the locations in the get identifier routine were replaced, it could be seen that the code was repetitive, so the whole function was reorganized and reduced. If no valid identifier token is found, a default token pointer is returned, which the caller can check as a boolean.

The main function operator routine was modified to support this partial transition. When the end-of-line is reached, a new token is created and returned (using the new token table function). The get identifier routine is called in an if statement by itself, receiving the return value in an if-scoped variable, which is returned if set:

if (TokenPtr token = getIdentifier()) {
    return token;
}

For now, the existing creation of a new token was moved to after the statements above. It will continue to be moved as each get routine is changed until all have been changed, at which time it will be removed along with the token member pointer.

[branch parser commit e34fbccacc]

Sunday, October 26, 2014

Parser – STL Preparations

The changes required to make the parser routines use the STL are going to be extensive, but an attempt will be made to break the changes into smaller incremental changes. Since the parser routines make use of various table functions, it will be necessary to modify the table entries and its functions to use STL. Some preparatory changes were made.

The table entries are divided into several groups for searching, which includes plain word, parentheses words, data type words and symbols. The parser utilizes these groups when searching if strings have a code. The data type words section was empty and upon consideration it was concluded that this group is not needed. This group may have been originally conceived for internal functions that don't have arguments (for example, a DATE$ function). However, these internal functions can go into the plain word group (the RND no argument function is already in this group). This group type along with its bracketing entries were removed.

[branch parser commit 0043105154]

The issue of the token being left allocated when the parser throws an exception could be resolved by not creating a token until a valid token is found. Uses of the token member before an exception is thrown were examined, and the only use was in the creation of the error exception. The only token members used were its column and length. The column was always the same as the current input position (except in one instance) and the length was always 1.

All of the throw statements were modified to not use the token instead using the current input position with a length of 1. One case used a length of 2 (2 was previously used). For the "floating point constant is out of range" error, the statement to set the new input position was moved to after an error is thrown.

For the "expected sign or digits for exponent in floating point constant" error, two columns were reported, the column at the beginning of the number (for operator state) and the alternate column at the beginning of the error (for operand state). A number token is no longer accepted when invalid, so only the alternate column was being used. This mechanism was not needed, and for this error, the position of the error only is reported. This mechanism was also removed from the tester print error function.

[branch parser commit e92da57ef1]

Saturday, October 25, 2014

Parser – Exceptions

When an error was detected, the parser set its internal error status to an enumerator for the error (either unknown token or a number error) and returned an error token with its column and length set to the error. So to throw an exception instead, the status, column and length values need to be included in the exception thrown. A simple Error structure was added to hold this information. The parser routines were modified to throw an error exception when detecting an error, with the necessary values included in this structure:

throw Error {Status::Error, m_token->column(), m_token->length()};

The set error functions were removed along with the error status member and its access function. Since the Error token type enumerator no longer indicates an error token, it was removed. The table entries that used this enumerator were changed to the default token enumerator (which required the first enumerator to be set to 1). A leftover check in the get identifier routine that set the token string to "invalid two work command" for an error token type was removed, since this can no longer occur.

The translator get token routine was modified to catch parser exceptions (using C++ try and catch blocks). For no exception, the Good status enumerator is returned. For an exception, an error token is created from the column and length in the error structure. (The token constructor was modified to take an additional length argument and to use the C++ initializer syntax.) The rest of the catch section remains the same except that the status in the error structure is returned. The creation of an error token may be removed later if the translator is modified to use the exception model for handling errors.
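The throw and catch flow between the parser and the translator can be sketched as below. The Error structure mirrors the three values named in the post, but the Status enumerators shown, parseChar() (which treats '@' as an unknown token) and the getToken() signature are hypothetical simplifications; the real get token routine creates an error token instead of filling in output arguments.

```cpp
#include <cstddef>
#include <string>

enum class Status { Good, UnknownToken };  // hypothetical subset

// Information carried by a parser exception.
struct Error {
    Status status;
    int column;
    int length;
};

// Hypothetical parser step: throws instead of returning an error token,
// reporting the current input position with a length of 1.
char parseChar(const std::string &input, int pos)
{
    char c = input[static_cast<std::size_t>(pos)];
    if (c == '@') {
        throw Error {Status::UnknownToken, pos, 1};
    }
    return c;
}

// Translator-style caller converting the exception back into a status.
Status getToken(const std::string &input, int pos, int &errorColumn,
                int &errorLength)
{
    try {
        parseChar(input, pos);
        return Status::Good;  // no exception
    }
    catch (const Error &error) {
        // The real translator creates an error token from these values.
        errorColumn = error.column;
        errorLength = error.length;
        return error.status;
    }
}
```

Catching by const reference avoids slicing and copying, which is the usual idiom even for a small structure like this.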

The tester parse input routine was also modified to catch parser exceptions. The while loop was replaced with a forever loop and the more flag removed. For no exception, the print token routine is called. The routine continues with the next token unless an End-of-Line token was returned. For an exception, the print error routine is called directly, and the routine returns immediately. The use of exceptions in the parse input routine allowed some simplifications in these print routines.

The print token previously had an error status argument. If the token type was an error, then the error status was passed to the print error routine with the token column and length. This check and call was removed, and so was the error status argument. The tab argument was always true, so it was also removed. The column, length and status arguments of the print error function were replaced with an error structure reference argument. This required creating a temporary error structure in the translate input and encode input routines (perhaps later the translator will be modified to throw an error and the program module modified to return an error structure).

It should be noted that the token created at the beginning of the parser function operator routine is left allocated when an exception is thrown. Previously the token was moved to the caller when it contained an error, the same as with a good token. This is not a problem because the parser will go out of scope once the translator handles the error and returns. The parser routines may be able to be changed to only create a token when a good token is found. This will be considered as the parser is changed to use the STL.

[branch parser commit 14265956f7]

Parser Errors – Removed Data Type

When the parser returned an error, it set the data type of the token to Double to indicate a number error or None for an unknown token. This was necessary since a number error could be returned when the parser was in operator state. This no longer occurs after the last change as the unknown token error is returned if the parser finds a character of a number when numbers are not allowed, so the error status alone can be used to determine the error type. Setting of the token data type for errors was removed. This reduces the amount of data to send back with an exception.

When the parser returned an error, the get token function of the translator returned the special Parser status enumerator. The translator routines used this enumerator along with the token data type to determine if the error was an unknown token or a number error. The parser error type can now be determined directly from the error status from the parser, so the get token function was changed to return this status instead of the Parser enumerator.

The checks in the rest of the translator routines for the Parser enumerator and None data type were changed to just check for unknown token. The check in the get operand routine to set the token length to 1 for non-references when there was a parser error was removed since this is the only possible error. When getting the token after an argument in the process internal function routine, the check for a unary operator (an error) was moved to before the check for an error, which was changed to return all errors except unknown token. When this token was not a comma or closing parentheses the appropriate error is determined for the error or bad token.

The special Parser status enumerator was no longer used, so it was removed. This concludes all the prep work for changing parser errors into exceptions.

[branch parser commit ae9e97696e]

Thursday, October 23, 2014

Parser – Numbers (Operand State)

Upon making the next set of changes, I realized that State with Operator and Operand enumerators were not accurate terms for what the parser did with this option. All Operator state did was prevent numeric constants, but still allowed other operand type tokens (like string constants, identifiers, and functions). This was renamed Number with Yes and No enumerators, which more clearly expresses what the code does.

This change made it obvious where some simplifications could be made to the get token function of the translator. The value of the number (previously state) argument was selecting number (Yes) if the desired data type was not the default data type value (indicating the caller wants an operator token). When the desired data type is string, the number token would be invalid, so the condition of the desired data type is not string was added to the setting of the number argument.
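The number argument selection described above, together with the simplified expression expected check from the next paragraphs, can be sketched as two small functions. The DataType and Status enumerations here are hypothetical; in particular, the sketch assumes the value-initialized DataType{} is the "expecting an operator" default.

```cpp
// Hypothetical enumerations: DataType{} (value-initialized, i.e. Default)
// is assumed to mean "expecting an operator".
enum class DataType { Default = 0, None, Double, Integer, String, Any };
enum class Number { No, Yes };
enum class Status { Good, UnknownToken };

// Numbers are only parsed when an operand is wanted (not the default data
// type) and that operand is not a string.
Number numberArgument(DataType dataType)
{
    return dataType != DataType{} && dataType != DataType::String
        ? Number::Yes : Number::No;
}

// Expecting an operand (not the default, and not None, which allows a PRINT
// function) and the parser found an unknown token: expression expected.
bool expressionExpected(DataType dataType, Status status)
{
    return dataType != DataType{} && dataType != DataType::None
        && status == Status::UnknownToken;
}
```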

Number tokens no longer are returned when looking for a string (an unknown token error will be returned). For error tokens, there was a check for operator state and the data type was double (indicating a number error), or the desired data type is string, which set the token length to 1 (so the error only points to the first character) and the token data type set to None (to indicate to callers that there is no number error). These situations now return an unknown token error for numbers with the length set to one and data type set to None, so this check was not needed.

The other check was much more involved (and confusing), but basically said: if looking for an operand and there was not a number error, return an expression expected error; otherwise return a parser error. This check was changed to: if the desired data type is not empty (expecting an operator) and not None (indicating a PRINT function is allowed) and there was an unknown token error, then return an expression expected error.

[branch parser commit 5690b1b4e9]

Wednesday, October 22, 2014

Parser – Operand State

When the translator is expecting an operator, both types of parser errors (unrecognizable character or numeric constant) are treated the same, and the error reported is appropriate for the situation (expecting an operator, comma, closing parentheses, etc.). However, when the translator is expecting a non-reference operand, the unrecognizable character error is reported as some type of expected expression error, and numeric constant errors (one of the six) are reported as is. For reference operands, the translator reports the appropriate expected variable error.

A while ago, an operand state was added to the parser so that negative constants would be correctly interpreted instead of as the unary negate operator followed by a positive constant. The get number function checked the operand state when a '-' appeared at the beginning of the number. In operator state, a '-' caused this function to terminate, indicating a numeric constant was not found, and the '-' would then be interpreted as an operator by the get operator function.

To simplify checking for parser errors in the translator and ease the transition to exceptions, the parser was modified to return a single error ("unknown token" replacing "unrecognizable character") when in operator state since this error now included unexpected numeric constants. This error will not be seen by the user and is used by the translator to report appropriate errors. In operator state, there is no reason to parse a numeric constant.

The get number function is now only called in operand state, so this function no longer needs to check the operand state itself. Since the operand state variable was now only used in the main operator function, it no longer needed to be a member. The boolean argument was also changed to an enumeration class (named State with values Operator and Operand), so values are more explicit in calls to the parser operator function.
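The effect of the State argument on the first character of a token can be sketched as follows. The classify() function is hypothetical and only decides what kind of token would start at the input; it considers digits and a leading '-' followed by a digit as the start of a number, ignoring other number forms such as a leading decimal point.

```cpp
#include <cctype>
#include <string>

enum class State { Operator, Operand };

// Hypothetical classification of the next token only.
std::string classify(const std::string &input, State state)
{
    if (input.empty()) {
        return "end";
    }
    char c = input[0];
    bool digit = std::isdigit(static_cast<unsigned char>(c)) != 0;
    bool negative = c == '-' && input.size() > 1
        && std::isdigit(static_cast<unsigned char>(input[1])) != 0;
    // Get number is only called in operand state, where a leading '-' can
    // be part of a negative constant.
    if (state == State::Operand && (digit || negative)) {
        return "number";
    }
    if (c == '-') {
        return "operator";  // handled by get operator (negate or subtract)
    }
    if (digit) {
        return "unknown token";  // numbers are not parsed in operator state
    }
    return "other";
}
```

Because the enumeration is scoped, a call such as classify(input, State::Operand) states its intent, where a bare true or false would not.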

The default value for this state argument was also removed, requiring an argument value to be added from the Tester class call, which previously used the default (incorrectly operator state). The operand state is used so that numerical constants are parsed (since numerical constants are no longer parsed in operator state).

These changes caused a problem in the translator where the wrong error was reported when assigning to a numeric constant (for example, 2=A): the parser now returned an unknown token error for numeric constants because the translator requested a command token in operator state. Previously a valid non-command token (the numeric constant) was returned and passed to the LET translate routine, which reported the appropriate error ("expected item for assignment"). Now, with an error, the "expected command" error was reported.

The get commands function was modified by adding the Any data type argument to the get token call, which puts the parser in operand state. This works because any token not a command token is passed to the LET translate routine, which reports an error for any non-reference token. The check for an error from this get token call was no longer necessary since error tokens are passed to the LET translate routine (which reports the appropriate error).

These changes affected a result for parser test #3 (number tests) with the '-2147483647' number, which had been parsed as a negate operator followed by a positive integer. This test was intended to check the maximum negative integer constant, which it now does since operand state is used for testing. Two additional numbers were added to check one beyond both the maximum positive and negative constants (parsed as double constants). The results for parser test #5 were also updated for the change to the "unknown token" status.

[branch parser commit 04c4fea820]

Sunday, October 19, 2014

Parser – Errors (Exceptions)

One C++ feature not currently being used is exceptions (exceptions were used a while back for table initialization, but this code was removed when this initialization was redesigned). The Qt library functions do not use (throw) exceptions, but Standard Template Library (STL) functions can.

It is possible that exceptions could be used by the parser to throw exceptions for parser errors, which fall under two types, errors with constants (six of them related to incorrectly formed numbers or numbers out of range) and unrecognizable characters. Exceptions may also be able to be used for translator errors, but this will be considered later when the Translator class undergoes improvements.

The handling of parser errors was recently redesigned (see post), with the goal of removing the dependency on the Qt translation functions for the error messages. This design still requires the caller to ask the parser for the error status code when it sees that the last token returned has an error. Before adding exceptions, some additional improvements can be made to the Parser class that will simplify the use of exceptions.

Saturday, October 18, 2014

Parser – Function Operator

The Parser class has a single purpose, to take an input string and return tokens of this string. This is basically like how the special function operator class works. So the first improvement made to the Parser class was to change it to a function operator class.

The set input function, which took a single string argument to set the input string and also initialized the position and operator state members, was removed. An input string argument was added to the constructor, along with the initialization of these other members. The token function used to return pointers to tokens from the input string was changed to the function operator:

TokenPtr token(bool operandState) → TokenPtr operator()(bool operandState)

Callers now set the input through the constructor and retrieve tokens like this:

QString input = {...};
...
Parser parse {input};
...
TokenPtr token = parse();

Note that the instance was renamed from "parser" to "parse" as this makes the code read a little better. No argument is shown because the operand state argument has a default value (which is not shown in the function declaration above).

The Tester class for the most part looks like above except that the input string is a standard string and is converted to a C-style string with the c_str() string member function (which is then implicitly converted to a QString). This won't be needed once the parser uses standard strings.

The Translator class changes take a slightly different form because it contains a parser pointer member defined as a std::unique_ptr. Previously, a single instance was created for the life of the translator instance. There is no reason to do this since nothing in the parser instance needs to be retained between translations. A new parser instance is now created for each translation and must be dereferenced to obtain tokens:

m_parse.reset(new Parser {input});
...
token = (*m_parse)(operand);

And finally before returning, the translate function resets the parser member pointer to the default pointer (calls the reset function with no argument), which deletes the parser instance. The Translator class will also be changed to a function operator class so this final reset won't be necessary.
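The per-translation parser member can be sketched like this. The Parser here is a hypothetical stand-in that returns one-character tokens (an empty string at the end), and the translate() function simply counts tokens; hasParser() exists only so the final reset can be checked.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in parser function object.
class Parser {
public:
    explicit Parser(std::string input)
        : m_input{std::move(input)}, m_pos{0} {}
    // Returns the next character as a one-character token, "" at the end.
    std::string operator()()
    {
        return m_pos < m_input.size()
            ? std::string(1, m_input[m_pos++]) : std::string{};
    }
private:
    std::string m_input;
    std::size_t m_pos;
};

class Translator {
public:
    // Hypothetical translate: counts the tokens in the input.
    int translate(const std::string &input)
    {
        m_parse.reset(new Parser {input});  // new parser per translation
        int count = 0;
        while (!(*m_parse)().empty()) {  // dereference to call operator()
            ++count;
        }
        m_parse.reset();  // delete the parser instance before returning
        return count;
    }
    bool hasParser() const { return m_parse != nullptr; }
private:
    std::unique_ptr<Parser> m_parse;
};
```

Calling reset() with no argument deletes the owned parser and leaves a null pointer, which is why the member is empty between translations.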

[branch parser commit 9e782539f3]

Utility – Base File Name

Most of the simpler transition to using the STL classes has been completed, though there are still quite a few Qt classes in the non-GUI classes. For example, the string member of the Token class is still a QString, and the Parser class needs significant changes to use this member as a standard string. Since these non-GUI classes need major changes, each will be handled in a separate topic branch. Before concluding the stl branch, some minor refactoring was done.

The base file name function was created in the Command Line class when Qt dependency was removed from the Tester class. This function takes a standard string file path argument and returns a standard string base file name, but uses a QFileInfo function to do its work (which is the easiest platform independent way to handle file name paths, because for instance, Windows and Linux use different directory separator characters - back slash vs. forward slash).

All Qt dependency has been removed from the Command Line class except for this static function. There was no other logical class in which to put this function so that it could be used by both the Command Line and Tester classes, so a new Utility class was created to hold it. Its header file includes the standard string header file and its source file includes the QFileInfo header, which shields users from having to know about Qt. This class, like the Status Message class (see post), was made so that it can't be instantiated or used as a base class. (Other similar functions can be added in the future.)
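One way to make a class that can't be instantiated or used as a base class is a deleted constructor plus final, as in this sketch. It is not the project's implementation: the real function uses QFileInfo, while this Qt-free version swaps in plain std::string searching and assumes QFileInfo::baseName() behavior (everything up to the first '.'). Treating a backslash as a separator on every platform is also a simplification, since a backslash is a legal file name character on Linux.

```cpp
#include <string>

// final: cannot be used as a base class.
class Utility final {
public:
    Utility() = delete;  // cannot be instantiated

    // Qt-free sketch: strip the directory, then everything from the first
    // '.' on (assumed to mirror QFileInfo::baseName()).
    static std::string baseFileName(const std::string &filePath)
    {
        std::string::size_type slash = filePath.find_last_of("/\\");
        std::string name = slash == std::string::npos
            ? filePath : filePath.substr(slash + 1);
        return name.substr(0, name.find('.'));
    }
};
```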

The Tester class had one remaining dependency on the Command Line class: the Command Line instance pointer was passed to the run function as an argument. This instance was only used to call the copyright statement function, so this argument was changed to a standard string containing the copyright statement, which is now generated in the Command Line constructor and passed to the run function.

[branch stl commit 3020cd6827]

This concludes the initial (simpler) changes transitioning non-GUI classes to STL use. The stl branch was merged into the develop branch and deleted. A new branch will be created for the next set of C++11/STL related changes, which will be the replacement of Qt with the STL in the Parser class.

[branch develop merge commit a8bd956bb0]

Friday, October 17, 2014

Command Line – File Path

There are a few more items in the Command Line class that are dependent on Qt. One of these is the file name member, which contains the path name if a file was specified on the command line. This member along with its access function was changed to a standard string.

The Main Window class also contains a file name member that holds either the path of the file specified on the command line (obtained from the command line instance) or the last file that was loaded. This member was also changed to a standard string. The program path is also passed to many of the functions within this class. These were modified to take a standard string.

The version function in the Command Line class was modified to return a standard string. This function first converted the C-style release string to a QString, then found the first digit using a QRegExp with the index-of function. The std::regex class is new to the C++11 STL, but unfortunately it is not fully implemented in GCC 4.8. There are a number of possible ways to accomplish the same thing, but a simple C-like loop looking for the first digit was selected because the release string is a C-style string. Once the first digit is found, a standard string is created starting at this digit character and returned. This function was made static since it doesn't use any members.
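The C-like loop could look something like this sketch. Taking the release string as a parameter is an assumption made so the function is self-contained; in the project the static function uses the release string directly.

```cpp
#include <string>

// Sketch: skip characters until the first digit of a C-style release string
// and return the rest as a standard string (empty if there is no digit).
std::string version(const char *release)
{
    const char *p = release;
    while (*p != '\0' && (*p < '0' || *p > '9')) {
        ++p;
    }
    return std::string{p};
}
```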

The copyright statement function in the Command Line class contained a translate call for the "Copyright" word. This function is called from the Tester class (no translation needed) and from the About box in the Main Window (translation needed). The function was modified to take the copyright string as an argument with a default of the untranslated copyright word. The About box passes the translated word. With this change, the translate macros could be removed from this class.

A problem was corrected in the Command Line constructor where the file name on the command line wasn't stored in the file name member, so the file name on the command line was ignored. This problem was introduced when the argument list was changed to a standard list of standard strings.

The constructor of the Main Window was modified to better handle the error when the command line file doesn't exist or the last used program no longer exists. When the command line file doesn't exist, an error is output to the standard error stream. When the last used program no longer exists, a warning box is displayed.

[branch stl commit c998b1ffaa]

Thursday, October 16, 2014

Memory Testing Issues – Resolved

After discovering a default Mint 13 system (kubuntu backports not used) containing Qt 4.8.1 did not exhibit the sporadic memory errors, some further investigation was done. The errors also did not occur when Qt 4.8.2 was built from source. Before blaming the Qt 4.8.2 from the kubuntu backports, the build directory was wiped and the application was rebuilt from scratch. The sporadic memory errors were no longer occurring, so there must have been a corrupted file in the build directory causing the errors.

Some memory testing investigation was also done on Mint 17 (based on Ubuntu 14.04). The conclusion previously was that valgrind 3.10.0.SVN reported errors differently than 3.7.0 (Mint 13) or 3.9.0 (built from the latest source available). The source for 3.10.0 is now available and 3.10.0 built from source on Mint 13 did not report any additional memory errors. The issue was found to be with the ld-2.19.so library on Mint 17 (Mint 13 has ld-2.15.so). This library appears to contain low-level memory allocation functions.

A different error suppression file was needed for the newer version of this library. The CMake build file was modified to detect the presence of ld-2.19.so (either 32-bit or 64-bit). If present, then a different error suppression file is copied to the build directory. This error suppression file generated on Mint 17 is independent of the version of Qt (the Qt libraries are not referenced), so no configuration of the file is needed.

The error suppression files generated for Qt 4.8.2 and 4.8.6 are also independent of the version of Qt; however, the one generated for Qt 4.8.4 is not. To create a suppression file that works with all versions (at least the ones tested), the suppression file needs to be configured for the specific version of Qt. The file generated from Qt 4.8.4 was used, and this file works with Qt 4.8.1, 4.8.2, and 4.8.6 once all references to "4.8.4", along with the installation directory of Qt, are changed.

There are now two error suppression files, one for Mint 13 (ld-2.19.so not present) and one for Mint 17 (ld-2.19.so present). The one for Mint 13 is configured for the version of Qt detected, but the one for Mint 17 is not. Mint 17 has Qt 4.8.6, so no other version of Qt should be present. This commit was put in the develop branch since it is not related to the STL changes.

[branch develop commit 2fb73b6892]

Wednesday, October 15, 2014

Tester – Standard Input Streams

The run function of the Tester class read input either from the console (standard input) or from the specified test file. A QTextStream was used to read the input and was opened on either the standard input device or the test file. Standard input streams do not work the same way, in that an input stream is not opened with a device. Fortunately, the sections of code for reading from the console and from a file were separate, so input can be read directly from standard input (std::cin) or from an input file stream (std::fstream).
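Because std::cin and the file stream classes both derive from std::istream, one way to keep a single reading loop is to pass the stream by reference. This is a minimal sketch only: processLines() and readInput() are hypothetical names, the post does not show the actual Tester code, and std::ifstream is used for the file case.

```cpp
#include <fstream>
#include <iostream>
#include <istream>
#include <sstream>  // std::istringstream is also an std::istream (handy for testing)
#include <string>
#include <vector>

// Hypothetical helper: reads every line from any input stream.
std::vector<std::string> processLines(std::istream &is)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(is, line)) {
        lines.push_back(line);  // the real code would process each line here
    }
    return lines;
}

// Hypothetical selector: console when no file name is given, else the file.
// If the file cannot be opened, getline() fails at once and nothing is read.
std::vector<std::string> readInput(const std::string &fileName)
{
    if (fileName.empty()) {
        return processLines(std::cin);
    }
    std::ifstream file {fileName};
    return processLines(file);
}
```

The same function can be exercised with a std::istringstream, which avoids needing a test file.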

Since reading from standard input or input file stream returns a standard string, the rest of the code in the tester class (the various functions for processing input) was modified to use standard strings. However, where these functions interfaced to the various classes not yet converted to use standard strings, the string is converted to a C-style string that is acceptable for a QString argument (via implicit conversion). This is temporary until those classes are modified.

[branch stl commit 2e57346eab] (The memory bug has been resolved, see next post.)

Sunday, October 12, 2014

Tester – Options List

The Tester class contained a static options function that returned a string list of supported testing options. The constructor of the Command Line class obtained the testing options and added them to the options it supports (version and help) to generate the usage string with the program name. The testing options in the list were joined with a vertical bar ('OR') separator character using the join function of QStringList.

There was no reason to join the options in this way to generate the usage string. The options function was changed to simply return its part of the usage string (with the vertical bar characters between the testing options) as a std::string. The Command Line constructor was modified accordingly.

[branch stl commit fc61d53041] (The Qt 4.8.2 memory bug persists.)

Command Line – Standard Argument List

The Tester class still contains functions with an argument or return value of type QString or QStringList that need to be changed. The first of these to be modified was the constructor, which had a string list argument for the command line arguments. These arguments come from the Command Line constructor, which in turn receives them from the Main Window constructor, which obtains them from the Qt application instance.

The Main Window constructor was modified to convert the QStringList arguments to a standard list of standard strings. There are Qt functions for converting lists and strings to their standard equivalents, but there is no function for converting a string list. A simple for-each loop was added to iterate through the argument list and add (emplace back) each element, converted to a standard string, to a standard list. This list is then passed (moved) to the Command Line constructor.

The Command Line constructor was modified to convert the first argument (the program path name) to a base file name and store it in the program name member (which was changed to a std::string). This first element is then removed from the list. The rest of the constructor was modified to treat this list as a std::list with one less element. The is version output and is help option functions were modified similarly. The program name access function was removed since there were no callers.
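The steps above (convert the arguments into a std::list, move the list into the Command Line constructor, take the program name from the first element, then drop that element) can be sketched without Qt. This is a reduced, hypothetical version: argv stands in for the QStringList (with Qt, the loop body would call QString::toStdString() on each element), and stripPath() is a simplified stand-in for the project's QFileInfo-based base name function that only removes the directory part.

```cpp
#include <list>
#include <string>
#include <utility>

// Hypothetical stand-in: strips any directory part (up to the last '/').
std::string stripPath(const std::string &path)
{
    auto slash = path.find_last_of('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Hypothetical, much-reduced Command Line class.
class CommandLine {
public:
    explicit CommandLine(std::list<std::string> &&args) :
        m_args {std::move(args)}
    {
        if (!m_args.empty()) {
            m_programName = stripPath(m_args.front());
            m_args.pop_front();  // the remaining list holds only the options
        }
    }
    const std::string &programName() const { return m_programName; }
    const std::list<std::string> &args() const { return m_args; }

private:
    std::string m_programName;
    std::list<std::string> m_args;
};

// Stand-in for the Main Window loop: each element is emplaced as a std::string.
std::list<std::string> toArgList(int argc, const char *const argv[])
{
    std::list<std::string> list;
    for (int i {}; i < argc; ++i) {
        list.emplace_back(argv[i]);
    }
    return list;
}
```

Passing the list as an rvalue reference lets the caller hand over the whole list without copying its strings.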

The Tester constructor was modified to take the program name and argument list (less the program path) as separate arguments. The program name is used as the initializer of the program name member, and the rest of the constructor was modified to treat this list as a std::list with one less element.

A minor bug was also discovered and corrected in the translate input function of the Tester class where the header argument was not being output when it was present. The bug was introduced recently when the RPN list text function was changed to a standard put stream operator function.

[branch stl commit 3d3f0d3e5c] (The Qt 4.8.2 memory bug persists.)

Tester – Removed Translation Calls

The rest of the translation tr() calls were removed from the Tester class, which allowed the Qt translation functions declaration macro to be removed. A number of changes were made to the Command Line class to support these changes.

The copyright statement was put into a constant C-string with a Qt translate macro, which allowed for a delayed translate call. This string contained QString style placeholders for the program name (or version string for the GUI about box) and the copyright year. There was a static access function to obtain this string, which the callers filled in as desired.

These were replaced with a static copyright statement function, which internally writes to a std::ostringstream and then returns the result as a std::string. The program name or version string is no longer included, as the caller is now responsible for this. The copyright year function used to access the year was removed, and the year is used directly since this was the only user. The main window about box function was updated for this new function.
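A minimal sketch of such a function, assuming a copyrightYear constant of the kind the CMake-generated header provides. The statement text and the reduced CommandLine struct are placeholders for illustration, not the project's actual wording or class.

```cpp
#include <sstream>
#include <string>

// Hypothetical value; in the project the year comes from the generated header.
constexpr int copyrightYear {2014};

struct CommandLine {
    // Builds the statement in a string stream and returns it as a std::string.
    // The caller writes the program name or version string before it.
    static std::string copyrightStatement()
    {
        std::ostringstream oss;
        oss << "Copyright (C) " << copyrightYear << " <copyright holder>";
        return oss.str();
    }
};
```

A caller would then write something like `std::cout << programName << ' ' << CommandLine::copyrightStatement() << '\n';`, keeping the program name (or version string) out of the shared function.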

The warranty statements were put into an array of constant C-strings with Qt translate macros, which allowed for delayed translate calls. There was a static access function to obtain this array. This function was only called from the Tester class (the GUI has a different statement). These were removed and the Tester class now outputs these strings directly without translation.

While making these changes, it was noticed that the copyright year set in the CMake build file had not been updated for 2014. The application version numbers (major, minor and patch), copyright year and release string are transferred to the source code via a template input file from which CMake creates a header file. This file contained C-style preprocessor defines, which are not type-safe; these were changed to type-safe C++11 constexpr definitions.
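As an illustration of the difference, the generated header might look roughly like this. The names and values are hypothetical; in the template, each value would be a placeholder such as `@VARIABLE@` that CMake's configure_file() substitutes.

```cpp
// Hypothetical contents of the CMake-generated header (illustrative only).
// Previously these would have been macros such as "#define VERSION_MAJOR 0",
// which have no type and ignore C++ scope rules.
constexpr int versionMajor {0};
constexpr int versionMinor {1};
constexpr int versionPatch {16};
constexpr int copyrightYear {2014};
constexpr const char *releaseString {""};

// Each constant has a real type, so it can be checked at compile time.
static_assert(copyrightYear >= 2014, "copyright year not updated");
```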

[branch stl commit d399ff535f] (The Qt 4.8.2 memory bug persists.)

Saturday, October 11, 2014

Tester – String Members

The next items changed in the Tester class were the string members (the program name, test file name, and error message members), which were changed to std::string. The access function for the error message member was also changed to return a std::string. There were two considerations with these changes.

The first consideration was the issue of [language] translations, which were used with various strings for generating error messages. I decided that translations are not necessary for the testing part of the code. The testing code is for testing the internals of the application, and any output does not need to be internationalized. Therefore, translations (calls to the tr() function returning QStrings) will be completely removed from the Tester class, starting with these error messages.

The second consideration was the QFileInfo class, which is used to extract the file name from a file path. There is no equivalent functionality in the C++11 standard library. It is not appropriate to create a similar function when one already exists. The Boost C++ library does have equivalent functionality, but there is no reason to add another dependent library when Qt is already present and provides the functionality.

Since the goal is to remove all Qt dependencies from the Tester class, a base file name static function was added to the Command Line class (which the Tester class functions already call, and which owns the Tester class instance). This function takes a std::string file path input and returns a std::string base file name using QFileInfo. It is expected that the Command Line class will continue to use Qt since it interfaces with the main application.
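For comparison, this is roughly what QFileInfo::baseName() computes: the file name without its path, up to (but not including) the first '.'. The sketch below is a hypothetical standard-library approximation, not the function described above, which delegates to QFileInfo.

```cpp
#include <string>

// Rough stand-in for QFileInfo::baseName(): drop the directory part,
// then keep everything before the first '.'.
// Only '/' separators are handled in this sketch.
std::string baseFileName(const std::string &filePath)
{
    auto slash = filePath.find_last_of('/');
    std::string name {slash == std::string::npos
        ? filePath : filePath.substr(slash + 1)};
    return name.substr(0, name.find('.'));  // npos keeps the whole name
}
```

Edge cases such as Windows separators are where a tested library like QFileInfo earns its keep, which supports the decision to keep using it in the Command Line class.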

[branch stl commit 986268caf7] (The Qt 4.8.2 memory bug persists.)

Friday, October 10, 2014

Tester – Option Enumeration Class

The Tester class is next to transition from Qt to the STL. When enumerations were changed to C++11 enumeration classes, the Option enumeration in the Tester class was excluded because it was used for a loop iterator and contained a number of enumerators used for various purposes. These needed to be removed before changing this enumeration to an enumeration class.

The none enumerator used to indicate no option was removed. The default Option enumerator, Option{}, will be used to indicate no option. The first enumerator (parser) needed to be set to 1 for this to work (as was done with other enumeration classes).

The size of enumerator was used to dimension an array of strings for the names of each of the options, which were compared to the command line file name to select the appropriate test option. The option enumerators were used as indexes to set the elements of this array. This method is error-prone (elements could be missed), and enumeration class enumerators cannot be used as indexes. This array was changed to a std::unordered_map with an initializer list for each of the options except the recreator. The size of enumerator was removed since it is not needed to initialize the map.
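A sketch of the enumeration class and name map described above. Only the parser and recreator options are named in the post; the other enumerators and all name strings are hypothetical. Note that using an enumeration class as a std::unordered_map key relies on std::hash support for enumeration types, which the standard guarantees from C++14; a strict C++11 library may need a custom hash.

```cpp
#include <string>
#include <unordered_map>

enum class Option {
    Parser = 1,  // starts at 1 so that Option{} (zero) can mean "no option"
    Expression,  // hypothetical
    Translator,  // hypothetical
    Recreator
};

// Option names keyed by enumerator; the recreator is deliberately left out
// because there are no recreator test files.
const std::unordered_map<Option, std::string> optionName {
    {Option::Parser, "parser"},
    {Option::Expression, "expression"},
    {Option::Translator, "translator"}
};

bool isNoOption(Option option)
{
    return option == Option{};
}
```

With the initializer list, each name sits next to its enumerator, so a missed option is much easier to spot than with index assignments into an array.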

The first and number of enumerators were used to bound the loop through the option names, which were compared to the file name from the command line to set the appropriate option. The number of enumerator did not include the recreator name (there are no recreator test files). This loop was changed to a range-for loop iterating over the items in the new name map (which is why the recreator name was not put into the name map). The Qt starts with comparison call was changed to the standard equal function, using a C++11 lambda to do a case insensitive comparison.
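A minimal sketch of a case insensitive "starts with" test built on std::equal and a lambda. The function name is hypothetical. The length check is needed because the three-iterator form of std::equal assumes the second range is at least as long as the first.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// True if fileName begins with prefix, ignoring case.
bool startsWithNoCase(const std::string &fileName, const std::string &prefix)
{
    if (fileName.size() < prefix.size()) {
        return false;  // std::equal would otherwise read past the end
    }
    return std::equal(prefix.begin(), prefix.end(), fileName.begin(),
        [](unsigned char a, unsigned char b) {
            // unsigned char keeps std::toupper's argument in its valid range
            return std::toupper(a) == std::toupper(b);
        });
}
```

Inside the range-for loop, each map item's value (the option name) would be passed as the prefix, and the item's key would become the selected option on a match.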

The option member was set to the error enumerator when an error was detected. For each error, the error message string was also set. The error enumerator was not needed, since a non-empty error message string can indicate an error, so it was removed. The has error access function was changed to check for a non-empty error message string.

The option member was defined as an integer so that it could be set to the loop iterator variable (an integer). Since the option enumerator value is directly accessible in the range loop (the key in the iterator), the option member could be properly re-typed as an Option enumeration class variable. The test name member was changed to a standard string (to match the value in the name map), and the arguments of the is option function were changed to standard strings.

[branch stl commit 5c3a7d6141] (The Qt 4.8.2 memory bug persists.)

Thursday, October 9, 2014

Dictionary – Put Stream Operator

The Dictionary class has now been converted from using Qt to the STL, except for its debug text function. This function was changed to a put stream operator function in the tester source file. Unlike the other two put stream operator functions, this one was made a friend function so that it can access private members. The function was modified to write to an output stream instead of into a string.
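The friend arrangement can be shown with a much-reduced, hypothetical dictionary class; the real class has more members, and its output format is not reproduced here.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

class Dictionary {
public:
    void add(const std::string &key) { m_keys.push_back(key); }

private:
    std::vector<std::string> m_keys;

    // Non-member operator declared as a friend so it can read m_keys.
    friend std::ostream &operator<<(std::ostream &os, const Dictionary &dictionary);
};

std::ostream &operator<<(std::ostream &os, const Dictionary &dictionary)
{
    for (std::size_t i {}; i < dictionary.m_keys.size(); ++i) {
        os << i << ": " << dictionary.m_keys[i] << '\n';
    }
    return os;  // returning the stream allows chaining
}
```

Because the operator writes to any std::ostream, a caller can send the output to std::cout or collect it in a std::ostringstream, and can write its own header line first.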

The writing of a header string was removed from the new put stream operator function since another argument can't be added. This is not an issue since the caller can write these header strings. The output of all the dictionaries was contained in a dictionaries debug text function in the Program Model class. This function was only called once, from the run function of the Tester class, so it was removed and its functionality moved into that run function.

This concludes work on the Dictionary class. The sporadic memory error mentioned in the last post is still occurring (with Qt 4.8.2 only).

[branch stl commit 1cbe8e41ea]

Wednesday, October 8, 2014

Dictionary – Improved Case Handling

Three of the dictionary functions converted the key being looked up to upper case if the dictionary was not case sensitive (four of the six dictionaries). Also, the key was essentially stored twice, in the key list vector and in the key map, except that in the key map it was converted to upper case (for all but two of the six dictionaries), so the key map could not be used to reproduce the string of a dictionary entry (hence the duplicate key list).

The std::unordered_map class by default uses the standard hash function to put keys with their value into buckets and the standard equal function for comparing keys already in a bucket (to determine if the key is already in the map). By providing case insensitive hash and equal functions, unconverted strings can be stored for the keys. The Key Map alias was modified to add key hash and equal structures:

using KeyMap = std::unordered_map<std::string, EntryValue, KeyHash, KeyEqual>;

Both of these private structures contain a case sensitive member. The function operator function of the key hash checks this member, and if case insensitivity is not selected, the standard hash function is called. Otherwise the key string is converted to upper case into a temporary string, which is then passed to the standard hash function:

struct KeyHash {
    CaseSensitive caseSensitive;
    size_t operator()(const std::string &s) const
    {
        if (caseSensitive != CaseSensitive::No) {
            return std::hash<std::string>{}(s);
        }
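        std::string s2;
        // unsigned char avoids undefined behavior for negative char values
        std::transform(s.begin(), s.end(), std::back_inserter(s2),
            [](unsigned char c) { return static_cast<char>(std::toupper(c)); });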
        return std::hash<std::string>{}(s2);
    }
};

The key equal structure is set up similarly, except that its function operator function has two constant string reference arguments (to compare) and returns a boolean value. If case insensitivity is not selected, the strings are compared with the equality operator. Otherwise, the strings are not equal if their sizes differ. For equal-length strings, the loop goes through the characters of both strings, converting each to upper case before comparing. A mismatch indicates the strings are not equal; if the end of the loop is reached, the strings are equal.
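The comparison just described can be sketched as follows. The CaseSensitive definition is an assumption (the post only says it is a simple yes/no enumeration class), and the loop follows the description rather than the project's actual source.

```cpp
#include <cctype>
#include <string>

enum class CaseSensitive { No, Yes };  // assumed definition

struct KeyEqual {
    CaseSensitive caseSensitive;
    bool operator()(const std::string &s1, const std::string &s2) const
    {
        if (caseSensitive != CaseSensitive::No) {
            return s1 == s2;
        }
        if (s1.size() != s2.size()) {
            return false;  // different lengths cannot match
        }
        for (std::string::size_type i {}; i < s1.size(); ++i) {
            if (std::toupper(static_cast<unsigned char>(s1[i]))
                    != std::toupper(static_cast<unsigned char>(s2[i]))) {
                return false;  // mismatch
            }
        }
        return true;  // reached the end of the loop: equal
    }
};
```

Checking the sizes first means no temporary upper case copies are needed, unlike the hash function, which must produce an upper case string to hash.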

By default, std::unordered_map default-constructs the hash and equal structures, so neither structure gets its case sensitive member set to the dictionary's setting. Instances of these structures can be given to the map during initialization, which is done in the constructor (the 10 is the requested number of hash buckets, which matched the GNU C++ library's default at the time; the standard leaves the default implementation-defined):

Dictionary(CaseSensitive caseSensitive = CaseSensitive::No) :
m_keyMap {10, KeyHash {caseSensitive}, KeyEqual {caseSensitive}} {}
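Putting the pieces together, this sketch shows the effect of handing both function objects to the map: in a case insensitive map, "PRINT" finds the entry stored as "Print", and the key keeps its original spelling. It is simplified and partly hypothetical: the values are plain ints, a toUpper() helper is introduced, and KeyEqual compares upper case copies instead of using the character loop described above.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <functional>
#include <iterator>
#include <string>
#include <unordered_map>

enum class CaseSensitive { No, Yes };  // assumed definition

// Hypothetical helper: upper case copy of a string.
std::string toUpper(const std::string &s)
{
    std::string result;
    result.reserve(s.size());
    std::transform(s.begin(), s.end(), std::back_inserter(result),
        [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return result;
}

struct KeyHash {
    CaseSensitive caseSensitive;
    std::size_t operator()(const std::string &s) const
    {
        return std::hash<std::string>{}(
            caseSensitive == CaseSensitive::No ? toUpper(s) : s);
    }
};

struct KeyEqual {
    CaseSensitive caseSensitive;
    bool operator()(const std::string &s1, const std::string &s2) const
    {
        return caseSensitive == CaseSensitive::No
            ? toUpper(s1) == toUpper(s2) : s1 == s2;
    }
};

using KeyMap = std::unordered_map<std::string, int, KeyHash, KeyEqual>;

// Both function objects receive the same setting, as in the constructor above.
KeyMap makeKeyMap(CaseSensitive caseSensitive)
{
    return KeyMap {10, KeyHash {caseSensitive}, KeyEqual {caseSensitive}};
}
```

The hash and equal objects must agree: any two keys that compare equal must hash to the same value, which is why both upper case the key when case is ignored.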

The key list vector was removed. However, a way to look up dictionary entry keys by index was still required, so a vector of iterators was added in its place. Each iterator points to a key/value pair in the key map. When a new entry is added, its iterator is put into this vector. When an entry is removed, the end iterator of the key map (points to one past the end) is put into this vector.

In the add function, the map emplace functions return (as the first member of a pair) the iterator of the key/value inserted. This iterator is put into the iterator vector. The remove function no longer needs to find the iterator for the index, which is now obtained from the iterator vector. After removing the key/value, the iterator for the index is set to the key map end iterator. The debug text function also no longer needs to find the iterator for the index, and it checks whether the iterator for the index is the end iterator to determine if there is no entry for the index (instead of checking for a blank key).
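One caveat worth keeping in mind with this design: the C++ standard allows rehashing, which happens when insertions push the number of elements past max_load_factor() times bucket_count(), to invalidate std::unordered_map iterators, while pointers and references to elements stay valid until the element is erased. The sketch below swaps in a different technique from the one described above: it stores pointers to the key/value pairs instead of iterators. The class and its members are hypothetical, it uses plain int values, and it omits bounds checks and free-index reuse.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

class IndexedKeys {
public:
    using Map = std::unordered_map<std::string, int>;

    // Adds a key (if new) and returns its index.
    int add(const std::string &key)
    {
        auto result = m_keyMap.emplace(key, static_cast<int>(m_entries.size()));
        if (result.second) {  // new key: remember where its pair lives
            m_entries.push_back(&*result.first);
        }
        return result.first->second;
    }
    // Removes the entry for an index (no bounds or double-remove checks).
    void remove(int index)
    {
        m_keyMap.erase(m_keyMap.find(m_entries[index]->first));
        m_entries[index] = nullptr;  // plays the role of the end iterator
    }
    // Returns the key for an index, or nullptr if the entry was removed.
    const std::string *key(int index) const
    {
        return m_entries[index] ? &m_entries[index]->first : nullptr;
    }

private:
    Map m_keyMap;
    std::vector<Map::value_type *> m_entries;  // survives rehashing
};
```

Adding many keys forces several rehashes, yet the stored pointers still refer to the right pairs; stored iterators carry no such guarantee.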

For some reason, there is a sporadic memory error occurring when memory testing with these changes. It is another error from Qt, but it doesn't occur with the same tests each time. It is also only occurring with Qt 4.8.2, and not with Qt 4.8.4 or Qt 4.8.6. For now, this problem will be monitored.

[branch stl commit 1f2483e82c]

Saturday, October 4, 2014

Dictionary – Entry Use Counts

As mentioned in the last post, the use count vector member was removed and a use count was put into the value of the map. A new private Entry Value structure was created to hold the index and the use count of the entry. Both were defined as unsigned 16-bit integers. A constructor was added taking a single index argument, and the use count is initialized to 1.

For the add function, the entry in the use count vector no longer needs to be added or initialized (in the case of a reused entry). Instead of assigning an index to a key using the bracket operator, the map emplace function is used with a single index. The use count gets set to 1 by the constructor of the Entry Value structure, whether it is a new or a reused entry. For an existing entry, the iterator is used to increment the use count.
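A sketch of the Entry Value structure and the emplace-based add logic described in the two paragraphs above. The member names and the add() signature are hypothetical; in particular, the index for a new entry is passed in here, whereas the real function chooses it itself (for example, from the free stack).

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Value stored for each key: entry index plus use count (names assumed).
struct EntryValue {
    EntryValue(std::uint16_t index) : m_index {index}, m_useCount {1} {}
    std::uint16_t m_index;
    std::uint16_t m_useCount;
};

using KeyMap = std::unordered_map<std::string, EntryValue>;

// A new key is emplaced with its index (use count starts at 1 via the
// constructor); for an existing key, the returned iterator is used to
// increment the use count.
std::uint16_t add(KeyMap &keyMap, const std::string &key, std::uint16_t newIndex)
{
    auto result = keyMap.emplace(key, newIndex);
    if (!result.second) {
        ++result.first->second.m_useCount;  // key was already present
    }
    return result.first->second.m_index;
}
```

The bool in the pair returned by emplace() distinguishes the two cases, so no separate find() call is needed.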

For the remove function, since the use count is no longer available by index, the index is used to get the key from the key list member. The key is converted to upper case for case insensitive dictionaries and used to find an iterator for the key. While the iterator is still valid, the index it holds is used to clear the entry in the key list and is pushed onto the free stack. The iterator is then used to erase the entry from the key map (entries can be erased either by key value or by iterator).

For the debug text function, since the use count is no longer available by index, a non-empty entry in the key list is used to identify whether an entry is used. The key is used to find an iterator to the entry. An error check was added in case the key is not found (which shouldn't happen). The iterator is used to get the use count. When outputting the indexes of the free stack, the check for a non-zero use count was removed since the use count is no longer available (the entry in the map with the use count was erased, and an entry is not pushed to the free stack unless its use count was zero).

[branch stl commit e79c179ad5]

Dictionary – Case Sensitive Member

The next changes for the Dictionary class are some improvements in its implementation. One of these changes is to integrate the use count into the value of the key map instead of using a separate vector for the use counts of each entry. Putting the use count into the key map value will require the use of iterators to get to the use count since they will no longer be in an indexed vector.

The debug text function (which will be changed to a put stream operator function) will need to use an iterator to look up a key to get the use count. When using iterators, the debug text function will also need to convert the key to upper case for case insensitive dictionaries, which implies that it would also need a case sensitive flag argument like the add and remove functions.

This made me realize that passing a case sensitive flag to all of these functions was not the best design. A better design is to have a case sensitive flag as a member variable initialized once when the dictionary is constructed. Using arguments allows a dictionary to be accessed with different case sensitivities, which would cause problems. A case sensitive member was added, the constructors were added or modified, and the case sensitive arguments were removed.

[branch stl commit b78217c5a4]

Dictionary – Key Map

The remaining member of the Dictionary class to change to a standard class is the key hash, which was defined as a QHash. This class is equivalent to std::unordered_map, which also stores keys using a hash. The member was renamed to key map, and a Key Map alias was added since the map type is rather lengthy:

using KeyMap = std::unordered_map<std::string, int>;

The add routine was changed to use the key as a standard string. The standard string class does not have a member function for converting the string to upper case. For case insensitive dictionaries, the string is converted to upper case using the standard transform function:

std::transform(key.begin(), key.end(), key.begin(),
    [](unsigned char c) { return static_cast<char>(std::toupper(c)); });

The first two arguments specify the range of the string to transform (the entire string). The third argument specifies the destination (back into the string). The last argument is a function that takes and returns a character. To find a key value in the map, the map find function is used, which returns an iterator. A key not found is detected if the iterator is the end iterator of the map. The value in the map (the index of the entry) is obtained using the iterator:

auto iterator = m_keyMap.find(key);
if (iterator == m_keyMap.end()) // key not found?
...
index = iterator->second;  // the mapped value (the entry index)

Similar changes were made in the remove function for converting the key to upper case for case insensitive dictionaries. The map erase function is used to remove a key from the map.

[branch stl commit 66561d4532]

Thursday, October 2, 2014

Dictionary – Start STL Transition

The Dictionary class will be next to have its debug text function changed to a put stream operator function. Unlike the RPN List and Token classes, access to some of the private variables will be necessary, so the put stream operator function will need to be a friend function. First, these private variables need to be changed from Qt classes to STL classes. Other dependencies on Qt will also be removed. This was started with these changes:

Changed the quint16 type to the standard uint16_t type.
Changed the use count list member to a standard vector.
Changed the free stack member to a standard stack.
Replaced the use of the Q_UNUSED macro with a plain (void) cast, which is just as simple.
Replaced the use of Qt case sensitivity enumerators with a simple yes/no case sensitive enumeration class. The enumerators were never passed to Qt functions, so there was really no reason to use the Qt enumerators.

The next change was to replace the QStringList for the list of keys with a standard vector of standard strings. The string access function was changed to return a standard string, which required changing the return value of the various operator text functions accessing dictionaries to a standard string. Several conversions from and to QString values were temporarily added as needed until more variables are changed to standard strings (though some will be needed when interfacing to the GUI).

[branch stl commit 84875530db]

Wednesday, October 1, 2014

Token – Put Stream Operator

The Token class was next to have its text function changed to a put stream operator function. Again, it was not necessary to make the new function a friend of the Token class since there are sufficient access functions. The put stream operator function for the RPN List class was changed to use the new token put stream operator.

[branch stl commit 2b5c581d31]