Sunday, November 16, 2014

Translator – Function Operator

The Translator class has a single purpose, to take a BASIC input string and create an RPN list representation of the BASIC code.  This is similar to the Parser, Tester, and Recreator classes, which were already changed to be function operator classes.  The change started with renaming the translate function to operator().

Like the recreator, where a temporary instance is used, the translator can be used in the same way, except that the input string is passed to the constructor so that it can be used to instance the parser (which lives throughout the translation).  The parser no longer needs to be instanced at the beginning of the function operator function and reset before it returns.  Since only a temporary instance is needed to translate, the translator member pointers in the tester and program model classes were removed.

The clean up function was called when the translator returned an error; it deleted the hold and done stack items, cleared the RPN output list and reset the pending parentheses token pointer.  Now at the end of a translation, including when an error is detected, the temporary translator instance goes out of scope and all of these actions occur automatically, except for the clearing of the output list (since the list is moved to the caller upon return).  This clean up function was removed.  For an error, the output list is cleared.
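A minimal sketch of a single-use function operator class used through a temporary instance.  It assumes hypothetical names (RpnList as a vector of strings, a whitespace-splitting input stream standing in for the parser, and a "?" word as the error condition); it only illustrates the pattern, not the actual translator:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the RPN output list.
using RpnList = std::vector<std::string>;

// Single-use function operator class: the input is given to the constructor
// (where the real translator instances its parser), and operator() performs
// the whole translation exactly once.
class Translator {
public:
    explicit Translator(std::string input) : m_input {std::move(input)} {}

    // Moves the output list to the caller.  On an error, the output list is
    // cleared and an exception is thrown; all other member state is released
    // automatically when the temporary instance goes out of scope.
    RpnList operator()()
    {
        std::string word;
        while (m_input >> word) {
            if (word == "?") {  // hypothetical error condition
                m_output.clear();
                throw std::runtime_error {"invalid token"};
            }
            m_output.push_back(word);
        }
        return std::move(m_output);
    }

private:
    std::istringstream m_input;  // stands in for the parser member
    RpnList m_output;            // RPN output list
};
```

A caller needs no member or local variable, for example `RpnList rpn = Translator{"A B +"}();`.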

[branch misc-cpp-stl commit b597b5b0a7]

Saturday, November 15, 2014

Program Model – Standard Strings

To complete the transition of the program model to the STL, all of the strings were changed to standard strings.  This included the return value of the line text function, the text member of the line info structure, and an argument of the update line function (where an rvalue reference was used because the caller no longer needs the string, so it is moved to the function).

The QStringList argument of the update function was changed to a std::vector<std::string> (the STL has no string list class) rvalue reference type (the caller no longer needs the vector passed so it can be moved to the update function).  The update function was defined as a public slot, but was no longer being used as a slot, so it was changed to a regular public function.

All callers to the modified functions were modified accordingly.  This included the translate function of the translator where its input argument was already being converted to a standard string to pass to the parser constructor.  There were four functions that put function pointers into a temporary variable.  These were changed to if-statement scoped variables using the auto type for convenience.  This concludes work on the program model.
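In C++11, an "if-statement scoped variable" is a variable declared in the if condition itself.  The sketch below uses hypothetical names (EncodeFunction, encodeFunction, doubleIt) standing in for the table functions that return a possibly null function pointer:

```cpp
#include <string>

// Hypothetical function pointer type and lookup function.
using EncodeFunction = int (*)(int);

int doubleIt(int value)
{
    return value * 2;
}

EncodeFunction encodeFunction(const std::string &name)
{
    if (name == "double") {
        return doubleIt;
    }
    return nullptr;
}

// The pointer is declared (with auto) in the if condition, so it is only in
// scope inside the if statement, and the body runs only when it is not null.
int encode(const std::string &name, int value)
{
    if (auto function = encodeFunction(name)) {
        return function(value);
    }
    return value;  // no function: value unchanged
}
```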

[branch misc-cpp-stl commit 2f7a0bb119]

Program Model – Line Info List

The program model contains a list of information for each line in the program code vector including the offset into the code vector, the size of the line, and if the line has an error, the error index for the line and its original text.  This member list is an instance of the Line Info List class also defined in the Program Model class.

The Line Info List class is another class that was derived from a container class (QList).  This class was changed to instead contain a standard vector as a member (QList is implemented internally as an array, similar to a vector).  The existing adjust (private), replace, insert and remove at functions were modified accordingly.  The remove at function was renamed to erase for consistency with STL naming conventions.

Necessary vector access functions were added to the Line Info List class including bracket operator for element access, constant bracket operator for constant element access, size of vector, and clear vector.  Since the bracket operator is used frequently to access the offset and size line info structure member variables, specific offset and size element access functions were added taking a line number argument.
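A sketch of the containment approach, where the class holds a standard vector and exposes only the access functions it needs.  The LineInfo member names and the index types here are assumptions for illustration:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical line info structure (member names are illustrative).
struct LineInfo {
    int offset;        // offset of the line in the code vector
    int size;          // number of words in the line
    int errIndex;      // error index, or -1 if none
    std::string text;  // original text of a line with an error
};

// Contains (instead of deriving from) a standard vector.
class LineInfoList {
public:
    LineInfo &operator[](std::size_t index) { return m_list[index]; }
    const LineInfo &operator[](std::size_t index) const { return m_list[index]; }
    std::size_t size() const { return m_list.size(); }
    void clear() { m_list.clear(); }

    // convenience accessors for the frequently used members
    int offset(int lineNumber) const { return m_list[lineNumber].offset; }
    int size(int lineNumber) const { return m_list[lineNumber].size; }

    void insert(int lineNumber, const LineInfo &lineInfo)
    {
        m_list.insert(m_list.begin() + lineNumber, lineInfo);
    }
    void erase(int lineNumber)
    {
        m_list.erase(m_list.begin() + lineNumber);
    }

private:
    std::vector<LineInfo> m_list;
};
```

Only the functions written here are available to users of the class, unlike a derived class that exposes every base class function.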

The STL convention is to use size to represent the number of items, compared to the Qt convention of using count.  Many of the variables and functions were named using the Qt convention.  The variables in the program model functions were changed to the STL size name.  The program model also contained two temporary access functions (line offset and line size) that were not being used and were removed.

[branch misc-cpp-stl commit fd24923d76]

Program Code – Standard Vector

The Program Code class holds the code of the program in a vector.  The vector consists of program words.  This class was derived from QVector, which exposed all of the base class functions.  This is not the best object-oriented design practice, so the class was redesigned to instead contain a standard vector member.  The program words are defined by the Program Word class, which consists of various access functions to access the code, sub-code and operand components of a program word.

Several functions are needed to access the member vector including the begin iterator, end iterator, empty status, size, clear vector, element access bracket operator ([]) and the emplace back functions.  These functions simply pass through to the code vector member.  The main program code insert line, remove line, and replace line access functions were modified to use standard vector functions (insert and erase or using the direct standard copy) to manipulate the code vector instead of the raw memory move function previously used.

Using the vector functions simplified the code, but since there is no function to replace part of the vector with a different size part, the replace line function still contains various checks and operations depending on the sizes of the old and new lines.  For an empty new line, the remove line function is called as before.  For a same size or larger new line, the part of the new line that will fit in the space of the old line is copied.  For a larger new line, the rest of the new line is then inserted.  For a smaller new line, the new line is copied, and the remainder of the old line is erased.
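The size cases above can be sketched with standard vector operations.  ProgramWord is simplified to an unsigned integer and replaceLine is a hypothetical free function, so this only illustrates the logic:

```cpp
#include <algorithm>
#include <vector>

using ProgramWord = unsigned;  // simplified stand-in for the program word class

// Replaces the line of oldSize words at offset with a new line.
void replaceLine(std::vector<ProgramWord> &code, int offset, int oldSize,
                 const std::vector<ProgramWord> &line)
{
    auto start = code.begin() + offset;
    int newSize = static_cast<int>(line.size());
    if (newSize == 0) {
        // empty new line: remove the old line (the remove line function)
        code.erase(start, start + oldSize);
    }
    else if (newSize >= oldSize) {
        // copy the part of the new line that fits in the old line's space...
        std::copy(line.begin(), line.begin() + oldSize, start);
        // ...then insert the rest (an empty range when the sizes are equal)
        code.insert(start + oldSize, line.begin() + oldSize, line.end());
    }
    else {
        // smaller new line: copy it, then erase the remainder of the old line
        std::copy(line.begin(), line.end(), start);
        code.erase(start + newSize, start + oldSize);
    }
}
```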

The debug text, dereference and decode functions of the program model previously obtained a pointer to the first word of the program line by adding its offset to the raw data access function.  These were changed to obtain a vector iterator to the line by adding the line offset to the begin iterator of the program code vector.  The C++11 auto type was used to define these iterators, which is simpler than using std::vector<ProgramWord>::const_iterator or the ProgramCode::const_iterator alias added to the program code class.

The encode function of the program model previously created a sized vector for the new line (using a constructor taking a size argument), and each word was set to the instruction or operand information from the RPN list.  This was replaced by starting with an empty vector and using the emplace back template function for each instruction and operand word.  The set instruction and set operand functions of the program word class were replaced with corresponding constructors needed for the emplace back function.  The size constructor was removed.

The Program Word and Program Code classes were moved from the program model header file to a new header file.  The insert line, remove line, and replace line functions were kept as inline functions in the class definition.  Each of these functions is only called once, so inlining is appropriate.  This header file did not show up in the project file list in QtCreator (though the project built fine) because there was no associated source file.  A new headers variable was added to the CMake build file with this new header file, and the main program target was made dependent on this variable so that QtCreator knows about this new header file.

[branch misc-cpp-stl commit c7080421cb]

Thursday, November 13, 2014

Program Model – Debug Text

There is a debug text function in the Program Model class used for the temporary program view in the GUI and by the tester class.  This function used two functions from the Program Word class, the instruction debug text (for instruction words) and operand debug text (for operand words).

The debug text function was modified to return a standard string.  To build the string, a standard output string stream is used, which makes it easy to build up the string, especially for numeric data types (the QString::arg function was being used for this purpose).  Once done, the string of the output stream is returned.

Since an output stream is being used, it made sense to make the instruction debug text function an overloaded output stream operator.  This function only used access functions of the program word, so it didn't need to be a member of the program word class (or a friend function).
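A sketch of building debug text with an output string stream and a non-member output operator.  The simplified ProgramWord (a code plus an operand flag) and the output format are assumptions for illustration, not the actual program word layout:

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Simplified program word (assumption: a code and an operand flag only).
class ProgramWord {
public:
    ProgramWord(int code, bool isOperand) : m_code {code}, m_isOperand {isOperand} {}
    int code() const { return m_code; }
    bool isOperand() const { return m_isOperand; }
private:
    int m_code;
    bool m_isOperand;
};

// Non-member, non-friend output operator: it only uses public access functions.
std::ostream &operator<<(std::ostream &os, const ProgramWord &word)
{
    return os << "Code" << word.code();
}

// Builds the debug text; numeric values are formatted directly by the stream.
std::string debugText(const std::vector<ProgramWord> &line)
{
    std::ostringstream oss;
    for (std::size_t i {}; i < line.size(); ++i) {
        if (i > 0) {
            oss << ' ';
        }
        oss << i << ':';
        if (line[i].isOperand()) {
            oss << '|' << line[i].code() << '|';  // operand text done inline
        }
        else {
            oss << line[i];  // uses the overloaded output operator
        }
    }
    return oss.str();
}
```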

The operand debug text function only contained a single line and also used a program word access function, so it also didn't need to be a member of the program word class.  It was passed the text to output with the operand integer value, and was only used by the debug text function, so it made sense to remove this function and implement its functionality directly in the debug text function.

[branch misc-cpp-stl commit 645eff8d1f]

Tester – Function Operator/Exceptions

The Tester class is another one-use class that fits the pattern of the function operator class.  The main run function was changed to the function operator function.  The caller in the command line constructor was modified accordingly, with the instance renamed from tester to test as was done with the parser instances.

The Tester class also had an error mechanism where its error message member was set if an error occurred.  Both the constructor and the run function can generate an error.  The has error access function returned whether an error occurred, and the error message access function returned the error message.  The constructor and run function were modified to instead throw an exception containing the error message (a standard string).  The function operator function (formerly run) no longer needs to return a success status as a boolean.  This simplified the command line constructor since errors from both functions are caught with the same section of code.  The error message member and its access functions were removed.
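A sketch of the exception approach, where errors from both the constructor and the function operator are caught by one handler.  The Tester shown here, its option argument, the error messages and the runTest caller are all hypothetical:

```cpp
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical tester: both the constructor and operator() report errors by
// throwing an exception holding the message.
class Tester {
public:
    explicit Tester(const std::string &option)
    {
        if (option.empty()) {
            throw std::runtime_error {"missing test option"};
        }
        m_option = option;
    }
    void operator()(std::ostream &os)
    {
        if (m_option == "bad") {
            throw std::runtime_error {"cannot open test file"};
        }
        os << "running test " << m_option << '\n';
    }
private:
    std::string m_option;
};

// The caller needs only one catch block for errors from either function.
std::string runTest(const std::string &option)
{
    std::ostringstream os;
    try {
        Tester test {option};
        test(os);
    }
    catch (const std::runtime_error &error) {
        os << "Error: " << error.what() << '\n';
    }
    return os.str();
}
```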

The redundant void was also removed from tester function definitions that don't have arguments.  This was a practice I used when working with C code, where void in the arguments of a function definition indicates no arguments, as opposed to empty parentheses, which could also indicate the old Kernighan and Ritchie (K&R) style function definition that preceded the typed function definitions introduced with the first ANSI C standard.  This usage of void appears throughout the code and will slowly be removed as there is no reason to use it anymore.

[branch misc-cpp-stl commit 738eba02e1]

Wednesday, November 12, 2014

Dictionary Information – Standard Classes

When the dictionary classes were changed to use the standard classes, the derived constant number and string information classes were missed because they were in different source files.  For adding elements, these classes also used the method of increasing the vector size by one and then setting the new element.  This causes the default constructor to be called when the size is increased, and then a copy when the element is set.  It is more efficient to use the C++11 emplace functions to construct the new element directly in place.

The info dictionary add function previously did the add element (increase vector size) and set element as two separate calls.  The add function was changed to pass the token pointer to the add element function and not call the set element when adding an element.  The add element functions get a token pointer argument to be used with the emplace back function to add the new element to the end of the value vectors (which were changed to standard vectors).  The value vector of the constant string info class was changed to a vector of standard string pointers.  The constant string info destructor was changed to use a C++11 range-for loop to delete the strings in this vector.
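The difference between growing the vector and setting the element versus emplacing can be seen by counting constructor calls.  The Value type and the two add functions are hypothetical illustrations, not the dictionary classes themselves:

```cpp
#include <string>
#include <vector>

// Hypothetical element type that counts how it is constructed.
struct Value {
    static int defaultCount;
    static int otherCount;
    std::string text;
    Value() { ++defaultCount; }
    explicit Value(const std::string &t) : text(t) { ++otherCount; }
};
int Value::defaultCount {};
int Value::otherCount {};

// Old approach: grow by one (default constructs the new element), then set it.
void addBySet(std::vector<Value> &values, const std::string &text)
{
    values.resize(values.size() + 1);
    values.back().text = text;
}

// C++11 approach: construct the new element in place from the argument.
void addByEmplace(std::vector<Value> &values, const std::string &text)
{
    values.emplace_back(text);
}
```

Each call to addBySet performs one default construction followed by an assignment, while addByEmplace performs a single construction from the argument.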

When dependency on Qt was removed from the dictionary classes, the quint16 type was changed to the standard uint16_t for holding indexes into the dictionaries (in the program code).  This change was also missed in the constant info classes.  The compiler didn't complain since these two types are the same under the hood.  The constant info classes were changed to use uint16_t.

[branch misc-cpp-stl commit 14d3b329b7]

More Miscellaneous C++ and STL Changes

The Table class is going to be completely redesigned to better utilize C++.  Before starting this major development, there are several more miscellaneous C++ and STL changes to be made to some of the other classes.  The changes are expected to be minor, so development will be done on a single topic branch (to avoid short-lived branches).  These changes include:
  • Modifying the dictionary information classes to use STL.
  • Modifying the Tester class to be a function operator class, and use exceptions for error reporting.
  • Modifying the Program Code class to use STL, be in its own header file, and not use a container class as its base class.
  • Modifying the Program Model class to use STL.
  • Modifying the Translator class to fully use STL, be a function operator class, and use exceptions for error reporting.
  • Making other minor miscellaneous changes that come up along the way.

Tuesday, November 11, 2014

Recreator – Function Operator

The Recreator class has a single purpose, to take an RPN list and recreate the original input string of the BASIC code.  This is similar to the Parser class, which was already changed to be a function operator class (see October 18).  The recreator was also changed to be a function operator class.  All that was required was to rename the recreate function to operator().

Unlike the parser, which is instanced with the input string and from which tokens are repeatedly obtained using the function operator function until either an error (exception) occurs or an end-of-line token is returned, the recreator is single use and its function operator function will only be called once for a given RPN list.  Therefore, only a temporary instance is needed - no member or local instance is required.

The users of the recreator, the tester and program model classes, previously contained a pointer to a recreator instance, which was created by their constructors.  The recreator instance was deleted automatically when these class instances were deleted because a standard unique pointer was used.  Since the recreator is single use, only a local instance is required and these members were removed.  Instead of creating a local variable instance, a temporary instance is used (the instance is created and deleted during the statement):

Before:
Recreator recreator;
string = recreator(rpnList);

After:
string = Recreator{}(rpnList);

Since this is the last change to be made to the recreator class, the redundant 'void' keywords were removed from several of the recreator function definitions, and the use of Q_UNUSED macro was replaced with the '(void)' syntax on unused variables in the various recreate functions.  This concludes work on the recreator class, so the recreator branch was merged to the develop branch and deleted.

[branch recreator commit a0e99f6dcf]

[branch develop merge commit dc4dd26876]

Monday, November 10, 2014

Recreator – Standard Strings

The strings in the recreator stack item and several local stacks were changed to standard strings along with the recreator output string.  The QString::append function was previously used to append to these strings.  The std::string class also has an append function, but it does not accept a single character argument (though it does accept a count with a single character).  Uses of the append function were replaced with the addition assignment operator (+=).

The append and top append functions of the recreator are still used by the various external recreate functions.  Unlike QString, where a plain character is implicitly converted to a QChar, which is then implicitly converted to a QString, the std::string class does not have similar functionality.  To provide the same functionality, additional append and top append functions were added that take a plain character argument.  All of the append functions use the addition assignment operator to append to the strings.

The append and top append functions, along with the pop with operands function, were changed to take a string rvalue reference argument (std::string &&).  In most cases, the argument given to these functions is a temporary value, and with this argument type the temporary value is moved to the function instead of being copied.  In other cases, the variable passed was no longer needed and was going out of scope, so the variable is passed via the std::move function.  The advantage of using an rvalue reference argument is that it requires a temporary, so the compiler gives an error when the argument is not a temporary value or the variable is not moved.
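A sketch of functions taking rvalue reference arguments alongside a plain character overload.  StringStack and its member functions are hypothetical stand-ins for the recreator stack functions:

```cpp
#include <string>
#include <utility>
#include <vector>

// A temporary binds directly to the std::string && parameters; a variable
// that is no longer needed must be passed with std::move.  Passing a plain
// std::string variable does not compile, which flags accidental copies.
class StringStack {
public:
    void push(std::string &&string) { m_stack.push_back(std::move(string)); }
    void topAppend(std::string &&string) { m_stack.back() += string; }
    void topAppend(char c) { m_stack.back() += c; }  // plain character overload
    const std::string &top() const { return m_stack.back(); }
private:
    std::vector<std::string> m_stack;
};
```

For example, `stack.push(std::move(name))` moves a variable that is going out of scope, while `stack.push(name)` would be rejected by the compiler.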

There are several cases in the recreator functions where values are obtained from the table.  Since the table functions are still returning QString values, a call to the toStdString function was added.

The constant string recreate function used the QString::replace function to substitute all instances of a particular character with a string.  This function was used to convert each double quote character in the constant to two double quote characters.  There is no equivalent easy-to-use standard function to do the same thing.  A range-for statement was added to iterate through the string, making a copy into a local string.  For each double quote character copied, a second double quote is added.
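A sketch of the range-for approach for doubling the double quotes (the function name doubleQuotes is hypothetical):

```cpp
#include <string>

// Copies the constant, adding a second double quote after each one copied.
std::string doubleQuotes(const std::string &constant)
{
    std::string result;
    result.reserve(constant.size());
    for (char c : constant) {
        result += c;
        if (c == '"') {
            result += '"';  // add the second double quote
        }
    }
    return result;
}
```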

The recreator contained an output is empty access function, for determining if the output string was empty, and an output last character function, for returning the last character added to the output string.  These functions were only used by the remark recreate function to check if the last character added to a non-empty output was not a space (to see if a space should be added before the REM operator).  These functions were replaced with a single back is not space function.

[branch recreator commit 56b40e3411]

Sunday, November 9, 2014

Recreator – Standard Stacks

The stack member was changed to a standard stack.  Two local stacks were also changed to standard stacks (in the assign recreate and push with operands functions).  The stack access functions were modified accordingly:
  • The push function was replaced with the emplace function defined as a template member function to forward the arguments to the stack emplace function (see details below).
  • The pop function was replaced with the pop string function, which basically does the same thing except that it no longer checks if the stack is empty (which was only needed during initial development of the recreator).
  • The top function was replaced with individual functions for returning the precedence and unary operator flag members of the top stack item.  No similar function was added for the string member as there are other functions for accessing the string member: pop string, top append, and new top add parentheses.
  • A new top add parentheses function was added to add parentheses around the string of the top stack item.  This function is only called by the parentheses recreate function.
  • The precedence and unary operator pointer arguments were removed from the pop with parentheses function since they were only used by the parentheses recreate function (which now has its own access function).
  • The stack is empty function was renamed to just empty to be consistent with the other stack access functions (which don't have the stack name included) and with the STL naming convention.
The previous push function resized the stack up by one element and set the members of the new item on top of the stack.  This caused the default constructor to be called for the new item, and then the values were copied to the item.  The STL emplace functions allow the new item to be constructed directly, eliminating the extra copies (one to the function arguments, and another to the item).  To allow the same functionality, the new emplace stack access function was defined as a variadic template member function:
template <typename... Args>
void emplace(Args&&... args) {
    m_stack.emplace(std::forward<Args>(args)...);
}
This utilizes the new C++11 perfect forwarding feature, which allows a function template to pass its arguments through to another function while avoiding unnecessary copying.  The above template passes the stack item constructor arguments to the stack emplace function (using the std::forward template function), which then passes them to the stack item constructor when the new item is created on the stack, without copying the arguments.

To use the emplace stack function, a constructor was added to the stack item structure.  The constructor has arguments for each of the three member variables.  The precedence and unary operator arguments were made optional with defaults (from the replaced push function).  The member variables of the stack item were also renamed with the "m_" prefix.
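A self-contained sketch combining the emplace template with a stack item constructor.  The class name RecreatorStack, the int precedence type, and the default values of 0 and false are assumptions made for illustration:

```cpp
#include <stack>
#include <string>
#include <utility>

// Stack item with a constructor so items can be constructed in place.
struct StackItem {
    StackItem(std::string string, int precedence = 0, bool unaryOperator = false) :
        m_string(std::move(string)), m_precedence {precedence},
        m_unaryOperator {unaryOperator} {}

    std::string m_string;
    int m_precedence;
    bool m_unaryOperator;
};

class RecreatorStack {
public:
    // perfectly forwards any constructor arguments to the stack item
    template <typename... Args>
    void emplace(Args&&... args)
    {
        m_stack.emplace(std::forward<Args>(args)...);
    }
    std::string popString()
    {
        std::string string = std::move(m_stack.top().m_string);
        m_stack.pop();
        return string;
    }
    int topPrecedence() const { return m_stack.top().m_precedence; }
    bool topUnaryOperator() const { return m_stack.top().m_unaryOperator; }
    bool empty() const { return m_stack.empty(); }
private:
    std::stack<StackItem> m_stack;
};
```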

[branch recreator commit 5451552470]

Saturday, November 8, 2014

Recreator – Separator Member

Since the recreator functions contained a number of standard string to C-style string conversions (to convert to a QString), the Recreator class is next to be transitioned to the STL.  This was started by changing the separator member from QChar to char along with its access functions.  An initializer for the separator member was added since a plain char member, unlike a QChar member, is not initialized by a default constructor.

Two uses of the separator access function required special handling, in the input assign recreate and assign string recreate functions.  Previously the separator was added to other literal character constants.  Now that the separator is the plain char type, the compiler treats the plus operator as integer addition instead of concatenation.  The separator is temporarily converted to a QChar so that the plus gets compiled as string concatenation instead of character addition.
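The pitfall can be shown with standard types only.  Instead of the QChar conversion described above, this sketch uses a std::string as the left operand, which is one standard alternative; both function names are hypothetical:

```cpp
#include <string>

// Both operands are promoted to int, so this is an integer sum of the
// character codes, not concatenation.
int addedChars(char separator)
{
    return ',' + separator;
}

// Starting from a std::string makes the plus a string concatenation.
std::string joinedChars(char separator)
{
    return std::string(1, ',') + separator;
}
```

With ASCII codes, `addedChars(' ')` returns 76 (44 + 32), while `joinedChars(' ')` returns ", ".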

[branch recreator commit 3b5b789198]

Token – Standard String Member

It turned out that changing the token string member to a standard string did not require a lot of other changes.  The string access functions were modified and the c_str function call was removed from the constructor initializers.  The equality operator function was modified to use the straight equality operator for case-sensitive comparisons (REM, REM operator and string constants) and the new no case string equal function for case insensitive comparisons.

In the rest of the code, the c_str function call was removed where a standard string was present, and the toStdString function call was removed from the token string access function where the result is put into a standard string or output in a standard output stream.  In several places in the recreator functions, a c_str function call was added where a QString is needed (the c-style string returned is implicitly converted).  Finally, one line in the translator using the QString::startsWith function with the case insensitive option was changed to convert the first character to upper case before comparing it (only the first character was being compared).

[branch token commit ee7b926daa]

Since there is no more work for the token class, the short-lived token branch was merged to the develop branch.

[branch develop merge commit ee12854e36]

Utility – Case Insensitive Comparisons

The token equal operator function uses case insensitive string comparisons that need to be replaced with standard string equivalents.  There is no direct equivalent, so the solution used so far was to use the std::equal function, passing a case insensitive character comparison lambda function.  Since the std::equal function assumes equal length containers, the sizes of the strings need to be compared first to make sure the strings are the same length (or at least that the primary string is not shorter than the secondary string when doing a string begins with comparison).

This pattern of comparing the string lengths before comparing the strings was repeating, so two inline functions were added to do this.  The first, named no case string equal, checks that the lengths are the same before checking if the strings are equal.  The second, named no case string begins with, checks that the length of the primary string is greater than or equal to the length of the string being compared before comparing the beginning of the primary string.  The lambda function was renamed to no case character equal for consistency.  The lambda function was moved from the main header file to the utility header file along with the two new inline functions.
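A sketch of the two helpers.  It uses a plain inline function for the character comparison in place of the lambda described above; the function names follow the prose but the exact signatures are assumptions:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Case insensitive character comparison (std::toupper requires a value
// representable as unsigned char, hence the casts).
inline bool noCaseCharEqual(char c1, char c2)
{
    return std::toupper(static_cast<unsigned char>(c1))
        == std::toupper(static_cast<unsigned char>(c2));
}

// std::equal only examines as many characters as are in the first range, so
// the lengths must be checked first.
inline bool noCaseStringEqual(const std::string &s1, const std::string &s2)
{
    return s1.size() == s2.size()
        && std::equal(s1.begin(), s1.end(), s2.begin(), noCaseCharEqual);
}

// The primary string s1 must be at least as long as s2; only its leading
// characters are compared.
inline bool noCaseStringBeginsWith(const std::string &s1, const std::string &s2)
{
    return s1.size() >= s2.size()
        && std::equal(s2.begin(), s2.end(), s1.begin(), noCaseCharEqual);
}
```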

[branch token commit 1dde64fb7b]

Parser – Standard Input Stream

The parser functions have been modified to use a standard input string stream, so the input member could now be changed to the std::istringstream class, and the input position member removed.  The current input position can be obtained directly from the input stream using the tellg member function.  The skip white space function was removed since this can be done directly on the input stream (in other words, extract white space):
m_input >> std::ws;
In several places, a temporary position or length integer variable is used to hold the current position (tellg) or length (length) because these functions return pos_type and size_type values, which are 64-bit integers on a 64-bit system.  The token constructors and error structure only accept int values (32-bit).  There is no reason to change the member variable types to 64-bit integers as there will never be input lines or strings long enough to require them.

When using an input stream, care must be taken when using the tellg function to obtain the current input position.  This function returns an EOF value (-1) once the input stream has been read past the end.  So this function can't be used if a previous operation could possibly have read past the end.

There are times when the input position must be reset (like when the second word of a possibly two-word command is not valid).  The seekg function is used to set the input position.  However, this function does not work once the input stream has been read past the end, because the EOF flag is set.  To clear this condition, the clear function needs to be called, so this call precedes all seekg function calls except one where an EOF cannot have occurred.
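The tellg, clear and seekg behavior described above can be demonstrated with a standard input string stream.  In this sketch, the read that runs past the end sets both the end-of-file and fail flags.  The helper names position, skipWhiteSpace and restore are hypothetical:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Current input position, or -1 once the stream is in a failed state
// (for example, after a read ran past the end of the input).
long position(std::istringstream &in)
{
    return static_cast<long>(in.tellg());
}

// Skips white space (extracts it) and returns the new position.
long skipWhiteSpace(std::istringstream &in)
{
    in >> std::ws;
    return position(in);
}

// Restores a saved position; clear() must come first because seekg()
// does nothing while the stream's fail state is set.
void restore(std::istringstream &in, std::istringstream::pos_type pos)
{
    in.clear();
    in.seekg(pos);
}
```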

In the get string function, the characters read are counted so that the length of the string in the input is known when the token is constructed and returned (pairs of double quotes count as one character in the string, but take two characters in the input string, so must be counted as two).  The ending input position cannot be used to determine the length in the input string because the position is not valid if the string constant is at the end of the line (see issue with tellg function above).
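A sketch of counting input characters while reading a string constant, so that the length is known without calling tellg at the end of the line.  The StringConstant structure, the getString signature, and the handling of a missing closing quote (the constant ends at the end of the line) are assumptions:

```cpp
#include <sstream>
#include <string>

// Value of the constant and the number of input characters it occupied.
struct StringConstant {
    std::string value;
    int inputLength;
};

// Assumes the opening double quote has already been read.  A pair of double
// quotes is one character in the value but counts as two input characters.
StringConstant getString(std::istringstream &in)
{
    StringConstant result {"", 1};  // 1 for the opening quote
    int c;
    while ((c = in.get()) != std::char_traits<char>::eof()) {
        ++result.inputLength;
        if (c == '"') {
            if (in.peek() != '"') {
                break;  // closing quote (not part of the value)
            }
            in.get();  // second quote of a pair
            ++result.inputLength;
        }
        result.value += static_cast<char>(c);
    }
    return result;
}
```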

The constructor of the parser was changed to take a standard string input.  Both callers were modified accordingly - the tester class already had a standard string, but the translator needs to convert from its QString to a standard string (until the translator is modified).  Dependency on Qt has almost been removed from the parser except for one call to obtain the name for the REM command (which will be handled when the table is modified).

[branch parser commit 8e71a71fd5]

One outstanding item remains - the token string member is still a QString though its constructors have been modified to take standard string arguments.  This will not be a trivial change since many users of the token string still expect a QString.  Therefore, this work will take place in a new development branch.  This concludes work on the parser, so the parser branch was merged to the develop branch and deleted.

[branch develop merge commit 2cafb22a8e]

Thursday, November 6, 2014

Parsing Identifiers – Standard Library

The get identifier function was changed to use a standard input stream (again using a temporary input string stream like the previous functions).  This function used the scan word support function to look for a word, which was renamed more appropriately to get word.  Instead of checking for a REM command first (because unlike other commands, a space is not required after the command), the get word function is now called first to get a word.  If no valid word is found, an empty token pointer is returned.

The first check on a valid word is whether the word starts with the letters of the REM command name, using the std::equal function with the no case compare lambda function.  Since the REM command name is still in the table as a QString, it is temporarily converted to a standard string.  For a remark, the input position is set to the beginning of the string of the remark, and the word string is then replaced with the rest of the characters on the line, from which the token is created and returned.

An issue was discovered with the parsing of defined function tokens (identifiers that start with "FN").  A valid defined function name should start with a letter after the "FN", but there was no check for a letter or even a check for any characters after the "FN", so identifiers like FN and FN1 were incorrectly accepted as defined function tokens.  Instead of rejecting these names as invalid defined function names, the decision was made to allow these names and treat them as regular names (variables and arrays).

The get word support function was modified in the same way (by using a temporary input stream).  It previously returned three values: the position after the word found, the data type of the word, and whether the word has a parentheses.  Two of these were returned through reference arguments.  The position is no longer needed since it will be obtained from the input stream; however, a string for the word is needed because the word is read from the stream.  A new Word structure was added to hold the word string, data type and parentheses flag, which is now returned.  An empty word string indicates no valid word was found.

To simplify the handling of two-word commands in the get identifier function, the check of the second word to make sure that it does not have a data type or parentheses was moved to the get word function.  If the second word does, an empty word string is returned and the input stream is repositioned back to the beginning of the word.  A word type argument was added with values first and second to enable this second word checking.
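A sketch of returning a Word structure and rejecting a second word that has a data type or parentheses.  The DataType values, the '%' and '$' suffix characters, and the word character rules are assumptions made for illustration, not taken from the parser:

```cpp
#include <cctype>
#include <sstream>
#include <string>

enum class DataType { Double, Integer, String };  // hypothetical subset
enum class WordType { First, Second };

// Returned instead of passing values back through reference arguments;
// an empty string means no valid word was found.
struct Word {
    std::string string;
    DataType dataType;
    bool paren;
};

Word getWord(std::istringstream &in, WordType wordType)
{
    auto start = in.tellg();  // assumes the stream is not at EOF yet
    Word word {"", DataType::Double, false};
    int c;
    while (std::isalnum(c = in.peek()) || c == '_') {
        word.string += static_cast<char>(in.get());
    }
    if (word.string.empty()
        || !std::isalpha(static_cast<unsigned char>(word.string[0]))) {
        in.clear();        // peek may have set EOF
        in.seekg(start);
        return Word {"", DataType::Double, false};
    }
    if (c == '%' || c == '$') {  // assumed data type suffixes
        word.dataType = c == '%' ? DataType::Integer : DataType::String;
        in.get();
        c = in.peek();
    }
    if (c == '(') {
        word.paren = true;
        in.get();
    }
    if (wordType == WordType::Second
        && (word.dataType != DataType::Double || word.paren)) {
        in.clear();
        in.seekg(start);  // reposition to the beginning of the word
        word.string.clear();
    }
    return word;
}
```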

The get identifier function uses the two-word table search function, which was modified to take two standard strings.  The token constructor for identifiers was modified to take a standard string argument, which is temporarily converted to a c-style string to initialize the QString token member.  Several invalid defined function names were added to parser test #2 (identifiers) to verify these names are treated as plain identifiers (with and without parentheses).

[branch parser commit dbbd9fe054]