Interactive BASIC Compiler Project: November 2014

Sunday, November 30, 2014

Translator – Better Token Handling

A common pattern in the translator functions was a token pointer reference argument through which a token was passed into and out of the function. The token returned was the next token that the function could not process (a terminating token). Several of the functions allowed an unset token pointer, in which case they would get a token; otherwise they would use the token passed in. This was a strange pattern and somewhat difficult to work with.

The translator functions were modified to a better pattern where a member token pointer was added to the translator class to hold the current token. The translator functions were modified to use this current token member. The token is moved out of this member variable when successfully processed (consumed). The token pointer reference argument was removed from the translator functions.

The get token function was modified to put the token obtained from the parser into this new member, and the token argument was removed. If the current token member already has a token, then no action is taken.

The process command function, once modified, was reduced to only a few lines, and since it was only called once, its code was moved to the get commands function. The command token and token arguments were removed from the LET, PRINT and INPUT translate functions, which were modified to get the current token from the translator. Several access functions for the current token were added to the translator including getting a constant reference to the token pointer member, resetting the token pointer member, and moving the token pointer out of the member. The latter two force a new token to be obtained upon the next get token function call.
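The current token member and its access functions can be sketched roughly as follows. This is a minimal illustration, not the project's code: the Token structure, the TokenPtr alias, the function names and the word vector standing in for the parser are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal token; the real token class holds much more.
struct Token {
    std::string text;
};
using TokenPtr = std::shared_ptr<Token>;

// Sketch of a translator that holds the current token as a member.
class Translator {
public:
    explicit Translator(std::vector<std::string> words)
        : m_words(std::move(words)) {}

    // Obtain a token only if no current token is held (otherwise no action).
    void getToken() {
        if (!m_token && m_next < m_words.size()) {
            m_token = std::make_shared<Token>(Token{m_words[m_next++]});
        }
    }
    // Constant reference access to the current token pointer.
    const TokenPtr &token() const { return m_token; }
    // Drop the current token; the next getToken() fetches a new one.
    void resetToken() { m_token.reset(); }
    // Move the current token out (consume it), leaving the member empty.
    TokenPtr moveToken() { return std::move(m_token); }

private:
    std::vector<std::string> m_words;  // stands in for the parser
    std::size_t m_next = 0;            // index of the next word to hand out
    TokenPtr m_token;                  // the current token
};
```

Calling getToken() twice in a row leaves the same token in place, while moveToken() or resetToken() empties the member so that the following getToken() call fetches the next token.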

The get operand function was modified to leave the current token pointer member empty if a valid operand was processed (added to the output list and pushed to the done stack). This forced the next call to the get token function to obtain a new token from the parser.

For sub-string assignments, the get operand function set the reference flag of the token if the token was a sub-string function (LEFT$, MID$, or RIGHT$) and a reference was requested. The process internal function function used this flag to identify a sub-string assignment and to request a string variable for the first argument of the function. Upon return, this reference flag was cleared. This reference flag toggling was replaced by passing the reference enumerator as an argument.

There was some code in the get operand function that set the code of the token to the Define Function with Parentheses code enumerator. These statements were moved to the process parentheses token function, where the similar statements reside for setting the Array and Function codes.

Some other minor changes were made including changing the RPN list token access function to return a reference to the token pointer instead of the token pointer itself (to prevent the copying of the token pointer and updating the shared pointer use counts), reorganizing some code in the various routines, renaming some local token variables, and updating comments for changes made.

[branch misc-cpp-stl commit 828f210f18]

Friday, November 28, 2014

Translator – Some Minor Improvements

The done stack pop error token function was used to create an error token from the item on top of the done stack. An item on the done stack could consist of an entire expression, so in addition to a token, it contained the first and last tokens of the expression, which would be null if the item only contained a single token.

There is no need to create an error token since exceptions are now thrown for errors, which do not contain tokens. All callers to this function only used the error token to create a token error (status with the column and length from the error token). This function was modified to return a token error and was more appropriately renamed to the done stack top token error function. There is also no need to pop the done item from the stack since when an error is thrown, the translator instance goes out of scope and all members are deleted including the done stack.

Several other minor changes were also made including removing the token set through function since it was no longer used, replacing the uses of the Qt forever macro with a plain empty for (;;), renaming the expected error status function (it had "Error" abbreviated as "Err"), adding the C++ noexcept keyword to translator functions that do not throw an exception, and updating a few comments missed for changes made earlier.

[branch misc-cpp-stl commit 625195f18d]

Thursday, November 27, 2014

Translator Improvements – Getting Tokens

While modifying the translator routines to throw exceptions, it was noticed that some minor improvements could be made. The first of these is with calls to the get token function. There was a similar pattern to most of the calls to this function, where the call was in a try block and for caught errors, the error was set to an appropriate error.

An error status argument was added to the get token function. The caller passes the error status it wants thrown when the parser returns an Unknown Token error. With this change, the caller no longer needs to catch errors from the get token function call. Several of the callers need the parser error to be thrown as is, so if the error status argument is a null status, the parser error is thrown as is. The first status enumerator was assigned a value of one so that a null status is not one of the existing enumerators.
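A rough sketch of this error status argument is shown below. The Status enumerators, the TokenError layout and the one-character parseToken stand-in are assumptions for illustration only; the point is the null status check and the rethrow.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical status codes; the first enumerator is one so that a
// value-initialized Status{} (zero) can serve as the null status.
enum class Status { ExpectedExpression = 1, ExpectedCommand, UnknownToken };

struct TokenError {
    Status status;
    std::size_t column;
    std::size_t length;
};

// Stand-in for the parser: throws an Unknown Token error for '?'.
inline char parseToken(const std::string &input, std::size_t column) {
    char c = input.at(column);
    if (c == '?') {
        throw TokenError{Status::UnknownToken, column, 1};
    }
    return c;
}

// Get a token; for an Unknown Token error, throw the caller's status
// instead, unless the null status was passed (parser error thrown as is).
inline char getToken(const std::string &input, std::size_t column,
                     Status errorStatus = Status{}) {
    try {
        return parseToken(input, column);
    } catch (TokenError &error) {
        if (errorStatus != Status{} && error.status == Status::UnknownToken) {
            error.status = errorStatus;
        }
        throw;  // rethrows the (possibly updated) caught error object
    }
}
```

With this shape, a caller that wants an "expected expression" error simply passes that status and needs no try block of its own.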

[branch misc-cpp-stl commit 950d3cf9ed]

Translator Exceptions – Expressions

The get expression function was modified to throw errors. The return type was removed since a no exception return now indicates success. The temporary error tokens for errors no longer need to be created. The Done status that was previously returned for success was no longer used, so this status enumerator was removed.

All callers of the get expression function were modified for an error being thrown, which for the most part meant catching errors instead of looking for an error return status. However, there was an issue with how errors were processed when an unexpected unary operator token appeared.

When an unexpected unary operator appeared, get expression returned an "expected binary operator or end-of-statement" error along with the unary operator token. When another error was appropriate, the caller essentially ignored this error when the token was a unary operator, and threw the appropriate error. Now that get expression throws an error, there is no token returned, so callers cannot look for a unary operator token. The callers were modified to look for this error and then throw the appropriate error.

[branch misc-cpp-stl commit 697a0d25d8]

Wednesday, November 26, 2014

Translator Exceptions – Tokens

The get token function used to get the next token from the parser was modified to throw errors. The return type was removed since a no exception return now indicates success. The temporary error token for an error no longer needs to be created. The Good status that was previously returned for success was no longer used, so this status enumerator was removed.

All the callers of the get token function were modified for errors being thrown, which mostly included catching the error and throwing the appropriate error for the Unknown Token error. The get operand function just passes the error on to its caller (In this case, in addition to an Unknown Token error, a number constant error could be returned for non-reference token requests).

The get expression function was modified to catch the error from the get token function, obtain the error status and create an error token. The error token won't be necessary once this function is modified to throw errors (it is now the only function remaining that has not yet been modified to throw errors).

In the LET translate function, where the token after a reference token is not a comma or equal character, the token on top of the done stack was checked to see if it has the sub-string flag set, and if it does, this item (a sub-string assignment function) is popped. This is not necessary since once an error is thrown out of the translator, the translator instance is deleted along with the done stack.

[branch misc-cpp-stl commit 217de6d02c]

Tuesday, November 25, 2014

Translator Exceptions – Operators

The process operator function called by the get expression function was modified to throw errors. This function returned Done status for a token that is not an operator, Good status for a successfully processed unary or binary operator, or an error status. Since errors are now thrown, the return type was changed to a boolean, true for success and false for not an operator.

The process done stack top function is used to pop an item from the done stack and add a conversion code to the output if needed or report an error if the item can't be converted. This function, used by the process final operand, INPUT translate, and LET translate functions, was modified to throw the error and its return type was removed. These callers were modified to just call the function, and any error thrown will be passed to their caller.

Similarly, the process operator function was changed to just call the process final operand function. Its error is not caught, so any error thrown is passed up to its caller, which is the get expression function.

The process first operand function was called to process the first operand of a binary operator, and push the unary or binary operator to the hold stack. At first it was also modified to throw errors (again from the process done stack top function), but this function ended up being only a few lines, and since it is only called by the process operator function, its code was just moved into the process operator function, eliminating the separate function.

The get expression function was temporarily modified to catch errors from the process operator function, where the status is obtained from the thrown error, and an error token is created from the error column and length. The error status and the token are returned as before. This won't be necessary once get expression is modified to throw errors.

[branch misc-cpp-stl commit cf45a8e393]

Sunday, November 23, 2014

Translator Exceptions – Parentheses Tokens

The process parentheses token function used to translate tokens with parentheses (arrays and user functions) for the get operand function was modified to throw errors. This function only returned a Good status or an error status, so a return value was no longer needed. Like the process internal function function, the get expression call was changed to throw an error if an error status is returned, which is immediately caught. This simulates the get expression function throwing an error, which has not been modified yet to throw errors.

Unrelated to this function, the comments for the other functions modified were updated to reflect the changes made so far, which was missed when these functions were changed. The new expression error status function added for the process internal function function was moved to the support functions section of the source file.

[branch misc-cpp-stl commit 4f03c8bd77]

Saturday, November 22, 2014

Translator Exceptions – Internal Functions

The process internal function function used to translate internal function tokens for the get operand function was modified to throw errors. Some of the changes made in the failed attempt to add exceptions throughout the translator routines around the get operand call used for sub-string assignments ended up in the code during the last commit. This did not affect the functionality (since the regression tests had passed, it wasn't noticed).

The expression error status private helper function was added to determine the error status depending on whether at the last operand of the internal function, if the internal function has multiple arguments and if the bad token is a unary operator. This function is called in three locations.

Since all errors are thrown, there was no longer a need for a return value. The unary operator local variable was no longer needed and was removed. The status and expected data type local variable declarations were moved closer to where they are used. The status variable can be removed once the get expression function is modified to throw errors.

[branch misc-cpp-stl commit 640255514d]

Translator Exceptions – Operands

The get operand function used to get an operand was modified to throw errors. The command routines either returned an error or a Done status, so once they were modified to throw errors, there was no need to return anything. In addition to errors, the get operand function returned a Good status representing successfully getting an operand or a Done status representing no operand token (an operator or command).

The return value of the get operand function was changed to a boolean where true represents successfully getting an operand and false represents no operand. When a reference operand is requested, this function will not return false; it will just throw an error. So for the process internal function, LET translate, and INPUT translate routines, it is not necessary to test the return value since these only request reference operands.
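The boolean return combined with thrown errors might look something like the sketch below. The Reference enumerators, the OperandError type and the character tests used to classify a token are invented for the example and greatly simplify what the real get operand function does.

```cpp
#include <cassert>
#include <cctype>
#include <string>

enum class Reference { None, Variable };  // hypothetical enumerators

struct OperandError {                      // hypothetical error type
    std::string message;
};

// Returns true when the token is an operand and false when it is not
// (for example an operator). When a reference is requested, anything
// other than a variable is an error, so false is never returned.
inline bool getOperand(const std::string &token, Reference reference) {
    bool isVariable = !token.empty()
        && std::isalpha(static_cast<unsigned char>(token[0]));
    bool isNumber = !token.empty()
        && std::isdigit(static_cast<unsigned char>(token[0]));
    if (reference != Reference::None) {
        if (!isVariable) {
            throw OperandError{"expected variable"};
        }
        return true;
    }
    return isVariable || isNumber;
}
```

A caller that requests a reference can ignore the return value, since reaching the next statement already means an operand was obtained.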

There were three locations in the get operand function where errors were returned that popped the token on top of the hold stack before returning. This was intended to prevent memory leaks. This wasn't necessary since the translator cleans up the stack when the translator goes out of scope, so these pop calls were removed.

One of the errors for define function identifiers with parentheses tokens added the length of the token to the column to point the error to the parentheses of the token. The token class add length to column access function was used for this. Instead of using this access function (the only caller), the token length is added to the column when the token error instance is created and thrown. This access function was removed.

[branch misc-cpp-stl commit 640255514d]

Translator Exceptions – Commands

The routines that process commands were the next modified to throw exceptions. This included the get commands routine used to get command statements separated by colons. For now, if the two calls to the get token function return an error (not good status), the appropriate error is thrown. The get token function will also be modified to throw errors soon. Since get commands now throws errors, it no longer needs to return a status so the return type was removed (along with the local status variable).

The process command routine handles the processing of one command and was modified to throw exceptions. It calls a specific translate function for a command or the LET translate function if the first token is not a command. The LET, PRINT and INPUT translate functions were also modified to throw exceptions. The status return type was also removed and the if statement surrounding the call to process command in get commands was removed (which will cause thrown exceptions to be passed up to its caller, the translator function operator function).

In the translator function operator function, a status is no longer returned from the get commands function, so assignment of the local status variable was removed. And thrown exceptions are passed up to the caller of the function operator function (which will catch the errors). For a successful return, the local status variable is set to Done. This is temporary until the get expression function is also modified to throw exceptions.

The LET, PRINT and INPUT translate functions were restructured a bit. Local variable declarations were moved to where the variables are first used. Some of the error checking if statements were rearranged to ease the handling of errors (when the lower functions are modified to throw exceptions). These changes came from the failed total translator exception changes. The checks for the end-of-statement were also moved to a more logical place in each of these routines.

[branch misc-cpp-stl commit 7217ce58c6]

Friday, November 21, 2014

Translator Exceptions – Top-Level

Several convenience functions were added to the Token Error structure to make it syntactically easier to use. These include a function operator function with no arguments for returning the status of the error, a function operator function with a status argument for checking if the error status is the passed status, and an assignment operator function taking a status value to assign to the error status.
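These three convenience functions could look roughly like this. The Status enumerators and the exact member layout are assumptions; only the three operator functions correspond to what is described above.

```cpp
#include <cassert>

enum class Status { ExpectedOperand = 1, ExpectedBinaryOperator };  // hypothetical

struct TokenError {
    TokenError(Status status, int column, int length)
        : m_status{status}, m_column{column}, m_length{length} {}

    // error() returns the status of the error
    Status operator()() const { return m_status; }
    // error(status) checks whether the error has the passed status
    bool operator()(Status status) const { return m_status == status; }
    // error = status replaces the status, keeping the column and length
    TokenError &operator=(Status status) {
        m_status = status;
        return *this;
    }

    Status m_status;
    int m_column;
    int m_length;
};
```

In a catch block this allows code such as if (error(Status::ExpectedBinaryOperator)) error = Status::ExpectedOperand; before rethrowing.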

The translator function operator function was modified to throw errors instead of setting a local status variable, which is then used to throw the error at the end of the function. Eventually the get expressions and get commands functions will throw exceptions for errors and the local status variable won't be necessary.

The try block in the tester parse input function was reformatted where the try-catch blocks were moved outside of the forever loop to the function block. The return statement in the catch block could then be removed.

[branch misc-cpp-stl commit c639347b07]

Translator Exceptions – Development Strategy

It was initially difficult to see what the next incremental change should be for adding exception throws throughout the translator routines. Starting at the bottom by changing the get token function to throw exceptions was problematic because all callers would have to catch the errors and create an error token to hold the column and length of the error. Starting at the top by changing both the get expressions (used for testing) and get commands functions to throw exceptions was problematic because all functions in between would also need to be modified. An attempt to change the entire translator was made.

After the code was modified and corrected for compile errors, the tests were run, but there were many problems, which was not unexpected considering the large number of changes made. Instead of trying to debug, the changes were committed to a temporary work branch (though not pushed to the official GitHub repository). This is a scheme to use git to temporarily save work:

git checkout -b work           (create a temporary work branch)
git commit -a                  (stages all changed files and commits them)
git checkout misc-cpp-stl      (switch back to the original branch, restoring all files before changes)

This saved the changes of the failed attempt and restored the original files. Changes from the work branch could now be transferred to the working directory piecemeal. For example, several changes were made to the Token Error structure in the token header file (details in the next post).

Sunday, November 16, 2014

Token Errors (Minor Refactoring)

The plan is to modify the translator routines to throw error exceptions when an error is detected. Errors consist of a status, column and length. For translator errors, the column and length will always be obtained from a token, so it made sense to add a constructor that takes status and token pointer arguments.

The Error structure was a plain structure with no constructors, so it could be initialized member by member with values for the three member variables (aggregate initialization). Once a constructor is added, this form of initialization is no longer available, so a constructor taking the three values was added. The constructor taking status and token pointer arguments was also added. Since the structure now has constructors, the member variables were renamed with the member "m_" prefix.
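As an illustration of the two constructors, here is a minimal sketch. The Token structure with public column and length members and the Status enumerator are placeholders; the real token class presumably uses access functions.

```cpp
#include <cassert>
#include <memory>

enum class Status { ExpectedOperand = 1 };  // hypothetical enumerator

struct Token {                               // hypothetical minimal token
    int column;
    int length;
};
using TokenPtr = std::shared_ptr<Token>;

struct TokenError {
    // Takes the three values directly, replacing the member-by-member
    // initialization that is lost once any constructor is declared.
    TokenError(Status status, int column, int length)
        : m_status{status}, m_column{column}, m_length{length} {}
    // Takes the column and length from the token at fault.
    TokenError(Status status, const TokenPtr &token)
        : TokenError{status, token->column, token->length} {}

    Status m_status;
    int m_column;
    int m_length;
};
```

A translator routine can then write throw TokenError{Status::ExpectedOperand, token}; without extracting the column and length itself.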

The Error structure was defined in the main header file. This header file does not have access to the token header, which the new constructor needs in order to retrieve the column and length from the token, and an include couldn't be added for the token header because the token header already includes this main header file. The Error structure was therefore moved to the token header file. The name Error was a little generic, so this structure was renamed to the more appropriate Token Error.

[branch misc-cpp-stl commit 97ce3592c3]

Translator – Exceptions (Top-Level)

The goal of the next step is to modify the translator to throw errors instead of returning an RPN list containing an error. This will be done in steps starting at the top-level of the translator in the function operator function with a couple of other minor improvements to the code.

The first minor improvement was made to the Error Item class that holds information about an error in the program (includes type, line number, column, length and error status). The is empty function that checks if the error type is none (no error) was replaced with the explicit operator bool function, which allows an error item to be checked for an error without a named function (this function is called when a boolean is expected and an error item instance is provided):

if (!errorItem.isEmpty()) → if (errorItem)
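A minimal sketch of such a conversion function follows. The Type enumerators, the reduced set of members and the firstErrorLine helper are assumptions for the example.

```cpp
#include <cassert>

class ErrorItem {
public:
    enum class Type { None, Input, Code };  // hypothetical enumerators

    ErrorItem() = default;                   // error type defaults to None
    ErrorItem(Type type, int lineNumber)
        : m_type{type}, m_lineNumber{lineNumber} {}

    // Called where a boolean is expected: true when the item holds an error.
    explicit operator bool() const { return m_type != Type::None; }
    int lineNumber() const { return m_lineNumber; }

private:
    Type m_type = Type::None;
    int m_lineNumber = -1;
};

// Hypothetical caller: the if condition uses the conversion implicitly.
inline int firstErrorLine(const ErrorItem &errorItem) {
    if (errorItem) {
        return errorItem.lineNumber();
    }
    return -1;
}
```

The explicit keyword keeps the conversion limited to boolean contexts such as if conditions, so an error item cannot be accidentally used as an integer.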

At the end of the translator function operator function, if the status is not Done, the RPN list was cleared and its error member variables were set (column, length and status). This was changed to throw an Error structure with the status, token column and token length. The RPN list no longer needs to be cleared since it will be cleared when the temporary translator instance goes out of scope.

The tester translate input routine was modified to catch an error in a try block. This routine no longer needs to check if RPN list has an error, and if it did, retrieve the error information to create a local error instance and print the error. Now upon a caught error, it simply prints the error. It also returns an empty RPN list that the caller uses to detect an error.

The program model update line routine was modified to catch an error in a try block. For an error, the local error item is set to an error with the info in the thrown error. If there is no error then the error item will not contain an error (the default constructor sets the error type to none).

To detect a change, the update line routine compared the new RPN list to the current RPN list decoded from the program. The lists were not equal if either contained an error. The current RPN list is empty for a line with an error, but the decode routine does not set the error variables. Now that the RPN list does not have an error status, the error item also needs to be checked when checking if the line changed. The RPN error variables are no longer used and were removed along with their access functions. The check for errors in the RPN list equality operator function was also removed.

The final improvement made was to change the RPN list argument of the encode routine from a constant reference to an rvalue reference. All callers of the encode routine no longer need the RPN list after the call, so it can be moved. Since the arguments of these calls were not temporary instances, the local RPN list variable needs to be moved using std::move.
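The rvalue reference argument can be illustrated with a small sketch. The RpnList alias and the word count returned by encode are stand-ins; the real encode routine builds a program line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using RpnList = std::vector<std::string>;  // stand-in for the RPN list class

// Takes ownership of the list passed in; after the call the caller's
// list should no longer be used.
inline std::size_t encode(RpnList &&input) {
    RpnList owned(std::move(input));  // steal the caller's storage
    return owned.size();              // stand-in for the encoded line
}
```

A named local variable is an lvalue, so the call has to be written as encode(std::move(rpnList)), whereas a temporary such as encode(RpnList{"X"}) binds to the rvalue reference directly.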

[branch misc-cpp-stl commit 52359711e7]

Translator – Function Operator

The Translator class has a single purpose, to take a BASIC input string and create an RPN list representation of the BASIC code. This is similar to the Parser, Tester, and Recreator classes, which were already changed to be function operator classes. This started with renaming the translate function to operator().

Like the recreator where a temporary instance is used, the translator can be used in the same way except that the input string is passed to the constructor so that it can be used to instance the parser (which lives throughout the translation). The parser no longer needs to be instanced at the beginning and reset before returning in the function operator function. Since only a temporary instance is needed to translate, the translator member pointers in the tester and program model classes were removed.

The clean up function was called when the translator returned an error, which deleted the hold and done stack items, cleared the RPN output list and reset the pending parentheses token pointer. Now at the end of a translation, including when an error is detected, the temporary translator instance goes out of scope, and all these actions occur automatically except for the clearing of the output list (since it is moved to the caller upon return). This clean up function was removed. For an error, the output list is cleared.
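The temporary instance usage might be sketched like this. The whitespace splitting below is only a stand-in for real translation, and the istringstream member stands in for the parser; both are assumptions for the example.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using RpnList = std::vector<std::string>;  // stand-in for the RPN list class

class Translator {
public:
    // The input is passed to the constructor so the parser member can be
    // created once and live throughout the translation.
    explicit Translator(const std::string &input) : m_parser{input} {}

    // The function operator does the work and moves the output to the caller.
    RpnList operator()() {
        std::string word;
        while (m_parser >> word) {
            m_output.push_back(word);
        }
        return std::move(m_output);
    }

private:
    std::istringstream m_parser;  // stands in for the parser
    RpnList m_output;             // destroyed with the temporary instance
};
```

A caller writes RpnList rpnList = Translator{input}(); and every member of the temporary instance is destroyed at the end of that statement, which is what makes a separate clean up function unnecessary.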

[branch misc-cpp-stl commit b597b5b0a7]

Saturday, November 15, 2014

Program Model – Standard Strings

To complete the transition of the program model to the STL, all of the strings were changed to standard strings. This included the return value of the line text function, the text member of the line info structure, and an argument of the update line function (where an rvalue reference was used since the caller no longer needs the string passed so it is moved to the function).

The QStringList argument of the update function was changed to a std::vector<std::string> (the STL has no string list class) rvalue reference type (the caller no longer needs the vector passed so it can be moved to the update function). The update function was defined as a public slot, but was no longer being used as a slot, so it was changed to a regular public function.

All callers to the modified functions were modified accordingly. This included the translate function of the translator where its input argument was already being converted to a standard string to pass to the parser constructor. There were four functions that put function pointers into a temporary variable. These were changed to if-statement scoped variables using the auto type for convenience. This concludes work on the program model.

[branch misc-cpp-stl commit 2f7a0bb119]

Program Model – Line Info List

The program model contains a list of information for each line in the program code vector including the offset into the code vector, the size of the line, and if the line has an error, the error index for the line and its original text. This member list is an instance of the Line Info List class also defined in the Program Model class.

The Line Info List class is another class that was derived from a container class (QList). This class was changed to instead contain the standard vector as a member (QList is implemented as an array internally, similar to a vector). The existing adjust (private), replace, insert and remove at functions were modified accordingly. The remove at function was renamed to erase for consistency with STL naming conventions.

Necessary vector access functions were added to the Line Info List class including bracket operator for element access, constant bracket operator for constant element access, size of vector, and clear vector. Since the bracket operator is used frequently to access the offset and size line info structure member variables, specific offset and size element access functions were added taking a line number argument.
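A containment-based list along these lines is sketched below. The reduced LineInfo structure and, in particular, the offset shifting done by adjust are assumptions about how such a list could behave, not a description of the project's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LineInfo {  // reduced to two members for the sketch
    int offset;    // offset of the line in the code vector
    int size;      // number of program words in the line
};

class LineInfoList {
public:
    // Pass-through access functions to the member vector.
    LineInfo &operator[](std::size_t index) { return m_vector[index]; }
    const LineInfo &operator[](std::size_t index) const { return m_vector[index]; }
    std::size_t size() const { return m_vector.size(); }
    void clear() { m_vector.clear(); }
    // Convenience element access taking a line number.
    int offset(std::size_t lineNumber) const { return m_vector[lineNumber].offset; }
    int size(std::size_t lineNumber) const { return m_vector[lineNumber].size; }

    void insert(std::size_t index, const LineInfo &info) {
        assert(index <= m_vector.size());
        m_vector.insert(m_vector.begin() + static_cast<std::ptrdiff_t>(index), info);
        adjust(index + 1, info.size);  // following lines move up
    }
    void erase(std::size_t index) {
        assert(index < m_vector.size());
        int removedSize = m_vector[index].size;
        m_vector.erase(m_vector.begin() + static_cast<std::ptrdiff_t>(index));
        adjust(index, -removedSize);   // following lines move down
    }

private:
    // Assumed behavior: shift the offsets of all lines from index onward.
    void adjust(std::size_t index, int delta) {
        for (; index < m_vector.size(); ++index) {
            m_vector[index].offset += delta;
        }
    }
    std::vector<LineInfo> m_vector;
};
```

Only the functions the program model needs are exposed, which is the main difference from deriving from the container.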

The STL convention is to use size to represent the number of items, whereas the Qt convention is to use count. Many of the variables and functions were named using the Qt convention. The variables in the program model functions were changed to the STL size name. The program model also contained two temporary access functions (line offset and line size) that were not being used and were removed.

[branch misc-cpp-stl commit fd24923d76]

Program Code – Standard Vector

The Program Code class holds the code of the program in a vector. The vector consists of program words. This class was derived from QVector, which exposes all of the base class functions. This is not the best object-oriented design practice, so the class was redesigned to instead contain a standard vector member. The program words are defined by the Program Word class, which consists of various access functions to access the code, sub-code and operand components of a program word.

Several functions are needed to access the member vector including the begin iterator, end iterator, empty status, size, clear vector, element access bracket operator ([]) and the emplace back functions. These functions simply pass through to the code vector member. The main program code insert line, remove line, and replace line access functions were modified to use standard vector functions (insert and erase or using the direct standard copy) to manipulate the code vector instead of the raw memory move function previously used.

Using the vector functions simplified the code, but since there is no function to replace part of the vector with a different size part, the replace line function still contains various checks and operations depending on the size of the old and new lines. For an empty new line, the remove line function is called as before. For a same size or larger new line, the part of the new line that will fit in the space of the old line is copied. For a larger new line, the rest of the new line is then inserted. For a smaller new line, the new line is copied, and the remainder of the old line is erased.
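The cases described above can be sketched with standard vector functions as follows. Plain integers stand in for program words, and the free functions and their argument names are assumptions; the project implements these as member functions of the program code class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Code = std::vector<int>;  // int stands in for the program word class

inline void removeLine(Code &code, std::size_t offset, std::size_t oldSize) {
    assert(offset + oldSize <= code.size());
    auto first = code.begin() + static_cast<std::ptrdiff_t>(offset);
    code.erase(first, first + static_cast<std::ptrdiff_t>(oldSize));
}

inline void replaceLine(Code &code, std::size_t offset, std::size_t oldSize,
                        const Code &line) {
    assert(offset + oldSize <= code.size());
    if (line.empty()) {
        removeLine(code, offset, oldSize);
        return;
    }
    auto first = code.begin() + static_cast<std::ptrdiff_t>(offset);
    auto oldEnd = first + static_cast<std::ptrdiff_t>(oldSize);
    if (line.size() >= oldSize) {
        // Copy the part of the new line that fits over the old line,
        // then insert the rest (nothing for a same size line).
        auto fits = line.begin() + static_cast<std::ptrdiff_t>(oldSize);
        std::copy(line.begin(), fits, first);
        code.insert(oldEnd, fits, line.end());
    } else {
        // Copy the smaller new line, then erase the rest of the old line.
        auto rest = std::copy(line.begin(), line.end(), first);
        code.erase(rest, oldEnd);
    }
}
```

Copying over the existing words first means only the size difference between the old and new lines is inserted or erased.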

The debug text, dereference and decode functions of the program model previously obtained a pointer to the first word of the program line by adding its offset to the pointer returned by the raw data access function. These were changed to obtain a vector iterator to the line by adding the line offset to the begin iterator of the program code vector. The C++11 auto type was used to define these iterators, which is simpler than using std::vector<ProgramWord>::const_iterator or the ProgramCode::const_iterator alias added to the program code class.

The encode function of the program model previously created a sized vector for the new line (using a constructor taking a size argument), and each word was set to the instruction or operand information from the RPN list. This was replaced by starting with an empty vector and using the emplace back template function for each instruction and operand word. The set instruction and set operand functions of the program word class were replaced with corresponding constructors needed for the emplace back function. The size constructor was removed.
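A sketch of encoding with emplace back and constructors is shown below. The Code enumerators, the bit layout (code in the low ten bits, sub-code in the bits above) and the RpnItem structure are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Code : std::uint16_t { Const = 1, Add, Assign };  // hypothetical

class ProgramWord {
public:
    // Instruction word constructor (replaces a set instruction function).
    ProgramWord(Code code, unsigned subCode)
        : m_word{static_cast<std::uint16_t>(static_cast<unsigned>(code) | subCode)} {}
    // Operand word constructor (replaces a set operand function).
    explicit ProgramWord(std::uint16_t operand) : m_word{operand} {}

    Code instructionCode() const { return static_cast<Code>(m_word & 0x03FF); }
    unsigned instructionSubCode() const { return m_word & 0xFC00u; }
    unsigned operand() const { return m_word; }

private:
    std::uint16_t m_word;
};

struct RpnItem {  // hypothetical reduced RPN item
    Code code;
    unsigned subCode;
    bool hasOperand;
    std::uint16_t operand;
};

inline std::vector<ProgramWord> encode(const std::vector<RpnItem> &input) {
    std::vector<ProgramWord> line;  // starts empty, no sized construction
    for (const RpnItem &item : input) {
        line.emplace_back(item.code, item.subCode);  // instruction word
        if (item.hasOperand) {
            line.emplace_back(item.operand);         // operand word
        }
    }
    return line;
}
```

Because each word is constructed in place, the program word class no longer needs a default constructor for the vector to use.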

The Program Word and Program Code classes were moved out of the program model header file to a new header file. The insert line, remove line, and replace line functions were kept as in-line functions in the class definition. Each of these functions is only called once, so in-lining is appropriate. This header file did not show up in the project file list in QtCreator (the project built fine) because there was no associated source file. A new headers variable was added to the CMake build file with this new header file and the main program target was made dependent on this variable so that QtCreator knows about this new header file.

[branch misc-cpp-stl commit c7080421cb]

Thursday, November 13, 2014

Program Model – Debug Text

There is a debug text function in the Program Model class used for the temporary program view in the GUI and by the tester class. This function used two functions from the Program Word class, the instruction debug text (for instruction words) and operand debug text (for operand words).

The debug text function was modified to return a standard string. To implement the building of the string, a standard output string stream is used, which makes it easy to build up the string, especially for numeric data types (the QString::arg function was being used for this purpose). Once done, the string of the output stream is returned.

Since an output stream is being used, it made sense to make the instruction debug text function an overloaded output stream operator. This function only used access functions of the program word, so it didn't need to be a member of the program word class (or a friend function).
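The combination of a string stream and a non-member stream operator could look like the sketch below. The text format produced, the bit layout of the word and the hasOperand rule are all invented for the example and do not reflect the project's actual debug output.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct ProgramWord {  // hypothetical reduced program word
    unsigned short word;
    unsigned instructionCode() const { return word & 0x03FFu; }
    unsigned instructionSubCode() const { return word & 0xFC00u; }
    unsigned operand() const { return word; }
};

// Non-member, non-friend: only the access functions are used.
inline std::ostream &operator<<(std::ostream &os, const ProgramWord &word) {
    os << word.instructionCode();
    if (word.instructionSubCode() != 0) {
        os << '\'' << (word.instructionSubCode() >> 10) << '\'';
    }
    return os;
}

// Hypothetical rule: only instruction code 1 is followed by an operand word.
inline bool hasOperand(unsigned code) { return code == 1; }

inline std::string debugText(const std::vector<ProgramWord> &line) {
    std::ostringstream oss;
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (i > 0) {
            oss << ' ';
        }
        oss << i << ':' << line[i];  // instruction word via operator<<
        if (hasOperand(line[i].instructionCode()) && i + 1 < line.size()) {
            ++i;                     // operand word is written directly
            oss << ' ' << i << ":|" << line[i].operand() << '|';
        }
    }
    return oss.str();
}
```

The stream handles the numeric conversions, so no separate number formatting calls are needed while building the string.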

The operand debug text function only contained a single line and also used a program word access function, so it also didn't need to be a member of the program word class. It was passed the text to output with the operand integer value, and was only used by the debug text function, so it made sense to remove this function and just do the functionality directly in the debug text function.

[branch misc-cpp-stl commit 645eff8d1f]

Tester – Function Operator/Exceptions

The Tester class is another one-use class that fits the pattern of the function operator class. The main run function was changed to a function operator function. The caller in the command line constructor was modified accordingly with the instance renamed from tester to test as was done with the parser instances.

The Tester class also had an error mechanism where its error message member was set if an error occurred. Both the constructor and the run function can generate an error. The has error access function returned whether an error occurred, and the error message access function returned the error message. The constructor and the run function were modified to throw an exception containing the error message (a standard string) instead. The function operator function (formerly run) no longer needs to return success status as a boolean. This simplified the command line constructor since errors from both functions are caught with the same section of code. The error message member and its access functions were removed.
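
A rough sketch of this pattern follows, assuming a much simplified tester. The constructor argument, the failRun flag, the runTest caller, and the message texts are hypothetical and only serve to show both error sources being caught in one place.

```cpp
#include <string>

// Hypothetical one-use class: both the constructor and the function
// operator report errors by throwing a std::string message.
class Tester {
public:
    explicit Tester(const std::string &testName) : m_testName{testName}
    {
        if (testName.empty()) {
            throw std::string{"missing test name"};
        }
    }

    // Formerly a run function returning a boolean success status.
    void operator()(bool failRun = false) const
    {
        if (failRun) {
            throw std::string{"test failed: " + m_testName};
        }
    }

private:
    std::string m_testName;
};

// Caller: errors from the constructor and from the function operator
// are handled by the same catch section.
std::string runTest(const std::string &testName, bool failRun = false)
{
    try {
        Tester test{testName};
        test(failRun);
    } catch (const std::string &errorMessage) {
        return errorMessage;
    }
    return "ok";
}
```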

The redundant void was also removed from tester function definitions that don't have arguments. This was a practice I used when working with C code where the void in the arguments of a function definition indicates no arguments, as opposed to empty parentheses, which could also indicate the old Kernighan and Ritchie (K&R) style function definition, which preceded the typed function definitions introduced with the first ANSI C standard. This void usage is present throughout the code and will slowly be removed as there is no reason to use it anymore.

[branch misc-cpp-stl commit 738eba02e1]

Wednesday, November 12, 2014

Dictionary Information – Standard Classes

When the dictionary classes were changed to use the standard classes, the derived constant number and string information classes were missed because they were in different source files. For adding elements, these classes also used the method of increasing the vector size by one and then setting the new element. This causes the default constructor to be called when the size is increased, and then a copy when the element is set. It is more efficient to use the C++11 emplace functions to construct directly to the new element.

The info dictionary add function previously did the add element (increase vector size) and set element as two separate calls. The add function was changed to pass the token pointer to the add element function and not call the set element when adding an element. The add element functions get a token pointer argument to be used with the emplace back function to add the new element to the end of the value vectors (which were changed to standard vectors). The value vector of the constant string info class was changed to a vector of standard string pointers. The constant string info destructor was changed to use a C++11 range-for loop to delete the strings in this vector.
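
To illustrate the difference between the two ways of adding an element, here is a small sketch. The Counted type and the addOld and addNew functions are hypothetical; the counters exist only to make the extra default construction and copy visible.

```cpp
#include <vector>

// Hypothetical element type that counts how it gets constructed.
struct Counted {
    static int defaults;  // number of default constructions
    static int copies;    // number of copy assignments
    double value;

    Counted() : value{0.0} { ++defaults; }
    explicit Counted(double v) : value{v} {}
    Counted(const Counted &) = default;
    Counted &operator=(const Counted &other)
    {
        value = other.value;
        ++copies;
        return *this;
    }
};

int Counted::defaults = 0;
int Counted::copies = 0;

// Old method: grow the vector by one, then set the new element.
void addOld(std::vector<Counted> &values, double v)
{
    values.resize(values.size() + 1);  // default constructs the element
    values.back() = Counted{v};        // then assigns over it
}

// New method: construct the element directly at the end of the vector.
void addNew(std::vector<Counted> &values, double v)
{
    values.emplace_back(v);
}
```

With addOld, one default construction and one assignment are recorded per element; with addNew, neither counter changes.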

When dependency on Qt was removed from the dictionary classes, the quint16 type was changed to the standard uint16_t for holding indexes into the dictionaries (in the program code). This change was also missed in the constant info classes. The compiler didn't complain since these two types are the same under the hood. The constant info classes were changed to use uint16_t.

[branch misc-cpp-stl commit 14d3b329b7]

More Miscellaneous C++ and STL Changes

The Table class is going to be completely redesigned to better utilize C++. Before starting this major development, there are several more miscellaneous C++ and STL changes to be made to some of the other classes. The changes are expected to be minor, so development will be done on a single topic branch (to avoid short lived branches). These changes include:

Modifying the dictionary information classes to use STL.
Modifying the Tester class to be a function operator class, and use exceptions for error reporting.
Modifying the Program Code class to use STL, be in its own header file, and not use a container class as its base class.
Modifying the Program Model class to use STL.
Modifying the Translator class to fully use STL, be a function operator class, and use exceptions for error reporting.
Making other minor miscellaneous changes that come up along the way.

Tuesday, November 11, 2014

Recreator – Function Operator

The Recreator class has a single purpose, to take an RPN list and recreate the original input string of the BASIC code. This is similar to the Parser class, which was already changed to be a function operator class (see October 18). The recreator was also changed to be a function operator class. All that was required was to rename the recreate function to operator().

Unlike the parser that is instanced with the input string and tokens are repeatedly obtained using the function operator function until either an error (exception) occurs or an end-of-line token is returned, the recreator is single use and its function operator function will only be called once for a given RPN list. Therefore, only a temporary instance is needed - no member or local instance is required.

The users of the recreator, the tester and program model classes, previously contained a pointer to a recreator instance, which was created by their constructors. The recreator instance was deleted automatically when these class instances were deleted because a standard unique pointer was used. Since the recreator is single use, only a local instance is required and these members were removed. Instead of creating a local variable instance, a temporary instance is used (the instance is created and deleted during the statement):

Recreator recreator; → string = Recreator{}(rpnList);
string = recreator(rpnList);
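
As a self-contained sketch of the temporary instance idiom: the Recreator below is only a stand-in that joins the items of a hypothetical RPN list (a vector of strings) with spaces, where the real class rebuilds the original BASIC text.

```cpp
#include <string>
#include <vector>

// Stand-in for the recreator: a function operator class.
class Recreator {
public:
    std::string operator()(const std::vector<std::string> &rpnList)
    {
        std::string output;
        for (const std::string &item : rpnList) {
            if (!output.empty()) {
                output += ' ';
            }
            output += item;
        }
        return output;
    }
};

// The temporary instance is created, called, and destroyed within
// the return statement.
std::string recreate(const std::vector<std::string> &rpnList)
{
    return Recreator{}(rpnList);
}
```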

Since this is the last change to be made to the recreator class, the redundant 'void' keywords were removed from several of the recreator function definitions, and the use of Q_UNUSED macro was replaced with the '(void)' syntax on unused variables in the various recreate functions. This concludes work on the recreator class, so the recreator branch was merged to the develop branch and deleted.

[branch recreator commit a0e99f6dcf]

[branch develop merge commit dc4dd26876]

Monday, November 10, 2014

Recreator – Standard Strings

The strings in the recreator stack item and several local stacks were changed to standard strings along with the recreator output string. The QString::append function was previously used to append to these strings. The std::string class also has an append function but it does not support a single character argument (though it does support a count with the single character). Use of the append function was replaced with the addition assignment operator (+=).
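
A short sketch of the two ways to add a single character to a standard string (the function name and arguments are made up for illustration):

```cpp
#include <string>

// Hypothetical helper joining two pieces with a separator and a space.
std::string joinWithSeparator(const std::string &left, char separator,
                              const std::string &right)
{
    std::string result{left};
    result += separator;     // operator+= accepts a single character
    result.append(1, ' ');   // append needs a count with a character
    result += right;
    return result;
}
```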

The append and top append functions of the recreator are still used by the various external recreate functions. Unlike with QString where a plain character is implicitly converted to a QChar, which is then implicitly converted to a QString, the std::string class does not have similar functionality. For the same functionality, additional append and top append functions were added that take a plain character argument. All of the append functions use the addition assignment to append to the strings.

The append and top append functions, along with the pop with operands function, were changed to take a string rvalue reference argument (std::string &&). In most cases, the argument given to these functions is a temporary value, and using this type moves the temporary value to the function instead of copying the value. In other cases, the variable passed was no longer needed and was going out of scope, so the variable is passed via the std::move function. The advantage of using an rvalue reference argument is that it requires a temporary, so an error is given when the argument is not a temporary value or the variable is not moved.
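
The sketch below shows how such overloads behave at the call site. The Output class and the build function are hypothetical; only the argument types follow the description above.

```cpp
#include <string>
#include <utility>

// Hypothetical output holder with append overloads for an rvalue
// string and for a plain character.
class Output {
public:
    void append(std::string &&text) { m_output += text; }
    void append(char c) { m_output += c; }
    const std::string &str() const { return m_output; }
private:
    std::string m_output;
};

std::string build()
{
    Output out;
    out.append(std::string{"LET "});  // a temporary binds directly
    std::string name{"A%"};
    // out.append(name);              // error: a named variable is not a temporary
    out.append(std::move(name));      // allowed: name is not used after this
    out.append('=');                  // plain character overload
    out.append("1");                  // literal converts to a temporary string
    return out.str();
}
```

The commented-out call is the case the compiler rejects, which is what catches an accidental copy of a variable that should have been moved.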

There are several cases in the recreator functions where values are obtained from the table. Since the table functions are still returning QString values, a call to the toStdString function was added.

The constant string recreate function used the QString::replace function that substitutes all instances of a particular character with a string. This function was used to convert all of the double quote characters in the constant to two double quote characters. There is no equivalent easy to use standard function to do the same thing. A range-for statement was added to iterate through the string making a copy into a local string. For each double quote character added, a second double quote is added.
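
A minimal sketch of that loop, with a hypothetical function name:

```cpp
#include <string>

// Copies the constant, doubling every double quote character.
std::string doubleQuotes(const std::string &constant)
{
    std::string result;
    for (char c : constant) {
        result += c;
        if (c == '"') {
            result += '"';  // add the second double quote
        }
    }
    return result;
}
```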

The recreator contained an output is empty access function, for determining if the output string was empty, and an output last character function, for returning the last character added to the output string. These functions were only used by the remark recreate function to check if the last character added to a non-empty output was not a space (to see if a space should be added before the REM operator). These functions were replaced with the single back is not space function.

[branch recreator commit 56b40e3411]

Sunday, November 9, 2014

Recreator – Standard Stacks

The stack member was changed to a standard stack. Two local stacks were also changed to standard stacks (in the assign recreate and push with operands functions). The stack access functions were modified accordingly:

The push function was replaced with the emplace function defined as a template member function to forward the arguments to the stack emplace function (see details below).
The pop function was replaced with the pop string function, which basically does the same thing except it no longer checks if the stack is empty (which was only needed during initial development of the recreator).
The top function was replaced with individual functions for returning the precedence and unary operator flag members of the top stack item. No similar function was added for the string member as there are other functions for accessing the string member: pop string, top append, and new top add parentheses.
A new top add parentheses function was added to add parentheses around the string of the top stack item. This function is only called by the parentheses recreate function.
The precedence and unary operator pointer arguments were removed from the pop with parentheses function since they were only used by the parentheses recreate function (which now has its own access function).
The stack is empty function was renamed to just empty to be consistent with the other stack access functions (which don't have the stack name included) and with the STL naming convention.

The previous push function resized the stack up by one element and set the members of the new item on top of the stack. This caused the default constructor to be called for the new item, and the values were then copied to the item. The STL emplace functions allow the new item to be constructed in place, eliminating the extra copies (one to the function arguments, and another to the item). To allow the same functionality, the new emplace stack access function was defined as a variadic template member function:

template <typename... Args>
void emplace(Args&&... args) {
    m_stack.emplace(std::forward<Args>(args)...);
}

This utilizes the new C++11 perfect forwarding feature, allowing a function template to pass its arguments through to another function and avoids unnecessary copying. What the above template does is pass the stack item constructor arguments to the stack emplace function (using the std::forward template function), which are then passed to the stack item constructor when the new item is created on the stack, without copying the arguments.

To use the emplace stack function, a constructor was added to the stack item structure. The constructor has arguments for each of the three member variables. The precedence and unary operator arguments were made optional with defaults (from the replaced push function). The member variables of the stack item were also renamed with the "m_" prefix.
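
Putting the constructor and the forwarding function together might look like the sketch below. The RecreatorStack wrapper name, the member types, and the default values (0 and false) are assumptions, since the actual defaults are not given here.

```cpp
#include <stack>
#include <string>
#include <utility>

// Stack item with a constructor so it can be built in place.
struct StackItem {
    StackItem(std::string string, int precedence = 0,
              bool unaryOperator = false)
        : m_string{std::move(string)}, m_precedence{precedence},
          m_unaryOperator{unaryOperator} {}

    std::string m_string;
    int m_precedence;
    bool m_unaryOperator;
};

// Hypothetical wrapper holding only the stack access functions.
class RecreatorStack {
public:
    // Forwards the constructor arguments to the stack emplace function.
    template <typename... Args>
    void emplace(Args&&... args)
    {
        m_stack.emplace(std::forward<Args>(args)...);
    }

    int topPrecedence() const { return m_stack.top().m_precedence; }
    bool topUnaryOperator() const { return m_stack.top().m_unaryOperator; }

    std::string popString()
    {
        std::string string = std::move(m_stack.top().m_string);
        m_stack.pop();
        return string;
    }

    bool empty() const { return m_stack.empty(); }

private:
    std::stack<StackItem> m_stack;
};
```

A call such as emplace("A+B", 10) constructs the item on the stack from the literal and the precedence, with the unary operator flag taking its default.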

[branch recreator commit 5451552470]

Saturday, November 8, 2014

Recreator – Separator Member

Since there were a number of standard string to c-style string conversions to convert to a QString in the recreator functions, the Recreator class is next to be transitioned to the STL. This was started by changing the separator member from QChar to char along with its access functions. An initializer for the separator member was added since the plain char type does not have a default constructor like the QChar class.

Two uses of the separator access function required special handling. These were in the input assign recreate and assign string recreate functions. Previously the separator was added to other literal character constants. Now that the separator is the plain char type, the compiler will treat the plus operator as addition instead of concatenation. The separator is temporarily converted to a QChar so that the plus gets compiled as string concatenation instead of character addition.
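
The same pitfall exists with standard strings. The sketch below uses std::string in place of the QChar conversion described above (a swapped-in technique, not what the code did at this commit), and the function name is hypothetical.

```cpp
#include <string>

// Builds the separator followed by a space.
std::string separatorAndSpace(char separator)
{
    // Note: separator + ' ' on its own would add the two character
    // codes and yield an int, not a two-character string. Making one
    // operand a string turns the plus into concatenation.
    return std::string(1, separator) + ' ';
}
```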

[branch recreator commit 3b5b789198]

Token – Standard String Member

It turned out that changing the token string member to a standard string did not require a lot of other changes. The string access functions were modified and the c_str function call was removed from the constructor initializers. The equality operator function was modified to use the plain equality operator for case sensitive comparison (REM, REM operator and string constants) and the new no case string equal function for case insensitive comparison.

In the rest of the code, the c_str function call was removed where a standard string was present, and the toStdString function call was removed from the token string access function where the result is put into a standard string or output in a standard output stream. In several places in the recreator functions, a c_str function call was added where a QString is needed (the c-style string returned is implicitly converted). Finally, one line in the translator using the QString::startsWith function with the case insensitive option was changed to convert the first character to upper case before comparing it (only the first character was being compared).

[branch token commit ee7b926daa]

Since there is no more work for the token class, the short-lived token branch was merged to the develop branch.

[branch develop merge commit ee12854e36]

Utility – Case Insensitive Comparisons

The token equal operator function uses case insensitive string comparisons that need to be replaced with standard string equivalents. There is no direct equivalent, so the solution used so far was to use the std::equal function passing a case insensitive character comparison lambda function. Since the std::equal function assumes equal length containers, the sizes of the strings need to be compared first to make sure the strings are the same length (or at least the primary string is not shorter than the secondary string when doing a string begins with comparison).

This pattern of comparing the string lengths before comparing strings is repeating, therefore two in-line functions were added to do this. The first, named no case string equal, checks if the strings are equal and the second, named no case string begins with, checks that the primary string is at least as long as the string being compared and begins with it. The lambda function was renamed to no case character equal for consistency. The lambda function was moved from the main header file to the utility header file along with the two new in-line functions.
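
A sketch of what these three utilities could look like follows. The identifier spellings are guesses based on the names above, and the use of std::toupper for the character comparison is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Case insensitive comparison of two characters.
const auto noCaseCharEqual = [](char c1, char c2) {
    return std::toupper(static_cast<unsigned char>(c1))
        == std::toupper(static_cast<unsigned char>(c2));
};

// True if both strings have the same length and match ignoring case.
inline bool noCaseStringEqual(const std::string &s1, const std::string &s2)
{
    return s1.size() == s2.size()
        && std::equal(s1.begin(), s1.end(), s2.begin(), noCaseCharEqual);
}

// True if s1 is at least as long as s2 and starts with s2 ignoring case.
inline bool noCaseStringBeginsWith(const std::string &s1,
                                   const std::string &s2)
{
    return s1.size() >= s2.size()
        && std::equal(s2.begin(), s2.end(), s1.begin(), noCaseCharEqual);
}
```

The size checks come first so that std::equal never reads past the end of the shorter string.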

[branch token commit 1dde64fb7b]

Parser – Standard Input Stream

The parser functions have been modified to use a standard input string stream, so the input member could now be changed to the std::istringstream class, and the input position member removed. The current input position can be obtained directly from the input stream using the tellg member function. The skip white space function was removed since this can be done directly on the input stream (in other words, extract white space):

m_input >> std::ws;

In several places, a temporary position or length integer variable is used to get the current position (tellg) or length (length) because these functions return pos_type and size_type values, which are 64-bit integers. The token constructors and error structure only accept integers (32-bit). There is no reason to change the member variable types to 64-bit integers as there will never be input lines or strings that are long enough to require 64-bit integers.

When using an input stream, care must be taken when using the tellg function to obtain the current input position. This function returns an EOF value (-1) once the input stream has been read past the end. So this function can't be used if a previous operation could have possibly read past the end.

There are times when the input position must be reset (like when the second word of a possibly two-word command is not valid). The seekg function is used to reset the input position. However, this function does not work once the input stream has been read past the end. This is because the EOF flag is set. To clear this condition, the clear function needs to be called, so this call precedes all seekg function calls except one where an EOF cannot have occurred.
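
The save, clear and seek sequence can be sketched as follows. The function is hypothetical and simply reads the first word of a line twice to show that the reset works even when the first read reached the end of the input.

```cpp
#include <sstream>
#include <string>

// Reads the first word, resets the input position, and reads it again.
std::string readWordTwice(const std::string &line)
{
    std::istringstream input{line};
    input >> std::ws;                  // skip leading white space
    const auto start = input.tellg();  // valid: the end has not been reached

    std::string word;
    input >> word;                     // may reach the end of the input

    input.clear();                     // clear any EOF condition first
    input.seekg(start);                // then reset the input position

    std::string again;
    input >> again;
    return word + "|" + again;
}
```

The position is saved before the read because tellg may no longer return a usable position afterwards.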

In the get string function, the characters read are counted so that the length of the string in the input is known when the token is constructed and returned (pairs of double quotes count as one character in the string, but take two characters in the input string, so must be counted as two). The ending input position cannot be used to determine the length in the input string because the position is not valid if the string constant is at the end of the line (see issue with tellg function above).
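
A simplified sketch of this counting follows. The StringConstant structure and the getString signature are hypothetical, and the count here is assumed to include the enclosing double quotes.

```cpp
#include <sstream>
#include <string>

// Hypothetical result: the string value and its length in the input.
struct StringConstant {
    std::string value;
    int inputLength;
};

StringConstant getString(std::istringstream &input)
{
    StringConstant result{std::string{}, 0};
    if (input.peek() != '"') {
        return result;  // not a string constant
    }
    input.get();  // opening double quote
    ++result.inputLength;

    while (input.peek() != std::istringstream::traits_type::eof()) {
        const char c = static_cast<char>(input.get());
        ++result.inputLength;
        if (c == '"') {
            if (input.peek() != '"') {
                break;  // closing double quote
            }
            input.get();  // second quote of a pair: two input characters
            ++result.inputLength;
        }
        result.value += c;  // one character in the string
    }
    return result;
}
```

Counting as the characters are read means the length is known without asking the stream for its position after the last read.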

The constructor of the parser was changed to take a standard string input. Both callers were modified accordingly - the tester class already had a standard string, but the translator needs to convert from its QString to a standard string (until the translator is modified). Dependency on Qt has almost been removed from the parser except for one call to obtain the name for the REM command (which will be handled when the table is modified).

[branch parser commit 8e71a71fd5]

One outstanding item remains - the token string member is still a QString though its constructors have been modified to take standard string arguments. This will not be a trivial change since many users of the token string still expect a QString. Therefore, this work will take place in a new development branch. This concludes work on the parser, so the parser branch was merged to the develop branch and deleted.

[branch develop merge commit 2cafb22a8e]

Thursday, November 6, 2014

Parsing Identifiers – Standard Library

The get identifier function was changed to use a standard input stream (again using a temporary input string stream like the previous functions). This function used the scan word support function to look for a word, which was renamed more appropriately to get word. Instead of checking for a REM command first (because unlike other commands, a space is not required after the command), the get word function is called first to get a word. If no valid word is found, an empty token pointer is returned.

The first check on a valid word is if the word starts with the letters in the REM command name using the std::equal function with the no case compare lambda function. Since the REM command name is still in the table as a QString, it is temporarily converted to a standard string. For a remark, the input position is set to the beginning of the string of the remark, and the word string is then replaced with the rest of the characters on the line, from which the token is created and returned.

An issue was discovered with the parsing of defined function tokens (identifiers that start with "FN"). A valid defined function name should start with a letter, but there was no check for a letter or even a check if there were any characters after the "FN" so identifiers like FN and FN1 were incorrectly accepted as defined function tokens. Instead of rejecting these names as invalid defined function names, the decision was made to allow these names, and treat them as regular names (variables and arrays).

The get word support function was modified in the same way (by using a temporary input stream). It also returned three values, the position after the word found, the data type of the word and whether the word has a parentheses. Two of these were returned by passing references. The position is not needed since it will be obtained from the input stream, however, a string for the word is needed because it is read from the stream. A new Word structure was added to hold the word string, data type and parentheses flag, which is now returned. An empty word string indicates no valid word found.
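
A rough sketch of a word structure and a get word function along these lines is shown below. The data type characters (%, $, #), the characters allowed in a word, the decision to keep the type character and parenthesis in the word string, and the word type argument (which already includes the second word check) are all assumptions for illustration.

```cpp
#include <cctype>
#include <sstream>
#include <string>

enum class DataType { None, Double, Integer, String };
enum class WordType { First, Second };

// Holds the values that were previously returned separately.
struct Word {
    std::string string;  // empty if no valid word was found
    DataType dataType;
    bool paren;
};

Word getWord(std::istringstream &input, WordType wordType)
{
    Word word{std::string{}, DataType::None, false};
    const auto start = input.tellg();
    if (!std::isalpha(input.peek())) {
        return word;  // a word must start with a letter
    }
    while (std::isalnum(input.peek()) || input.peek() == '_') {
        word.string += static_cast<char>(input.get());
    }
    switch (input.peek()) {
    case '%': word.dataType = DataType::Integer; break;
    case '$': word.dataType = DataType::String; break;
    case '#': word.dataType = DataType::Double; break;
    default: break;
    }
    if (word.dataType != DataType::None) {
        word.string += static_cast<char>(input.get());
    }
    if (input.peek() == '(') {
        word.paren = true;
        word.string += static_cast<char>(input.get());
    }
    // A second word of a command cannot have a data type or parenthesis.
    if (wordType == WordType::Second
            && (word.dataType != DataType::None || word.paren)) {
        input.clear();
        input.seekg(start);  // reposition to the beginning of the word
        word = Word{std::string{}, DataType::None, false};
    }
    return word;
}
```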

To simplify the handling of two word commands in the get identifier function, the check of the second word to make sure that it does not have a data type or parentheses was moved to the get word function. If the second word does, an empty word string is returned and the input stream is repositioned back to the beginning of the word. A word type argument was added with values first and second to enable this second word checking.

The get identifier function uses the two-word table search function, which was modified to take two standard strings. The token constructor for identifiers was modified to take a standard string argument, which is temporarily converted to a c-style string to initialize the QString token member. Several invalid defined function names were added to parser test #2 (identifiers) to verify these names are treated as plain identifiers (with and without parentheses).

[branch parser commit dbbd9fe054]

Sunday, November 2, 2014

Parsing Operators – Standard Library

The get operator function was changed to use a standard input stream (again using a temporary input string stream like the two previous functions). This function uses one of the table search functions, which was modified to take a standard string.

The table search function used the compare function from the QString class with the case insensitive option. There is no equivalent function in the standard string class. The std::equal function is used instead by passing a no case comparison lambda function. This is the same lambda function used in the Tester class, so this definition was moved to the main header file. Since the name in the table is still a QString, it is temporarily converted to a standard string. The std::equal function assumes the arguments are the same size, so the size of the strings are checked first.

The token constructor for codes was changed to take a standard string, which defaults to an empty string. For now, these are converted for the QString member variable by obtaining a c-style string from the standard string, which is implicitly converted. The only caller of this constructor using this argument is the new token table function, which was also modified to take a standard string. Callers of this function using the string argument were modified to pass a standard string.

[branch parser commit 27f06e8714]

Parsing Strings – Standard Library

The get string function was changed to use a standard input stream (temporarily putting the input string from the current position into a temporary local input string stream of the same name as the member variable to simulate the final parser code). The looking at and the obtaining of current character was changed as previously described.

Instead of incrementing the local position variable for each character in the string constant, this variable is just set to the current input position. This will be changed to get the position within the input stream once the member variable is changed. The current input position is incremented for each character. After the change, the current input position member variable will not be needed.

[branch parser commit ab3b1c08f18]

Parsing Numbers – Standard Library

When the parser is changed to use the standard library, instead of placing the input string into a string member variable, it will be put into a standard input string stream from which the characters will be pulled. A position into the input string will not need to be maintained during the processing of the line.

The get number function was the first to be changed to this model. Temporarily, the input string from the current position (a substring) is transferred into a temporary local input string stream of the same name as the member variable to simulate the final parser code. Two failed attempts were made to use standard library functions to parse and read numbers.

The first attempt used the stoi function to convert the number directly. The problem was that it doesn't report the specifics of the error when the conversion fails, throwing only an invalid argument or out-of-range exception. The type of error could be determined by a series of complex checks of the string. A working solution was mostly achieved with one remaining issue. When an out-of-range exception was thrown, there is no clue as to the length of the string that was processed (which is needed to properly highlight the error).

The second attempt used the extraction operator (>>) to get the number directly into a double or integer variable. Again detecting an error and determining the type of error was difficult (and was not actually achieved when this attempt was abandoned). The code again was also complicated.

These attempts were made to try to eliminate the involved (but working) algorithm already in place to parse numbers. The decision was made to keep the current algorithm, which was modified to read from a standard input stream. The reading of the current character was changed to:

m_input[pos] → m_input.peek()

The original code incremented a local position when a character had been processed (will become part of the number string). This position increment was replaced with pulling a character from the input stream and appending it to the local number string:

pos++; → number.push_back(m_input.get());
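
As a small sketch of the peek and get pattern, here is a hypothetical function that collects only the leading digits of a number from the stream (the real algorithm also handles decimal points, exponents and errors):

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Pulls consecutive digits from the stream into a local number string.
std::string getDigits(std::istringstream &input)
{
    std::string number;
    while (std::isdigit(input.peek())) {  // look at the current character
        number.push_back(static_cast<char>(input.get()));  // consume it
    }
    return number;
}
```

The character that ends the number is only looked at with peek, so it stays in the stream for the next function to process.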

The various character type tests (to upper, is digit) were changed from the QChar functions to the standard ctype tests. Once a possible valid number was parsed into the local string, if it didn't contain a decimal point or exponent, an attempt is made to convert it to an integer using the stoi function. If successful, an integer number token is created from the local string and returned. If an out-of-range exception is thrown, or the number had a decimal point or exponent, an attempt is made to convert it to a double using the stod function. If successful, a double number token is created from the local string and returned. Another out-of-range exception results in a floating point out of range exception being thrown.
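
The conversion step alone might be sketched like this. The Number structure, the function name, and the error message text are hypothetical, and a std::string is thrown only to keep the sketch short.

```cpp
#include <stdexcept>
#include <string>

enum class NumberType { Integer, Double };

// Hypothetical result of converting an already parsed number string.
struct Number {
    NumberType type;
    int intValue;
    double dblValue;
};

Number convertNumber(const std::string &number)
{
    // No decimal point or exponent: try an integer first.
    if (number.find_first_of(".eE") == std::string::npos) {
        try {
            return Number{NumberType::Integer, std::stoi(number), 0.0};
        } catch (const std::out_of_range &) {
            // too large for an integer: fall through and try a double
        }
    }
    try {
        return Number{NumberType::Double, 0, std::stod(number)};
    } catch (const std::out_of_range &) {
        throw std::string{"floating point constant is out of range"};
    }
}
```

Because the string has already been validated by the parsing algorithm, only the out-of-range case needs to be handled here.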

The integer and double token constructors were modified to take standard string arguments. For now, these are converted for the QString member variable by obtaining a c-style string from the standard string, which is implicitly converted. This is temporary until the token string member is changed to a standard string.

[branch parser commit 8513875e33]

Saturday, November 1, 2014

Parser – Number Error Corrections

Qt functions are currently being used to convert strings of numbers to a double or an integer in the get number routine. This routine will be changed to use STL functions. While investigating this, a few problems were discovered with how some of the number errors were being reported.

The "expected sign or digits for exponent in floating point constant" error was being reported even when the exponent sign was present. A new "expected digits for exponent in floating point constant" error was added for this situation. When an incorrectly formed number contained a single decimal point followed by the start of an exponent ('E'), the "expected digits in mantissa of floating point constant" error only pointed to the decimal point. The error was changed to point to both the decimal point and the 'E' character.

The translator was not reporting the "expected command" error correctly when there was a number error - the error was pointing to the number error which was either not at the beginning of the command or its length was not one. This occurred because the number error was not correctly reported as an unexpected token error when a reference was requested (at the beginning of a statement).

This was corrected by adding a reference argument to the get token translator routine with a default of None. Only when this argument is None are number tokens allowed. When an unknown token error is returned from the parser, the reference argument is used to generate the appropriate expected error status. For the first token obtained from the get commands translator routine, this argument was set to All, which prevents number tokens (an unexpected token error is returned for all numbers including number errors).

The get operands translator routine was modified to pass its reference argument directly to the get token call. Since get token now generates the appropriate error for references, this routine no longer needs to intercept the error to return the appropriate error or set the error length to one for references. The status is simply returned when the status is not Good. The LET translate routine handles reporting errors when neither a command nor a reference starts a line. The section handling errors was structured poorly and was rewritten.

Certain types of errors were reported differently as a result of these changes. Previously, the error for an incorrect statement like 34=A was reported as an "expected item for assignment" error pointing to the 34. Now the "expected command" error is reported pointing to only the first character of the number. Both errors are technically correct, and it would be difficult to report the previous error. The error was changed to "expected command or item for assignment" since both are applicable at the beginning of a statement.

The expected results for parser test #3 (numbers), translator tests #1 (assignments), #3 (more assignments), and #14 (parser errors) were updated for these changes. Some additional tests were added to translator test #14 for the new expected digits for exponent error. Many of the translator test results were also updated for the expected command message change.

[branch parser commit 20e46cc617]