Interactive BASIC Compiler Project

Wednesday, February 11, 2015

Dictionary – Proper Base Class

The Dictionary class was not designed to be an abstract base class in a dictionary class hierarchy. An abstract base class contains virtual functions, which the Dictionary class did not have. The four functions, clear, add, remove and string were changed to virtual functions.

The Info Dictionary class is derived from the Dictionary class. This is an intermediate class used as a base class for the Constant Number Dictionary and Constant String Dictionary classes. This class implemented its own clear, add and remove. They did their action before or after calling the base Dictionary class function. The add and remove functions had different function signatures from the base class, so these needed to be modified to be the same so that they can override the base class functions.

The Dictionary class add function returned the index of the entry and contained an optional argument for returning the entry type (New, Reused or Exists). Only the Info Dictionary class add function used this argument. It is not a good practice to use an argument for output. The return value of the base add function was modified to return a standard pair of the index and entry type, and the optional output argument removed. The Info Dictionary class add function signature was made to match and was modified to use the standard pair return value. Instead of the output argument.

The return value of the Dictionary base class remove function was corrected from integer to boolean since that is what type of value was returned. The Info Dictionary class remove function signature was made to match and changed modified to return a value (previously it did not have a return value), but note that no caller actually uses this return value.

The six table encode functions were modified to use the first value of the pair returned by the add function. This is temporary until the changes described in the last post are fully implemented. The dereference operator used with the back insert iterator was removed as it is not necessary since the assignment operator of these iterators is also defined to insert to the back of the container.

A small correction was made to the Dictionary class clear function in how the free stack is cleared. Previously an empty initializer {} was used. However, this gave and error with GCC 4.9.2 (the latest available in the tool chains repository, see post on August 5 for how to install or upgrade to GCC 4.9 - substitute "4.9" for "4.8" in the instructions). The empty initializer was changed to std::stack<uint16_t>{} to eliminate the error, which also works with GCC 4.8 (both versions of GCC will be still supported).

[branch table commit c3f9335697]

Tuesday, February 10, 2015

Program – Dictionary Class Problem

While investigating how clean up the chain calling mentioned in the previous post, it was discovered that the dictionary class hierarchy with respect to the base class was not implemented correctly. The following is the thought process on refactoring the call chain and how this dictionary problem was discovered. Consider the current complete REM operand text function:

const std::string remOperandText(ProgramLineReader &programLineReader)
{
auto operand = programLineReader();
return programLineReader.unit()->remDictionary()->string(operand);
}

The encode and remove functions are similar except they have no return value and different dictionary functions are called. The encode function has a back inserter iterator argument instead of a program reader and still has a program unit pointer argument. To remove the call chain and make the code as readable as possible, the function could be changed to something like:

const std::string remOperandText(ProgramReader &programReader)
{
return programReader.getStringFromRemDictionary();
}

Where the name of the program line reader class was going to be simplified (the "Line" part of the name added nothing significant and didn't seem appropriate since the program reader was now reading from a dictionary in addition to a program line). There would be similar functions for adding and removing from the dictionary for the encode and remove functions.

The problem with this scheme is the number of functions needed. Each of the other five codes have nearly identical table functions, except different dictionaries are used, for a total of 18 functions. The Program Reader class will need 18 access functions, one for each table function. Continuing this pattern will required many more table and access functions when the other codes are implemented (array, defined functions and user functions).

To reduce the number of the Program Reader class access functions to three (add to a dictionary, get string from a dictionary, and remove from a dictionary) will be to implement a enumeration for the dictionary types (remarks, constant numbers, constant strings, double variables, integer variables, string variables, and the rest of the unimplemented codes) and pass this to the access function. The REM function would look something like:

const std::string remOperandText(ProgramReader &programReader)
{
return programReader.getStringFrom(Rem_Dictionary);
}

For this implementation, the dictionaries in the program model will need to be placed into an indexable by enumerator container. The program model will provide an access function to the dictionary by enumerator (there is currently one access function for each dictionary - six access functions).

The indexable container will need to contain pointers to the dictionaries. The requires base class pointers to the dictionaries. This is where the problem was discovered - the dictionary class does not have virtual functions. This needs to corrected next.

To reduce the number of table functions to three (encode, operand text, and remove), the dictionary enumerator will be placed in the table entry. The three virtual table functions will pass the dictionary enumerator from the table entry to the program reader function. The intermediate Code With Operand table entry class will define these functions and the individual code classes (REM, constants, variables, etc.) will inherit them. Now, the common operand text function will look something like:

const std::string CodeWithOperand::perandText(ProgramReader &programReader)
{
return programReader.getStringFrom(m_dictionaryType);
}

Sunday, February 8, 2015

Program Model Access Refactoring (Part 2)

The next observation was that the program line reader argument of the operand text and remove virtual table functions was accompanied by the program unit pointer argument (needed to access the dictionaries). Therefore, the program line reader instance can carry the program unit pointer to these virtual functions.

A pointer to the program unit was added to the program line reader class. The new program model create program line reader function was to pass the pointer to itself to the program line reader constructor. An access function was added for this pointer. The program unit pointer argument was removed from all of the operand text and remove functions. The same changes was made to the token constructor for program words, which also has a program line reader argument.

Unfortunately, the resulting operand text and remove functions have an undesirable calling chain, which existed before, but now was extended another level. This is undesirable because it breaks encapsulation, meaning that these functions have knowledge of the internals of each class in the chain. Considered the statements of the REM operand text function:

auto operand = programLineReader();
return programLineReader.unit()->remDictionary()->string(operand);

This function has knowledge that the program line reader has a unit pointer member, that the unit pointer member has REM dictionary member, and that the dictionary has the string function. These statements are also unusual in that a program word is read from the program line reader, and then passed back to the program line reader through a chain of calls. This issue will be rectified with the next change.

[branch table commit 66a0db8247]

Saturday, February 7, 2015

Program Model Access Refactoring

Upon reviewing the most recent changes made to the encode, operand text, and remove virtual table functions, I decided that some refactoring was in order to clean up how the program model is accessed from these functions and how the program line reader is utilized.

I observed that for each of the three locations that created a program line reader instance were identical. The program line reader constructor contained three arguments, the beginning of the code vector, the offset of the line within the code vector and size of the line. For code readability, the number of arguments in a function call should be as few as possible and three is a little too many.

In all three constructions, the offset and size of the line were obtained from a line info structure. This could have been one way to reduce the number of arguments by one - to pass just a reference to a line info structure. This would have meant that the program code header file where the program line reader class is located, would need access to the line info structure, which is an internal class to the program model.

The more header files included, the longer compilation takes, so this was not a desirable solution, plus there would still be two arguments. This would be an improvement, but there was an alternative to eliminate the other argument - the begin code vector iterator.

Since the program model contains the code vector, the line info structure definition and includes the program code header file containing the program line reader class definition, a new create program line reader function was added to the program model class. This function contains a single reference to a line info structure argument and creates a program line reader instance from this argument and the begin code vector iterator.

[branch table commit ba6e5ee46d]

Table – Remove Virtual Function

The remove function is called to remove a reference to a dictionary entry for the operand of a code, possibly removing the entry if there are no more references to it. This function is used by the program model dereference function called when program lines are removed or replaced. This function had to check if the code had an operand (using the is code with operand access function) before the virtual function could be called.

The program model dereference function was modified to use a program line reader instance. The remove function function was modified to check if the remove function pointer is not null before calling the remove function of the table entry. By default, for codes without operands, the remove function will do nothing.

The operand argument of the remove virtual functions was changed to a reference to a program line reader instance. The remove functions were modified to use the program line reader instance to obtain the operand to pass to the desired dictionary to remove the reference to the item.

[branch table commit b980c95369]

Wednesday, February 4, 2015

Table – Operand Text Virtual Function

The operand text function is called to obtain the text for the operand of a code (for example, the name of a variable or the value of numeric constant). It is used when a program code for line is decoded into a list of tokens (an then back to the original text of the line). It is also used for test output. In both cases it had to check if the code has an operand (using the is code with operand access function) before calling the virtual function.

Similar to the encode functions that control the back insert iterator as needed when the code has an operand, the operand text function should control the advancing of the iterator being used to iterate over the program line if the code has an operand. The default operand text function does nothing and does not need to advance this iterator. For now the operand text function was modified to check if the operand text function pointer member is set and then calls the function, otherwise a blank string is returned.

Program Line Reader Class

A regular iterator could have been used with the operand text functions, but the code would look messy (dereference and increment operators would be required). To make the code more readable and the iterator easier to use, it was encapsulated within a new small Program Line Reader function operator class.

The Program Line Reader class contains two members, the iterator itself and an end iterator used to check for the end the line. The constructor contains three arguments, the begin iterator of the program code vector, the offset for the start of the program line, and the size of the line. These values are used to calculate the initial value of the iterator (begin plus offset) and end iterator (iterator plus size).

This class contains three access functions. The function operator function itself returns the program word pointed to be the iterator and advances the iterator. The has more words function returns if there are more words in the program line (iterator is not equal to the end iterator). And there is a previous function for returning the previous word obtained (word before the iterator), which is used by the program model debug text function to get the raw value of the operand to output when there is an operand (the operand text function advances the iterator past this word).

The operand text functions were modified to take a reference to the program line reader instance instead of the operand itself so the codes with operands can advance the iterator as needed (and the other codes without operands do nothing).

Operand Text Sub-Code Argument

The operand text functions for the variable codes previously added the data type change to the string. This was only required for test output and was inhibited for normal operation when a program line was decoded into tokens (this character is added later when the tokens are converted into text using the data type of the code table entry). The sub-code was passed as an argument and when decoding a line, the sub-code was set to a special No Data Type Character sub-code. The reverse logic made the code less readable.

Eventually the debug/test code will be removed (as much as possible) from the regular application into separate unit test programs. Adding the data type character was moved from the operand text functions of the variable codes to the program model debug text function. This was the only use of the sub-code argument, so it was removed from the operand text functions. The special sub-code enumerator was also removed.

Decoding Into Tokens

The program model decode function, a user of the operand text function, was modified to use the new Program Line Reader class. It previously created a token from the table entry of the code in the program word. It then added the sub-code to the token and if the code had an operand, called the operand text function with the No Data Type Character sub-code to prevent adding the data type character and put the resulting string into the string of the token.

Since this section made heavy use of the token, a new token constructor was added to do this work. The arguments for this new constructor are a pointer to the program unit (for code with operands that need to access the dictionaries) and a reference to the program line reader instance used for iterating over the program line.

The new constructor obtains a program word from the program line reader instance. The entry member is set to the table entry for the code of the word. The sub-code member is set to the sub-code of the word. The operand text function for the entry is called passing the program unit pointer and program line reader. The string returned is put into the string member. The string is blank for codes that do not have an operand.

Encoder Tests

The encoder tests did not contain a test where the double data type # character was tested on a variable. One line in encoder test #1 was modified to contain a variable with the # character on both on the left side (reference) and right side (value) of an assignment. The expected results for this test were updated accordingly.

[branch table commit ddc2efabbd]

Thursday, January 29, 2015

Table – Encode Virtual Function

The encode mechanism needed to be improved. The encode function in the program model converts an RPN list to a program line. For each token in the RPN list, the first program word is generated from the code index and sub-codes of the token and added to the program line. If the code in the token has an operand (using the is code with operand access function), the second program word is generated by calling the encode function of the table entry to get a index, which is added to the program line.

This mechanism was not making effective use of polymorphism. A better way is to call the encode function for the table entry, which generates the one or two program words dependent on the requirements of the code in the token (via its table entry). This was accomplished with several changes.

An encode function was added to the token class since the contents of the token are used for encoding. The first argument is a pointer to the program unit (an instance of the program model class) needed for the codes with operands to access the dictionaries. This function also needed access to the program line being encoded. This was handled by passing a C++11 STL back inserter iterator as the second argument. This iterator is used to add words to the program line, such as:

*backInserter = <ProgramWord>;

The token encode function then calls the encode function for the table entry. The arguments include the program unit pointer, back inserter iterator and a pointer to the token. A plain token pointer is passed (the this pointer). The token argument of the entry encode functions were changed from a standard shared token pointer to a plain pointer (a shared pointer is not necessary here anyway). This changes propagated to the dictionary add functions and the dictionary info add element and set element functions.

With the table entry encode functions adding the operand words to the program line directly (via the back inserter iterator), a return value is no longer needed. The default encode function does nothing since most codes don't require a second operand program word. For codes with operands (REM, constants, variables, etc.), their encode functions call the dictionary add function whose index return value is added to the program line.

[branch table commit 0e039c280f]

Wednesday, January 28, 2015

Table – Entry Virtual Functions

The implemented commands (LET, REM, PRINT, INPUT, and INPUT PROMPT) each have different initialization values (a code for LET and REM, an option name for the INPUTs, a translate function pointer for all except REM, a recreate function pointer for all but LET, and an encode, operand text and remove functions pointers for only REM). Creating a single command sub-class constructor to cover these would be messy, and so would creating a different constructor each.

The intention was to replace the functions pointers with virtual pointers, so it was not worth the effort to try and handle these. Virtual functions were added to the base table class, which temporarily call the function pointer if set (translate and recreate), or just call the function pointer (encode, operand text, and remove). For the command sub-class, which is only used for commands not yet implemented, the translate virtual function overrides the base class function and throws a "Not Yet Implemented" error.

The is code with operand access function checked if the operand text function pointer was set. Since the function pointers will be removed, another method for checking if the code has an operand was needed. A new Operand table flag was added to the REM, REM operator, three constant, and six variable table entries. This access function was modified to check this flag.

The users of the function pointers (translator and program model) were modified to call the virtual functions. For the encode, operand text, and remove functions, the program model needs to call the is code with operand function before calling the function. This resulting code a bit messy and needs to be improved.

[branch table commit c24cf75c90]

Tuesday, January 27, 2015

Table – Command Sub-Class (Initial)

An initial command table entry derived class was implemented consisting of a single constructor taking name and second name string arguments with the second name defaulting to a blank string. This constructor can be used with both one-word and two-word commands. It calls the base table constructor passing the two names along with values needed by most commands (Command table flag, precedence of 4 and a pointer to the null expression info instance).

This command constructor is only sufficient for the not yet implemented commands. For now it also sets the default code (no code), blank option string, and null function pointers. This class definition was put into a new header file. In the corresponding source file, instances were created for all of the not yet implemented commands, for example:

static Command Dim("DIM");

The entries for these commands were removed from the static table entry array and the code index enumerators for these commands were removed from the code index enumeration.

The type of the string arguments to the table constructor (name, second name, and option) was changed from a standard string to a C-style string (constant character array pointer) because otherwise, passing a blank string to this constructor required a standard string std::string("") instance, instead of just a blank "" string.

The first time these changes were one, a crash occurred during initialization when trying to add the first of the new command instances because the static table members had not been initialized yet. Because the new command source file appeared before the table source file in the source file list in the CMake build file, so command instances were initialized before the static table members, but these are needed to initialize the command instances. To resolve this, the basic sub-directory source files (which will contain the entry instances) were moved to the end of the source file list so that the table source file with the static table members are initialized first.

[branch table commit 97ff351060]

Monday, January 26, 2015

Table – Automatic Two Flag

The table is no longer dependent on the static table entry array so the implementation of the table class hierarchy can begin. The first sub-class will be for commands. While investigating the requirements for this sub-class, it was noticed that the Two table flag could be set automatically. The Two table flag indicates if a command has a two-word form (for example the INPUT and INPUT PROMPT commands). It is also used for symbol operators that has a two-character form (for example <, <= and <> operators).

The erector class functions were modified to set the Two table flag automatically for these commands and operators. After a two word command is added to name to entry map, if the first word of the two-word command has an entry, then the Two flag of the one-word entry is set. Similarly, after a new primary entry is added to the name to entry map, if the entry is a two-character operator, the Two flag is set on the entry for the first character is there is one. Some additional table entry access functions were added.

The Two table flag was removed from all of the entries. The Two flag was set on the <> operator, which was not necessary (only the one-character form required this flag). Not all of the command entries with the flag were covered by parser tests, so additional tests were added to parser test #2 (identifiers). Not all forms of all operators were covered by the tests. Translator test #11 (temporary strings) contained an extra set of <= tests, which should have been >= tests. New translator test #18 was added to cover all operators with all possible data types.

[branch table commit 937c5e3a57]

Sunday, January 25, 2015

Table – Entry Indexes

Up to now, the index for a code that is stored in the program is the index of the table entry in the static table entry array defined in the table source file. The new table model will not have an array of table entries, but instead will be separate table instances spread among several source files. The common connection will be through the constructor of the base table class. This constructor will put pointers to the entries into a standard vector. As each entry is put into this vector, its index within the vector will be put into an index member of the table entry.

A static index to entry vector was added to the table class along with the index instance member. The set index and add entry access function was added to set the index of the an entry and add the pointer to the entry to this vector. A call to this function was added to the erector function. The index access function was changed to return the value of the index member. The entry access function was changed to access the index to entry vector instead of the static table entries array. Both functions were made inline.

The comments were removed from the instance members since they were only stating the obvious, plus the comment for the expression info pointer member stated that a null pointer indicates no expression info even though this no longer applies because all entries are now assigned an expression info instance even it it is the null instance. This is an example of comments going stale and lying - a good reason for eliminating comments with code explains itself.

[branch table commit 9d74aacf42]

Table – Static Member Access

While refactoring the add to table function and sub-functions into the Erector class functions, one of the desired changes to improve readability was to provide access functions to the static table members. All of these statements contained the table class scope (Table::) to get to the static member and a table entry pointer. This meant that table entry access functions could be added that would then access the static table member.

Ideally these access functions would be inline for efficiency. The reason these changes were not made was that in order to be inline, the table class needs to be defined before the table entry class, but the table entry class needs to be defined before the table class. Using forward declarations may have solved this issue. This would not be necessary if the two classes were merged and since they were going to be merged, no effort was used to solve this issue.

With the two classes merged, these access functions could be implemented. The necessary access functions for the expected data type, alternate, name to entry, and code to entry maps were added and callers were updated accordingly. The table class scope was eliminated from the erector class functions. The erector add to code map and two add to expected data type functions were moved back to the table class since both were passed entry pointers.

Several of the existing access functions were renamed for improved readability. One of the access functions, the alternate count function, was only called from one location in the translator and was used to determine if the unary operator in the token has an alternate binary operator. This section of code looked like this (note the comments and magic numbers):

// check if code has a binary operator
if (m_token->alternateCount(1) > 0) {
m_token->setFirstAlternate(1); // change to binary operator
}

This function was renamed to the has binary operator function. Several constant expressions (integers) were added for predefined operand indexes for use with the various alternate access functions so magic numbers are not used. With the renamed access function and constant expressions, the resulting code is much more readable (note the absence of comments):

if (m_token->hasBinaryOperator()) {
m_token->setToFirstAlternate(BinaryOperator);
}

[branch table commit 03140858b0]

Saturday, January 24, 2015

Table – Single Class

The table class, with static members, and the table entry class, with entry (instance) members, were combined into a single class, named the Table class. The former table entry constructor (which had no body) was moved to the table source file and a call to an Erector instance was added to add the needed info to the static table table members.

The iteration of the entries in the static table entry array that called an Erector instance for each entry was removed from the default constructor since the table entry constructor now does this. The only statements remaining in this constructor are the initialization of the other alternates. This constructor along with the single table instance will remain to initialize these other alternates until this can be reimplemented in the new class hierarchy.

With these constructors in place, the class hierarchy can be implemented one derived class at a time. As each is implemented, the entries will be removed from the static array along with the enumerator from the entry.

The method previously used to disable the copy constructor and copy assignment was to make them private so they are not accessible, at least from outside the class. C++11 has a better method of accomplish this using the new delete feature. C++11 also has move constructors and assignments, so these also needed to be deleted. For example, the copy constructor and assignments are disabled like this:

Table &operator=(const Table &) = delete;
Table(const Table &) = delete;

Some other minor changes were made. All instances of variables named operand index (mostly arguments of access functions) were renamed operand. This was already done in the Erector class. Several redundant comments were also removed.

[branch table commit 9153b17e5e]

Table – Add To Table Refactoring (Part 3)

The add to table function and all the functions it calls share several variables, which required them to be passed between the functions. Enter the new Erector class to contain these functions and shared variables. This class is a function operator class with a constructor that takes a table entry pointer, and the function operator function for adding the element to the table (static member variables). This class was made a friend class of the table and table entry classes to make private member functions accessible.

The shared members of the Erector class include the table entry pointer being added, table entry pointer to the primary entry, table entry pointer to an alternate entry, and an operand index used for iterating and used by multiple functions. Most of the member functions were declared in the header file after the class definition to not clutter up the class definition. Each of these were declared inline since each is called once. Two functions called multiple times were put into the source file.

Many of the functions were renamed again to better explain what they do. There was addition refactoring moving statements up and done the functions during this renaming. Not widely known (at least is wasn't to me) is that C++ contains alternate representations for the logical operators, for example and for &&, or for ||, and not for !. To make the code a little more readable, the not operator was used since the ! operator can be head to see.

The main function was restructured to eliminate the need of the Done structure exception. This unusual mechanism was used as sort of a goto to cause an exit from anywhere. The elimination of this exception-exit mechanism was now possible since the sub-functions no longer returned table entry pointers (which are now member variables and set directly) opening up the use of boolean return values.

[branch table commit f199936e70]

Friday, January 23, 2015

Table – Add To Table Refactoring (Part 2)

Refactoring the remaining large function called from the add to table function was painful, but was finally achieved, though the result is far from ideal. The problem is that there are several variables that need to be shared among the functions. Some functions take a variable and then return the same variable possibly modified. Another needs to modify two arguments, so couldn't return both so reference arguments were used.

Having arguments used as both input and output can be confusing when reading code (arguments should be used for inputs, return values should be used for outputs). Sharing several variables between functions implies that the functions should be wrapped up into another class. This will be the subject of the next refactoring change.

[branch table commit a9cd8e9a1a]

Wednesday, January 21, 2015

Table – Add To Table Refactoring

The add to table function is large and hard to read so it will be refactored. Refactoring this function will require several passes of changes. The first round was to break the function into seven smaller functions. The add expected data type function was already separate since it is called from three places. The result was a simpler main function, but is still rather involved. The separated functions are also simpler, but less than readable (and one is still quite large).

I won't detail the results since this is not the final form (and if successful, no explanation should be necessary as the code should be easy to read).

[branch table commit 23515be87b]

For the next round, I wanted to clean up the main add to table function. There were a number of return statements that were cluttering up the function. The first change was to restructure two of the if statements to eliminate two of the return statements. For the others, there was a pattern with three of the separated functions returning a null pointer indicating that no further action was necessary. The main function then checked for this null pointer and returned.

To clean this up, an empty Done structure was added with no members to be used as an exception. Instead of these three functions returning a null, they instead throw an instance of this empty structure. The main function catches this exception and does nothing (just returns). This is not really an exception condition, but it cleaned up the main function quite a bit. This mechanism allows lower-level functions to cause a return of the add to table function.

[branch table commit 69bd606994]

For the third round, there was some more attempt to make the main add to table function and some of the separated function more readable. This involved renaming functions (again) and adding access functions to make what is being tested in if statements more self explanatory.

One of the new access functions was named has operands, which was too similar to the has operand access functions. These were used to determine if the code has an operand (variable, constant, array, etc.) by checking if the code has an operand text function. These functions were renamed to is code with operand. The last operands access functions were corrected to return an integer instead of a boolean (though this didn't appear to affect its functionality).

[branch table commit 8b9b3f9a4b]