Interactive BASIC Compiler Project

Thursday, January 15, 2015

Token – Set Table Entry

The code member of the token class will be replaced with a table entry pointer. This means that the access functions for the code need to be changed. It is easy to convert a table entry pointer to a code, or a code to a table entry pointer, since the code is currently just an index into the table entry array. There was already a table entry access function, which for now calls the static table entry access function that simply returns a pointer to the entry in the table entry array.

Once the code member is replaced, the table entry access function will simply return the table entry pointer member. The set code access function was replaced with a matching set table entry access function, and callers were updated accordingly. The getter should be named table entry, but use of this function is going to be largely replaced with additional pass-through table access functions in the token class (like those that already exist for the type, data type, has flag, precedence and last operand functions). This will wait until the table entry pointer member is added.

There was also the token constructor that took a code enumerator. Uses of this constructor were replaced with the token constructor taking a table entry pointer (where default values were added for the column and length arguments). Most calls to the former constructor had a table entry pointer and were obtaining the code from it, so the call to the code table access function was simply removed from these.
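A minimal sketch of where this is heading, once the token holds a table entry pointer: a constructor taking the entry pointer with defaulted column and length, and in-line pass-through functions so callers never touch the entry directly. The `Type` and `DataType` enumerators, the table entry members, and the default values (-1 and 1) are hypothetical stand-ins, not the project's actual definitions.

```cpp
#include <cassert>

// Hypothetical stand-ins for the project's enumerations.
enum class Type { Operator, IntFunc, Constant };
enum class DataType { Double, Integer, String };

// Hypothetical table entry; the real class holds many more members.
struct TableEntry {
    Type type;
    DataType dataType;
    unsigned flags;
    int precedence;

    bool hasFlag(unsigned flag) const { return (flags & flag) != 0; }
};

class Token {
public:
    // Column and length defaults are assumptions for illustration.
    explicit Token(const TableEntry *entry, int column = -1, int length = 1)
        : m_entry {entry}, m_column {column}, m_length {length} {}

    // Getter for the stored table entry pointer.
    const TableEntry *tableEntry() const { return m_entry; }

    // In-line pass-through functions to the table entry.
    Type type() const { return m_entry->type; }
    DataType dataType() const { return m_entry->dataType; }
    bool hasFlag(unsigned flag) const { return m_entry->hasFlag(flag); }
    int precedence() const { return m_entry->precedence; }

    int column() const { return m_column; }
    int length() const { return m_length; }

private:
    const TableEntry *m_entry;
    int m_column;
    int m_length;
};
```

Since every pass-through is defined in-line, a call like `token.precedence()` should compile down to the same pointer dereference as going through the entry explicitly.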

The program word class was also using the code enumeration type for setting codes and returning codes. This required C-style type casts (though C++ static casts could have been used) to change to and from unsigned 16-bit integers. The code and token types will be replaced with a code type and a code index. Code indexes will only be used in the program code. The code enumeration in the program word access functions was changed to unsigned integers and their callers updated accordingly.

[branch table commit 0dc88434d2]

Wednesday, January 14, 2015

Table – Single Instance

The find code function was the last function in the translator using the table reference member of the class. The recreator and program model classes also no longer used their table reference members. Both the translator and recreator classes contained a table access function for use by the various translate and recreate functions of the codes. Since those functions no longer used it, these access functions were removed. The table reference members were removed from these classes. (The table reference member had already been removed from the parser class.)

The table reference members were initialized from the static table instance function. Upon the first call to this function, the table instance is created and initialized. With the removal of all of the table reference members, there were no calls left to initialize the table. This function (along with the static table instance member) was replaced with a static table instance in the table source file. This required the table constructor to be made public.
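A rough sketch of the change, assuming a much-reduced hypothetical `Table` class: the instance-on-first-call function is gone and the single instance is simply a file-scope object in the table source file, which is why the constructor must be public. The `tableName` helper is hypothetical and only stands in for the static table functions that use the instance.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal table class (table header file).
class Table {
public:
    Table();  // public, so a file-scope instance can be constructed
    const std::string &name(int index) const { return m_names[index]; }

private:
    std::vector<std::string> m_names;
};

// --- table source file (sketch) ---
Table::Table() : m_names {"PRINT", "LET", "+"} {}

namespace {
// The single table instance, constructed during static initialization
// of this source file rather than on the first call to an instance
// function.
Table table;
}

// Hypothetical stand-in for a static table function using the instance.
const std::string &tableName(int index) { return table.name(index); }
```

One caution with this design: an object in another source file whose static initialization uses the table could run before this instance is constructed, since initialization order across translation units is unspecified. The first-call instance function avoided that, so this only works while nothing touches the table during static initialization.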

[branch table commit 2c22674e61]

Tuesday, January 13, 2015

Translator – First/Last Operands Handling

After refactoring the token convert function and changing its caller, the process done stack top function, I noticed that the latter function could be better implemented, specifically in how the first and last operands of the tokens were returned. If callers wanted these tokens returned, they passed pointers to where the tokens should be put; otherwise null pointers were passed (also the default). Using arguments as outputs is not the best design. Changing this caused a cascade of additional refactoring.

The process done stack top function was changed to return the first and last operands as a standard pair of token pointers (now referred to appropriately as the first and second operands, the members of the pair). If the caller doesn't need them, then it ignores the return value. An Operands alias was added for this pair. The local first and last variables were replaced with an operands pair, which is returned at the end of the function. The comments for this function were removed since they were outdated and only restated what the function did.
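A minimal sketch of returning the operands as a pair instead of through pointer arguments. The `Token` and `DoneItem` structures here are hypothetical reductions, and the function body only pops the done stack; the real function also does the code lookup and conversion work.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal token.
struct Token {
    std::string name;
};
using TokenPtr = std::shared_ptr<Token>;

// Alias for the first and last (now first and second) operands.
using Operands = std::pair<TokenPtr, TokenPtr>;

// Hypothetical done-stack item holding a token and its operands.
struct DoneItem {
    TokenPtr token;
    Operands operands;
};

// Sketch: pop the top of the done stack and return its operands.
// Callers that do not need them simply ignore the return value.
Operands processDoneStackTop(std::vector<DoneItem> &doneStack)
{
    assert(!doneStack.empty());
    Operands operands = std::move(doneStack.back().operands);
    doneStack.pop_back();
    return operands;
}
```

Compared with output pointer arguments defaulting to null, the caller no longer has to declare variables up front or pass anything it does not want.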

The process final operand function was changed to use a first/second operands pair. The operand index argument was used to determine if an operator token was unary, but the token itself can be used to determine this. This argument was also passed to the process done stack top function, but it was always the last operand of the token, which can be obtained from the token. This argument was removed. The generic token2 argument was renamed more appropriately to first since it is the first operand of the token, and was made an r-value reference to force the caller to move its token to this function. The comments for this function were also removed.

The process operator function, another caller of the process done stack top function, was also modified to use an operands pair (only the first operand returned is used). The final caller, the LET translate function, does not use the returned operands and didn't need to be modified.

The first and last members of the internal Done Item structure (the values of the done stack member) were replaced with an operands pair. The constructors were updated to initialize the first and second members of the operands pair and a new constructor was added with an operands pair argument. The replace first last function (a poor name) was renamed to the more appropriate replace operands.
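A sketch of the done item with an operands pair, its constructors, and the renamed replace operands function, plus an r-value reference parameter like the one described for the process final operand function. All names and the body of `processFinalOperand` are hypothetical; only the shape of the interface follows the description.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical minimal token; only a column is kept for the example.
struct Token {
    int column;
};
using TokenPtr = std::shared_ptr<Token>;
using Operands = std::pair<TokenPtr, TokenPtr>;

// Sketch of the internal done-stack item, with an operands pair in place
// of the former separate first and last members.
struct DoneItem {
    TokenPtr token;
    Operands operands;

    DoneItem(TokenPtr t, TokenPtr first = nullptr, TokenPtr second = nullptr)
        : token {std::move(t)}, operands {std::move(first), std::move(second)} {}
    DoneItem(TokenPtr t, Operands ops)
        : token {std::move(t)}, operands {std::move(ops)} {}

    // Formerly named "replace first last".
    void replaceOperands(Operands ops) { operands = std::move(ops); }
};

// Hypothetical stand-in for process final operand: the first operand is
// taken by r-value reference, so a caller must std::move its token in.
void processFinalOperand(DoneItem &item, TokenPtr &&first)
{
    Operands operands = item.operands;
    operands.first = std::move(first);  // the caller's pointer is left empty
    item.replaceOperands(std::move(operands));
}
```

Calling `processFinalOperand(item, a)` with a named pointer does not compile; the caller must write `std::move(a)`, which makes the transfer of the token explicit at the call site.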

[branch table commit 240f967d4a]

Sunday, January 11, 2015

Token – Conversion Handling

The table find code function was the one function with a token pointer argument that was not modified along with the rest of the table functions taking a token argument, since it did more than just access the code of the token. It had two token arguments: the token being processed (operator, assignment, print item, or input assignment) and its operand token.

This function took no action if the operand data type matched the expected data type of the token. If the operand was a constant, an attempt was made to change the constant to the expected data type if at the last operand and the token did not have the Use Constant As Is flag set. If there was an alternate code of the token that matched that of the operand, the token was changed to that code. Otherwise a hidden conversion code was obtained. An error was thrown if the operand could not be converted.

Since this function is acting on a token (or possibly two tokens), it made more sense for this function to be a member of the token class. After moving it to the token class, it was renamed to the convert function. With a lot going on in this function, some refactoring was performed to improve the readability and clarity of the code:

Added the is last operand access function to determine if an operand index is the last operand of the code in the token.
Renamed the convert constant function to change constant. Also renamed its argument.
Added the change constant ignore error function, which calls the change constant function and catches an error that may be thrown.
Renamed the convert code function to convert code entry. Also renamed its argument.
Removed most of the comments as they were only restating what appears in the code (this included the callers in the translator routines).
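A minimal sketch of the is last operand, change constant, and change constant ignore error functions, assuming a hypothetical token that holds only a data type, a double value, and an operand count, with a hypothetical `Status` enumeration thrown as the error. The integer range check mirrors the "Expected Valid Integer Constant" case described in the January 6 entry.

```cpp
#include <cassert>
#include <climits>

// Hypothetical status values thrown as errors.
enum class Status { ExpectedValidIntegerConstant };
enum class DataType { Double, Integer };

// Hypothetical constant token reduced to what the example needs.
struct Token {
    DataType dataType;
    double value;
    int operandCount;

    // True if the operand index refers to the last operand of the code.
    bool isLastOperand(int operandIndex) const
    {
        return operandIndex == operandCount - 1;
    }

    // Change the constant to the data type; throw if a double is too
    // large to be represented as an integer.
    void changeConstant(DataType toType)
    {
        if (toType == DataType::Integer
            && (value < INT_MIN || value > INT_MAX)) {
            throw Status::ExpectedValidIntegerConstant;
        }
        dataType = toType;
    }

    // Same as change constant, but a thrown status is caught and ignored
    // so the caller can go on to check alternate codes.
    void changeConstantIgnoreError(DataType toType)
    {
        try {
            changeConstant(toType);
        }
        catch (Status) {
            // leave the token unchanged
        }
    }
};
```

Splitting the ignoring variant into its own function keeps the try/catch out of the convert logic, which then reads as a straight sequence of checks.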

[branch table commit ce6adffc15]

Saturday, January 10, 2015

Table – Entry Access Functions

There were a number of table access functions with a code argument, which was used as an index into the table entry array to access the desired entry member. There were also a number of access functions with a token pointer argument, but the token argument was only used to obtain the code (index). All of these functions (except one) were replaced with in-line table entry access functions.

The table entry access function of the token is used to reach the new table entry access functions. Like the existing token type and data type access functions that access the table, has flag and precedence functions were added to the token class so that the intermediate table entry call is not needed. Since all of these functions are in-line members, the resulting code is just as efficient.

The table class was also made a friend class of the table entry class so that the static table members are directly accessible. The table entry class was already made a friend class of the table class. Eventually these two classes will be combined as the new table model takes shape.

There were a number of other locations where use of a code was replaced with a table entry pointer. The table entry members were made private since all are now covered by access functions. However, this required a table entry constructor (to initialize all of the members) so that the current static table entry array can still be initialized. This is temporary until the new table entry class hierarchy is implemented. See the commit log for other changes that were made.
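A small sketch of why a constructor was needed: once the members are private, the class is no longer an aggregate, so a brace-initialized static array only works if a matching constructor exists. The members, names, and precedence values below are hypothetical.

```cpp
#include <cassert>

enum class DataType { Double, Integer, String };

class TableEntry {
public:
    // Temporary constructor so the static entry array can still be
    // initialized now that the members are private.
    constexpr TableEntry(const char *name, DataType dataType, int precedence)
        : m_name {name}, m_dataType {dataType}, m_precedence {precedence} {}

    const char *name() const { return m_name; }
    DataType dataType() const { return m_dataType; }
    int precedence() const { return m_precedence; }

private:
    const char *m_name;
    DataType m_dataType;
    int m_precedence;
};

// Hypothetical static table entry array; each braced row now calls the
// constructor instead of aggregate-initializing public members.
static const TableEntry tableEntries[] {
    {"+", DataType::Double, 40},
    {"MOD", DataType::Double, 44},
    {"LEN(", DataType::Integer, 2},
};
```

The array initializer syntax is unchanged from the aggregate form, which is what lets the existing table entries keep compiling during the transition.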

[branch table commit 2f013da528]

Table – Functions Returning Codes

There were several table (and token) functions that returned a code enumerator (which was really an index). These functions were changed to return a table entry pointer. In the short term this means in some cases calling the temporary code access function to get the index of the entry (the code is still stored in a token).

The functions modified included the find code and token convert code functions. Several new table entry alternate functions replaced equivalent table set token code functions. Replacing the use of code enumerators with table entry pointers has been started, which is necessary before the code and type enumerations are replaced with the code type enumeration.

[branch table commit 3df697fa40]


Wednesday, January 7, 2015

Token – Data Type Member

The data type member of the token class was removed since its value should always be the same as that of the table entry for the code in the token (and all tokens now have a code). The data type, is data type, and is data type compatible access functions were modified to get their return values from the table entry for the code. The set data type access function was removed since it is no longer used.

A data type access function was added to the Table Entry class. To access the data type of the table entry, which is within the expression info member, the Expression Info structure definition was moved from the table source file to the table header file so that its members are available from the in-line table entry functions.

These changes caused a problem with the not yet implemented array, defined function and user function codes. There was only one table entry for each of these codes, and its data type was Double. This previously worked since the correct data type was in the token. The problem was corrected by adding the missing table entries for these codes for each of the other data types.

There were several locations (translator and tester routines) that checked for these codes by looking for the single code. Now with additional codes, the checks needed to be expanded to include the additional alternate codes. This is not ideal, but will be resolved once the code type enumeration is implemented (there will be one code type each for variable, array, defined function and user function codes).

[branch table commit f8d338ae5e]

Tuesday, January 6, 2015

Invalid and Null Code Enumerators

The Invalid code enumerator was used by the table find code function to indicate an invalid conversion and within a token to indicate no code has been set yet. Neither use exists any longer, so this enumerator was removed along with any uses of it (it is no longer necessary to check if a token has a valid code), including the token has valid code access function.

The Null code enumerator was also removed, replaced with the use of the default code enumerator. The first code enumerator was set to 1 so as not to conflict with the default code enumerator. While making this change, it was noticed that the table unary code function was not being used any more, so it was removed.
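A minimal sketch of starting the enumeration at 1, assuming "default code enumerator" here means the value-initialized enumerator (zero), as produced by `Code {}`. The enumerator names and the `isSet` helper are hypothetical.

```cpp
#include <cassert>

// Hypothetical subset of the code enumeration. The first enumerator is
// set to 1 so that the value-initialized (zero) enumerator can stand in
// for the removed Null code without colliding with a real code.
enum Code {
    Let_Code = 1,
    Print_Code,
    Add_Code,
};

// True if a code has been given a real value (hypothetical helper).
bool isSet(Code code) { return code != Code {}; }
```

With this arrangement a default-constructed code member needs no explicit Null initializer, since value initialization already yields a value that no real code uses.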

[branch table commit 7d503805f4]

Token – Data Type Conversion

Hidden conversion codes are inserted into the program when a numeric operand needs to be converted to either a double or integer. However, for numeric constants, no conversion is needed since both the integer and double representations of the constant are available (except for large values that cannot be converted).

The exception mentioned in the last post, where the data type of the token is set (other than at token creation), was when a constant is changed, or cannot be used because an integer is required but the double is too large. In that case the data type is temporarily changed to the default, which is then checked for by the callers that convert a constant and return a hidden conversion code for an operand.

The convert constant function was modified to throw an "Expected Valid Integer Constant" status error when an integer is required but the token contains an unconvertible large double constant. Previously this was up to the caller to check if the expected data type was an integer and a double constant wasn't converted. Also, instead of setting the data type before calling the table set token code function, the set token code function with the data type arguments is used.

The convert code function was modified to throw "Expected Type Expression" errors when the token cannot be converted to the desired data type. Only the error status is thrown, not a Token Error, allowing the caller to construct the Token Error. Previously, this function returned the Invalid code enumerator, which callers checked for.

The table find code function was modified for this change. "Expected Valid Integer Constant" errors thrown from the convert constant function are caught and ignored because the operator or function may have an alternate that takes a double argument (where a large double constant would be acceptable). If the operator or function only accepts an integer, then an exception will be thrown, as for other tokens that cannot be converted.

The translator get expression function was modified to catch error statuses thrown from the convert code function and construct a Token Error to throw (by calling the done stack top token error function). Previously this routine had to check for an unconvertible double constant. The process done stack top function was similarly modified to catch errors from the table find code function (also previously checking for an unconvertible constant).
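A minimal sketch of the split described above: the conversion throws only a status, and the caller, which knows which token the error belongs to, catches it and throws a token error built from that token's position. The `Status` values, the `TokenError` structure, the string-versus-number conversion rule, and both function names are hypothetical simplifications.

```cpp
#include <cassert>

enum class Status { ExpectedNumExpr, ExpectedStrExpr };
enum class DataType { Double, Integer, String };

// Hypothetical token reduced to its position information.
struct Token {
    int column;
    int length;
};

// Hypothetical token error pairing a status with a token's position.
struct TokenError {
    Status status;
    int column;
    int length;
    TokenError(Status s, const Token &token)
        : status {s}, column {token.column}, length {token.length} {}
};

// Sketch of a conversion that throws only a status: numbers convert to
// numbers, but strings and numbers do not convert to each other.
void convertCode(DataType operand, DataType expected)
{
    bool operandIsString = operand == DataType::String;
    bool expectedIsString = expected == DataType::String;
    if (operandIsString != expectedIsString) {
        throw expectedIsString ? Status::ExpectedStrExpr : Status::ExpectedNumExpr;
    }
}

// The caller catches the status and builds the token error from the
// token it has on hand (standing in for the done stack top token error).
void getExpression(const Token &doneTop, DataType operand, DataType expected)
{
    try {
        convertCode(operand, expected);
    }
    catch (Status status) {
        throw TokenError {status, doneTop};
    }
}
```

Keeping the conversion unaware of token positions means it can be reused by callers that need different tokens reported in the error.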

The translator get operand function also called the convert code function to check the data type of references. This was not appropriate since no conversion code was needed, and the modified convert code function was not throwing appropriate errors for references anyway. Therefore, a new token is data type compatible access function was added to check only the data type.

The set token code function without a data type argument was an adapter to the set token code function, passing the data type contained in the token. Besides the convert constant function, the only other user of this function was the token constructor for string constants, which was modified to use the function with the data type argument directly. The initializer for the code member was also removed since it gets initialized by the set token code function. The set token code adapter function was removed.

[branch table commit c2f2420a89]

Sunday, January 4, 2015

Token – Type Member

The token constructors each contain calls to the table static instance function so that the set token or set token code function can be called. These two table functions set the code, type and data type of the token. Now that tokens always contain a code (and later a table entry pointer), the type and data type members will be the same as the table entry (with one exception for data type). So it is not necessary for the token to contain these members.

The type member was removed from the token class (the data type member will have to wait until that one exception is eliminated). The type and is type access functions were modified to read from the table using a new table access function (which for now uses the code to get the table entry pointer). The set type access function was removed as it was only used when the token was created. A type access function was added to the table entry class.

These changes required some refactoring of header files. The token header file include statement in the table file needed to be removed since the token header file needs to access the table entry. The table header file include statement was removed from the token header file. This required the Type enumeration to be moved from the token class to the table header file. This enumeration was not renamed since it will soon be replaced with the new code type enumeration.

[branch table commit 68593c2dde]

Parser – Table Instance Member

From the last change, most uses of the table instance reference member of the parser class were removed, replaced with use of static table functions. The last two uses were for setting the code (plus type and data type) of constant tokens. For constant string tokens, the setting of the token code was moved to the token constructor for string constants.

The other use of the table instance was for number constants also in the main function operator function of the parser, but with some statements to determine from the desired data type what the data type of the number constant should be and whether to set the Integer Constant sub-code. These statements and the setting of the code of the token were moved to the token constructors for integer and double constants.

For integer constant tokens, the data type is set to integer unless the desired data type is double. There is no need for the Integer Constant sub-code since the constant is an integer. For double constant tokens, if the value is within the integer range, the Integer Constant sub-code is only set if the desired data type is not an integer or double. If the desired data type is indeterminate (Number or Any), then the data type of the token is set to Double. For values outside the integer range, the data type is set to Double.

The table instance reference member was removed since it was no longer being used. The remaining parser constructor was only initializing the input string stream member, so it was moved to and made in-line in the header file.

[branch table commit b89a1b663b]

Parser – Table Entry Pointers

Before the code enumeration (and token type enumeration) can be replaced with the new code type enumeration, uses of the code enumeration type need to be replaced with the use of table entry pointers. This will be done one part at a time, adding table entry class access functions as needed.

This process was started by changing the table find functions from returning a code enumerator to a table entry pointer (returning a null pointer to indicate no table entry was found). Only the parser routines were using the find functions, so the get identifier and get operator functions were updated to use table entry pointers instead of a code enumerator.

A few table entry access functions were added to support the parser changes including the code, is code, name, has flag and alternate functions. For now the code function simply returns an index value of the entry by subtracting the base of the table entries array. This function is temporary. The is code function checks if the table entry is for a particular code. For now it compares to the code function return value, but eventually will compare to the code member that will be added to the table entry. The alternate function is similar to the alternate code function but returns a table entry pointer.
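A minimal sketch of the temporary code function (pointer subtraction from the base of the entry array), the is code function built on it, and a find function returning a null pointer when nothing matches. The three-entry table, the enumerators, and the linear search by name are hypothetical simplifications.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical codes; their order must match the entry array below.
enum Code { Print_Code, Let_Code, Add_Code };

struct TableEntry {
    const char *name;

    Code code() const;  // temporary: index of this entry in the array
    bool isCode(Code c) const { return code() == c; }
};

// Hypothetical static table entry array.
static TableEntry tableEntries[] {{"PRINT"}, {"LET"}, {"+"}};

// The index is found by subtracting the base of the array from this entry.
Code TableEntry::code() const { return Code(this - tableEntries); }

// Sketch of a find function returning an entry pointer, or a null pointer
// when the name is not in the table.
TableEntry *find(const char *name)
{
    for (TableEntry &entry : tableEntries) {
        if (std::strcmp(entry.name, name) == 0) {
            return &entry;
        }
    }
    return nullptr;
}
```

Once a code member is added to the entry, only the body of `code()` needs to change; callers of `isCode` stay the same.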

The code argument of the two token constructors for codes was changed to a table entry pointer. For now they just access the code function of the table entry. The immediate goal is to change the interfaces and later to change the underlying code when the token code member is replaced with a table entry pointer. The token constructor taking a code was temporarily left (though the unneeded arguments were removed) for use by the translator routines.

The table entry class was made a friend class of the table class (specifically so the alternate member function can access the static alternate member of the table class). Eventually, the table entry and table classes will be combined into a single class.

[branch table commit 78f0b39780]

Saturday, January 3, 2015

Parser – Parentheses Token Handling

With the forthcoming change from the token type and code enumerations to the code type enumeration, it will be advantageous if all similar codes have the same code type. Table flags will be used for command, operator and function codes, since some will need their own code type (for example, the LET command and equal operator codes).

The codes being discussed are the codes with operands: constants, variables, arrays, defined functions and user functions. Each has a separate code for each of the three data types, and each (except constants) has an additional three codes, one per data type, for references. Currently only constants and variables are fully implemented. For variables, there will be a single Variable code type for all six of its codes. The translator will only need to check this single code type for a variable code.

There was a distinction between token types with parentheses and without (internal functions, defined functions, and generic tokens). For defined and internal functions, this distinction was removed (the token type enumerators were combined for each). For internal functions, instead of checking the token type to determine if there are parentheses, the number of operands is now checked. Eventually defined functions (and later user functions) will have a similar check as the number of operands will be stored in their associated dictionaries.

The parser get word helper function was modified to only check if an open parenthesis is present and add it to the identifier string as before, but not take it from the input stream. The parenthesis is still needed for functions when searching the table. The get identifier function already removed the parenthesis if a table entry was not found, but was modified to remove the parenthesis from the input for internal functions only. A new get parentheses access function was added to check if the next character in the stream is an open parenthesis, remove the parenthesis, and return whether it was a parenthesis.
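A minimal sketch of the peek-without-consuming behavior using standard `std::istream::peek` and `get`. Both functions are hypothetical stand-ins for the parser's get word helper and get parentheses function; in particular, real BASIC identifiers can also contain digits and type characters such as `$`, which this simplified word reader ignores.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Sketch of the modified get word helper: reads letters, and when an
// open parenthesis follows, appends it to the word without consuming it.
std::string getWord(std::istream &input)
{
    std::string word;
    while (std::isalpha(input.peek())) {
        word += static_cast<char>(input.get());
    }
    if (input.peek() == '(') {
        word += '(';  // still needed when searching the table for functions
    }
    return word;
}

// Sketch of the get parentheses function: if the next character is an
// open parenthesis, take it from the stream and return true; otherwise
// leave the stream unchanged and return false.
bool getParen(std::istream &input)
{
    if (input.peek() == '(') {
        input.get();
        return true;
    }
    return false;
}
```

Leaving the parenthesis in the stream lets the translator decide, per token type, whether to consume it, instead of the parser having to guess.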

The translator get operand function was modified accordingly. For the single internal function token type, the process internal function routine is only called if the function has operands. For the single defined function token type, the process parentheses token routine is only called if the next character in the parser is an open parenthesis (calling the new get parentheses function to remove it). For the generic parentheses token type (array or user function), the get parentheses function is called to remove the parenthesis.

The two separate table entries for defined functions with and without parentheses remain for now with their associated code enumerators and are used to determine if parentheses are present. Once defined functions are fully implemented, these two codes will be replaced with six codes (as described above) and the defined function dictionary will contain whether there should be parentheses (if there are operands).

The test token stream inserter and print token functions were modified for the change in token type enumerators. The token has parentheses access function was only used by the print token function, which used a static map member. The print token function was modified to not require this access function, so it and the static map member were removed. There was also a static map member for precedence. Since all tokens now have a code assigned, all precedences can be obtained from the table, so this static member and its access function were also removed. The expected parser results for tests #2, #4 and #5 were updated for the change in the token types.

[branch table commit 3dd768f604]

Friday, January 2, 2015

Token – Header File Refactoring

With the new table model, the token class will contain a pointer to the table entry of the code instead of the code enumerator (currently used as an index). The table header file currently includes the token header file for the many table functions with a token pointer argument. This is no longer necessary since only token pointer and token pointer references are used (forward declarations are sufficient).

When the token class contains a pointer to the table entry, optimal access to the table entry members will be achieved if the entry access functions are defined in-line. Similarly, optimal indirect access to the table members through the token access functions is desired. This will require the table header file to be included in the token header file.

Therefore, some header file refactoring was performed where the token header include statements were moved from all the other header files to the associated source files, with the exception of the table header file, which uses the token type enumeration. When this enumeration is replaced with the code type enumeration, this statement can be removed. Forward declarations were added as needed.

[branch table commit 0777959464]

Parser – Token Creation

A side effect of the last change was that tokens for both operator, function and command codes and codes with operands (constants, variables, arrays, defined functions and user functions) were created with the same token constructor. This token constructor searched through alternate codes for the code with the appropriate return data type.

This was unnecessary for operator, function and command codes. The table new token function called for these codes passed in the return data type from the table entry of the code. The token constructor then called the new table set token code function. Since the data type matched the return data type (which had just been passed in), no alternates were checked and the code, type and data type of the token were set. This was extra, unnecessary work.

A new token constructor was added for operator, function and command codes, which only required arguments for the code, column, length and string. The string argument is only used for the REM and REM operator codes. This constructor replaces the table new token function. This constructor calls the table set code function which just sets the code, type and data type of the token from the table entry of the code. For consistency the code argument was put first in the other token constructor for codes with operands.

While looking at the creation of tokens, I decided that using the standard unique pointer within the parser was unnecessary. The parser can just allocate a token and return its pointer. The translator can then put the allocated tokens into a standard shared pointer. The parser was changed to use plain token pointers. The translator routines were changed to use the new token constructor directly via the standard make shared function. The translator get operand function was changed to use the reset function to set the token member, since a plain pointer cannot be assigned directly to a shared pointer.
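A minimal sketch of the ownership hand-off using standard `std::shared_ptr::reset` and `std::make_shared`. The `Token`, `parse`, and `Translator` definitions are hypothetical reductions; the point is only that assigning a plain pointer to a shared pointer does not compile (the constructor from a raw pointer is explicit), while `reset` takes ownership.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal token.
struct Token {
    std::string string;
    explicit Token(std::string s) : string {std::move(s)} {}
};

// Sketch: the parser allocates a token and returns a plain pointer,
// handing ownership to the caller.
Token *parse(const std::string &text) { return new Token {text}; }

struct Translator {
    std::shared_ptr<Token> m_token;

    void getOperand(const std::string &text)
    {
        // m_token = parse(text);  would not compile: the shared pointer
        // constructor taking a raw pointer is explicit.
        m_token.reset(parse(text));  // take ownership of the new token
    }
};
```

Where the translator creates tokens itself, `std::make_shared<Token>(...)` is the better choice, since it allocates the token and its reference count together.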

[branch table commit 1bfb76ae0a]

Parser – Codes With Operands

The last tokens not being fully set in the parser were those for codes with operands (constants, variables, arrays, defined functions and user functions). Constant tokens were corrected with the last change. Arrays, defined functions and user functions are not fully implemented and so did not need to be changed. Variables, however, were only partially set in the parser (only to the base Variable or Variable Reference code) and weren't set for the data type of the variable until the translator.

The parser get identifier function was modified to set the data type to Double if the word obtained from the input does not have a type. This applies to all identifiers not found in the table. The token constructor for codes is used for commands, operators, functions and codes with operands. The type argument was unnecessary since that is set from the table entry. However, an issue was found with how codes were found in the table.

For operators and functions, the [return] data type of the token is set from the table entry. (This issue doesn't affect commands since commands don't have a return data type.) For codes with operands, the data type of the identifier is used to find the appropriate table entry (for example, Variable, Variable Integer, or Variable String) by looking at the data types of alternate codes. The current table set token code function did not work correctly because it searches alternate codes by operand data type. For this case, the alternate codes need to be searched by return data type.

A new set token code function was added without an operand index argument to search by return data type. If the data type (of the identifier) does not match the code passed, then the alternate codes are searched for a matching return data type. If there are no alternates or none were found, then the code passed is set in the token along with the token type of the code. The data type is set to the data type of the identifier and not from the table entry (which may not match for codes like arrays that are not fully implemented yet).
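A minimal sketch of searching alternates by return data type, assuming a hypothetical table entry that stores its return data type and a list of alternate entries, and a token holding an entry pointer and a data type. The entry names are illustrative only.

```cpp
#include <cassert>
#include <vector>

enum class DataType { Double, Integer, String };

// Hypothetical table entry with a return data type and alternates.
struct TableEntry {
    const char *name;
    DataType returnDataType;
    std::vector<const TableEntry *> alternates;
};

// Hypothetical token reduced to an entry pointer and a data type.
struct Token {
    const TableEntry *entry;
    DataType dataType;
};

// Sketch: if the entry passed does not match the identifier's data type,
// look for an alternate whose return data type does; otherwise (or if
// none is found) keep the entry passed. The token's data type is always
// taken from the identifier, not from the entry.
void setTokenCode(Token &token, const TableEntry *entry, DataType dataType)
{
    if (entry->returnDataType != dataType) {
        for (const TableEntry *alternate : entry->alternates) {
            if (alternate->returnDataType == dataType) {
                entry = alternate;
                break;
            }
        }
    }
    token.entry = entry;
    token.dataType = dataType;
}
```

Taking the data type from the identifier rather than the entry is what keeps not-yet-complete codes, such as arrays with a single Double entry, carrying the right data type.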

The type argument was removed from the token constructor for codes. The type from the table entry of the code was passed (and the new set token code now does this). A call to the new set token code was added to the body of the constructor (previously empty).

Since codes for constants, variables, and variable references were found in the table incorrectly by operand data type, these table entries contained operand data types so that it would work. These codes do not have operands (in the sense that operators and functions do within expressions; not to be confused with the fact that in the program these codes do have an operand index). These table entries were corrected with expression info instances containing no operands.

The translator get operand function previously set the default data type of the token just obtained (set to Double if None and not a function). This was removed since the parser now does this. The token set default data type function called to do this was removed. The call to set the code for a no parentheses (variable) token was also no longer needed. With the parser now setting the default data type to Double, the expected results for the parser tests (#2, #3 and #5) needed to be updated.

[branch table commit acc37f0650]