Interactive BASIC Compiler Project: November 2013

Saturday, November 30, 2013

New Information Dictionary – Implementation

The information dictionary class was changed to a normal class derived from the dictionary class. The constructor is given a pointer to the information class instance created outside of the dictionary. This pointer is saved in a pointer defined as an abstract information pointer, which can hold any information class pointer derived from the abstract class.

The add routine first adds the dictionary entry by calling the base dictionary class add routine with the token and case sensitivity option along with a pointer to the new entry flag so that it knows if a new entry was added, a removed entry was reused or an entry already exists. If a new entry was added, an element is added to the additional information by calling the add element interface function of the information instance. If the entry did not exist, the addition information is set from the token by calling the set element interface function of the information instance. The index is returned.

The remove routine first removes the reference to the dictionary entry by calling the base dictionary class remove routine for the index specified. If the entry was removed because it is no longer used, then the additional information for the element is cleared by calling the clear element interface function of the information instance. The base dictionary class remove routine was modified to return whether the entry was removed (made available to reused) or not.

The abstract information class defines the interface to the additional information. The functions are defined as virtual functions with no default functionality so that derived information classes do not implement a function that it does not need. The constant number and string information classes were changed from holding just a single element to being derived from the abstract class.

The constant number information class contains two vectors for the double and integer values. The add element function just extends the two vectors by one element. The set element function copies the token double and integer values into the respective vectors for the element specified. No clear element function was needed since there is nothing to clear. Two array access functions were implemented to access the data in the two vectors, which will be used at run-time.

The constant string information class contains a vector of string instance pointers. The add element function appends a pointer to a newly created string instance to the vector. The set element function copies the token string into the element specified. The clear element function clears the string for the element specified. Once the string instances are created, they will be reused if dictionary entries are removed. A destructor was implemented to delete all of the string instances. An array access function was implemented to access the data in the vector, which will be used at run-time.

Information instance pointers were added to the program model class. These instances are created in the constructor and passed to their associated information dictionaries. Both the constant number and string dictionaries are now information dictionaries. The information instances are deleted in the destructor. There are no longer any known memory issues.

[commit b9772d4149]

Information Dictionary – New Design

The original design of the information dictionary made the assumption that the additional information would be contained in a vector and was given a structure for the information. The definition was a class template where the information structure was the argument, which was put into the vector, which the information dictionary handled directly. However, in the case of the constant number dictionary, two vectors are needed so that memory is not wasted (see last post).

The details of the additional information need to be separated from the information dictionary. In other words, the information dictionary should not know (or assume) that the additional information is a vector. An abstract information class can be used that defines the interface to the additional information. The interface requires several functions for accessing the information:

add element - add a new element to the end of the additional information

set element - set an element from information in a token used when a new element is added or an element previously deleted is reused

clear element - clear the contents of an element when the dictionary entry is removed (made available for reuse)

The actual information classes are derived from the abstract class and implement these functions to manipulate their information as required, which could be stored as a vector, two vectors, or something completely different. The information dictionary has no knowledge of the information class internals and simply uses the interface functions.

The information dictionary class can be a normal class derived from the dictionary class containing a reference to the additional information. The abstract information interface functions are used to manipulate the additional information. The information dictionary needs re-implement these functions from the base dictionary class:

add - adds a new dictionary entry and additional information if not already in the dictionary and returns its index

remove - removes the additional information if the dictionary entry was removed

Friday, November 29, 2013

Information Dictionary Issues

The information dictionary class extended the base dictionary class by adding a vector for additional information and was implemented as a class template (see post from October 6). The additional information in the constant string dictionary contained a pointer to a string instance (see post from October 6).

The memory leak in the constant string dictionary was caused by how the information dictionary template and constant string information classes were implemented. The problem occurred when a string in an entry of the information vector was replaced with the same string. A new information instance was created with a new string pointer, which was put into the information vector, and the old string instance was lost (a memory leak).

When an old program line was dereferenced after the new replacement line was encoded, a string being replaced by the same string had its reference incremented in the dictionary from one to two by the encode, then the dereference decremented the count back to one. However, when the dereference was moved to after the encode, the reference count of the string went from one to zero and the dictionary entry was freed, but not the entry in the information vector. When the new line was encoded, a new string instance was created overwriting the old string instance pointer.

While this problem was not difficult to correct, another issue was discovered, this time with the constant number dictionary where its additional information consisted of a double value and an integer value contained in a structure (see post from October 6). Each double value was aligned on a double boundary (eight bytes) and because an integer is half of a double (four bytes), four bytes of padding is inserted by the compiler between each element in the vector (wasted memory).

The only way to correct this is to separate the two sets of values by having a double value vector and an integer value vector. Unfortunately, the information dictionary template class only allows for a single information vector. A new design is needed for information dictionaries.

Program – Dereferencing Replaced Lines

When a line is replaced, references to dictionary entries in the old line must be removed. This was taking place after the replacement line was encoded. When a dictionary entry is dereferenced, the reference may no longer be used causing the dictionary entry to be made available for another entry. The new line may add new dictionary entries, but with the encode before the dereferencing, the new entry will be added to the end of the dictionary if there are no free slots.

It is desirable for new dictionary entries to use slots that may be freed with the old line being replaced. This will help the dictionary from growing larger then it needs to be. Therefore, the dereference call was moved to before the encode call. With this change, the results for encoder test #2 changed slightly, but only with respect to indexes of a couple of dictionary entries.

A previously undiscovered memory error was reported on encoder test #2 when running the memory test script. The problem occurred in the constant string dictionary with the allocation of the string pointers for the QString instances. While investigating this problem, another issue was discovered in the constant number dictionary, though this issue is much less serious and only results in wasted memory. The conclusion was that the information dictionary class (currently defined as a template) needs to be redesigned.

[commit f284a33ac8]

Tuesday, November 26, 2013

Program – Saved RPN Lists (Removal)

The translated RPN lists for program lines were being saved in the line information list, which also contains the offset of the line within the program code, the size of the line and the index to the error list if the line has an error. These lists were originally used as the source of the data for the program view and were also used to detect when lines changed (by comparing the translated RPN list of the new line with the saved RPN list).

The use of the RPN lists for the program view was removed when the program code array was implemented. With the decode routine, the program code of the lines can now be converted to an RPN list. The line change detection in the update line routine was modified to decode the code of the line for the new line being changed to an RPN list, which is compared to the translated RPN list of the new line. With the RPN lists in the line information list no longer being used, this variable was removed along with all references to it.

A problem was found when RPN lists were compared. The strings of tokens other than REM commands, REM operators and string constants (whose strings comparison must be case sensitive) should use a case insensitive string comparison. However, the case sensitivity argument was not supplied to the compare call, so the default comparison used was case sensitive. The correct argument was added.

[commit dd3e4b62ed]

Sunday, November 24, 2013

Program – Decoder

Before the internal code of a program line can be recreated, the program code needs to be decoded into an RPN list. Like the encoder, which is part of the program model class because it needs access to the dictionaries, the decoder will also be part of the program model.

The decode routine is given the line information of the line to decode containing the offset of the line within the program code and its size. A new RPN list is created and for each program word in the line, a new token is created and assigned the code and sub-code of the program word. If the code has an operand text function in its table entry (implying the code has an operand word), the operand text function is called to get the text for the token from the operand, which is assigned to the string of the token. The token is added to the RPN list. After all the words of the line are processed, a pointer to the RPN list is returned.

Like the encode routine, the decode routine is a private function within the program model class. To access recreated lines of the program, a new line text routine was added. This routine is given the index to the line and starts be retrieving the information for the line, which is passed to the decode routine. The pointer to the RPN list returned is passed to the recreate routine of the recreator instance (which was added to the program model class). The RPN list is deleted and the string returned from the recreate routine is returned.

The temporary check to prevent encoder test files from being used with the recreate output option (-to) was removed from the tester class. In the tester run routine for encoder test files after outputting the code of the program and the dictionary entries, if the recreate output option was selected, each line of the program is output using the line text routine.

The expected results files for the three encoder tests were created from the encoder test results files with the output of the program added to the end. All the encoder tests are recreated correctly. The test script and batch files were updated to also test the encoder test files with the recreate output option.

[commit 1f24a70152]

Saturday, November 23, 2013

Recreator – RPN Lists (Tagged)

The recreator is now complete and all of the initial set of commands (LET, PRINT, INPUT, and REM) are fully supported. The repository has been tagged v0.6.1 to mark this milestone. Some minor issues were also corrected for this commit including:

Corrected an expected results for the recreated translator test #12 (INPUT tests).
Reorganized the access functions in the recreator class.
Renamed the recreator class is empty access function to output is empty so as not to be confused with the stack is empty function.
Renamed the recreator class last access function to the more explicit output last character and removed the output string is empty check.
Corrected some comment formatting and added some missing comments.
Corrected formatting issues where spaces were incorrectly added instead of tab characters (problems are caused by QtCreator editor bugs where is doesn't always pay attention to the spacing/tab settings).
Added some missing FLAG option comments (details below).

The next major step is to integrate the recreator with the GUI, but in order to do this, the program code of lines needs to be decoded into an RPN list of tokens for the recreator. When a line is entered into the program, it will be recreated and the recreated text will replaced the entered text in the edit box. Preferences will be added to control the formatting of the recreated output, for example, if spaces should be added around operators, after commas, etc. The FLAG option comments show where checks for these options will be.

[commit 0cd7b84700]

Recreator – Colons

Colons between statements are handled by setting the colon sub-code of the last token of a statement. To support the recreation of colons, a check was added to the main recreator loop after the check for the parentheses sub-code that if the colon sub-code is set, a colon and a space is added to the output string.

This change caused a slight problem with the recreation of the remark operator if there is a colon just before the remark. Two spaces were being added in front of the "'" operator. To prevent this, a check was added after a non-empty output string check to also not add a space if the last character in the output string is already a space.

A new last access function was added to the recreator class that returns the last character in the output string or a null character if the output string is empty. The expected recreated outputs for translator test #16 (colon tests) were updated and are recreated correctly.

[commit fc90cf0e32]

Recreator – Remarks

There are two codes for remarks, the command code (for the REM command) and the operator code (for the "'" operator). The token contains the string of the remark. Generally, the keyword (REM or "'") along with the remark string is added to the output string. However, there were two issues to be handled.

Unlike all the other commands, no space is required after the REM command when followed by a letter, so a statement like "REMARK A Comment" is valid. The issue is for a REM statement entered as "remark a comment" in all lower case. When recreated, the result would have been the "REMark a comment" statement. So, if the first character of the remark string is lower case, the REM keyword is converted to lower case. This still won't work if something like "Remark A Comment" is entered.

The second issue involves the remark operator. A space is needed before the "'" operator if the command is not at the beginning of the line to provide some separation between the previous statement. To determine if the remark is at the beginning of the line, an is empty access function was added to the recreator class that returns whether the output string is empty (which implies this the beginning of the line if nothing was added yet).

A single rem recreate function was added to handle both remark codes and a pointer to this function was added to the remark code table entries. To test the lower case check, an all lower case remark statement was added to translator test #15 (REM tests). The expected recreated outputs for this test were updated and are recreated correctly.

[commit 03c73c3de7]

Friday, November 22, 2013

Recreator – INPUT Statements

Like PRINT statements, INPUT statements contain several codes including the input parse item (double, integer or string), input begin, input begin with string prompt, input assign reference (double, integer or string), and the input or input prompt command code. The input begin with string prompt and input command codes could also have the option sub-code set. The separator of the recreator class will be used between these codes to keep track of the separators between the input reference items.

As the codes of the INPUT statement are processed, the resulting INPUT statement is built up by adding to the string on top of the holding stack. The string of the current reference is added to the string being built with a separator in between. After each reference or prompt string, the separator is set to a comma (or a semicolon after the prompt string without the option sub-code). At the end of the statement, the INPUT or INPUT PROMPT keyword is added to the output string along with the built up string that is popped from the stack.

Implementation

The input begin string code follows the input prompt string expression, the string of which will be on top of the stack. The input prompt begin recreate function for the input begin string code sets the separator to a comma if the option sub-code is set, otherwise sets it to a semicolon.

The input assign recreate function contains a local string. If the separator is set, the string is set to the string of the reference that is popped from the stack. The separator is added to the string on top of the stack followed by the string of the reference. The separator is not set for the first reference of an INPUT statement, so no action is needed (the string of the reference is left on the stack). The separator is set to a comma for the next reference.

The input recreate function for both the INPUT and INPUT PROMPT code adds the command keyword to the output string. A space is added followed by the built up string that is popped from the stack. If the command code has the option sub-code set (for keeping cursor on the same line), a semicolon is added to the output string. The separator is cleared for the next statement.

The table class name access function with token pointer argument was modified to return the full name of the code. This includes a space and the second word of a code that has the two word option set along with a second name. This was needed for the INPUT PROMPT command that contains two words.

Pointers to the new input recreate functions were added to the table entries of the various input codes. The input parse item and input begin codes do not produce anything during recreation and their table entries were set to the pointer of the blank recreate function. The expected recreated outputs for translator test #12 (INPUT statements) were updated and are recreated correctly.

[commit 9c8631000f]

Saturday, November 16, 2013

Recreator – Unary Operator Problems

Many of the other translator tests were being recreated correctly including tests #7 (errors), #8 (more errors), #9 (semicolon errors), #10 (expression errors), #11 (temporary errors), and #14 (parser errors) once the expected results were updated as these did not have the not yet implemented INPUT and REM statements or colons. However, tests #13 (negative constants) and #17 (constants) were not recreated correctly due to problems involving unary operators.

A problem occurred when a negate unary operator preceded a numeric constant. When created, there was no space between the negate operator and the number. If this statement is translated again, the negate operator and the number become a negative constant, which is not the same, through technically equivalent. This will cause the line change detection in the program model to incorrectly detect a change. The unary operator recreate function was modified to also add a space after the unary operator if the operand begins with a digit or decimal point.

A problem occurred with the negate integer operator. The precedence of the negate integer was incorrectly set to 40 causing parentheses to be added incorrectly during recreation. The precedence should have been 48, the same as the negate double operator, so the table entry for was corrected.

A problem occurred when a unary operator followed a power (exponential) operator, a higher precedence operator. The operand was incorrectly surrounded with parentheses. This binary operator recreate function was modified to also check if the second operand is a unary operator then the operand is not surrounded by parentheses.

These corrections allowed tests #13 and #17 to be recreated correctly. The regression test script was also modified to not ignore white space when comparing to the expected results. Without this change, the first problem above was not detected. The memory test was already not ignoring white space. I'm not sure what the reason was for making the regression test ignore white space.

[commit cf67d09f36]

Recreator – PRINT Statements

There are several codes that make up a PRINT statement including print item (double, integer or string), comma, print function (TAB and SPC), semicolon (only at the end of the statement) and the print command. A recreate function was implemented for each of these codes. Because the PRINT statement is composed of several codes, the separator member variable of the recreator instance is used between the processing of these codes to keep track of separators between the print items.

As the codes of the PRINT statement are processed, the resulting PRINT statement is built up by adding to the string on top of the holding stack. Generally, the string for the current item is popped from the stack, a separator is added to the string on top of the stack, which contains previous items and the string of the current item is added to the string of the previous items that is on top of the stack. At the end of the statement, the PRINT keyword is added to the output string along with the built up string of the print items and separators that is popped from the stack.

Implementation

The print item recreate function contains a local string variable. If the separator is set from a previous print code, the string is set to it. If this separator is not a space, then a space is added after the separator. The separator is a space if the last print code was a comma (see below). The string on top of the stack is popped and added to the string. If the stack is now empty, then the string is pushed to the stack, otherwise the string is added to the string on top of the stack. The separator is set to a semicolon for the next item if there is one.

Spaces are normally added after a comma like a semicolon, but spaces are not added between multiple commas. The print comma recreate function contains a local string variable. If the holding stack is not empty, the string on top of the stack is popped into the local string. A comma is added to the string (which is empty if the stack was empty, like when there is a comma directly after the PRINT keyword). The local string is pushed to the stack. The separator is set to a space. A space is only added after the last consecutive comma.

The print function recreate function first calls the internal function recreate function (since print functions are translated the same as other internal functions), which will process the print function and its operand and leave the result on top of the stack. The print item recreate function is called to process the print function like any other item.

The semicolon code is only found at the end of a PRINT statement and replaces the print command code. The print semicolon recreate function pops the string from the holding stack, adds a semicolon, and pushes it back to the stack. The print recreate function is called to complete the PRINT statement.

The print recreate function adds the PRINT keyword to the output string. If the holding stack is not empty, a space is added to the output string, and the string is popped from the stack and added to the output string.

A new separator is set access function was added to the recreator to a specific character, which is used by the print item recreate function. Pointers to the new print recreate functions were added to the table entries of the various codes. The expected recreated outputs for translator test #6 (PRINT statements) were updated and now recreated correctly.

[commit 1d3a05f610]

Recreator – Interactive Testing

Up to now, the only way to test the recreator was by using the expression and translator test files with the batch test mode. An interactive recreator mode was not implemented since the recreator only supported expressions and creating separate modes for both expressions and commands was unnecessary. With support for commands (just assignments at the moment), an interactive mode for the recreator could be added.

Before implementing the interactive recreator mode, the translator, program unit and recreator instances (which were local variable in the tester class run routine), were changed to the member variables of the tester class. As local variables, it was necessary to pass references to them between the various tester routines, which defeated the purpose of having a class. The output stream is now given to the tester constructor, which is stored in a member variable so that it can be shared by all the class routines, instead of being an argument to the run routine and passed to the other routines.

The tester class was modified to support the new interactive recreator test mode, which is activated with the new "-tr" command line option.

The translate input routine was modified to accept a header string for the list of translated token output. If this header string is not used, then "Output:" is used as before. This routine was also modified to return the pointer to the RPN list if the header string is used. Otherwise, the RPN list is deleted as before and a null pointer is returned, which is also returned when the input line has an error.

The new recreate input routine was added, which starts be calling the translate input routine with the header set to the "Token:" string. If an RPN list is returned (no error detected), the RPN list is recreated and deleted. The recreated output is prefixed with the "Output:" string as its header.

[commit 2dc4b17e97] [commit 0ccfb105e7]

Monday, November 11, 2013

Recreator – Sub-String Assignments

Translated sub-string assignments (previously described on July 14) consist of the string reference, followed by the expressions of the arguments of the sub-string function with the assign sub-string code. This is the same form as when these functions appear in expressions, so sub-strings can be recreated the same way. This is not the case with multiple sub-string assignments where the sub-string assign codes are all at the end of the translated statement with special assign keep codes except for the final code.

At run time, after popping the value to assign and assignment the value, the assign keep codes push the value back to the stack for the next assignment. The final non-keep assignment just pops the value off of the stack. The recreation of multiple sub-string assignments works similarly. The assign keep code will pop the recreated string of the value expression to assign. A sub-string assign keep code recreates the sub-string as a function and push the result back to the stack. For all assign keep codes, the reference string is popped, the current separator is added followed by the value string. This string is pushed back to the stack.

For the final non-keep assign code, the LET keyword is appended to the output string if the assign code has the option sub-code. The value string is popped from the stack, which will also contain the other references in a multiple assignment statement. The string of the final reference is popped and appended to the output string. Finally the value string is appended to the output string.

Implementation

A separator character member variable was added to the recreator class with access functions for getting the separator character, checking if it is set (not the null character), clearing it (setting it to the null character), and setting it.

The new assign string recreate function contains a local string variable and starts by checking if the separator character is not set. An unset separator indicates the first and maybe only reference. For the first reference, the string is set to the assignment operator (an equal) surrounded by spaces and the separator is set to a comma. If the separator is set, then the string is set to it (which is a comma) plus a space. The string on top of the holding stack is popped and appended to the string.

If the assignment token has the sub-string flag set, then the code of the original sub-string function is obtained. This was accomplished by making the original sub-string function code the second associated code in all the table entries of the assign keep and non-keep sub-string codes. The name and the operand count is obtained for the original sub-string function code and the push with operands routine is called to process the arguments of the sub-string function. The resulting string is left on top of the holding stack.

If the assignment code is a keep code (determined if the second associated code index is zero, which is only the case for the keep codes), the value string with the separator and other references prefixed is appended to the string on top of the holding stack. Otherwise the assignment code is the end of the statement. The let recreate function is called to append the LET keyword if the option sub-code is set. The last reference is popped from the holding stack, appended to the output string, and the string with the rest of the references, separators and value is appended to the output string. The separator is cleared for the next command.

All of the translator assignment tests (#1 through #5) are all now recreated corrected except for a single PRINT statement in test #5 (since the PRINT recreation has not been implemented yet).

[commit b1c7dc5bce]

Sunday, November 10, 2013

Recreator – Assignments

Translated assignment statements consists of one (single assignment) or more (list assignment) variables or arrays (references) followed by the expression of the value to assign with an assignment code at the end, which may have the option sub-code indicating the optional LET keyword was entered at the beginning of the statement. When these statement are being recreated upon reaching the assignment code at the end, the strings of the references will be on the string holding stack (the first or only variable at the bottom) along with the string of the value expression (at the top of the stack).

The new assign recreate function handles the assign double, assign integer, assign double list, assign integer list and assign string list codes. The assign string code could be at the end of a mixed sub-string assignment statement and these will be handled differently (see July 14). The strings on the holding stack need to be appended to the output string in reverse order with the necessary separators between them (a comma between the references when there are more than one, and an equal between the last or only reference and the value expression).

To reverse the strings, the assign recreate function contains a local string stack and starts by popping the value expression string from the holding stack and pushing it to this local stack. A local separator string is set to the assignment operator (an equal) with surrounding spaces. The routine loops until the holding stack is empty. For each reference, the separator string is appended and the popped string is pushed to the local stack. The separator string is set to the comma with a space for the next reference.

A new let recreator support function was also added that checks if the assignment token has the option sub-code set and appends the LET keyword with a space to the output string if it does. This function will also be used for multiple mixed sub-string assignment statements. The assign recreate function calls this function and then loops until the local stack is empty appending each string popped to the output string.

A new recreator stack is empty access function was needed to determine if the holding stack is empty. This was implemented in the recreator class header file since it was only one line of code. Since the recreator top, top append, and append access functions are also only one line of code, their definitions were also moved to the header file. Finally, the pointer to the assign recreate function was added to the table entries of the above mentioned assignment codes.

The results for translator tests #1 to #5 (various assignment tests) were changed to the expected results (inputs with spaces added and upper case keywords). The statements in tests #1 (assignment tests) and #3 (data type assignments) are recreated correctly. Many of the statements in the other tests are also recreated correctly, but the ones that are not contain string and sub-string list assignments yet to be implemented.

[commit 97c666a9d3]

Saturday, November 9, 2013

Recreator – Expressions (Tagged)

Expressions are now fully supported by the recreator. All the expressions test recreate the original source correctly. The next step is to add support for the first round of commands (LET, PRINT, INPUT and REM). The first tag of the 0.6 development series was added, v0.6.0.

[commit 0dc1f0ebf5]

Recreator – Arrays and Functions

Supporting arrays and functions in the recreator is only necessary for the translator tests. So that arrays and functions are recreated properly for these tests, preliminary recreate functions were created for arrays, defined functions and user functions.

The array and function recreate functions get the name from the string of the token. An open parentheses is added to the name since it is not stored in the token. The name and the attached count in the RPN item is passed to the push with operands routine. The define function recreate function handles both define function with and without tokens, so an open parentheses is only added to the name if the attached count is not zero. Pointers to these functions were added to the preliminary table entries for the codes of these tokens.

All the expressions in expression test #3 (parenthetical tokens) now recreate the correct output.

[commit b6e8973352]

Translator – Arrays and Functions

A problem with recreating arrays and functions (user and define) is that the translator was not setting a code in these tokens. Preliminary table entries were added for the codes needed. These codes include an array, a define function with no parentheses, a define function with parentheses and a function. There will eventually be codes for each data type, but these four codes are sufficient for now.

In the get operand routine, for the define function with and with no parentheses token types, the appropriate token code is assigned. Since there is only a single code for each of these, the data type of the token is preserved because the table set token routine sets the data type of the token to the one in the table entry for the code.

The process parentheses token routine was modified to set the code of the token to the array code or the function code if the identifier starts with an 'F' character. This routine also processes define functions with parentheses tokens, so the code is not set for these tokens. Later in the routine when the RPN item is created, the attached count is no longer set to zero for array tokens.

The attached count of the RPN item can no longer be used to detect if there are attached tokens. The attached array pointer will instead be used for this detection by checking if the pointer is set to null. The RPN item text routine was updated to use the attached array pointer to detect attached tokens instead of the attached count. This routine was also changed to use member variables directly instead of using the access functions.

[commit e1919952fa]

Recreator – Internal Functions

The arguments (operands) of internal functions precede the code of the internal functions, so the strings of the operands will be on the string holding stack in reverse order when the internal function token is processed. The arguments of define and user functions work the same way as do the subscripts of arrays.

A generic push with operands routine was implemented taking the name string of the function or array and the count of operands, and contains a local string stack and a separator string initialized to a closing parentheses (the string on top of the stack will be the last operand). Looping to the count, the string of an operand is popped from the holding stack, the separator string is appended, this string is pushed to the local stack and the separator string is set to a comma and a space for the next operand (if there is one). The strings are popped from the local stack until empty and appended to the name. The name is pushed to the holding stack.

The push with operands routine also works with functions with no arguments (nothing is pushed to the local stack when the count is zero, so only the name is pushed to the holding stack). An internal function recreate function was added that gets the name and the operand count of the internal function code from the table and calls the push with operands routine. A pointer to this function was added to all of the internal function code table entries.

While trying expression test #3 (parenthetical tokens), there were blank invalid tokens. These occurred from the hidden conversion codes that were present in the first test expression. A blank recreate function was added that does nothing. A pointer to this function was added to the hidden integer and double conversion codes. Eventually, all codes will have a recreate function.

The expected results for expression test #3 and test #4 (internal functions) were set to the correct expected results (with appropriate spacing). All of the expressions in test #4 produce the correct output. Some of the expressions in test #3 do not produce the correct output because they contain arrays, define functions or user functions, which are not being handled yet.

[commit ac9ab4774d]

Recreator – Operand Recreate Function

The method of using an unset (null) recreate function pointer to indicate an operand is problematic during development. It is also possible that the recreate function for code is not yet implemented. Further, using a blank token string to indicate an unimplemented code is also a problem since this also could mean that the code does not produce any output (take for example the hidden conversion codes).

Therefore, an operand recreate function was implemented, which simply pushes the string of the token to the string holding stack. A pointer to this function was added to the constant and variable code table entries. The main recreate routine was modified to surround the string of a token code with no recreate function with question marks. A check was also needed for whether the code in the token is valid before retrieving its recreate function.

[commit 9734165b8b]

Recreator – Error Checks / Expression Mode

Some error checks were added to the recreator along with an expression test mode. The first check is to make sure the string holding stack is empty upon returning from the main recreate routine. Any items left on the stack are popped and appended to the output string prefixed by "NotEmpty" to indicate an error before returning.

However, the expression test mode left the resulting string on the holding stack. Therefore, a check was required for the expression test mode. An expression mode flag argument was added to the recreate routine. When set, the string on top of the holding stack is popped and appended to the output string. This is followed by the check for an empty stack.

An error check was also added to the pop routine to make sure the holding stack is not empty. If the stack is empty, the "<Empty>" string is returned to indicate an error.

The tester translate input routine was modified to pass the expression mode flag to the recreate routine and to just use the output string returned (it is no longer necessary to pop from the holding stack of the recreator for expression mode).

[commit 2980b89a24]

Thursday, November 7, 2013

Recreator – Unnecessary Parentheses

Parentheses in an expression control order of evaluation of the expression and are removed during the translation process. The parentheses are recreated by looking at the precedences of the operators. However, parentheses can be added to an expression that are not required, take the expression A+(B*C). The multiply has higher precedence than add, so the translation becomes A B C * +, which is the same as the expression A+B*C.

So that these unnecessary parentheses are recreated, the translator adds the parentheses sub-code of tokens. The above example expression is translated as A B C *')' +. The translator also allows for extra sets of parentheses; the expression A+((B*C)) is translated to A B C *')' ) +. A closing parentheses code is added when the previous token already has the parentheses sub-code. For a third pair of parentheses, the closing parentheses token gets the parentheses sub-code. And so on. At run-time, parentheses codes and sub-codes are ignored.

Implementation

A new parentheses recreate function was added, which calls the pop with parentheses function with true as the argument to add parentheses when popping the top string. The pop with parentheses was modified to optionally return the precedence value and unary operator flag from the item popped from the holding stack. The string with parentheses along with the precedence value and unary operator flag are then pushed back to the holding stack.

In the recreate routine after processing an item in the RPN list, a check was added for the parentheses sub-code and if set, calls the parentheses recreate function. The parentheses recreate function was also added to the table entry of the closing parentheses code.

During the testing these changes, a problem was found in the translator where the parentheses sub-code was not being added to a closing parentheses token (it incorrectly added another closing parentheses token to the output list). This was due to translator check pending parentheses routine checking if the top item of the done stack (the last item added to the output list) to see if it already had a parentheses sub-code. It should have been checking the last item in the output list instead (the closing parentheses tokens are not pushed to the done stack) .

A few additional expressions were added to expression test #2 (parentheses tests) for testing extra parentheses. Both the expected translated and recreated results were updated. All of the expressions in test #2 are now being recreated correctly.

[commit 385cc5925d]

Wednesday, November 6, 2013

Recreator – Parentheses (Unary Operators)

Handling the recreation of parentheses with unary operators is similar to that of binary operators except there is only one operand with another issue. As with binary operators, if the precedence of the unary operator is higher than the operand on top of the stack, parentheses should be added around the operand. However, parentheses should only be added if the operand on top of the holding stack is not another unary operator. Consider the expression and its translation:

-NOT A% A% NOT Neg%

When the Neg% operator is being processed, its precedence is higher than the NOT on top of the stack (actually the top contains the string "NOT A%" with the precedence of the NOT operator). With just the precedence check, parentheses would be added around the "NOT A%" expression, which is not correct. There needs to be an additional check to not add parentheses if the top item is a unary operator expression.

Implementation

A unary operator flag variable was added to the stack item to indicate if the holding stack item is a unary operator sub-expression. The push access function was modified to take an optional unary operator flag value that is pushed with the string and precedence. The default flag is set to false and is only set to true by the unary operator recreate function.

The unary operator recreate function was modified to get the precedence of the operator being processed from the table. The pop call of the operand was changed to call the new pop with parentheses with the argument set to whether the top stack item is not a unary operator and the operator precedence is higher than the precedence of the top stack item.

There were insufficient expressions in expression test #2 (parenthetical expressions) to test the various situations with unary operators, so several were added. The translated test expected results were also updated for the new expressions. The expressions that still do not match are due to unnecessary entered parentheses.

[commit a5951ecfc3]

Tuesday, November 5, 2013

Recreator – Parentheses (Binary Operators)

Parentheses are removed from expressions during translation. The binary operator recreate function needs to recreate the parentheses for operators depending on their precedences. Consider these expressions with their translations:

A * B + C * D (A + B) * (C + D)
A B * C D * + A B + C D + *

The translations of these two expressions have a similar form. The first expression will be recreated correctly since multiply is higher precedence than add. However the second expression without parentheses "A + B * C + D" does not mean the same thing since add is lower precedence than multiply making the parentheses required.

Parentheses are required around an operand if the precedence of the operator is higher than the operand. Parentheses are also required around the second operand if the precedence of the operator is the same as the operand since operators of the same precedence are processed from left to right.

Implementation

A precedence variable was added the stack item to hold the precedence of an operator sub-expression. The push access function was modified to take an optional precedence value that is pushed with the string. The default precedence is set to the highest precedence for when operands like constants and variables are pushed. A top access function was added so that the item on top of the string holding stack can be accessed.
A new pop with parentheses access function was added that takes a flag argument, which when set will surround the string operand popped from the string holding stack with parentheses when set.

The binary operator recreate function was modified to get the precedence of the operator being processed from the table. The pop call of the second operand was replaced with a call to the new pop with parentheses with the argument set to whether the operator precedence is higher than or the same as the precedence of the item on top of the stack. Similarly the first operand is popped with parentheses if the operator precedence is higher than the top item.

Instead of using the top append access function, the string of the operator expression is built in a local string. The string is first set to the second operand with parentheses if needed. The string is then set to first operand with parentheses if needed, plus a space, plus the operator name, plus another space plus the current value of the string with the second operand. Finally this string is pushed to the string holding stack with the precedence of the operator.

The expected outputs for expression test #2 (parenthetical expressions) were set to the inputs. Many of these expressions match the inputs since the precedence for binary operator is now being handled. The expressions that don't match involve unary operators (no precedence checking yet) and unnecessary entered parentheses.

[commit 694e224aee]

Sunday, November 3, 2013

Recreator – Simple Expressions (Implementation)

The recreator is a separate new class since the RPN list will already have been decoded from the program model (with dictionary lookups to change indexes back into there original names), so there is no need to access neither the program model or the dictionaries. The constructor does nothing more than set the member table instance reference. There are only two additional member variables, the string holding stack and the output string.

The output string will contain the recreated text of a program line and will be appended to as the RPN list is processed. Since the various recreate functions are outside of the recreator class, there is a single append access function for appending a string to the output string.

The holding stack temporarily contains strings as the RPN list is processed and is used to reverse the RPN list. It is defined as a QStack of the StackItem structure, which contains a string. The holding stack was not defined directly as a stack of strings because an additional item will be needed. There are several access functions including a push function to push a string onto the stack, a pop function to pop a string from the stack, and a top append function to append a string to the top string on the stack.

Besides the class member functions, the recreator source file also contains several general recreate functions which are outside the class so that their pointer can be put into in the table entries. These include the unary operator and binary operator recreate functions, which work as described in the previous post.

A recreate function type was added to the table entry structure with an access function. The recreate functions arguments include a reference to the recreator instance and a pointer to the RPN item from the RPN list. Recreate functions were added to all of the operator, constant, and variable codes. A constant string recreate function was implemented for the constant string code that adds the necessary double quotes.

A recreator instance reference argument was added to the tester class translate input routine. For expressions, the recreator does not return an output string (there is no command to pop the string of the final expression from the holding stack and append it to the output string. Instead of adding a special expression mode to the recreator to do this, and since the pop access function is public (for the recreate functions), the translate input routine gets the output string using this function.

The expressions in expression test #1 (simple expressions) are properly recreated and match the inputs except expectedly for spacing and case (lower case word operators are output in upper case). While the other expressions produce recreated output, the output is not correct because parentheses, internal functions, etc. are not yet supported.

[commit 4bdd513c2c]

Saturday, November 2, 2013

Recreator – Simple Expressions

The recreator will work similar to the run-time module. The run-time module will push operand values to a value stack to be popped by operators, functions and commands to perform some operation, which may push results back to the stack. The recreator will push the strings of the operands to a string holding stack. These strings will be popped by operators, functions and commands, which may push modified strings back to the stack or append the strings to the output string. This will make more sense by considering this statement and its translation:

A + B
A B +

At the A operand, the string "A" will be pushed to the holding stack. Same for the B operand with the "B" string being pushed. The plus operator will pop the second operand, the "B" string, from the stack. The string for the operator will be appended with surrounding spaces to the operand now on top of the stack, specifically the " + " string gets append to the "A" string leaving "A + " on top of the stack. The previously popped second operand is then appended to the top of the stack leaving the "A + B" string on top of the stack, which may be an operand for the next operator.

For unary operators, the string for the operator is created. A separator space is appended only for a word operator like NOT, otherwise no space is needed (for example the "-" unary operator). The string of the single operand on top of the stack is popped and appended to the operator string. The resulting string is pushed back to the stack.

For simple expressions, only support for operands (variables and constants) and operators are needed. For string constants some processing is needed: all double quotes in the string of the constant need to be changed to two doubles quotes, and the entire string needs to be surrounded by a pair of double quotes.

The table entry for each code will have a pointer to a recreate function. Since operands already have their string in the token of the item, this string just needs to be pushed to the string holding stack (with the exception of string constants). Since this is trivial, a null pointer will indicate that the string of the token should be pushed to the string holding stack. For string constants, there will be a constant string recreate function. Since all binary operators work the same, there only needs to be a single binary operator recreate function. Likewise for the unary operators.

The recreate process will loop through each item in the RPN list. If there is no recreate function, the string of the token is pushed onto the string holding stack. Otherwise the recreate function is called to process the token. At the end of the RPN list, the resulting output string is returned. For expression mode, nothing gets appended to the output list, so the tester class will need to pop the result string of the expression from the string holding stack.

Table – Simplified Instance Creation

The new recreator class will also require a reference to the single table instance like the other classes (parser, translator, and program model). The initialization of the table reference members of these classes was not consistent - some required the table instance reference to be passed to the constructor and others accessed the static table instance directly to initialize their member. The single table instance creation was also convoluted with several routines:

initialize - static table function to be called once (from main) to create the single instance and it made sure it was not called more than once or if the table entries had errors

has errors - static table function to return if the table has errors, used by the tester class to determine if the table entries had errors, outputs them and aborted the application

error list - static table function to return the list of error, used by the test class to output the table entry errors before aborting the application

setup and check - static table function called by initialize after creating the single table instance to setup and check the table entries

instance - static table function to return a reference to the single table instance, contains checks to make sure that initialize had been called and that the table entries had no errors; and for these errors would abort the application

The table constructor had an argument for the pointer to the table entry array and only set the table entry member. Most of these routines were unnecessary and the single table instance creation was modified for these routines:

instance - static table function to return the single table instance, and will create the single table instance upon the first call

constructor - modified to take both the pointer to the table entry array and the count of table entries, now does the setup and check functionality, and if there are errors, they are reported and the application is aborted

Now if there are table entry errors, the constructor aborts the application and no other class needs to check for table errors. This check was removed from the tester class. Previously when starting in GUI mode, there was no check for table errors. There is no longer a need to initialize the table in the main function or check to make sure it has been initialized.

The method the program model uses to initialize its table instance reference is the preferred method. The instance function is used to initialize the table instance reference member. The parser and translator classes were modified to use this method.

[commit 7a01ac428b]

Friday, November 1, 2013

Recreator – Testing

The recreator will take an RPN list as input. Consideration was given to how the recreator will be tested. There are already many tests for testing the translation of expressions and statements into RPN lists (the expression and translator test input files). Since these are available, they will also be used for testing the recreator.

To avoid having to duplicate these into "recreator" test files, they will be used as is with a new "-to" command line option to activate the recreator on the translator output. When this option is used instead of the "-t" option, the expression or translator input file will be translated as before, and then the RPN list will be passed to the recreator.

Since the recreator has not been implemented yet, to verify this code is working correctly, the same RPN text output is used, but prefixed with the "TEST:" string. Encoder test input files are not supported, which will be added once the recreator is working with the expression and translator tests.

All of the test scripts and Windows batch file were updated to run all the expression and translator test files with the new "-to" test option. Recreator output files have the ".out" extension to not conflict with the ".txt" extension used for the other test output files. Recreator development will now commence.

[commit 9b52fc6d83]

Saturday, November 30, 2013

Friday, November 29, 2013

Tuesday, November 26, 2013

Sunday, November 24, 2013

Saturday, November 23, 2013

Friday, November 22, 2013

Implementation

Saturday, November 16, 2013

Implementation

Monday, November 11, 2013

Implementation

Sunday, November 10, 2013

Saturday, November 9, 2013

Thursday, November 7, 2013

Implementation

Wednesday, November 6, 2013

Implementation

Tuesday, November 5, 2013

Implementation

Sunday, November 3, 2013

Saturday, November 2, 2013

Friday, November 1, 2013

Email

Source and Downloads

Labels

Blog Archive