Interactive BASIC Compiler Project

Tuesday, December 10, 2013

Edit Box – Program Unit Access

The edit box class needs access to the program model, specifically to the program unit currently opened in the instance of an edit box. Eventually, when subroutines and functions are implemented, any one of them could be opened in its own edit box, so several edit box instances could be open at any given time. Right now there is just a single program unit, the main routine, which will be opened in a single edit box instance.

To allow an edit box instance access to its program unit, a new pointer to a program unit instance (program model class) was added to the constructor of the edit box class. This pointer is stored in a new program unit pointer member variable. The connection of the line changes signal (from the edit box class), and the error list changed signal (from the program unit) are now made in the constructor of the edit box instead of the constructor of the main window class.

The constructor of the main window class was modified to create the program model first (the program unit for the main routine) before creating the edit box instance, which now requires the pointer to the program unit instance that it will be editing.

[commit aa2125609a]

Saturday, December 7, 2013

Program (Recreator) and GUI Integration

The recreator is fully integrated with the program model such that program lines can be converted back into text. When lines are entered into the program, the lines need to be recreated back to text and put into the edit box (the GUI), specifically into the text document of the edit box. Just as the temporary program view (being used for debugging) is the viewer of the data held by the program model, the edit box is the viewer of the data contained in the document. The edit box also allows editing, so it is more than just a viewer.

The document of the edit box is really just the text representation of the program. The program model holds the actual data of the program. Ideally, the program model would be the document of the edit box and it would convert text to program code and back while editing. However, Qt does not have an abstract text document class from which a document sub-class could be built that would hold its data in another form like program code. The QTextDocument class is meant for text.

Alternatively, a new viewer could be designed that would allow all the text editing features (cut, copy, paste, undo, redo, etc.) like the QPlainTextEdit class that the edit box class is based on. Designing one would be quite an effort. Therefore, the edit box will be the viewer for two data models at the same time, the text document (to allow text editing) and the program model (for holding the program code). The program model will be the master of the data, with the text document being updated as the program changes.

This implies that the edit box must either own the program model with the program code or at least have easy access to it, such as via a pointer. The latter approach will be used since the main window class will ultimately be the owner of the program. Eventually there will be a list of program models, one for the main routine and several for the subroutines and functions of the program. There will only be associated edit box instances when the main routine, subroutines or functions are open for editing.

Since the edit box will now have access to the program model, signals from the program model (for program changes) do not need to contain actual data. For instance, when a program line has changed, its recreated text is needed to update the text document. The signal could contain both the line number and text (already recreated). Looking at the edit box to document interface, when the document changes, only the position, number of characters removed and inserted are contained in the signal. The edit box must obtain the actual text changes by querying the document. So, when the program changes, only the line number will be sent and the edit box will request the recreated text from the program model.
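The notification scheme described above can be sketched in plain C++, with a std::function callback standing in for the Qt signal and slot connection. All of the names here (ProgramModel, EditBox, lineChanged, lineText) are hypothetical, and the model simply stores text instead of encoding and recreating it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the program model. The callback plays the role
// of the signal and carries only the number of the line that changed.
class ProgramModel
{
public:
    std::function<void(std::size_t)> lineChanged;

    void setLine(std::size_t index, const std::string &text)
    {
        if (index >= m_lines.size()) {
            m_lines.resize(index + 1);
        }
        m_lines[index] = text;  // the real model would encode the line here
        if (lineChanged) {
            lineChanged(index);
        }
    }

    // The real model would decode and recreate the line; this just returns it.
    std::string lineText(std::size_t index) const
    {
        assert(index < m_lines.size());
        return m_lines[index];
    }

private:
    std::vector<std::string> m_lines;
};

// Hypothetical stand-in for the edit box and its text document.
class EditBox
{
public:
    explicit EditBox(ProgramModel *programUnit) : m_programUnit(programUnit)
    {
        m_programUnit->lineChanged = [this](std::size_t index) {
            if (index >= m_document.size()) {
                m_document.resize(index + 1);
            }
            // only the line number arrived, so ask the model for the text
            m_document[index] = m_programUnit->lineText(index);
        };
    }
    EditBox(const EditBox &) = delete;  // the callback captures this
    EditBox &operator=(const EditBox &) = delete;

    const std::vector<std::string> &document() const { return m_document; }

private:
    ProgramModel *m_programUnit;          // not owned by the edit box
    std::vector<std::string> m_document;  // one string per document line
};
```

Keeping the data out of the notification means the model never has to recreate text that no viewer ends up asking for.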

Wednesday, December 4, 2013

Information Dictionaries – Improved Design

The design of the info dictionaries required the program model to create the instances for the additional info for the dictionary (in this case, the constant number and string dictionaries) and pass this instance for the creation of the dictionary. The program model owned and was responsible for these instances. This is possibly problematic because it did not guarantee that an additional info instance of the correct type was passed to the info dictionary.

The design was changed so that new constant number and string dictionary classes, derived from the base info dictionary class, were added. Their constructors create the additional info instance of the correct type, and the instance is owned through the abstract info member pointer in the base class. A destructor was added to the base class to delete this instance, which required a virtual destructor in the abstract info class so that the derived info class destructor gets called.

While not needed until the run-time module is implemented, access functions for the arrays in the additional info of the constant string and number dictionaries were added. These functions simply call the access functions in the derived info classes. However, a type cast to the derived class is needed since the base class defines the additional info instance pointer as an abstract info class pointer.
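A minimal sketch of this ownership arrangement follows, covering only the constant string case. The class and member names are assumptions for illustration, and the static counter exists only to make the destructor call observable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical abstract interface for the additional information.
class AbstractInfo
{
public:
    virtual ~AbstractInfo() {}  // virtual so derived destructors run on delete
};

class ConstStrInfo : public AbstractInfo
{
public:
    ~ConstStrInfo() override { ++destroyed; }
    std::vector<std::string> strings;
    static int destroyed;  // demonstration only: counts destructor calls
};
int ConstStrInfo::destroyed = 0;

// Base information dictionary: owns the info instance given by a subclass.
class InfoDictionary
{
public:
    virtual ~InfoDictionary() { delete m_info; }
    InfoDictionary(const InfoDictionary &) = delete;
    InfoDictionary &operator=(const InfoDictionary &) = delete;

protected:
    explicit InfoDictionary(AbstractInfo *info) : m_info(info) {}
    AbstractInfo *m_info;
};

// The derived dictionary creates the info instance itself, so an instance of
// the wrong type can no longer be passed in from outside.
class ConstStrDictionary : public InfoDictionary
{
public:
    ConstStrDictionary() : InfoDictionary(new ConstStrInfo) {}

    // Access function: a cast is needed because the base class only knows
    // the instance as an AbstractInfo pointer.
    std::vector<std::string> &array()
    {
        assert(m_info != nullptr);
        return static_cast<ConstStrInfo *>(m_info)->strings;
    }
};
```

The static_cast is safe here only because the derived dictionary is the sole creator of the instance it casts.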

[commit 451edd6346]

Saturday, November 30, 2013

New Information Dictionary – Implementation

The information dictionary class was changed to a normal class derived from the dictionary class. The constructor is given a pointer to the information class instance created outside of the dictionary. This pointer is saved in a pointer defined as an abstract information pointer, which can hold any information class pointer derived from the abstract class.

The add routine first adds the dictionary entry by calling the base dictionary class add routine with the token and case sensitivity option, along with a pointer to the new entry flag so that it knows whether a new entry was added, a removed entry was reused, or the entry already exists. If a new entry was added, an element is added to the additional information by calling the add element interface function of the information instance. If the entry did not already exist (it was added or reused), the additional information is set from the token by calling the set element interface function of the information instance. The index is returned.

The remove routine first removes the reference to the dictionary entry by calling the base dictionary class remove routine for the index specified. If the entry was removed because it is no longer used, then the additional information for the element is cleared by calling the clear element interface function of the information instance. The base dictionary class remove routine was modified to return whether the entry was removed (made available for reuse) or not.
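The add and remove routines can be sketched as follows. This is only an illustration under several assumptions: the base dictionary is a simplified stand-in keyed by a plain string, the case sensitivity option is omitted, and all names (EntryType, StringInfo and so on) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

enum class EntryType { New, Reused, Exists };

// Simplified base dictionary: reference counts plus a stack of freed slots.
class Dictionary
{
public:
    std::size_t add(const std::string &key, EntryType *type)
    {
        auto it = m_index.find(key);
        if (it != m_index.end()) {
            ++m_useCount[it->second];
            *type = EntryType::Exists;
            return it->second;
        }
        std::size_t index = m_key.size();
        if (m_free.empty()) {
            m_key.push_back(key);
            m_useCount.push_back(1);
            *type = EntryType::New;
        } else {
            index = m_free.back();
            m_free.pop_back();
            m_key[index] = key;
            m_useCount[index] = 1;
            *type = EntryType::Reused;
        }
        m_index[key] = index;
        return index;
    }

    // Returns true if the entry is no longer used and its slot was freed.
    bool remove(std::size_t index)
    {
        assert(index < m_key.size() && m_useCount[index] > 0);
        if (--m_useCount[index] > 0) {
            return false;
        }
        m_index.erase(m_key[index]);
        m_free.push_back(index);
        return true;
    }

private:
    std::vector<std::string> m_key;
    std::vector<int> m_useCount;
    std::vector<std::size_t> m_free;
    std::unordered_map<std::string, std::size_t> m_index;
};

// Interface to the additional information; the defaults do nothing.
class AbstractInfo
{
public:
    virtual ~AbstractInfo() {}
    virtual void addElement() {}
    virtual void setElement(std::size_t, const std::string &) {}
    virtual void clearElement(std::size_t) {}
};

// Example information class holding one string per entry.
class StringInfo : public AbstractInfo
{
public:
    void addElement() override { values.emplace_back(); }
    void setElement(std::size_t i, const std::string &s) override { values[i] = s; }
    void clearElement(std::size_t i) override { values[i].clear(); }
    std::vector<std::string> values;
};

class InfoDictionary : public Dictionary
{
public:
    explicit InfoDictionary(AbstractInfo *info) : m_info(info) {}

    std::size_t add(const std::string &token)
    {
        EntryType type;
        const std::size_t index = Dictionary::add(token, &type);
        if (type == EntryType::New) {
            m_info->addElement();  // extend the information by one element
        }
        if (type != EntryType::Exists) {
            m_info->setElement(index, token);  // new or reused entry
        }
        return index;
    }

    void remove(std::size_t index)
    {
        if (Dictionary::remove(index)) {
            m_info->clearElement(index);  // entry freed, clear its information
        }
    }

private:
    AbstractInfo *m_info;  // created outside the dictionary, not owned here
};
```

Because a reused slot only gets set element and never add element, the information stays the same length as the dictionary.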

The abstract information class defines the interface to the additional information. The functions are defined as virtual functions with no default functionality so that derived information classes do not have to implement functions they do not need. The constant number and string information classes were changed from holding just a single element to being derived from the abstract class.

The constant number information class contains two vectors for the double and integer values. The add element function just extends the two vectors by one element. The set element function copies the token double and integer values into the respective vectors for the element specified. No clear element function was needed since there is nothing to clear. Two array access functions were implemented to access the data in the two vectors, which will be used at run-time.

The constant string information class contains a vector of string instance pointers. The add element function appends a pointer to a newly created string instance to the vector. The set element function copies the token string into the element specified. The clear element function clears the string for the element specified. Once the string instances are created, they will be reused if dictionary entries are removed. A destructor was implemented to delete all of the string instances. An array access function was implemented to access the data in the vector, which will be used at run-time.
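The two information classes described above might look roughly like this. The Token structure and all member names are assumptions made for the sketch, and std::string stands in for QString.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token carrying the values needed by the information classes.
struct Token
{
    double valueDouble = 0.0;
    int valueInt = 0;
    std::string string;
};

class AbstractInfo
{
public:
    virtual ~AbstractInfo() {}
    virtual void addElement() {}
    virtual void setElement(std::size_t, const Token &) {}
    virtual void clearElement(std::size_t) {}
};

class ConstNumInfo : public AbstractInfo
{
public:
    void addElement() override
    {
        m_double.resize(m_double.size() + 1);
        m_int.resize(m_int.size() + 1);
    }
    void setElement(std::size_t index, const Token &token) override
    {
        assert(index < m_double.size());
        m_double[index] = token.valueDouble;
        m_int[index] = token.valueInt;
    }
    // clearElement() is not overridden: there is nothing to clear

    const double *arrayDouble() const { return m_double.data(); }
    const int *arrayInt() const { return m_int.data(); }

private:
    std::vector<double> m_double;
    std::vector<int> m_int;
};

class ConstStrInfo : public AbstractInfo
{
public:
    ConstStrInfo() = default;
    ConstStrInfo(const ConstStrInfo &) = delete;
    ConstStrInfo &operator=(const ConstStrInfo &) = delete;
    ~ConstStrInfo() override
    {
        for (std::string *string : m_string) {
            delete string;
        }
    }

    void addElement() override { m_string.push_back(new std::string); }
    void setElement(std::size_t index, const Token &token) override
    {
        assert(index < m_string.size());
        *m_string[index] = token.string;  // reuse the existing instance
    }
    void clearElement(std::size_t index) override
    {
        assert(index < m_string.size());
        m_string[index]->clear();  // instance is kept for later reuse
    }

    std::string *const *array() const { return m_string.data(); }

private:
    std::vector<std::string *> m_string;
};
```

Setting an element assigns into the existing string instance rather than allocating a new one, which is what avoids losing the old instance when an entry is reused.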

Information instance pointers were added to the program model class. These instances are created in the constructor and passed to their associated information dictionaries. Both the constant number and string dictionaries are now information dictionaries. The information instances are deleted in the destructor. There are no longer any known memory issues.

[commit b9772d4149]

Information Dictionary – New Design

The original design of the information dictionary made the assumption that the additional information would be contained in a vector and was given a structure for the information. The definition was a class template where the information structure was the argument, which was put into the vector, which the information dictionary handled directly. However, in the case of the constant number dictionary, two vectors are needed so that memory is not wasted (see last post).

The details of the additional information need to be separated from the information dictionary. In other words, the information dictionary should not know (or assume) that the additional information is a vector. An abstract information class can be used that defines the interface to the additional information. The interface requires several functions for accessing the information:

add element - add a new element to the end of the additional information

set element - set an element from information in a token used when a new element is added or an element previously deleted is reused

clear element - clear the contents of an element when the dictionary entry is removed (made available for reuse)

The actual information classes are derived from the abstract class and implement these functions to manipulate their information as required, which could be stored as a vector, two vectors, or something completely different. The information dictionary has no knowledge of the information class internals and simply uses the interface functions.

The information dictionary class can be a normal class derived from the dictionary class containing a reference to the additional information. The abstract information interface functions are used to manipulate the additional information. The information dictionary needs to re-implement these functions from the base dictionary class:

add - adds a new dictionary entry and additional information if not already in the dictionary and returns its index

remove - removes the additional information if the dictionary entry was removed

Friday, November 29, 2013

Information Dictionary Issues

The information dictionary class extended the base dictionary class by adding a vector for additional information and was implemented as a class template (see post from October 6). The additional information in the constant string dictionary contained a pointer to a string instance (see post from October 6).

The memory leak in the constant string dictionary was caused by how the information dictionary template and constant string information classes were implemented. The problem occurred when a string in an entry of the information vector was replaced with the same string. A new information instance was created with a new string pointer, which was put into the information vector, and the old string instance was lost (a memory leak).

When an old program line was dereferenced after the new replacement line was encoded, a string being replaced by the same string had its reference incremented in the dictionary from one to two by the encode, then the dereference decremented the count back to one. However, when the dereference was moved to after the encode, the reference count of the string went from one to zero and the dictionary entry was freed, but not the entry in the information vector. When the new line was encoded, a new string instance was created overwriting the old string instance pointer.

While this problem was not difficult to correct, another issue was discovered, this time with the constant number dictionary where its additional information consisted of a double value and an integer value contained in a structure (see post from October 6). Each double value was aligned on a double boundary (eight bytes) and because an integer is half the size of a double (four bytes), four bytes of padding are inserted by the compiler between each element in the vector (wasted memory).
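The padding cost can be checked with sizeof. The structure below is an assumed reconstruction of the old layout, not the project's actual definition. On a typical platform with eight-byte doubles and four-byte integers it occupies 16 bytes per entry, against 12 bytes per entry for two separate arrays, though the exact figures are platform dependent.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout of the old design: one structure per dictionary entry.
struct ConstNumElement
{
    double valueDouble;
    int valueInt;
    // the compiler pads here so the next element's double stays aligned
};

// Bytes used by n entries stored as a vector of structures.
constexpr std::size_t bytesAsStructs(std::size_t n)
{
    return n * sizeof(ConstNumElement);
}

// Bytes used by n entries stored as a double vector and an integer vector.
constexpr std::size_t bytesAsTwoArrays(std::size_t n)
{
    return n * (sizeof(double) + sizeof(int));
}

// Bytes of padding wasted by the structure layout for n entries.
constexpr std::size_t wastedBytes(std::size_t n)
{
    return bytesAsStructs(n) - bytesAsTwoArrays(n);
}
```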

The only way to correct this is to separate the two sets of values by having a double value vector and an integer value vector. Unfortunately, the information dictionary template class only allows for a single information vector. A new design is needed for information dictionaries.

Program – Dereferencing Replaced Lines

When a line is replaced, references to dictionary entries in the old line must be removed. This was taking place after the replacement line was encoded. When a dictionary entry is dereferenced, the entry may no longer be used, causing its slot to be made available for another entry. The new line may add new dictionary entries, but with the encode before the dereferencing, the new entry will be added to the end of the dictionary if there are no free slots.

It is desirable for new dictionary entries to use slots that may be freed with the old line being replaced. This will help keep the dictionary from growing larger than it needs to be. Therefore, the dereference call was moved to before the encode call. With this change, the results for encoder test #2 changed slightly, but only with respect to indexes of a couple of dictionary entries.
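The effect of the ordering can be seen with a small stand-in dictionary. This is a sketch with hypothetical names, where replaceLine reduces the encode step to adding the keys used by the new line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified dictionary with reference counts and a stack of freed slots.
class Dictionary
{
public:
    std::size_t add(const std::string &key)
    {
        for (std::size_t i = 0; i < m_key.size(); ++i) {
            if (m_useCount[i] > 0 && m_key[i] == key) {
                ++m_useCount[i];
                return i;
            }
        }
        if (!m_free.empty()) {
            const std::size_t index = m_free.back();  // reuse a freed slot
            m_free.pop_back();
            m_key[index] = key;
            m_useCount[index] = 1;
            return index;
        }
        m_key.push_back(key);  // no free slot, grow the dictionary
        m_useCount.push_back(1);
        return m_key.size() - 1;
    }

    void remove(std::size_t index)
    {
        assert(index < m_key.size() && m_useCount[index] > 0);
        if (--m_useCount[index] == 0) {
            m_free.push_back(index);
        }
    }

    std::size_t size() const { return m_key.size(); }

private:
    std::vector<std::string> m_key;
    std::vector<int> m_useCount;
    std::vector<std::size_t> m_free;
};

// Replace a line that referenced oldIndexes with one that uses newKeys.
std::vector<std::size_t> replaceLine(Dictionary &dictionary,
    const std::vector<std::size_t> &oldIndexes,
    const std::vector<std::string> &newKeys, bool dereferenceFirst)
{
    if (dereferenceFirst) {
        for (std::size_t index : oldIndexes) {
            dictionary.remove(index);
        }
    }
    std::vector<std::size_t> newIndexes;
    for (const std::string &key : newKeys) {
        newIndexes.push_back(dictionary.add(key));  // the "encode" step
    }
    if (!dereferenceFirst) {
        for (std::size_t index : oldIndexes) {
            dictionary.remove(index);
        }
    }
    return newIndexes;
}
```

Replacing a line that uses A with one that uses B leaves a one-slot dictionary when dereferencing first, but a two-slot dictionary with one free slot when dereferencing afterwards.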

A previously undiscovered memory error was reported on encoder test #2 when running the memory test script. The problem occurred in the constant string dictionary with the allocation of the string pointers for the QString instances. While investigating this problem, another issue was discovered in the constant number dictionary, though this issue is much less serious and only results in wasted memory. The conclusion was that the information dictionary class (currently defined as a template) needs to be redesigned.

[commit f284a33ac8]

Tuesday, November 26, 2013

Program – Saved RPN Lists (Removal)

The translated RPN lists for program lines were being saved in the line information list, which also contains the offset of the line within the program code, the size of the line and the index to the error list if the line has an error. These lists were originally used as the source of the data for the program view and were also used to detect when lines changed (by comparing the translated RPN list of the new line with the saved RPN list).

The use of the RPN lists for the program view was removed when the program code array was implemented. With the decode routine, the program code of the lines can now be converted to an RPN list. The line change detection in the update line routine was modified to decode the existing code of the line being changed to an RPN list, which is compared to the translated RPN list of the new line. With the RPN lists in the line information list no longer being used, this variable was removed along with all references to it.

A problem was found when RPN lists were compared. The strings of tokens other than REM commands, REM operators and string constants (whose string comparison must be case sensitive) should use a case insensitive string comparison. However, the case sensitivity argument was not supplied to the compare call, so the default comparison used was case sensitive. The correct argument was added.

[commit dd3e4b62ed]

Sunday, November 24, 2013

Program – Decoder

Before the internal code of a program line can be recreated, the program code needs to be decoded into an RPN list. Like the encoder, which is part of the program model class because it needs access to the dictionaries, the decoder will also be part of the program model.

The decode routine is given the line information of the line to decode containing the offset of the line within the program code and its size. A new RPN list is created and for each program word in the line, a new token is created and assigned the code and sub-code of the program word. If the code has an operand text function in its table entry (implying the code has an operand word), the operand text function is called to get the text for the token from the operand, which is assigned to the string of the token. The token is added to the RPN list. After all the words of the line are processed, a pointer to the RPN list is returned.
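A rough sketch of the decode loop follows. The word layout (code in the low bits, sub-code in the high bits, operand in the following word), the tiny table, and every name are assumptions for illustration, and a std::unique_ptr is used here in place of the raw RPN list pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

using Word = std::uint16_t;
constexpr Word CodeMask = 0x03FF;     // assumed: low bits hold the code
constexpr Word SubCodeMask = 0xFC00;  // assumed: high bits hold the sub-code

struct Token { int code; int subCode; std::string string; };
using RpnList = std::vector<Token>;
struct LineInfo { std::size_t offset; std::size_t size; };

class ProgramModel;
using OperandTextFunction = std::string (*)(const ProgramModel &, Word);
struct TableEntry { const char *name; OperandTextFunction operandText; };

class ProgramModel
{
public:
    std::vector<Word> code;
    std::vector<std::string> constants;  // stand-ins for the dictionaries
    std::vector<std::string> variables;

    std::unique_ptr<RpnList> decode(const LineInfo &info) const;
};

std::string constText(const ProgramModel &model, Word operand)
{
    return model.constants.at(operand);
}

std::string varText(const ProgramModel &model, Word operand)
{
    return model.variables.at(operand);
}

enum { Const_Code, Var_Code, Add_Code, Assign_Code };
const TableEntry table[] = {
    {"Const", constText}, {"Var", varText}, {"+", nullptr}, {"Assign", nullptr}
};
const int TableSize = sizeof(table) / sizeof(table[0]);

std::unique_ptr<RpnList> ProgramModel::decode(const LineInfo &info) const
{
    assert(info.offset + info.size <= code.size());
    std::unique_ptr<RpnList> rpnList(new RpnList);
    const std::size_t end = info.offset + info.size;
    for (std::size_t i = info.offset; i < end; ++i) {
        Token token;
        token.code = code[i] & CodeMask;
        token.subCode = code[i] & SubCodeMask;
        assert(token.code < TableSize);
        OperandTextFunction operandText = table[token.code].operandText;
        if (operandText != nullptr) {
            assert(i + 1 < end);
            token.string = operandText(*this, code[++i]);  // operand word
        }
        rpnList->push_back(token);
    }
    return rpnList;
}
```

With this layout, a line holding Var, Const, Var, Add and Assign codes decodes to five tokens, three of which take their text from an operand word.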

Like the encode routine, the decode routine is a private function within the program model class. To access recreated lines of the program, a new line text routine was added. This routine is given the index to the line and starts by retrieving the information for the line, which is passed to the decode routine. The pointer to the RPN list returned is passed to the recreate routine of the recreator instance (which was added to the program model class). The RPN list is deleted and the string returned from the recreate routine is returned.

The temporary check to prevent encoder test files from being used with the recreate output option (-to) was removed from the tester class. In the tester run routine for encoder test files after outputting the code of the program and the dictionary entries, if the recreate output option was selected, each line of the program is output using the line text routine.

The expected results files for the three encoder tests were created from the encoder test results files with the output of the program added to the end. All the encoder tests are recreated correctly. The test script and batch files were updated to also test the encoder test files with the recreate output option.

[commit 1f24a70152]

Saturday, November 23, 2013

Recreator – RPN Lists (Tagged)

The recreator is now complete and all of the initial set of commands (LET, PRINT, INPUT, and REM) are fully supported. The repository has been tagged v0.6.1 to mark this milestone. Some minor issues were also corrected for this commit including:

Corrected an expected result for the recreated translator test #12 (INPUT tests).
Reorganized the access functions in the recreator class.
Renamed the recreator class is empty access function to output is empty so as not to be confused with the stack is empty function.
Renamed the recreator class last access function to the more explicit output last character and removed the output string is empty check.
Corrected some comment formatting and added some missing comments.
Corrected formatting issues where spaces were incorrectly added instead of tab characters (problems are caused by QtCreator editor bugs where it doesn't always pay attention to the spacing/tab settings).
Added some missing FLAG option comments (details below).

The next major step is to integrate the recreator with the GUI, but in order to do this, the program code of lines needs to be decoded into an RPN list of tokens for the recreator. When a line is entered into the program, it will be recreated and the recreated text will replace the entered text in the edit box. Preferences will be added to control the formatting of the recreated output, for example, if spaces should be added around operators, after commas, etc. The FLAG option comments show where checks for these options will be.

[commit 0cd7b84700]

Recreator – Colons

Colons between statements are handled by setting the colon sub-code of the last token of a statement. To support the recreation of colons, a check was added to the main recreator loop after the check for the parentheses sub-code so that if the colon sub-code is set, a colon and a space are added to the output string.

This change caused a slight problem with the recreation of the remark operator if there is a colon just before the remark. Two spaces were being added in front of the "'" operator. To prevent this, a check was added after a non-empty output string check to also not add a space if the last character in the output string is already a space.

A new last access function was added to the recreator class that returns the last character in the output string or a null character if the output string is empty. The expected recreated outputs for translator test #16 (colon tests) were updated and are recreated correctly.

[commit fc90cf0e32]

Recreator – Remarks

There are two codes for remarks, the command code (for the REM command) and the operator code (for the "'" operator). The token contains the string of the remark. Generally, the keyword (REM or "'") along with the remark string is added to the output string. However, there were two issues to be handled.

Unlike all the other commands, no space is required after the REM command when followed by a letter, so a statement like "REMARK A Comment" is valid. The issue is for a REM statement entered as "remark a comment" in all lower case. When recreated, the result would have been the "REMark a comment" statement. So, if the first character of the remark string is lower case, the REM keyword is converted to lower case. This still won't work if something like "Remark A Comment" is entered.

The second issue involves the remark operator. A space is needed before the "'" operator if the command is not at the beginning of the line to provide some separation from the previous statement. To determine if the remark is at the beginning of the line, an is empty access function was added to the recreator class that returns whether the output string is empty (which implies this is the beginning of the line if nothing was added yet).

A single rem recreate function was added to handle both remark codes and a pointer to this function was added to the remark code table entries. To test the lower case check, an all lower case remark statement was added to translator test #15 (REM tests). The expected recreated outputs for this test were updated and are recreated correctly.
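The remark handling, including the colon and last character checks, can be sketched as below. The Recreator class here is a hypothetical cut-down stand-in with invented function names, not the project's class.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical subset of the recreator: just the output string.
class Recreator
{
public:
    void append(const std::string &text) { m_output += text; }
    bool outputIsEmpty() const { return m_output.empty(); }
    char outputLastChar() const
    {
        return m_output.empty() ? '\0' : m_output.back();
    }
    const std::string &output() const { return m_output; }

private:
    std::string m_output;
};

// Called after the last token of a statement whose colon sub-code is set.
void colonRecreate(Recreator &recreator)
{
    recreator.append(": ");
}

// Handles both remark codes; isOperator selects the "'" form.
void remRecreate(Recreator &recreator, bool isOperator,
    const std::string &remark)
{
    if (isOperator) {
        // separate from the previous statement unless at the start of the
        // line or a space is already there (for example after a colon)
        if (!recreator.outputIsEmpty() && recreator.outputLastChar() != ' ') {
            recreator.append(" ");
        }
        recreator.append("'" + remark);
        return;
    }
    std::string keyword = "REM";
    if (!remark.empty()
            && std::islower(static_cast<unsigned char>(remark[0]))) {
        keyword = "rem";  // so "remark a comment" is not recreated as "REMark"
    }
    recreator.append(keyword + remark);
}
```

As noted above, a mixed case entry such as "Remark A Comment" is still not preserved, since only the first character of the remark string is examined.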

[commit 03c73c3de7]

Friday, November 22, 2013

Recreator – INPUT Statements

Like PRINT statements, INPUT statements contain several codes including the input parse item (double, integer or string), input begin, input begin with string prompt, input assign reference (double, integer or string), and the input or input prompt command code. The input begin with string prompt and input command codes could also have the option sub-code set. The separator of the recreator class will be used between these codes to keep track of the separators between the input reference items.

As the codes of the INPUT statement are processed, the resulting INPUT statement is built up by adding to the string on top of the holding stack. The string of the current reference is added to the string being built with a separator in between. After each reference or prompt string, the separator is set to a comma (or a semicolon after the prompt string without the option sub-code). At the end of the statement, the INPUT or INPUT PROMPT keyword is added to the output string along with the built up string that is popped from the stack.

Implementation

The input begin string code follows the input prompt string expression, the string of which will be on top of the stack. The input prompt begin recreate function for the input begin string code sets the separator to a comma if the option sub-code is set, otherwise sets it to a semicolon.

The input assign recreate function contains a local string. If the separator is set, the string is set to the string of the reference that is popped from the stack. The separator is added to the string on top of the stack followed by the string of the reference. The separator is not set for the first reference of an INPUT statement, so no action is needed (the string of the reference is left on the stack). The separator is set to a comma for the next reference.

The input recreate function for both the INPUT and INPUT PROMPT code adds the command keyword to the output string. A space is added followed by the built up string that is popped from the stack. If the command code has the option sub-code set (for keeping cursor on the same line), a semicolon is added to the output string. The separator is cleared for the next statement.
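These three recreate functions can be sketched as follows. The Recreator structure and function signatures are assumptions, and the operands are pushed by hand where the real recreator would have processed the reference tokens. No space is added after the separators here because the description above does not mention one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical cut-down recreator state.
struct Recreator
{
    std::vector<std::string> stack;  // holding stack
    std::string output;
    char separator = 0;              // 0 means the separator is not set

    std::string pop()
    {
        assert(!stack.empty());
        std::string top = stack.back();
        stack.pop_back();
        return top;
    }
};

// Input begin string code: the prompt string is already on top of the stack.
void inputPromptBeginRecreate(Recreator &r, bool optionSubCode)
{
    r.separator = optionSubCode ? ',' : ';';
}

// Input assign code: the reference string is on top of the stack.
void inputAssignRecreate(Recreator &r)
{
    if (r.separator != 0) {
        const std::string reference = r.pop();
        assert(!r.stack.empty());
        r.stack.back() += r.separator;
        r.stack.back() += reference;
    }
    // first reference without a prompt: its string simply stays on the stack
    r.separator = ',';
}

// INPUT and INPUT PROMPT command codes.
void inputRecreate(Recreator &r, const std::string &keyword,
    bool optionSubCode)
{
    r.output += keyword;
    r.output += ' ';
    r.output += r.pop();
    if (optionSubCode) {
        r.output += ';';  // keep the cursor on the same line
    }
    r.separator = 0;  // cleared for the next statement
}
```

For example, pushing a prompt string, calling the prompt begin function without the option, and then assigning references A and B$ builds up "Name";A,B$ on the stack before the command keyword is added.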

The table class name access function with token pointer argument was modified to return the full name of the code. This includes a space and the second word of a code that has the two word option set along with a second name. This was needed for the INPUT PROMPT command that contains two words.

Pointers to the new input recreate functions were added to the table entries of the various input codes. The input parse item and input begin codes do not produce anything during recreation and their table entries were set to the pointer of the blank recreate function. The expected recreated outputs for translator test #12 (INPUT statements) were updated and are recreated correctly.

[commit 9c8631000f]

Saturday, November 16, 2013

Recreator – Unary Operator Problems

Many of the other translator tests were being recreated correctly including tests #7 (errors), #8 (more errors), #9 (semicolon errors), #10 (expression errors), #11 (temporary errors), and #14 (parser errors) once the expected results were updated as these did not have the not yet implemented INPUT and REM statements or colons. However, tests #13 (negative constants) and #17 (constants) were not recreated correctly due to problems involving unary operators.

A problem occurred when a negate unary operator preceded a numeric constant. When recreated, there was no space between the negate operator and the number. If this statement is translated again, the negate operator and the number become a negative constant, which is not the same, though technically equivalent. This will cause the line change detection in the program model to incorrectly detect a change. The unary operator recreate function was modified to also add a space after the unary operator if the operand begins with a digit or decimal point.
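The spacing rule can be sketched as a small string function. The function name is hypothetical, and the first condition (a space after a word operator such as NOT) is an assumption about what the function already did before this change.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Builds the recreated text of a unary operator applied to its operand.
std::string unaryOperatorText(const std::string &op,
    const std::string &operand)
{
    assert(!op.empty() && !operand.empty());
    const unsigned char last = static_cast<unsigned char>(op.back());
    const unsigned char first = static_cast<unsigned char>(operand[0]);
    std::string result = op;
    // assumed existing rule: word operators are followed by a space;
    // new rule: so is any operator in front of a digit or decimal point, so
    // that "- 5" is not translated back into the negative constant "-5"
    if (std::isalpha(last) || std::isdigit(first) || first == '.') {
        result += ' ';
    }
    return result + operand;
}
```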

A problem occurred with the negate integer operator. The precedence of the negate integer was incorrectly set to 40 causing parentheses to be added incorrectly during recreation. The precedence should have been 48, the same as the negate double operator, so the table entry was corrected.

A problem occurred when a unary operator followed a power (exponential) operator, a higher precedence operator. The operand was incorrectly surrounded with parentheses. The binary operator recreate function was modified to also check whether the second operand is a unary operator, in which case the operand is not surrounded by parentheses.

These corrections allowed tests #13 and #17 to be recreated correctly. The regression test script was also modified to not ignore white space when comparing to the expected results. Without this change, the first problem above was not detected. The memory test was already not ignoring white space. I'm not sure what the reason was for making the regression test ignore white space.

[commit cf67d09f36]

Recreator – PRINT Statements

There are several codes that make up a PRINT statement including print item (double, integer or string), comma, print function (TAB and SPC), semicolon (only at the end of the statement) and the print command. A recreate function was implemented for each of these codes. Because the PRINT statement is composed of several codes, the separator member variable of the recreator instance is used between the processing of these codes to keep track of separators between the print items.

As the codes of the PRINT statement are processed, the resulting PRINT statement is built up by adding to the string on top of the holding stack. Generally, the string for the current item is popped from the stack, a separator is added to the string now on top of the stack (which contains the previous items), and the string of the current item is then added to it. At the end of the statement, the PRINT keyword is added to the output string along with the built up string of the print items and separators that is popped from the stack.

Implementation

The print item recreate function contains a local string variable. If the separator is set from a previous print code, the string is set to it. If this separator is not a space, then a space is added after the separator. The separator is a space if the last print code was a comma (see below). The string on top of the stack is popped and added to the string. If the stack is now empty, then the string is pushed to the stack, otherwise the string is added to the string on top of the stack. The separator is set to a semicolon for the next item if there is one.

Spaces are normally added after a comma like a semicolon, but spaces are not added between multiple commas. The print comma recreate function contains a local string variable. If the holding stack is not empty, the string on top of the stack is popped into the local string. A comma is added to the string (which is empty if the stack was empty, like when there is a comma directly after the PRINT keyword). The local string is pushed to the stack. The separator is set to a space. A space is only added after the last consecutive comma.

The print function recreate function first calls the internal function recreate function (since print functions are translated the same as other internal functions), which will process the print function and its operand and leave the result on top of the stack. The print item recreate function is called to process the print function like any other item.

The semicolon code is only found at the end of a PRINT statement and replaces the print command code. The print semicolon recreate function pops the string from the holding stack, adds a semicolon, and pushes it back to the stack. The print recreate function is called to complete the PRINT statement.

The print recreate function adds the PRINT keyword to the output string. If the holding stack is not empty, a space is added to the output string, and the string is popped from the stack and added to the output string.
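The print recreate functions described above can be sketched as follows (the print function code is left out). The Recreator structure and function names are assumptions, and the items are pushed by hand where the real recreator would have processed the expression tokens.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical cut-down recreator state.
struct Recreator
{
    std::vector<std::string> stack;  // holding stack
    std::string output;
    char separator = 0;              // 0 means the separator is not set

    std::string pop()
    {
        assert(!stack.empty());
        std::string top = stack.back();
        stack.pop_back();
        return top;
    }
};

void printItemRecreate(Recreator &r)
{
    std::string text;
    if (r.separator != 0) {
        text = r.separator;
        if (r.separator != ' ') {
            text += ' ';  // space after a semicolon separator
        }
    }
    text += r.pop();  // the current item
    if (r.stack.empty()) {
        r.stack.push_back(text);  // first item of the statement
    } else {
        r.stack.back() += text;   // append to the previous items
    }
    r.separator = ';';
}

void printCommaRecreate(Recreator &r)
{
    std::string text;
    if (!r.stack.empty()) {
        text = r.pop();
    }
    text += ',';
    r.stack.push_back(text);
    r.separator = ' ';  // only a space goes in front of the next item
}

void printRecreate(Recreator &r)
{
    r.output += "PRINT";
    if (!r.stack.empty()) {
        r.output += ' ';
        r.output += r.pop();
    }
    r.separator = 0;  // assumption: reset for the next statement
}

void printSemicolonRecreate(Recreator &r)
{
    r.stack.push_back(r.pop() + ';');
    printRecreate(r);
}
```

Because the comma function sets the separator to a space without adding one itself, consecutive commas stay together and a space appears only in front of the item that follows the last comma.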

A new separator is set access function was added to the recreator to check whether the separator is set to a specific character, which is used by the print item recreate function. Pointers to the new print recreate functions were added to the table entries of the various codes. The expected recreated outputs for translator test #6 (PRINT statements) were updated and are now recreated correctly.

[commit 1d3a05f610]

Recreator – Interactive Testing

Up to now, the only way to test the recreator was by using the expression and translator test files with the batch test mode. An interactive recreator mode was not implemented since the recreator only supported expressions and creating separate modes for both expressions and commands was unnecessary. With support for commands (just assignments at the moment), an interactive mode for the recreator could be added.

Before implementing the interactive recreator mode, the translator, program unit and recreator instances (which were local variables in the tester class run routine) were changed to member variables of the tester class. As local variables, it was necessary to pass references to them between the various tester routines, which defeated the purpose of having a class. The output stream is now given to the tester constructor, which is stored in a member variable so that it can be shared by all the class routines, instead of being an argument to the run routine and passed to the other routines.

The tester class was modified to support the new interactive recreator test mode, which is activated with the new "-tr" command line option.

The translate input routine was modified to accept a header string for the list of translated token output. If this header string is not used, then "Output:" is used as before. This routine was also modified to return the pointer to the RPN list if the header string is used. Otherwise, the RPN list is deleted as before and a null pointer is returned, which is also returned when the input line has an error.

The new recreate input routine was added, which starts by calling the translate input routine with the header set to the "Token:" string. If an RPN list is returned (no error detected), the RPN list is recreated and deleted. The recreated output is prefixed with the "Output:" string as its header.

[commit 2dc4b17e97] [commit 0ccfb105e7]
