The next addition to the program class is to handle the New program action. Previously, the clear function of the edit box was called. This function is part of the QPlainTextEdit base class that the edit box class is derived from. When this function was called, it generated a document changed signal, which the edit box processed and ended up clearing the program unit that was attached to the edit box. However, the program class should have this responsibility since there may not be an attached edit box.
The new action routine in the main window class was modified to call the new clear function in the program class. This new clear function calls a new clear function of the program unit member (a program model instance). Eventually, all of the program units of the program (for subroutines and functions) will be deleted with only the program unit for the main routine being cleared.
The new program model class clear function clears the line information list, program code vector, error list and all of the dictionaries. This required new clear functions to be added to the dictionary and information dictionary classes. A new clear function was also added to the abstract information class, which was implemented for the constant number and string information classes. Finally, the clear function sends the new program cleared signal, which is connected to the edit box clear function.
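The clear sequence described above can be sketched in standard C++ without Qt. This is only an illustration under stated assumptions: the member names and container types are hypothetical, and a plain callback stands in for the program cleared signal.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified program unit; the real class holds more state.
class ProgramModel {
public:
    // Stands in for the program cleared signal (assumption: one listener).
    std::function<void()> programCleared;

    // Helper for the sketch only: puts something into every container.
    void addLine(const std::string &text, int code) {
        m_lineInfo.push_back(text);
        m_code.push_back(code);
        m_errors.push_back(0);
        ++m_constants[text];
    }

    // Clears all program data first, then notifies listeners.
    void clear() {
        m_lineInfo.clear();
        m_code.clear();
        m_errors.clear();
        m_constants.clear();
        assert(isEmpty());
        if (programCleared) programCleared();
    }

    bool isEmpty() const {
        return m_lineInfo.empty() && m_code.empty()
            && m_errors.empty() && m_constants.empty();
    }

private:
    std::vector<std::string> m_lineInfo;     // line information list
    std::vector<int> m_code;                 // program code vector
    std::vector<int> m_errors;               // error list
    std::map<std::string, int> m_constants;  // stands in for the dictionaries
};
```

Sending the notification last means a listener such as the edit box always sees an already empty program when it reacts.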
The clear function was reimplemented in the edit box class. This was necessary so that a flag could be set before calling the clear function of the QPlainTextEdit base class. This flag indicates to the document changed slot of the edit box that the signal should be ignored because it did not originate from editing the program. The recreating line flag, which is used to ignore document changes when lines are replaced with their recreated text, was used for this purpose. This flag was renamed to the more appropriate ignore change.
A minor problem was also corrected in the main window class. Previously, before a program was saved, the main window instance called the capture modified line routine of the edit box to make sure that any changes made to the current line are saved. This normally takes place when the cursor is moved from the line being changed. This check also needs to take place before the check to see if the program should be saved before it is cleared by the new action. This functionality was put into the new program capture edit changes routine since it is called from two different places.
[commit 8d6e5703c0]
Sunday, December 29, 2013
Tuesday, December 24, 2013
Program – Program Unit
Continuing with the transition to the new program class, the pointer to the program model was moved from the main window class to the program class and renamed to unit. When support for subroutines and functions is added, the single unit will become a list of units with the first one holding the main routine and the list growing for each subroutine and function added to the program. An access function for the program unit was also added to the program class.
[commit 5bc9d15f43]
Saturday, December 21, 2013
Program – New Program Class
Currently a program is loaded by the main window class and given to the edit box using the set plain text access function. This will set the text document of the edit box, which causes a document changed signal. When this signal is processed, the edit box updates the program unit that is attached. Because the text cursor is not valid during this operation, the recreated text from the program model cannot be inserted into the text document (the first of two issues mentioned in the last post).
One way to resolve this issue is to load the program directly into the program unit bypassing the edit box. The program generates signals that lines have been changed by line number. The edit box receives these signals, retrieves the recreated text for the lines from the program and puts them into the document. Special handling is needed for lines that contain errors since there would be no code for these lines to recreate text from.
Eventually, when support for subroutines and functions is added, a loaded program will consist of a number of program units, one for the main routine and one for each of the subroutines and functions. The program will consist of a list of program units. When a program is loaded, the load routine will create a new program unit for each routine or function. Likewise for the save routine, which accesses the list of program units.
The list of program units will be contained in a new program class. This class will contain the routines for loading and saving programs. The current program file path name will also be contained in this class. New source and header files were added for the new program class. To start simple, the new class only contains the program file path. Since the file path is included in the application settings, save and restore settings routines were also implemented.
[commit 17f3e958ed]
Sunday, December 15, 2013
Edit Box – Recreated Line Replacement
When a program line is changed or inserted, the line should be recreated and the recreated text should replace the line entered in the edit box. To accomplish this, the program model class will send a signal with the line number of a line that is changed or inserted. The edit box will receive this signal, retrieve the recreated text for the line, and replace the text of the line with this recreated text.
The new program changed signal was added to the program model class. The update line routine was modified to send this signal when a line is changed or inserted. When a line is changed, the actual code of the line may not have changed, for example, if spaces were added or removed, or the case of keywords was changed. In this case, the program code will not be modified, but this signal still needs to be sent so that the edit box reflects the correct recreated program code.
The new program changed slot routine was added to the edit box class, which starts by retrieving the recreated text for the changed line. A text cursor is obtained for the edit box and its position is set to the beginning position of the block containing the line. The cursor is moved to the end of the block keeping the anchor at the beginning, which selects the entire line. The recreated text is inserted at the cursor and since text is selected, the selected text is replaced.
The program model line text routine was modified to return a null string if the line has an error. The program changed slot routine does not replace the text if the text is null indicating that the line has an error. This required the recreator class recreate routine to be modified where the output string is initialized to an empty string (a pair of double quotes) instead of being cleared. Clearing a string creates a null string, and a null string is not quite the same as an empty string (a null string is empty but an empty string is not null).
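The null versus empty distinction can be modeled outside of Qt with a small wrapper. The sketch below is an analogy in standard C++, not the project's code: the Text class only imitates the isNull and isEmpty behavior of a Qt string, and lineText, replaceLine and ProgramLine are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Imitates the two states of interest: null (no text at all) versus
// empty (a valid text of length zero).
class Text {
public:
    Text() : m_null(true) {}
    explicit Text(const std::string &value) : m_null(false), m_value(value) {}
    bool isNull() const { return m_null; }
    bool isEmpty() const { return m_value.empty(); }
    const std::string &value() const { return m_value; }
private:
    bool m_null;
    std::string m_value;
};

// Hypothetical program line: recreated text plus an error marker.
struct ProgramLine {
    std::string recreated;
    bool hasError;
};

// Returns a null text for a line with an error, otherwise the recreated
// text, which may legitimately be empty (for example a blank line).
Text lineText(const std::vector<ProgramLine> &program, std::size_t number) {
    assert(number < program.size());
    if (program[number].hasError) return Text();
    return Text(program[number].recreated);
}

// Replaces the document line only when recreated text is available.
// Returns true if the line was replaced.
bool replaceLine(std::vector<std::string> &document,
                 const std::vector<ProgramLine> &program, std::size_t number) {
    assert(number < document.size());
    Text text = lineText(program, number);
    if (text.isNull()) return false;  // line has an error: keep what was typed
    document[number] = text.value();
    return true;
}
```

If an empty recreated text were also reported as null, a blank line could not be told apart from a line with an error, which is why the output string has to start out empty rather than cleared.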
When text is replaced in the document using a text cursor, document changed and cursor moved signals are generated from the document. These signals need to be ignored when the line is being recreated, otherwise an infinite loop occurs because the document changed signal updates the program, which generates another program changed signal, and so on. A flag was added to the edit box and is set before replacing text and cleared afterward. The document changed and cursor moved slot routines were modified to do nothing if this flag is set.
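The loop and the flag that breaks it can be reproduced in a few lines of standard C++. In this sketch a callback takes the place of the Qt signals, upper-casing stands in for recreating a line, and all names (Document, EditBox, m_ignoreChange) are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical document: reports every line change through a callback.
class Document {
public:
    std::function<void(std::size_t)> lineChanged;

    explicit Document(std::size_t lineCount) : m_lines(lineCount) {}

    void setLine(std::size_t number, const std::string &text) {
        assert(number < m_lines.size());
        m_lines[number] = text;
        if (lineChanged) lineChanged(number);
    }
    const std::string &line(std::size_t number) const { return m_lines[number]; }

private:
    std::vector<std::string> m_lines;
};

class EditBox {
public:
    explicit EditBox(std::size_t lineCount) : m_document(lineCount) {
        m_document.lineChanged = [this](std::size_t n) { documentChanged(n); };
    }
    EditBox(const EditBox &) = delete;             // callback captures this
    EditBox &operator=(const EditBox &) = delete;

    void userTypes(std::size_t number, const std::string &text) {
        m_document.setLine(number, text);
    }
    const std::string &line(std::size_t number) const { return m_document.line(number); }
    int programUpdates() const { return m_programUpdates; }

private:
    // Document changed slot: would update the program, which then reports
    // the changed line back.
    void documentChanged(std::size_t number) {
        if (m_ignoreChange) return;  // change came from recreated text
        ++m_programUpdates;
        programChanged(number);
    }
    // Program changed slot: replaces the line with its recreated text.
    void programChanged(std::size_t number) {
        std::string recreated = recreate(m_document.line(number));
        m_ignoreChange = true;       // without this, setLine would recurse forever
        m_document.setLine(number, recreated);
        m_ignoreChange = false;
    }
    // Toy stand-in for recreating a line from program code.
    static std::string recreate(std::string text) {
        for (char &c : text)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        return text;
    }

    Document m_document;
    bool m_ignoreChange = false;
    int m_programUpdates = 0;
};
```

With the flag in place, one user edit produces exactly one program update even though the document reports two changes.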
There are two unresolved issues resulting from these changes. The first issue occurs during the initial loading of a program. As a new program is being loaded, each line added to the program should be recreated to the edit box. However, this cannot occur because until the program has been loaded into the document of the edit box, the text cursor is not valid, so can't be used to replace text. The second issue occurs when a line is replaced with recreated text; extra undo commands are added to the undo stack.
[commit 4b34bd2dde]
Saturday, December 14, 2013
Edit Box – Lines Changed Signal
The edit box was using the lines changed signal to notify the program model instance when lines changed in the document. The signal included the starting line number, the number of lines deleted, the number of lines inserted and the text of the lines. The program model update slot routine was connected to this signal and processed it. Since the edit box can access the program model directly, this signal is not needed. The tester class is already accessing this routine directly.
The lines changed signal was removed and the two emits of this signal were changed to call the update routine directly. The update routine was changed from a public slot to a normal public member function in the program model class. A few comments were added and the word 'slot' was added to the comments of all the slot functions so that these would be easier to identify.
[commit c864bdde07]
Program – Error List Handling
The program model class was keeping a list of any errors detected when lines are translated. After a group of line changes were processed, if the list of errors changed, the entire error list was sent to the edit box via a signal. The edit box stored this list of errors and used it to generate its extra selection list, which is used to highlight the errors. Since the edit box now has access to the program model, it was no longer necessary for the edit box to keep a copy of this list.
So that the edit box can keep its extra selection list up to date, the program model was modified to send signals when an error item has been inserted, changed, or removed. The connected edit box slots update the extra selection list accordingly. When the program model is done updating the error list, it sends a signal that the error list has changed. The connected edit box slot sets the extra selections of the base QPlainTextEdit class from the updated extra selection list. The edit box no longer scans through the error list to generate its extra selection list.
The functionality for finding the next or previous error from the current cursor location was moved from the edit box class to the program model class. The current line number and column are passed to these routines, which are set to the next error line number and column upon returning along with a flag of whether the end or beginning of the program was passed so that a message can be issued.
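A next-error search with this in/out calling convention might look like the sketch below. It is a sketch under assumptions: the errors are taken to be sorted by line and column, and the ErrorItem struct and the moveToNextError name are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical error record: where the error starts.
struct ErrorItem {
    int line;
    int column;
};

// Moves line/column to the first error after the given position.
// Assumes errors are sorted by line, then column. Returns false if there
// are no errors at all. Sets wrapped when the search passed the end of
// the program and continued from the first error.
bool moveToNextError(const std::vector<ErrorItem> &errors,
                     int &line, int &column, bool &wrapped) {
    assert(line >= 0 && column >= 0);
    wrapped = false;
    if (errors.empty()) return false;
    for (const ErrorItem &error : errors) {
        if (error.line > line || (error.line == line && error.column > column)) {
            line = error.line;
            column = error.column;
            return true;
        }
    }
    // Nothing after the current position: wrap around to the first error.
    wrapped = true;
    line = errors.front().line;
    column = errors.front().column;
    return true;
}
```

The caller can use the wrapped flag to issue a message such as one noting that the search continued from the top; a previous-error routine would mirror this by scanning backward.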
A new routine was added to the program model class to handle when the current line is edited and contains an error. The error is shifted if the edit takes place before the error, or deleted if the edit takes place within the error. If the error list changes, the appropriate signals are sent to the edit box. This routine was not made a slot since the edit box can call it directly.
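The shift-or-delete rule can be expressed as a small function. The following is a sketch only: the parameters mirror the position, characters removed and characters inserted values of a document change, the handling of an edit located after the error (left unchanged) is an assumption, and the names are hypothetical.

```cpp
#include <cassert>

// Hypothetical error span within the current line.
struct ErrorSpan {
    int column;
    int length;
};

enum class EditResult { Unchanged, Shifted, Removed };

// Adjusts an error for an edit that starts at position, removes
// `removed` characters and inserts `inserted` characters.
EditResult adjustError(ErrorSpan &error, int position, int removed, int inserted) {
    assert(position >= 0 && removed >= 0 && inserted >= 0);
    // Edit starts at or after the end of the error (assumed: no effect).
    if (position >= error.column + error.length) return EditResult::Unchanged;
    // Edit lies entirely before the error: shift it by the net change.
    if (position + removed <= error.column) {
        if (inserted == removed) return EditResult::Unchanged;
        error.column += inserted - removed;
        return EditResult::Shifted;
    }
    // Edit touches the error text itself: the error no longer applies.
    return EditResult::Removed;
}
```

A caller would send the changed or removed error signal depending on the returned value and send nothing for Unchanged.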
During the initial loading of the program, the text cursor in the edit box is not valid. Any errors in the program are sent as inserted error signals. Since the text cursor is not valid, the extra selections cannot be created (each extra selection contains a format and a cursor, which needs to be set to a valid cursor). The errors are temporarily saved in a list. After the text cursor becomes valid, extra selections are created from this list (which is then cleared).
The error list class used by the program model to hold the list of errors previously kept track of the first and last index affected by changes to the list. These indexes were used by the edit box to maintain its extra selection list. With the new direct change signals, these indexes are no longer needed, so the error list was changed to having a simple changed flag. The error list changed signal is only sent when this flag is set. The has changed access function was modified to clear this flag after it is read so a separate reset function is not needed.
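A changed flag that resets itself when read can be sketched as follows. The class below is a minimal stand-in in standard C++ with hypothetical member names; it stores only line numbers rather than full error items.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for the error list: every modification sets the flag.
class ErrorList {
public:
    void append(int line) {
        m_lines.push_back(line);
        m_changed = true;
    }
    void removeAt(std::size_t index) {
        assert(index < m_lines.size());
        m_lines.erase(m_lines.begin() + static_cast<std::ptrdiff_t>(index));
        m_changed = true;
    }
    std::size_t size() const { return m_lines.size(); }

    // Reports whether the list changed since the last call. Reading the
    // flag clears it, so no separate reset function is needed.
    bool hasChanged() {
        bool changed = m_changed;
        m_changed = false;
        return changed;
    }

private:
    std::vector<int> m_lines;
    bool m_changed = false;
};
```

The trade-off of this design is that hasChanged is not a const function and only the first reader after a change sees true, which fits a single caller that decides whether to send the error list changed signal.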
Finally, the main window class status bar update slot was modified to receive the error message by an argument in the signal instead of it retrieving the message for the current line from the edit box, which retrieved the message from its copy of the error list. The routine sending this signal in the edit box class was modified to send the error message, which is retrieved from the program model via its list of errors.
[commit 19c5f06d0c]
Tuesday, December 10, 2013
Edit Box – Program Unit Access
The edit box class needs access to the program model, specifically to the program unit currently opened in the instance of an edit box. Eventually when subroutines and functions are implemented, any one of them could be opened with its own edit box. There could be several edit box instances opened at any given time. Right now there is just a single program unit, the main routine, which will be opened in a single edit box instance.
To allow an edit box instance access to its program unit, a new pointer to a program unit instance (program model class) was added to the constructor of the edit box class. This pointer is stored in a new program unit pointer member variable. The connection of the line changes signal (from the edit box class), and the error list changed signal (from the program unit) are now made in the constructor of the edit box instead of the constructor of the main window class.
The constructor of the main window class was modified to create the program model (the program unit for the main routine) before creating the edit box instance, which now requires the pointer to the program unit instance that it will be editing.
[commit aa2125609a]
Saturday, December 7, 2013
Program (Recreator) and GUI Integration
The recreator is fully integrated with the program model such that program lines can be converted back into text. When lines are entered into the program, the lines need to be recreated back to text and put into the edit box (the GUI), specifically into the text document of the edit box. Just as the temporary program view (being used for debugging) is the viewer of the data held by the program model, the edit box is the viewer of the data contained in the document. The edit box also allows editing, so it is more than just a viewer.
The document of the edit box is really just the text representation of the program. The program model holds the actual data of the program. Ideally, the program model would be the document of the edit box and it would convert text to program code and back while editing. However, Qt does not have an abstract text document class from which a document sub-class could be built that would hold its data in another form like program code. The QTextDocument class is meant for text.
Alternatively, a new viewer could be designed that would allow all the text editing features (cut, copy, paste, undo, redo, etc.) like the QPlainTextEdit class that the edit box class is based on. Designing one would be quite an effort. Therefore, the edit box will be the viewer for two data models at the same time, the text document (to allow text editing) and the program model (for holding the program code). The program model will be the master of the data, with the text document being updated as the program changes.
This implies that the edit box either owns the program model with the program code or at least has easy access to it, for example via a pointer. The latter approach will be used since the main window class will ultimately be the owner of the program. Eventually there will be a list of program models, one for the main routine and several for the subroutines and functions of the program. There will only be associated edit box instances when the main routine, subroutines or functions are open for editing.
Since the edit box will now have access to the program model, signals from the program model (for program changes) do not need to contain actual data. For instance, when a program line has changed, its recreated text is needed to update the text document. The signal could contain both the line number and text (already recreated). Looking at the edit box to document interface, when the document changes, only the position, number of characters removed and inserted are contained in the signal. The edit box must obtain the actual text changes by querying the document. So, when the program changes, only the line number will be sent and the edit box will request the recreated text from the program model.
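The notify-then-query arrangement described above can be sketched in standard C++. This is an illustration with assumptions: a callback replaces the Qt signal, the stored text stands in for text recreated from program code, and the names programChanged, updateLine and lineText are hypothetical here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical program model: the change notification carries only the
// line number; listeners ask the model for the text when they need it.
class ProgramModel {
public:
    std::function<void(std::size_t)> programChanged;

    void updateLine(std::size_t number, const std::string &text) {
        if (number >= m_lines.size()) m_lines.resize(number + 1);
        m_lines[number] = text;
        if (programChanged) programChanged(number);
    }

    // Query used by the listener after it has been notified.
    const std::string &lineText(std::size_t number) const {
        assert(number < m_lines.size());
        return m_lines[number];
    }

    std::size_t lineCount() const { return m_lines.size(); }

private:
    std::vector<std::string> m_lines;
};
```

Keeping the data out of the notification means a listener that does not need the text (or that is ignoring changes at the moment) never pays for producing it.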
Wednesday, December 4, 2013
Information Dictionaries – Improved Design
The design of the info dictionaries required the program model to create the instances for the additional info for the dictionary (in this case, the constant number and string dictionaries) and pass this instance for the creation of the dictionary. The program model owned and was responsible for these instances. This is possibly problematic because it did not guarantee that an additional info instance of the correct type was passed to the info dictionary.
The design was changed so that new constant number and string dictionary classes are derived from the base info dictionary class. Each constructor creates the additional info instance of the correct type, and the instance is owned through the abstract info member pointer in the base class. A destructor was added to the base class to delete this instance, which required a virtual destructor in the abstract info class so that the derived info class destructor gets called.
While not needed until the run-time module is implemented, access functions for the arrays in the additional info of the constant string and number dictionaries were added. These functions simply call the access functions in the derived info classes. However, a type cast to the derived class is needed since the base class defines the additional info instance pointer as an abstract info class pointer.
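The ownership and type cast arrangement can be sketched as follows. This is a simplified illustration, not the project's code: the class names are hypothetical, the dictionary part is omitted, and the static counter exists only to make the destructor call observable.

```cpp
#include <cassert>
#include <vector>

class AbstractInfo {
public:
    // Virtual so that deleting through the base pointer runs the
    // derived class destructor.
    virtual ~AbstractInfo() {}
};

class ConstNumInfo : public AbstractInfo {
public:
    ~ConstNumInfo() override { ++destroyed; }
    void append(double value) { m_values.push_back(value); }
    const double *array() const { return m_values.data(); }
    static int destroyed;  // instrumentation for this sketch only
private:
    std::vector<double> m_values;
};
int ConstNumInfo::destroyed = 0;

// Base class owns the info instance through the abstract pointer.
class InfoDictionary {
public:
    virtual ~InfoDictionary() { delete m_info; }
    InfoDictionary(const InfoDictionary &) = delete;
    InfoDictionary &operator=(const InfoDictionary &) = delete;
protected:
    explicit InfoDictionary(AbstractInfo *info) : m_info(info) {
        assert(m_info != nullptr);
    }
    AbstractInfo *m_info;
};

// The derived dictionary creates the info of the correct type itself, so
// a mismatched instance can no longer be passed in from outside.
class ConstNumDictionary : public InfoDictionary {
public:
    ConstNumDictionary() : InfoDictionary(new ConstNumInfo) {}
    void add(double value) {
        static_cast<ConstNumInfo *>(m_info)->append(value);
    }
    // The cast is safe because only this class sets the pointer.
    const double *array() const {
        return static_cast<const ConstNumInfo *>(m_info)->array();
    }
};
```

Because the derived constructor is the only place the pointer is set, the static cast in the access function cannot meet an info instance of the wrong type.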
[commit 451edd6346]
Saturday, November 30, 2013
New Information Dictionary – Implementation
The information dictionary class was changed to a normal class derived from the dictionary class. The constructor is given a pointer to the information class instance created outside of the dictionary. This pointer is saved in a pointer defined as an abstract information pointer, which can hold any information class pointer derived from the abstract class.
The add routine first adds the dictionary entry by calling the base dictionary class add routine with the token and case sensitivity option along with a pointer to the new entry flag so that it knows if a new entry was added, a removed entry was reused or an entry already exists. If a new entry was added, an element is added to the additional information by calling the add element interface function of the information instance. If the entry did not already exist (it was added or reused), the additional information is set from the token by calling the set element interface function of the information instance. The index is returned.
The remove routine first removes the reference to the dictionary entry by calling the base dictionary class remove routine for the index specified. If the entry was removed because it is no longer used, then the additional information for the element is cleared by calling the clear element interface function of the information instance. The base dictionary class remove routine was modified to return whether the entry was removed (made available to be reused) or not.
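The add and remove logic described above can be sketched in standard C++. This is a simplified sketch under assumptions: keys are plain strings instead of tokens, the case sensitivity option is omitted, a vector of strings stands in for the information instance, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

enum class EntryType { New, Reused, Exists };

// Base dictionary: reference counted entries, freed indexes are reused.
class Dictionary {
public:
    std::size_t add(const std::string &key, EntryType *type = nullptr) {
        EntryType result;
        std::size_t index;
        std::map<std::string, std::size_t>::iterator it = m_index.find(key);
        if (it != m_index.end()) {
            index = it->second;
            result = EntryType::Exists;
        } else if (!m_free.empty()) {
            index = m_free.back();
            m_free.pop_back();
            m_keys[index] = key;
            result = EntryType::Reused;
        } else {
            index = m_keys.size();
            m_keys.push_back(key);
            m_useCount.push_back(0);
            result = EntryType::New;
        }
        if (result != EntryType::Exists) m_index[key] = index;
        ++m_useCount[index];
        if (type) *type = result;
        return index;
    }
    // Returns true if the entry is no longer used and was made available.
    bool remove(std::size_t index) {
        assert(index < m_useCount.size() && m_useCount[index] > 0);
        if (--m_useCount[index] > 0) return false;
        m_index.erase(m_keys[index]);
        m_free.push_back(index);
        return true;
    }
private:
    std::map<std::string, std::size_t> m_index;
    std::vector<std::string> m_keys;
    std::vector<int> m_useCount;
    std::vector<std::size_t> m_free;
};

class InfoDictionary : public Dictionary {
public:
    std::size_t add(const std::string &token) {
        EntryType type;
        std::size_t index = Dictionary::add(token, &type);
        if (type == EntryType::New) m_info.push_back(std::string());  // add element
        if (type != EntryType::Exists) m_info[index] = token;         // set element
        return index;
    }
    void remove(std::size_t index) {
        if (Dictionary::remove(index)) m_info[index].clear();         // clear element
    }
    const std::string &info(std::size_t index) const { return m_info[index]; }
private:
    std::vector<std::string> m_info;  // stands in for the information instance
};
```

The three-way status matters because a reused entry already has an element in the additional information, so it must be set again but not added again.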
The abstract information class defines the interface to the additional information. The functions are defined as virtual functions with no default functionality so that a derived information class does not have to implement a function that it does not need. The constant number and string information classes were changed from holding just a single element to being derived from the abstract class.
The constant number information class contains two vectors for the double and integer values. The add element function just extends the two vectors by one element. The set element function copies the token double and integer values into the respective vectors for the element specified. No clear element function was needed since there is nothing to clear. Two array access functions were implemented to access the data in the two vectors, which will be used at run-time.
The constant string information class contains a vector of string instance pointers. The add element function appends a pointer to a newly created string instance to the vector. The set element function copies the token string into the element specified. The clear element function clears the string for the element specified. Once the string instances are created, they will be reused if dictionary entries are removed. A destructor was implemented to delete all of the string instances. An array access function was implemented to access the data in the vector, which will be used at run-time.
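The interface and the two information classes described above can be sketched as follows. The Token struct and all member names are hypothetical simplifications, and the interface functions are given empty bodies to model the absence of default functionality.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token: only the fields this sketch needs.
struct Token {
    std::string string;
    double value;
    int valueInt;
};

class AbstractInfo {
public:
    virtual ~AbstractInfo() {}
    // Empty defaults: a derived class overrides only what it needs.
    virtual void addElement() {}
    virtual void setElement(std::size_t, const Token &) {}
    virtual void clearElement(std::size_t) {}
};

class ConstNumInfo : public AbstractInfo {
public:
    void addElement() override {
        m_value.resize(m_value.size() + 1);
        m_valueInt.resize(m_valueInt.size() + 1);
    }
    void setElement(std::size_t index, const Token &token) override {
        assert(index < m_value.size());
        m_value[index] = token.value;
        m_valueInt[index] = token.valueInt;
    }
    // No clearElement override: there is nothing to clear for numbers.
    const double *array() const { return m_value.data(); }
    const int *arrayInt() const { return m_valueInt.data(); }
private:
    std::vector<double> m_value;
    std::vector<int> m_valueInt;
};

class ConstStrInfo : public AbstractInfo {
public:
    ConstStrInfo() {}
    ConstStrInfo(const ConstStrInfo &) = delete;
    ConstStrInfo &operator=(const ConstStrInfo &) = delete;
    ~ConstStrInfo() override {
        for (std::string *string : m_value) delete string;
    }
    void addElement() override { m_value.push_back(new std::string); }
    void setElement(std::size_t index, const Token &token) override {
        assert(index < m_value.size());
        *m_value[index] = token.string;
    }
    // The string instance stays allocated so the element can be reused.
    void clearElement(std::size_t index) override {
        assert(index < m_value.size());
        m_value[index]->clear();
    }
    std::string *const *array() const { return m_value.data(); }
private:
    std::vector<std::string *> m_value;
};
```

Keeping the string instances allocated after clearing means a reused dictionary entry only needs a set element call, not a new allocation.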
Information instance pointers were added to the program model class. These instances are created in the constructor and passed to their associated information dictionaries. Both the constant number and string dictionaries are now information dictionaries. The information instances are deleted in the destructor. There are no longer any known memory issues.
[commit b9772d4149]
The add routine first adds the dictionary entry by calling the base dictionary class add routine with the token and case sensitivity option along with a pointer to the new entry flag so that it knows if a new entry was added, a removed entry was reused or an entry already exists. If a new entry was added, an element is added to the additional information by calling the add element interface function of the information instance. If the entry did not already exist, the additional information is set from the token by calling the set element interface function of the information instance. The index is returned.
The remove routine first removes the reference to the dictionary entry by calling the base dictionary class remove routine for the index specified. If the entry was removed because it is no longer used, then the additional information for the element is cleared by calling the clear element interface function of the information instance. The base dictionary class remove routine was modified to return whether the entry was removed (made available to be reused) or not.
The abstract information class defines the interface to the additional information. The functions are defined as virtual functions with no default functionality (they do nothing), so a derived information class does not have to implement a function that it does not need. The constant number and string information classes were changed from holding just a single element to being derived from the abstract class.
The constant number information class contains two vectors for the double and integer values. The add element function just extends the two vectors by one element. The set element function copies the token double and integer values into the respective vectors for the element specified. No clear element function was needed since there is nothing to clear. Two array access functions were implemented to access the data in the two vectors, which will be used at run-time.
The constant string information class contains a vector of string instance pointers. The add element function appends a pointer to a newly created string instance to the vector. The set element function copies the token string into the element specified. The clear element function clears the string for the element specified. Once the string instances are created, they will be reused if dictionary entries are removed. A destructor was implemented to delete all of the string instances. An array access function was implemented to access the data in the vector, which will be used at run-time.
Information instance pointers were added to the program model class. These instances are created in the constructor and passed to their associated information dictionaries. Both the constant number and string dictionaries are now information dictionaries. The information instances are deleted in the destructor. There are no longer any known memory issues.
[commit b9772d4149]
Information Dictionary – New Design
The original design of the information dictionary made the assumption that the additional information would be contained in a vector and was given a structure for the information. The definition was a class template where the information structure was the argument, which was put into the vector, which the information dictionary handled directly. However, in the case of the constant number dictionary, two vectors are needed so that memory is not wasted (see last post).
The details of the additional information need to be separated from the information dictionary. In other words, the information dictionary should not know (or assume) that the additional information is a vector. An abstract information class can be used that defines the interface to the additional information. The interface requires several functions for accessing the information:
add element - add a new element to the end of the additional information
set element - set an element from information in a token; used when a new element is added or a previously deleted element is reused
clear element - clear the contents of an element when the dictionary entry is removed (made available for reuse)
The actual information classes are derived from the abstract class and implement these functions to manipulate their information as required, which could be stored as a vector, two vectors, or something completely different. The information dictionary has no knowledge of the information class internals and simply uses the interface functions.
The information dictionary class can be a normal class derived from the dictionary class containing a reference to the additional information. The abstract information interface functions are used to manipulate the additional information. The information dictionary needs to re-implement these functions from the base dictionary class:
add - adds a new dictionary entry and additional information if not already in the dictionary and returns its index
remove - removes the additional information if the dictionary entry was removed
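The design above can be sketched in standard C++ as follows. This is only an illustration under stated assumptions, not the project's code: the names Token, AbstractInfo, EntryType, Dictionary, InfoDictionary and ConstNumInfo are hypothetical, the case sensitivity option is omitted, and std::string stands in for the Qt string type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical token holding only what this sketch needs.
struct Token { std::string text; double dblValue; int intValue; };

// Interface to the additional information; the defaults do nothing, so a
// derived class only overrides the functions it needs.
class AbstractInfo {
public:
    virtual ~AbstractInfo() = default;
    virtual void addElement() {}
    virtual void setElement(std::size_t, const Token &) {}
    virtual void clearElement(std::size_t) {}
};

enum class EntryType { New, Reused, Exists };

// Simplified base dictionary: use counts plus a stack of freed slots.
class Dictionary {
public:
    std::size_t add(const Token &token, EntryType *type) {
        auto found = m_index.find(token.text);
        if (found != m_index.end()) {
            *type = EntryType::Exists;
            ++m_useCount[found->second];
            return found->second;
        }
        std::size_t index;
        if (m_free.empty()) {           // no free slot: grow the dictionary
            index = m_useCount.size();
            m_keys.push_back(token.text);
            m_useCount.push_back(0);
            *type = EntryType::New;
        } else {                        // reuse the most recently freed slot
            index = m_free.back();
            m_free.pop_back();
            m_keys[index] = token.text;
            *type = EntryType::Reused;
        }
        m_index[token.text] = index;
        ++m_useCount[index];
        return index;
    }
    // Returns true if the entry is no longer used and its slot was freed.
    bool remove(std::size_t index) {
        assert(index < m_useCount.size() && m_useCount[index] > 0);
        if (--m_useCount[index] > 0)
            return false;
        m_index.erase(m_keys[index]);
        m_free.push_back(index);
        return true;
    }
private:
    std::unordered_map<std::string, std::size_t> m_index;
    std::vector<std::string> m_keys;
    std::vector<int> m_useCount;
    std::vector<std::size_t> m_free;
};

// Information dictionary: knows only the AbstractInfo interface.
class InfoDictionary : public Dictionary {
public:
    explicit InfoDictionary(AbstractInfo &info) : m_info(info) {}
    std::size_t add(const Token &token) {
        EntryType type;
        std::size_t index = Dictionary::add(token, &type);
        if (type == EntryType::New)
            m_info.addElement();              // extend the information
        if (type != EntryType::Exists)
            m_info.setElement(index, token);  // new or reused entry
        return index;
    }
    bool remove(std::size_t index) {
        bool removed = Dictionary::remove(index);
        if (removed)
            m_info.clearElement(index);
        return removed;
    }
private:
    AbstractInfo &m_info;
};

// Example information class: two vectors, and no clear element override.
class ConstNumInfo : public AbstractInfo {
public:
    void addElement() override { m_dbl.push_back(0.0); m_int.push_back(0); }
    void setElement(std::size_t i, const Token &t) override {
        m_dbl[i] = t.dblValue;
        m_int[i] = t.intValue;
    }
    double dblValue(std::size_t i) const { return m_dbl[i]; }
    int intValue(std::size_t i) const { return m_int[i]; }
private:
    std::vector<double> m_dbl;
    std::vector<int> m_int;
};
```

In this sketch the information dictionary never touches the vectors directly, so a different information class (for example one holding string pointers) could be substituted without changing InfoDictionary.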
Friday, November 29, 2013
Information Dictionary Issues
The information dictionary class extended the base dictionary class by adding a vector for additional information and was implemented as a class template (see post from October 6). The additional information in the constant string dictionary contained a pointer to a string instance (see post from October 6).
The memory leak in the constant string dictionary was caused by how the information dictionary template and constant string information classes were implemented. The problem occurred when a string in an entry of the information vector was replaced with the same string. A new information instance was created with a new string pointer, which was put into the information vector, and the old string instance was lost (a memory leak).
When an old program line was dereferenced after the new replacement line was encoded, a string being replaced by the same string had its reference count incremented in the dictionary from one to two by the encode, then the dereference decremented the count back to one. However, when the dereference was moved to before the encode, the reference count of the string went from one to zero and the dictionary entry was freed, but not the entry in the information vector. When the new line was encoded, a new string instance was created overwriting the old string instance pointer.
While this problem was not difficult to correct, another issue was discovered, this time with the constant number dictionary where its additional information consisted of a double value and an integer value contained in a structure (see post from October 6). Each double value was aligned on a double boundary (eight bytes) and because an integer is half the size of a double (four bytes), four bytes of padding are inserted by the compiler between the elements in the vector (wasted memory).
The only way to correct this is to separate the two sets of values by having a double value vector and an integer value vector. Unfortunately, the information dictionary template class only allows for a single information vector. A new design is needed for information dictionaries.
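The padding cost can be checked with a small sketch like the one below. The structure name and its members are hypothetical stand-ins for the original information structure, and the actual sizes depend on the platform and compiler; with an eight-byte double and a four-byte integer, the structure would typically occupy sixteen bytes against twelve for the two separate values.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout of the original per-element information structure.
struct ConstNumElement {
    double dblValue;
    int intValue;
};

// Bytes used per element when both values share one structure (with padding).
inline std::size_t combinedBytes() { return sizeof(ConstNumElement); }

// Bytes used per element when the values are kept in two separate vectors.
inline std::size_t separateBytes() { return sizeof(double) + sizeof(int); }

// Total bytes of padding wasted for a dictionary with count entries.
std::size_t wastedBytes(std::size_t count) {
    assert(combinedBytes() >= separateBytes());
    return count * (combinedBytes() - separateBytes());
}
```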
Program – Dereferencing Replaced Lines
When a line is replaced, references to dictionary entries in the old line must be removed. This was taking place after the replacement line was encoded. When a dictionary entry is dereferenced, the entry may no longer be used, causing its slot to be made available for another entry. The new line may add new dictionary entries, but with the encode before the dereferencing, a new entry is added to the end of the dictionary if there are no free slots.
It is desirable for new dictionary entries to use slots that may be freed by the old line being replaced. This helps keep the dictionary from growing larger than it needs to be. Therefore, the dereference call was moved to before the encode call. With this change, the results for encoder test #2 changed slightly, but only with respect to indexes of a couple of dictionary entries.
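The effect of the ordering can be illustrated with a minimal reference-counted dictionary that keeps a stack of freed slots. The SlotDictionary class and the replaceLine function are hypothetical names for this sketch, and a "line" is reduced to a single dictionary reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimal dictionary with use counts and a stack of freed slots.
class SlotDictionary {
public:
    std::size_t add(const std::string &name) {
        for (std::size_t i = 0; i < m_names.size(); ++i) {
            if (m_useCount[i] > 0 && m_names[i] == name) {
                ++m_useCount[i];        // entry already exists
                return i;
            }
        }
        std::size_t index;
        if (!m_free.empty()) {          // reuse a freed slot
            index = m_free.back();
            m_free.pop_back();
        } else {                        // otherwise grow the dictionary
            index = m_names.size();
            m_names.emplace_back();
            m_useCount.push_back(0);
        }
        m_names[index] = name;
        m_useCount[index] = 1;
        return index;
    }
    void remove(std::size_t index) {
        assert(index < m_useCount.size() && m_useCount[index] > 0);
        if (--m_useCount[index] == 0)
            m_free.push_back(index);
    }
    std::size_t size() const { return m_names.size(); }
private:
    std::vector<std::string> m_names;
    std::vector<int> m_useCount;
    std::vector<std::size_t> m_free;
};

// Replace a line referencing oldIndex with a line referencing newName,
// either dereferencing the old line first or encoding the new line first.
std::size_t replaceLine(SlotDictionary &dict, std::size_t oldIndex,
                        const std::string &newName, bool dereferenceFirst) {
    if (dereferenceFirst) {
        dict.remove(oldIndex);
        return dict.add(newName);
    }
    std::size_t index = dict.add(newName);
    dict.remove(oldIndex);
    return index;
}
```

With the dereference first, the new entry lands in the slot just freed and the dictionary does not grow; with the encode first, the new entry is appended and the freed slot is left waiting for a later entry.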
A previously undiscovered memory error was reported on encoder test #2 when running the memory test script. The problem occurred in the constant string dictionary with the allocation of the string pointers for the QString instances. While investigating this problem, another issue was discovered in the constant number dictionary, though this issue is much less serious and only results in wasted memory. The conclusion was that the information dictionary class (currently defined as a template) needs to be redesigned.
[commit f284a33ac8]
Tuesday, November 26, 2013
Program – Saved RPN Lists (Removal)
The translated RPN lists for program lines were being saved in the line information list, which also contains the offset of the line within the program code, the size of the line and the index to the error list if the line has an error. These lists were originally used as the source of the data for the program view and were also used to detect when lines changed (by comparing the translated RPN list of the new line with the saved RPN list).
The use of the RPN lists for the program view was removed when the program code array was implemented. With the decode routine, the program code of the lines can now be converted to an RPN list. The line change detection in the update line routine was modified to decode the existing code of the line being changed into an RPN list, which is compared to the translated RPN list of the new line. With the RPN lists in the line information list no longer being used, this variable was removed along with all references to it.
A problem was found when RPN lists were compared. The strings of tokens other than REM commands, REM operators and string constants (whose string comparison must be case sensitive) should use a case insensitive string comparison. However, the case sensitivity argument was not supplied to the compare call, so the default comparison used was case sensitive. The correct argument was added.
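A minimal sketch of the intended comparison rule follows. It uses std::string rather than the Qt string class, and the Code enumeration, Token structure and function names are hypothetical; only the rule itself (case sensitive for remarks and string constants, case insensitive otherwise) comes from the description above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical subset of token codes.
enum class Code { Rem, RemOp, ConstStr, Var };

struct Token {
    Code code;
    std::string text;
};

// Compare two strings, optionally ignoring letter case.
bool textEqual(const std::string &a, const std::string &b, bool caseSensitive) {
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        unsigned char x = static_cast<unsigned char>(a[i]);
        unsigned char y = static_cast<unsigned char>(b[i]);
        if (caseSensitive ? x != y : std::tolower(x) != std::tolower(y))
            return false;
    }
    return true;
}

// Remarks and string constants keep their case; everything else does not.
bool tokensEqual(const Token &a, const Token &b) {
    if (a.code != b.code)
        return false;
    bool caseSensitive = a.code == Code::Rem || a.code == Code::RemOp
        || a.code == Code::ConstStr;
    return textEqual(a.text, b.text, caseSensitive);
}
```

Passing the sensitivity explicitly, as textEqual requires here, avoids relying on a default that may not match the token type.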
[commit dd3e4b62ed]
Sunday, November 24, 2013
Program – Decoder
Before the internal code of a program line can be recreated, the program code needs to be decoded into an RPN list. Like the encoder, which is part of the program model class because it needs access to the dictionaries, the decoder will also be part of the program model.
The decode routine is given the line information of the line to decode containing the offset of the line within the program code and its size. A new RPN list is created and for each program word in the line, a new token is created and assigned the code and sub-code of the program word. If the code has an operand text function in its table entry (implying the code has an operand word), the operand text function is called to get the text for the token from the operand, which is assigned to the string of the token. The token is added to the RPN list. After all the words of the line are processed, a pointer to the RPN list is returned.
Like the encode routine, the decode routine is a private function within the program model class. To access recreated lines of the program, a new line text routine was added. This routine is given the index to the line and starts by retrieving the information for the line, which is passed to the decode routine. The pointer to the RPN list returned is passed to the recreate routine of the recreator instance (which was added to the program model class). The RPN list is deleted and the string returned from the recreate routine is returned.
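The decode loop described above might look roughly like the following sketch. Everything specific here is an assumption made for illustration: the packing of code and sub-code into one program word (low eight bits and the remaining bits), the TableEntry and LineInfo structures, and the dictionaryText operand function. The sketch also returns the list by value instead of returning a pointer to a new list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct LineInfo { std::size_t offset; std::size_t size; };
struct Token { unsigned code; unsigned subCode; std::string text; };

// Hypothetical operand text function: converts an operand word to text.
using OperandTextFunction =
    std::string (*)(const std::vector<std::string> &dictionary, unsigned operand);

// Hypothetical table entry; a null operandText means no operand word follows.
struct TableEntry { const char *name; OperandTextFunction operandText; };

// Example operand text function that looks the operand up in a dictionary.
std::string dictionaryText(const std::vector<std::string> &dictionary,
                           unsigned operand) {
    assert(operand < dictionary.size());
    return dictionary[operand];
}

// Decode the program words of one line into a list of tokens in RPN order.
std::vector<Token> decode(const std::vector<unsigned> &program,
                          const LineInfo &line,
                          const std::vector<TableEntry> &table,
                          const std::vector<std::string> &dictionary) {
    const std::size_t end = line.offset + line.size;
    assert(end <= program.size());
    std::vector<Token> rpnList;
    for (std::size_t i = line.offset; i < end; ++i) {
        // assumed packing: code in the low 8 bits, sub-code above it
        Token token{program[i] & 0xFFu, program[i] >> 8, std::string()};
        assert(token.code < table.size());
        const TableEntry &entry = table[token.code];
        if (entry.operandText != nullptr) {
            ++i;                        // the operand word follows the code
            assert(i < end);
            token.text = entry.operandText(dictionary, program[i]);
        }
        rpnList.push_back(token);
    }
    return rpnList;
}
```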
The temporary check to prevent encoder test files from being used with the recreate output option (-to) was removed from the tester class. In the tester run routine for encoder test files, after outputting the code of the program and the dictionary entries, each line of the program is output using the line text routine if the recreate output option was selected.
The expected results files for the three encoder tests were created from the encoder test results files with the output of the program added to the end. All the encoder tests are recreated correctly. The test script and batch files were updated to also test the encoder test files with the recreate output option.
[commit 1f24a70152]
Saturday, November 23, 2013
Recreator – RPN Lists (Tagged)
The recreator is now complete and all of the initial set of commands (LET, PRINT, INPUT, and REM) are fully supported. The repository has been tagged v0.6.1 to mark this milestone. Some minor issues were also corrected for this commit including:
- Corrected the expected results for the recreated translator test #12 (INPUT tests).
- Reorganized the access functions in the recreator class.
- Renamed the recreator class is empty access function to output is empty so as not to be confused with the stack is empty function.
- Renamed the recreator class last access function to the more explicit output last character and removed the output string is empty check.
- Corrected some comment formatting and added some missing comments.
- Corrected formatting issues where spaces were incorrectly added instead of tab characters (problems are caused by QtCreator editor bugs where it doesn't always pay attention to the spacing/tab settings).
- Added some missing FLAG option comments (details below).
[commit 0cd7b84700]
Recreator – Colons
Colons between statements are handled by setting the colon sub-code of the last token of a statement. To support the recreation of colons, a check was added to the main recreator loop after the check for the parentheses sub-code so that if the colon sub-code is set, a colon and a space are added to the output string.
This change caused a slight problem with the recreation of the remark operator if there is a colon just before the remark. Two spaces were being added in front of the "'" operator. To prevent this, a check was added after a non-empty output string check to also not add a space if the last character in the output string is already a space.
A new last access function was added to the recreator class that returns the last character in the output string or a null character if the output string is empty. The expected recreated outputs for translator test #16 (colon tests) were updated and are recreated correctly.
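The colon and spacing checks can be sketched as below, with the output string passed explicitly instead of being a member of a recreator class. The function names are hypothetical and the sketch covers only these two checks.

```cpp
#include <cassert>
#include <string>

// Last character of the output string, or a null character if it is empty.
char lastChar(const std::string &output) {
    return output.empty() ? '\0' : output.back();
}

// Main loop check: add a colon and a space if the colon sub-code is set.
void appendColonIfSet(std::string &output, bool colonSubCode) {
    if (colonSubCode)
        output += ": ";
}

// Remark operator: add a separating space only if the output is not empty
// and does not already end with a space (as it does after a colon).
void appendRemarkOperator(std::string &output, const std::string &remark) {
    if (!output.empty() && lastChar(output) != ' ')
        output += ' ';
    output += '\'';
    output += remark;
}
```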
[commit fc90cf0e32]
Recreator – Remarks
There are two codes for remarks, the command code (for the REM command) and the operator code (for the "'" operator). The token contains the string of the remark. Generally, the keyword (REM or "'") along with the remark string is added to the output string. However, there were two issues to be handled.
Unlike all the other commands, no space is required after the REM command when followed by a letter, so a statement like "REMARK A Comment" is valid. The issue is for a REM statement entered as "remark a comment" in all lower case. When recreated, the result would have been the "REMark a comment" statement. So, if the first character of the remark string is lower case, the REM keyword is converted to lower case. This still won't work if something like "Remark A Comment" is entered.
The second issue involves the remark operator. A space is needed before the "'" operator if the operator is not at the beginning of the line, to provide some separation from the previous statement. To determine if the remark is at the beginning of the line, an is empty access function was added to the recreator class that returns whether the output string is empty (which implies this is the beginning of the line since nothing has been added yet).
A single rem recreate function was added to handle both remark codes and a pointer to this function was added to the remark code table entries. To test the lower case check, an all lower case remark statement was added to translator test #15 (REM tests). The expected recreated outputs for this test were updated and are recreated correctly.
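A single function handling both remark codes could be sketched as follows. The RemarkCode enumeration and the function signature are hypothetical, and the sketch assumes the remark string holds everything that followed the keyword, including any leading space.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical codes for the REM command and the "'" operator.
enum class RemarkCode { Command, Operator };

// Add a remark to the output string for either remark code.
void recreateRemark(std::string &output, RemarkCode code,
                    const std::string &remark) {
    if (code == RemarkCode::Command) {
        // use a lower case keyword if the remark starts with a lower case letter
        bool lower = !remark.empty()
            && std::islower(static_cast<unsigned char>(remark[0])) != 0;
        output += lower ? "rem" : "REM";
    } else {
        if (!output.empty())
            output += ' ';   // separate the operator from the previous statement
        output += '\'';
    }
    output += remark;
}
```

As noted above, a mixed case entry such as "Remark A Comment" is not reproduced exactly by this rule; in the sketch it would come back as "remark A Comment".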
[commit 03c73c3de7]
Friday, November 22, 2013
Recreator – INPUT Statements
Like PRINT statements, INPUT statements contain several codes including the input parse item (double, integer or string), input begin, input begin with string prompt, input assign reference (double, integer or string), and the input or input prompt command code. The input begin with string prompt and input command codes could also have the option sub-code set. The separator of the recreator class will be used between these codes to keep track of the separators between the input reference items.
As the codes of the INPUT statement are processed, the resulting INPUT statement is built up by adding to the string on top of the holding stack. The string of the current reference is added to the string being built with a separator in between. After each reference or prompt string, the separator is set to a comma (or a semicolon after the prompt string without the option sub-code). At the end of the statement, the INPUT or INPUT PROMPT keyword is added to the output string along with the built up string that is popped from the stack.
Implementation
The input begin string code follows the input prompt string expression, the string of which will be on top of the stack. The input prompt begin recreate function for the input begin string code sets the separator to a comma if the option sub-code is set, otherwise sets it to a semicolon.
The input assign recreate function contains a local string. If the separator is set, the string is set to the string of the reference that is popped from the stack. The separator is added to the string on top of the stack followed by the string of the reference. The separator is not set for the first reference of an INPUT statement, so no action is needed (the string of the reference is left on the stack). The separator is set to a comma for the next reference.
The input recreate function for both the INPUT and INPUT PROMPT code adds the command keyword to the output string. A space is added followed by the built up string that is popped from the stack. If the command code has the option sub-code set (for keeping cursor on the same line), a semicolon is added to the output string. The separator is cleared for the next statement.
The table class name access function with token pointer argument was modified to return the full name of the code. This includes a space and the second word of a code that has the two word option set along with a second name. This was needed for the INPUT PROMPT command that contains two words.
Pointers to the new input recreate functions were added to the table entries of the various input codes. The input parse item and input begin codes do not produce anything during recreation and their table entries were set to the pointer of the blank recreate function. The expected recreated outputs for translator test #12 (INPUT statements) were updated and are recreated correctly.
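The INPUT recreate functions can be sketched with a small structure holding the stack, the separator and the output string. The structure and member names are hypothetical, and the space added after each separator is an assumption of this sketch rather than something stated above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical recreator state: holding stack, separator and output string.
struct InputRecreator {
    std::vector<std::string> stack;
    char separator = '\0';      // a null character means the separator is not set
    std::string output;

    std::string pop() {
        assert(!stack.empty());
        std::string top = stack.back();
        stack.pop_back();
        return top;
    }

    // Input begin string code: the prompt string is already on the stack.
    void promptBegin(bool option) { separator = option ? ',' : ';'; }

    // Input assign code: append the reference to the string being built.
    void assign() {
        if (separator != '\0') {
            std::string reference = pop();
            assert(!stack.empty());
            stack.back() += separator;
            stack.back() += ' ';        // assumed spacing after a separator
            stack.back() += reference;
        }
        separator = ',';                // separator for the next reference
    }

    // INPUT or INPUT PROMPT command code.
    void command(const std::string &keyword, bool option) {
        output += keyword + ' ' + pop();
        if (option)
            output += ';';              // keep the cursor on the same line
        separator = '\0';               // cleared for the next statement
    }
};
```

In this sketch, pushing "A" and "B" with an assign call after each, followed by the INPUT command, builds "INPUT A, B"; the first assign call leaves "A" on the stack untouched because the separator is not yet set.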
[commit 9c8631000f]
Saturday, November 16, 2013
Recreator – Unary Operator Problems
Many of the other translator tests were being recreated correctly including tests #7 (errors), #8 (more errors), #9 (semicolon errors), #10 (expression errors), #11 (temporary errors), and #14 (parser errors) once the expected results were updated as these did not have the not yet implemented INPUT and REM statements or colons. However, tests #13 (negative constants) and #17 (constants) were not recreated correctly due to problems involving unary operators.
A problem occurred when a negate unary operator preceded a numeric constant. When recreated, there was no space between the negate operator and the number. If this statement is translated again, the negate operator and the number become a negative constant, which is not the same, though technically equivalent. This will cause the line change detection in the program model to incorrectly detect a change. The unary operator recreate function was modified to also add a space after the unary operator if the operand begins with a digit or decimal point.
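The added check can be sketched as a small function. The function name is hypothetical, and only the digit or decimal point rule is shown; any other conditions under which the real function adds a space are left out.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Join a unary operator and its operand, adding a space when the operand
// starts with a digit or decimal point so that retranslating the text does
// not fold the operator and number into a negative constant.
std::string recreateUnary(const std::string &op, const std::string &operand) {
    assert(!op.empty() && !operand.empty());
    unsigned char first = static_cast<unsigned char>(operand[0]);
    std::string result = op;
    if (std::isdigit(first) != 0 || first == '.')
        result += ' ';
    return result + operand;
}
```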
A problem occurred with the negate integer operator. The precedence of the negate integer was incorrectly set to 40 causing parentheses to be added incorrectly during recreation. The precedence should have been 48, the same as the negate double operator, so the table entry was corrected.
A problem occurred when a unary operator followed a power (exponential) operator, a higher precedence operator. The operand was incorrectly surrounded with parentheses. The binary operator recreate function was modified to also check whether the second operand is a unary operator, in which case the operand is not surrounded by parentheses.
These corrections allowed tests #13 and #17 to be recreated correctly. The regression test script was also modified to not ignore white space when comparing to the expected results. Without this change, the first problem above was not detected. The memory test was already not ignoring white space. I'm not sure what the reason was for making the regression test ignore white space.
[commit cf67d09f36]
Recreator – PRINT Statements
There are several codes that make up a PRINT statement including print item (double, integer or string), comma, print function (TAB and SPC), semicolon (only at the end of the statement) and the print command. A recreate function was implemented for each of these codes. Because the PRINT statement is composed of several codes, the separator member variable of the recreator instance is used between the processing of these codes to keep track of separators between the print items.
As the codes of the PRINT statement are processed, the resulting PRINT statement is built up by adding to the string on top of the holding stack. Generally, the string for the current item is popped from the stack, and a separator followed by the string of the current item is appended to the string now on top of the stack, which contains the previous items. At the end of the statement, the PRINT keyword is added to the output string along with the built-up string of print items and separators, which is popped from the stack.
Implementation
The print item recreate function contains a local string variable. If the separator is set from a previous print code, the string is set to it. If this separator is not a space, then a space is added after the separator. The separator is a space if the last print code was a comma (see below). The string on top of the stack is popped and added to the string. If the stack is now empty, then the string is pushed to the stack, otherwise the string is added to the string on top of the stack. The separator is set to a semicolon for the next item if there is one.
Spaces are normally added after a comma like a semicolon, but spaces are not added between multiple commas. The print comma recreate function contains a local string variable. If the holding stack is not empty, the string on top of the stack is popped into the local string. A comma is added to the string (which is empty if the stack was empty, like when there is a comma directly after the PRINT keyword). The local string is pushed to the stack. The separator is set to a space. A space is only added after the last consecutive comma.
The print function recreate function first calls the internal function recreate function (since print functions are translated the same as other internal functions), which will process the print function and its operand and leave the result on top of the stack. The print item recreate function is called to process the print function like any other item.
The semicolon code is only found at the end of a PRINT statement and replaces the print command code. The print semicolon recreate function pops the string from the holding stack, adds a semicolon, and pushes it back to the stack. The print recreate function is called to complete the PRINT statement.
The print recreate function adds the PRINT keyword to the output string. If the holding stack is not empty, a space is added to the output string, and the string is popped from the stack and added to the output string.
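The print item, comma, semicolon and print recreate functions described above can be sketched as follows. This is a simplified stand-in and not the actual source: the `Recreator` structure and all function names are hypothetical, the print function (TAB and SPC) case is left out, and clearing the separator at the end of the statement is an assumption.

```cpp
#include <stack>
#include <string>

// Minimal stand-in for the recreator (hypothetical names throughout).
struct Recreator {
    std::stack<std::string> holding;  // string holding stack
    std::string output;               // recreated statement text
    char separator = '\0';            // null character means "not set"

    std::string pop()
    {
        std::string top = holding.top();
        holding.pop();
        return top;
    }
};

// Print item: prefix the pending separator, then join with earlier items.
void printItemRecreate(Recreator &r)
{
    std::string text;
    if (r.separator != '\0') {
        text = r.separator;
        if (r.separator != ' ') {
            text += ' ';  // "; " between items, a single space after a comma
        }
    }
    text += r.pop();
    if (r.holding.empty()) {
        r.holding.push(text);
    } else {
        r.holding.top() += text;
    }
    r.separator = ';';  // used only if another item follows
}

// Print comma: append a comma with no space between consecutive commas.
void printCommaRecreate(Recreator &r)
{
    std::string text;
    if (!r.holding.empty()) {
        text = r.pop();
    }
    text += ',';
    r.holding.push(text);
    r.separator = ' ';
}

// Print command: emit the keyword and the built-up items, if any.
void printRecreate(Recreator &r)
{
    r.output += "PRINT";
    if (!r.holding.empty()) {
        r.output += ' ';
        r.output += r.pop();
    }
    r.separator = '\0';  // assumption: reset for the next statement
}

// Trailing semicolon: replaces the print command code at the end.
void printSemicolonRecreate(Recreator &r)
{
    std::string text = r.pop();
    text += ';';
    r.holding.push(text);
    printRecreate(r);
}
```

Pushing `A`, processing it as an item, processing two commas, then the items `B` and `C$` and a trailing semicolon builds `PRINT A,, B; C$;` in the output string.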
A new separator is set access function was added to the recreator to check if the separator is set to a specific character, which is used by the print item recreate function. Pointers to the new print recreate functions were added to the table entries of the various codes. The expected recreated outputs for translator test #6 (PRINT statements) were updated and are now recreated correctly.
[commit 1d3a05f610]
Recreator – Interactive Testing
Up to now, the only way to test the recreator was by using the expression and translator test files with the batch test mode. An interactive recreator mode was not implemented since the recreator only supported expressions and creating separate modes for both expressions and commands was unnecessary. With support for commands (just assignments at the moment), an interactive mode for the recreator could be added.
Before implementing the interactive recreator mode, the translator, program unit and recreator instances (which were local variables in the tester class run routine) were changed to member variables of the tester class. As local variables, it was necessary to pass references to them between the various tester routines, which defeated the purpose of having a class. The output stream is now given to the tester constructor and stored in a member variable so that it can be shared by all the class routines, instead of being an argument to the run routine and passed to the other routines.
The tester class was modified to support the new interactive recreator test mode, which is activated with the new "-tr" command line option.
The translate input routine was modified to accept a header string for the list of translated token output. If this header string is not used, then "Output:" is used as before. This routine was also modified to return the pointer to the RPN list if the header string is used. Otherwise, the RPN list is deleted as before and a null pointer is returned, which is also returned when the input line has an error.
The new recreate input routine was added, which starts by calling the translate input routine with the header set to the "Token:" string. If an RPN list is returned (no error detected), the RPN list is recreated and deleted. The recreated output is prefixed with the "Output:" string as its header.
[commit 2dc4b17e97] [commit 0ccfb105e7]
Monday, November 11, 2013
Recreator – Sub-String Assignments
Translated sub-string assignments (previously described on July 14) consist of the string reference, followed by the expressions of the arguments of the sub-string function with the assign sub-string code. This is the same form as when these functions appear in expressions, so sub-strings can be recreated the same way. This is not the case with multiple sub-string assignments where the sub-string assign codes are all at the end of the translated statement with special assign keep codes except for the final code.
At run time, after popping the value to assign and assigning the value, the assign keep codes push the value back to the stack for the next assignment. The final non-keep assignment just pops the value off of the stack. The recreation of multiple sub-string assignments works similarly. The assign keep code will pop the recreated string of the value expression to assign. A sub-string assign keep code recreates the sub-string as a function and pushes the result back to the stack. For all assign keep codes, the reference string is popped, and the current separator is added followed by the value string. This string is pushed back to the stack.
For the final non-keep assign code, the LET keyword is appended to the output string if the assign code has the option sub-code. The value string is popped from the stack, which will also contain the other references in a multiple assignment statement. The string of the final reference is popped and appended to the output string. Finally the value string is appended to the output string.
Implementation
A separator character member variable was added to the recreator class with access functions for getting the separator character, checking if it is set (not the null character), clearing it (setting it to the null character), and setting it.
The new assign string recreate function contains a local string variable and starts by checking if the separator character is not set. An unset separator indicates the first and maybe only reference. For the first reference, the string is set to the assignment operator (an equal) surrounded by spaces and the separator is set to a comma. If the separator is set, then the string is set to it (which is a comma) plus a space. The string on top of the holding stack is popped and appended to the string.
If the assignment token has the sub-string flag set, then the code of the original sub-string function is obtained. This was accomplished by making the original sub-string function code the second associated code in all the table entries of the assign keep and non-keep sub-string codes. The name and the operand count are obtained for the original sub-string function code and the push with operands routine is called to process the arguments of the sub-string function. The resulting string is left on top of the holding stack.
If the assignment code is a keep code (determined if the second associated code index is zero, which is only the case for the keep codes), the value string with the separator and other references prefixed is appended to the string on top of the holding stack. Otherwise the assignment code is the end of the statement. The let recreate function is called to append the LET keyword if the option sub-code is set. The last reference is popped from the holding stack, appended to the output string, and the string with the rest of the references, separators and value is appended to the output string. The separator is cleared for the next command.
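The assign string recreate steps can be sketched as follows. The `AssignToken` structure is a hypothetical simplification: in the description above, the keep status and the sub-string function name come from the table entries (the second associated code), not from fields in the token. The other names are stand-ins as well.

```cpp
#include <stack>
#include <string>

// Hypothetical stand-in for what the table supplies for an assign code.
struct AssignToken {
    bool keep;                  // assign keep code (not the final code)
    bool letOption;             // LET keyword was entered
    std::string subStringName;  // e.g. "LEFT$(", empty if not a sub-string
    int operandCount;           // operands of the sub-string function
};

struct Recreator {
    std::stack<std::string> holding;
    std::string output;
    char separator = '\0';  // null character means "not set"

    std::string pop()
    {
        std::string top = holding.top();
        holding.pop();
        return top;
    }

    // Rebuild "NAME(op1, op2)" from operands that are popped in reverse.
    void pushWithOperands(std::string name, int count)
    {
        std::stack<std::string> local;
        std::string sep = ")";
        for (int i = 0; i < count; ++i) {
            local.push(pop() + sep);
            sep = ", ";
        }
        while (!local.empty()) {
            name += local.top();
            local.pop();
        }
        holding.push(name);
    }
};

void assignStrRecreate(Recreator &r, const AssignToken &token)
{
    std::string text;
    if (r.separator == '\0') {  // first (and maybe only) reference
        text = " = ";
        r.separator = ',';
    } else {
        text = r.separator;
        text += ' ';
    }
    text += r.pop();  // value string, with any references already attached
    if (!token.subStringName.empty()) {
        r.pushWithOperands(token.subStringName, token.operandCount);
    }
    if (token.keep) {
        r.holding.top() += text;
    } else {
        if (token.letOption) {
            r.output += "LET ";
        }
        r.output += r.pop();  // last reference processed
        r.output += text;
        r.separator = '\0';   // cleared for the next command
    }
}
```

With `A$`, `1`, `B$` and `C$` on the holding stack, a keep code followed by a final `LEFT$(` code with the LET option produces `LET LEFT$(A$, 1), B$ = C$`.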
All of the translator assignment tests (#1 through #5) are now recreated correctly except for a single PRINT statement in test #5 (since the PRINT recreation has not been implemented yet).
[commit b1c7dc5bce]
Sunday, November 10, 2013
Recreator – Assignments
Translated assignment statements consist of one (single assignment) or more (list assignment) variables or arrays (references) followed by the expression of the value to assign with an assignment code at the end, which may have the option sub-code indicating the optional LET keyword was entered at the beginning of the statement. When these statements are being recreated, upon reaching the assignment code at the end, the strings of the references will be on the string holding stack (the first or only variable at the bottom) along with the string of the value expression (at the top of the stack).
The new assign recreate function handles the assign double, assign integer, assign double list, assign integer list and assign string list codes. The assign string code could be at the end of a mixed sub-string assignment statement and these will be handled differently (see July 14). The strings on the holding stack need to be appended to the output string in reverse order with the necessary separators between them (a comma between the references when there is more than one, and an equal sign between the last or only reference and the value expression).
To reverse the strings, the assign recreate function contains a local string stack and starts by popping the value expression string from the holding stack and pushing it to this local stack. A local separator string is set to the assignment operator (an equal) with surrounding spaces. The routine loops until the holding stack is empty. For each reference, the separator string is appended and the popped string is pushed to the local stack. The separator string is set to the comma with a space for the next reference.
A new let recreator support function was also added that checks if the assignment token has the option sub-code set and appends the LET keyword with a space to the output string if it does. This function will also be used for multiple mixed sub-string assignment statements. The assign recreate function calls this function and then loops until the local stack is empty appending each string popped to the output string.
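The reversal through a local stack can be sketched like this. The `Recreator` structure and the function signatures are hypothetical simplifications; in particular, the option sub-code is reduced to a plain flag.

```cpp
#include <stack>
#include <string>

struct Recreator {
    std::stack<std::string> holding;  // string holding stack
    std::string output;               // recreated statement text

    std::string pop()
    {
        std::string top = holding.top();
        holding.pop();
        return top;
    }
};

// Append the LET keyword if the option sub-code was set on the token.
void letRecreate(Recreator &r, bool letOption)
{
    if (letOption) {
        r.output += "LET ";
    }
}

void assignRecreate(Recreator &r, bool letOption)
{
    std::stack<std::string> local;
    local.push(r.pop());  // value expression is on top of the holding stack
    std::string separator = " = ";
    while (!r.holding.empty()) {
        // Each reference carries the separator that follows it.
        local.push(r.pop() + separator);
        separator = ", ";
    }
    letRecreate(r, letOption);
    while (!local.empty()) {  // popping restores the original order
        r.output += local.top();
        local.pop();
    }
}
```

With `A`, `B(I)` and `X + 1` on the holding stack and the LET option set, the output string becomes `LET A, B(I) = X + 1`.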
A new recreator stack is empty access function was needed to determine if the holding stack is empty. This was implemented in the recreator class header file since it was only one line of code. Since the recreator top, top append, and append access functions are also only one line of code, their definitions were also moved to the header file. Finally, the pointer to the assign recreate function was added to the table entries of the above mentioned assignment codes.
The results for translator tests #1 to #5 (various assignment tests) were changed to the expected results (inputs with spaces added and upper case keywords). The statements in tests #1 (assignment tests) and #3 (data type assignments) are recreated correctly. Many of the statements in the other tests are also recreated correctly, but the ones that are not contain string and sub-string list assignments yet to be implemented.
[commit 97c666a9d3]
Saturday, November 9, 2013
Recreator – Expressions (Tagged)
Expressions are now fully supported by the recreator. All of the expression tests recreate the original source correctly. The next step is to add support for the first round of commands (LET, PRINT, INPUT and REM). The first tag of the 0.6 development series was added, v0.6.0.
[commit 0dc1f0ebf5]
Recreator – Arrays and Functions
Supporting arrays and functions in the recreator is only necessary for the translator tests. So that arrays and functions are recreated properly for these tests, preliminary recreate functions were created for arrays, defined functions and user functions.
The array and function recreate functions get the name from the string of the token. An open parenthesis is added to the name since it is not stored in the token. The name and the attached count in the RPN item are passed to the push with operands routine. The define function recreate function handles define functions both with and without parentheses, so an open parenthesis is only added to the name if the attached count is not zero. Pointers to these functions were added to the preliminary table entries for the codes of these tokens.
All the expressions in expression test #3 (parenthetical tokens) now recreate the correct output.
[commit b6e8973352]
Translator – Arrays and Functions
A problem with recreating arrays and functions (user and define) is that the translator was not setting a code in these tokens. Preliminary table entries were added for the codes needed. These codes include an array, a define function with no parentheses, a define function with parentheses and a function. There will eventually be codes for each data type, but these four codes are sufficient for now.
In the get operand routine, for the define function with and with no parentheses token types, the appropriate token code is assigned. Since there is only a single code for each of these, the data type of the token is preserved because the table set token routine sets the data type of the token to the one in the table entry for the code.
The process parentheses token routine was modified to set the code of the token to the array code or the function code if the identifier starts with an 'F' character. This routine also processes define functions with parentheses tokens, so the code is not set for these tokens. Later in the routine when the RPN item is created, the attached count is no longer set to zero for array tokens.
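The code selection in the process parentheses token routine might look something like the sketch below. The enumerations, the `Token` structure and the function name are all hypothetical. The test for a leading 'F' is taken literally from the description, so case handling is not addressed.

```cpp
#include <string>

// Hypothetical preliminary codes and token types.
enum class Code { None, Array, DefFuncNoParen, DefFuncParen, Function };
enum class TokenType { Paren, DefFuncParen };

struct Token {
    TokenType type;
    std::string text;  // identifier name
    Code code;
};

// Set the code of a parentheses token; define function tokens already
// received their code in the get operand routine and are left alone.
void setParenTokenCode(Token &token)
{
    if (token.type == TokenType::DefFuncParen) {
        return;
    }
    bool startsWithF = !token.text.empty() && token.text[0] == 'F';
    token.code = startsWithF ? Code::Function : Code::Array;
}
```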
The attached count of the RPN item can no longer be used to detect if there are attached tokens. The attached array pointer will instead be used for this detection by checking if the pointer is set to null. The RPN item text routine was updated to use the attached array pointer to detect attached tokens instead of the attached count. This routine was also changed to use member variables directly instead of using the access functions.
[commit e1919952fa]
Recreator – Internal Functions
The arguments (operands) of internal functions precede the code of the internal functions, so the strings of the operands will be on the string holding stack in reverse order when the internal function token is processed. The arguments of define and user functions work the same way as do the subscripts of arrays.
A generic push with operands routine was implemented that takes the name string of the function or array and the count of operands. It contains a local string stack and a separator string initialized to a closing parenthesis (the string on top of the stack will be the last operand). Looping up to the count, the string of an operand is popped from the holding stack, the separator string is appended, this string is pushed to the local stack, and the separator string is set to a comma and a space for the next operand (if there is one). The strings are then popped from the local stack until it is empty and appended to the name. The name is pushed to the holding stack.
The push with operands routine also works with functions with no arguments (nothing is pushed to the local stack when the count is zero, so only the name is pushed to the holding stack). An internal function recreate function was added that gets the name and the operand count of the internal function code from the table and calls the push with operands routine. A pointer to this function was added to all of the internal function code table entries.
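A minimal sketch of the push with operands routine and the internal function recreate function follows. The holding stack is passed in directly here, and the `TableEntry` structure is a hypothetical stand-in for the table lookup of the name and operand count. The name is assumed to already include the open parenthesis when the function takes arguments.

```cpp
#include <stack>
#include <string>

// Hypothetical table information for an internal function code.
struct TableEntry {
    std::string name;  // e.g. "MID$(" or "RND"
    int operandCount;
};

void pushWithOperands(std::stack<std::string> &holding, std::string name,
                      int count)
{
    std::stack<std::string> local;
    std::string separator = ")";  // follows the last operand, popped first
    for (int i = 0; i < count; ++i) {
        std::string operand = holding.top();
        holding.pop();
        local.push(operand + separator);
        separator = ", ";  // follows every earlier operand
    }
    while (!local.empty()) {  // popping restores the original order
        name += local.top();
        local.pop();
    }
    holding.push(name);  // with a zero count, only the name is pushed
}

void internalFunctionRecreate(std::stack<std::string> &holding,
                              const TableEntry &entry)
{
    pushWithOperands(holding, entry.name, entry.operandCount);
}
```

With `X$`, `1` and `N + 1` on the holding stack, recreating a three operand `MID$(` entry leaves `MID$(X$, 1, N + 1)` on top of the stack.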
While trying expression test #3 (parenthetical tokens), there were blank invalid tokens. These occurred from the hidden conversion codes that were present in the first test expression. A blank recreate function was added that does nothing. A pointer to this function was added to the hidden integer and double conversion codes. Eventually, all codes will have a recreate function.
The expected results for expression test #3 and test #4 (internal functions) were set to the correct expected results (with appropriate spacing). All of the expressions in test #4 produce the correct output. Some of the expressions in test #3 do not produce the correct output because they contain arrays, define functions or user functions, which are not being handled yet.
[commit ac9ab4774d]
Recreator – Operand Recreate Function
The method of using an unset (null) recreate function pointer to indicate an operand is problematic during development. It is also possible that the recreate function for a code is not yet implemented. Further, using a blank token string to indicate an unimplemented code is also a problem since this also could mean that the code does not produce any output (take for example the hidden conversion codes).
Therefore, an operand recreate function was implemented, which simply pushes the string of the token to the string holding stack. A pointer to this function was added to the constant and variable code table entries. The main recreate routine was modified to surround the string of a token code with no recreate function with question marks. A check was also needed for whether the code in the token is valid before retrieving its recreate function.
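The dispatch can be sketched as below. The table is reduced to a vector of function pointers indexed by code, and all names are hypothetical. Two details are assumptions: an invalid code is treated the same as a missing recreate function, and the question mark string is pushed to the holding stack.

```cpp
#include <cstddef>
#include <stack>
#include <string>
#include <vector>

struct Token {
    int code;
    std::string text;
};

struct Recreator {
    std::stack<std::string> holding;
};

using RecreateFunction = void (*)(Recreator &, const Token &);

// Operands (constants and variables) just push their string.
void operandRecreate(Recreator &r, const Token &token)
{
    r.holding.push(token.text);
}

// Process one token; a null table entry means "not implemented yet".
void recreateToken(Recreator &r, const Token &token,
                   const std::vector<RecreateFunction> &table)
{
    bool valid = token.code >= 0
        && static_cast<std::size_t>(token.code) < table.size();
    RecreateFunction function = valid ? table[token.code] : nullptr;
    if (function == nullptr) {
        r.holding.push("?" + token.text + "?");  // flag the missing support
    } else {
        function(r, token);
    }
}
```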
[commit 9734165b8b]
Recreator – Error Checks / Expression Mode
Some error checks were added to the recreator along with an expression test mode. The first check is to make sure the string holding stack is empty upon returning from the main recreate routine. Any items left on the stack are popped and appended to the output string prefixed by "NotEmpty" to indicate an error before returning.
However, the expression test mode left the resulting string on the holding stack. Therefore, a check was required for the expression test mode. An expression mode flag argument was added to the recreate routine. When set, the string on top of the holding stack is popped and appended to the output string. This is followed by the check for an empty stack.
An error check was also added to the pop routine to make sure the holding stack is not empty. If the stack is empty, the "<Empty>" string is returned to indicate an error.
The tester translate input routine was modified to pass the expression mode flag to the recreate routine and to just use the output string returned (it is no longer necessary to pop from the holding stack of the recreator for expression mode).
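These checks could look roughly like the sketch below. It is only an illustration using the standard library; the HoldingStack and finishRecreate names are hypothetical, and the exact formatting of the "NotEmpty" prefix (a leading space and a colon) is an assumption.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical holding stack with the empty check in the pop routine.
class HoldingStack {
public:
    void push(std::string text) { m_items.push_back(std::move(text)); }
    std::string pop()
    {
        if (m_items.empty()) {
            return "<Empty>";  // error marker instead of popping nothing
        }
        std::string text = std::move(m_items.back());
        m_items.pop_back();
        return text;
    }
    bool empty() const { return m_items.empty(); }
private:
    std::vector<std::string> m_items;
};

// End of the main recreate routine (the RPN list loop is omitted).
std::string finishRecreate(HoldingStack &stack, std::string output,
    bool expressionMode)
{
    if (expressionMode) {
        output += stack.pop();  // result of the expression
    }
    // anything still on the stack indicates an error
    while (!stack.empty()) {
        output += " NotEmpty:" + stack.pop();
    }
    return output;
}
```

In expression mode with a single string on the stack, the string is returned as the output. If something was left behind, it appears after the result with the error prefix.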
[commit 2980b89a24]
Thursday, November 7, 2013
Recreator – Unnecessary Parentheses
Parentheses in an expression control the order of evaluation and are removed during the translation process. The parentheses are recreated by looking at the precedences of the operators. However, parentheses that are not required can be added to an expression. Take the expression A+(B*C): multiply has higher precedence than add, so the translation becomes A B C * +, which is the same as the translation of the expression A+B*C.
So that these unnecessary parentheses are recreated, the translator sets the parentheses sub-code on tokens. The above example expression is translated as A B C *')' +. The translator also allows for extra sets of parentheses; the expression A+((B*C)) is translated to A B C *')' ) +. A closing parentheses code is added when the previous token already has the parentheses sub-code. For a third pair of parentheses, the closing parentheses token gets the parentheses sub-code, and so on. At run-time, parentheses codes and sub-codes are ignored.
Implementation
A new parentheses recreate function was added, which calls the pop with parentheses function with true as the argument to add parentheses when popping the top string. The pop with parentheses function was modified to optionally return the precedence value and unary operator flag from the item popped from the holding stack. The string with parentheses, along with the precedence value and unary operator flag, is then pushed back to the holding stack.
In the recreate routine, after processing an item in the RPN list, a check was added for the parentheses sub-code; if it is set, the parentheses recreate function is called. The parentheses recreate function was also added to the table entry of the closing parentheses code.
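As a rough sketch of the parentheses recreate function (standard library instead of Qt; StackItem, Recreator and parenRecreate are hypothetical names, and the optional return values are modeled here as pointer arguments, which is an assumption):

```cpp
#include <string>
#include <vector>

struct StackItem {
    std::string text;
    int precedence;
    bool unaryOperator;
};

class Recreator {
public:
    void push(const std::string &text, int precedence, bool unaryOperator)
    {
        m_stack.push_back({text, precedence, unaryOperator});
    }
    const StackItem &top() const { return m_stack.back(); }
    // Pops the top string, optionally surrounding it with parentheses and
    // optionally returning the precedence and unary operator flag.
    std::string popWithParens(bool addParens, int *precedence = nullptr,
        bool *unaryOperator = nullptr)
    {
        StackItem item = m_stack.back();
        m_stack.pop_back();
        if (precedence != nullptr) *precedence = item.precedence;
        if (unaryOperator != nullptr) *unaryOperator = item.unaryOperator;
        return addParens ? "(" + item.text + ")" : item.text;
    }
private:
    std::vector<StackItem> m_stack;  // string holding stack
};

// Called for the parentheses sub-code and the closing parentheses code:
// wrap the top string and push it back with its precedence and flag.
void parenRecreate(Recreator &recreator)
{
    int precedence;
    bool unaryOperator;
    std::string text = recreator.popWithParens(true, &precedence,
        &unaryOperator);
    recreator.push(text, precedence, unaryOperator);
}
```

Calling parenRecreate twice on the string "B * C" leaves "((B * C))" on top of the stack, still carrying the precedence of the multiply operator.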
During the testing of these changes, a problem was found in the translator where the parentheses sub-code was not being added to a closing parentheses token (it incorrectly added another closing parentheses token to the output list). This was due to the translator check pending parentheses routine checking the top item of the done stack (the last item added to the output list) to see if it already had a parentheses sub-code. It should have been checking the last item in the output list instead (the closing parentheses tokens are not pushed to the done stack).
A few additional expressions were added to expression test #2 (parentheses tests) for testing extra parentheses. Both the expected translated and recreated results were updated. All of the expressions in test #2 are now being recreated correctly.
[commit 385cc5925d]
Wednesday, November 6, 2013
Recreator – Parentheses (Unary Operators)
Handling the recreation of parentheses with unary operators is similar to that of binary operators, except that there is only one operand and there is one additional issue. As with binary operators, if the precedence of the unary operator is higher than that of the operand on top of the stack, parentheses should be added around the operand. However, parentheses should only be added if the operand on top of the holding stack is not another unary operator. Consider the expression and its translation:
-NOT A%        A% NOT Neg%
When the Neg% operator is being processed, its precedence is higher than the NOT on top of the stack (actually the top contains the string "NOT A%" with the precedence of the NOT operator). With just the precedence check, parentheses would be added around the "NOT A%" expression, which is not correct. There needs to be an additional check to not add parentheses if the top item is a unary operator expression.
Implementation
A unary operator flag variable was added to the stack item to indicate if the holding stack item is a unary operator sub-expression. The push access function was modified to take an optional unary operator flag value that is pushed with the string and precedence. The default flag is set to false and is only set to true by the unary operator recreate function.
The unary operator recreate function was modified to get the precedence of the operator being processed from the table. The pop call of the operand was changed to call the new pop with parentheses with the argument set to whether the top stack item is not a unary operator and the operator precedence is higher than the precedence of the top stack item.
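The unary operator handling might be sketched as follows. This is an illustration only: it uses the standard library, the names are hypothetical, the operator name and precedence are passed in directly instead of being looked up in the table, and it assumes that a larger precedence value means the operator binds tighter.

```cpp
#include <climits>
#include <string>
#include <vector>

struct StackItem {
    std::string text;
    int precedence;      // assumption: larger value binds tighter
    bool unaryOperator;  // set only for unary operator sub-expressions
};

class Recreator {
public:
    // operands default to the highest precedence and a false flag
    void push(const std::string &text, int precedence = INT_MAX,
        bool unaryOperator = false)
    {
        m_stack.push_back({text, precedence, unaryOperator});
    }
    const StackItem &top() const { return m_stack.back(); }
    std::string popWithParens(bool addParens)
    {
        std::string text = m_stack.back().text;
        m_stack.pop_back();
        return addParens ? "(" + text + ")" : text;
    }
private:
    std::vector<StackItem> m_stack;  // string holding stack
};

void unaryOperatorRecreate(Recreator &recreator, const std::string &name,
    int precedence, bool wordOperator)
{
    const StackItem &operand = recreator.top();
    // no parentheses around another unary operator expression
    bool addParens = !operand.unaryOperator
        && precedence > operand.precedence;
    std::string text = name;
    if (wordOperator) {
        text += ' ';  // separator only for word operators like NOT
    }
    text += recreator.popWithParens(addParens);
    recreator.push(text, precedence, true);
}
```

Processing the translation A% NOT Neg% with this sketch yields "-NOT A%" without parentheses, while a negate applied to the binary sub-expression "A + B" yields "-(A + B)".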
There were insufficient expressions in expression test #2 (parenthetical expressions) to test the various situations with unary operators, so several were added. The translated test expected results were also updated for the new expressions. The expressions that still do not match are due to unnecessary entered parentheses.
[commit a5951ecfc3]
Tuesday, November 5, 2013
Recreator – Parentheses (Binary Operators)
Parentheses are removed from expressions during translation. The binary operator recreate function needs to recreate the parentheses for operators depending on their precedences. Consider these expressions with their translations:
A * B + C * D    (A + B) * (C + D)
A B * C D * +    A B + C D + *
The translations of these two expressions have a similar form. The first expression will be recreated correctly since multiply has higher precedence than add. However, the second expression without parentheses, "A + B * C + D", does not mean the same thing since add has lower precedence than multiply, making the parentheses required.
Parentheses are required around an operand if the precedence of the operator is higher than the operand. Parentheses are also required around the second operand if the precedence of the operator is the same as the operand since operators of the same precedence are processed from left to right.
Implementation
A precedence variable was added to the stack item to hold the precedence of an operator sub-expression. The push access function was modified to take an optional precedence value that is pushed with the string. The default precedence is set to the highest precedence for when operands like constants and variables are pushed. A top access function was added so that the item on top of the string holding stack can be accessed.
A new pop with parentheses access function was added that takes a flag argument, which when set will surround the string operand popped from the string holding stack with parentheses.
The binary operator recreate function was modified to get the precedence of the operator being processed from the table. The pop call of the second operand was replaced with a call to the new pop with parentheses with the argument set to whether the operator precedence is higher than or the same as the precedence of the item on top of the stack. Similarly the first operand is popped with parentheses if the operator precedence is higher than the top item.
Instead of using the top append access function, the string of the operator expression is built in a local string. The string is first set to the second operand with parentheses if needed. The string is then set to the first operand with parentheses if needed, plus a space, plus the operator name, plus another space, plus the current value of the string with the second operand. Finally, this string is pushed to the string holding stack with the precedence of the operator.
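The steps above can be sketched as follows. This is not the project code: it uses the standard library, the names are hypothetical, the operator name and precedence are passed in rather than obtained from the table, and it assumes that a larger precedence value means the operator binds tighter.

```cpp
#include <climits>
#include <string>
#include <vector>

struct StackItem {
    std::string text;
    int precedence;  // assumption: larger value binds tighter
};

class Recreator {
public:
    // operands (constants, variables) default to the highest precedence
    void push(const std::string &text, int precedence = INT_MAX)
    {
        m_stack.push_back({text, precedence});
    }
    const StackItem &top() const { return m_stack.back(); }
    std::string popWithParens(bool addParens)
    {
        std::string text = m_stack.back().text;
        m_stack.pop_back();
        return addParens ? "(" + text + ")" : text;
    }
private:
    std::vector<StackItem> m_stack;  // string holding stack
};

void binaryOperatorRecreate(Recreator &recreator, const std::string &name,
    int precedence)
{
    // second operand: parentheses if operator precedence is higher or same
    std::string text
        = recreator.popWithParens(precedence >= recreator.top().precedence);
    // first operand: parentheses only if operator precedence is higher
    text = recreator.popWithParens(precedence > recreator.top().precedence)
        + ' ' + name + ' ' + text;
    recreator.push(text, precedence);
}
```

Feeding the translation A B + C D + * through this sketch gives "(A + B) * (C + D)", and A B * C D * + gives "A * B + C * D". The same-precedence rule for the second operand is what turns A B C - - back into "A - (B - C)".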
The expected outputs for expression test #2 (parenthetical expressions) were set to the inputs. Many of these expressions match the inputs since the precedence for binary operators is now being handled. The expressions that don't match involve unary operators (no precedence checking yet) and unnecessarily entered parentheses.
[commit 694e224aee]
Sunday, November 3, 2013
Recreator – Simple Expressions (Implementation)
The recreator is a separate new class since the RPN list will already have been decoded from the program model (with dictionary lookups to change indexes back into their original names), so there is no need to access either the program model or the dictionaries. The constructor does nothing more than set the member table instance reference. There are only two additional member variables, the string holding stack and the output string.
The output string will contain the recreated text of a program line and will be appended to as the RPN list is processed. Since the various recreate functions are outside of the recreator class, there is a single append access function for appending a string to the output string.
The holding stack temporarily contains strings as the RPN list is processed and is used to reverse the RPN list. It is defined as a QStack of the StackItem structure, which contains a string. The holding stack was not defined directly as a stack of strings because an additional item will be needed. There are several access functions including a push function to push a string onto the stack, a pop function to pop a string from the stack, and a top append function to append a string to the top string on the stack.
Besides the class member functions, the recreator source file also contains several general recreate functions, which are outside the class so that pointers to them can be put into the table entries. These include the unary operator and binary operator recreate functions, which work as described in the previous post.
A recreate function type was added to the table entry structure with an access function. The arguments of a recreate function include a reference to the recreator instance and a pointer to the RPN item from the RPN list. Recreate functions were added to all of the operator, constant, and variable codes. A constant string recreate function was implemented for the constant string code that adds the necessary double quotes.
A recreator instance reference argument was added to the tester class translate input routine. For expressions, the recreator does not return an output string (there is no command to pop the string of the final expression from the holding stack and append it to the output string). Instead of adding a special expression mode to the recreator to do this, and since the pop access function is public (for the recreate functions), the translate input routine gets the output string using this function.
The expressions in expression test #1 (simple expressions) are properly recreated and match the inputs except expectedly for spacing and case (lower case word operators are output in upper case). While the other expressions produce recreated output, the output is not correct because parentheses, internal functions, etc. are not yet supported.
[commit 4bdd513c2c]
Saturday, November 2, 2013
Recreator – Simple Expressions
The recreator will work similar to the run-time module. The run-time module will push operand values to a value stack to be popped by operators, functions and commands to perform some operation, which may push results back to the stack. The recreator will push the strings of the operands to a string holding stack. These strings will be popped by operators, functions and commands, which may push modified strings back to the stack or append the strings to the output string. This will make more sense by considering this statement and its translation:
A + B
A B +
At the A operand, the string "A" will be pushed to the holding stack. The same goes for the B operand, with the "B" string being pushed. The plus operator will pop the second operand, the "B" string, from the stack. The string for the operator will be appended with surrounding spaces to the operand now on top of the stack; specifically, the " + " string gets appended to the "A" string, leaving "A + " on top of the stack. The previously popped second operand is then appended to the top of the stack, leaving the "A + B" string on top of the stack, which may be an operand for the next operator.
For unary operators, the string for the operator is created. A separator space is appended only for a word operator like NOT, otherwise no space is needed (for example the "-" unary operator). The string of the single operand on top of the stack is popped and appended to the operator string. The resulting string is pushed back to the stack.
For simple expressions, only support for operands (variables and constants) and operators is needed. For string constants some processing is needed: all double quotes in the string of the constant need to be changed to two double quotes, and the entire string needs to be surrounded by a pair of double quotes.
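The string constant processing is small enough to sketch directly. The function name below is hypothetical and the standard string class is used in place of the Qt one:

```cpp
#include <string>

// Recreate the source text of a string constant: double each embedded
// double quote and surround the whole string with double quotes.
std::string constStrText(const std::string &value)
{
    std::string text = "\"";
    for (char c : value) {
        if (c == '"') {
            text += '"';  // an embedded quote becomes two quotes
        }
        text += c;
    }
    text += '"';
    return text;
}
```

For example, the constant value say "hi" is recreated as "say ""hi""", and an empty string constant is recreated as a pair of double quotes.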
The table entry for each code will have a pointer to a recreate function. Since operands already have their string in the token of the item, this string just needs to be pushed to the string holding stack (with the exception of string constants). Since this is trivial, a null pointer will indicate that the string of the token should be pushed to the string holding stack. For string constants, there will be a constant string recreate function. Since all binary operators work the same, there only needs to be a single binary operator recreate function. Likewise for the unary operators.
The recreate process will loop through each item in the RPN list. If there is no recreate function, the string of the token is pushed onto the string holding stack. Otherwise the recreate function is called to process the token. At the end of the RPN list, the resulting output string is returned. For expression mode, nothing gets appended to the output string, so the tester class will need to pop the result string of the expression from the string holding stack.
Table – Simplified Instance Creation
The new recreator class will also require a reference to the single table instance like the other classes (parser, translator, and program model). The initialization of the table reference members of these classes was not consistent - some required the table instance reference to be passed to the constructor and others accessed the static table instance directly to initialize their member. The single table instance creation was also convoluted with several routines:
initialize - static table function to be called once (from main) to create the single instance; it also checked that it was not called more than once and whether the table entries had errors
has errors - static table function to return whether the table has errors, used by the tester class to determine if the table entries had errors, in which case it output them and aborted the application
error list - static table function to return the list of errors, used by the tester class to output the table entry errors before aborting the application
setup and check - static table function called by initialize after creating the single table instance to set up and check the table entries
instance - static table function to return a reference to the single table instance; it contained checks to make sure that initialize had been called and that the table entries had no errors, and aborted the application for these errors
The table constructor had an argument for the pointer to the table entry array and only set the table entry member. Most of these routines were unnecessary, and the single table instance creation was modified to use these routines:
instance - static table function to return the single table instance, which creates the single table instance upon the first call
constructor - modified to take both the pointer to the table entry array and the count of table entries; it now does the setup and check functionality, and if there are errors, they are reported and the application is aborted
Now if there are table entry errors, the constructor aborts the application and no other class needs to check for table errors. This check was removed from the tester class. Previously when starting in GUI mode, there was no check for table errors. There is no longer a need to initialize the table in the main function or check to make sure it has been initialized.
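One common way to get this "create on first call" behavior in C++ is a function-local static, sketched below. Whether the project uses exactly this technique is an assumption; the TableEntry contents, the error check and the Parser class shown here are hypothetical placeholders.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

struct TableEntry {
    const char *name;  // hypothetical minimal entry
};

static const TableEntry tableEntries[] = { {"Const"}, {"Var"}, {"Add"} };

class Table {
public:
    // Returns the single instance, creating it upon the first call.
    static Table &instance()
    {
        static Table table(tableEntries,
            sizeof(tableEntries) / sizeof(tableEntries[0]));
        return table;
    }
    std::size_t count() const { return m_count; }
    Table(const Table &) = delete;
    Table &operator=(const Table &) = delete;
private:
    // The constructor does the setup and check, aborting on errors.
    Table(const TableEntry *entry, std::size_t count)
        : m_entry(entry), m_count(count)
    {
        for (std::size_t i = 0; i < count; ++i) {
            if (entry[i].name == nullptr) {  // placeholder error check
                std::fprintf(stderr, "table entry %zu has no name\n", i);
                std::abort();
            }
        }
    }
    const TableEntry *m_entry;
    std::size_t m_count;
};

// Other classes initialize their table reference member from instance().
class Parser {
public:
    Parser() : m_table(Table::instance()) {}
    const Table &table() const { return m_table; }
private:
    Table &m_table;
};
```

With this arrangement there is nothing to call from main, and every class that asks for the instance gets a reference to the same checked table.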
The method the program model uses to initialize its table instance reference is the preferred method. The instance function is used to initialize the table instance reference member. The parser and translator classes were modified to use this method.
[commit 7a01ac428b]
Friday, November 1, 2013
Recreator – Testing
The recreator will take an RPN list as input. Consideration was given to how the recreator will be tested. There are already many tests for testing the translation of expressions and statements into RPN lists (the expression and translator test input files). Since these are available, they will also be used for testing the recreator.
To avoid having to duplicate these into "recreator" test files, they will be used as is with a new "-to" command line option to activate the recreator on the translator output. When this option is used instead of the "-t" option, the expression or translator input file will be translated as before, and then the RPN list will be passed to the recreator.
Since the recreator has not been implemented yet, to verify this code is working correctly, the same RPN text output is used, but prefixed with the "TEST:" string. Encoder test input files are not yet supported; support will be added once the recreator is working with the expression and translator tests.
All of the test scripts and Windows batch file were updated to run all the expression and translator test files with the new "-to" test option. Recreator output files have the ".out" extension to not conflict with the ".txt" extension used for the other test output files. Recreator development will now commence.
[commit 9b52fc6d83]
Thursday, October 31, 2013
Table – Has Operand Determination
During the development of the encoder before all of the routines were implemented, it was necessary to have a has operand flag in the table entries for codes that have an operand. This is no longer necessary as the equivalent can now be determined by whether the code has an operand text or an encode function pointer.
The has operand flag was removed from the table entries of these codes. A has operand table access function was added that returns whether the table entry of the code has an operand text function. The tests for the has operand flag were replaced with this new function or use the presence of the operand text or encode function pointer (by using the table access functions for these).
The program model operand text function is also no longer needed, which had special allowance for whether there was no operand text function for a code with the has operand flag. The encode routine also had special allowance for this same condition, but this is also not necessary since all codes with operands have an encode function.
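The idea of deriving "has operand" from the presence of a function pointer can be sketched as below. The table layout, the function signature and all names are hypothetical and only serve to illustrate the technique.

```cpp
#include <string>

// Hypothetical table entry: the operand text function pointer is null
// for codes that have no operand, so no separate flag is needed.
struct TableEntry {
    const char *name;
    std::string (*operandText)(int operand);
};

std::string varText(int operand)
{
    return "var" + std::to_string(operand);
}

const TableEntry table[] = {
    {"Add", nullptr},  // operator: no operand
    {"Var", varText}   // variable: has an operand
};

// Has operand access function: true if the code has an operand text
// function.
bool hasOperand(int code)
{
    return table[code].operandText != nullptr;
}
```

This keeps the two pieces of information from drifting apart, since a code cannot claim to have an operand without also providing the function that handles it.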
[commit 1e8b92e47c]
Wednesday, October 30, 2013
Program – Line Change Detection
The translated RPN lists for program lines are currently being saved in the line information list. This was only temporary until the RPN lists could be encoded into program code and stored. This mechanism was left in place since it is still being used for line change detection. As described in the last post, RPN lists will still be compared to detect line changes except that the program code will be decoded into an RPN list.
However, the form of the decoded RPN list will be slightly different than a translated RPN list. The previous method of RPN list comparison was to compare each RPN item, where the token, the attached token count, and if non-zero, each attached token were compared. For the token comparison, the following were compared: the token type, the data type for constants, the code for commands and operators except the REM command and operator, the code for internal functions, the string for other token types, the reference flag, and the sub-code.
The tokens in a decoded RPN list will only have the code, sub-code (only program sub-codes), and a string (for all types). Therefore, only these members of the token are compared. For the RPN items, it is unnecessary to compare the attached tokens since these only refer to other tokens in the list, and the token comparisons are sufficient to catch differences, so only the attached token count needs to be compared. In fact, a decoded RPN list will not have attached tokens, only the count, since these will not be necessary for recreation.
Token comparison boils down to comparing the code, the sub-code masked by the program-only sub-codes, and the string. There is one other issue when comparing the strings that would cause the previous comparison to incorrectly detect a change. A case-insensitive comparison must be used except for the REM, REM operator and string constant codes.
While testing these changes in the GUI, a token (memory) leak was discovered, which occurred when the line did not change. The problem was that the RPN list of the replacement line was not used in this case, but was also not deleted. This was corrected in addition to updating the RPN item and token comparison routines.
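The comparison rules above can be sketched roughly as follows. This is only an illustration in standard C++ (the actual code presumably uses Qt strings); the `Token` and `RpnItem` structures, the code names and the sub-code bit values are assumptions made up for the example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical codes and sub-code bits, for illustration only.
enum Code { Rem_Code, RemOp_Code, ConstStr_Code, Var_Code, Print_Code };

const int Paren_SubCode = 0x01;  // assumed to be stored in the program
const int Colon_SubCode = 0x02;  // assumed to be stored in the program
const int Used_SubCode = 0x10;   // assumed to be used by the translator only
const int ProgramMask_SubCode = Paren_SubCode | Colon_SubCode;

struct Token
{
    Code code;
    int subCode;
    std::string string;
};

struct RpnItem
{
    Token token;
    int attachedCount;  // a decoded list only carries the count
};

// Case-insensitive string equality (ASCII only in this sketch).
bool equalNoCase(const std::string &a, const std::string &b)
{
    return a.size() == b.size()
        && std::equal(a.begin(), a.end(), b.begin(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) == std::tolower(y);
            });
}

// Remarks and string constants must match exactly, case included.
bool isCaseSensitive(Code code)
{
    return code == Rem_Code || code == RemOp_Code || code == ConstStr_Code;
}

bool tokensEqual(const Token &a, const Token &b)
{
    if (a.code != b.code) {
        return false;
    }
    // ignore sub-code bits that are never stored in the program
    if ((a.subCode & ProgramMask_SubCode)
            != (b.subCode & ProgramMask_SubCode)) {
        return false;
    }
    return isCaseSensitive(a.code)
        ? a.string == b.string : equalNoCase(a.string, b.string);
}

bool itemsEqual(const RpnItem &a, const RpnItem &b)
{
    // attached tokens themselves are not compared, only their count
    return tokensEqual(a.token, b.token)
        && a.attachedCount == b.attachedCount;
}
```

In this sketch a variable entered as Total and later as TOTAL compares equal, while two remarks differing only in case do not.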
[commit 16c85dfe79]
Tuesday, October 29, 2013
Recreator – Design Considerations
Originally, there was going to be another module, the decoder, which would convert internal program code into an RPN (Reverse Polish Notation) token list (like the translator produces). The recreator would then convert the RPN list back into the program text (close to the originally entered code). Very early on (see December 19, 2009) this step was considered simple and unnecessary and so was combined into the recreator.
The program model needs to detect when a line reported as changed has actually changed. The edit box sometimes reports changed lines when the line has not actually changed. The user could also have simply added spaces to the line (which are not stored) or changed the case of a keyword, which would not result in a change to the internal program code. As previously mentioned, comparing the internal code of the current line to the new line is problematic.
The new line would first have to be encoded, which will affect the dictionaries. Either this would need to be undone, or the old line removed first to dereference dictionary entries only to have them referenced again when putting in the new line. This is acceptable for simple dictionaries (variables, constants, remarks, etc.), but is much more involved with the blocking commands (IF-END IF, FOR-NEXT, etc.), where the block will probably be kept in a block dictionary.
A better alternative is to convert the program line into an intermediate RPN token list (decoded). The translated RPN list of the new line can then be compared to the decoded RPN list of the current line. Since the decode operation can be contained in a single routine like the encode operation, the program model will own this decode routine also.
Since the decode routine will be present, it makes sense for the recreator to take a decoded RPN list as input to convert to the program text. This method has another advantage for testing. Currently, the test code translates expressions and statements into RPN lists. These RPN lists can then be passed to the recreator for testing. Therefore, there will be a new test mode that takes these existing tests, translates them to RPN lists, and then recreates them back to text.
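To make the idea concrete, here is a toy sketch of change detection by decoding. Everything in it (the `RpnItem` and `ProgramWord` structures, the single string dictionary, the function names and the exact-match comparison) is an assumption invented for the example. The point is only that decode is a read-only operation, so checking for a change leaves the dictionary untouched, whereas encoding the new line would not.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for the real structures.
struct RpnItem { int code; std::string string; };
using RpnList = std::vector<RpnItem>;

struct ProgramWord { int code; int operand; };  // operand: index or -1

class ProgramModel
{
public:
    // Encoding may add dictionary entries (it modifies the model).
    std::vector<ProgramWord> encode(const RpnList &rpnList)
    {
        std::vector<ProgramWord> lineCode;
        for (const RpnItem &item : rpnList) {
            int operand = item.string.empty()
                ? -1 : dictionaryIndex(item.string);
            lineCode.push_back(ProgramWord{item.code, operand});
        }
        return lineCode;
    }

    // Decoding only reads the dictionary, so it can be const.
    RpnList decode(const std::vector<ProgramWord> &lineCode) const
    {
        RpnList rpnList;
        for (const ProgramWord &word : lineCode) {
            std::string string;
            if (word.operand >= 0) {
                string = m_dictionary[static_cast<std::size_t>(word.operand)];
            }
            rpnList.push_back(RpnItem{word.code, string});
        }
        return rpnList;
    }

    // Compare the translated new line against the decoded current line.
    bool lineChanged(const RpnList &translated,
        const std::vector<ProgramWord> &currentCode) const
    {
        RpnList current = decode(currentCode);
        if (translated.size() != current.size()) {
            return true;
        }
        for (std::size_t i = 0; i < current.size(); ++i) {
            if (translated[i].code != current[i].code
                    || translated[i].string != current[i].string) {
                return true;
            }
        }
        return false;
    }

    std::size_t dictionarySize() const { return m_dictionary.size(); }

private:
    // Returns the index of the string, adding it if not present.
    int dictionaryIndex(const std::string &string)
    {
        for (std::size_t i = 0; i < m_dictionary.size(); ++i) {
            if (m_dictionary[i] == string) {
                return static_cast<int>(i);
            }
        }
        m_dictionary.push_back(string);
        return static_cast<int>(m_dictionary.size() - 1);
    }

    std::vector<std::string> m_dictionary;
};
```

Because `lineChanged()` is a constant function, the compiler enforces that the check cannot reference or dereference dictionary entries, so nothing needs to be undone when the line turns out to be unchanged.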
Monday, October 28, 2013
Class Definition Consistency
In preparing to create the recreator class, I noticed that the class definitions were not all consistent: some had the private members at the beginning and some had them at the end. Having the private members at the beginning allows access functions, for instance, to use them; at least this was the case with early C++ compilers (or at least this was my understanding when learning C++ over two decades ago). But this does not appear to be a requirement with modern compilers.
It appears the Qt developers like to put the private member variables at the end of the class. The public function definitions start at the beginning, followed by the private section, which starts with the private function definitions. So, before embarking on creating the recreator class definition, the non-conforming classes were changed to this style.
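A small sketch of this layout follows. The `LineBuffer` class and its members are hypothetical and exist only to show the ordering. Standard C++ treats the bodies of member functions defined inside the class as if they came after the complete class definition, which is why the inline functions at the top can use members declared at the bottom.

```cpp
#include <string>

// Hypothetical class, only to show the member ordering.
class LineBuffer
{
public:
    // public functions come first
    explicit LineBuffer(const std::string &separator) :
        m_separator(separator), m_count(0) {}

    // inline access functions using members declared further down
    int count() const { return m_count; }
    const std::string &text() const { return m_text; }

    void append(const std::string &word)
    {
        m_text = joined(word);
        ++m_count;
    }

private:
    // private functions start the private section
    std::string joined(const std::string &word) const
    {
        return m_text.empty() ? word : m_text + m_separator + word;
    }

    // private member variables go at the end of the class
    std::string m_separator;
    int m_count;
    std::string m_text;
};
```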
[commit 8fc5c92519]
Sunday, October 27, 2013
Encoder/Program – Release
This concludes the integration of the encoder with the program model and dictionaries for the initial BASIC features being implemented. The program model still keeps all of the translation RPN lists, but these are only used to detect when a line changes. Once the recreator is implemented, these lists will no longer need to be kept.
Version 0.5.3 has been released (branch0.5 was merged to the master branch and tagged v0.5.3). Archive files containing the source and binaries (with test files) for both Windows and Linux have been uploaded to SourceForge. For Windows, there is also the ibcp-libs.zip file that contains all the dynamic linked libraries required to run the program if the MinGW and QtSDK packages have not been installed (extract into the ibcp directory). Linux should already have the required libraries installed. This concludes the 0.5 development series.
Implementation of the recreator will now begin with the 0.6 development series. The recreator will convert the internal program code back to a reasonable facsimile of the originally entered program text. This text will then replace the original text in the edit box.
[commit d984a70c6b]
GUI Program View – Code Output
The program view contents were changed from the text of the translated RPN list to the debug text output of the program code using the debug text routine of the program model. The program model data function is used by the program view widget for getting its contents. Unfortunately, this was not as simple a change as it sounds.
The data function is a constant function (const). Because this function is constant, the debug text function also needed to be a constant function. Making this function constant required several variables in the function to be constant, and the operand text function also needed to be changed to constant. Changing the operand text function to constant required the table operand text function pointers to be constant. Changing these functions required their program model pointer argument to also be changed to constant.
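The chain of changes can be seen in a stripped-down sketch. The class below is not the real program model. The signatures, the remark list and the output format are assumptions for illustration (and the real data function returns a Qt variant for a model index rather than a string). What it does show is how one constant function forces everything it calls, down to the pointer argument of the table functions, to be constant too.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class ProgramModel;

// The table function takes a pointer to a CONSTANT program model;
// otherwise it could not be called from a constant member function.
using OperandTextFunction
    = std::string (*)(const ProgramModel *programUnit, int operand);

class ProgramModel
{
public:
    explicit ProgramModel(std::vector<std::string> remarks) :
        m_remarks(std::move(remarks)) {}

    // constant, like the data function the view calls
    std::string data(int operand) const
    {
        return debugText(operand);  // so debugText() must be const
    }

    std::string debugText(int operand) const
    {
        // stand-in for looking up the pointer in the table
        const OperandTextFunction function = remOperandText;
        return "REM |" + operandText(function, operand) + "|";
    }

    std::string remark(int index) const
    {
        return m_remarks.at(static_cast<std::size_t>(index));
    }

private:
    // 'this' is a pointer to const here, so it can only be passed
    // to a function that accepts a pointer to a const model
    std::string operandText(OperandTextFunction function,
        int operand) const
    {
        return function(this, operand);
    }

    static std::string remOperandText(const ProgramModel *programUnit,
        int operand)
    {
        return programUnit->remark(operand);
    }

    std::vector<std::string> m_remarks;
};
```

Removing const from any one of these functions (or from the pointer argument) makes the call from `data()` fail to compile, which is the ripple effect described above.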
[commit f089c78b59]