Tuesday, February 8, 2011

Translator – Expression Type Release

The changes to get the expression type code working correctly are complete. The plan was that the next official release would include support for INPUT, but that was before the issue of checking expression types came up when considering the reporting of errors with the string operand of the INPUT PROMPT command.

This is also the first release since development was moved from VIDE2 (using Insight, a GUI for gdb, for debugging) to NetBeans (which includes a GUI front end to gdb). A make file was developed so that NetBeans is not required to build the program. The program is built by simply issuing the make command in the main IBCP directory.

All of the test programs were also modified to work using the same make file. The VIDE2 project files were removed. The test programs are built using the make tests command. A specific test can be built; for example, the stack test program is built with the make test_stack command. The parser and table test programs were removed as their functionality is now built into the main IBCP program.

The ibcp_0.1.14-src.zip file has been uploaded at Sourceforge IBCP Project along with the binary for the program. This release contains the complete current source including the test programs and a single make file for building the main program and optionally the test programs. It is recommended that the source be extracted into a clean directory. Now the INPUT command can be implemented...

Monday, February 7, 2011

Translator – Assignments and INSTR

While debugging assignments of temporary strings, specifically the assignment of a string that needs to be attached to the assignment operator, done stack empty errors were occurring. This was caused by the code trying to attach two strings, only one of which was on the done stack. Somehow the number of strings member in the table entry was set to two – it appeared to be counting both operands even though a check had been put in not to count the first operand of an assignment operator (and that check was working).

The problem was finally determined to be caused by the INSTR table entries. The assignment operator table entries and the INSTR table entries were sharing the same expression information structure instance. When the assignment operator entry was initialized, it properly set the number of strings to one. But when it processed the INSTR table entries, it changed the number of strings to two, since it was the same instance. This was corrected by giving the assignment operator entries their own instances of the expression information structure.
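The failure mode can be sketched in a few lines of C++ (ExprInfo, TableEntry and initEntry are illustrative names, not the project's actual identifiers): when two table entries point at one shared instance, initializing the second entry clobbers the count the first entry relies on.

```cpp
#include <cassert>

// Illustrative sketch of the shared-instance bug: both table entries
// point at the same ExprInfo, so the last initialization wins.
struct ExprInfo {
    int nStrings;  // number of string operands to attach
};

struct TableEntry {
    ExprInfo *exprInfo;
};

ExprInfo shared = {0};  // one instance shared by mistake

// table initialization recalculates the string count for each entry
void initEntry(TableEntry &entry, int nStringOperands)
{
    entry.exprInfo->nStrings = nStringOperands;
}
```

Giving each entry its own ExprInfo instance (as was done here for the assignment operators) removes the interaction entirely.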

This corrected the assignment operator problem, but the INSTR functions were giving an “expected operator or close parentheses” error at the second comma of the three argument form of INSTR. This occurred because the associated INSTR codes (Instr2T1, Instr2T2 and Instr2TT) did not have the multiple flag set. This was corrected by setting the multiple flag in the T1, T2 and TT forms of the Instr2 code entries, and the Instr3 associated code entries were placed after each of their corresponding Instr2 codes.

Sunday, February 6, 2011

Translator – Assignment Table Entries

The assignment operator table entries previously only had one operand, which was used for both the variable(s) being assigned and for the expression being assigned, since they were both the same data type. This did not work for temporary strings because the variable operand needed to be a string and the operand being assigned needed to be a temporary string. So, the assignment operators were given two operands, the first for the variable(s) being assigned and the second for the operand of the assignment.

This change caused a problem with the main assign and assign list operators, which contained the list of all the related associated assignment operator codes (one for each data type). When find code is processing the assignment operand (the second operand) for a double assignment, it does not need to search for associated codes. To stop this, the index to the second set of associated codes needed to be set to the number of associated codes, but this caused the Table initialization check to fail. So this initialization was modified to accept this condition.

The Table initialization was also modified to not count the first string operand for assignment operators (the entry's reference flag is set) because the string variable(s) being assigned are not attached to the assign string operator.

One strange problem remained with assignments, which involved the INSTR table entries...

Translator – Assignments of Temporary Strings

A problem with assignments of temporary strings was that the assignment command handler was not performing all the actions it needed to. These additional tasks were found by comparing it to the process final operand routine, which it turned out could be used with some special allowances for assignment operators.

The code in the assignment command handler was replaced with a call to the process final operand routine except for the check if the done stack is empty (needed otherwise the find code routine would flag it as a bug with its empty done stack check) and the clearing of the reference flag of the assignment operator token.

In the process final operand routine, a check was added: if the token's table entry has the reference flag set (set only for assignment operators), then the assignment operator token is not put onto the done stack, though it still gets operands attached (strings, but not temporary strings), and any parentheses in the first and last operands of the token being assigned are deleted.

There were additional problems with the Table entries for the assignment operators...

Saturday, February 5, 2011

Translator – Temporary String Support - Debugging

The Parser tests' expected results needed to be updated because the EOL code had changed (due to extra table entries being added). Several of the Translator tests' expected results needed to be updated for the new temporary string codes. For example, the convention for the operators is CatStr: +$, CatStrT1: +$1, CatStrT2: +$2, and CatStrTT: +$T. The convention for functions is a T added to the function name, for example Len: LEN( and LenTmp: LENT(. The INSTR functions with two string arguments use the convention T1, T2 and TT.

Some minor table entry problems were fixed, but one significant problem that needed correcting was in the number of strings member in the expression information structure in each table entry. This member is automatically calculated during table initialization, but was counting both string and temporary string operands. This field is used to determine how many strings need to be popped from the done stack and attached. Temporary strings are not left on the done stack, so the count should have included string operands only, not temporary strings.
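The corrected count can be sketched as a small helper (the enum values and function name are illustrative, not the project's actual identifiers): only plain string operands are counted, since temporary strings never sit on the done stack.

```cpp
#include <cassert>

// Illustrative sketch: when table initialization counts string operands,
// temporary strings (which are not left on the done stack) are excluded.
enum DataType {
    Double_DataType,
    Integer_DataType,
    String_DataType,
    TmpStr_DataType
};

// count only the plain string operands of a table entry
int countStrings(const DataType *operands, int nOperands)
{
    int n = 0;
    for (int i = 0; i < nOperands; i++) {
        if (operands[i] == String_DataType)  // temporary strings don't count
            n++;
    }
    return n;
}
```

For the concatenation codes this would yield two strings for CatStr, one for CatStrT1 or CatStrT2, and zero for CatStrTT.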

There was one other issue with constant strings. Technically these are already known to be strings, unlike tokens with and without parentheses that may be variables/arrays or user functions. Therefore, constants don't need to be attached as operands. But there is no allowance in the current implementation for this, so to keep things simple, they will continue to be attached. The proper code is already set (for a string operand), and the Encoder will see that they are constants and leave the code as is. Normally, if the Encoder determines that an operand is a user function, meaning it produces a temporary string, it will change the code appropriately to one that has a temporary string operand.

There is still a problem with assignments...

Translator – Full Temporary String Support

The main work for adding full support for temporary strings included adding all the new codes for temporary strings and adding all the table entries for these new codes. Part of this was adding new operand arrays that contain the temporary string type(s), adding new associated arrays and increasing the maximum size of the associated array (the assign and assign list associated arrays contain five associated codes).

For example, previously there was the CatStr code (both operands are strings). This was expanded with the CatStrT1 (first operand is a temporary string), CatStrT2 (second operand is a temporary string), and CatStrTT (both operands are temporary strings) codes. It is similar for functions. For example, previously there was the Len code (argument is a string). This was expanded with the LenTmp code (argument is a temporary string). For the INSTR functions that take two string arguments, the T1/T2/TT format was used.

The sub-string functions don't need to be expanded, because their operands are transferred to the next operator or function. If their operand is a string, then the next operator or function will be one that takes a string operand. If their operand is a temporary string, then the next operator or function will be one that takes a temporary string (which would need to be deleted at run-time when it is no longer needed). Therefore, there do not need to be separate codes (one that takes a string and one that takes a temporary string).

Thursday, February 3, 2011

Translator – No Array Assignments

Tokens with parentheses can be either an array or a user function, but the Translator can't determine which since it doesn't have access to the Dictionary (the repository for all elements of the program including variables, arrays, defined functions, user functions, constants, etc. with information about each). The Dictionary will be developed alongside the Encoder. For now, the Translator will simply translate from the input code entered to reverse polish notation.

It was thought that since a token with parentheses on the left side of an assignment can only be an array, that the Translator could assume that it was an array and handle the subscripts accordingly. Currently the Translator just attaches all the operands (subscripts) to these tokens. The Encoder (by looking up the identifier in the Dictionary to determine what it is) will handle arrays and user functions. For arrays, all the subscripts need to be numerical (doubles would get a CvtInt added) and for functions, the arguments must match the function definition.

There is one additional issue with arrays, though, that the Encoder will handle, which is to check if the number of subscripts is correct for the array. This information will be in the Dictionary, which the Translator does not have access to. If arrays were assumed, the subscripts would not be attached as they currently are, so the number of subscripts found would not be recorded. The Encoder would then not know how many subscripts were found (which needs to be checked) unless this number was somehow passed through. Therefore, the current mechanism will be left in place. The only thing left in the expression type processing is to add full support for temporary strings...

Wednesday, February 2, 2011

Translator – Array Assignments

There's been a change of plan for the Translator handling arrays on the left side of assignment statements, which will be explained in the next post. What was needed to handle array subscripts was a stripped down routine similar to the code routine, to handle integers and doubles (for which a hidden CvtInt code would be added). As the customized new function was being implemented, some unnecessary code was found.

In the find code routine, when the token with an error was not a reference, there was a check if the last token added to the output was different from the token on the done stack, and if it was, it was assumed that the last token was a sub-string function (which was not put on the done stack because its operand was left on the done stack). So this sub-string token was returned for the error instead of the token on top of the done stack.

In addition, if the last token was not the expected sub-string function, a bug error was returned since it was assumed to be invalid condition. However, this is a valid condition if a dummy close parentheses token was added to the output, which would then be the last token added to the output.

With the first and last operand implementation, these checks are no longer necessary, since for sub-string functions, the first operand left on the done stack acquires the sub-string function token as its first operand and this first operand token is returned as the location of the error. Therefore, these extra checks were removed.

Tuesday, February 1, 2011

Translator – Correcting More Token Leaks

There were a few more token leaks that needed to be corrected, one with the first and last operand changes and the rest with the PRINT command. Details of the rest of the token leaks found and corrected are after the continue. All token leaks have now been corrected; time to handle array assignments...

Sunday, January 30, 2011

Translator – Correcting Token Leaks

The crash on Translator test 12 was caused because the token leaks were only being output after a successful translation but not when an error was reported. After a number of errors, the token leaks built up and when they were finally output, their information didn't line up with the current input line causing a crash when accessing outside of a buffer. Once that problem was fixed, the token memory leaks were fixed one by one.

For one of the leaks, an incorrect fix was applied, causing multiple token deletes. For some reason these tokens were not output. While attempting to debug this problem, a segmentation fault occurred before it even got to the output code, when the string in the token was being deleted a second time. Setting the string pointer to NULL didn't help – apparently the memory was already being used for something else.

Apparently gdb (under NetBeans) catches these segmentation faults where the program running by itself does not. In any case, at least the code was detecting the extra delete condition. Details of the memory leaks found and corrected so far are after the continue. A common theme is that the token leaks occur when something is popped from a stack and not put anywhere, because any tokens in any of the stacks, the pending parentheses token or the output list get deleted by the Translator's cleanup routine.

Saturday, January 29, 2011

Token Leaks – Implementation

To implement token leak detection, a token pointer list was added to the Token class as a static member and a token list element pointer was added as a regular member.  The new and delete operators were overloaded in the Token class to be able to maintain this token list.

The overloaded new operator function first allocates an array of bytes for the size argument, which is type cast to a Token pointer. The token pointer is appended to the static token pointer list and the element pointer returned from the list append function is put into the element pointer member of the token. The value of the token pointer is returned.

The overloaded delete operator function type casts the void pointer to a token pointer. If the element pointer is NULL, then this token has already been deleted. Otherwise the element pointer in the token is used to remove the token pointer from the token pointer list. The element pointer of the token is set to NULL to indicate that it has been deleted (to detect multiple deletes of the same token). The memory used by the token is then deleted.

An additional error that can occur is a token being deleted more than once. To also catch these errors, when a token is being deleted, if its element pointer is NULL, then a copy of the token will be added to a new delete list, which will be a list of tokens. A copy of the token needs to be made because the memory has already been deleted and could be reused by another allocation. It's also possible that the memory may have already been reused and the token copy will be garbage, but at least it will be known there is an extra delete somewhere.
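A simplified sketch of the scheme (for brevity it uses a std::set of live pointers instead of the intrusive list and element pointers described above; all names are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <set>
#include <string>

// Sketch of token leak detection via overloaded new/delete: every live
// allocation is recorded, so anything left in the set at exit is a leak,
// and a delete of a pointer not in the set is an extra delete.
class Token {
public:
    std::string s;  // stand-in for the token's contents

    static void *operator new(std::size_t size) {
        void *p = std::malloc(size);
        s_live.insert(p);           // record the allocation
        return p;
    }
    static void operator delete(void *p) {
        if (s_live.erase(p) == 0)   // not in the live set: extra delete
            ++s_extraDeletes;
        std::free(p);
    }
    static std::size_t leakCount()    { return s_live.size(); }
    static std::size_t extraDeletes() { return s_extraDeletes; }

private:
    static std::set<void*> s_live;
    static std::size_t s_extraDeletes;
};

std::set<void*> Token::s_live;
std::size_t Token::s_extraDeletes = 0;
```

The real implementation removes the token from its list in constant time through the stored element pointer; a set lookup is simply the easiest stand-in for a sketch.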

Finally, the test code was modified to output any tokens that have not been deleted once the RPN output list has been output and cleaned up (all of its tokens deleted). The tokens in the delete list are also output. Once the code was working and running, many token leaks were found (in six of the Translator tests), and Translator test 12 (more error tests) was crashing. Time to fix some problems...

List Class Update

The List class is currently keeping track of the beginning and end of the linked list with a master element. The master element is created when the list is created. A list element contains previous and next element link pointers along with the value of the element. The amount of memory allocated for the master element is the element size less the value size, using a strange syntax for subtracting the size of the value from the total size of the element – not ideal.

When a list is created, the links of the master element are set to point to the master element itself.  The list is empty when the master element's next and previous pointers are the same as the master element pointer. When linking through the list, the first element is the master element's next pointer. The end of the list is detected when the master element is reached when linking to an element's next pointer.

In considering a design for the token linked list, it was realized that the master element is not really necessary. All that is needed is a pointer to the first element in the list. Or, for a first-in-last-out list, a pointer to the last element. The last pointer method will be used and the variable will be named tail. The list is linked circularly meaning that the last element's next pointer points to the first element in the list and the first element's previous pointer points to the last element. If the list is empty, then tail will be set to NULL.

Using the single tail pointer, the first element in the list is now obtained using tail->next assuming the list is not empty, otherwise NULL will be returned. The last element is now obtained simply by using tail. When appending items to end of the list, the new element is linked into the list and tail is set to the new element. When removing the last element from the list, a check is needed to see if the list is empty after removing the item.
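The tail-only circular list described above can be sketched as follows (illustrative code, not the project's actual List class):

```cpp
#include <cassert>
#include <cstddef>

// Sketch of a circular doubly linked list tracked by a single tail
// pointer: tail->next is the first element, tail is the last, and
// tail == NULL means the list is empty.
template <typename T>
class List {
    struct Element {
        Element *prev;
        Element *next;
        T value;
    };
    Element *tail;  // NULL when the list is empty

public:
    List() : tail(NULL) {}
    ~List() { while (!empty()) removeLast(); }

    bool empty() const { return tail == NULL; }

    // append a value to the end; the new element becomes tail
    void append(const T &value) {
        Element *e = new Element;
        e->value = value;
        if (tail == NULL) {
            e->next = e->prev = e;  // single element links to itself
        } else {
            e->next = tail->next;   // new element precedes the first
            e->prev = tail;
            tail->next->prev = e;
            tail->next = e;
        }
        tail = e;
    }

    T &first() { return tail->next->value; }  // assumes non-empty
    T &last()  { return tail->value; }        // assumes non-empty

    // remove the last element (first-in-last-out usage)
    void removeLast() {
        Element *e = tail;
        if (e->next == e) {         // only element: list becomes empty
            tail = NULL;
        } else {
            tail = e->prev;
            tail->next = e->next;
            e->next->prev = tail;
        }
        delete e;
    }
};
```

Appending sets tail to the new element; removing the last element restores tail to its predecessor, or to NULL when the list becomes empty.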

Sourceforge Uploads Now Working

Sourceforge is accepting uploads again and the files for the latest pre-release are now there.

Friday, January 28, 2011

Token Leaks – Initial Design Thoughts

When considering a design to keep track of tokens created, there needed to be an easy way to remove a token once it has been deleted. Tokens left in the list would be memory leaks. The first idea was a simple linked list where each element consisted of a previous link, a next link, a pointer to the token and some type of identifier of where in the code the Token was allocated. A pointer to the list element would be added to the Token class.

When a token is created, it would be added to the list and a pointer to the list element would be put into the token.  When a token is deleted, it would use the element pointer to delete the item from the list. For tokens still in the list at the end, the identifier would indicate where the token was created, though the contents of the token itself would probably be sufficient in determining why it was not deleted.

At first, a special token list class was considered. It would only need a pointer to the first element in the list. Some code on how this would work (adding and deleting elements) was sketched out. But wait, there was already a List class, and then it was realized that the List class doesn't need a master element to keep track of the first and last elements in the list. Only a pointer to the first (or last) element is needed...

Thursday, January 27, 2011

Memory Leak Detection

Memory leaks have been a continual problem. Frequently another one is discovered, like the recent pending parentheses issue. Some sort of memory leak detection is needed. In C++, the new and delete operators can be overloaded so that custom memory management can be used. While this is a possible solution, a memory manager with leak detection can be very involved. Something simpler is desired.

Analyzing the current source code, looking for all the uses of new operators currently used, here's what was found:
Tokens – allocated in a number of places; these have been where most of the memory leaks have been found
Strings – for now these are allocated when creating tokens, so if the tokens get deleted these will get deleted (in the Token destructor)
Stacks – the arrays in the stacks upon first creation and when expanded; the arrays will be deleted when the stacks go out of scope (as part of the destructors of the classes that use them)
Lists – the elements of the list, which are deleted when the lists go out of scope (as part of the destructors of the classes that use them)
Translator Output list – this is a one time allocation in the Translator when a new translation is initiated; it is handed over to the caller when getting the translated results; so it's up to the caller to clean this up (currently handled in the test code)
RpnItems – every one of these that is allocated is appended to the Translator's output list, so these will be handled when this list is cleaned up
Table – one time allocation in the main function, which is not important as there is only one, but a delete was added at the end of the main function
Error Lists – these are allocated as part of the Token and Table initialization when an error is detected, but were never deleted, which is not important as these were only created when there was an error (and the program exits), but deletes were added
So it looks like Tokens are the only specific area where there is currently a problem with memory leaks. Next, a simple solution for detecting token leaks...

Sourceforge Not Accepting New Uploads

Sourceforge does not appear to be accepting new uploads at the moment. Last night it just seemed to be taking a long time to be distributed to the mirrors, but this morning the same message appeared. This may explain the problem. For some reason it did accept the new release notes. For now the previously posted link for the latest pre-release does not work.