Interactive BASIC Compiler Project

Sunday, February 28, 2010

Parser Class – Debugging

I've come to realize that debugging and testing the Parser code is not going to be a trivial undertaking. The Parser class routines have been implemented along with a simple test_parser program to test the Parser class code. This program has an array of test input strings that will be fed to the Parser one at a time. For each test input, the input line is output and the start() function is called. The program then enters a loop calling get_token() until a NULL token pointer is returned. An error token returned also terminates the loop.

Inside the token loop, it simply outputs the contents of the token. At the end of the loop, it deletes the token, since for now the goal is testing if the proper token contents are returned. If the token contains an error, the error message is output with an indicator of where in the line the error is (using the column field in the token). The token is then deleted and the loop terminated.

Otherwise, the contents of the token will be output. There is a switch statement on the token type. There are arrays of strings for token type names, data type names and code names. The value in the token is used to index into these arrays so a name can be output (it would be a pain to look up numbers for each to see if the code is working correctly). Each token is output on a separate line indented under the input line.
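The name arrays and the error indicator described above might look roughly like the following sketch. The enumeration values, the array name and both helper functions are placeholders made up for illustration; they are not the definitions used in test_parser.

```cpp
#include <string>

// Hypothetical token types; the real enumeration is defined by the project.
enum TokenType {
    Command_TokenType,
    Operator_TokenType,
    Constant_TokenType,
    Error_TokenType,
    sizeof_TokenType   // number of token types (assumed convention)
};

// Names listed in the same order as the enumeration so the value can index the array.
static const char *const tokentype_name[sizeof_TokenType] = {
    "Command", "Operator", "Constant", "Error"
};

// Returns a printable name for a token type value.
const char *type_name(TokenType type)
{
    return tokentype_name[type];
}

// Builds a marker line that places '^' under the given zero-based column.
std::string error_marker(int column)
{
    return std::string(column, ' ') + "^";
}
```

Printing the marker on the line directly below the input line points at the offending character, provided both lines are printed with the same indentation.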

I decided not to wait until the whole Parser is debugged and working before releasing code, since the goal of Open Source is to throw out code as soon as possible. Therefore, the code will be released in stages once each of the major Parser routines is debugged, tested and working. The first stage will be the Immediate Commands code (and of course the test_parser code). As for version numbering, the 0.0.x series will be continued through the debugging of the Parser. Once all of the Parser is working, the version number will be upped to 0.1.0. The same convention will be used as the Translator is implemented, debugged and tested: versions 0.1.x will continue until the Translator is working, which will be marked by the release of version 0.2.0. And so on (the first entirely working Interactive BASIC Compiler will be 1.0.0).

Thursday, February 18, 2010

Table Class – Testing and Release

Code was added to test each of the different types of searches to verify that the searches and the table entries were set up properly. Each search test looked for the first, last and some entry in the middle of the search group. Bad searches were included with each type. (Note that since there are currently no DataTypeWord entries in the table, only bad searches were tested for this type.)

The main include file, named ibcp.h, has now been implemented and currently contains only the definitions needed for the Table class. This file will contain all the definitions needed for this project, at least for now, but in the future it may be reorganized into separate files if it becomes too large. There is also a codes.txt file that lists all the current codes and their numeric values, sorted both alphabetically and numerically. This file was very helpful during debugging and testing. It was simply created by editing the Code enumeration into a new file. There will probably be a future Tcl script to automatically create this file as new codes are added.

The ibcp_0.0.4-src.zip file has been uploaded at Sourceforge IBCP Project. The binaries for the test programs have now also been uploaded so compilation is not necessary. There are getting to be quite a few test programs, so the test source and project files will probably be moved into a test subdirectory.

Wednesday, February 17, 2010

Table Class – Error Types

The plan was to write a test program (test_table.cpp) that contains the call to the Table constructor with the try/catch code to output any errors. Then errors would be introduced into the table entry array to test the error handling code. There were some coding errors and some errors in the table (both duplicate and missing codes) that needed to be corrected.

After the problems were corrected, errors were still being reported that the bracketing codes were missing. The problem was that the bracketing code indexes were not in the index_code array (because they were put into the range array).

To fix this problem, the way the bracketing codes work was modified. The code that checked for each of the begin and end bracket codes was removed – these can be stored in the index_code array, and this solved the missing code errors. The indexes are then copied to the range structure.

Two more checks needed to be performed on the range structure. The first was to make sure the begin index was not greater than the end index, otherwise the search would malfunction (not find anything). The second was to make sure that none of the search types' begin and end indexes overlap any of the others.
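The two range checks can be sketched as below. The Range member names and both function names are assumptions for illustration, and the ranges are treated as inclusive at both ends, which the post does not state explicitly.

```cpp
// Hypothetical range structure with inclusive begin and end indexes.
struct Range {
    int beg;
    int end;
};

// A range is usable only if its begin index is not greater than its end index.
bool range_valid(const Range &range)
{
    return range.beg <= range.end;
}

// Two inclusive ranges overlap if each one starts at or before the other ends.
bool ranges_overlap(const Range &a, const Range &b)
{
    return a.beg <= b.end && b.beg <= a.end;
}
```

Checking every pair of search types with ranges_overlap() would cover the second error; with only four search types the quadratic number of comparisons is negligible.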

Detection of all of the different types of errors was then tested to make sure it operated properly. Posts Table Class – Data Members and Table Class – Summary were updated to reflect these changes.

Sunday, February 14, 2010

Table Class – Compilation

As mentioned, the Table constructor will throw an exception if any errors are found in the table. The caller of the Table constructor will look something like:

   Table *table;
   try
   {
       table = new Table();
   }
   catch (List<TableError> *error_list)
   {
       ...output errors in error_list...
       exit(1);
   }

A test program was written to see if the basic concept of having a constructor throw an error and then catching the error in the main routine would work. The test program test_cons will be in the next release. This worked as intended. However, as mentioned in the previous post, upon compiling table.cpp, the error list allocation line generated a compiler error essentially saying that constructor TableError::TableError() was missing. It was unknown where an empty constructor was being called since it wasn't being explicitly called. Guessing that a copy constructor was missing for some reason, one was added, but this did not eliminate the error.
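The basic concept of a constructor throwing a pointer to an error list can be sketched on its own as follows. This is not the test_cons program: std::vector stands in for the project's List class, TableError is reduced to a message string, and the inject_error parameter and count_table_errors() function are invented purely to make the sketch testable.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for the project's TableError structure.
struct TableError {
    std::string message;
};

typedef std::vector<TableError> ErrorList;  // stand-in for List<TableError>

class Table {
public:
    // inject_error is a hypothetical switch used to simulate a bad table.
    explicit Table(bool inject_error)
    {
        ErrorList *error_list = new ErrorList;
        if (inject_error) {
            TableError error;
            error.message = "duplicate code";
            error_list->push_back(error);
        }
        if (!error_list->empty())
            throw error_list;   // the caller takes ownership of the list
        delete error_list;      // nothing to report
    }
};

// Constructs a Table and returns the number of errors that were thrown.
int count_table_errors(bool inject_error)
{
    try {
        Table table(inject_error);
        return 0;
    } catch (ErrorList *error_list) {
        int count = static_cast<int>(error_list->size());
        delete error_list;      // the thrown pointer must be freed by the catcher
        return count;
    }
}
```

One point worth keeping in mind with this approach is that when a constructor throws, the object's destructor does not run, so anything the constructor allocated before the throw has to be released before throwing or handed to the catcher.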


Table Class – Summary

The major components for the Table class are:

   TableEntry – structure for one entry in the Table
   TableErrType – enumeration of the table errors (duplicate/missing)
   TableError – structure for a Table error (type, code, index1, index2)
   TableSearch   – enumeration of search types (see Table Class – Searching)
   Table   – main class definition

The TableEntry structure was first shown in Table (Parser), but has had a few changes since then. The string field was replaced with name and name2, where name2 is for the second name of two word commands (NULL otherwise). The two flag was renamed multiple (see Parser – Implementation Notes). And there will be a flags field currently used for immediate commands only (see Parser – Immediate Commands – Implementation) along with a set of constants defined for each flag (where each constant will have one bit set so that multiple constants can be ORed together).
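A set of one-bit flag constants that can be ORed together might be sketched like this. The constant names, the two-field TableEntry and the supports() helper are assumptions for illustration; the real structure has more fields.

```cpp
// Hypothetical flag constants: each non-null constant has exactly one bit set.
enum {
    Null_Flag   = 0x00,   // no flags set
    Blank_Flag  = 0x01,
    Line_Flag   = 0x02,
    Range_Flag  = 0x04,
    String_Flag = 0x08
};

// Reduced table entry showing only the fields needed for this sketch.
struct TableEntry {
    const char *name;
    int flags;
};

// Tests whether an entry has a particular flag set.
bool supports(const TableEntry &entry, int flag)
{
    return (entry.flags & flag) != 0;
}
```

An entry for a command that accepts several argument forms would then be initialized with something like Blank_Flag | Line_Flag | Range_Flag, while entries that use no flags get Null_Flag.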

The begin and end codes were not fully explained in Table Class – Data Members. These will be codes (for example BegPlainWord_Code and EndPlainWord_Code) in entries of the table entry array bracketing their associated entries (for example Plain Words). These entries are not actually scanned during searches, but are used during initialization to find the begin and end indexes of the entry types to set up the range array in the Table class.
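Locating a search range from a pair of bracketing codes could be sketched as follows. The BegSymbol_Code, EndSymbol_Code and the ordinary codes are invented for the example, the table is reduced to an array of codes, and the sketch assumes the stored range excludes the bracketing entries themselves, which is only one possible reading of the description.

```cpp
// Hypothetical codes; only the PlainWord bracket names come from the post.
enum Code {
    BegPlainWord_Code, Let_Code, Print_Code, EndPlainWord_Code,
    BegSymbol_Code, Add_Code, EndSymbol_Code
};

struct Range {
    int beg;
    int end;
};

// Returns the index of the first entry holding code, or -1 if it is absent.
int find_code(const Code *entry, int count, Code code)
{
    for (int i = 0; i < count; i++)
        if (entry[i] == code)
            return i;
    return -1;
}

// Returns the inclusive range of entries strictly between the two bracketing
// codes, or {-1, -1} if either bracketing code is missing from the table.
Range bracketed_range(const Code *entry, int count, Code beg_code, Code end_code)
{
    Range range = {-1, -1};
    int beg = find_code(entry, count, beg_code);
    int end = find_code(entry, count, end_code);
    if (beg != -1 && end != -1) {
        range.beg = beg + 1;
        range.end = end - 1;
    }
    return range;
}
```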

The TableError structure contains constructors for initializing each of the error types. The information for each error type is contained in a structure for the error type and all the structures are within an anonymous union with the error type variable outside. The Table constructor allocates the error list:

   List<TableError> *error_list = new List<TableError>;

The intention is that as errors are found in the table, they will be added to the error list by first creating the error using one of the TableError constructors, and then appending it to error_list. At the end of the Table constructor, if error_list is not empty, then the error_list pointer exception will be thrown. In the caller of the Table constructor, the catch will output the errors and terminate the program. However, the above line generates a compiler error...
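The TableError layout described above (the error type outside an anonymous union, one structure per error type inside it, and one constructor per error type) might look something like this sketch. Only the duplicate and missing error types are shown, and the per-type structure names and member names are assumptions.

```cpp
enum TableErrType {
    Duplicate_TableErrType,
    Missing_TableErrType
};

// Hypothetical per-error-type information structures.
struct DuplicateInfo {
    int code;      // code that appears more than once
    int index1;    // index of the first occurrence
    int index2;    // index of the second occurrence
};

struct MissingInfo {
    int code;      // code that has no table entry
};

struct TableError {
    TableErrType type;          // selects which union member is valid
    union {
        DuplicateInfo duplicate;
        MissingInfo missing;
    };

    // Constructor for a duplicate code error.
    TableError(int code, int index1, int index2) : type(Duplicate_TableErrType)
    {
        duplicate.code = code;
        duplicate.index1 = index1;
        duplicate.index2 = index2;
    }

    // Constructor for a missing code error.
    explicit TableError(int code) : type(Missing_TableErrType)
    {
        missing.code = code;
    }
};
```

Because a structure written this way declares its own constructors, the compiler no longer generates a default constructor for it, so any code that tries to default-construct a TableError will not compile. That may be relevant to the compiler error mentioned above.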

Updated Wednesday, February 17, 2010; 8:10 pm: Removed details about the error types since there are now four.

Saturday, February 13, 2010

Table Class - Searching

The Table class needs a few functions for searching. It will also contain initialization for some table related variables and several access functions (the table entries themselves will be kept private). The actual data for the table will be put into a structured array initializer outside the class.

Each of the search functions will return the index of the table entry of the item found. The index can then be used with the access functions. To speed up searching of the Table, the entries in the table will be grouped together by type. There will be 3 major search functions:

Search for an immediate command including argument form
Search for a string of a given length of a given type
Search for a two word command

For the search for a string, there will be four types:

Plain Words – Commands, Word Operators, and Internal Functions that don't have parentheses
Data Type Words – Internal Functions that have a data type symbol but no parentheses (currently none are planned initially, but examples are DATE$ and TIME$)
Parentheses – Internal Functions that have parentheses including those that may have a data type symbol, for example MID$(...)
Symbol – Symbol character Operators (one or two characters) and other symbols, for example colon statement separator and single-quote remark
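The search for a string of a given length within one search type can be sketched as a linear scan over that type's index range. The name table, the Range structure, and the case-insensitive comparison are all assumptions made for this example; the real table holds full entries rather than bare names.

```cpp
#include <cctype>
#include <cstring>

struct Range {
    int beg;   // inclusive
    int end;   // inclusive
};

// Hypothetical table of names grouped by type:
// plain words (0-1), parentheses (2), symbols (3-4).
static const char *const name_table[] = {"LET", "PRINT", "MID$(", "+", "<="};

// Compares len characters ignoring case (an assumption for BASIC keywords).
static bool equal_nocase(const char *name, const char *string, int len)
{
    for (int i = 0; i < len; i++) {
        int a = std::toupper(static_cast<unsigned char>(name[i]));
        int b = std::toupper(static_cast<unsigned char>(string[i]));
        if (a != b)
            return false;
    }
    return true;
}

// Returns the index of the entry whose name matches the first len characters
// of string exactly, searching only within range; returns -1 if not found.
// The caller must guarantee that string holds at least len characters.
int search(const char *const *name, Range range, const char *string, int len)
{
    for (int i = range.beg; i <= range.end; i++) {
        if (static_cast<int>(std::strlen(name[i])) == len
                && equal_nocase(name[i], string, len))
            return i;
    }
    return -1;
}
```

Passing the length separately means the string being searched for does not need to be null terminated at the end of the word, so the Parser can search directly from its position in the input line.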

The immediate command search returns the command arguments in a structure within the string field of the token. This required that a String constructor be added for a generic (void) pointer along with a length. Along with this constructor is a new access function get_data(), which returns the void pointer.

Friday, February 12, 2010

Table Class – Access Functions

The Table class will contain access functions taking either a code value (one of the Code enumerations) or a table entry index (which will be the internal value used in the program):

   index(code) – return the table entry index for the code
   code(index)   – return the code for the table entry
   type(index) – return the token type for the table entry
   datatype(index) – return the associated data type for the table entry (if any)
   multiple(index) – return whether the command supports multiple words/characters

There will also be a flags member in the TableEntry structure. Currently it is only used for the immediate commands to indicate which argument forms the command supports. The other entries will have the flags member set to Null (no flags set). This will probably be used later to indicate other conditions for the rest of the entries.

Thursday, February 11, 2010

Table Class – Data Members

The Table class will contain these data members:

   entry – pointer to the table entries array (TableEntry)
   index_code – pointer to the code to index array
   range – array of Range structures (with begin and end index members)

The constructor will initialize all the Table data including setting the entry pointer to the table entry array (a local static in the table.cpp source file), allocating the index_code array, and setting up the index_code and range arrays. The destructor will delete the index_code array.

The initialization includes recording the index of each code in the table to the index_code array (recording an error for any duplicates found), checking and recording an error for any missing codes, setting the search type range begin and end indexes, checking that the begin index is not greater than the end index for each range (recording an error for any range errors found), and checking that the indexes of the ranges do not overlap each other (recording an error for any overlap errors found). Any errors will be appended to an error list. At the end of initialization, if this error list is not empty, an exception with the error list will be thrown. An error list will be used so that all errors in the table will be reported.
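The first two steps of this initialization (filling the code to index array while detecting duplicate and missing codes) can be sketched as below. The code names, the BuildResult structure and the use of std::vector and simple counters in place of the error list are assumptions for illustration.

```cpp
#include <vector>

// Hypothetical codes; sizeof_Code gives the number of codes (assumed convention).
enum Code { Let_Code, Print_Code, Add_Code, sizeof_Code };

struct BuildResult {
    std::vector<int> index_code;  // table index for each code, -1 if not found
    int duplicates;               // codes found more than once in the table
    int missing;                  // codes with no entry in the table
};

// Records the table index of each code, counting duplicate and missing codes.
BuildResult build_index_code(const Code *entry, int count)
{
    BuildResult result;
    result.index_code.assign(sizeof_Code, -1);
    result.duplicates = 0;
    result.missing = 0;

    for (int i = 0; i < count; i++) {
        int &slot = result.index_code[entry[i]];
        if (slot != -1)
            result.duplicates++;   // code already seen at index slot
        else
            slot = i;
    }
    for (int code = 0; code < sizeof_Code; code++)
        if (result.index_code[code] == -1)
            result.missing++;
    return result;
}
```

Scanning the whole table before reporting anything is what allows every error to be collected in a single run, rather than stopping at the first one.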

Updated Wednesday, February 17, 2010; 8:40 pm: changed to reflect the four different types of table errors.

Tuesday, February 9, 2010

Parser – Immediate Commands (Implementation)

Immediate Commands will be parsed in the separate function get_command(), which will be called only for the first token on the line (when the column is zero). The entire command will be parsed including the command's arguments, and the entire command will be put into a single token. If the line does not contain a valid command, the Parser will proceed to the other functions from the beginning of the line. The commands are one of the forms:

   Blank – Command has no arguments
   Line – Command has a line number
   Range – Command has a line number range
   Line-Incr – Command has a line number and an increment
   Range-Incr – Command has a line number range, a start line and an increment
   String – Command has a string argument

There will be a field in the Table for each of the immediate commands that will contain flags indicating which forms each command supports. The first thing get_command() will do is search the Table for the command letter to see if it is a valid command. If it is, then get_command() will proceed to parse the rest of the line looking for one of the above forms for the arguments. The separate function scan_command() will be called to actually scan and parse the arguments. This will make it easy to return immediately when the command is fully parsed or an error is found. After the arguments are parsed and the command syntax is good, the table will be searched again for the command letter and the form of the arguments found to see if it is a valid command syntax.


Tuesday, February 2, 2010

Parser – Immediate Commands

As the new Table class was being designed, I realized that the immediate commands were forgotten. These are temporary until the GUI is implemented. Therefore, these will be implemented the simplest way. Back when I developed programs on GW-Basic, there was an add-on to the Basic that supplied some convenience features, specifically being able to use one character commands like L for LIST, E for EDIT, S for SAVE. The temporary immediate commands will be implemented this way. White space will be allowed in the commands, and a line must be of one of several different forms to be recognized as an immediate command. The one character will be at the beginning of the line; the forms will be:

   LIST: L Lxxx Lxxx-yyy
   EDIT: E Exxx
   DELETE: Dxxx Dxxx-yyy
   RUN: R
   RENUM: Rxxx-yyy Rxxx-yyy,zz Rxxx-yyy,nnn,zz
   SAVE: S S"fff"
   LOAD: L"fff"
   NEW: N
   AUTO: A Axxx A,zz Axxx,zz
   CONT: C
   QUIT: Q

Where xxx is a line number, xxx-yyy is a line number range (where xxx and/or yyy is optional), nnn is a new line number (optional), zz is an increment (optional) and fff is a file name. Note that some of the same letters are used for different commands (RUN/RENUM, LIST/LOAD). Which one is meant can be determined by the arguments provided. There are three new commands that weren't included previously: AUTO (to automatically provide line numbers when entering program lines), CONT (to continue running the program after a break) and QUIT (to exit). If the line entered doesn't match one of these forms exactly, it will be processed by the regular Parser functions (consider the line L100=4, which is an immediate assignment, not a LIST command).
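Deciding which form the text after the command letter takes can be sketched with a small classifier. This is a deliberately reduced illustration: it recognizes only the blank, line, range and string forms, does not handle increments or white space, and the Form enumeration and function names are invented here.

```cpp
#include <cctype>

// Hypothetical result values for the argument forms handled by this sketch.
enum Form { None_Form, Blank_Form, Line_Form, Range_Form, String_Form };

// Advances p past a run of decimal digits and reports whether any were present.
static bool skip_digits(const char *&p)
{
    const char *start = p;
    while (std::isdigit(static_cast<unsigned char>(*p)))
        p++;
    return p > start;
}

// Classifies the text following the command letter (for example "100-200").
// Returns None_Form if the text does not exactly match a supported form.
Form classify_arguments(const char *args)
{
    const char *p = args;
    if (*p == '\0')
        return Blank_Form;
    if (*p == '"')
        return String_Form;        // file name; the closing quote is optional
    skip_digits(p);                // xxx is optional when a range follows
    if (*p == '\0')
        return Line_Form;          // only reachable if digits were consumed
    if (*p != '-')
        return None_Form;          // for example the "100=4" in L100=4
    p++;
    skip_digits(p);                // yyy is optional
    return *p == '\0' ? Range_Form : None_Form;
}
```

The returned form could then be checked against the forms a given command letter supports, so that the line L100=4 falls through to the regular Parser functions because "100=4" matches none of the forms.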

Several of the commands will do a check before they are executed. For RUN, LOAD, NEW and QUIT, if the program is not saved then there will be a prompt asking if the program should be saved first. For SAVE, S by itself will save the program using the current program name (from either a LOAD or the last S"fff" command, otherwise there will be a prompt for a file name).

One last thing, I decided to allow string constants to be terminated by the end of the line, in other words, no closing double-quote. GW-Basic works this way. It is also convenient for LOAD and SAVE for just entering L"fff and S"fff. This requires just a small change to the get_string() function – just don't return an error token for this condition. For program lines, when one of these unterminated string constants is recreated, the closing double-quote will be there.

Saturday, January 30, 2010

New List Class – Release

The test_stack program was modified to include a new test_list() function that tests all the new list functions. The existing stack test functions test to make sure all the existing stack functions are still working correctly. The print_stack functions were renamed to print_list along with the arguments since these functions are really printing the underlying list. (Besides, it didn't make sense calling print_stack() from the test_list() function.) The ibcp_0.0.3-src.zip file has been uploaded at Sourceforge IBCP Project. The binaries for the test programs have now also been uploaded so compilation is not necessary.

One last minor fix was needed to the List class in the destructor. Upon scanning through the ANSI-ISO_C++.pdf document, I learned there is a distinction between using new to allocate a single item and new[] to allocate an array. You must use the corresponding delete and delete[] to deallocate the memory. For the List class, "new Element" is used throughout so there are no issues with that. However, the master element is allocated using new[]:

master = (Element *)new char[sizeof(Element) - sizeof(T)];

This is used because the master element only contains previous and next element pointers, but no value. The statement above subtracts the size of the value (template generic type T) from the size of the element. Since delete[] needs to be used with new[], in the destructor, the delete master statement was changed to:

delete[] (char *)master;

This gives delete[] the character pointer originally obtained from the new char[...] statement before it was type cast to an Element pointer.

Now it is time to get back to the Parser implementation.

New List Class – Implementation

A problem occurred during the update of the list class. It was desired that the append, insert and remove functions not be inline functions. Defining them in the class automatically makes them inline. Therefore, they were defined outside the main List class. They are still in the list.h include file, but because they are template functions, no code is generated until a class is instantiated and the function is used.

The problem occurred for the functions that return an element pointer. The List class contains the public Element structure (next and previous element pointers and the generic value of type T). The Element structure needs to be defined within the List class since it contains the type T of the template. Inside the class definition, the Element structure can be simply referred to as “Element” or “Element *” for pointers. Outside the class, the syntax “List<type>::Element” and “List<type>::Element *” is needed where “type” is the instantiated type. For defining template functions “List<T>::Element” and “List<T>::Element *” are used.

Therefore, to define a template function that returns an Element pointer and takes an Element pointer as an argument, it would appear that it should be written as:

template <class T> List<T>::Element *List<T>::fun(List<T>::Element *element)

Unfortunately, the compiler gave the error: error: expected constructor, destructor, or type conversion before '*' token. The problem was with the return value since the argument syntax was in the List class previously causing no issues.
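For reference, standard C++ requires the typename keyword in front of a qualified name that depends on a template parameter when that name is meant to be a type, and the return type of an out-of-class member definition is exactly such a case. The sketch below shows a definition that compiles under that rule; the reduced List class and the body of fun() are placeholders, and this is not necessarily the resolution the author arrived at.

```cpp
// Reduced stand-in for the List class with only what the example needs.
template <class T> class List {
public:
    struct Element {
        Element *prev;
        Element *next;
        T value;
    };
    Element *fun(Element *element);   // defined outside the class below
};

// The return type is named before the compiler knows it is inside List<T>,
// so the dependent name List<T>::Element must be marked with typename.
template <class T>
typename List<T>::Element *List<T>::fun(typename List<T>::Element *element)
{
    return element->next;   // placeholder body: return the following element
}
```

In the parameter list the name is looked up in the scope of the class, so plain Element would also be accepted there; only the return type, which comes before List<T>::fun, has no such context.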


List Class – Updated

The List class needs to be updated to include more list like functions. The first implementation was really oriented towards the support of stack like functions (i.e. push and pop). For a list, elements need to be added to any position in the list, not just the end. This will be needed for the Translator as the RPN list is built. The following operations are therefore needed:

Add a value to a new element after an element (append)
Add a value to a new element before an element (insert)
Get a value and remove its element (remove)
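The three operations listed above can be sketched on a circular doubly linked list with a master element. This is a simplified illustration and not the project's List class: the MiniList name and head() function are invented, the master element is a full Element rather than a truncated one, and copying of the list is not handled.

```cpp
template <class T> class MiniList {
public:
    struct Element {
        Element *prev;
        Element *next;
        T value;
    };

    MiniList()
    {
        master = new Element();               // master holds no meaningful value
        master->prev = master->next = master; // empty list points to itself
    }
    ~MiniList()
    {
        while (master->next != master)
            remove(master->next);
        delete master;
    }

    // Returns the master element; appending after it adds to the front,
    // inserting before it adds to the back.
    Element *head() { return master; }

    // Adds value in a new element after the given element.
    Element *append(Element *element, const T &value)
    {
        Element *added = new Element();
        added->value = value;
        added->prev = element;
        added->next = element->next;
        element->next->prev = added;
        element->next = added;
        return added;
    }

    // Adds value in a new element before the given element.
    Element *insert(Element *element, const T &value)
    {
        return append(element->prev, value);
    }

    // Unlinks and deletes the element, returning its value.
    T remove(Element *element)
    {
        T value = element->value;
        element->prev->next = element->next;
        element->next->prev = element->prev;
        delete element;
        return value;
    }

private:
    Element *master;
    MiniList(const MiniList &);               // copying not supported in this sketch
    MiniList &operator=(const MiniList &);
};
```

With a circular list and a master element, none of the three operations needs a special case for the first or last position, and push and pop reduce to an insert before the master element and a remove of the element preceding it.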

Several other changes will be made to the list class.


Friday, January 29, 2010

Parser – Implementation Notes

The get_token() function always allocates a new Token and sets the column to the current position (by subtracting the pointer to beginning of the line, i.e. input). Therefore a constructor was added that accepts an integer value (the column) and also initializes the string pointer to NULL. A destructor was also added to delete the string if it is not NULL.

The get_identifier() function calls scan_word() to get the first word and then checks to see if the string begins with “FN” (including lower case) for a one-line user function. There is no reason to search the table first. Otherwise, the table is then searched. If not found, then the identifier is returned, including whether there was an opening parenthesis and the data type, if any.

If the word was found in the table and is only a single word, the command, internal function or word operator is returned. I decided to change the Boolean twoflag in the table to an Enumeration named Multiple so that three word and three character operators could be supported. For now only two word commands and two character operators are supported.

If the command can be two words, white space is skipped and scan_word() is called again to get the second word. If there is a data type or parentheses, then it is not a valid second word of a command. If the first word is valid as a single word, then it is returned as the token (the second word is held for the next token). Otherwise, the table is searched again for the two words. If not found in the table, then if the first word is valid as a single word, then it is returned as the token, otherwise an error token is returned. If found, the two word command is returned as the token. Support for three word commands was not implemented at this time.

The implementation of skip_whitespace(), scan_word(), and get_number() was straightforward. The implementation of get_operator() was similar to get_identifier() except only single characters are involved (no scanning necessary). Support for three character operators will not be implemented at this time.

I decided that searching the table should not be part of the Parser class, but should be part of another class for the Table. By the way, the Operator Table will now be known as simply the Table, since it holds more than just operators.

Friday, January 22, 2010

Parser – Support Functions

There will be a few support functions that will be used by the main functions:

   skip_whitespace() - skips white space at the current position
   scan_word() - scan for a word of an identifier
   scan_string() - scan for string constant counting or copying the string
   search_table() - searches Operator Table for one or two words

The skip_whitespace() function will be called before the main parsing functions are called. It will also be used by get_identifier() before scanning for the second word if the command of the identifier is part of a two word command.

The scan_word() function will be used by get_identifier() to actually check and get the identifier. This functionality is in its own function so that it can be used twice if the identifier is a command that is part of a two word command.

The scan_string() function will be used by get_string() to first count the size of the string so that the memory can be allocated and then called again to copy the string. The reason for the double scan is the possible presence of two consecutive double-quotes in the string, which are counted and copied as only one character (so a simple copy after the string length is counted won't work).
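The count-then-copy double scan can be sketched as below. The signature of scan_string() here is an assumption (the pointer is taken to be just past the opening double-quote, and a null destination means count only), std::string stands in for the project's String class, and get_string_value() is a hypothetical wrapper. An end of line without a closing double-quote simply ends the constant.

```cpp
#include <string>

// Scans a string constant starting just past its opening double-quote.
// Two consecutive double-quotes count as one character. Scanning stops at a
// closing double-quote or at the end of the line. If dest is not null, the
// characters are also copied to it. Returns the number of characters.
int scan_string(const char *p, char *dest)
{
    int count = 0;
    while (*p != '\0') {
        if (*p == '"') {
            if (p[1] != '"')
                break;          // closing double-quote
            p++;                // doubled quote: skip the first, keep the second
        }
        if (dest != nullptr)
            dest[count] = *p;
        count++;
        p++;
    }
    return count;
}

// First pass counts so the storage can be sized, second pass copies.
std::string get_string_value(const char *after_quote)
{
    int len = scan_string(after_quote, nullptr);
    std::string value(len, '\0');
    if (len > 0)
        scan_string(after_quote, &value[0]);
    return value;
}
```

Using one function for both passes keeps the counting and copying rules identical, so the allocated size always matches what is copied.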

The search_table() function will be used by get_identifier() and get_operator() to search the Operator Table. There will be two string arguments for searching two word commands. The second string will be optional, passed as NULL when searching for one word or operator.

Tuesday, January 19, 2010

Parser – Main Functions

The Parser code will be broken into four main functions, one for each of the major token types:

   get_identifier()  - checks for and gets an identifier
   get_number() - checks for and gets a constant integer or double number
   get_string()   - checks for and gets a string constant
   get_operator()   - checks for and gets an operator (symbol characters)

Each of these functions will take no arguments and return a Boolean flag indicating whether they found a token of the type they were looking for. If they don't find their token type at the current position, they will return false, so the next main token function can be called. If all return false, then a syntax error will be returned.

When one of these functions finds its token type, it will continue scanning the input line for the end of the token. It will fill the previously allocated token before returning true. If it detects an error, it will set the token to the error token type with an appropriate error message. It will leave the current position set to the first character that it did not process, ready for the next token to be parsed (after skipping white space).
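The way the four main functions chain together can be sketched with a greatly simplified parser. Everything here is a stand-in: the MiniParser class, the Token fields and the recognition rules (letters and digits for identifiers, digits only for numbers, single characters for operators, no table lookups) are assumptions chosen to keep the example short.

```cpp
#include <cctype>
#include <string>

enum TokenType { Identifier_Type, Number_Type, String_Type, Operator_Type, Error_Type };

struct Token {
    TokenType type;
    int column;
    std::string text;
};

class MiniParser {
    const char *input;   // beginning of the line
    const char *pos;     // current position
    Token *token;        // token being filled

    static unsigned char uc(char c) { return static_cast<unsigned char>(c); }

    void fill(TokenType type, const char *start)
    {
        token->type = type;
        token->text.assign(start, pos);
    }
    bool get_identifier()
    {
        if (!std::isalpha(uc(*pos))) return false;
        const char *start = pos;
        while (std::isalnum(uc(*pos))) pos++;
        fill(Identifier_Type, start);
        return true;
    }
    bool get_number()
    {
        if (!std::isdigit(uc(*pos))) return false;
        const char *start = pos;
        while (std::isdigit(uc(*pos))) pos++;
        fill(Number_Type, start);
        return true;
    }
    bool get_string()
    {
        if (*pos != '"') return false;
        const char *start = pos++;
        while (*pos != '\0' && *pos != '"') pos++;
        if (*pos == '"') pos++;      // closing double-quote is optional
        fill(String_Type, start);
        return true;
    }
    bool get_operator()
    {
        if (!std::ispunct(uc(*pos))) return false;
        const char *start = pos++;
        fill(Operator_Type, start);
        return true;
    }

public:
    explicit MiniParser(const char *line) : input(line), pos(line), token(nullptr) {}

    // Returns a new token owned by the caller, or null at the end of the line.
    Token *get_token()
    {
        while (std::isspace(uc(*pos))) pos++;
        if (*pos == '\0') return nullptr;
        token = new Token();
        token->column = static_cast<int>(pos - input);
        // Try each main function in turn; the first to return true wins.
        if (!get_identifier() && !get_number() && !get_string() && !get_operator()) {
            token->type = Error_Type;
            token->text = "syntax error";
        }
        return token;
    }
};
```

Because each function returns false without consuming anything when its token type is not present, the order of the calls is the only coupling between them. Here get_string() has to come before get_operator() because the double-quote is also a punctuation character.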
