Sunday, February 28, 2010

Parser Class – Debugging

I've come to realize that debugging and testing the Parser code is not going to be a trivial undertaking. The Parser class routines have been implemented along with a simple test_parser program to test the Parser class code. This program has an array of test input strings that will be fed to the Parser one at a time. For each test input, the input line is output and the start() function is called. The program then enters a loop calling get_token() until a NULL token pointer is returned. An error token returned also terminates the loop.

Inside the token loop, the program simply outputs the contents of the token and then deletes it, since for now the goal is testing whether the proper token contents are returned. If the token contains an error, the error message is output with an indicator showing where in the line the error is (using the column field in the token). The token is then deleted and the loop terminated.
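The error indicator can be as simple as a caret printed on the line below the input. This is only a sketch of the idea: the function name error_marker and the indent parameter are made up for illustration and are not taken from test_parser, and the column is assumed to be zero-based.

```cpp
#include <string>

// Build a marker line that places '^' under the given zero-based column
// of the input line printed directly above it. 'indent' is the number of
// spaces the input line itself was indented by (an assumption of this
// sketch, not a field of the real token).
std::string error_marker(int column, int indent = 0)
{
    int width = indent + column;
    if (width < 0)
        width = 0;  // guard against a bad column value
    return std::string(static_cast<std::string::size_type>(width), ' ') + "^";
}
```

Printing the input line, then error_marker(token->column), then the error message gives a quick visual check that the column field is correct.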

Otherwise, the contents of the token will be output. There is a switch statement on the token type. There are arrays of strings for token type names, data type names and code names. The value in the token is used to index into these arrays so a name can be output (it would be a pain to look up numbers for each to see if the code is working correctly). Each token is output on a separate line indented under the input line.
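A minimal sketch of the name lookup, assuming a hypothetical TokenType enumeration (the real enumerations are defined in the project sources and will have different members). The only point is that the name array is kept in the same order as the enumeration, so the value indexes directly into it:

```cpp
#include <cstddef>
#include <string>

// Hypothetical token types for illustration only.
enum TokenType
{
    Command_TokenType,
    Operator_TokenType,
    Constant_TokenType,
    Error_TokenType
};

// One name per enumeration value, in the same order as the enumeration.
static const char *tokentype_name[] =
{
    "Command", "Operator", "Constant", "Error"
};

// Return a printable name for a token type, or "?" if it is out of range.
std::string token_type_name(int type)
{
    const std::size_t count = sizeof(tokentype_name) / sizeof(tokentype_name[0]);
    if (type < 0 || static_cast<std::size_t>(type) >= count)
        return "?";
    return tokentype_name[type];
}
```

The range check is not strictly needed when the value always comes from the enumeration, but it makes a corrupted token show up as "?" instead of crashing the test program.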

I decided not to wait until the whole Parser is debugged and working before releasing code, since the goal of Open Source is to throw out code as soon as possible. Therefore, the code will be released in stages once each of the major Parser routines is debugged, tested and working. The first stage will be the Immediate Commands code (and of course the test_parser code). As for version numbering, the 0.0.x series will be continued through the debugging of the Parser. Once all of the Parser is working, the version number will be upped to 0.1.0. The same convention will be used as the Translator is implemented, debugged and tested, which will continue through versions 0.1.x until it is working, with the release of version 0.2.0. And so on (the first entirely working Interactive BASIC Compiler will be 1.0.0).

Thursday, February 18, 2010

Table Class – Testing and Release

Code was added to test each of the different types of searches to verify that the searches and the table entries were set up properly. Each search test looked for the first, last and some entry in the middle of the search group. Bad searches were included with each type. (Note that since there are currently no DataTypeWord entries in the table, only bad searches were tested for this type.)

The main include file, named ibcp.h, has now been implemented and currently contains only the definitions needed for the Table class.  This file will contain all the definitions needed for this project, at least for now, but in the future it may be reorganized into separate files if it becomes too large.  There is also a codes.txt file that lists all the current codes and their numeric values, sorted both alphabetically and numerically.  This file was very helpful during debugging and testing.  It was simply created by editing the Code enumeration into a new file.  There will probably be a future Tcl script to automatically create this file as new codes are added.

The ibcp_0.0.4-src.zip file has been uploaded at Sourceforge IBCP Project.  The binaries for the test programs have now also been uploaded so compilation is not necessary.  There are now quite a few test programs, so the test source and project files will probably be moved into a test subdirectory.

Wednesday, February 17, 2010

Table Class – Error Types

The plan was to write a test program (test_table.cpp) that contains the call to the Table constructor with the try/catch code to output any errors. Then errors would be introduced into the table entry array to test the error handling code. There were some coding errors and some errors in the table (both duplicate and missing codes) that needed to be corrected.

After the problems were corrected, errors were still being reported that the bracketing codes were missing. The problem was that the bracketing code indexes were not in the index_code array (because they were put into the range array).

To fix this problem, the way the bracketing codes work was modified. The code that checked for each of the begin and end bracket codes was removed. The bracketing codes can simply be stored in the index_code array like any other code, which solved the missing code errors. The indexes are then copied to the range structure.

Two more checks needed to be performed on the range structure. The first was to make sure the begin index was not greater than the end index, otherwise the search would malfunction (not find anything). The second was to make sure that none of the search types' begin and end indexes overlap any of the others.
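The two range checks can be sketched as follows. This is an illustration under assumptions, not the code in table.cpp: the Range member names, the RangeErr enumeration and the use of std::vector are invented here, the begin and end indexes are treated as inclusive, and for brevity the function returns only the first problem found rather than recording every error.

```cpp
#include <cstddef>
#include <vector>

// Begin and end table indexes of one search type (member names assumed).
struct Range
{
    int beg;
    int end;
};

enum RangeErr { Range_Ok, Range_BegAfterEnd, Range_Overlap };

// Check every range on its own first, then every pair of ranges.
RangeErr check_ranges(const std::vector<Range> &range)
{
    for (std::size_t i = 0; i < range.size(); i++)
    {
        if (range[i].beg > range[i].end)
            return Range_BegAfterEnd;  // a search would never find anything
    }
    for (std::size_t i = 0; i < range.size(); i++)
    {
        for (std::size_t j = i + 1; j < range.size(); j++)
        {
            // two inclusive intervals overlap when each one begins
            // at or before the point where the other one ends
            if (range[i].beg <= range[j].end && range[j].beg <= range[i].end)
                return Range_Overlap;
        }
    }
    return Range_Ok;
}
```

Checking the individual ranges before the pairs keeps the overlap test simple, since it can then assume beg is never greater than end.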

Detection of all of the different types of errors was then tested to make sure it operated properly. Posts Table Class – Data Members and Table Class – Summary were updated to reflect these changes.

Sunday, February 14, 2010

Table Class – Compilation

As mentioned, the Table constructor will throw an exception if any errors are found in the table. The caller of the Table constructor will look something like:

    Table *table;
    try
    {
        table = new Table();
    }
    catch (List<TableError> *error_list)
    {
        ...output errors in error_list...
        exit(1);
    }

A test program was written to see if the basic concept of having a constructor throw an error and then catching the error in the main routine would work. The test program test_cons will be in the next release. This worked as intended. However, as mentioned in the previous post, upon compiling table.cpp, the error list allocation line generated a compiler error essentially saying that constructor TableError::TableError() was missing. It was unknown where an empty constructor was being called since it wasn't being explicitly called. The guess was that a copy constructor was missing for some reason, but adding one did not eliminate the error.
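The concept being tested looks roughly like the sketch below. This is not the test_cons source: the class Demo, the function construct_and_count() and the use of std::vector<std::string> in place of List<TableError> are all stand-ins so that the example is self-contained.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the project's List<TableError>.
typedef std::vector<std::string> ErrorList;

class Demo
{
public:
    // Throws a pointer to a heap-allocated error list when 'fail' is set.
    explicit Demo(bool fail)
    {
        ErrorList *error_list = new ErrorList;
        if (fail)
            error_list->push_back("duplicate code");
        if (!error_list->empty())
            throw error_list;  // the catcher now owns the list
        delete error_list;
    }
};

// Construct a Demo and return the number of errors it reported (0 if none).
int construct_and_count(bool fail)
{
    Demo *demo = NULL;
    try
    {
        demo = new Demo(fail);
    }
    catch (ErrorList *error_list)
    {
        int count = static_cast<int>(error_list->size());
        delete error_list;
        return count;
    }
    delete demo;
    return 0;
}
```

One detail worth knowing: when a constructor throws during a new expression, the memory obtained for the object is released automatically, so the catch block only has to clean up the error list itself.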

Table Class – Summary

The major components for the Table class are:

    TableEntry    – structure for one entry in the Table
    TableErrType  – enumeration of the table errors (duplicate/missing)
    TableError    – structure for a Table error (type, code, index1, index2)
    TableSearch   – enumeration of search types (see Table Class – Searching)
    Table         – main class definition

The TableEntry structure was first shown in Table (Parser), but has had a few changes since then. The string field was replaced with name and name2, where name2 is for the second name of two word commands (NULL otherwise). The two flag was renamed multiple (see Parser – Implementation Notes). And there will be a flags field currently used for immediate commands only (see Parser – Immediate Commands – Implementation) along with a set of constants defined for each flag (where each constant will have one bit set so that multiple constants can be ORed together).
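A sketch of what one-bit flag constants look like. The constant names and values here are hypothetical (the real ones are in the project sources); the argument forms are the ones described in Parser – Immediate Commands (Implementation):

```cpp
// Hypothetical flag constants; each one has exactly one bit set so that
// several can be ORed together in a single flags field.
enum
{
    Null_Flag      = 0x00,  // no flags set
    Blank_Flag     = 0x01,  // command with no arguments
    Line_Flag      = 0x02,  // command with a line number
    Range_Flag     = 0x04,  // command with a line number range
    LineIncr_Flag  = 0x08,  // command with a line number and increment
    RangeIncr_Flag = 0x10,  // command with a range and increment
    String_Flag    = 0x20   // command with a string argument
};

// True if the entry's flags include the given argument form.
inline bool supports(int flags, int form)
{
    return (flags & form) != 0;
}

// Example: LIST accepts L, Lxxx and Lxxx-yyy.
const int List_Flags = Blank_Flag | Line_Flag | Range_Flag;
```

With this layout, supports(List_Flags, Range_Flag) is true and supports(List_Flags, String_Flag) is false.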

The begin and end codes were not fully explained in Table Class – Data Members. These will be codes (for example BegPlainWord_Code and EndPlainWord_Code) in entries of the table entry array bracketing their associated entries (for example Plain Words). These entries are not actually scanned during searches, but are used during initialization to find the begin and end indexes of the entry types to set up the range array in the Table class.

The TableError structure contains constructors for initializing each of the error types. The information for each error type is contained in a structure for the error type and all the structures are within an anonymous union with the error type variable outside. The Table constructor allocates the error list:

    List<TableError> *error_list = new List<TableError>;

The intention is that as errors are found in the table, they will be added to the error list by first creating the error using one of the TableError constructors, and then appending it to error_list. At the end of the Table constructor, if error_list is not empty, then the error_list pointer will be thrown as an exception. In the caller of the Table constructor, the catch will output the errors and terminate the program. However, the above line generates a compiler error...
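For illustration, a structure along the lines described might look like the sketch below. The member and structure names are guesses, only the duplicate and missing error types are shown, and the per-type structures are declared outside the union to stay within what standard C++ allows inside an anonymous union.

```cpp
enum TableErrType
{
    Duplicate_TableErrType,
    Missing_TableErrType
};

// Information carried by each error type (names are assumptions).
struct DuplicateInfo
{
    int code;    // the code that appears twice
    int index1;  // index of the first entry with the code
    int index2;  // index of the second entry with the code
};

struct MissingInfo
{
    int code;    // the code with no table entry
};

struct TableError
{
    TableErrType type;  // selects which union member is valid
    union
    {
        DuplicateInfo duplicate;
        MissingInfo missing;
    };

    // One constructor per error type.
    TableError(int code, int index1, int index2)
        : type(Duplicate_TableErrType)
    {
        duplicate.code = code;
        duplicate.index1 = index1;
        duplicate.index2 = index2;
    }
    explicit TableError(int code) : type(Missing_TableErrType)
    {
        missing.code = code;
    }
};
```

The code that outputs the errors would switch on the type member and read only the matching union member.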

Updated Wednesday, February 17, 2010; 8:10 pm: Removed details about the error types since there are now four.

Saturday, February 13, 2010

Table Class – Searching

The Table class needs a few functions for searching. It will also contain initialization for some table related variables and several access functions (the table entries themselves will be kept private). The actual data for the table will be put into a structured array initializer outside the class.

Each of the search functions will return the index of the table entry of the item found. The index can then be used with the access functions. To speed up searching of the Table, the entries in the table will be grouped together by type. There will be 3 major search functions:
  • Search for an immediate command including argument form
  • Search for a string of a given length of a given type
  • Search for a two word command
For the search for a string, there will be four types:
  • Plain Words – Commands, Word Operators, and Internal Functions that don't have parentheses
  • Data Type Words – Internal Functions that have a data type symbol but no parentheses (currently none are planned initially, but examples are DATE$ and TIME$)
  • Parentheses – Internal Functions that have parentheses including those that may have a data type symbol, for example MID$(...)
  • Symbol – Symbol character Operators (one or two characters) and other symbols, for example colon statement separator and single-quote remark
The immediate command search returns the command arguments in a structure within the string field of the token. This required that a String constructor be added for a generic (void) pointer along with a length. Along with this constructor is a new access function get_data(), which returns the void pointer.
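The search for a string of a given length and type could be sketched like this. It assumes that the range of a search type holds the indexes of its two bracketing entries, that names are matched without regard to case, and that -1 means not found. The Entry and Range structures and the function names are simplified stand-ins, not the real Table members.

```cpp
#include <cctype>

struct Entry { const char *name; };  // simplified table entry
struct Range { int beg; int end; };  // indexes of the bracketing entries

// Compare 'len' characters of 'string' with the NUL-terminated 'name',
// ignoring case; the name must be exactly 'len' characters long.
static bool same_name(const char *name, const char *string, int len)
{
    for (int i = 0; i < len; i++)
    {
        if (name[i] == '\0')
            return false;  // name is shorter than the string
        if (std::toupper(static_cast<unsigned char>(name[i]))
            != std::toupper(static_cast<unsigned char>(string[i])))
            return false;
    }
    return name[len] == '\0';
}

// Return the index of the matching entry, or -1 if there is none. Only
// the entries strictly between the bracketing entries are scanned.
int search(const Entry *entry, Range range, const char *string, int len)
{
    for (int i = range.beg + 1; i < range.end; i++)
    {
        if (same_name(entry[i].name, string, len))
            return i;
    }
    return -1;
}
```

Passing the length separately means the Parser can search directly in the input line without first copying the word into its own string.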

Friday, February 12, 2010

Table Class – Access Functions

The Table class will contain access functions taking either a code value (one of the Code enumerations) or a table entry index (which will be the internal value used in the program):

    index(code)     – return the table entry index for the code
    code(index)     – return the code for the table entry
    type(index)     – return the token type for the table entry
    datatype(index) – return the associated data type for the table entry (if any)
    multiple(index) – return whether the command supports multiple words/characters

There will also be a flags member in the TableEntry structure.  Currently it is only used for the immediate commands to indicate which argument forms the command supports.  The other entries will have the flags member set to Null (no flags set).  This will probably be used later to indicate other conditions for the rest of the entries.

Thursday, February 11, 2010

Table Class – Data Members

The Table class will contain these data members:

    entry      pointer to the table entries array (TableEntry)
    index_code pointer to the code to index array
    range      array of Range structures (with begin and end index members)

The constructor will initialize all the Table data including setting the entry pointer to the table entry array (a local static in the table.cpp source file), allocating the index_code array, and setting up the index_code and range arrays.  The destructor will delete the index_code array.

The initialization includes recording the index of each code in the table to the index_code array (recording an error for any duplicates found), checking and recording an error for any missing codes, setting the search type range begin and end indexes, checking that the begin index is not greater than the end index for each range (recording an error for any range errors found), and checking that the indexes of the ranges do not overlap each other (recording an error for any overlap errors found). Any errors will be appended to an error list. At the end of initialization, if this error list is not empty, an exception with the error list will be thrown.  An error list will be used so that all errors in the table will be reported.
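The first two steps (recording indexes with duplicate detection, then checking for missing codes) can be sketched as below. The Code enumeration, the function build_index_code() and the use of std::vector and message strings are assumptions made to keep the example short and self-contained; the real constructor works on the TableEntry array and records TableError objects.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical codes; sizeof_Code is the number of codes.
enum Code { Let_Code, Print_Code, Input_Code, sizeof_Code };

// Build the code-to-index array from the code of each table entry (in
// table order). An error message is appended for every duplicate and
// every missing code, so that all problems are reported at once.
std::vector<int> build_index_code(const std::vector<Code> &entry_code,
    std::vector<std::string> &errors)
{
    std::vector<int> index_code(sizeof_Code, -1);  // -1 = not seen yet
    for (std::size_t i = 0; i < entry_code.size(); i++)
    {
        Code code = entry_code[i];
        if (index_code[code] != -1)
            errors.push_back("duplicate code");
        else
            index_code[code] = static_cast<int>(i);
    }
    for (int code = 0; code < sizeof_Code; code++)
    {
        if (index_code[code] == -1)
            errors.push_back("missing code");
    }
    return index_code;
}
```

Collecting the errors instead of stopping at the first one means a single run of the test program shows everything that is wrong with the table.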

Updated Wednesday, February 17, 2010; 8:40 pm: changed to reflect the four different types of table errors.

Tuesday, February 9, 2010

Parser – Immediate Commands (Implementation)

Immediate Commands will be parsed in the separate function get_command(), which will be called only for the first token on the line (when the column is zero). The entire command will be parsed including the command's arguments, and the entire command will be put into a single token. If the line does not contain a valid command, the Parser will proceed to the other functions from the beginning of the line. Each command takes one of the following forms:

    Blank        Command has no arguments
    Line         Command has a line number
    Range        Command has a line number range
    Line-Incr    Command has a line number and an increment
    Range-Incr   Command has a line number range, a start line and an increment
    String       Command has a string argument

There will be a field in the Table for each of the immediate commands that will contain flags indicating which forms each command supports. The first thing get_command() will do is search the Table for the command letter to see if it is a valid command. If it is, then get_command() will proceed to parse the rest of the line looking for one of the above forms for the arguments. The separate function scan_command() will be called to actually scan and parse the arguments. This will make it easy to return immediately when the command is fully parsed or an error is found. After the arguments are parsed and the command syntax is good, the table will be searched again for the command letter and the form of the arguments found to see if it is a valid command syntax.
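To make the argument forms concrete, here is a rough sketch of classifying the text that follows the command letter. It is not scan_command(): the Form enumeration and function names are invented, white space is not handled, the line numbers are only recognized and not converted, and an increment is required after a comma. Anything that does not fit returns Error_Form, which in the real Parser would mean handing the line to the regular Parser functions.

```cpp
#include <cctype>
#include <string>

enum Form
{
    Blank_Form, Line_Form, Range_Form, LineIncr_Form, RangeIncr_Form,
    String_Form, Error_Form
};

// Skip a run of digits at 'pos'; return true if at least one was found.
static bool skip_number(const std::string &s, std::string::size_type &pos)
{
    std::string::size_type start = pos;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])))
        pos++;
    return pos > start;
}

// Classify the argument text that follows the one-letter command.
Form classify_args(const std::string &args)
{
    std::string::size_type pos = 0;
    if (args.empty())
        return Blank_Form;
    if (args[0] == '"')
        return String_Form;          // closing quote is optional
    bool range = false;
    skip_number(args, pos);          // xxx (optional)
    if (pos < args.size() && args[pos] == '-')
    {
        range = true;
        pos++;
        skip_number(args, pos);      // yyy (optional)
    }
    if (pos == args.size())
        return range ? Range_Form : Line_Form;
    if (args[pos] != ',')
        return Error_Form;           // for example "100=4"
    pos++;
    if (range)
    {
        // an optional new line number nnn may precede the increment
        std::string::size_type save = pos;
        if (skip_number(args, pos) && pos < args.size() && args[pos] == ',')
            pos++;
        else
            pos = save;
    }
    if (!skip_number(args, pos) || pos != args.size())
        return Error_Form;           // increment missing or trailing text
    return range ? RangeIncr_Form : LineIncr_Form;
}
```

The form returned here would then be turned into the matching flag and used for the second table search, which decides whether the letter and form together make a valid command.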

Tuesday, February 2, 2010

Parser – Immediate Commands

As the new Table class was being designed, I realized that the immediate commands were forgotten. These are temporary until the GUI is implemented. Therefore, these will be implemented the simplest way. Back when I developed programs on GW-Basic, there was an add-on to the Basic that supplied some convenience features, specifically being able to use one character commands like L for LIST, E for EDIT, S for SAVE. The temporary immediate commands will be implemented this way. White space will be allowed in the commands, and each command must be of one of several different forms to be recognized as an immediate command. The one character will be at the beginning of the line. The forms will be:

    LIST:    L  Lxxx  Lxxx-yyy
    EDIT:    E  Exxx
    DELETE:  Dxxx  Dxxx-yyy
    RUN:     R
    RENUM:   Rxxx-yyy  Rxxx-yyy,zz  Rxxx-yyy,nnn,zz
    SAVE:    S  S"fff"
    LOAD:    L"fff"
    NEW:     N
    AUTO:    A  Axxx  A,zz  Axxx,zz
    CONT:    C
    QUIT:    Q

Where xxx is a line number, xxx-yyy is a line number range (where xxx and/or yyy is optional), nnn is a new line number (optional), zz is an increment (optional) and fff is a file name. Note that some of the same letters are used for different commands (RUN/RENUM, LIST/LOAD). Which one is meant can be determined by the arguments provided.  There are three new commands that weren't included previously: AUTO (to automatically provide line numbers when entering program lines), CONT (to continue running the program after a break) and QUIT (to exit). If the line entered doesn't match one of these forms exactly, it will be processed by the regular Parser functions (consider the line L100=4, which is an immediate assignment, not a LIST command).

Several of the commands will do a check before they are executed. For RUN, LOAD, NEW and QUIT, if the program is not saved then there will be a prompt asking if the program should be saved first. For SAVE, S by itself will save the program using the current program name (from either a LOAD or the last S"fff" command, otherwise there will be a prompt for a file name).

One last thing, I decided to allow string constants to be terminated by the end of the line, in other words, no closing double-quote. GW-Basic works this way. This is also convenient for LOAD and SAVE, allowing just L"fff and S"fff to be entered. This requires just a small change to the get_string() function: simply don't return an error token for this condition. For program lines, when one of these unterminated string constants is recreated, the closing double-quote will be there.
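The change amounts to treating the end of the line the same as a closing double-quote. A minimal sketch, assuming a hypothetical scan_string() helper working on a std::string (the real get_string() works on the Parser's input and returns a token, and any handling of embedded double-quotes is left out here):

```cpp
#include <string>

// Scan a string constant whose opening double-quote is at 'pos'. Returns
// the text between the quotes and leaves 'pos' just past the closing
// quote, or at the end of the line if the constant was not terminated.
std::string scan_string(const std::string &line, std::string::size_type &pos)
{
    std::string text;
    pos++;  // skip the opening double-quote
    while (pos < line.size() && line[pos] != '"')
        text += line[pos++];
    if (pos < line.size())
        pos++;  // skip the closing double-quote
    // reaching the end of the line is not an error; the constant simply
    // ends there
    return text;
}
```

With this, L"prog and L"prog" both yield the file name prog.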