Thursday, December 30, 2010

Translator – More Operand Processing Debugging

Continuing with debugging, the next difference was a PRINT command that contained sub-string functions, where the expected results needed to be updated (sub-string functions don't get operand tokens attached; their string operand is carried forward and attached to the next token).

Finally, in the Tenth (PRINT statements) test, there were three “debug #1” bug errors that should have been “expected numeric expression” errors. These test lines were incomplete statements (e.g. PRINT A+). This bug error had been put in as a placeholder to fix later because the existing code was not correct. The code was modified to return the expected type error for the operand of the operator token on top of the hold stack (taking into account whether it is the first operand of a unary operator or the second operand of a binary operator). It was previously set to return the variable type of the command on top of the command stack, which was wrong – the PRINT command has no data type.

Now on to the Eleventh (error statements) tests, which have quite a few incorrect error messages...

Wednesday, December 29, 2010

Windows, NetBeans and CVS (CVSNT)

This post is not strictly related to the IBCP project, but it will be helpful in explaining how NetBeans (6.9.1) on Windows (XP) was made to work with CVS, since it took nearly a day of searching and experimenting to accomplish (and many others were having similar problems).

The initial hope was that NetBeans for Windows would play nice with the CVS installed with MSYS (Minimum SYStem), but it did not. Searching indicated that NetBeans requires a CVS server even for a local repository and apparently, the CVS that comes with MSYS is not sufficient, though when doing a cvs ‑v command, it does respond … 1.11 (client/server). Further searching uncovered The CvsGui project and the WinCVS 2.1.1.1 package (includes WinCVS and CVSNT) was downloaded from http://sourceforge.net/projects/cvsgui/files/. Initially only the CVSNT package was installed (and this may have been sufficient).

Not realizing that there was a Service control panel under the Start menu (under CVSNT) that is used to configure CVSNT, the WinCVS package was installed. Normally WinCVS will also install the required CVSNT (the CVS server), but can be skipped if CVSNT is already installed. WinCVS is a very nice GUI for looking at checked out software (including nice version and branch graphs), however, it does not contain any configuration of CVSNT. And exactly how to check out a module from the repository was not discovered. This was not important as either the command line or NetBeans can be used to check out a module.

Once CVSNT is installed (indicates version 2.0.51d), it requests that Windows be restarted. This is necessary to install and start the CVS services, which occurred automatically upon reboot. Next the CVS server needs to be configured by using the Service control panel. The first tab reported that the services (CVS Service and CVS Lock service) were both running (a good sign).

The CVS repository needs to be identified using the Repositories tab by selecting Add. Using the browse button on the next dialog (the ... button), the repository that had been previously placed at C:/msys/1.0/home/cvsroot by using the MSYS CVS command was selected and entered in the Location field. The Name field was set to /msys/1.0/home/cvsroot.

After selecting Checkout... in NetBeans, the CVS Root was set to :local:/msys/1.0/home/cvsroot using the Edit... button, setting the Access Method to local and setting the Repository path to /msys/1.0/home/cvsroot (the same name as entered in the Name field in the CVSNT Service control panel Repositories tab). If all goes well, selecting Next> will immediately advance to the Module to Checkout page where the Module, Branch and Location Folder fields can be set. If this doesn't appear immediately, NetBeans either responds with an error, or spins its wheels looking for something (don't bother waiting – it's not going to work).

Several environment variables were set in the attempt to get it to work and may or may not be necessary. The environment variables can be accessed by right-clicking on My Computer and selecting Properties, then going to the Advanced tab and selecting Environment Variables. Under System variables the following was entered:
PATH – Edited with C:\Program Files\cvsnt; at the beginning
CVSROOT – New set to C:\msys\1.0\home\cvsroot
CVS_EXE – New set to C:\Program Files\cvsnt\cvs.exe
CVS_RSH – New set to C:\Program Files\cvsnt\cvs.exe
One last warning about using CVSNT on Windows: CVSNT expects text files in the repository to be in DOS format (lines terminated by CR/LF characters). If the text files are in Unix format (terminated by only LF characters), then upon checkout CVSNT will mess up the files by putting extra CR characters on each line. This is probably due to it being a Windows program that expects text files to be in the Windows (DOS) format.

NetBeans and Debugging – Conclusions

Enough time has now been spent debugging with NetBeans to draw some conclusions. After some adjusting, using the GDB debugger under NetBeans appears to be easier than using Insight, the GUI front-end to GDB. The two are definitely different, and initially it appeared that it was more difficult to look inside classes with NetBeans, but this was not the case. NetBeans puts the active local variables for each code line executed in the Variables window automatically, and it's very easy to step into them to look at their contents.

With Insight, the variables first need to be added to the Watch window. Insight is never aware of allocated arrays for pointer members – it treats them as simply pointers. If you want to look at the contents of a particular array element, another variable needs to be added to the Watch window. And if too many variables are added to the Watch window, Insight becomes unstable (crashes). NetBeans usually appears to be aware of arrays and lets you descend into whichever element you want to look at.

In Insight there is no way to edit a Watch variable name, say to look at a different array element – it must be deleted and a new one entered (a lot of typing). In NetBeans, the watch variables can be edited (using Customize). One disconcerting note: with the jVi plugin installed, NetBeans requires the use of Vi commands in the New Watch dialog, and it starts in command mode – this will take a little getting used to. One last thing about the jVi plugin – sometimes it just stops working (NetBeans reverts to its default editor). Restarting NetBeans corrects the problem.

In conclusion, NetBeans nicely integrates everything; therefore, development will be transferred to the NetBeans IDE with CVSNT (see next post on how this was made to work). The next release of the source will include the necessary files to build with NetBeans (development with VIDE2 will now be abandoned).

Translator – Operand Processing Debugging

The Ninth (LET command) test contained multiple equal assignments, so the expected results needed to be updated since the multiple equal assignments feature was removed. Once the expected results were corrected, including removing the comma sub-code that was only needed to tell the difference between multiple equal and comma assignments, this test was successful. There was no need to use debugging here, but on to the Tenth (PRINT command) test, which is crashing...

The Tenth test was crashing because an invalid status code was being returned. This was tracked down to the add print code routine, which was calling the process final operand routine incorrectly (the status variable was passed as an argument instead of being set to the return value).
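
The kind of call-site mistake involved can be shown with a minimal sketch (the Status values and routine signatures below are illustrative stand-ins, not the actual IBCP declarations):

    // hypothetical stand-ins for the translator's status type and routines
    enum Status { Good_Status, BUG_InvalidStatus };

    Status process_final_operand()   // reports success or failure via its return value
    {
        return Good_Status;
    }

    Status add_print_code()
    {
        Status status = BUG_InvalidStatus;
        // The bug: calling process_final_operand() without using its return
        // value leaves 'status' holding whatever invalid value it started with.
        // The fix: assign the return value to 'status' and propagate it.
        status = process_final_operand();
        return status;
    }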

Now all the PRINT commands were generating a “done stack not empty” bug error. This was caused by the hidden print codes (PrintDbl, PrintInt and PrintStr) not being popped from the done stack. These should not have been pushed onto the done stack in the first place; the cause was that these codes did not have the Print flag set in the table, which would have prevented them from being pushed onto the done stack.

Next, the PRINT commands were generating an “invalid use of print function” error at the PRINT command. This occurred during the processing of the last hidden print code of the statement, when the top of the command stack was checked for the PRINT command (the stack was empty, so the check was not performed). The PRINT command had already been popped by the End-of-line command handler, which calls the PRINT command handler, which in turn handles the last print code by calling the add print code routine. The add print code routine calls the process final operand routine, which checked for the print command. The solution was for the End-of-line command handler to leave the command on top of the stack while the command's handler is called and then pop the command off upon returning.

Tuesday, December 28, 2010

NetBeans and Debugging

Now with the CVS/CR problem resolved (the NetBeans editor will show modifications from the CVS repository as changes are made) and the program builds and runs, it was time to start the debugger.

First off, NetBeans gave a warning that GDB 6.3 (installed with MinGW) was not supported. A later version (7.2) was found at the MinGW SourceForge page http://sourceforge.net/projects/mingw/files/MinGW/BaseSystem/GDB/GDB-7.2/ and the gdb-7.2-1-mingw32-bin.tar.lzma file was downloaded. This file consisted of three files: gdb.exe, gdbserver.exe and gdb-python27.exe. The current files at \MinGW\bin were renamed to gdb-6.3.exe and gdbserver-6.3.exe just in case they were needed, and the three new files were copied in. Insight appears to still run, but I'm not sure if it is using GDB 6.3 or GDB 7.2. NetBeans now seemed to be happy.

The next thing to figure out was how to specify command line arguments for the program being debugged. With Insight, this was done from the Console window using the “set args” command. The first thing attempted was to enable the GDB console in NetBeans, but there is no need to explain how to do that because the console isn't available until the program is started, and by then it is too late – the program has already read its arguments. The solution was discovered under Project Properties in the Run section – there is an Arguments setting. Now with the arguments set to “-t 9” for the Ninth Translator test, debugging can begin...

NetBeans – Building the Program

Before debugging, the program needed to be built. Within NetBeans, the IBCP project was created and the IBCP source code was checked out of the CVS repository. The .h include files were added in the project under Header files and the .cpp source files were added under Source files. If NetBeans proves satisfactory, the necessary NetBeans files will be included with the source files.

First, some options were set in NetBeans to simplify the default directory structure, which assumes a multiple platform product – only Windows is being used here. By default, programs are built into the directory “dist\Debug\MinGW-Windows” (for the debug configuration; “Release” for the release configuration). This is unnecessary and inconvenient. To make it build ibcp.exe in the root IBCP directory, the option in Project Properties, Linker, Output was changed to just ibcp from the more involved default ${CND_DISTDIR}/${CND_CONF}/${CND_PLATFORM}/ibcp.

Selecting build project immediately generated errors. The first file was opened and it looked as if it was double spaced (back-slash terminated line continuations caused the errors since the blank lines prematurely terminated them). Somehow in most of the source files, a control-M got added to the end of each line. This is a Unix vs. DOS text file format issue – Unix puts a single newline (control-J or linefeed) at the end of each line and DOS (Windows) puts a CR/LF pair (control-M/control-J) at the end of each line. Vim deals with this invisibly and the GCC compilers ignore it.

I first noticed a week ago that these CR characters were in some of the source files when a “cvs diff” command showed the whole file as changed. Outputting the diff into a file and looking at it with Vim showed the problem: all the lines from one file had “^M” characters at the end, so no lines compared as equal. This was causing no problems with either Vim or the compiler.

After some playing around, the problem was discovered to be caused by the CVSNT needed by NetBeans (more on this later), which seems to get confused by Unix format files in the repository. The first attempt was to convert all the files to Unix format, but this did not help. After all the files were converted to DOS format, the problem appeared to be solved, and further checkouts were OK. The program then built with no issues and the tests ran as last released. Now on to debugging...

NetBeans and Vim (jVi plugin)

The version of NetBeans installed is the latest, 6.9.1, which can be obtained for free at http://netbeans.org/. The C/C++ pack was installed. JDK 6 (Java Development Kit) also needs to be installed, but that was already on the computer for something else. The jVi (vi editor clone) plugin can be downloaded at http://sourceforge.net/projects/jvi/. The downloaded nbvi-1.3.0.x1.zip file was unzipped; it contains a directory that was put into the Program Files\NetBeans 6.9.1 directory. To install this plugin package, once NetBeans was installed and running, go to Tools/Plugins and the Downloaded tab. Select “Add Plugins...”, find the nbvi-1.3.0.x1 directory and select the two .nbm files. Once both are selected, select the Install button.

Before going into what was needed to make CVS work with NetBeans (which is not necessary for building the IBCP program from the source, but the details may help someone else in a similar situation since it took a bit of searching to figure out), I wanted to see what was needed to make debugging work within NetBeans (it works very nicely under Linux). This can be done by actually debugging, starting with the Ninth Translator (Commands) test...

Monday, December 27, 2010

New Development Environment

I decided to try out a new IDE I recently learned about – NetBeans, which has integration with CVS, an integrated debugger (a front end to GDB), and an available plugin for vim editor capability. Getting the Windows version to work properly was going to be a challenge though.

Currently for development, MSYS (Minimum SYStem) with MinGW (Minimalist GNU for Windows) is being used, with the CVS for MSYS installed for version control. MinGW was updated to version 4.4.0 of the GNU Compiler Suite. For controlling (building) the project, VIDE2 (a simple IDE) is being used, but it contains a very primitive editor (gVim is being used instead for editing). For debugging, Insight, a GUI front end for GDB (version 6.3), is installed. Insight is not exactly stable (it crashes a lot), and the latest version 6.8.1 could not be made to work with MSYS.

I'm not sure I will be able to get NetBeans working satisfactorily. So far, after a lot of struggle, NetBeans is working with CVS, though not the CVS that was installed with MSYS. Complete details will be in a future post. The IBCP source was successfully checked out of the CVS repository with NetBeans, the IBCP project was created in NetBeans, and it built successfully. Next will be to get the debugger working.

Translator – Sub-String Pre-Release

Now the first eight Translator tests succeed. There are more problems to correct with tests 9 through 12 (test 10 crashes). But, since it has been half a year since code has been released, it's time to commit and tag the changes to CVS and make another pre-release.

I was distracted during the last two months with a work project and didn't get much time to work on this project. I made the mistake of not committing the last good working code to CVS (mainly because I wanted to fix just one more bug).  It took almost a week to recover, figure out what I was in the middle of, make the code compile again, and get the most recent changes working (almost). With that accomplished, a pre-release can be made. The file ibcp_0.1.14-pre-2-src.zip has been uploaded at Sourceforge IBCP Project along with the binary for the program. Now on to more debugging to get the rest of the tests working again...

Saturday, December 25, 2010

Translator – Sub-String Debugging

The sub-string change – not pushing the sub-string function to the done stack and leaving its string operand on the stack (to carry to the next token) – is implemented using the same mechanism that prevents print functions from being pushed onto the done stack (the done push flag is set to false). With the number of operands left set to zero, the sub-string function's string operand will be left on the done stack.
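
A minimal sketch of that mechanism is below, assuming simplified types and member names (the actual IBCP classes differ):

    #include <stack>

    struct Token { bool isSubStringFunction; };

    struct Translator {
        std::stack<Token *> done_stack;

        // Print functions and, with this change, sub-string functions are not
        // pushed to the done stack.  For a sub-string function the operand
        // count is also left at zero, so its string operand stays on the done
        // stack and is carried forward to the next token.
        void append_to_output(Token *token, bool done_push, int noperands)
        {
            if (token->isSubStringFunction) {
                done_push = false;   // don't push the sub-string function itself
                noperands = 0;       // leave its string operand on the done stack
            }
            for (int i = 0; i < noperands; i++) {
                done_stack.pop();    // operands popped and attached (not shown)
            }
            if (done_push) {
                done_stack.push(token);
            }
        }
    };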

As debugging continued, nothing was getting attached to some AssignStr tokens. The assign command handler was only attaching tokens of the String data type, not the TmpStr data type returned (by the concatenate operator and the non-sub-string functions). In these cases, the AssignStr code will be changed to the AssignStrTmp code, but none of the associated codes for TmpStr arguments have been implemented yet (that will come next). Therefore, this will be left as is for now (TmpStr tokens will not be attached to AssignStr).

The new sub-string implementation (not pushing them onto the done stack) made sub-string assignments stop working (producing “done stack empty” errors). The variable being assigned was popped from the done stack because it was not needed – it doesn't need to be attached to a token. When the assignment operator (equal) went to look for what was being assigned in order to determine which assignment code was needed, the done stack was empty. In this case, the sub-string function needs to be on the done stack, so a check was added to still push the sub-string function token if its reference flag is set.

The sub-string change also affected error reporting - the “expected numeric expression” error for a statement like Z=B+MID$(A$,2,3) was now incorrectly pointing to the A$ instead of the MID$ token (because the A$ carried forward and the MID$ token was not on top of the done stack).

To correct this problem, the code at the end of the find code routine that returns the appropriate error when an operand has the wrong data type was modified: if the last token appended to the output list doesn't match the top token on the done stack and the last token is a sub-string function, then the token returned is set to the sub-string token instead of the top token.
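
A rough sketch of that adjustment, with illustrative names:

    struct Token { bool isSubStringFunction; };

    // If the token most recently appended to the output list is a sub-string
    // function but is not the token on top of the done stack (its operand was
    // carried forward instead of the function itself being pushed), report
    // the error at the sub-string token rather than at the carried operand.
    Token *select_error_token(Token *done_top, Token *last_appended)
    {
        if (last_appended != done_top && last_appended->isSubStringFunction) {
            return last_appended;
        }
        return done_top;
    }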

Friday, December 24, 2010

Translator – Sub-String Implementation

Implementation of the transfer of operands of sub-strings will mostly take place in the process final operand function. But first, some information needs to be added to the Table, namely the number of string arguments each code has. This value could be calculated on the fly as needed, but it will make for a simpler design to just put these values in the table. The table initialization code (Table constructor) was modified to calculate these values automatically.
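
A sketch of what that automatic calculation might look like (the entry fields shown are a drastic simplification of the real Table):

    #include <cstddef>

    enum DataType { Double_DataType, Integer_DataType, String_DataType };

    struct TableEntry {
        int      noperands;
        DataType operand_type[4];
        int      nstrings;            // filled in by the constructor below
    };

    struct Table {
        TableEntry *entry;
        std::size_t nentries;

        Table(TableEntry *entries, std::size_t count)
            : entry(entries), nentries(count)
        {
            // count each code's string operands once, up front, instead of
            // recomputing the value every time it is needed
            for (std::size_t i = 0; i < nentries; i++) {
                int nstr = 0;
                for (int j = 0; j < entry[i].noperands; j++) {
                    if (entry[i].operand_type[j] == String_DataType) {
                        nstr++;
                    }
                }
                entry[i].nstrings = nstr;
            }
        }
    };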

In the process final operand routine, the internal function and operator section, when the number of operands passed in is zero, needs to set the number of operands to the number of string arguments that are on the done stack. These operands will be popped from the done stack and attached to the internal function or operator token. Only string operands were left on the done stack (numeric operands were popped by the find code routine since they do not need to be attached to any token).

However, for sub-string functions, which have only one string operand, the string operand will be left on the done stack, to be attached to the next token. If the next token is another sub-string function, then the string operand will continue to be carried forward. Therefore, the number of (string) operands to pop from the done stack will only be set to the code's number of string arguments if the code is not a sub-string function. For sub-string functions, the number of operands will be left set to zero, so nothing will be popped and attached to the sub-string function token.
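
The rule reduces to something like the following sketch (names are illustrative):

    struct TableEntry { int nstrings; bool isSubStringFunction; };

    // how many operands to pop from the done stack and attach to the token
    int operands_to_attach(const TableEntry &entry, int noperands_passed_in)
    {
        if (noperands_passed_in != 0) {
            return noperands_passed_in;   // arrays and non-internal functions
        }
        if (entry.isSubStringFunction) {
            return 0;   // string operand stays on the done stack, carried forward
        }
        return entry.nstrings;            // string operands left by find code
    }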

Translator – Operands of Sub-Strings

One final issue was noticed with the eighth (Sub-String Assignments) Translator test that needed to be corrected, which can be seen with these sample statements (and their current translations):
A$ = LEFT$(B$, 1)  A$<ref> B$ 1 LEFT$([B$] Assign$
A$ = LEFT$(B$+C$, 2)    A$<ref> B$ C$ +$[B$,C$] 2 LEFT$([+$] Assign$
In these examples, remember that the Translator cannot determine if B$ and C$ are variables or functions, therefore these operands are attached to the code they belong to so that the Encoder can adjust the operators and functions to associated codes with the appropriate arguments. However, sub-string functions add a wrinkle to this scheme.

During run-time, sub-string functions work the same whether their string operand is a reference or a temporary string – they simply modify the string pointer and length of the string on top of the evaluation stack. Temporary strings are not deleted since the sub-string refers to the temporary string – whichever code has the sub-string function as an operand will take care of deleting the temporary string. Only one code is needed for each sub-string function (not two, one with a reference string and one with a temporary string as an operand). The correct translations of the statements above should be:
A$ = LEFT$(B$, 1)  A$<ref> B$ 1 LEFT$( Assign$[B$]
A$ = LEFT$(B$+C$, 2)  A$<ref> B$ C$ +$[B$,C$] 2 LEFT$( Assign$[+$]
In the first statement, the B$ should be attached to the Assign$ token so that the Encoder can change it to an AssignStrTmp code if B$ turns out to be a function (the results of functions are temporary strings) or leave it as the AssignStr code if it is a variable. In the second statement, the Translator can change it to an AssignStrTmp code with no attached token, but until temporary strings are fully supported, it will attach the +$ to an AssignStr code.

Since the result of a sub-string function is the same as its operand (reference or temporary string), the operands of the sub-string function simply get transferred to the token that has the sub-string function as an operand.

Sunday, October 10, 2010

Translator – Debugging - Sub-String Assignments (III)

The next issue is the output of the correct error messages. Consider these expressions and the current error messages (underlined tokens indicate where the error is pointing):
RIGHT$(LEFT$(A$,1),2) = B$  item cannot be assigned
RIGHT$(A,2) = B$  expected string expression
RIGHT$(A%,2) = B$  expected string expression
RIGHT$(A$+B$,2) = C$  item cannot be assigned
The desired error message is “expected string variable” for the first three statements and “expected comma” for the fourth statement. The section that receives an internal function token and checks for a sub-string function in an assignment was previously corrected to accept sub-string functions and set their reference flag (so that the reference flag of the first operand is checked).

An additional check was needed to catch a second sub-string function. Checking whether the token on top of the hold stack has its reference flag set is sufficient, since this condition only occurs when there is a sub-string assignment. However, this check is only performed when processing the first operand.

When the find code routine found an operand with an incorrect data type, it was returning “expected <datatype> expression” errors. This was modified so that if the reference flag of the token is set, an expected variable error is returned instead. However, for strings, the “expected string item for assignment” error was returned to indicate that sub-string assignments are allowed. But if already inside the sub-string function, this needed to be the “expected string variable” error.

For the last statement, a check was added to the operator section of the add token routine: if the token on top of the hold stack has its reference flag set (the same first-operand check as above) and the received token is not a comma, then an “expected comma” error is returned.

Saturday, October 9, 2010

Translator – Debugging - Sub-String Assignments (II)

The next issue was more difficult to resolve and was a strange problem because statements that appeared to cause the program to crash worked fine by themselves. Using the process of elimination, the statement below was finally found to be causing the crash:
RIGHT$(A$+B$,2) = C$
When the comma token is received, the +$ token is added to the output list and pushed on the done stack. The RIGHT$ token is on top of the hold stack with its reference flag set because this is a sub-string assignment statement. Being a sub-string assignment, the first operand must have its reference flag set, which for the +$ token it was not, hence an error.

This error is detected in the find code routine where it is checking for a reference of the first operand when the token passed in has its reference flag set. If the reference flag of the operand is not set, an error is returned. The passed in token is first deleted and the error token is set to the token of the first operand.

The problem in this case was that the passed-in token, the sub-string function, that gets deleted is still on top of the hold stack. When an error occurs on a line, one of the things the cleanup routine does is delete all the tokens on the hold stack. The sub-string function token therefore gets deleted twice, which causes a crash later on – allocated memory must not be freed more than once.

The solution for these reference errors is to delete the token in the find code routine only if the token is not an internal (sub-string) function. The find code routine is only called from two places where the token will have its reference flag set: internal sub-string functions in sub-string assignment statements (which are on the hold stack and should not be deleted) and assignment tokens (which should be deleted).
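
In sketch form (simplified names, not the actual find code signature):

    struct Token {
        bool reference;
        bool isInternalFunction;
    };

    // Reference error: report the error at the offending operand, but only
    // delete the passed-in token when it is not an internal (sub-string)
    // function -- such a token is still on the hold stack and will be deleted
    // by the error cleanup, and freeing it here too would free it twice.
    void report_reference_error(Token *&token, Token *first_operand)
    {
        if (!token->isInternalFunction) {
            delete token;
        }
        token = first_operand;
    }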

The next problem is that currently, the error reported against the + in the statement above is “item can not be assigned” but should be “expected comma” instead. This problem affects several other statements, all related to the fact that a string variable is expected and the error message should reflect this...

Monday, October 4, 2010

Translator – Debugging - Sub-String Assignments

There were no issues with the seventh (Data Type Assignment) Translator tests other than that the expected text file needed to be updated for all the new operator codes. The eighth (Sub-String Assignment) tests were all failing with a bug message indicating the done stack was empty when operands were expected, and one of the tests was crashing the program.

The done stack empty errors were caused because the string within the sub-string that was being assigned had already been popped off the done stack by the assignment (variables being assigned do not need to be attached to the assignment token). But when the sub-string function was processed, it expected to find its first operand, which is a string, on the done stack. A check needed to be added to the process final operand function to not count string operands for functions that have the reference flag set (which is only set for sub-string functions on the left side of an assignment statement).

The next issue for the sub-string assignment tests was that the Translator was incorrectly adding an AssignStr code instead of the correct AssignSubStr code. This was caused because the find code routine was using the wrong condition to check for an exact data type match when looking at the operands of the main and associated codes to find the proper code for the operand. The code obtained from the conversion code table was used – the exact match check was whether the conversion code was the Null code. This allowed the Sub-String data type to match the AssignStr code that expects a String data type, instead of continuing on to check AssignSubStr, which expects the Sub-String data type.

To correct this problem, the exact match check was changed to compare the actual data type to the code's operand data type (this is the check the old find code routine used). Also, the section that inserts the conversion code was modified to only add the conversion code from the conversion table if it is not the Null code. In other words, the Null code indicates that the data type can be converted, but no actual conversion code is needed.
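
In other words, the test changed roughly as in this sketch (illustrative enums only):

    enum DataType { Double_DataType, Integer_DataType, String_DataType,
                    SubStr_DataType };
    enum Code { Null_Code, CvtDbl_Code, CvtInt_Code };

    // Old (wrong) test: "cvt_code == Null_Code".  Null_Code only means the
    // type is convertible with no conversion instruction needed, so a SubStr
    // operand wrongly matched a code expecting String.
    // New test: compare the actual data types.
    bool exact_match(DataType operand_type, DataType expected_type)
    {
        return operand_type == expected_type;
    }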

Sunday, October 3, 2010

Translator – Debugging - String Assignments

Debugging has continued; problems with the third (Array/Function Parentheses Expressions), fourth (Internal Functions), and fifth (Assignments) Translator tests were corrected. There were some done stack not empty errors from the sixth (Data Type) test with the string assignment tests.

The problem turned out to be that the string variable being assigned was not popped from the done stack because it was a string. (Strings that are not determined to be temporary strings are left on the done stack so that they can be popped later and attached to the item that uses them – like an operator, function or an array.) However, these string variables were not being popped and attached to the assignment command/operator.

This was by design: the thinking was that items being assigned are definitely variables or arrays and not function calls. (Functions can also be assigned, but for these assignments the function names will not have arguments and are only permitted within the function body.) In any case, variables being assigned do not need to be attached to the assignment command, including string variables.

To solve this issue, in the find code routine where non-string values are popped from the done stack, an additional check was added: if the token needs to be a reference, then it is popped from the done stack regardless of its data type. The reference is checked when the calling token has its reference flag set (which is set only for assignment operators).
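
A sketch of the pop decision, with simplified types:

    #include <stack>

    enum DataType { Double_DataType, Integer_DataType, String_DataType };

    struct Token { DataType datatype; bool reference; };

    // Strings normally stay on the done stack so they can be attached to the
    // token that uses them, but an operand being assigned (the calling token
    // has its reference flag set) is popped regardless of its data type.
    void pop_operand_if_needed(std::stack<Token *> &done_stack,
                               const Token *calling_token)
    {
        Token *operand = done_stack.top();
        if (operand->datatype != String_DataType || calling_token->reference) {
            done_stack.pop();
        }
    }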

Saturday, September 11, 2010

Translator – New Operand Code (Debugging II)

The Translator was not processing the equal operator properly (the test expression was “not A < 5 = B > 2”), where the equal was translated to =%2 (EqI2) when it should have been =% (EqInt). The problem was that the Equal token handler needs to process the first operand the same way other operators do when in an expression (i.e. when the equal is an equality operator), so a call to the find code routine was added.

The Translator was not generating the correct output for strings (the test expression was “a$ = "this" + "test"”), where the attached operands were just strange. The problem was found in the process final operand routine, which was not popping enough string operands from the done stack to attach to the + (CatStr) and = (EqStr) tokens because the loop ended prematurely, before popping the last string operand. The loop needed to be terminated with “--i >= 0” instead of “--i > 0”.
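
The off-by-one in miniature (a standalone example, not the actual routine):

    #include <cstdio>

    int main()
    {
        int nstrings = 2;

        // fixed condition: visits i = 1 and i = 0, popping both string operands
        for (int i = nstrings; --i >= 0; ) {
            std::printf("pop string operand %d\n", i);
        }
        // the buggy "for (int i = nstrings; --i > 0; )" only visits i = 1,
        // leaving the last (index 0) string operand on the done stack
        return 0;
    }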

From this last test expression, I realized that there is no need to save string operands that are constant strings. Constant strings will be coded the same as variables (in other words, string constants will have dictionary entries just like variables) and are Strings, not Temporary Strings (they don't need to be deleted during execution). For now, it will be assumed that the Encoder will take care of this issue.

So far, the first (Simple Expressions) and second (Parentheses Expressions) Translator tests are working correctly though some of the expected results needed to be updated for the new associated codes. However, the third (Array/Function Parentheses Expressions) Translator test is crashing on the very first test expression. Debugging continues...

Thursday, September 9, 2010

Translator – New Operand Code (Debugging)

There were several problems in the Table that needed to be fixed, all caught by the check code during initialization (so it is a good thing to have the initialization checks – they significantly cut down on debugging time). Some of the problems were in the check code itself, but these were easily corrected.

The first problem was the check for the index to the second operand associated codes – to make sure that the index is not greater than the number of associated codes in the array. This check can only be performed if the index is greater than zero, because otherwise the code either does not have a second set of associated codes or the index is zero (which does not need to be checked). Several entries had the wrong index to the second operand associated codes and needed to be corrected.

The last problem was with the checks added for multiple entries (e.g. MID$): for an entry with the multiple flag set, the name of the next table entry's function must be the same and the next entry must have expression information present – these checks were reversed (reporting an error when the entries were correct).

Once the Table initialization succeeded, the regression tests were run, and all of them failed, including the Parser tests. The Parser test failures were due to the function name array in the test code not being updated for all the new associated codes that were added to the table. To prevent having to update this array in the future (which is a pain), the process was somewhat automated.

A new awk script was created (similar to the existing codes.awk) that processes the ibcp.h include file, scanning for the xx_Code enumeration values and creating a list of quoted strings separated by commas (except the last) containing the name of each code, i.e. the “xx” in the enumeration value. The output of this script is put into an include file, and an include statement was added inside the array initializer to pull in these strings.
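
The result is roughly the pattern below (the file and array names are illustrative, not necessarily those used in IBCP):

    #include <cstdio>

    // In the real build the initializer would be a single line:
    //     #include "codename.h"
    // where codename.h is regenerated by the awk script whenever the Code
    // enumeration in ibcp.h changes; its contents look like the list below.
    static const char *code_name[] = {
        "Null", "Add", "AddInt", "CatStr"   // ... one entry per xx_Code value
    };

    int main()
    {
        int n = sizeof(code_name) / sizeof(code_name[0]);
        for (int i = 0; i < n; i++) {
            std::printf("%s_Code\n", code_name[i]);
        }
        return 0;
    }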

Without using a more sophisticated make system, the awk scripts still need to be run manually when the Code enumeration is modified.  The Parser tests now all succeed so debugging continues with Translator tests...

Monday, September 6, 2010

Translator – New Operand Processing

Once it appeared all the changes were made for the new operand processing (the new find code and process final operand routines), the process of getting it to compile began. Several corrections were made. Once most of the errors were corrected, one problem remained.

Previously there was an operand array in the Translator class that held pointers to the output list items. This array was filled in by the old find code routine and used by the callers of find code to attach operands to a newly created RPN output item. The array is not necessary for the new find code and so was removed. However, the array was still being used by the Assign command handler, which filled the array with the value being assigned, to be used when creating the output item.

The operand array was only used locally so the Assign command handler was updated to have a local variable for holding the operand of the value being assigned. If this operand is a string, then a one element operand pointer is allocated to attach to the RPN output item. Now that the code compiles successfully, debugging can begin...

Sunday, August 22, 2010

Translator – Process Final Operand

The rewrite of the find code routine and the Translator has been a major undertaking. In addition, another project has taken priority (and this project is for my actual paying job). Fortunately I'm getting to the end of the modifications for the new find code and will be able to start testing soon.

The code that calls the new find code routine was very similar in the add operator routine and the close parentheses token handler, so a new function was implemented to process the final operand.

This new process final operand routine will call the find code routine for the last or only operand of either an operator or an internal function token. The non-temporary string operands for the operator or internal function will be counted. These are the operands that need to be saved with the operator or internal function token in the output list.

This new routine will also be called for arrays and non-internal function tokens from the close parentheses token handler. All the operands for these tokens need to be saved with the token (the Encoder will sort out what to do with these operands). To identify these tokens, the number of operands will be passed as an argument (the value will be zero for operators and internal functions).

This routine will allocate an array of operand pointers for the number of operands to save. The pointers to the operands will be popped from the done stack and put into the array. A new RPN list item will be created and the operands will be attached to it, and the RPN item will be appended to the output list. The output list item will be pushed to the done stack unless the token was an internal print function.
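
A heavily simplified sketch of the routine just described (class and member names are illustrative; the real IBCP types hold much more):

    #include <list>
    #include <stack>

    struct Token { bool isPrintFunction; };

    struct RpnItem {
        Token    *token;
        int       noperands;
        RpnItem **operand;        // attached operand items
        RpnItem(Token *t, int n, RpnItem **ops)
            : token(t), noperands(n), operand(ops) { }
    };

    struct Translator {
        std::list<RpnItem *>  output;
        std::stack<RpnItem *> done_stack;

        void process_final_operand(Token *token, int noperands)
        {
            // pop the operands that need to be saved with this token
            RpnItem **operands = noperands > 0 ? new RpnItem *[noperands] : 0;
            for (int i = noperands; --i >= 0; ) {
                operands[i] = done_stack.top();
                done_stack.pop();
            }
            // create the new item, attach the operands, append to the output
            RpnItem *item = new RpnItem(token, noperands, operands);
            output.push_back(item);
            // print functions produce no value, so they are not pushed
            if (!token->isPrintFunction) {
                done_stack.push(item);
            }
        }
    };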

Sunday, August 1, 2010

Translator – New Find Code (Multiple Assignment)

The last issue for the new find code implementation is the most tricky and requires the elimination of the C-like multiple equal assignment (for example, A=B=C=0). The problem occurs when the items being assigned are array elements, which can be shown in this example statement:
A(I) = B(I) = 5
In this example, the A can be assumed to be an array because a function (with arguments) cannot be assigned. The subscripts must be numeric expressions. However, after the first equal, this statement can be interpreted two ways depending on whether B is an array or a function call:
I CvtInt A(<ref> I CvtInt B(<ref> 5 AssignList
I CvtInt A(<ref> I B([I] 5 = CvtDbl Assign
The first interpretation is a multiple list assignment of two array elements and the second interpretation is a single assignment of the result of an equality comparison between the result of a function call and a constant. The two resulting translations are radically different. The Encoder can't be expected to change one translation into the other once it determines whether B is an array or a function.

A similar problem can occur with multiple sub-string assignments. These problems occur because an equal can be one of two different operators: assignment and equality. C resolves this issue by having different operators for assignment (=) and equality (==). If this C-like multiple equal assignment is eliminated, then the rules are greatly simplified.

Therefore, in an expression, any equals will be equality operators. In a statement, only the first equal is the assignment operator. After this equal, an expression follows so any equals will be equality operators. Multiple assignments can still be performed by using the comma to separate the items being assigned.

Saturday, July 31, 2010

Translator – New Find Code (Multiple Flag)

The next issue for the new find code implementation is with internal functions that have different numbers of arguments (MID$, INSTR and ASC). Currently the number of arguments is checked at the closing parentheses, before the call to find code to check the data types of the arguments. Before calling find code, if the number of arguments was not correct but the Multiple Flag was set, then a search was made for another code where the number of arguments did match.

For the new implementation, the data types of the internal function arguments will be checked as each argument is processed – in other words, at the comma token (or close parentheses token for the last argument). The checking for an alternate form of the internal function will now take place in the comma token handler.

Previously, to support checking whether a comma was valid in an internal function call (as opposed to a closing parentheses), the internal function table entry with the largest number of arguments had to be first in the table; otherwise an error would occur, since a closing parentheses would be expected if the table entry with the smaller number of arguments were listed first. The number of arguments would be checked, and the code changed if necessary, in the closing parentheses token handler.

Now, with the comma token handler taking care of changing the code when needed, the table entry with the smaller number of arguments needs to be first. If a comma is received where a parentheses is expected (at the last argument), a check will be made, and if the Multiple Flag is set, then it will move to the next table entry with the next number of arguments (otherwise an “expected closing parentheses” error occurs).

This method implies that the table entries must be in order by number of arguments. If there are three forms of an internal function (there are currently none), then the Multiple Flag should be set in each table entry except the last. To ensure the table entries are placed in the table correctly, a check will be added to the table initialization along with the other current table checks.
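
A sketch of what such an initialization check could look like (field names are hypothetical):

    #include <cstring>

    struct TableEntry {
        const char *name;
        int         nargs;
        bool        multiple;       // another form with more arguments follows
        bool        has_expr_info;
    };

    // an entry with the multiple flag set must be followed by an entry for the
    // same function, with expression information, and with more arguments
    bool check_multiple_entries(const TableEntry *entry, int nentries)
    {
        for (int i = 0; i < nentries - 1; i++) {
            if (entry[i].multiple
                && (std::strcmp(entry[i].name, entry[i + 1].name) != 0
                    || !entry[i + 1].has_expr_info
                    || entry[i + 1].nargs <= entry[i].nargs)) {
                return false;       // table entry error
            }
        }
        return true;
    }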

Translator – New Find Code (Reference Flag)

There has been a struggle with a number of issues that have turned up with the new find code implementation, including reference checking, internal functions with multiple forms (different numbers of arguments), and multiple assignments (specifically multiple equal assignments, which may need to be removed from the language).

The current find code routine checks whether the first argument is a reference if the table entry for the token has the Reference Flag. This was used for the assignment operators when called from the set assign command routine, which is called from the comma and equal token handlers (assignment operators have the Reference Flag). If this were the only location that needed to check for a reference, the checking could be moved from the find code routine to the set assign command routine.

However, there is another situation that needs to check for a reference flag - the first argument to a sub-string assignment. This is currently handled by an additional check in the find code routine. With the new find code, this will occur at the first comma for the sub-string function (from the comma token handler). So the reference check should still be done in the new find code routine.

The reference flag can't be added to the sub-string table entries since not all instances of the sub-string functions are assignments. The token being checked (internal function, operator, assignment code or print code) is passed to the find code routine. A simple method would be to set the reference flag in this token argument if its operand needs to be a reference. The find code routine would then check for a reference if the token's reference flag is set.

Therefore, in the set assign command, the reference flag of the assignment token will be set. Also, when internal functions are first pushed to the hold stack, a check is made to see if the mode is currently one of the assignment modes and only a sub-string function is permitted. At this time, the reference flag of the sub-string function can be set.

There will be no problem leaving the reference flag set in the internal function or operator token since nothing downstream will be checking for a reference flag on these types of tokens.

Saturday, July 24, 2010

Translator – Find Code (New Design)

The current find code routine pops all of the operands for an operator or internal function from the done stack, checks the data type of each (changing the code to an associated code as needed), and inserts hidden conversion codes as needed. The find code routine is currently called from these locations:
add operator – to get the appropriate code for unary and binary operators
add print code – to get the data type specific print code
set assign command – to get the data type specific assignment code
close parentheses handler – to check the arguments for internal functions
The Translator will be modified to find the appropriate binary operator code at the first operand, and find the appropriate binary operator code again at the second operand. Also for internal functions, each argument will be checked as it is processed, in other words, in the comma handler.

The new find code will only check one operand where the index of the operand to check will be passed as an argument. The callers of find code will be modified:
operator handler – call for first operand of a binary operator
add operator – call for second operand of a binary (or operand of a unary) operator
comma handler – call for current argument of an internal function
close parentheses handler – call for current argument of an internal function

Tuesday, July 20, 2010

Translator – Operators (New Codes)

Table entries for the new Double/Integer (I2) and Integer/Double (I1) codes were added for the operators. It is not necessary to have all the operand combinations for the logical operators (AND, OR, etc.), which produce an integer result from integer operands. These should be used with integers, but double operands will be allowed and the Translator will continue to insert hidden CvtInt codes for these.

Likewise for the integer division operator (\) that's meant to take double operands and produce an integer result. There's no reason to have separate versions for integer operands as the regular division operator can be used for these. So hidden CvtDbl codes will be inserted for integer operands.

Sunday, July 18, 2010

Translator – Operators (New Design)

The new design for operators is that each data type combination will have its own code. There will be associated codes for the first operand and associated codes for the second operand. For efficiency, there will be just one associated codes array, with the second operand associated codes after the first operand associated codes. In addition to the number of associated codes value, there will be a new secondary associated codes index that will point to the first second operand associated code, which can also be used to determine the end of the first operand associated codes.

Again using plus as an example, the main codes along with their associated codes are listed below. The convention will still be that the default operator has double arguments. The “I1” and “I2” codes represent the first or second operand being an integer, and the “Int” code two integer operands. Similarly for strings, the “T1” and “T2” codes represent the first or second operand being a temporary string, and the “TT” code two temporary string operands.
Add  (4, 3)  AddI1, CatStr, CatStrT1, AddI2
AddI1  (1, 0)  AddInt
CatStr  (1, 0)  CatStrT2
CatStrT1  (1, 0)  CatStrTT
The first number in parentheses is the number of associated codes and the second number is the start index of the second operand associated codes (so in the Add entry, AddI2 at index 3 is the lone second operand associated code). The decision for this design was arrived at by looking at the assembly language generated for approximately what the run-time will look like. Information about the assembly language research is after the Continued break...
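
In data structure terms, the Add entry above might be laid out like this sketch (illustrative field names only):

    enum Code {
        Add_Code, AddI1_Code, AddI2_Code, AddInt_Code,
        CatStr_Code, CatStrT1_Code, CatStrT2_Code, CatStrTT_Code
    };

    struct AssocCodes {
        int  count;      // total number of associated codes
        int  second;     // start index of the second operand associated codes
        Code code[4];    // first operand codes, then second operand codes
    };

    // Add: first operand codes AddI1, CatStr, CatStrT1; second operand code AddI2.
    // The 'second' index (3) also marks where the first operand codes end.
    static const AssocCodes add_assoc = {
        4, 3, { AddI1_Code, CatStr_Code, CatStrT1_Code, AddI2_Code }
    };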

Friday, July 16, 2010

Translator – Operators (Change Needed)

A problem has surfaced as the new table entries were being added for temporary strings. The problem is unrelated to strings, but is related to integers. The plan was to make use of the associated codes as each operand is processed. Using the plus operator as an example, there would have been these codes (with their return and operand data types):
Add  (Dbl)  Dbl Dbl  (also handles Int Dbl and Dbl Int with CvtDbl)
AddInt  (Int)  Int Int
CatStr    (Tmp)  Str Str
CatStrT1  (Tmp)  Tmp Str
CatStrT2  (Tmp)  Str Tmp
CatStrTT  (Tmp)  Tmp Tmp
The Add code would have the associated codes of AddInt, CatStr and CatStrT1. In other words, one code for each of the four data types of the first operand. For handling the second operand for strings, the CatStr code would have an associated code of CatStrT2, and the CatStrT1 code would have an associated code of CatStrTT. The main code would be selected if the second operand was a string (i.e. could not be determined to be a temporary string), and the associated code would be selected if the second operand was a temporary string.

With this scheme, a problem occurs when the first operand is an integer, where the AddInt code would be selected. But what happens when the second operand is a double? The code needs to revert to the Add code since integers need to be promoted to doubles (doubles do not demote to integers).

This could be handled by making the Add code an associated code of the AddInt code. But then a CvtDbl needs to be added for the first operand, and the first operand is no longer being saved (except for strings that are not temporary), which means that integer first operands would need to be saved. Anyway, this all started to get rather complicated. I have some ideas on how to resolve this, but some test code needs to be tried to see which idea is better...

Tuesday, July 13, 2010

Translator – String Processing (Change)

The planned changes for operators and internal functions, where their operands (arguments) will no longer be kept on the done stack, conflict with the way strings are handled.

If an operator or internal function has string operands, all the operands are attached to the operator or internal function. This is necessary for string operands because the Translator has insufficient information to determine whether generic string tokens (with and without parentheses) are temporary or not. Temporary strings need different codes at run-time because temporary strings are deleted when no longer needed. The Encoder will determine whether these tokens refer to variables and arrays (not temporary) or functions (temporary).

As mentioned, all operands are attached, which is unnecessary because only the string operands are actually needed. Further, the Translator can determine if some strings are temporary, like the result of the concatenate operator and most of the internal functions that return a string. The sub-string functions only return a temporary string if their argument is a temporary string.

To resolve this conflict between saving the string operands and the new operand processing, if a string operand (or argument) cannot be determined to be a temporary string, it will be left on the done stack. When the operator or internal function is appended to the output, its specific code for the data type has already been determined. If the code has one or more string data type operands (not temporary string operands), the string operands will be counted to determine how many strings were left on the done stack. These strings will be popped and attached to the operator or internal function token.

Monday, July 12, 2010

Translator – Operand Processing (Change)

Previously, operands were processed (checking types and inserting conversion codes) after all of the operands were received. The operands are pushed to the done stack. Operators are temporarily pushed to the hold stack and are removed before a lower precedence operator is pushed to the hold stack (the method of rearranging operators according to their precedence).

Similarly, functions and arrays are not processed until the matching closing parentheses token is received, at which time all the arguments or sub-scripts are on the done stack. No type checking can be performed for non-internal functions and arrays, so their operands are saved with (attached to) the array or function token.

A change has already been implemented for multiple list assignments where each assignment item (variable or sub-string) is checked as received, at either comma or equal tokens. Something similar can also be performed for internal functions, in fact, some logic has already been added that determines if a comma or closing parentheses token is valid after each argument. Logic can be added to also check the data type of the argument, adding conversion operators when necessary.

It has also been determined that for a binary operator, its data type specific code is needed on the hold stack, based on the first operand of the operator (for possible error reporting). If necessary, a conversion code will be added at the same time. It is now not necessary to keep the first operand on the done stack, so it can be popped. When the operator is emptied from the hold stack, only the second operand needs to be checked.

It is possible that the operator code may need to be changed from the code found based on the first operand. Currently there is only one operator that has more than one code with the same first operand data type but different data types for the second operand (Power and PowerMul, discussed here). The multiple table flag can be used to identify these type of operators (this is the same flag used to identify internal functions with different codes for the number of arguments). When set, the Translator will look for the correct code instead of inserting a conversion code.

Sunday, July 11, 2010

Translator – Expressions Type (Implementation)

This whole expression type subject is getting rather involved and I am not convinced that all situations will be handled correctly. Consider one more example:
Z$ = A$ + B$ < C$ + D$ = E < F
The expression is a perfectly valid integer expression, however it is the wrong type for a string assignment. Previously, the error would have been “expected string expression” pointing at the = token (confusing). With the first operand implementation, the error will point to the A$ token (good), in other words, the beginning of the numeric expression. Conceivably a better error might be “expected string operator or end-of-statement” pointing at the < token. But going with “what is entered is what was intended”, the first operand implementation may be good enough.

The best course of action is to proceed with the first operand implementation and see what the results are.

One last action needs to be performed when operator tokens are received. In order to produce the correct error when a statement prematurely ends after an operator (when another operand is expected), the data type of a binary operator's second operand needs to be known. Therefore, before pushing an operator to the hold stack, the specific code of the operator will be determined based on the first operand, which will be on top of the done stack, and this code will be pushed to the hold stack. If an error occurs, the top of the hold stack can be checked to see what type of expression is expected to follow. (The find code routine with respect to operators, may need to be rewritten as a result of this change.)
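
A rough sketch of that specialization step, assuming illustrative code and type names:

    #include <stack>

    enum DataType { Double_DataType, Integer_DataType, String_DataType };
    enum Code { Add_Code, AddI1_Code, CatStr_Code };

    struct Token { Code code; DataType datatype; };

    // pick the operator code that matches the first operand's data type
    Code specialize_for_first_operand(Code code, DataType first_operand)
    {
        if (code == Add_Code) {
            switch (first_operand) {
            case Integer_DataType: return AddI1_Code;
            case String_DataType:  return CatStr_Code;
            default:               return Add_Code;
            }
        }
        return code;
    }

    // before pushing an operator, specialize it using the token on top of the
    // done stack so an error can later report the expected expression type
    void push_operator(std::stack<Token *> &hold_stack, Token *op, Token *done_top)
    {
        op->code = specialize_for_first_operand(op->code, done_top->datatype);
        hold_stack.push(op);
    }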

Translator – Prematurely Ended Expressions

A solution has been defined for the issue of identifying the correct location of a data type error for properly completed expressions, that is, expressions that end with an operand (when an operator is expected next). But there is still an issue when an expression ends prematurely – when an operand is expected after an operator. Consider these examples:
  1. Z$=A$+B$+
  2. Z%=A$+B$<
  3. Z$=A$+B$<
  4. Z%=A$+B$+
Each of these could produce either an “expected numeric expression” or an “expected string expression” error pointing to the end of the expression (the token or EOL after the last operator). But could better errors be output? For examples 1 and 2, an “expected string expression” error pointing to the end of the statement is appropriate because a string expression would make these statements correct.

The situation is a little murky with example 3, but it appears that the final expression should be a string, whereas the < operator takes string operands and produces an integer. A string expression is expected after the < token, but this would make the final expression the wrong type. It would appear that the correct error should be “expected string operator or end-of-statement” pointing to the < token.

The situation is very murky with example 4. The expression could be correctly finished with, say, “C$<D$”, so an “expected string expression” error could be appropriate. However, the final + could also be replaced with “<C$”, so an “expected numeric operator” error could be appropriate too (but without the words “or end-of-statement” since that would make the final expression the wrong type). This may be unclear since not just any numeric operator would work (it must be an operator that takes string operands and produces a numeric value, specifically the relational and equality operators – the error message could be reworded to indicate this).

It is best to assume that what has been entered up to the error is what was intended. So for example 4, the error should be “expected string expression” pointing to the end-of-line. However, for example 3, even with a string expression, the final expression would be the wrong type. While there are operators that take string operands and produce numeric results, there are currently no planned operators that take numeric operands and produce string results. This assumption could be used to define the proper error, but these types of operators could be added later. There is one such operator in FreeBASIC – the & concatenation operator can take any type of argument, including numeric operands, and produces a string result. So it will be assumed that these types of operators could exist.

Saturday, July 10, 2010

Translator – Expression Types and Unary Operators

Unary operators only have one operand, so they do not have a first operand. In fact, any errors reported for a unary operator sub-expression should point to the unary operator, not its operand (unless the operand has an error). Therefore, a first operand is only attached to a binary operator pushed to the done stack. The first operand token will be set to the default value of NULL for a unary operator.

Translator – Expression Types and Parentheses

Normally during the translation process, parentheses tokens are removed. However, for reporting errors with the data type of an expression, if an expression starts with an open parentheses and is the wrong data type, then the error needs to point to the opening parentheses, not to a token inside the parentheses.

When a closing parentheses is received and processed, after emptying the hold stack of all operators, the opening parentheses will be popped from the hold stack. Previously, both the open and closing parentheses tokens were deleted. If the parentheses were unnecessary, the parentheses sub-code was set in the last operator appended to the output. The last operator will also be on top of the done stack with its first operand. Consider this invalid statement:
Z$ = A$ + B$ + (C + D * E)
When the closing parentheses is processed, there will be a * token on top of the stack and its first operand will be set to the C token. The “expected string expression” error should point to the open parentheses. Therefore, the closing parentheses needs to change the first operand token to the open parentheses token. This also means that the open parentheses token can't be deleted.

Since the open parentheses token can't be deleted, it must be marked as a temporary token (using a Temporary sub-code). When an operator token is popped from the done stack (either by another operator or at the end of the expression when a command is processed), if it contains a temporary first operand token, the token must be deleted.

When an error occurs, or an expression is prematurely ended (also an error), the data type of the first operand will be checked to determine which error will be reported (based on what type of expression is expected). Open parentheses tokens do not have a data type. Therefore, when an operator's first operand is set to an open parentheses token, it must inherit the data type of the first operand that it is replacing.

Parentheses may also be nested, so another open parentheses token may replace a first operand that is already set to an open parentheses. No extra checking is necessary since the previous open parentheses will have a data type – the new open parentheses token will inherit the same data type.
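A rough sketch of this close parentheses handling follows (the structure and names – Temporary_SubCode, processCloseParen, firstOperand – are assumptions for illustration, not the actual IBCP code):

enum DataType { Double_DataType, Integer_DataType, String_DataType,
                None_DataType };
enum SubCode  { Paren_SubCode = 0x01, Temporary_SubCode = 0x02 };

struct Token {
    DataType datatype;
    int subcode;
    bool isOperator;
    Token *firstOperand;   // only set for binary operators on the done stack
};

// called after the hold stack has been emptied down to the open parentheses;
// doneTop is the last operator appended to the output (top of the done stack)
void processCloseParen(Token *openParen, Token *doneTop)
{
    if (doneTop->isOperator && doneTop->firstOperand != 0) {
        // the open parentheses has no data type of its own, so it inherits
        // the type of the first operand it is replacing (for nested
        // parentheses that operand is itself an open parentheses which
        // already inherited a type)
        openParen->datatype = doneTop->firstOperand->datatype;

        // mark it temporary so it is deleted when its operator is later
        // popped from the done stack
        openParen->subcode |= Temporary_SubCode;

        // point the operator's first operand at the '(' so a data type
        // error is reported at the open parentheses, not inside it
        doneTop->firstOperand = openParen;
    }
}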

Friday, July 9, 2010

Translator – Expression Type (Procedure)

To show how keeping the first operand will aid in reporting errors at the correct token, consider these examples again:
Z% = A$ + B$ + C$
Z% = A$ + B$ > C$
The first statement is processed in this sequence:
  1. The Z% token is appended to the output and pushed to the done stack.
  2. Since the mode is Command, the = token is interpreted as an assignment, so an AssignInt command is pushed to the command stack, the Z% token is popped from the done stack and the mode is set to EqualAssignment.
  3. The A$ token is appended to the output and pushed to the done stack.
  4. The first + is pushed to the hold stack, and being an operator, the mode is changed to Expression (further equal tokens will be interpreted as an equality operator).
  5. The B$ is appended to the output next and pushed to the done stack.
  6. When the second + is received, it empties the first + from the hold stack (being the same or higher precedence).
  7. The first + pops the A$ and B$ from the done stack, and a +$ is appended to the output. The first operand, A$, does not contain a first operand (it's not an operator), so the +$ is pushed to the done stack with A$ as its first operand.
  8. The second + is pushed to the hold stack.
  9. The C$ is appended to the output.
  10. The end of statement empties the second + from the hold stack.
  11. The second + pops the +$(A$) and the C$ from the done stack, and a +$ is appended to the output. The first operand, the +$(A$), has a first operand, A$, so the second +$ is pushed to the done stack with A$ as its first operand.
  12. The assign command handler will be called since there is an assignment on the command stack, which will pop the value being assigned, the second +$, and will see that it is the wrong data type (an integer is expected), so an “expected numeric expression” error is reported. But instead of pointing to the second +$, its first operand, the A$ token, is returned.
For the second example, the >$ would be on top of the done stack when the assign command handler is called. Its data type is integer, which is correct, so no error occurs. However, sub-expressions in parentheses need additional handling...

Thursday, July 8, 2010

Translator – Expression Type (New Design)

The new design for the rest of the data type error detection consists of remembering the token of the first operand of each sub-expression within an expression, so that an error can be reported against this token when a data type error is detected. Consider this invalid assignment statement:
Z% = A$ + B$ + C$
Currently this reports an “expected numeric value” at the second + operator. It should report “expected numeric expression” at the A$ token. However, the detection can't occur at the A$ token before the entire expression is processed. Consider this valid assignment statement:
Z% = A$ + B$ > C$
The expression becomes an integer at the > operator, therefore an “expected numeric expression” can't be reported at the A$ token.

Each binary operator token appended to the output is also pushed onto the done stack, replacing its operands. The token of the first operand of the operator will be attached to the operator when it is pushed to the done stack. If the first operand is another operator, then this operator's first operand is attached (in other words, the operator will inherit the first operand's first operand if there is one).
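A minimal sketch of this first operand inheritance (assumed names, not the actual IBCP code):

#include <stack>

struct Token {
    bool isOperator;
    Token *firstOperand;   // attached only to operators on the done stack
};

// operand1 is the first operand just popped from the done stack for this operator
void pushOperatorToDone(std::stack<Token *> &doneStack, Token *op,
                        Token *operand1)
{
    // attach the start of the sub-expression: either the first operand itself,
    // or, if it is an operator, the first operand it already inherited
    op->firstOperand = (operand1->isOperator && operand1->firstOperand != 0)
        ? operand1->firstOperand
        : operand1;
    op->isOperator = true;
    doneStack.push(op);
}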

Wednesday, July 7, 2010

Translator – Assignments (Development)

The last several days were spent implementing the new design for handling assignments. The assignment operators are no longer handled by the operator routines, but now by the comma and equal token handlers and the assign command handler (which previously didn't do anything). Two support functions were also implemented, one to put the appropriate assignment command on the command stack based on the first (perhaps only) item being assigned, and the other to check each assignment item for the correct data type (allowing for mixed strings and sub-strings).

At the end of the statement, the value (expression) being assigned is checked for the correct type in the assign command handler, adding a hidden conversion as needed for the numeric data types. Because the assignment operators are no longer handled as binary operators, the table entries for the assignment operators were modified so that each only has one operand (for the value being assigned).
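A rough sketch of that end-of-statement check (the names – CvtInt_Code, checkAssignValue – are assumptions for illustration):

#include <vector>

enum DataType { Double_DataType, Integer_DataType, String_DataType };
enum Code     { CvtInt_Code, CvtDbl_Code };
enum Status   { Good, ExpectedNumericExpr, ExpectedStringExpr };

Status checkAssignValue(std::vector<Code> &output, DataType assignType,
                        DataType exprType)
{
    if (assignType == String_DataType) {
        return exprType == String_DataType ? Good : ExpectedStringExpr;
    }
    // numeric assignment: a string expression is an error, a mismatched
    // numeric type gets a hidden conversion appended to the output
    if (exprType == String_DataType) {
        return ExpectedNumericExpr;
    }
    if (assignType == Integer_DataType && exprType == Double_DataType) {
        output.push_back(CvtInt_Code);
    } else if (assignType == Double_DataType && exprType == Integer_DataType) {
        output.push_back(CvtDbl_Code);
    }
    return Good;
}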

Also, tokens with parentheses being assigned can be assumed to be arrays since a function with arguments cannot be assigned (only the function name alone, without parentheses, can be assigned). Therefore, unlike tokens with parentheses in expressions that can be either an array or a function call, the values in parentheses of an array being assigned can be assumed to be subscripts, which must be integers (or doubles with conversion). If it turns out that the name is not an array when encoded, the Encoder will report the error.

Several other data type error reporting issues were also corrected, but without adding any special expression type handling as was planned in the failed design concept. A new concept was developed, which probably won't require the Translator to keep track of the expression type while translating. More details to follow...

Saturday, July 3, 2010

Translator – Expression Type (Failed Design)

The first idea on how to implement expression type checking into the Translator failed. It's not necessary to go into the details of the design, but basically the idea was to set the expression type at the start of the expression. For assignments, the expression type would be the same as the variable(s) being assigned. For PRINT statements, any expression type would be allowed, but would be set based on the first operand. For INPUT PROMPT, the expression type would be string. However, certain expressions would trip this up. First consider this statement:
Z = A$ + B$ + C$
Currently this would report an “expected numeric expression” at the second plus. It should report the error at the A$. The idea was that at the equal, the expression type would be set to numeric, and then upon seeing A$, which is a string, an error would be reported. However, consider this valid statement:
Z = A$ + B$ < C$
The Translator can't set the expression type to numeric and then report the error at the A$ because the expression eventually becomes numeric at the less than.

The bottom line is that the Translator can't determine if the expression is the correct type until the entire expression is translated, but it still needs to report the error at the first instance where the data type error occurs, not at the last operator processed. The goal now is to get the new assignment implementation working (needed first anyway), and then get the expression type handling implemented...

Friday, July 2, 2010

Translator – Assignments (New Implementation)

The new implementation of assignment statements consists of removing the existing list assignment processing code from the add operator routine and adding processing to the equal token handler, comma token handler and the assign command handler (which currently does nothing).

Comma: When the first comma token is received, the data type of the assignment is set to the first assignment item and an assign list code appropriate for the item's data type is pushed to the command stack instead of pushing the main assign list code to the hold stack. If there is a LET command already on the command stack, it will be replaced with the assign list code.

For each additional comma token, the data type of the assignment item will be checked to make sure it matches the current data type. If the data type is a string or sub-string and the new item's data type is also a string or sub-string, but not the same, the assign list code will be changed to the AssignListMixStr code. When an equal token is received in the comma assignment, the mode is set to expression after the last list assignment item is checked and the expression type is changed to Numeric or String.

Equal: When an equal token is received first, the data type of the assignment is set to the assignment item and an assign code appropriate for the item's data type is pushed to the command stack instead of the hold stack. If there is a LET command already on the command stack, it will be replaced with the assign code. The expression type is changed to Numeric or String in case the expression starts after the equal.

If a second equal token is received, the assign token on top of the command stack is changed to the appropriate assign list code. For each additional equal token, the data type of the assignment item will be checked to make sure it matches the current data type. The same check for strings made for each additional comma token is also made for each additional equal.
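An illustrative sketch of the equal token handling described above (the names – CommandItem, handleEqualToken – are assumptions made for the example, and the data type matching is only noted in comments):

#include <stack>

enum DataType { Double_DataType, Integer_DataType, String_DataType };
enum Code     { Let_Code, Assign_Code, AssignList_Code };

struct CommandItem {
    Code code;
    DataType datatype;   // data type of the assignment
};

void handleEqualToken(std::stack<CommandItem> &cmdStack, DataType itemType)
{
    if (cmdStack.empty() || cmdStack.top().code == Let_Code) {
        // first equal: push the assign code for the item's data type,
        // replacing any LET command already on the command stack
        if (!cmdStack.empty()) {
            cmdStack.pop();
        }
        CommandItem item = { Assign_Code, itemType };
        cmdStack.push(item);
    } else if (cmdStack.top().code == Assign_Code) {
        // second equal: multiple assignment, switch to the assign list code;
        // each additional item's data type must match the one on the stack
        // (mixed string/sub-string switches to a MixStr variant instead)
        cmdStack.top().code = AssignList_Code;
    }
}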

I realized that it is not necessary to save string variables in assignment and list assignment statements – it is already known these operands will not be temporary strings. Only a string value needs to be saved for string assignments as it may be a temporary string. Also, the Translator can identify some strings as temporaries (the result of the concatenate operator and all but the sub-string internal functions that return a string). Modifying the Translator for these temporary strings will be done after the INPUT command is implemented.

Thursday, July 1, 2010

Translator – Assignments (Change)

The way assignment and list assignment statements are processed needs to be changed to work with the expression type handling. Currently the assignment tokens are processed as operators, pushed to the hold stack when received and processed when emptied by a lower or same precedence token being received (an EOL token). The processing is performed by the find code routine, activated by seeing the reference table flag set, with the additional list assignment processing in the add operator routine after find code returns.

A better method is to process assignments and list assignments as the statement is processed. This means as each comma and equal tokens are received, the data types of items being assigned are checked immediately. The first item received (a token with no parentheses, a token with parentheses or a sub-string function) determines the type of the assignment. For each additional item received in the assignment list, the data type of the item must match exactly, except for the string data type where strings and sub-strings may be mixed in the same list assignment statement.

Once the expression starts, the expression type is set to either Numeric (for the double and integer data type since both can accept either for an assignment value) or String. At the end of the statement, the data type of the expression will be checked, a hidden conversion operator will be added if necessary, followed by the assign or assign list token.

This new method is simpler than the current method, which contained involved code for detecting errors in the list assignment – the first error in the statement needed to be reported. Complicating matters was the fact that the assignment list items are processed backwards because the items are popped from the done stack in the reverse order from how they were received.

Wednesday, June 30, 2010

Translation – Expression Type

The Translator needs to keep track of the expression type as an expression is translated. The expression type is similar to the data type, but not the same. There are two types of expressions, Numeric (the double and integer data types are interchangeable in expressions) and String (including the sub-string data type). The issue was discovered when working on the INPUT PROMPT string expression, but the problem applies to all expressions. Consider these examples that will currently report an error at an inappropriate token:
Z = A$ + B$ + C$
            ^-- expected double
Z$ = A + B * C
           ^-- expected string
These errors will be confusing. The errors in both cases should be pointing to the A$ or A variables and should report “expected numeric expression” and “expected string expression” errors. Using just the word double in the message might imply that an integer expression is unacceptable, which is not the case.

The expression type may need to be set to a generic Any expression type, like would be needed for the PRINT command's expressions or the arguments of non-internal functions. For internal functions, the expression type can be set to the correct value for each argument (using the Numeric expression type if the argument is a double or an integer).

For arrays, the subscripts need to be integers (Numeric), but remember that the Translator does not know if a token with parentheses is an array or a user function. So the expression type will have to be set to the Any expression type and the Encoder will have to do the checking once it is known whether the token is an array or user function.
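A small sketch of how these expression types might be represented (the names are assumptions made for illustration):

enum DataType { Double_DataType, Integer_DataType, String_DataType,
                SubStr_DataType };

// two real expression types plus Any for contexts like PRINT expressions,
// non-internal function arguments and array/user-function parentheses
enum ExprType { Numeric_ExprType, String_ExprType, Any_ExprType };

ExprType exprTypeOf(DataType datatype)
{
    switch (datatype) {
    case Double_DataType:
    case Integer_DataType:
        return Numeric_ExprType;      // double and integer are interchangeable
    default:
        return String_ExprType;       // string and sub-string
    }
}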

Tuesday, June 29, 2010

Translation – INPUT and Error Reporting

Upon designing the actions required by the add input codes routine, I realized that it may need to point to a different token when an error is detected. Consider this example:
INPUT PROMPT A;B
The error will be “expected string expression” and should be pointing to the A, but the current token being processed is the semicolon token. Therefore, instead of passing the token code of the token being processed, a reference to the token pointer will be passed so that it can be changed to point to the actual token with the error.
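A minimal sketch of this idea (the routine name, signature, and simplified fields are assumptions, not the actual IBCP declaration):

enum Status { Good_Status, ExpectedStrExpr_Status };

struct Token {
    int column;        // where errors are reported
    bool isString;     // simplified stand-in for the token's data type
};

// token arrives pointing at the semicolon being processed; on an error it is
// changed to point at the offending token (e.g. the A in "INPUT PROMPT A;B")
// so the caller reports "expected string expression" at the right place
Status addInputCodes(Token *&token, Token *promptExprToken)
{
    if (!promptExprToken->isString) {
        token = promptExprToken;
        return ExpectedStrExpr_Status;
    }
    return Good_Status;
}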

For the INPUT command handler (called at end of statement tokens), a Colon code was going to be passed regardless of the actual end of statement token, but now a reference to a token pointer will be passed. This will be whatever end of statement token was passed to the INPUT command handler, which will be passed on to the add input codes routine.

This led to another problem; consider this example:
INPUT PROMPT A*B+C;D
The + operator will be on top of the done stack after the expression is processed, so the error will be pointing to + since it will have the double data type. The error should be pointing to the A indicating that a string expression was expected. This problem applies to other statements as well, so some sort of expression type detection is needed...

Monday, June 28, 2010

Translator – INPUT Command (Design)

As entries were being written for INPUT with Semicolons, INPUT with Commas, and INPUT Command Handler, there was a lot of copying and pasting of the same text for what they will need to do for translation. In considering how to reduce the amount of words, it became apparent that since the code is the same or very similar, the common code should be put into a single routine.

The plan was already to have an add input code routine like the add print code routine.  But since there is more common code for the INPUT and INPUT PROMPT commands, the functionality of this add input codes (plural since multiple codes are involved) routine will be expanded.  This routine will need to know the command being translated (INPUT or INPUT PROMPT) along with the current command flags and will need to know which token is being processed (Semicolon, Comma, or an end of statement token).

The first argument then will be the command item from the command stack, which contains the current command code, command flags and the command's token. From the Semicolon and Comma token handlers, this will be the top command item on the command stack.  From the INPUT command  handler, the command item has already been popped from the command stack and is passed as an argument. This is why the add input codes routine can't just use the top of the command stack.

The second argument will just need to be the code for the token being processed.  The Semicolon and Comma token handlers just need to pass their own code.  The INPUT command handler will pass the Colon code, which will be interpreted as the end of statement.  (Noting that the INPUT command handler could be called from any end of statement token like EOL, Colon, ELSE, and ENDIF; so using the Colon code is appropriate.)
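A sketch of what this routine's interface might look like, based on the description above (the identifiers are assumptions, not the actual IBCP declarations):

enum Code { Input_Code, InputPrompt_Code, Semicolon_Code, Comma_Code,
            Colon_Code };
enum Status { Good_Status /* , ... */ };

struct Token;            // defined elsewhere in the Translator

struct CmdItem {
    Code code;           // Input_Code or InputPrompt_Code
    int flag;            // current command flags
    Token *token;        // the command's token
};

// cmdItem: from the Semicolon and Comma token handlers this is the top item
//          on the command stack; from the INPUT command handler it has
//          already been popped and is passed in (which is why the routine
//          can't simply look at the top of the command stack itself)
// code:    Semicolon_Code, Comma_Code, or Colon_Code for end of statement
Status addInputCodes(CmdItem *cmdItem, Code code)
{
    bool endOfStatement = (code == Colon_Code);
    // ... append the InputBegin / input assign codes as described above ...
    (void)cmdItem;
    (void)endOfStatement;
    return Good_Status;
}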

Sunday, June 27, 2010

Translator – New Token Modes

Translating INPUT statements will require additional token modes. The current token modes are:
Command – Translator is expecting a command token (or start of an assignment)
Assignment – Translator is expecting an item for an assignment statement
EqualAssignment – An equal token was received when the mode was Command or Assignment; another equal token would indicate a multiple assignment statement (commas are not permitted)
CommaAssignment – A comma token was received when the mode was Command or Assignment; another comma token would indicate continuation of a multiple assignment statement (an equal token would indicate the end of the list and the beginning of the expression)
Expression – Translator is expecting operands of operators depending on the current state
When a semicolon appears at the end of an INPUT statement, no further tokens should be received except for an end of statement token (EOL, colon, ELSE, and ENDIF). A new mode is required so the Translator can make sure no additional non end of statement tokens are received:
EOS – Translator is expecting an end of statement token only
An INPUT statement contains variable(s) that are to be input. Expressions are not allowed (except for the string expression after the PROMPT keyword, or within subscripts of array variables). The INPUT translation could be implemented to check if the token on top of the done stack has the reference flag set, and if not, report an “expected variable” error. But that could lead to strange errors being reported; consider this example (with the translation of the expression):
INPUT A*B+C  A B * C +
The + will be on top of the done stack after this expression is translated (being the result of the translated expression). The + token will not have the reference flag set since it is an operator. The INPUT is expecting  a reference, so it would report “expected variable” pointing to the + token. This would be very confusing – why would a variable be expected at the +? The correct error should be “expecting comma or end of statement” pointing to the * token. A new mode is required so the Translator can make sure no operators (except for end of expression operators comma, semicolon and EOL) are received:
Reference – Translator is only expecting reference tokens and end of expression operators
Reference tokens include tokens without parentheses and tokens with parentheses. However, these types of tokens could be variables, arrays or functions, and the Translator is not able to determine which. Therefore, the Encoder could still find errors if a user function was placed in an INPUT statement. Lastly, sub-string functions (while valid in assignment statements) are not valid in INPUT statements, therefore the Reference mode needs to check for these and report an error.
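The full set of token modes might then look something like the following (an illustrative enum; the actual names in the IBCP source may differ):

enum TokenMode {
    Command_TokenMode,           // expecting a command (or start of an assignment)
    Assignment_TokenMode,        // expecting an item for an assignment statement
    EqualAssignment_TokenMode,   // equal received; another equal = multiple assignment
    CommaAssignment_TokenMode,   // comma received; list assignment in progress
    Expression_TokenMode,        // expecting operands/operators of an expression
    EOS_TokenMode,               // only an end of statement token is valid
    Reference_TokenMode          // only reference tokens and the end of expression
                                 // operators (comma, semicolon, EOL) are valid
};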

Translator – INPUT command

Similar to the translation of the PRINT statement, a lot of the translation work of the INPUT statement will take place in the semicolon and comma token handlers, with the INPUT and INPUT PROMPT command handlers being called at the end of the statement. The codes have been renamed for consistency and clarity: the InputGet code will now be InputBegin, and InputPromptStr and InputPromptTmp will now be InputBeginStr and InputBeginTmp (the Tmp versions are not handled by the Translator).

There will be two command flags to keep track of the INPUT statement translation. The first is the InputBegin command flag, which indicates whether an InputBegin has been appended to the output yet. The semicolon and comma token handlers will use this flag to determine if the InputBegin code has been appended (INPUT) and whether a prompt string expression result is expected (INPUT PROMPT).

The semicolon will also use the InputBegin command flag to determine if it is at the end of the statement, which determines whether to set the second command flag, InputKeep. The InputKeep flag will be used by the command handlers at the end of the statement to determine whether the InputKeep sub-code flag should be set in the Input and InputPrompt codes at the end of the translated statement.
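A rough sketch of how these two flags might be used by the semicolon token handler (the flag names and values are assumptions made for illustration):

enum {
    InputBegin_CmdFlag = 0x01,   // an InputBegin code has been appended already
    InputKeep_CmdFlag  = 0x02    // trailing semicolon: set Keep sub-code at end
};

struct CmdItem {
    int code;    // Input or InputPrompt command code
    int flag;    // command flags for this statement
};

void handleInputSemicolon(CmdItem &cmd)
{
    if (cmd.flag & InputBegin_CmdFlag) {
        // InputBegin was already appended, so this semicolon must be at the
        // end of the statement: the command handler will set the InputKeep
        // sub-code in the final Input/InputPrompt code
        cmd.flag |= InputKeep_CmdFlag;
    } else {
        // first semicolon (INPUT PROMPT): the prompt string expression result
        // has just been completed; append the InputBeginStr code here
        cmd.flag |= InputBegin_CmdFlag;
    }
}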

Saturday, June 26, 2010

Translator – Error Handling (Preliminary Release)

I decided to try using CVS branching for development of new releases. This way, working code (but not ready for release) could be committed to the CVS repository and development can continue. Differences can then be easily checked. Previously, good files were copied to a different file name, which is fine for one file, but when a whole set of files is involved, it becomes a pain. This accounts for the strange CVS revision numbers in the source files, but these should clear up once an official release is made.

Since the known error reporting issues have been resolved, this is a good time to make a preliminary release before the changes to implement the INPUT command begin. There are now many test inputs for testing errors (there should probably be more). The file ibcp_0.1.14-dev-1-src.zip has been uploaded at Sourceforge IBCP Project along with the binary for the program. Now implementation of the INPUT command can begin...