Saturday, May 15, 2010

Translator – Temporary Strings (Release)

Before testing the changes, the test code was modified to output the operands that were saved by the Translator within square brackets separated by commas. This change was made to see if operands were being saved correctly for the correct tokens (arrays, user functions, operators with string operands and internal functions with string operands). Only the primary operand token is output, so if the operand is an operator, that is what is output.
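As a rough illustration of this output format, here is a minimal sketch. The Item structure and the formatItem function are hypothetical names made up for this example (they are not the actual test code), and tokens are reduced to plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified output list item: token text plus saved operands.
struct Item {
    std::string token;
    std::vector<const Item *> operand;
};

// Returns the token followed by its saved operands in square brackets,
// separated by commas. Only each operand's own (primary) token is shown,
// so an operand that is an expression appears as its operator.
std::string formatItem(const Item &item)
{
    std::string out = item.token;
    if (!item.operand.empty()) {
        out += '[';
        for (std::size_t i = 0; i < item.operand.size(); ++i) {
            assert(item.operand[i] != NULL);
            if (i > 0) {
                out += ',';
            }
            out += item.operand[i]->token;
        }
        out += ']';
    }
    return out;
}
```

Under these assumptions, an item for LEFT$( with the operands A$ and B+C would be formatted as the token text followed by [A$,+], and an item with no saved operands is output as just its token.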

The code appears to be working, at least for all the current test inputs, with one exception (no new test inputs were added at this time). If a conversion operator was inserted for an operand, the operand is not pointing to the conversion operator. For example, for the statement Z$=MID$(A$,B+C,D), the MID$ token is output as MID3$([A$,+,D], but should be output as MID3$([A$,CvtInt,CvtInt] since conversion operators were inserted for the + and D operands.

Other than this problem, the code appears to be working. Since more work is needed to complete string data type handling in the Translator and not much was changed (temporary strings, which had a minor effect on the Translator; and operands are saved for later processing as needed for the Encoder), the code is being released only as a developmental release and ibcp_0.1.11-dev-1-src.zip has been uploaded at Sourceforge IBCP Project along with the binary for the program.

The rest of the string data type handling has to do with the sub-string functions (LEFT$, MID$ and RIGHT$), which will handle strings slightly differently during run-time...

Translator – Temporary Strings (Implementation)

Upon making the changes to add temporary strings, I realized that there are more string operators than just the CatStr operator – there are the equality, relational, assign, and assign list operators, all of which can have string operands (either reference or temporary). For the equality and relational operators, the character arrays of any temporary strings need to be deleted after the comparison is made.

For the assign operator, if the operand is a reference string, then a new character array needs to be allocated and the reference string copied. However, for a temporary string, the string variable's character array can be set to the temporary character array (making it a reference string) and the current character array is deleted, eliminating the need to allocate and copy. For assigning a list, the temporary string can only be used for one of the string variables in the list; the rest need new arrays allocated.
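The two assignment cases can be sketched roughly as follows. This is only an illustration under assumptions: the RtString structure and the makeString, assignRef and assignTmp functions are hypothetical names invented here, since the run-time string representation has not been written yet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical run-time string: a length plus a heap-allocated character array.
struct RtString {
    char *data;
    std::size_t len;
};

// Helper for the example: builds a string with its own character array.
RtString makeString(const char *text)
{
    RtString s;
    s.len = std::strlen(text);
    s.data = new char[s.len + 1];
    std::memcpy(s.data, text, s.len + 1);
    return s;
}

// Assign from a reference string: the source keeps its array,
// so a new array is allocated and the characters are copied.
void assignRef(RtString &var, const RtString &src)
{
    if (&var == &src) {
        return;  // assigning a variable to itself changes nothing
    }
    char *copy = new char[src.len + 1];
    std::memcpy(copy, src.data, src.len + 1);
    delete[] var.data;
    var.data = copy;
    var.len = src.len;
}

// Assign from a temporary string: the variable takes over the temporary's
// array and its current array is deleted, so nothing is allocated or copied.
void assignTmp(RtString &var, RtString &tmp)
{
    assert(var.data != tmp.data);
    delete[] var.data;
    var.data = tmp.data;
    var.len = tmp.len;
    tmp.data = NULL;  // the temporary no longer owns the array
    tmp.len = 0;
}
```

For an assign list, a sketch along these lines would call assignTmp for one variable only and fall back to allocating and copying for the others, since one array cannot be owned by two variables.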

There is already a String flag used for immediate commands, and since its value doesn't conflict with any of the existing flags (immediate command flags are separate), it can also be used for operators and internal function codes that have string operands. A note was added to the code to make sure its value is not used for another flag value.

A constructor was created for the new RpnItem structure that takes a token pointer, number of operands and a pointer to an operand array as arguments (the latter two default to 0 and NULL). If the number of operands and operand array pointer are supplied, an operand array will be allocated for a non-zero number of operands and the operand pointers will be copied from the supplied array. A destructor was also created to delete the token and the operand array.
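A minimal sketch of what such a structure could look like is shown here. The member names and the stand-in Token type are assumptions for this example and are not copied from the released source. The sketch also assumes that the operand items themselves belong to the output list, so the destructor deletes only the pointer array and not the items it points to.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the project's token type.
struct Token {
    const char *name;
    explicit Token(const char *n) : name(n) {}
};

struct RpnItem {
    Token *token;       // token of this output list item (owned by the item)
    int nOperands;      // number of saved operands (0 if not applicable)
    RpnItem **operand;  // array of operand item pointers (only the array is owned)

    // The number of operands and the operand array default to 0 and NULL.
    RpnItem(Token *tok, int n = 0, RpnItem **ops = NULL)
        : token(tok), nOperands(n), operand(NULL)
    {
        if (n > 0) {
            assert(ops != NULL);
            operand = new RpnItem *[n];  // allocate only for a non-zero count
            for (int i = 0; i < n; i++) {
                operand[i] = ops[i];     // copy the pointers, not the items
            }
        }
    }

    // Deletes the token and the operand array (deleting NULL is harmless).
    ~RpnItem()
    {
        delete token;
        delete[] operand;
    }

private:
    // Copying is disabled in this sketch to avoid deleting the token twice.
    RpnItem(const RpnItem &);
    RpnItem &operator=(const RpnItem &);
};
```

Because the constructor copies the supplied pointers into its own array, the caller's array can be a local variable that goes away once the item has been created.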

Translator – Temporary Strings

The impact of temporary strings on the Translator is not much since the work of finding associated codes will be left for the Encoder. However, there are a few changes that are required – the main one is passing the operands to the Encoder so that it has easy access to them.

A new TmpStr data type will be added. The data type of the string operator (CatStr) and the internal functions that return a string will be changed to TmpStr.  This includes the internal functions CHR$, REPEAT$, SPACE$ and STR$. Something else is needed for the functions LEFT$, MID$ and RIGHT$, which will be revealed shortly. Any string DefFunP and DefFunN token types will also be changed to TmpStr since defined (one-line) string functions will return a temporary string during run-time.

The conversion code table used by the match code function needs to be expanded to include the new TmpStr data type. As far as the Translator is concerned, the String and TmpStr data types are the same, therefore the conversion code table entries for String to TmpStr and TmpStr to String will be set to the Null code.
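The expanded table might look something like the sketch below. The enumerator names, the Invalid code for combinations that cannot be converted, and the have/need indexing order are all assumptions made for this example. Only the Null entries between String and TmpStr come from the description above.

```cpp
#include <cassert>

// Hypothetical enumerations for this example.
enum DataType {
    Double_DataType, Integer_DataType, String_DataType, TmpStr_DataType,
    numberof_DataType
};

enum Code { Null_Code, CvtInt_Code, CvtDbl_Code, Invalid_Code };

// Conversion code needed to turn the data type an operand has (row)
// into the data type that is needed (column).
static const Code cvtCodeTable[numberof_DataType][numberof_DataType] = {
    //               need: Double       Integer       String        TmpStr
    /* have Double  */ { Null_Code,    CvtInt_Code,  Invalid_Code, Invalid_Code },
    /* have Integer */ { CvtDbl_Code,  Null_Code,    Invalid_Code, Invalid_Code },
    /* have String  */ { Invalid_Code, Invalid_Code, Null_Code,    Null_Code },
    /* have TmpStr  */ { Invalid_Code, Invalid_Code, Null_Code,    Null_Code }
};

// Null_Code means no conversion is needed; String and TmpStr are
// interchangeable as far as the Translator is concerned.
Code conversionCode(DataType have, DataType need)
{
    assert(have < numberof_DataType && need < numberof_DataType);
    return cvtCodeTable[have][need];
}
```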

For the string operator and internal string functions that have string operands, the operands need to be attached to the operator/function token within the output list. To accomplish this, the output list will be changed from a list of token pointers to a list of pointers to a new RpnItem structure that will contain the token pointer, the number of operands (0 if not applicable) and a pointer to an array of output list item (RpnItem) pointers.

There will be a new String flag added for these codes so that the Translator can easily identify them. When this flag is set, the number of operands will be set and the output list pointer array will be allocated. This array will be filled from the operands popped from the done stack. The done stack will also be changed to a stack of RpnItem structure pointers.
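The popping and saving of operands could be sketched as follows. To keep the example short, the item here is a simplified stand-in that holds its operands in a std::vector rather than an allocated array, the done stack is also a std::vector, and the popOperands function and the String_Flag value are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for an output list item (token text only).
struct RpnItem {
    std::string token;
    std::vector<RpnItem *> operand;
    explicit RpnItem(const std::string &t) : token(t) {}
};

const int String_Flag = 0x01;  // hypothetical flag value

// Pops the n operands of an operator or function from the done stack.
// The operands are saved in the item only if its code has the String flag.
void popOperands(RpnItem *item, int flags, int n,
                 std::vector<RpnItem *> &doneStack)
{
    assert(n >= 0 && doneStack.size() >= static_cast<std::size_t>(n));
    const bool save = (flags & String_Flag) != 0;
    if (save) {
        item->operand.resize(n);
    }
    // The last operand is on top of the stack, so the array is filled
    // from the end to keep the operands in left-to-right order.
    for (int i = n - 1; i >= 0; --i) {
        if (save) {
            item->operand[i] = doneStack.back();
        }
        doneStack.pop_back();
    }
}
```

Filling the array from the end is a design choice of this sketch; it means operand 0 is always the left-most operand, which is the order the test output uses.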

This saving of operands functionality is also needed for parentheses tokens, which can be arrays or user functions. The processing of array subscripts and user function arguments also needs to be delayed until the Encoder since the Translator doesn't know the difference between an array (whose subscripts all need to be integers) and a user function (whose argument types will be contained in the dictionary and defined in a function definition statement).