Saturday, March 27, 2010

Translator – Commas

The commas that separate subscripts or arguments will be processed as operators. The comma operator will have the same precedence as the closing parenthesis, so it will empty the hold stack up to, but not including, the array or function token. The comma is a token requiring special processing: it is simply counted, and the comma token is not pushed onto the hold stack.

This will require a number-of-operands counter, initialized to 0. When a token with a parenthesis (an array or function token) is pushed onto the hold stack, the counter is set to 1, indicating that there is an array or function token on the stack. Each comma operator processed increments the counter. If no comma tokens are seen, the counter remains set to 1. If the counter is 0 when a comma token is seen, then an “unexpected comma” error will be reported.

So now, upon reaching the closing parenthesis, if the counter is 0, then an open parenthesis token is expected to be on top of the hold stack, as previously implemented. If the counter is 1 or more, then the counter contains the number of operands added to the output list, and an array or function token is expected to be on top of the hold stack. The array or function token is then popped off of the hold stack and added to the output list.
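The comma and closing parenthesis handling described above can be sketched in Python. This is only an illustration under several assumptions: tokens arrive as already-parsed strings, any token ending in "(" other than a plain "(" is an array or function token, and only four binary operators exist. The names `translate`, `PRECEDENCE` and `counters` are hypothetical. The sketch also keeps one counter per open parenthesis instead of a single variable, an assumption made here so that nested calls can be counted.

```python
# Illustrative precedence table (assumption: binary operators only).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def translate(tokens):
    """Translate parsed infix tokens to RPN, counting operands per call."""
    output = []    # RPN output list
    hold = []      # hold stack of operators and parenthesis tokens
    counters = []  # one operand counter per open parenthesis on the hold stack
    counts = []    # (array/function token, operand count) in output order
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of higher or equal precedence first.
            while (hold and hold[-1] in PRECEDENCE
                   and PRECEDENCE[hold[-1]] >= PRECEDENCE[tok]):
                output.append(hold.pop())
            hold.append(tok)
        elif tok.endswith("("):
            hold.append(tok)
            # 0 for a plain parenthesis, 1 for an array or function token.
            counters.append(0 if tok == "(" else 1)
        elif tok in (",", ")"):
            # Empty the hold stack up to, but not including, the opening token.
            while hold and not hold[-1].endswith("("):
                output.append(hold.pop())
            if tok == ",":
                if not counters or counters[-1] == 0:
                    raise SyntaxError("unexpected comma")
                counters[-1] += 1  # the comma is counted, never pushed
            else:
                if not hold:
                    raise SyntaxError("missing open parentheses")
                opener = hold.pop()
                count = counters.pop()
                if count > 0:
                    # Array or function token goes to the output list.
                    output.append(opener)
                    counts.append((opener, count))
        else:
            output.append(tok)  # plain operand
    while hold:
        top = hold.pop()
        if top.endswith("("):
            raise SyntaxError("missing closing parentheses")
        output.append(top)
    return output, counts


rpn, counts = translate("A( X + 5 , Y * 2 , Z )".split())
print(" ".join(rpn), counts)  # prints: X 5 + Y 2 * Z A( [('A(', 3)]
```

A comma inside a plain parenthesis, or outside any parenthesis, finds a counter of 0 (or no counter at all) and raises the "unexpected comma" error.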

Translator – Arrays and Functions

As mentioned, the Translator will not know the number of subscripts that arrays should have. It will also not know (except for internal functions) the number of arguments that functions should have. Worse, the Translator will not even know if an identifier with a parenthesis refers to an array or a function. (The Translator will be able to identify a defined function, since the Parser has already identified these by the “FN” and given them their own token type.)

As it turns out, translation-wise, arrays and functions can be handled identically: one has subscripts and the other has arguments. These can also be thought of as operands, just as for operators, except that where operators have one (unary) or two (binary) operands, arrays and functions can have one or more operands. Functions with no arguments (including the IntFuncN and DefFuncN token types) are already being handled as simple operands, since the Translator didn't need to distinguish between variables and functions with no arguments.

To translate arrays and functions, a token that has a parenthesis will be pushed onto the hold stack. The precedence of these tokens will be the same as the open parenthesis (between Null and the closing parenthesis, for the same reason) to keep them on the hold stack. Upon reaching the matching closing parenthesis token, the number of operands will be counted (and validated for an internal function). The array or function token will then be popped off of the hold stack and added to the output list. But exactly how will the operands be counted...

Translator – Functions

Now that the translation of arrays has been defined, it's time to define the translation of functions. Functions (whether Internal, one-line Defined or User) have arguments separated by commas. Take the example MID$(A$+B$,INT(A+0.5),I+5) containing two internal function calls where the tokens will be parsed as:
MID$( A$ + B$ , INT( A + 0.5 ) , I + 5 )
Again there are tokens for commas and closing parentheses. This expression will be translated to RPN as:
A$ B$ + A 0.5 + INT( I 5 + MID$(
The functions are found after their associated arguments. Functions will pop their already evaluated arguments off of the run-time stack (which will be in reverse order), perform their operations, and then push their result back onto the run-time stack for the next part of the expression.
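A small Python sketch can make the run-time behavior concrete. It is not the actual run-time design: the `FUNCTIONS` table, the `evaluate` and `mid` names, and the variable values are all assumptions chosen for illustration, and only the + operator is supported. The RPN list passed in spells out every operator, including the + that adds 0.5 to A before INT( is applied.

```python
import math


def mid(s, start, length):
    """BASIC-style MID$ with a 1-based start position."""
    return s[start - 1:start - 1 + length]


# Hypothetical table: function token -> (argument count, implementation).
FUNCTIONS = {
    "INT(": (1, math.floor),
    "MID$(": (3, mid),
}


def evaluate(rpn, variables):
    """Evaluate an RPN token list using a run-time stack."""
    stack = []
    for tok in rpn:
        if tok in FUNCTIONS:
            nargs, func = FUNCTIONS[tok]
            # Arguments come off the stack in reverse order (last one first).
            args = [stack.pop() for _ in range(nargs)]
            args.reverse()
            stack.append(func(*args))  # push the result for the next step
        elif tok == "+":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)  # numeric add or string concatenation
        elif tok in variables:
            stack.append(variables[tok])
        else:
            # Numeric constant (assumption: anything that is not a variable).
            stack.append(float(tok) if "." in tok else int(tok))
    return stack.pop()


rpn = "A$ B$ + A 0.5 + INT( I 5 + MID$(".split()
values = {"A$": "HELLO", "B$": "WORLD", "A": 2.6, "I": -2}
print(evaluate(rpn, values))  # prints: LLO
```

With these assumed values, INT( sees 3.1 and pushes 3, I+5 also yields 3, and MID$( pops the three arguments "HELLOWORLD", 3 and 3.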

Though this example contains only internal functions, one-line defined functions and user functions work the same way. The big difference is that the Translator (using the Table) will know the number of arguments that internal functions have and can perform error checking on the number of arguments. However, this can't be done for one-line defined (FN) functions and user functions. This job will be left for the Encoder (which will access the Dictionary for this information).
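The split of responsibilities could look something like the following Python sketch. The `INTERNAL_ARG_COUNTS` dictionary is a hypothetical stand-in for the Table, `check_operand_count` is an invented name, and giving each internal function a single fixed argument count is a simplification assumed here.

```python
# Hypothetical stand-in for the Table: internal function token -> argument count.
INTERNAL_ARG_COUNTS = {"INT(": 1, "MID$(": 3}


def check_operand_count(token, count):
    """Validate internal functions now; defer everything else to the Encoder."""
    expected = INTERNAL_ARG_COUNTS.get(token)
    if expected is None:
        # Array, one-line defined (FN) function or user function:
        # the Translator cannot know the expected count.
        return "deferred"
    if count != expected:
        raise SyntaxError(
            f"{token} expects {expected} argument(s), got {count}")
    return "checked"


print(check_operand_count("MID$(", 3))  # prints: checked
print(check_operand_count("FNA(", 2))   # prints: deferred
```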

Translator – Arrays

Arrays have subscripts as in A(I) and B(3,X). For now there will be no limitation on the number of subscripts that an array can have (there will be a limitation once the internal language is defined, but this limitation will be sufficiently large). Take the example A(X+5,Y*2,Z) for a three-dimensional array element where the tokens will be parsed as:
A( X + 5 , Y * 2 , Z )
This includes tokens for the commas separating the subscripts and the closing parenthesis. Also note that in this example, the subscripts can be expressions. This array element expression will be translated to RPN as:
X 5 + Y 2 * Z A(
Note that the subscript expressions are first with the array at the end. When this array element is processed at run-time, the expressions are processed as discussed before. The run-time stack will contain the values X+5, Y*2 and Z (Z on top of the stack) when execution reaches the array name.

The instruction code for the array will contain the number of subscripts, so when the array element offset is evaluated, it will know that there are 3 subscripts on the stack (in reverse order) that need to be popped. Once the offset is calculated, the value of the array element will be obtained from memory and pushed back onto the stack. This value will then be the operand for the rest of the expression that the array element is in.
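As a rough illustration of this step, the Python sketch below pops the subscripts and computes an offset. The storage layout is not defined in this post, so zero-based subscripts, row-major order, the dimension sizes, and the names `element_offset` and `push_element` are all assumptions.

```python
def element_offset(stack, dims):
    """Pop len(dims) subscripts off the run-time stack and return the offset."""
    # Subscripts come off in reverse order (last subscript first).
    subscripts = [stack.pop() for _ in dims]
    subscripts.reverse()
    offset = 0
    for sub, size in zip(subscripts, dims):
        if not 0 <= sub < size:
            raise IndexError("subscript out of range")
        offset = offset * size + sub  # row-major accumulation
    return offset


def push_element(stack, memory, dims):
    """Replace the subscripts on the stack with the array element's value."""
    offset = element_offset(stack, dims)
    stack.append(memory[offset])


# Values of X+5, Y*2 and Z already on the stack (Z on top).
stack = [7, 4, 1]
memory = list(range(1000))  # stand-in for a 10 x 20 x 5 array's storage
push_element(stack, memory, dims=(10, 20, 5))
print(stack)  # prints: [721]
```

Here the offset is (7 * 20 + 4) * 5 + 1 = 721, and the single value left on the stack is the operand for the rest of the expression.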

One problem with arrays during translation is that the Translator will not know the actual number of subscripts that a particular array will have. Therefore, the subscript checking will be left for the Encoder, which will use the Dictionary to obtain this information. Remember that the purpose of the Translator is to rearrange the incoming program code into RPN format, doing as much error checking as possible.