Friday, July 5, 2013

New Translator – Command Translation

To start the translation of a command, a new get command routine was implemented.  It starts by getting a token, taking into account that an assignment statement may not have the LET keyword.  If the first token has an error, the "expected command" error is returned since it does not matter what type of parser error was detected.

If the token obtained is a command token, the pointer to the translate function for the command is obtained from the table.  Otherwise an assignment statement is assumed and the pointer to the translate function for the LET command is obtained.  The token will be passed to the LET translate function.

The interface of the translate functions contains a reference to the translator (so the command can access the various translator routines like the get token and get expression routines), a pointer to the command token (so the command can add it to the output list), and a reference to a token pointer used to return the token that terminated the command or where an error was detected (it is also used to pass the first token to the LET translate function for an implied assignment statement).  The token status is returned.

If the translate function pointer is not set, the token is marked unused and a "not yet implemented" error is returned.  For now, no translate function pointers have been set in the table (none have been implemented).  The translate functions will replace the command handlers.  The token handlers are not needed with the new translator.
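The dispatch described above can be sketched roughly as follows.  This is only an illustration: the type names, the status values and the use of a map for the table are assumptions made for the sketch and are not taken from the project source, and marking the token as unused is left out.

```cpp
#include <map>
#include <string>

// All names here are hypothetical stand-ins for the sketch.
enum class Status { Done, ExpectedCommand, NotYetImplemented };

struct Token {
    std::string text;
    bool isCommand;   // token is a command keyword
    bool hasError;    // parser reported an error for this token
};

class Translator;

// Translate function interface: translator reference, command token pointer,
// and a reference to a token pointer (first token in, terminating token out).
using TranslateFunction = Status (*)(Translator &, Token *, Token *&);

class Translator {
public:
    // Stand-in for the table; a missing or null entry means "not implemented".
    std::map<std::string, TranslateFunction> table;

    // 'token' is the first token of the statement, already obtained.
    Status getCommand(Token *&token)
    {
        if (token->hasError) {
            return Status::ExpectedCommand;  // kind of parser error is irrelevant
        }
        Token *commandToken = nullptr;
        std::string name = "LET";            // assume an implied assignment
        if (token->isCommand) {
            commandToken = token;
            name = token->text;
        }
        auto entry = table.find(name);
        if (entry == table.end() || entry->second == nullptr) {
            return Status::NotYetImplemented;
        }
        if (commandToken != nullptr) {
            token = nullptr;                 // nothing to pass in for a real command
        }
        // For an implied LET, 'token' still holds the first token.
        return entry->second(*this, commandToken, token);
    }
};
```

With an empty table, every statement without a parser error comes back as "not yet implemented", which matches the current state described above.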

The new translator routine was modified to call either the get expression or the new get command routine depending on the expression mode argument.  The expression mode argument does not need to be saved with the new translator routines.  A temporary '-nt' test option was added to access the new translator routines for statements, and similarly the '-n' test option was expanded to support translator test files.  Obviously none of the translator tests succeed with the new translator routines.

[commit de01f48ebb]

New Translator – Parentheses Expressions (Tagged)

The implementation of expressions for all tokens with parentheses including open and closing parentheses is now complete.  The new translator routines now fully support expressions and version v0.4.1 has been tagged.  Note that expression test #1 currently still fails with the regression and memory test scripts because of problems with the expression mode in the old translator routines.  The new translator routines run all expression tests successfully.  Implementation of command translation can now commence in the new translator.

[commit ac6658f16f]

New Translator – Arrays/User Functions

There are four token types used for arrays and functions: identifiers with and without parentheses (which could be a variable, an array or a user function, to be determined by the encoder), and define functions with and without parentheses.  A define function without parentheses simply gets added to the output list and pushed to the done stack like constants, identifiers with no parentheses, and internal functions with no parentheses.  To support the other two types (identifiers and define functions with parentheses) the get operand routine was updated to call the newly implemented get parentheses token routine.

The new get parentheses token routine starts by pushing the parentheses token to the hold stack, which being of low precedence, will create a border as the expressions of the arguments are processed.  An operand counter is initialized and a loop is entered for each operand by first calling the get expression routine.  If the parentheses token is an identifier with parentheses, it could be a user function.  Arguments of user functions are passed by reference, so any operand that could be a variable or array element has its reference flag set.

The terminating token is then checked.  For a comma, the token is deleted and the operand is counted.  For a closing parenthesis, the existing process final operand function is called, which upon success attaches all the operands, appends the token with parentheses to the RPN output list, and pushes the token to the done stack.  The token with parentheses is then dropped from the hold stack.  For other terminating tokens, the appropriate error is returned.
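The loop can be sketched as below.  This is a heavily simplified illustration under stated assumptions: tokens are plain strings, each operand is a single token (the real routine calls the get expression routine for each operand), attaching operands is reduced to popping them from the done stack, and every name is hypothetical.

```cpp
#include <cstddef>
#include <stack>
#include <string>
#include <vector>

// Hypothetical simplified types for the sketch.
enum class Status { Done, ExpectedCommaOrParen };

struct Rpn {
    std::vector<std::string> output;  // RPN output list
    std::stack<std::string> hold;     // hold stack
    std::stack<std::string> done;     // done stack
};

// tokens[pos] is the first token after the token with parentheses.
Status getParenToken(const std::vector<std::string> &tokens, std::size_t &pos,
                     const std::string &parenToken, Rpn &rpn)
{
    rpn.hold.push(parenToken);  // border for operators inside the arguments
    int operandCount = 0;
    for (;;) {
        if (pos + 1 >= tokens.size()) {
            return Status::ExpectedCommaOrParen;  // ran out of tokens
        }
        // Stand-in for the get expression routine: a one-token operand.
        rpn.output.push_back(tokens[pos]);
        rpn.done.push(tokens[pos]);
        ++pos;

        const std::string &terminator = tokens[pos++];
        ++operandCount;
        if (terminator == ",") {
            continue;  // more operands follow
        }
        if (terminator == ")") {
            // Stand-in for process final operand: take the operands off the
            // done stack, then append the token and push it as the result.
            for (int i = 0; i < operandCount; ++i) {
                rpn.done.pop();
            }
            rpn.output.push_back(parenToken);
            rpn.done.push(parenToken);
            rpn.hold.pop();  // drop the token with parentheses
            return Status::Done;
        }
        return Status::ExpectedCommaOrParen;
    }
}
```

For the tokens following `A(` in `A(1,B,2)`, this sketch leaves `1 B 2 A(` in the output list, an empty hold stack, and only `A(` on the done stack.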

One expression in test #3 had a different result from the old translator routines due to two issues.  First, an array in the expression incorrectly had its reference flag set by the old routines (the new routines correctly did not).  Second, the argument of a define function along with the define function incorrectly had their reference flags set.  The old routines assumed the arguments of define functions would be passed by reference (and set the reference flags of the arguments).  This is no longer the case (see last post).  These minor issues were corrected in the old routines and results for expression test #3 were updated.

[commit 847c5d078e]

New Translator – User Functions

User functions can take two forms, a full user function (using the FUNCTION syntax) or a simple define function (using the DEF FN syntax).  A full user function will pass arguments by reference.  This only applies to variables and array elements.  Results of expressions will be passed by value.  To keep things consistent internally, a temporary value will be allocated and a pointer to it will be passed, so all arguments will be a reference.  To force a single variable or array element to be passed by value, it can be surrounded by a set of parentheses.
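The effect of this scheme can be illustrated with a small C++ analogy.  This is not project code: `addOne` is a hypothetical stand-in for a full user function whose parameter always arrives as a pointer, and the two callers show what the generated code would conceptually do for a variable argument and for an expression (or parenthesized variable) argument.

```cpp
// Stand-in for a full user function: the parameter is always a reference,
// represented here as a pointer.
void addOne(double *x)
{
    *x += 1.0;
}

// Argument is a plain variable: a pointer to the variable itself is passed,
// so the assignment inside the function is visible to the caller.
double callWithVariable(double a)
{
    addOne(&a);
    return a;
}

// Argument is an expression result or a variable in parentheses: a temporary
// is allocated and a pointer to it is passed, so 'a' is left unchanged.
double callWithTemporary(double a)
{
    double temporary = a;  // value of the expression "(a)"
    addOne(&temporary);
    return a;
}
```

In both cases the function receives a pointer, which is the point of the design: the called function does not need to know whether the argument was a variable or an expression.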

Define functions will have two forms, a single line form that looks like an assignment and a multiple line form (that will end with an END DEF statement and have one or more assignments for the return value).  Unlike full user functions, arguments to define functions will only be passed by value.  This makes sense for the single line form since the arguments can't be assigned inside the function, and to keep the internal code consistent between the two forms, the multiple line form will also have arguments passed by value.  Arguments will be local variables, so any assignments will not affect the original variables.

This is the same method of argument passing used by QBASIC, and seems to be a reasonable design choice, so will also be used for this project.  Subroutines (the SUB syntax) will use the same pass by reference scheme as full user functions.  The passing of entire arrays will be dealt with later.  QBASIC allows this by listing the array name followed by opening and closing parentheses with no subscripts.  This same syntax may be used for this project.

To handle function calls in the translator, the reference flag of an operand is set if it is an identifier with or without parentheses.  These tokens could end up being function calls, but this will be handled by the encoder.  Identifiers with parentheses could also end up being an array, where its arguments are integer subscripts.  Again this will be handled by the encoder, which will add any needed internal convert to integer codes (for double subscripts) or report errors for string subscripts.  The reference flag in any subscripts will be ignored.  Regardless of whether the identifier is an array or function, the translator will attach a pointer to each argument for the encoder.
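The rule for setting the reference flag can be summarized in a short sketch.  The enumeration and structure names are hypothetical, and treating a parenthesized operand as a non-reference in the translator is an assumption based on the pass by value rule for parenthesized arguments described earlier.

```cpp
// Hypothetical token types for the sketch.
enum class TokenType { Constant, NoParen, Paren, DefFuncNoParen, DefFuncParen };

struct Operand {
    TokenType type;      // type of the token that forms the operand
    bool parenthesized;  // operand was surrounded by parentheses
    bool reference;      // reference flag checked later by the encoder
};

// Set the reference flag of one operand of a token with parentheses.
void setReferenceFlag(TokenType parenTokenType, Operand &operand)
{
    // Only an identifier with parentheses can turn out to be a user function;
    // arguments of define functions are always passed by value.
    bool mayBeUserFunction = parenTokenType == TokenType::Paren;

    // Only identifiers can turn out to be variables or array elements.
    bool mayBeVariable = operand.type == TokenType::NoParen
        || operand.type == TokenType::Paren;

    operand.reference = mayBeUserFunction && mayBeVariable
        && !operand.parenthesized;
}
```

If the identifier later turns out to be an array, the encoder simply ignores the flag on the subscripts, so setting it speculatively in the translator costs nothing.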