Wednesday, December 30, 2009

Parser

The parser needs to take an input line and separate out tokens so that the translator can begin the process of converting the line into the internal RPN format. The tokens will be one of several different types:
  1. Command Name
  2. Internal Function Name
  3. Remark (Comment)  (New)
  4. Operator
  5. Variable Name
  6. Array Name
  7. User Function Name
  8. Constant
The first four on this list are part of the BASIC language and will be listed in the Operator Table. The Operator Table will contain several pieces of information, such as the priority of the operator, the internal code of the operator, the string representation of the operator (used by the Parser and the Recreator), the function to call when running the program, etc. The items in the Operator Table will be expanded as the Interactive BASIC Compiler is developed. For now, the Operator Table will contain the strings and the type (command, internal function or operator). For functions, there will also be a data type (integer, double, string or print). (New) Comments will require special handling by the parser.
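As a rough illustration of the initial Operator Table described above (strings, a type, and a data type for functions), here is a minimal Python sketch. The names `TokenType`, `DataType`, `OperatorEntry` and `find_entry` are hypothetical, and the entries are examples only, not the actual table contents.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class TokenType(Enum):
    COMMAND = auto()
    INTERNAL_FUNCTION = auto()
    REMARK = auto()
    OPERATOR = auto()


class DataType(Enum):
    INTEGER = auto()
    DOUBLE = auto()
    STRING = auto()
    PRINT = auto()


@dataclass(frozen=True)
class OperatorEntry:
    name: str                              # string form used by the Parser and Recreator
    type: TokenType
    data_type: Optional[DataType] = None   # only filled in for functions in this sketch


# Example entries only (assumptions); priority, internal code and run-time
# function would be added as further fields later.
OPERATOR_TABLE = {
    entry.name: entry
    for entry in (
        OperatorEntry("PRINT", TokenType.COMMAND),
        OperatorEntry("REM", TokenType.REMARK),
        OperatorEntry("ABS", TokenType.INTERNAL_FUNCTION, DataType.DOUBLE),
        OperatorEntry("LEN", TokenType.INTERNAL_FUNCTION, DataType.INTEGER),
        OperatorEntry("TAB", TokenType.INTERNAL_FUNCTION, DataType.PRINT),
        OperatorEntry("+", TokenType.OPERATOR),
        OperatorEntry("*", TokenType.OPERATOR),
    )
}


def find_entry(text: str) -> Optional[OperatorEntry]:
    """Look up a string in the table, ignoring case; None if it is absent."""
    return OPERATOR_TABLE.get(text.upper())
```

A lookup that returns `None` is the signal that the token must be one of the other four kinds, or invalid.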

Each of the last four on this list will also have a data type associated with it. If a token is not found in the Operator Table, then it is one of the last four, or it is an invalid token (for example, a symbol that is not an operator).
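The fallback described above might look something like the following Python sketch. How the four kinds are told apart is not specified here, so every rule below is an assumption borrowed from common BASIC dialects: a `%` or `$` suffix selects integer or string (otherwise double), a name starting with `FN` is a user function, and a name followed by an opening parenthesis is an array. `KNOWN_STRINGS` is a hypothetical stand-in for the Operator Table.

```python
import re

# Hypothetical stand-in for the Operator Table: just the set of known strings.
KNOWN_STRINGS = {"PRINT", "LET", "REM", "ABS", "+", "-", "*", "/", "="}

NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*[%$]?")
NUMBER_RE = re.compile(r"\d+(\.\d*)?|\.\d+")


def data_type_of_name(name):
    """Assumed suffix convention: % is integer, $ is string, otherwise double."""
    if name.endswith("%"):
        return "integer"
    if name.endswith("$"):
        return "string"
    return "double"


def classify(text, followed_by_paren=False):
    """Return (type, data_type) for one already-separated token string."""
    if text.upper() in KNOWN_STRINGS:
        return ("table entry", None)
    if NUMBER_RE.fullmatch(text):
        return ("constant", "double" if "." in text else "integer")
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return ("constant", "string")
    if NAME_RE.fullmatch(text):
        if text.upper().startswith("FN"):
            return ("user function", data_type_of_name(text))
        if followed_by_paren:
            return ("array", data_type_of_name(text))
        return ("variable", data_type_of_name(text))
    # Not in the table and not a name or constant: an invalid token.
    return ("invalid", None)
```

For example, `classify("A%")` yields a variable with an integer data type, while `classify("@")` is reported as invalid.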

The Parser will return one token at a time, along with its type and data type. Each token will also carry the column at which it starts, which will be used for error reporting. There's no point in converting the entire line into tokens before the tokens are processed.
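To make the one-token-at-a-time idea concrete, here is a small Python sketch of a parser that hands back a single token per call, together with its starting column. It is only an illustration under several assumptions: columns are 0-based, `kind` is a coarse placeholder for the real type and data type, the token patterns are simplified, and a `REM` keyword is assumed to swallow the rest of the line as one remark token.

```python
import re
from dataclasses import dataclass

# Simplified token patterns (assumptions, not the real BASIC grammar).
TOKEN_RE = re.compile(r'''
    (?P<number>\d+(?:\.\d*)?|\.\d+)
  | (?P<string>"[^"]*")
  | (?P<name>[A-Za-z][A-Za-z0-9_]*[%$]?)
  | (?P<symbol><=|>=|<>|[-+*/^=<>(),;:])
''', re.VERBOSE)


@dataclass
class Token:
    kind: str      # coarse placeholder for the token type
    text: str
    column: int    # 0-based column where the token starts (an assumption)


class Parser:
    def __init__(self, line):
        self.line = line
        self.pos = 0

    def next_token(self):
        """Return the next Token, or None at the end of the line."""
        # Skip blanks between tokens.
        while self.pos < len(self.line) and self.line[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(self.line):
            return None
        start = self.pos
        match = TOKEN_RE.match(self.line, start)
        if match is None:
            # Unknown symbol: report it with its column and step past it.
            self.pos = start + 1
            return Token("invalid", self.line[start], start)
        kind, text = match.lastgroup, match.group()
        if kind == "name" and text.upper() == "REM":
            # Assumed remark handling: the rest of the line is one token.
            self.pos = len(self.line)
            return Token("remark", self.line[start:], start)
        self.pos = match.end()
        return Token(kind, text, start)
```

A translator would call `next_token()` in a loop until it returns `None`, and could stop at the first invalid token and point at its column in the error message.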

Updated Wednesday, January 6, 2010; 10:55 pm: Added information about remarks.
