Wednesday, December 30, 2009

Language Definition – Identifiers

Before getting into how the Parser is going to identify token types, there needs to be some definition of what the identifiers will look like.  There will be no limit placed on the size of identifiers; however, there is a practical upper limit because of the program line length.  While I would like to limit line lengths to, say, 80 characters, this is probably not realistic.  Some type of line wrapping will be necessary, but that's a problem for another day.

Identifiers must start with a letter, and may then contain any number of letters, digits and the under-bar (underscore) character.  Identifiers will be case insensitive; however, each identifier will be saved as first entered.  In other words, if a variable name like SomeVariableName is entered, that's how it will be saved and displayed, and any other form like somevariablename, SOMEVARIABLENAME, SOMEvariableName, etc. will refer to the same variable.  (There will be an allowance to rename variables later.)
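To make these rules concrete, here is a rough sketch in Python, used purely for illustration and not a statement about how the interpreter itself will be written.  The names IDENTIFIER, NameTable and enter are hypothetical, and the sketch assumes "letters" means the ASCII letters A to Z in either case.

```python
import re

# Assumed pattern: one letter, then any mix of letters, digits and underscores.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


class NameTable:
    """Hypothetical table that finds names regardless of case."""

    def __init__(self):
        # upper-cased key -> spelling exactly as first entered
        self._names = {}

    def enter(self, name):
        """Record a name and return the spelling that will be displayed."""
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
        # setdefault keeps the first spelling and ignores later variants
        return self._names.setdefault(name.upper(), name)
```

Entering SomeVariableName and then SOMEVARIABLENAME gives back SomeVariableName both times, since only the first spelling is kept.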

Identifiers must be unique across variables, arrays, functions and subroutines, and must not be any of the reserved BASIC commands and operators (e.g. PRINT, IF, etc., or even, say, Print).  A reserved BASIC command can, however, be used within an identifier; for example, Print5 is acceptable.  At the end of the identifier there can be an optional symbol for the data type: “%” for integer, “$” for string, and “#” for double precision (the default).  Later, perhaps single precision can be supported with a “!” character.  The data type symbol is considered part of the name; therefore the variable names Variable, Variable% and Variable$ all refer to different variables and may all be contained in the same program.
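As an illustration only, the reserved-word check and the data type symbols might be sketched in Python like this.  The names RESERVED, TYPE_SYMBOLS and classify are hypothetical, the reserved list holds just the two commands mentioned above, and two points are assumptions the rules above do not settle: a reserved word is rejected even when a type symbol follows it, and Variable and Variable# are treated as different names.

```python
import re

# Assumed subset of reserved words; the full list is not defined here.
RESERVED = {"PRINT", "IF"}

# Data type symbols; "!" for single precision is left out for now.
TYPE_SYMBOLS = {"%": "integer", "$": "string", "#": "double"}

# A name followed by at most one data type symbol.
NAME = re.compile(r"([A-Za-z][A-Za-z0-9_]*)([%$#]?)")


def classify(text):
    """Return (lookup key, data type) for a variable name, or raise ValueError."""
    match = NAME.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid identifier: {text!r}")
    base, symbol = match.groups()
    # Assumption: a reserved word is rejected with or without a type symbol.
    if base.upper() in RESERVED:
        raise ValueError(f"reserved word: {text!r}")
    # No symbol means the default type, double precision.
    data_type = TYPE_SYMBOLS.get(symbol, "double")
    # The symbol stays in the key, so Variable, Variable% and Variable$ differ.
    return (base + symbol).upper(), data_type
```

Because the whole name is matched before the reserved list is consulted, Print5 passes while Print and PRINT are rejected.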

Array, function and subroutine identifier names contain an opening parenthesis at the end with no intervening white space.  Note that while the opening parenthesis is considered part of the identifier, it is not stored.  Therefore, having both Variable and Variable() in the same program is not allowed.  This will allow array names to be used without the parentheses, as when passing an entire array to a function or subroutine, or in MAT statements if they are implemented.  Subroutine identifier names do not have a data type symbol, as subroutines don't have a return value.
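The parenthesis rule can be sketched as follows, again in Python purely for illustration.  The names TOKEN, Dictionary, declare and lookup are hypothetical, and the sketch assumes that a name is registered once as either a plain variable or a parenthesized identifier and can afterwards be looked up by its bare name.

```python
import re

# Assumed token shape: name, optional type symbol, then an optional "("
# that must follow immediately with no white space.
TOKEN = re.compile(r"([A-Za-z][A-Za-z0-9_]*[%$#]?)(\(?)")


class Dictionary:
    """Hypothetical store keeping plain and parenthesized names in one space."""

    def __init__(self):
        # upper-cased name without the parenthesis -> "plain" or "parenthesized"
        self._kinds = {}

    def declare(self, text):
        """Register a name; return it as stored (parenthesis dropped)."""
        match = TOKEN.fullmatch(text)
        if match is None:
            raise ValueError(f"invalid identifier: {text!r}")
        name, paren = match.groups()
        kind = "parenthesized" if paren else "plain"
        # Variable and Variable( share one key, so the second form is rejected.
        if self._kinds.setdefault(name.upper(), kind) != kind:
            raise ValueError(f"{name!r} is already in use")
        return name

    def lookup(self, name):
        """Find a name by its bare form, e.g. an array passed without "("."""
        return self._kinds.get(name.upper())
```

Since the parenthesis is never part of the stored key, an array declared as Values( can later be found as plain Values, while declaring a separate variable called Values is refused.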
