Sunday, June 16, 2013

Translator Design – Revisited

The next step was going to be either the Recreator (recreating from the RPN lists produced by the Translator, since that is all that is available) or the Encoder (including defining the format of the internal program).  For the latter, the internal format of the program needs to be defined.  Integral to this definition is how the program will be run.

The basic format chosen is a pure reverse Polish notation (RPN).  The problem with this is that it makes the translator design very complicated and somewhat convoluted (for example, the processing of a PRINT statement is spread throughout the translator code), and so far only three fairly simple commands have been implemented.  The underlying cause is that the design is token-centric, meaning that the specific tokens are processed and each decides how it should be handled based on the current state of the translator (command, expression, etc.).
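To make the token-centric problem concrete, here is a minimal sketch in Python.  It is not the project's code: the handler names, the state dictionary and the emitted code names are all hypothetical, chosen only to show how each token handler ends up needing to know about every command that can contain that token.

```python
# Hypothetical token-centric sketch; none of these names come from the project.

def handle_comma(state):
    # The same token means something different in each translator state,
    # so PRINT and INPUT logic both leak into the comma handler.
    if state["mode"] == "print":
        return "PrintComma"
    if state["mode"] == "input":
        return "InputSeparator"
    if state["mode"] == "expression":
        return "ArgSeparator"
    raise ValueError("unexpected comma")


def handle_semicolon(state):
    # PRINT and INPUT logic appears again here, in a second handler.
    if state["mode"] == "print":
        return "PrintSemicolon"
    if state["mode"] == "input":
        return "InputKeepCursor"
    raise ValueError("unexpected semicolon")


TOKEN_HANDLERS = {",": handle_comma, ";": handle_semicolon}


def translate_token(token, state):
    # Dispatch is on the token; the command is only visible through the state.
    return TOKEN_HANDLERS[token](state)
```

In this arrangement, adding a new command that uses commas or semicolons means editing every affected token handler, which is how the handling of a single statement ends up scattered across the code.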

A better design, and one that should be easier to implement, especially once the more complex commands are implemented, is one that is command-centric, meaning the commands decide how the program lines are processed and how the tokens are handled.  This also means a pure RPN design of the internal program is not necessary.  The design I have in mind should only impact run time slightly, but will significantly simplify the translation of more complex commands.
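A command-centric arrangement might look roughly like the following Python sketch.  This is only an illustration under stated assumptions, not the new design itself: the function names, the whitespace-separated tokens, the four-operator expression translator and the emitted code names (Assign, Print, PrintNewline) are all hypothetical.  The sketch still emits an RPN-style list for simplicity and implies nothing about the actual internal format.

```python
# Hypothetical command-centric sketch; names and output codes are assumptions.

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def translate_expression(tokens):
    """Convert infix tokens to RPN (left-associative binary operators only)."""
    output, operators = [], []
    for token in tokens:
        if token in PRECEDENCE:
            # Pop operators of equal or higher precedence before pushing.
            while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]:
                output.append(operators.pop())
            operators.append(token)
        else:
            output.append(token)  # operand: variable or constant
    while operators:
        output.append(operators.pop())
    return output


def translate_let(tokens):
    # LET <variable> = <expression>
    variable, equals, *expression = tokens
    if equals != "=":
        raise SyntaxError("expected '='")
    return [variable] + translate_expression(expression) + ["Assign"]


def translate_print(tokens):
    # PRINT <expression> [; <expression>]...  All PRINT logic lives here.
    output, item = [], []
    for token in tokens:
        if token == ";":
            output += translate_expression(item) + ["Print"]
            item = []
        else:
            item.append(token)
    if item:
        output += translate_expression(item) + ["Print"]
    if not tokens or tokens[-1] != ";":
        output.append("PrintNewline")  # a trailing ';' suppresses the newline
    return output


COMMANDS = {"LET": translate_let, "PRINT": translate_print}


def translate_line(line):
    # Dispatch is on the command, which then decides how its tokens are handled.
    keyword, *rest = line.split()
    return COMMANDS[keyword](rest)
```

For example, translate_line("LET A = B + C * 2") returns ['A', 'B', 'C', '2', '*', '+', 'Assign'].  Everything specific to PRINT sits in one function, and the expression translator no longer needs to know which command it is working for.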

The new design will be explained in upcoming posts.  The goal for the 0.4 development series will be the implementation of the new translator design, which will include the currently implemented commands (LET, PRINT, INPUT, INPUT PROMPT and REM) along with handling expressions.  The size of the current Translator (header, source, token handlers and command handlers) is a total of 2,863 lines.  It will be interesting to see whether the new design is able to reduce this.