More detailed design within the expression parser:
I realised that there are essentially two parsing contexts to consider, and that the sets of lexical tokens I should expect in each are completely disjoint. This will simplify error handling, since I don't have to figure out which context I'm in after the fact, and I think the stack management is also made more robust this way. One criticism of the shunting-yard algorithm is that it allows expressions to be parsed with the tokens in the "wrong" order, producing syntactically nonsensical results, but this measure neatly fixes that while also simplifying the code a little.
After pushing a Stop operator token, the parser starts in the "Operand" context, since a single operand is itself a valid expression, while a single operator is not. The Operand context allows a numeric literal, a variable reference, a unary operator (negate or NOT), or an open bracket. The latter two push operator tokens on the stack but then remain in Operand context; literals and variables are actual operands and cause the parser to shift to "Operator" context. If the token at the cursor is not recognised in Operand context, a syntax error is thrown. Stacked operators are not executed from Operand context, since unary operators always have higher precedence than binary operators and their operands are not yet available.
In "Operator" context, the parser looks for infix operators and close brackets, and causes the immediate execution of equal-or-higher precedence operators already on the stack. This, for example, causes unary operators to be executed as soon as their operands are available. If the token at the cursor is not recognised in Operator context, the end of the expression is inferred, all stacked operators are executed, and control is returned to the statement parser with the cursor suitably advanced. Mismatched brackets would be recognised by attempted execution of a stacked bracket token outside of a close-bracket context, or encountering the Stop token on the stack while processing a close-bracket.
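The two-context loop described above can be sketched as follows. This is a minimal illustrative model, not the actual implementation: the token names, precedence values, and operator set are assumptions, and it works on a pre-split token list rather than raw source text.

```python
# Illustrative precedences; Stop is lowest so it is never popped early,
# and the open bracket is below all binary operators so only a close
# bracket can remove it.
PREC = {"stop": 0, "(": 1, "=": 2, "+": 3, "-": 3, "*": 4, "neg": 5, "not": 5}

def apply_op(op, vals):
    """Execute one stacked operator against the value stack."""
    if op == "neg":
        vals.append(-vals.pop())
    elif op == "not":
        vals.append(~vals.pop())
    else:
        b, a = vals.pop(), vals.pop()
        vals.append({"+": a + b, "-": a - b, "*": a * b,
                     "=": -1 if a == b else 0}[op])

def parse(tokens):
    """Shunting-yard evaluation with disjoint Operand/Operator contexts."""
    ops, vals = ["stop"], []          # Stop token pushed first
    i, operand_ctx = 0, True          # a valid expression starts with an operand
    while True:
        tok = tokens[i] if i < len(tokens) else None
        if operand_ctx:
            if isinstance(tok, int):          # literal: an actual operand,
                vals.append(tok)              # so shift to Operator context
                operand_ctx = False
            elif tok in ("neg", "not", "("):  # push and stay in Operand context;
                ops.append(tok)               # nothing is executed from here
            else:
                raise SyntaxError("operand expected")
            i += 1
        else:
            if tok in ("+", "-", "*", "="):
                # execute stacked equal-or-higher precedence operators first,
                # which runs unary operators as soon as their operand exists
                while PREC[ops[-1]] >= PREC[tok]:
                    apply_op(ops.pop(), vals)
                ops.append(tok)
                operand_ctx = True
                i += 1
            elif tok == ")":
                while ops[-1] != "(":
                    if ops[-1] == "stop":     # Stop reached: no matching bracket
                        raise SyntaxError("mismatched bracket")
                    apply_op(ops.pop(), vals)
                ops.pop()                     # discard the open bracket
                i += 1
            else:
                # unrecognised token: end of expression inferred
                while ops[-1] != "stop":
                    if ops[-1] == "(":        # bracket executed outside ')'
                        raise SyntaxError("mismatched bracket")
                    apply_op(ops.pop(), vals)
                return vals.pop()
```

For example, `parse(["neg", 3, "+", 4])` evaluates -3 + 4: the negate is pushed in Operand context, then executed from Operator context as soon as `+` arrives, because its precedence is higher.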
Several operators will be implemented as combinations of other operators to reduce code size and the number of unique operator tokens. In particular:
- Negate and NOT can use the same code implementing a subtraction from zero, entered with the Carry flag set and cleared respectively: SBC computes A - M - (1 - C), so with Carry clear the result is 0 - M - 1, which is the one's complement. This is a 6502-specific optimisation that would not be obvious for code ported from some other 8-bit CPU.
- Subtract can be implemented by pushing Add and Negate operators in that order, so that the top operand is negated before being added to its partner.
- Unequal can be implemented by pushing NOT and Equal operators in that order, so that the Equal condition is tested and then the result inverted.
- Similarly, >= is NOT <, and <= is NOT >. This equivalence is only valid for integers, and would break for IEEE-754 floating point (where any comparison involving NaN is false), but is a valid optimisation for us.
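The equivalences above can be checked mechanically. This sketch simulates the 6502 SBC behaviour for 8-bit two's-complement values and spot-checks the operator combinations; the helper name is illustrative, not from the actual interpreter.

```python
def sbc_from_zero(x, carry):
    """Simulate 6502 SBC with A = 0: computes A - M - (1 - C), 8 bits."""
    return (0 - x - (1 - carry)) & 0xFF

for x in range(256):
    assert sbc_from_zero(x, carry=1) == (-x) & 0xFF   # Carry set: negate
    assert sbc_from_zero(x, carry=0) == (~x) & 0xFF   # Carry clear: NOT

# Subtract as Add + Negate, Unequal as NOT + Equal, and the
# integer-only comparison inversions:
for a in range(-5, 6):
    for b in range(-5, 6):
        assert a - b == a + (-b)
        assert (a != b) == (not (a == b))
        assert (a >= b) == (not (a < b))
        assert (a <= b) == (not (a > b))
```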
Obviously, these are all optimisations for code size, and many will result in lower execution speed. That's the kind of tradeoff needed to keep the code within 2KB with a reasonably complete feature set.
PEEK will likely be a unary operator, since it fits neatly into the above scheme. I may use the BBC BASIC indirection operators, ? and !, instead of the PEEK keyword, since they are easier to write recognition code for. However, implementing the indirection operators instead of POKE may be more complicated, since I have a very restricted concept of an lvalue. It might work if ? and ! are implemented as statement keywords, essentially duplicating the assignment statement.
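The statement-keyword idea for the indirection operators might look like this. Everything here is speculative: the dispatch structure and helper names are assumptions, and the expression parser is replaced by a toy stand-in that only accepts a decimal literal.

```python
import re

memory = bytearray(256)

def eval_expr(text):
    """Toy stand-in for the real expression parser: one decimal literal.
    Returns the value and the remaining, unconsumed text."""
    m = re.match(r"\s*(\d+)\s*", text)
    if not m:
        raise SyntaxError("operand expected")
    return int(m.group(1)), text[m.end():]

def statement(line):
    """Statement dispatch: '?addr = value' duplicates the shape of an
    assignment statement, with an address expression as the 'lvalue'."""
    if line.startswith("?"):
        addr, rest = eval_expr(line[1:])      # address expression after '?'
        if not rest.startswith("="):
            raise SyntaxError("= expected")
        value, _ = eval_expr(rest[1:])        # right-hand side expression
        memory[addr] = value & 0xFF
```

For example, `statement("?10 = 200")` stores 200 at address 10, exactly as an assignment statement would store into a variable.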