Wow, those are very cool optimisations! Thank you SOOO much.
Unfortunately, my interpreter is still over 600 bytes (613 to be exact).
The main problem now is that it'd be great to make the register instructions fit in a 256-byte window, so that I can toss the lookup table for the high byte (which eats 31 bytes).
(EDIT: Right now the distance between Do_DJNZ and Do_LXOR is 306 bytes, so we are not too far from getting it under 256 and tossing the high byte lookup table.)
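To make the 256-byte window idea concrete, here is a rough Python model (not the actual interpreter code). It assumes the table would store each handler's entry point as a one-byte offset from the first handler; the function names and the addresses are made up for illustration, and only the 306-byte span comes from the numbers above.

```python
def build_offset_table(entry_points):
    """Return (base, table), where table holds one-byte offsets from base,
    or (base, None) if the entry points do not fit in a 256-byte window."""
    base = min(entry_points)
    offsets = [addr - base for addr in entry_points]
    if max(offsets) > 0xFF:
        # At least one entry point is too far away: a high byte table is still needed.
        return base, None
    return base, bytes(offsets)


def dispatch_target(base, table, opcode):
    """Address of the handler for opcode, using only the one-byte table."""
    return base + table[opcode]


# Hypothetical addresses; only the 306-byte span is taken from the post.
too_wide = [0x8000, 0x8040, 0x8000 + 306]
fits = [0x8000, 0x8040, 0x8000 + 255]
```

Note that only the entry points have to fall inside the window; the body of the last handler can run past it. If the window also starts on a page boundary, the high byte is simply a constant.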
I uploaded my link to show the resulting interpreter.
Also, you introduced an error by replacing the iny with a dey in the shift right loop. The iny is correct, since Y starts negative and counts up to zero.
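Here is a small Python model of that loop, just to show why the direction matters. It assumes Y holds the negated shift count as an 8-bit two's-complement value and that the shifted value is 8 bits wide; both functions are illustrative, not the real routine.

```python
def loop_iterations(count, step):
    """Count how many times the loop body runs when Y starts at -count
    and is changed by step (+1 models iny, -1 models dey) until it is zero."""
    y = (-count) & 0xFF
    n = 0
    while y != 0:
        y = (y + step) & 0xFF
        n += 1
    return n


def shift_right(value, count):
    """Shift an 8-bit value right, counting Y up from -count to zero."""
    y = (-count) & 0xFF
    while y != 0:
        value = (value & 0xFF) >> 1  # models lsr
        y = (y + 1) & 0xFF           # models iny
    return value
```

With iny, a count of 3 gives 3 iterations. With dey, Y would have to wrap all the way around, so the same count gives 253 iterations.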
Your dispatcher is interesting. However, while it saves 8 bytes compared to mine, it needs 8 extra bytes of lookup table, as the non-register instructions are looked up by words. Since the result is neutral, I kept mine (but keep in mind that any alternatives are welcome, of course).
Another idea I had was to toss the lookup table entirely and write the opcode directly to PCL, so that each instruction has, for example, 8 bytes reserved for it. The problem is that while this would make dispatching simpler and faster, the instructions themselves would be hard to code, and would become a puzzle of jumps in all directions.
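A quick Python sketch of the fixed-slot idea, to show where the puzzle comes from. The 8-byte slot size is the example figure from above; the base address and the handler sizes are invented for illustration.

```python
SLOT_SIZE = 8  # bytes reserved per instruction, as in the example above


def slot_address(base, opcode):
    """Address the dispatcher would jump to: no table, just a multiply."""
    return base + opcode * SLOT_SIZE


def overflowing_opcodes(handler_sizes):
    """Opcodes whose handler does not fit in its slot and so must
    jump somewhere else to finish its work."""
    return [op for op, size in enumerate(handler_sizes) if size > SLOT_SIZE]


# Hypothetical handler sizes in bytes, indexed by opcode.
sizes = [5, 8, 14, 3, 21]
```

Every opcode returned by `overflowing_opcodes` needs an extra jump out of its slot, and every handler shorter than 8 bytes wastes the rest of its slot.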
At some point I wanted to make the C flag of the interpreter match exactly the C flag of the interpreted code, so I wouldn't need this C location at all. The problem was that the shifts in the dispatching code made this impossible, as did the comparisons in the branching code. If only there were an alternative for those 2 cases, that'd be very cool.
I see that you removed the 'tay' at the end of the NewByteCode routine. The problem is that without it I'd have to do it manually in every branch instruction. I don't think there's another way around it. If there is one, please tell me.
I was also wondering if self-modifying code could be useful anywhere in the interpreter, but I didn't think that was the case. Please tell me if I missed something.
As for the instructions themselves, the specs are still very open. I change the instructions as I write code for the VM, so it's no big surprise that some of them are different every time.