Hi EEye
The 'x' entries in a case statement should always work in your favour, because they give logic synthesis the maximum degree of freedom to implement whatever is smallest or fastest.
In a clean (RISC-like) instruction encoding, the destination register would just be some selection of bits from the IR, and wouldn't need any decoding. But we have this legacy instruction set, and bit 3 sits right down in the legacy part of the opcode. In the 6502 that was quite efficient, but in this branch you've added more registers and operations and filled out the bottom 8 bits. That has evidently slowed down the machine. In retrospect it was bound to slow down the decode, because unless you're extremely careful you're perturbing something which was put together with great care for efficient decoding. Up to a point you get away with that, because decode hasn't been on the critical path, but if you add extra opcodes without the same care for placement that the original 6502 designers took, the decode will get more complex.
A radical suggestion - which would need even more cooperation from the assembler than you already need - would be to modify the encodings of existing opcodes, so that for example LDX would signal in the upper bits that X is the destination. Instead of $00A6, $00B6 and so on, you'd use $20A6, $20B6 or whatever would be appropriate to address X in the register file. However, I see from your code that your register file now has 5-bit addresses... so I'll steer clear of trying to think too hard about this particular implementation. I think I'm revisiting ideas I put forward in the 65Org16.c thread.
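To make the encoding idea a bit more concrete, here is a rough Python model, not a proposal for the actual core. The field position (bits 15..12), the 4-bit width and the register number used for X are all assumptions, picked only so that the numbers line up with the $20A6 example; a 5-bit register file would need a different layout.

```python
# Hypothetical layout (an assumption, not the real core):
#   bits 15..12 : destination register number
#   bits  7..0  : the original 6502 opcode, unchanged
DEST_SHIFT = 12
DEST_MASK = 0xF
REG_X = 2  # assumed register-file address for X, so that LDX zp becomes $20A6


def encode(legacy_opcode, dest):
    """What the assembler would have to emit: legacy opcode plus a destination field."""
    return ((dest & DEST_MASK) << DEST_SHIFT) | (legacy_opcode & 0xFF)


def dest_field(ir):
    """The 'decode' of the destination is just a bit selection from the IR."""
    return (ir >> DEST_SHIFT) & DEST_MASK


def legacy_opcode(ir):
    """The bottom 8 bits still go through the existing 6502-style decode."""
    return ir & 0xFF


ldx_zp = encode(0xA6, REG_X)
print(hex(ldx_zp), dest_field(ldx_zp), hex(legacy_opcode(ldx_zp)))
```

The point of the sketch is only that the destination falls out of a shift and mask, with no dependence on the pattern in the bottom 8 bits.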
I see in your latest checkin that you have changed some x's to 0's and you have the speed as 91MHz. What was it before?
The problem with lots of complex decode lines - even with Arlet's original - is being absolutely sure that every opcode is doing what it should, and nothing more.
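One way to gain some confidence is to check each decode line exhaustively against the list of opcodes it is meant to cover. The following is a minimal Python sketch of that idea, assuming a decode line can be modelled as a mask/match pair (the equivalent of a case item with x's). The helper names and the two example patterns are hypothetical and are not taken from the actual source; the intended set is the five standard 6502 LDX opcodes.

```python
# Intended set: the five standard 6502 LDX opcodes.
LDX_OPCODES = {0xA2, 0xA6, 0xAE, 0xB6, 0xBE}


def matches(opcode, mask, match):
    """Model of one decode line: bits cleared in mask are don't-cares."""
    return (opcode & mask) == match


def audit(mask, match, intended):
    """Try all 256 opcodes; return (fires but shouldn't, should fire but doesn't)."""
    fired = {op for op in range(256) if matches(op, mask, match)}
    return sorted(fired - intended), sorted(intended - fired)


# Pattern 101xxx10 (hypothetical): covers every LDX, but fires on other opcodes too.
extra, missing = audit(0xE3, 0xA2, LDX_OPCODES)
print([hex(op) for op in extra], [hex(op) for op in missing])

# Pattern 101xx110 (hypothetical): fires on nothing extra, but misses LDX immediate.
extra, missing = audit(0xE7, 0xA6, LDX_OPCODES)
print([hex(op) for op in extra], [hex(op) for op in missing])
```

With 16-bit opcodes the same loop would run over 65536 values, which is still cheap, so a check like this could be run against every decode line after each change.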
Cheers
Ed