Good to see that you have improved your project in both speed and compatibility. Very nice!
Still, I have some recommendations. Please don't bundle the ATMEGA1284 manual with the code. I have followed your project from the beginning, so I have now downloaded the manual for the 4th time even though I already had it. It would be nice to have it as a separate download, or you could just point to the download on Atmel's website.
You are doing a "jmp MAIN" at the end of every instruction. If you still have lots of flash space available, adding the MAIN code snippet as a macro to the end of every instruction would save 3 AVR cycles per instruction (the cost of the jmp). If space is limited, you could at least do it for the most used instructions.
The code itself could also be improved:
Code:
Main:
sbic gpior0, 0 ; if IRQ triggered
rcall cpu_irq ; execute it now
cpu_exec:
movw ZL, XL ; Z = emulated PC (held in X)
adiw XL, 1 ; advance emulated PC past the opcode
cp zh, mmap ; IO page
breq IO
brcs RAM
ROM:
lpm zl, Z
rjmp fetch
IO:
ldi zl, 0xEA ;nop in IO area
rjmp fetch
RAM:
inc zh
ld zl, Z
fetch:
ldi zh, high(opctbl)
lsl zl
adc zh, zero ; pointer to opcode jump
ijmp ; call opc code
The "breq IO" can be omitted if you put 256 bytes of 0xEA in front of the ROM image, so that a fetch from the IO page falls through to the ROM path and reads a NOP from flash. This is true not only for opcode fetches, but also for any other instruction stream fetch, as those should never read from IO space. The NOP instruction (as it is not doing anything else) could in turn check whether X-1 points to IO and raise an exception. Saves 1 cycle.
Instead of the "rjmp fetch" in the ROM: part, you could repeat the code from the fetch: part there, saving another 2 cycles for ROM fetches.
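To convince yourself that the padding gives the same result as the explicit branch, here is a rough host-side model in Python. It is only a sketch and not part of your project: the value of MMAP, the 64K flash window, and all function names are my assumptions for illustration, and the model ignores the +0x100 SRAM offset that the "inc zh" handles on the AVR.

```python
NOP = 0xEA    # 6502 NOP opcode
MMAP = 0x3F   # hypothetical IO page number (the real value is whatever mmap holds)

def pad_rom(image: bytes) -> bytes:
    """Prepend one page (256 bytes) of NOPs to the ROM image."""
    return bytes([NOP]) * 256 + image

def build_flash(image: bytes, padded: bool) -> bytearray:
    """Place the ROM image in a zero-filled 64K window, indexed by 6502 address."""
    flash = bytearray(0x10000)
    if padded:
        data, start = pad_rom(image), MMAP << 8      # NOP page covers the IO page
    else:
        data, start = image, (MMAP + 1) << 8         # ROM starts right after IO
    flash[start:start + len(data)] = data
    return flash

def fetch_with_branch(addr: int, ram: bytearray, flash: bytearray) -> int:
    """Model of the current decode: cp / breq IO / brcs RAM / fall through to ROM."""
    page = addr >> 8
    if page == MMAP:
        return NOP            # IO: ldi zl, 0xEA
    if page < MMAP:
        return ram[addr]      # RAM: ld zl, Z
    return flash[addr]        # ROM: lpm zl, Z

def fetch_padded(addr: int, ram: bytearray, flash: bytearray) -> int:
    """Model of the proposed decode: cp / brcs RAM / fall through to ROM."""
    if (addr >> 8) < MMAP:
        return ram[addr]
    return flash[addr]        # the IO page now reads 0xEA from the padding

# Demo data: arbitrary RAM contents and a ROM image that fills the rest of the map.
ram = bytearray(range(256)) * MMAP
image = bytes((i * 7 + 1) & 0xFF for i in range(0x10000 - ((MMAP + 1) << 8)))
flash_old = build_flash(image, padded=False)
flash_new = build_flash(image, padded=True)

same = all(fetch_with_branch(a, ram, flash_old) == fetch_padded(a, ram, flash_new)
           for a in range(0x10000))
print("identical fetch results:", same)
```

The only thing that changes is where the 0xEA comes from: an immediate load before, a flash read after.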
Code:
.macro op_decode
sbic gpior0, 0 ; if IRQ triggered
rcall cpu_irq ; execute it now
.ifndef cpu_exec
.equ cpu_exec = pc
.endif
movw ZL, XL ; Z = emulated PC (held in X)
adiw XL, 1 ; advance emulated PC past the opcode
cp zh, mmap ; IO page
brcs RAM
ROM:
lpm zl, Z
ldi zh, high(opctbl)
lsl zl
adc zh, zero ; pointer to opcode jump
ijmp ; call opc code
RAM:
inc zh
ld zl, Z
ldi zh, high(opctbl)
lsl zl
adc zh, zero ; pointer to opcode jump
ijmp ; call opc code
.endmacro
Then replace all "jmp MAIN" with op_decode.
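For a rough idea of what this costs in flash, here is a back-of-the-envelope calculation in Python. The word counts come from the macro above (17 single-word AVR instructions) and from jmp being a two-word instruction. The number of call sites is my assumption: I simply take one per opcode table slot, so adjust it to the real count of "jmp MAIN" in your source.

```python
OP_DECODE_WORDS = 17   # instructions in the op_decode macro, one 16-bit word each
JMP_WORDS = 2          # "jmp MAIN" is a two-word instruction
BYTES_PER_WORD = 2

def extra_flash_bytes(call_sites: int = 256) -> int:
    """Additional flash used when each 'jmp MAIN' becomes an inlined op_decode.

    call_sites defaults to 256 (one per opcode slot), which is an assumption.
    """
    return (OP_DECODE_WORDS - JMP_WORDS) * BYTES_PER_WORD * call_sites

# Cycle savings per emulated instruction, summing the figures given in this post:
# 3 for the jmp, 1 for the dropped breq, and 2 more for the rjmp on ROM fetches.
SAVED_CYCLES = {"rom_fetch": 3 + 1 + 2, "ram_fetch": 3 + 1}

print(extra_flash_bytes())      # all 256 slots
print(extra_flash_bytes(16))    # only the 16 most used instructions
print(SAVED_CYCLES)
```

So even the full replacement stays well under 8K of the 128K flash, and doing it only for the hot instructions costs a few hundred bytes.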
Oh, one more thing. Your invalid opcodes are all implemented as 1 byte NOPs, but the real thing has some of them implemented as 2 or 3 byte NOPs. You can bet that I will test that in the upcoming 65C02 version of my functional test.
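As a starting point, this is the length table as I remember it from the commonly cited 65C02 opcode documentation, written as a small Python lookup. Please treat it as a sketch and check it against the datasheet of the chip you are targeting: the function name is made up, it only makes sense for opcodes that are actually undefined on your variant, and on Rockwell/WDC parts the $x7/$xF columns (and $CB/$DB on WDC) are real instructions rather than NOPs.

```python
# Undefined 65C02 opcodes that consume an operand byte (2-byte NOPs).
TWO_BYTE_NOPS = {0x02, 0x22, 0x42, 0x62, 0x82, 0xC2, 0xE2,
                 0x44, 0x54, 0xD4, 0xF4}

# Undefined 65C02 opcodes that consume two operand bytes (3-byte NOPs).
THREE_BYTE_NOPS = {0x5C, 0xDC, 0xFC}

def undefined_nop_length(opcode: int) -> int:
    """Instruction length in bytes for an undefined 65C02 opcode.

    Anything not listed above (the $x3 and $xB columns, for example)
    is treated as a 1-byte NOP. The caller must only pass opcodes that
    are undefined on the target chip; defined opcodes are not checked.
    """
    if opcode in THREE_BYTE_NOPS:
        return 3
    if opcode in TWO_BYTE_NOPS:
        return 2
    return 1

for op in (0x03, 0x02, 0x44, 0x5C):
    print(f"${op:02X}: {undefined_nop_length(op)} byte(s)")
```

In the emulator this would translate to the 2 and 3 byte entries advancing the emulated PC by one or two extra bytes before decoding the next opcode.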