resman wrote:
drogon wrote:
tmr4 wrote:
I also think of A as having a variable width. For what I'm working on now, I figured out early on that it's easiest just to stay in 16-bit mode, switch to 8-bit mode only when needed, and switch back immediately afterwards.
That was my strategy for my (BCPL) bytecode interpreter, but the 16-bit fetch and the subsequent AND #$00FF add so many cycles to each bytecode fetch that it measurably slows it down )-: (and switching to 8-bit mode here doesn't help either, unless I could guarantee zero in the upper 8 bits of the accumulator)
-Gordon
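For anyone following along, here's a rough Python model (not 65816 code) of what that 16-bit fetch, mask and shift do to a bytecode stream. The function names and sample bytes are made up for illustration; the point is just that a 16-bit LDA picks up the following byte in the high half, which the AND then has to throw away before the value can be doubled into a jump-table index.

```python
# Rough model of LDA [regPC] / AND #$00FF / ASL with a 16-bit accumulator.
# Function names and sample bytes are hypothetical, for illustration only.

def fetch16(mem: bytes, addr: int) -> int:
    """Little-endian 16-bit read, as LDA does when the accumulator is 16-bit."""
    return mem[addr] | (mem[addr + 1] << 8)

def dispatch_index(mem: bytes, pc: int) -> int:
    word = fetch16(mem, pc)          # opcode in low byte, next byte in high byte
    opcode = word & 0x00FF           # AND #$00FF: discard the following byte
    return (opcode << 1) & 0xFFFF    # ASL: scale for a table of 16-bit addresses

program = bytes([0x2A, 0x99, 0x00])
print(hex(fetch16(program, 0)))         # 0x992a
print(hex(dispatch_index(program, 0)))  # 0x54
```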
I found it useful to keep the accumulator/memory width (M flag) at 16 bits and the index registers at 8 bits for PLASMA. My bytecode fetch/dispatch is:
Code:
INY ; NEXTOP @ $F0
LDX $FFFF,Y ; FETCHOP @ $F1, IP MAPS OVER $FFFF @ $F2
JMP (OPTBL,X) ; OPIDX AND OPPAGE MAP OVER OPTBL
It runs from zero page and is self-modifying: the IP lives in the operand of the LDX at $F2. Y is the interpreter's instruction pointer (IP) offset. The IP+Y value gets renormalized during certain instructions' execution to keep Y from overflowing. Also, the bytecodes are even, so there is no need to shift the value, but it does limit the number of bytecodes to 128 with an 8-bit X register.
On occasion I do have to switch A to 8-bit mode, but at least it isn't on every bytecode fetch. With a little creativity, mixing accumulator/memory and index widths can limit the amount of width-flag thrashing.
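A toy Python model of this base-plus-Y scheme might look like the following. It's only a sketch: the three opcodes, the handlers and the renormalize-after-operand policy are my own hypothetical choices, not PLASMA's actual instruction set. It shows the INY / fetch / table-jump loop, even opcodes used directly as byte offsets into a table of two-byte entries, and Y being folded back into the base so it can't wrap.

```python
# Toy model of an IP = base + Y dispatch loop with even opcodes.
# Opcodes, handlers and the renormalization policy are hypothetical.

def run(code: bytes) -> list[int]:
    base = -1   # IP base; -1 because INY runs before the first fetch
    y = 0       # 8-bit offset register (Y)
    stack: list[int] = []

    def renormalize() -> None:
        # Fold Y into the base pointer so Y cannot wrap past $FF.
        nonlocal base, y
        base += y
        y = 0

    def op_halt() -> bool:
        return False

    def op_push() -> bool:
        # Consume a one-byte inline operand, then renormalize.
        nonlocal y
        y = (y + 1) & 0xFF
        stack.append(code[base + y])
        renormalize()
        return True

    def op_add() -> bool:
        b, a = stack.pop(), stack.pop()
        stack.append((a + b) & 0xFFFF)
        return True

    # Opcodes are even, so each one is already the byte offset X into a
    # table of 2-byte handler addresses: no ASL needed. An 8-bit X reaches
    # offsets $00-$FE, i.e. at most 128 entries.
    table = {0x00: op_halt, 0x02: op_push, 0x04: op_add}

    running = True
    while running:
        # A real interpreter must renormalize before Y wraps; this toy
        # relies on short programs and on op_push renormalizing.
        y = (y + 1) & 0xFF          # INY
        x = code[base + y]          # LDX base,Y
        running = table[x]()        # JMP (OPTBL,X)
    return stack

print(run(bytes([0x02, 5, 0x02, 7, 0x04, 0x00])))  # [12]
```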
That's nice and simple. Hm.
I have a full complement of 256 opcodes to deal with... I did try to work through scenarios where I could use an index register as the program counter, but didn't come up with anything useful. The same went for keeping the fetch code in zero page and self-modifying the "PC"; besides, it was handy to have the VM's PC kept as a 32-bit register in zero page for various arithmetic operations, like jumps.
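As a rough illustration of why a 32-bit PC held as two 16-bit words in zero page is convenient, here's a Python sketch of the increment used in the dispatcher shown next (INC the low word, carry into the high word only when it wraps) and of adding a jump offset with a carry between the halves. Treating jumps as a signed 32-bit offset added to the PC is my assumption about the VM, not something stated in the thread.

```python
# PC held as two 16-bit words, like regPC+0 (low) and regPC+2 (high).
# The signed-offset jump model is an assumption for illustration.
MASK16 = 0xFFFF

def inc_pc(lo: int, hi: int) -> tuple[int, int]:
    """INC regPC+0; BEQ -> INC regPC+2: carry into the high word only on wrap."""
    lo = (lo + 1) & MASK16
    if lo == 0:
        hi = (hi + 1) & MASK16
    return lo, hi

def add_offset(lo: int, hi: int, offset: int) -> tuple[int, int]:
    """Roughly CLC, then ADC on the low word and ADC on the high word (16-bit A)."""
    off = offset & 0xFFFFFFFF           # two's-complement 32-bit offset
    total = lo + (off & MASK16)
    carry = total >> 16                 # carry out of the low-word add
    lo = total & MASK16
    hi = (hi + (off >> 16) + carry) & MASK16
    return lo, hi

print(inc_pc(0xFFFF, 0x0001))           # (0, 2)
print(add_offset(0xFFF0, 0x0001, 0x20)) # (16, 2)
```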
This is the current dispatcher:
Code:
; Loop of the interpreter
lda [regPC] ; Load 16-bit value (7)
and #$00FF ; We only want 8-bits... (3)
asl ; Double for indexing in 16-bit wide jump table (2)
tax ; (2)
; Increment the PC
inc regPC+0 ; Low word (7)
beq incH1 ; 2 cycles + 1 when branch taken (2)
jmp (opcodesLo,x) ; (6) = 29
incH1: inc regPC+2 ; High word (7)
jmp (opcodesLo,x) ; (6) = 37
This is a macro that's inlined into every opcode handler the VM has. It costs space, but the cycles it saves are worth it. So that's 29 cycles of overhead (or, rarely, 37) for every VM opcode executed.
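Putting the annotated cycle counts into a few lines of Python (a back-of-the-envelope check, not a simulator) confirms the 29/37 totals and shows how little the slow path matters on average. The 1-in-65536 figure assumes straight-line execution, since a VM jump can land anywhere relative to the low-word wrap.

```python
# Cycle counts as annotated in the dispatcher (native mode, 16-bit A).
COMMON = {"LDA [regPC]": 7, "AND #$00FF": 3, "ASL": 2, "TAX": 2,
          "INC regPC+0": 7, "JMP (opcodesLo,X)": 6}

fast = sum(COMMON.values()) + 2        # BEQ not taken: 2 cycles
slow = sum(COMMON.values()) + 3 + 7    # BEQ taken (3) + INC regPC+2 (7)

# In straight-line code the low word wraps once every 65536 increments,
# so the slow path costs 8 extra cycles on 1 opcode in 65536.
average = (fast * 65535 + slow) / 65536

# Share of the fast path spent masking off the unwanted high byte.
and_share = COMMON["AND #$00FF"] / fast

print(fast, slow, f"{average:.4f}")    # 29 37 29.0001
```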
Cheers,
-Gordon
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here:
https://projects.drogon.net/ruby/