Alright, here's an implementation.
http://pastebin.com/Stjdh9ip (edit: uncommented link at the bottom of the post)
I included that as test.asm in the new distribution, which you can download and which also includes other changes and additions. Run make, then make test, and then run bin/acheron.prg in VICE to watch it print "12345" at the top of the screen.
Now, I had been putting off implementing a number of instructions that don't pack an even pair of 4-bit parameters into each byte. Things like "add rD, imm8" and "stpmb rD" (STore Prior value as Memory Byte at rD) leave a parameter byte only half-used. However, these decode quickly (the parameters are pre-ASL'd in the byte for alignment) and cover a number of useful instructions, so I started putting some of those in. Remember, just adding an implementation,
Code:
OP stpmb, rd, regs, "memory(rD) := low byte of rP"
    lda 0,x         ; read rP's value first
    sta zptemp
    decode_rd       ; no shifting, just adding an offset and updating rP
    lda zptemp
    sta (0,x)       ; store into (rD), which is the new rP
    jmp mainLoop2c
anywhere in the VM source automatically sets everything up, including adding the instruction to the documentation.
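To illustrate why a half-used parameter byte can decode quickly, here is a rough Python model. It is only a sketch under assumptions: that registers are 16 bits wide and stored as consecutive byte pairs, so register n sits at byte offset 2*n, and that the assembler stores that doubled index directly in the operand byte. The function names are made up for illustration and are not Acheron's.

```python
# Hypothetical model, not Acheron source: compare decoding a packed
# register pair against a single pre-shifted register operand.

REG_SIZE = 2  # assumed: each register is 16 bits wide


def decode_packed_pair(operand):
    """Two 4-bit register numbers in one byte; both need masking/shifting."""
    hi = (operand >> 4) & 0x0F
    lo = operand & 0x0F
    return hi * REG_SIZE, lo * REG_SIZE  # byte offsets into the register file


def encode_preshifted(reg):
    """Assembler side (assumed): emit the register number already doubled."""
    assert 0 <= reg < 16
    return reg << 1


def decode_preshifted(operand):
    """Runtime side: the operand already is the byte offset, no shift needed."""
    return operand


print(decode_preshifted(encode_preshifted(5)))  # 10
print(decode_packed_pair(0x53))                 # (10, 6)
```

The point is only that the single-register form moves the shift to assembly time, which would match the "no shifting, just adding an offset" comment on decode_rd.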
That's right, but you only have a limited number of registers you can access at one time. If you need more you run into trouble...
But how different is that from a stack? You always need to pull variables to a stack's head, while you only need to reload temp work registers when registers are scarce; they stay put otherwise. Loops make this quite apparent, where every iteration needs to restore the same pattern of information for stack processing, while registers can simply hold the iterated state.
I also see more value in holding onto register values in longer functions than in shorter ones, because longer functions tend to make more references to the same local values.
Parameters are always an issue with copy overhead, though. For any calling convention, you need to put data in the right place, which is always going to end up shuffling something somewhere, unless the functions are single-use, tailored to directly access where the caller already has its data.
That's an interesting idea, which I'd like to see used in a longer code example.
While the pastebin uses pre-decrement during string rendering, here's a post-decrement sample that shows rP in use (remainder and buf are just names for registers in the outputString function):
Code:
addi remainder, '0'   ; convert to ASCII, 'remainder' becomes the new rP
stpmb buf             ; store rP into (buf), buf becomes the new rP
subp 1                ; decrement rP, which is buf now
There is a certain elegance here that I like: stack-like brevity with a register bank, and no redundant register decoding. I am hopeful about the speed implications, and am aware that finding how best to chain rP through the instructions will take some time to get right.
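For anyone who wants to trace the three-instruction sample by hand, here is a small Python model of how rP moves from one instruction to the next. It is a sketch rather than the VM's actual implementation: the Machine class, its field names, and the starting values are all made up for illustration, and only the behaviour described in the comments above is modelled.

```python
# Hypothetical model of rP chaining; class and method names are illustrative.

class Machine:
    def __init__(self):
        self.regs = [0] * 16           # 16 general-purpose 16-bit registers
        self.memory = bytearray(65536)
        self.rp = 0                    # index of the "prior" register

    def addi(self, rd, imm8):
        """Add an immediate to rD; rD becomes the new rP."""
        self.regs[rd] = (self.regs[rd] + imm8) & 0xFFFF
        self.rp = rd

    def stpmb(self, rd):
        """Store the low byte of rP at the address held in rD; rD becomes rP."""
        value = self.regs[self.rp] & 0xFF   # read rP before it changes
        self.memory[self.regs[rd]] = value
        self.rp = rd

    def subp(self, imm):
        """Subtract from rP itself; no register operand is decoded."""
        self.regs[self.rp] = (self.regs[self.rp] - imm) & 0xFFFF


REMAINDER, BUF = 0, 1          # arbitrary register assignments
m = Machine()
m.regs[REMAINDER] = 5          # a digit value to render
m.regs[BUF] = 0x0404           # arbitrary buffer address

m.addi(REMAINDER, ord('0'))    # remainder is now '5', and is rP
m.stpmb(BUF)                   # memory[buf] = '5', buf is now rP
m.subp(1)                      # buf steps back one byte

print(chr(m.memory[0x0404]), hex(m.regs[BUF]))  # 5 0x403
```

Only addi names a source register explicitly; the other two instructions pick up their operand from whatever the previous instruction touched.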
Besides the 16 general purpose registers you have the rather invisible instruction pointer (iptr), the global pointer (gptr) and a return stack using the stack pointer of the 6502, which is inaccessible to the Acheron. Correct?
Yes. However, iptr, gptr, and all the other implementation zpvars are automatically exported as assembler labels. Therefore, their addresses are still syntactically accessible, e.g. "set r3, gptr" and dereferencing from there.
Let's start with the question: which registers do you use for the parameters?
As a general calling convention, parameters start from r0 from the caller's perspective, then the called function grows the register stack by however many local variables it needs. convertToString does a 'grow 3' to gain a fresh r0, r1, r2 to use, while r3+ are views of r0+ from the caller, which it can still directly read. The function body can write into r3+ to set return values and modify the caller's r0+ values as side effects. It ends with 'rets 3' to return and un-grow the 3 regs away to realign the caller's r0.
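The sliding view of registers can be modelled in a few lines of Python. This is a sketch under assumptions: the RegisterStack class, the downward growth direction, the stack size, and the double_param function are all invented for illustration, and the real 'rets' also returns control to the caller, which is not modelled here.

```python
# Hypothetical model of 'grow' / 'rets' as a sliding register window.

class RegisterStack:
    def __init__(self, size=64):
        self.cells = [0] * size
        self.base = size - 16      # index of the current r0 (assumed layout)

    def r(self, n):
        """Read register rN relative to the current window."""
        return self.cells[self.base + n]

    def set(self, n, value):
        """Write register rN relative to the current window."""
        self.cells[self.base + n] = value & 0xFFFF

    def grow(self, n):
        """Gain n fresh registers; the caller's r0 becomes rN."""
        if self.base - n < 0:
            raise OverflowError("register stack exhausted")
        self.base -= n

    def rets(self, n):
        """Un-grow n registers so the caller's r0 lines up again."""
        self.base += n


def double_param(rs):
    rs.grow(3)                     # fresh r0..r2; caller's r0 is now r3
    rs.set(0, rs.r(3))             # copy the parameter into a local
    rs.set(1, rs.r(0) + rs.r(0))   # work in the locals
    rs.set(3, rs.r(1))             # write the result into the caller's r0
    rs.rets(3)


rs = RegisterStack()
rs.set(0, 21)                      # caller places the parameter in r0
double_param(rs)
print(rs.r(0))                     # 42
```

No parameter copy happens at the call itself; the callee simply sees the caller's r0 under a different number.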
Note that this exercise does not call for any return values, and its complexity is pretty low, so some of the things I designed Acheron for, because I'd had issues with them, aren't really being put on display. However, I do need to get better debugging tools before going hardcore with complexity.
edit: Here's a plain version of the functions without any comments, to show the density:
http://pastebin.com/pLQKx3hL