GARTHWILSON wrote:
2+loop, the internal compiled by 2+LOOP, has 59 instructions by my count. It would be fun to see what that comes down to.
Okay, one more code fragment teaser for Garth, then I have some serious work to do on my documentation and CPUSim package. Thanks to teamtempest for kindly sharing his 16-bit CPUSim core with me ... I think that he probably saved me several hours of tinkering and cussing.
Code:
loop:
:bd0c0000 inc ,s ; increment index on rTOS
checkend:
:410c0000 lda ,s
:c10c0001 cmp 1,s ; compare incremented index to limit
:5c5e000a bcs quitloop ; if index >= limit, loop is complete
contloop: ; else loop around:
:4d060000 ldy ,y ; load IP with loop back address
:5e060000 jmp (,y+) ; and proceed
+loop:
:410c0001 lda 1,s
:150c0000 add ,s ; update the index by the incrementer.
:810c0001 sta 1,s
:420c0000 lda ,s+ ; pull incrementer and check sign
:5caffff6 bpl checkend ; if + then do a forward compare
:410c0001 lda 1,s ; else
:c10c0000 cmp ,s ; do a reverse compare (limit >= index)
:5c4ffff6 bcc contloop ; for end-of-loop check
quitloop:
:9108???? stz LOOP_LEAVE ; show that LEAVE was not used to exit
:4d0c0002 ldy 2,s ; load IP with end addr from the rstack
:580c0003 lds #3,s ; discard index, limit, and end address
:5e060000 jmp (,y+) ; from rstack and proceed.
This isn't really a fair fight, but it looks like 18 words of code on my 65m32 are doing the same job as about 231 bytes of code on the 65c02. It's not fair because the 65c02 is out of its native 8-bit element here, and is bogging down on multi-byte stack items. For a more accurate efficiency comparison, it might be more helpful to look at the 65c816's implementation of a 16-bit loop/+loop combo, since that would level the playing field regarding native word size, and concentrate more on how efficient my operand scheme is, and whether or not there is room for improvement. Of course, I'm its proud father, so I'm somewhat biased ...
Mike
P.S. Regarding a normalized 65c816 vs. 65m32 code efficiency comparison: The 'm32 has only its unique operand structure (with embedded constants), while the '816 has the advantage of being able to pack up to two complete instructions in 16 bits, plus its indirect addressing modes, which are missing from the 'm32. The '816 beats the 'm32 for fetch (@) when judging strictly by normalized code density ... how do you think it will pan out for something more complex?
P.P.S. Perhaps it would be an interesting exercise for me to try leveling the playing field the other way, by coding a 128-bit version for the 'm32? Nah, that's not a good use of my spare time ... I need to get my specs ready for a new thread A.S.A.P.
P.P.P.S. As I was drifting off to sleep last night, it occurred to me that there could be a minor bug in the above fragment; I believe that it handles negative increments properly, but will probably choke on a negative index and/or limit. Those of you with 6502 experience should be able to see it, so I'll leave it in for now ... there's no danger in it, since it has no host on which to run ... yet!
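For anyone who wants to poke at that suspected bug without a host, here is a rough Python model of the end-of-loop test. It is only a sketch under a few assumptions that are not spelled out in the fragment: cells are 32 bits wide, cmp followed by bcs/bcc behaves as an unsigned compare (as on the 6502), and the names plus_loop_step, to_signed and MASK are made up for illustration. The signed=True path is one possible fix, not necessarily the one that will end up in the real code.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit cells


def to_signed(x):
    """Interpret a 32-bit cell as a two's-complement signed value."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def plus_loop_step(index, limit, step, signed=False):
    """Model one pass through +loop: bump the index, then test for the end.

    signed=False mimics the carry-based (unsigned) compare in the fragment;
    signed=True shows what a signed compare would decide instead.
    Returns (new_index_cell, done).
    """
    index = (index + step) & MASK
    limit &= MASK
    if signed:
        a, b = to_signed(index), to_signed(limit)
    else:
        a, b = index, limit
    if to_signed(step) >= 0:
        done = a >= b   # forward compare: index >= limit
    else:
        done = b >= a   # reverse compare: limit >= index
    return index, done


# A loop running from -3 up to 2 in steps of 1:
print(plus_loop_step(-3, 2, 1, signed=False))
print(plus_loop_step(-3, 2, 1, signed=True))
```

With the unsigned model the negative index looks like a huge positive number, so the loop reports "done" on its first pass, while the signed model keeps going. For non-negative index and limit the two models agree.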
[EDIT 2013.10.07]: jmp (,y+) is not a proper ITC NEXT, since the cell at IP holds a CFA, not the address of the actual machine code, so one more level of indirection is needed.
It needs to be replaced with:
Code:
NEXT (2 instructions, 2 machine words, 5 cycles)
ldu ,y+ ; W = (IP) , IP += 1
jmp (,u+) ; execute code @ (W) , W += 1
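To make the double indirection concrete, here is a small Python model of that two-instruction NEXT. It is a sketch only: memory is a plain list of cells, "machine code" addresses are looked up in a dictionary of Python functions, and every name in it (next_step, run, MEM, CODE_TABLE, prim_inc, prim_halt, the addresses 100 and 101) is hypothetical and chosen just for this illustration.

```python
def next_step(mem, ip):
    """Model of NEXT: returns (code_address, W, IP) after one dispatch."""
    w = mem[ip]      # ldu ,y+    : W = (IP) ...
    ip += 1          #            : ... IP += 1
    code = mem[w]    # jmp (,u+)  : fetch the code address stored at (W) ...
    w += 1           #            : ... W += 1, now just past the code field
    return code, w, ip


def prim_inc(state):
    state["count"] += 1


def prim_halt(state):
    state["running"] = False


# Cells 0 and 1 are code fields (CFAs) holding code addresses 100 and 101.
# Cells 2..4 are the thread: a list of CFAs, here INC INC HALT.
MEM = [100, 101, 0, 0, 1]
CODE_TABLE = {100: prim_inc, 101: prim_halt}
THREAD_START = 2


def run(mem, ip, code_table):
    """Dispatch words through next_step until a primitive stops the run."""
    state = {"count": 0, "running": True}
    while state["running"]:
        code, w, ip = next_step(mem, ip)
        code_table[code](state)
    return state["count"]


print(run(MEM, THREAD_START, CODE_TABLE))  # prints 2
```

In this model a single-indirection jmp (,y+) would take MEM[THREAD_START], which is the CFA (0), as its target rather than the code address (100) stored in that code field, which is the problem the edit describes.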
_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some
VTL02C on it and see how it grows on you!
Mike B.