http://calc6502.com/RobotGame/summary.html#Results
Scroll down just a little bit to see the table of results.
One of the things I noticed is that many of the slower items (<10% the speed of assembly) had nested do loops in them, and I was curious what the penalty/overhead is for using 16-bit loop variables for small counts (e.g. stuff you'd count in X or Y in assembly). Fortunately, because Forth is awesome, I can scratch that curiosity itch and discover that the answer is "not as much as I was expecting, but more than double", which totally makes sense to me for using a 16-bit index vs an 8-bit one.
Leepivonka had provided some "replacement" words for do loops, so that made me want to try that myself. Rather than replace Tali2's do-loop words, these words are meant to complement them. They use the Y register the way one might in assembly to count down to zero, so only the starting count is given, and it's limited to 8 bits (but you can get 256 loops by specifying 0 as the starting value). Because these words use the Y register, I just made word names with a "Y" prefix. After writing this, I realized that it doesn't really matter which register is used, and A could have been used just as well (and sometimes is used, because Tali2 has a PUSH-A macro to get A on the top of the Forth data stack).
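To make the counting behavior concrete, here is a rough model of it in Python (not Tali2 code, just a sketch of the control flow described above). It assumes the body runs first, the 8-bit index is then decremented with wraparound, and the loop exits once the index reaches zero. The function name `yloop_indices` is made up for illustration.

```python
def yloop_indices(start):
    """Return the index values the loop body would see for an 8-bit start count."""
    y = start & 0xFF          # the index lives in an 8-bit register
    seen = []
    while True:
        seen.append(y)        # the loop body runs with the current index
        y = (y - 1) & 0xFF    # 8-bit decrement wraps from 0 to 255
        if y == 0:            # exit once the index has counted down to zero
            break
    return seen

print(yloop_indices(3))        # indices for a start count of 3
print(len(yloop_indices(0)))   # number of iterations for a start count of 0
```

Under these assumptions a start count of 3 yields the indices 3, 2, 1, and a start count of 0 wraps around and runs the body 256 times (with 0 as the first index seen).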
For those that are interested, here is my code and the results I got (using the cycle counting tests available in the simulated Tali2 test suite).
Code:
\ PROGRAMMER  : Sam Colwell
\ FILE        : yloop.fs
\ DATE        : 2021-04
\ DESCRIPTION : For small loops of 256 iterations or fewer, these words
\ use the Y register in countdown mode only (e.g. load with an 8-bit
\ starting count and it will always count down to zero), which is
\ generally useful for code that needs to run a number of times but
\ doesn't need to control the direction of the counting (in exchange
\ for speed).

\ Add the assembler.
assembler-wordlist >order

: YDO ( C: -- addr) ( R: n --) ( Start loop that will run n times )
    0 POSTPONE LDY.X            \ Load Y with the starting value.
    POSTPONE INX POSTPONE INX   \ Remove value from stack.
    HERE            \ Save location to loop back to (leave on stack)
    POSTPONE PHY    \ Save current index to return stack.
; IMMEDIATE COMPILE-ONLY

: YLOOP ( C: addr -- ) ( R: -- ) ( Loop back up to start if y nonzero)
    POSTPONE PLY    \ Get current index from return stack.
    POSTPONE DEY    \ Count this iteration.
    3 POSTPONE BEQ  \ If we reached zero, continue on (branch over jump)
    POSTPONE JMP    \ otherwise jump to the top of the loop.
                    \ The jump pulls its address from the stack.
                    \ It was placed there by the ydo word.
; IMMEDIATE COMPILE-ONLY

: YI ( R: -- n ) ( Put the current loop index on the Forth stack )
    \ The current index is on the return stack.
    POSTPONE PLA    \ Get the index into A and put it on the Forth stack.
    POSTPONE PHA
    POSTPONE PUSH-A \ This is a macro in Tali to put the value in A on TOS.
; IMMEDIATE COMPILE-ONLY

: YJ ( R: -- n ) ( Put the outer loop index on the Forth stack )
    \ The index from the outer loop is the SECOND byte on the return stack.
    POSTPONE PLY    \ Pull the one we don't want into Y
    POSTPONE PLA    \ Pull the one we do want into A
    POSTPONE PHA    \ Put both of them back (in the right order)
    POSTPONE PHY
    POSTPONE PUSH-A \ Put the J index on the Forth stack.
; IMMEDIATE COMPILE-ONLY
Next is a basic functionality test:
Code:
\ Try it out.
: testing
    5 ydo
        3 ydo
            cr ." yi=" yi . ." yj=" yj .
        yloop
    yloop
;
\ Results:
testing
yi=3 yj=5
yi=2 yj=5
yi=1 yj=5
yi=3 yj=4
yi=2 yj=4
yi=1 yj=4
yi=3 yj=3
yi=2 yj=3
yi=1 yj=3
yi=3 yj=2
yi=2 yj=2
yi=1 yj=2
yi=3 yj=1
yi=2 yj=1
yi=1 yj=1 ok
Finally, the cycle counting tests (the cycle_test word takes the XT of a word and prints the number of CPU cycles it took to run):
Code:
\ Cycle-Test RESULTS:
decimal
: testingy 255 ydo yloop ;
: testingdo 255 0 do loop ;
: testingyi 255 ydo yi drop yloop ;
: testingdoi 255 0 do i drop loop ;
: testingyy 255 ydo 255 ydo yloop yloop ;
: testingdodo 255 0 do 255 0 do loop loop ;
: testingyyij 255 ydo 255 ydo yi yj 2drop yloop yloop ;
: testingdodoij 255 0 do 255 0 do i j 2drop loop loop ;
' testingy cycle_test CYCLES: 3660 ok
' testingdo cycle_test CYCLES: 16775 ok
' testingyi cycle_test CYCLES: 13605 ok
' testingdoi cycle_test CYCLES: 32075 ok
' testingyy cycle_test CYCLES: 933900 ok
' testingdodo cycle_test CYCLES: 4291340 ok
' testingyyij cycle_test CYCLES: 5420625 ok
' testingdodoij cycle_test CYCLES: 11053940 ok
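For anyone who wants the ratios rather than the raw numbers, here is a small Python sketch that divides each do/loop cycle count by its ydo/yloop counterpart. The counts are copied from the cycle_test output above; the labels and the `speedup` helper are just names I made up for this illustration.

```python
# Cycle counts copied from the cycle_test results: (ydo/yloop, do/loop).
cycles = {
    "empty loop":          (3660, 16775),
    "loop with index":     (13605, 32075),
    "nested empty loops":  (933900, 4291340),
    "nested with i and j": (5420625, 11053940),
}

def speedup(y_cycles, do_cycles):
    """How many times more cycles the 16-bit do/loop version takes."""
    return do_cycles / y_cycles

for name, (y_cycles, do_cycles) in cycles.items():
    ratio = speedup(y_cycles, do_cycles)
    print(f"{name}: do/loop takes {ratio:.2f}x the cycles of ydo/yloop")
```

The empty loops come out at roughly 4.6x, and the loops that fetch their indices land between about 2.0x and 2.4x, which is where the "more than double" figure comes from.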
I very much enjoy the way these new looping words are just as "valid" as the words that Tali2 comes with, and I could use them in places where I needed to squeeze some extra cycles out of a routine.