Here is a modified version of the benchmark code.
It now includes a checksum to help verify that the processing is correct.
Code:
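0 variable ChkSum   ( FIG-style initial value; ANS systems just leave the 0 on the stack )

: ggd ( a b -- ggd )
   begin
      dup
   while
      swap over mod      ( a b -- b a-mod-b )
      dup ChkSum +!      ( add each remainder to the checksum )
   repeat
   drop
;

: benchmark ( n -- )
   0 ChkSum !
   dup 0 do
      dup 0 do
         j i ggd drop    ( every pair j,i in 0..n-1 )
      loop
   loop
   drop
   ChkSum @ .
;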
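To see what the checksum accumulates, here is a small worked call. This is only a sketch for an ANS-style system such as Gforth, so it declares the variable without the FIG-style initial value; the definition of ggd is repeated so the snippet stands on its own.

```forth
\ Sketch for an ANS-style Forth (e.g. Gforth); VARIABLE takes no initial value here.
variable ChkSum

: ggd ( a b -- ggd )
   begin dup while
      swap over mod      \ ( a b -- b a-mod-b )
      dup ChkSum +!      \ accumulate each remainder
   repeat
   drop ;

\ Trace of 12 18 ggd:
\   12 18 -> 18 12   remainder 12
\   18 12 -> 12  6   remainder  6
\   12  6 ->  6  0   remainder  0
\ The result is 6 and the remainders sum to 18.
0 ChkSum !  12 18 ggd .  ChkSum @ .    \ prints 6 18
```

Because every remainder of every step is added, a wrong MOD result anywhere in the run changes the final sum.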
I've fixed my cc@ so that it returns a 32-bit cycle count, which lets me time longer runs.
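The timing line used in the results, cc@ 200 benchmark cc@ d- d. , subtracts the end count from the start count, which is why the cycle counts print as negative. On a hosted Forth with no cycle counter, a similar measurement could be sketched as below. This swaps in Gforth's utime (a double-cell microsecond clock) for cc@, and the word name timed is made up for this illustration.

```forth
\ Sketch for Gforth only: utime ( -- d ) returns microseconds as a double.
\ TIMED is a hypothetical helper name, not part of the original benchmark.
: timed ( ... xt -- ... )
   utime 2>r             \ save the start time on the return stack
   execute               \ run the word under test
   utime 2r> d-          \ end minus start, so the result is positive
   cr ." elapsed us: " d. ;
```

With the benchmark loaded it would be called as 200 ' benchmark timed .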
Here are my results for various FORTHs:
FIG modified 8bit Indirect-Threaded -------------------
104 bytes
cc@ 200 benchmark cc@ d- d. 748 -1023891113 OK 1024 sec @ 1MHz
3000000003. 50003 u/ . . -5540 20015 OK
FIG modified 8bit Subroutine-Threaded --------------------
133 bytes
cc@ 200 benchmark cc@ d- d. 748 -382665927 OK 383 sec @ 1MHz
3000000003. 50003 um/mod . . -5540 20015 OK
FIG modified 16bit Indirect-Threaded -------------------
104 bytes
cc@ 200 benchmark cc@ d- d. 748 -485138841 OK 485 sec @ 1MHz
3000000003. 50003 u/ . . -5540 20015 OK
FIG modified 16bit Subroutine-Threaded -------------------
136 bytes
cc@ 200 benchmark cc@ d- d. 748 -192266443 OK 192 sec @ 1MHz
3000000003. 50003 u/ . . -5540 20015 OK
65816F 16bit Subroutine-Threaded ------------------
140 bytes
cc@ 200 benchmark cc@ d- d. 748 -117452133 ok 117 sec @ 1MHz
3000000003. 50003 um/mod . . -5540 20015 ok
Tali 8bit Subroutine-Threaded ---------------------
410 bytes
cc@ 200 benchmark cc@ d- d. 748 -369954558 ok 370 sec @ 1MHz
3000000003. 50003 um/mod . . -5540 20015 ok
GForth-fast on x64 --------------------
200 benchmark 748 ok
3000000003. 50003 um/mod . . 59996 20015 ok
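The quotient prints as -5540 on the 16-bit systems and as 59996 under GForth because . prints a signed cell: 59996 and -5540 are the same 16-bit pattern (59996 - 65536 = -5540). The remainder 20015 agrees everywhere. A quick check, assuming a Forth with cells wider than 16 bits such as Gforth on x64:

```forth
\ Assumes cells wider than 16 bits (e.g. Gforth on x64).
-5540 65535 and u.                 \ low 16 bits of -5540, unsigned: prints 59996
59996 50003 um* 20015 0 d+ d.      \ quotient * divisor + remainder: prints 3000000003
```

On the 16-bit systems, using u. instead of . for the quotient should show 59996 directly.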