Another 65816 STC Forth

leepivonka · Post by **leepivonka** » Wed Aug 26, 2020 2:59 am

Here is a console log from loading "Robot game" by druzyek (viewtopic.php?f=1&t=6054).

So far it's down to 32 KB in bank 0 (including 9 KB of FORTH headers), plus more data in the "tile" bank and screen data in the "screen" bank.

Partially ported & completely untested so far.

leepivonka · Post by **leepivonka** » Wed Jul 07, 2021 4:06 pm

Playing with the simple benchmark in viewtopic.php?f=9&t=6637

Command on each system: cc@ 200 benchmark cc@ d- d.
(benchmark prints 748; d. then prints the elapsed cycle count, negated because d- computes start minus end.)

Implementation                                         Result  Elapsed cycles  Time @ 1 MHz
FIG modified, 8-bit indirect-threaded                  748      1,023,891,113  1024 sec
FIG modified, 8-bit subroutine-threaded                748        382,665,927   383 sec
FIG modified, 16-bit indirect-threaded                 748        485,138,841   485 sec
FIG modified, 16-bit subroutine-threaded               748        192,266,443   192 sec
65816F 16-bit subroutine-threaded, inlined, optimized  748        117,452,133   117 sec
Tali 8-bit subroutine-threaded, inlined                748        353,588,076   354 sec
GForth x64                                             748        (not timed)
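
To compare the runs directly, here is a minimal Python sketch that converts the cycle counts above into seconds at 1 MHz and into a speed-up factor against the slowest build (FIG 8-bit ITC). The dictionary labels are my own shorthand; the numbers are copied from the results above.

```python
# Elapsed cycle counts copied from the "cc@ 200 benchmark cc@ d- d." runs.
# The sign is dropped: d- of (start - end) prints the count negated.
CYCLES = {
    "FIG 8-bit ITC": 1_023_891_113,
    "FIG 8-bit STC": 382_665_927,
    "FIG 16-bit ITC": 485_138_841,
    "FIG 16-bit STC": 192_266_443,
    "65816F 16-bit STC inlined": 117_452_133,
    "Tali 8-bit STC inlined": 353_588_076,
}

CLOCK_HZ = 1_000_000  # the timings are quoted for a 1 MHz clock


def seconds(cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Wall-clock time of a run on a CPU clocked at clock_hz."""
    return cycles / clock_hz


def speedups(cycles: dict[str, int], baseline: str) -> dict[str, float]:
    """How many times faster each build is than the baseline build."""
    base = cycles[baseline]
    return {name: base / c for name, c in cycles.items()}


if __name__ == "__main__":
    ranked = sorted(speedups(CYCLES, "FIG 8-bit ITC").items(), key=lambda kv: kv[1])
    for name, factor in ranked:
        print(f"{name:26s} {seconds(CYCLES[name]):7.1f} s  {factor:4.2f}x")
```

On these numbers the 16-bit FIG ITC build is about 2.1x faster than the 8-bit one, switching 16-bit FIG from ITC to STC gains another 2.5x, and 65816F's inlining is about 1.6x faster than 16-bit FIG STC (roughly 8.7x the 8-bit ITC build).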

Comparing generated code:

: ggd ( a b -- ggd )            
  ----source-----       ----65816F STC--------  ----FIG STC-------      -----FIG ITC--------------
                                                                        216b .word DoCol
   begin
     dup                04C8 LDA 00,x           1e27 jsr Dup            216d .word Dup
   while                04CA TAY                1e2a jsr TestTos        216f .word ZBranch,$2183-*
                        04CB BNE 04D0           1e2d bne *+5
                        04CD JMP 04E3           1e2f jmp $1e47
     swap               04D0 JSR Swap+000C      1e32 jsr Swap           2173 .word Swap
     over               04D3 LDA 02,x           1e35 jsr Over           2175 .word Over
     mod                04D5 JSR Mod+0004       1e38 jsr Mod            2177 .word Mod
     dup                04D8 LDA 00,x           1e3b jsr Dup            2179 .word Dup
     ChkSum +!          04DA CLC                1e3e jsr ChkSum         217b .word ChkSum
                        04DB ADC ChkSum+0003    1e41 jsr PlusStore      217d .word PlusStore
                        04DE STA ChkSum+0003
   repeat               04E1 BRA 04C8           1e44 jmp $1e27          217f .word Branch,$216d-*
   drop                 04E3 INX                1e47 jsr Drop           2183 .word Drop
                        04E4 INX
  ;                     04E5 RTS                1e4a rts                2185 .word SemiS
                        -- 30 bytes --          -- 36 bytes --          -- 28 bytes --

: benchmark ( n -- )
  ----source-----       ----65816F STC--------  ----FIG STC--------     -----FIG ITC--------------
                                                                        2193 .word DoCol
  0                     04F2 LDA #0000          1e57 jsr Zero           2195 .word Zero
  ChkSum                                        1e5a jsr ChkSum         2197 .word ChkSum
  !                     04F5 STA ChkSum+0003    1e5d jsr Store          2199 .word Store
  dup                   04F8 LDA 00,x           1e60 jsr Dup            219b .word Dup
                        04FA TAY
  0                     04FB LDA #0000          1e63 jsr Zero           219d .word Zero
  do                    04FE PHY                1e66 jsr PDo            219f .word PDo
                        04FF PHA                        
    dup                 0500 LDA 00,x           1e69 jsr Dup            21a1 .word Dup
                        0502 TAY
    0                   0503 LDA #0000          1e6c jsr Zero           21a3 .word Zero
    do                  0506 PHY                1e6f jsr PDo            21a5 .word PDo
                        0507 PHA
      j                 0508 LDA 05,s           1e72 jsr J              21a7 .word j
                        050A DEX
                        050B DEX
                        050C STA 00,x
      i                 050E LDA 01,s           1e75 jsr I              21a9 .word i
                        0510 DEX
                        0511 DEX
                        0512 STA 00,x
      ggd               0514 JSR ggd            1e78 jsr ggd            21ab .word ggd
      drop              0517 INX                1e7b jsr Drop           21ad .word Drop
                        0518 INX
     loop               0519 PLA                1e7e jsr PLoop          21af .word PLoop,$21a7-*
                        051A INA
                        051B CMP 01,s
                        051D BNE 0507           1e81 bcc $1e72
                        051F PLY
   loop                 0520 PLA                1e83 jsr PLoop          21b3 .word PLoop,$21a1-*
                        0521 INA
                        0522 CMP 01,s
                        0524 BNE 04FF           1e86 bcc $1e69
                        0526 PLY
  drop                  0527 INX                1e88 jsr Drop           21b7 .word Drop
                        0528 INX
  ChkSum                                        1e8b jsr ChkSum         21b9 .word ChkSum
  @                     0529 LDA ChkSum+0003    1e8e jsr At             21bb .word At
  .                     052C JSR Dot+0004       1e91 jsr Dot            21bd .word Dot
  ;                     052F RTS                1e94 rts                21bf .word SemiS
                        -- 62 bytes --          -- 62 bytes --          -- 46 bytes --
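
As a cross-check on the 748 that every system prints, here is a hypothetical Python model of the two words. A one-element list stands in for the ChkSum variable, and the sum is masked to 16 bits to mimic a 16-bit cell (the same reason the GForth run below needs "$ffff and").

```python
MASK16 = 0xFFFF  # 16-bit cell: "ChkSum +!" wraps modulo 65536


def ggd(a: int, b: int, chk: list[int]) -> int:
    """Model of: begin dup while swap over mod dup ChkSum +! repeat drop"""
    while b != 0:                          # dup while
        a, b = b, a % b                    # swap over mod
        chk[0] = (chk[0] + b) & MASK16     # dup ChkSum +!
    return a                               # drop leaves the gcd


def benchmark(n: int) -> int:
    """Model of the benchmark word; returns what "ChkSum @ ." prints."""
    chk = [0]                              # 0 ChkSum !
    for j in range(n):                     # dup 0 do   (outer index = j)
        for i in range(n):                 # dup 0 do   (inner index = i)
            ggd(j, i, chk)                 # j i ggd drop
    return chk[0]                          # ChkSum @
```

Note that the argument order matters: ggd(j, i) with j < i spends an extra step swapping the two, and that remainder is added to the checksum too.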

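The word CC@ in the following logs returns the simulator's cycle counter as a double, so "cc@ ... cc@ d-" leaves the elapsed cycle count negated (start minus end); that is why every timing is printed as a negative number.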
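
A minimal sketch of that double-cell arithmetic, assuming a 32-bit double made of two 16-bit cells (the helper names are made up for illustration). It also shows why a counter that wraps, or that reads as negative like Tali's "cc@ d. -1219363202" further down, still gives the right difference as long as a run is shorter than 2^31 cycles (about 35.8 minutes at 1 MHz).

```python
DOUBLE_BITS = 32  # a double on a 16-bit Forth is two 16-bit cells


def to_signed(value: int, bits: int = DOUBLE_BITS) -> int:
    """Interpret the low `bits` bits as two's complement, as d. prints them."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def d_minus(d1: int, d2: int) -> int:
    """Forth d- ( d1 d2 -- d1-d2 ) with 32-bit wraparound."""
    return to_signed(d1 - d2)


def cells(d: int) -> tuple[int, int]:
    """Split a double into (low cell, high cell); the high cell is on top of the stack."""
    d &= 0xFFFF_FFFF
    return d & 0xFFFF, d >> 16


def elapsed_seconds(d_diff: int, clock_hz: int = 1_000_000) -> float:
    """Turn the negated count left by "cc@ ... cc@ d-" into seconds."""
    return -d_diff / clock_hz
```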

FIG-like 8-bit Indirect-threaded console log

F:\65816>\65816s\release\65816s fig8\0265sxb.lst
65816S Jun  9 2021 16:02:46
65c265 mode on
A=0000 X=0000 Y=0000 S=0180 EnvMXdIzc D=0000 B=00 EA4C8E14 000300 nop
.g

fig-FORTH 1.1 modified

A=00FE X=00EF Y=0000 S=01F5 ENvMXdIzC D=0000 B=00 58B0F860 000653 cli
.@f_benchggd.txt
.g
( http://forum.6502.org/viewtopic.php?f=9&t=6637 ) OK
 OK
0 variable ChkSum OK
 OK
: ggd ( a b -- ggd )
   begin
     dup
   while
     swap over mod
     dup ChkSum +!
   repeat
   drop
  ; OK
: benchmark ( n -- )
  0 ChkSum !
  dup 0 do
    dup 0 do
      j i ggd drop
      loop
    loop
  drop
  ChkSum @ .
  ; eof
 OK
hex OK
' ChkSum .  2163 OK
' ggd .  216D OK
' benchmark ' OK
here . 21C1 OK
decimal OK

A=00FE X=00E7 Y=0000 S=01F5 ENvMXdIzC D=0000 B=00 58B0F860 000653 cli
.d2150,21c2
2158: 86 43 48 4B 53 55 CD 14 1C                      0 variable ChkSum
2161: 1C 49             .word DoVar
2163: 00 00             .word 0

2165: 83 47 47 C4 58 21 F1                            : ggd
216b: F1 09             .word DoCol
                                                        begin
216d: 3E 09             .word Dup                         dup
216f: 44 04 12 00       .word ZBranch,$2183-*            while
2173: 23 09             .word Swap                        swap
2175: 0B 09             .word Over                        over
2177: 7C 15             .word Mod                         mod
2179: 3E 09             .word Dup                         dup
217b: 61 21             .word ChkSum                      ChkSum
217d: 64 09             .word PlusStore                   +!
217f: 25 04 EC FF       .word Branch,$216d-*             repeat
2183: 1A 09             .word Drop                      drop
2185: B7 07             .word SemiS                     ;

2187: 89 42 45 4E 43 48 4D 41 52 CB 65 21             : benchmark
2193: F1 09             .word DoCol
2195: 75 0A             .word Zero                      0
2197: 61 21             .word ChkSum                    ChkSum
2199: B8 09             .word Store                     !
219b: 3E 09             .word Dup                       dup
219d: 75 0A             .word Zero                      0
219f: C6 04             .word PDo                       do
21a1: 3E 09             .word Dup                         dup
21a3: 75 0A             .word Zero                        0
21a5: C6 04             .word PDo                         do
21a7: E5 04             .word j                             j
21a9: DF 04             .word i                             i
21ab: 6B 21             .word ggd                           ggd
21ad: 1A 09             .word Drop                          drop
21af: 65 04 F6 FF       .word PLoop,$21a7-*                loop
21b3: 65 04 EC FF       .word PLoop,$21a1-*              loop
21b7: 1A 09             .word Drop                      drop
21b9: 61 21             .word ChkSum                    ChkSum
21bb: 94 09             .word At                        @
21bd: FB 1A             .word Dot                       .
21bf: B7 07             .word SemiS                     ;

.g
cc@ 200 benchmark cc@ d- d.  748 -1023891113 OK    cycles, 1023.9 sec @ 1MHz
3000000003. 50003 u/ . .  -5540 20015 OK
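
The dump shows the classic fig-FORTH name field: a count byte with bit 7 set, the name, and bit 7 set again on the last character (86 43 48 4B 53 55 CD decodes to CHKSUM). A Python sketch of decoding it; in fig-FORTH the low 5 bits of the count byte are the length, bit 6 is the precedence (IMMEDIATE) bit and bit 5 the smudge bit. The function name is hypothetical.

```python
def decode_fig_nfa(nfa: bytes) -> tuple[str, bool, bool]:
    """Decode a fig-FORTH name field into (name, immediate, smudged)."""
    count = nfa[0]
    if not count & 0x80:
        raise ValueError("count byte must have bit 7 set")
    length = count & 0x1F                 # low 5 bits: name length
    raw = nfa[1:1 + length]
    if length == 0 or len(raw) != length or not raw[-1] & 0x80:
        raise ValueError("name is truncated or its last character lacks bit 7")
    name = bytes(b & 0x7F for b in raw).decode("ascii")
    return name, bool(count & 0x40), bool(count & 0x20)
```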

FIG-like 8-bit Subroutine-threaded

F:\65816\Fig8>sub

F:\65816\Fig8>..\cc65\bin\ca65 -l sub.lst sub.txt

F:\65816\Fig8>\65816s\release\65816s sub.lst
65816S Jun  9 2021 16:02:46
65c265 mode on
A=0000 X=0000 Y=0000 S=0180 EnvMXdIzc D=0000 B=00 EA4C2903 000300 nop
.g

fig-FORTH 1.1 modified STC

A=00FE X=00F2 Y=0019 S=01F9 ENvMXdIzC D=0000 B=00 58B0F860 000406 cli
.@..\f_benchggd.txt
.g
( http://forum.6502.org/viewtopic.php?f=9&t=6637 ) OK
 OK
0 variable ChkSum OK
 OK
: ggd ( a b -- ggd )
   begin
     dup
   while
     swap over mod
     dup ChkSum +!
   repeat
   drop
  ; OK
: benchmark ( n -- )
  0 ChkSum !
  dup 0 do
    dup 0 do
      j i ggd drop
      loop
    loop
  drop
  ChkSum @ .
  ; eof
 OK
' chksum ex4  1E19OK
' ggd ex4 1E27OK
' benchmark ex4 1E57OK
here ex4 1E95OK


1E10: 43 68 6B 53 75 6D 06 97 19        0 variable ChkSum
1e19: 20 B4 0F
1e1c: 00 00

1e1e: 07 00 00

1e21: 67 67 64 03 19 1E                       : ggd
                                                begin
1e27: 20 78 06          jsr Dup                   dup
1e2a: 20 55 03          jsr TestTos              while
1e2d: D0 03             bne *+5
1e2f: 4C 47 1E          jmp $1e47
1e32: 20 61 06          jsr Swap                  swap
1e35: 20 4A 06          jsr Over                  over
1e38: 20 75 12          jsr Mod                   mod
1e3b: 20 78 06          jsr Dup                   dup
1e3e: 20 19 1E          jsr ChkSum                ChkSum
1e41: 20 96 06          jsr PlusStore             +!
1e44: 4C 27 1E          jmp $1e27                repeat
1e47: 20 57 06          jsr Drop                drop
1e4a: 60                rts                     ;

1e4b: 62 65 6E 63 68 6D 61 72 6B 09 27 1E     : benchmark
1e57: 20 AD 07          jsr Zero                0
1e5a: 20 19 1E          jsr ChkSum              ChkSum
1e5d: 20 E2 06          jsr Store               !
1e60: 20 78 06          jsr Dup                 dup
1e63: 20 AD 07          jsr Zero                0
1e66: 20 EF 15          jsr PDo                 do
1e69: 20 78 06          jsr Dup                   dup
1e6c: 20 AD 07          jsr Zero                  0
1e6f: 20 EF 15          jsr PDo                   do
1e72: 20 13 16          jsr J                       j
1e75: 20 0C 16          jsr I                       i
1e78: 20 27 1E          jsr ggd                     ggd
1e7b: 20 57 06          jsr Drop                    drop
1e7e: 20 51 16          jsr PLoop                  loop
1e81: 90 EF             bcc $1e72
1e83: 20 51 16          jsr PLoop                loop
1e86: 90 E1             bcc $1e69
1e88: 20 57 06          jsr Drop                drop
1e8b: 20 19 1E          jsr ChkSum              ChkSum
1e8e: 20 C2 06          jsr At                  @
1e91: 20 64 18          jsr Dot                 .
1e94: 60                rts                     ;

.g
cc@ 200 benchmark cc@ d- d.  748 -382665927 OK
3000000003. 50003 um/mod . .  -5540 20015 OK
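
The subroutine-threaded build lays its headers out differently: reading the dump above, the name comes first, then a count byte and a 2-byte little-endian link directly in front of the code (67 67 64 03 19 1E precedes ggd's code at 1E27, and the link points at ChkSum's code at 1E19). The 16-bit STC dump further down looks the same apart from an extra count byte before the name. A sketch that walks back from a code address, assuming that layout (inferred from the dumps, not from the source; the function name is hypothetical):

```python
def header_before(mem: bytes, base: int, xt: int) -> tuple[str, int]:
    """Recover (name, link) from the bytes just before code address xt.

    Assumed layout: name, count byte, link (low byte first), then the code.
    mem holds a slice of memory that starts at address base.
    """
    off = xt - base
    link = mem[off - 2] | (mem[off - 1] << 8)
    length = mem[off - 3]
    start = off - 3 - length
    if start < 0:
        raise ValueError("name runs past the start of mem")
    return mem[start:off - 3].decode("ascii"), link
```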

FIG-like 16-bit Indirect-threaded

D:\65816>\65816s\release\65816s fig\0265sxb.lst
65816S Jun  9 2021 16:02:46
65c265 mode on
A=0000 X=0000 Y=0000 S=0180 EnvMXdIzc D=0000 B=00 EA4C6013 000400 nop
.g

fig-FORTH 65816

A=0002 X=00B8 Y=0E7A S=01F5 envmxdIzc D=7F00 B=00 4A90FB02 00066a lsr a
.@f_benchggd.txt
.g
( http://forum.6502.org/viewtopic.php?f=9&t=6637 ) OK
 OK
0 variable ChkSum OK
 OK
: ggd ( a b -- ggd )
   begin
     dup
   while
     swap over mod
     dup ChkSum +!
   repeat
   drop
  ; OK
: benchmark ( n -- )
  0 ChkSum !
  dup 0 do
    dup 0 do
      j i ggd drop
      loop
    loop
  drop
  ChkSum @ .
  ; OK
 eof
hex OK
' chksum . 1B0A OK
' ggd .  1B14 OK
' benchmark . 1B3C OK
 OK

A=0002 X=00B8 Y=0E7A S=01F5 envmxdIzc D=7F00 B=00 4A90FB02 00066a lsr a
.d1b00,1b80

1aff: 86 43 68 6B 53 75 ED F2 1A                        0 variable ChkSum
1b08: 85 09             .word DoVar
1b0a: 00 00             .word 0

1b0c: 83 67 67 E4 FF 1A                               : ggd
1b12: 38 09             .word DoCol
                                                        begin
1b14: A7 08             .word Dup                         dup
1b16: 94 04 12 00       .word ZBranch,$1b2a-*            while
1b1a: 94 08             .word Swap                        swap
1b1c: 7D 08             .word Over                        over
1b1e: 48 14             .word Mod                         mod
1B20: A7 08             .word Dup                         dup
1b22: 08 1B             .word ChkSum                      ChkSum
1b24: C3 08             .word PlusStore                   +!
1b26: 7E 04 EC FF       .word Branch,$1b14-*             repeat
1b2a: 8B 08             .word Drop                      drop
1b2c: 6A 07             .word SemiS                     ;

1b2e: 89 62 65 6E 63 68 6D 61 72 EB 0C 1B             : benchmark
1b3a: 38 09             .word DoCol
1b3c: B8 09             .word Zero                      0
1b3e: 08 1B             .word ChkSum                    ChkSum
1B40: 05 09             .word Store                     !
1b42: A7 08             .word Dup                       dup
1b44: B8 09             .word Zero                      0
1b46: E6 04             .word PDo                       do
1b48: A7 08             .word Dup                         dup
1b4a: B8 09             .word Zero                        0
1b4c: E6 04             .word PDo                         do
1b4e: FF 04             .word J                             j
1B50: F9 04             .word I                             i
1b52: 12 1B             .word ggd                           ggd
1b54: 8B 08             .word Drop                          drop
1b56: AD 04 F6 FF       .word PLoop,$1b4e-*                loop
1b5a: AD 04 EC FF       .word PLoop,$1b48-*              loop
1b5e: 8B 08             .word Drop                      drop
1B60: 08 1B             .word ChkSum                    ChkSum
1b62: E7 08             .word At                        @
1b64: 76 19             .word Dot                       .
1b66: 6A 07             .word SemiS                     ;

.g
decimal OK
cc@ 200 benchmark cc@ d- d.  748 -485138841 OK
3000000003. 50003 u/ . .  -5540 20015 OK

FIG-like 16-bit Subroutine-threaded

D:\65816>\65816s\release\65816s fig\fsub.lst
65816S Jun  9 2021 16:02:46
65c265 mode on
A=0000 X=0000 Y=0000 S=0180 EnvMXdIzc D=0000 B=00 EA4C1010 000300 nop
.g
4242
fig-FORTH 65816 subroutine

A=00FE X=00AC Y=0000 S=01F4 envMxdIZC D=0000 B=00 2BB00528 0004f4 pld
.@f_benchggd.txt
.g
( http://forum.6502.org/viewtopic.php?f=9&t=6637 ) OK
 OK
0 variable ChkSum OK
 OK
: ggd ( a b -- ggd )
   begin
     dup
   while
     swap over mod
     dup ChkSum +!
   repeat
   drop
  ; OK
: benchmark ( n -- )
  0 ChkSum !
  dup 0 do
    dup 0 do
      j i ggd drop
      loop
    loop
  drop
  ChkSum @ .
  ; eof
 OK
hex OK
' chksum .17CA  OK
' ggd .17D6  OK
' benchmark .1806  OK

A=00FE X=00AC Y=0000 S=01F4 envMxdIZC D=0000 B=00 2BB00528 0004f4 pld
.d17c0,181f
17C0: 06 43 68 6B 53 75 6D 06 BA 17             0 variable ChkSum 
17ca: 20 2E 0E          jsr Create.@Run
17cd: 00 00             .res 2

17cf: 03 67 67 64 03 CA 17                    : ggd
                                                begin
17d6: 20 BF 06          jsr Dup                   dup
17d9: 20 B2 03          jsr TestTos              while
17dc: D0 03             bne *+5
17de: 4C F5 17          jmp $17f5
17e1: 20 B0 06          jsr Swap                  swap
17e4: 20 A5 06          jsr Over                  over
17e7: 20 EC 10          jsr Mod                   mod
17ea: 20 BF 06          jsr Dup                   dup
17ed: 20 CA 17          jsr ChkSum                ChkSum
17F0: 20 D7 06          jsr PlusStore             +!
17f3: 80 E1             bra $17d6                repeat
17f5: 20 29 03          jsr Drop                drop
17f8: 60                rts                     ;

17f9: 09 62 65 6E 63 68 6D 61 72 6B 09 D6 17  : benchmark
1806: 20 89 07          jsr Zero                0
1809: 20 CA 17          jsr ChkSum              ChkSum
180c: 20 11 07          jsr Store               !
180f: 20 BF 06          jsr Dup                 dup
1812: 20 89 07          jsr Zero                0
1815: 20 44 03          jsr PopAY               do
1818: 5A                phy
1819: 48                pha
181a: 20 BF 06          jsr Dup                   dup
181d: 20 89 07          jsr Zero                  0
1820: 20 44 03          jsr PopAY                 do
1823: 5A                phy
1824: 48                pha
1825: 20 72 14          jsr J                       j
1828: 20 69 14          jsr I                       i
182b: 20 D6 17          jsr ggd                     ggd
182e: 20 29 03          jsr Drop                    drop
1831: 20 90 14          jsr (loop)                 loop
1834: 30 EF             bmi $1825
1836: 20 90 14          jsr (loop)               loop
1839: 30 DF             bmi $181a
183b: 20 29 03          jsr Drop                drop
183e: 20 CA 17          jsr ChkSum              ChkSum
1841: 20 F7 06          jsr At                  @
1844: 20 FB 15          jsr Dot                 .
1847: 60                rts                     ;
.
.g
decimal OK
cc@ 200 benchmark cc@ d- d. 748 -192266443  OK
3000000003. 50003 u/ . . -5540 20015  OK

65816F 16-bit subroutine-threaded inlined optimized

D:\65816>\65816s\release\65816s h\0265sxb.lst
65816S Jun  9 2021 16:02:46
65c265 mode on
A=0000 X=0000 Y=0000 S=0180 EnvMXdIzc D=0000 B=00 7818FBC2 008d3e sei
.g
>
65816F 2020Dec06

A=00FE X=00AC Y=F706 S=045E envMxdIZC D=0000 B=00 2BB00728 00f713 pld
.@f_benchggd.txt
.g
( http://forum.6502.org/viewtopic.php?f=9&t=6637 )  ok

0 variable ChkSum  ok
see chksum
04BD 200DBC     JSR BC0D {Variable+0009}
04C0 0000

: ggd ( a b -- ggd )  compiled
   begin  compiled
     dup  compiled
   while  compiled
     swap over mod  compiled
     dup ChkSum +!  compiled
   repeat  compiled
   drop  compiled
  ;  ok
see ggd
                                        begin
04C8 B500       LDA 00,x                  dup
04CA A8         TAY                      while
04CB D003       BNE 04D0 {ggd+0008}
04CD 4CE304     JMP 04E3 {ggd+001B}
04D0 208F95     JSR 958F {Swap+000C}      swap
04D3 B502       LDA 02,x                  over
04D5 203B9D     JSR 9D3B {Mod+0004}       mod
04D8 B500       LDA 00,x                  dup
04DA 18         CLC                       ChkSum +!
04DB 6DC004     ADC 04C0 {ChkSum+0003}
04DE 8DC004     STA 04C0 {ChkSum+0003}
04E1 80E5       BRA 04C8 {ggd}           repeat
04E3 E8         INX                     drop
04E4 E8         INX
04E5 60         RTS                     ;
 ok

: benchmark ( n -- )   compiled
  0 ChkSum !  compiled
  dup 0 do  compiled
    dup 0 do   compiled
      j i ggd drop   compiled
      loop   compiled
    loop   compiled
  drop  compiled
  ChkSum @ .  compiled
  ; eof
  ok
see benchmark
04F2 A90000     LDA #0000 {' SInIndx0}  0
04F5 8DC004     STA 04C0 {ChkSum+0003}  ChkSum !
04F8 B500       LDA 00,x                dup
04FA A8         TAY
04FB A90000     LDA #0000 {' SInIndx0}  0
04FE 5A         PHY                     do
04FF 48         PHA
0500 B500       LDA 00,x                  dup
0502 A8         TAY
0503 A90000     LDA #0000 {' SInIndx0}    0
0506 5A         PHY                       do
0507 48         PHA
0508 A305       LDA 05,s                    j
050A CA         DEX
050B CA         DEX
050C 9500       STA 00,x
050E A301       LDA 01,s                    i
0510 CA         DEX
0511 CA         DEX
0512 9500       STA 00,x
0514 20C804     JSR 04C8 {ggd}              ggd
0517 E8         INX                         drop
0518 E8         INX
0519 68         PLA                        loop
051A 1A         INA
051B C301       CMP 01,s
051D D0E8       BNE 0507 {benchmark+0015}
051F 7A         PLY
0520 68         PLA                      loop
0521 1A         INA
0522 C301       CMP 01,s
0524 D0D9       BNE 04FF {benchmark+000D}
0526 7A         PLY
0527 E8         INX                     drop
0528 E8         INX
0529 ADC004     LDA 04C0 {ChkSum+0003}  ChkSum @
052C 200BB8     JSR B80B {.+0004}       .
052F 60         RTS                     ;
 ok

cc@ 200 benchmark cc@ d- d.  748 -117452133  ok
3000000003. 50003 um/mod . .  -5540 20015  ok

Tali 8-bit subroutine-threaded inlined

D:\65816>\65816s\release\65816s tali_20200205\ophis.bin
65816S Jun  9 2021 16:02:46
32768 bytes loaded at 0x8000
65c265 mode on
A=0000 X=0000 Y=0000 S=0180 EnvMXdIzc D=0000 B=00 78A200BD 00e001 sei
.g
Tali Forth 2 kernel for 65816s (5 Jan 2020)


Tali Forth 2 for the 65c02
Version 1.0 24. Jan 2020
Copyright 2014-2020 Scot W. Stevenson
Tali Forth 2 comes with absolutely NO WARRANTY
Type 'bye' to exit
1 strip-underflow !  ok
hex here . 800  ok
decimal  ok

A=0002 X=0076 Y=0000 S=01F9 EnvMXdIzc D=0000 B=00 4A90FB02 00e014 lsr a
.@f_benchggd.txt
.g
( http://forum.6502.org/viewtopic.php?f=9&t=6637 )  ok
  ok
0 variable ChkSum  ok
  ok
: ggd ( a b -- ggd )  compiled
   begin  compiled
     dup  compiled
   while  compiled
     swap over mod  compiled
     dup ChkSum +!  compiled
   repeat  compiled
   drop  compiled
  ;  ok
: benchmark ( n -- )   compiled
  0 ChkSum !  compiled
  dup 0 do  compiled
    dup 0 do   compiled
      j i ggd drop   compiled
      loop   compiled
    loop   compiled
  drop  compiled
  ChkSum @ .  compiled
  ;  ok
 eof
hex here . 97A  ok

decimal  ok
see chksum
nt: 800  xt: 80E
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 5

080E  20 ED D4 00 00   ....

80E   D4ED jsr
811      0 brk
 ok

see ggd
nt: 813  xt: 81E
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 67

081E  CA CA B5 02 95 00 B5 03  95 01 20 04 92 5F 08 B5  ........ .. .._..
082E  00 B4 02 95 02 94 00 B5  01 B4 03 95 03 94 01 CA  ........ ........
083E  CA B5 04 95 00 B5 05 95  01 20 83 9F E8 E8 CA CA  ........ . ......
084E  B5 02 95 00 B5 03 95 01  20 0E 08 20 78 99 4C 1E  ........  .. x.L.
085E  08 E8 E8  ...

                                begin
81E        dex                    dup
81F        dex
820      2 lda.zx
822      0 sta.zx
824      3 lda.zx
826      1 sta.zx
828   9204 jsr                   while
82B   B508 .word $85f
82D      0 lda.zx                 swap
82F      2 ldy.zx
831      2 sta.zx
833      0 sty.zx
835      1 lda.zx
837      3 ldy.zx
839      3 sta.zx
83B      1 sty.zx
83D        dex                    over
83E        dex
83F      4 lda.zx
841      0 sta.zx
843      5 lda.zx
845      1 sta.zx
847   9F83 jsr                    mod
84A        inx
84B        inx
84C        dex                    dup
84D        dex
84E      2 lda.zx
850      0 sta.zx
852      3 lda.zx
854      1 sta.zx
856    80E jsr                    ChkSum
859   9978 jsr                    +!
85C    81E jmp                   repeat
85F        inx                  drop
860        inx
 ok

see benchmark
nt: 862  xt: 873
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 262

0873  CA CA 74 00 74 01 20 0E  08 20 09 A1 CA CA B5 02  ..t.t. . . ......
0883  95 00 B5 03 95 01 CA CA  74 00 74 01 A9 09 48 A9  ........ t.t...H.
0893  61 48 38 A9 00 F5 02 95  02 A9 80 F5 03 95 03 48  aH8..... .......H
08A3  B5 02 48 18 B5 00 75 02  95 00 B5 01 75 03 48 B5  ..H...u. ....u.H.
08B3  00 48 E8 E8 E8 E8 CA CA  B5 02 95 00 B5 03 95 01  .H...... ........
08C3  CA CA 74 00 74 01 A9 09  48 A9 45 48 38 A9 00 F5  ..t.t... H.EH8...
08D3  02 95 02 A9 80 F5 03 95  03 48 B5 02 48 18 B5 00  ........ .H..H...
08E3  75 02 95 00 B5 01 75 03  48 B5 00 48 E8 E8 E8 E8  u.....u. H..H....
08F3  CA CA 86 2A BA 38 BD 07  01 FD 09 01 A8 BD 08 01  ...*.8.. ........
0903  FD 0A 01 A6 2A 95 01 94  00 CA CA 86 2A BA 38 BD  ....*... ....*.8.
0913  01 01 FD 03 01 A8 BD 02  01 FD 04 01 A6 2A 95 01  ........ .....*..
0923  94 00 20 1E 08 E8 E8 20  8A 97 18 68 75 00 A8 B8  .. ....  ...hu...
0933  68 75 01 48 98 48 E8 E8  70 03 4C F3 08 68 68 68  hu.H.H.. p.L..hhh
0943  68 68 68 20 8A 97 18 68  75 00 A8 B8 68 75 01 48  hhh ...h u...hu.H
0953  98 48 E8 E8 70 03 4C B9  08 68 68 68 68 68 68 E8  .H..p.L. .hhhhhh.
0963  E8 20 0E 08 A1 00 A8 F6  00 D0 02 F6 01 A1 00 95  . ...... ........
0973  01 94 00 20 26 8C  ... &.

873        dex                  0
874        dex
875      0 stz.zx
877      1 stz.zx
879    80E jsr                  ChkSum
87C   A109 jsr                  !
87F        dex                  dup
880        dex
881      2 lda.zx
883      0 sta.zx
885      3 lda.zx
887      1 sta.zx
889        dex                  0
88A        dex
88B      0 stz.zx
88D      1 stz.zx
88F      9 lda.#                do
891        pha
892     61 lda.#
894        pha
895        sec
896      0 lda.#
898      2 sbc.zx
89A      2 sta.zx
89C     80 lda.#
89E      3 sbc.zx
8A0      3 sta.zx
8A2        pha
8A3      2 lda.zx
8A5        pha
8A6        clc
8A7      0 lda.zx
8A9      2 adc.zx
8AB      0 sta.zx
8AD      1 lda.zx
8AF      3 adc.zx
8B1        pha
8B2      0 lda.zx
8B4        pha
8B5        inx
8B6        inx
8B7        inx
8B8        inx
8B9        dex                    dup
8BA        dex
8BB      2 lda.zx
8BD      0 sta.zx
8BF      3 lda.zx
8C1      1 sta.zx
8C3        dex                    0
8C4        dex
8C5      0 stz.zx
8C7      1 stz.zx
8C9      9 lda.#                  do
8CB        pha
8CC     45 lda.#
8CE        pha
8CF        sec
8D0      0 lda.#
8D2      2 sbc.zx
8D4      2 sta.zx
8D6     80 lda.#
8D8      3 sbc.zx
8DA      3 sta.zx
8DC        pha
8DD      2 lda.zx
8DF        pha
8E0        clc
8E1      0 lda.zx
8E3      2 adc.zx
8E5      0 sta.zx
8E7      1 lda.zx
8E9      3 adc.zx
8EB        pha
8EC      0 lda.zx
8EE        pha
8EF        inx
8F0        inx
8F1        inx
8F2        inx
8F3        dex                      j
8F4        dex
8F5     2A stx.z
8F7        tsx
8F8        sec
8F9    107 lda.x
8FC    109 sbc.x
8FF        tay
900    108 lda.x
903    10A sbc.x
906     2A ldx.z
908      1 sta.zx
90A      0 sty.zx
90C        dex                      i
90D        dex
90E     2A stx.z
910        tsx
911        sec
912    101 lda.x
915    103 sbc.x
918        tay
919    102 lda.x
91C    104 sbc.x
91F     2A ldx.z
921      1 sta.zx
923      0 sty.zx
925    81E jsr                      ggd
928        inx                      drop
929        inx
92A   978A jsr                     loop
92D        clc
92E        pla
92F      0 adc.zx
931        tay
932        clv
933        pla
934      1 adc.zx
936        pha
937        tya
938        pha
939        inx
93A        inx
93B      3 bvs
93D    8F3 jmp
940        pla
941        pla
942        pla
943        pla
944        pla
945        pla
946   978A jsr                   loop
949        clc
94A        pla
94B      0 adc.zx
94D        tay
94E        clv
94F        pla
950      1 adc.zx
952        pha
953        tya
954        pha
955        inx
956        inx
957      3 bvs
959    8B9 jmp
95C        pla
95D        pla
95E        pla
95F        pla
960        pla
961        pla
962        inx                  drop
963        inx
964    80E jsr                  ChkSum
967      0 lda.zxi              @
969        tay
96A      0 inc.zx
96C      2 bne
96E      1 inc.zx
970      0 lda.zxi
972      1 sta.zx
974      0 sty.zx
976   8C26 jsr                  .
 ok
: cc@ [ hex ca c, ca c, ca c, ca c, 02 c, f4 c, 0 c,  ] ;  ok
cc@ d. -1219363202  ok
cc@ 200 benchmark cc@ d- d. 748 -353588076  ok
3000000003. 50003 um/mod . .  -5540 20015  ok
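
Tali's compiled DO (88F-8B8 above) works differently from 65816F's. Reading the disassembly, it pushes a LEAVE address, then $8000 - limit, then index + ($8000 - limit); I and J subtract the two back out (the sbc at 912-91C and 8F9-903), and LOOP adds 1 and exits on the overflow flag (the bvs at 93B and 957), i.e. when the biased index crosses from $7FFF to $8000. A Python sketch of that arithmetic, based on my reading of the listing rather than on Tali's source:

```python
def overflow_add16(a: int, b: int) -> tuple[int, bool]:
    """16-bit add that also returns the 6502 V flag (signed overflow)."""
    r = (a + b) & 0xFFFF
    v = bool((a ^ r) & (b ^ r) & 0x8000)
    return r, v


def tali_style_loop(start: int, limit: int) -> list[int]:
    """Values I takes for "limit start do ... loop" using the $8000 offset."""
    fudge = (0x8000 - limit) & 0xFFFF          # pushed by DO
    biased = (start + fudge) & 0xFFFF          # index + fudge, also pushed by DO
    seen = []
    while True:
        seen.append((biased - fudge) & 0xFFFF)         # what I computes
        biased, overflow = overflow_add16(biased, 1)   # LOOP adds 1
        if overflow:                                   # bvs: leave the loop
            return seen
```

The attraction of the offset is that one overflow test ends the loop correctly for both signed and unsigned index ranges, including a loop that runs from -2 up to 2.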

GForth x64

GForth x64
Gforth 0.7.9_20161109, Copyright (C) 1995-2016 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `help' for basic help
variable ChkSum  ok
: ggd begin dup while swap over mod dup ChkSum +! repeat drop ;  ok
: benchmark 0 ChkSum !  compiled
  dup 0 do  dup 0 do  j i ggd drop  loop loop drop  compiled
  ChkSum @ $ffff and . ;  ok
200 benchmark 748  ok
3000000003. 50003 um/mod . .
*terminal*:7:1: '3000000003.' is a double-cell integer; type `help' for more info
59996 20015  ok
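
The "3000000003. 50003 um/mod . ." line in each log (u/ in the FIG builds) is a quick check of unsigned mixed division: 3000000003 divided by 50003 is 59996 remainder 20015. The 16-bit systems print the quotient as -5540 because "." shows a signed single cell (59996 - 65536), while 64-bit GForth prints 59996. A Python sketch of UM/MOD ( ud u -- rem quot ) with a configurable cell size; the function names are hypothetical.

```python
CELL_BITS = 16


def um_mod(ud: int, u: int, cell_bits: int = CELL_BITS) -> tuple[int, int]:
    """UM/MOD ( ud u -- rem quot ): unsigned double divided by unsigned single."""
    if u == 0:
        raise ZeroDivisionError("division by zero")
    quot, rem = divmod(ud, u)
    if quot >> cell_bits:
        raise OverflowError("quotient does not fit in one cell")
    return rem, quot


def dot(n: int, cell_bits: int = CELL_BITS) -> str:
    """Forth ".": print one cell as a signed number."""
    n &= (1 << cell_bits) - 1
    if n >> (cell_bits - 1):
        n -= 1 << cell_bits
    return str(n)


def um_mod_dot_dot(ud: int, u: int, cell_bits: int = CELL_BITS) -> str:
    """"ud u um/mod . ." prints the quotient (top of stack) first, then the remainder."""
    rem, quot = um_mod(ud, u, cell_bits)
    return f"{dot(quot, cell_bits)} {dot(rem, cell_bits)}"
```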

leepivonka · Post by **leepivonka** » Wed Jul 07, 2021 4:38 pm

Converting the VTL02 & Tiny BASIC code in viewtopic.php?f=2&t=2612&start=109 to FORTH & running it:
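
For reference while reading the log, the program steps N through 7, 11, 13, 17, ... (alternately adding 4 and 2, i.e. numbers of the form 6k±1) and trial-divides each one by D = 5, 7, 11, 13, ... while D*D <= N, printing N when nothing divides it; 2, 3 and 5 are therefore never printed. A Python sketch of the same control flow (the function name is hypothetical; the comments use the line numbers of the BASIC listing in the log):

```python
def prim(limit: int = 999) -> list[int]:
    """The numbers printed by the VTL02 / Tiny BASIC / FORTH prime program."""
    printed = []
    n, m = 7, 4                     # 10 N=7   20 M=4
    while True:
        d, e = 5, 2                 # 30 D=5   40 E=2
        while True:
            if n % d == 0:          # 50 IF N%D=0 GOTO 100
                break
            d += e                  # 60 D=D+E
            e = 6 - e               # 70 E=6-E
            if d * d > n:           # 80 IF D*D<=N GOTO 50
                printed.append(n)   # 90 print N
                break
        n += m                      # 100 N=N+M
        m = 6 - m                   # 110 M=6-M
        if n >= limit:              # 120 IF N<999 GOTO 30
            return printed
```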

F:\65816>\65816s\release\65816s h\0265sxb.lst
65816S Apr  8 2021 15:52:05
65c265 mode on
A=0000 X=0000 Y=0000 S=0180 EnvMXdIzc D=0000 B=00 7818FBC2 008d3e sei
.g
>
65816F 2020Dec06

A=00FE X=00AC Y=F706 S=045E envMxdIZC D=0000 B=00 2BB00728 00f713 pld
.@f_primb.txt
.g
\ Inspired by http://forum.6502.org/viewtopic.php?f=2&t=2612&start=109  ok
  ok
\       FORTH                   VTL02 on v6502     TinyBASIC on vCPU  ok
\       -----                   --------------     -----------------  ok
  ok
Variable N  SeeLatest
04B8 200DBC     JSR BC0D {Variable+0009}
04BB 0000       BRK #00

 ok
Variable M  SeeLatest
04C1 200DBC     JSR BC0D {Variable+0009}
04C4 0000       BRK #00

 ok
Variable D  SeeLatest
04CA 200DBC     JSR BC0D {Variable+0009}
04CD 0000       BRK #00

 ok
Variable E  SeeLatest
04D3 200DBC     JSR BC0D {Variable+0009}
04D6 0000       BRK #00

 ok
  ok
: Prim  compiled
  CC@ 2>R  compiled
  7 N !                         \       10 N=7             10N=7  compiled
  4 M !                         \       20 M=4             20M=4  compiled
  Begin  compiled
    5 D !                       \       30 D=5             30D=5  compiled
    2 E !                       \       40 E=2             40E=2  compiled
 [ LDef @50 ]  compiled
    N @ D @ Mod                 \       50 X=N/D           50IFN%D=0GOTO100  compiled
     PluA [ tay, LRefR @100 beq, ]  compiled
                                \       55 #=%=0*100  compiled
      E @ D +!                  \       60 D=D+E           60D=D+E  compiled
      6 E @ - E !               \       70 E=6-E           70E=6-E  compiled
      D @ Dup * N @ U<=         \       80 #=N>(D*D)*50    80IFD*D<=NGOTO50  compiled
        PluA [ tay, LRefR @50 bne, ]  compiled
    N @ .                       \       90 ?=" ";          90?" ";N;  compiled
                                \       95 ?=N  compiled
 [ LDef @100 ]  compiled
    M @ N +!                    \       100 N=N+M          100N=N+M  compiled
    6 M @ - M !                 \       110 M=6-M          110M=6-M  compiled
   N @ 999 U>= Until            \       120 #=N<999*30     120IFN<999GOTO30  compiled
  cr CC@ 2R> D- 2Dup D. ." cycles, "  compiled
     D>F 10e6 F/ F. ." sec @ 10MHz "  compiled
  ;

SeeLatest  \ show disassembly                   FORTH                   VTL02              TinyBASIC
04DF 02F5       COP #F5                         CC@
04E1 48         PHA                             2>R
04E2 5A         PHY
04E3 A90700     LDA #0007 {' SInCnt0+0001}      7               \       10 N=7             10N=7
04E6 8DBB04     STA 04BB {N+0003}               N !
04E9 A90400     LDA #0004 {' SIn_Buf0}          4               \       20 M=4             20M=4
04EC 8DC404     STA 04C4 {M+0003}               M !
                                                Begin
04EF A90500     LDA #0005 {' SIn_Buf0+0001}       5             \       30 D=5             30D=5
04F2 8DCD04     STA 04CD {D+0003}                 D !
04F5 A90200     LDA #0002 {' SInEnd0}             2             \       40 E=2             40E=2
04F8 8DD604     STA 04D6 {E+0003}                 E !
                                                [ LDef @50 ]
04FB ADBB04     LDA 04BB {N+0003}                 N @           \       50 X=N/D           50IFN%D=0GOTO100
04FE CA         DEX
04FF CA         DEX
0500 9500       STA 00,x
0502 ADCD04     LDA 04CD {D+0003}                 D @
0505 203B9D     JSR 9D3B {Mod+0004}               Mod
0508 B500       LDA 00,x                          PluA
050A E8         INX
050B E8         INX
050C A8         TAY                               [ tay,
050D F02D       BEQ 053C {Prim+005D}              LRefR @100 beq, ]
                                                                \       55 #=%=0*100
050F ADD604     LDA 04D6 {E+0003}                   E @         \       60 D=D+E           60D=D+E
0512 18         CLC                                 D +!
0513 6DCD04     ADC 04CD {D+0003}
0516 8DCD04     STA 04CD {D+0003}
0519 A90600     LDA #0006 {' SInCnt0}               6           \       70 E=6-E           70E=6-E
051C 38         SEC                                 E @ -
051D EDD604     SBC 04D6 {E+0003}
0520 8DD604     STA 04D6 {E+0003}                   E !
0523 ADCD04     LDA 04CD {D+0003}                   D @         \       80 #=N>(D*D)*50    80IFD*D<=NGOTO50
0526 CA         DEX                                 Dup
0527 CA         DEX
0528 9500       STA 00,x
052A 208E9B     JSR 9B8E {*+0004}                   *
052D ADBB04     LDA 04BB {N+0003}                   N @
0530 203AA1     JSR A13A {U<=+0022}                 U<=
0533 A8         TAY                                 PluA [ tay,
0534 D0C5       BNE 04FB {Prim+001C}                LRefR @50 bne, ]
0536 ADBB04     LDA 04BB {N+0003}                 N @           \       90 ?=" ";          90?" ";N;
0539 200BB8     JSR B80B {.+0004}                 .             \       95 ?=N
                                                [ LDef @100 ]
053C ADC404     LDA 04C4 {M+0003}                 M @           \       100 N=N+M          100N=N+M
053F 18         CLC                               N +!
0540 6DBB04     ADC 04BB {N+0003}
0543 8DBB04     STA 04BB {N+0003}
0546 A90600     LDA #0006 {' SInCnt0}             6             \       110 M=6-M          110M=6-M
0549 38         SEC                               M @ -
054A EDC404     SBC 04C4 {M+0003}
054D 8DC404     STA 04C4 {M+0003}                 M !
0550 ADBB04     LDA 04BB {N+0003}                N @            \       120 #=N<999*30     120IFN<999GOTO30
0553 C9E703     CMP #03E7                        999 U>=
0556 9097       BCC 04EF {Prim+0010}             Until
0558 20DCA6     JSR A6DC {CR}                   cr
055B 02F5       COP #F5                         CC@
055D 20FA94     JSR 94FA {PsuYA}
0560 7A         PLY                             2R>
0561 68         PLA
0562 20AC98     JSR 98AC {D-+0003}              D-
0565 B400       LDY 00,x                        2Dup
0567 B502       LDA 02,x
0569 20DFB7     JSR B7DF {D.+0003}              D.
056C 202BB90863 JSR B92B {."+000B} "cycles, "   ." cycles, "
0578 20A9D1     JSR D1A9 {D>F}                  D>F
057B 2044CC0040 JSR CC44 {FLiteral+007C} 10000000  10e6
0584 2037D1     JSR D137 {F/}                   F/
0587 2028D5     JSR D528 {F.}                   F.
058A 202BB90C73 JSR B92B {."+000B} "sec @ 10MHz "  ." sec @ 10MHz "
059A 60         RTS                             ;

 ok
  ok
Prim 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139
 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281
 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443
 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613
 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787
 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971
 977 983 991 997
1732018 cycles, 0.173201799 sec @ 10MHz  ok
                        \       #=1                RUN  ok
                        \       --------------     -----------------  ok
                        \       Elapsed:           Elapsed:  ok
                        \       1m18.5s            1m16.0s eof
  ok
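For readers who don't read VTL02 or TinyBASIC, here is a minimal Python sketch of the same algorithm. It is a model for illustration, not a port of the Forth. N steps through 7, 11, 13, 17, ... by alternately adding 4 and 2, so it only visits numbers of the form 6k+1 or 6k-1. Each N is trial-divided by D = 5, 7, 11, 13, ..., where D also alternates steps of 2 and 4, until either a divisor turns up or D*D exceeds N. The name `wheel_primes` and its `limit` parameter are hypothetical.

```python
def wheel_primes(limit=999):
    """Model of the benchmark: test N = 7, 11, 13, 17, ... (6k+1 / 6k-1) by trial
    division with D = 5, 7, 11, 13, ... and return every N found to be prime."""
    found = []
    n, m = 7, 4                  # 10 N=7, 20 M=4
    while True:
        d, e = 5, 2              # 30 D=5, 40 E=2
        while True:
            if n % d == 0:       # 50 IF N%D=0 GOTO 100 (composite, skip it)
                break
            d += e               # 60 D=D+E
            e = 6 - e            # 70 E=6-E (step alternates 2, 4, 2, 4, ...)
            if d * d > n:        # 80 IF D*D<=N GOTO 50 falls through: no divisor
                found.append(n)  # 90 print N
                break
        n += m                   # 100 N=N+M
        m = 6 - m                # 110 M=6-M
        if n >= limit:           # 120 IF N<999 GOTO 30, i.e. N @ 999 U>= Until
            return found


primes = wheel_primes()
print(primes[:5], primes[-1], len(primes))  # [7, 11, 13, 17, 19] 997 165
```

The Forth, VTL02 and TinyBASIC versions all start at N=7, and so does this sketch, which means 2, 3 and 5 are never reported.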

leepivonka · Post by **leepivonka** » Mon Jun 06, 2022 3:58 am

Playing with the game of Life
viewtopic.php?f=9&t=3706&start=75
https://github.com/Martin-H1/Forth-CS-1 ... r/life.fth

BigEd · Post by **BigEd** » Mon Jun 06, 2022 12:16 pm

One of my favourite topics!

I don't suppose optimisation is necessarily very high up your agenda, but I notice
JSR x
RTS
is a relatively common pattern, and a little shorter and sweeter as
JMP x
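Here is a small Python tally of what the rewrite saves at each call site. It assumes absolute-addressed JSR, RTS and JMP with their base 65816 sizes and cycle counts. The callee's own RTS runs once in both versions, so it cancels out and is not counted.

```python
# (bytes, cycles) for the absolute-address forms on the 65816.
COST = {
    "JSR abs": (3, 6),
    "RTS":     (1, 6),
    "JMP abs": (3, 3),
}


def tail_call_savings():
    """Bytes and cycles saved in the caller by rewriting 'JSR x / RTS' as 'JMP x'."""
    before_bytes = COST["JSR abs"][0] + COST["RTS"][0]
    before_cycles = COST["JSR abs"][1] + COST["RTS"][1]
    after_bytes, after_cycles = COST["JMP abs"]
    return before_bytes - after_bytes, before_cycles - after_cycles


print(tail_call_savings())  # (1, 9)
```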

Martin_H · Post by **Martin_H** » Mon Jun 06, 2022 12:58 pm

leepivonka wrote:

Playing with the game of Life
viewtopic.php?f=9&t=3706&start=75
https://github.com/Martin-H1/Forth-CS-1 ... r/life.fth

Cool! Have fun with it.

leepivonka · Post by **leepivonka** » Tue Jun 07, 2022 8:25 am

BigEd:
Optimization in the compiler is a major feature of this Forth. I'm trying to get it to accept ANSI Forth & generate reasonably optimized subroutine-threaded 65816 code that is dramatically faster & only a little larger than ITC code.
Transforming "JSR x; RTS" into "JMP x" is tempting: it is shorter and faster. Most instances of x don't mind the missing return address, and for them the transformation works fine. But any x that does anything with the return stack contents at or above the removed return address will break. I ran into major complications trying to have the compiler determine whether a particular x cares about the return stack layout.
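A toy Python model makes the failure concrete. It is not the compiler's code: the return stack is a list, return addresses are strings, and both words are hypothetical. A 2*-style word only uses the data stack, so it behaves the same whether it is reached by JSR or JMP. An RDROP-style word pops its own return address and stores it over the entry below (PLA / STA 1,S), so it depends on that return address being present.

```python
def two_star(data, rstack):
    """2*-style word: works only on the data stack, then RTS."""
    data[-1] *= 2
    return rstack.pop()                      # RTS


def rdrop_style(data, rstack):
    """RDROP-style word: PLA its own return address, STA 1,S it over the
    caller's top entry, then RTS through it."""
    my_ret = rstack.pop()                    # PLA
    rstack[-1] = my_ret                      # STA 1,S
    return rstack.pop()                      # RTS


def call_last(word, data, rstack, tail_call):
    """End of a caller whose last action is calling `word`.
    Returns the address execution finally resumes at."""
    if tail_call:
        return word(data, rstack)            # JMP word: word's RTS is our exit
    rstack.append("caller")                  # JSR word
    back = word(data, rstack)
    if back != "caller":
        raise RuntimeError("word did not return to its caller")
    return rstack.pop()                      # our own RTS


def demo(word, push_first, tail_call):
    data = [21]
    rstack = ["outer"]                       # outer word's JSR into the caller
    if push_first:
        rstack.append(data.pop())            # >R before the final call
    resume = call_last(word, data, rstack, tail_call)
    return resume, data, rstack


print(demo(two_star, False, True))     # ('outer', [42], [])  same as with JSR
print(demo(rdrop_style, True, False))  # ('outer', [], [])    JSR: correct
print(demo(rdrop_style, True, True))   # (21, [], [])         JMP: "returns" to 21
```

With JMP, the RDROP-style word takes the value pushed by >R as its return address, so the caller's real return address is lost.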

Martin_H:
Thank you for a nice chunk of Forth code. It makes a good test for the compiler & is interesting to play with too.
In col+ col- row+ row- I've removed the Mod & replaced it with tests & fixups for the boundary wrap. Mod is a slow subroutine call on the 65816.
I keep looking at how to speed up the combinations of col+ col@ col- row+ row@ row- curr@ but I don't have any good ideas yet that don't involve assembly or implementation specific compiler hints.
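The test-and-fixup idea can be sketched in Python. It assumes the incoming column is already in the range 0..cols-1, so one step can overshoot an edge by at most one and a single correction is enough. In Forth this becomes a compare and a conditional branch instead of a MOD call. `COLS`, `col_plus` and `col_minus` are illustrative names, and the real board size in life.fth is not shown here. The row versions are identical, with the row count in place of the column count.

```python
COLS = 16  # hypothetical board width


def col_plus(c, cols=COLS):
    """Column to the right, wrapping at the edge without MOD."""
    c += 1
    if c == cols:        # stepped past the right edge
        c = 0
    return c


def col_minus(c, cols=COLS):
    """Column to the left, wrapping at the edge without MOD."""
    c -= 1
    if c < 0:            # stepped past the left edge
        c = cols - 1
    return c


print(col_plus(COLS - 1), col_minus(0))  # 0 15
```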

BigEd · Post by **BigEd** » Tue Jun 07, 2022 8:55 am

Oh... what kind of x would care about how deep things are on the return stack? I must be missing something!

Martin_H · Post by **Martin_H** » Tue Jun 07, 2022 3:12 pm

leepivonka wrote:

In col+ col- row+ row- I've removed the Mod & replaced it with tests & fixups for the boundary wrap. Mod is a slow subroutine call on the 65816.
I keep looking at how to speed up the combinations of col+ col@ col- row+ row@ row- curr@ but I don't have any good ideas yet that don't involve assembly or implementation specific compiler hints.

That occurred to me during my first optimization pass, but I left mod alone. On legacy hardware a test and fixup would definitely be faster, but on a modern machine a mispredicted branch can stall the instruction pipeline.

I recently ported the life program to a Scamp 3 microcontroller, and on it mod had problems with negative numbers, so I used tests and fixups there.

It's definitely tricky and machine dependent for such a simple block of code.
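ANS Forth lets a system implement MOD with either floored or symmetric division, and FM/MOD and SM/REM exist to choose one explicitly. This can produce exactly this kind of negative-number surprise, although the post does not say whether it was the cause on the Scamp 3. The Python sketch below compares the two rules when stepping left from column 0 with `c 1- cols mod`. It also shows a common portable workaround: add the width before MOD so the dividend never goes negative.

```python
def floored_mod(a, b):
    """Floored remainder: takes the sign of the divisor (Python's own %)."""
    return a % b


def symmetric_mod(a, b):
    """Symmetric (truncating) remainder: takes the sign of the dividend."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return a - b * q


cols = 10
# Stepping left from column 0 with "c 1- cols mod":
print(floored_mod(0 - 1, cols), symmetric_mod(0 - 1, cols))                # 9 -1
# Adding cols first keeps the dividend non-negative, so both rules agree:
print(floored_mod(0 - 1 + cols, cols), symmetric_mod(0 - 1 + cols, cols))  # 9 9
```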

barrym95838 · Post by **barrym95838** » Wed Jun 08, 2022 10:18 pm

BigEd wrote:

Oh... what kind of x would care about how deep things are on the return stack? I must be missing something!

It's not about the depth, but rather what the word is expecting to reside at the top of the return stack. An assembly language example of this phenomenon would be primm, which doesn't like to be JMPed to under most circumstances.

BigEd · Post by **BigEd** » Thu Jun 09, 2022 6:55 am

Thanks Mike!

leepivonka · Post by **leepivonka** » Tue Nov 29, 2022 4:57 pm

Looking at the BBC BASIC Mandelbrot in viewtopic.php?f=1&t=6800&start=11

Here is a Forth version running on a 65816 in native mode. It takes about 36 seconds on a 2MHz 65816.

JimBoyd · Post by **JimBoyd** » Thu Dec 01, 2022 1:41 am

barrym95838 wrote:

BigEd wrote:

Oh... what kind of x would care about how deep things are on the return stack? I must be missing something!

It's not about the depth, but rather what the word is expecting to reside at the top of the return stack. An assembly language example of this phenomenon would be primm, which doesn't like to be JMPed to under most circumstances.

A Forth example would be the run-time routine compiled by ." , which is the Forth counterpart of primm, although it may use a leading count instead of a terminating null.
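Here is a minimal Python model of such a ."-style runtime, using a counted string as described. It makes one simplification: the return address points straight at the string, whereas a real 6502/65816 JSR pushes the address of its own last byte. The routine pops the return address, reads the string stored there, and pushes the address just past it so execution continues after the string. If the routine is reached by JMP, there is no such return address. It then reads the outer word's code as if it were a string, and the outer word's return address is consumed along the way. The memory layout and names below are made up for illustration.

```python
def paren_dot_quote(mem, rstack):
    """."-style runtime: the counted string sits right after the call site."""
    addr = rstack.pop()                       # where "my caller" resumes
    count = mem[addr]
    text = mem[addr + 1 : addr + 1 + count].decode("latin-1")
    rstack.append(addr + 1 + count)           # resume just past the string
    return text


def run(use_jmp):
    mem = bytearray(256)
    mem[0x40] = 0x20                          # outer word's code (a JSR opcode byte)
    mem[0x10:0x13] = b"\x20\x00\x00"          # JSR (.") placeholder; target not modeled
    mem[0x13:0x17] = b"\x03abc"               # counted string "abc"
    mem[0x17] = 0x60                          # RTS
    rstack = [0x40]                           # outer word called us; it resumes at 0x40
    if not use_jmp:
        rstack.append(0x13)                   # JSR pushes the (simplified) return address
    text = paren_dot_quote(mem, rstack)
    return text, rstack[-1]


print(run(False))  # ('abc', 23): prints the string, resumes at 0x17
```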

leepivonka · Post by **leepivonka** » Sun Dec 04, 2022 6:29 pm

Here is another example of problems changing the last JSR to a JMP:

Code: Select all

\ Changing last JSR to JMP  ok
  ok
: RDrop1 ( r: n -- )  \ drop caller's top return stack entry  compiled
  [ pla,        \ pop my rts addr  ok
  1 d,s sta,    \ store it over my caller's return stack top entry  ok
  ] ;  SeeLatest
04C7 68         PLA
04C8 8301       STA $01,s
04CA 60         RTS

 ok
  \ This is a simple implementation of RDrop .  ok
  ok
: 2*1 ( n1 -- n2 )  \ double n1  compiled
  [ 0 d,x asl,  ok
  ] ;  SeeLatest
04D1 1600       ASL $00,x
04D3 60         RTS

 ok
  \ This is a simple implementation of 2* .  ok
  ok
: Test1 ( -- )  compiled
  >R Nop RDrop1 ;  SeeLatest
04DC B500       LDA $00,x
04DE E8         INX
04DF E8         INX
04E0 48         PHA
04E1 EA         NOP
04E2 20C704     JSR $04C7 {RDrop1}
04E5 60         RTS

 ok
  \ Changing JSR to JMP here will cause RDrop1 to malfunction.  ok
  \ RDrop1 expects the JSR's return address to be on the return stack, with the caller's TOS below it.  ok
  ok
: Test2 ( n -- n )  compiled
  2*1 ;  SeeLatest
04EE 20D104     JSR $04D1 {2*1}
04F1 60         RTS

 ok
  \ Changing JSR to JMP here will work fine, & 9 cycles faster.  ok
  \ 2*1 only expects a return address to be on the return stack.  ok
 eof
  ok

The best I've come up with so far is to have a flag on each word indicating whether it can handle the JSR to JMP transformation.
This also ties in somewhat with the problem of inlining & EXECUTEing this type of word.
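The per-word flag idea can be sketched as a conservative peephole pass in Python. This is a hypothetical model, not this Forth's compiler. Code is a list of (opcode, operand) pairs, and `tail_call_ok` is an invented flag name. Words without the flag, including ones the dictionary doesn't know, keep their JSR / RTS. The sketch also assumes nothing branches to the RTS it removes, which a real compiler would have to check.

```python
from dataclasses import dataclass


@dataclass
class Word:
    name: str
    tail_call_ok: bool = False  # hypothetical flag; the default is the safe answer


def peephole_tail_calls(code, dictionary):
    """Rewrite JSR x / RTS into JMP x only when x is flagged as safe."""
    out = []
    i = 0
    while i < len(code):
        op, arg = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        word = dictionary.get(arg) if op == "JSR" else None
        if nxt == ("RTS", None) and word is not None and word.tail_call_ok:
            out.append(("JMP", arg))
            i += 2
        else:
            out.append((op, arg))
            i += 1
    return out


dictionary = {
    "2*1": Word("2*1", tail_call_ok=True),  # only needs a return address
    "RDrop1": Word("RDrop1"),               # edits its caller's return stack
}
test1 = [("PHA", None), ("NOP", None), ("JSR", "RDrop1"), ("RTS", None)]
test2 = [("JSR", "2*1"), ("RTS", None)]
print(peephole_tail_calls(test1, dictionary) == test1)  # True (left alone)
print(peephole_tail_calls(test2, dictionary))           # [('JMP', '2*1')]
```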
