Another 65816 STC Forth
-
leepivonka
- Posts: 168
- Joined: 15 Apr 2016
Re: Another 65816 STC Forth
Here is a console log while loading "Robot game" by druzyek viewtopic.php?f=1&t=6054
So far it's down to 32kbytes in bank 0 (including 9kbytes of FORTH headers) + more data in the "tile" bank & screen data in the "screen" bank.
Partially ported & completely untested so far.
- Attachments
-
- F_Robot_4.zip
- compile console log
- (108.26 KiB) Downloaded 133 times
-
leepivonka
- Posts: 168
- Joined: 15 Apr 2016
Re: Another 65816 STC Forth
Playing with the simple benchmark in viewtopic.php?f=9&t=6637
FIG modified 8bit Indirect-Threaded: cc@ 200 benchmark cc@ d- d. 748 -1023891113 OK 1024 sec @ 1MHz
FIG modified 8bit Subroutine-Threaded: cc@ 200 benchmark cc@ d- d. 748 -382665927 OK 383 sec @ 1MHz
FIG modified 16bit Indirect-Threaded: cc@ 200 benchmark cc@ d- d. 748 -485138841 OK 485 sec @ 1MHz
FIG modified 16bit Subroutine-Threaded: cc@ 200 benchmark cc@ d- d. 748 -192266443 OK 192 sec @ 1MHz
65816F 16bit Subroutine-Threaded inlined optimized: cc@ 200 benchmark cc@ d- d. 748 -117452133 ok 117 sec @ 1MHz
Tali 8bit Subroutine-Threaded inlined: cc@ 200 benchmark cc@ d- d. 748 -353588076 ok 354 sec @ 1MHz
GForth x64: 200 benchmark 748 ok
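The cycle counts print as negative because `cc@ 200 benchmark cc@ d- d.` subtracts the end count from the start count. As a rough aid for reading the list above, here is a small Python sketch (not part of any of the Forths tested) that turns those counts into seconds at 1MHz and into a speed ratio against the 8-bit ITC build. The dictionary keys and the function names are just labels I made up for this illustration.

```python
# Cycle counts copied from the results above, with the sign dropped.
CYCLES = {
    "FIG 8-bit ITC": 1023891113,
    "FIG 8-bit STC": 382665927,
    "FIG 16-bit ITC": 485138841,
    "FIG 16-bit STC": 192266443,
    "65816F 16-bit STC inlined": 117452133,
    "Tali 8-bit STC inlined": 353588076,
}


def seconds(cycles, clock_hz=1_000_000):
    """Run time in seconds for a cycle count at the given clock rate."""
    return cycles / clock_hz


def speedup(name, baseline="FIG 8-bit ITC"):
    """How many times faster `name` ran than the baseline build."""
    return CYCLES[baseline] / CYCLES[name]


if __name__ == "__main__":
    for name, cycles in CYCLES.items():
        print(f"{name:28s} {seconds(cycles):7.1f} s  {speedup(name):5.2f}x")
```

Passing a different `clock_hz` rescales the times for a faster part; the ratios between builds do not change.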
Comparing generated code:
The word CC@ in the following code returns the simulator cycle counter as a double.
FIG-like 8-bit Indirect-threaded console log
FIG-like 8-bit Subroutine-threaded
FIG-like 16-bit Indirect-threaded
FIG-like 16-bit Subroutine-threaded
65816F 16-bit subroutine-threaded inlined optimized
Tali 8-bit subroutine-threaded inlined
GForth x64
Code: Select all
: ggd ( a b -- ggd )
----source----- ----65816F STC--------  ----FIG STC-------    -----FIG ITC--------------
                                                              216b .word DoCol
begin
dup             04C8 LDA 00,x           1e27 jsr Dup          216d .word Dup
while           04CA TAY                1e2a jsr TestTos      216f .word ZBranch,$2183-*
                04CB BNE 04D0           1e2d bne *+5
                04CD JMP 04E3           1e2f jmp $1e47
swap            04D0 JSR Swap+000C      1e32 jsr Swap         2173 .word Swap
over            04D3 LDA 02,x           1e35 jsr Over         2175 .word Over
mod             04D5 JSR Mod+0004       1e38 jsr Mod          2177 .word Mod
dup             04D8 LDA 00,x           1e3b jsr Dup          2179 .word Dup
ChkSum +!       04DA CLC                1e3e jsr ChkSum       217b .word ChkSum
                04DB ADC ChkSum+0003    1e41 jsr PlusStore    217d .word PlusStore
                04DE STA ChkSum+0003
repeat          04E1 BRA 04C8           1e44 jmp $1e27        217f .word Branch,$216d-*
drop            04E3 INX                1e47 jsr Drop         2183 .word Drop
                04E4 INX
;               04E5 RTS                1e4a rts              2185 .word SemiS
                -- 30 bytes --          -- 36 bytes --        -- 28 bytes --
: benchmark ( n -- )
----source----- ----65816F STC--------  ----FIG STC--------   -----FIG ITC--------------
                                                              2193 .word DoCol
0               04F2 LDA #0000          1e57 jsr Zero         2195 .word Zero
ChkSum                                  1e5a jsr ChkSum       2197 .word ChkSum
!               04F5 STA ChkSum+0003    1e5d jsr Store        2199 .word Store
dup             04F8 LDA 00,x           1e60 jsr Dup          219b .word Dup
                04FA TAY
0               04FB LDA #0000          1e63 jsr Zero         219d .word Zero
do              04FE PHY                1e66 jsr PDo          219f .word PDo
                04FF PHA
dup             0500 LDA 00,x           1e69 jsr Dup          21a1 .word Dup
                0502 TAY
0               0503 LDA #0000          1e6c jsr Zero         21a3 .word Zero
do              0506 PHY                1e6f jsr PDo          21a5 .word PDo
                0507 PHA
j               0508 LDA 05,s           1e72 jsr J            21a7 .word j
                050A DEX
                050B DEX
                050C STA 00,x
i               050E LDA 01,s           1e75 jsr I            21a9 .word i
                0510 DEX
                0511 DEX
                0512 STA 00,x
ggd             0514 JSR ggd            1e78 jsr ggd          21ab .word ggd
drop            0517 INX                1e7b jsr Drop         21ad .word Drop
                0518 INX
loop            0519 PLA                1e7e jsr PLoop        21af .word PLoop,$21a7-*
                051A INA
                051B CMP 01,s
                051D BNE 0507           1e81 bcc $1e72
                051F PLY
loop            0520 PLA                1e83 jsr PLoop        21b3 .word PLoop,$21a1-*
                0521 INA
                0522 CMP 01,s
                0524 BNE 04FF           1e86 bcc $1e69
                0526 PLY
drop            0527 INX                1e88 jsr Drop         21b7 .word Drop
                0528 INX
ChkSum                                  1e8b jsr ChkSum       21b9 .word ChkSum
@               0529 LDA ChkSum+0003    1e8e jsr At           21bb .word At
.               052C JSR Dot+0004       1e91 jsr Dot          21bd .word Dot
;               052F RTS                1e94 rts              21bf .word SemiS
                -- 62 bytes --          -- 62 bytes --        -- 46 bytes --
FIG-like 8-bit Indirect-threaded console log
Code: Select all
F:\65816>\65816s\release\65816s fig8\0265sxb.lst
65816S Jun 9 2021 16:02:46
65c265 mode on
A=0000 X=0000 Y=0000 S=0180 EnvMXdIzc D=0000 B=00 EA4C8E14 000300 nop
.g
fig-FORTH 1.1 modified
A=00FE X=00EF Y=0000 S=01F5 ENvMXdIzC D=0000 B=00 58B0F860 000653 cli
.@f_benchggd.txt
.g
( http://forum.6502.org/viewtopic.php?f=9&t=6637 ) OK
OK
0 variable ChkSum OK
OK
: ggd ( a b -- ggd )
begin
dup
while
swap over mod
dup ChkSum +!
repeat
drop
; OK
: benchmark ( n -- )
0 ChkSum !
dup 0 do
dup 0 do
j i ggd drop
loop
loop
drop
ChkSum @ .
; eof
OK
hex OK
' ChkSum . 2163 OK
' ggd . 216D OK
' benchmark ' OK
here . 21C1 OK
decimal OK
A=00FE X=00E7 Y=0000 S=01F5 ENvMXdIzC D=0000 B=00 58B0F860 000653 cli
.d2150,21c2
2158: 86 43 48 4B 53 55 CD 14 1C 0 variable ChkSum
2161: 1C 49 .word DoVar
2163: 00 00 .word 0
2165: 83 47 47 C4 58 21 F1 : ggd
216b: F1 09 .word DoCol
begin
216d: 3E 09 .word Dup dup
216f: 44 04 12 00 .word ZBranch,$2183-* while
2173: 23 09 .word Swap swap
2175: 0B 09 .word Over over
2177: 7C 15 .word Mod mod
2179: 3E 09 .word Dup dup
217b: 61 21 .word ChkSum ChkSum
217d: 64 09 .word PlusStore +!
217f: 25 04 EC FF .word Branch,$216d-* repeat
2183: 1A 09 .word Drop drop
2185: B7 07 .word SemiS ;
2187: 89 42 45 4E 43 48 4D 41 52 CB 65 21 : benchmark
2193: F1 09 .word DoCol
2195: 75 0A .word Zero 0
2197: 61 21 .word ChkSum ChkSum
2199: B8 09 .word Store !
219b: 3E 09 .word Dup dup
219d: 75 0A .word Zero 0
219f: C6 04 .word PDo do
21a1: 3E 09 .word Dup dup
21a3: 75 0A .word Zero 0
21a5: C6 04 .word PDo do
21a7: E5 04 .word j j
21a9: DF 04 .word i i
21ab: 6B 21 .word ggd ggd
21ad: 1A 09 .word Drop drop
21af: 65 04 F6 FF .word PLoop,$21a7-* loop
21b3: 65 04 EC FF .word PLoop,$21a1-* loop
21b7: 1A 09 .word Drop drop
21b9: 61 21 .word ChkSum ChkSum
21bb: 94 09 .word At @
21bd: FB 1A .word Dot .
21bf: B7 07 .word SemiS ;
.g
cc@ 200 benchmark cc@ d- d. 748 -1023891113 OK cycles, 1023.9 sec @ 1MHz
3000000003. 50003 u/ . . -5540 20015 OK
FIG-like 8-bit Subroutine-threaded
Code: Select all
F:\65816\Fig8>sub
F:\65816\Fig8>..\cc65\bin\ca65 -l sub.lst sub.txt
F:\65816\Fig8>\65816s\release\65816s sub.lst
65816S Jun 9 2021 16:02:46
65c265 mode on
A=0000 X=0000 Y=0000 S=0180 EnvMXdIzc D=0000 B=00 EA4C2903 000300 nop
.g
fig-FORTH 1.1 modified STC
A=00FE X=00F2 Y=0019 S=01F9 ENvMXdIzC D=0000 B=00 58B0F860 000406 cli
.@..\f_benchggd.txt
.g
( http://forum.6502.org/viewtopic.php?f=9&t=6637 ) OK
OK
0 variable ChkSum OK
OK
: ggd ( a b -- ggd )
begin
dup
while
swap over mod
dup ChkSum +!
repeat
drop
; OK
: benchmark ( n -- )
0 ChkSum !
dup 0 do
dup 0 do
j i ggd drop
loop
loop
drop
ChkSum @ .
; eof
OK
' chksum ex4 1E19OK
' ggd ex4 1E27OK
' benchmark ex4 1E57OK
here ex4 1E95OK
1E10: 43 68 6B 53 75 6D 06 97 19 0 variable ChkSum
1e19: 20 B4 0F
1e1c: 00 00
1e1e: 07 00 00
1e21: 67 67 64 03 19 1E : ggd
begin
1e27: 20 78 06 jsr Dup dup
1e2a: 20 55 03 jsr TestTos while
1e2d: D0 03 bne *+5
1e2f: 4C 47 1E jmp $1e47
1e32: 20 61 06 jsr Swap swap
1e35: 20 4A 06 jsr Over over
1e38: 20 75 12 jsr Mod mod
1e3b: 20 78 06 jsr Dup dup
1e3e: 20 19 1E jsr ChkSum ChkSum
1e41: 20 96 06 jsr PlusStore +!
1e44: 4C 27 1E jmp $1e27 repeat
1e47: 20 57 06 jsr Drop drop
1e4a: 60 rts ;
1e4b: 62 65 6E 63 68 6D 61 72 6B 09 27 1E : benchmark
1e57: 20 AD 07 jsr Zero 0
1e5a: 20 19 1E jsr ChkSum ChkSum
1e5d: 20 E2 06 jsr Store !
1e60: 20 78 06 jsr Dup dup
1e63: 20 AD 07 jsr Zero 0
1e66: 20 EF 15 jsr PDo do
1e69: 20 78 06 jsr Dup dup
1e6c: 20 AD 07 jsr Zero 0
1e6f: 20 EF 15 jsr PDo do
1e72: 20 13 16 jsr J j
1e75: 20 0C 16 jsr I i
1e78: 20 27 1E jsr ggd ggd
1e7b: 20 57 06 jsr Drop drop
1e7e: 20 51 16 jsr PLoop loop
1e81: 90 EF bcc $1e72
1e83: 20 51 16 jsr PLoop loop
1e86: 90 E1 bcc $1e69
1e88: 20 57 06 jsr Drop drop
1e8b: 20 19 1E jsr ChkSum ChkSum
1e8e: 20 C2 06 jsr At @
1e91: 20 64 18 jsr Dot .
1e94: 60 rts ;
.g
cc@ 200 benchmark cc@ d- d. 748 -382665927 OK
3000000003. 50003 um/mod . . -5540 20015 OK
FIG-like 16-bit Indirect-threaded
Code: Select all
D:\65816>\65816s\release\65816s fig\0265sxb.lst
65816S Jun 9 2021 16:02:46
65c265 mode on
A=0000 X=0000 Y=0000 S=0180 EnvMXdIzc D=0000 B=00 EA4C6013 000400 nop
.g
fig-FORTH 65816
A=0002 X=00B8 Y=0E7A S=01F5 envmxdIzc D=7F00 B=00 4A90FB02 00066a lsr a
.@f_benchggd.txt
.g
( http://forum.6502.org/viewtopic.php?f=9&t=6637 ) OK
OK
0 variable ChkSum OK
OK
: ggd ( a b -- ggd )
begin
dup
while
swap over mod
dup ChkSum +!
repeat
drop
; OK
: benchmark ( n -- )
0 ChkSum !
dup 0 do
dup 0 do
j i ggd drop
loop
loop
drop
ChkSum @ .
; OK
eof
hex OK
' chksum . 1B0A OK
' ggd . 1B14 OK
' benchmark . 1B3C OK
OK
A=0002 X=00B8 Y=0E7A S=01F5 envmxdIzc D=7F00 B=00 4A90FB02 00066a lsr a
.d1b00,1b80
1aff: 86 43 68 6B 53 75 ED F2 1A 0 variable ChkSum
1b08: 85 09 .word DoVar
1b0a: 00 00 .word 0
1b0c: 83 67 67 E4 FF 1A : ggd
1b12: 38 09 .word DoCol
begin
1b14: A7 08 .word Dup dup
1b16: 94 04 12 00 .word ZBranch,$1b2a-* while
1b1a: 94 08 .word Swap swap
1b1c: 7D 08 .word Over over
1b1e: 48 14 .word Mod mod
1B20: A7 08 .word Dup dup
1b22: 08 1B .word ChkSum ChkSum
1b24: C3 08 .word PlusStore +!
1b26: 7E 04 EC FF .word Branch,$1b14-* repeat
1b2a: 8B 08 .word Drop drop
1b2c: 6A 07 .word SemiS ;
1b2e: 89 62 65 6E 63 68 6D 61 72 EB 0C 1B : benchmark
1b3a: 38 09 .word DoCol
1b3c: B8 09 .word Zero 0
1b3e: 08 1B .word ChkSum ChkSum
1B40: 05 09 .word Store !
1b42: A7 08 .word Dup dup
1b44: B8 09 .word Zero 0
1b46: E6 04 .word PDo do
1b48: A7 08 .word Dup dup
1b4a: B8 09 .word Zero 0
1b4c: E6 04 .word PDo do
1b4e: FF 04 .word J j
1B50: F9 04 .word I i
1b52: 12 1B .word ggd ggd
1b54: 8B 08 .word Drop drop
1b56: AD 04 F6 FF .word PLoop,$1b4e-* loop
1b5a: AD 04 EC FF .word PLoop,$1b48-* loop
1b5e: 8B 08 .word Drop drop
1B60: 08 1B .word ChkSum ChkSum
1b62: E7 08 .word At @
1b64: 76 19 .word Dot .
1b66: 6A 07 .word SemiS ;
.g
decimal OK
cc@ 200 benchmark cc@ d- d. 748 -485138841 OK
3000000003. 50003 u/ . . -5540 20015 OK
FIG-like 16-bit Subroutine-threaded
Code: Select all
D:\65816>\65816s\release\65816s fig\fsub.lst
65816S Jun 9 2021 16:02:46
65c265 mode on
A=0000 X=0000 Y=0000 S=0180 EnvMXdIzc D=0000 B=00 EA4C1010 000300 nop
.g
4242
fig-FORTH 65816 subroutine
A=00FE X=00AC Y=0000 S=01F4 envMxdIZC D=0000 B=00 2BB00528 0004f4 pld
.@f_benchggd.txt
.g
( http://forum.6502.org/viewtopic.php?f=9&t=6637 ) OK
OK
0 variable ChkSum OK
OK
: ggd ( a b -- ggd )
begin
dup
while
swap over mod
dup ChkSum +!
repeat
drop
; OK
: benchmark ( n -- )
0 ChkSum !
dup 0 do
dup 0 do
j i ggd drop
loop
loop
drop
ChkSum @ .
; eof
OK
hex OK
' chksum .17CA OK
' ggd .17D6 OK
' benchmark .1806 OK
A=00FE X=00AC Y=0000 S=01F4 envMxdIZC D=0000 B=00 2BB00528 0004f4 pld
.d17c0,181f
17C0: 06 43 68 6B 53 75 6D 06 BA 17 0 variable ChkSum
17ca: 20 2E 0E jsr Create.@Run
17cd: 00 00 .res 2
17cf: 03 67 67 64 03 CA 17 : ggd
begin
17d6: 20 BF 06 jsr Dup dup
17d9: 20 B2 03 jsr TestTos while
17dc: D0 03 bne *+5
17de: 4C F5 17 jmp $17f5
17e1: 20 B0 06 jsr Swap swap
17e4: 20 A5 06 jsr Over over
17e7: 20 EC 10 jsr Mod mod
17ea: 20 BF 06 jsr Dup dup
17ed: 20 CA 17 jsr ChkSum ChkSum
17F0: 20 D7 06 jsr PlusStore +!
17f3: 80 E1 bra $17d6 repeat
17f5: 20 29 03 jsr Drop drop
17f8: 60 rts ;
17f9: 09 62 65 6E 63 68 6D 61 72 6B 09 D6 17 : benchmark
1806: 20 89 07 jsr Zero 0
1809: 20 CA 17 jsr ChkSum ChkSum
180c: 20 11 07 jsr Store !
180f: 20 BF 06 jsr Dup dup
1812: 20 89 07 jsr Zero 0
1815: 20 44 03 jsr PopAY do
1818: 5A phy
1819: 48 pha
181a: 20 BF 06 jsr Dup dup
181d: 20 89 07 jsr Zero 0
1820: 20 44 03 jsr PopAY do
1823: 5A phy
1824: 48 pha
1825: 20 72 14 jsr J j
1828: 20 69 14 jsr I i
182b: 20 D6 17 jsr ggd ggd
182e: 20 29 03 jsr Drop drop
1831: 20 90 14 jsr (loop) loop
1834: 30 EF bmi $1825
1836: 20 90 14 jsr (loop) loop
1839: 30 DF bmi $181a
183b: 20 29 03 jsr Drop drop
183e: 20 CA 17 jsr ChkSum ChkSum
1841: 20 F7 06 jsr At @
1844: 20 FB 15 jsr Dot .
1847: 60 rts ;
.
.g
decimal OK
cc@ 200 benchmark cc@ d- d. 748 -192266443 OK
3000000003. 50003 u/ . . -5540 20015 OK
65816F 16-bit subroutine-threaded inlined optimized
Code: Select all
D:\65816>\65816s\release\65816s h\0265sxb.lst
65816S Jun 9 2021 16:02:46
65c265 mode on
A=0000 X=0000 Y=0000 S=0180 EnvMXdIzc D=0000 B=00 7818FBC2 008d3e sei
.g
>
65816F 2020Dec06
A=00FE X=00AC Y=F706 S=045E envMxdIZC D=0000 B=00 2BB00728 00f713 pld
.@f_benchggd.txt
.g
( http://forum.6502.org/viewtopic.php?f=9&t=6637 ) ok
0 variable ChkSum ok
see chksum
04BD 200DBC JSR BC0D {Variable+0009}
04C0 0000
: ggd ( a b -- ggd ) compiled
begin compiled
dup compiled
while compiled
swap over mod compiled
dup ChkSum +! compiled
repeat compiled
drop compiled
; ok
see ggd
begin
04C8 B500 LDA 00,x dup
04CA A8 TAY while
04CB D003 BNE 04D0 {ggd+0008}
04CD 4CE304 JMP 04E3 {ggd+001B}
04D0 208F95 JSR 958F {Swap+000C} swap
04D3 B502 LDA 02,x over
04D5 203B9D JSR 9D3B {Mod+0004} mod
04D8 B500 LDA 00,x dup
04DA 18 CLC ChkSum +!
04DB 6DC004 ADC 04C0 {ChkSum+0003}
04DE 8DC004 STA 04C0 {ChkSum+0003}
04E1 80E5 BRA 04C8 {ggd} repeat
04E3 E8 INX drop
04E4 E8 INX
04E5 60 RTS ;
ok
: benchmark ( n -- ) compiled
0 ChkSum ! compiled
dup 0 do compiled
dup 0 do compiled
j i ggd drop compiled
loop compiled
loop compiled
drop compiled
ChkSum @ . compiled
; eof
ok
see benchmark
04F2 A90000 LDA #0000 {' SInIndx0} 0
04F5 8DC004 STA 04C0 {ChkSum+0003} ChkSum !
04F8 B500 LDA 00,x dup
04FA A8 TAY
04FB A90000 LDA #0000 {' SInIndx0} 0
04FE 5A PHY do
04FF 48 PHA
0500 B500 LDA 00,x dup
0502 A8 TAY
0503 A90000 LDA #0000 {' SInIndx0} 0
0506 5A PHY do
0507 48 PHA
0508 A305 LDA 05,s j
050A CA DEX
050B CA DEX
050C 9500 STA 00,x
050E A301 LDA 01,s i
0510 CA DEX
0511 CA DEX
0512 9500 STA 00,x
0514 20C804 JSR 04C8 {ggd} ggd
0517 E8 INX drop
0518 E8 INX
0519 68 PLA loop
051A 1A INA
051B C301 CMP 01,s
051D D0E8 BNE 0507 {benchmark+0015}
051F 7A PLY
0520 68 PLA loop
0521 1A INA
0522 C301 CMP 01,s
0524 D0D9 BNE 04FF {benchmark+000D}
0526 7A PLY
0527 E8 INX drop
0528 E8 INX
0529 ADC004 LDA 04C0 {ChkSum+0003} ChkSum @
052C 200BB8 JSR B80B {.+0004} .
052F 60 RTS ;
ok
cc@ 200 benchmark cc@ d- d. 748 -117452133 ok
3000000003. 50003 um/mod . . -5540 20015 ok
Tali 8-bit subroutine-threaded inlined
Code: Select all
D:\65816>\65816s\release\65816s tali_20200205\ophis.bin
65816S Jun 9 2021 16:02:46
32768 bytes loaded at 0x8000
65c265 mode on
A=0000 X=0000 Y=0000 S=0180 EnvMXdIzc D=0000 B=00 78A200BD 00e001 sei
.g
Tali Forth 2 kernel for 65816s (5 Jan 2020)
Tali Forth 2 for the 65c02
Version 1.0 24. Jan 2020
Copyright 2014-2020 Scot W. Stevenson
Tali Forth 2 comes with absolutely NO WARRANTY
Type 'bye' to exit
1 strip-underflow ! ok
hex here . 800 ok
decimal ok
A=0002 X=0076 Y=0000 S=01F9 EnvMXdIzc D=0000 B=00 4A90FB02 00e014 lsr a
.@f_benchggd.txt
.g
( http://forum.6502.org/viewtopic.php?f=9&t=6637 ) ok
ok
0 variable ChkSum ok
ok
: ggd ( a b -- ggd ) compiled
begin compiled
dup compiled
while compiled
swap over mod compiled
dup ChkSum +! compiled
repeat compiled
drop compiled
; ok
: benchmark ( n -- ) compiled
0 ChkSum ! compiled
dup 0 do compiled
dup 0 do compiled
j i ggd drop compiled
loop compiled
loop compiled
drop compiled
ChkSum @ . compiled
; ok
eof
hex here . 97A ok
decimal ok
see chksum
nt: 800 xt: 80E
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 5
080E 20 ED D4 00 00 ....
80E D4ED jsr
811 0 brk
ok
see ggd
nt: 813 xt: 81E
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 67
081E CA CA B5 02 95 00 B5 03 95 01 20 04 92 5F 08 B5 ........ .. .._..
082E 00 B4 02 95 02 94 00 B5 01 B4 03 95 03 94 01 CA ........ ........
083E CA B5 04 95 00 B5 05 95 01 20 83 9F E8 E8 CA CA ........ . ......
084E B5 02 95 00 B5 03 95 01 20 0E 08 20 78 99 4C 1E ........ .. x.L.
085E 08 E8 E8 ...
begin
81E dex dup
81F dex
820 2 lda.zx
822 0 sta.zx
824 3 lda.zx
826 1 sta.zx
828 9204 jsr while
82B B508 .word $85f
82D 0 lda.zx swap
82F 2 ldy.zx
831 2 sta.zx
833 0 sty.zx
835 1 lda.zx
837 3 ldy.zx
839 3 sta.zx
83B 1 sty.zx
83D dex over
83E dex
83F 4 lda.zx
841 0 sta.zx
843 5 lda.zx
845 1 sta.zx
847 9F83 jsr mod
84A inx
84B inx
84C dex dup
84D dex
84E 2 lda.zx
850 0 sta.zx
852 3 lda.zx
854 1 sta.zx
856 80E jsr ChkSum
859 9978 jsr +!
85C 81E jmp repeat
85F inx drop
860 inx
ok
see benchmark
nt: 862 xt: 873
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 262
0873 CA CA 74 00 74 01 20 0E 08 20 09 A1 CA CA B5 02 ..t.t. . . ......
0883 95 00 B5 03 95 01 CA CA 74 00 74 01 A9 09 48 A9 ........ t.t...H.
0893 61 48 38 A9 00 F5 02 95 02 A9 80 F5 03 95 03 48 aH8..... .......H
08A3 B5 02 48 18 B5 00 75 02 95 00 B5 01 75 03 48 B5 ..H...u. ....u.H.
08B3 00 48 E8 E8 E8 E8 CA CA B5 02 95 00 B5 03 95 01 .H...... ........
08C3 CA CA 74 00 74 01 A9 09 48 A9 45 48 38 A9 00 F5 ..t.t... H.EH8...
08D3 02 95 02 A9 80 F5 03 95 03 48 B5 02 48 18 B5 00 ........ .H..H...
08E3 75 02 95 00 B5 01 75 03 48 B5 00 48 E8 E8 E8 E8 u.....u. H..H....
08F3 CA CA 86 2A BA 38 BD 07 01 FD 09 01 A8 BD 08 01 ...*.8.. ........
0903 FD 0A 01 A6 2A 95 01 94 00 CA CA 86 2A BA 38 BD ....*... ....*.8.
0913 01 01 FD 03 01 A8 BD 02 01 FD 04 01 A6 2A 95 01 ........ .....*..
0923 94 00 20 1E 08 E8 E8 20 8A 97 18 68 75 00 A8 B8 .. .... ...hu...
0933 68 75 01 48 98 48 E8 E8 70 03 4C F3 08 68 68 68 hu.H.H.. p.L..hhh
0943 68 68 68 20 8A 97 18 68 75 00 A8 B8 68 75 01 48 hhh ...h u...hu.H
0953 98 48 E8 E8 70 03 4C B9 08 68 68 68 68 68 68 E8 .H..p.L. .hhhhhh.
0963 E8 20 0E 08 A1 00 A8 F6 00 D0 02 F6 01 A1 00 95 . ...... ........
0973 01 94 00 20 26 8C ... &.
873 dex 0
874 dex
875 0 stz.zx
877 1 stz.zx
879 80E jsr ChkSum
87C A109 jsr !
87F dex dup
880 dex
881 2 lda.zx
883 0 sta.zx
885 3 lda.zx
887 1 sta.zx
889 dex 0
88A dex
88B 0 stz.zx
88D 1 stz.zx
88F 9 lda.# do
891 pha
892 61 lda.#
894 pha
895 sec
896 0 lda.#
898 2 sbc.zx
89A 2 sta.zx
89C 80 lda.#
89E 3 sbc.zx
8A0 3 sta.zx
8A2 pha
8A3 2 lda.zx
8A5 pha
8A6 clc
8A7 0 lda.zx
8A9 2 adc.zx
8AB 0 sta.zx
8AD 1 lda.zx
8AF 3 adc.zx
8B1 pha
8B2 0 lda.zx
8B4 pha
8B5 inx
8B6 inx
8B7 inx
8B8 inx
8B9 dex dup
8BA dex
8BB 2 lda.zx
8BD 0 sta.zx
8BF 3 lda.zx
8C1 1 sta.zx
8C3 dex 0
8C4 dex
8C5 0 stz.zx
8C7 1 stz.zx
8C9 9 lda.# do
8CB pha
8CC 45 lda.#
8CE pha
8CF sec
8D0 0 lda.#
8D2 2 sbc.zx
8D4 2 sta.zx
8D6 80 lda.#
8D8 3 sbc.zx
8DA 3 sta.zx
8DC pha
8DD 2 lda.zx
8DF pha
8E0 clc
8E1 0 lda.zx
8E3 2 adc.zx
8E5 0 sta.zx
8E7 1 lda.zx
8E9 3 adc.zx
8EB pha
8EC 0 lda.zx
8EE pha
8EF inx
8F0 inx
8F1 inx
8F2 inx
8F3 dex j
8F4 dex
8F5 2A stx.z
8F7 tsx
8F8 sec
8F9 107 lda.x
8FC 109 sbc.x
8FF tay
900 108 lda.x
903 10A sbc.x
906 2A ldx.z
908 1 sta.zx
90A 0 sty.zx
90C dex i
90D dex
90E 2A stx.z
910 tsx
911 sec
912 101 lda.x
915 103 sbc.x
918 tay
919 102 lda.x
91C 104 sbc.x
91F 2A ldx.z
921 1 sta.zx
923 0 sty.zx
925 81E jsr ggd
928 inx drop
929 inx
92A 978A jsr loop
92D clc
92E pla
92F 0 adc.zx
931 tay
932 clv
933 pla
934 1 adc.zx
936 pha
937 tya
938 pha
939 inx
93A inx
93B 3 bvs
93D 8F3 jmp
940 pla
941 pla
942 pla
943 pla
944 pla
945 pla
946 978A jsr loop
949 clc
94A pla
94B 0 adc.zx
94D tay
94E clv
94F pla
950 1 adc.zx
952 pha
953 tya
954 pha
955 inx
956 inx
957 3 bvs
959 8B9 jmp
95C pla
95D pla
95E pla
95F pla
960 pla
961 pla
962 inx drop
963 inx
964 80E jsr ChkSum
967 0 lda.zxi @
969 tay
96A 0 inc.zx
96C 2 bne
96E 1 inc.zx
970 0 lda.zxi
972 1 sta.zx
974 0 sty.zx
976 8C26 jsr .
ok
: cc@ [ hex ca c, ca c, ca c, ca c, 02 c, f4 c, 0 c, ] ; ok
cc@ d. -1219363202 ok
ccc@ 200 benchmark cc@ d- d. 748 -353588076 ok
3000000003. 50003 um/mod . . -5540 20015 ok
Code: Select all
GForth x64
Gforth 0.7.9_20161109, Copyright (C) 1995-2016 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `help' for basic help
variable ChkSum ok
: ggd begin dup while swap over mod dup ChkSum +! repeat drop ; ok
: benchmark 0 ChkSum ! compiled
dup 0 do dup 0 do j i ggd drop loop loop drop compiled
ChkSum @ $ffff and . ; ok
200 benchmark 748 ok
3000000003. 50003 um/mod . .
*terminal*:7:1: '3000000003.' is a double-cell integer; type `help' for more info59996 20015 ok
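For anyone wanting to cross-check the numbers without a Forth handy, here is a Python sketch of what the benchmark computes, written as a model and not taken from any of the implementations above. It assumes 16-bit cells for `ChkSum` (which is why the GForth x64 run needs the `$ffff and` mask) and it also reproduces the `3000000003. 50003 um/mod . .` line: the quotient 59996 does not fit a signed 16-bit cell, so the 16-bit Forths print it as -5540 while GForth prints 59996. The names `MASK16` and `to_signed16` are mine.

```python
MASK16 = 0xFFFF  # assumed cell width of the 16-bit Forths in the logs


def ggd(a, b, chk):
    """Euclid's algorithm as in the Forth word; returns (gcd, checksum)."""
    while b != 0:                 # begin dup while
        a, b = b, a % b           # swap over mod
        chk = (chk + b) & MASK16  # dup ChkSum +!  (wraps at 16 bits)
    return a, chk                 # repeat drop


def benchmark(n):
    """Accumulate every remainder produced by ggd(j, i) for 0 <= i, j < n."""
    chk = 0
    for j in range(n):            # outer do ... loop
        for i in range(n):        # inner do ... loop
            _, chk = ggd(j, i, chk)
    return chk


def to_signed16(u):
    """Reinterpret an unsigned 16-bit value the way `.` prints it."""
    return u - 0x10000 if u & 0x8000 else u


if __name__ == "__main__":
    print(benchmark(200))
    quotient, remainder = divmod(3000000003, 50003)
    print(to_signed16(quotient & MASK16), remainder)
```

The logs above all report 748 for `200 benchmark`, and this model is expected to agree since every operand is non-negative and below 200, so signed versus unsigned `mod` makes no difference.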
-
leepivonka
- Posts: 168
- Joined: 15 Apr 2016
Re: Another 65816 STC Forth
Converting the VTL02 & Tiny BASIC code in viewtopic.php?f=2&t=2612&start=109
to FORTH & running it:
Code: Select all
F:\65816>\65816s\release\65816s h\0265sxb.lst
65816S Apr 8 2021 15:52:05
65c265 mode on
A=0000 X=0000 Y=0000 S=0180 EnvMXdIzc D=0000 B=00 7818FBC2 008d3e sei
.g
>
65816F 2020Dec06
A=00FE X=00AC Y=F706 S=045E envMxdIZC D=0000 B=00 2BB00728 00f713 pld
.@f_primb.txt
.g
\ Inspired by http://forum.6502.org/viewtopic.php?f=2&t=2612&start=109 ok
ok
\ FORTH VTL02 on v6502 TinyBASIC on vCPU ok
\ ----- -------------- ----------------- ok
ok
Variable N SeeLatest
04B8 200DBC JSR BC0D {Variable+0009}
04BB 0000 BRK #00
ok
Variable M SeeLatest
04C1 200DBC JSR BC0D {Variable+0009}
04C4 0000 BRK #00
ok
Variable D SeeLatest
04CA 200DBC JSR BC0D {Variable+0009}
04CD 0000 BRK #00
ok
Variable E SeeLatest
04D3 200DBC JSR BC0D {Variable+0009}
04D6 0000 BRK #00
ok
ok
: Prim compiled
CC@ 2>R compiled
7 N ! \ 10 N=7 10N=7 compiled
4 M ! \ 20 M=4 20M=4 compiled
Begin compiled
5 D ! \ 30 D=5 30D=5 compiled
2 E ! \ 40 E=2 40E=2 compiled
[ LDef @50 ] compiled
N @ D @ Mod \ 50 X=N/D 50IFN%D=0GOTO100 compiled
PluA [ tay, LRefR @100 beq, ] compiled
\ 55 #=%=0*100 compiled
E @ D +! \ 60 D=D+E 60D=D+E compiled
6 E @ - E ! \ 70 E=6-E 70E=6-E compiled
D @ Dup * N @ U<= \ 80 #=N>(D*D)*50 80IFD*D<=NGOTO50 compiled
PluA [ tay, LRefR @50 bne, ] compiled
N @ . \ 90 ?=" "; 90?" ";N; compiled
\ 95 ?=N compiled
[ LDef @100 ] compiled
M @ N +! \ 100 N=N+M 100N=N+M compiled
6 M @ - M ! \ 110 M=6-M 110M=6-M compiled
N @ 999 U>= Until \ 120 #=N<999*30 120IFN<999GOTO30 compiled
cr CC@ 2R> D- 2Dup D. ." cycles, " compiled
D>F 10e6 F/ F. ." sec @ 10MHz " compiled
;
SeeLatest \ show disassembly FORTH VTL02 TinyBASIC
04DF 02F5 COP #F5 CC@
04E1 48 PHA 2>R
04E2 5A PHY
04E3 A90700 LDA #0007 {' SInCnt0+0001} 7 \ 10 N=7 10N=7
04E6 8DBB04 STA 04BB {N+0003} N !
04E9 A90400 LDA #0004 {' SIn_Buf0} 4 \ 20 M=4 20M=4
04EC 8DC404 STA 04C4 {M+0003} M !
Begin
04EF A90500 LDA #0005 {' SIn_Buf0+0001} 5 \ 30 D=5 30D=5
04F2 8DCD04 STA 04CD {D+0003} D !
04F5 A90200 LDA #0002 {' SInEnd0} 2 \ 40 E=2 40E=2
04F8 8DD604 STA 04D6 {E+0003} E !
[ LDef @50 ]
04FB ADBB04 LDA 04BB {N+0003} N @ \ 50 X=N/D 50IFN%D=0GOTO100
04FE CA DEX
04FF CA DEX
0500 9500 STA 00,x
0502 ADCD04 LDA 04CD {D+0003} D @
0505 203B9D JSR 9D3B {Mod+0004} Mod
0508 B500 LDA 00,x PluA
050A E8 INX
050B E8 INX
050C A8 TAY [ tay,
050D F02D BEQ 053C {Prim+005D} LRefR @100 beq, ]
\ 55 #=%=0*100
050F ADD604 LDA 04D6 {E+0003} E @ \ 60 D=D+E 60D=D+E
0512 18 CLC D +!
0513 6DCD04 ADC 04CD {D+0003}
0516 8DCD04 STA 04CD {D+0003}
0519 A90600 LDA #0006 {' SInCnt0} 6 \ 70 E=6-E 70E=6-E
051C 38 SEC E @ -
051D EDD604 SBC 04D6 {E+0003}
0520 8DD604 STA 04D6 {E+0003} E !
0523 ADCD04 LDA 04CD {D+0003} D @ \ 80 #=N>(D*D)*50 80IFD*D<=NGOTO50
0526 CA DEX Dup
0527 CA DEX
0528 9500 STA 00,x
052A 208E9B JSR 9B8E {*+0004} *
052D ADBB04 LDA 04BB {N+0003} N @
0530 203AA1 JSR A13A {U<=+0022} U<=
0533 A8 TAY PluA [ tay,
0534 D0C5 BNE 04FB {Prim+001C} LRefR @50 bne, ]
0536 ADBB04 LDA 04BB {N+0003} N @ \ 90 ?=" "; 90?" ";N;
0539 200BB8 JSR B80B {.+0004} . \ 95 ?=N
[ LDef @100 ]
053C ADC404 LDA 04C4 {M+0003} M @ \ 100 N=N+M 100N=N+M
053F 18 CLC N +!
0540 6DBB04 ADC 04BB {N+0003}
0543 8DBB04 STA 04BB {N+0003}
0546 A90600 LDA #0006 {' SInCnt0} 6 \ 110 M=6-M 110M=6-M
0549 38 SEC M @ -
054A EDC404 SBC 04C4 {M+0003}
054D 8DC404 STA 04C4 {M+0003} M !
0550 ADBB04 LDA 04BB {N+0003} N @ \ 120 #=N<999*30 120IFN<999GOTO30
0553 C9E703 CMP #03E7 999 U>=
0556 9097 BCC 04EF {Prim+0010} Until
0558 20DCA6 JSR A6DC {CR} cr
055B 02F5 COP #F5 CC@
055D 20FA94 JSR 94FA {PsuYA}
0560 7A PLY 2R>
0561 68 PLA
0562 20AC98 JSR 98AC {D-+0003} D-
0565 B400 LDY 00,x 2Dup
0567 B502 LDA 02,x
0569 20DFB7 JSR B7DF {D.+0003} D.
056C 202BB90863 JSR B92B {."+000B} "cycles, " ." cycles, "
0578 20A9D1 JSR D1A9 {D>F} D>F
057B 2044CC0040 JSR CC44 {FLiteral+007C} 10000000 10e6
0584 2037D1 JSR D137 {F/} F/
0587 2028D5 JSR D528 {F.} F.
058A 202BB90C73 JSR B92B {."+000B} "sec @ 10MHz " ." sec @ 10MHz "
059A 60 RTS ;
ok
ok
Prim 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139
149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443
449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613
617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787
797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971
977 983 991 997
1732018 cycles, 0.173201799 sec @ 10MHz ok
\ #=1 RUN ok
\ -------------- ----------------- ok
\ Elapsed: Elapsed: ok
\ 1m18.5s 1m16.0s eof
ok
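For readers who don't follow the Forth or the VTL02 columns, here is the same loop as a Python sketch. It is only my transliteration of the Prim word above, with the BASIC line numbers kept as comments; the function name `prim` and the `limit` parameter are made up for the illustration.

```python
def prim(limit=999):
    """Trial division by 5, 7, 11, 13, ... over candidates 7, 11, 13, 17, ..."""
    primes = []
    n, m = 7, 4                      # 10 N=7 / 20 M=4
    while True:
        d, e = 5, 2                  # 30 D=5 / 40 E=2
        is_prime = True
        while True:
            if n % d == 0:           # 50: divisor found, skip the print
                is_prime = False
                break
            d += e                   # 60 D=D+E
            e = 6 - e                # 70 E=6-E
            if d * d > n:            # 80: no divisor up to sqrt(N), fall through
                break
        if is_prime:
            primes.append(n)         # 90 print N
        n += m                       # 100 N=N+M
        m = 6 - m                    # 110 M=6-M
        if n >= limit:               # 120 Until N >= 999
            return primes


if __name__ == "__main__":
    print(*prim())
```

Both the candidates and the trial divisors step alternately by 2 and 4, so multiples of 2 and 3 are never visited. That is why the output starts at 7.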
-
leepivonka
- Posts: 168
- Joined: 15 Apr 2016
Re: Another 65816 STC Forth
Playing with the game of Life
viewtopic.php?f=9&t=3706&start=75
https://github.com/Martin-H1/Forth-CS-1 ... r/life.fth
- Attachments
-
- F_LifeB4.zip
- Console log, with FORTH typed in the right margin
- (5.78 KiB) Downloaded 106 times
Re: Another 65816 STC Forth
One of my favourite topics!
I don't suppose optimisation is necessarily very high up your agenda, but I notice
JSR x
RTS
is a relatively common pattern, and a little shorter and sweeter as
JMP x
Re: Another 65816 STC Forth
leepivonka wrote:
Playing with the game of Life
viewtopic.php?f=9&t=3706&start=75
https://github.com/Martin-H1/Forth-CS-1 ... r/life.fth
-
leepivonka
- Posts: 168
- Joined: 15 Apr 2016
Re: Another 65816 STC Forth
BigEd:
Optimization in the compiler is a major feature of this Forth. I'm trying to get it to accept ANSI Forth & generate reasonably optimized subroutine-threaded 65816 code that is dramatically faster & only a little larger than ITC code.
Transforming "JSR x; RTS" to "JMP x" is tempting: it is shorter and faster. Most instances of x don't mind the removed return address, and for them the transformation works fine. But any x that does anything with the contents of the return stack at or above the removed return address will break. I hit major complications having the compiler determine whether a particular x cares about the return stack layout.
Martin_H:
Thank you for a nice chunk of Forth code. It makes a good test for the compiler & is interesting to play with too.
In col+ col- row+ row- I've removed the Mod & replaced it with tests & fixups for the boundary wrap. Mod is a slow subroutine call on the 65816.
I keep looking at how to speed up the combinations of col+ col@ col- row+ row@ row- curr@ but I don't have any good ideas yet that don't involve assembly or implementation-specific compiler hints.
Re: Another 65816 STC Forth
Oh... what kind of x would care about how deep things are on the return stack? I must be missing something!
Re: Another 65816 STC Forth
leepivonka wrote:
In col+ col- row+ row- I've removed the Mod & replaced it with tests & fixups for the boundary wrap. Mod is a slow subroutine call on the 65816.
I keep looking at how to speed up the combinations of col+ col@ col- row+ row@ row- curr@ but I don't have any good ideas yet that don't involve assembly or implementation specific compiler hints.
I recently ported the life program to a Scamp 3 microcontroller and on it mod had problems with negative numbers. So I used test and fixups there.
It's definitely tricky and machine dependent for such a simple block of code.
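To make the "tests & fixups" idea concrete, here is a Python sketch of stepping an index with wraparound and no division. It is not the col+ / row+ code from the attachment (that isn't shown in the thread); `wrap_inc`, `wrap_dec` and `symmetric_mod` are hypothetical names. The last helper models a MOD that truncates toward zero, which is one of the behaviours a Forth is allowed to have, and it shows where a negative operand goes wrong.

```python
def wrap_inc(x, size):
    """x + 1 with wraparound, using one compare instead of mod."""
    x += 1
    return 0 if x == size else x


def wrap_dec(x, size):
    """x - 1 with wraparound; no negative value ever reaches a mod."""
    return size - 1 if x == 0 else x - 1


def symmetric_mod(a, b):
    """Remainder truncated toward zero, as a symmetric-division MOD returns."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


if __name__ == "__main__":
    size = 8
    print([wrap_inc(x, size) for x in range(size)])
    print([wrap_dec(x, size) for x in range(size)])
    print(symmetric_mod(0 - 1, size))  # a naive "x 1- size mod" at x = 0
```

With a symmetric MOD, stepping left from column 0 yields -1 and not size-1, so the fixup version is both faster and safer here.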
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: Another 65816 STC Forth
BigEd wrote:
Oh... what kind of x would care about how deep things are on the return stack? I must be missing something!
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!
Mike B. (about me) (learning how to github)
Re: Another 65816 STC Forth
Thanks Mike!
-
leepivonka
- Posts: 168
- Joined: 15 Apr 2016
Re: Another 65816 STC Forth
Looking at the BBC BASIC Mandelbrot in viewtopic.php?f=1&t=6800&start=11
Here is a Forth version running on a 65816 in native mode. It takes about 36 seconds on a 2MHz 65816.
- Attachments
-
- F_MandSB1_8.zip
- Console log, with additional comments on disassemblies
- (6.18 KiB) Downloaded 96 times
Re: Another 65816 STC Forth
barrym95838 wrote:
BigEd wrote:
Oh... what kind of x would care about how deep things are on the return stack? I must be missing something!
A Forth example would be the routine compiled by ." , which would be the Forth version of primm, although a leading count may be used instead of a terminating null.
-
leepivonka
- Posts: 168
- Joined: 15 Apr 2016
Re: Another 65816 STC Forth
Here is another example of problems changing the last JSR to a JMP:
The best I've come up with so far is to have a flag on each word indicating whether it can handle the JSR to JMP transformation.
This also somewhat ties in with the problem of inlining & EXECUTEing this type of word.
Code: Select all
\ Changing last JSR to JMP ok
ok
: RDrop1 ( r: n -- ) \ drop caller's top return stack entry compiled
[ pla, \ pop my rts addr ok
1 d,s sta, \ store it over my caller's return stack top entry ok
] ; SeeLatest
04C7 68 PLA
04C8 8301 STA $01,s
04CA 60 RTS
ok
\ This is a simple implementation of RDrop . ok
ok
: 2*1 ( n1 -- n2 ) \ double n1 compiled
[ 0 d,x asl, ok
] ; SeeLatest
04D1 1600 ASL $00,x
04D3 60 RTS
ok
\ This is a simple implementation of 2* . ok
ok
: Test1 ( -- ) compiled
>R Nop RDrop1 ; SeeLatest
04DC B500 LDA $00,x
04DE E8 INX
04DF E8 INX
04E0 48 PHA
04E1 EA NOP
04E2 20C704 JSR $04C7 {RDrop1}
04E5 60 RTS
ok
\ Changing JSR to JMP here will cause RDrop1 to malfunction. ok
\ RDrop1 expects the JSR's return address to be on the return stack, with the caller's TOS below it. ok
ok
: Test2 ( n -- n ) compiled
2*1 ; SeeLatest
04EE 20D104 JSR $04D1 {2*1}
04F1 60 RTS
ok
\ Changing JSR to JMP here will work fine, & is 9 cycles faster. ok
\ 2*1 only expects a return address to be on the return stack. ok
eof
ok