Re: Fastest 65C02 Forth?
Posted: Wed Dec 18, 2019 8:22 am
Here is some sample code running on a hacked-up version of Tali Forth 2:
Tali Forth 2 kernel for 65816s (27 Dec 2018)
-1 strip-underflow ! ok
ok
: DSum >r 0. r> 1+ 0 do i m+ loop ; ok
: Timer cc@ 2>r 1000 DSum cc@ 2r> d- d. ." cycles " d. ; ok
timer 143466 cycles 500500 ok
see dsum
nt: A9A xt: AC0
flags (CO AN IM NN UF HC R6): 0 0 0 1 0 0 0
size (decimal): 118
0AC0 B5 00 B4 01 E8 E8 5A 48 A9 00 CA CA 95 00 74 01 ......ZH ......t.
0AD0 A9 00 CA CA 95 00 74 01 68 7A CA CA 95 00 94 01 ......t. hz......
0AE0 F6 00 D0 02 F6 01 CA CA 74 00 74 01 38 A9 00 F5 ........ t.t.8...
0AF0 02 A8 A9 80 F5 03 95 03 48 5A 18 98 75 00 A8 B5 ........ HZ..u...
0B00 01 75 03 E8 E8 E8 E8 48 5A DA BA 38 BD 02 01 FD .u.....H Z..8....
0B10 04 01 A8 BD 03 01 FD 05 01 FA CA CA 95 01 94 00 ........ ........
0B20 20 88 A8 18 68 69 01 A8 68 69 00 48 5A 70 03 4C ...hi.. hi.HZp.L
0B30 09 0B 68 68 68 68 ..hhhh
AC0 0 lda.zx >r
AC2 1 ldy.zx
AC4 inx
AC5 inx
AC6 phy
AC7 pha
AC8 0 lda.# 0.
ACA dex
ACB dex
ACC 0 sta.zx
ACE 1 stz.zx
AD0 0 lda.#
AD2 dex
AD3 dex
AD4 0 sta.zx
AD6 1 stz.zx
AD8 pla r>
AD9 ply
ADA dex
ADB dex
ADC 0 sta.zx
ADE 1 sty.zx
AE0 0 inc.zx 1+
AE2 2 bne
AE4 1 inc.zx
AE6 dex 0
AE7 dex
AE8 0 stz.zx
AEA 1 stz.zx
AEC sec do
AED 0 lda.#
AEF 2 sbc.zx
AF1 tay
AF2 80 lda.#
AF4 3 sbc.zx
AF6 3 sta.zx
AF8 pha
AF9 phy
AFA clc
AFB tya
AFC 0 adc.zx
AFE tay
AFF 1 lda.zx
B01 3 adc.zx
B03 inx
B04 inx
B05 inx
B06 inx
B07 pha
B08 phy
B09 phx i
B0A tsx
B0B sec
B0C 102 lda.x
B0F 104 sbc.x
B12 tay
B13 103 lda.x
B16 105 sbc.x
B19 plx
B1A dex
B1B dex
B1C 1 sta.zx
B1E 0 sty.zx
B20 A888 ( ' m+ 3 + ) jsr m+
B23 clc loop
B24 pla
B25 1 adc.#
B27 tay
B28 pla
B29 0 adc.#
B2B pha
B2C phy
B2D 3 bvs
B2F B09 ( ' DSum 49 + ) jmp
B32 pla
B33 pla
B34 pla
B35 pla ok
ok
see timer
nt: B1C xt: B42
flags (CO AN IM NN UF HC R6): 0 0 0 1 0 0 0
size (decimal): 50
0B42 20 17 80 20 01 AF A9 E8 A0 03 CA CA 95 00 94 01 .. .... ........
0B52 20 C0 0A 20 17 80 20 2A AF 20 DE A8 20 24 B5 20 .. .. * . .. $.
0B62 61 A3 4C 6E 0B 63 79 63 6C 65 73 20 20 6A B5 20 a.Ln.cyc les j.
0B72 24 B5 $.
B42 8017 ( ' cc@ ) jsr cc@
B45 AF01 ( ' 2>r ) jsr 2>r
B48 E8 lda.# 1000
B4A 3 ldy.#
B4C dex
B4D dex
B4E 0 sta.zx
B50 1 sty.zx
B52 AC0 ( ' DSum ) jsr DSum
B55 8017 ( ' cc@ ) jsr cc@
B58 AF2A ( ' 2r> ) jsr 2r>
B5B A8DE ( ' d- 3 + ) jsr d-
B5E B524 ( ' d. 3 + ) jsr d.
B61 A361 ( ' sliteral 34 + ) jsr ." cycles "
B64 B6E ( ' Timer 2C + ) jmp
B67 "cycles "
B6E B56A ( ' Type ) jsr
B71 B524 ( ' d. 3 + ) jsr d.
ok
-------------------------------------------------------------------
Tali Forth 2 kernel for 65816s (27 Dec 2018)
-1 strip-underflow ! ok
ok
$ff00 constant ACIA:STAT ok
$ff01 constant ACIA:DATA ok
ok
: emit1 compiled
begin acia:stat c@ $10 and until compiled
acia:data c! ; ok
ok
ok
see acia:stat
nt: A9F xt: AC5
flags (CO AN IM NN UF HC R6): 0 0 0 1 0 0 0
size (decimal): 7
0AC5 A9 00 A0 FF 4C 43 B8 ....LC.
AC5 0 lda.#
AC7 FF ldy.#
AC9 B843 ( ' dup 7 + ) jmp ok
see acia:data
nt: AB5 xt: ADB
flags (CO AN IM NN UF HC R6): 0 0 0 1 0 0 0
size (decimal): 7
0ADB A9 01 A0 FF 4C 43 B8 ....LC.
ADB 1 lda.#
ADD FF ldy.#
ADF B843 ( ' dup 7 + ) jmp ok
ok
see emit1
nt: AC7 xt: AED
flags (CO AN IM NN UF HC R6): 0 0 0 1 0 0 0
size (decimal): 42
0AED 20 C5 0A A1 00 95 00 74 01 A9 10 CA CA 95 00 74 ......t .......t
0AFD 01 20 78 B3 E8 E8 B5 FE 15 FF D0 03 4C ED 0A 20 . x..... ....L..
0B0D DB 0A B5 02 81 00 E8 E8 E8 E8 ........ ..
begin
AED AC5 ( ' ACIA:STAT ) jsr ACIA:STAT
AF0 0 lda.zxi c@
AF2 0 sta.zx
AF4 1 stz.zx
AF6 10 lda.# $10
AF8 dex
AF9 dex
AFA 0 sta.zx
AFC 1 stz.zx
AFE B378 ( ' and 3 + ) jsr and
B01 inx
B02 inx
B03 FE lda.zx until
B05 FF ora.zx
B07 3 bne
B09 AED ( ' emit1 ) jmp
B0C ADB ( ' ACIA:DATA ) jsr ACIA:DATA
B0F 2 lda.zx c!
B11 0 sta.zxi
B13 inx
B14 inx
B15 inx
B16 inx ok
ok
For comparison, here is the same sample in 65816F running on a 65816 CPU:
65816F 2019Oct06
: DSum >r 0. r> 1+ 0 do i m+ loop ; ok
: Timer cc@ 2>r 1000 DSum cc@ 2r> d- d. d. ; ok
timer 56195 500500 ok Runs in 56195 machine cycles
see dsum
04F3 B500 LDA 00,x >r
04F5 E8 INX
04F6 E8 INX
04F7 48 PHA
04F8 A00000 LDY #0000 {' SInIndx0} 0.
04FB A90000 LDA #0000 {' SInIndx0}
04FE 20D685 JSR 85D6 {PsuYA}
0501 68 PLA r>
0502 1A INA 1+
0503 A8 TAY
0504 A90000 LDA #0000 {' SInIndx0} 0
0507 5A PHY do
0508 48 PHA
0509 A301 LDA 01,s i
050B 20D688 JSR 88D6 {M++0004} m+
050E 68 PLA loop
050F 1A INA
0510 C301 CMP 01,s
0512 D0F4 BNE 0508 {DSum+0015}
0514 7A PLY
0515 60 RTS ;
ok
see timer
051E 02F5 COP #F5 cc@
0520 48 PHA 2>r
0521 5A PHY
0522 A9E803 LDA #03E8 1000
0525 20F704 JSR 04F7 {DSum+0004} DSum
0528 02F5 COP #F5 cc@
052A 20D685 JSR 85D6 {PsuYA}
052D 7A PLY 2r>
052E 68 PLA
052F 203C89 JSR 893C {D-+0003} d-
0532 2003A5 JSR A503 {D.} d.
0535 2003A5 JSR A503 {D.} d.
0538 60 RTS ;
ok
-------------------------------------------------------
65816F 2019Oct06
ok
$ff00 constant ACIA:STAT ok
$ff01 constant ACIA:DATA ok
: emit1 compiled
begin ACIA:STAT c@ $10 and until compiled
ACIA:DATA c! ; ok
ok
see acia:Stat
04F8 A900FF LDA #FF00 {' ACIA:STAT}
04FB 4C9A85 JMP 859A {PsuA}
ok
see acia:data
050A A901FF LDA #FF01 {' ACIA:DATA}
050D 4C9A85 JMP 859A {PsuA}
ok
see emit1
begin
0518 A90000 LDA #0000 {' SInIndx0} ACIA:STAT c@
051B E220 SEP #20 {Loc+0004}
051D AD00FF LDA FF00 {' ACIA:STAT}
0520 C220 REP #20 {Loc+0004}
0522 291000 AND #0010 {' SInIndx2} $10 and
0525 A8 TAY until
0526 F0F0 BEQ 0518 {emit1}
0528 B500 LDA 00,x ACIA:DATA c!
052A E8 INX
052B E8 INX
052C E220 SEP #20 {Loc+0004}
052E 8D01FF STA FF01 {' ACIA:DATA}
0531 C220 REP #20 {Loc+0004}
0533 60 RTS ;
ok
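
A quick check on the numbers above, using only the figures printed in the two sessions. DSum runs i from 0 through 1000 (limit 1001, start 0), so it makes 1001 passes through the loop and should leave 500500, which both systems print. The cycle counts are the two Timer results, 143466 on the hacked-up Tali and 56195 on 65816F. Both include the call and setup overhead around the loop, so the per-iteration figures are only approximate.

```latex
\begin{align*}
\sum_{i=0}^{1000} i &= \frac{1000 \cdot 1001}{2} = 500500 \\
\frac{143466}{56195} &\approx 2.55 \quad \text{(Tali cycles relative to 65816F cycles)} \\
\frac{143466}{1001} &\approx 143.3 \quad \text{(cycles per loop pass, Tali)} \\
\frac{56195}{1001} &\approx 56.1 \quad \text{(cycles per loop pass, 65816F)}
\end{align*}
```

These are cycle counts, not wall-clock times, so the ratio says nothing about clock speed differences between the two CPUs.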