Here are some small samples of inline assembly using an STC (subroutine-threaded code) Forth for the 65816.
Inline assembly is easy because everything, Forth and assembler alike, compiles directly to 65816 machine code, so the two can be mixed freely inside one definition.
Code:
D:\65816>\65816s\release\65816s h\0265sxb.lst
65816S Oct 16 2020 14:29:43
65c265 mode on
A=0000 X=0000 Y=0000 S=0180 EnvMXdIzc D=0000 B=00 7818FBC2 008f3e sei
.g
>
65816F 2020Dec06
A=00FE X=00A6 Y=F6D0 S=0458 envMxdIZC D=0000 B=00 2BB00728 00f6dd pld
.@z2.txt
.g
\ Run on 65816F (An STC Forth for the 65816 in native mode) ok
ok
\ Show inline assembly mixed with Forth ok
$f0 constant ScreenBank SeeLatest \ starts at $f00000
04BB A9F000 LDA #00F0 {' ScreenBank}
04BE 4CBE96 JMP 96BE {PsuA}
ok
$100 constant ScreenWidth SeeLatest
04CF A90001 LDA #0100 {' ScreenWidth}
04D2 4CBE96 JMP 96BE {PsuA}
ok
: ScreenWidth* ( n -- n ) \ fast version, hard-coded ScreenWidth compiled
[ ScreenWidth $100 <> [if] abort [then] ] compiled
PluA [ xba, $ff00 ## and, ] PsuA ; inline SeeLatest
04E4 B500 LDA 00,x PluA
04E6 E8 INX
04E7 E8 INX
04E8 EB XBA [ xba,
04E9 2900FF AND #FF00 {LED+0048} $ff00 ## and,
04EC CA DEX ] PsuA
04ED CA DEX
04EE 9500 STA 00,x
04F0 60 RTS ;
04F1 60 RTS inline
ok
: ScreenGetPixel ( y x -- color ) compiled
PluA PsuA swap ScreenWidth* + compiled
PluA [ txy, tax, 0 ScreenBank al,x lda, tyx, $ff ## and, ] PsuA ; SeeLatest
0503 B500 LDA 00,x PluA
0505 E8 INX
0506 E8 INX
0507 B400 LDY 00,x PsuA swap
0509 9500 STA 00,x
050B 98 TYA
050C EB XBA ScreenWidth*
050D 2900FF AND #FF00 {LED+0048}
0510 18 CLC +
0511 7500 ADC 00,x
0513 E8 INX PluA
0514 E8 INX
0515 9B TXY [ txy,
0516 AA TAX tax,
0517 BF0000F0 LDA F00000,x 0 ScreenBank al,x lda,
051B BB TYX tyx,
051C 29FF00 AND #00FF {' ScreenBank+000F} $ff ## and,
051F CA DEX ] PsuA
0520 CA DEX
0521 9500 STA 00,x
0523 60 RTS ;
ok
ok
: Test1 \ Show a compile-time expression compiled
[ ScreenBank u>d 16 DLShift 2Literal ] D. ; SeeLatest
052C A0F000 LDY #00F0 {' ScreenBank} [ ScreenBank u>d 16 DLShift 2Literal ]
052F A90000 LDA #0000 {' SInIndx0}
0532 20C5B9 JSR B9C5 {D.+0003} D.
0535 60 RTS ;
ok
ok
: Digit1 ( u -- char ) \ Show the Forth compiler doing some 65816 optimizations compiled
dup 9 u> if 7 + then [char] 0 + ; SeeLatest
053F B500 LDA 00,x dup
0541 C90900 CMP #0009 {' SInIndx1+0001} 9 u>
0544 F002 BEQ 0548 {Digit1+0009} if
0546 B003 BCS 054B {Digit1+000C}
0548 4C5305 JMP 0553 {Digit1+0014}
054B A90700 LDA #0007 {' SInCnt0+0001} 7
054E 18 CLC +
054F 7500 ADC 00,x
0551 9500 STA 00,x
then
0553 A93000 LDA #0030 {' SOutCnt0+000A} [char] 0
0556 18 CLC +
0557 7500 ADC 00,x
0559 9500 STA 00,x
055B 60 RTS ;
ok
: Digit2 ( u -- char ) \ But hand-coded 65816 assembly is still smaller & faster compiled
[ 0 d,x lda, 10 ## cmp, IfCs, 6 ## adc, Then, char 0 ## adc, 0 d,x sta, ] ; SeeLatest
0565 B500 LDA 00,x [ 0 d,x lda,
0567 C90A00 CMP #000A {' SInEnd1} 10 ## cmp,
056A 9003 BCC 056F {Digit2+000A} IfCs,
056C 690600 ADC #0006 {' SInCnt0} 6 ## adc,
Then,
056F 693000 ADC #0030 {' SOutCnt0+000A} char 0 ## adc,
0572 9500 STA 00,x 0 d,x sta,
0574 60 RTS ] ;
ok
: Digit3 ( u -- char ) \ assembler, using local labels compiled
[ 0 d,x lda, 10 ## cmp, LRefR @b bcc, ok
6 ## adc, ok
LDef @b char 0 ## adc, 0 d,x sta, ] ; SeeLatest
057E B500 LDA 00,x [ 0 d,x lda,
0580 C90A00 CMP #000A {' SInEnd1} 10 ## cmp,
0583 9003 BCC 0588 {Digit3+000A} LRefR @b bcc,
0585 690600 ADC #0006 {' SInCnt0} 6 ## adc,
LDef @b
0588 693000 ADC #0030 {' SOutCnt0+000A} char 0 ## adc,
058B 9500 STA 00,x 0 d,x sta,
058D 60 RTS ] ;
ok
ok
$ff01 constant VIA2DDRA SeeLatest
0599 A901FF LDA #FF01 {' VIA2DDRA}
059C 4CBE96 JMP 96BE {PsuA}
ok
$ff02 constant VIA2PA SeeLatest
05A8 A902FF LDA #FF02 {' VIA2PA}
05AB 4CBE96 JMP 96BE {PsuA}
ok
: INIT_D/A $FF VIA2DDRA C! ; ( -- ) SeeLatest
05B9 A9FF00 LDA #00FF {' ScreenBank+000F}
05BC E220 SEP #20 {Loc+0004}
05BE 8D01FF STA FF01 {' VIA2DDRA}
05C1 C220 REP #20 {Loc+0004}
05C3 60 RTS
ok
: WR_D/A VIA2PA C! ; ( c -- ) SeeLatest eof
05CD B500 LDA 00,x
05CF E8 INX
05D0 E8 INX
05D1 E220 SEP #20 {Loc+0004}
05D3 8D02FF STA FF02 {' VIA2PA}
05D6 C220 REP #20 {Loc+0004}
05D8 60 RTS
ok
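For anyone who wants to check the two tricks above without a 65816 at hand, here is a rough Python model of them. It is only a sketch under some assumptions: a 16-bit accumulator, decimal mode off, and the usual 65816 carry rules (CMP sets carry when A >= operand unsigned, ADC adds the carry in). The function names (cmp16, adc16, xba, digit1, digit2, screen_width_mul) are made up for this illustration and are not part of the Forth.

```python
MASK16 = 0xFFFF

def cmp16(a, operand):
    """Model CMP in 16-bit mode: carry is set when a >= operand (unsigned)."""
    return 1 if (a & MASK16) >= (operand & MASK16) else 0

def adc16(a, operand, carry):
    """Model ADC in 16-bit binary mode: returns (result, carry_out)."""
    total = (a & MASK16) + (operand & MASK16) + carry
    return total & MASK16, 1 if total > MASK16 else 0

def xba(a):
    """Model XBA: swap the high and low bytes of the 16-bit accumulator."""
    return ((a << 8) | (a >> 8)) & MASK16

def digit1(u):
    """What the Forth source of Digit1 says: dup 9 u> if 7 + then [char] 0 +"""
    if u > 9:
        u += 7
    return (u + ord("0")) & MASK16

def digit2(u):
    """The hand-coded Digit2: cmp #10 / bcc / adc #6 / adc #'0'.

    When the compare leaves carry set (u >= 10), ADC #6 really adds 7.
    That add leaves carry clear for small u, so the final ADC adds exactly '0'.
    """
    a = u
    carry = cmp16(a, 10)
    if carry:                      # BCC skips this when carry is clear
        a, carry = adc16(a, 6, carry)
    a, carry = adc16(a, ord("0"), carry)
    return a

def screen_width_mul(n):
    """ScreenWidth*: XBA then AND #$FF00 multiplies by $100, modulo 2**16."""
    return xba(n) & 0xFF00

if __name__ == "__main__":
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    for u in range(36):
        assert chr(digit2(u)) == digits[u]
        assert digit1(u) == digit2(u)
    for n in (0, 1, 0x12, 0xFF, 0x1234, 0xFFFF):
        assert screen_width_mul(n) == (n * 0x100) & MASK16
    print("model agrees with the disassembly")
```

The point of Digit2 is that no CLC is needed: the carry left by CMP does the work, which is where the size and speed saving over the compiled Digit1 comes from.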