In STC (subroutine-threaded code), inlining a little code at the call site can sometimes simplify and speed up the runtime routines.
Return stack pushes and pops are easier without the RTS address in the way.
An inline jump pointer can be turned into a JMP abs instruction.
This eliminates the runtime code that fetches the pointer through the return address on the stack and then jumps indirect through it.
It adds 1 byte at each call site, but removes many bytes and cycles from the runtime words.
Code:
\ For LOOP compile this:
' S.(LOOP) JSR
addr JMP
\ For +LOOP compile this:
' S.(+LOOP) JSR
addr JMP
\ LOOP runtime
SUBR S.(LOOP)
XSAVE STX TSX
$103 ,X INC 0= IF \ inc index.lo
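$104 ,X LDA CLC 1 # ADC $104 ,X STA VS IF \ inc index.hi; V set = limit crossed
LABEL (+LOOP).BRANCH
INX INX \ rdrop rts addr
INX INX INX INX \ rdrop index & limit
TXS
XSAVE LDX \ restore data stack pointer before leaving the loop
PHP RTI \ jmp (end addr): RTI pulls P, then resumes at exactly end addr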
THEN
THEN
XSAVE LDX
RTS
END-CODE
\ +LOOP runtime
SUBR S.(+LOOP)
0 ,X LDA 1 ,X LDY INX INX \ pop increment
XSAVE STX TSX
CLC $103 ,X ADC $103 ,X STA
TYA $104 ,X ADC $104 ,X STA
(+LOOP).BRANCH BVS
XSAVE LDX
RTS
END-CODE
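The PHP RTI exit works because RTS and RTI treat the address they pull differently: RTS resumes one byte past the pulled address, while RTI first pulls a status byte and then resumes at exactly the pulled address. Below is a hypothetical Python model of page $01 (not 6502 code) showing both paths of S.(LOOP). It assumes, as the RTI exit implies, that DO pushed the end addr below the limit and index; CALL_SITE and END_ADDR are made-up addresses.

```python
# Hypothetical model of the 6502 hardware stack (page $01), just detailed
# enough to compare how RTS and RTI use the address they pull.


class Stack6502:
    def __init__(self) -> None:
        self.mem = [0] * 256  # page $01
        self.s = 0xFF         # stack pointer: empty stack

    def push(self, byte: int) -> None:
        self.mem[self.s] = byte & 0xFF
        self.s = (self.s - 1) & 0xFF

    def pull(self) -> int:
        self.s = (self.s + 1) & 0xFF
        return self.mem[self.s]

    def push_word(self, word: int) -> None:
        # High byte first, like JSR, so the low byte ends up on top.
        self.push(word >> 8)
        self.push(word & 0xFF)

    def pull_word(self) -> int:
        lo = self.pull()
        hi = self.pull()
        return (hi << 8) | lo

    def rts(self) -> int:
        # RTS pulls PCL, PCH and resumes one byte past that address.
        return (self.pull_word() + 1) & 0xFFFF

    def rti(self) -> int:
        # RTI pulls P first, then PCL, PCH, and resumes at exactly that address.
        self.pull()
        return self.pull_word()


CALL_SITE = 0x1000  # hypothetical address of "' S.(LOOP) JSR"
END_ADDR = 0x1234   # hypothetical loop exit address


def loop_frame() -> Stack6502:
    """Return stack as S.(LOOP) sees it right after the JSR."""
    st = Stack6502()
    st.push_word(END_ADDR)       # assumed: pushed by DO below the loop parameters
    st.push_word(0x8000)         # limit (value doesn't matter here)
    st.push_word(0x7FFF)         # index (value doesn't matter here)
    st.push_word(CALL_SITE + 2)  # JSR pushes the address of its own last byte
    return st


# Loop continues: a plain RTS lands on the "addr JMP" after the 3-byte JSR.
continue_pc = loop_frame().rts()

# Loop ends: drop rts addr, index & limit (6 x INX, TXS), then PHP RTI.
st = loop_frame()
st.s = (st.s + 6) & 0xFF
st.push(0x00)  # PHP: this status byte is what RTI pulls back first
exit_pc = st.rti()

# For contrast: RTS from the same point would land one byte past END_ADDR.
st2 = loop_frame()
st2.s = (st2.s + 6) & 0xFF
rts_pc = st2.rts()

print(hex(continue_pc), hex(exit_pc), hex(rts_pc))  # prints: 0x1003 0x1234 0x1235
```

The exit path also leaves the stack pointer back at $FF, so the frame is fully dropped. Using RTS instead would force DO to push end addr - 1.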
Moving the UNLOOP code inline at the LOOP and +LOOP call sites makes the runtime routines faster and smaller.
This costs 4 bytes (PLA PLA PLA PLA) at each call site, but removes the return stack manipulation from the runtime words.
The loop_addr JMP can be changed to a BVC to handle the conditional branch back: the runtime leaves V clear while the loop continues and sets it when the index crosses the limit.
This sample also resolves the end addr references at compile time, so no end addr has to be kept on the return stack.
Code:
\ For DO compile this:
' S.(DO) JSR
\ here is the loop address
\ for LEAVE compile this:
leave_addr JMP
\ For LOOP compile this:
' S.(LOOP) JSR
loop_addr BVC \ or, if loop_addr is out of branch range: *+5 BVS loop_addr JMP
\ here is the LEAVE address
PLA PLA PLA PLA \ rdrop index & limit
\ For +LOOP compile this:
' S.(+LOOP) JSR
loop_addr BVC \ or, if loop_addr is out of branch range: *+5 BVS loop_addr JMP
\ here is the LEAVE address
PLA PLA PLA PLA \ rdrop index & limit
\ DO runtime
SUBR S.(DO)
PLA N STA PLA N 1+ STA \ pop rts addr
3 ,X LDA $80 # EOR 3 ,X STA PHA \ limit' = limit XOR $8000; push limit'.hi
2 ,X LDA PHA \ push limit'.lo
SEC 0 ,X LDA 2 ,X SBC TAY \ index' = index - limit', lo byte
1 ,X LDA 3 ,X SBC PHA \ index'.hi; push
TYA PHA \ push index'.lo
INX INX INX INX \ 2drop
N 1+ LDA PHA N LDA PHA RTS \ return
END-CODE
\ LOOP runtime
SUBR S.(LOOP)
XSAVE STX TSX
CLV $103 ,X INC 0= IF \ inc index.lo
$104 ,X LDA CLC 1 # ADC $104 ,X STA \ inc index.hi; V set = limit crossed
THEN
XSAVE LDX
RTS
END-CODE
\ +LOOP runtime
SUBR S.(+LOOP)
0 ,X LDA 1 ,X LDY INX INX \ pop increment
XSAVE STX TSX
CLC $103 ,X ADC $103 ,X STA
TYA $104 ,X ADC $104 ,X STA
XSAVE LDX
RTS
END-CODE
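The DO runtime above flips the sign bit of the limit and subtracts that from the index, so the biased index reaches $8000 exactly when the real index reaches the limit. LOOP and +LOOP then just add to the biased index, and the V flag from the high-byte ADC is set only when that add crosses the $7FFF/$8000 boundary, which is what the inline BVC tests. I can be recovered by adding the two stacked values back together. Here is a Python sketch of that arithmetic (not 6502 code; the function names are made up for illustration):

```python
# Hypothetical model of the biased DO/LOOP/+LOOP arithmetic (16-bit cells).

MASK = 0xFFFF


def add16(a: int, b: int) -> tuple[int, bool]:
    """16-bit add; also returns signed overflow, i.e. the 6502 V flag
    left by the high-byte ADC."""
    s = (a + b) & MASK
    v = ((a ^ s) & (b ^ s) & 0x8000) != 0
    return s, v


def do_setup(limit: int, index: int) -> tuple[int, int]:
    """What S.(DO) pushes: limit' = limit XOR $8000, index' = index - limit'."""
    limit_b = (limit ^ 0x8000) & MASK
    index_b = (index - limit_b) & MASK
    return limit_b, index_b


def run_loop(limit: int, index: int, step: int = 1) -> list[int]:
    """I values seen by the body of  limit index DO ... step +LOOP
    (step=1 behaves like LOOP)."""
    if step == 0:
        raise ValueError("a zero step never crosses the limit")
    limit_b, index_b = do_setup(limit, index)
    seen = []
    while True:
        seen.append((index_b + limit_b) & MASK)  # I = index' + limit'
        index_b, v = add16(index_b, step & MASK)
        if v:  # V set: limit crossed, the BVC falls through to the PLAs
            return seen


print(run_loop(12, 3, 2))                          # [3, 5, 7, 9, 11]
print(run_loop(0, 3, -1))                          # [3, 2, 1, 0]
print([hex(i) for i in run_loop(0x8002, 0x7FFE)])  # ['0x7ffe', '0x7fff', '0x8000', '0x8001']
print(len(run_loop(5, 5)))                         # 65536
```

The last two calls show what the bias buys: the loop works across the signed $7FFF/$8000 boundary (handy for address ranges), and a DO with equal limit and index runs the full 65536 times, as a 16-bit Forth DO is expected to.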
Here is a 65816 native-mode example. It's almost all inlined, so there are no RTS addresses to work around.
The NMOS 6502 probably can't be optimized this far, but some of the ideas carry over.
Code:
: xx compiled
12 3 do compiled
i 4 and if leave then compiled
i . compiled
2 +loop compiled
; ok
seelatest
04C3 A00C00 LDY #$000C {' SIn_Buf1} 12
04C6 A90300 LDA #$0003 {' SInEnd0+0001} 3
04C9 5A PHY Do
04CA 48 PHA
04CB A301 LDA $01,s i
04CD 290400 AND #$0004 {' SIn_Buf0} 4 and
04D0 A8 TAY if
04D1 D003 BNE $04D6 {xx+0013}
04D3 4CDB04 JMP $04DB {xx+0018}
04D6 68 PLA leave
04D7 7A PLY
04D8 4CEA04 JMP $04EA {xx+0027}
then
04DB A301 LDA $01,s i
04DD 2009B5 JSR $B509 {.+0004} .
04E0 A90200 LDA #$0002 {' SInEnd0} 2
04E3 2066BE JSR $BE66 {+Loop+001D} +loop
04E6 30E3 BMI $04CB {xx+0008}
04E8 68 PLA
04E9 7A PLY
04EA 60 RTS ;
ok
xx 3 ok
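In the listing, LEAVE does its own PLA PLY and then jumps past the normal exit's PLA PLY straight to the RTS, so the return stack is balanced on both exit paths. A rough Python model of xx's control flow follows; a list stands in for the 65816 stack, and the +loop exit test is simplified to index >= limit, which gives the same result for this positive-step loop but is not necessarily how the runtime +Loop at $BE66 decides.

```python
# Hypothetical model of the compiled xx: a list stands in for the 65816
# hardware stack, with the top of stack at the end of the list.


def xx() -> tuple[list[int], list[int]]:
    rstack: list[int] = []
    out: list[int] = []
    rstack.append(12)  # PHY: limit
    rstack.append(3)   # PHA: index, now at $01,s
    while True:
        i = rstack[-1]             # LDA $01,s   (i)
        if i & 4:                  # AND #4 / BNE: "if" is true
            rstack.pop()           # LEAVE: PLA
            rstack.pop()           #        PLY
            break                  #        JMP to the RTS, skipping the PLA PLY below
        out.append(i)              # JSR to "."
        rstack[-1] = i + 2         # 2 +loop: bump the index
        if rstack[-1] >= rstack[-2]:  # simplified exit test (positive step only)
            rstack.pop()           # normal exit: PLA
            rstack.pop()           #              PLY
            break
    return out, rstack


out, rstack = xx()
print(out, rstack)  # prints: [3] []
```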