6502.org

Posted: **Mon Jan 26, 2026 1:51 pm**

I wanted to streamline some code like u A ! and A @ for heavily used variables, so I started hand-coding assembly for some inline "register" words that interacted directly with fixed 16-bit zero page cells.

In Taliforth (and, I guess, many other 6502 Forths) we use X as the data stack pointer, so we have lots of simple assembly words implemented with zp,X indexing to manipulate the stack. The new code I was writing looked very similar, but with direct zp addressing (no index).

However, I was surprised to see that I could mechanically transform the assembly for existing stack words to specialize them into words that target a fixed zp "register". Each opcode with a zero-page indexed mode like ADC zp,X (opcode $75) has a matching direct zero-page version with the opcode reduced by $10, e.g. ADC zp (opcode $65). This makes it easy to write a MAKE-REGISTER-WORD ( register-zpadr stack-offset xt ) that converts the object code of a simple stack word into a new word targeting a fixed zp register in place of a given stack offset. This lets me create efficient register words dynamically.
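As a quick sanity check of the "$10 lower" rule, here is a small Python table of zp,X and zp opcode pairs from the standard 6502 opcode map. The names `ZP_PAIRS` and `to_direct` are made up for illustration and are not part of Taliforth.

```python
# (zp,X opcode, zp opcode) pairs from the standard 6502 opcode table.
ZP_PAIRS = {
    "ORA": (0x15, 0x05), "AND": (0x35, 0x25), "EOR": (0x55, 0x45),
    "ADC": (0x75, 0x65), "STA": (0x95, 0x85), "LDA": (0xB5, 0xA5),
    "CMP": (0xD5, 0xC5), "SBC": (0xF5, 0xE5), "ASL": (0x16, 0x06),
    "ROL": (0x36, 0x26), "LSR": (0x56, 0x46), "ROR": (0x76, 0x66),
    "DEC": (0xD6, 0xC6), "INC": (0xF6, 0xE6), "STY": (0x94, 0x84),
    "LDY": (0xB4, 0xA4),
}

ZPX_OPCODES = {zpx for zpx, _ in ZP_PAIRS.values()}


def to_direct(opcode: int) -> int:
    """Map a zp,X opcode to its zp twin (hypothetical helper)."""
    if opcode not in ZPX_OPCODES:
        raise ValueError(f"${opcode:02X} is not a zp,X opcode in this table")
    return opcode - 0x10


# Every pair in the table differs by exactly $10.
assert all(zpx - zp == 0x10 for zpx, zp in ZP_PAIRS.values())
```

For example, `to_direct(0x75)` gives `0x65`, the ADC case mentioned above.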

See below for a few examples. Taliforth's existing stack word implementation is on the left, and the transformed code is on the right. Each transformed word is illustrated with a register at zp $42/43 replacing a targeted stack offset. Sometimes the results are obvious: a word like INVERT ( x -- ~x ) produces a word like INVERT-REGISTER42, which does an inline invert of zp $42/43. From "+" ( x y -- x+y ) targeting NOS we get a word that does an inline addition of TOS into the register. But sometimes the results are surprising and interesting: DUP targeting TOS becomes REGISTER42@, and NIP targeting NOS becomes REGISTER42!

Code:

        w_invert:                       zp_invert:
        ; ( x -- ~x )                   ; r42: x -> ~x
        ; change TOS refs (mode: zp,x) to register refs (mode: zp)
a9 ff   lda #$FF                a9 ff   lda #$FF
55 00   eor 0,x                 45 42   eor $42
95 00   sta 0,x                 85 42   sta $42
a9 ff   lda #$FF                a9 ff   lda #$FF
55 01   eor 1,x                 45 43   eor $43
95 01   sta 1,x                 85 43   sta $43
60      rts                     60      rts

        w_plus:                         zp_plus:
        ; ( x y -- x+y )                ; ( y -- ) r42: x -> x+y
        ; change NOS refs (mode: zp,x) to register refs (mode: zp)
18      clc                     18      clc
b5 00   lda 0,x                 b5 00   lda 0,x
75 02   adc 2,x                 65 42   adc $42
95 02   sta 2,x                 85 42   sta $42
b5 01   lda 1,x                 b5 01   lda 1,x
75 03   adc 3,x                 65 43   adc $43
95 03   sta 3,x                 85 43   sta $43
e8      inx                     e8      inx
e8      inx                     e8      inx
60      rts                     60      rts

        w_nip:                          zp_store:
        ; ( x y -- y )                  ; ( y -- )  r42: ? -> y
        ; change NOS refs (mode: zp,x) to register refs (mode: zp)
b5 00   lda 0,x                 b5 00   lda 0,x
95 02   sta 2,x                 85 42   sta $42
b5 01   lda 1,x                 b5 01   lda 1,x
95 03   sta 3,x                 85 43   sta $43
e8      inx                     e8      inx
e8      inx                     e8      inx
60      rts                     60      rts

        w_dup:                          zp_fetch:
        ; ( x -- x x )                  ; r42: r  ( -- r )
        ; change old-TOS refs (offset 2 after dex/dex, mode: zp,x) to register refs (mode: zp)
ca      dex                     ca      dex
ca      dex                     ca      dex
b5 02   lda 2,x                 a5 42   lda $42
95 00   sta 0,x                 95 00   sta 0,x
b5 03   lda 3,x                 a5 43   lda $43
95 01   sta 1,x                 95 01   sta 1,x
60      rts                     60      rts
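The listings above can be reproduced with a small Python model of the transformation. This is only a sketch of the idea, not the actual MAKE-REGISTER-WORD: the function name, argument order and the tiny instruction-length table (covering just the opcodes in these four words) are assumptions. It walks the code one instruction at a time so that operand bytes are never mistaken for opcodes.

```python
# zp,X opcodes used in the listings: eor, adc, sta, lda.
ZPX_OPCODES = {0x55, 0x75, 0x95, 0xB5}

# Lengths of the other opcodes in the listings (illustrative subset only):
# lda #, clc, inx, dex, rts, jsr.
OTHER_LENGTHS = {0xA9: 2, 0x18: 1, 0xE8: 1, 0xCA: 1, 0x60: 1, 0x20: 3}


def make_register_word(code: bytes, stack_offset: int, reg: int) -> bytes:
    """Retarget zp,X refs at stack_offset / stack_offset+1 to zp reg / reg+1."""
    out = bytearray()
    i = 0
    while i < len(code):
        op = code[i]
        if op in ZPX_OPCODES:
            operand = code[i + 1]
            if operand in (stack_offset, stack_offset + 1):
                # Same instruction, direct zp mode: opcode - $10, new address.
                out += bytes([op - 0x10, reg + (operand - stack_offset)])
            else:
                out += code[i:i + 2]      # other stack cells stay indexed
            i += 2
        elif op in OTHER_LENGTHS:
            n = OTHER_LENGTHS[op]
            out += code[i:i + n]
            i += n
        else:
            raise ValueError(f"opcode ${op:02X} not handled by this sketch")
    return bytes(out)


# Object code copied from the left-hand columns above.
W_INVERT = bytes.fromhex("a9ff 5500 9500 a9ff 5501 9501 60")
W_PLUS = bytes.fromhex("18 b500 7502 9502 b501 7503 9503 e8 e8 60")
W_NIP = bytes.fromhex("b500 9502 b501 9503 e8 e8 60")
W_DUP = bytes.fromhex("ca ca b502 9500 b503 9501 60")

zp_invert = make_register_word(W_INVERT, 0, 0x42)
zp_plus = make_register_word(W_PLUS, 2, 0x42)
zp_store = make_register_word(W_NIP, 2, 0x42)
zp_fetch = make_register_word(W_DUP, 2, 0x42)
```

Note that DUP is transformed with stack offset 2, because after the `dex dex` the original TOS sits at `2,x`. A real version would need a complete instruction-length table and would have to decide what to do with words that call or branch elsewhere.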

Posted: **Wed Jan 28, 2026 9:16 pm**

Have you done any timings to see if it helped Forth performance? I ask because I'm fairly sure a major portion of Forth overhead is the threading model and not the words themselves. For example, while the plus and store words are both pretty short, they're not inlined into the words they're embedded within. This results in a bunch of function call overhead, which slows the Forth down.

This was an interest of mine three years ago in my Turing Tarpit programming challenge thread (viewtopic.php?f=2&t=7262&hilit=brainfast). I ran performance tests of a simple tokenized interpreter vs. byte code vs. subroutine-threaded code vs. compiled code. There were gradual improvements with each design until the compiler's massive leap forward. The conclusion I reached was that threading overhead, for any threading model, is the major cost.

Posted: **Fri Jan 30, 2026 8:46 pm**

Martin_H wrote:

For example, while the plus and store words are both pretty short, they're not inlined into the words they're embedded within. This results in a bunch of function call overhead, which slows the Forth down.

Fortunately, Tali Forth 2 supports inlining words (it's called "native compiling" in the manual). By default, words less than 20 bytes will be compiled by copying the opcodes directly into the target word. Tali is an STC Forth so there is nothing special to do to switch between Forth words and code. You can also mark words as ALWAYS-NATIVE or NEVER-NATIVE (these words can be invoked right after the word definition is complete, similar to IMMEDIATE) to force the compiler to always or never do native compiling when that word is compiled.

The 20-byte limit is adjustable by setting a special NC-LIMIT variable; you can even set it to zero to get straight STC compiling where every word is compiled as a JSR.

Here's an example using invert and + (the JSRs to the stack-depth check can be turned off with a separate special variable):

Code:

: myword invert + ;  ok
see myword 
nt: 800  xt: 80B  header: 01 06 0E BF 21 
flags: HC 0 NN 0 AN 0 IM 0 CO 0 DC 0 LC 0 FP 1 | UF 1 ST 0 
size (decimal): 33 

080B  20 B9 80 A9 FF 55 00 95  00 A9 FF 55 01 95 01 20   ....U.. ...U... 
081B  BE 80 18 B5 00 75 02 95  02 B5 01 75 03 95 03 E8  .....u.. ...u....
082B  E8                                                .

80B   80B9 jsr     1 STACK DEPTH CHECK
80E     FF lda.#
810      0 eor.zx
812      0 sta.zx
814     FF lda.#
816      1 eor.zx
818      1 sta.zx
81A   80BE jsr     2 STACK DEPTH CHECK
81D        clc
81E      0 lda.zx
820      2 adc.zx
822      2 sta.zx
824      1 lda.zx
826      3 adc.zx
828      3 sta.zx
82A        inx
82B        inx
 ok
0 nc-limit !  ok
: myword2 invert + ;  ok
see myword2 
nt: 82D  xt: 838  header: 00 07 00 06 
flags: HC 0 NN 0 AN 0 IM 0 CO 0 DC 0 LC 0 FP 0 | UF 0 ST 0 
size (decimal): 6 

0838  20 8D 88 20 35 8D                                  .. 5.

838   888D jsr     invert
83B   8D35 jsr     +
 ok

Both MYWORD and MYWORD2 have neither the Always-Native (AN) flag nor the Never-Native (NN) flag set in their header flags, so they, in turn, will be compiled based on the setting in NC-LIMIT.
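The two SEE dumps above can be mimicked with a toy Python model of the native-compile decision. The word bodies and addresses are copied from the dumps; everything else is an assumption for illustration, including the function name, the flag strings, and the exact size comparison (here "body length without RTS is less than the limit"), which may differ from what Tali actually does.

```python
JSR = 0x20

# Word bodies as they appear inlined in MYWORD (stack-depth check included,
# trailing RTS dropped), plus the xt addresses shown in the MYWORD2 dump.
WORDS = {
    "invert": {
        "xt": 0x888D,
        "body": bytes.fromhex("20b980 a9ff 5500 9500 a9ff 5501 9501"),
    },
    "+": {
        "xt": 0x8D35,
        "body": bytes.fromhex("20be80 18 b500 7502 9502 b501 7503 9503 e8 e8"),
    },
}


def compile_word(names, nc_limit, flags=None):
    """Compile a list of word names, inlining or emitting JSR per word.

    flags maps a word name to "AN" (always native) or "NN" (never native);
    both the mapping and the '<' comparison are assumptions of this sketch.
    """
    flags = flags or {}
    out = bytearray()
    for name in names:
        word = WORDS[name]
        flag = flags.get(name)
        inline = flag == "AN" or (flag != "NN" and len(word["body"]) < nc_limit)
        if inline:
            out += word["body"]                      # copy the opcodes
        else:
            xt = word["xt"]
            out += bytes([JSR, xt & 0xFF, xt >> 8])  # little-endian target
    return bytes(out)


myword = compile_word(["invert", "+"], nc_limit=20)   # 33 bytes, as in the dump
myword2 = compile_word(["invert", "+"], nc_limit=0)   # 6 bytes: two JSRs
```

Marking a word "NN" in this model forces a JSR even under the default limit, and "AN" forces inlining even with the limit at zero, which is the behaviour described for NEVER-NATIVE and ALWAYS-NATIVE.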

Posted: **Sun Feb 01, 2026 11:08 pm**

Yes, exactly. It can make a big performance difference in Taliforth. Having some words that inline access to a few key zero-page registers would avoid a bunch of (even inline) stack manipulation in tight loops.
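For a rough sense of scale, here is a cycle count for the listings earlier in the thread, using standard 6502 timings (immediate 2, zp 3, zp,X 4, implied 2, JSR and RTS 6 each). The mnemonic strings follow the `lda.zx` style of the SEE output; the `.z` suffix for plain zero page and the helper name `cycles` are assumptions made for this sketch. It only compares the per-word code and says nothing about the larger saving from not pushing an address and fetching or storing through it.

```python
# Cycle cost by addressing-mode suffix for lda/eor/adc/sta and the implied
# clc/inx/dex instructions; zero-page accesses have no page-crossing penalty.
CYCLES = {"#": 2, "z": 3, "zx": 4, "": 2}
JSR_PLUS_RTS = 6 + 6   # extra cost when a word is called instead of inlined


def cycles(listing: str) -> int:
    """Sum the cycles of a space-separated listing such as 'clc lda.zx inx'."""
    total = 0
    for token in listing.split():
        _, _, suffix = token.partition(".")
        total += CYCLES[suffix]
    return total


# Bodies from the earlier listings, without the trailing rts.
PLUS_STACK = "clc lda.zx adc.zx sta.zx lda.zx adc.zx sta.zx inx inx"
PLUS_REG = "clc lda.zx adc.z sta.z lda.zx adc.z sta.z inx inx"
DUP_STACK = "dex dex lda.zx sta.zx lda.zx sta.zx"
FETCH_REG = "dex dex lda.z sta.zx lda.z sta.zx"

print(cycles(PLUS_STACK), cycles(PLUS_REG))    # 30 26
print(cycles(DUP_STACK), cycles(FETCH_REG))    # 20 18
print(cycles(PLUS_STACK) + JSR_PLUS_RTS)       # 42 when called via JSR
```

So each retargeted access saves one cycle, while inlining saves 12 cycles per call regardless of which version is used.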

Conversely, a bunch of non-performance-critical Tali words are effectively inlining access to "registers" like tmp1 and tmp2 all over the place. I wonder if it would make sense to expose some of that as reusable words.

Transmogrify stack words into zp register words