6502.org

Posted: **Sat Mar 28, 2026 6:40 pm**

I asked Claude to rewrite a routine using stack locals with a coding sample in context and the following prompt.

I have not unit tested the ACCEPT_CODE routine, so could you please rewrite it using the following best coding practices:

Use PHD/TCD to point the direct page register at the stack frame so locals can use direct page addressing modes including INC, DEC, and indirect (loc).

Do not use SCRATCH0, SCRATCH1, TMPA, TMPB, or W — all temporaries must live in the stack frame.

Since DP is relocated, all parameter stack accesses via X must use absolute forced addressing, e.g. LDA a:0,X.

Reserve stack space with TSC / SEC / SBC #SIZE / TCS before TCD, and save/restore PHD/PLD around the frame.

Use a single @return label with BRA from success and fall-through from failure, with stack teardown, PLY, and PLD done once.

Claude responded with the following code, which looks pretty good. It doesn't use the INC that BDD suggested to allow zero-based offsets, but one step at a time.

Code: Select all

;------------------------------------------------------------------------------
; ACCEPT ( addr len -- actual ) read a line from UART into buffer
;------------------------------------------------------------------------------
        HEADER  "ACCEPT", ACCEPT_CFA, 0, TIB_CFA
        CODEPTR ACCEPT_CODE
        PUBLIC  ACCEPT_CODE
        .a16
        .i16
                LDA     0,X             ; max len
                STA     TMPA
                INX
                INX
                LDA     0,X             ; addr
                INX
                INX
                STA     SCRATCH0        ; Buffer pointer
                STZ     SCRATCH1        ; Char count = 0

@getchar:
                JSR     hal_getch       ; Blocking receive, char in A
                STA     TMPB            ; Save char for later use

                ; Handle CR → end of line
                CMP     #$0D
                BEQ     @done

                ; Handle backspace (BS or DEL)
                CMP     #$08
                BEQ     @backspace
                CMP     #$7F
                BEQ     @backspace

                ; Check buffer full - ignore char if so
                LDA     SCRATCH1
                CMP     TMPA
                BCS     @getchar

                ; Store char in buffer
                PHY                     ; Save IP
                LDY     SCRATCH1        ; Index = current count
                SEP     #MEM16
                .A8
                LDA     TMPB            ; Restore char
                STA     (SCRATCH0),Y    ; Store in buffer
                REP     #MEM16
                .A16
                PLY                     ; Restore IP

                ; Echo char back
                LDA     TMPB
                JSR     hal_putch

                ; Increment count
                INC     SCRATCH1
                BRA     @getchar

@backspace:
                LDA     SCRATCH1
                BEQ     @getchar        ; Nothing to delete
                DEC     SCRATCH1
                ; Echo backspace-space-backspace to erase character on terminal
                LDA     #$08
                JSR     hal_putch
                LDA     #$20
                JSR     hal_putch
                LDA     #$08
                JSR     hal_putch
                BRA     @getchar

@done:
                ; Echo CR+LF
                LDA     #$0D
                JSR     hal_putch
                LDA     #$0A
                JSR     hal_putch
                ; Push actual char count
                LDA     SCRATCH1
                DEX
                DEX
                STA     0,X
                NEXT
        ENDPUBLIC

Posted: **Sat Mar 28, 2026 7:31 pm**

BigDumbDinosaur wrote:

Don’t forget you can set/clear any of the flags with SEP/REP, which is non-destructive and completes in three Ø2 cycles. DEC on a direct-page location requires between five and seven cycles, depending on accumulator width and DP alignment.

In this case the negative sign Boolean is stored during the initial parse and then used at the end of the routine to apply the two's complement. So, setting a processor flag would be undone before it could be used. The DEC is only used once during the routine, so the penalty is small.

I've reworked ACCEPT_CODE to use INC and zero-based offsets. When I finish testing it, I will work with Claude to amend my prompt to include that in the future.

After that I will circle back to NUMBER_CODE and use zero-based offsets there. With my unit tests I can avoid regression.

Update: I've concluded I'm confused about the INC before TCD. Since a subtraction pulls the stack pointer down, doesn't the INC push it up and increase the chance of overwriting the return address?

Posted: **Sun Mar 29, 2026 1:20 am**

I hate to say this, but I think I am getting a handle on how to prompt Claude into writing useful code. Here's our most recent dialog after completing the unit testing of ACCEPT_CODE:

Me: I completed unit testing of the ACCEPT_CODE routine, and it required minor bugfixes. I also improved it by removing unneeded code. I will paste it after this prompt. Please review it, tell me why I made these changes, and suggest any prompt improvements.

Claude: Please go ahead and paste it.

Me: Pastes code.

Claude:

Code: Select all

Good set of changes. Here's what you fixed and why:
* PHY/PLY removed around the byte store. My version saved and restored Y around the (LOC_BUF),Y store because Y holds the IP. But since TCD has pointed DP at the stack frame, LOC_COUNT is a direct page access — so LDY LOC_COUNT loads the count directly without needing Y to be the IP at that moment. The IP is already saved in the frame, so borrowing Y here is safe.
* LOC_CHAR not reloaded before echo. My version did LDA LOC_CHAR before hal_putch. Since hal_putch takes its argument in A, and A still holds the character after the SEP/REP block (the 8-bit store doesn't clobber A on return to 16-bit mode), the reload was redundant. You just call hal_putch directly.
* Backspace uses TAY / BEQ instead of LDA LOC_COUNT / BEQ. On entry to @backspace, A still holds the received character ($08 or $7F) — not the count. My version loaded LOC_COUNT into A to test for zero. Your version correctly uses TAY to transfer LOC_COUNT into Y (which also sets the zero flag), then BEQ @getchar. This is both correct and saves a load.
* Named constants replace magic numbers. $0D, $08, $7F, $20, $0A are replaced with C_RETURN, BKSP, DEL, SPACE, L_FEED. This improves readability and maintainability.
* @return label moved. The result load and stack push are now inside @return rather than in @done, which correctly implements the single-exit pattern. In my version I had LDA LOC_COUNT before @return which meant the label wasn't truly a single return point.

Prompt improvements:
* Specify that A is not clobbered by SEP/REP transitions, so the model doesn't insert redundant reloads after mode switches.
* Specify that Y is available as a scratch register within a routine where IP is already saved in the stack frame, so the model doesn't add unnecessary PHY/PLY pairs around short Y usage.
* Specify that flags set by the last operation should be exploited where possible — e.g. "use TAY / BEQ rather than a separate load to test a value that is already available or can be transferred cheaply."
* Require named constants for all control characters and special values — "do not use numeric literals for character codes; use named constants such as C_RETURN, BKSP, DEL, SPACE, L_FEED."
* Clarify the single @return pattern more precisely — "the @return label must encompass the result push and all stack/register teardown; success and failure paths must BRA @return or fall through into it, not perform any work after the label."

Me: Regarding point 5. I also moved the "STA a:0,X" above the stack frame teardown because the TSC clobbers the value in A. So, it had to be stored first.
Claude:

Code: Select all

Good catch — that's an important ordering constraint. TSC transfers the stack pointer into A, clobbering whatever result was there. So the result must be written to the parameter stack with STA a:0,X before the teardown sequence (TSC / CLC / ADC #LOC_SIZE / TCS), not after.
That refines prompt point 5 to: the result must be pushed to the parameter stack before stack frame teardown, because TSC clobbers A. The order must always be:

1. DEX / DEX — make room on parameter stack
2. STA a:0,X — store result
3. TSC / CLC / ADC / TCS — tear down frame
4. PLY — restore IP
5. PLD — restore DP
6. NEXT
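
The ordering above, together with the matching frame setup, can be sketched as follows. This is a minimal illustration in ca65 syntax, not the tested ACCEPT_CODE routine. LOC_MAX and the exact frame layout are assumptions made for the example, the accumulator is assumed to be 16 bits wide, and because DP is set equal to SP (no INC), the first usable local sits at offset 1.

```asm
; Hypothetical frame layout. DP = SP after TCD, so offset 0 is the
; unused byte at SP and the locals start at offset 1.
LOC_BUF         = 1             ; buffer pointer (2 bytes)
LOC_COUNT       = 3             ; characters stored so far (2 bytes)
LOC_CHAR        = 5             ; last character received (2 bytes)
LOC_MAX         = 7             ; maximum length (2 bytes, assumed name)
LOC_SIZE        = 8             ; total bytes reserved

                PHD                     ; save caller's direct page
                PHY                     ; save IP
                TSC
                SEC
                SBC     #LOC_SIZE       ; reserve the frame
                TCS
                TCD                     ; DP = SP
                ; ... body: locals via direct page, parameter stack via a:0,X ...

@return:
                LDA     LOC_COUNT       ; result, read while DP still points at the frame
                DEX                     ; 1. make room on the parameter stack
                DEX
                STA     a:0,X           ; 2. store result before TSC clobbers A
                TSC                     ; 3. tear down the frame
                CLC
                ADC     #LOC_SIZE
                TCS
                PLY                     ; 4. restore IP
                PLD                     ; 5. restore DP
                NEXT                    ; 6. continue with the next word
```

The pulls mirror the pushes in reverse: PHD then PHY on entry, PLY then PLD on exit.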

What have I learned?
First, describe the problem to Claude and build a framework. Don't expect it to assemble, link, or work. It's just an outline.
Second, start unit testing the foundational pieces. Ask Claude to review your changes and suggest prompt improvements.
Third, use the updated prompt and ask Claude to write something small. Test it, then ask Claude to review your fixes and suggest prompt improvements.
Fourth, at some point the session will get too long and Claude will start getting confused. Update your prompt summary and ask Claude to review it. Incorporate anything you missed.
Fifth, close that session and start a new one.
Sixth, don't ask Claude to do too much at one time. It tends to lose focus and edit unrelated code. Specific is terrific. But always keep iterating on prompt improvements.

Posted: **Sun Mar 29, 2026 3:51 am**

Martin_H wrote:

Update: I've concluded I'm confused about the INC before TCD. Since a subtraction pulls the stack pointer down, doesn't the INC push it up and increase the chance of overwriting the return address?

Not if you do things in the correct order. If, for example, you are allocating S_SF bytes for local direct page use, you would be writing...

Code: Select all

s_sf     =8                    ;temp workspace size
;
         rep #%00100000        ;16-bit accumulator
         sec
         tsc                   ;SP —> .C
         sbc !#s_sf            ;reserve workspace
         tcs                   ;move SP below workspace
         inc                   ;point to start of workspace
         tcd                   ;DP starts at SP + 1

Assuming the above code is within a subroutine, SP would point to the highest unused location on the stack, SP+1 would point to the start of workspace, SP+8 would point to the end of workspace, SP+9 would point to the LSB of the return address and SP+10 would point to the MSB of the return address. As long as you confine your direct page accesses to the range $00 - $07, the return address will be safe. STA 8 would step on the return address LSB.

As I earlier noted, the object of the above sequence is to make LDA 0 actually load from address $00 of the relocated direct page, as one would expect. Without the INC before the TCD, LDA 0 will access the highest unused location on the stack, not the lowest location in your ephemeral direct page. I can guarantee you that situation will eventually lead to an obdurate bug...been there, done that!

When the function has finished, you would relinquish the workspace with...

Code: Select all

         rep #%00100000        ;16-bit accumulator
         clc
         tsc                   ;SP —> .C
         adc !#s_sf            ;workspace size
         tcs                   ;workspace is gone

That’s all there is to it!

I typically make quantities such as S_SF local symbols, since they usually should be known only to the function in which they are used. Also, if all I need to reserve on the stack is two bytes, I do...

Code: Select all

         rep #%00100000        ;16-bit accumulator
         pea #0                ;reserved 2 bytes on stack
         tsc                   ;SP —> .C
         inc
         tcd

A key consideration with this sort of stack rigamarole is SP always points to the highest unused stack location. That is, the active part of the stack is from SP+1 upwards. Hence you should always use SP+1 as your base when borrowing stack space.

Posted: **Sun Mar 29, 2026 12:14 pm**

@BDD, I think I get it now. It's about aligning the expectation that the zeroth address of the local DP is within the local DP. However, if you only use named offsets you wouldn't notice this outside of the stack frame setup block. To change it I would need to adjust the offsets and introduce the INC instruction.
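
A sketch of what that change might look like, in ca65 syntax with a 16-bit accumulator. The offsets and the LOC_MAX name are assumptions for illustration. The INC acts only on the accumulator after TCS has already set SP, so the stack pointer itself is not moved back up.

```asm
; Hypothetical zero-based layout: DP = SP + 1, so offset 0 is the first local.
LOC_BUF         = 0             ; buffer pointer (2 bytes)
LOC_COUNT       = 2             ; characters stored so far (2 bytes)
LOC_CHAR        = 4             ; last character received (2 bytes)
LOC_MAX         = 6             ; maximum length (2 bytes, assumed name)
LOC_SIZE        = 8             ; total bytes reserved

                PHD                     ; save caller's direct page
                PHY                     ; save IP
                TSC
                SEC
                SBC     #LOC_SIZE       ; reserve the frame
                TCS                     ; SP sits just below the frame
                INC     A               ; A = SP + 1, the lowest byte of the frame
                TCD                     ; DP = SP + 1
```

The teardown sequence does not change, because it works from SP rather than DP.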

@All, I used my improved prompt and asked Claude to write FIND_CODE. I haven't unit tested it yet, but here are the first observations.

Good news.
* Claude generated a FIND routine that assembled on the first try.
* It looks logically and structurally correct.

Medium News
* If you put Claude generated routines side by side, the coding style looks slightly different.
* The forth-standard.org website has a no robots policy so I can't ask Claude to read links to it.
* Claude loves messy pointer arithmetic to access header fields.

Bad News
* Claude tried to load the page zero User Pointer after setting up the local direct page. I don't think this will work.

Code: Select all

                ;--------------------------------------------------------------
                ; Load LATEST to start dictionary walk
                ;--------------------------------------------------------------
                LDY     #U_LATEST
                LDA     (UP),Y          ; LATEST -> first entry to check
                STA     LOC_ENTRY

This raises a deeper question. Stack locals can eliminate page zero scratch addresses. But the UP is a page zero pointer to the interpreter's configuration data block. If I want to eliminate it, I would need to use absolute addressing to the page 4 data block. The UP was clearly borrowed from FIG-Forth and I'm wondering if there's something I'm missing about it.

Posted: **Sun Mar 29, 2026 5:16 pm**

Martin_H wrote:

@BDD, I think I get it now. It's about aligning the expectation that the zeroth address of the local DP is within the local DP. However, if you only use named offsets you wouldn't notice this outside of the stack frame setup block. To change it I would need to adjust the offsets and introduce the INC instruction.

The 65C816, of course, doesn’t know about any of these stack shenanigans and never gets confused. If any confusion is going to occur, it will be on the part of the programmer (you). That’s why I recommend you use the sequence I illustrated. If you define your local direct page so its lowest address is always $00, and symbolically define everything instead of using “magic numbers,” you won’t accidentally address space outside of the local direct-page range and thus won’t be trying to debug errors caused by improper stack accesses. You can take my word for it that such debugging can become very challenging.

Incidentally, while it may not be immediately obvious, setting DP as illustrated gives you an absolute reference address in bank $00 at which your active stack begins. Knowing this, your function could temporarily set DP to somewhere else, yet you could still access your local stack workspace by copying DP to .Y (with the index registers set to 16 bits), and using $00,Y as the operand to any instruction that allows it. You could also use $0000,X if your assembler allows you to force absolute-indexed addressing with .X. These methods would be alternatives to <offset>,S addressing.
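
A minimal sketch of the index-register alternative, written in ca65 syntax rather than Kowalski syntax. OTHER_DP and LOC_COUNT are hypothetical names. It assumes 16-bit accumulator and index registers, and it assumes DB is $00, because absolute-indexed addressing forms its address with DB while the stack always lives in bank $00.

```asm
                TDC                     ; A = DP = base address of the local frame
                TAY                     ; keep the frame base in Y (16-bit index)
                LDA     #OTHER_DP       ; hypothetical: another direct page in bank $00
                TCD                     ; DP now points elsewhere
                LDA     a:LOC_COUNT,Y   ; still reads the frame local: base + offset
                TYA
                TCD                     ; point DP back at the frame
```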

In any of my functions that use a local direct page, I have a sort of boilerplate series of equates that define the three possible stack frames that may be used within a function, which are parameter, register and work. The parameter frame’s structure is dictated by what the function requires as input. The register frame reflects how much machine state is being preserved by the function. The work frame contains whatever local variables, pointers, etc., that will be needed.

Below, in Kowalski assembler syntax, is an example in which three 32-bit pointers are passed as parameters into the function and machine state is fully preserved...

Code: Select all

;—————————————————————————————————————————————————————————
;
;LOCAL DEFINITIONS
;
.s_byte  =1                    ;size of byte
.s_word  =2                    ;size of word
.s_dword =4                    ;size of long word
;
;
;	65C816 register sizes...
;
.s_mpudb =.s_byte              ;data bank
.s_mpudp =.s_word              ;direct page
.s_mpupb =.s_byte              ;program bank
.s_mpupc =.s_word              ;program counter
.s_mpusp =.s_word              ;stack pointer
.s_mpusr =.s_byte              ;status
;
;
;	65C816 register width masks...
;
.seta    =%00100000            ;accumulator
.setx    =%00010000            ;index
.setr    =.seta | .setx        ;all
;
;
;	status register bits...
;
.sr_car  =%00000001            ;C — carry
.sr_bdm  =%00001000            ;D — decimal
.sr_irq  =%00000100            ;I — IRQ
.sr_neg  =%10000000            ;N — result negative
.sr_ovl  =%01000000            ;V — sign overflow
.sr_zer  =%00000010            ;Z — result zero
;
;
;	stack definitions...
;
.sfbase  .set 0                ;base stack index
.sfidx   .set .sfbase          ;workspace index
;
;—————————> workspace stack frame start <—————————
;
.dhptr   =.sfidx               ;shift destination pointer
.sfidx   .= .sfidx+.s_dword
.srptr   =.sfidx               ;shift source pointer
.sfidx   .= .sfidx+.s_dword
.ssiz    =.sfidx               ;S's old size (SS)
.sfidx   .= .sfidx+.s_word
.wsiz    =.sfidx               ;S's new size (WS)
.sfidx   .= .sfidx+.s_word
;
;—————————> workspace stack frame end <—————————
;
.s_wsf   =.sfidx-.sfbase       ;workspace size
.sfbase  .set .sfidx
;
;—————————> register stack frame start <—————————
;
.reg_dp  =.sfidx               ;DP
.sfidx   .= .sfidx+.s_mpudp
.reg_db  =.sfidx               ;DB
.sfidx   .= .sfidx+.s_mpudb
.reg_c   =.sfidx               ;.C
.sfidx   .= .sfidx+.s_word
.reg_x   =.sfidx               ;.X
.sfidx   .= .sfidx+.s_word
.reg_y   =.sfidx               ;.Y
.sfidx   .= .sfidx+.s_word
.reg_sr  =.sfidx               ;SR
.sfidx   .= .sfidx+.s_mpusr
.reg_pc  =.sfidx               ;PC
.sfidx   .= .sfidx+.s_mpupc
;
;—————————> register stack frame end <—————————
;
.s_rsf   =.sfidx-.sfbase       ;register frame size
.sfbase  .set .sfidx
;
;—————————> parameter stack frame start <—————————
;
.sptr    =.sfidx               ;*S
.sfidx   .= .sfidx+.s_dword
.iptr    =.sfidx               ;*I
.sfidx   .= .sfidx+.s_dword
.nptr    =.sfidx               ;*N
.sfidx   .= .sfidx+.s_dword
;
;—————————> parameter stack frame end <—————————
;
.s_psf   =.sfidx-.sfbase       ;parameter frame size
;
;—————————————————————————————————————————————————————————

In the above, the .SET and .= pseudo-ops define local assembly-time variables—the two pseudo-ops are interchangeable. Use of assembly-time variables makes much of the frame structure computation automatic. The .S_WSF, .S_RSF and .S_PSF symbols define the sizes of the three frames; these symbols are used as operands in code that reserves and later releases stack space.
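
For readers using ca65 instead of the Kowalski assembler, a roughly similar effect can probably be had with a .struct, whose member names evaluate to offsets starting at zero. The sketch below is an assumption about how the work frame above might be expressed that way, not something taken from the original function. WorkFrame and S_WSF are hypothetical names.

```asm
.struct WorkFrame                       ; hypothetical frame, offsets start at 0
        dhptr   .dword                  ; shift destination pointer
        srptr   .dword                  ; shift source pointer
        ssiz    .word                   ; old size
        wsiz    .word                   ; new size
.endstruct

S_WSF           = .sizeof(WorkFrame)    ; 12 bytes

                SEC
                TSC
                SBC     #S_WSF          ; reserve the work frame
                TCS
                INC     A
                TCD                     ; WorkFrame::dhptr is now direct page offset 0
```

Adding or reordering members changes the offsets and the size automatically, which is the same benefit the assembly-time variables provide.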

The stack frame setup in the function is as follows...

Code: Select all

         php                   ;preserve state
         rep #.setr|.sr_bdm    ;16-bit registers, binary arithmetic
         phy
         phx
         pha
         phb
         phd
         sec
         tsc
         sbc !#.s_wsf          ;allocate workspace
         tcs
         inc                   ;point DP to...
         tcd                   ;reserved stack space

The above reflects the layout previously defined for the register frame. Notice how I push .Y before .X. Doing so makes it possible to use the symbol .REG_X as the operand for indirect-long addressing, e.g., LDA [.REG_X], should the need arise; .REG_X and .REG_Y become a direct-page pointer once DP has been pointed to the stack. This particular function doesn’t take advantage of that, but for consistency’s sake, I religiously follow this pattern.

At the function’s exit, a sequence of operations gets rid of the work and parameter frames, and realigns the stack so when the registers are pulled (thus getting rid of the register frame), all that will be left on the stack will be the return address. The caller will not have to worry about cleaning up after itself...

Code: Select all

.done    rep #.setr|.sr_car    ;16-bit registers & clear carry
         tsc                   ;here we clean up the stack...
         adc !#.s_wsf          ;get rid of work frame
         tcs
         adc !#.s_rsf          ;following code...
         tax                   ;gets rid of...
         adc !#.s_psf          ;the parameter frame &...
         tay                   ;realigns...
         lda !#.s_rsf-1        ;the stack
         mvp #0,#0
         tyx                   ;point to...
         txs                   ;register frame -1
         pld                   ;restore entry state & return
         plb
         pla
         plx
         ply
         plp
         rts

Again, this is boilerplate code; every function that sets up the three stack frames uses the same code.

Although it may not have been obvious why I had preserved DB earlier in the function, it becomes so when you consider that the MVP instruction finishes with DB pointing, in this case, to bank $00. Since this function is callable from any bank, and the caller may have set DB to the caller’s execution bank, no assumption can be made about the state of the register.

Within the function, any register can be rewritten, simply by using its register frame symbol as a direct-page operand. For example, if I want to set carry on exit to indicate a processing exception, I could do the following...

Code: Select all

         sep #.seta            ;8-bit accumulator & memory
         lda #.sr_car          ;carry mask
         tsb .reg_sr           ;set carry in SR return

With everything being addressable as direct page in the function, register manipulation is painless.

The function is called as follows...

Code: Select all

;	Invocation example: PEA #N_PTR >> 16    ;*N MSW
;	                    PEA #N_PTR & $FFFF  ;*N LSW
;	                    PEA #I_PTR >> 16    ;*I MSW
;	                    PEA #I_PTR & $FFFF  ;*I LSW
;	                    PEA #S_PTR >> 16    ;*S MSW
;	                    PEA #S_PTR & $FFFF  ;*S LSW
;	                    JSR STRDEL          ;delete substring
;	                    BCS ERROR           ;error
;
;	Exit registers: .A: entry value
;	                .B: entry value
;	                .X: entry value
;	                .Y: entry value
;	                DB: entry value
;	                DP: entry value
;	                PB: entry value
;	                SR: NVmxDIZC
;	                    ||||||||
;	                    |||||||+———> 0: okay
;	                    |||||||      1: error
;	                    +++++++————> entry value

References such as *S and *I are like K&R C notation: *S is a pointer to an object S. Incidentally, pointers are 32 bits, even though the 816 is limited to 24-bit addressing. Doing 24-bit pointer arithmetic is awkward, as it is necessary to constantly fiddle with the accumulator’s width. A byte is wasted per pointer, but not having to constantly use REP and SEP while manipulating pointers more than makes up for the waste by using fewer clock cycles and offering fewer opportunities for bugs to erupt.
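
To illustrate the width fiddling that 32-bit pointers avoid, here is a small sketch of copying a pointer each way. It uses ca65 syntax, and src24, dst24, src32 and dst32 are hypothetical direct-page locations. Both fragments assume they start with a 16-bit accumulator.

```asm
; 24-bit pointer: the bank byte needs an 8-bit access.
                LDA     src24           ; low 16 bits
                STA     dst24
                SEP     #$20            ; 8-bit accumulator
                .a8
                LDA     src24+2         ; bank byte
                STA     dst24+2
                REP     #$20            ; back to 16 bits
                .a16

; 32-bit pointer: two 16-bit moves, no width changes.
                LDA     src32           ; low 16 bits
                STA     dst32
                LDA     src32+2         ; bank byte plus one unused byte
                STA     dst32+2
```

Doing the 24-bit copy with two 16-bit stores instead would overwrite whatever byte follows dst24+2.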

Posted: **Sun Mar 29, 2026 5:24 pm**

I have to say, I don't quite see why it's so terrible not to have offset 0 available. Almost all 65xx programs work in a world where zero page (direct page) resources are limited, and most routines will have a limited range of addresses they can use. That 0 is such an unavailable address seems to me not so unusual.

Posted: **Sun Mar 29, 2026 5:48 pm**

BigEd wrote:

I have to say, I don't quite see why it's so terrible not to have offset 0 available. Almost all 65xx programs work in a world where zero page (direct page) resources are limited, and most routines will have a limited range of addresses they can use. That 0 is such an unavailable address seems to me not so unusual.

My recommendation for zero-aligning a relocated direct page comes from the fact that the physical zero page is, of course, zero-aligned, and several generations of 6502 assembly language programmers are accustomed to thinking in those terms. For me, it’s mostly a matter of maintaining consistency; regardless of where the 816 thinks direct page is located, zero is zero. Also, there is the desire to avoid a potential off-by-one error, especially if <dp>,X addressing is involved.

Functionally, there’s no difference whether the local direct page starts at $00 or $01—as I noted, the 816 won’t get confused.

Achieving zero alignment only requires one extra instruction, which executes in two cycles.

Posted: **Sun Mar 29, 2026 5:51 pm**

@BDD, I think you gave me the idea on how to access Forth's user pointer (UP), even after the direct page is relocated. Basically, use an index register with absolute addressing to load the UP and then place it in the stack frame. Something like:

Code: Select all

     LDY    #UP
     LDA    a:0,Y
     STA    LOC_UP
     LDY    #U_LATEST
     LDA    (LOC_UP),Y

It's only done once per routine, so it's not a major cost. However, abandoning the UP entirely in favor of absolute addressing also works:

Code: Select all

     LDY    #U_LATEST
     LDA    UP_BASE,Y

Basically, the UP is a pointer to the interpreter's global variables. If I were coding in C I would just put "HEADER * U_LATEST" in the global scope and let the compiler and linker deal with the details. Since I've never written a Forth before, I may be conservative and preserve FIG-Forth's UP approach under the assumption the authors knew something I do not.

BigEd wrote:

I have to say, I don't quite see why it's so terrible not to have offset 0 available. Almost all 65xx programs work in a world where zero page (direct page) resources are limited, and most routines will have a limited range of addresses they can use. That 0 is such an unavailable address seems to me not so unusual.

I understand BDD's concern about different addressing techniques to the same chunk of RAM yielding the same results. But for now, I plan to use a single addressing technique and never use magic numbers. That means setting the DP upon entry and only using it to access stack locals. Since Forth uses a separate parameter stack, I will never have to retrieve inputs from a previous stack frame.

Posted: **Sun Mar 29, 2026 6:18 pm**

Martin_H wrote:

@BDD, I think you gave me the idea on how to access Forth's user pointer (UP), even after the direct page is relocated. Basically, use an index register with absolute addressing to load the UP and then place it in the stack frame. Something like:

Code: Select all

     LDY    #UP
     LDA    a:0,Y             ; <-- highlighted
     STA    LOC_UP
     LDY    #U_LATEST
     LDA    (LOC_UP),Y

You should be able to write the highlighted instruction as LDA 0,Y, since LDA has no <dp>,Y addressing mode.

Quote:

It's only done once per routine, so it's not a major cost. However, abandoning the UP entirely in favor of absolute addressing also works...

Something to watch out for with load/store instructions using absolute or indirect addressing is DB. The LDA 0,Y and LDA (LOC_UP),Y instructions will load from whatever bank is in DB. Unfortunately, monkeying with DB as a program is running is a nuisance, as the register is only accessible via the stack. I set DB to the execution bank during program initialization (PHK - PLB) so I can use “near” addressing for data that is part of the program, e.g., lookup tables, text strings, etc. Indirect-long (“far”) addressing is used as needed to reach outside of the execution bank.
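
As a minimal sketch of the PHK - PLB idiom, the fragment below (ca65 syntax) also saves and restores the previous DB around it. That save and restore is an assumption for code that cannot rely on initialization having set DB.

```asm
                PHB                     ; save the caller's data bank
                PHK                     ; push the program bank...
                PLB                     ; ...and pull it into DB
                ; absolute and (dp),Y accesses now resolve in the execution bank
                PLB                     ; restore the caller's data bank
```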

Posted: **Sun Mar 29, 2026 6:35 pm**

Claude and I had a chat about UP.

Me: I haven't finished unit testing FIND yet, but I believe I found a bug here:
;--------------------------------------------------------------
; Load LATEST to start dictionary walk
;--------------------------------------------------------------
LDY #U_LATEST
LDA (UP),Y ; LATEST -> first entry to check
STA LOC_ENTRY
This is executed after the DP has been set, which I believe makes the UP in page zero unavailable with that addressing mode. I removed it and replaced it with the following code just after the TCD instruction:
;--------------------------------------------------------------
; Load LATEST to start dictionary walk
;--------------------------------------------------------------
LDY #UP
LDA a:0,Y
STA LOC_ADDR ; Use LOC_ADDR before it is initialized
LDY #U_LATEST
LDA (LOC_ADDR),Y ; LATEST -> first entry to check
STA LOC_ENTRY

Claude:

Code: Select all

Good catch — that's exactly the bug I flagged as a risk. Once TCD moves DP to the stack frame, any instruction that relies on zero page addressing to reach UP will instead be interpreted relative to the new DP, which points into the stack frame rather than zero page.
Your fix is correct. By loading UP via LDA a:0,Y with absolute forced addressing (bypassing DP entirely), you get the true zero page value of UP into LOC_ADDR, then use it as an indirect pointer with LDA (LOC_ADDR),Y to fetch LATEST. Using LOC_ADDR as a scratch pointer before it is initialized with the search string address is a neat way to avoid needing an extra local.
This suggests an additional coding practice rule for the prompt:
All zero page variables (UP, W, etc.) must be accessed before TCD, or via absolute forced addressing after TCD. Any indirect access through a zero page pointer after TCD must first load the pointer value into a stack frame local using absolute forced addressing, then use that local as the indirect base.
A corollary is that frame setup should fetch any zero page values needed throughout the routine — like UP for user area access — into locals immediately after reserving the frame but before TCD, following the same pattern NUMBER uses for BASE.

I like that Claude suggested prompt improvements unprompted. It also understood why I moved the code before LOC_ADDR was initialized. It also displays a session memory for the NUMBER routine where I sidestepped this problem previously.

Posted: **Sun Mar 29, 2026 7:24 pm**

Martin_H wrote:

Claude and I had a chat about UP...I like that Claude suggested prompt improvements unprompted. It also understood why I moved the code before LOC_ADDR was initialized. It also displays a session memory for the NUMBER routine where I sidestepped this problem previously.

Does Claude know about DB and its role in forming the effective address?

Posted: **Sun Mar 29, 2026 8:08 pm**

BigDumbDinosaur wrote:

Does Claude know about DB and its role in forming the effective address?

I haven't asked Claude about banks yet. Right now, this is targeted at a board with 64KB of memory. In future projects I may see how it handles the bank register.

Posted: **Sun Mar 29, 2026 9:26 pm**

I think FIG-FORTH's UP is the beginning of supporting multiple users or tasks.
The User Area stores everything unique to a task.
The system could have multiple user areas. UP points to the current one.
The FORTH word PAUSE switches to the next task in a ring (cooperative multitasking).
The user area probably includes a parameter stack & return stack so each task has its own.

If you have no intention of doing multitasking, it seems simpler to just have 1 static global user area & skip the UP dereferencing.

Code: Select all

LDA a:U_LATEST ; LATEST -> first entry to check
STA z:LOC_ENTRY

A different option is to use the CPU's DP as UP.
All of the user area variables are in the current direct page - no additional UP dereference needed.
All of the direct page based addressing modes are available.
Useful to include some SCRATCH* variables for work space. Handle them like additional registers.
They are defined as not preserved across subroutine calls so save/restore is not needed.
W is just another SCRATCH* register, handle the same.
We could have some more (LOCAL*) defined as preserved across subroutine calls.
DP only changes during a task switch. Each word doesn't need to modify it.
Some stack frame addressing is still available: ph* pl* pe* d,s (d,s),Y jsr rts .
Routines using the SCRATCH* variables are reusable but not reentrant. This is fine unless
an interrupt occurs in the middle of a word routine & executes other words on the same task.

Code: Select all

@ENTRY = SCRATCH0  ; assign our word-local variable to SCRATCH0 "register" in direct page
LDA z:U_LATEST ; LATEST -> first entry to check
STA z:@ENTRY

The '265 Mensch monitor uses memory at $00000 thru $0000c0.
Moving our direct page with DP allows us to have one of our own, no sharing needed.

Posted: **Sun Mar 29, 2026 9:50 pm**

@leepivonka, your hypothesis about UP sounds plausible. I will keep it in case I add multitasking.

I'm not planning on eliminating all page zero variables. But they're bug-prone when one word uses them and then calls another word that uses them too (e.g. math routines). Conversely, Claude referred to the NUMBER_CODE routine as "register starved" with the existing page zero allocations, but using stack locals made it trivial. Also, Claude's first pass at SPACES_CODE used the TMPA scratch variable as a loop counter, while using the Y register is superior:

Code: Select all

        PUBLIC  SPACES_CODE
        .a16
        .i16
                PHY
                LDA     0,X             ; n
                INX
                INX
                TAY
                BEQ     @done           ; Zero = no-op
@loop:
                LDA     #SPACE
                JSR     hal_putch
                DEY
                BNE     @loop
@done:          PLY
                NEXT
        ENDPUBLIC

Claude and I Vibe Coded a Forth Interpreter
