Zero page as registers

jds · Post by **jds** » Fri Apr 11, 2025 2:43 am

There has been quite a bit of discussion of the lack of registers on the 6502, and of the idea of treating zero page as a very large register file. To add weight to this, Bill Mensch has called it 'Addressable Register Architecture'. It is also a relatively common technique to define some zero page locations as registers, as for example in the GEOS source code. The general lack of registers is often cited as one of the main reasons that it is difficult to write a compiler for a high level language targeting the 6502. The flaw in the idea of zero page being a big register file is that the programmer cannot load and store directly to these locations; data must always go through one of the 3 real registers.

So that got me thinking of instruction set extensions, as many of us have on this forum. Wouldn't it be useful to have a MOV instruction that could move a byte to or from zero page? At first I thought that this might be a very complex instruction to implement, but there are plenty of examples of similar behaviour in the read-modify-write instructions. The complexity would be around all the data that would need to be active inside the CPU at once. I wonder if it could be implemented without the need for any additional (hidden) registers?

It would look like this in its simplest forms:

Code: Select all

MOV addr, zp
MOV zp, addr

Additional addressing modes could be implemented. They would mainly be useful for the addr operand, since the zp operand is really the register number. Indexing across registers is possible and could be useful, but that would potentially add a lot of instructions.

It would be a 4 byte instruction and I think could be implemented without changing the internal architecture of the 6502 by using the ALU's B input latch to temporarily store the data byte while reading the destination address.

It also appears that arithmetic operations with the A register could be included in the instruction. This could be really useful for masking operations where the mask value is stored in A.

A counter argument could be that we don't save a lot over the two-instruction LDA/STA sequence: we save 1 byte and 1 cycle, and get to preserve the A register for other uses.
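
For reference, here is the two-instruction sequence in question next to the proposed instruction. The byte and cycle counts for LDA and STA are the standard 6502 figures. The MOV line is purely hypothetical: its operand order is assumed to be destination first, and the 6-cycle figure simply follows from the 1-cycle saving claimed above. The labels and addresses are made up for the example.

Code: Select all

SRC = $1234            ; hypothetical absolute source address
R0  = $10              ; hypothetical zero page "register"

        LDA SRC        ; 3 bytes, 4 cycles, clobbers A and the N/Z flags
        STA R0         ; 2 bytes, 3 cycles
                       ; total: 5 bytes, 7 cycles

;       MOV R0, SRC    ; proposed: 4 bytes, presumably 6 cycles, A untouched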

barrym95838 · Post by **barrym95838** » Fri Apr 11, 2025 3:03 am

The Mitsubishi 740 had the LDM instruction, "load memory with immediate". It also had a T flag, which allowed the X register to point to a ZP "accumulator" for some instructions. Its assembly language is otherwise very similar to the 65xx family. MOV doesn't fit well into the 65xx vernacular, IMHO.

drogon · Post by **drogon** » Fri Apr 11, 2025 10:04 am

jds wrote:

There has been quite a bit of discussion of the lack of registers on the 6502, and of the idea of treating zero page as a very large register file. To add weight to this, Bill Mensch has called it 'Addressable Register Architecture'. It is also a relatively common technique to define some zero page locations as registers, as for example in the GEOS source code. The general lack of registers is often cited as one of the main reasons that it is difficult to write a compiler for a high level language targeting the 6502. The flaw in the idea of zero page being a big register file is that the programmer cannot load and store directly to these locations; data must always go through one of the 3 real registers.

So that got me thinking of instruction set extensions, as many of us have on this forum. Wouldn't it be useful to have a MOV instruction that could move a byte to or from zero page? At first I thought that this might be a very complex instruction to implement, but there are plenty of examples of similar behaviour in the read-modify-write instructions. The complexity would be around all the data that would need to be active inside the CPU at once. I wonder if it could be implemented without the need for any additional (hidden) registers?

It would look like this in its simplest forms:

Code: Select all

MOV addr, zp
MOV zp, addr

Additional addressing modes could be implemented. They would mainly be useful for the addr operand, since the zp operand is really the register number. Indexing across registers is possible and could be useful, but that would potentially add a lot of instructions.

It would be a 4 byte instruction and I think could be implemented without changing the internal architecture of the 6502 by using the ALU's B input latch to temporarily store the data byte while reading the destination address.

It also appears that arithmetic operations with the A register could be included in the instruction. This could be really useful for masking operations where the mask value is stored in A.

A counter argument could be that we don't save a lot over the two-instruction LDA/STA sequence: we save 1 byte and 1 cycle, and get to preserve the A register for other uses.

Thinking about a lot of 6502 projects - e.g. higher level languages - they all seem to treat zero page as virtual registers, pointers, 16/24/32 bit 'accumulators' and so on, so it's almost the done thing anyway...

And you could implement this without too many issues inside a good macro assembler... Maybe a few cycles more but it would prove the concept...
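
As a rough sketch of the macro approach, something like the following would do in ca65 syntax (other assemblers spell macros differently). The macro name MOV and the labels are just placeholders, and the data still goes through A, so A and the N/Z flags get clobbered.

Code: Select all

.macro  MOV     dst, src
        lda     src             ; data has to pass through a real register
        sta     dst
.endmacro

r0      = $10                   ; hypothetical zero page "register"
buffer  = $0400                 ; hypothetical absolute address

        MOV     r0, buffer      ; expands to LDA buffer / STA r0
        MOV     buffer, r0      ; expands to LDA r0 / STA buffer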

Also, Sweet16 springs to mind.

Additionally... what I've done on a few occasions is to implement what's more or less a virtual machine, using 6502 opcodes as the "microcode" (or "millicode", the term used in the RISC-V community). My BCPL system does this for the bytecode that the compiler outputs, and my TinyBasic has 16-bit pointers and registers in zero page for the "Intermediate Language" virtual machine.

Your MOV instructions above are memory to memory operations - to implement them as macros the data has to pass through one of the A, X or Y registers, but maybe something could be hacked up with an FPGA co-processor in an '816 system.

Maybe also have a look at the Acheron system? https://github.com/AcheronVM/acheronvm

-Gordon

BigDumbDinosaur · Post by **BigDumbDinosaur** » Fri Apr 11, 2025 2:40 pm

jds wrote:

Wouldn't it be useful to have a MOV instruction that could move a byte to or from zero page?

The 65C816 sort of does this in one direction with the PEI instruction, although the only destination is the stack, specifically to wherever SP is pointing when PEI is executed. Alas, there is no opposite to PEI.
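
A small sketch of what that looks like, using a couple of made-up direct page locations as 16-bit pseudo-registers (the labels and the called function are hypothetical):

Code: Select all

r0       =$10                  ;hypothetical direct page "register" (word)
r1       =$12                  ;another one
;
         pei (r0)              ;push word at r0 & r0+1, no register involved
         pei (r1)              ;same for r1
         jsr myfunc            ;hypothetical function taking 2 word arguments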

The MVN and MVP instructions, while not in themselves capable of loading an arbitrary byte or word value to an arbitrary location, can be used to fill an area with a byte—MVN works well for that purpose. However, the fill value itself has to have been written somewhere in memory, which means use of a register.
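
A minimal sketch of the fill trick, assuming native mode, a buffer in bank 0 with DBR also pointing at bank 0, and a size of at least 2 bytes. The names buf, size and fill are made up, and the operand syntax for MVN (and for telling the assembler the immediate operand width) varies between assemblers:

Code: Select all

         sep #%00100000        ;8-bit accumulator
         lda #fill             ;fill byte (hypothetical constant)
         sta buf               ;seed the first byte of the buffer
         rep #%00110000        ;16-bit registers
         ldx #buf              ;copy-from pointer
         ldy #buf+1            ;copy-to pointer, one byte higher
         lda #size-2           ;MVN copies .A + 1 bytes
         mvn 0,0               ;bank 0 to bank 0, seed byte ripples upward

Each byte copied is the one written on the previous step, so the seed value propagates through the whole buffer.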

Speaking of PEI, I wish Bill Mensch had done more with exploiting the 816’s improved stack addressing. For example, while several instructions exist to load the stack without using a register, e.g., to generate a stack frame, there are no instructions to do stack housekeeping, as would be required upon exiting a function that was called with a stack frame. Unavoidably, at least one register has to get involved to rebalance the stack and if the stack frame is large and the registers have been pushed as part of the call, use of MVP almost becomes a necessity.

The 816 should have had an instruction such as PLW to remove a word from the stack without clobbering a register. Closest that one can come to that sort of operation is:

Code: Select all

         rep #%00100000        ;16-bit accumulator
         pla                   ;remove word from stack, clobbers .A & SR

For a function that is called with a large stack frame, I use the following code to clean up before the return:

Code: Select all

         rep #%00110000        ;16-bit registers
         clc
         tsc                   ;SP —> .A
         adc !#.s_rsf          ;adjust for register stack frame size
         tax                   ;SP + .S_RSF (copy-from pointer)
         adc !#.s_psf          ;adjust for call stack frame size
         tay                   ;SP + .S_RSF + .S_PSF (copy-to pointer)
         lda !#.s_rsf-1        ;bytes to copy, register frame size
         mvp #0,#0             ;shuffle stack
         tya                   ;adjust stack pointer to...
         tcs                   ;bottom of register frame -1
         pld                   ;restore...
         plb                   ;MPU...
         pla                   ;state
         plx
         ply
         plp
         rts

Somewhat cumbersome, but it does offer a measure of transparency and relieves the caller of having to fix up the stack whilst attempting to preserve the exit register values for later use.

BruceRMcF · Post by **BruceRMcF** » Fri Apr 11, 2025 10:41 pm

As far as the 65816 goes, the ability to allocate any page of the first 64KB of RAM as the "direct" page is a pretty effective way of taking the zero page mode to where the pseudo-registers you want to use are located.

It has exactly one instruction unassigned, and for me, freeing MVP and MVN from being trapped within a single 64KB bank might be where I would focus that instruction. For example, a new register, the YBR (Y Bank Register), where when the YBR is not zero and in Native mode, any 16-bit Y-indexed access takes its BANK address from the YBR. If it's wired as an alternate Y register, then the operation would be EYR (Exchange Y and YBR Register). Then the Y-indexed part of MVP and MVN can access outside of the current DBR.

Mind, it means that Native mode interrupt routines would have to begin with "LDY #0 : EYR : PHY" or refrain from Y-indexed addressing.

jds · Post by **jds** » Sat Apr 12, 2025 12:18 am

barrym95838 wrote:

MOV doesn't fit well into the 65xx vernacular, IMHO.

I do like that comment. We're all basically here because we like the 6502, so why try and change it to look like something else?

jds · Post by **jds** » Sat Apr 12, 2025 12:29 am

BigDumbDinosaur wrote:

The 816 should have had an instruction such as PLW to remove a word from the stack without clobbering a register.

It does appear to be an omission not to have push and pull stack pointer instructions; these would be a simple way of clearing a stack frame. As you show in your comment, you'd still need to pull any registers used in the instruction. It can be done with a PHA, TSA, PHA sequence, but if you wanted to also preserve A you'd then need a stack relative LDA instruction (LDA 2,s ?), so it would be a bit shorter just having a PHS push stack pointer instruction.

GARTHWILSON · Post by **GARTHWILSON** » Sat Apr 12, 2025 3:43 am

Right away this topic brings to mind several valid things that are only partially related to each other:

It sounds like a new processor design to implement in an FPGA (programmable logic);
but it's in the programming section of the forum, so drogon's comments about macros are appropriate. Macros can make it look, to the programmer, like the desired instructions are native to the processor (although it would be nice to have an INS (increment stack pointer) instruction to drop things from the stack without clobbering a register). It's just that the macros, which lay down multiple native instructions, will take a little longer to execute than the same instructions would take if they were native to the processor.
Without going to the extent of designing and implementing a new processor to run in an FPGA, Dr Jefyll is our resident genius for adding external logic to fool the processor to implement new registers and instructions.

Quote:

MOV doesn't fit well into the 65xx vernacular, IMHO.

I do like that comment. We're all basically here because we like the 6502, so why try and change it to look like something else?

Yes; I never did like the "move" term for this, because if for example I move a wire cutter from the left side of the work bench to the right side, it is no longer on the left side. The MOV instruction is better named COPY, and that's what I've done in macros. For example,

Code: Select all

        COPY  FOOBAR1, to, FOOBAR2

or for two bytes,

Code: Select all

        COPY2  FOOBAR1, to, FOOBAR2

For putting a literal in a memory location, I do

Code: Select all

        PUT2  <this_constant>, in, <that_variable>

and the macro has conditional assembly to pick the most efficient way to do it, for example using STZ instead of LDA #, STA if a byte is 0, or if the two bytes are identical (like $4141), to only do the LDA once, before the two STA's. Optionally you could add a macro parameter for example "using_Y" if you didn't want to clobber A but Y was available. My article on macros is at http://wilsonminesco.com/StructureMacros/
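
To give a feel for the conditional assembly involved, here is a cut-down sketch in ca65 syntax. It is not the macro from the article above (which uses a different assembler and the "in" filler word); it only handles a constant known at assembly time, and it checks just two special cases: the whole word being zero, and both bytes being identical.

Code: Select all

.setcpu "65C02"                         ; STZ needs the CMOS part

.macro  PUT2    value, addr
    .if (value) = 0
        stz     addr                    ; both bytes zero: no register needed
        stz     addr+1
    .elseif .lobyte(value) = .hibyte(value)
        lda     #.lobyte(value)         ; identical bytes: load A only once
        sta     addr
        sta     addr+1
    .else
        lda     #.lobyte(value)         ; general case
        sta     addr
        lda     #.hibyte(value)
        sta     addr+1
    .endif
.endmacro

counter = $20                           ; hypothetical 16-bit variable

        PUT2    $0000, counter          ; assembles to STZ / STZ
        PUT2    $4141, counter          ; assembles to LDA / STA / STA
        PUT2    $1234, counter          ; assembles to LDA / STA / LDA / STA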

barnacle · Post by **barnacle** » Sat Apr 12, 2025 7:30 am

MOV doesn't fit the 6502 zeitgeist, but perhaps LDM?

Though I've used LDA16, STA16, and COPY16 macro names.

Neil

Dr Jefyll · Post by **Dr Jefyll** » Sat Apr 12, 2025 2:27 pm

barrym95838 wrote:

The Mitsubishi 740 had the LDM instruction, "load memory with immediate". It also had a T flag, which allowed the X register to point to a ZP "accumulator" for some instructions.

Thanks, Mike. This latter feature -- the T flag -- creates a significant boost in functionality. Even more remarkable, IMO, is that it manages to do so without requiring dozens of new opcodes or otherwise turning the existing ISA on its head!

ZP memory at X "becomes" the accumulator. One good example to illustrate would be multi-precision addition/subtraction:

Code: Select all

CLC                ; T flag not in use

LDA ZPOperand, X
ADC OtherOperand,X ; byte0 of the multiprecision operation
STA ZPOperand, X
INX
LDA ZPOperand, X
ADC OtherOperand,X ; byte1 of the multiprecision operation
STA ZPOperand, X
INX

[...]              ; and so on for byte 2, byte 3 etc

Code: Select all

CLC                ; with the T flag set

ADC OtherOperand,X ; byte0 of the multiprecision operation (2 cycle penalty)
INX
ADC OtherOperand,X ; byte1 of the multiprecision operation (2 cycle penalty)
INX

[...]              ; and so on for byte 2, byte 3 etc

The Hudson Soft HuC6280 also features the T flag, but IIRC Hudson Soft's arrangement differs slightly from Mitsubishi's wrt how T gets set & reset. In one case T gets set by a "Set-T" instruction then automatically turns itself off after the following instruction completes. In the other case, T is sticky, and remains set until a "Clear T" instruction explicitly turns it off. (Corrections welcome if I'm misremembering any of this.)

GARTHWILSON wrote:

adding external logic to fool the processor to implement new registers and instructions.

Here again the challenge is to come up with something that's reasonably straight-forward to code for -- something that fits in with the existing ISA.

My KK Computer is a rather extreme example. Unable to resist the golden opportunities presented by the 'C02, I gave KK a shed-load of extra capabilities!

Despite all this indulgence, the warts are pretty minor (IMO). One example pertains to the point mentioned about being able to pull from stack. As with the macro Garth proposed, KK's register Y gets fruitlessly written as a side effect when you pull to register K1, K2, K3, IPL or IPH. (With an FPGA implementation this could have been avoided.)

The '816 also offers opportunities for functional enhancements, but with the 'C02 there's just so much low hanging fruit! It's ridiculously easy to add new functionality to the 'C02, (even without resorting to KK's microcoded complexity).

-- Jeff

enso1 · Post by **enso1** » Sat Apr 12, 2025 4:12 pm

As FPGA cores go, a 6502 with pages 0 and 1 in BRAMs (as separate, parallel memories) may be an interesting way to accelerate the core. Stack operations may be done in parallel with other memory accesses, and zero-page reads may be performed speculatively, especially if the core reads a byte ahead of the opcode...

Yuri · Post by **Yuri** » Sun Apr 13, 2025 11:32 pm

jds wrote:

... The general lack of registers is often cited as one of the main reasons that it is difficult to write a compiler for a high level language targeting the 6502. ...

IDK who would be saying this or why. Most compilers I know of (granted that's not a huge number) internally target a stack-only machine that effectively has NO accumulator or other registers. The implied accumulator is whatever is on the top of the stack.

The small stack space and the difficulty of addressing arbitrary values on the stack in the 6502 are what I know as being the larger issue, something that the 65816 addresses with its stack addressing mode additions.
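
For example, with the '816 a compiler's "add the top two stack items" step might be sketched like this (native mode, 16-bit accumulator and binary mode assumed; this is only an illustration, not taken from any particular compiler):

Code: Select all

         rep #%00100000        ;16-bit accumulator
         clc
         lda 1,s               ;top word on the stack
         adc 3,s               ;add the word beneath it
         sta 3,s               ;result replaces the lower word
         pla                   ;drop the old top word (clobbers .A)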

barrym95838 · Post by **barrym95838** » Mon Apr 14, 2025 12:52 am

Dr Jefyll wrote:

In one case T gets set by a "Set-T" instruction then automatically turns itself off after the following instruction completes. In the other case, T is sticky, and remains set until a "Clear T" instruction explicitly turns it off. (Corrections welcome if I'm misremembering any of this.)

Ugh, that "sticky" T flag seems like the potential source of more "sneaky bastard" bugs than the infamous D flag. Done carefully, it seems like a win though.

BruceRMcF · Post by **BruceRMcF** » Thu Apr 17, 2025 3:51 am

barrym95838 wrote:

Dr Jefyll wrote:

In one case T gets set by a "Set-T" instruction then automatically turns itself off after the following instruction completes. In the other case, T is sticky, and remains set until a "Clear T" instruction explicitly turns it off. (Corrections welcome if I'm misremembering any of this.)

Ugh, that "sticky" T flag seems like the potential source of more "sneaky bastard" bugs than the infamous D flag. Done carefully, it seems like a win though.

IIUC, the 65816 could use the transient T flag without getting in trouble over the T-state not being retained during an interrupt, by making the T flag an internal bit which doesn't allow an interrupt until the following instruction completes.

White Flame · Post by **White Flame** » Sat Apr 19, 2025 8:26 am

jds wrote:

So that got me thinking of instruction set extensions, as many of us have on this forum. Wouldn't it be useful to have a MOV instruction that could move a byte to or from zero page?

When the register allocation of the zeropage is generally static, as global variables & pointers, data doesn't really move in and out of those registers as temporary values, but the values exist as a global state of what the machine is doing, and those values live as long as that state is useful, which is often permanently for OS/ROM/BIOS/Kernel/BASIC register usage. The primary register-ness quality of zeropage is pointer usage, as there's no indirection through A/X/Y as addresses, compared to other 8-bits that did have 16-bit general purpose registers that could be used for addressing. Plus, they're generally going to retain their state across calls.
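
A minimal illustration of that pointer role, with made-up labels and addresses: the 16-bit address lives in two zero page bytes, and the indirect addressing mode reads through it.

Code: Select all

ptr     = $FB                   ; hypothetical zero page pointer (2 bytes)
table   = $C000                 ; hypothetical data address

        LDA #<table             ; low byte of the address
        STA ptr
        LDA #>table             ; high byte of the address
        STA ptr+1
        LDY #5
        LDA (ptr),Y             ; fetch the byte at table+5 through the pointer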

Generally much higher level systems would use some set of general purpose, usually >8 bit temporary general purpose pseudoregisters, which are constantly overwritten for different purposes often by values stored elsewhere (eg GEOS, VMs, etc), and I'd think those systems are going to be slower to a scale that MOV wouldn't reclaim.
