a 6404, 6303 and 16x504 -- the minimalist 6502 models
Posted: Sun May 11, 2025 4:09 pm
It is often discussed how to extend the 6502 and add things to it; threads on that abound.
However, I am interested in the alternative: a reduced instruction set and register count, and what to do about the addressing modes that are lost as a result.
As such there would be no Y register, and possibly no X register. I think some of the addressing modes and operations could still be accomplished using only the accumulator, stack and RAM. The X and Y registers largely exist to give access to alternative addressing, which simplifies operations by reaching addresses directly instead of copying them through the accumulator or stack.
http://www.6502.org/users/obelisk/6502/reference.html
and especially here:
https://www.pagetable.com/c64ref/6502/?tab=2
Here we can see how the opcodes are arranged and how things grew over time. The opcodes in one nibble deal almost entirely with the accumulator or memory itself. This allows for the potential of a 12-bit instruction word, with the remaining 4 bits put to alternative use for certain operations, such as replacing the addressing modes lost from register reduction.
I would keep the X register at least in part, or mark certain space in zero page as 'general purpose registers' of some kind.
One type of replacement register would use the 4 bits freed by the reduced word size: 0000 is the accumulator, and 0001-1111 might be 'scratch registers' with operations done on them. I would also allow the stack pointer to be used as a register if no stack is connected, like a buffer maybe? This would not solve all problems, but it would extend the chip by giving it a 'one byte stack' on-die. It is a readable and writeable 8b register that auto-increments or decrements on stack operations, so it should be usable as a register in its own right, imo. That opinion may change as I study the circuitry more.
Program Counter, 16b
Accumulator, 8b
Stack pointer, 8b
Status Flags, 8b
RAM based registers
---removed-----
Stack, <256x8b
zero page ~256x8b
It is probably possible to simplify the lowest-level RAM usage here.
Of note, if '2 bit addressing' is used here, that gives 3 extra 'registers'.
Stack use would increase from the loss of the X and Y registers, so reducing its size is possibly not ideal;
however, the Xtra registers may accomplish a lot if they are housed on-die.
This is not a chip made in 1970; it's a chip I'm thinking of making in 2025 on an FPGA or ASIC. So the idea is, instead of the usual RAM/CPU fetch-decode-operate-put cycle, it's more 'send the opcode to the data, have it done, and check on it later'. The goal is a very simple, very small processor that can operate on or near values in RAM.
If I were to add any kind of register or special operation, it would be a 'fuzzy comparator' of some kind to speed up sorting operations: a pick largest/smallest/nearest-from-a-group type of thing, something in hardware made out of op-amps. A register that can perform fast analog comparisons. This is a 'carry look-ahead' style of logic that I'm still researching techniques for. A magnitude comparator, maybe. I'm not sure whether it should be an operation or a register, but it has to be fast.
So what a lot of this does is reduce and normalize opcode execution cycle time, and simplify the circuitry.
Thoughts, resources and technical data are welcome. How would you build an even cheaper, smaller and more streamlined 65xx?
I'll start looking at some VHDL scaffolding and declarations to get this started.
Thinking about this a little more, I'd almost say make it a 6-bit opcode with a 2-bit Xtra register select,
so that would be:
765432 10
opcode rs
00 Accumulator
01 RAM register/inner register
10 RAM register/inner register
11 RAM register/inner register
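To make the split concrete, here is a minimal Python sketch of decoding one instruction byte under the 6-bit opcode / 2-bit register-select layout above. The names `decode_byte`, `encode_byte`, `R1`, `R2` and `R3` are made up for illustration, and no opcode numbers have been assigned yet, so the opcode is left as a bare integer.

```python
# Hypothetical labels for the 2-bit register select (rs) field.
REGISTER_NAMES = {
    0b00: "A",   # accumulator
    0b01: "R1",  # RAM register / inner register
    0b10: "R2",  # RAM register / inner register
    0b11: "R3",  # RAM register / inner register
}


def decode_byte(byte: int) -> tuple[int, str]:
    """Split an instruction byte into (6-bit opcode, register name)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("instruction byte must fit in 8 bits")
    opcode = byte >> 2   # bits 7..2
    rs = byte & 0b11     # bits 1..0
    return opcode, REGISTER_NAMES[rs]


def encode_byte(opcode: int, rs: int) -> int:
    """Pack a 6-bit opcode and a 2-bit register select into one byte."""
    if not 0 <= opcode < 64 or not 0 <= rs < 4:
        raise ValueError("opcode is 6 bits, rs is 2 bits")
    return (opcode << 2) | rs
```

In hardware this is just wiring: the low two bits go to the register mux and the upper six go to the decoder, so the split itself costs no logic.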
If so, this is where I would use the 'select best' comparator, since its fan-in/fan-out is now limited to the four selectable registers.
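As a rough behavioural model of 'select best' over the four selectable registers, the sketch below uses a two-level tree of three digital magnitude comparators. This is an assumption on my part: it models the digital magnitude comparator option rather than the op-amp one, and `greater` and `select_largest` are made-up names.

```python
def greater(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """One magnitude comparator: pass through the (rs, value) pair with the
    larger value. On a tie the first input wins."""
    return a if a[1] >= b[1] else b


def select_largest(regs: list[int]) -> tuple[int, int]:
    """Pick the largest of four 8-bit register values.

    regs is [A, R1, R2, R3]; the result is (rs, value), where rs is the
    2-bit register select of the winner.
    """
    if len(regs) != 4:
        raise ValueError("expected exactly four register values")
    tagged = list(enumerate(regs))
    left = greater(tagged[0], tagged[1])    # first level, comparator 1
    right = greater(tagged[2], tagged[3])   # first level, comparator 2
    return greater(left, right)             # second level, comparator 3
```

With four inputs the tree is only two comparators deep, which is the fan-in benefit of the 2-bit register select. 'Smallest' is the same tree with the comparison flipped.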
this gets us down to 32 instructions:
LDA    AND  ASL  BCC  JSR  CLC
STA    EOR  LSR  BCS  RTI  SEI
trans  ORA       BMI  RTS  CLI
TXS    BIT       BPL       CLV
TSX              BVC       SEC
NOP    ADC       BNE       PLP
       CMP       BEQ       PHP
       SBC
JMP can be done with stack trickery, I think.
PLA/PHA can be done with TSX/TXS, which are really just 'transfer now' operations: you transfer the stack pointer out, and since that gives you control of the stack pointer value and the address of the top of the stack, you just STA there to push onto the stack (and write the adjusted pointer back with TXS).
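A small Python model of that push/pull replacement, assuming the stock 6502 convention that the stack lives in page $01 and the stack pointer points at the next free byte. The `Machine` class and its method names are made up for illustration, and the flag updates that a real PLA performs are left out.

```python
STACK_BASE = 0x0100  # the 6502 stack page


class Machine:
    """Toy model: 64 KiB of RAM, an accumulator and an 8-bit stack pointer."""

    def __init__(self) -> None:
        self.mem = bytearray(0x10000)
        self.a = 0
        self.sp = 0xFF

    def push_a(self) -> None:
        """PHA replacement: STA at the top of stack, then step SP down."""
        self.mem[STACK_BASE + self.sp] = self.a
        self.sp = (self.sp - 1) & 0xFF

    def pull_a(self) -> None:
        """PLA replacement: step SP up, then LDA from the top of stack."""
        self.sp = (self.sp + 1) & 0xFF
        self.a = self.mem[STACK_BASE + self.sp]
```

Each replacement needs an address computed from the stack pointer plus a pointer update, so the real cost depends on how cheaply the chip can use SP as an address source.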
BRK is possibly just a STA of a value to the status register, setting it there. Similarly, the BCD module is gone and its flag bit becomes the Rotate bit, which controls how the shifts work. Other opcodes might use this too, so it is maybe a 'Reserved' bit. I suspect BRK can be replaced with some stack manipulation and the RTI/RTS operations. SED and CLD are gone, and the TXY/TYX/etc. transfers are gone. INC/DEC use a lot of cycles and are read-modify-write operations, so they are not difficult to recreate: LDA, ADC #1, STA. On a stock 6502 with zero-page operands that is 8 cycles against 5 for INC (10 if a CLC is needed first). They are not useless operations, just a nicety we can look at setting aside for this Risc-502.
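For reference, here is the cycle arithmetic for replacing INC, using stock NMOS 6502 timings with zero-page operands. The proposed chip aims to normalize cycle times, so these numbers are only a baseline for comparison, and the helper name `total` is made up.

```python
# Stock NMOS 6502 cycle counts, zero-page operands.
CYCLES = {
    "INC zp": 5,
    "LDA zp": 3,
    "CLC": 2,
    "ADC #imm": 2,
    "STA zp": 3,
}


def total(sequence: list[str]) -> int:
    """Sum the cycle counts of a sequence of instructions."""
    return sum(CYCLES[op] for op in sequence)


native = total(["INC zp"])
emulated = total(["LDA zp", "ADC #imm", "STA zp"])
emulated_with_clc = total(["LDA zp", "CLC", "ADC #imm", "STA zp"])

print(f"INC zp: {native} cycles")
print(f"LDA/ADC/STA: {emulated} cycles ({emulated - native} extra)")
print(f"LDA/CLC/ADC/STA: {emulated_with_clc} cycles "
      f"({emulated_with_clc - native} extra)")
```

Note that the replacement sequence also overwrites the accumulator and the carry flag, which INC leaves alone, so code relying on that would need a save and restore around it.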
CPX/CPY become plain CMP, which may be done against the stack, memory addresses, etc.
So with a 32-instruction set so far, we are now at a 5b opcode, a 3b suffix and, in the style of the 6502, an 8-bit operand.
H byte: 76543  210       L byte
        opcode suffix    operand
This allows 16b addressing (accumulator + operand), or 19b (512 KiB) with the 3b suffix used for "paging"; using a bit from the flags to extend that, we have 1 MiB or more of address space.
Or the suffix could select one of 8 addressing modes, etc.
Or it could select a register:
accumulator (000)
RAM-registers (001-110)
Stack register (111)
There are options here for different 3b 'suffixes' to control the 32-instruction Risc502 architecture.
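A minimal Python sketch of the 5b opcode / 3b suffix / 8b operand word and of the paged address it could form. Two things here are my assumptions, not settled design: that 'accumulator + operand' means the accumulator supplies the high byte and the operand the low byte, and that the flag bit sits above the 3-bit page. The function names are made up.

```python
def decode_word(word: int) -> tuple[int, int, int]:
    """Split a 16-bit instruction word into (opcode, suffix, operand).

    The H byte holds the 5-bit opcode (bits 7..3) and the 3-bit suffix
    (bits 2..0); the L byte is the 8-bit operand.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    high = word >> 8
    operand = word & 0xFF
    return high >> 3, high & 0b111, operand


def paged_address(flag_bit: int, page: int, acc: int, operand: int) -> int:
    """Form a 20-bit address: flag bit, 3-bit page, accumulator as the
    high byte and operand as the low byte (assumed layout)."""
    if flag_bit not in (0, 1) or not 0 <= page < 8:
        raise ValueError("flag_bit is 1 bit, page is 3 bits")
    if not 0 <= acc <= 0xFF or not 0 <= operand <= 0xFF:
        raise ValueError("acc and operand are 8 bits each")
    return (flag_bit << 19) | (page << 16) | (acc << 8) | operand
```

With all fields at their maximum the address is 0xFFFFF, the top of a 1 MiB space; without the flag bit the top is 0x7FFFF, i.e. 512 KiB.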