a 6404, 6303 and 16x504 -- the minimalist 6502 models
So, it is often discussed how to extend the 6502 and add things to it; threads abound on such.
However, I am interested in the alternative to this: a reduced instruction set and register count, and the addressing modes inherently lost by doing so.
As such, there would be no Y register, and possibly no X register. However, I think some of the addressing modes and operations could be accomplished using only the accumulator, stack and RAM. Largely, the X and Y registers give access to alternative addressing, and they simplify operations by accessing addresses directly instead of copying them to the accumulator or stack.
http://www.6502.org/users/obelisk/6502/reference.html
and especially here:
https://www.pagetable.com/c64ref/6502/?tab=2
We can see the arrangement of the opcodes and how things grew over time. Many of the opcodes in one nibble deal almost solely with the accumulator or memory itself. This allows for the potential of a 12-bit instruction word, with the remaining 4 bits used for certain operations, such as replacing the addressing modes lost from register reduction.
I would maintain the X register at least in part, or mark certain space in zero page as 'general purpose registers' of some kind.
One type of replacement register would use the 4 bits freed by the reduced word size: 0000 is the accumulator, and 0001-1111 might be 'scratch registers' with operations done thereon. I would allow the stack pointer to be used as a register if no stack is connected. Like a buffer, maybe? This would not solve all problems, but it would extend the chip by having a 'one byte stack' on the chip. It's a readable and writeable 8b register that auto-increments or decrements on stack operations, and it should be usable as a register on its own, IMO. That opinion may change as I study the circuitry more.
Program Counter, 16b
Accumulator, 8b
Stack pointer, 8b
Status Flags, 8b
RAM based registers
---removed-----
Stack, <256x8b
zero page ~256x8b
It is probably possible to simplify the lowest-level RAM usage here.
Of note, if 2-bit addressing is used here, that is 3 extra 'registers'.
Stack use would increase from the loss of the X and Y registers, so reducing its size is possibly not ideal;
however, the Xtra registers may accomplish a lot if they are housed on-die.
This is not a chip made in 1970; it's a chip I'm thinking of making in 2025 on an FPGA or ASIC. So the idea is, instead of operating as RAM/CPU/fetch-decode-operate-put, it's more 'send the opcode to the data, have it done, and check it later': to create a very simple, very small processor that can operate on or near values in RAM.
If I were to add any kind of register or special operation, it would be a 'fuzzy comparator' of some kind, to speed up sorting operations: a pick largest/smallest/nearest from a group type of thing, something in hardware made out of op-amps. A register that can perform fast analog comparisons. This is a 'carry look-ahead' style of logic I'm still researching techniques on; a magnitude comparator, maybe. I'm not sure if it's an operation or a register. It has to be fast.
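As a rough behavioral model only (digital, not the op-amp circuit described above), the pick largest/smallest/nearest idea can be sketched as a tournament of two-input magnitude comparators. The function name `select_best`, the tie rule (lower index wins) and the unsigned 8-bit interpretation are assumptions for illustration. It also reports how many comparator levels a tree of this shape needs, which is one way to reason about fan-in.

```python
def select_best(values, mode="max", target=0):
    """Return ((index, value), levels) for the winning candidate.

    mode is "max", "min" or "nearest" (closest to target). Values are
    treated as unsigned bytes; on a tie the lower index wins (assumption).
    """
    if not values:
        raise ValueError("need at least one candidate")
    keys = {
        "max": lambda v: -v,
        "min": lambda v: v,
        "nearest": lambda v: abs(v - target),
    }
    if mode not in keys:
        raise ValueError(f"unknown mode: {mode}")
    key = keys[mode]

    layer = list(enumerate(values))  # (index, value) pairs entering the tree
    levels = 0
    while len(layer) > 1:
        winners = []
        for j in range(0, len(layer) - 1, 2):
            a, b = layer[j], layer[j + 1]
            # one two-input magnitude comparator; strict "<" keeps a on a tie
            winners.append(b if key(b[1]) < key(a[1]) else a)
        if len(layer) % 2:
            winners.append(layer[-1])  # odd candidate passes through this level
        layer = winners
        levels += 1
    return layer[0], levels
```

With the 2-bit register select discussed further down, there are at most four candidates, so a tree like this is only two comparator levels deep.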
So what a lot of this does is reduce and normalize opcode execution cycle time and simplify circuitry.
Thoughts, resources and technical data are welcome. How would you build an even cheaper, smaller and more streamlined 65xx?
I'll start looking at some VHDL scaffolding and declarations to get this started.
Thinking about this a little more, I'd almost say make it a 6-bit opcode with a 2-bit Xtra register select,
so that would be:
765432 10
opcode rs
00 Accumulator
01 RAM register/inner register
10 RAM register/inner register
11 RAM register/inner register
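A minimal sketch of how that byte could be split, assuming the layout above (bits 7-2 opcode, bits 1-0 register select). The names `R1`-`R3` are placeholders for the three Xtra registers; whether they live in RAM or on-die is left open, as in the post.

```python
# Hypothetical 6+2 layout: bits 7..2 = opcode, bits 1..0 = register select.
RS_NAMES = ("A", "R1", "R2", "R3")  # 00 = accumulator; R1-R3 are placeholder names


def decode(byte: int) -> tuple[int, str]:
    if not 0 <= byte <= 0xFF:
        raise ValueError("instruction byte must be 0..255")
    opcode = (byte >> 2) & 0x3F  # top six bits
    rs = byte & 0x03             # bottom two bits
    return opcode, RS_NAMES[rs]


def encode(opcode: int, rs: int) -> int:
    if not (0 <= opcode < 64 and 0 <= rs < 4):
        raise ValueError("opcode must fit in 6 bits, rs in 2 bits")
    return (opcode << 2) | rs


# decode(0xB5) -> (45, 'R1')
```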
If so, this is where I would use the 'select best' comparator, which is now better limited on fan-in/fan-out.
This gets us down to 32 instructions:
LDA AND ASL BCC JSR CLC
STA EOR LSR BCS RTI SEI
trans ORA BMI RTS CLI
TXS BIT BPL CLV
TSX BVC SEC
NOP ADC BNE PLP
CMP BEQ PHP
SBC
JMP can be done with stack trickery, I think.
PLA/PHA can be done with TSX/TXS, which are really just 'transfer now' operations: you TSX to get the address of the top of the stack, and with control of the stack pointer value, you just STA to that address to push onto the stack.
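One reading of the 'stack trickery' is the classic 6502 trick of pushing a target address minus one and executing RTS, since RTS pulls the low byte, then the high byte, and adds one. The sketch models a page-one stack the way the 6502 does (a push stores at $0100+S and then decrements S; a pull increments S and then loads), so a push is just a store to the top-of-stack address. The class and function names are hypothetical.

```python
class StackModel:
    """Page-one stack behaving like the 6502's: S points at the next free byte."""

    def __init__(self) -> None:
        self.mem = bytearray(0x10000)
        self.s = 0xFF
        self.pc = 0x0000

    def push(self, value: int) -> None:
        # Equivalent to "STA $0100+S" followed by moving S down one byte.
        self.mem[0x0100 + self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFF

    def pull(self) -> int:
        self.s = (self.s + 1) & 0xFF
        return self.mem[0x0100 + self.s]

    def rts(self) -> None:
        lo = self.pull()
        hi = self.pull()
        self.pc = (((hi << 8) | lo) + 1) & 0xFFFF


def jump_via_rts(cpu: StackModel, target: int) -> None:
    """JMP target, built from two pushes and an RTS."""
    ret = (target - 1) & 0xFFFF
    cpu.push(ret >> 8)    # high byte first, so it is pulled second
    cpu.push(ret & 0xFF)
    cpu.rts()
```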
BRK could possibly be a STA of a value to the status register, setting the bit there. Similarly, the BCD module is gone, and its bit becomes the Rotate bit, which controls how the shifts work. Other opcodes might use this too, so it's maybe a 'Reserved' bit. I'm not sure BRK can't be replaced with some stack manipulation and the RTI/RTS operations. SED and CLD are gone, and TXY/TYX etc. are gone. INC/DEC use a lot of cycles and are read/modify/write operations, so they are not difficult to recreate: LDA, ADC #1 (with the carry clear), STA, where the ADC #1 is 2 cycles in immediate mode. They are not useless operations, just a nicety we can look at setting aside for this Risc-502.
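A sketch comparing INC with the load/add/store replacement, in binary mode only (decimal mode is gone in this design anyway). It assumes the carry must be cleared first, since ADC adds the carry in, and it shows that the replacement is not flag-identical: INC leaves C and V alone, while ADC sets them. For reference, on a stock 6502 INC zp takes 5 cycles, while LDA zp / CLC / ADC #1 / STA zp takes 3 + 2 + 2 + 3 = 10. Function names are illustrative.

```python
def adc(a: int, m: int, flags: dict) -> tuple[int, dict]:
    """Binary-mode 6502 ADC: A + M + C, updating N, V, Z and C."""
    total = a + m + int(flags["C"])
    r = total & 0xFF
    overflow = bool(~(a ^ m) & (a ^ r) & 0x80)  # same-sign inputs, different-sign result
    return r, dict(flags, N=bool(r & 0x80), V=overflow, Z=(r == 0), C=(total > 0xFF))


def inc_native(mem: bytearray, addr: int, flags: dict) -> dict:
    """INC addr: memory + 1, only N and Z change."""
    r = (mem[addr] + 1) & 0xFF
    mem[addr] = r
    return dict(flags, N=bool(r & 0x80), Z=(r == 0))


def inc_emulated(mem: bytearray, addr: int, flags: dict) -> dict:
    """LDA addr / CLC / ADC #1 / STA addr."""
    a = mem[addr]                                      # LDA (sets N, Z)
    flags = dict(flags, N=bool(a & 0x80), Z=(a == 0))
    flags = dict(flags, C=False)                       # CLC
    a, flags = adc(a, 1, flags)                        # ADC #1
    mem[addr] = a                                      # STA (no flag change)
    return flags
```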
CPX/CPY is now just a compare, and it may be done on the stack, on memory addresses, etc.
So with a 32-instruction set, so far, we are now at a 5b opcode and a 3b suffix, and, in the style of the 6502, an 8-bit operand.
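If CPX/CPY fold into one compare, the only difference is where the register operand comes from; the flag rules can stay the 6502's. A minimal sketch, assuming an unsigned 8-bit compare as on the 6502 (CMP sets N, Z and C and leaves V alone):

```python
def compare(reg: int, m: int) -> dict:
    """6502-style CMP/CPX/CPY: compute reg - m without storing it."""
    diff = (reg - m) & 0xFF
    return {
        "N": bool(diff & 0x80),  # bit 7 of the difference, not a signed compare
        "Z": reg == m,
        "C": reg >= m,           # carry set means no borrow (unsigned reg >= m)
    }
```

Note that N is only bit 7 of the difference, so on its own it is not a reliable signed less-than.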
H 76543 210 L
opcode suffix operand
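A sketch of the 5/3/8 split, assuming the opcode/suffix byte is the high byte and the operand the low byte. The opcode numbering simply follows the order in which the 32 mnemonics are listed above; that assignment is arbitrary and only for illustration.

```python
# The 32 mnemonics in the order listed in the post (numbering is arbitrary).
MNEMONICS = (
    "LDA AND ASL BCC JSR CLC STA EOR LSR BCS RTI SEI trans ORA BMI RTS "
    "CLI TXS BIT BPL CLV TSX BVC SEC NOP ADC BNE PLP CMP BEQ PHP SBC"
).split()
assert len(MNEMONICS) == 32  # exactly fills a 5-bit opcode field


def encode(opcode: int, suffix: int, operand: int) -> int:
    if not (0 <= opcode < 32 and 0 <= suffix < 8 and 0 <= operand < 256):
        raise ValueError("field out of range")
    return (((opcode << 3) | suffix) << 8) | operand


def decode(word: int) -> tuple[str, int, int]:
    high = (word >> 8) & 0xFF
    return MNEMONICS[high >> 3], high & 0x07, word & 0xFF
```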
This allows 16b addressing (accumulator + operand); 3b of "paging" extends that to 19b (512 KiB), and with one more bit taken from the flags to track it, we have 20b, a 1 MiB or larger address space.
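The arithmetic, as a sketch: 8 bits of accumulator plus an 8-bit operand give 16 bits (64 KiB), three page bits give 19 bits (512 KiB), and one more bit from the flags gives 20 bits (1 MiB). The bit ordering (flag bit on top, then page, then A, then operand) and treating 'accumulator + operand' as concatenation rather than addition are assumptions.

```python
def effective_address(a: int, operand: int, page: int = 0, flag_bit: int = 0) -> int:
    """Concatenate flag bit | 3-bit page | A | operand into a 20-bit address (assumed order)."""
    if not (0 <= a < 256 and 0 <= operand < 256 and 0 <= page < 8 and flag_bit in (0, 1)):
        raise ValueError("field out of range")
    return (flag_bit << 19) | (page << 16) | (a << 8) | operand


ADDRESS_BITS = 1 + 3 + 8 + 8  # flag + page + A + operand = 20
```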
or 8x addressing modes, etc.
or accumulator (000)
RAM-registers (001-110)
Stack register (111)
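How a 3-bit suffix might pick the operand, as a sketch. The post does not say where the six RAM-registers live; placing them at zero-page $F0-$F5 (`RAM_REG_BASE`) is purely a hypothetical choice, and 'stack register' is read here as the byte on top of a 6502-style page-one stack.

```python
RAM_REG_BASE = 0xF0  # hypothetical home of RAM-registers 001-110 in zero page


def operand_source(suffix: int, a: int, s: int, mem: bytearray) -> int:
    """Return the 8-bit value a given 3-bit suffix selects."""
    if suffix == 0b000:
        return a                                   # accumulator
    if 0b001 <= suffix <= 0b110:
        return mem[RAM_REG_BASE + suffix - 1]      # RAM-register 1..6
    if suffix == 0b111:
        return mem[0x0100 + ((s + 1) & 0xFF)]      # top of stack (S points at next free byte)
    raise ValueError("suffix must be 3 bits")
```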
There are options here for different 3b 'suffixes' to control the 32-instruction Risc502 architecture.
Last edited by wayfarer on Wed May 14, 2025 12:40 am, edited 4 times in total.
Re: a 6303 -- the minimalist 6502
edited, note for ninjas.
interesting read here: http://retro.hansotten.nl/uploads/mag65 ... pcodes.pdf
He doesn't take into account opcode usage from loops or subroutines, only the number of times each opcode is typed.
Amdahl's law and the universal modeling law might be useful here to determine which codes to drop or keep; however,
some maintenance of Turing completeness is in mind, as is reducing 'convenience' opcodes a little, to simplify the design and make it more 'abstract'.
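To make the static-versus-dynamic point concrete, here is a sketch that counts opcodes two ways over the same tiny loop: once per line of source, and once per executed instruction from a trace. The program and trace are made up for illustration; in practice the trace would come from an emulator run.

```python
from collections import Counter

# A made-up listing: index -> instruction text.
PROGRAM = [
    "LDX #0",        # 0: setup
    "LDA (zp,X)",    # 1: loop body
    "INX",           # 2
    "BNE loop",      # 3: back to index 1 until X wraps to 0
    "RTS",           # 4
]

# Made-up execution trace: setup once, the 3-instruction loop 256 times, then RTS.
TRACE = [0] + [1, 2, 3] * 256 + [4]

static = Counter(PROGRAM)                    # how often each opcode is typed
dynamic = Counter(PROGRAM[i] for i in TRACE)  # how often each opcode is executed


def share(counts: Counter, key: str) -> float:
    return counts[key] / sum(counts.values())
```

Here LDA (zp,X) is 20% of the listing but about a third of what actually runs.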
Re: a 16x504 -- the 16 instruction 6502
16x504
a 6502-based, MC14500B-ish 6502
So here we have a 16-opcode instruction set.
We use the Break bit for interrupt and subroutine control,
and the R bit to access a variety of Reversed, Restricted or Reserved commands.
It becomes our 'inveRse' bit. We still haven't used the eXpansion bit.
The BIT command is moved to the AND command with the R flag set.
Branches are mostly inverses of each other, and the R bit serves here as well.
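One way the R bit could fold the eight 6502 branches into four, as a sketch: each base branch tests a flag for clear, and R=1 flips the test (BPL/BMI, BVC/BVS, BCC/BCS, BNE/BEQ). The relative-offset arithmetic is the 6502's (signed 8-bit offset from the address after the 2-byte branch); the tables and function names are hypothetical.

```python
# Base form branches when its flag is clear; R = 1 turns it into the opposite.
BRANCH_FLAG = {"BPL": "N", "BVC": "V", "BCC": "C", "BNE": "Z"}
INVERSE = {"BPL": "BMI", "BVC": "BVS", "BCC": "BCS", "BNE": "BEQ"}


def effective_name(base: str, r_bit: int) -> str:
    return INVERSE[base] if r_bit else base


def branch_taken(base: str, r_bit: int, flags: dict) -> bool:
    taken_if_clear = not flags[BRANCH_FLAG[base]]
    return taken_if_clear != bool(r_bit)  # R inverts the condition


def branch_target(pc: int, offset: int) -> int:
    """6502 rule: signed 8-bit offset added to the address after the branch."""
    delta = offset - 0x100 if offset & 0x80 else offset
    return (pc + 2 + delta) & 0xFFFF
```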
Further exploration into which opcodes would make the final cut remains; however,
this puts the "6502 instruction set in 4 bits". This is now a very minimal system, and it should still be capable of running "6502-like code", kinda.
It could possibly be made to run on 4 bits, with a 4-bit set of registers or 'cache' that holds the stack pointer address, the flags, the accumulator (at 0), etc. There are some ways here to map the other 4 bits (which kinda got moved to the status register), and I'm saying the status flag register should be at a fixed address, not relocatable.
Several places indicate a 'last operation' or 'last state', and an unseen buffer register seems somewhat intuitive, though it may be the 'datapath itself' register. It's an odd one to think about.
So, an 8-bit (or 4? or 1?) processor with a 4-bit instruction word and an 8-bit operand leaves a 4-bit suffix; address space here is not yet discussed. On an 8-bit processor with 8 'registers' you get an octlet, and LDx/STx can be done with an octal suffix. Going to a 4-bit suffix, it is possible some of these opcodes may have different address or control line codes, or other suffixes.
Some opcodes, like CMP, might be faster if done to the stack (1111) than to a memory address (0000 + operand). Again, it is worth asking whether a 'buffer' register is in the machine.
If this follows the 8/16 address system of the 6502, then with a 4-bit opcode, a 4-bit suffix and 8-bit operand words, addressing can now easily be 12, 16 or 20 bits.
A 4-bit compare would compare the accumulator to 'the value at the address of the register': 0000 is the accumulator and 1111 is the stack, so CMP 1111 compares the accumulator to the value on the stack, and if we use 0011, it looks at that register (of our octlet) and uses the value there. If we want an offset, we use status flags to control whether we get another operand to increase our addressing space.
For many operations, we have 12b addressing, or 4K.
With control registers for, say, inner or outer addresses, it is easy to have 16 bytes of 'register space',
where the accumulator (0), indexes (1, 2, 3, 4, 5, 6, 7), stAtus flags (A), <buffer/last value> (B), program Counter (C, D), <Extra> (E) and stack pointer (F) live.
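A sketch of the 16-byte register space and of CMP choosing its operand through the 4-bit suffix, under these assumptions, none of which the post pins down: slot C is PC low and D is PC high; slots 8 and 9 are unused; suffix 0000 takes an extra fetched byte as a zero-page address (following the note that 0000 fetches another byte); 0001-0111 compare against the index register's own value; 1111 compares against the byte on top of the stack; and the fetched operand is parked in the buffer slot B. All names are hypothetical.

```python
REG_NAMES = {0x0: "A", 0xA: "P", 0xB: "BUF", 0xC: "PCL", 0xD: "PCH", 0xE: "EXT", 0xF: "S"}
REG_NAMES.update({i: f"I{i}" for i in range(1, 8)})  # octlet index registers 1-7


class Core16:
    def __init__(self) -> None:
        self.regs = bytearray(16)       # the 16-byte register space
        self.mem = bytearray(0x10000)
        self.regs[0xF] = 0xFF           # empty page-one stack

    def cmp(self, suffix: int, extra: int = 0) -> dict:
        if suffix == 0x0:
            m = self.mem[extra & 0xFF]                             # extra byte as zero-page address
        elif 0x1 <= suffix <= 0x7:
            m = self.regs[suffix]                                  # index register value
        elif suffix == 0xF:
            m = self.mem[0x0100 + ((self.regs[0xF] + 1) & 0xFF)]   # top of stack
        else:
            raise NotImplementedError(f"suffix {suffix:04b} not defined in this sketch")
        self.regs[0xB] = m                                         # keep the operand in the buffer slot
        a = self.regs[0x0]
        diff = (a - m) & 0xFF
        return {"N": bool(diff & 0x80), "Z": a == m, "C": a >= m}
```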
So operations such as ADC, OR and XOR can be done on the octlet via a command with a suffix of 0001-0111; with 0000 being the accumulator, the command fetches another byte for the address in memory (to put in that buffer), and might use a flag (X?) to get a 16b address instead. Further, R 'negates' the operation, causing NOR or XNOR, or subtraction, to be done instead. So there is more flag and control work to be done; however, using them allows for a very minimal instruction set.
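A sketch of R as an 'inverse' bit for the ALU, assuming OR/NOR, XOR/XNOR and ADC/SBC are the pairs. It relies on the fact that on the 6502, binary SBC is ADC with the operand's bits inverted (A + NOT M + C), so the 'negated' add only needs an inverter on the operand path. Logic ops here leave carry untouched, as on the 6502. The function name is illustrative.

```python
def alu_r(op: str, a: int, m: int, carry: int, r_bit: int) -> tuple[int, int]:
    """Return (result, carry_out) for 8-bit operands; r_bit selects the 'inverse' form."""
    if op == "ADC":
        if r_bit:
            m = ~m & 0xFF          # binary SBC = A + (NOT M) + C on the 6502
        total = a + m + carry
        return total & 0xFF, int(total > 0xFF)
    if op == "OR":
        result = a | m
    elif op == "XOR":
        result = a ^ m
    else:
        raise ValueError(f"op {op!r} not covered by this sketch")
    if r_bit:
        result = ~result & 0xFF    # OR -> NOR, XOR -> XNOR
    return result, carry           # logic ops leave carry alone
```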
H7654 3210 (&NVXBRIZC), L76543210 (+ X76543210)
Opcode Suffix status modifiers page address outer address
Last edited by wayfarer on Sun May 11, 2025 10:10 pm, edited 1 time in total.
Re: the 16x504, a 16 instruction 6502
https://sleepingelephant.com/ipw-web/bu ... hp?t=11123
On review here, to iterate: instead of 5 bits (a 3-bit opcode and a 2-bit opcode page) and 3-bit addressing (as stated above in the 6303 post),
we are in the 16x504, with a 4-bit opcode, using the B, R, I and X bits to give additional control signals or modes.
And reviewing this, we use at least 2 bits for addressing; as we no longer have the Y and X registers, we don't need octal addressing.
00, accumulator, implied, immediate
01, indirect, or direct
10, 8 or 16 bit address
11, uses a direct 16b address, stack otherwise
JSI (JMP) uses relative, absolute or vector addressing, based on the I and B flags depending on the instruction, and may use these op-modes.
That leaves 2 bits for 00 accumulator, 01 status, 10 PC, 11 stack, maybe.
This would allow for several forms of relative addressing, indexing and other operations, while maintaining the 6502 instruction set overall.
Ideally, this would allow 'transpiling' or compilation to this set, though as is, a 16-instruction 650x MPU with the registers Acc, PC, Stack and Flags is, vaguely, a 16x504 MPU: a Risc502.
Pretty bare minimum, and it shows how the X and Y registers add convenience, simplicity and reduced instruction cycle counts for a very small amount of circuitry, mostly registers and buffers.
Re: a 6303 and 16x504 -- the minimalist 6502 models
Load, Store, "Move/Assign"
AND, XOR, OR,
Add/Sub-w-carry, Shift/Rotate, Compare
Branch/Jump, NOP
Variations thereon account for 16 instructions, give or take.
Branch/jump is a flag-based system; Add/Sub, Shift/Rotate and Compare are
all flag controlled or set flags. Reading or writing the status register directly
is how to perform those operations, and logic ops might be invertible with a flag,
so it may be possible to return to a 3b opcode:
Load, Store, AddwCarry, Branch,
AND, OR, NOP, Shift
We move XOR to OR with the R flag bit set and allow for NOP, giving a 3-bit opcode with a 2-bit address mode or 5-bit addressing, possibly other modes.
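A toy interpreter for the eight-opcode idea, as a sketch only. Assumptions not in the post: each instruction is one byte (3-bit opcode, 5-bit address); opcodes are numbered in the order listed; the 5-bit address picks one of 32 RAM-register bytes; BR jumps to a program index (separate program memory) when Z is clear; and R is a status bit the caller sets. R turns OR into XOR, ADC into SBC, Shift into Rotate, and inverts the branch test.

```python
OPS = ["LD", "ST", "ADC", "BR", "AND", "OR", "NOP", "SH"]  # numbered in listed order (assumption)


def asm(op: str, addr: int = 0) -> int:
    return (OPS.index(op) << 5) | (addr & 0x1F)


def run(program: list[int], ram: bytearray, r: bool = False, max_steps: int = 1000) -> int:
    """Execute until the PC runs off the end; return the accumulator."""
    a, c, z, pc = 0, False, False, 0
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, addr = OPS[program[pc] >> 5], program[pc] & 0x1F
        pc += 1
        if op == "LD":
            a = ram[addr]
        elif op == "ST":
            ram[addr] = a
        elif op == "ADC":
            m = (~ram[addr] & 0xFF) if r else ram[addr]    # R: subtract (A + NOT M + C)
            total = a + m + c
            a, c = total & 0xFF, total > 0xFF
        elif op == "BR":
            if (not z) != r:                               # R: branch on Z set instead
                pc = addr
        elif op == "AND":
            a &= ram[addr]
        elif op == "OR":
            a = (a ^ ram[addr]) if r else (a | ram[addr])  # R: XOR
        elif op == "SH":
            out = a >> 7
            a = ((a << 1) & 0xFF) | (int(c) if r else 0)   # R: rotate left through carry
            c = bool(out)
        # NOP does nothing
        if op in ("LD", "ADC", "AND", "OR", "SH"):
            z = a == 0
    return a
```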
Judicious use of the stack, or of some additional RAM-register, is most likely required here. The PC gets put on the address bus to fetch instructions, so that gets us 64K of address space already; storing to PC high or low allows full-address jumps, combined with various flags and accumulator values.
Really, we are moving opcode bits to the status register in order to expose further address modes and opcode pages.
- GARTHWILSON
- Forum Moderator
- Posts: 8775
- Joined: 30 Aug 2002
- Location: Southern California
Re: a 6303 and 16x504 -- the minimalist 6502 models
You might get some good ideas from BMOW Steve Chamberlin's tiny 6502-like CPU in a CPLD
Quote:
He doesn't take into account opcode usage, from loops or subroutines, only the number of times it's typed.
That's important. The linked .pdf says in the conclusion, "My original suspicions of the worthlessness of the 6502's pre-indexed instructions seems confirmed." He apparently has no exposure to Forth, because Forth uses the (zp,X) addressing mode constantly. If you just count the instructions' occurrences, you'll only get about a half-dozen in a kernel, but they get run constantly. I think most people's wrong idea about this addressing mode is that it's for tables or arrays in ZP. They're not thinking of a data stack which often has addresses on it too, and is separate from the page-1 return stack. I introduce the idea of a separate data stack in chapter 4 of my 6502 stacks treatise.
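To illustrate the Forth usage described above, here is a sketch (not taken from any particular Forth) of a zero-page data stack indexed by X, where C@ replaces the address on top of the stack with the byte it points to. On a real 6502 the core of that is LDA (0,X); the sketch follows the 6502's zero-page wraparound when forming the pointer. Class and method names are hypothetical.

```python
def zp_x_indirect(mem: bytearray, zp: int, x: int) -> int:
    """Effective address for (zp,X) on the 6502: the pointer is read from zero page,
    and both pointer bytes wrap within page zero."""
    base = (zp + x) & 0xFF
    return mem[base] | (mem[(base + 1) & 0xFF] << 8)


class ZpDataStack:
    """Forth-style data stack in zero page, growing down; X indexes the top cell (low byte first)."""

    def __init__(self, mem: bytearray, x: int = 0x80) -> None:
        self.mem, self.x = mem, x

    def push(self, cell: int) -> None:           # DEX / DEX, then store low and high bytes
        self.x = (self.x - 2) & 0xFF
        self.mem[self.x] = cell & 0xFF
        self.mem[(self.x + 1) & 0xFF] = (cell >> 8) & 0xFF

    def top(self) -> int:
        return self.mem[self.x] | (self.mem[(self.x + 1) & 0xFF] << 8)

    def c_fetch(self) -> None:                   # C@ : LDA (0,X) / STA 0,X / LDA #0 / STA 1,X
        value = self.mem[zp_x_indirect(self.mem, 0, self.x)]
        self.mem[self.x] = value
        self.mem[(self.x + 1) & 0xFF] = 0
```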
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
- BigDumbDinosaur
- Posts: 9428
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
Re: a 6303 and 16x504 -- the minimalist 6502 models
GARTHWILSON wrote:
Quote:
He doesn't take into account opcode usage, from loops or subroutines, only the number of times it's typed.
In the serial I/O driver of my POC units’ firmware, I access UART registers and circular queues according to the channel number being processed by using statically-defined pointers that are referenced with (<dp>,X) addressing. As with the use of (<dp>,X) in Forth, there aren’t a lot of instructions in the firmware that do use that addressing mode, but they are constantly being executed as serial data comes and goes.
The resulting code is succinct, and easily scaled to more channels, simply by adding more pointers. (<dp>,X) addressing greatly simplifies things and is far from worthless.
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: a 6303 and 16x504 -- the minimalist 6502 models
GARTHWILSON wrote:
You might get some good ideas from BMOW Steve Chamberlin's tiny 6502-like CPU in a CPLD. That's important. The linked .pdf says in the conclusion, "My original suspicions of ..." He apparently has no exposure to Forth; because Forth uses the (zp,X) addressing mode constantly. If you just count the instructions' occurrences, you'll only get about a half-dozen in a kernel, but they get run constantly. I think most people's wrong idea about this addressing mode is that it's for tables or arrays in ZP. They're not thinking of a data stack which often has addresses on it too, and is separate from the page-1 return stack. I introduce the idea of a separate data stack in chapter 4 of my 6502 stacks treatise.
Quote:
He doesn't take into account opcode usage, ...
If a 4b MPU had a 16-layer stack, would that be sufficient?
BigDumbDinosaur wrote:
GARTHWILSON wrote:
Quote:
He doesn't take into account opcode usage, from loops or subroutines, only the number of times it's typed.
In the serial I/O driver of my POC units’ firmware, I access UART registers and circular queues according to the channel number being processed by using statically-defined pointers that are referenced with (<dp>,X) addressing. As with the use of (<dp>,X) in Forth, there aren’t a lot of instructions in the firmware that do use that addressing mode, but they are constantly being executed as serial data comes and goes.
The resulting code is succinct, and easily scaled to more channels, simply by adding more pointers. (<dp>,X) addressing greatly simplifies things and is far from worthless.
Last edited by wayfarer on Wed May 14, 2025 6:16 pm, edited 1 time in total.
- GARTHWILSON
- Forum Moderator
- Posts: 8775
- Joined: 30 Aug 2002
- Location: Southern California
Re: a 6303 and 16x504 -- the minimalist 6502 models
wayfarer wrote:
agreed, I'm interested in doing further research myself. I'm curious on if you think a stack should be wide or deep.
if a 4b mpu, had a 16-layer stack, would that be sufficient?
The other processor I'm super familiar with, having brought a dozen products to market with it, is the one in the PIC16F microcontrollers. (The last one was a few years ago, and I have not used the PIC16F1's which are apparently slightly better.) These don't have index registers and the luxury of the 6502's addressing modes. The return stack is only eight-level, and in one project, I was really overflowing it and I had to figure out what I could straightline (to get rid of subroutine calls where the stack was full), which was difficult because I was also using up all the program memory, and of course straightlining requires more program memory. I did complete it, with extra effort.
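For the 'wide or deep' question, a sketch of a fixed-depth hardware return stack that behaves like the one described in mid-range PIC16 datasheets such as the PIC16F84A's (a circular buffer: the ninth push overwrites the first), with the depth as a parameter so 8 versus 16 levels can be tried. The class is hypothetical and models only return addresses, not data.

```python
class CircularReturnStack:
    def __init__(self, depth: int = 8) -> None:
        self.depth = depth
        self.slots = [0] * depth
        self.ptr = 0              # next slot to write
        self.level = 0            # current nesting, for spotting overflow
        self.overflowed = False

    def call(self, return_addr: int) -> None:
        self.slots[self.ptr] = return_addr
        self.ptr = (self.ptr + 1) % self.depth
        self.level += 1
        if self.level > self.depth:
            self.overflowed = True  # no trap: the oldest entry is silently lost

    def ret(self) -> int:
        self.ptr = (self.ptr - 1) % self.depth
        self.level -= 1
        return self.slots[self.ptr]
```

Nine nested calls on an 8-level stack return correctly eight times, and the ninth return goes to the wrong place.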
The PIC16 also does not allow you to put data on the hardware stack, let alone address it out of order like you can do on the '02 using X (and better on the '816). (In fact, the hardware stack is not accessible to the programmer at all.) A separate data stack solves various problems; but implementing a data stack on the PIC16, although possible, is extremely inefficient, again due to lacking the luxury of the 6502's addressing modes. A couple of weeks ago I started doing some routines and macros for it. I thought that if you don't need maximum performance, it might be worth it sometimes, just to keep better control of a project and make it more maintainable. Now I'm not so sure.
We tend to think and plan in terms of things we've already done and are familiar with; and I suspect you're still not thinking in terms of a virtual stack for data (which can contain any data, including cells that may be calculated addresses) which is separate from the return stack that's used for subroutine return addresses.
You'll have to consider what you want to use your new trimmed-down 6502 for, and what you're willing to give up to gain simplicity. If the instruction set is too simple, a program will require a lot more instructions to get the job done. Will a program no longer fit in the memory space that the number of address bits allows? What do you plan to do for an assembler? Do you plan to run any languages that are higher level than assembly language? I imagine you could probably come up with a pretty well defined model and then experiment with writing programs for it, and maybe have others experiment with it too, before hammering out the details of how to implement your envisioned registers and instructions in the programmable logic.
I'm not too happy with my post, but maybe it'll stir some ideas.
Re: a 6303 and 16x504 -- the minimalist 6502 models
Of all the things engineers do, being simple is the hardest; it takes the most effort, time and iterations. It is the hallmark of true engineering to make it look simple.
Bill
Re: a 6303 and 16x504 -- the minimalist 6502 models
GARTHWILSON wrote:
wayfarer wrote:
agreed, I'm interested in doing further research myself. I'm curious on if you think a stack should be wide or deep.
if a 4b mpu, had a 16-layer stack, would that be sufficient?
Quote:
The other processor I'm super familiar with, having brought a dozen products to market with it, is the one in the PIC16F microcontrollers. (The last one was a few years ago, and I have not used the PIC16F1's which are apparently slightly better.) These don't have index registers and the luxury of the 6502's addressing modes. SNIP ... /SNIP We tend to think and plan in terms of things we've already done and are familiar with; and I suspect you're still not thinking in terms of a virtual stack for data (which can contain any data, including cells that may be calculated addresses) which is separate from the return stack that's used for subroutine return addresses.
I'm aiming for orthogonal commands and registers, though, and that might help: a 4b MPU with 16 registers, the "6404", see below.
Quote:
You'll have to consider what you want to use your new trimmed-down 6502 for, and what you're willing to give up to gain simplicity. If the instruction set is too simple, a program will require a lot more instructions to get the job done. Will a program no longer fit in the memory space that the number of address bits allows? What do you plan to do for an assembler?
At the moment, my "6404", a 4-bit 6502 hybrid, is going on a 6522 chassis to run it as a DMA controller, ISAC. Assembly similar to 6502 assembly is good; eventually I'd like these other chips to use the same assembler and compiler as a 6502 (c65, or a common 6502 assembler I can modify), with the same or similar ASM and features. Though not bit-compatible with the 6502/65816, it should be familiar. My tiny 4-bit chip could run a clock, a microwave oven, maybe a DMA controller. It's somewhere between a 4004, Am290x, MC14500B, MC10800, and a 6502.
The balance between program size and instruction length is a concern. One thing we are doing is keeping addressing modes: a 4-bit instruction gets a 2-bit length field that determines the final address or operand size. We ditched BCD, so we can use that bit for R, which turns shifts into rotates, inverts logic operations, etc. I want it to be Turing complete and to keep a lot of usefulness, and I'm adding a better comparator.
What's it for? To program 4-bit devices, or 8-bit, or 16-bit, or whatever, using an assembly language and style familiar from 6502 development. I'm trying to broaden the use of the architecture overall and explore different directions. I'll tell you about my 4x4 matrix-math swap-shift register module later.
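To make the size trade-off concrete, here is a rough C sketch of how a decoder might turn the 2-bit length field into a count of operand nibbles to fetch. The layout (a header byte whose high nibble is the opcode and whose low nibble is a MOD field carrying the length in its two lowest bits) and the mapping 00 = no operand, 01 = one nibble, 10 = one byte, 11 = a 16-bit address are assumptions for illustration, not the 6404's actual encoding.

```c
#include <stdint.h>

/* Hypothetical header byte: high nibble = 4-bit opcode, low nibble = MOD,
   whose two lowest bits are the length field. This layout is an assumption. */
unsigned opcode_of(uint8_t header) { return (header >> 4) & 0x0F; }
unsigned length_of(uint8_t header) { return header & 0x03; }

/* Number of 4-bit operand nibbles fetched after the header (assumed mapping). */
unsigned operand_nibbles(uint8_t header)
{
    static const unsigned table[4] = {
        0, /* 00: implied / register only                  */
        1, /* 01: one nibble, e.g. a 4-bit immediate       */
        2, /* 10: one byte, e.g. a zero-page-style address */
        4  /* 11: a full 16-bit absolute address           */
    };
    return table[length_of(header)];
}

/* Total instruction length in nibbles: 2 for the header plus the operand. */
unsigned instruction_nibbles(uint8_t header)
{
    return 2 + operand_nibbles(header);
}
```

With this mapping, a register-only instruction costs one byte and an absolute-address instruction three, the same 1/2/3-byte spread the 6502 has.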
Quote:
Do you plan to run any languages that are higher level than assembly language? I imagine you could probably come up with a pretty well defined model and then experiment with writing programs for it, and maybe have others experiment with it too, before hammering out the details of how to implement your envisioned registers and instructions in the programmable logic.
BDD and others have mentioned using ATF150x CPLDs, so I think this is certainly feasible. I may need 256 instead of 128 macrocells in the end; still, dropping to 4 bits here is useful if it can take a similar set, or a subset, of 6502 and 6522 opcodes/functions.
Quote:
I'm not too happy with my post, but maybe it'll stir some ideas.
plasmo wrote:
Of all the things engineers do, being simple is the hardest; it takes the most effort, time and iterations. It is the hallmark of true engineering to make it look simple.
Bill
Language Possibilities
m4, Datalog, SQL, Gremlin, but mostly ASM or extended ASM. I checked the specs, and it looks like this device is capable of running C, though conforming to the standard C type definitions (short, long, double, float) would be very slow. Unless I add a long stack, Forth and several other languages are more or less ruled out.
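The reason full-width C types would be slow is that every operation on an `int` or `long` has to be broken into 4-bit steps with the carry passed along. A minimal C model of a 16-bit add done on a 4-bit ALU shows the four passes one 16-bit addition needs; a 32-bit `long` would need eight. The `alu4_add` helper is hypothetical, standing in for one pass through the chip's adder.

```c
#include <stdint.h>

/* One pass through a hypothetical 4-bit ALU: a + b + carry_in.
   Returns the 4-bit sum and writes the carry out. */
uint8_t alu4_add(uint8_t a, uint8_t b, unsigned carry_in, unsigned *carry_out)
{
    unsigned sum = (a & 0x0F) + (b & 0x0F) + (carry_in & 1);
    *carry_out = sum >> 4;
    return (uint8_t)(sum & 0x0F);
}

/* 16-bit addition as four nibble-wide passes, low nibble first,
   the way a multi-precision ADC loop on a 4-bit MPU would do it. */
uint16_t add16_by_nibbles(uint16_t x, uint16_t y, unsigned *passes)
{
    uint16_t result = 0;
    unsigned carry = 0;
    *passes = 0;
    for (unsigned shift = 0; shift < 16; shift += 4) {
        uint8_t nx = (x >> shift) & 0x0F;
        uint8_t ny = (y >> shift) & 0x0F;
        uint8_t nsum = alu4_add(nx, ny, carry, &carry);
        result |= (uint16_t)(nsum << shift);
        ++*passes;
    }
    return result; /* final carry is dropped, as in 16-bit unsigned wraparound */
}
```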
However, register F (1111) is a stack offset and is vector-assignable to *any* legal address, so you can use 16, or however many, levels in RAM until you run out. You can shift or store to this register, which relocates the stack pointer; it can be manipulated like any other register. This means it either moves the register offset or shifts the position of the stack pointer relative to the register vector, so there could be any number of legal stacks, each 16 levels deep.
Or possibly it's a 4-bit-wide stack that is 64 or 256 levels deep.
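One way to read the register-F scheme is a 16-bit base vector plus a 4-bit offset: the offset gives a 16-level stack, and storing a new vector relocates the whole stack anywhere in RAM. The C model below is a sketch under that assumption; the names `StackF`, `sf_push`, `sf_pull` and `sf_relocate`, the 64 KB array of nibble cells, and the wrap-around on overflow are all hypothetical choices.

```c
#include <stdint.h>

/* Hypothetical model: 64 KB of RAM, each cell holding one 4-bit value. */
static uint8_t ram[65536];

typedef struct {
    uint16_t vector; /* base address, assignable to any legal address */
    uint8_t  offset; /* 4-bit stack pointer: 16 levels per stack      */
} StackF;

/* Relocating the stack is just storing a new vector; the offset is kept. */
void sf_relocate(StackF *s, uint16_t new_vector) { s->vector = new_vector; }

/* Push stores, then decrements, like the 6502's PHA.
   The offset wraps 0 -> 15 on overflow (an assumption). */
void sf_push(StackF *s, uint8_t nibble)
{
    ram[(uint16_t)(s->vector + s->offset)] = nibble & 0x0F;
    s->offset = (uint8_t)((s->offset - 1) & 0x0F);
}

/* Pull increments, then loads, like the 6502's PLA. */
uint8_t sf_pull(StackF *s)
{
    s->offset = (uint8_t)((s->offset + 1) & 0x0F);
    return ram[(uint16_t)(s->vector + s->offset)];
}
```

Starting the offset at 0xF mirrors the usual 6502 habit of initializing S to $FF.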
(I'm not a fan of RPN, which explains why I'm not drawn to Forth; however, I am interested in a largish heap or other dictionary structure. I'm already planning on vectored registers.)
So here is the doc for the 6404. It's not close to being a draft yet; still, I can see some of where I'm going.
https://docs.google.com/document/d/1cMI ... oCf9Tpj00/
6404 :: 4-bit MPU-like 6502; I'll probably use this.
6303 :: 8-bit stripped-down 6502, similar to the CPLD linked here; I probably won't make this, no need now.
16x504 :: sixteen 6404s in a 4x4 matrix that can be linked using bit-slice carry and look-ahead circuits for variable-use processing. Runs a 6502/65816-like ASM and will run 6404 code. This has the 4x4 swap-shift matrix module: fast hardware 4x4 matrix operations on 4-bit values, so either integers or offsets into a RAM look-up table of the contents.
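Linking 4-bit slices with carry look-ahead is the classic generate/propagate scheme of the 74181/74182 pair. A rough C sketch, assuming four slices chained into a 16-bit adder: each slice reports a group generate G (it carries out on its own) and a group propagate P (it would pass an incoming carry straight through), and the look-ahead logic computes every slice's carry-in from those signals in two levels instead of waiting for the carry to ripple. The function names are hypothetical.

```c
#include <stdint.h>

/* Group generate/propagate for one 4-bit slice adding a + b. */
void slice_gp(uint8_t a, uint8_t b, unsigned *G, unsigned *P)
{
    unsigned s = (a & 0x0F) + (b & 0x0F);
    *G = s > 0x0F;  /* carries out even with carry-in 0            */
    *P = s == 0x0F; /* sum is all ones: a carry-in would pass through */
}

/* 16-bit add across four slices with look-ahead carries. */
uint16_t add16_lookahead(uint16_t x, uint16_t y, unsigned c0, unsigned *cout)
{
    unsigned G[4], P[4], c[5];
    for (int i = 0; i < 4; i++)
        slice_gp((x >> (4 * i)) & 0xF, (y >> (4 * i)) & 0xF, &G[i], &P[i]);

    /* Each carry depends only on G, P and c0, never on another slice's sum. */
    c[0] = c0 & 1;
    c[1] = G[0] | (P[0] & c[0]);
    c[2] = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c[0]);
    c[3] = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0])
         | (P[2] & P[1] & P[0] & c[0]);
    c[4] = G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
         | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c[0]);

    /* With carries known, all four slices can form their sums in parallel. */
    uint16_t result = 0;
    for (int i = 0; i < 4; i++) {
        unsigned nib = (((x >> (4 * i)) & 0xF) + ((y >> (4 * i)) & 0xF) + c[i]) & 0xF;
        result |= (uint16_t)(nib << (4 * i));
    }
    *cout = c[4];
    return result;
}
```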
Re: a 6303 and 16x504 -- the minimalist 6502 models
Here, I think I have all the MOS 6502 commands represented, through the use of control and ALU flags (which get set by the new comparator).
You can perform each 6502 command, give or take, 4 bits at a time (possibly 8 bits with register chaining/echoing, if I get that to work).
It's going to use the B bit more, and I'm leaning towards using the X bit to address 'inner registers and vector tables'.
Yellow shows how orthogonal register addressing is used: TAB/TBA/TBS, etc., are performed using OPCODE+MOD+SRC+DES and are simply a LOAD/STORE sequence.
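With 16 orthogonal registers, a transfer such as TAB needs no dedicated opcode: it is one generic move with the source and destination fields filled in. The C sketch below packs a hypothetical 16-bit word as four nibbles, OPCODE, MOD, SRC, DES (high to low); the opcode value `OP_MOVE`, the MOD value `MOD_REG`, and register 1 = B are assumptions for illustration. Only register 0 = accumulator follows the earlier description.

```c
#include <stdint.h>

/* Hypothetical 16-bit word: [15:12] OPCODE, [11:8] MOD, [7:4] SRC, [3:0] DES. */
enum { OP_MOVE = 0x1 };            /* assumed opcode for a register move */
enum { MOD_REG = 0x0 };            /* assumed MOD value: register direct */
enum { REG_A = 0x0, REG_B = 0x1 }; /* 0000 = accumulator; B is assumed   */

uint16_t encode(unsigned op, unsigned mod, unsigned src, unsigned des)
{
    return (uint16_t)(((op & 0xF) << 12) | ((mod & 0xF) << 8) |
                      ((src & 0xF) << 4) | (des & 0xF));
}

/* TAB, TBA, etc. all become the same move with different register fields. */
uint16_t encode_transfer(unsigned src, unsigned des)
{
    return encode(OP_MOVE, MOD_REG, src, des);
}

/* Executing the move on a 16-entry file of nibbles:
   read SRC (the "load"), write DES (the "store"). */
void execute_move(uint8_t regs[16], uint16_t word)
{
    unsigned src = (word >> 4) & 0xF;
    unsigned des = word & 0xF;
    regs[des] = regs[src] & 0x0F;
}
```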
SWAP is done in software using the XOR swap, or maybe a vectored register, though that is more likely to cause a collision.
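The XOR swap needs no temporary register, which matters when registers are scarce. The sketch below applies it to a hypothetical 16-entry nibble register file. Note the guard: if both operands name the same register, the first XOR clears it to zero, so a swap of a register with itself must be skipped. The same hazard arises if two vectored registers ever resolve to the same location.

```c
#include <stdint.h>

/* XOR swap of two registers in a 16-entry file of 4-bit values:
   three XORs, no temporary. */
void xor_swap(uint8_t regs[16], unsigned r1, unsigned r2)
{
    r1 &= 0xF;
    r2 &= 0xF;
    if (r1 == r2)   /* same register: the XOR sequence would zero it */
        return;
    regs[r1] ^= regs[r2];
    regs[r2] ^= regs[r1];
    regs[r1] ^= regs[r2];
}
```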
This means that MOS 6502 ASM should 'trans-assemble', or cross-compile, down to 6404 ASM that is more expanded. It would not be optimized, but it does mean that 6404 ASM is more and more a subset of 6502 ASM.
Oh, one "language" I would want a compiler for is VHDL: either to output an object file or an intermediate netlist file, or as a VHDL interpreter to parse VHDL files (another language here is G-code), and for the VHDL compiler to use the general-purpose language features to compile an executable for 65xx-family computers. VHDL is a general-purpose language.
I checked the C documents; this chip can support a very slow, minimal C environment.
- GARTHWILSON
- Forum Moderator
- Posts: 8775
- Joined: 30 Aug 2002
- Location: Southern California
Re: a 6303 and 16x504 -- the minimalist 6502 models
wayfarer wrote:
GARTHWILSON wrote:
wayfarer wrote:
Agreed, I'm interested in doing further research myself. I'm curious whether you think a stack should be wide or deep.
If a 4-bit MPU had a 16-level stack, would that be sufficient?
I was slightly saddened when NSC dropped their line of 4-bit microcontrollers. Now many years later, there probably wouldn't be any cost benefit to 4-bit over 8-bit; but I was imagining applications that could be satisfied just fine with 4-bit.
My HP-71 hand-held computer, whose BASIC is far better than any other BASIC I've ever seen anywhere, especially with the LEX (Language-EXtension) files contributed by the users' groups, and which I described a little bit of here, has a 4-bit data bus, 20-bit address bus, and 64-bit registers. I have about 350KB of memory in mine, half RAM and half ROM.
Quote:
The other processor I'm super familiar with, having brought a dozen products to market with it, is the one in the PIC16F microcontrollers. (The last one was a few years ago, and I have not used the PIC16F1's which are apparently slightly better.) These don't have index registers and the luxury of the 6502's addressing modes. SNIP ... /SNIP We tend to think and plan in terms of things we've already done and are familiar with; and I suspect you're still not thinking in terms of a virtual stack for data (which can contain any data, including cells that may be calculated addresses) which is separate from the return stack that's used for subroutine return addresses.
Quote:
(I'm not a fan of RPN, which explains why I'm not drawn to Forth
- BigDumbDinosaur
- Posts: 9428
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
Re: a 6303 and 16x504 -- the minimalist 6502 models
GARTHWILSON wrote:
wayfarer wrote:
I'm not a fan of RPN, which explains why I'm not drawn to Forth
- Copy augend to accumulator #1.
- Copy addend to accumulator #2.
- Call function that adds the accumulators.
- Replace augend with sum.
RPN is “natural” in computers due to the MPU having no notion of algebra or much else of anything to do with math, other than the basic functions wired into the MPU’s instruction set.
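The four steps in the list above are what an RPN evaluator does mechanically: operands are pushed as they arrive, and each operator pops its two inputs and pushes the result in their place. A minimal C sketch, assuming single-digit operands and only `+`, `-` and `*` to keep it short; the function name `eval_rpn` is hypothetical.

```c
#include <ctype.h>

/* Evaluate an RPN string of single-digit operands and + - * operators,
   e.g. "34+2*" is (3 + 4) * 2. Spaces are ignored.
   On malformed input, returns 0 and sets *ok to 0. */
int eval_rpn(const char *expr, int *ok)
{
    int stack[32];
    int sp = 0;                          /* number of items on the stack */
    *ok = 1;
    for (const char *p = expr; *p; ++p) {
        if (isdigit((unsigned char)*p)) {
            if (sp == 32) { *ok = 0; return 0; }
            stack[sp++] = *p - '0';      /* push the operand */
        } else if (*p == '+' || *p == '-' || *p == '*') {
            if (sp < 2) { *ok = 0; return 0; }
            int rhs = stack[--sp];       /* top of stack: second operand */
            int lhs = stack[--sp];       /* below it: first operand      */
            int r = (*p == '+') ? lhs + rhs
                  : (*p == '-') ? lhs - rhs
                  :               lhs * rhs;
            stack[sp++] = r;             /* result replaces both operands */
        } else if (*p != ' ') {
            *ok = 0; return 0;
        }
    }
    if (sp != 1) { *ok = 0; return 0; }
    return stack[0];
}
```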
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: a 6404, 6303 and 16x504 -- the minimalist 6502 models
I started early with both Sinclair and HP RPN calculators, and FORTH, so RPN often makes more sense to me than braces and arithmetic precedence rules... which are a pain to implement in logic.
The arithmetic precedence rules enforced in Neolithic Tiny Basic are implemented by recursive descent into the expression; there is never a need to hold a pile of intermediate results other than on the stack. RPN is an obvious fit for a stack based language.
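Precedence falling out of recursive descent, with intermediate results living only on the (call) stack, can be illustrated with a small C sketch. The grammar and the names `expr`, `term`, `factor` and `evaluate` are the textbook ones, not Neolithic Tiny Basic's actual code; operands are non-negative integers, and only `+ - * /` and parentheses are handled. Each pending left operand sits in a local variable of an active call, which is the stack in question.

```c
#include <ctype.h>

/* Grammar: expr   := term   { ('+' | '-') term }
            term   := factor { ('*' | '/') factor }
            factor := number | '(' expr ')'                 */
static const char *pos;          /* current position in the input */

static int expr(void);

static void skip_spaces(void) { while (*pos == ' ') ++pos; }

static int factor(void)
{
    skip_spaces();
    if (*pos == '(') {
        ++pos;
        int v = expr();
        skip_spaces();
        if (*pos == ')') ++pos;  /* a real parser would report a missing ')' */
        return v;
    }
    int v = 0;
    while (isdigit((unsigned char)*pos))
        v = v * 10 + (*pos++ - '0');
    return v;
}

static int term(void)
{
    int v = factor();            /* left operand held in this call's frame */
    for (;;) {
        skip_spaces();
        if (*pos == '*')      { ++pos; v *= factor(); }
        else if (*pos == '/') { ++pos; int d = factor(); v = d ? v / d : 0; } /* x/0 -> 0, arbitrary */
        else return v;
    }
}

static int expr(void)
{
    int v = term();              /* '*' and '/' bind tighter: handled inside term() */
    for (;;) {
        skip_spaces();
        if (*pos == '+')      { ++pos; v += term(); }
        else if (*pos == '-') { ++pos; v -= term(); }
        else return v;
    }
}

int evaluate(const char *text)
{
    pos = text;
    return expr();
}
```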
Neil