The 65M202 road map that could have been
Posted: Wed Jun 29, 2022 10:26 pm
The multitude of alternative 65X02s here are looking at what might be now, if only we add this feature or that. My own rabbit hole dive into the 6502 comes from the other direction: what could have been if the computer companies had pushed MOS/WDC/Rockwell/etc. to incrementally improve the 6502 year by year?
To ground this hypothetical in a believable history, let's assume Motorola had said yes to Chuck Peddle instead of no, and that the Motorola 6502 launched in 1975, at $25. Let's assume it was an even bigger hit than the MOS 6502 given Motorola's bigger brand. Let's assume that because of that success, the 68000 project was pushed back a few years, leaving Apple, Commodore, and others with no shiny 32-bit processor to jump to, and instead pushing Motorola to make incremental updates to its 65k line of chips.
The last assumption in this what-if is cultural. Back in the 1970s and 1980s the tech industry deeply embraced leapfrog step changes and waterfall design, whereas after the dot-com bubble of the 1990s the tech industry embraced incremental and frequent changes. In the actual Apple timeline, this is evident from the ][ to the /// to the Lisa/Mac. So what if instead the /// was aimed to be a better-but-backward-compatible ][, and the Apple /V or V or V/, still based on the 6502 architecture, was the first with windows and a mouse? What would that 65XXX02 look like and what would be the incremental 65X02, 65XX02, etc. chips leading there?
In thinking through this road map, I'm trying to keep to the (mostly) unwritten philosophy of Peddle and Mensch, "just enough, and simple enough."
24 bit addresses
First and foremost, what led me to email Bill Mensch was the question of why it took so long to get more than 16-bit addresses. By 1980 the Apple ][ was shipping with 48k and 64k was not uncommon. Anyone at Apple or Commodore or Intel could have plotted out standard memory sizes in 1977, 1978, and 1979 along with average memory prices and predicted that computers would have more than 64k in the 1980s.
So why wasn't there a 652402 by 1978 with a 24 bit address bus?
The "just enough" first step is to keep the 6502 as 8-bit, with the only change being a 24-bit instruction register. There is space in the opcodes for 24-bit variations of the memory-addressing instructions: LDA24, ADC24, SBC24, AND24, OR24, EOR24, CMP24, BIT24, INC24, DEC24, ROL24, JMP24, JSR24, and RTS24. Best of all, it's 100% compatible with the 6502, as all the original opcodes can be unchanged, the lower 64k is unchanged, and the only edge case is ensuring new code doesn't JSR24 to old code that will use the old RTS, popping a two-byte return address instead of three.
Yes, this change requires a DIP-48, and thus breaks the then-industry desire to stop at 40 pins, but eight more pins in exchange for 16MB of flat address space is a good trade off. 16MB was enough to last into the early 1990s, and millions of coding hours would have been saved by the flat address space instead of all the segmentation nonsense.
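The JSR24/RTS edge case described above is easy to see in a toy model. The following is only a sketch in Python, not real 6502 semantics: the helper names are made up for illustration, the return address is hypothetical, and the real JSR's habit of pushing the return address minus one is left out.

```python
# Toy model of the JSR24 / RTS mismatch (assumed behavior, illustration only).

def push_return(stack, addr, nbytes):
    """Push addr high byte first, so the low byte is popped first."""
    for i in reversed(range(nbytes)):
        stack.append((addr >> (8 * i)) & 0xFF)

def pop_return(stack, nbytes):
    """Pop nbytes, low byte first, and reassemble the address."""
    addr = 0
    for i in range(nbytes):
        addr |= stack.pop() << (8 * i)
    return addr

caller = 0x123456  # hypothetical return address above the lower 64k

# Matched pair: JSR24 pushes three bytes, RTS24 pops three.
stack = []
push_return(stack, caller, 3)
matched = pop_return(stack, 3)

# Mismatch: JSR24 pushes three bytes, the legacy RTS pops only two.
stack = []
push_return(stack, caller, 3)
mismatched = pop_return(stack, 2)
leftover = list(stack)  # the bank byte is stranded on the stack

print(hex(matched), hex(mismatched), [hex(b) for b in leftover])
```

In the mismatched case the old RTS returns into the lower 64k at the low 16 bits of the address and leaves the top byte behind on the stack, which is why new code would have to avoid calling old routines with JSR24.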
Multiple of 8
Once we had 24 bit addresses, perhaps that would have woken up the possibility of using multiples of 8 rather than powers of 2 as we grew the rest of the CPU. Step two in my alternative road map would be a 6524M202 followed by the 6524M302 and 6524M402. These are 6502s with M2 = 16 bit, M3 = 24 bit, and M4 = 32 bit registers.
Again following "just enough and simple enough", these variations have no new registers, just A, X, and Y, and no new flags. Legacy 6502 and 652402 code would be unchanged as would the behavior of the old opcodes.
To implement the wider registers, I'd take a page from the Z80 and use prefix codes. No prefix and the opcodes are original. Prefix $FF and the immediates, offsets, etc. are 16 bit. Prefix $FE and they are 24 bit. $FD for 32 bit. Save the whole $Fx set of opcodes so this could grow to at least 64 bits and so other novel features can also use a one-byte prefix.
Using prefix codes eliminates any mode switching. Code that needs to process 8 bits at a time can still do that. The data bus could stay at 8 bits until the chip manufacturers moved on en masse from DIPs. External caches could deal with wider computer buses when they became necessary.
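To make the prefix idea concrete, here is a rough Python sketch of how a decoder might pick the operand width for one instruction. The prefix values are the ones proposed above; $A9 is the real 6502 opcode for LDA immediate; the function name and the little-endian operand order (matching how the 6502 stores addresses) are assumptions for illustration.

```python
# Operand width in bytes selected by each proposed prefix.
PREFIX_WIDTH = {0xFF: 2, 0xFE: 3, 0xFD: 4}

LDA_IMM = 0xA9  # real 6502 opcode for LDA #immediate

def decode_lda_immediate(code, pc=0):
    """Decode an optionally prefixed LDA #imm.

    Returns (value, width_in_bytes, next_pc). Hypothetical helper:
    it only understands this one opcode.
    """
    width = 1  # no prefix: original 8-bit behavior
    if code[pc] in PREFIX_WIDTH:
        width = PREFIX_WIDTH[code[pc]]
        pc += 1
    if code[pc] != LDA_IMM:
        raise ValueError("this sketch only decodes LDA #imm")
    pc += 1
    value = int.from_bytes(code[pc:pc + width], "little")
    return value, width, pc + width

plain = decode_lda_immediate(bytes([0xA9, 0x12]))
wide16 = decode_lda_immediate(bytes([0xFF, 0xA9, 0x34, 0x12]))
wide32 = decode_lda_immediate(bytes([0xFD, 0xA9, 0x78, 0x56, 0x34, 0x12]))
```

The unprefixed byte stream decodes exactly as it would on an original 6502, which is the point: the width travels with each instruction rather than living in a mode flag.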
Faster interrupts
While quite a few suggested improvements include more registers, my suggestion is to replicate copies of A/X/Y/S/SP/PC. The first just/simple implementation of that is to have a separate set of registers for interrupts. When the 6502 jumps to an IRQ or NMI vector, the second set of registers becomes active. The RTI switches back.
This change alone seems too small to merit a chip redesign, but it goes along with the next step in the roadmap, multi-threading.
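As a rough illustration of the second register set, here is a small Python model. The class and field names are invented for the sketch, and real details such as vector fetches and the interrupt-disable flag are ignored. The only behavior shown is that entering an interrupt selects bank 1 and RTI selects bank 0, so nothing is pushed or pulled.

```python
from dataclasses import dataclass, field

@dataclass
class Regs:
    a: int = 0
    x: int = 0
    y: int = 0
    p: int = 0      # status flags
    sp: int = 0xFF
    pc: int = 0

@dataclass
class Cpu:
    # Bank 0 is the normal register set, bank 1 the interrupt set.
    banks: list = field(default_factory=lambda: [Regs(), Regs()])
    active: int = 0

    @property
    def regs(self):
        return self.banks[self.active]

    def interrupt(self, vector):
        """Switch to the interrupt bank; bank 0 is left untouched."""
        self.active = 1
        self.regs.pc = vector

    def rti(self):
        """Switch back; the main program resumes where it left off."""
        self.active = 0

cpu = Cpu()
cpu.regs.a, cpu.regs.pc = 0x42, 0x0800   # main program state
cpu.interrupt(0xC000)                    # hypothetical handler address
cpu.regs.a = 0x99                        # handler clobbers "its" A
cpu.rti()
restored = (cpu.regs.a, cpu.regs.pc)
```

The handler never has to save or restore A, X, or Y, which is where the cycle savings would come from.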
Multi-threading
If the 6502 were to grow up incrementally as the main CPU family for the Mac, it would inevitably be looked at by the Unix workstation companies too, just as Sun followed the crowd to the 68000 before jumping on the RISC bandwagon with SPARC. Context switching with 32 registers is a very expensive operation. The one advantage of just A/X/Y/S/SP/PC is less context and that advantage only goes up when that context is all on-chip instead of having to be copied to/from the stack.
The 65Tn02 adds multi-threading with just a handful of new instructions. I can post to another thread (sic) if anyone is interested, but in short, a CPU has n copies of the registers. Thread #0 is the historic behavior, thus again not requiring any new modes or flags. Thread #n has its own stack at $n00. Thread n-1 is the thread used for interrupts. Thread #1 is the scheduler. A new THR n opcode switches between threads. RFI switches back to the previous thread. WAI jumps back to thread 1 instead of hanging the CPU.
A few other opcodes are needed to set up the threads before they are started.
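The thread-switching rules above can be sketched in a few lines of Python. This is a guess at the bookkeeping, not a spec: the class name is invented, register banks are reduced to dictionaries, the setup opcodes and per-thread stacks are left out, and remembering only a single "previous" thread is an assumption about how RFI would work.

```python
class ThreadedCpu:
    SCHEDULER = 1  # thread #1 is the scheduler

    def __init__(self, n):
        # n on-chip copies of the registers, one per thread.
        self.banks = [{"a": 0, "x": 0, "y": 0, "pc": 0} for _ in range(n)]
        self.current = 0    # thread #0 is the historic 6502 behavior
        self.previous = 0

    def thr(self, n):
        """THR n: switch to thread n, remembering where we came from."""
        self.previous, self.current = self.current, n

    def rfi(self):
        """RFI: switch back to the previous thread."""
        self.current, self.previous = self.previous, self.current

    def wai(self):
        """WAI: yield to the scheduler instead of halting."""
        self.thr(self.SCHEDULER)

    def interrupt(self):
        """Interrupts run on the last thread, n-1."""
        self.thr(len(self.banks) - 1)

cpu = ThreadedCpu(4)
trace = []
cpu.thr(2)
cpu.banks[cpu.current]["a"] = 7
trace.append(cpu.current)
cpu.interrupt()
trace.append(cpu.current)
cpu.rfi()
trace.append(cpu.current)
a_after_interrupt = cpu.banks[cpu.current]["a"]
cpu.wai()
trace.append(cpu.current)
```

Every switch is just a change of index into the on-chip banks, so a context switch costs no memory traffic at all in this model.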
Through the 1990s
That seems sufficient for everything that happened in personal computers in the 1980s and 1990s, at least for opcodes. Given the 16MB flat memory space, there would have still been the need for MMUs and if the 8-bit bus survived that long, no doubt we would have seen caches too.
On this alternative timeline, I'm curious how hard Motorola would have pushed its 65Mn02s on clock rate. Given the simplicity of the core, and the simplicity of these proposed augmentations, I suspect this 65k line of chips could have led not just on overall efficiency but could also have been optimized for clock speed, leading the race for MIPS too.
TL;DR: if only.