6502.org

Posted: **Wed Jun 29, 2022 10:26 pm**

The multitude of alternative 65X02s here are looking at what might be now, if only we add this feature or that. My own rabbit hole dive into the 6502 comes from the other direction. What could have been if the computer companies had pushed MOS/WDC/Rockwell/etc. to incrementally improve the 6502 year by year?

To ground this hypothetical in a believable history, let's assume Motorola had said yes to Chuck Peddle instead of no, and that the Motorola 6502 launched in 1975, at $25. Let's assume it was an even bigger hit than the MOS 6502 given Motorola's bigger brand. Let's assume that because of that success, the 68000 project was pushed back a few years, leaving Apple, Commodore, and others with no shiny 32-bit processor to jump to, and instead pushing Motorola to make incremental updates to its 65k line of chips.

The last assumption to this what if is cultural. Back in the 1970s and 1980s the tech industry deeply embraced leapfrog step changes and waterfall design, whereas after the dot-com bubble of the 1990s the tech industry embraced incremental and frequent changes. In the actual Apple timeline, this is evident from the ][ to the /// to the Lisa/Mac. So what if instead the /// was aimed to be a better-but-backward-compatible ][, and the Apple /V or V or V/ still based on the 6502 architecture was the first with windows and a mouse? What would that 65XXX02 look like and what would be the incremental 65X02, 65XX02, etc. chips leading there?

In thinking through this road map, I'm trying to keep to the (mostly) unwritten philosophy of Peddle and Mensch, "just enough, and simple enough."

24 bit addresses

First and foremost, what led me to email Bill Mensch was the question of why it took so long to get more than 16-bit addresses. By 1980 the Apple ][ was shipping with 48k and 64k was not uncommon. Anyone at Apple or Commodore or Intel could have plotted out standard memory sizes in 1977, 1978, and 1979 along with average memory prices and predicted that computers would have more than 64k in the 1980s.

So why wasn't there a 652402 by 1978 with a 24 bit address bus?

The "just enough" first step is to keep the 6502 as 8-bit, with the only change being a 24-bit instruction register. There is space in the opcodes for adding LDA24, ADC24, SBC24, AND24, OR24, EOR24, CMP24, BIT24, INC24, DEC24, ROL24, JMP24, JSR24, and RTS24 all with 24-bit variations. Best of all, it's 100% compatible with the 6502, as all the original opcodes can be unchanged, the lower 64k is unchanged, and the only edge case is ensuring new code doesn't JSR to old code that will use the old RTS, popping a two byte return address instead of three.
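The JSR/RTS edge case can be illustrated with a small model. This is only a sketch under assumptions not stated in the post: the stack stays in page 1 with an 8-bit stack pointer, JSR24 is a four-byte instruction, and it pushes the address of its own last byte the way the original JSR does. The method names `jsr24`, `rts24`, and `rts16` are hypothetical.

```python
class Cpu:
    """Toy model of the return-address handling only; no other state is simulated."""

    def __init__(self):
        self.stack = bytearray(256)  # stands in for page 1 ($0100-$01FF)
        self.sp = 0xFF               # 8-bit stack pointer, as on the 6502
        self.pc = 0

    def push(self, value):
        self.stack[self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pop(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.stack[self.sp]

    def jsr24(self, target, instr_len=4):
        # Assumption: push the address of the last byte of the instruction,
        # high byte first, mirroring the original JSR.
        ret = (self.pc + instr_len - 1) & 0xFFFFFF
        self.push(ret >> 16)
        self.push(ret >> 8)
        self.push(ret)
        self.pc = target & 0xFFFFFF

    def rts24(self):
        lo, mid, hi = self.pop(), self.pop(), self.pop()
        self.pc = (((hi << 16) | (mid << 8) | lo) + 1) & 0xFFFFFF

    def rts16(self):
        # The original RTS pops only two bytes, so the top address byte
        # is lost and one byte is left behind on the stack.
        lo, hi = self.pop(), self.pop()
        self.pc = (((hi << 8) | lo) + 1) & 0xFFFF


good = Cpu()
good.pc = 0x123456
good.jsr24(0x002000)
good.rts24()

bad = Cpu()
bad.pc = 0x123456
bad.jsr24(0x002000)
bad.rts16()
```

In this model the matched pair returns to $12345A with a balanced stack, while the mismatched pair lands at $345A in the lowest 64K and leaves a stray byte on the stack.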

Yes, this change requires a DIP-48, and thus breaks the then-industry desire to stop at 40 pins, but eight more pins in exchange for 16MB of flat address space is a good trade off. 16MB was enough to last into the early 1990s, and millions of coding hours would have been saved by the flat address space instead of all the segmentation nonsense.

Multiple of 8

Once we had 24 bit addresses, perhaps that would have woken up the possibility of using multiples of 8 rather than powers of 2 as we grew the rest of the CPU. Step two in my alternative road map would be a 6524M202 followed by the 6524M302 and 6524M402. These are 6502s with M2=16 bit, M3 = 24 bit, and M4 = 32 bit registers.

Again following "just enough and simple enough", these variations have no new registers, just A, X, and Y, and no new flags. Legacy 6502 and 652402 code would be unchanged as would the behavior of the old opcodes.

To implement the wider registers, I'd take a page from the Z80 and use prefix codes. No prefix and the opcodes are original. Prefix $FF and the immediates, offsets, etc. are 16 bit. Prefix $FE and they are 24 bit. $FD for 32 bit. Save the whole $Fx set of opcodes so this could grow to at least 64 bits and so other novel features can also use a one-byte prefix.
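A minimal sketch of how a decoder might treat these prefixes for an immediate-mode instruction. The prefix values follow the text above; note that on the stock 6502, $FD and $FE are already SBC abs,X and INC abs,X, so a real design would have to pick different bytes or relocate those opcodes. The function name `fetch_immediate` and the little-endian operand order are assumptions.

```python
# Prefix byte -> operand width in bytes (values taken from the proposal above).
PREFIX_WIDTH = {0xFF: 2, 0xFE: 3, 0xFD: 4}


def fetch_immediate(code, pc):
    """Decode one immediate-mode instruction at code[pc].

    Returns (opcode, operand, next_pc). With no prefix the operand is one
    byte wide, exactly as on the original 6502.
    """
    width = 1
    if code[pc] in PREFIX_WIDTH:
        width = PREFIX_WIDTH[code[pc]]
        pc += 1
    opcode = code[pc]
    operand = int.from_bytes(code[pc + 1:pc + 1 + width], "little")
    return opcode, operand, pc + 1 + width


plain = fetch_immediate(bytes([0xA9, 0x12]), 0)                          # LDA #$12
wide16 = fetch_immediate(bytes([0xFF, 0xA9, 0x34, 0x12]), 0)             # LDA #$1234
wide32 = fetch_immediate(bytes([0xFD, 0xA9, 0x78, 0x56, 0x34, 0x12]), 0)  # LDA #$12345678
```

The unprefixed path is untouched, which is the point of the scheme: old code decodes exactly as before, and each wide operation costs one extra byte of fetch.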

Using prefix codes eliminates any mode switching. Code that needs to process 8-bits at a time can still do that. The data bus could stay at 8-bits until the chip manufacturers moved on en masse from DIPs. External caches could deal with wider computer buses when they were necessary.

Faster interrupts

While quite a few suggested improvements include more registers, my suggestion is to replicate copies of A/X/Y/S/SP/PC. The first just/simple implementation of that is to have a separate set of registers for interrupts. When the 6502 jumps to an IRQ or NMI vector, the second set of registers are active. The RTI switches back.

This change alone seems too small to merit a chip redesign, but it goes along with the next step in the roadmap, multi-threading.

Multi-threading

If the 6502 were to grow up incrementally as the main CPU family for the Mac, it would inevitably be looked at by the Unix workstation companies too, just as Sun followed the crowd to the 68000 before jumping on the RISC bandwagon with SPARC. Context switching with 32 registers is a very expensive operation. The one advantage of just A/X/Y/S/SP/PC is less context and that advantage only goes up when that context is all on-chip instead of having to be copied to/from the stack.

The 65Tn02 adds multi-threading with just a handful of new instructions. I can post to another thread (sic) if anyone is interested, but in short, a CPU has n copies of the registers. Thread #0 is the historic behavior, thus again not requiring any new modes or flags. Thread #n has its own stack at $n00. Thread n-1 is the thread used for interrupts. Thread #1 is the scheduler. A new THR n opcode switches between threads. RFI switches back to the previous thread. WAI jumps back to thread 1 instead of hanging the CPU.

A few other opcodes are needed to set up the threads before they are started.
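A rough model of the register-bank switching, as a sketch only. The class and method names are hypothetical, stack placement is left out, and the way "previous thread" is tracked (a small history stack, so an interrupt can arrive after a THR) is an assumption; the description above doesn't say how deep that memory goes.

```python
from dataclasses import dataclass


@dataclass
class Regs:
    a: int = 0
    x: int = 0
    y: int = 0
    p: int = 0
    sp: int = 0xFF
    pc: int = 0


class ThreadedCpu:
    SCHEDULER = 1  # thread #1 is the scheduler

    def __init__(self, n):
        self.banks = [Regs() for _ in range(n)]  # n on-chip register sets
        self.irq_thread = n - 1                  # thread n-1 handles interrupts
        self.current = 0                         # thread #0 is the historic 6502
        self.history = []                        # assumption: nested "previous" threads

    @property
    def regs(self):
        return self.banks[self.current]

    def _switch(self, thread):
        self.history.append(self.current)
        self.current = thread

    def thr(self, thread):
        """THR n: switch to another thread's register set."""
        self._switch(thread)

    def irq(self):
        """Interrupt entry: activate the interrupt thread, nothing is pushed."""
        self._switch(self.irq_thread)

    def rfi(self):
        """RFI: return to whichever thread was active before."""
        if self.history:
            self.current = self.history.pop()

    def wai(self):
        """WAI: hand control to the scheduler instead of halting."""
        self.current = self.SCHEDULER


cpu = ThreadedCpu(4)
cpu.regs.a = 0x11   # thread 0
cpu.thr(2)
cpu.regs.a = 0x22   # thread 2
cpu.irq()
cpu.regs.a = 0x33   # interrupt thread; thread 2's A is untouched
```

A context switch here is just a change of index, which is the whole appeal: no register contents are copied to or from memory.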

Through the 1990s

That seems sufficient for everything that happened in personal computers in the 1980s and 1990s, at least for opcodes. Given the 16MB flat memory space, there would have still been the need for MMUs and if the 8-bit bus survived that long, no doubt we would have seen caches too.

On this alternative timeline, I'm curious how hard Motorola would have pushed its 65Mn02s on clock rate. Given the simplicity of the core, and the simplicity of these proposed augmentations, I'm suspecting this 65k line of chips could have led not just on overall efficiency but also could have been optimized for clock speed, leading the race for MIPS too.

TL;DR: if only.

Posted: **Thu Jun 30, 2022 12:21 am**

Welcome.

These discussions are always fascinating; and as programmable logic has advanced, some have indeed made their own processors, and I suppose the design process will continue to get easier, so more and more people will be carrying out their ideas. And just as custom PCBs' prices have fallen precipitously in the last decade or two, I hope too that custom ICs will someday come within financial reach of hobbyists, where you could email your design to a foundry, and get a package delivered to your door a week or two later with your new ICs. It would probably mean that expensive masks are no longer needed, just as many SMT assembly houses today no longer make a silkscreen for the solderpaste, but instead apply it with a process resembling ink jet, completely controlled by software.

After having worked with the 65816 a fair amount, I do like the way it does things with the 8- and 16-bit-wide register-size settings. It's not the burden that many anticipate it will be; in fact, I find I seldom need to switch. I address this and lots more, here. The 816's op-code table is full.

I believe the Z80 has a separate set of registers for interrupts; but the first time you have nested interrupts (which I do all the time, with NMIs interrupting the servicing of IRQs, and I have things in the works to go further), the benefit is gone. Perhaps you could do a stack of register sets within the processor.

The 68000 used a 64-pin DIP, and there was also a 48-pin DIP; so I'm not sure why they felt like they had to stay down to 40. The testing cost and failure rate increase with the higher pin counts, but they went to higher pin counts for upgraded processors anyway, and now they're out to somewhere around a thousand balls in a BGA.

One 32-bit quasi-65xx architecture from one of our forum members, Michael Barry, that looks good is his 65m32.

See Ed's "Index of threads for improved 6502 and derived architectures" topic too.

Posted: **Thu Jun 30, 2022 7:15 am**

There are always pros and cons, and compromises, and ultimately a new kind of product or project comes out when someone has the courage of their convictions, and makes the thing, and it's either found to have some useful tradeoffs, or it isn't...

And I am keen on exploring the envelope of possibilities surrounding the 6502 as-was, and I do find it interesting to think about alternative histories...

That said, a few thoughts...

The thing about supporting 24 bit addresses is that it's not just the PC which needs extending: in all probability you'd also want to extend absolute addresses and indirect addresses. If you do that in the simplest way, the usual story of using pairs of zero-page locations turns into a story of using sets of three. And your code gets bigger, and a bit slower, as some three byte instructions become four byte instructions.
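To make the "pairs become sets of three" point concrete, here is a sketch of the effective-address calculation for a (zp),Y access in both widths. The 16-bit version follows the 6502's behavior, including the pointer fetch wrapping within zero page. The 24-bit version is hypothetical, and letting the Y carry propagate into the top byte is an assumption.

```python
def indirect_y_16(mem, zp, y):
    """Original (zp),Y: a two-byte pointer in zero page plus Y."""
    base = mem[zp] | (mem[(zp + 1) & 0xFF] << 8)
    return (base + y) & 0xFFFF


def indirect_y_24(mem, zp, y):
    """Hypothetical 24-bit (zp),Y: a three-byte pointer in zero page plus Y."""
    base = (mem[zp]
            | (mem[(zp + 1) & 0xFF] << 8)
            | (mem[(zp + 2) & 0xFF] << 16))
    return (base + y) & 0xFFFFFF


zero_page = bytearray(256)
zero_page[0x10:0x13] = bytes([0x00, 0x80, 0x12])  # pointer $128000, low byte first

ea16 = indirect_y_16(zero_page, 0x10, 5)
ea24 = indirect_y_24(zero_page, 0x10, 5)
```

Each 24-bit pointer costs one more zero-page byte and one more fetch, so the 256 bytes of zero page hold 85 such pointers rather than 128.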

One reaction to that is to support both 16 and 24 bit addresses... which uses more opcode space and takes more documentation, learning, and debugging. (Maybe not all instructions have to support both sizes?)

A different reaction is to extend page zero in some way to take the pressure off: Acorn used page 3 for the high bytes of their 24 bit addresses.

And another way is to put the high byte somewhere else, which is what the '816 does with its bank registers.

And indeed, one can get some way with prefix bytes, as noted.

But anyhow, it is interesting, and as Garth notes, it's not too difficult to build any extended 6502 that one can dream up, and see how it is to use. (Before building in hardware, it's even easier to rustle up a software model and then run the idea in emulation.)

Posted: **Thu Jun 30, 2022 7:57 am**

The moment we discard the idea that "bytes" must be eight bits (I blame IBM for this unfortunate convention), a new and beautiful universe of possibilities suddenly appears.

Posted: **Thu Jun 30, 2022 4:48 pm**

GARTHWILSON wrote:

Welcome.
I believe the Z80 has a separate set of registers for interrupts; but the first time you have nested interrupts (which I do all the time, with NMIs interrupting the servicing of IRQs, and I have things in the works to go further), the benefit is gone. Perhaps you could do a stack of register sets within the processor.

Nested interrupts?! I'll admit I didn't think of that. Mostly because, who on earth thinks nested interrupts is a good idea? If a system uses NMIs then this multi-threading idea could use thread n-1 for IRQ and n-2 for NMI. Then only the NMI thread could push the registers it needs onto its stack.

For the IRQ vector, it's my understanding that the vector won't get called again until after the RTI finishes, no? Typically the interrupt handler loops through all possible interrupt sources, so if a system expects overlapping interrupts that loop doesn't RTI after any one interrupt is handled, but instead does the RTI after checking them all.

Posted: **Thu Jun 30, 2022 5:08 pm**

BigEd wrote:

The thing about supporting 24 bit addresses is that it's not just the PC which needs extending: in all probability you'd also want to extend absolute addresses and indirect addresses. If you do that in the simplest way, the usual story of using pairs of zero-page locations turns into a story of using sets of three. And your code gets bigger, and a bit slower, as some three byte instructions become four byte instructions.

There is room in the 6502 opcode space for one big feature. There is room for a second set of opcodes that are one byte longer, which means all the 16-bit addressing modes would still exist, thus making code running solely in the lowest 64K a little faster than code running in the rest of the 16M address space. What's nice about that is that it mirrors the one-byte-fewer of zero page addressing.

BigEd wrote:

One reaction to that is to support both 16 and 24 bit addresses... which uses more opcode space and takes more documentation, learning, and debugging. (Maybe not all instructions have to support both sizes?)

Back to the idea of foresight vs. hindsight and incremental changes possible in the 70s and 80s, I'm presuming backward compatibility would be a must. Thus any extension of features would fill up the "missing" opcodes and double the thickness of the manuals.

What I'm hoping to have accomplished with the 652402 design is to simply add a new, orthogonal set of addressing modes that look and feel like the originals, but with one extra byte of address length. E.g. LDA #$nn, LDA #$nnnn, and the new LDA #$nnnnnn. E.g. ADC ($nn),X, ADC ($nnnn),X, and a new ADC ($nnnnnn),X.

So yes, more opcodes and more complexity, but the coding patterns would be unchanged as there are no new registers, A/X/Y are no wider, etc.

BigEd wrote:

A different reaction is to extend page zero in some way to take the pressure off: Acorn used page 3 for the high bytes of their 24 bit addresses.

As long as the original opcodes go unchanged, then we not only get zero page with two bytes shorter addressing but also 16-bit addresses with one byte shorter than "normal".

What I didn't mention was the expectation that keeping the design simple would aid in speeding up the clock rate. A 6524C02 should be able to run at the same clock rate as a 65C02. Yes, it'll often take an extra byte of opcode fetch, adding 25% or 33% more reads, but that difference isn't seen if the clock rate goes from 14MHz to 18MHz.
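The trade can be put into rough numbers. This is a back-of-the-envelope sketch that assumes run time scales with the number of memory reads and inversely with clock rate, and that every instruction pays the extra byte (a pessimistic assumption, since the text says "often"). The function name is hypothetical.

```python
def relative_time(extra_reads, old_mhz, new_mhz):
    """Run time of the longer-opcode chip relative to the original.

    extra_reads is the fractional increase in memory reads (0.25 = 25% more).
    A result below 1.0 means the faster clock more than pays for the reads.
    """
    return (1 + extra_reads) * old_mhz / new_mhz


best = relative_time(0.25, 14, 18)    # 25% more reads at 18 MHz vs 14 MHz
worst = relative_time(1 / 3, 14, 18)  # 33% more reads at 18 MHz vs 14 MHz
```

Under these assumptions the 25% case comes out slightly ahead of the 14MHz original and the 33% case slightly behind, so 14MHz to 18MHz is roughly the break-even clock increase.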

What we know from history is that scaling up clock speeds was a big challenge in the 80s for the CPUs with hundreds of thousands of gates. A 6524C02 would have fewer than 20,000 gates. It thus, if it existed, should have been the fastest clocked chip on the market, if pressed by Apple, Commodore, etc. for speed.

Posted: **Thu Jun 30, 2022 11:17 pm**

65LUN02 wrote:

GARTHWILSON wrote:

I believe the Z80 has a separate set of registers for interrupts; but the first time you have nested interrupts (which I do all the time, with NMIs interrupting the servicing of IRQs, and I have things in the works to go further), the benefit is gone. Perhaps you could do a stack of register sets within the processor.

Nested interrupts?! I'll admit I didn't think of that. Mostly because, who on earth thinks nested interrupts is a good idea? If a system uses NMIs then this multi-threading idea could use thread n-1 for IRQ and n-2 for NMI. Then only the NMI thread could push the registers it needs onto its stack.

I put my 10ms interrupt for the realtime clock on NMI since I have it running by default, and since it's the only thing on NMI, there's no polling necessary, and in the admittedly rare scenario where an IRQ interrupt would take longer than 10ms to service, the clock won't miss any ticks. (It does time-of-day and calendar and alarm functions; but the more common use is for things like key debouncing, delay-before-repeat, key-repeat rate, and a display cursor blink rate which are kept constant regardless of other things the computer is doing at the same time, and for tasks in cooperative multitasking to see when it's time to do some operation again, plus other things.)

We're often told to reserve the NMI for something drastic like power going down; but in most systems the people on 6502.org are making, what happens in the last milliseconds before power is gone is of no concern. If you have a system that remembers things when it's off, it probably has batteries and can turn itself off in an orderly fashion. Otherwise, if you accidentally pull the power cord, there's no time to store anything useful on mass storage anyway.

Quote:

For the IRQ vector, it's my understanding that the vector won't get called again until after the RTI finishes, no? Typically the interrupt handler loops through all possible interrupt sources, so if a system expects overlapping interrupts that loop doesn't RTI after any one interrupt is handled, but instead does the RTI after checking them all.

My workbench computer has more than two dozen interrupt sources, but I don't think I've ever had more than about three enabled at once. It would be very wasteful of computing time to check them all every time there's an interrupt like the Apple IIGS apparently did. It's appropriate to poll only the ones that are enabled, and poll them in order of urgency, or of frequency (ie, that if you know one interrupts far more often than the others, you poll that one first since it's most likely to be the one). I have it set up so the ISR installer arranges the polling in order of priority, and if you de-activate an interrupt, its polling is pulled from the list and the gap is closed up.
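The installer idea can be sketched in a few lines. This is not the actual implementation on that workbench computer, which is in 6502 assembly; the class and method names are hypothetical, lower numbers are assumed to mean higher priority, and the dispatcher is assumed to service the first pending source and return, leaving any others to re-assert the IRQ line.

```python
class IsrTable:
    """Polling list holding only the enabled interrupt sources, most urgent first."""

    def __init__(self):
        self._entries = []  # (priority, name, is_pending, handler)

    def install(self, priority, name, is_pending, handler):
        self._entries.append((priority, name, is_pending, handler))
        self._entries.sort(key=lambda entry: entry[0])

    def remove(self, name):
        # Rebuilding the list closes up the gap left by the removed source.
        self._entries = [e for e in self._entries if e[1] != name]

    def polling_order(self):
        return [name for _, name, _, _ in self._entries]

    def dispatch(self):
        """Poll in priority order; service the first pending source found."""
        for _, name, is_pending, handler in self._entries:
            if is_pending():
                handler()
                return name
        return None


pending = {"uart", "keyboard"}  # stand-in for hardware status flags
served = []


def make_handler(name):
    def handler():
        served.append(name)
        pending.discard(name)
    return handler


table = IsrTable()
for priority, name in [(2, "keyboard"), (0, "timer"), (1, "uart")]:
    table.install(priority, name, lambda n=name: n in pending, make_handler(name))

first = table.dispatch()
```

Sources that are never installed are never polled, so the cost of an interrupt depends on how many sources are enabled, not on how many exist.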

There may be a situation where an interrupt takes a long time (by ISR standards) to service, and you want to allow a higher-priority but quick-to-service thing to be able to cut in on the servicing of a longer, lower-priority one. In that case, the ISR turns off the first interrupt early and then clears the interrupt-disable flag long before it's done servicing the interrupt. One may argue that such long service should be left for the background program and that the ISR should only leave a message to tell the background program that it has to do something; but if the background program has to keep polling for the message, it might as well poll the hardware originator and not bother with the interrupt. That defeats the purpose of an interrupt though.

ISRs almost always have to save and restore the accumulator, and we could wish the '02 did that automatically in the interrupt sequence and the RTI. I suppose the reason it wasn't done was that it would increase the cycle count past 7 so the whole instruction-decode thing would have had to be increased substantially in size, making it harder to hit their price target. Status is of course automatically saved and restored. X and Y often don't need to be saved and restored.

BigEd wrote:

And another way is to put the high byte somewhere else, which is what the '816 does with its bank registers.

65LUN02, it's not clear whether you have delved into the 65816. It would undoubtedly have been designed different if Apple hadn't required that it be able to run legacy '02 software; but there's always the tradeoff between efficiency and having a larger continuous address space, and I think the '816 has a pretty good compromise. You can do long addressing when you need to, but most addressing will be in the current banks which you specify with the bank registers, making the instructions quicker and more compact. Yes, the '02 left a lot of openings available in the op-code table; but the '816 filled them all in with not just long addressing but also a lot of new instructions and addressing modes, making it able to efficiently do things the '02 couldn't do gracefully, or at all. I'm not saying everyone should necessarily flock to the '816, but rather that we can learn from it before designing upscale 65-family processors. The extra saving and restoring the '816 has to do for interrupts definitely increases the overhead beyond what the '02 needs. It still dramatically outperforms something like the 68000 in interrupt performance though.

Posted: **Fri Jul 01, 2022 11:41 pm**

GARTHWILSON wrote:

65LUN02, it's not clear whether you have delved into the 65816. It would undoubtedly have been designed different if Apple hadn't required that it be able to run legacy '02 software;

I hadn't seen the 65816 until a few months ago. No offense to you, WDC, or Apple, but from reading through the datasheet and reading about how it's used, it looks like the offspring of a shotgun marriage between the 65C02 and 80286.

GARTHWILSON wrote:

there's always the tradeoff between efficiency and having a larger continuous address space, and I think the '816 has a pretty good compromise.

This is where we'll disagree. I suspect why you and WDC and others are fine with segments is that (after reading your website), your work is mostly with embedded systems. Bill Mensch's response to me is that the microcontroller market was (and still is) bigger than the microcomputer market.

My decades in tech were all spent with microcomputers, from Apple ][ to Mac then a side trip to PenPoint, General Magic, Palm, and Windows CE PDAs, then the first set of phones that ran apps, all with server-side software running in rack-mounted servers until the cloud hid all that complexity away.

The commonality in my work was graphical UIs. VGA had a 640x480, 16 color spec in 1987. The EO 440 tablet computer of 1993 had a 480x640 screen with 4 levels of gray. Those required 153,600 and 76,800 bytes of memory per screen buffer, respectively. Either alone is more than 64K. Whatever Mac II I had in the early 90s had 32-bit color by then, and thus more than 2 banks of 64K memory just for the screen.
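The buffer arithmetic is easy to check. A small sketch, assuming packed pixels with no padding or row alignment; the 640x480 resolution used for the 32-bit color case is an assumption, since no resolution is given for that machine.

```python
BANK = 64 * 1024  # one 64K bank


def framebuffer_bytes(width, height, bits_per_pixel):
    """Size of a packed frame buffer in bytes."""
    return width * height * bits_per_pixel // 8


def banks_needed(nbytes, bank=BANK):
    """Number of 64K banks the buffer touches (ceiling division)."""
    return -(-nbytes // bank)


vga = framebuffer_bytes(640, 480, 4)         # 16 colors = 4 bits per pixel
eo440 = framebuffer_bytes(480, 640, 2)       # 4 gray levels = 2 bits per pixel
truecolor = framebuffer_bytes(640, 480, 32)  # assumed resolution for 32-bit color
```

By this arithmetic the 16-color VGA mode needs 153,600 bytes, twice the EO 440's 76,800, and both exceed a single 64K bank; the assumed 32-bit color buffer spans 19 banks.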

Sure, you can deal with that in banks, but if that was the better choice the 80386 wouldn't have moved to a flat memory model.

GARTHWILSON wrote:

Yes, the '02 left a lot of openings available in the op-code table; but the '816 filled them all in with not just long addressing but also a lot of new instructions and addressing modes, making it able to efficiently do things the '02 couldn't do gracefully, or at all.

Yes, I see that. My point in my post is a what-if question of how those opcodes would have evolved if, for example, the Apple /// had been such a big hit that the Lisa and Macintosh were built upon the success of the ][ and ///, with annual incremental iterations along the way. And in that hypothetical what if, what if Apple had purchased MOS instead of Commodore and thus what if Apple was driving the chip design, optimizing it for microcomputers instead of microcontrollers?

We know from the 2010s what that looks like for ARM-core SoCs. We know from the IIc and IIgs how Apple eventually shrunk the IIe down to a handful of custom chips. Imagine if Apple did that back in 1979.

Mike Markkula could have taken an hour in 1977 to predict the future size, cost, and speed of RAM into the 1990s, just as Gordon Moore took an hour in 1965 to predict the future density and cost of integrated circuits. I suspect someone at Motorola did that. I suspect that is how they managed to convince the decision makers to jump to 32-bit registers and 24-bit addresses in the 68000. To get that chip out in 1979 means that proposal got a green light in 1976 or 1977. To see how big a leap the 68000 was in the industry, https://en.wikipedia.org/wiki/Transistor_count shows the CPUs, the number of transistors, and the year of release, sorted by year.

As I said in my first post in this thread, I think in foresight Apple and others saw the 68000 as the leapfrog solution away from 8-bits and 64K. In hindsight I think they would have been better off with an incremental 65..02 path, but incrementalism wasn't the culture of the 70s and 80s and neither MOS nor WDC seemed to be competing using that strategy.

GARTHWILSON wrote:

I'm not saying everyone should necessarily flock to the '816, but rather that we can learn from it before designing upscale 65-family processors. The extra saving and restoring the '816 has to do for interrupts definitely increases the overhead beyond what the '02 needs. It still dramatically outperforms something like the 68000 in interrupt performance though.

I read most of your website in the last few days. You are clearly an '816 fan. That's fine. It does what you need in your work, which is why it exists and why it is still being purchased.

Interrupt performance is not a spec I've ever seen touted for a microcomputer, PDA, or smartphone. Just as cache size and virtual memory performance are not common specs for microcontrollers. There are two markets for CPUs and their optimization needs are different.

In all the code I or my team wrote over multiple decades, the total amount of assembly code in released products was measured in under 10 pages. Ease of writing and ease of maintenance were valued above speed for all but the critical loops, and even then, the critical loops were only optimized if they were noticeably slow. Speed to market of the products was much more important than the speed of the products.

I said I worked in the era of "every byte counts" and it did, but typically that was every byte sent on the wire (or wireless) and every byte of the data set. That every byte didn't include every byte of the code itself. In the early 90s, if it fit on an 800k floppy, that was sufficient. By the 00s it just had to fit on a CD ROM. If the code fit on one of those, then it fit into RAM on the computer it ran on.

This is why given a time machine back to 1977, I would visit Cupertino and convince Steve and Steve to ask for a 652402 with the only change being a flat, 24-bit address space. Put that in an Apple /// and Woz would have pushed for 560x192 with at least 16 colors, which requires about 52K per screen buffer. The IIgs maxed out its graphics at 640×200, no doubt because it's next to impossible to have a screen buffer that spans more than 64K on a CPU with a 16-bit address bus.

Posted: **Sat Jul 02, 2022 5:17 am**

65LUN02 wrote:

I hadn't seen the 65816 until a few months ago. No offense to you, WDC, or Apple, but from reading through the datasheet and reading about how it's used, it looks like the offspring of a shotgun marriage between the 65C02 and 80286.

Unfortunately, WDC data sheets leave something to be desired in clarity, so you may be getting a distorted perception about 65C816 and how it is used. I'm not sure I agree with your shotgun wedding analogy, but then, I've been writing 65C816 assembly language code for a long time and know the MPU pretty well.

Quote:

GARTHWILSON wrote:

there's always the tradeoff between efficiency and having a larger continuous address space, and I think the '816 has a pretty good compromise.

This is where we'll disagree. I suspect why you and WDC and others are fine with segments is that (after reading your website), your work is mostly with embedded systems.

Except, the 816's architecture is not segmented in the same fashion as the 80286. Memory is flat when it comes to data fetches and stores. There is no Intel segment/offset malarkey—standard 6502 indirection idioms are used for data access over the 16 MB address space.

Quote:

Bill Mensch's response to me is that the microcontroller market was (and still is) bigger than the microcomputer market.

Which is the case once you get outside of the general-purpose computing market. The original market for the 6502 was supposed to be in areas such as process control and interfacing. It suddenly ended up in the general-purpose computing universe due to Wozniak's decision to use it in his new computer design.

Quote:

Sure, you can deal with that in banks, but if that was the better choice the 80386 wouldn't have moved to a flat memory model.

The 80x86 has more than segmented memory baggage to worry about. Its interrupt performance is abysmal and is not obvious only because of the gigahertz+ clock speeds now in use. On the PC hardware of the 1980s, it was quite obvious, especially with TIA-232 communications.

Posted: **Sat Jul 02, 2022 6:46 am**

It's always good to have some clarity about what we're talking about in a given thread.

I think there are two ideas here:
- the minimal enhancement to 6502, followed perhaps by further increments
- the alternative history in which MOS belongs to a company which aims higher, in the computer market

I think I could warm to the idea of the 24 bit PC with 24 bit addressing modes in previously-unused opcodes. Backward compatibility is always a difficult one - it's not cost-free, it's another set of compromises.

It might be worth noting that RAM costs (and ROM costs) can become quite significant in larger machines - as such, the cost of the CPU isn't so dominant, and therefore the downward pressure to make a minimal CPU isn't quite so strong as it might have been.

But a smaller CPU is not just cheaper, it's also available at greater volume, so that's a win.

As for the 48 pin package, that's an option, but not the only option. The '816 relatively neatly multiplexes data and address to stay within 40 pins. And as a thought experiment, I do wonder about multiplexing high and low parts of the address, because DRAM is going to need to do that anyway. My feeling is that the necessary external logic isn't such a big problem as it trades off against a larger and more expensive package.

We believe the critical path - the speed-limiting path - of the 6502 was in the PC increment. Some care would be needed to make a 24 bit PC which can increment in a single cycle and still allow for ever-faster clocks. A path from 1MHz to 2, 3, and 4MHz would be needed, in the alternate history. Today, on an FPGA and possibly even in a multi-CPLD implementation, incrementing the PC would probably not be an important aspect.

Posted: **Sat Jul 02, 2022 5:58 pm**

Quote:

The moment we discard the idea that "bytes" must be eight bits (I blame IBM for this unfortunate convention), a new and beautiful universe of possibilities suddenly appears.

A 6502 extended to 12-bits would address memory issues.

The 6502 was designed like a scaled-down 6800. Scaling it up for micro-computer applications might have made it look much like a 6809.

Concerned about memory addressing, my 6809 core supports a 12-bit version allowing a 24-bit address bus.

I think modifying the 6502 for micro-computer applications is not as simple as it sounds and it would be a very high-risk project for a smaller company.
Using mainframe processors as a guide for a micro-computer type processor, may have bypassed the need for incremental improvements. It was already known roughly what a micro-computer processor would look like. I think incremental improvements are more necessary when one does not know where one is going. It may be that incremental improvements are not worth the cost when leaping improvements can be made. If a big leap was done then would it really be a 6502?

The '816 is a very good update and it is quite a bit larger than the '02. I think it addresses the need for more memory in a micro-controller environment.

The '816 has a 24-bit PC, the high order bits do not need to increment.

I like the idea of muxing the address high/low. I think up to a 30-bit address could be fit in a 40-pin DIP, by muxing a0 to a14 with a15 to a29, then providing a high-address enable strobe. Note the high address bits only need to be latched when they change. If code and data would fit into 32kB then no additional clocks would be required.
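A behavioral sketch of that muxing scheme, to show when the extra strobe cycle would be paid. The class name is hypothetical, and it assumes an external latch that resets to zero and one extra clock per change of the high half.

```python
class MuxedAddressBus:
    """30-bit address over 15 shared pins with an external latch for a15-a29."""

    LOW_BITS = 15
    MASK = (1 << LOW_BITS) - 1

    def __init__(self):
        self.latched_high = 0  # assumption: latch is cleared at reset
        self.strobes = 0       # extra cycles spent updating the latch

    def access(self, address):
        """Return (address seen by memory, extra cycles for this access)."""
        high = (address >> self.LOW_BITS) & self.MASK
        low = address & self.MASK
        extra = 0
        if high != self.latched_high:
            # High half changed: spend one cycle driving it onto the pins
            # with the high-address enable strobe asserted.
            self.latched_high = high
            self.strobes += 1
            extra = 1
        return (self.latched_high << self.LOW_BITS) | low, extra


bus = MuxedAddressBus()
for addr in (0x0000, 0x1234, 0x7FFF):  # everything inside the first 32kB
    bus.access(addr)
strobes_in_first_32k = bus.strobes

crossing = bus.access(0x8000)   # first access outside the first 32kB
same_page = bus.access(0x8001)  # same high half, no new strobe
top = bus.access(0x3FFFFFFF)    # top of the 30-bit space
```

The cost depends entirely on locality: code that alternates between a program in one 32kB window and data in another would pay the strobe on nearly every access.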

Posted: **Sun Jul 03, 2022 9:48 pm**

Rob Finch wrote:

Quote:

I think modifying the 6502 for micro-computer applications is not as simple as it sounds and it would be a very high-risk project for a smaller company.
Using mainframe processors as a guide for a micro-computer type processor, may have bypassed the need for incremental improvements. It was already known roughly what a micro-computer processor would look like. I think incremental improvements are more necessary when one does not know where one is going. It may be that incremental improvements are not worth the cost when leaping improvements can be made. If a big leap was done then would it really be a 6502?

I took a class in CPU design back in 1999 or 2000 as part of my Masters in Computer Science/Engineering. We covered pipelining, caches, virtual memory, multi-threading, multi-ALUs, out-of-order execution, and more. All of these were fairly new ideas in the CPUs of that era. What was eye opening is that all of these ideas had been implemented before, in the mainframe era or mini-computer era, just not integrated into one chip and at kilohertz or at most a few megahertz, not at 1 GHz or more, which was the speed we were looking at by 2000.

More importantly, you have to remember that the IBM 360 was based on core memory, and the PDP-8 had just 4K (words) of memory. The tradeoffs in CPU design are vastly different when your choices include pre-loading multiple K of opcodes or megabytes of caches or so many registers you start thinking about register windows. And that is before you start considering multiple whole cores, which wasn't yet considered reasonable twenty years ago.

My only experience designing CPUs was in that class, and a similar class in my undergraduate years a decade earlier. In both classes our homework was in Verilog, not wiring diagrams. I thus don't personally know the tradeoffs involved in gate or transistor layouts. By 2000 (and here in 2022) you can change a line in Verilog and a few minutes later have that change running on an FPGA for testing. Nothing like that was possible in even the 1980s for CPU designers.

This is why my alternative history is iterative. I'm imagining what Woz would do if Apple had purchased a 6502 design, license, and foundry. I can imagine him one weekend saying "how do I add a few more address lines?" Maybe that day it would be 18 bit or 20 bit instead of 24, but the process would be the same. Start by adding more bits to the program counter (PC). That alone makes the die larger, which is fine, as step two is making the die larger to handle the extra output pins. (No, don't try to double up the pins; that takes a lot more transistors and effort and only makes designing the rest of the computer a lot more difficult.) The PC has an incrementor. That needs to grow in width. Then comes the hard part, adding JMP(L) and JSR(L), each with an extra byte of address, and RTS(L) that pops an extra byte. The logic for that gets squeezed into the added width created by the extra pins, probably all on the right of the existing gates and wiring so as to minimize changes to existing opcodes.

If that could be taped out and tested (which was too expensive in the 70s and 80s, but trivial here in the 21st Century), I expect my mythical Woz to have all that completed in a long weekend. Once that was working, the second and final step would be to add the other opcodes for LDA, STA, ADC, SBC, etc. with longer addressing modes. Given all this work would have to be repeated if more address bits were added later, and given 8 bits at a time are read from opcodes, that is how I ended up with 24 bits as the next logical step up from the existing 16. The nice part about CPU design is that the first new bit is a lot of work but the next seven bits are trivial in comparison.
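To make the JSR(L)/RTS(L) step concrete, here is a minimal Python sketch of the behaviour, not of the gates. It assumes the long forms follow the existing 6502 convention (JSR pushes the address of its own last byte, high byte first, and RTS adds one after pulling) and that the stack stays in page 1. The class and method names are made up for the example.

```python
# Hypothetical sketch of JSR(L)/RTS(L) on a 6502 widened to 24-bit addresses.
# The instruction layout and stack location are illustrative assumptions.
ADDR_MASK = 0xFFFFFF


class Cpu24:
    def __init__(self):
        self.mem = bytearray(1 << 24)
        self.pc = 0      # 24-bit program counter
        self.sp = 0xFF   # 8-bit stack pointer, stack in page 1 as on the 6502

    def push(self, value):
        self.mem[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x0100 + self.sp]

    def jsr_long(self):
        """PC points at the opcode; the operand is 3 bytes, low byte first."""
        operand = [self.mem[(self.pc + i) & ADDR_MASK] for i in (1, 2, 3)]
        target = operand[0] | operand[1] << 8 | operand[2] << 16
        ret = (self.pc + 3) & ADDR_MASK   # last byte of the instruction
        self.push(ret >> 16)              # the one extra byte
        self.push(ret >> 8)
        self.push(ret)
        self.pc = target

    def rts_long(self):
        lo = self.pull()
        mid = self.pull()
        hi = self.pull()                  # the one extra byte
        self.pc = ((hi << 16 | mid << 8 | lo) + 1) & ADDR_MASK


cpu = Cpu24()
cpu.pc = 0x00FFFE                         # call site straddles the 64 KB line
cpu.mem[0x00FFFF:0x010002] = bytes([0x56, 0x34, 0x12])  # operand -> $123456
cpu.jsr_long()
assert cpu.pc == 0x123456
cpu.rts_long()
assert cpu.pc == 0x010002                 # byte after the 4-byte JSR(L)
```

The call site deliberately sits across the 64 KB line to show why the PC incrementor has to grow to the full width rather than stopping at 16 bits.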

Going back to the list of known CPU changes, nothing on that list is as small a change as a wider address. A zero page cache would require more transistors than the whole 6502. A more pipelined processor would require re-doing the top two thirds of the CPU design. Nothing else on the list looks possible to do without re-doing the whole chip design. My proposed opcode-based multi-threading is the one exception, as I can imagine a fourth block of logic under the registers and ALU with all the new register storage, with the old registers replaced by muxes driven by a new register holding the current thread number.

Ultimately, the path taken by history to iterate microcomputer CPU design was to get those CPUs up to a scale where the designs were in Verilog/VHDL, with automated layout and transistor-level simulation. The tradeoff is then what we see in software (often discussed on these boards) with waste from the automated tools that wouldn't be there if every transistor were placed by hand. Mensch talks about how he calculated the size of every transistor on the 6502. No one did that on the Apple M2. No doubt the M2 wastes thousands of 6502 equivalent transistors from its toolchain.

Anyhow, I'm off on a tangent. Yes, even in 1976 Woz knew what was needed for a microcomputer CPU. The 6800, 6502, 8080, and Z80 all met the needs of those times. None met the market needs by 1980. None of the chip suppliers chose the simplest path forward. Maybe the flaw was that none of the microcomputer companies knew the fancy techniques of the mainframe CPUs and thus none asked the CPU providers for those features? Or more likely, the 68000 looked like it was sufficient for the microcomputers of the 1980s and the 8-bit CPU providers were content with the microcontroller market.

Posted: **Mon Jul 04, 2022 12:19 pm**

barrym95838 wrote:

The moment we discard the idea that "bytes" must be eight bits (I blame IBM for this unfortunate convention), a new and beautiful universe of possibilities suddenly appears.

True enough that IBM suggested using 8 bits to a byte, as covered in this video:

https://www.youtube.com/watch?v=ixJCo0cyAuA

Assigning blame solely to IBM does seem a bit harsh, however. Granted, that was done before my time there, but I never saw any evidence or documentation where IBM forced the rest of the industry to adopt said convention. I would say the industry in general is to blame... "lemmings" come to mind.

Posted: **Tue Jul 05, 2022 1:10 am**

It could be argued that the 65816 does have a flat memory model. If the programmer uses the long addressing modes exclusively then the address space is flat. The two byte addressing modes can be considered as optimisations of that flat address space, just like the single byte branches and direct pages are also optimisations rather than limitations.

There is a small wrinkle with this way of thinking regarding the program counter not crossing bank boundaries, but this is not a major issue to deal with.
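A small Python sketch of that wrinkle, with function names invented for the example: on the 65816 the 16-bit PC wraps within the current program bank, whereas a truly flat 24-bit PC would carry into the next bank.

```python
# Sketch of the bank-boundary wrinkle. Function names are hypothetical.

def long_address(bank, offset):
    """Combine an 8-bit bank and a 16-bit offset into a 24-bit address."""
    return (bank << 16) | offset


def next_pc_65816(pbr, pc):
    """Advance one byte the 65816 way: PC wraps, the program bank stays put."""
    return pbr, (pc + 1) & 0xFFFF


def next_pc_flat(address):
    """Advance one byte with a fully flat 24-bit program counter."""
    return (address + 1) & 0xFFFFFF


# Executing past the last byte of bank $01:
pbr, pc = next_pc_65816(0x01, 0xFFFF)
assert long_address(pbr, pc) == 0x010000        # back to the start of bank $01
assert next_pc_flat(0x01FFFF) == 0x020000       # flat model moves to bank $02
```

In practice this means code must not run off the end of a bank; crossing banks needs an explicit long jump or long call.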

The biggest change with the 68000 was the massive jump in registers, both in count and in size. A general purpose successor to the 65816 would probably need to go down this path, as did the x86 line of processors, adding registers and increasing their size with each generation.

Other performance improvements could be made having little impact on the instruction set, such as instruction pipelining, caching, virtual memory and increases in the size of the data bus.

Personally I would like to use a design that used a prefix byte to specify additional registers and widths rather than the processor flags we currently have.

Posted: **Tue Jul 05, 2022 2:56 am**

I found the answer to my original question... why didn't Apple push for an improved 6502.

Turns out Motorola wasn't happy about losing all those sockets to the 6502, and thus begat the 68000 as not just a competitor to the 8086, Z80, etc., but purposefully as a leap to 32-bit to be better than everything else. A 32-bit instruction set and registers, so that the architecture would last. A 16-bit ALU so that the chip would be a reasonable size. 24-bit addresses to keep the pin count down.

That design then won every CPU competition in the next decade. Apple, Commodore, Atari, Sun, HP, Apollo, Silicon Graphics. Except for the IBM PC (which would have gone to the 68000 too if only enough test chips were ready in 1979), the 68000 was the go-to chip of the 1980s, only to be replaced in the 1990s by the wave of RISC chips, as the 68000 turned out to be too difficult for Motorola to upgrade.

The other big reveal in the interview is how much Apple paid for the 68000 chips in the Macintosh. Turns out Steve Jobs promised an order of 1 million chips, at a price of $15. The 6502's list price was $25. No doubt Apple eventually paid less for that chip too, but the list price of the 68000 was around $175, so $15 was an incredible bargain. That price explains the tight partnership between Apple and Motorola.

The transcript of the interview I found is at https://www.computerhistory.org/collect ... /102658164 and the video of that interview is on my blog post, https://www.lunarmobiscuit.com/the-need ... -haystack/.


The 65M202 road map that could have been
