Mike Naberezny wrote:
Related to Motorola, I've thought several times that it would be handy if the 6502 had a 16-bit index register like on the 6800. Since the original 6502 team came from that project, they may have considered it.
There is no doubt in my mind that they deliberately decided to do that differently from the 6800. There are plenty of things they took wholesale from the 6800 (the bus protocol, for example), and nearly everything they did differently seems to have a clear advantage, if you look for it.
The index register design in the 6800 and the 6502 is an interesting topic, with many subtleties. A sixteen-bit index register is handy, but do any significant amount of programming on the 6800 and its problems (more to do with Motorola's implementation than with the design itself) become quite apparent, at least when you're programming a general-purpose computer rather than a microcontroller for an embedded system.
In particular, with just one index register, in general-purpose programming you're doing a lot of loads and stores of that register. This is made worse by the lack of PSHX and PULX on the 6800/6802/6805 (they were added to the 6801/6803, 6809 and 68HC11), so you always need a memory location set aside to save X, which can get tricky with re-entrant code.
Further, there's a performance issue that stems not from the architecture but from the implementation: when indexing through that register you must always supply a constant 8-bit offset.
LDAA (X) can in theory be a very fast two-cycle operation: an instruction read followed by a read at X. But between having to read another byte for the offset, add it to X, and not short-cutting the add when there's no carry, it always ends up taking 5 cycles instead. (No-offset addressing was added to the 6805, 6809 and 68HC11.) INX also always takes 4 cycles, though in theory an increment with no carry out of the low byte could be done in two (or perhaps even one) cycles. This entirely unnecessary inefficiency in the implementation will have implications for the 6502's competitiveness, as we will see below.
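To put rough numbers on the single-index-register problem, here's a minimal Python sketch that sums the cycles for one byte of a hypothetical 6800 copy loop that has to reload and save X for both the source and destination pointers. The 5-cycle LDAA 0,X and 4-cycle INX are the figures given above; the other counts (LDX/STX direct, STAA indexed, DECB, BNE) are my reading of Motorola's 6800 timing table and are assumptions worth checking against the datasheet. The loop and the pointer names `src` and `dst` are illustrative, not from any real program.

```python
# Hypothetical 6800 byte-copy loop using the single X register.
# Pointers "src" and "dst" live in direct-page RAM; B holds the count.
# LDAA 0,X (5) and INX (4) are as stated in the post; the other counts are
# assumptions taken from the 6800 timing table (verify before relying on them).
LOOP_6800 = [
    ("LDX  src", 4),   # load source pointer (direct)
    ("LDAA 0,X", 5),   # read byte through X
    ("INX", 4),        # advance source pointer
    ("STX  src", 5),   # save it back (direct)
    ("LDX  dst", 4),   # load destination pointer (direct)
    ("STAA 0,X", 6),   # write byte through X
    ("INX", 4),        # advance destination pointer
    ("STX  dst", 5),   # save it back (direct)
    ("DECB", 2),       # count down in accumulator B
    ("BNE  loop", 4),  # 6800 branches take 4 cycles, taken or not
]


def cycles_per_byte(loop):
    """Sum the cycle column of a list of (mnemonic, cycles) pairs."""
    return sum(cycles for _, cycles in loop)


# Cycles spent purely moving X to and from memory.
x_shuffle = sum(c for op, c in LOOP_6800 if op.startswith(("LDX", "STX")))

print(cycles_per_byte(LOOP_6800), x_shuffle)
```

Under these assumptions 18 of the 43 cycles per byte go to nothing but saving and reloading X, which is the load/store overhead described earlier.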
The 6800 designers, with their focus on constant-offset indexed accesses, clearly had "fixed structure access" on the brain. This is convenient at times: point X at a structure and you can easily grab whichever bytes from it you need without any math. I've not found this to be hugely significant in the programs I've written, though. Perhaps it's used a lot more in microcontroller applications, since they didn't even choose to add a no-offset index mode when making additions to the instruction set for the 6801. (And they made some pretty expensive ones, such as adding a multiply instruction, which apparently GM really, really wanted for some reason.)
The obvious fix for many of the 6800 load/store issues is to add a second index register, which is exactly what happened with the 68HC11 and 6809. (The 6809 actually has a third index register, the "user stack," as well, and indexing from the stack, and addressing modes out the wazoo such that there are basically no issues at all with addressing anything however you like without going to any extra effort.) That they didn't add this to the 6801, despite other expensive changes (see "multiply," above), might also point to the single-index-register problem not being so much of a problem in MCU code.
Now the 6502 has two index registers, which you might think of as the solution to the 6800's "one index register" problem, but that's actually not it at all. The 6502's solution works just as well with a single "index" register (which, since it adds an offset, might more properly be termed an "offset" register when comparing it to the 6800's X or the 8080's HL registers), because the actual index that would be held in a register on the 6800 or 8080 is stored in memory instead. Consider that in your own programming you almost certainly do a lot of LDA ($02),Y / STA ($04),Y, but very little of the same kind of thing where one operation uses the Y register and the other uses the X register.
Effectively what the 6502 does here is substitute a zero-page memory location for the 6800's X register, and remove the constant offset, adding instead a variable offset from the Y register. This is rather inefficient, taking 5 or 6 cycles to load through the index, but if you remember what I said above about the 6800's inefficient implementation, you'll see that this is perfectly competitive with the 6800. It's interesting to wonder if MOS would still have taken this approach if the 6800 had had a two-cycle load through the index register, as I reckon it easily could have.
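A minimal Python model of what LDA ($02),Y does may make the substitution clearer: the zero-page bytes at $02/$03 play the part of the 6800's X register, and Y is the variable offset. The 5-or-6-cycle figure is the NMOS 6502 timing for LDA (zp),Y, with the extra cycle only when adding Y carries into the high byte (crosses a page). The function and variable names are my own, and memory is modelled as a flat 64 KiB bytearray.

```python
def lda_indirect_indexed(mem: bytearray, zp: int, y: int) -> tuple[int, int, int]:
    """Model LDA (zp),Y on an NMOS 6502.

    Returns (value, effective_address, cycles). The 16-bit base pointer is
    read little-endian from zero page; the high byte wraps within page zero.
    """
    lo = mem[zp & 0xFF]
    hi = mem[(zp + 1) & 0xFF]          # ($FF),Y takes its high byte from $00
    base = (hi << 8) | lo
    ea = (base + y) & 0xFFFF
    page_crossed = (base & 0xFF00) != (ea & 0xFF00)
    cycles = 5 + (1 if page_crossed else 0)
    return mem[ea], ea, cycles


mem = bytearray(0x10000)
mem[0x02], mem[0x03] = 0xF0, 0x12      # pointer at $02 holds $12F0
mem[0x12F5] = 0xAA
mem[0x1305] = 0xBB

print(lda_indirect_indexed(mem, 0x02, 0x05))   # stays in page $12
print(lda_indirect_indexed(mem, 0x02, 0x15))   # $12F0 + $15 crosses into page $13
```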
The 6502 does lose when you need to increment the 16-bit index, taking at least eight cycles for INC $02 / BNE when there's no carry (INC affects only N and Z, so the wrap is detected with BNE rather than BCC), and twelve when there is. But this is also mitigated by the offset in Y, which can be incremented in two cycles, saving you from incrementing not just your source pointer but also your destination pointer when doing a copy. This does make for extra work when you need to copy more than 256 bytes, but there are plenty of situations where you don't, and even when you do it still works out faster overall (though using more code space), even compared to a theoretical two-index-register copy on a 6800. Even in practice, on the 68HC11, which has two index registers and less cycle inefficiency, it's still 16 cycles for LDAA 0,X (4) / STAA 0,Y (5) / INX (3) / INY (4), the Y forms costing an extra cycle for their prefix byte, as opposed to 13-14 for LDA ($02),Y (5-6) / STA ($04),Y (6) / INY (2) on the 6502.
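For anyone who wants to check the loop-body arithmetic, here's a small Python tally. The per-instruction counts are my reading of the MOS 6502 and Motorola 68HC11 timing tables (the Y-indexed 68HC11 forms carry the $18 prefix byte and take one extra cycle), so treat them as assumptions to verify. The closing branch is deliberately left out, and the INC/BNE/INC pointer bump is tallied too; the function names are hypothetical.

```python
# 68HC11 loop body without the branch: (mnemonic, cycles).
# The Y-indexed forms need the $18 prefix byte, hence one extra cycle each.
HC11_BODY = [("LDAA 0,X", 4), ("STAA 0,Y", 5), ("INX", 3), ("INY", 4)]


def body_6502(page_cross: bool) -> int:
    """Cycles for LDA ($02),Y / STA ($04),Y / INY on an NMOS 6502."""
    lda = 5 + (1 if page_cross else 0)  # +1 only when adding Y crosses a page
    sta = 6                             # STA (zp),Y always takes 6
    iny = 2
    return lda + sta + iny


def inc_pointer_6502(wraps: bool) -> int:
    """Cycles for INC ptr / BNE skip / INC ptr+1, branch assumed in-page."""
    if wraps:
        return 5 + 2 + 5  # INC low byte, BNE not taken, INC high byte
    return 5 + 3          # INC low byte, BNE taken


hc11 = sum(cycles for _, cycles in HC11_BODY)
print("68HC11 body:", hc11)
print("6502 body:", body_6502(False), "to", body_6502(True))
print("6502 pointer bump:", inc_pointer_6502(False), "or", inc_pointer_6502(True))
```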
But you'll note that there's something missing above, which is the loop's branch instruction. That brings us to yet another bit of 6502 cleverness for short (≤256 byte) copies: you can work towards zero rather than away from it (copying backwards from the top or copying forwards by setting your index to 256 bytes before the end of your copy range) and that means you don't need a compare instruction before the branch to detect the end: you simply DEC or INC and BNE to stop when it hits zero. With the 6800 family you need either to do a 16-bit compare of an index register or, more often, keep a separate count (usually in accumulator B) that you decrement (which also avoids the compare, but that's now a third thing to decrement, except on the 6809 where the index registers can auto-increment or -decrement).
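The count-to-zero trick can be modelled in Python to show that the loop needs no compare: INY itself sets the zero flag when Y wraps, and STA doesn't disturb the flags. This sketch implements the "forwards from 256 bytes before the end" variant for 1 to 256 bytes; the function name, the zero-page slots $02/$04 and the test addresses are illustrative assumptions.

```python
def copy_forward_to_zero(mem: bytearray, src: int, dst: int, n: int) -> int:
    """Model a 6502 copy of n bytes (1..256) that counts Y up to zero:

            LDY #256-n        ; 0 when n == 256
        loop:
            LDA ($02),Y
            STA ($04),Y
            INY
            BNE loop

    The pointers at $02 and $04 are set to 256 bytes before the end of each
    range, so the last byte copied is at offset $FF. Returns the iteration count.
    """
    if not 1 <= n <= 256:
        raise ValueError("this loop shape handles 1 to 256 bytes")
    src_ptr = (src + n - 256) & 0xFFFF
    dst_ptr = (dst + n - 256) & 0xFFFF
    mem[0x02], mem[0x03] = src_ptr & 0xFF, src_ptr >> 8
    mem[0x04], mem[0x05] = dst_ptr & 0xFF, dst_ptr >> 8

    y = (256 - n) & 0xFF
    iterations = 0
    while True:
        src_base = mem[0x02] | (mem[0x03] << 8)
        dst_base = mem[0x04] | (mem[0x05] << 8)
        a = mem[(src_base + y) & 0xFFFF]     # LDA ($02),Y
        mem[(dst_base + y) & 0xFFFF] = a     # STA ($04),Y leaves the flags alone
        y = (y + 1) & 0xFF                   # INY sets Z when Y wraps to zero
        iterations += 1
        if y == 0:                           # BNE falls through: no compare needed
            return iterations


mem = bytearray(0x10000)
mem[0x3000:0x3003] = b"abc"
count = copy_forward_to_zero(mem, 0x3000, 0x5000, 3)
print(count, bytes(mem[0x5000:0x5003]))
```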
In all of this we've been using only one index register on the 6502. So why did Peddle & Co., who were clearly very concerned about saving transistors and die space wherever possible, have two?
One argument might be that they have different addressing modes: indexed indirect (LDA ($02,X)) versus indirect indexed (LDA ($02),Y). But at second glance that looks just as easily implemented with one register as with two: the instruction bit you decode for the addressing mode could simply select one of two modes for a single register, rather than also selecting one of two registers. (And this, by the way, is probably why the modes are fixed for each register: being able to specify the addressing mode separately from the register to use would require another bit in the instruction and more decoding. We've already seen in previous messages that they were willing to give up BRA and BSR to avoid this.)
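To make the "mode fixed per register" point concrete, here's a small Python decoder for the 6502's group-one opcodes (ORA, AND, EOR, ADC, STA, LDA, CMP, SBC), which follow the aaabbbcc layout with cc = 01. The three bbb bits pick the addressing mode and the index register together, so there's no separate register-select bit. This is based on the standard NMOS opcode map; the function name is my own, and the other opcode groups are deliberately ignored.

```python
# Addressing modes for 6502 group-one opcodes (cc == 01), indexed by bbb.
GROUP1_MODES = [
    "(zp,X)",  # 000: indexed indirect, only ever with X
    "zp",      # 001
    "#imm",    # 010 (STA has no immediate form)
    "abs",     # 011
    "(zp),Y",  # 100: indirect indexed, only ever with Y
    "zp,X",    # 101
    "abs,Y",   # 110
    "abs,X",   # 111
]
GROUP1_OPS = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]


def decode_group1(opcode: int) -> str:
    """Return e.g. 'LDA (zp),Y' for a group-one opcode byte."""
    if (opcode & 0b11) != 0b01:
        raise ValueError(f"${opcode:02X} is not a group-one opcode")
    aaa = (opcode >> 5) & 0b111   # operation
    bbb = (opcode >> 2) & 0b111   # addressing mode, register implied
    return f"{GROUP1_OPS[aaa]} {GROUP1_MODES[bbb]}"


for op in (0xA1, 0xB1, 0x81, 0x91):
    print(f"${op:02X}: {decode_group1(op)}")
```

Flipping bit 4 ($A1 versus $B1) switches both the mode and the register at once, which is the single decoded bit described above.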
But this brings us to another feature of the 6800 that the 6502 team decided to drop: having two accumulators, A and B. Or did they? On the 6800 the B accumulator is mostly (at least in my experience) used for three things: counts, where you load it with your count, decrement it each time through your loop, and exit the loop when it hits zero; quick temporary storage of values from A; and passing parameters to subroutines. As it turns out, the X and Y registers in the 6502 also work great for all of these, and when you're doing indexing you're usually using only one of them, leaving the other free for these purposes. And I'd be willing to bet that, as well as being more efficient for instruction decoding, reusing the same chip structures for both X and Y is cheaper than adding a somewhat different B register.
Note that the speed and instruction decoding issues come up again when doing the obvious "add a second index register" modification, too, which may be why Motorola delayed so long in doing that. The 68HC11 IX register works just as it does in the 6800 (but faster); using the IY register requires a prefix byte due to lack of space in the opcode set.
There's surely more along these lines, but this should give you an idea of how intertwined all these considerations are, and how a good deal of cleverness can give you most of the same thing at considerably less cost, especially when your competitor has made a few key mistakes that make their design needlessly less efficient.
And these are the sorts of things I keep in mind when I ask myself, "What would I change about the original 6502 design?" It turns out that, when you really dig into it, there's often a lot less than you'd think.
[quote=BigDumbDinosaur]And your point is...what? My example was meant to highlight a feature of the 65C816. How did the 6809 get into the discussion?[/quote]
Looking at later improvements to CPU family architectures, such as the 6800 family (particularly the 6801, 68HC11 and, because it's a related architecture even if not binary compatible, 6809) can often give insight into the long-term potential of particular architectural designs. So, for example:
[quote]I think you missed something from my example. Once I’ve pointed direct page to the stack, I can use stack RAM to set up pointers and then do something like LDA [$00],Y, which conceptually is LDA [(SP+1)],Y.[/quote]
No, I did not miss that. My point is that while you get cute little things like this with the 6502 architecture, you get them not because they're particularly useful but because they just sort of fall out of the core design, where pointers are stored in RAM rather than in index registers, and the index registers are used for offsets. I'd argue that while this was great in 1975 and for up to a decade thereafter, it soon showed its age and infelicity as CPUs grew to have more registers and more addressing modes at low cost. I'd also argue that this is one of the reasons the 6502 architecture, at least in PCs, just sort of faded away through the '80s rather than continuing to compete with, e.g., the 8086 design from 1978 or the 68000 design from 1979. Adding 24-bit addressing alone was not enough: the core architecture just doesn't work so well when it becomes cheaper to make considerably larger CPUs.
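For reference, here's a rough Python model of the quoted 65C816 example: with the direct page register D aimed at the stack (D = S + 1, since S points at the next free byte), LDA [$00],Y reads a 24-bit pointer from the top of the stack and adds Y to it. It's a simplified sketch: only the effective-address step is modelled, the pointer is assumed to sit in bank 0 with 16-bit wrap, and emulation mode, register-width flags and cycle counts are ignored. The function name and the stack value are hypothetical.

```python
def ea_dp_indirect_long_y(mem: bytearray, d: int, dp: int, y: int) -> int:
    """Effective address of LDA [dp],Y on a 65C816 in native mode (simplified).

    The three pointer bytes (low, high, bank) are read from bank 0 at D+dp,
    and Y is added to the full 24-bit pointer, carrying into the bank byte.
    """
    base = (d + dp) & 0xFFFF
    ptr = (mem[base]
           | (mem[(base + 1) & 0xFFFF] << 8)
           | (mem[(base + 2) & 0xFFFF] << 16))
    return (ptr + y) & 0xFFFFFF


mem = bytearray(0x10000)                        # bank 0 only; holds the stack
s = 0x01F0                                      # hypothetical stack pointer value
mem[s + 1:s + 4] = bytes([0x00, 0x80, 0x02])    # pointer $02:8000 on the stack
print(hex(ea_dp_indirect_long_y(mem, d=s + 1, dp=0x00, y=0x10)))
```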