dilettante wrote:
Ohhh!
Well in that case...
Because another arrangement would be goofy, like driving on the left side of the road?
In other words, it was an arbitrary decision.
I do not think it was arbitrary. I've been in the industry long enough to know that few things in the electronics industry are arbitrary, especially when low gate count is critical (remember, the 6502 was introduced at a *very* low price point: $25 apiece. Its nearest competitors at the time, the Z-80 and the 6800, each cost in excess of $200).
The art of optimization is just as valuable from a hardware perspective as it is from a software perspective. Minimization and re-use are patterns which quite often *seem* arbitrary, but in reality aren't (e.g., the advantages of little-endian over big-endian byte order in an 8-bit CPU).
Some examples:
* It is easier to buffer a signal using an inverter (one or two transistors) than an active-high buffer (two to four transistors, or basically two inverters strung together). Besides being faster, the fact that the _IRQ or _NMI signal is asserted low ('0') and internally buffered through an inverter (thus becoming a '1') means that address bits A15-A3 do not require hard-coded storage when fetching a vector. It doesn't sound like much, but when chip real estate is at issue, it can make or break meeting cost requirements. This explains why ROM was placed in high memory on the 6502, rather than in low memory as with the Z-80.
* Placing the zero page (the 65816's "direct page") at $0000-$00FF enables the 6502 to use the already-existing ALU to perform effective-address calculations *as the opcode's operands are being fetched from the program source*, including carry generation and propagation.
* The 6502 pushes PC+2 onto the stack in response to a JSR (the address of the *last* byte of the instruction), instead of PC+3 (the address of the next instruction) like most other CPUs do. Why? Because at the moment the JSR microcode has fetched the first operand byte, PC+2 is already sitting in the PC register; it can be stacked as-is, with no wasted cycles spent computing PC+3. An RTS instruction works by popping the low byte and adding 1 via the ALU; as you might expect, the little-endian format lets the carry ripple over into the next byte (the upper 8 bits of PC). Thus, absolutely zero "wait states" exist when making a subroutine call.
* BRK pushes PC+2 onto the stack for two reasons. One is that the numeric constant '2' was already a hard-wired input into the ALU circuitry ('1', at least originally, was not: INX/INY/DEX/DEY appear to be implemented directly in the X and Y registers as up/down counters. The original 6502 did have memory-direct INC/DEC; what it lacked was INC/DEC on the accumulator, which the 65C02 later added). The other is that the overwhelming majority of 6502 opcodes are two bytes long, and with BRK serving as the "breakpoint" instruction, it was felt this would be the most convenient way of handling breakpoints. Well, it turns out that's not that useful in practice. But the 65816 engineers found a really nice application for that "lost byte": using BRK as a common entry point into an operating system, the byte can specify one of 256 OS services (e.g., the 6502 and 65816 have 2 hardware interrupts, but 256 "software" interrupts).
* The earliest 6502s (those produced before mid-1976) lacked a working ROR instruction; the same effect could be had with a suitable number of ROL instructions, since ROL rotates the 9-bit carry/accumulator pair. Again, saving hardware resources. (This turned out to be such a bad omission in retrospect that MOS added ROR within about a year of the chip's release, and every later implementation, including the current CPUs from WDC, supports it.)
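To make the vector-placement point concrete, here's a small illustrative Python sketch (a model, not how the silicon actually works): the hardware vectors sit at the very top of the 64 KiB address space, and each is fetched as a 16-bit little-endian value.

```python
# The 6502's hardware vectors live at the top of the address space,
# so address bits A15-A3 are all 1s during a vector fetch.
NMI_VECTOR   = 0xFFFA  # low byte at $FFFA, high byte at $FFFB
RESET_VECTOR = 0xFFFC
IRQ_VECTOR   = 0xFFFE  # shared by IRQ and BRK

def read_vector(mem, addr):
    """Fetch a 16-bit vector in little-endian order, as the 6502 does."""
    return mem[addr] | (mem[addr + 1] << 8)

# A toy 64 KiB memory with a reset vector pointing at $C000.
mem = bytearray(0x10000)
mem[RESET_VECTOR] = 0x00      # low byte first
mem[RESET_VECTOR + 1] = 0xC0  # high byte second
assert read_vector(mem, RESET_VECTOR) == 0xC000
```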
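The zero-page/ALU point can be sketched the same way. This toy model of the (zp),Y addressing mode (again illustrative, not cycle-accurate) shows the little-endian pointer fetch and the carry propagating into the high byte of the effective address:

```python
def effective_address_ind_y(mem, zp_addr, y):
    """Toy model of the 6502's (zp),Y mode: a 16-bit little-endian
    pointer is read from page zero, then Y is added -- low byte first,
    with the carry propagating into the high byte, just as the ALU
    computes it while the bytes are still being fetched."""
    lo = mem[zp_addr & 0xFF]
    hi = mem[(zp_addr + 1) & 0xFF]  # the pointer wraps within page zero
    base = lo | (hi << 8)
    return (base + y) & 0xFFFF

mem = bytearray(0x10000)
mem[0x40] = 0xF0   # pointer low byte
mem[0x41] = 0x12   # pointer high byte -> base address $12F0
# Adding Y=$20 carries out of the low byte into the high byte: $1310.
assert effective_address_ind_y(mem, 0x40, 0x20) == 0x1310
```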
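The JSR/RTS bullet can be modeled in a few lines as well. This hypothetical sketch shows the PC+2 push and the RTS add-1 with the little-endian carry:

```python
def jsr(pc, sp, mem, target):
    """Model of JSR, where pc is the address of the JSR opcode.
    The value pushed is pc+2 -- the last byte of the 3-byte
    instruction -- high byte first, so it lies on the stack in
    little-endian order."""
    ret = (pc + 2) & 0xFFFF
    mem[0x100 + sp] = (ret >> 8) & 0xFF
    sp = (sp - 1) & 0xFF
    mem[0x100 + sp] = ret & 0xFF
    sp = (sp - 1) & 0xFF
    return target, sp

def rts(sp, mem):
    """Model of RTS: pop the low byte, then the high byte, add 1;
    the carry out of the low byte ripples into the high byte."""
    sp = (sp + 1) & 0xFF
    lo = mem[0x100 + sp]
    sp = (sp + 1) & 0xFF
    hi = mem[0x100 + sp]
    return ((lo | (hi << 8)) + 1) & 0xFFFF, sp

mem = bytearray(0x10000)
pc, sp = jsr(0x80FE, 0xFF, mem, 0xC000)  # JSR occupies $80FE-$8100
assert pc == 0xC000
pc, sp = rts(sp, mem)
assert pc == 0x8101 and sp == 0xFF       # carry rippled into the high byte
```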
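The 65816-style "BRK as OS entry point" idea can be sketched like this. The service numbers and names below are invented for illustration; only the mechanism (the byte after BRK selecting one of 256 services) comes from the post:

```python
def brk_handler(mem, pushed_pc, services):
    """The stacked return address is BRK's opcode address + 2, so the
    'signature' byte is the one just before the pushed return address."""
    signature = mem[(pushed_pc - 1) & 0xFFFF]
    return services[signature]()

# Hypothetical service table -- names invented for this example.
services = {0x00: lambda: "reboot", 0x01: lambda: "putchar"}

mem = bytearray(0x10000)
mem[0x2000] = 0x00   # BRK opcode
mem[0x2001] = 0x01   # signature byte selecting service 1
# BRK at $2000 pushes $2002; the handler finds signature $01.
assert brk_handler(mem, 0x2002, services) == "putchar"
```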
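Finally, the ROL-for-ROR trick checks out arithmetically: carry plus accumulator form a 9-bit ring, so rotating it left eight times is the same as rotating it right once. A small Python verification:

```python
def rol(a, c):
    """ROL through carry: (carry, accumulator) is a 9-bit ring
    rotated left one position."""
    return ((a << 1) | c) & 0xFF, (a >> 7) & 1

def ror(a, c):
    """ROR through carry: the same 9-bit ring rotated right once."""
    return ((a >> 1) | (c << 7)) & 0xFF, a & 1

def ror_via_rol(a, c):
    """Eight ROLs rotate the 9-bit ring left by 8, which (mod 9)
    equals rotating it right by 1 -- i.e., one ROR."""
    for _ in range(8):
        a, c = rol(a, c)
    return a, c

# Exhaustively check all 512 (accumulator, carry) states.
for a in range(256):
    for c in (0, 1):
        assert ror_via_rol(a, c) == ror(a, c)
```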
There are other examples of hardware optimizations that are programmer visible, but it's already 3:00AM here, and I'm tired. So off to bed I go. I hope this clears up some of the mysteries of why the 6502 is the way it is.