What would you change in 1974 with Mensch & Peddle
- BigDumbDinosaur
Re: What would you change in 1974 with Mensch & Peddle
jgharston wrote:
Yes, I'd have (S,n) instead of (X,n) addressing mode. In 40+ years I think I've only ever used (X,n) once...
My virtual QUART driver makes extensive use of (<dp>,X) addressing. In fact, implementing an efficient driver, one that could support 115.2 Kbps on all four channels, coming and going, would have been quite difficult without (<dp>,X).
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: What would you change in 1974 with Mensch & Peddle
gfoot wrote:
Does anybody know how relatively expensive different changes would have been?
Re: What would you change in 1974 with Mensch & Peddle
added a Z register for X, Y, Z operations (yes it can be serialized, we live in a 3d world and I think it would help)
put the stack in the IC, if possible, as a form of L1 cache RAM (for speed and simplicity)
...general compatibility interoperability with existing standards
Re: What would you change in 1974 with Mensch & Peddle
drogon wrote:
Or that was the thought... Did anyone actually use [decimal mode]?
gfoot wrote:
I wonder about NMI, now that you mention it...
jgharston wrote:
I'd also ensure JSR pushed the proper return address, not address-1.
As for a more general comment, it's always worth thinking about instruction decoding when considering what features you might want to add. I believe that saving on decoding circuitry was the reason that the 6502 didn't have a BRA instruction: they'd already filled up an entire 3-bit "column" with the conditional branch instructions, and decoding one more branch instruction from a different column would have added considerable expense, compared to just using a single column.
Curt J. Sampson - github.com/0cjs
Re: What would you change in 1974 with Mensch & Peddle
cjs wrote:
I've always suspected the reason for doing that was to save a cycle on JSR, though now that I look more closely at it I can't really see how it would do that. Maybe it avoids having to make an "increment PC or not?" decision when executing RTS? Anybody want to pop it into Visual 6502 and give a cycle by cycle account of JSR and RTS?
Re: What would you change in 1974 with Mensch & Peddle
It's very much a teaching moment: JSR as it is saves needing an extra byte of storage. That was the level of price pressure on the design!
Re: What would you change in 1974 with Mensch & Peddle
And the LSB of the destination address is temporarily stored in the S register to save another byte.
Re: What would you change in 1974 with Mensch & Peddle
Arlet wrote:
The CPU first reads the LSB of the destination address, then pushes the PC, then reads the MSB, and then jumps to new location. If you wanted to read LSB, read MSB, push PC, then you'd need an extra 8 bits to store the MSB somewhere.
I love learning this kind of stuff about the 6502 (and about old microcomputers in general). You really come out with a completely different viewpoint about what's sensible and what isn't when you learn what's going on at those lower levels.
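The sequence Arlet describes can be sketched as a small Python model. This is an illustration only, not a cycle-accurate simulation: the function names, the dictionary used as memory, and the example addresses are all assumptions made up for the sketch. It shows why the pushed value ends up being the return address minus one: the PC is pushed while it still points at the third byte of the JSR instruction.

```python
def jsr(pc, memory, stack):
    """Model the order of events in a 6502 JSR; `pc` is the address of the opcode."""
    pc += 1                      # opcode fetched; PC now points at the LSB operand
    target_lo = memory[pc]       # read LSB of the destination first
    pc += 1                      # PC now points at the MSB operand (last byte of JSR)
    stack.append(pc >> 8)        # push PCH
    stack.append(pc & 0xFF)      # push PCL: this is (return address - 1)
    target_hi = memory[pc]       # only now read the MSB of the destination
    return (target_hi << 8) | target_lo


def rts(stack):
    """Model RTS: pull PCL, pull PCH, then increment to reach the next instruction."""
    lo = stack.pop()
    hi = stack.pop()
    return (((hi << 8) | lo) + 1) & 0xFFFF


# Hypothetical example: JSR $1234 assembled at $8000.
memory = {0x8000: 0x20, 0x8001: 0x34, 0x8002: 0x12}
stack = []
target = jsr(0x8000, memory, stack)
print(hex(target), [hex(b) for b in stack], hex(rts(list(stack))))
```

Because the MSB is read last and goes straight into the new PC, the model never needs a second temporary byte for it, which is the saving being discussed.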
Curt J. Sampson - github.com/0cjs
Re: What would you change in 1974 with Mensch & Peddle
Even though the 6502 was supposed to be the original RISC CPU, later RISC CPUs like ARM went in a completely different direction, with fewer addressing modes but more registers. I always wondered what would've happened if the makers of the 6502 had taken the approach of more registers but fewer addressing modes, and still managed to stay within the same price and transistor count.
For example, getting rid of indirect addressing modes, but using 16-bit address registers instead. I do wonder how much that would've changed the overall transistor count.
Re: What would you change in 1974 with Mensch & Peddle
Aaendi wrote:
Even though the 6502 was supposed to be the original RISC CPU...
1. "RISC" wasn't even a concept until 1980, or perhaps 1979 or late 1978 in unpublished work. That's years after the 6502.
2. RISC is distinguished from CISC, and arguably it was the VAX-11 (announced in '77, two years after the 6502 introduction) that provided the CISC standard of comparison, with all the extra, specialised instructions it added over the PDP-11.
3. RISC focuses on register-to-register operations, with registers being generic. The 6502 is very much a memory-oriented processor, even more than the 8080. For example, the 8080, while operations tend to focus on the accumulator, is generic in the counterparty register: you can do things like `XRA A` (`XOR A,A` in Z80 mnemonics) which is simply something you can't do in the 6502. This was also quite efficient; `XRA A`/`XOR A,A` is the standard idiom for loading 0 into the accumulator, taking only one byte and one M-cycle (four T-cycles, IIRC, the minimum), whereas the 6502 must do `LDA #0`, taking two bytes and two cycles.
4. It's clear from much of the 6502 instruction coding that reducing the number of instructions was not about being RISC, but simply saving transistors. For example, it would be great to have a `BRA` instruction as the 6800 did (so great that they added it back in the 65C02!), but with a hard limit of 8 relative branch instructions to keep the coding of the particular branch instruction to three bits, there simply wasn't room for it. (Consider which branch instruction you would remove to make room for `BRA`, and why.)
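To make the "three bits" point in item 4 concrete, here is a minimal Python sketch of how the eight conditional branch opcodes ($10, $30, $50, $70, $90, $B0, $D0, $F0) break down into fields. The function and table names are invented for illustration, and this says nothing about how the actual decode logic is laid out on the die; it only shows the regularity of the encoding.

```python
FLAGS = ["N", "V", "C", "Z"]          # selected by opcode bits 7-6

MNEMONICS = {
    ("N", 0): "BPL", ("N", 1): "BMI",
    ("V", 0): "BVC", ("V", 1): "BVS",
    ("C", 0): "BCC", ("C", 1): "BCS",
    ("Z", 0): "BNE", ("Z", 1): "BEQ",
}


def decode_branch(opcode):
    """Return (mnemonic, flag, required value), or None if not in the branch column."""
    if (opcode & 0x1F) != 0x10:       # every conditional branch ends in binary 10000
        return None
    flag = FLAGS[opcode >> 6]         # two bits pick the flag to test
    wanted = (opcode >> 5) & 1        # one bit picks the value that takes the branch
    return MNEMONICS[(flag, wanted)], flag, wanted


for op in (0x10, 0x30, 0x50, 0x70, 0x90, 0xB0, 0xD0, 0xF0, 0x80):
    print(hex(op), decode_branch(op))
```

All eight combinations of the three bits are taken, so there is no slot left in that column. The 65C02's `BRA` is opcode $80, which this sketch correctly reports as lying outside the column.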
Quote:
For example, getting rid of indirect addressing modes, but using 16-bit address registers instead. I do wonder how much that would've changed the overall transistor count.
And you can't even just change the X register to be 16 bits instead of having indirect ZP indexed addressing; one of the improvements over the 6800 (which had a 16-bit X register) made by the 6502 design team was to essentially have multiple (inefficient, because zero-page) index registers. If you've ever programmed a 6800, you'll well remember how inefficient it was to do things (particularly copies) with only one index register, and the additional pain caused to program design from not being able to push it on to the stack.
The 6502 has a marvelously efficient design. It's not easily appreciated until you first have programmed both it and the 6800, and second have gained some understanding of how the 6502 works internally. It took me many, many years to start to properly appreciate it.
Curt J. Sampson - github.com/0cjs
Re: What would you change in 1974 with Mensch & Peddle
cjs wrote:
3. RISC focuses on register-to-register operations, with registers being generic. The 6502 is very much a memory-oriented processor, even more than the 8080. For example, the 8080, while operations tend to focus on the accumulator, is generic in the counterparty register: you can do things like `XRA A` (`XOR A,A` in Z80 mnemonics) which is simply something you can't do in the 6502. This was also quite efficient; `XRA A`/`XOR A,A` is the standard idiom for loading 0 into the accumulator, taking only one byte and one M-cycle (four T-cycles, IIRC, the minimum), whereas the 6502 must do `LDA #0`, taking two bytes and two cycles.
So keeping your working data in registers and doing your ALU and branch decisions based on them keeps the pipelines moving at full speed.
And when you do have to go to memory, burst-reading cache line sized units, and then keeping a cache hierarchy is important, as are techniques such as speculation to keep the memory interface busy and data moving even if the program is getting its current needs met from register-to-register or cache hits. All of this acts to decouple the instruction retirement rate from the memory latency.
The 6502 does its best work when the memory system can keep up, and service a transaction on every cycle, and even then it's only on the bus for half a cycle, so with a 70ns SRAM I think that means the 6502 maxes out at around 7MHz. Let me know if I'm mistaken about that.
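The back-of-envelope figure can be checked with a couple of lines of Python. This is a deliberately simplified sketch: it assumes the memory access must fit entirely in the half of the clock cycle mentioned above, and it ignores address setup, decode delays and data setup time, all of which would lower the practical limit. The function name and the `bus_fraction` parameter are made up for the example.

```python
def max_clock_mhz(access_ns, bus_fraction=0.5):
    """Upper bound on clock rate if the access must fit in `bus_fraction` of a cycle."""
    min_period_ns = access_ns / bus_fraction   # shortest cycle that still fits the access
    return 1000.0 / min_period_ns              # a period in ns converts to MHz as 1000/ns


print(round(max_clock_mhz(70), 2))   # 70 ns SRAM, half-cycle bus window
```

Under those assumptions a 70 ns part needs a cycle of at least 140 ns, which is a little over 7 MHz, so the estimate above holds as an upper bound.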
Re: What would you change in 1974 with Mensch & Peddle
I am interested in the hub-bub about using zero page as 'registers'.
I again advocate for a 3rd "Z" register that can be even further reduced from Y, as Y is to X, as X is to A.
This would not be a full Z register, just something to help the robotic automation of Detroit auto factories.
I'd also try like crazy to get some of the base, low level memory (zero page, stack) onto the die wherever possible.
I'd also try to get more I/O 'Chip Select' pins for more advanced Chip to Chip communication.
More Vectors in the upper band maybe.
Re: What would you change in 1974 with Mensch & Peddle
wayfarer wrote:
I am interested in the hub-bub about using zero page as 'registers'.
I say, "architecturally", as it's intrinsic to the design. There are no general purpose address registers in the 6502 (i.e. registers that hold an address for the purpose of performing indirect accesses), like for example the HL, BC, DE, IX, IY regs in Z80, the X, Y regs in 6809, the SI, DI, BP regs in 8086, IX reg in 6800. In the 6502 such indirect accesses are accomplished by pairs of bytes in page zero, primarily via the (ZP),Y address mode, with the Y register providing an offset and an efficient way to advance the pointer using INY vs. having to increment the 16-bit value in memory with an INC/BNE/INC sequence.
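As a rough illustration of the (ZP),Y mechanism just described, here is a small Python model. It is a sketch under stated assumptions: memory is a flat 64 KB `bytearray`, the zero-page locations `$10` and `$12` and the buffer addresses are arbitrary choices for the example, and the helper names are invented. The copy loop mirrors what an `LDA (src),Y` / `STA (dst),Y` / `INY` loop does: the two-byte pointers in zero page stay fixed while Y advances.

```python
def indirect_indexed(memory, zp, y):
    """Effective address for (zp),Y: the 16-bit pointer at zp/zp+1, plus Y."""
    lo = memory[zp]
    hi = memory[(zp + 1) & 0xFF]
    return (((hi << 8) | lo) + y) & 0xFFFF


def set_pointer(memory, zp, address):
    """Store a 16-bit address little-endian in two zero-page bytes."""
    memory[zp] = address & 0xFF
    memory[(zp + 1) & 0xFF] = address >> 8


def copy_bytes(memory, src_zp, dst_zp, count):
    """Copy up to 256 bytes using Y as the only moving part, as INY would."""
    for y in range(count):
        value = memory[indirect_indexed(memory, src_zp, y)]
        memory[indirect_indexed(memory, dst_zp, y)] = value


memory = bytearray(0x10000)
memory[0x2000:0x2004] = b"6502"
set_pointer(memory, 0x10, 0x2000)   # hypothetical source pointer at $10/$11
set_pointer(memory, 0x12, 0x3000)   # hypothetical destination pointer at $12/$13
copy_bytes(memory, 0x10, 0x12, 4)
print(bytes(memory[0x3000:0x3004]))
```

Having two such pointers live at once is exactly what a single 16-bit index register, as on the 6800, makes awkward.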
It's somewhat cumbersome, but remember at the time (1975) the 6502's minimal design was dramatically cheaper than the other available CPUs, and its aggressive price point arguably unleashed the entire home computer industry. Worth it.
Re: What would you change in 1974 with Mensch & Peddle
My prioritized wish list:
1. ADD/SUB without carry.
2. (ZP),X instead of (ZP,X).
3. Unconditional BRA.
I've never used (ZP,X), so I'd rather have two independent indirect offsets. The counter to this is that certain types of application may suffer (e.g. Forth).
There's not much to say about BRA (2 bytes) vs. JMP (3 bytes)... one less byte. Unless position independence is important... in which case perhaps a BSR would be more valuable.
Overall none of these are game-changers... which most likely means I haven't written enough 6502 to have better ideas.
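On the position-independence point, a short Python sketch of how a relative branch target is computed may help. The function name is made up; the first pair of numbers is taken from the `bra` line (`80 E9` at `00D2F3`) in the SIO listing posted in this thread, and the second address is a hypothetical relocation of the same code.

```python
def branch_target(opcode_addr, offset_byte):
    """Target of a 2-byte relative branch: address after the branch plus a signed offset."""
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (opcode_addr + 2 + offset) & 0xFFFF


# Bytes 80 E9 at $D2F3, as in the listing:
print(hex(branch_target(0xD2F3, 0xE9)))

# The same two bytes moved to a hypothetical address $42F3:
print(hex(branch_target(0x42F3, 0xE9)))
```

The operand byte never changes when the code moves, whereas the absolute operand of a `JMP` would have to be patched. That is the case for `BRA`, and for a `BSR`, when relocatable code matters.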
- BigDumbDinosaur
Re: What would you change in 1974 with Mensch & Peddle
sark02 wrote:
My prioritized wish list...(ZP),X instead of (ZP,X)...I've never used (ZP,X)...
Guess you haven’t done much device driver programming.
; -----------------------
; SIO Receiver Processing
; -----------------------
;
00D2CF  85 4A               sta sioirqst          ;save channel IRQ status word
00D2D1  A0 00               ldy #0                ;1st channel to process
;
00D2D3  46 4A     .0000010  lsr sioirqst          ;this channel interrupting?
00D2D5  90 32               bcc .0000040          ;no, skip it
;
00D2D7  98                  tya                   ;copy channel index &...
00D2D8  0A                  asl                   ;make channel pointer...
00D2D9  AA                  tax                   ;offset
00D2DA  A9 40               lda #nxpcresr         ;clear any RxD...
00D2DC  81 18               sta (siocr,x)    <--- ;overrun (ugly hack)
;
00D2DE  A1 10     .0000020  lda (siosr,x)    <--- ;get receiver status
00D2E0  89 01               bit #nxprxdr          ;any data?
00D2E2  F0 11               beq .0000030          ;no, done with channel
;
00D2E4  A1 20               lda (siofif,x)   <--- ;get datum from channel &...
00D2E6  EB                  xba                   ;protect it
00D2E7  B5 30               lda sioputrx,x        ;get RX queue ‘put’ pointer
00D2E9  1A                  inc                   ;bump it
00D2EA  D5 28               cmp siogetrx,x        ;any room in queue?
00D2EC  F0 F0               beq .0000020          ;no, discard datum
;
00D2EE  EB                  xba                   ;recover datum &...
00D2EF  81 30               sta (sioputrx,x) <--- ;store in queue
00D2F1  F6 30               inc sioputrx,x        ;adjust queue ‘put’ pointer &...
00D2F3  80 E9               bra .0000020          ;go back for more
;
00D2F5  38        .0000030  sec
00D2F6  B5 30               lda sioputrx,x        ;compute datums...
00D2F8  F5 28               sbc siogetrx,x        ;in queue
00D2FA  C9 E7               cmp #s_siohwm         ;reach queue fill level?
00D2FC  90 0B               bcc .0000040          ;no, move on to next channel
;
00D2FE  BD 0B EF            lda siotstab,x        ;yes, stop...
00D301  04 48               tsb siorxst           ;incoming data flow
00D303  D0 04               bne .0000040          ;data flow previously stopped
;
00D305  A9 90               lda #nxpcrrsd         ;tell channel to...
00D307  81 18               sta (siocr,x)    <--- ;deassert RTS
;
00D309  C8        .0000040  iny                   ;next channel
00D30A  CC 0C C0            cpy io_quart+nx_urega ;all channels processed?
00D30D  D0 C4               bne .0000010          ;no

Note all the highlighted instructions, in which specific registers in the four channels of the two DUARTs in POC V1.3’s hardware are easily selected with a single index value. Writing the above code without benefit of (<dp>,X) addressing would have been possible, of course. However, lacking (<dp>,X) to quickly select each channel’s register set, it would have been necessary to set up the pointers each time the ISR ran in response to a virtual QUART interrupt, adding to both code size and total execution time.
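For readers who have not used the mode, the channel selection can be modelled in a few lines of Python. This is only a sketch. The direct-page table bases `$10`, `$18` and `$20` come from the operand bytes of `siosr`, `siocr` and `siofif` in the listing, but the hardware register addresses and offsets below are made up for illustration and are not POC V1.3's real memory map. The model also assumes direct page is at `$0000` and ignores bank registers.

```python
SIOSR, SIOCR, SIOFIF = 0x10, 0x18, 0x20      # table bases, from the listing's operands

# HYPOTHETICAL register layout: four channels, three registers each.
CHANNEL_BASE = [0xC000, 0xC008, 0xC100, 0xC108]
SR_OFFSET, CR_OFFSET, FIFO_OFFSET = 1, 2, 3


def build_tables(memory):
    """Fill each direct-page table with one 16-bit pointer per channel."""
    for channel, base in enumerate(CHANNEL_BASE):
        for table, offset in ((SIOSR, SR_OFFSET), (SIOCR, CR_OFFSET), (SIOFIF, FIFO_OFFSET)):
            address = base + offset
            memory[table + 2 * channel] = address & 0xFF
            memory[table + 2 * channel + 1] = address >> 8


def indexed_indirect(memory, dp_base, x):
    """Effective address for (dp,X): the 16-bit pointer stored at dp_base + X."""
    lo = memory[(dp_base + x) & 0xFF]
    hi = memory[(dp_base + x + 1) & 0xFF]
    return (hi << 8) | lo


memory = bytearray(0x10000)
build_tables(memory)                          # done once, at driver start-up
for channel in range(4):
    x = channel << 1                          # what TYA / ASL / TAX computes
    print(channel, hex(indexed_indirect(memory, SIOSR, x)),
          hex(indexed_indirect(memory, SIOCR, x)),
          hex(indexed_indirect(memory, SIOFIF, x)))
```

The pointer tables are built once, and after that a single X value picks the status, command and FIFO registers of whichever channel is being serviced, which is the property the listing relies on.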
x86? We ain't got no x86. We don't NEED no stinking x86!