Hi!
GARTHWILSON wrote:
Yes; but I would also argue that these don't necessarily give as much benefit as it would initially appear, as I showed above. SRAM is available in much faster speeds today than anything else you'll put on the bus. 10ns is run-of-the-mill today, and I've seen down to 6ns in outboard SRAM. Actually, that has been the case for many years. If it were on the same die with the processor, it would be even much faster; and in a modern deep-submicron silicon process, even faster. Maybe 1ns? Maybe measured in hundreds of picoseconds?
The problem is that you are still limited to one access per clock cycle. In contrast, register files are multi-ported, typically allowing two reads and one write per cycle.
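To put a rough number on that, here is a small sketch that counts the storage cycles needed to move the operands of a single `a = b + c` operation. The function names and the default 2-read/1-write port configuration are just illustrative assumptions, not measurements of any particular chip.

```python
from math import ceil

def single_port_cycles(reads, writes):
    # Single-ported SRAM: every read and every write needs its own cycle.
    return reads + writes

def multi_port_cycles(reads, writes, read_ports=2, write_ports=1):
    # Multi-ported register file: read ports and write ports work in parallel,
    # so the slower of the two groups sets the cycle count.
    return max(ceil(reads / read_ports), ceil(writes / write_ports))

# a = b + c needs two operand reads and one result write.
sram_cycles = single_port_cycles(reads=2, writes=1)
regfile_cycles = multi_port_cycles(reads=2, writes=1)
print(sram_cycles, regfile_cycles)
```

So even with very fast SRAM, the single port costs three accesses where the register file needs only one cycle of bandwidth.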
Quote:
I think taking advantage of what you're saying requires much more complex instruction decoding, reducing the processor's maximum clock speed, unless you also get into the complexities of pipelines and having several instructions simultaneously in the process of getting decoded. The next step in peeling this onion is trying to do branch prediction to try to minimize the penalties of having to flush the pipelines, which becomes another reason why the complex processors are so difficult to write assembly language for, and compilers kind of created their own need. Obviously we do have very powerful processors today, so it is possible; but at what cost? 10,000-20,000-transistor processors have given way to processors with not just millions of transistors, but billions.
No, actually the 6502 is not simpler than newer minimal RISC CPUs.
You can compare FPGA implementations: I have a cheap "upduino" board and built my own 6502-based computer. Using Arlet's Verilog 6502, I can get the 6502 core up to 16MHz, using about 900 LUTs for the CPU. On the other hand, a 32-bit RISC-V core uses about 2000 LUTs and clocks at about 20MHz (see
https://github.com/grahamedgecombe/icicle )
And the RISC-V core executes up to one instruction per cycle, instead of the multiple cycles per instruction of the 6502, so it is *much* faster. Roughly 2 to 3 times the number of LUTs for going from 8 bit to 32 bit and from 5 to 16 registers is not that much.
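As a back-of-the-envelope check, the clock figures above can be turned into instruction throughput. The 6502 average of 3.5 cycles per instruction is my assumption (its instructions take between 2 and 7 cycles), and 1.0 for the RISC-V core is the best case, so treat the result as a rough upper bound rather than a benchmark.

```python
def mips(clock_mhz, cycles_per_instruction):
    # Millions of instructions per second for a given clock and average CPI.
    return clock_mhz / cycles_per_instruction

CPI_6502 = 3.5    # assumed average; real code varies between 2 and 7 cycles
CPI_RISCV = 1.0   # best case: "up to one instruction per cycle"

mips_6502 = mips(16, CPI_6502)     # 6502 core at 16 MHz
mips_riscv = mips(20, CPI_RISCV)   # RISC-V core at 20 MHz
speedup = mips_riscv / mips_6502

print(f"6502: {mips_6502:.1f} MIPS, RISC-V: {mips_riscv:.1f} MIPS, ratio: {speedup:.1f}x")
```

Under these assumptions the RISC-V core retires a bit over four times as many instructions per second, and each of those instructions works on 32 bits instead of 8.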
The reason the 6502 core can't run faster is that the results from the ALU are fed directly to the memory address bus, so the critical path is pretty long.
Quote:
Also, I'm constantly doing nested interrupt service routines (particularly NMI interrupting the service an IRQ), so having two register sets is not enough to avoid saving registers for interrupt service.
The ARM cores also support two types of interrupts: FIQ is equivalent to the NMI and uses the alternate registers, while IRQ acts like the 6502 interrupts, where you have to push the registers on the stack.
IMHO, ARM assembly is simpler than 6502 assembly: you don't have the "indirect" addressing modes, all registers are similar, and you can use condition codes to avoid jumps.
I really think that the ARM cores (in particular the old integer-only ones) are the true evolution of the 6502's simplicity.
:)
Have Fun!