As you may know, the 6502's zero page is basically 256 bytes of processor registers. On the '816, you can have many such "zero pages," with the 256-byte window positioned by the 16-bit direct-page register. The 6502's hardware stack is in page 1, i.e., addresses $0100-$01FF, but on the '816 it can be tens of thousands of bytes, although I've never needed anywhere near even one whole page.
When you know you're accessing the stacks constantly but don't know the maximum depth you're using, the tendency is to go overboard and keep upping your estimate, "just to be sure." I did this for years myself, and finally decided to run some tests to find out. I filled the 6502 stack area with a constant value (maybe it was 00; I don't remember), ran a heavy-ish application with all the interrupts going too, did compiling, assembling, and interpreting while running other things in the background on interrupts, and after a while looked to see how much of the stack area had been written on. It wasn't really much: less than 20% of each of page 1 (return stack) and page 0 (data stack). This was in Forth, which makes heavy use of the stacks. The IRQ interrupt handlers were in Forth too, although the software RTC (run off a timer on NMI) was not. If you use an '816 and dedicate 64 bytes of stack space and 64 bytes of DP space to each program you have running concurrently, you could have hundreds of such programs (at 128 bytes apiece, 256 of them take only half of the 64KB bank) and still have plenty of room in bank 0 for ISRs, the reset routine, etc. The individual programs themselves would go in other banks.
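The fill-and-inspect test described above is often called stack painting. Here is a minimal Python sketch of the idea, run against a simulated page-1 stack rather than real hardware. The `Stack` class, the `nested_calls` workload, and the $A5 fill value are all assumptions made for illustration (the original test may have used 00); none of it is the code actually used in the test.

```python
# Stack-painting sketch on a simulated 6502 page-1 stack.
# Everything here is hypothetical illustration, not the original test code.

FILL = 0xA5        # assumed sentinel; any constant works, but 00 is more
                   # likely to collide with real pushed data
PAGE_SIZE = 256    # one 6502 page, e.g. $0100-$01FF


class Stack:
    """Toy model of the 6502 hardware stack: push stores, then decrements S."""

    def __init__(self):
        self.page = bytearray([FILL] * PAGE_SIZE)  # paint the whole page
        self.sp = PAGE_SIZE - 1                    # S = $FF, top of the page

    def push(self, value):
        self.page[self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pop(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.page[self.sp]


def high_water_mark(page):
    """Bytes used, assuming the stack grows downward from the top of the page.

    This is a lower bound: a pushed byte that happens to equal FILL at the
    deepest point is indistinguishable from untouched paint.
    """
    for i, b in enumerate(page):
        if b != FILL:
            return len(page) - i
    return 0


def nested_calls(stack, depth):
    """Hypothetical workload: each 'call' pushes a 2-byte return address."""
    if depth == 0:
        return
    stack.push(0x12)
    stack.push(0x34)
    nested_calls(stack, depth - 1)
    stack.pop()
    stack.pop()


stack = Stack()
nested_calls(stack, 20)
used = high_water_mark(stack.page)
print(f"{used} bytes touched out of {PAGE_SIZE}")
```

Popped bytes stay in memory after the stack pointer moves back up, which is why inspecting the page after the run reveals the deepest point reached even though the stack is empty again.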
Even 30 years ago, though, people naturally thought that the Z80, for example, which has more registers, wider registers, and a higher clock speed, should vastly outperform the 6502; yet the 6502 (and 6800) routinely did better in benchmark tests. The 6502 runs Forth about 25% faster than a 6800 at a given clock speed.
You'll get a small step up from the 6502 to the 65c02, and a much bigger step up from there to the 65816. Even a 6502, though, outperforms an 8086 in the Sieve of Eratosthenes benchmark in cycles required to finish the job, in spite of the 8086's number and size of registers, and it outperforms an 8088 all the more. For completing ten iterations of the Sieve:
 5MHz 8088    4.0  seconds
 4MHz 6502    3.1  seconds
 8MHz 8086    1.9  seconds
 4MHz 65816   1.56 seconds
 8MHz 65816   0.78 seconds
 8MHz 68000   0.49 seconds
16MHz 65816   0.39 seconds
IOW, a 4MHz 65816 did it faster than an 8MHz 8086, which has more and wider registers, yet none of the production 65816s available off the shelf today are rated for any less than 14MHz. The fastest '816 did it faster than the fastest 68000, which had 32-bit registers and a lot more of them. I also find the '816 much easier to program than the 6502. (See an example here.) It has more instructions and addressing modes, and features that make it far better suited for code relocation, multitasking, and a lot of other things where the 6502 is either clumsy or totally inept.
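The "cycles required to finish the job" comparison can be checked from the Sieve timings themselves, since clock cycles are just clock rate times elapsed time. The short Python sketch below does that arithmetic, assuming each part ran at exactly its labelled clock rate; the `results` dictionary and `megacycles` helper are names made up for this illustration.

```python
# Cycle counts implied by the Sieve timings listed above.
# Assumes each part ran at exactly its labelled clock rate.

# name -> (clock in MHz, seconds for ten iterations), copied from the table
results = {
    "5MHz 8088":   (5, 4.0),
    "4MHz 6502":   (4, 3.1),
    "8MHz 8086":   (8, 1.9),
    "4MHz 65816":  (4, 1.56),
    "8MHz 65816":  (8, 0.78),
    "8MHz 68000":  (8, 0.49),
    "16MHz 65816": (16, 0.39),
}


def megacycles(mhz, seconds):
    """Millions of clock cycles elapsed: MHz x seconds."""
    return mhz * seconds


cycles = {name: megacycles(mhz, secs) for name, (mhz, secs) in results.items()}

# List from fewest cycles to most.
for name, mc in sorted(cycles.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} {mc:6.2f} million cycles")
```

On these figures the 6502 needs about 12.4 million cycles against roughly 15.2 million for the 8086 and 20 million for the 8088, and the three 65816 entries work out to the same cycle count, as expected if run time scales directly with clock speed.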
Do I sound like a 65816 salesman?
