65816 system engineering pros and cons

Post by **ElEctric_EyE** » Wed Dec 29, 2010 1:08 am

So in all seriousness, as a retrospective question from my limited POV, the 65816 is FULLY qualified to outpace the 6502 in all manner of operation?

Post by **BigDumbDinosaur** » Wed Dec 29, 2010 3:11 am

ElEctric_EyE wrote:

So in all seriousness, as a retrospective question from my limited POV, the 65816 is FULLY qualified to outpace the 6502 in all manner of operation?

In terms of raw processing power, the 65C816 can readily outperform a 65C02 running at the same clock speed once you start using 16-bit register and memory operations. Example:

Code:

          inc counter           ;bump low byte
          bne next              ;no wrap to $00, so done
          inc counter+1         ;carry into high byte
next      ...program continues...

If the above were coded for the '816 running in native mode, and assuming accumulator/memory operations are set for 16 bits, you'd have the following:

Code:

          inc counter
          ...program continues...
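To make the equivalence concrete, here is a small Python model. Python is used only because 65xx assembly can't be run here, and the function names are hypothetical. It treats `counter` as a little-endian 16-bit value (low byte at `counter`, high byte at `counter+1`). It then checks exhaustively that the 'C02's INC/BNE/INC sequence and a single 16-bit INC leave memory in the same state. It models only the data result, not flags or cycle counts.

```python
# Hypothetical model: 'counter' is a 16-bit value stored little-endian
# (low byte at counter, high byte at counter+1), as on both CPUs.

def inc_65c02(mem: bytearray, counter: int) -> None:
    """inc counter / bne next / inc counter+1 (8-bit operations)."""
    mem[counter] = (mem[counter] + 1) & 0xFF          # inc counter
    if mem[counter] != 0:                              # bne next: no wrap, done
        return
    mem[counter + 1] = (mem[counter + 1] + 1) & 0xFF  # carry into high byte


def inc_65c816(mem: bytearray, counter: int) -> None:
    """A single 'inc counter' in native mode with m = 0 (16-bit memory)."""
    value = mem[counter] | (mem[counter + 1] << 8)     # 16-bit read
    value = (value + 1) & 0xFFFF
    mem[counter] = value & 0xFF                        # 16-bit write, low byte
    mem[counter + 1] = value >> 8                      # ...then high byte


# Exhaustive check: both sequences agree for every starting value.
for start in range(0x10000):
    a = bytearray([start & 0xFF, start >> 8])
    b = bytearray(a)
    inc_65c02(a, 0)
    inc_65c816(b, 0)
    assert a == b, hex(start)
```

The 8-bit version needs the branch only because the carry out of the low byte has to be propagated by hand. The 16-bit INC does that internally.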

The gain is even greater when decrementing the counter. With the 'C02, you'd have to code something like the following:

Code:

          ldx counter           ;low byte zero? (clobbers X)
          bne next              ;no, so no borrow
          dec counter+1         ;yes, borrow from high byte
next      dec counter           ;decrement low byte

Again, if the above were coded for the '816 running in native mode, and assuming accumulator/memory operations are set for 16 bits, you'd have the following:

Code:

          dec counter
          ...program continues...
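The borrow logic is the subtle part of the 'C02 version. The high byte must be decremented *before* the low byte, and only when the low byte is currently zero. Here is a minimal Python sketch, with hypothetical function names, that checks the LDX/BNE/DEC/DEC sequence against a single 16-bit DEC for all 65536 starting values. It also returns what the 'C02 sequence leaves in X, since that register gets clobbered.

```python
def dec_65c02(mem: bytearray, counter: int) -> int:
    """ldx counter / bne next / dec counter+1 / next: dec counter.

    Returns the value left in X (the original low byte).
    """
    x = mem[counter]                                      # ldx counter (Z set if 0)
    if x == 0:                                            # bne next not taken
        mem[counter + 1] = (mem[counter + 1] - 1) & 0xFF  # borrow from high byte
    mem[counter] = (mem[counter] - 1) & 0xFF              # next: dec counter
    return x


def dec_65c816(mem: bytearray, counter: int) -> None:
    """A single 'dec counter' in native mode with m = 0 (16-bit memory)."""
    value = mem[counter] | (mem[counter + 1] << 8)
    value = (value - 1) & 0xFFFF
    mem[counter] = value & 0xFF
    mem[counter + 1] = value >> 8


for start in range(0x10000):
    a = bytearray([start & 0xFF, start >> 8])
    b = bytearray(a)
    dec_65c02(a, 0)
    dec_65c816(b, 0)
    assert a == b, hex(start)
```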

By itself, this feature substantially improves program performance compared with using only 8-bit register and memory operations.

However, there are more subtle features that give the '816 an advantage over the 65C02. Perhaps the most important are the new stack-based instructions (along with BRL), which facilitate the development of relocatable code and let you pass parameters to subroutines efficiently on the stack. There are also TXY and TYX, which copy X to Y or vice versa without any intermediate code. Another instruction I've found useful, for swapping byte order, is XBA. Consider the following:

Code:

;reverse byte order in RAM
;
         rep #%00100000        ;select 16 bit accumulator/memory
         lda addr              ;read 2 bytes from addr & addr+1
         xba                   ;swap them
         sta addr              ;write 2 bytes to addr & addr+1

That's definitely better than how you'd do it with a 'C02:

Code:

;reverse byte order in RAM
;
         lda addr              ;read 1st byte &...
         pha                   ;protect it
         lda addr+1            ;read 2nd byte
         sta addr              ;now the 1st byte
         pla                   ;get former 1st byte
         sta addr+1            ;now 2nd byte
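Both routines leave memory in the same state. The '816 version simply does it with one 16-bit load, a register byte swap and one 16-bit store, instead of going through the stack. Here is a small Python model of the two sequences. The function names are hypothetical, and only the memory result is modelled, not flags or timing.

```python
def swap_65c816(mem: bytearray, addr: int) -> None:
    """rep #%00100000 / lda addr / xba / sta addr."""
    a = mem[addr] | (mem[addr + 1] << 8)  # 16-bit lda: low byte from addr
    a = ((a & 0xFF) << 8) | (a >> 8)      # xba: exchange B and A halves
    mem[addr] = a & 0xFF                  # 16-bit sta: low byte to addr
    mem[addr + 1] = a >> 8                # ...high byte to addr+1


def swap_65c02(mem: bytearray, addr: int) -> None:
    """lda addr / pha / lda addr+1 / sta addr / pla / sta addr+1."""
    stack = []
    stack.append(mem[addr])               # lda addr / pha
    mem[addr] = mem[addr + 1]             # lda addr+1 / sta addr
    mem[addr + 1] = stack.pop()           # pla / sta addr+1


a = bytearray([0x12, 0x34])
b = bytearray(a)
swap_65c816(a, 0)
swap_65c02(b, 0)
assert a == b == bytearray([0x34, 0x12])
```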

On the hardware side of things, the '816 offers more signals for managing memory and for properly arbitrating bus access by external hardware (e.g., some sort of DMA controller). In native mode it also provides separate vectors for IRQ and BRK. This eliminates the code a 'C02 IRQ handler needs to tell one from the other, namely testing the B flag in the status register that was pushed on the stack.
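For reference, the vector addresses below are the ones given in WDC's W65C816S data sheet. In emulation mode the '816 behaves like a 'C02, with IRQ and BRK sharing $FFFE; native mode gives BRK its own vector. The Python is only an illustrative lookup table. `c02_irq_or_brk` is a hypothetical stand-in for the test a shared IRQ/BRK handler must make in software.

```python
# 65C816 interrupt vectors in bank 0 (per the W65C816S data sheet).
NATIVE_VECTORS = {
    "COP": 0xFFE4, "BRK": 0xFFE6, "ABORT": 0xFFE8, "NMI": 0xFFEA, "IRQ": 0xFFEE,
}
# Emulation mode behaves like a 65C02: IRQ and BRK share $FFFE.
# RESET always starts the CPU in emulation mode, so it appears only here.
EMULATION_VECTORS = {
    "COP": 0xFFF4, "ABORT": 0xFFF8, "NMI": 0xFFFA,
    "RESET": 0xFFFC, "IRQ": 0xFFFE, "BRK": 0xFFFE,
}


def vector_for(source: str, native: bool) -> int:
    """Address from which the CPU fetches the handler for 'source'."""
    table = NATIVE_VECTORS if native else EMULATION_VECTORS
    return table[source]


def c02_irq_or_brk(stacked_p: int) -> str:
    """The decision a shared IRQ/BRK handler must make in software:
    bit 4 (B) of the status byte pushed on the stack is 1 only for BRK."""
    return "BRK" if stacked_p & 0x10 else "IRQ"


assert vector_for("BRK", native=True) != vector_for("IRQ", native=True)
assert vector_for("BRK", native=False) == vector_for("IRQ", native=False)
```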

So, to answer your question: yes, the '816 can outpace the 65C02 in all respects.

Post by **GARTHWILSON** » Wed Dec 29, 2010 7:29 am

Quote:

So in all seriousness, as a retrospective question from my limited POV, the 65816 is FULLY qualified to outpace the 6502 in all manner of operation?

See Ed's post at viewtopic.php?p=9704#p9704 and my reply immediately after it.

Post by **fachat** » Wed Dec 29, 2010 9:39 am

ElEctric_EyE wrote:

So in all seriousness, as a retrospective question from my limited POV, the 65816 is FULLY qualified to outpace the 6502 in all manner of operation?

Yes, I think so. I don't see anything in the 6502's favor that counters that.

André

Post by **ElEctric_EyE** » Wed Dec 29, 2010 12:36 pm

BDD, I think you have converted me with your statements above: 16-bit INC and DEC handled internally, plus TXY and TYX.

These very issues have come up for me recently: using the W65C02, my software graphics routines for a 640x480 TFT display have become cumbersome for exactly these reasons.
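Here is a quick bit of arithmetic showing why a 640x480 display strains 8-bit registers. The model assumes a hypothetical linear framebuffer with one byte per pixel in row-major order; the actual display interface may well differ. Under that assumption, an X coordinate no longer fits in 8 bits. A full-frame offset also exceeds 16 bits, though it still fits in the '816's 24-bit address space.

```python
WIDTH, HEIGHT = 640, 480  # the TFT resolution mentioned above


def pixel_offset(x: int, y: int) -> int:
    """Hypothetical linear framebuffer: one byte per pixel, row-major."""
    return y * WIDTH + x


max_x = WIDTH - 1                                 # 639
max_offset = pixel_offset(WIDTH - 1, HEIGHT - 1)  # 307199 = $4AFFF

assert max_x > 0xFF          # an X coordinate needs more than 8 bits
assert max_offset > 0xFFFF   # the framebuffer is larger than 64K...
assert max_offset < 1 << 24  # ...but fits a 24-bit address space
```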

I have a few '816s in 40-pin PDIP, but does anyone know whether WDC plans to release the '816 in a QFP package soon? Any rumors, at least? The new data sheets now include that package, but it isn't yet for sale on their site...

Post by **BigEd** » Wed Dec 29, 2010 5:44 pm

I've added a note to the opening post on this topic, to the effect that simple systems remain simple with the 65816 - people only start hitting 'interesting' situations when they venture into more complex systems.

I would go further, and suggest that every one of the 'extra' features of the 65816 was carefully designed with some use case in mind - if a feature seems useless or impossible to use, that might be a problem with understanding, rather than a problem with the chip.

I'm not convinced by the concerns about using ABORT and having a double fault during the pushing of machine state. It's an input, produced by the glue logic, and therefore that logic can and should handle the 4 pushes. It would be a bug in that logic to signal a second ABORT unless the system has an approach to handling that case too. A few cases spring to mind:
- ignore the failing pushes, because the task will be killed
- capture the pushes in a 4-byte buffer in the glue
- map in some dedicated area of memory specifically to take the pushes
- design the system such that ABORT only handles other cases of illegal accesses, or accesses needing attention, and such that writes to the stack can never cause ABORT. For example, each user task has a private full-sized bank 0.

Note also that ABORT isn't like the other 'interrupts' - it does not finish the instruction. An ABORT handler must kill the task or put things in place for a restart of the instruction.
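The practical difference shows up in the return address that gets stacked. In native mode the '816 pushes PBR, PCH, PCL and P, which are the four pushes discussed here. For an IRQ the instruction has completed, so the stacked PC points past it. For an ABORT the results are discarded and the stacked PC points at the aborted instruction, so an RTI re-executes it. The Python below is a hypothetical model of just that push sequence, not of the full interrupt timing.

```python
def native_interrupt_frame(kind: str, pbr: int, pc: int, length: int, p: int):
    """Bytes a native-mode 65C816 pushes, in push order (PBR, PCH, PCL, P).

    kind   -- "IRQ" or "ABORT"
    pc     -- address of the instruction executing when the signal arrived
    length -- that instruction's length in bytes
    """
    if kind == "IRQ":
        ret = (pc + length) & 0xFFFF  # instruction completed; resume after it
    elif kind == "ABORT":
        ret = pc                      # results discarded; RTI re-executes it
    else:
        raise ValueError(f"unsupported interrupt kind: {kind}")
    return [pbr, ret >> 8, ret & 0xFF, p]


# A 3-byte STA at $01:2000 interrupted by IRQ vs. aborted by the glue logic.
irq_frame = native_interrupt_frame("IRQ", 0x01, 0x2000, 3, 0x30)
abort_frame = native_interrupt_frame("ABORT", 0x01, 0x2000, 3, 0x30)
```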

Cheers
Ed

Post by **kc5tja** » Wed Dec 29, 2010 5:55 pm

BigEd wrote:

I'm not convinced by the concerns about using ABORT and having a double fault during the pushing of machine state. It's an input, produced by the glue logic, and therefore that logic can and should handle the 4 pushes.

The problem is, it has no context for doing so. All the glue logic knows is that there's a page fault when hitting a stack page that doesn't exist. Is the write a push? Is it a STA? Without tracking opcodes, which adds yet more complexity to the system, there's just no way to know.

Quote:

It would be a bug in that logic to signal a second ABORT unless the system has an approach to handling that case too.

Of course, but . . .

Quote:

A few cases spring to mind:
- ignore the failing pushes, because the task will be killed

Actually, it's a machine-check condition, which means the entire system needs to be rebooted. Note that all CPUs with built-in MMUs have no more knowledge of machine state in a double-fault condition than any system with an external MMU. This is why the x86, MIPS, and PowerPC architectures have non-recoverable double-fault exceptions (and, in the specific case of x86, will take it upon itself to hard-reboot if it detects a third fault in succession).

Quote:

- capture the pushes in a 4-byte buffer in the glue

Assuming we could do this reliably, what information would this provide us? As I understand how things would work, the 2nd ABORT corrupts the state of the 1st.

Quote:

- map in some dedicated area of memory specifically to take the pushes

Automatic stack space expansion is a function of the OS, and not of the hardware. The hardware doesn't know where free pages exist to draw from.

Quote:

- design the system such that ABORT only handles other cases of illegal accesses, or accesses needing attention, and such that writes to the stack can never cause ABORT. For example, each user task has a private full-sized bank 0.

I claim this, too, is an OS-level concern. The MMU is just doing its job -- if the stack ends up writing into protected memory (e.g., I/O space in bank 0), then it should rightfully issue an -ABORT signal.

BTW, I'm not suggesting that ABORT cannot be used to make a reliable virtual memory environment for the CPU. Quite the contrary -- I was one of the original folks here who saw its potential, and once I get an FPGA system, I would like to explore this in greater detail. It would be fun and liberating to have a multitasking Forth system operating under memory protection, for then things like 0 0 ! would not be such inconveniences.

All I'm saying is that some aspects of its use are inconvenient or particularly troublesome if you're not careful. The 65816 instruction set was intended for peripheral control (as the original 6502 instruction set was), and not for running Unix or Windows (despite being a more capable machine than the original PDP-8!).

But let's be realistic too -- I'd MUCH rather deal with the quirks of ABORT than having to embed a completely separate CPU to handle page faults, like the original SunOS boxes did.

Post by **fachat** » Wed Dec 29, 2010 6:16 pm

kc5tja wrote:

But let's be realistic too -- I'd MUCH rather deal with the quirks of ABORT than having to embed a completely separate CPU to handle page faults, like the original SunOS boxes did.

Absolutely. I also used a second CPU in my CS/A system to handle the ABORTs because the 6502 cannot do it. Sun was lucky Motorola fixed the 68000 (becoming the 68010 IIRC because of that), and we are lucky WDC fixed the 6502 :-)

André

Post by **BigEd** » Wed Dec 29, 2010 6:19 pm

I don't think there is a second abort - there's no need for one and it would be of no use. The previous postings seem to assume that a second abort is inevitable, and then puzzle over how difficult it is: my point is that this is a signal under the control of the system designer, and it would be a broken design to produce this signal in a way that prevents recovery.

The abort arises when the user task makes an access which is somehow invalid. The next few cycles are certain to contain four writes, intended to be pushes onto the stack. Deal with them as you see fit, but don't raise another abort, because it won't do you any good. You're already in the process of handling it, in hardware - you haven't even reached the vector pull yet. It may turn out, when the software handler gets to pick over the pieces(*), that there was a failed PHA, or a failed STA. As I said, you may have a system which can never fail to PHA, or you might not: when designing your ABORT logic and handler, you need to consider which case you are designing for.

(I don't think it's too helpful to think of abort as being a page fault - it bears a little similarity, but the 816 doesn't have the internal support. It's a bad access. Maybe recoverable, maybe not. It would be a simpler design if ABORT is never recoverable: the kernel just needs to dismantle the dead task. If ABORT is a RESET then you don't even need ABORT. I don't think it's useful to consider all the ways in which ABORT isn't like a fault on later VM-capable CPUs - it's more useful to consider what it is and how to use it.)

Cheers
Ed

(*) If indeed the pieces exist to pick over: if the four writes went somewhere. I see no problem in dedicating the first four bytes of the OS bank to be the ABORT buffer - this isn't automatic stack expansion, this is just saving some state in a particular situation. And there is no nesting.

Post by **kc5tja** » Wed Dec 29, 2010 7:14 pm

WDC would never have made ABORT restartable if it weren't intended to facilitate virtual memory, so ABORT's recoverability depends entirely on how you use it.

I see what you're saying about the ABORT logic now -- since you knew you threw the ABORT, you know the next batch of writes up to the VPB signal must contain application state (PC, optionally PBR, and P), and thus can be intercepted under the assumption that RAM accesses are invalid. Change to supervisor state, and let the OS fix up its broken S register (since the CPU thinks it wrote to RAM but actually didn't), or change to a supervisor stack manually. Read the MMU registers for the captured user-mode state. When ready to return, transfer (or synthesize, if the task you're switching to is about to execute a signal handler) user state onto the stack, and execute an RTI.
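The sequence just described can be sketched as a small state machine. After the glue raises ABORT, it diverts the CPU's write cycles into a 4-byte buffer instead of RAM until VPB signals the vector fetch. At that point it switches to supervisor state. Everything here is a hypothetical model (class, method names and the $01FF stack pointer are invented for illustration), not a real WDC interface. For simplicity it also assumes capture begins with the first push, i.e. the aborted instruction's own bus cycles have already been blocked.

```python
class AbortGlue:
    """Hypothetical glue-logic model; not a real part or WDC interface."""

    FRAME_BYTES = 4  # native mode pushes PBR, PCH, PCL, P

    def __init__(self, ram: bytearray):
        self.ram = ram
        self.capturing = False
        self.supervisor = False
        self.captured = []

    def assert_abort(self):
        """Glue has pulled ABORT low for an invalid user access."""
        self.capturing = True
        self.captured = []

    def write(self, addr: int, value: int):
        """A CPU write cycle. While capturing, the pushes go to the buffer
        instead of RAM, so they can never cause a second fault."""
        if self.capturing:
            if len(self.captured) >= self.FRAME_BYTES:
                raise RuntimeError("more writes than an ABORT frame")
            self.captured.append(value)
        else:
            self.ram[addr] = value

    def vector_pull(self):
        """VPB asserted: the CPU is fetching the ABORT vector."""
        self.capturing = False
        self.supervisor = True


# The CPU aborts a user instruction at $01:2000 with S = $01FF.
ram = bytearray(0x10000)
glue = AbortGlue(ram)
glue.assert_abort()
s = 0x01FF
for byte in (0x01, 0x20, 0x00, 0x30):  # PBR, PCH, PCL, P
    glue.write(s, byte)
    s -= 1  # the CPU's S still moves even though RAM was not written
glue.vector_pull()

assert glue.captured == [0x01, 0x20, 0x00, 0x30]  # user state, read back by the OS
assert not any(ram[0x01FC:0x0200])                # stack RAM untouched
assert glue.supervisor and s == 0x01FB            # OS must repair S (4 lower)
```

The final assertion corresponds to the "broken S register" the OS has to fix up: the CPU believes it pushed four bytes, but none reached RAM.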

Post by **BigEd** » Wed Dec 29, 2010 7:53 pm

Exactly! Thanks for spelling it out (it was only in my head, which isn't a good place even for four bytes).

On another subject, that of leaving supervisor state (by writing to a delayed-action switch in the glue, or by detecting RTI): I'm still not sure how to cope with (returning from) nested supervisory calls. It might be that the glue needs to hold a small stack of supervisor mode bits. When the NMI handler returns, it lands in the IRQ handler; when that returns, it lands in the BRK handler, which can finally return the OS call results to the user process that used BRK.
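Here is one possible reading of that "small stack of supervisor mode bits", sketched in Python. On every vector pull the glue saves the current mode and enters supervisor state; on every RTI it restores the saved mode. The design assumptions are that the glue can observe vector pulls (VPB) and recognise RTI opcode fetches, and that nesting is bounded by a small fixed depth. The class and its names are hypothetical.

```python
class ModeStack:
    """Hypothetical glue: a small LIFO of supervisor-mode bits.

    Assumes the glue can see each vector pull (VPB) and each RTI opcode
    fetch; that is a design assumption, not a CPU feature.
    """

    def __init__(self, depth: int = 4):
        self.depth = depth
        self.saved = []
        self.supervisor = False

    def vector_pull(self):
        if len(self.saved) == self.depth:
            raise OverflowError("supervisor nesting deeper than the glue supports")
        self.saved.append(self.supervisor)  # remember the interrupted mode
        self.supervisor = True

    def rti(self):
        self.supervisor = self.saved.pop()  # restore the interrupted mode


# A user task calls the OS with BRK; an IRQ and then an NMI nest on top.
m = ModeStack()
for _source in ("BRK", "IRQ", "NMI"):
    m.vector_pull()
m.rti()               # NMI returns into the IRQ handler
m.rti()               # IRQ returns into the BRK handler
assert m.supervisor   # still supervisor: the BRK handler is running
m.rti()               # BRK handler returns results to the user task
assert not m.supervisor
```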

I prefer delayed-action switch triggered by write into I/O space because
- user mode can execute RTI but can't write into I/O
- a write into I/O doesn't disturb the machine state, so it can be the last instruction before the RTI

I think the delayed-action only has to account for the fetch cycle (of the RTI) before switching modes - I don't think it's a general-purpose timer.

But then, we could be interrupted in between the write and the RTI. Aargh!

Cheers
Ed

Post by **kc5tja** » Wed Dec 29, 2010 8:00 pm

If you write to the delayed demotion register, the glue can temporarily hold off NMI and IRQ signals, so that you can rely on an interrupt never occurring between the strobe and the RTI fetch.
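Combining the two ideas, a hypothetical demotion register could work as follows. A supervisor-mode write arms the demotion and gates IRQ/NMI. The next opcode fetch, expected to be RTI ($40), completes the switch to user mode and releases the interrupts. The Python model below assumes the supervisor code always places RTI immediately after the strobe. It also assumes user-mode writes to the register are simply ignored, since user mode can't reach I/O. The names are invented for illustration.

```python
class DemotionGlue:
    """Hypothetical delayed-demotion register with interrupt hold-off."""

    RTI = 0x40  # 65xx RTI opcode

    def __init__(self):
        self.supervisor = True
        self.pending = False

    def interrupts_enabled(self) -> bool:
        """The glue gates IRQ and NMI to the CPU while demotion is pending."""
        return not self.pending

    def write_demote_register(self):
        """I/O write; ignored in user mode, where I/O is inaccessible."""
        if self.supervisor:
            self.pending = True

    def opcode_fetch(self, opcode: int):
        """Called on every opcode fetch cycle."""
        if self.pending:
            # Assumes supervisor code always places RTI right after the strobe.
            if opcode != self.RTI:
                raise RuntimeError("expected RTI immediately after the strobe")
            self.pending = False
            self.supervisor = False


g = DemotionGlue()
g.write_demote_register()   # e.g. an STA to the demotion register
assert not g.interrupts_enabled()
g.opcode_fetch(0x40)        # the RTI that follows
assert not g.supervisor and g.interrupts_enabled()
```

Because no interrupt can be taken while `pending` is set, nothing can slip in between the strobe and the RTI fetch.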

Post by **BigEd** » Wed Dec 29, 2010 8:06 pm

Nice!

Post by **fachat** » Wed Dec 29, 2010 8:27 pm

BigEd wrote:

On another subject, that of leaving supervisor state (by writing to a delayed-action switch in the glue, or by detecting RTI) I'm still not sure how to cope with (returning from) nested supervisory calls.

You could force the always-1 bit in the SR to zero when supervisor mode is interrupted, and when pulling it from the stack, decide whether to leave supervisor mode or not. Just an idea I had for my 65k project. But it might be too complicated, because you have to monitor (and even change) reads and writes of the status register to and from the stack (that's easier if it's already built into the processor).

André

Post by **BigDumbDinosaur** » Thu Dec 30, 2010 2:59 am

fachat wrote:

You could force the always-1 bit in the SR to zero when the supervisor mode is interrupted and when pulling that from the stack decide whether to leave supervisor mode or not.

There is no "always-1" bit in the 65C816's SR. The bit to which you are referring is the accumulator/memory size bit when the '816 is running in native mode.
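The bit layouts make the point concrete. In emulation mode, bit 5 of P is unused and reads as 1, and bit 4 is B in pushed copies. In native mode those same bits are M and X. So forcing bit 5 of a stacked native-mode P to zero would switch the accumulator and memory to 16 bits when RTI pulls it back. The small Python decoder below is only an illustration; the table names are hypothetical.

```python
# Status register bit names, bit 0 first.
NATIVE_P = ["C", "Z", "I", "D", "X", "M", "V", "N"]
EMULATION_P = ["C", "Z", "I", "D", "B", "1", "V", "N"]  # bit 5 unused, reads 1


def decode_p(p: int, native: bool) -> dict:
    """Map each flag name to its bit value in status byte p."""
    names = NATIVE_P if native else EMULATION_P
    return {names[bit]: (p >> bit) & 1 for bit in range(8)}


# In native mode, $30 means 8-bit accumulator/memory (M=1) and 8-bit index
# registers (X=1); clearing bit 5 would select a 16-bit accumulator instead.
assert decode_p(0x30, native=True)["M"] == 1
assert decode_p(0x10, native=True)["M"] == 0
```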
