65816 system engineering pros and cons
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
- BigDumbDinosaur
- Posts: 9426
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
ElEctric_EyE wrote:
So in all seriousness, as a retrospective question from my limited POV, the 65816 is FULLY qualified to outpace the 6502 in all manner of operation?
Code: Select all
;16-bit increment, 65C02
inc counter
bne next
inc counter+1
next ...program continues...
Code: Select all
;16-bit increment, 65816 with 16-bit accumulator/memory selected
inc counter
...program continues...
Code: Select all
;16-bit decrement, 65C02
ldx counter
bne next
dec counter+1
next dec counter
Code: Select all
;16-bit decrement, 65816 with 16-bit accumulator/memory selected
dec counter
...program continues...
However, there are more subtle features that give the '816 an advantage over the 65C02. Perhaps the most important of them are the new stack-based instructions (as well as BRL) that facilitate the development of relocatable code, as well as the ability to efficiently use the stack for passing parameters into subroutines. Also, there are TXY and TYX, which eliminate the need to use some intermediate code to copy X to Y or vice versa. Another one that I've found useful for swapping byte order is XBA. Consider the following:
Code: Select all
;reverse byte order in RAM
;
rep #%00100000 ;select 16 bit accumulator/memory
lda addr ;read 2 bytes from addr & addr+1
xba ;swap them
sta addr ;write 2 bytes to addr & addr+1
Code: Select all
;reverse byte order in RAM
;
lda addr ;read 1st byte &...
pha ;protect it
lda addr+1 ;read 2nd byte
sta addr ;now the 1st byte
pla ;get former 1st byte
sta addr+1 ;now 2nd byte
So, to answer your question, yes the '816 can outpace the 65C02 in all respects.
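For anyone who wants to convince themselves that the 8-bit and 16-bit routines above really do the same job, here is a small behavioural model in Python. It is only a sketch: memory is a `bytearray`, the function names are made up for illustration, and nothing here models cycle counts or processor flags.

```python
def read16(mem, addr):
    """Read a little-endian 16-bit value, as a 16-bit LDA would."""
    return mem[addr] | (mem[addr + 1] << 8)

def write16(mem, addr, value):
    """Write a little-endian 16-bit value, as a 16-bit STA would."""
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF

def inc16_two_step(mem, addr):
    """8-bit style: INC the low byte, INC the high byte only if the low byte wrapped."""
    mem[addr] = (mem[addr] + 1) & 0xFF
    if mem[addr] == 0:  # BNE not taken
        mem[addr + 1] = (mem[addr + 1] + 1) & 0xFF

def dec16_two_step(mem, addr):
    """8-bit style: borrow from the high byte when the low byte is already zero."""
    if mem[addr] == 0:  # BNE not taken
        mem[addr + 1] = (mem[addr + 1] - 1) & 0xFF
    mem[addr] = (mem[addr] - 1) & 0xFF

def add16_one_step(mem, addr, delta):
    """16-bit style: a single read-modify-write on both bytes."""
    write16(mem, addr, (read16(mem, addr) + delta) & 0xFFFF)

def swap_via_stack(mem, addr):
    """8-bit style byte swap: park the first byte, move the second, restore."""
    saved = mem[addr]
    mem[addr] = mem[addr + 1]
    mem[addr + 1] = saved

def swap_via_xba(mem, addr):
    """16-bit style byte swap: load both bytes, exchange halves, store."""
    value = read16(mem, addr)
    write16(mem, addr, ((value << 8) | (value >> 8)) & 0xFFFF)

PAIRS = [
    (inc16_two_step, lambda m, a: add16_one_step(m, a, 1)),
    (dec16_two_step, lambda m, a: add16_one_step(m, a, -1)),
    (swap_via_stack, swap_via_xba),
]

def routines_agree():
    """Compare each 8-bit routine with its 16-bit counterpart for every 16-bit value."""
    for value in range(0x10000):
        start = bytes([value & 0xFF, value >> 8])
        for narrow, wide in PAIRS:
            m8, m16 = bytearray(start), bytearray(start)
            narrow(m8, 0)
            wide(m16, 0)
            if m8 != m16:
                return False
    return True
```

Calling `routines_agree()` walks all 65536 starting values, so it also covers the wrap-around cases that the BNE in the 8-bit versions exists to handle.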
x86? We ain't got no x86. We don't NEED no stinking x86!
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Quote:
So in all seriousness, as a retrospective question from my limited POV, the 65816 is FULLY qualified to outpace the 6502 in all manner of operation?
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
CONVERTED!
BDD, I think you have converted me with your statements above: INC & DEC operating on 16 bits internally. Also with TXY and TYX.
These very issues have come up for me very recently, as my software graphics routines for a 640x480 TFT display have become cumbersome on the W65C02 for the above reasons alone.
I have a few '816s in 40-pin PDIP, but does anyone know if WDC plans to release the '816 in QFP form soon? Any rumors at least, heh? The new data sheets now include that style of packaging, but it is not yet for sale on their site...
I've added a note to the opening post on this topic, to the effect that simple systems remain simple with the 65816 - people only start hitting 'interesting' situations when they venture into more complex systems.
I would go further, and suggest that every one of the 'extra' features of the 65816 was carefully designed with some use case in mind - if a feature seems useless or impossible to use, that might be a problem with understanding, rather than a problem with the chip.
I'm not convinced by the concerns about using ABORT and having a double fault during the pushing of machine state. It's an input, produced by the glue logic, and therefore that logic can and should handle the 4 pushes. It would be a bug in that logic to signal a second ABORT unless the system has an approach to handling that case too. A few cases spring to mind:
- ignore the failing pushes, because the task will be killed
- capture the pushes in a 4-byte buffer in the glue
- map in some dedicated area of memory specifically to take the pushes
- design the system such that ABORT only handles other cases of illegal accesses, or accesses needing attention, and such that writes to the stack can never cause ABORT. For example, each user task has a private full-sized bank 0.
Note also that ABORT isn't like the other 'interrupts' - it does not finish the instruction. An ABORT handler must kill the task or put things in place for a restart of the instruction.
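The second option in the list (capturing the pushes in a small buffer in the glue) can be sketched as a behavioural model. This is Python rather than HDL, purely to show the sequencing; the class and method names are hypothetical, and it assumes native mode, where the four pushed bytes are PBR, PCH, PCL and P.

```python
class AbortCapture:
    """Model of glue logic that diverts the ABORT pushes into its own buffer."""

    PUSHES = 4  # assumption: native mode, so PBR, PCH, PCL, P

    def __init__(self):
        self.buffer = []
        self.capturing = False

    def assert_abort(self):
        """The glue has just driven ABORT low for a bad access."""
        self.buffer = []
        self.capturing = True

    def bus_cycle(self, write, data, vector_pull=False):
        """Return True if the cycle may reach RAM, False if the glue swallowed it."""
        if not self.capturing:
            return True
        if vector_pull:
            # The vector fetch marks the end of the hardware sequence.
            self.capturing = False
            return True
        if write and len(self.buffer) < self.PUSHES:
            self.buffer.append(data)
            return False
        return True


def demo():
    """Feed the model one plausible ABORT sequence and return the captured bytes."""
    glue = AbortCapture()
    glue.assert_abort()
    glue.bus_cycle(write=False, data=0x00)        # internal cycle, passes through
    for byte in (0x01, 0x80, 0x12, 0x34):         # PBR, PCH, PCL, P
        glue.bus_cycle(write=True, data=byte)     # swallowed, no second ABORT
    glue.bus_cycle(write=False, data=0x00, vector_pull=True)
    return glue.buffer
```

Because the glue knows it raised ABORT, it never needs to fault on these writes; it only has to count them and stop at the vector pull.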
Cheers
Ed
BigEd wrote:
I'm not convinced by the concerns about using ABORT and having a double fault during the pushing of machine state. It's an input, produced by the glue logic, and therefore that logic can and should handle the 4 pushes.
Quote:
It would be a bug in that logic to signal a second ABORT unless the system has an approach to handling that case too.
Quote:
A few cases spring to mind:
- ignore the failing pushes, because the task will be killed
Quote:
- capture the pushes in a 4-byte buffer in the glue
Quote:
- map in some dedicated area of memory specifically to take the pushes
Quote:
- design the system such that ABORT only handles other cases of illegal accesses, or accesses needing attention, and such that writes to the stack can never cause ABORT. For example, each user task has a private full-sized bank 0.
BTW, I'm not suggesting that ABORT cannot be used to make a reliable virtual memory environment for the CPU. Quite the contrary -- I was one of the original folks here who saw its potential, and once I get an FPGA system, would like to explore this in greater detail. It would be fun and liberating to have a multitasking Forth system operating under memory protection, for then things like 0 0 ! would not be such inconveniences.
All I'm saying is that there are some things about its use which are inconvenient or particularly troublesome if you're not careful. The 65816 instruction set was intended for peripheral control (per the original 6502 instruction set intentions), and not intended for running Unix or Windows (despite being a more capable machine than the original PDP-8!).
But let's be realistic too -- I'd MUCH rather deal with the quirks of ABORT than having to embed a completely separate CPU to handle page faults, like the original SunOS boxes did.
kc5tja wrote:
But let's be realistic too -- I'd MUCH rather deal with the quirks of ABORT than having to embed a completely separate CPU to handle page faults, like the original SunOS boxes did.
André
more on ABORT handling
I don't think there is a second abort - there's no need for one and it would be no use. The previous postings seem to assume that a second abort is inevitable, and then to puzzle over how difficult it is: my point is that this is a signal under control of the system designer, and it would be a broken design to produce this signal in a way which prevents recovery.
The abort arises when the user task makes an access which is somehow invalid. The next few cycles are certain to contain four writes, intended to be pushes onto the stack. Deal with them as you see fit, but don't raise another abort, because it won't do you any good. You're already in the process of handling it, in hardware - you haven't even reached the vector pull yet. It may turn out, when the software handler gets to pick over the pieces(*), that there was a failed PHA, or a failed STA. As I said, you may have a system which can never fail to PHA, or you might not: when designing your ABORT logic and handler, you need to consider which case you are designing for.
(I don't think it's too helpful to think of abort as being a page fault - it bears a little similarity, but the 816 doesn't have the internal support. It's a bad access. Maybe recoverable, maybe not. It would be a simpler design if ABORT is never recoverable: the kernel just needs to dismantle the dead task. If ABORT is a RESET then you don't even need ABORT. I don't think it's useful to consider all the ways in which ABORT isn't like a fault on later VM-capable CPUs - it's more useful to consider what it is and how to use it.)
Cheers
Ed
(*) If indeed the pieces exist to pick over: if the four writes went somewhere. I see no problem in dedicating the first four bytes of the OS bank to be the ABORT buffer - this isn't automatic stack expansion, this is just saving some state in a particular situation. And there is no nesting.
WDC would never have made ABORT restartable if it weren't intended to facilitate virtual memory, so ABORT's recoverability depends entirely on how you use it.
I see what you're saying about the ABORT logic now -- since you knew you threw the ABORT, you know the next batch of writes up to the VPB signal must contain application state (PC, optionally PBR, and P), and thus can be intercepted under the assumption that RAM accesses are invalid. Change to supervisor state, and let the OS fix up its broken S register (since the CPU thinks it wrote to RAM but actually didn't), or change to a supervisor stack manually. Read the MMU registers for the captured user-mode state. When ready to return, transfer (or synthesize, if the task you're switching to is about to execute a signal handler) user state onto the stack, and execute an RTI.
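The "transfer user state onto the stack, and execute an RTI" step can be illustrated with a short Python model. It is a sketch under assumptions: native mode, a bank-0 stack modelled as a 64 KB `bytearray`, and captured bytes held in push order (PBR, PCH, PCL, P). The function names are invented for the example.

```python
def push_rti_frame(mem, s, captured):
    """Write captured [PBR, PCH, PCL, P] as if the CPU had pushed them; return new S."""
    for byte in captured:
        mem[s] = byte
        s = (s - 1) & 0xFFFF
    return s

def rti_pull(mem, s):
    """Model what a native-mode RTI pulls: P, then PCL, PCH, PBR."""
    pulled = []
    for _ in range(4):
        s = (s + 1) & 0xFFFF
        pulled.append(mem[s])
    p, pcl, pch, pbr = pulled
    return {"P": p, "PC": pcl | (pch << 8), "PBR": pbr}, s
```

Used together, the frame built from the captured bytes comes back out of `rti_pull` as the same P, PC and PBR, with S returned to where it started. A handler that wants to switch tasks would pass a synthesized list instead of the captured one.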
Exactly! Thanks for spelling it out (it was only in my head, which isn't a good place even for four bytes)
On another subject, that of leaving supervisor state (by writing to a delayed-action switch in the glue, or by detecting RTI) I'm still not sure how to cope with (returning from) nested supervisory calls. It might be that the glue needs to hold a small stack of supervisor mode bits. When the NMI returns it lands in the IRQ handler and when that returns it lands in the BRK handler, which can finally return the OS call results to the user process which used BRK.
I prefer delayed-action switch triggered by write into I/O space because
- user mode can execute RTI but can't write into I/O
- a write into I/O doesn't disturb the machine state, so it can be the last instruction before the RTI
I think the delayed-action only has to account for the fetch cycle (of the RTI) before switching modes - I don't think it's a general-purpose timer.
But then, we could be interrupted in between the write and the RTI. Aargh!
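For comparison, the "small stack of supervisor mode bits" idea can be modelled as below. This is a Python sketch, not a glue design: it assumes the glue can recognise both a vector pull and an RTI opcode fetch, and the names and the depth limit are hypothetical.

```python
class ModeStack:
    """Model of glue that keeps a small stack of supervisor-mode bits."""

    def __init__(self, depth=4):
        self.supervisor = False  # start in user mode
        self.saved = []
        self.depth = depth

    def vector_pull(self):
        """Any interrupt, BRK or ABORT entry: remember the old mode, enter supervisor."""
        if len(self.saved) >= self.depth:
            raise OverflowError("nesting deeper than the glue can track")
        self.saved.append(self.supervisor)
        self.supervisor = True

    def rti(self):
        """RTI seen on the bus: restore the mode that was interrupted."""
        if self.supervisor and self.saved:
            self.supervisor = self.saved.pop()
        # An RTI executed in user mode changes nothing.


def nested_return_modes():
    """BRK from user mode, then IRQ, then NMI; report the mode after each RTI."""
    glue = ModeStack()
    for _ in range(3):
        glue.vector_pull()
    modes = []
    for _ in range(3):
        glue.rti()
        modes.append(glue.supervisor)
    return modes
```

In this model the mode switch happens on the RTI itself, so there is no window between a separate I/O write and the RTI in which an interrupt could land. The cost is that the glue has to decode RTI reliably.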
Cheers
Ed
BigEd wrote:
On another subject, that of leaving supervisor state (by writing to a delayed-action switch in the glue, or by detecting RTI) I'm still not sure how to cope with (returning from) nested supervisory calls.
André
- BigDumbDinosaur
- Posts: 9426
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
fachat wrote:
BigEd wrote:
You could force the always-1 bit in the SR to zero when the supervisor mode is interrupted and when pulling that from the stack decide whether to leave supervisor mode or not.
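A rough Python sketch of that status-register idea follows. It assumes the glue can rewrite the byte on its way to the stack, and that bit 5 of P is the "always-1" bit, which holds on the 6502/65C02 and in 65816 emulation mode. In native mode bit 5 is the m flag, so the trick would not carry over unchanged. The function names are hypothetical.

```python
UNUSED_BIT = 0x20  # bit 5 of P; assumption: reads as 1 outside 65816 native mode

def filter_pushed_status(p_byte, supervisor):
    """Byte the glue stores when the CPU pushes P during an interrupt sequence."""
    if supervisor:
        return p_byte & ~UNUSED_BIT & 0xFF  # mark "supervisor was interrupted"
    return p_byte | UNUSED_BIT              # user mode: leave the bit at 1

def stay_supervisor_after_rti(stacked_p):
    """Decision the glue makes when RTI pulls P: marker 0 means stay in supervisor."""
    return (stacked_p & UNUSED_BIT) == 0
```

The sketch does not address how to stop user code from placing a forged status byte on the stack and executing RTI; a real design would need an answer to that.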