All times are UTC

PostPosted: Wed Dec 29, 2010 1:08 am 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
So in all seriousness, as a retrospective question from my limited POV, the 65816 is FULLY qualified to outpace the 6502 in all manner of operation?

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Wed Dec 29, 2010 3:11 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8144
Location: Midwestern USA
ElEctric_EyE wrote:
So in all seriousness, as a retrospective question from my limited POV, the 65816 is FULLY qualified to outpace the 6502 in all manner of operation?

In terms of raw processing power, the 65C816 can readily outperform a 65C02 running at the same clock speed once 16-bit register and memory operations come into play. Example, incrementing a 16-bit counter on the 'C02:
Code:
          inc counter           ;bump LSB
          bne next              ;no carry into MSB
          inc counter+1         ;bump MSB
next      ...program continues...

If the above were coded for the '816 running in native mode, and assuming accumulator/memory operations are set for 16 bits, you'd have the following:
Code:
          inc counter
          ...program continues...

Even better would be decrementing the counter. With the 'C02, you'd have to code something like the following:
Code:
          ldx counter           ;LSB zero? (note: clobbers .X)
          bne next              ;no, so no borrow from MSB
          dec counter+1         ;borrow from MSB
next      dec counter           ;decrement LSB

Again, if the above were coded for the '816 running in native mode, and assuming accumulator/memory operations are set for 16 bits, you'd have the following:
Code:
          dec counter
          ...program continues...

In itself, this feature would substantially improve program performance as compared to coding it with only eight bit register and memory operations.
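
As a minimal sketch of what "set for 16 bits" involves, assuming the MPU is still in emulation mode (as it is after reset) and that counter is the same two-byte location used above:
Code:
;one-time setup, then a 16 bit increment
;
          clc
          xce                   ;switch from emulation to native mode
          rep #%00100000        ;select 16 bit accumulator/memory
          inc counter           ;bumps counter & counter+1 as one 16 bit value

The CLC/XCE/REP sequence is only needed once, not around every INC, as long as the code that follows doesn't change the register width.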

However, there are more subtle features that give the '816 an advantage over the 65C02. Perhaps the most important of them are the new stack-based instructions (as well as BRL) that facilitate the development of relocatable code, as well as the ability to efficiently use the stack for passing parameters into subroutines. Also, there are TXY and TYX, which eliminate the need to use some intermediate code to copy X to Y or vice versa. Another one that I've found useful for swapping byte order is XBA. Consider the following:
Code:
;reverse byte order in RAM
;
         rep #%00100000        ;select 16 bit accumulator/memory
         lda addr              ;read 2 bytes from addr & addr+1
         xba                   ;swap them
         sta addr              ;write 2 bytes to addr & addr+1

That's definitely better than how you'd do it with a 'C02:
Code:
;reverse byte order in RAM
;
         lda addr              ;read 1st byte &...
         pha                   ;protect it
         lda addr+1            ;read 2nd byte
         sta addr              ;now the 1st byte
         pla                   ;get former 1st byte
         sta addr+1            ;now 2nd byte
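
The stack-based parameter passing mentioned above can be sketched roughly as follows. This is only an illustration: the label sub and the value $1234 are made up, and native mode with 16 bit registers is assumed.
Code:
;pass a 16 bit parameter on the stack
;
         rep #%00110000        ;select 16 bit accumulator & index
         pea $1234             ;push a 16 bit parameter
         jsr sub               ;call, pushing a 2 byte return address
         pla                   ;caller discards the parameter (clobbers .A)
         ...program continues...
;
sub      lda 3,s               ;parameter sits just above the return address
         ...use it...
         rts

Stack-relative addressing lets the subroutine read the parameter in place, without pulling and re-pushing the return address as you would have to on a 'C02.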


On the hardware side of things, the '816 offers more signals to better manage memory, as well as properly arbitrate bus accesses by external hardware (e.g., some sort of DMA controller). There is also the very useful feature, in native mode, of separate vectors for IRQ and BRK, eliminating the code that is required to distinguish one from the other on a 'C02.
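
For reference, the 'C02 code that the separate vectors make unnecessary typically looks something like this sketch (the labels irq and isbrk are placeholders):
Code:
;65C02 IRQ/BRK front end
;
irq      pha                   ;save .A
         phx                   ;save .X
         tsx                   ;copy stack pointer to .X
         lda $0103,x           ;fetch SR pushed by the interrupt
         and #%00010000        ;B bit set?
         bne isbrk             ;yes, it was a BRK
;
         ...hardware IRQ processing...

In native mode the '816 fetches the BRK vector from $00FFE6 and the IRQ vector from $00FFEE, so each handler can be entered directly.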

So, to answer your question: yes, the '816 can outpace the 65C02 in all respects.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Wed Dec 29, 2010 7:29 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
Quote:
So in all seriousness, as a retrospective question from my limited POV, the 65816 is FULLY qualified to outpace the 6502 in all manner of operation?

See Ed's post at viewtopic.php?p=9704#p9704 and my reply immediately after it.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Wed Dec 29, 2010 9:39 am
Joined: Tue Jul 05, 2005 7:08 pm
Posts: 990
Location: near Heidelberg, Germany
ElEctric_EyE wrote:
So in all seriousness, as a retrospective question from my limited POV, the 65816 is FULLY qualified to outpace the 6502 in all manner of operation?


Yes, I think so. I don't see anything compared to the 6502 that counters that.

André


Post subject: CONVERTED!
PostPosted: Wed Dec 29, 2010 12:36 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
BDD, I think you have converted me with your statements above: INC & DEC in 16 bits internally, and also TXY and TYX.

These very issues have very recently come up with me, as my software graphic routines for a 640x480 TFT display have become cumbersome for the above reasons alone, using the W65C02.

I have a few '816s in 40-pin PDIP, but does anyone know if WDC has plans to release the '816 in QFP form soon? Any rumors at least, heh? The new data sheets now include that style of packaging, but the parts are not yet for sale on their site...

PostPosted: Wed Dec 29, 2010 5:44 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
I've added a note to the opening post on this topic, to the effect that simple systems remain simple with the 65816 - people only start hitting 'interesting' situations when they venture into more complex systems.

I would go further, and suggest that every one of the 'extra' features of the 65816 was carefully designed with some use case in mind - if a feature seems useless or impossible to use, that might be a problem with understanding, rather than a problem with the chip.

I'm not convinced by the concerns about using ABORT and having a double fault during the pushing of machine state. It's an input, produced by the glue logic, and therefore that logic can and should handle the 4 pushes. It would be a bug in that logic to signal a second ABORT unless the system has an approach to handling that case too. A few cases spring to mind:
- ignore the failing pushes, because the task will be killed
- capture the pushes in a 4-byte buffer in the glue
- map in some dedicated area of memory specifically to take the pushes
- design the system such that ABORT only handles other cases of illegal accesses, or accesses needing attention, and such that writes to the stack can never cause ABORT. For example, each user task has a private full-sized bank 0.

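Note also that ABORT isn't like the other 'interrupts' - it does not let the instruction finish normally. The instruction's register updates are inhibited, and the address pushed is that of the aborted opcode rather than of the next instruction. An ABORT handler must kill the task or put things in place for a restart of the instruction.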

Cheers
Ed


PostPosted: Wed Dec 29, 2010 5:55 pm
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
BigEd wrote:
I'm not convinced by the concerns about using ABORT and having a double fault during the pushing of machine state. It's an input, produced by the glue logic, and therefore that logic can and should handle the 4 pushes.


The problem is, it has no context for doing so. All the glue logic knows is that there's a page fault when hitting a stack page that doesn't exist. Is the write a push? Is it a STA? Without tracking opcodes, which is adding yet more complexity to the system, there's just no way to know.

Quote:
It would be a bug in that logic to signal a second ABORT unless the system has an approach to handling that case too.


Of course, but . . .

Quote:
A few cases spring to mind:
- ignore the failing pushes, because the task will be killed


Actually, it's a machine-check condition, which means the entire system needs to be rebooted. Note that all CPUs with built-in MMUs have no more knowledge of machine state in a double-fault condition than any system with an external MMU. This is why the x86, MIPS, and PowerPC architectures have non-recoverable double-fault exceptions (and, in the specific case of x86, will take it upon itself to hard-reboot if it detects a third fault in succession).

Quote:
- capture the pushes in a 4-byte buffer in the glue


Assuming we could do this reliably, what information would this provide us? As I understand how things would work, the 2nd ABORT corrupts the state of the 1st.

Quote:
- map in some dedicated area of memory specifically to take the pushes


Automatic stack space expansion is a function of the OS, and not of the hardware. The hardware doesn't know where free pages exist to draw from.

Quote:
- design the system such that ABORT only handles other cases of illegal accesses, or accesses needing attention, and such that writes to the stack can never cause ABORT. For example, each user task has a private full-sized bank 0.


I claim this, too, is an OS-level concern. The MMU is just doing its job -- if the stack ends up writing into protected memory (e.g., I/O space in bank 0), then it should rightfully issue an -ABORT signal.

BTW, I'm not suggesting that ABORT cannot be used to make a reliable virtual memory environment for the CPU. Quite the contrary -- I was one of the original folks here who saw its potential, and once I get an FPGA system, would like to explore this in greater detail. It would be fun and liberating to have a multitasking Forth system operating under memory protection, for then things like 0 0 ! would not be such inconveniences. ;)

All I'm saying is that there are some things about its use which are inconvenient or particularly troublesome if you're not careful. The 65816 instruction set was intended for peripheral control (per the original 6502 instruction set intentions), and not intended for running Unix or Windows (despite being a more capable machine than the original PDP-8!).

But let's be realistic too -- I'd MUCH rather deal with the quirks of ABORT than having to embed a completely separate CPU to handle page faults, like the original SunOS boxes did.


PostPosted: Wed Dec 29, 2010 6:16 pm
Joined: Tue Jul 05, 2005 7:08 pm
Posts: 990
Location: near Heidelberg, Germany
kc5tja wrote:
But let's be realistic too -- I'd MUCH rather deal with the quirks of ABORT than having to embed a completely separate CPU to handle page faults, like the original SunOS boxes did.


Absolutely. I also used a second CPU in my CS/A system to handle the ABORTs because the 6502 cannot do it. Sun was lucky Motorola fixed the 68000 (becoming the 68010 IIRC because of that), and we are lucky WDC fixed the 6502 :-)

André


Post subject: more on ABORT handling
PostPosted: Wed Dec 29, 2010 6:19 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
I don't think there is a second abort - there's no need for one and it would be no use. The previous postings seem to assume that a second abort is inevitable, and then to puzzle over how difficult it is: my point is that this is a signal under control of the system designer, and it would be a broken design to produce this signal in a way which prevents recovery.

The abort arises when the user task makes an access which is somehow invalid. The next few cycles are certain to contain four writes, intended to be pushes onto the stack. Deal with them as you see fit, but don't raise another abort, because it won't do you any good. You're already in the process of handling it, in hardware - you haven't even reached the vector pull yet. It may turn out, when the software handler gets to pick over the pieces(*), that there was a failed PHA, or a failed STA. As I said, you may have a system which can never fail to PHA, or you might not: when designing your ABORT logic and handler, you need to consider which case you are designing for.

(I don't think it's too helpful to think of abort as being a page fault - it bears a little similarity, but the 816 doesn't have the internal support. It's a bad access. Maybe recoverable, maybe not. It would be a simpler design if ABORT is never recoverable: the kernel just needs to dismantle the dead task. If ABORT is a RESET then you don't even need ABORT. I don't think it's useful to consider all the ways in which ABORT isn't like a fault on later VM-capable CPUs - it's more useful to consider what it is and how to use it.)

Cheers
Ed

(*) If indeed the pieces exist to pick over: if the four writes went somewhere. I see no problem in dedicating the first four bytes of the OS bank to be the ABORT buffer - this isn't automatic stack expansion, this is just saving some state in a particular situation. And there is no nesting.


PostPosted: Wed Dec 29, 2010 7:14 pm
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
WDC would never have made ABORT restartable if it weren't intended to facilitate virtual memory, so ABORT's recoverability depends entirely on how you use it.

I see what you're saying about the ABORT logic now -- since you knew you threw the ABORT, you know the next batch of writes up to the VPB signal must contain application state (PC, optionally PBR, and P), and thus can be intercepted under the assumption that RAM accesses are invalid. Change to supervisor state, and let the OS fix up its broken S register (since the CPU thinks it wrote to RAM but actually didn't), or change to a supervisor stack manually. Read the MMU registers for the captured user-mode state. When ready to return, transfer (or synthesize, if the task you're switching to is about to execute a signal handler) user state onto the stack, and execute an RTI.
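
A rough sketch of that last step, returning to the user task, might look like the following. Everything about the glue here is hypothetical: the register names cap_p, cap_pc and cap_pbr and their addresses are invented for illustration. It also assumes native mode, DBR set to bank $00, and that S already points at a valid supervisor stack.
Code:
;ABORT handler exit -- glue registers are made up
;
cap_p    =$df00                ;captured SR
cap_pc   =$df01                ;captured PC of the aborted opcode (2 bytes)
cap_pbr  =$df03                ;captured PBR
;
         sep #%00100000        ;select 8 bit accumulator
         lda cap_pbr
         pha                   ;PBR goes on first...
         rep #%00100000        ;select 16 bit accumulator
         lda cap_pc
         pha                   ;...then PC, MSB first...
         sep #%00100000        ;select 8 bit accumulator
         lda cap_p
         pha                   ;...& SR last
         rti                   ;pulls SR, PC & PBR, re-running the opcode

The push order mirrors what the MPU itself would have written in native mode, so RTI can't tell the difference between a frame the hardware pushed and one the OS synthesized.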


PostPosted: Wed Dec 29, 2010 7:53 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Exactly! Thanks for spelling it out (it was only in my head, which isn't a good place even for four bytes)

On another subject, that of leaving supervisor state (by writing to a delayed-action switch in the glue, or by detecting RTI) I'm still not sure how to cope with (returning from) nested supervisory calls. It might be that the glue needs to hold a small stack of supervisor mode bits. When the NMI returns it lands in the IRQ handler and when that returns it lands in the BRK handler, which can finally return the OS call results to the user process which used BRK.

I prefer delayed-action switch triggered by write into I/O space because
- user mode can execute RTI but can't write into I/O
- a write into I/O doesn't disturb the machine state, so it can be the last instruction before the RTI

I think the delayed-action only has to account for the fetch cycle (of the RTI) before switching modes - I don't think it's a general-purpose timer.

But then, we could be interrupted in between the write and the RTI. Aargh!

Cheers
Ed


PostPosted: Wed Dec 29, 2010 8:00 pm
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
If you write to the delayed demotion register, the glue can temporarily hold off NMI and IRQ signals, so that you can rely on an interrupt never occurring between the strobe and the RTI fetch.
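
From the software side, the exit sequence would then be as short as this sketch. The register name demote and its address are invented, and DBR is assumed to be bank $00.
Code:
;leave supervisor mode -- glue register is made up
;
demote   =$df08                ;delayed demotion strobe
;
svexit   stz demote            ;arm the switch; glue holds off IRQ & NMI
         rti                   ;glue drops to user mode after this opcode fetch

STZ is used here because it changes neither the registers nor the flags, so the machine state restored just before it stays intact.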


PostPosted: Wed Dec 29, 2010 8:06 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Nice!


PostPosted: Wed Dec 29, 2010 8:27 pm
Joined: Tue Jul 05, 2005 7:08 pm
Posts: 990
Location: near Heidelberg, Germany
BigEd wrote:
On another subject, that of leaving supervisor state (by writing to a delayed-action switch in the glue, or by detecting RTI) I'm still not sure how to cope with (returning from) nested supervisory calls.


You could force the always-1 bit in the SR to zero when the supervisor mode is interrupted and when pulling that from the stack decide whether to leave supervisor mode or not. Just an idea I had for my 65k project. But might be too complicated because you have to monitor (and even change) reads and writes of the status register to/from the stack (that's easier if it's already built into the processor)

André


PostPosted: Thu Dec 30, 2010 2:59 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8144
Location: Midwestern USA
fachat wrote:
You could force the always-1 bit in the SR to zero when the supervisor mode is interrupted and when pulling that from the stack decide whether to leave supervisor mode or not.

There is no "always-1" bit in the 65C816's SR. The bit to which you are referring is the accumulator/memory size bit when the '816 is running in native mode.
