65816 system engineering pros and cons

BigEd · Post by **BigEd** » Tue Dec 28, 2010 6:06 pm

fachat wrote:

... the 65816 does not make it easy to build a simple system.... That's why I started to design my own 6502 extension...

and I thought it might be good to have a thread specifically about the features and difficulties of building with the 65816.

To recap, the 65816 offers
- a 6502 compatible mode, which it uses at reset time and optionally thereafter
- optional modes for 16-bit registers, arithmetic and/or memory accesses
- 16-bit stack pointer
- base address register for zero-page accesses (now called direct page)
- a 24-bit address space, with
- an 8-bit wide databus, used part-time for the high byte of the address bus
- bank registers, to allow 16-bit pointers to access programs or data in the 24-bit address space
- some addressing modes which allow for 24 bit addresses
- some stack-relative instructions
- long branches with 16-bit displacement
- ABORT input
- outputs to indicate the mode, vector accesses, valid addresses, valid fetches, rmw accesses

but it has some drawbacks, such as
- no supervisor mode
- 16-bit bank boundaries are visible
- address bus is not 24 dedicated pins
- a need for conventions and discipline as to which modes are used, and when.
- some structures are forced to be in bank 0 (stacks, direct page, vectors, interrupt handlers)
- timing of bank address (on data bus) and ABORT input are relative to rising edge of phi2 (which is rather early at high clock speeds)
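
As a rough illustration of the time-shared data bus listed above, the usual workaround is a transparent latch that follows D0-D7 while phi2 is low (when the bank byte is on the bus) and holds it while phi2 is high (when the bus carries data). The sketch below is only a behavioural model; the class and function names are made up for the example and no timing is modeled.

```python
class BankLatch:
    """Behavioural model of a transparent latch enabled while PHI2 is low."""

    def __init__(self):
        self.bank = 0x00

    def clock(self, phi2, data_bus):
        # While PHI2 is low the 65816 drives the bank address on D0-D7,
        # so the latch is transparent; while PHI2 is high it holds.
        if phi2 == 0:
            self.bank = data_bus & 0xFF
        return self.bank


def full_address(latch, a15_a0):
    """Combine the latched bank byte with A0-A15 into a 24-bit address."""
    return (latch.bank << 16) | (a15_a0 & 0xFFFF)


latch = BankLatch()
latch.clock(phi2=0, data_bus=0x12)  # first half-cycle: bank byte on the bus
latch.clock(phi2=1, data_bus=0xEA)  # second half-cycle: data, latch holds
addr = full_address(latch, 0x3456)
print(hex(addr))  # 0x123456
```

In real hardware the interesting part is exactly what the model leaves out: how long after the fall of phi2 the bank byte is valid, and how much decode time is left before phi2 rises.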

Any other observations about the usefulness or difficulty of building systems with this processor?

Edit: Just to note that simple systems remain simple with the 65816 - it's a relatively easy design job to replace an existing 6502, or to design a system around it with say 512Kbyte of SRAM. Things start getting complicated when the system is to support slow peripherals, have memory protection or multiple bus masters. A simple system doesn't need to use the extra pins such as RDY, ABORT, MLB, VPB, VDA, VPA, M/X or E - conversely, if the system seems to need those pins, it's complex and the designer will need a good understanding of what they are for and how they are to be used.

kc5tja · Post by **kc5tja** » Tue Dec 28, 2010 6:20 pm

The inability to add an (preferably signed) immediate value to the X or Y registers directly makes traversing structures a bit more difficult than they otherwise would need to be. If the 6502/65816 had such an ability, it'd make implementing Forth, and I suspect other languages, much easier.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Tue Dec 28, 2010 7:31 pm

kc5tja wrote:

The inability to add an (preferably signed) immediate value to the X or Y registers directly makes traversing structures a bit more difficult than they otherwise would need to be. If the 6502/65816 had such an ability, it'd make implementing Forth, and I suspect other languages, much easier.

By "add," do you mean arithmetic addition?

kc5tja · Post by **kc5tja** » Tue Dec 28, 2010 7:47 pm

Yes. If I want to add 16 to the X register, I should not have to resort to sixteen INX instructions. That's 32 cycles when, say, 7 to 9 might do. Otherwise, I have to trash the A register:

Code:

; assuming we don't care about the value in the accumulator . . .
txa                 ; 2 cycles
clc                 ; 2 cycles
adc #16             ; 2(3) cycles
tax                 ; 2 cycles
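
The cycle arithmetic behind that comparison can be checked with a few lines. This is just a tally of the per-instruction counts quoted above (2 cycles per INX, and 2 or 3 for the immediate ADC depending on accumulator width); the function names are invented for the example.

```python
INX_CYCLES = 2  # one INX, as quoted in the post


def inx_chain(n):
    """Cycles needed to add n to X using n INX instructions."""
    return n * INX_CYCLES


def via_accumulator(wide_a):
    """Cycles for TXA / CLC / ADC #imm / TAX; ADC takes 3 with a 16-bit A."""
    txa, clc, tax = 2, 2, 2
    adc = 3 if wide_a else 2
    return txa + clc + adc + tax


print(inx_chain(16))           # 32
print(via_accumulator(False))  # 8
print(via_accumulator(True))   # 9
```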

BigEd · Post by **BigEd** » Tue Dec 28, 2010 8:25 pm

To be honest, I'm alarmed by the possibility of this becoming another thread about improved processors - we have lots of those already. Any comments on system engineering with the 65816?

kc5tja · Post by **kc5tja** » Tue Dec 28, 2010 8:45 pm

I'm sorry -- I thought the discussion was open to system-wide engineering concerns, and not restricted to hardware-only issues. Since I practice hardware/software co-engineering in my projects, I tend to blur the lines between hardware and software frequently.

BigEd · Post by **BigEd** » Tue Dec 28, 2010 9:02 pm

No harm done, I hope! In my mind, systems engineering would cross into software territory like OS calls, memory protection and supervisory mode. But, rather than suggesting ways to change the chip, I'd hope to see workarounds or inventions:
Q - I can't (do this thing) with the 816
A1 - Have you considered (this approach)
A2 - Note that (another system) managed without (that facility)
A3 - Here's a completely different way to get the same system feature...

Cheers
Ed

BigDumbDinosaur · Post by **BigDumbDinosaur** » Tue Dec 28, 2010 9:02 pm

kc5tja wrote:

I'm sorry -- I thought the discussion was open to system-wide engineering concerns, and not restricted to hardware-only issues. Since I practice hardware/software co-engineering in my projects, I tend to blur the lines between hardware and software frequently.

What you are referring to is really a software engineering matter. All processors have limitations in their instruction sets. As the 65C816's accumulator is the only general purpose register, the inability to increment the X-register 16 times in one instruction is not really a hardware (read: system) issue (yes, something like INX 16 would be useful). It's a design constraint that is part and parcel of designing a "low power, cost-sensitive 8/16-bit microprocessor." Think of not being able to increment the index registers an arbitrary number of times in a single operation as a "feature" that separates the men from the boys in assembly language programming.

kc5tja · Post by **kc5tja** » Tue Dec 28, 2010 9:08 pm

Or, it separates the men from the boys by programming a 256 byte chunk of read-only I/O space in CPLD or FPGA, where the value read is the sum of A7-A0 on the bus and some value stored in another holding register. So, you can do:

Code:

STX xHold
LDX $AB0010  ; add 16 to X -- bwaahahahah!!

Since this now crossed into hardware, does this now become a systems issue? ;D
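
For what it's worth, the behaviour of such an adder port is easy to model. The sketch below is hypothetical: it assumes a 16-bit holding register and an unsigned 8-bit offset taken from A7-A0, and it returns the whole sum in one go, glossing over the fact that a 16-bit X would actually be read as two byte-wide bus cycles.

```python
class AdderPort:
    """Model of a 256-byte read-only window that returns hold + offset."""

    def __init__(self, width_bits=16):
        self.mask = (1 << width_bits) - 1
        self.hold = 0

    def write_hold(self, value):
        # Corresponds to the STX xHold step.
        self.hold = value & self.mask

    def read(self, offset):
        # Corresponds to the LDX from the window; offset is A7-A0.
        return (self.hold + (offset & 0xFF)) & self.mask


port = AdderPort()
port.write_hold(0x1234)
print(hex(port.read(0x10)))  # 0x1244
```

A signed offset, as wished for earlier in the thread, would only need the window's A7 treated as a sign bit inside the logic.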

kc5tja · Post by **kc5tja** » Tue Dec 28, 2010 9:17 pm

BigEd -- as discussed elsewhere, it looks like the assertion of the VPB signal could have been timed more conveniently. I don't know whether that counts as a con or a pro; it's more like "none of the above". It's a pro from the point of view of banking in vectors on the bus where you'd expect to find uncommitted RAM, but it's a con from the point of view of recovering from a doubly-nested ABORT.

ElEctric_EyE · Post by **ElEctric_EyE** » Tue Dec 28, 2010 10:20 pm

One aspect of the '816 I don't like is the MUX'd upper address lines, I know I'm in the majority there!

We can all come up with our own banking schemes for more memory, but the blur here for me is which commands will I be sacrificing if I decide to use the '816 for some additional opcodes, (especially 16-bit!), but don't use the MUX'd upper address lines.

I'm sure I could answer my own question if I did some serious research... I hope I've not posted in total ignorance and laziness!

fachat · Post by **fachat** » Tue Dec 28, 2010 10:21 pm

So I guess I DO have to comment on that ;-)

BigEd wrote:

Elsewhere, André said:

fachat wrote:

... the 65816 does not make it easy to build a simple system.... That's why I started to design my own 6502 extension...

and I thought it might be good to have a thread specifically about the features and difficulties of building with the 65816.

First of all, I think the 65816 is a nice chip, with lots of abilities that make it much greater than the 6502. It is probably the best you could get in those days in terms of functionality per silicon/chip area.
It is only in retrospective now, with changing requirements (thinking of program sizes of more than 64k, supervisor mode) that I find the 65816 shows its age.

Quote:

To recap, the 65816 offers
..

Quite some great offerings. In terms of system engineering I do like the ABORT input. It is also probably one of the larger chip area drivers, as you'd have to provide copies for all the changed registers that are only updated when the opcode is truly finished.

The time-shared data bus is an annoyance, but can easily be worked around. WDC could have thought of another hardware option with full external address bus though.

VDA/VPA give much more information than the 6502's SYNC output. Together with VP, E, M/X it allows to use the 65816 from simple systems to very complex systems with the 65816 as a core only and glue logic providing higher level functionality - speaking of supervisor mode for example etc.

Quote:

but it has some drawbacks, such as
- no supervisor mode
- 16-bit bank boundaries are visible
- address bus is not 24 dedicated pins
- a need for conventions and discipline as to which modes are used, and when.
- some structures are forced to be in bank 0 (stacks, direct page, vectors, interrupt handlers)
- timing of bank address (on data bus) and ABORT input are relative to rising edge of phi2 (which is rather early at high clock speeds)

Concerning system engineering:
- supervisor mode could be implemented with external logic [1]
- 16-bit boundaries are visible to programs (only). In today's expectations to be able to use large memory areas, a major annoyance. But for multiple processes with each less than 64k memory requirements, I don't see problems.
- no full external address bus - an annoyance, but easily worked around.
- modes and conventions - again a problem more for the programming side. Today you try to avoid places where things may go wrong - using mode bits invites such possibilities (in my opinion!)
- Structures in bank 0 - a major annoyance! On one side more a software problem. But when you start thinking of protecting processes from each other, it becomes a hardware problem as well. You could build some hardware that maps "user space" stack and direct page to another bank though [2]
- ABORT is required to be asserted on rising phi2? Ouch. That's a tough timing requirement.

Quote:

Any other observations about the usefulness or difficulty of building systems with this processor?

If you separate processes out into different banks, they still need some way to call supervisor/kernel mode functions. If you use BRK or COP opcodes, you'd not need any shared memory or code. But in general you'd need some libraries (similar to libc - which runs in user space, but is shared read-only between processes) that require address space. You can't share memory without external mapping hardware, and mapping some shared memory e.g. in the uppermost 8k of each bank leaves holes of unused RAM (if you don't make up some more complex re-mapping scheme).

Because of this I've come to love my simple MMU-based scheme - where you can easily map virtual (processor) addresses to physical (memory) addresses as you like (in my case in blocks of 4k). But which has other drawbacks, e.g. re-loading the mapping on each context switch.
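
A minimal sketch of that kind of 4k block mapping, assuming a 16-bit processor address and a 16-entry table (one entry per 4k block); the table contents and names are purely illustrative and not taken from any actual board:

```python
PAGE_SHIFT = 12                 # 4k blocks
PAGE_SIZE = 1 << PAGE_SHIFT


def translate(page_table, vaddr):
    """Map a 16-bit virtual address to a physical address via a 16-entry table."""
    page = (vaddr >> PAGE_SHIFT) & 0xF      # upper 4 address bits select the entry
    offset = vaddr & (PAGE_SIZE - 1)        # lower 12 bits pass through unchanged
    return (page_table[page] << PAGE_SHIFT) | offset


# Identity mapping, except virtual block $C is pointed at physical block $45.
table = list(range(16))
table[0xC] = 0x45
print(hex(translate(table, 0xC123)))  # 0x45123
```

The drawback mentioned above shows up here as the cost of rewriting all sixteen entries of the table on every context switch.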

André

[1] e.g. using VP, or detection of TXS opcodes to switch to supervisor mode, and extended BRK or COP opcodes to switch back
[2] A "user bank" register could map bank zero to this bank when in user mode. No need to detect TXS opcodes in this case actually to protect other processes' stacks. On the other hand such hardware increases address decoding delay, reducing max speed

fachat · Post by **fachat** » Tue Dec 28, 2010 10:38 pm

So how could you build an advanced system using the 65816... Let's talk about the requirements:

- multitasking, multiprocessing
- memory protection between processes
- 64k max memory per process
- user mode vs. supervisor mode
- hardware registers only available to supervisor mode

You'd have to decide what mapping scheme to use. The most easy "natural" one for the 65816 is one process per bank. Or at least one memory environment per bank (which may contain one or even more processes). You'd sacrifice the ability to run programs with more than 64k memory requirements. But on the other hand, the banking comes natural to the 65816's opcodes. Except for zero/direct page and stack, a bank looks like a normal 6502.

But with this decision you could use a CPLD with a "user bank" register that monitors the 65816 and internally switches between user and supervisor mode.

In supervisor mode ("bank 0") the 65816 works as it is. In user mode the processor's bank address byte is ignored and instead replaced with the content of the "user bank" register. Thus the user mode cannot "escape" the user space memory. Even processor state on interrupt would be stored on the user space (bank) stack, not in supervisor memory.
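
The bank substitution described here boils down to a multiplexer on the bank byte. A behavioural sketch, with hypothetical signal names and no attempt to model the added decode delay:

```python
def physical_address(supervisor, cpu_bank, addr16, user_bank):
    """Pick the bank byte: the CPU's own in supervisor mode, the register otherwise."""
    bank = cpu_bank if supervisor else user_bank
    return ((bank & 0xFF) << 16) | (addr16 & 0xFFFF)


# Supervisor mode: the stack access really goes to bank 0.
print(hex(physical_address(True, 0x00, 0x01FF, user_bank=0x05)))   # 0x1ff
# User mode: the same bank-0 stack access lands in the process's own bank.
print(hex(physical_address(False, 0x00, 0x01FF, user_bank=0x05)))  # 0x501ff
# User mode: a long access to some other bank cannot escape either.
print(hex(physical_address(False, 0x7E, 0x2000, user_bank=0x05)))  # 0x52000
```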

How would such a system switch between user and supervisor mode?
- supervisor mode is entered on vector pull (VP), and possibly other signals
- opcodes like BRK or COP could be used to call supervisor mode functions from user mode. Supervisor mode code can access user mode to transfer data.
- opcodes like TXS need not be detected, as stack is mapped into the user bank (away from bank 0)
- user mode could be entered by monitoring RTI opcode. The interrupt routine could increase a counter in the CPLD that is decreased on RTI, so that when the counter reaches zero/underflows user mode is entered (to accommodate stacked interrupts for example).
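
The mode switching in the list above can be sketched as a small nesting counter. This model makes two assumptions that are not in the description: the counter is bumped by the vector pull itself rather than by the interrupt routine, and it starts at 1 because reset also pulls a vector. An RTI seen while already in user mode is simply ignored here.

```python
class ModeTracker:
    """Supervisor/user tracking: count vector pulls up, RTIs down."""

    def __init__(self):
        self.depth = 1          # assumed: the reset vector pull enters supervisor mode

    @property
    def supervisor(self):
        return self.depth > 0

    def vector_pull(self):
        # VP asserted: an interrupt, BRK or COP is being taken.
        self.depth += 1

    def rti(self):
        # RTI opcode fetched; clamp at zero instead of underflowing.
        if self.depth > 0:
            self.depth -= 1


t = ModeTracker()
t.rti()                 # kernel "returns" into the first user process
print(t.supervisor)     # False
t.vector_pull()         # IRQ arrives
t.vector_pull()         # nested interrupt inside the handler
t.rti()
print(t.supervisor)     # True, still inside the outer handler
t.rti()
print(t.supervisor)     # False, back in user mode
```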

Could be quite an interesting system thinking about it now... :-)

André

BigDumbDinosaur · Post by **BigDumbDinosaur** » Wed Dec 29, 2010 12:17 am

fachat wrote:

So I guess I DO have to comment on that

First of all, I think the 65816 is a nice chip, with lots of abilities that make it much greater than the 6502...it is only in retrospective now, with changing requirements (thinking of program sizes of more than 64k, supervisor mode) that I find the 65816 shows its age.

To some extent, that could be said about other contemporary MPUs. Consider all the baggage dragged around by the x86 architecture, which is almost as old as the 65xx family.

Quote:

The time-shared data bus is an annoyance, but can easily be worked around. WDC could have thought of another hardware option with full external address bus though.

That feature (?) stems from the requirement by Apple that 65C02 compatibility be retained. There weren't enough pins available on the DIP40 package in which the first generation 65C816s were made. Now, if WDC had used a larger package, e.g., PLCC68, A16-A23 could have been separately brought out (as well as the output side of RDY). That, and if money grew on trees we'd all be wealthy.

Quote:

VDA/VPA give much more information than the 6502's SYNC output. Together with VP, E, M/X it allows to use the 65816 from simple systems to very complex systems with the 65816 as a core only and glue logic providing higher level functionality - speaking of supervisor mode for example etc.

Conversely, accounting for those signals means the use of basic glue logic becomes problematic. You could conceivably implement fine-grained functionality with a bunch of discrete gates, assuming performance isn't going to be important to you. Realistically, a high performance '816 system is going to require use of a CPLD to avoid cumulative propagation delays.

Quote:

- Structures in bank 0 - a major annoyance!

My principal beef with the '816 design. Unfortunately, a necessity to maintain 65C02 compatibility.

Quote:

- ABORT is required to be asserted on rising phi2? Ouch. That's a tough timing requirement.

It's tight, I agree, but not impossible. Again, a GAL or CPLD would be necessary to achieve a quick response to an illegal memory condition. If you peruse the timing diagram, it indicates that ABORT should be asserted shortly before the rise of Ø2, a setup time defined as tPCS. By then, the bank and address will be valid and aborting will prevent any register or memory changes. ABORT should be de-asserted immediately after the fall of Ø2.

Quote:

- multitasking, multiprocessing
- memory protection between processes
- 64k max memory per process
- user mode vs. supervisor mode
- hardware registers only available to supervisor mode

You'd have to decide what mapping scheme to use. The most easy "natural" one for the 65816 is one process per bank. Or at least one memory environment per bank (which may contain one or even more processes). You'd sacrifice the ability to run programs with more than 64k memory requirements.

You can't run a larger than 64K program without special gymnastics, because PBR doesn't increment when PC wraps. You're stuck in the same bank unless you execute something like JMP <NEWPBR><$0000> to get to the start of the next bank. So in that regard, you really haven't sacrificed anything.
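
To make the wrap behaviour concrete, here is a tiny model of how the program counter advances: only the 16-bit PC increments, and PBR changes only on an explicit long transfer of control. The function names are made up for the example.

```python
def next_fetch(pbr, pc, length=1):
    """Advance the program counter; it wraps within the bank, PBR is untouched."""
    return pbr, (pc + length) & 0xFFFF


def long_jump(new_pbr, target):
    """An explicit long jump is the only way shown here to change PBR."""
    return new_pbr & 0xFF, target & 0xFFFF


print(next_fetch(0x01, 0xFFFF))   # (1, 0): back to the start of bank $01
print(long_jump(0x02, 0x0000))    # (2, 0): start of the next bank
```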

Quote:

But with this decision you could use a CPLD with a "user bank" register that monitors the 65816 and internally switches between user and supervisor mode.

In supervisor mode ("bank 0") the 65816 works as it is. In user mode the processor's bank address byte is ignored and instead replaced with the content of the "user bank" register. Thus the user mode cannot "escape" the user space memory. Even processor state on interrupt would be stored on the user space (bank) stack, not in supervisor memory.

This is one of the memory management schemes I've contemplated. Your next question highlights the fundamental problem with ignoring the bank when not in supervisor mode.

Quote:

How would such a system switch between user and supervisor mode?
- supervisor mode is entered on vector pull (VP), and possibly other signals

Risky. If ABORT is asserted during any stack operation it is likely the MPU will get put into a death spiral, since ABORT causes VP to be asserted as well. The rub is that ABORT, like all other interrupts, pushes the MPU state to the stack. If the illegal memory access involved a stack operation the MPU will choke on an endless cycle of ABORTs.

Quote:

- opcodes like BRK or COP could be used to call supervisor mode functions from user mode. Supervisor mode code can access user mode to transfer data.

BRK seems to be the better choice here, as the signature byte can be anything. Hence BRK can invoke up to 255 separate kernel functions, assuming signature byte $00 is used to set program breakpoints for debugging. Only 128 COP signatures are designated for user use.
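
A rough model of how a BRK handler would find the signature byte: in native mode the stacked return address is the address of the BRK opcode plus 2, so the signature sits one byte below it in the caller's program bank. The memory dictionary, the service table and the function names below are all hypothetical, and $00 is treated as the debugger breakpoint as suggested above.

```python
def brk_signature(memory, pushed_pbr, pushed_pc):
    """Fetch the byte that followed the BRK opcode (stacked PC minus 1)."""
    sig_addr = ((pushed_pbr & 0xFF) << 16) | ((pushed_pc - 1) & 0xFFFF)
    return memory[sig_addr]


def dispatch(memory, pushed_pbr, pushed_pc, services):
    """Look up the kernel function for a BRK; signature $00 means breakpoint."""
    sig = brk_signature(memory, pushed_pbr, pushed_pc)
    if sig == 0x00:
        return "breakpoint"
    return services.get(sig, "unknown")


# BRK at $05:1000 followed by signature $2A; the MPU stacks PC = $1002.
memory = {0x051000: 0x00, 0x051001: 0x2A}
services = {0x2A: "write"}
print(dispatch(memory, 0x05, 0x1002, services))  # write
```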

As the MPU executes BRK (or other interrupts) it pushes its state onto the current stack, which would be in user space. If entering supervisor mode on a BRK instruction, how do you get the exit status of the called function back to the user process? After all, when the RTI is executed at the end, the user space has to be mapped back in so the MPU can pick up SR, PC and PBR. The problem is the value pulled back into the status register (SR) is that which was pushed when the BRK was executed. How would one fix that little contretemps?

Quote:

- opcodes like TXS need not be detected, as stack is mapped into the user bank (away from bank 0)

Also true of other stack operations, such as PHD, PHK, etc.

Quote:

- user mode could be entered by monitoring RTI opcode. The interrupt routine could increase a counter in the CPLD that is decreased on RTI, so that when the counter reaches zero/underflows user mode is entered (to accommodate stacked interrupts for example).

The VP line could be used to increment the counter, as VP is involved on any interrupt. However, the counter will become unbalanced if user-mode code "misuses" RTI for vectoring the MPU.

Quote:

Could be quite an interesting system thinking about it now...

Isn't that what this hobby is all about?

BigDumbDinosaur · Post by **BigDumbDinosaur** » Wed Dec 29, 2010 12:27 am

ElEctric_EyE wrote:

One aspect of the '816 I don't like is the MUX'd upper address lines, I know I'm in the majority there!

We can all come up with our own banking schemes for more memory, but the blur here for me is which commands will I be sacrificing if I decide to use the '816 for some additional opcodes, (especially 16-bit!), but don't use the MUX'd upper address lines.

I'm sure I could answer my own question if I did some serious research... I hope I've not posted in total ignorance and laziness!

The long addressing modes wouldn't have any effect if the bank address is not picked up. The effect would be the same as if the only address space was bounded by 16 bits.
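
In other words, with no bank latch fitted every bank aliases onto the same 64K. A one-function sketch of what the memory decoder would see (the function name is invented for the example):

```python
def decoded_address(addr24, bank_latched):
    """Address seen by the decoder, with or without the bank byte latched."""
    if bank_latched:
        return addr24 & 0xFFFFFF
    return addr24 & 0xFFFF      # bank byte never captured: 16-bit space only


print(hex(decoded_address(0x123456, bank_latched=True)))   # 0x123456
print(hex(decoded_address(0x123456, bank_latched=False)))  # 0x3456
print(hex(decoded_address(0x003456, bank_latched=False)))  # 0x3456, same location
```

The other '816 additions mentioned in the thread, such as the 16-bit registers, stack-relative addressing and the relocatable direct page, do not depend on the bank byte being latched.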