Review of 65C816

bvold · Post by **bvold** » Wed Sep 21, 2005 12:15 am

I am finishing up a 6502-based computer and am interested in maybe using the 65C816 for my next computer project. Can anyone who has used it tell me what you think of it? Is it a good processor?

GARTHWILSON · Post by **GARTHWILSON** » Wed Sep 21, 2005 5:11 am

There's quite a bit about the '816 here on the forum, but even finding one of my own posts I was looking for has turned out to be a challenge. (I never did find it.) Anyway, here are a few of the 65816's attractions. (This is definitely not a complete list.)

A, X, and Y can be switched in and out of 16-bit mode at any time. Certain things are a whole lot easier when you can handle 16 bits at a time. My '816 Forth runs 2-3 times as fast as my '02 Forth at the same clock speed, just because each primitive requires so few instructions to get the same job done.

16-bit stack pointer. The stack can occupy any part of (or nearly all of) the first 64K of the memory map.

"Zero page" is now "direct page" (DP) because it can be put anywhere in the first 64K of the memory map, instead of being confined to 0000-00FF. It can even straddle page boundaries. In the case of multitasking, each task can have its own zero-page-like access that won't interfere with other tasks'.

It has quite a few more instructions and addressing modes. Block moves can be accomplished without a loop if you set up A, X, and Y and then use MVN or MVP which can move up to 64K at a time with the one instruction, taking 7 clocks per byte. (This can be used to fill blocks of memory too, and interrupts are not forced to wait until the completion of the move or fill.)
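A rough Python model of the block-move behavior described above may help show how the one instruction both copies and fills. The function name `mvn` and the flat `bytearray` memory are illustrative assumptions, not anything from WDC; the model only captures the ascending byte-by-byte copy, the accumulator holding the byte count minus one, and the 7 clocks per byte.

```python
def mvn(mem, src_bank, dst_bank, a, x, y):
    """Model of MVN: copy a+1 bytes, ascending, from src_bank:x to dst_bank:y.

    Returns the clock count at 7 clocks per byte moved.
    """
    count = a + 1                      # A holds the byte count minus one
    for _ in range(count):
        src = (src_bank << 16) | x
        dst = (dst_bank << 16) | y
        mem[dst] = mem[src]
        x = (x + 1) & 0xFFFF           # X and Y are 16-bit and wrap within the bank
        y = (y + 1) & 0xFFFF
    return 7 * count


mem = bytearray(0x20000)               # two 64K banks, all zero (hypothetical system)

# Plain copy: 4 bytes from $00:1000 to $01:2000.
mem[0x1000:0x1004] = b"\x01\x02\x03\x04"
copy_clocks = mvn(mem, 0x00, 0x01, a=3, x=0x1000, y=0x2000)

# Fill: seed one byte, then copy the block onto itself one byte higher,
# so each byte written becomes the source of the next one.
mem[0x3000] = 0xAA
fill_clocks = mvn(mem, 0x00, 0x00, a=254, x=0x3000, y=0x3001)
```

The fill works only because the copy ascends one byte at a time with the destination one byte above the source.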

For more, especially the things that make the '816 better for multitasking, see my post that starts about 2/3 of the way down the page at http://www.6502.org/forum/viewtopic.php?t=50 .

dclxvi · Post by **dclxvi** » Wed Sep 21, 2005 7:40 am

The 65816 is okay. The (NMOS) 6502 and 65C02 are better, cleaner designs, though. Among the positives are the additional features, like the ability to work with 16 bits at a time, new addressing modes (like stack relative), new instructions (like BRL and PER), and the ability to locate the direct page anywhere in bank zero.

Among the negatives are the caveats (as the 65816 datasheet calls them), of which there are several. Most are the result of using new instructions and addressing modes in emulation mode (note that it doesn't emulate either the 6502 or the 65C02 exactly, even when confined only to instructions and addressing modes of the 6502 and 65C02), but even in native mode, there are things like the program counter wrapping on a bank boundary. So, for some things it acts like there's a 24-bit address space, and for others it acts like there's a banked 16-bit address space. From a hardware perspective, multiplexing the bank address on the data bus has had its disadvantages.

kc5tja · Post by **kc5tja** » Thu Sep 22, 2005 2:02 am

dclxvi wrote:

The 65816 is okay.

Okay? Is that it? I feel that it is so overwhelmingly, compellingly better than the 65C02 that I simply refuse to consider using a 6502 at all in any new design, especially when (at the time I purchased mine) there was only a $1 difference between the two.

Quote:

Among the negatives are the caveats (as the 65816 datasheet calls them) of which there are several.

I haven't seen any so-called caveats in any of my datasheets.

Quote:

Most are the result of using new instructions and addressing modes in emulation mode (note that it doesn't emulate either the 6502 or the 65C02 exactly, even when confined only to instructions and addressing modes of the 6502 and 65C02)

As far as I can recall, the 65816 emulates a 65C02 exactly when in emulation mode, judging by what is in my 65816 book here. The 65C02 fixes a number of well-known bugs in the NMOS 6502, so I hate to say that it is indeed the original NMOS 6502 that is the odd one out. In fact, the modern 65C02, 65C802, and 65C816 cores are identical, so far as I've been told, with the only real difference being options set at the time of chip bonding.

Quote:

there are things like the program counter wrapping on a bank boundary. So, for some things it acts like there's a 24-bit address space, and for others it acts like there's a banked 16-bit address space.

Actually, it always acts like there is a banked 16-bit address space. Always. The 24-bit long-address instructions are included as a programmer convenience, and really are the only instructions that can linearly address all 16MB. Fortunately, this really isn't a problem.

If you need more than 64K of code in a single program image, you're almost certainly doing something wrong. However, it would not be hard for a compiler to conveniently introduce long jumps to the next bank whenever it approached the end of a code bank. Assuming the best-case scenario, 32764 2-byte opcodes taking 65524 clock cycles, the 6-cycle long jump is not going to introduce any noticeable delay. Also, compilers should be smart enough not to span loops across bank boundaries.
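To put a number on "no noticeable delay", here is a back-of-the-envelope calculation. It takes the cycle figures exactly as quoted in the paragraph above without checking them against the datasheet, and the variable names are only for illustration.

```python
# Figures taken as given from the paragraph above.
straight_line_cycles = 65524   # cycles quoted for one bank of 2-byte opcodes
long_jump_cycles = 6           # cost quoted for the jump into the next bank

# Fraction of the total time spent on the bank-to-bank jump.
overhead = long_jump_cycles / (straight_line_cycles + long_jump_cycles)
overhead_percent = round(overhead * 100, 4)
print(f"long-jump overhead per bank: {overhead_percent}%")
```

On those figures the jump costs on the order of one hundredth of one percent per bank of straight-line code.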

The reason these design decisions were made is simply this: it required the absolute minimal changes to the 6502 core. To achieve the absolute maximum return on investment, you need to minimize your spending, and the best way to do that is to reuse your existing infrastructure as much as possible.

Be prepared for more "wonkiness" with the Terbium. The more I think about how to achieve Terbium's stated goals, the more it looks like my predictions are accurate, such as the use of a 16-bit byte. This is another approach towards expanding the 6502's capabilities with absolute minimum changes to the core. Whenever I try to explain the concept of widening the byte from 8 bits to 16 bits, people's eyes glaze over. This is a pity, because a byte is formally defined simply as the smallest *addressable* unit of memory, and has nothing to do intrinsically with its inherent width.

Quote:

From a hardware perspective, multiplexing the bank address on the data bus has had its disadvantages.

Not disadvantages, but challenges. I agree that it requires more parts if you want to exploit this feature. If you meet the challenges, there is no real difference in operation between a 6502 and a 65816. The bus tenures are exactly the same.

However, nothing says you HAVE to use these features at all. My current Kestrel design, Kestrel 1, currently ignores the bank address byte outright, effectively using the 65816 processor as a glorified 65802. Though, I personally look forward to exploiting the presence of the bank address byte on the data bus in my next Kestrel design.

Memblers · Post by **Memblers** » Thu Sep 22, 2005 3:53 am

I think it's great, even if you treat it like a normal 6502 with 16-bit index regs. The only tricky part for me was switching often between 8 and 16-bit accumulator. With more flexibility comes more ways to make mistakes.

kc5tja · Post by **kc5tja** » Thu Sep 22, 2005 5:32 pm

Memblers wrote:

I think it's great, even if you treat it like a normal 6502 with 16-bit index regs. The only tricky part for me was switching often between 8 and 16-bit accumulator. With more flexibility comes more ways to make mistakes.

Oh, yes, that is so very true. I remember the first time I tried to get Kestrel to boot some code that I'd written, and wondering why it wasn't working. It sure would have helped had I put the CPU into native mode first, eh? ;D

(And then, after that, there was the minor issue of the assembler not reporting an error for an unsupported opcode form which, though documented as a valid alias, the assembler still didn't recognize. In particular, I'm talking about TAS versus TCS. TAS did nothing, and even failed to emit an opcode byte. TCS worked. Go figure.)

GARTHWILSON · Post by **GARTHWILSON** » Thu Sep 22, 2005 8:40 pm

Since the REP and SEP instructions are so cryptic, just define macros ACCUM_16 and ACCUM_8, INDEX_16 and INDEX_8. They've helped me avoid all such trouble.

kc5tja · Post by **kc5tja** » Fri Sep 23, 2005 4:13 am

GARTHWILSON wrote:

Since the REP and SEP instructions are so cryptic, just define macros ACCUM_16 and ACCUM_8, INDEX_16 and INDEX_8. They've helped me avoid all such trouble.

They don't do a whole hell of a lot when you're still in emulation mode.
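A toy model of the width flags makes the point concrete. This is only a sketch: the class `Cpu816` and its method names are invented for illustration, it tracks just the m and x bits plus the carry/emulation swap, and it is in no way a full emulator. The methods named after the macros mirror the ACCUM_16 / INDEX_16 idea from the quoted post.

```python
class Cpu816:
    """Tiny model of the 65816 register-width flags (not a full emulator)."""

    M_FLAG = 0x20   # status bit 5: 1 = 8-bit accumulator
    X_FLAG = 0x10   # status bit 4: 1 = 8-bit index registers

    def __init__(self):
        self.e = 1          # the chip comes out of reset in emulation mode
        self.carry = 0
        self.p = self.M_FLAG | self.X_FLAG

    def _force_widths(self):
        if self.e:          # emulation mode holds both widths at 8 bits
            self.p |= self.M_FLAG | self.X_FLAG

    def rep(self, mask):    # REP #mask: clear the selected status bits
        self.p &= ~mask & 0xFF
        self._force_widths()

    def sep(self, mask):    # SEP #mask: set the selected status bits
        self.p |= mask & 0xFF
        self._force_widths()

    def xce(self):          # XCE: exchange carry with the emulation bit
        self.carry, self.e = self.e, self.carry
        self._force_widths()

    # Named wrappers in the spirit of the ACCUM_16 / INDEX_16 macros.
    def accum_16(self):
        self.rep(self.M_FLAG)

    def accum_8(self):
        self.sep(self.M_FLAG)

    def index_16(self):
        self.rep(self.X_FLAG)

    def index_8(self):
        self.sep(self.X_FLAG)

    def accum_bits(self):
        return 8 if self.p & self.M_FLAG else 16


cpu = Cpu816()
cpu.accum_16()
width_in_emulation = cpu.accum_bits()   # macro had no effect in emulation mode

cpu.carry = 0                           # CLC
cpu.xce()                               # XCE: switch to native mode
cpu.accum_16()
width_in_native = cpu.accum_bits()
```

In this model the accumulator only becomes 16 bits wide once the CLC / XCE pair has run, which is the boot-code mistake described earlier in the thread.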

dclxvi · Post by **dclxvi** » Fri Sep 23, 2005 6:19 am

kc5tja wrote:

Okay? Is that it?

Yes. Thanks for asking.

kc5tja wrote:

I haven't seen any so-called caveats in any of my datasheets.

See section 8 (pp. 56-60) of:

http://www.6502.org/documents/datasheet ... b_2004.pdf

and:

http://www.westerndesigncenter.com/wdc/ ... 5c816s.pdf

(Actually, those are the 65802 caveats, but they also pertain to the 65816.) This leads to today's 65816 trivia question:

If e=1 and S=$00 (i.e. $0100) when (i.e. before) the sequence PHB PLB is executed, what is the (24-bit) address that PHB writes to and what is the address that PLB reads from? Bonus points if you can find where WDC's documentation tips you off about this.

kc5tja wrote:

As far as I can recall, the 65816 emulates a 65C02 exactly when in emulation mode, judging by what is in my 65816 book here.

It's very close to, but not exactly like, a 65C02. Here are a few examples. A JMP ($12FF) reads the high byte from $1300, like a 65C02, but takes 5 cycles like a 6502. This is an important difference when you need exact timing. Likewise, an ASL absolute,X takes 7 cycles regardless of whether a page boundary is crossed, like a 6502. An LDA $FFFF,X when X is non-zero reads from bank 1 (assuming B, the DBR, is zero) on a 65816, though it would almost never be found in a 65C02 program. In the realm of undocumented behavior there's:

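```
SED           ; select decimal mode
SEC           ; set carry (no borrow) before subtracting
LDA #$20
SBC #$0F      ; $0F is not valid BCD, so the result is undocumented
```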

The result in the accumulator is $1B on a 65816, same as a 6502 (the 65C02 accumulator result is $0B). Also, only the 65C02, not the 6502 or 65816, takes an extra cycle in decimal mode. Yet on the 65816, the N and Z flags are based on the decimal result, like the 65C02, not the 6502. Of course, I can't fault any differences in undocumented behavior -- making changes there is fair game. It's merely an example of something that functions partly like the 6502, and partly like the 65C02.
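The two accumulator results can be reproduced with a short sketch. The correction sequences below are the per-nibble adjustments commonly described for the NMOS 6502 and the 65C02 in decimal mode; they are an assumption brought in for illustration rather than anything taken from WDC's documentation. The function names are made up, and only the accumulator is modeled, not the flags or cycle counts.

```python
def sbc_decimal_nmos(a, b, carry):
    """Decimal-mode SBC accumulator result, NMOS 6502 style correction."""
    al = (a & 0x0F) - (b & 0x0F) + carry - 1     # low-nibble difference
    if al < 0:
        al = ((al - 0x06) & 0x0F) - 0x10         # adjust nibble, borrow from high
    result = (a & 0xF0) - (b & 0xF0) + al
    if result < 0:
        result -= 0x60                           # high-nibble decimal adjust
    return result & 0xFF


def sbc_decimal_65c02(a, b, carry):
    """Decimal-mode SBC accumulator result, 65C02 style correction."""
    al = (a & 0x0F) - (b & 0x0F) + carry - 1     # only used to detect a nibble borrow
    result = a - b + carry - 1                   # plain binary difference first
    if result < 0:
        result -= 0x60
    if al < 0:
        result -= 0x06
    return result & 0xFF


# The sequence above: carry set, A = $20, operand = $0F.
nmos_result = sbc_decimal_nmos(0x20, 0x0F, 1)
cmos_result = sbc_decimal_65c02(0x20, 0x0F, 1)
print(f"NMOS-style: ${nmos_result:02X}, 65C02-style: ${cmos_result:02X}")
```

With valid BCD operands the two functions agree; they diverge only on inputs like $0F, which is why this counts as undocumented behavior.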

kc5tja wrote:

Actually, it always acts like there is a banked 16-bit address space. Always. The 24-bit long-address instructions are included as a programmer convenience, and really are the only instructions that can linearly address all 16MB.

No, not always. Several other addressing modes can cross bank boundaries. For example, when e=0, m=0, and B=$12 (DBR), an ASL $FFFF (i.e. absolute addressing) performs a 16-bit shift whose low byte is at $12FFFF and whose high byte is at $130000.
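The contrast between data accesses and instruction fetches can be written as two small address calculations. This is a simplified sketch with made-up function names; it covers only absolute and absolute-indexed data addressing and the program counter increment, as discussed in this thread.

```python
def data_address(dbr, addr16, offset=0):
    """Absolute (optionally indexed) data address: carries into the next bank."""
    return ((dbr << 16) + addr16 + offset) & 0xFFFFFF


def next_pc(pbr, pc):
    """Program counter increment: wraps inside the current program bank."""
    return (pbr << 16) | ((pc + 1) & 0xFFFF)


# 16-bit ASL $FFFF with B=$12: low byte, then high byte one address later.
asl_low = data_address(0x12, 0xFFFF)
asl_high = data_address(0x12, 0xFFFF, 1)

# LDA $FFFF,X with B=$00 and X=1 lands in bank 1.
lda_indexed = data_address(0x00, 0xFFFF, 1)

# Instruction fetch after $12:FFFF wraps to $12:0000 rather than $13:0000.
wrapped_pc = next_pc(0x12, 0xFFFF)

print(f"{asl_low:06X} {asl_high:06X} {lda_indexed:06X} {wrapped_pc:06X}")
```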

kc5tja wrote:

If you need more than 64K of code in a single program image, you're almost certainly doing something wrong.

I agree that in almost all cases program code should and will fit in 64k. Using a large unrolled loop (for, say, fast video or something) does not seem unreasonable, yet it's something that could make the code larger than 64k.

kc5tja wrote:

However, it would not be hard for a compiler to conveniently introduce long jumps to the next bank whenever it approached the end of a code bank.

This is true if the code is intended for a fixed location (or at least when the lower 16 bits of its location are fixed). On a 65C02, if the final location isn't known at compile time, relocating a program at load time is fairly straightforward. The program can be located anywhere with enough available memory, unless, for example, exact timing is needed and page boundary crossings must be avoided. However, such programs are the exception, not the common case. On a 65816, bank boundary crossings have to be considered for every program at some point, at compile time, at load time, or what have you.
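As a minimal sketch of the load-time case, a loader could reject any candidate address that would leave the image straddling a bank boundary. The helper names and the candidate addresses are hypothetical; a real loader would also have to track free memory.

```python
BANK_SIZE = 0x10000


def crosses_bank(load_address, length):
    """True if `length` bytes loaded at a 24-bit address spill into the next bank."""
    offset = load_address & 0xFFFF
    return offset + length > BANK_SIZE


def first_fit(candidates, length):
    """Return the first candidate load address that keeps the image in one bank."""
    for address in candidates:
        if not crosses_bank(address, length):
            return address
    return None


# An 8K image: $01:F000 would run past the end of bank 1, $02:0000 is fine.
chosen = first_fit([0x01F000, 0x020000], 0x2000)
```

A 65C02 loader needs no such check for ordinary programs, which is the extra bookkeeping being pointed out here.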

Is there really a benefit to having the program counter wrap on a bank boundary? Sure, it saves a little silicon because the 65816 does not have to increment (or, when branching backward, decrement) K (the PBR), but to me this is just creating an idiosyncrasy that you have to program around.

kc5tja wrote:

Not disadvantages, but challenges. I agree that it requires more parts if you want to exploit this feature. If you meet the challenges, there is no real difference in operation between a 6502 and a 65816. The bus tenures are exactly the same.

One of the early advantages of the 6502 and 65C02 was that they didn't use the data bus for half of the clock cycle. You lose this with a 65816 due to the multiplexed data bus. The only thing that multiplexing the data bus really gives you is that it allows the 65816 to have a standard (40-pin) DIP package. Imagine if the PLCC version was the standard 52-pin package and used the extra pins for the bank address rather than multiplexing it on the data bus. That would make it an extremely attractive choice even in hobbyist environments where DIPs are usually far more convenient than PLCCs. "Challenges" isn't quite a strong enough term, IMO.

All in all, I think the 65816's advantages outweigh the disadvantages (yes, I actually like the thing), but I was anticipating gushing responses to the original post, and I thought I'd play devil's advocate a bit in the interest of balance.