We've had various topics on the '816, but here's something I don't remember any of them having addressed. Someone said they wished that instead of the M and X flags, it had separate op codes for 8- versus 16-bit operations, but IIRC, only mentioned loads and stores. That by itself would already require two-byte op codes since the table is full as it is; but the fact is that you'd have to have separate op codes also for any register or memory operation that affects the C, N, or V flags. ROR A for example, even though it does not involve memory, still needs to know whether to rotate the C flag into bit 7 or bit 15, also which bit to reflect in the N flag. INA and DEA, even though they do not involve memory, still need to know whether to operate on 8 or 16 bits, because if the accumulator is in 8-bit mode, it's actually two 8-bit registers, A and B, and you might be expecting data in B to be left undisturbed. PHA, PLA, PHX, PLX, PHY, and PLY (a different kind of load and store) could be made to always be 16 bits, but that would be more wasteful of memory and execution time.
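To make the ROR A and INA cases concrete, here is a rough Python model of the visible behavior only. It is a sketch, not a description of how the silicon does it. The function names `ror_a` and `ina`, and the idea of passing the M flag in as a boolean, are my own for illustration; the 16-bit value stands for the full accumulator with B in the high byte and A in the low byte.

```python
def ror_a(c_reg, carry, m_flag):
    """Model ROR A. c_reg is the full 16-bit accumulator (B:A),
    carry is 0 or 1, m_flag=True means 8-bit accumulator.
    Returns (new_c_reg, new_carry, n_flag, z_flag)."""
    if m_flag:
        a = c_reg & 0xFF
        new_carry = a & 1
        a = (a >> 1) | (carry << 7)        # old carry enters at bit 7
        result = (c_reg & 0xFF00) | a      # B is left undisturbed
        return result, new_carry, (a >> 7) & 1, int(a == 0)
    new_carry = c_reg & 1
    result = ((c_reg & 0xFFFF) >> 1) | (carry << 15)   # old carry enters at bit 15
    return result, new_carry, (result >> 15) & 1, int(result == 0)


def ina(c_reg, m_flag):
    """Model INA. Returns the new full 16-bit accumulator value."""
    if m_flag:
        a = ((c_reg & 0xFF) + 1) & 0xFF    # wraps within A; no carry into B
        return (c_reg & 0xFF00) | a
    return (c_reg + 1) & 0xFFFF


print([hex(v) for v in ror_a(0x1201, 1, True)])    # 8-bit: B stays $12
print([hex(v) for v in ror_a(0x1201, 1, False)])   # 16-bit: all 16 bits rotate
print(hex(ina(0x12FF, True)), hex(ina(0x12FF, False)))
```

The same starting value and the same instruction give different results depending on M, which is why a flagless design would need a separate op code for each width.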
There's more, but these should make the point. I have wished sometimes that Bill Mensch hadn't been required to make it backward compatible with the '02; but I think he did a great job given Apple's requirements.
Regarding the not-quite-linear address space, Samuel Falvo (forum name kc5tja), professional programmer, said the key is to make the banking work for you, not against you. Now that memory (even SRAM) is super cheap compared to what it cost back when the Apple IIGS was introduced, one could extend the 65816 design with 32-bit buses and all 32-bit registers (except maybe the status register which probably doesn't have any use for 32 bits); and then the program and data bank registers and direct-page register simply become offsets into the entire 4-gigaword address space, à la 65org32.
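As a rough illustration of the difference, the sketch below compares how the '816 forms a plain absolute data address (8-bit data bank register supplying the top byte of a 24-bit address) with the offset idea described above. The second function assumes the 32-bit register is simply added to the operand and wraps at 32 bits; that is my reading of the idea, not a quote from any 65org32 document, and both function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ea_816_absolute(dbr, addr16):
    """'816 absolute addressing (no indexing): the data bank register
    supplies bits 16-23 and the operand supplies bits 0-15."""
    return ((dbr & 0xFF) << 16) | (addr16 & 0xFFFF)


def ea_offset32(base, operand):
    """Assumed 32-bit scheme: the bank or direct-page register is a
    full-width offset added to the operand, wrapping at 2**32."""
    return (base + operand) & MASK32


print(hex(ea_816_absolute(0x12, 0x3456)))    # bank boundary is fixed at 64K
print(hex(ea_offset32(0x00012345, 0x10)))    # base can sit at any address
```

With a full-width offset there are no bank boundaries to design around; a program or data block can be based at any address.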
If any of the 32-bit 65-family processor designs ever become reality, I expect Michael Barry's (forum name barrym95838) 65m32 might be the closest to it. His scheme is more efficient than the 65org32's but has less of the familiar 65 flavor.