So the thread about how "yucky" the 65816 (and broccoli) is got me thinking: if the problem with this chip is its complexity, maybe we could use an assembler/compiler to create a simpler, cleaner virtual 16-bit CPU on top of it -- a "65V16". Completely hide emulation mode, get rid of decimal mode, make X and Y full-time 16-bit registers, make the 16-bit C the default, but expose A and B as ("slow") 8-bit registers. Make memory appear as a linear 16 MB of RAM, with the exception of the reserved "Zero Bank" (so program memory starts at $01:0000). No more PBR or DBR! We would end up with something like this:

We give the C register all the modes we have for A (LDC, PHC, STC, TCX, etc.), and keep a few 8-bit instructions for A, plus B, for talking to I/O chips (LDA, STA, PHA; even LDB, STB, PHB). We can keep an "expert mode" for direct entry of pure 65816 code, for instance for interrupt handling (more on that later). Other than that, those "yucky" details of the chip are hidden. No SEP, REP or XCE, for instance: mode switches are handled under the hood. The status register as exposed to the user is shrunk down to N, V, I, Z, and C.
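To make "mode switches are handled under the hood" a bit more concrete, here is a rough Python sketch of the expansion step. Everything in it is hypothetical: the `Expander` class, the mnemonic tables, and the rule that only the translator ever touches the M flag are my assumptions, not an existing tool. It tracks the accumulator width and only emits SEP/REP when the width actually has to change, and it shows why B is "slow": every B access becomes XBA, the real instruction, XBA.

```python
"""Sketch: expand hypothetical 65V16 mnemonics into native 65816 assembly.

Assumptions (not from any real tool): the CPU is already in native mode with
16-bit X/Y, and only this expander ever changes the M flag. Operands are
passed through unchanged as strings.
"""

# Virtual op -> native op executed with a 16-bit accumulator (M = 0).
WIDE_OPS = {"LDC": "LDA", "STC": "STA", "PHC": "PHA", "PLC": "PLA",
            "TCX": "TAX", "TCY": "TAY"}
# Virtual 8-bit ops on A map to the same native op with M = 1.
NARROW_A_OPS = {"LDA": "LDA", "STA": "STA", "PHA": "PHA"}
# Virtual 8-bit ops on B: swap B into A with XBA, operate, swap back.
NARROW_B_OPS = {"LDB": "LDA", "STB": "STA", "PHB": "PHA"}


class Expander:
    def __init__(self):
        self.m8 = False  # tracked M flag; False means 16-bit accumulator
        self.lines = []

    def _set_width(self, eight_bit):
        # Emit SEP/REP only when the tracked width actually changes.
        if eight_bit and not self.m8:
            self.lines.append("SEP #$20")  # set M: 8-bit accumulator
        elif not eight_bit and self.m8:
            self.lines.append("REP #$20")  # clear M: 16-bit accumulator
        self.m8 = eight_bit

    def op(self, mnemonic, operand=""):
        suffix = f" {operand}" if operand else ""
        if mnemonic in WIDE_OPS:
            self._set_width(False)
            self.lines.append(WIDE_OPS[mnemonic] + suffix)
        elif mnemonic in NARROW_A_OPS:
            self._set_width(True)
            self.lines.append(NARROW_A_OPS[mnemonic] + suffix)
        elif mnemonic in NARROW_B_OPS:
            # The "slow" path: three instructions instead of one. Caveat:
            # the final XBA sets N and Z from A, not from the B value.
            self._set_width(True)
            self.lines += ["XBA", NARROW_B_OPS[mnemonic] + suffix, "XBA"]
        else:
            raise ValueError(f"unknown 65V16 mnemonic: {mnemonic}")
        return self


expander = Expander()
expander.op("LDC", "#$1234").op("STC", "$012000")
expander.op("LDA", "$00D000").op("STB", "$00D001")
expander.op("LDC", "$012000")
print("\n".join(expander.lines))
```

A real translator would also need labels and branches: at a branch target the tracked width may differ depending on which path got there, so it would probably have to force a known state (say, 16-bit) at every label.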
Obviously the resulting code would be slower, larger, and less flexible than native 65816 assembler, and we lose the 1:1 correspondence between instructions and machine code. However, it would be a lot easier to work with, and especially to code, which means the programmer saves time understanding, writing, and debugging the code. I'm guessing that for a lot of us hobbyist types, less complexity might be worth a bit of bloat and slower execution: the limiting factor is the time we can spend at the keyboard.
There are some obvious problems with this scheme, starting with the fact that RESET dumps us back in emulation mode. An assembler/compiler would have to add some boilerplate code to make sure we get back to 16-bit land ASAP. Also, I'm not sure how you would go about converting the virtual linear memory addresses to the hardware's segmented (bank) model, but this can't be a new problem. And I'm sure there are other problems I haven't thought of. I don't think this could be handled with a simple single-pass assembler.
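For the RESET boilerplate and the address conversion, something along these lines might be a starting point. Again, the function names and the policy (Zero Bank off-limits for programs, code runs kept inside one bank) are assumptions for illustration. The prologue uses the usual CLC/XCE sequence to leave emulation mode; the address helpers split a 24-bit linear address into bank and offset. One wrinkle worth knowing: the 65816's program counter wraps within the current bank rather than carrying into PBR, so the translator would have to make sure no run of code straddles a bank boundary (for example by ending each bank's code with a long jump).

```python
# Hypothetical helpers for a 65V16 translator (names and policy are assumptions).

ADDRESS_SPACE = 1 << 24  # 24-bit addresses: 16 MB
BANK_SIZE = 1 << 16      # 64 KB per bank
ZERO_BANK = 0x00         # reserved in the 65V16 scheme


def reset_prologue():
    """Native-mode boilerplate to run first after RESET (stack setup omitted)."""
    return [
        "CLC",       # carry = 0 ...
        "XCE",       # ... swapped into E: the CPU is now in native mode
        "CLD",       # decimal mode is hidden in 65V16; clear D defensively
        "REP #$30",  # clear M and X: 16-bit C, 16-bit X and Y
    ]


def linear_to_long(addr):
    """Split a 24-bit linear address into (bank, offset)."""
    if not 0 <= addr < ADDRESS_SPACE:
        raise ValueError(f"address out of range: {addr:#x}")
    return addr >> 16, addr & 0xFFFF


def format_long(addr):
    """Render a linear address in the $BB:HHLL style used in this post."""
    bank, offset = linear_to_long(addr)
    return f"${bank:02X}:{offset:04X}"


def is_program_address(addr):
    """Program memory may not live in the reserved Zero Bank."""
    bank, _ = linear_to_long(addr)
    return bank != ZERO_BANK


def code_fits_in_bank(start, length):
    """True if `length` bytes of code starting at `start` stay in one bank.

    The 65816 PC wraps inside the current bank instead of incrementing PBR,
    so a code run that crosses $xx:FFFF would wrap back to $xx:0000.
    """
    _, offset = linear_to_long(start)
    return offset + length <= BANK_SIZE


print(reset_prologue())
print(format_long(0x010000))            # -> $01:0000
print(code_fits_in_bank(0x01FFF0, 17))  # -> False
```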