@randyhyde, I hadn't seen this post when, a few months ago, I posted my own thoughts on an improved 65xx at
viewtopic.php?f=1&t=7223. I read through your 65000 design and found quite a lot of great ideas that I've never seen in all my years coding obscure CPUs. It succeeds at being very 6502-esque.
My thoughts were more along the lines of "What might Apple have done if they had control over the 65xx chip design in 1978?" with the constraints that changes had to be incremental, possible with the technology of the day, and as much as possible, backward compatible.
My 652402
https://github.com/lunarmobiscuit/verilog-65C2402-fsm demonstrates one possible simple step up in capability, changing only the address bus to 24 bits without touching A, X, Y, or S. Imagine how much simpler the ][e or C128 would have been to code with a flat 128K memory space.
My 652424
https://github.com/lunarmobiscuit/verilog-65C2424-fsm then demonstrates another way to grow the register widths, without throwing out the existing opcodes, without any new modes, without even using that unused bit in the status register.
That same design could then grow to be the 653232 or even 654848, although at some point it gets silly to keep an 8-bit bus, and once you grow that, you might as well switch to 16-, 24-, or 32-bit opcodes and transcode for backward compatibility.
Finally, on your website your design doesn't optimize for threads, but I looked at that too. The 6524T8
https://github.com/lunarmobiscuit/verilog-65C24T8-fsm demonstrates a 65C02 with 8 sets of registers, including 8 PCs, with just a few new opcodes to make multi-threading instant and easy. This, of course, gets less and less practical as the registers get larger. However, combined with your zero-page-is-special concept, if the thread id is exported on pins, then each thread could have its own unique zero page and thus its own set of 256 registers.
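To make the per-thread zero page idea concrete, here is a rough Python model of what the external address decoding might look like. This is only a sketch under my own assumptions, not how the 6524T8 Verilog works: the names BankedMemory, NUM_THREADS, and ZERO_PAGE_SIZE are made up for illustration, and I assume the thread id is a 3-bit value on pins that external logic uses to pick one of eight private 256-byte RAM banks whenever the CPU address falls in $00-$FF. Everything above zero page stays shared.

```python
NUM_THREADS = 8        # assumption: 3 thread-id pins, so 8 hardware threads
ZERO_PAGE_SIZE = 256   # zero page covers CPU addresses $00-$FF


class BankedMemory:
    """Hypothetical memory model: one private zero page per thread, shared RAM elsewhere."""

    def __init__(self, size=1 << 16):
        # Shared main memory (kept at 64K here to keep the sketch small).
        self.main = bytearray(size)
        # One private 256-byte bank per thread, selected by the thread id.
        self.zero_pages = [bytearray(ZERO_PAGE_SIZE) for _ in range(NUM_THREADS)]

    def _select(self, thread_id, address):
        """Return the backing store that this (thread id, address) pair decodes to."""
        if not 0 <= thread_id < NUM_THREADS:
            raise ValueError("thread id out of range")
        if not 0 <= address < len(self.main):
            raise ValueError("address out of range")
        if address < ZERO_PAGE_SIZE:
            return self.zero_pages[thread_id]
        return self.main

    def read(self, thread_id, address):
        return self._select(thread_id, address)[address]

    def write(self, thread_id, address, value):
        self._select(thread_id, address)[address] = value & 0xFF


mem = BankedMemory()
mem.write(0, 0x10, 0x42)    # thread 0 stores to its own $10
mem.write(3, 0x10, 0x99)    # thread 3 stores to its own $10
mem.write(0, 0x0200, 0x07)  # $0200 is outside zero page, so it is shared
```

With this decoding, the same zero-page opcode touches different physical bytes depending on which thread is running, so a thread switch needs no save or restore of those 256 "registers".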
Anyhow... I thought you might find some of my ideas useful in your work.