At least 10 groups have proposed an extension to 32 bits or larger. These include 65k, 65000, 65020, 65GZ032, 65Org32, 65832, 65E4, 65m02 and the separate, tentative steps by Proxy and me to extend 65CE02. It is apparent that many of the proposals have common themes and they typically lean towards TMS9900 in instruction encoding, registers or both. Indeed, we should establish a taxonomy of extension proposals. Ignoring that, your proposal is possibly the most similar to mine. Your proposal differs from mine by emphasising access to zero page. Your proposal also has hints of a simplified 65E4. I believe that your proposal might benefit from the Hudson T flag, although, by graph coloring, there are diminishing returns when a 3-address machine has more than six symmetric registers.
About six months ago, I proposed atomic instructions without much success. After reading the forum archive in full and creating a shockingly similar duplicate (compare the grid processing diagrams from BigEd and me), it is apparent to me that there is a general preference for CSP rather than SMP. This is especially true for the Acorn enthusiasts who have made some exotic heterogeneous systems. Ignoring that, forum experts are disproportionately hardware hackers and are strongly empirical. A sticking point for my own development is the hard deadline of 500ns (seven clock cycles at 14MHz) to push state and begin interrupt execution. You can add as many modes and registers as you like. You can include atomic instructions, integer division or FPU. You can implement with any arbitrary technology. However, if your design incurs more latency or jitter than this, it is inferior to a shipping product and unsuitable for known use cases. It would also be extremely beneficial if any implementation used fewer than 100,000 gates (30 times more than the original) and could be packaged for rugged deployment.
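To make the 500ns figure easy to check against a candidate design, here is a minimal sketch of the arithmetic. The function names are my own and the only inputs taken from above are seven cycles, 14MHz and 500ns. It assumes that interrupt entry cost is a fixed cycle count, which ignores jitter from finishing the current instruction.

```python
def cycles_to_ns(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count at a given clock frequency to nanoseconds."""
    return cycles * 1e9 / clock_hz


def min_clock_hz(cycles: int, deadline_ns: float = 500.0) -> float:
    """Lowest clock frequency at which `cycles` still fits in the deadline."""
    return cycles * 1e9 / deadline_ns


def meets_deadline(cycles: int, clock_hz: float, deadline_ns: float = 500.0) -> bool:
    """True if interrupt entry (hypothetical fixed cycle count) fits the budget."""
    return cycles_to_ns(cycles, clock_hz) <= deadline_ns


if __name__ == "__main__":
    # Figures from the post: seven cycles at 14MHz.
    print(cycles_to_ns(7, 14e6))
    # A hypothetical design which needs 12 cycles to push more state.
    print(min_clock_hz(12) / 1e6, "MHz minimum")
```

A design which pushes more state on interrupt entry has to buy the cycles back with clock frequency, which is where the remarks about FPGA speed become relevant.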
I strongly agree that an Arduino class device is desirable and especially so after recently reading of success with MCL65+. We are both less keen about FPGA. However, if you pursue this path, your design must exceed 50MHz; otherwise it will be inferior to existing choices. Indeed, it is quite easy to make an 8 bit design which exceeds 50MHz. However, the frequency predictably drops as the design grows. Most predictably, distances double or worse when gates quadruple. There is also the under-appreciated problem that binary addition has, at best, O(log2 n) gate depth for n bit registers. Adding more than three numbers may also be a problem. It is for this reason that I recommend avoiding arbitrary segment offsets.
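The O(log2 n) point can be illustrated in software. The sketch below models a Kogge-Stone style parallel prefix carry network, which is one well known way to reach logarithmic depth. It is my choice for illustration, not something taken from your proposal. Each loop iteration stands in for one layer of gates, so the returned level count grows by one each time the register width doubles.

```python
def prefix_add(a: int, b: int, width: int) -> tuple[int, int]:
    """Add two width-bit values with a Kogge-Stone style carry network.

    Returns (sum modulo 2**width, number of prefix levels used).
    Carry-in is assumed to be zero.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    generate = a & b      # bit positions which create a carry
    propagate = a ^ b     # bit positions which pass a carry along
    half_sum = propagate  # sum bits before carries are applied
    levels = 0
    shift = 1
    while shift < width:
        # Combine each group with the group `shift` positions below it.
        generate = (generate | (propagate & (generate << shift))) & mask
        propagate = (propagate & (propagate << shift)) & mask
        shift <<= 1
        levels += 1
    carries = (generate << 1) & mask  # carry into bit i comes from bit i-1
    return (half_sum ^ carries) & mask, levels


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        total, levels = prefix_add((1 << width) - 1, 1, width)
        print(width, "bits:", levels, "levels, wraps to", total)
```

Going from 8 bit to 32 bit registers adds two levels to the carry network before any wiring delay is counted, and a segment offset puts a further addition on the address path.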
I cannot find any significant problems with the technical details of your proposal.
Some people strongly dislike mnemonics of differing lengths but this is easily rectified.
You might want to provide operand sizes which are binary and/or Fibonacci length. Specifically, 0, 1, 2, 3, 4, 5 and 8 bytes.
You might want to scale zero page locations to register width. This would make your design very competitive for running Forth, running Java or rendering PDF. It would also utilize more bits in the instruction stream if zero page locations do not overlap - or at least do not overlap more than historical levels.
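A minimal sketch of what I mean by scaling, assuming a one byte operand and a 4 byte register width. The function names and the `scaled` switch are hypothetical and not part of your proposal.

```python
def zp_cell_address(operand: int, width_bytes: int, scaled: bool) -> int:
    """Byte address of the zero page cell named by a one byte operand.

    scaled=False: the operand is a byte address, as on a stock 6502.
    scaled=True:  the operand is a cell index, multiplied by the register width.
    """
    if not 0 <= operand <= 0xFF:
        raise ValueError("operand must fit in one byte")
    return operand * width_bytes if scaled else operand


def cells_overlap(op_a: int, op_b: int, width_bytes: int, scaled: bool) -> bool:
    """True if two different operands name cells which share at least one byte."""
    addr_a = zp_cell_address(op_a, width_bytes, scaled)
    addr_b = zp_cell_address(op_b, width_bytes, scaled)
    return op_a != op_b and abs(addr_a - addr_b) < width_bytes


if __name__ == "__main__":
    # Unscaled: 32 bit cells at $10 and $11 share three bytes.
    print(cells_overlap(0x10, 0x11, 4, scaled=False))
    # Scaled: the same operands name cells at $40 and $44.
    print(cells_overlap(0x10, 0x11, 4, scaled=True))
    # Scaled: the highest operand reaches byte address $3FC.
    print(hex(zp_cell_address(0xFF, 4, scaled=True)))
```

With scaling, all 256 operand values name distinct cells and the direct page effectively grows to 1KB for 4 byte registers. Without scaling, only one operand value in four is usable without overlap.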
I am impressed that you do not add user registers but instead improve connectivity of existing resources.
I am particularly impressed by the originality of applying UTF-8 style encoding to branch offsets. I considered and rejected BER encoding which is much the same thing but with the extension flags spread over each byte. Each encoding has advantages and disadvantages. Bunched flags may be preferable for software implementation. Spread flags may be preferable for micro-coded implementation. However, bunched flags may also be preferable for more advanced hardware implementations.
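For anyone following along, here is a rough sketch of the two flag layouts for unsigned values up to 16 bits. The bunched form copies the one, two and three byte layouts of UTF-8. The spread form is the base 128 scheme used in BER, most significant group first, with the top bit of each byte meaning that more bytes follow. Your actual branch offset format may differ, and handling of signed offsets is deliberately left out.

```python
def encode_bunched(value: int) -> bytes:
    """UTF-8 style: all length flags sit in the leading bits of the first byte."""
    if not 0 <= value < 0x10000:
        raise ValueError("sketch only covers 0..65535")
    if value < 0x80:
        return bytes([value])
    if value < 0x800:
        return bytes([0xC0 | (value >> 6), 0x80 | (value & 0x3F)])
    return bytes([0xE0 | (value >> 12),
                  0x80 | ((value >> 6) & 0x3F),
                  0x80 | (value & 0x3F)])


def decode_bunched(data: bytes) -> tuple[int, int]:
    """Return (value, bytes consumed). Length is known after the first byte."""
    first = data[0]
    if first < 0x80:
        return first, 1
    if (first & 0xE0) == 0xC0:
        length, value = 2, first & 0x1F
    elif (first & 0xF0) == 0xE0:
        length, value = 3, first & 0x0F
    else:
        raise ValueError("unsupported lead byte")
    if len(data) < length:
        raise ValueError("truncated")
    for byte in data[1:length]:
        value = (value << 6) | (byte & 0x3F)
    return value, length


def encode_spread(value: int) -> bytes:
    """BER style base 128: one continuation flag in the top bit of each byte."""
    if value < 0:
        raise ValueError("unsigned values only")
    groups = [value & 0x7F]            # final byte has its flag clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(groups))


def decode_spread(data: bytes) -> tuple[int, int]:
    """Return (value, bytes consumed). Every byte must be inspected in turn."""
    value = 0
    for used, byte in enumerate(data, start=1):
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, used
    raise ValueError("truncated")


if __name__ == "__main__":
    print(encode_bunched(300).hex(), encode_spread(300).hex())
```

The bunched decoder knows the total length after fetching one byte. The spread decoder only learns it on reaching the last byte, but each step is the same seven bit shift.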
White Flame is the expert for seven bit encodings and may have considered this in much greater detail.
The real power of 6502 is the no prefix, one byte operand, impure load/store architecture where one operand may be sourced from eight or more places, where a load may be overloaded with six ALU functions and all of this may be intermixed with a smattering of irregular instructions. This would appear to make read-modify-write instructions mostly redundant. However, definitely don't remove read-modify-write instructions. Yes, they go against good design. Yes, they're lousy for IPC. However, they are very good for arbitrary precision counters, arbitrary precision bit rotation and bit mixing.
Some people with similar designs suggest removing zp,X. Definitely don't suggest this. If device driver authors don't stab you over the Internet then Forth enthusiasts will. Substitutions are acceptable if they range strictly from neutral to positive. For example, I suggested replacing the absolute addressing mode with abs,Z. Multiple guarantees to clear Z make this 100% downward compatible.
I strongly recommend Arduino for implementation. It is suitable for testing ideas in a concise and accessible manner which may be easily replicated. Efficient representation of 8/16/24/32 bit values will be challenging on all targets. Thankfully, this would be a minor consideration if you start with 600MHz MCL65+ as the basis for your work.