I just registered, but I have some ideas as well, many of them borrowed from RISC processors. I am not proposing a RISC processor, however, because that's not what a 65xxx is anyway. Some may think my ideas don't retain the spirit of the 65xxx processors, but I think they definitely do, while improving on it at the same time. This is not a finished plan and probably contains some errors or stupidities. I don't propose anything like deep pipelining, out-of-order execution or large caches that make desktop processors so complicated nowadays, as this processor should fit in an FPGA. Constructive criticism is expected and welcome.
Here goes:
- First of all, the processor would be 32-bit internally as far as registers are concerned, but the data bus would be 16 bits for ease of implementation (and fewer pins would be required as well). I will explain "ease of implementation" shortly.
- There would be two accumulators, A and B, as in the 6809. There would be four index registers, X, Y, Z and SP. Everything 32-bit, of course. I believe this would improve support for high-level languages and also make machine language programming easier. The address space should be flat and any banking is to be avoided.
- All instructions would be either 16-bit or 32-bit (maybe 48 bits in some cases) in length, with a 16-bit opcode and possibly a 16-bit (or 32-bit) data word. This combined with a 16-bit data bus would ensure there wouldn't be unaligned instructions, ever. Large constants could be put in a table or loaded with several instructions (LDA.W #HIWORD; ASLA #16; ORA.W #LOWORD) and index registers could be used for address calculations, but to ease machine language programming some 32-bit absolute/immediate instructions could be provided. None of those instructions should be needed in principle though.
- There should be 8-bit, 16-bit and 32-bit load/store instructions with separate opcodes. This also applies to instructions between an accumulator and a memory location.
- There would be ADQ (ADd Quick) that would replace INC/DEC and fit in 16 bits, like in 680x0. The range could be +/-8, covering all common indexing cases and being more powerful than double INC/DEC as proposed earlier.
- As there are 65536 opcodes available and maybe about 256 needed, there could be optional conditional execution for every instruction, like in ARM. This could be used to eliminate branches over a few instructions and would be zero-cost. There should still be enough space to encode shift counts etc. in a 16-bit opcode.
- Fast divide/multiply instructions would be provided if feasible. Floating point wouldn't be supported, except maybe by a separate co-processor.
- There could be a few IRQ vectors/lines as well, to make interrupts faster.
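To make the opcode budget and the constant-loading trick a bit more concrete, here is a rough Python sketch. The field layout (4 condition bits, 8 operation bits, 4 operand bits) is purely an assumption for illustration, as are all the function names; nothing above fixes an actual encoding. The ADQ field is taken as 4-bit two's complement, which gives -8..+7 rather than a symmetric +/-8, and ORA.W is assumed to zero-extend its 16-bit immediate.

```python
# Hypothetical 16-bit opcode layout (an assumption, not a fixed design):
#   bits 15-12  condition code (0 = always), ARM-style
#   bits 11-4   base operation (256 possible)
#   bits 3-0    small operand, e.g. an ADQ amount

MASK32 = 0xFFFFFFFF


def encode(cond, op, operand=0):
    """Pack the three fields into one 16-bit opcode word."""
    if not (0 <= cond < 16 and 0 <= op < 256 and 0 <= operand < 16):
        raise ValueError("field out of range")
    return (cond << 12) | (op << 4) | operand


def decode(word):
    """Split a 16-bit opcode word back into (cond, op, operand)."""
    return (word >> 12) & 0xF, (word >> 4) & 0xFF, word & 0xF


def adq_field(value):
    """Encode an ADQ amount in -8..+7 as a 4-bit two's-complement field."""
    if not -8 <= value <= 7:
        raise ValueError("ADQ amount out of range")
    return value & 0xF


def adq_value(field):
    """Sign-extend a 4-bit ADQ field back to a signed integer."""
    return field - 16 if field & 0x8 else field


def load_const32(hiword, loword):
    """Mimic LDA.W #HIWORD; ASLA #16; ORA.W #LOWORD on a 32-bit accumulator."""
    a = hiword & 0xFFFF          # LDA.W #HIWORD
    a = (a << 16) & MASK32       # ASLA #16, truncated to 32 bits
    a |= loword & 0xFFFF         # ORA.W #LOWORD (immediate zero-extended)
    return a
```

For example, `load_const32(0x1234, 0x5678)` builds 0x12345678 in three 16/32-bit instructions, and `encode(1, 0x42, adq_field(-3))` packs a conditional ADQ into a single 16-bit word. With this particular split only 4 operand bits remain, so a full 0..31 shift count would need a different division of the fields.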
Putting on the asbestos suit..