Posted: Mon Jun 22, 2009 4:01 pm
For VBR's original question (what small delta would you make to the 65C02?), I'd start with the '816's B accumulator, and then I'd take some other goodies from the '816. I'd prefer separate opcodes for 16-bit operations, if there were any; I don't much like the '816's modes.
Relocatable code would be good, so I'd throw in BSR for a relative subroutine call, and I'd allow 16-bit offsets for that and for branches. Stack relative sounds good too. Where this fits in the opcode map I don't know: perhaps I'd allow myself prefix bytes.
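To make the BSR idea concrete, here is a rough Python sketch of how such an instruction might compute its target. The encoding is purely my assumption, not anything that exists on the 65C02 or '816 under this name: one opcode byte followed by a 16-bit little-endian signed offset, taken relative to the address of the next instruction (as the existing 8-bit branches do), with the return address pushed the way JSR pushes it. The names sign_extend_16, bsr_target and bsr are hypothetical.

```python
def sign_extend_16(value):
    """Interpret a 16-bit value as a two's-complement signed offset."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def bsr_target(pc, offset_lo, offset_hi):
    """Target of a hypothetical 3-byte BSR located at address pc.

    Assumption: the offset is relative to the next instruction (pc + 3),
    mirroring how the 6502's 8-bit branches are calculated.
    """
    offset = sign_extend_16(offset_lo | (offset_hi << 8))
    return (pc + 3 + offset) & 0xFFFF


def bsr(pc, sp, memory):
    """Execute the hypothetical BSR at pc and return the new (pc, sp).

    Like JSR, it pushes the address of the instruction's last byte,
    high byte first, onto the page-1 stack.
    """
    target = bsr_target(pc, memory[(pc + 1) & 0xFFFF], memory[(pc + 2) & 0xFFFF])
    return_addr = (pc + 2) & 0xFFFF
    memory[0x0100 + sp] = return_addr >> 8
    sp = (sp - 1) & 0xFF
    memory[0x0100 + sp] = return_addr & 0xFF
    sp = (sp - 1) & 0xFF
    return target, sp
```

The point for relocation is that the same three bytes give the same displacement wherever the code is loaded: bsr_target(0x2000, 0x00, 0x10) and bsr_target(0x8000, 0x00, 0x10) both land 0x1003 bytes past the BSR itself.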
I'd probably want to sneak in an 8x8 multiply, if I could afford it.
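For reference, this is roughly the work an 8x8 multiply instruction would replace: a shift-and-add loop, modelled here in Python. I'm assuming unsigned operands and a 16-bit result; the name mul8x8 is made up, and this says nothing about how the hardware would actually do it.

```python
def mul8x8(a, b):
    """Unsigned 8x8 -> 16-bit multiply by shift-and-add.

    One conditional add per multiplier bit, which is the loop a
    software routine has to run when there is no multiply instruction.
    """
    a &= 0xFF
    b &= 0xFF
    product = 0
    for bit in range(8):
        if b & (1 << bit):
            product += a << bit  # add the multiplicand, shifted into place
    return product & 0xFFFF
```

The largest possible product, 255 x 255 = 65025, still fits in 16 bits, so the result could be returned in a pair of 8-bit registers.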
Considering Garth's larger-scale suggestions, I'd want to focus on what the implementation might be. If it's to fit a processor into a 40-pin or 44-pin programmable part, then memory will be external and there's not much room for a wider address or data bus. If it's to be embedded inside a huge FPGA, memory might be internal and efficient use of it would be important, but pin count wouldn't matter.
If you're not thinking of some particular implementation, a paper sketch or an emulation has no limits to what you can throw in. Arguably one of the ARM's original weak spots for implementation was such a rich instruction set that the register file needs an immoderate number of ports. (In fact thinking of the ARM as a reinvented 32-bit 6502 isn't a bad model, if you look at the history.)
The 6502 should be just about implementable in a cheap part which can be put on a breadboard, so I'd go for that first and then add some minor tweaks.
I'd probably throw out some compatibility and try for fewer unused bus cycles, on the guess that memory access will limit performance. A wider data bus could improve performance, but you'd need a careful approach for I/O devices. I'd like to consider a 16-bit data bus with byte enables: faithful to byte-addressable memory, but able to do twice as much per access in the best case. (But have I got the extra pins?) A single RAM chip is attractive; otherwise, why not a 32-bit data bus, and go for an 84-pin part?
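Here is a small Python model of what I mean by byte enables, to show where the "twice as much in the best case" comes from. It splits a byte-addressed transfer into 16-bit bus accesses. The lane assignment (even address on the low byte) and the name bus_accesses are my assumptions for illustration.

```python
def bus_accesses(addr, length):
    """Split a byte-addressed transfer into 16-bit bus accesses.

    Returns a list of (word_address, low_enable, high_enable) tuples.
    Assumption: the even byte of each word sits on the low data lane.
    """
    accesses = []
    end = addr + length
    while addr < end:
        word = addr & ~1                      # address of the containing 16-bit word
        low = (addr & 1) == 0                 # transfer starts on the even byte
        high = (not low) or (addr + 1 < end)  # odd byte is also wanted
        accesses.append((word, low, high))
        addr = word + 2
    return accesses
```

An aligned two-byte fetch at 0x1000 is a single access with both enables active. The same fetch at 0x1001 takes two accesses with one enable each, which is no better than an 8-bit bus. A one-byte I/O register write asserts only one enable, so byte-wide devices still see byte-wide accesses.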
Overall, as a matter of personal preference, I'm thinking of small-scale, single-CPLD incremental improvements rather than a 32-bit re-architecting.