Proxy wrote:
Not sure if it's entirely relevant to the thread, but I've been toying with the idea of having a 65C02 core where the ZP and stack are inside the CPU as 2 separate 256-byte register files, each with 2 read ports and 1 write port.
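As a rough illustration of what two read ports buy, here is a minimal Python model of one such 256-byte register file. The class name `RegFile256`, the read-before-write behaviour within a cycle, and the `read_pointer` helper are hypothetical choices for this sketch, not taken from any existing core. It shows a (zp),Y-style pointer being fetched in a single access, because both pointer bytes come out of the two read ports at once.

```python
class RegFile256:
    """Hypothetical 256-byte register file: 2 read ports, 1 write port."""

    def __init__(self):
        self.mem = bytearray(256)

    def cycle(self, raddr_a, raddr_b, waddr=None, wdata=0):
        # Both read ports return the contents from before this cycle's write
        # (read-before-write is an assumption of this sketch).
        a = self.mem[raddr_a & 0xFF]
        b = self.mem[raddr_b & 0xFF]
        # At most one write per cycle: there is only one write port.
        if waddr is not None:
            self.mem[waddr & 0xFF] = wdata & 0xFF
        return a, b


def read_pointer(zp, zp_addr):
    """Fetch a 16-bit little-endian pointer from zero page in one cycle.

    The high-byte address wraps within zero page, as on the 6502/65C02.
    """
    lo, hi = zp.cycle(zp_addr, (zp_addr + 1) & 0xFF)
    return lo | (hi << 8)


zp = RegFile256()
zp.cycle(0, 0, waddr=0x80, wdata=0x34)  # writing the pointer takes two cycles...
zp.cycle(0, 0, waddr=0x81, wdata=0x12)
print(hex(read_pointer(zp, 0x80)))      # ...but reading it back takes one
```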
See here (http://forum.6502.org/viewtopic.php?t=6284) for a description of a core that takes that to the extreme. It uses block RAM for everything, duplicates RAM as needed (all copies are written to at the same time), and reads whole instructions rather than a byte at a time.
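The RAM-duplication trick can be modelled in a few lines. This is a minimal sketch, assuming each copy stands in for one simple dual-port block RAM (one write port, one read port); `ReplicatedRAM` is a hypothetical name. Every write goes to all copies at once, so the copies never diverge, and each copy's read port serves an independent reader.

```python
class ReplicatedRAM:
    """N identical copies of a RAM, giving N independent read ports."""

    def __init__(self, size, read_ports):
        self.size = size
        self.copies = [bytearray(size) for _ in range(read_ports)]

    def write(self, addr, value):
        # The single logical write port drives every copy in the same cycle.
        for copy in self.copies:
            copy[addr % self.size] = value & 0xFF

    def read_cycle(self, addrs):
        # One read per copy per cycle.
        if len(addrs) > len(self.copies):
            raise ValueError("more reads than read ports")
        return [copy[a % self.size] for copy, a in zip(self.copies, addrs)]


ram = ReplicatedRAM(size=0x10000, read_ports=3)
for i, byte in enumerate([0xAD, 0x34, 0x12]):  # LDA $1234: opcode + operand
    ram.write(0x0200 + i, byte)
# All three instruction bytes are read in a single cycle.
print([hex(b) for b in ram.read_cycle([0x0200, 0x0201, 0x0202])])
```

The price is capacity: three read ports cost three times the block RAM for the same address space.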
Proxy wrote:
Though any instruction that writes a 16-bit word (like JSR) would still need 2 cycles, as building a register file with 2 write ports is a massive pain and not worth it.
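To make the cycle cost concrete, here is a minimal sketch of JSR's stack pushes and RTS's pulls against a stack register file with one write port and two read ports. The function names are hypothetical. The byte order follows the 65C02: JSR pushes the high byte of the return address first, and the pushed address is that of the JSR's own last byte, which RTS increments.

```python
def jsr_push(stack, sp, return_addr):
    """Push a 16-bit return address through a single write port.

    Returns the new stack pointer and the number of write cycles used.
    """
    cycles = 0
    for byte in ((return_addr >> 8) & 0xFF, return_addr & 0xFF):  # high byte first
        stack[sp] = byte            # one write port: one byte per cycle
        sp = (sp - 1) & 0xFF
        cycles += 1
    return sp, cycles


def rts_pull(stack, sp):
    """Pull both bytes at once using the two read ports (one cycle)."""
    lo = stack[(sp + 1) & 0xFF]
    hi = stack[(sp + 2) & 0xFF]
    return (((hi << 8) | lo) + 1) & 0xFFFF, (sp + 2) & 0xFF, 1


stack = bytearray(256)
sp, write_cycles = jsr_push(stack, 0xFF, 0x1236)  # JSR at $1234 pushes $1236
pc, sp, read_cycles = rts_pull(stack, sp)
print(hex(pc), hex(sp), write_cycles, read_cycles)  # 0x1237 0xff 2 1
```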
Also, the internal ZP and stack would only be accessible through their respective instructions, so instruction fetching and absolute addressing modes would only access external memory. (Though I guess such accesses could be redirected to the internal memory just by checking the high byte of the address, it wouldn't give you any speed benefit.)
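The high-byte check could look like the following minimal sketch of an address decoder. The function and target names are hypothetical; it simply sends page $00 to the internal zero-page file, page $01 to the internal stack file, and everything else to the external bus.

```python
def route(addr):
    """Decide which memory services a 16-bit address (hypothetical decoder)."""
    addr &= 0xFFFF
    page = addr >> 8            # the high byte is all the decoder needs
    if page == 0x00:
        return "zp", addr & 0xFF
    if page == 0x01:
        return "stack", addr & 0xFF
    return "external", addr


for a in (0x00AA, 0x01FF, 0x0200):
    print(hex(a), route(a))
```

Because the decision depends only on address bits 15..8, in hardware it reduces to a compare on the high byte.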
That only works if it's acceptable for locations 0x00aa (absolute) and 0xaa (zero page) to become two different locations, and the same goes for stack locations. In the general case, it's not acceptable: stack locations in particular are often manipulated with absolute addressing.
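The aliasing problem is easy to demonstrate. In this minimal sketch (class and method names hypothetical), a zero-page store goes to the internal file, while an absolute load of the same address goes to external RAM unless a page-$00 redirect is enabled. Without the redirect, the absolute load returns a stale value.

```python
class SplitMemory:
    """Internal zero page plus external RAM, optionally with a page-$00 redirect."""

    def __init__(self, redirect):
        self.zp = bytearray(256)
        self.external = bytearray(0x10000)
        self.redirect = redirect

    def store_zp(self, zp_addr, value):    # e.g. STA $AA
        self.zp[zp_addr & 0xFF] = value & 0xFF

    def load_abs(self, addr):              # e.g. LDA $00AA
        addr &= 0xFFFF
        if self.redirect and addr < 0x100:
            return self.zp[addr]
        return self.external[addr]


for redirect in (False, True):
    mem = SplitMemory(redirect)
    mem.store_zp(0xAA, 0x42)
    print("redirect" if redirect else "no redirect", hex(mem.load_abs(0x00AA)))
```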
Proxy wrote:
Another idea: speed is one thing, but what about width? Why not make the data bus wider, 8086 style? I don't think I've really seen any 65C02/816 cores that have a full 16-bit data bus without massively changing the instruction set. A 16-bit-wide, fully compatible 65C02 would on average cut the number of cycles required to fetch instructions in half, and any aligned 16-bit accesses would also be twice as fast.
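A quick way to check the "half the fetch cycles" estimate is to count bus cycles for a sample instruction stream. This is a minimal sketch, assuming the bus can only transfer naturally aligned words; the instruction stream is made up for illustration and ignores branches. It compares fetching each instruction separately against streaming sequential code through a prefetch queue.

```python
def fetch_cycles(addr, length, width):
    """Bus cycles to read `length` bytes at `addr` over a bus `width` bytes
    wide, assuming only naturally aligned words can be transferred."""
    offset = addr % width                              # unused bytes in the first word
    return (offset + length + width - 1) // width      # round up to whole words


# Hypothetical instruction stream: (address, length in bytes)
stream = [(0x8000, 1), (0x8001, 2), (0x8003, 3), (0x8006, 2), (0x8008, 3), (0x800B, 1)]

per_insn_8 = sum(fetch_cycles(a, n, 1) for a, n in stream)
per_insn_16 = sum(fetch_cycles(a, n, 2) for a, n in stream)
# With a prefetch queue, straight-line code is read as one contiguous run.
total_bytes = sum(n for _, n in stream)
prefetch_16 = fetch_cycles(stream[0][0], total_bytes, 2)
print(per_insn_8, per_insn_16, prefetch_16)  # 12 9 6
```

In this sample the full halving only appears with sequential prefetching; fetching each instruction on its own loses some cycles to odd start addresses. The same formula covers data: `fetch_cycles(0x1234, 2, 2)` is 1 cycle for an aligned 16-bit access, while `fetch_cycles(0x1235, 2, 2)` is 2.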
But if driving a full 16-bit data bus is too much (maybe it drags Fmax down too far, or it just takes too many I/O pins or PCB traces), then just keep it internal.
Have the FPGA's block RAM be 16, 32, or even 64 bits wide, with the CPU built to take full advantage of that width, and when accessing external memory split each access into 8-bit pieces (at the cost of speed), so that externally you can just plug it into an existing 65C02 system.
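A minimal sketch of the "split it into 8-bit pieces" bridge, assuming a little-endian byte layout (as the 65C02 uses) and plain callables standing in for the external bus; the function names are hypothetical. A 32-bit internal access becomes four external byte cycles, which is where the speed cost comes from.

```python
def wide_read(ext_read, addr, width):
    """Serve a `width`-byte internal read over an 8-bit external bus.

    Bytes are assembled little-endian. Returns (value, external cycles used).
    """
    value = 0
    for i in range(width):
        value |= ext_read((addr + i) & 0xFFFF) << (8 * i)
    return value, width


def wide_write(ext_write, addr, value, width):
    """Split a `width`-byte internal write into single-byte external writes."""
    for i in range(width):
        ext_write((addr + i) & 0xFFFF, (value >> (8 * i)) & 0xFF)
    return width


ext = bytearray(0x10000)  # stands in for an existing 65C02 system's memory
cycles_w = wide_write(ext.__setitem__, 0x4000, 0x04030201, 4)
value, cycles_r = wide_read(ext.__getitem__, 0x4000, 4)
print(hex(value), cycles_w, cycles_r)
```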
See the core mentioned above. It does most of that.