Say, you can do 32-bit wide memory accesses. Why not exploit this in the core? Gets rid of all those pesky byte-wide instruction fetches. Instant speedup. E.g. simply replace every opcode fetch with a 32-bit memory read. Combine the result with leftover bytes from any previous read (shift register? swap between two 32-bit registers and multiplex?). Voilà, opcode fetches have become instruction fetches. Argument bytes instantly available. E.g. LDA abs reduces from 'opcode fetch, low byte fetch, high byte fetch, access' to 'instruction fetch, access'.
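For anyone who wants to poke at the idea before touching HDL, here is a rough behavioural sketch in Python of the "combine with leftover bytes" scheme. It is only a model under assumptions: the class name `WordFetcher` is made up for illustration, the instruction length is passed in by the caller (a real core would have to decode the opcode first to know it), and memory is assumed to be a multiple of 4 bytes long.

```python
class WordFetcher:
    """Hypothetical 32-bit instruction fetcher that keeps leftover bytes."""

    def __init__(self, memory: bytes):
        self.memory = memory      # assumed to be a multiple of 4 bytes long
        self.buffer = bytearray() # bytes fetched but not yet consumed
        self.buffer_addr = 0      # address of buffer[0]
        self.word_reads = 0       # number of 32-bit memory accesses so far

    def fetch(self, pc: int, length: int) -> bytes:
        # A jump or branch makes the leftover bytes useless: start over.
        if pc != self.buffer_addr:
            self.buffer.clear()
            self.buffer_addr = pc
        # Read aligned 32-bit words until the whole instruction is buffered.
        while len(self.buffer) < length:
            next_addr = self.buffer_addr + len(self.buffer)
            word_base = next_addr & ~3
            word = self.memory[word_base:word_base + 4]
            # Keep only the bytes at or after next_addr (unaligned start).
            self.buffer += word[next_addr - word_base:]
            self.word_reads += 1
        insn = bytes(self.buffer[:length])
        del self.buffer[:length]
        self.buffer_addr += length
        return insn


# LDA $1234 / LDA $5678 / NOP / NOP
memory = bytes([0xAD, 0x34, 0x12, 0xAD, 0x78, 0x56, 0xEA, 0xEA])
fetcher = WordFetcher(memory)
first = fetcher.fetch(0, 3)
second = fetcher.fetch(3, 3)
third = fetcher.fetch(6, 1)
print(fetcher.word_reads)  # word reads used for these three instructions
```

In this toy run the three instructions (7 bytes) are covered by two word reads instead of seven byte reads. The model says nothing about clock speed or the extra logic needed, which is exactly where the objections further down the thread come in.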
Who's first?
Sounds like the description of an 80386 on steroids.
As ElEctric_EyE responded, "Do it! And then prove it in a real world system. Make your own board, etc. We all welcome this. See if it is so easy as you think..."
Oh, almost forgot. We want 65C816 native mode compatibility, plus the capability to run NMOS 6502 software, other than undocumented opcodes. How soon can you have it ready for beta?
Do it! And then prove it in a real world system. Make your own board, etc. We all welcome this. See if it is so easy as you think...
Well, one of my own designs uses MichaelM's 65C02 core. I could probably adapt it. But I'd rather the author did it. It's only common sense.
The implementation of the idea is actually quite 'orthogonal' to most core designs I've seen, and relatively simple, compared to what else goes on in there. Maybe if you'd think about it a little first, you'd realize that.
Grumpy Gus? Hardly. What we are trying to say is that if it were as easy as you make it out to be, it would have already been done. Enthusiasm is always beneficial, but isn't a substitute for plain old hard work.
Over the years, there have been various true-32-bit 6502 concepts discussed, and designs that were never brought to market, but I think the truest 6502-like 32-bitter is the 65Org32 discussed at viewtopic.php?f=1&t=1419, and which we would like to work toward. It's basically like a 65816 with 32-bit non-multiplexed address and data buses, and all of its registers being 32-bit (except maybe the status register). Since even the S and DP and bank registers are 32-bit, there are no page or bank boundaries, so these registers just become offsets that different tasks can use, and the entire 4-gigaword address space is available to everything. I'm not sure what that does for those who are interested in things like memory protection, but it's still a true 65-family processor, unlike some other designs I've seen that might power up in a 6502-emulation mode of some kind and then turn into something entirely different. There are plenty of different processors already available to choose from, and I doubt we really need another one; but it would be cool to have a 6502-like one that could handle 32-bit quantities in a single gulp and not have the page and bank boundaries.
ElEctric_EyE is working first on what I understand to be a double-wide NMOS 6502 (although he seems to be on a productive rabbit trail with video at the moment). I've thought about preliminarily emulating the 65Org32 above with a microcontroller (which would require nearly 70 I/O pins), not expecting any performance that's worth a hill of beans from this approach, but to experiment with the instruction set and an assembler and a Forth kernel, and there would probably be others who would like to experiment with multitasking OSs and compilers for other languages.
OK, Windfall's idea is to use 32-bit memory. Fetch 32 bits into a 4-byte buffer, then use it as a cache. However, the problems I see are:
- Why are you using 32-bit memory with an 8 or 16-bit processor anyway?
- First read will cost a cycle (with SRAM), unless a bypass of some kind is implemented.
- Subsequent reads will not be aligned, requiring a delay to read more if instruction does not fit.
- The complexity of this prefetch circuit greatly overshadows any benefit. It is likely to slow down the fetching, not accelerate it.
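To put a number on the alignment point in the list above, here is a small Python sketch that counts how many aligned 32-bit words an instruction touches depending on where it starts. The helper name `words_spanned` is made up for illustration, and the sketch assumes a plain word-aligned memory with no leftover-byte buffering at all.

```python
def words_spanned(addr: int, length: int, word_bytes: int = 4) -> int:
    """Number of aligned words covering bytes addr .. addr+length-1."""
    first_word = addr // word_bytes
    last_word = (addr + length - 1) // word_bytes
    return last_word - first_word + 1


# A 3-byte instruction (e.g. LDA abs) starting at each offset within a word.
reads_by_offset = {offset: words_spanned(offset, 3) for offset in range(4)}
print(reads_by_offset)  # {0: 1, 1: 1, 2: 2, 3: 2}
```

So for a 3-byte instruction, half of the possible start offsets need a second word read before the instruction is complete, unless the missing bytes happen to be left over from an earlier fetch.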
What exactly are you trying to gain here? For classic 6502 emulation, cycle accuracy is needed, obviating any need for this kind of buffering. For fast 6502 cores - well, are there any cores that run faster than byte SRAM? 100MHz is pretty much the limit with homemade FPGA cores, 10ns SRAM is plentiful.
Edit: if you are unlucky enough to use a devboard with 32-bit DRAMs, throw it away and get one with SRAM. A DRAM controller is a horrible thing, often as big as a 6502 core itself. DRAM refresh will often delay reads by tens of cycles. Fetching 32 bits is the least of your problems.
So without being grumpy - bad idea (for 6502-related cores anyway) as it fails to solve a problem that does not exist in the first place.
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut
It would be very challenging to rewrite the classic 6502 core so it would use fewer cycles for the same instructions. That much effort would be better spent on a 16- or 32-bit processor to start with. Take the (ind),Y modes for instance. After fetching the opcode+operand you still need to fetch 2 zero page bytes, and the actual data.
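A quick back-of-the-envelope tally in Python makes the (ind),Y point concrete. The byte-wide figures match the usual 6502 cycle counts for LDA (no page crossing). The "wide" column is a best-case assumption, not a measured result: it pretends the whole instruction always arrives in a single 32-bit fetch while every other access stays byte-wide. The names `Insn`, `byte_wide` and `word_wide` are made up for this sketch.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    name: str
    insn_bytes: int      # opcode + operand bytes
    other_accesses: int  # pointer and data accesses after the instruction


INSNS = [
    Insn("LDA #imm", 2, 0),
    Insn("LDA zp", 2, 1),
    Insn("LDA abs", 3, 1),
    Insn("LDA (zp),Y", 2, 3),  # pointer low, pointer high, data
]


def byte_wide(insn: Insn) -> int:
    """Memory accesses with one byte fetched per access."""
    return insn.insn_bytes + insn.other_accesses


def word_wide(insn: Insn) -> int:
    """Best case: the whole instruction arrives in one 32-bit fetch."""
    return 1 + insn.other_accesses


for insn in INSNS:
    print(f"{insn.name:12} {byte_wide(insn)} -> {word_wide(insn)}")
```

Under those assumptions LDA abs drops from 4 accesses to 2, but LDA (zp),Y only goes from 5 to 4, because the two zero page pointer bytes and the data access are still there.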
Another option, which would be easier to implement, is to use a cache inside the FPGA. This cache could have an 8-bit interface to the 6502 core, but a 16- or 32-bit interface to external memory, reducing the number of memory cycles it would take to read/write a cache line. This would work nicely in combination with SDRAM, and video access. It would increase memory bandwidth, but without complicating the 6502 core itself.
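As a rough illustration of that cache arrangement, here is a read-only behavioural model in Python: byte-wide on the core side, 32-bit on the memory side. Everything about it is an assumption made for the sketch (the name `ByteFrontCache`, direct mapping, one 32-bit word per line, four lines); writes, SDRAM timing and video access are left out entirely.

```python
class ByteFrontCache:
    """Hypothetical direct-mapped cache: 8-bit core side, 32-bit memory side."""

    def __init__(self, backing: bytes, n_lines: int = 4):
        self.backing = backing          # stands in for external memory
        self.n_lines = n_lines
        self.tags = [None] * n_lines    # which word each line currently holds
        self.lines = [b""] * n_lines    # one 4-byte word per line
        self.external_reads = 0         # 32-bit accesses to external memory

    def read_byte(self, addr: int) -> int:
        word_index = addr >> 2
        slot = word_index % self.n_lines
        if self.tags[slot] != word_index:
            # Miss: fill the line with one 32-bit external read.
            base = word_index << 2
            self.lines[slot] = self.backing[base:base + 4]
            self.tags[slot] = word_index
            self.external_reads += 1
        return self.lines[slot][addr & 3]


ram = bytes(range(16))
cache = ByteFrontCache(ram)
values = [cache.read_byte(a) for a in range(16)]
print(cache.external_reads)  # external reads for 16 sequential byte reads
```

For 16 sequential byte reads the model does 4 external word reads, and the core never sees anything but a byte-wide port, which is the appeal of this approach over changing the fetch logic in the core.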
Thanks to Windfall anyway, I for one like the speculation threads that pop up now and then, even when an idea may be impractical - there's always some interesting discussions around as to why, and other ideas are spawned. And as I was reading this thread I got some ideas for my own emulators, not directly related, but when something wakes the mind up it starts churning. Ideas generate other ideas, even distant ones.
Thanks to Windfall anyway, I for one like the speculation threads that pop up now and then, even when an idea may be impractical - there's always some interesting discussions around as to why, and other ideas are spawned.
+1. Incidentally, I was reading a bunch of posts from 10+ years ago, and was impressed with how much this forum has grown in expertise since then.