Re: Socgen project
Posted: Sun Feb 17, 2013 5:09 pm
If so, if the 65Org16.b core runs on a Xilinx Spartan 6 FPGA at 100MHz, how fast could it run on an ASIC?
Today's processes can boast clock speeds into the multi-gigahertz range, but that requires custom designs and significant pipelining. If you can run your clock at 2 GHz but need a 20-deep pipeline to meet timing, then each instruction still takes 20 × 500 ps = 10 ns to complete, the same as running at 100 MHz. The speed comes from overlap: as long as you do not branch and don't have to wait for one instruction to finish before starting the next, the following instruction will finish 500 ps after the first.
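To put numbers on the pipeline point, here is a small Python sketch (my own illustration, assuming an ideal pipeline with no branches, hazards or stalls) that computes the per-instruction latency and the time to finish N back-to-back instructions at 2 GHz with 20 stages. The function names are made up for the example.

```python
"""Pipeline latency vs. throughput, using the numbers from the post.

Assumes an ideal pipeline: one instruction enters per clock, no branches,
no stalls. Real pipelines lose cycles to hazards and mispredictions.
"""


def clock_period_ns(freq_ghz: float) -> float:
    """Clock period in nanoseconds for a clock given in GHz."""
    return 1.0 / freq_ghz


def instruction_latency_ns(freq_ghz: float, depth: int) -> float:
    """Time for one instruction to pass through every pipeline stage."""
    return depth * clock_period_ns(freq_ghz)


def total_time_ns(freq_ghz: float, depth: int, n_instructions: int) -> float:
    """Time to finish n back-to-back instructions with no stalls.

    The first instruction takes `depth` cycles; each later one
    completes one cycle after its predecessor.
    """
    return (depth + n_instructions - 1) * clock_period_ns(freq_ghz)


if __name__ == "__main__":
    f, depth = 2.0, 20  # 2 GHz clock, 20-stage pipeline (from the post)
    lat = instruction_latency_ns(f, depth)
    print(f"latency per instruction: {lat:.1f} ns (= {1000 / lat:.0f} MHz)")
    print(f"gap between completions: {clock_period_ns(f) * 1000:.0f} ps")
    for n in (1, 10, 1000):
        t = total_time_ns(f, depth, n)
        print(f"{n:5d} instructions: {t:8.1f} ns, {n / t * 1000:7.1f} MIPS")
```

With those numbers the latency works out to 10 ns per instruction (the 100 MHz figure), while a long unbroken run approaches one completion every 500 ps. Every branch or stall eats into that.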
Optimizing an FPGA is different from optimizing an ASIC. With an ASIC you can put a small amount of logic between flip-flops with no problem. An FPGA has a fixed amount of logic in each LUT (the Spartan 6 uses 6-input LUTs). You can use a LUT as a single inverter if you like, but you will get your best utilization if you use as much of each LUT as possible.
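A rough way to see the LUT point: model a k-input LUT as nothing more than a 2^k-bit truth table. The sketch below uses hypothetical helper names and a simplified cost model that ignores carry chains, wide muxes and the Spartan 6 dual 5-input LUT mode. It shows that an inverter and a 6-input XOR fill the same single LUT, and compares how many 6-input LUTs versus 2-input gates it takes to reduce N signals with an associative operation such as AND.

```python
"""Toy model of a k-input FPGA LUT (k = 6 on Spartan-6).

A LUT is just a 2**k-entry truth table, so an inverter and an arbitrary
6-input function occupy the same single LUT.
"""
from typing import Callable

K = 6  # inputs per LUT (Spartan-6 style 6-LUT)


def lut_init(fn: Callable[..., int], n_inputs: int, k: int = K) -> int:
    """Return the truth table of `fn` packed into a 2**k-bit integer.

    Inputs beyond `n_inputs` are don't-cares; the function is evaluated
    on the low `n_inputs` bits of each table index.
    """
    if n_inputs > k:
        raise ValueError(f"{n_inputs}-input function does not fit one {k}-LUT")
    init = 0
    for index in range(2 ** k):
        bits = [(index >> i) & 1 for i in range(n_inputs)]
        if fn(*bits) & 1:
            init |= 1 << index
    return init


def lut_eval(init: int, *inputs: int) -> int:
    """Look up one output bit; inputs[0] is the least significant address bit."""
    index = sum((b & 1) << i for i, b in enumerate(inputs))
    return (init >> index) & 1


def luts_for_reduction(n_inputs: int, k: int = K) -> int:
    """Cells needed to reduce n signals to 1 with an associative op (AND, XOR...).

    Each k-input cell consumes up to k signals and produces 1, a net
    reduction of k - 1, so ceil((n - 1) / (k - 1)) cells are required.
    """
    if n_inputs <= 1:
        return 0
    return -(-(n_inputs - 1) // (k - 1))


if __name__ == "__main__":
    inverter = lut_init(lambda a: 1 - a, 1)
    parity6 = lut_init(lambda *b: sum(b) % 2, 6)
    print(f"inverter INIT:    {inverter:#018x}")
    print(f"6-input XOR INIT: {parity6:#018x}")
    for n in (6, 16, 32):
        print(n, "inputs:", luts_for_reduction(n), "6-LUTs vs",
              luts_for_reduction(n, k=2), "2-input gates")
```

Reducing 32 signals takes 7 six-input LUTs but 31 two-input gates, which is why on an FPGA you want to pack as much logic as possible into each LUT, while on an ASIC a thin layer of gates between flip-flops costs you little.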
The key to optimizing the 6502 is dealing with the bus bottleneck. Routing all memory accesses through one bus gives you a lot of flexibility, but it means every access has to wait its turn. If you split it into separate instruction (I) and data (D) buses, plus separate stack and page_zero memories, it could run a lot faster. But then a lot of legacy code would break.
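To illustrate the bus bottleneck, the sketch below counts memory accesses per bus for two stock 6502 instructions. The bus tags (I, D, ZP, S) follow the hypothetical split described above, and the split-bus figure is only an optimistic throughput bound for an ideally pipelined core: it ignores data dependencies within an instruction, internal cycles and the page-crossing penalty.

```python
"""Rough bus-occupancy model for 6502-style instructions.

With one shared bus every access costs its own cycle. With split buses,
accesses on different buses could overlap across pipelined instructions,
so the busiest bus bounds the cycles per instruction. Best case only.
"""
from collections import Counter
from typing import Sequence

# Bus tags (hypothetical split): I = code, D = data, ZP = page zero, S = stack.
# Stock 6502 access sequences, one entry per bus cycle:
LDA_ABS = ["I", "I", "I", "D"]            # opcode, addr lo, addr hi, data
LDA_IND_Y = ["I", "I", "ZP", "ZP", "D"]   # opcode, zp addr, ptr lo, ptr hi, data


def shared_bus_cycles(accesses: Sequence[str]) -> int:
    """One bus: every access takes its own cycle."""
    return len(accesses)


def split_bus_bound(accesses: Sequence[str]) -> int:
    """Split buses, ideal pipelining: the busiest bus sets the bound."""
    counts = Counter(accesses)
    return max(counts.values(), default=0)


if __name__ == "__main__":
    for name, acc in (("LDA abs", LDA_ABS), ("LDA (zp),Y", LDA_IND_Y)):
        print(f"{name:11s} shared: {shared_bus_cycles(acc)} cycles, "
              f"split bound: {split_bus_bound(acc)} cycles")
```

In this model LDA (zp),Y drops from 5 bus cycles to a bound of 2 because its accesses spread across three buses, while LDA abs stays at 3 because it is dominated by instruction fetches.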
It would make sense to give the ProgramCounter its own ROM space so that it can fetch the next instructions without waiting for data accesses to complete. But then legacy code that mixed data segments and tables in with the code would break. Systems that copy code into RAM for execution would not run unless you provided a backdoor into the code ROM space, and then you need arbiters and things become complex.
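The backdoor mentioned above means two masters want the same code memory, so something has to arbitrate. Here is a minimal sketch of a fixed-priority arbiter, assuming (hypothetically) that the data-side backdoor always wins and the instruction fetch stalls; round-robin would be another reasonable choice. Names like `Cycle` and `arbitrate` are just for illustration.

```python
"""Fixed-priority arbiter sketch for a shared code memory (hypothetical design)."""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cycle:
    """Requests for the code memory in one clock cycle."""
    fetch: bool      # instruction fetch wants the next opcode/operand
    backdoor: bool   # data side wants to read/write code memory


def arbitrate(req: Cycle) -> Optional[str]:
    """Fixed priority: the backdoor wins over instruction fetch."""
    if req.backdoor:
        return "backdoor"
    if req.fetch:
        return "fetch"
    return None


def fetch_stalls(trace: List[Cycle]) -> int:
    """Count cycles in which the fetch unit asked but was not granted."""
    return sum(1 for c in trace if c.fetch and arbitrate(c) != "fetch")


if __name__ == "__main__":
    # Fetch every cycle; a copy loop pokes code memory on cycles 2 and 5.
    trace = [Cycle(fetch=True, backdoor=(t in (2, 5))) for t in range(8)]
    print([arbitrate(c) for c in trace])
    print("fetch stalls:", fetch_stalls(trace))
```

Every backdoor access costs a fetch cycle, which is exactly the contention the separate code space was meant to remove.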
On the other hand, if your application is ROM-based and you can split your I and D spaces, this would make a nice option for a quick and easy speed boost. The only reason we have a single bus is the package pin-count limitations of the '70s.
Page_Zero RAM would be an ideal candidate to remove from the bus and embed inside the CPU. First of all, you could double its width so that each pointer access is done in one operation. You would have to write your code so that all 16-bit pointers are aligned to a word boundary, but for new applications this would be possible and would make code run a lot faster.
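Here is a sketch of how a word-wide page zero might behave, assuming the 256 bytes are stored as 128 little-endian 16-bit words (the class and method names are hypothetical). An aligned pointer comes back in one word read; a misaligned one straddles two words and needs two reads. A pointer at $FF wraps its high byte to $00, as it does on a stock 6502.

```python
"""Model of a 16-bit-wide, on-chip page zero (hypothetical design)."""


class WideZeroPage:
    """Page zero as 128 words; word i holds byte 2i (low) and 2i+1 (high)."""

    def __init__(self) -> None:
        self.words = [0] * 128
        self.reads = 0  # number of word reads, to compare access counts

    def write_byte(self, addr: int, value: int) -> None:
        """Store one byte, preserving the other half of its word."""
        word, high = divmod(addr & 0xFF, 2)
        shift = 8 * high
        mask = 0xFF << shift
        self.words[word] = (self.words[word] & ~mask) | ((value & 0xFF) << shift)

    def _read_word(self, index: int) -> int:
        self.reads += 1
        return self.words[index & 0x7F]  # wrap within page zero

    def read_pointer(self, addr: int) -> int:
        """Fetch the 16-bit little-endian pointer whose low byte is at addr."""
        addr &= 0xFF
        if addr % 2 == 0:
            # Aligned: low and high bytes live in the same word, one read.
            return self._read_word(addr // 2)
        # Misaligned: the low byte is the top half of one word and the
        # high byte the bottom half of the next, so two reads are needed.
        lo = self._read_word(addr // 2) >> 8
        hi = self._read_word(addr // 2 + 1) & 0xFF
        return (hi << 8) | lo


if __name__ == "__main__":
    zp = WideZeroPage()
    zp.write_byte(0x10, 0x34)   # aligned pointer at $10
    zp.write_byte(0x11, 0x12)
    zp.write_byte(0x21, 0x78)   # misaligned pointer at $21
    zp.write_byte(0x22, 0x56)
    for addr in (0x10, 0x21):
        before = zp.reads
        ptr = zp.read_pointer(addr)
        print(f"${addr:02X} -> ${ptr:04X} in {zp.reads - before} read(s)")
```

The aligned pointer at $10 comes back in a single read and the misaligned one at $21 takes two, which is the saving the word-alignment rule buys.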