Socgen project

jt_eaton · Post by **jt_eaton** » Sun Feb 17, 2013 5:09 pm

Quote:

If so, if the 65Org16.b core runs on a Xilinx Spartan 6 FPGA at 100 MHz, how fast could it run on an ASIC?

Today's processes can boast clock speeds into the multi-gigahertz range, but that requires custom designs and significant pipelining. If you can run your clock at 2 GHz but need a 20-deep pipeline to meet timing, then each instruction still takes 10 ns from start to finish, the same as it would at 100 MHz. The speed comes from overlap: as long as you do not branch and don't have to wait for one instruction to finish before starting the next, each following instruction will finish 500 ps after the one before it.
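The latency versus throughput arithmetic above can be checked with a few lines of Python. This is a simplified sketch that assumes an ideal pipeline: one stage per clock, no branches, no stalls, and a new instruction entering every cycle. The function names are made up for illustration.

```python
def clock_period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a clock frequency given in GHz."""
    return 1000.0 / freq_ghz


def pipeline_timing(freq_ghz: float, depth: int, n_instructions: int):
    """Return (latency of one instruction, total time for n instructions) in ps.

    Ideal pipeline assumed: one stage per clock, no branches, no stalls.
    """
    period = clock_period_ps(freq_ghz)
    latency = depth * period                        # start to finish for one instruction
    total = (depth + n_instructions - 1) * period   # later ones finish one period apart
    return latency, total


latency, total = pipeline_timing(freq_ghz=2.0, depth=20, n_instructions=1000)
print(latency)         # 10000.0 ps, i.e. 10 ns per instruction
print(1e6 / latency)   # 100.0, the equivalent rate in MHz for a single instruction
print(total / 1000)    # 509.5 ps average per instruction over the whole run
```

Over a long branch-free run the average approaches the 500 ps clock period, while any single instruction still takes the full 10 ns.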

Optimizing an FPGA is different from optimizing an ASIC. With an ASIC you can put a small amount of logic between flip-flops with no problem. An FPGA has a fixed amount of logic in each LUT. You can use it as a single inverter if you like, but you will get your best utilization if you use as much of each LUT as possible.
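A rough illustration of why LUT packing matters, assuming 6-input LUTs as in the Spartan 6 and a wide AND/OR reduction built as a tree of LUTs. This is a simplified model (the function name is hypothetical) that ignores carry chains, the dedicated wide-function multiplexers and routing.

```python
def lut_tree(n_inputs: int, k: int = 6):
    """Estimate (LUT count, logic levels) for a wide AND/OR of n_inputs signals.

    Simplified model: each k-input LUT absorbs up to k signals and its output
    feeds the next level of the tree.
    """
    if n_inputs < 1:
        raise ValueError("need at least one input")
    luts = levels = 0
    n = n_inputs
    while True:
        groups = -(-n // k)   # ceiling division: LUTs needed at this level
        luts += groups
        levels += 1
        if groups == 1:
            return luts, levels
        n = groups            # partial results become the next level's inputs


print(lut_tree(1))    # (1, 1): a lone inverter still costs a whole LUT
print(lut_tree(6))    # (1, 1): six inputs cost the same single LUT
print(lut_tree(36))   # (7, 2)
```

The inverter and the 6-input function cost the same LUT and the same delay, which is why FPGA designs favor packing more logic between registers than an ASIC design would.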

The key to optimizing the 6502 is dealing with the bus bottleneck. Routing all memory accesses over one bus gives you a lot of flexibility, but it means that everyone has to wait their turn. If you split it up into separate I and D buses, as well as a separate stack and page_zero memory, then it could run a lot faster. But then a lot of legacy code would break.

It would make sense to move the ProgramCounter out to its own ROM space so that it can fetch the next instructions without having to wait for data accesses to complete. But then legacy code that mixed data segments and tables in with the code would break. Systems that copied code into RAM for execution would not run unless you provided a backdoor into the code ROM space, and then you need arbiters and things become complex.

On the other hand, if your application is ROM based and you can split your I and D spaces, then it would make a nice option for a quick and easy speed boost. The only reason we have the one bus is the package limitations of the '70s.
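The effect of splitting the I and D buses can be modelled, very roughly, as a two-stage overlap: on a shared bus every access waits its turn, while with separate buses the next instruction's fetch proceeds during the current instruction's data accesses. The (fetch, data) access counts below follow the NMOS 6502 for LDA zp, STA abs and LDA (zp),Y, ignoring page-crossing penalties. The split-bus model also ignores branches and any dependency of a fetch on earlier data, so its result is optimistic. Function names are illustrative.

```python
def shared_bus_cycles(program):
    """One bus: every fetch and data access takes its own cycle, in turn."""
    return sum(fetch + data for fetch, data in program)


def split_bus_cycles(program):
    """Separate I and D buses: instruction k+1 is fetched while instruction k
    is still doing its data accesses (ideal two-stage overlap, no branches).
    """
    fetch_done = data_done = 0
    for fetch, data in program:
        fetch_done += fetch                             # I bus only carries fetches
        data_done = max(fetch_done, data_done) + data   # needs its own fetch and a free D bus
    return max(fetch_done, data_done)


# (instruction fetch accesses, data accesses) per instruction
program = [
    (2, 1),  # LDA zp:     opcode + operand, one read
    (3, 1),  # STA abs:    opcode + 2-byte address, one write
    (2, 3),  # LDA (zp),Y: opcode + operand, pointer lo, pointer hi, one read
]
print(shared_bus_cycles(program))  # 12
print(split_bus_cycles(program))   # 10
```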

Page_Zero RAM would be an ideal candidate to remove from the bus and embed inside the CPU. First of all, you could double its width so that any pointer access is done in one operation. You would have to write your code so that all 16-bit pointers were aligned to a word boundary, but for any new application this would be possible, and code would run a lot faster.
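A minimal sketch of the access-count difference, assuming the same 256 bytes of zero page are held as 128 16-bit words and that pointers must start on an even address. ByteZeroPage and WordZeroPage are hypothetical names; the only thing measured is how many memory accesses a pointer fetch costs.

```python
class ByteZeroPage:
    """6502-style zero page: 256 bytes on an 8-bit path, one byte per access."""

    def __init__(self):
        self.mem = bytearray(256)
        self.accesses = 0

    def write_pointer(self, zp_addr: int, value: int) -> None:
        self.mem[zp_addr & 0xFF] = value & 0xFF
        self.mem[(zp_addr + 1) & 0xFF] = (value >> 8) & 0xFF

    def read_pointer(self, zp_addr: int) -> int:
        lo = self.mem[zp_addr & 0xFF]
        hi = self.mem[(zp_addr + 1) & 0xFF]   # high byte wraps within page zero
        self.accesses += 2                    # two separate byte reads
        return lo | (hi << 8)


class WordZeroPage:
    """Hypothetical 16-bit-wide zero page: the same 256 bytes held as 128 words."""

    def __init__(self):
        self.words = [0] * 128
        self.accesses = 0

    def _index(self, zp_addr: int) -> int:
        if zp_addr & 1:
            raise ValueError("16-bit pointers must sit on an even (word) address")
        return (zp_addr & 0xFF) >> 1

    def write_pointer(self, zp_addr: int, value: int) -> None:
        self.words[self._index(zp_addr)] = value & 0xFFFF

    def read_pointer(self, zp_addr: int) -> int:
        index = self._index(zp_addr)          # reject misaligned pointers first
        self.accesses += 1                    # whole pointer in a single access
        return self.words[index]


byte_zp, word_zp = ByteZeroPage(), WordZeroPage()
for zp in (byte_zp, word_zp):
    zp.write_pointer(0x20, 0x1234)
    assert zp.read_pointer(0x20) == 0x1234
print(byte_zp.accesses, word_zp.accesses)  # 2 1
```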

Arlet · Post by **Arlet** » Sun Feb 17, 2013 8:24 pm

I'm guessing that a naive 6502 implementation on an ASIC would spend most of the time picking its nose, waiting for the data to come back from memory. The rest of the design has very little logic between registers by ASIC standards.

Dr Jefyll · Post by **Dr Jefyll** » Sun Feb 17, 2013 8:45 pm

Quote:

Page_Zero ram would be an ideal candidate to remove from the bus and embed inside the cpu. First of all you could double its width so that any pointer accesses are done in one operation. You would have to write your code so that all 16 bit pointers were aligned to a word boundary but for any new applications this would be possible and would run code a lot faster.

Zero-page on-chip is a remedy (albeit only a partial remedy) for the point Arlet raises. As for allowing only word-aligned access when z-pg pairs are used as pointers, that implies the least-significant bit of the z-pg address in the instruction would always be taken to be zero. I.e., the bit is unused, and could be exploited to implement some additional feature new to the 65xx. Heck, the requirement to word-align has already broken 100% compatibility, so why not!

cheers,
Jeff
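To make the spare bit concrete: with word-aligned pointers in a 256-byte zero page, bits 7..1 of the zero-page operand pick one of 128 pointers and bit 0 carries no address information. The sketch below (hypothetical function name) only separates the two fields; what the spare bit would control is left open, as in the post.

```python
def decode_zp_operand(operand: int):
    """Split an 8-bit zero-page operand when pointers must be word aligned.

    Returns (aligned pointer address, spare bit). Bits 7..1 select the pointer;
    bit 0 is free for some new, as yet undefined, feature.
    """
    operand &= 0xFF
    pointer_addr = operand & 0xFE   # the low bit is forced to zero for addressing
    spare_bit = operand & 0x01      # left over, meaning undefined here
    return pointer_addr, spare_bit


print(decode_zp_operand(0x20))  # (32, 0)
print(decode_zp_operand(0x21))  # (32, 1): same pointer, spare bit set
```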

BigEd · Post by **BigEd** » Sun Feb 17, 2013 9:05 pm

You could make a page-zero cache which presents 16 bits to the core: with a line length of 8 or 16 bytes you'd usually get both halves of a pointer even if it isn't aligned. Of course, with page zero so small you might as well implement the whole thing, but as a wide memory, with the same result. Same holds for the stack. And if the icache presents pairs of bytes, you can get single-byte operands in the same cycle as the instruction, usually.
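How often "usually" is can be estimated with a simple model: cache lines aligned to line_bytes, and a pointer whose high byte sits at the next zero-page address, wrapping from $FF to $00 as on the NMOS 6502. Only a pointer whose low byte is the last byte of a line needs two lines. The function names are illustrative.

```python
def pointer_fits_one_line(zp_addr: int, line_bytes: int) -> bool:
    """True if both bytes of a 16-bit pointer at zp_addr fall in one cache line.

    The high byte is taken from (zp_addr + 1) & 0xFF, i.e. it wraps within
    page zero, so a pointer at $FF always straddles two lines.
    """
    lo = zp_addr & 0xFF
    hi = (zp_addr + 1) & 0xFF
    return lo // line_bytes == hi // line_bytes


def fraction_single_line(line_bytes: int) -> float:
    """Fraction of all 256 possible pointer addresses served by a single line."""
    hits = sum(pointer_fits_one_line(a, line_bytes) for a in range(256))
    return hits / 256


print(fraction_single_line(8), fraction_single_line(16))  # 0.875 0.9375
```

With 8-byte lines, 7 out of 8 unaligned pointer positions still deliver both halves in one line lookup; 16-byte lines raise that to 15 out of 16.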

(On the 65Org16 similar arguments apply, except now the direct page and the stack are each 64K words in size: not large for an ASIC to carry on chip, but rather large for an FPGA.)

Cheers
Ed
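For scale, a rough sizing of the 65Org16 figures above. The block RAM geometry is an assumption: a Spartan-6 style 18 Kbit block RAM used as 1K x 16, ignoring the parity bits and any device-specific limits. The helper names are hypothetical.

```python
def memory_kib(words: int, width_bits: int) -> int:
    """Storage in KiB for `words` entries of `width_bits` bits each."""
    return words * width_bits // 8 // 1024


def block_rams_needed(words: int, width_bits: int,
                      bram_depth: int = 1024, bram_width: int = 16) -> int:
    """Blocks needed when each block is assumed to be bram_depth x bram_width."""
    depth_blocks = -(-words // bram_depth)        # ceiling division
    width_blocks = -(-width_bits // bram_width)
    return depth_blocks * width_blocks


direct_page_words = 64 * 1024   # 64K 16-bit words, as in the post
print(memory_kib(direct_page_words, 16))              # 128 KiB for the direct page alone
print(2 * block_rams_needed(direct_page_words, 16))   # 128 blocks for direct page + stack
```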

ElEctric_EyE · Post by **ElEctric_EyE** » Sun Feb 17, 2013 9:07 pm

Dr Jefyll wrote:

Zero-page on-chip...

We already have zero page and stack on-chip with the block RAMs. I think John was saying to embed them inside the CPU itself on a different data bus?

jt_eaton · Post by **jt_eaton** » Sun Feb 17, 2013 10:53 pm

Dr Jefyll wrote:

Zero-page on-chip is a remedy (albeit only a partial remedy) for the point Arlet raises. As for allowing only word-aligned access when z-pg pairs are used as pointers, that implies the least-significant bit of the z-pg address in the instruction would always be taken to be zero. I.e., the bit is unused, and could be exploited to implement some additional feature new to the 65xx.

I was thinking that zero_page doubles from 256 bytes to 256 words (512 bytes) of storage. You could fetch a pointer in one cycle, and you would have 256 pointers in total.

If you only need to use the stack as a stack, then pull it out of page_1 and make it a 16-bit-wide LIFO. Expand the PSW from 8 to 16 bits so that everything aligns, and you can size it with a parameter based on how much stacking your application will need.
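A behavioral sketch of the dedicated stack described above: a 16-bit-wide LIFO whose depth is set by a parameter. WordStack is a hypothetical name, and the 16-bit PSW value is an arbitrary placeholder, since no bit layout is given in the post.

```python
class WordStack:
    """Hypothetical dedicated 16-bit-wide LIFO replacing the page-one stack.

    `depth` stands in for the design parameter: size it for the application.
    """

    def __init__(self, depth: int):
        self.depth = depth
        self.items = []

    def push(self, word: int) -> None:
        if len(self.items) >= self.depth:
            raise OverflowError("stack overflow: increase depth")
        self.items.append(word & 0xFFFF)

    def pull(self) -> int:
        if not self.items:
            raise IndexError("stack underflow")
        return self.items.pop()


stack = WordStack(depth=8)
stack.push(0x1234)   # return address: one 16-bit push instead of two byte pushes
stack.push(0x00A5)   # placeholder 16-bit PSW value, also a single slot
print(hex(stack.pull()), hex(stack.pull()))  # 0xa5 0x1234
```

Because every entry is a full word, a return address and a widened PSW each occupy exactly one slot, so the depth parameter counts nesting levels directly rather than bytes.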
