Pipelined 6502

BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Re: Pipelined 6502

Post by BigEd »

Is it fast enough for what?! Typically you'd have an application in mind, and you'd iterate on your HDL and your synthesis tactics (maybe your placement tactics) until you get there.

123MHz isn't too bad, in some general sense - it's a useful speed. To know whether it's impressive, I'd need to know what a PicoBlaze or other CPU would synth to - or indeed, Arlet's core - on this choice of chip with this speed grade.

One worrying thing to note: you have a clock period just over 8ns but you need some signal to arrive 9ns before the clock - so in reality your speed is lower, as you'd assume that signal is ultimately also controlled by the clock. In the best case, it becomes the speed limit. (In the worst case, it comes from something else with a substantial delay. You can see, if two people make designs and one produces a signal with 8ns delay and another uses that signal with an 8ns setup time, the actual period achievable will be something like 16ns. And they both thought they'd done a good job!)
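To put numbers on that, here is a minimal Python sketch of the arithmetic. The function names are made up for illustration, and lumping everything into one launch delay plus one setup time is a simplifying assumption; a real timing report also accounts for clock skew, routing and clock uncertainty.

```python
def min_period_ns(launch_delay_ns, setup_ns):
    """Smallest clock period for a path launched by one clock edge that
    must arrive setup_ns before the next edge (skew and jitter ignored)."""
    return launch_delay_ns + setup_ns


def fmax_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


# The two-designer example: 8 ns output delay feeding an 8 ns setup time.
combined = min_period_ns(8.0, 8.0)
print(combined, "ns ->", fmax_mhz(combined), "MHz")

# A 9 ns setup requirement caps the clock even if the source adds no delay.
print(round(fmax_mhz(min_period_ns(0.0, 9.0)), 1), "MHz at best")
```

So the 9ns setup requirement alone would hold the design to roughly 111MHz, below the 123MHz estimate, and any delay in whatever produces that signal lowers it further.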

It's a worthwhile skill to dig into the timing reports and be able to make sense of them.
Arlet
Posts: 2353
Joined: 16 Nov 2010
Location: Gouda, The Netherlands

Re: Pipelined 6502

Post by Arlet »

I'm not sure what stage in the synthesis this is, but for the most accurate result you should set a timing constraint by specifying a clock frequency yourself, and then see if the tools can meet that. If so, you can increase the number until you run into problems.

Without a constraint, you get an estimate, but this isn't always realistic, since the tools don't know what you want, so they can't perform a good area/speed optimization for instance.

It's also a good idea to check the timing analyzer output to see if there are easy improvements. In some designs there may be just a couple of long paths blocking higher speeds, and some local optimizations may yield a good improvement. It's however not always easy to understand the timing analyzer output, and map it back to the source code, so you may want to skip this part for now.
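The "raise the constraint until it fails" approach described above can be sketched as a simple loop. This is only an illustration in Python: `meets_timing` is a hypothetical callback standing in for a complete implementation run with the given clock constraint, and `fake_run` is a made-up stand-in so the sketch can execute. Real tools are not guaranteed to behave this monotonically, so treat the result as a rough figure.

```python
def highest_passing_mhz(meets_timing, start_mhz, step_mhz, limit_mhz):
    """Raise the clock constraint in fixed steps and return the last
    frequency that still met timing, or None if even the first one failed."""
    best = None
    freq = start_mhz
    while freq <= limit_mhz and meets_timing(freq):
        best = freq
        freq += step_mhz
    return best


# Hypothetical stand-in for a real run: pretend the design closes timing
# up to 123 MHz. Replace this with something that drives your own tools.
def fake_run(freq_mhz):
    return freq_mhz <= 123


print(highest_passing_mhz(fake_run, start_mhz=50, step_mhz=10, limit_mhz=300))
```

With a 10MHz step the sketch stops at 120MHz; a smaller step (or a binary search between the last pass and the first fail) would narrow it down further, at the cost of more runs.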
manili
Posts: 31
Joined: 07 Oct 2016

Re: Pipelined 6502

Post by manili »

BigEd wrote:
Is it fast enough for what?! Typically you'd have an application in mind, and you'd iterate on your HDL and your synthesis tactics (maybe your placement tactics) until you get there.
Is there any article or book I could read to learn these things by doing? I learned to write Verilog, but not how the physical side really works. I know the theory (for example, I've passed a course on VLSI design), but things become completely different when you're dealing with the real world.
manili
Posts: 31
Joined: 07 Oct 2016

Re: Pipelined 6502

Post by manili »

Arlet wrote:
I'm not sure what stage in the synthesis this is, but for the most accurate result you should set a timing constraint by specifying a clock frequency yourself, and then see if the tools can meet that. If so, you can increase the number until you run into problems.

Without a constraint, you get an estimate, but this isn't always realistic, since the tools don't know what you want, so they can't perform a good area/speed optimization for instance.

It's also a good idea to check the timing analyzer output to see if there are easy improvements. In some designs there may be just a couple of long paths blocking higher speeds, and some local optimizations may yield a good improvement. It's however not always easy to understand the timing analyzer output, and map it back to the source code, so you may want to skip this part for now.
Do you mean post-implementation simulation?
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Re: Pipelined 6502

Post by BigEd »

I just did a quick search for "xilinx how to read timing report" and it looks like it would lead to some good reading.
manili
Posts: 31
Joined: 07 Oct 2016

Re: Pipelined 6502

Post by manili »

BigEd wrote:
I just did a quick search for "xilinx how to read timing report" and it looks like it would lead to some good reading.
Thanks a lot Ed. I'll do it ASAP.
Arlet
Posts: 2353
Joined: 16 Nov 2010
Location: Gouda, The Netherlands

Re: Pipelined 6502

Post by Arlet »

In my version of ISE it's called "post place & route static timing". But if you enter a timing constraint (basically just the frequency of your input clock), and then generate the design, it should give you an error if it can't meet the timing.

In the menu, there's a Timing Analyzer tool for details.

Note that the timing also depends on the pin constraints and other resources used in the FPGA. A small design in an otherwise empty FPGA can be optimized a lot more than a large design that uses a lot of routing resources, or forces long signals across the chip.

Of course, if you just want to make a ballpark comparison, you don't need to worry about all these details, but you should be aware that the results may be imprecise.
manili
Posts: 31
Joined: 07 Oct 2016

Re: Pipelined 6502

Post by manili »

Arlet wrote:
In my version of ISE it's called "post place & route static timing". But if you enter a timing constraint (basically just the frequency of your input clock), and then generate the design, it should give you an error if it can't meet the timing.

In the menu, there's a Timing Analyzer tool for details.

Note that the timing also depends on the pin constraints and other resources used in the FPGA. A small design in an otherwise empty FPGA can be optimized a lot more than a large design that uses a lot of routing resources, or forces long signals across the chip.

Of course, if you just want to make a ballpark comparison, you don't need to worry about all these details, but you should be aware that the results may be imprecise.
Good points, I'll keep them in mind. Thanks a lot.
manili
Posts: 31
Joined: 07 Oct 2016

Re: Pipelined 6502

Post by manili »

Hello guys, it's me again :mrgreen:
This is a big day for me, because I'm releasing some of my source code in public. I hope you guys will forgive the delay between opening this topic and releasing the code. This is the pipelined 6502 source code.
Well ... good news first:

1. The code is completely synthesizable (I did post-synthesis simulation with the Vivado tools successfully, but I don't have an FPGA to test the code on.)
2. I did some comparisons with the native 6502 model (I took Arlet's model as the reference), and it seems the new processor can outperform it by up to 40% in some cases. (But I still need to study this claim much, much more.)

Now bad news:

1. I had to change the architecture, so I got rid of the caches.
2. It has not passed any test suites yet, so please help me with that. :oops: :(
3. The memory that works with this CPU is very special: it has 4 asynchronous read ports and 1 synchronous write port. I think this will make the processor very slow (what do you think, guys?)

BTW you can think of this code as a working MODEL, not as a REAL 6502 (not yet!!!).
Looking forward to reading your comments.
Thank you all for encouraging me to finish this hard project.

M. A. Nili

P.S. I wrote the Verilog HDL code in the TextWrangler editor on macOS, so I don't know what will happen if you open the source in other editors!
P.S. Again, sorry for my bad English! :mrgreen:
Dr Jefyll
Posts: 3526
Joined: 11 Dec 2009
Location: Ontario, Canada

Re: Pipelined 6502

Post by Dr Jefyll »

hi, manili. Nice to see you're persevering with this very challenging project.
Schematic.png
I admit I don't understand your diagram (above). Can you choose a typical instruction (such as LDA abs, perhaps) and describe how it would execute? (Your English is just fine, BTW! :) )

I know that there are several pipeline stages, of course. I guess what I'm missing is exactly how the instructions overlap.

I took the diagram below from another thread, so it's not 100% appropriate. But it does show four pipeline stages, with multiple instructions making their way through. Do your plans specify this kind of detail -- how instructions will overlap -- or is that part yet to be determined?

cheers,
Jeff
pipeline_illustration.gif
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
Rob Finch
Posts: 465
Joined: 29 Dec 2002
Location: Canada

Re: Pipelined 6502

Post by Rob Finch »

Hi Manili,

The 6502 address space is small enough that it doesn’t really need caches; the RAM in a PLD can be used directly. But if external RAM is used, retaining the caches would be a good idea. The long-term goals for the project might dictate inclusion or exclusion of caches.

On a clock-by-clock comparison the core may be faster, but the more complex core is also likely to have a lower fmax. Does the 40% figure take the fmax comparison into account?

I think you are correct about bad news point #3. Having a five-port RAM to interface to, along with bypassing logic, is bound to make the design larger and slower. Even if it were possible to use asynchronous RAM directly, some loss in fmax is to be expected compared to a simpler design.

The real memory resources in many PLDs (FPGAs) are synchronous, so more pipelining complexity must be added to make use of them in a single cycle. Using two clock cycles to access the RAM would defeat the purpose of an overlapped-pipelined design.
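One way to fold fmax into the 40% question is to scale the per-clock gain by the ratio of the two clock rates. Below is a minimal Python sketch, assuming "up to 40%" means 1.4x the work per clock. The function names and the 100 MHz and 70 MHz figures are illustrative placeholders, not measurements of either core on the same device.

```python
def wall_clock_speedup(per_clock_speedup, fmax_new_mhz, fmax_ref_mhz):
    """Overall speedup once the difference in clock rate is included."""
    return per_clock_speedup * fmax_new_mhz / fmax_ref_mhz


def break_even_fmax_mhz(per_clock_speedup, fmax_ref_mhz):
    """Clock rate the new core must reach just to match the reference."""
    return fmax_ref_mhz / per_clock_speedup


# Placeholder numbers: 1.4x per clock, reference at 100 MHz, new core at 70 MHz.
print(round(wall_clock_speedup(1.4, 70.0, 100.0), 2))
print(round(break_even_fmax_mhz(1.4, 100.0), 1))
```

With those placeholders the per-clock advantage is almost exactly cancelled by the slower clock; the new core would need a little over 71 MHz to break even.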
manili
Posts: 31
Joined: 07 Oct 2016

Re: Pipelined 6502

Post by manili »

@Dr Jefyll
Thanks for your reply. I'm not sure I understood your question. Does the following example help?

One of my tests during this project was a Fibonacci series generation test. Here is the Verilog code:

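		MEM[32814] = `LDY_IME;    // $802E: LDY #$07   (loop counter)
		MEM[32815] = 8'h07;
		MEM[32816] = `LDA_IME;    // $8030: LDA #$00
		MEM[32817] = 8'h00;
		MEM[32818] = `STA_ABS;    // $8032: STA $0003  (previous term = 0)
		MEM[32819] = 8'h03;
		MEM[32820] = 8'h00;
		MEM[32821] = `LDA_IME;    // $8035: LDA #$01   (current term = 1)
		MEM[32822] = 8'h01;
		MEM[32823] = `TAX;        // $8037: TAX        (loop start: save current term)
		MEM[32824] = `ADC_ABS;    // $8038: ADC $0003  (A = current + previous)
		MEM[32825] = 8'h03;
		MEM[32826] = 8'h00;
		MEM[32827] = `STX_ABS;    // $803B: STX $0003  (previous = saved term)
		MEM[32828] = 8'h03;
		MEM[32829] = 8'h00;
		MEM[32830] = `DEY;        // $803E: DEY
		MEM[32831] = `BNE;        // $803F: BNE $8037  (offset $F6 = -10, back to TAX)
		MEM[32832] = 8'hF6;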
Now this is how the processor behaves during the execution of the program.
Fibonacci_Stages.png
Note that I used the assembly mnemonics in the picture (not the Verilog macros).
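For anyone who wants to check what the program above should compute without opening the picture, here is a small Python sketch that steps through the same 19 bytes. It is only a reference model for these eight instructions, not part of the released core. It assumes the Verilog macros expand to the standard 6502 opcodes, that the carry flag starts clear (the program has no CLC), and that decimal mode is off.

```python
# Standard 6502 opcodes; that the testbench macros expand to these values
# is an assumption made for this sketch.
LDY_IMM, LDA_IMM, STA_ABS, TAX = 0xA0, 0xA9, 0x8D, 0xAA
ADC_ABS, STX_ABS, DEY, BNE = 0x6D, 0x8E, 0x88, 0xD0

PROGRAM = [LDY_IMM, 0x07, LDA_IMM, 0x00, STA_ABS, 0x03, 0x00,
           LDA_IMM, 0x01, TAX, ADC_ABS, 0x03, 0x00,
           STX_ABS, 0x03, 0x00, DEY, BNE, 0xF6]
START = 32814  # 0x802E, the first MEM[] index in the listing


def run():
    mem = bytearray(65536)
    mem[START:START + len(PROGRAM)] = bytes(PROGRAM)
    a = x = y = 0
    carry = 0        # assumed clear at start
    zero = False     # Z flag, tested by BNE
    pc = START
    end = START + len(PROGRAM)
    sums = []        # value of A after each ADC

    def operand16(at):
        # Little-endian 16-bit operand following an opcode.
        return mem[at] | (mem[at + 1] << 8)

    while pc < end:
        op = mem[pc]
        if op == LDY_IMM:
            y = mem[pc + 1]
            zero = (y == 0)
            pc += 2
        elif op == LDA_IMM:
            a = mem[pc + 1]
            zero = (a == 0)
            pc += 2
        elif op == STA_ABS:
            mem[operand16(pc + 1)] = a
            pc += 3
        elif op == STX_ABS:
            mem[operand16(pc + 1)] = x
            pc += 3
        elif op == TAX:
            x = a
            zero = (x == 0)
            pc += 1
        elif op == ADC_ABS:
            total = a + mem[operand16(pc + 1)] + carry
            carry = total >> 8
            a = total & 0xFF
            zero = (a == 0)
            sums.append(a)
            pc += 3
        elif op == DEY:
            y = (y - 1) & 0xFF
            zero = (y == 0)
            pc += 1
        elif op == BNE:
            offset = mem[pc + 1]
            if offset >= 0x80:          # sign-extend the 8-bit offset
                offset -= 0x100
            pc += 2
            if not zero:
                pc += offset
        else:
            raise ValueError("unexpected opcode %#04x at %#06x" % (op, pc))
    return sums, a, x, mem[3]


sums, a, x, previous = run()
print(sums, a, x, previous)
```

Under those assumptions the loop runs seven times, A ends at 21 and both X and location $0003 end at 13, which gives something concrete to compare against the register values in a simulation waveform.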
Last edited by manili on Wed Mar 29, 2017 9:30 pm, edited 1 time in total.
manili
Posts: 31
Joined: 07 Oct 2016

Re: Pipelined 6502

Post by manili »

Rob Finch wrote:
Hi Manili,

The 6502 address space is small enough that it doesn’t really need caches; the RAM in a PLD can be used directly. But if external RAM is used, retaining the caches would be a good idea. The long-term goals for the project might dictate inclusion or exclusion of caches.

On a clock-by-clock comparison the core may be faster, but the more complex core is also likely to have a lower fmax. Does the 40% figure take the fmax comparison into account?

I think you are correct about bad news point #3. Having a five-port RAM to interface to, along with bypassing logic, is bound to make the design larger and slower. Even if it were possible to use asynchronous RAM directly, some loss in fmax is to be expected compared to a simpler design.

The real memory resources in many PLDs (FPGAs) are synchronous, so more pipelining complexity must be added to make use of them in a single cycle. Using two clock cycles to access the RAM would defeat the purpose of an overlapped-pipelined design.
Thanks a lot Rob.
Your points are really important. One of my biggest problems is that I don't know how to find the Fmax with the Vivado tools. I set timing constraints, but synthesizing the 'Core' module on its own seems unaffected by them: it passes every timing constraint, which is very odd to me. When I choose the 'System' module as the top module instead, it reaches about 70MHz.
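For reference, a common way to turn a timing summary into an Fmax estimate is to subtract the worst negative slack (WNS) from the constrained clock period and invert the result. A minimal Python sketch of that arithmetic follows; the function name and the example numbers are hypothetical, and the estimate is only meaningful if the clock constraint is really being applied to the paths in the design.

```python
def fmax_from_wns_mhz(constraint_period_ns, wns_ns):
    """Estimate fmax from the constrained period and the worst negative
    slack. Positive slack means the design could run faster than the
    constraint; negative slack means it misses the constraint."""
    achieved_period_ns = constraint_period_ns - wns_ns
    return 1000.0 / achieved_period_ns


# Hypothetical reports for a 10 ns (100 MHz) constraint:
print(round(fmax_from_wns_mhz(10.0, -4.286), 1))  # timing missed
print(round(fmax_from_wns_mhz(10.0, 2.0), 1))     # timing met with margin
```

If the report shows the constraint being met no matter how tight it is, it is worth checking that the report actually lists constrained paths before trusting the number.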

I was thinking of making this unusual memory a very small cache and using an external memory as the main memory, which could make the processor much faster.

You pointed out that "using two clock cycles to access the RAM would defeat the purpose of an overlapped-pipelined design", so do you think the whole project is kinda a waste of time?

Thanks a lot.

manili
Last edited by manili on Wed Mar 29, 2017 9:33 pm, edited 1 time in total.
manili
Posts: 31
Joined: 07 Oct 2016

Re: Pipelined 6502

Post by manili »

@Arlet
Would you tell us the Fmax of your processor, please?
Arlet
Posts: 2353
Joined: 16 Nov 2010
Location: Gouda, The Netherlands

Re: Pipelined 6502

Post by Arlet »

My core synthesized for 100 MHz without effort, including internal memory and peripherals, on a Spartan 6. I didn't attempt to push it harder at the time.

But it depends a lot on the device and synthesis tools, and things attached to the core. So for a fair comparison it's best if you synthesize it yourself, using the same settings as for your own core.