All times are UTC
 Post subject: Re: Pipelined 6502
PostPosted: Tue Oct 25, 2016 5:47 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Is it fast enough for what?! Typically you'd have an application in mind, and you'd iterate on your HDL and your synthesis tactics (maybe your placement tactics) until you get there.

123MHz isn't too bad, in some general sense - it's a useful speed. To know whether it's impressive, I'd need to know what a PicoBlaze or other CPU would synth to - or indeed, Arlet's core - on this choice of chip with this speed grade.

One worrying thing to note: you have a clock period just over 8ns but you need some signal to arrive 9ns before the clock - so in reality your speed is lower, as you'd assume that signal is ultimately also controlled by the clock. In the best case, it becomes the speed limit. (In the worst case, it comes from something else with a substantial delay. You can see, if two people make designs and one produces a signal with 8ns delay and another uses that signal with an 8ns setup time, the actual period achievable will be something like 16ns. And they both thought they'd done a good job!)

It's a worthwhile skill to dig into the timing reports and be able to make sense of them.
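The arithmetic behind the setup-time warning above is easy to check. The sketch below assumes the idealised case where a launch delay and a setup requirement are the only terms (no clock skew, no routing margin); the function names are made up for illustration.

```python
def period_ns(freq_mhz):
    """Clock period in ns for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def fmax_mhz(min_period_ns):
    """Highest clock frequency in MHz for a given minimum period in ns."""
    return 1000.0 / min_period_ns


# The reported 123 MHz corresponds to a period just over 8 ns.
reported_period = period_ns(123.0)

# Best case: the 9 ns setup requirement alone sets the minimum period
# (assumes the signal is launched by the same clock with no extra delay).
best_case = fmax_mhz(9.0)

# Two designers: one produces a signal 8 ns after the clock,
# the other needs it 8 ns before the next clock edge.
chained = fmax_mhz(8.0 + 8.0)

print(f"{reported_period:.2f} ns, {best_case:.1f} MHz, {chained:.1f} MHz")
# prints "8.13 ns, 111.1 MHz, 62.5 MHz"
```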


 Post subject: Re: Pipelined 6502
PostPosted: Tue Oct 25, 2016 5:58 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I'm not sure what stage in the synthesis this is, but for the most accurate result you should set a timing constraint by specifying a clock frequency yourself, and then see if the tools can meet that. If so, you can increase the number until you run into problems.

Without a constraint, you get an estimate, but this isn't always realistic, since the tools don't know what you want, so they can't perform a good area/speed optimization for instance.

It's also a good idea to check the timing analyzer output to see if there are easy improvements. In some designs there may be just a couple of long paths blocking higher speeds, and some local optimizations may yield a good improvement. It's however not always easy to understand the timing analyzer output, and map it back to the source code, so you may want to skip this part for now.
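The "raise the constraint until it fails" procedure described above can be sketched as a simple loop. Here `meets_timing` is a hypothetical stand-in for one complete tool run, faked from an assumed 9 ns critical path; a real flow would run synthesis and place & route each time, and the result is not guaranteed to change smoothly as the constraint is tightened.

```python
def meets_timing(freq_mhz, critical_path_ns=9.0):
    """Hypothetical stand-in for one tool run.

    Pass/fail is faked from an assumed 9 ns critical path.
    """
    return 1000.0 / freq_mhz >= critical_path_ns


def highest_passing_constraint(start_mhz, step_mhz, limit_mhz, run=meets_timing):
    """Step the clock constraint upward and return the last one that passed."""
    best = None
    freq = start_mhz
    while freq <= limit_mhz and run(freq):
        best = freq
        freq += step_mhz
    return best


print(highest_passing_constraint(50, 10, 300))  # prints 110
```

A finer step near the failing point narrows the answer further, at the cost of more tool runs.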


 Post subject: Re: Pipelined 6502
PostPosted: Tue Oct 25, 2016 6:03 pm 
Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
BigEd wrote:
Is it fast enough for what?! Typically you'd have an application in mind, and you'd iterate on your HDL and your synthesis tactics (maybe your placement tactics) until you get there.

Is there any article or book I can read to learn these things by doing? I learned to code Verilog, but not how things really work physically. I know the theory (for example, I've passed a course on VLSI design), but things become completely different when you are dealing with the real world.


 Post subject: Re: Pipelined 6502
PostPosted: Tue Oct 25, 2016 6:04 pm 
Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
Arlet wrote:
I'm not sure what stage in the synthesis this is, but for the most accurate result you should set a timing constraint by specifying a clock frequency yourself, and then see if the tools can meet that. If so, you can increase the number until you run into problems.

Without a constraint, you get an estimate, but this isn't always realistic, since the tools don't know what you want, so they can't perform a good area/speed optimization for instance.

It's also a good idea to check the timing analyzer output to see if there are easy improvements. In some designs there may be just a couple of long paths blocking higher speeds, and some local optimizations may yield a good improvement. It's however not always easy to understand the timing analyzer output, and map it back to the source code, so you may want to skip this part for now.

Do you mean post-implementation simulation?


 Post subject: Re: Pipelined 6502
PostPosted: Tue Oct 25, 2016 6:09 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I just did a quick search for "xilinx how to read timing report" and it looks like it would lead to some good reading.


 Post subject: Re: Pipelined 6502
PostPosted: Tue Oct 25, 2016 6:14 pm 
Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
BigEd wrote:
I just did a quick search for "xilinx how to read timing report" and it looks like it would lead to some good reading.

Thanks a lot Ed. I'll do it ASAP.


 Post subject: Re: Pipelined 6502
PostPosted: Tue Oct 25, 2016 6:14 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
In my version of ISE it's called "post place & route static timing". But if you enter a timing constraint (basically just the frequency of your input clock), and then generate the design, it should give you an error if it can't meet the timing.

In the menu, there's a Timing Analyzer tool for details.

Note that the timing also depends on the pin constraints and other resources used in the FPGA. A small design in an otherwise empty FPGA can be optimized a lot more than a large design that uses a lot of routing resources, or forces long signals across the chip.

Of course, if you just want to make a ballpark comparison, you don't need to worry about all these details, but you should be aware that the results may be imprecise.


 Post subject: Re: Pipelined 6502
PostPosted: Tue Oct 25, 2016 6:18 pm 
Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
Arlet wrote:
In my version of ISE it's called "post place & route static timing". But if you enter a timing constraint (basically just the frequency of your input clock), and then generate the design, it should give you an error if it can't meet the timing.

In the menu, there's a Timing Analyzer tool for details.

Note that the timing also depends on the pin constraints and other resources used in the FPGA. A small design in an otherwise empty FPGA can be optimized a lot more than a large design that uses a lot of routing resources, or forces long signals across the chip.

Of course, if you just want to make a ballpark comparison, you don't need to worry about all these details, but you should be aware that the results may be imprecise.

Good points, I'll keep them in mind. Thanks a lot.


 Post subject: Re: Pipelined 6502
PostPosted: Tue Mar 28, 2017 5:22 am 
Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
Hello guys, it's me again :mrgreen:
This is a big day for me, because I'm releasing some of my source code publicly. I hope you'll forgive the delay between opening this topic and releasing the code. This is the pipelined 6502 source code.
Well... good news first:

1. The code is completely synthesizable. (I ran post-synthesis simulation successfully with the Vivado toolset, but I don't have an FPGA to test the code on.)
2. I did some comparisons with a conventional 6502 model (I took Arlet's model as the reference), and it seems the new processor can sometimes outperform it by up to 40%. (I still need to study this claim much, much more.)

Now the bad news:

1. I had to change the architecture, so I got rid of the caches.
2. I haven't run it against any test suites yet, so please help me do so. :oops: :(
3. The memory this CPU works with is very unusual: it has 4 asynchronous read ports and 1 synchronous write port. I think this will make the processor very slow (what do you think, guys?)

BTW, think of this code as a working MODEL, not as a REAL 6502 (not yet!!!).
Looking forward to reading your comments.
Thank you all for encouraging me to finish this hard project.

M. A. Nili

P.S. I wrote the Verilog HDL code in the TextWrangler editor on macOS, so I don't know how the source will look if you open it in other editors!
P.S. Again, sorry for my bad English! :mrgreen:


 Post subject: Re: Pipelined 6502
PostPosted: Wed Mar 29, 2017 3:43 am 
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
hi, manili. Nice to see you're persevering with this very challenging project.

[Attachment: Schematic.png, 14.26 KiB]

I admit I don't understand your diagram (above). Can you choose a typical instruction (such as LDA abs, perhaps) and describe how it would execute? (Your English is just fine, BTW! :) )

I know that there are several pipeline stages, of course. I guess what I'm missing is exactly how the instructions overlap.

I took the diagram below from another thread, so it's not 100% appropriate. But it does show four pipeline stages, with multiple instructions making their way through. Do your plans specify this kind of detail -- how instructions will overlap -- or is that part yet to be determined?

cheers,
Jeff
[Attachment: pipeline_illustration.gif, 6.28 KiB]

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


 Post subject: Re: Pipelined 6502
PostPosted: Wed Mar 29, 2017 8:16 am 
Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
Hi Manili,

The 6502 address space is small enough that it doesn’t really need caches, the ram in a PLD can be used directly. But if using external ram resources retaining the caches would be a good idea. The long term goals for the project might dictate inclusion / exclusion of caches.

On a clock by clock comparison the core may be faster but there is also likely a lower fmax for the more complex core. Does the 40% faster take into consideration fmax comparisons ?

I think you are correct with bad news #3 point. Having five port ram to interface to along with bypassing logic is bound to make the design larger and slower. Even if it were possible to use asynch ram directly some loss in the fmax is to be expected compared to a simpler design.

Real memory resource in many PLD’s (FPGA) is synchronous hence more pipelining complexity must be added to make use of it in a single cycle. Using two clock cycles to access the RAM would defeat the purpose of an overlapped-pipelined design.
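To make the latency difference concrete, here is a small behavioural model in Python (not HDL, and not taken from the released core). The class names are invented for illustration; the only point is that a registered read delivers its data one clock after the address is presented, which is the extra stage a pipeline has to absorb.

```python
class AsyncReadRam:
    """Combinational read: data is visible in the same cycle as the address."""

    def __init__(self, contents):
        self.mem = list(contents)

    def read(self, addr):
        return self.mem[addr]


class SyncReadRam:
    """Registered read: the address is sampled on a clock edge and the
    data only appears on the output after that edge."""

    def __init__(self, contents):
        self.mem = list(contents)
        self.dout = None  # output register, undefined before the first edge

    def clock(self, addr):
        """Apply one rising edge; return what was visible before the edge."""
        visible = self.dout
        self.dout = self.mem[addr]
        return visible


data = [10, 20, 30, 40]
addresses = [0, 1, 2, 3]
async_ram = AsyncReadRam(data)
sync_ram = SyncReadRam(data)

async_seen = [async_ram.read(a) for a in addresses]
sync_seen = [sync_ram.clock(a) for a in addresses]

print(async_seen)  # prints [10, 20, 30, 40]
print(sync_seen)   # prints [None, 10, 20, 30]
```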

_________________
http://www.finitron.ca


 Post subject: Re: Pipelined 6502
PostPosted: Wed Mar 29, 2017 2:57 pm 
Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
@Dr Jefyll
Thanks for your reply. I'm not sure I understood your question. Does the following example help?

One of my tests during this project was a Fibonacci series generator. Here is the Verilog code that loads it into memory:
Code:
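      MEM[32814] = `LDY_IME;   // $802E: LDY #$07   loop counter
      MEM[32815] = 8'h07;
      MEM[32816] = `LDA_IME;   // $8030: LDA #$00
      MEM[32817] = 8'h00;
      MEM[32818] = `STA_ABS;   // $8032: STA $0003  previous term = 0
      MEM[32819] = 8'h03;      //        address low byte
      MEM[32820] = 8'h00;      //        address high byte
      MEM[32821] = `LDA_IME;   // $8035: LDA #$01   current term = 1
      MEM[32822] = 8'h01;
      MEM[32823] = `TAX;       // $8037: TAX        loop start: save current term
      MEM[32824] = `ADC_ABS;   // $8038: ADC $0003  next = current + previous
      MEM[32825] = 8'h03;
      MEM[32826] = 8'h00;
      MEM[32827] = `STX_ABS;   // $803B: STX $0003  previous = saved current
      MEM[32828] = 8'h03;
      MEM[32829] = 8'h00;
      MEM[32830] = `DEY;       // $803E: DEY
      MEM[32831] = `BNE;       // $803F: BNE $8037  offset $F6 = -10
      MEM[32832] = 8'hF6;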


This is how the processor behaves while executing the program.
[Attachment: Fibonacci_Stages.png, 194.67 KiB]

Note that I used the assembly mnemonics in the picture (not the Verilog macros).
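For anyone who wants to check what the program should leave in the registers when testing the core, here is a plain Python model of the instruction sequence. It is a sketch, not a 6502 emulator: it assumes binary mode and a clear carry flag on entry (the program has no CLC), and the function names are made up for illustration.

```python
def branch_target(branch_addr, offset_byte):
    """Address reached by a taken 6502 relative branch.

    The signed 8-bit offset is relative to the address of the
    instruction following the two-byte branch.
    """
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (branch_addr + 2 + offset) & 0xFFFF


def run_fibonacci(count=7):
    """Model the listed program; returns (A, X, Y, contents of $0003)."""
    carry = 0          # assumption: carry is clear on entry
    y = count          # LDY #$07
    a = 0              # LDA #$00
    mem3 = a           # STA $0003
    a = 1              # LDA #$01
    while True:
        x = a                      # TAX
        total = a + mem3 + carry   # ADC $0003
        carry = total >> 8
        a = total & 0xFF
        mem3 = x                   # STX $0003
        y = (y - 1) & 0xFF         # DEY
        if y == 0:                 # BNE falls through when Y reaches zero
            break
    return a, x, y, mem3


print(hex(branch_target(0x803F, 0xF6)))  # prints 0x8037
print(run_fibonacci())                   # prints (21, 13, 0, 13)
```

The branch lands on the TAX at $8037, so the two setup loads run only once and the loop body executes seven times.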


Last edited by manili on Wed Mar 29, 2017 9:30 pm, edited 1 time in total.

 Post subject: Re: Pipelined 6502
PostPosted: Wed Mar 29, 2017 3:26 pm 
Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
Rob Finch wrote:
Hi Manili,

The 6502 address space is small enough that it doesn’t really need caches, the ram in a PLD can be used directly. But if using external ram resources retaining the caches would be a good idea. The long term goals for the project might dictate inclusion / exclusion of caches.

On a clock by clock comparison the core may be faster but there is also likely a lower fmax for the more complex core. Does the 40% faster take into consideration fmax comparisons ?

I think you are correct with bad news #3 point. Having five port ram to interface to along with bypassing logic is bound to make the design larger and slower. Even if it were possible to use asynch ram directly some loss in the fmax is to be expected compared to a simpler design.

Real memory resource in many PLD’s (FPGA) is synchronous hence more pipelining complexity must be added to make use of it in a single cycle. Using two clock cycles to access the RAM would defeat the purpose of an overlapped-pipelined design.


Thanks a lot Rob.
Your points are really important. One of my biggest problems is that I don't know how to find the Fmax with the Vivado toolset. I set timing constraints, but synthesizing the 'Core' module on its own violates none of them: it passes every constraint I set, which seems very odd to me. When I choose the 'System' module as the top module, however, it reaches about 70MHz.
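One common way to turn a timing report into an Fmax estimate is to subtract the worst negative slack (WNS) shown in the timing summary from the constrained clock period. The sketch below uses made-up numbers chosen to land near the 70MHz figure; they are not taken from an actual report.

```python
def achievable_period_ns(constraint_period_ns, wns_ns):
    """Minimum period implied by a constraint and its worst slack.

    Negative slack (a failing constraint) lengthens the period,
    positive slack shortens it.
    """
    return constraint_period_ns - wns_ns


def fmax_from_wns(constraint_period_ns, wns_ns):
    """Estimated Fmax in MHz from the constraint period and WNS, both in ns."""
    return 1000.0 / achievable_period_ns(constraint_period_ns, wns_ns)


# Hypothetical: a 10 ns (100 MHz) constraint that fails by 4.286 ns.
print(round(fmax_from_wns(10.0, -4.286), 1))  # prints 70.0

# Hypothetical: the same constraint met with 2 ns to spare.
print(round(fmax_from_wns(10.0, 2.0), 1))     # prints 125.0
```

With positive slack the figure is only an estimate, because the tools stop optimizing once the constraint is met; tightening the constraint and rerunning gives a more trustworthy number.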

I was thinking of making this unusual memory a very small cache and using an external memory as the main memory. That could make the processor much faster.

You pointed out that "using two clock cycles to access the RAM would defeat the purpose of an overlapped-pipelined design", so do you think the whole project is kind of a waste of time?

Thanks a lot.

manili


Last edited by manili on Wed Mar 29, 2017 9:33 pm, edited 1 time in total.

 Post subject: Re: Pipelined 6502
PostPosted: Wed Mar 29, 2017 9:31 pm 
Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
@Arlet
Would you tell us the Fmax of your processor, please?


 Post subject: Re: Pipelined 6502
PostPosted: Thu Mar 30, 2017 6:10 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
My core synthesized for 100 MHz without effort, including internal memory and peripherals, on a Spartan 6. I didn't attempt to push it harder at the time.

But it depends a lot on the device and synthesis tools, and things attached to the core. So for a fair comparison it's best if you synthesize it yourself, using the same settings as for your own core.
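To see why the same-settings comparison matters, a per-clock gain has to be scaled by the ratio of clock rates. The sketch below plugs in figures quoted in this thread (up to 40% per clock, about 70MHz, 100 MHz) purely as an illustration: they come from different devices and tool runs, so the result is not a real comparison.

```python
def net_speedup(per_clock_speedup, fmax_new_mhz, fmax_ref_mhz):
    """Overall speedup once the difference in clock rate is included."""
    return per_clock_speedup * fmax_new_mhz / fmax_ref_mhz


def break_even_fmax(per_clock_speedup, fmax_ref_mhz):
    """Clock rate the new core needs just to match the reference."""
    return fmax_ref_mhz / per_clock_speedup


# Illustrative only: 1.40x per clock, 70 MHz against a 100 MHz reference.
print(round(net_speedup(1.40, 70.0, 100.0), 2))  # prints 0.98
print(round(break_even_fmax(1.40, 100.0), 1))    # prints 71.4
```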

