6502.org Forum

All times are UTC




 Post subject: Re: 65ORG16.b Core
Posted: Tue Apr 24, 2012 10:37 am

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
I'm all ears, Ed. Do you (or anyone) have any ideas on how to actually go about this?

BTW, I've finished adding TWX, TWY, TYW, TXW, rearranged 1 or 2 opcodes and finished an update to the .b spec.


Attachments:
File comment: 65016.b spec v2, with updated opcode matrix.
65016bv2.zip [318.13 KiB]

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502
 Post subject: Re: 65ORG16.b Core
Posted: Wed Apr 25, 2012 9:36 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
I constantly run speed tests on this core as I add features to the .b core, and I've tested it on 2 different x86 machines. I'm wondering whether the opcode decoding in the state machine would get faster if I were to replace an 'x' with a known state, especially a '0'.

I ask only because I seem to get 2 different results depending on which machine I run ISE13.4 on when optimizing this core with SmartXplorer. My laptop has a Pentium 4. My main desktop (my main optimizer) has a 4GHz i7 875K dual core.

One run takes about 1 hour and the other about 5 minutes, and the two results are different.

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


 Post subject: Re: 65ORG16.b Core
Posted: Wed Apr 25, 2012 9:42 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
ElEctric_EyE wrote:
I'm all ears, Ed. Do you (or anyone) have any ideas on how to actually go about this?
It's fair to challenge me! In truth I'm not sure of the details, only a vague idea. It's always the details! I may have a look at the weekend.


 Post subject: Re: 65ORG16.b Core
Posted: Wed Apr 25, 2012 10:02 pm

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
EEye:

If I understand your question correctly, the performance difference that you note between your Pentium 4 and the i7 machines is mostly due to the differences in their microarchitectures: number of instruction decoders and dispatchers, execution pipeline depth, cache architecture, and memory interface bus speed and efficiency. In a detailed tutorial/analysis of the Intel/AMD microarchitectures that I found on the web, the performance differences are essentially due to Intel's decision to focus its resources on the Pentium II microarchitecture on which the i7 processors are based instead of the Pentium 4 microarchitecture.

_________________
Michael A.


 Post subject: Re: 65ORG16.b Core
Posted: Wed Apr 25, 2012 10:05 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Thanks for responding, MM, but why should the processor being used produce a different outcome in ISE (any version of ISE, on any x86 processor) when using SmartXplorer?

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


 Post subject: Re: 65ORG16.b Core
Posted: Wed Apr 25, 2012 10:09 pm

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
EEyE:

I guess that I misunderstood your original question. From your reply, I gather that you are noting performance differences for the synthesized HDL on two different computers??

_________________
Michael A.


 Post subject: Re: 65ORG16.b Core
Posted: Wed Apr 25, 2012 10:17 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
MichaelM wrote:
EEyE:

I guess that I misunderstood your original question. From your reply, I gather that you are noting performance differences for the synthesized HDL on two different computers??

Yes, identical project files. ISE13.4 on both machines...

I would usually post a question like this on the Xilinx forums, as it is a nagging curiosity. I was just wondering if anyone else has come across a similar experience.

BTW, this forum is more active than Xilinx forums. :wink: Thanks for sharing your expertise!

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


 Post subject: Re: 65ORG16.b Core
Posted: Wed Apr 25, 2012 10:44 pm

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
EEyE:

That is somewhat curious behaviour. At the synthesis level, I can't think of any reason why there would be any differences. The Xilinx MAP and PAR tools, on the other hand, generally operate with algorithms that are stochastic, i.e. driven by random seeds. Thus, each mapping and place and route operation can yield different results. A simple PERIOD constraint on the clock signal is generally enough to get each "random" output to converge to a consistent result.

In the past, I've had trouble with the Xilinx tools taking shortcuts and using results from previous cycles as guides for a new place and route. Therefore, I periodically purge the project files to force a clean build.

One final thought. If you've not compared the TCL files of both projects, I would run "Generate TCL Script" under the "Project" menu on each machine, and compare the two files to ensure that all of the tool options are set to the same values. (That file is something that I've recently begun using after someone showed me how easy it was to save and restore a project's settings with it. Previously, I simply tried to remember and reset the project settings from their GUIs.)
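
A minimal sketch of that comparison, assuming each generated script has been read into a list of lines. The function name `settings_diff` and the sample lines are illustrative placeholders and are not copied from a real ISE-generated script. It reports the lines present on only one machine, ignoring order and surrounding whitespace.

```python
def settings_diff(lines_a, lines_b):
    """Return (only_in_a, only_in_b) for two lists of script lines.

    Order and leading/trailing whitespace are ignored, so only lines whose
    content differs between the two machines are reported.
    """
    a = {line.strip() for line in lines_a if line.strip()}
    b = {line.strip() for line in lines_b if line.strip()}
    return sorted(a - b), sorted(b - a)


# Placeholder lines standing in for the two generated scripts (hypothetical).
laptop = [
    'project set "Optimization Goal" "Speed"',
    'project set "Optimization Effort" "High"',
]
desktop = [
    'project set "Optimization Effort" "Normal"',
    'project set "Optimization Goal" "Speed"',
]

only_laptop, only_desktop = settings_diff(laptop, desktop)
for line in only_laptop:
    print("laptop only: ", line)
for line in only_desktop:
    print("desktop only:", line)
```

If both lists come back empty, the tool options match and the difference has to come from somewhere else, such as stale intermediate files.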

_________________
Michael A.


 Post subject: Re: 65ORG16.b Core
Posted: Wed Apr 25, 2012 11:21 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
MichaelM wrote:
...One final thought. If you've not compared the TCL files of both projects, I would run the "Generate TCL Script" under the "Project" menu, and compare them to ensure that all of the tool options are set to the same values. (That file is something that I've recently begun using after someone showed me how easy it was to save and restore a project's settings using it. Previously, I simply tried to remember and reset the project settings from their GUIs.)

Thank You! I'll try this out...
The only thing I've tried so far is 'Cleanup Project Files'. This used to work on earlier versions of ISE.
ISE13.4 seems to be more 'tight' as far as lack of errors goes, and this is a good thing!

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


 Post subject: SmartXplorer
Posted: Thu Apr 26, 2012 5:03 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Just to note that when I looked into SmartXplorer 12.4 some time ago, there was more gain to be had with further manual exploration beyond the 7 default tactics.

To answer an earlier query, I wouldn't expect changing an 'x' into a value to help with the result, because you're decreasing the freedom of the tool. It might do, if the tool is failing to find a good solution which you know about. I don't advise changing the HDL source in search of higher clock speeds unless you're also looking at the synthesis reports and understanding the nature of the critical path. If the critical path is in the ALU, then changing the instruction decoding is unlikely to help.
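
A toy illustration of that loss of freedom, written in Python rather than HDL. It is only a sketch: the 2-bit "opcode" table is made up, `min_support` is a hypothetical helper, and a real synthesis tool optimises far more than the number of input bits. The helper finds the fewest input bits a decode output must depend on to satisfy every specified entry.

```python
from itertools import combinations


def min_support(spec, nbits):
    """Smallest number of input bits that can determine the output.

    spec maps each input code to 0, 1 or 'x' (don't care). Entries marked
    'x' place no requirement on the output, so they never cause a conflict.
    """
    for k in range(nbits + 1):
        for bits in combinations(range(nbits), k):
            seen = {}
            consistent = True
            for code, out in spec.items():
                if out == 'x':
                    continue
                key = tuple((code >> b) & 1 for b in bits)
                if seen.setdefault(key, out) != out:
                    consistent = False
                    break
            if consistent:
                return k
    return nbits


# Made-up 2-bit decode: only codes 00 and 11 are real opcodes.
with_dont_cares = {0b00: 0, 0b01: 'x', 0b10: 'x', 0b11: 1}
forced_to_zero = {0b00: 0, 0b01: 0, 0b10: 0, 0b11: 1}

print(min_support(with_dont_cares, 2))  # one bit is enough: output = bit 0
print(min_support(forced_to_zero, 2))   # both bits needed: output = bit1 AND bit0
```

With the 'x' entries the output can simply be wired to one input bit. Forcing them to '0' rules that solution out and requires a gate across both bits.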

Cheers
Ed


 Post subject: Re: SmartXplorer
Posted: Thu Apr 26, 2012 9:33 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
BigEd wrote:
... I don't advise changing the HDL source in search of higher clock speeds unless you're also looking at the synthesis reports and understanding the nature of the critical path. If the critical path is in the ALU, then changing the instruction decoding is unlikely to help.

Cheers
Ed

There's a nice tool in PlanAhead (Analyze Timing) that lets you see the actual path, and the paths that fail are highlighted in red. Is this what you were referring to?
In my case src_reg_3 is one that is slowest. Now how to figure out where src_reg_3 is in the code...

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


 Post subject: Re: 65ORG16.b Core
Posted: Thu Apr 26, 2012 9:51 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Interesting - I've only seen textual reports, not looked in the GUI.


 Post subject: Re: 65ORG16.b Core
Posted: Fri Apr 27, 2012 12:01 am

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Ah, I found out src_reg_3 is actually bit 3 of src_reg. This leads me to believe my previous argument: being careful with the bits in the opcode decoding can actually lead to higher speeds, especially if you can negate them with a '0'. That way, and I'm surmising here, the tools will naturally cut the path as close to the 'source' as possible instead of routing 'x's through many levels of muxes. Am I on the right track?

BTW, that PlanAhead tool is worth some investigating and experimentation.

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


 Post subject: Re: 65ORG16.b Core
Posted: Fri Apr 27, 2012 12:15 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Hi EEye
The 'x's in a case statement should always be working in your favour, because they give the logic synthesis the maximum degree of freedom to implement whatever is smallest or fastest.

In a clean (RISC-like) instruction encoding, the destination register would just be some selection of bits from the IR, and wouldn't take any decoding. But we have this legacy instruction set: and bit 3 is right down there in that legacy set. In the 6502, that was quite efficient, but of course in this branch you've added more registers and operations and filled out the bottom 8 bits. That's evidently slowed down the machine - it's bound to slow down the decode, in retrospect, because unless you're extremely careful, you're perturbing something which was put together with great care for efficient decoding. Up to a point you get away with that because decode hasn't been on the critical path, but if you add extra opcodes without the same care for placement as the original 6502 designers then the decode will get more complex.

A radical suggestion - which would need even more cooperation from the assembler than you already need - would be to modify the encodings of existing opcodes, so that for example LDX would signal in the upper bits that X is the destination. Instead of $00A6, $00B6 and so on, you'd use $20A6, $20B6 or whatever would be appropriate to address X in the register file. However, I see from your code that your register file now has 5-bit addresses... so I'll steer clear of trying to think too hard about this particular implementation. I think I'm revisiting ideas I put forward in the 65Org16.c thread.
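
A rough sketch of the idea in Python, purely to show the bit arithmetic. The field position is an assumption made for illustration (`DEST_SHIFT`, `DEST_MASK` and `split_opcode` are hypothetical names, and the 5-bit field in bits 15..11 is not the actual 65Org16.b encoding). The point is that the destination falls out as a plain bit slice of the IR, with no case decoding on the legacy low byte.

```python
# Hypothetical layout: bits 15..11 = destination register address (5 bits),
# bits 7..0 = the legacy 6502 opcode byte.
DEST_SHIFT = 11
DEST_MASK = 0x1F
LEGACY_MASK = 0xFF


def split_opcode(ir):
    """Split a 16-bit instruction word into (dest_register, legacy_opcode)."""
    dest = (ir >> DEST_SHIFT) & DEST_MASK
    legacy = ir & LEGACY_MASK
    return dest, legacy


for ir in (0x00A6, 0x20A6, 0x20B6):
    dest, legacy = split_opcode(ir)
    print(f"${ir:04X}: dest field = {dest}, legacy opcode = ${legacy:02X}")
```

Under this assumed layout $20A6 and $20B6 both carry destination field 4, while the unmodified $00A6 carries 0, so the register file address needs no lookup on the low 8 bits.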

I see in your latest checkin that you have changed some x's to 0's and you have the speed as 91MHz. What was it before?

The problem with lots of complex decode lines - even with Arlet's original - is being absolutely sure that every opcode is doing what it should, and nothing more.

Cheers
Ed


 Post subject: Re: 65ORG16.b Core
Posted: Fri Apr 27, 2012 12:44 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Very astute observations, thanks very much for that! I'll be checking for errors in my decoding once again today.

BigEd wrote:
... I see in your latest checkin that you have changed some x's to 0's and you have the speed as 91MHz. What was it before?...
Cheers
Ed

It was passing when my constraint was at 12ns, I think. The max speed was 89MHz, so I was pretty excited I got it back up to at least 90MHz which is my goal for this core.
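
For reference, the conversion between a clock period constraint in ns and a clock rate in MHz is just 1000 divided by the other value. A small sketch (the helper names are made up) using the figures mentioned above:

```python
def period_ns_to_mhz(period_ns):
    """Clock frequency in MHz for a period given in nanoseconds."""
    return 1000.0 / period_ns


def mhz_to_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


print(f"12 ns  -> {period_ns_to_mhz(12.0):.2f} MHz")  # 83.33 MHz
for mhz in (89.0, 90.0, 91.0):
    print(f"{mhz:.0f} MHz -> {mhz_to_period_ns(mhz):.2f} ns")  # 11.24, 11.11, 10.99 ns
```

So a 12ns constraint only asks the tools for about 83MHz, and a 90MHz goal corresponds to a constraint of roughly 11.1ns.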

One question which has been nagging at me for some time now: do you think that the length of the program in the blockRAM when running synthesis will affect top speed? I would hope not, but I'm not sure...

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502

