Re: 65ORG16.b Core
Posted: Sat Apr 28, 2012 12:19 am
by ElEctric_EyE
Quote: "Just to note that when I looked into SmartXplorer 12.4 some time ago, there was more gain to be had with further manual exploration beyond the 7 default tactics..."
Now see, this is a clue that SmartXplorer actually knows the kind of CPU it is working with, which explains the differences in results between my P4 3GHz laptop and my i7 875K speed machine. On my speed machine it starts out with 10 tactics; on my laptop it starts out with 7.
So MichaelM's advice to create a TCL file and compare is good. Probably the best approach would be to just run SmartXplorer on your fastest machine and copy the results to the slower machine... How do you do this, though? Do you just copy a .TCL file into the slower computer's project folder?
Re: 65ORG16.b Core
Posted: Sat Apr 28, 2012 12:40 am
by ElEctric_EyE
Well, with all this experimentation on the 4GHz i7 875K, the core presently passes an 11.2ns constraint (about 89MHz) but fails at 11.0ns (about 91MHz) with SmartXplorer. The results are just about the same as they were with the earlier errors in the microcode state machine, so I'm done here!
I wouldn't expect any sort of speed increase from fixing the few errors that are still present.
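For reference, a period constraint converts to a maximum clock frequency as f = 1/T, so the two constraints mentioned above work out to:

$$
f_{\max} = \frac{1}{T}, \qquad
\frac{1}{11.2\,\text{ns}} \approx 89.3\,\text{MHz}, \qquad
\frac{1}{11.0\,\text{ns}} \approx 90.9\,\text{MHz}
$$

So the 91MHz mark corresponds to the 11.0ns constraint that currently fails, while the passing 11.2ns constraint is a little under 90MHz.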
Re: 65ORG16.b Core
Posted: Sat Apr 28, 2012 2:35 am
by MichaelM
EEyE:
I have copied the TCL project settings file from one project to another. In the TCL window, you load the TCL file and then run it. With it you can add all of your files to the new project (i.e. recreate your project) and also re-apply all of your settings for the synthesizer and MAP/PAR. I have had more success optimizing performance by adjusting the synthesizer and MAP/PAR settings manually, as BigEd suggested.
It's been my experience that each project's characteristics determine its optimal settings. For example, the M65C02 core logic achieves better results with an AREA optimization strategy than with either the SPEED or BALANCED strategies that work best for most of my other designs. I've also been able to achieve better results by allowing the synthesizer to remove the hierarchy and optimize across all modules. Although that has been my general strategy for most designs, and is the tools' default, the best performance in my follow-on work on the M65C02 core has only been achieved when the hierarchy is retained.
I don't know SmartXplorer at all, so I can't comment on how its optimization strategy adjusts the various settings, but most of its optimization effort is likely applied to the post-synthesis netlist. As I have recently found, it may be necessary to adjust your synthesis options to get better performance from MAP/PAR. There aren't many synthesizer options that make sense to adjust, but the ones I've adjusted to improve performance are those for hierarchy retention, resource sharing, and I/O port mapping.
It is because I can't keep track of the optimal settings for each project that I've recently started using the TCL file. Long ago I adopted the philosophy of adjusting the design concept and implementation strategy instead of forcing my designs into the devices. As a consequence, I never have to hand-route or hand-place a design like I did back in the days of the XC4000, when I had to hand-edit LUTs in the FPGA Editor to correct problems in the circuit and avoid 3-hour recompile times.
Re: 65ORG16.b Core
Posted: Sat Apr 28, 2012 4:40 am
by Arlet
At this point, I wouldn't spend too much time optimizing the timing with SmartXplorer and manual tuning. If you're still planning to add SDRAM, video or other peripherals, you'd better do that first, as these things will most likely slow down your timing again. Anything you add to the address/data bus will need extra muxes, which will add to the critical paths. Even unrelated logic can interfere with routing.
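To illustrate the point about muxes, here is a minimal sketch of a CPU read-data mux. It assumes synchronous block RAM that returns data one cycle after the address (so the region select is registered to line up with it), a 32-bit address bus, and a decode on the top four address bits. The module name, the `tft_do` peripheral and the address map are hypothetical, not taken from the 65Org16 sources.

```verilog
// Hypothetical read-data mux for a 16-bit CPU bus.
// Address width, decode bits and peripheral names are assumptions.
module read_mux #(
    parameter AW = 32                  // assumed address width
) (
    input  wire          clk,
    input  wire [AW-1:0] AB,           // CPU address bus
    input  wire [15:0]   ram_do,       // zeropage/stack block RAM output
    input  wire [15:0]   rom_do,       // program block RAM output
    input  wire [15:0]   tft_do,       // hypothetical added peripheral
    output reg  [15:0]   DI            // data returned to the CPU
);
    // Block RAM data arrives one cycle after the address, so the
    // region select is registered to stay aligned with it.
    reg [3:0] region;
    always @(posedge clk)
        region <= AB[AW-1:AW-4];       // hypothetical decode on the top 4 bits

    // Every device added to the bus is one more input to this mux,
    // and its output feeds the CPU's data input directly.
    always @* begin
        case (region)
            4'h0:    DI = ram_do;
            4'h8:    DI = tft_do;
            4'hF:    DI = rom_do;
            default: DI = 16'h0000;    // unmapped regions read as 0
        endcase
    end
endmodule
```

Adding SDRAM or a video controller would mean another case here, which is why such additions tend to lengthen the paths into the core.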
Re: 65ORG16.b Core
Posted: Sat Apr 28, 2012 2:59 pm
by ElEctric_EyE
Good point... I am trying to fit the core into what I have so far on my project, which uses a 7" TFT and some I/O routines for simple graphics. I'm still having problems getting it to work right, even without using any of the extra features yet. I can only work on this 2 days a week, though...
Re: 65ORG16.b Core
Posted: Sat Apr 28, 2012 5:07 pm
by Arlet
How many block RAMs are you using? The code inside the block RAMs doesn't matter, but the number you're using does, because of routing paths. If you were to use them all, the tools could not avoid long signal lines. Now, according to the app notes, the FPGAs have dedicated routing channels for block RAMs, which helps, but there would still be some effect.
Re: 65ORG16.b Core
Posted: Sat Apr 28, 2012 5:12 pm
by ElEctric_EyE
4096x16 for zeropage/stack, and 16384x16 for program space...
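To turn the memory sizes listed above into a block RAM count, assume each block RAM holds 16 Kbit of data (18 Kbit including parity, as in the Spartan-3 and Spartan-6 RAMB16 primitives; the exact FPGA isn't named at this point in the thread):

$$
4096 \times 16 = 65\,536\ \text{bits} = 4 \times 16\,\text{Kbit}, \qquad
16384 \times 16 = 262\,144\ \text{bits} = 16 \times 16\,\text{Kbit}
$$

That is roughly 20 block RAMs in total under that assumption, which is the number that matters for Arlet's routing question.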
Re: 65ORG16.b Core
Posted: Sat Apr 28, 2012 5:14 pm
by Arlet
Just as an experiment, it would be interesting to see what the effect would be on the timing if you made the memories larger or smaller.
Re: 65ORG16.b Core
Posted: Sat Apr 28, 2012 5:20 pm
by ElEctric_EyE
I should have specified that those sizes were for the project I am trying to fit the core into... For the testbench I run to find errors in the core and tweak constraints, I use 4K zeropage/stack and 4K program. It wouldn't be too difficult to make the program space 16Kx16 on the testbench. I can try that tonight and report back.
Re: 65ORG16.b Core
Posted: Tue May 01, 2012 12:54 am
by ElEctric_EyE
Took some time off...
Actually, what I will do for my "Testbench" is narrow the 4K zeropage/4K stack and 4K ROM to 1K each. I can use this tighter testbench in the future, and it may or may not show some change in top speed. It will be mildly interesting... I don't really need that much zeropage/stack/ROM anyway in order to test opcodes.
I would hope that the internal block RAM is fast enough that there would be no change in top speed. I will find out tomorrow.
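One way to make the testbench memories easy to resize is a parameterized, inferred synchronous RAM. This is a minimal sketch assuming the usual read-first block RAM inference style; the module and parameter names are hypothetical, not from the 65Org16 sources. An `ADDR_WIDTH` of 12 gives 4096 words and 10 gives 1024, so shrinking from 4K to 1K is a one-parameter change.

```verilog
// Hypothetical parameterized 16-bit-wide RAM for block RAM inference.
module bram16 #(
    parameter ADDR_WIDTH = 10          // 10 -> 1K words, 12 -> 4K words
) (
    input  wire                  clk,
    input  wire                  we,
    input  wire [ADDR_WIDTH-1:0] addr,
    input  wire [15:0]           din,
    output reg  [15:0]           dout
);
    reg [15:0] mem [0:(1<<ADDR_WIDTH)-1];

    // Synchronous write and synchronous read in one clocked block.
    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
        dout <= mem[addr];             // read-first: returns the old contents
    end
endmodule
```

For example, `bram16 #(.ADDR_WIDTH(10)) zp_stack (...)` would give a 1K-word zeropage/stack memory, and the same module with `ADDR_WIDTH(12)` restores 4K.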
Re: 65ORG16.b Core
Posted: Wed May 02, 2012 1:24 pm
by ElEctric_EyE
Despite poring over my code a great many times, my simulations are still crashing ISim and I can't figure out why.
Just wondering, does the Microcode state machine have to be in any sort of order?
Re: 65ORG16.b Core
Posted: Wed May 02, 2012 1:40 pm
by ElEctric_EyE
Ah, I just read in the Xilinx forums that this is a problem under Windows 7 64-bit, which I am unfortunately running. It says it will be fixed in a version after 13.4.
WHEW!
Re: 65ORG16.b Core
Posted: Wed May 02, 2012 1:59 pm
by Arlet
Quote: "Just wondering, does the Microcode state machine have to be in any sort of order?"
Yes, but a pulse on the reset signal accomplishes that.
Re: 65ORG16.b Core
Posted: Wed May 02, 2012 6:22 pm
by ElEctric_EyE
Ok, thanks. I'm still trying to track down some kind of problem, as the .b core isn't fully functional on the devboard yet. And since ISim can't really help me, I'm grasping at anything that looks peculiar. The order of the encodings in the microcode state machine was not the same as in your original or in the original 65Org16. I just put them in order as far as I could (I have additional encodings for the W register), and it made no difference to what I am seeing on the TFT.
If I go back to the .b version that has just the 16 Acc/variable shift opcodes, everything works fine and I see c'mon working, so this is my proof at the moment.
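The two answers above fit together: what the state machine needs is a known starting state, which the reset pulse provides, while the order in which the encodings are listed does not matter as long as each value is unique and fits in the state register. A minimal sketch with hypothetical state names (not the real 65Org16 microcode states), assuming an asynchronous reset:

```verilog
// Hypothetical 4-state machine; names and encodings are illustrative only.
module tiny_fsm (
    input  wire       clk,
    input  wire       reset,
    input  wire       go,
    output reg  [1:0] state
);
    // Declared deliberately "out of order": declaration order and case-item
    // order have no effect, as long as every encoding is unique and fits
    // in the 2-bit state register.
    localparam [1:0] W_OP   = 2'd3,
                     IDLE   = 2'd0,
                     FETCH  = 2'd1,
                     DECODE = 2'd2;

    always @(posedge clk or posedge reset)
        if (reset)
            state <= IDLE;             // a reset pulse gives a known start state
        else
            case (state)
                DECODE: state <= W_OP;
                IDLE:   state <= go ? FETCH : IDLE;
                W_OP:   state <= IDLE;
                FETCH:  state <= DECODE;
            endcase
endmodule
```

When adding encodings, the things worth checking are therefore duplicate values and whether the state register is still wide enough, rather than the listing order.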
Re: 65ORG16.b Core
Posted: Wed May 02, 2012 9:27 pm
by ElEctric_EyE
For sure, since I don't use the D flag, this processor status flag definition needs to be changed from Arlet's 8-bit original:
Code:
wire [7:0] P = { N, V, 2'b0, D, I, Z, C };
to this for the 65Org16.b:
Code:
wire [15:0] P = { 8'b0, N, V, 3'b0, I, Z, C };
I have been using this up until now:
Code:
wire [15:0] P = { N, V, 2'b0, I, Z, C };
That said, my plotting routines only use the C and Z flags. It looks like only the N and V flags would have been affected: with the D bit missing, the concatenation is only 7 bits wide, so it is zero-extended into the 16-bit wire and N and V each land one bit lower (bits 6 and 5 instead of 7 and 6). This is an issue I've overlooked since I removed decimal mode at the beginning of the .b core.
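A minimal sketch of the corrected packing, assuming single-bit flag wires named as in the thread (the module and port names are hypothetical). It also writes out explicitly what the 7-bit version turned into after zero-extension, which shows why only N and V moved.

```verilog
// Hypothetical module; flag names follow the thread.
// 6502 status bit positions: 7=N 6=V 5=unused 4=B 3=D 2=I 1=Z 0=C.
module p_reg16 (
    input  wire        N, V, I, Z, C,
    output wire [15:0] P,        // corrected 65Org16.b layout
    output wire [15:0] P_old     // what the 7-bit version actually produced
);
    // D is gone in 65Org16.b, so bits 5..3 and the upper byte read as 0.
    assign P = { 8'b0, N, V, 3'b0, I, Z, C };

    // { N, V, 2'b0, I, Z, C } is only 7 bits wide. Assigned to a 16-bit
    // wire it is zero-extended, written out explicitly here:
    // N lands in bit 6 and V in bit 5 instead of bits 7 and 6.
    assign P_old = { 9'b0, N, V, 2'b0, I, Z, C };
endmodule
```

A lint pass would normally flag the original 7-bit assignment as a width mismatch (Verilator's lint mode reports this kind of thing as a width warning), which is a cheap way to catch similar slips elsewhere in the core.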