
All times are UTC




Post new topic Reply to topic  [ 353 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6, 7, 8 ... 24  Next
PostPosted: Fri May 27, 2011 12:44 pm 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Can you include 'statename' in these diagrams? That makes it a lot easier to see what's going on.


PostPosted: Fri May 27, 2011 1:01 pm 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
From your address decoding schematics, it looks like you're only decoding the bottom 19 address bits. This means that 0x10080000 is an alias for your zero page.

EDIT: No, that doesn't make sense. I realize you must be decoding the upper bits.
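For reference, here is the arithmetic behind the first (since withdrawn) remark: a minimal sketch assuming a hypothetical decoder that looks only at address bits 0-18 and ignores everything above. Under that assumption, 0x10080000 lands on the same RAM location as 0x00000000.

```python
# Sketch of address aliasing under partial decoding.
# Assumption (hypothetical, as raised and then withdrawn above): the decoder
# only looks at the low 19 address bits and ignores bits 19 and up.
DECODED_BITS = 19
MASK = (1 << DECODED_BITS) - 1  # 0x7FFFF


def decoded_address(addr: int) -> int:
    """Return the location the RAM actually sees if only the low bits are decoded."""
    return addr & MASK


for addr in (0x00000000, 0x10080000):
    # Both lines print 0x00000: bit 19 (0x80000) and bit 28 are simply dropped.
    print(f"{addr:#010x} -> {decoded_address(addr):#07x}")
```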

However, watch your RAMDO bus. In the first diagram it returns 0xaa55 from location 0, and 0x1000 from location 1. This is already incorrect.

In the next diagram, you see that it returns 0xaa55 from 0, and also 0xaa55 from 1. Obviously, it's being overwritten. I suggest you trace the signals on the RAM inputs to see what's happening, and work backwards from there.


Last edited by Arlet on Fri May 27, 2011 1:18 pm, edited 1 time in total.

PostPosted: Fri May 27, 2011 1:02 pm 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Ok. It may take a few minutes to update the pics... Do you think it's something to do with the update we did to the Z flag? I'll check the decoding again. I thought I had it right...


PostPosted: Fri May 27, 2011 1:57 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Hi EE
ElEctric_EyE wrote:
Code:
FFFFE004   LDA #$1000         ;00A9 0000
FFFFE006   STA $0001          ;0085 0001
That LDA constant doesn't look right: your hand-assembly has resulted in you pointing at 0000_0000 instead of 1000_0000.

Not sure if that makes a difference. Does it mean you are about to overwrite your pointer?
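To show where the hand-assembly slip happens, here is a minimal encoding sketch for just the two instructions in the listing. The word layout is an assumption read off the listing itself, not taken from a 65Org16 specification: each opcode is one 16-bit word with a zero upper byte, and each operand is one 16-bit word. That layout matches the two-word step from FFFFE004 to FFFFE006. The function names are hypothetical.

```python
# Minimal hand-assembler sketch for the two instructions in the quoted listing.
# Assumptions (inferred from the listing, not from a 65Org16 spec): each opcode
# occupies one 16-bit word with the upper byte zero, and each operand is one
# 16-bit word. Only the two addressing modes used here are supported.
OPCODES = {
    ("LDA", "imm"): 0x00A9,  # LDA #nnnn
    ("STA", "zp"): 0x0085,   # STA nnnn (zero page)
}


def assemble(mnemonic: str, operand: str) -> list[int]:
    """Encode e.g. ('LDA', '#$1000') into a list of 16-bit words."""
    if operand.startswith("#$"):
        mode, value = "imm", int(operand[2:], 16)
    elif operand.startswith("$"):
        mode, value = "zp", int(operand[1:], 16)
    else:
        raise ValueError(f"unsupported operand: {operand}")
    if not 0 <= value <= 0xFFFF:
        raise ValueError("operand must fit in one 16-bit word")
    return [OPCODES[(mnemonic, mode)], value]


def fmt(words: list[int]) -> str:
    """Format words the way the listing's comments do, e.g. '00A9 1000'."""
    return " ".join(f"{w:04X}" for w in words)


print(fmt(assemble("LDA", "#$1000")))  # 00A9 1000 (the listing had 00A9 0000)
print(fmt(assemble("STA", "$0001")))   # 0085 0001
```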


PostPosted: Fri May 27, 2011 6:15 pm 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
BigEd wrote:
Hi EE
ElEctric_EyE wrote:
Code:
FFFFE004   LDA #$1000         ;00A9 0000
FFFFE006   STA $0001          ;0085 0001
That LDA constant doesn't look right: your hand-assembly has resulted in you pointing at 0000_0000 instead of 1000_0000.

Not sure if that makes a difference. Does it mean you are about to overwrite your pointer?


You're right! That's a mistake on my part, sorry. I must've copied it over wrong. But the .bin file is correct; otherwise it wouldn't even have made it as far as it has... I have corrected those occurrences in this thread.

I am really suspicious about why it goes past the BNE every time and tries to decode the $00E6, which is the first word of INC $0001. Wouldn't this result in some undefined behavior?


PostPosted: Fri May 27, 2011 6:24 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
The 6502 often reads past the point where it strictly needs to: as Arlet says, you need to trace the state of the machine to see what it has in mind.


PostPosted: Fri May 27, 2011 6:43 pm 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
BigEd wrote:
The 6502 often reads past the point where it strictly needs to: as Arlet says, you need to trace the state of the machine to see what it has in mind.


Correct. In this case, it's a result of the pipelining. The 6502 core has already fetched the next instruction, because it needs to decode it in case the branch isn't taken. If the branch is taken, the fetched instruction is discarded, and the core fetches another instruction at the branch target. That's why I recommend adding 'statename' to the diagrams, to better understand what's happening. The 'IR' value is only examined in the DECODE state, and ignored in all other states.

For a branch there are three possibilities:

1) BRA0 -> DECODE (branch not taken)
2) BRA0 -> BRA1 -> DECODE (branch to same page)
3) BRA0 -> BRA1 -> BRA2 -> DECODE (branch to diff page)
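The three paths above can be written down as a tiny model, which also gives the usual 2/3/4 cycle counts. The state names come from the post; the functions are only an illustrative sketch, with the branch's own DECODE cycle counted as its first cycle.

```python
# Illustrative model of the three branch paths (state names from the post).
# The function names are hypothetical; this is not code from the core.
def branch_states(taken: bool, crosses_page: bool) -> list[str]:
    """State sequence after the branch opcode is decoded, up to the next DECODE."""
    if not taken:
        return ["BRA0", "DECODE"]                  # prefetched opcode is used as-is
    if not crosses_page:
        return ["BRA0", "BRA1", "DECODE"]          # opcode refetched at the target
    return ["BRA0", "BRA1", "BRA2", "DECODE"]      # MSB fixed, opcode fetched again


def branch_cycles(taken: bool, crosses_page: bool) -> int:
    """One DECODE cycle for the branch opcode itself, plus one per BRAx state."""
    bra = [s for s in branch_states(taken, crosses_page) if s.startswith("BRA")]
    return 1 + len(bra)


for taken, cross in ((False, False), (True, False), (True, True)):
    print(taken, cross, branch_states(taken, cross), branch_cycles(taken, cross))
```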

In the BRA0 state, the core decides whether to take the branch, and at the same time already fetches the next instruction so it gets there just in time for DECODE.

If it takes the branch, it fetches the new instruction in the BRA1 state, and starts executing it in the next DECODE cycle. That's why it takes only 2 cycles for a branch not taken, and 3 cycles for a taken branch to the same page.

If the new address calculation produces a carry/borrow, the BRA2 state is added, where it fixes the MSB, and fetches the opcode again. In that case, you'll see 3 opcode fetches in a row, 2 times from the wrong address, and the third time from the correct address.
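A small sketch of when BRA2 is needed, assuming classic 8-bit 6502 addressing (16-bit PC, 256-byte pages) and the standard rule that the signed offset is added to the address of the instruction following the branch. The helper names are hypothetical, and the BRA1 address (new low byte, old high byte) is modelled on the description above rather than on the core's source.

```python
# Sketch: does a taken branch need the extra BRA2 state?
# Assumptions: 8-bit 6502 addressing (16-bit PC, 256-byte pages); the signed
# 8-bit offset is added to the PC of the instruction after the branch.
def branch_target(next_pc: int, offset: int) -> int:
    """Add a signed 8-bit offset (given as 0x00-0xFF) to the PC after the branch."""
    signed = offset - 0x100 if offset & 0x80 else offset
    return (next_pc + signed) & 0xFFFF


def needs_bra2(next_pc: int, offset: int) -> bool:
    """True when adding the offset carries or borrows into the high byte."""
    return (branch_target(next_pc, offset) & 0xFF00) != (next_pc & 0xFF00)


def bra1_address(next_pc: int, offset: int) -> int:
    """Address fetched in BRA1 as described: new low byte, still the old high byte."""
    return (next_pc & 0xFF00) | (branch_target(next_pc, offset) & 0x00FF)


# Backward branch that stays in page $E0: no BRA2.
print(hex(branch_target(0xE010, 0xFC)), needs_bra2(0xE010, 0xFC))  # 0xe00c False
# Backward branch that borrows into page $DF: BRA1 fetches from $E0FE, BRA2 from $DFFE.
print(hex(bra1_address(0xE002, 0xFC)), hex(branch_target(0xE002, 0xFC)))  # 0xe0fe 0xdffe
```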


PostPosted: Sat May 28, 2011 12:15 am 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Ok, thanks for the input, guys. I am starting to think it's something peculiar to the Spartan-6 BRAMs. There are a couple more reset options compared to the Spartan-3. I'm going to try this design in a Spartan-3 and make some observations...


What I have observed is that I do have some kind of memory address problem.
I also went to the very beginning of the simulation. It stores at $10000000, then stores at $1000AA56. Getting warmer... Have to go to work soon :x


PostPosted: Sun May 29, 2011 1:07 am 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
I am making no headway with the 65Org16 Indirect Indexed in ISim. So I decided to go back to the 6502SoC and test Indirect Indexed on Arlet's original 8-bit core, using only ISim. (I don't use indirect indexed myself, since I don't have external memory, just a register-based TFT display...)

Either I don't know how to read ISim or something is wrong. I just replaced the $E000-$FFFF ROM software on the 6502SoC (in a new temporary project file) with the original 8-bit version of the core. I ran ISim and observed strange results...

Either something is amiss here, or I am misreading ISim. Can someone prove to me that Indirect Indexed actually works in these cores?

I will humbly admit if I have made a mistake...


PostPosted: Sun May 29, 2011 6:57 am 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
[Image: waveform trace of LDA ($DE),Y]

On this trace you can see how the LDA ($DE),Y instruction is executed, with Y=1, and memory at $00DE = $40, and memory at $00DF = $80.

Starting at the first red flag, and going through the cycles:

1) IR = $B1, which is the opcode. Note that the core is in the DECODE state, so the IR is relevant here.

2) The state moves to INDY0, because the core has decoded the instruction as indirect indexed with Y. On AB you can see the location 00DE of the first ZP byte. At the same time, you can see the ALU set up to perform the 00+DE+1 calculation (bottom 4 lines). The answer DF appears in ADD on the next cycle.

3) The state moves to INDY1. The contents of 00DE (40) appear on DI (as well as in IR, but that's not important since the core isn't in DECODE anymore). The result from the ALU is moved to the AB register (00DF) so it can fetch the second zero page byte. At the same time, the ALU input AI is now set to Y, and the input BI is set to the value just read from zero page (40), and these two are added. The result (41) appears in ADD on the next cycle.

4) The state moves to INDY2. The contents of 00DF (80) appear on DI, and they are immediately put on AB[15:8], while the result of the ZP+Y calculation is used as AB[7:0], giving the full address $8041. Because there is no carry (not visible in the diagram), this address is correct.

5) The state moves to FETCH. AB is now back to the value of PC ($FD4A) to fetch the next instruction. On DI you can see $20 appear, which is the contents of memory at $8041, the result of the LDA ($DE),Y instruction. This result is moved to the A register in the next cycle (not visible here).
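The address-bus sequence from this walk-through can be reproduced with a short model, which may help when comparing against an ISim trace. State names and the order of events follow the steps above; the function, the dict used as memory, treating unknown memory as 0, the zero-page wraparound, and the extra INDY3 fix-up cycle when a read's LSB+Y carries are all assumptions for illustration (the post only says that without a carry the INDY2 address is already correct).

```python
# Model of the AB sequence for LDA ($zp),Y as walked through above.
# Assumptions: dict-based memory with unknown locations reading as 0,
# zero-page wraparound for the second pointer byte, and an INDY3 fix-up
# cycle (MSB + carry) only when LSB + Y carries.
def lda_indy_trace(zp: int, y: int, mem: dict, pc: int):
    lo = mem.get(zp, 0)                      # arrives on DI in INDY1
    hi = mem.get((zp + 1) & 0xFF, 0)         # arrives on DI in INDY2
    low_sum = lo + y                         # ALU: ZP byte + Y
    carry = low_sum >> 8
    trace = [
        ("INDY0", zp),                               # fetch first pointer byte
        ("INDY1", (zp + 1) & 0xFF),                  # fetch second pointer byte
        ("INDY2", (hi << 8) | (low_sum & 0xFF)),     # MSB from DI, LSB from ALU
    ]
    if carry:  # assumed fix-up cycle when the low byte overflows
        trace.append(("INDY3", (((hi + carry) & 0xFF) << 8) | (low_sum & 0xFF)))
    trace.append(("FETCH", pc))              # data on DI, AB back to PC
    effective = trace[-2][1]
    return trace, mem.get(effective, 0)


mem = {0xDE: 0x40, 0xDF: 0x80, 0x8041: 0x20}
trace, value = lda_indy_trace(0xDE, 0x01, mem, 0xFD4A)
for state, ab in trace:
    print(f"{state:6} AB=${ab:04X}")  # $00DE, $00DF, $8041, $FD4A
print(f"A <- ${value:02X}")           # A <- $20
```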

I suggest you find the same signal names, find a similar instruction, and compare the results.


PostPosted: Sun May 29, 2011 7:11 am 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
[Image: waveform trace of STA ($DE),Y]

On this 2nd waveform you can see a STA ($DE),Y with A=$A0, Y=$05, and $DE/$DF pointing to $8000.

The first few states are identical: DECODE, INDY0, INDY1, INDY2. But then the core moves to INDY3. The reason is that this is a write access, not a read. When the LSB+Y calculation is done in the ALU (resulting here in ADD=$05 in the INDY2 cycle), there may be a page overflow, in which case the MSB must be incremented as well. To avoid writing to the wrong page, an extra cycle is taken here in which MSB+00+Carry is fed through the ALU. Because there was no carry in this case, the address on the AB bus doesn't change between INDY2 and INDY3.

In INDY3, the Accumulator is written to memory. You can see the WE=1, and $A0 appear on the DO bus, while $8005 appears on the AB bus.
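The reason a store cannot skip INDY3 is easiest to see with a page-crossing pointer. This sketch computes the address on AB in INDY2 ({pointer MSB, LSB+Y}) and the corrected write address in INDY3 (MSB+00+carry), as described above. The function name and the $80FE example are hypothetical; the $8000/Y=$05 case is the one from the trace.

```python
# Sketch of the INDY2 vs INDY3 addresses for STA ($zp),Y.
# INDY2 puts {pointer MSB, LSB+Y} on AB, which is only right when LSB+Y
# does not carry; INDY3 feeds MSB + 00 + carry through the ALU before writing.
def sta_indy_addresses(ptr: int, y: int) -> tuple[int, int]:
    """Return (INDY2 address, INDY3 write address) for a 16-bit pointer."""
    lsb, msb = ptr & 0xFF, ptr >> 8
    low_sum = lsb + y
    carry = low_sum >> 8
    indy2 = (msb << 8) | (low_sum & 0xFF)                     # may be the wrong page
    indy3 = (((msb + carry) & 0xFF) << 8) | (low_sum & 0xFF)  # MSB fixed up
    return indy2, indy3


# Case from the trace: pointer $8000, Y=$05, no carry, so AB does not change.
print([f"{a:04X}" for a in sta_indy_addresses(0x8000, 0x05)])  # ['8005', '8005']
# Hypothetical page-crossing case: pointer $80FE, Y=$05.
print([f"{a:04X}" for a in sta_indy_addresses(0x80FE, 0x05)])  # ['8003', '8103']
```

Writing in INDY2 would hit $8003 in the second case, which is exactly the wrong-page write the extra cycle avoids.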


PostPosted: Sun May 29, 2011 8:11 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Great walk-throughs, thanks!


PostPosted: Sun May 29, 2011 12:57 pm 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Ditto!
I'm going to give what's left of my grey matter a rest for a couple days...


PostPosted: Sun May 29, 2011 2:14 pm 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I was playing with ISE, trying to optimize the timing for my design, and I discovered a nice tool (in ISE 13.1, but it may also be present in older versions):

Tools -> SmartXplorer. The first thing to try is 'Use built-in SmartXplorer strategies for Timing Performance', and hit OK.

It will try 7 different synthesis/place/route/map strategies, and select the best one. I often find that one strategy fails to meet the timing constraints while another one succeeds. After a change in the source, if it fails timing again, try re-running this, as the best strategy may have changed.

Of course, it's best to worry about this when the design is (almost) finished, otherwise you'll spend too much time on this. A fast computer is recommended, especially with bigger designs.


PostPosted: Sun May 29, 2011 9:19 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Interesting - I have it too (on version 12.4) so I had a quick play. I've been synthesising to a 20ns (50MHz) target because that's the crystal I have.

To make it do some work, I set a 15ns clock. With the default 'balanced' tactics I'd been using I now got
Code:
   Minimum period:  17.567ns{1}   (Maximum frequency:  56.925MHz)
(all these timings are post-routing)

With SmartXplorer the best result of the 7 standard tactics was:
Code:
   Minimum period:  15.939ns{1}   (Maximum frequency:  62.739MHz)
I then started exploring some other choices. With 'timing performance' and 'Performance with Physical Synthesis' I got
Code:
   Minimum period:  15.874ns{1}   (Maximum frequency:  62.996MHz)
and a suggestion to 'Increase the PAR Effort Level setting to "high"' (although that already seems to be the setting.)

At the next level of 'timing performance' and 'Performance without IOB packing' I meet the constraints with a result of
Code:
   Minimum period:  14.998ns{1}   (Maximum frequency:  66.676MHz)
which means I need to set a higher target... and now with a 14ns target I get
Code:
   Minimum period:  14.848ns{1}   (Maximum frequency:  67.349MHz)

I note that SmartXplorer does allow a lot more flexibility than just the 7 preset choices - it also allows running on several computers, if you have them. I agree that this is worth revisiting now and again, and only worth visiting when the design is ready. Looking carefully at timing reports and adjusting the design will usually get bigger initial gains than turning up the synthesis, and without taking up so much time.

But the bottom line is that a bit of experimentation gave me quite a healthy increase in clock speed.
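The figures above are all "minimum period" values, so the frequency is just f[MHz] = 1000 / T[ns]. This sketch recomputes the reported maximum frequencies, checks each result against the 15 ns target, and works out the overall gain from the default run to the best run. The labels are shorthand for the strategies named in the post; the numbers are the post's own.

```python
# Convert the reported minimum periods into maximum clock frequencies.
# Only relation used: f[MHz] = 1000 / T[ns]. Numbers are from the post above.
results_ns = {
    "balanced (default)": 17.567,
    "SmartXplorer best of 7": 15.939,
    "Physical Synthesis": 15.874,
    "without IOB packing": 14.998,
    "14 ns target": 14.848,
}


def max_freq_mhz(period_ns: float) -> float:
    return 1000.0 / period_ns


def meets(period_ns: float, target_ns: float) -> bool:
    """A result meets a clock constraint when its minimum period fits inside it."""
    return period_ns <= target_ns


for name, t in results_ns.items():
    print(f"{name:24} {t:7.3f} ns {max_freq_mhz(t):7.3f} MHz  meets 15 ns: {meets(t, 15.0)}")

gain = max_freq_mhz(14.848) / max_freq_mhz(17.567) - 1
print(f"overall gain: {gain:.1%}")  # overall gain: 18.3%
```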

Cheers
Ed

(Edit: and Spartan-6 is looking twice as fast as Spartan-3, so if in doubt, use a faster FPGA.)

