65ORG16.b Core

Arlet · Post by **Arlet** » Fri May 27, 2011 12:44 pm

Can you include 'statename' in these diagrams? That makes it a lot easier to see what's going on.

Arlet · Post by **Arlet** » Fri May 27, 2011 1:01 pm

From your address decoding schematics, it looks like you're only decoding the bottom 19 address bits. This means that 0x10080000 is an alias for your zero page.

EDIT: No, that doesn't make sense. I realize you must be decoding the upper bits.
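
For reference, the aliasing concern raised before the edit can be checked with a simple mask. This is only a sketch of the hypothetical 19-bit case; `ADDR_BITS`, `decoded` and `aliases` are illustrative names, not part of the actual 65Org16 decoder.

```python
# Sketch of partial address decoding (hypothetical 19-bit case, not the real decoder).
ADDR_BITS = 19                    # assumption: only the bottom 19 bits are decoded
MASK = (1 << ADDR_BITS) - 1       # 0x7FFFF


def decoded(addr: int) -> int:
    """Location a decoder would select if it ignored every bit above ADDR_BITS."""
    return addr & MASK


def aliases(a: int, b: int) -> bool:
    """Two addresses alias when they differ only in the ignored upper bits."""
    return decoded(a) == decoded(b)


print(hex(decoded(0x10080000)))          # bottom 19 bits of 0x10080000
print(aliases(0x10080000, 0x00000000))   # would this land on location 0?
```

If the upper bits are decoded as well, as the edit concludes, the two addresses no longer collide.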

However, watch your RAMDO bus. In the first diagram it returns 0xaa55 from location 0, and 0x1000 from location 1. This is already incorrect.

In the next diagram, you see that it returns 0xaa55 from 0, and also 0xaa55 from 1. Obviously, it's being overwritten. I suggest you trace the signals on the RAM inputs to see what's happening, and work backwards from there.

ElEctric_EyE · Post by **ElEctric_EyE** » Fri May 27, 2011 1:02 pm

Ok. May take a few minutes to update the pics... You think it's something to do with the update we did to the Z flag? I'll check the decoding again. I thought I had it right...

BigEd · Post by **BigEd** » Fri May 27, 2011 1:57 pm

Hi EE

ElEctric_EyE wrote:

FFFFE004	LDA #$1000	      ;00A9 0000
FFFFE006	STA $0001          ;0085 0001

That LDA constant doesn't look right: your hand-assembly has resulted in you pointing at 0000_0000 instead of 1000_0000.

Not sure if that makes a difference. Does it mean you are about to overwrite your pointer?
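
A minimal sketch of the hand-assembly in question, assuming each opcode and each operand occupies one 16-bit word as the listing's comments suggest. The helper names `assemble` and `pointer` are made up for illustration, and the low-word-first pointer layout (low word at location 0, high word at location 1) is an assumption.

```python
# Sketch: hand-assembling two 65Org16 instructions as 16-bit words.
# Opcode values are the ones shown in the listing's comments (00A9, 0085).
LDA_IMM = 0x00A9
STA_ZP = 0x0085


def assemble(opcode: int, operand: int) -> list[int]:
    """Return the two 16-bit words for a one-operand instruction."""
    return [opcode & 0xFFFF, operand & 0xFFFF]


def pointer(lo_word: int, hi_word: int) -> int:
    """Combine two 16-bit words into a 32-bit pointer (assumed low word first)."""
    return ((hi_word & 0xFFFF) << 16) | (lo_word & 0xFFFF)


words = assemble(LDA_IMM, 0x1000) + assemble(STA_ZP, 0x0001)
print(" ".join(f"{w:04X}" for w in words))

print(f"{pointer(0x0000, 0x1000):08X}")  # intended constant in the high word
print(f"{pointer(0x0000, 0x0000):08X}")  # what the mistyped constant would give
```

With the intended constant the high word is $1000; with the mistyped one the pointer collapses to zero.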

ElEctric_EyE · Post by **ElEctric_EyE** » Fri May 27, 2011 6:15 pm

BigEd wrote:

Hi EE

ElEctric_EyE wrote:

FFFFE004	LDA #$1000	      ;00A9 0000
FFFFE006	STA $0001          ;0085 0001

That LDA constant doesn't look right: your hand-assembly has resulted in you pointing at 0000_0000 instead of 1000_0000.

Not sure if that makes a difference. Does it mean you are about to overwrite your pointer?

You're right! That's a mistake on my part, sorry. Must've copied it over wrong. BUT, the .bin file is correct. Otherwise it wouldn't even have made it as far as it has... I have corrected those occurrences in this thread.

I am really suspicious as to why it goes beyond the BNE every time and tries to decode the $00E6, which is the first part of INC $0001. Wouldn't this result in some undefined behavior?

BigEd · Post by **BigEd** » Fri May 27, 2011 6:24 pm

The 6502 often reads past the point where it strictly needs to: as Arlet says, you need to trace the state of the machine to see what it has in mind.

Arlet · Post by **Arlet** » Fri May 27, 2011 6:43 pm

BigEd wrote:

The 6502 often reads past the point where it strictly needs to: as Arlet says, you need to trace the state of the machine to see what it has in mind.

Correct. In this case, it's a result of the pipelining. The 6502 core has already fetched the next instruction, because it needs to decode it in case the branch isn't taken. If the branch is taken, the fetched instruction is discarded, and the core will fetch another instruction at the branch target. That's why I recommend adding 'statename' to the diagrams, to better understand what's happening. The 'IR' value is only looked at in the DECODE state, and ignored in other states.

For a branch there are three possibilities:

1) BRA0 -> DECODE (branch not taken)
2) BRA0 -> BRA1 -> DECODE (branch to same page)
3) BRA0 -> BRA1 -> BRA2 -> DECODE (branch to diff page)

In the BRA0 state, the core decides whether to take the branch, and at the same time already fetches the next instruction so it gets there just in time for DECODE.

If it takes the branch, it fetches the new instruction in BRA1 state, and starts executing it in the next DECODE cycle. That's why it takes only 2 cycles for a branch not taken, and 3 cycles for a branch taken.

If the new address calculation produces a carry/borrow, the BRA2 state is added, where it fixes the MSB, and fetches the opcode again. In that case, you'll see 3 opcode fetches in a row, 2 times from the wrong address, and the third time from the correct address.
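
The three branch cases above can be sketched as a small model. This only illustrates the state sequence as described in this post, for the 8-bit core. The function names are hypothetical, and `page_cross` assumes the usual 6502 convention that the offset is added to the address of the instruction after the branch.

```python
# Sketch of the branch state sequences described above (8-bit core).
def branch_states(taken: bool, crosses_page: bool) -> list[str]:
    """States after the branch's own DECODE, ending at the next DECODE."""
    states = ["BRA0"]                 # decide, and prefetch the fall-through opcode
    if taken:
        states.append("BRA1")         # fetch opcode at the (possibly unfixed) target
        if crosses_page:
            states.append("BRA2")     # fix the MSB and fetch the opcode again
    states.append("DECODE")           # DECODE of the following instruction
    return states


def branch_cycles(taken: bool, crosses_page: bool) -> int:
    """Cycles used by the branch: its own DECODE plus each BRAx state."""
    bra = sum(s.startswith("BRA") for s in branch_states(taken, crosses_page))
    return 1 + bra


def page_cross(pc_next: int, offset: int) -> bool:
    """True if adding the signed offset changes the high byte (carry/borrow)."""
    target = (pc_next + offset) & 0xFFFF
    return (target & 0xFF00) != (pc_next & 0xFF00)


for taken, cross in [(False, False), (True, False), (True, True)]:
    print(branch_states(taken, cross), branch_cycles(taken, cross))
```

The number of opcode fetches in a row equals the number of BRAx states, which matches the three fetches seen in the page-crossing case.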

ElEctric_EyE · Post by **ElEctric_EyE** » Sat May 28, 2011 12:15 am

Ok, thanks for the input guys. I am starting to think it's something peculiar to the Spartan 6 BRAMs. There are a couple more options for reset compared to the Spartan 3. Gonna try this design in a Spartan 3 and make some observations...

What I have observed is I do have some kind of memory address problem.
Also, went to the very beginning of the simulation. It stores at $10000000, then stores at $1000AA56. Getting warmer... Have to go to work soon

ElEctric_EyE · Post by **ElEctric_EyE** » Sun May 29, 2011 1:07 am

I am making no headway with the 65Org16 Indirect Indexed in ISim. So I decided to go back to the 6502SoC and test Indirect Indexed on Arlet's original 8 bit core only, using ISim. (I don't use indirect indexed since I don't have external memory, just a register-based TFT display...)

Either I don't know how to read ISim or something is wrong. I just replaced the $E000-$FFFF ROM software on the 6502SoC (in another new temp project file) with the original 8 bit version of the core. I ran ISim and observed strange results...

Something is amiss here, or am I misreading ISim? Can someone prove to me Indirect Indexed actually works in these cores?

I will humbly admit if I have made a mistake...

Arlet · Post by **Arlet** » Sun May 29, 2011 6:57 am

6502.org wrote:

Image no longer available: http://ladybug.xs4all.nl/arlet/fpga/indy.png

On this trace you can see how the LDA ($DE),Y instruction is executed, with Y=1, and memory at $00DE = $40, and memory at $00DF = $80.

Starting at the first red flag, and going through the cycles:

1) IR = $B1, which is the opcode. Note that the core is in the DECODE state, so the IR is relevant here.

2) state moves to INDY0, because it has decoded the instruction as indirect indexed with Y. In AB you can see the location 00DE of the first ZP byte. At the same time, you can see the ALU set up to perform the 00+DE+1 calculation (bottom 4 lines). The answer DF appears the next cycle in ADD.

3) state moves to INDY1. The contents of 00DE (40) appear on DI (as well as in IR, but that's not important since the core isn't in DECODE anymore). The results from the ALU are moved to the AB register (00DF) so it can fetch the second zeropage byte. At the same time, the ALU input AI is now set to Y, and the input BI is set to the value just read from zeropage (40), and these two are added up. The result (41) appears in ADD on the next cycle.

4) state moves to INDY2. The contents of 00DF (80) appear on DI, and they are immediately put back on the AB[15:8] bus, while the result from the ZP+Y calculation is used as AB[7:0], so the full address is $8041. Because there is no carry (not visible in diagram), this address is correct.

5) state moves to FETCH. AB is now back to value of PC ($FD4A) to fetch the next instruction. On DI you can see $20 appear which is the contents of the memory at $8041, the result of the LDA ($DE), Y instruction. This result is then moved to the A register in the next cycle (not visible here).
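
The address arithmetic in the steps above can be sketched in a few lines. This models only the calculation, not the core's timing. `indy_address` is an illustrative name, and wrapping the second pointer fetch within the zero page is assumed from standard 6502 behaviour.

```python
# Sketch of the (zp),Y effective address calculation from the walk-through.
def indy_address(mem: dict[int, int], zp: int, y: int) -> tuple[int, bool]:
    """Return (effective address, carry out of the LSB+Y addition)."""
    lo = mem[zp & 0xFF]               # INDY0/INDY1: first zero page byte
    hi = mem[(zp + 1) & 0xFF]         # second zero page byte (00+DE+1 in the ALU)
    total = lo + y                    # LSB + Y, done while the MSB is being read
    carry = total > 0xFF
    addr = (((hi + carry) & 0xFF) << 8) | (total & 0xFF)
    return addr, carry


# Values from the trace: Y=1, $00DE=$40, $00DF=$80, and $8041 holds $20.
mem = {0xDE: 0x40, 0xDF: 0x80, 0x8041: 0x20}
addr, carry = indy_address(mem, 0xDE, 1)
print(f"address ${addr:04X}, carry={carry}, A <- ${mem[addr]:02X}")
```

With these values the LSB+Y sum is $41 with no carry, so the address formed in INDY2 is already the final one.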

I suggest you find the same signal names, find a similar instruction, and compare the results.

Arlet · Post by **Arlet** » Sun May 29, 2011 7:11 am

6502.org wrote:

Image no longer available: http://ladybug.xs4all.nl/arlet/fpga/staindy.png

On this 2nd waveform you can see a STA ($DE), Y with A=$A0, Y=$05, and DE,DF pointing to $8000.

The first couple of states are identical: DECODE, INDY0, INDY1, INDY2. But then the core moves to INDY3. The reason is that this is a write access and not a read, and when doing the LSB+Y calculation in the ALU, resulting here in ADD=$05 in the INDY2 cycle, there may be a page overflow, in which case the MSB must be incremented as well. To avoid writing to the wrong page, an extra cycle is taken here where the MSB+00+Carry is fed through the ALU. Because there was no carry in this case, the result on the AB bus doesn't change between INDY2 and INDY3.

In INDY3, the Accumulator is written to memory. You can see the WE=1, and $A0 appear on the DO bus, while $8005 appears on the AB bus.
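
A rough model of the difference between the read and the write case, under the assumptions stated here: a write always passes through INDY3, as described above, while a read does so only when LSB+Y carries. The read-with-carry case is inferred from the earlier walk-through rather than shown in either trace. All names are illustrative.

```python
# Sketch: state sequence and memory effect of STA (zp),Y versus LDA (zp),Y.
def indy_states(is_write: bool, carry: bool) -> list[str]:
    """Address-generation states; INDY3 is the MSB fix-up cycle."""
    states = ["DECODE", "INDY0", "INDY1", "INDY2"]
    if is_write or carry:             # assumption: reads skip INDY3 when no carry
        states.append("INDY3")
    return states


def sta_indy(mem: dict[int, int], zp: int, y: int, a: int) -> int:
    """Store A at the (zp),Y address and return that address."""
    lo, hi = mem[zp & 0xFF], mem[(zp + 1) & 0xFF]
    total = lo + y
    carry = total > 0xFF
    addr = (((hi + carry) & 0xFF) << 8) | (total & 0xFF)
    mem[addr] = a & 0xFF              # the write happens in INDY3 (WE=1)
    return addr


# Values from the second waveform: A=$A0, Y=$05, DE/DF pointing to $8000.
mem = {0xDE: 0x00, 0xDF: 0x80}
addr = sta_indy(mem, 0xDE, 0x05, 0xA0)
print(indy_states(is_write=True, carry=False))
print(f"${addr:04X} <- ${mem[addr]:02X}")
```

Delaying the write by one cycle means a store can never land on the uncorrected page, at the cost of one extra cycle even when there is no carry.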

BigEd · Post by **BigEd** » Sun May 29, 2011 8:11 am

great walk-throughs, thanks!

ElEctric_EyE · Post by **ElEctric_EyE** » Sun May 29, 2011 12:57 pm

Ditto!
I'm going to give what's left of my grey matter a rest for a couple days...

Arlet · Post by **Arlet** » Sun May 29, 2011 2:14 pm

I was playing with ISE, trying to optimize the timing for my design, and I discovered a nice tool (in ISE 13.1, but it may also be present in older versions):

Tools -> SmartXplorer. The first thing to try is 'Use built-in SmartXplorer strategies for Timing Performance', and hit OK.

It will try 7 different synthesis/map/place/route strategies, and select the best one. I often find that one strategy doesn't meet timing constraints, but some other one will. After a change in the source, if it fails timing again, try re-running this, as the best strategy may have changed.

Of course, it's best to worry about this when the design is (almost) finished, otherwise you'll spend too much time on this. A fast computer is recommended, especially with bigger designs.

BigEd · Post by **BigEd** » Sun May 29, 2011 9:19 pm

Interesting - I have it too (on version 12.4) so I had a quick play. I've been synthesising to a 20ns (50MHz) target because that's the crystal I have.

To make it do some work, I set a 15ns clock. With the default 'balanced' tactics I'd been using I now got

   Minimum period:  17.567ns{1}   (Maximum frequency:  56.925MHz)

(all these timings are post-routing)

With SmartXplorer the best result of the 7 standard tactics was:

   Minimum period:  15.939ns{1}   (Maximum frequency:  62.739MHz)

I then started exploring some other choices. With 'timing performance' and 'Performance with Physical Synthesis' I got

   Minimum period:  15.874ns{1}   (Maximum frequency:  62.996MHz)

and a suggestion to 'Increase the PAR Effort Level setting to "high"' (although that already seems to be the setting.)

At the next level of 'timing performance' and 'Performance without IOB packing' I meet the constraints with a result of

   Minimum period:  14.998ns{1}   (Maximum frequency:  66.676MHz)

which means I need to set a higher target... and now with a 14ns target I get

   Minimum period:  14.848ns{1}   (Maximum frequency:  67.349MHz)

I note that SmartXplorer does allow a lot more flexibility than just the 7 preset choices - it also allows running on several computers, if you have them. I agree that this is worth revisiting now and again, and only worth visiting when the design is ready. Looking carefully at timing reports and adjusting the design will usually get bigger initial gains than turning up the synthesis, and without taking up so much time.

But the bottom line is that a bit of experimentation gave me quite a healthy increase in clock speed.
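
As a quick cross-check of the figures above, the maximum frequency is just the reciprocal of the minimum period. The sketch below recomputes it for each reported run and compares the first and last results. The dictionary labels are shorthand, not ISE option names.

```python
# Sketch: convert the reported minimum periods (ns) to MHz and compare runs.
def mhz(period_ns: float) -> float:
    """Maximum clock frequency in MHz for a given minimum period in ns."""
    return 1000.0 / period_ns


runs = {
    "balanced, 15ns target": 17.567,
    "best of 7 SmartXplorer presets": 15.939,
    "performance with physical synthesis": 15.874,
    "performance without IOB packing": 14.998,
    "same, 14ns target": 14.848,
}

for label, period in runs.items():
    print(f"{label:38s} {period:7.3f} ns  {mhz(period):7.3f} MHz")

# Relative gain of the last run over the first one.
speedup = runs["balanced, 15ns target"] / runs["same, 14ns target"]
print(f"speedup: {speedup:.3f}x")
```

That works out to roughly an 18% higher clock from changing tool settings alone.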

Cheers
Ed

(Edit: and spartan6 is looking twice as fast as spartan3, so if in doubt, use a faster FPGA.)