A 6502 SoC Project using a Spartan 3 FPGA

Post by **ElEctric_EyE** » Tue Apr 12, 2011 11:59 am

Dr Jefyll wrote:

... It might be good to do some more reading on the subject and hear it explained in different words, or a schematic maybe...

Yes, I should've read up on it some more. I think BDD has a schematic with RDY hardware somewhere. My alter-ego was a little too proud of his creation, heh!

At least I do have part of it working. When I internally hardwire $55 onto the Flash databus, I see a grey screen fill at about 1/4 the speed of the earlier screen clear.

Arlet wrote:

Disadvantage of the clock switching method is that it needs to be done carefully to avoid glitches. It also creates extra delay in the clock path...

I think I chose a design that avoids glitches and metastability issues, but my problem may be the delay you mention... My weekend is finally here, so I have many hours to experiment.

Post by **Arlet** » Tue Apr 12, 2011 12:01 pm

By the way, does anybody know a good reference for the details of the RDY signal on the 6502?

Post by **BigDumbDinosaur** » Tue Apr 12, 2011 8:35 pm

We had a wait-state topic going over here, complete with a schematic or two and WinCUPL code. The discussion was about implementing wait-states using a GAL, although it could be done solely with discrete logic if performance isn't an issue.

BTW, don't forget that an NMOS 6502's response to RDY is a bit broken, in that the MPU doesn't halt during a write cycle. Hence, using RDY to wait-state writes to slow hardware in an NMOS system isn't going to produce the desired results. This is yet another reason for using CMOS parts in new designs.

Post by **BigEd** » Tue Apr 12, 2011 8:50 pm

BigDumbDinosaur wrote:

... using RDY to wait-state writes to slow hardware in an NMOS system isn't going to produce the desired results.

It does make some kind of sense, if you think of memory subsystems. The slow memory subsystem has to catch the data (quickly) and do something with it (slowly). The worst case is three back-to-back writes (on an interrupt). Only on a read is it necessary to stall the CPU, because the CPU can't make progress without the data.

I agree that it's not super-convenient, but it's not broken. What you need is a write buffer between the CPU bus and the memory subsystem. At which point, stretching or suppressing clock pulses starts to look much more attractive.

(Edit to add: as DrJefyll's linked Hardware Ref says, the case they had in mind at the time was slow ROMs, so reads were the practical problem to solve, not writes.)
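To make the write-buffer idea concrete, here is a minimal Verilog sketch, assuming a synchronous bus clocked once per CPU cycle. All module, parameter and signal names (`write_buffer`, `WRITE_CYCLES`, `cpu_we`, `busy`, etc.) are hypothetical. The buffer catches a write in one clock and then holds address and data steady for the slow device. A second write arriving while `busy` is high is not captured in this sketch, so that is the point where the CPU would have to be held (or the buffer made deeper): exactly the back-to-back write case above.

```verilog
// Hypothetical one-entry write buffer: catches a CPU write in one clock,
// then holds it for the slow device for WRITE_CYCLES clocks.
// Limitation (deliberate, for brevity): a write arriving while busy is NOT captured.
module write_buffer #(
    parameter WRITE_CYCLES = 4          // assumed slow-device write time, in CPU clocks
) (
    input  wire        clk,
    input  wire        reset,
    input  wire        cpu_we,          // 1 = CPU is writing this cycle
    input  wire [15:0] cpu_addr,
    input  wire [7:0]  cpu_dout,
    output reg  [15:0] mem_addr,        // held steady for the slow device
    output reg  [7:0]  mem_din,
    output wire        mem_we,          // write strobe to the slow device
    output wire        busy             // still draining a previous write
);
    reg [7:0] count;                    // clocks left in the current slow write

    assign busy   = (count != 0);
    assign mem_we = busy;

    always @(posedge clk)
        if (reset)
            count <= 0;
        else if (cpu_we && !busy) begin
            // Catch the write in a single CPU cycle...
            mem_addr <= cpu_addr;
            mem_din  <= cpu_dout;
            count    <= WRITE_CYCLES;
        end else if (busy)
            // ...then let the slow device take its time.
            count <= count - 1;
endmodule
```

In a real design `busy` is the signal you would feed into the CPU's stall logic if another access to the slow device comes along before the buffer drains.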

Post by **Dr Jefyll** » Wed Apr 13, 2011 1:43 am

There's some basic, introductory info in the MCS6500 Family Hardware Manual. In section 1.4.1.2.8 it tells you the RDY pin description (same as any 65xx02 datasheet will list), and section 2.3.4, "Application of RDY to controlling the Memory Interface" has further info, including an example schematic. The Manual is available here on 6502.org and also I found an HTML-ified version here on www.kim-1.com.

The schematic in the manual is very simple, and generates just a single wait state, stretching the access from one cycle to two. (NB: The box "PROM address detection" evidently produces an active-high output.) For a 1 MHz system, you'd increase your access time by 1 μs, compared to the no-wait-state value.

-- Jeff
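The single-wait-state scheme can be sketched synchronously in Verilog. This is an illustration under stated assumptions, not a transcription of the manual's schematic: it assumes one `clk` tick per bus cycle and an active-high `prom_sel` from the address decoder (matching the NB above), and it ignores exactly when within the cycle the CPU samples RDY. The module and signal names are hypothetical. RDY is pulled low for the first cycle of every PROM access and released for the second.

```verilog
// Hypothetical single-wait-state generator: every PROM access takes two cycles.
module one_wait_state (
    input  wire clk,          // one tick per bus cycle (assumption)
    input  wire reset,
    input  wire prom_sel,     // active-high "PROM address detection" output
    output wire rdy           // to the CPU's RDY input
);
    reg waited;               // set once the extra cycle has been inserted

    always @(posedge clk)
        if (reset) waited <= 1'b0;
        else       waited <= prom_sel & ~waited;   // clears after the access completes

    // Low during the first cycle of a PROM access, high during the second.
    assign rdy = ~(prom_sel & ~waited);
endmodule
```

Because the CPU holds the address while stalled, `prom_sel` stays high across both cycles, and `waited` clears again so that the next PROM access (for example the next opcode byte) also gets its wait state.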

Post by **BigDumbDinosaur** » Wed Apr 13, 2011 4:16 am

Dr Jefyll wrote:

There's some basic, introductory info in the MCS6500 Family Hardware Manual. ... For a 1 MHz system, you'd increase your access time by 1 μs, compared to the no-wait-state value.

Most of that is of historical interest now, as even slow RAM is very fast. Where the wait-states might matter is with I/O hardware. But even then, most of the I/O hardware we use nowadays is much faster than a 6502 running at 1 or 2 MHz.

Post by **Arlet** » Wed Apr 13, 2011 5:19 am

BigDumbDinosaur wrote:

Most of that is of historical interest now, as even slow RAM is very fast. Where the wait-states might matter is with I/O hardware. But even then, most of the I/O hardware we use nowadays is much faster than a 6502 running at 1 or 2 MHz.

It's still relevant if you want to hook up a 38 MHz 6502 to a 70 ns flash chip.
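As a rough worked example of why that combination needs wait states, ignoring address-decode delay, FPGA I/O delays and data setup time (all of which only make it worse):

$$
T_{\text{clk}} = \frac{1}{38\ \text{MHz}} \approx 26.3\ \text{ns}, \qquad
n_{\text{cycles}} = \left\lceil \frac{70\ \text{ns}}{26.3\ \text{ns}} \right\rceil = \lceil 2.66 \rceil = 3
$$

So each flash read needs at least three clock cycles, i.e. at least two wait states, and possibly more once the ignored delays are taken out of the usable window.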

Post by **BigDumbDinosaur** » Wed Apr 13, 2011 2:37 pm

Arlet wrote:

It's still relevant if you want to hook up a 38 MHz 6502 to a 70 ns flash chip.

I'm waiting for that 38 MHz 6502 to be released by MOS Technology.

Post by **Arlet** » Wed Apr 13, 2011 2:47 pm

You can make your own 38MHz 6502 with an FPGA.

Post by **ElEctric_EyE** » Wed Apr 13, 2011 2:58 pm

I'm sure it can go faster, but the display it is interfaced to limits the speed. According to ISE, the top limit should be around 53 MHz.
And if I weren't limited and could push the FPGA to its top speed, it might go even faster in a Spartan 6. I've got my eye on the XC6SLX9-3TQG144I, the 144-pin QFP version! Avnet had them last week, but someone has already scooped them up.
----------------------------------------------------------------------------------------------------

Let me stop right there: I realize I'm currently limiting the CPU to a speed at which I know it works, based on what I can see on the display, but...
What if I had some kind of state machine controlling the maximum O2 clock rate, for the internal CPU only, that monitors an 8-bit port for a "key"? The 6502 would send this key while running at X MHz, and it would be included in all subroutines. If the state machine didn't see the key, it would lower the speed from X MHz to (X-1) MHz, and keep doing so until it saw the key. This would be another reason to use the Spartan 6: its DCMs can be externally programmed via SPI, unlike the Spartan 3's, which are set once by the FPGA PROM on startup.
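A minimal Verilog sketch of that "key" watchdog, under several assumptions: the key value, the timeout window and the width of the speed setting are hypothetical parameters, the watchdog runs from a stable clock that is not the one being slowed down, and how `speed` is turned into an actual DCM setting is left open. If the CPU does not write `KEY` to the monitored port within `TIMEOUT` clocks, the speed drops one step, and it keeps dropping each window until the key shows up.

```verilog
// Hypothetical speed-stepping watchdog. All names and values are assumptions.
module speed_watchdog #(
    parameter [7:0]  KEY       = 8'hA5,     // assumed key value
    parameter [3:0]  MAX_SPEED = 4'd15,     // assumed top speed step
    parameter [15:0] TIMEOUT   = 16'd1000   // assumed window, in watchdog clocks
) (
    input  wire       clk,         // stable clock, NOT the CPU clock being varied
    input  wire       reset,
    input  wire       port_we,     // CPU wrote the monitored port this clock
    input  wire [7:0] port_data,
    output reg  [3:0] speed        // step index for the clock generator
);
    reg [15:0] timer;
    wire key_seen = port_we && (port_data == KEY);

    always @(posedge clk)
        if (reset) begin
            speed <= MAX_SPEED;               // start at the top step
            timer <= 0;
        end else if (key_seen)
            timer <= 0;                       // CPU is alive at this speed
        else if (timer == TIMEOUT - 1) begin
            timer <= 0;
            if (speed != 0)
                speed <= speed - 1;           // no key this window: drop one step
        end else
            timer <= timer + 1;
endmodule
```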

Of course I would have to fix the problem I'm having right now...
Running everything at 12 MHz (no O2 switching), I powered up the scope and focused on the data lines coming out of the Flash. I'm not seeing the expected $FF. I was using it in a read-only fashion, but maybe they come shipped with all '0's and I need to do a chip erase. For that I have to make the dedicated Flash databus two-way, which I'm working on now. Since it is dedicated, there's no need for tristating.
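One thing worth checking when making the bus two-way: a dedicated bus removes the need to share it with other devices, but the FPGA's own data pins still have to be released (high-Z) whenever the flash drives them during a read, or the two will drive against each other. A minimal Verilog sketch of such a port, with hypothetical signal names; the flash's own output enable must also be inactive whenever `drive` is high.

```verilog
// Hypothetical two-way data port between the FPGA and the flash chip.
module flash_data_port (
    input  wire       drive,      // 1 = FPGA is writing to the flash
    input  wire [7:0] dout,       // data from the FPGA/CPU to the flash
    output wire [7:0] din,        // data read back from the flash
    inout  wire [7:0] flash_d     // the physical flash data pins
);
    // Drive the pins only while writing; otherwise release them (high-Z)
    // so the flash can drive them during a read.
    assign flash_d = drive ? dout : 8'bz;
    assign din     = flash_d;
endmodule
```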

Post by **BigEd** » Wed Apr 13, 2011 6:38 pm

Arlet wrote:

When I have some more time, I should take a look at what it takes to implement RDY in the core.

Hi Arlet
I think we need a new topic all about RDY, which can reference the many and varied observations that have been made. In the meantime, my own mental model (which could be wrong, and I think I've had doubts in the past) is that RDY is sampled on the falling edge and acts to stall the machine, at least on the 65C02.

If that's right, then perhaps your clocked processes transform in a simple way to:


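// Update the address registers only when RDY is high;
// while RDY is low they hold their values and the CPU stalls.
always @(posedge clk)
  if (rdy)
    begin
      ABL <= AB[7:0];
      ABH <= AB[15:8];
    end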

(perturbs your indentation but otherwise not very intrusive)

If it's more subtle than that, I'm all ears. Also, I'm not sure how that kind of coding affects the implementation; it might not be smart, or even correct!

Cheers
Ed
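On the implementation question: synthesis tools typically map a guarded assignment like `if (rdy) ...` in a posedge block onto the flip-flop's clock-enable input, and Spartan-3 slice flip-flops have a CE pin, so the guard should cost little or no extra logic. The sketch below (hypothetical module and signal names) shows that the guarded form behaves the same as an explicit hold mux.

```verilog
// Two equivalent ways to write an RDY-gated register (hypothetical names).
module rdy_gated_reg (
    input  wire       clk,
    input  wire       rdy,
    input  wire [7:0] d,
    output reg  [7:0] q_ce,     // form 1: guarded assignment (clock-enable style)
    output reg  [7:0] q_mux     // form 2: explicit feedback mux
);
    always @(posedge clk)
        if (rdy)
            q_ce <= d;          // no else: register holds when rdy == 0

    always @(posedge clk)
        q_mux <= rdy ? d : q_mux;
endmodule
```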

Post by **Arlet** » Wed Apr 13, 2011 7:05 pm

Hmm... according to the Hardware Manual, section 1.4.1.2.8:

Quote:

The RDY function will not stop the processor in a cycle in which a WRITE operation is being performed. If the RDY line goes from high to low during a WRITE cycle the processor will execute that cycle and will then stop in the next READ cycle (R/W = 1).

I wonder how that's supposed to work with slow devices that are written to.
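The quoted NMOS rule comes down to one line of logic: the CPU only stalls when RDY is low and the cycle is a read. A small Verilog sketch contrasting it with an RDY that stalls on any cycle (the behaviour BigDumbDinosaur attributes to CMOS parts above). The module name is hypothetical, and `we` = 1 for a write cycle is assumed to match the WE output convention of Arlet's core.

```verilog
// Hypothetical model of when RDY actually stalls the CPU.
module rdy_stall #(
    parameter NMOS = 1        // 1 = NMOS rule (writes complete), 0 = stall on any cycle
) (
    input  wire rdy,          // RDY pin as driven by external logic
    input  wire we,           // 1 = current cycle is a write (assumed convention)
    output wire advance       // 1 = CPU may complete this cycle
);
    // NMOS: a write cycle always completes; the stall lands on the next read.
    assign advance = NMOS ? (rdy | we) : rdy;
endmodule
```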

As far as my core goes, I think the test for RDY should go in the microcode state machine; in other words, the assignment to 'state' should be skipped when RDY=0.
As a result, it looks like most of the other registers stay the same too (except for DI, which is the one we'd expect to change).

In addition to your change above, the assignment to ADD (the ALU output) also needs to be skipped when RDY=0. Some states loop ADD back into the ALU input, so otherwise the stalled ALU would keep recomputing from its own output, creating a feedback loop.

That's all I see for now... but I may be missing things.
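A toy Verilog model of that point, assuming nothing about the real core beyond the names `state`, `ADD` and `RDY`: the module name, the next-state logic and the ALU below are hypothetical stand-ins, with the ALU simply incrementing its previous result to mimic the ADD-to-ALU-input loop. With both registers guarded by `RDY`, a stall freezes the state and the ALU result together.

```verilog
// Toy model (not Arlet's actual core) of the two registers discussed above.
module rdy_core_sketch (
    input  wire       clk,
    input  wire       reset,
    input  wire       RDY,
    output reg  [1:0] state,
    output reg  [7:0] ADD
);
    // Stand-in ALU that feeds on its own previous result.
    wire [7:0] alu_out = ADD + 8'd1;

    always @(posedge clk)
        if (reset)
            state <= 2'd0;
        else if (RDY)
            state <= state + 2'd1;   // stand-in for the microcode next-state logic

    always @(posedge clk)
        if (reset)
            ADD <= 8'd0;
        else if (RDY)
            ADD <= alu_out;          // without this guard, ADD keeps changing while stalled
endmodule
```

Removing the `else if (RDY)` guard on `ADD` while keeping it on `state` reproduces the problem described: the state is frozen, but the ALU result keeps advancing every stalled cycle.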

Post by **BigEd** » Wed Apr 13, 2011 7:16 pm

Sorry, I didn't just mean that single statement, I meant every 'posedge' always block. I think that covers all the assignments you mention.

As you say, the NMOS hardware ref says that there's no stall for write cycles; my model for RDY isn't like that. (I think that wasn't a killer bug, because writing to a slow peripheral like a teletype just means writing through a latch and having the driver poll some status to avoid overrunning. Probably slow RAM wasn't even a case under consideration, so back-to-back writes weren't an issue. In fact, you could have slow high RAM, for banked video or program RAM: as long as you avoid a slow stack, you'd be [able to get away with a latch to capture writes].)

Cheers
Ed

Post by **Arlet** » Wed Apr 13, 2011 7:23 pm

BigEd wrote:

Sorry, I didn't just mean that single statement, I meant every 'posedge' always block. I think that covers all the assignments you mention.

Aha, yes, that would do the trick. I was trying to find the minimal set of always blocks that needed to be modified.

Post by **ElEctric_EyE** » Wed Apr 13, 2011 11:20 pm

Is it really that easy to implement RDY, minus the 'wire' declarations, etc.?

I thought Arlet was saying in his previous post that he would have to get into the microcode?

