A 6502 SoC Project using a Spartan 3 FPGA
- ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Dr Jefyll wrote:
... It might be good to do some more reading on the subject and hear it explained in different words, or a schematic maybe...
Alas, I do have part of it working. When I internally hardwire $55 to the Flash databus I see a grey screen fill at about 1/4 speed of a previous screen clear.
Arlet wrote:
Disadvantage of the clock switching method is that it needs to be done carefully to avoid glitches. It also creates extra delay in the clock path...
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
We had a wait-state topic going over here, complete with a schematic or two and WinCUPL code. The discussion was about implementing wait-states using a GAL, although it could be done solely with discrete logic if performance isn't an issue.
BTW, don't forget that an NMOS 6502's response to RDY is a bit broken, in that the MPU doesn't halt during a write cycle. Hence, using RDY to wait-state writes to slow hardware in an NMOS system isn't going to produce the desired results. This is yet another reason for using CMOS parts in new designs.
x86? We ain't got no x86. We don't NEED no stinking x86!
BigDumbDinosaur wrote:
... using RDY to wait-state writes to slow hardware in an NMOS system isn't going to produce the desired results.
I agree that it's not super-convenient, but it's not broken. What you need is a write buffer between the CPU bus and the memory subsystem. At which point, stretching or suppressing clock pulses starts to look much more attractive.
(Edit to add: as DrJefyll's linked Hardware Ref says, the case they had in mind at the time was slow ROMs, so reads were the practical problem to solve, not writes.)
Last edited by BigEd on Wed Apr 13, 2011 8:55 pm, edited 1 time in total.
There's some basic, introductory info in the MCS6500 Family Hardware Manual. In section 1.4.1.2.8 it tells you the RDY pin description (same as any 65xx02 datasheet will list), and section 2.3.4, "Application of RDY to controlling the Memory Interface" has further info, including an example schematic. The Manual is available here on 6502.org and also I found an HTML-ified version here on www.kim-1.com.
The schematic in the manual is very simple, and generates just a single wait state, stretching the access from one cycle to two. (NB: The box "PROM address detection" evidently produces an active-high output.) For a 1 MHz system, you'd increase your access time by 1 μs, compared to the no-wait-state value.
-- Jeff
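For anyone who would rather read HDL than gates, here is a minimal Verilog sketch of a one-wait-state generator in the spirit of the manual's schematic. It is not a transcription of that schematic: the module and signal names (`wait_one`, `phi2`, `slow_sel`, `waited`) are made up for illustration, `slow_sel` is assumed to be a clean active-high decode of the slow address range, and the choice of clock edge is an assumption.

```verilog
// One-wait-state generator (illustrative sketch, hypothetical names).
// slow_sel: active-high decode of the slow device's address range.
// rdy:      drives the CPU's RDY input; low during the first cycle
//           of each slow access, high during the second.
module wait_one (
    input  wire phi2,      // CPU clock
    input  wire slow_sel,  // 1 = current address is in the slow region
    output wire rdy
);
    reg waited;            // 1 once the wait cycle has been spent

    initial waited = 1'b0;

    // Sample at the end of each cycle (falling edge of phi2), while
    // the address of the current cycle is still on the bus.
    always @(negedge phi2)
        waited <= slow_sel & ~waited;

    // Hold RDY low only during the first cycle of a slow access.
    assign rdy = ~(slow_sel & ~waited);
endmodule
```

Each access to the slow region then takes two cycles instead of one. On an NMOS part this only helps reads, since RDY does not stall a write cycle.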
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
Dr Jefyll wrote:
There's some basic, introductory info in the MCS6500 Family Hardware Manual. In section 1.4.1.2.8 it tells you the RDY pin description (same as any 65xx02 datasheet will list), and section 2.3.4, "Application of RDY to controlling the Memory Interface" has further info, including an example schematic. The Manual is available here on 6502.org and also I found an HTML-ified version here on www.kim-1.com.
The schematic in the manual is very simple, and generates just a single wait state, stretching the access from one cycle to two. (NB: The box "PROM address detection" evidently produces an active-high output.) For a 1 MHz system, you'd increase your access time by 1 μs, compared to the no-wait-state value.
-- Jeff
x86? We ain't got no x86. We don't NEED no stinking x86!
BigDumbDinosaur wrote:
Most of that is of historical interest now, as even slow RAM is very fast. Where the wait-states might matter is with I/O hardware. But even then, most of the I/O hardware we use nowadays is much faster than a 6502 running at 1 or 2 MHz.
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
Arlet wrote:
BigDumbDinosaur wrote:
Most of that is of historical interest now, as even slow RAM is very fast. Where the wait-states might matter is with I/O hardware. But even then, most of the I/O hardware we use nowadays is much faster than a 6502 running at 1 or 2 MHz.
x86? We ain't got no x86. We don't NEED no stinking x86!
- ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
I'm sure it can go faster, but the display it is interfaced to limits the speed. According to ISE, the top limit should be around 53MHz.
Though, if I were not limited by the top speed of the FPGA, it might go even faster in a Spartan 6. I've got my eye on an XC6SLX9-3TQG144I, the 144-pin QFP version! Avnet had them last week, but someone scooped them up already.
----------------------------------------------------------------------------------------------------
Let me stop right there, as I realize I'm currently limiting the CPU because I know it works based on what I can see on the display, but...
What if I had some kind of state machine controlling the MAX O2, for the internal CPU only, that monitored an 8-bit port for a "key"? This key would be sent by the 6502 at X MHz and would be included in all subroutines. If the state machine didn't see the key, it would lower the speed from (X) MHz to (X-1) MHz, and would keep doing so until it saw the key. This would be another reason to use the Spartan 6: its DCMs can be externally programmed via SPI, unlike the Spartan 3, where it is set once by the FPGA PROM on startup.
Of course I would have to fix the problem I'm having right now...
Running everything at 12MHz (no O2 switching), I powered up the scope and focused on the data lines out of the Flash. I am not seeing the expected $FF. I was using it in a read only fashion. But maybe they come shipped with all '0's and I need to do a chip erase. For that I have to make the dedicated Flash databus 2-way, which I'm working on now. Since it is dedicated there's no need for tristating.
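The "key" idea could be sketched roughly like this in Verilog. Everything here is hypothetical: the module name, the key value `8'hA5`, the timeout, and the speed limits are placeholders, `key_we` is assumed to be a one-clock pulse already synchronised to a fixed reference clock, and the step from the `mhz` output to actually reprogramming a DCM is left out entirely.

```verilog
// Key watchdog sketch (hypothetical names and constants throughout).
// If the CPU does not write KEY to the port within TIMEOUT reference
// clocks, the requested speed is stepped down by 1 MHz and the wait
// starts again.
module speed_governor #(
    parameter [7:0]  KEY     = 8'hA5,      // placeholder key value
    parameter [15:0] TIMEOUT = 16'd50000,  // placeholder timeout
    parameter [5:0]  MAX_MHZ = 6'd53,      // placeholder top speed
    parameter [5:0]  MIN_MHZ = 6'd1
) (
    input  wire       ref_clk,   // fixed reference clock, not the CPU clock
    input  wire       key_we,    // 1 for one ref_clk per write to the key port
    input  wire [7:0] key_data,  // byte written to the key port
    output reg  [5:0] mhz        // requested CPU speed in MHz
);
    reg [15:0] timer;

    initial begin
        timer = 16'd0;
        mhz   = MAX_MHZ;
    end

    always @(posedge ref_clk)
        if (key_we && key_data == KEY)
            timer <= 16'd0;            // key seen: CPU is alive at this speed
        else if (timer == TIMEOUT) begin
            timer <= 16'd0;            // no key in time: step down and retry
            if (mhz > MIN_MHZ)
                mhz <= mhz - 6'd1;
        end else
            timer <= timer + 16'd1;
endmodule
```

A CPU that has crashed from being overclocked would presumably also need a reset after each step down; that is not modelled here.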
implementing RDY in verilog core
Arlet wrote:
When I have some more time, I should take a look at what it takes to implement RDY in the core.
I think we need a new topic all about RDY which can reference the many and varied observations that have been made. But in the meantime, my own mental model - which could be wrong, and I think I've had doubts in the past - is that RDY is sampled on the falling edge, and acts to stall the machine. (At least for 65C02)
If that's right, then perhaps your clocked processes transform in a simple way to
Code:
    // Update the address hold registers only while RDY is high;
    // with RDY low they keep their previous values.
    always @(posedge clk)
        if (rdy)
            begin
                ABL <= AB[7:0];
                ABH <= AB[15:8];
            end
If it's more subtle than that, I'm all ears. Also, I'm not sure how that kind of coding affects the implementation - it might not be smart. Or even correct!
Cheers
Ed
Hmm... according to the Hardware Manual, section 1.4.1.2.8:
Quote:
The RDY function will not stop the processor in a cycle in which a WRITE operation is being performed. If the RDY line goes from high to low during a WRITE cycle the processor will execute that cycle and will then stop in the next READ cycle (R/W = 1).
I wonder how that's supposed to work with slow devices that are written to.
As for my core, I think the test for RDY should go in the microcode state machine; in other words, the assignment to 'state' should be skipped when RDY=0.
As a result, it looks like most of the other stuff stays the same too (except for DI, which is the one we'd expect to change).
In addition to your change above, the assignment to ADD (ALU output) also needs to be avoided when RDY=0, because some states loop ADD back into the ALU input, which would otherwise cause a feedback loop.
That's all I see for now... but I may be missing things.
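As a rough picture of the gating described above, here is a toy Verilog module. It is not the actual core source: the module name, port widths, and the `next_state` and `alu_result` inputs are invented for illustration, and only `state`, `ADD`, `clk` and RDY are names taken from the discussion.

```verilog
// Toy illustration only; this is not the actual core source.
// Both the state register and the registered ALU output are frozen
// while RDY is low, so the stalled cycle simply repeats.
module rdy_gate_demo (
    input  wire       clk,
    input  wire       rdy,         // 1 = run, 0 = stall
    input  wire [2:0] next_state,  // from the (omitted) state decode
    input  wire [7:0] alu_result,  // combinational ALU output
    output reg  [2:0] state,
    output reg  [7:0] ADD          // fed back into the ALU in some states
);
    initial begin
        state = 3'd0;
        ADD   = 8'd0;
    end

    always @(posedge clk)
        if (rdy) begin
            state <= next_state;   // skipped while RDY=0
            ADD   <= alu_result;   // skipped too, or ADD would be fed
                                   // back through the ALU every stalled clock
        end
endmodule
```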
Sorry, I didn't just mean that single statement, I meant every 'posedge' always block. I think that covers all the assignments you mention.
As you say, the NMOS hardware ref says that there's no stall for write cycles - my model for RDY isn't like that. (I think that wasn't a killer bug, because writing to a slow peripheral like a teletype just means writing through a latch and having the driver poll some status to avoid overrunning. Probably slow RAM wasn't even a case under consideration, so back-to-back writes weren't an issue. In fact, you could have slow high RAM, for banked video or program RAM: so long as you avoid a slow stack you'd be [able to get away with a latch to capture writes].)
Cheers
Ed
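A latch that captures writes might look something like the following single-entry sketch. All names are hypothetical, `slow_done` is assumed to be a handshake from whatever logic drains the latch into the slow device, and a write that arrives while `busy` is set is simply dropped, which is the overrun the driver's status polling is meant to avoid.

```verilog
// Single-entry posted-write latch (illustrative, hypothetical names).
module write_latch (
    input  wire        clk,       // CPU clock
    input  wire        cpu_we,    // 1 = CPU writes to the slow region this cycle
    input  wire [15:0] cpu_addr,
    input  wire [7:0]  cpu_data,
    input  wire        slow_done, // 1 = slow device has taken the latched write
    output reg  [15:0] lat_addr,
    output reg  [7:0]  lat_data,
    output reg         busy       // status bit for the driver to poll
);
    initial busy = 1'b0;

    always @(posedge clk)
        if (cpu_we && !busy) begin
            lat_addr <= cpu_addr;  // capture the write at full CPU speed
            lat_data <= cpu_data;
            busy     <= 1'b1;
        end else if (slow_done)
            busy <= 1'b0;          // slow side finished; latch is free again
endmodule
```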
- ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA