6502.org Forum

All times are UTC




 Post subject: Arlet's 6502 Core Timing
Posted: Fri Sep 11, 2015 7:14 pm by cr1901

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
I've been playing with Arlet's cute 6502 Verilog core recently, trying to embed it in an FPGA for a DAQ application that analyzes bus signals. The 6502 tells an FPGA logic analyzer to start collecting data, and it sends data to and receives data from a DUT.

To make my life easier, I've tried connecting the 6502 core to a shared wishbone bus with the following memory map:
0x0000-0x1FFF Uninitialized RAM
0x2000-0x3DFF Initialized RAM/ROM program (loaded either by the IPL or by the FPGA as part of bitstream initialization)
0x3E00-0x3EFF Wishbone I/O
0x3FF0-0x3FF9 IPL
0x3FFA-0x3FFF Vectors (taking advantage of incomplete address decoding here)
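As a rough illustration of the map above, here is a small Python decoder. It rests on one assumption that the post does not state: only address lines A13-A0 are decoded, so A15 and A14 are ignored. That is one way to get the incomplete decoding that makes the 6502 vectors at 0xFFFA-0xFFFF appear at 0x3FFA-0x3FFF. The region names are made up for the example.

```python
from typing import Optional

# Hypothetical model of the memory map above; region names are illustrative.
# Assumption: only A13..A0 are decoded (A15 and A14 ignored), which is one way
# to get the incomplete decoding that maps the CPU vectors at 0xFFFA-0xFFFF
# onto 0x3FFA-0x3FFF.
REGIONS = [
    (0x0000, 0x1FFF, "ram_uninit"),
    (0x2000, 0x3DFF, "ram_program"),
    (0x3E00, 0x3EFF, "wishbone_io"),
    (0x3FF0, 0x3FF9, "ipl"),
    (0x3FFA, 0x3FFF, "vectors"),
]


def decode(addr: int) -> Optional[str]:
    """Return the region a 16-bit CPU address selects, or None if unmapped."""
    a = addr & 0x3FFF  # ignore A15/A14, so 0xFFFC aliases to 0x3FFC
    for lo, hi, name in REGIONS:
        if lo <= a <= hi:
            return name
    return None


if __name__ == "__main__":
    for addr in (0x0100, 0x2000, 0x3E10, 0xFFFC, 0x3F80):
        print(f"{addr:#06x} -> {decode(addr)}")
```

With this decode, an access to 0xFE10 also lands in the Wishbone I/O window, and 0x3F00-0x3FEF is left unmapped because the list above does not assign it.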

To connect the core to a Wishbone bus, I've tied the 6502's RDY to Wishbone ACK. I've tied Wishbone STB and CYC to a signal that simply ensures the cycle after an ACK is a wait state for the 6502. Although the core loads onto my FPGA, any attempt to access I/O crashes.

Through simulation, I've found the cause. While the 6502 decodes the high 8 bits of an absolute-addressing-mode operand, there is a combinatorial path from Data In to the upper part of the Address Bus out. (For Arlet specifically: during state ABS1, RDY1 is high, so DIMUX takes the value of DI, and DI feeds back into AB[15:8].) If, e.g., I'm accessing I/O using an absolute addressing mode, the upper 8 bits of the address bus immediately take the value of the data-in bus without waiting for a posedge.

As an example using my address decoding above, assume there are two separate synchronous RAMs. One covers the 0x2000-0x3DFF and 0x3F00-0x3FFF regions, and the other covers 0x3E00-0x3EFF. Combinational address decoding ensures that only one RAM is selected at a time, by checking whether the top 8 bits of the address bus equal 0xFE.

Synchronous RAMs update their read data bus every cycle, regardless of whether they are selected; it's up to the CPU to latch the data or ignore it. All the RAM has to do is ACK when accessed, to indicate that the data is now valid. See "More Technical Information" below.
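Here is a minimal Python sketch of the bus in this example, under simplifying assumptions:

- `SyncRam`, `io_selected` and `read_mux` are hypothetical names.
- Each RAM refreshes a registered read-data output on every clock edge, whether or not it is selected.
- The mux in front of the CPU is purely combinational and uses the 0xFE top-byte check described above.

```python
# Hypothetical model of the example bus: two synchronous RAMs whose registered
# read data is refreshed on every clock edge (selected or not), plus a
# combinational mux that picks one of them from the *current* address.


class SyncRam:
    def __init__(self, contents: dict) -> None:
        self.mem = dict(contents)
        self.rdata = 0  # registered read-data output

    def posedge(self, addr: int) -> None:
        # Read data follows the address sampled at the edge, regardless of select.
        self.rdata = self.mem.get(addr, 0)


def io_selected(addr: int) -> bool:
    """Combinational decode: the I/O RAM owns addresses whose top byte is 0xFE."""
    return (addr >> 8) == 0xFE


def read_mux(addr: int, prog: SyncRam, io: SyncRam) -> int:
    """Combinational read mux feeding the CPU's data-in bus."""
    return io.rdata if io_selected(addr) else prog.rdata


if __name__ == "__main__":
    prog = SyncRam({0x2002: 0xFE})  # operand high byte of an absolute access
    io = SyncRam({0xFE10: 0x55})    # the I/O register the program wants
    for ram in (prog, io):
        ram.posedge(0x2002)         # both RAMs sample the operand-fetch address
    print(hex(read_mux(0x2002, prog, io)))  # program RAM's byte
    print(hex(read_mux(0xFE10, prog, io)))  # I/O RAM's stale output, not 0x55
```

The second `read_mux` call shows the hazard in miniature. If the address changes mid-cycle, the mux switches to the I/O RAM, whose registered output still reflects the old address.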

Now suppose the conditions are met for the address bus to be combinationally linked to the input data bus, i.e. an absolute addressing mode (read or write doesn't matter). The CPU reads the top 8 bits of an absolute address from the SRAM at 0x2000-0x3DFF or 0x3F00-0x3FFF. That input data is fed straight to the address bus. At the next posedge, the CPU can then either latch new input data from the absolute address (a read) or write new output data to it (a write).

Suppose the absolute address to be accessed is NOT in the same address-decoding region as the address used to fetch its top 8 bits. As soon as the new address matches the other region, the muxes switch from one device's read data bus and control signals to the other's. This of course causes a different device to be accessed. That device in turn sends data to the data-in bus without waiting, and things get ugly from there. Simulation in my case confirms a combinatorial loop, but there's no guarantee that the new data the CPU "sees" will be valid at all!
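The loop can be explored with a zero-delay model. Treat the two combinational relations (AB[15:8] = DI, and DI = mux output selected by AB) as equations and iterate them. This is a hypothetical abstraction, not a gate-level simulation; real logic has propagation delays and may glitch or settle on arbitrary values. `settle`, `prog_rdata` and `io_rdata` are illustrative names, and the address low byte is held fixed.

```python
# Hypothetical zero-delay model of the loop: during ABS1 the address high byte
# follows DI (AB[15:8] = DI), while DI comes from a combinational mux selected
# by AB. Real gates have delays and may glitch; this only asks whether the two
# equations have a consistent solution.


def settle(low_byte: int, start_hi: int, prog_rdata: int, io_rdata: int,
           max_iters: int = 16):
    """Return ("stable", addr) or ("oscillates", [addresses in the cycle])."""

    def di_for(addr: int) -> int:
        # Same decode as the example: top byte 0xFE selects the I/O RAM.
        return io_rdata if (addr >> 8) == 0xFE else prog_rdata

    addr = (start_hi << 8) | low_byte
    seen = [addr]
    for _ in range(max_iters):
        nxt = (di_for(addr) << 8) | low_byte  # high byte follows DI at once
        if nxt == addr:
            return ("stable", addr)
        if nxt in seen:
            return ("oscillates", seen[seen.index(nxt):])
        seen.append(nxt)
        addr = nxt
    return ("oscillates", seen)
```

Some example runs:

- The operand fetch returns 0xFE and the I/O RAM still presents a stale 0x20. The address bounces between 0x2010 and 0xFE10.
- The I/O RAM happens to present 0xFE. The loop settles at 0xFE10, but only by luck.
- The target stays in the program region (high byte 0x21). The loop settles at 0x2110.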

I'm not sure how to fix this problem and am looking for advice and/or a timing diagram. I suppose the easiest fix would be to add a register to the address lines to delay changes by one clock. However, I'm not sure whether this would cause improper operation elsewhere. I assume there is a reason the address takes the value of the data bus immediately instead of waiting for a clock transition. The same goes for adding a register to data in to ensure a one-cycle delay.
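One of the fixes floated above, a register on data in, can be sketched as follows. This is a hypothetical model; `RegisteredDi` is not part of Arlet's core. The address high byte is computed only from a register output, so it cannot change in the middle of a cycle. The cost is that the core sees the bus byte one clock later.

```python
# Hypothetical sketch of a register between the bus mux and the core's DI
# input. AB[15:8] depends only on last cycle's registered byte, so it stays
# fixed for the whole cycle and the combinational loop is broken, at the cost
# of one extra clock per access.


class RegisteredDi:
    def __init__(self) -> None:
        self.di_q = 0  # register output seen by the core

    def abs1_address(self, low_byte: int) -> int:
        # Depends only on register outputs: stable until the next edge.
        return (self.di_q << 8) | (low_byte & 0xFF)

    def posedge(self, bus_data: int) -> None:
        # Capture whatever the mux presents at the clock edge.
        self.di_q = bus_data & 0xFF


if __name__ == "__main__":
    r = RegisteredDi()
    r.posedge(0xFE)                 # high byte captured at the edge
    print(hex(r.abs1_address(0x10)))
    # Even if the mux output changes now, the address cannot follow it
    # until the next posedge.
    print(hex(r.abs1_address(0x10)))
```

The price is the extra clock per access, which matches the halved speed reported later in the thread.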

From what I understand, Arlet's core effectively divides the input clock by two to simulate activity on both positive and negative transitions of the clock. Have I possibly discovered a bug, or am I just interfacing to other parts of the FPGA improperly? Is there a proper timing diagram reference for Arlet's core?


More Technical Information:
Unfortunately, in a shared Wishbone topology, a combinational mux is traditionally the only thing that stops the CPU from "seeing" the data a device puts out while the device isn't being accessed. Although that data may be invalid, it's legal for Wishbone devices to put data on their read buses when they are not being accessed. Only a mux across all devices' read data buses prevents the CPU from seeing this data. There are topologies where the read data does in fact reach the CPU every cycle! The CPU is not supposed to react to or latch the data until it gets a signal from the device that the data is valid.

EDIT1: RDY=>RDY1


Posted: Fri Sep 11, 2015 7:53 pm by BigEd

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Arlet will give the best answer, I'm sure, but in the meantime: I think I remember that his core is designed for synchronous RAM. The address is output just before the clock edge, so that the RAM can clock it in and respond. Most of us are used to the 6502's interface, which is made for conventional SRAM: the address is output just after the clock edge, and the RAM figures out what to do during the clock cycle.

So it's probably a step forward, if not the whole story, to clock the address pins of the core to get closer to a conventional 6502 setup. Probably RnW has a similar timing aspect.

Having said that, I'm not sure what Wishbone needs to see. Perhaps it needs exactly this kind of interface?

(I don't believe Arlet's core uses both edges of the clock.)

Hope this helps.

Oh, and this is a previous discussion: viewtopic.php?p=14677#p14677


Posted: Fri Sep 11, 2015 8:18 pm by cr1901

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
Okay, so it's not just me... looks like you ran into the same problem I did.

This is a real pity though... because of this, the core will run at no more than two-thirds of full speed, even though the RAM itself is synchronous. The address decoding in my design is not synchronous. So synchronous RAM accesses are delayed by a full clock cycle because of how the components in my design interact, even though the RAM is more than up to the task without the delay :/.


Posted: Sat Sep 12, 2015 6:25 am by Arlet

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I'm a bit confused about the state of the RDY signal during all of this. Is it being deasserted? And if so, can you show the timing? (Note that I have no experience/knowledge of Wishbone.)

User sleary78 opened an issue on github that had to do with RDY, and I think the core has a problem in cases where RDY is being deasserted, and the data bus is invalid. In those cases, the bad data is used to drive the address bus. https://github.com/Arlet/verilog-6502/issues/3
He closed the issue because he managed to find a workaround, but I still think there's a genuine bug in there.

The combinatorial path from data to address bus is simply to prevent extra waiting cycles.


Posted: Sat Sep 12, 2015 6:29 am by Arlet

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
BigEd wrote:
I don't believe Arlet's core uses both edges of the clock.

That is correct. Only positive clock edge is used, and the core cycles go 1-to-1 with the clock cycles.


Posted: Sat Sep 12, 2015 6:32 am by cr1901

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
Okay, after some timing/simulation analysis of Arlet's core, along with reading Arlet's website, I think I figured out most of the internals:

Data is fetched at the posedge of cycle X and interpreted at the posedge of cycle X + 1, as described on the website. This pipeline allows Arlet's 6502 core to run at full clock speed using only one edge of an FPGA clock. I think this is functionally equivalent to shifting an ASIC 6502's timing by half a clock cycle and having the address bus remain valid for longer than on an ASIC 6502. It would be nice to see a diagram of how ASIC 6502 cycles translate to Arlet 6502 cycles though :P.

To keep the pipeline from losing a cycle, write data becomes valid at the same time as the address bus, in cycle Y + 1. That is immediately after the core decodes, in cycle Y, that a write is going to take place. Contrast this with an ASIC 6502, where write data is only valid after the posedge (the address having become valid on the negedge).

To keep the pipeline from losing a cycle to refill during absolute-address reads/writes, the high byte is propagated directly to the address bus.

In any case, three changes are enough to create a Wishbone interface at half speed (a 50 MHz clock gives an effective 25 MHz): registering the data input, tying RDY to Wishbone ACK, and tying STB and CYC to the reset signal. All other signals connect directly to Arlet's core.
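The half-speed figure follows from simple cycle counting. The sketch below assumes, as described, that STB/CYC stay asserted and that ACK (wired to RDY) is high on every other clock. `ack_pattern` and `effective_cpu_hz` are illustrative names.

```python
# Hypothetical cycle count for the adapter described above: STB/CYC held
# asserted, Wishbone ACK driven straight into RDY, and a wait state after
# every access, so ACK (and therefore RDY) is high on every other clock.


def ack_pattern(n_clocks: int) -> list:
    """ACK on odd-numbered clocks: each CPU cycle costs two clocks."""
    return [clk % 2 == 1 for clk in range(n_clocks)]


def effective_cpu_hz(clock_hz: float, n_clocks: int = 1000) -> float:
    """CPU cycles completed per second: clock rate times fraction of RDY cycles."""
    acks = ack_pattern(n_clocks)
    return clock_hz * sum(acks) / len(acks)


if __name__ == "__main__":
    print(effective_cpu_hz(50e6))
```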


Posted: Sat Sep 12, 2015 6:44 am by cr1901

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
Arlet wrote:
I'm a bit confused about the state of the RDY signal during all of this. Is it being deasserted? And if so, can you show the timing? (Note that I have no experience/knowledge of Wishbone.)
RDY is deasserted every other clock cycle because Wishbone (without more control signals than I have available) requires two full clock cycles per transfer to work properly. I would need this wait state regardless of the issue I describe below, but I would not need the latched DI. See attachment.


Arlet wrote:
User sleary78 opened an issue on github that had to do with RDY, and I think the core has a problem in cases where RDY is being deasserted, and the data bus is invalid. In those cases, the bad data is used to drive the address bus. https://github.com/Arlet/verilog-6502/issues/3
He closed the issue because he managed to find a workaround, but I still think there's a genuine bug in there.
I'll look into it and see if I can flush that bug out.

EDIT: That user seems to have had a bug similar to mine. He needed a delay element on DI to prevent the combinational connection from DI to AB from accessing a bad address. I needed it to prevent an infinite loop. Whether you need this delay element depends on how your address decoding is set up. The core works fine AS LONG AS the only thing it's connected to is synchronous memory (and presumably synchronous I/O).

Unfortunately, my I/O is effectively asynchronous because its address decoding is asynchronous, and that is the root of the problem. Data is placed on the 6502 core's bus immediately instead of waiting for the next clock cycle. In addition, the synchronous memory's address-decoding mux is deactivated, so that memory stops driving the bus too early.

It's not really a bug, just a design decision with potentially unexpected consequences. Oh, and if you need this delay element like I did: yes, the clock speed is effectively halved because of the added wait state. (My complaint about two-thirds speed was caused by something unrelated that I managed to solve.)


Attachment: 6502.PNG


Last edited by cr1901 on Sat Sep 12, 2015 7:02 am, edited 3 times in total.
Posted: Sat Sep 12, 2015 6:54 am by Arlet

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I think the problem is that the combinatorial path from data to address bus is maintained even when RDY=0, where instead you would expect the address bus to stay fixed at the same value when RDY=0.
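Here is a hypothetical sketch of the behaviour Arlet describes as expected: while RDY is low, the address bus keeps the value it had on the previous clock instead of following DI. `AddressHold` and `comb_ab` are made-up names, and this is not the core's actual code or fix.

```python
# Hypothetical sketch: while RDY is low, drive the address from the last
# clock instead of the combinational value, so garbage on DI cannot move it.


class AddressHold:
    def __init__(self) -> None:
        self.ab_q = 0  # address driven during the previous cycle

    def address(self, comb_ab: int, rdy: bool) -> int:
        """comb_ab is what the core's logic proposes (e.g. {DI, low byte} in ABS1)."""
        return comb_ab if rdy else self.ab_q

    def posedge(self, comb_ab: int, rdy: bool) -> None:
        # Remember what was actually driven this cycle.
        self.ab_q = self.address(comb_ab, rdy)


if __name__ == "__main__":
    h = AddressHold()
    h.posedge(0x2002, True)                        # operand fetch completes
    garbage = (0x77 << 8) | 0x10                   # DI is not valid yet
    print(hex(h.address(garbage, rdy=False)))      # held at the old address
```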


Posted: Sat Sep 12, 2015 6:58 am by cr1901

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
Arlet wrote:
I think the problem is that the combinatorial path from data to address bus is maintained even when RDY=0, where instead you would expect the address bus to stay fixed at the same value when RDY=0.
See my edited post. I think what you just described is a side effect of the real issue.


Posted: Sat Sep 12, 2015 7:03 am by Rob Finch

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
Quote:
I've tied 6502 RDY to Wishbone ACK

I believe the RDY line needs to be active for the 6502 even when there are invalid bus cycles, or bus cycles that don't access I/O or memory.
You could generate the CYC and STB signals whenever the '02 address changes, and negate CYC and STB when there is an ACK. I think you need to generate CYC and STB signals all the time so that a corresponding ACK/RDY is generated. Interfacing WISHBONE to the 6502 bus could be non-trivial.
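Rob's suggestion can be sketched as a small state machine. This is an illustrative Python model with hypothetical names; only CYC, STB and ACK come from Wishbone. It raises CYC/STB when the CPU address differs from the last one seen and drops them on ACK.

```python
# Hypothetical model: assert CYC/STB whenever the CPU address changes and
# negate them when the slave ACKs. Class and method names are illustrative.


class CycStbGen:
    def __init__(self) -> None:
        self.last_addr = None
        self.cyc_stb = False

    def posedge(self, addr: int, ack: bool) -> bool:
        """Update on a clock edge and return CYC/STB for the next cycle."""
        if ack:
            self.cyc_stb = False          # current transfer has completed
        if addr != self.last_addr:
            self.cyc_stb = True           # new address: start a new transfer
            self.last_addr = addr
        return self.cyc_stb


if __name__ == "__main__":
    g = CycStbGen()
    print(g.posedge(0x2000, ack=False))   # new address, transfer starts
    print(g.posedge(0x2000, ack=True))    # slave ACKs, transfer ends
    print(g.posedge(0x2001, ack=False))   # next address, next transfer
```

A change detector like this never starts a second transfer when the CPU makes two consecutive accesses to the same address. That is one reason to keep CYC/STB asserted all the time, as suggested above.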
Using the WISHBONE bus with Arlet's core will cut its performance in half, because WISHBONE requires a minimum of two clock cycles per bus access.
Unless you are planning on using WISHBONE bus access with peripherals, I'd be tempted to use the 6502-style bus (for RAM / ROM etc). To interface with synchronous RAMs / ROMs in the FPGA, an FF is required to create a ready signal delayed by a single cycle.
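Here is a minimal sketch of that single flip-flop. It assumes a 6502-style bus where the address becomes valid after the clock edge, so a synchronous RAM needs one extra clock: RDY stays low on the first cycle of an access and goes high on the second. `DelayedReady` is a hypothetical name. Clearing the flop after a ready cycle is one possible design choice; it guarantees the wait state for back-to-back accesses too.

```python
# Hypothetical one-flip-flop ready generator for a synchronous RAM on a
# 6502-style bus (address valid after the edge). Each access takes two
# clocks: RDY low while the RAM samples the address, then RDY high while
# the RAM presents the data.


class DelayedReady:
    def __init__(self) -> None:
        self.rdy_q = False

    @property
    def rdy(self) -> bool:
        return self.rdy_q

    def posedge(self, selected: bool) -> None:
        # Set one clock after the RAM is selected; clear after a ready cycle
        # so the next access also gets its wait state.
        self.rdy_q = selected and not self.rdy_q


if __name__ == "__main__":
    d = DelayedReady()
    for _ in range(4):
        print(d.rdy)
        d.posedge(selected=True)
```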



Posted: Sat Sep 12, 2015 7:57 am by cr1901

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
Rob Finch wrote:
I think you need to generate CYC and STB signals all the time so that a corresponding ACK/RDY is generated.
Indeed, this is what I do. The core is doing a perpetual Wishbone block mode transfer.


Posted: Sat Sep 12, 2015 9:31 am by Arlet

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Could you improve performance by pretending to do bursts at incrementing addresses, then abandoning the data and starting a new transfer when the 6502 doesn't read from the next address?
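A quick way to estimate whether the speculation would pay off is to replay an address trace and count how often the CPU really asks for the next sequential address. The function and the trace are hypothetical. It only counts hits and misses and does not model the cost of abandoning a transfer.

```python
# Hypothetical estimate for next-address speculation: after each access,
# predict addr + 1; a hit means the speculative transfer was useful, a miss
# means it would be abandoned and a fresh transfer started.


def prefetch_hits(trace: list) -> tuple:
    """Return (hits, misses) for sequential next-address speculation."""
    hits = misses = 0
    predicted = None
    for addr in trace:
        if addr == predicted:
            hits += 1
        else:
            misses += 1
        predicted = (addr + 1) & 0xFFFF  # wrap around the 16-bit space
    return hits, misses


if __name__ == "__main__":
    print(prefetch_hits([0x2000, 0x2001, 0x2002, 0x3E10, 0x2003]))
```

For the trace [0x2000, 0x2001, 0x2002, 0x3E10, 0x2003], it reports (2, 3): the jump to the I/O address and the return from it both break the sequence.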


Posted: Sat Sep 12, 2015 9:43 am by BigEd

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
(See Alan Cox's figures on the benefits of a tiny cache and on reading successor bytes:
viewtopic.php?f=1&t=3146)


Posted: Sat Sep 12, 2015 11:13 am by Arlet

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Rob Finch wrote:
Unless you are planning on using WISHBONE bus access with peripherals, I'd be tempted to use the 6502-style bus (for RAM / ROM etc).


You could do a combination. Connect one or more block RAMs directly to the 6502 for fast local access, and add a wishbone bridge to attach the wishbone-only stuff.
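Here is a rough cost model for that split. It assumes one clock per access for block RAM attached directly to the core, and two clocks per access behind the Wishbone bridge (the minimum mentioned earlier in the thread). The local/bridge boundary in `is_local_example` is a made-up choice for illustration.

```python
from typing import Callable, Iterable

LOCAL_CLOCKS = 1   # assumed: block RAM wired directly to the core
BRIDGE_CLOCKS = 2  # assumed: Wishbone minimum of two clocks per access


def clocks_for(trace: Iterable[int], is_local: Callable[[int], bool]) -> int:
    """Total clocks for an address trace under the split local/bridge bus."""
    return sum(LOCAL_CLOCKS if is_local(a) else BRIDGE_CLOCKS for a in trace)


def is_local_example(addr: int) -> bool:
    """Hypothetical split: everything below the I/O window is local block RAM."""
    return (addr & 0x3FFF) < 0x3E00


if __name__ == "__main__":
    trace = [0x2000, 0x2001, 0x3E10, 0x2002]
    print(clocks_for(trace, is_local_example))
```

For that trace, this gives 5 clocks, versus 8 if every access went through Wishbone.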


Posted: Sat Sep 12, 2015 11:41 am by cr1901

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
Arlet wrote:
Could you improve performance by pretending to do bursts at incrementing addresses, then abandoning the data and starting a new transfer when the 6502 doesn't read from the next address?
I don't think your core provides enough control signals to the outside world for me to do that. (I use a set of Python modules that generate Verilog, and I register your core with that toolbox.) A comparator that compares the current address to the previous address sounds like it would miss some edge cases.

Arlet wrote:
You could do a combination. Connect one or more block RAMs directly to the 6502 for fast local access, and add a wishbone bridge to attach the wishbone-only stuff.
This is most likely doable.

