6502.org Forum  Projects  Code  Documents  Tools  Forum

All times are UTC




Posted: Sun Jan 08, 2017 12:24 am

Joined: Sun Jan 08, 2017 12:07 am
Posts: 11
Greetings,

I'm currently working on implementing my own 6502 in a Xilinx Spartan 6 LX9 FPGA. I've been studying the hardware manual and various other resources online, but one thing I'm struggling with is instruction timing. For instance, I'm looking at the timing for absolute addressing in A-3 of the hardware manual. I don't understand how data from an address can come back from memory in the same clock cycle in which that address is put on the address bus (hope that makes sense). Isn't there a 1-cycle read latency? If so, shouldn't an operation like LDA with absolute addressing take 5 cycles, not 4? I've attached a timing diagram showing how I think it should work, which should hopefully help anyone reading this diagnose my misunderstanding.

On a related note, I'm wondering if there will be any game-breaking discrepancies in my design and the timing diagrams found in the appendix A of the hardware manual due to the fact that I'm using edge-triggered D flip-flops and a single clock, as opposed to latches and a two-phase clock, as is used in the original 6502 design.

Thanks in advance for any help you can provide!


Attachments:
LDA_timing.jpeg

Posted: Sun Jan 08, 2017 1:24 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Welcome!

LDA abs takes four cycles. The memory typically used is asynchronous (and also truly random-access). It has no clock input. As long as you meet the timing requirements for address, chip select, and output enable (and write enable, if you're writing), it doesn't care what you want to call a cycle. 10ns memory can deliver the data 10ns after the address and select are valid, regardless of your clock speed. On reads especially, the memory is basically combinatorial logic; so it's just a matter of how fast things can ripple through, with no clock.

In your diagram, also flip the clock shown. Your rising edges should be falling edges, and vice-versa. The first half of the cycle is when Φ2 is low, and the second half is when it's high. Right after it falls, the next address is put on the bus.
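
To make the four-cycle count concrete, here is a minimal Python sketch of LDA absolute against asynchronous memory. It is only a toy bus model, not how any real core is written: the function name `lda_absolute`, the trace format, and the sample addresses are made up for illustration. Memory is modeled as a plain lookup with no clock, so each read returns its data within the same cycle that presents the address.

```python
def lda_absolute(memory, pc):
    """Return (accumulator, trace) for an LDA absolute starting at pc.

    Each trace entry is (cycle, address_on_bus, data_read_in_that_cycle).
    """
    trace = []

    def read(cycle, addr):
        data = memory[addr]            # asynchronous: valid later in the SAME cycle
        trace.append((cycle, addr, data))
        return data

    opcode = read(1, pc)               # cycle 1: fetch opcode
    if opcode != 0xAD:                 # 0xAD is the LDA absolute opcode
        raise ValueError("not LDA absolute")
    lo = read(2, pc + 1)               # cycle 2: fetch low byte of the address
    hi = read(3, pc + 2)               # cycle 3: fetch high byte of the address
    a = read(4, (hi << 8) | lo)        # cycle 4: read the operand itself
    return a, trace


# Hypothetical program: LDA $1234 located at $0200, with $5A stored at $1234.
memory = {0x0200: 0xAD, 0x0201: 0x34, 0x0202: 0x12, 0x1234: 0x5A}
a, trace = lda_absolute(memory, 0x0200)
```

The high address byte read in cycle 3 is used to form the address driven in cycle 4, which is exactly the step that would cost an extra cycle if every read had a one-cycle latency.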

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Sun Jan 08, 2017 8:47 am

Joined: Sun Jan 08, 2017 12:07 am
Posts: 11
Thanks!

In regard to your last point, if CLK were Φ1, it would then be correct to show the address changing on rising edges of CLK, right? Since the two clocks are essentially complements of each other (with some additional wiggle room, I've gathered from reading, to ensure no overlapping). I'm curious about this since I'm modifying the design to use one clock instead of two, and I'm wondering if I can get away with this.


Posted: Sun Jan 08, 2017 8:56 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Welcome! You've got a good example there of the 6502's design style - it does indeed get more performance than you'd expect, in that data arriving on the data bus towards the end of one cycle can be output on the address bus early in the next cycle. You can do that on FPGA but you will have to pay attention to where you put your clock boundaries. Arlet's core manages to keep 6502 cycle counts, I think, even using synchronous memory. See
viewtopic.php?f=10&t=3453

You surely can clock a 6502 design on a single clock. By convention, we always use phi2 in 6502 land, and so you would be clocking on the falling edge. (Indeed, it's common to clock on a falling edge anyway, as pulldowns are stronger and falling edges are better defined.) In your diagram, you're using phi1, which is as you say logically equivalent, but is unconventional.


Last edited by BigEd on Sun Jan 08, 2017 10:26 am, edited 1 time in total.

Posted: Sun Jan 08, 2017 9:19 am
Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
Hi, welcome.
For the 6502 there isn’t a read latency because, as Garth says, the memory used is accessed asynchronously. That means data coming in from the RAM can be latched as the next address for the next clock cycle, because the RAM isn’t waiting for a clock. The 6502 designers chose to use a longer clock cycle time in order to do things like this and reduce the clock cycle count for instructions. The 6502 is a little unusual in that it latches data on the negative clock edge rather than the positive edge. The 6502 has a slower clock than some other eight-bit micros, but it does more work per clock cycle.

There are a few different ways to work around the asynchronous vs synchronous ram issue. One would be to use two clock cycles for every equivalent 6502 clock in order to read the ram. There’s one 6502 emulator that uses four clock cycles per 6502 clock for instance. As long as the multiple is fixed, cycle counted games should still work.

If using the block RAM in an FPGA, asynchronous memory can be simulated by using the opposite (negative) edge of the clock while reading. Dual-port RAMs would allow writing at the positive edge while reading at the negative edge. Another option is to use a board with external asynchronous RAM available. Using external asynchronous RAM is bound to be slower than using block RAM.
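
A rough way to see why the opposite-edge trick hides the latency is to step through both edges of each cycle. The Python sketch below is an assumption-laden toy, not FPGA code: `simulate`, its arguments, and the sample contents are invented for illustration. The CPU latches data and changes the address on its own edge, and the synchronous RAM registers its output either on that same edge or on the opposite edge in the middle of the cycle.

```python
def simulate(contents, addresses, ram_on_opposite_edge):
    """Return what the CPU latches at the end of each cycle."""
    addr_bus = None    # address register driven by the CPU
    dout = None        # registered output of the synchronous RAM
    captured = []
    for addr in addresses + [None]:        # one extra edge to latch the last read
        # --- CPU clock edge ---
        if addr_bus is not None:
            captured.append(dout)          # CPU latches the RAM output as it stands
        if not ram_on_opposite_edge and addr_bus is not None:
            dout = contents[addr_bus]      # same-edge RAM samples the OLD address
        addr_bus = addr                    # CPU drives the next address
        # --- opposite clock edge, mid-cycle ---
        if ram_on_opposite_edge and addr_bus is not None:
            dout = contents[addr_bus]      # RAM sees the address driven this cycle
    return captured


contents = {0: 10, 1: 11, 2: 12}
same_edge = simulate(contents, [0, 1, 2], ram_on_opposite_edge=False)
opposite_edge = simulate(contents, [0, 1, 2], ram_on_opposite_edge=True)
```

With the RAM on the opposite edge, each cycle latches the data for the address driven in that cycle; with both on the same edge, everything arrives one cycle late. The price is that the address-to-RAM and RAM-to-CPU paths each get only about half a clock period.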

I’ve put together a couple of 6502-compatible cores myself, but without paying attention to clock cycle / clock phase accuracy. It’s plain difficult to do. They can still run some software like EhBASIC or Supermon, but they are probably not adequate for some games because the cycle counts are off. Newer software isn’t as reliant on cycle counting, due to the plethora of processors available and the unpredictability of things like caches.
http://github.com/robfinch/Cores/blob/master/FT816/

_________________
http://www.finitron.ca


Posted: Sun Jan 08, 2017 9:23 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Yes, Φ2 is the reference for the system. 65xx I/O ICs use a Φ2-input pin and a R/W pin (rather than a RD\ pin and a WR\ pin). At least some of them also need the chip selects, register selects, and R/W to be valid some amount of setup time before Φ2 rises.
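
For anyone more used to separate read and write strobes, one common way to relate the two conventions is to qualify R/W with Φ2. The Python truth-table sketch below assumes active-low RD\ and WR\ outputs and R/W high for a read; the function name `strobes` is hypothetical, and a real board would do this with a couple of gates rather than code.

```python
def strobes(phi2, rw):
    """Return (rd_n, wr_n), both active low, from PHI2 and R/W (1 = read)."""
    rd_n = 0 if (phi2 and rw) else 1         # read strobe only while PHI2 is high
    wr_n = 0 if (phi2 and not rw) else 1     # write strobe only while PHI2 is high
    return rd_n, wr_n


# Full truth table, keyed by (phi2, rw).
table = {(phi2, rw): strobes(phi2, rw) for phi2 in (0, 1) for rw in (0, 1)}
```

Neither strobe is active while Φ2 is low, which is the half of the cycle when the address and R/W are still changing.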

Posted: Sun Jan 08, 2017 9:13 pm

Joined: Sun Jan 08, 2017 12:07 am
Posts: 11
Thanks for the replies, everyone.

BigEd wrote:
Welcome! You've got a good example there of the 6502's design style - it does indeed get more performance than you'd expect, in that data arriving on the data bus towards the end of one cycle can be output on the address bus early in the next cycle. You can do that on FPGA but you will have to pay attention to where you put your clock boundaries. Arlet's core manages to keep 6502 cycle counts, I think, even using synchronous memory. See
viewtopic.php?f=10&t=3453


I'll take a look at that post now; thank you!

Rob Finch wrote:
There are a few different ways to work around the asynchronous vs synchronous ram issue. One would be to use two clock cycles for every equivalent 6502 clock in order to read the ram. There’s one 6502 emulator that uses four clock cycles per 6502 clock for instance. As long as the multiple is fixed, cycle counted games should still work.


By 6502 clock, do you mean the cycle time of the CPU being emulated, for instance 1 us if we're trying to emulate a 6502 running at 1 MHz? And, if I'm understanding the idea you've proposed correctly, I could run the FPGA block RAM with a clock period of half that (500 ns, i.e. 2 MHz), thereby packing two FPGA cycles into a single 6502 cycle?
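
The arithmetic behind that reading can be sketched in a few lines of Python. The helper names `fast_clock` and `fast_cycles` are hypothetical, and the only assumption is that the FPGA clock is a fixed integer multiple of the emulated 6502 clock.

```python
def fast_clock(cpu_hz, multiple):
    """Return (FPGA clock frequency in Hz, FPGA clock period in ns)."""
    fast_hz = cpu_hz * multiple
    period_ns = 1e9 / fast_hz
    return fast_hz, period_ns


def fast_cycles(cpu_cycles, multiple):
    """FPGA clocks spent on an instruction that takes cpu_cycles 6502 cycles."""
    return cpu_cycles * multiple


two_x = fast_clock(1_000_000, 2)     # emulated 1 MHz 6502, two FPGA clocks per cycle
four_x = fast_clock(1_000_000, 4)    # the four-clocks-per-cycle case mentioned above
lda_abs_fast = fast_cycles(4, 2)     # LDA abs is 4 cycles on the 6502
```

Because the multiple is fixed, every instruction is stretched by the same factor, so relative timing as seen by cycle-counted code stays the same.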

Rob Finch wrote:
If using the block ram in an FPGA, asynchronous memory can be simulated by using the opposite (negative edge) of the clock while reading.


To make sure I'm understanding this correctly, I could use a synchronous block RAM which clocks out data on positive edges, and capture that data on negative edges, thereby capturing data in the same cycle that the data is read out of the RAM. This way, I'll have captured the data before the next cycle, so it's almost like an asynchronous RAM. Is that correct?

For a bit of context, I'm interested in emulating an NES on an FPGA. At first, I thought I might just use already available 6502 cores, but then figured it would be a more interesting and educational project if I built the 6502 up from scratch. Based on your post, it seems that I might run into issues with some software, though, if I use one of these synchronous block RAM workarounds.


Posted: Mon Jan 09, 2017 1:17 am

Joined: Sun Jan 08, 2017 12:07 am
Posts: 11
I've attached a new timing diagram to this post. Now, CLK is labeled Φ2, and I'm changing the address on falling edges. As suggested, I'm showing how I might use a synchronous FPGA block RAM as a pseudo-asynchronous RAM by clocking data out of the synchronous RAM on falling edges and capturing that data in a register on rising edges.

I have to break convention to perform the LDA instruction in 4 cycles because, as shown in the diagram, I change the address during T2 after a rising edge as opposed to a falling edge. This allows me to capture the data from the appropriate memory location in a register during T3.

Does this seem like a reasonable approach to keeping the cycle count the same between my FPGA design and an actual 6502 IC?


Attachments:
lda_neg_timing.jpg

Posted: Mon Jan 09, 2017 4:16 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
If you need it to be compatible with normal 6502 boards and support ICs, note that although some of those parts need the address and control lines to be valid and stable before the rise of Φ2, they will not have dished up the info you want to read until some number of nanoseconds after that edge; and when you're pushing their speed limits, that time will be not long before the fall of Φ2. WDC's 14MHz W65C22 VIA (Versatile Interface Adapter) I/O IC, for example, may take up to 20ns after the rise of Φ2 (tCDR in the data sheet) to get valid data on the bus to read. The processor has a setup time tDSR (like 10ns, but it depends on the speed rating), meaning the data have to be there that amount of time before the fall of Φ2, which is the point in time where they get latched. (It can probably get a valid read with less setup time than that; it's just that it's not guaranteed for less.) Then there's a hold time tDHR of usually 10ns, during which the processor wants the data to remain valid after the fall of Φ2. (Again, it can probably get along OK with somewhat less, but it's not guaranteed to.) Note that if nothing else starts driving the bus, bus capacitance will hold the data much longer, so there's usually nothing to worry about as far as hold time goes.

These things are of course in the data sheets' timing diagrams and charts. Jeff Laughton has some excellent to-scale animated timing diagrams showing the effects of changing the speed, at http://laughtonelectronics.com/Arcana/V ... iming.html .
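
As a back-of-the-envelope check on those numbers, the read margin can be worked out in Python. This is only a sketch: the function name `read_slack_ns` is made up, the 50% duty cycle is an assumption (the data sheets give the real minimum Φ2 high time), and the 20ns and 10ns figures are the ones quoted above.

```python
def read_slack_ns(clock_hz, t_cdr_ns, t_dsr_ns, duty_high=0.5):
    """Margin left in the PHI2-high time after device delay and CPU setup time."""
    phi2_high_ns = 1e9 / clock_hz * duty_high    # assumed: PHI2 high for half the period
    return phi2_high_ns - t_cdr_ns - t_dsr_ns


slack_14mhz = read_slack_ns(14e6, t_cdr_ns=20, t_dsr_ns=10)   # about 5.7 ns
slack_1mhz = read_slack_ns(1e6, t_cdr_ns=20, t_dsr_ns=10)     # about 470 ns
```

At 14MHz the Φ2-high time is only about 35.7ns, so the data really do show up not long before the fall of Φ2; at 1MHz the same part has hundreds of nanoseconds to spare.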

Posted: Mon Jan 09, 2017 4:34 am

Joined: Sun Jan 08, 2017 12:07 am
Posts: 11
I will be using exclusively block RAM for memory, internal to the FPGA, so I think this timing scheme should at least work for that application. I plan to build the whole system in the fabric of the FPGA (i.e. no external ICs). Given that, I think what I've got in the latest diagram should work, but if someone sees any holes in it, that would be great to know.


Last edited by CitizenSnips on Mon Jan 09, 2017 4:57 am, edited 2 times in total.

Posted: Mon Jan 09, 2017 4:38 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
If you're going to do all your own I/O onboard as well, then you can of course do it any way you like. The 6502 uses memory-mapped I/O, so it reads and writes to I/O in exactly the same way it does to memory, with the same instructions, addressing modes, etc.
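
In an all-in-FPGA design that boils down to an address decoder in front of the RAM. Here is a minimal Python sketch of the idea; the `Bus` class, the I/O window at $6000, and its 16-byte size are all hypothetical choices for illustration, not anything a real system dictates.

```python
class Bus:
    IO_BASE = 0x6000    # hypothetical I/O window, chosen only for illustration
    IO_SIZE = 0x10

    def __init__(self):
        self.ram = bytearray(0x10000)
        self.io_regs = bytearray(self.IO_SIZE)
        self.io_writes = []                  # log of (address, value) sent to I/O

    def _is_io(self, addr):
        return self.IO_BASE <= addr < self.IO_BASE + self.IO_SIZE

    def read(self, addr):
        # Same read path for memory and I/O; only the decode differs.
        if self._is_io(addr):
            return self.io_regs[addr - self.IO_BASE]
        return self.ram[addr]

    def write(self, addr, value):
        if self._is_io(addr):
            self.io_regs[addr - self.IO_BASE] = value & 0xFF
            self.io_writes.append((addr, value & 0xFF))
        else:
            self.ram[addr] = value & 0xFF


bus = Bus()
bus.write(0x0200, 0x42)    # what an STA $0200 would do: lands in RAM
bus.write(0x6001, 0x99)    # what an STA $6001 would do: lands in an I/O register
```

The processor side never knows the difference: both writes are the same kind of bus cycle, and only the address decides where the byte ends up.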

Posted: Mon Jan 09, 2017 5:53 am

Joined: Sun Jan 08, 2017 12:07 am
Posts: 11
In an effort to stay as true to the original bus timing as possible, I think I've come up with a better design yet. I realize this is probably overkill at this point, since, as Garth pointed out, some of the design constraints I was concerned with are loosened (or void) because I'm doing this entirely in the FPGA. According to the hardware manual, "A Phase One clock pulse is the positive pulse during which the address lines change and a Phase Two clock pulse is the positive pulse during which the data is transferred."

With this in mind, I've come up with the attached timing diagram. The address changes on falling edges of Φ2, reads from the synchronous block RAM occur on rising edges of Φ2, and data is registered from the data bus midway through the positive Φ2 pulse, after the read. This is done by using a (Φ2)/2 clock, i.e. one with half the period of Φ2, that is phase aligned with the falling edges of Φ2. One of the rising edges of this clock occurs during the middle of the Φ2 pulse, and it's on that edge that data is captured in a register. This scheme 1) changes the address during the low pulse of Φ2 (AKA the high pulse of Φ1), 2) transfers data during the high pulse of Φ2, and 3) captures data a quarter of a cycle after the start of the high pulse of Φ2. The first two points are notable since they agree with the hardware manual specification, and the third point addresses Garth's note:

GARTHWILSON wrote:
If you need it to be compatible with normal 6502 boards and support ICs, note that although some of those parts need the address and control lines to be valid and stable before the rise of Φ2, they will not have dished up the info you want to read until some number of nanoseconds after that edge


By waiting a quarter cycle after the rising edge to capture data, I'm simulating the wait time that would be needed for an external IC to dish up info.
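
To double-check the edge positions, here is a small Python sketch of where each event falls within one Φ2 cycle. It assumes a 50% duty cycle on both clocks and that the faster clock falls at the same instant as Φ2; the function names and the 1000 ns example period are made up for illustration.

```python
def double_clock_rising_edges(period_ns):
    """Rising edges of the half-period clock within one PHI2 cycle.

    Time 0 is the falling edge of PHI2, where the fast clock also falls.
    """
    return [period_ns / 4, period_ns * 3 / 4]


def cycle_events(period_ns):
    """Event times in ns within one PHI2 cycle, measured from PHI2 falling."""
    phi2_rise = period_ns / 2
    # The capture edge is the fast-clock rising edge that lands inside PHI2 high.
    capture = [t for t in double_clock_rising_edges(period_ns) if t > phi2_rise][0]
    return {
        "address_changes": 0.0,          # falling edge of PHI2
        "ram_read_clocked": phi2_rise,   # rising edge of PHI2
        "data_captured": capture,        # middle of the PHI2 high pulse
    }


events = cycle_events(1000.0)            # e.g. an emulated 1 MHz 6502
```

With a 1000 ns cycle the capture lands at 750 ns, which is 250 ns (a quarter cycle) after the block RAM is clocked and 250 ns before the next address change.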

Does this seem reasonable? Thanks again for all the input.


Attachments:
lda_timing_final.JPG