6502.org Forum  Projects  Code  Documents  Tools  Forum

All times are UTC




Post new topic Reply to topic  [ 18 posts ]  Go to page 1, 2  Next
Posted: Fri Mar 11, 2022 11:28 pm
Joined: Fri Mar 11, 2022 11:25 pm
Posts: 15
Does anybody know much about sharing (video) RAM between the 6502 and "something else"?

I have an FPGA VGA controller (640x480 display at 25MHz) that uses SRAM for its framebuffer. I'm wanting to share that SRAM with a 6502.

I was thinking I could generate the 6502 clock signal from my FPGA but I'm unsure how the RDY and "Phase 2" signals can be used on the 6502.

At the moment my VGA controller reads 320 words from RAM into a FIFO at the start of each horizontal scan line, and then the RAM is untouched for 4/5ths of each scan line. (The RAM is pretty much read during the horizontal porch and is untouched for the active pixel scan.)

I'm thinking there must be a way in which I can timeslice/share the RAM between the VGA controller and 6502. I don't have to read the 320 words at once - I could read smaller chunks - i.e. read 10 words, yield to the 6502, read 10 words, yield to the 6502 etc.

How would I go about doing this? I want to make sure that if the 6502 is running, it has enough clock cycles to complete its RAM reads/writes. I know within my VGA controller's SRAM controller, I have a number of cycles per RAM read and write - how can I guarantee that I have given the 6502 enough cycles to complete one of its instructions? Do I just need to find which instruction takes the most cycles and let the 6502 have that number of clock cycles each timeslice?

I thought about DPRAM but it's far too expensive for me.


Posted: Sat Mar 12, 2022 4:47 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8545
Location: Southern California
Welcome.

This gets discussed all the time, so I'm surprised no one has jumped in yet; but I'm at a bit of a loss to tell you what search terms to use to get you to the best index to the many topics about it. Hopefully someone else will help with that. It was done on the Apple II, with video accessing RAM when Φ2 is low and the processor accessing it when Φ2 is high; so there were two memory accesses per cycle.

Something that may be a little simpler is to use a 65816 and use its VDA and VPA to identify dead bus cycles where something external can use more of the cycle. This has been suggested for DMA that's easier on the timing and doesn't affect program-execution speed.
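As a rough sketch of that idea: on the 65816, VDA and VPA both low marks an internal-operation cycle in which the CPU is not making a valid memory access. The helper name below is invented for illustration, and a real design would still have to respect bus timing within the cycle.

```python
def bus_is_free(vda: bool, vpa: bool) -> bool:
    """True on a 65816 'internal operation' cycle (VDA=0, VPA=0).

    On such cycles the CPU is not making a valid memory access, so an
    external device (video, DMA) could use the RAM without stalling it.
    """
    return not vda and not vpa


# The four VDA/VPA combinations and what the CPU is doing in each.
for vda, vpa, meaning in [
    (False, False, "internal operation"),
    (False, True, "program fetch (operand)"),
    (True, False, "data access"),
    (True, True, "opcode fetch"),
]:
    print(f"VDA={int(vda)} VPA={int(vpa)} {meaning:24} free={bus_is_free(vda, vpa)}")
```

Only the first combination leaves the whole cycle available to something else.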

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Sat Mar 12, 2022 5:01 am
Joined: Fri Dec 21, 2018 1:05 am
Posts: 1120
Location: Albuquerque NM USA
I'm juggling too many projects. I was hoping others would jump in.

Here is my approach to time-sharing the video RAM. It is the same technique as the Apple II, except that it runs at half of 25.175MHz, or about 12.6MHz. The 6502 accesses the RAM when the 12.6MHz clock is high and the video logic accesses the RAM when the 12.6MHz clock is low. The video logic fetches a byte and drives the video output with the high nibble on one edge of the 12.6MHz clock and with the low nibble on the other edge.
Bill
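
A small software model of this interleaving might look like the following. It is only a sketch: the function names are invented here, and it assumes the video logic fetches one byte per clock cycle during the low half and that the high nibble is displayed first.

```python
def bus_owner(clock_high: bool) -> str:
    """Who owns the shared RAM during this half of the ~12.6MHz clock."""
    return "cpu" if clock_high else "video"


def pixels_from_byte(byte: int) -> tuple[int, int]:
    """Split one fetched byte into two 4-bit pixels: high nibble, then low."""
    return (byte >> 4) & 0x0F, byte & 0x0F


def simulate(framebuffer: bytes, cycles: int) -> list[int]:
    """Run `cycles` full clock cycles and collect the pixel stream."""
    pixels: list[int] = []
    for cycle in range(cycles):
        for clock_high in (False, True):  # low half first, then high half
            if bus_owner(clock_high) == "video":
                # Video fetch: one byte yields two 4-bit pixels.
                pixels.extend(pixels_from_byte(framebuffer[cycle]))
            # On the high half the 6502 would do its own read or write.
    return pixels


print(simulate(bytes([0xA5, 0x3C]), 2))
```

Two clock cycles produce four pixels, which is how one byte fetch per cycle keeps up with a 25.175MHz pixel clock at 4 bits per pixel.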


Posted: Sat Mar 12, 2022 8:26 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Welcome SparkyNZ!

As Garth notes, it's a common design pattern, to share a memory. But there are many approaches.

There are three ways of looking at time which you will need to reconcile:
- actual elapsed nanoseconds or microseconds. The VGA needs to do enough reads in a line.
- 6502 instructions and their variable lengths. This is possibly not at all important, surprisingly enough.
- 6502 cycles. This is the unit of memory access for a 6502.

Oh, and another
- SRAM cycles. This is the unit of access for the SRAM.

Because the VGA has strict time deadlines, it's the 6502 which is going to have to yield. Whether that's a fine grain or a coarse grain is up to you.

If the SRAM cycle is enormously shorter than the 6502 cycle (because SRAMs can be very fast, and you might be clocking the 6502 relatively slow) then you should draw your picture according to SRAM cycles. It might be that you can do say 8 SRAM cycles within the time of one 6502 cycle, in which case maybe your VGA can read 7 bytes and then let the 6502 have one access. In this case, the 6502 has a regular clock and no RDY.
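
To make the "7 for the VGA, 1 for the 6502" picture concrete, here is a minimal sketch. The 1MHz CPU clock in the example call is an assumption chosen only to give round numbers, and the function names are made up.

```python
def slot_schedule(sram_slots_per_cpu_cycle: int) -> list[str]:
    """Owner of each SRAM slot within one 6502 cycle.

    The 6502 gets the final slot so its data is valid at the end of its
    own cycle; every earlier slot goes to the VGA.
    """
    if sram_slots_per_cpu_cycle < 2:
        raise ValueError("need at least one slot each for VGA and CPU")
    return ["vga"] * (sram_slots_per_cpu_cycle - 1) + ["cpu"]


def vga_reads_per_second(cpu_hz: int, sram_slots_per_cpu_cycle: int) -> int:
    """VGA read rate when every slot but one per CPU cycle goes to the VGA."""
    return cpu_hz * (sram_slots_per_cpu_cycle - 1)


print(slot_schedule(8))
print(vga_reads_per_second(1_000_000, 8))
```

Because the schedule repeats identically every 6502 cycle, the CPU clock stays regular and RDY is never needed.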

If that's not the case, then the 6502 will need to be held up, from time to time, because the VGA needs accesses. There are two ways to do this: giving the 6502 fewer clock edges, on the one hand by stalling the clock or by pulse-swallowing (which is pretty much the same thing), and on the other hand by using RDY to make the 6502 ignore clock edges, or to treat multiple cycles as one (which is again pretty much the same thing.)

One note: an older NMOS 6502 and a modern CMOS 6502 have an important difference. The modern chip treats RDY the same for every cycle, and that's nice and easy. The older NMOS chips will not honour RDY for write cycles, and that's a whole different story. It can be dealt with, using the observation that the NMOS 6502 will perform at most three consecutive writes before an inevitable read.
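
A tiny model of that behaviour, as a sketch only: each bus cycle is written as "R" or "W", and the function (a made-up name) returns the cycle at which an NMOS part would actually stop after RDY goes low. The example string is the BRK sequence, which pushes PCH, PCL and the status register in three back-to-back writes.

```python
def nmos_halt_cycle(cycles: str, rdy_low_from: int) -> int:
    """Index of the cycle where an NMOS 6502 stops once RDY is pulled low.

    `cycles` is a string of 'R' (read) and 'W' (write) bus cycles.
    The NMOS part ignores RDY on writes, so it halts on the first read
    cycle at or after `rdy_low_from`.
    """
    for index in range(rdy_low_from, len(cycles)):
        if cycles[index] == "R":
            return index
    raise ValueError("no read cycle after RDY went low")


BRK = "RRWWWRR"  # opcode, padding byte, push PCH, push PCL, push P, vector lo, vector hi
print(nmos_halt_cycle(BRK, 2))  # RDY pulled low just as the three writes begin
```

In this worst case the stall takes effect three cycles late, so whatever shares the RAM has to tolerate up to three further CPU accesses after it asks for the bus.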

It's also possible that your FPGA design skills are such that you are willing to put more complexity in there, reading intermittently into a buffer which you can prove will never empty, and serving pixel data from that buffer. But usually these schemes are designed and implemented with a regular schedule of operations.
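
One way to gain confidence that such a buffer never empties is to simulate a scan line before writing any HDL. The sketch below uses figures from the original post (320 words per line, one word popped every two pixel clocks) and the standard 800 pixel clocks per 640x480 line with a 160-clock blanking period. The repeating 5-clock frame with a few VGA slots at its start is purely an assumption of this example, and FIFO depth and SRAM latency are ignored.

```python
def worst_underrun(vga_slots: int, frame_clocks: int = 5, line_clocks: int = 800,
                   active_start: int = 160, words_per_line: int = 320) -> int:
    """Lowest FIFO occupancy seen over one scan line, clamped at 0 from above.

    Returns 0 if the FIFO never underruns, or a negative number (the depth
    of the worst shortfall) if the display would have run out of data.
    """
    occupancy = 0
    fetched = 0
    lowest = 0
    for clock in range(line_clocks):
        # VGA owns the first `vga_slots` clocks of every frame.
        if clock % frame_clocks < vga_slots and fetched < words_per_line:
            occupancy += 1
            fetched += 1
        # During active video, one word (2 pixels) is popped on even x.
        if clock >= active_start and clock % 2 == 0:
            occupancy -= 1
            lowest = min(lowest, occupancy)
    return lowest


for slots in (1, 2, 4):
    print(slots, worst_underrun(slots))
```

With these assumptions two VGA slots per frame only just keep up, one slot falls behind, and four leave plenty of slack.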

Hope this helps. Keep us in the loop!


Posted: Sun Mar 13, 2022 8:59 pm
Joined: Fri Mar 11, 2022 11:25 pm
Posts: 15
Thank you for all of the helpful replies! I didn't get any notifications from the site so I had no idea there were so many responses :-)

I did a little bit of research over the weekend and I've come to the conclusion that the RDY/BE halting (and Apple 2) approach would be helpful if I was going to share the same physical memory on the same bus.

I had a look through my old Vic 20 books to get an idea of what the most complex 6502 instruction would be (as far as RAM access is concerned), and I think INC $xxxx,X is a good example. This is an instruction that reads from RAM, increments and then writes back to RAM in 7 CPU cycles.

My VGAController reads my RAM and buffers a line of pixel data - and the buffer is more or less full during the horizontal porch and ready to be emptied during the horizontal active display time. For a 640 pixel display, it takes 800 pixel clocks to do a line. For 160 of these clocks I am filling up my buffer and for roughly 640 clocks I am emptying it. There is a slight overlap where I'm reading from the buffer at the same time that I'm writing, but that's no big deal. What I'm trying to convey here is that I'm only reading my RAM for 1/5th of the time during one horizontal scan line, so I have plenty of time to do other things.

Tell me if I'm crazy, but I'm thinking I should be able to easily clock the CPU by generating a 5MHz clock - basically allowing the 6502 to have a fifth of the VGA clock (5 lots of 200ns per line) to do its thing. The VGA display will be asynchronously popping pixel data from the FIFO so it shouldn't be affected by the 6502's time slices. The only difference for me is that instead of reading 320 words in one go, I'll be interleaving the VGA framebuffer reads with 6502 RAM access.

Currently:
Code:
0.......porch..........159  | 160.....................................active.....................................................800
(read 320 VGA words)        |
                     .....FIFO full
                            | pull word (2 pixels) every even X position from FIFO


Rather than providing direct connection between the 6502 and my RAM (in the classic sense), I'm thinking of exposing an "emulated" interface to the 6502 so that it thinks it is talking directly to RAM but will in fact be talking to my SRAM Controller inside the FPGA. This way I can receive read/write requests from the 6502 and process them when a time slice becomes available. Likewise, I should be able to read RAM when the 6502 requests it, throw the result into a data register and continue processing the VGA framebuffer reads.

Proposed:

Code:
0.......porch..........159  | 160.....................................active.....................................................800
(read 80 VGA words)
            | (6502 slice)
                        | (read 80 VGA words)
                                      | (6502 slice)
                                                | (read 80 VGA words)
                                                             | (6502 slice)
                                                                      | (read 80 VGA words)
                                                                                 | (6502 slice)
                                                                                                         
                                                                                                       |  (6502 slice)
                            | pull word (2 pixels) every even X position from FIFO


Above diagram is not "to scale" lol but hopefully it will give everyone (including me) an idea of what I'm doing. If I'm right, the 6502 will not need to halt or yield at all - it will get its dedicated cycles and these time slots will allow sufficient time for a RAM read or write via my SRAM Controller.


Posted: Sun Mar 13, 2022 10:21 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I'm intrigued by the possibility but I don't understand the plan! How many microseconds in a scanline? How many RAM cycles is that? How many 6502 cycles?

(I'm a bit worried that you're still talking about complex 7 cycle 6502 instructions - I don't believe that can be relevant, so one of us is, I think, missing something.)


Posted: Mon Mar 14, 2022 12:14 am
Joined: Fri Mar 11, 2022 11:25 pm
Posts: 15
BigEd wrote:
I'm intrigued by the possibility but I don't understand the plan! How many microseconds in a scanline? How many RAM cycles is that? How many 6502 cycles?

(I'm a bit worried that you're still talking about complex 7 cycle 6502 instructions - I don't believe that can be relevant, so one of us is, I think, missing something.)


It probably doesn't help that I'm from a software background :-)

My VGA controller clock is 25MHz. If there are 800 "pixels" per scanline (including the porch/sync) then each scan line is 800 * 40ns = 32000ns, or 32 microseconds (I think).

If I want to run the CPU at 5MHz, 1 cycle would be 200ns (I think). I'm using a WDC6502 and making the assumption that I don't need a 50/50 duty cycle on the 6502 - only because of playing around with single stepping the CPU (thanks to Ben Eater's videos). Hmmm.. so if I counted the VGA clock cycles, I'd have to generate the 6502 clock every 5 pixels wouldn't I?

Yep.. I think my original idea was completely wrong. Maybe I made the assumption that each scanline was 25MHz but it's actually each pixel that's written at 25MHz.
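
The arithmetic above can be checked with a few lines of Python. This is just a worked calculation using the 25MHz, 5MHz and 800-clock figures from this post; the constant names are made up.

```python
NS_PER_SECOND = 1_000_000_000


def period_ns(clock_hz: int) -> int:
    """Clock period in whole nanoseconds (exact for the clocks used here)."""
    return NS_PER_SECOND // clock_hz


PIXEL_CLOCK_HZ = 25_000_000
CPU_CLOCK_HZ = 5_000_000
PIXEL_CLOCKS_PER_LINE = 800  # 640 visible + 160 porch/sync

pixel_ns = period_ns(PIXEL_CLOCK_HZ)              # one pixel clock
cpu_ns = period_ns(CPU_CLOCK_HZ)                  # one 6502 cycle
line_ns = PIXEL_CLOCKS_PER_LINE * pixel_ns        # one whole scan line
pixel_clocks_per_cpu_cycle = cpu_ns // pixel_ns   # pixels per 6502 cycle
cpu_cycles_per_line = line_ns // cpu_ns           # 6502 cycles per scan line

print(pixel_ns, cpu_ns, line_ns, pixel_clocks_per_cpu_cycle, cpu_cycles_per_line)
```

So a 5MHz 6502 gets one cycle per 5 pixel clocks, and 160 cycles per scan line rather than 5.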

I'm doing a bit of hacking with my RAM state machine at the moment to try and support a form of interrupt. My SRAM controller has 2 parts - a command FIFO (takes commands such as write x bytes/word) and RAM state machine that processes each command on the FIFO. I'm trying to add a mechanism so that any multiple read/write command can be interrupted by the 6502 so that it can do its one read or write. 5MHz may well be too fast for my SRAMController overhead.


Posted: Mon Mar 14, 2022 8:38 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
This might start to look a bit like Bill's design. Sounds like you can do an SRAM access in 40ns (that's 25MHz) and would like to run your 6502 at 5MHz, which is a 200ns cycle, and compatible with a 100ns access.

In a simple scheme, in 200ns you could do two SRAM accesses for the VGA and one for the 6502. That gives the VGA reads at a rate of 10M per second.

You could slice it up finer: four SRAM accesses for the VGA, then a final access for the 6502. That gives the VGA reads at a rate of 20M per second. You'd still need to have the 6502 see and use a regular 50% duty cycle clock, if it were only rated to 5MHz, but it's quite possible to do the actual access in the final 40ns of a clock cycle.
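
For comparison, here is a quick calculation of what each slicing delivers against what the display needs. It assumes 40ns SRAM slots as in this post, plus the 320 words per 32000ns line quoted earlier in the thread; the function names are invented.

```python
NS_PER_SECOND = 1_000_000_000


def vga_reads_per_second(vga_slots_per_frame: int, frame_ns: int) -> int:
    """VGA reads per second when it gets a fixed number of slots per frame."""
    return vga_slots_per_frame * NS_PER_SECOND // frame_ns


def required_words_per_second(words_per_line: int, line_ns: int) -> int:
    """Average fetch rate the display needs to sustain."""
    return words_per_line * NS_PER_SECOND // line_ns


FRAME_NS = 200  # one 5MHz 6502 cycle
needed = required_words_per_second(320, 32_000)
for slots in (2, 4):
    supplied = vga_reads_per_second(slots, FRAME_NS)
    print(f"{slots} VGA slots: {supplied} reads/s, need {needed}, ok={supplied >= needed}")
```

On these numbers the two-slot scheme matches the average requirement with no margin, while the four-slot scheme has a factor of two in hand.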

If the 6502 were rated at 12.5MHz or more (which in practice just means a modern part with a 14MHz rating), each period would need to be 40ns or more, and so a 160ns/40ns duty cycle would work.

If you have the pins and the parts, you could use 16 bit wide SRAMs, and the VGA gets double the rate. The FPGA or glue would need to steer the 6502's byte accesses to the high or low parts of the word.
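
The byte steering could be modelled like this. It is a sketch under an assumed layout: even 6502 addresses map to the low byte of a word and odd addresses to the high byte. With two 8-bit chips, the real hardware would simply enable writes on only one chip instead of the read-modify-write done here.

```python
def steer(cpu_address: int) -> tuple[int, str]:
    """Map a 6502 byte address to (16-bit word address, byte lane)."""
    word_address = cpu_address >> 1
    lane = "high" if cpu_address & 1 else "low"
    return word_address, lane


def write_byte(words: list[int], cpu_address: int, value: int) -> None:
    """Write one byte into the selected half of a 16-bit word."""
    word_address, lane = steer(cpu_address)
    if lane == "high":
        words[word_address] = (words[word_address] & 0x00FF) | ((value & 0xFF) << 8)
    else:
        words[word_address] = (words[word_address] & 0xFF00) | (value & 0xFF)


def read_byte(words: list[int], cpu_address: int) -> int:
    """Read one byte from the selected half of a 16-bit word."""
    word_address, lane = steer(cpu_address)
    word = words[word_address]
    return (word >> 8) & 0xFF if lane == "high" else word & 0xFF


ram = [0] * 4
write_byte(ram, 2, 0x34)  # low byte of word 1
write_byte(ram, 3, 0x12)  # high byte of word 1
print(hex(ram[1]), hex(read_byte(ram, 3)))
```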

Bill's scheme of 4 bits per pixel also doubles the effective pixel fetch rate of the VGA.


Posted: Mon Mar 14, 2022 10:26 am
Joined: Fri Mar 11, 2022 11:25 pm
Posts: 15
BigEd wrote:
If the 6502 were rated at 12.5MHz or more (which in practice just means a modern part with a 14MHz rating), each period would need to be 40ns or more, and so a 160ns/40ns duty cycle would work.

If you have the pins and the parts, you could use 16 bit wide SRAMs, and the VGA gets double the rate. The FPGA or glue would need to steer the 6502's byte accesses to the high or low parts of the word.

Bill's scheme of 4 bits per pixel also doubles the effective pixel fetch rate of the VGA.


Yeah I am using a pair of 512k AS6C4008 chips. That allows me to pull 16 bits at a time for the VGA. I'm using the WDC6502 which I think is rated up to 14MHz but I'd be happy with it running at 1MHz for now :-) I'm using 8 bits per pixel so I can have a 256 colour palette.

Using the 2 RAM chips I decided I would map the 6502 to a 64k range on the lower-byte chip. I'll end up with waste of 64k on the higher-byte chip but it will be far easier to implement that way.

I almost got it working today. My plan was to add the "interrupt" capability in my RAM state machine and prove that the VGA streaming could keep up. It does.. but I have 19 words being read into my FIFO somewhere which don't get read back out, meaning my display has some rubbish pixels at the start and everything else is knocked out by 38 pixels.

The hardest part in debugging this system (for me) is the latencies involved. For example, I don't see the data from the RAM for 2 cycles or so.. then when it's written to the VGA's FIFO, I can't get the data for 2-3 cycles on top of that.. It all gets a bit confusing to me - even looking at the SignalTap timing traces.

I'll figure it out.. there's only so much my brain can cope with in one day :-)


Posted: Mon Mar 14, 2022 12:30 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Hmm, those latencies must be part of your FPGA design, as that RAM is a conventional SRAM.

Although, it's a 55ns device, so you couldn't get single-cycle access on a 25MHz clock.
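
As a back-of-envelope check, assuming the controller can only start accesses on 25MHz clock edges and ignoring setup, hold and FPGA I/O delays:

```python
def clocks_per_access(access_ns: int, clock_period_ns: int) -> int:
    """Whole clock periods needed to cover one SRAM access (ceiling division)."""
    return -(-access_ns // clock_period_ns)


ACCESS_NS = 55   # access time of the SRAM in use
CLOCK_NS = 40    # one 25MHz clock period

clocks = clocks_per_access(ACCESS_NS, CLOCK_NS)
accesses_per_second = 1_000_000_000 // (clocks * CLOCK_NS)
print(clocks, accesses_per_second)
```

Each access then occupies two clocks (80ns), which halves the slot rates assumed by the 40ns schemes earlier in the thread.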

If you only connect the 6502 to one of the RAMs, you won't be able to use it to update pixel values in the other. So that has me a little confused.


Posted: Mon Mar 14, 2022 1:48 pm
Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
BigEd wrote:
If you only connect the 6502 to one of the RAMs, you won't be able to use it to update pixel values in the other. So that has me a little confused.

I think he's using that block of 64K as the main CPU memory, and the rest of the RAM as video memory. His CPU doesn't have direct access to the video memory, it uses a command buffer style FIFO to queue up instructions to the FPGA which updates the video RAM itself.

If it were me I would just give the CPU its own separate RAM IC and keep the two circuits as independent as you can - that's the whole appeal of the FIFO to me.


Posted: Mon Mar 14, 2022 3:21 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Ooh, thanks! Umm, except then there need be no speed linkage between the two?

I think this might be another case of a block diagram being a big help for everyone concerned...


Posted: Mon Mar 14, 2022 6:15 pm
Joined: Fri Mar 11, 2022 11:25 pm
Posts: 15
gfoot wrote:
BigEd wrote:
If you only connect the 6502 to one of the RAMs, you won't be able to use it to update pixel values in the other. So that has me a little confused.

I think he's using that block of 64K as the main CPU memory, and the rest of the RAM as video memory. His CPU doesn't have direct access to the video memory, it uses a command buffer style FIFO to queue up instructions to the FPGA which updates the video RAM itself.

If it were me I would just give the CPU its own separate RAM IC and keep the two circuits as independent as you can - that's the whole appeal of the FIFO to me.


Yeah, you got it. 64K of that same RAM will be the CPU's addressable RAM - with no intention of writing to the framebuffer.

You're right, a block diagram would help. Similar to the C64/Vic20 I have two regions of memory inside the 6502's block that the FPGA will use to update the framebuffer with characters and colour. I have an 80x60 text mode display so 4800 bytes will be character data and 4800 bytes will be colour data for each character on the 80x60 grid. Both the 6502 and VGAController can access these regions. Same deal with future blitter operations - I'll be sharing some memory for blitter register access and possibly for loading graphics object data.
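
A possible layout for those two regions, as a sketch only: the base address 0x8000 and the choice to place the colour region directly after the character region are assumptions made for this example, not details from the post.

```python
COLUMNS, ROWS = 80, 60
CELLS = COLUMNS * ROWS            # 4800 bytes per region

CHAR_BASE = 0x8000                # hypothetical start of character data
COLOUR_BASE = CHAR_BASE + CELLS   # hypothetical: colour data follows directly


def cell_addresses(column: int, row: int) -> tuple[int, int]:
    """Return (character address, colour address) for one text cell."""
    if not (0 <= column < COLUMNS and 0 <= row < ROWS):
        raise ValueError("cell is off screen")
    offset = row * COLUMNS + column
    return CHAR_BASE + offset, COLOUR_BASE + offset


char_addr, colour_addr = cell_addresses(79, 59)  # bottom-right cell
print(hex(char_addr), hex(colour_addr), 2 * CELLS)
```

Together the two regions take 9600 bytes of the 6502's 64K.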

I probably will use an external ROM for the 6502 but for now I can load a program via TCL script into the RAM that the 6502 has access to and let it run from there.

I have been wondering why so many people keep talking about accessing video in 1 cycle and the CPU in another cycle. I guess when using RAM at slower speeds this is easy enough but there are definite delays observed for me on the VGA side.


Posted: Mon Mar 14, 2022 6:56 pm
Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
SparkyNZ wrote:
I have been wondering why so many people keep talking about accessing video in 1 cycle and the CPU in another cycle. I guess when using RAM at slower speeds this is easy enough but there are definite delays observed for me on the VGA side.


Often half a cycle for the 6502 and half a cycle for the video generation - mostly because the 6502 only accesses RAM for half the clock cycle - the other half is free for video, dma, etc, ... So you never need to stall the 6502 within that time constraint.

A question I might be asking myself if I were doing this might be: How many bytes can I read from RAM during half a cycle of 6502 time? With a 6502 at 12.5MHz and a 25MHz VGA clock, then it's obviously 2 bytes - is that enough to keep a VGA engine running? Halve the 6502 speed again to 6.25MHz and you can read 4 bytes per half cycle.. Maybe my thinking is naive though.
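
Those figures work out if each SRAM access takes one 25MHz clock period and returns a 16-bit word, as with the pair of 8-bit RAMs described earlier in the thread. Both of those are assumptions of this sketch (the 55ns parts mentioned above would need longer than 40ns per access).

```python
def bytes_per_half_cycle(cpu_hz: int, sram_clock_hz: int = 25_000_000,
                         bytes_per_access: int = 2) -> int:
    """Bytes the video side can fetch during one half of a 6502 cycle.

    Assumes one SRAM access per SRAM clock period and counts only whole
    accesses that fit inside half a CPU cycle.
    """
    accesses = sram_clock_hz // (2 * cpu_hz)
    return accesses * bytes_per_access


for cpu_hz in (12_500_000, 6_250_000):
    print(cpu_hz, bytes_per_half_cycle(cpu_hz))
```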

Another thought - again, very probably naive, is if that SRAM is inside the FPGA could you make it effectively dual-ported? Dual port SRAM is a thing, but they tend to be small and expensive - I've really no idea how it might be done inside an FPGA though - that's a land of dragons for me (for now, although I have just had delivered some FPGA hardware I'm about to start to play with...)

Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Posted: Mon Mar 14, 2022 9:41 pm
Joined: Fri Mar 11, 2022 11:25 pm
Posts: 15
drogon wrote:
Another thought - again, very probably naive, is if that SRAM is inside the FPGA could you make it effectively dual-ported? Dual port SRAM is a thing, but they tend to be small and expensive - I've really no idea how it might be done inside an FPGA though - that's a land of dragons for me (for now, although I have just had delivered some FPGA hardware I'm about to start to play with...) Gordon


A year or two back I started following some FPGA VGA tutorials online. They assumed that the FPGA in use had sufficient internal memory. Unfortunately the Cyclone IV that I'm using doesn't. I'm forever running out of LABs. Yesterday I added something to help me debug what was going on and then I couldn't compile the design anymore. I'm really tempted to jump to the Cyclone V but then my RAM daughterboard PCB won't fit. I'd have to redesign that.. cough up more money.. It's never ending :-) I'm hoping I can keep using the FPGA and daughterboard a little longer until I'm ready to create a new PCB with the 6502 onboard.

Haha.. and during the journey I battled with SDRAM too. I had that working as a VGA framebuffer. That was quite a challenge but ironically I had more problems with the SRAM and a lot of it was related to the homemade Dupont connectors I crimped. As soon as I made my daughterboard, things became so much more reliable and I stopped chasing ghosts.

The only snag with the daughterboard at the moment is that I have no means of using extra FPGA pins for the 6502. I have some "stackable headers" that I'll fit when I'm ready so I can pop some hookup wires onto the FPGA and breadboard containing my 6502.

Here's a picture of my FPGA and daughterboard in case anyone else is curious:

Image

Which FPGA have you ordered, Gordon?

