6502.org Forum

All times are UTC




Posted: Wed Jan 05, 2022 2:20 pm

Joined: Sat Oct 09, 2021 11:21 am
Posts: 707
Location: Texas
Hello everyone! While I wait for boards and parts to come in for my current VGA project (viewtopic.php?f=12&t=6914), I've been thinking of different ways to access video RAM. I'm going to make a list; perhaps you know of another way I haven't mentioned. The idea is that the processor and the video display BOTH need to access the same RAM somehow. How is this done?

1) Dual Port RAM. Those big hunkin' chips with duplicate ports.
Advantage: All the work is done, just connect both ends and off you go!
Disadvantage: They often are so small on memory size that they would have to be limited to text character data, or perhaps a small monochrome graphic display.

2) Frame Buffer. Basically having twice the amount of RAM, and "ping pong" who gets control of which chip. CPU writes to A but video reads from B during one frame, then CPU writes to B and video reads from A on the next frame.
Advantage: You can write to video RAM any time you like. It also should not cause tearing.
Disadvantage: The software side is tricky since you have alternating images to deal with.

3) Double Duty RAM (like the Apple II). One single RAM chip, but processor accesses on the high phase, and video accesses on the low phase. This is what Bill is doing here: viewtopic.php?f=4&t=6955
Advantage: You can write to the video memory any time. You do not need to duplicate your RAM, so there is only one image to deal with.
Disadvantage: You must run the CPU at the 25.175 MHz pixel clock necessary for VGA, or at an integer fraction of it (divided down using latches/flip-flops). This is still very fast for me personally. Also you could get screen tearing.

4) Half FPS, Cached memory. I don't have a good name for this. You simply draw the same image twice in a row, thus not 60 FPS but now 30 FPS. While video is reading the RAM, it is also storing it in a duplicate cache RAM, for the second frame. After the first frame is done, send an interrupt, and you now have almost 1.25 frames worth to write to RAM.
Advantage: You won't have tearing. You have ample time to write to the RAM.
Disadvantage: You can't write anytime you like. You have to have double your RAM. The screen refresh rate is halved (but that's no issue with modern video games btw).

5) Write when Black. This is the style I am using on my VGA project. When off of the visible screen, send an interrupt to tell the CPU it's time to write to video RAM.
Advantage: It's easy to implement. Single RAM chip.
Disadvantage: Depending on how much 'black' you have (including porches and syncs), you might not have a lot of time to write to the RAM. The minimum seems to be 1429 uS.

6) Latch and Wait. Here you don't actually access the RAM directly; you just send data to a latch and wait until video is no longer using the RAM. That could be in the 'black' or 'double duty' style. If you are familiar with The 8-Bit Guy, his "Vera" chip on the Commander X16 is similar to this design.
Advantage: You can write at any time, supposedly.
Disadvantage: If you intend to write at any time, you need to have lots of logic to make sure no data is lost. Also without direct access to RAM you will have to have even more glue logic to keep the process of writing fast and easy to use.

7) Halt the Processor. This is what Ben Eater uses on his "worst video card" series. Use the RDY and BE pins on the 6502 to literally stop all processing while drawing.
Advantage: You write whenever you want. It's not too hard to implement.
Disadvantage: You are REALLY limiting your processing speed, as shown in his videos.
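
As a rough check on the "Write when Black" figure in item 5, the blanking time can be estimated from the usual 640x480 at 60 Hz VGA timing. This is only a sketch: the 800 clocks per line and 525 lines per frame are assumed standard values rather than numbers from this thread, and the function names are made up for illustration.

```python
# Rough timing budget for 640x480@60 VGA (assumed standard timing values).
PIXEL_CLOCK_MHZ = 25.175   # pixel clock quoted in the post
CLOCKS_PER_LINE = 800      # 640 visible + porches and sync (assumption)
LINES_PER_FRAME = 525      # 480 visible + 45 blanking lines (assumption)
VISIBLE_LINES = 480


def line_time_us():
    """Duration of one full scanline in microseconds."""
    return CLOCKS_PER_LINE / PIXEL_CLOCK_MHZ


def vertical_blank_us():
    """Time per frame during which no visible line is being drawn."""
    return (LINES_PER_FRAME - VISIBLE_LINES) * line_time_us()


def cpu_cycles_in_vblank(cpu_mhz):
    """Whole CPU cycles that fit in one vertical blanking interval."""
    return int(vertical_blank_us() * cpu_mhz)


if __name__ == "__main__":
    print(f"line time:      {line_time_us():.3f} us")
    print(f"vertical blank: {vertical_blank_us():.1f} us")
    for mhz in (1.0, 2.0, 4.0):
        print(f"{mhz} MHz CPU: {cpu_cycles_in_vblank(mhz)} cycles per blank")
```

Under these assumptions the vertical blank alone comes to roughly 1430 us, in line with the minimum quoted in item 5. Horizontal blanking on each visible line adds more time, but only in much smaller slices.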

There of course could be alterations to "enhance" one of these methods. For example, you can add some shift registers or latches/flip-flops to use less color bits but also half the speed required to access them.
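
To put numbers on that last idea, here is a small sketch of how the video RAM read rate drops as the colour depth is reduced and each fetched byte is held in a shift register or latch. It assumes an 8-bit-wide RAM and the 25.175 MHz pixel clock; `pixels_per_fetch` and `fetch_rate_mhz` are hypothetical helpers, not part of anyone's design here.

```python
PIXEL_CLOCK_MHZ = 25.175  # VGA pixel clock mentioned above


def pixels_per_fetch(bus_bits, bits_per_pixel):
    """Number of whole pixels delivered by one RAM read."""
    if bits_per_pixel <= 0 or bus_bits % bits_per_pixel != 0:
        raise ValueError("bits_per_pixel must divide the bus width evenly")
    return bus_bits // bits_per_pixel


def fetch_rate_mhz(bus_bits, bits_per_pixel):
    """RAM reads per microsecond needed to keep the display fed."""
    return PIXEL_CLOCK_MHZ / pixels_per_fetch(bus_bits, bits_per_pixel)


if __name__ == "__main__":
    for bpp in (8, 4, 2, 1):
        print(f"{bpp} bpp on an 8-bit RAM: one read every "
              f"{pixels_per_fetch(8, bpp)} pixel clock(s), "
              f"{fetch_rate_mhz(8, bpp):.3f} MHz read rate")
```

Halving the bits per pixel doubles the pixels per read, so the RAM only has to be read half as often.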

Any thoughts? Have you used one of these methods, a variant of one of these, or something else entirely? This is not a comprehensive list I'm sure.

Thank you everyone! Just discussion here.

Chad

EDIT:

8) Race the Beam. The processor manually draws what is going to the screen. This is what the Atari 2600 did.
Advantage: Um. No need for this discussion at all?
Disadvantage: So many...


Posted: Wed Jan 05, 2022 3:16 pm

Joined: Fri Dec 21, 2018 1:05 am
Posts: 1084
Location: Albuquerque NM USA
I think race the beam should be item #8. Instead of halting the processor (item #7 on the list) and having dedicated video hardware take over, beam racing uses the same processor to drive the video, thus eliminating the dedicated video hardware. Like item #7, it is slow, since 90% of its throughput is used to drive video.


Item #9 is Video RAM. VRAM is similar to dual port RAM except it is a big memory, comparable to DRAM in size. The drawback is VRAM, like DRAM, is harder to interface.
Bill

Edit, OK you've listed "race the beam" as item #8 while I was typing.


Posted: Wed Jan 05, 2022 3:59 pm

Joined: Sat Oct 09, 2021 11:21 am
Posts: 707
Location: Texas
I haven't thought of actual VRAM, interesting, thanks Bill.

And I remember seeing that post, very neat! I think that leads into...

10) Dedicated GPU. Have a second 6502 or similar, and have IT race the beam! I don't know exactly how they would talk though.
Advantage: This leaves your first CPU ready for computation.
Disadvantage: Complicated hardware, and it still leaves a CPU needing to run at 25.175 MHz in the case of VGA.

Chad


Posted: Wed Jan 05, 2022 4:31 pm

Joined: Tue Jul 17, 2018 9:58 am
Posts: 104
Location: Long Island, NY
You could interface to your dedicated beam-racing second 6502 with some dual-ported RAM :)


Posted: Wed Jan 05, 2022 5:02 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10838
Location: England
One question, or variable, is how you intend or expect the CPU to be reading the RAM, or indeed writing it.

That is, you can make an architecture where the graphics subsystem does the work of printing characters, drawing lines, plotting points, overlaying sprites, even reporting collisions. The CPU tells it what to do, and it does it. Never does the CPU read the video memory.

Acorn's architecture for the BBC Micro family allows for a second processor which becomes the application processor. The original 6502 machine becomes the I/O processor and does all the VDU writes (and sometimes reads) and the operating system provides the facilities for the application to do the things it needs. The interface between the host and the second processor can be as simple as a serial line, or a pair of VIAs, or a Tube chip which contains 8 FIFOs. (Revaldinho has also built a two-FIFO board which can serve the same purpose.)

I'm not sure you've distinguished two cases or maybe three cases
- the CPU only runs in the non-visible portions of the frame
- the CPU will be stalled until the next non-visible portion if (and only if) it makes a video RAM access
- the CPU will be stalled, if it makes a video access, if the video system currently needs access.

That third option is something like the C64, I think. The Apple III might be similar: the video system has 16 bit wide access to RAM and so doesn't need the RAM all the time.
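
The difference between those three cases can be illustrated with a toy cycle-by-cycle model. This is only a sketch under invented assumptions: every CPU operation takes one cycle, a "frame" is 10 cycles long with 8 visible, and the video circuit needs the RAM on every even visible cycle. The policy names are hypothetical labels for the three bullets above.

```python
def run(program, visible, video_busy, policy):
    """Return the number of cycles needed to finish `program`.

    program    -- string of ops: 'v' = video RAM access, 'o' = anything else
    visible    -- function t -> True while the beam is in the visible area
    video_busy -- function t -> True when the video circuit needs the RAM
    policy     -- 'blank_only', 'stall_until_blank' or 'stall_on_conflict'
    """
    t = 0
    for op in program:
        while True:
            if policy == "blank_only":
                blocked = visible(t)                      # CPU halted while drawing
            elif policy == "stall_until_blank":
                blocked = op == "v" and visible(t)        # video access waits for blank
            elif policy == "stall_on_conflict":
                blocked = op == "v" and video_busy(t)     # wait only on a real clash
            else:
                raise ValueError(policy)
            t += 1                                        # one cycle passes either way
            if not blocked:
                break
    return t


def toy_visible(t):
    """Toy frame: cycles 0-7 of every 10 are visible, 8-9 are blanking."""
    return t % 10 < 8


def toy_video_busy(t):
    """Toy video fetch pattern: RAM needed on even visible cycles only."""
    return toy_visible(t) and t % 2 == 0


if __name__ == "__main__":
    for policy in ("blank_only", "stall_until_blank", "stall_on_conflict"):
        cycles = run("ovov", toy_visible, toy_video_busy, policy)
        print(f"{policy}: {cycles} cycles")
```

The finer-grained the stall decision, the fewer cycles the CPU loses for the same program, at the cost of more arbitration logic.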


Posted: Wed Jan 05, 2022 5:17 pm

Joined: Sat Oct 09, 2021 11:21 am
Posts: 707
Location: Texas
Agumander, that works. Yep!

BigEd wrote:
One question, or variable, is how you intend or expect the CPU to be reading the RAM, or indeed writing it.

- the CPU only runs in the non-visible portions of the frame
- the CPU will be stalled until the next non-visible portion if (and only if) it makes a video RAM access
- the CPU will be stalled, if it makes a video access, if the video system currently needs access.

That third option is something like the C64, I think. The Apple III might be similar: the video system has 16 bit wide access to RAM and so doesn't need the RAM all the time.


Great post, thanks Ed! I guess I'm thinking of a system like Bill's in particular. He has some dedicated portion of the 64K memory map (4K in his case), and uses RAM banks to reach all of it. This particular video RAM would be separate from the CPU's normal RAM, generally, and I wasn't even thinking of having the CPU read from that RAM ever!

The thing about a dedicated CPU for graphics processes (the ones you listed were advanced!) is that it is complex. Not bad. Again, this is just a discussion here, and that is an option indeed!

Your three bullets are similar in design: Stall the CPU. Deciding when or how that happens is the trick. Am I right?

Yes, I can see having another option now:

11) Wide RAM, Slow Access. If the RAM's word width is 16-bits or so (or two 8-bit chips), obviously not all of that needs to be dedicated to a single pixel! Shift registers and/or latches to keep the data until it's ready for use. Having 4-bit color would allow for 1 cycle video read and 3 cycles CPU write.
Advantage: Could theoretically write at "any time". Coupling this idea with 30 FPS could really help a slow CPU access the RAM at the right time.
Disadvantage: There are times when the CPU can't access that RAM, either need to "stall" for a cycle, or have more logic to fix that. Lower color definition is inevitable.
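
The "1 cycle video read and 3 cycles CPU write" split in item 11 follows directly from the word width and colour depth. A minimal sketch, assuming one RAM slot per pixel clock and that the video circuit always takes the first slot of each group (`slot_schedule` and `cpu_share` are hypothetical names):

```python
def slot_schedule(bus_bits=16, bits_per_pixel=4):
    """Return one repeating group of RAM slots, one per pixel clock.

    'V' marks the slot in which the video circuit fetches a full word;
    'C' marks slots left free for the CPU.
    """
    if bits_per_pixel <= 0 or bus_bits % bits_per_pixel != 0:
        raise ValueError("bits_per_pixel must divide bus_bits evenly")
    pixels_per_word = bus_bits // bits_per_pixel
    return "V" + "C" * (pixels_per_word - 1)


def cpu_share(bus_bits=16, bits_per_pixel=4):
    """Fraction of RAM slots available to the CPU."""
    schedule = slot_schedule(bus_bits, bits_per_pixel)
    return schedule.count("C") / len(schedule)


if __name__ == "__main__":
    for bpp in (8, 4, 2):
        print(f"16-bit RAM, {bpp} bpp: {slot_schedule(16, bpp)} "
              f"({cpu_share(16, bpp):.0%} of slots free for the CPU)")
```

With a 16-bit word and 4-bit colour, each fetch feeds four pixels, which gives the `VCCC` pattern described above.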

Thanks Ed!

Chad


Posted: Wed Jan 05, 2022 6:39 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10838
Location: England
Yes indeed, in those stall-the-cpu examples, the difference is in when, or how fine-grained, the stalling is.

Speaking of tearing and racing the beam, even in a conventional setup it can be advantageous for the CPU to know when the start (or end) of visible display area is, or to get an interrupt from the vertical sync, so the application can schedule updates to areas of the screen not presently being displayed. It's common for the application to use a timer (such as from a VIA) to split the frame time into bands, to operate on each band when not being displayed, but the timer needs to be synchronised to the video.
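
As a sketch of the band-splitting idea, the following works out how many CPU cycles after vertical sync each band begins, which is the kind of figure a timer would be loaded from. It assumes the usual 800 x 525 VGA frame and a CPU clock of a quarter of the pixel clock purely as an example; the exact value written to a real timer would also have to allow for the timer's own offsets and interrupt latency, which are not modelled here.

```python
PIXEL_CLOCK_HZ = 25_175_000
CLOCKS_PER_LINE = 800      # assumed standard 640x480@60 timing
LINES_PER_FRAME = 525      # assumed standard 640x480@60 timing


def cycles_per_frame(cpu_hz):
    """CPU cycles elapsed during one complete video frame."""
    return cpu_hz * CLOCKS_PER_LINE * LINES_PER_FRAME / PIXEL_CLOCK_HZ


def band_starts(cpu_hz, bands):
    """CPU-cycle offsets from vertical sync at which each band begins."""
    total = cycles_per_frame(cpu_hz)
    return [round(total * i / bands) for i in range(bands)]


if __name__ == "__main__":
    cpu_hz = PIXEL_CLOCK_HZ // 4   # example: CPU at a quarter of the pixel clock
    print(f"cycles per frame: {cycles_per_frame(cpu_hz):.0f}")
    for i, start in enumerate(band_starts(cpu_hz, 4)):
        print(f"band {i} starts {start} cycles after vsync")
```

Restarting the count from the vertical sync interrupt each frame is what keeps the bands locked to the display.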


Posted: Wed Jan 05, 2022 7:15 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
Pretty much all my video designs so far have been #3. For example, my latest runs the CPU at a quarter of the VGA pixel clock, and the graphics side reads four pixels of data from video RAM while the CPU clock is low, then gets off the video buses to let the CPU have access towards the end of the clock cycle. It's restrictive but simple in its own way, and the speed you get is pretty good. It's just hard to change that speed, e.g. clocking the CPU higher or lower. The other issue is that with the components I'm using, the timing is questionable - the bus transition times and access time on the RAM are rather long. I made a spreadsheet to analyse that and talked about it a bit in a video a while ago.
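
The kind of timing analysis described there can be sketched as a simple budget: how long one clock phase lasts versus how long the bus and RAM need. The component figures in the example (15 ns bus transition, 55 ns or 70 ns RAM access, 5 ns setup) are placeholder assumptions for illustration, not values from the spreadsheet mentioned above, and a symmetric clock is assumed.

```python
def phase_ns(pixel_clock_mhz=25.175, divider=4, duty=0.5):
    """Length in ns of one phase of a CPU clock divided from the pixel clock."""
    period_ns = 1000.0 * divider / pixel_clock_mhz
    return period_ns * duty


def slack_ns(phase, bus_transition_ns, ram_access_ns, setup_ns):
    """Time left over in a phase; negative means the access does not fit."""
    return phase - (bus_transition_ns + ram_access_ns + setup_ns)


if __name__ == "__main__":
    low_phase = phase_ns()
    print(f"low phase at pixel clock / 4: {low_phase:.1f} ns")
    for ram_ns in (55, 70):   # hypothetical SRAM speed grades
        margin = slack_ns(low_phase, 15, ram_ns, 5)
        verdict = "fits" if margin >= 0 else "does NOT fit"
        print(f"{ram_ns} ns RAM: slack {margin:+.1f} ns ({verdict})")
```

At a quarter of the pixel clock each phase is a little under 80 ns, so a few nanoseconds of extra bus delay or a slower RAM grade is enough to use up the margin.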

I am planning to move to a system more like #6 or #10, for various reasons. One is speed - I'm convinced that the 6502 will fare better in this environment, as it really lacks the addressing modes to efficiently access a large framebuffer anyway.

Another is interface simplicity - right now I have 8 data lines and about 20 address lines going to the video interface, plus timing/control signals, and I want to condense that down to maybe just 8 data lines and 3 or 4 address/control lines, to make it more pluggable and easier to fit into existing systems.

The third reason is just to decouple the CPU clock, and be more flexible about that. The 6502 can only access an IO device about once every three cycles, at best, so if I can run it at a faster clock rate it will be able to make better use of the bandwidth available.

Finally I'm generally interested in implementing some forms of hardware acceleration, like block copies, fills, orthogonal lines, etc. This is like building a GPU from scratch. So this will also be a prototyping base for that. It will start with latches like in EGA/VGA cards, to allow the 8 bit CPU to work better with wider data - things like writing a constant colour to multiple pixels, or writing an individual pixel without having to read-modify-write on the CPU. This also allows more efficient block copying, as the CPU can use one operation to copy 4/8/whatever bytes of data.

Exciting possibilities, I only wish I had more time!


Posted: Wed Jan 05, 2022 10:22 pm

Joined: Sat Oct 09, 2021 11:21 am
Posts: 707
Location: Texas
Ed, you are right on. My project uses the /NMI interrupt at the beginning of the black, and I *know* I have 8500 uS until the next draw period happens. Whether I choose to draw during that time or not is up to me. Using a VIA's timer is also a great option!

George, you are right on to my issue as to why I'm not jumping into the Double-Duty option: I would be tying my processor to the VGA clock somehow. I like that you use it at quarter speed, that at least makes it manageable (in my eyes).

Yes, yes, yes, and yes to your reasoning. I've been thinking about the Latch and Wait option a lot this past month. It was one of my prototype designs along the way. Though you would have to manage the control lines in some way, reusing that data bus helps I/O pin-count a lot. Also, like you said, it can plug into existing systems much easier. In fact, you can even interface it with a VIA! Perhaps those advantages help offset its more complex logic a bit. When I was going down that road, I needed counters for the sync signals (or at least for the EEPROM sync signals) and also horizontal and vertical counters for where you are writing to in RAM. Lots of counters! Not to mention the transceivers and latches. I think that as long as you are using the "Double Duty" style, where it's able to write whatever is in the latch on the high/low clock phase, AND the CPU clock is less than 12 MHz, you should be able to write whenever you want. So not "Latch and Wait" anymore, but just "Latch".

Good info! Thanks.

Chad


Posted: Wed Jan 05, 2022 11:26 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
sburrow wrote:
When I was going down that road, I needed counters for the sync signals (or at least for the EEPROM sync signals) and also horizontal and vertical counters for where you are writing to in RAM. Lots of counters! Not to mention the transceivers and latches.

The simple way is to replace all the transceivers I already have on the address bus with tristate DFFs, and keep the rest of the circuit basically the same as before. The CPU can load them directly one by one, to at least benefit from not having to wire all the address lines through.

To be fast though it really needs auto-increment, at least, and you could go more or less complex with that. The benefit, though, is that I think as soon as you're writing more than about half a dozen bytes, the cost of the CPU loading an address into the latches and then writing a line of bytes using absolute (or zero page) STA is going to be much less than doing the same indirectly via an address in zero page (which still needs to be initialised).
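
A rough cycle count makes that comparison concrete. The sketch below uses the standard 6502 cycle counts for the instructions involved, but the surrounding model is an assumption: the latch version loads each address byte with LDA #/STA and then streams bytes with one absolute STA each to an auto-incrementing data port, while the indirect version uses STA (zp),Y followed by INY. Neither counts the cost of fetching the data bytes themselves, and the exact break-even point shifts with those assumptions.

```python
# 6502 cycle counts for the instructions involved
LDA_IMM = 2      # LDA #value
LDY_IMM = 2      # LDY #value
STA_ZP = 3       # STA zeropage
STA_ABS = 4      # STA address
STA_IND_Y = 6    # STA (zeropage),Y
INY = 2


def latch_cost(n_bytes, addr_bytes=2):
    """Load the address latches, then stream n_bytes to an auto-increment port."""
    setup = addr_bytes * (LDA_IMM + STA_ABS)
    return setup + n_bytes * STA_ABS


def indirect_cost(n_bytes, init_pointer=True):
    """Write n_bytes (up to 256) through a zero-page pointer with STA (zp),Y."""
    setup = LDY_IMM
    if init_pointer:
        setup += 2 * (LDA_IMM + STA_ZP)   # store low and high pointer bytes
    return setup + n_bytes * (STA_IND_Y + INY)


if __name__ == "__main__":
    for n in (1, 4, 8, 16, 32):
        print(f"{n:2d} bytes: latch {latch_cost(n, addr_bytes=3):3d} cycles, "
              f"indirect {indirect_cost(n, init_pointer=False):3d} cycles")
```

Even against a zero-page pointer that is already set up, the latch approach pulls ahead after a handful of bytes in this model, because each byte costs 4 cycles instead of 8.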

Quote:
I think that as long as you are using the "Double Duty" style, where it's able to write whatever is in the latch on the high/low clock phase, AND the CPU clock is less than 12 MHz, you should be able to write whenever you want. So not "Latch and Wait" anymore, but just "Latch".

Exactly. In fact the CPU clock can be a lot faster. The 6502 takes at least 3 clock cycles to write the next byte, even without the overhead of calculating what that byte is going to be or loading it from elsewhere. So you should be able to run the CPU clock 3 or more times faster than the video bus and still not really need to worry about waiting.

David Clifford has implemented a non-6502 design that is asynchronous. An interesting feature of his design, though, is that he shadows the video memory, so that when the CPU tries to read from it, it reads the shadow instead and doesn't need to synchronise the read with the video circuit at all.


Posted: Thu Jan 06, 2022 9:17 am

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1438
Location: Scotland
gfoot wrote:
David Clifford has implemented a non-6502 design that is asynchronous. An interesting feature of his design, though, is that he shadows the video memory, so that when the CPU tries to read from it, it reads the shadow instead and doesn't need to synchronise the read with the video circuit at all.


The BBC Master did this so you could invoke a 20KB video mode without it affecting your program RAM. The downside was that pixel poking no longer worked, but that was OK for the vast majority of software that did it properly by calling the OS routines to do that for them.

I don't know if the timing was decoupled from the main RAM though - I suspect not, but as it was using alternative half cycles anyway it was probably OK.

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Posted: Thu Jan 06, 2022 9:38 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10838
Location: England
Hmm, I'm not sure the Master has a shadow in this sense... although the Beeb816 does have shadow modes in this sense. (The Master doesn't need this, as the RAM is fast enough to serve CPU and video at the same time. What the Master does have, and calls it a shadow mode, is an interesting and dynamic memory map, so in certain modes the important OS code can access video memory while all other code sees a larger plain memory space.)


Posted: Thu Jan 06, 2022 1:27 pm

Joined: Sun Nov 07, 2021 4:11 pm
Posts: 101
Location: Toronto, Canada
gfoot wrote:
To be fast though it really needs auto-increment, at least, and you could go more or less complex with that. The benefit, though, is that I think as soon as you're writing more than about half a dozen bytes, the cost of the CPU loading an address into the latches and then writing a line of bytes using absolute (or zero page) STA is going to be much less than doing the same indirectly via an address in zero page (which still needs to be initialised).

I've been thinking about this as well, but my feeling is that the autocounting / any advanced functionality would be best left to a GPU-like device—I've been wondering whether a beefy MCU like the PIC18F46, which has 36 GPIOs, runs at 64MHz and is pretty inexpensive, would make a good candidate. Doing this using only discrete logic seems like it would be pretty complex, and I'm not sure that the payoff would be worth it in the end.


Posted: Thu Jan 06, 2022 2:04 pm

Joined: Mon Feb 15, 2021 2:11 am
Posts: 100
BigEd wrote:
One question, or variable, is how you intend or expect the CPU to be reading the RAM, or indeed writing it.

That is, you can make an architecture where the graphics subsystem does the work of printing characters, drawing lines, plotting points, overlaying sprites, even reporting collisions. The CPU tells it what to do, and it does it. Never does the CPU read the video memory.



The extreme example of that would be the IBM Professional Graphics Controller, which featured an additional 8088 CPU, a ROM with the graphics routines, and its own 384KB of RAM. One of the key designers of that went on to co-found nVidia.


Posted: Thu Jan 06, 2022 2:06 pm

Joined: Fri Apr 06, 2018 4:20 pm
Posts: 94
CountChocula wrote:
gfoot wrote:
To be fast though it really needs auto-increment, at least, and you could go more or less complex with that. The benefit, though, is that I think as soon as you're writing more than about half a dozen bytes, the cost of the CPU loading an address into the latches and then writing a line of bytes using absolute (or zero page) STA is going to be much less than doing the same indirectly via an address in zero page (which still needs to be initialised).

I've been thinking about this as well, but my feeling is that the autocounting / any advanced functionality would be best left to a GPU-like device—I've been wondering whether a beefy MCU like the PIC18F46, which has 36 GPIOs, runs at 64MHz and is pretty inexpensive, would make a good candidate. Doing this using only discrete logic seems like it would be pretty complex, and I'm not sure that the payoff would be worth it in the end.


Beefy MCUs seem to work very well for video generation if you are running off of the on-board CPU. The first generation Maximite used a PIC32, IIRC, and there are many other examples using the ESP32, Pi Pico, etc.

I have not seen a good example of using an MCU for graphics coupled with an external master CPU - how do you get the 6502 to efficiently write to the RAM onboard the MCU? Directly interfacing with the 6502 bus eats cycles. You end up attaching a several hundred MHz MCU to the 6502, which seems like cheating. The 6502 could send data over SPI or similar, but that is not ideal for graphics/games. It is fine for pure text modes.

