Methods to Access Video RAM

ojanhk · Post by **ojanhk** » Fri Jan 07, 2022 8:12 pm

Very interesting thread indeed.
It got me thinking: what about a mix of Double Duty and Write when black?
Suppose there are two memory chips, one used by the CPU and one used by the video circuit. When we reach vblank, for example, a counter-based circuit copies from main RAM to video RAM on the low phase of the CPU clock.

This will allow read/write from the CPU at any time and still be able to run at any frequency.
Obviously the downsides are that it will take multiple frames to populate the video memory, and the copy circuit will need to be adjusted depending on the speed ratio between the CPU clock and the video clock.

sburrow · Post by **sburrow** » Fri Jan 07, 2022 8:17 pm

AndrewP wrote:

This is all in theory, the simulation of the 6502 is not accurate (I really only use the 65816) and the screen size is WAY too big for Logisim to simulate in a reasonable time.

Logisim, hm? I just downloaded it, wow! I knew something like this probably existed, but I'll be using this in the future. Very neat! Thanks for that.

And that looks very complex, but I'm sure it's much more simple than at first glance. The use of the framebuffer really improves things for sure.

Neat to see, thanks!

Chad

sburrow · Post by **sburrow** » Fri Jan 07, 2022 8:22 pm

ojanhk wrote:

This will allow read/write from the CPU at any time and still be able to run at any frequency.
Obviously the downsides are that it will take multiple frames to populate the video memory, and the copy circuit will need to be adjusted depending on the speed ratio between the CPU clock and the video clock.

Those are definitely my thoughts too. Still, I think we might be on #13 now? Correct me if I'm wrong.

13) Copy while Black. While the video is not being displayed, have something automatically copy from the CPU RAM side to the video RAM side.
Advantages: You can write at any time. You even get a frame buffer for no additional cost!
Disadvantages: You need to copy all the memory from one RAM chip to another in a short time. That might not be fully possible within a single frame, so tearing could occur at lower speeds or higher resolutions.

I'm sure if you use the Double Duty method it will definitely work better, but it might still not be fast enough to copy all of the contents in the time of a single frame.

Thanks! Cool idea. It's kind of like an auto-race-the-beam. I like it!

Chad

Proxy · Post by **Proxy** » Fri Jan 07, 2022 9:47 pm

Dr Jefyll wrote:

Yes, and the read clock and the write clock don't need to be synchronized in any way.
A FIFO is a Dual Port RAM... that doesn't require any address lines!

I could see a FIFO being used to turn #6 "latch and wait" into just "latch": as long as the CPU doesn't fill the FIFO, it would never need to be stalled.
but that raises the question... where does the data go? you would also need to put the address the CPU is outputting into a separate FIFO so the Video Circuit knows where to write the byte.
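As a rough illustration of the "latch" FIFO idea above, here is a minimal Python sketch of a posted-write queue that stores the address alongside each data byte. The class name `WriteFifo`, its `depth` parameter, and the method names are all hypothetical, and the model ignores clock domains entirely; it only shows that the CPU stalls when the queue is full and that the video side needs the address to know where each byte goes.

```python
from collections import deque


class WriteFifo:
    """Hypothetical model of a posted-write FIFO holding (address, data) pairs."""

    def __init__(self, depth):
        self.depth = depth          # number of pending writes the FIFO can hold
        self.entries = deque()

    def cpu_write(self, address, data):
        """CPU side: returns False when the FIFO is full (the CPU would stall)."""
        if len(self.entries) >= self.depth:
            return False
        self.entries.append((address, data))
        return True

    def video_drain(self, vram):
        """Video side: commit the oldest pending write during a free slot."""
        if not self.entries:
            return False
        address, data = self.entries.popleft()
        vram[address] = data
        return True
```

In hardware, the pair could be stored in two FIFOs clocked together or in one wider FIFO; the sketch does not distinguish between those options.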

AndrewP wrote:

This is all in theory, the simulation of the 6502 is not accurate (I really only use the 65816) and the screen size is WAY too big for Logisim to simulate in a reasonable time.
The six multiplexers allow either the address from the 6502 or the counter's address to be sent through to the 1K video memory. The bus transceivers keep the data that's being output to the shift register off the 6502's data line and also allow data to be written into video memory.

what fork of Logisim are you using? in the HC Edition you have an RGB screen component. it would be enough for testing at a somewhat reasonable speed.
sadly i didn't finish my 65C02 for Digital, otherwise i could use that for testing. Digital is an alternative to Logisim; it runs much, much faster, has a VGA output component, and allows export to Verilog/VHDL. it's what i'm using to design my ATF1508 based VGA Controller.

on a different note, would there be any differences between using Multiplexers or Tri-state buffers for this?

sburrow wrote:

ojanhk wrote:

This will allow read/write from the CPU at any time and still be able to run at any frequency.
Obviously the downsides are that it will take multiple frames to populate the video memory, and the copy circuit will need to be adjusted depending on the speed ratio between the CPU clock and the video clock.

Those are definitely my thoughts too. Still, I think we might be on #13 now? Correct me if I'm wrong.

13) Copy while Black. While the video is not being displayed, have something automatically copy from the CPU RAM side to the video RAM side.
Advantages: You can write at any time. You even get a frame buffer for no additional cost!
Disadvantages: You need to copy all the memory from one RAM chip to another in a short time. That might not be fully possible within a single frame, so tearing could occur at lower speeds or higher resolutions.

I'm sure if you use the Double Duty method it will definitely work better, but it might still not be fast enough to copy all of the contents in the time of a single frame.

Thanks! Cool idea. It's kind of like an auto-race-the-beam. I like it!

Chad

interesting idea, but i don't see why the speed of the copying would rely on the difference between the CPU and Video Clock.

I'm imagining it like this:
RAM0 is the chip the CPU has access to. it's directly connected to the Video Controller and connected to the CPU through a bunch of Tri-state buffers.
during normal operation the CPU can access RAM0 like any other Memory (with some slight latency due to the buffers).
but then either automatically (once a frame) or CPU controlled (via a CPU accessible Control Register) the Video Controller cuts the CPU's access to RAM0 and starts up an internal DMA Controller to rapidly copy from RAM0 to RAM1 (Video Memory).
And since the CPU is completely unaffected it can do other work in the meantime and the copying can be done at the highest speed possible.
once the copying is done the CPU's access to RAM0 is restored and an (optional) Interrupt is sent to the CPU to let it know that it can work on the next frame.

I really like the idea of having the copying CPU controlled as otherwise the Video Controller could cut the CPU's access to RAM0 in the middle of a read/write cycle.
Obviously "CPU Controlled" doesn't mean that the Video Controller starts copying the moment the CPU sets the "go ahead" flag. rather, the controller only checks that flag once every frame to avoid screen tearing.
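The sequence described in this post (go-ahead flag, check once per frame, lock the CPU out of RAM0, copy, restore access, raise an interrupt) can be sketched as a small behavioural model. This is only an illustration in Python, not a hardware design; `CopyController` and all of its attribute and method names are made up for the example, and one call to `copy_clock` stands in for one copy-clock cycle.

```python
class CopyController:
    """Hypothetical behavioural model of the RAM0 -> RAM1 copy scheme."""

    def __init__(self, size):
        self.ram0 = bytearray(size)   # CPU-side RAM
        self.ram1 = bytearray(size)   # video-side RAM
        self.go_flag = False          # set by the CPU via a control register
        self.copying = False          # True while the CPU is locked out of RAM0
        self.copy_address = 0
        self.irq_pending = False

    def cpu_write(self, address, data):
        """Returns False while the copy is running (CPU has no access to RAM0)."""
        if self.copying:
            return False
        self.ram0[address] = data
        return True

    def set_go(self):
        self.go_flag = True

    def on_vblank_start(self):
        """The flag is only sampled here, once per frame."""
        if self.go_flag:
            self.go_flag = False
            self.copying = True
            self.copy_address = 0

    def copy_clock(self):
        """One copy clock: move one byte, finish after the last byte."""
        if not self.copying:
            return
        self.ram1[self.copy_address] = self.ram0[self.copy_address]
        self.copy_address += 1
        if self.copy_address == len(self.ram0):
            self.copying = False      # CPU access to RAM0 is restored
            self.irq_pending = True   # optional "frame copied" interrupt
```

Because the flag is only sampled in `on_vblank_start`, a CPU write cycle is never cut off by a copy that starts at an arbitrary moment, which is the point made above about CPU-controlled copying.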

Assuming standard 640x480 VGA timings, the vertical blanking interval between frames is 45 lines of 800 pixel clocks each, for a total of 36000 "Blank Pixels". at the standard ~25MHz Pixel Clock that would be a total of ~1.44 milliseconds.
The easiest copying mechanism would be to just copy 1 byte every pixel clock cycle. which allows for up to 36000 Bytes to be copied.
Which is sadly not enough for bitmap 640x480 * 2 colors (38400 Bytes), but it is enough for bitmap 320x240 * 4 colors (19200 Bytes) or even 640x400 * 2 colors (32000 Bytes).
If you use characters/tiles instead of bitmap graphics you can achieve larger resolutions and color depth with the same bandwidth.
Alternatively you can also increase the bandwidth by using a faster clock for the copying. but then you have to manually check that the number of bytes you want to copy (assuming it's a constant amount) fits into the 1.44ms window without getting too close to the start of the next frame, as it's not guaranteed that the copy clock is in sync with the Video Controller's Pixel clock.
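The budget arithmetic above can be checked with a few lines of Python. This sketch assumes the usual 640x480 at 60 Hz timing of 800x525 total clocks per frame and a 25.175 MHz pixel clock (the post rounds this to ~25 MHz); the function names and the `bytes_per_clock` parameter are hypothetical. It only counts the vertical blanking lines, matching the 36000 figure in the post, and ignores horizontal blanking during visible lines.

```python
H_TOTAL, V_TOTAL = 800, 525      # total pixel clocks per line, total lines per frame
V_VISIBLE = 480                  # visible lines
PIXEL_CLOCK_HZ = 25_175_000      # nominal VGA pixel clock (assumed here)


def vblank_clocks():
    """Pixel clocks available during vertical blanking."""
    return (V_TOTAL - V_VISIBLE) * H_TOTAL


def vblank_seconds():
    return vblank_clocks() / PIXEL_CLOCK_HZ


def framebuffer_bytes(width, height, bits_per_pixel):
    return width * height * bits_per_pixel // 8


def fits_in_vblank(width, height, bits_per_pixel, bytes_per_clock=1):
    """True if the whole bitmap can be copied inside one vertical blanking interval."""
    budget = vblank_clocks() * bytes_per_clock
    return framebuffer_bytes(width, height, bits_per_pixel) <= budget


if __name__ == "__main__":
    for mode in [(640, 480, 1), (320, 240, 2), (640, 400, 1)]:
        print(mode, framebuffer_bytes(*mode), fits_in_vblank(*mode))
```

Passing `bytes_per_clock=2` models a copy clock running at twice the pixel clock, which is the "faster clock" option mentioned above; with that setting the 640x480 two-colour bitmap fits.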

this thread is dangerous, it's giving me too many project ideas and i don't have enough CPLDs for all of them!

gfoot · Post by **gfoot** » Fri Jan 07, 2022 10:29 pm

Proxy wrote:

on a different note, would there be any differences between using Multiplexers or Tri-state buffers for this?

I personally use 74HC590 counters with tristate outputs for video addresses, and 74AHCT245 transceivers for the CPU address and data bus connections, rather than multiplexers, and it works very well. I think I found that the multiplexers I have are rather slow - of course you can get faster ones, I just didn't have any to hand. 8-bit ICs also feel a bit more space-efficient on a breadboard.

Proxy wrote:

Assuming standard 640x480 VGA timings there are a total of 36000 "Blank Pixels" between each frame. at the standard ~25MHz Pixel Clock that would be a total of ~1.44 milliseconds
The easiest copying mechanism would be to just copy 1 byte every pixel clock cycle. which allows for up to 36000 Bytes to be copied.
Which is sadly not enough for bitmap 640x480 * 2 colors (38400 Bytes), but it is enough for bitmap 320x240 * 4 colors (19200 Bytes) or even 640x400 * 2 colors (32000 Bytes).
If you use characters/tiles instead of bitmap graphics you can achieve larger resolutions and color depth with the same bandwidth.

I fear though that if you use fast enough RAM or other tricks to be able to copy the whole 640x480 image within the blanking periods from the 800x525 virtual frame, then you could more easily have interleaved the writing with the reading, which may seem scary but is not actually very difficult. I'm not sure what you gain by delaying the actual copy until the blanking period.

Proxy · Post by **Proxy** » Fri Jan 07, 2022 10:50 pm

gfoot wrote:

I fear though that if you use fast enough RAM or other tricks to be able to copy the whole 640x480 image within the blanking periods from the 800x525 virtual frame, then you could more easily have interleaved the writing with the reading, which may seem scary but is not actually very difficult. I'm not sure what you gain by delaying the actual copy until the blanking period.

of course that is also an option.
if you run the copy at the same speed as the Pixel Clock (or twice as fast) you can easily interleave it on the RAM1 side.
the benefit of keeping it inside the blanking period is that you completely avoid the risk of screen tearing, since starting the copy too early in the current frame could cause artifacts as data that hasn't been drawn yet gets overwritten. (assuming it copies bytes faster than the Video Controller is outputting them, which is likely the case if a pixel requires less than 8 bits of data)

and honestly at the resolutions or color depths that do require copying beyond the blanking period, it would likely be much easier to just use the double buffer idea with the 2 RAM ICs.
it's almost the same as copying the contents of the whole IC into the other one, but without actually moving any data and it's really swapping the contents instead of copying them.
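To make the "swap instead of copy" point concrete, here is a small Python sketch of two RAMs whose roles are exchanged at vertical blanking. The `DoubleBuffer` class and its method names are invented for the example; in hardware the swap would be a single select bit steering the address and data buffers, not an object with properties.

```python
class DoubleBuffer:
    """Hypothetical model of two RAM ICs that trade CPU and video roles."""

    def __init__(self, size):
        self.rams = [bytearray(size), bytearray(size)]
        self.cpu_index = 0            # which RAM the CPU currently owns
        self.flip_requested = False

    @property
    def cpu_ram(self):
        return self.rams[self.cpu_index]

    @property
    def video_ram(self):
        return self.rams[1 - self.cpu_index]

    def request_flip(self):
        """CPU signals that the frame it has been drawing is complete."""
        self.flip_requested = True

    def on_vblank(self):
        """Swap roles only during blanking, and only if a flip was requested."""
        if not self.flip_requested:
            return False
        self.cpu_index ^= 1
        self.flip_requested = False
        return True
```

No bytes move when the buffers flip, so the cost is independent of resolution and colour depth. The catch is that the CPU's new buffer holds the frame before last, so it has to be fully repainted.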

AndrewP · Post by **AndrewP** » Sat Jan 08, 2022 6:42 am

sburrow wrote:

And that looks very complex, but I'm sure it's much more simple than at first glance. The use of the framebuffer really improves things for sure.

Neat to see, thanks!

No prob! It's one way of doing things but, as this whole thread shows, not the only way of doing things.

I also forgot to mention probably the most important aspect of using a flipping framebuffer. The video clock and CPU clock can be completely separate! I should not have tied the two together in my example.

I also should have mentioned a fairly big downside. To avoid horrible jittering you have to make sure each framebuffer video memory is fully painted every frame. A partial work around is to halve the framerate (or less). A complete fix is to tell the framebuffers when to flip but that's a bunch more circuitry.

Proxy wrote:

what fork of Logisim are you using? in the HC Edition you have an RGB screen component. it would be enough for testing at a somewhat reasonable speed.

I'm using Logisim Evolution; my adventures in trying to get the W65C816S simulated are on this forum over here if you're interested.

It has a video display but it's a plotting X,Y one not a CRT style scan-lining one; I'll have to write a component for one at some point to test things. I very nearly went down the Digital route and honestly can't remember why I didn't.

AndrewP · Post by **AndrewP** » Sat Jan 08, 2022 7:06 am

gfoot wrote:

Proxy wrote:

on a different note, would there be any differences between using Multiplexers or Tri-state buffers for this?

I personally use 74HC590 counters with tristate outputs for video addresses, and 74AHCT245 transceivers for the CPU address and data bus connections, rather than multiplexers, and it works very well. I think I found that the multiplexers I have are rather slow - of course you can get faster ones, I just didn't have any to hand. 8-bit ICs also feel a bit more space-efficient on a breadboard.

Exactly! 10bit address lines are a bit tricky. It's too much for an 8bit driver (or '590) and requires 2 ICs each (for a total of 8) but it's too few for a 16bit driver and leaves 6bits wasted in each. I've drawn it up below:

The real problem here is that sometimes you want the CPU to talk to one chip and sometimes you want the CPU to talk to the other chip. And sometimes you want the video to talk to one chip and sometimes ... well, you get the picture. That's a total of 4 signals that need to be routed.

In the multiplexed version the LVC157s are less than half the size of the LVC16244s and are also less than half the price (these considerations really depend on what you're doing). The LVC157 chips have a propagation time of about 3ns and that is amazing(!) to me but not useful if you're not using LVC ICs or TTL levels. I also fully realise I'm a bit of an outlier on this forum by using LVC.

Dr Jefyll · Post by **Dr Jefyll** » Sat Jan 08, 2022 2:30 pm

AndrewP wrote:

I mostly use LVC chips and they won't play nicely with HC chips [...]

You're being a little too hasty. This conclusion is incorrect for certain chips and certain circumstances.

For starters, some LVC chips are rated for 5 Volt Vcc, and thus can input directly from and output directly to HC logic running on 5V. This is true for many/most/all? members of the LVC1G family; see my anycpu.org post, Tiny, superfast gates rival programmable logic.

And even LVC chips running on 3.3V Vcc frequently don't present a problem. See the attached document from TI (excerpt included).

-- Jeff

Attachment: scba010.pdf (1.45 MiB)

AndrewP · Post by **AndrewP** » Sat Jan 08, 2022 4:06 pm

Dr Jefyll wrote:

You're being a little too hasty. This conclusion is incorrect for certain chips and certain circumstances.

Yup, sorry that's true - and a misleading statement on my part. I should have phrased it as: HC chips might not play nicely with LVC chips especially when 3.3 LVC is driving 5V CMOS.

I know this: I have an LVC74 flip-flop driving an HC4040 (and it's fine because the flip-flop's output is almost exactly 3.00V once it's stabilised). Please do call me out when I make blanket statements; even if I'm not confusing anyone else it still helps me.

Dr Jefyll · Post by **Dr Jefyll** » Sat Jan 08, 2022 5:04 pm

AndrewP wrote:

Please do call me out when I make blanket statements

Huh? Now it's my responsibility??

Okay, I'm making a joke. And we're all human; slipups can happen. But there's a saying that recommends putting your mind in gear before you put your mouth in motion...

Quote:

I have an LVC74 flip-flop driving an HC4040 (and it's fine because the flip-flop's output is almost exactly 3.00V once it's stabilised)

Glad you've gotten this working, but you're clearly operating on the fuzzy edge. In contrast, HCT4040 would be a choice you can rely on.

-- Jeff

sburrow · Post by **sburrow** » Sat Jan 08, 2022 8:24 pm

Dr Jefyll wrote:

But there's a saying that recommends putting your mind in gear before you put your mouth in motion...

I do that all the time! And you Jeff have been very patient with me. Thank you.

On the side, during this discussion, I've been proto-designing a "Latch" board with "Double Duty" access. Ok well, the board's basically done and I'm running the auto-router now.

320x200 * 16 colors fits on 32K of RAM. I read from the RAM on the high clock, and write to the RAM on the low clock (if there's something in the latch).
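A quick sanity check of the numbers, plus a toy model of the read-high/write-low scheme, might look like the Python sketch below. Everything here is illustrative: `bitmap_bytes`, `DoubleDutyRam`, and the single-entry `latch` are made-up names, and treating the high phase as the first half of a modelled cycle is an arbitrary choice for the example.

```python
RAM_SIZE = 32 * 1024


def bitmap_bytes(width, height, colors):
    """Bytes needed for a packed bitmap with the given number of colours."""
    bits_per_pixel = (colors - 1).bit_length()   # 16 colours -> 4 bits
    return width * height * bits_per_pixel // 8


class DoubleDutyRam:
    """Hypothetical model: video reads on the high phase, latched writes on the low phase."""

    def __init__(self):
        self.ram = bytearray(RAM_SIZE)
        self.latch = None                 # pending (address, data), or None

    def cpu_latch(self, address, data):
        self.latch = (address, data)

    def clock_cycle(self, video_address):
        # High phase: the video circuit reads the byte it is about to display.
        pixel_byte = self.ram[video_address]
        # Low phase: commit the latched write, if there is one.
        if self.latch is not None:
            address, data = self.latch
            self.ram[address] = data
            self.latch = None
        return pixel_byte
```

With these assumptions `bitmap_bytes(320, 200, 16)` is 32000, which leaves 768 bytes of the 32K chip unused.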

I had room for one more IC and one extra enable line not yet used. So I decided to be able to software select using 4-bit or 8-bit parallel load methods. The idea is that I could simply plug this into a VIA or maybe some dedicated I/O space, rock and roll.

Still have to check it all over a bazillion times, but just giving y'all a small update on my findings from this discussion. It has been helpful! Thanks, and keep it up.

Chad

AndrewP · Post by **AndrewP** » Sat Jan 08, 2022 8:41 pm

Dr Jefyll wrote:

AndrewP wrote:

Please do call me out when I make blanket statements

Huh? Now it's my responsibility??

... but there's a saying that recommends putting your mind in gear before you put your mouth in motion...

and it's a good sentiment though not one I've always managed.

I've been trying to remember how I ended up with HC4040s and for the life of me I can't think of where they came from. I know it was in a time before I started using Mouser (who stock the HCT versions) so I must have found them locally somewhere. Anyway - and slightly on topic - I was trying to use them with a Raspberry Pico to bang out a VGA signal. Very similar to what sburrow is doing here. It didn't work but it's a big part of why this thread interests me and also when I realised I needed to properly simulate what I'm trying.

[EDIT] - Whoops didn't mean to post under you there. I got distracted by counters. With Dr Jefyll mentioning the HCT4040 and gfoot mentioning the HC590 and myself trying to find an HCT590 (I couldn't) I thought I'd try and add a bit to the discussion by listing some interesting counters. Then I thought someone had mentioned a '193 in this discussion but I couldn't find it.

Anyway, one of the problems I found I had with both the '590 and '4040 is that they're not presettable. If you're going to need to use counters for memory copying then (obviously) you'll need to load an address into them before you start and for that I've found the LVC161 and LVC163. Both are presettable 4bit up counters. The first has an asynchronous reset and the second resets synchronously on a rising clock edge. They're also both very, very fast; even with the dodgy signals I've been feeding them they've worked solidly. The HCT193 is interesting because it's a 4bit presettable up/down counter. Useful but not so much for video signals or blitting.

The HCT40103 is one I only ran into a couple of weeks ago. It's an 8bit presettable down counter with an output indicating when it has hit zero. I haven't played with it yet but figure it will be really useful for copying a constant number of bytes multiple times (say the width of a sprite).

Hopefully there's something new and exciting above!

sburrow · Post by **sburrow** » Sat Jan 08, 2022 8:54 pm

AndrewP wrote:

It didn't work but it's a big part of why this thread interests me and also when I realised I needed to properly simulate what I'm trying.

Haha, yes.

When I was using my Raspberry Pi for some other project, (and it's ok to say I was interpreting it wrong), I was getting various speeds out of the GPIO pins even when using C++ without the GUI desktop. As in, if I wanted to simulate something fast, AND accurately, I don't think it was achievable. Again I could be wrong, or it could have been the code. Who knows.

Anyways, a couple days ago I decided to search details on this VERA chip, and found this:

https://github.com/commanderx16/x16-doc ... ference.md

They are indeed using the Latch method here. I think they dedicated 32 addresses just for this chip, and from there they 'latch' data into it. It kind of reminds me of how a VIA chip works. It seems to even have audio capabilities.

Just some findings. Thanks!

Chad

gfoot · Post by **gfoot** » Sat Jan 08, 2022 9:53 pm

AndrewP wrote:

The HCT40103 is one I only ran into a couple of weeks ago. It's an 8bit presettable down counter with an output indicating when it has hit zero. I haven't played with it yet but figure it will be really useful for copying a constant number of bytes multiple times (say the width of a sprite).

I used 40103s extensively in my second video output circuit, instead of using any upcounters, to reduce the amount of long distance wiring, shunting counter outputs around. They all reset at the start of a line, then counted down from different configurable values to time until the start and end of the hsync pulse, and the start and end of the visible area, and the whole thing again vertically. I activated D flipflops at these transition points, and had separate 4040s that only ticked up during the visible region, for address counting (which is worth doing separately anyway for a few reasons).

Details are here if you're interested: https://github.com/gfoot/compvideo6502

I made a VGA sprite rendering layer more recently, and for that I used 163s preloaded with negative values to count up until the end of the sprite (in each direction). So the sprite was active for 256 pixels in each direction, for 12-bit count values from -256 to -1. I used the TC signal to determine the active period. I only have this "documented" in video form, the schematic is here: https://youtu.be/A9b6BdymlyE?t=55
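The negative-preload trick can be illustrated with a small Python model. It treats three cascaded 4-bit '163s as one 12-bit up counter whose terminal count (TC) is asserted when every bit is 1, and it assumes one plausible reading of the post: the sprite is active from the preloaded value up to and including the count at which TC fires. The class `Cascaded163` and the function `active_pixels` are hypothetical names, not taken from the schematic.

```python
class Cascaded163:
    """Hypothetical model of three cascaded 4-bit '163s as one 12-bit up counter."""

    BITS = 12
    MASK = (1 << BITS) - 1            # 0xFFF, the all-ones terminal count

    def __init__(self):
        self.count = 0

    def load(self, value):
        # A negative preload wraps to its 12-bit two's complement form.
        self.count = value & self.MASK

    @property
    def tc(self):
        """Terminal count: high when every counter bit is 1 (i.e. the value -1)."""
        return self.count == self.MASK

    def clock(self):
        self.count = (self.count + 1) & self.MASK


def active_pixels(preload):
    """Count the states from the preloaded value up to and including TC."""
    counter = Cascaded163()
    counter.load(preload)
    pixels = 1                        # the preloaded state itself
    while not counter.tc:
        counter.clock()
        pixels += 1
    return pixels
```

Preloading -256 gives the counter value 0xF00, and counting from there to 0xFFF covers 256 states, which matches the 256-pixel sprite size quoted above.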
