AndrewP wrote:
But how do you achieve vertical quadrupling? Do you redraw the same row 4 times? I haven't solved this yet and just assumed I'd have to re-latch the address counter back to what it was at the start of the row three times, and then allow it to continue counting to the next address at the end of the fourth row. Or is it enough to just divide all the vertical signal times by 4 and let the monitor sort it out?
I can tell you how I've done it with my current VGA card:
I take the values of the X and Y counters used for the sync signals to calculate the address to read data from.
You're just converting a 2D index into a 1D one, so the math is simple to do in hardware: ((Y * bytes_per_row) + X) + base_address. Except that not the full width of the Y counter is used: depending on how many times you want to repeat each line, you cut off the bottom n bits (basically shifting Y to the right by n bits, which repeats every row 2^n times) before using it for the math. That way the repeated accesses just come naturally.
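A minimal Python model of that address calculation. It assumes `x` is already the byte column within the row (how the X counter maps to bytes depends on bit depth and horizontal doubling, which this sketch ignores), and the function name and parameters are hypothetical:

```python
def pixel_address(x: int, y: int, n: int, bytes_per_row: int, base_address: int = 0) -> int:
    """Byte address for byte column x on raw scanline y when each row repeats 2**n times."""
    row = y >> n  # drop the bottom n bits of the Y counter
    return base_address + row * bytes_per_row + x


if __name__ == "__main__":
    # With n = 1 (line doubling), scanlines 0/1 and 2/3 share the same address range.
    for y in range(4):
        print(y, pixel_address(0, y, n=1, bytes_per_row=160))
```

In hardware the shift costs nothing: the upper bits of the Y counter are simply wired into the multiply/add instead of the full counter.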
The bits that were cut off are then used to decide when the final duplicated scanline is being drawn, so that the memory access can "escape the loop".
What I mean is that while you're drawing pixel n you're already accessing memory for pixel n+1. So at the end of the current scanline the video circuit has to decide whether to load the first pixel of the next scanline or the first pixel of the current one (which is what you want for drawing the same scanline multiple times in a row).
So that's what the bottom n bits are used for: when all of them are 1, the video circuit is at the last duplicated scanline and is therefore allowed to read the first pixel of the next line at the end of the current one.
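The prefetch decision can be modelled like this, as a hypothetical Python sketch rather than the actual CPLD logic: while byte `x` of scanline `y` is on screen, it returns the address to fetch next, using the dropped low bits of `y` to decide whether to repeat the row, move on to the next row, or wrap to the start of the frame. `rows` (the number of stored rows) is an assumed parameter.

```python
def next_fetch_address(x: int, y: int, n: int, bytes_per_row: int, rows: int,
                       base_address: int = 0) -> int:
    """Address to prefetch while byte x of raw scanline y is being displayed."""
    row = y >> n
    mask = (1 << n) - 1
    last_copy = (y & mask) == mask  # all bottom n bits are 1 -> last repeat of this row
    if x + 1 < bytes_per_row:
        # Still inside the line: fetch the next byte of the same row.
        return base_address + row * bytes_per_row + x + 1
    if not last_copy:
        # End of line, but this row is drawn again: loop back to its first byte.
        return base_address + row * bytes_per_row
    if row + 1 < rows:
        # Last repeat: "escape the loop" and fetch the first byte of the next row.
        return base_address + (row + 1) * bytes_per_row
    # Last repeat of the last row: wrap to the first byte of the next frame.
    return base_address
```

With n = 0 the mask is 0, so every scanline counts as the "last copy" and the logic degrades to plain row-by-row scanning.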
I hope that was somewhat understandable. I could show you a screenshot of the circuit I use in my CPLD, but honestly I think that's even less useful.
Attachment: javaw_BGWgPiqIdb.png (screenshot of the CPLD circuit)
I'm only duplicating each line twice for 320x200, so PY1 refers to the bottom bit of PY (the Pixel Y counter) while PY8 is bits 1-8 of PY. The AND gate in the upper left corner is used for the scanline access wrapping, and the other, larger AND gate is for frame wrapping (so the last access loads the first pixel of the next frame).
AndrewP wrote:
A design I've doodled on paper but only partially tested is to use 74HCT40105 4-bit FIFOs. Put 8 bits of data in two of the '40105s and 16 bits of address in another four '40105s. Then tick data out of all the '40105s at a constant 20MHz (assuming horizontally pixel-doubled 800x600). This allows the video data to be read out when the video clock is low and written in when the clock is high (or vice versa).
FIFOs are pretty useful when it comes to video, as you can have different rates for loading bytes and pushing stuff to the screen. For example, if cycle interleaving with the CPU is too difficult to pull off, you could probably just pause the CPU every now and then, burst-load a bunch of bytes into the FIFO, and then unpause the CPU again.
If you're using an IDT720x or similar FIFO with a "Half Full" output, you could have the burst happen automatically whenever the FIFO drops to half full, to then top it off.
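To get a feel for how that behaves, here is a rough Python simulation. It assumes a hypothetical 512-byte FIFO drained one byte per tick by the video side and refilled in bursts of `write_per_tick` bytes per tick whenever it is down to half full; the depth and rates are illustrative, not IDT720x specifics.

```python
def simulate(ticks: int, depth: int = 512, write_per_tick: int = 2) -> tuple[int, int]:
    """Return (underruns, ticks the CPU spent paused for bursts)."""
    level = depth          # start with a full FIFO
    bursting = False
    paused = 0
    underruns = 0
    for _ in range(ticks):
        # Video side: pull one byte per tick.
        if level > 0:
            level -= 1
        else:
            underruns += 1
        # "Half Full" flag: start a burst once the FIFO is down to half.
        if not bursting and level <= depth // 2:
            bursting = True
        # While bursting, the CPU is paused and the FIFO is topped off.
        if bursting:
            paused += 1
            level = min(depth, level + write_per_tick)
            if level == depth:
                bursting = False
    return underruns, paused


if __name__ == "__main__":
    underruns, paused = simulate(10_000)
    print(f"underruns={underruns}, CPU paused {paused / 10_000:.0%} of the time")
```

In this model the CPU ends up paused for roughly (read rate / burst write rate) of the time, so with a burst rate only twice the pixel-data rate it is stopped about half the time; the burst rate also has to exceed the read rate, otherwise the burst never finishes.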
That also reminds me of a concept I made for a fully software-controlled VGA controller, i.e. a separate 65C02 running at 20-25MHz that manually loads some IDT720x FIFOs with data, which is then used by a VGA circuit to draw pixels on a screen.
More specifically, there would be one FIFO for each duplicated scanline, all wired together on the CPU side so they all get loaded with the same data in parallel. Then on the video output side the first scanline uses the first FIFO, the second scanline uses the second FIFO, etc. That way the CPU doesn't have to load the same data multiple times in a row.
For example (using 640x480 as a base), for 320x240 (or 320x200) you need 2 parallel FIFOs; for 160x120 (or 160x100) you would need 4.
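A toy Python model of the parallel-FIFO idea, with `collections.deque` standing in for the IDT720x parts; the class, method, and function names are hypothetical. Every CPU write goes into all FIFOs at once, and each repeated scanline drains its own copy:

```python
from collections import deque


class ParallelLineFifos:
    """One FIFO per duplicated scanline, all sharing the CPU-side write strobe."""

    def __init__(self, copies: int) -> None:
        self.fifos = [deque() for _ in range(copies)]

    def cpu_write(self, byte: int) -> None:
        # Data bus and write strobe are common, so one write fills every FIFO.
        for fifo in self.fifos:
            fifo.append(byte)

    def video_read(self, scanline: int) -> int:
        # Scanline 0 uses FIFO 0, scanline 1 uses FIFO 1, and so on.
        return self.fifos[scanline % len(self.fifos)].popleft()


def fifos_needed(base_lines: int, target_lines: int) -> int:
    # Line duplication factor, e.g. 480 // 240 = 2; floor division
    # also gives 2 and 4 for the 200- and 100-line modes.
    return base_lines // target_lines


if __name__ == "__main__":
    fifos = ParallelLineFifos(fifos_needed(480, 240))
    for b in (0x11, 0x22, 0x33):  # the CPU loads one row of pixel data once...
        fifos.cpu_write(b)
    for scanline in range(2):     # ...and it is shown on two scanlines
        print(scanline, [hex(fifos.video_read(scanline)) for _ in range(3)])
```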
The main question is: can the CPU keep up with the video bandwidth? For 320x200 @ 4bpp (bits per pixel) a whole line of pixels is 160 bytes, and (assuming the CPU runs at 25MHz, roughly the pixel clock) you'd have 1600 clock cycles (800 cycles per full scanline, times the 2 scanlines each pixel row is shown on) to load those into the FIFO, or 10 cycles per byte. But there is also the vertical blanking period, which helps.
So averaging this out over a whole frame (525 scanlines × 800 cycles), you need to load 32000 bytes within 420000 cycles, which is 13.125 cycles per byte: not much better, and it would require a tight copying loop.
160x100 @ 4bpp is a lot more feasible, but the resolution is also very... economical. Running the CPU at 12.5MHz (which is also easier to achieve) you get 210000 cycles per frame to load 8000 bytes, which comes out to 26.25 cycles per byte.
Of course, if you use a higher bit depth like 8bpp, you're back at 13.125 cycles per byte, as you need to load twice the amount of data per frame.
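The cycle budgets above can be reproduced with a few lines of Python. This assumes standard 640x480 timing (800 pixel clocks per scanline, 525 scanlines per frame) and treats 25MHz as the pixel clock, so `cpu_ratio` is the CPU clock divided by 25MHz (1.0 for 25MHz, 0.5 for 12.5MHz); the function names are made up for this sketch.

```python
H_TOTAL = 800  # pixel clocks per 640x480 scanline, including horizontal blanking
V_TOTAL = 525  # scanlines per 640x480 frame, including vertical blanking


def bytes_per_row(width: int, bpp: int) -> int:
    return width * bpp // 8


def cycles_per_byte_line(width: int, bpp: int, dup: int, cpu_ratio: float = 1.0) -> float:
    """Budget if each row must be loaded while its `dup` scanlines are shown."""
    return H_TOTAL * dup * cpu_ratio / bytes_per_row(width, bpp)


def cycles_per_byte_frame(width: int, height: int, bpp: int, cpu_ratio: float = 1.0) -> float:
    """Budget averaged over a whole frame, so vertical blanking counts too."""
    return H_TOTAL * V_TOTAL * cpu_ratio / (bytes_per_row(width, bpp) * height)


if __name__ == "__main__":
    print(cycles_per_byte_line(320, 4, dup=2))                # 10.0
    print(cycles_per_byte_frame(320, 200, 4))                 # 13.125
    print(cycles_per_byte_frame(160, 100, 4, cpu_ratio=0.5))  # 26.25
    print(cycles_per_byte_frame(160, 100, 8, cpu_ratio=0.5))  # 13.125
```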
Then again, at those resolutions it seems WAY easier to just get some cheap 8kB dual-port RAM ICs and have the CPU and video sides separate, each doing their own thing. (16kB is also an option but it's twice the price. I have a VGA card with 4 of those for 64kB of VRAM for 320x200 @ 8bpp; it works but is fairly slow to draw on due to the lack of hardware acceleration.)
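For sizing the dual-port RAM, a quick Python check of how many chips each mode needs, assuming "8kB" and "16kB" mean 8192 and 16384 bytes; the helper names are hypothetical:

```python
def vram_bytes(width: int, height: int, bpp: int) -> int:
    return width * height * bpp // 8


def chips_needed(width: int, height: int, bpp: int, chip_bytes: int) -> int:
    # Ceiling division without floats.
    return -(-vram_bytes(width, height, bpp) // chip_bytes)


if __name__ == "__main__":
    for mode in [(160, 100, 4), (160, 100, 8), (320, 200, 4), (320, 200, 8)]:
        print(mode, vram_bytes(*mode), "bytes:",
              chips_needed(*mode, 8192), "x 8kB or", chips_needed(*mode, 16384), "x 16kB")
```

320x200 @ 8bpp comes to 64000 bytes, which is why it takes four 16kB chips (or eight 8kB ones), while 160x100 @ 4bpp fits in a single 8kB part.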