plasmo wrote:
I'm packing 3 lines in 256 bytes. Each line is 80 bytes, so there are 16 unused bytes per page that might be useful for horizontal scrolling. If the picture is in the center with a blank field all around, I can do a "fake" horizontal scroll of up to 16 bytes, but I don't know how to do it in more general cases.
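To make the layout concrete, here is a small Python model of the addressing. It assumes the frame buffer starts on a page boundary, and the base address 0x2000 and the function names are hypothetical. It shows where each 80-byte line starts and how the "fake" scroll shifts every line start by up to the 16 spare bytes, which only looks right when the bytes pulled in from the neighbouring line are blank border.

```python
PAGE_BYTES = 256
LINE_BYTES = 80
LINES_PER_PAGE = 3
# 256 - 3 * 80 = 16 spare bytes at the end of every page
UNUSED_PER_PAGE = PAGE_BYTES - LINES_PER_PAGE * LINE_BYTES


def line_address(base: int, line: int) -> int:
    """Start address of scanline `line` when 3 lines share each 256-byte page.

    `base` is assumed to be page-aligned (hypothetical layout).
    """
    page, slot = divmod(line, LINES_PER_PAGE)
    return base + page * PAGE_BYTES + slot * LINE_BYTES


def fake_scrolled_address(base: int, line: int, scroll: int) -> int:
    """Shift every line start right by `scroll` bytes (0..16).

    The third line of a page then reads into the 16 spare bytes, while the
    first two read into the start of the following line, so this only looks
    correct if the picture is surrounded by a blank border.
    """
    if not 0 <= scroll <= UNUSED_PER_PAGE:
        raise ValueError("fake scroll is limited to the spare bytes per page")
    return line_address(base, line) + scroll
```

For example, with a base of 0x2000 the lines start at 0x2000, 0x2050, 0x20A0, then 0x2100 on the next page; scrolling by the full 16 bytes makes the third line end exactly at 0x20FF, the last byte of its page.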
hmm, sadly you can't make use of the natural wrapping of the Y register, because you have multiple lines in a single page, and you have no spare cycles between reading bytes to do any checks that would wrap after n bytes in software... hmm.
the only thing i could think of would be to replace the whole line-reading routine, doing it with absolute addressing (either X/Y-indexed or not) instead of indirect, and using self-modifying code to point the addresses at the correct bytes.
then you would need to use the small amount of time you have in the H-blanking period to update the addresses to point to the next line, taking into account how far the screen is scrolled horizontally. this is obviously a lot more work than updating 2 bytes in ZP, but it gives you complete horizontal (and vertical) scrolling.
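One way the H-blank update could look, sketched as a Python model rather than 65C02 code. It assumes each line is stored as 80 contiguous bytes and that the visible line should wrap around (as it would with the Y register on a 256-byte line). A horizontal scroll then splits the fetch into at most two contiguous runs, so the H-blank code only has to patch two 16-bit absolute operands. The run-splitting approach and all names here are illustrative assumptions, not a tested routine.

```python
LINE_BYTES = 80


def hscroll_runs(line_start: int, scroll: int, width: int = LINE_BYTES):
    """Split one wrapped scanline into at most two contiguous (addr, count) runs.

    With horizontal scroll `scroll`, the visible line starts `scroll` bytes into
    the stored line and wraps back to its beginning after `width - scroll` bytes.
    """
    scroll %= width
    runs = [(line_start + scroll, width - scroll)]
    if scroll:
        runs.append((line_start, scroll))
    return runs


def patch_abs_operand(code: bytearray, operand_offset: int, addr: int) -> None:
    """Store a 16-bit address little-endian, as in the operand of `LDA abs,X`."""
    code[operand_offset] = addr & 0xFF
    code[operand_offset + 1] = (addr >> 8) & 0xFF


def fetch_line(mem: bytearray, line_start: int, scroll: int) -> bytes:
    """Reference model: the bytes the patched read loops should send, in order."""
    out = bytearray()
    for addr, count in hscroll_runs(line_start, scroll):
        out += mem[addr:addr + count]
    return bytes(out)
```

Patching two operands costs four byte writes per line, which is the kind of budget the H-blank window would have to cover; the scroll value only changes the split point, not the amount of patching.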
plasmo wrote:
I've always learned a lot from your designs, so please feel free to describe your approach here.
oh boy, this one is gonna be a bit of a doozy.
the main difference from your design is that the CPLD takes care of both H-Sync and V-Sync, and there are 2 FIFOs between the CPLD and the CPU bus for loading video data.
so let's first look at the block diagram i made that roughly describes the system:
[Attachment: system block diagram (draw.io_PtSHM7Wn3g.png)]
Because this is an expansion card for my 65816 SBC (i.e. the "primary" system), i needed some kind of interface between it and the 65C02 (i.e. the "secondary" system), so i'm using 2 FIFOs pointing in opposite directions, forming a single bidirectional FIFO. the ATF1502 is only there to handle the decoding logic and FIFO flags on the primary side.
The secondary system has 62kB of RAM (the W24512AK/W241024AK are the fastest DIP SRAM ICs i could find) and 2kB of ROM used for booting and testing. the interface FIFOs are mapped into memory, along with a status and control register located inside the CPLD.
let's get to the main part, the 2 FIFOs and the video generation!
the reason there are 2 FIFOs in parallel is the resolution i'm aiming for, specifically 320x240 or 400x300 (currently the logic is designed for 400x300).
as you can probably tell, both of these are derived from larger resolutions, 640x480 and 800x600 respectively. they're simply divided by 2 both horizontally (drawing each pixel twice in a row) and vertically (drawing each scanline twice in a row). one important thing to note is that while this quarters the amount of memory a single frame takes up, it only halves the bandwidth needed to draw it. the reason it only halves the bandwidth instead of quartering it is that the vertical division doesn't actually reduce the number of scanlines; it only makes 2 consecutive scanlines use the same data. so when drawing 400x300 you actually have to draw all 600 scanlines.
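The storage vs. bandwidth argument can be checked with a few lines of Python (1bpp assumed, function names hypothetical): halving both axes quarters the frame buffer, but because each stored line is still fetched again for its second scanline, the number of bytes read per frame only halves.

```python
def bytes_per_line(width_px: int, bpp: int = 1) -> int:
    """Bytes needed for one line of pixels at the given bit depth."""
    return width_px * bpp // 8


def storage_bytes(width_px: int, height_px: int, bpp: int = 1) -> int:
    """Frame-buffer size for one frame."""
    return bytes_per_line(width_px, bpp) * height_px


def fetched_bytes(width_px: int, height_px: int, line_repeat: int, bpp: int = 1) -> int:
    """Bytes read per frame if every stored line is fetched once per output scanline."""
    return storage_bytes(width_px, height_px, bpp) * line_repeat


native = storage_bytes(800, 600)       # 60000 bytes for a native 800x600 frame
doubled = storage_bytes(400, 300)      # 15000 bytes: a quarter of the memory
fetched = fetched_bytes(400, 300, 2)   # 30000 bytes: still half the native bandwidth
```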
this is where the second FIFO comes in: whenever the CPU reads from the video area in memory, the byte is also loaded into both FIFOs. then, when the CPLD reads out data to draw on screen, it uses one FIFO for the even scanlines and the other for the odd scanlines. this halves the memory bandwidth again, so the CPU only needs to read each byte exactly once per frame.
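A minimal Python model of the dual-FIFO scheme, using the 1kB depth and the 50-byte lines of the 400x300 @ 1bpp mode; the class and method names are hypothetical. One CPU read feeds both FIFOs, and the scanline parity picks which FIFO the CPLD drains, so each stored line is read by the CPU once but displayed twice.

```python
from collections import deque


class DualFifo:
    """Model of the two parallel video FIFOs."""

    def __init__(self, depth: int = 1024):
        self.depth = depth
        self.even = deque()   # drained on even scanlines
        self.odd = deque()    # drained on odd scanlines
        self.cpu_reads = 0

    def cpu_read(self, value: int) -> None:
        """A CPU read from the video area: the byte is copied into both FIFOs."""
        if len(self.even) >= self.depth or len(self.odd) >= self.depth:
            raise OverflowError("FIFO full")
        self.even.append(value)
        self.odd.append(value)
        self.cpu_reads += 1

    def scanline(self, n: int, line_bytes: int = 50) -> bytes:
        """Bytes the CPLD shifts out for output scanline `n`."""
        fifo = self.even if n % 2 == 0 else self.odd
        if len(fifo) < line_bytes:
            raise RuntimeError("FIFO underrun")
        return bytes(fifo.popleft() for _ in range(line_bytes))
```

Loading two stored lines (100 CPU reads) is enough for four output scanlines: 0 and 1 both show the first line, 2 and 3 both show the second.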
now let's look at how i plan to keep the FIFOs filled and what happens when the CPU fails to keep them filled.
the FIFOs are all 1kB in size, and i've routed their Half-Full Flags (which go LOW when the FIFO is more than half full) through the CPLD, which inverts them and drives the CPU's IRQ line. so when both FIFOs are no more than half full, the CPU receives an IRQ until they are more than half full again.
since i'm going for 400x300 @ 1bpp, each line of pixels is 50 bytes, so when an IRQ occurs the CPU knows that it can safely fit 10 whole rows of data into the FIFOs (10 * 50 = 500, 500 < 512), after which it returns and continues whatever work it was doing before.
this should work fine and allow the CPU to do regular work while only occasionally being interrupted to top off the FIFO.
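The IRQ condition and the 500-byte top-off can be sanity-checked with a small Python model (names hypothetical; the active-low flag is modelled as a boolean pin level). Because both FIFOs receive every byte but drain on alternating scanlines, their levels can differ by up to one line, and requiring *both* to be at most half full is what guarantees at least 512 free bytes in the fuller one.

```python
DEPTH = 1024          # bytes per FIFO
HALF = DEPTH // 2     # /HF goes LOW once a FIFO holds more than this
LINE_BYTES = 50       # 400 pixels at 1bpp


def hf_n(level: int) -> bool:
    """Active-low Half-Full pin: True (HIGH) while the FIFO holds at most DEPTH/2 bytes."""
    return level <= HALF


def irq_request(level_even: int, level_odd: int) -> bool:
    """IRQ condition: both inverted flags active, i.e. both FIFOs at most half full."""
    return hf_n(level_even) and hf_n(level_odd)


def rows_that_fit(level_even: int, level_odd: int) -> int:
    """Whole lines that can be pushed without overflowing the fuller FIFO."""
    free = DEPTH - max(level_even, level_odd)
    return free // LINE_BYTES
```

If the IRQ fired when *either* FIFO dropped to half, the fuller one could still hold 562 bytes, leaving room for only 9 lines (462 bytes), so the 500-byte top-off would overflow it.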
but i still needed a plan in case the CPU gets stuck in an infinite loop with interrupts disabled and at least one of the FIFOs actually runs dry. if that happens, the video output disables itself and sends an NMI to the CPU. the CPU can then manually reset the FIFOs (emptying them), reset the IRQ variables back to the beginning of the frame data, and preload 10 lines of data into the FIFOs before re-enabling the video output.
enabling the video output doesn't make it continue drawing right away; that would be stupid, as the CPU has no idea where the video output currently is (it could be in the middle of a frame). instead, at the end of each frame the video output checks the enable bit in the control register and only then re-enables itself, so that the data in the FIFOs lines up with the start of the next frame.
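The underrun recovery can be sketched as a small Python state model. All names are hypothetical, and one detail is an assumption: the underrun is taken to also clear the enable bit, since otherwise the output could restart at the next frame end before the NMI handler has refilled the FIFOs.

```python
PRELOAD_LINES = 10
LINE_BYTES = 50


class VideoOutput:
    """Model of the video enable logic: output only (re)starts at a frame boundary."""

    def __init__(self):
        self.enable_bit = False  # control-register bit written by the CPU
        self.running = False     # whether the CPLD is currently drawing
        self.nmi = False

    def underrun(self) -> None:
        # A FIFO ran dry: stop drawing and raise NMI. Assumed: the enable bit is
        # cleared too, so drawing cannot resume before the FIFOs are refilled.
        self.running = False
        self.enable_bit = False
        self.nmi = True

    def end_of_frame(self) -> None:
        # The enable bit is only sampled here, so FIFO data lines up with line 0.
        if self.enable_bit:
            self.running = True


def nmi_handler(video: VideoOutput, fifos: list, frame: bytes) -> int:
    """Hypothetical recovery routine; returns the new read offset into `frame`."""
    for fifo in fifos:
        fifo.clear()                           # reset (empty) the FIFOs
    preload = frame[:PRELOAD_LINES * LINE_BYTES]
    for value in preload:                      # rewind to the start of the frame
        for fifo in fifos:
            fifo.append(value)                 # one read lands in every FIFO
    video.nmi = False
    video.enable_bit = True                    # takes effect at the next frame end
    return len(preload)
```

Setting the enable bit inside the handler does not restart drawing by itself; only the next `end_of_frame` does, which is what keeps the preloaded lines aligned with the top of the screen.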
on another note, the FIFOs are always active, even during H- and V-Blank. this is required since the FIFOs could hit the 50% threshold at any time, so the CPU needs to be able to load data into them at any time as well.
but this creates an issue: the CPU wouldn't be able to access the data to modify it without a copy of the data also landing in the FIFOs. to get around that, i designated 2 separate video areas. one is always the active area, and accessing it causes writes to the FIFOs, while the other one just behaves like regular memory.
so the CPU would access the active area to draw it to the screen and use the inactive area to construct the next frame. when it's done, it writes to a bit in the control register to flip them (the change only applies at the end of the current frame), and then repeats the process.
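A minimal Python model of the two video areas and the frame-synchronised flip. The base addresses 0x4000 and 0x8000 are hypothetical placeholders, and the flip bit is modelled as a "swap at next frame end" request; the real decode lives in the CPLD.

```python
AREA_SIZE = 15_000               # 400x300 @ 1bpp
AREA_BASES = (0x4000, 0x8000)    # hypothetical locations of the two video areas


class VideoAreas:
    """Model of the active/inactive video areas with a frame-synchronised flip."""

    def __init__(self):
        self.active = 0          # index of the area whose reads feed the FIFOs
        self.flip_pending = False

    def request_flip(self) -> None:
        # CPU writes the flip bit in the control register.
        self.flip_pending = True

    def end_of_frame(self) -> None:
        # The swap only happens at a frame boundary, never mid-frame.
        if self.flip_pending:
            self.active ^= 1
            self.flip_pending = False

    def copies_to_fifo(self, addr: int) -> bool:
        """True if a CPU access to `addr` is also pushed into the FIFOs."""
        base = AREA_BASES[self.active]
        return base <= addr < base + AREA_SIZE

    def back_buffer(self) -> int:
        """Base address of the area the CPU may freely draw into."""
        return AREA_BASES[self.active ^ 1]
```

Until the frame ends, accesses to the old active area keep feeding the FIFOs even after the flip has been requested, so the current frame is always finished from a single buffer.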
one downside is that you don't get access to the current frame when drawing the next one.
an alternative idea i had would be to replace that control bit with an enable bit for the FIFO writes, only enabling them at the start of the IRQ ISR and disabling them again before RTI-ing. this would free up ~16kB of RAM and allow the drawing code to access the frame directly, but could introduce artifacts while drawing stuff to the screen.
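The alternative can be modelled the same way: a single video area whose FIFO copy is gated by a control bit that only the ISR opens. The class name, the bit name and the wrap-around of the read offset at the end of the frame are illustrative assumptions.

```python
LINE_BYTES = 50


class GatedVideoArea:
    """Model of one video area whose FIFO copy is gated by a control bit."""

    def __init__(self, frame: bytearray):
        self.frame = frame
        self.fifo_write_enable = False   # hypothetical control-register bit
        self.fifo = []                   # bytes that reached the FIFOs

    def cpu_read(self, offset: int) -> int:
        value = self.frame[offset]
        if self.fifo_write_enable:
            self.fifo.append(value)      # copied only while the ISR has the gate open
        return value


def irq_isr(area: GatedVideoArea, offset: int, rows: int = 10) -> int:
    """Top off the FIFOs; returns the next offset into the frame (wraps per frame)."""
    area.fifo_write_enable = True        # enable at ISR entry
    for _ in range(rows * LINE_BYTES):
        area.cpu_read(offset)
        offset = (offset + 1) % len(area.frame)
    area.fifo_write_enable = False       # disable again before RTI
    return offset
```

Outside the ISR the drawing code reads and writes the same bytes freely, which is also where the artifacts come from: a shape that is only half drawn when the ISR streams its lines out will appear half drawn for that frame.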
and on a final note, since the video circuit is only connected to the CPU bus via async FIFOs, the CPU and video side don't have to share the same clock. i could run the video side at 40MHz (20MHz pixel clock) and the CPU at 25MHz to get even more performance, or, in case it can't handle that speed, at 16MHz and just spend a bit more time per frame filling up the FIFOs. i have also included a small resistor DAC connected to RGB in parallel to allow for 4 grayscale levels in case i want to go for a higher color depth.
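A rough budget for the two CPU clock options, as a Python back-of-the-envelope. The 40MHz figure matches the standard VESA 800x600 @ 60Hz mode (1056x628 total timing); the cost of 4 CPU cycles per byte is an assumption (e.g. an `LDA abs,X` without page crossing, ignoring loop and IRQ entry/exit overhead), not a measurement.

```python
PIXEL_CLOCK_HZ = 40_000_000        # 800x600 @ 60 Hz VESA timing
H_TOTAL, V_TOTAL = 1056, 628       # total pixels per line / lines per frame incl. blanking
BYTES_PER_FRAME = 300 * 50         # 400x300 @ 1bpp, each byte read once per frame
CYCLES_PER_BYTE = 4                # ASSUMPTION: cost of one CPU read that feeds the FIFOs

frame_rate_hz = PIXEL_CLOCK_HZ / (H_TOTAL * V_TOTAL)   # about 60.3 Hz
bytes_per_second = BYTES_PER_FRAME * frame_rate_hz     # about 905 kB/s


def cpu_load(cpu_hz: float, cycles_per_byte: float = CYCLES_PER_BYTE) -> float:
    """Fraction of CPU cycles spent feeding the FIFOs."""
    return bytes_per_second * cycles_per_byte / cpu_hz
```

Under that assumption, feeding the FIFOs would take roughly 23% of a 16MHz CPU and roughly 14% of a 25MHz one; real overhead per byte will be somewhat higher.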
but yeah, that's roughly the idea i had for this. it's obviously a lot more complicated and relies more on hardware to free up CPU time (i could even fit a small DMAC into the CPLD, as it's only ~50% full, and free up even more CPU time!), but since i wanted to use this as a co-processor to do things like drawing shapes and text, i thought i should free up as much CPU time as possible. maybe i can even get Bad Apple running?
anyways, if there are any more questions just ask! maybe i can even answer them