i like the idea of being able to load functions and such into the GPU's memory to give it different abilities. but i'm not sure about the shared SRAM.
mainly because it needs extra circuitry to keep both busses separate from each other, and it forces both CPU and GPU to use the same clock, something i really, really want to avoid.
using Dual Port RAM is also an option: it's fairly cheap and way easier to hook up on both sides, plus it's fully asynchronous, so the clock speed of either side doesn't matter.
you could then preload the DP-RAM with a program and send a reset to the GPU via some control register or similar. one downside is speed: the IDT700x DP-RAM chips that i have have a 55ns access time, so any 65xx CPU running at 8MHz or above will need at least 1 wait state to use them, making them as slow as SST39SF0x0 Flash chips.
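as a rough sketch of that wait state math (the margin for address/data setup is a placeholder, check the actual 65xx and IDT datasheet timing):

```python
# rough wait-state estimate for an asynchronous RAM on a 65xx bus.
# assumption: the memory has to respond within the PHI2-high half of the
# cycle, minus some margin for address and data setup; the 25ns margin
# here is purely illustrative.

def wait_states(clock_mhz, access_ns, margin_ns=25):
    cycle_ns = 1000 / clock_mhz
    available_ns = cycle_ns / 2 - margin_ns   # usable window in the first cycle
    waits = 0
    while available_ns + waits * cycle_ns < access_ns:
        waits += 1                            # each wait state adds a full cycle
    return waits

wait_states(8, 55)   # 8 MHz: 62.5ns half-cycle minus margin is too short -> 1 wait
wait_states(4, 55)   # 4 MHz: 125ns half-cycle easily covers 55ns -> 0 waits
```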
another option (the one my first failed VGA card used) is to have a ROM onboard (as it can be physically smaller than shared- or DP-RAM) and just make it contain a small bootloader (~256 Bytes) that loads a program over the bi-directional FIFO into its own faster RAM, and then jumps to it.
it's effectively the same as sharing memory directly, but without the complexity of actually sharing it. though loading the program is slightly slower.
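a toy model of that bootloader flow (the protocol here, a little-endian length prefix and a fixed load address, is made up for illustration):

```python
# toy model of the ~256 byte ROM bootloader idea: the host pushes a
# 2-byte length followed by the program bytes into the FIFO, and the
# card-side bootloader copies them into its fast local RAM, then "jumps"
# to the start. length-prefix protocol and load address are assumptions.

from collections import deque

def bootload(fifo, ram, load_addr=0x0200):
    length = fifo.popleft() | (fifo.popleft() << 8)   # little-endian length
    for i in range(length):
        ram[load_addr + i] = fifo.popleft()           # copy program into RAM
    return load_addr                                  # address to jump to

ram = bytearray(0x10000)
program = bytes([0xA9, 0x42, 0x60])                   # e.g. LDA #$42 : RTS
fifo = deque([len(program) & 0xFF, len(program) >> 8, *program])
entry = bootload(fifo, ram)
```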
on another note, how exactly would you generate the addresses for the frame buffer to shift out the data?
if you do the row doubling in hardware using a pair of FIFOs like i described above, then you can simply use a counter that increments and writes to the FIFOs on even scanlines, and does nothing on odd ones. once it reaches the last count value, it resets back to 0 for the next frame.
without FIFOs you have to set the counter back at the end of an even scanline, and let it continue normally on odd ones. which seems more complicated IMO unless you're using programmable logic.
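the two counter schemes can be sketched as address generators (320x240 doubled to 480 output scanlines is just an example resolution):

```python
# sketch of the two address-generation schemes for a line-doubled mode.
# WIDTH/ROWS are placeholder values for a 320x240 framebuffer shown on
# 480 output scanlines.

WIDTH, ROWS = 320, 240

def with_fifos():
    # free-running counter: fetch into the FIFOs on even scanlines only;
    # the FIFO pair replays the same row for the following odd scanline.
    addr = 0
    for scanline in range(ROWS * 2):
        if scanline % 2 == 0:
            for _ in range(WIDTH):
                yield addr      # read pixel, push into FIFO
                addr += 1
        # odd scanlines: the video circuit drains the second FIFO, no fetches
    # counter wraps back to 0 for the next frame

def without_fifos():
    # reloading counter: rewind at the end of each even scanline so the
    # same row gets fetched twice.
    addr = 0
    for scanline in range(ROWS * 2):
        start = addr
        for _ in range(WIDTH):
            yield addr
            addr += 1
        if scanline % 2 == 0:
            addr = start        # even scanline done: back up and refetch
```

the FIFO version only touches the framebuffer half as often, which is exactly the "more continuous time for the GPU/DMAC" benefit mentioned further down.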
gfoot wrote:
I think a versatile DMA controller can be useful for a lot of things. I'm not sure I'd personally use it for this sort of video operation though, as it's always going to be slower and more intrusive than performing the blitting operations within video memory itself.
I'm not sure i completely follow, the DMAC i proposed would work directly on video memory, or any other part of memory as well. and its speed is almost the same as a regular DMAC that reads a byte, stores it in a register, and then writes it back, just with all of the reads done first and then the writes afterwards.
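the read-then-write ordering looks like this as a sketch (the 16-byte burst size is an arbitrary assumption, it would depend on how much buffering the DMAC has):

```python
# sketch of the proposed DMAC ordering: buffer a run of bytes first,
# then write them all back, instead of alternating read/write per byte.
# total bus cycles are the same as a classic per-byte DMAC; only the
# order differs. burst size is an assumption.

def dma_copy(mem, src, dst, count, burst=16):
    done = 0
    while done < count:
        n = min(burst, count - done)
        buf = [mem[src + done + i] for i in range(n)]   # all reads first
        for i in range(n):
            mem[dst + done + i] = buf[i]                # then all writes
        done += n

mem = bytearray(range(256)) + bytearray(256)
dma_copy(mem, 0, 256, 64)    # copy first 64 bytes to offset 256
```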
gfoot wrote:
Especially with a 6502, where host memory is very limited, I think it makes more sense to just have a lot of video memory - more than is needed for the framebuffer - and use that to store sprites and other graphics, with dedicated DMA-like circuitry inside the video circuit to support efficient copying of this data. It can do this in between "output" accesses to video memory, without slowing anything else down. [...]
yea having more memory than necessary would be one reason to go with a CPU that can directly address more than a 65C02. plus if you do go the route of using DP-RAM for the framebuffer memory then the video circuit is completely off the GPU's bus and the DMAC and GPU can do whatever they want without interfering with the video circuit's reading of pixel data.
or a couple of buffers to separate the GPU/DMAC bus from the video circuit's. so the GPU/DMAC can only access the framebuffer memory when the video circuit isn't reading from it itself (which can be helped by using some FIFOs, so the video circuit loads an entire row of pixels at once, giving the GPU/DMAC more continuous time to copy bytes over).
this of course couples the clocks again, so the video circuit and GPU/DMAC have to run at some shared clock speed.
or do you mean something else and i'm misreading this?
gfoot wrote:
Something else you might be interested in considering is the ANTIC processor from early Atari systems - if you're considering having the CPU generate a stream of pixel data for the video circuit to output, rather than the video circuit scanning a framebuffer itself - I think this is sort of what ANTIC did, and you could design a more advanced version these days that essentially dynamically works out where to read strips of pixel data from, to implement hardware sprites and things like that.
i did look at the ANTIC before making the thread, and it's one of the reasons i made it in the first place. but hardware sprites are a bit too complex.
just having a DMAC with rectangular copying is fast enough to implement some usable sprites.
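a rectangular copy is basically a per-row copy with separate source and destination strides, something like this sketch (sheet/framebuffer widths are made-up numbers):

```python
# sketch of a rectangular copy (blit) as the DMAC could do it: one
# row-copy per scanline of the rectangle, with independent source and
# destination strides. enough for usable software sprites without
# dedicated sprite hardware. all sizes here are illustrative.

def rect_copy(mem, src, src_stride, dst, dst_stride, w, h):
    for row in range(h):
        s = src + row * src_stride
        d = dst + row * dst_stride
        mem[d:d + w] = mem[s:s + w]    # copy one row of the rectangle

# copy an 8x8 sprite from a 16-byte-wide sheet into a 64-byte-wide framebuffer
SHEET, FB = 0, 1024
mem = bytearray(4096)
for i in range(128):
    mem[SHEET + i] = i                 # fake sprite sheet contents
rect_copy(mem, SHEET, 16, FB, 64, 8, 8)
```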
gfoot wrote:
Perhaps rather than storing a framebuffer then, you store a list of scanline definitions, each consisting of a list of ranges of video memory to output pixels from; the CPU can update that (or you DMA it across) and your graphics processor executes it, streaming pixels to the output circuit.
now you're confusing me a bit, what you described is pretty much exactly the whole point of having a VGA card with its own processor onboard: doing those kinds of operations to take work off the main system.
So the main system doesn't have to deal with a framebuffer or generate any graphics on its own. it just sends a list of commands and data to the VGA card via the FIFOs (or some DP-RAM), and the CPU on the VGA card (aka the GPU) then constructs the frame in its onboard frame/line buffer and has the video circuit read it out (or the GPU/DMAC pushes the pixels directly to the output).
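as a toy of that command-list idea (the opcode encoding, resolution, and single FILL_RECT command are all invented for illustration):

```python
# toy of the command-list scheme: the host pushes opcodes plus
# parameters through the FIFO, and the card's CPU rasterizes them into
# its local framebuffer. the encoding here is an assumption.

from collections import deque

W, H = 64, 48
fb = bytearray(W * H)           # card-local framebuffer

CMD_FILL_RECT = 0x01            # followed by: x, y, w, h, colour

def run_commands(fifo):
    while fifo:
        op = fifo.popleft()
        if op == CMD_FILL_RECT:
            x, y, w, h, c = (fifo.popleft() for _ in range(5))
            for row in range(y, y + h):
                for col in range(x, x + w):
                    fb[row * W + col] = c

# host side: queue "fill a 4x3 rect at (10,8) with colour 0xFF"
run_commands(deque([CMD_FILL_RECT, 10, 8, 4, 3, 0xFF]))
```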
of course it would never be as good as the ANTIC itself, unless you use an FPGA to implement some 100MHz+ softcore GPU to do all those operations insanely fast (or do it in hardware directly).
or are you proposing another layer of abstraction and have yet another CPU within the video circuit to help the GPU construct a frame of video?
hmmm, now that i think about it... and this is probably insane... but what about having 5 CPUs on a VGA card?
something like this:
Attachment:
yyXJA5YNPx.png [ 105.14 KiB ]
so you have 1 Master CPU on the card that controls 4 "core" CPUs. each core CPU takes care of 1/4th of the framebuffer. this would allow you to do 4 operations at once as long as they're on separate quarters of the screen.
something more ideal would have all core CPUs be able to access the entire framebuffer instead of only 1 quarter of it. but that would require a more complicated bus architecture.
with a few buffers to only allow 1 core to access the framebuffer at a time, and with said memory being fast enough to switch between all cores (plus the video circuit) within a cycle, you could effectively create n+1 port memory (n read/write ports, 1 for each core, and 1 pure read port for the output).
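the time-multiplexing could be modeled like this (one access slot per core plus a dedicated read slot for the output, with the slot order as an assumption):

```python
# toy model of the time-multiplexed "n+1 port" framebuffer: within one
# output pixel period, the fast RAM services each core's pending access
# in turn, then does one guaranteed read for the video output. fixed
# slot order is an assumption of this sketch.

def memory_cycle(ram, core_requests, video_addr):
    # core_requests: per-core ("r"/"w", addr, data) tuple, or None if idle
    results = []
    for req in core_requests:            # n read/write slots, one per core
        if req is None:
            results.append(None)
        else:
            op, addr, data = req
            if op == "w":
                ram[addr] = data
                results.append(None)
            else:
                results.append(ram[addr])
    pixel = ram[video_addr]              # the +1 dedicated read port
    return results, pixel

ram = bytearray(1024)
# core 0 writes 7 to address 0, core 2 reads it back in the same period
res, px = memory_cycle(ram, [("w", 0, 7), None, ("r", 0, None), None], 0)
```

note that with a fixed slot order, a later slot already sees what an earlier slot wrote in the same period, which is part of what the bus multiplexing logic would have to define.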
again the whole idea is completely insane and would require a massive PCB, or something modular/stacked where you can plug "core cards" into a main video card PCB and use an FPGA or similar to handle the bus multiplexing to the framebuffer depending on how many cores are installed.
but it is a neat concept...