gfoot wrote:
Didn't the Apple 2 have hardware scrolling too though? I thought most systems did back then precisely because writing all that video memory was so slow.
Nope, there is no hardware assist of any kind for text or graphics on any Apple II system, not even the IIGS. The closest thing would be the IIGS's fill mode in 320x200, where black pixels take on the color of the last non-black pixel.
Quote:
I'd be interested to hear what exactly the interface was that you found too slow. Personally I'm probably soon going to move in the opposite direction - away from directly mapped memory, to a register based interface instead with a fast mode to write to sequential auto-incrementing addresses.
I lifted the design for my VDC from the TMS9918. There were eight registers total, but most were config registers. The first three were the important ones here: two bytes of VRAM address and a VRAM data register. Reading or writing the data register just accessed the VRAM location pointed to by the two address bytes, and there was also an auto-increment feature so that you could read or write consecutive bytes without changing the address registers every time.
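For anyone following along, here is a minimal sketch of that kind of port in Python. It only models the three registers described above; the register numbering (0, 1, 2), the 64 KB VRAM size and the class and method names are assumptions for illustration, not the actual design.

```python
class RegisterVDC:
    """Toy model of a register-based video port (hypothetical register map)."""

    # Assumed numbering for illustration; the real register order may differ.
    ADDR_LO, ADDR_HI, DATA = 0, 1, 2

    def __init__(self, vram_size=0x10000, auto_increment=True):
        self.vram = bytearray(vram_size)  # assumed 64 KB of VRAM
        self.addr = 0                     # 16-bit VRAM pointer
        self.auto_increment = auto_increment

    def _step(self):
        # Advance the VRAM pointer after a data access, wrapping at 16 bits.
        if self.auto_increment:
            self.addr = (self.addr + 1) & 0xFFFF

    def write_reg(self, reg, value):
        value &= 0xFF
        if reg == self.ADDR_LO:
            self.addr = (self.addr & 0xFF00) | value
        elif reg == self.ADDR_HI:
            self.addr = (self.addr & 0x00FF) | (value << 8)
        elif reg == self.DATA:
            self.vram[self.addr % len(self.vram)] = value
            self._step()
        else:
            raise ValueError("config registers are not modeled")

    def read_reg(self, reg):
        if reg == self.ADDR_LO:
            return self.addr & 0xFF
        if reg == self.ADDR_HI:
            return self.addr >> 8
        if reg == self.DATA:
            value = self.vram[self.addr % len(self.vram)]
            self._step()
            return value
        raise ValueError("config registers are not modeled")


# Set the address once, then stream bytes through the data register.
vdc = RegisterVDC()
vdc.write_reg(RegisterVDC.ADDR_LO, 0x34)
vdc.write_reg(RegisterVDC.ADDR_HI, 0x12)
for byte in b"HI":
    vdc.write_reg(RegisterVDC.DATA, byte)
```

The point of the auto-increment is visible in the usage: the address registers are written once and the data register is then hit repeatedly.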
My scroll code (pre-hardware scrolling) would just do the obvious thing: read line 1, copy it to line 0, and so on down to line 24, then clear line 24. It wasn't HORRIBLY slow, but it was definitely slower than my Apple IIe. It was moving twice as much data since every char was two bytes, but it was also running at 6 MHz, so you would think it would easily beat a 1 MHz Apple II.
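As a rough illustration of how much traffic that software scroll generates, here is a sketch that does the same line-by-line copy on a plain byte buffer and counts data-register accesses. The 80-column width, the character-then-attribute byte order and the function name are assumptions; only the 25 lines and two bytes per character come from the description above.

```python
def scroll_up(vram, cols=80, rows=25, blank=b" \x00"):
    """Copy each line up by one, clear the last line, return data accesses.

    cols=80 and the blank cell layout (character byte, then a second
    byte) are assumptions for illustration.
    """
    line_bytes = cols * len(blank)
    accesses = 0
    for row in range(1, rows):
        src = row * line_bytes
        dst = src - line_bytes
        temp = bytes(vram[src:src + line_bytes])  # read line into a temp buffer
        vram[dst:dst + line_bytes] = temp         # write it back one line up
        accesses += 2 * line_bytes                # one read + one write per byte
    last = (rows - 1) * line_bytes
    vram[last:last + line_bytes] = blank * cols   # clear the bottom line
    accesses += line_bytes
    return accesses


# Fill each line with its own row number so the move is easy to check.
screen = bytearray()
for row in range(25):
    screen += bytes([row]) * 160
total = scroll_up(screen)
print(total)
```

Under those assumptions every byte of 24 lines passes through the data register twice, plus one more line of writes for the clear.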
I'm just going to leave the hardware scrolling feature in regardless since it's gonna be faster no matter what, and in fact I might implement pixel-level scrolling even in text mode so that I can smooth scroll. But that is a stretch goal; I'd rather implement hardware sprites first.
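For comparison, here is a sketch of one common way hardware scrolling can work: a start-row register that tells the display where the top line lives, so a scroll only has to clear one line and bump the register. This is an assumed mechanism for illustration, not a description of the actual hardware, and the class name, the `top` register and the 80-column width are all hypothetical.

```python
class ScrolledTextBuffer:
    """Text buffer with a hypothetical start-row register (self.top)."""

    def __init__(self, cols=80, rows=25, blank=b" \x00"):
        self.cols = cols
        self.rows = rows
        self.blank = blank                      # assumed two-byte blank cell
        self.line_bytes = cols * len(blank)
        self.vram = bytearray(blank * (cols * rows))
        self.top = 0                            # physical row shown at screen row 0

    def _offset(self, screen_row):
        # Map a screen row to its physical VRAM offset, wrapping around.
        return ((self.top + screen_row) % self.rows) * self.line_bytes

    def write_line(self, screen_row, data):
        if len(data) != self.line_bytes:
            raise ValueError("data must be exactly one line long")
        off = self._offset(screen_row)
        self.vram[off:off + self.line_bytes] = data

    def visible_line(self, screen_row):
        off = self._offset(screen_row)
        return bytes(self.vram[off:off + self.line_bytes])

    def scroll_up(self):
        # The old top line becomes the new bottom line: clear it, then
        # advance the start-row register. Returns the bytes written.
        off = self._offset(0)
        self.vram[off:off + self.line_bytes] = self.blank * self.cols
        self.top = (self.top + 1) % self.rows
        return self.line_bytes


buf = ScrolledTextBuffer()
for row in range(buf.rows):
    buf.write_line(row, bytes([row]) * buf.line_bytes)
written = buf.scroll_up()
print(written, buf.top)
```

The scroll cost drops to a single line of writes, independent of how many lines are on screen.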
Quote:
The thing is, the 6502 is pretty slow at indirect memory writes, and it's also slow at arithmetic wider than 8 bits, which includes incrementing the target address and dealing with overflows. It doesn't have any 16-bit registers, and even if it did they wouldn't be wide enough to address a VGA resolution. So although the ability to send an extra 16 bits of entropy is appealing, it's a bit of an illusion and the cost to the CPU is high.
And this is another reason why the slowness of my original implementation baffled me. With the register-based setup the CPU was doing less indirect access, as the hardware register never moves. The only indirect accesses were to a temp buffer as it copied lines in and out of VRAM.
Quote:
My main advice is, think about the graphics operations you want to perform (the ones that are speed-critical), think about what the code will look like to perform them, actually write the code, and check how fast it will run, and if it doesn't add up, go back to change the design.
I eventually want to do some simple games (my brother kinda got me thinking about a port of an old silly game I wrote for our TI-99/4A 40 years ago). I do plan on hardware sprites since they are not terribly hard to implement, and that will certainly help. Horizontal scroll will be very helpful for games as well but I am hoping I will have that covered with my new design, which will have more freedom in programming where in VRAM the frame buffer starts and how it's arranged.
Quote:
From analysis of my personal case, just about the only thing directly addressed memory is good for is random access, but that's rarely needed, hence why I'm moving away from it. Block writes and copies are much more important. I'm pretty confident I'll get much faster speeds from a more tailored interface, and get the benefit of needing fewer wires between the CPU and graphics circuit.
I'm still not married to that idea; I admit I'm biased here because I grew up on Apple II systems, which were all raw frame buffer, so that's what I know. I think the thing I would be most sad about losing without direct frame buffer access is MVP/MVN, which, while 7(?) cycles per byte, are still faster than regular loops. Having a hardware blitter could alleviate that, but that is not something I have on my roadmap at the moment.
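To put a number on that, here is a back-of-the-envelope sketch of how long a full-screen scroll would take with a block move at 7 cycles per byte and a 6 MHz clock. The 80-column width is an assumption, and the function name is made up for illustration; setup overhead for the move instruction is ignored.

```python
def copy_time_ms(num_bytes, cycles_per_byte, clock_hz):
    """Milliseconds needed to move num_bytes at a fixed per-byte cycle cost."""
    return num_bytes * cycles_per_byte / clock_hz * 1000.0


COLS, ROWS, BYTES_PER_CHAR = 80, 25, 2        # COLS is an assumption
moved = COLS * (ROWS - 1) * BYTES_PER_CHAR    # 24 lines move up by one
mvn_ms = copy_time_ms(moved, 7, 6_000_000)    # 7 cycles/byte at 6 MHz
print(moved, round(mvn_ms, 2))
```

The same function can be fed a measured cycles-per-byte figure for a hand-written loop to compare the two approaches.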
I think BDD has definitely gotten me started down the road of designing some sort of other bus interface to use for this. I'm just not sure what it's going to look like yet, and I sometimes have problems getting stuck thinking inside the box.
My big worry is the overhead of driving the bus signals manually through a 65C22; it will be fine for my other expansion card plans, but it may be problematic for video.
I am actually thinking I might try my hand at making a dedicated bus controller on a CPLD. It would give the slots their own 8-bit data bus and 24-bit address bus, plus a few control signals. On the '816 side it would act like my existing video controller and let the CPU auto-increment through that address space. This means my new video card could be designed as I originally intended, but the bus controller would just hide it. It would also be useful for making cards with shared buffer space on them for things like microcontroller-driven mass storage controllers.