
All times are UTC




Posted: Thu Jul 15, 2021 4:13 pm

Joined: Sat Dec 30, 2017 3:19 pm
Posts: 116
Location: Detroit, Michigan, USA
BigDumbDinosaur wrote:
I don't want to be a Doubting Dan, but I think your proposed system is too complex for a first try. Also, I'd be using a CPLD instead of multiple GALs. Without launching into too much detail, using multiple GALs can create logic race conditions in some cases. I can guarantee that you will be tearing out your hair trying to unravel such a mess.


Well technically this is my third system, and the second 65816 system. Or third/fourth, if you count the fact that my "second" system went through a complete redesign & rebuild at one point. Nearly everything on my list was already built and working (albeit a bit flaky) on a breadboard as recently as six months ago.

Regarding CPLDs, I did use them on my last build, but it was a 3.3V system so I was able to use Xilinx 9572XLs. The ATF1504 and 1508 are 5V I know, but I'm an all-Linux shop; my only Windows box is a Windows 10 gaming rig in another room. To date I have not seen any evidence of anyone successfully programming these things on Linux. With the Xilinx parts I can run their IDE to build the images and program them directly from the command line using xc3sprog and a $15 JTAG board. They are also programmable in Verilog which I much prefer to the horror that is WinCUPL. :)

Now, maybe someone can convince me to not go back to 5V in which case the CPLD decision becomes much easier. But, 3.3V had its own problems for me which is why I decided not to go down that route for this build.

BigDumbDinosaur wrote:
You need to consider the work that is involved in generating a raster and putting text (let alone graphics) on a screen. Producing a stable picture with VGA requires jitter-free timing that you may not be able to achieve no matter how fast you run the 65C816. This is why so many hobbyists use some sort of microcontroller or video display controller to produce VGA.


Not sure if you saw my followup post to J64C, but I have done video before, so I know what's involved. :) The video will be done on an FPGA, like my last iteration. I already have working Verilog code from my last build that implements a basic graphics controller. Again, the big difference here is just that I'm taking the video off board on this project so that I can iterate on it without holding up the rest of the project.

BigDumbDinosaur wrote:
No, you don't need an expansion connector with access to the MPU buses. You need an expansion bus that provides the needed signals to support plug-in cards. As Garth advises, do not run the MPU buses off-board. This is no different than what is going on in a PC. Excepting the short-lived VESA local bus that was present in 80486 systems (and not all of them), all buses in PCs are not extensions of the MPU buses. In fact, the modern PCI and PCI-Express (PCI-E) buses run at a speed that is a submultiple of the processor bus speed.


I am trying to wrap my head around how I could make that work at any reasonable speed. My long-term goal is to be able to code up some games on this thing (hence the game controller ports). On my last build, the TinyFPGA board I used did not have enough I/O to expose its internal VRAM directly, so it used a register setup similar to a TMS9918. Even with auto-increment addressing, and a 6 MHz CPU, I had to add hardware text scrolling to make it feel as fast as a 1 MHz Apple II (to be fair, my text takes twice as much space as the A2 due to color/attribute bytes).

The conclusion I took away from that experiment is that I need to memory map at least some of the VRAM. My next video controller is on a much larger FPGA and has enough I/O that I can make it act like an SRAM, so the VRAM will be directly mapped. Current plans call for 64K of VRAM, though I have something like 496K of onboard DP-RAM available so I could make the frame buffer much larger (or more likely provide multiple buffers) if I want to.

Anyway even this is not set in stone, if someone can convince me that I can squeeze decent performance while funneling everything through a VIA or some other interface chip, I am game (no pun intended).

BigDumbDinosaur wrote:
Again, pare down your feature list so you can build a first unit with high likelihood of success. Once you get that first unit up and running, study its operation (here is where a good scope and possibly a 16- or 32-channel logic analyzer can be of help) and observe how your timing has worked out. Information gleaned from such observation will assist you in designing and building the next unit that has more features and complexity.


Well, there really isn't anything to take out at this point. The core system is just the CPU, RAM, ROM (if I add one, which I'm leaning towards now), VIAs and a UART. The other stuff (keyboard, controllers) is just ports that would hang off the VIAs. That's why I have two of them. :)


Posted: Fri Jul 16, 2021 9:33 pm

Joined: Sat Dec 30, 2017 3:19 pm
Posts: 116
Location: Detroit, Michigan, USA
BigDumbDinosaur wrote:
Excepting the short-lived VESA local bus that was present in 80486 systems (and not all of them), all buses in PCs are not extensions of the MPU buses. In fact, the modern PCI and PCI-Express (PCI-E) buses run at a speed that is a submultiple of the processor bus speed.


Since I am pretty ignorant on the specifics of PC expansion buses I did a little research, and apparently ISA does expose buffered address and data lines. But, as has been pointed out ISA generally topped out at 8 MHz, though according to Wikipedia some motherboards had it running as fast as 16-20 MHz (yikes).

PCI and PCIe of course are totally different beasts, but they also have the advantage of dedicated silicon to run them. I actually have been toying with the idea of creating some sort of bus controller in a CPLD, but I'm still doing research on how much effort it is going to be for me to actually program the Atmel 5V chips. I could eBay the older Xilinx 9572s, but they're $12-$15 each which is 3-4x what I pay for the 9572XLs new.

Quote:
My next POC unit is going to use a card-edge connector for expansion—it will be a 0.100" spacing type, which was used in all PCs with ISA buses. We'll see how well that works when I get it built.


I am intrigued by this, since I've been following (and admiring) your POC designs for years. Based on what you've suggested to me I am going to assume this will be its own isolated bus of some sort?


Posted: Fri Jul 16, 2021 10:59 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
jmthompson wrote:
I am trying to wrap my head around how I could make that work at any reasonable speed. My long-term goal is to be able to code up some games on this thing (hence the game controller ports). On my last build, the TinyFPGA board I used did not have enough I/O to expose its internal VRAM directly, so it used a register setup similar to a TMS9918. Even with auto-increment addressing, and a 6 MHz CPU, I had to add hardware text scrolling to make it feel as fast as a 1 MHz Apple II (to be fair, my text takes twice as much space as the A2 due to color/attribute bytes).

Didn't the Apple 2 have hardware scrolling too though? I thought most systems did back then precisely because writing all that video memory was so slow.

Quote:
The conclusion I took away from that experiment is that I need to memory map at least some of the VRAM. My next video controller is on a much larger FPGA and has enough I/O that I can make it act like an SRAM, so the VRAM will be directly mapped. Current plans call for 64K of VRAM, though I have something like 496K of onboard DP-RAM available so I could make the frame buffer much larger (or more likely provide multiple buffers) if I want to.

I'd be interested to hear what exactly the interface was that you found too slow. Personally I'm probably soon going to move in the opposite direction - away from directly mapped memory, to a register based interface instead with a fast mode to write to sequential auto-incrementing addresses.

The thing is, the 6502 is pretty slow at indirect memory writes, and it's also slow at arithmetic wider than 8 bits, which includes incrementing the target address and dealing with overflows. It doesn't have any 16-bit registers, and even if it did they wouldn't be wide enough to address a VGA resolution. So although the ability to send an extra 16 bits of entropy is appealing, it's a bit of an illusion and the cost to the CPU is high.

My main advice is, think about the graphics operations you want to perform (the ones that are speed-critical), think about what the code will look like to perform them, actually write the code, and check how fast it will run, and if it doesn't add up, go back to change the design.

From analysis of my personal case, just about the only thing directly addressed memory is good for is random access, but that's rarely needed, hence why I'm moving away from it. Block writes and copies are much more important. I'm pretty confident I'll get much faster speeds from a more tailored interface, and get the benefit of needing fewer wires between the CPU and graphics circuit.

(Edit - I just saw in the title that you're using the 65816, not the 6502, so that may change some things here. I'd still suggest being very clear what the code will look like though and how it will perform, before committing to a hardware design.)


Posted: Fri Jul 16, 2021 11:56 pm

Joined: Sat Dec 30, 2017 3:19 pm
Posts: 116
Location: Detroit, Michigan, USA
gfoot wrote:
Didn't the Apple 2 have hardware scrolling too though? I thought most systems did back then precisely because writing all that video memory was so slow.


Nope, there is no hardware assist of any kind for text or graphics on any Apple II system, not even the IIGS. The closest thing would be the IIGS's fill mode in 320x200, where black pixels take on the color of the last non-black pixel.

Quote:
I'd be interested to hear what exactly the interface was that you found too slow. Personally I'm probably soon going to move in the opposite direction - away from directly mapped memory, to a register based interface instead with a fast mode to write to sequential auto-incrementing addresses.


I lifted the design for my VDC from the TMS9918. There were eight registers total, but most were config registers. The first three were the important ones here: two bytes of VRAM address and a VRAM data register. Reading or writing the data register just accessed the VRAM location pointed to by the two address bytes, and there was also an auto-increment feature so that you could read or write consecutive bytes without changing the address registers every time.
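To make the register window concrete, here is a minimal Python model of the interface described above. It is only a sketch: the class name, the register numbers, and the 64K VRAM size are assumptions for illustration, and the five config registers are not modelled.

```python
class RegisterVdc:
    """Toy model of a VRAM port reached through three registers."""

    ADDR_LO, ADDR_HI, DATA = 0, 1, 2   # hypothetical register numbers
    VRAM_SIZE = 0x10000                # assumed 64K, reachable with two address bytes

    def __init__(self, auto_increment=True):
        self.vram = bytearray(self.VRAM_SIZE)
        self.addr = 0
        self.auto_increment = auto_increment

    def _step(self):
        # After each data access, advance the address (wrapping at 64K).
        if self.auto_increment:
            self.addr = (self.addr + 1) % self.VRAM_SIZE

    def write_reg(self, reg, value):
        value &= 0xFF
        if reg == self.ADDR_LO:
            self.addr = (self.addr & 0xFF00) | value
        elif reg == self.ADDR_HI:
            self.addr = (self.addr & 0x00FF) | (value << 8)
        elif reg == self.DATA:
            self.vram[self.addr] = value
            self._step()
        else:
            raise ValueError("config registers are not modelled")

    def read_reg(self, reg):
        if reg != self.DATA:
            raise ValueError("only the data register is readable in this sketch")
        value = self.vram[self.addr]
        self._step()
        return value


vdc = RegisterVdc()
vdc.write_reg(RegisterVdc.ADDR_LO, 0x34)
vdc.write_reg(RegisterVdc.ADDR_HI, 0x12)
for byte in b"HI!":
    vdc.write_reg(RegisterVdc.DATA, byte)   # lands at 0x1234, 0x1235, 0x1236
```

The point of the auto-increment is visible in the loop: after the two address writes, every further byte costs exactly one access to the data register.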

My scroll code (pre-hardware scrolling) would just do the obvious thing of reading line 1, copying it to line 0, and so on down to line 24, then clearing line 24. It wasn't HORRIBLY slow but it was definitely slower than my Apple IIe. It was moving twice as much data since every char was two bytes, but it was also running at 6 MHz so you would think it would easily beat a 1 MHz Apple II.
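A rough way to see where the time goes is to count how often the CPU has to touch the VDC registers for one software scroll. The sketch below assumes an 80-column screen (the post only gives 25 lines and two bytes per character) and one address setup per line read or write; the function name and parameters are made up for this example.

```python
def scroll_port_accesses(rows=25, cols=80, bytes_per_cell=2):
    """Count VDC register accesses for one software scroll of the text screen."""
    line_bytes = cols * bytes_per_cell
    set_address = 2                                  # two address-register writes
    read_line = set_address + line_bytes             # VRAM -> temp buffer
    write_line = set_address + line_bytes            # temp buffer -> VRAM
    clear_last_line = set_address + line_bytes
    return (rows - 1) * (read_line + write_line) + clear_last_line


total = scroll_port_accesses()          # 80 columns assumed
narrow = scroll_port_accesses(cols=40)  # same scheme on a 40-column screen
```

Every byte crosses the data register twice (once out, once back in) and also passes through the temp buffer in main RAM, so the CPU executes several instructions per byte moved even though the hardware register itself never changes address.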

I'm just going to leave the hardware scrolling feature in regardless since it's gonna be faster no matter what, and in fact I might implement pixel-level scrolling even in text mode so that I can smooth scroll. But that is a stretch goal; I'd rather implement hardware sprites first.

Quote:
The thing is, the 6502 is pretty slow at indirect memory writes, and it's also slow at arithmetic wider than 8 bits, which includes incrementing the target address and dealing with overflows. It doesn't have any 16-bit registers, and even if it did they wouldn't be wide enough to address a VGA resolution. So although the ability to send an extra 16 bits of entropy is appealing, it's a bit of an illusion and the cost to the CPU is high.


And this is another reason why the slowness of my original implementation baffled me. With the register-based setup the CPU was doing less indirect access, as the hardware register never moves. The only indirect accesses were to a temp buffer as it copied lines in and out of VRAM.

Quote:
My main advice is, think about the graphics operations you want to perform (the ones that are speed-critical), think about what the code will look like to perform them, actually write the code, and check how fast it will run, and if it doesn't add up, go back to change the design.


I eventually want to do some simple games (my brother kinda got me thinking about a port of an old silly game I wrote for our TI-99/4A 40 years ago). I do plan on hardware sprites since they are not terribly hard to implement, and that will certainly help. Horizontal scroll will be very helpful for games as well but I am hoping I will have that covered with my new design, which will have more freedom in programming where in VRAM the frame buffer starts and how it's arranged.

Quote:
From analysis of my personal case, just about the only thing directly addressed memory is good for is random access, but that's rarely needed, hence why I'm moving away from it. Block writes and copies are much more important. I'm pretty confident I'll get much faster speeds from a more tailored interface, and get the benefit of needing fewer wires between the CPU and graphics circuit.


I'm still not married to that idea; I admit I'm biased here because I grew up on Apple II systems, which were all raw frame buffer, so that's what I know. I think the thing I would be most sad about losing without direct frame buffer access is MVP/MVN, which while 7(?) cycles per byte are still faster than regular loops. Having a hardware blitter could alleviate that, but that is not something I have on my roadmap at the moment.
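For a sense of scale, the block-move cost can be worked out directly. This sketch takes the 7 cycles per byte figure mentioned above at face value and assumes an 80-column, two-bytes-per-character text screen at 6 MHz; the function name is just for illustration.

```python
def block_move_ms(nbytes, clock_hz, cycles_per_byte=7):
    """Time in milliseconds for an MVN/MVP-style block move."""
    return nbytes * cycles_per_byte / clock_hz * 1000.0


scroll_bytes = 24 * 80 * 2          # move 24 lines up by one; 80 columns assumed
cycles = scroll_bytes * 7
ms = block_move_ms(scroll_bytes, 6_000_000)   # about 4.48 ms per scroll
```

That only holds when source and destination are both directly addressable, which is exactly what a register-windowed VRAM takes away.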

I think BDD has definitely gotten me started down the road of designing some sort of other bus interface to use for this, I'm just not sure what it's going to look like yet, and I tend to have problems sometimes getting stuck thinking inside the box. :) My big worry is the overhead of driving the bus signals manually through a 65C22; it will be fine for my other expansion card plans, but it may be problematic for video.

I am actually thinking I might try my hand at making a dedicated bus controller on a CPLD. It would give the slots their own 8-bit data bus and 24-bit address bus, plus a few control signals. On the '816 side it would act like my existing video controller and let the CPU auto-increment through that address space. This means my new video card could be designed as I originally intended, but the bus controller would just hide it. It would also be useful for making cards with shared buffer space on them for things like microcontroller-driven mass storage controllers.


Posted: Sat Jul 17, 2021 12:35 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8415
Location: Midwestern USA
jmthompson wrote:
BigDumbDinosaur wrote:
My next POC unit is going to use a card-edge connector for expansion—it will be a 0.100" spacing type, which was used in all PCs with ISA buses. We'll see how well that works when I get it built.

I am intrigued by this, since I've been following (and admiring) your POC designs for years. Based on what you've suggested to me I am going to assume this will be its own isolated bus of some sort?

Part of it will be buffered (the data bus) and rest will be raw. POC V1.2's expansion socket is entirely raw and is functional at 20 MHz.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Sat Jul 17, 2021 12:44 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8415
Location: Midwestern USA
jmthompson wrote:
I think BDD has definitely gotten me started down the road of designing some sort of other bus interface to use for this...My big worry is the overhead of driving the bus signals manually through a 65C22; it will be fine for my other expansion card plans, but it may be problematic for video.

When it comes to manipulating individual bits, TRB and TSB are your friends. Their effect is atomic and quite fast.
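As an illustration of what TSB and TRB do, here is a small Python model of their effect on one memory byte. It assumes the 8-bit accumulator mode, and the `port` bytearray is a hypothetical stand-in for a VIA output register.

```python
def tsb(mem, addr, a):
    """Test and Set Bits: set the bits of A in memory; return the Z flag."""
    old = mem[addr]
    mem[addr] = (old | a) & 0xFF
    return (old & a) == 0       # Z reflects A AND the *old* memory value


def trb(mem, addr, a):
    """Test and Reset Bits: clear the bits of A in memory; return the Z flag."""
    old = mem[addr]
    mem[addr] = old & ~a & 0xFF
    return (old & a) == 0


port = bytearray([0b0000_0101])
z_after_tsb = tsb(port, 0, 0b0000_0010)   # raise bit 1, leave the rest alone
value_after_tsb = port[0]
z_after_trb = trb(port, 0, 0b0000_0100)   # drop bit 2, leave the rest alone
value_after_trb = port[0]
```

Each call corresponds to a single read-modify-write instruction, so one control line can be toggled without a separate load, mask, and store sequence.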

Unless your video is implemented by a controller with a fair amount of intelligence, you're not going to achieve a high level of video performance, even with a 65C816 running at 20 MHz. Automatic address register incrementing and blitting are almost de rigueur if rapid scrolling, fill and clear operations are expected. Even the Commodore 128's VDC had that capability (although the display kernel had a lot of overhead to deal with two separate video systems).



Posted: Sat Jul 17, 2021 1:34 am

Joined: Sat Dec 30, 2017 3:19 pm
Posts: 116
Location: Detroit, Michigan, USA
BigDumbDinosaur wrote:
Unless your video is implemented by a controller with a fair amount of intelligence, you're not going to achieve a high level of video performance, even with a 65C816 running at 20 MHz. Automatic address register incrementing and blitting are almost de rigueur if rapid scrolling, fill and clear operations are expected. Even the Commodore 128's VDC had that capability (although the display kernel had a lot of overhead to deal with two separate video systems).


The video controller will definitely have a lot of hardware assist eventually, it just won't have much to start (starting simple and all that). The blitter is the only thing that isn't on my short-term roadmap, because if I have sprites I can get away without it for a little while. And having an oversize frame buffer, coupled with the ability to move the active area around in memory on the fly, can help with scrolling.

Anyway I'm still thinking about all of this. All the input I've gotten so far has been very helpful and has given me ideas for other routes that I might not have otherwise considered.

(oh and you convinced me to go the CPLD route. I have some ATF1504s on order from Mouser now even though I'm still working out what workflow I am gonna need to program them.)


Posted: Sat Jul 17, 2021 3:18 am

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
Something else I've been planning for a while that might be interesting for your design is to implement something like the EGA's latch, which is a wide register (32 bits) internal to the graphics hardware. There were various modes of use, but roughly speaking, every time the CPU read a byte, that data along with the corresponding bytes from the other colour planes were loaded into the latch. Then you could for example issue a single byte write to a different address and it would optionally write the data from the latch register instead (allowing a 32 bit copy for the price of 8 bits) or blend the data from the CPU with the latch (allowing for things like only updating some of the bits in the byte, useful for masked rendering).

I wasn't going to exactly copy how that worked - it was more complex than I'd want to build myself - but I drew up a circuit to do something similar, allowing an 8 bit operation to copy 32 bits, and also allowing the CPU to update individual pixels without having to execute its own read-modify-write sequence. The latter is useful for masked text drawing, and line graphics, for example - not my highest priorities to accelerate, but relatively easy to do.
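The latch idea can be sketched in a few lines of Python. This is a loose model of the behaviour described above, not the actual EGA logic or the circuit mentioned here: four byte-wide planes, a 32-bit latch loaded on every read, and two write modes. All names are invented for the example.

```python
class LatchedVram:
    """Four byte-wide planes plus a 4-byte latch loaded on every CPU read."""

    PLANES = 4

    def __init__(self, size=16):
        self.planes = [bytearray(size) for _ in range(self.PLANES)]
        self.latch = [0] * self.PLANES

    def read(self, addr, plane=0):
        # One 8-bit read captures the byte at this address from all planes.
        self.latch = [p[addr] for p in self.planes]
        return self.latch[plane]

    def write_from_latch(self, addr):
        # One 8-bit write stores all 32 latched bits (CPU data is ignored).
        for p, latched in zip(self.planes, self.latch):
            p[addr] = latched

    def write_masked(self, addr, value, bit_mask):
        # Bits set in bit_mask come from the CPU, the rest from the latch.
        for p, latched in zip(self.planes, self.latch):
            p[addr] = (value & bit_mask) | (latched & ~bit_mask & 0xFF)


vram = LatchedVram()
for i, plane in enumerate(vram.planes):
    plane[0] = 0x10 + i
vram.read(0)                 # loads 0x10, 0x11, 0x12, 0x13 into the latch
vram.write_from_latch(5)     # 32-bit copy for the price of two 8-bit accesses
copied = [plane[5] for plane in vram.planes]
vram.write_masked(5, 0xFF, 0x0F)   # only the low nibble comes from the CPU
masked = [plane[5] for plane in vram.planes]
```

The masked write is what removes the CPU-side read-modify-write: the hardware merges the new bits with what was latched.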


Posted: Sat Jul 17, 2021 3:35 am

Joined: Sat Dec 30, 2017 3:19 pm
Posts: 116
Location: Detroit, Michigan, USA
gfoot wrote:
Something else I've been planning for a while that might be interesting for your design is to implement something like the EGA's latch, which is a wide register (32 bits) internal to the graphics hardware. There were various modes of use, but roughly speaking, every time the CPU read a byte, that data along with the corresponding bytes from the other colour planes were loaded into the latch. Then you could for example issue a single byte write to a different address and it would optionally write the data from the latch register instead (allowing a 32 bit copy for the price of 8 bits) or blend the data from the CPU with the latch (allowing for things like only updating some of the bits in the byte, useful for masked rendering).


Interesting...I suspect something like that would end up being as much or more work than just writing a more general-purpose blitter, though.

The FPGA I'm using has 84k LUTs and dual-port RAM that will be running at at least 125 MHz, so there will be plenty of spare resources for sprites, a blitter, and other fun acceleration features. Sprites will be the easiest and most immediately useful, so that's the top thing on my to-do list once I have the basics working.


Posted: Sat Jul 17, 2021 3:45 am

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
hope i can just throw my 2 cents into this thread without derailing it too much.

I've been thinking about some CPLD/FPGA based VGA Controller for a while as well... i've designed some simple Controllers that barely fit into an ATF1508 and during that i came up with 2 ways i would make the Controller's VRAM directly available to the CPU. (since that seems to be your goal right now)

Option 1: do it the Apple II/C64 way of just sharing Memory between the CPU and the Controller on alternating clock halves. (this assumes the Video Controller and CPU share a synchronized clock)
but instead of having the Video Controller access the CPU's Memory every half cycle, why not have it the other way around? so the CPU is accessing the Video Controller's memory.
I.E. have some RAM on the Video Card itself, and using some tri-state buffers on the Address/Data Lines, the Video Controller can switch between itself and the CPU accessing the RAM.
[Attachment: gimp-2.10_2021-07-17_02-51-42.png]

("A" for Address bus and "D" for Data, these images should be fairly readable)
my assumption is that keeping the Controller's reach limited to its own PCB is gonna be better for the performance of the whole computer.
plus i think having the Video Controller access the CPU's Memory would mess with Wait states while the CPU is reading/writing other IO Devices, since despite RDY being pulled low the Data/Address bus would still constantly change every cycle due to the Video Controller doing its thing.

Option 2: use 2 RAM ICs and some Multiplexers. (the Video Controller and CPU's clock can be different)
likely the best option IMO, there is another RAM IC (called RAM1 in this example). Now the CPU and Video Controller no longer have to share the same IC/bandwidth.
specifically the CPU and Video Controller are always accessing opposite ICs, and the CPU can send a command to the Video Controller (some bit in some IO Register) to flip around which ICs are accessed by either.
for example with that "flip" bit set to 0 the CPU gets access to RAM0, while the Video Controller reads from RAM1 to draw the image
but with the bit set to 1, the Video Controller is now reading from RAM0, while the CPU has access to RAM1.
Ideally this flip would only actually occur during V-Blanking to avoid screen-tearing.
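A small Python model of Option 2 may help: two RAMs, a flip bit that decides which one each side sees, and the flip deferred until V-blank. The class and method names are hypothetical; only the RAM0/RAM1 ownership rule comes from the description above.

```python
class PingPongVram:
    """Two RAMs swapped between CPU and video by a flip bit."""

    def __init__(self, size=4):
        self.ram = [bytearray(size), bytearray(size)]   # RAM0, RAM1
        self.flip = 0          # value currently in effect
        self.requested = 0     # value last written by the CPU

    def cpu_ram(self):
        return self.ram[self.flip]        # flip=0: CPU owns RAM0

    def video_ram(self):
        return self.ram[self.flip ^ 1]    # flip=0: video scans RAM1

    def write_flip_register(self, bit):
        self.requested = bit & 1          # held until the next V-blank

    def vblank(self):
        self.flip = self.requested        # swap only between frames


vram = PingPongVram()
vram.cpu_ram()[0] = 0xAA          # CPU draws the next frame into RAM0
vram.write_flip_register(1)
before = vram.video_ram()[0]      # still showing the old frame from RAM1
vram.vblank()
after = vram.video_ram()[0]       # now showing RAM0
```

Deferring the swap to `vblank()` is what prevents tearing: the video side never changes RAM in the middle of a frame.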

i can think of 2 ways to implement this kind of multi-bit wide Multiplexer
LEFT: if you have enough IO Pins on your FPGA you can easily use it to implement all the Logic required to split the CPU/Video Controller's Access to either RAM0 or RAM1 (Internal BRAM could also be used instead of external ICs, reducing the total size of the PCB and how many IO pins are needed).
RIGHT: if you want to use an FPGA/CPLD with less IO and don't mind the extra PCB space and Propagation Delay from TTL Logic. (probably not usable at higher CPU speeds)
[Attachment: gimp-2.10_2021-07-17_03-58-28.png]

then again if you're gonna use BRAM it's likely Dual Port so there would be little reason to use this kind of Buffering besides avoiding Screen-Tearing and giving the CPU more time to fill the VRAM with data for the next frame.

either way i'm really interested in high speed (~12.5MHz or faster) expansion busses since i want to have something like that in my future systems as well.
but it looks to be a lot more complicated than expected.
from what i can grasp from this thread a 65C22 would be the easiest option, but in what way would it work exactly? is 1 Port used for the Expansion Data bus and the other for the Expansion Address Bus?
in that case the 65C22 would be used to both select and communicate to exactly 1 IO Device/Register at the time. it would definitely reduce the amount of Memory Addresses taken away by IO, but would also require you to constantly change the Expansion Address if you need to periodically access a lot of different Devices/Registers (like for a Sound chip for example)... making random accesses pretty slow.
so what would a dedicated Bus Controller (CPLD/FPGA) look like compared to a 65C22?


Posted: Sat Jul 17, 2021 4:54 am

Joined: Sat Dec 30, 2017 3:19 pm
Posts: 116
Location: Detroit, Michigan, USA
Proxy wrote:
hope i can just throw my 2 cents into this thread without derailing it too much.


The more the merrier, as far as I'm concerned. :)

Quote:
...i'm really interested in high speed (~12.5MHz or faster) expansion busses since i want to have something like that in my future systems as well, but it looks to be a lot more complicated than expected. from what i can grasp from this thread a 65C22 would be the easiest option, but in what way would it work exactly? is 1 Port used for the Expansion Data bus and the other for the Expansion Address Bus? in that case the 65C22 would be used to both select and communicate to exactly 1 IO Device/Register at the time. it would definitely reduce the amount of Memory Addresses taken away by IO, but would also require you to constantly change the Expansion Address if you need to periodically access a lot of different Devices/Registers (like for a Sound chip for example)... making random accesses pretty slow.


For the actual bus, the idea I'm toying with would look something like an externalized Wishbone bus with 16-bit addressing and 8-bit data. Something like this:

A0..15 [ not the MPU's address bus ]
D0..7 [ not the MPU's data bus ]
CLK [ bus clock ]
/RW [ High for read, low for write ]
/IO [ High for a memory access, low for an I/O port access ]
/STB [ Pulled low by controller to start a bus transaction ]
/ACK [ Pulled low by card to indicate end of transaction ]
/IRQ [ pulled low by card to signal an interrupt ]
/RESET [ System reset ]

This could be implemented, as BDD was suggesting, with a couple of VIAs, plus some helper logic. If you put the data bus lines on port A then CA1 and CA2 can be used for the /STB and /ACK lines, and will even generate the interrupts for you. CB1 and CB2 can handle /RW and /IO. If you use a counter as the address latch then you can arrange it so that when /ACK is pulsed it auto-increments the address for you, too. That way accessing contiguous chunks of memory or registers on a card is just consecutive reads.
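The handshake and the auto-incrementing address counter can be modelled in Python as below. This is a behavioural sketch only: the class names are invented, the strobe/acknowledge exchange is collapsed into a single method call, and a real card could take any number of bus clocks before acknowledging.

```python
class MemoryCard:
    """A card with separate 64K memory and I/O spaces (sizes assumed)."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.io = bytearray(0x10000)

    def transaction(self, addr, read, io, data=0):
        space = self.io if io else self.mem     # selected by the /IO line
        if read:
            return space[addr]
        space[addr] = data & 0xFF
        return None                             # the card acknowledges either way


class BusController:
    """Controller side: drives the address counter, /RW, /IO and /STB."""

    def __init__(self, card):
        self.card = card
        self.addr = 0                           # counter used as the address latch

    def set_address(self, addr):
        self.addr = addr & 0xFFFF

    def _strobe(self, read, io, data=0):
        # /STB low starts the cycle; the card answers by pulsing /ACK.
        result = self.card.transaction(self.addr, read, io, data)
        # The /ACK pulse clocks the counter, so the next access hits addr + 1.
        self.addr = (self.addr + 1) & 0xFFFF
        return result

    def read(self, io=False):
        return self._strobe(True, io)

    def write(self, value, io=False):
        self._strobe(False, io, value)


bus = BusController(MemoryCard())
bus.set_address(0x2000)
for byte in b"abc":
    bus.write(byte)                 # consecutive writes, no address updates
bus.set_address(0x2000)
data = bytes(bus.read() for _ in range(3))
```

Because the counter advances on every acknowledge, a block transfer costs one address setup followed by back-to-back data accesses.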

Quote:
so what would a dedicated Bus Controller (CPLD/FPGA) look like compared to a 65C22?


Basically just like the VIA solution, but all in one chip instead of a few discrete chips. The problem is that this needs a large CPLD to implement, not because of the logic resources needed but because of the number of I/O pins. The list of bus signals above is already pushing the limits of the smaller 44-pin CPLDs, and it doesn't even include the signals to interface to the CPU. Because of that I'll probably just see about doing this with a VIA or two.


Posted: Sat Jul 17, 2021 10:25 pm

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1041
Location: near Heidelberg, Germany
Sorry for being late to the party, but here's my $0.02.

I would try to avoid - these days - putting the video on an expansion bus, mostly due to the high video access clocks you need to achieve. 25MHz dot clock (640x480) means about 3 million accesses/s = 3MHz for bitmap graphics, or 6MHz if you need to fetch character data per cell and pixel data per character.
Long time ago I had built a "simple" backplane-based computer ( http://www.6502.org/users/andre/csa/index.html ) that was capable of running up to 2MHz over the bus, but it seems newer IC technologies have limited the speed to 1MHz because of the ringing caused by faster transition times. (at least that's my assumption).
So I would not want to have that kind of frequency off the board (at least not in a less-controlled way without bus termination etc)
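The fetch-rate figures above follow from simple division. The sketch below assumes one bit per pixel and eight pixels per fetched byte, which is what makes a 25MHz dot clock come out at roughly 3MHz; the function name is only for illustration.

```python
def fetch_rate_hz(dot_clock_hz, pixels_per_fetch=8, fetches_per_cell=1):
    """Video memory accesses per second needed to keep up with the dot clock."""
    return dot_clock_hz / pixels_per_fetch * fetches_per_cell


bitmap = fetch_rate_hz(25_000_000)                      # one byte per 8 pixels
text = fetch_rate_hz(25_000_000, fetches_per_cell=2)    # character code + glyph row
```

With these assumptions the bitmap case needs 3.125 million fetches per second and the text case 6.25 million, which is where the rounded 3MHz and 6MHz come from.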

So, if you want to have the CPU and RAM (assuming they are on the main board) running anything faster than the 1-2MHz of the expansion bus, you'd have to inject wait times and sync the clock down to the expansion bus clock.
If the Video memory is on another board, it has to inject the slow expansion bus access into its faster memory access. May work fine with dual-ported-RAM (never used that so I cannot comment), but with anything time-shared access on a fast RAM it gets tricky.

A separate video processor may be an interesting approach, one you can send commands like turtle graphics that get then executed independently. That decouples the CPU access to the video memory from the actual video generation, and allows for slow expansion bus interface to the video process controlling the faster video memory. Reads (or direct writes) may be challenging and may result in own subroutines instead of a simple LDA/STA. That is quite a deviation from the "classical" approach of memory mapped video, and, also a video processor is not a simple feat (but can be _very_ powerful).

In my approach here https://github.com/fachat/MicroPET I put a CPLD between the CPU and the video memory, synchronized the video and CPU clocks to allow simple scheduling of video memory access to the CPU, and put it all on a single PCB. Of course a CPLD is not as powerful as an FPGA, but can generate monochrome, bitmap- and character-based VGA just fine. The I/O can have different clocks than the video, and is integrated on the board here, but I am looking into providing a version for my abovementioned old machine, that has the bus running at 1MHz and uses the expansion bus for I/O.
(Note that running the 6502 bus off the board may require separate select lines for bus I/O, or making sure address is set to fixed value like zero, or at least disable databus driver and set bus to read when a resource on the CPU board is used, to avoid data bus conflicts)

André

_________________
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/


Posted: Sun Jul 18, 2021 3:54 am

Joined: Sat Dec 30, 2017 3:19 pm
Posts: 116
Location: Detroit, Michigan, USA
fachat wrote:
I would try to avoid - these days - putting the video on an expansion bus, mostly due to the high video access clocks you need to achieve. 25MHz dot clock (640x480) means about 3 million accesses/s = 3MHz for bitmap graphics, or 6MHz if you need to fetch character data per cell and pixel data per character.


Video memory will not be main memory. It'll be DP-RAM on an FPGA, so nothing high speed will cross the expansion bus. The CPU will access the frame buffer through a register window on the card, which will in turn be accessed over an expansion bus powered by one or two VIAs. This will not be winning any speed awards right away, but once I add more hardware acceleration it should be pretty good.

Quote:
A separate video processor may be an interesting approach, one you can send commands like turtle graphics that get then executed independently. That decouples the CPU access to the video memory from the actual video generation, and allows for slow expansion bus interface to the video process controlling the faster video memory. Reads (or direct writes) may be challenging and may result in own subroutines instead of a simple LDA/STA. That is quite a deviation from the "classical" approach of memory mapped video, and, also a video processor is not a simple feat (but can be _very_ powerful).


That seems to be the way this project is headed now as I've more or less been convinced not to provide memory-mapped frame buffers. The next step is just to finalize what the expansion bus will look like.

