I've had my PC chug through the code that's running my bench computer. Using the dead cycles, I can rely on a bus bandwidth of about 60-70 kbytes per second per megahertz, with a peak of three times that for certain blocks of code. However, there is also a point in the code with a 42-cycle gap between successive dead cycles. If something needed a data rate fast enough to justify DMA, I don't think it could stand a 42-cycle latency at the wrong moment.
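Assuming one byte can be moved per dead cycle, those figures can be sanity-checked with a little arithmetic (this is just an illustration of the quoted numbers, not a measurement):

```python
# Rough sanity check of the quoted figures, assuming one byte
# transferred per dead bus cycle.
CYCLES_PER_SECOND_PER_MHZ = 1_000_000

for kbytes_per_sec in (60, 70):
    # fraction of all bus cycles that are dead (and thus usable)
    dead_fraction = (kbytes_per_sec * 1000) / CYCLES_PER_SECOND_PER_MHZ
    print(f"{kbytes_per_sec} kB/s/MHz -> {dead_fraction:.1%} of bus cycles are dead")

# "a peak performance of three times that", taking the midpoint of 60-70:
peak_fraction = 3 * 65_000 / CYCLES_PER_SECOND_PER_MHZ
print(f"peak -> roughly {peak_fraction:.1%} of bus cycles")
```

So the sustained figure works out to about 6-7% of the bus cycles, which is the 6% mentioned later in the thread.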
Hi smilingphoenix,
This channel is quite fast, but, most important, it is free in terms of bandwidth cost. Its main virtue is not its speed but its zero bandwidth cost. I mean that the application for this "DMAMagic" thing should be chosen based on that main virtue, not on 'prejudices' about what DMA used to be used for.
I'm not knocking the idea - for a system that already has phase-1 DMA, it would let you squeeze out a little more bandwidth, especially on a faster system where 6% of the bus bandwidth is still considerable. I've no doubt you could get it to work; I just wonder exactly what you would use it for?
I still don't know what would be the killer application for it, but I do know that the idea itself is a killer idea...
Well, seriously, if, for example, you wanted to move 256 bytes of data from a disk sector to RAM (in an existing, already-built 6502 system), you could go one of three ways:
1.- Pure (phase 2) DMA : with 100% of the CPU's memory bandwidth available, this is the fastest way, at least in theory. But in this mode the CPU would be halted for 256 memory cycles.
2.- Via software, using the CPU to do the copy, something like:
LOOP  LDA $somewhere        ; 4 cycles
      STA $somewhereElse,X  ; 5 cycles
      INX                   ; 2 cycles
      BNE LOOP              ; 3 cycles
You'd get a maximum transfer rate of 1 byte every 14 cycles - about 7.1% of the available memory bandwidth, or 14 times slower than (1). In this mode, the CPU would be 'halted' (busy transferring) for 256 * 14 = 3,584 cycles.
3.- Post the transfer request, let the CPU continue doing whatever other productive work while the transfer takes place through the transparent DMA channel, and either poll every now and then, or trigger an interrupt after the transfer is done. CPU cycles dedicated to the transfer: almost 0. Therefore, in a sense, this is the fastest way.
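The three options can be compared in numbers. A quick sketch, using the cycle counts above and assuming the setup/notification overhead of case 3 is negligible:

```python
# CPU cycles consumed (i.e. stolen from useful work) to move 256 bytes,
# per the three approaches described above.
N = 256

# (1) Pure phase-2 DMA: the CPU is halted one cycle per byte transferred.
halted_dma = N * 1

# (2) Software copy loop: LDA(4) + STA abs,X(5) + INX(2) + BNE taken(3)
#     = 14 cycles per byte.
cycles_per_byte = 4 + 5 + 2 + 3
busy_software = N * cycles_per_byte

# (3) Transparent (dead-cycle) DMA: only otherwise-wasted cycles are used,
#     so the CPU cost is essentially zero (setup overhead assumed negligible).
busy_transparent = 0

print(halted_dma, busy_software, busy_transparent)  # 256 3584 0
```

Measured in cycles the CPU loses, the transparent channel wins outright; measured in wall-clock time for the 256 bytes alone, pure phase-2 DMA is still quickest, since the transparent channel only gets the 6-7% of cycles that happen to be dead.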
--Jorge.