The secret, hidden, transparent 6502 DMA channel

kc5tja
Posts: 1706
Joined: 04 Jan 2003

Post by kc5tja »

Not a problem -- just wanted to make sure, since we have the two threads already, that they stay relatively on topic. Otherwise, detangling the mess of threads gets horrifyingly confusing.
GARTHWILSON
Forum Moderator
Posts: 8773
Joined: 30 Aug 2002
Location: Southern California

Re: The secret, hidden, transparent 6502 DMA channel

Post by GARTHWILSON »

Would any of our programmable-logic people be interested in taking up the project of making this kind of DMA controller for the 65816? It should be pretty simple, at least compared to doing it for the 6502.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
BigDumbDinosaur
Posts: 9426
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Re: The secret, hidden, transparent 6502 DMA channel

Post by BigDumbDinosaur »

GARTHWILSON wrote:
Would any of our programmable-logic people be interested in taking up the project of making this kind of DMA controller for the 65816? It should be pretty simple, at least compared to doing it for the 6502.

Which kind? The topic thread discusses several approaches, of which use of MLB, VDA and VPA would seem most applicable to the '816.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
kc5tja
Posts: 1706
Joined: 04 Jan 2003

Re: The secret, hidden, transparent 6502 DMA channel

Post by kc5tja »

Hello again. Just poking in to contribute here, since this is more or less relevant to my FPGA activities right now. What isn't made clear here is what the DMA controller will be used for. Is this intended for use with a generic hardware interface of some kind, sort of like the DMA channels on the PC/XT and ISA buses? Or, will this entail a complete bus master implementation? The answer will determine how to go about building such a controller.
BigDumbDinosaur
Posts: 9426
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Re: The secret, hidden, transparent 6502 DMA channel

Post by BigDumbDinosaur »

kc5tja wrote:
Hello again. Just poking in to contribute here, since this is more or less relevant to my FPGA activities right now. What isn't made clear here is what the DMA controller will be used for. Is this intended for use with a generic hardware interface of some kind, sort of like the DMA channels on the PC/XT and ISA buses? Or, will this entail a complete bus master implementation? The answer will determine how to go about building such a controller.

Howdy, Samuel. In my area of interest, I would entertain the idea of it acting as a bus master. My specific application would be to enhance SCSI I/O throughput. I'm currently simulating the effect of a DMA controller by having the MPU manipulate the 53C94's /DACK input and polling the chip's DREQ output. Although this "DMA simulation" adds cycles to the basic I/O loop, it still runs pretty fast—nearly 500KB/second on burst transfers of 48KB. However, a DMA controller that can act as a bus master could easily achieve 6-8 times that performance, nearly equal to that of the SCSI bus' throughput in synchronous mode.

At one time I had given some thought to the idea of rigging up a 65C02 to act as a DMA controller, with the 'C94's DREQ output tied to the 'C02's IRQB input and the latter sitting on a WAI instruction with IRQs disabled, which results in a one cycle response time to the 'C94 saying it has, or is ready for, data. I abandoned the idea after doing some cycle counts—it would have been, at best, twice as fast as the current method, being limited by the 'C02's ability to read and write memory.

In my opinion, any DMA controller intended to run with the '816 would have to effectively transfer a byte per 2-3 clock cycles to be worth the bother. The throughput with such a device could reach 6-10 MB/second, certainly nothing to sneeze at. I suppose with suitably fast silicon, a byte per clock cycle would be feasible, something that I would be very enthusiastic about. :D
x86?  We ain't got no x86.  We don't NEED no stinking x86!
scotws
Posts: 576
Joined: 07 Jan 2013
Location: Just outside Berlin, Germany

Re: The secret, hidden, transparent 6502 DMA channel

Post by scotws »

Just curious, has anybody done any more work in this direction? Having an '816 as the main processor and a 'C02 as an I/O co-processor sounds interesting (and slightly mainframe-y).
Dr Jefyll
Posts: 3526
Joined: 11 Dec 2009
Location: Ontario, Canada

Re: The secret, hidden, transparent 6502 DMA channel

Post by Dr Jefyll »

[Attachment: KK microcode addressing (big).gif]
scotws wrote:
Just curious, has anybody done any more work in this direction?
Working separately, the OP and I both designed hardware that captures the opcode present on the data bus when SYNC is high. This (and a cycle counter and ROM) can yield info regarding what the CPU is doing on any given bus cycle.
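
As a rough illustration of the opcode-capture idea, here is a minimal Python sketch, not the actual hardware design of either project. The class name `OpcodeTracker` and the two-entry lookup table are assumptions made for this example; a real ROM would cover every opcode. The sketch also ignores variable-length timing (page crossings, taken branches) and the interrupt case where an opcode is fetched but not executed.

```python
# Hypothetical model: latch the opcode while SYNC is high, count cycles
# since that fetch, and use (opcode, cycle) to index a small lookup "ROM".

# Illustrative entries only. On the 6502, the cycle after the opcode fetch
# of INX and PHA is a read whose data is discarded.
DEAD_CYCLE_ROM = {
    (0xE8, 1): True,  # INX: second cycle is a discarded read
    (0x48, 1): True,  # PHA: second cycle is a discarded read
}


class OpcodeTracker:
    def __init__(self):
        self.opcode = None  # last opcode latched while SYNC was high
        self.cycle = 0      # bus cycles elapsed since that opcode fetch

    def clock(self, sync, data_bus):
        """Advance one bus cycle; return True if the cycle is flagged dead."""
        if sync:
            self.opcode = data_bus
            self.cycle = 0
        else:
            self.cycle += 1
        return DEAD_CYCLE_ROM.get((self.opcode, self.cycle), False)


def trace(bus_cycles):
    """Run a list of (sync, data_bus) samples through a fresh tracker."""
    tracker = OpcodeTracker()
    return [tracker.clock(sync, data) for sync, data in bus_cycles]


# INX (2 cycles) followed by PHA (3 cycles); data values are made up.
example = [(True, 0xE8), (False, 0x48), (True, 0x48), (False, 0xEA), (False, 0x00)]
print(trace(example))
```

The flagged cycles are the ones a DMA scheme could try to reuse, subject to the interrupt caveat discussed further down in this post.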

Jorge wanted that information so he could identify 6502 dead bus cycles, and reallocate them for DMA. (An alternative approach is to use an '816 instead. '816 unused bus cycles are trivially easy to identify; just look for VPA and VDA simultaneously low.)
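
The '816 case can be sketched as a simple decode of the two status outputs. The function names are hypothetical and the labels paraphrase the usual VDA/VPA truth table; treat it as an illustration, not as glue logic ready for a CPLD.

```python
def classify_cycle(vda, vpa):
    """Label a 65C816 bus cycle from its VDA and VPA outputs."""
    if vda and vpa:
        return "opcode fetch"
    if vpa:
        return "program (operand) fetch"
    if vda:
        return "data access"
    return "internal operation"


def bus_free_for_dma(vda, vpa):
    """True when the CPU is not using the bus: VDA and VPA both low."""
    return not vda and not vpa


for vda in (0, 1):
    for vpa in (0, 1):
        print(vda, vpa, classify_cycle(vda, vpa), bus_free_for_dma(vda, vpa))
```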

I wanted that information so I could build a microcoded exoskeleton that expands the 65C02 architecture with new instructions and registers. The misleadingly named "KimKlone" project was successfully completed, and is documented on my web site with a short summary as well as a detailed description. There's a 6502.org thread about the KimKlone here. And yes, a modern-day alternative is to use an '816 instead! :D

The link Jorge used in his lead post is now defunct, but what he showed was similar to the diagram above. Wanna try doing something like this yourself? There's a potential problem regarding interrupts. Surprisingly, perhaps, it's possible for an opcode to be fetched yet not executed. See the comment about the XOR gate in this post.

-- Jeff
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
Dr Jefyll
Posts: 3526
Joined: 11 Dec 2009
Location: Ontario, Canada

Re: The secret, hidden, transparent 6502 DMA channel

Post by Dr Jefyll »

scotws wrote:
Having an '816 as the main processor and a 'C02 as an I/O co-processor sounds interesting
Two scoops of 65xx goodness in one machine? Positively intoxicating! :mrgreen:
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
BigDumbDinosaur
Posts: 9426
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Homebrew DMA Controller

Post by BigDumbDinosaur »

Although a DMA controller to speed up SCSI transfers in my POC unit is not real high on my computing bucket list, I do continue to think about it.

To recapitulate, one of the ideas I have mulled is that of using a 65C02 in that role, taking advantage of the SEI -- WAI method to elicit a single-cycle response to a DREQ from the 53C94. DREQ (high true) would be connected through an inverter to IRQB on the 65C02. During a read cycle, the 'C94 would assert DREQ when it had a byte available, which would awaken the 65C02 in one clock cycle. The 'C02 would fetch the byte, store it and then toggle /DACK on the 'C94 to indicate that the byte had been read. Similarly, during a write cycle, DREQ would tell the 'C02 that the 'C94 was ready for data. The 'C02 would store a byte into the 'C94's FIFO and then toggle /DACK to tell the 'C94 about it.

I did a cycle count on this and determined that it would be roughly twice as fast as my current method of using the 65C816 as a pseudo-DMA controller. There's more to it than just shuffling bytes around. A counter has to be maintained in RAM and the code driving the 'C02 has to also set up and manage a zero page pointer for addressing purposes. Also, the 'C02's stack is hard-wired to page one RAM. There's not enough of a gain to justify working out the issues.

It could probably be done with a second 65C816 acting as a "DMA controller." Issues of memory management and addressing would be lessened, especially since 16 bit indexing could be done sans ZP pointers. However, the performance improvement isn't going to be much better than that of a 'C02.

The likely solution will come from programming a CPLD to act as a DMA controller. This gets into the design of state machines, since counters and memory load/store operations have to be implemented in a cyclic fashion. As a 65xx data bus is valid only during Ø2 high, it seems that such a device could copy a byte in two cycles: one to read the source and one to write the destination. That would, in theory, produce a transfer rate of 10MB/sec with a 65C816 running on a 20 MHz Ø2 clock. Compare that with the speed of MVN or MVP, either of which can move a byte in seven cycles, amounting to a theoretical transfer rate of about 2.86MB/sec at the same 20 MHz.
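
The arithmetic behind those two figures can be checked with a few lines of Python. The helper name is made up for this example, and it assumes 1 MB means 10^6 bytes and that every cycle is available for the transfer.

```python
def transfer_rate_mb_per_s(clock_mhz, cycles_per_byte):
    """Theoretical throughput in MB/s: one byte moved every cycles_per_byte clocks."""
    return clock_mhz / cycles_per_byte


# 20 MHz is the Ø2 clock quoted in the post; the cycle counts come from the text.
for label, cycles in (("two-cycle DMA copy", 2), ("MVN/MVP block move", 7)):
    print(f"{label}: {transfer_rate_mb_per_s(20, cycles):.2f} MB/s")
```

The same helper covers the earlier "byte per 2-3 clock cycles" estimate: three cycles per byte at 20 MHz works out to roughly 6.7 MB/s.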

I can do combinatorial and register logic in a CPLD, but have yet to work on doing a state machine. However, I do have a picture of the sequence of events that would have to occur.
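
One possible picture of that sequence, written as a Python simulation, not as CPLD code. Everything here is an assumption made for illustration: the state names, the `DmaController` class, and the memory-to-memory copy. A real controller for the 53C94 would address the FIFO at a fixed location and gate each transfer on DREQ and /DACK.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    READ = auto()
    WRITE = auto()


class DmaController:
    """Two-state copy engine: READ latches a byte, WRITE stores it."""

    def __init__(self, memory):
        self.memory = memory
        self.state = State.IDLE
        self.src = self.dst = self.count = 0
        self.latch = 0  # holding register between the read and write cycles

    def start(self, src, dst, count):
        self.src, self.dst, self.count = src, dst, count
        self.state = State.READ if count > 0 else State.IDLE

    def clock(self):
        """Model one Ø2 cycle."""
        if self.state is State.READ:
            self.latch = self.memory[self.src]
            self.src += 1
            self.state = State.WRITE
        elif self.state is State.WRITE:
            self.memory[self.dst] = self.latch
            self.dst += 1
            self.count -= 1
            self.state = State.READ if self.count > 0 else State.IDLE


def run_transfer(memory, src, dst, count):
    """Clock the controller until it goes idle; return the cycles used."""
    dma = DmaController(memory)
    dma.start(src, dst, count)
    cycles = 0
    while dma.state is not State.IDLE:
        dma.clock()
        cycles += 1
    return cycles


memory = bytearray(64)
memory[0:4] = b"SCSI"
print(run_transfer(memory, 0, 32, 4), bytes(memory[32:36]))
```

Each byte costs one READ and one WRITE cycle, which is where the two-cycles-per-byte estimate comes from.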
x86?  We ain't got no x86.  We don't NEED no stinking x86!
BigDumbDinosaur
Posts: 9426
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Re: The secret, hidden, transparent 6502 DMA channel

Post by BigDumbDinosaur »

Dr Jefyll wrote:
scotws wrote:
Having an '816 as the main processor and a 'C02 as an I/O co-processor sounds interesting
Two scoops of 65xx goodness in one machine? Positively intoxicating! :mrgreen:
May we be expecting a NUMA design featuring multiple '816s soon from Stratford, ON? :lol:
Last edited by BigDumbDinosaur on Thu Nov 07, 2013 4:44 pm, edited 1 time in total.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
Dr Jefyll
Posts: 3526
Joined: 11 Dec 2009
Location: Ontario, Canada

Re: The secret, hidden, transparent 6502 DMA channel

Post by Dr Jefyll »

BigDumbDinosaur wrote:
May we be expecting a NUMA design featuring multiple '816s soon from Stratford, ON? :lol:
Actually, what's more likely is another coprocessor, but only remotely similar to the KK's, and applicable to '816 as well as 'c02. It'd offer some 6809-like address modes, and a partial hardware assist for floating-point calculations done in software.
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
BigDumbDinosaur
Posts: 9426
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Re: The secret, hidden, transparent 6502 DMA channel

Post by BigDumbDinosaur »

Dr Jefyll wrote:
BigDumbDinosaur wrote:
May we be expecting a NUMA design featuring multiple '816s soon from Stratford, ON? :lol:
Actually, what's more likely is another coprocessor, but only remotely similar to the KK's, and applicable to '816 as well as 'c02. It'd offer some 6809-like address modes, and a partial hardware assist for floating-point calculations done in software.
If someone were to come up with floating point hardware that would be 65xx bus-compatible I'd go for it.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
scotws
Posts: 576
Joined: 07 Jan 2013
Location: Just outside Berlin, Germany

Re: Homebrew DMA Controller

Post by scotws »

BigDumbDinosaur wrote:
There's more to it than just shuffling bytes around. A counter has to be maintained in RAM and the code driving the 'C02 has to also set up and manage a zero page pointer for addressing purposes. Also, the 'C02's stack is hard-wired to page one RAM. There's not enough of a gain to justify working out the issues.
So in the end, we're talking about a whole separate little computer that handles these things (which really sounds like a mainframe)?

Reading up on an older entry about parallel processing (viewtopic.php?f=1&t=217) makes me wonder about using some shared RAM instead -- the '816 has enough address space, after all. It could put the data in a shared area and then send the "I/O Processor" (say, a '02) some sort of message to the effect of "move those bytes to the hard drive and confirm it worked after you are done". It wouldn't really be faster in the sense of bits written per second, especially with the communication overhead, but the '816 could get on with more noble things in the meantime than dealing with the hard drive.

Of course, you'd basically be building two computers instead of one, so that's a lot of issues :D.
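
The message-passing idea in this post could look something like the following Python sketch. The mailbox layout, the status codes and both function names are invented for illustration; nothing here describes an existing design, and the "disk" is just a byte buffer.

```python
# Hypothetical mailbox at the start of the shared RAM window.
MBOX_STATUS = 0   # holds IDLE, PENDING or DONE
MBOX_ADDR_LO = 1  # buffer address within shared RAM, low byte
MBOX_ADDR_HI = 2  # buffer address, high byte
MBOX_LEN = 3      # byte count (this sketch assumes 1..255)

IDLE, PENDING, DONE = 0, 1, 2


def host_post_write(shared, addr, data):
    """Main CPU side: place data in shared RAM, then post a write request."""
    shared[addr:addr + len(data)] = data
    shared[MBOX_ADDR_LO] = addr & 0xFF
    shared[MBOX_ADDR_HI] = addr >> 8
    shared[MBOX_LEN] = len(data)
    shared[MBOX_STATUS] = PENDING  # written last so the request is complete


def io_processor_poll(shared, disk):
    """I/O processor side: service one pending request, if there is one."""
    if shared[MBOX_STATUS] != PENDING:
        return False
    addr = shared[MBOX_ADDR_LO] | (shared[MBOX_ADDR_HI] << 8)
    length = shared[MBOX_LEN]
    disk.extend(shared[addr:addr + length])
    shared[MBOX_STATUS] = DONE  # confirmation the host can check later
    return True


shared = bytearray(1024)
disk = bytearray()
host_post_write(shared, 0x0100, b"hello")
io_processor_poll(shared, disk)
print(bytes(disk), shared[MBOX_STATUS])
```

Between posting the request and checking for DONE, the main CPU is free to do other work, which is the benefit described above.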
kc5tja
Posts: 1706
Joined: 04 Jan 2003

Re: Homebrew DMA Controller

Post by kc5tja »

scotws wrote:
Reading up on an older entry about parallel processing (viewtopic.php?f=1&t=217) makes me wonder about using some shared RAM instead
I don't think there's a need for this. Any NUMA architecture will give the illusion of shared memory anyway. And since the 65816 and 6502 are both so memory bound, launching a 6502 to operate on memory shared with the 65816 will, unless you limit yourself to just two processors and have a memory subsystem slow enough to exploit opposite clock phases, actually take longer than if the 65816 just did all the work itself.
scotws wrote:
message to the effect of "move those bytes to the hard drive and confirm it worked after you are done". It wouldn't really be faster in the sense of bits written per second, especially with the communication overhead, but the '816 could get on with more noble things in the meantime than dealing with the hard drive.

Of course, you'd basically be building two computers instead of one, so that's a lot of issues :D.
This is how, in essence, the Commodore 80xx disk drives worked.

And when you think about it, Commodore's entire 8-bit line-up was really mainframe-y. The host computer would send a high-level instruction to the disk drive, which would interpret this command in a separate address space altogether and completely asynchronously to the host computer's operations. (It's rarely exploited, but it's entirely possible to do disk I/O and computation concurrently with the Commodore architecture.) And, in the case of the PET's disk drives, you really did have two CPUs cooperating with each other on shared data structures in memory. The command-processor CPU would give low-level instructions to the drive controller CPU and exchange data via shared buffers.

This is, schematically, surprisingly close to channel I/O in IBM mainframes. The computer sends high-level I/O commands to a "controller," which then breaks those commands down to lower-level I/O functions for individual "units." I don't think this was an accident -- studying the capabilities of Commodore's 8-bit DOS, what with relative files and a flat namespace with long filenames, it's very clearly influenced by System/360 design principles.
fachat
Posts: 1124
Joined: 05 Jul 2005
Location: near Heidelberg, Germany

Re: The secret, hidden, transparent 6502 DMA channel

Post by fachat »

I am not sure how much this DMA channel would gain in throughput. In my PET816 accelerator card I have the option to switch that DMA channel off. During valid accesses I run at the original 1MHz, but for an invalid access I speed up the CPU to 10MHz and squeeze the invalid cycle into Phi1. This works as long as the main machine accepts the address for the next valid access late enough (e.g. the PET 8296 does, the non-CRTC PET 3032 does not...)

http://www.6502.org/users/andre/adv65/pet816/index.html

On this gallery page I have a screenshot where I measured a loop with the "hiding of bogus cycles - hbogus" on and off. In this specific loop it "looks like" the machine is running at 1.7MHz in a 1MHz system.
http://www.6502.org/users/andre/adv65/p ... llery.html

For a "mainframe-like" system I would rather use Phi1/Phi2 interleave though.

Andre
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/