6502.org Forum
All times are UTC

PostPosted: Tue Mar 23, 2010 7:15 pm 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
Not a problem -- just wanted to make sure, since we have the two threads already, that they stay relatively on topic. Otherwise, detangling the mess of threads gets horrifyingly confusing.


PostPosted: Sat Sep 01, 2012 9:59 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Would any of our programmable-logic people be interested in taking up the project of making this kind of DMA controller for the 65816? It should be pretty simple, at least compared to doing it for the 6502.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Sun Sep 02, 2012 3:52 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8506
Location: Midwestern USA
GARTHWILSON wrote:
Would any of our programmable-logic people be interested in taking up the project of making this kind of DMA controller for the 65816? It should be pretty simple, at least compared to doing it for the 6502.

Which kind? The topic thread discusses several approaches, of which use of MLB, VDA and VPA would seem most applicable to the '816.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Sun Sep 02, 2012 4:00 am

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
Hello again. Just poking in to contribute here, since this is more or less relevant to my FPGA activities right now. What isn't made clear here is what the DMA controller will be used for. Is this intended for use with a generic hardware interface of some kind, sort of like the DMA channels on the PC/XT and ISA buses? Or, will this entail a complete bus master implementation? The answer will determine how to go about building such a controller.


PostPosted: Sun Sep 02, 2012 5:04 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8506
Location: Midwestern USA
kc5tja wrote:
Hello again. Just poking in to contribute here, since this is more or less relevant to my FPGA activities right now. What isn't made clear here is what the DMA controller will be used for. Is this intended for use with a generic hardware interface of some kind, sort of like the DMA channels on the PC/XT and ISA buses? Or, will this entail a complete bus master implementation? The answer will determine how to go about building such a controller.

Howdy, Samuel. In my area of interest, I would entertain the idea of it acting as a bus master. My specific application would be to enhance SCSI I/O throughput. I'm currently simulating the effect of a DMA controller by having the MPU manipulate the 53C94's /DACK input and polling the chip's DREQ output. Although this "DMA simulation" adds cycles to the basic I/O loop, it still runs pretty fast—nearly 500KB/second on burst transfers of 48KB. However, a DMA controller that can act as a bus master could easily achieve 6-8 times that performance, nearly equal to that of the SCSI bus' throughput in synchronous mode.

At one time I had given some thought to the idea of rigging up a 65C02 to act as a DMA controller, with the 'C94's DREQ output tied to the 'C02's IRQB input and the latter sitting on a WAI instruction with IRQs disabled, which results in a one cycle response time to the 'C94 saying it has, or is ready for, data. I abandoned the idea after doing some cycle counts—it would have been, at best, twice as fast as the current method, being limited by the 'C02's ability to read and write memory.

In my opinion, any DMA controller intended to run with the '816 would have to effectively transfer a byte per 2-3 clock cycles to be worth the bother. The throughput with such a device could reach 6-10 MB/second, certainly nothing to sneeze at. I suppose with suitably fast silicon, a byte per clock cycle would be feasible, something that I would be very enthusiastic about. :D

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Tue Nov 05, 2013 10:26 pm

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Just curious, has anybody done any more work in this direction? Having a '816 as the main processor and a 'C02 as an I/O co-processor sounds interesting (and slightly mainframe-y).


PostPosted: Wed Nov 06, 2013 12:07 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Attachment: KK microcode addressing (big).gif [ 9.44 KiB ]
scotws wrote:
Just curious, has anybody done any more work in this direction?
Working separately, the OP and I both designed hardware that captures the opcode present on the data bus when SYNC is high. This (and a cycle counter and ROM) can yield info regarding what the CPU is doing on any given bus cycle.
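
For anyone who wants to play with the idea before committing to hardware, here is a minimal software sketch of the latch + cycle counter + ROM arrangement. It is only a model under my own assumptions: the names `OpcodeTracker` and `DEAD_CYCLE_ROM` are hypothetical, and the two ROM entries are illustrative rather than taken from either design. They rely on the fact that single-byte, two-cycle instructions such as INX and NOP spend their second cycle on a discarded read.

```python
# Hypothetical model of the "opcode latch + cycle counter + ROM" idea.
# The ROM contents are illustrative only, not copied from any real design.

# Maps (opcode, cycle number within the instruction) -> True when that bus
# cycle is a discarded ("dead") access. Cycle 0 is the opcode fetch itself.
DEAD_CYCLE_ROM = {
    (0xE8, 1): True,  # INX: second cycle reads the next byte and discards it
    (0xEA, 1): True,  # NOP: same pattern
}


class OpcodeTracker:
    """Latches the opcode when SYNC is high and counts cycles after it."""

    def __init__(self):
        self.opcode = None
        self.cycle = 0

    def clock(self, sync, data_bus):
        """Advance one bus cycle; return True if the ROM marks it as dead."""
        if sync:
            self.opcode = data_bus  # capture the opcode on the data bus
            self.cycle = 0
        else:
            self.cycle += 1
        return DEAD_CYCLE_ROM.get((self.opcode, self.cycle), False)


tracker = OpcodeTracker()
# INX (2 cycles) followed by LDA #$42 (2 cycles, no dead cycle).
# Each tuple is (SYNC, value on the data bus).
trace = [(1, 0xE8), (0, 0xA9), (1, 0xA9), (0, 0x42)]
flags = [tracker.clock(sync, data) for sync, data in trace]
print(flags)  # [False, True, False, False]
```

A real ROM would need an entry for every opcode and cycle, and this model ignores the interrupt case where an opcode is fetched but not executed.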

Jorge wanted that information so he could identify 6502 dead bus cycles, and reallocate them for DMA. (An alternative approach is to use an '816 instead. '816 unused bus cycles are trivially easy to identify; just look for VPA and VDA simultaneously low.)
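
To show how little logic the '816 case needs, here is a small sketch of the VPA/VDA decode. The function names are my own, and the cycle descriptions are paraphrased from the four VPA/VDA combinations documented for the 65C816.

```python
def classify_816_cycle(vpa, vda):
    """Describe a 65C816 bus cycle from its VPA and VDA outputs."""
    if vpa and vda:
        return "opcode fetch"
    if vpa:
        return "valid program address"
    if vda:
        return "valid data address"
    return "internal operation"


def bus_free_for_dma(vpa, vda):
    """True when the CPU is not making a valid access on this cycle."""
    return not vpa and not vda


for vpa in (0, 1):
    for vda in (0, 1):
        print(vpa, vda, classify_816_cycle(vpa, vda), bus_free_for_dma(vpa, vda))
```

In hardware this is a single two-input NOR gate, which is the whole point of the comparison with the 6502.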

I wanted that information so I could build a microcoded exoskeleton that expands the 65C02 architecture with new instructions and registers. The misleadingly named "KimKlone" project was successfully completed, and is documented on my web site with a short summary as well as a detailed description. There's a 6502.org thread about the KimKlone here. And yes, a modern-day alternative is to use an '816 instead! :D

The link Jorge used in his lead post is now defunct, but what he showed was similar to the diagram above. Wanna try doing something like this yourself? There's a potential problem regarding interrupts. Surprisingly, perhaps, it's possible for an opcode to be fetched yet not executed. See the comment about the XOR gate in this post.

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Wed Nov 06, 2013 4:56 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
scotws wrote:
Having a '816 as the main processor and a 'C02 as an I/O co-processor sounds interesting
Two scoops of 65xx goodness in one machine? Positively intoxicating! :mrgreen:

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


 Post subject: Homebrew DMA Controller
PostPosted: Wed Nov 06, 2013 5:21 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8506
Location: Midwestern USA
Although a DMA controller to speed up SCSI transfers in my POC unit is not real high on my computing bucket list, I do continue to think about it.

To recapitulate, one of the ideas I have mulled is that of using a 65C02 in that role, taking advantage of the SEI -- WAI method to elicit single-cycle response to a DREQ from the 53C94. DREQ (high true) would be connected through an inverter to IRQB on the 65C02. During a read cycle, when the 'C94 had data available, it would assert DREQ, which would awaken the 65C02 in one clock cycle. The 'C02 would fetch the byte, store it and then toggle /DACK on the 'C94 to indicate that the byte had been read. Similarly, during a write cycle, DREQ would tell the 'C02 that the 'C94 is ready for data. The 'C02 would store a byte into the 'C94's FIFO and then toggle /DACK to tell the 'C94 about it.
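
The read direction of that handshake can be sketched in software as follows. This is only a behavioral model: `FifoDevice` and `pseudo_dma_read` are hypothetical names, the device is a plain queue standing in for the 'C94's FIFO, and the polling test stands in for the 'C02 sitting on WAI until DREQ pulls IRQB low.

```python
from collections import deque


class FifoDevice:
    """Hypothetical stand-in for the 'C94 side of the DREQ / /DACK handshake."""

    def __init__(self, incoming):
        self.fifo = deque(incoming)

    @property
    def dreq(self):
        # DREQ is asserted while the device has a byte waiting.
        return len(self.fifo) > 0

    def read_and_ack(self):
        # Models reading the byte and then toggling /DACK to acknowledge it.
        return self.fifo.popleft()


def pseudo_dma_read(device, memory, dest, count):
    """Copy up to `count` bytes from the device into memory at `dest`.

    Returns the number of bytes still outstanding. Real hardware would keep
    waiting on WAI; the model returns early so a test cannot hang.
    """
    while count > 0:
        if not device.dreq:
            break
        memory[dest] = device.read_and_ack()
        dest += 1    # the zero page pointer in the real code
        count -= 1   # the byte counter kept in RAM
    return count


memory = bytearray(16)
device = FifoDevice(b"SCSI")
remaining = pseudo_dma_read(device, memory, 4, 4)
print(remaining, bytes(memory[4:8]))
```

Even in this toy form, the pointer increment and counter decrement are visible as per-byte overhead on top of the actual load and store.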

I did a cycle count on this and determined that it would be roughly twice as fast as my current method of using the 65C816 as a pseudo-DMA controller. There's more to it than just shuffling bytes around. A counter has to be maintained in RAM and the code driving the 'C02 has to also set up and manage a zero page pointer for addressing purposes. Also, the 'C02's stack is hard-wired to page one RAM. There's not enough of a gain to justify working out the issues.

It could probably be done with a second 65C816 acting as a "DMA controller." Issues of memory management and addressing would be lessened, especially since 16 bit indexing could be done sans ZP pointers. However, the performance improvement isn't going to be much better than by using a 'C02.

The likely solution will come from programming a CPLD to act as a DMA controller. This gets into the design of state machines, since counters and memory load/store operations have to be implemented in a cyclic fashion. As a 65xx data bus is valid only during Ø2 high, it seems that such a device could copy a byte in two cycles. That would, in theory, produce a transfer rate of 10MB/sec with a 65C816 running with a 20 MHz Ø2 clock. Compare that with the speed of MVN or MVP, either of which can move a byte in seven cycles, amounting to a theoretical transfer rate of roughly 2.86MB/sec with the same 20 MHz Ø2 clock.
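
The arithmetic behind those figures is simple enough to check in a few lines. This sketch assumes a 20 MHz Ø2 clock and takes 1 MB as 10^6 bytes; the function name is my own.

```python
def transfer_rate_mb_per_s(clock_hz, cycles_per_byte):
    """Theoretical throughput in MB/s (1 MB = 1,000,000 bytes)."""
    return clock_hz / cycles_per_byte / 1e6


PHI2_HZ = 20_000_000  # assumed clock rate

dma_rate = transfer_rate_mb_per_s(PHI2_HZ, 2)  # two cycles per byte
mvn_rate = transfer_rate_mb_per_s(PHI2_HZ, 7)  # MVN/MVP: seven cycles per byte

print(f"DMA:     {dma_rate:.2f} MB/s")
print(f"MVN/MVP: {mvn_rate:.2f} MB/s")
print(f"ratio:   {dma_rate / mvn_rate:.1f}x")
```

These are ceilings only; they ignore setup time and any cycles the CPU needs to program the transfer.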

I can do combinatorial and register logic in a CPLD, but have yet to work on doing a state machine. However, I do have a picture of the sequence of events that would have to occur.
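
As a rough illustration of that sequence of events, here is a behavioral model of a two-cycle-per-byte copy engine. It is a sketch under my own assumptions, not a CPLD design: the state names, the `DmaController` class and its `start`/`clock` methods are all hypothetical, and bus arbitration with the MPU is left out entirely.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    READ = 1
    WRITE = 2


class DmaController:
    """Copies one byte every two clock cycles: one read, then one write."""

    def __init__(self, memory):
        self.memory = memory
        self.state = State.IDLE
        self.src = self.dst = self.count = 0
        self.latch = 0  # holds the byte between the read and write cycles

    def start(self, src, dst, count):
        self.src, self.dst, self.count = src, dst, count
        self.state = State.READ if count else State.IDLE

    def clock(self):
        """One Phi2 cycle. Returns True while the controller owns the bus."""
        if self.state is State.IDLE:
            return False
        if self.state is State.READ:
            self.latch = self.memory[self.src]
            self.src += 1
            self.state = State.WRITE
        else:  # State.WRITE
            self.memory[self.dst] = self.latch
            self.dst += 1
            self.count -= 1
            self.state = State.READ if self.count else State.IDLE
        return True


memory = bytearray(range(16)) + bytearray(16)
dma = DmaController(memory)
dma.start(src=0, dst=16, count=4)
cycles = 0
while dma.clock():
    cycles += 1
print(cycles, list(memory[16:20]))  # 8 [0, 1, 2, 3]
```

The address and byte counters map directly onto CPLD registers, and the three states need only two flip-flops.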

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Wed Nov 06, 2013 5:24 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8506
Location: Midwestern USA
Dr Jefyll wrote:
scotws wrote:
Having a '816 as the main processor and a 'C02 as an I/O co-processor sounds interesting
Two scoops of 65xx goodness in one machine? Positively intoxicating! :mrgreen:

May we be expecting a NUMA design featuring multiple '816s soon from Stratford, ON? :lol:

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Last edited by BigDumbDinosaur on Thu Nov 07, 2013 4:44 pm, edited 1 time in total.

PostPosted: Thu Nov 07, 2013 5:20 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
BigDumbDinosaur wrote:
May we be expecting a NUMA design featuring multiple '816s soon from Stratford, ON? :lol:
Actually, what's more likely is another coprocessor, but only remotely similar to the KK's, and applicable to the '816 as well as the 'C02. It'd offer some 6809-like address modes, and a partial hardware assist for floating-point calculations done in software.

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Thu Nov 07, 2013 4:46 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8506
Location: Midwestern USA
Dr Jefyll wrote:
BigDumbDinosaur wrote:
May we be expecting a NUMA design featuring multiple '816s soon from Stratford, ON? :lol:
Actually, what's more likely is another coprocessor, but only remotely similar to the KK's, and applicable to the '816 as well as the 'C02. It'd offer some 6809-like address modes, and a partial hardware assist for floating-point calculations done in software.

If someone were to come up with floating point hardware that would be 65xx bus-compatible I'd go for it.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Thu Nov 07, 2013 8:42 pm

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
BigDumbDinosaur wrote:
There's more to it than just shuffling bytes around. A counter has to be maintained in RAM and the code driving the 'C02 has to also set up and manage a zero page pointer for addressing purposes. Also, the 'C02's stack is hard-wired to page one RAM. There's not enough of a gain to justify working out the issues.

So in the end, we're talking about a whole separate little computer that handles these things (which really sounds like a mainframe)?

Reading up on an older entry about parallel processing (viewtopic.php?f=1&t=217) makes me wonder about using some shared RAM instead -- the '816 has enough address space, after all. It could put the data in a shared area and then send the "I/O Processor" (say, a '02) some sort of message to the effect of "move those bytes to the hard drive and confirm it worked after you are done". It wouldn't really be faster in the sense of bits written per second, especially with the communication overhead, but the '816 could get on with more noble things in the meantime than dealing with the hard drive.
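
To make the message idea a bit more concrete, here is a minimal sketch of a mailbox in shared RAM. Everything in it is an assumption for illustration: the six-byte layout, the command and status codes, and the function names are hypothetical, and a `bytearray` stands in for both the shared memory window and the disk.

```python
import struct

# Hypothetical mailbox at the start of the shared RAM window, little-endian:
# command (1 byte), status (1 byte), buffer offset (2 bytes), length (2 bytes)
MAILBOX = struct.Struct("<BBHH")

CMD_NONE, CMD_WRITE_BLOCK = 0, 1
STATUS_IDLE, STATUS_BUSY, STATUS_DONE = 0, 1, 2


def host_post_write(shared, offset, payload):
    """Host ('816) side: place the data, then fill in the mailbox."""
    shared[offset:offset + len(payload)] = payload
    MAILBOX.pack_into(shared, 0, CMD_WRITE_BLOCK, STATUS_BUSY, offset, len(payload))


def io_processor_poll(shared, disk):
    """I/O processor side: act on a pending command and report completion."""
    cmd, status, offset, length = MAILBOX.unpack_from(shared, 0)
    if cmd == CMD_WRITE_BLOCK and status == STATUS_BUSY:
        disk.extend(shared[offset:offset + length])  # "move those bytes"
        MAILBOX.pack_into(shared, 0, CMD_NONE, STATUS_DONE, offset, length)


shared = bytearray(256)
disk = bytearray()
host_post_write(shared, 16, b"hello")
io_processor_poll(shared, disk)
cmd, status, _, _ = MAILBOX.unpack_from(shared, 0)
print(bytes(disk), status == STATUS_DONE)
```

The host is free to do other work between posting the command and checking the status byte, which is the benefit being described. Real hardware would also need some rule for who may touch the mailbox when.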

Of course, you'd basically be building two computers instead of one, so that's a lot of issues :D.


PostPosted: Thu Nov 07, 2013 8:55 pm

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
scotws wrote:
Reading up on an older entry about parallel processing (viewtopic.php?f=1&t=217) makes me wonder about using some shared RAM instead


I don't think there's a need for this. Any NUMA architecture will give the illusion of shared memory anyway. And since the 65816 and 6502 are both so memory bound, launching a 6502 to operate on memory shared with the 65816 will, unless you limit yourself to just two processors and have a memory subsystem slow enough to exploit opposite clock phases, actually take longer than if the 65816 just did all the work itself.

scotws wrote:
message to the effect of "move those bytes to the hard drive and confirm it worked after you are done". It wouldn't really be faster in the sense of bits written per second, especially with the communication overhead, but the '816 could get on with more noble things in the meantime than dealing with the hard drive.

Of course, you'd basically be building two computers instead of one, so that's a lot of issues :D.


This is how, in essence, the Commodore 80xx disk drives worked.

And when you think about it, Commodore's entire 8-bit line-up was really mainframe-y. The host computer would send a high-level instruction to the disk drive, which would interpret this command in a separate address space altogether and completely asynchronously to the host computer's operations. (It's rarely exploited, but it's entirely possible to do disk I/O and computation concurrently with the Commodore architecture.) And, in the case of the PET's disk drives, you really did have two CPUs cooperating with each other on shared data structures in memory. The command-processor CPU would give low-level instructions to the drive controller CPU and exchange data via shared buffers.

This is, schematically, surprisingly close to channel I/O in IBM mainframes. The computer sends high-level I/O commands to a "controller," which then breaks those commands down to lower-level I/O functions for individual "units." I don't think this was an accident -- studying the capabilities of Commodore's 8-bit DOS, what with relative files and a flat namespace with long filenames, it's very clearly influenced by System/360 design principles.


PostPosted: Sat Nov 09, 2013 8:54 am

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
I am not sure how much this DMA channel would gain in throughput. In my PET816 accelerator card I have the option to switch that DMA channel off - during valid accesses I run at the original 1MHz, but for an invalid access I speed up the CPU to 10MHz and squeeze the invalid cycle into Phi1. As long as the main machine accepts the address for the next valid access late enough this works (e.g. the PET 8296 does, the non-CRTC PET 3032 does not...)

http://www.6502.org/users/andre/adv65/pet816/index.html

On this gallery page I have a screenshot where I measured a loop with the "hiding of bogus cycles - hbogus" on and off. In this specific loop it "looks like" the machine is running at 1.7MHz in a 1MHz system.
http://www.6502.org/users/andre/adv65/p ... llery.html

For a "mainframe-like" system I would rather use Phi1/Phi2 interleave though.

Andre

_________________
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/

