Cycle-steal DMA controller design
For those who have been following my "First Steps" in the Newbies thread, or who have read my Introduction post: my interest in the 6502 is mainly in the 65816. I have done some simple SNES programming and have been looking at datasheets as a prerequisite to eventually building my own ham radio equipment.
More specifically, I'm interested in taking advantage of seemingly lesser-used features of the '816 and applying them to a new design- possibly the Emulation/Memory Lock lines, the ABORT signal... I feel as if the '816 is underutilized. The one underused feature that I want to tackle first for the '816 is that of Cycle-Steal DMA. I find the concept in and of itself very interesting (particularly taking full advantage of the bus bandwidth when the '816 is busy internally), and creating a DMA controller is something that I've wanted to do on an FPGA to signify that I fully understand the DMA process- not only for the '816, but for other architectures such as the IBM PC and 8237 controller. Reading some other threads on here, it appears I'm not alone in the interest.
At least as far as I can tell, there are no DMA controller ICs currently made suited for the '816 (although I think Zilog might make one still- have to check). I want to use this thread to throw out ideas and receive feedback for a cycle-steal DMA controller suited to the '816 that can be programmed using a CPLD or FPGA. I want to get started programming this thing before my simple development PCB with the '816 in "First Steps" is ready, so this thread will develop in tandem.
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Cycle-steal DMA controller design
It has been talked about here, and I'd like to see success in this area. Make sure you've read through:
The secret, hidden, transparent 6502 DMA channel
I couldn't answer your question in the "first steps" topic about the 8237 (and it looks like no one else did either), but it sure looks like it can't do the job on the 65xx bus which does so much in each cycle.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Re: Cycle-steal DMA controller design
cr1901 wrote:
For those who have been following my "First Steps" in the Newbies thread, as well as read my Introduction post, my interest in the 6502 is mainly in the 65816...I feel as if the '816 is underutilized. The one underused feature that I want to tackle first for the '816 is that of Cycle-Steal DMA...and creating a DMA controller is something that I've wanted to do on an FPGA to signify that I fully understand the DMA process- not only for the '816, but for other architectures such as the IBM PC and 8237 controller. Reading some other threads on here, it appears I'm not alone in the interest.
Quote:
At least as far as I can tell, there are no DMA controller ICs currently made suited for the '816...I want to use this thread to throw out ideas and receive feedback for a cycle-steal DMA controller suited to the '816 that can be programmed using a CPLD or FPGA. I want to get started programming this thing before my simple development PCB with the '816 in "First Steps" is ready, so this thread will develop in tandem.
The 8237 is an even poorer choice due to the greater difficulty of adapting it to a 65C816 bus cycle, as well as its apparent intolerance of 20 MHz operation. So it seems that if DMA is to come to the 65C816 it is going to have to be in the form of programmable logic.
It should be possible to realize a single channel DMAC in a CPLD with some adroit programming. However, I am doubtful of a successful cycle-steal design being possible. If you examine the detailed 65C816 operation tables in the data sheet (starting on page 38) you will see that many of the most commonly-used instructions do not have "dead" cycles. So while a cycle-steal design could be made to work, its real-time performance may be unsatisfactory (especially when I/O is involved) unless you are willing to have the DMAC become the bus master for the duration of the transfer process, once started during a dead cycle. If you are going to do that, you might as well simply make the DMAC the bus master at the time of setup and then let 'er rip until finished.
This latter scenario is not as impractical as it might seem. Assuming that the 65C816 is running at 20 MHz, is not being interrupted by anything and wait-stating is not required, the maximum rate at which it can copy data from memory to memory is 2,857,142 bytes per second by use of MVN or MVP (seven Ø2 cycles per byte). However, MVN/MVP aren't suitable for device I/O, as both source and destination addresses are incremented/decremented after each byte is copied. That obviously isn't going to fly with a chip register.
On the other hand, a DMAC executing a series of load/store operations should be able to copy a byte in two Ø2 cycles, assuming Ø2 low continues to be used for address setup and Ø2 high for read/write. Such an arrangement is good for 10,000,000 bytes per second, assuming interrupts are temporarily ignored and wait-stating isn't required. Going any faster with this method would require that the bus speed be upped during DMA activity, which I think would be problematic to implement.
Another possible scenario would be the implementation of double-transition clocking. For example, the DMAC could read the source during Ø2 low and write the destination during Ø2 high. A maximum speed of 20,000,000 bytes per second would be achievable. However, doing so would require that the bus logic be "rewired" during DMA activity and then reverted when the MPU regains control of the buses. Timing would be very tight. Tricky, tricky...
In reality, the theoretical 10 MB/sec achievable with the DMAC using alternate load/store cycles is way more than adequate in a typical application. An entire bank of RAM could be copied/filled in about 6.55 milliseconds, assuming Ø2 is 20 MHz (the same operation using MVN/MVP, at seven cycles per byte, would take about 23 milliseconds). However, such large amounts of data are seldom processed in most applications. More to the point, if the disk I/O subsystem is fetching a 1K block, DMA could deposit it into a buffer in about 102 microseconds. Assuming the system has a 100 Hz jiffy IRQ rate (as my POC unit does), the block transfer would occur in between successive IRQs, with plenty of room to spare. Sounds pretty practical to me.
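The figures in this post can be checked with a few lines of arithmetic. This is only a sketch: the constant and function names are made up for illustration, and the cycle counts (seven Ø2 cycles per byte for MVN/MVP, two for alternating load/store DMA, one for double-transition clocking) are taken from the discussion above rather than measured.

```python
# Back-of-the-envelope transfer rates at a 20 MHz Ø2 clock.
PHI2_HZ = 20_000_000      # Ø2 frequency assumed in the post
BANK_BYTES = 65_536       # one 64 KB bank
BLOCK_BYTES = 1_024       # one 1K disk block
JIFFY_HZ = 100            # jiffy IRQ rate mentioned in the post


def bytes_per_second(cycles_per_byte, clock_hz=PHI2_HZ):
    """Peak rate when every byte costs a fixed number of Ø2 cycles."""
    return clock_hz / cycles_per_byte


def transfer_time_s(n_bytes, cycles_per_byte, clock_hz=PHI2_HZ):
    """Time to move n_bytes with no wait states and no interrupts."""
    return n_bytes * cycles_per_byte / clock_hz


mvn_rate = bytes_per_second(7)     # MVN/MVP block move
dmac_rate = bytes_per_second(2)    # DMAC, read cycle then write cycle
ddr_rate = bytes_per_second(1)     # double-transition clocking

bank_dmac_ms = transfer_time_s(BANK_BYTES, 2) * 1e3
bank_mvn_ms = transfer_time_s(BANK_BYTES, 7) * 1e3
block_dmac_us = transfer_time_s(BLOCK_BYTES, 2) * 1e6
jiffy_period_us = 1e6 / JIFFY_HZ

if __name__ == "__main__":
    print(f"MVN/MVP : {mvn_rate:,.0f} bytes/s")
    print(f"DMAC    : {dmac_rate:,.0f} bytes/s")
    print(f"2x clock: {ddr_rate:,.0f} bytes/s")
    print(f"64K bank: {bank_dmac_ms:.2f} ms (DMAC), {bank_mvn_ms:.2f} ms (MVN/MVP)")
    print(f"1K block: {block_dmac_us:.1f} us of a {jiffy_period_us:.0f} us jiffy")
```

The last line is the practical point: a 1K block at two cycles per byte occupies roughly one percent of a jiffy period.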
As I said, I've only given this superficial attention...
x86? We ain't got no x86. We don't NEED no stinking x86!
- GARTHWILSON
Re: Cycle-steal DMA controller design
It would be great if someone would come up with a 65-family DMAC in CPLD to sell like Daryl sells his 65SPI IC. There wouldn't be enough sales for it to be a money-maker for anyone; but even for those of us in the industry, this forum is kind of a fun sideline educational support anyway, not directly tied to our living.
I don't think having a DMAC that only uses dead bus cycles would preclude also using MVN/MVP-type methods, would it? Couldn't both be done?
How about giving the register a range of addresses, so it doesn't "see" the incrementing of the address, since the lower bits are not connected in its address decoding. Then you could do MVP/MVN for, say 256 or 1024 bytes at a time.
Although I have no DMA experience at all, I can envision situations where the data are copied to or from something that's not on the bus, such that a complete transfer can be done in every cycle. This would be different from the situation of moving memory from one address to another where both are on the processor's bus.
Quote:
If you examine the detailed 65C816 operation tables in the data sheet (starting on page 38) you will see that many of the most commonly-used instructions do not have "dead" cycles. So while a cycle-steal design could be made to work, its real-time performance may be unsatisfactory (especially when I/O is involved) unless you are willing to have the DMAC become the bus master for the duration of the transfer process, once started during a dead cycle.
Quote:
However, MVN/MVP aren't suitable for device I/O, as both source and destination addresses are incremented/decremented after each byte is copied. That obviously isn't going fly with a chip register.
Quote:
Such an arrangement is good for 10,000,000 bytes per second
Re: Cycle-steal DMA controller design
Somehow one of my posts got deleted from this thread- it included a list of bullet points for consideration...
Let's see if I can recreate it:
- The controller is cycle-steal, there should be enough instructions with dead cycles... and it's the easiest to support, since the '816 has the control signals for it built in.
- At the moment, I can only design it to work with the '816, since it has VDA/VPA outputs.
- Getting source value on the bus after VDA/VPA is valid (and "bus is free"), but before rising edge of PHI2 should be possible on FPGA.
- The controller should have control signals akin to other 65xx peripheral chips such as VIA and ACIA.
- Should support Block-to-Block, Block-to-Single (IO port), and Single (IO port)-to-Block.
- IRQ output should connect to processor signifying "End of transfer"
- Standard DMA can be emulated (more or less) using WAI loop which checks that DMA is done.
I/O arbitration could work by including an 8237 DACK and DREQ equivalent, as well as VPA/VDA on I/O devices, combined with the appropriate logic gates... if DACK is de-asserted when VPA/VDA indicate "bus is free", the transfer is done, send IRQ to CPU to know data has come in.
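The bullet points can be turned into a small behavioral model, sketched here in Python rather than an HDL. Several things are assumptions made only for illustration: the class and field names, the idea that a byte takes two stolen cycles (a read into a holding latch, then a write), and the choice to drop DREQ and raise IRQ when the count reaches zero.

```python
from dataclasses import dataclass


def bus_free(vda, vpa):
    """The '816 is busy internally when both VDA and VPA are low."""
    return vda == 0 and vpa == 0


@dataclass
class CycleStealModel:
    remaining: int           # bytes still to transfer
    dreq: bool = False       # request from the I/O device
    dack: bool = False       # acknowledge, asserted only on stolen cycles
    irq: bool = False        # "end of transfer" interrupt to the CPU
    holding: bool = False    # a byte has been read but not yet written

    def clock(self, vda, vpa):
        """Advance one Ø2 cycle and report what the controller did."""
        self.dack = False
        if not (self.dreq and self.remaining > 0 and bus_free(vda, vpa)):
            return "idle"
        self.dack = True
        if not self.holding:
            self.holding = True      # stolen cycle 1: read the source
            return "read"
        self.holding = False         # stolen cycle 2: write the destination
        self.remaining -= 1
        if self.remaining == 0:
            self.dreq = False
            self.irq = True          # signal end of transfer
        return "write"


if __name__ == "__main__":
    # Invented (VDA, VPA) trace: 1s are CPU bus cycles, (0, 0) is a dead cycle.
    trace = [(1, 1), (0, 0), (1, 1), (0, 0), (1, 0), (0, 0), (0, 0)]
    dma = CycleStealModel(remaining=2, dreq=True)
    print([dma.clock(vda, vpa) for vda, vpa in trace], "irq =", dma.irq)
```

In this model the CPU never loses a cycle; the transfer simply takes as long as the dead cycles take to turn up.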
- GARTHWILSON
Re: Cycle-steal DMA controller design
cr1901 wrote:
Somehow one of my posts got deleted from this thread- it included a list of bullet points for consideration...
Quote:
Let's see if I can recreate it:
- [...]
g. Standard DMA can be emulated (more or less) using WAI loop which checks that DMA is done.
I noticed the discrepancy in terminology too, between "cycle steal" and "transparent." I wonder if it's one of those terms that has migrated a bit over the years.
Re: Cycle-steal DMA controller design
Given that the '816 still drives the address bus during the cycles that you want to steal, thus requiring conditional isolation of the rest of the system anyway, this looks very much like a modern "north bridge" design. In this case, if you have a mode which suspends the CPU in order to do a block transfer, why not just use wait-states and suspend DMA operation (releasing the wait-state) when an IRQ or NMI occurs?
Re: Cycle-steal DMA controller design
GARTHWILSON wrote:
cr1901 wrote:
Somehow one of my posts got deleted from this thread- it included a list of bullet points for consideration...
GARTHWILSON wrote:
Quote:
Let's see if I can recreate it:
- [...]
g. Standard DMA can be emulated (more or less) using WAI loop which checks that DMA is done.
Last edited by cr1901 on Thu Mar 13, 2014 1:39 am, edited 1 time in total.
Re: Cycle-steal DMA controller design
Quote:
Given that the '816 still drives the address bus during the cycles that you want to steal, thus requiring conditional isolation of the rest of the system anyway, this looks very much like a modern "north bridge" design. In this case, if you have a mode which suspends the CPU in order to do a block transfer, why not just use wait-states and suspend DMA operation (releasing the wait-state) when an IRQ or NMI occurs?
As for using wait-states... that very well might work as well, but what I was trying to get at was that the cycle-steal mode can also emulate block-transfer mode if a software WAI is used (see my response to Garth), simplifying the DMA controller logic. I did a poor job explaining that, though.
- BigDumbDinosaur
Re: Cycle-steal DMA controller design
cr1901 wrote:
The controller is cycle-steal, there should be enough instructions with dead cycles... and it's the easiest to support, since the '816 has the control signals for it built in.
Quote:
At the moment, I can only design it to work with the '816, since it has VDA/VPA outputs.
Quote:
Getting source value on the bus after VDA/VPA is valid (and "bus is free"), but before rising edge of PHI2 should be possible on FPGA.
Quote:
The controller should have control signals akin to other 65xx peripheral chips such as VIA and ACIA.
Incidentally, using DMA on the 65xx peripheral silicon is of little value. None of these devices is fast enough to warrant the extra "cost" of DMAing their input and output. Also, these chips lack the requisite handshake signals that are needed to implement DMA. How do you intend to implement a /DACK and DREQ setup with, for example, the 65C22?
Quote:
Should support Block-to-Block, Block-to-Single (IO port), and Single (IO port)-to-Block.
Quote:
Standard DMA can be emulated (more or less) using WAI loop which checks that DMA is done.
Quote:
I/O arbitration could work by including an 8237 DACK and DREQ equivalent, as well as VPA/VDA on I/O devices, combined with the appropriate logic gates... if DACK is de-asserted when VPA/VDA indicate "bus is free", the transfer is done, send IRQ to CPU to know data has come in.
Re: Cycle-steal DMA controller design
Cannot respond to all points right now, but...
I wonder why the datasheet makes a point of talking about using VDA/VPA for transparent DMA then? That's what interests me the most actually.
That being said, one could write code in a manner to make the best use of dead cycles. Even if one didn't write code like that, the performance should still be more efficient than using '816 block instructions if only because in cycle-steal transparent DMA, a transfer and internal processing are occurring in parallel. The cycle-steal mode may not be appropriate for "hard real time" transfers, but I highly doubt it will cripple I/O transfers. Certainly it will reduce the bottleneck of the CPU waiting for I/O to finish.
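To put a rough number on "enough dead cycles", here is a small estimate in Python. It assumes two stolen cycles per byte (one read, one write) and uses the seven-cycles-per-byte MVN/MVP figure quoted earlier in the thread; the function names are invented for the example. The comparison is not quite apples to apples, since the CPU keeps doing useful work during a cycle-steal transfer and does not during a block move.

```python
from fractions import Fraction

MVN_CYCLES_PER_BYTE = 7      # figure quoted earlier in the thread
STEAL_CYCLES_PER_BYTE = 2    # assumption: one stolen read plus one stolen write


def cycle_steal_rate(clock_hz, dead_fraction):
    """Bytes per second when only `dead_fraction` of cycles can be stolen."""
    return clock_hz * dead_fraction / STEAL_CYCLES_PER_BYTE


def break_even_dead_fraction():
    """Dead-cycle share at which cycle steal matches MVN/MVP throughput."""
    return Fraction(STEAL_CYCLES_PER_BYTE, MVN_CYCLES_PER_BYTE)


if __name__ == "__main__":
    for share in (Fraction(1, 10), Fraction(2, 7), Fraction(1, 2)):
        rate = cycle_steal_rate(20_000_000, share)
        print(f"dead cycles {float(share):5.1%}: {float(rate):>12,.0f} bytes/s")
    print("break-even share:", break_even_dead_fraction())
```

Under these assumptions, code that leaves a bit under 30% of its cycles dead would move data as fast as MVN/MVP while still executing instructions.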
Code is worth 1000 words...
The code below should, of course, be using atomic instructions at "lda dma_done", but this is kept simple.
Quote:
However, many of the most-often used instructions do not have a dead cycle. I don't think you will get satisfactory performance using this method. Just an opinion.
Quote:
So if your WAIt is broken by an interrupt unrelated to DMAC activity, how would the foreground task know that it is to just go back to WAIting?
Code:
;...Assume IRQ vector points here (8-bit accumulator assumed)
irq_vec:
        pha                     ;Preserve A for the interrupted code
        lda dma_status          ;Load status register
        cmp #DONE_STATUS        ;IRQ came from DMA if DONE
        bne dma_not_done        ;Not done, so leave the flag alone
        lda #1
        sta dma_done            ;DMA is done flag
dma_not_done:
        ;Process other IRQ sources
        pla
        rti

;... Other code, somewhere else
        ;Request data from SCSI device or something...
        sta start_xfer          ;start xfer I/O port for sample device... will trigger DMA
        jsl DMA_in_progress     ;Returns once the DMA done flag has been seen
        ;Do something with data stored in buffer
        ;etc etc

;... Subroutine, kept out of the fall-through path of the caller
DMA_in_progress:
        wai                     ;Wait until interrupt (any source)
        lda dma_done            ;Nonzero once the ISR has seen DONE_STATUS
        beq DMA_in_progress     ;Woken by an unrelated IRQ, so go back to waiting
        sei
        stz dma_done            ;Make sure dma_done is reset- critical region
        cli
        rtl
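The idea behind the wait loop (sleep on WAI, wake on any interrupt, go back to sleep unless the DMA flag is set) can be modeled in a few lines of Python. The source names such as "jiffy" and "dma" are placeholders, and each list entry stands for one interrupt that wakes the processor.

```python
def wait_for_dma(irq_sources):
    """Return how many times WAI woke up before the DMA flag was seen.

    `irq_sources` lists the interrupt sources in arrival order. Only an
    entry equal to "dma" makes the handler set the done flag.
    """
    dma_done = False
    wakeups = 0
    for source in irq_sources:      # every interrupt ends one WAI
        wakeups += 1
        if source == "dma":         # handler saw the DONE status
            dma_done = True
        if dma_done:                # foreground check after waking
            return wakeups
        # unrelated interrupt: loop around and wait again
    raise RuntimeError("ran out of interrupts before the DMA finished")


if __name__ == "__main__":
    print(wait_for_dma(["jiffy", "jiffy", "dma"]))
```

This is the answer to the quoted question: an unrelated interrupt costs one extra trip around the loop, nothing more.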
- GARTHWILSON
Re: Cycle-steal DMA controller design
Even if transparent DMA is not the hottest-performing, one attraction I see is that it does not upset the timing of other operations you may have going on in a realtime system with hard deadlines.
Re: Cycle-steal DMA controller design
Quote:
Anything that can respond in 15ns or less will work.
Re: Cycle-steal DMA controller design
Preliminary Pinouts...
44-PLCC CPLD or FPGA @3.3Volts*
Any pin which can serve as an input requires a level shifter. Outputs do not require a level shifter (as 3.3V is valid TTL).
VSS- Self explanatory.
VDD- Self explanatory.
A0-A23- Address Line Pins. Outputs except A0-A3. A0-A3 are inputs when VDA != 0 || VPA != 0, and access internal registers. Outputs when VDA == 0 && VPA == 0. A0-A3 require level shifter.
D0-D7- Bidirectional Data Lines. Inputs when VDA != 0 || VPA != 0, bidirectional when VDA == 0 && VPA == 0. Requires level shifter.**
PHI2- Clock. Input. Requires level shifter.
VDA- Gets current bus activity from processor. Input. Requires level shifter.
VPA/SYNC- Gets current bus activity from processor. VPA is for '816, SYNC is for '02. Input. Requires level shifter.
IRQB- Output. Send IRQ to processor to signify end of XFER
DREQ0-DREQ1- DMA Request from I/O. DREQ0 takes priority. Inputs. Requires level shifter.
DACK0-DACK1- DMA Acknowledge to I/O. Outputs.
RWB/CE- Read/Write or Chip Enable (Combined to one pin). Chip Enable input when VDA != 0 || VPA != 0, Read/Write outputs when VDA == 0 && VPA == 0. Requires level shifter.
That accounts for 43 of the 44 pins... I can combine some more pins if needed though.
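The direction rules for the shared pins boil down to one condition on VDA and VPA. A small Python sketch of that rule follows; the function name and the role strings are illustrative only, and pins whose behavior the list does not describe are left out.

```python
def pin_roles(vda, vpa):
    """Role of each shared pin for the given VDA/VPA levels.

    The CPU owns the bus when VDA != 0 or VPA != 0; the controller may
    drive it only when both are 0.
    """
    cpu_owns_bus = vda != 0 or vpa != 0
    if cpu_owns_bus:
        return {
            "A0-A3": "input",                 # register select
            "D0-D7": "input",
            "RWB/CE": "chip-enable input",
        }
    return {
        "A0-A3": "output",                    # controller drives the address
        "D0-D7": "bidirectional",
        "RWB/CE": "read/write output",
    }


if __name__ == "__main__":
    for vda, vpa in ((0, 0), (0, 1), (1, 0), (1, 1)):
        print(f"VDA={vda} VPA={vpa}:", pin_roles(vda, vpa))
```

Only the (0, 0) row puts the controller in charge, which is the whole basis of the transparent mode.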
Registers (Unlike other designs, all registers can be read):
Status Register (Pending Xfer? Current Xfer? Done? Bus Error***? 8-bits)
Count Register Total*2 (how many bytes to transfer for each channel? 16-bits each)
Count Register Current*2 (how many bytes are left to transfer for each channel? 16-bits each)
Command Register (Start Block-to-Block Xfer? Stop current xfer?)
Bank Register SRC/DEST (8-bits)
Address Start SRC/DEST (16-bits)
Mode Register (Transparent-816, Standard-816, Cycle-steal-02, Standard-02)
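As a way to sanity-check the register list, here is a Python sketch of the register file. Nothing here is a final layout: the field names, the numeric mode encodings and the helper methods are all made up for illustration. It only shows the stated widths and how an 8-bit bank and a 16-bit start address combine into a 24-bit bus address.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    # Numeric encodings are placeholders, not a proposed bit assignment.
    TRANSPARENT_816 = 0
    STANDARD_816 = 1
    CYCLE_STEAL_02 = 2
    STANDARD_02 = 3


@dataclass
class DmaRegisters:
    status: int = 0                       # 8 bits: pending, current, done, bus error
    command: int = 0                      # start block-to-block, stop current xfer
    mode: Mode = Mode.TRANSPARENT_816
    count_total: list = field(default_factory=lambda: [0, 0])     # 16 bits per channel
    count_current: list = field(default_factory=lambda: [0, 0])   # 16 bits per channel
    bank_src: int = 0                     # 8 bits
    bank_dest: int = 0                    # 8 bits
    addr_src: int = 0                     # 16 bits
    addr_dest: int = 0                    # 16 bits

    def load_count(self, channel, n_bytes):
        """Program a channel's byte count, truncated to 16 bits."""
        value = n_bytes & 0xFFFF
        self.count_total[channel] = value
        self.count_current[channel] = value

    def source_address(self):
        """24-bit source address: bank in bits 23-16, start address below."""
        return ((self.bank_src & 0xFF) << 16) | (self.addr_src & 0xFFFF)

    def dest_address(self):
        """24-bit destination address, built the same way."""
        return ((self.bank_dest & 0xFF) << 16) | (self.addr_dest & 0xFFFF)


if __name__ == "__main__":
    regs = DmaRegisters(bank_src=0x01, addr_src=0x2000, bank_dest=0x7E, addr_dest=0x0400)
    regs.load_count(0, 1024)
    print(f"{regs.source_address():06X} -> {regs.dest_address():06X}, {regs.count_current[0]} bytes")
```

Keeping a separate "current" count per channel is what makes the registers readable mid-transfer, as the list intends.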
More to follow... this will be built incrementally... so this is a good starting point.
*For forward compatibility, since 5Volt parts aren't going to be around for much longer :/. A big fat level shifter should work nicely though. Now to find a good one...
**Some DMA controllers, such as the 8237, can do single-to-block transfers without an intermediate buffer... I'm not 100% sure how this works, but I'll read some data sheets to find out.
***Not sure how to detect this, but... the 68k has a bus error vector, so there MUST be a digital way to detect an inconsistent state.
Any feedback on what I've suggested so far is appreciated. Standard DMA can be emulated from this transparent design using a software loop. Or perhaps I'll put it in as an alternate mode of xfer for SCSI devices.
I'm considering just buying a soft core from WDC for the time being for my dev board to get working on this... I wonder if they will sell to hobbyist individuals...
Last edited by cr1901 on Fri Mar 14, 2014 4:53 am, edited 2 times in total.
- GARTHWILSON
Re: Cycle-steal DMA controller design
Quote:
44-PLCC CPLD or FPGA @3.3Volts*
Any pin which can serve as an input requires a level shifter. Outputs do not require a level shifter (as 3.3V is valid TTL).