6502.org Forum

All times are UTC




[ 49 posts ]  Page 1 of 4
Posted: Wed Mar 12, 2014 11:36 am

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
For those who have been following my "First Steps" thread in the Newbies forum, or who have read my Introduction post: my interest in the 6502 family lies mainly in the 65816. I have done some simple SNES programming and have been studying datasheets as a prerequisite to eventually building my own ham radio equipment.

More specifically, I'm interested in taking advantage of seemingly lesser-used features of the '816 and applying them to a new design: possibly the Emulation/Memory Lock lines, the ABORT signal... I feel as if the '816 is underutilized. The first underused feature I want to tackle is cycle-steal DMA. I find the concept very interesting in and of itself (particularly taking full advantage of the bus bandwidth while the '816 is busy internally), and building a DMA controller on an FPGA is something I've wanted to do to show that I fully understand the DMA process, not only for the '816 but for other architectures such as the IBM PC and its 8237 controller. Reading some other threads here, it appears I'm not alone in this interest.

As far as I can tell, no DMA controller ICs currently in production are suited to the '816 (although I think Zilog might still make one; I have to check). I want to use this thread to throw out ideas and get feedback on a cycle-steal DMA controller for the '816 that can be implemented in a CPLD or FPGA. I want to start programming it before the simple '816 development PCB from "First Steps" is ready, so this thread will develop in tandem.


Posted: Wed Mar 12, 2014 1:33 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
It has been talked about here, and I'd like to see success in this area. Make sure you've read through:
The secret, hidden, transparent 6502 DMA channel
I couldn't answer your question in the "first steps" topic about the 8237 (and it looks like no one else did either), but it sure looks like it can't do the job on the 65xx bus which does so much in each cycle.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Wed Mar 12, 2014 8:38 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8507
Location: Midwestern USA
cr1901 wrote:
For those who have been following my "First Steps" in the Newbies thread, as well as read my Introduction post, my interest in the 6502 is mainly in the 65816...I feel as if the '816 is underutilized. The one underused feature that I want to tackle first for the '816 is that of Cycle-Steal DMA...and creating a DMA controller is something that I've wanted to do on an FPGA to signify that I fully understand the DMA process- not only for the '816, but for other architectures such as the IBM PC and 8237 controller. Reading some other threads on here, it appears I'm not alone in the interest.

I've been mulling DMA for the '816 on and off for a while, mostly in the context of trying to up the raw SCSI transfer rate in POC V1.1. However, DMA is fairly low right now on the "bucket list."

Quote:
At least as far as I can tell, there are no DMA controller ICs currently made suited for the '816...I want to use this thread to throw out ideas and receive feedback for a cycle-steal DMA controller suited to the '816 that can be programmed using a CPLD or FPGA. I want to get started programming this thing before my simple development PCB with the '816 in "First Steps" is ready, so this thread will develop in tandem.

I too have looked at commercially available DMA controllers (DMAC) but have concluded that nothing that is readily available is adaptable to the '816 buses without a lot of hoop-jumping. The Motorola 68440 is probably the closest match and could be made to work but would require a fair amount of glue logic to generate the MC68000 type signals that are needed. Also, it doesn't appear that an MC68440 exists that can run at the 65C816's maximum theoretical Ø2 rate of 20 MHz.

The 8237 is an even poorer choice due to the greater difficulty of adapting it to a 65C816 bus cycle, as well as its apparent intolerance of 20 MHz operation. So it seems that if DMA is to come to the 65C816 it is going to have to be in the form of programmable logic.

It should be possible to realize a single channel DMAC in a CPLD with some adroit programming. However, I am doubtful of a successful cycle-steal design being possible. If you examine the detailed 65C816 operation tables in the data sheet (starting on page 38) you will see that many of the most commonly-used instructions do not have "dead" cycles. So while a cycle-steal design could be made to work, its real-time performance may be unsatisfactory (especially when I/O is involved) unless you are willing to have the DMAC become the bus master for the duration of the transfer process, once started during a dead cycle. If you are going to do that, you might as well simply make the DMAC the bus master at the time of setup and then let 'er rip until finished.

This latter scenario is not as impractical as it might seem. Assuming that the 65C816 is running at 20 MHz, is not being interrupted by anything and wait-stating is not required, the maximum rate at which it can copy data from memory to memory is 2,857,142 bytes per second by use of MVN or MVP, at seven Ø2 cycles per byte. However, MVN/MVP aren't suitable for device I/O, as both source and destination addresses are incremented/decremented after each byte is copied. That obviously isn't going to fly with a chip register.

On the other hand, a DMAC executing a series of load/store operations should be able to copy a byte in two Ø2 cycles, assuming Ø2 low continues to be used for address setup and Ø2 high for read/write. Such an arrangement is good for 10,000,000 bytes per second, assuming interrupts are temporarily ignored and wait-stating isn't required. Going any faster with this method would require that the bus speed be upped during DMA activity, which I think would be problematic to implement.

Another possible scenario would be the implementation of double-transition clocking. For example, the DMAC could read the source during Ø2 low and write the destination during Ø2 high. A maximum speed of 20,000,000 bytes per second would be achievable. However, doing so would require that the bus logic be "rewired" during DMA activity and then reverted when the MPU regains control of the buses. Timing would be very tight. Tricky, tricky...
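The three rates quoted above follow directly from the number of Ø2 cycles spent per byte. A minimal sketch, assuming a 20 MHz Ø2, no wait-states and no interrupts; the cycle costs (7 for MVN/MVP, 2 for alternate load/store, 1 for double-transition clocking) are the ones given in the post, and the names are illustrative only:

```python
PHI2_HZ = 20_000_000  # assumed: 20 MHz PHI2, no wait-states, no interrupts

def bytes_per_second(phi2_hz: int, cycles_per_byte: int) -> int:
    """Peak copy rate when every byte costs a fixed number of PHI2 cycles."""
    return phi2_hz // cycles_per_byte

# Cycle cost per byte for each method, as described in the post
METHODS = {
    "MVN/MVP": 7,
    "DMAC, alternate load/store": 2,
    "DMAC, double-transition clocking": 1,
}

for name, cycles in METHODS.items():
    print(f"{name:34} {bytes_per_second(PHI2_HZ, cycles):>12,} bytes/s")
```

The integer division reproduces the 2,857,142 bytes/s figure for MVN/MVP exactly, since 20,000,000 is not a multiple of 7.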

In reality, the theoretical 10 MB/sec achievable with the DMAC using alternate load/store cycles is way more than adequate in a typical application. An entire bank of RAM could be copied/filled in about 6.55 milliseconds, assuming Ø2 is 20 MHz (the same operation using MVN/MVP would take about 23 milliseconds). However, such large amounts of data are seldom processed in most applications. More to the point, if the disk I/O subsystem is fetching a 1K block, DMA could deposit it into a buffer in about 102 microseconds. Assuming the system has a 100 Hz jiffy IRQ rate (as my POC unit does), the block transfer would occur in between successive IRQs, with plenty of room to spare. Sounds pretty practical to me.
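The timing figures in this paragraph can be checked with a few lines of arithmetic. A sketch under the same assumptions (20 MHz Ø2, no wait-states, 2 cycles per byte for the DMAC, 7 for MVN/MVP); by this arithmetic a full 64K bank via MVN/MVP comes to 65,536 x 7 / 20 MHz, roughly 23 ms. The constant and function names are illustrative:

```python
PHI2_HZ = 20_000_000  # assumed: 20 MHz PHI2, no wait-states
JIFFY_HZ = 100        # 100 Hz jiffy IRQ, as on the POC unit
BANK = 64 * 1024      # one full 65C816 bank
BLOCK = 1024          # a 1K disk block

def transfer_seconds(n_bytes: int, cycles_per_byte: int, phi2_hz: int = PHI2_HZ) -> float:
    """Time to move n_bytes when each byte costs a fixed number of PHI2 cycles."""
    return n_bytes * cycles_per_byte / phi2_hz

print(f"bank via DMAC (2 cycles/byte): {transfer_seconds(BANK, 2) * 1e3:.2f} ms")
print(f"bank via MVN  (7 cycles/byte): {transfer_seconds(BANK, 7) * 1e3:.2f} ms")
print(f"1K block via DMAC:             {transfer_seconds(BLOCK, 2) * 1e6:.1f} us")
print(f"share of one jiffy:            {transfer_seconds(BLOCK, 2) * JIFFY_HZ:.2%}")
```

A 1K block at DMAC speed uses only about 1% of a 10 ms jiffy period, which is the "plenty of room to spare" above.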

As I said, I've only given this superficial attention...

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Wed Mar 12, 2014 9:04 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
It would be great if someone would come up with a 65-family DMAC in a CPLD to sell, the way Daryl sells his 65SPI IC. There wouldn't be enough sales for it to be a money-maker for anyone; but even for those of us in the industry, this forum is kind of a fun sideline of educational support anyway, not directly tied to our living.

Quote:
If you examine the detailed 65C816 operation tables in the data sheet (starting on page 38) you will see that many of the most commonly-used instructions do not have "dead" cycles. So while a cycle-steal design could be made to work, its real-time performance may be unsatisfactory (especially when I/O is involved) unless you are willing to have the DMAC become the bus master for the duration of the transfer process, once started during a dead cycle.

I don't think having a DMAC that only uses dead bus cycles would preclude also using MVN/MVP-type methods, would it? Couldn't both be done?

Quote:
However, MVN/MVP aren't suitable for device I/O, as both source and destination addresses are incremented/decremented after each byte is copied. That obviously isn't going fly with a chip register.

How about giving the register a range of addresses, so that it doesn't "see" the address incrementing, because the lower address bits are not included in its decoding? Then you could use MVN/MVP for, say, 256 or 1024 bytes at a time.
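The mirrored-register idea can be modelled in a few lines: a decoder that ignores A0-A7 makes one data register appear at all 256 addresses of its window, so an MVN-style copy whose source address keeps incrementing still reads the same register every time. This is a sketch only; the base address $DF00, the FIFO-style device and all names are hypothetical, and the copy loop models just MVN's visible behaviour (both addresses step up after each byte), not its cycle timing or bank registers:

```python
REG_BASE = 0x00DF00   # hypothetical base of the device's 256-byte window
MIRROR_MASK = 0xFF    # A0-A7 are left out of the device's address decoding

def selects_data_register(addr: int) -> bool:
    """Decoder that ignores A0-A7: every address in the window hits the register."""
    return (addr & ~MIRROR_MASK) == REG_BASE

class FifoDevice:
    """Hypothetical input device whose data register pops a FIFO on each read."""

    def __init__(self, data):
        self.fifo = list(data)

    def read(self, addr: int) -> int:
        assert selects_data_register(addr), f"${addr:06X} is outside the window"
        return self.fifo.pop(0)

def mvn_like_copy(read, ram: bytearray, src: int, dst: int, count: int) -> None:
    """Visible behaviour of MVN only: both addresses step up after each byte."""
    for _ in range(count):
        ram[dst] = read(src)
        src += 1
        dst += 1

dev = FifoDevice(range(256))
ram = bytearray(0x2000)
mvn_like_copy(dev.read, ram, src=REG_BASE, dst=0x1000, count=256)
print(list(ram[0x1000:0x1004]), len(dev.fifo))
```

The copy length is capped by the window size: a 257th byte would step the source address out of the mirrored range.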

Quote:
Such an arrangement is good for 10,000,000 bytes per second

Although I have no DMA experience at all, I can envision situations where the data are copied to or from something that's not on the bus, such that a complete transfer can be done in every cycle. This would be different from the situation of moving memory from one address to another where both are on the processor's bus.



Posted: Wed Mar 12, 2014 10:50 pm

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
Somehow one of my posts got deleted from this thread; it included a list of bullet points for consideration...

Let's see if I can recreate it:
  a. The controller is cycle-steal. There should be enough instructions with dead cycles, and it's the easiest mode to support, since the '816 has the control signals for it built in.
  b. At the moment, I can only design it to work with the '816, since it has VDA/VPA outputs.
  c. Getting the source value onto the bus after VDA/VPA are valid (and indicate "bus is free"), but before the rising edge of PHI2, should be possible on an FPGA.
  d. The controller should have control signals akin to other 65xx peripheral chips such as the VIA and ACIA.
  e. It should support Block-to-Block, Block-to-Single (I/O port), and Single (I/O port)-to-Block transfers.
  f. An IRQ output should connect to the processor, signifying "end of transfer".
  g. Standard DMA can be emulated (more or less) using a WAI loop that checks whether the DMA is done.
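The three transfer types listed above differ only in which address steps after each byte: the I/O-port end stays fixed and the memory end increments. A minimal sketch, assuming 24-bit addresses that wrap and one byte per transfer; the Mode names and address_pairs are illustrative, not part of any existing design:

```python
from enum import Enum

class Mode(Enum):
    BLOCK_TO_BLOCK = 1    # memory block -> memory block
    BLOCK_TO_SINGLE = 2   # memory block -> fixed I/O port
    SINGLE_TO_BLOCK = 3   # fixed I/O port -> memory block

ADDR_MASK = 0xFFFFFF  # assumed: 24-bit addresses wrap around

def address_pairs(mode: Mode, src: int, dst: int, count: int):
    """Yield the (source, destination) address for each byte moved.

    The I/O-port end of a transfer stays fixed; the memory end increments.
    """
    src_step = 0 if mode is Mode.SINGLE_TO_BLOCK else 1
    dst_step = 0 if mode is Mode.BLOCK_TO_SINGLE else 1
    for i in range(count):
        yield ((src + i * src_step) & ADDR_MASK, (dst + i * dst_step) & ADDR_MASK)

for s, d in address_pairs(Mode.SINGLE_TO_BLOCK, 0x00DF00, 0x011000, 3):
    print(f"${s:06X} -> ${d:06X}")
```

In hardware this amounts to two increment-enable bits per channel, one for each address counter.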

First off, it appears that either the datasheet or the Wikipedia page may be using the wrong terminology for "cycle steal". According to the Wikipedia page on DMA, cycle stealing implies that the DMA controller forcibly takes the bus, as in one of the 8237's modes, which keeps reasserting HRQ for every byte (the CPU won't acknowledge HRQ until the next bus cycle, so it still gets work done between bytes). I'm guessing that what the '816 literature calls "cycle steal" is what the Wikipedia page calls "transparent" DMA: there is no bus arbitration, and the DMA controller simply uses the bus whenever it is told that the CPU isn't using it.

I/O arbitration could work by including 8237-style DACK and DREQ equivalents, plus VDA/VPA on the I/O devices, combined with the appropriate logic gates. If DACK is deasserted when VDA/VPA indicate "bus is free", the transfer is done, and an IRQ is sent to the CPU to signal that the data has come in.
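The transparent scheme described above, in which the DMAC only uses cycles where VDA and VPA are both low, can be sketched cycle by cycle. This is a behavioural model under stated assumptions, not HDL: signals are logical levels with 1 = asserted (real DACK and IRQB pins would likely be active-low), DREQ is simply held asserted, and the class and field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class BusCycle:
    vda: int  # 1 = CPU drives a valid data address this cycle
    vpa: int  # 1 = CPU drives a valid program address this cycle

@dataclass
class TransparentDmac:
    count: int     # bytes still to transfer
    dack: int = 0  # logical level, 1 = asserted
    irq: int = 0   # logical level, 1 = "end of transfer" interrupt requested
    moved: int = 0

    def clock(self, cycle: BusCycle, dreq: int) -> None:
        """Advance one PHI2 cycle, using the bus only if the CPU isn't."""
        bus_free = cycle.vda == 0 and cycle.vpa == 0
        self.dack = 1 if (bus_free and dreq and self.count > 0) else 0
        if self.dack:
            self.moved += 1
            self.count -= 1
            if self.count == 0:
                self.irq = 1

# opcode fetch, internal op, data read, internal op, operand fetch, internal op
cycles = [BusCycle(1, 1), BusCycle(0, 0), BusCycle(1, 0),
          BusCycle(0, 0), BusCycle(0, 1), BusCycle(0, 0)]
dmac = TransparentDmac(count=2)
trace = []
for c in cycles:
    dmac.clock(c, dreq=1)
    trace.append(dmac.dack)
print(trace, "IRQ" if dmac.irq else "busy")
```

Once the count reaches zero, DACK stays deasserted even in free cycles and the IRQ remains pending, matching the "transfer is done, send IRQ" condition.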


Posted: Thu Mar 13, 2014 12:10 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
cr1901 wrote:
Somehow one of my posts got deleted from this thread- it included a list of bullet points for consideration...

Are you sure you finished posting it? I never saw it, and I stay on top of things here pretty well. I have, in the past, clicked "Preview" and then moved on, forgetting to click "Submit."

Quote:
Let's see if I can recreate it:
    [...]
    g. Standard DMA can be emulated (more or less) using WAI loop which checks that DMA is done.

I'm not sure what you mean here, unless you just mean that letter f's IRQ gets the processor going again. The WAIt instruction puts the processor to sleep, waiting for an interrupt. It won't be executing a loop, because it will be sleeping after hitting WAI the first time.

I noticed the discrepancy in terminology too, between "cycle steal" and "transparent." I wonder if it's one of those terms that has migrated a bit over the years.



Posted: Thu Mar 13, 2014 12:16 am

Joined: Sun Jul 28, 2013 12:59 am
Posts: 235
Given that the '816 still drives the address bus during the cycles that you want to steal, thus requiring conditional isolation of the rest of the system anyway, this looks very much like a modern "north bridge" design. In this case, if you have a mode which suspends the CPU in order to do a block transfer, why not just use wait-states and suspend DMA operation (releasing the wait-state) when an IRQ or NMI occurs?


Posted: Thu Mar 13, 2014 1:25 am

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
GARTHWILSON wrote:
cr1901 wrote:
Somehow one of my posts got deleted from this thread- it included a list of bullet points for consideration...

Are you sure you finished posting it? I never saw it, and I stay on top of things here pretty well. I have, in the past, clicked "Preview" and then moved on, forgetting to click "Submit."

I wouldn't be surprised if I did just that; I was running on little sleep.

GARTHWILSON wrote:
Quote:
Let's see if I can recreate it:
    [...]
    g. Standard DMA can be emulated (more or less) using WAI loop which checks that DMA is done.

I'm not sure what you mean here, unless you just mean that letter f's IRQ gets the processor going again. The WAIt instruction puts the processor to sleep, waiting for an interrupt. It won't be executing a loop, because it will be sleeping after hitting WAI the first time.

IRQs can come from multiple sources, so the CPU would need to check whether the IRQ actually came from the DMA controller... if it didn't, service the interrupt, and WAI again in a loop. There would be a status register in the DMA controller to indicate "done xfer".


Last edited by cr1901 on Thu Mar 13, 2014 1:39 am, edited 1 time in total.

Posted: Thu Mar 13, 2014 1:30 am

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
Quote:
Given that the '816 still drives the address bus during the cycles that you want to steal, thus requiring conditional isolation of the rest of the system anyway, this looks very much like a modern "north bridge" design. In this case, if you have a mode which suspends the CPU in order to do a block transfer, why not just use wait-states and suspend DMA operation (releasing the wait-state) when an IRQ or NMI occurs?

I'm not sure I understand what you mean by your first sentence: the DMA controller would pull BE low to tristate the '816's buses (BE is asynchronous, thankfully). I'm not sure how a modern north-bridge chipset works, other than that it controls DDRx memory, so I can't comment.

As for using wait-states, that might very well work too, but what I was trying to get at is that the cycle-steal mode can also emulate block-transfer mode if a software WAI loop is used (see my response to Garth), which simplifies the DMA controller logic. I did a poor job of explaining that, though :P.


Posted: Thu Mar 13, 2014 5:10 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8507
Location: Midwestern USA
cr1901 wrote:
The controller is cycle-steal, there should be enough instructions with dead cycles... and it's the easiest to support, since the '816 has the control signals for it built in.

However, many of the most-often used instructions do not have a dead cycle. I don't think you will get satisfactory performance using this method. Just an opinion.

Quote:
At the moment, I can only design it to work with the '816, since it has VDA/VPA outputs.

DMA is certainly possible with the 65C02. However, the 'C02 doesn't have dead cycles in the same sense as the '816. The best you can do is watch SYNC so you know when the opcode of the next instruction is about to be fetched. That would be the time for control to be given to the DMAC.

Quote:
Getting source value on the bus after VDA/VPA is valid (and "bus is free"), but before rising edge of PHI2 should be possible on FPGA.

Anything that can respond in 15ns or less will work.

Quote:
The controller should have control signals akin to other 65xx peripheral chips such as VIA and ACIA.

I completely disagree with this premise. When the DMAC is the bus master it should generate control signals that are analogs of the MPU's control signals, as that is what RAM, ROM and I/O are expecting to see. The DMAC is just a specialized processor whose claim to fame is the ability to rapidly copy bytes from point A to point B. In doing that, it has to do what the '816 would do, and that is generate addresses, manipulate RWB, read or write, etc.

Incidentally, using DMA on the 65xx peripheral silicon is of little value. None of these devices is fast enough to warrant the extra "cost" of DMAing their input and output. Also, these chips lack the requisite handshake signals that are needed to implement DMA. How do you intend to implement a /DACK and DREQ setup with, for example, the 65C22?

Quote:
Should support Block-to-Block, Block-to-Single (IO port), and Single (IO port)-to-Block.

Also useful would be a block-fill, in which the DMAC is told to fill the selected range with a specific byte value. However, that can be simulated by writing the byte into the first location in a range and then performing a single-to-block copy. I use a similar method with MVN to implement block-fill and block-clear in my software.
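The block-fill trick mentioned above works because MVN copies one byte at a time in ascending order, so each byte just written becomes the source for the next one. A sketch of that behaviour with hypothetical function names; note that a memmove-style copy, which preserves the original source bytes when the ranges overlap, would not produce a fill:

```python
def mvn_like_copy(mem: bytearray, src: int, dst: int, count: int) -> None:
    """Ascending, one byte at a time, like MVN: later reads see earlier writes."""
    for i in range(count):
        mem[dst + i] = mem[src + i]

def block_fill(mem: bytearray, start: int, length: int, value: int) -> None:
    """Fill mem[start:start + length] by seeding one byte and copying it forward."""
    if length <= 0:
        return
    mem[start] = value
    mvn_like_copy(mem, start, start + 1, length - 1)

mem = bytearray(16)
block_fill(mem, 4, 8, 0xAA)
print(mem.hex(" "))
```

A DMAC offering a single-to-block mode gets the same effect by reading the seeded byte repeatedly instead of relying on the overlap.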

Quote:
Standard DMA can be emulated (more or less) using WAI loop which checks that DMA is done.

I'm not sure I'm following your thinking on this. First off, WAI is not a "loop." As Garth said, WAI actually stops the MPU by halting the internal clock in the high condition. There is no processing going on at this point. An ABORT, IRQ or NMI will terminate WAI and restart the '816, causing it to execute the next instruction. You can't set conditions for WAIting. So if your WAIt is broken by an interrupt unrelated to DMAC activity, how would the foreground task know that it is to just go back to WAIting?

Quote:
I/O arbitration could work by including an 8237 DACK and DREQ equivalent, as well as VPA/VDA on I/O devices, combined with the appropriate logic gates... if DACK is de-asserted when VPA/VDA indicate "bus is free", the transfer is done, send IRQ to CPU to know data has come in.

I still think you're making it too convoluted trying to sneak in during dead cycles—and the 8237 is hardly a good model to follow. In any case, if your DMAC uses alternate clock cycles to load/store, it will handily outrun the MPU. Greatest efficiency would be achieved by making the DMAC bus master for the duration of the transfer. Anything else will probably drag down the system during I/O activity, which is where DMA is best applied.



Posted: Thu Mar 13, 2014 7:45 pm

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
I can't respond to all the points right now, but...
Quote:
However, many of the most-often used instructions do not have a dead cycle. I don't think you will get satisfactory performance using this method. Just an opinion.

Then why, I wonder, does the datasheet make a point of talking about using VDA/VPA for transparent DMA? That's actually what interests me most.

That being said, one could write code in a manner to make the best use of dead cycles. Even if one didn't write code like that, the performance should still be more efficient than using '816 block instructions if only because in cycle-steal transparent DMA, a transfer and internal processing are occurring in parallel. The cycle-steal mode may not be appropriate for "hard real time" transfers, but I highly doubt it will cripple I/O transfers. Certainly it will reduce the bottleneck of the CPU waiting for I/O to finish.

Quote:
So if your WAIt is broken by an interrupt unrelated to DMAC activity, how would the foreground task know that it is to just go back to WAIting?

Code is worth 1000 words...
Code:
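;...Assume IRQ vector points here (native mode, 8-bit A assumed)
irq_vec:
   pha              ;Save A, since the handler changes it
   lda dma_status   ;Load status register (assumed to clear the DMAC's IRQ)
   cmp #DONE_STATUS ;IRQ came from DMA if DONE
   bne dma_not_done ;Not done: the IRQ came from somewhere else
   sta dma_done     ;DMA is done flag (holds DONE_STATUS, which must be nonzero)

dma_not_done:
   ;Process other IRQ sources
   pla              ;Restore A
   rti


;... Other code, somewhere else
;Request data from SCSI device or something...
   sta start_xfer   ;start xfer I/O port for sample device... will trigger DMA
   jsl DMA_in_progress ;Returns once the DMAC has reported "done"
;Do something with data stored in buffer
;etc etc


;Subroutine, placed where execution can't fall into it
DMA_in_progress:
   lda dma_done     ;Check before sleeping, in case the DMA already finished
   cmp #DONE_STATUS
   beq dma_finished
   wai              ;Wait until interrupt (an IRQ landing between the check
                    ;and WAI is only noticed at the next interrupt)
   bra DMA_in_progress ;Woken by some IRQ: check the flag again
dma_finished:
   stz dma_done     ;Reset dma_done; a single STZ can't be split by an IRQ
   rtl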



The code above should really be using atomic instructions at "lda dma_done", but I've kept it simple.


Posted: Thu Mar 13, 2014 8:05 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
Even if transparent DMA is not the hottest-performing, one attraction I see is that it does not upset the timing of other operations you may have going on in a realtime system with hard deadlines.



Posted: Thu Mar 13, 2014 8:44 pm

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
Quote:
Anything that can respond in 15ns or less will work.

According to the data sheet, at 14 MHz VDA/VPA become valid at most 30 ns after the falling edge of PHI2 (tADS), and the clock rises 35 ns after the falling edge. That leaves 5 ns to prepare the address on the bus to be read! Not enough time... unless I'm not reading the datasheet correctly, which is always possible.
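The budget in this post is just the PHI2-low time minus tADS. A small sketch, assuming a symmetric (50% duty) clock, which gives a 35.7 ns low phase at 14 MHz rather than the rounded 35 ns used above; the 30 ns tADS figure is the one quoted from the data sheet, and the function names are illustrative:

```python
def phi2_low_ns(freq_hz: float, duty_low: float = 0.5) -> float:
    """Length of the PHI2-low half-cycle in ns, for the given duty cycle."""
    return duty_low * 1e9 / freq_hz

def slack_ns(freq_hz: float, t_ads_ns: float) -> float:
    """Time left after VDA/VPA settle (t_ads_ns) before PHI2 rises."""
    return phi2_low_ns(freq_hz) - t_ads_ns

s = slack_ns(14e6, 30)
print(f"PHI2 low: {phi2_low_ns(14e6):.1f} ns, slack after tADS: {s:.1f} ns")
```

Either way the slack is only a few nanoseconds, which supports the conclusion that decoding VDA/VPA and then driving the bus within the same Ø2-low phase is impractical at this speed.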


Posted: Fri Mar 14, 2014 4:30 am

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
Preliminary Pinouts...

44-pin PLCC CPLD or FPGA @ 3.3 V*
Any pin which can serve as an input requires a level shifter. Outputs do not require a level shifter (as 3.3V is valid TTL).

VSS- Self explanatory.
VDD- Self explanatory.
A0-A23- Address Line Pins. Outputs except A0-A3. A0-A3 are inputs when VDA != 0 || VPA != 0, and access internal registers. Outputs when VDA == 0 && VPA == 0. A0-A3 require level shifter.
D0-D7- Bidirectional Data Lines. Inputs when VDA != 0 || VPA != 0, bidirectional when VDA == 0 && VPA == 0. Requires level shifter.**
PHI2- Clock. Input. Requires level shifter.
VDA- Gets current bus activity from processor. Input. Requires level shifter.
VPA/SYNC- Gets current bus activity from processor. VPA is for '816, SYNC is for '02. Input. Requires level shifter.
IRQB- Output. Send IRQ to processor to signify end of XFER
DREQ0-DREQ1- DMA Request from I/O. DREQ0 takes priority. Inputs. Requires level shifter.
DACK0-DACK1- DMA Acknowledge to I/O. Outputs.
RWB/CE- Read/Write or Chip Enable (Combined to one pin). Chip Enable input when VDA != 0 || VPA != 0, Read/Write outputs when VDA == 0 && VPA == 0. Requires level shifter.

That comes to 43 pins... I can combine some more pins if needed, though.
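A quick tally of the pin list above, counting VSS and VDD once each exactly as listed; the dictionary simply mirrors the list:

```python
# Pins as listed above (VSS and VDD counted once each, as in the list)
PINS = {
    "VSS": 1, "VDD": 1,
    "A0-A23": 24, "D0-D7": 8,
    "PHI2": 1, "VDA": 1, "VPA/SYNC": 1, "IRQB": 1,
    "DREQ0-DREQ1": 2, "DACK0-DACK1": 2, "RWB/CE": 1,
}
total = sum(PINS.values())
print(f"{total} pins used of 44")
```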

Registers (Unlike other designs, all registers can be read):
Status Register (Pending Xfer? Current Xfer? Done? Bus Error***? 8-bits)
Count Register Total (x2) (how many bytes to transfer for each channel? 16 bits each)
Count Register Current (x2) (how many bytes are left to transfer for each channel? 16 bits each)
Command Register (Start Block-to-Block Xfer? Stop current xfer?)
Bank Register SRC/DEST (8-bits)
Address Start SRC/DEST (16-bits)
Mode Register (Transparent-816, Standard-816, Cycle-steal-02, Standard-02)
More to follow... this will be built incrementally... so this is a good starting point.
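A rough register map for the list above, assuming every register gets its own byte address, 16-bit registers take two consecutive bytes (low byte first), and a single source/destination bank-and-address set is shared by both channels (the list does not say whether these are per channel). Under those assumptions the registers need 17 byte addresses, one more than the 16 that A0-A3 can select, so something would have to share a location or gain an address line. The names and layout are hypothetical:

```python
# (name, width in bits); one SRC/DEST set shared by both channels (assumption)
REGISTERS = [
    ("STATUS", 8),
    ("COUNT_TOTAL0", 16), ("COUNT_TOTAL1", 16),
    ("COUNT_CUR0", 16), ("COUNT_CUR1", 16),
    ("COMMAND", 8),
    ("BANK_SRC", 8), ("BANK_DEST", 8),
    ("ADDR_SRC", 16), ("ADDR_DEST", 16),
    ("MODE", 8),
]
REG_SELECT_LINES = 4  # A0-A3 select the internal register

def build_map(regs):
    """Assign consecutive byte offsets; 16-bit registers occupy two bytes."""
    offsets, next_off = {}, 0
    for name, bits in regs:
        offsets[name] = next_off
        next_off += bits // 8
    return offsets, next_off

offsets, size = build_map(REGISTERS)
for name, off in offsets.items():
    print(f"{off:2d}  {name}")
print(size, "bytes needed;", 2 ** REG_SELECT_LINES, "addressable")
```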

*For forward compatibility, since 5-volt parts aren't going to be around for much longer :/. A big fat level shifter should work nicely, though :D. Now to find a good one...

**Some DMA controllers, such as the 8237, can do single-to-block transfers without an intermediate buffer... I'm not 100% sure how this works, but I'll read some data sheets to find out.

***Not sure how to detect this, but... the 68k has a bus error vector, so there MUST be a digital way to detect an inconsistent state.


Any feedback on what I've suggested so far is appreciated. Standard DMA can be emulated from this transparent design using a software loop. Or perhaps I'll put it in as an alternate mode of xfer for SCSI devices :P.
I'm considering just buying a soft core from WDC for the time being for my dev board to get working on this... I wonder if they will sell to hobbyist individuals...


Last edited by cr1901 on Fri Mar 14, 2014 4:53 am, edited 2 times in total.

Posted: Fri Mar 14, 2014 4:43 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
Quote:
44-PLCC CPLD or FPGA @3.3Volts*
Any pin which can serve as an input requires a level shifter. Outputs do not require a level shifter (as 3.3V is valid TTL).

The '816 data sheet says minimum Vih is 0.8 Vdd, which would be 4 V if Vdd is 5 V, so 3.3 V does not meet that. I doubt their number, though. Their data sheets have always had a lot of things incorrect in them. Experimentation might be in order. Fortunately, as far as I can remember, the truth has always been much better than the spec.
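The point can be checked numerically: a 3.3 V output, even driven fully to its rail, clears the standard TTL VIH(min) of 2.0 V but not the 0.8 × VDD = 4.0 V figure quoted from the '816 data sheet. A minimal sketch that ignores noise margin and output droop under load; the constant names are illustrative:

```python
VDD = 5.0
VIH_816 = 0.8 * VDD  # 4.0 V: the '816 data sheet's VIH(min), as quoted above
VIH_TTL = 2.0        # standard TTL VIH(min)
VOH_3V3 = 3.3        # best case: a 3.3 V output driven all the way to its rail

def meets_vih(voh: float, vih_min: float) -> bool:
    """True if the driver's worst-case high level satisfies the receiver's VIH(min)."""
    return voh >= vih_min

print("meets '816 spec:", meets_vih(VOH_3V3, VIH_816))
print("meets TTL spec: ", meets_vih(VOH_3V3, VIH_TTL))
```

So "3.3 V is valid TTL" holds for TTL-threshold inputs, but the '816 inputs as specified would need a level shifter or experimental confirmation.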


