6522 as memory sequencer.

cjs · Post by **cjs** » Wed May 10, 2023 5:01 pm

John West wrote:

Map a 6526 into page zero somewhere.... Then LDA (timerA) STA (timerB) as many times as you need.

That's a nice little insight. Though it might make sense just to use some up/down counters (e.g., 74LS191) instead. With a bit of clever address decoding you could map one counter (actually, four '191s cascaded for a 16-bit counter) to $80-83, where reading from $80-81 increments it after the read and $82-83 decrements it after the read, and a second 16-bit counter to $84-87 doing the same. This would give you an easy way of doing stacks in arbitrary memory locations as well.

Then add a 16-bit comparator hooked up to the S.O. line and you can use BVC to loop, with the comparator setting the overflow flag when you've reached your end address.
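
To make the idea concrete, here is a rough behavioural model in Python (not a hardware design). The names `CounterPort` and `copy_block` are made up for illustration, and the model treats each indirect pointer fetch as a single event rather than two separate byte reads, which is a simplifying assumption on my part. The comparator is modelled as a plain equality test on the source counter.

```python
class CounterPort:
    """Behavioural model of a 16-bit up/down counter read as a zero-page pointer."""

    def __init__(self, value=0):
        self.value = value & 0xFFFF

    def read_post_inc(self):
        """Pointer fetch via the 'increment' addresses: return value, then count up."""
        addr = self.value
        self.value = (self.value + 1) & 0xFFFF
        return addr

    def read_post_dec(self):
        """Pointer fetch via the 'decrement' addresses: return value, then count down."""
        addr = self.value
        self.value = (self.value - 1) & 0xFFFF
        return addr


def copy_block(mem, src, dst, end):
    """Copy bytes from src up to (but not including) end into dst."""
    a = CounterPort(src)   # first counter, e.g. the one at $80-83
    b = CounterPort(dst)   # second counter, e.g. the one at $84-87
    while True:
        byte = mem[a.read_post_inc()]    # stands in for LDA ($80)
        mem[b.read_post_inc()] = byte    # stands in for STA ($84)
        if a.value == end:               # comparator fires, V is set, BVC falls through
            break
```

Since the comparator is only checked after a byte has moved, the loop always copies at least one byte, as a BVC placed at the bottom of the loop would.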

Sadly, there are no huge speed gains here. It's 13 cycles per byte, so for small (≤256 byte) copies it saves only a couple of cycles per byte for a DEX (and maybe one more during the entire copy if it crosses a page boundary), and not a huge amount more if you're doing larger copies. That's far from even the 7 cycles per byte of a 65816, much less the 2 cycles per byte of a DMA engine.
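
For a feel of what those figures mean in throughput, here is a quick back-of-envelope calculation in Python. The 1 MHz clock is my assumption (substitute your own); the cycle counts are the ones quoted above.

```python
def bytes_per_second(clock_hz, cycles_per_byte):
    """Sustained copy rate for a given clock and per-byte cycle cost."""
    return clock_hz / cycles_per_byte


CLOCK_HZ = 1_000_000  # assumed system clock; not a figure from this thread

CYCLES_PER_BYTE = {
    "counter ports + BVC loop": 13,
    "65816 block move": 7,
    "DMA engine": 2,
}

for name, cycles in CYCLES_PER_BYTE.items():
    rate = bytes_per_second(CLOCK_HZ, cycles)
    print(f"{name}: {rate / 1000:.1f} kB/s")
```

At 1 MHz that works out to roughly 77, 143 and 500 kB/s respectively.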

BigDumbDinosaur · Post by **BigDumbDinosaur** » Wed May 10, 2023 5:06 pm

cjs wrote:

BigDumbDinosaur wrote:

Using MVN/MVP has the limitation that the 65C816 doesn't have the handshake mechanism typically implemented by a true DMA controller.

Interesting. Can you tell me more about that?

An example: the 53CF94 ASIC in my POC unit’s SCSI host adapter is specifically designed for DMA transfers. The device has two handshake lines for the purpose: DREQ and /DACK. DREQ is an active-high output that tells the DMA controller (DMAC) when the CF94 has data waiting (read operation), or can accept data (write operation). /DACK is an active-low input that is driven by the DMAC when it is ready to accept data (read operation) or emit data (write operation).

If, for example, a DMA read is occurring, the CF94 will assert DREQ to tell the DMAC that data is waiting. After becoming the bus master, the DMAC will assert /DACK, causing the CF94 to place a datum on the bus. Once the DMAC has read the datum, it will deassert /DACK and then deposit the datum somewhere. If the CF94 has more data waiting (the device has a 16-deep FIFO), it will again assert DREQ. The DMAC will eventually respond again to DREQ and repeat the cycle.
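
The read handshake just described can be sketched as a small Python model. This is only an illustration of the DREQ and /DACK sequence: the class and function names are hypothetical, bus arbitration is left out entirely, and the only figure taken from the real part is the 16-deep FIFO.

```python
from collections import deque


class FifoDevice:
    """Peripheral with a FIFO that raises DREQ while data is waiting (read direction)."""

    def __init__(self, data, depth=16):
        if len(data) > depth:
            raise ValueError("more data than the FIFO can hold")
        self.fifo = deque(data)

    @property
    def dreq(self):
        """DREQ is active-high: true while at least one datum is waiting."""
        return len(self.fifo) > 0

    def dack_cycle(self):
        """One /DACK pulse: the device places the next datum on the bus."""
        return self.fifo.popleft()


def dma_read(device, memory, dest):
    """Drain the device into memory starting at dest; return the byte count."""
    count = 0
    while device.dreq:                  # device says data is waiting
        datum = device.dack_cycle()     # DMAC asserts /DACK, reads, deasserts
        memory[dest + count] = datum    # DMAC deposits the datum
        count += 1
    return count
```

The loop ends when the device stops asserting DREQ, which is the point of the handshake: the device, not the DMAC, paces the transfer.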

Quote:

A more difficult problem is that as a copy progresses, both source and destination addresses emitted by the 816 change with each byte that is copied. In the most common type of DMA transfer, data is copied to or from an I/O port, whose address is fixed.

Ah, I'd not thought about that: I was thinking of just doing moves between memory locations, REU-style. And of course on the systems I generally tend to have in mind, 1-2 MHz 6502s, typically I/O devices are too slow to bother with DMA unless (again as with the REU) your "I/O" device is RAM. But I guess that changes as you add faster I/O devices.

In my POC V1.3 unit, I’ve got a performance bottleneck with mass storage access. The SCSI bus, running in asynchronous mode (the slowest mode possible; synchronous transfers can be clocked as high as 20 MB/second), can run at a maximum of 3.5 MB/second. At 16 MHz, the best the 65C816 can do on a read operation is around 660 KB/second. A write operation to a SCSI device can run faster, up to 800 KB/second, which is still a fraction of the SCSI bus speed.

The CF94 is clocked at 25 MHz in my application (permissible clock range for the CF94 is 10 MHz minimum, 40 MHz maximum). Clearly, I'm under-utilizing my SCSI setup, but can’t get it to run any faster due to the limitations of byte-at-a-time read/write with the MPU. So you can see this is one application in which DMA in a 65xx system would be a real benefit.
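
Putting rough numbers on that in Python: the sketch below assumes 1 KB = 1000 bytes and 1 MB = 1,000,000 bytes, since the figures above don't say which convention applies. The function name is just for illustration.

```python
def cycles_per_byte(clock_hz, bytes_per_sec):
    """Average MPU clock cycles spent per byte at a given transfer rate."""
    return clock_hz / bytes_per_sec


CLOCK_HZ = 16_000_000      # 65C816 clock from the post
BUS_MAX = 3_500_000        # asynchronous SCSI maximum, bytes/second (assumed decimal MB)

for label, rate in (("read", 660_000), ("write", 800_000)):
    cpb = cycles_per_byte(CLOCK_HZ, rate)
    share = 100 * rate / BUS_MAX
    print(f"{label}: {cpb:.1f} cycles/byte, {share:.0f}% of bus maximum")
```

Under those assumptions the read path costs about 24 clock cycles per byte and the write path 20, which is roughly a fifth of what the bus could carry.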

GARTHWILSON · Post by **GARTHWILSON** » Wed May 10, 2023 5:12 pm

wayfarer, you might be interested in our topic "The secret, hidden, transparent 6502 DMA channel." The idea is that the few dead bus cycles get used for DMA while the processor is doing an internal operation and not accessing the next byte in memory. This way, the processor is never paused or slowed down. The 65816 makes it extra easy to identify which bus cycles the processor is not using. The idea has intrigued me, but it's not high enough on the to-do list for me to actually get to it.
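
As a small illustration of why the 65816 makes this easy: a cycle in which both VDA and VPA are low is an internal operation, so the bus is not being used. The Python sketch below just scans a list of (VPA, VDA) samples for such cycles. The trace in the usage line is invented for the example and does not correspond to any particular instruction.

```python
def free_cycles(trace):
    """Return the indices of cycles where VPA and VDA are both low.

    trace is a sequence of (vpa, vda) pairs, one per clock cycle.
    """
    return [i for i, (vpa, vda) in enumerate(trace) if not vpa and not vda]


# Hypothetical four-cycle trace: opcode fetch, operand fetch, internal, data access.
example = [(1, 1), (1, 0), (0, 0), (0, 1)]
print(free_cycles(example))
```

A real design would do this decode in glue logic on every cycle rather than after the fact, of course.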

cjs · Post by **cjs** » Wed May 10, 2023 5:34 pm

BigDumbDinosaur wrote:

...the 53CF94 ASIC in my POC unit’s SCSI host adapter is specifically designed for DMA transfers. The device has two handshake lines for the purpose: DREQ and /DACK....

Ah, I see. Thanks for the explanation.

Quote:

In my POC V1.3 unit, I’ve got a performance bottleneck with mass storage access. The SCSI bus...can run at a maximum of 3.5 MB/second. At 16 MHz, the best the 65C816 can do on a read operation is around 660 KB/second.

Yeah, maybe I should have mentioned that in the late-'70s/early-'80s style systems I usually use, I'm also using contemporary peripherals, or am fine with running at the speed of contemporary peripherals.

In that world, 660 KB/second isn't bad at all: that is easily more than most late-'70s and early-'80s microcomputer hard drives. Really serious users in 1981 might buy the newly released 470 MB Fujitsu Eagle, which could do 1.8 MB/sec on an SMD interface, but it was typically only minicomputer owners who had $10,000 (about $35,000 today) to spend on it.

But for me, a floppy drive (or even a fast—by which I mean 19,200 or even 9,600 bps—serial port) is a brilliant upgrade from cassette tape.

wayfarer · Post by **wayfarer** » Thu May 11, 2023 2:41 pm

Proxy wrote:

i mean look at it like this:
how would you get the CPU to do work while avoiding memory accesses, if the instructions required to do anything are located in memory? how would you get them from memory into the CPU if not through a memory access?

Im still thinking Harvard arch... to me, the Program Counter should be completely removed from memory access.
getting the next operation, should not be something that changes data addresses, however, this can complicate design by requiring two separate data and program busses. Opcodes should mostly come out of ROM, and Data should mostly be in RAM... to me at least
I have not worked at this low a level much, and I feel the old thrill of exploring a frontier again. some of this stuff is 'old hat' from everything else Ive done in computers and electronics, some of it truly mind-boggling as to what they were thinking at the time. I understand von Neumann architecture and why it is used, it is just not easy to think in without a little adjustment to my process.
given the stack and zero page are 'low addresses' and vectors for reset and interrupts are on the 'high addresses', I may do some separation via RAM/ROM...

Quote:

that's the main problem with running a DMAC and CPU at the same time on the same bus.
one way around that is to separate their busses (via a 74x245 for example) and give the CPU the ability to access the DMAC's bus, but not the other way around.
that way the DMAC can run while the CPU still has its own private section of RAM/ROM and some IO to access, but would get blocked when trying to harass the active DMAC's section of the address space.
of course it's not an ideal solution as that means the DMAC cannot copy data from/to the private address section.
so another way is bus sharing, since the 6502 and 65C02 only use the second half of a clock cycle to actually access memory, you can use the first half for a DMAC. this allows both of them to run at the same time with no performance loss on either side, but of course it also requires much faster logic and memory (especially at higher speeds like +10MHz).

I see a good reason to separate the stack and/or zero page then. using it as local or "L1" cache.
in short only the 6502 should be able to access the zero page under most circumstances.
your mention of a DMAC and|or the '245 further makes me want to explore a little 4-bit micro slice processor and maybe some buss line latches... reading the 6522 makes me want an 8-bit latch or similar to control its Chip Select Lines and the 4-bit needed to cover 20-bit ISA/pc104/PCMCIA addressing. so 'down the line' this looks like a CPLD in some fashion, akin to what I see several people here developing for their SBCs and projects.

ultimately a DMAC is available in many forms, from dedicated chips to logic to CPLDs or as part of an MCU.
what I am after here is interoperability between the 6502 and 6522 to make working with ROM/RAM easier (because everything to a 6502 is just a memory address).

so to get back to interoperability between the 6502 and 6522 for moving bytes around quickly...

John West wrote:

cjs wrote:

using a 6522 to run DMA might be totally useless, but I'd love to see the design of the circuit and software that does it.)

The best I can come up with needs a 6526, as only one of the 6522's timers can take an external clock.
Map a 6526 into page zero somewhere. ....

not exactly like Im thinking, and Im not sure I can get one of those chips...

cjs wrote:

John West wrote:

Map a 6526 into page zero somewhere.... Then LDA (timerA) STA (timerB) as many times as you need.

That's a nice little insight. Though it might make sense just to use some up/down counters (e.g., 74LS191) instead. ...

this is basically what I want to use the 6522 to do

Quote:

Sadly, there are no huge speed gains here. It's 13 cycles per byte, so for small (≤256 byte) copies it saves only a couple of cycles per byte for a DEX (and maybe one more during the entire copy if it crosses a page boundary), and not a huge amount more if you're doing larger copies. That's far from even the 7 cycles per byte of a 65816, much less the 2 cycles per byte of a DMA engine.

I am seeing that as well, and "PHI1 DMA" might be a better choice overall

BigDumbDinosaur wrote:

cjs wrote:

BigDumbDinosaur wrote:

Using MVN/MVP has the limitation that the 65C816 doesn't have the handshake mechanism typically implemented by a true DMA controller.

Interesting. Can you tell me more about that?

An example: the 53CF94 ASIC in my POC unit’s SCSI host adapter is specifically designed for DMA transfers....

noted, you have done a lot of amazing things and your 'CF94 chip is certainly one of the Custom Pretty Little Doohickeys that makes me want to build my own custom specific chip... for the moment, Im just focusing on the 6522/6502 as the 65816 is not going to have a lot of these problems. still it is good to know how you solve problems and what arises.

GARTHWILSON wrote:

The secret, hidden, transparent 6502 DMA channel." The idea is that the few dead bus cycles get used for DMA while the processor is doing an internal operation and not accessing the next byte in memory. This way, the processor is never paused or slowed down. The 65816 makes it extra easy to identify which bus cycles the processor is not using. The idea has intrigued me, but it's not high enough on the to-do list for me to actually get to it.

read it, and I see the similarity, however, I am not so much trying to do that, as just automate as much as I can with a 6522.
I will post more later today in my next post, I need to run an errand right quick and I wanted to cover the responses here before they got too many.

"PHI1 DMA" might be a good option, and this is not really DMA (Ill probably use a PIC as a DMA as you yourself suggest in places), Im just trying to reduce operations used by utilizing the 6522 to its potential, which I think might yet work... Im just not certain it saves any trouble.

cjs wrote:

BigDumbDinosaur wrote:

...the 53CF94 ASIC in my POC unit’s SCSI host adapter is specifically designed for DMA transfers. The device has two handshake lines for the purpose: DREQ and /DACK....

Ah, I see. Thanks for the explanation...

speed is certainly a concern, mostly just getting a screen updated without tearing sans gpu...

I will post more on my design goals in a bit, I need to run an errand.

cjs · Post by **cjs** » Thu May 11, 2023 3:52 pm

wayfarer wrote:

Opcodes should mostly come out of ROM, and Data should mostly be in RAM... to me at least...

Well, I think of it as address spaces: while Harvard architecture machines typically use ROM or EEPROM in the instruction space and RAM in the data space, there's no reason you can't put either in either space.

But the 6502 itself is not that well suited to systems that turn it into a Harvard architecture, since it's not at all unusual to want to use self-modifying code on it. Neither are home computers in general, where you usually want to be able to load programs from I/O devices and run them, and save programs back as well.

Quote:

I see a good reason to separate the stack and/or zero page then. using it as local or "L1" cache.

Well, it's definitely not cache since the 6502 always sees it as main memory. Designing a system where some devices can access it and others can't is a pseudo-Harvard architecture (implemented in the bus design rather than the CPU).

Quote:

I am seeing that as well, and "PHI1 DMA" might be a better choice overall

If you're looking for speed and parallelism, yes. That's the usual way of doing it and an enormous number of systems do this.

The Apple II was one of the first: the video subsystem (done entirely with 7400-series parts) reads the frame buffers using this technique, so that might be worth study. (The schematics and a description of how it works are in the technical reference manuals for every member of the Apple II series.) The video system in the Apple II also did DRAM refresh as a side effect of the scanning, which explains the funny frame buffer layout. The VIC-20 and C64 also did this, though all the logic for that is packed inside a custom chip. (And the C64 couldn't get quite enough bandwidth on ϕ1 cycles alone, so it also stole a few ϕ2 cycles from the CPU.)
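
A toy Python model of the basic interleaving, in case it helps: each loop iteration is one clock cycle, with the video (or other DMA) device reading during the ϕ1 half and the CPU reading or writing during the ϕ2 half. The function name and the tuple format for CPU operations are invented for this sketch, and it ignores timing, refresh and cycle stealing entirely.

```python
def run_shared_bus(memory, video_addrs, cpu_ops):
    """Interleave one video read (phi1) and one CPU access (phi2) per cycle.

    cpu_ops is a list of (op, addr, value) tuples where op is "r" or "w";
    value is ignored for reads.
    """
    video_data = []
    cpu_reads = []
    for video_addr, (op, addr, value) in zip(video_addrs, cpu_ops):
        # phi1 half-cycle: the video device owns the bus
        video_data.append(memory[video_addr])
        # phi2 half-cycle: the CPU owns the bus
        if op == "w":
            memory[addr] = value
        else:
            cpu_reads.append(memory[addr])
    return video_data, cpu_reads
```

Neither side ever waits for the other, which is the whole attraction: the CPU runs at full speed and the video device still gets one access per cycle.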

There were also systems that had a complete second CPU running on ϕ1. The first one I'm aware of is the Commodore 2040 floppy diskette system, which is really more of an embedded device. The Fujitsu FM-8, and its more famous follow-up the FM-7, had two 6809s: as well as the "main" CPU running on ϕ2 with 64K RAM and 32K ROM, there was a "sub CPU" for handling graphics and the keyboard with its own 48K RAM for the frame buffer (640×200×8), a bit more for work area, and its own ROM. 256 bytes of shared memory was used for communication between the CPUs, so the main CPU would send commands such as "draw this character at this size here" or "draw this line there" and the sub-CPU would actually figure out what bits should go where in the frame buffer and write them. There's more detail and schematics in my retroabandon/fm7re repo, if you're interested in seeing how that's done. (And if you're really concerned about graphics performance, using a "GPU" like this might be the way to go.)

Quote:

"PHI1 DMA" might be a good option, and this is not really DMA

I don't see how it's "not really DMA." It's a (non-CPU) device directly accessing memory without the assistance of the CPU (except to set up the transfer), which seems to me (and Wikipedia) the definition of "DMA."

BigDumbDinosaur · Post by **BigDumbDinosaur** » Thu May 11, 2023 4:29 pm

cjs wrote:

wayfarer wrote:

"PHI1 DMA" might be a good option, and this is not really DMA

I don't see how it's "not really DMA." It's a (non-CPU) device directly accessing memory without the assistance of the CPU (except to set up the transfer), which seems to me (and Wikipedia) the definition of "DMA."

In designing a circuit that is to use a DMA controller (DMAC), it’s helpful to think of the device as a highly specialized form of microprocessor with a very small “instruction set” that consists of a fetch instruction and a store instruction. The rest of it is a set of address registers (source and destination), a counter register and a control register to tell the DMAC what to do.

The logic in a DMAC is actually fairly simple to describe, although not as simple to implement. The complexity is in integrating the DMAC into the system. Fortunately, the WDC form of the 65C02 has the necessary signals to make such an integration theoretically painless. The 65C816 is even better in that regard, as it has the ability to tell the glue logic exactly what it is up to during each clock cycle, opening the door to the DMAC grabbing control during the so-called dead cycles, aka cycle-stealing.

I’m inclined to think a CPLD of the size of a Microchip ATF1508 would have sufficient resources to act as a DMAC, assuming the design doesn’t get too ambitious. Three 16-bit registers would be needed for the counter and address registers, and an eight-bit register would be needed for control. A 16-bit address bus would be needed, along with an eight-bit data bus and read/write line. If interfacing to a 65C816 system, an analog of the VDA and VPA outputs likely would be necessary, plus some means of generating the bank bits would be required.
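
As a rough software picture of that register set, here is a minimal Python model of a DMAC that does nothing but fetch and store. The attribute names and the position of the `GO` bit in the control register are assumptions made for this sketch, not part of any existing design, and the model ignores bus arbitration and the 65C816 bank bits.

```python
GO = 0x01  # assumed bit position for "start transfer" in the control register


class Dmac:
    """Three 16-bit registers (source, dest, count) plus an 8-bit control register."""

    def __init__(self, memory):
        self.memory = memory
        self.source = 0
        self.dest = 0
        self.count = 0
        self.control = 0

    def step(self):
        """One fetch 'instruction' followed by one store 'instruction'."""
        datum = self.memory[self.source]            # fetch
        self.memory[self.dest] = datum              # store
        self.source = (self.source + 1) & 0xFFFF    # 16-bit wrap
        self.dest = (self.dest + 1) & 0xFFFF
        self.count = (self.count - 1) & 0xFFFF

    def run(self):
        """Transfer until the counter reaches zero, then clear the GO bit."""
        if not self.control & GO:
            return
        while self.count != 0:
            self.step()
        self.control &= ~GO & 0xFF
```

In hardware each `step` would be spread over at least two bus cycles, and `run` would be a state machine that yields the bus back to the MPU when required.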

It would be an interesting project...

Proxy · Post by **Proxy** » Thu May 11, 2023 5:54 pm

wayfarer wrote:

Im still thinking Harvard arch... to me, the Program Counter should be completely removed from memory access.

even with the harvard architecture you still read instructions from memory. sure it's not using the same external bus as for data, but it's still accessing memory. (side note, RAM and ROM are both types of "Memory")
plus in such a harvard system, you'd still want the DMAC to be able to access the CPU's Instruction memory as well as Data memory. so you can copy from data memory to instruction memory or vice versa.

wayfarer wrote:

Opcodes should mostly come out of ROM, and Data should mostly be in RAM... to me at least

eh not really, you also want the ability for your CPU to execute from RAM. so you can load programs at runtime without having to reprogram a ROM. which is overall just a very very useful ability to have (just look at any 8-bit home computer that runs software/games without cartridges).

on a side note, couldn't you create a "pseudo-harvard 6502" using a 65816 (VDA/VPA) and some logic?

main benefit is that you technically double your address space. but downside is that you require a DMAC or similar to be able to load programs into instruction memory as the CPU cannot do that on its own.
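
a rough Python sketch of what that decode logic might look like, assuming the simplest possible rule: anything fetched with VPA high comes from instruction memory, anything with only VDA high comes from data memory. the names `select_memory` and `PseudoHarvardBus` are made up for this example, and it ignores writes, bank bits and any special-case cycles.

```python
def select_memory(vpa, vda):
    """Pick an address space from the 65816's VPA/VDA outputs."""
    if vpa:
        return "instruction"   # opcode fetch (VPA=1, VDA=1) or operand fetch (VPA=1, VDA=0)
    if vda:
        return "data"          # data access (VPA=0, VDA=1)
    return None                # internal operation, address is not valid


class PseudoHarvardBus:
    """Two separate 64K spaces selected by VPA/VDA."""

    def __init__(self):
        self.instruction = bytearray(0x10000)
        self.data = bytearray(0x10000)

    def read(self, addr, vpa, vda):
        space = select_memory(vpa, vda)
        if space is None:
            raise ValueError("no valid address on this cycle")
        return getattr(self, space)[addr & 0xFFFF]
```

the same 16-bit address lands in a different memory depending on what kind of cycle it is, which is where the doubled address space comes from.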

...damn, why am i so easily distracted by new ideas like this?

anyways, on the note of a CPLD for a DMAC, if it's for the 65816 you'd likely need full 24-bit address registers to be able to copy between banks, though the counter could stay 16-bits wide, which would limit the DMAC to only being able to copy <64kB of data at once, which i think is more than enough. also, do you really need a full 8 bit control register? i think for a very basic DMAC you'd need only 3 bits in total.
1 bit to start the operation. (which also acts as a status bit, so as long as it's 1 the DMAC is still running)
and 2 bits to select if the counter should be added to the source/destination registers when reading/writing. (basically this allows the DMAC to copy from/to a single non-advancing address which is useful for filling areas of memory with 1 value, or when copying from/to/between IO devices)
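
to illustrate, a small Python sketch of that 3-bit control scheme. the bit positions and the function name are assumptions for the example; only the meaning of the three bits comes from the description above. addresses are masked to 24 bits and the length is expected to fit the 16-bit counter.

```python
START = 0b001        # assumed position: start bit, reads back as 1 while running
SRC_ADVANCE = 0b010  # assumed position: add the counter to the source address
DST_ADVANCE = 0b100  # assumed position: add the counter to the destination address


def dma_transfer(memory, source, dest, length, control):
    """Run one transfer and return the control value with START cleared."""
    if not control & START:
        return control
    for counter in range(length):
        src = (source + counter) & 0xFFFFFF if control & SRC_ADVANCE else source
        dst = (dest + counter) & 0xFFFFFF if control & DST_ADVANCE else dest
        memory[dst] = memory[src]
    return control & ~START


# fill: source stays fixed, destination advances
ram = bytearray(64)
ram[0] = 0xAA
dma_transfer(ram, 0, 8, 4, START | DST_ADVANCE)
```

with both advance bits set you get a plain block copy; with only the destination bit set you get a fill, and with only the source bit set you get a stream into a fixed IO port.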

i think ideas for such devices deserve their own topic over in Programmable Logic though.

GARTHWILSON · Post by **GARTHWILSON** » Thu May 11, 2023 7:52 pm

wayfarer wrote:

Im still thinking Harvard arch... to me, the Program Counter should be completely removed from memory access.
getting the next operation, should not be something that changes data addresses, however, this can complicate design by requiring two separate data and program busses. Opcodes should mostly come out of ROM, and Data should mostly be in RAM... to me at least
I have not worked at this low a level much, and I feel the old thrill of exploring a frontier again. some of this stuff is 'old hat' from everything else Ive done in computers and electronics, some of it truly mind-boggling as to what they were thinking at the time. I understand von Neumann architecture and why it is used, it is just not easy to think in without a little adjustment to my process.

The Harvard architecture was done to get greater performance. It's not without penalties though, including making several things in low-level programming more difficult—although HLL compilers tend to hide that.

If the page-1 stack area were onboard so you could, for example, do a PHA or PLA in half the time, I would still want to be able to address it with the abs,X addressing mode and maybe other ones too, for stack-relative addressing like I discuss in the 6502 stacks treatise, starting about in the middle of the page on stack addressing.

wayfarer · Post by **wayfarer** » Tue May 06, 2025 1:25 am

BigDumbDinosaur wrote:

In designing a circuit that is to use a DMA controller (DMAC), it’s helpful to think of the device as a highly specialized form of microprocessor with a very small “instruction set” that consists of a fetch instruction and a store instruction. The rest of it is a set of address registers (source and destination), a counter register and a control register to tell the DMAC what to do.

The logic in a DMAC is actually fairly simple to describe, although not as simple to implement. The complexity is in integrating the DMAC into the system. Fortunately, the WDC form of the 65C02 has the necessary signals to make such an integration theoretically painless. The 65C816 is even better in that regard, as it has the ability to tell the glue logic exactly what it is up to during each clock cycle, opening the door to the DMAC grabbing control during the so-called dead cycles, aka cycle-stealing.

I’m inclined to think a CPLD of the size of a Microchip ATF1508 would have sufficient resources to act as a DMAC, assuming the design doesn’t get too ambitious. Three 16-bit registers would be needed for the counter and address registers, and an eight-bit register would be needed for control. A 16-bit address bus would be needed, along with an eight-bit data bus and read/write line. If interfacing to a 65C816 system, an analog of the VDA and VPA outputs likely would be necessary, plus some means of generating the bank bits would be required.

It would be an interesting project...

this is the ISAC chip. or at least part of it.
viewtopic.php?f=4&t=7577
