65c816 address decoding help

Chromatix · Post by **Chromatix** » Sat Apr 04, 2020 7:20 am

At 8MHz SPI clock, you can theoretically get 1MB/sec into or out of the card, 8 cycles each byte. You'll be limited by how fast you can move that data into a useful place in RAM using the CPU. The SPI interface will have to wait for the CPU to catch up. It's a good place to be.
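As a rough back-of-the-envelope check of that figure, here is a small sketch. The CPU clock and the 14-cycles-per-byte copy loop are made-up assumptions for illustration, not measurements from any real routine.

```python
def spi_bytes_per_second(spi_clock_hz, bits_per_byte=8):
    """Raw SPI throughput: one bit per SPI clock, eight clocks per byte."""
    return spi_clock_hz / bits_per_byte


def cpu_bytes_per_second(cpu_clock_hz, cycles_per_byte):
    """Throughput of the CPU copy loop, given its cost per byte in cycles."""
    return cpu_clock_hz / cycles_per_byte


spi_rate = spi_bytes_per_second(8_000_000)        # 8 MHz SPI clock
# Assumption: 8 MHz CPU and a hypothetical 14-cycle load/store/loop sequence.
cpu_rate = cpu_bytes_per_second(8_000_000, 14)
effective_rate = min(spi_rate, cpu_rate)          # the slower side sets the pace

print(f"SPI limit: {spi_rate:,.0f} bytes/s")
print(f"CPU limit: {cpu_rate:,.0f} bytes/s")
print(f"Effective: {effective_rate:,.0f} bytes/s")
```

With these assumed numbers the CPU loop, not the SPI clock, is the bottleneck, which is the situation described above.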

You would program the ATF750CL to perform the parallel <-> serial conversion with correct SPI and 65xx bus signalling. Nothing more, nothing less. With a nominal Tpd of 15ns, it's plenty fast enough for the job. Write a byte to it, eight bits get sent to the SD card along with eight clock pulses, and the 8 bits sent by the card at the same time are then ready for reading. Very simple.

In CUPL it's a 9-state machine (idle, then eight active states, then back to idle). Eight of the output pins sit on the 65xx data bus; the other two are SPI clock and MOSI (and MISO is an input). In the idle state, you're sensitive to /CE, /OE and /WE, which indicate when you need to activate the output pins (.OE term), or latch them as inputs into the shift register and advance to the first active state. In each active state, the clock gets passed through to the card, the most significant bit of the shift register is presented on MOSI, and the correct clock edge both advances the shift register (pulling in the value on MISO in the process) and the active state.
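To make the state machine easier to picture, here is a behavioural model in Python rather than CUPL. It is only a sketch: the class and method names are invented for illustration, and it collapses the separate sample and shift clock edges of real hardware into a single `clock()` call.

```python
class SpiShifter:
    """Behavioural model: idle state 0 plus eight active states 1..8."""

    IDLE = 0

    def __init__(self):
        self.state = self.IDLE
        self.shift = 0

    def write(self, value):
        """CPU write while idle: latch the byte, enter the first active state."""
        if self.state != self.IDLE:
            raise RuntimeError("transfer already in progress")
        self.shift = value & 0xFF
        self.state = 1

    def mosi(self):
        """MOSI always presents the most significant bit of the shift register."""
        return (self.shift >> 7) & 1

    def clock(self, miso_bit):
        """One SPI clock: shift left, pull MISO into bit 0, advance the state."""
        if self.state == self.IDLE:
            return
        self.shift = ((self.shift << 1) | (miso_bit & 1)) & 0xFF
        self.state = self.IDLE if self.state == 8 else self.state + 1

    def read(self):
        """CPU read while idle: the byte received during the last transfer."""
        if self.state != self.IDLE:
            raise RuntimeError("transfer still in progress")
        return self.shift


def transfer(shifter, tx_byte, card_reply):
    """Run one 8-clock exchange; card_reply is what the card sends back."""
    shifter.write(tx_byte)
    sent = 0
    for i in range(8):
        sent = (sent << 1) | shifter.mosi()
        shifter.clock((card_reply >> (7 - i)) & 1)
    return sent, shifter.read()


sent, received = transfer(SpiShifter(), 0xA5, 0x3C)
print(hex(sent), hex(received))
```

After eight clocks the register holds the card's byte and the machine is back in idle, ready for the CPU to read it.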

There are a few auxiliary signals that the SD card may need to see, but which the ATF750 doesn't have enough output pins to drive by itself. They'll be low-frequency, so you can bit-bang them in the normal way. The most important one is /SS or /CS, depending on naming convention. When not actively accessing the card, deselect it to save power. You'll need to have it selected for it to respond. The card socket may provide a card presence detect signal, which you should be able to read, and maybe a write-protect signal, likewise.

You would then need to talk SD card protocol over that interface. That's all software, and you'd need to do it anyway if you were bit-banging, just more slowly. IIRC there's a page or two on the web detailing practical experience with implementing it. There are enough differences between SDSC, SDHC and SDXC cards that I'd advise you to get a small clutch of bog-standard 2GB SD cards, which use the oldest and simplest version of the protocol.
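As a taste of what that software looks like, here is a sketch of building the 6-byte command frame that SD cards expect in SPI mode, using CMD0 (the reset command) as the example. The function names are invented here; the frame layout (start bits plus command index, 32-bit argument, CRC7 with a trailing stop bit) follows the commonly documented SD SPI format, so treat this as an illustration rather than a complete driver.

```python
def crc7(data):
    """CRC-7 with polynomial x^7 + x^3 + 1, processed MSB first."""
    crc = 0
    for byte in data:
        for bit_index in range(7, -1, -1):
            bit = (byte >> bit_index) & 1
            top = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if top ^ bit:
                crc ^= 0x09
    return crc


def sd_command(index, argument):
    """Build a 6-byte frame: 01 + 6-bit index, 32-bit argument, CRC7 + stop bit."""
    body = bytes([0x40 | (index & 0x3F)]) + argument.to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 1])


cmd0 = sd_command(0, 0)   # GO_IDLE_STATE, sent first after power-up
print(cmd0.hex())
```

Each of those six bytes would go out through the SPI interface in turn, followed by polling reads (sending $FF) until the card answers.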

Finally, don't forget that SD cards are 3.3V devices. If the rest of your machine is 5V, you'll need to insert a level shifter and provide a correct power supply. Not doing so will blow up the card.

Skylie33 · Post by **Skylie33** » Sat Apr 04, 2020 7:30 am

Thank you very much for the help! It seems like this is the best solution. One question, though: how should I activate the /CE, /OE, and /WE lines? And are they all necessary? I'm assuming they'd be input pins.

DerTrueForce · Post by **DerTrueForce** » Sat Apr 04, 2020 8:26 am

/CE is the chip enable, or chip select. This is generated by the decoding circuitry.
/RD and /WR are the read and write lines (the /OE and /WE inputs mentioned above). These are Intel-style signals, but they're easy to generate from the 6502's phase-2 and R/W. They're the same signals your ROM and RAM almost certainly use.
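For reference, the usual way to derive those strobes is two NAND gates: /RD = NAND(PHI2, R/W) and /WR = NAND(PHI2, NOT R/W). The sketch below just prints the truth table; the function name is made up for illustration.

```python
def intel_strobes(phi2, rw):
    """Return (/RD, /WR) as active-low levels: 0 means asserted."""
    rd_n = 0 if (phi2 and rw) else 1          # read strobe: PHI2 high, R/W high
    wr_n = 0 if (phi2 and not rw) else 1      # write strobe: PHI2 high, R/W low
    return rd_n, wr_n


print("PHI2 R/W  /RD /WR")
for phi2 in (0, 1):
    for rw in (0, 1):
        rd_n, wr_n = intel_strobes(phi2, rw)
        print(f"  {phi2}   {rw}    {rd_n}   {wr_n}")
```

Both strobes stay inactive while PHI2 is low, so the device only sees a read or write during the half of the cycle when the address and data are valid.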

Skylie33 · Post by **Skylie33** » Sat Apr 04, 2020 8:43 am

DerTrueForce wrote:

/CE is the chip enable, or chip select. This is generated by the decoding circuitry.
/RD and /WR are the read and write lines. These are Intel-style signals, but they're easy to generate from the 6502's phase-2 and R/W. They're the same signals your ROM and RAM almost certainly use.

Right. Thank you for clarifying that!

Chromatix · Post by **Chromatix** » Sat Apr 04, 2020 8:46 am

Right. Since the Phi2 clock is an input to the SPI interface anyway (so that the SPI clock can be generated from it), you could also use /CE and R/W to provide a true 65xx style interface. That would look like the one provided by the 6551 and 6522.

cjs · Post by **cjs** » Sat Apr 04, 2020 9:06 am

Skylie33 wrote:

It seems more convenient to use the ATF750CL as I don't think I'd need more than one SPI device. However, I'm not very well informed when it comes to how SD cards work and how I'd program the ATF750CL to read/write the data.

I cannot recommend strongly enough doing a cheap bit-bang SPI interface and getting it working before designing and building custom hardware to assist you with SPI (unless perhaps that custom hardware is a microcontroller-level system that does all the work for you). Premature optimization without knowing exactly what needs optimizing and, even more importantly, what (due to the protocol details) can't be optimized has a high likelihood of leading to a hardware assist that doesn't work. Don't make your SPI interface another Commodore 1541! :-)
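For anyone wondering what the bit-bang loop amounts to, here is a language-neutral sketch in Python. The `LoopbackPins` class is a stand-in for whatever port bits you actually have (its names are hypothetical), and it simply echoes MOSI back on MISO so the loop can be exercised without hardware. It assumes SPI mode 0 (clock idles low, data sampled on the rising edge).

```python
class LoopbackPins:
    """Hypothetical pin interface; MISO echoes MOSI so the loop can be tested."""

    def __init__(self):
        self.mosi = 0
        self.sck = 0

    def set_mosi(self, bit):
        self.mosi = bit

    def set_sck(self, level):
        self.sck = level

    def read_miso(self):
        return self.mosi


def bitbang_transfer(pins, tx_byte):
    """Exchange one byte, MSB first, one clock pulse per bit."""
    rx = 0
    for i in range(7, -1, -1):
        pins.set_mosi((tx_byte >> i) & 1)   # set up data while the clock is low
        pins.set_sck(1)                     # rising edge: both sides sample
        rx = (rx << 1) | pins.read_miso()
        pins.set_sck(0)                     # falling edge: ready for next bit
    return rx


print(hex(bitbang_transfer(LoopbackPins(), 0x5A)))
```

On a 65xx the same loop is a handful of port writes and a shift per bit, which is why it is slow but very easy to get working first.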

Skylie33 · Post by **Skylie33** » Sat Apr 04, 2020 9:21 am

cjs wrote:

I cannot recommend strongly enough doing a cheap bit-bang SPI interface and getting it working before designing and building custom hardware to assist you with SPI (unless perhaps that custom hardware is a microcontroller-level system that does all the work for you). Premature optimization without knowing exactly what needs optimizing and, even more importantly, what (due to the protocol details) can't be optimized has a high likelihood of leading to a hardware assist that doesn't work. Don't make your SPI interface another Commodore 1541!

Yes, the plan is to make a breadboard prototype with the bit-banging method before I go any further. I giggled at your last comment! Definitely wouldn't want that.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sat Apr 04, 2020 9:36 am

cjs wrote:

BigDumbDinosaur wrote:

cjs wrote:

Hm! So that sounds like yet another reason to move the direct page to cover your I/O addresses when loading data, and then run the load routine in the bank you're loading. That would fix this problem, would it not?

...While accessing an I/O register as a direct page location does eliminate one clock cycle per access—assuming DP points at a page boundary—...

Um, I think two cycles per access in the situation we were talking about, right?

Nope. Non-indexed absolute loads and stores require four cycles, unless the load or store is 16 bits, in which case one additional cycle is required. Non-indexed zero (direct) page loads and stores require three cycles (8-bit transfer) or four (16-bit transfer). However, if the 65C816's DP is pointing to an address that is not on a page boundary a DP load or store will incur a one cycle penalty for each access.

Quote:

I was responding to your earlier comment:

BigDumbDinosaur wrote:

Something to be aware of is reading data from a fixed address with a 65C816, as would be the case with disk I/O, will involve long indirection if the data is going into or coming out of a different bank than the one in which the I/O device is located. Indirection of any kind costs clock cycles because it involves additional internal steps in the MPU. Any 24-bit load or store will incur a one clock cycle penalty for each access.

What I understood you to be saying here is that to load data from the address used for input from your device, say, $C012, you expected one would be using absolute long LDA $00C012, a 4 byte/5 cycle instruction. (I don't actually see any indirection here, though; the address is being used as given, not loaded from another address.) I proposed replacing that with LDA $12 with DP set to $C0 (2 bytes/3 cycles).

Reiterating an earlier statement, unless the hardware register and data structure being used with it are in the same bank, either a hard-coded 24-bit address is needed to access one or the other, or a 24-bit direct page pointer is needed if more than one structure may be associated with the hardware in question. As an example of the latter case, when I finally get my extended memory version of POC functioning, there will be the capability to read a block from one of the disks and deposit it in memory almost anywhere. "Almost anywhere" has to be defined in a 24-bit direct page pointer, since "almost anywhere" includes a bank other than the one in which the I/O hardware is present.

Quote:

I'm not seeing any issue with the target location of the data transfer, so long as you're willing to limit single transfers to 64K or less: just load an index register with $10000 - length, set the operand of your STA instruction to destaddr - ($10000 - length), and loop until the index hits 0. Yes, this requires self-modifying code, but it's pretty innocuous as far as self-modifying code goes.

Yes that would work, but only if the program is running in RAM that hasn't been write-protected. However, consider that 24-bit indirect indexed requires 6 cycles, but is far more flexible, plus can easily straddle bank boundaries, making loads greater than 64K quite easy.
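The arithmetic behind the count-up-to-zero index trick quoted above can be checked with a small model. This is only a sketch of the address calculation for an absolute long indexed store with a 16-bit index register; the function name and the example addresses are made up.

```python
def store_addresses(destaddr, length):
    """Model STA long,X with X counting up from $10000 - length until it wraps to 0."""
    if not 1 <= length <= 0x10000:
        raise ValueError("length must be between 1 and 64K")
    start_x = (0x10000 - length) & 0xFFFF
    # Patched-in operand: chosen so that operand + start_x lands on destaddr.
    operand = (destaddr - start_x) & 0xFFFFFF      # masked to the 24-bit address width
    addresses = []
    x = start_x
    while True:
        addresses.append((operand + x) & 0xFFFFFF)
        x = (x + 1) & 0xFFFF                       # INX in 16-bit index mode
        if x == 0:                                 # loop ends when the index hits 0
            break
    return operand, addresses


operand, addresses = store_addresses(0x012000, 4)
print(hex(operand), [hex(a) for a in addresses])
```

The loop needs no separate compare: the increment that wraps the index to zero sets the Z flag, so a single branch closes the loop.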

Quote:

Also, it means you need not worry about the DBR if you don't want to; STA seems to be the same number of cycles for absolute and long indexed X, according to the WDC book.

In practice, most '816 programs would not be touching DB at all, as indirection can handle all cases where ad hoc access in arbitrary banks is needed.

As for absolute indexed and absolute indexed long using the same number of cycles, that would be expected. If a 16-bit address is specified, DB has to be used to construct the base address, which then has to be added to by the index. If a 24-bit address is specified, the cycle that would have read DB will instead be used to fetch the MSB of the address.

Quote:

I went through this exercise when I was designing the SCSI and multi-channel UART drivers for my POC V1 units.... Pointing DP at hardware not only proved to be of no value in performance, it resulted in a lot of hoop-jumping in order to get at things such as indices and pointers that were needed by the driver.

If the driver needed a bunch of indices and pointers, yeah, you'd want the zero page pointing at those.

I've yet to run into a driver for 6502-type I/O hardware that didn't need some pointers. As I said, the dubious benefit of pointing direct page at hardware is more than offset by the convolutions of trying to make drivers sufficiently general to avoid resorting to self-modifying (and potentially buggy) code.

Chromatix · Post by **Chromatix** » Sat Apr 04, 2020 10:06 am

At least on the '816 you have an alternate place to stick a pointer - the stack. The stack-relative-indirect post-indexed addressing mode results in a 16-bit address to which the DBR is prepended (before indexing), and takes 7 cycles (or 8 in 16-bit mode) to complete.

This compares with the indirect-long post-indexed addressing mode which takes one less cycle, provided the DPR is page-aligned. In general the stack pointer is not page-aligned and the programmer cannot easily arrange for it to be so, so the optimisation of skipping the extra address addition cycle in that case isn't provided.
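To keep the cycle counts quoted in this thread straight, here is a small lookup sketch for loads. The base figures are the ones given in the posts above (8-bit accumulator, direct page register page-aligned); the function and table names are invented, and page-crossing penalties for indexed absolute modes are deliberately left out.

```python
# Base cycle counts for an 8-bit LDA, as quoted in the thread.
BASE_CYCLES = {
    "absolute": 4,
    "absolute long": 5,
    "direct page": 3,
    "[dp],Y": 6,        # indirect long, post-indexed
    "(sr,S),Y": 7,      # stack-relative indirect, post-indexed
}

# Modes that pay one extra cycle when the direct page register is not page-aligned.
USES_DIRECT_PAGE = {"direct page", "[dp],Y"}


def lda_cycles(mode, sixteen_bit=False, dp_page_aligned=True):
    """Cycle count for one load in the given addressing mode."""
    cycles = BASE_CYCLES[mode]
    if sixteen_bit:
        cycles += 1                     # second data byte
    if mode in USES_DIRECT_PAGE and not dp_page_aligned:
        cycles += 1                     # extra address addition cycle
    return cycles


for mode in BASE_CYCLES:
    print(f"{mode:14s} 8-bit: {lda_cycles(mode)}  16-bit: {lda_cycles(mode, sixteen_bit=True)}")
```

Note that an unaligned direct page brings `[dp],Y` up to the same cost as `(sr,S),Y`, which is the trade-off being discussed.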

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sat Apr 04, 2020 2:36 pm

Chromatix wrote:

At least on the '816 you have an alternate place to stick a pointer - the stack. The stack-relative-indirect post-indexed addressing mode results in a 16-bit address to which the DBR is prepended (before indexing), and takes 7 cycles (or 8 in 16-bit mode) to complete.

Even more useful is reserving ephemeral stack space and pointing direct page at it so 24-bit pointers can be used. Doing so avoids the necessity of fooling around with DB, which is awkward. It's likely, of course, that the start of the local direct page will not be page-aligned, costing a clock cycle. However, that cost is offset by not having to monkey with DB. It always comes down to trade-offs.

enso · Post by **enso** » Thu Mar 25, 2021 5:48 pm

Skylie33 wrote:

I'm sure the FPGA for my video generation would have some logic space left for that [SPI hardware], but I'm not sure I'll go that route just yet...

A quick note: SPI is pretty much just a shift register. On an FPGA I can implement it in 1/2 a slice on Spartan3, or 1/4 on Spartan6. Compared to the overall size of today's FPGAs it is literally below the noise floor.

As for video generation, here is a VGA interface with all timing and sync in 5 1/2 Spartan3 slices...
https://www.fpgarelated.com/showarticle/42.php ... Add a counter and memory, and you still have thousands if not tens of thousands of slices left to do something else.
