6502.org Forum

All times are UTC




Posted: Sat Apr 04, 2020 7:20 am

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
At 8MHz SPI clock, you can theoretically get 1MB/sec into or out of the card, 8 cycles each byte. You'll be limited by how fast you can move that data into a useful place in RAM using the CPU. The SPI interface will have to wait for the CPU to catch up. It's a good place to be.
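
As a rough check on that figure, here is the arithmetic as a small Python sketch. The 8 MHz SPI clock is the number from the post; the CPU clock and the 20 cycles per byte for the copy loop are assumptions picked only for illustration, not measurements.

```python
def spi_bytes_per_second(spi_clock_hz, bits_per_byte=8):
    """Peak SPI throughput: one bit per SPI clock, eight clocks per byte."""
    return spi_clock_hz / bits_per_byte


def cpu_bytes_per_second(cpu_clock_hz, cycles_per_byte):
    """Rate at which a CPU copy loop can move bytes to or from RAM."""
    return cpu_clock_hz / cycles_per_byte


# 8 MHz SPI clock is the figure quoted in the post.
spi_rate = spi_bytes_per_second(8_000_000)

# ASSUMPTION: an 8 MHz CPU spending 20 cycles per byte in its copy loop.
cpu_rate = cpu_bytes_per_second(8_000_000, 20)

# The slower side sets the sustained rate.
sustained_rate = min(spi_rate, cpu_rate)
print(f"SPI peak {spi_rate:.0f} B/s, CPU loop {cpu_rate:.0f} B/s, "
      f"sustained {sustained_rate:.0f} B/s")
```

With these assumed numbers the CPU loop is the limit, which is the situation the post describes.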

You would program the ATF750CL to perform the parallel <-> serial conversion with correct SPI and 65xx bus signalling. Nothing more, nothing less. With a nominal Tpd of 15ns, it's plenty fast enough for the job. Write a byte to it, eight bits get sent to the SD card along with eight clock pulses, and the 8 bits sent by the card at the same time are then ready for reading. Very simple.

In CUPL it's a 9-state machine (idle, then eight active states, then back to idle). Eight of the output pins sit on the 65xx data bus; the other two are SPI clock and MOSI (and MISO is an input). In the idle state, you're sensitive to /CE, /OE and /WE, which indicate when you need to activate the output pins (.OE term), or latch them as inputs into the shift register and advance to the first active state. In each active state, the clock gets passed through to the card, the most significant bit of the shift register is presented on MOSI, and the correct clock edge both advances the shift register (pulling in the value on MISO in the process) and the active state.
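
A behavioural model of that 9-state machine might look like the sketch below. It is written in Python rather than CUPL, it is not the actual PLD source, and the class and method names are made up for illustration. It assumes MSB-first shifting and treats each call to `clock()` as one complete SPI clock pulse.

```python
class SpiShifter:
    """Toy model: state 0 is idle, states 1..8 are the active bit slots."""

    IDLE = 0

    def __init__(self):
        self.state = self.IDLE
        self.shift = 0  # 8-bit shift register

    def write(self, byte):
        """CPU write while idle: latch the byte and enter the first active state."""
        if self.state != self.IDLE:
            raise RuntimeError("transfer already in progress")
        self.shift = byte & 0xFF
        self.state = 1

    @property
    def mosi(self):
        """MOSI always presents the most significant bit of the register."""
        return (self.shift >> 7) & 1

    def clock(self, miso):
        """One SPI clock: shift left, pull in MISO, advance the state."""
        if self.state == self.IDLE:
            return
        self.shift = ((self.shift << 1) | (miso & 1)) & 0xFF
        self.state = self.state + 1 if self.state < 8 else self.IDLE

    def read(self):
        """CPU read: the byte received from the card."""
        return self.shift


def transfer(dev, out_byte, miso_bits):
    """Write a byte, run eight clocks, return (bits seen on MOSI, byte read back)."""
    dev.write(out_byte)
    sent = []
    for bit in miso_bits:
        sent.append(dev.mosi)
        dev.clock(bit)
    return sent, dev.read()


sent_bits, received = transfer(SpiShifter(), 0xA5, [1, 1, 0, 0, 0, 0, 1, 1])
print(sent_bits, hex(received))
```

The same register serves as both transmit and receive buffer, which is why one write followed by one read is enough for a full-duplex byte exchange.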

There are a few auxiliary signals that the SD card may need to see, but which the ATF750 doesn't have enough output pins to drive by itself. They'll be low-frequency, so you can bit-bang them in the normal way. The most important one is /SS or /CS, depending on naming convention. When not actively accessing the card, deselect it to save power. You'll need to have it selected for it to respond. The card socket may provide a card presence detect signal, which you should be able to read, and maybe a write-protect signal, likewise.

You would then need to talk SD card protocol over that interface. That's all software, and you'd need to do it anyway if you were bit-banging, just more slowly. IIRC there's a page or two on the web detailing practical experience with implementing it. There are enough differences between SDSC, SDHC and SDXC cards that I'd advise you to get a small clutch of bog-standard 2GB SD cards, which use the oldest and simplest version of the protocol.
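
To give a flavour of the software side, here is a sketch of building one SD command frame in Python. It is not from the post. It assumes the usual SPI-mode layout of six bytes: a start byte of 0x40 plus the command index, a 32-bit big-endian argument, then a 7-bit CRC followed by a stop bit. The function names are made up for illustration.

```python
def crc7(data):
    """CRC-7 with polynomial x^7 + x^3 + 1, processed MSB first."""
    crc = 0
    for byte in data:
        for i in range(8):
            bit = (byte >> (7 - i)) & 1
            top = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if bit ^ top:
                crc ^= 0x09
    return crc


def sd_command(index, argument):
    """Build a 6-byte command frame: start bits + index, argument, CRC7 + stop bit."""
    body = bytes([0x40 | (index & 0x3F)]) + (argument & 0xFFFFFFFF).to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 0x01])


# CMD0 (GO_IDLE_STATE) with a zero argument is normally the first command sent.
print(sd_command(0, 0).hex(" "))
```

Each byte of such a frame would go out through whatever SPI transfer routine you have, whether bit-banged or hardware-assisted.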

Finally, don't forget that SD cards are 3.3V devices. If the rest of your machine is 5V, you'll need to insert a level shifter and provide a correct power supply. Not doing so will blow up the card.


Posted: Sat Apr 04, 2020 7:30 am

Joined: Tue Mar 31, 2020 3:40 am
Posts: 33
Thank you very much for the help! It seems like this is the best solution. One question, though: how should I activate the /CE, /OE, and /WE lines? And are they all necessary? I'm assuming they'd be input pins.


Posted: Sat Apr 04, 2020 8:26 am

Joined: Sat Jun 04, 2016 10:22 pm
Posts: 483
Location: Australia
/CE is the chip enable, or chip select. This is generated by the decoding circuitry.
/RD and /WR are the read and write lines. These are Intel-style signals, but they're easy to generate from the 6502's phase-2 and R/W. They're the same signals your ROM and RAM almost certainly use.
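
One common way to derive those strobes is to qualify R/W with phase 2, so that each strobe is only active while the clock is high. The Python sketch below just prints the resulting truth table. It is an assumption about the gating (two NANDs and an inverter), not a description of any particular board, and the function name is made up.

```python
def intel_strobes(phi2, rw):
    """Return (/RD, /WR) as active-low levels from PHI2 and R/W.

    /RD = NAND(PHI2, R/W)        low only during the high half of a read cycle
    /WR = NAND(PHI2, NOT R/W)    low only during the high half of a write cycle
    """
    rd_n = 0 if (phi2 and rw) else 1
    wr_n = 0 if (phi2 and not rw) else 1
    return rd_n, wr_n


print("PHI2 R/W | /RD /WR")
for phi2 in (0, 1):
    for rw in (0, 1):
        rd_n, wr_n = intel_strobes(phi2, rw)
        print(f"   {phi2}   {rw} |   {rd_n}   {wr_n}")
```

Both strobes stay inactive while PHI2 is low, so nothing is read or written while the address lines are still settling.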


Posted: Sat Apr 04, 2020 8:43 am

Joined: Tue Mar 31, 2020 3:40 am
Posts: 33
DerTrueForce wrote:
/CE is the chip enable, or chip select. This is generated by the decoding circuitry.
/RD and /WR are the read and write lines. These are Intel-style signals, but they're easy to generate from the 6502's phase-2 and R/W. They're the same signals your ROM and RAM almost certainly use.


Right. Thank you for clarifying that!


Posted: Sat Apr 04, 2020 8:46 am

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Right. Since the Phi2 clock is an input to the SPI interface anyway (so that the SPI clock can be generated from it), you could also use /CE and R/W to provide a true 65xx style interface. That would look like the one provided by the 6551 and 6522.


Posted: Sat Apr 04, 2020 9:06 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
Skylie33 wrote:
It seems more convenient to use the ATF750CL as I don't think I'd need more than one SPI device. However, I'm not very well informed when it comes to how SD cards work and how I'd program the ATF750CL to read/write the data.

I cannot recommend strongly enough doing a cheap bit-bang SPI interface and getting it working before designing and building custom hardware to assist you with SPI (unless perhaps that custom hardware is a microcontroller-level system that does all the work for you). Premature optimization without knowing exactly what needs optimizing and, even more importantly, what (due to the protocol details) can't be optimized has a high likelihood of leading to a hardware assist that doesn't work. Don't make your SPI interface another Commodore 1541! :-)

_________________
Curt J. Sampson - github.com/0cjs


Posted: Sat Apr 04, 2020 9:21 am

Joined: Tue Mar 31, 2020 3:40 am
Posts: 33
cjs wrote:
I cannot recommend strongly enough doing a cheap bit-bang SPI interface and getting it working before designing and building custom hardware to assist you with SPI (unless perhaps that custom hardware is a microcontroller-level system that does all the work for you). Premature optimization without knowing exactly what needs optimizing and, even more importantly, what (due to the protocol details) can't be optimized has a high likelihood of leading to a hardware assist that doesn't work. Don't make your SPI interface another Commodore 1541! :-)

Yes, the plan is to make a breadboard prototype with the bit-banging method before I go any further. I giggled at your last comment! Definitely wouldn't want that.


Posted: Sat Apr 04, 2020 9:36 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8182
Location: Midwestern USA
cjs wrote:
BigDumbDinosaur wrote:
cjs wrote:
Hm! So that sounds like yet another reason to move the direct page to cover your I/O addresses when loading data, and then run the load routine in the bank you're loading. That would fix this problem, would it not?

...While accessing an I/O register as a direct page location does eliminate one clock cycle per access—assuming DP points at a page boundary—...

Um, I think two cycles per access in the situation we were talking about, right?

Nope. Non-indexed absolute loads and stores require four cycles, unless the load or store is 16 bits, in which case one additional cycle is required. Non-indexed zero (direct) page loads and stores require three cycles (8-bit transfer) or four (16-bit transfer). However, if the 65C816's DP is pointing to an address that is not on a page boundary, a DP load or store will incur a one cycle penalty for each access.
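
Those rules can be written down as a small Python helper for comparing the cases. This is only a restatement of the counts given above, and the function and parameter names are made up for illustration.

```python
def load_store_cycles(mode, wide=False, d_register=0x0000):
    """Cycle count for a non-indexed 65C816 load or store, per the rules above.

    mode        "abs" (16-bit absolute) or "dp" (direct page)
    wide        True for a 16-bit transfer
    d_register  value of the direct page register; only its low byte matters
    """
    base = {"abs": 4, "dp": 3}[mode]
    cycles = base + (1 if wide else 0)
    # Direct page not on a page boundary costs one extra cycle per access.
    if mode == "dp" and (d_register & 0xFF) != 0:
        cycles += 1
    return cycles


for mode, wide, d in [("abs", False, 0), ("abs", True, 0),
                      ("dp", False, 0xC000), ("dp", True, 0xC000),
                      ("dp", False, 0xC012)]:
    print(mode, "16-bit" if wide else "8-bit", hex(d),
          load_store_cycles(mode, wide, d))
```

So with a page-aligned DP the saving over absolute addressing is one cycle per access, and a misaligned DP gives that cycle straight back.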

Quote:
I was responding to your earlier comment:
BigDumbDinosaur wrote:
Something to be aware of is reading data from a fixed address with a 65C816, as would be the case with disk I/O, will involve long indirection if the data is going into or coming out of a different bank than the one in which the I/O device is located. Indirection of any kind costs clock cycles because it involves additional internal steps in the MPU. Any 24-bit load or store will incur a one clock cycle penalty for each access.

What I understood you to be saying here is that to load data from the address used for input from your device, say, $C012, you expected one would be using absolute long LDA $00C012, a 4 byte/5 cycle instruction. (I don't actually see any indirection here, though; the address is being used as given, not loaded from another address.) I proposed replacing that with LDA $12 with DP set to $C0 (2 bytes/3 cycles).

Reiterating an earlier statement, unless the hardware register and data structure being used with it are in the same bank, either a hard-coded 24-bit address is needed to access one or the other, or a 24-bit direct page pointer is needed if more than one structure may be associated with the hardware in question. As an example of the latter case, when I finally get my extended memory version of POC functioning, there will be the capability to read a block from one of the disks and deposit it in memory almost anywhere. "Almost anywhere" has to be defined in a 24-bit direct page pointer, since "almost anywhere" includes a bank other than the one in which the I/O hardware is present.

Quote:
I'm not seeing any issue with the target location of the data transfer, so long as you're willing to limit single transfers to 64K or less: just load an index register with $10000 - length, set the operand of your STA instruction to destaddr - ($10000 - length), and loop until the index hits 0. Yes, this requires self-modifying code, but it's pretty innocuous as far as self-modifying code goes.

Yes that would work, but only if the program is running in RAM that hasn't been write-protected. However, consider that 24-bit indirect indexed requires 6 cycles, but is far more flexible, plus can easily straddle bank boundaries, making loads greater than 64K quite easy.
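
For anyone following along, the index arithmetic in the quoted suggestion can be modelled in Python as below. This is a simulation sketch, not 65C816 code. It assumes a 16-bit index register that wraps to zero, a 24-bit operand of the kind used by STA long,X, and that the patched operand works out to the destination address minus the starting index. Memory is just a dictionary here, and the function name is made up.

```python
def copy_with_wrapping_index(memory, dest, data):
    """Store data at dest using an index that counts up and stops when it wraps to 0."""
    length = len(data)
    if not 0 < length <= 0x10000:
        raise ValueError("single transfers are limited to 64K")
    x = (0x10000 - length) & 0xFFFF      # starting index
    operand = (dest - x) & 0xFFFFFF      # the value patched into the STA operand
    source = iter(data)
    while True:
        memory[(operand + x) & 0xFFFFFF] = next(source)   # STA operand,X
        x = (x + 1) & 0xFFFF                              # INX
        if x == 0:                                        # BNE falls through
            break
    return operand


ram = {}
patched = copy_with_wrapping_index(ram, 0x012000, b"\x01\x02\x03")
print(hex(patched), {hex(a): v for a, v in ram.items()})
```

The loop needs no separate length counter or compare, since the index hitting zero is the termination test. That is the attraction of the trick, at the price of patching the operand for every transfer.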

Quote:
Also, it means you need not worry about the DBR if you don't want to; STA seems to be the same number of cycles for absolute and long indexed X, according to the WDC book.

In practice, most '816 programs would not be touching DB at all, as indirection can handle all cases where ad hoc access in arbitrary banks is needed.

As for absolute indexed and absolute indexed long using the same number of cycles, that would be expected. If a 16-bit address is specified, DB has to be used to construct the base address, to which the index is then added. If a 24-bit address is specified, the cycle that would have read DB will instead be used to fetch the MSB of the address.

Quote:
Quote:
I went through this exercise when I was designing the SCSI and multi-channel UART drivers for my POC V1 units.... Pointing DP at hardware not only proved to be of no value in performance, it resulted in a lot of hoop-jumping in order to get at things such as indices and pointers that were needed by the driver.

If the driver needed a bunch of indices and pointers, yeah, you'd want the zero page pointing at those.

I've yet to run into a driver for 6502-type I/O hardware that didn't need some pointers. As I said, the dubious benefit of pointing direct page at hardware is more than offset by the convolutions of trying to make drivers sufficiently general to avoid resorting to self-modifying (and potentially buggy) code.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Sat Apr 04, 2020 10:06 am

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
At least on the '816 you have an alternate place to stick a pointer - the stack. The stack-relative-indirect post-indexed addressing mode results in a 16-bit address to which the DBR is prepended (before indexing), and takes 7 cycles (or 8 in 16-bit mode) to complete.

This compares with the indirect-long post-indexed addressing mode which takes one less cycle, provided the DPR is page-aligned. In general the stack pointer is not page-aligned and the programmer cannot easily arrange for it to be so, so the optimisation of skipping the extra address addition cycle in that case isn't provided.


Posted: Sat Apr 04, 2020 2:36 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8182
Location: Midwestern USA
Chromatix wrote:
At least on the '816 you have an alternate place to stick a pointer - the stack. The stack-relative-indirect post-indexed addressing mode results in a 16-bit address to which the DBR is prepended (before indexing), and takes 7 cycles (or 8 in 16-bit mode) to complete.

Even more useful is reserving ephemeral stack space and pointing direct page at it so 24-bit pointers can be used. Doing so avoids the necessity of fooling around with DB, which is awkward. It's likely, of course, that the start of the local direct page will not be page-aligned, costing a clock cycle. However, that cost is offset by not having to monkey with DB. It always comes down to trade-offs.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Thu Mar 25, 2021 5:48 pm

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 899
Skylie33 wrote:
I'm sure the FPGA for my video generation would have some logic space left for that [SPI hardware], but I'm not sure I'll go that route just yet...

A quick note: SPI is pretty much just a shift register. On an FPGA I can implement it in 1/2 a slice on Spartan3, or 1/4 on Spartan6. Compared to the overall size of today's FPGAs it is literally below the noise floor.

As for video generation, here is a VGA interface with all timing and sync in 5 1/2 Spartan3 slices...
https://www.fpgarelated.com/showarticle/42.php
...Add a counter and memory, and you still have thousands if not tens of thousands of slices left to do something else.

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut

