6502.org • View topic - A cycle-efficient scheme for e.g. an SPI interface

All times are UTC

A cycle-efficient scheme for e.g. an SPI interface

BigEd

Post subject: A cycle-efficient scheme for e.g. an SPI interface

Posted: Wed Feb 15, 2017 9:53 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10943
Location: England

In the context of memory-mapped peripherals which occupy several addresses and use some of those address bits as commands or modifiers, Hans Franke described an ingenious SPI interface he's built:

Quote:

...it's a working FPGA implementation. I did it like 2 or 3 years ago as part of a larger idea to make a TSA (Triple Serial Adaptor - giving that 3 letter acronym a new positive meaning) featuring a two channel 16550 serial interface, an I2C controller and the above SPI interface. Well, I never really got past the SPI.

After all, a (memory mapped) device doesn't just get the data lines, but also address lines. So by assigning additional address lines to a single device, they can be used to transfer data - at least toward the device. The [Weitek] Abacus FPUs were a good example, as they reserved 64k, thus 16 bits for their 'registers'. Of the 16 bits, 6 were used as command (the rest addresses source/destination operands, and parts thereof.)

(Another nice and performance enhancing measure [on the Weitek] is a several instruction deep command buffer.)

I used a somewhat similar approach with a 6502 based SPI interface. While [the databus carried] the data to be transmitted, address lines were used for commands identifying and controlling the select lines and framing:

Address A: Writing starts a new transfer (CS pulled) and the byte written transferred accordingly; CS is left active afterwards.
Address B: Writing transfers the byte and releases CS when done.

To transfer one byte as a single frame, only one write to B is necessary. Can't be shorter, can it? For multi-byte transfers, all bytes except the last are written to A, while the last goes to B. Again, all that's needed is writing the data you want to be transmitted.
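The write-side framing can be sketched as a toy software model. This is only an illustration of the behaviour described for addresses A and B, not the FPGA design itself; the class and method names are made up, and real transfers take time while this model completes them instantly.

```python
class SpiWriteModel:
    """Toy model of the write-side addresses A and B (names are hypothetical)."""

    def __init__(self):
        self.cs_active = False   # True while chip select is asserted
        self.frames = []         # completed frames, each a list of bytes
        self._current = []       # bytes sent in the frame that is still open

    def _shift_out(self, value):
        # Any data write asserts CS if it is not active yet, then sends the byte.
        self.cs_active = True
        self._current.append(value & 0xFF)

    def write_a(self, value):
        """Address A: send the byte and leave CS active."""
        self._shift_out(value)

    def write_b(self, value):
        """Address B: send the byte, then release CS, which closes the frame."""
        self._shift_out(value)
        self.cs_active = False
        self.frames.append(self._current)
        self._current = []


spi = SpiWriteModel()
spi.write_b(0x9F)                 # one-byte frame: a single write to B
for b in (0x03, 0x00, 0x10):      # multi-byte frame: all but the last go to A
    spi.write_a(b)
spi.write_b(0x00)                 # the last byte goes to B and ends the frame
print(spi.frames)
```

The caller never touches CS explicitly; the choice of address carries the framing information.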

With other designs, the CPU must first assert CS (which is at least one 2-clock instruction, usually rather 4-7), then write the data, then deselect (again 2..7 clocks). Instead of transferring this additional information for frame handling through a classic command port, memory-mapped commands will do it.
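As a rough back-of-the-envelope check of that saving, the per-frame cycle counts can be compared like this. The 2..7 cycle range for CS handling comes from the text; the 4 cycles per data write (an absolute-addressed STA) and the function names are assumptions for illustration.

```python
def classic_frame_cycles(n_bytes, cs_cycles, write_cycles=4):
    """Assert CS, write each byte, release CS (write_cycles=4 is an assumption)."""
    return cs_cycles + n_bytes * write_cycles + cs_cycles


def address_coded_frame_cycles(n_bytes, write_cycles=4):
    """Framing rides on the address, so only the data writes remain."""
    return n_bytes * write_cycles


for cs in (2, 7):
    saved = classic_frame_cycles(1, cs) - address_coded_frame_cycles(1)
    print(f"CS handling at {cs} cycles each: {saved} cycles saved per frame")
```

The saving is per frame, so it matters most for the short transfers typical of control applications.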

For reading I extended this beyond frame control by including certain addresses for default data:

Address C: Reading delivers the last byte read
Address D: Reading delivers the last byte read and CS is released
Address E: Reading delivers the last byte, clears the transmit register and starts another transfer cycle

While the use of Address C is as expected, D might seem superfluous, as the CS release would already have been done by writing the last request byte via address B. It becomes useful with Address E, where a standard case of reading from a device is handled. SPI is designed in a way where even reading requires sending a byte for every one received. The content doesn't matter, so often 00 or FF is used. With a classic SPI control, one would have to write some byte or at least do some kind of operation to retransmit. For a 6502 that's again 2..9 clocks overhead per byte, depending on the interface design. With the automatic creation and transmission via Address E, the overhead goes away, and reading from a ROM or SD card takes the least possible effort: just reading. At this point, of course, we need a way to read the last byte without starting a new transmission, but still deactivate CS. This is where D comes in.
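One way to read the C/D/E description is the following toy model. It is a sketch under assumptions: transfers complete instantly, the fill byte is $00, the command byte 0x0B and the device contents are invented, and the 0xFF returned once the pretend device runs dry is arbitrary.

```python
from collections import deque


class SpiReadModel:
    """Toy model of read addresses C, D and E (all names are hypothetical)."""

    def __init__(self, device_bytes, fill=0x00):
        self.device = deque(device_bytes)  # bytes the slave returns, in order
        self.fill = fill                   # value sent automatically by E
        self.cs_active = False
        self.rx = 0x00                     # last byte received
        self.sent = []                     # every byte shifted out so far

    def _transfer(self, tx):
        # One full-duplex exchange: send tx, latch whatever the device returns.
        self.cs_active = True
        self.sent.append(tx)
        self.rx = self.device.popleft() if self.device else 0xFF

    def write_a(self, value):
        """Address A: send a byte, keep CS active."""
        self._transfer(value & 0xFF)

    def read_c(self):
        """Address C: deliver the last byte received."""
        return self.rx

    def read_d(self):
        """Address D: deliver the last byte received and release CS."""
        self.cs_active = False
        return self.rx

    def read_e(self):
        """Address E: deliver the last byte, then start a fill-byte transfer."""
        value = self.rx
        self._transfer(self.fill)
        return value


spi = SpiReadModel([0xAA, 0x11, 0x22, 0x33])
spi.write_a(0x0B)                      # command byte; device answers with junk
spi.read_e()                           # discard the junk, start first data byte
data = [spi.read_e(), spi.read_e()]    # each read also triggers the next byte
data.append(spi.read_d())              # last byte: no new transfer, CS released
print(data, spi.sent)
```

Under this reading, fetching N bytes takes one priming read of E, N-1 further reads of E and a final read of D, with no writes at all after the command.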

Reading from a SD-Card now can run as fast as possible, even in higher clocked systems. An SPI transfer needs 8 clocks (plus 1 as safety for CS handling), the minimal read/store combination in a 6502 is 6 clocks (LDA zp; STA zp), so an SPI clock of 1.5x CPU clock will do the trick. Thus, with 20 MHz SPI clock, it needs a CPU faster than 13.33 MHz to create an overrun.

But that's rather theoretical, as a useful loop will be more like 10-13 clocks (LDA zp, STA abs,X, INX, BNE), so even the fastest 6502 can't outrun it on a 1:1 clock. On the other hand, even a 1 MHz 6502 can reach an effective read rate of roughly 80 kB/s.
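The arithmetic behind those figures can be checked with a few lines. The 9 SPI clocks per byte, the 6-cycle minimum and the 10-13 cycle loop are taken from the text; the function names are mine, and kB is assumed to mean 1000 bytes.

```python
SPI_CLOCKS_PER_BYTE = 8 + 1   # 8 data clocks plus 1 safety clock for CS handling


def min_spi_to_cpu_ratio(cpu_cycles_per_byte):
    """SPI clock, as a multiple of the CPU clock, needed to keep up with the CPU."""
    return SPI_CLOCKS_PER_BYTE / cpu_cycles_per_byte


def max_cpu_mhz_without_overrun(spi_mhz, cpu_cycles_per_byte):
    """Fastest CPU clock that a given SPI clock can still serve."""
    return spi_mhz / min_spi_to_cpu_ratio(cpu_cycles_per_byte)


def read_rate_kb_per_s(cpu_mhz, cpu_cycles_per_byte):
    """Bytes moved per second by the CPU loop, in units of 1000 bytes."""
    return cpu_mhz * 1000 / cpu_cycles_per_byte


print(min_spi_to_cpu_ratio(6))                 # minimal LDA/STA pair
print(max_cpu_mhz_without_overrun(20, 6))      # 20 MHz SPI clock
print(min_spi_to_cpu_ratio(13))                # realistic 13-cycle loop
print(read_rate_kb_per_s(1, 13), read_rate_kb_per_s(1, 12))
```

For the 13-cycle loop the required ratio drops below 1, which is why an SPI clock equal to the CPU clock is already enough there.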

(Lightly edited with permission)

MichaelM

Post subject: Re: A cycle-efficient scheme for e.g. an SPI interface

Posted: Wed Feb 15, 2017 11:16 am

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL

Thanks for posting.

The description provides some good ideas that may be useful when modifying my SPI Master peripheral in the future. You need free address space for implementing the memory-mapped interface ideas described. For the M16C5x project the SPI Master interface had to map into a limited number of addresses since it was being added to a PIC soft-core. For the M65C02/M65C02A projects, the SPI Master has a bit more freedom with the address space available. I may consider modifying my M65C02/M65C02A SPI Master core in the manner described.

The design philosophy described that the peripheral can relieve the processor of some of the "drudge" work necessary to execute a data exchange is well worth considering, particularly when the potential improvement in throughput is as described above.

_________________
Michael A.

Raffzahn

Post subject: Re: A cycle-efficient scheme for e.g. an SPI interface

Posted: Wed Feb 15, 2017 2:41 pm

Joined: Wed Jun 09, 2010 6:58 pm
Posts: 2
Location: Munich, FRG

MichaelM wrote:

The description provides some good ideas that may be useful when modifying my SPI Master peripheral in the future.

Feel free to use it. I don't mind being referred to, even considering that it's not special in any way, just the usual mapping of a protocol into simple hardware. I'm from a time when design was about functionality, not 'the way it's told'.

Being an old-timer, I'm still thinking in TTL structures, not really in terms of programmed logic. That's also the reason why $00 is transferred during read operations, as clearing the transmit register can be obtained by selecting the right 74xxx, while setting it to all ones would require another circuit. You may want to change this to $FF, as it reduces power consumption during reads.

MichaelM wrote:

You need free address space for implementing the memory-mapped interface ideas described. For the M16C5x project the SPI Master interface had to map into a limited number of addresses since it was being added to a PIC soft-core. For the M65C02/M65C02A projects, the SPI Master has a bit more freedom with the address space available. I may consider modifying my M65C02/M65C02A SPI Master core in the manner described.

From an outside view only 3 addresses are needed, as A/B can share addresses with C/D. A/B are write-only, while C/D are read-only. Even more, they only differ in CS handling, so they can be formulated as one unit. Only E offers additional functionality, as it clears (well, or sets to $FF) the transmit register and starts a transmit cycle as if that value had been written to address A.

The assignment of different registers to one address and separation via R/W (or WR and RD) is just another example of the same philosophy of using _all_ interface lines to transfer meaningful data.
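A decode table makes the sharing concrete. This is a sketch only: the pairing of A with C and B with D follows the text, but the actual offsets, the use of two address bits, and the control/status pair at offset 3 are assumptions for illustration.

```python
# Hypothetical register map. Offsets 0-2 follow the A/B/C/D/E description;
# offset 3 (control/status) is purely an assumption.
REGISTER_MAP = {
    (0, "write"): "A",        # send byte, leave CS active
    (0, "read"):  "C",        # return last received byte
    (1, "write"): "B",        # send byte, release CS when done
    (1, "read"):  "D",        # return last received byte, release CS
    (2, "read"):  "E",        # return last byte, start a fill-byte transfer
    (3, "write"): "CONTROL",  # assumed
    (3, "read"):  "STATUS",   # assumed
}


def decode(a1, a0, rw):
    """Map two address bits and the R/W line (1 = read, 0 = write) to a function."""
    offset = (a1 << 1) | a0
    direction = "read" if rw else "write"
    return REGISTER_MAP.get((offset, direction))  # None if nothing is assigned


for a1, a0, rw in [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 1)]:
    print(a1, a0, rw, decode(a1, a0, rw))
```

The R/W line acts as one more address bit here, so four locations carry up to eight distinct functions.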

In my implementation another register pair is used, mostly for the usual aspects (POL, CLK, device select, etc.). Two status bits might be interesting for you:

SUA - SetUp Active - The interface has been configured
TFA - TransFer Active - A Transfer is active (CS pulled)

While TFA basically just echoes the CS state, SUA gets set whenever there is a write access to (one of) the control register(s). It only gets reset when CS is released.

These bits come in handy as locking semaphores for concurrent access in either a multitasking environment, or just plain foreground and interrupt routines using the same SPI controller. Since they are derived from and within the hardware in use, they provide a fast device allocation method with (next to) zero software overhead. Of course it's not perfect, nor meant for complicated situations, but when your OS is that complex, you'd need additional software layers anyway.

Whenever SUA (or TFA) is set, the SPI interface is occupied by some other process. For a foreground process this means it needs to sleep a bit (or use whatever mechanisms are provided) and acquire the device later. Similarly during an interrupt (*1). This is all due to the fact that SPI transfers can be slowed down, but usually not interrupted. So the SPI controller resource is not shareable during a transfer (*2). As with every semaphore setting, the check/set process needs to be safe within an SEI/CLI region. But thanks to our flags, only for a maximum of 10-14 cycles. (*3)
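The check/set sequence could look roughly like this in a software model. The bit positions, class and function names are invented; on a real 6502 the body of try_acquire is the part that would have to run with interrupts disabled.

```python
SUA = 0x01  # SetUp Active (bit position is an assumption)
TFA = 0x02  # TransFer Active (bit position is an assumption)


class SpiControllerModel:
    """Toy model of the two status bits as described in the text."""

    def __init__(self):
        self.status = 0
        self.config = None

    def write_control(self, value):
        # Any write to a control register sets SUA.
        self.config = value
        self.status |= SUA

    def start_transfer(self):
        # TFA echoes the CS state: set while CS is pulled.
        self.status |= TFA

    def release_cs(self):
        # Releasing CS clears TFA and is the only thing that resets SUA.
        self.status &= ~(SUA | TFA)


def try_acquire(ctrl, config):
    """Check-and-set; must not be interrupted between the test and the write."""
    if ctrl.status & (SUA | TFA):
        return False            # occupied: caller has to retry later
    ctrl.write_control(config)  # this write sets SUA and so claims the device
    return True


ctrl = SpiControllerModel()
print(try_acquire(ctrl, 0x10))  # foreground task claims the controller
print(try_acquire(ctrl, 0x20))  # a second claimant is turned away
ctrl.start_transfer()
ctrl.release_cs()
print(try_acquire(ctrl, 0x20))  # free again once CS has been released
```

Because the flag is set by the configuration write itself, claiming the device costs no extra instruction beyond the initial test.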

So even with the control register all I need is 5 control lines (A1, A0, R/W, Phi, SEL).

MichaelM wrote:

The design philosophy described that the peripheral can relieve the processor of some of the "drudge" work necessary to execute a data exchange is well worth considering, particularly when the potential improvement in throughput is as described above.

Performance it is :twisted:

Anyway, thanks for your interest. It would be nice to see any implementation of yours. And BTW, I liked the way you structured and documented your M16C5x project and files.

H.

--
*1 - Well, by modifying SUA/TFA into some kind of state numbering scheme like Reserved/Setup/Transfer Started/Last Transfer/Released, an interrupt could see not only if the device is in use, but also if the release is imminent (Last Transfer). Considering that going into an interrupt on a 6502 needs at least 7 cycles, plus 3 to check the state via BIT, it would only be useful with extremely slow devices with less than CPU CLK/16 timing. Even assuming that transfers in a control environment are usually rather short, so that waiting might be useful, it doesn't relieve the need for something to postpone the interrupt in case it's not the last byte. Furthermore, the slower the devices get, the less acceptable wait loops for release during an interrupt become. So, still not sure if a more complicated status messaging is helpful at all.

*2 - Well, there are devices, most notably memories, offering a way to suspend activity via a hold input, but support for this would need quite some hardware (i.e. a last byte buffer for each level on hold) - at least for a general purpose solution - and even more complex software.

*3 - Using a 65C02 opens an option to add a mechanism using TSB on SUA (in a parallel, writable address) as a 'device reserved' mechanism to remove the need for SEI/CLI. But that opens another list of constraints for the design.
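For readers unfamiliar with TSB, its effect can be modelled in a few lines. The 65C02 instruction sets Z according to A AND memory and then ORs A into memory, all as one read-modify-write instruction; the register layout, bit value and helper names below are assumptions.

```python
RESERVE_BIT = 0x01  # hypothetical position of the writable SUA bit


def tsb(memory, address, accumulator):
    """Model of 65C02 TSB: Z reflects A AND M, then M becomes M OR A."""
    z_flag = (memory[address] & accumulator) == 0
    memory[address] |= accumulator
    return z_flag


def try_reserve(memory, address):
    # Z set means the bit was clear before, so this caller now owns the device.
    return tsb(memory, address, RESERVE_BIT)


regs = bytearray(1)            # stand-in for the hardware register
first = try_reserve(regs, 0)   # succeeds, bit is now set
second = try_reserve(regs, 0)  # fails, bit was already set
print(first, second, regs[0])
```

Since an interrupt cannot be taken in the middle of an instruction, the test and the set need no interrupt masking around them.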

Last edited by Raffzahn on Thu Feb 16, 2017 9:21 am, edited 2 times in total.

MichaelM

Post subject: Re: A cycle-efficient scheme for e.g. an SPI interface

Posted: Wed Feb 15, 2017 11:11 pm

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL

Raffzahn wrote:

Anyway, thanks for your interest. It would be nice to see any implementation of yours. And BTW, I liked the way you structured and documented your M16C5x project and files.

You can find some discussions of these efforts at the following links: M16C5X and M65C02.

_________________
Michael A.
