
All times are UTC




Posted: Tue May 31, 2016 7:44 am

Joined: Tue May 03, 2016 11:32 am
Posts: 41
Hi all,

I am trying to get SPI working on an interface card I am creating for my Apple ][ (actually an Apple IIgs). I am trying to use the implementation by André for his CS/A65, rev 1.1A. I had to re-create the PHI2 signal, as the Apple ][ bus doesn't include it, by using a 74HCT74 as per Apple's TIL #494 info.

I had to rewrite André's driver, as I use indexed I/O to the card (using /DEVSEL), but it was almost entirely a copy-and-paste job.
However, SCLK seems to be out of sync with the MOSI/MISO signals, and I am at a loss right now as to where the culprit might be.

I have attached both the screenshot of my logic analyzer (Bitscope micro/Bitscope Logic combo) and the source, written for ca65.

Does anyone know what the solution to my 'problem' might be? I see this issue both when shifting under T2 and when shifting under PHI2.

Thanks in advance!


Attachments:
File comment: Source file of my test driver
main.s.zip [7.52 KiB]
File comment: Screen capture of Bitscope Logic
Screen Shot 2016-05-26 at 21.08.23.png [ 63.29 KiB ]
Posted: Tue May 31, 2016 8:34 pm

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
I'm very curious about this. I've been exploring SPI a teeny bit recently, and I've seen the things Andre has done. I will admit I didn't understand much of it.

It seems that SPI is, in essence, really quite simple: chip select the chip, put the value you want on the data pin, and toggle the clock pin. Rinse and repeat.

If I had two output ports A and B, and had pin 0 of A set to the clock, and pin 0 of B set to the data out line, shouldn't this "just work"?

Code:
      LDA valueToSend ; Value to stream out
      LDY #8          ; Number of bits (immediate operand)
LOOP: STA PORT_B      ; Store A to PORT_B, thus setting Pin 0 with LSBit
      LDX #00         ; Toggle the clock
      STX PORT_A      ; down
      LDX #01         ; ... then ...
      STX PORT_A      ; up
      LSR             ; Next bit
      DEY             ; around we go
      BNE LOOP


Assuming that the device is fast enough to keep up, and yea, not the most efficient use of the ports, but… shouldn't this "just work"? Maybe I have the clock backward, but, on principle, isn't it supposed to be "this simple"?

Given that base understanding, I didn't grok the extra complexities and difficulties that Andre was talking about.


Posted: Tue May 31, 2016 9:19 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Mr. Hartung, SPI goes msb first, unlike RS-232. Otherwise your code looks right. Note that as long as you know the state of the clock pin, and it's on bit 0, you can cycle it without affecting the other bits using just INC<port> and DEC<port>. (The order will depend on the SPI mode.) And if you were going to take up the whole port, you might as well just shift the port itself to shorten the code. I have a section on bit-banging SPI in the 6502 primer's "circuit potpourri" page, at http://wilsonminesco.com/6502primer/pot ... ITBANG_SPI, and sample code to go with it at http://wilsonminesco.com/6502primer/SPI.ASM .

SPI can do input and output at the same time, but I don't think any of the SPI parts I've used took advantage of that. For MISO (the master-in, slave-out data line), it's nice to put it on bit 6 or 7 of a port so you can use the BIT instruction on it, regardless of what's in A, X, or Y, and without affecting them, and then branch on the V or N flag. The VIA's SR can be used for SPI within a heavy set of limitations. Bit-banging OTOH, although not as fast, gives the freedom to do all of the modes without extra logic, can input and output at the same time, and even do different numbers of bits per frame, of interest to me since I recently came across a data converter I'd like to use that can go with 12-bit frames.
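As a rough illustration of the points above (msb first, input and output on the same clocks, and frame lengths other than 8), here is a small Python model of a bit-banged transfer. Everything in it is assumed for the sake of the sketch: `LoopbackSlave` and `spi_transfer` are made-up names, and the sample-then-clock ordering stands in for whichever SPI mode the real part needs.

```python
class LoopbackSlave:
    """Hypothetical slave that hands back, on the next transfer, the frame it last received."""

    def __init__(self, nbits):
        self.nbits = nbits
        self.shift = 0  # the slave's shift register

    def miso(self):
        # Present the current msb of the shift register on MISO.
        return (self.shift >> (self.nbits - 1)) & 1

    def clock(self, mosi):
        # One clock: shift left and take the master's bit in at the lsb.
        self.shift = ((self.shift << 1) | mosi) & ((1 << self.nbits) - 1)


def spi_transfer(slave, value, nbits=8):
    """Send `value` msb first while collecting the slave's reply, one bit per clock."""
    received = 0
    for i in range(nbits - 1, -1, -1):
        mosi = (value >> i) & 1      # msb goes out first
        miso = slave.miso()          # sample the slave's output bit
        slave.clock(mosi)            # one clock pulse
        received = (received << 1) | miso
    return received
```

With a 12-bit `LoopbackSlave`, calling `spi_transfer(slave, 0xABC, 12)` and then a second 12-bit transfer hands back 0xABC on the second call.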

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Wed Jun 01, 2016 3:43 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
GARTHWILSON wrote:
as long as you know the state of the clock pin, and it's on bit 0, you can cycle it without affecting the other bits using just INC<port> and DEC<port>.
GARTHWILSON wrote:
For MISO (the master-in, slave-out data line), it's nice to put it on bit 6 or 7 of a port so you can use the BIT instruction on it

I'll go slightly OT here and expand on Garth's point. It's definitely worth paying attention to which bits are assigned for the various functions!

Yes, by all means attach the SPI clock to bit0 of the port, because INC<port> (or DEC<port>) is usually faster than the instructions it would take to set (or clear) the clock if it were attached to some other bit. As for MISO (which requires testing when the Master is inputting), I recommend attaching it to bit7. Bit6 is special only if you use the BIT instruction -- and it turns out the BIT instruction can be optimized away! I didn't quite believe it at first, and had to give my head a shake. But I've successfully tested code equivalent to the following. Here's how the port bits are used in the snippets below. Bit1 (MOSI) was chosen arbitrarily; the other two not.

  • bit7 of VIAPORT is in input mode -- attaches to MISO
  • (bits available for other uses, especially as inputs)
  • bit1 of VIAPORT is in output mode -- attaches to MOSI
  • bit0 of VIAPORT is in output mode -- attaches to Ck

The snippets are written as if part of a subroutine, but I omitted details managed by the caller or preceding inline code -- the critical portion is the loop. Firstly we have code for inputting a byte from SPI. As noted in the comments, the INC instruction does two things -- it performs input as well as output. :shock: This makes a BIT instruction unnecessary -- and, in a loop this tight, omitting one instruction means a substantial speedup, percentage-wise. I've marked the cycle counts for the instructions which execute repeatedly. (If your VIA is mapped in zero-page subtract 2 cycles per bit from the times shown.)

Code:
SPIBYTEIN:       LDA #1          ;LDA #1 is for counting

INPUTLOOP:  4    STZ VIAPORT_IO  ;set Ck=0, mosi=0
            6    INC VIAPORT_IO  ;set Ck=1    INC DOES 2 THINGS (sets Ck; also updates N flag per MISO)
           2/3   BMI MISO_IS_1

           2     CLC             ; MISO is =0
           2     ROL A
           3     BCC INPUTLOOP   ;more bits?
                 RTS

MISO_IS_1:   2   SEC             ; MISO is =1
             2   ROL A
             3   BCC INPUTLOOP   ;more bits?
                 RTS
          19/20 <----- cycles per bit
Edit: here is an improved version, described in my subsequent post. BTW both of these input routines treat the output data as don't-care (MOSI is held at 0).
Code:
SPIBYTEIN:      LDA #1          ;LDA #1 is for counting

INPUTLOOP:  4   STZ VIAPORT_IO  ;set Ck=0, mosi=0
            6   INC VIAPORT_IO  ;set Ck=1    INC DOES 2 THINGS (sets Ck; also updates N flag per MISO)
           2/3  BPL MISO_IS_0

           2    SEC             ;MISO is =1
           2    ROL A
           3    BCC INPUTLOOP   ;more bits?
                RTS

MISO_IS_0:   2  ASL A           ;MISO is =0
             3  BCC INPUTLOOP   ;more bits?
                RTS
          19/18 <----- cycles per bit
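
The counting trick may be easier to see in a higher-level model. The Python sketch below mimics only the accumulator and carry of the improved loop; `spi_byte_in` is a hypothetical name, and the port access is reduced to pulling the next MISO bit from a list.

```python
def spi_byte_in(miso_bits):
    """Model of the input loop: A starts at 1 and the loop ends when that 1 falls out of bit 7."""
    a = 1                        # LDA #1: the sentinel bit
    bits = iter(miso_bits)
    clocks = 0
    while True:
        miso = next(bits)        # STZ / INC: one clock, N flag reflects MISO
        clocks += 1
        a = (a << 1) | miso      # SEC + ROL A (MISO=1) or ASL A (MISO=0)
        carry = a >> 8           # the bit shifted out of the 8-bit accumulator
        a &= 0xFF
        if carry:                # BCC INPUTLOOP falls through once the sentinel emerges
            return a, clocks
```

Feeding it the bits 1,0,1,0,0,1,0,1 returns the byte $A5 after exactly eight clocks, with no separate loop counter.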

And here (below) is the routine for outputting a byte to SPI. These routines are the fastest I've managed to achieve so far, but suggestions are welcome of course. (I guess I could unroll the loops... ) :)

Code:
SPIBYTEOUT:     LDY #2          ;Y is used to hold a constant.
                SEC             ;SEC / ROL A is for counting
                ROL A
OUTPUTLOOP: 2/3 BCS MOSI_1

            4   STZ VIAPORT_IO  ;ck=0, mosi=0  STZ updates both Ck & mosi
            6   INC VIAPORT_IO  ;ck=1
            2   ASL A
            3   BNE OUTPUTLOOP  ;more bits?
                RTS

MOSI_1:       4 STY VIAPORT_IO  ;ck=0, mosi=1  STY updates both Ck & mosi
              6 INC VIAPORT_IO  ;ck=1
              2 ASL A
              3 BNE OUTPUTLOOP  ;more bits?
                RTS
          17/18 <----- cycles per bit
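
The same kind of model for the output routine, again only a sketch: `spi_byte_out` is a made-up name, and the port writes are reduced to recording which MOSI value would be clocked out on each pass.

```python
def spi_byte_out(value):
    """Model of the output loop: returns the MOSI bits in the order they are clocked out."""
    # SEC / ROL A: bit 7 moves to carry, and a 1 (the sentinel) enters bit 0.
    carry = (value >> 7) & 1
    a = ((value << 1) | 1) & 0xFF
    sent = []
    while True:
        sent.append(carry)       # BCS picks STY (mosi=1) or STZ (mosi=0), then INC clocks it
        carry = (a >> 7) & 1     # ASL A: the next data bit moves into carry
        a = (a << 1) & 0xFF
        if a == 0:               # BNE falls through once the sentinel has been shifted out
            return sent
```

The sentinel guarantees A stays nonzero until all eight bits have gone out, so even a data byte of $00 produces eight clocks.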


In another post I talk about driving a 16is750 UART with this code. Even a 1 MHz 6502 can bit-bang fast enough to achieve 19.2 or even 34.8 kbaud on the asynch connection -- and of course faster CPUs exceed this figure.
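
For a rough feel of the numbers, the per-bit cycle counts above translate into raw SPI bit rates as sketched below. This is plain arithmetic assuming a 1 MHz clock; `spi_bit_rate` is a made-up helper. The call overhead per byte and whatever command bytes the UART needs per character are not modelled, so the usable asynchronous rate will be lower than these figures.

```python
def spi_bit_rate(cpu_hz, cycles_per_bit):
    """Raw SPI bits per second for a loop that spends `cycles_per_bit` CPU cycles on each bit."""
    return cpu_hz / cycles_per_bit


# Cycle counts taken from the input and output loops above, at an assumed 1 MHz.
for cycles in (17, 18, 19, 20):
    print(cycles, "cycles/bit ->", round(spi_bit_rate(1_000_000, cycles)), "bit/s")
```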

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Last edited by Dr Jefyll on Mon Jun 04, 2018 3:30 am, edited 2 times in total.

Posted: Wed Jun 01, 2016 8:43 pm

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
So, in the first routine, A is set to the byte that it read, correct?

In the second routine, A contains the byte to be sent out. The SEC acts as a sentinel for when the byte is done. The ROL A shifts the first bit to send in to Carry.

Ok, I see how this works. This is very clever.

On an older 6502, you can use X = 0 in place of the STZ, yes?

The initial attraction of SPI was that it seems pretty bone simple, and easy to implement, even badly. And you can get reliable mass storage out of it in the end, because with SD cards you can operate on them in 512-byte blocks and not worry about leveling or any of those other flash issues. I saw an article using an 8-pin SPI 1MB flash chip, but you have to manage the 4K page erases yourself and, ostensibly, manage the leveling and block use yourself as well. Having to dedicate 4K to a storage buffer just to write a single 512-byte block seemed excessive.


Posted: Wed Jun 01, 2016 9:44 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
whartung wrote:
I saw an article using an 8-pin SPI 1MB flash chip, but you have to manage the 4K page erases yourself and, ostensibly, manage the leveling [...]

One of the SPI flash ICs I've used is the 25VF032 four-megabyte flash in an 8-pin SOIC. Its claimed typical write endurance is 100,000 cycles (which is about a thousand times as many as EPROM gives), so I doubt we'll be wearing it out in our applications. I did not use a FAT. There's a 25VF064 also, eight megabytes, which appears to be the most you can get in an 8-pin SOIC.

Posted: Wed Jun 01, 2016 10:19 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Dr Jefyll wrote:
and it turns out the BIT instruction can be optimized away!

I like that.

Quote:
Code:
SPIBYTEIN:      LDA #1          ;LDA #1 is for counting

INPUTLOOP:      STZ VIAPORT_IO  ;set Ck=0
                INC VIAPORT_IO  ;set Ck=1    INC DOES 2 THINGS (sets Ck; also updates N flag per MISO)
                BMI MISO_IS_1

                CLC             ; MISO is =0
                ROL A
                <snip>

How 'bout replacing CLC, ROL with ASL?

Also, other bits on the port will probably be in use for unrelated things, so STZ VIAPORT_IO should probably be replaced with DEC VIAPORT_IO to set Ck=0 without disturbing those other things. (It does assume bit 0 had been high though; so it will need initializing before you start into the loop.)
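
A small Python model of that difference, treating the port as a plain byte. The VIA's DDR and input pins are ignored, and the bit assignments in `port` are invented for the example.

```python
CK = 0x01  # clock on bit 0, as in the routines quoted above


def stz(port):
    """STZ writes zero to every output bit of the port."""
    return 0x00


def dec(port):
    """DEC port: with Ck known to be 1, only bit 0 changes."""
    assert port & CK, "Ck must be high before DEC"
    return (port - 1) & 0xFF


def inc(port):
    """INC port: with Ck known to be 0, only bit 0 changes."""
    assert not port & CK, "Ck must be low before INC"
    return (port + 1) & 0xFF


port = 0b0011_0101   # hypothetical: bits 2, 4 and 5 drive unrelated outputs, Ck=1
low = dec(port)      # Ck=0, other bits untouched
high = inc(low)      # Ck=1 again, other bits still untouched
```

The asserts in `dec` and `inc` stand for the initialization mentioned above: if Ck is not in the expected state, the borrow or carry ripples into the neighbouring bits.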

Posted: Thu Jun 02, 2016 12:07 am

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
GARTHWILSON wrote:
whartung wrote:
I saw an article using an 8-pin SPI 1MB flash chip, but you have to manage the 4K page erases yourself and, ostensibly, manage the leveling [...]

One of the SPI flash ICs I've used is the 25VF032 four-megabyte flash in an 8-pin SOIC. Its claimed typical write endurance is 100,000 cycles (which is about a thousand times as many as EPROM gives), so I doubt we'll be wearing it out in our applications. I did not use a FAT. There's a 25VF064 also, eight megabytes, which appears to be the most you can get in an 8-pin SOIC.


Well my complaint was less about the leveling (because of what you mentioned), but more of "if I want to write a 512 byte block, I need to read in 4K first, copy my block in, then erase it from flash, and write it back", meaning I need a 4K buffer (and arguably the original 512 byte block as well) to write the block.

4K just seems like..a lot, especially on a 64K (or less: 32K) machine since it's ostensibly empty 99% of the time. A large gaping hole Just In Case.
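
To spell out where the 4K goes, here is a sketch of that read-modify-write in Python. The `flash` bytearray merely stands in for the chip, and `rewrite_block`, `SECTOR` and `BLOCK` are names made up for the illustration; a real driver would issue the part's read, erase and program commands over SPI instead.

```python
SECTOR = 4096   # erase granularity discussed above
BLOCK = 512     # logical block size


def rewrite_block(flash, block_index, new_block):
    """Replace one 512-byte block; returns the size of the RAM buffer that was needed."""
    assert len(new_block) == BLOCK
    start = (block_index * BLOCK) // SECTOR * SECTOR   # sector containing the block
    offset = block_index * BLOCK - start
    buf = bytearray(flash[start:start + SECTOR])       # the 4K RAM buffer in question
    buf[offset:offset + BLOCK] = new_block             # patch the one block
    flash[start:start + SECTOR] = b"\xff" * SECTOR     # sector erase
    flash[start:start + SECTOR] = buf                  # program the whole sector back
    return len(buf)
```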

The SD card manages all of that for me.

Of course, nowadays, you could probably get a tiny uController for $1 to act as a surrogate to the host and manage that aspect for you. That's effectively what's on an SD card already anyway. Stream 512 bytes to the surrogate, and use its 4K of RAM to manage the Flash device.

But cost and part count aren't my primary concern.


Posted: Thu Jun 02, 2016 12:15 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Thanks for the feedback on the code, fellas. Here's the input routine, replacing CLC / ROL with ASL as Garth suggested. FWIW I also flipped the sense of the branch, because "one" bits ended up 3~ slower than "zero" bits, and I think I'd prefer the timing to be more uniform.

As before,
  • bit7 of VIAPORT is in input mode -- attaches to MISO
  • (bits available for other uses, especially as inputs)
  • bit1 of VIAPORT is in output mode -- attaches to MOSI
  • bit0 of VIAPORT is in output mode -- attaches to Ck
Code:
SPIBYTEIN:      LDA #1          ;LDA #1 is for counting

INPUTLOOP:  4   STZ VIAPORT_IO  ;set Ck=0, mosi=0
            6   INC VIAPORT_IO  ;set Ck=1    INC DOES 2 THINGS (sets Ck; also updates N flag per MISO)
           2/3  BPL MISO_IS_0

           2    SEC             ;MISO is =1
           2    ROL A
           3    BCC INPUTLOOP   ;more bits?
                RTS

MISO_IS_0:   2  ASL A           ;MISO is =0
             3  BCC INPUTLOOP   ;more bits?
                RTS
          19/18 <----- cycles per bit


whartung wrote:
On an older 6502, you can use X = 0 in place of the STZ, yes?
Yes -- and your inferences about register usage are also correct. I got a little lazy about commenting that!

GARTHWILSON wrote:
other bits on the port will probably be in use for unrelated things, so STZ VIAPORT_IO should probably be replaced with DEC VIAPORT_IO to set Ck=0 without disturbing those other things.
Doing as you say will avoid disturbing those other bits, so it's a valid suggestion. But it slows the loop down by 2~ per bit, so let's only use it as a last resort. Depending on circumstances, there may be as many as three ways to avoid having writes such as STZ disturb the other bits.

  • there's no problem if the other bits are inputs
  • there's no problem if the other bits are outputs which remain low by default
  • there's no problem if the other bits are outputs that have pullup resistors attached and are controlled by the DDR


Last edited by Dr Jefyll on Tue Jun 12, 2018 2:51 am, edited 2 times in total.

Posted: Thu Jun 02, 2016 1:16 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
whartung wrote:
Well my complaint was [...] more of "if I want to write a 512 byte block, I need to read in 4K first, copy my block in, then erase it from flash, and write it back", meaning I need a 4K buffer (and arguably the original 512 byte block as well) to write the block.

4K just seems like..a lot, especially on a 64K (or less: 32K) machine since it's ostensibly empty 99% of the time. A large gaping hole Just In Case.

The 25VF032 I mentioned lets you write as little as one byte at a time (if the 4K sector is already erased), but I think the write counts toward the wear-out the same as writing the whole sector. The 25VF064 (8MB, which I have not used) lets you write 256 bytes at a time, but not just a single byte. An Atmel SPI flash I used years ago gave the user access to its own buffers, so IIRC, you could read a 1K sector into its buffer, overwrite only a small part of that onboard buffer, and write it back.

Posted: Thu Jun 02, 2016 3:52 am

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
GARTHWILSON wrote:
The 25VF032 I mentioned lets you write as little as one byte at a time (if the 4K sector is already erased), but I think the write counts toward the wear-out the same as writing the whole sector. The 25VF064 (8MB, which I have not used) lets you write 256 bytes at a time, but not just a single byte. An Atmel SPI flash I used years ago gave the user access to its own buffers, so IIRC, you could read a 1K sector into its buffer, overwrite only a small part of that onboard buffer, and write it back.


I think that works fine for logging if all you're doing is appending, since you can "write over" the FF's of the erased bits, but if you need to overwrite and update data, then you need to erase the entire page and rewrite it. That's what I was talking about.
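
A toy Python model of what I mean, under the usual assumption that programming can only turn 1 bits into 0 bits and only an erase brings them back. `FlashSector` is a made-up class and says nothing about any specific part's command set.

```python
class FlashSector:
    """Toy model: programming can only clear bits; only an erase sets them back to 1."""

    def __init__(self, size=4096):
        self.data = bytearray(b"\xff" * size)   # erased state is all FFs

    def program(self, addr, value):
        self.data[addr] &= value                # a 0 bit can never become 1 again here

    def erase(self):
        self.data[:] = b"\xff" * len(self.data)


sector = FlashSector()
sector.program(0, 0x41)   # appending into erased space works as expected
sector.program(0, 0x42)   # "overwriting" leaves 0x41 AND 0x42, not 0x42
```

So appending log records into still-erased bytes is fine, but changing a byte that has already been programmed generally means erasing the whole sector first.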


Posted: Thu Jun 02, 2016 4:20 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Yes, that's what flash is all about; but with the Atmel part, at least you wouldn't have to set aside that much memory in your 6502 memory map. It actually had two buffers, so you could have the flash programming from one (which took a few milliseconds) while you're loading or reading the other.

Posted: Mon Nov 20, 2023 6:47 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
Dr Jefyll wrote:
Code:
SPIBYTEIN:      LDA #1          ;LDA #1 is for counting

INPUTLOOP:  4   STZ VIAPORT_IO  ;set Ck=0, mosi=0
            6   INC VIAPORT_IO  ;set Ck=1    INC DOES 2 THINGS (sets Ck; also updates N flag per MISO)
           2/3  BPL MISO_IS_0

           2    SEC             ;MISO is =1
           2    ROL A
           3    BCC INPUTLOOP   ;more bits?
                RTS

MISO_IS_0:   2  ASL A           ;MISO is =0
             3  BCC INPUTLOOP   ;more bits?
                RTS
          19/18 <----- cycles per bit


Interesting optimisation challenge! How about this version - put MOSI on bit 2, with bit 1 being an unconnected output pin, and then:
Code:
SPIBYTEIN:    LDA #$FE        ; for counting, and as a nice supply of set bits
              SEC
INPUTLOOP: 4  STZ VIAPORT_IO  ; Ck=0, nextmosi=0, mosi=0
           6  ROL VIAPORT_IO  ; set Ck=1, shift MISO into carry
           2  ROL A
           3  BCS INPUTLOOP

That's 15 cycles per loop, I think, plus a few cycles of setup code.

Though I think for SD cards you're meant to be sending set bits - not clear bits - on MOSI when reading data. You could load #$06 into X or Y and use STX or STY instead of STZ.
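
Here is a quick Python model I would use to sanity-check the idea. It is a sketch only: `rol8` and `spi_byte_in_rol` are made-up names, and the port is simplified so that a read returns the last stored value with MISO on bit 7.

```python
def rol8(value, carry):
    """8-bit rotate left through carry, like the 6502 ROL; returns (result, carry_out)."""
    result = ((value << 1) | carry) & 0xFF
    return result, (value >> 7) & 1


def spi_byte_in_rol(miso_bits, idle=0x00):
    """Model of the ROL-based loop; `idle` is the value stored before each clock (0 for STZ)."""
    a, carry = 0xFE, 1                     # LDA #$FE / SEC
    port_after_clock = []
    for miso in miso_bits:
        port = idle | (miso << 7)          # store, then ROL's read sees MISO on bit 7
        port, carry = rol8(port, carry)    # carry in sets Ck=1, MISO leaves via carry out
        port_after_clock.append(port)
        a, carry = rol8(a, carry)          # MISO enters A, a counter bit leaves via carry
        if not carry:                      # BCS INPUTLOOP falls through on the lone 0 bit
            break
    return a, port_after_clock
```

In this model the loop stops after eight clocks, and with `idle=0x06` bit 2 (MOSI) stays set both before and after the rotate.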


Posted: Mon Nov 20, 2023 11:41 pm

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
xjmaas wrote:
Hi all,

I am trying to get SPI working on an interface card I am creating for my Apple ][ (actually an Apple IIgs). I am trying to use the implementation by André for his CS/A65, rev 1.1A. I had to re-create the PHI2 signal, as the Apple ][ bus doesn't include it, by using a 74HCT74 as per Apple's TIL #494 info.



Which board of the CS/A are you actually referring to? Maybe I'm getting old and forget stuff, but the main board using SPI was using a CPLD for that and no bit banging.

In fact even the Commodore userport USB was using the VIA shift register and no bit banging.


IMHO bit banging is a waste of CPU cycles that can easily be avoided by using a shift register...

_________________
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/


Posted: Mon Nov 20, 2023 11:45 pm

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
I guess phi2 needs to be recreated not because it is missing from the bus (it is, but phi0 should work just as well) but because the leading edge needs to be delayed.

The VIA latches the register select lines when phi2 goes high. Systems with shared buses typically do the video access while phi2 is low, so when phi2 goes high the register select lines are not ready yet. The C64 has this problem, and it seems the Apple II does too.
