6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sun Nov 24, 2024 11:25 am

All times are UTC




Post new topic Reply to topic  [ 8 posts ] 
Author Message
 Post subject: Fast CRC without tables
PostPosted: Tue Nov 23, 2004 2:46 pm 
Offline

Joined: Tue Nov 23, 2004 2:11 pm
Posts: 27
Location: London, UK
Hello all, I was browsing the source code repository and remembered I had written a fast CRC routine a few years ago. So I thought I would contribute it, as it might come in useful for anyone developing TCP/IP stacks etc.

These routines are entered with the next byte of data in A and the current CRC in zero page memory. For the first byte of data CRCLO and CRCHI should contain $FF, and in the case of CRC8, CRC should contain $00. On exit A is clobbered and the updated CRC is in the zero page locations.

All code below is relocatable. Sizes and execution times listed do not include the final RTS, and code is assumed to be page-aligned.

The following routine implements a CRC-16 cycle in constant time, without tables.

Code:
CRCLO   EQU $6          ; current value of CRC
CRCHI   EQU $7          ; not necessarily contiguous

CRC16_F:
        EOR CRCHI       ; A contained the data
        STA CRCHI       ; XOR it into high byte
        LSR             ; right shift A 4 bits
        LSR             ; to make top of x^12 term
        LSR             ; ($1...)
        LSR
        TAX             ; save it
        ASL             ; then make top of x^5 term
        EOR CRCLO       ; and XOR that with low byte
        STA CRCLO       ; and save
        TXA             ; restore partial term
        EOR CRCHI       ; and update high byte
        STA CRCHI       ; and save
        ASL             ; left shift three
        ASL             ; the rest of the terms
        ASL             ; have feedback from x^12
        TAX             ; save bottom of x^12
        ASL             ; left shift two more
        ASL             ; watch the carry flag
        EOR CRCHI       ; bottom of x^5 ($..2.)
        TAY             ; save high byte
        TXA             ; fetch temp value
        ROL             ; bottom of x^12, middle of x^5!
        EOR CRCLO       ; finally update low byte
        STA CRCHI       ; then swap high and low bytes
        STY CRCLO
        RTS

36 bytes, 62 cycles, AXYP clobbered.

Alternative ending:
Code:
        ...
        EOR CRCHI       ; bottom of x^5 ($..2.)
        STA CRCHI       ; save high byte
        TXA             ; fetch temp value
        ROL             ; bottom of x^12, middle of x^5!
        EOR CRCLO       ; finally update low byte
        LDX CRCHI       ; then swap high and low bytes
        STA CRCHI
        STX CRCLO
        RTS

39 bytes, 66 cycles, AXP clobbered, Y preserved.

With a starting CRC of $FFFF, the binary string $01 $02 $03 $04 should evaluate to $89C3.

For comparison, a shorter routine can be derived from Paul Guertin's MAKECRCTABLE in the repository:

Code:
CRC16_S:
        LDX #8          ; A contains the data
        EOR CRCHI       ; as we now have the high byte
BITLOOP ASL CRCLO       ; in A, things are the other
        ROL             ; way round from MAKECRCTABLE
        BCC NOADD
        EOR #$10        ; high byte of polynomial
        PHA
        LDA CRCLO
        EOR #$21        ; low byte of polynomial
        STA CRCLO
        PLA
NOADD   DEX
        BNE BITLOOP
        STA CRCHI
        RTS

24 bytes, 128..256 cycles, AXP clobbered

* * *

Here is a CRC-8 routine in near-constant time where memory constraints do not permit tables:

Code:
CRC     EQU $6          ; current value of CRC
CRC8:
        EOR CRC         ; A contained the data
        STA CRC         ; XOR it with the byte
        ASL             ; current contents of A will become x^2 term
        BCC UP1         ; if b7 = 1
        EOR #$07        ; then apply polynomial with feedback
UP1     EOR CRC         ; apply x^1
        ASL             ; C contains b7 ^ b6
        BCC UP2
        EOR #$07
UP2     EOR CRC         ; apply unity term
        STA CRC         ; save result
        RTS

20 bytes, 25..27 cycles, AP clobbered, XY preserved.

With a starting CRC of $00, the binary string $01 $02 $03 $04 should evaluate to $E3.

Variations:
a) If the application demands constant time then BCC *+2 ($90 $00) can be inserted at UP1 and UP2:
24 bytes, 31 cycles.

b) Making a loop from the repeated code is barely worth one's while:
18 bytes, 36..38 cycles, AXP clobbered, Y preserved.


Top
 Profile  
Reply with quote  
PostPosted: Sun Dec 12, 2004 5:29 pm 
Offline
Site Admin
User avatar

Joined: Fri Aug 30, 2002 1:08 am
Posts: 281
Location: Northern California
debounce wrote:
Hello all, I was browsing the source code repository and remembered I had written a fast CRC routine a few years ago. So I thought I would contribute it, as it might come in useful for anyone developing TCP/IP stacks etc.

Greg,

Thanks for the submission. I've added it to the Source Code Repository.

Best Regards,
Mike

_________________
- Mike Naberezny (mike@naberezny.com) http://6502.org


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Thu Dec 16, 2004 3:03 pm 
Offline

Joined: Tue Nov 23, 2004 2:11 pm
Posts: 27
Location: London, UK
Thanks Mike, it's much appreciated.

Greg


Top
 Profile  
Reply with quote  
 Post subject: ARC CRC
PostPosted: Wed Aug 10, 2005 2:42 pm 
Offline

Joined: Tue Nov 23, 2004 2:11 pm
Posts: 27
Location: London, UK
I have also written some optimised code for the widely-used ARC algorithm. It has to calculate the parity of an intermediate result, and various parity algorithms can do this in 10 bytes of code or 10 cycles, so code size and speed can be traded-off over a broad range.


Top
 Profile  
Reply with quote  
 Post subject: Re: ARC CRC
PostPosted: Tue Aug 16, 2005 7:45 pm 
Offline

Joined: Tue Nov 18, 2003 8:41 pm
Posts: 250
debounce wrote:
I have also written some optimised code for the widely-used ARC algorithm. It has to calculate the parity of an intermediate result, and various parity algorithms can do this in 10 bytes of code or 10 cycles, so code size and speed can be traded-off over a broad range.



Here's one (quoted from a previous post)

Noting that:

1) A simple arithmetic sum of two bits is the same as XORing them.
2) You can propagate bits through a word using carries.

like so:

d00a
0111

and the sum has a xor d in the d bit position



You can collect the bits of a word and XOR them together (generate parity for the word)
at the same time

like so (pseudo assember):

Code:
LDA  WORD
ASL
EOR  WORD
AND #b10101010
ADC #b01100110
AND #b10001000
ADC #b01111000

and end up with the parity in the high bit.

If you want to XOR bits across more than one byte you just have to start with the bytes
XORed together in WORD


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Aug 24, 2005 11:25 am 
Offline

Joined: Tue Nov 23, 2004 2:11 pm
Posts: 27
Location: London, UK
13 bytes and 16 cycles - nice! It may beat quite a few algorithms in the table (you can sometimes get in a mindlock that a bit shift == a string of LSRs etc.) so I'd like to replace them with this fragment - attributed of course. Thanks.


Top
 Profile  
Reply with quote  
 Post subject: Re: ARC CRC
PostPosted: Fri Aug 26, 2005 4:51 am 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
bogax wrote:
Code:
LDA  WORD
ASL
EOR  WORD
AND #b10101010
ADC #b01100110
AND #b10001000
ADC #b01111000

and end up with the parity in the high bit.


I've looked at this, and even tried to work it out on paper when I got the chance at work. But I just don't see how this works. The ASL causes bit 7 of the A register to be put into the carry, so it *seems* to me that bit 7 is not properly taken into account. Can anyone please demonstrate how this really works, as it appears I'm a bit thick-headed in grokking this at the moment. Thanks.


Top
 Profile  
Reply with quote  
 Post subject: Re: ARC CRC
PostPosted: Fri Aug 26, 2005 9:44 am 
Offline

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 336
I've looked at this, and even tried to work it out on paper when I got the chance at work. But I just don't see how this works. The ASL causes bit 7 of the A register to be put into the carry, so it *seems* to me that bit 7 is not properly taken into account.[/quote]

The EOR uses bit 7, so it is still in there.

There are two tricks being used here. First, you can XOR bits by adding them. Second, you can move a bit into another position by adding it to a string of 1s. This code is doing both in the same instruction.

What we want is the XOR of all 8 bits. The ASL and EOR XOR adjacent bits, so we have the result of four pairs: AxBxCxDx. The AND removes the x bits: A0B0C0D0.

Then add 3 to B and D. If they were 0, this won't affect A and C. If they were 1, A or C will toggle. Now A and C are each the XOR of four bits.

The next AND clears everything but those two, then the final ADC adds C into A, toggling it if C was 1. That's the result we need.

Delightfully clever.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 8 posts ] 

All times are UTC


Who is online

Users browsing this forum: Google [Bot] and 3 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: