Page 1 of 1

Fast CRC without tables

Posted: Tue Nov 23, 2004 2:46 pm
by debounce
Hello all, I was browsing the source code repository and remembered I had written a fast CRC routine a few years ago. So I thought I would contribute it, as it might come in useful for anyone developing TCP/IP stacks etc.

These routines are entered with the next byte of data in A and the current CRC in zero page memory. For the first byte of data CRCLO and CRCHI should contain $FF, and in the case of CRC8, CRC should contain $00. On exit A is clobbered and the updated CRC is in the zero page locations.

All code below is relocatable. Sizes and execution times listed do not include the final RTS, and code is assumed to be page-aligned.

The following routine implements a CRC-16 cycle in constant time, without tables.

Code: Select all

CRCLO   EQU $6          ; current value of CRC
CRCHI   EQU $7          ; not necessarily contiguous

CRC16_F:
        EOR CRCHI       ; A contained the data
        STA CRCHI       ; XOR it into high byte
        LSR             ; right shift A 4 bits
        LSR             ; to make top of x^12 term
        LSR             ; ($1...)
        LSR
        TAX             ; save it
        ASL             ; then make top of x^5 term
        EOR CRCLO       ; and XOR that with low byte
        STA CRCLO       ; and save
        TXA             ; restore partial term
        EOR CRCHI       ; and update high byte
        STA CRCHI       ; and save
        ASL             ; left shift three
        ASL             ; the rest of the terms
        ASL             ; have feedback from x^12
        TAX             ; save bottom of x^12
        ASL             ; left shift two more
        ASL             ; watch the carry flag
        EOR CRCHI       ; bottom of x^5 ($..2.)
        TAY             ; save high byte
        TXA             ; fetch temp value
        ROL             ; bottom of x^12, middle of x^5!
        EOR CRCLO       ; finally update low byte
        STA CRCHI       ; then swap high and low bytes
        STY CRCLO
        RTS
36 bytes, 62 cycles, AXYP clobbered.

Alternative ending:

Code: Select all

        ...
        EOR CRCHI       ; bottom of x^5 ($..2.)
        STA CRCHI       ; save high byte
        TXA             ; fetch temp value
        ROL             ; bottom of x^12, middle of x^5!
        EOR CRCLO       ; finally update low byte
        LDX CRCHI       ; then swap high and low bytes
        STA CRCHI
        STX CRCLO
        RTS
39 bytes, 66 cycles, AXP clobbered, Y preserved.

With a starting CRC of $FFFF, the binary string $01 $02 $03 $04 should evaluate to $89C3.

For comparison, a shorter routine can be derived from Paul Guertin's MAKECRCTABLE in the repository:

Code: Select all

CRC16_S:
        LDX #8          ; A contains the data
        EOR CRCHI       ; as we now have the high byte
BITLOOP ASL CRCLO       ; in A, things are the other
        ROL             ; way round from MAKECRCTABLE
        BCC NOADD
        EOR #$10        ; high byte of polynomial
        PHA
        LDA CRCLO
        EOR #$21        ; low byte of polynomial
        STA CRCLO
        PLA
NOADD   DEX
        BNE BITLOOP
        STA CRCHI
        RTS
24 bytes, 128..256 cycles, AXP clobbered

* * *

Here is a CRC-8 routine in near-constant time where memory constraints do not permit tables:

Code: Select all

CRC     EQU $6          ; current value of CRC
CRC8:
        EOR CRC         ; A contained the data
        STA CRC         ; XOR it with the byte
        ASL             ; current contents of A will become x^2 term
        BCC UP1         ; if b7 = 1
        EOR #$07        ; then apply polynomial with feedback
UP1     EOR CRC         ; apply x^1
        ASL             ; C contains b7 ^ b6
        BCC UP2
        EOR #$07
UP2     EOR CRC         ; apply unity term
        STA CRC         ; save result
        RTS
20 bytes, 25..27 cycles, AP clobbered, XY preserved.

With a starting CRC of $00, the binary string $01 $02 $03 $04 should evaluate to $E3.

Variations:
a) If the application demands constant time then BCC *+2 ($90 $00) can be inserted at UP1 and UP2:
24 bytes, 31 cycles.

b) Making a loop from the repeated code is barely worth one's while:
18 bytes, 36..38 cycles, AXP clobbered, Y preserved.

Re: Fast CRC without tables

Posted: Sun Dec 12, 2004 5:29 pm
by Mike Naberezny
debounce wrote:
Hello all, I was browsing the source code repository and remembered I had written a fast CRC routine a few years ago. So I thought I would contribute it, as it might come in useful for anyone developing TCP/IP stacks etc.
Greg,

Thanks for the submission. I've added it to the Source Code Repository.

Best Regards,
Mike

Posted: Thu Dec 16, 2004 3:03 pm
by debounce
Thanks Mike, it's much appreciated.

Greg

ARC CRC

Posted: Wed Aug 10, 2005 2:42 pm
by debounce
I have also written some optimised code for the widely-used ARC algorithm. It has to calculate the parity of an intermediate result, and various parity algorithms can do this in 10 bytes of code or 10 cycles, so code size and speed can be traded-off over a broad range.

Re: ARC CRC

Posted: Tue Aug 16, 2005 7:45 pm
by bogax
debounce wrote:
I have also written some optimised code for the widely-used ARC algorithm. It has to calculate the parity of an intermediate result, and various parity algorithms can do this in 10 bytes of code or 10 cycles, so code size and speed can be traded-off over a broad range.

Here's one (quoted from a previous post)

Noting that:

1) A simple arithmetic sum of two bits is the same as XORing them.
2) You can propagate bits through a word using carries.

like so:

d00a
0111

and the sum has a xor d in the d bit position



You can collect the bits of a word and XOR them together (generate parity for the word)
at the same time

like so (pseudo assember):

Code: Select all

LDA  WORD
ASL
EOR  WORD
AND #b10101010
ADC #b01100110
AND #b10001000
ADC #b01111000
and end up with the parity in the high bit.

If you want to XOR bits across more than one byte you just have to start with the bytes
XORed together in WORD

Posted: Wed Aug 24, 2005 11:25 am
by debounce
13 bytes and 16 cycles - nice! It may beat quite a few algorithms in the table (you can sometimes get in a mindlock that a bit shift == a string of LSRs etc.) so I'd like to replace them with this fragment - attributed of course. Thanks.

Re: ARC CRC

Posted: Fri Aug 26, 2005 4:51 am
by kc5tja
bogax wrote:

Code: Select all

LDA  WORD
ASL
EOR  WORD
AND #b10101010
ADC #b01100110
AND #b10001000
ADC #b01111000
and end up with the parity in the high bit.
I've looked at this, and even tried to work it out on paper when I got the chance at work. But I just don't see how this works. The ASL causes bit 7 of the A register to be put into the carry, so it *seems* to me that bit 7 is not properly taken into account. Can anyone please demonstrate how this really works, as it appears I'm a bit thick-headed in grokking this at the moment. Thanks.

Re: ARC CRC

Posted: Fri Aug 26, 2005 9:44 am
by John West
I've looked at this, and even tried to work it out on paper when I got the chance at work. But I just don't see how this works. The ASL causes bit 7 of the A register to be put into the carry, so it *seems* to me that bit 7 is not properly taken into account.

The EOR uses bit 7, so it is still in there.

There are two tricks being used here. First, you can XOR bits by adding them. Second, you can move a bit into another position by adding it to a string of 1s. This code is doing both in the same instruction.

What we want is the XOR of all 8 bits. The ASL and EOR XOR adjacent bits, so we have the result of four pairs: AxBxCxDx. The AND removes the x bits: A0B0C0D0.

Then add 3 to B and D. If they were 0, this won't affect A and C. If they were 1, A or C will toggle. Now A and C are each the XOR of four bits.

The next AND clears everything but those two, then the final ADC adds C into A, toggling it if C was 1. That's the result we need.

Delightfully clever.