CRC32 with "nibble" table

Programming the 6502 microprocessor and its relatives in assembly and other languages.
User avatar
BB8
Posts: 57
Joined: 01 Nov 2020
Location: Tatooine

Re: CRC32 with "nibble" table

Post by BB8 »

The table I published earlier was wrong; I did a CRC0 --> CRC3 shift while it's the opposite:
CRCtable.jpg

Code: Select all

CRC0   = $FB
CRC1   = CRC0 + 1
CRC2   = CRC1 + 1
CRC3   = CRC2 + 1

CRC0h   = $FB
CRC0l   = CRC0h + 1
CRC1h   = CRC0l + 1
CRC1l   = CRC1h + 1

CRC2h   = $61
CRC2l   = CRC2h + 1
CRC3h   = CRC2l + 1
CRC3l   = CRC3h + 1

count   = $2

CRC_START = $A000
CRC_COUNT = $2000



*=$0810

        sei
                
        lda #>CRC_START
        ldx #>CRC_COUNT

CalcCRC
        ; A = Start address high
        ; X = count
        sta readvalue+2
        stx count


        lda #$0f
        sta CRC0h
        sta CRC0l
        sta CRC1h
        sta CRC1l
        sta CRC2h
        sta CRC2l
        sta CRC3h
        sta CRC3l

CRC_Loop
readvalue
        lda $FF00
        tay

        and #$0f        ; compute "x"
        eor CRC0l
        tax

        tya             ; compute "y"
        lsr
        lsr
        lsr
        lsr
        eor CRC0h
        eor tab0l,x
        tay

;CRC0l
        lda CRC1l
        eor tab0h,x
        eor tab0l,y
        sta CRC0l
;CRC0h
        lda CRC1h
        eor tab1l,x
        eor tab0h,y
        sta CRC0h
;CRC1l
        lda CRC2l
        eor tab1h,x
        eor tab1l,y
        sta CRC1l
;CRC1h
        lda CRC2h
        eor tab2l,x
        eor tab1h,y
        sta CRC1h
;CRC2l
        lda CRC3l
        eor tab2h,x
        eor tab2l,y
        sta CRC2l
;CRC2h
        lda CRC3h
        eor tab3l,x
        eor tab2h,y
        sta CRC2h
;CRC3l
        lda tab3h,x
        eor tab3l,y
        sta CRC3l
;CRC3h  
        lda tab3h,y
        sta CRC3h

        inc readvalue+1
        bne CRC_Loop

not_high
        inc readvalue+2
        dec count
        bne CRC_Loop

done
        lda CRC0h
        asl
        asl
        asl
        asl
        ora CRC0l
        eor #$ff
        sta CRC0
                lda CRC1h
                asl
                asl
                asl
                asl
                ora CRC1l
                eor #$ff
                sta CRC1
                        lda CRC2h
                        asl
                        asl
                        asl
                        asl
                        ora CRC2l
                        eor #$ff
                        sta CRC2
                                lda CRC3h
                                asl
                                asl
                                asl
                                asl
                                ora CRC3l
                                eor #$ff
                                sta CRC3

        cli
        rts
                
align 256
tab3h   byte $0, $1, $3, $2, $7, $6, $4, $5, $E, $F, $D, $C, $9, $8, $A, $B
tab3l   byte $0, $D, $B, $6, $6, $B, $D, $0, $D, $0, $6, $B, $B, $6, $0, $D
tab2h   byte $0, $B, $6, $D, $D, $6, $B, $0, $B, $0, $D, $6, $6, $D, $0, $B
tab2l   byte $0, $7, $E, $9, $C, $B, $2, $5, $8, $F, $6, $1, $4, $3, $A, $D
tab1h   byte $0, $1, $2, $3, $4, $5, $6, $7, $8, $9, $A, $B, $C, $D, $E, $F
tab1l   byte $0, $0, $0, $0, $1, $1, $1, $1, $3, $3, $3, $3, $2, $2, $2, $2
tab0h   byte $0, $6, $C, $A, $9, $F, $5, $3, $2, $4, $E, $8, $B, $D, $7, $1
tab0l   byte $0, $4, $8, $C, $0, $4, $8, $C, $0, $4, $8, $C, $0, $4, $8, $C

And I'm annoyed: it takes 1239568 cycles, so a gain of less than 40%. I thought it would be way better... :/
CRCcycles.jpg
@Jeff: I'll check out your algo soon... I haven't got to it yet.
@fhw72: thanx... I stole the tables from your code!
User avatar
BB8
Posts: 57
Joined: 01 Nov 2020
Location: Tatooine

Re: CRC32 with "nibble" table

Post by BB8 »

@Jeff: this is yours, and it's around 789K cycles*. Good thinking!

Code: Select all

CRC0   = $FB
CRC1   = CRC0 + 1
CRC2   = CRC1 + 1
CRC3   = CRC2 + 1

count   = $2

CRC_START = $A000
CRC_COUNT = $2000

*=$0810
        sei
                
        lda #>CRC_START
        ldx #>CRC_COUNT

CalcCRC
        ; A = Start address high
        ; X = count
        sta readvalue+2
        stx count

        lda #$FF
        sta CRC0
        sta CRC1
        sta CRC2
        sta CRC3

CRC_Loop
readvalue
        lda $FF00
        eor CRC0
        tay

        and #$0f        ; compute "x"
        tax

        tya             ; compute "y"
        eor tab0l,x
        lsr
        lsr
        lsr
        lsr
        tay

        lda CRC1
        eor tab1l0h,x
        eor tab0,y
        sta CRC0

        lda CRC2
        eor tab2l1h,x
        eor tab1,y
        sta CRC1

        lda CRC3
        eor tab3l2h,x
        eor tab2,y
        sta CRC2

        lda tab3h,x
        eor tab3,y
        sta CRC3
        
        inc readvalue+1
        bne CRC_Loop
not_high
        inc readvalue+2
        dec count
        bne CRC_Loop
done
        ldx #3
loopinv
        lda CRC0,x
        eor #$ff
        sta CRC0,x
        dex
        bpl loopinv

        cli
        rts
                
align 256
tab3h   byte $00, $01, $03, $02, $07, $06, $04, $05, $0E, $0F, $0D, $0C, $09, $08, $0A, $0B
tab3l2h byte $00, $DB, $B6, $6D, $6D, $B6, $DB, $00, $DB, $00, $6d, $B6, $B6, $6D, $00, $DB
tab2l1h byte $00, $71, $E2, $93, $C4, $B5, $26, $57, $88, $F9, $6A, $1B, $4C, $3D, $AE, $DF
tab1l0h byte $00, $06, $0C, $0A, $19, $1F, $15, $13, $32, $34, $3E, $38, $2B, $2D, $27, $21
tab0l   byte $00, $40, $80, $C0, $00, $40, $80, $C0, $00, $40, $80, $C0, $00, $40, $80, $C0

tab3   byte $00, $1D, $3B, $26, $76, $6B, $4D, $50, $ED, $F0, $D6, $CB, $9B, $86, $A0, $BD
tab2   byte $00, $B7, $6E, $D9, $DC, $6B, $B2, $05, $B8, $0F, $D6, $61, $64, $D3, $0A, $BD
tab1   byte $00, $10, $20, $30, $41, $51, $61, $71, $83, $93, $A3, $B3, $C2, $D2, $E2, $F2
tab0   byte $00, $64, $C8, $AC, $90, $F4, $58, $3C, $20, $44, $E8, $8C, $B0, $D4, $78, $1C
CRC_Jeff.jpg
* Edit: actually less than that because the VIC chip steals cycles from the cpu
User avatar
Dr Jefyll
Posts: 3525
Joined: 11 Dec 2009
Location: Ontario, Canada
Contact:

Re: CRC32 with "nibble" table

Post by Dr Jefyll »

Wow -- this is remarkable, BB8!

I've not yet had a chance to closely examine the code, but it's easy to see that you've taken my "misaligned tables" idea and optimized even further using an idea of your own. Between each LDA and the subsequent STA, you typically get two EORs accomplished. Bravo!

Will you (or someone else) have a chance to actually compute some checksums and see if the results are as expected? Had I written it myself, I know *I* would wanna run tests. Code like this can be tricky to get right. And strik, too, remarked on an attempt that "failed to get working code."

strik, are you following this? In this post you said, "The code when run on the C64 "as is" calculates the CRC32 of the BASIC-ROM 901226-01 as $F833D117." Perhaps you would be willing to run a test (since you apparently already have the necessary ROM data).

-- Jeff
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
fhw72
Posts: 98
Joined: 20 Jul 2017

Re: CRC32 with "nibble" table

Post by fhw72 »

Dr Jefyll wrote:
Wow -- this is remarkable, BB8!

Will you (or someone else) have a chance to actually compute some checksums and see if the results are as expected?
-- Jeff
It's actually the checksum of the BASIC rom.
User avatar
BB8
Posts: 57
Joined: 01 Nov 2020
Location: Tatooine

Re: CRC32 with "nibble" table

Post by BB8 »

I did the test already: in the Vice screenshots you can see the memory dump via "m fb fe" and the result is 17 d1 33 f8 (those variable are in little-endian order), and it checks with OP's "$F833D117"
fhw72
Posts: 98
Joined: 20 Jul 2017

Re: CRC32 with "nibble" table

Post by fhw72 »

BB8 wrote:
I did the test already: in the Vice screenshots you can see the memory dump via "m fb fe" and the result is 17 d1 33 f8 (those variable are in little-endian order), and it checks with OP's "$F833D117"
@BB8: What assembler did you use to translate your file?
User avatar
Dr Jefyll
Posts: 3525
Joined: 11 Dec 2009
Location: Ontario, Canada
Contact:

Re: CRC32 with "nibble" table

Post by Dr Jefyll »

Okay, sorry, BB8... I didn't closely examine your code, and I didn't look at the screenshot at all. :oops:

But I certainly *did* notice when you said it runs in around 789K cycles! :shock: (actually even less because of the VIC chip stealing cycles)

Congrats again on taking the "misaligned table" idea to the next level!

-- Jeff
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
User avatar
BB8
Posts: 57
Joined: 01 Nov 2020
Location: Tatooine

Re: CRC32 with "nibble" table

Post by BB8 »

fhw72 wrote:
What assembler did you use to translate your file?
That's CBMprgStudio, tailored for Commodores
CMBstudio.jpg
Dr Jefyll wrote:
in around 789K cycles!
It's less than that, around 745Kc: the other value came from the cycle count by Vice (the emulator) but the cpu gets stunned by the Video chip while drawing the video, so the actual cpu time is lower.
User avatar
Dr Jefyll
Posts: 3525
Joined: 11 Dec 2009
Location: Ontario, Canada
Contact:

Re: CRC32 with "nibble" table

Post by Dr Jefyll »

BB8 wrote:
around 745Kc [...] the cpu gets stunned by the Video chip while drawing the video
Stunned? Interesting choice of words! :lol:

In your test you've use zero-page for the CRC, which gives us an apples-to-apples comparison with the already-optimized code in this post upthread. 745,000 cycles vs 2,001,525 means we're now about 2.69 times as fast, albeit at the cost of an extra 80 bytes in the lookup tables -- they've grown from 64 bytes to 144.

-- Jeff
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
User avatar
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England
Contact:

Re: CRC32 with "nibble" table

Post by BigEd »

Excellent code and algorithm optimisation thread!
User avatar
Virtual1
Posts: 17
Joined: 01 Aug 2025

Re: CRC32 with "nibble" table

Post by Virtual1 »

just wondering, do you NEED it to be THE crc-32 digest, or do you just need something to do a decent job of detecting changes/ comparing blocks?

I needed a basic digest function recently, and coded something that produces what appears to be a good 2-byte digest of an arbitrary size portion of memory. I just ran it on $0100-BFFF, took 4.3 seconds. (1mhz clock speed) I don't know if you'd consider that good or bad, or if you need high quality or the actual CRC32 function, but maybe something like this will work better for you? ($C6 bytes long, but that includes printing results as well as a safety to avoid soft switches, so it could be cut down quite a lot)
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: CRC32 with "nibble" table

Post by BigDumbDinosaur »

Virtual1 wrote:
just wondering, do you NEED it to be THE crc-32 digest, or do you just need something to do a decent job of detecting changes/ comparing blocks?

The original poster hasn’t visited here since June 11.  Seems as though he hasn’t been following this.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
User avatar
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England
Contact:

Re: CRC32 with "nibble" table

Post by BigEd »

That's no time at all in this space.
User avatar
Virtual1
Posts: 17
Joined: 01 Aug 2025

Re: CRC32 with "nibble" table

Post by Virtual1 »

BigEd wrote:
That's no time at all in this space.
My initial reason for making it is I needed a way to compare operating systems on different disk images. So i was looking for a "nearly instant" checksum on 256 byte blocks. I did actually consider CRC32, but for one a 32 bit checksum on 256 bytes seems like overkill, and then I looked at the complexity of it and thought no that's overkill too, I want FAST. I made the call that a single byte wasn't good enough, so I went with a two byte digest. Looks like a good compromise.
fhw72
Posts: 98
Joined: 20 Jul 2017

Re: CRC32 with "nibble" table

Post by fhw72 »

Virtual1 wrote:
BigEd wrote:
That's no time at all in this space.
My initial reason for making it is I needed a way to compare operating systems on different disk images. So i was looking for a "nearly instant" checksum on 256 byte blocks. I did actually consider CRC32, but for one a 32 bit checksum on 256 bytes seems like overkill, and then I looked at the complexity of it and thought no that's overkill too, I want FAST. I made the call that a single byte wasn't good enough, so I went with a two byte digest. Looks like a good compromise.
CRC16 would also be 2-byte.... BTW. :D

But getting back to the main point of this thread:

It was not about getting a kind of checksum but getting a CRC32 computed as fast as possible on the 6502.
Post Reply