Page 1 of 2

Bit Counting

Posted: Tue Aug 28, 2007 2:02 pm
by BitWise
Some code for the library:

Code: Select all

;======================================================
; Bit Counting
;------------------------------------------------------
; This file contains various 6502 coded algorithms that
; count the number of one bits in a byte value.
;
; The different CPU performance and memory usage
; characteristics have been calculated for each method
; to show how speed and space trade offs.
;
; These examples have been developed for use with
; Michal Kowalski's 6502 emulator.
;
; If you can think of any other ways of calculating
; this or speeding up these routines please let me
; know.
;
; Andrew Jacobs
;======================================================
; Revision History:
; 2007-08-28 AJ First release.
;------------------------------------------------------

        .ORG $00

SCRATCH .DS  2                  ; Some scratch memory

        .ORG $8000
        .START $8000

;======================================================
; Bit Counting Via a Lookup Table
;------------------------------------------------------
; The fastest way of figuring out how many bits are set
; in a byte is to use a precalculated table and index
; into it using either X or Y.
;
; This method is so simple that you don't need to put
; the code in a subroutine for reuse.
;
; The only downside to this is that it needs a whole
; 256 byte page to hold the lookup table.
;
; Time: 6 cycles (+1 if data table crosses page)
; Size: 256 bytes (for data)

ByteAtATime:
        LDX #54                 ; Load value to lookup
        LDA ByteBitCounts,X     ; and get its bit count

        NOP                     ; A contains bit count

; We can add together the bit counts of several bytes
; to compute the total number of bits in larger values.

        LDA #$12                ; Set up an example
        STA SCRATCH+0           ; 16-bit value
        LDA #$34
        STA SCRATCH+1

        CLC                     ; Clear carry
        LDX SCRATCH+0           ; Fetch first data byte
        LDA ByteBitCounts,X     ; bit count
        LDX SCRATCH+1           ; And add in the next
        ADC ByteBitCounts,X

        NOP                     ; A has bit total count

        JMP HalfTheDataTable    ; Jump to next method

; Look up table of bit counts in the values $00-$FF

ByteBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8

;======================================================
; Bit Counting With A Smaller Lookup Table
;------------------------------------------------------
; We can half the size of the look up table by using
; the fact that the bit count for a value V in the
; range $80-$FF will be one more than that for V & $7F.
;
; So if we move bit 7 into the carry then look up the
; bit count for bits 6 downto 0 and add the carry back
; we get the right result.
;
; Time: 20 cycles (+1 if data table crosses page)
; Size: 142 bytes (13  for code, 128 for data)

HalfTheDataTable:
        LDA #$54                ; Load a test value
        JSR BitCountSeven       ; Calculate the count

        NOP                     ; Result is in A

        JMP NybbleAtATime       ; Jump to next method

;------------------------------------------------------

BitCountSeven:
        CLC                     ; Ensure bit 0 is safe
        ADC #$80                ; Put bit 7 into carry
        AND #$7F                ; Strip out bits 6-0
        TAX
        LDA SevenBitCounts,X    ; Fetch the bit count
        ADC #0                  ; Add in bit 7
        RTS

; Look up table of bit counts in the values $00-$7F

SevenBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7

;======================================================
; Bit Counting With An Even Smaller Lookup Table
;------------------------------------------------------
; If we are prepared to do a small amount of shifting
; then we can reduce the data table to just 16 bytes.
;
; This alogrithm breaks the value into two $0-$F nybble
; values and adds the bit counts for these together.
;
; Time: 39 cycles (+2 if data table crosses page)
; Size: 36 bytes (20  for code, 16 for data)

NybbleAtATime:
        LDA #$54                ; Load a test value
        JSR BitCountNybble      ; Calculate the count

        NOP                     ; Result is in A

        JMP BitAtATime          ; Jump to next method

;------------------------------------------------------

BitCountNybble:
        PHA                     ; Save a copy of value
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        AND #$0F
        TAY                     ; And save in Y
        PLA                     ; Recover value
        AND #$0F                ; Strip out lo nybble
        TAX                     ; And save in X
        CLC
        LDA NybbleBitCounts,X   ; Then add counts for
        ADC NybbleBitCounts,Y   ; each nybble
        RTS

; Look up table of bit counts in the values $00-$0F

NybbleBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4


;======================================================
; Bit Counting Using Shifts
;------------------------------------------------------
; The smallest method for count bits shifts the value
; left one bit at a time and counts each one bit that
; is shifted out.
;
; If we use ASL to perform the shift then the value
; will become zero after at most eight iterations. By
; looking for see when the value has reached zero we
; can stop iterating as soon as there are no bits left
; to shift out.
;
; Time: 17 to 91 cycles (depending on the value)
; Size: 12 bytes

BitAtATime:
        LDA #$54                ; Load a test value
        JSR BitCountIterative   ; Calculate the count

        NOP                     ; Result is in A

        JMP DivideAndConquer    ; Jump to next method

;------------------------------------------------------

BitCountIterative:
        LDX #0                  ; Clear bit count
BitShiftLoop:
        ASL                     ; Shift a bit
        BCC BitCountSkip        ; Did a one shift out?
        INX                     ; Add one to count
        ORA #0                  ; Retest for zero
BitCountSkip:
        BNE BitShiftLoop        ; Repeat till zero
        TXA                     ; Move count to A
        RTS

;======================================================
; Divide and Conquer
;------------------------------------------------------
; This alogrithm works by aligning and adding parts of
; the value until finally a bit count is produced.
;
; +--------------------------------
; | 0 + 1 | 1 + 0 | 0 + 1 | 1 + 1 |
; +---v-------v-------v-------v----
; | 0   1 + 0   1 | 0   1 + 1   0 |
; +-------v---------------v--------
; | 0   0   1   0 + 0   0   1   1 |
; +---------------v----------------
; | 0   0   0   0   0   1   0   1 |
; +--------------------------------
;
; This method always takes the same amount of time
; regardless of the value
;

; Time: 66 cycles
; Size: 37 bytes

DivideAndConquer:
        LDA #$54                ; Load a test value
        JSR BitCountParallel    ; Calculate the count

        NOP                     ; Result is in A

        BRK                     ; All done

;------------------------------------------------------

BitCountParallel:
        PHA
        AND #$55                ; Strip out odd bits
        STA SCRATCH
        PLA
        LSR                     ; Strip out even bits
        AND #$55
        CLC                     ; And add together
        ADC SCRATCH
        PHA
        AND #$33                ; Strip out odd pairs
        STA SCRATCH
        PLA
        LSR
        LSR
        AND #$33                ; Strip out even pairs
        CLC
        ADC SCRATCH             ; And add together
        STA SCRATCH
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        CLC
        ADC SCRATCH             ; Add to lo nybble
        AND #$0F                ; And prune to result
        RTS

        .END

Re: Bit Counting

Posted: Tue Aug 28, 2007 7:26 pm
by Thowllly
BitWise wrote:

Code: Select all

BitCountSeven:
        CLC                     ; Ensure bit 0 is safe
        ADC #$80                ; Put bit 7 into carry
        AND #$7F                ; Strip out bits 6-0
        TAX
        LDA SevenBitCounts,X    ; Fetch the bit count
        ADC #0                  ; Add in bit 7
faster/smaller:

Code: Select all

BitCountSeven:
        CMP #$80                ; Put bit 7 into carry
        AND #$7F                ; Strip out bits 6-0
        TAX
        LDA SevenBitCounts,X    ; Fetch the bit count
        ADC #0                  ; Add in bit 7
even faster/smaller:

Code: Select all

BitCountSeven:
        LSR                     ; Put bit 0 into carry, bit 7-1 in bit 6-0 and value 0 in bit 7
        TAX
        LDA SevenBitCounts,X    ; Fetch the bit count
        ADC #0                  ; Add in bit 0

Posted: Tue Aug 28, 2007 7:58 pm
by BitWise
Thanks, I'll tweak that code.

I also noticed a couple of places where I could use TAX/TXA instead of PHA/PLA which saves a few cycles.

Posted: Tue Aug 28, 2007 11:42 pm
by dclxvi
In BitCountNybble, the four LSRs shift zeros into bits 7-3, so the subsequent AND #$0F can be deleted.

BitCountIterative will shorter space-wise (and faster for most cases) with:

Code: Select all

BitCountIterative:
        LDX #$FF                ; Reset bit count
BitCountIncrement:
        INX                     ; Add one to count
BitShiftLoop:
        ASL                     ; Shift a bit and test for zero
        BCS BitCountIncrement   ; Did a one shift out?
BitCountSkip:
        BNE BitShiftLoop        ; Repeat till zero
        TXA                     ; Move count to A
        RTS
In BitCountParallel, if you replace:

Code: Select all

        LSR                     ; Strip out even bits
        AND #$55
        CLC                     ; And add together
with:

Code: Select all

        AND #$AA                ; Strip out even bits
        LSR
then a zero will be shifted into the carry and the subsequent CLC is unneccessary. You can use an AND #$CC followed by two LSRs to eliminate a CLC later on as well.

Also, the final CLC is unnecessary since bit 3 of the accumulator (which will be shifted into the carry by the four LSRs) will always be zero. (Test it for all 256 cases if you don't believe me.)

Finally, the following is only 33 bytes, uses no tables, no temporary RAM variables, and no registers (other than A, of course). (It doesn't run in constant time, though.)

Code: Select all

   LSR
   BCC L1
   ADC #$3F
L1 LSR
   BCC L2
   ADC #$1F
L2 LSR
   BCC L3
   ADC #$0F
L3 LSR
   BCC L4
   ADC #$07
L4 LSR
   BCC L5
   ADC #$03
L5 LSR
   BCC L6
   ADC #$01
L6 LSR
   ADC #$00

Re: Bit Counting

Posted: Wed Aug 29, 2007 10:37 am
by Thowllly
BitWise wrote:

Code: Select all

BitCountNybble:
        PHA                     ; Save a copy of value
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        AND #$0F
        TAY                     ; And save in Y
        PLA                     ; Recover value
        AND #$0F                ; Strip out lo nybble
        TAX                     ; And save in X
        CLC
        LDA NybbleBitCounts,X   ; Then add counts for
        ADC NybbleBitCounts,Y   ; each nybble
Went back to look at this thread today, and couldn't help myself:

Code: Select all

BitCountNybble:
        TAX                     ; Save a copy of value
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        TAY                     ; And save in Y
        TXA                     ; Recover value
        AND #$0F                ; Strip out lo nybble
        TAX                     ; And save in X
        CLC
        LDA NybbleBitCounts,X   ; Then add counts for
        ADC NybbleBitCounts,Y   ; each nybble
PHA/PLA can be replaced, like you said. The first AND can also be removed because the 4 LSRs shift in 4 zeroes. You can save 2 more cycles like this:

Code: Select all

BitCountNybble:
        TAX                     ; Save a copy of value
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        TAY                     ; And save in Y
        TXA                     ; Recover value
        AND #$07                ; Strip out bit 2-0, bit 3 is already in carry.
        TAX                     ; And save in X
        LDA NybbleBitCounts,X   ; Then add counts for
        ADC NybbleBitCounts,Y   ; each nybble
I think that should be correct. But it is less clear...

and dclxvi, that code at the end of your post actually made me smile, nice :)

Posted: Wed Aug 29, 2007 11:45 pm
by BitWise
Heres the revised code.

Code: Select all

;======================================================
; Bit Counting
;------------------------------------------------------
; This file contains various 6502 coded algorithms that
; count the number of one bits in a byte value.
;
; The different CPU performance and memory usage
; characteristics have been calculated for each method
; to show the speed and space trade offs.
;
; If you have the space to spare than the 256 byte
; lookup table is the fastest otherwise the in place
; shift/add method is a good allrounder.
;
; These examples have been developed for use with
; Michal Kowalski's 6502 emulator.
;
; If you can think of any other ways of calculating
; this or speeding up these routines please let me
; know.
;
; Compiled by Andrew Jacobs (BitWise) with additions
; from members of the 6502.org forum.
;
; Thanks to Thowllly & dclxvi for code speed ups and
; dclxvi for the in place shift/add method.
;======================================================
; Revision History:
; 2007-08-28 AJ First release.
; 2007-08-30 AJ Included forum suggestions.
;------------------------------------------------------

        .ORG $00

SCRATCH .DS  2                  ; Some scratch memory

        .ORG $8000
        .START $8000

        JMP ByteAtATime         ; Jump to first example

;======================================================
; Bit Counting Via a Lookup Table
;------------------------------------------------------
; The fastest way of figuring out how many bits are set
; in a byte is to use a precalculated table and index
; into it using either X or Y.
;
; This method is so simple that you don't need to put
; the code in a subroutine for reuse.
;
; The only downside to this is that it needs a whole
; 256 byte page to hold the lookup table.
;
; Time: 6 cycles (+1 if data table crosses page)
; Size: 256 bytes (for data)

; Look up table of bit counts in the values $00-$FF

ByteBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8

;------------------------------------------------------

ByteAtATime:
        LDX #$54                ; Load value to lookup
        LDA ByteBitCounts,X     ; and get its bit count

        NOP                     ; A contains bit count

; We can add together the bit counts of several bytes
; to compute the total number of bits in larger values.

        LDA #$12                ; Set up an example
        STA SCRATCH+0           ; 16-bit value
        LDA #$34
        STA SCRATCH+1

        CLC                     ; Clear carry
        LDX SCRATCH+0           ; Fetch first data byte
        LDA ByteBitCounts,X     ; bit count
        LDX SCRATCH+1           ; And add in the next
        ADC ByteBitCounts,X

        NOP                     ; A has bit total count

        JMP HalfTheDataTable    ; Jump to next method

;======================================================
; Bit Counting With A Smaller Lookup Table
;------------------------------------------------------
; We can half the size of the look up table by using
; the fact that the bit count for a value V in the
; range $80-$FF will be one more than that for V & $7F.
;
; Infact it doesn't matter which bit we use providing
; we are left with an easy to access 7 bits afterwards.
;
; So if we move bit 0 into the carry using LSR and then
; look up the bit count for bits 6 downto 0 and add the
; carry back we get the right result.
;
; Time: 16 cycles (+1 if data table crosses page)
; Size: 136 bytes (8  for code, 128 for data)

BitCountSeven:
        LSR                     ; Put bit 0 into carry
        TAX
        LDA SevenBitCounts,X    ; Fetch the bit count
        ADC #0                  ; Add in bit 7
        RTS

; Look up table of bit counts in the values $00-$7F

SevenBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7

;------------------------------------------------------

HalfTheDataTable:
        LDA #$54                ; Load a test value
        JSR BitCountSeven       ; Calculate the count

        NOP                     ; Result is in A

        JMP NybbleAtATime       ; Jump to next method

;======================================================
; Bit Counting With An Even Smaller Lookup Table
;------------------------------------------------------
; If we are prepared to do a small amount of shifting
; then we can reduce the data table to 16 bytes.
;
; This algorithm breaks the value into three parts. The
; hi nybble is shifted down and placed in Y. The shifts
; leave bit 3 of the value in the carry so only bits
; 2 to 0 need to be extracted into X. The bit counts
; for the X & Y values are looked up and added which
; combines them with the bit in C.
;
; Time: 31 cycles (+2 if data table crosses page)
; Size: 33 bytes (17  for code, 16 for data)

BitCountNybble:
        TAX                     ; Save a copy of value
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR                     ; Leave <3> in C
        TAY                     ; And save <7:4> in Y
        TXA                     ; Recover value
        AND #$07                ; Put out <2:0> in X
        TAX                     ; And save in X
        LDA NybbleBitCounts,Y   ; Fetch count for Y
        ADC NybbleBitCounts,X   ; Add count for X & C
        RTS

; Look up table of bit counts in the values $00-$0F

NybbleBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4

;------------------------------------------------------

NybbleAtATime:
        LDA #$54                ; Load a test value
        JSR BitCountNybble      ; Calculate the count

        NOP                     ; Result is in A

        JMP BitAtATime          ; Jump to next method

;======================================================
; Bit Counting Using Shifts
;------------------------------------------------------
; The smallest method for count bits shifts the value
; left one bit at a time and counts each one bit that
; is shifted out.
;
; If we use ASL to perform the shift then the value
; will become zero after at most eight iterations. By
; looking for see when the value has reached zero we
; can stop iterating as soon as there are no bits left
; to shift out.
;
; Time: 18 to 76 cycles (depending on the value)
; Size: 10 bytes

BitCountIterative:
        LDX #$FF                ; Set count to -1
.Incr:  INX                     ; Add one to count
.Loop:  ASL                     ; Shift by one bit
        BCS .Incr               ; Count one bits
        BNE .Loop               ; Repeat till zero
        TXA                     ; Move count to A
        RTS

;------------------------------------------------------

BitAtATime:
        LDA #$54                ; Load a test value
        JSR BitCountIterative   ; Calculate the count

        NOP                     ; Result is in A

        JMP ShiftAndAdd         ; Jump to next method

;======================================================
; Shift and Add
;------------------------------------------------------
; This method uses the most significant bits of the
; accumulator to keep track of the bit count as the
; remaining data value bits are shifted out of least
; significant position.
;
; The follow illustrates the intermediate states of the
; calculation for the sample $54 value. The : shows the
; division between the running count and the remaining
; uncounted bits.
;
; +-------------------------------+
; | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |  Initial value
; +-------------------------------+
; | 0 | 0 : 1 | 0 | 1 | 0 | 1 | 0 |  C = 0 skip add
; +-------------------------------+
; | 0 | 0 | 0 : 1 | 0 | 1 | 0 | 1 |  C = 0 skip add
; +-------------------------------+
; | 0 | 0 | 0 | 0 : 1 | 0 | 1 | 0 |  C = 1
; | 0 | 0 | 0 | 1 : 1 | 0 | 1 | 0 |  Add $0F+C ($10)
; +-------------------------------+
; | 0 | 0 | 0 | 0 | 1 : 1 | 0 | 1 |  C = 0 skip add
; +-------------------------------+
; | 0 | 0 | 0 | 0 | 0 | 1 : 1 | 0 |  C = 1
; | 0 | 0 | 0 | 0 | 1 | 0 : 1 | 0 |  Add $03+1 ($04)
; +-------------------------------+
; | 0 | 0 | 0 | 0 | 0 | 1 | 0 : 1 |  C = 0 skip add
; +-------------------------------+
; | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 :  C = 1
; +-------------------------------+
; | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |  Add $00+C ($01)
; +-------------------------------+
;
; Time: 40 to 46 cycles (depending on the value)
; Size: 34 bytes

BitCountShiftAdd:
        LSR                     ; Is bit 0 set?
        BCC .Skip0
        ADC #$3F                ; Yes, add $40
.Skip0  LSR                     ; Is bit 1 set?
        BCC .Skip1
        ADC #$1F                ; Yes, add $20
.Skip1  LSR                     ; Is bit 2 set?
        BCC .Skip2
        ADC #$0F                ; Yes, add $10
.Skip2  LSR                     ; Is bit 3 set?
        BCC .Skip3
        ADC #$07                ; Yes, add $08
.Skip3  LSR                     ; Is bit 4 set?
        BCC .Skip4
        ADC #$03                ; Yes, add $04
.Skip4  LSR                     ; Is bit 5 set?
        BCC .Skip5
        ADC #$01                ; Yes, add $02
.Skip5  LSR
        ADC #$00                ; Add bit 6 to count
        RTS

;------------------------------------------------------

ShiftAndAdd:
        LDA #$54                ; Load a test value
        JSR BitCountShiftAdd    ; Calculate the count

        NOP                     ; Result is in A

        JMP DivideAndConquer    ; Jump to next method

;======================================================
; Divide and Conquer
;------------------------------------------------------
; This algorithm works by aligning and adding parts of
; the value until finally the bit count is produced.
;
; +--------------------------------
; | 0 + 1 | 1 + 0 | 0 + 1 | 1 + 1 |
; +---v-------v-------v-------v----
; | 0   1 + 0   1 | 0   1 + 1   0 |
; +-------v---------------v--------
; | 0   0   1   0 + 0   0   1   1 |
; +---------------v----------------
; | 0   0   0   0   0   1   0   1 |
; +--------------------------------
;
; This method always takes the same amount of time
; regardless of the value
;
; Time: 50 cycles
; Size: 34 bytes

BitCountParallel:
        TAX
        AND #$55                ; Strip out odd bits
        STA SCRATCH
        TXA
        AND #$AA                ; Strip out even bits
        LSR
        ADC SCRATCH             ; And add together
        TAX
        AND #$33                ; Strip out odd pairs
        STA SCRATCH
        TXA
        AND #$CC                ; Strip out even pairs
        LSR
        LSR
        ADC SCRATCH             ; And add together
        STA SCRATCH
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        ADC SCRATCH             ; Add to lo nybble
        AND #$0F                ; And prune to result
        RTS

;------------------------------------------------------

DivideAndConquer:
        LDA #$54                ; Load a test value
        JSR BitCountParallel    ; Calculate the count

        NOP                     ; Result is in A

        BRK                     ; All done

        .END
I think that includes all the comments. Cycle counts where done by hand so might be a bit out.

Edit: Fixed typos.

Posted: Tue Oct 23, 2007 9:56 pm
by debounce
In BitCountParallel, you could replace the beginning

Code: Select all

BitCountParallel:
        TAX
        AND #$55                ; Strip out odd bits
        STA SCRATCH
        TXA
        AND #$AA                ; Strip out even bits
        LSR
        ADC SCRATCH             ; And add together
with

Code: Select all

BitCountParallel:
        STA SCRATCH
        AND #$AA                ; get fudge factor for each pair
        LSR                     ; divide by 2 (c=0)
        SBC SCRATCH             ; oops, no reverse-subtract!
        EOR #$FF                ; take one's complement
saving 2 bytes. Inspired by some code 'bogax' gave to help me!

Posted: Wed Oct 31, 2007 11:20 pm
by dclxvi
Here's another way to count bits:

Code: Select all

   TAX
   BEQ L2
   LDX #0
   SEC
L1 INX
   STA SCRATCH
   SBC #1
   AND SCRATCH
   BNE L1
   TXA
L2 RTS
The result is returned in A (and X too).

On a 65C02, the SBC #1 can be replaced with DEC A, and the SEC can be eliminated.

Posted: Sat Nov 21, 2009 10:03 pm
by bogax
Guess I'm a bit late to this party :)

Code: Select all

 asl
 php
 asl
 rol
 adc #$80
 and #$7F
 adc #$C0
 and #$3F
 adc #$E0
 and #$1F
 adc #$F0
 and #$0F
 adc #$F8
 and #$07
 adc #$00
 plp
 adc #$00
It's not particularly short or fast but it does have the virtues
of running in constant time and only messing with the accumulator.


I actually wrote this when this thread was current but I was never really happy with it.
In particular, having to stash one bit on the stack seems... inelegant.
So I'd pick it up and worry it a bit now and then but it's been two
years now, I don't think I'm going to improve it ;)

Small Code is Good Code

Posted: Sun Nov 22, 2009 3:33 am
by BigDumbDinosaur
How about this one:

Code: Select all

         lda #%00000000        ;worst-case test value
         ldx #9                ;bits in a byte +1
;
loop01   dex                   ;nbits=nbits-1
         beq done              ;out of bits
;
loop02   lsr a                 ;test for a set bit
         bcc loop01            ;clear, decrement count
;
         bne loop02            ;more bits to check
;
done     brk                   ;.X = nbits
Running the above with the worst-case test value requires 81 clock cycles. The best case, all bits set, requires 63 clocks. Naturally, it'll run a tad slower if a page boundary gets crossed by a branch, in which case, best case will be 70 clocks and worst case, 90 clocks.

Posted: Sun Nov 22, 2009 4:34 am
by Nightmaretony
bogax wrote:
Guess I'm a bit late to this party :)

Code: Select all

 asl
 php
 asl
 rol
 adc #$80
 and #$7F
 adc #$C0
 and #$3F
 adc #$E0
 and #$1F
 adc #$F0
 and #$0F
 adc #$F8
 and #$07
 adc #$00
 plp
 adc #$00
It's not particularly short or fast but it does have the virtues
of running in constant time and only messing with the accumulator.


I actually wrote this when this thread was current but I was never really happy with it.
In particular, having to stash one bit on the stack seems... inelegant.
So I'd pick it up and worry it a bit now and then but it's been two
years now, I don't think I'm going to improve it ;)

Your code would actually work out quite well for later processors. A friend has a programming blog for her software library, and it mentioned about branching, so your method without using any branches proves interesting.

http://burgerbecky.livejournal.com/tag/code

Re: Small Code is Good Code

Posted: Sun Nov 22, 2009 6:53 am
by bogax
BigDumbDinosaur wrote:
How about this one:

Code: Select all

         lda #%00000000        ;worst-case test value
         ldx #9                ;bits in a byte +1
;
loop01   dex                   ;nbits=nbits-1
         beq done              ;out of bits
;
loop02   lsr a                 ;test for a set bit
         bcc loop01            ;clear, decrement count
;
         bne loop02            ;more bits to check
;
done     brk                   ;.X = nbits
You'll fall through the bne and miss counting down on any
remaining unset bits after the last set bit.

eg 1,3,7 will return 8 (no unset bits before the last set bit)
6,B,17 will return 7 (one unset bit before the last set bit)
etc

Re: Small Code is Good Code

Posted: Sun Nov 22, 2009 9:57 pm
by BigDumbDinosaur
bogax wrote:
BigDumbDinosaur wrote:
How about this one:

Code: Select all

         lda #%00000000        ;worst-case test value
         ldx #9                ;bits in a byte +1
;
loop01   dex                   ;nbits=nbits-1
         beq done              ;out of bits
;
loop02   lsr a                 ;test for a set bit
         bcc loop01            ;clear, decrement count
;
         bne loop02            ;more bits to check
;
done     brk                   ;.X = nbits
You'll fall through the bne and miss counting down on any
remaining unset bits after the last set bit.

eg 1,3,7 will return 8 (no unset bits before the last set bit)
6,B,17 will return 7 (one unset bit before the last set bit)
etc
Right you are. Guess I shouldn't be writing code after midnight. :)

Here's a different version, which someone may have already posted:

Code: Select all

         lda #%11111111        ;worst-case test value
         ldx #0                ;nbits
;
loop01   lsr a                 ;test for a set bit
         bcc loop02            ;clear
;
         inx                   ;nbits=nbits+1
;
loop02   bne loop01            ;more bits to check
;
         brk                   ;.X = nbits

Posted: Mon Nov 23, 2009 12:10 am
by debounce
bogax wrote:
I actually wrote this when this thread was current but I was never really happy with it.
In particular, having to stash one bit on the stack seems... inelegant.
So I'd pick it up and worry it a bit now and then but it's been two
years now, I don't think I'm going to improve it ;)
Don't be so hard on yourself bogax, it's neat code and a darn sight more elegant than not using the stack, viz:

Code: Select all

                        ; C ---A----
                        ; ? hgfedcba
        asl             ; h gfedcba.
        rol             ; g fedcba.h
        adc     #$80    ; f Fedcbapq   pq = h+g
        adc     #$00    ; . Fedcbapq   pq = h+g+f
        asl
        asl
        rol             ; d cbapq..e
        adc     #$80    ; c Cbapq.tu   tu = e+d
        and     #$7b    ; c .bapq.tu
        adc     #$c0    ; b BBapq.tu   tu = e+d+c
        and     #$3b    ; b ..apq.tu
        adc     #$e0    ; a AAApqstu  stu = e+d+c+b
        and     #$1f    ; a ...pqstu
        adc     #$00    ; . ...pqstu  stu = e+d+c+b+a
        cmp     #$10    ; p ...pqstu
        adc     #$f0    ; p PPPPqstu  stu = e+d+c+b+a+p
        and     #$0f    ; p ....qstu
        adc     #$f8    ; q QQQQQstu  stu = e+d+c+b+a+p+p
        and     #$07    ; q .....stu
        adc     #$00    ; . ....rstu rstu = e+d+c+b+a+p+p+q
--Greg

Posted: Mon Nov 23, 2009 1:08 am
by ElEctric_EyE
Forgive my ignorance,but what is the point of a "CMP #$10" without a branch instruction afterwards?