Bit Counting

Programming the 6502 microprocessor and its relatives in assembly and other languages.
User avatar
BitWise
In Memoriam
Posts: 996
Joined: 02 Mar 2004
Location: Berkshire, UK
Contact:

Bit Counting

Post by BitWise »

Some code for the library:

Code: Select all

;======================================================
; Bit Counting
;------------------------------------------------------
; This file contains various 6502 coded algorithms that
; count the number of one bits in a byte value.
;
; The different CPU performance and memory usage
; characteristics have been calculated for each method
; to show how speed and space trade offs.
;
; These examples have been developed for use with
; Michal Kowalski's 6502 emulator.
;
; If you can think of any other ways of calculating
; this or speeding up these routines please let me
; know.
;
; Andrew Jacobs
;======================================================
; Revision History:
; 2007-08-28 AJ First release.
;------------------------------------------------------

        .ORG $00

SCRATCH .DS  2                  ; Some scratch memory

        .ORG $8000
        .START $8000

;======================================================
; Bit Counting Via a Lookup Table
;------------------------------------------------------
; The fastest way of figuring out how many bits are set
; in a byte is to use a precalculated table and index
; into it using either X or Y.
;
; This method is so simple that you don't need to put
; the code in a subroutine for reuse.
;
; The only downside to this is that it needs a whole
; 256 byte page to hold the lookup table.
;
; Time: 6 cycles (+1 if data table crosses page)
; Size: 256 bytes (for data)

ByteAtATime:
        LDX #54                 ; Load value to lookup
        LDA ByteBitCounts,X     ; and get its bit count

        NOP                     ; A contains bit count

; We can add together the bit counts of several bytes
; to compute the total number of bits in larger values.

        LDA #$12                ; Set up an example
        STA SCRATCH+0           ; 16-bit value
        LDA #$34
        STA SCRATCH+1

        CLC                     ; Clear carry
        LDX SCRATCH+0           ; Fetch first data byte
        LDA ByteBitCounts,X     ; bit count
        LDX SCRATCH+1           ; And add in the next
        ADC ByteBitCounts,X

        NOP                     ; A has bit total count

        JMP HalfTheDataTable    ; Jump to next method

; Look up table of bit counts in the values $00-$FF

ByteBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8

;======================================================
; Bit Counting With A Smaller Lookup Table
;------------------------------------------------------
; We can half the size of the look up table by using
; the fact that the bit count for a value V in the
; range $80-$FF will be one more than that for V & $7F.
;
; So if we move bit 7 into the carry then look up the
; bit count for bits 6 downto 0 and add the carry back
; we get the right result.
;
; Time: 20 cycles (+1 if data table crosses page)
; Size: 142 bytes (13  for code, 128 for data)

HalfTheDataTable:
        LDA #$54                ; Load a test value
        JSR BitCountSeven       ; Calculate the count

        NOP                     ; Result is in A

        JMP NybbleAtATime       ; Jump to next method

;------------------------------------------------------

BitCountSeven:
        CLC                     ; Ensure bit 0 is safe
        ADC #$80                ; Put bit 7 into carry
        AND #$7F                ; Strip out bits 6-0
        TAX
        LDA SevenBitCounts,X    ; Fetch the bit count
        ADC #0                  ; Add in bit 7
        RTS

; Look up table of bit counts in the values $00-$7F

SevenBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7

;======================================================
; Bit Counting With An Even Smaller Lookup Table
;------------------------------------------------------
; If we are prepared to do a small amount of shifting
; then we can reduce the data table to just 16 bytes.
;
; This alogrithm breaks the value into two $0-$F nybble
; values and adds the bit counts for these together.
;
; Time: 39 cycles (+2 if data table crosses page)
; Size: 36 bytes (20  for code, 16 for data)

NybbleAtATime:
        LDA #$54                ; Load a test value
        JSR BitCountNybble      ; Calculate the count

        NOP                     ; Result is in A

        JMP BitAtATime          ; Jump to next method

;------------------------------------------------------

BitCountNybble:
        PHA                     ; Save a copy of value
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        AND #$0F
        TAY                     ; And save in Y
        PLA                     ; Recover value
        AND #$0F                ; Strip out lo nybble
        TAX                     ; And save in X
        CLC
        LDA NybbleBitCounts,X   ; Then add counts for
        ADC NybbleBitCounts,Y   ; each nybble
        RTS

; Look up table of bit counts in the values $00-$0F

NybbleBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4


;======================================================
; Bit Counting Using Shifts
;------------------------------------------------------
; The smallest method for count bits shifts the value
; left one bit at a time and counts each one bit that
; is shifted out.
;
; If we use ASL to perform the shift then the value
; will become zero after at most eight iterations. By
; looking for see when the value has reached zero we
; can stop iterating as soon as there are no bits left
; to shift out.
;
; Time: 17 to 91 cycles (depending on the value)
; Size: 12 bytes

BitAtATime:
        LDA #$54                ; Load a test value
        JSR BitCountIterative   ; Calculate the count

        NOP                     ; Result is in A

        JMP DivideAndConquer    ; Jump to next method

;------------------------------------------------------

BitCountIterative:
        LDX #0                  ; Clear bit count
BitShiftLoop:
        ASL                     ; Shift a bit
        BCC BitCountSkip        ; Did a one shift out?
        INX                     ; Add one to count
        ORA #0                  ; Retest for zero
BitCountSkip:
        BNE BitShiftLoop        ; Repeat till zero
        TXA                     ; Move count to A
        RTS

;======================================================
; Divide and Conquer
;------------------------------------------------------
; This alogrithm works by aligning and adding parts of
; the value until finally a bit count is produced.
;
; +--------------------------------
; | 0 + 1 | 1 + 0 | 0 + 1 | 1 + 1 |
; +---v-------v-------v-------v----
; | 0   1 + 0   1 | 0   1 + 1   0 |
; +-------v---------------v--------
; | 0   0   1   0 + 0   0   1   1 |
; +---------------v----------------
; | 0   0   0   0   0   1   0   1 |
; +--------------------------------
;
; This method always takes the same amount of time
; regardless of the value
;

; Time: 66 cycles
; Size: 37 bytes

DivideAndConquer:
        LDA #$54                ; Load a test value
        JSR BitCountParallel    ; Calculate the count

        NOP                     ; Result is in A

        BRK                     ; All done

;------------------------------------------------------

BitCountParallel:
        PHA
        AND #$55                ; Strip out odd bits
        STA SCRATCH
        PLA
        LSR                     ; Strip out even bits
        AND #$55
        CLC                     ; And add together
        ADC SCRATCH
        PHA
        AND #$33                ; Strip out odd pairs
        STA SCRATCH
        PLA
        LSR
        LSR
        AND #$33                ; Strip out even pairs
        CLC
        ADC SCRATCH             ; And add together
        STA SCRATCH
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        CLC
        ADC SCRATCH             ; Add to lo nybble
        AND #$0F                ; And prune to result
        RTS

        .END
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
Thowllly
Posts: 51
Joined: 22 Oct 2003
Location: Norway

Re: Bit Counting

Post by Thowllly »

BitWise wrote:

Code: Select all

BitCountSeven:
        CLC                     ; Ensure bit 0 is safe
        ADC #$80                ; Put bit 7 into carry
        AND #$7F                ; Strip out bits 6-0
        TAX
        LDA SevenBitCounts,X    ; Fetch the bit count
        ADC #0                  ; Add in bit 7
faster/smaller:

Code: Select all

BitCountSeven:
        CMP #$80                ; Put bit 7 into carry
        AND #$7F                ; Strip out bits 6-0
        TAX
        LDA SevenBitCounts,X    ; Fetch the bit count
        ADC #0                  ; Add in bit 7
even faster/smaller:

Code: Select all

BitCountSeven:
        LSR                     ; Put bit 0 into carry, bit 7-1 in bit 6-0 and value 0 in bit 7
        TAX
        LDA SevenBitCounts,X    ; Fetch the bit count
        ADC #0                  ; Add in bit 0
User avatar
BitWise
In Memoriam
Posts: 996
Joined: 02 Mar 2004
Location: Berkshire, UK
Contact:

Post by BitWise »

Thanks, I'll tweak that code.

I also noticed a couple of places where I could use TAX/TXA instead of PHA/PLA which saves a few cycles.
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
User avatar
dclxvi
Posts: 362
Joined: 11 Mar 2004

Post by dclxvi »

In BitCountNybble, the four LSRs shift zeros into bits 7-3, so the subsequent AND #$0F can be deleted.

BitCountIterative will shorter space-wise (and faster for most cases) with:

Code: Select all

BitCountIterative:
        LDX #$FF                ; Reset bit count
BitCountIncrement:
        INX                     ; Add one to count
BitShiftLoop:
        ASL                     ; Shift a bit and test for zero
        BCS BitCountIncrement   ; Did a one shift out?
BitCountSkip:
        BNE BitShiftLoop        ; Repeat till zero
        TXA                     ; Move count to A
        RTS
In BitCountParallel, if you replace:

Code: Select all

        LSR                     ; Strip out even bits
        AND #$55
        CLC                     ; And add together
with:

Code: Select all

        AND #$AA                ; Strip out even bits
        LSR
then a zero will be shifted into the carry and the subsequent CLC is unneccessary. You can use an AND #$CC followed by two LSRs to eliminate a CLC later on as well.

Also, the final CLC is unnecessary since bit 3 of the accumulator (which will be shifted into the carry by the four LSRs) will always be zero. (Test it for all 256 cases if you don't believe me.)

Finally, the following is only 33 bytes, uses no tables, no temporary RAM variables, and no registers (other than A, of course). (It doesn't run in constant time, though.)

Code: Select all

   LSR
   BCC L1
   ADC #$3F
L1 LSR
   BCC L2
   ADC #$1F
L2 LSR
   BCC L3
   ADC #$0F
L3 LSR
   BCC L4
   ADC #$07
L4 LSR
   BCC L5
   ADC #$03
L5 LSR
   BCC L6
   ADC #$01
L6 LSR
   ADC #$00
Thowllly
Posts: 51
Joined: 22 Oct 2003
Location: Norway

Re: Bit Counting

Post by Thowllly »

BitWise wrote:

Code: Select all

BitCountNybble:
        PHA                     ; Save a copy of value
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        AND #$0F
        TAY                     ; And save in Y
        PLA                     ; Recover value
        AND #$0F                ; Strip out lo nybble
        TAX                     ; And save in X
        CLC
        LDA NybbleBitCounts,X   ; Then add counts for
        ADC NybbleBitCounts,Y   ; each nybble
Went back to look at this thread today, and couldn't help myself:

Code: Select all

BitCountNybble:
        TAX                     ; Save a copy of value
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        TAY                     ; And save in Y
        TXA                     ; Recover value
        AND #$0F                ; Strip out lo nybble
        TAX                     ; And save in X
        CLC
        LDA NybbleBitCounts,X   ; Then add counts for
        ADC NybbleBitCounts,Y   ; each nybble
PHA/PLA can be replaced, like you said. The first AND can also be removed because the 4 LSRs shift in 4 zeroes. You can save 2 more cycles like this:

Code: Select all

BitCountNybble:
        TAX                     ; Save a copy of value
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        TAY                     ; And save in Y
        TXA                     ; Recover value
        AND #$07                ; Strip out bit 2-0, bit 3 is already in carry.
        TAX                     ; And save in X
        LDA NybbleBitCounts,X   ; Then add counts for
        ADC NybbleBitCounts,Y   ; each nybble
I think that should be correct. But it is less clear...

and dclxvi, that code at the end of your post actually made me smile, nice :)
User avatar
BitWise
In Memoriam
Posts: 996
Joined: 02 Mar 2004
Location: Berkshire, UK
Contact:

Post by BitWise »

Heres the revised code.

Code: Select all

;======================================================
; Bit Counting
;------------------------------------------------------
; This file contains various 6502 coded algorithms that
; count the number of one bits in a byte value.
;
; The different CPU performance and memory usage
; characteristics have been calculated for each method
; to show the speed and space trade offs.
;
; If you have the space to spare than the 256 byte
; lookup table is the fastest otherwise the in place
; shift/add method is a good allrounder.
;
; These examples have been developed for use with
; Michal Kowalski's 6502 emulator.
;
; If you can think of any other ways of calculating
; this or speeding up these routines please let me
; know.
;
; Compiled by Andrew Jacobs (BitWise) with additions
; from members of the 6502.org forum.
;
; Thanks to Thowllly & dclxvi for code speed ups and
; dclxvi for the in place shift/add method.
;======================================================
; Revision History:
; 2007-08-28 AJ First release.
; 2007-08-30 AJ Included forum suggestions.
;------------------------------------------------------

        .ORG $00

SCRATCH .DS  2                  ; Some scratch memory

        .ORG $8000
        .START $8000

        JMP ByteAtATime         ; Jump to first example

;======================================================
; Bit Counting Via a Lookup Table
;------------------------------------------------------
; The fastest way of figuring out how many bits are set
; in a byte is to use a precalculated table and index
; into it using either X or Y.
;
; This method is so simple that you don't need to put
; the code in a subroutine for reuse.
;
; The only downside to this is that it needs a whole
; 256 byte page to hold the lookup table.
;
; Time: 6 cycles (+1 if data table crosses page)
; Size: 256 bytes (for data)

; Look up table of bit counts in the values $00-$FF

ByteBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8

;------------------------------------------------------

ByteAtATime:
        LDX #$54                ; Load value to lookup
        LDA ByteBitCounts,X     ; and get its bit count

        NOP                     ; A contains bit count

; We can add together the bit counts of several bytes
; to compute the total number of bits in larger values.

        LDA #$12                ; Set up an example
        STA SCRATCH+0           ; 16-bit value
        LDA #$34
        STA SCRATCH+1

        CLC                     ; Clear carry
        LDX SCRATCH+0           ; Fetch first data byte
        LDA ByteBitCounts,X     ; bit count
        LDX SCRATCH+1           ; And add in the next
        ADC ByteBitCounts,X

        NOP                     ; A has bit total count

        JMP HalfTheDataTable    ; Jump to next method

;======================================================
; Bit Counting With A Smaller Lookup Table
;------------------------------------------------------
; We can half the size of the look up table by using
; the fact that the bit count for a value V in the
; range $80-$FF will be one more than that for V & $7F.
;
; Infact it doesn't matter which bit we use providing
; we are left with an easy to access 7 bits afterwards.
;
; So if we move bit 0 into the carry using LSR and then
; look up the bit count for bits 6 downto 0 and add the
; carry back we get the right result.
;
; Time: 16 cycles (+1 if data table crosses page)
; Size: 136 bytes (8  for code, 128 for data)

BitCountSeven:
        LSR                     ; Put bit 0 into carry
        TAX
        LDA SevenBitCounts,X    ; Fetch the bit count
        ADC #0                  ; Add in bit 7
        RTS

; Look up table of bit counts in the values $00-$7F

SevenBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7

;------------------------------------------------------

HalfTheDataTable:
        LDA #$54                ; Load a test value
        JSR BitCountSeven       ; Calculate the count

        NOP                     ; Result is in A

        JMP NybbleAtATime       ; Jump to next method

;======================================================
; Bit Counting With An Even Smaller Lookup Table
;------------------------------------------------------
; If we are prepared to do a small amount of shifting
; then we can reduce the data table to 16 bytes.
;
; This algorithm breaks the value into three parts. The
; hi nybble is shifted down and placed in Y. The shifts
; leave bit 3 of the value in the carry so only bits
; 2 to 0 need to be extracted into X. The bit counts
; for the X & Y values are looked up and added which
; combines them with the bit in C.
;
; Time: 31 cycles (+2 if data table crosses page)
; Size: 33 bytes (17  for code, 16 for data)

BitCountNybble:
        TAX                     ; Save a copy of value
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR                     ; Leave <3> in C
        TAY                     ; And save <7:4> in Y
        TXA                     ; Recover value
        AND #$07                ; Put out <2:0> in X
        TAX                     ; And save in X
        LDA NybbleBitCounts,Y   ; Fetch count for Y
        ADC NybbleBitCounts,X   ; Add count for X & C
        RTS

; Look up table of bit counts in the values $00-$0F

NybbleBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4

;------------------------------------------------------

NybbleAtATime:
        LDA #$54                ; Load a test value
        JSR BitCountNybble      ; Calculate the count

        NOP                     ; Result is in A

        JMP BitAtATime          ; Jump to next method

;======================================================
; Bit Counting Using Shifts
;------------------------------------------------------
; The smallest method for count bits shifts the value
; left one bit at a time and counts each one bit that
; is shifted out.
;
; If we use ASL to perform the shift then the value
; will become zero after at most eight iterations. By
; looking for see when the value has reached zero we
; can stop iterating as soon as there are no bits left
; to shift out.
;
; Time: 18 to 76 cycles (depending on the value)
; Size: 10 bytes

BitCountIterative:
        LDX #$FF                ; Set count to -1
.Incr:  INX                     ; Add one to count
.Loop:  ASL                     ; Shift by one bit
        BCS .Incr               ; Count one bits
        BNE .Loop               ; Repeat till zero
        TXA                     ; Move count to A
        RTS

;------------------------------------------------------

BitAtATime:
        LDA #$54                ; Load a test value
        JSR BitCountIterative   ; Calculate the count

        NOP                     ; Result is in A

        JMP ShiftAndAdd         ; Jump to next method

;======================================================
; Shift and Add
;------------------------------------------------------
; This method uses the most significant bits of the
; accumulator to keep track of the bit count as the
; remaining data value bits are shifted out of least
; significant position.
;
; The follow illustrates the intermediate states of the
; calculation for the sample $54 value. The : shows the
; division between the running count and the remaining
; uncounted bits.
;
; +-------------------------------+
; | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |  Initial value
; +-------------------------------+
; | 0 | 0 : 1 | 0 | 1 | 0 | 1 | 0 |  C = 0 skip add
; +-------------------------------+
; | 0 | 0 | 0 : 1 | 0 | 1 | 0 | 1 |  C = 0 skip add
; +-------------------------------+
; | 0 | 0 | 0 | 0 : 1 | 0 | 1 | 0 |  C = 1
; | 0 | 0 | 0 | 1 : 1 | 0 | 1 | 0 |  Add $0F+C ($10)
; +-------------------------------+
; | 0 | 0 | 0 | 0 | 1 : 1 | 0 | 1 |  C = 0 skip add
; +-------------------------------+
; | 0 | 0 | 0 | 0 | 0 | 1 : 1 | 0 |  C = 1
; | 0 | 0 | 0 | 0 | 1 | 0 : 1 | 0 |  Add $03+1 ($04)
; +-------------------------------+
; | 0 | 0 | 0 | 0 | 0 | 1 | 0 : 1 |  C = 0 skip add
; +-------------------------------+
; | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 :  C = 1
; +-------------------------------+
; | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |  Add $00+C ($01)
; +-------------------------------+
;
; Time: 40 to 46 cycles (depending on the value)
; Size: 34 bytes

BitCountShiftAdd:
        LSR                     ; Is bit 0 set?
        BCC .Skip0
        ADC #$3F                ; Yes, add $40
.Skip0  LSR                     ; Is bit 1 set?
        BCC .Skip1
        ADC #$1F                ; Yes, add $20
.Skip1  LSR                     ; Is bit 2 set?
        BCC .Skip2
        ADC #$0F                ; Yes, add $10
.Skip2  LSR                     ; Is bit 3 set?
        BCC .Skip3
        ADC #$07                ; Yes, add $08
.Skip3  LSR                     ; Is bit 4 set?
        BCC .Skip4
        ADC #$03                ; Yes, add $04
.Skip4  LSR                     ; Is bit 5 set?
        BCC .Skip5
        ADC #$01                ; Yes, add $02
.Skip5  LSR
        ADC #$00                ; Add bit 6 to count
        RTS

;------------------------------------------------------

ShiftAndAdd:
        LDA #$54                ; Load a test value
        JSR BitCountShiftAdd    ; Calculate the count

        NOP                     ; Result is in A

        JMP DivideAndConquer    ; Jump to next method

;======================================================
; Divide and Conquer
;------------------------------------------------------
; This algorithm works by aligning and adding parts of
; the value until finally the bit count is produced.
;
; +--------------------------------
; | 0 + 1 | 1 + 0 | 0 + 1 | 1 + 1 |
; +---v-------v-------v-------v----
; | 0   1 + 0   1 | 0   1 + 1   0 |
; +-------v---------------v--------
; | 0   0   1   0 + 0   0   1   1 |
; +---------------v----------------
; | 0   0   0   0   0   1   0   1 |
; +--------------------------------
;
; This method always takes the same amount of time
; regardless of the value
;
; Time: 50 cycles
; Size: 34 bytes

BitCountParallel:
        TAX
        AND #$55                ; Strip out odd bits
        STA SCRATCH
        TXA
        AND #$AA                ; Strip out even bits
        LSR
        ADC SCRATCH             ; And add together
        TAX
        AND #$33                ; Strip out odd pairs
        STA SCRATCH
        TXA
        AND #$CC                ; Strip out even pairs
        LSR
        LSR
        ADC SCRATCH             ; And add together
        STA SCRATCH
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        ADC SCRATCH             ; Add to lo nybble
        AND #$0F                ; And prune to result
        RTS

;------------------------------------------------------

DivideAndConquer:
        LDA #$54                ; Load a test value
        JSR BitCountParallel    ; Calculate the count

        NOP                     ; Result is in A

        BRK                     ; All done

        .END
I think that includes all the comments. Cycle counts where done by hand so might be a bit out.

Edit: Fixed typos.
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
debounce
Posts: 27
Joined: 23 Nov 2004
Location: London, UK

Post by debounce »

In BitCountParallel, you could replace the beginning

Code: Select all

BitCountParallel:
        TAX
        AND #$55                ; Strip out odd bits
        STA SCRATCH
        TXA
        AND #$AA                ; Strip out even bits
        LSR
        ADC SCRATCH             ; And add together
with

Code: Select all

BitCountParallel:
        STA SCRATCH
        AND #$AA                ; get fudge factor for each pair
        LSR                     ; divide by 2 (c=0)
        SBC SCRATCH             ; oops, no reverse-subtract!
        EOR #$FF                ; take one's complement
saving 2 bytes. Inspired by some code 'bogax' gave to help me!
User avatar
dclxvi
Posts: 362
Joined: 11 Mar 2004

Post by dclxvi »

Here's another way to count bits:

Code: Select all

   TAX
   BEQ L2
   LDX #0
   SEC
L1 INX
   STA SCRATCH
   SBC #1
   AND SCRATCH
   BNE L1
   TXA
L2 RTS
The result is returned in A (and X too).

On a 65C02, the SBC #1 can be replaced with DEC A, and the SEC can be eliminated.
bogax
Posts: 250
Joined: 18 Nov 2003

Post by bogax »

Guess I'm a bit late to this party :)

Code: Select all

 asl
 php
 asl
 rol
 adc #$80
 and #$7F
 adc #$C0
 and #$3F
 adc #$E0
 and #$1F
 adc #$F0
 and #$0F
 adc #$F8
 and #$07
 adc #$00
 plp
 adc #$00
It's not particularly short or fast but it does have the virtues
of running in constant time and only messing with the accumulator.


I actually wrote this when this thread was current but I was never really happy with it.
In particular, having to stash one bit on the stack seems... inelegant.
So I'd pick it up and worry it a bit now and then but it's been two
years now, I don't think I'm going to improve it ;)
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Small Code is Good Code

Post by BigDumbDinosaur »

How about this one:

Code: Select all

         lda #%00000000        ;worst-case test value
         ldx #9                ;bits in a byte +1
;
loop01   dex                   ;nbits=nbits-1
         beq done              ;out of bits
;
loop02   lsr a                 ;test for a set bit
         bcc loop01            ;clear, decrement count
;
         bne loop02            ;more bits to check
;
done     brk                   ;.X = nbits
Running the above with the worst-case test value requires 81 clock cycles. The best case, all bits set, requires 63 clocks. Naturally, it'll run a tad slower if a page boundary gets crossed by a branch, in which case, best case will be 70 clocks and worst case, 90 clocks.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
Nightmaretony
In Memoriam
Posts: 618
Joined: 27 Jun 2003
Location: Meadowbrook
Contact:

Post by Nightmaretony »

bogax wrote:
Guess I'm a bit late to this party :)

Code: Select all

 asl
 php
 asl
 rol
 adc #$80
 and #$7F
 adc #$C0
 and #$3F
 adc #$E0
 and #$1F
 adc #$F0
 and #$0F
 adc #$F8
 and #$07
 adc #$00
 plp
 adc #$00
It's not particularly short or fast but it does have the virtues
of running in constant time and only messing with the accumulator.


I actually wrote this when this thread was current but I was never really happy with it.
In particular, having to stash one bit on the stack seems... inelegant.
So I'd pick it up and worry it a bit now and then but it's been two
years now, I don't think I'm going to improve it ;)

Your code would actually work out quite well for later processors. A friend has a programming blog for her software library, and it mentioned about branching, so your method without using any branches proves interesting.

http://burgerbecky.livejournal.com/tag/code
"My biggest dream in life? Building black plywood Habitrails"
bogax
Posts: 250
Joined: 18 Nov 2003

Re: Small Code is Good Code

Post by bogax »

BigDumbDinosaur wrote:
How about this one:

Code: Select all

         lda #%00000000        ;worst-case test value
         ldx #9                ;bits in a byte +1
;
loop01   dex                   ;nbits=nbits-1
         beq done              ;out of bits
;
loop02   lsr a                 ;test for a set bit
         bcc loop01            ;clear, decrement count
;
         bne loop02            ;more bits to check
;
done     brk                   ;.X = nbits
You'll fall through the bne and miss counting down on any
remaining unset bits after the last set bit.

eg 1,3,7 will return 8 (no unset bits before the last set bit)
6,B,17 will return 7 (one unset bit before the last set bit)
etc
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: Small Code is Good Code

Post by BigDumbDinosaur »

bogax wrote:
BigDumbDinosaur wrote:
How about this one:

Code: Select all

         lda #%00000000        ;worst-case test value
         ldx #9                ;bits in a byte +1
;
loop01   dex                   ;nbits=nbits-1
         beq done              ;out of bits
;
loop02   lsr a                 ;test for a set bit
         bcc loop01            ;clear, decrement count
;
         bne loop02            ;more bits to check
;
done     brk                   ;.X = nbits
You'll fall through the bne and miss counting down on any
remaining unset bits after the last set bit.

eg 1,3,7 will return 8 (no unset bits before the last set bit)
6,B,17 will return 7 (one unset bit before the last set bit)
etc
Right you are. Guess I shouldn't be writing code after midnight. :)

Here's a different version, which someone may have already posted:

Code: Select all

         lda #%11111111        ;worst-case test value
         ldx #0                ;nbits
;
loop01   lsr a                 ;test for a set bit
         bcc loop02            ;clear
;
         inx                   ;nbits=nbits+1
;
loop02   bne loop01            ;more bits to check
;
         brk                   ;.X = nbits
x86?  We ain't got no x86.  We don't NEED no stinking x86!
debounce
Posts: 27
Joined: 23 Nov 2004
Location: London, UK

Post by debounce »

bogax wrote:
I actually wrote this when this thread was current but I was never really happy with it.
In particular, having to stash one bit on the stack seems... inelegant.
So I'd pick it up and worry it a bit now and then but it's been two
years now, I don't think I'm going to improve it ;)
Don't be so hard on yourself bogax, it's neat code and a darn sight more elegant than not using the stack, viz:

Code: Select all

                        ; C ---A----
                        ; ? hgfedcba
        asl             ; h gfedcba.
        rol             ; g fedcba.h
        adc     #$80    ; f Fedcbapq   pq = h+g
        adc     #$00    ; . Fedcbapq   pq = h+g+f
        asl
        asl
        rol             ; d cbapq..e
        adc     #$80    ; c Cbapq.tu   tu = e+d
        and     #$7b    ; c .bapq.tu
        adc     #$c0    ; b BBapq.tu   tu = e+d+c
        and     #$3b    ; b ..apq.tu
        adc     #$e0    ; a AAApqstu  stu = e+d+c+b
        and     #$1f    ; a ...pqstu
        adc     #$00    ; . ...pqstu  stu = e+d+c+b+a
        cmp     #$10    ; p ...pqstu
        adc     #$f0    ; p PPPPqstu  stu = e+d+c+b+a+p
        and     #$0f    ; p ....qstu
        adc     #$f8    ; q QQQQQstu  stu = e+d+c+b+a+p+p
        and     #$07    ; q .....stu
        adc     #$00    ; . ....rstu rstu = e+d+c+b+a+p+p+q
--Greg
ElEctric_EyE
Posts: 3260
Joined: 02 Mar 2009
Location: OH, USA

Post by ElEctric_EyE »

Forgive my ignorance,but what is the point of a "CMP #$10" without a branch instruction afterwards?
Post Reply