6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Fri Apr 19, 2024 3:47 pm

All times are UTC




Post new topic Reply to topic  [ 21 posts ]  Go to page 1, 2  Next
Author Message
 Post subject: Bit Counting
PostPosted: Tue Aug 28, 2007 2:02 pm 
Offline
User avatar

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
Some code for the library:
Code:
;======================================================
; Bit Counting
;------------------------------------------------------
; This file contains various 6502 coded algorithms that
; count the number of one bits in a byte value.
;
; The different CPU performance and memory usage
; characteristics have been calculated for each method
; to show how speed and space trade offs.
;
; These examples have been developed for use with
; Michal Kowalski's 6502 emulator.
;
; If you can think of any other ways of calculating
; this or speeding up these routines please let me
; know.
;
; Andrew Jacobs
;======================================================
; Revision History:
; 2007-08-28 AJ First release.
;------------------------------------------------------

        .ORG $00

SCRATCH .DS  2                  ; Some scratch memory

        .ORG $8000
        .START $8000

;======================================================
; Bit Counting Via a Lookup Table
;------------------------------------------------------
; The fastest way of figuring out how many bits are set
; in a byte is to use a precalculated table and index
; into it using either X or Y.
;
; This method is so simple that you don't need to put
; the code in a subroutine for reuse.
;
; The only downside to this is that it needs a whole
; 256 byte page to hold the lookup table.
;
; Time: 6 cycles (+1 if data table crosses page)
; Size: 256 bytes (for data)

ByteAtATime:
        LDX #54                 ; Load value to lookup
        LDA ByteBitCounts,X     ; and get its bit count

        NOP                     ; A contains bit count

; We can add together the bit counts of several bytes
; to compute the total number of bits in larger values.

        LDA #$12                ; Set up an example
        STA SCRATCH+0           ; 16-bit value
        LDA #$34
        STA SCRATCH+1

        CLC                     ; Clear carry
        LDX SCRATCH+0           ; Fetch first data byte
        LDA ByteBitCounts,X     ; bit count
        LDX SCRATCH+1           ; And add in the next
        ADC ByteBitCounts,X

        NOP                     ; A has bit total count

        JMP HalfTheDataTable    ; Jump to next method

; Look up table of bit counts in the values $00-$FF

ByteBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8

;======================================================
; Bit Counting With A Smaller Lookup Table
;------------------------------------------------------
; We can half the size of the look up table by using
; the fact that the bit count for a value V in the
; range $80-$FF will be one more than that for V & $7F.
;
; So if we move bit 7 into the carry then look up the
; bit count for bits 6 downto 0 and add the carry back
; we get the right result.
;
; Time: 20 cycles (+1 if data table crosses page)
; Size: 142 bytes (13  for code, 128 for data)

HalfTheDataTable:
        LDA #$54                ; Load a test value
        JSR BitCountSeven       ; Calculate the count

        NOP                     ; Result is in A

        JMP NybbleAtATime       ; Jump to next method

;------------------------------------------------------

BitCountSeven:
        CLC                     ; Ensure bit 0 is safe
        ADC #$80                ; Put bit 7 into carry
        AND #$7F                ; Strip out bits 6-0
        TAX
        LDA SevenBitCounts,X    ; Fetch the bit count
        ADC #0                  ; Add in bit 7
        RTS

; Look up table of bit counts in the values $00-$7F

SevenBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7

;======================================================
; Bit Counting With An Even Smaller Lookup Table
;------------------------------------------------------
; If we are prepared to do a small amount of shifting
; then we can reduce the data table to just 16 bytes.
;
; This alogrithm breaks the value into two $0-$F nybble
; values and adds the bit counts for these together.
;
; Time: 39 cycles (+2 if data table crosses page)
; Size: 36 bytes (20  for code, 16 for data)

NybbleAtATime:
        LDA #$54                ; Load a test value
        JSR BitCountNybble      ; Calculate the count

        NOP                     ; Result is in A

        JMP BitAtATime          ; Jump to next method

;------------------------------------------------------

BitCountNybble:
        PHA                     ; Save a copy of value
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        AND #$0F
        TAY                     ; And save in Y
        PLA                     ; Recover value
        AND #$0F                ; Strip out lo nybble
        TAX                     ; And save in X
        CLC
        LDA NybbleBitCounts,X   ; Then add counts for
        ADC NybbleBitCounts,Y   ; each nybble
        RTS

; Look up table of bit counts in the values $00-$0F

NybbleBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4


;======================================================
; Bit Counting Using Shifts
;------------------------------------------------------
; The smallest method for count bits shifts the value
; left one bit at a time and counts each one bit that
; is shifted out.
;
; If we use ASL to perform the shift then the value
; will become zero after at most eight iterations. By
; looking for see when the value has reached zero we
; can stop iterating as soon as there are no bits left
; to shift out.
;
; Time: 17 to 91 cycles (depending on the value)
; Size: 12 bytes

BitAtATime:
        LDA #$54                ; Load a test value
        JSR BitCountIterative   ; Calculate the count

        NOP                     ; Result is in A

        JMP DivideAndConquer    ; Jump to next method

;------------------------------------------------------

BitCountIterative:
        LDX #0                  ; Clear bit count
BitShiftLoop:
        ASL                     ; Shift a bit
        BCC BitCountSkip        ; Did a one shift out?
        INX                     ; Add one to count
        ORA #0                  ; Retest for zero
BitCountSkip:
        BNE BitShiftLoop        ; Repeat till zero
        TXA                     ; Move count to A
        RTS

;======================================================
; Divide and Conquer
;------------------------------------------------------
; This alogrithm works by aligning and adding parts of
; the value until finally a bit count is produced.
;
; +--------------------------------
; | 0 + 1 | 1 + 0 | 0 + 1 | 1 + 1 |
; +---v-------v-------v-------v----
; | 0   1 + 0   1 | 0   1 + 1   0 |
; +-------v---------------v--------
; | 0   0   1   0 + 0   0   1   1 |
; +---------------v----------------
; | 0   0   0   0   0   1   0   1 |
; +--------------------------------
;
; This method always takes the same amount of time
; regardless of the value
;

; Time: 66 cycles
; Size: 37 bytes

DivideAndConquer:
        LDA #$54                ; Load a test value
        JSR BitCountParallel    ; Calculate the count

        NOP                     ; Result is in A

        BRK                     ; All done

;------------------------------------------------------

BitCountParallel:
        PHA
        AND #$55                ; Strip out odd bits
        STA SCRATCH
        PLA
        LSR                     ; Strip out even bits
        AND #$55
        CLC                     ; And add together
        ADC SCRATCH
        PHA
        AND #$33                ; Strip out odd pairs
        STA SCRATCH
        PLA
        LSR
        LSR
        AND #$33                ; Strip out even pairs
        CLC
        ADC SCRATCH             ; And add together
        STA SCRATCH
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        CLC
        ADC SCRATCH             ; Add to lo nybble
        AND #$0F                ; And prune to result
        RTS

        .END

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Top
 Profile  
Reply with quote  
 Post subject: Re: Bit Counting
PostPosted: Tue Aug 28, 2007 7:26 pm 
Offline

Joined: Wed Oct 22, 2003 4:07 am
Posts: 51
Location: Norway
BitWise wrote:
Code:
BitCountSeven:
        CLC                     ; Ensure bit 0 is safe
        ADC #$80                ; Put bit 7 into carry
        AND #$7F                ; Strip out bits 6-0
        TAX
        LDA SevenBitCounts,X    ; Fetch the bit count
        ADC #0                  ; Add in bit 7
faster/smaller:
Code:
BitCountSeven:
        CMP #$80                ; Put bit 7 into carry
        AND #$7F                ; Strip out bits 6-0
        TAX
        LDA SevenBitCounts,X    ; Fetch the bit count
        ADC #0                  ; Add in bit 7
even faster/smaller:
Code:
BitCountSeven:
        LSR                     ; Put bit 0 into carry, bit 7-1 in bit 6-0 and value 0 in bit 7
        TAX
        LDA SevenBitCounts,X    ; Fetch the bit count
        ADC #0                  ; Add in bit 0


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Tue Aug 28, 2007 7:58 pm 
Offline
User avatar

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
Thanks, I'll tweak that code.

I also noticed a couple of places where I could use TAX/TXA instead of PHA/PLA which saves a few cycles.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Tue Aug 28, 2007 11:42 pm 
Offline
User avatar

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
In BitCountNybble, the four LSRs shift zeros into bits 7-3, so the subsequent AND #$0F can be deleted.

BitCountIterative will shorter space-wise (and faster for most cases) with:

Code:
BitCountIterative:
        LDX #$FF                ; Reset bit count
BitCountIncrement:
        INX                     ; Add one to count
BitShiftLoop:
        ASL                     ; Shift a bit and test for zero
        BCS BitCountIncrement   ; Did a one shift out?
BitCountSkip:
        BNE BitShiftLoop        ; Repeat till zero
        TXA                     ; Move count to A
        RTS


In BitCountParallel, if you replace:

Code:
        LSR                     ; Strip out even bits
        AND #$55
        CLC                     ; And add together


with:

Code:
        AND #$AA                ; Strip out even bits
        LSR


then a zero will be shifted into the carry and the subsequent CLC is unneccessary. You can use an AND #$CC followed by two LSRs to eliminate a CLC later on as well.

Also, the final CLC is unnecessary since bit 3 of the accumulator (which will be shifted into the carry by the four LSRs) will always be zero. (Test it for all 256 cases if you don't believe me.)

Finally, the following is only 33 bytes, uses no tables, no temporary RAM variables, and no registers (other than A, of course). (It doesn't run in constant time, though.)

Code:
   LSR
   BCC L1
   ADC #$3F
L1 LSR
   BCC L2
   ADC #$1F
L2 LSR
   BCC L3
   ADC #$0F
L3 LSR
   BCC L4
   ADC #$07
L4 LSR
   BCC L5
   ADC #$03
L5 LSR
   BCC L6
   ADC #$01
L6 LSR
   ADC #$00


Top
 Profile  
Reply with quote  
 Post subject: Re: Bit Counting
PostPosted: Wed Aug 29, 2007 10:37 am 
Offline

Joined: Wed Oct 22, 2003 4:07 am
Posts: 51
Location: Norway
BitWise wrote:
Code:
BitCountNybble:
        PHA                     ; Save a copy of value
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        AND #$0F
        TAY                     ; And save in Y
        PLA                     ; Recover value
        AND #$0F                ; Strip out lo nybble
        TAX                     ; And save in X
        CLC
        LDA NybbleBitCounts,X   ; Then add counts for
        ADC NybbleBitCounts,Y   ; each nybble
Went back to look at this thread today, and couldn't help myself:
Code:
BitCountNybble:
        TAX                     ; Save a copy of value
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        TAY                     ; And save in Y
        TXA                     ; Recover value
        AND #$0F                ; Strip out lo nybble
        TAX                     ; And save in X
        CLC
        LDA NybbleBitCounts,X   ; Then add counts for
        ADC NybbleBitCounts,Y   ; each nybble
PHA/PLA can be replaced, like you said. The first AND can also be removed because the 4 LSRs shift in 4 zeroes. You can save 2 more cycles like this:
Code:
BitCountNybble:
        TAX                     ; Save a copy of value
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        TAY                     ; And save in Y
        TXA                     ; Recover value
        AND #$07                ; Strip out bit 2-0, bit 3 is already in carry.
        TAX                     ; And save in X
        LDA NybbleBitCounts,X   ; Then add counts for
        ADC NybbleBitCounts,Y   ; each nybble
I think that should be correct. But it is less clear...

and dclxvi, that code at the end of your post actually made me smile, nice :)


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Aug 29, 2007 11:45 pm 
Offline
User avatar

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
Heres the revised code.
Code:
;======================================================
; Bit Counting
;------------------------------------------------------
; This file contains various 6502 coded algorithms that
; count the number of one bits in a byte value.
;
; The different CPU performance and memory usage
; characteristics have been calculated for each method
; to show the speed and space trade offs.
;
; If you have the space to spare than the 256 byte
; lookup table is the fastest otherwise the in place
; shift/add method is a good allrounder.
;
; These examples have been developed for use with
; Michal Kowalski's 6502 emulator.
;
; If you can think of any other ways of calculating
; this or speeding up these routines please let me
; know.
;
; Compiled by Andrew Jacobs (BitWise) with additions
; from members of the 6502.org forum.
;
; Thanks to Thowllly & dclxvi for code speed ups and
; dclxvi for the in place shift/add method.
;======================================================
; Revision History:
; 2007-08-28 AJ First release.
; 2007-08-30 AJ Included forum suggestions.
;------------------------------------------------------

        .ORG $00

SCRATCH .DS  2                  ; Some scratch memory

        .ORG $8000
        .START $8000

        JMP ByteAtATime         ; Jump to first example

;======================================================
; Bit Counting Via a Lookup Table
;------------------------------------------------------
; The fastest way of figuring out how many bits are set
; in a byte is to use a precalculated table and index
; into it using either X or Y.
;
; This method is so simple that you don't need to put
; the code in a subroutine for reuse.
;
; The only downside to this is that it needs a whole
; 256 byte page to hold the lookup table.
;
; Time: 6 cycles (+1 if data table crosses page)
; Size: 256 bytes (for data)

; Look up table of bit counts in the values $00-$FF

ByteBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
        .BYTE 4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8

;------------------------------------------------------

ByteAtATime:
        LDX #$54                ; Load value to lookup
        LDA ByteBitCounts,X     ; and get its bit count

        NOP                     ; A contains bit count

; We can add together the bit counts of several bytes
; to compute the total number of bits in larger values.

        LDA #$12                ; Set up an example
        STA SCRATCH+0           ; 16-bit value
        LDA #$34
        STA SCRATCH+1

        CLC                     ; Clear carry
        LDX SCRATCH+0           ; Fetch first data byte
        LDA ByteBitCounts,X     ; bit count
        LDX SCRATCH+1           ; And add in the next
        ADC ByteBitCounts,X

        NOP                     ; A has bit total count

        JMP HalfTheDataTable    ; Jump to next method

;======================================================
; Bit Counting With A Smaller Lookup Table
;------------------------------------------------------
; We can half the size of the look up table by using
; the fact that the bit count for a value V in the
; range $80-$FF will be one more than that for V & $7F.
;
; Infact it doesn't matter which bit we use providing
; we are left with an easy to access 7 bits afterwards.
;
; So if we move bit 0 into the carry using LSR and then
; look up the bit count for bits 6 downto 0 and add the
; carry back we get the right result.
;
; Time: 16 cycles (+1 if data table crosses page)
; Size: 136 bytes (8  for code, 128 for data)

BitCountSeven:
        LSR                     ; Put bit 0 into carry
        TAX
        LDA SevenBitCounts,X    ; Fetch the bit count
        ADC #0                  ; Add in bit 7
        RTS

; Look up table of bit counts in the values $00-$7F

SevenBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
        .BYTE 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7

;------------------------------------------------------

HalfTheDataTable:
        LDA #$54                ; Load a test value
        JSR BitCountSeven       ; Calculate the count

        NOP                     ; Result is in A

        JMP NybbleAtATime       ; Jump to next method

;======================================================
; Bit Counting With An Even Smaller Lookup Table
;------------------------------------------------------
; If we are prepared to do a small amount of shifting
; then we can reduce the data table to 16 bytes.
;
; This algorithm breaks the value into three parts. The
; hi nybble is shifted down and placed in Y. The shifts
; leave bit 3 of the value in the carry so only bits
; 2 to 0 need to be extracted into X. The bit counts
; for the X & Y values are looked up and added which
; combines them with the bit in C.
;
; Time: 31 cycles (+2 if data table crosses page)
; Size: 33 bytes (17  for code, 16 for data)

BitCountNybble:
        TAX                     ; Save a copy of value
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR                     ; Leave <3> in C
        TAY                     ; And save <7:4> in Y
        TXA                     ; Recover value
        AND #$07                ; Put out <2:0> in X
        TAX                     ; And save in X
        LDA NybbleBitCounts,Y   ; Fetch count for Y
        ADC NybbleBitCounts,X   ; Add count for X & C
        RTS

; Look up table of bit counts in the values $00-$0F

NybbleBitCounts:
        .BYTE 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4

;------------------------------------------------------

NybbleAtATime:
        LDA #$54                ; Load a test value
        JSR BitCountNybble      ; Calculate the count

        NOP                     ; Result is in A

        JMP BitAtATime          ; Jump to next method

;======================================================
; Bit Counting Using Shifts
;------------------------------------------------------
; The smallest method for count bits shifts the value
; left one bit at a time and counts each one bit that
; is shifted out.
;
; If we use ASL to perform the shift then the value
; will become zero after at most eight iterations. By
; looking for see when the value has reached zero we
; can stop iterating as soon as there are no bits left
; to shift out.
;
; Time: 18 to 76 cycles (depending on the value)
; Size: 10 bytes

BitCountIterative:
        LDX #$FF                ; Set count to -1
.Incr:  INX                     ; Add one to count
.Loop:  ASL                     ; Shift by one bit
        BCS .Incr               ; Count one bits
        BNE .Loop               ; Repeat till zero
        TXA                     ; Move count to A
        RTS

;------------------------------------------------------

BitAtATime:
        LDA #$54                ; Load a test value
        JSR BitCountIterative   ; Calculate the count

        NOP                     ; Result is in A

        JMP ShiftAndAdd         ; Jump to next method

;======================================================
; Shift and Add
;------------------------------------------------------
; This method uses the most significant bits of the
; accumulator to keep track of the bit count as the
; remaining data value bits are shifted out of least
; significant position.
;
; The follow illustrates the intermediate states of the
; calculation for the sample $54 value. The : shows the
; division between the running count and the remaining
; uncounted bits.
;
; +-------------------------------+
; | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |  Initial value
; +-------------------------------+
; | 0 | 0 : 1 | 0 | 1 | 0 | 1 | 0 |  C = 0 skip add
; +-------------------------------+
; | 0 | 0 | 0 : 1 | 0 | 1 | 0 | 1 |  C = 0 skip add
; +-------------------------------+
; | 0 | 0 | 0 | 0 : 1 | 0 | 1 | 0 |  C = 1
; | 0 | 0 | 0 | 1 : 1 | 0 | 1 | 0 |  Add $0F+C ($10)
; +-------------------------------+
; | 0 | 0 | 0 | 0 | 1 : 1 | 0 | 1 |  C = 0 skip add
; +-------------------------------+
; | 0 | 0 | 0 | 0 | 0 | 1 : 1 | 0 |  C = 1
; | 0 | 0 | 0 | 0 | 1 | 0 : 1 | 0 |  Add $03+1 ($04)
; +-------------------------------+
; | 0 | 0 | 0 | 0 | 0 | 1 | 0 : 1 |  C = 0 skip add
; +-------------------------------+
; | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 :  C = 1
; +-------------------------------+
; | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |  Add $00+C ($01)
; +-------------------------------+
;
; Time: 40 to 46 cycles (depending on the value)
; Size: 34 bytes

BitCountShiftAdd:
        LSR                     ; Is bit 0 set?
        BCC .Skip0
        ADC #$3F                ; Yes, add $40
.Skip0  LSR                     ; Is bit 1 set?
        BCC .Skip1
        ADC #$1F                ; Yes, add $20
.Skip1  LSR                     ; Is bit 2 set?
        BCC .Skip2
        ADC #$0F                ; Yes, add $10
.Skip2  LSR                     ; Is bit 3 set?
        BCC .Skip3
        ADC #$07                ; Yes, add $08
.Skip3  LSR                     ; Is bit 4 set?
        BCC .Skip4
        ADC #$03                ; Yes, add $04
.Skip4  LSR                     ; Is bit 5 set?
        BCC .Skip5
        ADC #$01                ; Yes, add $02
.Skip5  LSR
        ADC #$00                ; Add bit 6 to count
        RTS

;------------------------------------------------------

ShiftAndAdd:
        LDA #$54                ; Load a test value
        JSR BitCountShiftAdd    ; Calculate the count

        NOP                     ; Result is in A

        JMP DivideAndConquer    ; Jump to next method

;======================================================
; Divide and Conquer
;------------------------------------------------------
; This algorithm works by aligning and adding parts of
; the value until finally the bit count is produced.
;
; +--------------------------------
; | 0 + 1 | 1 + 0 | 0 + 1 | 1 + 1 |
; +---v-------v-------v-------v----
; | 0   1 + 0   1 | 0   1 + 1   0 |
; +-------v---------------v--------
; | 0   0   1   0 + 0   0   1   1 |
; +---------------v----------------
; | 0   0   0   0   0   1   0   1 |
; +--------------------------------
;
; This method always takes the same amount of time
; regardless of the value
;
; Time: 50 cycles
; Size: 34 bytes

BitCountParallel:
        TAX
        AND #$55                ; Strip out odd bits
        STA SCRATCH
        TXA
        AND #$AA                ; Strip out even bits
        LSR
        ADC SCRATCH             ; And add together
        TAX
        AND #$33                ; Strip out odd pairs
        STA SCRATCH
        TXA
        AND #$CC                ; Strip out even pairs
        LSR
        LSR
        ADC SCRATCH             ; And add together
        STA SCRATCH
        LSR                     ; Shift down hi nybble
        LSR
        LSR
        LSR
        ADC SCRATCH             ; Add to lo nybble
        AND #$0F                ; And prune to result
        RTS

;------------------------------------------------------

DivideAndConquer:
        LDA #$54                ; Load a test value
        JSR BitCountParallel    ; Calculate the count

        NOP                     ; Result is in A

        BRK                     ; All done

        .END

I think that includes all the comments. Cycle counts where done by hand so might be a bit out.

Edit: Fixed typos.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Tue Oct 23, 2007 9:56 pm 
Offline

Joined: Tue Nov 23, 2004 2:11 pm
Posts: 25
Location: London, UK
In BitCountParallel, you could replace the beginning
Code:
BitCountParallel:
        TAX
        AND #$55                ; Strip out odd bits
        STA SCRATCH
        TXA
        AND #$AA                ; Strip out even bits
        LSR
        ADC SCRATCH             ; And add together
with
Code:
BitCountParallel:
        STA SCRATCH
        AND #$AA                ; get fudge factor for each pair
        LSR                     ; divide by 2 (c=0)
        SBC SCRATCH             ; oops, no reverse-subtract!
        EOR #$FF                ; take one's complement

saving 2 bytes. Inspired by some code 'bogax' gave to help me!


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Oct 31, 2007 11:20 pm 
Offline
User avatar

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
Here's another way to count bits:

Code:
   TAX
   BEQ L2
   LDX #0
   SEC
L1 INX
   STA SCRATCH
   SBC #1
   AND SCRATCH
   BNE L1
   TXA
L2 RTS

The result is returned in A (and X too).

On a 65C02, the SBC #1 can be replaced with DEC A, and the SEC can be eliminated.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Sat Nov 21, 2009 10:03 pm 
Offline

Joined: Tue Nov 18, 2003 8:41 pm
Posts: 250
Guess I'm a bit late to this party :)

Code:
 asl
 php
 asl
 rol
 adc #$80
 and #$7F
 adc #$C0
 and #$3F
 adc #$E0
 and #$1F
 adc #$F0
 and #$0F
 adc #$F8
 and #$07
 adc #$00
 plp
 adc #$00


It's not particularly short or fast but it does have the virtues
of running in constant time and only messing with the accumulator.


I actually wrote this when this thread was current but I was never really happy with it.
In particular, having to stash one bit on the stack seems... inelegant.
So I'd pick it up and worry it a bit now and then but it's been two
years now, I don't think I'm going to improve it ;)


Top
 Profile  
Reply with quote  
 Post subject: Small Code is Good Code
PostPosted: Sun Nov 22, 2009 3:33 am 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8138
Location: Midwestern USA
How about this one:

Code:
         lda #%00000000        ;worst-case test value
         ldx #9                ;bits in a byte +1
;
loop01   dex                   ;nbits=nbits-1
         beq done              ;out of bits
;
loop02   lsr a                 ;test for a set bit
         bcc loop01            ;clear, decrement count
;
         bne loop02            ;more bits to check
;
done     brk                   ;.X = nbits

Running the above with the worst-case test value requires 81 clock cycles. The best case, all bits set, requires 63 clocks. Naturally, it'll run a tad slower if a page boundary gets crossed by a branch, in which case, best case will be 70 clocks and worst case, 90 clocks.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Sun Nov 22, 2009 4:34 am 
Offline

Joined: Fri Jun 27, 2003 8:12 am
Posts: 618
Location: Meadowbrook
bogax wrote:
Guess I'm a bit late to this party :)

Code:
 asl
 php
 asl
 rol
 adc #$80
 and #$7F
 adc #$C0
 and #$3F
 adc #$E0
 and #$1F
 adc #$F0
 and #$0F
 adc #$F8
 and #$07
 adc #$00
 plp
 adc #$00


It's not particularly short or fast but it does have the virtues
of running in constant time and only messing with the accumulator.


I actually wrote this when this thread was current but I was never really happy with it.
In particular, having to stash one bit on the stack seems... inelegant.
So I'd pick it up and worry it a bit now and then but it's been two
years now, I don't think I'm going to improve it ;)



Your code would actually work out quite well for later processors. A friend has a programming blog for her software library, and it mentioned about branching, so your method without using any branches proves interesting.

http://burgerbecky.livejournal.com/tag/code

_________________
"My biggest dream in life? Building black plywood Habitrails"


Top
 Profile  
Reply with quote  
PostPosted: Sun Nov 22, 2009 6:53 am 
Offline

Joined: Tue Nov 18, 2003 8:41 pm
Posts: 250
BigDumbDinosaur wrote:
How about this one:

Code:
         lda #%00000000        ;worst-case test value
         ldx #9                ;bits in a byte +1
;
loop01   dex                   ;nbits=nbits-1
         beq done              ;out of bits
;
loop02   lsr a                 ;test for a set bit
         bcc loop01            ;clear, decrement count
;
         bne loop02            ;more bits to check
;
done     brk                   ;.X = nbits



You'll fall through the bne and miss counting down on any
remaining unset bits after the last set bit.

eg 1,3,7 will return 8 (no unset bits before the last set bit)
6,B,17 will return 7 (one unset bit before the last set bit)
etc


Top
 Profile  
Reply with quote  
PostPosted: Sun Nov 22, 2009 9:57 pm 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8138
Location: Midwestern USA
bogax wrote:
BigDumbDinosaur wrote:
How about this one:

Code:
         lda #%00000000        ;worst-case test value
         ldx #9                ;bits in a byte +1
;
loop01   dex                   ;nbits=nbits-1
         beq done              ;out of bits
;
loop02   lsr a                 ;test for a set bit
         bcc loop01            ;clear, decrement count
;
         bne loop02            ;more bits to check
;
done     brk                   ;.X = nbits



You'll fall through the bne and miss counting down on any
remaining unset bits after the last set bit.

eg 1,3,7 will return 8 (no unset bits before the last set bit)
6,B,17 will return 7 (one unset bit before the last set bit)
etc

Right you are. Guess I shouldn't be writing code after midnight. :)

Here's a different version, which someone may have already posted:

Code:
         lda #%11111111        ;worst-case test value
         ldx #0                ;nbits
;
loop01   lsr a                 ;test for a set bit
         bcc loop02            ;clear
;
         inx                   ;nbits=nbits+1
;
loop02   bne loop01            ;more bits to check
;
         brk                   ;.X = nbits

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Nov 23, 2009 12:10 am 
Offline

Joined: Tue Nov 23, 2004 2:11 pm
Posts: 25
Location: London, UK
bogax wrote:
I actually wrote this when this thread was current but I was never really happy with it.
In particular, having to stash one bit on the stack seems... inelegant.
So I'd pick it up and worry it a bit now and then but it's been two
years now, I don't think I'm going to improve it ;)
Don't be so hard on yourself bogax, it's neat code and a darn sight more elegant than not using the stack, viz:
Code:
                        ; C ---A----
                        ; ? hgfedcba
        asl             ; h gfedcba.
        rol             ; g fedcba.h
        adc     #$80    ; f Fedcbapq   pq = h+g
        adc     #$00    ; . Fedcbapq   pq = h+g+f
        asl
        asl
        rol             ; d cbapq..e
        adc     #$80    ; c Cbapq.tu   tu = e+d
        and     #$7b    ; c .bapq.tu
        adc     #$c0    ; b BBapq.tu   tu = e+d+c
        and     #$3b    ; b ..apq.tu
        adc     #$e0    ; a AAApqstu  stu = e+d+c+b
        and     #$1f    ; a ...pqstu
        adc     #$00    ; . ...pqstu  stu = e+d+c+b+a
        cmp     #$10    ; p ...pqstu
        adc     #$f0    ; p PPPPqstu  stu = e+d+c+b+a+p
        and     #$0f    ; p ....qstu
        adc     #$f8    ; q QQQQQstu  stu = e+d+c+b+a+p+p
        and     #$07    ; q .....stu
        adc     #$00    ; . ....rstu rstu = e+d+c+b+a+p+p+q
--Greg


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Nov 23, 2009 1:08 am 
Offline

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Forgive my ignorance,but what is the point of a "CMP #$10" without a branch instruction afterwards?

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 21 posts ]  Go to page 1, 2  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 2 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: