Page 1 of 3
Old Guy with Special Case Need to Divide by Three
Posted: Sat Feb 24, 2024 10:38 pm
by WynnSmith
I love finding this forum. It's been 45 years since I was deep into the 6502. Now I'm out of touch with standard nomenclature and current tools. Your spit and polish is appreciated.
I have a special case need for an algorithm that quickly divides by three. The largest dividend we'll face is 765. It's 10 bits, and when divided by 3, gives us the largest quotient we'll face of 255, which is 8 bits.
To divide by 3, we can multiply by 1/3.
In binary, 1/3 is 0.01010101..., repeating forever. In spite of not having an exact quantity for 1/3, we can use the 8 bits after the decimal to accurately calculate an 8 bit quotient from our ten bit dividend. Since the 6502 lacks a multiply operation, we can use the add operation as indicated by the four bits of 0.01010101 that are set high.
I'm using the 6502's zero page to serve as registers. To accumulate a sum, we'll repeatedly shift the bits and add. Can you suggest a better way? How does my nomenclature need spiffed up? Any comment is welcome.
Code: Select all
;-----------------------------------------------------------------
; The dividend is stored in 00 and 01 before calling this routine.
; The resulting quotient will be accumulated in 03.
; Largest supported dividend is 765. (add error checking?)
DIVLO = $00 ; Zero page locations of 10 bit dividend
DIVHI = $01
QUOLO = $02 ; low bits of accumulated sum
QUOHI = $03 ; high bits of accumulated sum (Quotient)
DIVIDE_BY_THREE:
; First, effectively multiply by 0.0000 0001
lda DIVLO ; low bits of dividend
sta QUOLO
lda DIVHI ; high bits of dividend
; shift 6 times, add 3 times
ldx #3 ; so, we'll loop 3 times
; Then, effectively multiply by 0.0000 0101
; then by 0.0001 0101
; then by 0.0101 0101, which is 1/3
LOOP:
sta QUOHI
asl QUOLO ; first, shift left twice
rol QUOHI
asl QUOLO
rol QUOHI
clc ; Then, add the dividend
lda QUOLO
adc DIVLO
sta QUOLO
lda QUOHI
adc DIVHI
dex
bne LOOP
FINISH:
sec
adc #0 ; + 1 (round up)
sta QUOHI ; The resulting Quotient is in the Accumulator
rts
Thank you.
Re: Old Guy with Special Case Need to Divide by Three
Posted: Sat Feb 24, 2024 11:09 pm
by Proxy
honestly with how small the value range is (10 bit input, 8 bit output), i'd just use a table. only takes 1kB plus a function like this to fetch the value (assuming a 6502/C02):
Code: Select all
; Inputs in Y (Low Byte) and A (High Byte)
; Output in A
; Uses 1 Word in ZP called "tmp" and destroys X
div3:
AND #%00000011 ; Truncate the High Byte to the lowest 2 bits (10 bits in total)
TAX ; Store it into X for now
LDA #<table
STA z:tmp ; Store the Low Byte of the table Address in the Low Byte of tmp
TXA
CLC
ADC #>table ; Then Add the upper 2 bits of the value to the High Byte of the table Address
STA z:tmp+1 ; And store the result into the High Byte of tmp
LDA (tmp),Y ; Finally use the complete Address to fetch the correct division result from the table
RTS
of course on a 65816 you can just use an inline function/macro:
Code: Select all
.macro DIV3 value
.A8
.I16 ; Assuming 8-bit Accumulator and 16-bit Index Registers
LDX value ; This works with Immediate values and memory locations without any conditional assembly
LDA f:table,X
.endmacro
Re: Old Guy with Special Case Need to Divide by Three
Posted: Sun Feb 25, 2024 12:51 am
by repose
https://www.nesdev.org/wiki/Divide_by_3
Code: Select all
sta temp
lsr
lsr
adc temp
ror
lsr
adc temp
ror
lsr
adc temp
ror
lsr
adc temp
ror
lsr
rts
Re: Old Guy with Special Case Need to Divide by Three
Posted: Sun Feb 25, 2024 4:34 am
by BillG
I'm using the 6502's zero page to serve as registers. To accumulate a sum, we'll repeatedly shift the bits and add. Can you suggest a better way? How does my nomenclature need spiffed up? Any comment is welcome.
Your algorithm yields 64 / 3 = 22 instead of 21.
You did not state whether speed or size is more important and where along that tradeoff curve you want.
I implemented a different approach and will post it after I verify its results.
Re: Old Guy with Special Case Need to Divide by Three
Posted: Sun Feb 25, 2024 4:35 am
by BillG
Code: Select all
sta temp
lsr
lsr
adc temp
ror
lsr
adc temp
ror
lsr
adc temp
ror
lsr
adc temp
ror
lsr
rts
This code is only for 8-bit dividends which 765 is not...
Re: Old Guy with Special Case Need to Divide by Three
Posted: Sun Feb 25, 2024 4:57 am
by repose
oh right, sorry...
Re: Old Guy with Special Case Need to Divide by Three
Posted: Sun Feb 25, 2024 5:04 am
by BigDumbDinosaur
honestly with how small the value range is (10 bit input, 8 bit output), i'd just use a table.
I second that. These days, memory is less valuable than time, so I’d go with the table approach for such a specific requirement.
Re: Old Guy with Special Case Need to Divide by Three
Posted: Sun Feb 25, 2024 5:25 am
by BillG
honestly with how small the value range is (10 bit input, 8 bit output), i'd just use a table.
I second that. These days, memory is less valuable than time, so I’d go with the table approach for such a specific requirement.
Depending upon the application, a 1 KB table may be huge.
Re: Old Guy with Special Case Need to Divide by Three
Posted: Sun Feb 25, 2024 6:37 am
by barnacle
Though the table need be no larger than 765, since that's the range of the input. Which may help.
Neil
Re: Old Guy with Special Case Need to Divide by Three
Posted: Sun Feb 25, 2024 6:41 am
by BillG
Code: Select all
00001 lib FAKEFLEX.LIB
. 00002 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
. 00003 ;
. 00004 ; Fake FLEX environment for debugging in emulator
. 00005 ;
.0000 00481 .list
00482
0002 00483 org ZeroPage
00484
00485 ;-----------------------------------------------------------------
00486 ; The dividend is stored in 00 and 01 before calling this routine.
00487 ; The resulting quotient will be accumulated in 03.
00488 ; Largest supported dividend is 765. (add error checking?)
00489
0002 00490 DIVLO rmb 1 ; Zero page locations of 10 bit dividend
0003 00491 DIVHI rmb 1
0004 00492 QUOLO rmb 1 ; low bits of accumulated sum
0005 00493 QUOHI rmb 1 ; high bits of accumulated sum (Quotient)
00494
0006 00495 _DIVLO rmb 1 ; Zero page locations of 10 bit dividend
0007 00496 _DIVHI rmb 1
0008 00497 _QUOLO rmb 1 ; low bits of accumulated sum
0009 00498 _QUOHI rmb 1 ; high bits of accumulated sum (Quotient)
00499
000A 00500 lo_temp rmb 1
000B 00501 hi_temp rmb 1
000C 00502 lo_product rmb 1
000D 00503 Quotient rmb 1
00504
000E 00505 Ptr rmb 2
0010 00506 Number rmb 2
00507
2000 00508 org FreeRAM
00509
2000 00510 DIVIDE_BY_THREE:
00511
00512 ; First, effectively multiply by 0.0000 0001
2000 A5 02 [3] 00513 lda DIVLO ; low bits of dividend
2002 85 04 [3] 00514 sta QUOLO
2004 A5 03 [3] 00515 lda DIVHI ; high bits of dividend
00516
00517 ; shift 6 times, add 3 times
2006 A2 03 [2] 00518 ldx #3 ; so, we'll loop 3 times
00519
00520 ; Then, effectively multiply by 0.0000 0101
00521 ; then by 0.0001 0101
00522 ; then by 0.0101 0101, which is 1/3
00523
2008 00524 LOOP:
00525
2008 85 05 [3] 00526 sta QUOHI
00527
200A 06 04 [5] 00528 asl QUOLO ; first, shift left twice
200C 26 05 [5] 00529 rol QUOHI
200E 06 04 [5] 00530 asl QUOLO
2010 26 05 [5] 00531 rol QUOHI
2012 18 [2] 00532 clc ; Then, add the dividend
2013 A5 04 [3] 00533 lda QUOLO
2015 65 02 [3] 00534 adc DIVLO
2017 85 04 [3] 00535 sta QUOLO
2019 A5 05 [3] 00536 lda QUOHI
201B 65 03 [3] 00537 adc DIVHI
00538
201D CA [2] 00539 dex
201E D0 E8 (2008) [2/3] 00540 bne LOOP
00541
2020 00542 FINISH:
00543
2020 38 [2] 00544 sec
2021 69 00 [2] 00545 adc #0 ; + 1 (round up)
2023 85 05 [3] 00546 sta QUOHI ; The resulting Quotient is in the Accumulator
00547
2025 60 [6] 00548 rts
00549
2100 00550 org FreeRAM+$100
00551
00552 ; divide 16 bit number by 3 by multiplying by 1/3
00553 ; enter with
00554 ; A containing the hi byte of the number to be divided by 3
00555 ; Y containing the lo byte of the number to be divided by 3
00556 ; the hi byte of the partial product is kept in A or saved
00557 ; on the stack when neccessary
00558 ; the product (N/3 quotient) is returned hi byte in A,
00559 ; lo byte in Y
00560
2100 00561 div3_ay:
00562
00563 ; save the number in lo_temp, hi_temp
00564
2100 84 0A [3] 00565 sty lo_temp
2102 84 0C [3] 00566 sty lo_product
2104 85 0B [3] 00567 sta hi_temp
00568
2106 A0 09 [2] 00569 ldy #$09
2108 18 [2] 00570 clc
2109 90 0A (2115) [2/3] 00571 bcc ENTER
00572
00573 ; each pass through loop adds the number in
00574 ; lo_temp, hi_temp to the partial product and
00575 ; then divides the partial product by 4
00576
210B 00577 _LOOP:
210B 48 [3] 00578 pha
210C A5 0C [3] 00579 lda lo_product
210E 65 0A [3] 00580 adc lo_temp
2110 85 0C [3] 00581 sta lo_product
2112 68 [4] 00582 pla
2113 65 0B [3] 00583 adc hi_temp
2115 00584 ENTER:
2115 6A [2] 00585 ror A
2116 66 0C [5] 00586 ror lo_product
2118 4A [2] 00587 lsr A
2119 66 0C [5] 00588 ror lo_product
211B 88 [2] 00589 dey
211C D0 ED (210B) [2/3] 00590 bne _LOOP
211E A4 0C [3] 00591 ldy lo_product
2120 60 [6] 00592 rts
00593
2200 00594 org FreeRAM+$200
00595
00596 ;
00597 ; Observation: 765/3 fits within one byte!
00598 ;
00599 ;
00600 ; 1/3 = 1/4 + 1/12
00601 ;
00602 ; Look at the problem this way:
00603 ;
00604 ; 1 = 1/4 + 1/4 + 1/4 + 1/12 + 1/12 + 1/12
00605 ;
00606 ; Step 1:
00607 ;
00608 ; Divide by 4 using two right shifts. Store quotient.
00609 ;
00610 ; Step 2:
00611 ;
00612 ; Add quotient to the remainder. Divide this by 4.
00613 ;
00614 ; Step 3:
00615 ;
00616 ; Add quotient to sum of partial quotients
00617 ;
00618 ; Step 4:
00619 ;
00620 ; Repeat steps 2 and 3, summing quotients, until partial quotient is 0 and
00621 ; remainder < 3
00622 ;
00623 ; Step 5:
00624 ;
00625 ; If the remainder is 3, add 1 to the quotient
00626 ;
2200 00627 Mine:
2200 A5 06 [3] 00628 lda _DIVLO ; Calculate remainder of division by 4
2202 29 03 [2] 00629 and #3
2204 A8 [2] 00630 tay
00631
2205 A5 06 [3] 00632 lda _DIVLO ; Divide by 4
2207 46 07 [5] 00633 lsr _DIVHI ; This uses up the upper byte
2209 6A [2] 00634 ror A
220A 46 07 [5] 00635 lsr _DIVHI
220C 6A [2] 00636 ror A
00637
220D 85 06 [3] 00638 sta _DIVLO ; Store lower byte of number/4
220F 85 09 [3] 00639 sta _QUOHI ; And partial quotient
2211 F0 1D (2230) [2/3] 00640 beq ZeroDiv ; Early out if zero
00641
2213 98 [2] 00642 tya ; Add partial quotient to remainder
2214 18 [2] 00643 clc
2215 65 06 [3] 00644 adc _DIVLO
2217 85 06 [3] 00645 sta _DIVLO ; This is the new dividend
00646
2219 00647 MyLoop:
2219 29 03 [2] 00648 and #3 ; Calculate remainder
221B A8 [2] 00649 tay
00650
221C A5 06 [3] 00651 lda _DIVLO ; Divide by 4
221E 4A [2] 00652 lsr A
221F 4A [2] 00653 lsr A
2220 85 06 [3] 00654 sta _DIVLO ; Store partial quotient
2222 F0 0C (2230) [2/3] 00655 beq ZeroDiv ; Dividend has become zero
00656
2224 18 [2] 00657 clc ; Add to sum of partial quotients
2225 65 09 [3] 00658 adc _QUOHI
2227 85 09 [3] 00659 sta _QUOHI
00660
2229 98 [2] 00661 tya ; Add partial quotient to remainder
00662 ; clc ; Carry still clear from above adc
222A 65 06 [3] 00663 adc _DIVLO
222C 85 06 [3] 00664 sta _DIVLO ; This is the new dividend
00665
222E D0 E9 (2219) [2/3] 00666 bne MyLoop ; Always branches
00667
2230 00668 ZeroDiv:
2230 C0 03 [2] 00669 cpy #3
2232 90 02 (2236) [2/3] 00670 blo ImDone
00671
2234 E6 09 [5] 00672 inc _QUOHI ; Add one third of three remaining
00673
2236 00674 ImDone:
2236 60 [6] 00675 rts
00676
2300 00677 org FreeRAM+$300
00678
00679 ;
00680 ; Test driver
00681 ;
2300 00682 Start:
2300 A9 00 [2] 00683 lda #>64
2302 A2 40 [2] 00684 ldx #<64
00685
2304 85 03 [3] 00686 sta DIVHI
2306 85 07 [3] 00687 sta _DIVHI
00688
2308 86 02 [3] 00689 stx DIVLO
230A 86 06 [3] 00690 stx _DIVLO
00691
230C 85 11 [3] 00692 sta Number+1
230E 86 10 [3] 00693 stx Number
00694
2310 20 2000 [6] 00695 jsr DIVIDE_BY_THREE
00696
2313 A5 07 [3] 00697 lda _DIVHI
2315 A4 06 [3] 00698 ldy _DIVLO
2317 20 2100 [6] 00699 jsr div3_ay
231A 84 0D [3] 00700 sty Quotient
00701
231C 20 2200 [6] 00702 jsr Mine
00703
231F A2 10 [2] 00704 ldx #<Number
2321 A0 00 [2] 00705 ldy #>Number
2323 A9 01 [2] 00706 lda #1
2325 20 02DB [6] 00707 jsr OUTDEC
00708
2328 A2 81 [2] 00709 ldx #<DividedBy
232A A0 23 [2] 00710 ldy #>DividedBy
232C 20 236E [6] 00711 jsr PSTR
00712
232F A9 00 [2] 00713 lda #0
2331 85 11 [3] 00714 sta Number+1
00715
2333 A5 05 [3] 00716 lda QUOHI
2335 85 10 [3] 00717 sta Number
2337 A2 10 [2] 00718 ldx #<Number
2339 A0 00 [2] 00719 ldy #>Number
233B A9 01 [2] 00720 lda #1
233D 20 02DB [6] 00721 jsr OUTDEC
00722
2340 A2 95 [2] 00723 ldx #<NesDev
2342 A0 23 [2] 00724 ldy #>NesDev
2344 20 236E [6] 00725 jsr PSTR
00726
2347 A5 0D [3] 00727 lda Quotient
2349 85 10 [3] 00728 sta Number
234B A2 10 [2] 00729 ldx #<Number
234D A0 00 [2] 00730 ldy #>Number
234F A9 01 [2] 00731 lda #1
2351 20 02DB [6] 00732 jsr OUTDEC
00733
2354 A2 9E [2] 00734 ldx #<Separator
2356 A0 23 [2] 00735 ldy #>Separator
2358 20 236E [6] 00736 jsr PSTR
00737
235B A5 09 [3] 00738 lda _QUOHI
235D 85 10 [3] 00739 sta Number
235F A2 10 [2] 00740 ldx #<Number
2361 A0 00 [2] 00741 ldy #>Number
2363 A9 01 [2] 00742 lda #1
2365 20 02DB [6] 00743 jsr OUTDEC
00744
2368 20 022E [6] 00745 jsr PCRLF
00746
236B 4C 0200 [3] 00747 jmp WARMS
00748
00749 ;
00750 ;
00751 ;
236E 00752 PSTR:
236E 86 0E [3] 00753 stx Ptr
2370 84 0F [3] 00754 sty Ptr+1
2372 A0 00 [2] 00755 ldy #0
00756
2374 00757 2:
2374 B1 0E [5/6] 00758 lda (Ptr),Y
2376 C9 04 [2] 00759 cmp #4
2378 F0 06 (2380) [2/3] 00760 beq 3f
00761
237A 20 0221 [6] 00762 jsr PUTCHR
00763
237D C8 [2] 00764 iny
00765
237E D0 F4 (2374) [2/3] 00766 bne 2b ; Always branches
00767
2380 00768 3:
2380 60 [6] 00769 rts
00770
2381 202F203320657175 00771 DividedBy fcc " / 3 equals: yours ",4
2389 616C733A20796F75
2391 7273 20 04
2395 204E6573446576 20 00772 NesDev fcc " NesDev ",4
239D 04
239E 206D696E65 20 04 00773 Separator fcc " mine ",4
00774
2300 00775 end Start
If more speed is needed in the worse cases, limit the number of times through my loop and divide the remainder by 3 using a now much smaller table.
Edit: Fixed bug in test driver.
Re: Old Guy with Special Case Need to Divide by Three
Posted: Sun Feb 25, 2024 7:11 am
by barnacle
Or... multiply by 85, as a sixteen bit sum, and use the high byte? It'll be off by one for some of the answers.
85 is 0x55, so eight sixteen bit shifts and four sixteen bit adds.
Neil
edit: just realised that's what the above method is doing.

Re: Old Guy with Special Case Need to Divide by Three
Posted: Sun Feb 25, 2024 7:13 am
by BillG
Timing for a few selected dividends:
765 / 3
164 cycles for your code
353 cycles for NesDev code
185 cycles for my code
256 / 3
164 cycles for your code
353 cycles for NesDev code
181 cycles for my code
128 / 3
164 cycles for your code
353 cycles for NesDev code
146 cycles for my code
64 / 3
164 cycles for your code
353 cycles for NesDev code
146 cycles for my code
32 / 3
164 cycles for your code
353 cycles for NesDev code
111 cycles for my code
16 / 3
164 cycles for your code
353 cycles for NesDev code
111 cycles for my code
4 / 3
164 cycles for your code
353 cycles for NesDev code
76 cycles for my code
3 / 3
164 cycles for your code
353 cycles for NesDev code
54 cycles for my code
0 / 3
164 cycles for your code
353 cycles for NesDev code
50 cycles for my code
To be fair, note that the NesDev code is yielding a 16-bit result.
Re: Old Guy with Special Case Need to Divide by Three
Posted: Sun Feb 25, 2024 8:26 am
by BigEd
Welcome, WynnSmith!
As your input is only barely over a byte in size, I wonder if we could start by comparing and subtracting 384 to get the top bit of the result, and then use the minimal bytesize version to get the rest.
(It's clever to use multiplication for division in the case that multiplication is fast but division is not, but I'm not sure it's always optimal when multiplication is also not fast. In general, a non-restoring division, perhaps with pre-calculated shifted divisors, might be a win.)
Code: Select all
sta temp
lsr
lsr
adc temp
ror
lsr
adc temp
ror
lsr
adc temp
ror
lsr
adc temp
ror
lsr
rts
This code is only for 8-bit dividends which 765 is not...
Re: Old Guy with Special Case Need to Divide by Three
Posted: Sun Feb 25, 2024 2:25 pm
by BB8
I tried writing this. I thought it could be fast enough but in fact it's not.
It's a division by subtraction: at every iteration subtracts multiples of 3 (243, 81, 27, 9, 3) and adds the partial quotient to the result.
Code: Select all
ldx #5 ; 6th element of tab
lda #0
sta res ; result =0
hv lda nh ; loads high byte
beq lv ; if zero then goes to low byte value
clc ; will subtract 243
lda #81 ; adds 81 to result
adc res
sta res
sec ; subtracts from low byte
lda nl
sbc tab,x
sta nl
bcs hv ; if no carry go to next iteration
dec nh ; adjust high byte
jo bcc hv
; **** value is <256 (only low byte)
lv lda nl ; loads dividend
lv2 cmp tab,x ; check if can sub the current tab value
bcc nextval ; if lower goes to point to next tab value
dex ; retreives the partial quotient to add to result
lda tab,x
inx
clc ; add partial quotient to result
adc res
sta res
sec ; subtract current tab value from dividend
lda nl
sbc tab,x
sta nl
jmp lv2
nextval
dex ; points to next tab value
bne lv ; if not the first then repeat
rts
tab byte 1,3,9,27,81,243
res byte 0
nl byte <726
nh byte >726
edit: I should try with the fast single byte division when the high byte is done: so (1) repeatedly subtract 243 from the dividend (and add 81 to result) until the dividend is <256, then (2) use the single byte division
Re: Old Guy with Special Case Need to Divide by Three
Posted: Sun Feb 25, 2024 3:40 pm
by BB8
This one seems fast enough. I didn't check it fully.
Subtracts 255 from the dividend repeatedly (at most will do it twice), adding 85 to the result until the dividend high byte becomes zero. Then performs the fast single byte division by 3.
Code: Select all
lda #0
sta res ; result =0
hv lda nh ; loads high byte
beq lv ; if zero then goes to low byte value
clc ; will subtract 255
lda #85 ; adds 85 to result
adc res
sta res
sec ; subtracts from low byte
lda nl
sbc #255
sta nl
bcs hv ; if no carry goes to next iteration
dec nh ; adjusts high byte
jo bcc hv
; **** value is <256 (only low byte)
lv
lda nl
lsr
lsr
adc nl
ror
lsr
adc nl
ror
lsr
adc nl
ror
lsr
adc nl
ror
lsr
clc
adc res
sta res
rts
res byte 0
nl byte <726
nh byte >726
optimized a little:
Code: Select all
hv lda nh ; loads high byte
beq lv ; if zero then goes to low byte value
lda #0
clc
hv2
adc #85 ; adds 85 to result
inc nl ; subtracts 255 from low byte
beq hv2
dec nh ; adjusts high byte
bne hv2
; **** value is <256 (only low byte)
lv sta res
lda nl
lsr
lsr
adc nl
ror
lsr
adc nl
ror
lsr
adc nl
ror
lsr
adc nl
ror
lsr
clc
adc res
sta res
rts
res byte 0
nl byte <726
nh byte >726
executes in 61-101 cycles +rts