6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sat Apr 27, 2024 5:54 am

All times are UTC




Post new topic Reply to topic  [ 44 posts ]  Go to page 1, 2, 3  Next
Author Message
PostPosted: Sat Feb 24, 2024 10:38 pm 
Offline

Joined: Sat Feb 24, 2024 9:52 pm
Posts: 4
I love finding this forum. It's been 45 years since I was deep into the 6502. Now I'm out of touch with standard nomenclature and current tools. Your spit and polish is appreciated.

I have a special case need for an algorithm that quickly divides by three. The largest dividend we'll face is 765. It's 10 bits, and when divided by 3, gives us the largest quotient we'll face of 255, which is 8 bits.

To divide by 3, we can multiply by 1/3.
In binary, 1/3 is 0.01010101..., repeating forever. In spite of not having an exact quantity for 1/3, we can use the 8 bits after the decimal to accurately calculate an 8 bit quotient from our ten bit dividend. Since the 6502 lacks a multiply operation, we can use the add operation as indicated by the four bits of 0.01010101 that are set high.

I'm using the 6502's zero page to serve as registers. To accumulate a sum, we'll repeatedly shift the bits and add. Can you suggest a better way? How does my nomenclature need spiffed up? Any comment is welcome.

Code:
;-----------------------------------------------------------------
; The dividend is stored in 00 and 01 before calling this routine.
; The resulting quotient will be accumulated in 03.
; Largest supported dividend is 765.  (add error checking?)

DIVLO  = $00    ; Zero page locations of 10 bit dividend
DIVHI  = $01
QUOLO  = $02    ; low bits of accumulated sum
QUOHI  = $03    ; high bits of accumulated sum (Quotient)


DIVIDE_BY_THREE:

; First, effectively multiply by 0.0000 0001
lda DIVLO    ; low bits of dividend
sta QUOLO   
lda DIVHI    ; high bits of dividend

; shift 6 times, add 3 times
ldx #3    ; so, we'll loop 3 times

; Then, effectively multiply by 0.0000 0101
; then by 0.0001 0101
; then by 0.0101 0101, which is 1/3

LOOP:

sta QUOHI   

asl QUOLO    ; first, shift left twice
rol QUOHI   
asl QUOLO   
rol QUOHI   
clc       ; Then, add the dividend
lda QUOLO   
adc DIVLO
sta QUOLO
lda QUOHI
adc DIVHI

dex
bne LOOP

FINISH:

sec
adc #0    ;  + 1 (round up)
sta QUOHI    ; The resulting Quotient is in the Accumulator

rts


Thank you.


Top
 Profile  
Reply with quote  
PostPosted: Sat Feb 24, 2024 11:09 pm 
Offline
User avatar

Joined: Fri Aug 03, 2018 8:52 am
Posts: 745
Location: Germany
honestly with how small the value range is (10 bit input, 8 bit output), i'd just use a table. only takes 1kB plus a function like this to fetch the value (assuming a 6502/C02):

Code:
; Inputs in Y (Low Byte) and A (High Byte)
; Output in A
; Uses 1 Word in ZP called "tmp" and destroys X
div3:
    AND #%00000011      ; Truncate the High Byte to the lowest 2 bits (10 bits in total)
    TAX                 ; Store it into X for now
    LDA #<table
    STA z:tmp           ; Store the Low Byte of the table Address in the Low Byte of tmp
    TXA
    CLC
    ADC #>table         ; Then Add the upper 2 bits of the value to the High Byte of the table Address
    STA z:tmp+1         ; And store the result into the High Byte of tmp
    LDA (tmp),Y         ; Finally use the complete Address to fetch the correct division result from the table
RTS


of course on a 65816 you can just use an inline function/macro:
Code:
.macro DIV3 value
    .A8
    .I16                ; Assuming 8-bit Accumulator and 16-bit Index Registers
    LDX value           ; This works with Immediate values and memory locations without any conditional assembly
    LDA f:table,X
.endmacro


Top
 Profile  
Reply with quote  
PostPosted: Sun Feb 25, 2024 12:51 am 
Offline

Joined: Mon Feb 20, 2012 1:46 pm
Posts: 23
Location: America
https://www.nesdev.org/wiki/Divide_by_3

Code:
 sta temp
 lsr
 lsr
 adc temp
 ror
 lsr
 adc temp
 ror
 lsr
 adc temp
 ror
 lsr
 adc temp
 ror
 lsr
 rts


Top
 Profile  
Reply with quote  
PostPosted: Sun Feb 25, 2024 4:34 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 690
Location: North Tejas
WynnSmith wrote:
I'm using the 6502's zero page to serve as registers. To accumulate a sum, we'll repeatedly shift the bits and add. Can you suggest a better way? How does my nomenclature need spiffed up? Any comment is welcome.

Your algorithm yields 64 / 3 = 22 instead of 21.

You did not state whether speed or size is more important and where along that tradeoff curve you want.

I implemented a different approach and will post it after I verify its results.


Top
 Profile  
Reply with quote  
PostPosted: Sun Feb 25, 2024 4:35 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 690
Location: North Tejas
repose wrote:
Code:
 sta temp
 lsr
 lsr
 adc temp
 ror
 lsr
 adc temp
 ror
 lsr
 adc temp
 ror
 lsr
 adc temp
 ror
 lsr
 rts

This code is only for 8-bit dividends which 765 is not...


Top
 Profile  
Reply with quote  
PostPosted: Sun Feb 25, 2024 4:57 am 
Offline

Joined: Mon Feb 20, 2012 1:46 pm
Posts: 23
Location: America
oh right, sorry...


Top
 Profile  
Reply with quote  
PostPosted: Sun Feb 25, 2024 5:04 am 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8144
Location: Midwestern USA
Proxy wrote:
honestly with how small the value range is (10 bit input, 8 bit output), i'd just use a table.

I second that.  These days, memory is less valuable than time, so I’d go with the table approach for such a specific requirement.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Top
 Profile  
Reply with quote  
PostPosted: Sun Feb 25, 2024 5:25 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 690
Location: North Tejas
BigDumbDinosaur wrote:
Proxy wrote:
honestly with how small the value range is (10 bit input, 8 bit output), i'd just use a table.

I second that.  These days, memory is less valuable than time, so I’d go with the table approach for such a specific requirement.

Depending upon the application, a 1 KB table may be huge.


Top
 Profile  
Reply with quote  
PostPosted: Sun Feb 25, 2024 6:37 am 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 660
Location: Potsdam, DE
Though the table need be no larger than 765, since that's the range of the input. Which may help.

Neil


Top
 Profile  
Reply with quote  
PostPosted: Sun Feb 25, 2024 6:41 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 690
Location: North Tejas
Code:
                          00001          lib    FAKEFLEX.LIB
.                         00002 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
.                         00003 ;
.                         00004 ; Fake FLEX environment for debugging in emulator
.                         00005 ;
.0000                     00481          .list
                          00482
 0002                     00483          org    ZeroPage
                          00484
                          00485 ;-----------------------------------------------------------------
                          00486 ; The dividend is stored in 00 and 01 before calling this routine.
                          00487 ; The resulting quotient will be accumulated in 03.
                          00488 ; Largest supported dividend is 765.  (add error checking?)
                          00489
 0002                     00490 DIVLO    rmb    1         ; Zero page locations of 10 bit dividend
 0003                     00491 DIVHI    rmb    1
 0004                     00492 QUOLO    rmb    1         ; low bits of accumulated sum
 0005                     00493 QUOHI    rmb    1         ; high bits of accumulated sum (Quotient)
                          00494
 0006                     00495 _DIVLO   rmb    1         ; Zero page locations of 10 bit dividend
 0007                     00496 _DIVHI   rmb    1
 0008                     00497 _QUOLO   rmb    1         ; low bits of accumulated sum
 0009                     00498 _QUOHI   rmb    1         ; high bits of accumulated sum (Quotient)
                          00499
 000A                     00500 lo_temp  rmb    1
 000B                     00501 hi_temp  rmb    1
 000C                     00502 lo_product rmb  1
 000D                     00503 Quotient rmb    1
                          00504
 000E                     00505 Ptr      rmb    2
 0010                     00506 Number   rmb    2
                          00507
 2000                     00508          org    FreeRAM
                          00509
 2000                     00510 DIVIDE_BY_THREE:
                          00511
                          00512 ; First, effectively multiply by 0.0000 0001
 2000 A5 02           [3] 00513          lda    DIVLO     ; low bits of dividend
 2002 85 04           [3] 00514          sta    QUOLO
 2004 A5 03           [3] 00515          lda    DIVHI     ; high bits of dividend
                          00516
                          00517 ; shift 6 times, add 3 times
 2006 A2 03           [2] 00518          ldx    #3        ; so, we'll loop 3 times
                          00519
                          00520 ; Then, effectively multiply by 0.0000 0101
                          00521 ; then by 0.0001 0101
                          00522 ; then by 0.0101 0101, which is 1/3
                          00523
 2008                     00524 LOOP:
                          00525
 2008 85 05           [3] 00526          sta    QUOHI
                          00527
 200A 06 04           [5] 00528          asl    QUOLO     ; first, shift left twice
 200C 26 05           [5] 00529          rol    QUOHI
 200E 06 04           [5] 00530          asl    QUOLO
 2010 26 05           [5] 00531          rol    QUOHI
 2012 18              [2] 00532          clc              ; Then, add the dividend
 2013 A5 04           [3] 00533          lda    QUOLO
 2015 65 02           [3] 00534          adc    DIVLO
 2017 85 04           [3] 00535          sta    QUOLO
 2019 A5 05           [3] 00536          lda    QUOHI
 201B 65 03           [3] 00537          adc    DIVHI
                          00538
 201D CA              [2] 00539          dex
 201E D0 E8 (2008)  [2/3] 00540          bne    LOOP
                          00541
 2020                     00542 FINISH:
                          00543
 2020 38              [2] 00544          sec
 2021 69 00           [2] 00545          adc    #0        ;  + 1 (round up)
 2023 85 05           [3] 00546          sta    QUOHI     ; The resulting Quotient is in the Accumulator
                          00547
 2025 60              [6] 00548          rts
                          00549
 2100                     00550          org    FreeRAM+$100
                          00551
                          00552 ; divide 16 bit number by 3 by multiplying by 1/3
                          00553 ; enter with
                          00554 ; A containing the hi byte of the number to be divided by 3
                          00555 ; Y containing the lo byte of the number to be divided by 3
                          00556 ; the hi byte of the partial product is kept in A or saved
                          00557 ; on the stack when neccessary
                          00558 ; the product (N/3 quotient) is returned hi byte in A,
                          00559 ; lo byte in Y
                          00560
 2100                     00561 div3_ay:
                          00562
                          00563  ; save the number in lo_temp, hi_temp
                          00564
 2100 84 0A           [3] 00565          sty    lo_temp
 2102 84 0C           [3] 00566          sty    lo_product
 2104 85 0B           [3] 00567          sta    hi_temp
                          00568
 2106 A0 09           [2] 00569          ldy    #$09
 2108 18              [2] 00570          clc
 2109 90 0A (2115)  [2/3] 00571          bcc    ENTER
                          00572
                          00573  ; each pass through loop adds the number in
                          00574  ; lo_temp, hi_temp to the partial product and
                          00575  ; then divides the partial product by 4
                          00576
 210B                     00577 _LOOP:
 210B 48              [3] 00578          pha
 210C A5 0C           [3] 00579          lda    lo_product
 210E 65 0A           [3] 00580          adc    lo_temp
 2110 85 0C           [3] 00581          sta    lo_product
 2112 68              [4] 00582          pla
 2113 65 0B           [3] 00583          adc    hi_temp
 2115                     00584 ENTER:
 2115 6A              [2] 00585          ror    A
 2116 66 0C           [5] 00586          ror    lo_product
 2118 4A              [2] 00587          lsr    A
 2119 66 0C           [5] 00588          ror    lo_product
 211B 88              [2] 00589          dey
 211C D0 ED (210B)  [2/3] 00590          bne    _LOOP
 211E A4 0C           [3] 00591          ldy    lo_product
 2120 60              [6] 00592          rts
                          00593
 2200                     00594          org    FreeRAM+$200
                          00595
                          00596 ;
                          00597 ; Observation: 765/3 fits within one byte!
                          00598 ;
                          00599 ;
                          00600 ; 1/3 = 1/4 + 1/12
                          00601 ;
                          00602 ; Look at the problem this way:
                          00603 ;
                          00604 ; 1 = 1/4 + 1/4 + 1/4 + 1/12 + 1/12 + 1/12
                          00605 ;
                          00606 ; Step 1:
                          00607 ;
                          00608 ;  Divide by 4 using two right shifts.  Store quotient.
                          00609 ;
                          00610 ; Step 2:
                          00611 ;
                          00612 ;  Add quotient to the remainder.  Divide this by 4.
                          00613 ;
                          00614 ; Step 3:
                          00615 ;
                          00616 ;  Add quotient to sum of partial quotients
                          00617 ;
                          00618 ; Step 4:
                          00619 ;
                          00620 ;  Repeat steps 2 and 3, summing quotients, until partial quotient is 0 and
                          00621 ;    remainder < 3
                          00622 ;
                          00623 ; Step 5:
                          00624 ;
                          00625 ;  If the remainder is 3, add 1 to the quotient
                          00626 ;
 2200                     00627 Mine:
 2200 A5 06           [3] 00628          lda    _DIVLO    ; Calculate remainder of division by 4
 2202 29 03           [2] 00629          and    #3
 2204 A8              [2] 00630          tay
                          00631
 2205 A5 06           [3] 00632          lda    _DIVLO    ; Divide by 4
 2207 46 07           [5] 00633          lsr    _DIVHI    ; This uses up the upper byte
 2209 6A              [2] 00634          ror    A
 220A 46 07           [5] 00635          lsr    _DIVHI
 220C 6A              [2] 00636          ror    A
                          00637
 220D 85 06           [3] 00638          sta    _DIVLO    ; Store lower byte of number/4
 220F 85 09           [3] 00639          sta    _QUOHI    ; And partial quotient
 2211 F0 1D (2230)  [2/3] 00640          beq    ZeroDiv   ; Early out if zero
                          00641
 2213 98              [2] 00642          tya              ; Add partial quotient to remainder
 2214 18              [2] 00643          clc
 2215 65 06           [3] 00644          adc    _DIVLO
 2217 85 06           [3] 00645          sta    _DIVLO    ; This is the new dividend
                          00646
 2219                     00647 MyLoop:
 2219 29 03           [2] 00648          and    #3        ; Calculate remainder
 221B A8              [2] 00649          tay
                          00650
 221C A5 06           [3] 00651          lda    _DIVLO    ; Divide by 4
 221E 4A              [2] 00652          lsr    A
 221F 4A              [2] 00653          lsr    A
 2220 85 06           [3] 00654          sta    _DIVLO    ; Store partial quotient
 2222 F0 0C (2230)  [2/3] 00655          beq    ZeroDiv   ; Dividend has become zero
                          00656
 2224 18              [2] 00657          clc              ; Add to sum of partial quotients
 2225 65 09           [3] 00658          adc    _QUOHI
 2227 85 09           [3] 00659          sta    _QUOHI
                          00660
 2229 98              [2] 00661          tya              ; Add partial quotient to remainder
                          00662 ; clc ; Carry still clear from above adc
 222A 65 06           [3] 00663          adc    _DIVLO
 222C 85 06           [3] 00664          sta    _DIVLO    ; This is the new dividend
                          00665
 222E D0 E9 (2219)  [2/3] 00666          bne    MyLoop    ; Always branches
                          00667
 2230                     00668 ZeroDiv:
 2230 C0 03           [2] 00669          cpy    #3
 2232 90 02 (2236)  [2/3] 00670          blo    ImDone
                          00671
 2234 E6 09           [5] 00672          inc    _QUOHI    ; Add one third of three remaining
                          00673
 2236                     00674 ImDone:
 2236 60              [6] 00675          rts
                          00676
 2300                     00677          org    FreeRAM+$300
                          00678
                          00679 ;
                          00680 ; Test driver
                          00681 ;
 2300                     00682 Start:
 2300 A9 00           [2] 00683          lda    #>64
 2302 A2 40           [2] 00684          ldx    #<64
                          00685
 2304 85 03           [3] 00686          sta    DIVHI
 2306 85 07           [3] 00687          sta    _DIVHI
                          00688
 2308 86 02           [3] 00689          stx    DIVLO
 230A 86 06           [3] 00690          stx    _DIVLO
                          00691
 230C 85 11           [3] 00692          sta    Number+1
 230E 86 10           [3] 00693          stx    Number
                          00694
 2310 20 2000         [6] 00695          jsr    DIVIDE_BY_THREE
                          00696
 2313 A5 07           [3] 00697          lda    _DIVHI
 2315 A4 06           [3] 00698          ldy    _DIVLO
 2317 20 2100         [6] 00699          jsr    div3_ay
 231A 84 0D           [3] 00700          sty    Quotient
                          00701
 231C 20 2200         [6] 00702          jsr    Mine
                          00703
 231F A2 10           [2] 00704          ldx    #<Number
 2321 A0 00           [2] 00705          ldy    #>Number
 2323 A9 01           [2] 00706          lda    #1
 2325 20 02DB         [6] 00707          jsr    OUTDEC
                          00708
 2328 A2 81           [2] 00709          ldx    #<DividedBy
 232A A0 23           [2] 00710          ldy    #>DividedBy
 232C 20 236E         [6] 00711          jsr    PSTR
                          00712
 232F A9 00           [2] 00713          lda    #0
 2331 85 11           [3] 00714          sta    Number+1
                          00715
 2333 A5 05           [3] 00716          lda    QUOHI
 2335 85 10           [3] 00717          sta    Number
 2337 A2 10           [2] 00718          ldx    #<Number
 2339 A0 00           [2] 00719          ldy    #>Number
 233B A9 01           [2] 00720          lda    #1
 233D 20 02DB         [6] 00721          jsr    OUTDEC
                          00722
 2340 A2 95           [2] 00723          ldx    #<NesDev
 2342 A0 23           [2] 00724          ldy    #>NesDev
 2344 20 236E         [6] 00725          jsr    PSTR
                          00726
 2347 A5 0D           [3] 00727          lda    Quotient
 2349 85 10           [3] 00728          sta    Number
 234B A2 10           [2] 00729          ldx    #<Number
 234D A0 00           [2] 00730          ldy    #>Number
 234F A9 01           [2] 00731          lda    #1
 2351 20 02DB         [6] 00732          jsr    OUTDEC
                          00733
 2354 A2 9E           [2] 00734          ldx    #<Separator
 2356 A0 23           [2] 00735          ldy    #>Separator
 2358 20 236E         [6] 00736          jsr    PSTR
                          00737
 235B A5 09           [3] 00738          lda    _QUOHI
 235D 85 10           [3] 00739          sta    Number
 235F A2 10           [2] 00740          ldx    #<Number
 2361 A0 00           [2] 00741          ldy    #>Number
 2363 A9 01           [2] 00742          lda    #1
 2365 20 02DB         [6] 00743          jsr    OUTDEC
                          00744
 2368 20 022E         [6] 00745          jsr    PCRLF
                          00746
 236B 4C 0200         [3] 00747          jmp    WARMS
                          00748
                          00749 ;
                          00750 ;
                          00751 ;
 236E                     00752 PSTR:
 236E 86 0E           [3] 00753          stx    Ptr
 2370 84 0F           [3] 00754          sty    Ptr+1
 2372 A0 00           [2] 00755          ldy    #0
                          00756
 2374                     00757 2:
 2374 B1 0E         [5/6] 00758          lda    (Ptr),Y
 2376 C9 04           [2] 00759          cmp    #4
 2378 F0 06 (2380)  [2/3] 00760          beq    3f
                          00761
 237A 20 0221         [6] 00762          jsr    PUTCHR
                          00763
 237D C8              [2] 00764          iny
                          00765
 237E D0 F4 (2374)  [2/3] 00766          bne    2b        ; Always branches
                          00767
 2380                     00768 3:
 2380 60              [6] 00769          rts
                          00770
 2381 202F203320657175    00771 DividedBy fcc   " / 3 equals: yours ",4
 2389 616C733A20796F75
 2391 7273 20 04
 2395 204E6573446576 20   00772 NesDev   fcc    " NesDev ",4
 239D 04
 239E 206D696E65 20 04    00773 Separator fcc   " mine ",4
                          00774
 2300                     00775          end    Start

If more speed is needed in the worse cases, limit the number of times through my loop and divide the remainder by 3 using a now much smaller table.

Edit: Fixed bug in test driver.


Top
 Profile  
Reply with quote  
PostPosted: Sun Feb 25, 2024 7:11 am 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 660
Location: Potsdam, DE
Or... multiply by 85, as a sixteen bit sum, and use the high byte? It'll be off by one for some of the answers.

85 is 0x55, so eight sixteen bit shifts and four sixteen bit adds.

Neil

edit: just realised that's what the above method is doing. :mrgreen:


Last edited by barnacle on Sun Feb 25, 2024 7:22 am, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Sun Feb 25, 2024 7:13 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 690
Location: North Tejas
Timing for a few selected dividends:

765 / 3

164 cycles for your code
353 cycles for NesDev code
185 cycles for my code

256 / 3

164 cycles for your code
353 cycles for NesDev code
181 cycles for my code

128 / 3

164 cycles for your code
353 cycles for NesDev code
146 cycles for my code

64 / 3

164 cycles for your code
353 cycles for NesDev code
146 cycles for my code

32 / 3

164 cycles for your code
353 cycles for NesDev code
111 cycles for my code

16 / 3

164 cycles for your code
353 cycles for NesDev code
111 cycles for my code

4 / 3

164 cycles for your code
353 cycles for NesDev code
76 cycles for my code

3 / 3

164 cycles for your code
353 cycles for NesDev code
54 cycles for my code

0 / 3

164 cycles for your code
353 cycles for NesDev code
50 cycles for my code

To be fair, note that the NesDev code is yielding a 16-bit result.


Top
 Profile  
Reply with quote  
PostPosted: Sun Feb 25, 2024 8:26 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Welcome, WynnSmith!

As your input is only barely over a byte in size, I wonder if we could start by comparing and subtracting 384 to get the top bit of the result, and then use the minimal bytesize version to get the rest.

(It's clever to use multiplication for division in the case that multiplication is fast but division is not, but I'm not sure it's always optimal when multiplication is also not fast. In general, a non-restoring division, perhaps with pre-calculated shifted divisors, might be a win.)

BillG wrote:
repose wrote:
Code:
 sta temp
 lsr
 lsr
 adc temp
 ror
 lsr
 adc temp
 ror
 lsr
 adc temp
 ror
 lsr
 adc temp
 ror
 lsr
 rts

This code is only for 8-bit dividends which 765 is not...


Top
 Profile  
Reply with quote  
PostPosted: Sun Feb 25, 2024 2:25 pm 
Offline
User avatar

Joined: Sun Nov 01, 2020 10:36 am
Posts: 35
Location: Tatooine
I tried writing this. I thought it could be fast enough but in fact it's not.
It's a division by subtraction: at every iteration subtracts multiples of 3 (243, 81, 27, 9, 3) and adds the partial quotient to the result.

Code:
        ldx #5          ; 6th element of tab
        lda #0
        sta res         ; result =0
hv      lda nh          ; loads high byte
        beq lv          ; if zero then goes to low byte value

        clc             ; will subtract 243
        lda #81         ; adds 81 to result
        adc res
        sta res

        sec             ; subtracts from low byte
        lda nl
        sbc tab,x
        sta nl
        bcs hv          ; if no carry go to next iteration
        dec nh          ; adjust high byte
jo      bcc hv

;  **** value is <256 (only low byte)
lv      lda nl          ; loads dividend
lv2     cmp tab,x       ; check if can sub the current tab value
        bcc nextval     ; if lower goes to point to next tab value
        dex             ; retreives the partial quotient to add to result
        lda tab,x
        inx
        clc             ; add partial quotient to result
        adc res
        sta res
        sec             ; subtract current tab value from dividend
        lda nl
        sbc tab,x
        sta nl
        jmp lv2

nextval
        dex             ; points to next tab value
        bne lv          ; if not the first then repeat
        rts
tab byte 1,3,9,27,81,243
res byte 0
nl  byte <726
nh  byte >726



edit: I should try with the fast single byte division when the high byte is done: so (1) repeatedly subtract 243 from the dividend (and add 81 to result) until the dividend is <256, then (2) use the single byte division


Top
 Profile  
Reply with quote  
PostPosted: Sun Feb 25, 2024 3:40 pm 
Offline
User avatar

Joined: Sun Nov 01, 2020 10:36 am
Posts: 35
Location: Tatooine
This one seems fast enough. I didn't check it fully.
Subtracts 255 from the dividend repeatedly (at most will do it twice), adding 85 to the result until the dividend high byte becomes zero. Then performs the fast single byte division by 3.

Code:
        lda #0
        sta res         ; result =0
hv      lda nh          ; loads high byte
        beq lv          ; if zero then goes to low byte value

        clc             ; will subtract 255
        lda #85         ; adds 85 to result
        adc res
        sta res

        sec             ; subtracts from low byte
        lda nl
        sbc #255
        sta nl
        bcs hv          ; if no carry goes to next iteration
        dec nh          ; adjusts high byte
jo      bcc hv

;  **** value is <256 (only low byte)
lv     
        lda nl
        lsr
        lsr
        adc nl
        ror
        lsr
        adc nl
        ror
        lsr
        adc nl
        ror
        lsr
        adc nl
        ror
        lsr
        clc
        adc res
        sta res
        rts

res byte 0
nl  byte <726
nh  byte >726



optimized a little:
Code:
hv      lda nh          ; loads high byte
        beq lv          ; if zero then goes to low byte value

        lda #0
        clc

hv2
        adc #85         ; adds 85 to result
        inc nl          ; subtracts 255 from low byte
        beq hv2
        dec nh          ; adjusts high byte
        bne hv2

;  **** value is <256 (only low byte)
lv      sta res
        lda nl
        lsr
        lsr
        adc nl
        ror
        lsr
        adc nl
        ror
        lsr
        adc nl
        ror
        lsr
        adc nl
        ror
        lsr
        clc
        adc res
        sta res
        rts

res byte 0
nl  byte <726
nh  byte >726

executes in 61-101 cycles +rts


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 44 posts ]  Go to page 1, 2, 3  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 25 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: