6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Thu May 09, 2024 7:46 pm

All times are UTC




Post new topic Reply to topic  [ 11 posts ] 
Author Message
PostPosted: Mon Aug 21, 2023 9:46 am 
Offline
User avatar

Joined: Tue Apr 03, 2018 2:10 pm
Posts: 125
As something of a weekend warrior (or weekend worrier) when it comes to programming, I quickly run out of talent. And that's happened here.

I have a 32-bit unsigned int stored in location UINT32_A. And I want to divide it by a 16-bit unsigned int stored at UINT16. Both are stored little-endian, of course. I want to return both quotient and remainder.

Based on other division routines I've used successfully, I tried this.
Code:
\ ------------------------------------------------------------------------------
\ ---  uint32_div16
\ ---  Divide a 32-bit unsigned integer by a 16-bit uint.
\ ---  Returns: 32-bit uint with quotient
\ ---               32-bit uint with remainder
\ ------------------------------------------------------------------------------
\ ON ENTRY: - UINT32_A should contain number to be divided
\           - UINT16/+1 must contain divisor
\ ON EXIT : - UINT32_RES_R contains remainder
\           - UINT32_RES_Q contains quotient
.uint32_div16
  ldx #3
.clr_loop
  stz UINT32_RES_R,X
  stz UINT32_RES_Q,X
  dex
  bpl clr_loop

  ldx #32                              ; There are 32 bits in dividend
.uint32_div16_loop
  ; ASL - 0 is shifted into bit 0; original bit 7 is shifted into the Carry
  asl UINT32_A
  ; ROL - Carry shifted into bit 0; original bit 7 shifted into Carry.
  rol UINT32_A + 1
  rol UINT32_A + 2
  rol UINT32_A + 3
  rol UINT32_RES_R
  rol UINT32_RES_R + 1
  rol UINT32_RES_R + 2
  rol UINT32_RES_R + 3

  lda UINT32_RES_R
  sec                             ; Trial low-byte subtraction
  sbc UINT16
  tay                             ; Store result of subtraction in Y
  lda UINT32_RES_R + 1            ; Now try the high byte
  sbc UINT16 + 1
  bcc uint32_div16_nosub          ; Did subtraction succeed?
  sta UINT32_RES_R + 1            ; If yes, save result of high-byte sub
  sty UINT32_RES_R                ; And save result of low-byte sub
  inc UINT32_RES_Q                ; and add 1 to the quotient
.uint32_div16_nosub
  dex
  bne uint32_div16_loop
  rts


But it returns the wrong results. For example, I tried dividing $23456789 by $9876 (big-endian notation) and got a quotient of $1205 and remainder of $3D610420.

I'd appreciate any pointers people may have. It doesn't need to be super fast - I just need it to work.

_________________
I like it when things smoke.
BlogZolatron 64 project


Top
 Profile  
Reply with quote  
PostPosted: Mon Aug 21, 2023 9:48 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California
See http://6502.org/source/integers/ummodfix/ummodfix.htm .

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Mon Aug 21, 2023 12:25 pm 
Offline

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
speculatrix wrote:
As something of a weekend warrior (or weekend worrier) when it comes to programming, I quickly run out of talent. And that's happened here.

I have a 32-bit unsigned int stored in location UINT32_A. And I want to divide it by a 16-bit unsigned int stored at UINT16. Both are stored little-endian, of course. I want to return both quotient and remainder.
...
But it returns the wrong results. For example, I tried dividing $23456789 by $9876 (big-endian notation) and got a quotient of $1205 and remainder of $3D610420.

I'd appreciate any pointers people may have. It doesn't need to be super fast - I just need it to work.

Using Garth's code may be the best solution, especially if you only need 16 bits in your quotient - but regarding the bug in your own code, I think it's that when you subtract UINT16 from UINT32_RES_R, you're only doing the bottom two bytes:

Code:
  lda UINT32_RES_R
  sec                             ; Trial low-byte subtraction
  sbc UINT16
  tay                             ; Store result of subtraction in Y
  lda UINT32_RES_R + 1            ; Now try the high byte
  sbc UINT16 + 1
  bcc uint32_div16_nosub          ; Did subtraction succeed?

At this stage if the carry is clear it means UINT16 couldn't be subtracted from the bottom two bytes of UINT32_RES_R without borrowing - but UINT32_RES_R is wider than that, and if it has any set bits in its upper bytes then borrowing from them is fine and the subtraction should succeed. When subtracting a 16-bit value from a 32-bit value you ought to routinely subtract #0 from each of the higher bytes of UINT32_RES_R in turn, then look at the carry after that to decide whether the result fitted or not.

Your case is a little more complex than straight subtraction, as you're trying to do subtraction in-place but only if no borrow is required in the end, so you need to remember the results during the process and only store them back to UINT32_RES_R if the subtraction ends up working without borrowing. While you could just do a dummy 32-bit subtract first without storing results, and then repeat it with storing if the carry is set at the end, there are some other shortcuts you could take. Garth's code has what looks like a more optimal version taking advantage of the fact that we don't really need 32 bits in the remainder result - we only need 17 bits (temporarily) and 16 bits for the actual result. It also only returns a 16-bit quotient though which might not be what you want. If you need a 32-bit quotient returned, you may be able to adapt Garth's code, or fix your own code using inspiration from his.


Top
 Profile  
Reply with quote  
PostPosted: Mon Aug 21, 2023 3:19 pm 
Offline
User avatar

Joined: Tue Apr 03, 2018 2:10 pm
Posts: 125
Thanks for the replies. Most helpful. Now I have a new direction to try. (But I may be back...)

_________________
I like it when things smoke.
BlogZolatron 64 project


Top
 Profile  
Reply with quote  
PostPosted: Mon Aug 21, 2023 3:50 pm 
Offline
User avatar

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
fun fact about the size of outputs.

the Quotient (the result) needs to be atleast the same width as the Dividend (the number to divide), same with the Remainder and the Divisor (number to divide by).
first one is easy to check, 4 billion / 1 = 4 billion. so the Quotient needs atleast as many bits as the Dividend to support simply passing the number through (ie dividing by 1).
second one is also simple, 65534 % 65535 = 65534, technically it's 1 below the maximum value of the Divisor, but you still need as many bits as the Divisor to represent it.

so in your case 32-bit by 16-bit division will need a 32-bit Result and 16-bit Remainder.

.

knowing that you can remove all operations being done on UINT32_RES_R+2 and beyond.

when building a wider MUL/DIV function i usually just take the simple 8-bit one and extend it to the width of the operands i need.

for example here 8-bit division (result in dividend):

Code:
udiv8:
    STZ remainder       ; Clear the Remainder
    LDY #8              ; Loop Counter
    ASL dividend        ; Shift the Dividend 1 to the left
    @loop:
        ROL remainder   ; And into the Remainder
       
        LDA remainder
        CMP divisor
        BCC @s          ; Check if Remainder >= Divisor
       
        SBC divisor     ; If Divisor > Remainder, Subtract the Divisor from the Remainder
        STA remainder
       
    @s: ROL dividend    ; Shift the Dividend 1 to the left (and set bit 0 if Remainder >= Divisor was true)
        DEY             ; Decrement the Loop Counter
    BNE @loop
RTS


dividend is 32-bits wide so all it's shifts need to be done on 4 bytes. divisor and remainder are only 16-bits wide so all it's shifts, the subtract, and compare need to be done on 2 bytes.

so the modification to the above function could look like this:

Code:
udiv3216:
    STZ remainder
    STZ remainder+1     ; Clear the Remainder
    LDY #32             ; Loop Counter
    ASL dividend
    ROL dividend+1
    ROL dividend+2
    ROL dividend+3      ; Shift the Dividend 1 to the left
    @loop:
        ROL remainder
        ROL remainder+1 ; And into the Remainder
       
        LDA remainder+1
        CMP divisor+1
        BCC @s          ; Check if Remainder >= Divisor (High Bytes first)
        LDA remainder
        CMP divisor
        BCC @s          ; If true, Check the Low Bytes next
       
        LDA remainder
        SBC divisor
        STA remainder
        LDA remainder+1
        SBC divisor+1
        STA remainder+1 ; If Remainder < Divisor, Subtract the Divisor from the Remainder
       
        ROL dividend
        ROL dividend+1
        ROL dividend+2
    @s: ROL dividend+3  ; Shift the Dividend 1 to the left (and set bit 0 if Remainder >= Divisor was true)
        DEY             ; Decrement the Loop Counter
    BNE @loop
RTS


(note that this latter function is untested)

on a side note Garth's function confuses me. a 32-bit Dividend needs a 32-bit Quotient to fully work, so if the function only outputs a 16-bit one then what exactly would be the difference between that and just cutting off the upper 16-bits of the Dividend and using a 16-bit by 16-bit division function?


Top
 Profile  
Reply with quote  
PostPosted: Mon Aug 21, 2023 5:17 pm 
Offline

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
Proxy wrote:
so in your case 32-bit by 16-bit division will need a 32-bit Result and 16-bit Remainder.

.

knowing that you can remove all operations being done on UINT32_RES_R+2 and beyond.

You still need to be a little bit careful when the dividend has a narrower width than the divisor - e.g. consider $0100 divided by $FF. During the calculation you actually need an extra bit. This is the bug that Garth was pointing out in his page (if I understood correctly) and he uses an extra "CARRY" variable to store this bit before the subtraction. It's essentially the third byte of the remainder, but it's only needed during the calculation and he stores it elsewhere.

Quote:
Code:
    @loop:
        ROL remainder
        ROL remainder+1 ; And into the Remainder
       
        LDA remainder+1
        CMP divisor+1

This CMP is the point where you lost the top bit of the remainder from the last ROL - on entry the remainder could have been $8000, with a dividend of $FFFF, in which case now the remainder ought to be $10000 and the CMP should succeed - but it won't. I think you could insert a "BCS" before this "LDA", skipping over the CMPs, perhaps, and it would probably work out, though you'd also need to re-set the carry before the "ROL dividend" I believe.

I'm not an expert on these division algorithms though, just saying what seem to be problems in the code!

Quote:
on a side note Garth's function confuses me. a 32-bit Dividend needs a 32-bit Quotient to fully work, so if the function only outputs a 16-bit one then what exactly would be the difference between that and just cutting off the upper 16-bits of the Dividend and using a 16-bit by 16-bit division function?

For cases where the result does fit in 16 bits it seems useful enough to me? Perhaps I'm missing something though. I can't see a simple way to reduce the 32-bit number to a 16-bit one and then do a 16-bit division, unless you "normalise" the arguments first, like with floating point. Again though, I'm not an expert on division algorithms!


Top
 Profile  
Reply with quote  
PostPosted: Mon Aug 21, 2023 6:47 pm 
Offline
User avatar

Joined: Tue Apr 03, 2018 2:10 pm
Posts: 125
I do indeed want to have a 32-bit quotient. I don't think I need it for the immediate application, but it would be good to have the routine be more generally applicable.

And I think I'd worked out that I need only do subtractions on the lowest two bytes, having a 16-bit divisor.

_________________
I like it when things smoke.
BlogZolatron 64 project


Top
 Profile  
Reply with quote  
PostPosted: Mon Aug 21, 2023 6:51 pm 
Offline
User avatar

Joined: Tue Apr 03, 2018 2:10 pm
Posts: 125
Proxy wrote:
so the modification to the above function could look like this:


I'm intrigued as to where the loop starts in that code. Doesn't the shifting of the dividend (essentially into the remainder) need to be within the loop?

UPDATE: Disregard - I saw the ROLs at the bottom.

_________________
I like it when things smoke.
BlogZolatron 64 project


Top
 Profile  
Reply with quote  
PostPosted: Mon Aug 21, 2023 6:58 pm 
Offline
User avatar

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
gfoot wrote:
You still need to be a little bit careful when the dividend has a narrower width than the divisor - e.g. consider $0100 divided by $FF. During the calculation you actually need an extra bit. This is the bug that Garth was pointing out in his page (if I understood correctly) and he uses an extra "CARRY" variable to store this bit before the subtraction. It's essentially the third byte of the remainder, but it's only needed during the calculation and he stores it elsewhere.


interesting, i didn't catch that on the site. i remember i once implemeted a 16/8-bit div function for converting binary to ASCII decimal. and it worked without needing an extra carry bit/byte... or maybe it did and i just forgot about it.

gfoot wrote:
This CMP is the point where you lost the top bit of the remainder from the last ROL - on entry the remainder could have been $8000, with a dividend of $FFFF, in which case now the remainder ought to be $10000 and the CMP should succeed - but it won't. I think you could insert a "BCS" before this "LDA", skipping over the CMPs, perhaps, and it would probably work out, though you'd also need to re-set the carry before the "ROL dividend" I believe.

I'm not an expert on these division algorithms though, just saying what seem to be problems in the code!

nah that is correct, the MSB of the remainder is simply discarded (why would the remainder have a value upon entry? It's always 0 when the loop starts). the 8 and 16-bit DIV functions that i used to have in my SBC ROM also did that, i remember them working perfectly fine. i based them off online functions like codebase64, or this one, and various other resources that i found which implement the same algorithm in hardware, and they all just discard the top most bit as well.

gfoot wrote:
For cases where the result does fit in 16 bits it seems useful enough to me?

yea that makes sense, but if you cannot know that the result fits into 16-bits, wouldn't it be safer to use a full 32-bit quotient? those extra few bytes and cycles won't really make an impact on performance unless you're running on a slow retro system where every byte and cycle counts (like the NES) or do millions of div calls while mining bitcoin or something.

gfoot wrote:
I can't see a simple way to reduce the 32-bit number to a 16-bit one and then do a 16-bit division, unless you "normalise" the arguments first, like with floating point. Again though, I'm not an expert on division algorithms!

well like said, you just cut it off (or truncate it down to 16-bit). obviously if the dividend it actually 32-bits wide then you'll destroy the value.
but if the quotient fits into 16-bits regardless of the divisor (meaning the dividend is at most 65535), then running the 32/16-bit div function is the same as truncating the dividend to 16-bits and running a regular 16/16-bit div function.

speculatrix wrote:
Disregard - I saw the ROLs at the bottom.

yep, it does one before the loop and then all the other ones at the bottom.
it's not really nessesary, you could do all of them at the top of the loop and have the BCC's branch to the DEY at the end (since it keeps the carry from the SBC to the next iteration it would work fine) but you would need to place a CLC before the start of the loop to make the ROL of the dividend to work correctly.


Last edited by Proxy on Tue Aug 22, 2023 10:10 am, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Mon Aug 21, 2023 7:19 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California
My routine originated from Forth's UM/MOD which had a bug in figForth's initial publication (at least for the 6502) and I set out to fix it when I realized where the problem lay in a work project around 1990.  UM/MOD basically goes the opposite direction of multiplying two 16-bit numbers to get a 32-bit result and adding what will be come the remainder when you do the UM/MOD.  There are other words (routines) available for higher precision.  If you want a 32-bit quotient, you'll need to start with a higher-precision dividend and just pad its top bytes with 0, IIRC.  This is something I concluded after wrestling with writing a divide routine for PIC16 for a work project ten years ago with non-standard precisions.  If you really get through this stuff, you'll definitely end up understanding division a lot better than the average bear does.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Tue Aug 22, 2023 10:02 am 
Offline
User avatar

Joined: Tue Apr 03, 2018 2:10 pm
Posts: 125
I believe I have it working, having fixed some very egregious errors in my first attempt (such as the quotient having a maximum value of 32!).

Code:
\ ------------------------------------------------------------------------------
\ ---  uint32_div16
\ ---  Divide a 32-bit unsigned integer by a 16-bit uint.
\ ---  Returns: 32-bit uint with quotient
\ ---               32-bit uint with remainder (only bottom 16 bits significant)
\ ------------------------------------------------------------------------------
\ ON ENTRY: - UINT32_A/+3   - number to be divided
\           - UINT16_B/+1   - divisor
\ ON EXIT : - UINT32_RES/+3 - remainder
\           - UINT32_A/+3   - quotient
.uint32_div16
  ldx #3
.clr_loop
  stz UINT32_RES,X
  dex
  bpl clr_loop

  ldx #32                               ; There are 32 bits in dividend
.uint32_div16_loop
  ; ----- SHIFT LEFT -----
  asl UINT32_A
  rol UINT32_A + 1
  rol UINT32_A + 2
  rol UINT32_A + 3

  rol UINT32_RES          ; Top-most bit of dividend now in remainder as LSB
  rol UINT32_RES + 1      ; Over the course of 32 iterations, we gradually
  rol UINT32_RES + 2      ; Move the dividend into the remainder
  rol UINT32_RES + 3

  ; ----- TEST SUBTRACTIONS -----
  lda UINT32_RES                ; Try the low byte
  sec
  sbc UINT16_B
  tay                             ; Store the result for later

  lda UINT32_RES + 1            ; Now the high byte
  sbc UINT16_B + 1
  sta MATH_TMP_A                  ; Store for later

  lda UINT32_RES + 2            ; Now third byte of remainder
  sbc #0                          ; Divisor is always 0. Leave the result in A

.test_result
  bcc uint32_div16_nosub          ; Did subtraction succeed? If not, branch.
  sta UINT32_RES + 2            ; Subtraction worked. Store third byte result
  lda MATH_TMP_A                  ; Reload second byte result
  sta UINT32_RES + 1
  sty UINT32_RES                ; And save result of low-byte sub
  inc UINT32_A
.uint32_div16_nosub
  dex
  bne uint32_div16_loop
  rts


I've tested it with only a few values so far, but the results have been right thus far.

The quotient is now returned in the same uint32 used for the dividend.

The remainder is being returned in a uint32 even though it can never be more than 16bits (I think).

As gfoot pointed out, although the divisor is only 16 bits, you need to do a subtraction on the third byte in case the subtraction on the second byte caused a borrow. (What you're subtracting will always be 0, but we need to take care of the state of the Carry).

I have no clue if this is at all efficient, so any tips or any reports of bugs would be appreciated.

_________________
I like it when things smoke.
BlogZolatron 64 project


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 11 posts ] 

All times are UTC


Who is online

Users browsing this forum: Google [Bot], Hermes and 9 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: