gfoot wrote:
You still need to be a little bit careful when the divisor has a narrower width than the dividend - e.g. consider $0100 divided by $FF. During the calculation you actually need an extra bit. This is the bug that Garth was pointing out on his page (if I understood correctly) and he uses an extra "CARRY" variable to store this bit before the subtraction. It's essentially the third byte of the remainder, but it's only needed during the calculation and he stores it elsewhere.
interesting, i didn't catch that on the site. i remember i once implemented a 16/8-bit div function for converting binary to ASCII decimal, and it worked without needing an extra carry bit/byte... or maybe it did and i just forgot about it.
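for reference, something like this python model of the 16/8-bit shift-subtract division (a reconstruction from memory, not the actual routine i had):

```python
# rough python model (reconstructed, not the actual ROM routine) of a
# 16/8-bit shift-subtract division, as used for binary -> decimal conversion
def div16_8(dividend, divisor):
    """16-bit dividend / 8-bit divisor; the dividend turns into the quotient."""
    rem = 0
    for _ in range(16):
        # ROL: shift the next dividend bit into the remainder
        rem = (rem << 1) | ((dividend >> 15) & 1)
        dividend = (dividend << 1) & 0xFFFF
        if rem >= divisor:   # CMP/SBC: trial subtraction
            rem -= divisor
            dividend |= 1    # record the quotient bit
    return dividend, rem

# repeated division by 10 peels off the decimal digits right-to-left:
# 65535 -> remainder 5, quotient 6553; 6553 -> remainder 3; and so on
```

thinking about it, with a divisor of 10 the partial remainder can never exceed 2*10 - 1 = 19, so an 8-bit remainder register never needs the extra ninth bit - that overflow only becomes possible once the divisor goes above $80. that would explain why it worked without the extra carry.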
gfoot wrote:
This CMP is the point where you lost the top bit of the remainder from the last ROL - on entry the remainder could have been $8000, with a dividend of $FFFF, in which case now the remainder ought to be $10000 and the CMP should succeed - but it won't. I think you could insert a "BCS" before this "LDA", skipping over the CMPs, perhaps, and it would probably work out, though you'd also need to re-set the carry before the "ROL dividend" I believe.
I'm not an expert on these division algorithms though, just saying what seem to be problems in the code!
nah that is correct, the MSB of the remainder is simply discarded (why would the remainder have a value upon entry? It's always 0 when the loop starts). the 8 and 16-bit DIV functions that i used to have in my SBC ROM also did that, i remember them working perfectly fine. i based them off online functions like
codebase64, or
this one, and various other resources that i found which implement the same algorithm in hardware, and they all just discard the topmost bit as well.
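as a quick sanity check, here's a python model (mine, not from any of those pages) of the 16/16 loop with the 17th remainder bit deliberately dropped. for a 16-bit dividend the partial remainder going into each shift is at most the value of the dividend bits processed so far, which is at most 15 bits ($7FFF), so the dropped bit is always 0 anyway:

```python
def div16_discard(dividend, divisor):
    """16/16 shift-subtract division, partial remainder clipped to 16 bits."""
    quot = rem = 0
    for i in range(15, -1, -1):
        # ROL the next dividend bit into the remainder, dropping the
        # 17th bit exactly like the 6502 code does
        rem = ((rem << 1) | ((dividend >> i) & 1)) & 0xFFFF
        quot <<= 1
        if rem >= divisor:   # CMP/SBC
            rem -= divisor
            quot |= 1
    return quot, rem
```

comparing it against python's divmod over every edge case i could think of never turns up a mismatch.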
gfoot wrote:
For cases where the result does fit in 16 bits it seems useful enough to me?
yea that makes sense, but if you cannot know that the result fits into 16 bits, wouldn't it be safer to use a full 32-bit quotient? those extra few bytes and cycles won't really make an impact on performance unless you're running on a slow retro system where every byte and cycle counts (like the NES), or you're doing millions of div calls while mining bitcoin or something.
gfoot wrote:
I can't see a simple way to reduce the 32-bit number to a 16-bit one and then do a 16-bit division, unless you "normalise" the arguments first, like with floating point. Again though, I'm not an expert on division algorithms!
well like i said, you just cut it off (or truncate it down to 16 bits). obviously if the dividend is actually 32 bits wide then you'll destroy the value.
but if the quotient fits into 16-bits regardless of the divisor (meaning the dividend is at most 65535), then running the 32/16-bit div function is the same as truncating the dividend to 16-bits and running a regular 16/16-bit div function.
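a quick python sketch of that equivalence (my own model, with a straightforward 32-iteration shift-subtract standing in for the asm routine):

```python
def div32_16(dividend, divisor):
    """32-bit dividend / 16-bit divisor, with a full 32-bit quotient."""
    quot = rem = 0
    for i in range(31, -1, -1):
        rem = (rem << 1) | ((dividend >> i) & 1)  # can need 17 bits here
        quot <<= 1
        if rem >= divisor:
            rem -= divisor
            quot |= 1
    return quot, rem

# for any dividend that fits in 16 bits, this and a plain 16/16 division
# of the truncated dividend agree - and since the quotient can never
# exceed the dividend, it fits in 16 bits too
```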
speculatrix wrote:
Disregard - I saw the ROLs at the bottom.
yep, it does one before the loop and then all the other ones at the bottom.
it's not really necessary, you could do all of them at the top of the loop and have the BCCs branch to the DEY at the end (since the carry from the SBC is kept for the next iteration it would work fine), but you would need to place a CLC before the start of the loop to make the ROL of the dividend work correctly.
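in python terms the rearranged loop would look something like this (my sketch of the idea, not tested against real code - note that with all the rotates at the top, one extra ROL after the loop seems to be needed to bring in the final quotient bit, which is otherwise left sitting in the carry):

```python
def div16_rearranged(dividend, divisor):
    """16/16 division with the ROLs at the top of the loop; the dividend
    turns into the quotient, with quotient bits arriving via the carry."""
    rem = 0
    carry = 0                        # the CLC before the loop
    for _ in range(16):
        # ROL dividend: previous quotient bit in, top dividend bit out
        top = (dividend >> 15) & 1
        dividend = ((dividend << 1) | carry) & 0xFFFF
        # ROL remainder: the dividend's old top bit in
        rem = (rem << 1) | top
        # CMP/SBC: carry ends up set exactly when the subtraction succeeds
        if rem >= divisor:
            rem -= divisor
            carry = 1
        else:
            carry = 0
    # final ROL to shift the last quotient bit into place
    dividend = ((dividend << 1) | carry) & 0xFFFF
    return dividend, rem
```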