Multiplication and division can be tricky to get right on the first or even second try, and I applaud your efforts to get right down to the nuts and bolts. I have a smaller and more featureful version from my VTL02 interpreter, but I didn't write it from scratch ... I adapted it from some older 6800 code that I found. I'm not going to claim that mine's any better than anyone else's, although it has been battle-tested and seems to perform adequately. You are welcome to adapt it to your use case, although I certainly understand if you're more interested in something you've built yourself from the ground up.
Code:
;-----------------------------------------------------;
; 16-bit x 16-bit unsigned division routine
; var[x] /= var[x+2], remn = remainder
; var[x] /= 0 produces remn = var[x], var[x] = 65535
; 40 bytes
div:
        lda #0
        sta remn        ; remn = 0
        sta remn+1
        ldy #16         ; loop counter
div1:
        asl 0,x         ; var[x] is gradually replaced
        rol 1,x         ; with the quotient
        rol remn        ; remn is gradually replaced
        rol remn+1      ; with the remainder
        lda remn
        cmp 2,x
        lda remn+1      ; partial remainder >= var[x+2]?
        sbc 3,x
        bcc div2
        sta remn+1      ; yes: update the partial
        lda remn        ; remainder and set the
        sbc 2,x         ; low bit in the partial
        sta remn        ; quotient
        inc 0,x
div2:
        dey
        bne div1        ; loop 16 times
        rts
To use mine, reserve two bytes at remn, load the dividend and divisor (low byte first) into four consecutive zero-page bytes, point register X at the first of those bytes and let 'er rip. The quotient replaces the dividend, the remainder lands in remn, and the divisor is left untouched. If you don't need the remainder you may be able to shorten this a bit further, but I'll leave that as an exercise for the interested reader.
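For what it's worth, a call might look something like the sketch below, which divides 1000 by 7. The addresses for arg and remn are made up for illustration (pick whatever zero-page locations are free on your system), and the = equates and the < and > low/high-byte operators are the ones my assembler accepts, so yours may spell them differently.
Code:
remn    = $14           ; hypothetical: 2 zero-page bytes for the remainder
arg     = $10           ; hypothetical: 4 zero-page bytes for the operands

        lda #<1000      ; dividend, low byte first
        sta arg
        lda #>1000
        sta arg+1
        lda #<7         ; divisor, low byte first
        sta arg+2
        lda #>7
        sta arg+3
        ldx #arg        ; X = zero-page address of the dividend
        jsr div
; on return: arg,arg+1 = 142 (quotient), remn,remn+1 = 6 (remainder)
; arg+2,arg+3 still hold 7; A is clobbered, Y = 0, X is preserved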
_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some
VTL02C on it and see how it grows on you!
Mike B.