Collective groan from the 6502.org community.But wait! I'm just wondering if I can improve on what I've got, and there's no better collection of minds than here.
So, I have two unsigned 16-bit values A and B, where A<B, and I want to compute A/B to 8-bit fractional precision.
My starting point was to look at a simpler case: dividing two 8 bit values, or, in this case, 16-bit divided by 8-bit, yielding an 8-bit value. To my moderate surprise, such a routine is already presented in the
6502 wiki:
Code:
LDA TH
LDX #8
ASL TLQ
L1 ROL
BCS L2
CMP B
BCC L3
L2 SBC B
;
; The SEC is needed when the BCS L2 branch above was taken
;
SEC
L3 ROL TLQ
DEX
BNE L1
Given that the LSB of the numerator is always zero, this can be subtly optimized to the following:
Code:
LDA TH
LDX #8
L1 ASL
BCS L2
CMP B
BCC L3
L2 SBC B
SEC
L3 ROL TLQ
DEX
BNE L1
That actually looks pretty efficient, and I can use it if I know that both my dividends are <256. But I needed a 16-bit case, so I came up with the following:
Code:
divide:
LDA num_hi
LDX #8
loop:
ASL num_lo
ROL A
BCS subtract
CMP denom_hi
BCC skip
BNE subtract
LDY num_lo
CPY denom_lo
BCC skip
subtract:
TAY
LDA num_lo
SBC denom_lo
STA num_lo
TYA
SBC denom_hi
SEC
skip:
ROL result
DEX
BNE loop
RTS
This normally completes (depending on the inputs) in between 250 and 300 cycles, which is
OK, but I'm wondering if I can do better.
I wondered if there was something in doing the subtract, caching the result, and only committing it if it didn't overflow, but I feel like the routine above can short-circuit that work when not required, and probably gives a small performance gain (I guess I would need to measure it).
But the main reason I'm posting this is to ask if anyone knows a better way, or a totally different way! I'm interested in performance here, so if there's a table-based approach which uses a modest amount of RAM for tables, I'm open to ideas.
Note that I don't need an enormous amount of precision: 1/256 is fine. The use case here is as an input to an arctan table: an index from 0-255 (representing 0...1) yielding an angle between 0 and 45 degrees, where 45 degrees is represented by 128. With this kind of resolution, I don't need any more precision in the division results, as every angle is covered in the table. What I
do need is to calculate hundreds of these things in real time, so, the quicker the better!
Over to you
Hope everyone's doing OK during these strange times.