Division routine help sought!
Posted: Sat Apr 25, 2020 8:58 pm
Hope everyone's doing OK during these strange times.

Collective groan from the 6502.org community. But wait! I'm just wondering if I can improve on what I've got, and there's no better collection of minds than here.

So, I have two unsigned 16-bit values A and B, where A < B, and I want to compute A/B to 8-bit fractional precision.

My starting point was to look at a simpler case: dividing two 8-bit values, or, in this case, 16-bit divided by 8-bit, yielding an 8-bit value. To my moderate surprise, such a routine is already presented in the 6502 wiki:
Code:
        LDA TH
        LDX #8
        ASL TLQ
L1      ROL             ; next numerator bit into A via the carry
        BCS L2          ; remainder overflowed 8 bits, so it must be >= B
        CMP B
        BCC L3          ; remainder < B: this quotient bit is 0
L2      SBC B           ; carry is already set on both paths into the SBC
        SEC             ; needed when the BCS L2 branch above was taken
L3      ROL TLQ         ; quotient bit in at the bottom, next numerator bit out
        DEX
        BNE L1

Given that the LSB of the numerator is always zero, this can be subtly optimized to the following:

Code:
        LDA TH
        LDX #8
L1      ASL             ; numerator low byte is zero, so no ASL TLQ feed is needed
        BCS L2
        CMP B
        BCC L3
L2      SBC B
        SEC             ; as above, needed when the BCS L2 branch was taken
L3      ROL TLQ
        DEX
        BNE L1

That actually looks pretty efficient, and I can use it if I know that both inputs are < 256. But I needed a 16-bit case, so I came up with the following:

Code:
divide:
        LDA num_hi      ; 16-bit remainder is held in A (high) : num_lo (low)
        LDX #8          ; eight quotient bits
loop:
        ASL num_lo
        ROL A           ; 17-bit left shift; bit 16 lands in the carry
        BCS subtract    ; bit 16 set: remainder is certainly >= denominator
        CMP denom_hi
        BCC skip        ; high byte smaller: quotient bit is 0
        BNE subtract    ; high byte bigger: quotient bit is 1
        LDY num_lo      ; high bytes equal: decide on the low bytes
        CPY denom_lo
        BCC skip
subtract:
        TAY             ; stash the remainder's high byte
        LDA num_lo
        SBC denom_lo    ; carry is set on every path into this SBC
        STA num_lo
        TYA
        SBC denom_hi
        SEC             ; force the quotient bit when the wide value borrowed
skip:
        ROL result      ; shift the quotient bit into the result
        DEX
        BNE loop
        RTS
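For anyone who wants to poke at the algorithm without firing up an emulator, here's a quick Python model of the shift-and-subtract loop above (the function name is mine, not part of the routine):

```python
def udiv16_frac8(num, denom):
    """Model of the 8-step restoring division: num and denom are unsigned
    16-bit with num < denom; returns floor(num * 256 / denom), i.e. the
    eight fraction bits the routine leaves in `result`."""
    rem = num
    result = 0
    for _ in range(8):
        rem <<= 1            # ASL num_lo / ROL A: the 17-bit left shift
        result <<= 1         # ROL result (the quotient bit arrives below)
        if rem >= denom:     # the BCS / CMP / CPY comparison chain
            rem -= denom     # the 16-bit SBC pair
            result |= 1      # carry forced set into ROL result
    return result

print(udiv16_frac8(1, 2))          # -> 128 (i.e. 0.5 in 1/256ths)
print(udiv16_frac8(10000, 30000))  # -> 85  (floor(10000 * 256 / 30000))
```

The 8-bit routines above are just the special cases of the same loop where the numerator's low byte is zero or the denominator's high byte is zero.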
This normally completes (depending on the inputs) in between 250 and 300 cycles, which is OK, but I'm wondering if I can do better.

I wondered if there was something in doing the subtract speculatively, caching the result, and only committing it if it didn't overflow, but I feel the routine above short-circuits that work when it isn't required, and probably gains a little performance from doing so (I'd need to measure it to be sure).

But the main reason I'm posting is to ask whether anyone knows a better way, or a totally different way! I'm interested in performance here, so if there's a table-based approach which uses a modest amount of RAM for tables, I'm open to ideas.

Note that I don't need an enormous amount of precision: 1/256 is fine. The use case is as an input to an arctan table: an index from 0-255 (representing 0...1) yields an angle between 0 and 45 degrees, where 45 degrees is represented by 128. At that resolution I don't need any more precision in the division result, since every angle in the table is covered. What I do need is to calculate hundreds of these in real time, so the quicker the better!
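For reference, a table along those lines can be generated offline with a few lines of Python (the table name and exact rounding are my sketch of the scheme described, not a quote of my actual data):

```python
import math

# Sketch of the lookup table: the index 0..255 encodes a ratio 0..1 in
# steps of 1/256, and each entry is the corresponding angle scaled so
# that 45 degrees comes out as 128.
ATAN_TABLE = bytes(
    round(math.atan(i / 256) / (math.pi / 4) * 128) for i in range(256)
)

print(ATAN_TABLE[0])    # -> 0  (a ratio of 0 is 0 degrees)
print(ATAN_TABLE[128])  # -> 76 (atan(0.5) ~ 26.6 degrees ~ 76/128 of 45)
```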
Over to you!