I need some help understanding how to use this algorithm.

barrym95838 · Post by **barrym95838** » Tue May 20, 2014 3:03 am

The algorithm in question is here:

http://6502org.wikidot.com/software-math-fastdiv

It is the handiwork of our resident guru dclxvi (Bruce), but I am too dense to figure out how it works from the cryptic explanation and non-existent comments. For the 8-bit version, I am supposed to call this routine with a 39-bit number stored in R, and the quotient quickly appears in the accumulator, with amazing efficiency.

Very impressive ... but ...

My question to all is "how am I supposed to retrieve a quotient from a single number?" Don't I have to provide two numbers, a dividend and a divisor? Or is one of them assumed to be a fixed value? In either case, how do I encode the input(s)? If any of you can help me figure out how it works, I would like to expand it to 32-bits for my 65m32.

Thanks to all,

Mike

White Flame · Post by **White Flame** » Tue May 20, 2014 4:33 am

It looks like a reciprocal, not division.

barrym95838 · Post by **barrym95838** » Tue May 20, 2014 4:42 am

White Flame wrote:

It looks like a reciprocal, not division.

Thanks.

That brings me one step closer to understanding, but I still have a long way to go. You are of the opinion that it's calculating something like 1/R, but I still don't get how to encode the input or decode the output.

If this code was posted on the first of April, I might have a better chance figuring it out!

Mike

dclxvi · Post by **dclxvi** » Thu May 22, 2014 1:16 am

Multiplying by a power of 2 is easy, just use N ASLs. Multiplying by, e.g. 10, isn't much harder:

Code: Select all

   STA TMP
   ASL
   ASL     ; a*4
   CLC
   ADC TMP ; a*4+a = a*5
   ASL     ; (a*4+a)*2 = a*10

To use a different multiplier, e.g. 13, you must change the code (again, untested):

Code: Select all

   STA TMP
   ASL
   CLC
   ADC TMP ; a*2+a = a*3
   ASL
   ASL     ; (a*2+a)*4 = a*12
   CLC
   ADC TMP ; (a*2+a)*4+a = a*13

Likewise multiply-by-31 can be implemented as a*32-a. Some mulitpliers are more convenient than others. 10 works out great, whether you're multiplying an 8-bit, 16-bit, or 32-bit number, because 10 has only 2 bits that are ones. On the other hand, using this technique to multiply, e.g., a 32-bit number by $A9D6 may not be worth it because there are a whole bunch of one bits in $A9D6.

Calculating the remainder when the divisor is a power of 2 is also easy, e.g. mod 8 is AND #7. The analogous technique when the divisor is not a power of 2 is here:

http://6502org.wikidot.com/software-math-fastmod

Likewise, the routine must be changed when a different divisor is used, and some divisors will be more convenient than others.

Likewise, calculating the quotient when the divisor is a power of 2 is easy, just use N LSRs. The routine linked in the original post is the analog for calculating quotients when the divisor is not a power of 2. Again, the routine must be changed for a different divisor and some divisors will be more convenient than others.

Remember repeating decimals back when you learned them in school? e.g.:

Code: Select all

        ______
1/13 = .076923

i.e. 1/13 = (7/100 + 6/1000 + 9/10000 + 2/100000 + 3/1000000) + ...

In base 2:

.1 = 1/2
.01 = 1/4
.001 = 1/8
etc.,

Code: Select all

        ____________
1/13 = .000100111011

i.e. 1/13 = (1/16 + 1/128 + 1/256 + 1/512 + 1/2048 + 1/4096) + ...

So the basic technique is to shift right and add. There are a couple of complications, however. First, there are an infinite number of terms, so you have to determine how many terms you need. You'll need more terms to divide a 32-bit number by 13 than to divide an 8-bit number by 13. (This differs from multiplication by 13, where there are always 3 terms, N*8 + N*4 + N, regardless of whether N is 8, or 16, or 32 bits.)

Second, since you are shifting right, you are losing the least significant bit as it gets shifted out of the register/variable. Thus, there are three possible things to precede the ADC: a CLC, a SEC, or nothing (i.e. use the bit you just shifted out as the carry). Here is the 8-bit routine for divide by 13:

Code: Select all

  STA TMP
  LSR
  LSR     ; a/4
  ADC TMP ; a/4 + a
  ROR     ; a/8 + a/2
  ADC TMP ; a/8 + a/2 + a
  ROR     ; a/16 + a/4 + a/2
  ADC TMP ; a/16 + a/4 + a/2 + a
  ROR
  LSR
  LSR     ; a/128 + a/32 + a/16 + a/8
  SEC
  ADC TMP ; a/128 + a/32 + a/16 + a/8 + a
  ROR
  LSR
  LSR
  LSR     ; a/2048 + a/512 + a/256 + a/128 + a/16

Here is the 8-bit routine for divide-by-10:

Code: Select all

;                       ____
; 1/10 = .1 decimal = .00011 binary
; = (1/16 + 1/32) + (1/256 + 1/512) + (1/4096 + 1/8192) + ...

  STA TMP
  LSR
  LSR
  LSR     ; a/8
  ADC TMP ; a/8 + a
  ROR     ; a/16 + a/2
  ADC TMP ; a/16 + a/2 + a
  ROR
  LSR
  LSR     ; a/128 + a/16 + a/8
  ADC TMP ; a/128 + a/16 + a/8 + a
  ROR     ; a/256 + a/32 + a/16 + a/2
  SEC
  ADC TMP ; a/256 + a/32 + a/16 + a/2 + a
  ROR
  LSR
  LSR
  LSR     ; a/4096 + a/512 + a/256 + a/32 + a/16

barrym95838 · Post by **barrym95838** » Thu May 22, 2014 5:18 am

Thanks for responding, dclxvi!

I can understand what you're saying here (above), and I can even see how the bit pattern of the reciprocal is reflected in your nicely commented examples. I don't grok how and where you determined the need for the SECs, but I can definitely see the big picture.

What I can't understand is how to make proper use of this (from the linked page):

Code: Select all

   ASL R3
   LDA R4
   ROL
   SEC
   PHA
   ADC R0
   PLA
   BCC .1
   INC R1
   BNE .1
   INC R2
   BNE .1
   INC R3
   INC R3
   BNE .1
   ADC #0
.1

My tired old brain can't take the next step in understanding what you're doing in this code, at least not without some coaching. Could I impose on you to walk me (us?) through a sample case or two, showing the relationship between the input and output of this little gem? I can feed it a bit pattern and retrieve the result, but I don't know how to relate them to actual numbers in a meaningful or useful way. For example, if I feed it 39 zeros, what does that mean, and what does the resulting bit pattern mean?

Thanks in advance,

Mike

bogax · Post by **bogax** » Fri May 23, 2014 5:54 am

I'd argue that you should defer the division to retain
as much accuracy in the partial products as possible
1/10 = .000110011

(untested)

Code: Select all

 sta temp  1
 lsr        .1
 adc temp  1.1
 ror        .11
 lsr
 lsr
 lsr        .00011
 adc temp  1.00011
 ror        .100011
 adc temp  1.100011
 ror
 lsr
 lsr
 lsr
 lsr        .0001100011

If you start by dividing by 8
You've thrown away three bits that could participate

For repeating decimals (binamals?) you can form a
partial product that repeats.

Code: Select all

 sta temp  1
 lsr
 adc temp  1.1
 ror
 sta temp   .11 
 lsr
 lsr
 lsr
 lsr        .000011
 adc tmp    .110011
 ror
 lsr
 lsr        .000110011

I don't see the point of setting the carry.
Rounding happens automatically if you don't
clear the carry.
It maybe be possible to get by with fewer partial
products by a judicious use of sec but I think
that would be rare.

dclxvi · Post by **dclxvi** » Thu May 29, 2014 1:55 am

barrym95838 wrote:

For example, if I feed it 39 zeros, what does that mean, and what does the resulting bit pattern mean?

It's pretty much what the bullets points say. R5 to R0 is the input (numerator) and the accumulator (or R5 and R4 for the 16-bit version) is the output (quotient).

8-bit version:

Code: Select all

; compute floor ($123456789A / $007FFFFFFF)
;
   LDA #$9A ; lsb of numerator
   STA R0
   LDA #$78
   STA R1
   LDA #$56
   STA R2
   LDA #$34
   STA R3
   LDA #$12 ; msb of numerator
   STA R4
   JSR DIV
   JSR OUTHEX ; output quotient

16-bit version:

Code: Select all

; compute floor ($123456789ABC / $00007FFFFFFF)
;
   LDA #$BC ; lsb of numerator
   STA R0
   LDA #$9A
   STA R1
   LDA #$78
   STA R2
   LDA #$56
   STA R3
   LDA #$34
   STA R4
   LDA #$12 ; msb of numerator
   STA R5
   JSR DIV
   LDA R5     ; msb of quotient
   JSR OUTHEX
   LDA R4     ; lsb of quotient
   JSR OUTHEX

bogax wrote:

If you start by dividing by 8
You've thrown away three bits that could participate

It doesn't start by dividing by 8, it starts by storing so that every addition uses the full 8 bits of the accumulator. Your first routine ends with one too many LSRs, but if you remove one, it's basically just using one fewer term that the routine I posted above. If you plug it into the following unit test, you'll see that it's close (the quotient only off by one in a few cases), but not quite correct for all numerators. It may be close enough for some applications, though.

Code: Select all

   LDX #0
   STX QUOT
   STX REM
   STX MAXERROR
   LDY #0
@A TXA
   JSR DIV10
   CMP QUOT
   BEQ @B
   JSR OUTNE
@B INC REM   ; increment the remainder
   LDA REM   ; every 10th time, reset the remainder and increment the quotient
   CMP #10
   BCC @C
   LDA #0
   STA REM
   INC QUOT
@C INX
   BNE @A
   LDA MAXERROR ; output maximum error
   JSR OUTHEX
   RTS

OUTHEXSP
   JSR OUTHEX
   LDA #$20
   JMP OUTPUT

GETERROR ; compute abs (actual quotient - expected quotient)
   SEC
   SBC QUOT
   BCS @A
   EOR #$FF ; compute -A (carry is clear here)
   ADC #1
@A RTS

STOREERROR
   CMP MAXERROR
   BCC @A
   STA MAXERROR
@A RTS

OUTNE
   PHA
   TYA
   JSR OUTHEXSP ; output error count
   TXA
   JSR OUTHEXSP ; output numerator
   PLA
   PHA
   JSR OUTHEXSP ; output actual quotient
   LDA QUOT
   JSR OUTHEXSP ; output expected quotient
   PLA
   JSR GETERROR
   JSR STOREERROR
   JSR OUTHEX   ; output error, i.e. abs (actual quotient - expected quotient)
   INY          ; increment error count
   JMP OUTCRLF

MAXERROR .ds 1
QUOT     .ds 1
REM      .ds 1

bogax wrote:

For repeating decimals (binamals?) you can form a
partial product that repeats.

The trouble with this is that you're not storing one bit of the (9 bit) sum after with that second STA (in your second routine). If you plug it into the unit test above, you'll see it's close, but not quite correct.

bogax wrote:

It maybe be possible to get by with fewer partial
products by a judicious use of sec but I think
that would be rare.

It's more common than I would have guessed. I've hardly ever used this division technique, so I never wound up working through the theory mathematically. I just used trial and error to determine how many terms were necessary for various denominators and what to precede the ADC with. (Then I must've got going on some other side project and I never revisited division.) I basically just experimented by plugging different routines into a unit test, and it turned out that division isn't as straightforward as multiplication or modulus.

bogax · Post by **bogax** » Sat Jun 14, 2014 7:17 pm

I see Omegamatrix has posted his compilation of division by a constant routines to nesdev

I need some help understanding how to use this algorithm.

I need some help understanding how to use this algorithm.

Re: I need some help understanding how to use this algorithm

Re: I need some help understanding how to use this algorithm

Re: I need some help understanding how to use this algorithm

Re: I need some help understanding how to use this algorithm

Re: I need some help understanding how to use this algorithm

Re: I need some help understanding how to use this algorithm

Re: I need some help understanding how to use this algorithm