6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Tue Apr 23, 2024 4:45 pm

All times are UTC




Post new topic Reply to topic  [ 16 posts ]  Go to page 1, 2  Next
Author Message
PostPosted: Wed Jan 28, 2004 7:26 am 
Offline

Joined: Fri Aug 30, 2002 2:05 pm
Posts: 347
Location: UK
Garth [et-all],

I've been using something similar to your HTD routine for years but starting LSbit first, had I thought to do it MSbit first it would have been faster and shorter. What I have now is similar and as short and fast as I can easily make it. If anyone can think of a faster and/or shorter version please post it.

First saving is to convert the lowest four bits directly thus ..
Code:
   LDA    HTD_IN            ; get the value
   AND    #$0F              ; mask low 4 bits
   SED                      ; set decimal mode
   CLC                      ; clear carry for decimal adjust
   ADC    #$00              ; decimal adjust low nibble
   STA    HTD_OUT           ; save value

.. this saves cycles, as we don't have four trips through the add loop, and bytes as the table is shorter. Next re-arrange the table so that all the bytes of the same magnitude are together. This saves two DEX instructions and looks like this ..
Code:
TABLE2:
   .byte    $00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$01,$03   ; 990000s

TABLE1:
   .byte    $00,$00,$00,$01,$02,$05,$10,$20,$40,$81,$63,$27   ; 9900s

TABLE0:
   .byte    $15,$31,$63,$27,$55,$11,$23,$47,$95,$91,$83,$67   ; 99s

.. note that the values are all -1 as the next change is to remove the CLC from before the adds. As the carry is always set it is allowed for here.

The only other change is to hold the highest result byte in the Y register while evaluating, this saves a couple of cycles per loop.

The result is ..
Code:
HTD_IN:
   .ds    2               ; Low byte first, as is normal for 6502.
HTD_OUT:
   .ds    3               ; Low byte first, highest byte last.

; hex to decimal conversion table
; table entries are 2^n-1 as the carry is always set before the add

TABLE2:
   .byte    $00,$00,$00,$00,$00,$00,$00,$00,$00,$00,$01,$03   ; 990000s

TABLE1:
   .byte    $00,$00,$00,$01,$02,$05,$10,$20,$40,$81,$63,$27   ; 9900s

TABLE0:
   .byte    $15,$31,$63,$27,$55,$11,$23,$47,$95,$91,$83,$67   ; 99s

HTD:
   LDA    HTD_IN          ; get the low byte
   AND    #$0F            ; mask low 4 bits
   SED                    ; set decimal mode
   CLC                    ; clear carry for decimal adjust
   ADC    #$00            ; decimal adjust low nibble
   STA    HTD_OUT         ; set 99s

   LDY    #$00            ; clear 990000s
   STY    HTD_OUT+1       ; clear 9900s

   LDX    #$0B            ; set bit count/table index
                          ; (# of highest bit - 4)
HTD1:
   ASL    HTD_IN          ; shift number low byte
   ROL    HTD_IN+1        ; shift number high byte
   BCC    HTD2            ; branch if bit = zero

   LDA    HTD_OUT         ; get 99s
   ADC    TABLE0,X        ; add table 99s byte
   STA    HTD_OUT         ; save 99s
   LDA    HTD_OUT+1       ; get 9900s
   ADC    TABLE1,X        ; add table 9900s byte
   STA    HTD_OUT+1       ; save 9900s
   TYA                    ; copy 990000s
   ADC    TABLE2,X        ; add table 990000s byte
   TAY                    ; save 990000s
HTD2:
   DEX                    ; decrement bit count ..
   BPL    HTD1            ; loop if not all done yet

   CLD                    ; clear decimal mode
   STY    HTD_OUT+2       ; save 990000s
   RTS

That's all I can think of, it runs somewhere between 250 cycles for all bits = 0, and 650 cycles for all 1s.

Cheers,

Lee.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Jan 28, 2004 8:45 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8426
Location: Southern California
Quote:
I've been using something similar to your HTD routine for years but starting LSbit first, had I thought to do it MSbit first it would have been faster and shorter. What I have now is similar and as short and fast as I can easily make it. If anyone can think of a faster and/or shorter version please post it.

Sure. A man I worked with 15 years ago had been in some kind of user group for Hewlett-Packard calculators, before the HP-41 was out with its wider capabilities. He said they'd have contests to see who could program various functions with the fewest steps. (At that time, memory was very limited but you could usually add time.) He said just when they were sure it absolutely could not get any shorter and they were about to announce a winner, someone else would come up with something that was only half as long; but by that time it was hard to figure out why it worked at all!

I'm not sure we'll run into any of that here (and shorter usually means faster anyway), but source code certainly allows for loads of explanation with no penalty on runtime memory or speed. Regardless, it's good to be in the habit of always seeking efficiency and not slipping into the wastelands (literally) of Microsoft where throwing megabytes and GHz at a problem is the accepted solution for overcoming programming sloppiness—bugs notwithstanding.

Pardon the deviation from the subject. I hope it was interesting anyway. I'll always welcome better ways to do something, as long as trade-offs (if there are any) are given consideration.

Garth

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Tue Mar 02, 2004 9:31 am 
Offline
User avatar

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
I have a more compact version that does away with the data tables but it costs more cycles to execute.
Code:
            .ORG $10
       
BIN         .DW  12345  ; A test value to convert (LSB first)
BCD         .DS  3      ; Should end up as $45,$23,$01

            .ORG $0200

BINBCD16:   SED         ; Switch to decimal mode
            LDA #0      ; Ensure the result is clear
            STA BCD+0
            STA BCD+1
            STA BCD+2
            LDX #16     ; The number of source bits
       
CNVBIT:     ASL BIN+0   ; Shift out one bit
            ROL BIN+1
            LDA BCD+0   ; And add into result
            ADC BCD+0
            STA BCD+0
            LDA BCD+1   ; propagating any carry
            ADC BCD+1
            STA BCD+1
            LDA BCD+2   ; ... thru whole result
            ADC BCD+2
            STA BCD+2
            DEX         ; And repeat for next bit
            BNE CNVBIT
            CLD         ; Back to binary
           
            BRK         ; All Done.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Oct 20, 2008 1:26 pm 
Offline

Joined: Mon Oct 20, 2008 1:07 pm
Posts: 2
I'm surprised not to see the following extremely simple method suggested anywhere: "loop through" the number to be converted, using the 6502's decimal mode to increment the accumulator decimally (from 0) while decrementing the original value hexadecimally (X register). It might not be particularly efficient for large numbers, but the code is short and simple to understand.

Code:
HEX_TO_DEC:
   PHX     ; Push X register on stack (back-up)
   SED     ; Switch to decimal mode
   TAX     ; Transfer accumulator to X register
   LDA #00 ; Reset accumulator

?LOOP:
   DEX        ; Decrement X by 1
   CPX #00    ; If X < 0,
   BMI ?BREAK ; then break;
   CLC        ; else clear carry
   ADC #01    ; to increment accumulator by 1
   BRA ?LOOP  ; Branch always

?BREAK:
   PLX ; Pull X register from stack (restore)
   RTS ; Return from subroutine

The subroutine above performs a hex-to-decimal conversion on the number in the accumulator. It sets the D flag, but leaves the X register unchanged.

_________________
Languages shape the way we think, or don't.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Oct 20, 2008 6:58 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8426
Location: Southern California
Good for single-byte decimal numbers. You can shorten it further by getting rid of the CPX #00 that follows the DEX, since DEX does that automatically already anyway. You could make it slightly faster by doing the CLC before ?LOOP since nothing in the loop will set the flag until you pass 99 and end up with an error anyway. It might be more important to save the status register for the D flag than to save X, but that decision can be made on a project-by-project basis.

You could reconfigure it to speed up the loop this way:

Code:
        TAX
        DEX
        BMI     ?BREAK
        LDA     #0
        CLC
        PHP
          SED

?LOOP     ADC   #1
          DEX
          BPL   ?LOOP

        PLP
?BREAK  RTS

Did I forget anything? The shortness reminds me of a square-root algorithm I have here which is extremely simple and short and uses no multiplication or division but it's so slow that unless the input number is small, you could go have lunch while it's working on it. That's not a problem though in some controller situations for processes that cannot be hurried, like a sprinkler system, clothes dryer, etc. (although I can't think of any reason you'd need a square-root function in those particular ones).

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Oct 20, 2008 8:59 pm 
Offline

Joined: Mon Oct 20, 2008 1:07 pm
Posts: 2
"Elegance is necessarily unnatural, only achieveable at great expense. If you just do something, it won't be elegant, but if you do it and then see what might be more elegant, and do it again, you might, after an unknown number of iterations, get something that is very elegant."

Many thanks for the improvements, GARTHWILSON. Of course, for serious projects, one would probably not use this algorithm (if that's an appropriate name for it); it's useful, though, if one just needs a short piece of code which does the job in a readable manner.

A better subroutine should certainly save the status register. How would you go about accomplishing that?

_________________
Languages shape the way we think, or don't.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Oct 20, 2008 9:34 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8426
Location: Southern California
Quote:
A better subroutine should certainly save the status register. How would you go about accomplishing that?

That's the PHP and PLP ("P" for Processor status).

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Tue Oct 21, 2008 11:24 am 
Offline

Joined: Tue Nov 18, 2003 8:41 pm
Posts: 250
GARTHWILSON wrote:
The shortness reminds me of a square-root algorithm I have here which is extremely simple and short and uses no multiplication or division but it's so slow that unless the input number is small, you could go have lunch while it's working on it.


Assuming you're talking about the sum of differences (or what ever it's
called, the differences between succesive squares increase by 2 and are
just the odd numbers, sum the odd numbers till you get your square
and the number of odd numbers you summed is the square root), that's
how HP calculators do it (or did it) (sort of), they use a pseudo division
routine and use that method to find each digit.

Detailed in a famous series of articles by William E Egbert

Personal Calculator Algorithms 1: Square Roots

I think it's here

http://www.hpl.hp.com/hpjournal/pdfs/Is ... 977-05.pdf (5.8MB)


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Oct 22, 2008 4:38 am 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
WOW. I was never taught how to solve square roots by hand, but this successive approximation technique is just plain common sense. Nothing magical about it; and, of all the by-hand techniques I have seen in the past, this is the one which makes the most amount of intuitive sense, by far.

Thank you for the link, bogax. That really was educational.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Oct 22, 2008 5:58 am 
Offline

Joined: Tue Nov 18, 2003 8:41 pm
Posts: 250
kc5tja wrote:
WOW. I was never taught how to solve square roots by hand, but this successive approximation technique is just plain common sense. Nothing magical about it; and, of all the by-hand techniques I have seen in the past, this is the one which makes the most amount of intuitive sense, by far.

Thank you for the link, bogax. That really was educational.


A little googling will find dozens of explanations, damn few of them are
any where near as lucid as Egbert's article he makes it simple.

(note the typical bit by bit fast binary square root is just the same, except
in binary instead of decimal which makes it simpler, how many explanations
of it are as lucid as that article?)

He just does a really good job of explaining it.

You might want to look at the other articles (I guess if they didn't teach
you how to find square roots they probably didn't teach you how to find
eg logarithims :) ) (damn these personal caclulators ;) )

Personal Calculator Algorithms II: Trigonometric Functions

http://www.hpl.hp.com/hpjournal/pdfs/Is ... 977-06.pdf (4.6MB)

Personal Calculator Algorithms III: Inverse Trigonometric Functions

http://www.hpl.hp.com/hpjournal/pdfs/Is ... 977-11.pdf (8MB)

Personal Calculator Algorithms IV: Logarithmic Functions

http://www.hpl.hp.com/hpjournal/pdfs/Is ... 978-04.pdf (7.8MB)


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Oct 22, 2008 7:10 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8426
Location: Southern California
I haven't found the super-simple-but slow square-root algorithm yet, but the one I've been using in my 6502 Forth is:

r=(x+r²)/2r

where r starts out as your first guess at the square root, then you repeat the calculation until you converge on an answer. It's fairly efficient.

Edit: I finally found the other one scribbled on a paper in my "algorithms" file. Here's how it works:
Code:
r=1
REPEAT
   x=x-r
   r=r+1
   x=x-r
UNTIL x<0
r=r-1

The amount of time it takes will be roughly proportional to the end result number.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Oct 22, 2008 2:29 pm 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
Quote:
r=(x+r²)/2r

I thought this might be Newton's method, but research on the web suggests otherwise. However, upon further reflection, I'm not sure how this might be used, since as r increases, it will attempt to diverge. Solving for r algebraically results in some hairy (in my opinion) math.

For those wondering where this equation came from, it is this:
Code:
Let r = a = sqrt(x), and x = a².  Then,
  r=(x+r²)/2r

  2r² = x + r²

  2r² = a² + r²

  2r² = 2r²  (since r = a)


Quote:
Code:
r=1
REPEAT
   x=x-r
   r=r+1
   x=x-r
UNTIL x<0
r=r-1

This is simply the sum-of-odds technique discussed earlier. This is the worst possible performance (O(n)), but it's brutally simple.

The reason I like HP's algorithm better than any of these is that, since display precision is finite, you know a priori when to stop your calculation. E.g., there is no point in proceeding beyond 12 iterations if all you have is a 12-digit display field. And, it's much simpler to reason about than Newton's method.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Oct 22, 2008 9:02 pm 
Offline

Joined: Fri Aug 30, 2002 2:05 pm
Posts: 347
Location: UK
bogax wrote:
note the typical bit by bit fast binary square root is just the same, except in binary instead of decimal which makes it simpler

Which is more or less how EhBASIC does SQR()

Lee.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Sun Nov 30, 2008 2:52 am 
Offline

Joined: Mon Mar 08, 2004 9:47 pm
Posts: 7
Location: Park City, UT
I found this algorithm years ago and have it programmed into a 6502 circuit I built, and it works pretty well. I think it is called the Babylonia Algorithm. It uses simple math operations, and as long as the numbers are not really large, it converges to fairly accurate values with about 10 to 15 iterations. It also works for very large numbers, but the iteration count must be increased to reach the same level of accuracy.

Code:
  Algorithm Logic:

  SQRT n                                                                               
                                                                                       
  A = n                                                                               
  B = 1.0                                                                             
  I = Iterations                                                                     
                                                                                       
  while I != 0                                                                         
     B = ((A/B)+B)/2
     I = I - 1
  loop

  sqrt = B


Here are some results produced by my 6502 program using 15 interations. For comparison sake, I have also included the results from the simple Windows calculator.

Note: My 6502 program has only 14 digits after the decimal point, so the Windows results will be longer, but you can see how accurate this algorithm is through those 14 digits.

Code:
Input: 5000     6502 Result:  70.71067811865475    Windows Result: 70.7106781186547524400844362104

Input: 56.56    6502 Result:  7.52063827078526     Windows Result: 7.52063827078526626832321752445

Input: .0097    6502 Result:  .09848857801796      Windows Result: 0.09848857801796104721746211414

Input: 5000000  6502 Result:  2236.06797750062521  Windows Result: 2236.06797749978969640917366873



As you can see, for resonably sized numbers, the accuracy is quite good for 15 interations. Even in the last example where I input 5 million, the result is still accurate out to 7 decimal places and then deviates from the windows result. But this is easily fixed with a few additional iterations.

Since I am unlikely to ever use my 6502 circuit to crunch large numbers, 15 iterations works good enough for me, so I have never taken the time to figure out how to predict how many iterations you would need in particular number ranges, but that should be fairly easy to determine with a little experimentation.

Oops. After looking at this again, I think it really is the same thing you have above: B/B * ((A/B+B))/2 = ((A+B^2)/2B. Well, anyway, this gives you an idea of how it actually works when implimented.

Thanks for posting the HP method. I am going to take a closer look at it to see if that would be more efficient. If it works better, I will update my 6502 program to use that instead.


Last edited by bgaarsoe on Thu Dec 11, 2008 10:11 pm, edited 4 times in total.

Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Sun Nov 30, 2008 9:37 am 
Offline

Joined: Tue Nov 18, 2003 8:41 pm
Posts: 250
bgaarsoe wrote:
I found this algorithm years ago and have it programmed into a 6502 circuit I built, and it works pretty well. I think it is called the Babylonia Algorithm. It uses simple math operations, and as long as the numbers are not really large, it converges to fairly accurate values with about 10 to 15 iterations. It also works for very large numbers, but the iteration count must be increased to reach the same level of accuracy.

Code:
  Algorithm Logic:

  SQRT n                                                                               
                                                                                       
  A = n                                                                               
  B = 1.0                                                                             
  I = Iterations                                                                     
                                                                                       
  while I != 0                                                                         
     B = ((A/B)+B)/2
     I = I - 1
  loop

  sqrt = B



.
.

Quote:
Thanks for posting the HP method. I am going to take a closer look at it to see if that would be more efficient. If it works better, I will update my 6502 program to use that instead.


Since it is basically a Newtonian aproximation, you'll do a lot
better with a better initial guess than 1

Most obvious, I guess, would be to halve the number of bits
and use that as an initial guess (since squaring a number
approximately doubles the number of bits)

You could also apply one of the other methods (I'd try Garth's)
to refine that a bit further before you started doing multibyte
divisions.

eg squrt 5000 took me about 10 iterations to get 12 digits with
an initial approximation of 1 but only 4 iterations with an initial
approximation of 64 (64 being the square root of 4096, the
largest power of two that's less than 5000)


And unless you've got hardware division or you're just trying to
reuse a division routine, Lee's way is probably best.

Here's an interesting derivation

http://codebase64.org/doku.php?id=base:fast_sqrt


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 16 posts ]  Go to page 1, 2  Next

All times are UTC


Who is online

Users browsing this forum: Hermes and 9 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: