barrym95838 wrote:
malcontent,
Have you seen Michael J. Mahon's fast square root for Applesoft? It claims to be about twice as fast as the Newton-Raphson version mentioned in your blog, and about nine times as fast as the MS exp(log(x)/2) method. It doesn't look prohibitively large either.
http://michaeljmahon.com/USR.SQR.pdfMike B.
[Edit: He made his routine relocatable, but employed a bit of negative stack addressing to do it, which is not generally safe, except on something that doesn't use interrupts, like an old Apple 1 or 2.]
I've got an integer square root routine for my Forth for the C64 I wrote some time ago that some might find useful. It also uses shifts and subtraction. I wasn't sure if I should start a new topic in Forth, or here with this discussion of square root algorithms. BTW, it does not round the result. Here's the code:
Code:
HEX
CODE SQRT ( UD -- U )
TYA, 5 # LDY,
BEGIN, N ,Y STA, DEY,
0< UNTIL,
10 # LDY,
BEGIN,
2 ,X ASL, 3 ,X ROL,
0 ,X ROL, 1 ,X ROL,
N ROL, N 1+ ROL,
N 2+ ROL,
2 ,X ASL, 3 ,X ROL,
0 ,X ROL, 1 ,X ROL,
N ROL, N 1+ ROL,
N 2+ ROL,
N 3 + ASL, N 4 + ROL,
N 5 + ROL, SEC,
N 3 + ROL, N 4 + ROL,
N 5 + ROL, SEC,
N LDA, N 3 + SBC,
N 1+ LDA, N 4 + SBC,
N 2+ LDA, N 5 + SBC,
CS IF,
N LDA, N 3 + SBC, N STA,
N 1+ LDA, N 4 + SBC, N 1+ STA,
N 2+ LDA, N 5 + SBC, N 2+ STA,
N 3 + LDA, 2 # ORA, N 3 + STA,
THEN,
N 5 + LSR,
N 4 + ROR, N 3 + ROR,
DEY,
0= UNTIL,
N 3 + LDA, 2 ,X STA,
N 4 + LDA, 3 ,X STA,
POP JMP,
END-CODE
N is a scratch pad region in zero page from [N-1] to [N+7].
The code takes a 32 bit number ( a double number on this 16 bit Forth ) and returns a 16 bit result.