Hi!
BigEd wrote:
I thought this might be an interesting programming challenge, as it needs multi-byte precision, a little more precision I think than the usual floating point formats seen on 6502.
Quote:
In 1977, Isaac Asimov asked how many terms of the slowly converging series
π = 4 – 4/3 + 4/5 – 4/7 + 4/9 – …
would you have to sum before doing better than the approximation
The answer is known, as of a couple of years later, and you could easily look it up. In fact it's
here, where I found the question. But how much more interesting to embark on a calculation!
The quoted article goes on to say:
Quote:
If he had access to an Apple II at the time, it would have run at 1 MHz. My calculation took around a minute to run on a 2 GHz laptop, so I’m guessing the same calculation would have taken a day or two on an Apple II. This assumes he could find extended precision software on an Apple II, which is doubtful.
So, a day or two of calculation! A good test, perhaps, of an arithmetic library. Although not the smartest way to answer Asimov's question.
I know about Karatsuba's speedup for multi-digit multiplication, and I know about some table-driven approaches for fast short multiplication on 6502. And over
here I learn about Goldschmidt's algorithms which can be pressed into service for division. All the pieces are in place! (I'm tempted to think that fixed-point would be enough here: although the terms necessarily become ever smaller, the sum is going to be pi-sized.)
A much quicker question to answer, I imagine, would be how many terms are needed to improve on 22/7.
This is a good challenge.
First, there is an obvious way to calculate the required number of terms: as the series converges alternating signs, the error is always less than half the next term in the series. This means that the sum should be better when 1/2 * 4/(2*N-1) < 315/113 - PI, so 2*N-1 > 2/(315/113 -PI) = 7 497 258.185..., so N > 3 748 629.59... and the smallest N = 3 748 630.
Now, the slow answer, just calculate the sum until the result is closer to PI than 315/113. This code runs in an Atari 8-bit computer (so, available in 1979), it is a quick and dirty job using base 10 arithmetic to show the sum in real-time (as the numbers are stored in the screen).
Using 20 digit numbers, it finishes in 5 hours with 12 minutes, 28722673559 CPU cycles. The numbers printed are the current sum/4 , the last added/subtracted term, the value of "2*N-1" and the time in frames, NTSC Ataris have 59.92271 frames per second.
Attachment:
Screenshot from 2019-09-07 22-58-22.png [ 5.22 KiB | Viewed 1774 times ]
And the code:
Code:
org $2000
.zpvar num1, num2,ptr .word
.zpvar count,tmp1,tmp2 :4 .byte
digits = 20
; Shorten screen - makes faster updates
mwa 560 ptr
lda 20
w1 cmp 20
beq w1
ldy #10
lda 561
sta (ptr),y
dey
lda 560
sta (ptr),y
dey
lda #65
sta (ptr),y
; Setup pointers
mwa 88 num1
adw 88 #40 num2
lda #1
sta count
lda #0
sta count+1
sta count+2
sta count+3
sta 18
sta 19
sta 20
jsr set1_n1
do_calc
; Increment count
jsr inc2
jsr cinv_n2
jsr num_sub
; wait
jsr compare_low
bcc ready
jsr inc2
jsr cinv_n2
jsr num_add
; wait
jsr compare_hi
bcc ready
; Show N after a few iterations
lda count
and #$3E
bne do_calc
jsr show_count
; wait
jmp do_calc
ready
jsr show_count
; Loop forever
jmp *
; Low threshold number: 1/4 * (2*PI - 355/113)
low_thr:
.byte $10,$17,$18,$15,$13,$19,$18,$10,$19,$16 ; 0.785398096...
.byte $17,$10,$16,$14,$10,$11,$10,$14,$14,$10
.byte $11,$10,$10,$18,$12,$17,$15,$13,$15,$18
.byte $16,$16,$15,$14,$10,$19,$16,$19,$16,$15
; High threshold number: 1/4 * (PI - 355//113)
hi_thr:
.byte $10,$17,$18,$15,$13,$19,$18,$12,$13,$10 ; 0.785398230...
.byte $10,$18,$18,$14,$19,$15,$15,$17,$15,$12
.byte $12,$11,$12,$13,$18,$19,$13,$18,$10,$15
.byte $13,$10,$19,$17,$13,$14,$15,$11,$13,$12
; Check if num1 is > low
.proc compare_low
ldy #0
nxt
lda low_thr, y
cmp (num1), y
bcc ret
bne ret
iny
cpy #digits
bne nxt
ret rts
.endp
; Check if num1 is < high
.proc compare_hi
ldy #0
nxt
lda (num1), y
cmp hi_thr, y
bcc ret
bne ret
iny
cpy #digits
bne nxt
ret rts
.endp
.proc show_count
mva count tmp1
mva count+1 tmp1+1
mva count+2 tmp1+2
mva count+3 tmp1+3
lda #80
sta ptr
jsr show_number
again
ldx 20
stx tmp1
mva 19 tmp1+1
mva 18 tmp1+2
cpx 20
bne again
mva #0 tmp1+3
lda #100
sta ptr
jmp show_number
.endp
; Prints number in "tmp1"
.proc show_number
ldx #0
stx tmp2
stx tmp2+1
stx tmp2+2
stx tmp2+3
ldy #32
sed
tobcd
; Multiplies tmp2 by 2, in BCD
clc
lda tmp2
adc tmp2
sta tmp2
lda tmp2+1
adc tmp2+1
sta tmp2+1
lda tmp2+2
adc tmp2+2
sta tmp2+2
lda tmp2+3
adc tmp2+3
sta tmp2+3
; Extract one bit
asl tmp1
rol tmp1+1
rol tmp1+2
rol tmp1+3
; Adds 1 to tmp2, in BCD
txa
adc tmp2
sta tmp2
txa
adc tmp2+1
sta tmp2+1
txa
adc tmp2+2
sta tmp2+2
txa
adc tmp2+3
sta tmp2+3
dey
bne tobcd
cld
; Prints BCD
ldy ptr
lda tmp2+3
jsr print2
lda tmp2+2
jsr print2
lda tmp2+1
jsr print2
lda tmp2
print2
tax
lsr
lsr
lsr
lsr
ora #$10
sta (88),y
iny
txa
and #$0F
ora #$10
sta (88),y
iny
rts
.endp
.proc inc2
inc count
bne nxt
inc count+1
bne nxt
inc count+2
bne nxt
inc count+3
nxt
inc count
bne ret
inc count+1
bne ret
inc count+2
bne ret
inc count+3
ret
rts
.endp
; Adds num1 = num1 + num2
.proc num_add
ldy #digits-1
clc
sed
loop:
lda (num1), y
eor #$90
adc (num2), y
and #$0F
ora #$10
sta (num1), y
dey
bpl loop
cld
rts
.endp
; Subtract num1 = num1 - num2
.proc num_sub
ldy #digits-1
sec
sed
loop:
lda (num1), y
sbc (num2), y
and #$0F
ora #$10
sta (num1), y
dey
bpl loop
cld
rts
.endp
; Calculate num2 = 1/N
.proc cinv_n2
; Init REM to 0
lda #1
sta tmp1
lda #0
sta tmp1+1
sta tmp1+2
sta tmp1+3
; Start loop
ldy #0
lda #16
sta (num2),y
iny
loop_dig
; Multiply remainder by 10 and add new digit = 0
jsr mul10
; Divide by N
ldx #0
loop_div
inx
lda tmp1
sec
sbc count
sta tmp1
lda tmp1+1
sbc count+1
sta tmp1+1
lda tmp1+2
sbc count+2
sta tmp1+2
lda tmp1+3
sbc count+3
sta tmp1+3
bpl loop_div
; Restore last value
lda tmp1
clc
adc count
sta tmp1
lda tmp1+1
adc count+1
sta tmp1+1
lda tmp1+2
adc count+2
sta tmp1+2
lda tmp1+3
adc count+3
sta tmp1+3
dex
txa
ora #16
sta (num2),y
iny
cpy #digits
bne loop_dig
rts
.endp
.proc mul10
mva tmp1+1 tmp2+1
mva tmp1+2 tmp2+2
mva tmp1+3 tmp2+3
lda tmp1
sta tmp2
asl ; *2
rol tmp1+1
rol tmp1+2
rol tmp1+3
asl ; *4
rol tmp1+1
rol tmp1+2
rol tmp1+3
adc tmp2 ; *5
sta tmp1
lda tmp1+1
adc tmp2+1
sta tmp1+1
lda tmp1+2
adc tmp2+2
sta tmp1+2
lda tmp1+3
adc tmp2+3
asl tmp1 ; *10
rol tmp1+1
rol tmp1+2
rol
sta tmp1+3
rts
.endp
.proc set1_n1
; Set NUM1 = 1
ldy #digits-1
lda #16
set0
sta (num1),y
dey
bne set0
lda #17
sta (num1),y
rts
.endp
Have Fun!