Happy Pi Day

Martin_H · Post by **Martin_H** » Wed Mar 11, 2026 3:02 pm

To celebrate I ported my 16-bit Forth program to 65816-assembly but upgraded it to calculate Pi to 32 bits of precision with math routines are from the 0f816 code base. Like the Forth program, it uses the Nilakantha infinite series. Described in C syntax n starts at 2 and iterates to the desired precision.
pi = 3 + 4/((n)*(++n)*(++n)) - 4/((n)*(++n)*(++n)) + ...

Tl;Dr After 366 iterations my program generated that Pi equals the ratio 1,686,629,694 / 536,870,912 which equals 3.14159262. This is smaller than Pi by ~0.0000000338. More information and raw output below:

Code: Select all

READY
>S  .0001.0002.0003.0004.0005.0006.0007.0008.0009.0010.0011.0012.0013.0014.0015.0016.0017.0018.0019.0020.0021.0022.0023.0024.0025.0026.0027.0028.0029.0030.0031.0032.0033.0034.0035
READY
>j
Enter Address  BB:AAAA 00:0300
Pi = 6487ED3E / 20000000
N = 0000016E

Source code on Git Hub: https://github.com/Martin-H1/65816/blob ... les/pi.asm

Here's the cut and pasted code for people who don't want to follow the link.

Code: Select all

; Calculate pi using the Nilakantha infinite series. While more complicated
; than the Leibniz formula, it is fairly easy to understand, and converges
; on pi much more quickly.

; The formula takes three and alternately adds and subtracts fractions with
; a numerator of 4 and denominator that is the product of three consecutive
; integers. So each subsequent fraction begins its set of integers with the
; highest value used in the previous fraction.

; Described in C syntax n starts at 2 and iterates to the desired precision.
; pi = 3 + 4/((n)*(++n)*(++n)) - 4/((n)*(++n)*(++n)) + ...

; Here's a three iteration example with an error slightly more than 0.0007.
; pi = 3 + 4/(2*3*4) - 4/(4*5*6)
;        + 4/(6*7*8) - 4/(8*9*10)
;        + 4/(10*11*12) - 4/(12*13*14)
;    = 3.14088134088

.include "ascii.inc"
.include "common.inc"
.include "math32.inc"
.include "print.inc"

; Normally this requires floating point arithmetic, but we're using fixed point
; unsigned arithmetic with 3 integer bits, and 29 fractional bits. 

;
; Aliases
;
THREE	= %0110000000000000
FOUR	= %1000000000000000
RESCALE	= %0010000000000000
CELL = 4
DS_SIZE = CELL * 10

.proc pi
	ON16MEM
	ON16X
	printcr			; start output on a newline

	phd			; save direct page register

	tsc			; get current stack pointer.
	sec
	sbc #DS_SIZE		; reserve data stack workspace
	tcs
	tcd			; direct page now points to stack
	ldx #DS_SIZE		; point X to data stack top

	lda #THREE		; initialize the sum on the data stack
	ldy #0000
	jsr _pushay

@while:	jsr calc_term
	lda #00
	ldy #00
	jsr _pushay
	jsr _stest32
	beq @done
	jsr _popay		; pop unused zero
	jsr _add32		; add term to the sum
	bra @while

@done:	jsr _popay		; remove the two unneeded zeros
	jsr _popay

	print msg1
	jsr _popay		; pop the results

	printc
	tya
	printc

	print msg2
	lda #RESCALE
	printc
	lda #0000
	printc
	printcr

	print msg3
	lda N+2
	printc
	lda N
	printc
	printcr

@cleanup:
	tsc			; clean up stack locals
	clc
	adc #DS_SIZE
	tcs
	pld			; restore direct page pointer
	rtl

.endproc
msg1:	.asciiz "Pi = 0x"
msg2:	.asciiz " / 0x"
msg3:	.asciiz "N = 0x"
N:	.word 2, 0

; calc_term: calculates Qn - Qn+1
; Inputs:
;   X - data stack index
; Outputs:
;   C - clobbered
;   X - data stack index updated with Qn - Qn+1 on data stack
;   Y - clobbered
.proc calc_term
	jsr quotient
	jsr quotient
	jsr _sub32
	rts
.endproc

; quotient: calculates a single scaled quotient term
; Inputs:
;   X - data stack index
; Outputs:
;   C - clobbered
;   X - data stack index updated with quotient on data stack
;   Y - clobbered
.proc quotient
	ldy #00			; numerator of fixed point four
	lda #FOUR
	jsr _pushay
	ldy #00			; scratch space for routine.
	lda #00			; ditto
	jsr _pushay
	jsr denominator		; calculate N*++N*++N
	jsr _udivmod32
	jsr _swap
	jsr _popay		; discard remainder
	rts
.endproc

; denominator: calculates (n)*(++n)*(++n)
; Inputs:
;   memory N
;   X - data stack index
; Outputs:
;   C - clobbered
;   X - data stack index updated with product on data stack
;   Y - clobbered
.proc denominator
	ldy N
	lda N+2
	jsr _pushay		; push N
	inclong N
	ldy N
	lda N+2
	jsr _pushay		; push ++N
	jsr _umult32		; ++n * n
	jsr _popay		; discard unused 32 bits of 64 bit result
	inclong N
	ldy N
	lda N+2			; push ++N
	jsr _pushay
	jsr _umult32		; ++n * first product
	jsr _popay		; discard unused 32 bits of 64 bit result
	rts
.endproc

J64C · Post by **J64C** » Fri Mar 13, 2026 10:54 am

13/3 is Pi day is it?

Martin_H · Post by **Martin_H** » Fri Mar 13, 2026 1:36 pm

J64C wrote:

13/3 is Pi day is it?

In Europe for smaller values of Pi. But a better Pi Approximation Day is on July 22 (22/7 in the day/month format).

Also, tomorrow I'm going to post my bonus Pi Day project which will be hard to top next year.

In the meantime, I'm blogging about Vibe coding a Forth interpreter with Claude in the Forth sub-forum:
viewtopic.php?f=9&t=8577

BigDumbDinosaur · Post by **BigDumbDinosaur** » Fri Mar 13, 2026 4:25 pm

I actually prefer cake instead of pi(e) for dessert.

barnacle · Post by **barnacle** » Fri Mar 13, 2026 6:42 pm

Surely, the 31st of April... for countries with a sensible date system

(I can't help it if the calendar got messed around with by the Romans!)

Neil

Martin_H · Post by **Martin_H** » Sat Mar 14, 2026 2:54 am

I updated the title because it's already Pi across the pond and will be in New England shortly. So, here's my bonus project.

I had an epiphany after porting my Forth Pi program to 65816 assembly. I originally wrote it for a 16-bit Forth system and assumed that in the fixed-point calculation size. But Forth is portable and allows programs to determine the host's cell size. That means I could make slight changes and if I ran it on a 32-bit system, it would automatically increase the fractional component's size. But running it on a 64-bit system it would allow crazy high precision! Note the "cell 8 * 3 -" below which is the heart of the change:

Code: Select all

\ normally this requires floating point arithmetic, but we're using fixed point
\ multiplying by a scaling factor and returned as a ratio. We'll use 3 integer
\ bits and the remainder of the cell for fractional bits.

1      	       		 \ Bit to shift to create rescale factor
cell 8 * 3 -   		 \ Calculate the number of fractional bits
lshift constant rescale	 \ Use rescale to create fixed point constants.
rescale 3 * constant three
rescale 4 * constant four

Running it on Gforth on my laptop produces this output:

Code: Select all

print_pi
Pi = 7244019458077115758 / 2305843009213693952
N = 123145294410368
 ok

That ratio evaluates to 3.1415926535897901661006925699837 which is 3.072e-15 smaller than Pi! Also, the iteration count N is insane. At this point the only way I can improve my accuracy is to find a 128 bit Forth system or create an arbitrary precision arithmetic library and go to 128 bits.

Full modified code is available on GitHub: https://github.com/Martin-H1/Forth-CS-1 ... ster/pi.fs

drogon · Post by **drogon** » Sat Mar 14, 2026 9:50 am

Maybe a Pi challenge....

Although I suspect the good folks over at stardot will beat just about anyting.... This is my '816 system generating the first 1000 digits. The program is wrtten in BCPL and takes almost 500 seconds to run on my 16Mhz system:

Code: Select all

Time taken: 496.107... pi = 3.+
1415926535 8979323846 2643383279 5028841971 6939937510
5820974944 5923078164 0628620899 8628034825 3421170679
8214808651 3282306647 0938446095 5058223172 5359408128
4811174502 8410270193 8521105559 6446229489 5493038196
4428810975 6659334461 2847564823 3786783165 2712019091
4564856692 3460348610 4543266482 1339360726 0249141273
7245870066 0631558817 4881520920 9628292540 9171536436
7892590360 0113305305 4882046652 1384146951 9415116094
3305727036 5759591953 0921861173 8193261179 3105118548
0744623799 6274956735 1885752724 8912279381 8301194912
9833673362 4406566430 8602139494 6395224737 1907021798
6094370277 0539217176 2931767523 8467481846 7669405132
0005681271 4526356082 7785771342 7577896091 7363717872
1468440901 2249534301 4654958537 1050792279 6892589235
4201995611 2129021960 8640344181 5981362977 4771309960
5187072113 4999999837 2978049951 0597317328 1609631859
5024459455 3469083026 4252230825 3344685035 2619311881
7101000313 7838752886 5875332083 8142061717 7669147303
5982534904 2875546873 1159562863 8823537875 9375195778
1857780532 1712268066 1300192787 6611195909 2164201989

(The same code takes just 3 seconds on my 250Mhz RISC-V system - same binary code. Such is progress.)

-Gordon

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sat Mar 14, 2026 11:24 am

Martin_H wrote:

I updated the title because it's already Pi across the pond and will be in New England shortly.

Here in the cow pastures southwest of Chicago, there is PI in the sky.

Speaking of PI, it’s been a while since I’ve done any baking. Perhaps I should fire up the ole oven...

By the way, the reason Sir Cumference had such an enormous waistline was because of his fondness for PI.

Uh...I’ll get my coat and quietly exit.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sat Mar 14, 2026 11:27 am

barnacle wrote:

Surely, the 31st of April... for countries with a sensible date system

Go home, Neil. Yer drunk!

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sat Mar 14, 2026 11:31 am

drogon wrote:

Maybe a Pi challenge....

Although I suspect the good folks over at stardot will beat just about anyting.... This is my '816 system generating the first 1000 digits. The program is wrtten in BCPL and takes almost 500 seconds to run on my 16Mhz system...

Makes me wonder how much faster that would go if it were written in assembly language. I suspect overhead in interpreting the BCPL code has to account for more than a few of those 500 seconds.

drogon · Post by **drogon** » Sat Mar 14, 2026 11:41 am

BigDumbDinosaur wrote:

drogon wrote:

Maybe a Pi challenge....

Although I suspect the good folks over at stardot will beat just about anyting.... This is my '816 system generating the first 1000 digits. The program is wrtten in BCPL and takes almost 500 seconds to run on my 16Mhz system...

Makes me wonder how much faster that would go if it were written in assembly language. I suspect overhead in interpreting the BCPL code has to account for more than a few of those 500 seconds.

An awful lot. (of overhead). But the program was written quickly. How long to write it in assembler?

Please feel free to re-write it and let us know how much faster it is: Here is the source code:

https://unicorn.drogon.net/pi.b.txt

It's nothing more than 4 * arctan (1) using arbitrary precision arithmetic.

-Gordon

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sat Mar 14, 2026 2:04 pm

drogon wrote:

BigDumbDinosaur wrote:

drogon wrote:

Maybe a Pi challenge....

Although I suspect the good folks over at stardot will beat just about anyting.... This is my '816 system generating the first 1000 digits. The program is wrtten in BCPL and takes almost 500 seconds to run on my 16Mhz system...

Makes me wonder how much faster that would go if it were written in assembly language. I suspect overhead in interpreting the BCPL code has to account for more than a few of those 500 seconds.

An awful lot. (of overhead). But the program was written quickly. How long to write it in assembler?

Dunno how long it would take to write in assembly language. I guess that would depend on what sort of math facilities one has at one’s disposal. I have a 64-bit integer math library, but that isn’t going to be of much help here—also, it doesn’t do transcendental stuff. At some point, I will look at leveraging the library to do scaled-integer math, but that is a ways off.

Quote:

Please feel free to re-write it and let us know how much faster it is: Here is the source code:

Thanks! I’ll get on it as soon as I have finished the scaled-integer math library and written the transcendental stuff to go with it.

Martin_H · Post by **Martin_H** » Sat Mar 14, 2026 2:46 pm

BigDumbDinosaur wrote:

Uh...I’ll get my coat and quietly exit.

With puns like that I was hoping you'd be here all week.

@Gordon, thanks for the details. I was going to ask how you achieved that many digits, as I was suspecting the Spigot algorithm. I'd considered using that to avoid arbitrary precision arithmetic. But the hexadecimal to decimal conversion seems like a harder problem. BTW Google offered to translate your BCPL program from Danish to English.

drogon · Post by **drogon** » Sat Mar 14, 2026 3:46 pm

BigDumbDinosaur wrote:

drogon wrote:

BigDumbDinosaur wrote:

drogon wrote:

Maybe a Pi challenge....

Although I suspect the good folks over at stardot will beat just about anyting.... This is my '816 system generating the first 1000 digits. The program is wrtten in BCPL and takes almost 500 seconds to run on my 16Mhz system...

Makes me wonder how much faster that would go if it were written in assembly language. I suspect overhead in interpreting the BCPL code has to account for more than a few of those 500 seconds.

An awful lot. (of overhead). But the program was written quickly. How long to write it in assembler?

Dunno how long it would take to write in assembly language. I guess that would depend on what sort of math facilities one has at one’s disposal. I have a 64-bit integer math library, but that isn’t going to be of much help here—also, it doesn’t do transcendental stuff. At some point, I will look at leveraging the library to do scaled-integer math, but that is a ways off.

Quote:

Please feel free to re-write it and let us know how much faster it is: Here is the source code:

Thanks! I’ll get on it as soon as I have finished the scaled-integer math library and written the transcendental stuff to go with it.

You don't need scaled integers - the code does it all for you. All you need is simple manipulation of 32-bit quantities. Plus, minus, multiply, divide and modulo. There is no floating point, no trig. involved. The code evaluates arctan using integers.

Using Störmer’s formula:

Code: Select all

π = 24 arctan(1/8) + 8 arctan(1/57) + 4 arctan(1/239)

The code uses 3 arrays to maintain the calculations. Each array element entry packs 8 decimal digits per 32-bit word, so using 3 banks of 64K per array you can calculate up to half a million digits... given sufficient time...

All good stuff, and of-course there are many other ways to calculate π ...

-Gordon

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sat Mar 14, 2026 5:33 pm

Martin_H wrote:

BTW Google offered to translate your BCPL program from Danish to English.

Danish to English, eh? So much for artificial “intelligence.” Makes you wonder what Google would do if fed some 6502 assembly language or worse yet, a large program written in Forth.

Happy Pi Day

Happy Pi Day

Re: Happy Pi Day Three Days Early

Re: Happy Pi Day Three Days Early

Re: Happy Pi Day Three Days Early

Re: Happy Pi Day Three Days Early

Re: Happy Pi Day

Re: Happy Pi Day

Re: Happy Pi Day

Re: Happy Pi Day Three Days Early

Re: Happy Pi Day

Re: Happy Pi Day

Re: Happy Pi Day

Re: Happy Pi Day

Re: Happy Pi Day

Re: Happy Pi Day