6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sun Sep 22, 2024 8:48 am

All times are UTC




Post new topic Reply to topic  [ 16 posts ]  Go to page 1, 2  Next
Author Message
PostPosted: Mon Aug 05, 2024 2:33 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Lots of macro assembly and self-modifying code in this project, also lots of optimisations, large and small. It's been a blast.

For some months now I've been thinking about, then eventually collaborating on, a pi program. Dave/hoglet is my team-mate on this, and has (as ever) done most of the actual coding. It's been working, and delivering great results, for a while now, although we can't yet say it's finished.

FAQ:
Do we need another pi program? Yes, of course.
Is this one interesting or different? Yes, indeed.
Is there a silent video with minimal information? Yes, look here.
Is this open source? Of course, repo here.
What does it look like? Like this:
Attachment:
File comment: screenshot, Mode 7 on a Beeb, 800 digits of pi, in just under 54 seconds
Bellard-Beeb-800-digits-54-seconds.png
Bellard-Beeb-800-digits-54-seconds.png [ 41.99 KiB | Viewed 798 times ]


Some full writeups are being posted in this thread over on Stardot:
A tale of two spigots - more digits of pi, and faster
You'll find tabulations, code snippets, a chronology, links, images and even videos over there.

A few highlights, then, in case you haven't already clicked on that link:
- it's a pi spigot, which means it starts producing digits at once - you don't have to wait for the calculation to finish before seeing anything
- written in 6502 assembly, using BeebAsm, with lots of macros for loop unrolling and elaborating variations of code
- Edit to add: we also used Owlet, for early Basic experiments, Beebjit, for accelerated accurate emulation, b2 similarly, and hoglet's 6502Decoder for debugging and profiling. For later and longer runs, we used PiTubeDirect as a very fast 6502 second processor
- almost all the time is in long division, of a bignum by a modest multi-byte divisor, and accumulation of the quotient into a sum
- we have three or four division routines, for different lengths of divisor, and each is patched in place (self-modifying code) for the specific divisor
- the accumulation of terms by addition or alternatively by subtraction is handled by patching in place
- we're using a spigotised version of the 4-term BBP formula (for slightly greater capacity) or the 7-term Bellard formula (for much better speed.)
- the code is written for the BBC Micro, with or without 6502 second processor, is around 8k in size, but need do nothing more OS-specific than print a character (and read the system time.)
- all available RAM can be used for the pi calculation - we can get a little better than 2 digits of pi for each free byte of RAM
- we have a visualisation mode which places the pi calculation workspace in screen memory so you can see what's going on. (Example video here)
- on a BBC Micro we can readily fit a 50000 digit calculation, with a second processor we can push it to 135000. That's using nearly 55k of RAM for the bignums
- on a 2MHz 6502 we can compute 100 digits in under a second, 1000 digits in under a minute and a half, 3000 digits in under a quarter of an hour (like almost all pi calculations, it's quadratic in the number of digits)

Previous threads relating to pi calculations:
the fast realization of π-spigot algorithm
Calculating Pi: HP-41C versus Apple ][
A program for today
Apple Pi: BASIC vs. assembly
computing pi with square roots / RPN calculator build

It's worth noting that a spigot algorithm, which generates successive digits in a very user-friendly way, is not the fastest way to compute pi, on a 6502 or elsewhere. But among spigots, ours seems to be very fast. Also, I believe our program uses less RAM than previous spigots, and less RAM than other means of calculating pi. So our program can compute more digits on any given machine than previous methods - presuming we're only using RAM, and not storage.

Edit: here's a glimpse of some of the code - it's patched in place, so you see immediates which are not in fact constants. This code runs very many times, and versions of this code account for almost all the runtime. This is the listing as output by the assembler, so you don't see the comments or the symbolic names for things:
Code:
.bit_loop_next
     269F   A2 00      LDX #&00
     26A1   E4 67      CPX &67
     26A3   90 13      BCC &26B8
     26A5   D0 27      BNE &26CE
.skip_br
     26A7   A2 00      LDX #&00
     26A9   E4 66      CPX &66
     26AB   90 0B      BCC &26B8
     26AD   D0 1F      BNE &26CE
.skip_br
     26AF   A2 00      LDX #&00
     26B1   E4 65      CPX &65
     26B3   90 03      BCC &26B8
     26B5   D0 17      BNE &26CE
.skip_br
     26B7   18         CLC
.do_subtract
     26B8   AA         TAX
     26B9   A5 65      LDA &65
     26BB   E9 00      SBC #&00
     26BD   85 65      STA &65
     26BF   A5 66      LDA &66
     26C1   E9 00      SBC #&00
     26C3   85 66      STA &66
     26C5   A5 67      LDA &67
     26C7   E9 00      SBC #&00
     26C9   85 67      STA &67
     26CB   8A         TXA
     26CC   09 20      ORA #&20


Edit: here's some corresponding source, from the repo
Code:
.modify

;       Unroll the bit loop
;       FOR J%=0 TO 7
FOR j,0,7

;       IF T%>=D%
FOR i,bytes-1,0,-1
        LDX     #&00         ; operand is modified dynamically
        CPX     temp+i       ; X is used so as not to corrupt A
; Shortcut the first BCC/BNE (optimization suggested by profiling)
IF i=bytes-1 AND j>=COMP_OPT_THRESHOLD
        BEQ     skip_br
ENDIF
        BCC     do_subtract
        BNE     bit_loop_next
.skip_br
NEXT
        CLC
;       T%=T%-D%:B%=B%+1
.do_subtract
        TAX                  ; save A
FOR i,0,bytes-1
        LDA     temp+i
        SBC     #&00         ; operand is modified dynamically
        STA     temp+i
NEXT
        TXA                  ; restore A
        ORA     #(&80 >> j)  ; dynmically modified (e.g. ORA #&40 for add, AND #&BF for subtract)

;       NEXT J%
.bit_loop_next
NEXT


Top
 Profile  
Reply with quote  
PostPosted: Tue Aug 06, 2024 5:05 am 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 844
Location: Potsdam, DE
(As an aside, did I not see that someone has discovered a new way of generating Pi recently? I think one of the mathematical guys on Youtube covered it.)

Neil


Top
 Profile  
Reply with quote  
PostPosted: Tue Aug 06, 2024 6:22 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
I think my aim here was two-fold - first, to share our story a little more widely, as I think it's a really interesting 6502 adventure, and showcases some really nice tools, and second, perhaps to initiate some discussion on two fronts:

- about the use of macros, as a way of making quite a large executable from quite a compact and maintainable source
- about live-patching of code (aka self-modifying code) as a way of seriously improving the performance and also improving the density of the executable

Of course the bignum arithmetic is another good topic for this forum. It so happens that we only needed long division and accumulation, and it was advantageous to fuse those into one, unrolled, replicated, routine.

However, I appreciate that it's quite an arcane topic, and a seriously involved implementation, which took quite some time and ingenuity to put together - so it's not an easy thing to start talking about.

(To answer your aside, I suspect you're thinking of this. As a counter-offering, the video which started this adventure of ours is this one, mentioned by Dave in this thread.)


Top
 Profile  
Reply with quote  
PostPosted: Thu Aug 08, 2024 1:49 pm 
Offline

Joined: Sat Dec 12, 2015 7:48 pm
Posts: 143
Location: Lake Tahoe
Very interesting post, Ed. Lot's of good information here - looking forward to digging deeper into this.


Top
 Profile  
Reply with quote  
PostPosted: Fri Aug 09, 2024 10:54 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Thanks! I'm sure we'll post at least one more update over on stardot, but it feels (presently, to me) like we've nearly reached an end point on this particular project.

Edit: there are many other ways to compute digits of pi, and I wouldn't be at all surprised to be investigating one or more of them in future.

Edit to add, from downthread:
Quote:
For those interested in this kind of bignum adventure on 6502, another related thread and program is Bruce's
500+ digits of e
which references Wozniak's 1981 article and project The Impossible Dream: Computing e to 116,000 Places with a Personal Computer.

(I find it interesting to note that e is a much more amenable number than pi, even though both are transcendental, and I find it marvellous that we've managed to compute even more digits, despite this.)


Last edited by BigEd on Sat Aug 10, 2024 5:32 am, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Fri Aug 09, 2024 5:45 pm 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8390
Location: Midwestern USA
BigEd wrote:
I'm sure we'll post at least one more update over on stardot, but it feels (presently, to me) like we've nearly reached an end point on the project.

When it comes to computing π, there will never be an end point.  :shock:

Quote:
Edit: there are many other ways to compute digits of pi, and I wouldn't be at all surprised to be investigating one or more of them in future.

π is an immutable constant.  At the risk of sounding somewhat snide, if the value of π is needed, it can be entered into the constants part of a program, or by pressing the π key on one’s pocket calculator.  Not to diminish in any way the efforts of those who are constantly searching for a new way to compute a constant to a level of precision far beyond that needed in all but a vanishingly-small number of situations, but the fascination with this completely escapes me.  It almost seems to be the mathematical analog of repeatedly reinventing the wheel, an endlessly circular process, as it were.

That said, I do find some of the algorithms interesting and quite ingenious.

BTW, my calculator says π = 3.1415926535897932384626433832795.  I’m always amused when I am computing something that uses π as a factor that an irrational number expressed to 32 decimal places is being used to produce a result that only needs to be expressed to 2 or 3 places.  :D  It’s sort of like using a hydrogen bomb to demolish a doghouse.  :P

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Top
 Profile  
Reply with quote  
PostPosted: Fri Aug 09, 2024 6:16 pm 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 844
Location: Potsdam, DE
Well, 355/133 puts the earth's orbit only about ninety miles out of place...


Top
 Profile  
Reply with quote  
PostPosted: Fri Aug 09, 2024 6:43 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8510
Location: Southern California
barnacle wrote:
Well, 355/133 puts the earth's orbit only about ninety miles out of place...

Er... make that 355/113 (not 133).  It's actually even better than that:  Less than 8 miles, out of 93,000,000.  The orbit is not perfectly round though.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Fri Aug 09, 2024 8:19 pm 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 844
Location: Potsdam, DE
Ooops, typo.


Top
 Profile  
Reply with quote  
PostPosted: Sat Aug 10, 2024 5:23 am 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8390
Location: Midwestern USA
barnacle wrote:
Well, 355/133 puts the earth's orbit only about ninety miles out of place...

Geesh! The sun will be too close and we will all end up looking like a brisket of beef as done in a hot Texas smoker!

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Top
 Profile  
Reply with quote  
PostPosted: Sat Aug 10, 2024 5:31 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
For those interested in this kind of bignum adventure on 6502, another related thread and program is Bruce's
500+ digits of e
which references Wozniak's 1981 article and project The Impossible Dream: Computing e to 116,000 Places with a Personal Computer.

(I find it interesting to note that e is a much more amenable number than pi, even though both are transcendental, and I find it marvellous that we've managed to compute even more digits, despite this.)

(I'll pop this note into an earlier post in this thread, in the hope of reaching future explorers interested in this kind of adventure.)


Top
 Profile  
Reply with quote  
PostPosted: Sat Aug 10, 2024 5:42 am 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8390
Location: Midwestern USA
BigEd wrote:
For those interested in this kind of bignum adventure on 6502, another related thread and program is Bruce's
500+ digits of e
which references Wozniak's 1981 article and project The Impossible Dream: Computing e to 116,000 Places with a Personal Computer.

Just think!  If you try to display that on an 80 × 25 text screen, you’ll have only 114,000 digits left to display after you fill the screen.  :shock:

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Top
 Profile  
Reply with quote  
PostPosted: Sat Aug 10, 2024 6:43 am 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 844
Location: Potsdam, DE
Scroll, man, scroll! (Though the question is: sideways or upwards?)
:p


Top
 Profile  
Reply with quote  
PostPosted: Sat Aug 10, 2024 5:49 pm 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8390
Location: Midwestern USA
barnacle wrote:
Scroll, man, scroll! (Though the question is: sideways or upwards?)
:p

These days for me it’s sideways, anyways, always.  :twisted:

Seriously...sort of, I wonder how long it would take to calculate e to 116,000 places on a Commodore 64, in BASIC?  Would a teenager live long enough to see a result?  Would his children live long enough to see a result?  :?

Oh, never mind.  A C-64 can’t represent anything to 116,000 places.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Top
 Profile  
Reply with quote  
PostPosted: Sun Aug 11, 2024 10:32 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
BigDumbDinosaur wrote:
Oh, never mind.  A C-64 can’t represent anything to 116,000 places.

If a 48K Apple ][ can do it, shouldn't a 64K C-64 be able to do it with some sleight-of-hand (maybe only in ML)?

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 16 posts ]  Go to page 1, 2  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 31 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: