As many of you know, Apple Pi is an Apple ("Integer") BASIC program that calculates pi to up to 1000 digits using Machin's formula, pi/4 = 4*arctan(1/5) - arctan(1/239). It was written by the late Bob Bishop and appeared in the Aug/Sep 1978 issue of Micro, available here at 6502.org:
http://www.6502.org/documents/publicati ... g_1978.pdf

Apple Pi wasn't particularly fast (it took more than 40 hours to calculate pi to 1000 digits), and not much optimization was attempted. (For example, in Integer BASIC shorter variable names are faster, but rather than use one-character variable names, longer, more descriptive names were used.)
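For anyone who wants to play with the algorithm without firing up an Apple II (or an emulator), here is a rough C sketch of the same idea. To be clear, this is my own code, not a transcription of Bob Bishop's BASIC or Glen Bredon's assembly, and all of the names in it (digits_t, div_small, add_sub, arctan, DIGITS, and so on) are made up for the sketch. It sums the two arctangent series of Machin's formula, multiplied through by 4 so that pi = 16*arctan(1/5) - 4*arctan(1/239), storing one decimal digit per byte:

Code:
/* Sketch only: pi via Machin's formula, one decimal digit per byte.
 * Not the original Apple Pi code; names and structure are mine.     */
#include <stdio.h>
#include <string.h>

#define DIGITS 100                  /* fractional digits to print */
#define N (DIGITS + 4)              /* units digit + printed digits + guard digits */

typedef unsigned char digits_t[N];  /* d[0] = units place, d[1..] = fraction */

/* a /= d: long division over the digit array, one decimal digit at a time */
static void div_small(digits_t a, unsigned long d)
{
    unsigned long rem = 0;
    for (int i = 0; i < N; i++) {
        unsigned long cur = rem * 10 + a[i];
        a[i] = (unsigned char)(cur / d);
        rem = cur % d;
    }
}

/* acc += term (sign > 0) or acc -= term (sign < 0), with carry/borrow */
static void add_sub(digits_t acc, const digits_t term, int sign)
{
    int carry = 0;
    for (int i = N - 1; i >= 0; i--) {
        int v = acc[i] + sign * term[i] + carry;
        carry = 0;
        if (v < 0)        { v += 10; carry = -1; }
        else if (v >= 10) { v -= 10; carry = 1; }
        acc[i] = (unsigned char)v;
    }
}

static int is_zero(const digits_t a)
{
    for (int i = 0; i < N; i++)
        if (a[i]) return 0;
    return 1;
}

/* acc += coef * arctan(1/x); coef may be negative.
 * Series: arctan(1/x) = 1/x - 1/(3*x^3) + 1/(5*x^5) - ...              */
static void arctan(digits_t acc, int coef, unsigned long x)
{
    digits_t power, term;
    int sign = (coef < 0) ? -1 : 1;

    memset(power, 0, N);
    power[0] = (unsigned char)(coef < 0 ? -coef : coef);
    div_small(power, x);                 /* power = |coef| / x */

    for (unsigned long k = 1; ; k += 2) {
        memcpy(term, power, N);
        div_small(term, k);              /* term = |coef| / (k * x^k) */
        if (is_zero(term))
            break;                       /* term fell below our precision */
        add_sub(acc, term, sign);
        sign = -sign;
        div_small(power, x * x);         /* advance to the next odd power of 1/x */
    }
}

int main(void)
{
    digits_t pi;
    memset(pi, 0, N);

    /* Machin's formula multiplied through by 4:
     * pi = 16*arctan(1/5) - 4*arctan(1/239)                             */
    arctan(pi, 16, 5);
    arctan(pi, -4, 239);

    printf("%d.", pi[0]);
    for (int i = 1; i <= DIGITS; i++)
        putchar('0' + pi[i]);
    putchar('\n');
    return 0;
}

The few guard digits past the printed ones are there so that truncation from the repeated long divisions doesn't creep into the output; set DIGITS to 1000 if you want to match the original program's target.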
In 1982 the late Glen Bredon (author of the Merlin assembler) ported Apple Pi to assembly. The port shipped with Merlin as an example program for the Merlin linker. (It was split into 5 source code files, plus one file of macro definitions.)
There were differences between the original BASIC program and the assembly port, including some optimizations. Some differences were minor; for example, the assembly port formatted the digits neatly into columns, whereas the BASIC program simply output digits. Other differences were significant; for example, the assembly port divided by 57121 (= 239*239) using 16-bit unsigned arithmetic, rather than dividing by 239 twice as the BASIC program did (Integer BASIC uses signed arithmetic, and 239*239 does not fit into a signed 16-bit integer). Another difference is that the BASIC program uses two decimal digits per byte when calculating more than 200 digits, while the assembly program always uses one digit per byte. One feature the assembly port added was finding the first nonzero digit when summing the series, so the leading zeros can be skipped. (Since the terms get smaller and smaller, they have more and more leading zeros as the calculation proceeds.)
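To make the 239*239 trick concrete, here is another small, self-contained C illustration (again my own code and names, not from either program). Dividing the whole digit array once by 57121 produces exactly the same quotient as two passes dividing by 239, but both the divisor and the running remainder (which can be as large as 57120) need 16-bit unsigned arithmetic, which Integer BASIC's signed 16-bit variables can't hold:

Code:
#include <stdio.h>

#define N 16   /* digits in this toy example */

/* a[] holds one decimal digit per byte, most significant digit first;
 * divide the whole number by d, return the remainder (long division). */
static unsigned long div_digits(unsigned char a[], int n, unsigned long d)
{
    unsigned long rem = 0;
    for (int i = 0; i < n; i++) {
        unsigned long cur = rem * 10 + a[i];
        a[i] = (unsigned char)(cur / d);
        rem = cur % d;
    }
    return rem;
}

int main(void)
{
    unsigned char a[N] = {3,1,4,1,5,9,2,6,5,3,5,8,9,7,9,3};
    unsigned char b[N];
    int same = 1;

    for (int i = 0; i < N; i++)
        b[i] = a[i];

    /* BASIC-style: two passes, each divisor fits in a signed 16-bit variable */
    div_digits(a, N, 239);
    div_digits(a, N, 239);

    /* assembly-style: one pass with the 16-bit unsigned divisor 57121 */
    div_digits(b, N, 57121);

    for (int i = 0; i < N; i++)
        if (a[i] != b[i]) same = 0;
    printf("quotients %s\n", same ? "match" : "differ");
    return 0;
}

Halving the number of full passes over the digit array is a real saving, since this kind of calculation spends essentially all of its time in these long-division loops.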
So, with the caveat that this is not a direct comparison between interpreted BASIC and a straight assembly translation, but rather between BASIC and optimized assembly: how much faster is the assembly implementation? Here are the results I measured on an Apple II (at ~1.02 MHz):
Code:
assembly:
  100 digits:     3 seconds
  200 digits:     9 seconds
  250 digits:    14 seconds
  400 digits:    33 seconds
 1000 digits:   194 seconds = 3:14

BASIC:
   10 digits:    16 seconds
  100 digits:   788 seconds = 13:08
  250 digits:  9266 seconds = 2:34:26
That works out to the assembly version being roughly 260 times faster at 100 digits and roughly 660 times faster at 250 digits. (I wasn't able to summon the enthusiasm to wait out a 40+ hour calculation in BASIC.)