Re: the fast realization of π-spigot algorithm

Posted: Mon Aug 01, 2016 5:07 pm
by litwr
We missed viewtopic.php?f=2&t=2239. :roll: It contains some very curious results: the much cheaper 6502 outperforms the 6800. :shock: It would be even more interesting to compare the 6502 with the 6809... Of course, the 6809 should be faster, but is it only slightly faster or much faster?
Thanks to datajerk! :)
His results also show that the Machin's formula algorithm is more than 2 times faster than the spigot.
BTW my results indicate that the 65816 in 16-bit mode may be only up to 50% faster than the 6502 in most cases.

Re: the fast realization of π-spigot algorithm

Posted: Tue Oct 18, 2016 4:23 pm
by litwr
Pi-pack 18 has just been released: http://litwr2.atspace.eu/pi/pi-spigot-benchmark.html. It contains newer, faster versions for 6502-based machines and a version for the 6809-based Dragon-32/64. So it is now possible to compare the 6809 and the 6502 at their limits. The results show that a 6809 at 1.78 MHz matches a 6502 at 2 MHz. The second accumulator gives the 6809 a big advantage. However (IMHO) the 6809 is overrated. A lot of its instructions look clumsy. Its 16-bit index registers are slow with the 8-bit ALU, so the 6502 is faster with tables. For example, pi-spigot requires a 16-bit multiplication by the constant 10000 with a 32-bit result. For this task the 6809 with its hardware 8-bit multiplication is slower than the 6502 with a 768-byte table. It was shocking that the 6809 instruction to move one register to another requires 6 (!) ticks. The 6809 uses big-endian byte order, which makes it slower for addition or subtraction. The 6809 also requires more cycles to work with memory than the 6502...
The 6309, which has 4 accumulators, hardware division and faster instruction timing, should be the fastest of the 8-bit architectures.
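
To illustrate the table approach to multiplying by 10000, here is a rough Python sketch. It is not the actual Pi-pack code. The table layout is an assumption: since 255 * 10000 = 2,550,000 fits in 3 bytes, a 768-byte table is consistent with 256 entries of 3 bytes each, split into three 256-byte tables. The names TAB_LO, TAB_MID, TAB_HI and mul10000 are made up for this example.

```python
# Assumed layout: 256 products of 3 bytes each, stored as three 256-byte tables
# (768 bytes in total). This is a guess at the scheme, not the Pi-pack source.
PRODUCTS = [b * 10000 for b in range(256)]
TAB_LO = bytes(p & 0xFF for p in PRODUCTS)
TAB_MID = bytes((p >> 8) & 0xFF for p in PRODUCTS)
TAB_HI = bytes((p >> 16) & 0xFF for p in PRODUCTS)


def mul10000(x):
    """Multiply a 16-bit value by 10000 using only byte lookups and byte adds."""
    if not 0 <= x <= 0xFFFF:
        raise ValueError("x must fit in 16 bits")
    lo, hi = x & 0xFF, x >> 8
    # x * 10000 = lo * 10000 + (hi * 10000 << 8)
    r0 = TAB_LO[lo]                      # byte 0 comes from the low-byte product only
    s = TAB_MID[lo] + TAB_LO[hi]         # byte 1
    r1, carry = s & 0xFF, s >> 8
    s = TAB_HI[lo] + TAB_MID[hi] + carry  # byte 2
    r2, carry = s & 0xFF, s >> 8
    r3 = TAB_HI[hi] + carry              # byte 3, always below 256
    return r0 | (r1 << 8) | (r2 << 16) | (r3 << 24)
```

Under this assumed layout the whole multiplication needs six table lookups and three byte additions with carry, which is the kind of work the 6502 does well with indexed addressing.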

Re: the fast realization of π-spigot algorithm

Posted: Tue Oct 18, 2016 4:31 pm
by BigEd
Interesting findings. It does seem clear to me that little-endian makes most sense, but one still sees the opposite opinion now and again.
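
One common argument for little-endian on an 8-bit ALU is that a multi-byte add has to start at the least significant byte, because the carry travels upward, and with little-endian storage that byte sits at the lowest address. A small Python sketch of the idea (the function name is made up for this example, and it models the byte order only, not the cycle counts of any particular CPU):

```python
def add_little_endian(a, b):
    """Add two equal-length little-endian byte strings, dropping the final carry."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    result = bytearray(len(a))
    carry = 0
    for i in range(len(a)):      # ascending addresses: low byte first
        s = a[i] + b[i] + carry
        result[i] = s & 0xFF
        carry = s >> 8           # carry feeds the next, more significant byte
    return bytes(result)


x = (0x12FF).to_bytes(2, "little")
y = (0x0001).to_bytes(2, "little")
total = int.from_bytes(add_little_endian(x, y), "little")  # 0x1300
```

With big-endian storage the same loop would have to run from the highest address down to the lowest.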

We probably shouldn't be too surprised that a table approach beats the hardware multiplier, but it's a bit of a shame that the multiplier isn't faster.