Thanks for the link.
Somewhere else I found a Forth version of the Sieve of Eratosthenes that used a search range of 100. I downloaded one program, but it didn't work at all. The second program that I downloaded yielded the same results as those given for a Fortran 90 version of the sieve after I modified the search range.
I had downloaded and read that paper last evening and decided that its search range was probably bigger than my little board could support with internal memory.
I determined that for a $D000 fig-FORTH location, running the sieve with that large a search range eats the configuration. However, I found that for a $0400 fig-FORTH location, there is sufficient memory to allocate the flag array.
With the search range set to 16384 and 10 iterations, the 14.7456 MHz M65C02A executes the sieve benchmark (using the first working program) in a stopwatch time of approximately 17.5 seconds. (Note: the program linked to by BigEd uses a value of 8190, but it makes one optimization: since 2 is the only even prime, all of its multiples may be ignored. Thus, the search space is reduced by half, i.e. the maximum number of cells required is 8190.)
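The odd-only indexing described in the note can be sketched as a pair of small words. This is only an illustration: IDX>ODD and FIRST-STRIKE are hypothetical names that appear in neither sieve listing, and the sketch assumes the usual convention that flag index i stands for the odd number 2i+3.

```forth
( Hypothetical helper words, not part of either sieve listing. )
( Flag index i stands for the odd number 2i+3, so index 0 is 3. )
: IDX>ODD ( i -- n ) DUP + 3 + ;
( Index of the first flag struck out for that prime: i + n, )
( which is the flag for the number 3n. )
: FIRST-STRIKE ( i -- j ) DUP IDX>ODD + ;
```

For example, 4 IDX>ODD leaves 11 on the stack, and 0 FIRST-STRIKE leaves 3, the index of the flag for 9. Stepping the index by n from there visits 5n, 7n and so on, so the even multiples are never touched.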
The time measured is 10.85x faster than the Apple II time given in the reference. Examining the differences between the sieve program that I've been using and the one given in the reference, I found that the program I used included an unnecessary multiplication. The program provided in the reference used the FORTH idiom "DUP +" to perform a multiply by 2 rather than using "*". With this change, the stopwatch time for 10 iterations is approximately 10.5 seconds rather than 17.5 seconds. (A substantial improvement.)
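The two ways of doubling can be put side by side as a minimal sketch. The word names DOUBLE-MUL and DOUBLE-ADD are made up for the comparison. The explanation for the speed difference is my assumption: on a 6502-family processor without a hardware multiply, "*" has to go through a general software multiply routine.

```forth
( Two ways to double a 16-bit value; the word names are made up here. )
: DOUBLE-MUL ( n -- 2n ) 2 * ;    ( pushes 2, then calls the general multiply )
: DOUBLE-ADD ( n -- 2n ) DUP + ;  ( one stack copy and one add )
```

Both leave the same result for any value in the sieve's index range, e.g. 8191 becomes 16382 either way. Only the cost of getting there differs, and that cost is paid once for every prime found in the inner DO loop.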
The second measured time is 18.1x faster than the Apple II time given in the reference. The 65CE02 datasheet indicates that it is roughly 25% faster than a standard 65C02 because it eliminates the dead instruction/memory cycles of the 65C02. The M65C02A also eliminates all dead instruction/memory cycles like the 65CE02. At a base frequency of 14.7456 MHz, a 25% improvement in execution time due to dead cycle elimination would yield an effective clock rate of 18.432 MHz (14.7456 MHz x 1.25). This result is very close to the calculated speed difference of 18.1x between the 1 MHz Apple II 6502 Sieve benchmark and the 14.7456 MHz M65C02A Sieve benchmark.
I had previously contended that the M65C02/M65C02A cores would show an improvement over a standard, cycle-accurate core implementation, but had been unable to find a way to measure that difference with a commonly available program. This little exercise has allowed that feature of the M65C02/M65C02A cores to be demonstrated. All in all a good exercise.
Thanks BigEd.
Now on to adding the planned FORTH VM support instructions to the M65C02A. Before closing this post, I've included the source for the two FORTH Sieve of Eratosthenes programs below:
Code:
: 2DROP drop drop ;
16384 2 / CONSTANT maxp
: SIEVE ( -- n )
  HERE maxp 1 FILL
  1 ( count, including 2 )
  maxp 0 DO
    I HERE + C@ IF
      I 2 * 3 + ( dup . ) DUP I + ( prime current )
      BEGIN DUP maxp U<
      WHILE 0 OVER HERE + C!
        OVER +
      REPEAT
      2DROP 1+
    THEN
  LOOP ;
( Note: 10 1 DO runs the loop body 9 times, for I = 1 to 9. )
: PRIMES ." S" 10 1 do sieve sp! loop sp! ." E" ;
Code:
8192 constant size
0 variable flags size 1 - allot
: sieve
  flags size 1+ 1 fill
  0 size 0
  do flags i + c@
    if i dup + 3 + dup i +
      begin dup size <
      while 0 over flags + c! over + repeat
      drop drop 1+
    then
  loop ;
( Note: 10 1 do runs the loop body 9 times, for i = 1 to 9. )
: primes ." S" 10 1 do sieve sp! loop sp! ." E" ;