Page 12 of 14
Re: slightly OT: a simple Benchmark
Posted: Sun Jul 15, 2018 7:39 pm
by GaBuZoMeu
Hmm, aren't there some initializations necessary? Else C% would be set to 0 and D%=DIVS%(1)=0 would cause a DIV_BY_0 error.
Once a prime is found it should went into DIVS%(J) I assume?
Cheers,
Arne
Re: slightly OT: a simple Benchmark
Posted: Sun Jul 15, 2018 7:50 pm
by Chromatix
Yes, some details were omitted for brevity. See my latest C code for how to handle the DIVS% array.
To initialise the wheel array:
Code: Select all
J%=1
FOR I%=1 TO 2*3*5*7 STEP 2
IF (I% MOD 3)*(I% MOD 5)*(I% MOD 7) THEN WHEEL%(J%)=I% : J%=J%+1
NEXT
Re: slightly OT: a simple Benchmark
Posted: Sun Jul 15, 2018 11:33 pm
by GaBuZoMeu
OK, here are the numbers (relativ to the "classic" approach):
MS-Basic
A: 33,1 = 70,9%
B: 50,3 = 67,0%
C: 546,6 = 67,3%
D: 1377,1 = 68,4%
EhBasic:
A: 14,4 = 124%
B: 22,2 = 115%
C: 240,2 = 95,1%
D: 604,4 = 91,2%
E: 1107,8 = 88,1%
At least this elaborate approach yields an effect even with EhBasic. The results A and B are victims of the tables setup which was done every run.
Re: slightly OT: a simple Benchmark
Posted: Mon Jul 16, 2018 6:29 am
by Chromatix
Having implemented the wheel optimisation in C, it's making very little practical difference on my Raspberry Pi. There *is* a speedup, but it's only significant at the small end of the range, which the Pi can get through in less than a second anyway. Once we get into the hundreds of millions, we're saving a few percent at best - though the code never seems to be a liability. A larger wheel would consume more memory and provide only a slight further improvement on large candidates.
The reason for this is that, while we're down to 48 from 105 candidates per 210-interval, and the remainder have three trial divisions eliminated for free, most of the time is spent on candidates which either turn out to *be* prime, or which have only relatively large prime factors; in both cases, a large number of trial divisions is still required to determine this. The Pi 2B has Cortex-A7 cores which have hardware division, so eliminating the many candidates which have small prime factors is relatively quick.
But this is on a machine which, given several hours, can actually get through all the 32-bit integers. On 8-bit and 16-bit machines - and especially in an interpreted language - optimisations which make a concrete difference in the sub-million range are worthwhile. When it comes to interpreters, however, increasing the complexity of the code may increase the overhead of interpretation, even if the underlying arithmetic operations are simplified.
Re: slightly OT: a simple Benchmark
Posted: Tue Jul 17, 2018 9:10 am
by Chromatix
I think the NEXT WITHOUT FOR ERROR happens due to the way EhBASIC uses the stack for both FOR NEXT and IF THEN ELSE constructs.
Curiously, the EhBASIC manual claims that IF-THEN-ELSE is treated as a single statement, so a colon acts as an ENDIF. This would appear to make it impossible to put multiple statements under a single condition without spaghetti logic. This in turn implies that there's no "stack use" by IF.
On the other hand, a single NEXT statement can apparently take multiple loop-variable arguments. Presumably these behave equivalently to consecutive NEXT statements.
Re: slightly OT: a simple Benchmark
Posted: Tue Jul 17, 2018 10:50 am
by Klaus2m5
Curiously, the EhBASIC manual claims that IF-THEN-ELSE is treated as a single statement, so a colon acts as an ENDIF. This would appear to make it impossible to put multiple statements under a single condition without spaghetti logic. This in turn implies that there's no "stack use" by IF.
The IF THEN statement1 [: statement#] behaves the same way in EhBASIC as in all other BASIC dialects: All statements are executed if the condition was TRUE, none is executed if the condition was FALSE. Only IF THEN ELSE allows for just 1 statement for each part and a colon acts as ENDIF. I'll see if I can fix the conditional NEXT issue.
On the other hand, a single NEXT statement can apparently take multiple loop-variable arguments. Presumably these behave equivalently to consecutive NEXT statements.
Yes and I recall that EhBASIC is not the only BASIC allowing multiple FOR loops to be closed with just 1 NEXT.
Re: slightly OT: a simple Benchmark
Posted: Tue Jul 17, 2018 12:57 pm
by BigEd
I'll see if I can fix the conditional NEXT issue.
That would be great!
Re: slightly OT: a simple Benchmark
Posted: Tue Jul 17, 2018 1:16 pm
by Chromatix
What's curiously missing from most BASIC dialects (including structured dialects like BBC BASIC) is a way to cleanly break out of or move to the next iteration of a loop, without using a GOTO.
Re: slightly OT: a simple Benchmark
Posted: Tue Jul 17, 2018 1:27 pm
by BigEd
Yes, it's normal to bump the control variable past the limit, then do NEXT, but you still have to do something at that point. In BBC Basic you could probably use an UNTIL, having set up a REPEAT previously. Or an ENDPROC, using procedures for structure. I'm not sure how the nesting for FOR, REPEAT, PROC would unwind, if they weren't strictly nested. Nor am I sure how efficient they are: I've a feeling at least one takes a copy of the 6502 stack so it can put it back when it's done.
Re: slightly OT: a simple Benchmark
Posted: Tue Jul 17, 2018 7:53 pm
by BigDumbDinosaur
What's curiously missing from most BASIC dialects (including structured dialects like BBC BASIC) is a way to cleanly break out of or move to the next iteration of a loop, without using a GOTO.
Business BASIC interpreters such as BBx and Thoroughbred have
EXITTO, which breaks from a loop or subroutine and cleans up the stack in the process.
Re: slightly OT: a simple Benchmark
Posted: Tue Jul 17, 2018 8:08 pm
by commodorejohn
MS QuickBasic included an EXIT (structure) statement, I know, but yeah, it is an odd omission. I mean, even most Pascal variants have a BREAK statement, if I recall correctly.
Re: slightly OT: a simple Benchmark
Posted: Sun Sep 09, 2018 6:51 pm
by Jeff_Birt
I ported the BASIC benchmark to the C64, well really just copied as no changes were needed. I did add in the use of the Jiffy timer to automatically measure the execution time. The times are listed in the source code below. Also note that I ran this on WinVICE and a real C64 and the execution time was the same which is nice as in VICE I can run it in 'Warp' mode which is about 1,500% of the C64 speed

In the table showing times VBas is BASIC in VICE, CBas is BASIC on real C64, etc.
I then ported to DurexForth also on the C64. This is an ugly bit of code but it works and is still much faster than basic even though I wrote the integer square root function in Forth. I'm just dumping the Jiffy timer to the stack before the code starts and when it is done and calculating the difference by hand as I was too lazy to write a d- function (though I will likely do that and a better sqrt soon in assembly as once I figured out how to work with the Forth start from assembly it was quite easy.)
C64 BASIC
Code: Select all
100 REM
110 REM B A S I C - B E N C H
120 REM
130 REM The program tests whether it is in the area [3.. a] of the natural
140 REM Numbers two consecutive prime numbers P1 < P2 whose
150 REM difference is greater than or equal to B (P2-P1 > = b).
160 REM
170 REM Test values - Results - VBas - CBas - VForth - CForth
180 REM A 1000,20 - 907, 887,20 - 48s 48s 18s 18s
190 REM B 2000,30 - 1361, 1327,34 - 118s 118s 30s 30s
200 REM C 9999,35 - 9587, 9551,36 - 1412s 1412s 374s 374s
210 REM D 32000,50 - 19661, 19609,52 - 3516s 3516s 1530s 1530s
220 REM E 32000,70 - 31469, 31397,72 - 10447s 10477s 2352s 2352s
230 REM F 500000,100 - 370373,3790261,112 - 45819s
240 REM
250 ZS = 3 : INPUT A,B
260 TIME$ = "000000":rem reset clock
270 FOR C = 3 TO A STEP 2
280 FOR D = 3 TO SQR(C) STEP 2
290 IF INT(C/D)*D = C THEN goto 330
300 NEXT D
310 IF C-ZS >= B THEN PRINT C,ZS,C-ZS : GOTO 350
320 ZS = C
330 NEXT C
340 PRINT " no solution " : GOTO 250
350 print "time", time$: goto 250
DurexForth on C64
Code: Select all
variable ZS 3 ZS !
variable A
variable B
variable match 0 match !
code jiffy
dex, sei,
$a1 lda, msb sta,x
$a2 lda, lsb sta,x
dex,
$00 lda,# msb sta,x
$a0 lda, cli, lsb sta,x ;code
( isqrt(n):
x = n \ x == old guess
y = (x + 1) / 2 \ y == new guess
while y < x:
x = y
y = (x + n / x) / 2
)
( Integer square root based on above )
( Returns the largest integer x )
( for which x * x does not exceed n )
: sqrt ( n -- root )
dup 1 + 1 rshift
begin
2dup /include
over + 1 rshift
swap over >
while repeat
nip
;
: dloop ( cloopI -- )
-1 nomatch !
dup \ cloop value
sqrt 4 max 3 do
dup dup I / I *
= if 0 nomatch ! leave then
2 +loop
drop
;
( pops all three bytes of jiffy clock
onto the stack as two unsigned doubles )
( -- ud ud)
code jiffy
dex, sei,
$a1 lda, msb sta,x
$a2 lda, lsb sta,x
dex,
$00 lda,# msb sta,x
$a0 lda, cli, lsb sta,x ;code
( A B -- )
: benchmark
decimal
B ! A ! \ save A&B off stack
jiffy
A @ 3 do
I dloop nomatch @ if \ no match
I ZS @ - B @ -
0< invert if
I . ZS @ . I ZS @ - . leave then
I ZS !
then 2 +loop
jiffy
;
Re: slightly OT: a simple Benchmark
Posted: Tue Sep 11, 2018 9:06 pm
by GaBuZoMeu
Thank you very much Jeff, for the first Forth incarnation of this silly little benchy!
What puzzles me a bit is the huge deviation between these two Basic results:
Code: Select all
SYM (6502,1MHz) Basic1.1 BasBenc1 46,7 75,1 812,7 2014,0 3696,1 ____,_
C64 (6502, 1MHz) Basic BasBenc1 48,0 118,0 1412,0 3516,0 10447,0 45819,0 THX 2 Jeff_Birt
I knew the C64 runs not exactly @ 1MHz - I think it depends on PAL/NTSC. But what is the reason for the strong slow down for larger numbers? And why is the relation not constant?
Re: slightly OT: a simple Benchmark
Posted: Mon Sep 17, 2018 3:05 am
by dmsc
My results for FastBasic (
https://github.com/dmsc/fastbasic ) on NTSC Atari 800XL:
Code: Select all
Atari 800XL (1.79MHz NTSC) FastBasic_3.5 5.3 9.0 123.3 326.1 620.38
This is running the B=35 test:

- Benchmark run with B=35
And this is the source code in the Atari IDE:

- Benchmark source
Re: slightly OT: a simple Benchmark
Posted: Mon Sep 17, 2018 4:40 am
by barrym95838
Nice ... but weren't most of the NTSC 8-bit Ataris clocked at 1.79 MHz?