slightly OT: a simple Benchmark

GaBuZoMeu · Post by **GaBuZoMeu** » Sun Jul 15, 2018 7:39 pm

Hmm, aren't there some initializations necessary? Else C% would be set to 0 and D%=DIVS%(1)=0 would cause a DIV_BY_0 error.

Once a prime is found it should went into DIVS%(J) I assume?

Cheers,
Arne

Chromatix · Post by **Chromatix** » Sun Jul 15, 2018 7:50 pm

Yes, some details were omitted for brevity. See my latest C code for how to handle the DIVS% array.

To initialise the wheel array:

Code: Select all

J%=1
FOR I%=1 TO 2*3*5*7 STEP 2
IF (I% MOD 3)*(I% MOD 5)*(I% MOD 7) THEN WHEEL%(J%)=I% : J%=J%+1
NEXT

GaBuZoMeu · Post by **GaBuZoMeu** » Sun Jul 15, 2018 11:33 pm

OK, here are the numbers (relativ to the "classic" approach):

MS-Basic
A: 33,1 = 70,9%
B: 50,3 = 67,0%
C: 546,6 = 67,3%
D: 1377,1 = 68,4%

EhBasic:
A: 14,4 = 124%
B: 22,2 = 115%
C: 240,2 = 95,1%
D: 604,4 = 91,2%
E: 1107,8 = 88,1%

At least this elaborate approach yields an effect even with EhBasic. The results A and B are victims of the tables setup which was done every run.

Chromatix · Post by **Chromatix** » Mon Jul 16, 2018 6:29 am

Having implemented the wheel optimisation in C, it's making very little practical difference on my Raspberry Pi. There *is* a speedup, but it's only significant at the small end of the range, which the Pi can get through in less than a second anyway. Once we get into the hundreds of millions, we're saving a few percent at best - though the code never seems to be a liability. A larger wheel would consume more memory and provide only a slight further improvement on large candidates.

The reason for this is that, while we're down to 48 from 105 candidates per 210-interval, and the remainder have three trial divisions eliminated for free, most of the time is spent on candidates which either turn out to *be* prime, or which have only relatively large prime factors; in both cases, a large number of trial divisions is still required to determine this. The Pi 2B has Cortex-A7 cores which have hardware division, so eliminating the many candidates which have small prime factors is relatively quick.

But this is on a machine which, given several hours, can actually get through all the 32-bit integers. On 8-bit and 16-bit machines - and especially in an interpreted language - optimisations which make a concrete difference in the sub-million range are worthwhile. When it comes to interpreters, however, increasing the complexity of the code may increase the overhead of interpretation, even if the underlying arithmetic operations are simplified.

Chromatix · Post by **Chromatix** » Tue Jul 17, 2018 9:10 am

Klaus2m5 wrote:

I think the NEXT WITHOUT FOR ERROR happens due to the way EhBASIC uses the stack for both FOR NEXT and IF THEN ELSE constructs.

Curiously, the EhBASIC manual claims that IF-THEN-ELSE is treated as a single statement, so a colon acts as an ENDIF. This would appear to make it impossible to put multiple statements under a single condition without spaghetti logic. This in turn implies that there's no "stack use" by IF.

On the other hand, a single NEXT statement can apparently take multiple loop-variable arguments. Presumably these behave equivalently to consecutive NEXT statements.

Klaus2m5 · Post by **Klaus2m5** » Tue Jul 17, 2018 10:50 am

Chromatix wrote:

Curiously, the EhBASIC manual claims that IF-THEN-ELSE is treated as a single statement, so a colon acts as an ENDIF. This would appear to make it impossible to put multiple statements under a single condition without spaghetti logic. This in turn implies that there's no "stack use" by IF.

The IF THEN statement1 [: statement#] behaves the same way in EhBASIC as in all other BASIC dialects: All statements are executed if the condition was TRUE, none is executed if the condition was FALSE. Only IF THEN ELSE allows for just 1 statement for each part and a colon acts as ENDIF. I'll see if I can fix the conditional NEXT issue.

Chromatix wrote:

On the other hand, a single NEXT statement can apparently take multiple loop-variable arguments. Presumably these behave equivalently to consecutive NEXT statements.

Yes and I recall that EhBASIC is not the only BASIC allowing multiple FOR loops to be closed with just 1 NEXT.

BigEd · Post by **BigEd** » Tue Jul 17, 2018 12:57 pm

Klaus2m5 wrote:

I'll see if I can fix the conditional NEXT issue.

That would be great!

Chromatix · Post by **Chromatix** » Tue Jul 17, 2018 1:16 pm

What's curiously missing from most BASIC dialects (including structured dialects like BBC BASIC) is a way to cleanly break out of or move to the next iteration of a loop, without using a GOTO.

BigEd · Post by **BigEd** » Tue Jul 17, 2018 1:27 pm

Yes, it's normal to bump the control variable past the limit, then do NEXT, but you still have to do something at that point. In BBC Basic you could probably use an UNTIL, having set up a REPEAT previously. Or an ENDPROC, using procedures for structure. I'm not sure how the nesting for FOR, REPEAT, PROC would unwind, if they weren't strictly nested. Nor am I sure how efficient they are: I've a feeling at least one takes a copy of the 6502 stack so it can put it back when it's done.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Tue Jul 17, 2018 7:53 pm

Chromatix wrote:

What's curiously missing from most BASIC dialects (including structured dialects like BBC BASIC) is a way to cleanly break out of or move to the next iteration of a loop, without using a GOTO.

Business BASIC interpreters such as BBx and Thoroughbred have EXITTO, which breaks from a loop or subroutine and cleans up the stack in the process.

commodorejohn · Post by **commodorejohn** » Tue Jul 17, 2018 8:08 pm

MS QuickBasic included an EXIT (structure) statement, I know, but yeah, it is an odd omission. I mean, even most Pascal variants have a BREAK statement, if I recall correctly.

Jeff_Birt · Post by **Jeff_Birt** » Sun Sep 09, 2018 6:51 pm

I ported the BASIC benchmark to the C64, well really just copied as no changes were needed. I did add in the use of the Jiffy timer to automatically measure the execution time. The times are listed in the source code below. Also note that I ran this on WinVICE and a real C64 and the execution time was the same which is nice as in VICE I can run it in 'Warp' mode which is about 1,500% of the C64 speed

In the table showing times VBas is BASIC in VICE, CBas is BASIC on real C64, etc.

I then ported to DurexForth also on the C64. This is an ugly bit of code but it works and is still much faster than basic even though I wrote the integer square root function in Forth. I'm just dumping the Jiffy timer to the stack before the code starts and when it is done and calculating the difference by hand as I was too lazy to write a d- function (though I will likely do that and a better sqrt soon in assembly as once I figured out how to work with the Forth start from assembly it was quite easy.)

C64 BASIC

Code: Select all

100 REM
110 REM B A S I C - B E N C H
120 REM
130 REM The program tests whether it is in the area [3.. a] of the natural
140 REM Numbers two consecutive prime numbers P1 < P2 whose
150 REM difference is greater than or equal to B (P2-P1 > = b).
160 REM
170 REM  Test values -       Results        - VBas - CBas - VForth - CForth
180 REM A    1000,20 -    907,     887,20 -    48s    48s    18s      18s
190 REM B    2000,30 -   1361,    1327,34 -   118s   118s    30s      30s
200 REM C    9999,35 -   9587,    9551,36 -  1412s  1412s   374s     374s
210 REM D   32000,50 -  19661,   19609,52 -  3516s  3516s  1530s    1530s
220 REM E   32000,70 -  31469,   31397,72 - 10447s 10477s  2352s    2352s
230 REM F 500000,100 - 370373,3790261,112 - 45819s
240 REM
250 ZS = 3 : INPUT A,B
260 TIME$ = "000000":rem reset clock
270 FOR C = 3 TO A STEP 2
280 FOR D = 3 TO SQR(C) STEP 2
290 IF INT(C/D)*D = C THEN goto 330
300 NEXT D

310 IF C-ZS >= B THEN PRINT C,ZS,C-ZS : GOTO 350
320 ZS = C
330 NEXT C
340 PRINT " no solution " : GOTO 250
350 print "time", time$: goto 250

DurexForth on C64

Code: Select all

variable ZS 3 ZS !
variable A
variable B
variable match 0 match !

code jiffy
dex, sei,
$a1 lda, msb sta,x
$a2 lda, lsb sta,x
dex,
$00 lda,# msb sta,x
$a0 lda, cli, lsb sta,x ;code

( isqrt(n): 
    x = n           \ x == old guess
    y = (x + 1) / 2 \ y == new guess
    while y < x:
        x = y
        y = (x + n / x) / 2
)
( Integer square root based on above )
( Returns the largest integer x      )
( for which x * x does not exceed n  )
: sqrt ( n -- root )
   dup 1 + 1 rshift
   begin
    2dup /include 
    over + 1 rshift
    swap over >
   while repeat
   nip
;

: dloop ( cloopI -- )
   -1 nomatch !
   dup \ cloop value 
     sqrt 4 max 3 do
     dup dup I / I * 
     = if 0 nomatch ! leave then
   2 +loop
   drop
;

( pops all three bytes of jiffy clock
onto the stack as two unsigned doubles )
( -- ud ud)
code jiffy
dex, sei,
$a1 lda, msb sta,x
$a2 lda, lsb sta,x
dex,
$00 lda,# msb sta,x
$a0 lda, cli, lsb sta,x ;code

( A B -- )
: benchmark
   decimal
   B ! A ! \ save A&B off stack
   jiffy
   A @ 3 do
    I dloop nomatch @ if \ no match
    I ZS @ - B @ -
     0< invert if
      I . ZS @ . I ZS @ - . leave then 
    I ZS !
    then 2 +loop
   jiffy
;

GaBuZoMeu · Post by **GaBuZoMeu** » Tue Sep 11, 2018 9:06 pm

Thank you very much Jeff, for the first Forth incarnation of this silly little benchy!

What puzzles me a bit is the huge deviation between these two Basic results:

Code: Select all

SYM (6502,1MHz)    Basic1.1          BasBenc1            46,7    75,1   812,7  2014,0  3696,1   ____,_
C64 (6502, 1MHz)   Basic             BasBenc1            48,0   118,0  1412,0  3516,0 10447,0  45819,0  THX 2 Jeff_Birt

I knew the C64 runs not exactly @ 1MHz - I think it depends on PAL/NTSC. But what is the reason for the strong slow down for larger numbers? And why is the relation not constant?

Code: Select all

C64/SYM                                                  1,02    1,57    1,74    1,75    2,83

dmsc · Post by **dmsc** » Mon Sep 17, 2018 3:05 am

My results for FastBasic ( https://github.com/dmsc/fastbasic ) on NTSC Atari 800XL:

Code: Select all

Atari 800XL (1.79MHz NTSC)   FastBasic_3.5      5.3     9.0      123.3    326.1    620.38

This is running the B=35 test:

: Benchmark run with B=35

And this is the source code in the Atari IDE:

: Benchmark source

barrym95838 · Post by **barrym95838** » Mon Sep 17, 2018 4:40 am

Nice ... but weren't most of the NTSC 8-bit Ataris clocked at 1.79 MHz?

slightly OT: a simple Benchmark

Re: slightly OT: a simple Benchmark

Re: slightly OT: a simple Benchmark

Re: slightly OT: a simple Benchmark

Re: slightly OT: a simple Benchmark

Re: slightly OT: a simple Benchmark

Re: slightly OT: a simple Benchmark

Re: slightly OT: a simple Benchmark

Re: slightly OT: a simple Benchmark

Re: slightly OT: a simple Benchmark

Re: slightly OT: a simple Benchmark

Re: slightly OT: a simple Benchmark

Re: slightly OT: a simple Benchmark

Re: slightly OT: a simple Benchmark

Re: slightly OT: a simple Benchmark

Re: slightly OT: a simple Benchmark