slightly OT: a simple Benchmark

Let's talk about anything related to the 6502 microprocessor.
User avatar
GaBuZoMeu
Posts: 660
Joined: 01 Mar 2017
Location: North-Germany

Re: slightly OT: a simple Benchmark

Post by GaBuZoMeu »

Hmm, aren't there some initializations necessary? Else C% would be set to 0 and D%=DIVS%(1)=0 would cause a DIV_BY_0 error. :?:
Once a prime is found it should went into DIVS%(J) I assume?


Cheers,
Arne
Chromatix
Posts: 1462
Joined: 21 May 2018

Re: slightly OT: a simple Benchmark

Post by Chromatix »

Yes, some details were omitted for brevity. See my latest C code for how to handle the DIVS% array.

To initialise the wheel array:

Code: Select all

J%=1
FOR I%=1 TO 2*3*5*7 STEP 2
IF (I% MOD 3)*(I% MOD 5)*(I% MOD 7) THEN WHEEL%(J%)=I% : J%=J%+1
NEXT
User avatar
GaBuZoMeu
Posts: 660
Joined: 01 Mar 2017
Location: North-Germany

Re: slightly OT: a simple Benchmark

Post by GaBuZoMeu »

OK, here are the numbers (relativ to the "classic" approach):

MS-Basic
A: 33,1 = 70,9%
B: 50,3 = 67,0%
C: 546,6 = 67,3%
D: 1377,1 = 68,4%

EhBasic:
A: 14,4 = 124%
B: 22,2 = 115%
C: 240,2 = 95,1%
D: 604,4 = 91,2%
E: 1107,8 = 88,1%

At least this elaborate approach yields an effect even with EhBasic. The results A and B are victims of the tables setup which was done every run.
Chromatix
Posts: 1462
Joined: 21 May 2018

Re: slightly OT: a simple Benchmark

Post by Chromatix »

Having implemented the wheel optimisation in C, it's making very little practical difference on my Raspberry Pi. There *is* a speedup, but it's only significant at the small end of the range, which the Pi can get through in less than a second anyway. Once we get into the hundreds of millions, we're saving a few percent at best - though the code never seems to be a liability. A larger wheel would consume more memory and provide only a slight further improvement on large candidates.

The reason for this is that, while we're down to 48 from 105 candidates per 210-interval, and the remainder have three trial divisions eliminated for free, most of the time is spent on candidates which either turn out to *be* prime, or which have only relatively large prime factors; in both cases, a large number of trial divisions is still required to determine this. The Pi 2B has Cortex-A7 cores which have hardware division, so eliminating the many candidates which have small prime factors is relatively quick.

But this is on a machine which, given several hours, can actually get through all the 32-bit integers. On 8-bit and 16-bit machines - and especially in an interpreted language - optimisations which make a concrete difference in the sub-million range are worthwhile. When it comes to interpreters, however, increasing the complexity of the code may increase the overhead of interpretation, even if the underlying arithmetic operations are simplified.
Chromatix
Posts: 1462
Joined: 21 May 2018

Re: slightly OT: a simple Benchmark

Post by Chromatix »

Klaus2m5 wrote:
I think the NEXT WITHOUT FOR ERROR happens due to the way EhBASIC uses the stack for both FOR NEXT and IF THEN ELSE constructs.
Curiously, the EhBASIC manual claims that IF-THEN-ELSE is treated as a single statement, so a colon acts as an ENDIF. This would appear to make it impossible to put multiple statements under a single condition without spaghetti logic. This in turn implies that there's no "stack use" by IF.

On the other hand, a single NEXT statement can apparently take multiple loop-variable arguments. Presumably these behave equivalently to consecutive NEXT statements.
Klaus2m5
Posts: 442
Joined: 28 Jul 2012
Location: Wiesbaden, Germany

Re: slightly OT: a simple Benchmark

Post by Klaus2m5 »

Chromatix wrote:
Curiously, the EhBASIC manual claims that IF-THEN-ELSE is treated as a single statement, so a colon acts as an ENDIF. This would appear to make it impossible to put multiple statements under a single condition without spaghetti logic. This in turn implies that there's no "stack use" by IF.
The IF THEN statement1 [: statement#] behaves the same way in EhBASIC as in all other BASIC dialects: All statements are executed if the condition was TRUE, none is executed if the condition was FALSE. Only IF THEN ELSE allows for just 1 statement for each part and a colon acts as ENDIF. I'll see if I can fix the conditional NEXT issue.
Chromatix wrote:
On the other hand, a single NEXT statement can apparently take multiple loop-variable arguments. Presumably these behave equivalently to consecutive NEXT statements.
Yes and I recall that EhBASIC is not the only BASIC allowing multiple FOR loops to be closed with just 1 NEXT.
Last edited by Klaus2m5 on Wed Jul 18, 2018 6:58 am, edited 1 time in total.
6502 sources on GitHub: https://github.com/Klaus2m5
User avatar
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England
Contact:

Re: slightly OT: a simple Benchmark

Post by BigEd »

Klaus2m5 wrote:
I'll see if I can fix the conditional NEXT issue.
That would be great!
Chromatix
Posts: 1462
Joined: 21 May 2018

Re: slightly OT: a simple Benchmark

Post by Chromatix »

What's curiously missing from most BASIC dialects (including structured dialects like BBC BASIC) is a way to cleanly break out of or move to the next iteration of a loop, without using a GOTO.
User avatar
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England
Contact:

Re: slightly OT: a simple Benchmark

Post by BigEd »

Yes, it's normal to bump the control variable past the limit, then do NEXT, but you still have to do something at that point. In BBC Basic you could probably use an UNTIL, having set up a REPEAT previously. Or an ENDPROC, using procedures for structure. I'm not sure how the nesting for FOR, REPEAT, PROC would unwind, if they weren't strictly nested. Nor am I sure how efficient they are: I've a feeling at least one takes a copy of the 6502 stack so it can put it back when it's done.
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: slightly OT: a simple Benchmark

Post by BigDumbDinosaur »

Chromatix wrote:
What's curiously missing from most BASIC dialects (including structured dialects like BBC BASIC) is a way to cleanly break out of or move to the next iteration of a loop, without using a GOTO.
Business BASIC interpreters such as BBx and Thoroughbred have EXITTO, which breaks from a loop or subroutine and cleans up the stack in the process.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
User avatar
commodorejohn
Posts: 299
Joined: 21 Jan 2016
Location: Placerville, CA
Contact:

Re: slightly OT: a simple Benchmark

Post by commodorejohn »

MS QuickBasic included an EXIT (structure) statement, I know, but yeah, it is an odd omission. I mean, even most Pascal variants have a BREAK statement, if I recall correctly.
Jeff_Birt
Posts: 96
Joined: 18 Jul 2018

Re: slightly OT: a simple Benchmark

Post by Jeff_Birt »

I ported the BASIC benchmark to the C64, well really just copied as no changes were needed. I did add in the use of the Jiffy timer to automatically measure the execution time. The times are listed in the source code below. Also note that I ran this on WinVICE and a real C64 and the execution time was the same which is nice as in VICE I can run it in 'Warp' mode which is about 1,500% of the C64 speed :) In the table showing times VBas is BASIC in VICE, CBas is BASIC on real C64, etc.

I then ported to DurexForth also on the C64. This is an ugly bit of code but it works and is still much faster than basic even though I wrote the integer square root function in Forth. I'm just dumping the Jiffy timer to the stack before the code starts and when it is done and calculating the difference by hand as I was too lazy to write a d- function (though I will likely do that and a better sqrt soon in assembly as once I figured out how to work with the Forth start from assembly it was quite easy.)

C64 BASIC

Code: Select all

100 REM
110 REM B A S I C - B E N C H
120 REM
130 REM The program tests whether it is in the area [3.. a] of the natural
140 REM Numbers two consecutive prime numbers P1 < P2 whose
150 REM difference is greater than or equal to B (P2-P1 > = b).
160 REM
170 REM  Test values -       Results        - VBas - CBas - VForth - CForth
180 REM A    1000,20 -    907,     887,20 -    48s    48s    18s      18s
190 REM B    2000,30 -   1361,    1327,34 -   118s   118s    30s      30s
200 REM C    9999,35 -   9587,    9551,36 -  1412s  1412s   374s     374s
210 REM D   32000,50 -  19661,   19609,52 -  3516s  3516s  1530s    1530s
220 REM E   32000,70 -  31469,   31397,72 - 10447s 10477s  2352s    2352s
230 REM F 500000,100 - 370373,3790261,112 - 45819s
240 REM
250 ZS = 3 : INPUT A,B
260 TIME$ = "000000":rem reset clock
270 FOR C = 3 TO A STEP 2
280 FOR D = 3 TO SQR(C) STEP 2
290 IF INT(C/D)*D = C THEN goto 330
300 NEXT D

310 IF C-ZS >= B THEN PRINT C,ZS,C-ZS : GOTO 350
320 ZS = C
330 NEXT C
340 PRINT " no solution " : GOTO 250
350 print "time", time$: goto 250
DurexForth on C64

Code: Select all

variable ZS 3 ZS !
variable A
variable B
variable match 0 match !

code jiffy
dex, sei,
$a1 lda, msb sta,x
$a2 lda, lsb sta,x
dex,
$00 lda,# msb sta,x
$a0 lda, cli, lsb sta,x ;code

( isqrt(n): 
    x = n           \ x == old guess
    y = (x + 1) / 2 \ y == new guess
    while y < x:
        x = y
        y = (x + n / x) / 2
)
( Integer square root based on above )
( Returns the largest integer x      )
( for which x * x does not exceed n  )
: sqrt ( n -- root )
   dup 1 + 1 rshift
   begin
    2dup /include 
    over + 1 rshift
    swap over >
   while repeat
   nip
;

: dloop ( cloopI -- )
   -1 nomatch !
   dup \ cloop value 
     sqrt 4 max 3 do
     dup dup I / I * 
     = if 0 nomatch ! leave then
   2 +loop
   drop
;

( pops all three bytes of jiffy clock
onto the stack as two unsigned doubles )
( -- ud ud)
code jiffy
dex, sei,
$a1 lda, msb sta,x
$a2 lda, lsb sta,x
dex,
$00 lda,# msb sta,x
$a0 lda, cli, lsb sta,x ;code

( A B -- )
: benchmark
   decimal
   B ! A ! \ save A&B off stack
   jiffy
   A @ 3 do
    I dloop nomatch @ if \ no match
    I ZS @ - B @ -
     0< invert if
      I . ZS @ . I ZS @ - . leave then 
    I ZS !
    then 2 +loop
   jiffy
;
User avatar
GaBuZoMeu
Posts: 660
Joined: 01 Mar 2017
Location: North-Germany

Re: slightly OT: a simple Benchmark

Post by GaBuZoMeu »

Thank you very much Jeff, for the first Forth incarnation of this silly little benchy!

What puzzles me a bit is the huge deviation between these two Basic results:

Code: Select all

SYM (6502,1MHz)    Basic1.1          BasBenc1            46,7    75,1   812,7  2014,0  3696,1   ____,_
C64 (6502, 1MHz)   Basic             BasBenc1            48,0   118,0  1412,0  3516,0 10447,0  45819,0  THX 2 Jeff_Birt
I knew the C64 runs not exactly @ 1MHz - I think it depends on PAL/NTSC. But what is the reason for the strong slow down for larger numbers? And why is the relation not constant? :?: :shock: :?:

Code: Select all

C64/SYM                                                  1,02    1,57    1,74    1,75    2,83
dmsc
Posts: 153
Joined: 17 Sep 2018

Re: slightly OT: a simple Benchmark

Post by dmsc »

My results for FastBasic ( https://github.com/dmsc/fastbasic ) on NTSC Atari 800XL:

Code: Select all

Atari 800XL (1.79MHz NTSC)   FastBasic_3.5      5.3     9.0      123.3    326.1    620.38
This is running the B=35 test:
Benchmark run with B=35
Benchmark run with B=35

And this is the source code in the Atari IDE:
Benchmark source
Benchmark source
Last edited by dmsc on Mon Sep 17, 2018 5:05 am, edited 1 time in total.
User avatar
barrym95838
Posts: 2056
Joined: 30 Jun 2013
Location: Sacramento, CA, USA

Re: slightly OT: a simple Benchmark

Post by barrym95838 »

Nice ... but weren't most of the NTSC 8-bit Ataris clocked at 1.79 MHz?
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)
Post Reply