
All times are UTC




 Post subject: Benchmarking
PostPosted: Fri Oct 16, 2020 11:51 am 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
Always a highly debated topic and often falls into the "my processor is better than your processor" bun fight, however, having been involved in this area in the world of supercomputers for some time in the past I thought I'd put something here, based on what has been posted recently...

One of the key things is consistency, so over the years a set of standard benchmarks has been designed, and a fantastic collection of the popular ones is held on Roy Longbottom's benchmark website: http://www.roylongbottom.org.uk/classic.htm - however, these are not that applicable to our beloved 8-bit CPU, but they still make for good reading if you're interested in this field... (and as an aside: I worked on supercomputers at Livermore Labs for a while - home of one of those benchmarks!)

Consistency extends to the number format too - if your system is integer only and the benchmark will run in the integer domain, then it's highly likely to be faster than the same benchmark run on a system that uses floating point numbers. Most BASICs (and I mean the 8-bit CPU ones here) use a 4 or 5 byte floating point format and, worse, when you ask them for integer variables they convert them to floating point to do the calculations and then back again! (Exceptions to this are Apple Integer BASIC and BBC BASIC - when you use integer variables, BBC BASIC will perform the entire calculation in integer mode if it can.)

Another "cheat" to watch out for is BASICs (and other languages) that give special treatment to some variables - BBC BASIC has 27 integer variables, @% through Z%, stored at fixed memory locations, and these are much faster to use than a generic name% variable. Other BASICs and interpreted languages may also have such features, so, again, consistency is key to make an accurate comparison.

Same for putting spaces in (interpreted) programs, and tricks like multiple statements per line and subroutines at the start of the program... There is a difference between having a consistent benchmark and showing off... (Which doesn't mean that you should never show off - but establish the baseline first)

Something that's popular and I've had on my website for a while, and gets used from time to time is a BASIC version of Mandelbrot. I adapted it from some that were online (and wrong!) and it's been used by quite a few people - often just as a demo if nothing else... I designed it to work in most generic "Microsoft style" BASICs - e.g. EhBASIC and Applesoft as well as BBC Basic without any changes - as consistency is as important as anything in this world, so it's not the most efficient algorithm, not the best code, but it works. I've attached the text version of the code as well as the output it produces (or should produce!)

Another "classic" benchmark, aimed more at the home PC, is the Byte Sieve benchmark - see https://en.wikipedia.org/wiki/Byte_Sieve
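For anyone who hasn't met it, the sieve is small enough to sketch here. This is an illustration in Python rather than the published listing: it assumes the commonly quoted array size of 8190 and runs a single pass (the published benchmark repeats the pass), and `byte_sieve` is just a name made up for this post.

```python
def byte_sieve(size=8190):
    """One pass of a Byte-Sieve-style loop.

    Index i stands for the odd number 2*i + 3, so only odd candidates
    are stored. Returns how many of them are prime.
    """
    flags = [True] * (size + 1)
    count = 0
    for i in range(size + 1):
        if flags[i]:
            prime = i + i + 3      # the odd number this index represents
            k = i + prime          # index of 3 * prime, the first odd multiple
            while k <= size:
                flags[k] = False   # strike out every odd multiple of prime
                k += prime
            count += 1
    return count


if __name__ == "__main__":
    print(byte_sieve())   # 1899 for the default size
```

Because the inner loop is nothing but integer adds, compares and array stores, it mostly measures loop and array overhead rather than floating point, which is worth bearing in mind when comparing it with the Mandelbrot timings.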

The one I mentioned earlier - Ackermann - is here: https://en.wikipedia.org/wiki/Ackermann_function and the other one mentioned earlier, Tak (the Takeuchi function), is here: https://en.wikipedia.org/wiki/Tak_(function)
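Both are tiny recursive functions whose main job is to stress procedure calls. A rough Python sketch of the usual textbook definitions follows (the Tak variant shown is the one that returns z); the argument values are just the ones commonly used, not anything prescribed in this thread.

```python
def ackermann(m, n):
    """Two-argument Ackermann function, straight from the recursive definition."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))


def tak(x, y, z):
    """Tak function (the variant that returns z when y >= x)."""
    if y < x:
        return tak(tak(x - 1, y, z), tak(y - 1, z, x), tak(z - 1, x, y))
    return z


if __name__ == "__main__":
    print(ackermann(2, 3))   # 9
    print(ackermann(3, 3))   # 61
    print(tak(18, 12, 6))    # 7
```

Keep the Ackermann arguments small: the recursion depth grows very quickly, and that rather than arithmetic is what the benchmark is exercising.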

Finally, a set of very simple BASIC benchmarks, popular (probably mostly in the UK), are the Rugg/Feldman benchmarks: https://en.wikipedia.org/wiki/Rugg/Feldman_benchmarks

Anyway, just some food for thought!

Cheers,

-Gordon


Attachments:
mandel.bas.txt [1.06 KiB]
mandel.txt [3.17 KiB]

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
 Post subject: Re: Benchmarking
PostPosted: Fri Oct 16, 2020 4:34 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Thanks for the food for thought, Gordon. Allow me to add one tidbit to the menu! :) This one pertains to hardware.

drogon wrote:
consistency is key to make an accurate comparison
Agreed. Yet, something as seemingly simple as clock rate can be wrongly prioritized, resulting in a skewed playing field. Here's an example which unfortunately turns up on this forum from time to time.

Comparing a 6502 with a Z80 running at the same clock rate skews the results strongly in favor of 6502. That's because the 6502, at one clock per memory access, would necessarily require much faster RAM and ROM than a Z80 running at the same clock rate. Thus the 6502 would have the advantage of drastically improved bandwidth in this slanted comparison. And of course the faster memory (and IO) would have a heavy impact on system cost.

IMO a more sensible comparison would begin with the assumption that both CPUs will use the same speed of memory. In that context the Z80 will have a much faster clock rate, because the Z80, which happens to use a fine-grained clock, takes 3 or 4 clocks per memory access. It'll need a faster crystal, but that doesn't impact system cost.

(Some say a 1 MHz 6502 can outperform a 4 MHz Z80, and I don't dispute that. My only concern is that the basis of comparison be valid.)
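To put rough numbers on the equal-memory-speed argument, here is a small Python sketch. It only divides clock rate by clocks per memory access, using the figures given above (1 clock for the 6502, 3 or 4 for the Z80); the function names are made up for illustration, and it says nothing about how much useful work each access achieves.

```python
def accesses_per_second(clock_hz, clocks_per_access):
    """Memory accesses per second the CPU can demand at a given clock rate."""
    return clock_hz / clocks_per_access


def clock_for_memory_rate(accesses_per_s, clocks_per_access):
    """Clock rate needed to keep memory of a given speed fully busy."""
    return accesses_per_s * clocks_per_access


if __name__ == "__main__":
    mhz = 1_000_000
    # Same clock: the 6502 needs memory three to four times faster.
    print(accesses_per_second(1 * mhz, 1))    # 6502 at 1 MHz
    print(accesses_per_second(1 * mhz, 4))    # Z80 at 1 MHz, 4 clocks per access
    # Same memory (1 million accesses/s): the Z80 clock rises instead.
    print(clock_for_memory_rate(1 * mhz, 3))  # Z80, 3 clocks per access
    print(clock_for_memory_rate(1 * mhz, 4))  # Z80, 4 clocks per access
```

On those assumptions, memory that suits a 1 MHz 6502 would pair with a Z80 clocked at roughly 3 to 4 MHz, which is the comparison being argued for.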

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


 Post subject: Re: Benchmarking
PostPosted: Sun Nov 01, 2020 1:07 pm 

Joined: Sat Jul 09, 2016 6:01 pm
Posts: 180
drogon wrote:
Something that's popular and I've had on my website for a while, and gets used from time to time is a BASIC version of Mandelbrot. I adapted it from some that were online (and wrong!) and it's been used by quite a few people - often just as a demo if nothing else... I designed it to work in most generic "Microsoft style" BASICs - e.g. EhBASIC and Applesoft as well as BBC Basic without any changes - as consistency is as important as anything in this world, so it's not the most efficient algorithm, not the best code, but it works. I've attached the text version of the code as well as the output it produces (or should produce!)

What a nice idea to draw Mandelbrot in ASCII! I have run this program (W=38, H=21) for the BBC Micro (B-em emulator), Commodore 64/+4 and Amstrad CPC 6128, results are in the next table.

Code:
System                Time (s)
ABC 802                93
BBC Master (mode 7)   111.95
BBC Master (mode 6)   112.04
BBC Micro B (mode 7)  144.96
Amstrad CPC 6128      163.43
Electronika BK0010-01 205
Commodore 128 (fast)  297.57
Commodore 64          384.16
Atari 800XL           394.12
Commodore +4          485.85
MSX2                  554.98
Commodore 128         620.2
TI-99/4A   (+XB)      757  (T40XB utility is used)


I would suggest adding one more variable for the timer resolution used in line 510, because it differs between systems: 100 ticks per second for the BBC Micro, 60 for the Commodore/MSX2, and 300 for the Amstrad CPC.
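The idea, sketched in Python rather than BASIC so it isn't tied to any one dialect: keep the tick rate in one place and divide by it. The table only holds the three rates quoted above, and `TICKS_PER_SECOND` and `ticks_to_seconds` are names invented for this sketch.

```python
# Timer ticks per second, as quoted in this post. The 60 figure assumes
# a 60 Hz machine; other video standards may tick at a different rate.
TICKS_PER_SECOND = {
    "BBC Micro": 100,
    "Commodore": 60,
    "MSX2": 60,
    "Amstrad CPC": 300,
}


def ticks_to_seconds(ticks, system):
    """Convert a raw timer difference into seconds for the named system."""
    return ticks / TICKS_PER_SECOND[system]


if __name__ == "__main__":
    # The same raw count means very different times on different machines.
    for name in TICKS_PER_SECOND:
        print(name, ticks_to_seconds(6000, name))
```

In the BASIC program this would amount to one extra variable set near the top and used as the divisor when the elapsed time is printed.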


Attachments:
ti99-4a-xb-t40.png [ 1.09 KiB ]
bk0010-01-mandel-ascii.png [ 10.94 KiB ]
msx2-mandel-ascii.png [ 28.55 KiB ]
atari-xl-mandel-ascii.png [ 5.11 KiB ]
cbm-128-vdc-fast-mandel-ascii.png [ 17.86 KiB ]
cbm-64-mandel-ascii.png [ 20.29 KiB ]
cbm-128-mandel-ascii.png [ 51.32 KiB ]
cpc6128-madndel-ascii.png [ 12.74 KiB ]
cbm-plus4-madndel-ascii.png [ 16.36 KiB ]
bbc-micro-mode7-madndel-ascii.png [ 10.42 KiB ]
bbc-micro-mode6-madndel-ascii.png [ 6.24 KiB ]

_________________
my blog about processors


Last edited by litwr on Sun Sep 26, 2021 8:20 pm, edited 17 times in total.
 Post subject: Re: Benchmarking
PostPosted: Sun Nov 01, 2020 2:24 pm 

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 730
Location: Tokyo, Japan
Another thing that can make a difference is the size of data you are required to handle. The 6800 and the 6502 are fairly similar CPUs in many ways, with the 6502 often a bit faster for typical work, but if you're doing something like string operations and you're required to work with strings up to 260 bytes rather than 250, the 6800 may become faster simply because the penalty of having only one index register (vs. "multiple," in a way, for the 6502) becomes outweighed by the 6800 doing 16-bit indexing more conveniently than the 6502, which does "8-bit" indexing (again, in a way). There are of course further subtleties to this....

_________________
Curt J. Sampson - github.com/0cjs


 Post subject: Re: Benchmarking
PostPosted: Sun Nov 01, 2020 5:52 pm 

Joined: Sat Jul 09, 2016 6:01 pm
Posts: 180
I have added information about the Commodore 128. Its result is almost 3 times slower than the C64!
Dear cjs, it seems you have posted in the wrong thread.

_________________
my blog about processors


Last edited by litwr on Sat Nov 07, 2020 9:16 pm, edited 1 time in total.

 Post subject: Re: Benchmarking
PostPosted: Sun Nov 01, 2020 7:40 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
(The C64 seems to be the only one with an unduplicated middle line, so I suspect it is running iterations on slightly different numbers.)


 Post subject: Re: Benchmarking
PostPosted: Sun Nov 01, 2020 8:27 pm 

Joined: Sat Jul 09, 2016 6:01 pm
Posts: 180
BigEd wrote:
(The C64 seems to be the only one with an unduplicated middle line, so I suspect it is running iterations on slightly different numbers.)

Thank you. It seems I made a typo somewhere. It is fixed, and now the C64 is not so fast: only 1.6 times faster than the C128.

EDIT. I have also added data for the MSX2 which shows a surprisingly slow result.

_________________
my blog about processors


 Post subject: Re: Benchmarking
PostPosted: Mon Nov 02, 2020 9:57 am 

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 730
Location: Tokyo, Japan
litwr wrote:
I have also added data for the MSX2 which shows a surprisingly slow result.

It's not really all that surprising, no. Unlike other Microsoft BASICs, MSX BASIC uses decimal rather than binary significands, which is probably a performance hit right there, and, more importantly, in your case is probably using double precision floating point (14 digit significands and a 7-bit binary exponent).

I have tried to replicate your benchmark (more on this below) and came out with 521.9 seconds (versus your 554.98) based on the TIME variable and confirmed roughly against a wall clock on my Japanese Sanyo MPC-2 (Wavy2), which is an NTSC MSX1 machine running an earlier version of BASIC (1.0 instead of 2.0).

In the benchmark above I added a line 165 DEFDBL A-Z to ensure that I was using double precision (though this is the default, and the result is the same without it). Changing that to DEFSNG uses 6-digit significands, which I think is a fairer comparison to other MS-BASICs, and seems to produce the same visual output. This takes 413.37 seconds, a 21% improvement, which puts the MSX machine between the Commodore Plus/4 (MSX1 being 15% faster) and the Commodore 64 (MSX1 being 7% slower).
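For anyone wanting to check the percentages, this is how they fall out of the times quoted in this thread. A small Python sketch; the two helper names are invented here, and note that "faster" is taken relative to the slower time while "slower" is taken relative to the faster one.

```python
def percent_faster(slow_s, fast_s):
    """Time saved by the faster run, as a percentage of the slower run."""
    return 100.0 * (slow_s - fast_s) / slow_s


def percent_slower(fast_s, slow_s):
    """Extra time taken by the slower run, as a percentage of the faster run."""
    return 100.0 * (slow_s - fast_s) / fast_s


if __name__ == "__main__":
    msx1_double = 521.9    # seconds, DEFDBL
    msx1_single = 413.37   # seconds, DEFSNG
    plus4 = 485.85         # seconds, Commodore +4 from the table above
    c64 = 384.16           # seconds, Commodore 64 from the table above

    print(round(percent_faster(msx1_double, msx1_single), 1))  # 20.8
    print(round(percent_faster(plus4, msx1_single), 1))        # 14.9
    print(round(percent_slower(c64, msx1_single), 1))          # 7.6
```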

(I've also run the benchmarks above on a Japanese Sony HB-55, also MSX1, and came out with the same numbers within .02%: 521.98 s and 413.45 s.)

Regarding your post, especially in its original version, you could have made it a lot more clear what changes you were making to the original benchmark code. It took me some time to replicate your output, and it didn't help that for the MSX one you've both lost the top line of the output and terminated the program differently (or not terminated at all, I think; I replicated with 515 GOTO 515 to prevent the program from exiting). You should also check the result from TIME against a wall clock; since TIME doesn't increment when interrupts are disabled it may undercount the time used. (I confirmed that it's very close on my machine, but this may vary across MSX models.) You should also mention that dividing TIME by 60 to get seconds applies only to NTSC MSX models; since it's based on the video refresh interrupt this would need to be 50 on PAL machines.

I'm not clear on why your MSX2 is noticeably slower (6%) than my MSX1, despite being a newer machine running a newer version of BASIC. Perhaps it's actually faster and you're using a PAL model? Without knowing what hardware you tested on, it's hard even to guess. (I'll have another go at this on one of my MSX2 machines if I can get one up and running; I'm currently waiting on a power supply for them.)

_________________
Curt J. Sampson - github.com/0cjs


 Post subject: Re: Benchmarking
PostPosted: Tue Nov 03, 2020 1:23 am 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
On a simulated Amstrad NC200, running the Z80 version of BBC BASIC, I got 664.95 seconds. I verified that TIME ticks at 100Hz.

On that machine, the display is only 16 lines tall, so it can only show a third of the "image" at a time. But I checked the output against the reference as it was produced, and it seemed to match.


 Post subject: Re: Benchmarking
PostPosted: Thu Nov 05, 2020 10:44 am 

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 730
Location: Tokyo, Japan
Chromatix wrote:
On a simulated Amstrad NC200, running the Z80 version of BBC BASIC, I got 664.95 seconds.

I assume that this is a cycle-accurate simulation of the NC200's 4 MHz Z80 and you were using w=38 h=21, as all the other benchmarks (except the original program) were?

Is the NC200 version of BASIC one that uses 8-byte floats? That's a little slower than I thought it would be on a 4 MHz Z80, particularly with BBC BASIC, which I had assumed would mostly be faster than MS BASIC. (But perhaps the rewrite of the MS-BASIC floating point routines to decimal single- and double-precision was done by someone who really knew how to optimize them.)

Your figure is about 27% slower than the MS-BASIC decimal 8-byte doubles on a 4 MHz MSX1 machine. To compare on another mobile platform I loaded up the program on my TRS-80 Model 100 (which, like MSX, has an MS-BASIC with 8-byte double- and 4-byte single-precision decimal floats) with a 2.4 MHz 8085 processor. Even here (with the usual w=38 h=21) I got very competitive times: 701 s for double-precision and 557 s for single-precision. (The times are accurate only to the second because the Model 100 has no TIME tick clock as far as I'm aware; instead I just print out the human-readable TIME$ for the start and end of the run and manually calculate the elapsed seconds from the HH:MM:SS output.)
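The manual HH:MM:SS arithmetic is easy to get wrong, so here is the calculation as a small Python sketch. The timestamps in the example are invented purely to show the mechanics (they are not the actual readings), and the function names are hypothetical.

```python
def hms_to_seconds(stamp):
    """Turn an 'HH:MM:SS' string into seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def elapsed_seconds(start, end):
    """Seconds between two clock readings, allowing one wrap past midnight."""
    return (hms_to_seconds(end) - hms_to_seconds(start)) % 86400


if __name__ == "__main__":
    # Invented readings that happen to be 701 seconds apart.
    print(elapsed_seconds("10:00:00", "10:11:41"))   # 701
    # A run that crosses midnight still comes out positive.
    print(elapsed_seconds("23:59:30", "00:00:15"))   # 45
```

Since each reading is only good to a second, the elapsed figure can be off by up to a second either way, which is negligible on runs of several hundred seconds.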

_________________
Curt J. Sampson - github.com/0cjs


 Post subject: Re: Benchmarking
PostPosted: Thu Nov 05, 2020 12:05 pm 

Joined: Sat Jan 02, 2016 10:22 am
Posts: 197
I have a BBC Master 128 (MOS 3.5) along with the Z80 co-pro, so I've run the "benchmark" with W38 H21 on both the 65C102 and the Z80B. The improved maths routines from the final version of 6502 BBC BASIC are sufficient to edge out the Z80, despite it being clocked 3x as fast and despite the handicap of having to handle the VDU output. I got a minimal improvement on the native CPU by swapping to Mode 7 and, as expected, no difference on the copro, where the VDU output is handled by the host.

No MSX systems here, but I do have an MTX, which has similar spec. That turned in a time just short of 6 mins, for a Z80 CPU using 5 byte floating point maths.

Code:
Master 128 Basic 4r32
Mode 6 : 101.79
Mode 7 : 101.70

Master Z80B copro Z80 Basic 2.20
Mode 6 : 106.75
Mode 7 : 106.75

MTX512 4 MHz Z80 TMS 9929A
text mode 5m47s  (347s)


 Post subject: Re: Benchmarking
PostPosted: Thu Nov 05, 2020 12:53 pm 

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 730
Location: Tokyo, Japan
The NEC PC-8201 is a Kyocera Kyotronic 85-based computer similar to the TRS-80 Model 100, also with a 2.4 MHz 8085. However, it uses "N₈₂-BASIC," which is a Microsoft BASIC but presumably more similar to their N₈₀-BASIC and N₈₈-BASIC, used on the PC-8001 (1979-09) and PC-8801 (1981-12). It has single- and double-precision floats (defaulting to single precision, unusually), but I don't know the formats.

In the default single-precision MANDEL.BA runs in 410 seconds, 28% faster than the M100's 557 second single-precision result, which is interesting. (Could this be due to using the old MS floating point format?) I don't have a double-precision result because it turns out that you can't use double-precision floats as the index in a FOR loop; attempting this produces ?TY Error (type error). And DEFDBL won't take something like DEFDBL X2; you can specify only the first letter.

Gordon, would you be interested/willing to rewrite your program so that integer and floating point variables each have a distinct set of initial letters? That would fix this situation and also make it easier to test using integer variables for a few non-floating-point values on other machines (though in my attempts to experiment with this on MSX this didn't make much difference).

Martin A wrote:
The improved maths routines from the final version of 6502 BBC basic is sufficient to edge out the Z80 clocked 3x as fast and the handicap of having to handle the VDU output.

I don't think I'm too surprised by that; I'm not sure that even a 3 MHz Z80 is typically as fast as a 1 MHz 6502. My rule of thumb has been that a 1 MHz 6502/6800 clock is about equivalent to a 4 MHz 8080/Z80 clock, though I could be wrong about that.

_________________
Curt J. Sampson - github.com/0cjs


 Post subject: Re: Benchmarking
PostPosted: Thu Nov 05, 2020 3:04 pm 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
cjs wrote:
Gordon, would you be interested/willing to rewrite your program so that integer and floating point variables each have a distinct set of initial letters? That would fix this situation and also make it easier to test using integer variables for a few non-floating-point values on other machines (though in my attempts to experiment with this on MSX this didn't make much difference).


I have another version, currently in BCPL that is designed to use scaled integers. I'll see if I can make a BASIC version of that in the next few days.

So it would be a different program, essentially the same algorithm, also with ASCII output, and usable in BASICs that properly support integers - most MS based BASICs that use '%' to represent integers actually do the calculations in floating point.... Their only use (as far as I can tell!) is to save space in arrays (2 bytes per number vs. 4 or 5, although BBC Basic has 4 byte integers!)
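To give a flavour of what "scaled integers" means here, a minimal Python sketch follows. It is not the BCPL program and not a port of it: the scale factor (12 fractional bits), the iteration limit, the coordinate window and the output characters are all assumptions picked for illustration. Every value is an integer holding the real number multiplied by SCALE, so a product has to be divided by SCALE once to get back to the same scale.

```python
SCALE = 4096   # assumed: 12 fractional bits, so 1.0 is stored as 4096


def to_fixed(x):
    """Convert a real number to its scaled-integer form."""
    return int(round(x * SCALE))


def escape_count(cr, ci, max_iter=16):
    """Iterate z = z*z + c on scaled integers.

    Returns the number of iterations completed before |z|^2 exceeds 4,
    or max_iter if the point never escapes within the limit.
    """
    zr = zi = 0
    for n in range(max_iter):
        zr2 = zr * zr // SCALE          # rescale each product back to SCALE
        zi2 = zi * zi // SCALE
        if zr2 + zi2 > 4 * SCALE:
            return n
        zi = 2 * zr * zi // SCALE + ci  # imaginary part of z*z + c
        zr = zr2 - zi2 + cr             # real part of z*z + c
    return max_iter


def render(width=38, height=21):
    """Return the picture as a list of strings, one per text row."""
    chars = "0123456789ABCDEF "         # a space marks points that never escape
    rows = []
    for row in range(height):
        ci = to_fixed(-1.2 + 2.4 * row / (height - 1))      # assumed window
        line = ""
        for col in range(width):
            cr = to_fixed(-2.0 + 2.5 * col / (width - 1))   # assumed window
            line += chars[escape_count(cr, ci)]
        rows.append(line)
    return rows


if __name__ == "__main__":
    print("\n".join(render()))
```

On a real 8-bit target the scale has to be chosen so that the intermediate products fit the integer size available, which Python's unbounded integers hide. Python's `//` also rounds towards minus infinity, whereas many integer divides truncate towards zero, so the last bit of a negative result can differ between implementations.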

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


 Post subject: Re: Benchmarking
PostPosted: Fri Nov 06, 2020 1:20 pm 

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 578
I have an assembler version of my ASCII art Mandelbrot that uses scaled integers via byte swapping and it's really speedy (about 20 times faster than Basic). I've been contemplating porting it to the 6809 and running it on my CoCo.

https://github.com/Martin-H1/6502/blob/ ... elbrot.asm


 Post subject: Re: Benchmarking
PostPosted: Sat Nov 07, 2020 10:58 am 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Ah, no, my NC200 time was with the original dimensions of 64x48. With the smaller dimensions, it takes 168.73 seconds which is a considerably better showing. I'm using MAME, which should be cycle-accurate as far as the CPU is concerned, though the NC200 emulation isn't completely stable yet.

A quick test program suggests the NC200's BBC BASIC uses a 32-bit mantissa in floating-point, which matches the 5-byte format used in standard BBC BASIC:
Code:
10 C=0 : T=1 : Q=0.5
20 REPEAT : C=C+1 : Q=Q/2 : S=T+Q : UNTIL S=T
30 PRINT "Mantissa bits: ";C
40 ON ERROR REPORT : PRINT : GOTO 70
50 C=0 : T=2
60 REPEAT : C=C+1 : Q=T : T=T*2 : UNTIL Q=T
70 PRINT "Max exponent: +";C
80 ON ERROR REPORT : PRINT : GOTO 110
90 C=0 : T=0.5
100 REPEAT : C=C+1 : T=T/2 : UNTIL T=0
110 PRINT "Min exponent: -";C
120 END
The above should print the following under a standard BBC BASIC:
Code:
Mantissa bits: 32
Too big
Max exponent: +127
Min exponent: -127
The "Too big" in the middle indicates that overflow triggers an error condition. A BASIC using IEEE-754 arithmetic should not do that, but generate an Infinity which is then caught by the UNTIL test.
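For comparison, here are the first two probes redone in Python, which on the usual platforms uses IEEE-754 double precision for its floats (an assumption about the interpreter, not something Python promises everywhere). The function names are made up; the loops follow the BASIC above step for step.

```python
def mantissa_probe():
    """Same loop as lines 10-20: halve q until adding it to 1 changes nothing."""
    count, t, q = 0, 1.0, 0.5
    while True:
        count += 1
        q = q / 2
        if t + q == t:
            return count


def overflow_probe():
    """Same loop as lines 50-60: double t until it stops changing."""
    count, t = 0, 2.0
    while True:
        count += 1
        q = t
        t = t * 2          # overflows quietly to infinity instead of raising
        if q == t:         # true once both q and t are infinity
            return count


if __name__ == "__main__":
    print("Mantissa bits:", mantissa_probe())    # 52 with IEEE doubles
    print("Doublings counted:", overflow_probe())  # 1024 with IEEE doubles
```

Two things to note. The mantissa probe reports 52 rather than 53 because 1 + 2^-53 sits exactly halfway between two representable values and round-to-nearest-even resolves the tie back down to 1, so the count depends on rounding behaviour as well as mantissa width. And the overflow probe runs to completion with no error, which is the Infinity behaviour described above; its count includes the final pass where both values are already infinite, so it is one higher than the largest usable exponent.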

