slightly OT: a simple Benchmark
Re: slightly OT: a simple Benchmark
Hi!
Yes, was a little typo.
But note that in the standard text mode, only 63.8% of the total cycles are available to the CPU, giving an equivalent of 1.14MHz.
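As a quick check of that figure, the arithmetic can be sketched in a few lines of Python. The 63.8% share is the number from this post; the 1.7897725MHz clock is the commonly quoted NTSC value and the function name is only for illustration.

```python
# Effective CPU clock when part of the cycles are taken by display DMA/refresh.
NTSC_CLOCK_MHZ = 1.7897725   # commonly quoted NTSC Atari 8-bit clock (assumption)
CPU_SHARE_TEXT_MODE = 0.638  # fraction of cycles left for the CPU, from the post


def effective_clock_mhz(clock_mhz: float, cpu_share: float) -> float:
    """Return the clock an unhindered CPU would need to do the same work."""
    return clock_mhz * cpu_share


print(f"{effective_clock_mhz(NTSC_CLOCK_MHZ, CPU_SHARE_TEXT_MODE):.2f} MHz")  # 1.14 MHz
```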
barrym95838 wrote:
Nice ... but weren't most of the NTSC 8-bit Ataris clocked at 1.79 MHz?
Re: slightly OT: a simple Benchmark
I told Arne that I would do a Forth version of the 'Prim' function version of the benchmark which does not use the sqrt function. Before doing that I did do a sqrt function for Forth in assembly which is based on 'Mr. Woo's Abacus Algorithm' (the sqrt2 word I did in Forth previously). I've copied the results of all tests below. The Forth code is at the bottom. The sqrt in assembly was of course the fastest, but if you compare the sqrt2 and Prime versions it gives you an idea of the speed difference between the sqrt done in Forth and the Prime version.
I am quite certain my Forth code is not great, but it works and I'm getting quicker at producing code that actually works!
Code:
170 REM Test values - Results - VBas - CBas - VForth - CForth sqrt2 sqrt2asm Prime
180 REM A 1000,20 - 907, 887,20 - 48s 48s 18s 18s 18 12 11s
190 REM B 2000,30 - 1361, 1327,34 - 118s 118s 30s 30s 26 19 18s
200 REM C 9999,35 - 9587, 9551,36 - 1412s 1412s 374s 374s 334 276 242s
210 REM D 32000,50 - 19661, 19609,52 - 3516s 3516s 1530s 1530s 1435 353 456s
220 REM E 32000,70 - 31469, 31397,72 - 10447s 10477s 2352s 2352s 2190 1860 2068s
230 REM F 500000,100 - 370373, 370261,112 - 45819s
Code:
\ if x is prime return 1 else 0
\ BLOODY FREAKING DECIMAL MODE!!!
: prime ( num -- flag )
3 \ our index ( num index -- )
begin
2dup 2dup mod 0<> -rot
dup * > and
while
2 +
repeat
\ bool on stack as return value
dup * <
;
require timer2
require prime
variable A \ Limit
variable B \ MinDiff
: benchPrime ( nLimit nMinDif -- )
decimal
B ! A ! \ save A&B off stack
start \ start timing
1 1 3 \ hiprim lowprim index
begin ( hiprim loprim index -- )
-rot 2dup ( I hiprim loprim hiprim loprim -- )
- B @ < >r ( I hiprim loprim -- ) [ bool1 ]
rot dup ( hiprim loprim index index ) [ bool1 ]
A @ swap - 0< 0= ( hiprim loprim I bool2 ) [bool 1]
r> and ( hiprim loprim I bool 3 )
while ( hiprim loprim I )
dup prime if ( hiprim loprim index -- )
-rot drop over ( hiprim loprim index -- )
then
2 + \ index += 2
repeat
( hiprim loprim index -- )
drop 2dup - b @ < if
." No solution found" . . space
else
swap . . space
then
stop
;
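For readers who do not speak Forth, here is a rough Python rendering of the two words above. It is only a sketch that follows the same loop conditions as I read them from the listing; the function names are mine, and it returns the result instead of printing and timing it.

```python
def is_prime_odd(num: int) -> bool:
    """Trial division by odd numbers only; meant for odd num >= 3."""
    index = 3
    # Same condition as the Forth loop: continue while index does not divide
    # num and index*index is still below num.
    while num % index != 0 and num > index * index:
        index += 2
    # The loop stopped either on a divisor (index*index <= num) or past sqrt(num).
    return num < index * index


def bench_prime(limit: int, min_diff: int):
    """Return (hi_prime, lo_prime, gap) for the first gap >= min_diff, else None."""
    hi_prime = lo_prime = 1
    index = 3
    while hi_prime - lo_prime < min_diff and index <= limit:
        if is_prime_odd(index):
            lo_prime, hi_prime = hi_prime, index
        index += 2
    if hi_prime - lo_prime < min_diff:
        return None
    return hi_prime, lo_prime, hi_prime - lo_prime


print(bench_prime(1000, 20))  # (907, 887, 20)
```

Note that the divisor check needs no square root at all, which is the point of the 'Prime' variant.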
Re: slightly OT: a simple Benchmark
@Jeff_Birt: thank you again, very interesting results. Never thought about an integer sqrt 
@dmsc: an impressive type of Basic interpreter - reminds me of BASIC09. Have you ever thought about a stripped version that does not require the Atari platform to run? Could be a strong competitor to EhBasic I suppose. Performance rivals Plasma V2.
I updated the table to include all your results - thank you!
Regards,
Arne
Re: slightly OT: a simple Benchmark
Hi!
In fact, I would like to do a stripped version, but I need to find the minimum set of requirements. At least, for the integer IDE, you need:
- Screen output at given position,
- Cursor positioning,
- File loading / saving,
- Keyboard input without echo.
For the floating-point IDE you also need the math-pack. Currently FastBasic uses the Atari-OS math routines (6 bytes per float, in BCD, uses 2KB); for a portable version I think it would be better to use simple binary arithmetic.
GaBuZoMeu wrote:
@dmsc: an impressive type of Basic interpreter - reminds me of BASIC09. Have you ever thought about a stripped version that does not require the Atari platform to run? Could be a strong competitor to EhBasic I suppose. Performance rivals Plasma V2.
Re: slightly OT: a simple Benchmark
Arne pointed out that a few of my results (32000,50 and 32000,70) had odd-looking times, and he was quite correct. I tracked this down to the 32bit subtraction word I did in assembly. Although I looked at it 3-4 times and did not see the error, I redid it to make use of the address calculation syntax of the built-in assembler in DurexForth (instead of lots of inx, dex to shift the stack index around) and it worked. (Bugs me I could not see my error though.) Anyhow, new table below. I ran each test at least twice to verify the results.
Code:
Test values - Results            - VBas   CBas   VForth CForth sqrt2  sqrt2asm Prime
1000,20     - 907, 887,20        - 48s    48s    18s    18s    18s    12s      11s
2000,30     - 1361, 1327,34      - 118s   118s   30s    30s    26s    19s      18s
9999,35     - 9587, 9551,36      - 1412s  1412s  374s   374s   334s   276s     242s
32000,50    - 19661, 19609,52    - 3516s  3516s  1530s  1530s  861s   740s     637s
32000,70    - 31469, 31397,72    - 10447s 10477s 2352s  2352s  1615s  1416s    1209s
500000,100  - 370373, 370261,112 - 45819s
Re: slightly OT: a simple Benchmark
Thank you Jeff for verification! I have edited the main table accordingly.
I'm glad to see that the "Prime" (aka modulo) version is - as assumed - still the fastest one.
Re: slightly OT: a simple Benchmark
I know I am slightly late to this thread, but I thought I would try this out using the custom BASIC I created for my machine, which I call dflat.
It is 16 bit integer only (which helps a lot with performance), and running on a 5.36MHz 65C02. The results are:
A : 3.95s
B : 6.51s
C : 81.61s
It's similar in speed to VTL02 as far as I can see (when adjusted for clock speeds), but I am pleased as dflat is a fairly full implementation of BASIC with loads of string, file, sound and graphics commands.
The benchmark code is as follows:
Code:
1000 def_isPrime(%x)
1005 local %i
1010 %i=3
1020 while((%i*%i)<%x) and ((%x\%i)<>0)
1030 %i=%i+2
1040 wend
1050 if %x<(%i*%i):return 1:else:return 0
1090 endif
1100 enddef
1999 ;
2000 def_start()
2010 print "Range :":input %a
2030 print "Min diff :":input %b
2050 %loPrime=1:%hiPrime=1:%i=1
2075 reset %t
2080 while(%i<%a) and ((%hiPrime-%loPrime)<%b)
2090 %i=%i+2
2100 if _isPrime(%i)
2110 %loPrime=%hiPrime:%hiPrime=%i
2130 endif
2140 wend
2150 if (%hiPrime-%loPrime)<%b
2160 println "No gap that size"
2170 else
2180 println "Hi:",%hiPrime," Lo:",%loPrime
2190 endif
2195 %e=elapsed(%t)
2197 println "Time :",(%e/60),".",((%e\60)*100)/60
2200 enddef
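The time calculation on line 2197 can be sketched in Python as below. This assumes that elapsed() counts 1/60 second ticks (which the division by 60 suggests) and that \ is the remainder operator, as it is used in _isPrime. Whether println pads numbers is not stated in the post; the sketch zero-pads the fraction, since otherwise 5 hundredths would read as ".5".

```python
TICKS_PER_SECOND = 60  # assumption: elapsed() counts 1/60 s ticks


def format_elapsed(ticks: int) -> str:
    """Format a tick count as seconds with two decimal places."""
    seconds, remainder = divmod(ticks, TICKS_PER_SECOND)
    hundredths = remainder * 100 // TICKS_PER_SECOND
    # Zero-pad the fraction so that 5 hundredths prints as ".05", not ".5".
    return f"{seconds}.{hundredths:02d}"


print(format_elapsed(237))  # 3.95
print(format_elapsed(63))   # 1.05
```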
Re: slightly OT: a simple Benchmark
Thank you, dolomiah
I have pasted your results into the main table.
Regards,
Arne
- floobydust
- Posts: 1394
- Joined: 05 Mar 2013
Re: slightly OT: a simple Benchmark
Well, I'm quite late to the party on this one, but I've been working on a CMOS only version of EhBasic and I've pretty much finished it up. It's based on Version 2.22p4 (Klaus' patched version) and with the changes, I just refer to it as Version 2.22p4C. It's a bit smaller, slightly quicker and uses less page zero space as well. I've removed all of the IRQ and NMI code and added an EXIT command. I've also changed the startup a bit that tests and sets the memory amount without prompting the user. The current size is 9884 bytes and requires only the I/O vectors set in the single source file.
I've done numerous timings which are accurate to 0.01 seconds using the counter/timer in the NXP SCC2691 UART, which is configured for 10ms as a Jiffy clock. Using the initial Basic Bench program, plus a few changes, my results are below. The changes reset the Jiffy clock before the test starts, then print the elapsed time after it completes. I've run the tests multiple times and the results are extremely consistent. The worst variation is 0.01 seconds once in a while, as this is the resolution of the timer itself as part of the ISR.
The board is a W65C02 running at 8MHz with 32KB RAM and 32KB ROM using a single 22v10 glue chip and SCC2691 UART. Details on the hardware can be found here:
viewtopic.php?f=4&t=5005
For reference, the actual basic program used is listed below. Line 15 resets the Jiffy clock, line 100 prints the basic elapsed time accurate to 1 second, lines 110 and 120 calculate and print the remaining hundredths of a second.
The benchmark timings are here:
1000,20 = 2.78 seconds
2000,30 = 4.68 seconds
9999,35 = 61.92 seconds
32000,50 = 162.51 seconds
32000,70 = 308.33 seconds
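To compare results from boards with different clocks, the times can be scaled to a common 1MHz. The sketch below does that for the dflat figures (5.36MHz) posted earlier and the EhBasic figures above (8MHz). It assumes run time scales linearly with clock speed, and it says nothing about the algorithmic differences (16 bit integer with modulo versus floating point with SQR).

```python
def mhz_seconds(seconds: float, clock_mhz: float) -> float:
    """Scale a run time to what it would be at 1 MHz, assuming linear scaling."""
    return seconds * clock_mhz


# Timings copied from the posts in this thread (tests A, B and C).
dflat_5_36mhz = {"1000,20": 3.95, "2000,30": 6.51, "9999,35": 81.61}
ehbasic_8mhz = {"1000,20": 2.78, "2000,30": 4.68, "9999,35": 61.92}

for test in dflat_5_36mhz:
    d = mhz_seconds(dflat_5_36mhz[test], 5.36)
    e = mhz_seconds(ehbasic_8mhz[test], 8.0)
    print(f"{test:>8}: dflat {d:7.2f} s  EhBasic {e:7.2f} s  (at 1 MHz)")
```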
Code:
10 ZS = 3 : INPUT A,B
15 CALL 32768
20 FOR C = 3 TO A STEP 2
30 FOR D = 3 TO SQR(C) STEP 2
40 IF INT(C/D)*D = C THEN 80
50 NEXT D
60 IF C-ZS >= B THEN PRINT C,ZS,C-ZS : GOTO 100
70 ZS = C
80 NEXT C
90 PRINT " No Solution " : GOTO 10
100 CALL 57374
110 CALL 32784
120 CALL 32800
130 GOTO 10
Regards, KM
https://github.com/floobydust
Re: slightly OT: a simple Benchmark
floobydust wrote:
Well, I'm quite late to the party on this one, but I've been working on a CMOS only version of EhBasic and I've pretty much finished it up. It's based on Version 2.22p4 (Klaus' patched version) and with the changes, I just refer to it as Version 2.22p4C. It's a bit smaller, slightly quicker and uses less page zero space as well. I've removed all of the IRQ and NMI code and added an EXIT command. I've also changed the startup a bit that tests and sets the memory amount without prompting the user. The current size is 9884 bytes and requires only the I/O vectors set in the single source file.
Have you a link for the source yet?
I was just about to embark on something similar myself though for my little system.
I presume you just iterate up memory until you find something that doesn't test - which may be an issue for me as I have an all-ram system with my monitor currently in high memory, although I can relocate that anywhere.
Cheers,
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Re: slightly OT: a simple Benchmark
drogon wrote:
floobydust wrote:
Well, I'm quite late to the party on this one, but I've been working on a CMOS only version of EhBasic and I've pretty much finished it up. It's based on Version 2.22p4 (Klaus' patched version) and with the changes, I just refer to it as Version 2.22p4C. It's a bit smaller, slightly quicker and uses less page zero space as well. I've removed all of the IRQ and NMI code and added an EXIT command. I've also changed the startup a bit that tests and sets the memory amount without prompting the user. The current size is 9884 bytes and requires only the I/O vectors set in the single source file.
Have you a link for the source yet?
I was just about to embark on something similar myself though for my little system.
I presume you just iterate up memory until you find something that doesn't test - which may be an issue for me as I have an all-ram system with my monitor currently in high memory, although I can relocate that anywhere.
Cheers,
-Gordon
I'll just paste the source here. Note that the source has been changed to assemble/link with WDC tools.
The memory test goes up to the declared Ram_top, so just set that to your max, it also zeros out RAM during the test. Also, you'll need to change the vector defaults for your monitor which are listed at lines: 7428 - 7432. The Page Zero usage is contiguous from $00 - $85. Page $04 is used for the input buffer, Ctrl-C bits and system vectors.
There's only the single source file... one other note, I print the intro message from my Monitor code and then JMP to the start of EhBasic, which is $B000 on my system. You can easily change any of the locations for your monitor. Have fun.... and let me know how it works out for you.
Regards, KM
https://github.com/floobydust
Re: slightly OT: a simple Benchmark
floobydust wrote:
The memory test goes up to the declared Ram_top, so just set that to your max, it also zeros out RAM during the test. Also, you'll need to change the vector defaults for your monitor which are listed at lines: 7428 - 7432. The Page Zero usage is contiguous from $00 - $85. Page $04 is used for the input buffer, Ctrl-C bits and system vectors.
There's only the single source file... one other note, I print the intro message from my Monitor code and then JMP to the start of EhBasic, which is $B000 on my system. You can easily change any of the locations for your monitor. Have fun.... and let me know how it works out for you.
I only have ca65 under Linux right now, so it might take a bit to massage it for that - then work out how to get it into my system, but I'll let you know how it goes.
thanks,
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Re: slightly OT: a simple Benchmark
drogon wrote:
floobydust wrote:
The memory test goes up to the declared Ram_top, so just set that to your max, it also zeros out RAM during the test. Also, you'll need to change the vector defaults for your monitor which are listed at lines: 7428 - 7432. The Page Zero usage is contiguous from $00 - $85. Page $04 is used for the input buffer, Ctrl-C bits and system vectors.
There's only the single source file... one other note, I print the intro message from my Monitor code and then JMP to the start of EhBasic, which is $B000 on my system. You can easily change any of the locations for your monitor. Have fun.... and let me know how it works out for you.
I only have ca65 under Linux right now, so it might take a bit to massage it for that - then work out how to get it into my system, but I'll let you know how it goes.
thanks,
-Gordon
Assembling with ca65 was easy - add the flag to enable colon-less labels and change .equ into =.
Had to do some digging to work out exactly what EhBasic expected for the character in/out routines, as the ones in my monitor weren't quite compatible. Also worked out that it checks for Ctrl-C very often and that's a slow operation in my system right now, so even though I'm running at 16MHz, the results of that little benchmark above are slower. However, this is exactly why I've built the little 6502 test rig before I dive into 65816 land - to make sure I get the little stuff like this sorted first.
Anyway, with Ctrl-C disabled I get almost exactly half the times you get. I have no hardware timers, but using minicom in timestamp mode works fine and the times are half of those above. My 65c02 is running at 16MHz so that's very encouraging. Running a "real" program is always better than just doing simple memory tests, etc. too (I used to write test & diagnostics once upon a time - our users would often find some code that broke the boards that passed all our tests!)
This run:
Code:
[2018-11-16 22:30:45.909] RUN
[2018-11-16 22:30:46.929] ? 32000,70
[2018-11-16 22:33:40.751] 31469 31397 72
[2018-11-16 22:33:40.768]
[2018-11-16 22:33:40.769] Ready
Code:
[2018-11-16 22:39:22.227] RUN
[2018-11-16 22:39:26.906] ? 32000,70
[2018-11-16 22:39:34.096]
[2018-11-16 22:42:07.197] 31469 31397 72
[2018-11-16 22:42:07.205]
[2018-11-16 22:42:07.208] Ready
Anyway, thanks again,
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Re: slightly OT: a simple Benchmark
Hi Gordon,
Thanks for the feedback. Glad it's working for you. Not sure why checking Ctrl-C would be sluggish on your system. My C02 Pocket SBC is using a NXP SCC2691 with interrupt-driven code for receive, transmit, timer/counter and received break. Perhaps your routine that checks for a key without waiting is the culprit.
The BIOS on my board simply checks the input buffer count. If the count is zero, it returns with the carry flag clear. As the ISR updates the input buffer count, it's a very fast check. If a character is in the buffer, meaning the count isn't zero, it just drops down to the routine to get the character from the buffer.
Code:
;Character Input routines
;CHRIN_NW uses CHRIN, returns if a character is not available from the buffer with carry flag clear
; else returns with character in A reg and carry flag set. CHRIN waits for a character to be in the
; buffer, then returns with carry flag set. Receive is IRQ driven/buffered with a size of 128 bytes
CHRIN_NW        CLC             ;Clear Carry flag for no character
                LDA     ICNT    ;Get character count
                BNE     GET_CH  ;Branch if buffer is not empty
                RTS             ;and return to caller
;
CHRIN           LDA     ICNT    ;Get character count
                BEQ     CHRIN   ;If zero (no character, loop back)
;
GET_CH          PHY             ;Save Y reg
                LDY     IHEAD   ;Get the buffer head pointer
                LDA     IBUF,Y  ;Get the character from the buffer
                INC     IHEAD   ;Increment head pointer
                RMB7    IHEAD   ;Strip off bit 7, 128 bytes only
;
                DEC     ICNT    ;Decrement the buffer count
                PLY             ;Restore Y Reg
                SEC             ;Set Carry flag for character available
                RTS             ;Return to caller with character in A reg
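For anyone who finds the logic easier to follow in a high-level language, here is a small Python model of the same buffer scheme. It is only a sketch: the receive ISR is not shown in the listing, so isr_receive and the tail index are hypothetical stand-ins, and None plays the role of "carry clear".

```python
BUFFER_SIZE = 128  # matches the 128-byte receive buffer in the listing


class InputBuffer:
    def __init__(self) -> None:
        self.ibuf = bytearray(BUFFER_SIZE)
        self.ihead = 0  # next byte to read (IHEAD)
        self.itail = 0  # next free slot; hypothetical, the ISR side is not shown
        self.icnt = 0   # number of bytes waiting (ICNT)

    def isr_receive(self, byte: int) -> None:
        """Stand-in for the receive ISR (an assumption, not the real code)."""
        if self.icnt < BUFFER_SIZE:
            self.ibuf[self.itail] = byte
            self.itail = (self.itail + 1) & (BUFFER_SIZE - 1)
            self.icnt += 1

    def chrin_nw(self):
        """Return None when empty (carry clear), else the next byte (carry set)."""
        if self.icnt == 0:
            return None  # only the count is examined, so the empty case is cheap
        byte = self.ibuf[self.ihead]
        self.ihead = (self.ihead + 1) & (BUFFER_SIZE - 1)  # INC then RMB7
        self.icnt -= 1
        return byte
```

The empty case touches nothing but the count, which is why a frequent Ctrl-C poll costs so little with this arrangement.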
Regards, KM
https://github.com/floobydust
Re: slightly OT: a simple Benchmark
floobydust wrote:
Hi Gordon,
Thanks for the feedback. Glad it's working for you. Not sure why checking Ctrl-C would be sluggish on your system.
Other than a latch & 8 LEDs, the 6502 side doesn't have any directly connected peripherals. All it has is the top 256 bytes of RAM, which is shared with the ATmega host. So to check for a key (or to print, etc.), the 6502 puts a command in the shared memory area and executes WAI. The ATmega sees the Rdy line going low, then /BE's the 6502 and un-tri-states its own bus control lines (A0-7, D0-7, R/W), reads the command from RAM, does what's needed (e.g. checks to see if a key is ready to be read on its serial line), updates the shared RAM with the result, then releases control, hands it back to the 6502 and blips the IRQ line. The 6502 wakes up and carries on.
One niggle is that it can take the ATmega up to 64µs to recognise the Rdy line going low from the 6502 - this is due to the video generation being done by the ATmega, and even if a video scan-line isn't being output, if you hit the scan-line interrupt then that's another 64µs gone. It's not ideal, and somewhat cumbersome, but it's workable for now. Video generation (320x240) takes up some 60-70% of the ATmega execution time )-:
Plan B is to use a spare output pin from the ATmega via the GAL (I have spare in & out pins on it) and into D7, then the 6502 can poll the IO address and if N then it knows the ATmega has "something" for it, so it can then initiate the lengthy exchange process. I'll do that today then I think it might be PCB time (with space for a 6522)
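The difference between the two approaches can be illustrated with a toy Python model. Everything in it is hypothetical (class, method names, the counter); it only shows that polling a cheap status bit first avoids the slow mailbox round trip whenever nothing is waiting.

```python
class Host:
    """Toy stand-in for the ATmega side; all names here are hypothetical."""

    def __init__(self) -> None:
        self.pending = []   # keys waiting on the host's serial line
        self.exchanges = 0  # how many slow mailbox round trips happened

    def status_port(self) -> int:
        """Plan B: bit 7 is set when the host has something for the 6502."""
        return 0x80 if self.pending else 0x00

    def exchange_get_key(self):
        """Slow path: command in shared RAM, WAI, host services it, IRQ."""
        self.exchanges += 1
        return self.pending.pop(0) if self.pending else None


def check_key_plan_a(host: Host):
    return host.exchange_get_key()  # always pays for the round trip


def check_key_plan_b(host: Host):
    if host.status_port() & 0x80:   # cheap poll, like testing N after a load
        return host.exchange_get_key()
    return None
```

With an interpreter that polls for Ctrl-C very often, almost every poll finds nothing waiting, so nearly all of the round trips disappear under the second approach.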
Cheers,
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/