
All times are UTC




PostPosted: Mon Sep 17, 2018 5:05 am 

Joined: Mon Sep 17, 2018 2:39 am
Posts: 138
Hi!

barrym95838 wrote:
Nice ... but weren't most of the NTSC 8-bit Ataris clocked at 1.79 MHz?


Yes, that was a small typo.

But note that in the standard text mode, only 63.8% of the total cycles are available to the CPU, which gives an equivalent clock of about 1.14 MHz.
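
As a quick check of that figure, here is a small sketch that assumes the nominal 1.79 MHz NTSC clock quoted above and the 63.8% share; the variable names are only illustrative:

```python
# Effective CPU clock when only part of the cycles reach the CPU.
clock_mhz = 1.79   # nominal NTSC Atari 8-bit clock, as quoted above
cpu_share = 0.638  # fraction of cycles left for the CPU in standard text mode

effective_mhz = clock_mhz * cpu_share
print(f"{effective_mhz:.2f} MHz")
```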


PostPosted: Tue Sep 18, 2018 5:15 pm 

Joined: Wed Jul 18, 2018 12:12 pm
Posts: 96
I told Arne that I would do a Forth version of the 'Prime' function version of the benchmark, which does not use the sqrt function. Before doing that I wrote a sqrt function for Forth in assembly, based on 'Mr. Woo's Abacus Algorithm' (the same algorithm as the sqrt2 word I did in Forth previously). I've copied the results of all tests below. The Forth code is at the bottom. The sqrt in assembly was of course the fastest, but if you compare the sqrt2 and Prime versions it gives you an idea of the speed difference between the sqrt done in Forth and the Prime version.

I am quite certain my Forth code is not great, but it works and I'm getting quicker at producing code that actually works!

Code:
170 REM  Test values -       Results        - VBas - CBas - VForth - CForth sqrt2 sqrt2asm Prime
180 REM A    1000,20 -    907,     887,20 -    48s    48s    18s      18s    18s     12s    11s
190 REM B    2000,30 -   1361,    1327,34 -   118s   118s    30s      30s    26s     19s    18s
200 REM C    9999,35 -   9587,    9551,36 -  1412s  1412s   374s     374s   334s    276s   242s
210 REM D   32000,50 -  19661,   19609,52 -  3516s  3516s  1530s    1530s  1435s    353s   456s
220 REM E   32000,70 -  31469,   31397,72 - 10447s 10477s  2352s    2352s  2190s   1860s  2068s
230 REM F 500000,100 - 370373,  370261,112 - 45819s


Code:
\ if x is prime return 1 else 0
\ BLOODY FREAKING DECIMAL MODE!!!
: prime ( num -- flag )
    3 \ our index ( num index -- )
    begin
        2dup 2dup mod 0<> -rot
        dup * > and   
    while
        2 +
    repeat

    \ bool on stack as return value
    dup * <
;

require timer2
require prime

variable A \ Limit
variable B \ MinDiff

: benchPrime ( nLimit nMinDif -- )
   decimal
   B ! A ! \ save A&B off stack
   start   \ start timing
   1 1 3   \ hiprim lowprim index

   begin ( hiprim loprim index -- )
    -rot 2dup           ( I hiprim loprim hiprim loprim -- )
    - B @ < >r          ( I hiprim loprim -- ) [ bool1 ]
    rot dup             ( hiprim loprim index index ) [ bool1 ]
    A @ swap - 0< 0=    ( hiprim loprim I bool2 ) [bool 1]
    r> and              ( hiprim loprim I bool 3 )
   while                ( hiprim loprim I )
    dup prime if        ( hiprim loprim index -- )
      -rot drop over    ( hiprim loprim index -- )
    then
    2 + \ index += 2
   repeat

   ( hiprim loprim index -- )
   drop 2dup - B @ < if
    ." No solution found" . . space
   else
    swap . . space
   then
   stop
;
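
For readers who do not follow Forth stack juggling, the two words above can be sketched in Python. This is only a rough model of the same logic, not the posted code: the names `is_prime_odd` and `find_gap` are made up for illustration, and the timing words are left out.

```python
def is_prime_odd(n):
    """Trial division by odd numbers, like the Forth word 'prime'.

    Only meant for odd n >= 3, which is all the benchmark passes in.
    """
    i = 3
    # Keep going while i does not divide n and i*i is still below n.
    while n % i != 0 and n > i * i:
        i += 2
    # The loop stopped either on a divisor (n >= i*i) or past sqrt(n).
    return n < i * i


def find_gap(limit, min_diff):
    """Find the first pair of neighbouring primes at least min_diff apart."""
    hi, lo, i = 1, 1, 3
    while hi - lo < min_diff and i <= limit:
        if is_prime_odd(i):
            lo, hi = hi, i  # previous high prime becomes the low one
        i += 2
    if hi - lo < min_diff:
        return None  # corresponds to "No solution found"
    return hi, lo


print(find_gap(1000, 20))
print(find_gap(2000, 30))
```

The final comparison `n < i * i` is what lets the loop get away without a sqrt: a divisor always stops the loop while `i * i` is still less than or equal to `n`.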


PostPosted: Tue Sep 18, 2018 8:23 pm 

Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
@Jeff_Birt: thank you again, very interesting results. Never thought about an integer sqrt :)

@dmsc: an impressive type of BASIC interpreter - it reminds me of BASIC09. Have you ever thought about a stripped-down version that does not require the Atari platform to run? It could be a strong competitor to EhBasic, I suppose. Performance rivals Plasma V2.

I updated the table to include all your results - thank you!


Regards,
Arne


PostPosted: Wed Sep 19, 2018 1:27 am 

Joined: Mon Sep 17, 2018 2:39 am
Posts: 138
Hi!

GaBuZoMeu wrote:
@dmsc: an impressive type of BASIC interpreter - it reminds me of BASIC09. Have you ever thought about a stripped-down version that does not require the Atari platform to run? It could be a strong competitor to EhBasic, I suppose. Performance rivals Plasma V2.


In fact, I would like to do a stripped version, but I need to find the minimum set of requirements. At least, for the integer IDE, you need:
- Screen output at given position,
- Cursor positioning,
- File loading / saving,
- Keyboard input without echo.

For the floating-point IDE you also need the math-pack. Currently FastBasic uses the Atari-OS math routines (6 bytes per float, in BCD, uses 2KB). For a portable version I think it would be better to use simple binary arithmetic.


PostPosted: Wed Sep 19, 2018 6:27 pm 

Joined: Wed Jul 18, 2018 12:12 pm
Posts: 96
Arne pointed out that a few of my results (32000,50 and 32000,70) had odd-looking times, and he was quite correct. I tracked this down to the 32-bit subtraction word I did in assembly. Although I looked at it 3-4 times and did not see the error, I redid it to make use of the address calculation syntax of the built-in assembler in DurexForth (instead of lots of inx, dex to shift the stack index around) and it worked. (It bugs me that I could not see my error, though.) Anyhow, the new table is below. I ran each test at least twice to verify the results.

Code:
 Test values -       Results       -  VBas   CBas   VForth   CForth sqrt2 sqrt2asm Prime
     1000,20 -    907,     887,20  -    48s    48s    18s      18s    18s     12s    11s
     2000,30 -   1361,    1327,34  -   118s   118s    30s      30s    26s     19s    18s
     9999,35 -   9587,    9551,36  -  1412s  1412s   374s     374s   334s    276s   242s
    32000,50 -  19661,   19609,52  -  3516s  3516s  1530s    1530s   861s    740s   637s
    32000,70 -  31469,   31397,72  - 10447s 10477s  2352s    2352s  1615s   1416s  1209s
  500000,100 - 370373,  370261,112 - 45819s
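
To put a number on the difference between the Forth sqrt (sqrt2) and the Prime version, the ratios can be worked out from the corrected table. This is just arithmetic on the posted times; the dictionary layout is an illustration only.

```python
# Times in seconds from the corrected table: (sqrt2, sqrt2asm, Prime).
times = {
    "1000,20": (18, 12, 11),
    "2000,30": (26, 19, 18),
    "9999,35": (334, 276, 242),
    "32000,50": (861, 740, 637),
    "32000,70": (1615, 1416, 1209),
}

# How many times longer sqrt2 takes than the Prime version.
slowdown = {test: sqrt2 / prime for test, (sqrt2, _asm, prime) in times.items()}

for test, ratio in slowdown.items():
    print(f"{test:>9}: sqrt2 takes {ratio:.2f}x the Prime time")
```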


PostPosted: Wed Sep 19, 2018 9:53 pm 

Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
Thank you Jeff for verification! I have edited the main table accordingly.
I'm glad to see that the "Prime" (aka modulo) version is - as assumed - still the fastest one. :)


PostPosted: Mon Sep 24, 2018 7:58 pm 

Joined: Wed Nov 18, 2015 8:36 am
Posts: 102
Location: UK
I know I am slightly late to this thread, but I thought I would try this out using the custom BASIC I created for my machine which I call dflat :P

It is 16-bit integer only (which helps a lot with performance) and runs on a 5.36 MHz 65C02. The results are:
A : 3.95s
B : 6.51s
C : 81.61s

It's similar in speed to VTL02 as far as I can see (when adjusted for clock speeds), but I am pleased as dflat is a fairly full implementation of BASIC with loads of string, file, sound and graphics commands.
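
One simple way to adjust for clock speed is to scale each time to a common 1 MHz reference. The sketch below assumes run time scales linearly with clock frequency, which ignores things like video DMA or wait states, so treat the output as a rough guide only.

```python
# dflat results at 5.36 MHz, in seconds, as posted above.
clock_mhz = 5.36
measured = {"A": 3.95, "B": 6.51, "C": 81.61}

# Equivalent run time on a hypothetical 1 MHz machine (linear scaling assumed).
normalized = {test: seconds * clock_mhz for test, seconds in measured.items()}

for test, seconds in normalized.items():
    print(f"{test}: {seconds:.1f} s at 1 MHz")
```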

The benchmark code is as follows:
Code:
1000 def_isPrime(%x)
1005  local %i
1010  %i=3
1020  while((%i*%i)<%x) and ((%x\%i)<>0)
1030   %i=%i+2
1040  wend
1050  if %x<(%i*%i):return 1:else:return 0
1090  endif
1100 enddef
1999 ;
2000 def_start()
2010  print "Range :":input %a
2030  print "Min diff :":input %b
2050  %loPrime=1:%hiPrime=1:%i=1
2075  reset %t
2080  while(%i<%a) and ((%hiPrime-%loPrime)<%b)
2090   %i=%i+2
2100   if _isPrime(%i)
2110    %loPrime=%hiPrime:%hiPrime=%i
2130   endif
2140  wend
2150  if (%hiPrime-%loPrime)<%b
2160   println "No gap that size"
2170  else
2180   println "Hi:",%hiPrime,"  Lo:",%loPrime
2190  endif
2195  %e=elapsed(%t)
2197  println "Time :",(%e/60),".",((%e\60)*100)/60
2200 enddef
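
The time printout on line 2197 splits a tick count into whole seconds and hundredths. A sketch of the same arithmetic follows, assuming `elapsed` returns 60 Hz ticks and that `\` is the remainder operator (as its use in line 1020 suggests). It models only the arithmetic, not dflat's exact print formatting, and `format_elapsed` is a made-up name.

```python
def format_elapsed(ticks, ticks_per_second=60, pad=False):
    """Split a tick count into seconds and hundredths, like line 2197."""
    seconds = ticks // ticks_per_second
    hundredths = (ticks % ticks_per_second) * 100 // ticks_per_second
    if pad:
        # Zero-pad so that 5 hundredths reads as .05 rather than .5
        return f"{seconds}.{hundredths:02d}"
    return f"{seconds}.{hundredths}"


print(format_elapsed(237))           # 3 s plus 57 ticks
print(format_elapsed(63))            # unpadded, as in the original expression
print(format_elapsed(63, pad=True))  # padded variant
```

If the hundredths are printed without padding, a value below 10 hundredths would look ten times larger than it is, so a padded print may be worth considering.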


PostPosted: Tue Sep 25, 2018 8:52 pm 

Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
Thank you, dolomiah

I have pasted your results into the main table.


Regards,
Arne


PostPosted: Fri Nov 16, 2018 6:38 am 

Joined: Tue Mar 05, 2013 4:31 am
Posts: 1385
Well, I'm quite late to the party on this one, but I've been working on a CMOS only version of EhBasic and I've pretty much finished it up. It's based on Version 2.22p4 (Klaus' patched version) and with the changes, I just refer to it as Version 2.22p4C. It's a bit smaller, slightly quicker and uses less page zero space as well. I've removed all of the IRQ and NMI code and added an EXIT command. I've also changed the startup a bit that tests and sets the memory amount without prompting the user. The current size is 9884 bytes and requires only the I/O vectors set in the single source file.

I've done numerous timings, which are accurate to 0.01 seconds, using the counter/timer in the NXP SCC2691 UART, configured for 10ms as a Jiffy clock. Using the initial Basic Bench program, plus a few changes, my results are below. The changes I've made do a few extra bits: they reset the Jiffy clock before the test starts, then print the elapsed time after it completes. I've run the tests multiple times and the results are extremely consistent. The worst variation is 0.01 seconds once in a while, which is the resolution of the timer itself as part of the ISR.

The board is a W65C02 running at 8MHz with 32KB RAM and 32KB ROM using a single 22v10 glue chip and SCC2691 UART. Details on the hardware can be found here:

viewtopic.php?f=4&t=5005

For reference, the actual BASIC program used is listed below. Line 15 resets the Jiffy clock, line 100 prints the basic elapsed time accurate to 1 second, and lines 110 and 120 calculate and print the remaining hundredths of a second.

Code:
10 ZS = 3 : INPUT A,B
15 CALL 32768
20 FOR C = 3 TO A STEP 2
30 FOR D = 3 TO SQR(C) STEP 2
40 IF INT(C/D)*D = C THEN 80
50 NEXT D
60 IF C-ZS >= B THEN PRINT C,ZS,C-ZS : GOTO 100
70 ZS = C
80 NEXT C
90 PRINT " No Solution " : GOTO 10
100 CALL 57374
110 CALL 32784
120 CALL 32800
130 GOTO 10


The benchmark timings are here:

1000,20 = 2.78 seconds
2000,30 = 4.68 seconds
9999,35 = 61.92 seconds
32000,50 = 162.51 seconds
32000,70 = 308.33 seconds

_________________
Regards, KM
https://github.com/floobydust


PostPosted: Fri Nov 16, 2018 5:10 pm 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
floobydust wrote:
Well, I'm quite late to the party on this one, but I've been working on a CMOS only version of EhBasic and I've pretty much finished it up. It's based on Version 2.22p4 (Klaus' patched version) and with the changes, I just refer to it as Version 2.22p4C. It's a bit smaller, slightly quicker and uses less page zero space as well. I've removed all of the IRQ and NMI code and added an EXIT command. I've also changed the startup a bit that tests and sets the memory amount without prompting the user. The current size is 9884 bytes and requires only the I/O vectors set in the single source file.


Sounds good!

Have you a link for the source yet?

I was just about to embark on something similar myself though for my little system.

I presume you just iterate up memory until you find a location that fails the test - which may be an issue for me as I have an all-RAM system with my monitor currently in high memory, although I can relocate that anywhere.

Cheers,

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


PostPosted: Fri Nov 16, 2018 5:43 pm 

Joined: Tue Mar 05, 2013 4:31 am
Posts: 1385
Hi Gordon,

I'll just paste the source here. Note that the source has been changed to assemble/link with WDC tools.

Attachment:
basic.asm [322.38 KiB]


The memory test goes up to the declared Ram_top, so just set that to your max, it also zeros out RAM during the test. Also, you'll need to change the vector defaults for your monitor which are listed at lines: 7428 - 7432. The Page Zero usage is contiguous from $00 - $85. Page $04 is used for the input buffer, Ctrl-C bits and system vectors.

There's only the single source file... one other note, I print the intro message from my Monitor code and then JMP to the start of EhBasic, which is $B000 on my system. You can easily change any of the locations for your monitor. Have fun.... and let me know how it works out for you.

_________________
Regards, KM
https://github.com/floobydust


PostPosted: Fri Nov 16, 2018 5:54 pm 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
floobydust wrote:

The memory test goes up to the declared Ram_top, so just set that to your max, it also zeros out RAM during the test. Also, you'll need to change the vector defaults for your monitor which are listed at lines: 7428 - 7432. The Page Zero usage is contiguous from $00 - $85. Page $04 is used for the input buffer, Ctrl-C bits and system vectors.

There's only the single source file... one other note, I print the intro message from my Monitor code and then JMP to the start of EhBasic, which is $B000 on my system. You can easily change any of the locations for your monitor. Have fun.... and let me know how it works out for you.


Great, thanks!

I only have ca65 under Linux right now, so it might take a bit to massage it for that - then work out how to get it into my system, but I'll let you know how it goes.

thanks,

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


PostPosted: Fri Nov 16, 2018 10:44 pm 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
Just to follow-up, it's working fine.

Assembling with ca65 was easy - add the flag to enable colon-less labels and change .equ into =.


Had to do some digging to work out exactly what EhBasic expected for the character in/out routines, as the ones in my monitor weren't quite compatible. Also worked out that it checks for Ctrl-C very often and that's a slow operation in my system right now, so even though I'm running at 16 MHz, the results of that little benchmark above are slower. However, this is exactly why I've built the little 6502 test rig before I dive into 65816 land - to make sure I get the little stuff like this sorted first.

Anyway, with the Ctrl-C check disabled I get almost exactly half the times you get - I have no hardware timers, but using minicom in timestamp mode works fine and the times are half of those above. My 65C02 is running at 16 MHz, so that's very encouraging. Running a "real" program is always better than just doing simple memory tests, etc. too (I used to write test & diagnostics once upon a time - our users would often find some code that broke the boards that passed all our tests!)


This run:
Code:
[2018-11-16 22:30:45.909] RUN
[2018-11-16 22:30:46.929] ? 32000,70
[2018-11-16 22:33:40.751]  31469         31397         72
[2018-11-16 22:33:40.768]
[2018-11-16 22:33:40.769] Ready


is just under 180 seconds, but it included the time I took to type the numbers in. Quick tests with a stopwatch were closer to 150 seconds. Actually making it print a blank line gives:

Code:
[2018-11-16 22:39:22.227] RUN
[2018-11-16 22:39:26.906] ? 32000,70
[2018-11-16 22:39:34.096]
[2018-11-16 22:42:07.197]  31469         31397         72
[2018-11-16 22:42:07.205]
[2018-11-16 22:42:07.208] Ready


So 153 seconds. Neat.
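
The 153 second figure can be reproduced from the minicom timestamps in the second run, taking the blank line as the start and the result line as the end. The comparison value is simply half of the 308.33 seconds reported for the 8 MHz board on the same test.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the second minicom log above.
start = datetime.strptime("2018-11-16 22:39:34.096", FMT)  # blank line printed
end = datetime.strptime("2018-11-16 22:42:07.197", FMT)    # result printed

elapsed = (end - start).total_seconds()
half_of_8mhz_time = 308.33 / 2  # 32000,70 result at 8 MHz, halved

print(f"measured at 16 MHz: {elapsed:.3f} s")
print(f"half of 8 MHz time: {half_of_8mhz_time:.3f} s")
```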

Anyway, thanks again,

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


PostPosted: Sat Nov 17, 2018 4:59 am 

Joined: Tue Mar 05, 2013 4:31 am
Posts: 1385
Hi Gordon,

Thanks for the feedback. Glad it's working for you. Not sure why checking Ctrl-C would be sluggish on your system. My C02 Pocket SBC is using an NXP SCC2691 with interrupt-driven code for receive, transmit, timer/counter and received break. Perhaps your routine that checks for a key without waiting is the culprit.

The BIOS on my board simply checks the input buffer count. If the count is zero, it returns with the carry flag clear. As the ISR updates the input buffer count, it's a very fast check. If a character is in the buffer, meaning the count isn't zero, it just drops down to the routine to get the character from the buffer.

Code:
;Character Input routines
;CHRIN_NW uses CHRIN, returns if a character is not available from the buffer with carry flag clear
; else returns with character in A reg and carry flag set. CHRIN waits for a character to be in the
; buffer, then returns with carry flag set. Receive is IRQ driven/buffered with a size of 128 bytes
CHRIN_NW        CLC             ;Clear Carry flag for no character
                LDA   ICNT      ;Get character count
                BNE   GET_CH    ;Branch if buffer is not empty
                RTS             ;and return to caller
;
CHRIN           LDA   ICNT      ;Get character count
                BEQ   CHRIN     ;If zero (no character, loop back)
;
GET_CH          PHY             ;Save Y reg
                LDY   IHEAD     ;Get the buffer head pointer
                LDA   IBUF,Y    ;Get the character from the buffer
                INC   IHEAD     ;Increment head pointer
                RMB7  IHEAD     ;Strip off bit 7, 128 bytes only
;
                DEC   ICNT      ;Decrement the buffer count
                PLY             ;Restore Y Reg
                SEC             ;Set Carry flag for character available
                RTS             ;Return to caller with character in A reg
;
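
The buffer logic above can be modelled in a few lines of Python to show why the no-wait check is cheap: it only looks at a count. This is a sketch, not the BIOS code. The `isr_store` method and the `itail` pointer are assumptions standing in for the receive ISR, which is not shown in the post.

```python
class RxBuffer:
    """Model of a 128-byte receive ring buffer with head pointer and count."""

    SIZE = 128

    def __init__(self):
        self.ibuf = bytearray(self.SIZE)
        self.ihead = 0  # next position to read (IHEAD)
        self.itail = 0  # next position to write (assumed, ISR side)
        self.icnt = 0   # number of characters waiting (ICNT)

    def isr_store(self, byte):
        """Hypothetical stand-in for the receive ISR storing one byte."""
        if self.icnt == self.SIZE:
            return False  # buffer full, byte dropped
        self.ibuf[self.itail] = byte
        self.itail = (self.itail + 1) & 0x7F
        self.icnt += 1
        return True

    def chrin_nw(self):
        """Like CHRIN_NW: return (carry, char) without waiting."""
        if self.icnt == 0:
            return False, None  # carry clear, nothing available
        ch = self.ibuf[self.ihead]
        self.ihead = (self.ihead + 1) & 0x7F  # same effect as INC + RMB7
        self.icnt -= 1
        return True, ch  # carry set, character returned


buf = RxBuffer()
print(buf.chrin_nw())
buf.isr_store(0x03)  # Ctrl-C arrives
print(buf.chrin_nw())
```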

_________________
Regards, KM
https://github.com/floobydust


PostPosted: Sat Nov 17, 2018 10:21 am 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
floobydust wrote:
Hi Gordon,

Thanks for the feedback. Glad it's working for you. Not sure why checking Ctrl-C would be sluggish on your system.


Oh, I know exactly why - was just surprised at just how slow. (or how often ehbasic checks)

Other than a latch & 8 LEDs, the 6502 side doesn't have any directly connected peripherals. All it has is the top 256 bytes of RAM, which is shared with the ATmega host. So to check for a key (or to print, etc.), the 6502 puts a command in the shared memory area and executes WAI. The ATmega sees the Rdy line going low, then /BE's the 6502 and un-tri-states its own bus control lines (A0-7, D0-7, R/W), reads the command from RAM, does what's needed - e.g. checks to see if a key is ready to be read on its serial line - updates the shared RAM with the result, then releases control, hands it back to the 6502 and blips the IRQ line. The 6502 wakes up and carries on.

One niggle is that it can take the ATmega up to 64µs to recognise the Rdy line going low from the 6502 - this is due to the video generation being done by the ATmega, and even if a video scan-line isn't being output, if you hit the scan-line interrupt then that's another 64µs gone. It's not ideal, and somewhat cumbersome, but it's workable for now. Video generation (320x240) takes up some 60-70% of the ATmega execution time )-:
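
To get a feel for what that latency costs, it can be converted into 6502 clock cycles. The sketch assumes the 16 MHz clock mentioned earlier in the thread and takes 64µs as the worst case for one wait; the function name is only illustrative.

```python
def stalled_cycles(latency_us, clock_mhz):
    """CPU cycles that pass while waiting latency_us at clock_mhz.

    Microseconds times MHz gives cycles directly.
    """
    return latency_us * clock_mhz


one_wait = stalled_cycles(64, 16)       # a single worst-case wait
two_waits = stalled_cycles(2 * 64, 16)  # wait plus a scan-line interrupt hit

print(one_wait, two_waits)
```

Seen this way, every Ctrl-C check that goes through the full exchange can cost on the order of a thousand cycles or more, which fits with the benchmark slowing down noticeably when the check is enabled.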

Plan B is to use a spare output pin from the ATmega via the GAL (I have spare in & out pins on it) and into D7, then the 6502 can poll the IO address and if N then it knows the ATmega has "something" for it, so it can then initiate the lengthy exchange process. I'll do that today then I think it might be PCB time (with space for a 6522)

Cheers,

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/

