6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Thu Nov 21, 2024 6:40 pm

All times are UTC




Post new topic Reply to topic  [ 20 posts ]  Go to page 1, 2  Next
Author Message
PostPosted: Thu Aug 30, 2018 4:02 pm 
Offline

Joined: Wed Jul 18, 2018 12:12 pm
Posts: 96
I've been using DurexForth on the C64 (in VICE) to learn Forth. To try and get an idea of Forth's performance comparted to BASIC I did a simple plot of the 'a' character to the whole screen (1,000 locations). Two general methods were tried one was a single loop adding the loop index to screen memory location and plotting. The second was two loops X and Y calculating the memory address with 'X+Y*40'.

Then in Forth only I implemented a lookup table for the Y offsets. This was actually a bit slower when I still used a Y*2 to index through the array (an array of 16bit values). When I did a left shift instead of multiplications it was much faster!

The closest comparison denoted in the comments below are the 'linear' test and the 'X,Y X+Y*40' tests. The 'start' and 'stop' functions in the code below are a Forth module that uses the C64 Jiffy timer to let you profile your code optimizing to execution time. The Forth code below is only the last test ( 0.766 X,Y luY w/Screen added to yoff ) but I'm guessing you know what the other code would have looked like.

Code:
400 constant Screen
00  constant Minscr
3E8 constant Maxscr
91  constant Up
11  constant Down
9D  constant Left
1D  constant Right

variable scrn 400 scrn !
variable xpos 0 xpos !
variable ypos 0 ypos !

( creates array of screen Y offset values )
( i.e. 0400 0428 0450 )
create yoff 32 allot
: cfgyoff
   19 0 do
    I 28 * Screen + ( value )
    I 2 * yoff + !  ( index address )
   loop
;

( retrieves Y value from yoff array )
( adds in xpos and stores 'a' to screen )
: plotline3
   28 0 do
    I xpos !
    1 ypos @ 1 lshift yoff + @
    xpos @ + c!
   loop
;

( Linear is loop 0-1000 )
( X,Y is loop Y, loop X )
( 9s BASIC linear )
( 15s Basic X,Y X+Y*40 )
( 0.283s Forth linear )
( 2.133s X,Y X+40*40
( 2.316s X,Y lookup Y w/Y*2 )
( 0.833s X,Y lookup Y w/Y<<1 )
( 0.766 X,Y luY w/Screen added to yoff )
: plotxy2
   cfgyoff
   page start
   19 0 do
    I ypos !
    plotline3
   loop
   stop ." s" cr
;


Code:
10 TIME$ = "000000":rem reset clock
20 rem 'linear' test
30 rem for i = 0 to 1000
40 rem poke 1024+i,1
50 rem next i
60 rem
70 rem X,Y test
80 for y = 0 to 25
90 for x = 0 to 40
100 poke 1024+y*40+x, 1
110 next x
120 next y
130 print TIME$


Top
 Profile  
Reply with quote  
PostPosted: Fri Aug 31, 2018 4:57 pm 
Offline

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
It's a little bit extra unfair to BASIC considering it's always doing floating point math.


Top
 Profile  
Reply with quote  
PostPosted: Fri Aug 31, 2018 7:18 pm 
Offline
User avatar

Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
There is a little benchmark where numerous Basic variants (and program implementations) and other languages have contributed there numbers. There isn't a single Forth contribution so far... :wink:

Regards,
Arne


Top
 Profile  
Reply with quote  
PostPosted: Fri Sep 07, 2018 12:57 am 
Offline

Joined: Wed Jul 18, 2018 12:12 pm
Posts: 96
GaBuZoMeu wrote:
There is a little benchmark where numerous Basic variants (and program implementations) and other languages have contributed there numbers. There isn't a single Forth contribution so far... :wink:

Regards,
Arne


It was very easy to port you BASIC benchmark to the C64. I added in a few lines so I could use the Jiffy timer to automatically log execution time. I think spent a fair amount of time writing a Forth version which I got working this afternoon (after finding at least 1,000 ways it would not work). I'll run both versions through all iterations (A-F) in VICE and then set up a C64 this weekend to confirm execution times on real hardware.

I'll post me code and results in the benchmark thread...


Top
 Profile  
Reply with quote  
PostPosted: Fri Sep 07, 2018 3:09 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Even then, there's probably a lot that could be improved on the C64 Forth. The 6502 Forth I started with (I imagine they had common roots, maybe from Ragsdale), before I heavily modified it, had for example,
Code:
: DEPTH        SP@ S0 @ SWAP - 2 / ;

Besides the fact that it would have taken less memory as a primitive something like this,
Code:
CODE  DEPTH
        STX  N        ; Put stack pointer in N.
        LDA  #S0adr   ; Get the address of the 1st byte under the stack.
        SEC
        SBC  N        ; Subtract the data stack pointer value (X above).
        LSR  A        ; Divide by 2, to get from number of bytes on stack
        JMP  CPUSH    ; to number of cells.
      END-CODE
 ;-------------------

they even used 2 / instead of 2/ , making it at least dozens of times as slow!

I imagine there's a lot that could be dramatically improved in the C64 Forth if someone wanted to get serious about it. I've said that the Forth in the HP-71 module I got was not a very good one either. Without even resorting to assembly language, I was able to dramatically speed up many of the words. In the extreme case, I got the word 13x as fast, just by re-writing it in Forth.

When I wrote my '816 (actually '802) Forth, the ability to handle 16 bits at a time without taking cells apart "to get them through the door" so so speak, meant that there were lots more things that were suddenly worth it to write as primitives which would have taken too much memory as primitives on the 6502.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Fri Sep 07, 2018 9:12 pm 
Offline

Joined: Fri May 05, 2017 9:27 pm
Posts: 895
64Forth was definitely slow. I believe it actually copied the address from the CFA of a word to a location in zero page and performed an indirect jump through the zero page address! That's one way to get around the indirect jump bug, but what a time waster. I much prefer having CREATE determine if the CFA will land on a page boundary and allot 1 byte before the header is created. Blazin' Forth, another indirect threaded Forth and a Forth-83 standard Forth, was considerably faster. Alas, there were things Blazin' Forth could have done better, which motivated me to write my own.


Top
 Profile  
Reply with quote  
PostPosted: Fri Sep 07, 2018 9:43 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Even though the 65c02 (CMOS) fixed that bug, I still kept the alignment for my '02 Forth. IIRC, I think it was to make it easier to decompile (using the ANS Forth word SEE), something I seldom need to do but once in a while it's useful. Without taking the time to review it right now, it seems like it had to do with allowing names to have characters with the high bit set (to get the specials we use in engineering, like Greek letters, degree symbol, etc. in the DOS/ANSI [Edit: that should say IBM437] character set).

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Fri Sep 07, 2018 10:17 pm 
Offline

Joined: Fri May 05, 2017 9:27 pm
Posts: 895
My Forth has no aligning at all. Create just allots one byte if the CFA would otherwise cross a page boundary and my version of SEE works just fine, even with code words and the odd mix of code and high level. This:
Code:
HEX
: #  ( D1 -- D2 )
   BASE @ UD/MOD ROT
   >ASSEM
   0 ,X LDA,  0A # CMP,  CS IF,
      6 # ADC,
   THEN,
   30 # ADC,  0 ,X STA,
   >FORTH
   HOLD ;

decompiles to this:
Code:
SEE #
#
 1CDC BASE
 1CDE @
 1CE0 UD/MOD
 1CE2 ROT
 1CE4 (>ASSEM)
 1CE6    0 ,X LDA
 1CE8    A  # CMP
 1CEA 1CEE    BCC
 1CEC    6  # ADC
 1CEE   30  # ADC
 1CF0    0 ,X STA
 1CF2  9C4    JSR ' (>FORTH) >BODY
 1CF5 HOLD
 1CF7 EXIT
 OK

So perhaps, as you said, the alignment was to allow characters with the high bit set?


Top
 Profile  
Reply with quote  
PostPosted: Fri Apr 05, 2019 10:01 pm 
Offline

Joined: Fri May 05, 2017 9:27 pm
Posts: 895
As my previous post hinted, Forth has one more advantage over BASIC, the use of low level code. Once the logic of a program is tested, the slower parts ( like inner loops ) can be rewritten as CODE words. A code definition gets compiled to the same dictionary as high level words, at least on the C64 Forths I've seen ( including my own Fleet Forth .) The entire dictionary can be saved to disk as a new Forth with a single command. In Fleet Forth it is FSAVE. In Blazin' Forth it is SAVE-FORTH. In BASIC, someone can still use low level code with the SYS and USR commands, but then has to decide where to put it and usually manage the loading and saving of the low level code separate from the BASIC code.


Top
 Profile  
Reply with quote  
PostPosted: Fri Apr 12, 2019 10:07 pm 
Offline

Joined: Fri May 05, 2017 9:27 pm
Posts: 895
GARTHWILSON wrote:
Even then, there's probably a lot that could be improved on the C64 Forth. The 6502 Forth I started with (I imagine they had common roots, maybe from Ragsdale), before I heavily modified it, had for example,
Code:
: DEPTH        SP@ S0 @ SWAP - 2 / ;

Besides the fact that it would have taken less memory as a primitive something like this,
Code:
CODE  DEPTH
        STX  N        ; Put stack pointer in N.
        LDA  #S0adr   ; Get the address of the 1st byte under the stack.
        SEC
        SBC  N        ; Subtract the data stack pointer value (X above).
        LSR  A        ; Divide by 2, to get from number of bytes on stack
        JMP  CPUSH    ; to number of cells.
      END-CODE
 ;-------------------

they even used 2 / instead of 2/ , making it at least dozens of times as slow!

I imagine there's a lot that could be dramatically improved in the C64 Forth if someone wanted to get serious about it.

There's the rub. Different Forths for the 65XX with the same threading could have vastly different speeds depending on the ratio of primitives ( and which words are primitives ) in each kernel. Which words should be primitive? How do you strike a balance between compact size and speed of a Forth system?
I have some rules of thumb about what should be defined as a primitive. In my Forth I've defined the data stack operators and the math operators as primitives. Anything that is the same size or smaller as a primitive gets defined as a primitive. S>D is the same size so it is defined as a primitive. If a pair of words have a combined size that is smaller as primitives ( possibly because they complement each other ) , they are defined as primitives. [ and ] are an example of this. High level versions in my Forth are:
Code:
: ]  ( -- )  STATE ON ;
: [  ( -- )  STATE OFF ; IMMEDIATE

The bodies are six bytes each for a total of twelve. The primitive versions are:
Code:
HEX
CODE ]
   DEY,
   STATE    STY,
   STATE 1+ STY,
   NEXT JMP,
   END-CODE
CODE [ IMMEDIATE
   -2 ALLOT
   ' ] @ 1+ ,
   END-CODE

The body of ] is ten bytes and the body of [ is zero bytes for a net savings of two bytes.
Sometimes a high level word is simple and small but slow. Rewriting one of the words it uses as a primitive can result in the word being fast enough yet smaller than a version rewritten to be faster without resorting to redefining a component word as a primitive. >BT ( to buffer table ) in my Forth is a word to access the Nth entry in the block buffer table. The primitive is larger, yet helps the high level parts of BLOCK and BUFFER to be fast enough without convoluted ( at least not too badly convoluted ) high level code that would make them bigger.


Top
 Profile  
Reply with quote  
PostPosted: Fri Apr 12, 2019 10:19 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
...and the '816 makes it so much more efficient, which is why I have many hundreds of primitives in my '816 ITC Forth. [ and ] are:

Code:
        HEADER "[", IMMEDIATE                   ; ( -- )
Lbrac:  PRIMITIVE
        STZ     STATEdata       ; STATE OFF
        GO_NEXT
 ;-------------------
        HEADER "]", NOT_IMMEDIATE               ; ( -- )
Rbrac:  PRIMITIVE
        LDA     #$FFFF
        STA     STATEdata       ; STATE ON
        GO_NEXT
 ;-------------------

and actually I should probably change ] to:

Code:
        HEADER "]", NOT_IMMEDIATE               ; ( -- )
Rbrac:  PRIMITIVE
        DEC     STATEdata       ; STATE ON
        GO_NEXT
 ;-------------------

so they're just one assembly-language instruction each.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Fri Apr 12, 2019 10:58 pm 
Offline

Joined: Fri May 05, 2017 9:27 pm
Posts: 895
GARTHWILSON wrote:
...and the '816 makes it so much more efficient, which is why I have many hundreds of primitives in my '816 ITC Forth.

One of these days I have got to get a 65816. :D


Top
 Profile  
Reply with quote  
PostPosted: Fri Apr 12, 2019 11:04 pm 
Offline

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
On a 6502, at some point I think that a word gets big enough that the size of the primitive is actually large than the size of comparable high level word. One of Forth's benefits is code density.

Then you have the eForth philosophy of having as few primitives as possible simply to facilitate porting. I think it has, like, 30ish primitive words.

Then there's the "premature optimization" issue. Why make a word primitive if it's very rarely used. If the code is "fast enough", no more work is necessary.

It would be interesting to add a 4 byte space to each word, and modify DOCOL to increment it to act as a crude profiler.


Top
 Profile  
Reply with quote  
PostPosted: Fri Apr 12, 2019 11:41 pm 
Offline

Joined: Fri May 05, 2017 9:27 pm
Posts: 895
whartung wrote:
On a 6502, at some point I think that a word gets big enough that the size of the primitive is actually large than the size of comparable high level word. One of Forth's benefits is code density.

Definitely, but some words such as multiplication and division operators in high level Forth on the 6502 ( or 6510 ) would be intolerably slow.
Quote:
Then you have the eForth philosophy of having as few primitives as possible simply to facilitate porting. I think it has, like, 30ish primitive words.

Fleet Forth has about five times that many primitives. Any Forth for the C64 is already targeting a specific platform, so portability may not be that big of an issue.
Quote:
Then there's the "premature optimization" issue. Why make a word primitive if it's very rarely used. If the code is "fast enough", no more work is necessary.

Because it's fun. If the primitive is smaller, then memory is saved. If it is the same size, then there is no cost in memory but a rarely used speedup here maybe another there and elsewhere. As for the work, if the source is for a metacompiler it can be tested interactively even without using the metacompiler. Once there are enough improvements, a new kernel can be built.
Quote:
It would be interesting to add a 4 byte space to each word, and modify DOCOL to increment it to act as a crude profiler.

An interesting idea. How about a word that patches NEXT the same way that Blazin' Forth does for its trace functionality. The profiler word could build a table with the address of each word profiled and how often it is executed and possibly some other data. The profiler would switch off profiling while it runs. The data could be kept in blocks with block numbers high enough to use RAM or a large enough set of block buffers could be configured to keep the needed blocks in memory. Blocks aren't just for source code, they're a primitive form of virtual memory.


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 13, 2019 6:20 pm 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
whartung wrote:
It's a little bit extra unfair to BASIC considering it's always doing floating point math.

That's perhaps the 2nd contributor to its sluggishness. The 1st is the fact that CBM BASIC is always parsing (tokenized) source code, even when in RUN-mode. Forth doesn't tokenize its source code, it literally compiles it (perhaps into an intermediate representation, like direct- or indirect-threaded code), which means executing it involves a different runtime engine. If it compiles to subroutine-threaded code, it doesn't even need that.

So even if BASIC were to use integers for its variables, you still have the raw interpreter overhead to contend with: constantly reparsing numbers, performing linear scans to find the next program line to execute, and so on.

MS BASIC (from which CBM BASIC is derived) is not an efficient interpreter at all; its sole saving grace is how small it can be implemented in.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 20 posts ]  Go to page 1, 2  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 18 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: