Robot Game: C vs Forth vs Assembly
Robot Game: C vs Forth vs Assembly
I've been working on porting a game from Python to C, Forth, and assembly to compare each language. The game is simple and only exists to have something to port, so no guarantees about how fun it is. The CC65 and Tali Forth 2 versions are finished. You can try playing them in your browser without downloading anything. The assembly versions will follow soon. Let me know if you find any bugs or differences between the versions please.
Robot Game - CC65 version
Robot Game - Tali Forth 2 version
Keys:
W - up
S - down
A - left
D - right
space - attack/mine/drill
M - open menu
Robot Game - CC65 version
Robot Game - Tali Forth 2 version
Keys:
W - up
S - down
A - left
D - right
space - attack/mine/drill
M - open menu
- Attachments
-
- CC65 robot game.png (6.15 KiB) Viewed 4679 times
Re: Robot Game: C vs Forth vs Assembly
I think I stepped into a lava field! This is amazing.
- BigDumbDinosaur
- Posts: 9426
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Re: Robot Game: C vs Forth vs Assembly
BigEd wrote:
I think I stepped into a lava field! This is amazing.
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: Robot Game: C vs Forth vs Assembly
I especially like how, when running the Forth version, you don't have to type "run" and can just start typing Forth things at it. I am a bit sad that the word "words" appears to be missing (did that get yanked out along with with the wordlists?), but I was able to define a new word and play with the Forth a bit.
I'm super-impressed with the selection/menu system in your game. It looks really nice and is easy to use (press M to get back to the map when done - that's the only part I couldn't figure out at first.) I know how much work can go into making things work that nicely/smoothly when you're starting from scratch.
I'm super-impressed with the selection/menu system in your game. It looks really nice and is easy to use (press M to get back to the map when done - that's the only part I couldn't figure out at first.) I know how much work can go into making things work that nicely/smoothly when you're starting from scratch.
Re: Robot Game: C vs Forth vs Assembly
Thanks! I thought Forth people would appreciate that. You can also press Q in the Forth version once you're in the game to get back to the Forth prompt. This is useful to make sure a word didn't accidentally leave something on the stack. Yes, "words" was scrapped along with every single other thing I could get rid of. The way the memory banking works is each bank has an identical copy of Forth and words get defined in the left over space, so freeing up one byte of Forth frees up one byte each in the five banks. One of the banks had less than 100 bytes left, so I was cutting it close. The version I uploaded shows a lot more space left since I turned the underflow check off. On the other hand, words wouldn't be too useful since it would only show the words in the current bank and those words would be encoded, ie the word "DrawMenuInventoryChar" gets redefined every time I send the source to Tali to something like "x1a." Automatically shortening all the names to 2 and 3 word abbreviations also saves hundreds of bytes.
Hmm, maybe I should let escape work for the map. Was it obvious from the titles that K and R get you to the other two menus?
I intend to put everything on github when I finish the other version but here is the source code in the mean time if you have any comments: Robot Game Forth source
One of the assembly versions is also done: Robot Game traditional assembly source
Hmm, maybe I should let escape work for the map. Was it obvious from the titles that K and R get you to the other two menus?
I intend to put everything on github when I finish the other version but here is the source code in the mean time if you have any comments: Robot Game Forth source
One of the assembly versions is also done: Robot Game traditional assembly source
Re: Robot Game: C vs Forth vs Assembly
All four versions of the game are done. Here's a summary I did comparing the performance of each language with links to the games so you can play in your browser: Robot Game summary. C was 20-80% as fast as traditional assembly, Forth was 5-25% as fast, and the optimizer project I posted about here was up to 10% faster. Here's a graph of 28 different tasks the game does internally. Traditional/unoptimized assembly is normalized at 100%. Anything above that is faster and anything below is slower, ie a rating of 20% means five times slower than normal assembly.
Re: Robot Game: C vs Forth vs Assembly
It is interesting why is C the fastest with rand8? IMHO you can check C compiler generated assembly and improve it a bit.
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Robot Game: C vs Forth vs Assembly
litwr wrote:
It is interesting why is C the fastest with rand8? IMHO you can check C compiler generated assembly and improve it a bit.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Robot Game: C vs Forth vs Assembly
litwr wrote:
It is interesting why is C the fastest with rand8? IMHO you can check C compiler generated assembly and improve it a bit.
Quote:
In all cases, it will depend on how good the compiler and/or kernel is. The common Forths for the '02 are nowhere near optimized.
Re: Robot Game: C vs Forth vs Assembly
great investigation!
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Robot Game: C vs Forth vs Assembly
Druzyek wrote:
GARTHWILSON wrote:
In all cases, it will depend on how good the compiler and/or kernel is. The common Forths for the '02 are nowhere near optimized.
I forgot you said it was Tali Forth (which I have not evaluated). The common Forths for the '02 (usually based on figForth) have a lot of things as secondaries that could be turned into primitives for much better performance.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Robot Game: C vs Forth vs Assembly
Druzyek,
Thanks for taking the time to make that graph. I find it quite interesting.
For reference, all of TaliForth2's primitives are written in assembly. There are only a few that are done as JSRs to other words (Tali2 is an STC Forth).
I believe native compiling (where the opcodes for a simple word are compiled directly into another word) was disabled to conserve as much memory as possible (correct me if I'm wrong on that one, Druzyek), so all of the compiled code would end up as JSRs to the primitives or other defined words. I can see having JSR/RTS overhead for every word, along with stack juggling, slowing things like drawing stuff on the screen quite a bit vs. a C or ASM routine. Anything with loops is likely to be noticeably slower, and indeed almost all of the slowest routines have "Draw" in their name.
The native compiling would really have only helped with the JSR/RTS overhead, and it can't be used on words that have flow control (IF/ELSE/THEN/DO/LOOP/etc) as Tali2 doesn't know how to update the JMP addresses (or even which bytes might be addresses) as it just copies the bytes for one word into another.
I suspect the loop constructs Tali2 uses to be a lot of the slowness. The loop index is held on the return stack while the loop is running. Tali2 has to pop it into A/Y, add to it, and then push it back. LOOP is implemented as assembly that puts a 1 on the data stack and then falls right into +LOOP.
Thanks for taking the time to make that graph. I find it quite interesting.
For reference, all of TaliForth2's primitives are written in assembly. There are only a few that are done as JSRs to other words (Tali2 is an STC Forth).
I believe native compiling (where the opcodes for a simple word are compiled directly into another word) was disabled to conserve as much memory as possible (correct me if I'm wrong on that one, Druzyek), so all of the compiled code would end up as JSRs to the primitives or other defined words. I can see having JSR/RTS overhead for every word, along with stack juggling, slowing things like drawing stuff on the screen quite a bit vs. a C or ASM routine. Anything with loops is likely to be noticeably slower, and indeed almost all of the slowest routines have "Draw" in their name.
The native compiling would really have only helped with the JSR/RTS overhead, and it can't be used on words that have flow control (IF/ELSE/THEN/DO/LOOP/etc) as Tali2 doesn't know how to update the JMP addresses (or even which bytes might be addresses) as it just copies the bytes for one word into another.
I suspect the loop constructs Tali2 uses to be a lot of the slowness. The loop index is held on the return stack while the loop is running. Tali2 has to pop it into A/Y, add to it, and then push it back. LOOP is implemented as assembly that puts a 1 on the data stack and then falls right into +LOOP.
Re: Robot Game: C vs Forth vs Assembly
SamCoVT,
Did you see in the summary I posted a link to above that I thanked you for your help? Thanks again! I also mentioned that Tali Forth 2 is really great and the poor performance seems to be due to the nature of Forth.
I originally left "nc-limit" at the default of 20 then bumped it up to 26 once everything was working right. The results are from that version, so it should be doing a lot of native compiling.
That is a good point. I think a lot of the slowdown also comes from the stack-based nature of the language:
-You're constantly thrashing X, which doesn't happen in C and assembly, with stuff like "drop 5" where you have useless runs like DEX / DEX / INX / INX.
-Constraining everything to the top levels of the stack wastes a lot of cycles in stuff like "swap 1+ swap" which would be only one instruction in assembly since you can step over the topmost stack levels and work on any item.
-Automatically promoting 8-bit values to 16-bit to keep stack items the same size means you're always doing calculations the slowest way.
Another thing that slows everything down not specifically related to stacks is the lack of optimizations in most Forths. Something like "lshift" always generates the same machine code even in the case of "5 lshift" where the number of shifts is known at compile time and better assembly could be generated. C compilers can spot those situations and generate better assembly.
I also got Tali Forth 2 running in my simulator.
Did you see in the summary I posted a link to above that I thanked you for your help? Thanks again! I also mentioned that Tali Forth 2 is really great and the poor performance seems to be due to the nature of Forth.
Quote:
I believe native compiling (where the opcodes for a simple word are compiled directly into another word) was disabled to conserve as much memory as possible (correct me if I'm wrong on that one, Druzyek), so all of the compiled code would end up as JSRs to the primitives or other defined words.
Quote:
I suspect the loop constructs Tali2 uses to be a lot of the slowness. The loop index is held on the return stack while the loop is running. Tali2 has to pop it into A/Y, add to it, and then push it back. LOOP is implemented as assembly that puts a 1 on the data stack and then falls right into +LOOP.
-You're constantly thrashing X, which doesn't happen in C and assembly, with stuff like "drop 5" where you have useless runs like DEX / DEX / INX / INX.
-Constraining everything to the top levels of the stack wastes a lot of cycles in stuff like "swap 1+ swap" which would be only one instruction in assembly since you can step over the topmost stack levels and work on any item.
-Automatically promoting 8-bit values to 16-bit to keep stack items the same size means you're always doing calculations the slowest way.
Another thing that slows everything down not specifically related to stacks is the lack of optimizations in most Forths. Something like "lshift" always generates the same machine code even in the case of "5 lshift" where the number of shifts is known at compile time and better assembly could be generated. C compilers can spot those situations and generate better assembly.
I also got Tali Forth 2 running in my simulator.
Last edited by Druzyek on Mon Jul 06, 2020 3:07 am, edited 1 time in total.
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Robot Game: C vs Forth vs Assembly
SamCoVT wrote:
For reference, all of TaliForth2's primitives are written in assembly.
Code: Select all
: -ROT ROT ROT ;Quote:
I suspect the loop constructs Tali2 uses to be a lot of the slowness. The loop index is held on the return stack while the loop is running. Tali2 has to pop it into A/Y, add to it, and then push it back. LOOP is implemented as assembly that puts a 1 on the data stack and then falls right into +LOOP.
Hmmm... That does sound like a very slow way to do it. My loop (compiled by LOOP) is entirely separate from +loop (compiled by +LOOP), and does not affect or even look at the data stack.
Quote:
with stuff like "drop 5" where you have useless runs like DEX / DEX / INX / INX.
I don't know that I've ever heard of dropping five cells at once. If a word deals with more than about three, it often means a different approach should probably be taken. There are exceptions of course. Re-writing things into primitives in many cases avoids a lot of DUPing and DROPping.
Quote:
-Constraining everything to the top levels of the stack wastes a lot of cycles in stuff like "swap 1+ swap"
If you use it enough to justify it, you can make that a primitive too, INCR_NOS, which on the '816 is only INC 2,X, a single instruction. On the '02 you'd need to add a BNE and an INC 3,X.
You can always add primitives as the need might justify, which amount to a degree of optimization, combining commonly used pairs or sets of words into a single primitive. A list of many of my own '816 ones is at viewtopic.php?p=72734#p72734 .
And like they say, "You can get 90% of the performance of assembly language with only 10% done in assembly, provided it's the right 10%, the critical 10%," or something like that. The actual numbers will depend on the application; but the principle holds. And it's so easy to mix some 65xx assembly into Forth. I know many other processors are not so easy to write assembly language for (to write primitives or runtimes or ISRs, etc.).
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Robot Game: C vs Forth vs Assembly
Quote:
Quote:
with stuff like "drop 5" where you have useless runs like DEX / DEX / INX / INX.
Quote:
Quote:
-Constraining everything to the top levels of the stack wastes a lot of cycles in stuff like "swap 1+ swap"
Quote:
You can always add primitives as the need might justify, which amount to a degree of optimization, combining commonly used pairs or sets of words into a single primitive. A list of many of my own '816 ones is at viewtopic.php?p=72734#p72734 .
Quote:
And like they say, "You can get 90% of the performance of assembly language with only 10% done in assembly, provided it's the right 10%, the critical 10%," or something like that. The actual numbers will depend on the application; but the principle holds. And it's so easy to mix some 65xx assembly into Forth.