leepivonka wrote:
You might find Tali Forth interesting:
http://forum.6502.org/viewtopic.php?f=9&t=2926
https://github.com/scotws/TaliForth2
It's for the 65C02. It's designed to be fast & not horribly big.
For the 65C02, TaliForth2 is reasonably fast but isn't as small as it used to be. It's about 21.5K now, but implements much of the ANS2012 standard, including the core (including extension words), block (including extension words), double, facility, string, tools, and search (including extension words) wordsets, along with an assembler and two simple editors.
On the speed front, however, Tali uses native compiling and gets a bit of a speed boost from that (at the expense of code size). In native compiling, rather than using a JSR to another word, Tali just copies the opcodes for that routine into the word being compiled (up to, but not including the RTS). There are also words, such as allow-native, always-native, and never-native to control whether a new word will be native compiled when compiled in a future word. There is also a variable nc_limit that sets the maximum size of word that will be natively compiled - words larger than this will be compiled as a JSR (unless they have been flagged always-native).
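To make the inline-versus-JSR decision concrete, here is a rough sketch in Python. It is only a simplified model of the idea described above, not Tali's actual code: the function name, the flag constants, and the exact comparison against the limit (body size without the RTS, "less than or equal") are all assumptions made for illustration.

```python
# Simplified model of native compiling vs. JSR. NOT Tali Forth 2's real code.
JSR = 0x20  # 65C02 opcode: JSR absolute
RTS = 0x60  # 65C02 opcode: RTS

# Hypothetical flag values, used only in this sketch.
ALLOW_NATIVE, ALWAYS_NATIVE, NEVER_NATIVE = range(3)


def compile_call(target_addr, target_code, flag, nc_limit):
    """Return the bytes appended to the word being compiled for one call.

    target_code is the callee's machine code and must end in RTS.
    """
    if not target_code or target_code[-1] != RTS:
        raise ValueError("callee must end in RTS")

    body = target_code[:-1]  # everything up to, but not including, the RTS

    # Assumption: the limit is compared against the body size without the RTS.
    inline = flag == ALWAYS_NATIVE or (
        flag == ALLOW_NATIVE and len(body) <= nc_limit
    )
    if inline:
        return bytes(body)  # copy the opcodes straight into the new word

    # Otherwise emit JSR followed by the little-endian 16-bit address.
    return bytes([JSR, target_addr & 0xFF, (target_addr >> 8) & 0xFF])


# A tiny made-up word: INX, INX, RTS
tiny = bytes([0xE8, 0xE8, 0x60])
print(compile_call(0x8123, tiny, ALLOW_NATIVE, 20).hex())  # inlined body
print(compile_call(0x8123, tiny, NEVER_NATIVE, 20).hex())  # JSR $8123
```

The trade-off shows up directly in the byte counts: inlining saves the JSR/RTS overhead on every call, but any body longer than three bytes makes the calling word bigger than a JSR would.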
This gets you closer to assembly speeds with some of the drawbacks that have already been mentioned (all operations are 16-bit and there will sometimes be unneeded stack thrashing or use of the stack when using a register would have been faster).
When a Forth has an assembler, the assembler IS Forth (or perhaps it's the other way around - it gets very meta if you start thinking about it) and is just a different (Forth) way to build a new word. If you think of it this way, then the Forth using assembler words is as fast as any other assembler and is really only limited by the skill of the programmer.
If you are interested in timing some different Forths, you'll find that Tali2 runs in py65mon out of the box. You may also find the files talitest.py and cycles.fs, in the test folder on GitHub, interesting. talitest.py extends py65mon, loads a binary file into the memory space (it's currently set to load taliforth-py65mon.bin but you can change it if you want), and also provides a cycle counter mapped into the memory space. You read from address 0xF006 to start the cycle counter (ignoring the result), read from 0xF007 to stop the cycle counter (again ignoring the result), and then the cycle count between those two accesses is available as a 32-bit value in memory locations 0xF008-0xF00B. Just beware that the 32-bit value is presented in Tali2's double format, which is neither little endian nor big endian for the entire 32-bit value. This allows it to be read directly using 2@. Because this is memory mapped, it could be used by any Forth that will run in py65mon.
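If you want to read the counter from something other than Tali, a small sketch of decoding those four bytes may help. The layout here is my assumption, inferred from how 2@ works in standard Forth (the cell at the lower address ends up on top of the stack, and the top cell of a double is the most significant half), combined with little-endian 16-bit cells on the 65C02. The function names are made up for the example, and it's worth checking against talitest.py before relying on it.

```python
def decode_tali_double(raw):
    """Decode 4 bytes laid out the way 2@ expects a double cell.

    Assumed layout: most significant 16-bit cell first, and each
    cell stored little endian.
    """
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    high = raw[0] | (raw[1] << 8)  # cell at the lower address
    low = raw[2] | (raw[3] << 8)   # cell at the higher address
    return (high << 16) | low


def encode_tali_double(value):
    """Inverse of decode_tali_double, for a value in 0..2**32-1."""
    if not 0 <= value < 2**32:
        raise ValueError("value does not fit in 32 bits")
    high, low = value >> 16, value & 0xFFFF
    return bytes([high & 0xFF, high >> 8, low & 0xFF, low >> 8])


# Bytes as they would appear at 0xF008..0xF00B for a count of 0x00011234
raw = bytes([0x01, 0x00, 0x34, 0x12])
print(hex(decode_tali_double(raw)))  # 0x11234
```

Under that assumption, a Forth that fetches the four bytes as one plain little-endian 32-bit value would get the two 16-bit halves swapped, which is the trap mentioned above.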