Quote:
Quote:
with stuff like "drop 5" where you have useless runs like DEX / DEX / INX / INX.
I don't know that I've ever heard of dropping
five cells at once. If a word deals with more than about three, it often means a different approach should probably be taken. There are exceptions of course. Re-writing things into primitives in many cases avoids a lot of
DUPing and
DROPping.
I suppose without a single Forth standard I can't assume DROP means the same thing in all contexts. I meant here dropping one stack item and replacing it with 5. That could be just an LDA/STA pair in assembly but is easily 10x the cycles in Forth.
Quote:
Quote:
-Constraining everything to the top levels of the stack wastes a lot of cycles in stuff like "swap 1+ swap"
If you use it enough to justify it, you can make that a primitive too,
INCR_NOS, which on the '816 is only
INC 2,X, a single instruction. On the '02 you'd need to add a
BNE and an
INC 3,X.
You only need the BNE if you're stuck always using 16-bit math, whereas you get away with a single INC in assembly if it's an 8-bit value. I think something like INCR_NOS really illustrates the inefficiency here. You shouldn't have to write custom assembly to add one to a local variable to avoid the 10x slowdown.
Quote:
You can always add primitives as the need might justify, which amount to a degree of optimization, combining commonly used pairs or sets of words into a single primitive. A list of many of my own '816 ones is at
viewtopic.php?p=72734#p72734 .
I think this is a bit of a dead end as well. If you just take my lshift example, there are 16 different variations of doing a 16-bit shift if you know at compile time how many places you're shifting. You could replace something like "5 lshift" with a custom "lshift5" but then you are stuck creating two versions of every word that can take a constant.
Quote:
And like they say, "You can get 90% of the performance of assembly language with only 10% done in assembly, provided it's the right 10%, the critical 10%," or something like that. The actual numbers will depend on the application; but the principle holds. And it's so easy to mix some 65xx assembly into Forth.
That is a nice sentiment but even mixing in a lot of assembly, I think it's absolutely not true on the 6502. I hear stuff like this all the time online. Someone was claiming
here for example that the overhead is only about 25% more than assembly. It seems like people became convinced of how fast Forth is and have just been quoting this for decades without proof. When you actually look at the data and compare the cycles executed, it's more like Forth is 10% the performance, not 90%. On the other hand, if you have a non-trivial example of getting 90% the performance, it would be really neat to see how that works.