barrym95838 wrote:
According to Dr. Brad, Phil Koopman did a study of how often Forth primitives get executed in some "benchmark Forth programs", whatever that means:
1. ENTER (12.21%)
2. EXIT (11.74%)
3. VARIABLE (5.46%)
4. @ (5.40%)
5. 0BRANCH (4.78%)
6. LIT (4.54%)
7. + (4.18%)
8. SWAP (3.90%)
9. R> (3.89%)
10. >R (3.87%)
11. CONSTANT (3.68%)
12. DUP (3.05%)
Note that ENTER, EXIT, R>, and >R combined add up to almost 32% of primitive executions.
I don't know as much as I would like about Forth programming, but if that list is correct, I hesitate to allow myself to be convinced that your PSP in S design is more efficient than PSP in X for STC (even if your compiler is very clever). I started to test-code some comparison primitives so I could hand-compile a small program both ways, but it got too late, so I'll just post this thought and try to share some code tomorrow if I have time between errands.
I wouldn't put too much stock in that list, as it includes compile-time words such as VARIABLE and CONSTANT --- that doesn't make any sense!
I do agree, however, that my PSP (what I call a data-stack pointer) in S may not be more efficient than PSP in X --- my ENTER and LEAVE may kill the performance gain attained by getting rid of the INX INX and DEX DEX code sequences.
Forth style is to heavily factor the code, meaning that lengthy functions are factored into small functions that call other small functions (this makes testing code easier on the command-line). This means that there are a lot of function calls, so ENTER and LEAVE get executed quite a lot.
Another good point is that peephole-optimization gets rid of a lot of pushing and pulling of data to the data-stack. Most optimizations involve a "producer" followed by a "consumer." For example:
10 +
Here 10 is the producer, because it pushes data onto the data-stack, and + is the consumer because it pulls data from the data-stack. This actually gets compiled into this:
CLC 10 ADC#
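To make the producer/consumer idea concrete, here is a rough Python model. It is not the vino816 compiler: the ToyCompiler class and the PUSH and ADD-STACK placeholders are invented for illustration, and only the optimized CLC / ADC #10 result is taken from the example above.

```python
# Toy model of a producer/consumer peephole optimization.
# The class and the PUSH / ADD-STACK placeholders are hypothetical; only the
# optimized "CLC / ADC #10" result comes from the example in the post.

class ToyCompiler:
    def __init__(self):
        self.code = []        # emitted instructions, as strings
        self.last_lit = None  # (start index, value) if a literal was compiled last

    def literal(self, value):
        # Producer: compile code that pushes a value onto the data-stack.
        self.last_lit = (len(self.code), value)
        self.code.append(f"PUSH #{value}")

    def plus(self):
        # Consumer: if a literal came just before, use it as an operand.
        if self.last_lit is not None:
            start, value = self.last_lit
            del self.code[start:]                  # back up over the push
            self.code += ["CLC", f"ADC #{value}"]
        else:
            self.code += ["CLC", "ADD-STACK"]      # generic add of two stack items
        self.last_lit = None


c = ToyCompiler()
c.literal(10)
c.plus()
print(c.code)  # ['CLC', 'ADC #10']
```

The push never reaches the output, which is the whole saving: the literal goes straight into the instruction operand instead of travelling through the data-stack.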
So, it may be better to use X for the data-stack pointer and get rid of ENTER and LEAVE --- none of this vino816 design is written in stone --- it can be changed.
The code I have written now is for the M65c02A --- I use X for the data-stack pointer rather than S even though X is slower and more bloaty (because it needs an OSX prebyte) --- the problem with X being slower and more bloaty is more serious on the 65c816 though, which is why I switched and used S for the data-stack pointer in vino816.
barrym95838 wrote:
It looks like DROP 1 is a good candidate for optimization.
I don't have that one at this time, but it would be easy to add.
This is what I currently have:
Code:
\ These are history variables --- they indicate what the last thing compiled was --- one or none will be set TRUE.
variable ZF-valid? \ does the zero-flag reflect the D value?
variable flg?
variable not?
variable BEQ? \ the last instruction was a BEQ but this is not where 1ST-ADR points
variable BNE? \ the last instruction was a BNE but this is not where 1ST-ADR points
variable JSR? \ this one preps jump-termination
variable nip?
variable dup?
variable dup-not?
variable over?
variable rover?
variable tuck?
variable swap?
variable rot?
variable J?
variable RR@?
variable R@?
variable R@-swap?
variable R@-lit?
variable R@-lit-plus? \ this one is used when R@ is a pointer to a struct and the literal is a field offset
variable R@-@?
variable lit?
variable lit-lit? \ this one is used for storing a literal into a literal address, such as for I/O ports
variable lit-plus?
variable lit-over?
variable lit-swap?
I keep track of a history of the last thing compiled --- sometimes none of the above are set, if the last thing compiled isn't something that I do any peephole-optimization on --- it would be an easy matter to add DROP? to the history variables.
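The "one or none will be set TRUE" rule can be modelled roughly as below. This is a Python sketch under assumptions, not the real implementation: it assumes that HAPPENED clears every other flag before setting its own and that CLEAR-HISTORY resets them all. The flag list is abbreviated, and drop? is the hypothetical addition discussed above.

```python
# Hypothetical model of the history variables: at most one flag is TRUE.
# "drop?" is not in the original list; it is included to show how a new
# history variable would slot in. The semantics of happened() are assumed.

FLAGS = ["lit?", "lit-lit?", "lit-plus?", "dup?", "swap?", "drop?"]  # abbreviated


class History:
    def __init__(self):
        self.flags = {name: False for name in FLAGS}

    def clear_history(self):
        # Nothing that the peephole optimizer cares about was compiled last.
        for name in self.flags:
            self.flags[name] = False

    def happened(self, name):
        # Record what was just compiled; any earlier flag is cleared first.
        self.clear_history()
        self.flags[name] = True

    def count_set(self):
        return sum(self.flags.values())


h = History()
h.happened("lit?")
h.happened("drop?")  # compiling DROP replaces the previous history
print(h.flags["lit?"], h.flags["drop?"], h.count_set())  # False True 1
h.clear_history()
print(h.count_set())  # 0
```

With this shape, adding a new optimization candidate is one more name in the list plus a check in whichever compiling word wants to react to it.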
Here is an example of a function that does peephole-optimization, meaning that it may back up over the last thing compiled and compile custom code instead:
Code:
: +, ( -- )
    lit-lit? S@ if
        backup-to-1st
        1st-val S@ 2nd-val S@ +
        clear-history literal, exit then
    lit? S@ if
        backup-to-1st
        clc 1st-val S@ add#
        lit-plus? happened ZF-valid exit then
    R@-LIT? S@ if
        backup-to-2nd
        clc 1st-val S@ add#
        R@-lit-plus? happened ZF-valid exit then
    J? S@ if
        backup-to-1st
        clc 3rd add,s
        clear-history ZF-valid exit then
    RR@? S@ if
        backup-to-1st
        clc 2nd add,s
        clear-history ZF-valid exit then
    R@? S@ if
        backup-to-1st
        clc 1st add,s
        clear-history ZF-valid exit then
    R@-swap? S@ if
        backup-to-1st
        clc 1st add,s
        clear-history ZF-valid exit then
    R@-@? S@ if
        backup-to-1st
        clc 1st add(,s
        clear-history ZF-valid exit then
    clc sos add,x
    nip, ; \ NIP, corrupts the ZF-flag
You can see in the above that if LIT? is true, then +, backs up over the code that pushes a literal to the data-stack and compiles code that uses the literal value as an operand.
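The first two cases of +, and the fall-through can be traced with a small Python model. It is a sketch under assumptions: the PlusCompiler class and its instruction strings are placeholders, and it assumes that two literals in a row are tracked as one unit, so that backing up to the first address removes both of them.

```python
# Hypothetical model of the lit-lit?, lit? and fall-through cases of +,
# Instruction strings are placeholders, not real compiler output.

class PlusCompiler:
    def __init__(self):
        self.code = []
        self.history = None     # None, "lit", "lit-lit" or "lit-plus"
        self.first_adr = 0      # start of the last thing compiled
        self.first_val = None   # newest literal value
        self.second_val = None  # the literal compiled just before it

    def backup_to_first(self):
        del self.code[self.first_adr:]

    def literal(self, value):
        if self.history == "lit":
            # Second literal in a row: keep first_adr at the older literal
            # so both are treated as one unit (an assumption).
            self.second_val = self.first_val
            self.history = "lit-lit"
        else:
            self.first_adr = len(self.code)
            self.history = "lit"
        self.first_val = value
        self.code.append(f"PUSH #{value}")

    def plus(self):
        if self.history == "lit-lit":
            self.backup_to_first()
            total = self.first_val + self.second_val  # folded at compile time
            self.history = None                       # clear-history
            self.literal(total)                       # compile the sum as a literal
        elif self.history == "lit":
            self.backup_to_first()
            self.first_adr = len(self.code)
            self.code += ["CLC", f"ADC #{self.first_val}"]
            self.history = "lit-plus"
        else:
            self.first_adr = len(self.code)
            self.code += ["CLC", "ADC <second-on-stack>", "NIP"]
            self.history = None


c = PlusCompiler()
c.literal(3)
c.literal(4)
c.plus()       # 3 4 +  folds into a single literal
print(c.code)  # ['PUSH #7']
c.plus()       # the folded literal is itself a producer for the next +
print(c.code)  # ['CLC', 'ADC #7']
```

Because the folded sum is compiled through the ordinary literal path, it sets the literal history again, so a following + picks it up as an immediate operand without any extra case.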
Peephole-optimization is something that can be expanded upon indefinitely --- I have what I intuitively expect to be the common code-sequences --- more can be added.