The raison d'etre feature, static stacks, is now working fairly well. Here's the same code example as before, with a twist.
char_lib.h
Code:
__attribute__((leaf)) char next_char();
__attribute__((leaf)) void report_counts(int counts[256]);
char_stats.c
Code:
#include "char_lib.h"
void char_stats() {
int counts[256] = {0};
char c;
while ((c = next_char())) {
counts[c]++;
}
report_counts(counts);
}
The only effective difference from before is that the two external function calls are marked with "__attribute__((leaf))".
Without attribute leaf, the compiler must assume that next_char and report_counts can call any external function in the program. This makes char_stats possibly recursive, so counts must be placed on a stack.
But, attribute leaf on an external function tells the compiler that calling such a function cannot in turn call any other function in the caller's translation unit. Such functions are "leaves" from the point of view of the caller. Very nearly every library routine can be so annotated, whether C standard library, OS. This is standard practice in e.g. glibc, where this enables similar whole-program optimizations.
LLVM can then prove char_stats is not recursive, so it can convert the entire stack frame to a single global array, char_stats_sstk. (sstk stands for "static stack", my name for the concept). This completely removes the stack pointer from the generated assembly, and produces something much closer to what I'd write by hand. Eventually, the static stack frames for each function will be merged together by a big interprocedural pass, since any two functions that cannot be alive at the same time can have overlapping static stack frames. That's why I'm calling it "static stack": the eventual array will mirror what the runtime call stack would have looked like, except that its layout is determined at compile time, not at run time.
Attribute leaf is only needed if the compiler can't see the definitions for the functions, if, for example, Link-time Optimization is used, then it would be perfectly apparent to the compiler that char_stats is not recursive. In fact, the whole trio of routines would probably be inlined into one.
The nice thing about attribute leaf is that it doesn't *require* LTO to produce good code; separable compilation can work as well, so long as the boundaries between translation units are appropriately annotated. Functions themselves don't need to be annotated as non-recursive, so there's no burden on the average programmer, so long as they're using LTO. It falls on the maintainers, not the users, of a binary-only or assembly-langage library to add the appropriate leaf annotations.
Generated code:
Code:
.code
.global char__stats
char__stats:
LDA #0
LDY #2
LDX #<char__stats__sstk
STX z:__ZP__0
LDX #>char__stats__sstk
STX z:__ZP__1
TAX
JSR memset
JMP LBB0__2
LBB0__1:
ASL A
STA z:__ZP__0
LDA #0
ROL A
STA z:__ZP__1
LDA #<char__stats__sstk
LDX #>char__stats__sstk
CLC
ADC z:__ZP__0
STA z:__ZP__0
TXA
ADC z:__ZP__1
STA z:__ZP__1
LDY #0
LDA (__ZP__0),Y
CLC
ADC #1
STA (__ZP__0),Y
LDY #1
LDA (__ZP__0),Y
ADC #0
STA (__ZP__0),Y
LBB0__2:
JSR next__char
CMP #0
BNE LBB0__1
LBB0__3:
LDA #<char__stats__sstk
STA z:__ZP__0
LDA #>char__stats__sstk
STA z:__ZP__1
JSR report__counts
RTS
.bss
char__stats__sstk:
.res 512
.global __ZP__0
.global __ZP__1
.global memset
.global next__char
.global report__counts