Last year I was wondering about assigning zero page usage automatically in
another thread, and I have made some progress doing it with a Python script.
I've been using Macroassembler AS, since it can output the source after the first pass with all macros already expanded, so my script doesn't have to do the expansion itself. I already have two macros called FUNC and LOCAL that provide a very small amount of abstraction even without the optimizer script. For example:
Code:
FUNC apple
LOCAL foo
STZ foo
...
ENDFUNC
FUNC banana
LOCAL foo
STA foo
...
ENDFUNC
FUNC main
LOCAL foo
LOCAL ptr,2
JSR apple
JSR banana
...
ENDFUNC
FUNC creates a label, assigns some temporary symbols, and begins a scope block but doesn't lay down any instructions. ENDFUNC closes the scope block and lays down RTS. Each occurrence of LOCAL assigns the next zero page address to that symbol and keeps it local to the function. In the example above, apple and banana each need 1 byte for locals and main needs 3, for a total of 5 zero page bytes.
Turning on a flag in the source file makes the LOCAL macro generate symbols that tell the optimizer to assign variable addresses rather than having the macro assign them. After expansion, the file is read in by the optimizer script, which analyzes the flow of the program to see if any functions can share memory. In this example, the locals in apple and banana can be assigned the same zero page address since they never call each other. The local variable usage will fit into 4 bytes instead of 5.
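To make the sharing idea concrete, here is a minimal Python sketch. It is not the actual optimizer script, just one way the addresses could be worked out. The names assign_offsets, sizes, and calls are made up for illustration, and it assumes the call graph has no recursion. Each function's locals start above the locals of every function that can still be live when it runs:

```python
def assign_offsets(sizes, calls, root):
    """Return ({function: base offset}, total bytes) for zero page locals.

    sizes: bytes of locals per function.
    calls: caller -> set of callees (hypothetical input format).
    Assumes no recursion; a cycle would make place() loop forever.
    """
    base = {}

    def place(func, offset):
        # Keep the highest offset seen over all call paths, so the
        # function never overlaps a caller that is still active.
        if base.get(func, -1) >= offset:
            return
        base[func] = offset
        for callee in calls.get(func, ()):
            place(callee, offset + sizes[func])

    place(root, 0)
    total = max(base[f] + sizes[f] for f in base)
    return base, total


# The apple/banana/main example from above.
sizes = {"main": 3, "apple": 1, "banana": 1}
calls = {"main": {"apple", "banana"}}
base, total = assign_offsets(sizes, calls, "main")
print(sorted(base.items()), total)
# prints: [('apple', 3), ('banana', 3), ('main', 0)] 4
```

apple and banana both land at offset 3, right after main's 3 bytes, which gives the 4 byte total. The offsets are relative to wherever the managed chunk of zero page starts.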
Non-local labels are considered possible functions, and any subsequent variables assigned with LOCAL are in scope until the next non-local label, the same way local labels work. The script looks for JSRs to determine which functions call other functions and generates the call graph from that. The FUNC macro can optionally take a list of attributes to describe the function, like "FUNC main, begin", which tells the script to begin the call graph with main. I'd like to add other attributes like "interrupt" and "re-entrant" since the memory for those needs to be handled differently. It should also be possible to detect re-entrant functions in the call graph and either generate an error if they aren't marked re-entrant or mark them automatically and handle them differently.
Because it only looks for JSRs and not RTSs, it isn't a problem to embed a string after a JSR, use the return address to point to the data, and then return to the calling function with a calculated jump. Functions that can be reached with JMP (indirect) will also need to be labeled if they use LOCAL, but they should work too. It's not possible to tell at compile time which functions an indirect JMP might target, so the script can't check that they have all been labeled correctly, but I could output information about them and load that into my emulator to help debug.
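As a rough illustration of the call graph step (again a sketch, not the real script), the Python below scans expanded source for non-local labels and JSRs, then flags functions that can end up calling themselves. It assumes a simplified listing where non-local labels start in column 0, instructions are indented, local labels start with a dot, and comments start with a semicolon. The names build_call_graph and recursive_functions are hypothetical:

```python
import re

LABEL = re.compile(r"^([A-Za-z_]\w*):?")  # non-local label in column 0
JSR = re.compile(r"\bJSR\s+([A-Za-z_]\w*)", re.IGNORECASE)


def build_call_graph(lines):
    """Map each non-local label to the set of labels it calls with JSR."""
    graph, current = {}, None
    for line in lines:
        code = line.split(";", 1)[0]  # drop comments (ignores ';' in strings)
        label = LABEL.match(code)
        if label:
            current = label.group(1)
            graph.setdefault(current, set())
        call = JSR.search(code)
        if call and current is not None:
            graph[current].add(call.group(1))
    return graph


def reachable(graph, start):
    """Every function reachable from start through one or more JSRs."""
    seen, todo = set(), list(graph.get(start, ()))
    while todo:
        func = todo.pop()
        if func not in seen:
            seen.add(func)
            todo.extend(graph.get(func, ()))
    return seen


def recursive_functions(graph):
    """Functions that can reach themselves, directly or indirectly."""
    return {f for f in graph if f in reachable(graph, f)}


source = """\
apple:
        STZ foo
        RTS
banana:
        JSR banana      ; calls itself
        RTS
main:
        JSR apple
        JSR banana
        RTS
""".splitlines()

graph = build_call_graph(source)
print(sorted(recursive_functions(graph)))
# prints: ['banana']
```

Anything returned by recursive_functions could either raise an error when it isn't marked re-entrant or be marked automatically. Indirect JMP targets never show up in this graph, which is why they would still need to be labeled by hand.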
My plan is to add as few abstractions as possible so that the project stays closer to a small set of macros than a programming language. FUNC and LOCAL are both optional in the optimizer script. You can use them to manage a chunk of zero page memory and allocate the rest how you're used to doing it. Because I have everything set up to read in, tokenize, and analyze assembly, I would like to implement a few other optimizations eventually too.
In this test example I have 38 bytes worth of local variables that fit into only 19 bytes of zero page: