Quote:
Assuming I understand correctly what you want to do (which may not be a safe assumption), you can get around these problems by using a data stack, separate from the return stack, as discussed in the 6502 stacks treatise, particularly chapters 4 through 6. Although I'm a macro junkie, this might eliminate the complication that macros are meant to hide or automate, because the matter of passing parameters becomes implicit, rather than explicit. (I hope you're not disappointed.)
For me, the point of doing things differently is the performance gain with local variables (not with passing arguments), although I do see why you would prefer using stacks for those too, even though it's slower. LDA, STA, and ADC (just as examples) take only 3 cycles from zero page, whereas they take 4 cycles with X indexing into a zero-page stack or the return stack.
Imagine a function that uses four temporary variables in zero page, with values calculated inside the function (i.e., not arguments passed in). Here is an example fragment from an X-indexed loop.
Code:
LDA var1 ;3 cycles
STA peripheral ;3 cycles
STX peripheral ;4 cycles (if peripheral not in zero page)
LDA var2 ;3 cycles
STA peripheral ;4 cycles
LDA (var3,X) ;6 cycles
STA peripheral ;4 cycles
STA var4 ;3 cycles
If you store your variables in zero page, this fragment takes 30 cycles. Here is how I think you would do it with a data stack in zero page (please let me know if there is a faster way to do it).
Code:
LDX data_sp ;3 cycles if in zero page, or ? cycles if ? (on stack?)
LDA var1,X ;4 cycles
STA peripheral ;4 cycles
LDX x_copy ;3 cycles if in zero page
STX peripheral ;4 cycles
LDX data_sp ;3 cycles
LDA var2,X ;4 cycles
STA peripheral ;4 cycles
LDA var3,X ;4 cycles
STA indirection_temp ;3 cycles, assuming in zero page
LDX x_copy ;3 cycles
LDA (indirection_temp,X) ;6 cycles
STA peripheral ;4 cycles
LDX data_sp ;3 cycles
STA var4,X ;4 cycles
This takes 56 cycles (or can you speed it up?), which is almost twice as slow! Of course, I am deliberately constructing a worst-case scenario to prove a point: unindexed zero page is faster when you are working with intermediate values. If you are using arguments passed to the function, which might already be on the stack (as with the Forth-style stack you mentioned), I see why it is faster to leave them on the data stack rather than make a new copy for every function call. However, every time the function rereads one of those stack arguments with X indexing, it incurs a one-cycle penalty. If you access an argument enough times, it becomes faster to allocate space for it in zero page and make a copy.
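To put a rough number on that break-even point, here is a back-of-the-envelope sketch (my own arithmetic, not from the thread): copying a stack argument into an unindexed zero-page location costs one LDA arg,X (4 cycles) plus one STA temp (3 cycles), and every later read then costs 3 cycles instead of 4.

```python
# Break-even calculation for copying a data-stack argument into an
# unindexed zero-page location. Cycle counts are the standard NMOS
# 6502 timings: LDA zp = 3, LDA zp,X = 4, STA zp = 3.

COPY_COST = 4 + 3  # one-time cost: LDA arg,X then STA temp


def stack_cycles(reads):
    """Total cycles if every read uses X-indexed zero page."""
    return 4 * reads


def copy_cycles(reads):
    """Total cycles if we copy once, then read unindexed zero page."""
    return COPY_COST + 3 * reads


# Find the first read count where the copy strategy wins outright.
break_even = next(n for n in range(1, 100)
                  if copy_cycles(n) < stack_cycles(n))
print(break_even)
```

By this count the copy pays off once the argument is read eight or more times; in practice it can be slightly sooner, since the copying LDA already leaves the value in the accumulator for free.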
Quote:
One or more of the variables a, b, c, and d may not need to exist as variables at all if they were just calculated by recent operations and won't be needed ever again after main and/or func1 and/or func2 is done with them. They take space on the data stack only while they're needed, then they cease to exist.
Right. I am using a, b, c, and d as temporaries. They also cease to exist when func1 or func2 exits, since other functions are free to reuse that zero-page memory (the same concept as a stack).
Quote:
They never had a hard address, so the problem of trying to figure out where to put them (like $00-03 versus $04-07) is gone.
I see the convenience in not having to figure out where to put them. My only point was that figuring out where to put them yourself avoids X indexing for temporary local variables like a, b, c, and d, which is faster. I was doing things like this myself and it quickly became unmanageable, as well as probably less efficient than what a computer could manage. One caveat I see is that the speed gain has to outweigh the extra time it takes to push and pull zero-page variables to and from a stack, versus simply adjusting X for your data stack.
Quote:
However, with a separate data stack, input parameters that have been derived earlier can be left on the stack, just waiting there, without interfering with anything, until REPLINE is called, not storing them elsewhere, and not having to re-load them to put them in a stack frame now.
How would you handle an HLL function like this?
Code:
x=Calculation1();
y=Calculation2();
z=Calculation3();
func(x,y,z);
func(z,x,y);
Would you take these steps?
1.) leave the calculated x, y, and z values on the stack without storing them somewhere else (yet)
2.) not make a copy of them for the first call to func
3.) make a copy of them on the stack in a different order for the second call to func
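As a thought experiment, those three steps could look like the toy Python model below (purely my own illustration, not 6502 code), assuming func reads its arguments from the top of the data stack without consuming them:

```python
# Toy model of a Forth-style data stack (top of stack = end of the
# list), illustrating the three steps above.

data_stack = []


def func(stack):
    """Stand-in for func(a, b, c): peeks at the top three entries,
    with a pushed first (deepest) and c on top."""
    c, b, a = stack[-1], stack[-2], stack[-3]
    return (a, b, c)


# Step 1: leave the calculated x, y, z on the stack, stored nowhere else.
x, y, z = 10, 20, 30
data_stack += [x, y, z]

# Step 2: no copies for the first call -- func reads them where they sit.
first = func(data_stack)      # sees (x, y, z)

# Step 3: for the second call, push copies in the new order z, x, y.
data_stack += [z, x, y]
second = func(data_stack)     # sees (z, x, y)

# Afterwards all six entries can be dropped; the values never had
# a hard address of their own.
del data_stack[:]
```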
Quote:
the accumulator being tested in the IF line doesn't exist at assembly time, so the assembler would have to somehow examine the preceding instructions, ones that are not in the macro definition or invocation. No assembler that I know of can do this.
I imagine you would have to expand all the macros first, so that you are left with only instructions and data. Then you could interpret the code line by line, figuring out which paths are useful and which can be boiled down to constants. Finally, you could hand it over to the assembler after making your own changes.
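To make the two passes concrete, here is a toy sketch in Python (entirely hypothetical; the macro syntax and table belong to no real assembler): pass 1 expands macros into plain instruction lines, pass 2 folds what can be boiled down to constants.

```python
import re

# Toy two-pass preprocessor: expand macros first, then fold constant
# expressions, so a later optimizer (or the assembler) sees only
# instructions and data.

MACRO_BODY = ["INC {0}", "BNE skip", "INC {0}+1", "skip:"]  # INC16 macro


def expand(lines):
    """Pass 1: replace each macro invocation with its expanded body."""
    out = []
    for line in lines:
        if line.startswith("INC16 "):
            arg = line.split(None, 1)[1]
            out.extend(t.replace("{0}", arg) for t in MACRO_BODY)
        else:
            out.append(line)
    return out


def fold_constants(lines):
    """Pass 2: reduce immediate operands like '#3+4' to '#7'."""
    def fold(m):
        return "#" + str(eval(m.group(1)))  # safe here: digits only
    return [re.sub(r"#(\d+[+*-]\d+)", fold, line) for line in lines]


source = ["LDA #3+4", "INC16 ptr"]
expanded = expand(source)        # macros are gone, only instructions remain
folded = fold_constants(expanded)
```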
Do you know of an assembler that will output source after macro expansion but before assembling?
EDIT: Whoops, I missed part of your post:
Quote:
Quote:
and 3.) only jumped to functions using their names, rather than hard coded addresses, you could do quite a lot. I think you would also never have to compromise performance and you would only increase program size if you chose speed over size. I suspect you would need a lot more than macros though.
Every routine will have an address, although part of the assembler's job is to hide it. If you want the routines relocatable, well, the '02 can do it—sorta—but it's very poorly suited for that. Chapters 12 and 13 deal with that. The 65816 does much better.
I meant that if you wanted to get a program to optimize as in my example, you would have to follow a few rules of abstraction. There is no way to tell from a JMP instruction whether it is entering a new function or just going back to the head of a loop in the same function. You would have to know that (I think) to optimize well.