I think this is exactly what I was searching for.
This is amazing!
You're welcome. BTW, it's "structure [no 'd'] macros," because the macros themselves are not particularly structured, but rather, they are used to form program flow control structures when you invoke them. Internally, they may or may not look very structured at all.
I'm not aware of any such tutorial, and the macro language sections of assemblers' manuals only give rudimentary examples. I guess the writers of the manuals hope that from there you'll either see the possibilities and take off after they've given you the spark, or hope you already have experience from other assemblers. Unfortunately there are a lot of bad implementation examples out in the wild that dissuade people from even trying macros, and a lot of nay-saying from people who haven't thought through the possibilities afforded by conditional assembly in the macro definition. A fair amount of intelligence can be built into the macro definition by using conditional assembly, so you can write your macros to produce the most efficient code possible for each invocation's situation. (I can still imagine the macro language being made more intelligent though, to do even more, like to parse FORTRAN-like lines and spit out good assembly language.)
The best you'll be able to do without a stack is have variables in memory that are only used by the particular routine, non-reëntrantly (which also means non-recursively). In simple applications, you might do fine. Otherwise, it's not a very good use of memory, and it's easy to tie yourself up in knots, especially if you try to have multiple routines use the same variable addresses to save memory with the idea that those routines won't ever need the variables at the same time. Almost inevitably, it eventually leads to trouble. Been there, done that (in my greener years).
Stacks really do solve a lot of problems, and in the case of structure macros, where you want to have nested structures of the same kind and have all the branching go to the right places, some sort of stack is mandatory (although in the assembly-language macros, it's done by the assembler during assembly time, not on the target computer at run time). These things are addressed in the treatise on 6502 stacks ("stacks" plural, not just the page-1 hardware stack) at http://wilsonminesco.com/stacks/ . Perhaps you got a bad introduction somewhere along the line that got you thinking that stacks are mysterious and complicated. They're not—at least not on the 6502. (I've brought a lot of products to market using PIC16C and PIC16F microcontrollers (and implemented my structure macros for them too), and their super-simple processor is very limited, and stacks do become cumbersome there, and in fact the programmer is not even allowed access to the hardware stack. That's not the case with the 6502.)
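To make the nesting point concrete, here is a hand-written sketch of what two nested IF-type structures have to assemble into. The macro names mentioned in the comments (IF_EQ, END_IF) and the labels are only illustrative, and the label syntax will vary with your assembler. The thing to notice is that each END_IF has to resolve the most recently opened branch that is still unresolved. That is last-in-first-out order, which is why the assembler needs some kind of stack while it assembles.

```
        CMP  #10
        BNE  outer_end   ; outer IF_EQ: skip the outer body if A <> 10
        CPX  #0
        BNE  inner_end   ; inner IF_EQ: skip the inner body if X <> 0
        INY              ; inner body
inner_end:               ; inner END_IF: resolves the newest open branch
        DEX              ; rest of the outer body
outer_end:               ; outer END_IF: resolves the branch opened first
```

With macros you would not write those labels yourself. One plausible way to do it is for the IF macro to record where its branch operand is, and for END_IF to pull the latest record and fill in the distance. None of that bookkeeping exists on the target at run time.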
Since you're concerned about a speed penalty: ADC $105,X in the hardware stack area, for example, takes the same number of cycles as ADC $105 (as long as the indexing doesn't cross a page boundary); and in ZP, ADC ZP,X takes only one more cycle than ADC ZP. You do have to set X at the beginning of the routine, but then you can access the various local variables by changing only the base address which is part of the instruction, with no need to keep changing X. If you're concerned about speed though, it may be time to abandon the NMOS 6502, since the CMOS is several times as fast and has more instructions and addressing modes. (I realize that may not be an option if you're dealing with a C64.)
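Here is a minimal sketch of that kind of stack-relative access, assuming the caller pushes two bytes before the JSR. The routine name add_params and the values are made up for illustration, and it assumes decimal mode is off. After TSX, $101,X and $102,X hold the return address, so the pushed bytes sit at $103,X and $104,X.

```
        LDA  #$12
        PHA              ; first byte pushed
        LDA  #$34
        PHA              ; second byte pushed
        JSR  add_params  ; comes back with A = $46
        TAY              ; keep the result while the stack is cleaned up
        PLA              ; discard the two pushed bytes
        PLA
        TYA
        ; ...

add_params:
        TSX              ; set X once; it is not changed again in the routine
        CLC
        LDA  $103,X      ; the byte pushed last ($34)
        ADC  $104,X      ; plus the byte pushed first ($12)
        RTS
```

Only the base address in each instruction changes from one variable to the next. X stays put for the whole routine, as long as the routine doesn't push or pull anything in between.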
There's an undercurrent I'm sensing that I might come back to write about, if I can figure out how, regarding control and where the details are handled in the various levels of language.