vbc wrote:
Interesting. When I had to make that decision for vbcc, I immediately chose (1). What burden on the programmer do you see?
It's possible that you have a different version of (1) in mind. The only way I could think of for interrupt-specific stacks to work is that you'd have to find a place in the memory map for each of them, such that they wouldn't conflict with one another or run into anything else. If an interrupt could be interrupted, then you'd need two stacks, and so on. In general, it seems like this ends up being essentially how threads work: dynamically allocated fixed-size stacks per thread. But one of the advantages of the C runtime model is that you can get asynchronicity working on just one stack, and I hoped to preserve this advantage.
vbc wrote:
Slowing down every (stack using) function in the code to reduce the overhead of an interrupt handler is something I can understand for specific use-cases (although it would usually not be my choice). However, if the interrupt handler calls other (stack using) C functions, will you not lose that improvement in your sub-functions (which, I assume, will have the overhead of the stack pointer adjustment)? Maybe I am overlooking something, but it does not seem beneficial to me.
But more important, as I understand it, this mechanism may incur an additional up to 255 bytes of stack-usage, depending on a - possibly very rare - race condition as well as the layout of the stack. I think you can even produce cases where reducing the stack usage of a function will effectively increase the total stack usage. That seems pretty unmaintainable to me.
I'm hoping to avoid use of the soft stack as much as possible, and a PHA/PLA is a rounding error given the current state of the compiler. I've been trading off performance for simplicity of implementation at every stage of the compiler so far, since I'm working on correctness in a vacuum. This will definitely be something I'll have to revisit, in the context where I can compare its impact to other forms of optimization.
Along those lines, there's another approach with a different tradeoff, too. The SP can never be seen to be more than one page above its true value, so an interrupt handler could be required to summarily increase SP by one page. This removes the PHA/PLA from every non-interrupt function, and adds an SP increment/decrement to the interrupt handler (only if it doesn't already have one). The downside is that 256 bytes are wasted for every interrupt on the call stack, which is the absolute worst case of the PHA/PLA approach. For PHA/PLA, usually zero bytes will be wasted, since usually interrupts don't occur within SP increments. The chances of multiple interrupt invocations wasting space decrease exponentially with the number of interrupt invocations on the stack.
GARTHWILSON wrote:
Otherwise, for the '02, you'll examine the low 8 bits before deciding how to take action, rather than incrementing or decrementing the low 8 bits and then seeing if you need to do the high 8 bits too. How's this for the stack growing downward and you have a stack pointer in ZP we'll call PTR and post-index the indirect with Y:
That's similar in spirit to what I ended up with, just generalizing for the usual case where we're decrementing the stack by a 16-bit constant. Compute the low byte, stash it on the hard stack, and then feed its carry into the high byte sum. Set the high byte, then set the low byte from the stashed value.
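To make the ordering concrete, here is a rough model of it in C rather than 6502 assembly. This is only a sketch: soft_sp, sp_value and sp_sub are names made up for illustration, not anything from the compiler, and a local variable stands in for the byte stashed on the hard stack.

```c
#include <stdint.h>

/* Hypothetical model of the soft stack pointer: two zero-page bytes. */
typedef struct {
    uint8_t lo;
    uint8_t hi;
} soft_sp;

/* Combine the two bytes into the 16-bit value an observer would see. */
static uint16_t sp_value(const soft_sp *sp) {
    return (uint16_t)(((unsigned)sp->hi << 8) | sp->lo);
}

/* Subtract a 16-bit constant n from the soft SP one byte at a time.
   Returns the value visible between the two stores, i.e. what an
   interrupt arriving at that moment would observe. */
static uint16_t sp_sub(soft_sp *sp, uint16_t n) {
    uint8_t n_lo = (uint8_t)(n & 0xFF);
    uint8_t n_hi = (uint8_t)(n >> 8);

    /* Compute the new low byte and stash it (stands in for PHA). */
    uint8_t stashed_lo = (uint8_t)(sp->lo - n_lo);
    /* The borrow out of the low byte feeds the high byte sum. */
    uint8_t borrow = (uint8_t)(sp->lo < n_lo);

    /* Set the high byte first. */
    sp->hi = (uint8_t)(sp->hi - n_hi - borrow);

    /* New high byte, old low byte: shares its high byte with the
       final value, so it differs from it by less than one page. */
    uint16_t between = sp_value(sp);

    /* Then set the low byte from the stashed value (stands in for PLA). */
    sp->lo = stashed_lo;
    return between;
}
```

For example, starting from 0x2010 and subtracting 0x0123, the low byte becomes 0xED with a borrow, the high byte becomes 0x1E, and the value visible between the two stores is 0x1E10 before SP settles at 0x1EED.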
In the case where you're decrementing by 1, the carry into the high byte sum reduces to the compare of the low byte with zero, and the use of that carry reduces to the control flow avoiding the decrement of the high byte.
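Modeled in C (again only a sketch, with soft_sp and sp_dec1 as made-up names rather than anything from the compiler), the decrement-by-1 case looks like this:

```c
#include <stdint.h>

/* Hypothetical model of the soft stack pointer: two zero-page bytes. */
typedef struct {
    uint8_t lo;
    uint8_t hi;
} soft_sp;

/* Decrement the 16-bit soft SP by 1. A borrow into the high byte
   happens exactly when the low byte is zero, so the carry turns into
   a test-and-branch around the high byte decrement. */
static void sp_dec1(soft_sp *sp) {
    if (sp->lo == 0) {
        sp->hi--;   /* only reached when the low byte will wrap */
    }
    sp->lo--;       /* wraps from 0x00 to 0xFF when a borrow occurred */
}
```

Here the low byte is examined before anything is modified, so the high byte is touched only on the one-in-256 occasions when the low byte wraps.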