thelonious wrote:
After an almost 20-year hiatus, I've become re-enamored with the 6502, mainly due to
André LaMothe's XGameStation project. So I'm playing around writing some generic algorithms (linked lists, hashes, regexes, etc.), and I'm curious how others handle passing data to and from subroutines. I should mention that I'm aware I can pass parameters and return values via A, X, Y, pre-defined memory locations, and even flags for boolean results; however, I'm looking for generic approaches that allow one to pass an arbitrary number of parameters, and maybe even return multiple values, though that's less important. Single values are fine for now.
The method you posted would work fine, but it does have the disadvantage of requiring you to be conscious of stack usage. In fact, most index-based solutions are going to be restricted to 256 bytes of stack space, since the index registers are only 8 bits wide. Clearly, keeping things this small is quite advantageous on the 6502. Here are some other solutions you might be interested in researching.
You can split the data stack from the return stack. Forth uses this technique to excellent effect (indeed, Forth pioneered it), but it really isn't limited to Forth; it's a shame no other language seems to make use of it. Let the return stack lie in page $01 (where the 6502's hardware stack lives anyway), and define the data stack to occupy page $02. Then you can just use the X register as a data stack pointer of sorts: accessing the data stack is a matter of LDA $0200,X.
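As a minimal sketch of the idea (the push8/pop8 names are mine, not from any standard library), the basic primitives might look like this, with the data stack growing downward from the top of page $02:

Code:
; Hypothetical primitives for a page-$02 data stack.
; X is the data stack pointer; initialize once with LDX #$00
; so that an empty stack wraps down to $02FF on the first push.
push8:                  ; push the byte in A
        dex
        sta $0200,x
        rts
pop8:                   ; pop the top byte into A
        lda $0200,x
        inx
        rts

In practice you'd usually inline the DEX/INX and the indexed store rather than pay a JSR/RTS per byte; the routines above just make the discipline explicit.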
If you're going to be dealing primarily with 16-bit or 24-bit values, you can define multiple parallel byte arrays for this purpose. E.g., $0200-$02FF can contain the low bytes of the stack words, and $0300-$03FF the high bytes. This allows a single INX or DEX instruction to advance the X register to the next or previous value on the stack, regardless of the cell size.
Another advantage of splitting the stacks is that return addresses aren't intermingled with the data being passed around. Thus, a high-level routine can be written like this:
Code:
highLevelCode:
        jsr Dependent1
        jsr Dependent2
        jsr Dependent3
2$:
        jsr Dependent4  ; loop until Dependent4 returns non-zero
        beq 2$
        jmp Dependent5  ; tail-call: Dependent5's RTS returns for us
Although it's invoking subroutines, some of which might even be publicly exposed to application software, note that the input parameters never have to be re-mirrored onto the stack between calls: each routine finds its operands already on the data stack and leaves its results there for the next. This results in the fastest possible execution while still using stack frames.
Split stacks also reduce the impact of recursion, because only the return stack grows quickly; well-written recursive routines tend to consume the data stack at a slower rate than the return stack. This can't happen with a unified stack, because parameters must be replicated every time you re-enter a function, as parameters are always accessed at fixed offsets relative to the CPU's most recently pushed return address (ultimately).
Also, by splitting multi-byte quantities into multiple parallel 256-byte tables, you get more stack space: two tables hold 256 16-bit entries (512 bytes total), where a single unified page would hold only 128. That somewhat alleviates the restriction on recursion depth, though of course it doesn't eliminate it entirely. Sample code:
Code:
add16:                  ; ( a b -- a+b ) on the split data stack
        clc
        lda $0200,x     ; low byte of top of stack (b)
        adc $0201,x     ; low byte of next on stack (a)
        sta $0201,x
        lda $0300,x     ; high byte of b
        adc $0301,x     ; high byte of a
        sta $0301,x
        inx             ; drop one cell; the sum is the new top
        rts
zeq:                    ; ( n -- flag ) $FFFF if n is zero, else $0000
        lda $0200,x
        ora $0300,x
        beq 1$
        lda #$00        ; non-zero: replace n with false
        sta $0200,x
        sta $0300,x
        rts
1$:
        lda #$FF        ; zero: replace n with true
        sta $0200,x
        sta $0300,x
        rts
fetch16:                ; ( addr -- value ) fetch 16 bits from addr
        lda $0200,x     ; copy the address into zero page
        sta $00
        lda $0300,x
        sta $01
        ldy #$00
        lda ($00),y     ; low byte
        sta $0200,x
        iny
        lda ($00),y     ; high byte
        sta $0300,x
        rts
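To show how a caller uses fetch16, here's a hand-written push of a literal address followed by the fetch; the address $1234 is just an arbitrary example, and the push is done inline rather than through any helper:

Code:
; Push $1234 onto the split data stack by hand, then
; fetch the 16-bit word stored at that address.
        dex
        lda #$34
        sta $0200,x     ; low byte of the address
        lda #$12
        sta $0300,x     ; high byte of the address
        jsr fetch16     ; top of stack now holds the word at $1234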
The problem with the above approach is that it's primarily optimized for integer math, not memory accesses. If you're going to do an equal mix of the two, or work primarily with addresses, you may find it a major advantage to keep your data stack entirely in zero page. The trick is to use two addressing modes which are rarely considered in 6502 programming: zero-page indexed (LDA 0,X) and pre-indexed indirect (LDA (0,X)). With the stack in zero page, these become patently useful instead of mild curiosities. Perhaps an example will explain more than words:
Code:
add16:                  ; ( a b -- a+b ), data stack in zero page
        clc
        lda 2,x         ; low byte of a
        adc 0,x         ; low byte of b
        sta 2,x
        lda 3,x         ; high byte of a
        adc 1,x         ; high byte of b
        sta 3,x
        inx             ; drop one 16-bit cell (two bytes)
        inx
        rts
ufetch8:                ; ( addr -- byte ) fetch 8 bits; zero-extend
        lda (0,x)       ; NOTE: how easy it is to reference memory here!
        sta 0,x
        lda #$00
        sta 1,x
        rts
signext:                ; ( byte -- word ) sign-extend the low byte
        lda 0,x
        rol a           ; rotate bit 7 around into bit 0...
        rol a
        and #$01        ; ...leaving 0 or 1 in A
        tay             ; (TAY, not TAX: X is our stack pointer!)
        lda sxt,y
        sta 1,x
        rts
sxt:
        .db $00, $FF
lookup:                 ; Fetch a signed 8-bit item from a look-up table
        jsr add16       ; NOTE: We can still call other public API code,
        jsr ufetch8     ; without worrying about stack manipulation.
        jmp signext
Now, in your previous post, you claimed there would be a performance penalty for storing the frame pointer in zero page and referencing parameters with the indirect indexed addressing mode. I'd have to agree, to a degree. But remember that you only adjust the frame pointer on entry to a subroutine and on exit:
Code:
mySubroutine:
        sec             ; save the caller's frame pointer,
        lda BP          ; then subtract the frame size from it
        pha
        sbc #reservation
        sta BP
        lda BP+1
        pha
        sbc #0
        sta BP+1
        ...etc...
        pla             ; restore the caller's frame pointer
        sta BP+1
        pla
        sta BP
        rts
Within the region marked "...etc...," you would reference parameters and locals using the normal (BP),Y addressing mode. While this is slower, I think it is, for the most part, more flexible. The only disadvantage is that you must copy pointers from the stack frame back into zero page before you can dereference them. More than anything else, that, I think, is what will slow the system down.
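To make that concrete, here's a sketch of what the "...etc..." region might contain. The frame offset and the PTR name are hypothetical, assuming the first parameter is a 16-bit pointer stored at offset 2 in the frame:

Code:
; Hypothetical body for mySubroutine: read a pointer parameter
; from the frame, copy it into zero page (PTR), then dereference.
        ldy #2
        lda (BP),y      ; low byte of the pointer parameter
        sta PTR
        iny
        lda (BP),y      ; high byte
        sta PTR+1
        ldy #0
        lda (PTR),y     ; finally, fetch the byte it points to

The six-instruction copy before the actual dereference is exactly the overhead described above.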
A slightly less flexible, but still very capable, solution is to let the caller worry about where the function's parameters will live, and have it pass an explicit pointer to the function's parameter and result block. This has the nice advantage of letting simple software use simple memory allocation (perhaps even statically allocating the parameter blocks it will need), while software that is mutually recursive must be designed to manage its own stack (which your solution already requires it to do anyway).
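A sketch of that convention, with hypothetical names throughout (ARGP as a zero-page pointer pair, and an arbitrary block layout of two 16-bit inputs followed by a 16-bit result slot):

Code:
; Caller: point ARGP at a statically allocated parameter
; block, then call. someFunc reads its arguments through ARGP.
caller:
        lda #<myParams
        sta ARGP
        lda #>myParams
        sta ARGP+1
        jsr someFunc
        rts
myParams:
        .dw $1234       ; first argument
        .dw $0005       ; second argument
        .dw $0000       ; space for the result

someFunc:
        ldy #0
        lda (ARGP),y    ; first argument's low byte...
        ; ...compute, then write the result at offsets 4-5
        rts

A recursive someFunc would instead have to allocate its block on a stack of its own, which is the trade-off noted above.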
Personally, I'm entirely in favor of the split-stack technique. It is not mutually exclusive with the C way of thinking, and it can still result in decently fast code. Because return addresses are separated from actual data, it's easy to return multiple values on the data stack too (maybe even variable numbers of values, if that's advantageous enough). Certainly it results in more compact code, and it even permits better factoring, because you don't have to constantly push and pop duplicate arguments when calling sub-subroutines.
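For instance, a word-splitting routine (my own example, using the zero-page data stack layout from the earlier add16) can leave two results behind at once:

Code:
split16:                ; ( n -- lo hi ) leave both halves of n
        lda 1,x         ; grab the high byte of n
        dex             ; make room for a second result cell
        dex
        sta 0,x         ; ...which becomes the new top of stack
        lda #$00
        sta 1,x         ; zero-extend the high half...
        sta 3,x         ; ...and the low half underneath it
        rts

With a unified stack, returning two values like this would require shuffling them around the return address; here they simply stay where they were computed.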
But it really depends on what you're looking for.
--
Samuel A. Falvo II