6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Fri Nov 22, 2024 4:25 am

All times are UTC




Post new topic Reply to topic  [ 8 posts ] 
Author Message
 Post subject: Stack Frames?
PostPosted: Sun Oct 05, 2003 3:26 am 
Offline

Joined: Sun Oct 05, 2003 2:35 am
Posts: 3
Location: Texas
After an almost 20 year hiatus, I've become re-enamored with the 6502 mainly due to André LaMothe's XGameStation project. So, I'm playing around writing some generic algorithms (linked lists, hashes, regexes, etc.) and I'm curious how others handle passing data to and from subroutines. I should mention that I'm aware that I can pass parameters and return values via A, X, Y, pre-defined memory locations and even with flags for boolean results; however, I'm looking for generic approaches that allow one to pass an arbitrary number of parameters and maybe even return multiple values, but that's not as important. Single values are fine for now.

Right now, I push parameters onto the stack, then the subroutine accesses the values via something like:

Code:
some_subroutine
    tsx ; get current stack pointer
    inx ; skip over return address
    inx ; to point index to parameter
    inx
    lda $100,x


I have the calling routine clean up the stack after the subroutine returns only because it seems a bit cleaner than having the called routine do that (less stack gymnastics).

A subroutine returns values by adding them to the stack while preserving the subroutine's return address on top of the stack. The calling code pulls the return values and any originally passed parameters.

This seems to work OK, but I'm concerned that I could run out of stack space, especially when using recursion. The stack frame discussed above is simple, but I would like to add more the structure. That will end up exhausting the stack even quicker. I'm thinking that I should create my own 16-bit stack using zero-page post-index indirect addressing [lda ($xx),y], but it seems like this will incur a big speed penalty.

I'm curious what others have done. Am I trying to use a 6502 in a way I shouldn't ?

Thanks for any input/feedback,
Thelo[/code]


Top
 Profile  
Reply with quote  
 Post subject: Re: Stack Frames?
PostPosted: Sun Oct 05, 2003 5:13 am 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
thelonious wrote:
After an almost 20 year hiatus, I've become re-enamored with the 6502 mainly due to André LaMothe's XGameStation project. So, I'm playing around writing some generic algorithms (linked lists, hashes, regexes, etc.) and I'm curious how others handle passing data to and from subroutines. I should mention that I'm aware that I can pass parameters and return values via A, X, Y, pre-defined memory locations and even with flags for boolean results; however, I'm looking for generic approaches that allow one to pass an arbitrary number of parameters and maybe even return multiple values, but that's not as important. Single values are fine for now.


The method you posted would work fine, but does have the disadvantage of requiring you to be concious of stack usage. In fact, most index-based solutions are going to be restricted to 256 bytes of memory. Clearly, keeping things this small is quite advantageous for the 6502. Here are some other solutions you might be interested in researching.

You can split the data stack from the return stack. Forth obviously uses this technique with excellent results (in fact, it was pioneered by Forth), but this technique really isn't limited to Forth. It's a shame no other language seems to make use of it. Let's let the return stack lie in page $01, and define the data stack to exist in page $02. Then you can just use the X register as a data stack pointer of sorts. Accessing the data stack is then a matter of LDA $0200,X

If you're going to be dealing primarily with 16-bit values or 24-bit values, you can define multiple parallel byte-arrays for this purpose. Eg., $0200-$02FF can contain the low bytes of the stack words, and $0300-$03FF can contain the high-bytes. This allows a single INX or DEX instruction to advance the X register to point to the next or previous value in the stack.

Another advantage of splitting the stacks is that return information isn't inter-mingled with the data being passed around. Thus, a high-level routine can be written like this:

Code:
highLevelCode:
  jsr Dependent1
  jsr Dependent2
  jsr Dependent3
2$:
  jsr Dependent4
  beq 2$
  jmp Dependent5


Although it's invoking subroutines, some of which might even be publicly exposed to application software, note the lack of having to re-mirror the input parameters on the stack all the time. This results in the fastest possible execution while still using stack-frames.

Split-stacks also reduce the impact of recursion, because only the return stack will be populated quickly; well-written recursive routines will tend to use the data stack at a slower rate than the return stack. This can't happen in a unified stack strategy, because parameters must be replicated every time you re-enter a function, as parameters are always accessed as fixed offsets relative to the CPU's most recently pushed return address (ultimately).

Also, by splitting multi-byte quantities into multiple, parallel 256-byte tables, you can get more stack space. 256 stack entries is 512 bytes, for example, so that somewhat alleviates the restrictions for recursion. But, of course, it still doesn't eliminate it entirely. Sample code:

Code:
add16:
  clc
  lda $0200,x
  adc $0201,x
  sta $0201,x
  lda $0300,x
  adc $0301,x
  sta $0301,x
  inx
  rts

zeq:
  lda $0200,x
  ora $0300,x
  beq 1$
  lda #$00
  sta $0200,x
  sta $0300,x
  rts
1$:
  lda #$FF
  sta $0200,x
  sta $0300,x
  rts

fetch16:
  lda $0200,x
  sta $00
  lda $0300,x
  sta $01
  ldy #$00
  lda (0),y
  sta $0200,x
  iny
  lda (0),y
  sta $0300,x
  rts


The problem with the above approach is that it's primarily optimized for integer math, and not memory accesses. If you're going to do an equal mix of the two, or primarily be working with addresses, you might find keeping your data stack entirely in zero-page is of major advantage. The trick is to use two addressing modes which are rarely considered in 6502 programming: zero-page indexed and pre-indexed indirect addressing modes become patently useful, instead of mild curiosities. Perhaps an example will explain more than words:

Code:
add16:
  clc
  lda 2,x
  adc 0,x
  sta 2,x
  lda 3,x
  adc 1,x
  sta 3,x
  inx
  inx
  rts

ufetch8: ; Fetch 8-bit quantity; zero-extend it.
  lda (0,x)  ; NOTE: how easy it is to reference memory here!
  sta 0,x
  lda #$00
  sta 1,x
  rts

signext: ; sign-extend
  lda 0,x
  rol a
  rol a
  and #$01
  tax
  lda sxt,x
  sta 1,x
  rts

sxt:
  .db $00, $FF

lookup:  ; Fetch a signed 8-bit item from a look-up table
  jsr add16   ; NOTE: We can still call other public API code,
  jsr ufetch8     ; without worrying about stack manipulation.
  jmp signext


Now, in your previous post, you claim that there would be a performance penalty for storing the frame pointer in zero-page, and referencing parameters using indirect indexed addressing mode. I would have to agree to some degree. But remember that you only adjust the frame pointer on entry to a subroutine, and on exit.

Code:
mySubroutine:
  sec
  lda BP
  pha
  sbc #reservation
  sta BP
  lda BP+1
  pha
  sbc #0
  sta BP+1

  ...etc...

  pla
  sta BP+1
  pla
  sta BP
  rts


Within the region marked "...etc...," you would just reference it using normal (d),Y addressing mode. While this is slower, I think this is for the most part more flexible. The only disadvantage is that you need to move pointers from the stack frame back into zero-page to dereference them. More than anything else, that, I think, is what will slow the system down.

A slightly less flexible, but still very capable, solution is to let the caller worry about where the heck he's going to put the function's parameters, and let the caller pass in an explicit pointer to the function's parameter and result blocks. This has the nice advantage of letting simple software use simple memory allocation (maybe even statically allocating the parameter blocks that it'll need in the future), while software that is mutually recursive must be designed to manage its own stack (which your solution already requires it to do anyway).

Personally, I'm entirely in favor of the split-stack technique. It is not mutually exclusive with the C-way of thinking, and can still result in decently fast code. Because the return data is separated from actual data, it's easy to return multiple values on the data stack too (maybe even variable quantities of values, if it's advantageous enough). Certainly it results in more compact code, and even permits better factoring because you don't have to constantly push and pop duplicate arguments all the time when calling sub-subroutines.

But it really depends on what you're looking for.

--
Samuel A. Falvo II


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Sun Oct 05, 2003 5:54 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
thelonious,

Keep in mind the 65816 has a 16-bit stack pointer and stack-relative addressing modes too. For an example, you can have an instruction access the sixth thing in the stack without pulling items 1 through 5 off to get there. For stacks other than the one pointed to by S, remember you can have 16-bit X and Y too.

With that said, the 6502 does make it rather practical to have gobs of stacks if you wish; but using ZP for the Forth data stack works out very efficiently, using X as the pointer. ZP instructions are shorter and you have more addressing modes. You seldom have to save the contents of X and use X for something else. It's almost like it was intended for this purpose. I've been using Forth for 13 years on the 65c02, with programs up to 10,000 lines with interrupts serviced in high-level Forth while NMIs interrupted those ISRs for service in assembly, and I've never gotten anywhere near running out of space on either stack except with the rare error situation that wouldn't be satisfied with 256MB of stack space. In my experience, splitting page 0 and page 1 into three parts each for multitasking would be entirely reasonable. More could be done, with caution.

Garth

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
 Post subject: Re: Stack Frames?
PostPosted: Sun Oct 05, 2003 4:17 pm 
Offline

Joined: Sun Oct 05, 2003 2:35 am
Posts: 3
Location: Texas
First off, thank you for your thorough reply. Although I am not responding to all of your points, I did get a lot out of your post.

kc5tja wrote:
You can split the data stack from the return stack...
If you're going to be dealing primarily with 16-bit values or 24-bit values, you can define multiple parallel byte-arrays for this purpose...

Another advantage of splitting the stacks is that return information isn't inter-mingled with the data being passed around...

Ah, yes. This is much cleaner than what I was doing. I tend to forget about using parallel pages this way. This is another case where that approach is quite elegant.

kc5tja wrote:
The problem with the above approach is that it's primarily optimized for integer math, and not memory accesses. If you're going to do an equal mix of the two, or primarily be working with addresses, you might find keeping your data stack entirely in zero-page is of major advantage.

Indeed, I'm using a good mix of pointers and scalars. I'm hoping to use my routines as a general purpose library. This is probably an old hesitation of mine, but it seems as though I should leave ZP open for the applications that might use my routines. But, I do see the advantage of the approach you describe.

I'll have to think through how this would impact building other code on top of my library. For now, I'll stick to copying the pointer into ZP even though it does incur a bit of a performance hit.

kc5tja wrote:
A slightly less flexible, but still very capable, solution is to let the caller worry about where the heck he's going to put the function's parameters, and let the caller pass in an explicit pointer to the function's parameter and result blocks.

So, I guess the caller and callee would use some agreed upon ZP address to hold these pointers?

kc5tja wrote:
Personally, I'm entirely in favor of the split-stack technique.

I agree. It feels elegant, fast, and it will have only a small impact on the code I've already written.

Thanks again for your input. I now feel much better about my implementation.

thelo


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Sun Oct 05, 2003 4:24 pm 
Offline

Joined: Sun Oct 05, 2003 2:35 am
Posts: 3
Location: Texas
GARTHWILSON wrote:
I've been using Forth for 13 years on the 65c02, with programs up to 10,000 lines with interrupts serviced in high-level Forth while NMIs interrupted those ISRs for service in assembly, and I've never gotten anywhere near running out of space on either stack except with the rare error situation that wouldn't be satisfied with 256MB of stack space.

That's good to know. As a matter of fact, I can't remember ever running out of stack space either, and I'm sure the code I'll be running will be much more simple than what you describe. I guess 256 bytes (or even 512 bytes) feels so small these days that it seems easy to exhaust it without much effort, but this is a case where experience overrides emotion :)

Thanks for the input,
thelo


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Sun Oct 05, 2003 8:12 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
When you know you're accessing the stacks constantly but don't know what the maximum depth is you're using, the tendency is to go overboard and keep upping your estimation, "just to be sure". I did this for years too, and finally decided to do some tests to find out. I filled the stack areas with a constant value (maybe it was 00-- I don't remember), ran a heavyish application with all the interrupts going too, did compiling, assembling, and interpreting while running other things in the background on interrupts, and after awhile looked to see how much of the stack area had been written on. As I posted above, it wasn't really much-- less than 20% of each of the two pages.

Slight change of subject-- If you use the 65816, you won't want to split high byte and low byte into different pages. Besides being unnecessary because of the 16-bit S and optionally 16-bit X and Y, keeping the bytes together allows storing and fetching them together in a single instruction when they're in consecutive addresses. (A can be set to either 8 or 16 bits too.) Splitting the pages for the '02 however makes perfect sense for some purposes.

Garth

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Dec 12, 2007 7:27 am 
Offline

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
GARTHWILSON wrote:
When you know you're accessing the stacks constantly but don't know what the maximum depth is you're using, the tendency is to go overboard and keep upping your estimation, "just to be sure". I did this for years too, and finally decided to do some tests to find out.


My GeckOS multitasking operating system has a mode where the
single page stack is split among the threads. I found that most of the
time 48 bytes or even 32 bytes of stack are enough for (simple)
assembler applications.

I was surprised of this myself

André


Top
 Profile  
Reply with quote  
 Post subject: Re: Stack Frames?
PostPosted: Sat Apr 14, 2012 4:54 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
thelonious wrote:
[...] Right now, I push parameters onto the stack, then the subroutine accesses the values via something like:

Code:
some_subroutine
    tsx ; get current stack pointer
    inx ; skip over return address
    inx ; to point index to parameter
    inx
    lda $100,x

More efficient would be:
Code:
some_subroutine
    tsx ; get current stack pointer
    lda $103,x

then of course you can jump around without INX and DEX. Just adjust the number that's in the one hundreds.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 8 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 12 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: