A (WDC) 65816 calling convention
There have been a couple of discussions around stack frames and such and I really want to document them all in one place.
To this end the calling convention I've settled on using is the WDC C compiler's (previously the Zardoz C compiler).
In summary it assumes arguments that are going to be passed to a function are pushed onto the hardware stack before the function is called. On entry (via JSR / JSL) to the function being called, the Direct Page is set to the address of the Stack Pointer (+1). This allows those arguments to be accessed using Direct Page addressing rather than the limited Stack addressing modes.
This leaves out stuff like: what about local variables to the function? Is it safe to keep using the stack? How does returning work? And what happens if a function is called again?
And I'll get to all of that but for now the general idea is just that the Direct Page IS the Stack Frame for the function that has just been called.
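To make the "Direct Page IS the Stack Frame" idea concrete, here is a small Python model of it. This is only a sketch of the statement above, not the WDC compiler's actual prologue: the starting stack pointer, the argument values, the return address and the order in which the arguments are pushed are all assumptions made for illustration.

```python
# Minimal model: 64 KiB of bank-0 memory, a 16-bit stack pointer and a
# 16-bit direct page register. All names and values are illustrative.
mem = bytearray(0x10000)
sp = 0x01FF          # assumed stack pointer before the call
dp = 0x0000          # direct page register

def push16(value):
    """Push a 16-bit word: high byte first, then low byte."""
    global sp
    mem[sp] = (value >> 8) & 0xFF
    sp = (sp - 1) & 0xFFFF
    mem[sp] = value & 0xFF
    sp = (sp - 1) & 0xFFFF

def dp_read16(offset):
    """Read a 16-bit little-endian word at a direct page offset."""
    addr = (dp + offset) & 0xFFFF
    return mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)

# Caller: push two 16-bit arguments (assumed order: last argument first).
push16(0x2222)       # second argument
push16(0x1111)       # first argument
push16(0x8002)       # JSR pushes a 2-byte return address (made-up value)

# Callee: point the direct page at SP+1.
dp = (sp + 1) & 0xFFFF

return_addr = dp_read16(0)   # the return address sits at the bottom
first_arg = dp_read16(2)     # arguments follow it
second_arg = dp_read16(4)
```

With a JSL the return address is three bytes, so in this model the first argument would sit at offset 3 instead of 2.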
However the examples I'm going to use will make extensive use of the PEI instruction to push arguments before the function is called so a good grasp of both PEA and PEI is going to be helpful. If you already understand them you can skip over the rest of this post as I'll only start writing about calling conventions in the next post below.
PEA - opcode F4
PEA pushes 16 bits of immediate data onto the stack. It's described as pushing an address by WDC but the value can be used as anything you want. PEA affects the stack as in the example below:
[Image: PEA stack diagram]
PEI - opcode D4
PEI pushes 16 bits of data located at a Direct Page address onto the stack. It's very similar to PEA except that the data pushed does not immediately follow the op-code but is rather in Direct Page memory. PEI affects the stack as in the example below:
[Image: PEI stack diagram]
PEA and PEI always push 16-bit data regardless of what the Memory Width (M) is set to.
And with that out of the way we have a mechanism for pushing from the Direct Page (our function's stack frame) onto the stack (which will eventually become the called function's stack frame).
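The difference between the two instructions can be sketched in a few lines of Python. This is a simplified model rather than an emulator: it covers bank 0 only, and the register values and the data placed in direct page are assumptions chosen for the example.

```python
mem = bytearray(0x10000)                 # bank 0 only, for illustration
state = {"sp": 0x01FF, "dp": 0x0300}     # assumed register values

def push16(value):
    """Push a 16-bit word, high byte first, regardless of the M flag."""
    for byte in ((value >> 8) & 0xFF, value & 0xFF):
        mem[state["sp"]] = byte
        state["sp"] = (state["sp"] - 1) & 0xFFFF

def pea(operand):
    """PEA: push the 16-bit operand that follows the opcode."""
    push16(operand)

def pei(dp_offset):
    """PEI: push the 16-bit word stored at a direct page offset."""
    addr = (state["dp"] + dp_offset) & 0xFFFF
    word = mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)
    push16(word)

# A word $1234 stored little-endian at direct page offset $10.
mem[0x0310] = 0x34
mem[0x0311] = 0x12

pei(0x10)        # pushes $1234, fetched from direct page
pea(0xBEEF)      # pushes the immediate value $BEEF
```

After both pushes the stack pointer has moved down by four bytes, and each word reads back in the usual little-endian order from its lower address.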
Last edited by AndrewP on Sun May 12, 2024 7:16 am, edited 3 times in total.
Re: A (WDC) 65816 calling convention
[Still a place holder]
Last edited by AndrewP on Sun May 12, 2024 7:11 pm, edited 1 time in total.
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Re: A (WDC) 65816 calling convention
I’ll throw this into the hopper for possible later discussion.
What follows is a macro in Kowalski assembler syntax that generates a stack frame for use in passing parameters into a called function (subroutine). An arbitrary number of arguments is passed to the macro, these arguments being interpreted as 32-bit pointers to data. Along with each pointer argument, a corresponding mode argument is passed to tell the macro how to interpret the pointer argument. The bottom of the stack frame, i.e., at SP+1, will have a 16-bit number that is a count of the number of words in the rest of the stack frame.
Code: Select all
;—————————————————————————————————————————————————————————
; pushparm parm1 [,parm2 [,parmN]],mode1 [,mode2 [,modeN]]
;
; This macro generates a parameter stack frame. The frame
; consists of 1 or more 32-bit values, pushed in big-end-
; ian order (which places the LSW on the stack below the
; MSW). Parameters are pushed from right to left; PARM1
; will be the lowest on the stack. Below PARM1 will be a
; word value that indicates the number of words that
; were pushed, e.g.:
;
; pushparm addr1,addr2,addr3,'f','f','f'
;
; will produce the following stack frame:
;
; addr3 (MSW)
; addr3 (LSW)
; addr2 (MSW)
; addr2 (LSW)
; addr1 (MSW)
; addr1 (LSW)
; $0006 (words pushed)
;
; For each parameter in the invocation, a corresponding
; mode must be passed. Recognized modes are:
;
; 'd' The corresponding parameter is a direct-page
; address from which the value to be pushed will
; be fetched. The resulting instruction sequence
; will be:
;
; PEI <parm>+2
; PEI <parm>
;
; 'f' The corresponding parameter is a value that is
; to be processed as a “far” address. The resul-
; ting instruction sequence will be:
;
; PEA #<parm> >> 16
; PEA #<parm> & $FFFF
;
; 'n' The corresponding parameter is to be processed
; as a “near” address. The resulting instruction
; sequence will be:
;
; PHK
; PHK
; PER <parm> & $FFFF
;
; Note that although the execution bank is pushed
; twice, the extra push is ignored during indi-
; rect long addressing. Also note that if <parm>
; is specified as a 24- or 32-bit address, only the
; LSW will be processed, since PER is bound to the
; execution bank.
;
; 'r' The corresponding parameter is assumed to be in
; the index registers, .X = LSW & .Y = MSW. The
; index registers must be set to 16-bits —— there
; is no check for this. The corresponding parame-
; ter is ignored, but is required for syntax rea-
; sons. It is recommended that it be $00 to make
; it clear that the value being processed is being
; passed in the registers. Although of limited
; value, this mode may be used multiple times.
;
; Modes are case-insensitive; 'N' & 'n' are functionally
; identical.
;—————————————————————————————————————————————————————————
;
pushparm .macro ...
.tp =@0
.if .tp @ 2
.error ""+@0$+": parameter & mode counts mismatch"
.endif
.if .tp == 0
.error "syntax: "+@0$+" parm1 [,parm2 [,parmN]],mode1 [,mode2 [,modeN]]"
.endif
.np =.tp/2
.ix .set .np
.rept .np
.m .= @{.ix+.np} | %00100000
.if .m == 'd'
.if @.ix > $ff
.error ""+@0$+": 'd' mode requires direct page address parameter"
.endif
pei {@.ix}+2
pei @.ix
.else
.if .m == 'f'
pea #{@.ix} >> 16
pea #{@.ix} & $ffff
.else
.if .m == 'n'
phk
phk
per {@.ix} & $ffff
.else
.if .m == 'r'
phy
phx
.else
.error ""+@0$+": mode must be 'd', 'f', 'n' or 'r'"
.endif
.endif
.endif
.endif
.ix .= .ix-1
.endr
pea #.np*2
.endm
I use pushparm in much of my code to feed parameters into functions. As a fairly general rule, required parameters are 32-bit pointers to data, so this macro “automates” the process, cutting down on typing...and potential bugs. Use of 32-bit pointers simplifies the performance of pointer arithmetic, as it avoids constant REP and SEP instructions to manipulate register sizes.
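The frame that pushparm produces can be modelled in Python for the 'f' (far address) mode. This is a sketch of the layout described in the macro's header comment only; the function names and the example addresses are made up for illustration.

```python
def pushparm_words(pointers):
    """Return the 16-bit words in the order the macro pushes them."""
    pushed = []
    for ptr in reversed(pointers):             # parameters go right to left
        pushed.append((ptr >> 16) & 0xFFFF)    # MSW is pushed first
        pushed.append(ptr & 0xFFFF)            # then the LSW
    pushed.append(len(pointers) * 2)           # word count is pushed last
    return pushed

def frame_from_sp_plus_1(pointers):
    """Words as seen when reading upward in memory from SP+1."""
    return list(reversed(pushparm_words(pointers)))

# Hypothetical 24-bit addresses standing in for addr1, addr2 and addr3.
frame = frame_from_sp_plus_1([0x012000, 0x023000, 0x034000])
```

Reading upward from SP+1, the called function first sees the word count, then the LSW and MSW of each pointer, starting with the first parameter.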
x86? We ain't got no x86. We don't NEED no stinking x86!
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: A (WDC) 65816 calling convention
Your stack diagram for PEA seems to imply that the low byte of the operand is pushed before the high byte, which contradicts my 65xx instincts. I have a reputation for being correct more than 50% of the time. 
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!
Mike B. (about me) (learning how to github)
- BigDumbDinosaur
Re: A (WDC) 65816 calling convention
barrym95838 wrote:
Your stack diagram for PEA seems to imply that the low byte of the operand is pushed before the high byte, which contradicts my 65xx instincts. I have a reputation for being correct more than 50% of the time. 
I am having trouble reading Andrew’s stack diagrams due to the color, so I am not entirely sure what is in them. If they are implying that the LSB is being pushed first, then that would be incorrect. If, for example, SP (stack pointer) is currently loaded with $BFFF and the 65C816 executes PEA #$12AB, the sequence will be:
Code: Select all
#$12 --> $BFFF (SP)
#$AB --> $BFFE (SP-1)
After PEA #$12AB has finished, SP will be $BFFD. With the 816, all word pushes are big-endian: the high byte is pushed first, so it lands at the higher address. When the stack is accessed using the stack pointer-relative addressing mode, e.g., LDA <offset>,S, and the accumulator is “wide” (set to 16 bits), the access will be in the customary little-endian style.
I draw all memory maps top-down, logically placing higher addresses higher in the map. Hence in diagramming a stack “picture,” it is helpful to draw it top-down as well.
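The PEA #$12AB example above can be checked with a few lines of Python. This is a sketch that models only the two byte writes and a 16-bit stack-relative read; the helper name is invented for the example.

```python
mem = {}
sp = 0xBFFF                      # SP on entry, as in the example

for byte in (0x12, 0xAB):        # PEA #$12AB: high byte first
    mem[sp] = byte
    sp -= 1

def lda_stack_relative(offset):
    """Model LDA <offset>,S with a 16-bit accumulator."""
    addr = sp + offset
    return mem[addr] | (mem[addr + 1] << 8)

value = lda_stack_relative(1)    # reads $BFFE (low) and $BFFF (high)
```

Because the high byte went to the higher address, the little-endian read at SP+1 returns the pushed word unchanged.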
Unlike SP in the 65C02 (and other 8-bit 6502 members), the native-mode 65C816 SP is a true pointer—it holds an address, unlike the 65C02’s SP, which is merely an index into a fixed memory range (page $01). That the native-mode 816 SP is a pointer makes it easy to do all sorts of useful tricks with it, which are facilitated by various stack-oriented machine instructions. For example, the 816 makes it convenient to reserve ephemeral workspace on the stack, giving each function in your program a “scratchpad” for doing local stuff:
Code: Select all
;reserve 16 bytes of workspace on the 65C816 stack
;
; ———————————————————————————————————————————————————————————
; In this example, it is assumed that SP == $BFFF upon entry.
; Hex numbers in parentheses are the value of SP after the
; operation has been performed.
; ———————————————————————————————————————————————————————————
;
phd ;save current direct page (BFFD)
rep #%00100000 ;select wide accumulator
sec
tsc ;SP ——> accumulator
sbc !#16 ;reserve workspace
tcs ;accumulator ——> SP (BFED)
inc A ;accumulator ++
tcd ;accumulator ——> DP
; -----------------------------------------------------------
; At this point, direct page is on the stack, starting at
; $BFEE. An instruction such as LDA $00 will actually load
; from $BFEE, LDA $01 would be loaded from $BFEF, etc. As
; DP (direct page pointer) is not page-aligned, there will
; be a 1 cycle penalty per access. In practice, this penalty
; is a trivial price to pay for the convenience of being able
; to address stack content with direct-page instructions.
; Note that DP’s entry value may be fetched with LDA $10,
; which will be loaded from $BFFE.
; -----------------------------------------------------------
; when done, do stack housekeeping...
;
rep #%00100001 ;wide accumulator just in case, & clear carry
lda $10 ;DP’s entry value
tcd ;restore it
tsc ;SP ——> accumulator (BFED)
adc !#16+2 ;discard workspace & stack copy of DP
tcs ;accumulator --> SP (BFFF)
In my functions, I define local symbols that are used to describe the layout of the stack frame. That way, I can readily change the workspace size if needed by a future revision. Also, I habitually do not embed “magic numbers” in code. In an actual program, the 2 in the 16+2 expression is symbolically set in an INCLUDE file that defines 65C816 register sizes.
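The pointer arithmetic in the workspace example can be replayed in Python to confirm the values given in the comments. This is a sketch only; the constant and function names are invented, and the model tracks just SP and DP.

```python
WORKSPACE = 16      # bytes of scratch space reserved on the stack
DP_SIZE = 2         # size of the direct page register pushed by PHD

def reserve(sp):
    """Return (new SP, new DP) after PHD and the SBC/TCS/INC/TCD steps."""
    sp -= DP_SIZE               # PHD
    sp -= WORKSPACE             # TSC / SBC #16 / TCS
    return sp, sp + 1           # INC A / TCD

def release(sp):
    """Return SP after discarding the workspace and the saved DP."""
    return sp + WORKSPACE + DP_SIZE

sp, dp = reserve(0xBFFF)
saved_dp_addr = dp + WORKSPACE  # what LDA $10 addresses
restored_sp = release(sp)
```

The saved copy of DP sits immediately above the workspace, which is why its direct page offset equals the workspace size.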
Last edited by BigDumbDinosaur on Wed Jan 01, 2025 8:15 am, edited 1 time in total.
Re: A (WDC) 65816 calling convention
barrym95838 wrote:
Your stack diagram for PEA seems to imply that the low byte of the operand is pushed before the high byte, which contradicts my 65xx instincts. I have a reputation for being correct more than 50% of the time.
BigDumbDinosaur wrote:
What follows is a macro in Kowalski assembler syntax that generates a stack frame for use in passing parameters into a called function (subroutine). An arbitrary number of arguments is passed to the macro, these arguments being interpreted as 32-bit pointers to data. Along with each pointer argument, a corresponding mode argument is passed to tell the macro how to interpret the pointer argument. The bottom of the stack frame, i.e., at SP+1, will have a 16-bit number that is a count of the number of words in the rest of the stack frame.
BigDumbDinosaur wrote:
Unlike SP in the 65C02 (and other 8-bit 6502 members), the native-mode 65C816 SP is a true pointer—it holds an address, unlike the 65C02’s SP, which is merely an index into a fixed memory range (page $01). That the native-mode 816 SP is a pointer makes it easy to do all sorts of useful tricks with it, which are facilitated by various stack-oriented machine instructions. For example, the 816 makes it convenient to reserve ephemeral workspace on the stack, giving each function in your program a “scratchpad” for doing local stuff
* And ... BDD had already made exactly the same point in his last post. Doh!
- BigDumbDinosaur
Re: A (WDC) 65816 calling convention
AndrewP wrote:
BigDumbDinosaur wrote:
...the 816 makes it convenient to reserve ephemeral workspace on the stack...
Up until very recently, I was not familiar with how the WDC C compiler does it.
The method I developed was intended to fulfill several requirements: transparency, recursiveness, ease of accessing entry register values (or changing them before exit), and the desire to be able to index the stack-bound zero page from $00. Hence upon entry, a function will push the registers that will be used, in a defined order, and then tinker with the stack pointer to reserve needed workspace. Below is an example, culled from my SCSI function library. I edited out some of the commentary for brevity:
Code: Select all
;===============================================================================
;
;blkread: READ FROM SCSI BLOCK DEVICE
; -----------------------------------------------------------------------
; Invocation example: pea #buf >> 16 ;buffer pointer MSW
; pea #buf & $ffff ;buffer pointer LSW
; pea #nblk >> 16 ;block count pointer MSW
; pea #nblk & $ffff ;block count pointer LSW
; pea #lba >> 16 ;LBA pointer MSW
; pea #lba & $ffff ;LBA pointer LSW
; pea #scsi_id >> 16 ;device ID pointer MSW
; pea #scsi_id & $ffff ;device ID pointer LSW
; .IF .DEF(_SCSI_)
; JSL blkread
; .ELSE
; JSR blkread
; .ENDIF
; BCS ERROR
; -----------------------------------------------------------------------
;
blkread clc ;assume no error
php ;save machine state
rep #m_setr | sr_bdm ;16-bit registers & binary math
phy
phx
pha
phb
phd
;
;—————————————————————————————————————————————————————————
;
;LOCAL DEFINITIONS
;
.maxblk =128 ;max blocks per transaction +1
.sfbase .= 0 ;base stack index (assembly-time variable)
.sfidx .= .sfbase ;workspace index (assembly-time variable)
;
;—————————> workspace stack frame start <—————————
;
.cdbptr =.sfidx ;local CDB pointer (16 bits)
.sfidx .= .sfidx+s_ptr
.lbaflag =.sfidx ;$8080 = 32-bit LBA (16 bits)
.sfidx .= .sfidx+s_word
.nblks =.sfidx ;requested blocks (16 bits)
.sfidx .= .sfidx+s_word
;
;—————————> workspace stack frame end <—————————
;
.s_wsf =.sfidx-.sfbase ;workspace size
.sfbase .= .sfidx
;
;—————————> register stack frame start <—————————
;
.reg_dp =.sfidx ;DP
.sfidx .= .sfidx+s_mpudpx
.reg_db =.sfidx ;DB
.sfidx .= .sfidx+s_mpudbx
.reg_c =.sfidx ;.C
.sfidx .= .sfidx+s_word
.reg_x =.sfidx ;.X
.sfidx .= .sfidx+s_word
.reg_y =.sfidx ;.Y
.sfidx .= .sfidx+s_word
.reg_sr =.sfidx ;SR
.sfidx .= .sfidx+s_mpusrx
.reg_pc =.sfidx ;PC
.sfidx .= .sfidx+s_mpupcx
.if .def(_SCSI_) ;if using remote calls...
.reg_pb =.sfidx ;PB
.sfidx .= .sfidx+s_mpupbx
.endif
;
;—————————> register stack frame end <—————————
;
.s_rsf =.sfidx-.sfbase ;register frame size
.sfbase .= .sfidx
;
;—————————> parameter stack frame start <—————————
;
.idptr =.sfidx ;*SCSI_ID
.sfidx .= .sfidx+s_dptr
.lbaptr =.sfidx ;*LBA
.sfidx .= .sfidx+s_dptr
.nblkptr =.sfidx ;*NBLK
.sfidx .= .sfidx+s_dptr
.bufptr =.sfidx ;*BUF
.sfidx .= .sfidx+s_dptr
;
;—————————> parameter stack frame end <—————————
;
.s_psf =.sfidx-.sfbase ;parameter frame size
;—————————————————————————————————————————————————————————
;
pea #$00
plb
plb ;configure data bank...
;
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-
; As local fetches & stores are to the stack, DB
; is set to bank $00 to avoid long addressing.
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-
;
sec
tsc ;SP ——> .C
sbc !#.s_wsf ;allocate workspace...
tcs ;for local variables
inc ;point direct page to...
tcd ;workspace...
;
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—
; Note that the above sequence makes the temporary
; direct page zero-based for convenience. Also,
; fetches & stores from the stack copies of the
; entry registers are possible using DP addressing,
; e.g.:
;
; LDA #sr_car
; TSB .reg_sr
;
; to set carry in the returned status register.
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—
;
tsc ;SP ——> .C
sbc !#s_cdbg2 ;allocate space for...
tcs ;local CDB...
;
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—
; The above allocates sufficient space to accommodate
; a CDB defined for a short or long read operation.
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—
;
inc
sta .cdbptr ;local CDB pointer LSW
;
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-
; The pointer MSW is implied because the CDB
; is on the stack & hence is in bank $00.
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-
...etc...
Note in the stack definitions block that everything is symbolically defined. Since workspace starts at $00 in the local direct page, I can manipulate anything on the stack by using the corresponding symbol as an operand to an instruction, e.g., STA .reg_c to change the value returned in the accumulator, or by indexing direct page with .X.
Note that the space for the command descriptor block (CDB) that is submitted to the SCSI driver API is reserved after local workspace has been reserved and defined as direct page. As the CDB is in stack space, it is implicitly in bank $00 and can be accessed with (.cdbptr),Y addressing after setting DB to $00.
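The running-index technique used in the stack definitions block can be reproduced in Python to see which direct page offsets the symbols end up with. This is a sketch under assumptions: the size constants come from an INCLUDE file that is not shown, so 2 bytes for s_ptr and s_word and 4 bytes for s_dptr are guesses based on the comments, while the register sizes follow from the hardware. It models the JSR/RTS case, where no program bank byte is on the stack.

```python
def lay_out(fields, start):
    """Assign consecutive offsets; return (offsets, next free index)."""
    offsets = {}
    index = start
    for name, size in fields:
        offsets[name] = index
        index += size
    return offsets, index

# (symbol, size in bytes); sizes are assumptions, see the note above.
WORKSPACE = [("cdbptr", 2), ("lbaflag", 2), ("nblks", 2)]
REGISTERS = [("reg_dp", 2), ("reg_db", 1), ("reg_c", 2), ("reg_x", 2),
             ("reg_y", 2), ("reg_sr", 1), ("reg_pc", 2)]
PARAMETERS = [("idptr", 4), ("lbaptr", 4), ("nblkptr", 4), ("bufptr", 4)]

frame = {}      # symbol -> direct page offset
sizes = {}      # frame size symbol -> bytes
index = 0
for label, fields in (("s_wsf", WORKSPACE),
                      ("s_rsf", REGISTERS),
                      ("s_psf", PARAMETERS)):
    offsets, end = lay_out(fields, index)
    frame.update(offsets)
    sizes[label] = end - index
    index = end
```

Under these assumptions the register frame follows the 6 bytes of workspace, so an instruction such as STA .reg_c would store to direct page offset 9.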
Last edited by BigDumbDinosaur on Thu Aug 15, 2024 5:46 pm, edited 1 time in total.
Re: A (WDC) 65816 calling convention
BigDumbDinosaur wrote:
The method I developed was intended to fulfill several requirements: transparency, recursiveness, ease of accessing entry register values (or changing them before exit), and the desire to be able to index the stack-bound zero page from $00. Hence upon entry, a function will push the registers that will be used, in a defined order, and then tinker with the stack pointer to reserve needed workspace.
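Unfortunately the last month or more at the office have been quite unpleasant with last week (at last) seeming to have calmed down.
Oh boy have things not.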
This week has already gone properly off the rails with no chance of recovery any time soon.
So why have I quoted BDD above and why am I mentioning this? Well he's mentioned the gist of what I want to post and I'm going to be unlikely to find the time to finish my place holder post above anytime soon.
I had started writing up the post and if you're interested (and for comparison) the following two images kinda show how the WDC setup (preamble) and cleanup (postamble) are done.
BigDumbDinosaur wrote:
Code: Select all
pea #$00
plb
plb ;configure data bank...
- BigDumbDinosaur
Re: A (WDC) 65816 calling convention
AndrewP wrote:
Unfortunately the last month or more at the office have been quite unpleasant with last week (at last) seeming to have calmed down.
Oh boy have things not.
This week has already gone properly off the rails with no chance of recovery any time soon.
Whatever is going on at the office doesn’t sound good.
Quote:
So why have I quoted BDD above and why am I mentioning this? Well he's mentioned the gist of what I want to post and I'm going to be unlikely to find the time to finish my place holder post above anytime soon...
Unfortunately, I can’t read some of that...
Quote:
...the following two images kinda show how the WDC setup (preamble) and cleanup (postamble) are done.
For comparison purposes, the following is the postamble to my earlier example for the BLKREAD function:
Code: Select all
; ———————————
; COMMON EXIT
; ———————————
;
.done rep #m_setr | sr_car ;16-bit registers & clear carry
tsc ;SP ——> .C
adc !#.s_wsf+s_cdbg2 ;total size of local workspace...
;
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—
; Note that .S_WSF was earlier set in the stack frame definitions
; section. S_CDBG2 is defined in ~/include/scsi/atomic.asm & is
; the size of a group 2 command descriptor block.
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—
;
tcs ;.C ——> SP, expunges workspace
;
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—
; Next, we “rebalance” the stack. The procedure is to move the
; register frame up in the stack so the top of the frame is
; immediately below the return address, then reset SP so when the
; registers are pulled right before exit, SP+1 will be at the re-
; turn address.
;
; At this point in the code, .C still contains a copy of SP.
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—
;
adc !#.s_rsf ;point to top of register stack frame &...
tax ;set as copy source address...
;
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—
; The source is set to the top of the register frame because we
; will be copying in reverse during stack rebalance. .S_RSF is
; the register frame size that was set in the stack frame def-
; initions.
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—
;
adc !#.s_psf ;point to top of parameter stack frame &...
tay ;set as copy destination address...
;
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-
; .S_PSF is the parameter frame size that was set in the stack
; frame definitions.
; —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-
;
lda !#.s_rsf-1 ;bytes to be copied -1
mvp #0,#0 ;relocate register stack frame
tyx ;adjust SP to point to...
txs ;register frame -1
pld ;restore machine state
plb
pla
plx
ply
plp
.if .def(_SCSI_) ;if using remote calls...
rtl
.else
rts
.endif
Much of what goes on in the above is “automatic,” in that the adjustments that occur are based on how the stack frames are defined, which, of course, can vary from function to function.
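The stack rebalance in the postamble can be followed step by step with a small Python model. This is a sketch: the frame sizes and the starting SP are assumptions (the real sizes come from .s_wsf, .s_rsf, .s_psf and s_cdbg2), and MVP is modelled only as a descending byte copy within bank 0.

```python
WORKSPACE = 16    # .s_wsf + s_cdbg2 (assumed total)
REG_FRAME = 12    # .s_rsf for a JSR/RTS call (assumed)
PARM_FRAME = 16   # .s_psf: four 32-bit pointers (assumed)

def mvp(mem, x, y, c):
    """Model MVP: copy c+1 bytes from x to y, decrementing both."""
    while c >= 0:
        mem[y] = mem[x]
        x, y, c = x - 1, y - 1, c - 1
    return x, y

mem = bytearray(0x10000)
sp = 0xBF00                          # assumed SP while the function runs
reg_bottom = sp + 1 + WORKSPACE
for i in range(REG_FRAME):           # recognizable register frame bytes
    mem[reg_bottom + i] = 0xA0 + i
saved = bytes(mem[reg_bottom:reg_bottom + REG_FRAME])
caller_sp = sp + WORKSPACE + REG_FRAME + PARM_FRAME

c = sp + WORKSPACE                   # TSC / ADC: size of local workspace
sp = c                               # TCS: workspace expunged
c += REG_FRAME                       # top of register frame
x = c                                # TAX: copy source
c += PARM_FRAME                      # top of parameter frame
y = c                                # TAY: copy destination
x, y = mvp(mem, x, y, REG_FRAME - 1)
sp = y                               # TYX / TXS: register frame - 1

moved = bytes(mem[sp + 1:sp + 1 + REG_FRAME])
sp_after_return = sp + REG_FRAME     # after the pulls and the RTS
```

After the copy the register frame occupies the top of what used to be the parameter frame, so pulling the registers and returning leaves SP where it was before the caller pushed its parameters.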
Quote:
Quote:
Code: Select all
pea #$00
plb
plb ;configure data bank...
Another trick would be something like the following to set DP to an address known only at run time:
Code: Select all
PER local_dp
PLD
PEA, PEI and PER are good for all sorts of obscure purposes, e.g., reserving and clearing two bytes of stack space in one operation by using PEA #$00. The reserved space would be at SP+1.
Re: A (WDC) 65816 calling convention
I have been reading through all the 65816 forum posts to get a jumpstart on programming one.
Having a stack with stack relative addressing is a big upgrade for parameter passing. I plan to create a set of reusable functions in bank zero EEPROM. Nothing fancy, just basic character I/O, string manipulation, printing, and integer trig functions. But a few things concern me:
If you're developing reusable functions, you should require a JSL to enter and RTL to return. That way it can be called from any bank. It also allows your code to know the offset from the current SP upon entry. This does mean that code calling your functions from the same bank does a JSL where a JSR would do.
Return values from functions are another concern. On the 6502 I created a page zero data stack where I returned values. Basically, two stacks Forth style calling semantics. On the VAX and System 360 return values are put in R0 or through reference parameters. I suppose the 16-bit accumulator works as an R0 analog, and updating the input stack parameters would be the way to achieve this.
- White Flame
- Posts: 704
- Joined: 24 Jul 2012
Re: A (WDC) 65816 calling convention
You could store return values directly into the caller's CPU stack frame, past the return address. The top value/values of the caller can be considered temporary, depending on what it calls.
- BigDumbDinosaur
Re: A (WDC) 65816 calling convention
Martin_H wrote:
If you're developing reusable functions, you should require a JSL to enter and RTL to return. That way it can be called from any bank.
What exactly do you mean by “reusable functions?” Are you referring to sharable functions that are separately loaded into some arbitrary bank to be called by programs running in other banks?
Quote:
It also allows your code to know the offset from the current SP upon entry. This does mean that code calling your functions from the same bank does a JSL where a JSR would do.
That only becomes a concern if the function being called is independently loaded from any other program. In most of my libraries, I use a symbol to indicate when a function is being remotely called, which of course implies that the return must be via RTL. To date, I have not used that mechanism—library functions are conditionally INCLUDEd and assembled as part of the main program. I generally avoid use of cross-bank function calls.
Quote:
Return values from functions are another concern. On the 6502 I created a page zero data stack where I returned values. Basically, two stacks Forth style calling semantics. On the VAX and System 360 return values are put in R0 or through reference parameters. I suppose the 16-bit accumulator works as an R0 analog, and updating the input stack parameters would be the way to achieve this.
In my code, it depends on what is being called. A call to the BIOS API (using COP #<sig>) passes parameters and receives returns via the registers—the stack is not part of the call/return mechanism used by the calling program. This model is practical because all API calls access primitives that work with small pieces of data, mostly at the hardware level.
My library functions work with pointers passed through a stack frame. Data structures referenced by the pointers are directly accessed by the function, mostly using indirect-long addressing. Library functions do not return values to the caller through the stack—the stack is rebalanced prior to exit to get rid of the entry stack frame and any function-defined workspace.
Functions typically will indicate success or failure by clearing or setting carry and if failure, returning a status word (error code) in .C (where the meaning of “success” and “failure” depends on the called function). It is implied the user’s data structures will not be touched if a function exits with a failure.
White Flame wrote:
You could store return values directly into the caller's CPU stack frame, past the return address. The top value/values of the caller can be considered temporary, depending on what it calls.
My beef with doing so is it puts the onus on the caller to keep the stack in balance, which I see as disruptive to the overall program flow, as well as transparency. In my libraries, if a function returns multiple pieces of data, the user is expected to push pointers to where said data are to be deposited—the function cleans up the stack prior to exit.
Re: A (WDC) 65816 calling convention
BigDumbDinosaur wrote:
That only becomes a concern if the function being called is independently loaded from any other program. In most of my libraries, I use a symbol to indicate when a function is being remotely called, which of course implies that the return must be via RTL.
Quote:
My library functions work with pointers passed through a stack frame. Data structures referenced by the pointers are directly accessed by the function, mostly using indirect-long addressing. Library functions do not return values to the caller through the stack—the stack is rebalanced prior to exit to get rid of the entry stack frame and any function-defined workspace.
Re: A (WDC) 65816 calling convention
BigDumbDinosaur wrote:
Martin_H wrote:
If you're developing reusable functions, you should require a JSL to enter and RTL to return. That way it can be called from any bank.
What exactly do you mean by “reusable functions?” Are you referring to sharable functions that are separately loaded into some arbitrary bank to be called by programs running in other banks?
An example is below. I'm porting my console I/O functions from 6502 to 65816. On the 6502 I kept them in the EEPROM and when I loaded a program into RAM it could call those functions in the EEPROM. I also had them use a pointer to a function for stdin and stdout. That way I could change the console from the serial terminal to the PC keyboard and my onboard video.
Code: Select all
;
; Functions
;
; Routine to initialize console pointers and state.
; input - two pointers on the stack
; output - none
conIoInit:
.scope
pop _STDOUT
pop _STDIN
stz _echo
stz _readIdx
stz _writeIdx
rts
.endscope
; Enable or disable character echo during line editing mode.
; input - boolean in accumulator.
; output - none
conioSetEcho:
.scope
sta _echo
rts
.endscope
; cgets is similar to the MSDOS console I/O function that reads an entire
; line from _stdin. A line is terminated by a CR, and backspace deletes
; the previous character in the buffer.
; input - implicit from init function.
; output - implicit in that the line buffer is filled.
cgets:
.scope
phy
ldy _writeIdx
_while:
jsr _getch
sta _tib,y
lda _echo
beq :+
lda _tib,y
jsr putch
: lda _tib,y
cmp #AscBS
bne :+
_decIdx
bra _while
: _incIdx
cmp #AscCR
beq _end
cmp #$00
bne _while
_end:
sty _writeIdx
ply
rts
_getch:
jmp (_STDIN)
.endscope
; cputs is like the MSDOS console I/O function. It prints a null terminated
; string to the console using putch.
cputs:
.scope
_loop:
lda (TOS_LSB,x) ; get the string via address from zero page
beq _exit ; if it is a zero, we quit and leave
jsr putch ; if not, write one character
incTos ; get the next byte
bra _loop
_exit:
rts
.endscope
; gets a character from the terminal input buffer, or gets more
; characters if it is empty.
getch:
.scope
phy
ldy _readIdx
cpy _writeIdx
bne :+
jsr cgets ; buffer empty, get more characters.
: lda _tib,y
_incIdx
sty _readIdx ; store next read index.
ply
rts
.endscope
; puts a character in the accumulator back into the terminal input buffer.
ungetch:
.scope
phy
ldy _readIdx
_decIdx
sta _tib,y
sty _readIdx
ply
rts
.endscope
; puts a character in accumulator to stdout by doing an indirect jump to
; that handler. The handler will RTS to our caller.
putch:
.scope
jmp (_STDOUT)
.endscope
Re: A (WDC) 65816 calling convention
gfoot wrote:
That is interesting - you mean the called function removes things from the stack that were put there by the caller? I did that in my printf function and felt it had some benefits, but wasn't sure if it was a good approach or not.
Neil