I actually haven't found much 16-bit '816 source floating around, outside of small examples, and those aren't really representative.
I found this:
https://github.com/Olde-Skuul/spaceaceiigs
It's the source of Space Ace for the Apple IIGS, and it's pretty much pure 16-bit code. It uses the SP == DP method of parameter passing: push the parameters onto the stack, then set DP to the top of the stack so you can refer to everything by its offset.
This is a snippet:
Code:
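*
* Unpack an animation frame with bank crossing
*
UnpackAnimSlow
:RTSVal    =     1                ; return address pushed by JSR (2 bytes)
:DestPtr   =     3                ; long destination pointer (4 bytes)
:UnpackPtr =     7                ; long pointer to packed data (4 bytes)
:EndDirect =     11               ; first offset past the frame
           TSC                    ; A = SP at entry
           PHB                    ; save B and D
           PHD
           TCD                    ; D = entry SP, so the frame is direct page
           SEP   #$20             ; 8-bit accumulator
           PEI   :DestPtr+1       ; push destination high byte and bank
           PLB                    ; discard the high byte
           PLB                    ; B = destination bank
           LDX   :DestPtr         ; X = destination offset
           LDY   #0               ; Y = index into packed data
]A         LDA   [:UnpackPtr],Y   ; fetch control byte
           BNE   :NotPack
           INY                    ; 0: run, count then fill byte follow
           LDA   [:UnpackPtr],Y
           STA   :DestPtr+2       ; reuse frame byte as loop counter
           INY
           LDA   [:UnpackPtr],Y
           INY
]B         STA:  $0000,X
           INX
           DEC   :DestPtr+2
           BNE   ]B
           BRA   :Next
:NotPack   BMI   :NotTab
           REP   #$21             ; $01-$7F: skip; 16-bit A, carry clear
           AND   #$FF
           STA   :DestPtr+2
           TXA
           ADC   :DestPtr+2       ; X = X + skip count
           TAX
           SEP   #$20
           INY
           BRA   :Next
:NotTab    AND   #$7F             ; $80-$FF: low 7 bits = literal count
           STA   :DestPtr+2
           INY
]B         LDA   [:UnpackPtr],Y
           STA:  $0000,X
           INY
           INX
           DEC   :DestPtr+2
           BNE   ]B
:Next      CPX   #$9D00
           BLT   ]A               ; loop until X reaches $9D00
           REP   #$30             ; 16-bit A, X and Y
           PLD                    ; restore D and B
           PLB
           PLA                    ; pull the return address
           STA   8-1,S            ; move it above the 8 parameter bytes
           CLC
           TSC
           ADC   #8-2             ; drop the remaining parameter bytes
           TCS
           RTS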
I really like the style here. We see some local definitions that give the offsets into the stack frame, then some stack maintenance. What I don't quite grok is the correction stuff at the end to clear the stack.
Code:
           PLA
           STA   8-1,S
           CLC
           TSC
           ADC   #8-2
           TCS
           RTS
I'm pretty sure this is getting the return address properly placed, and then shrinking the stack frame. What I'm not sure about is where the "magic value" 8 comes from in this case, given that the frame offsets run out to 11. But what's nice here is that in the end, each routine effectively gets its own little piece of "zero page" to use, which should perform pretty well. You'll also notice that it doesn't save any of the work registers, just B and D. Everything else pretty much relies on the values in memory.
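For what it's worth, here is one way to read that arithmetic. Assuming the caller pushes the two 4-byte pointers (8 bytes of parameters) and then does a JSR, the 8 would be the parameter size, and 11 would just be the first offset past the return address plus the parameters (1 + 2 + 8). Below is a toy Python model of the stack, not the game's code. The function name `simulate_epilogue`, the starting stack address and the return address are all made up for illustration.

```python
# Toy model of the 65816 stack around the routine's epilogue.
# Assumption: the caller pushed two 4-byte pointers (8 bytes), then used JSR.

PARAM_BYTES = 8    # two long pointers: DestPtr and UnpackPtr
RETURN_BYTES = 2   # JSR pushes a 16-bit return address


def simulate_epilogue(sp_before_params=0x1FF0, return_addr=0x1234):
    mem = {}
    sp = sp_before_params

    def push(byte):
        nonlocal sp
        mem[sp] = byte & 0xFF
        sp -= 1

    def pull():
        nonlocal sp
        sp += 1
        return mem[sp]

    # Caller side: push 8 parameter bytes (dummy values), then JSR.
    for i in range(PARAM_BYTES):
        push(0xA0 + i)
    push(return_addr >> 8)       # JSR pushes the high byte first
    push(return_addr & 0xFF)

    # TSC ... TCD sets D to SP at entry, so relative to D:
    # +1..+2 is the return address, +3..+10 the parameters,
    # and +11 is the first byte past the frame (:EndDirect).
    end_direct = 1 + RETURN_BYTES + PARAM_BYTES

    # PHB (1 byte) and PHD (2 bytes) in the prologue are undone by PLD/PLB.
    sp -= 3
    sp += 3

    # PLA with a 16-bit accumulator: pull the return address.
    lo = pull()
    hi = pull()

    # STA 8-1,S: store 16 bits at SP+7 and SP+8 (top two parameter bytes).
    mem[sp + PARAM_BYTES - 1] = lo
    mem[sp + PARAM_BYTES] = hi

    # TSC / ADC #8-2 / TCS: skip the remaining six parameter bytes.
    sp += PARAM_BYTES - 2

    # RTS: pull the relocated return address.
    rts_lo = pull()
    rts_hi = pull()
    return end_direct, (rts_hi << 8) | rts_lo, sp
```

If that reading is right, the PLA / STA 8-1,S pair moves the return address into the top two parameter bytes, the ADC #8-2 steps over the other six, and RTS leaves SP exactly where it was before the caller pushed anything.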
I was hoping to find a disassembly of the IIGS ROMs somewhere to look at, but I haven't been able to find anything.