The other thing that I've not seen mentioned is the JMP ($xxxx,X) instruction. Load the function ID into X (ensuring bit 0 is clear, since each table entry is two bytes), then JML, BRK, or COP. That leaves A and Y free for passing the address of a parameter block; alternatively, pass parameters on the stack.
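A minimal sketch of what the receiving end of a COP might look like with that instruction. Everything here is an assumption for illustration: ca65-style directives, native mode, a handler and table that both live in bank 0 (JMP (abs,X) fetches its vector from the program bank), and the names cop_handler, sys_table, SYS_COUNT and the sys_* stubs, none of which come from any real kernel.

```asm
        .p816                   ; ca65-style directives (an assumption)
        .a16
        .i16

SYS_COUNT = 3                   ; number of entries in sys_table

cop_handler:
        rep #$30                ; known state: 16-bit A, X and Y
        cpx #SYS_COUNT*2        ; reject IDs past the end of the table
        bcs bad_call
        jmp (sys_table,x)       ; entries are 2 bytes, so X must be even

bad_call:
        lda #$ffff              ; hypothetical "no such function" code
        rti                     ; RTI restores the caller's P, PC and PBR

sys_table:                      ; must share a bank with the JMP above
        .word sys_find_dos      ; X = 0
        .word sys_open_lib      ; X = 2
        .word sys_close_lib     ; X = 4

sys_find_dos:
sys_open_lib:
sys_close_lib:
        rti                     ; stubs: a real service would do work first
```

The range check costs a few cycles per call but keeps a stray X from jumping through whatever happens to follow the table. It does not catch an odd X, which would fetch a misaligned vector.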
The reason to avoid a single COP and a single pool of function IDs, which BDD seems more comfortable with, is that monolithic kernels have **zero** performance advantage and a net usability **disadvantage** on the 65xx architecture. You absolutely want the ability to dynamically load and unload modules/libraries (exactly the same thing in 65xx land), and these modules can be (and almost certainly will be) loaded at a different address every time you load them.
So, if you use COP, you'll want one register to select which module you're addressing, and X to select a function within that module. A better solution is to use a well-known kernel function, similar to Amiga's OpenLibrary() call, which manages library sharing and lifecycle and returns the address of the library just opened (thus, if three processes all use the same library, they will all get the same address back). This pointer can be saved into your program's data area like so:
Code:
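; somehow get address of library in A (low 16 bits) and X (bank)
sta libA+1      ; 16-bit store: address low and high bytes
php
sep #$30        ; 8-bit A and X from here on
stx libA+3      ; 8-bit store: bank byte only, nothing past the JML
lda #OPCODE_JML
sta libA
plp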
For example, here's how I'd 65816-ize the start-up code found in Tripos-compatible applications, assuming someone were to port Tripos to the 65816:
Code:
_start:
; First, call the kernel to find our DOS entrypoint.
ldx #sc_FindDos
cop #0
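sta DosLib+1
sep #$10        ; 8-bit X, so only the bank byte is stored
stx DosLib+3    ; a 16-bit store here would clobber filename's first byte
rep #$10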
; Now, open a file, then close it again, just because we can.
pea #MODE_OLD
pea #>filename
pea #<filename
ldx #LVO_Open
jsl DosLib
; A already has the reference to the SCB if the file existed.
; For brevity, I omit error handling.
pha
ldx #LVO_Close
jsl DosLib
; Clean up the stack. If we used more than 4 slots,
; then TSC;ADC;TCS would be smaller.
plx
plx
plx
plx
; Exit back to the shell.
ldx #LVO_Exit
pea #0
jsl DosLib ; never returns.
.data
DosLib:
jml $000000
filename:
.byte "RAM:Foo/Bar",0
As you can see, you can now call the library directly, without incurring any kernel overhead, by loading your function selector in X and executing JSL libA. Easy peasy. It's probably the 2nd or 3rd fastest method of invoking services on a 65xx CPU (you might remember a line of reasoning I posted on these forums a decade or so ago about how to support OOP dispatch; it's exactly the same problem), and it's open-ended, limited only by RAM. You're not limited to just what the kernel offers in functionality. (While I enjoy the everything-is-a-file metaphor, I don't believe it should be followed religiously. There are legitimate cases, such as GUIs, where it demonstrably does not fit.)
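For completeness, here is a rough sketch of what the library end of that JSL might look like. It is only an illustration under stated assumptions: ca65-style directives, 16-bit registers on entry, hypothetical selector values for LVO_Open and LVO_Close, made-up names lib_entry, lvo_table, dos_open and dos_close, and a loader that either places the library at a fixed offset within its bank or relocates the 16-bit table entries.

```asm
        .p816                   ; ca65-style directives (an assumption)
        .a16
        .i16

LVO_Open  = 0                   ; hypothetical selector values; table
LVO_Close = 2                   ; entries are 2 bytes, so IDs step by 2

; The caller's patched JML lands here with the selector in X.
lib_entry:
        jmp (lvo_table,x)       ; vector is fetched from the library's bank

lvo_table:                      ; further entries follow the same pattern
        .word dos_open          ; X = LVO_Open
        .word dos_close         ; X = LVO_Close

dos_open:
        ; ... a real Open would read its three stacked words here ...
        lda #0                  ; placeholder result
        rtl

dos_close:
        lda 4,s                 ; last word pushed by the caller (the SCB);
                                ; 1,s to 3,s hold the JSL return address
        ; ... release the SCB here ...
        rtl                     ; returns straight to the caller's JSL
```

Since neither the JML nor the JMP (abs,X) pushes anything, the only return address on the stack is the one from the caller's JSL, so each function ends with a plain RTL and the kernel is never involved.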