A similar question was asked on Reddit. Unfortunately, I regard the most suitable and efficient call convention on 6502 as an open problem. It is not helped by the minimalism of 6502, which never had an official operating system hook. This has led to Commodore applications using JSR abs with three byte spacing, Acorn applications using JSR (abs) with two byte spacing within the same address range, and other systems using wildly different schemes. For example, BigDumbDinosaur now uses the 65816-only COP opcode.
Register use also differs wildly. Some use a Forth style data stack in page zero with RegX as the data stack pointer. Commander X16 is downwardly compatible with Commodore 64 but also uses Apple SWEET16 virtual registers - skewed by two bytes for Commodore bank switching compatibility. Some use 8 bit RegY:RegX to pass a 16 bit pointer to a data structure (and BigDumbDinosaur considered 16 bit RegY:RegX to pass a 32 bit pointer). Some use RegA to pass a function number. Some pre-scale RegA to pass even numbers only. Some pass an even function number in RegX to facilitate JMP (abs,X). Some use separate entry points to reduce register pressure. Some use the carry flag to indicate error, end of file or no input. Overall, it is a mess. However, some schemes must be objectively worse than others.
The RegY:RegX scheme typically requires an application to source a pointer from its section of page zero - which is not consistent across systems. The operating system then typically stores the pointer in its own section of page zero before jumping to the required function. With the function number in RegA, this would be STX zp+0 // STY zp+1 // ASL // TAX // JMP (abs,X) (which needs a 65C02 or later because NMOS 6502 has no JMP (abs,X)). If the pointer arrived in RegY:RegA and a pre-scaled function number arrived in RegX, this could be reduced to STA zp+0 // STY zp+1 // JMP (abs,X) but I'm not going to suggest that a RegY:RegA pointer call convention is an improvement.
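To make the copying visible, here is a minimal sketch of the RegY:RegX scheme from both sides. It is written in ca65-style syntax, and every name and address in it (APPPTR, KPTR, KENTRY, KTABLE, FN0, FN1) is made up for illustration rather than taken from any real system.

```asm
; Sketch only. Needs a 65C02 or later: JMP (abs,X) does not exist on NMOS 6502.
; All names and addresses are hypothetical.
APPPTR  = $20           ; pointer in the application's part of page zero
KPTR    = $F0           ; pointer slot in the kernel's part of page zero

; Application side: pointer in RegY:RegX, function number in RegA.
CALLER: LDX APPPTR      ; pointer low byte
        LDY APPPTR+1    ; pointer high byte
        LDA #1          ; unscaled function number
        JSR KENTRY
        RTS

; Kernel side: stash the pointer, scale the function number, dispatch.
KENTRY: STX KPTR
        STY KPTR+1
        ASL A           ; function number * 2 = byte offset into the table
        TAX
        JMP (KTABLE,X)  ; the target's RTS returns straight to the caller

KTABLE: .word FN0, FN1  ; one 16 bit address per function

FN0:    RTS             ; placeholder functions
FN1:    RTS
```

The pointer is loaded from one part of page zero only to be stored in another, which is the redundancy that a shared data stack tries to avoid.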
The RegX, page zero data stack seems like a good idea. It appears to eliminate redundant copying of pointers. It also raises the possible number of parameters quite significantly. However, it will break one or more things elsewhere. If a kernel uses this call convention everywhere and a subroutine (or kernel entry) is required within an interrupt then either RegX must be a valid data stack pointer at all times, alternative uses of RegX must be interrupt masked, or interrupts require their own stack. None of these are good restrictions. In particular, interrupt masking fixes IRQ at the expense of interrupt latency while leaving NMI broken.
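Of those three choices, giving interrupts their own stack is the easiest to show. The following is only a sketch under my own assumptions: a 65C02 (for PHX and PHY), a data stack which grows downward, and hypothetical names IRQSTK and HANDLER.

```asm
; Sketch only. All names and addresses are hypothetical.
IRQSTK  = $E0           ; top of a small page zero data stack reserved for IRQ

IRQ:    PHA             ; save the interrupted code's registers
        PHX             ; 65C02 only; NMOS 6502 needs TXA // PHA
        PHY
        LDX #IRQSTK     ; RegX may hold anything on entry, so switch stacks
        JSR HANDLER     ; may now call routines which expect RegX = data stack
        PLY
        PLX             ; restore whatever the interrupted code had in RegX
        PLA
        RTI

HANDLER:
        RTS             ; placeholder for the real interrupt work
```

Since NMI can interrupt the IRQ handler, NMI would need a further private region, and any kernel routine called from both contexts would still have to be re-entrant.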
A C++ interface may be desirable. In this case, methods may be vtable entries which point to application wrapper functions, system library functions or kernel calls. Kernel calls may or may not raise privileges via COP vector, NMI doorbell or similar. Furthermore, an arbitrary compiler call convention can be bridged to an arbitrary kernel call convention with assembly. Indeed, it is possible to bridge to multiple kernel call conventions within one portable binary. The efficiency of this arrangement is secondary.
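As a sketch of such a bridge, assume a compiler which passes a 16 bit pointer with the low byte in RegA and the high byte in RegX (I believe cc65's fastcall convention is similar, but treat that as an assumption) and a kernel which wants the pointer in RegY:RegX with a function number in RegA. The names KENTRY, FN_PUTS and _kputs are hypothetical.

```asm
; Sketch only. All names and addresses are hypothetical.
KENTRY  = $FF00         ; assumed kernel entry point
FN_PUTS = 4             ; assumed function number

_kputs: PHA             ; park pointer low byte on the call stack
        TXA
        TAY             ; RegY = pointer high byte
        PLA
        TAX             ; RegX = pointer low byte
        LDA #FN_PUTS    ; RegA = function number
        JMP KENTRY      ; tail call; the kernel's RTS returns to our caller
```

A second wrapper with the same label and a different body could target a different kernel, which is how one binary might carry several bridges.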
As a practical example, how would we implement the Arduino function digitalWrite(char pin, char val), where val is LOW or HIGH? On AVR, digitalWrite is a subroutine with its own address and the 8 bit values are assigned sequentially to 8 bit registers. On 6502, pin and val could be placed on stack, placed in an agreed part of page zero or placed in registers. I prefer the overhead of stack - partly because it opens the intriguing possibility of Arduino on Forth. This also works more gracefully when parameters exceed registers. Indeed, with only three bytes of register, this is fairly common. Actually, allocating pin to RegA and val to RegX would be extremely unhelpful because the three registers are not interchangeable. The 6502's address mode asymmetry means that data in memory can be sourced by more instructions than data which is stuck in a particular register. Parameters in page zero or on stack correct for this register asymmetry. State in registers is an optimization hack between friendly functions.
With a 16 bit aligned data stack, it may be preferable to mask val with LDA 2,X // AND #1 to determine if a bit should be cleared or set. After branching, load pin with LDY 0,X to set RegY with an index for mask operations. Unfortunately, all of this ignores return values. It may be faster to return an error flag, a value in RegA, an even value in RegX suitable for JMP (abs,X) or one or more values on data stack or call stack. What is the most efficient? I don't know.
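For concreteness, the data stack version of digitalWrite described above might look something like this. It is a sketch under several assumptions of mine: pin is the top cell at 0,X and val is the next cell at 2,X, the stack grows downward (so INX drops bytes), pin is already in the range 0 to 7, and PORT is a hypothetical 8 bit output register.

```asm
; Sketch only. All names and addresses are hypothetical.
PORT    = $6000         ; assumed output port, one bit per pin

DWRITE: LDA 2,X         ; val, low byte
        AND #1          ; LOW = 0, HIGH = 1
        BEQ CLEAR
        LDY 0,X         ; pin number indexes the mask table
        LDA PORT
        ORA MASK,Y      ; set the pin's bit
        JMP DONE
CLEAR:  LDY 0,X
        LDA MASK,Y
        EOR #$FF        ; invert the mask
        AND PORT        ; clear the pin's bit
DONE:   STA PORT
        INX             ; drop two 16 bit parameters
        INX
        INX
        INX
        RTS             ; no return value

MASK:   .byte $01,$02,$04,$08,$10,$20,$40,$80
```

The read-modify-write of PORT is not safe if an interrupt also writes to the same port, and nothing here answers the question of how a result or error should be returned.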