In the 6502 universe, the traditional API method has been to treat the kernel as a collection of subroutines. Commodore (CBM), in their eight-bit machines, formalized this concept into the "kernal" jump table, which was guaranteed to stay in the same location despite revisions to the kernel or the introduction of a new machine. For example, a programmer who wants to output a byte to the current output device can load the accumulator with the byte and execute JSR $FFD2, knowing it will be a valid way of outputting a byte on all CBM eight-bit machines.
In a system with no more than 64KB of address space and no hardware memory protection, treating the kernel as a collection of subroutines is practical and efficient, especially if the kernel is running in ROM. In a 65C816 native mode environment, the subroutine approach entails some new considerations. As the address space of a 65C816 machine can substantially exceed 64KB and as programs are effectively “sandboxed” into 64KB boundaries (banks), the use of a subroutine call to access the kernel API becomes potentially troublesome. No longer will a simple JSR <api_addr> suffice, as a JSR target is limited to the bank in which the JSR instruction is located. In practical terms, if a 65C816 kernel is running in, say, bank $00 and the program wishing to make an API call is located in, say, bank $0C, JSR is useless.
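To put numbers on the bank $00 versus bank $0C example, here is a small Python sketch of how the target of a JSR is formed. The function name and the choice of $FFD2 as the operand are mine, purely for illustration. The point is only that the bank byte always comes from the caller's PB, never from the instruction.

```python
def jsr_target(pb: int, operand: int) -> int:
    """24-bit address reached by JSR: the caller's program bank plus a 16-bit operand."""
    return ((pb & 0xFF) << 16) | (operand & 0xFFFF)


# Hypothetical case: the kernel entry point is at $00FFD2, the caller runs in bank $0C.
kernel_entry = 0x00FFD2
reached = jsr_target(0x0C, 0xFFD2)

print(f"JSR $FFD2 from bank $0C lands at ${reached:06X}")
print("reaches kernel:", reached == kernel_entry)
```

The call lands in bank $0C, at whatever happens to be at offset $FFD2 in the caller's own bank.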
Bill Mensch anticipated this limitation in the 65C816 and created the JSL (Jump to Subroutine Long) instruction, which uses a 24-bit address to reach its target. A companion instruction, RTL (Return from Subroutine Long), is used to return from a subroutine called with JSL. RTL is necessary because JSL pushes a 24-bit return address to the stack. RTS would only pull 16 of those 24 bits, leaving the bank byte on the stack and resuming execution in the wrong bank, which would cause a major malfunction.
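The JSL/RTL pairing can be modeled in a few lines of Python. This is only a sketch of the stack traffic, using a plain list as the stack. The addresses ($0C2000 for the caller, $00E000 for the subroutine) are made up for the example.

```python
def jsl(stack, pb, pc, target):
    """Model JSL: push PB, then the address of the instruction's last byte (PC+3)."""
    ret = (pc + 3) & 0xFFFF
    stack.extend([pb & 0xFF, ret >> 8, ret & 0xFF])
    return target >> 16, target & 0xFFFF  # new PB, new PC


def rtl(stack):
    """Model RTL: pull PC low, PC high, add one, then pull PB."""
    lo, hi = stack.pop(), stack.pop()
    pc = (((hi << 8) | lo) + 1) & 0xFFFF
    return stack.pop(), pc


def rts(stack, pb):
    """Model RTS: pull only 16 bits; PB stays whatever it is right now."""
    lo, hi = stack.pop(), stack.pop()
    return pb, (((hi << 8) | lo) + 1) & 0xFFFF


# Correct pairing: JSL from $0C2000 to $00E000, then RTL.
good = []
sub_pb, sub_pc = jsl(good, 0x0C, 0x2000, 0x00E000)
rtl_result = rtl(good)

# Mismatched pairing: same call, but the subroutine ends with RTS.
bad = []
sub_pb, sub_pc = jsl(bad, 0x0C, 0x2000, 0x00E000)
rts_result = rts(bad, sub_pb)

print("RTL returns to %02X:%04X, stack left: %r" % (*rtl_result, good))
print("RTS returns to %02X:%04X, stack left: %r" % (*rts_result, bad))
```

With RTL the caller resumes at $0C:2004 and the stack is balanced. With RTS, execution resumes at $00:2004 and the caller's bank byte is stranded on the stack.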
Another method of calling the API from a remote bank is to use a software interrupt, which is how just about all present-day operating systems give a user-space program access to the kernel. The 65C816 has two software interrupts: BRK and COP. In native mode, BRK is independently vectored from an IRQ. Both are two-byte instructions, the second byte commonly referred to as the “signature.” Both also behave in identical fashion in native mode. Upon executing BRK or COP, the ’816 pushes PB (program bank), PC (program counter) and SR (status register), sets the I-bit in SR, clears the D-bit in SR, loads PB with $00 and then jumps through the appropriate hardware vector, which is implicitly in bank $00. When the interrupt service routine (ISR) has been completed, an RTI will cause the ’816 to pull SR, PC and PB, and execution will resume at the address formed from PB:PC (see here for a full discussion of the 65C816’s interrupt-processing behavior).
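The native-mode sequence just described can be sketched as a toy Python model. The CPU class, the sparse memory dictionary and the handler address of $E000 are assumptions made for the example. The vector locations ($00FFE4 for COP, $00FFE6 for BRK) are the 65C816's native-mode vectors.

```python
from dataclasses import dataclass, field

NATIVE_VECTORS = {"COP": 0xFFE4, "BRK": 0xFFE6}  # native-mode vectors, bank $00
FLAG_I = 0x04  # IRQ disable bit in SR
FLAG_D = 0x08  # decimal mode bit in SR


@dataclass
class CPU:
    pb: int
    pc: int
    sr: int
    stack: list = field(default_factory=list)
    memory: dict = field(default_factory=dict)  # sparse bank-$00 memory, addr -> byte


def software_interrupt(cpu, kind):
    """Model BRK or COP in native mode."""
    ret = (cpu.pc + 2) & 0xFFFF               # skip the opcode and the signature byte
    cpu.stack.extend([cpu.pb, ret >> 8, ret & 0xFF, cpu.sr])
    cpu.sr = (cpu.sr | FLAG_I) & ~FLAG_D & 0xFF
    cpu.pb = 0x00                              # vectors and handler entry are in bank $00
    vec = NATIVE_VECTORS[kind]
    cpu.pc = cpu.memory[vec] | (cpu.memory[vec + 1] << 8)


def rti(cpu):
    """Model native-mode RTI: pull SR, PC low, PC high, PB."""
    cpu.sr = cpu.stack.pop()
    lo, hi = cpu.stack.pop(), cpu.stack.pop()
    cpu.pc = (hi << 8) | lo
    cpu.pb = cpu.stack.pop()


# Hypothetical setup: COP at $0C2000 with decimal mode on, handler at $00E000.
cpu = CPU(pb=0x0C, pc=0x2000, sr=FLAG_D, memory={0xFFE4: 0x00, 0xFFE5: 0xE0})
software_interrupt(cpu, "COP")
in_handler = (cpu.pb, cpu.pc, cpu.sr)
rti(cpu)
after_rti = (cpu.pb, cpu.pc, cpu.sr)
print("in handler: %02X:%04X SR=%02X" % in_handler)
print("after RTI:  %02X:%04X SR=%02X" % after_rti)
```

The caller's bank never appears in the instruction itself, which is what makes the call bank-agnostic. RTI puts back the stacked copy of SR, so a handler that wants to report a result in SR has to change that stacked copy before returning.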
A useful feature of any computer expected to support multiple processes is hardware protection. In general, the protection allows the system to run in either user or kernel mode. By way of explanation, once the kernel has been loaded into memory, access to that memory, as well as memory used by the kernel for data, buffers, etc., would be made off-limits to user-space programs. Also, the kernel itself would be run in write-protected memory. That way, an errant write instruction in a user program wouldn’t mangle the kernel or its data. Such arrangements would preclude the use of JSL to access a 65C816 kernel, since hardware protection would raise an exception when the user-space program attempted to enter kernel space via the API jump table. However, there is a solution: the vector pull (VPB) hardware output.
In the 65C816 running in native mode, VPB is driven low during cycles seven and eight of any interrupt sequence, which is when the MPU is fetching the interrupt vector address. It is conceivable that the glue logic could be arranged so the system switches from user mode to kernel mode when VPB goes low during an API call. The resulting relaxed protection would allow access to kernel data structures, as well as hardware registers. Upon exit from the kernel, a write to a register in the hardware management unit (an abstract device implemented in a CPLD or FPGA) would return the system to user mode.
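As a thought experiment, the mode-switching logic might behave something like the Python model below. Everything here is hypothetical: the class, the method names, the decision to protect only bank $00, and the rule that the exit register is honored only in kernel mode. It is a behavioral sketch of the idea, not a CPLD design.

```python
class ProtectionUnit:
    """Toy model of a hardware management unit that watches VPB."""

    KERNEL_BANKS = {0x00}  # assumption: kernel code and data live in bank $00

    def __init__(self):
        self.kernel_mode = False  # assumption: the system is already running user code

    def bus_cycle(self, vpb_low: bool):
        """Called once per clock; a low VPB means a vector is being fetched."""
        if vpb_low:
            self.kernel_mode = True

    def write_exit_register(self):
        """The kernel writes here just before RTI to drop back to user mode."""
        if self.kernel_mode:  # assumption: user-mode writes to this register are ignored
            self.kernel_mode = False

    def write_allowed(self, address: int) -> bool:
        """User mode may not write to kernel banks; kernel mode may write anywhere."""
        bank = (address >> 16) & 0xFF
        return self.kernel_mode or bank not in self.KERNEL_BANKS


hmu = ProtectionUnit()
print("user write to $00D000:", hmu.write_allowed(0x00D000))
hmu.bus_cycle(vpb_low=True)            # COP executed, vector fetch in progress
print("kernel write to $00D000:", hmu.write_allowed(0x00D000))
hmu.write_exit_register()              # kernel is about to RTI
print("user write to $00D000:", hmu.write_allowed(0x00D000))
```

Since VPB goes low on every interrupt, hardware IRQs and NMIs would also flip the system into kernel mode in this model, which seems like the desired behavior since those handlers are kernel code too.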
In my view, there are clear advantages to using a software interrupt for API access instead of a subroutine call. The handling of hardware protection is one of them. The bank-agnostic nature of this calling method is another. The automatic handling of SR, important in any API in which SR is used to return “success” or “fail” to the caller, is yet another. Furthermore, user programs need have no knowledge of a kernel API jump table, which means if the kernel is relocated in memory, the API call mechanism will continue to work without change. This would not be the case if JSL were the calling method. Last but not necessarily least, a software interrupt is a two-byte instruction; JSL is four bytes.
Of the two software interrupts that are available, it’s my opinion COP should be used for calling the kernel API. BRK is the “traditional” method by which a program is halted for debugging purposes, a “tradition” that I believe should be maintained. As BRK and COP are separately vectored, maintaining this usage is not a problem.
Okay, I’ve started the topic. Let the arguing...er...discussion begin!