Up to the present, all versions of the POC’s basic input/output system (BIOS) have used the typical 6502 method of treating each API function as a subroutine to be called via JSR - RTS. The 65C816’s architecture limits the scope of JSR - RTS to the bank in which the subroutine is located. Prior to POC V1.3, only bank $00 existed, so the scope issue was a non-issue.
Starting with V1.3, it is possible to execute code outside of bank $00. However, such code can’t call BIOS API functions, because the BIOS continues to be in bank $00 and is therefore out of reach of JSR - RTS. Ergo, I need to change how API calls are made. There are two possible methods:
- Subroutine call with JSL - RTL. JSL is similar to JSR except the former takes a 24-bit address as its operand and pushes a 24-bit return address (minus one) to the stack. RTL’s behavior is similar to that of RTS, except the former pulls a 24-bit address to return to the caller. Succinctly, JSL - RTL works exactly like JSR - RTS, except for the size of the addresses being processed.
- Context change with COP - RTI. COP is one of the 65C816’s two software interrupts, the other, of course, being BRK. COP’s original purpose was to provide a software hook for temporarily giving a co-processor control of the system. As far as I know, there is no co-processor that is bus-compatible with the 65C816, which means COP is available for other purposes. As with any other interrupt, the COP handler returns control with an RTI.
After cogitating on it for a while, I decided to go with method two. While it would appear that COP would be a little slower in execution than JSL, the 816 takes eight clock cycles to process either instruction when running in native mode. RTI, however, requires seven clock cycles, versus six for RTL, so there is a slight performance difference on the return. Also, the COP front end has to figure out which function to run, which entails more computation than is required to call an API function with JSL. So overall, COP - RTI will be somewhat slower, although the performance difference can be partially effaced by running at high clock speeds.
A notable feature of using COP - RTI is that applications don’t need to know the addresses of API functions in order to use them. Instead, an API index, which is an integer, tells the COP front end which function is to be run. If a new function is added to the API in the future, existing applications don’t need to be reassembled to work with the new BIOS, as long as the new API index is higher than all existing ones. This feature helps with maintaining backward compatibility.
Functions can be made as transparent as is needed, since the COP handler’s front end will preserve machine state and the handler’s back end will restore it. Instead of each function preserving and restoring the registers, the COP handler will take care of that. If a function needs to return a value in one of the registers it can do so by rewriting the register value on the stack.
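As a rough sketch of rewriting a stacked register, an API function might hand back a 16-bit result in .A and report success through carry as follows. It uses the offsets cop_arx and cop_srx from the COP register stack frame in the front end listing later in this post. Two things are assumptions: RESULT is a hypothetical location holding the value to return, and "carry set means error" is only inferred from the BCS that follows the COP in the SCSI example. The offsets are valid only if nothing has been pushed since the front end ran, e.g., the function was reached with JMP rather than JSR.

```asm
         rep #%00100000        ;16-bit accumulator
         lda result            ;value to return (hypothetical location)
         sta cop_arx,S         ;overwrite stacked .A
         sep #%00100000        ;8-bit accumulator
         lda cop_srx,S         ;get stacked status register
         and #%11111110        ;clear carry: no error (assumed convention)
         sta cop_srx,S         ;RTI will restore this SR
```

When the back end pulls the registers and executes RTI, the caller sees the new .A and carry, with everything else as it was before the COP.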
There are two ways to pass the API index to the COP handler. One is to load a register with the index, and the other is to pass the index in the signature byte associated with COP. Here again I weighed the pros and cons. Using a register makes for a simpler COP handler front end, especially if it is the accumulator that is used. The index merely has to be doubled and copied to the X-register. The front end would then execute JMP (COPTAB,X) or JSR (COPTAB,X) to select the appropriate function, where COPTAB is a 16-bit vector table. This would be the least processing-intensive method, but will mean the accumulator (or other register) will not be available for passing parameters into the called function.
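A minimal sketch of the register-based front end just described, assuming the API index arrives in the low byte of the accumulator and that COPTAB resides in the same bank as the handler. State preservation and range checking are left out, so this illustrates the dispatch step only and is not working BIOS code.

```asm
         rep #%00110000        ;16-bit accumulator & index registers
         and !#%11111111       ;discard whatever is in the MSB
         asl                   ;double API index to get table offset
         tax                   ;use as index
         jmp (coptab,x)        ;run selected API function
```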
Passing the API index in the COP signature is more complicated. In order to fetch the signature, the return address and bank that were pushed when COP was executed would have to be retrieved, pointing the way to the signature. Those values could be used to populate a direct page pointer, which then can be used in an LDA [<ptr>] instruction to load the signature. As the return address on the stack is one byte past the signature, it must be decremented before being set up in the direct page pointer.
Alternatively, the bank can be loaded into DB (data bank register) and, again, the return address decremented to point at the signature. That address would then be copied into the (16-bit) X-register so an absolute load can be performed to retrieve the signature. The code for this procedure is a bit convoluted:
Code:
;===============================================================================
;
;icop: COPROCESSOR INTERRUPT SERVICE ROUTINE
;
icop     rep #%00110000        ;16-bit registers
         phb                   ;preserve machine state
         phd
         pha
         phx
         phy
;
;-------------------------------
;COP REGISTER STACK FRAME
;
cop_yrx  =1                    ;.Y (word)
cop_xrx  =cop_yrx+s_word       ;.X (word)
cop_arx  =cop_xrx+s_word       ;.A (word)
cop_dpx  =cop_arx+s_word       ;DP (word)
cop_dbx  =cop_dpx+s_mpudpx     ;DB (byte)
cop_srx  =cop_dbx+s_mpudbx     ;SR (byte)
cop_pcx  =cop_srx+s_mpusrx     ;PC (word)
cop_pbx  =cop_pcx+s_mpupcx     ;PB (byte)
;-------------------------------
;
         lda !#kerneldp
         tcd                   ;set kernel's direct page
         cli                   ;resume IRQ processing
         lda cop_pcx,S         ;get return address from stack
         dec A                 ;point at signature
         tax                   ;use as index
         sep #%00100000        ;8-bit accumulator
         lda cop_pbx,S         ;get caller's bank from stack
         pha                   ;make it a...
         plb                   ;temporary data bank
         lda \2$0,x            ;fetch signature...
;
;   -----------------------------------------------------------------
;   The above instruction must be assembled with absolute addressing.
;   Otherwise, it will attempt to fetch the COP signature from bank
;   $00, ignoring the bank loaded into DB.  The \2 operator forces
;   the assembler to generate LDA $0000,X.
;   -----------------------------------------------------------------
;
         phk                   ;kernel's program bank
         plb                   ;now kernel's data bank
         rep #%00100000        ;16-bit accumulator
         and !#%11111111       ;squelch noise in MSB
         beq icoperr           ;API index is 0, error
;
         dec                   ;zero-align API index
         cmp !#maxapi          ;number of defined API functions
         bcs icoperr           ;API number out of range
;
         asl                   ;convert API index to...
         tax                   ;jump table offset
         jmp (apifntab,x)      ;run API function
...etc...

The above is in Kowalski assembler syntax; the !# operator means to load a 16-bit immediate value.
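For comparison, the direct page pointer alternative mentioned earlier might look something like the following. SIGPTR is a hypothetical three-byte location in the kernel's direct page. The sketch assumes it takes the place of the signature fetch in the listing above, i.e., it runs with 16-bit registers and with DP already pointing at the kernel's direct page.

```asm
         lda cop_pcx,S         ;get return address from stack
         dec A                 ;point at signature
         sta sigptr            ;pointer bits 0-15
         sep #%00100000        ;8-bit accumulator
         lda cop_pbx,S         ;get caller's bank from stack
         sta sigptr+2          ;pointer bits 16-23
         lda [sigptr]          ;fetch signature via 24-bit pointer
         rep #%00100000        ;16-bit accumulator
         and !#%11111111       ;squelch noise in MSB
```

Since indirect long addressing takes the bank from the pointer itself, DB is never touched in this variant. The price is three bytes of direct page.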
The gain in passing the API index in COP’s signature is that all registers are unencumbered and thus may be used for parameter passing. I decided that unencumbered registers are of more value than slightly faster front-end processing. So the procedure will be to pass the index in the signature. Also, doing so lends itself well to API calls being wrapped up in macros.
Speaking of parameter passing, that can be done via the registers, via the stack, or via both. Returning results to the caller can be done through the registers or by rewriting the data pointed to in a stack frame. In the present BIOS, all APIs receive parameters and return results via the registers. In most cases, this aspect of the BIOS could remain unchanged. However, any API call in which one or more parameters is a pointer to data would have to be reworked to recognize 24-bit addresses, since the data could be anywhere in the MPU’s 16 MB address space. That gives rise to an interesting design consideration.
Experience in writing code for the 65C816 has demonstrated that 24-bit values are awkward to process, especially when carrying out pointer arithmetic. In developing my new-and-improved 65C816 string library, I elected to use 32-bit pointers so pointer manipulation would be more efficient. By treating a pointer as a 32-bit entity, arithmetic and other operations don’t entail constantly using REP and SEP to change register sizes according to which piece of the pointer is being handled. Despite a 16-bit fetch or store using an extra clock cycle over an eight-bit operation, the overall speed at which pointer manipulation can be carried out is better, as executing REP and SEP isn’t part of the procedure. The only real penalty with using 32-bit pointers is slightly greater direct page consumption.
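To illustrate the difference, here is a rough side-by-side of adding a 16-bit constant to a pointer. PTR is a hypothetical direct page location and OFFSET a hypothetical constant; neither comes from the string library. The first fragment treats PTR as 24 bits, the second as 32 bits.

```asm
;24-bit pointer: accumulator width changes partway through
;
         rep #%00100000        ;16-bit accumulator
         clc
         lda ptr               ;bits 0-15
         adc !#offset
         sta ptr
         sep #%00100000        ;8-bit accumulator (carry is unaffected)
         lda ptr+2             ;bits 16-23
         adc #0                ;account for carry
         sta ptr+2
         rep #%00100000        ;back to 16-bit accumulator
;
;32-bit pointer: accumulator stays at 16 bits throughout
;
         rep #%00100000        ;16-bit accumulator
         clc
         lda ptr               ;bits 0-15
         adc !#offset
         sta ptr
         lda ptr+2             ;bits 16-31
         adc !#0               ;account for carry
         sta ptr+2
```

The 32-bit version drops the SEP and the second REP. Bits 24-31 are simply ignored when the pointer is used with indirect long addressing.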
The wrinkle with using 32-bit pointers is in passing them to a function. It’s possible to pass one such pointer using the index registers, viz:
Code:
         rep #%00010000        ;16-bit index registers
         ldx !#addr & $ffff    ;address LSW
         ldy !#addr >> 16      ;address MSW
...etc...

Again, the above is in Kowalski assembler syntax. The >> 16 operation is a 16-bit right-shift of the ADDR operand, which results in .Y being loaded with bits 16-31 of ADDR.
In the present BIOS, all but one of the functions that take pointers to data as input require only one such pointer, which makes the above method practical for those functions. The exception is the SCSI command processing primitive, which needs two pointers, one pointing to the command descriptor block (CDB) and the other pointing to a buffer, e.g., a disk block buffer. There are three possible ways to handle this.
- Create a separate API call to set the buffer address, if needed. Following that operation, the SCSI command processor API can be called by loading the address of the CDB in .X and .Y, as above, and the target SCSI device ID in the accumulator. Some SCSI commands don’t involve use of a buffer, so the first step could be skipped in those cases. No stack processing would be required to get the parameters.
The downside is two API calls would be required to process read and write operations on the target SCSI device. As such operations are the ones most used, an execution speed penalty would result.
- Use a data structure containing the required parameters. The structure would contain the CDB pointer, buffer pointer and a pointer to the target device ID. The SCSI command processor code would get what it needs from the structure. As with the first method, no stack processing would be required to get parameters.
The downside is the caller has to do more work to set up the API call.
- Use a stack frame to pass parameters. Such a call would be something like the following:
Code:
         pea #buf >> 16        ;buffer address MSW
         pea #buf & $ffff      ;buffer address LSW
         pea #cdb >> 16        ;command descriptor block address MSW
         pea #cdb & $ffff      ;command descriptor block address LSW
         pea #scsiid           ;target SCSI device bus ID
         cop #kscsicmd         ;call SCSI command processor
         bcs error
...etc...

The above takes care of the pointer requirements, as well as identifying the SCSI device that is the target of the operation. Only one API call is needed, which eliminates the overhead of a second call to set the buffer address.
The downside is processing will be more complicated due to stack addressing, and the SCSI command processor code would have to perform stack housekeeping prior to returning to the caller.
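As a sketch of what the stack addressing might involve, the stacked parameters could be given offsets that continue the COP register stack frame from the front end listing. All of the names introduced below (scsi_idx, scsi_cdbx, scsi_bufx, and the direct page locations CDBPTR and BUFPTR) are made up for illustration. The offsets hold only if nothing has been pushed since the front end saved the registers. The stack housekeeping needed before RTI is not shown.

```asm
scsi_idx  =cop_pbx+1           ;target SCSI device ID (word)
scsi_cdbx =scsi_idx+s_word     ;CDB pointer (32 bits)
scsi_bufx =scsi_cdbx+4         ;buffer pointer (32 bits)
;
         rep #%00100000        ;16-bit accumulator
         lda scsi_cdbx,S       ;CDB address LSW
         sta cdbptr
         lda scsi_cdbx+s_word,S ;CDB address MSW
         sta cdbptr+s_word
         lda scsi_bufx,S       ;buffer address LSW
         sta bufptr
         lda scsi_bufx+s_word,S ;buffer address MSW
         sta bufptr+s_word
         lda scsi_idx,S        ;target SCSI device ID
```

Because the MSW of each pointer was pushed before its LSW, each pointer sits on the stack in the usual little-endian order and can be copied a word at a time.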
I have not yet decided which of the above methods I will use—each has its own complications.
One last thing. As COP is an interrupt, it has a useful hardware effect. During the seventh and eighth clock cycles of processing COP, the MPU’s VPB (vector pull) output will go low. In a system that has privilege levels, the glue logic can react to VPB by relaxing restrictions, mimicking the user/supervisor modes of the Motorola 68K.
_________________ x86? We ain't got no x86. We don't NEED no stinking x86!
Last edited by BigDumbDinosaur on Mon Nov 06, 2023 9:25 am, edited 2 times in total.