cr1901 wrote:
BigDumbDinosaur wrote:
This is where pushing DB to the stack and temporarily loading it with bank $00 (or wherever the I/O hardware is located) helps out. In any 65C816 system with more than 64K, you are going to have to either use long addressing or be prepared to tinker with DB. It's unavoidable.
I probably should've been able to figure out switching DB myself. In any case, I can see uses for both long addressing and swapping DB. The former probably makes the most sense for one-shot reads and writes, such as reading a status register or writing a single value. The latter probably makes more sense for sending blocks of data, e.g., with the ACIA: switch to bank $00, then use one of the complex addressing modes to access the data block wherever it is (bank 0, 1, 2, etc.).
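As a rough sketch of the two approaches (the ACIA registers at $00C010/$00C011 and the direct page pointer BUFPTR are made-up examples, not real hardware addresses):
Code:
;one-shot: long addressing, DB untouched
        lda $00c010        ;read ACIA status register in bank $00
;
;block transfer: swap DB to bank $00 for the duration
        ldy #0
        ldx count          ;byte count, fetched before DB is changed
        phb                ;save caller's DB
        lda #$00
        pha
        plb                ;DB = $00, absolute addresses now reach I/O
loop    lda [bufptr],y     ;long pointer reaches the data block in any bank
        sta $c011          ;ACIA data register, resolved via DB = $00
        iny
        dex
        bne loop
        plb                ;restore caller's DB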
In POC V2, which will have 512KB of RAM, I will have ROM and the I/O hardware visible in bank $00 only. The firmware I wrote for POC V1 uses mostly interrupt-driven I/O and will be largely transferred intact to V2. Hence the fact that I/O is in bank $00 is of little consequence, since any interrupt causes execution to revert to bank $00. However, DB is not changed by an interrupt, which is both a help and a hindrance.
By way of reference, POC V1's IRQ ISR starts thusly:
Code:
;================================================================================
;
;iirq: HARDWARE INTERRUPT REQUEST SERVICE ROUTINE
;
iirq phb ;save DB
phd ;save DP
longr ;16 bit registers
pha
phx
phy
;
;———————————————————————————————
; Stack Frame Definition
;
irq_yrx .= 1 ;.Y
irq_xrx .= irq_yrx+s_word ;.X
irq_arx .= irq_xrx+s_word ;.C
irq_dpx .= irq_arx+s_word ;DP
irq_dbx .= irq_dpx+s_mpudpx ;DB
irq_srx .= irq_dbx+s_mpudbx ;SR
irq_pcx .= irq_srx+s_mpusrx ;PC
irq_pbx .= irq_pcx+s_mpupcx ;PB
;———————————————————————————————
;
jmp (ivirq) ;IRQ indirect vector
;
iirqa longa ;ensure 16 bit accumulator
ldaw kerneldp ;set default...
tcd ;kernel direct page
shortr ;8 bit registers
lda #kerneldb ;set default...
pha ;kernel...
plb ;data bank
;
; —————————————————————————
; IRQ priority: a) SCSI
; b) UART RxD
; c) UART TxD
; d) RTC
; —————————————————————————
;
...etc...
POC V1 only has bank $00, but I wrote the necessary bank switching mumbo-jumbo into the ISR for testing purposes. It, of course, will become necessary in POC V2.
Note the indirect jump through the IRQ vector. The address of that vector must be in bank $00, since plain indirect JMP always fetches its pointer from bank $00. If it isn't, then JMP (IVIRQ,X), which fetches its pointer from the program bank, would have to be used, with .X set to $0000. However, that means .X gets clobbered, which then means that if an extension to the ISR is intercepting the jump (i.e., "wedged" into the ISR) it has to get .X off the stack, which clobbers .C, since only the accumulator can use stack pointer relative addressing...
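A short sketch of the two vectoring choices (labels as in the listing above):
Code:
;plain indirect: the pointer is always fetched from bank $00
        jmp (ivirq)        ;so IVIRQ must live in bank $00
;
;indexed indirect: the pointer is fetched from the program bank
        ldx #$0000         ;index must be zeroed first...
        jmp (ivirq,x)      ;...which clobbers .X for any downstream wedge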
Data flow between the UART (DUART in POC V1) and its buffers is interrupt-driven—only the BIOS directly accesses the FIFOs. SCSI I/O uses interrupts for vectoring the SCSI driver foreground according to bus phase changes, but uses a monkey-rigged quasi-DMA process for actually reading or writing data. Reading or writing the real-time clock is done from the foreground. However, the RTC is responsible for generating the 100 Hz jiffy IRQ, which, when serviced, causes the uptime timer to increment and the programmable time-delay counter to decrement.
Foreground access to the DUART is for register setup following reset, plus incidental accesses for device control. For example, when one of the channels interrupts due to the TxD FIFO being empty, the transmitter has to be shut down if the associated buffer is empty. The ISR takes care of shutdown in such a case. However, the foreground part of the UART driver has to restart the transmitter, which it does by writing a control value into the hardware.
Since the TIA-232 buffers are in kernel space, it makes sense to set DB to the kernel's bank, take care of business and then restore DB to the entry value. Using long addressing would complicate things because the UART channels are all driven by the same piece of code, with indexing used to select the channel being accessed. I can't do that with long addressing, but I can by using the stack as ephemeral workspace. However, stack pointer relative indirect indexed addressing acts on the current data bank only (plain stack-relative accesses always land in bank $00, where the stack lives).
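Something along these lines (KERNELDB, CHANIDX and TXCOUNT are illustrative names, not POC's actual symbols):
Code:
        phb                ;save entry DB
        lda #kerneldb      ;kernel's bank...
        pha
        plb                ;...becomes the data bank
        ldx chanidx        ;channel offset selects channel A or B
        lda txcount,x      ;one code path serves both channels via indexing
        ...                ;take care of business
        plb                ;restore entry DB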
Now, suppose I had a running kernel, loaded into a specific bank (e.g., bank $01), with kernel-specific data structures and TIA-232 buffers in the same bank, and disk buffers in bank $02. During a kernel call or when the kernel has to process an interrupt, it would have to set DB to $01 to access its specific data structures, or use [<dp>],Y addressing to get at them. However, during disk I/O, the kernel has to read and write in bank $02. So there is a natural conflict built into this that doesn't have a 100 percent satisfactory resolution. Either DB is constantly manipulated, 24 bit instructions must be used to reach data that is not in the current data bank, or 24 bit direct page addressing must be used to avoid tinkering with DB. In the case of disk buffer accesses, which are usually in fixed increments (e.g., 512 bytes), MVN and/or MVP can be used to shuffle bytes. However, it still ultimately involves fiddling with the bank in some fashion.
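As a sketch, copying a 512-byte disk block from bank $02 into a kernel buffer in bank $01 (SRC and DST are illustrative offsets; note that assembler syntax for the two bank operands varies):
Code:
        longr              ;16-bit registers
        ldx #src           ;source offset within bank $02
        ldy #dst           ;destination offset within bank $01
        lda #512-1         ;byte count minus one
        mvn $02,$01        ;copy bank $02 -> bank $01, ascending addresses
        ...                ;caution: MVN leaves DB set to the destination bank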
You need to change your thinking to accommodate the 65C816's way of doing things. I make extensive use of stack pointer relative addressing for ephemeral storage, thus avoiding having to dedicate direct page to I/O addressing. The result is a fully reentrant ISR that can handle interrupts nested to almost any depth. It's much different from the "traditional" 6502 way of doing things.
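The ephemeral-workspace idea, in rough form (N is whatever scratch size the routine needs):
Code:
        tsc                ;allocate N bytes of scratch space...
        sec
        sbc #N
        tcs                ;...by dropping the stack pointer
        ...
        lda 1,s            ;scratch is reached with stack-relative addressing,
        sta 3,s            ;so no direct page or fixed RAM is consumed
        ...
        tsc                ;release the scratch space
        clc
        adc #N
        tcs
Because each invocation gets its own frame on the stack, the code stays reentrant no matter how deeply interrupts nest.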
GARTHWILSON wrote:
I suspect that doing 24-bit addressing would be more efficient than constantly changing data banks when transferring data from I/O to memory or vice-versa.
It depends a lot on how I/O access is occurring. In a machine with no centralized operating system like your workbench computer, your application(s) may well be directly accessing the I/O hardware, in which case there's a tradeoff between use of 24 bit instructions and setting DB. I don't know the particulars of how your workbench machine handles I/O, so obviously I can't opine one way or another.
In a general purpose machine with a centralized operating system, it makes sense to let the OS handle the ugly details of working with the hardware, with the application(s) making API calls. In such a case, buffering would be used for much I/O and the transfer of data from the OS buffer to the application space could be made with MVN or MVP (MVN when the destination starts below the source, MVP when it starts above; the distinction matters if the two regions overlap). Those instructions neatly jump the bank chasm, and copy at the rate of one byte per seven clock cycles, much faster than can be accomplished with the traditional looping method.
Another method is to use direct page indirect long addressing, which is like the example I mentioned a few messages ago. It's slower than MVN or MVP, but is better suited to cases where data comes in a byte at a time and must be transferred to the application in that fashion.
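For example, moving received bytes into an application buffer through a 24-bit pointer (UART_DATA and APPBUF are illustrative; APPBUF is a three-byte pointer in direct page):
Code:
        lda uart_data      ;fetch byte from the hardware (current DB)
        sta [appbuf],y     ;store through the long pointer: any bank, DB untouched
        iny                ;next byte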