cjs wrote:
BigDumbDinosaur wrote:
cjs wrote:
Hm! So that sounds like yet another reason to move the direct page to cover your I/O addresses when loading data, and then run the load routine in the bank you're loading. That would fix this problem, would it not?
...While accessing an I/O register as a direct page location does eliminate one clock cycle per access—assuming DP points at a page boundary—...
Um, I think two cycles per access in the situation we were talking about, right?
Nope. Non-indexed absolute loads and stores require four cycles, unless the load or store is 16 bits, in which case one additional cycle is required. Non-indexed zero (direct) page loads and stores require three cycles (8-bit transfer) or four (16-bit transfer). However, if the 65C816's DP is pointing to an address that is not on a page boundary, a DP load or store will incur a one-cycle penalty for each access.
Quote:
I was responding to your earlier comment:
BigDumbDinosaur wrote:
Something to be aware of is reading data from a fixed address with a 65C816, as would be the case with disk I/O, will involve long indirection if the data is going into or coming out of a different bank than the one in which the I/O device is located. Indirection of any kind costs clock cycles because it involves additional internal steps in the MPU. Any 24-bit load or store will incur a one clock cycle penalty for each access.
What I understood you to be saying here is that to load data from the address used for input from your device, say, $C012, you expected one would be using absolute long
LDA $00C012, a 4 byte/5 cycle instruction. (I don't actually see any indirection here, though; the address is being used as given, not loaded from another address.) I proposed replacing that with
LDA $12 with DP set to $C000 (2 bytes/3 cycles).
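To put the figures quoted in this exchange side by side, here is a rough Python tally. The function name and its parameters are made up for illustration. The cycle counts are only the ones stated above for non-indexed loads and stores, and nothing else about the MPU is modeled.

```python
def load_store_cycles(mode, wide=False, dp_low_byte=0):
    """Cycles for a non-indexed LDA/STA, using the figures quoted in this thread.

    mode        -- "direct", "absolute" or "long" (hypothetical labels)
    wide        -- True for a 16-bit transfer
    dp_low_byte -- low byte of the direct page register; nonzero means
                   DP is not on a page boundary
    """
    base = {"direct": 3, "absolute": 4, "long": 5}[mode]
    cycles = base
    if wide:
        cycles += 1  # a 16-bit transfer moves a second byte
    if mode == "direct" and dp_low_byte != 0:
        cycles += 1  # penalty when DP is not page-aligned
    return cycles


if __name__ == "__main__":
    for mode in ("direct", "absolute", "long"):
        print(mode, load_store_cycles(mode), load_store_cycles(mode, wide=True))
```

Read this way, the "one cycle" figure compares direct page against 16-bit absolute addressing, while the "two cycles" figure compares it against 24-bit absolute long addressing.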
Reiterating an earlier statement, unless the hardware register and data structure being used with it are in the same bank, either a hard-coded 24-bit address is needed to access one or the other, or a 24-bit direct page pointer is needed if more than one structure may be associated with the hardware in question. As an example of the latter case, when I finally get my extended memory version of POC functioning, there will be the capability to read a block from one of the disks and deposit it in memory almost anywhere. "Almost anywhere" has to be defined in a 24-bit direct page pointer, since "almost anywhere" includes a bank other than the one in which the I/O hardware is present.
Quote:
I'm not seeing any issue with the target location of the data transfer, so long as you're willing to limit single transfers to 64K or less: just load an index register with $10000 - length, set the operand of your STA instruction to destaddr - ($10000 - length), and loop, incrementing the index, until it wraps to 0. Yes, this requires self-modifying code, but it's pretty innocuous as far as self-modifying code goes.
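The index arithmetic in that count-up loop is easy to get backwards, so here is a small Python model of it. This is only a sketch: the function names are invented, and it assumes the indexed STA adds the 16-bit index to a 24-bit operand with carry into the bank byte, and that addresses wrap at 24 bits.

```python
def count_up_setup(dest_addr, length):
    """Return (start_index, sta_operand) for a count-up-to-zero copy loop."""
    if not 0 < length <= 0x10000:
        raise ValueError("single transfers are limited to 64K or less")
    start_index = (0x10000 - length) & 0xFFFF
    # Operand chosen so that operand + start_index == dest_addr.
    sta_operand = (dest_addr - start_index) & 0xFFFFFF
    return start_index, sta_operand


def addresses_written(dest_addr, length):
    """Simulate the loop and list every address the STA would hit."""
    x, operand = count_up_setup(dest_addr, length)
    hits = []
    while True:
        hits.append((operand + x) & 0xFFFFFF)
        x = (x + 1) & 0xFFFF  # 16-bit index register
        if x == 0:            # loop ends when the index wraps to zero
            break
    return hits


if __name__ == "__main__":
    print([hex(a) for a in addresses_written(0x021000, 3)])
```

The loop needs no separate compare instruction, because the increment that wraps the index to zero also signals the end of the transfer.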
Yes, that would work, but only if the program is running in RAM that hasn't been write-protected. However, consider that 24-bit indirect indexed requires 6 cycles, but is far more flexible, plus can easily straddle bank boundaries, making loads greater than 64K quite easy.
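As a rough illustration of why the pointer approach handles bank boundaries and transfers over 64K, here is a Python model of storing through a 24-bit pointer plus a 16-bit index. It is a sketch under stated assumptions: the function name and the dict standing in for memory are hypothetical, and bumping the pointer's bank byte when the index wraps is one possible way to continue past 64K, not necessarily how any particular driver does it.

```python
def store_through_pointer(memory, pointer, data):
    """Store data at pointer + Y, where pointer is 24 bits and Y is 16 bits.

    memory  -- dict mapping 24-bit address -> byte (stand-in for RAM)
    pointer -- 24-bit base address, as it would sit in three DP bytes
    Returns the final (pointer, y) pair.
    """
    y = 0
    for byte in data:
        # Effective address: 24-bit add, so a carry moves into the next bank.
        memory[(pointer + y) & 0xFFFFFF] = byte
        y = (y + 1) & 0xFFFF
        if y == 0:
            # Index wrapped: advance the pointer's bank byte and keep going.
            pointer = (pointer + 0x10000) & 0xFFFFFF
    return pointer, y


if __name__ == "__main__":
    ram = {}
    store_through_pointer(ram, 0x01FFFE, b"\x11\x22\x33\x44")
    print(sorted(hex(a) for a in ram))
```

The code itself never changes, only the three pointer bytes do, so the same routine can run from ROM or write-protected RAM.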
Quote:
Also, it means you need not worry about the DBR if you don't want to; STA seems to be the same number of cycles for absolute and long indexed X, according to the WDC book.
In practice, most '816 programs would not be touching DB at all, as indirection can handle all cases where ad hoc access in arbitrary banks is needed.
As for absolute indexed and absolute long indexed using the same number of cycles, that would be expected. If a 16-bit address is specified, DB has to be used to construct the base address, to which the index then has to be added. If a 24-bit address is specified, the cycle that would have read DB will instead be used to fetch the MSB of the address.
Quote:
Quote:
I went through this exercise when I was designing the SCSI and multi-channel UART drivers for my POC V1 units.... Pointing DP at hardware not only proved to be of no value in performance, it resulted in a lot of hoop-jumping in order to get at things such as indices and pointers that were needed by the driver.
If the driver needed a bunch of indices and pointers, yeah, you'd want the zero page pointing at those.
I've yet to run into a driver for 6502-type I/O hardware that didn't need some pointers. As I said, the dubious benefit of pointing direct page at hardware is more than offset by the convolutions of trying to make drivers sufficiently general to avoid resorting to self-modifying (and potentially buggy) code.