BigDumbDinosaur wrote:
Direct page accesses incur a one clock cycle penalty per instruction if DP doesn't start exactly on a page boundary, effacing the speed advantage of DP addressing modes.
Yup. There's a substantial speedup available, but only if you set DP to align with a page boundary.
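To put numbers on that, here is a rough sketch of the cycle counts for a plain LDA on the '816. The function name and parameters are made up for illustration; the counts are the usual datasheet figures (LDA dp = 3, LDA abs = 4 with an 8-bit accumulator, plus one cycle for a 16-bit accumulator, plus one cycle when the low byte of D is nonzero).

```python
def lda_cycles(mode: str, d_register: int, accumulator_16bit: bool = False) -> int:
    """Approximate cycle count for a 65816 LDA using direct-page or absolute addressing.

    Hypothetical helper for illustration only; not taken from any library.
    """
    base = {"dp": 3, "abs": 4}[mode]    # 8-bit accumulator baseline
    cycles = base
    if accumulator_16bit:
        cycles += 1                      # extra cycle to fetch the high data byte
    if mode == "dp" and (d_register & 0xFF) != 0:
        cycles += 1                      # penalty: D is not on a page boundary
    return cycles


for d in (0x0000, 0x0200, 0x0210):
    print(f"D=${d:04X}: LDA dp = {lda_cycles('dp', d)} cycles, "
          f"LDA abs = {lda_cycles('abs', d)} cycles")
```

With D at $0210 the direct-page load costs the same 4 cycles as the absolute load, so the only thing left in DP's favor is the one-byte-shorter instruction.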
Earlier I mentioned bit-bang SPI inputting at a rate of 18 or 19 CPU cycles per bit or outputting at 17 or 18 cycles per bit (the figures are data dependent). This already speedy throughput increases by roughly another 12.5% if you use Direct Page accesses (on a 6502, Zero Page accesses) -- specifically, each of the cycles-per-bit figures is reduced by 2.
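The percentage is easy to check. A quick sketch (the function name is just for illustration) showing the throughput gain from shaving 2 cycles off each of the per-bit figures quoted above:

```python
def throughput_gain(cycles_per_bit: int, cycles_saved: int = 2) -> float:
    """Fractional increase in bits per second when each bit takes fewer cycles."""
    return cycles_per_bit / (cycles_per_bit - cycles_saved) - 1.0


for c in (17, 18, 19):
    print(f"{c} -> {c - 2} cycles/bit: {throughput_gain(c) * 100:.1f}% faster")
```

The gain works out to between about 11.8% and 13.3% depending on which figure you start from, with 18 cycles per bit giving exactly 12.5%.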
Even without DP / Z-pg accesses, bit-banged SPI can be surprisingly effective. Talking to an SPI UART, even a 1 MHz 6502 can bit-bang fast enough to achieve 19.2 or even 38.4 kbaud on the async connection -- and of course faster CPUs exceed this performance.
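For a feel of the cycle budget involved, here is a back-of-envelope sketch. It rests on assumptions that are mine, not measured: 8N1 framing (10 bit times per character on the async side), one 8-bit SPI transfer per character, the worst-case 19 cycles per bit, and no allowance for call overhead or chip-select handling. Some SPI UARTs use 16-bit frames, in which case pass bits=16.

```python
def cycles_per_character(cpu_hz: int, baud: int, frame_bits: int = 10) -> float:
    """CPU cycles available per async character (frame_bits=10 assumes 8N1)."""
    return cpu_hz * frame_bits / baud


def spi_cycles(bits: int = 8, cycles_per_bit: int = 19) -> int:
    """CPU cycles spent bit-banging one SPI transfer (worst-case figure assumed)."""
    return bits * cycles_per_bit


budget = cycles_per_character(1_000_000, 19_200)   # 1 MHz CPU, 19.2 kbaud
cost = spi_cycles()
print(f"budget per character: {budget:.1f} cycles")
print(f"SPI transfer cost:    {cost} cycles")
print(f"left for everything else: {budget - cost:.1f} cycles")
```

Under those assumptions the transfer itself uses well under a third of the time available per character at 19.2 kbaud; plug in other baud rates or frame sizes to see where the headroom runs out.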
One final note: the routines I use are written for 6502 / 'C02, and it seems doubtful that '816-specific code would be any faster. But, by design, my routines don't receive and transmit simultaneously. If there were a requirement for that, then the B Accumulator might make an '816 faster than a 6502 / 'C02. Apologies for going slightly OT, talking about bit-banging but not on a W65C265.
-- Jeff
(edits -- last paragraph)
_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html