This flabdablet character is capable of some seriously twisted thinking.
Here's my take on his/her impressive hack.
The inner loop is tidy, and would look like the code below. I unrolled by eight only to keep the illustration brief; to make the thing worthwhile you'd want to unroll much further.
All the surrounding code (which I didn't bother to work out) would be rather untidy. You'd have to start by setting up the Stack Pointer and the Direct Page Register (PEI reads through D and pushes through S), and then check that at least 16 bytes remain to be moved before each iteration of the inner loop is allowed to proceed. After the final iteration you'd need to use conventional coding to move the remaining bytes, if any. (Depending on circumstances, you might be able to ensure there are none.)
Code:
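PEI $E      ; push the word at D+$E (high byte D+$F first)
PEI $C
PEI $A
PEI $8
PEI $6
PEI $4
PEI $2
PEI $0      ; eight PEIs = 16 bytes moved
TDC         ; A = D (16-bit accumulator assumed)
SEC
SBC #$10    ; step the source window down by 16 bytes
TCD         ; D = D - $10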
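As a rough, untested sketch of the surrounding code, here is how the setup and teardown might go when the length is a known, nonzero multiple of 16, so there are no leftover bytes to handle. SRC, DST and LEN are hypothetical assembly-time constants, and LOOP is a hypothetical label. Both blocks must lie in bank 0 (that's where the direct page and the stack live), native mode is assumed, and the 16-bit immediates are written in generic syntax that your assembler may spell differently. The move runs from the top of the block downward.
Code:
        PHP             ; save flags (interrupt mask, register widths)
        SEI             ; assumption: no IRQs while S points into the data
        REP #$30        ; 16-bit A, X and Y
        CLD             ; binary arithmetic for the SBC in the loop
        PHD             ; save caller's D on the real stack
        TSC
        TAY             ; Y = caller's S (Y is untouched by the loop)
        LDX #LEN/16     ; X = number of 16-byte passes
        LDA #DST+LEN-1
        TCS             ; S = last byte of the destination
        LDA #SRC+LEN-16
        TCD             ; D = topmost 16-byte chunk of the source
LOOP    PEI $E
        PEI $C
        PEI $A
        PEI $8
        PEI $6
        PEI $4
        PEI $2
        PEI $0
        TDC
        SEC
        SBC #$10
        TCD
        DEX
        BNE LOOP
        TYA
        TCS             ; back to the caller's stack
        PLD             ; restore caller's D
        PLP             ; restore flags and register widths
SEI won't hold off an NMI. One arriving mid-move would push its return state just below S, which is the not-yet-written part of the destination (or just below DST on the final pass).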
To avoid a one-cycle penalty on every PEI, the low eight bits of the Direct Page Register must be zero; in other words, keep D page-aligned. For that you'd have to unroll the loop so it contains 128 PEIs (one full 256-byte page per pass) and step D by $100 instead of $10. That's a reasonable tradeoff if speed is a priority and memory is cheap.
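Here's a rough, untested fragment showing how the tail of that page-aligned version might look. The labels LOOP and DONE and the use of X as a page counter are my own assumptions, and the middle of the PEI run is elided to keep this short. One wrinkle: 128 PEIs occupy 256 bytes, which is beyond the reach of a short branch, so the loop has to close with JMP (or BRL).
Code:
LOOP    PEI $FE         ; D is page-aligned, so no extra cycle
        PEI $FC
        ; ... PEI $FA down through PEI $04, 128 PEIs in all ...
        PEI $02
        PEI $00
        TDC
        SEC
        SBC #$100       ; step D down one page; the low byte stays zero
        TCD
        DEX             ; X = pages left to move (hypothetical counter)
        BEQ DONE
        JMP LOOP        ; too far back for BNE
DONE
By my count that's 128 x 6 = 768 cycles of PEI plus 16 of overhead per page, or a little over 3 cycles per byte, versus 7 per byte for MVN/MVP.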
Edit: this hack works out wonderfully if the block-move source, destination and length are known in advance (as in flabdablet's screen-scrolling scenario). For general use it'll be comparatively clumsy, but still worth pursuing if the block-moves are fairly large.
-- Jeff