But that cross-bank issue should not be a problem with a 16-bit addressing space, should it? The MCU just writes bytes to a memory address and that is... "linear address space" should stand for that: no banking at all.
Long addressing (that is, when you use a three-byte address) has no regard for bank boundaries or bank registers. However, addressing that is neither long, nor stack, nor direct-page works in the previously selected bank, pointed to by the appropriate bank register (data or program). If everything required long addressing, code would be larger and performance would be degraded somewhat. It would also reduce the flexibility of loading various programs into whatever bank(s) are available at load time. It's another thing like 16-bit absolute versus zero page (or even 8-bit relative branch addresses) on the 6502, where the shorter one saves memory and execution time when it's practical to use. The secret then is to make banking work for you rather than against you.
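To make the difference concrete, here is a rough Python model (not '816 code) of how the address for a plain data access is formed in the two cases. The function name, the mode strings, and the size table are made up for illustration, and the model ignores indexing, direct-page, and stack addressing.

```python
# Simplified model of 65816 data addressing (illustrative names only).
# "absolute": 16-bit operand, bank supplied by the data bank register (DBR).
# "long":     24-bit operand, DBR is ignored.

# Instruction length in bytes for something like LDA: opcode + operand bytes.
INSTRUCTION_BYTES = {"absolute": 3, "long": 4}


def effective_address(mode, operand, dbr=0x00):
    """Return the 24-bit address used for a non-indexed data access."""
    if mode == "long":
        # The full address is in the instruction itself.
        return operand & 0xFFFFFF
    if mode == "absolute":
        # Only the low 16 bits come from the instruction; DBR gives the bank.
        return ((dbr & 0xFF) << 16) | (operand & 0xFFFF)
    raise ValueError("unsupported mode: " + mode)


if __name__ == "__main__":
    # Same 16-bit operand, different banks selected beforehand via DBR.
    print(hex(effective_address("absolute", 0x1234, dbr=0x02)))
    print(hex(effective_address("absolute", 0x1234, dbr=0x7E)))
    # Long addressing names the bank explicitly and costs one more byte.
    print(hex(effective_address("long", 0x7E1234, dbr=0x02)))
```

The same three-byte instruction reaches a different bank depending on what was loaded into the data bank register, which is what lets a program be loaded into whichever bank is free.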
I might add a bit to what BDD said:
The MVN and MVP instructions cannot increment the bank if indexing takes the source or destination to a bank boundary. For copies that span banks you have to do it "manually" with load/store instructions.
The MVN and MVP instructions (which can also be used to do a fill, not just a move) handle up to 64KB each time one is executed. There's a little setup overhead, but it takes a minor amount of execution time compared to the time needed to move thousands of bytes. If you need to move a section that spans bank boundaries, you just do it in pieces, which again adds only minor overhead. You could instead set up a loop that increments 24-bit pointers in direct page, and it might appear simpler, but it would not execute as fast.
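As a sketch of the "do it in pieces" idea, the following Python (again a model, not '816 code) splits a forward copy given as 24-bit addresses into chunks that never cross a bank boundary on either the source or the destination side, so each chunk could be handed to one MVN. The function name and the tuple layout are assumptions for illustration; the count value reflects that the move instructions take the byte count minus one in the 16-bit accumulator.

```python
BANK_SIZE = 0x10000  # 64KB per bank


def split_move(src, dst, length):
    """Split a forward copy into pieces that stay inside one bank each.

    Returns a list of (src_addr, dst_addr, byte_count, c_register) tuples,
    where c_register is byte_count - 1, the value an MVN would expect.
    """
    pieces = []
    while length > 0:
        # Bytes left before the source or destination hits a bank boundary.
        src_room = BANK_SIZE - (src & 0xFFFF)
        dst_room = BANK_SIZE - (dst & 0xFFFF)
        n = min(length, src_room, dst_room)
        pieces.append((src, dst, n, n - 1))
        src += n
        dst += n
        length -= n
    return pieces


if __name__ == "__main__":
    # 512 bytes starting 256 bytes below the top of bank $01.
    for s, d, n, c in split_move(0x01FF00, 0x028000, 0x200):
        print(f"src=${s:06X} dst=${d:06X} bytes=${n:04X} C=${c:04X}")
```

A copy that stays inside one bank on both sides comes back as a single piece, so the extra bookkeeping only matters when a boundary is actually crossed.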
If I'm forgetting something, BDD, Dr Jefyll, or someone else who has used or studied the '816 more than I have can correct me.