I am going implementation-first on this, which isn't necessarily wise, and the starting point is a 6502 core, not a 65816. So there are no bank registers to keep - they would have to be implemented, and that's not trivial - it means distinguishing different types of access in the core. But also, that seems to me to be a messier model which I don't like. In my model, each task lives in an address space of its own. It's written for a private CPU, but executes in a shared one.
Pretend you'd never come across the 816, and take a fresh look at these patches of relocated address space sitting in 4GB of address space, running 6502-like code all of which is written to the usual pattern of fast data, stack, program, heap. (Or a different usual pattern of program, static data, heap and stack.) Remember how confusing the 816 model is when you first come across it, and how it really arises from a banked memory model with short and long addresses. We don't need to recreate that, and then have to explain it and evangelise it.
As for the performance hit for direct in-task I/O: a good point. To avoid calling back to T=0 mode, the relocated task needs
- to have no protection applied
- to know the relocated I/O addresses
and that second point could be addressed by load-time fixup, which would need refixing if there were ever any relocation. Slightly better, the task asks the OS at initialisation time where the I/O is by telling the OS which program addresses need to be patched, and the OS will look after the task from there on.
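That load-time fixup could be as simple as the loader walking a table of program offsets supplied by the task. A minimal sketch, in which the table layout, the base addresses and the word-array "image" are all my own illustrative assumptions rather than anything in the proposal:

```python
# Hypothetical load-time I/O fixup. The task image carries a list of
# offsets whose operands hold the task's private idea of the I/O base;
# the OS rewrites each one to where the I/O actually sits.

IO_BASE_PRIVATE = 0xFFFF0000   # where the task believes I/O lives (assumed)
IO_BASE_ACTUAL  = 0xD0000000   # where the OS has actually mapped it (assumed)

def fixup_io(image, fixup_offsets):
    """Patch each listed operand from the private to the actual I/O base."""
    for off in fixup_offsets:
        image[off] = image[off] - IO_BASE_PRIVATE + IO_BASE_ACTUAL
    return image

# A tiny 'program' whose words at offsets 0 and 1 reference I/O registers:
prog = [IO_BASE_PRIVATE + 0x10, IO_BASE_PRIVATE + 0x14]
fixup_io(prog, [0, 1])
```

If the OS keeps that offset table around, it can re-run the same pass whenever it relocates the task, which is the "the OS will look after the task from there on" part.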
Alternatively, put the I/O critical code into a 'device driver', which is to say arrange to run it as part of the OS which generally runs with T=0. Think of this as task zero.
If we had I/O instructions then we could have an untranslated I/O space: that would be cleaner. Conceivably we could build translation hardware which translates only the lower 2GB of memory space, leaving the upper 2GB as physical memory, which gives you fixed I/O locations and monster lookup tables. I'm not especially happy with either of those, at first glance.
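To make the half-translated idea concrete, here is a toy model of that address map. I'm assuming translation is a simple base-register add of TBR, which matches the relocation model but isn't spelled out anywhere; the specific addresses are invented:

```python
# Sketch of 'translate only the lower 2GB': addresses with the top bit
# set bypass translation, so I/O and big shared tables sit at fixed
# physical locations. Assumes translation is addr + TBR, modulo 2^32.

TOP_BIT = 0x80000000
MASK32  = 0xFFFFFFFF

def translate(addr, tbr, t_flag):
    """Map a task-visible address to a physical address."""
    if not t_flag or (addr & TOP_BIT):
        return addr                    # T=0, or upper 2GB: untranslated
    return (addr + tbr) & MASK32       # lower 2GB: relocated by TBR

phys = translate(0x00001000, tbr=0x00400000, t_flag=True)  # relocated
io   = translate(0xD0000010, tbr=0x00400000, t_flag=True)  # fixed location
```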
I'd probably argue for putting monster lookup tables into task zero: you can get to those routines with a BRK and it will be much faster than doing the work by hand. That leaves it as a software question as to how much space to dedicate to task zero in each circumstance. (Bear in mind that we'll have multiplication.) In the absence of inter-task protection, and relocation, you can again access that shared memory resource directly by patching up the references.
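As a rough model of reaching those task-zero routines through BRK: the dispatch-on-signature scheme and the quarter-square multiply below are illustrative choices of mine, not part of the proposal (and the table here is tiny, not a monster):

```python
# BRK as the doorway into task-zero helpers. A signature code selects the
# routine; the example routine multiplies via a quarter-square lookup
# table, the classic table-driven trick: a*b = qs(a+b) - qs(|a-b|).

SQUARES = [n * n // 4 for n in range(512)]  # quarter-square table

def qs_multiply(a, b):
    """Table-driven multiply; parity errors in n*n//4 cancel out."""
    return SQUARES[a + b] - SQUARES[abs(a - b)]

SERVICES = {0x01: qs_multiply}   # BRK signature -> task-zero routine

def brk(service, *args):
    """Software interrupt into task zero (T=0), then back to the task."""
    return SERVICES[service](*args)

product = brk(0x01, 12, 34)
```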
Perhaps to understand your story better I need to see how you'd allocate memory between tasks, and how your low-jitter I/O routines fit in with the rest of your program. I'm tempted to think you're a special case, but then again you are the only person actually expressing an interest in using the 65Org32!
I hadn't thought of shared libraries at all. I think I'll just ignore them! In other words,
"In my limited understanding I think your scheme works as long as any program needs only itself and known OS services to do whatever needs to be done."
sounds good to me, and like we're on the same page!
This isn't meant to be a full MMU, so I think linking will have to take care of libraries. Load-time linking might be more flexible, but run-time linking (as the Amiga managed) would rule out task relocation, unless the OS were clever enough to patch up in-task library vectors when it relocated a task. (Task relocation is asking a lot of an OS writer - I wonder if we'll ever see it.)
Your point about parameter passing is a good one though. When the OS takes a call, it will need to manually transform any addresses it is handed, using the TBR which belongs to the calling task. If it passes any new addresses back, it will need to transform those too. That seems to work OK.
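The parameter rule above, as a minimal model. Again I'm assuming the task-to-physical mapping is a simple add of the caller's TBR; the addresses are made up:

```python
# The OS maps inbound pointers with the caller's TBR before touching
# them, and maps any pointer it returns back into the caller's space.

MASK32 = 0xFFFFFFFF

def to_physical(task_addr, tbr):
    """Caller's view -> physical, on the way into the OS."""
    return (task_addr + tbr) & MASK32

def to_task(phys_addr, tbr):
    """Physical -> caller's view, on the way back out."""
    return (phys_addr - tbr) & MASK32

caller_tbr = 0x00200000
buf    = to_physical(0x00003000, caller_tbr)  # pointer handed in by the task
result = to_task(0x00205000, caller_tbr)      # pointer the OS hands back
```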
Now, those are my top-of-head responses. On the other hand, you have in mind that two relocated tasks could communicate by inspecting and passing their own TBRs on the stack. (Or, if they know that relocation doesn't happen, they could manually convert back to physical addresses in their parameters?)
I think you've already spotted a crucial point: if you execute an XTR when T=1, you immediately continue execution in the other task. It's a wormhole. The calling convention for inter-task calls would either be to rendezvous at some known locations, or to pass parameters to identify the call type (or a mixture). The callee always, necessarily, has the caller's TBR in a register because the caller just swapped it. I think this means there's no need to push TBR on the stack manually (and no need for any auto-stacking operations in the core itself).
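Here's a toy model of that wormhole. The register names and exchange semantics are my reading of the discussion (XTR as an exchange between an index register and TBR), not a spec:

```python
# XTR exchanges X and TBR: after the swap the processor is fetching
# through the other task's base, and the callee automatically holds the
# caller's TBR in X - no manual stacking needed.

class Cpu:
    def __init__(self, tbr):
        self.tbr = tbr   # translation base of the running task
        self.x = 0       # index register carrying the new TBR

    def xtr(self):
        """Exchange X and TBR: control falls through into the other task."""
        self.x, self.tbr = self.tbr, self.x

cpu = Cpu(tbr=0x00100000)     # caller's task base
cpu.x = 0x00300000            # callee's task base, loaded by the caller
cpu.xtr()                     # wormhole: now running in the callee
```

Returning is symmetric: the callee puts the value it found in X back through another XTR.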
I am, of course, tempted to wonder if there is a minimal set of changes which make this a protected system. If some operations are forbidden or become NOPs when T=1, then T=0 truly becomes a supervisor mode. If it can be made a protected system, that's quite attractive to me. It would rule out direct inter-task calls (could cause havoc in the other task's data) and might well rule out in-task I/O. That's both of you disappointed! But both would follow from safety. We could make this optional of course, if indeed it's possible at all.