OS-9 was a multitasking OS for the 6809, and, early on, all of its processes were written as position-independent code (PIC), so no launch-time loader was necessary.
Early Mac OS did this too, relying on PIC routines, and MultiFinder took full advantage of it. There was system support (notably for calling between different code segments within an application), but no real loader per se. Of course, the UCSD p-Machine did this as well, at the p-code level: all of its code was PIC, with segments loaded dynamically at runtime. Both the Mac and UCSD required compiler support for multi-segment calls.
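To make the multi-segment call idea concrete, here is a minimal Python sketch of a jump table that loads a code segment the first time one of its routines is called. It is loosely modeled on the general idea, not on Apple's or UCSD's actual mechanism; `SEGMENT_ROUTINES`, `JumpTable`, and the `seg1_*` routines are hypothetical names, and "loading" is simulated by patching table slots.

```python
from typing import Callable, Dict, List, Optional, Tuple

Routine = Callable[[int], int]


# Routines that live in (hypothetical) segment 1. Because segment code is
# position independent, it can land anywhere in memory; only the jump table
# has to learn where its entry points ended up.
def seg1_double(x: int) -> int:
    return 2 * x


def seg1_square(x: int) -> int:
    return x * x


# Hypothetical "segment file": segment number -> routines it provides.
SEGMENT_ROUTINES: Dict[int, List[Routine]] = {1: [seg1_double, seg1_square]}


class JumpTable:
    def __init__(self, entries: List[Tuple[int, int]]) -> None:
        # Each entry: (segment number, index of the routine within that segment).
        self.entries = list(entries)
        # Resolved routine per slot; None means the segment is not resident yet.
        self.resolved: List[Optional[Routine]] = [None] * len(self.entries)
        self.loads = 0  # number of simulated segment loads

    def load_segment(self, segment: int) -> None:
        # A real system would read the segment into any free block of memory;
        # here we just patch every slot that belongs to that segment.
        routines = SEGMENT_ROUTINES[segment]
        for slot, (seg, idx) in enumerate(self.entries):
            if seg == segment:
                self.resolved[slot] = routines[idx]
        self.loads += 1

    def call(self, slot: int, arg: int) -> int:
        # Every inter-segment call goes through the table.
        if self.resolved[slot] is None:
            self.load_segment(self.entries[slot][0])
        return self.resolved[slot](arg)


table = JumpTable([(1, 0), (1, 1)])
print(table.call(0, 21))  # first call loads segment 1, prints 42
print(table.call(1, 5))   # segment already resident, prints 25
print(table.loads)        # prints 1
```

Routing every cross-segment call through one table is what lets the segments themselves stay position independent: only the table entries change when a segment is brought in.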
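You could do the same thing on the '816 (the IIGS did this to a limited extent, with desk accessories), with the caveat that each process could have less than 64K of code, though it could access more than that for data. Code that stays within one bank can use 16-bit jumps and calls that never encode the bank number, whereas cross-bank JML/JSL instructions embed a full 24-bit address. The OS would need a way to parcel out Bank 0 as shared stack space among the processes, since the '816 hardware stack always lives in Bank 0, but (almost) 64K of Bank 0 should be enough for hundreds of processes. Each process's metadata can specify its stack size.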
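Parceling out Bank 0 for stacks can be sketched as a simple first-fit allocator. This is a hypothetical Python model, not any real OS's API: the pool bounds `POOL_START`/`POOL_END` are assumptions (the rest of Bank 0 is presumed to belong to the OS, I/O, and vectors), and the requested size stands in for the stack size from each process's metadata.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical bounds of the Bank 0 region set aside for process stacks.
# The rest of Bank 0 is assumed to belong to the OS, I/O, and vectors.
POOL_START = 0x1000
POOL_END = 0xF000  # exclusive


@dataclass
class StackBlock:
    base: int  # lowest Bank 0 address of the block
    size: int  # bytes, taken from the process's metadata

    @property
    def initial_sp(self) -> int:
        # The '816 stack grows downward and S points at the next free byte,
        # so a fresh stack starts at the block's highest address.
        return self.base + self.size - 1


class Bank0StackPool:
    def __init__(self, start: int = POOL_START, end: int = POOL_END) -> None:
        # Free ranges as (base, size), kept sorted by base address.
        self.free: List[Tuple[int, int]] = [(start, end - start)]
        self.owned: Dict[int, StackBlock] = {}  # pid -> stack block

    def allocate(self, pid: int, size: int) -> Optional[StackBlock]:
        # First fit: carve the stack from the lowest free range that is big enough.
        for i, (base, avail) in enumerate(self.free):
            if avail >= size:
                if avail == size:
                    del self.free[i]
                else:
                    self.free[i] = (base + size, avail - size)
                block = StackBlock(base, size)
                self.owned[pid] = block
                return block
        return None  # no free range is large enough

    def release(self, pid: int) -> None:
        block = self.owned.pop(pid)
        self.free.append((block.base, block.size))
        self.free.sort()
        # Coalesce neighboring free ranges so larger stacks fit again later.
        merged: List[Tuple[int, int]] = []
        for base, size in self.free:
            if merged and merged[-1][0] + merged[-1][1] == base:
                merged[-1] = (merged[-1][0], merged[-1][1] + size)
            else:
                merged.append((base, size))
        self.free = merged


pool = Bank0StackPool()
count = 0
while pool.allocate(pid=count, size=256) is not None:
    count += 1
print(count)  # 224 stacks of 256 bytes fit between 0x1000 and 0xF000
```

With those assumed bounds, the 56K pool holds 224 stacks of 256 bytes, in line with "hundreds of processes"; processes that declare smaller stacks in their metadata would let more fit.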
None of this requires a loader, but it does need a decent memory manager. And, yes, there's no memory protection between processes, but that's a different issue.
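The memory-manager side can be sketched the same way, assuming (as a design choice not stated above) that each process's code gets a whole 64K bank of its own and is placed at the same offset within it. The Python below is hypothetical: `BankAllocator` hands out banks, and `code_address` shows why no loader is needed, since a 16-bit JMP/JSR target combines with the program bank register (PBR) to form the 24-bit address.

```python
from typing import Dict, List, Optional

# Hypothetical sketch: one 64K bank per process for code, extra banks for data.


def code_address(pbr: int, addr16: int) -> int:
    # A 16-bit JMP/JSR target on the '816 is taken within the program bank,
    # so the full 24-bit address is PBR:addr16.
    return ((pbr & 0xFF) << 16) | (addr16 & 0xFFFF)


class BankAllocator:
    def __init__(self, installed_banks: int) -> None:
        # Bank 0 is held back for the OS and the shared stack pool.
        self.free: List[int] = list(range(1, installed_banks))
        self.owner: Dict[int, int] = {}  # bank -> pid

    def allocate(self, pid: int) -> Optional[int]:
        if not self.free:
            return None  # out of banks
        bank = self.free.pop(0)  # lowest free bank
        self.owner[bank] = pid
        return bank

    def release(self, bank: int) -> None:
        del self.owner[bank]
        self.free.append(bank)
        self.free.sort()


banks = BankAllocator(installed_banks=8)  # hypothetical machine with 8 banks
a = banks.allocate(pid=1)
b = banks.allocate(pid=2)
# The same code image, loaded at the same offset in each bank, needs no fixups:
# a routine at offset 0x0200 resolves to a different 24-bit address per process.
print(hex(code_address(a, 0x0200)))  # 0x10200
print(hex(code_address(b, 0x0200)))  # 0x20200
```

A process needing more than 64K of data would simply call `allocate` again for additional data banks; as noted above, nothing here stops one process from touching another's banks.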