BigDumbDinosaur wrote:
In any case, running a multitasking environment on the NMOS 6502 is mostly an exercise in futility, in my opinion. It can be done with the 65C02 (I’ve been there and done that), but efforts to do so will be hampered by the C02’s inability to restart an instruction that had to be aborted due to a page fault, access violation, etc.
I'm interested in why that's seen as so important. I can see that it's required for things like virtual memory, copy-on-write, and memory-mapped files - which are great features to support - but I think they are icing on the cake, and I'd be willing to accept that a 65C02 is not cut out for those features, at least not without a much more active MMU (e.g. a coprocessor standing by to fix things up mid-instruction, like Andre's system has).
BigDumbDinosaur wrote:
Although not designed for use in a preemptive, multitasking environment, any effort expended in setting up such an arrangement would be far more profitable with the 65C816, mainly because it has an ABORT interrupt, as well as features that make it easier for system logic to know what is happening at any given instant. For example, monitoring VDA and VPA will always tell logic where the 816 is in the instruction cycle. You don’t have that with the 65C02, whose SYNC output only tells you when an instruction opcode is being fetched. That won’t help in trying to police for instructions that would “touch” memory outside of the allowed address range for the particular program that is currently executing.
Early ARM CPUs had a similar system, and extensive documentation on exactly what the abort handler needed to do to "unpick" the instruction that was aborted. Most instructions didn't need any work, but some were quite thorny IIRC. I guess it was as much as they could afford to support in the CPU, but just enough that it was possible for the OS to pick up the pieces. It is harder with something like the 6502 instruction set where so many instructions are pretty much impossible to undo.
BigDumbDinosaur wrote:
Your main concern would be what happens if a user-land process disables IRQs. That’s something you could address with a watchdog timer wired to NMI. Within your IRQ handler (not the NMI handler), you’d have a code snippet to reset the watchdog on each IRQ. If IRQ processing ceases, such as due to an application executing SEI, the watchdog would eventually time out and force an NMI. The NMI handler’s job would be to re-enable IRQs by rewriting the stack copy of SR (status register) that was pushed by the MPU when it acknowledged the NMI. Upon RTIing out of the NMI handler, the I bit in SR will be cleared and IRQ processing will resume. With a 100 Hz jiffy IRQ, for example, you could set the watchdog to time out in 100-or-so milliseconds, minimizing the risk of deadlock. Presumably, your IRQ handler won’t take 100 ms to complete.
Absolutely, I think we're on the same page - I was planning to hold the timer in reset whenever in supervisor mode. However, I hadn't thought of having the NMI handler simply clear the I bit in the stacked SR to unblock the regular scheduler - I'd assumed it would also have to run the scheduler and select a new process itself. Just clearing the bit is much simpler in principle, and I like it, except that it doesn't help with the STP case. If we need to use resets for that, then I think we might as well also use resets for killing processes that have disabled interrupts for too long.
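For reference, this is roughly how I picture that NMI handler - a minimal sketch, assuming the only thing pushed before the TSX is the saved A, so the stacked SR sits at $0102,X:

Code:
nmi_handler:
        pha             ; save A - a watchdog NMI means IRQs have been masked too long
        tsx             ; X = stack pointer
        lda $0102,x     ; SR pushed by the MPU when it took the NMI
        and #%11111011  ; clear the I flag (bit 2)
        sta $0102,x     ; write it back to the stack
        pla             ; restore A
        rti             ; the restored SR has I clear, so IRQ processing resumes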
BigDumbDinosaur wrote:
Quote:
There is potentially some use for user processes to be able to disable interrupts - one example is atomic access to a multi-byte quantity, perhaps something that is updated by an interrupt or another process.
Why would that matter? If your preemptive, multitasking kernel is correctly written, a context change in such a situation should cause no problems whatsoever when your preempted process runs again.
The main case I had in mind was shared memory, but it's not something I've thought through a lot. I was considering the overheads of making system calls, and whether there might be some value in having alternate interfaces available to user processes. For example, a shared page of memory containing read-only data about the system - e.g. an elapsed-time field, a list of keys currently held down, that sort of thing - updated by the kernel, so that user code can make decisions based on it without the cost of a system call, at least where no side effects are needed. In that case an interrupt could update the elapsed time while the user process is halfway through reading the old value. Another case is where two processes are sharing a page of memory for IPC - given the lack of any atomic multi-byte operations in the 6502, it could be valuable to have a way to briefly prevent other processes running while you fetch some data. Again, not something I've thought through very much.
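To make the concern concrete, here's the sort of user-side read I mean (ticks and value are made-up labels for the shared counter and a local copy); without the SEI/CLI bracket, the jiffy IRQ could update the counter between the two loads and the process would see a torn value:

Code:
        sei             ; only possible if user code is allowed to mask IRQs
        lda ticks       ; low byte of the kernel-maintained elapsed-time field
        sta value
        lda ticks+1     ; high byte
        sta value+1
        cli             ; window is only a handful of cycles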
BigDumbDinosaur wrote:
The logical thing to do if you are going to try to regain control via a hard reset is to soft-vector reset to code that will fix up the stack pointer, make sure IRQs are enabled and decimal arithmetic is cleared, and then re-enter the kernel at a designated point. The only significant problem I see with this approach will be in figuring out who the offender is so it can be removed from the kernel’s run queue.
That's a good point - if the vectors are in writable memory we can just update those, and I think that could work really well in my current design. I don't think it would be hard to work out which process was running: it will be whatever process the scheduler last resumed, and we can keep that process ID in kernel RAM.
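Sketching what I imagine the soft-vectored reset target looking like - current_pid, kill_process and scheduler are just placeholders for my kernel's equivalents:

Code:
reset_recover:
        sei             ; reset already sets I, but be explicit
        cld             ; make sure decimal mode is off
        ldx #$ff
        txs             ; rebuild the stack pointer
        lda current_pid ; kernel RAM copy of the process the scheduler last resumed
        jsr kill_process ; remove the offender from the run queue
        jmp scheduler   ; re-enter the kernel at a designated point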
BigDumbDinosaur wrote:
Well, what you’ve proposed doesn’t appear to address the problems that could be caused by a user-land process touching hardware or accessing RAM areas that are off-limits. Any thoughts on that?
User processes shouldn't have any access to I/O devices, because they shouldn't be in their memory map at all. My user processes only have paged RAM, across their entire address space. For logical pages that haven't been mapped to any RAM, I'm thinking of defaulting them to map reads to a page initialized with zeros, and writes to another dummy page of physical RAM, so that these operations have specific, safe effects. Beyond this low-effort approach, it might not be hard to have the hardware actually treat these page mappings as invalid, so that the kernel can at least terminate the process and report the exception.
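As a rough sketch of what I have in mind when setting up a new process - assuming an MMU with one read-mapping and one write-mapping register per logical page (MMU_RD_BASE, MMU_WR_BASE), plus reserved physical frames ZERO_FRAME (pre-filled with zeros) and SINK_FRAME (write sink); all of these names and the register layout are hypothetical:

Code:
; default every logical page of a new process to the dummy frames
        ldx #0
init_map:
        lda #ZERO_FRAME         ; reads of unmapped pages return zeros
        sta MMU_RD_BASE,x
        lda #SINK_FRAME         ; writes to unmapped pages land in a scratch page
        sta MMU_WR_BASE,x
        inx
        cpx #PAGE_COUNT         ; number of logical pages per process
        bne init_map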
BigDumbDinosaur wrote:
Also, what do you plan to do about that pesky STP instruction? Furthermore, how do you propose to deal with a user-land program having the following instruction sequence?
Code:
sei
wai
My current (new) plan is to use timer-based IRQs to let the scheduler run, and probably also to run it after every system call. Then use an additional timer to trigger reset if, for whatever reason, this IRQ doesn't achieve its purpose - whether that's due to SEI/WAI, STP, etc. For a while I thought about using the RDY pin to detect STP and WAI, but didn't see much benefit to be gained from that.
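As a rough outline of the jiffy IRQ handler under that plan - VIA_T1CL, WDOG_RESET, ticks and scheduler are placeholders for my actual hardware registers and kernel entry points, and I'm assuming a 6522 timer 1 as the jiffy source:

Code:
irq_handler:
        pha                     ; saved A is restored on the resume path
        lda VIA_T1CL            ; reading T1 low clears the VIA's jiffy interrupt
        sta WDOG_RESET          ; any write kicks the watchdog so it never escalates
        inc ticks               ; bump the shared elapsed-time field (low byte)
        bne tick_done
        inc ticks+1             ; carry into the high byte
tick_done:
        pla
        jmp scheduler           ; decide whether to preempt; ends in RTI either way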
pjdennis wrote:
The main issue I could see is a negative impact on handling of time-sensitive interrupts. A one-off miss of e.g. a serial data byte transfer might not be a major problem, ...
I'd still like to avoid that, and have considered having a timer that runs whenever IRQ is low but the system is not in supervisor mode, so that if an IRQ is left pending for longer than a tolerable period the offending process gets killed. The regular watchdog timer is going to need to be rather long, and only really works to break out of user processes that are completely blocking the system with STP etc.
pjdennis wrote:
but a malicious process, depending on OS capabilities, might be able to create a denial of service situation by repeatedly spawning a child process that disables interrupts. Of course you might be more concerned about accidental programming issues than deliberate misbehavior, though I guess you could handle this scenario by recursively killing the parent user process of any process caught by the watchdog.
This is still a vulnerability. I'm not sure how much we can hope to do about it though - I think even modern OSes running on much more suitable hardware still have trouble with this.