GARTHWILSON wrote:
Are you saying the interrupt service has to fit entirely between task switches in multitasking, and that if it's not finished before the next switch, it will crash?
That is correct, yes, particularly true on multi-processor systems.
Quote:
I was appalled when you said desktop PCs are usually running at least 60 tasks (although I'm not questioning it).
Why?
Quote:
The I/O processing in the hardware I've dealt with (which is definitely not PC-class) has not been a speed limitation except where
But, look at what you're doing. Your ISRs don't do a whole heck of a lot of work. In fact, you contradict yourself: earlier you said that you didn't just set a flag, yet a few paragraphs later you wrote that one of your ISRs does exactly that, and no more.
Look at the typical device driver in an OS like, say, MS-DOS. A device driver basically consists of a library containing two entry points: interrupt and schedule. These routines are intended to be used by two different halves of the running system. Your application calls schedule to submit an I/O request of some kind (for now, let's ignore error handling and assume all I/O requests are perfect and valid). The device driver looks at the request and queues it up. In the case of a VIA-based shift register, a write request might be described by a data structure like this:
Code:
MySendRequest:
.word 0 ; +0: pointer to next I/O request (filled in by the schedule routine; used by interrupt; ignored otherwise)
.word opcode_write_request ; +2: constant
.word 0 ; +4: current offset within buffer
.word addressOfData ; +6
.word lengthOfData ; +8
.word memory_location_to_set_when_finished ; +10
.word bit_to_set_when_finished ; +12
So, schedule would keep a list of these requests. Once a request has been added to the list, schedule's job is done. (If no I/O operation is already pending, it might send the first byte itself just to kick off the event loop.)
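Expressed in C purely for brevity (the real thing would of course be a page of 65xx code, and every name here -- schedule, send_first_byte, the interrupt-masking calls -- is a placeholder rather than anybody's actual API), the schedule half amounts to little more than this sketch:
Code:
#include <stddef.h>
#include <stdint.h>

struct io_request {                 /* mirrors MySendRequest above          */
    struct io_request *next;        /* +0: filled in by schedule            */
    uint16_t opcode;                /* +2: opcode_write_request, etc.       */
    uint16_t offset;                /* +4: current offset within buffer     */
    uint8_t *data;                  /* +6: addressOfData                    */
    uint16_t length;                /* +8: lengthOfData                     */
    volatile uint16_t *done_word;   /* +10: location to set when finished   */
    uint16_t done_bit;              /* +12: bit to set when finished        */
};

static struct io_request *currentRequest;   /* head of the pending list */

extern void disable_interrupts(void);       /* placeholders for SEI/CLI */
extern void enable_interrupts(void);
extern void send_first_byte(struct io_request *r);  /* writes data[0] to the
                                                       VIA SR and advances
                                                       the request's offset */

void schedule(struct io_request *r)
{
    r->next = NULL;
    disable_interrupts();           /* the list is shared with the ISR */
    if (currentRequest == NULL) {
        /* Nothing in flight: adopt this request and send its first byte
           ourselves so the VIA starts raising shift-register interrupts;
           that's the "kick off the event loop" step.                    */
        currentRequest = r;
        send_first_byte(r);
    } else {
        /* Something is already being sent: append to the tail and let
           the interrupt half pick it up when it gets there.             */
        struct io_request *p = currentRequest;
        while (p->next != NULL)
            p = p->next;
        p->next = r;
    }
    enable_interrupts();
}
The important property is the one just described: schedule touches the queue, possibly primes the hardware, and returns to its caller immediately; everything byte-by-byte happens later, in the interrupt half.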
Now, when the VIA issues its next interrupt, the OS interrupt handling chain figures out that it needs to invoke this device driver's "interrupt" entry point. The interrupt routine, then, knows it needs to grab the next byte from its current buffer (or pick a new buffer from the list if it's done), and write it to the VIA. Note how simple this task is:
Code:
; assume 65816 code
; Is this an interrupt for a previous I/O request and we have no more I/O requests to process?
lda currentRequest
beq no_io_in_progress
; Fetch the next byte to send (well, actually a word, but we only work with the lower 8 bits)
ldy #6
lda (currentRequest),y
sta tmpPtr
ldy #4 ; current offset within the buffer
lda (currentRequest),y
tay
lda (tmpPtr),y
; shove it out the VIA serial port. We assume the VIA is mapped to every other byte in memory,
; otherwise, you'll need to fiddle with P's M bit here.
sta VIA_SR
; Increment our offset into the current I/O buffer.
ldy #4
lda (currentRequest),y
inc a
sta (currentRequest),y
; Have we exhausted all the bytes in the buffer?
ldy #8
cmp (currentRequest),y
bne not_yet_finished_with_this_request
; If so, then notify the application who submitted the I/O request
ldy #10
lda (currentRequest),y
sta tmpPtr
ldy #12
lda (currentRequest),y
ora (tmpPtr)
sta (tmpPtr)
; Find the next I/O request to process.
ldy #0
lda (currentRequest),y
sta currentRequest
no_io_in_progress:
not_yet_finished_with_this_request:
rts
There are ways of optimizing this, such as interning the I/O request record directly into zero page (the 8-bit Atari OS does this, for example), but it serves for illustrative purposes.
What we see, despite DOS not being a multitasking operating system, is the use of interrupts to give the illusion of multiple concurrently running processes. In this case, it makes the attached peripheral look very much like a cooperating, self-running service. This is actually what interrupts were invented for: back in the days of mainframes, keeping all of your I/O devices as busy as possible while the CPU churned out reports was compulsory. Remember, companies were paying IBM and Univac millions of dollars a year for the privilege of having a mainframe in their data center!
Now, let's look at an operating system like OS/2 2.0, which uses the same basic I/O driver architecture, but now includes the ability to switch amongst any number of ready-to-run programs at any given moment. The operating system cannot switch programs while it's executing code in the kernel, because doing so would create a consistency problem. Knowing that switching user-mode contexts takes some amount of time means that you need a faster CPU to keep up with the same real-time constraints you had back in plain old DOS. However, it's not that much, especially when you consider that most context switches occur only at I/O request completion times or, if no I/O is in progress, between 10 and 100 times per second. Even if a switch cost a couple of thousand cycles, a hundred of them per second eats well under two percent of a CPU running at, say, 12MHz -- basically nothing for all but the most critical real-time constraints.
The structure of our ISR now looks something like this:
Code:
; assume 65816 code
; Is this an interrupt for a previous I/O request and we have no more I/O requests to process?
lda currentRequest
beq no_io_in_progress
; Fetch the next byte to send (well, actually a word, but we only work with the lower 8 bits)
ldy #6
lda (currentRequest),y
sta tmpPtr
ldy #4 ; current offset within the buffer
lda (currentRequest),y
tay
lda (tmpPtr),y
; shove it out the VIA serial port. We assume the VIA is mapped to every other byte in memory,
; otherwise, you'll need to fiddle with P's M bit here.
sta VIA_SR
; Increment our offset into the current I/O buffer.
ldy #4
lda (currentRequest),y
inc a
sta (currentRequest),y
; Have we exhausted all the bytes in the buffer?
ldy #8
cmp (currentRequest),y
bne not_yet_finished_with_this_request
; If so, then notify the application who submitted the I/O request
ldy #10
lda (currentRequest),y
jsr system_alert_user_process
; Find the next I/O request to process.
ldx currentRequest
ldy #0
lda (currentRequest),y
sta currentRequest
; Free the previous I/O request since we don't need it anymore.
txa
jsr system_recycle_request_block
no_io_in_progress:
not_yet_finished_with_this_request:
rts
As you can see, not a whole lot has changed. The only real difference is that, in recognition of real multitasking, you now notify a user-level process upon completion of an I/O request, and you now have to be aware of dynamically managed memory.
But, notice, in neither case do I spend a whole heck of a lot of time doing any one thing. I have no tight loops, I have no dependencies on outside modules, and I certainly don't waste time doing block memory moves or the like. Those things are all to be done by user-level processes, precisely because they are interruptible.
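To make that split concrete, here's roughly what the user-level half could look like -- again in C for brevity, and again hedged: send_report, SEND_DONE, and io_flags are invented names, and this leans on the struct io_request and schedule() sketched earlier:
Code:
#include <stddef.h>
#include <stdint.h>

enum { opcode_write_request = 1 };   /* value is arbitrary for this sketch */

volatile uint16_t io_flags;          /* the word the driver's ISR ORs its bit into */
#define SEND_DONE 0x0001

void send_report(uint8_t *buf, uint16_t len)
{
    /* req lives on the stack, which is only safe because we wait right
       here until the driver is finished with it.                        */
    struct io_request req;

    req.next      = NULL;
    req.opcode    = opcode_write_request;
    req.offset    = 0;
    req.data      = buf;
    req.length    = len;
    req.done_word = &io_flags;       /* memory_location_to_set_when_finished */
    req.done_bit  = SEND_DONE;       /* bit_to_set_when_finished             */

    io_flags &= (uint16_t)~SEND_DONE;
    schedule(&req);                  /* queue it; this returns immediately */

    while (!(io_flags & SEND_DONE))
        ;                            /* spin, or PAUSE/block in a real
                                        multitasker, until the ISR sets the bit */

    /* The expensive work -- block moves, formatting the next report,
       whatever -- happens here, at user level, where being interrupted
       costs nothing but a little time.                                  */
}
(This matches the DOS-style, flag-setting driver; under the OS/2-style driver you'd block on the kernel's completion notification instead of spinning on a bit.)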
Quote:
I/O accesses are slower than non-cache memory accesses (the latter requiring quite a few cycles before the first word in a burst can be read);
Nope -- at worst, they're the same, but at best, they're as fast as a cache hit. It depends on the cache settings for I/O space, the speed of the bus used to talk to the peripheral, and whether
the peripheral itself generates any wait-states or not.
Quote:
But back the the ISRs, if all they do is to say that "so-and-so interrupted and needs service" and the main program (or associated task in a multitasking system) has to keep checking, why not just wire that interrupt line to a hardware input bit that the task itself reads.
You can, if you don't particularly care about performance. (In fact, I'm using this exact technique with the Kestrel-2 right now, because the J1 lacks interrupt capability of any kind.) The reason you have interrupts is to give the OS a chance to
change CPU state, on demand. If I'm running a low-priority program on my computer which takes 30 seconds to complete a task, and I get a network packet, I do not want my computer to wait 30 seconds before handling that packet. The interrupt causes the OS to switch run-time state from the low-priority task to the device handler task. This is what makes
preemptive multitasking preemptive, and is precisely why cooperative multitasking
does not work in the general-purpose case.
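And for the record, here's the shape of the polled alternative you're describing -- a C sketch, not the Kestrel-2's actual code; STATUS_REG, DATA_READY, and run_next_ready_task are all made-up names:
Code:
#include <stdint.h>

/* Hypothetical memory-mapped device registers and status bit. */
#define STATUS_REG  (*(volatile uint8_t *)0xD010)
#define DATA_REG    (*(volatile uint8_t *)0xD011)
#define DATA_READY  0x01

extern void run_next_ready_task(void);   /* the cooperative scheduler's yield */
extern void handle_byte(uint8_t b);

void device_task(void)
{
    for (;;) {
        /* Poll: most of the time this test fails, and the cycles spent
           asking were wasted -- the "checking the door" problem.        */
        if (STATUS_REG & DATA_READY)
            handle_byte(DATA_REG);

        /* Yield to every other task.  Until this task is run again, the
           device waits -- and if some other task is busy for 30 seconds,
           so is the device.  An interrupt removes that dependency.      */
        run_next_ready_task();
    }
}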
Perhaps a better example is typing on a heavily-loaded web server. It takes a computer up to one millisecond to process a network packet from the Internet into an application's user-space, as measured by real-world, customer-facing services here at work. Can you imagine receiving hundreds of packets while the computer is at the same time trying to handle a keyboard, entirely in kernel-space? Typing speed slows noticeably, to a point where you just want to throw a brick at the computer's monitor. (Actually, I've experienced exactly this back in my CariNet days, albeit with Linux instead of something like Windows 3.1, and instead of network packets, it'd be handling disk I/O requests on non-DMA IDE drives. Since the drives have to be spoon-fed by the CPU
in kernel space (where interrupts are non-reentrant for kernel consistency reasons, as explained above), keyboard interactivity would drop to near-useless levels. Had the disk drivers sat mostly in user-space, Linux would have kept up with the keyboard without breaking a sweat. Oh, how I miss the 90s.)
Quote:
If the main program has to keep checking, most of those checks will come back with the answer that no service is needed at this time, so they just waste time, like continually checking the door to see if someone is there instead of using the interrupt like a doorbell.
Right.
Quote:
It does seem again however that you're talking about a system where scores of tasks are running.
No, you can have as few as two running in toto, and you'll still run into jitter and latency caused by the cooperative nature of the system.
Quote:
For the kind of stuff I've done with the workbench computer though, it would take more time overall to not completely service the interrupt while you're in the ISR.
That's great, but you're not the only engineer using 65xx products or developing 65xx software. Because of this, we now need to consider the general case. This whole argument started because you expressed concern that pipelines introduce lengthy delays in handling interrupts. I countered by stating that's not necessarily the case if you switch contexts in the following cycle. Yes, this means you lose the ability to be re-entrant, but I've found 100% of all ISR cases are themselves not re-entrant (or can be refactored so as to not require it),
unless you have long-running ISRs. By long-running, of course, I mean that you have an ISR that runs as long as any user-level process would be expected to run. If you're the sole user of the machine, having been the sole author of the software running on said machine, running on hardware designed and implemented entirely by yourself, this might be acceptable. But, for the general-purpose case, that's a tall order to fulfill, and it will break down almost immediately.
Quote:
Still, most ISRs are pretty short. The one I showed of running the RTC with a VIA timer is a long routine; but with 10ms resolution, 98.6% of the times it finishes after incrementing two things and doesn't have to carry, so it only does 10 instructions altogether, including PHA, PLA, and RTI.
And this is what I'm talking about when I refer to short-lived interrupt handlers. This is wonderful, and indeed exemplary.
Quote:
My fascination with coöperative multitasking is that a processor like the 6502 which wasn't made for multitasking can do it very efficiently in Forth with very little overhead. It does not rule out interrupts that get serviced immediately.
Non sequitur -- I never made this claim.
Quote:
OTOH, if you mean that dormant tasks in round-robin coöperative multitasking take a little time each time around even if to say
No, this is not what I'm saying. What I was saying was that cooperative task switching is far from real-time. Here's a great program to run that amply proves my point. Run this on Windows 3.1 some day, and tell me how it works out for you:
Code:
#include <windows.h>

int PASCAL WinMain(HINSTANCE hInst, HINSTANCE hPrev, LPSTR lpCmd, int nShow) {
    for (;;) ;   /* never calls GetMessage(), so never yields */
}
This is an infinite loop, and because it doesn't voluntarily give control back to Windows (by calling GetMessage()), it will kill your entire system dead, except for ISRs of course. You'll be able to move the mouse, but that's about it. Most if not all disk activity will cease. Network activity will stop cold. Animations playing on the screen will die. The only means of recovery is to kill the program through some means, or to just straight-up reboot.
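For contrast, here's the cooperative contract a well-behaved Win16 program is supposed to honor -- the standard message pump (window setup omitted; this is a skeleton, not a complete program):
Code:
#include <windows.h>

int PASCAL WinMain(HINSTANCE hInst, HINSTANCE hPrev, LPSTR lpCmd, int nShow)
{
    MSG msg;

    /* ...window class registration and CreateWindow() would go here... */

    /* Every call to GetMessage() is where Windows 3.1 gets the CPU back
       and can run somebody else.  Stop making this call -- as the for(;;)
       above does -- and every other application simply starves.          */
    while (GetMessage(&msg, NULL, 0, 0)) {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    return msg.wParam;
}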
Now, I know that most Forth implementations are aware of such deliberately malicious code, which is why most implementations of WHILE, UNTIL, AGAIN, and REPEAT embed a call to PAUSE as part of their definitions. But, even here, we can circumvent it. Try this on your next multitasking-capable Forth:
Code:
code delay
here jmp,
;code
: oops begin delay again ;
Ahh, but who in their right mind would deliberately write software to bring a machine to its knees like this? Well, a few people come to mind -- QA engineers (which I am), load-test engineers, and brats who think they're 1337 script-kiddies. But, on the whole, the biggest source of sluggishness in a program won't come from deliberate attempts to bring a system down; rather, it'll come from busy engineers whose code sits at the "works for me" level of quality, and isn't adequately tested to cover all control flows. When you get that, invariably, you end up with accidents that result in infinite loops, or people using algorithms with time-complexities that are O(n^2) instead of O(n log n), etc. So even if you do PAUSE judiciously in your sources, the dynamic behavior of a program can (and I'm not saying yours does, but again, in the general case) result in really lackluster performance on cooperatively multitasking systems.
This is where interrupts and preemption come into play, and why they work so well together. Knowing that it's possible to wire the NMI handler to an OS routine that restores control over the machine, why not generalize this to allowing the computer to
forcefully switch to another program? That's all preemptive multitasking is, and it actually isn't hard to do. In fact, you have to write a cooperatively-multitasking kernel
first, and then make it preemptive by calling PAUSE (or whatever your system call equivalent is) in the ISR of the timer handler.
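Sketched in C (with pause() standing in for PAUSE or whatever your scheduler's yield call is, and the timer register being a made-up address), the whole trick is just this:
Code:
#include <stdint.h>

#define TIMER_STATUS  (*(volatile uint8_t *)0xC004)  /* hypothetical timer register */

extern void pause(void);   /* the cooperative kernel's yield/PAUSE entry point */

/* The timer's interrupt handler.  Everything else in the kernel stays an
   ordinary cooperative round-robin; this one call is what makes it
   preemptive, because the task switch now happens whether or not the
   running task volunteered for it.  (The ISR prologue/epilogue must, of
   course, save and restore the full task context.)                       */
void timer_isr(void)
{
    (void)TIMER_STATUS;    /* read/acknowledge the timer to clear the interrupt */
    pause();               /* hand the CPU to the next ready task */
}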
You can get very good performance if your kernel is built to handle real-time constraints (consider QNX or L4 microkernels, for example; and you can snow even these if you elect to forego the use of an MMU for memory protection), even if your code sits in user-space.
Quote:
Although I know you've had distractions, I would still look forward to seeing you pull it off if you still have any interest in it;
While I periodically muse about it, the availability of FPGA hardware renders the 65xx architecture obsolete for my needs. The J1 processor, despite its 16K code-space and 64K data-space limitations, so completely outclasses the 65816 in performance that it's hard for me to consider going back. I do need to give it the ability to see more than 64K of memory some day, though. Oh, and I do need to add interrupts too.
That being said, the Kestrel-2's I/O registers and memory interfaces should be relatively simple to adapt to the 65xx architecture, since the Wishbone bus is so close to the 65xx's. So, if I ever do manage to get into the 65xx mood again some day, it's a fairly simple matter of replacing the CPU binding and writing a new software tool-chain for it. Everything else should "just work."