GARTHWILSON wrote:
Are you saying the interrupt service has to fit entirely between task switches in multitasking, and that if it's not finished before the next switch, it will crash?
That is correct, yes, particularly true on multi-processor systems.
Quote:
I was appalled when you said desktop PCs are usually running at least 60 tasks (although I'm not questioning it).
Why?
Quote:
The I/O processing in the hardware I've dealt with (which is definitely not PC-class) has not been a speed limitation except where
But, look at what you're doing. Your ISRs don't do a whole heck of a lot of work. In fact, you contradict yourself: earlier you said that you didn't just set a flag, yet a few paragraphs later you wrote that one of your ISRs does exactly that, and no more.
Look at the typical device driver in an OS like, say, MS-DOS. A device driver basically consists of a library containing two entry points: interrupt and schedule. These routines are intended to be used by two different halves of the running system. Your application calls schedule to submit an I/O request of some kind (for now, let's ignore error handling and assume all I/O requests are perfect and valid). The device driver looks at the request and queues it up. In the case of a VIA-based shift register, a write request might be described by a data structure like this:
Code:
MySendRequest:
.word 0 ; +0: pointer to next I/O request (filled in by the schedule routine; used by interrupt; ignored otherwise)
.word opcode_write_request ; +2: constant
.word 0 ; +4: current offset within buffer
.word addressOfData ; +6
.word lengthOfData ; +8
.word memory_location_to_set_when_finished ; +10
.word bit_to_set_when_finished ; +12
So, schedule would keep a list of these requests. Once a request has been added to the list, schedule's job is done. (If no I/O operation is already pending, it might send the first byte itself just to kick off the event loop.)
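Expressed in C purely for brevity (the real thing would of course be a page of 65xx code, and every name here -- schedule, send_first_byte, the interrupt-masking calls -- is a placeholder rather than anybody's actual API), the schedule half amounts to little more than this sketch:
Code:
#include <stddef.h>
#include <stdint.h>

struct io_request {                 /* mirrors MySendRequest above          */
    struct io_request *next;        /* +0: filled in by schedule            */
    uint16_t opcode;                /* +2: opcode_write_request, etc.       */
    uint16_t offset;                /* +4: current offset within buffer     */
    uint8_t *data;                  /* +6: addressOfData                    */
    uint16_t length;                /* +8: lengthOfData                     */
    volatile uint16_t *done_word;   /* +10: location to set when finished   */
    uint16_t done_bit;              /* +12: bit to set when finished        */
};

static struct io_request *currentRequest;   /* head of the pending list */

extern void disable_interrupts(void);       /* placeholders for SEI/CLI */
extern void enable_interrupts(void);
extern void send_first_byte(struct io_request *r);  /* writes data[0] to the
                                                       VIA SR and advances
                                                       the request's offset */

void schedule(struct io_request *r)
{
    r->next = NULL;
    disable_interrupts();           /* the list is shared with the ISR */
    if (currentRequest == NULL) {
        /* Nothing in flight: adopt this request and send its first byte
           ourselves so the VIA starts raising shift-register interrupts;
           that's the "kick off the event loop" step.                    */
        currentRequest = r;
        send_first_byte(r);
    } else {
        /* Something is already being sent: append to the tail and let
           the interrupt half pick it up when it gets there.             */
        struct io_request *p = currentRequest;
        while (p->next != NULL)
            p = p->next;
        p->next = r;
    }
    enable_interrupts();
}
The important property is the one just described: schedule touches the queue, possibly primes the hardware, and returns to its caller immediately; everything byte-by-byte happens later, in the interrupt half.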
Now, when the VIA issues its next interrupt, the OS interrupt handling chain figures out that it needs to invoke this device driver's "interrupt" entry point. The interrupt routine, then, knows it needs to grab the next byte from its current buffer (or pick a new buffer from the list if it's done), and write it to the VIA. Note how simple this task is:
Code:
; assume 65816 code
; Is this an interrupt for a previous I/O request and we have no more I/O requests to process?
lda currentRequest
beq no_io_in_progress
; Fetch the next byte to send (well, actually a word, but we only work with the lower 8 bits)
ldy #6
lda (currentRequest),y
sta tmpPtr
ldy #4 ; current offset within the buffer
lda (currentRequest),y
tay
lda (tmpPtr),y
; shove it out the VIA serial port. We assume the VIA is mapped to every other byte in memory,
; otherwise, you'll need to fiddle with P's M bit here.
sta VIA_SR
; Increment our offset into the current I/O buffer.
ldy #4
lda (currentRequest),y
inc a
sta (currentRequest),y
; Have we exhausted all the bytes in the buffer?
ldy #8
cmp (currentRequest),y
bne not_yet_finished_with_this_request
; If so, then notify the application who submitted the I/O request
ldy #10
lda (currentRequest),y
sta tmpPtr
ldy #12
lda (currentRequest),y
ora (tmpPtr)
sta (tmpPtr)
; Find the next I/O request to process.
ldy #0
lda (currentRequest),y
sta currentRequest
no_io_in_progress:
not_yet_finished_with_this_request:
rts
There are ways of optimizing this, such as interning the I/O request record directly into zero page (the 8-bit Atari OS does this, for example), but it serves for illustrative purposes.
What we see, despite DOS not being a multitasking operating system, is the use of interrupts to give the illusion of multiple concurrently running processes. In this case, it makes the attached peripheral look very much like a cooperating, self-running service. This is actually what interrupts were invented for: back in the days of mainframes, keeping all of your I/O devices as busy as possible while the CPU churned out reports was compulsory. Remember, companies were paying IBM and Univac millions of dollars a year for the privilege of having a mainframe in their data center!
Now, let's look at an operating system like OS/2 2.0, which uses the same basic I/O driver architecture, but now includes the ability to switch amongst any number of ready-to-run programs at any given moment. The operating system cannot switch programs while it's executing code in the kernel, because doing so would create a consistency problem. Knowing that switching user-mode contexts takes some amount of time means that you need a faster CPU to keep up with the same real-time constraints you had back in plain old DOS. However, it's not that much, especially when you consider that most context switches occur only at I/O request completion times or, if no I/O is in progress, between 10 and 100 times per second. Even if a switch cost a couple of thousand cycles, a hundred of them per second eats well under two percent of a CPU running at, say, 12MHz -- basically nothing for all but the most critical real-time constraints.
The structure of our ISR now looks something like this:
Code:
; assume 65816 code
; Is this an interrupt for a previous I/O request and we have no more I/O requests to process?
lda currentRequest
beq no_io_in_progress
; Fetch the next byte to send (well, actually a word, but we only work with the lower 8 bits)
ldy #6
lda (currentRequest),y
sta tmpPtr
ldy #4 ; current offset within the buffer
lda (currentRequest),y
tay
lda (tmpPtr),y
; shove it out the VIA serial port. We assume the VIA is mapped to every other byte in memory,
; otherwise, you'll need to fiddle with P's M bit here.
sta VIA_SR
; Increment our offset into the current I/O buffer.
ldy #4
lda (currentRequest),y
inc a
sta (currentRequest),y
; Have we exhausted all the bytes in the buffer?
ldy #8
cmp (currentRequest),y
bne not_yet_finished_with_this_request
; If so, then notify the application who submitted the I/O request
ldy #10
lda (currentRequest),y
jsr system_alert_user_process
; Find the next I/O request to process.
ldx currentRequest
ldy #0
lda (currentRequest),y
sta currentRequest
; Free the previous I/O request since we don't need it anymore.
txa
jsr system_recycle_request_block
no_io_in_progress:
not_yet_finished_with_this_request:
rts
As you can see, not a whole lot has changed. The only real difference is that, in recognition of real multitasking, you now notify a user-level process upon completion of an I/O request, and you now have to be aware of dynamically managed memory.
But, notice, in neither case do I spend a whole heck of a lot of time doing any one thing. I have no tight loops, I have no dependencies on outside modules, and I certainly don't waste time doing block memory moves or the like. Those things are all to be done by user-level processes, precisely because they are interruptible.
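To make that split concrete, here's roughly what the user-level half could look like -- again in C for brevity, and again hedged: send_report, SEND_DONE, and io_flags are invented names, and this leans on the struct io_request and schedule() sketched earlier:
Code:
#include <stddef.h>
#include <stdint.h>

enum { opcode_write_request = 1 };   /* value is arbitrary for this sketch */

volatile uint16_t io_flags;          /* the word the driver's ISR ORs its bit into */
#define SEND_DONE 0x0001

void send_report(uint8_t *buf, uint16_t len)
{
    /* req lives on the stack, which is only safe because we wait right
       here until the driver is finished with it.                        */
    struct io_request req;

    req.next      = NULL;
    req.opcode    = opcode_write_request;
    req.offset    = 0;
    req.data      = buf;
    req.length    = len;
    req.done_word = &io_flags;       /* memory_location_to_set_when_finished */
    req.done_bit  = SEND_DONE;       /* bit_to_set_when_finished             */

    io_flags &= (uint16_t)~SEND_DONE;
    schedule(&req);                  /* queue it; this returns immediately */

    while (!(io_flags & SEND_DONE))
        ;                            /* spin, or PAUSE/block in a real
                                        multitasker, until the ISR sets the bit */

    /* The expensive work -- block moves, formatting the next report,
       whatever -- happens here, at user level, where being interrupted
       costs nothing but a little time.                                  */
}
(This matches the DOS-style, flag-setting driver; under the OS/2-style driver you'd block on the kernel's completion notification instead of spinning on a bit.)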
Quote:
I/O accesses are slower than non-cache memory accesses (the latter requiring quite a few cycles before the first word in a burst can be read);
Nope -- at worst, they're the same, but at best, they're as fast as a cache hit. It depends on the cache settings for I/O space, the speed of the bus used to talk to the peripheral, and whether
the peripheral itself generates any wait-states or not.
Quote:
But back the the ISRs, if all they do is to say that "so-and-so interrupted and needs service" and the main program (or associated task in a multitasking system) has to keep checking, why not just wire that interrupt line to a hardware input bit that the task itself reads.
You can, if you don't particularly care about performance. (In fact, I'm using this exact technique with the Kestrel-2 right now, because the J1 lacks interrupt capability of any kind.) The reason you have interrupts is to give the OS a chance to
change CPU state, on demand. If I'm running a low-priority program on my computer which takes 30 seconds to complete a task, and I get a network packet, I do not want my computer to wait 30 seconds before handling that packet. The interrupt causes the OS to switch run-time state from the low-priority task to the device handler task. This is what makes
preemptive multitasking preemptive, and is precisely why cooperative multitasking
does not work in the general-purpose case.
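And for the record, here's the shape of the polled alternative you're describing -- a C sketch, not the Kestrel-2's actual code; STATUS_REG, DATA_READY, and run_next_ready_task are all made-up names:
Code:
#include <stdint.h>

/* Hypothetical memory-mapped device registers and status bit. */
#define STATUS_REG  (*(volatile uint8_t *)0xD010)
#define DATA_REG    (*(volatile uint8_t *)0xD011)
#define DATA_READY  0x01

extern void run_next_ready_task(void);   /* the cooperative scheduler's yield */
extern void handle_byte(uint8_t b);

void device_task(void)
{
    for (;;) {
        /* Poll: most of the time this test fails, and the cycles spent
           asking were wasted -- the "checking the door" problem.        */
        if (STATUS_REG & DATA_READY)
            handle_byte(DATA_REG);

        /* Yield to every other task.  Until this task is run again, the
           device waits -- and if some other task is busy for 30 seconds,
           so is the device.  An interrupt removes that dependency.      */
        run_next_ready_task();
    }
}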
Perhaps a better example is typing on a heavily-loaded web server. It takes a computer up to one millisecond to process a network packet from the Internet into an application's user-space, as measured by real-world, customer-facing services here at work. Can you imagine receiving hundreds of packets while the computer is at the same time trying to handle a keyboard, entirely in kernel-space? Typing speed slows noticeably, to a point where you just want to throw a brick at the computer's monitor. (Actually, I've experienced exactly this back in my CariNet days, albeit with Linux instead of something like Windows 3.1, and instead of network packets, it'd be handling disk I/O requests on non-DMA IDE drives. Since the drives have to be spoon-fed by the CPU
in kernel space (where interrupts are non-reentrant for kernel consistency reasons, as explained above), keyboard interactivity would drop to near-useless levels. Had the disk drivers sat mostly in user-space, Linux would have kept up with the keyboard without breaking a sweat. Oh, how I miss the 90s.)
Quote:
If the main program has to keep checking, most of those checks will come back with the answer that no service is needed at this time, so they just waste time, like continually checking the door to see if someone is there instead of using the interrupt like a doorbell.
Right.
Quote:
It does seem again however that you're talking about a system where scores of tasks are running.
No, you can have as few as two running in toto, and you'll still run into jitter and latency caused by the cooperative nature of the system.
Quote:
For the kind of stuff I've done with the workbench computer though, it would take more time overall to not completely service the interrupt while you're in the ISR.
That's great, but you're not the only engineer using 65xx products or developing 65xx software. Because of this, we now need to consider the general case. This whole argument started because you expressed concern that pipelines introduce lengthy delays in handling interrupts. I countered by stating that's not necessarily the case if you switch contexts in the following cycle. Yes, this means you lose the ability to be re-entrant, but I've found 100% of all ISR cases are themselves not re-entrant (or can be refactored so as to not require it),
unless you have long-running ISRs. By long-running, of course, I mean that you have an ISR that runs as long as any user-level process would be expected to run. If you're the sole user of the machine, having been the sole author of the software running on said machine, running on hardware designed and implemented entirely by yourself, this might be acceptable. But, for the general-purpose case, that's a tall order to fulfill, and it will break down almost immediately.
Quote:
Still, most ISRs are pretty short. The one I showed of running the RTC with a VIA timer is a long routine; but with 10ms resolution, 98.6% of the times it finishes after incrementing two things and doesn't have to carry, so it only does 10 instructions altogether, including PHA, PLA, and RTI.
And this is what I'm talking about when I refer to short-lived interrupt handlers. This is wonderful, and indeed exemplary.
Quote:
My fascination with coöperative multitasking is that a processor like the 6502 which wasn't made for multitasking can do it very efficiently in Forth with very little overhead. It does not rule out interrupts that get serviced immediately.
Non sequitur -- I never made this claim.
Quote:
OTOH, if you mean that dormant tasks in round-robin coöperative multitasking take a little time each time around even if to say
No, this is not what I'm saying. What I was saying was that cooperative task switching is far from real-time. Here's a great program to run that amply proves my point. Run this on Windows 3.1 some day, and tell me how it works out for you:
Code:
#include <windows.h>

int PASCAL WinMain(HINSTANCE hInst, HINSTANCE hPrev, LPSTR lpCmd, int nShow) {
    for (;;) ;   /* never calls GetMessage(), so never yields */
}
This is an infinite loop, and because it doesn't voluntarily give control back to Windows (by calling GetMessage()), it will kill your entire system dead, except for ISRs of course. You'll be able to move the mouse, but that's about it. Most if not all disk activity will cease. Network activity will stop cold. Animations playing on the screen will die. The only means of recovery is to kill the program through some means, or to just straight-up reboot.
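For contrast, here's the cooperative contract a well-behaved Win16 program is supposed to honor -- the standard message pump (window setup omitted; this is a skeleton, not a complete program):
Code:
#include <windows.h>

int PASCAL WinMain(HINSTANCE hInst, HINSTANCE hPrev, LPSTR lpCmd, int nShow)
{
    MSG msg;

    /* ...window class registration and CreateWindow() would go here... */

    /* Every call to GetMessage() is where Windows 3.1 gets the CPU back
       and can run somebody else.  Stop making this call -- as the for(;;)
       above does -- and every other application simply starves.          */
    while (GetMessage(&msg, NULL, 0, 0)) {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    return msg.wParam;
}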
Now, I know that most Forth implementations are aware of such deliberately malicious code, which is why most implementations of WHILE, UNTIL, AGAIN, and REPEAT embed a call to PAUSE as part of their definitions. But, even here, we can circumvent it. Try this on your next multitasking-capable Forth:
Code:
code delay
here jmp,
;code
: oops begin delay again ;
Ahh, but who in their right mind would deliberately write software to bring a machine to its knees like this? Well, a few people come to mind -- QA engineers (which I am), load-test engineers, and brats who think they're 1337 script-kiddies. But, on the whole, the biggest source of sluggishness in a program won't come from deliberate attempts to bring a system down; rather, it'll come from busy engineers whose code sits at the "works for me" level of quality, and isn't adequately tested to cover all control flows. When you get that, invariably, you end up with accidents that result in infinite loops, or people using algorithms with time-complexities that are O(n^2) instead of O(n log n), etc. So even if you do PAUSE judiciously in your sources, the dynamic behavior of a program can (and I'm not saying yours does, but again, in the general case) result in really lackluster performance on cooperatively multitasking systems.
This is where interrupts and preemption come into play, and why they work so well together. Knowing that it's possible to wire the NMI handler to an OS routine that restores control over the machine, why not generalize this to allowing the computer to
forcefully switch to another program? That's all preemptive multitasking is, and it actually isn't hard to do. In fact, you have to write a cooperatively-multitasking kernel
first, and then make it preemptive by calling PAUSE (or whatever your system call equivalent is) in the ISR of the timer handler.
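Sketched in C (with pause() standing in for PAUSE or whatever your scheduler's yield call is, and the timer register being a made-up address), the whole trick is just this:
Code:
#include <stdint.h>

#define TIMER_STATUS  (*(volatile uint8_t *)0xC004)  /* hypothetical timer register */

extern void pause(void);   /* the cooperative kernel's yield/PAUSE entry point */

/* The timer's interrupt handler.  Everything else in the kernel stays an
   ordinary cooperative round-robin; this one call is what makes it
   preemptive, because the task switch now happens whether or not the
   running task volunteered for it.  (The ISR prologue/epilogue must, of
   course, save and restore the full task context.)                       */
void timer_isr(void)
{
    (void)TIMER_STATUS;    /* read/acknowledge the timer to clear the interrupt */
    pause();               /* hand the CPU to the next ready task */
}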
You can get very good performance if your kernel is built to handle real-time constraints (consider QNX or L4 microkernels, for example; and you can snow even these if you elect to forego the use of an MMU for memory protection), even if your code sits in user-space.
Quote:
Although I know you've had distractions, I would still look forward to seeing you pull it off if you still have any interest in it;
While I periodically muse about it, the availability of FPGA hardware renders the 65xx architecture obsolete for my needs. The J1 processor, despite its 16K code-space and 64K data-space limitations, so completely outclasses the 65816 in performance that it's hard for me to consider going back. I do need to give it the ability to see more than 64K of memory some day, though. Oh, and I do need to add interrupts too.
That being said, the Kestrel-2's I/O registers and memory interfaces should be relatively simple to adapt to the 65xx architecture, since the Wishbone bus is so close to the 65xx's. So, if I ever do manage to get into the 65xx mood again some day, it's a fairly simple matter of replacing the CPU binding and writing a new software tool-chain for it. Everything else should "just work."