A taken branch delays interrupt handling by one instruction

BigEd · Post by **BigEd** » Wed Sep 21, 2011 9:39 pm

We may need to recap what we saw. We've got better simulation tabulation output these days.

Here's a taken branch being interrupted during T2 (the following instruction is fetched and executed):

Code:

cycle   ab      db      rw      Fetch   pc      Execute State   nmi     D1x1
0       2303    38      1       SEC     2303    BRK     T1      1       1
0       2303    38      1       SEC     2303    BRK     T1      1       1
1       2304    b0      1               2304    SEC     T0+T2   1       1
1       2304    b0      1               2304    SEC     T0+T2   1       1
2       2304    b0      1       BCS     2304    SEC     T1      1       1
2       2304    b0      1       BCS     2304    SEC     T1      1       1
3       2305    fe      1               2305    BCS     T2      1       1
3       2305    fe      1               2305    BCS     T2      0       1
4       2306    a9      1               2306    BCS     T3      0       1
4       2306    a9      1               2306    BCS     T3      0       1
5       2304    b0      1       BCS     2304    BCS             0       1
5       2304    b0      1       BCS     2304    BCS             0       1
6       2305    fe      1               2305    BCS     T2      0       1
6       2305    fe      1               2305    BCS     T2      0       0
7       2306    a9      1               2306    BCS     T3      0       0
7       2306    a9      1               2306    BCS     T3      0       0
8       2304    b0      1       BCS     2304    BCS             0       0
8       2304    b0      1       BCS     2304    BCS             0       0
9       2304    b0      1               2304    BRK     T2      0       0
9       2304    b0      1               2304    BRK     T2      0       0
10      01fd    b0      0               2304    BRK     T3      0       0
10      01fd    23      0               2304    BRK     T3      0       0

By comparison, if the branch had been a push, which is also a 3-cycle instruction and would also be in T2 at the same time, the following instruction would have been usurped by the interrupt:

Code:

cycle   ab      db      rw      Fetch   pc      Execute State   nmi     D1x1
0       2303    38      1       SEC     2303    BRK     T1      1       1
0       2303    38      1       SEC     2303    BRK     T1      1       1
1       2304    48      1               2304    SEC     T0+T2   1       1
1       2304    48      1               2304    SEC     T0+T2   1       1
2       2304    48      1       PHA     2304    SEC     T1      1       1
2       2304    48      1       PHA     2304    SEC     T1      1       1
3       2305    48      1               2305    PHA     T2      1       1
3       2305    48      1               2305    PHA     T2      0       1
4       01fd    48      0               2305    PHA     T0      0       1
4       01fd    aa      0               2305    PHA     T0      0       0
5       2305    48      1       PHA     2305    PHA     T1      0       0
5       2305    48      1       PHA     2305    PHA     T1      0       0
6       2305    48      1               2305    BRK     T2      0       0
6       2305    48      1               2305    BRK     T2      0       0
7       01fc    48      0               2305    BRK     T3      0       0
7       01fc    23      0               2305    BRK     T3      0       0

Again, as HiassofT says, if the taken branch crosses a page boundary then the interrupt is taken sooner - we see D1x1 respond during T0, just as it does for a push or a jump:

Code:

cycle   ab      db      rw      Fetch   pc      Execute State   nmi     D1x1
0       23fe    38      1       SEC     23fe    BRK     T1      1       1
0       23fe    38      1       SEC     23fe    BRK     T1      1       1
1       23ff    b0      1               23ff    SEC     T0+T2   1       1
1       23ff    b0      1               23ff    SEC     T0+T2   1       1
2       23ff    b0      1       BCS     23ff    SEC     T1      1       1
2       23ff    b0      1       BCS     23ff    SEC     T1      1       1
3       2400    fe      1               2400    BCS     T2      1       1
3       2400    fe      1               2400    BCS     T2      0       1
4       2401    a9      1               2401    BCS     T3      0       1
4       2401    a9      1               2401    BCS     T3      0       1
5       24ff    00      1               24ff    BCS     T0      0       1
5       24ff    00      1               24ff    BCS     T0      0       0
6       23ff    b0      1       BCS     23ff    BCS     T1      0       0
6       23ff    b0      1       BCS     23ff    BCS     T1      0       0
7       23ff    b0      1               23ff    BRK     T2      0       0
7       23ff    b0      1               23ff    BRK     T2      0       0
8       01fd    b0      0               23ff    BRK     T3      0       0
8       01fd    23      0               23ff    BRK     T3      0       0

No conclusions or explanation from me today though.

Cheers
Ed

BigEd · Post by **BigEd** » Mon Sep 26, 2011 10:19 am

Just a little more on this. We'd need to redraw this section of Balazs' schematic, because there's a missing NOR gate: a signal we've called C1x5Reset comes into play, although it's not active in these simulations.

I've traced a few more signals of interest here.

Node 480 signals that an interrupt is pending, and node D1x1 signals the imminent replacement of the next fetch with a BRK. For the logic in between, the crucial point remains that the machine waits for one of two situations:

- pipe-T0 in visual6502 terms, or not-Clock2 as marked on the schematic
- op-T2-branch in visual6502, or PLA signal 83 on the schematic
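
As a rough illustration of that waiting rule, here is a minimal Python sketch. It is not derived from the netlist: the function names are made up for this example, and the one-cycle `INPUT_DELAY` between the pin going low and the pending flag being usable is an assumption chosen to fit the tabulations earlier in the thread. The per-cycle Execute and State values are copied from those three tabulations, with nmi going low in cycle 3 each time.

```python
# Minimal sketch of the commit rule described above. Names and the
# one-cycle input delay are assumptions for illustration, not netlist facts.
BRANCHES = {"BCC", "BCS", "BEQ", "BNE", "BMI", "BPL", "BVC", "BVS"}
INPUT_DELAY = 1  # assumed: cycles from pin going low to a usable pending flag


def is_poll_point(executing, state):
    """True in a T0 state, or in T2 of a branch instruction."""
    phases = state.split("+")
    return "T0" in phases or (executing in BRANCHES and "T2" in phases)


def commit_cycle(trace, pin_low_cycle):
    """Return the first cycle in which D1x1 would go low, or None."""
    for cycle in sorted(trace):
        executing, state = trace[cycle]
        if cycle >= pin_low_cycle + INPUT_DELAY and is_poll_point(executing, state):
            return cycle
    return None


# (Execute, State) per cycle, copied from the three tabulations in the thread.
# An empty string stands for the cycle whose State column is blank.
SHORT_BRANCH = {3: ("BCS", "T2"), 4: ("BCS", "T3"), 5: ("BCS", ""),
                6: ("BCS", "T2"), 7: ("BCS", "T3"), 8: ("BCS", "")}
PUSH = {3: ("PHA", "T2"), 4: ("PHA", "T0"), 5: ("PHA", "T1")}
PAGE_CROSS = {3: ("BCS", "T2"), 4: ("BCS", "T3"), 5: ("BCS", "T0"), 6: ("BCS", "T1")}

for name, trace in [("3-cycle branch", SHORT_BRANCH), ("push", PUSH),
                    ("page-crossing branch", PAGE_CROSS)]:
    print(name, "-> D1x1 low in cycle", commit_cycle(trace, pin_low_cycle=3))
```

Under those assumptions the sketch gives cycles 6, 4 and 5, which is where D1x1 drops in the three tabulations. It says nothing about the C1x5Reset path.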

This is my take on the story:

If you were designing a machine from scratch, your first revision might just wait for T0 in every case. At some later point you optimise the branch sequence so that a taken branch misses out T0. At that point you'd want to find a way to make sure that a succession of taken branches can still be interrupted: in this case, by also checking in T2.

(Pedantic point: I think this observation should only be called a bug if the original 6502 datasheet tells us what the interrupt latency is. If it doesn't, then it's just an unexpected deviation from the simplistic rule that each instruction can be interrupted at such a point in all cases. The true rule is that the maximum latency is a little more. Later implementations improved this latency.)

Cheers
Ed

RichTW · Post by **RichTW** » Thu Oct 13, 2011 3:09 pm

Thanks Ed for all your detail... yes, that makes sense. Now I've finally had a chance to consider what all the timing states mean.

Please correct me if I'm wrong, but my interpretation is that T0 is a state in which the next opcode is fetched or the interrupt sequence is initiated. Is it also the case that the decision to commence with the interrupt sequence is only made in T0 (except obviously in this special case with branches)?

Here's why I want to understand. Imagine this scenario:

$FE45 is the address of a 6522 VIA timer (MSB).

Suppose this timer generates an interrupt at the moment that this instruction starts to be executed:

ROL $FE45

Writing $FE45 will acknowledge the IRQ, presumably pulling the IRQ line high, so does this mean that the interrupt will not be taken (as the IRQ line will not be low by the T0 stage of the next instruction)?

I was reading an account here and here which claims that the IRQ will still be taken, but the 6522 flags will be cleared meaning that its 'cause' is lost! I don't really see how this fits with the above model, unless there's actually a delay in the 6522 pulling the IRQ line high when acknowledging the interrupt?

Any thoughts on this one anyone?

GARTHWILSON · Post by **GARTHWILSON** » Thu Oct 13, 2011 7:24 pm

Quote:

I was reading an account here and here which claims that the IRQ will still be taken, but the 6522 flags will be cleared meaning that its 'cause' is lost!

I've actually had that happen. I call them "ghost interrupts." I addressed it in the interrupts article about half way down the long page, next to the cartoon of the little boy ringing the doorbell and running away.

BigEd · Post by **BigEd** » Thu Oct 13, 2011 8:37 pm

That mystery or ghost interrupt can be understood once you bear in mind that the 6502 has registered the intention to do something about the interrupt. It has an internal bit of state, which we've called D1x1, so there comes a time when maintaining \IRQ low no longer matters. That's a small but variable number of cycles after it first went low.

I think we've discussed elsewhere some detail as to how many cycles \IRQ must be low for, to ensure that the interrupt will be taken. (That's a detail which isn't normally important, because normally the system design will keep \IRQ low until something in the service routine deals with the interrupting device.)

(Conversely, the ghost interrupt is baffling if your model is that the 6502 monitors the \IRQ pin directly during each T state.)

Hope this helps
Ed

RichTW · Post by **RichTW** » Fri Oct 14, 2011 7:19 am

That's a great article Garth - thanks a lot for that. It's good to see the effect confirmed elsewhere. Did you ever do any accurate measurements on these various latencies you mention, or are they documented in any 6522 or 6526 documentation (or are they pure conjecture)? There are certainly various undocumented 6522 effects (e.g. the 2 cycle cost of reloading a free-run timer from the latch) but it would be good to find any kind of definitive list of such things.

Ed - not sure of the significance of D1x1 in this context. It seems to me that this goes low after the interrupt sequence has already been initiated (and my guess would be that it exists to modify the behaviour of the BRK instruction in the IR, i.e. to inhibit the increment of the PC and setting the B flag on the stack). But could you just confirm that my understanding is correct in that the 6502 will only make the decision to enter the interrupt sequence at T0 (except in the branch special case)?

GARTHWILSON · Post by **GARTHWILSON** » Fri Oct 14, 2011 8:11 am

Quote:

That's a great article Garth - thanks a lot for that.

You're welcome. When I looked at it again, I did see where I would make minor changes if I could easily do it, but mostly to make it more readable. My very out-of-date cartoons probably add to the humor though. I drew them and Mike's sister scanned and colored them. That was before I had a scanner and gimp photo editing software.

Quote:

Did you ever do any accurate measurements on these various latencies you mention, or are they documented in any 6522 or 6526 documentation (or are they pure conjecture)? There are certainly various undocumented 6522 effects (e.g. the 2-cycle cost of reloading a free-run timer from the latch) but it would be good to find any kind of definitive list of such things.

I just went by the documentation. One thing I do have records of actually having tested for is that if NMI goes low and then high again inside one executing instruction, ie, it was high when the instruction started and high when it ended, being low only one clock inside there, the NMI sequence is started immediately after the instruction is finished executing. I just thought of another cartoon. You know how people push the "walk" button at a street corner a hundred times or more while waiting for the green light and the walk sign, as if it would forget after the first good press.

BigEd · Post by **BigEd** » Fri Oct 14, 2011 10:30 am

Hi Rich
This simulation might help to illustrate, and also this one. I think I'm right in holding onto D1x1 as the key here, although I did overstate it just now. It looks like the interrupt signal needs to be kept asserted until D1x1 goes low, which defines the point of commitment. This is a few cycles before the BRK substitution, and yes, normally this is a T0 cycle (in the visual6502 nomenclature).

There is some input conditioning between the pin and the D1x1 mechanism, which must add a little latency, but I don't think that changes the argument.
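
To make the point-of-commitment idea concrete, here is a small Python sketch of how a ghost interrupt could arise. It is only a model under stated assumptions: `classify_interrupt` and its parameters are hypothetical names, the cycle numbers are invented for the example, and the input conditioning delay is ignored.

```python
# Hypothetical model: the CPU commits if IRQ is low in the commit cycle,
# and the handler later reads the device to find out who interrupted.
def classify_interrupt(irq_low, commit_cycle, handler_read_cycle):
    """irq_low is the set of cycle numbers in which the IRQ line is low."""
    if commit_cycle not in irq_low:
        return "not taken"   # released before the point of commitment
    if handler_read_cycle in irq_low:
        return "normal"      # the device is still asserting when the handler looks
    return "ghost"           # taken, but the cause has already been cleared


COMMIT = 4         # assumed cycle in which D1x1 would go low
HANDLER_READ = 20  # assumed cycle in which the handler reads the device flags

print(classify_interrupt(set(range(2, 40)), COMMIT, HANDLER_READ))  # held low throughout
print(classify_interrupt(set(range(2, 6)), COMMIT, HANDLER_READ))   # released after commit
print(classify_interrupt(set(range(2, 4)), COMMIT, HANDLER_READ))   # released before commit
```

The three calls cover a line held low throughout, one released just after the commit cycle, and one released just before it.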

Cheers
Ed

RichTW · Post by **RichTW** » Mon Jul 31, 2017 8:40 am

Just wanted to bump this thread to add an observation (which is already implied by the discussion, but worth clarifying). In the case of a taken branch crossing a page boundary, the CPU checks for interrupts on two occasions: first during T2, and second during T0:

http://visual6502.org/JSSim/expert.html ... TG,480,629 (4 cycle branch, IRQ checked in T2)
http://visual6502.org/JSSim/expert.html ... TG,480,629 (4 cycle branch, IRQ checked in T0)

For other instructions, the check is only done in T0 (looking at the irq signal from the previous phase latched by node 480). In the case of a branch, the check is done in T2 (which also works for non-taken branches), and then additionally in T0 for taken branches crossing a page boundary. In the case of three cycle branches, no additional check is performed, meaning that they get this extra latency from the check being done a cycle before the 'normal' moment.

RichTW · Post by **RichTW** » Mon Jul 31, 2017 9:21 am

Incidentally, the first time this behaviour was documented, to my knowledge, was over in the StarDot forum back in 2006!

http://stardot.org.uk/forums/viewtopic.php?p=4833#p4833

Funny how a few people discovered it independently on their own platforms, but that it took so long to come to light in the first place!

BigEd · Post by **BigEd** » Mon Jul 31, 2017 3:01 pm

Remarkable. 2006 is much earlier than I joined over there, and indeed earlier than I showed up over here. And I feel I've been here for a while!

More remarkable is that this observation appeared when it did, so shortly before visual6502 went live on the web. I think that's a coincidence.

hoglet · Post by **hoglet** » Mon Sep 10, 2018 11:01 am

Hi all,

Over the weekend I did some measurements to experimentally determine the range of NMI/IRQ latency values for:
- an NMOS 6502 (a Synertek SY6502A)
- a CMOS 65C02 (a Rockwell R65C02P2)

Both processors are genuine from original machines.

I did the experiments on an Acorn Atom, as this has a 1MHz clock, with no use of clock stretching or wait states, and also no normal use of IRQ/NMI by the Acorn MOS ROM.

I used the Atom's 60Hz VSync output as an interrupt source, as this comes from a different oscillator so is truly asynchronous to the CPU. For testing NMI, Vsync was connected directly to the 6502's NMI input pin. For testing IRQ, VSync was connected to CA1 of a 6522, and the 6522's interrupt output connects to the 6502's IRQ input pin.

The foreground program that was running was:

Code:

        org $7F00
.loop   INC $8000,X
        LDX #0
        NOP
        BNE loop
        BEQ loop

The program should exhibit the worst-case interrupt latency, as it includes a 3-cycle taken branch followed by a 7-cycle RMW instruction. (The 6502 does have instructions that take 8 cycles, but these are undocumented.)

I'm triggering the scope on the 3-cycle write from the 6502's interrupt sequence, which is very stable. The NMI/IRQ is measured directly on the 6502. The scope is in persistence mode and results were accumulated over about a minute.

The interrupt latency is measured from the interrupt signal going low to the falling edge of Phi2 just prior to the sync that marks the start of the 6502's 7-cycle interrupt sequence.

Here's NMI with a NMOS 6502:

Here's IRQ with a NMOS 6502 (signal labelled NMI is actually IRQ):

Here's NMI with a CMOS 65C02:

Here's IRQ with a CMOS 65C02 (signal labelled NMI is actually IRQ):

In all cases the minimum interrupt latency is exactly one cycle and:
- for the NMOS 6502, the maximum interrupt latency is exactly 9 cycles.
- for the CMOS 65C02, the maximum interrupt latency is exactly 8 cycles.

I don't observe any difference in latency between IRQ and NMI (this is slightly surprising, I was expecting to see some evidence of a required setup time).

So it would appear that the "taken branch delays interrupt handling by one instruction" issue only affects the NMOS 6502, and its effect is to increase the maximum interrupt latency by just one cycle, i.e. from an interrupt perspective the 3-cycle branch on an NMOS CPU behaves like a two-cycle branch with one extra cycle tagged onto the front of the next instruction.

If I change the address of loop so that a page crossing occurs on the final branch:
- for the NMOS 6502, the maximum NMI latency is 8 cycles.
- for the CMOS 65C02, the maximum NMI latency is 8 cycles.

If I replace the BEQ loop with JMP loop:
- for the NMOS 6502, the maximum NMI latency is 8 cycles.
- for the CMOS 65C02, the maximum NMI latency is 8 cycles.

These are as expected I think.
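
For anyone who wants to check the arithmetic, here is a minimal Python sketch that reproduces these figures. The model is an assumption fitted to the measurements, not something read off the silicon: interrupts are taken to be sampled at the end of the second-to-last cycle of each instruction, except that an NMOS taken branch is sampled at the end of its first cycle (and, if it crosses a page, again at the end of its third). Latency is counted from the interrupt going low to the end of the instruction in which it is sampled. The function names and instruction labels are made up for the example, and the cycle counts are those of the test loop above.

```python
# Assumed polling model; see the hedges in the text above.
def poll_offsets(kind, cycles, nmos):
    """Cycle boundaries, counted from instruction start, where interrupts are sampled."""
    if nmos and kind == "taken_branch":
        return [1] if cycles == 3 else [1, 3]  # 3-cycle: early check only
    return [cycles - 1]                        # end of the second-to-last cycle


def latency_range(loop, nmos):
    """loop is a list of (kind, cycles). Returns (min, max) latency in cycles."""
    polls = []  # (sample time, end time of the instruction being sampled)
    start = 0
    for _ in range(2):  # unroll twice so the wrap-around gap is covered
        for kind, cycles in loop:
            for offset in poll_offsets(kind, cycles, nmos):
                polls.append((start + offset, start + cycles))
            start += cycles
    # Best case: the interrupt arrives just before a sample point.
    shortest = min(end - poll for poll, end in polls)
    # Worst case: it arrives just after one sample and waits for the next.
    longest = max(next_end - prev_poll
                  for (prev_poll, _), (_, next_end) in zip(polls, polls[1:]))
    return shortest, longest


BODY = [("rmw", 7), ("other", 2), ("other", 2), ("untaken_branch", 2)]
SHORT_BRANCH = BODY + [("taken_branch", 3)]  # BEQ loop, same page
PAGE_CROSS = BODY + [("taken_branch", 4)]    # BEQ loop, crossing a page
JUMP = BODY + [("other", 3)]                 # JMP loop

for name, loop in [("BEQ", SHORT_BRANCH), ("BEQ page cross", PAGE_CROSS), ("JMP", JUMP)]:
    print(name, "NMOS", latency_range(loop, nmos=True),
          "CMOS", latency_range(loop, nmos=False))
```

With these assumptions the maximum comes out as 9 cycles only for the NMOS same-page BEQ and 8 in every other case, with a minimum of 1 throughout, in line with the measurements.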

Dave

BigEd · Post by **BigEd** » Mon Sep 10, 2018 11:06 am

Great experiment and explanation!

barrym95838 · Post by **barrym95838** » Mon Sep 10, 2018 3:09 pm

So changing a non-page-crossing BEQ to a page-crossing BEQ has no worst-case latency effect on CMOS, and actually improves the worst-case latency on NMOS? I would personally call that unexpected.

hoglet · Post by **hoglet** » Mon Sep 10, 2018 5:16 pm

barrym95838 wrote:

So changing a non-page-crossing BEQ to a page-crossing BEQ has no worst-case latency effect on CMOS, and actually improves the worst-case latency on NMOS? I would personally call that unexpected.

My reading of this thread was that the interrupt latency "bug" being discussed was quite specific to the 3-cycle taken branch, so I think (at least on the NMOS processor) my results reinforce this.

There was also some speculation (on stardot) that this had been fixed on the CMOS processor, which does indeed seem to be the case.

What was more of a surprise to me was:
- there doesn't seem to be any difference in timings between IRQ and NMI
- the minimum interrupt latency is only one clock cycle (elsewhere I'd seen this stated as two clock cycles)
- there doesn't seem to be much setup time required for IRQ/NMI prior to the falling edge of Phi2 (i.e. the min and max latency values line up very closely with the falling edge of Phi2**)

Dave

** Actually, it turns out that on the Atom expansion bus Phi2 is buffered by two 74LS04s, so that's adding about 20ns delay.
