A taken branch delays interrupt handling by one instruction

nichtsnutz · Post by **nichtsnutz** » Sat Sep 04, 2010 7:04 am

Hello all,

the detailed description of this 6502 behaviour here is very interesting but it is not new.
See the fpga64 project at http://www.syntiac.com/fpga64.html
and search for "Fixed IRQ/NMI timing of the branch instructions".
Peter has written an absolutely cycle-exact 6510 for the Chameleon project. There are also some other quirks concerning an IRQ interrupting an NMI or so; I have not tried those out myself.

Greetings,
Vassilis

P.S.: I do not want to belittle Hias's hard work, just to point this out.

HiassofT · Post by **HiassofT** » Sat Sep 04, 2010 4:37 pm

BigDumbDinosaur wrote:

After reading your analysis and looking at the logic graphs, I have to wonder if this characteristic was carried over to the 65C02 and 65816. As you say, it appears to be harmless, but does have some perceptible effect on interrupt latency.

The 65816 doesn't seem to be affected (according to the tests of bob1200xl), but the question about the 65C02 still remains unanswered.

Quote:

Maybe I should build an extra POC unit and send it to you for testing (no logic analyzer here).

I had another idea: I have an Atari 800 (which uses the original 6502) at my parents' home, and just remembered that I bought a 65(C?)02 some 20 years ago for an SBC I wanted to create (but never finished). I'll be visiting my parents next weekend, so I can check if this really is a 2MHz 65C02 and, if so, take it all home, put the 65C02 into the Atari and run some tests.

This would save you the time of building a unit and shipping it over to Europe.

so long,

Hias

HiassofT · Post by **HiassofT** » Sat Sep 04, 2010 5:12 pm

Dr Jefyll wrote:

In fairness I will add that overall I find the MOS Hardware Manual and Programming Manual to be excellent.

I can second that, but I discovered its value really late. I had the PDFs on my PC for quite some time, only had a quick glance at them back then, and first had a closer look some weeks ago. I wish I had read these docs a lot earlier; it would have saved me quite some time :-)

Quote:

HiassofT wrote:

I ran across this issue when I did a worst-case cycle analysis of my highspeed SIO code [...] According to my calculations I still had one cycle left in the worst case in the vertical blank NMI code, but if I added a single cycle to my code [...] I ran into very rare transmission problems[.] I had a single byte missed every 10-60 minutes.

Well, that answers Garth's question re: what tipped you off! You must have needed some real perseverance to track the symptom back to its cause. Nice work!

Most of the work was analyzing the Pokey. Some specifics were unclear, imprecise and/or wrongly documented, so I started to do my own tests some 1.5 years ago. I discovered several important things and got my code to work. But some (timing) details still remained unclear. I tried to do a worst-case cycle analysis but failed miserably (calculating the maximum cycles of the interrupt handler was quite easy, but the rest is quite difficult because Antic halts the CPU quite often, and at different times, to fetch display data from RAM). The result of the analysis was that my code couldn't work at all, so I gave up on it.

This summer I picked up my work again, optimized my code further, and did some more analysis. I hadn't expected to open this big can of worms, though. This Monday I tracked it down to one last missing cycle, and on Tuesday I knew it was this "interrupt bug". So tracking down this specific issue wasn't too big a deal.

I still have to do the worst-case analysis of what's going on during screen access and also have an idea how I could simplify the calculations (with a little helper program).

Quote:

I don't have a pat answer, but an engineer's mind-set will lead us in the right direction. Here goes:
[...]
Sorry to be so wordy; the point I'm establishing has to do with timing. I believe an interrupt begins like any other instruction: on T0 -- a SYNC cycle. I don't have a specific answer about the deferred interrupt on the conditional branch, but I do maintain that the interrupt can't commence until the soon-to-be return address appears on the bus. This would be a plausible cause for a deferral, even if the mechanism is unknown. The odd behavior suggests there might be a late-in-the-design, band-aid solution involved.

Very interesting thoughts! I guess I'll have to think about it for a while.

so long,

Hias

HiassofT · Post by **HiassofT** » Sat Sep 04, 2010 5:32 pm

nichtsnutz wrote:

the detailed description of this 6502 behaviour here is very interesting but it is not new.
See the fpga64 project at http://www.syntiac.com/fpga64.html
and search for "Fixed IRQ/NMI timing of the branch instructions".
Peter has written an absolutely cycle-exact 6510 for the Chameleon project. There are also some other quirks concerning an IRQ interrupting an NMI or so; I have not tried those out myself.

Thanks for the link! I knew about the fpga64 project before, but not that they also ran into this "interrupt bug".

I did some searching with google and discovered that some NES guys also ran into this issue in June:
http://nesdev.parodius.com/bbs/viewtopic.php?t=6510

I think we should maybe add a section to the (excellent, BTW) "Investigating Interrupts" document once we have finished the 65C02 tests. Garth, what do you think about it?

Quote:

P.S.: I do not want to belittle Hias's hard work, just to point this out.

No problem, actually I couldn't really believe that a bug in such a widely used CPU hadn't been noticed before for all this time.

so long,

Hias

GARTHWILSON · Post by **GARTHWILSON** » Sat Sep 04, 2010 6:07 pm

Quote:

I think we should maybe add a section to the (excellent, BTW) "Investigating Interrupts" document once we have finished the 65C02 tests. Garth, what do you think about it?

I was thinking it would be appropriate, but didn't want to put much time into it. At least one of my cartoons there is terribly outdated now (maybe that just adds to the humor?), and I never did finish getting information on some processors' interrupt-sequence lengths for the end. Maybe I could just add a link to this discussion, as the topic goes beyond the basics that the article was intended to address to help beginners get going with interrupts.

BigEd · Post by **BigEd** » Sat Sep 04, 2010 7:34 pm

Hias
thanks for posting such a thorough investigation of this interesting observation. And welcome!

Dr Jefyll wrote:

I believe that the 65xx economizes on silicon by NOT providing specialized hardware to perform the interrupt sequence. Instead, the interrupt is actually an instruction.... Like other instructions, it has a unique op-code that's latched into the chip's Instruction Register when SYNC is high. On subsequent cycles the internal PLA pulls whatever strings and levers are required to get the job done. This approach avoids the cost of adding a complex sequencing mechanism solely to implement interrupts.

To add to your ideas, Jeff, I had another look at Beregnyei's giant schematic

Here's the logic which clears the IR - as you suspect, it uses SYNC as a qualifier:

(picture only shows one half of the IR but the other half is identical)

The signal D1x1 is doing all the work. It's also used to control the PC incrementer (although the circuit for that doesn't look quite right to me)

D1x1 is the output of this little collection:

and you see PLA output 83 is one of the inputs, also Clock2 which is one of the six Timing Control signals - the other important ones being IRQ with the I flag, and a composite signal labelling IRQ, NMI, Reset. (Also Rdy0 which is a combination of RnW and RDY and turns up everywhere)

Lo and behold, PLA output 83 detects the 8 branch instructions. So they are treated specially, and deliberately so. Although I still don't quite see why.

(That composite signal is certainly a function of NMI, and I see BRK detection nearby, but I don't see IRQ or RESET, so something isn't quite making sense there.)

Cheers
Ed

Dr Jefyll · Post by **Dr Jefyll** » Sun Sep 05, 2010 4:16 am

BigEd wrote:

To add to your ideas, Jeff, I had another look at Beregnyei's giant schematic
Here's the logic which clears the IR - as you suspect, it uses SYNC as a qualifier

Hi, Ed -- Glad to have your contribution. I had a feeling you might be following this! (Ed and I had a PM discussion about ABORT, which of course is another type of interrupt.)

BigEd wrote:

D1x1 is the output of this little collection:

I think "this little collection" is the Interrupt-Acceptance-Flip-Flop I suggested. Although the designers' dynamic logic techniques (using gate-capacitances to store bits) make me reluctant to unravel the flip-flop itself, I'm sure D1x1 is the "IAFF" output signal. Referring to the complete diagram and following D1x1 around I quickly found where it gets gated onto bit 4 of the data bus. This is exactly as anticipated, since, when the interrupt pushes the processor flags to the stack, it is the BRK bit, bit 4 in the flags byte, which indicates -- by being 0, actually-- that the BRK is of the hardware ("forced") variety.

Quote:

The signal D1x1 [is] used to control the PC incrementer

Again this agrees with expectations. Hardware BRK is a zero-byte instruction but software BRK is 2-byte. I expect D1x1 allows PC to be incremented (or not) in order to derive the appropriate return address.
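To make the two differences Jeff describes concrete, here is a minimal Python sketch of what ends up on the stack. It models documented NMOS 6502 behaviour only (bit 4 of the pushed flags byte is 0 for a hardware interrupt and 1 for a software BRK, and bit 5 is pushed as 1); it does not model D1x1 or any internal signal. The function names and the `forced` parameter are made up for illustration.

```python
B_FLAG = 0x10   # bit 4 of the pushed flags byte
UNUSED = 0x20   # bit 5, pushed as 1 on the NMOS 6502

def pushed_status(p, forced):
    """Flags byte as written to the stack.

    forced=True  -> hardware interrupt (IRQ/NMI): bit 4 pushed as 0
    forced=False -> software BRK:                 bit 4 pushed as 1
    """
    p = (p | UNUSED) & 0xFF
    return (p & ~B_FLAG) & 0xFF if forced else p | B_FLAG

def return_address(pc, forced):
    """Return address pushed when the sequence starts at opcode address pc.

    A forced BRK consumes no bytes, so the opcode at pc is still to be
    executed after RTI. A software BRK behaves as a 2-byte instruction.
    """
    return pc if forced else (pc + 2) & 0xFFFF

print(hex(pushed_status(0x00, forced=True)), hex(return_address(0x2304, forced=True)))
print(hex(pushed_status(0x00, forced=False)), hex(return_address(0x2304, forced=False)))
```

The address 0x2304 is taken from the NMI trace later in the thread, where the high byte 23 is the first thing written to the stack.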

I think the "interrupt as an instruction" theory is amply established. (BTW my insight owes a lot to page A-11, Table A. 5.4 of the MOS Hardware Manual, where they pretty much spill the beans if you're paying attention.) Understanding the interrupt as an instruction shows us its limitations -- why, for example, the Conditional Branch logic might need to take heed. More on that topic some other time, maybe. I don't expect that the '02, the 'C02 and the '816 deal with the matter in exactly the same fashion. And the '816's ABORT introduces yet another flavor! (along with NMI, IRQ, RST and software BRK). Finally:

Quote:

Lo and behold, PLA output 83 detects the 8 branch instructions. So they are treated specially, and deliberately so.

Yes, and 76 is another that activates for the 8 branch instructions. As for timing, my take is that 83 is high for all branch-instruction states except T3, and 76 for all except T1 -- do I have that right? Not sure. Also pertinent are PLA outputs 105, 80 & 25 (BRK), 92 & 99 (BRK,RTI) and 40 (BRK,JSR abs) -- in case anyone has the patience to track them all down!

-- Jeff

BigEd · Post by **BigEd** » Sun Sep 05, 2010 9:48 am

Dr Jefyll wrote:

BigEd wrote:

D1x1 is the output of this little collection:

I think "this little collection" is the Interrupt-Acceptance-Flip-Flop I suggested.

Agreed - I hadn't quite spotted that this is a flop, I was only seeing significant input signals. It's a bit of a strain to make sense of this stuff.

Dr Jefyll wrote:

Quote:

Lo and behold, PLA output 83 detects the 8 branch instructions. So they are treated specially, and deliberately so.

Yes, and 76 is another that activates for the 8 branch instructions. As for timing, my take is that 83 is high for all branch-instruction states except T3, and 76 for all except T1 -- do I have that right?

I'm not sure - 83 and 76 are sensitive to 2 signals from Timing Control in that way but I'm not certain of the T states.

Timing Control looks like it has 5 bits of state(#). There's Clock1 as a signal related to Sync(*), and Clock2 is a version of it delayed by one cycle, subject to RDY.

And then there's a 4-bit shift register starting at SYNC and subject to RDY. The two Clock signals and the 4 shift register signals go into the PLA. The block diagram would have these signals as T0, T1X, T2, T3, T4, T5. (It has T1 as Sync but T1X to the PLA.) I think there's a subtlety here.

(#) Hmm, but the hardware ref manual mentions 7 states, T0 to T6, for instructions such as ABS,X and BRK.

(*) actually Sync0 which is perhaps an early version of Sync? Is this perhaps valid in the cycle before Sync??

BigEd · Post by **BigEd** » Tue Sep 07, 2010 9:43 pm

Now, if only we had something more accurate and interactive than the giant schematic (wonderfully useful and interesting though it is)

BigEd · Post by **BigEd** » Fri Sep 10, 2010 6:24 pm

Eureka! With a transistor-level simulation of the 6502 in hand, we can see exactly what's going on cycle by cycle. (Or even phase by phase.) Huge thanks to Greg James, Barry Silverman and Brian Silverman for their Visual6502 project (website under construction - see other thread) which provided the simulator and the netlist.

Here's a late interrupt into a taken branch, should be timed the same as the example above:

Code: Select all

testNMI(28)
 cyc:10 cp1:1 cp2:0 Add:2304 D:b0 RnW:1  PC:2304 SP:fd  Sync:1 NMI:1  IR:b0 TCstate:111111  fetch:0 clearIR:1 D1x1:1
 cyc:10 cp1:0 cp2:1 Add:2304 D:b0 RnW:1  PC:2304 SP:fd  Sync:1 NMI:1  IR:b0 TCstate:111111  fetch:1 clearIR:0 D1x1:1
 cyc:11 cp1:1 cp2:0 Add:2305 D:fe RnW:1  PC:2305 SP:fd  Sync:0 NMI:1  IR:b0 TCstate:110111  fetch:1 clearIR:0 D1x1:1
 cyc:11 cp1:0 cp2:1 Add:2305 D:fe RnW:1  PC:2305 SP:fd  Sync:0 NMI:1  IR:b0 TCstate:110111  fetch:0 clearIR:1 D1x1:1
 cyc:12 cp1:1 cp2:0 Add:2306 D:00 RnW:1  PC:2306 SP:fd  Sync:0 NMI:1  IR:b0 TCstate:111011  fetch:0 clearIR:1 D1x1:1
 cyc:12 cp1:0 cp2:1 Add:2306 D:00 RnW:1  PC:2306 SP:fd  Sync:0 NMI:1  IR:b0 TCstate:111011  fetch:0 clearIR:1 D1x1:1
 cyc:13 cp1:1 cp2:0 Add:2304 D:b0 RnW:1  PC:2304 SP:fd  Sync:1 NMI:1  IR:b0 TCstate:111111  fetch:0 clearIR:1 D1x1:1
 cyc:13 cp1:0 cp2:1 Add:2304 D:b0 RnW:1  PC:2304 SP:fd  Sync:1 NMI:1  IR:b0 TCstate:111111  fetch:1 clearIR:0 D1x1:1
 cyc:14 cp1:1 cp2:0 Add:2305 D:fe RnW:1  PC:2305 SP:fd  Sync:0 NMI:1  IR:b0 TCstate:110111  fetch:1 clearIR:0 D1x1:1
 cyc:14 cp1:1 cp2:0 Add:2305 D:fe RnW:1  PC:2305 SP:fd  Sync:0 NMI:0  IR:b0 TCstate:110111  fetch:1 clearIR:0 D1x1:1
 cyc:14 cp1:0 cp2:1 Add:2305 D:fe RnW:1  PC:2305 SP:fd  Sync:0 NMI:0  IR:b0 TCstate:110111  fetch:0 clearIR:1 D1x1:1
 cyc:15 cp1:1 cp2:0 Add:2306 D:00 RnW:1  PC:2306 SP:fd  Sync:0 NMI:0  IR:b0 TCstate:111011  fetch:0 clearIR:1 D1x1:1
 cyc:15 cp1:0 cp2:1 Add:2306 D:00 RnW:1  PC:2306 SP:fd  Sync:0 NMI:0  IR:b0 TCstate:111011  fetch:0 clearIR:1 D1x1:1
 cyc:16 cp1:1 cp2:0 Add:2304 D:b0 RnW:1  PC:2304 SP:fd  Sync:1 NMI:0  IR:b0 TCstate:111111  fetch:0 clearIR:1 D1x1:1
 cyc:16 cp1:0 cp2:1 Add:2304 D:b0 RnW:1  PC:2304 SP:fd  Sync:1 NMI:0  IR:b0 TCstate:111111  fetch:1 clearIR:0 D1x1:1
 cyc:17 cp1:1 cp2:0 Add:2305 D:fe RnW:1  PC:2305 SP:fd  Sync:0 NMI:0  IR:b0 TCstate:110111  fetch:1 clearIR:0 D1x1:1
 cyc:17 cp1:0 cp2:1 Add:2305 D:fe RnW:1  PC:2305 SP:fd  Sync:0 NMI:0  IR:b0 TCstate:110111  fetch:0 clearIR:1 D1x1:0
 cyc:18 cp1:1 cp2:0 Add:2306 D:00 RnW:1  PC:2306 SP:fd  Sync:0 NMI:0  IR:b0 TCstate:111011  fetch:0 clearIR:1 D1x1:0
 cyc:18 cp1:1 cp2:0 Add:2306 D:00 RnW:1  PC:2306 SP:fd  Sync:0 NMI:1  IR:b0 TCstate:111011  fetch:0 clearIR:1 D1x1:0
 cyc:18 cp1:0 cp2:1 Add:2306 D:00 RnW:1  PC:2306 SP:fd  Sync:0 NMI:1  IR:b0 TCstate:111011  fetch:0 clearIR:1 D1x1:0
 cyc:19 cp1:1 cp2:0 Add:2304 D:b0 RnW:1  PC:2304 SP:fd  Sync:1 NMI:1  IR:b0 TCstate:111111  fetch:0 clearIR:1 D1x1:0
 cyc:19 cp1:0 cp2:1 Add:2304 D:b0 RnW:1  PC:2304 SP:fd  Sync:1 NMI:1  IR:b0 TCstate:111111  fetch:1 clearIR:1 D1x1:0
 cyc:20 cp1:1 cp2:0 Add:2304 D:b0 RnW:1  PC:2304 SP:fd  Sync:0 NMI:1  IR:00 TCstate:110111  fetch:1 clearIR:1 D1x1:0
 cyc:20 cp1:0 cp2:1 Add:2304 D:b0 RnW:1  PC:2304 SP:fd  Sync:0 NMI:1  IR:00 TCstate:110111  fetch:0 clearIR:1 D1x1:0
 cyc:21 cp1:1 cp2:0 Add:01fd D:b0 RnW:0  PC:2304 SP:fd  Sync:0 NMI:1  IR:00 TCstate:111011  fetch:0 clearIR:1 D1x1:0
 cyc:21 cp1:0 cp2:1 Add:01fd D:23 RnW:0  PC:2304 SP:fd  Sync:0 NMI:1  IR:00 TCstate:111011  fetch:0 clearIR:1 D1x1:0

Here's a trace of the simulator's test program:

Code: Select all

 cyc:2 cp1:1 cp2:0 Add:0000 D:a9 RnW:1  PC:0000 SP:fd  Sync:1 NMI:1  IR:00 TCstate:011111  fetch:0 clearIR:1 D1x1:1
 cyc:2 cp1:0 cp2:1 Add:0000 D:a9 RnW:1  PC:0000 SP:fd  Sync:1 NMI:1  IR:00 TCstate:011111  fetch:1 clearIR:0 D1x1:1
 cyc:3 cp1:1 cp2:0 Add:0001 D:00 RnW:1  PC:0001 SP:fd  Sync:0 NMI:1  IR:a9 TCstate:100111  fetch:1 clearIR:0 D1x1:1
 cyc:3 cp1:0 cp2:1 Add:0001 D:00 RnW:1  PC:0001 SP:fd  Sync:0 NMI:1  IR:a9 TCstate:100111  fetch:0 clearIR:1 D1x1:1
 cyc:4 cp1:1 cp2:0 Add:0002 D:20 RnW:1  PC:0002 SP:fd  Sync:1 NMI:1  IR:a9 TCstate:011111  fetch:0 clearIR:1 D1x1:1
 cyc:4 cp1:0 cp2:1 Add:0002 D:20 RnW:1  PC:0002 SP:fd  Sync:1 NMI:1  IR:a9 TCstate:011111  fetch:1 clearIR:0 D1x1:1
 cyc:5 cp1:1 cp2:0 Add:0003 D:10 RnW:1  PC:0003 SP:fd  Sync:0 NMI:1  IR:20 TCstate:110111  fetch:1 clearIR:0 D1x1:1
 cyc:5 cp1:0 cp2:1 Add:0003 D:10 RnW:1  PC:0003 SP:fd  Sync:0 NMI:1  IR:20 TCstate:110111  fetch:0 clearIR:1 D1x1:1
 cyc:6 cp1:1 cp2:0 Add:01fd D:00 RnW:1  PC:0004 SP:10  Sync:0 NMI:1  IR:20 TCstate:111011  fetch:0 clearIR:1 D1x1:1
 cyc:6 cp1:0 cp2:1 Add:01fd D:00 RnW:1  PC:0004 SP:10  Sync:0 NMI:1  IR:20 TCstate:111011  fetch:0 clearIR:1 D1x1:1
 cyc:7 cp1:1 cp2:0 Add:01fd D:00 RnW:0  PC:0004 SP:10  Sync:0 NMI:1  IR:20 TCstate:111101  fetch:0 clearIR:1 D1x1:1
 cyc:7 cp1:0 cp2:1 Add:01fd D:00 RnW:0  PC:0004 SP:10  Sync:0 NMI:1  IR:20 TCstate:111101  fetch:0 clearIR:1 D1x1:1
 cyc:8 cp1:1 cp2:0 Add:01fc D:00 RnW:0  PC:0004 SP:10  Sync:0 NMI:1  IR:20 TCstate:111110  fetch:0 clearIR:1 D1x1:1
 cyc:8 cp1:0 cp2:1 Add:01fc D:04 RnW:0  PC:0004 SP:10  Sync:0 NMI:1  IR:20 TCstate:111110  fetch:0 clearIR:1 D1x1:1
 cyc:9 cp1:1 cp2:0 Add:0004 D:00 RnW:1  PC:0004 SP:10  Sync:0 NMI:1  IR:20 TCstate:101111  fetch:0 clearIR:1 D1x1:1
 cyc:9 cp1:0 cp2:1 Add:0004 D:00 RnW:1  PC:0004 SP:10  Sync:0 NMI:1  IR:20 TCstate:101111  fetch:0 clearIR:1 D1x1:1

Now we know exactly what happens, we just need to figure out why! And we can trace any signal on the chip, so long as we can find it. (We can also add and remove transistors if that seems appealing. There are only 3510 transistors there - all the pullups are modelled separately.)

My current thinking is that the succession of Tstates for a normal instruction stream visits 101111 in the last cycle of an instruction, and this is when the interrupt handler kicks off. Taken branches never visit 101111 - they sort of interrupt themselves - and so they wouldn't be interruptible if it weren't for the special handling.

We can see that an interrupt acts differently by watching the Tstates as an interrupt is taken - we don't visit 011111

Code: Select all

  cyc:7 cp1:1 cp2:0 Add:2306 D:b0 RnW:1  PC:2306 SP:fd  Sync:1 NMI:1  IR:4c TCstate:011111  fetch:0 clearIR:1 D1x1:1
  cyc:7 cp1:0 cp2:1 Add:2306 D:b0 RnW:1  PC:2306 SP:fd  Sync:1 NMI:1  IR:4c TCstate:011111  fetch:1 clearIR:0 D1x1:1
  cyc:8 cp1:1 cp2:0 Add:2307 D:01 RnW:1  PC:2307 SP:fd  Sync:0 NMI:1  IR:b0 TCstate:110111  fetch:1 clearIR:0 D1x1:1
  cyc:8 cp1:0 cp2:1 Add:2307 D:01 RnW:1  PC:2307 SP:fd  Sync:0 NMI:1  IR:b0 TCstate:110111  fetch:0 clearIR:1 D1x1:1
  cyc:9 cp1:1 cp2:0 Add:2308 D:00 RnW:1  PC:2308 SP:fd  Sync:0 NMI:1  IR:b0 TCstate:111011  fetch:0 clearIR:1 D1x1:1
  cyc:9 cp1:0 cp2:1 Add:2308 D:00 RnW:1  PC:2308 SP:fd  Sync:0 NMI:1  IR:b0 TCstate:111011  fetch:0 clearIR:1 D1x1:1
 cyc:10 cp1:1 cp2:0 Add:2309 D:a9 RnW:1  PC:2309 SP:fd  Sync:1 NMI:1  IR:b0 TCstate:111111  fetch:0 clearIR:1 D1x1:1
 cyc:10 cp1:0 cp2:1 Add:2309 D:a9 RnW:1  PC:2309 SP:fd  Sync:1 NMI:1  IR:b0 TCstate:111111  fetch:1 clearIR:0 D1x1:1
 cyc:11 cp1:1 cp2:0 Add:230a D:de RnW:1  PC:230a SP:fd  Sync:0 NMI:1  IR:a9 TCstate:100111  fetch:1 clearIR:0 D1x1:1
 cyc:11 cp1:0 cp2:1 Add:230a D:de RnW:1  PC:230a SP:fd  Sync:0 NMI:1  IR:a9 TCstate:100111  fetch:0 clearIR:1 D1x1:1

followed here by another which crosses a page boundary:

Code: Select all

 cyc:12 cp1:1 cp2:0 Add:230b D:b0 RnW:1  PC:230b SP:fd  Sync:1 NMI:1  IR:a9 TCstate:011111  fetch:0 clearIR:1 D1x1:1
 cyc:12 cp1:0 cp2:1 Add:230b D:b0 RnW:1  PC:230b SP:fd  Sync:1 NMI:1  IR:a9 TCstate:011111  fetch:1 clearIR:0 D1x1:1
 cyc:13 cp1:1 cp2:0 Add:230c D:f2 RnW:1  PC:230c SP:fd  Sync:0 NMI:1  IR:b0 TCstate:110111  fetch:1 clearIR:0 D1x1:1
 cyc:13 cp1:0 cp2:1 Add:230c D:f2 RnW:1  PC:230c SP:fd  Sync:0 NMI:1  IR:b0 TCstate:110111  fetch:0 clearIR:1 D1x1:1
 cyc:14 cp1:1 cp2:0 Add:230d D:00 RnW:1  PC:230d SP:fd  Sync:0 NMI:1  IR:b0 TCstate:111011  fetch:0 clearIR:1 D1x1:1
 cyc:14 cp1:0 cp2:1 Add:230d D:00 RnW:1  PC:230d SP:fd  Sync:0 NMI:1  IR:b0 TCstate:111011  fetch:0 clearIR:1 D1x1:1
 cyc:15 cp1:1 cp2:0 Add:23ff D:00 RnW:1  PC:23ff SP:fd  Sync:0 NMI:1  IR:b0 TCstate:101111  fetch:0 clearIR:1 D1x1:1
 cyc:15 cp1:0 cp2:1 Add:23ff D:00 RnW:1  PC:23ff SP:fd  Sync:0 NMI:1  IR:b0 TCstate:101111  fetch:0 clearIR:1 D1x1:1
 cyc:16 cp1:1 cp2:0 Add:22ff D:38 RnW:1  PC:22ff SP:fd  Sync:1 NMI:1  IR:b0 TCstate:011111  fetch:0 clearIR:1 D1x1:1
 cyc:16 cp1:0 cp2:1 Add:22ff D:38 RnW:1  PC:22ff SP:fd  Sync:1 NMI:1  IR:b0 TCstate:011111  fetch:1 clearIR:0 D1x1:1

in which case the branch does visit the final state 101111 and the next instruction does start with a 011111 as usual.

A branch may be a case a bit like an indexed write where the final instruction state must be reserved in case a page is crossed. In the case of a branch, the next instruction isn't known to have started until it's known that there's no page crossing. The following instruction after a taken branch doesn't have a normal first cycle. And so a taken branch doesn't have a 'final' state, and so it wouldn't be interruptible. Instead it begins the interrupt handler during T2, which means the NMI needs to enter the chip a bit earlier than the usual case in order to catch the window. We see that the other way around: the interrupt appears to have been deferred.
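The comparison between the two branch traces can be written down as a small check. This is only a sketch: the TCstate lists are read off the traces in this post, one value per cycle, and the helper name `has_final_state` is hypothetical. It tests for the exact pattern 101111 and says nothing about why the chip behaves this way.

```python
# TCstate per cycle, copied from the traces above (one entry per cycle).
# Taken branch, same page: cyc 7-9; the next opcode fetch (cyc 10) shows 111111.
TAKEN_SAME_PAGE = ["011111", "110111", "111011"]
NEXT_AFTER_SAME_PAGE = "111111"

# Taken branch, page crossed: cyc 12-15; the next opcode fetch (cyc 16) shows 011111.
TAKEN_PAGE_CROSS = ["011111", "110111", "111011", "101111"]
NEXT_AFTER_PAGE_CROSS = "011111"

FINAL_STATE = "101111"

def has_final_state(states):
    """True if the instruction passes through the 101111 state."""
    return FINAL_STATE in states

for name, states, nxt in [
    ("same page", TAKEN_SAME_PAGE, NEXT_AFTER_SAME_PAGE),
    ("page cross", TAKEN_PAGE_CROSS, NEXT_AFTER_PAGE_CROSS),
]:
    print(name, "visits 101111:", has_final_state(states), "next fetch:", nxt)
```

Only the page-crossing branch reports a visit to 101111, matching the observation that the same-page taken branch has no 'final' state.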

Hope this makes some degree of sense!

Cheers
Ed

HiassofT · Post by **HiassofT** » Mon Sep 13, 2010 8:54 pm

HiassofT wrote:

BigDumbDinosaur wrote:

After reading your analysis and looking at the logic graphs, I have to wonder if this characteristic was carried over to the 65C02 and 65816. As you say, it appears to be harmless, but does have some perceptible effect on interrupt latency.

I had another idea: I have an Atari 800 (which uses the original 6502) at my parents' home, and just remembered that I bought a 65(C?)02 some 20 years ago for an SBC I wanted to create (but never finished). I'll be visiting my parents next weekend, so I can check if this really is a 2MHz 65C02 and, if so, take it all home, put the 65C02 into the Atari and run some tests.

These tests have to wait a little bit. I opened my Atari 800 and realized it was one of the (later) models which had a CPU board with Atari's "6502C Sally" CPU - the same as in my 800XL.

So I need to build an adapter board for my 800XL first...

But the good news is: a friend is sending me some other CPUs to test with, a Rockwell 65C02, a CMD 65SC02 and a 65816. My 2 MHz 65C02 seems to have gone missing over the years, but I pulled a Rockwell 65C02 4 MHz from another board I had completely forgotten about :-)

so long,

Hias

HiassofT · Post by **HiassofT** » Mon Sep 13, 2010 9:01 pm

Hi Ed!

BigEd wrote:

Eureka! With a transistor-level simulation of the 6502 in hand, we can see exactly what's going on cycle by cycle. (Or even phase by phase.) Huge thanks to Greg James, Barry Silverman and Brian Silverman for their Visual6502 project (website under construction - see other thread) which provided the simulator and the netlist.

Wow, this is really awesome!

Did you get a pre-release of their software? I checked visual6502.org but couldn't find anything except for the static pic yet.

Quote:

And so a taken branch doesn't have a 'final' state, and so it wouldn't be interruptible. Instead it begins the interrupt handler during T2, which means the NMI needs to enter the chip a bit earlier than the usual case in order to catch the window. We see that the other way around: the interrupt appears to have been deferred.

Hope this makes some degree of sense!

Thanks a lot for the tests and for your analysis! And, at least to me, this makes some sense :-)

so long,

Hias

BigEd · Post by **BigEd** » Mon Sep 13, 2010 9:38 pm

HiassofT wrote:

Did you get a pre-release of their software? I checked visual6502.org but couldn't find anything except for the static pic yet.

Yes, but I think it'll be released soon. (I haven't seen the python version yet.)

Quote:

Thanks a lot for the tests and for your analysis! And, at least to me, this makes some sense :-)

Great!

Dr Jefyll · Post by **Dr Jefyll** » Tue Sep 14, 2010 2:46 pm

BigEd wrote:

Eureka! With a transistor-level simulation of the 6502 in hand, we can see exactly what's going on cycle by cycle. (Or even phase by phase.)

Code: Select all

cyc:10 cp1:1 cp2:0 Add:2304 D:b0 RnW:1  PC:2304 SP:fd  Sync:1 NMI:1  IR:b0 TCstate:111111  fetch:0 clearIR:1 D1x1:1

Wow. This is exciting material, but a little tough to comb through. The first thing I did was to reformat the simulation output text (by copy-and-pasting into a text editor). It's much more readable when partially double-spaced; also the original Forum view may (depending on screen resolution) cause the browser to wrap each line, which is very unhelpful. But I'm still not at a point where I can share your speculations, Ed. I'm not disagreeing; I'm just in the dark still! A few questions, if you don't mind:

How does one interpret the TCstate columns? Do these six-digit binary values correspond to the six timing lines feeding the PLA? Which is which?

What is the signal fetch?

Do clearIR and D1x1 correspond to the signals we were discussing from the schematic (reverse-engineered in previous research)? Apparently you have cross-referenced the old work with the new -- presumably by geographically locating specific circuit features in both contexts!

Unfortunately I haven't had time to fully scrutinize what you posted. But I noticed there's a lot of activity on clearIR, yet the Instruction Register rarely clears. Puzzling -- to me, at least!

-- Jeff

BigEd · Post by **BigEd** » Tue Sep 14, 2010 4:52 pm

Hi Jeff
yes, the forum's line-wrapping is a bit of a hindrance.

yes, the three signals at the end are all from Balazs' schematic. I think clearIR has no effect unless fetch is active.
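One way to check the "clearIR only matters while fetch is active" reading against the trace text is to compare a normal opcode fetch with the interrupt entry. The sketch below parses two pairs of consecutive half-cycle lines copied from the traces earlier in the thread; `parse_line` and `ir_transition` are hypothetical helper names, and the result is only an observation about these four lines, not a description of the circuit.

```python
import re

# Pairs of consecutive half-cycle lines copied from the traces in this thread.
NORMAL_FETCH = (
    "cyc:7 cp1:0 cp2:1 Add:2306 D:b0 RnW:1  PC:2306 SP:fd  Sync:1 NMI:1  IR:4c TCstate:011111  fetch:1 clearIR:0 D1x1:1",
    "cyc:8 cp1:1 cp2:0 Add:2307 D:01 RnW:1  PC:2307 SP:fd  Sync:0 NMI:1  IR:b0 TCstate:110111  fetch:1 clearIR:0 D1x1:1",
)
INTERRUPT_ENTRY = (
    "cyc:19 cp1:0 cp2:1 Add:2304 D:b0 RnW:1  PC:2304 SP:fd  Sync:1 NMI:1  IR:b0 TCstate:111111  fetch:1 clearIR:1 D1x1:0",
    "cyc:20 cp1:1 cp2:0 Add:2304 D:b0 RnW:1  PC:2304 SP:fd  Sync:0 NMI:1  IR:00 TCstate:110111  fetch:1 clearIR:1 D1x1:0",
)

def parse_line(line):
    """Turn one trace line into a dict of field name -> string value."""
    return dict(re.findall(r"(\w+):(\w+)", line))

def ir_transition(pair):
    """Summarise the fetch/clearIR state before the IR updates, and the result."""
    before, after = (parse_line(line) for line in pair)
    return {
        "fetch": before["fetch"],
        "clearIR": before["clearIR"],
        "bus": before["D"],       # data bus value during the fetch
        "new_IR": after["IR"],    # IR contents in the following half-cycle
    }

print(ir_transition(NORMAL_FETCH))
print(ir_transition(INTERRUPT_ENTRY))
```

In both pairs the data bus carries b0 during the fetch. With clearIR at 0 the IR picks up b0; with clearIR at 1 it ends up as 00, the BRK opcode.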

The 6 TCState lines are indeed the PLA 'timing' inputs - the last four are conventionally labelled T2-T5 I think, and the first two are sync-related. The first would conventionally be called T0 but doesn't precisely correspond to Sync because of the taken-branch effect. Even if it did correspond to Sync, I think T0 is a misnomer because the active instruction in the IR is still the previous one. The second signal is normally active in the cycle before Sync - not sure what that should be called!

Notice in the traces that these signals aren't quite one-cold: sometimes we visit 111111 and sometimes 100111.
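For anyone combing through the traces, a small decoder for the TCstate column may help. This is a sketch under two assumptions: a 0 means the line is asserted (which is what "one-cold" implies), and the labels follow the description above, with "pre-sync" being a placeholder name invented here for the second, unnamed line.

```python
# Labels for the six TCstate digits, left to right. "T0" and "T2".."T5" follow
# the conventional names mentioned above; "pre-sync" is a made-up placeholder.
LINE_NAMES = ("T0", "pre-sync", "T2", "T3", "T4", "T5")

def active_lines(tcstate):
    """Names of the lines that are low (assumed asserted) in a TCstate string."""
    return [name for name, bit in zip(LINE_NAMES, tcstate) if bit == "0"]

def is_one_cold(tcstate):
    """True if exactly one of the six lines is low."""
    return tcstate.count("0") == 1

# Every distinct TCstate value that appears in the traces in this thread.
for state in ["011111", "101111", "110111", "111011", "111101", "111110",
              "100111", "111111"]:
    print(state, active_lines(state), "one-cold" if is_one_cold(state) else "NOT one-cold")
```

The last two values are the exceptions noted above: 100111 has two lines low, and 111111 has none.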

I completely agree that this is going to take a bit of digesting!

It would be good to run a set of LDA and STA in all addressing modes, and with page crossings where applicable. But even then, interactions with the previous instruction could be interesting.

(I don't think I've yet traced an indirect or indexed addressing mode - there's no disassembly so I'm not sure.)

It would be interesting to see if this specific chip - a revision D - has the famous ROR bug.

Cheers
Ed
