I had a look at this, and I think I see where the problem might lie - I'm not sure that any of the signals already in your circuit are quite right for what's needed. Here's a timing diagram as far as I can work it out, showing a couple of fast cycles and one slow cycle:
Attachment: paganini_clockstretch.png
(Sorry it's so large, the forum seems to scale up the previews when things are not very tall!)
The signal I'd suggest sending to IFCLK is the one from Q0 of the counter - it is consistent and fast enough for the logic analyser, and it falls in sync with PHI2 most of the time, except during stretching, which we'll need to fix.
As a first step, I would just wire Q0 to IFCLK without trying to make it skip any cycles, then look at the raw data stream to check that it at least looks like it's working, apart from capturing too much during stretching.
During stretching it is going to sample on all of the red and green marked edges, when we'd really like it to only sample on the green ones. So we need a signal that's high for the green edges, and low for the red edges - or vice versa. In my circuit I happened to have one handy, but in yours unfortunately CLK_STRETCH isn't it - it doesn't change back from low to high soon enough at the end of stretching.
The thing that's actually causing PHI2 to go low - stretching or not - is Q1 and Q2 both being high prior to the marked edges. In this case a falling edge on Q0 is going to "increment" Q1, causing it to carry into Q2, causing it in turn to go from high to low. So one option that might work could be an AND between Q1 and Q2, which will be high if the logic analyser should take a sample, and low otherwise.
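To make the Q1-AND-Q2 idea a bit more concrete, here is a rough Python sketch of the logic only. It is not a model of your actual circuit: the data is just a plain 3-bit binary count standing in for one stretched cycle, and the function name `q0_falling_edges` is made up for illustration. It lists every falling edge of Q0 and marks whether Q1 and Q2 were both high just before it.

```python
def q0_falling_edges(states):
    """Return (index, qualified) for each falling edge of Q0.

    states: list of (q0, q1, q2) tuples, one per counter clock.
    qualified is True when Q1 and Q2 were both high just before the edge,
    i.e. the edge is one where the analyser should take a sample.
    """
    edges = []
    for i in range(1, len(states)):
        q0_before, q1_before, q2_before = states[i - 1]
        q0_after = states[i][0]
        if q0_before == 1 and q0_after == 0:
            edges.append((i, bool(q1_before and q2_before)))
    return edges


# Assumption: a plain 3-bit binary count, two full passes, as stand-in data.
states = [(n & 1, (n >> 1) & 1, (n >> 2) & 1) for n in list(range(8)) * 2]
edges = q0_falling_edges(states)

for index, qualified in edges:
    print(index, "sample" if qualified else "skip")
```

With this stand-in data only the edge where the count wraps from 7 to 0 is marked "sample"; the other falling edges of Q0 are skipped, which is the behaviour we want from the qualifier during stretching.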
Another option that ought to work is to use the TC output of the counter - this will go low (or is it high? I forget) for the cycle just before PHI2 goes low, so it seems ideal.
Either of these might work, but they may need a slight delay or latching: while they're both valid leading up to the fall of Q0, they change immediately after it - especially TC, which will change at the same time as Q0 - and the logic analyser might want a bit of hold time there.
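One possible shape for the latching, sketched in Python purely at the logic level (no real propagation delays): capture Q1 AND Q2 on the rising edge of Q0, so the captured value stays put across the following falling edge. This assumes a plain binary count, where Q1 and Q2 only change when Q0 falls, and the latch itself is a hypothetical extra part, not something in either of our circuits. The name `latch_on_q0_rise` is made up for the example.

```python
def latch_on_q0_rise(counts):
    """Model a latch capturing Q1 AND Q2 at each rising edge of Q0.

    counts: successive counter values (Q0 is bit 0, Q1 bit 1, Q2 bit 2).
    Returns a list of (raw, latched) qualifier pairs, one per count.
    """
    latched = 0
    prev_q0 = counts[0] & 1
    trace = []
    for n in counts:
        q0, q1, q2 = n & 1, (n >> 1) & 1, (n >> 2) & 1
        raw = q1 & q2
        if prev_q0 == 0 and q0 == 1:  # rising edge of Q0: capture
            latched = raw
        trace.append((raw, latched))
        prev_q0 = q0
    return trace


# Assumption: a plain 3-bit binary count, two full passes, as stand-in data.
counts = list(range(8)) * 2
trace = latch_on_q0_rise(counts)

for n, (raw, latched) in zip(counts, trace):
    print(n, raw, latched)
```

In this model the raw qualifier drops as soon as the count wraps from 7 to 0, while the latched copy stays high until the next rising edge of Q0, which is the sort of hold margin the analyser might want. Whether that carries over to the real hardware depends on the actual delays, so treat it as an idea to try rather than a tested fix.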
As you said, with clock stretching (equally, RDY, WAI, etc) this is a bit fiddly to get to work, and it's interesting that our two circuits need different handling. I have made an adapter to make it easier and more reliable to connect the logic analyser to my system - and my system has other elements that make it rather awkward without the adapter - and having thought about your circuit, I think I can see a way to make the adapter work for both. I will have to think about that some more.
In the meantime, I'm hopeful the above ideas will work, and if they don't, I'll be interested to hear what goes wrong!