But it should allow the main program to reach an SEI at some point without extraordinary effort, albeit quite slowly.
Once you turn on an interrupt, you've wandered off the reservation anyway, and all the dragons and sea serpents that appear as a consequence will be there because you invited them in.
Furthermore, it is very common for the main program to disable/enable interrupts in order to access data structures shared between itself and the ISR(s). Under these circumstances, I don't think it's realistic to express concern about the additional latency introduced by allowing the instruction at the return address to execute before the ISR is re-entered. The latency introduced by the main program's enable/disable sequences is likely to be much longer than that of the longest-executing 65C02 instructions, e.g. 7 or 8 cycles for jmp (abs,X).
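To put some numbers on that, here is a minimal sketch of the enable/disable pattern: the main program pulling one byte from a queue that an ISR fills. The labels (BufHead, BufTail, Buffer) are invented for illustration, not taken from any real code.

```assembly
; Main program removes one byte from a queue filled by the ISR.
; SEI/CLI guard the shared pointers. Names are hypothetical.
        sei                 ; 2 cycles - disable IRQ
        ldx BufTail         ; 4 cycles - fetch shared tail pointer
        cpx BufHead         ; 4 cycles - queue empty?
        beq Done            ; 2/3 cycles
        lda Buffer,x        ; 4 cycles - read the byte
        inc BufTail         ; 6 cycles - update shared state
Done:   cli                 ; 2 cycles - re-enable IRQ
```

Even this short critical section holds interrupts off for roughly 20 cycles, several times the one extra instruction of latency being discussed.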
If the hardware and the software are well synchronized so that there's no need to enable/disable the interrupts, then interrupts are a convenience and not a necessity. The question then becomes how much time the main program spends waiting for the interrupt to occur.
Finally, the service-time jitter introduced by using SEI/CLI to guard critical regions will be on the order of 50, 100, or more clock cycles for any processor performing meaningful work in an interrupt-driven environment. The jitter has to be a small fraction of the interrupt period and of the interrupt service time in order for there to be any processing time left over for the main program to do meaningful work.
Given all of the factors that can affect the regularity of interrupt service routines, I don't consider using interrupts for timing or sampling of events that require precision finer than about 1000 clock cycles. For those types of events, I use hardware timers and independent hardware feeding into, or being fed from, FIFOs/queues. That's not to say that there's no place for interrupts in these types of tasks, but rather that I lean toward a blended solution between custom/semi-custom HW and SW.
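The blended approach above might look like the following sketch: a hardware timer paces the sampling and a FIFO absorbs the jitter, while the software just drains the FIFO when it gets around to it. The register addresses, bit assignment, and Process routine are all made up for illustration.

```assembly
; Poll a hardware sample FIFO instead of taking an interrupt per
; sample. FIFO_STAT and FIFO_DATA are hypothetical memory-mapped
; registers; bit 7 of FIFO_STAT = "data available".
FIFO_STAT = $D010           ; invented addresses
FIFO_DATA = $D011

Drain:  bit FIFO_STAT       ; N flag <- bit 7 (data available)
        bpl Empty           ; nothing queued - go do other work
        lda FIFO_DATA       ; pop one sample (timer-paced in HW)
        jsr Process         ; consume it (application-defined)
        bra Drain           ; keep draining while data remains
Empty:  rts
```

Because the timer, not the software, sets the sample instants, the main program's timing jitter never reaches the samples themselves.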
Thus, if I were to use interrupts in the manner Garth described for audio sampling, then I would be concerned about the additional latency that my design decision for the M65C02 core might introduce. However, the core is targeted at FPGAs, and as a consequence, the FPGA will contain not just the core but also any additional logic the application may require. If the core were intended as a general-purpose processor, then I would consider changing the mix of interruptible and non-interruptible instructions. I expect that only the interrupt-handling microsequence and the microsequences of the program flow control instructions would need adjustment, with no change required to any of the logic.