Are you referring to Read Modify Write instructions like INC & DEC for example?
No, I'm talking about JSR and BRK (and IRQs, but they are handled like BRK).
The BRK instruction, for instance, takes 7 cycles. In my code, these cycles are:
DECODE - decode the BRK instruction
BRK0 - write PCH to stack
BRK1 - write PCL to stack
BRK2 - write P to stack
BRK3 - read from FFFE
JMP0 - read from FFFF
JMP1 - fetch opcode from new PC
The critical states here are BRK0, BRK1, and BRK2, in which 3 write cycles are performed in a row. The internal RAM blocks can do this with no problem, but external SRAM needs WE asserted and then deasserted for each write cycle. The original 6502 didn't have this problem, because it had a two-phase clock, so WE could be asserted during one phase and deasserted during the other, all within the same cycle.
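To make the problem concrete, here is a minimal Python sketch (an illustration only, not code from the core) that models each state as a (name, is_write) pair and reports adjacent states that both write. The function name back_to_back_writes and the table layout are made up for this example.

```python
# Bus activity per BRK state, as listed above: (state name, is_write).
BRK_STATES = [
    ("DECODE", False),
    ("BRK0", True),   # write PCH to stack
    ("BRK1", True),   # write PCL to stack
    ("BRK2", True),   # write P to stack
    ("BRK3", False),  # read from FFFE
    ("JMP0", False),  # read from FFFF
    ("JMP1", False),  # fetch opcode from new PC
]

def back_to_back_writes(states):
    """Return the pairs of adjacent states that both write (hypothetical helper)."""
    return [(a[0], b[0]) for a, b in zip(states, states[1:]) if a[1] and b[1]]

print(back_to_back_writes(BRK_STATES))
```

Each pair it reports is a place where WE would have to stay asserted across a cycle boundary with a single-phase clock.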
There are several approaches you can take. I decided I'd just put the stack in the block RAM, which is the easiest.
You can also give up on cycle accuracy and make BRK take 9 cycles, by inserting extra wait states between BRK0/BRK1 and between BRK1/BRK2, during which WE is deasserted. Since I managed to get everything else cycle accurate, I didn't want to break it for 2 instructions.
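The wait-state approach can be sketched in Python as follows. This is only an illustration under the assumption that a wait state is a plain non-writing cycle; the name insert_wait_states and the "WAIT" label are hypothetical.

```python
# BRK states as (name, is_write), in the original order.
BRK_STATES = [
    ("DECODE", False), ("BRK0", True), ("BRK1", True), ("BRK2", True),
    ("BRK3", False), ("JMP0", False), ("JMP1", False),
]

def insert_wait_states(states):
    """Insert a non-writing WAIT cycle between any two consecutive writes."""
    out = []
    for name, is_write in states:
        # If the previous emitted cycle was a write and this one is too,
        # put an idle cycle in between so WE can be deasserted.
        if out and out[-1][1] and is_write:
            out.append(("WAIT", False))
        out.append((name, is_write))
    return out

padded = insert_wait_states(BRK_STATES)
print(len(padded), [name for name, _ in padded])
```

With the BRK table this yields 9 cycles, with a WAIT between BRK0/BRK1 and between BRK1/BRK2.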
You can also redefine cycle accuracy by mimicking the two-phase clock with a clock running at double the frequency, so BRK would take 14 clock cycles, and every other instruction would take exactly double its original count too. This would be nice and simple, but also slower overall for a given clock frequency.
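A rough Python sketch of the double-clock idea, assuming (as described for the original 6502 above) that WE is asserted in the first half of a write cycle and deasserted in the second. The function name expand_two_phase is made up for this example.

```python
# BRK states as (name, is_write), in the original order.
BRK_STATES = [
    ("DECODE", False), ("BRK0", True), ("BRK1", True), ("BRK2", True),
    ("BRK3", False), ("JMP0", False), ("JMP1", False),
]

def expand_two_phase(states):
    """Split every CPU cycle into two fast-clock ticks: (name, phase, we_active)."""
    ticks = []
    for name, is_write in states:
        ticks.append((name, 0, is_write))  # first half: WE asserted on writes
        ticks.append((name, 1, False))     # second half: WE always deasserted
    return ticks

ticks = expand_two_phase(BRK_STATES)
print(len(ticks))
print("".join("W" if we else "-" for _, _, we in ticks))
```

In the printed waveform every W is followed by a tick with WE deasserted, so the three stack writes get three separate pulses.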
Lastly, you could try to reorder the state machine, so you'd get the following bus cycles:
DECODE - decode the BRK instruction
BRK0 - write PCH to stack
BRK1 - read from FFFE
BRK2 - write PCL to stack
BRK3 - read from FFFF
BRK4 - write P to stack
BRK5 - fetch opcode from new PC
By interleaving the write cycles with read cycles, WE is deasserted during each read, so every write gets its own WE pulse. This is a bit more complex, and probably requires some more resources. It is also not how the 6502 did it.
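As a quick check of the reordering, this Python sketch counts how many separate WE pulses each ordering produces when WE simply follows the write cycles. It is only an illustration; we_pulses is a hypothetical helper, and the two lists just mark which of the 7 cycles are writes.

```python
def we_pulses(writes):
    """Count separate WE pulses; consecutive write cycles merge into one pulse."""
    pulses = 0
    prev = False
    for w in writes:
        if w and not prev:  # WE goes from deasserted to asserted
            pulses += 1
        prev = w
    return pulses

# True marks a write cycle; positions follow the two state lists above.
ORIGINAL    = [False, True, True, True, False, False, False]
INTERLEAVED = [False, True, False, True, False, True, False]

print(we_pulses(ORIGINAL), we_pulses(INTERLEAVED))
```

The original order gives a single long pulse for three writes, while the interleaved order gives three pulses in the same 7 cycles.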
By the way, the INC ZP instruction has the following states in my core:
DECODE - decode instruction, read zeropage address
ZP0 - read zeropage location
READ - send data to ALU for +1 operation
WRITE - send ALU output to zeropage location
FETCH - get next instruction
In this case, there is only one write cycle on the bus (WRITE), which asserts WE, and WE is deasserted again during FETCH, so there's no problem. None of the other instructions have a problem either. Only BRK and JSR are affected, because they write multiple bytes to the stack in consecutive cycles.