I'm not sure exactly how the timing works on this, but it is easiest to consider the T0 state as the last state of the previous instruction. That state has to fetch the next opcode into the instruction register, but it is also free to finish off the instruction that is ending. So, for example, while it is using the bus to read the next instruction, it could also be updating an internal register.
So for the AND instruction you mention, it would go something like this (I've simplified a bit, as the two-phase clock adds a lot of complexity):
T1: Read the opcode (last cycle of previous instruction)
T2: Read address low into Input Data Latch (low)
T3: Read address high into Input Data Latch (high)
T4: Read value into ALU input register B, set up the ALU for AND, move A register to ALU input register A
T1: Fetch next instruction, move ALU result (ADD, the adder hold register) into A register
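If it helps to see the overlap spelled out, here is a rough Python sketch of the sequence above. It is only a toy model, not cycle-accurate and nothing like the real state machine: the function name, the trace format and the memory layout are all made up for illustration. The one thing it tries to show is that A still holds its old value at the end of the operand-read cycle and only picks up the ALU result during the cycle that fetches the next opcode.

```python
# Toy model of the overlap described above. Hypothetical names throughout;
# this is an illustration, not a description of the real 6502 logic.

def run_and_absolute(memory, pc, a):
    """Trace AND absolute, followed by the fetch of the next opcode."""
    trace = []

    opcode = memory[pc]                      # opcode fetch
    trace.append(("T1", "fetch opcode", a))

    lo = memory[pc + 1]                      # address low byte
    trace.append(("T2", "fetch address low", a))

    hi = memory[pc + 2]                      # address high byte
    trace.append(("T3", "fetch address high", a))

    alu_b = memory[(hi << 8) | lo]           # operand goes to ALU input B
    alu_a = a                                # A is copied to ALU input A
    alu_out = alu_a & alu_b                  # result is held; A is not updated yet
    trace.append(("T4", "read operand, ALU does AND", a))

    next_opcode = memory[pc + 3]             # bus is busy fetching the next opcode...
    a = alu_out                              # ...while the result moves into A
    trace.append(("T1", "fetch next opcode, ALU result -> A", a))

    return a, next_opcode, trace


# Usage: AND $3000 at $0200, followed by a NOP.
memory = {0x0200: 0x2D, 0x0201: 0x00, 0x0202: 0x30, 0x0203: 0xEA, 0x3000: 0x0F}
result_a, next_opcode, trace = run_and_absolute(memory, 0x0200, 0x3C)
for state, action, a_value in trace:
    print(f"{state}: {action:36s} A=${a_value:02X}")
```

The last column of the trace is the value of A at the end of each cycle, so the change only shows up on the final line.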
There are two registers that hold the instruction (partially decoded): the Pre-decode Register and the Instruction Register. Pre-decode is latched on Phi2 while the Instruction Register is latched on Phi1. This is probably what allows the current instruction to keep being used while the next instruction is loaded into the CPU. The Instruction Register can be driving the state machine while the Pre-decode Register is simultaneously being written to.
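Here is a minimal sketch of that two-latch idea, again in Python and again only a model. The class and method names are hypothetical, and the `load_ir` flag is my assumption about how the Instruction Register is told to take a new opcode; the text above only says which clock phase each register latches on.

```python
# Two latches on opposite clock phases. Hypothetical model: the load_ir
# gate is an assumption, not something read off the schematic.

class InstructionLatches:
    def __init__(self, opcode):
        self.predecode = opcode   # Pre-decode Register
        self.ir = opcode          # Instruction Register

    def phi2(self, data_bus):
        """Phi2: capture whatever is on the data bus; IR is left alone."""
        self.predecode = data_bus

    def phi1(self, load_ir):
        """Phi1: copy Pre-decode into IR, but only when a new instruction starts."""
        if load_ir:
            self.ir = self.predecode


latches = InstructionLatches(0x2D)   # AND absolute is executing
latches.phi2(0xEA)                   # next opcode (NOP) arrives on the bus
ir_during_fetch = latches.ir         # old opcode is still driving decode
latches.phi1(load_ir=True)           # next Phi1: the new opcode takes over
ir_after_fetch = latches.ir
```

Between the `phi2` and `phi1` calls, the two registers hold different opcodes, which is the window in which the old instruction can finish while the new one is already inside the chip.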
I don't think all of this is documented anywhere, so you need to figure it out from the diagram and possibly look at Visual6502 to test your assumptions. With enough patience you could analyse what is happening in Visual6502 to understand this, but I think starting with the diagram is easier.
Hope that helps.