I removed a couple of the 'clever' tricks in my state machine that I had introduced to reduce the number of states. Instead of having JMP piggyback on the ABS path and then diverge, the JMP instruction now has its own states: JMP0->JMP1->SYNC. This same path is used by JSR and BRK. The indirect JMP (IND) also got its own states: IND0->IND1->JMP0->JMP1->SYNC.
This increases the number of states, but it reduces the number of extra inputs we have to deal with. The only 2 extra inputs left are for Read-Modify-Write, and for the BCD adjustment step. Removing those two would require duplicating a bunch of states.
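To make the two jump paths concrete, here is a minimal sketch of them as a plain next-state table. The state names come from the description above; the table form and the `path_from` helper are only illustrative assumptions, not how the core is actually written, and nothing here models what each state does on the bus.

```python
# Unconditional transitions for the jump paths named in the post.
# Only the state names are from the post; this table is an illustration.
NEXT_STATE = {
    "IND0": "IND1",
    "IND1": "JMP0",
    "JMP0": "JMP1",
    "JMP1": "SYNC",
}

def path_from(state):
    """Follow NEXT_STATE from `state` until SYNC is reached (hypothetical helper)."""
    path = [state]
    while path[-1] != "SYNC":
        path.append(NEXT_STATE[path[-1]])
    return path

print(" -> ".join(path_from("JMP0")))
print(" -> ".join(path_from("IND0")))
```

Because none of these transitions depend on an extra input, the indirect path simply joins the direct one at JMP0.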
By getting rid of the INIT state (which was only necessary for testing without proper reset support), the total number of states is now exactly 32. My idea is to implement these 32 states in 5 bits, using standard binary encoding.
Each bit gets 1 flop to hold the current state, and a bit of logic, according to the schematic. Each state flop is controlled by a 6-input LUT, of which 5 inputs are used to track the current state, and the 6th input has multiple functions, depending on the current state. Since the FSM LUT knows the full current state, it knows exactly how to use the 6th input. This 6th input has 4 different sources:
- The opcode decoder, which produces the initial state number for the particular addressing mode/special instruction (probably takes 2 LUTs per bit, using different DB bits for each state bit).
- The RMW input, to add an extra write back state.
- The BCD input, to add an extra BCD adjustment state.
- The IRQ input, to go to the BRK0 state (also used for RST/NMI).
These 4 sources are selected based on 2 of the state bits, using appropriate encoding. The FSM LUT knows to ignore the 6th input when it's not in the proper state.