BigEd wrote:
I don't see any obvious reason, so I suppose it just fell out of the way the control logic is written - perhaps there was a chance to optimise which they missed, or perhaps the simplicity of doing it the same way saved a few transistors.
This pattern shows up in a number of places and it's timing related.
The processor needs an additional cycle to perform the increment in the ALU. In the meantime, it has to keep the external bus busy with a minimum of side effects. There is no such thing as an "idle" bus: every cycle is either a read or a write, using whatever happens to be on the address and data buses. Leaving the address bus (AB) and data bus (DB) as they are and keeping R/W at read would re-read the same value and overwrite the increment result, but flipping R/W to write just stores the unmodified value back, which is effectively a no-op. That's how you end up with the extra write.
Btw, it gets worse with ABS,X. Here the CPU has to do an additional 'no-op' read to buy time for adding the carry into the high byte of the address:
INC $2080,x (with x=0x90)
1: external: read instruction
2: external: read 0x80
3: external: read 0x20, internal: performs address addition on low byte: 0x80 + x (0x90) -> 0x10, c=1
4: external: read from $2010, internal: performs address addition on high byte (w/carry): 0x20 + c (1) -> 0x21
5: external: read from $2110 (now has correct address)
6: external: write unchanged value to $2110, internal: performs actual increment
7: external: write actual result to $2110
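For anyone who wants to play with the sequence above, here is a rough Python sketch that replays the seven bus cycles. It only models what appears on the external bus, not the real internal logic. The function name, the flat 64K bytearray, and the program location $0600 are my own assumptions for illustration; $FE is the opcode for INC abs,X.

```python
def inc_abs_x_trace(mem, pc, x):
    """Replay the bus cycles of INC abs,X as (cycle, 'R'/'W', address, data)."""
    trace = []

    def read(addr):
        value = mem[addr]
        trace.append((len(trace) + 1, "R", addr, value))
        return value

    def write(addr, value):
        mem[addr] = value
        trace.append((len(trace) + 1, "W", addr, value))

    read(pc)                 # 1: opcode fetch
    lo = read(pc + 1)        # 2: low byte of the base address
    hi = read(pc + 2)        # 3: high byte; meanwhile low byte + X is computed
    total = lo + x
    carry = total >> 8
    low = total & 0xFF

    read((hi << 8) | low)    # 4: read with the uncorrected high byte
    effective = (((hi + carry) & 0xFF) << 8) | low
    old = read(effective)    # 5: read from the corrected address
    write(effective, old)    # 6: write the unchanged value back
    write(effective, (old + 1) & 0xFF)  # 7: write the incremented value
    return trace


# Hypothetical setup: INC $2080,X at $0600, X = $90, $2110 holds $41.
mem = bytearray(0x10000)
mem[0x0600:0x0603] = bytes([0xFE, 0x80, 0x20])
mem[0x2110] = 0x41
for cycle, rw, addr, data in inc_abs_x_trace(mem, 0x0600, 0x90):
    print(f"{cycle}: {rw} ${addr:04X} = ${data:02X}")
```

Note that the sketch always performs cycle 4, even when there is no carry, which matches the fixed 7-cycle timing of INC abs,X on the NMOS 6502.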
What's neat is that this technique helps save a cycle on read operations like "LDA" with ABS,X when the read happens on the same page, since step 4 amounts to a speculative read. When C happens to be 0, the address was already correct, the value just read is the right one, and the instruction is done. If C is 1 (the access crosses a page boundary), it has to do an additional read with the correct address.
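The same idea for the read case, again as a rough sketch of bus activity only. The function name and memory setup are assumptions of mine; $BD is the opcode for LDA abs,X.

```python
def lda_abs_x_trace(mem, pc, x):
    """Replay the bus cycles of LDA abs,X; returns (loaded value, trace)."""
    trace = []

    def read(addr):
        value = mem[addr]
        trace.append((len(trace) + 1, "R", addr, value))
        return value

    read(pc)                 # 1: opcode fetch
    lo = read(pc + 1)        # 2: low byte of the base address
    hi = read(pc + 2)        # 3: high byte; meanwhile low byte + X is computed
    total = lo + x
    carry = total >> 8
    low = total & 0xFF

    value = read((hi << 8) | low)   # 4: speculative read, correct if carry == 0
    if carry:
        # 5: page crossed, so read again with the corrected high byte
        value = read((((hi + 1) & 0xFF) << 8) | low)
    return value, trace


# Hypothetical setup: LDA $2080,X at $0600.
mem = bytearray(0x10000)
mem[0x0600:0x0603] = bytes([0xBD, 0x80, 0x20])
mem[0x2090] = 0x11   # target when X = $10 (same page)
mem[0x2110] = 0x22   # target when X = $90 (page crossed)

for x in (0x10, 0x90):
    value, trace = lda_abs_x_trace(mem, 0x0600, x)
    print(f"X=${x:02X}: loaded ${value:02X} in {len(trace)} cycles")
```

With X=$10 the speculative read at step 4 already hits $2090, so the sketch finishes in 4 cycles; with X=$90 it first reads $2010 and then needs a fifth cycle for $2110.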