6502.org

Posted: **Fri May 08, 2009 7:17 pm**

Each 6502 instruction takes as many cycles as it has R/W operations, plus sometimes some dead cycles. I heard that the first 6502 models did garbage R/W operations even on dead cycles. I tried to systematize all kinds of dead cycles, and I ask you to correct my wrong assumptions.

1. All single-byte instructions, even NOP, take at least 2 cycles, so any single-byte instruction has at least 1 dead cycle. Possible explanation: while the CPU is busy decoding the fetched opcode, it prefetches the next byte in case there is an operand. If there is no operand, the CPU forgets the prefetched byte and does not use it to decode the next instruction. This is the worst thing about the 6502; 1-cycle dex/txa/asl/... would have given a great boost to total speed. This dead cycle always comes immediately after the first fetch (BRK: fetch, dead, push, push, push, read, read).

2. Any read-modify-write operation (asl mem/dec mem/rol mem/...) adds one dead cycle, immediately before the write cycle, I guess.

3. All X/Y indexing modes add 1 dead cycle, with one strange exception: "(zp),Y", "abs,X" and "abs,Y" have no penalty if it is a read-only instruction _and_ adding X/Y does not change the high byte of the address. I have no idea why this applies only to these 3 modes, why only on reads, or where the dead cycle happens.

4. Conditional branches. When the branch is not taken there is no penalty, only the 2 cycles for fetching the instruction. When the branch is taken there is 1 dead cycle (why? why does JMP have no dead cycles?), plus 1 more dead cycle if the branch changes the high byte of PC (the same questions again: what is the difference from JMP?).

5. Any pop instruction (PLP, PLA, RTI, RTS) has 1 dead cycle, regardless of the number of bytes pulled from the stack. Push operations have no dead cycles. Strange thing.

6. And the last exceptions: RTS has 2 dead cycles, 1 related to the pop and 1 unknown. JSR has 1 dead cycle.
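The accounting in points 1-6 can be checked against the documented totals. The following is only a minimal sketch in Python, assuming the standard NMOS 6502 cycle counts with no page crossing. The table `CYCLES` and the helper `mismatches` are made-up names for illustration, and the split into "useful" and "dead" cycles is one reading of the points above.

```python
# mnemonic -> (useful bus accesses, dead cycles, documented total).
# Assumption: NMOS 6502 timings, no page crossing. The useful/dead split
# follows points 1-6 and is an interpretation, not an official breakdown.
CYCLES = {
    "NOP":       (1, 1, 2),  # opcode fetch + 1 dead (point 1)
    "ASL abs":   (5, 1, 6),  # 3 fetches, read, write + 1 dead (point 2)
    "LDA zp,X":  (3, 1, 4),  # 2 fetches, read + 1 dead for indexing (point 3)
    "LDA abs,X": (4, 0, 4),  # read with unchanged high byte: no penalty
    "STA abs,X": (4, 1, 5),  # writes always pay the indexing cycle
    "PHA":       (2, 1, 3),  # fetch, push + 1 dead (point 1)
    "PLA":       (2, 2, 4),  # fetch, pull + 2 dead (points 1 and 5)
    "JSR":       (5, 1, 6),  # 3 fetches, 2 pushes + 1 dead (point 6)
    "RTS":       (3, 3, 6),  # fetch, 2 pulls + 1 dead (point 1) + 2 dead (point 6)
    "BRK":       (6, 1, 7),  # fetch, 3 pushes, 2 vector reads + 1 dead
}

def mismatches(table):
    """Return the mnemonics whose useful + dead cycles do not add up to the total."""
    return [name for name, (useful, dead, total) in table.items()
            if useful + dead != total]
```

An empty list from `mismatches(CYCLES)` means every row is consistent with the "accesses plus dead cycles" model.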

ps. sorry for my english

Posted: **Fri May 08, 2009 7:22 pm**

I remember seeing your paper on it a couple of days ago while doing some google searching. An interesting idea of opcode comparison and look ahead to tell when to be able to use the dead cycles. You may want to post up your findings on here....

Posted: **Fri May 08, 2009 7:28 pm**

Points 1-6 collect all the dead cycles across all 256 opcodes, including the undocumented ones (except the KIL codes, of course). Maybe my explanations of them are incorrect, but the list is complete.

Posted: **Fri May 08, 2009 9:40 pm**

I can only imagine (not know) the answers to your questions, based on the cycle-by-cycle bus info from WDC. They only put "internal operation" on dead bus cycles, not explaining exactly what's going on inside. The Commodore 65CE02 used a proprietary design to eliminate most dead bus cycles, so it had over 30 op codes that executed in 100ns @ 10MHz. You might be interested in the topic at viewtopic.php?t=1160 about using those dead bus cycles for DMA without stealing cycles and slowing anything down.

Posted: **Mon May 11, 2009 9:53 am**

The dead cycles happen when the 6502 needs a cycle for internal processing.

1. Single-byte instructions. In all of these, the first cycle is for fetching the instruction. There then follows a dead cycle while the operation is carried out. With improved pipelining (such as with the 65CE02), this dead cycle could be eliminated but I guess this was either too difficult or required too much silicon at the time the processor was designed.

2. On all read-modify-write instructions, the dead cycle is the second-to-last cycle. During it, the processor is performing the modify part of the instruction.

3. Indexed instructions need a cycle for the processor to add the index register to the fetched address. With the (zp),Y, abs,X and abs,Y addressing modes, this addition can take place during the fetch of the high address byte, so a dead cycle is not needed. In all three cases, if a page boundary is crossed, the processor needs another cycle to increment the high byte of the address, causing an extra dead cycle. I think store operations always include the extra dead cycle so that the processor never performs a WRITE to an address whose high byte has not been corrected yet: if the dead cycle is always there, the internal logic can ensure it is a read cycle.

4. A conditional branch that is not taken needs no processing. If the branch is taken, a dead cycle is inserted while the processor calculates the new PC value. If a page boundary is crossed, another dead cycle is needed to calculate the PC's high byte. All internal operations in the 6502 are only 8 bits wide, even address calculations. Jump instructions don't need a dead cycle as there is no processing of the address read in: it is just loaded straight into the program counter.

5. Pop instructions need a dead cycle to increment the stack pointer BEFORE the popped data can be read from the stack. Push instructions decrement the stack pointer AFTER the data has been written; this can be done during the next instruction fetch.

6. RTS is effectively two pops, presumably it needs two dead cycles to increment the stack pointer twice. JSR would need one dead cycle to decrement the stack pointer between the pushing of the two bytes of the return address. The second decrement would happen during the fetch of the next instruction.
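The page-boundary rules in points 3 and 4 can be sketched in a few lines. This is only an illustration in Python, assuming the usual NMOS 6502 timings (4 cycles for a read such as LDA abs,X, 2 for a branch not taken); the function names are hypothetical.

```python
def page_crossed(base, offset):
    """True if base + offset has a different high byte than base."""
    return ((base + offset) & 0xFF00) != (base & 0xFF00)

def indexed_read_cycles(base, index, base_cycles=4):
    """Cycles for a read such as LDA abs,X: base_cycles, plus 1 dead cycle
    when adding the index changes the high byte of the address.
    (Assumption: pass base_cycles=5 for a (zp),Y read.)"""
    return base_cycles + (1 if page_crossed(base, index) else 0)

def branch_cycles(pc_after_branch, rel, taken):
    """Cycles for a conditional branch: 2, +1 if taken, +1 more if the
    target is in a different page than the following instruction.

    pc_after_branch: address of the instruction after the branch.
    rel: signed 8-bit displacement (-128..127).
    """
    if not taken:
        return 2
    target = (pc_after_branch + rel) & 0xFFFF
    crossed = (target & 0xFF00) != (pc_after_branch & 0xFF00)
    return 3 + (1 if crossed else 0)
```

For example, `indexed_read_cycles(0x10F0, 0x10)` models a read at $10F0,X with X = $10, which lands on $1100 and so pays the extra cycle.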

Hope you find this useful, or at least interesting!

Posted: **Wed May 13, 2009 12:59 pm**

PaulF, thanks, almost everything is clear now! Especially the difference between push/pop and the 16-bit indexing.

But JSR/RTS are still 'exceptions'.

BRK does even 3 pushes, but has only 2 dead cycles (1 for the single-byte instruction and 1 push related), while JSR has 1 dead cycle for 2 pushes.

Same for the RTS/RTI pair.

Posted: **Wed May 20, 2009 1:00 pm**

JSR needs to push 2 bytes, so it needs a dead cycle to decrement the stack pointer between the two pushes. A second decrement of the stack pointer is also needed; this happens during the fetch of the following instruction.

RTS needs to pop two bytes so it needs two dead cycles, one to increment the stack pointer before the first pop, one to increment the stack pointer before the second pop.
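The asymmetry between pushing and popping can be shown with a toy model. This is a rough sketch in Python, not a description of the real silicon: the class `Stack6502` is a made-up name, and starting with the stack pointer at $FF is an assumption for the example.

```python
class Stack6502:
    """Toy model of the 6502 stack in page 1; S points at the next free byte."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.s = 0xFF  # assumed starting value, for illustration only

    def push(self, value):
        # Write first, then decrement: the write can start at once and the
        # decrement can overlap with whatever cycle comes next.
        self.mem[0x0100 + self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFF

    def pull(self):
        # Increment first, then read: the read address is not known until
        # the increment is done, so the increment needs a cycle of its own.
        self.s = (self.s + 1) & 0xFF
        return self.mem[0x0100 + self.s]
```

Pushing a return address high byte then low byte, and pulling low then high, mirrors what JSR and RTS do with the two bytes.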

I don't know how BRK & RTI work. They don't have enough dead cycles in them to handle the stack pointer.

Break has only 1 dead cycle. (The other 6 are op-code fetch (1), push flags (1), push return address (2) and fetch vector (2).)
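Written out cycle by cycle, the count looks like this. A small Python sketch, following the "fetch, dead, push, push, push, read, read" sequence given earlier in the thread; the order of the three pushes (return address high, low, then flags) is stated here as an assumption, and `BRK_CYCLES` is just an illustrative name.

```python
# (what the bus does in this cycle, is it a dead cycle?)
# Assumption: the pushes go return-address high, low, then flags.
BRK_CYCLES = [
    ("fetch opcode", False),
    ("dead cycle", True),
    ("push return address high", False),
    ("push return address low", False),
    ("push flags", False),
    ("fetch vector low", False),
    ("fetch vector high", False),
]

total = len(BRK_CYCLES)                                  # all cycles of BRK
dead = sum(1 for _, is_dead in BRK_CYCLES if is_dead)    # cycles doing no useful access
```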

There are enough cycles in BRK if the pushes to the stack are interleaved with the vector fetches; then the stack pointer could be decremented during the vector fetches. Were the 6502 designers this sneaky?


Dead cycles
