6502 instruction decoding

ScottySR · Post by **ScottySR** » Mon Jan 07, 2019 11:25 am

I'm trying to make a 6502 emulator (mainly 2A03) that has the instruction decoding in it as well. For this reason I'm trying to understand the decoder to implement it correctly.
According to several sites the decoder is a 130x21 logic array. Where do these values come from?
According to one site, the logic ignores certain bits of the opcode. How is it determined which bits are ignored and which aren't?
Some sites say that the next opcode is read at the beginning of the instruction execution and some say it happens at the end of the previous instruction. Which is correct?
Do all instructions have 7 cycles (afaik this is the longest an instruction can take) reserved for them in the array, and if so, what is past the end of an instruction?
This is all that comes to mind right now. I'll post more later.

BigEd · Post by **BigEd** » Mon Jan 07, 2019 1:59 pm

Hi ScottySR, and welcome.

As to which cycle is the first one of a new instruction and which is the last of the previous instruction, that's almost a philosophical question. Usually, you'll see the cycle in which SYNC is high called the first cycle, and it is indeed the fetch cycle. During this cycle the previous instruction may still be finishing - there is some potential for overlap. (But, I think, that's not true for all instructions. I could be wrong.)

There's quite a lot of messy detail in the 6502 implementation, so while the regular logic array is a sign of some regularity and order, it's by no means the full story. So, for example, it isn't a full microcode which gives all the control signals for each cycle of each opcode, although that is a possible way to implement a workalike machine.

The large regular array is sometimes called the decode ROM and sometimes called the PLA. It's just a nice way to implement a large number of NOR gates with inputs drawn from a common set. I think we know that the original NMOS 6502 from MOS has a different array from the one documented in the Atari documentation. I use the NMOS 6502 from MOS as captured in visual6502 as my go-to example, but it's worth noting that it's not the only 6502, and indeed it's only one specific revision.
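As a way to picture "NOR gates with inputs drawn from a common set", here is a minimal Python sketch of decode-ROM rows. It assumes each row is described by a mask (which IR bits have a transistor on that row), a required value for those bits, and an optional timing state; bits outside the mask are don't-cares. The class `PlaRow`, the function `decode` and all three rows are made up for illustration and are not the real 6502 table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlaRow:
    """One row of a hypothetical decode ROM.

    mask:   IR bits that have a transistor on this row (others are don't-care)
    value:  the level those connected bits must have for the row to fire
    tstate: timing state the row is gated with (None = not gated)
    """
    name: str
    mask: int
    value: int
    tstate: Optional[str] = None

    def fires(self, ir: int, tstate: str) -> bool:
        # Physically a NOR: the row stays high only if none of its connected
        # inputs (an IR bit or its complement, plus a timing line) pulls it low.
        if self.tstate is not None and tstate != self.tstate:
            return False
        return (ir & self.mask) == self.value


# Made-up rows for illustration only; they are NOT real 6502 decode rows.
ROWS = [
    PlaRow("group_cc01_T2", mask=0b0000_0011, value=0b0000_0001, tstate="T2"),
    PlaRow("exact_A9_T0", mask=0xFF, value=0xA9, tstate="T0"),
    PlaRow("any_T1", mask=0x00, value=0x00, tstate="T1"),
]


def decode(ir: int, tstate: str) -> list:
    """Return the names of all rows that fire; several can fire at once."""
    return [row.name for row in ROWS if row.fires(ir, tstate)]
```

For example, `decode(0xA9, "T2")` fires only the row that looks at the two low bits, while the fully specified row needs `"T0"`. A row with a narrow mask covers a whole group of opcodes, which is one way a PLA ends up "ignoring" some opcode bits.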

Some possible reading matter:
https://www.pagetable.com/?p=39
http://visual6502.org/wiki/index.php?title=MOS_6502

Klaus2m5 · Post by **Klaus2m5** » Wed Jan 09, 2019 9:00 am

ScottySR wrote:

Do all instructions have 7 cycles (afaik this is the longest an instruction can take) reserved for them in the array, and if so, what is past the end of an instruction?

In general, a single 6502 cycle reflects the ability to access a single byte in the address space of the CPU. This may be ROM, RAM or I/O, read or write. Of course, you may have dummy cycles to allow time for internal operations of the CPU. Even the shortest instructions take 2 cycles.

2 cycle examples:
+ load accumulator immediate: 1. fetch opcode, 2. fetch immediate operand, the accumulator gets loaded during the fetch opcode cycle of the next instruction.
+ transfer accumulator to index register: 1. fetch opcode, 2. dummy cycle while fetching the data from accumulator, the index register is loaded during the fetch opcode cycle of the next instruction.
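The overlap in these 2-cycle examples can be sketched in Python as a toy stepper. It assumes the register result is held in a hypothetical `pending` slot and committed during the following opcode fetch. Only LDA #imm (0xA9) and TAX (0xAA) are modelled, and the function `run` is illustrative, not taken from any real emulator.

```python
def run(program: list, cycles: int) -> list:
    """Toy cycle stepper for LDA #imm (0xA9) and TAX (0xAA) only.

    Assumption: a register result waits in a 'pending' slot and is committed
    during the next opcode fetch, i.e. the overlap described above.
    Returns a snapshot of the registers after each cycle.
    """
    regs = {"A": 0, "X": 0}
    pc = 0
    ir = None
    pending = None            # (register, value) waiting for the next fetch
    fetching = True
    snapshots = []
    for _ in range(cycles):
        if fetching:
            if pending is not None:      # previous instruction completes here
                reg, val = pending
                regs[reg] = val
                pending = None
            ir = program[pc]
            pc += 1
            fetching = False
        else:                            # second cycle of a 2-cycle instruction
            if ir == 0xA9:               # LDA #imm: operand read from the bus
                pending = ("A", program[pc])
                pc += 1
            elif ir == 0xAA:             # TAX: dummy cycle, value comes from A
                pending = ("X", regs["A"])
            fetching = True
        snapshots.append(dict(regs))
    return snapshots
```

With `program = [0xA9, 0x42, 0xAA, 0xEA]`, A only becomes 0x42 in cycle 3 (the fetch of TAX), and X only becomes 0x42 in cycle 5 (the fetch of the following NOP).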

You need more cycles when the instruction specifies an address rather than an immediate operand, when the address is indirect and/or indexed and when data needs to be stored at the end of an instruction.

7 cycle example: increment absolute indexed
+ 1. fetch opcode, 2. fetch address low, 3. fetch address high, add index low, 4. (dummy) add index carry (known as page crosser), 5. fetch operand at indexed address, 6. (dummy) perform ALU increment operation, 7. store result to indexed address
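To see what is on the bus in each of those 7 cycles, here is a minimal Python sketch that builds a per-cycle trace. It assumes NMOS 6502 behaviour: the dummy cycle 4 is a read from the address whose high byte has not yet been corrected (for read-modify-write instructions this happens even without a page crossing), and the dummy cycle 6 writes the unmodified value back while the ALU increments. Later CMOS parts behave differently here. The function name is hypothetical.

```python
def inc_abs_x_trace(opcode_addr: int, base: int, x: int) -> list:
    """Return (cycle, R/W, address, note) tuples for INC abs,X (opcode 0xFE).

    Assumes NMOS 6502 behaviour: cycle 4 reads from the address whose high
    byte has not yet been corrected, and cycle 6 writes the unmodified value
    back while the ALU performs the increment.
    """
    target = (base + x) & 0xFFFF
    uncorrected = (base & 0xFF00) | (target & 0x00FF)  # high byte before carry
    pc = opcode_addr
    return [
        (1, "R", pc, "fetch opcode"),
        (2, "R", (pc + 1) & 0xFFFF, "fetch address low"),
        (3, "R", (pc + 2) & 0xFFFF, "fetch address high, add X to low byte"),
        (4, "R", uncorrected, "dummy read while the carry is added to the high byte"),
        (5, "R", target, "read operand"),
        (6, "W", target, "dummy write of the old value, ALU increments"),
        (7, "W", target, "write the incremented value"),
    ]
```

For `INC $12F0,X` with X = $20, cycle 4 reads $1210 and the real operand lives at $1310.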

ScottySR wrote:

Some sites say that the next opcode is read at the beginning of the instruction execution and some say it happens at the end of the previous instruction. Which is correct?

For obvious reasons a store at the end of an instruction must have its own cycle and cannot overlap with the opcode fetch of the following instruction. However, when the target of an instruction is a register the internal write operation is performed during the opcode fetch of the next instruction.

ScottySR · Post by **ScottySR** » Fri Feb 01, 2019 8:00 am

What about interrupt signals? As far as I know, interrupts wait for the current instruction to finish, with BRK being the only exception if it hasn't started storing anything to the stack yet. Interrupts should have a fairly similar micro-op structure to BRK, but do they have to use a cycle to detect the signal, or can the CPU start processing it instantly? I guess you could compare this to how regular instructions need to fetch the opcode before knowing what to do. Does the CPU have some sort of indicator that it is executing interrupt code (like a flag that is set on interrupt and cleared on RTI)? As far as I know, not acknowledging the interrupt would cause it to trigger again after the previous interrupt has completed.

But how about reset then? How many cycles does it take (one site says probably 6 cycles) and what operations does it take on each cycle? As far as finishing the previous instruction at the start of the next one, does this also apply with interrupt and reset signals or does the CPU wait for the instruction to fully finish before handling the signal? On that note, does reset even wait for the instruction to finish or does it happen instantly?

I also noticed that one document says that JMP and JSR fetch the low address byte, and some cycles later it is copied to PCL at the same time as the high address byte is fetched into PCH. Where does the CPU keep the low byte until it is copied to PCL?

BigEd · Post by **BigEd** » Fri Feb 01, 2019 10:32 am

It's important to note that the 6502 makes a record of an incoming interrupt so that the next instruction fetch can be modified. So, although the next instruction is fetched, the IR is loaded with BRK instead, and then the sequencer proceeds as usual. BRK, interrupts, and even reset, use the same sequence.
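A simplified Python sketch of this "force BRK into IR" idea follows. It assumes a hypothetical `pending` latch recording the interrupt source and collapses the shared sequence into one method instead of stepping it cycle by cycle. It uses the standard vectors (NMI $FFFA, RESET $FFFC, IRQ/BRK $FFFE) and the commonly described details: PC is not incremented on the hijacked fetch, BRK skips its padding byte, only BRK pushes the B bit, and reset turns the three stack writes into reads while still decrementing S. Class and method names are illustrative.

```python
VECTORS = {"NMI": 0xFFFA, "RESET": 0xFFFC, "IRQ": 0xFFFE, "BRK": 0xFFFE}


class Cpu:
    def __init__(self, mem: bytearray):
        self.mem = mem
        self.pc = 0x0000
        self.s = 0xFD
        self.p = 0x20          # status register; bit 5 reads as 1
        self.ir = None
        self.pending = None    # hypothetical latch: "NMI", "IRQ" or "RESET"

    def fetch(self):
        """Opcode fetch. The bus read happens either way."""
        byte = self.mem[self.pc]
        if self.pending:
            self.ir = 0x00     # BRK forced into IR; PC is not incremented
        else:
            self.ir = byte
            self.pc = (self.pc + 1) & 0xFFFF

    def brk_sequence(self):
        """Rest of the shared BRK/IRQ/NMI/RESET sequence, collapsed into one call."""
        source = self.pending or "BRK"
        if source == "BRK":
            self.pc = (self.pc + 1) & 0xFFFF          # skip BRK's padding byte
        status = (self.p | 0x10) if source == "BRK" else (self.p & ~0x10)
        for byte in (self.pc >> 8, self.pc & 0xFF, status):
            if source != "RESET":                     # reset: reads, not writes
                self.mem[0x100 | self.s] = byte
            self.s = (self.s - 1) & 0xFF
        self.p |= 0x04                                # set the I flag
        vec = VECTORS[source]
        self.pc = self.mem[vec] | (self.mem[vec + 1] << 8)
        self.pending = None
```

Because the interrupted opcode is fetched but discarded and PC is left pointing at it, the pushed return address is the instruction that would otherwise have executed.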

It's well worth using visual6502 to study the behaviour of the machine. Here, for example, we see a KIL opcode and then a reset. We see that the machine gets into a T0+T1 state and then when reset is released follows the BRK behaviour, more or less. (Edit: the point being that for reset to be recognised, a few cycles have to pass while the state machine gets into the right state. How many cycles depends on what it was doing at the time.)

It's also worth studying Donald Hanson's big block diagram. You'll notice that there is microarchitectural state which is not programmer visible. The 6502 doesn't have a hidden temporary register as the 6800 does, but it does have registered inputs to the ALU. Therefore the ALU can be used as temporary storage, and often is. I think you'll see that the first operand byte of a JMP takes a trip through the ALU.

And for JSR, indeed, the first operand byte has to be stored, and it turns out to be stored temporarily in the stack pointer register S, which is fine because the value in S has to take a trip through the ALU to be decremented. See this happening in visual6502 here.
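Here is a rough Python sketch of that JSR trick. It assumes the third cycle is a dummy read of the stack address, as is commonly documented, and uses a variable `alu` as a stand-in for the ALU's registered value carrying the real stack pointer. The low target byte is parked in S meanwhile. The exact cycle in which S gets its decremented value back is not modelled; the function `jsr` is hypothetical.

```python
def jsr(mem: bytearray, pc: int, s: int):
    """Simulate JSR abs (opcode 0x20 at mem[pc]); return (new_pc, new_s, trace).

    'alu' stands in for the ALU's registered value holding the real stack
    pointer while the low target byte (ADL) is parked in S.
    """
    trace = []
    # cycle 1: opcode fetch
    trace.append(("R", pc))
    pc = (pc + 1) & 0xFFFF
    # cycle 2: fetch ADL; S heads into the ALU
    adl = mem[pc]
    trace.append(("R", pc))
    pc = (pc + 1) & 0xFFFF
    alu = s
    # cycle 3: dummy read from the stack; ADL is stored in S
    trace.append(("R", 0x100 | alu))
    s = adl
    # cycle 4: push PCH, ALU decrements the real stack pointer
    mem[0x100 | alu] = pc >> 8
    trace.append(("W", 0x100 | alu))
    alu = (alu - 1) & 0xFF
    # cycle 5: push PCL
    mem[0x100 | alu] = pc & 0xFF
    trace.append(("W", 0x100 | alu))
    alu = (alu - 1) & 0xFF
    # cycle 6: fetch ADH; PC <- ADH:ADL, with ADL coming back out of S
    adh = mem[pc]
    trace.append(("R", pc))
    new_pc = (adh << 8) | s
    # S gets the decremented stack pointer back from the ALU
    s = alu
    return new_pc, s, trace
```

Note that the pushed return address points at the last byte of the JSR instruction, which is why RTS adds one after pulling it.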

ScottySR · Post by **ScottySR** » Mon Feb 04, 2019 6:59 am

I'm afraid that visual6502 won't be useful for me any time soon. It's hard to read, and understanding what I'm seeing will take some getting used to.

Using the block diagram and this document, I think I have a pretty good understanding of how JSR operates.

1. Fetch opcode to IR and increment PC
2. Fetch low address byte to DL, move S to IB and increment PC
3. Move low address byte from DL to S
4. Push PCH to stack and decrement S (routes ADD back to IB for next decrement?)
5. Push PCL to stack and decrement S
6. Fetch high address byte to PCH and move low address byte from S to PCL

(Stack pointer is restored to S during next opcode fetch?)

Does that seem right?

The block diagram seems to have the timing signal named as "T1X" between "T1" and "T2". Is this timing state somehow special? I'm assuming this is the same state as the "T0+T1" state that was mentioned.

The last thing I wanted to ask for now is related to the clock of the CPU. There seem to be two outputs, ø1 and ø2. According to one CPU clock diagram, ø1 is high when the clock generates a high signal and ø2 when the signal is low (basically an inverted output). Is this how the 6502 CPU clock works? More importantly, what purpose does ø2 serve in the functionality of the CPU?

Arlet · Post by **Arlet** » Mon Feb 04, 2019 7:35 am

The old 6502 programming manual is also useful for understanding the bus cycles. http://6502.org/documents/books/mcs6500 ... manual.pdf

Not everything is explained, but scattered throughout the manual you'll find most instructions/addressing modes.

BigEd · Post by **BigEd** » Mon Feb 04, 2019 11:26 am

ScottySR wrote:

The block diagram seems to have the timing signal named as "T1X" between "T1" and "T2". Is this timing state somehow special? I'm assuming this is the same state as the "T0+T1" state that was mentioned.

Might be worth reading this page on the visual6502 wiki. My feeling is that you have to go quite deep to put together the various descriptions and notations of the timing states which different people at different times have put together. You can also search this forum for T1x, but you'll have some reading to do!

Quote:

The last thing I wanted to ask for now is related to the clock of the CPU. There seem to be two outputs, ø1 and ø2. According to one CPU clock diagram, ø1 is high when the clock generates a high signal and ø2 when the signal is low (basically an inverted output). Is this how the 6502 CPU clock works? More importantly, what purpose does ø2 serve in the functionality of the CPU?

Not directly answering your question, but:
- phi2 as an output (pin 39) tracks the phi0 input (pin 37), with a slight delay.
- phi1 (pin 3) is very rarely used.
- there is some subtlety as to whether it's best, for the rest of the system design, to use the phi2 as it is fed into the 6502 or the version which comes out of the 6502.
- internally to the chip, phi1 and phi2 signals are non-overlapping, which isn't quite the same as being inverses of one another.

So, within the chip, the alternation of phi1 and phi2 is what allows data to be shuffled without loss from one transparent latch to the next. Two transparent latches make a flop, but unlike a flop it's possible to have logic in between the master and slave.
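A small Python sketch of this master/slave arrangement, assuming masters are open during phi1, slaves during phi2, and an optional `logic` function sits between each master and its slave. The non-overlap gap is what stops a value racing through more than one stage per cycle; here the sequential loops play that role. Class and function names are illustrative.

```python
class TransparentLatch:
    """Output follows the input while enabled, holds its value otherwise."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, d: int, enable: bool) -> int:
        if enable:
            self.q = d
        return self.q


def clock_cycle(stages, data_in, logic=lambda v: v):
    """Advance a chain of (master, slave) latch pairs by one full cycle.

    Assumed arrangement: masters open during phi1, slaves during phi2,
    with 'logic' between each master and its slave.
    """
    # phi1: each master samples the previous stage's (held) slave output
    upstream = data_in
    for master, slave in stages:
        master.update(upstream, True)
        upstream = slave.q
    # non-overlap gap: both clocks low, every latch holds its value
    # phi2: each slave takes its master's value through the logic
    for master, slave in stages:
        slave.update(logic(master.q), True)
```

Feeding 1, 0, 0 into a three-stage chain moves the 1 exactly one stage per cycle. Passing `logic=lambda v: v + 1` shows that, unlike a packaged flop, there is room for computation between master and slave.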

For most purposes, we can just use phi2 as a reference. Inside the chip, some things happen during not-phi2, the first half of a cycle, and some things happen during phi2, the second half. Mostly, actions are spread over two adjacent phases, either phi1 and phi2, or phi2 and phi1.

Outside the chip, all events are relative to the falling edge of phi2. The rising edge of phi2 is a convenient signal but does not define the timing of any external event for the 6502. (It is used by the '816 to multiplex the high byte of the 24 bit address onto the databus.) It is conventional, and convenient, to use phi2 as a mask to distinguish the early part of a cycle, where the address and control lines are changing, from the later part of a cycle, when the address and control lines are stable, and a write can be committed to exactly the right location. (It's also possible to use a different reference, if for example there's a 6x or 8x clock available in the system, or indeed to use logic delays, for the bold and intrepid.)
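The "phi2 as a mask" convention can be written as one line of glue logic. This is a minimal Python sketch for a hypothetical RAM with an active-low write enable. `rw` is the 6502 R/W line (True = read) and `selected` is whatever the address decoder produces. The write is asserted only while phi2 is high, once address and R/W have settled, and it is released at phi2's falling edge, where the bus cycle ends.

```python
def ram_we_n(phi2: bool, rw: bool, selected: bool) -> bool:
    """Active-low write enable for a hypothetical RAM chip.

    rw follows the 6502 R/W convention (True = read, False = write).
    Low (asserted) only during phi2 of a write cycle to this chip.
    """
    return not (phi2 and not rw and selected)
```

During the first half of a write cycle (phi2 low) the address may still be changing, so the strobe stays inactive even though R/W already says "write".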

Dr Jefyll · Post by **Dr Jefyll** » Mon Feb 04, 2019 2:23 pm

Arlet wrote:

The old 6502 programming manual is also useful for understanding the bus cycles. http://6502.org/documents/books/mcs6500 ... manual.pdf

Not everything is explained, but scattered throughout the manual you'll find most instructions/addressing modes.

On the specific subject of the bus cycles, the Hardware Manual is probably a better reference. Those details are all collected in Appendix A.

MCS 6500 Family Hardware Manual

-- Jeff

ScottySR · Post by **ScottySR** » Tue Feb 05, 2019 10:45 am

Okay, next thing:
The phi2 seems to be connected to the precharge mosfets, so I guess I could try to figure them out now.

I believe the precharge mosfets are used to "latch" the bus state. But how does it actually work? The mosfet shouldn't be active during phi1, but it still keeps the value in the bus, for some time at least. Open drain mosfets are most likely used to clear the bits in the bus (or only some of them). The O/ADH0 and O/ADH(1-7) would then explain how either one or zero is inserted to ABH during zero page addressing and stack instructions. During stack instructions, though, there has to be a way to guarantee that bit 0 is 1, and I'm not sure where the CPU gets that. The same goes for when the ALU loads 1 into one of its inputs for incrementing (and presumably 255 added when decrementing). And finally pass mosfets. They most likely just connect the two busses, but what happens to the bus values when the busses are connected? I'm also guessing that data can pass through the mosfet both ways.

BigEd · Post by **BigEd** » Tue Feb 05, 2019 12:19 pm

ScottySR wrote:

Okay, next thing:
The phi2 seems to be connected to the precharge mosfets, so I guess I could try to figure them out now.
I believe the precharge mosfets are used to "latch" the bus state.

They are there to precharge the bus - if it's charged to 1 then it's easy to arrange a conditional discharge to zero. An undriven node will act as a latch for free, at least for milliseconds, and that's easily long enough.

Quote:

But how does it actually work? The mosfet shouldn't be active during phi1, but it still keeps the value in the bus, for some time at least.

Right.

Quote:

Open drain mosfets are most likely used to clear the bits in the bus (or only some of them). The O/ADH0 and O/ADH(1-7) would then explain how either one or zero is inserted to ABH during zero page addressing and stack instructions.

Yes, exactly.

Quote:

During stack instructions, though, there has to be a way to guarantee that bit 0 is 1, and I'm not sure where the CPU gets that.

The precharge puts all 1's on the bus, then the conditional pulldowns bring it to page 0 or 1 depending on the other conditional pulldown. The same mechanism gives the high-memory vector addresses: no pulldown means a bit stays as a 1.
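The precharge-then-pulldown idea can be sketched in Python. It assumes O/ADH0 pulls only bit 0 low and O/ADH(1-7) pulls bits 1 to 7 low, as the signal names suggest. Both pulldowns give page $00, only the upper ones give page $01 (the stack), and none leaves $FF for the vectors. The same reasoning applied to the low address byte gives $FA/$FC/$FE. The `pull_mask` argument of `vector_low` is a hypothetical stand-in for whichever signals do that pulling.

```python
def address_high(o_adh0: bool, o_adh1_7: bool) -> int:
    """ADH bus value after precharge and the two conditional pulldowns."""
    bus = 0xFF                   # precharge: every line starts at 1
    if o_adh0:
        bus &= 0xFE              # pull bit 0 low
    if o_adh1_7:
        bus &= 0x01              # pull bits 1-7 low
    return bus


def vector_low(pull_mask: int) -> int:
    """ADL after precharge, with the bits in pull_mask (hypothetical) pulled low."""
    return 0xFF & ~pull_mask
```

A bit with no pulldown active simply keeps its precharged 1, which is all it takes to produce the high-memory vector page.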

Quote:

The same goes for when the ALU loads 1 into one of its inputs for incrementing (and presumably 255 added when decrementing).

I think 255 (0xFF) is used in both cases - if you subtract FF, that's the same as adding 01.
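The arithmetic behind this is plain modulo-256 addition. In this Python sketch, `add8` is a generic 8-bit adder with carry. `sub8` does subtraction the usual way, adding the one's complement with carry-in set. The sketch checks that adding $FF decrements and subtracting $FF increments for every byte value. Which of these the silicon actually uses for S or PC is not established here.

```python
from typing import Tuple


def add8(a: int, b: int, carry_in: int = 0) -> Tuple[int, int]:
    """8-bit binary add returning (result, carry_out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def sub8(a: int, b: int) -> Tuple[int, int]:
    """Subtract by adding the one's complement of b with carry-in set."""
    return add8(a, (~b) & 0xFF, 1)
```

Since ~$FF is $00, `sub8(x, 0xFF)` is just `add8(x, 0x00, 1)`, i.e. x + 1. Meanwhile `add8(x, 0xFF)` gives x - 1 because $FF is -1 in 8-bit two's complement.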

Quote:

And finally pass mosfets. They most likely just connect the two busses, but what happens to the bus values when the busses are connected?

That's something which might need careful modelling if two floating busses were connected and then the value used. Or it might not matter, if the busses are always driven at the time they are read. As it turns out, visual6502 didn't need very careful modelling, just somewhat careful. If one side or the other of the bus is driven and the busses are connected, the right value will get to both halves.
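A per-bit Python sketch of what joining two bus halves through a pass transistor might look like in a simple simulator. The rules are assumptions for illustration, not visual6502's actual code. A driven half beats a floating one. If both are driven with different values, the pull-down is taken to win (NMOS pull-downs are stronger than their load pull-ups). If both float, the result depends on charge sharing and is reported as unknown.

```python
from typing import Optional


def merged_value(a_drive: Optional[int], b_drive: Optional[int]) -> Optional[int]:
    """Value seen on both halves of a bus bit joined by a pass transistor.

    a_drive / b_drive: 0 or 1 if that half is actively driven, None if it
    is floating. Returns None when both halves float (charge sharing).
    """
    if a_drive is None and b_drive is None:
        return None            # depends on stored charge: don't rely on it
    if a_drive is None:
        return b_drive
    if b_drive is None:
        return a_drive
    return a_drive & b_drive   # assumed: a driven 0 wins a conflict
```

This matches the point above: as long as one side is driven whenever the value is read, the floating case never needs careful treatment.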

Quote:

I'm also guessing that data can pass through the mosfet both ways.

Indeed. But 1s which pass through a mosfet are weak - about 3.5V maybe - which is why precharge can help, by starting with a solid 5V.

ScottySR · Post by **ScottySR** » Thu Feb 07, 2019 9:24 am

BigEd wrote:

They are there to precharge the bus - if it's charged to 1 then it's easy to arrange a conditional discharge to zero. An undriven node will act as a latch for free, at least for milliseconds, and that's easily long enough.

The whole concept of precharging might need a bit more explanation.

The next thing is the predecode logic block and some control signals. What does the predecode logic actually do? On that note, in the little time I was able to use visual6502, I noticed that not all bits from IR and the timing logic go to the decode ROM. Is there a reason why only some of them are used, and do the seemingly unused ones go somewhere else or are they completely unused?

As for the control signals, what are the following signals used for:

-IR5/C, IR5/I, IR5/D
-I/V
-DBZ/Z
-TZPRE
-SV
-S/S
-DSA, DAA

Also, is there a reason why the increment logic of PCH is divided in two blocks rather than having one block like PCL has?

BigEd · Post by **BigEd** » Thu Feb 07, 2019 11:56 am

ScottySR wrote:

Also, is there a reason why the increment logic of PCH is divided in two blocks rather than having one block like PCL has?

One possibility is that they found they needed to have some carry lookahead but didn't need it for both bytes. It's thought that the PC increment is a critical path.
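To illustrate what "some carry lookahead" could mean here, this is a Python sketch of a 16-bit PC increment. It uses an 8-bit PCL incrementer and a PCH incrementer split into two halves. The split at the nibble boundary is an assumption for illustration; the real division on the die may differ. The upper half's carry-in is computed directly from "PCL carried out" and "lower half is all ones", without waiting for the lower half's sum.

```python
def increment_pc(pc: int) -> int:
    """16-bit PC increment with PCH split into two 4-bit halves (assumed split)."""
    pcl, pch = pc & 0xFF, pc >> 8
    pcl_next = (pcl + 1) & 0xFF
    carry_into_pch = pcl == 0xFF                 # carry out of PCL
    lo, hi = pch & 0x0F, pch >> 4
    # lookahead: the upper half's carry-in comes straight from the PCL carry
    # and "lower half all ones", not from the lower half's adder output
    carry_into_hi = carry_into_pch and lo == 0x0F
    lo_next = (lo + carry_into_pch) & 0x0F
    hi_next = (hi + carry_into_hi) & 0x0F
    return (hi_next << 12) | (lo_next << 8) | pcl_next
```

Functionally this is just `(pc + 1) & 0xFFFF`; the point of the split is that the longest carry chain gets shorter, which matters if PC increment is on the critical path.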

BigEd · Post by **BigEd** » Thu Feb 07, 2019 1:33 pm

(Can I suggest you do some searching, including on this forum and on the visual6502 wiki? It's very much easier (for you) to ask a question than (for me) to answer one.)
