6502.org • View topic - 6502/65816 Pipeline


All times are UTC

6502/65816 Pipeline


BigEd

Post subject: Re: 6502/65816 Pipeline

Posted: Fri Mar 27, 2015 11:08 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England

Hi cr1901
just in case it helps, here's the tabulation from the visual6502 simulation:

Code:

cycle  ab    db  rw  Fetch  pc    a   x   y   s   p         ir
0      0000  88  1   DEY    0000  aa  00  00  fd  nv‑BdIZc  00
0      0000  88  1   DEY    0000  aa  00  00  fd  nv‑BdIZc  00
1      0001  d0  1          0001  aa  00  00  fd  nv‑BdIZc  88
1      0001  d0  1          0001  aa  00  00  fd  nv‑BdIZc  88
2      0001  d0  1   BNE    0001  aa  00  00  fd  nv‑BdIZc  88
2      0001  d0  1   BNE    0001  aa  00  00  fd  nv‑BdIZc  88
3      0002  02  1          0002  aa  00  ff  fd  Nv‑BdIzc  d0
3      0002  02  1          0002  aa  00  ff  fd  Nv‑BdIzc  d0
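
For anyone who wants to check the tabulation above mechanically, here is a minimal Python sketch. It is not part of visual6502: the helper names (`parse_row`, `first_change`) are made up for illustration, the rows are retyped from the table with a plain ASCII hyphen in the flags, and it assumes the Fetch column is the only one that can be empty.

```python
# Rows retyped from the tabulation above; each cycle appears twice (one row per half-cycle).
TRACE = """\
0 0000 88 1 DEY 0000 aa 00 00 fd nv-BdIZc 00
0 0000 88 1 DEY 0000 aa 00 00 fd nv-BdIZc 00
1 0001 d0 1 0001 aa 00 00 fd nv-BdIZc 88
1 0001 d0 1 0001 aa 00 00 fd nv-BdIZc 88
2 0001 d0 1 BNE 0001 aa 00 00 fd nv-BdIZc 88
2 0001 d0 1 BNE 0001 aa 00 00 fd nv-BdIZc 88
3 0002 02 1 0002 aa 00 ff fd Nv-BdIzc d0
3 0002 02 1 0002 aa 00 ff fd Nv-BdIzc d0
"""

COLUMNS = ["cycle", "ab", "db", "rw", "fetch", "pc", "a", "x", "y", "s", "p", "ir"]


def parse_row(line):
    """Split one trace row into a dict, padding the Fetch column when it is blank."""
    fields = line.split()
    if len(fields) == len(COLUMNS) - 1:
        fields.insert(4, "")  # assumption: only the Fetch column is ever empty
    if len(fields) != len(COLUMNS):
        raise ValueError(f"unexpected row: {line!r}")
    row = dict(zip(COLUMNS, fields))
    row["cycle"] = int(row["cycle"])
    return row


def first_change(rows, column):
    """Return the first row where `column` differs from its initial value, or None."""
    initial = rows[0][column]
    for row in rows:
        if row[column] != initial:
            return row
    return None


rows = [parse_row(line) for line in TRACE.splitlines() if line.strip()]

# First cycle in which each mnemonic shows up in the Fetch column.
fetch_cycle = {}
for row in rows:
    if row["fetch"]:
        fetch_cycle.setdefault(row["fetch"], row["cycle"])

y_update = first_change(rows, "y")
y_cycle = y_update["cycle"]
ir_at_update = y_update["ir"]

print("fetch cycles:", fetch_cycle)
print("Y changes in cycle", y_cycle, "while IR holds", ir_at_update)
```

The point it makes is the same one the table makes by eye: the two fetches are two cycles apart, and Y only shows its new value in the cycle where IR already holds the BNE opcode.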

We can all agree that adding a DEY to a program will add two cycles. As Michael puts it:

Quote:

As pointed out here (or on another thread), a rudimentary amount of overlapped instruction fetch and execution is provided. Perhaps it may be more clearly stated to say that these processors try to overlap the instruction fetch and write back phases of an instruction's execution.

From a CPU implementation perspective, I think it is helpful to see four cycles of activity corresponding to that DEY - two or three of which are overlapped. The final write back cycle is worth including because there is no overlap in the case where it's a memory store, only when it's a write back to a register. For the case of an instruction like DEY, we have:
Cycle 0: fetch, maybe overlapped with previous instruction's writeback
Cycle 1: decode
Cycle 2: execute, may be overlapped with next instruction's fetch
Cycle 3: writeback, may be overlapped with next instruction's decode
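
The four-cycle view above can be sketched as a toy schedule in Python. This is only an illustration under stated assumptions, not a model of the real chip: it covers only back-to-back single-byte register instructions of the DEY kind, the second instruction (INX) is a hypothetical stand-in, and `ISSUE_INTERVAL`, `schedule` and `incremental_cycles` are names invented here.

```python
STAGES = ["fetch", "decode", "execute", "writeback"]

# Assumption: the next instruction's fetch lines up with the current execute stage,
# so a new instruction starts every two cycles.
ISSUE_INTERVAL = 2


def schedule(instructions):
    """Map each cycle number to the (instruction, stage) pairs active in that cycle."""
    timeline = {}
    for index, name in enumerate(instructions):
        start = index * ISSUE_INTERVAL
        for offset, stage in enumerate(STAGES):
            timeline.setdefault(start + offset, []).append((name, stage))
    return timeline


def incremental_cycles(instructions):
    """Cycles the sequence adds to a program: first fetch to the fetch of what follows."""
    return len(instructions) * ISSUE_INTERVAL


timeline = schedule(["DEY", "INX"])
for cycle in sorted(timeline):
    activity = ", ".join(f"{name} {stage}" for name, stage in timeline[cycle])
    print(f"cycle {cycle}: {activity}")

print("one DEY adds", incremental_cycles(["DEY"]), "cycles")
```

Each instruction is active for four cycles in this picture, yet it only costs two, because the first instruction's execute and writeback share cycles 2 and 3 with the next instruction's fetch and decode.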

If you trace also the signals sb, alu, DPControl in visual6502, you'll see that the new value for Y is written back to the Y register during the first half of cycle 3, over the sb (Special Bus) - therefore it's valid to say that the 6502 is still working on the DEY in that cycle, even though the IR now contains the next instruction. There are pipelined state bits in the random control logic which continue to control the datapath with the appropriate actions for the previous instruction. For example, node 460 goes low in the second half of Cycle 2, and then dpc1_SBY is valid in the first half of Cycle 3 for the writeback.

It can be useful also to trace the Execute and State pseudosignals, or to use the Logmore button.
http://visual6502.org/JSSim/expert.html ... ,DPControl

If you're not thinking about CPU implementation, but only thinking about programming the CPU, then you don't need to go into such fine detail - you probably only care about the incremental extra cycles for each instruction, if you even care about that. Of course, the datasheets are written for programmers, not for CPU designers.

Hope this helps
Ed


cr1901

Post subject: Re: 6502/65816 Pipeline

Posted: Sat Sep 12, 2015 8:28 am

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158

Just to make sure... do all 6502 clock cycle counts take into account that the last cycle may not be doing anything besides fetching the next byte, in the case that the instruction needs to write/read memory (for example, STA $FE00 takes four cycles, but the write is committed during the third clock cycle)?

FWIW, I'm looking at a Verilog 6502 core's signals: the cycle count is identical to an ASIC 6502, but what's actually on the bus during a given clock cycle may not be.


BigEd

Post subject: Re: 6502/65816 Pipeline

Posted: Sat Sep 12, 2015 8:45 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England

Arlet's core is quite different from most 6502s, in the bus timing. So let's put that to one side for a moment...

The NMOS 6502 will indeed do a write on the fourth cycle of STA absolute, and you're right to say that it sets up that write in the previous cycle. And yes, in a sense the core can't do anything useful during the write - there are no flags to update, no registers to change, and it can't use the bus to fetch the next opcode. From an outside perspective, cycle 4 is a write. That's the usual perspective. From an inside perspective, it's a wait. (There might be a small optimisation for a core: leave that write pending, fetch the next opcode instead, and perform the write during the following cycle. If that fetched opcode turns out to be a single byte, that's a win. But this will come at some cost in complexity and verification!)
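
To make the outside view concrete, here is a small Python sketch of the bus activity for STA absolute, one entry per cycle. The function name and the example values for the program counter and accumulator are assumptions chosen for illustration; the opcode ($8D) and the low-byte-first operand order are standard 6502 facts.

```python
STA_ABSOLUTE = 0x8D  # opcode for STA with absolute addressing


def sta_absolute_bus_cycles(pc, target, a):
    """List (cycle, address, direction, data) for each bus cycle of STA absolute."""
    low, high = target & 0xFF, (target >> 8) & 0xFF
    return [
        (1, pc, "read", STA_ABSOLUTE),           # opcode fetch
        (2, (pc + 1) & 0xFFFF, "read", low),     # operand low byte
        (3, (pc + 2) & 0xFFFF, "read", high),    # operand high byte, write is set up here
        (4, target, "write", a),                 # the store itself
    ]


# Hypothetical example: STA $FE00 located at $0200, with A = $AA.
cycles = sta_absolute_bus_cycles(pc=0x0200, target=0xFE00, a=0xAA)
for cycle, address, direction, data in cycles:
    print(f"cycle {cycle}: {direction:5} ${address:04X} = ${data:02X}")

writes = [entry for entry in cycles if entry[2] == "write"]
```

The only write in the list sits in cycle 4, after three reads that fetch the opcode and the two address bytes.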

For most (well, many) instructions, the "last" cycle is actually only the last bus cycle. The next cycle is used to fetch the next instruction and is counted as part of that instruction, but it's also used to update the registers and flags - it's the write-back.


Rob Finch

Post subject: Re: 6502/65816 Pipeline

Posted: Sat Sep 12, 2015 12:35 pm

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 452
Location: Canada

The 65xx typically moves one byte per clock cycle, which is what makes it so fast. To get even better performance, the processor has to move more than one byte per cycle. In order to improve performance in the RTF65003, I have the processor read all instruction bytes from a cache in a single clock cycle, so that a number of longer instructions (e.g. LDA $123456) take fewer clock cycles to execute than they would on a regular '02 / '816. The LDA $123456 takes only a single cycle to read its four bytes, as opposed to the four cycles on the '816.
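
As a rough illustration of the arithmetic, the following Python sketch counts only the cycles spent reading instruction bytes. It is not the RTF65003's actual logic; `fetch_cycles` and its parameters are hypothetical names, and the data read that follows the fetch is deliberately left out.

```python
def fetch_cycles(instruction_length, bytes_per_cycle):
    """Cycles needed to read every byte of one instruction, rounded up."""
    if instruction_length < 1 or bytes_per_cycle < 1:
        raise ValueError("lengths and widths must be at least 1")
    return -(-instruction_length // bytes_per_cycle)  # ceiling division


LDA_LONG_LENGTH = 4  # LDA $123456: one opcode byte plus a three-byte address

byte_wide = fetch_cycles(LDA_LONG_LENGTH, bytes_per_cycle=1)
cache_wide = fetch_cycles(LDA_LONG_LENGTH, bytes_per_cycle=4)

print("byte-wide bus:", byte_wide, "fetch cycles")
print("four-byte cache read:", cache_wide, "fetch cycle")
```

With a byte-wide bus the fetch cost equals the instruction length, so the saving grows with longer addressing modes and disappears for single-byte instructions.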

For higher performance the '816 could probably benefit from an instruction cache. But it really needs to also be able to read more than one byte of an instruction per clock cycle which would change the processor quite a bit.

_________________
http://www.finitron.ca


BigEd

Post subject: Re: 6502/65816 Pipeline

Posted: Sat Sep 12, 2015 12:41 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England

The other trick that's been proposed but not, I think, implemented by anyone yet is to offer a two-byte-wide port to an on-chip memory for page zero and maybe also page one. You'd end up with a core that has a fetch port, a read port, a write port, and a low memory port.

