Pipelined 6502
- GARTHWILSON
- Forum Moderator
- Posts: 8775
- Joined: 30 Aug 2002
- Location: Southern California
Re: Pipelined 6502
BigEd wrote:
an affordable FPGA has only 64k RAM on board - are you thinking of a different meaning of on board here Garth?
Quote:
As for the idea of fast predictable response for embedded computing, yes and no. A CPU running at 100MHz or 200MHz may well have the freedom to take a few more cycles, or a variable number of cycles, to respond to interrupts, and still be a great improvement on a 1MHz or 14MHz CPU which behaves exactly like a 6502. Or it may not - it depends on what the requirements are for any specific use case. A very fast and very cycle-efficient and entirely deterministic CPU is a difficult target to hit - again, there are engineering tradeoffs. What made sense at 1MHz in the early 70s may need adjusting forty years later.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Pipelined 6502
So, for any given choice of technology, you can have a bigger memory outside the chip than you can on it. So, caches make sense at certain operating points. If you keep changing the technology so you don't need a cache, then you don't need a cache - but you haven't proved that caches are not useful!
As for chip speed, you often make the point that you like low latency and fixed latency. But your latency will always vary by one clock, or indeed by one instruction. For a conventional 6502, that's say seven cycles, or at 14MHz it's 500ns. For a 100MHz pipelined CPU, that's some number of 10ns cycles. I hope you see how it might be that the less deterministic system counted in cycles might have better behaviour when looked at in nanoseconds - just change the numbers accordingly if they don't quite convince. The real world is measured in nanoseconds, so that's what counts.
I don't think the 10GHz idea from Bill means much here - you won't have a single-cycle large RAM at that speed, so Bill would end up in the same part of the design space as any other of us.
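To put rough numbers on the cycles-versus-nanoseconds point, here is a minimal Python sketch. The seven-cycle figure and the 14MHz and 100MHz clocks come from the post above; the function name and the "break-even" calculation are only illustrative assumptions, not a measurement of any real core.

```python
def cycles_to_ns(cycles, clock_hz):
    """Convert a cycle count at a given clock frequency to nanoseconds."""
    return cycles * 1e9 / clock_hz

# Conventional 6502 at 14 MHz: latency can vary by one instruction,
# taken here as seven cycles (figure from the post above).
classic_jitter_ns = cycles_to_ns(7, 14_000_000)

# Pipelined core at 100 MHz: one cycle is 10 ns.
pipelined_cycle_ns = cycles_to_ns(1, 100_000_000)

# Hypothetical break-even: how many 100 MHz cycles of variation would
# equal the jitter of the 14 MHz part?
break_even_cycles = classic_jitter_ns / pipelined_cycle_ns

print(classic_jitter_ns, pipelined_cycle_ns, break_even_cycles)
```

Under these assumptions the 14MHz part varies by 500ns, so the 100MHz core could vary by up to 50 of its own cycles before its jitter is worse in absolute time.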
Re: Pipelined 6502
Welcome, manili.
The 6502 instruction set looks simple at first sight, but when trying to implement something 100% compatible with it,
a few subtle things in there might give you quite a headache.
There is a saying that the devil hides in the details...
To list a few things:
- The difference in the return address between RTS and RTI, as BigEd has already pointed out.
- For PHA\PLA etc., unlike most other CPUs, the 6502 uses post-decrement for pushes and pre-increment for pulls,
which means that the stack pointer decrements _after_ a byte is written to the stack,
and increments _before_ a byte is read from the stack.
- The 'B flag' in the status register, which identifies a BRK instruction for the interrupt service routine, isn't really a flag;
it's a control signal from the "instruction sequencer".
That is, after a BRK instruction, the 6502 pushes the PC and the status register onto the stack before fetching the vector,
and only in the status byte pushed onto the stack is the bit that represents the B flag set to 1.
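The three points above can be sketched as a toy Python model. This is only a behavioural illustration, not a cycle-accurate 6502: the class name `Mini6502Stack` and its methods are made up for this example, and only the stack, JSR/RTS, and BRK/RTI behaviour is modelled.

```python
class Mini6502Stack:
    """Toy model of the 6502 stack page and a few control-flow details."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.sp = 0xFF   # stack pointer, an offset into page $01
        self.pc = 0x0000
        self.p = 0x20    # status register; bit 5 always reads as 1

    def push(self, value):
        # Write first, then decrement (post-decrement).
        self.mem[0x0100 | self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        # Increment first, then read (pre-increment).
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x0100 | self.sp]

    def jsr(self, target):
        # pc points at the 3-byte JSR; the pushed address is that of its
        # last byte, i.e. the next instruction minus one.
        ret = (self.pc + 2) & 0xFFFF
        self.push(ret >> 8)
        self.push(ret & 0xFF)
        self.pc = target

    def rts(self):
        lo = self.pull()
        hi = self.pull()
        self.pc = (((hi << 8) | lo) + 1) & 0xFFFF  # RTS adds one

    def brk(self):
        # BRK pushes PC+2, then the status byte with bit 4 (B) set
        # in the pushed copy only; self.p itself never gains a B bit.
        ret = (self.pc + 2) & 0xFFFF
        self.push(ret >> 8)
        self.push(ret & 0xFF)
        self.push(self.p | 0x10)
        self.p |= 0x04  # set the interrupt-disable flag
        self.pc = self.mem[0xFFFE] | (self.mem[0xFFFF] << 8)

    def rti(self):
        # Restore status (B is not a real flag, so it is dropped),
        # then the PC exactly as pushed: no +1, unlike RTS.
        self.p = (self.pull() & 0xEF) | 0x20
        lo = self.pull()
        hi = self.pull()
        self.pc = (hi << 8) | lo


cpu = Mini6502Stack()
cpu.pc = 0x8000
cpu.jsr(0x9000)          # pushes $8002
cpu.rts()                # returns to $8003
pc_after_rts = cpu.pc

cpu.mem[0xFFFE], cpu.mem[0xFFFF] = 0x00, 0xC0   # IRQ/BRK vector -> $C000
cpu.brk()
pushed_status = cpu.mem[0x0100 | ((cpu.sp + 1) & 0xFF)]
live_status = cpu.p
cpu.rti()                # returns to $8005
pc_after_rti = cpu.pc
```

In this model the B bit shows up only in `pushed_status`, never in `live_status`, which is the distinction an interrupt service routine has to rely on.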
Another discussion that might (or might not) be of interest for your project is here.
What makes the 6502 instruction decoder\sequencer a bit difficult are all those fancy addressing modes.
I would suggest keeping things as simple as possible from the start (while sticking with the NMOS 6502 instruction set),
because things have a habit of becoming more and more complicated all by themselves later.
Good luck with your project,
looking forward to following your progress.
Re: Pipelined 6502
BigEd wrote:
The real world is measured in nanoseconds, so that's what counts.
Re: Pipelined 6502
Arlet wrote:
And as systems have gotten faster, the physical limits have not improved.
One solution would be having ball grid array pins on top and at the bottom of the CPU chip,
then to solder the CPU chip into the PCB... and SDRAM on top of the CPU chip.
But speaking of speed:
PCs seem to be getting faster every year, but if you happen to be into process control
or real-time data acquisition stuff, it feels like getting the data in and out of the PC
at a reasonable speed is getting more and more difficult\complicated every year, too.
Of course, having two separate bus systems in a CPU, one for memory and one for I\O,
would require a lot more pins on the chip.
For instance, the obsolete TMS320C30 had two bus systems (block diagram is on page 12).
Re: Pipelined 6502
Indeed, even the humble Raspberry Pi uses Package on Package - diagram here.
But I feel we've strayed rather off-topic. The idea of this thread is a B.S. project to build a pipelined CPU somewhat like a 6502, with caches.
- GARTHWILSON
Re: Pipelined 6502
BigEd wrote:
As for chip speed, you often make the point that you like low latency and fixed latency. But your latency will always vary by one clock, or indeed by one instruction. For a conventional 6502, that's say seven cycles, or at 14MHz it's 500ns. For a 100MHz pipelined CPU, that's some number of 10ns cycles. I hope you see how it might be that the less deterministic system counted in cycles might have better behaviour when looked at in nanoseconds - just change the numbers accordingly if they don't quite convince. The real world is measured in nanoseconds, so that's what counts.
I'm not sure what you're getting at here. If the system is kept simple enough to get all the memory and processor on the same IC, you can run it faster, right? And wouldn't running from cache make it a lot less deterministic? If you're in the middle of a refill because of a cache miss when an interrupt hits, won't that cause a huge increase in latency?
Arlet wrote:
BigEd wrote:
The real world is measured in nanoseconds, so that's what counts.
That's why I'm saying that for the small amount of memory likely to be used in a 65 system (even an '816), all the memory can be on the same chip with the processor and I/O, and still be a much smaller die than many of the modern processors. The memory bus(es) won't go out on the PCB at all. How fast are the fastest caches?
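The determinism question can be made concrete with a small Python sketch of a direct-mapped cache. Everything here is an assumption for illustration: the class `DirectMappedCache`, the 16-line by 16-byte geometry, the 1-cycle hit and 20-cycle miss penalty, and the addresses are all hypothetical and do not describe manili's design or any real part.

```python
class DirectMappedCache:
    """Minimal direct-mapped cache that only tracks tags and cycle cost."""

    def __init__(self, num_lines=16, line_bytes=16,
                 hit_cycles=1, miss_penalty=20):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.hit_cycles = hit_cycles
        self.miss_penalty = miss_penalty
        self.tags = [None] * num_lines  # one tag per cache line

    def access(self, addr):
        """Return the number of cycles this access costs."""
        line = addr // self.line_bytes
        index = line % self.num_lines
        tag = line // self.num_lines
        if self.tags[index] == tag:
            return self.hit_cycles
        # Miss: refill the line, then deliver the data.
        self.tags[index] = tag
        return self.hit_cycles + self.miss_penalty


cache = DirectMappedCache()
HANDLER = 0xC040   # hypothetical first instruction of an interrupt handler
MAINLOOP = 0x8040  # hypothetical main-loop code mapping to the same line

cold = cache.access(HANDLER)      # first interrupt: miss
warm = cache.access(HANDLER)      # handler still cached: hit
cache.access(MAINLOOP)            # main loop evicts the handler's line
evicted = cache.access(HANDLER)   # next interrupt: miss again

print(cold, warm, evicted)
```

With these made-up numbers the same handler fetch costs either 1 or 21 cycles depending on what ran before it, so the latency is indeed less deterministic in cycles. Whether the worst case is acceptable still depends on the clock rate and the requirements, which is the point under debate.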
Re: Pipelined 6502
Quote:
How fast are the fastest caches?
Re: Pipelined 6502
What I'm saying is that if you move the goalposts, you can always seem to score. There are circumstances in which caches add performance, and there are circumstances in which some variability in latency is acceptable. There are other circumstances in which the original 6502 or '816 are absolutely the ideal solution. But we ought to be able to organise our thoughts well enough to distinguish these cases.
There's yet another case, in which it's worthwhile pursuing the design and implementation of a more advanced microarchitecture, as a learning experience, even if the end result is not an engineering solution to any specific problem.
Perhaps we could take this to another thread? Something like 'Is a cache ever an advantage?' or less problematic 'When is a cache a good solution?'
Re: Pipelined 6502
GARTHWILSON wrote:
If you're in the middle of a refill because of a cache miss when an interrupt hits, won't that cause a huge increase in latency?
Re: Pipelined 6502
IMHO a cache is a "kludge" to compensate for slow memory speed\bandwidth,
and it won't make the "real time" response of a computer "more deterministic".
The 6502 can address 64kB of memory in total.
If 64kB of RAM fits into the FPGA you choose for the CPU implementation,
the interesting question is whether we really need a cache at all when creatively "arranging"
the RAM blocks in the FPGA. //That's why I had mentioned the TMS320C30 DSP, too.
Before we get lost in an endless debate about processor architecture,
it certainly would be good to know what FPGA hardware (or evaluation board)
manili already has (or intends to buy) for his project...
...and then to sort out what is feasible with the hardware and what is not.
Re: Pipelined 6502
I'm not even sure that's true - the idea is to make a successful B.S. project which demonstrates some level of engineering understanding. Even if it goes really slow, has some bugs, doesn't fit on the FPGA, it still could be a successful project depending on how it is written up.
We need a completely separate discussion for projects we might like to make for other purposes. Some of us are thinking about projects to learn and demonstrate, and others of us are thinking about projects with commercial applicability. We are not all on the same page!
- GARTHWILSON
Re: Pipelined 6502
Arlet wrote:
Quote:
How fast are the fastest caches?
Quote:
Perhaps we could take this to another thread? Something like 'Is a cache ever an advantage?' or less problematic 'When is a cache a good solution?'
Arlet wrote:
GARTHWILSON wrote:
If you're in the middle of a refill because of a cache miss when an interrupt hits, won't that cause a huge increase in latency?
Re: Pipelined 6502
GARTHWILSON wrote:
I don't see anything there about nanoseconds.
Re: Pipelined 6502
Thank you all for this hot discussion!
The topic is growing really fast, and I don't know how to answer the previous replies (that was a typo, and thanks for pointing it out).
First things first: my special thanks to everyone who took part in this discussion and helped me learn so much. Most of you encouraged me to continue this project (as it becomes more difficult day after day). I'm currently busy working on the stack-related instructions, and believe me, they are really hard to implement!
There are two important points here:
1. As I said before, this is going to be a B.S. project, and the last step is just to synthesize it. So there is no FPGA or ASIC involved. Anything I do, or have done, should be seen within this scope (including implementing caches, 6 pipeline stages, etc.).
2. I'm well into the project (80%), so I can't just tear the whole thing down and rebuild it. That's impossible because of point #1.
I think one of the most important questions Garth raised is "Why WISHBONE?"
1. Again, see point #1 above.
2. Many of the opencores.org IP cores are WISHBONE compatible, so they can talk to my processor without any problem. This is the main reason for choosing WISHBONE. It also lets you use memories/peripherals that run at different speeds alongside the processor.
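One way to picture how a WISHBONE master copes with slaves of different speeds is the classic handshake, where the master holds CYC and STB asserted until the slave answers with ACK. The Python model below is only a behavioural sketch under that assumption: the class `WishboneSlaveRam`, the function `wb_transfer`, and the wait-state counts are hypothetical, and it is not taken from manili's design or from any opencores.org core.

```python
class WishboneSlaveRam:
    """Toy slave: a small RAM that answers after a fixed number of wait states."""

    def __init__(self, size, wait_states):
        self.mem = [0] * size
        self.wait_states = wait_states
        self._countdown = None  # None means no cycle in progress

    def clock(self, cyc, stb, we, adr, dat_w):
        """Model one clock edge; return (ack, dat_r)."""
        if not (cyc and stb):
            self._countdown = None
            return False, 0
        if self._countdown is None:
            self._countdown = self.wait_states
        if self._countdown > 0:
            self._countdown -= 1
            return False, 0   # still busy: ACK stays low
        self._countdown = None
        if we:
            self.mem[adr] = dat_w
            return True, 0
        return True, self.mem[adr]


def wb_transfer(slave, adr, we=False, dat_w=0, timeout=100):
    """Master side: hold CYC and STB until ACK; return (data, clocks used)."""
    for clocks in range(1, timeout + 1):
        ack, dat_r = slave.clock(True, True, we, adr, dat_w)
        if ack:
            slave.clock(False, False, False, 0, 0)  # end the bus cycle
            return dat_r, clocks
    raise TimeoutError("slave never acknowledged")


fast_ram = WishboneSlaveRam(size=16, wait_states=0)
slow_ram = WishboneSlaveRam(size=16, wait_states=3)

wb_transfer(fast_ram, 5, we=True, dat_w=0x42)
wb_transfer(slow_ram, 5, we=True, dat_w=0x99)
fast_read = wb_transfer(fast_ram, 5)
slow_read = wb_transfer(slow_ram, 5)
print(fast_read, slow_read)
```

The master code is identical for both slaves; only the number of clocks it waits for ACK differs. For a pipelined CPU that wait turns into a stall, so the bus choice and the latency discussion in this thread are connected.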
Again, thank you all for your replies.
P.S.: Again, sorry for my bad English.
Last edited by manili on Mon Oct 10, 2016 9:32 am, edited 3 times in total.