6502.org Forum

All times are UTC




 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 6:22 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8459
Location: Southern California
BigEd wrote:
an affordable FPGA has only 64k RAM on board - are you thinking of a different meaning of on board here Garth?

I was thinking of ASICs, not FPGAs, since the school project implies something that would be cheap in high volumes and low-power. I doubt that FPGAs would qualify. For low-volume designs, we'd do it for fun, to prove a concept, or to preserve a software investment; otherwise we'd go to an ARM or another processor.

Quote:
As for the idea of fast predictable response for embedded computing, yes and no. A CPU running at 100MHz or 200MHz may well have the freedom to take a few more cycles, or a variable number of cycles, to respond to interrupts, and still be a great improvement on a 1MHz or 14MHz CPU which behaves exactly like a 6502. Or it may not - it depends on what the requirements are for any specific use case. A very fast and very cycle-efficient and entirely deterministic CPU is a difficult target to hit - again, there are engineering tradeoffs. What made sense at 1MHz in the early 70s may need adjusting forty years later.

Bill Mensch said in an interview last year that if the '02 were made in the latest (for last year) deep-submicron geometry, he expects it could do 10GHz, which would be about 2.5 GIPS.
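The 2.5 GIPS figure is consistent with an average of about four clock cycles per instruction. That average is an assumption here; the real value depends on the program's instruction mix. A minimal Python check of the arithmetic:

```python
def instructions_per_second(clock_hz: float, avg_cycles_per_instr: float) -> float:
    """Throughput of a non-pipelined CPU: clock rate divided by average cycles per instruction."""
    return clock_hz / avg_cycles_per_instr

# Assumption: ~4 cycles per instruction on average for a typical 6502 instruction mix.
for clock_hz in (1e6, 14e6, 10e9):
    ips = instructions_per_second(clock_hz, 4.0)
    print(f"{clock_hz / 1e6:>8.0f} MHz -> {ips / 1e6:>8.2f} MIPS")
```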

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 6:28 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10829
Location: England
So, for any given choice of technology, you can have a bigger memory outside the chip than you can on it. So, caches make sense at certain operating points. If you keep changing the technology so you don't need a cache, then you don't need a cache - but you haven't proved that caches are not useful!

As for chip speed, you often make the point that you like low latency and fixed latency. But your latency will always vary by one clock, or indeed by one instruction. For a conventional 6502, that's say seven cycles, or at 14MHz it's 500 ns. For a 100MHz pipelined CPU, that's some number of 10 ns cycles. I hope you see how the less deterministic system, counted in cycles, might have better behaviour when looked at in nanoseconds - just change the numbers accordingly if they don't quite convince. The real world is measured in nanoseconds, so that's what counts.
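To make the comparison concrete, here is a small Python sketch that converts worst-case latency from cycles to nanoseconds. The 7-cycle and 14 MHz figures come from the post above; the 100 MHz pipelined core is hypothetical. It computes how many cycles of worst-case latency such a core could afford before it responds more slowly, in real time, than the classic 6502.

```python
def latency_ns(cycles: float, clock_hz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds at the given clock rate."""
    return cycles * 1e9 / clock_hz

# Conventional 6502 at 14 MHz: worst case ~7 cycles for the current instruction to finish.
classic_ns = latency_ns(7, 14e6)

# Hypothetical 100 MHz pipelined core: how many cycles of worst-case latency
# fit in the same real-time budget?
cycle_ns = latency_ns(1, 100e6)
budget_cycles = int(classic_ns // cycle_ns)

print(f"6502 @ 14 MHz: {classic_ns:.0f} ns worst case")
print(f"100 MHz core: {cycle_ns:.0f} ns/cycle, break-even at {budget_cycles} cycles")
```

Under these assumptions the 100 MHz design could spend up to 50 cycles on pipeline flushes, stalls or a longer interrupt entry sequence and still match the 14 MHz part in nanoseconds.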

I don't think the 10GHz idea from Bill means much here - you won't have a single-cycle large RAM at that speed, so Bill would end up in the same part of the design space as any other of us.


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 6:55 am 

Joined: Fri Nov 09, 2012 5:54 pm
Posts: 1397
Welcome, manili.

The 6502 instruction set looks simple at first sight, but when trying to implement something 100% compatible with it,
a few subtle things in there might give you quite a headache.

As the saying goes, the devil is in the details...
To list a few things:

  • The difference in the return address between RTS and RTI, as BigEd already pointed out:
    JSR pushes the address of its own last byte, and RTS adds 1 to it,
    while an interrupt or BRK pushes the actual return address, which RTI uses unchanged.
  • For PHA/PLA etc., unlike most other CPUs, the 6502 uses post-decrement on push and pre-increment on pull.
    This means that the stack pointer decrements _after_ a byte is written to the stack,
    and increments _before_ a byte is read from the stack.
  • The 'B flag' in the status register, which identifies a BRK instruction for the interrupt service routine, isn't really a flag;
    it's a control signal from the "instruction sequencer".
    After a BRK instruction, the 6502 pushes the PC and the status register onto the stack before fetching the vector,
    and only in the status byte pushed on the stack is the bit corresponding to the B flag set to 1
    (PHP also pushes it as 1, while IRQ and NMI push it as 0).
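A behavioural Python sketch of the stack rules listed above, useful as a reference to check an implementation against. It is not cycle-accurate and models only what ends up on the stack; the class and method names are hypothetical. It assumes NMOS 6502 behaviour: the stack lives in page 1 ($0100-$01FF), JSR pushes the address of its own last byte, and BRK pushes the address two bytes past its opcode.

```python
class Stack6502:
    """Behavioural model of NMOS 6502 stack side effects (hypothetical helper)."""

    def __init__(self, sp: int = 0xFF):
        self.mem = bytearray(0x10000)  # full 64 KB address space
        self.sp = sp                   # 8-bit SP; points at the next FREE slot

    def push(self, value: int) -> None:
        # Post-decrement: write at $0100+SP first, then decrement (wraps within page 1).
        self.mem[0x0100 | self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self) -> int:
        # Pre-increment: increment SP first, then read.
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x0100 | self.sp]

    def jsr(self, jsr_addr: int) -> None:
        # JSR is 3 bytes; it pushes jsr_addr + 2 (its own last byte), high byte first.
        ret = (jsr_addr + 2) & 0xFFFF
        self.push(ret >> 8)
        self.push(ret & 0xFF)

    def rts(self) -> int:
        # RTS pulls the low then the high byte and adds 1 to reach the next instruction.
        lo = self.pull()
        hi = self.pull()
        return (((hi << 8) | lo) + 1) & 0xFFFF

    def brk(self, brk_addr: int, p: int) -> None:
        # BRK pushes brk_addr + 2 (skipping a padding byte), then P with bits 4 (B)
        # and 5 set in the pushed copy only. IRQ/NMI would push bit 4 clear instead.
        ret = (brk_addr + 2) & 0xFFFF
        self.push(ret >> 8)
        self.push(ret & 0xFF)
        self.push(p | 0x30)

    def rti(self):
        # RTI pulls P, then the return address, and uses the address unchanged.
        # Bits 4 and 5 are not real register bits, so they are dropped here.
        p = self.pull() & 0xCF
        lo = self.pull()
        hi = self.pull()
        return (hi << 8) | lo, p


s = Stack6502()
s.jsr(0x1000)
print(hex(s.rts()))      # 0x1003: the instruction after the 3-byte JSR
s.brk(0x2000, 0x00)
addr, p = s.rti()
print(hex(addr), hex(p))  # 0x2002 0x0
```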


Another discussion that might (or might not) be of interest for your project is here.

What makes the 6502 instruction decoder/sequencer a bit difficult is all those fancy addressing modes.
I would suggest keeping things as simple as possible from the start (while sticking with the NMOS 6502 instruction set),
because things have a habit of becoming more and more complicated all by themselves later on.
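As an illustration of where the addressing modes bite, here is a Python sketch of three NMOS effective-address calculations that are easy to get subtly wrong: zero-page pointer wraparound in (zp,X) and (zp),Y, the page-crossing check for (zp),Y, and the JMP ($xxFF) quirk. The function names are hypothetical, and cycle timing is only indicated by the returned page-crossing flag.

```python
def read16_zp(mem, zp: int) -> int:
    """Read a little-endian pointer from zero page; the high byte wraps $FF -> $00."""
    lo = mem[zp & 0xFF]
    hi = mem[(zp + 1) & 0xFF]
    return (hi << 8) | lo

def ea_zp_x_indirect(mem, zp: int, x: int) -> int:
    """(zp,X): add X to the operand within page 0, then fetch the pointer."""
    return read16_zp(mem, (zp + x) & 0xFF)

def ea_zp_indirect_y(mem, zp: int, y: int):
    """(zp),Y: fetch the pointer, then add Y. Also reports a page crossing,
    which adds one cycle to loads such as LDA (zp),Y."""
    base = read16_zp(mem, zp)
    addr = (base + y) & 0xFFFF
    return addr, (base & 0xFF00) != (addr & 0xFF00)

def jmp_indirect_nmos(mem, ptr: int) -> int:
    """NMOS JMP (ptr): the high byte is fetched from the same page, so
    JMP ($12FF) reads $12FF and $1200 (the 65C02 fixed this)."""
    lo = mem[ptr]
    hi = mem[(ptr & 0xFF00) | ((ptr + 1) & 0xFF)]
    return (hi << 8) | lo


mem = bytearray(0x10000)
mem[0xFF], mem[0x00] = 0x34, 0x12        # pointer straddling the end of page 0
mem[0x10], mem[0x11] = 0xF0, 0x20        # pointer $20F0
mem[0x12FF], mem[0x1200], mem[0x1300] = 0x00, 0x80, 0x90

print(hex(ea_zp_x_indirect(mem, 0xF0, 0x0F)))   # pointer read from $FF/$00, not $FF/$100
print(ea_zp_indirect_y(mem, 0x10, 0x20))        # $20F0 + $20 crosses into page $21
print(hex(jmp_indirect_nmos(mem, 0x12FF)))      # high byte from $1200, not $1300
```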

Good luck with your project,
looking forward to following your progress.


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 6:59 am 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
BigEd wrote:
The real world is measured in nanoseconds, so that's what counts.

And as systems have gotten faster, the physical limits have not improved. In the time that a hypothetical 10GHz CPU does one clock cycle, the data from external memory has traveled less than an inch across the circuit board. It is unavoidable that a real life system would require several clocks to fetch data, which means that you'd want to send data in bursts, which adds to complexity and latency for individual transfers.
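The distance can be estimated from the propagation speed on the board. A Python sketch, assuming an FR-4 stripline with a relative permittivity of about 4, so that signals travel at roughly half the speed of light; real boards vary.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def distance_per_cycle_m(clock_hz: float, rel_permittivity: float) -> float:
    """How far a signal travels along a PCB trace in one clock period."""
    velocity = C / math.sqrt(rel_permittivity)
    return velocity / clock_hz

# Assumption: FR-4 stripline, relative permittivity ~4 (velocity ~ c/2).
for clock_hz in (14e6, 100e6, 10e9):
    d = distance_per_cycle_m(clock_hz, 4.0)
    print(f"{clock_hz / 1e6:>7.0f} MHz: {d * 100:8.1f} cm ({d / 0.0254:6.1f} in) per cycle")
```

At 10 GHz this comes to about 1.5 cm (0.6 in). A read needs the address to travel out and the data to come back, so the usable one-way reach within a single cycle is roughly half that, before any driver, receiver or memory access time is counted.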


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 7:20 am 

Joined: Fri Nov 09, 2012 5:54 pm
Posts: 1397
Arlet wrote:
And as systems have gotten faster, the physical limits have not improved.

True, true.
One solution would be to have ball grid array pins on both the top and the bottom of the CPU package,
solder the CPU onto the PCB... and then stack the SDRAM on top of the CPU.

But speaking of speed:
PCs seem to get faster every year, but if you happen to be into process control
or real-time data acquisition, it feels like getting the data in and out of the PC
at a reasonable speed is getting more difficult/complicated every year, too.

Of course, having two separate bus systems in a CPU, one for memory and one for I/O,
would require a lot more pins on the chip.

For instance, the obsolete TMS320C30 had two bus systems (block diagram is on page 12).


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 7:59 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10829
Location: England
Indeed, even the humble Raspberry Pi uses Package on Package - diagram here.

But I feel we've strayed rather off-topic. The idea of this thread is a B.S. project to build a pipelined CPU somewhat like a 6502, with caches.


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 8:02 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8459
Location: Southern California
BigEd wrote:
As for chip speed, you often make the point that you like low latency and fixed latency. But your latency will always vary by one clock, or indeed by one instruction. For a conventional 6502, that's say seven cycles, or at 14MHz it's 500 ns. For a 100MHz pipelined CPU, that's some number of 10 ns cycles. I hope you see how the less deterministic system, counted in cycles, might have better behaviour when looked at in nanoseconds - just change the numbers accordingly if they don't quite convince. The real world is measured in nanoseconds, so that's what counts.

I'm not sure what you're getting at here. If the system is kept simple enough to get all the memory and processor on the same IC, you can run it faster, right? And wouldn't running from cache make it a lot less deterministic? If you're in the middle of a refill because of a cache miss when an interrupt hits, won't that cause a huge increase in latency?

Arlet wrote:
BigEd wrote:
The real world is measured in nanoseconds, so that's what counts.

And as systems have gotten faster, the physical limits have not improved. In the time that a hypothetical 10GHz CPU does one clock cycle, the data from external memory has traveled less than an inch across the circuit board. It is unavoidable that a real life system would require several clocks to fetch data, which means that you'd want to send data in bursts, which adds to complexity and latency for individual transfers.

That's why I'm saying that for the small amount of memory likely to be used in a 65 system (even an '816), all the memory can be on the same chip with the processor and I/O, and still be a much smaller die than many of the modern processors. The memory bus(es) won't go out on the PCB at all. How fast are the fastest caches?



 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 8:14 am 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Quote:
How fast are the fastest caches?

Here's some useful data on a recent high-performance CPU. It appears that the L1 cache has a 4-cycle latency and is only 32kB big. And I assume Intel hardware engineers have pulled every trick in the book to make it as fast as possible with the given technology.


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 8:14 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10829
Location: England
What I'm saying is that if you move the goalposts, you can always seem to score. There are circumstances in which caches add performance, and there are circumstances in which some variability in latency is acceptable. There are other circumstances in which the original 6502 or '816 are absolutely the ideal solution. But we ought to be able to organise our thoughts well enough to distinguish these cases.

There's yet another case, in which it's worthwhile pursuing the design and implementation of a more advanced microarchitecture, as a learning experience, even if the end result is not an engineering solution to any specific problem.

Perhaps we could take this to another thread? Something like 'Is a cache ever an advantage?' or less problematic 'When is a cache a good solution?'


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 8:21 am 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
GARTHWILSON wrote:
If you're in the middle of a refill because of a cache miss when an interrupt hits, won't that cause a huge increase in latency?

Possibly, yes. But 'huge' is a relative word. We're still talking nanoseconds here, and there are few real-life events that need 0.1 microsecond interrupt response. And for those cases where this is a requirement, people usually solve this by adding some special purpose hardware, or a dedicated I/O processor.


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 8:33 am 

Joined: Fri Nov 09, 2012 5:54 pm
Posts: 1397
IMHO a cache is a "kludge" to compensate for slow memory speed/bandwidth,
and it won't make the "real time" response of a computer "more deterministic".

The 6502 can address 64kB of memory in total.

If 64kB of RAM fits into the FPGA you choose for the CPU implementation,
the interesting question is whether we really need a cache at all when the RAM blocks
in the FPGA are "arranged" creatively. (That's why I had mentioned the TMS320C30 DSP, too.)


Before we get lost in an endless debate about processor architecture,
it would certainly be good to know what FPGA hardware (or evaluation board)
manili already has (or intends to buy) for his project...

...and then to sort out what is feasible with the hardware and what is not. ;)


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 8:36 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10829
Location: England
I'm not even sure that's true - the idea is to make a successful B.S. project which demonstrates some level of engineering understanding. Even if it goes really slow, has some bugs, doesn't fit on the FPGA, it still could be a successful project depending on how it is written up.

We need a completely separate discussion for projects we might like to make for other purposes. Some of us are thinking about projects to learn and demonstrate, and others of us are thinking about projects with commercial applicability. We are not all on the same page!


Top
 Profile  
Reply with quote  
 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 8:49 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8459
Location: Southern California
Arlet wrote:
Quote:
How fast are the fastest caches?

Here's some useful data on a recent high-performance CPU. It appears that the L1 cache has a 4-cycle latency and is only 32kB big. And I assume Intel hardware engineers have pulled every trick in the book to make it as fast as possible with the given technology.

I don't see anything there about nanoseconds. I've seen leaded SRAM ICs down to about 5ns, truly random-access memory, i.e., not burst mode with a latency to get the burst going, nor requiring the following bytes to come from successive addresses.

Quote:
Perhaps we could take this to another thread? Something like 'Is a cache ever an advantage?' or less problematic 'When is a cache a good solution?'

Good idea. Go ahead. Much appreciated.

Arlet wrote:
GARTHWILSON wrote:
If you're in the middle of a refill because of a cache miss when an interrupt hits, won't that cause a huge increase in latency?

Possibly, yes. But 'huge' is a relative word. We're still talking nanoseconds here, and there are few real-life events that need 0.1 microsecond interrupt response. And for those cases where this is a requirement, people usually solve this by adding some special purpose hardware, or a dedicated I/O processor.

If you have to load 1K at a time, at 100MB/s (for the sake of discussion), that's 10µs, which is a long, long time to wait for interrupt service on a processor designed for performance high enough to justify having caches. However, if interrupt and direct (non-cache) memory performance is high enough, you can eliminate the separate sound cards, and going to the hypothetical extreme, even video cards or video chip sets, in the hypothetical case of a timer-driven interrupt for each pixel.
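The refill arithmetic, as a Python sketch. It uses the 100 MB/s figure from the post and ignores the initial access latency; the 64-byte line is added only for comparison, as an assumed, more typical line size.

```python
def refill_time_s(line_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Transfer time for one cache line at a given bandwidth (access latency ignored)."""
    return line_bytes / bandwidth_bytes_per_s

BANDWIDTH = 100e6  # 100 MB/s, the figure used in the post "for the sake of discussion"

for line_bytes in (1024, 64):  # 1 KB from the post; 64 bytes assumed for comparison
    t = refill_time_s(line_bytes, BANDWIDTH)
    print(f"{line_bytes:5d}-byte refill: {t * 1e6:6.2f} us")
```

1 KB works out to 10.24 µs, matching the ~10 µs above, while a 64-byte line takes 0.64 µs. The worst-case interrupt latency would then be roughly the refill time plus the time to finish the interrupted instruction, so line size matters as much as bandwidth.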



 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 8:53 am 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
GARTHWILSON wrote:
I don't see anything there about nanoseconds.

On the first line it mentions that the clock is 3.4 GHz, so 4 cycles would be just over 1 nanosecond for the L1 cache. The L3 cache has a latency of 36 cycles, or just over 10 nanoseconds.
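Converting in both directions makes the comparison with the 5 ns SRAM mentioned above easier. A Python sketch using the 3.4 GHz clock and the 4- and 36-cycle latencies quoted in this post; the helper names are hypothetical.

```python
CLOCK_HZ = 3.4e9  # clock rate quoted for the CPU in the linked data

def cycles_to_ns(cycles: float, clock_hz: float = CLOCK_HZ) -> float:
    """Latency in nanoseconds for a given number of clock cycles."""
    return cycles * 1e9 / clock_hz

def ns_to_cycles(ns: float, clock_hz: float = CLOCK_HZ) -> float:
    """Number of clock cycles spanned by a latency given in nanoseconds."""
    return ns * clock_hz / 1e9

for level, cycles in (("L1", 4), ("L3", 36)):
    print(f"{level} cache: {cycles:2d} cycles = {cycles_to_ns(cycles):5.2f} ns")

# For comparison: a 5 ns asynchronous SRAM access expressed in 3.4 GHz cycles.
print(f"5 ns SRAM: {ns_to_cycles(5):.0f} cycles")
```

This gives about 1.18 ns for L1 and 10.59 ns for L3, while a 5 ns SRAM access would span 17 cycles at 3.4 GHz.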


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 9:04 am 

Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
Thank you all for this lively discussion :mrgreen:!

The topic is growing really fast, and I don't know how to answer all the previous REPLYs (that was a typo, and thanks for pointing it out :mrgreen:).

First things first: special thanks to everyone who took part in this discussion and helped me learn so much, and to the many who encouraged me to continue this project (as it becomes more difficult day after day). I'm currently busy working on the stack-related instructions, and believe me, they are really hard to implement!
There are two important points here:
1. As I said before, this is going to be a B.S. project, and the last step is just to synthesize it, so there is no FPGA or ASIC target. Everything I have done or will do should be seen in that light (including implementing caches, 6 pipeline stages, etc.).
2. I'm well into the project (about 80% done), so I can't just tear the whole thing down and rebuild it. That's impossible because of point #1.

I think one of the most important points Garth raised is "Why WISHBONE?"
1. Again, see point #1 :mrgreen:.
2. Many of the opencores.org IP cores are WISHBONE-compatible, so they can talk to my processor without any problem. This is the main reason for choosing WISHBONE. You can also use memories/peripherals with different clock speeds alongside the processor.
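For readers unfamiliar with WISHBONE, a simplified Python model of a classic single read cycle may help: the master holds CYC, STB and ADR asserted until the slave raises ACK, so a slower slave simply inserts wait states by delaying ACK. This is a behavioural sketch, not RTL; the class and function names are hypothetical, and only the signal names follow the WISHBONE specification.

```python
from dataclasses import dataclass, field

@dataclass
class WbSlave:
    """Hypothetical WISHBONE classic slave that delays ACK by a fixed number of wait states."""
    wait_states: int
    mem: dict = field(default_factory=dict)
    _waited: int = 0

    def clock(self, cyc: bool, stb: bool, we: bool, adr: int, dat_w: int = 0):
        """Evaluate one clock edge; returns (ACK_I, DAT_I) as seen by the master."""
        if not (cyc and stb):          # no cycle in progress
            self._waited = 0
            return False, 0
        if self._waited < self.wait_states:
            self._waited += 1          # not ready yet: keep ACK low
            return False, 0
        self._waited = 0
        if we:
            self.mem[adr] = dat_w
            return True, 0
        return True, self.mem.get(adr, 0)

def wb_read(slave: WbSlave, adr: int):
    """Master side of a single read: hold CYC_O/STB_O/ADR_O until ACK_I. Returns (data, clocks)."""
    clocks = 0
    while True:
        clocks += 1
        ack, dat = slave.clock(cyc=True, stb=True, we=False, adr=adr)
        if ack:
            return dat, clocks         # master would now drop STB_O (and CYC_O)

fast = WbSlave(wait_states=0, mem={0x10: 0x42})
slow = WbSlave(wait_states=2, mem={0x10: 0x42})
print(wb_read(fast, 0x10), wb_read(slow, 0x10))
```

The same read takes one clock from the fast slave and three from the slow one; the master logic does not change, which is what lets cores of different speeds share the bus.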

Again, thank you all for your REPLIES :D.

P.S.: Again, sorry for my bad English.


Last edited by manili on Mon Oct 10, 2016 9:32 am, edited 3 times in total.
