All times are UTC




Post new topic Reply to topic  [ 33 posts ]  Go to page Previous  1, 2, 3  Next
Author Message
PostPosted: Thu Mar 21, 2013 9:35 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I have no doubt that x86 instruction decoding is amazingly complex, but in large part that's due to optimizations. For instance, you can unroll loops, combine instructions or split them up, rename memory to registers, or do different things based on the rest of the state. Variable-length instructions are annoying for alignment, but they keep cache efficiency high.

You can simplify the instruction set, but if you still want to do all the optimizations, I don't think you can save a lot on the instruction decoder. And if you leave out all the optimizations, you save area and power, but then you're going to be stuck with a performance gap. I think that's where ARM is now. Lower power, but also lower performance (compared to the high end x86 stuff), and I don't think they can close the performance gap without also increasing power consumption and complexity. As ugly as the x86 instruction set may look, it does its job remarkably well.


PostPosted: Thu Mar 21, 2013 9:37 pm
Joined: Wed Jul 07, 2010 4:51 am
Posts: 18
Risc v Cisc: A history lesson

The way to maximize a cpu's performance is to match the instruction cycle period to the memory cycle period.

Back in the 60's memory accesses were SLOWwwwwwww while transistor based cpu instructions were relatively fast. So everyone tried to make their instructions big and complex to accomplish the most work for each memory access.

In the 70's we started to put memory on silicon and the memory cycle times started to close in on the instruction cycle times. The Risc movement realized that this meant if you wanted maximum performance then you must limit each instruction to only what could be done in a single memory access and predicted that Risc would one day rule the world as instructions shrank to fit in the ever shrinking memory cycle.

In the 90's we discovered that clock doubling and internal caches could push back the day when you would have to switch over from cisc to risc.

In the end it wasn't memory or instruction cycle times that made any difference at all.

It was the amount of time it would take to rewrite all that IBM PC software that mattered.

John Eaton


PostPosted: Thu Mar 21, 2013 9:48 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10949
Location: England
It will certainly be very interesting to see how ARM64 shapes up - it's ARM's, and perhaps RISC's, big chance to show where on power/performance/cost they can get to.

Edit: But see also IBM's POWER7 - highest clocked commercial cpu at 5.5GHz. POWER certainly started as RISC although I'm not sure that it still is. The Wii, PS3 and Xbox360 are all Power-based. Plenty of RISC out there!


PostPosted: Thu Jul 18, 2013 5:15 am
Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
The 6502 isn't a RISC processor for two reasons:

1. Because the term "RISC" was created long after it had been designed.
2. It doesn't really fit the common traits of RISC processors.

From "Reduced Instruction Set Computers" by David A. Patterson, "Communications of the ACM", January 1985:

"Common RISC Traits
...
1. Operations are register-to-register, with only LOAD and STORE accessing memory
...
2. Operations and addressing modes are reduced. Operations between registers complete in one cycle, permitting a simpler, hardwired control for each RISC, instead of microcode.
...
3. Instruction formats are simple and do not cross word boundaries.
...
4. RISC branches avoid pipeline penalties."

1. 6502 does not meet this requirement. ALU ops can have memory operands.
2. 6502 does not meet this requirement. There are no one-cycle instructions.
3. 6502 does not meet this requirement. Instructions cross 8-bit words.
4. 6502 does not meet this requirement. Branches are multicycle and do not use branch delay slots.

Therefore, the 6502 does not have RISC traits as defined by David Patterson.
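For anyone who prefers to see the checklist written down, here is a minimal Python sketch of the argument above. The trait names are my own paraphrases of Patterson's list, and the True/False verdicts are simply the ones given in this post, not an independent measurement.

```python
# Patterson's four "common RISC traits" (CACM, January 1985), paraphrased.
# The short names are hypothetical labels chosen for this sketch.
TRAITS = (
    "load/store only",          # 1. only LOAD and STORE access memory
    "single-cycle reg ops",     # 2. register operations complete in one cycle
    "formats within one word",  # 3. instructions do not cross word boundaries
    "branch penalty hidden",    # 4. branches avoid pipeline penalties
)

# Verdicts for the 6502 exactly as argued in this post (all four fail).
MOS_6502 = {
    "load/store only": False,
    "single-cycle reg ops": False,
    "formats within one word": False,
    "branch penalty hidden": False,
}


def traits_met(cpu):
    """Return the traits a CPU satisfies, in Patterson's order."""
    return [trait for trait in TRAITS if cpu[trait]]


def summarize(name, cpu):
    """One-line summary of how many traits a CPU meets."""
    return f"{name}: {len(traits_met(cpu))} of {len(TRAITS)} traits met"


print(summarize("6502", MOS_6502))  # prints "6502: 0 of 4 traits met"
```

Other processors could be scored the same way by filling in another dictionary, though the verdicts would then be that poster's judgment calls.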

Toshi


PostPosted: Thu Jul 18, 2013 5:21 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8522
Location: Southern California
Quote:
Therefore, the 6502 does not have RISC traits as defined by David Patterson.

Would the PIC16? Microchip calls it a RISC and says it has "single-cycle execution," but what they mean is an instruction cycle, not a single pulse of the clock, and it can't do anything at all in less than four clocks.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Jul 18, 2013 5:51 am
Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
GARTHWILSON wrote:
Quote:
Therefore, the 6502 does not have RISC traits as defined by David Patterson.

Would the PIC16? Microchip calls it a RISC and says it has "single-cycle execution," but what they mean is an instruction cycle, not a single pulse of the clock, and it can't do anything at all in less than four clocks.


That sounds like it fails requirement #2.

Toshi


PostPosted: Thu Jul 18, 2013 6:13 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Of course, these are common traits, not requirements. The ARM qualifies as RISC, but never had exposed branch delay slots (which are a bad design, IMHO).


PostPosted: Thu Jul 18, 2013 8:36 am
Joined: Sun Apr 10, 2011 8:29 am
Posts: 597
Location: Norway/Japan
TMorita wrote:
GARTHWILSON wrote:
Quote:
Therefore, the 6502 does not have RISC traits as defined by David Patterson.

Would the PIC16? Microchip calls it a RISC and says it has "single-cycle execution," but what they mean is an instruction cycle, not a single pulse of the clock, and it can't do anything at all in less than four clocks.


That sounds like it fails requirement #2.
I would have to disagree with this (if you were referring to the 'one cycle' part of #2). I believe 'instruction cycle' should be what matters - after all, the physical clock pulse is just a technicality. The chip could (and some do) use a slow-ticking external clock and multiply it internally instead (like e.g. the MIPS R4000, which is normally called a RISC processor). Or even multiply or divide it into different frequencies for different sections of the internals. How the clock pulse relates to the instruction cycle varies a lot between chips, which for example means that you can't directly compare the clock frequency of a Z80 with the clock frequency of the 6502 when comparing how much the CPU can do per cycle (an old battle.. :))
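To put rough numbers on that last point, here is a small Python sketch. It assumes only the shortest instruction on each chip (a NOP: 4 T-states on the Z80, 2 clocks on the 6502), and the clock speeds are picked purely for illustration, so treat it as a back-of-the-envelope comparison rather than a benchmark.

```python
def instructions_per_second(clock_hz, clocks_per_instruction):
    """Upper bound on instruction rate for a fixed clocks-per-instruction cost."""
    return clock_hz / clocks_per_instruction


# Assumed figures: shortest instruction only, illustrative clock speeds.
z80 = instructions_per_second(4_000_000, 4)      # 4 MHz Z80, 4 T-states per NOP
mos6502 = instructions_per_second(2_000_000, 2)  # 2 MHz 6502, 2 clocks per NOP

print(f"Z80  @ 4 MHz: {z80:,.0f} NOPs/s")
print(f"6502 @ 2 MHz: {mos6502:,.0f} NOPs/s")
```

Under these assumptions both chips retire the same number of NOPs per second even though one clock runs twice as fast as the other, which is why the raw clock figure says little on its own.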

-Tor


PostPosted: Thu Jul 18, 2013 9:28 pm
Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
> I would have to disagree with this (if you were referring to the 'one cycle' part of of #2). I believe
> 'instruction cycle' should be what matters - after all, the physical clock pulse is just a technicality.
> The chip could (and some do) utilize a slow-ticking external clock and multiply it internally instead
> (like e.g. the MIPS R4000 which is normally called a RISC processor). Or even multiply or divide it into
> different frequencies for different sections of the internals. How the clock pulse relates to the
> instruction cycle varies a lot between chips, which for example means that you can't directly
> compare the clock frequency of a Z80 with the clock frequency of the 6502 when comparing how
> much the CPU can do per cycle (an old battle.. :))
>
> -Tor

AFAIK the R4000 (and other processors) multiply the clock frequency internally because it is a problem to maintain signal integrity of high speed clocks along PCB traces. So therefore externally the clock frequency is halved.

Even if you consider PIC16 to not violate RISC trait #2, it still violates RISC trait #4 because the implementations require two instruction cycles for a branch. There is no design feature to hide branch penalties.

To use fuzzy logic terminology, the set of "RISC processors" is not a crisp set; it is more of a fuzzy set with various degrees of membership. Some processors are very RISC, and some are barely RISC.
My personal taxonomy of RISC processors is:

Very RISC: MIPS, Alpha, M88K
Mostly RISC: SPARC, AM29K, PowerPC
Somewhat RISC: PIC16
Barely RISC: ARM AArch32

Toshi


PostPosted: Thu Jul 18, 2013 10:06 pm
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8522
Location: Southern California
Quote:
Even if you consider PIC16 to not violate RISC trait #2, it still violates RISC trait #4 because the implementations require two instruction cycles for a branch. There is no design feature to hide branch penalties.

I don't know how this monkey wrench affects the meeting of the requirements, but it gets worse: the only branch available is to skip the next instruction, so any other conditional branch involves a conditional skip of a GOTO instruction and takes three instruction cycles, or 12 clocks, to do what the 6502 does in 3. There are quite a few of these jury rigs, and I find the PIC16 generally takes about twice as many instructions, and twice as many clocks, as the 65c02 to do a job, if the PIC16 can do it at all.
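The arithmetic behind the 12-versus-3 figure can be sketched in a few lines of Python. The 4 clocks per instruction cycle and the totals come from this thread; the split into 1 instruction cycle for the skip that falls through plus 2 for the GOTO is my assumed breakdown of the three instruction cycles.

```python
PIC16_CLOCKS_PER_INSTRUCTION_CYCLE = 4  # figure from this thread
MOS6502_TAKEN_BRANCH_CLOCKS = 3         # figure from this thread


def pic16_taken_branch_clocks(skip_cycles=1, goto_cycles=2):
    """Clocks for 'conditional skip + GOTO' when the GOTO is executed.

    Assumed breakdown: the skip instruction falls through in 1 instruction
    cycle and the GOTO takes 2, giving the 3 instruction cycles quoted.
    """
    instruction_cycles = skip_cycles + goto_cycles
    return instruction_cycles * PIC16_CLOCKS_PER_INSTRUCTION_CYCLE


pic = pic16_taken_branch_clocks()
print(pic, "clocks on PIC16 vs", MOS6502_TAKEN_BRANCH_CLOCKS, "on 6502")
print("ratio:", pic / MOS6502_TAKEN_BRANCH_CLOCKS)  # ratio: 4.0
```

That ratio only compares clock counts for this one construct; it says nothing about how fast each part can actually be clocked.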

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Jul 18, 2013 10:19 pm
Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
Arlet wrote:
Of course, these are common traits, not requirements. The ARM qualifies as RISC, but never had exposed branch delay slots (which are a bad design, IMHO)


It's a microarchitectural optimization which was incorporated into architecture.

I've worked with branch delay slots, both conditional and unconditional.
I hate conditional branch delay slots. They are difficult to implement, at least in simulators.
Unconditional branch delay slots are not that bad IMHO. The implementation is pretty simple.
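To show what I mean by "pretty simple", here is a minimal Python sketch of an interpreter with one unconditional branch delay slot. The three-instruction ISA is hypothetical and invented for this post, not any real architecture. The only extra state is one pending target; conditional delay slots would need more bookkeeping than this, which I've left out.

```python
def run(program, delay_slot=True, max_steps=1000):
    """Interpret a toy program and return the register file as a dict.

    Hypothetical ISA: ("addi", reg, imm), ("jump", target_index), ("halt",).
    With delay_slot=True, the instruction following a jump always executes
    before control reaches the jump target.
    """
    regs = {}
    pc = 0
    pending = None  # target of a jump whose delay slot has not run yet
    for _ in range(max_steps):
        op = program[pc]
        next_pc = pc + 1
        if pending is not None:
            # This is the delay-slot instruction: redirect after it runs.
            next_pc, pending = pending, None
        if op[0] == "halt":
            return regs
        if op[0] == "addi":
            regs[op[1]] = regs.get(op[1], 0) + op[2]
        elif op[0] == "jump":
            if delay_slot:
                pending = op[1]   # take effect one instruction later
            else:
                next_pc = op[1]   # take effect immediately
        pc = next_pc
    raise RuntimeError("step limit reached")


program = [
    ("addi", "a", 1),    # 0
    ("jump", 4),         # 1
    ("addi", "a", 10),   # 2: delay slot, runs only when delay_slot=True
    ("addi", "a", 100),  # 3: skipped either way
    ("halt",),           # 4
]

print(run(program, delay_slot=True))   # {'a': 11}
print(run(program, delay_slot=False))  # {'a': 1}
```

The same program gives different results depending on whether the slot exists, which is the sense in which the slot is part of the architecture rather than just the implementation.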

Toshi


PostPosted: Thu Jul 18, 2013 10:29 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
The problem with exposing the internal architecture is that it locks you in a certain implementation. You can't easily add improvements such as branch prediction without forcing the programmer to rewrite his code.


PostPosted: Thu Jul 18, 2013 10:41 pm
Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
Arlet wrote:
The problem with exposing the internal architecture is that it locks you in a certain implementation. You can't easily add improvements such as branch prediction without forcing the programmer to rewrite his code.


Branch prediction is a microarchitectural feature, not an architectural feature.
It does not necessitate a code rewrite.

Toshi


PostPosted: Sat Jul 20, 2013 7:03 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10949
Location: England
I'd agree with Toshi that RISC is a fuzzy set, but I agree with Arlet in considering ARM (certainly ARM1) as RISC. It might be that Patterson would disagree!

The interesting aspects of RISC to me are
    - lots of registers, mostly general-purpose
    - load and store as the only memory operations (no RMW)
    - fixed length instructions, easy to decode
    - admits a simple and regular implementation with dramatically reduced effort

I suppose I need some point about having no long-running instructions although I wouldn't go so far as to require all instructions to be single-cycle.

I posted some related resources here with some excerpts here.


Cheers
Ed


PostPosted: Sun Jul 21, 2013 12:38 am
Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
BigEd wrote:
I'd agree with Toshi that RISC is a fuzzy set, but I agree with Arlet in considering ARM (certainly ARM1) as RISC. It might be that Patterson would disagree!

Cheers
Ed


I consider Patterson to be the definitive reference on what constitutes a RISC processor, since he coined the term "RISC" and wrote many of the foundational papers for RISC, in addition to leading the RISC-I project at Berkeley.

Here are a few more reasons why ARM32 is not very RISC:

1. Load multiple and store multiple instructions (LDM, STM)
2. Multiple addressing modes (pre/post inc/dec addressing modes available for some instructions)
3. Multiple views of register set depending on processor mode (User, Supervisor, FIQ, IRQ, etc)

In the paper "The RISC Concept: A Survey of Implementations" by Esponda and Rojas:

http://page.mi.fu-berlin.de/rojas/pub/R ... pt1991.pdf

...they use a Kiviat graph to evaluate whether each processor fits RISC processor characteristics, albeit using their own criteria. The ARM32 processors deviate from their ideal RISC in the following characteristics:

1. Number of addressing modes is not one
2. Does not have delayed branches
3. CPI is considerably higher than 1 due to LDM/STM (this is a microarchitectural characteristic, but anyway)
4. Only 16 integer registers visible at a time
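On point 3, a quick Python sketch of how a handful of multi-cycle instructions pulls the average CPI above 1. The instruction mix and the 6-cycle cost for LDM/STM are hypothetical numbers chosen for illustration, not measurements from any ARM core or from the paper.

```python
def average_cpi(mix):
    """Average cycles per instruction for a list of (count, cycles_each) pairs."""
    total_instructions = sum(count for count, _ in mix)
    total_cycles = sum(count * cycles for count, cycles in mix)
    return total_cycles / total_instructions


# Hypothetical mix, not measured data: 90 single-cycle instructions plus
# 10 LDM/STM instructions assumed to cost 6 cycles each.
mix = [(90, 1), (10, 6)]
print(average_cpi(mix))  # 1.5
```

Even with only one instruction in ten being a multi-register transfer, the assumed mix lands at a CPI of 1.5.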

Toshi

