6502.org Forum
 Post subject: CISC or RISC
PostPosted: Sat Oct 01, 2005 9:28 pm 

Joined: Thu Jul 07, 2005 12:34 am
Posts: 23
Location: Minnesota
Is the 6502 a RISC or CISC processor? Thanks for your help.
bvold,
:twisted:


 Post subject:
PostPosted: Sun Oct 02, 2005 1:24 am 

Joined: Sat Aug 31, 2002 12:33 pm
Posts: 64
Location: USA
Hi Everyone,

This is a heavily debated subject, but I'll try to describe the most basic definition (there isn't really an absolute one) of a RISC processor. Some people define RISC processors as ones where the instruction length (opcode plus operand) is fixed, whereas CISC processors have variable instruction lengths. Since instruction length on the 6502 varies from 1 to 3 bytes, I would say the 6502 is not a RISC processor.
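By way of illustration, here is a small sketch (Python, with opcode values from the standard NMOS 6502 set) of how the length follows the addressing mode:

```python
# Length of a 6502 instruction depends entirely on its addressing mode:
# implied forms are 1 byte, immediate/zero-page forms 2, absolute forms 3.
LENGTHS = {
    0xEA: 1,  # NOP        (implied)
    0xA9: 2,  # LDA #imm   (immediate)
    0xA5: 2,  # LDA zp     (zero page)
    0xAD: 3,  # LDA abs    (absolute)
    0x20: 3,  # JSR abs    (absolute)
}

def instruction_length(opcode):
    """Variable-length decode: you must inspect the opcode before you
    know where the next instruction begins -- the opposite of a
    fixed-width RISC encoding."""
    return LENGTHS[opcode]
```

A fixed-length RISC such as MIPS would answer 4 for every opcode, which is exactly the distinction drawn above.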

Cheers,

Paul


 Post subject: Re: CISC or RISC
PostPosted: Sun Oct 02, 2005 1:53 pm 

Joined: Sun Nov 28, 2004 3:07 pm
Posts: 28
Location: Budapest, Hungary
bvold wrote:
Is the 6502 a RISC or CISC processor? Thanks for your help.
bvold,
:twisted:


Well, it's a common idea that the 6502 is a RISC processor, probably because it has somewhat simpler instructions than other CPUs such as the Z80. And yes, the 6502 executes an average operation faster than a Z80 at the same clock. But that is not (or not only) what the notion of RISC means. The 6502 _is_ a CISC CPU; a "real" RISC CPU knows only a few addressing modes. The idea of RISC is a "minimalistic" opcode set that can then be executed / parallelized / etc. very fast because the operations are really simple. The 6502 has _VERY_ complex addressing modes compared to the RISC world. An average RISC design has 16 or more general-purpose registers with no functional differences between them, and essentially one major addressing mode: address by the contents of a register. Since there are quite a lot of registers, that isn't a problem -- you do the job of the more complicated addressing modes "by hand". I hope this helps despite my bad English.
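To sketch that "by hand" point (a hypothetical Python model of a byte-addressed memory): the 6502's `($zp),Y` mode does in one instruction what a load/store RISC spells out as separate loads and an add.

```python
def lda_indirect_y(mem, zp, y):
    """6502 LDA ($zp),Y: one instruction fetches a 16-bit little-endian
    pointer from zero page, adds Y, and loads from the result."""
    base = mem[zp] | (mem[(zp + 1) & 0xFF] << 8)
    return mem[(base + y) & 0xFFFF]

def risc_style(mem, ptr_addr, y):
    """The same job done 'by hand', RISC-style: two byte loads to build
    the base address in a register, an explicit add, then one
    register-indirect load -- each step a separate simple instruction."""
    lo = mem[ptr_addr]
    hi = mem[(ptr_addr + 1) & 0xFF]
    base = lo | (hi << 8)          # assemble the pointer in a register
    addr = (base + y) & 0xFFFF     # explicit ADD
    return mem[addr]               # plain register-indirect load
```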


 Post subject:
PostPosted: Mon Oct 03, 2005 4:29 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
According to Bill Mensch, the 65c02's and 65816's designer, RISC is register-pointer based, with very limited addressing modes, and has wide, integrated instructions that are not suitable for 8-bit buses.  He calls CISC "confused instruction set computer", however.  He has sometimes referred to the 65xx as "notRISC" (nRISC), "best of both" (BoB) worlds, "notCISC" (nCISC), or most simply as 65xx, leaving the pundits to figure it out.  On 5/18/05 he said, "For many applications we just simply work better and engineers can apply this 'little engine that can' to many a varied system that powers people (defibrillators), industrial controllers, automobiles, consumer toys, TV games, cell phone sound chips, and many more.  The 65xx architecture is sold in hundreds of millions of chips per year and the volume is growing.  We now find engineers porting applications that may have first been market-tested on PC or ARM processors and finding the 65xx economy (both business and silicon) attractive and well worth the effort to change to the 65xx.  As the inventor of the 65xx architecture it is refreshing to see this trend."

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


 Post subject:
PostPosted: Wed Oct 05, 2005 4:32 am 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
The 6502/65816 processors employ zero page precisely as a RISC processor makes use of its internal registers. Indeed, the internal register sets of most RISCs often approach 128 bytes of memory -- half the size of the 6502's zero page.

Looking at most RISC instruction sets, you'll see that, except for the 3-operand format, the 6502's instruction set closely parallels the *patterns* used in RISC instruction sets. That is, it favors *small* addresses (8 bits in the 6502, 5 bits of register number in most RISCs).

As a result, most of the addressing modes used in the 6502 *do* have analogs in RISC processors:

Code:
RISC    6502        Notes
-------------------------
Rx      $xx
(Rx)    ($xx)       Certain opcodes only.
y(Rx)   ($xx),Y     Load offset in Y first
(Rx+y)  ($xx,X)     No equivalent in most RISCs; found in PA-RISC and
                    IA-64 though.  Implicit in Sparcs via register windows.
R(x+z)  $xx,Z       Z is either X or Y, depending on opcode.
                    NO equivalent in any RISC I'm aware of.


Note that most other addressing modes are essentially just variations of the four above. The 65816 introduces some more sophisticated addressing modes in an effort to minimize the amount of code a compiler must generate (and thus make compiled programs execute faster).

However, despite these similarities to RISCs, the 6502 is most definitely a CISC processor. Many instructions are not orthogonal -- for example, JMP indirect has no JSR equivalent, BIT doesn't support the same addressing modes as AND, etc.

As indicated elsewhere, the instruction set of the 6502 is hand-optimized for the types of software being developed at the time it was made. Therefore, the 6502 is pretty primitive, since the code written in the 70s was pretty primitive. The 65816 obviously supports more sophisticated code, as its instruction set was reviewed with more modern applications in mind. The Terbium ought to be more "complete" still.

Another distinguishing factor of RISC versus CISC is that many sub-fields of an opcode are directly fed into the ALU. For example, many RISCs have a single "instruction form" for all arithmetic or logical instructions, where the only difference between them is a "sub-opcode" field, which in reality is fed directly to the ALU, just as the register fields are hardwired to the pipeline fetch and writeback stages. The 6502 shows no signs of such regularity (and is even worse with the 65816!), and hence requires more extensive decoding logic.
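For contrast, here is a sketch of that field regularity (Python; the bit positions are those of the MIPS R-type format, a typical example):

```python
def decode_rtype(word):
    """Fixed-field RISC decode: the register numbers and the ALU
    sub-opcode sit at the same bit positions in every arithmetic or
    logical instruction, so they can be wired straight to the register
    file and ALU with essentially no decoding logic."""
    return {
        "rs":    (word >> 21) & 0x1F,  # source register 1
        "rt":    (word >> 16) & 0x1F,  # source register 2
        "rd":    (word >> 11) & 0x1F,  # destination register
        "funct": word & 0x3F,          # fed directly to the ALU
    }
```

No 6502 opcode group has this property; its decode PLA must treat each opcode individually.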

That being said, the much-underused ($xx,X) addressing mode can be used in conjunction with the $xx,X addressing mode to make a VERY effective data stack. Hence the 6502, unlike most modern CPUs, is equally adept at running C-generated code and Java or Forth code. In fact, it will quite likely execute code for a dual-stack virtual architecture faster than code for a single-stack architecture (e.g., the Algol family of languages).
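A sketch of that data-stack trick (a hypothetical Python model; a Forth system on the 6502 typically keeps the data stack in zero page with X as its stack pointer):

```python
class ZeroPageDataStack:
    """X indexes a descending 16-bit data stack in zero page, so $00,X
    reaches the top cell and ($00,X) dereferences it -- while JSR/RTS
    keep using the hardware return stack, giving two stacks for free."""
    def __init__(self):
        self.mem = bytearray(65536)
        self.x = 0x80               # X register: stack grows downward

    def push(self, value):          # like DEX / DEX / STA $00,X / STA $01,X
        self.x -= 2
        self.mem[self.x] = value & 0xFF
        self.mem[self.x + 1] = (value >> 8) & 0xFF

    def top(self):                  # like LDA $00,X / LDA $01,X
        return self.mem[self.x] | (self.mem[self.x + 1] << 8)

    def fetch(self):                # like LDA ($00,X): use TOS as an address
        return self.mem[self.top()]
```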


 Post subject:
PostPosted: Fri Oct 14, 2005 12:11 am 

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
Well, the 6502 was created way before the concept of RISC processors was imagined, so it's basically pre-RISC.

It does embody some of the basic tenets of RISC processors, though:

o simpler instructions
o hardwired instruction decoder
o pipelined execution of instructions

It doesn't have these characteristics:

o interchangeable registers
o orthogonal addressing modes

Toshi


 Post subject: Re: CISC or RISC
PostPosted: Fri Oct 14, 2005 12:14 am 

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
LGB wrote:
...An average RISC design has 16 or more general-purpose registers with no functional differences between them. And only one major addressing mode is used: address by the contents of a register
...


Nope.

Most RISC processors have at least register-indirect and register-plus-displacement addressing. MIPS, PowerPC, Alpha, SH, etc. have both modes.

If you don't have a register + displacement indirect addressing mode, then it causes a huge problem for compilers. I spent about ten years dealing with this problem on one architecture with GCC.
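A sketch of why that mode matters to a compiler (Python model): locals sit at fixed offsets from a frame pointer, so with register+displacement each access is one instruction, while without it every access needs an explicit add first.

```python
def load_base_disp(mem, base, disp):
    """With reg+displacement (e.g. a MIPS-style `lw r2, 8(fp)`):
    one instruction per local-variable access."""
    return mem[base + disp]

def load_without_disp(mem, base, disp):
    """Without the mode: the compiler must materialize the address with
    an explicit add, costing an extra instruction and a scratch register
    for every single access."""
    scratch = base + disp   # explicit ADD into a temporary register
    return mem[scratch]     # then a plain register-indirect load
```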

Toshi


 Post subject:
PostPosted: Fri Oct 14, 2005 11:42 am 

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 336
"John Mashey on RISC/CISC" should be compulsory reading before getting involved in discussions like this.

http://userpages.umbc.edu/~vijay/mashey.on.risc.html

6502 has multiple instruction sizes, many addressing modes, indirect addressing modes, instructions that combine load/store with arithmetic, instructions that perform multiple data accesses to memory, arbitrary alignment of data, and too few registers. It's CISC, or at least not RISC. Just consider INC $xxxx, X.

You could claim that it's an 8 bit machine, so everything is naturally byte-aligned. But the indirect addressing modes use addresses in memory. Those addresses are 16 bit.
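Two of those points sketched in Python (a hypothetical flat-memory model): `INC $xxxx,X` is a single instruction making multiple data accesses, and the indirect modes fetch 16-bit little-endian pointers stored byte-wise in memory with no alignment constraint.

```python
def inc_abs_x(mem, addr, x):
    """6502 INC $xxxx,X: one instruction performs a read, a modify, and
    a write-back -- a read-modify-write that a load/store RISC would
    split into separate load, add, and store instructions."""
    ea = (addr + x) & 0xFFFF
    mem[ea] = (mem[ea] + 1) & 0xFF
    return mem[ea]

def read_zp_pointer(mem, zp):
    """The 16-bit addresses used by the indirect modes live in memory as
    two consecutive bytes, low byte first, at any (odd or even) address."""
    return mem[zp] | (mem[(zp + 1) & 0xFF] << 8)
```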

Quote:
As indicated elsewhere, the instruction set of the 6502 is hand-optimized for the types of software being developed at the time it was made. Therefore, the 6502 is pretty primitive, since the code written in the 70s was pretty primitive.


Much of the software written in the 70s was not primitive. The 6502 ISA was designed for hand-written assembly code, fast memory (relative to the CPU clock), what they could fit on a chip, and what microprocessor programmers were used to.


 Post subject:
PostPosted: Fri Oct 14, 2005 2:44 pm 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
John West wrote:
Much of the software written in the 70s was not primitive. The 6502 ISA was designed for hand-written assembly code, fast memory (relative to the CPU clock), what they could fit on a chip, and what microprocessor programmers were used to.


Compared to today's software, which tends to be *strongly* object oriented and even component oriented, it is primitive. The 6502 ISA makes indirect vectored jumps off a base pointer register absolutely *painful*. The 6502 ISA has little to no support for single-stack, frame-based languages. Etc.

The 65816 corrects the latter deficiency acceptably, but still does not address the former. There is still no truly fast (< 8 cycles) method of invoking an object method by way of a virtual method table. The fastest approach is to compile the following code, which lives inside the class definition:

Code:
__do_method_entry_point:
  jmp (vtable,x)   ; X = method's byte offset into vtable

vtable:
  dw method1
  dw method2
  dw method3
  ..etc..


where X is loaded with the method ID (in reality, an offset into the vtable) beforehand. Since an application won't generally know where __do_method_entry_point is located, it must first query the object for this address (since different classes of objects have, by definition, different vtables):

Code:
  ldy #0
  lda (obj),y        ; fetch low byte of the entry point's address
  sta foo+1          ; patch it into the JMP operand below
  iny
  lda (obj),y        ; fetch high byte
  sta foo+2

  ldx #method_id     ; X = byte offset into the vtable
  jsr foo

foo:
  jmp $0000          ; operand rewritten above at run time


So, assuming we can cache vtable pointers as an optimization, we still need the initial JSR (6 cycles), the absolute JMP (6 cycles), the JMP (,X) (another 7 cycles), and finally the RTS (another 6 cycles). All told, that's 37 cycles for a method call -- very expensive, very cumbersome, very primitive.

In contrast, most other CISC processors make this process utterly trivial. And RISCs, while they take more instructions than most CISCs, do it with great speed (typically only 3 to 4 cycles, max).

This would not have been the case in the 6502 ISA had the designers recognized that developers would want to write such software. Back then, object orientation was still in its infancy (hell, even modular programming was still in its infancy, having barely gotten over the hubbub of Pascal and structured programming), with Smalltalk still behind the closed doors of Xerox PARC, and Simula never really having caught on except in Holland and Finland. Even then, Smalltalk was dynamically dispatched, not statically like the above, which incurred still more run-time overhead. You can use polymorphic inline caching to help speed up method dispatch by comparing method IDs to class IDs and direct-jumping to method implementations, I suppose (see the code fragment below for an example), but it still results in substantial runtime losses, especially when a class is infrequently accessed to start with but is called with increasing frequency later on.

Code:
; This code is dynamically generated by the language run-time environment.
client_do_method_X:
  ldy #00
  lda (obj),y        ; low byte of the object's class ID
  sta clsID
  iny
  lda (obj),y        ; high byte -- still in A for the compare below
  sta clsID+1

  cmp #CLSID_1_HIGH
  bne .try.class.2h
  lda clsID
  cmp #CLSID_1_LOW
  bne .try.class.2l
  jmp method_X_for_class_1

.try.class.2l:
  lda clsID+1        ; restore the high byte before testing class 2
.try.class.2h:
  ...etc.. you get the idea.


Yeah, that's pretty bulky, and awfully primitive in my eyes.

Granted, you can state that any program you can write in an object-oriented language, you can also write in a non-OO language. I do this all the time myself. But it's also true that any program you can write with multithreading can be written as a pure event-driven, single-threaded application too. Sometimes, though, there are advantages to choosing one over the other. Maybe the multithreaded application uses less memory, or is more conceptually correct (and thus easier to maintain from a coding and debugging point of view). Likewise, programming in an OO language, or even just using OO methodology, often results in more correct code. GUI programming, for example, is a *natural* for OO technology, as is simulation software. So I hope to nip that whole "who needs OO anyway?" argument in the bud right now.
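The inline-caching scheme above, modeled in Python (the class IDs and method names here are hypothetical; a real 6502 implementation would patch the compare-and-jump sequence in place, as in the fragment):

```python
def method_X_for_class_1(obj):
    return "class 1"

def method_X_for_class_2(obj):
    return "class 2"

# Slow path: the full vtable lookup the cache is trying to avoid.
VTABLES = {1: method_X_for_class_1, 2: method_X_for_class_2}

def make_call_site():
    """Each call site carries its own one-entry cache, standing in for
    the dynamically generated compare-and-jump code."""
    cache = {"cls": None, "impl": None}
    def call(obj):
        if obj["class_id"] == cache["cls"]:          # fast path: cache hit
            return cache["impl"](obj)
        impl = VTABLES[obj["class_id"]]              # slow path: look it up,
        cache["cls"], cache["impl"] = obj["class_id"], impl  # then re-patch
        return impl(obj)
    return call
```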


 Post subject:
PostPosted: Thu Oct 20, 2005 5:44 am 

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
There was an interesting (especially so in retrospect) 1989 (or so) article in Call-Apple (I still have the issue) about the RISC design philosophy, and whether the 6502 was a RISC processor. Several (common) characteristics of RISC were identified, and the article ultimately concluded that the 6502 was not RISC. One characteristic was a large number of general-purpose registers. Here it's something of a matter of interpretation: of A, X, Y, P, PC, and S, only A is really general purpose, and even it can't be used for indexing. However, if you consider the zero page (and its addressing modes) to be equivalent to a set of registers, then you do have a large number of general-purpose registers. The article used the former interpretation.


 Post subject:
PostPosted: Thu Dec 15, 2005 1:26 pm 

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
kc5tja wrote:
John West wrote:
...
The 6502 ISA has little to no support for single-stack, frame-based languages. Etc.

The 65816 corrects the latter deficiency acceptably, but still does not address the former.


Actually, the stack support on the 65816 is pretty weak.

If you use a large array as a local, then you're forced to do lots of nasty pointer arithmetic to access the elements. I had to deal with this in lcc816.

I like the 65816, but it's really not suitable for most modern compiled languages such as C, C++, etc.

Toshi


 Post subject:
PostPosted: Sat Dec 17, 2005 7:04 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
kc5tja, although I have not used the action I think you're after in your last post's code, it appears to be a synthesis of the non-existent indirect indexed instruction JSR(obj),Y.  Is this correct?  So could the whole thing be done on the '816 (with 16-bit accumulator) with

Code:
    LDY  #0         ; Start with method ID in X.  Zero Y
    LDA  (obj),Y    ; since there's no LDA() instruction.
    STA  $+3        ; Store the resulting address in the
    JSR  (0000,X)   ; operand of the JSR indexed indirect
                    ; instruction.


BTW, JMP abs is only 3 clocks, not 6.  But still, why wouldn't you do the STA's to the JSR instruction's operand area and skip the JMP instruction altogether?  Is it because you want that JMP in RAM whereas the rest might be in ROM?  And approximately how often are these method calls done in this kind of language, to figure out what percentage of the processor time is taken by your 34-clock overhead?



 Post subject:
PostPosted: Sun Dec 18, 2005 6:17 am 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
GARTHWILSON wrote:
kc5tja, although I have not used the action I think you're after in your last post's code, it appears to be a synthesis of the non-existent indirect indexed instruction JSR(obj),Y. Is this correct?


Maybe. The idea is that each object contains a pointer to its jump table. Methods on an object are referenced by offset into a table:

Code:
+--------+     +---------------------------+       +---------------+
|  vtbl  |---->| ptr to class, if required |   +-->| myMethod:     |
+--------+     +---------------------------+   |   |    lda #$DEAD |
|  ....  |     | ptr to Method A           |---+   |    ldx #$BEEF |
|  inst  |     +---------------------------+       |    jsr DoWork |
|  data  |     | ptr to Method B           |       |    ...etc...  |
|  ....  |     +---------------------------+       +---------------+
+--------+     |     ... et cetera ...     |
               +---------------------------+


The idea is that all objects of the same type share a common set of jump-table vectors. These vectors are defined in memory precisely once per unique type, obviously to save memory. However, you can have lots of different objects that expose the same basic interface but which are distinctly different types. For example, a read-only file is the same as a read-write file, except that the method for "write" does nothing or returns an error. There is no need for a flag to determine whether the file is read-only or read-write, nor to check it each and every time you might need to write to a file. Additionally, new types of objects can be defined at a later time -- what if you wanted a write-only file, for example? These are ideal for logging applications, especially over a network interface (remote logs can reduce load on a heavily accessed server, for example, despite the increase in network overhead).
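That read-only/read-write file example, sketched with explicit vtables in Python (the method names and error convention here are hypothetical):

```python
def file_read(f):
    return f["data"]

def file_write(f, data):
    f["data"] = data
    return True

def write_refused(f, data):
    return False    # the read-only type's "write" simply fails

# One vtable per type, shared by every instance of that type.
RW_VTABLE = {"read": file_read, "write": file_write}
RO_VTABLE = {"read": file_read, "write": write_refused}

def make_file(data, vtbl):
    return {"vtbl": vtbl, "data": data}

def invoke(f, method, *args):
    """All calls dispatch through the object's vtable, so client code
    never has to test a read-only flag anywhere."""
    return f["vtbl"][method](f, *args)
```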

Quote:
So could the whole thing be done on the '816 (with 16-bit accumulator) with

Code:
    LDY  #0         ; Start with method ID in X.  Zero Y
    LDA  (obj),Y    ; since there's no LDA() instruction.
    STA  $+3        ; Store the resulting address in the
    JSR  (0000,X)   ; operand of the JSR indexed indirect
                    ; instruction.



Doable, but the overhead of setting the obj variable in zero/direct page must be accounted for as well. And in all likelihood you'll want to use the [dp],Y mode instead of (dp),Y, which adds still more cycles. If your code is sophisticated enough to warrant object-oriented programming, odds are you're going to be working with a lot of similar kinds of objects, which means you'll likely need more than 64K of RAM to hold them, local optimizations notwithstanding.

Quote:
BTW, JMP abs is only 3 clocks, not 6. But still, why wouldn't you do the STA's to the JSR instruction's operand area and skip the JMP instruction altogether?


Because now you're spending 11 bytes *per* method invocation instead of only 3, and saving very little in the process. The point is, even your code above takes a helluva lot more cycles than most other CPUs of its era for the type of operation described.

Quote:
Is it because you want that JMP in RAM whereas the rest might be in ROM? And approximately how often are these method calls done in this kind of language, to figure out what percentage of the processor time is taken by your 34-clock overhead?


Since object-oriented programming languages deal with objects all the time, you can bet that method invocations will occur virtually constantly. While you can optimize out much of the dynamism in the local scope (e.g., within the scope of a single application module you're responsible for writing), you certainly cannot optimize it out over the global scope (e.g., you can't even begin to predict how your customers will attempt to expand upon your code). Note that this is precisely the same problem most operating systems face when designing interfaces for things like device drivers: most drivers expose the same interface, and only the implementation differs. This is *the* quintessential demonstration of object orientation in a chunk of code otherwise considered procedural -- and it is amusing to see that it happens in a time-critical interface, too.


 Post subject:
PostPosted: Wed Dec 28, 2005 7:22 am 

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
A couple of comments:

1. There is a LDA (dp) instruction, opcode $B2, available on the 65C02 and 65816. LDA [dp] is also available (on the 65816); it's opcode $A7.

2. If I understand the original code correctly (and please feel free to correct me if I am mistaken), the JMP at foo will ultimately jump to __do_method_entry_point, and (obj) -- or [obj] -- contains the address of __do_method_entry_point. However, Garth is right that you can get rid of the JMP (vtable,X) by putting the address of vtable at (obj) and using a JSR (abs,X) rather than JSR foo. In fact, as long as the LDX #method_id is not self-modified (i.e. method_id is known at compile/assemble time) you can invoke a method with just:

Code:
LDA [obj]         ; 7 cycles assuming m=0
TAX               ; 2 cycles
JSR (method_id,X) ; 8 cycles


That's only 6 bytes, and 17 cycles. If method_id isn't known at compile/assemble time, you could put its value in A rather than X and use

Code:
CLC       ; 2 cycles
ADC [obj] ; 7 cycles assuming m=0
TAX       ; 2 cycles
JSR (0,X) ; 8 cycles


which is only 7 bytes, and 19 cycles. Of course, different processors have different strengths, and I'm certainly not claiming this is a strength of the 65C816. I realize this is not "truly fast" (by your 8-cycle standard above) on the 65816. However, my point is that while the 65816 might not be a good fit for this, the performance is not nearly as bad as you're suggesting. And no, the above does not address (to pick one example) the >64K issue either.


 Post subject:
PostPosted: Wed Dec 28, 2005 5:37 pm 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
dclxvi wrote:
2. If I understand the original code correctly (and please feel free to correct me if I am mistaken), the JMP at foo will ultimately jump to _do_method_entry_point, and (obj) -- or [obj] -- contains the address of _do_method_entry_point.


Correct, because the client code does not know the serving class ahead of time -- therefore, it must detect this (even if the method ID is statically known) at run-time.

Quote:
However, Garth is right that you can get rid of the JMP (vtable,X) by putting the address of vtable at (obj) and use a JSR (abs,X) rather than JSR foo.


This I consider a "local optimization," because you can make the substitution only if you know a priori what class the object is. And if you know that, you might as well just issue a direct JSR to the method's code if the method ID is also static.

Quote:
Code:
LDA [obj]         ; 7 cycles assuming m=0
TAX               ; 2 cycles
JSR (method_id,X) ; 8 cycles



As I recall, JSR(abs,X) only fetches its addresses from bank 0, no? Otherwise, it would at least require that the object's vtable pointer refer to an item within its current bank. Depending on the nature of the software, this can be a problem (it's probably quite suitable for use with device drivers though, as there typically aren't more than several 10s of devices/objects, so co-residing vtables with instances shouldn't be an issue there. Chalk this one up for the desired 6502/65816 CP/M-like OS!).

Quote:
I realize this is not "truly fast" (by your 8-cycle standard above) on the 65816.


No, but I consider it well within the realm of acceptability at this point. However, I suspect there are strong environmental requirements that must be fulfilled in order to achieve those cycle counts, which may not always be possible to impose. But this approach is the most viable I've seen so far, speaking strictly from a performance perspective.

