6502.org Forum

All times are UTC




Posted: Fri Aug 12, 2011 1:00 pm

Joined: Fri Aug 12, 2011 12:49 pm
Posts: 7
Hi all,

I'm new to this forum. I find it really interesting because it probably collects the most useful and complete information on the 6502... and probably the most expert users.

I need to select a processor for undemanding generic applications (control, DSP-like work on 8 bits, etc.). I would like your opinion on the best choice between a 6502-compatible core and an ARM7TDMI-compatible core (or similar). The ARM7 certainly has higher performance, but my interest is in the most compact code (low-cost embedded application) and the lowest gate count. Do you have any general advice or guidelines for comparing these two cores? For example, it would be useful to know the equivalent DMIPS/MHz for the 6502, because I didn't find it on the internet. It would also be useful to know whether the 6502 beats the ARM7 in terms of code density. With this data I could try to figure out the tradeoff by myself.

I found some 6502- and ARM7-compatible cores on the internet, and I would like some advice about the selection criteria. For the 6502, for example, I found cores like bc6502 and T65, which are claimed to work. I also found a recent 6502-compatible core called 6502EX at www.ex6502.altervista.org. It seems to be a 6502 extended to 32 bits, so probably something in between the original 6502 and the ARM7, but I'm not sure whether it is available.

Thanks in advance to all of you who would like to give me some feedback to help me with this selection.

Best Regards


Posted: Fri Aug 12, 2011 6:58 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Welcome!

Samuel found a table showing the 6502 (actually the 6510, but that's the same thing) as having a Dhrystone score of 32 or 36 at 1 MHz (and therefore about 0.02 DMIPS/MHz, taking 1 DMIPS as 1757 Dhrystones per second).
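For anyone checking the arithmetic, here is a minimal Python sketch of the conversion. It assumes the table's scores are Dhrystones per second and uses the usual convention that 1 DMIPS equals the VAX 11/780's 1757 Dhrystones per second; the function names are just illustrative.

```python
# Convert a Dhrystone score (Dhrystones per second) into DMIPS and DMIPS/MHz.
# Convention: 1 DMIPS = the VAX 11/780's score of 1757 Dhrystones per second.
VAX_11_780_DHRYSTONES_PER_SEC = 1757


def dmips(dhrystones_per_sec: float) -> float:
    """Dhrystones per second -> VAX-equivalent MIPS."""
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC


def dmips_per_mhz(dhrystones_per_sec: float, clock_mhz: float) -> float:
    """Normalise a DMIPS figure by the clock rate it was measured at."""
    return dmips(dhrystones_per_sec) / clock_mhz


if __name__ == "__main__":
    for score in (32, 36):  # the 6510 scores quoted above, measured at 1 MHz
        print(f"{score} Dhrystones/s @ 1 MHz -> {dmips_per_mhz(score, 1.0):.3f} DMIPS/MHz")
```

Both scores land just under or at 0.02 DMIPS/MHz, which is where the rounded figure comes from.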

Thanks for finding the 6502EX - I was unaware of that(*). It's a 32-bit extension, with an 8-bit mode, and lots of registers and operations on 4 byte lanes and some remnant of 64k banks in 32bit mode. (But it seems to be closed source and presumably intended to be licensed for money, so not of direct interest to me.)

What ARM-compatible cores have you found? ARM are rather protective of their patents.

Cheers
Ed

Edit to add: for code density, see the paper "Code Density Concerns for New Architectures" by Vincent M. Weaver and Sally A. McKee

(*) But it does seem to have appeared only within the last month or so and is new to Google too. Are you the first person to find it?

Edit: added conversion between Dhrystone score and DMIPS


Last edited by BigEd on Sun Aug 14, 2011 2:52 pm, edited 3 times in total.

Posted: Fri Aug 12, 2011 7:24 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Welcome.

Quote:
For example, it could be useful for me to know the equivalent DMIPS/Mhz for the 6502 because I didn'd find it on internet.

The average will be about four clocks per instruction, IOW 1 MIPS @ 4MHz, 4 MIPS @ 16MHz, etc. What beginners often don't realize is that some operations are already merged into others, so they write redundant code, like doing an addition and then comparing the result to 0, which the addition has already done. ADC#, for example, does its five steps in only two clocks, and that includes the implicit CMP #0 (the Z and N flags are set from the result).
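To illustrate why the extra compare is redundant, here is a small Python model of binary-mode ADC (decimal flag clear) next to CMP #0. It is only a flag-level sketch with made-up function names, not a timing model: it shows that the Z and N flags after ADC already equal what CMP #0 on the result would produce. One caveat the model also makes visible: CMP #0 always sets the carry, so it would overwrite ADC's carry-out.

```python
# Flag-level model of 6502 ADC #imm in binary mode (D flag clear), compared with
# the flags a following CMP #0 would produce. Timing is not modelled.
def adc(a: int, operand: int, carry_in: int) -> dict:
    total = a + operand + carry_in
    result = total & 0xFF
    return {
        "A": result,
        "C": 1 if total > 0xFF else 0,
        "Z": 1 if result == 0 else 0,
        "N": (result >> 7) & 1,
        # V: both inputs had the same sign and the result's sign differs
        "V": 1 if (~(a ^ operand) & (a ^ result) & 0x80) else 0,
    }


def cmp_zero(a: int) -> dict:
    """Flags from CMP #0, i.e. A - 0: C is always set, Z and N mirror A."""
    return {"C": 1, "Z": 1 if a == 0 else 0, "N": (a >> 7) & 1}


if __name__ == "__main__":
    r = adc(0xFF, 0x01, 0)          # $FF + 1 wraps to $00
    print(r, cmp_zero(r["A"]))      # Z and N agree; CMP #0 adds nothing new
```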

I have brought several products to market with PIC16 microcontrollers and can tell you the PIC takes about twice as many instructions and twice as many clocks to do a job as the 6502 does-- if the PIC can do it at all.

Where the 6502 shines most is interrupt performance. It is not unrealistic for a 20MHz 6502 to service a million interrupts per second, although that means a very simple ISR that only increments a byte in memory, for example. (See my interrupts article. I also have one on servicing interrupts in high-level Forth, with zero overhead.)
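A rough clock budget shows where the million-per-second figure comes from. This Python sketch assumes the ISR is just INC zp followed by RTI, with no registers saved, and it uses the standard 6502 timings (7 clocks for the interrupt sequence, 5 for INC zp, 6 for RTI). The average wait for the current instruction to finish is an assumed parameter, not a measured value.

```python
# Clock budget for the simplest possible ISR: INC a zero-page byte, then RTI.
IRQ_ENTRY = 7   # interrupt sequence: push PC and P, fetch the vector
INC_ZP = 5      # INC zp
RTI = 6         # RTI


def clocks_per_interrupt(finish_current_insn: int = 2) -> int:
    """Total clocks per interrupt; finish_current_insn is an assumed average
    for completing whatever instruction was running when the IRQ arrived."""
    return IRQ_ENTRY + INC_ZP + RTI + finish_current_insn


def max_interrupt_rate(clock_hz: float, finish_current_insn: int = 2) -> float:
    """Upper bound on interrupts per second (leaves no time for the main program)."""
    return clock_hz / clocks_per_interrupt(finish_current_insn)


if __name__ == "__main__":
    print(f"{max_interrupt_rate(20_000_000):,.0f} interrupts/s at 20 MHz")
```

With about 20 clocks per interrupt, a 20 MHz part is saturated at a million interrupts per second, so in practice the rate you can sustain alongside real work is lower.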

If you really want code density, the best way I know of to get it is to use Forth. That assumes however that the application is large enough to justify having the kernel. Obviously it's not compact to have thousands of bytes of kernel for something that could be done in 70 bytes of assembly if the application is really that small.

Someone pointed to an article last year telling about the ARM being inspired by the 6502, but I didn't bookmark it and I can't remember the right search terms to find it again. Hopefully someone else will tell us. We all wish Samuel Falvo would come back. I'm sure he knows plenty about the comparison.

The fastest 6502s are running at over 200MHz in custom ICs. That would require extremely fast memory and I/O, onboard the same IC. Off-the-shelf ones are rated at 16MHz but will usually run quite a bit faster if the rest of your hardware is up to it. I would definitely recommend the 65816 though. When I was writing my '816 Forth kernel, I found that it was actually easier to program when constantly working with 16-bit cells, and my '816 Forth ran 2-3 times as fast as my '02 Forth at a given clock speed. The difference in price is negligible. The '816 is much better suited for multitasking, relocatable code, etc.

Electric_Eye here is working on a 16/32-bit version of the 6502 in an FPGA. My proposal for the 65Org32, an all-32-bit 6502, is described in this lengthy (9-page) topic. Note that there are no 8- or 16-bit entities (they're not needed), and that with all registers being 32-bit, it's like everything is in zero page, although the equivalent of the 65816's DP (direct-page) register is also 32-bit and allows offsets to be anything at all, with no page or bank boundaries. The same goes for the equivalent of the 65816's data and program banks-- there are the registers but they're 32-bit also so there are no bank boundaries.


Posted: Fri Aug 12, 2011 8:43 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet has done some comparing and mentioned here that he could get his 6502 8-bit core running up to 111MHz.

I'm just a bit busy right now... How many MIPS would that be?

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


Posted: Fri Aug 12, 2011 9:38 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Quote:
I'm just a bit busy right now... How many MIPS would that be?

Around 28, depending. For example, a lot of ZP addressing without indirects will bring it up a bit, and a lot of indirects and indexing will bring it down a bit.
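The 28 is simply 111 MHz divided by roughly 4 clocks per instruction. To show how the instruction mix moves that number, here is a rough Python sketch. The two mixes are invented for illustration; the cycle counts are the standard 6502 ones, ignoring page-crossing penalties and assuming taken branches stay within a page.

```python
# Native MIPS = clock (MHz) / weighted average clocks per instruction (CPI).
def native_mips(clock_mhz: float, mix: dict[str, tuple[float, int]]) -> float:
    """mix maps an instruction label to (fraction of executed instructions, clocks)."""
    total = sum(frac for frac, _ in mix.values())
    cpi = sum(frac * clocks for frac, clocks in mix.values()) / total
    return clock_mhz / cpi


# Hypothetical mix dominated by plain zero-page addressing.
ZP_HEAVY = {
    "LDA zp": (0.4, 3),
    "STA zp": (0.2, 3),
    "BNE (taken)": (0.2, 3),
    "INX/DEX": (0.2, 2),
}

# Hypothetical mix dominated by indirect indexing and subroutine calls.
INDIRECT_HEAVY = {
    "LDA (zp),Y": (0.4, 5),
    "STA (zp),Y": (0.2, 6),
    "JSR": (0.1, 6),
    "RTS": (0.1, 6),
    "implied": (0.2, 2),
}

if __name__ == "__main__":
    print(f"flat 4 CPI:     {native_mips(111, {'average': (1.0, 4)}):.1f} MIPS")
    print(f"ZP-heavy:       {native_mips(111, ZP_HEAVY):.1f} MIPS")
    print(f"indirect-heavy: {native_mips(111, INDIRECT_HEAVY):.1f} MIPS")
```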


Posted: Fri Aug 12, 2011 9:51 pm

Joined: Sun Apr 10, 2011 8:29 am
Posts: 597
Location: Norway/Japan
GARTHWILSON wrote:
Someone pointed to an article last year telling about the ARM being inspired by the 6502, but I didn't bookmark it and I can't remember the right search terms to find it again.

I remembered having read that as well, and I remembered what I was searching for at the time - I was reading obituaries about Personal Computer World (I was a subscriber from the beginning back when I was 17. I learned English by reading that magazine. But I digress.) So I just went through the same chain of searches and ended up here: http://www.theregister.co.uk/2009/06/11/pcw/page2.html

Quote:
Inmos? Ask anybody in the street today: “Never heard of it.”

Let’s not list all the things that went wrong. There is one star survival, heritage, legacy, what you will, from the UK industry in the 80s. Sophie Wilson, the best 6502 programmer ever, became disappointed with what she could do with the BBC Micro, and went off on her own to design a RISC processor that would do all the good things she liked about the 6502, and all the other things which she wished the 6502 could do.

The chip was the Acorn Risc Machine – the ARM, which started out merely as “the chip inside the Acorn Archimedes”.


Edit: A less poetic reference is maybe: http://www.engineersgarage.com/articles ... processors
which merely refers to ".. with latencies as low as that of the 6502".
There's another one here, an interview with Steve Furber (the other designer of the ARM processor), where he refers to the 6502: http://queue.acm.org/detail.cfm?id=1716385
Quote:
What we found was that the 16-bit micros in the early 1980s had worse interrupt response time than the 6502.


-Tor


Last edited by Tor on Fri Aug 12, 2011 10:27 pm, edited 1 time in total.

Posted: Fri Aug 12, 2011 9:53 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
GARTHWILSON wrote:
Quote:
I'm just a bit busy right now... How many MIPS would that be?

Around 28, depending. For example, a lot of ZP addressing without indirects will bring it up a bit, and a lot of indirects and indexing will bring it down a bit.

I reckon that would equate to about 2 DMIPS according to Sam's figures (Dhrystone being a synthetic benchmark aiming to measure some approximation of VAX-equivalent MIPS, as opposed to the machine native operations per second, which I think is your 28)
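Putting the two figures side by side in a back-of-envelope Python sketch: the inputs are the numbers already given in this thread (111 MHz, about 4 clocks per instruction, about 0.02 DMIPS/MHz). Reading the ratio as "6502 instructions per VAX-equivalent instruction" is only a loose interpretation, since Dhrystone measures a whole compiled program rather than counting instructions.

```python
# Native MIPS versus Dhrystone MIPS for a 111 MHz 6502 core, using thread figures.
CLOCK_MHZ = 111        # Arlet's reported speed for his core
CLOCKS_PER_INSN = 4    # rule-of-thumb average from earlier in the thread
DMIPS_PER_MHZ = 0.02   # from the 6510 Dhrystone figures quoted earlier

native_mips = CLOCK_MHZ / CLOCKS_PER_INSN   # millions of native 6502 instructions/s
dmips = CLOCK_MHZ * DMIPS_PER_MHZ           # VAX-equivalent MIPS per Dhrystone
ratio = native_mips / dmips                 # very loosely: 6502 insns per "VAX insn"

print(f"native ~{native_mips:.1f} MIPS, ~{dmips:.1f} DMIPS, ratio ~{ratio:.1f}")
```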

Cheers
Ed


Posted: Fri Aug 12, 2011 10:25 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
On the business of the 6502 inspiring the ARM, some of the story is elaborated on Wikipedia, but Sophie maintains that "inspired by" isn't the right choice of words. Other people, myself and Sam included, do use that word.

Cheers
Ed


Posted: Sat Aug 13, 2011 10:00 am

Joined: Fri Aug 12, 2011 12:49 pm
Posts: 7
Many thanks for your advice!

1. I'll carefully read Samuel's table to understand how the 0.02 DMIPS/MHz comes out. It looks really low compared to the ARM7 (about 0.7 DMIPS/MHz, I think).

2. As for ARM clones, I found two projects on www.opencores.org: the Amber RISC core and nnARM. They look like they work. I'm not sure yet whether they implement Thumb mode (I'm very interested in very compact code).

3. No idea whether I'm the first to visit the 6502EX site. It is certainly new; it has been updated recently. I guess the code will only be available for payment, but it's not clear. In that case I'm not interested either.

I also found the 65GZ032. Any comments on this? DMIPS? Code density? Is the code available for free?

Thanks again for your feedback.


Posted: Sat Aug 13, 2011 10:03 am

Joined: Fri Aug 12, 2011 12:49 pm
Posts: 7
Hi,

Thanks a lot for your information. There's a lot of it, and it's valuable.

I understand that the 6502 will be better than the ARM in terms of interrupt latency. This is a good point!

You seem to be suggesting that I go for a 65816. Is there any free RTL code for it? I'm interested in working on an FPGA.

I'll certainly have a look at the 9 pages describing Electric_Eye's implementation. Really interesting to know!

Thanks again for your help!


Posted: Sat Aug 13, 2011 11:00 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
sixtyfive02 wrote:
I understand that the 6502 will be better than the ARM in terms of interrupt latency. This is a good point!


This was only true in some cases on an ARM7, and is certainly no longer true on a Cortex. The Cortex core has an interrupt latency of only 12 cycles, which includes saving your registers, and dispatching to the correct handler.


Posted: Sat Aug 13, 2011 11:16 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
sixtyfive02 wrote:
I'll carefully read Samuel's table to understand how the 0.02 DMIPS/MHz comes out. It looks really low compared to the ARM7 (about 0.7 DMIPS/MHz, I think).

Here are some ideas:
    lots of registers - avoids spilling to/from zero page a lot
    memory bus width - 4x difference (32 bits vs 8 bits)
    cache - allows for a Harvard-like architecture internally, overlapping data and instruction activity
    predication - avoids a branch penalty in many cases
    more powerful instructions and addressing modes - get more work done per instruction


Thanks for the pointer to Amber RISC Core - I've a feeling nnARM is defunct, due to legal pressure perhaps.

Cheers
Ed


Posted: Sat Aug 13, 2011 1:06 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Tor wrote:
GARTHWILSON wrote:
Someone pointed to an article last year telling about the ARM being inspired by the 6502, but I didn't bookmark it and I can't remember the right search terms to find it again.

Edit: A less poetic reference is maybe: http://www.engineersgarage.com/articles ... processors
which merely refers to ".. with latencies as low as that of the 6502".
There's another one here, an interview with Steve Furber (the other designer of the ARM processor), where he refers to the 6502: http://queue.acm.org/detail.cfm?id=1716385
Quote:
What we found was that the 16-bit micros in the early 1980s had worse interrupt response time than the 6502.


-Tor

Hi Tor
thanks for your research: I've taken these comments and pasted into a new thread - hope that's OK.
Cheers
Ed


Posted: Sat Aug 13, 2011 7:49 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
You haven't said what you want to do with it. The 6502 was not intended for multitasking [Edit, 5/15/14: I posted an article on simple methods of doing multitasking without a multitasking OS, at http://wilsonminesco.com/multitask/index.html] and large applications compiled from high-level languages (which is what Dhrystone is all about); but OTOH, it was simple enough that writing hand-optimized assembly was far more practical on the 6502 than it is on most modern processors. IOW, Dhrystone MIPS may or may not be relevant, depending on what you want to do with it.

Forth on the 6502 normally has 16-bit cells, and LOOP (or actually its run-time word, (loop)) has a fair amount of work to do to increment a 16-bit loop counter on the stack and compare it to a 16-bit limit which is also on the stack. But I wrote one for 32-bit cells, and it just got ridiculously long. If you had to do everything in 32-bit, it would be a sloth; but if you never need more than 8 bits for the loop counter and limit, DEX (possibly followed by CPX) and then BNE does the job, taking only 5 or 7 clocks per loop. Many of the 6502's contemporaries couldn't even execute a single instruction in so few clocks. If we made an all-32-bit 6502 (65Org32), the 32-bit loop could also be done in 5 or 7 clocks.
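Here is the 5-or-7-clock arithmetic as a small Python sketch, counting only the loop overhead (the loop body itself is left out). It uses the standard 6502 timings: DEX and CPX # take 2 clocks each, and BNE takes 3 when taken within the same page and 2 when it falls through on the last pass. A branch that crosses a page would cost one more clock, which this sketch ignores.

```python
# Loop-control overhead for an X-register countdown loop (loop body not counted).
DEX = 2
CPX_IMM = 2
BNE_TAKEN = 3      # same page; add 1 if the branch crosses a page
BNE_NOT_TAKEN = 2


def loop_overhead(iterations: int, with_cpx: bool = False) -> int:
    """Total clocks spent on DEX [CPX #] BNE over `iterations` passes."""
    per_pass = DEX + (CPX_IMM if with_cpx else 0)
    # every pass but the last branches back; the last one falls through
    return iterations * per_pass + (iterations - 1) * BNE_TAKEN + BNE_NOT_TAKEN


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, loop_overhead(n), loop_overhead(n, with_cpx=True))
```

Each extra pass costs 5 clocks (DEX, BNE) or 7 (DEX, CPX, BNE), with one clock saved on the final, untaken branch.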

But look how much more efficiently the 65816 does the 16-bit fetch (@ in Forth, which takes a 16-bit address at the top of the data stack and replaces it with the 16-bit contents of that address) compared to the 6502.

First for the 6502:
Code:
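       LDA  (0,X)   ; low byte of the cell whose address is on top of the data stack
       PHA          ; save it on the hardware stack
       INC  0,X     ; step the address to the cell's high byte
       BNE  fet1
       INC  1,X     ; carry into the high byte of the address
fet1:  LDA  (0,X)   ; high byte of the cell
       JMP  PUT

; Elsewhere, PUT, which is used in so many places, is:
PUT:   STA  1,X     ; high byte into the top-of-stack cell
       PLA          ; recover the low byte
       STA  0,X     ; low byte into the top-of-stack cell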

For the '816, the whole thing is only:
Code:
       LDA  (0,X)
       STA  0,X         ; For the '816, PUT is only one 2-byte instruction anyway, so there's no sense in jumping to it.

So it takes 10 instructions on the 6502 to do what the 65816 can do in two.
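To check that the two listings really compute the same thing, here is a Python sketch that steps through the 6502 sequence against a flat 64 KB memory and compares it with a one-shot 16-bit fetch standing in for the '816 version. The data stack is in zero page, indexed by X, with the top cell at 0,X (low) and 1,X (high). The '816 model assumes native mode with a 16-bit accumulator, direct page D = 0, bank 0 only, and X below $FF; the function names are hypothetical.

```python
# Step-by-step model of the 6502 @ (fetch) listing versus the 65C816 version.
def ind_x(mem: bytearray, x: int) -> int:
    """(0,X) addressing: 16-bit pointer read from zero page at X, X+1 (wraps in page 0)."""
    return mem[x & 0xFF] | (mem[(x + 1) & 0xFF] << 8)


def fetch_6502(mem: bytearray, x: int) -> None:
    hw_stack = []                                   # stands in for the 6502 hardware stack
    hw_stack.append(mem[ind_x(mem, x)])             # LDA (0,X) / PHA: low byte of the cell
    mem[x & 0xFF] = (mem[x & 0xFF] + 1) & 0xFF      # INC 0,X: bump the address's low byte
    if mem[x & 0xFF] == 0:                          # BNE fet1 falls through: low byte wrapped
        hi = (x + 1) & 0xFF
        mem[hi] = (mem[hi] + 1) & 0xFF              # INC 1,X: carry into the high byte
    a = mem[ind_x(mem, x)]                          # fet1: LDA (0,X): high byte of the cell
    mem[(x + 1) & 0xFF] = a                         # PUT: STA 1,X
    mem[x & 0xFF] = hw_stack.pop()                  # PLA / STA 0,X


def fetch_816(mem: bytearray, x: int) -> None:
    """16-bit accumulator: LDA (0,X) reads the whole cell, STA 0,X writes it back."""
    addr = mem[x] | (mem[x + 1] << 8)
    value = mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)
    mem[x] = value & 0xFF
    mem[x + 1] = value >> 8


if __name__ == "__main__":
    mem = bytearray(65536)
    mem[0x40], mem[0x41] = 0x34, 0x12               # address $1234 on top of the data stack
    mem[0x1234], mem[0x1235] = 0xCD, 0xAB           # the cell holds $ABCD
    fetch_6502(mem, 0x40)
    print(hex(mem[0x40] | (mem[0x41] << 8)))
```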

Quote:
Quote:
I understand that the 6502 will be better than the ARM in terms of interrupt latency. This is a good point!

This was only true in some cases on an ARM7, and is certainly no longer true on a Cortex. The Cortex core has an interrupt latency of only 12 cycles, which includes saving your registers, and dispatching to the correct handler.

Is that 12 actual clocks (pulses)? Most interrupt handling should not require saving all the registers, and with a hardware-complexity penalty (or actually not if you do it in an FPGA), you can have the interrupt put the right address in the vector location so the correct handler is jumped to directly. IOW, the 6502 may still be able to do it in 7 or 11 clocks, never more than 19 (that is, pushing A, X, and Y; P is already pushed). OTOH, the 6502's longer instructions will add an average of a couple more clocks because the currently-executing instruction has to finish before the interrupt sequence can start. (I call them clocks, not cycles, because on some other processors they use the term "cycle" when they mean a set of four or more clock pulses, which is deceiving, like PIC's four clocks per cycle.)

As for registers, basically all 256 bytes of ZP can act as processor registers on the 6502, and the '816 lets each task have its own ZP.

I'm not arguing of course that the 6502 is a performance match for the ARM, but rather that for small applications, much of the perceived advantage of a higher-end processor might evaporate.


Post subject: 16 Bit Forth
Posted: Sat Aug 13, 2011 11:39 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
GARTHWILSON wrote:
For the '816, the whole thing is only:
Code:
       LDA  (0,X)
       STA  0,X         ; For the '816, PUT is only one 2-byte instruction anyway, so there's no sense in jumping to it.

Don't forget the above sequence would also include REP #%00100000 to enable 16-bit loads and stores and SEP #%00100000 to revert to 8-bit mode. So it would take a little longer to execute (6 cycles longer, to be exact, since REP and SEP take 3 cycles each). However, it's still much faster than anything that could be coded on the 65(c)02.
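Counting clocks as well as instructions, here is a Python tally of the two @ sequences from the previous post, with and without the REP/SEP pair. The per-instruction counts are the base timings from the WDC datasheets. The assumptions are: the direct page register is page-aligned (no extra clock), the BNE is taken without crossing a page, and the final jump to NEXT is not counted.

```python
# Clock tally for the two @ (fetch) sequences.
FETCH_6502 = [
    ("LDA (0,X)", 6), ("PHA", 3), ("INC 0,X", 6), ("BNE fet1", 3),  # BNE taken
    ("LDA (0,X)", 6), ("JMP PUT", 3),
    ("STA 1,X", 4), ("PLA", 4), ("STA 0,X", 4),
]
FETCH_816 = [
    ("LDA (0,X)", 7),   # 6, plus 1 because the accumulator is 16 bits wide
    ("STA 0,X", 5),     # 4, plus 1 because the accumulator is 16 bits wide
]
MODE_SWITCH = [("REP #%00100000", 3), ("SEP #%00100000", 3)]


def clocks(seq: list[tuple[str, int]]) -> int:
    return sum(c for _, c in seq)


if __name__ == "__main__":
    print("6502:            ", clocks(FETCH_6502))
    print("'816, 16-bit mode:", clocks(FETCH_816))
    print("'816 with REP/SEP:", clocks(FETCH_816 + MODE_SWITCH))
```

On the less common path, where the address low byte wraps, the 6502 version runs INC 1,X and an untaken BNE instead, which adds 5 clocks.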

The 65C816 has a number of useful stack instructions that are almost tailor-made for languages such as Forth. For example, you can push an address to the stack and then read the contents of that address without touching any zero page memory. First, the 65(c)02 way:
Code:
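          LDX #<addr            ;low byte of the address
          LDY #>addr            ;high byte of the address
          STX zpptr             ;build a pointer in zero page
          STY zpptr+1
          LDY #0                ;index
          LDA (zpptr),y         ;grab byte through the zero page pointer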

Now, the 65C816 way:
Code:
          REP #%00100000        ;select 16 bit .A
          LDA #addr             ;load full address in one operation
          PHA                   ;put it on the stack (2 bytes, since .A is 16 bit)
          SEP #%00100000        ;select 8 bit .A
          LDY #0                ;index
          LDA (1,s),y           ;grab byte like LDA (zpaddr),y
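A Python sketch showing that both sequences reach the same byte. It models a 16-bit PHA (high byte to S, low byte to S-1, then S decreases by 2), so after the push the pointer sits at S+1, which is what (1,s),y reads. Assumptions: native mode, data bank register 0 (so addresses stay in bank 0), stack pointer starting at $01FF, and a hypothetical zpptr at $80.

```python
# Effective address for (zp),Y versus the 65C816's (sr,S),Y stack-relative mode.
def ea_zp_indirect_y(mem: bytearray, zpptr: int, y: int) -> int:
    """(zpptr),Y: 16-bit pointer from zero page, plus Y (bank byte ignored)."""
    base = mem[zpptr] | (mem[(zpptr + 1) & 0xFF] << 8)
    return (base + y) & 0xFFFF


def push16(mem: bytearray, s: int, value: int) -> int:
    """16-bit PHA: high byte to S, low byte to S-1; returns the new S."""
    mem[s] = value >> 8
    mem[(s - 1) & 0xFFFF] = value & 0xFF
    return (s - 2) & 0xFFFF


def ea_stack_rel_indirect_y(mem: bytearray, s: int, offset: int, y: int) -> int:
    """(offset,S),Y: 16-bit pointer at S+offset, plus Y (bank byte ignored)."""
    ptr = (s + offset) & 0xFFFF
    base = mem[ptr] | (mem[(ptr + 1) & 0xFFFF] << 8)
    return (base + y) & 0xFFFF


if __name__ == "__main__":
    mem = bytearray(65536)
    addr = 0x1234
    mem[0x80], mem[0x81] = addr & 0xFF, addr >> 8   # STX zpptr / STY zpptr+1
    s = push16(mem, 0x01FF, addr)                   # REP / LDA #addr / PHA
    print(hex(ea_zp_indirect_y(mem, 0x80, 0)), hex(ea_stack_rel_indirect_y(mem, s, 1, 0)))
```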

One of the handy things about these sort of stack acrobatics is the relative ease at which fully relocatable code can be developed. However, the 65C816 offers another approach and that is the ability to relocate zero page. That feature makes it practical to give each subroutine its own zero page (and stack, which can also be relocated). When the subroutine has finished, put ZP and the stack back where they used to be and go on your way.
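The relocatable zero page works because a direct-page operand on the '816 is an offset added to the 16-bit D register. Here is a minimal Python sketch of that address calculation, with a hypothetical set of tasks each given its own page; it covers only plain direct-page addressing in native mode (bank 0). Keeping D page-aligned also avoids the extra clock the '816 charges when D's low byte is nonzero.

```python
# Per-task direct pages on the 65C816: the same instruction operand (say LDA $10)
# lands at a different bank-0 address depending on the task's D register.
def direct_page_address(d: int, offset: int) -> int:
    """Effective address of a direct-page operand in native mode, kept within bank 0."""
    return (d + offset) & 0xFFFF


# Hypothetical layout: each task gets its own page-aligned 256-byte direct page.
TASK_DIRECT_PAGES = {"main": 0x0000, "serial_isr": 0x0300, "task_b": 0x0400}

if __name__ == "__main__":
    for task, d in TASK_DIRECT_PAGES.items():
        print(f"{task:10s} D=${d:04X}: LDA $10 reads ${direct_page_address(d, 0x10):04X}")
```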

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!

