Posted: Sun Dec 01, 2013 6:07 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
datajerk wrote:
GARTHWILSON wrote:
The 32-bit stuff is why we were talking about the 65Org32, and why Mike is working on his 65m32.

I've been away for a while, is there a 65Org32/65m32?

The topic "Improving the 6502, some ideas" quickly turned into a discussion of ideas for a version of the 65816 (but with non-multiplexed address) where buses and all registers (except status) are 32-bit. The topic is 13 pages long (so far). After three years of the topic being dormant, barrym95838 (Michael Barry) picked it up in August, on page 10, with discussion of his 65m32, but after three more pages he has been kind of quiet about it here (and has been discussing it with a few others by PM) because it is different enough from the 6502 that some might complain. One of the greatest differences might be that operands of less than 18 bits, IIRC, are integrated into the 32-bit instruction words. A special case allows operands of up to 32 bits to be in their own word following the instruction word. His assembly-language notation is quite different too, but the processor is close enough to the 6502 that the 6502-style instructions can mostly be aliased to his notation.

Quote:
GARTHWILSON wrote:
There were a couple of paragraphs in an article by Jack Crenshaw in the 9/98 issue of Embedded Systems Programming where he talks about different BASICs he used on computers in the 1970's and 1980's, and said the 6800 and 6502 always seemed to run them faster than any other processor.
...
The 80's even at that time though [z80] were generally run at 4MHz or a little higher IIRC, and they were still losing to the 1MHz 6502's and 6800's.

I wonder how much that has to do with the BASIC implementation.

It was on multiple different BASICs on each, so it's not just that one Z80 programmer was careless.

Quote:
I've written benchmarks where a 2MHz z80 outperforms a 1MHz 6502 (but not by much 107 sec vs 114 sec). In both cases the benchmarks are written in assembly and every possible trick is being used to increase performance (e.g. unrolled block copy, using stack pointer for processing arrays, etc...). I do not expect the average BASIC to do the same. But I do expect an assembly programmer to max out performance.

As someone mentioned above, the way to do it is not to translate it, but approach the problem on the next processor as a clean start so it's not trying to do things the way the other one has to. I know when I just try to translate something, I tend to get pretty inefficient code. The topic at viewtopic.php?f=2&t=18 (less than two pages) is relevant, starting about a third of the way down the first page, and ending on standardized benchmarks, the last post from Toshi coming ten years after the 2nd-to-last. :)

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Sun Dec 01, 2013 6:35 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
datajerk wrote:
I read a blog post where the blogger said that a 1MHz 6502 was faster than a 4.77 MHz 8088. I wanted to believe him until I read his method. He used the specs and measured the time it would take to fetch, decode, and execute a shift instruction. And if that is all the program was going to do (a *single* shift), then he was right. However if you did a series of shifts the 4.77 MHz 8088 was 2.6x faster than the 1MHz 6502 (when using words not bytes, bytes vs. bytes 8088 was 2x faster).

The point is that one should actually run real code to compare.

The published times for ten iterations of the Sieve benchmark are that a 5MHz 8088 takes 4.0 seconds while a 4MHz (not 1MHz) 6502 (NMOS) takes 3.1 seconds, making it about 60% more cycle-efficient than the 8088. Bill Mensch had 10MHz 6502's in the 1970's, although I think the off-the-shelf ones were limited to 2MHz at that time. (The IBM PC was not out yet.)
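The "about 60%" figure can be checked with a little arithmetic. This is only a sketch using the clock rates and run times quoted above; the function name is made up for illustration.

```python
def total_cycles(clock_hz, seconds):
    """Clock cycles consumed by a run of the given duration."""
    return clock_hz * seconds

cycles_8088 = total_cycles(5_000_000, 4.0)  # 5 MHz 8088, 4.0 s
cycles_6502 = total_cycles(4_000_000, 3.1)  # 4 MHz 6502, 3.1 s

# A ratio above 1 means the 6502 needed fewer cycles for the same work.
ratio = cycles_8088 / cycles_6502
print(f"8088: {cycles_8088 / 1e6:.1f} M cycles")
print(f"6502: {cycles_6502 / 1e6:.1f} M cycles")
print(f"8088 needs {ratio:.2f}x the cycles of the 6502")
```

That works out to roughly 20 million cycles against 12.4 million, a ratio of about 1.6.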

Quote:
BTW, are you looking at the 68008 or 68000 timings? The time to write a byte or word is the same for the 68000.

I don't know which one of us you're asking, but I was specifically looking at the 68008 timings since that's what was brought up.

Again of course the way the program is approached is super important-- making a lot more difference than which processor is used, if the processors are anywhere near the same class. The original 4.77MHz IBM PC took about 9 minutes to do a 1024-point complex FFT in GWBASIC. My 1980's handheld battery-powered HP-71 computer with the math module and a 625kHz clock and 4-bit data bus did it with greater precision in half the time (about 4:30), in its far superior BASIC. My 5MHz 6502 workbench computer does an FFT twice that big (2048 points, complex) in 16-bit scaled-integer in Forth in 5 seconds. If I were to implement the large look up tables, it might be down to a few percent of a second.
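To put the three FFT times on one scale, here is a rough sketch that divides each run time by N*log2(N), the usual estimate of FFT work. It assumes that estimate applies to all three implementations, and it ignores the differences in precision and language, so it is not a like-for-like benchmark.

```python
import math

def seconds_per_unit(n_points, seconds):
    """Run time divided by N*log2(N), a common estimate of FFT work."""
    return seconds / (n_points * math.log2(n_points))

ibm_pc = seconds_per_unit(1024, 9 * 60)       # GWBASIC, about 9 minutes
hp71 = seconds_per_unit(1024, 4 * 60 + 30)    # HP-71 BASIC, about 4:30
w6502 = seconds_per_unit(2048, 5)             # Forth, scaled integer, 5 s

pc_vs_6502 = ibm_pc / w6502
hp71_vs_6502 = hp71 / w6502
print(f"IBM PC vs 6502: {pc_vs_6502:.0f}x slower per unit of work")
print(f"HP-71 vs 6502:  {hp71_vs_6502:.0f}x slower per unit of work")
```

The spread is a couple of orders of magnitude, which is far more than any clock-rate difference between the machines.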

Posted: Sun Dec 01, 2013 7:56 am

Joined: Mon Nov 11, 2002 6:53 pm
Posts: 79
Location: Seattle
I was looking at the 68008 timing in table 7-3, incidentally also on page 7-3 of the Motorola / Freescale 68k manual, here:

http://www.freescale.com/files/32bit/do ... 8000UM.pdf

A 16-bit move to an address in 68k ( i.e. move.w #$1234, 0(a0) ) takes 32 clocks / 34 clocks if you take an index register into account. The instruction's listed as 32(6/2) which means it does 6 read cycles (*4 clocks) and 2 writes (also *4 clocks) = 24 + 8 = 32 total.
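The 32(6/2) notation can be unpacked in a couple of lines. A minimal sketch, using only the figures quoted above (4 clocks per bus cycle on the 68008); the function name is illustrative.

```python
CLOCKS_PER_BUS_CYCLE = 4  # each 68008 read or write cycle takes 4 clocks

def clocks_from_bus_cycles(reads, writes):
    """Total clocks for an instruction listed as N(reads/writes)."""
    return (reads + writes) * CLOCKS_PER_BUS_CYCLE

total = clocks_from_bus_cycles(reads=6, writes=2)
print(total)  # 6*4 + 2*4 = 24 + 8
```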

Now before this turns into some sort of flame war, if it hasn't already, my main point was that while the '08 *seemed* like a good fit to me, a 6502 running at the same clock speed seems to perform much better when it comes to shoveling data around. I don't plan on using high-precision multiplies and divisions and while the increased address space is nice, I can achieve the same with bank-switching.

Yvo


Posted: Sun Dec 01, 2013 8:20 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I don't think comparing cycle times is all that useful. Instead, I would look at overall system speed given a similar design effort and board technology. If that means that the CPU runs at a higher internal clock that's not a big deal. What matters is how much power it uses, and what kind of peripherals are required.


Posted: Sun Dec 01, 2013 8:51 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1926
Location: Sacramento, CA, USA
Arlet wrote:
... I would look at overall system speed given a similar design effort and board technology. If that means that CPU runs at a higher internal clock that's not a big deal...

That makes a lot of sense to me. At any given machine cycle frequency, a 6xxx would require significantly faster (more expensive) support hardware than an 8xxx or a micro-coded 68xxx, due to the 6xxx's higher external communication bandwidth requirements. I still have a clear personal favorite, though :wink:

Mike


Posted: Sun Dec 01, 2013 10:02 am

Joined: Sun Nov 28, 2004 3:07 pm
Posts: 28
Location: Budapest, Hungary
Quote:
The discrete form of the W65C02S readily operates at 20 MHz, as does the W65C816S. ASIC forms of the 65C02 have run at speeds up to 200 MHz, far beyond what the fastest versions of the 68K family could do.


The Freescale ColdFire is said to run at up to 300MHz. (AFAIK the fastest "real" 68k is the 68060 clocked at 75MHz, but don't forget that the 68060 is about as fast as an Intel Pentium at the same clock, or faster; only the FPU was really bad in the comparison. The '060 is really fast compared to a 65C816, I guess.) How compatible the ColdFire is with the 68K is another question, and not so clear to me: the first versions were said to be only "similar but not compatible" (AFAIK later version(s?) got full software compatibility). But if we compare things here, the "ASIC" versions of the 65C02 can be as "real" as the Freescale ColdFire :) What I miss a lot is a version of the 65C816 with an _optional_ real 16-bit data bus, or better still the planned 32-bit version. OK, the ColdFire and 68060 are not the base-model 68000, that's true :)


Posted: Sun Dec 01, 2013 5:02 pm

Joined: Mon Nov 11, 2002 6:53 pm
Posts: 79
Location: Seattle
Arlet wrote:
I don't think comparing cycle times is all that useful. Instead, I would look at overall system speed given a similar design effort and board technology. If that means that the CPU runs at a higher internal clock that's not a big deal. What matters is how much power it uses, and what kind of peripherals are required.


In my case, the actual bus cycle time does matter as I can't clock it higher than 6.25MHz due to my target bus speed of 80ns. In this case, the 16-bit nature of the 68k is actually a hindrance rather than a boon as it needs more bus cycles for regular instructions. So at that speed, I get more (memory) performance out of the 6502 than an '08.
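As a rough illustration of that gap, here is a sketch of the time to store one 16-bit constant at 6.25 MHz. The 32-clock 68008 figure is the one from the manual quoted earlier in the thread. The 6502 side is an assumption for comparison only: two LDA #imm / STA abs pairs at the standard 2 and 4 cycles each, which uses absolute addressing and so is not an exact equivalent of the 0(a0) store.

```python
CLOCK_HZ = 6_250_000  # both CPUs assumed to run at the same 6.25 MHz clock

def microseconds(clocks, clock_hz=CLOCK_HZ):
    """Convert a clock count to microseconds at the given clock rate."""
    return clocks * 1_000_000 / clock_hz

m68008_clocks = 32          # move.w #imm,d16(An) on the 68008
m6502_clocks = 2 * (2 + 4)  # two LDA #imm (2) + STA abs (4) pairs

t_68008 = microseconds(m68008_clocks)
t_6502 = microseconds(m6502_clocks)
print(f"68008: {t_68008:.2f} us, 6502: {t_6502:.2f} us")
```

Under those assumptions the 6502 finishes the store in well under half the time.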

But yes, overall the whole package counts. My Z80 board also runs at 6.25MHz but pretty much all of the hard work is taken care of using DMA, which is something I'll be using in this system too.

Yvo


Posted: Sun Dec 01, 2013 8:33 pm

Joined: Mon Sep 10, 2012 6:37 pm
Posts: 18
GARTHWILSON wrote:
The published times for ten iterations of the Sieve benchmark are that a 5MHz 8088 takes 4.0 seconds while a 4MHz (not 1MHz) 6502 (NMOS) takes 3.1 seconds, making it about 60% more cycle-efficient than the 8088. Bill Mensch had 10MHz 6502's in the 1970's, although I think the off-the-shelf ones were limited to 2MHz at that time. (The IBM PC was not out yet.)

Was there an affordable system with a 4MHz 6502? I've limited my benchmarking to 8-bit bus processors widely available in hobby and personal computers from 1974-1986 as they were implemented. The 8088, 68008, and 65816 processors available in the more advanced and complex systems of the '80s all had cost/performance ratios (and perhaps politics) to contend with that limited their performance (e.g. memory and video refresh). Even the 2MHz 6502 in the Apple /// dropped to an effective 1.8 MHz after memory refresh, and to 1.4 MHz after video refresh as well. I feel that those are some of the reasons we never saw a 4MHz or 10MHz 6502 in popular systems. I am not confident that a 4 MHz 6502 in 1980 would have run at 4 MHz and still be a compelling platform (feature-wise) since it would also have to contend with memory and video refresh.
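The Apple /// numbers above translate into the following fractions of the nominal clock. This is just the arithmetic on the three quoted figures, nothing more.

```python
NOMINAL_MHZ = 2.0        # Apple /// 6502 nominal clock
AFTER_MEM_REFRESH = 1.8  # effective rate after memory refresh
AFTER_VIDEO = 1.4        # effective rate after video refresh as well

kept_after_mem = AFTER_MEM_REFRESH / NOMINAL_MHZ
kept_after_video = AFTER_VIDEO / NOMINAL_MHZ
print(f"after memory refresh: {kept_after_mem:.0%} of nominal")
print(f"after video refresh:  {kept_after_video:.0%} of nominal")
```

So about 30% of the nominal clock is lost to refresh before any code runs.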

I take issue with the Byte published Sieve benchmarks. They are a comparison of the processor, system, and in most cases the compiler. I'll be writing my own Sieve for the 6502 and 8088 to compare, in assembly of course, and run on real systems. I'll make it an Xmas break project. MHz for MHz I have no idea what the outcome will be.


Posted: Sun Dec 01, 2013 8:57 pm

Joined: Mon Sep 10, 2012 6:37 pm
Posts: 18
yzoer wrote:
Now before this turns into some sort of flame war, if it hasn't already,

Flame war? This has been an academic discussion amongst well-respected gentlemen (and ladies?). Everybody here will agree that the 6502 was one of the greatest achievements of the last century, and given the popularity of the Apple II, the C64, the BBC Micro, and the 100 or so Apple II clones (http://en.wikipedia.org/wiki/List_of_Apple_II_clones), the 6502 was the first chip for many, if not most, of us. It's all for fun. Too bad we cannot all meet at the same pub.

yzoer wrote:
6502 running at the same clock speed seems to perform much better when it comes to shoveling data around.

Careful with "seems to perform". E.g. with a single 68K instruction you can consume 40 sequential bytes (not bits) of data into 10 of your 32-bit registers. That's one fetch (68008, 4 memory reads, 68000, 2 memory reads) and decode, then 10 (68008) or 5 (68000) 4-cycle reads. You can write it out the same way. And if you have to process all that data before writing out, well then it's just fetching 2 byte instructions.

Without knowing your exact workload this is all academic. If you are doing a bunch of byte-only random reads and writes, well then the agility of the 6502, MHz for MHz, will win based on your current analysis.


Last edited by datajerk on Sun Dec 01, 2013 10:54 pm, edited 1 time in total.

Posted: Sun Dec 01, 2013 8:59 pm

Joined: Mon Sep 10, 2012 6:37 pm
Posts: 18
Arlet wrote:
I don't think comparing cycle times is all that useful. Instead, I would look at overall system speed given a similar design effort and board technology. If that means that the CPU runs at a higher internal clock that's not a big deal. What matters is how much power it uses, and what kind of peripherals are required.

100% agree. That is why I prefer to compare performance of systems and not processors.


Posted: Sun Dec 01, 2013 11:29 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
It's also why I proposed a higher input clock speed to get the same bus speed. Two or four input cycles for each bus cycle should give the necessary processing to eliminate dead bus cycles without deep pipelining.

Quote:
Careful with "seems to perform". E.g. with a single 68K instruction you can consume 40 sequential bytes (not bits) of data into 10 of your 32-bit registers.

The 68K can do a lot in one instruction, but that doesn't necessarily mean good performance, as that instruction costs dozens of cycles. The 65816 can move dozens of KB in one instruction too.
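To show that "one instruction" says little about speed, here is a sketch of what a 65816 block move costs. It uses the documented 7 cycles per byte for MVN/MVP; the 32 KB size and the 4 MHz clock are just example values.

```python
MVN_CYCLES_PER_BYTE = 7  # 65816 MVN/MVP block move cost per byte moved

def block_move_cycles(n_bytes):
    """Cycles one MVN/MVP instruction spends moving n_bytes."""
    return n_bytes * MVN_CYCLES_PER_BYTE

n_bytes = 32 * 1024              # example: a 32 KB block
cycles = block_move_cycles(n_bytes)
example_clock_hz = 4_000_000     # example clock, not from the thread
milliseconds = cycles * 1000 / example_clock_hz
print(f"{cycles} cycles, {milliseconds:.1f} ms at 4 MHz")
```

A single instruction here accounts for a couple of hundred thousand cycles.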

Posted: Mon Dec 02, 2013 8:05 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Performance for a given speed of RAM seems to me the best measure. That's why it's not sensible to compare 6502 with Z80 on a clock-for-clock basis.

The BBC Micro ran at 2MHz for all RAM and ROM accesses, slowing to 1MHz for peripherals. The second processor, for those few who bought one, ran at 3MHz. I'm not aware of a 6502 micro which ran any faster.

Cheers
Ed


Posted: Mon Dec 02, 2013 11:53 am

Joined: Thu Mar 03, 2011 5:56 pm
Posts: 277
BigEd wrote:
Performance for a given speed of RAM seems to me the best measure. That's why it's not sensible to compare 6502 with Z80 on a clock-for-clock basis.

The BBC Micro ran at 2MHz for all RAM and ROM accesses, slowing to 1MHz for peripherals. The second processor, for those few who bought one, ran at 3MHz. I'm not aware of a 6502 micro which ran any faster.

Cheers
Ed


The BBC second processor was (according to Wikipedia) a 65C02. The Apple IIc+ also had a 65C02, which would run at either 1MHz or 4MHz (depending on certain keypresses at "reset" time, I think). Some people have found that replacing one of the oscillators on the IIc+ would give it a top speed of at least 8MHz.


Posted: Mon Dec 02, 2013 2:33 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
BigEd wrote:
Performance for a given speed of RAM seems to me the best measure.
Yes indeed! If a CPU demands comparatively fast RAM, that requirement will have major implications regarding the total cost of the system. And RAM speed requirements cannot be inferred from clock frequency. Example: a 6502 memory cycle takes one clock cycle, whereas a Z80 memory cycle takes 3 clocks IIRC (and 4 clocks for an opcode fetch. Edit: actually 2 clocks plus another 2 for the refresh -- see following post).

IOW, comparing a 6502 system and a Z80 system that both use a 4 MHz clock is ridiculous. But comparing a 6502 system and a Z80 system that both use, say, 100 ns RAM is reasonable in light of cost -- typically the overarching concern.

Two footnotes:
  • besides RAM, the speed requirements for ROM and peripherals also matter. Nevertheless, I can accept "speed of RAM" as loosely meaning "bus transaction time." The point is to emphasize system cost.

  • It's a little-known fact that the Z80 cpu only has a 4-bit ALU ! IOW, 8-bit operations require two passes through the ALU. That seemingly backward design decision is counterbalanced by the choice of a fairly "busy" (high frequency) clock. So we see that clock frequency is a poor indicator of performance as well as of system cost.

What Ed implies (and I agree) is that one should begin a hypothetical evaluation by proposing a given speed of memory and peripherals. Then you evaluate how well each prospective CPU can perform in that environment.
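A crude sketch of that kind of evaluation: fix the bus transaction time, then ask what clock each CPU could run at. It assumes a memory access simply spans a whole number of clocks (1 for the 6502, 3 for a Z80 data access, 2 for the memory read inside a Z80 opcode fetch) and ignores setup, hold and address-decode margins, so the numbers are illustrative only.

```python
def max_clock_mhz(bus_time_ns, clocks_per_access):
    """Highest clock (MHz) at which an access spanning `clocks_per_access`
    clocks still lasts at least `bus_time_ns` nanoseconds."""
    return clocks_per_access * 1000 / bus_time_ns

BUS_TIME_NS = 100  # the "say, 100 ns" memory from the text

clocks_per_access = {
    "6502": 1,
    "Z80 (opcode fetch read)": 2,
    "Z80 (data access)": 3,
}
limits = {name: max_clock_mhz(BUS_TIME_NS, clocks)
          for name, clocks in clocks_per_access.items()}

for name, mhz in limits.items():
    print(f"{name}: up to {mhz:.0f} MHz")

# The Z80's tightest access sets its overall limit.
z80_limit = min(mhz for name, mhz in limits.items() if name.startswith("Z80"))
```

In this simplified model the same memory supports a Z80 clock twice as high as the 6502 clock, which is why the raw MHz numbers are not comparable.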

cheers,
Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Last edited by Dr Jefyll on Mon Dec 02, 2013 3:32 pm, edited 1 time in total.

Posted: Mon Dec 02, 2013 3:05 pm

Joined: Sun Jul 28, 2013 12:59 am
Posts: 235
Dr Jefyll wrote:
Example: a 6502 memory cycle takes one clock cycle, whereas a Z80 memory cycle takes 3 clocks IIRC (and 4 clocks for an opcode fetch).


I've been hearing this for ages, and believing it for about as long, but something bugged me about it this morning, so I grabbed my Z80 databook and checked the timing diagrams. A Z80 opcode fetch does two memory reads in two clocks each, with the second read being ignored as it's intended as a DRAM refresh operation. The difference doesn't help as far as making the CPU any faster, but does provide a tighter bound on the memory speed unless you use wait-states.

Quote:
  • It's a little-known fact that the Z80 cpu only has a 4-bit ALU ! IOW, 8-bit operations require two passes through the ALU. That seemingly backward design decision is counterbalanced by the choice of a fairly "busy" (high frequency) clock. So we see that clock frequency is a poor indicator of performance as well as of system cost.


I hadn't known this, but it explains the "half-carry" flag quite well.
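The link between a 4-bit ALU and the half-carry flag can be sketched as an 8-bit add done in two nibble-wide passes. This is an illustrative model only, not a description of the actual Z80 circuitry, and the function name is made up.

```python
def add8_nibblewise(a, b, carry_in=0):
    """Add two 8-bit values in two 4-bit passes.

    Returns (result, half_carry, carry). The half carry is the carry
    out of the low-nibble pass, fed into the high-nibble pass.
    """
    low = (a & 0x0F) + (b & 0x0F) + carry_in
    half_carry = low >> 4                 # carry out of bit 3
    high = (a >> 4) + (b >> 4) + half_carry
    carry = high >> 4                     # carry out of bit 7
    result = ((high & 0x0F) << 4) | (low & 0x0F)
    return result, half_carry, carry

print(add8_nibblewise(0x0F, 0x01))
print(add8_nibblewise(0xFF, 0x01))
```

The carry passed from the first pass to the second is exactly the information a half-carry flag records.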

