
All times are UTC




Posted: Sat Aug 13, 2016 11:09 am

Joined: Sat Jul 09, 2016 6:01 pm
Posts: 180
The 6502 has the fastest and most convenient way to handle BCD. It uses a decimal mode, which is about twice as fast as the BCD-correction instructions of the 8080, 6800, Z80, 6809, and 80x86. The 680x0 uses 5 special instructions for this type of data, which is such a waste of code space.
IMHO BCD support is redundant. The only advantage of BCD on the 6502 is fast conversion of decimal data from text on input and to text on output. However, division and multiplication are twice as slow for BCD as for binary data. This is because
Code:
ASL/ROL mem

must be replaced by the sequence
Code:
LDA mem
ADC mem
STA mem

BCD allows fast multiplication only by 10, and fast division only by 10 and 5. Division and multiplication are usually used much more often than input/output operations, which typically deal with slow peripheral devices. So it is generally much faster to use binary as the main data type and to convert to decimal only for input and output.
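To illustrate why the shift cannot simply be kept, here is a minimal Python sketch (a model, not 6502 code) that doubles a packed-BCD byte two ways: with a plain binary shift, as ASL would do, and by adding the value to itself with decimal correction, as ADC does after SED. The function names are made up for this illustration, and the model only handles valid BCD inputs and the carry flag.

```python
def asl(value):
    """Binary shift left of one byte, as ASL does; returns (result, carry)."""
    return (value << 1) & 0xFF, value >> 7

def adc_decimal(a, m, carry_in=0):
    """Decimal-mode ADC for valid packed-BCD bytes; returns (result, carry)."""
    low = (a & 0x0F) + (m & 0x0F) + carry_in
    high = (a >> 4) + (m >> 4)
    if low > 9:            # decimal-adjust the low digit
        low -= 10
        high += 1
    carry = 0
    if high > 9:           # decimal-adjust the high digit
        high -= 10
        carry = 1
    return (high << 4) | low, carry

bcd = 0x48                                  # the decimal number 48
shifted, _ = asl(bcd)
doubled, carry = adc_decimal(bcd, bcd)
print(hex(shifted))                         # 0x90, not a valid doubling of 48
print(hex(doubled), carry)                  # 0x96 0, i.e. decimal 96
```

For a multi-byte number the carry from one byte would be passed as carry_in to the next, which is what the LDA/ADC/STA sequence repeated per byte does.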

One can dream about what could have been built instead of these redundant operations... I'd like to see a second accumulator as on the 6800, and/or a block memory copy instruction as on the 65816. :)

It is curious that BCD is supported even on the 80x86 and 680x0, which have hardware multiplication and division. Modern CPU architectures like x86-64 and ARM have no hardware BCD support at all. It is worth noting that DEC CPU architectures also had no such support even in the '70s and '80s. However, DEC architectures had a lot of other redundancies.

The other redundant feature of the 6502, IMHO, is the overflow flag. The discussion at http://www.stardot.org.uk/forums/viewtopic.php?f=4&t=11543 shows that almost no programs use it. This flag is useful only for catching errors that modern programming languages do not even treat as errors by default. IMHO it would have been better to have an odd/even flag (bit 0 of the result) instead of the V flag.

The question: are there other 6502 redundancies?

_________________
my blog about processors


Posted: Sat Aug 13, 2016 8:12 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Note: 6502.org forum veterans won't find much new material here.

We've had quite a few discussions about desired extra instructions on the '02; but I think the designers did a very good job of putting the best things into the silicon real estate budget at the time, for a processor that was intended to be an embedded controller and not particularly the home-computer and desktop master it quickly became in the 1970's and 80's.  Adding more instructions would not only take more silicon real estate, but would likely reduce the maximum operating frequency because of the more-complex instruction decoding.  (RISC tried to take advantage of this phenomenon.  What I originally read about RISC in the 1980's was that with the simpler instruction set, although it would take more instructions to get the job done, the clock rate could be raised more than enough to make up.  I quickly lost interest though when it became clear that the "higher performance at lower price" goal was obviously getting replaced with "maximum performance at any cost."  Since then of course the lines have been quite blurred.  ARM is not a RISC.  Bill Mensch says the '02 is neither RISC nor CISC, and calls it "addressable-register architecture," or "ARA.")

I used the 6502's decimal mode a lot in a major work project in the late 1980's, and wrote my own floating-point decimal routines.  I was super green, but there was no one else in this tiny company to show me, as I later learned, that there's seldom a need to do anything in decimal mode, that it almost always works better to let the processor work in hex and only convert when it's time for human I/O, in this case for pilots who mentally work only in decimal.  I also learned that floating-point is seldom necessary even for the things that most people think require floating-point, and that scaled-integer arithmetic (notice I didn't say "fixed-point," which is a limited subset of scaled-integer) is much more efficient for machines that don't have a hardware floating-point unit.
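As a rough illustration of the scaled-integer idea (a toy example, not code from the project described above), the Python sketch below keeps values as integer hundredths, converts to and from decimal text only at the edges, and multiplies by pi using the rational approximation 355/113. The scale of 100, the function names, and the restriction to non-negative values are all assumptions made for the example.

```python
SCALE = 100  # values are stored as integer hundredths (an assumed scale)

def to_scaled(text):
    """Parse a non-negative decimal string such as '12.34' into hundredths."""
    whole, _, frac = text.partition(".")
    frac = (frac + "00")[:2]          # pad or truncate to two digits
    return int(whole) * SCALE + int(frac)

def to_text(scaled):
    """Format non-negative integer hundredths back into a decimal string."""
    return f"{scaled // SCALE}.{scaled % SCALE:02d}"

def times_pi(scaled):
    """Multiply by pi with the rational 355/113, using integers only."""
    return (scaled * 355 + 113 // 2) // 113   # adding 56 rounds to nearest

diameter = to_scaled("12.34")         # 1234
print(to_text(times_pi(diameter)))    # 38.77
```

All the arithmetic in the middle is plain integer multiply and divide; decimal only appears when talking to the human.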

Nevertheless, the 6502's decimal-mode operations don't consume any space on the op-code table except SED and CLD.  The space left over on the table was needed later for the 65816 which in some ways was a hotter performer than the 68000.  I got rid of all my 68000 info a few years ago, but I seem to remember a DIV instruction took over 170 cycles, and that interrupts had to wait.  Again, the '02 was intended to be an embedded controller.  I've brought many products to market using PIC16 microcontrollers and a 65c02 that had no use at all for multiply or divide.  (The last project I worked on needed a divide but no multiply.)

The '02 was a pretty hot performer anyway.  You mention the 6800's two accumulators.  The 6502's designers had been in on the 6800 design team, and decided that two index registers, even 8-bit, would be more valuable than two accumulators.  FigForth for the '02 ran 25% faster than the same Forth for the 6800 at a given clock speed, and later the '02 reached much, much higher clock speeds anyway.  Both the 6800 and the '02 at 1MHz routinely outperformed a 4MHz Z80 in benchmarks in BASIC.  My 65816 Forth is two to three times as fast as my '02 Forth again at a given clock speed.  The '816 does have two 8-bit accumulators, but I used that feature in only one primitive (out of hundreds), and otherwise kept them combined for a single 16-bit accumulator.  (BTW, I never use decimal mode on either of them, even for base conversions.)  Sophie Wilson, chief architect of the ARM processor, said, "an 8MHz 32016 was completely trounced in performance terms by a 4MHz 6502."  (The 32016 was National's 32-bit processor, having 15 registers, including 8 general-purpose 32-bit registers.)

I've never used the oVerflow flag as an overflow indication myself either.  It did get used in early products as a fast one-bit input; so it served another purpose.  I have often wished for the odd/even flag you mention--not that I care if something is odd or even, but it would be an easy way to test an input bit without ANDing it out.  The BIT instruction lets you test bit 6 directly with the V flag, and of course bit 7 directly with the N flag, but more would be nice.  The '02 had an unused slot in the status register that it could have gone into.  The '816 used it, and added more, requiring the XCE instruction to exchange the C and E bits to get into and out of 6502-emulation mode.
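For readers who have not used BIT this way, here is a small Python model of what the instruction does with a memory operand. It is a sketch of the flag behavior only; the port value is hypothetical.

```python
def bit(a, m):
    """Model of the 6502 BIT instruction on a memory operand.

    Returns the N, V and Z flags; the accumulator is not changed.
    """
    return {
        "N": (m >> 7) & 1,        # copy of bit 7 of the memory byte
        "V": (m >> 6) & 1,        # copy of bit 6 of the memory byte
        "Z": int((a & m) == 0),   # set when A AND M is zero
    }

port = 0b0100_0001                # hypothetical input port reading
print(bit(0x00, port))            # {'N': 0, 'V': 1, 'Z': 1}
print(bit(0x01, port)["Z"])       # 0: bit 0 is set, but testing it needs a mask in A
```

Bits 7 and 6 come for free in N and V whatever is in A, while any other bit needs a mask loaded into A first, which is the gap an odd/even flag would have narrowed.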

One instruction I would really like is STF, STore $FF, for setting flag variables.  An increment-by-two and decrement-by-two would be nice.  I can think of others, but they've mostly been taken care of by the 65816.  One exception is Forth's NEXT which is its inner "loop" (not really a loop); but the 65816's 16-bit efficiency allows making more of the words to be primitives anyway, dramatically reducing the number of times NEXT needs to run to get a job done.  Anyone who likes the '02 and wants more should just go to the '816.  For Jeff Laughton's ultra-fast, non-memory-mapped bit I/O on the '816, we were wishing interrupts hitting during WDM would not start the interrupt sequence until after the instruction that followed WDM, to keep those two instructions together.  It would sure make things easier.

If you want really hot math performance on the '02, or easier, on the '816, look into large look-up tables.  Memory is cheap enough to do it now.  The page linked explains how to use them and links to the pre-calculated tables that have every single answer pre-calculated, accurate to every last bit, so there's no interpolating to do.  Most tables have 65,536 16-bit cells.  In some cases, looking up an answer is hundreds, or even near a thousand times as fast as having to actually calculate it.  There's also info on how the tables were calculated, how the Intel Hex files work, commonly used rational numbers, interface circuits for the 6502 which does not otherwise have enough address space, etc.  Even for division, which you cannot look up directly, you can still look up a reciprocal (and I made the reciprocals 32-bit, so you can choose the field you need) and then multiply by the reciprocal.  It is faster than the 68000's DIV instruction.  But then there are also tables for trig, log, square root, and other things too.
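The reciprocal trick can be sketched in a few lines of Python. This is only an illustration of the idea under my own assumptions: the table here holds ceil(2^32 / n), the dividend is an unsigned 16-bit value, and the layout and rounding of the actual tables mentioned above may well differ. Note that the entry for n = 1 needs 33 bits in this scheme.

```python
def build_reciprocal_table(max_divisor):
    """table[n] = ceil(2**32 / n) for 1 <= n <= max_divisor (index 0 unused)."""
    return [0] + [-(-(1 << 32) // n) for n in range(1, max_divisor + 1)]

def divide_by_lookup(x, n, table):
    """Unsigned 16-bit x divided by n: one lookup, one multiply, one shift."""
    return (x * table[n]) >> 32

table = build_reciprocal_table(1000)
print(divide_by_lookup(50000, 7, table))   # 7142
```

With the rounded-up reciprocal, the quotient is exact for every 16-bit dividend and divisor, because the rounding error times the dividend always stays below 2^32.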

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Sat Aug 13, 2016 8:31 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Earlier related topics include:
Instructions that I missed
"Homebuilt" 6502 cpu's
Improving the 6502, some ideas
65Org16 - extending the instruction set

Posted: Mon Aug 15, 2016 11:10 am

Joined: Sat Jul 09, 2016 6:01 pm
Posts: 180
I have to correct my previous post. BCD multiplication and division are 3 or more times slower than binary because BCD data occupy about 50% more bytes. Addition and subtraction are about 50% slower for the same reason.

GARTHWILSON wrote:
Since then of course the lines have been quite blurred. ARM is not a RISC.


Thank you very much for valuable information and the links. :)

I agree that ARM and the 6502 are not RISC. IMHO the terms RISC, CISC, and ARA are rather meaningless slogans. The Wikipedia page about the ARM-based Acorn Archimedes (https://en.wikipedia.org/wiki/Acorn_Archimedes) has this "information": "since the 68000 is a CISC and ARM2 is a RISC architecture, the 68000 could execute more complex instructions in one step, while the ARM2 must do it in several steps. So, depending on the task and the code, the 68000 may outperform the ARM2 in several cases". IMHO the 68000 can outperform the ARM2 only in BCD operations. :) And I can add that many ARM instructions are so complex that the 68000 needs 2-5 of its own instructions to emulate them.

GARTHWILSON wrote:
the 6502's decimal-mode operations don't consume any space on the op-code table except SED and CLD.


The 6502's decimal mode consumes only two opcodes, but it also takes some silicon area that might have been used for ONE or TWO additional instructions...

GARTHWILSON wrote:
I seem to remember a DIV instruction took over 170 cycles, and that interrupts had to wait.


I've checked the tables: the 68000 DIVU instruction takes 144 cycles. The best known 6502 division routine (32-bit dividend, 16-bit divisor) takes about 700 cycles and occupies about 1 KB of memory. Its looped version is about 100 bytes in size but takes about 850 cycles. Is an interrupt delay of 144 cycles really a problem for an 8 MHz 68000?

GARTHWILSON wrote:
Both the 6800 and the '02 at 1MHz routinely outperformed a 4MHz Z80 in benchmarks in BASIC.


Any proof? I'm aware of http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/archv017.cgi?read=120687, but IMHO it gives very odd, almost absurd results, such as the Z80 being faster than the 80188. It uses different algorithms, multi-level emulation, and other things that make an objective comparison almost impossible.
I worked with the Amstrad CPC (Z80 at an effective 3.2 MHz) and PCW models. I have to say that Amstrad CPC ROM BASIC is 2-3 times faster than Commodore BASIC (on a 1.1 MHz 6502). I have testimony from commercial game programmers in the USA, who wrote code for both the 6502 and the Z80, that the 6502 is generally about 2.2-2.4 times faster. My project http://litwr2.atspace.eu/xlife/retro/xlife8.html shows about the same ratio.

GARTHWILSON wrote:
My 65816 Forth is two to three times as fast as my '02 Forth again at a given clock speed.


My measurements show that the 65816 in 16-bit mode is less than 50% faster at arithmetic, 100% faster at memory block transfers (due to MVN/MVP), and 0% faster at branches. The 65816 has powerful stack addressing modes, so for stack-based Forth it should give a significant speed boost, maybe even more than 100%. A smaller boost applies to programming languages that use subroutines with formal parameters and local variables. However, working with byte tables is faster in 8-bit mode, and the 65816 has no advantage over the 6502 in this area.

IMHO the 65816 was implemented as a somewhat bolted-on extension, a bit like the slightly ugly Z80. Instructions like CLC, TXA, ... could have been faster, as in the later 4510/65CE02. XBA, MVP, ... could also have been faster.

I want to find an opportunity to study the 32016, the first 32-bit CPU... B-em, the BBC Micro emulator, has some support for it. I doubt its accuracy, though.

GARTHWILSON wrote:
An increment-by-two and decrement-by-two would be nice.


It would be better to have ADDQ and SUBQ from the 68000. :)

GARTHWILSON wrote:
It is faster than the 68000's DIV instruction.


The 68000 can also use tables. :) I had a surprising result with the R800 (the fast version of the Z80 for the MSX turboR; it is about 3-4 times faster than the original Z80). The R800 has fast hardware multiplication (36 cycles for 16-bit operands), but with a 768-byte table I could get a 35-cycle multiplication by a constant factor.
IMHO the fastest division is achievable via logarithmic tables. I suspect that the 80186 uses them.

GARTHWILSON wrote:
Earlier related topics include:


I have just browsed all these topics. Thanks again. :) IMHO an LSR4 instruction would be worth mentioning too. The 6502 requires 4 LSRs to get the high nibble. The low nibble can be obtained with AND #15. So what's the purpose of SWN (swap nibbles)? It would be better to have the Z80's RLD, which allows a fast 4-bit shift of a sequence of bytes.

I have another, rather ignorant question. Why is JMP ($xxFF) often considered a 6502 bug? IMHO it works very well with tables. It allows a jump table to occupy exactly one page. For example, the following code provides jumps for 128 odd values:

Code:
        ldx value
        stx mjmp+1
mjmp    jmp (divjmp)


The table divjmp occupies one page on the 6502 but requires an ugly misaligned page on the 65C02/65816...

Last edited by litwr on Tue Aug 16, 2016 6:30 pm, edited 4 times in total.

Posted: Mon Aug 15, 2016 11:19 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
(Thanks Garth - you picked up a thread not previously present in my index.)


Posted: Mon Aug 15, 2016 12:13 pm

Joined: Sun Apr 10, 2011 8:29 am
Posts: 597
Location: Norway/Japan
litwr wrote:
I've checked the tables: the 68000 DIVU instruction takes 144 cycles. The best known 6502 division routine (32-bit dividend, 16-bit divisor) takes about 700 cycles and occupies about 1 KB of memory. Its looped version is about 100 bytes in size but takes about 850 cycles. Is an interrupt delay of 144 cycles really a problem for an 8 MHz 68000?

For realtime it would be. And there was one particular device that was very realtime back then, and that was the floppy drive. Realtime in this case means that it has to be fast enough, but if it is, then it doesn't matter if it is even faster. So, the question is really: How long does it take to process 144 cycles? Acorn dismissed both the National Semiconductor and the Motorola offerings for that reason. Steve Furber explicitly mentions the NS one in this interview: http://cacm.acm.org/magazines/2011/5/10 ... r/fulltext
Quote:
"We looked at the 16-bit processors that were around and we didn't like them. We built test bench prototype machines with these. We didn't like them for two reasons. Firstly, they were all going to 16 bits by adopting minicomputer-style instruction sets, which meant they had some very complex instructions. The anecdote I always remember is that the National Semiconductor 32016 had a memory-to-memory divide instruction that took 360 clock cycles to complete; it was running at 6 megahertz, so 360 clock cycles was 60 microseconds; it was not interruptible while the instruction was executing. Single density floppies, if you handled them chiefly with interrupts, give an interrupt every 64 microseconds, double density every 32. Hence you couldn't handle double density floppy disks.[..]"

Sophie Wilson has elsewhere mentioned that the 68k had slower interrupt handling than the 6502. Back in the eighties I wrote software for a very realtime bird-monitoring system, lots of interrupts from many sources. The Apple II, or rather, the 6502, had no problems with that.
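The numbers in this exchange are easy to check. Here is a small Python sketch using only figures quoted in this thread: 360 cycles at 6 MHz for the 32016 divide, 144 cycles at 8 MHz for DIVU, and floppy interrupts every 64 or 32 microseconds. The function name is just for illustration, and interrupt entry and handler time are deliberately ignored, so this is a best case.

```python
def instruction_time_us(cycles, clock_mhz):
    """Microseconds during which an uninterruptible instruction holds off interrupts."""
    return cycles / clock_mhz

cases = {
    "32016 memory-to-memory divide at 6 MHz": instruction_time_us(360, 6),
    "68000 DIVU at 8 MHz": instruction_time_us(144, 8),
}
for name, blocked_us in cases.items():
    for density, period_us in (("single", 64), ("double", 32)):
        verdict = "fits" if blocked_us < period_us else "too slow"
        print(f"{name}: {blocked_us:.0f} us vs {density} density "
              f"{period_us} us -> {verdict}")
```

This gives 60 microseconds for the 32016 case, which misses the double-density deadline on its own, and 18 microseconds for DIVU, which leaves 14 microseconds of a 32 microsecond period for everything else.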


Posted: Mon Aug 15, 2016 12:28 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Garth in the past has, I think, mentioned audio as being very sensitive to jitter - regular interrupts need to be regular. If not audio, then other data acquisition. There will be other applications where it doesn't matter, of course.


Posted: Mon Aug 15, 2016 4:34 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
litwr wrote:
I have another, rather ignorant question. Why is JMP ($xxFF) often considered a 6502 bug?

If the NMOS 6502 (and others, such as the 6510) encounters JMP ($xxFF), it will incorrectly load PC with the contents of $xxFF and $xx00, failing to increment the most significant byte (MSB) of the operand to get the MSB of the target address. This error was corrected in the 65C02 and 65C816.
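The difference is easy to see in a simplified Python model of how the indirect pointer is read. This is a sketch of the addressing behavior only, with made-up memory contents; it does not model cycle counts.

```python
def jmp_indirect_target(memory, pointer, nmos=True):
    """Return the address that JMP (pointer) jumps to, in a simplified model."""
    low = memory[pointer]
    if nmos:
        # NMOS 6502: only the low byte of the pointer is incremented,
        # so a pointer at $xxFF reads its high byte from $xx00.
        high_address = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    else:
        # 65C02 / 65C816: the full 16-bit pointer is incremented.
        high_address = (pointer + 1) & 0xFFFF
    return low | (memory[high_address] << 8)

memory = bytearray(65536)
memory[0x30FF] = 0x34      # low byte of the intended target
memory[0x3100] = 0x56      # high byte where the programmer expects it
memory[0x3000] = 0x12      # what the NMOS part actually reads

print(hex(jmp_indirect_target(memory, 0x30FF, nmos=True)))    # 0x1234
print(hex(jmp_indirect_target(memory, 0x30FF, nmos=False)))   # 0x5634
```

The same wrap-around is what lets the one-page jump table trick from the earlier post work on NMOS parts, and what breaks it on the CMOS ones.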

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Mon Aug 15, 2016 9:05 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
litwr wrote:
Is an interrupt delay of 144 cycles really a problem for an 8 MHz 68000?

As Tor said, yes, for realtime applications it's critical.  I've run over 100,000 interrupts per second on a 5MHz 65c02, up to 140,000, IIRC, although it's seldom more than 24,000.  One thing I've done many times is sampling audio based on interrupts from a 65c22 timer.  It's not just a matter of being able to do it quickly enough.  The accuracy of the timing affects the noise and distortion, as Ed mentioned.  The instruction that's executing when the interrupt hits is allowed to finish, and there's a couple of cycles' time of RMS jitter.  I have a discussion of it at http://wilsonminesco.com/6502primer/potpourri.html#JIT .

Quote:
GARTHWILSON wrote:
Both the 6800 and the '02 at 1MHz routinely outperformed a 4MHz Z80 in benchmarks in BASIC.

Any proof?

That comes from Jack Crenshaw, an embedded-systems engineer who wrote regularly in Embedded Systems Programming magazine, who said in the 9/98 issue that he still couldn't figure out why, in BASIC benchmark after benchmark, the 6502 could outperform the Z80 which had more and bigger registers, a seemingly more powerful instruction set, and ran at higher clock rates.

Quote:
GARTHWILSON wrote:
My 65816 Forth is two to three times as fast as my '02 Forth again at a given clock speed.

My measurements show that the 65816 in 16-bit mode is less than 50% faster at arithmetic, 100% faster at memory block transfers (due to MVN/MVP), and 0% faster at branches. The 65816 has powerful stack addressing modes, so for stack-based Forth it should give a significant speed boost, maybe even more than 100%. A smaller boost applies to programming languages that use subroutines with formal parameters and local variables. However, working with byte tables is faster in 8-bit mode, and the 65816 has no advantage over the 6502 in this area.

Forth always handles cells of 16 bits.  Data are passed on a data stack which is separate from the return (hardware) stack.  This solves various problems, and has several advantages over trying to pass data on the return stack.  The data stack typically goes in ZP (or DP on the '816).  Yes, you could have 8-bit cells when you don't need 16, but it would be a nightmare to try to navigate a mix of cell sizes on the data stack.  So it's basically all 16-bit gulps, which the '816 handles far more efficiently.  From an earlier post:

      Quote:
      Could you say which aspects of the 65816 give you that advantage?

      It's primarily that the '816 is so much more efficient at handling the 16-bit cells than the '02 which has to take 8 bits at a time and increment addresses or indexes in between and such.  Here's the simple example of @ (pronounced "fetch"), which takes a 16-bit address placed on the top of the data stack and replaces it with the 16-bit contents of that address.

      First for 6502:
      Code:
             LDA  (0,X)
             PHA
             INC  0,X
             BNE  fet1
             INC  1,X
      fet1:  LDA  (0,X)
             JMP  PUT
      ; and elsewhere, PUT which is used in so many places is:
      PUT:   STA  1,X
             PLA
             STA  0,X


      For the '816, the whole thing is only:
      Code:
             LDA  (0,X)
             STA  0,X         ; For the '816, PUT is only one 2-byte instruction anyway, so there's no sense in jumping to it.


      @ was given such a short name because it's one of the things used most.  You can see the difference in the code length, 2 instructions versus 10.
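For anyone who does not read 6502 assembly, the following Python sketch models what @ does in both listings. It is only a model under simple assumptions: memory is a flat 64K byte array, X is the zero-page offset of the top stack cell, cells are little-endian, and the fetched address does not overlap the stack cell itself. The function names are mine.

```python
def cell(mem, x):
    """Read the 16-bit little-endian cell at zero-page offset x."""
    return mem[x] | (mem[x + 1] << 8)

def fetch_bytewise(mem, x):
    """@ done a byte at a time, in the order of the 6502 listing."""
    low = mem[cell(mem, x)]                    # LDA (0,X) / PHA
    mem[x] = (mem[x] + 1) & 0xFF               # INC 0,X
    if mem[x] == 0:                            # BNE fet1 skips the carry step
        mem[x + 1] = (mem[x + 1] + 1) & 0xFF   # INC 1,X
    high = mem[cell(mem, x)]                   # fet1: LDA (0,X)
    mem[x + 1] = high                          # PUT: STA 1,X
    mem[x] = low                               #      PLA / STA 0,X

def fetch_wordwise(mem, x):
    """@ done as one 16-bit load and one 16-bit store, as on the '816."""
    addr = cell(mem, x)
    value = mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)   # LDA (0,X)
    mem[x] = value & 0xFF                                  # STA 0,X
    mem[x + 1] = value >> 8

mem = bytearray(65536)
x = 0x80
mem[x], mem[x + 1] = 0xFF, 0x20            # address $20FF on top of the stack
mem[0x20FF], mem[0x2100] = 0xCD, 0xAB      # the 16-bit value stored there
fetch_bytewise(mem, x)
print(hex(cell(mem, x)))                   # 0xabcd
```

The address $20FF was picked so that the page-crossing INC 1,X path in the 8-bit version is exercised.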

So although the '816 may only give a 50% improvement in performance for certain general-purpose assembly-language applications, the difference is much larger in Forth, and I suspect in a lot of other HLLs.

Quote:
GARTHWILSON wrote:
It is faster than the 68000's DIV instruction.

68000 can also use tables. :)

True, but then you won't need the hardware instructions.  The tables additionally cover several things that would take a lot of MUL's and DIV's to calculate though.

Quote:
IMHO the fastest division is achievable via logarithmic tables.

My tables (linked earlier) include logs too, in various forms, to get the desired accuracy over a wider range.

Quote:
An LSR4 instruction would be worth mentioning too. The 6502 requires 4 LSRs to get the high nibble. The low nibble can be obtained with AND #15. So what's the purpose of SWN (swap nibbles)? It would be better to have the Z80's RLD, which allows a fast 4-bit shift of a sequence of bytes.

Perhaps SWN was one of the things Chuck and Bill considered when designing the '02 and decided it wouldn't get used enough to be worth the silicon real estate.  Although I have wished for it, or for an LSR4, it's clear that they did a good job in deciding what to put there, and Bill in the later 65c02 and '816.  [Edit, later:  I posted a SWN routine that takes only 8 bytes and 12 clocks, at http://6502.org/source/general/SWN.html .]
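The linked routine is not reproduced in this thread. The Python sketch below models one well-known six-instruction sequence (ASL A, ADC #$80, ROL A, done twice) that matches the stated 8 bytes and 12 clocks, on the assumption that it is the routine being referred to. The helper functions model binary-mode behavior only (D flag clear).

```python
def asl(a):
    """ASL A: returns (result, carry out)."""
    return (a << 1) & 0xFF, a >> 7

def adc(a, operand, carry):
    """ADC in binary mode: returns (result, carry out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8

def rol(a, carry):
    """ROL A: returns (result, carry out)."""
    return ((a << 1) | carry) & 0xFF, a >> 7

def swap_nibbles(a):
    carry = 0                            # incoming carry does not matter: ASL ignores it
    for _ in range(2):                   # each pass rotates A left by two bits
        a, carry = asl(a)
        a, carry = adc(a, 0x80, carry)
        a, carry = rol(a, carry)
    return a

print(hex(swap_nibbles(0x3C)))           # 0xc3
```

Each ASL/ADC/ROL group amounts to an 8-bit rotate left by two, so two groups rotate by four, which is the nibble swap.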

Quote:
I have another, rather ignorant question. Why is JMP ($xxFF) often considered a 6502 bug? IMHO it works very well with tables. It allows a jump table to occupy exactly one page.

The addressing mode is a good one.  The problem is that in the NMOS 6502, it doesn't work right if the low byte of the operand is FF.  As BDD said, that, and all NMOS bugs, were fixed in the CMOS version.

Posted: Mon Aug 15, 2016 9:18 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Bear in mind that different Basics can have quite different performance. Mallard Basic on the Z80 seems especially good.
http://www.retroprogramming.com/2010/01 ... marks.html


Posted: Tue Aug 16, 2016 5:37 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
GARTHWILSON wrote:
So it's basically all 16-bit gulps, which the '816 handles far more efficiently.

Again nothing new here for the forum veterans, but I'll pound this out anyway.  Hopefully I can make it somewhat concise.

What I should have added above is that one of the reasons the '816 performs so much better than the '02 in Forth is that it makes it practical to write more of the system "words" in assembly, rather than defining so many of them in terms of other words.  (Everything in Forth is a "word," whether it's a subroutine, a variable (because there's code to put its address on the data stack), a constant, array, your top-level main program, whatever.)  A word written in assembly is called a "primitive," or "code definition."  One written in terms of other Forth words is called a "secondary," or "colon definition" (because the colon starts a word definition and turns on the compiler).

On the '02, it simply takes too much program memory to write hundreds of words as primitives.  It cannot handle 16 bits at once, so things need to be pieced together using more assembly-language instructions, making the primitive take more program memory, often more than you can justify, so you limit the primitives to the lowest-level stuff.

Simplifying it almost to the point of lying, compiled code (for secondaries) is usually a list of addresses of other routines.  For the 65 family, these come typically in byte pairs, ie, 16-bit addresses.  There's an inner "loop" (not really a "loop") that handles these addresses and sends the program pointer to the right code for each routine.  It's called NEXT.  NEXT has several instructions in it, so yes, it takes time.  (Nested definitions, using the words nest and unnest, add further overhead.)  If you can write a word as a primitive, it will avoid running NEXT lots of times in the process of executing that word (and maybe nest and unnest too).
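A toy Python model may make the cost of NEXT more concrete. It is a loose sketch, not a real Forth: secondaries are Python lists instead of lists of 16-bit addresses, primitives are Python functions, and the word names and the dispatch counter are assumptions added for the illustration.

```python
def run(word, stack):
    """Execute `word` and return how many times the NEXT step ran."""
    return_stack = []
    thread, ip = [word], 0
    next_count = 0
    while True:
        if ip == len(thread):                   # end of a definition: unnest
            if not return_stack:
                return next_count
            thread, ip = return_stack.pop()
            continue
        target = thread[ip]                     # NEXT: fetch the next entry
        ip += 1
        next_count += 1
        if callable(target):
            target(stack)                       # primitive: just run it
        else:
            return_stack.append((thread, ip))   # secondary: nest into its list
            thread, ip = target, 0

def dup(stack):
    stack.append(stack[-1])

def mul(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)

def square_primitive(stack):
    stack[-1] = stack[-1] * stack[-1]

SQUARE = [dup, mul]                             # secondary (colon definition)
FOURTH_POWER = [SQUARE, SQUARE]                 # built from secondaries
FOURTH_POWER_FAST = [square_primitive, square_primitive]

stack = [3]
print(run(FOURTH_POWER, stack), stack)          # 7 [81]
stack = [3]
print(run(FOURTH_POWER_FAST, stack), stack)     # 3 [81]
```

Both versions leave 81 on the stack, but the one built from primitives goes through the dispatch step 3 times instead of 7 and never nests below the top-level word.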

My experience on the '816 was that when I rewrote hundreds of words as primitives, they not only executed a lot faster (of course), but in many cases even took less memory than the normal secondary version, meaning that the '816 wiped out the 6502's prohibitive program memory penalty, now justifying using the faster method.  In many cases they were even easier to write.

Another factor:  When you take a word that is normally written as a secondary, and re-write it as a primitive, there are often some stack gymnastics that can be eliminated, further speeding things up.

Posted: Tue Aug 16, 2016 11:06 am

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
I just thought I would mention that the Z80(8088) and 68000 aren't as slow as they might seem if one looks only at the clock frequency. Both of these processors use multiple clock cycles to interface to the bus. For instance a Z80 uses four clock cycles per machine cycle when reading instructions, so it's really only running at a 1MHz rate, the same as the 6502. The 68000 uses four clock cycles minimum per bus cycle (reading instructions) while running at 8MHz so it's really only running at a 2MHz instruction fetch rate. It's not really the instruction set that slows things down, but the implementation (bus interface). There are a couple of implementations of the 68000 done in an FPGA for instance that gain back performance by using a simpler single clock bus cycle. These cores get a 4x speedup effectively. Later versions of the 68x series shortened the bus interface to three cycles and gained 25% in performance.
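The arithmetic behind these figures can be written out in a few lines of Python. It assumes performance is limited purely by the bus cycle rate, which is a simplification, and the function name is only illustrative.

```python
def bus_cycles_per_second(clock_hz, clocks_per_bus_cycle):
    """Memory accesses per second if every access takes the same number of clocks."""
    return clock_hz / clocks_per_bus_cycle

print(bus_cycles_per_second(1_000_000, 1))   # 1000000.0  6502 at 1 MHz
print(bus_cycles_per_second(4_000_000, 4))   # 1000000.0  Z80 at 4 MHz, opcode fetch
print(bus_cycles_per_second(8_000_000, 4))   # 2000000.0  68000 at 8 MHz

# Going from 4 clocks to 3 clocks per bus cycle at the same clock rate:
print(1 - 3 / 4)                             # 0.25, the fraction of time saved per access
```

Under this simple model the 25% figure is the time saved per access; expressed as accesses per second, the same change is a gain of about one third.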

_________________
http://www.finitron.ca


Posted: Tue Aug 16, 2016 11:18 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Same thing is true of Z80 on FPGA. Actually, I think it's also true of commodity Z80 these days. For old-school comparisons, I can think of two fair ways to compare
- the fastest part you can reasonably buy
- a part running with memory of a reasonable speed

For newer comparisons, probably a benchmark is the only way, as the CPI (clocks per instruction) will vary between implementations.

It might be worth noting that the original design goals of the ARM led to a 32-bit wide memory and maximum use of paged mode accesses, to get best performance without the benefit of cache (which would have been too expensive).


Posted: Tue Aug 16, 2016 11:26 am

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
Rob Finch wrote:
For instance a Z80 uses four clock cycles per machine cycle when reading instructions, [...]. The 68000 uses four clock cycles minimum per bus cycle (reading instructions)[...]

I wonder why it used 4 cycles per memory access. 2 cycles is common because of address and data multiplexing on the chip, but 4?


Posted: Tue Aug 16, 2016 11:32 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I think in fact it's sometimes 3 and sometimes 4. It allows for refresh in the 4-cycle case but it turns out the timing is tighter in the 3-cycle case. See
http://www.piclist.com/techref/mem/dram/slide4.html

A possible reason for the design choice is that the designers could see ways to make the chip clock faster than currently available memory. Famously, the Z80 has only a 4-bit ALU which therefore needs two ticks to perform a byte wide operation. So that will save some transistors - it's already a big chip so that helps. (The designers also wanted to avoid patent trouble with Intel, so being different was a virtue.)
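As a sketch of what "two ticks for a byte-wide operation" means, here is a Python model of an 8-bit add done as two passes through a 4-bit adder, with the carry from the low nibble feeding the high one. It illustrates the idea only and makes no claim about the actual Z80 circuit or its timing.

```python
def add_nibble(a, b, carry_in):
    """One pass through a 4-bit adder; returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4

def add_byte_in_two_passes(a, b, carry_in=0):
    """Add two bytes nibble by nibble; returns (sum, half carry, carry)."""
    low, half_carry = add_nibble(a, b, carry_in)           # first pass
    high, carry = add_nibble(a >> 4, b >> 4, half_carry)   # second pass
    return (high << 4) | low, half_carry, carry

print(add_byte_in_two_passes(0x3A, 0x59))   # (147, 1, 0), i.e. 0x93 with a half carry
```

The intermediate carry between the two passes is the same information a half-carry flag records.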

