PostPosted: Sun Oct 17, 2010 6:06 am 
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
In another article, Garth wrote:

Quote:
I was disappointed that the only 6502-related part seemed to be that it had a low-powered 6502 mode similar to the 65816's emulation mode, apparently so they could start it running in a Commodore 64 on the 64's ROM to set things up, and then it would go into its own mode which didn't look anything like a 6502 and they lost my interest.


I wanted to respond to this, but it was so far off-topic that I decided to start this one.

The reason why most 32-bit "6502s" look and feel different is because they emphasize increased compatibility with higher-level languages. You can only get so much oomph out of an accumulator/memory[1] architecture with a rather baroque set of addressing modes and under-powered PC-relative or indirect modes. High-level languages like C or Pascal (or, for that matter, compiled BASIC) really want orthogonal access to CPU registers, plus a clean and orthogonal set of addressing modes. So, for example, X or Y could be used as an accumulator here, and as an index there, etc.

If all you want is a 32-bit extension to the 6502, you have two ways to achieve it: one is to redefine your byte to be 32 bits wide and just go that route (this is Garth's preferred embodiment); the other is to retain the 8-bit byte and extend the CPU's instruction set with additional modes and/or prefix bytes to adjust operand size while retaining the existing addressing modes. The latter is the way WDC would choose if it cared at all about backward compatibility.

The former approach requires less hardware to implement, but opcodes (and almost certainly operands) waste 50% to 75% of the memory space they consume, since they'll still be 8 bits (or 16 bits if you merge direct page references directly into the instruction word). The latter approach yields smaller binaries, but requires more logic to support, e.g., 16-bit or 32-bit wide buses with two or four 8-bit lane selects, instruction prefetch queues, etc.

In either case, you're still limited by the memory bus bottleneck. I proposed using a specially-ported, on-chip data cache to relieve the bottleneck and to treat the direct page exactly like your typical internal register set, but it seems this approach isn't favored, and many think it's just easier to add more CPU registers. A RISC processor with 32 32-bit registers, for example, provides a working set equivalent to 128 bytes of direct page.

I seriously doubt we'll ever see a real 32-bit 6502 that looks and feels the same as the 6502 or 65816.

_______________
1. An "accumulator/memory" architecture (aka simply "accumulator" architecture) is a CPU design which uses a single accumulator as one ALU input, and external memory fetches for the second ALU input. Note that stack architectures are a proper subset of accumulator architectures. Contrast this with Intel's 80x86 line, which is a "register/memory" architecture, so named because it allows most registers (not just a single accumulator) to be used as ALU inputs. In addition, there exists register/register architectures, where only registers can be used as ALU inputs -- typically, these are RISC-type machines. Memory/memory architectures exist too, such as the TMS9900 processor, and arguably, the 680x0 series too.


PostPosted: Sun Oct 17, 2010 7:32 am 
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8520
Location: Southern California
Quote:
The reason why most 32-bit "6502s" look and feel different is because they emphasize increased compatibility with higher-level languages. You can only get so much oomph out of an accumulator/memory[1] architecture with a rather baroque set of addressing modes and under-powered PC-relative or indirect modes. High-level languages like C or Pascal (or, for that matter, compiled BASIC) really want orthogonal access to CPU registers, plus a clean and orthogonal set of addressing modes. So, for example, X or Y could be used as an accumulator here, and as an index there, etc.


Would you say that's because of:
  • the languages themselves?
  • the types of applications done most with those languages, and that the computers they are run on normally have more hardware like video and sound cards (or chip sets) to support the processor?
  • the size of the job, or maybe that the industry is stuck on preëmptive multitasking?
  • or maybe that portability has been more sacred than the ability to get close to the heart of the machine?
I'm trying to understand if any of it is truly relevant to us who may just want to handle bigger numbers in one gulp. For my own applications, I can definitely imagine the details of how a lot more power could come from something that still is a 65-family processor.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Sun Oct 17, 2010 10:24 am 
Joined: Mon Oct 16, 2006 8:28 am
Posts: 106
@Garth

It's actually got nothing to do with multitasking or specific types of application. An orthogonal instruction set simply reduces the number of hoops one needs to jump through in order to express an idea in assembly language for that particular CPU. It makes writing a compiler back-end (for any language) a lot easier, and code for that platform ends up being faster as well, because one doesn't need to juggle data to make sure it's in the right place to be used by a given opcode. It's not limited to compilers, either; the M68000 had a very orthogonal instruction set, and it made programming for it in assembler a pleasure.

OK, so that explanation was probably not very good for someone who's not a programmer... let me give you an example that's closer to the 6502. The reason we say the 6502 does not have an orthogonal instruction set is that while instructions like ADC let you use all the addressing modes that make sense, instructions like JSR are artificially limited - e.g. JSR abs,x makes sense as an instruction, but it just doesn't exist. If you want to do a JSR via a table of pointers, you can't just use a simple instruction; you need to either write it yourself or adjust your algorithm so it doesn't require a jump table. In the first case you end up with code that's bigger and slower; in the second you end up with code that isn't as good or as elegant as it could be.
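
To make that concrete, here is one common way people end up writing it themselves on a stock 6502 - a small dispatcher built on the RTS trick. This is just a sketch with made-up labels; the table must hold each target address minus one, because RTS adds one to the address it pulls, and the .byte directive syntax varies by assembler.

Code:
; call with "jsr dispatch" and the routine number in X; when the
; selected routine executes its own rts, control returns to the
; original caller, just as if "jsr table,x" had existed.
dispatch:
        lda table_hi,x   ; push the high byte of (target - 1) first...
        pha
        lda table_lo,x   ; ...then the low byte
        pha
        rts              ; rts pulls the address, adds one, and jumps

table_lo: .byte <(sub0-1), <(sub1-1), <(sub2-1)
table_hi: .byte >(sub0-1), >(sub1-1), >(sub2-1)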

An 8-bit CPU that's a lot like a 6502 and has a reasonably orthogonal instruction set is the M6809. Take a look at its instruction set and you'll see how some things are just easier to express with it (for example, addressing modes that use the X or Y registers work exactly the same way as they do with the stack pointer).


Last edited by faybs on Sun Oct 17, 2010 10:41 am, edited 1 time in total.

PostPosted: Sun Oct 17, 2010 10:36 am 
Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1041
Location: near Heidelberg, Germany
GARTHWILSON wrote:
Quote:
The reason why most 32-bit "6502s" look and feel different is because they emphasize increased compatibility with higher-level languages. You can only get so much oomph out of an accumulator/memory[1] architecture with a rather baroque set of addressing modes and under-powered PC-relative or indirect modes.


Would you say that's because of:
  • the languages themselves?
  • the types of applications done most with those languages, and that the computers they are run on normally have more hardware like video and sound cards (or chip sets) to support the processor?
  • the size of the job, or maybe that the industry is stuck on preëmptive multitasking?
  • or maybe that portability has been more sacred than the ability to get close to the heart of the machine?
I'm trying to understand if any of it is truly relevant to us who may just want to handle bigger numbers in one gulp. For my own applications, I can definitely imagine the details of how a lot more power could come from something that still is a 65-family processor.


Good questions. I'd like to have answers to them myself. In my opinion the 6502 has only a limited set of registers - but it has the great advantage of zeropage addressing, which can be leveraged even with larger processors.

(spoiler ahead:)
My 65k will have a prefix opcode modifier that adds either the PC or a new base offset register to the address - which allows flexible relocation of addressing modes, including the zeropage addressing mode. You could set the base offset register to the address of your data structure and then address all of its fields with zeropage addressing.
(spoiler end)
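
Purely to illustrate the base-offset idea (the mnemonics and prefix notation below are invented for this sketch and are not from the actual 65k spec, which isn't published yet), accessing the fields of a structure might read something like:

Code:
ldb #record     ; hypothetical: point the base offset register at the structure
b: lda 4        ; hypothetical prefix: A <- byte at record+4, zeropage-style
b: sta 7        ;                      record+7 <- A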

kc5tja wrote:
I seriously doubt we'll ever see a real 32-bit 6502 that looks and feels the same as the 6502 or 65816.

I hope my 65k will give you that feeling. At least it gives it to me...

I am almost done with the specs (no code yet!). I know, release early and often, but there are some things I still want to finish before I publish it. I hope to get it out by next weekend at the latest.

André


PostPosted: Sun Oct 17, 2010 8:19 pm 
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
GARTHWILSON wrote:
* the languages themselves?


No. This applies just as well to functional, to object-oriented, and to procedural language categories. Because stack architectures are a subset of accumulator architectures, only stack-based languages like Forth are the categorical exception.

Quote:
* the types of applications done most with those languages


No. C, Java, and numerous other languages of the C family have been applied to problems as diverse as arcade games, finance, medical imaging, CAD, and more.

Quote:
that the computers they are run on normally have more hardware like video and sound cards (or chip sets) to support the processor?


No. Hand-crafted 2.8MHz 65816 assembly demonstrates a level of graphical performance for the Apple IIgs that competes surprisingly favorably with the Amiga's dedicated blitter hardware running at 3.57MHz[1]. Anyone who has used Deluxe Paint II on the Apple IIgs will prove this point with great facility by grabbing an eighth of a 320x200x16-color screen as a brush, then selecting the spirograph tool, and drawing with the brush. Granted, the Amiga is still faster in practice, but when push comes to shove, the 65816 has sufficient horsepower to alleviate the need for dedicated graphics hardware compared to comparable 68K-based systems.

That being said, it's clear from published benchmarks that compilers for the 68K architecture produce 2x to 3x faster code. How does one explain this performance discrepancy? I suspect compilers for the 68K family do not have to juggle memory to and from a single accumulator, tweak word-width flags, or constantly reload the Y register for random access to data structures. Each of these offers a small speed-up, but since they all occur in combination, their effects multiply.
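
As a small sketch of that juggling (16-bit accumulator assumed; the labels and field offset are hypothetical), adding one field of a record, reached through a direct-page pointer, into a running total looks like this on the '816, where a 68K compiler would typically keep both values in registers and emit a single ADD:

Code:
ldy #COUNT_OFF     ; reload Y with the field's offset within the record
lda (item_ptr),y   ; fetch the field through a direct-page pointer
clc
adc total          ; everything funnels through the one accumulator
sta total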

Quote:
* the size of the job, or maybe that the industry is stuck on preëmptive multitasking?


Preemptive multitasking is a huge, huge, huge performance booster in practice. Cooperative multitasking is more time-efficient only in closed environments such as you'd find in deeply embedded applications. For everything else, cooperative multitasking has been shown to be fatally susceptible to poor programming practices (even accidental ones), to the point of rendering a computer so unresponsive as to require a reboot. Cf. Windows 3.0, OS/2 1.3 and earlier, the Contiki event-driven OS, etc.

Additionally, if your kernel has a well-chosen set of task primitives, you'll find coding for a preemptive multitasking environment quite easy. The Amiga operating system, for example, has one, and only one, system call that puts a task to sleep -- Wait(). Now, there exist other blocking system calls, but all of them ultimately have to call Wait() in order to put the task to sleep. Conversely, there exists one, and only one, mechanism for waking a task -- Signal().

On top of these basic primitives, AmigaOS provides message queues (called "message ports" in AmigaOS lingo), semaphores, and I think a few other basic primitives. But of all the primitives supplied, take a guess as to which one (and I do emphasize the singular here) is preferred for very nearly everything in the OS?

Whether you're implementing a GUI application, a device driver, or a filesystem, you're going to be working with message queues. It's far simpler than the bullshido you get in PThreads or Win32 Threads, and it essentially mimics how you'd build a real-world, multi-processor embedded application anyway. Oh, and it's also the programming model used for Erlang -- if you've coded an Intuition application for AmigaOS, you already know how to code parallel applications in Erlang even if you don't know Erlang yet.

The numbers speak for themselves. In 256K of space (mostly generated from a C compiler, at that), Kickstart provides a multitude of device drivers, libraries, and supporting background tasks, all communicating with message queues. Add in the disk-resident software, and that figure goes up to about 1MB or so (Workbench shipped on an 880K floppy, uncompressed binary images). Binaries were kept small and the OS is still considered darn tiny by today's standards, and yet it relies heavily on preemptive multitasking.

No MMU required -- just brains.

Quote:
* or maybe that portability has been more sacred than the ability to get close to the heart of the machine?


I suspect that without the business imperative (deliver a product on-time and within budget), the need for high-level programming languages would not be as strong. As it is, companies use HLLs with the full knowledge and acceptance that they're trading some fraction of runtime performance for improved programmer productivity. Coding a program faster than your competitor means, automatically, that you can respond to customer desires faster, which earns you a greater market share. You don't use Ruby to write a real-time engine control package. But it works great for delivering a production-ready web application to millions of users in only two months.

Alternatively, now that the market is saturated with HLL coders, it also means you need not spend so much time training new hires, which reduces overhead for the company.

Quote:
I'm trying to understand if any of it is truly relevant to us who may just want to handle bigger numbers in one gulp.


Like I said in my original post, if all you want is a bigger gulp size, then you can get by with a wider byte. But even this fundamentally alters the "look and feel" of the CPU.

Code:
lda aBigNumber
clc
adc anotherBigNumber


simply feels a lot different when coding than:

Code:
lda aBigNumber
clc
adc anotherBigNumber
sta aBigNumber
lda aBigNumber+2
adc anotherBigNumber+2
sta aBigNumber+2
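
(And on a plain 8-bit 6502, the same 32-bit add stretches out to four byte-wide rounds of that pattern:)

Code:
lda aBigNumber
clc
adc anotherBigNumber
sta aBigNumber
lda aBigNumber+1
adc anotherBigNumber+1
sta aBigNumber+1
lda aBigNumber+2
adc anotherBigNumber+2
sta aBigNumber+2
lda aBigNumber+3
adc anotherBigNumber+3
sta aBigNumber+3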


___________________
1. Before people chop my head off for making this statement, if you pick up the Amiga ROM Hardware Reference Manual, you will observe that the blitter can write to RAM no faster than one word every two cycles at 7.15909MHz, assuming all you're doing is filling memory with a fixed value. It runs slower still if you're using it to process one or more source channels.


PostPosted: Sun Oct 17, 2010 8:23 pm 
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8520
Location: Southern California
@faybs

Not knowing C or Pascal, I can draw only a vague understanding of why the extra registers and orthogonal instruction set may make it easier to write compilers for them. But I don't think the extra registers necessarily make for any more power, though: the Z80, in spite of its higher clock frequency, more registers, and wider registers, was outperformed by the 6502; the 6809 does not outperform the 65816; and the 68000 (which apparently is very nice to program) does not really perform any better than the '816 does, in spite of its 16-bit data bus, 32-bit registers, and a lot more of them. The extra performance of the processors that have it seems to come from things like deep pipelining, caches, branch prediction, and separate data and instruction buses, which come at a very high cost. If I "move up" from the simplicity of the 65 family, I think I would want to go to a stack processor (which kc5tja advocates too), which, in my observation from the outside, gives the highest performance-to-complexity ratio, sometimes doing more MIPS than MHz, with an interrupt latency of four clocks or less and a return from interrupt that may take zero or one clocks.

Quote:
Good questions. I'd like to have answers to them myself. In my opinion the 6502 has only a limited set of registers - but it has the great advantage of zeropage addressing, which can be leveraged even with larger processors.

And if we make an all-32-bit one, the entire memory map of over four billion 32-bit locations is basically zero page (with the starting point movable with the 32-bit DP register), so no instruction is more than 2 (32-bit) bytes, grabbed in two clocks, and ZP pre- and post-indexed indirect instructions can apply to any address, without limitations. Bank registers become optional-use offset registers like the DP register. There are no bank boundaries, since any address is still 32 bits and covers the entire range.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Sun Oct 17, 2010 8:32 pm 
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
GARTHWILSON wrote:
I can draw only a vague understanding of why the extra registers and orthogonal instruction set may make it easier to write compilers for them.


This is not a C or Pascal-specific thing.

Why do you prefer to use direct page for stuff? For the same reasons compilers prefer to use CPU registers. A compiler would not have any problem using direct page instead of CPU registers, if the CPU's instruction set provided instructions like:

Code:
sec
sbc $02, $06, #$30


which would be equivalent to:

Code:
sec
lda $06
sbc #$30
sta $02


Quote:
But I don't think the extra registers necessarily make for any more power, though: the Z80, in spite of its higher clock frequency, more registers, and wider registers, was outperformed by the 6502


I openly challenge you to pit a Rabbit CPU (a Z-80 compatible with a much faster bus interface) against a 6502 and see who wins, clock for clock. The Rabbit, like the 6502, has single-cycle bus transactions.

Quote:
the 6809 does not outperform the 65816


Clock for clock? I'd have to see numbers on this. I suspect they'd be a good match.

Quote:
and the 68000 (which apparently is very nice to program) does not even really perform any better than the '816 does in spite of 16-bit data bus, 32-bit registers, and a lot more of them.


But the 68020 does snow the 65816.

And, again, you're conflating bus performance with instruction execution performance. These are NOT the same thing. Not even close!!

Quote:
If I "move up" from the simplicity of the 65 family, I think I would want to go to a stack processor (which kc5tja advocates too) which, in my observation from the outside, give the highest performance-to-complexity ratio


Stack architectures achieve this by retiring one instruction per clock cycle while still fetching one instruction per cycle.

Quote:
And if we make an all-32-bit one, the entire memory map of over four billion 32-bit locations is basically zero page (with the starting point movable with the 32-bit DP register), so no instruction is more than 2 (32-bit) bytes, grabbed in two clocks, and ZP pre- and post-indexed indirect instructions can apply to any address, without limitations.


Thus turning your CPU into a memory/memory architecture along the same lines as the TMS9900, and you'll forever be bottlenecked by RAM access speeds.

A larger set of on-chip registers allows the CPU clock to increase faster than your bus speed. When this is permitted to happen, you can (if you're willing to deal with implementing the cache on-die) benefit substantially from it.

You will never get real-time MPEG playback like we have today without it. DVD players could not exist without the kind of complexity you're railing against.


PostPosted: Mon Oct 18, 2010 12:32 am 
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8520
Location: Southern California
Quote:
No. This applies just as well to functional, to object-oriented, and to procedural language categories. Because stack architectures are a subset of accumulator architectures, only stack-based languages like Forth are the categorical exception.

And that right there might be much of where I'm coming from. Early in my computing years, I thought I wanted to learn lots of programming languages, although I probably wasn't even aware of as many at the time as you have under your belt now. But after I found Forth, I no longer wanted to pick up another language, especially another algebraic one. I kept imagining how hugely complex it would be to write a compiler for other languages, though.

Quote:
I openly challenge you to pit a Rabbit CPU (a Z-80 compatible with a much faster bus interface) against a 6502 and see who wins, clock for clock.

Ok, I don't know the answer, and I'm not going to buy one and take the time to learn to use it to try the experiment. But you say it is a Z-80 compatible. It took something other than more & wider registers to make it a good performer, because the original Z80 had that advantage and yet it could not keep up with a 6502.

Quote:
Quote:
the 6809 does not outperform the 65816

Clock for clock? I'd have to see numbers on this. I suspect they'd be a good match.

which is why I can't say for sure that the '816 outperforms the 6809 either. [Edit: Later, I was informed that the 6809 hardly outperformed the 6502. The '816 definitely outperforms the '02 though.]

Quote:
But the 68020 does snow the 65816.

The 020 was the next entire generation up, from what little I ever knew. Another engineer at work and I showed interest in the 68000 family for a project (which we never went through with), and Mot sent a large box of data books, data sheets, ap. notes, etc., which I still have, but it has been rather inaccessible for many years; later I went to a 68060 seminar when the 060 came out. Too bad they didn't stick with it.

Quote:
And, again, you're conflating bus performance with instruction execution performance. These are NOT the same thing. Not even close!!

I know they're not the same. I'm looking at overall performance, and saying that some of the things people have advocated adding to a new 65-family processor (which would take away its "65ness") do not guarantee improved overall performance. The constant example I'm up against is the PIC16 family. I know it does not qualify as a RISC (as Microchip claims) in that it does not have a bunch of registers; but it does have pipelining, supposedly retires one instruction per cycle, and has separate instruction and data buses, yet it takes approximately twice as many instructions and twice as many clocks to do a job as the 6502, if the PIC can do it at all. (I started with PICs partly because the low cost meant management would not say "no" when I wanted to replace a gob of inexpensive parts with a microcontroller, and now I have a lot invested in them.) I find big disadvantages in its effort to merge op codes and operands and in having the separate instruction and data buses.

Quote:
and you'll forever be bottlenecked by RAM access speeds.

I guess I'm willing to live with that; but a 32-bit bus does mean four times as much can be fetched or stored in one cycle as with an 8-bit bus, and today lots of fast SRAM (2ns?) could be put on-chip, like a cache except that it is the main memory, along with the I/O, so the buses don't have to go outboard at all, where things get slow.

Quote:
You will never get real-time MPEG playback like we have today without it. DVD players could not exist without the kind of complexity you're railing against.

Until optical computing or something entirely different comes along, I can believe it. I personally am not interested in those though. I don't even have (or want) an MP3 player. I would still say that we are far from reaching the ceiling. If a 6502 has been run at over 200MHz in one of WDC's licensees' products, and we can still widen the buses and registers...

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Mon Oct 18, 2010 2:29 am 
Joined: Mon Oct 16, 2006 8:28 am
Posts: 106
GARTHWILSON wrote:
But I don't think the extra registers necessarily make for any more power, though: the Z80, in spite of its higher clock frequency, more registers, and wider registers, was outperformed by the 6502; the 6809 does not outperform the 65816; and the 68000 (which apparently is very nice to program) does not really perform any better than the '816 does, in spite of its 16-bit data bus, 32-bit registers, and a lot more of them.

The 6502's speed over those other CPUs comes from two sources: the fact that the 6502 accesses memory every cycle, as kc5tja said, and its use of little-endian byte ordering, which allows many instructions to take one cycle less to execute than they do on a 68xx (the low byte of an address arrives first, so the CPU can start adding an index to it while the high byte is still being fetched).
Quote:
The extra performance of the processors that have it seems to come from things like deep pipelining, caches, branch prediction, and separate data and instruction buses which come at a very high cost.

Actually, most of those exist either to compensate for the vast difference in speed between memory and CPU registers, or to compensate for each other (e.g. branch prediction exists to compensate for a performance hit inherent in pipelining, which in turn exists to compensate for the amount of time it takes a modern CPU to access memory). The 6502 doesn't need any of those, for the simple reason that it can access memory on every cycle.

As far as number of registers is concerned, I firmly believe that zero page (or DP on the 816) does a good enough job.

Since we're discussing this, I thought I'd mention my own dream 32-bit 6502. I'm not much of a hardware guy (software's my thing), but if any of it inspires the solder jockeys here I'd be quite pleased. If I were given the job of making a 32-bit-capable 6502 that is backwards compatible with the 8-bit version, I'd do it like this:
1. Add a new 32-bit accumulator (call it Q for quad)
2. Add LDQ and STQ instructions with the full complement of addressing modes that LDA and STA have
3. Add a new addressing mode that allows 32-bit ALU operations between Q and integers at zero-page addresses, e.g. ADC Q,zp
4. Add another addressing mode that allows 32-bit ALU operations between Q and 32-bit immediate values, e.g. SBC Q,#$11223344
5. Add two 32-bit index registers (call them V and W) and allow them to be used with the same addressing modes as X and Y (e.g. LDA offset,w). Also add LDV, STV, etc.: the same set of operations that X and Y get. Obviously instead of TAW you'd have TQW and so on.
6. Add JSL and JML instructions that take 32-bit absolute addresses, as well as V and W as targets (and add RTL for 32-bit returns)

That design keeps the feel of 6502 assembler and compatibility with the 8-bit instruction set, while allowing for 32-bit operations and a 4GB address space. It also does away with messy register-width mode bits and bank bytes, which are my pet peeves with the 65816.
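
To give a sense of how that wish list might read in source, here's a quick sketch; the mnemonics are just the ones proposed above (they don't belong to any existing assembler), and the labels are made up:

Code:
ldq aBigNumber      ; 32-bit load into the new Q accumulator
clc
adc q,anotherBig    ; 32-bit add against a zero-page operand
stq aBigNumber      ; single-gulp 32-bit arithmetic
ldv #buffer         ; load the proposed 32-bit index register V
lda 3,v             ; existing 8-bit ops gain the new index register
jsl FarRoutine      ; 32-bit absolute call; the callee returns with rtl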


Last edited by faybs on Mon Oct 18, 2010 2:42 am, edited 1 time in total.

PostPosted: Mon Oct 18, 2010 2:39 am 
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
GARTHWILSON wrote:
But after I found Forth, I no longer wanted to pick up another language, especially another algebraic one. I kept imagining how hugely complex it would be to write a compiler for other languages, though.


Compiler technology isn't simple, but one can write a naive compiler for a register/register architecture that produces quite adequate code without too much difficulty. Remember when I translated Forth code to random-access direct page references, automatically optimizing out all the swaps, drops, nips, and what-not? This would suggest that if you can translate other languages into reverse polish form (trivially done once you have the parse tree available), the code generator can be used for any kind of language you throw at it.
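
As a rough illustration (hypothetical labels, byte-wide cells for brevity): once the cells of an RPN expression are assigned fixed direct-page slots, a phrase like "a @ b @ + result !" collapses into straight-line code with the stack shuffling optimized away:

Code:
lda a        ; first "stack cell" read straight from its direct-page slot
clc
adc b        ; second cell likewise - no pushes, drops, or swaps
sta result   ; the store that "!" would have performed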

Quote:
Ok, I don't know the answer, and I'm not going to buy one and take the time to learn to use it to try the experiment. But you say it is a Z-80 compatible. It took something other than more & wider registers to make it a good performer, because the original Z80 had that advantage and yet it could not keep up with a 6502.


As you worded it, you voiced the concern that a large register set isn't an indicator of processor performance, and cited two processors (Z80 and 68000) with known inferior bus architectures. I claim that this simply isn't a fair comparison to make in the context of the discussion.

The rest of the world indicates clearly the opposite -- once the CPU's instruction retirement rate exceeds the memory fetch capacity (which doesn't take much to do, I might add; 16MHz buses are about the reasonable limit, though through very careful PCB design you can push into the 30MHz regions. Beyond this, you're looking at differential signaling, stupendously wide buses, etc.), on-board registers become one of the determining factors of processor performance.

Quote:
The 020 was the next entire generation up


Yes, but what made it so grand was the all-32-bit infrastructure: 32-bit external data bus, 32-bit internal ALUs, etc. Internally, the microcode and nanocode backing the CPU's instruction set did not differ significantly from the 68010 which preceded it. We didn't see real cause to drool until the 68030 rolled out, and the 68040 even more so.

Quote:
I'm looking at overall performance, and saying that some of the things people have advocated adding to a new 65-family processor (which would take away its "65ness") do not guarantee improved overall performance.


This needs to be measured, as inefficiencies introduced in one area may be more than made up for by vastly improved efficiencies elsewhere. You can't really make a static determination of efficiency based on anticipated cycles consumed for some operation.

Quote:
The constant example I'm up against is the PIC16 family. I know it does not qualify as a RISC (as Microchip claims) in that it does not have a bunch of registers


Actually, RISC says nothing about the number of registers offered by the architecture. All it says is that the CPU offers a reduction in instruction offerings (and addressing modes), preferably arrived at by studying the instructions generated by theoretically perfect compilers for theoretically perfect computing hardware as well as existing software for existing architectures.

The large register files found in today's RISCs are there because of pressures found when writing compilers. Early RISC architectures, like the CYBER 604, had as few as 4 registers, while the Cray-1 supercomputer had only 8 integer registers (not including FPU and vector registers).

Also, pipelining isn't the sole domain of RISC; CISC mainframes have used pipelining as a means of improving throughput for some time.

Quote:
twice as many instructions and twice as many clocks to do a job as the 6502, if the PIC can do it at all.


I would agree, but this only indicates the inferior instruction set of the PIC16. The ATmega microcontrollers, for example, perform much better, and can keep up with the 65816 (let alone 6502) at most tasks. There are places where it's inconveniently slow (e.g., accessing ROM, or having to work with more than two or three pointers at a time), but on the whole it's quite a zippy little engine.

Quote:
Until optical computing or something entirely different comes along,


Which it won't, because of power consumption and physical size limitations. One could expect a 4-bit optical computer to be about the size of those old-school Apollo workstations that used to hold up the end of your desk back in the late 70s/early 80s.

Quote:
If a 6502 has been run at over 200MHz in one of WDC's licensees' products, and we can still widen the buses and registers...


Right, but this is talking about something completely different from what I was addressing.

The topic that spawned this thread is simple, concise, and to the point: 32-bit 6502 spinoffs don't feel like a 6502, and that seems to bum people out. MY argument is equally simple, concise, and to the point: it'll never happen. ANY time you attempt to change the 6502 to support a 32-bit wide computational capability, it WILL change the fundamental nature of the CPU. Just as the 65816 in native-mode feels very different than the 6502, so too will a 32-bit re-imagining of the architecture.

Please also note that I'm not making a value judgment here. I'm not saying that this is a bad thing. I'm merely advocating that one should not expect a 32-bit 6502. The ARM is perhaps as close as you're going to come.


PostPosted: Mon Oct 18, 2010 4:04 am 
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8520
Location: Southern California
Quote:
The topic that spawned this thread is simple, concise, and to the point: 32-bit 6502 spinoffs don't feel like a 6502, and that seems to bum people out. MY argument is equally simple, concise, and to the point: it'll never happen. ANY time you attempt to change the 6502 to support a 32-bit wide computational capability, it WILL change the fundamental nature of the CPU. Just as the 65816 in native-mode feels very different than the 6502, so too will a 32-bit re-imagining of the architecture.

Then perhaps we just had different ideas of what "feels different." To me the 65816 native mode feels like a 6502 with sensible added capabilities and things just made easier. I almost never touch the register-width mode bits faybs complains about, and you can get loads of benefits while totally ignoring that there's more than one bank—but we've been over that before. [Edit, 11 years later: I have an article about '816 misunderstandings, misunderstandings which make people shy away from it, at http://wilsonminesco.com/816myths/ .]

Quote:
Please also note that I'm not making a value judgment here. I'm not saying that this is a bad thing. I'm merely advocating that one should not expect a 32-bit 6502. The ARM is perhaps as close as you're going to come.

Do you have a favorite place for getting started with ARM?

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Mon Oct 18, 2010 6:32 am 
Joined: Mon Oct 16, 2006 8:28 am
Posts: 106
Don't get me wrong, the 65816 has a pretty nice instruction set overall. I guess I see the banks and width bits as the wart on Mona Lisa's nose :-P


PostPosted: Mon Oct 18, 2010 10:26 pm 
Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1041
Location: near Heidelberg, Germany
I think one problem here is that everybody has a different idea of what "feels" like a 6502. So I can only speak for myself...

In my opinion the 6502 "feels" like a 6502 because of

- the (small) register set, with a single general purpose register AC and index registers X,Y
- the register/memory architecture.
- zeropage addressing modes, indirect indexed addressing modes
- cycle efficiency (almost every cycle performs a valid memory access; interrupt efficiency)
- general simplicity

A 32-bit approach would still be 6502ish for me if the above points are fulfilled. So for me it would be OK if the registers are wider than eight bits. But a RISC processor that "by chance" translates 6502 opcodes into its own RISC opcode space is not 6502ish for me (sorry, 65GZ032 - but still a great accomplishment). Also, just adding more and wider registers would not feel 6502ish - it would basically just interleave the 6502 with another processor.

It seems - I have no experience in writing compilers - that a processor with (many) more general-purpose registers would be much better suited for compilers. On the other hand, I believe that with an offsettable zeropage the processor would get a set of 256 register bytes - more than many other processors - with an automatic register window (by changing the offset).

Working on my 65k I think I have preserved a lot of 6502ishness, while still providing many modern features. It is an interesting experience I am going through right now. Looking forward to your comments next weekend...

André

