All times are UTC




PostPosted: Mon Mar 19, 2007 9:47 pm

Joined: Thu Sep 09, 2004 10:13 pm
Posts: 8
Hi all,

I want to put a question on the table (it may be a stupid one; pardon me if so).

I still see a lot of interest in the 6502, from C64 fanatics to automotive system companies. A wide range of designers are still using this glorious processor.

I have also seen many attempts to give this processor a future by designing a new 32-bit (RISC-like) core capable of emulating the original 6502 instructions (typically not clock-accurate, or too complex to be validated). None of these had much success, I believe: validation problems, lack of maturity, a new tool chain to be designed, etc.

On the other hand, I have to admit that a 6502 with a larger memory space and 32-bit access could be quite useful (just for fun, I like to imagine a C64 with the processor replaced by an FPGA containing new peripherals and lots of memory).

Given that, I was thinking about the possibility of designing a wrapper (instead of a new processor) around a generic 6502 (one of the already synthesizable cores) to give it 32-bit access capabilities and 4 Gbyte of memory space. This modified 8-bit processor should work with the already available tools (assembler).

The advantages of this approach would be:
- easy to verify (the core is still an already working 8-bit 6502)
- 4 Gbyte of memory space (segmented)
- 16-bit Code Segment (64K segments of 64 Kbyte each)
- 16-bit Data Segment (idem)
- 16-bit Stack Segment (idem)
- 32-bit General Purpose Register (for 32-bit access)
- automatic Code Segment switch (context) on JMP, JSR, IRQ, NMI and BRK
- compatible with already available 32-bit peripherals (see www.opencores.org to get an idea of what is available)
- unmodified original ISA
- compatible with standard tools
- fully backward compatible with 6502 binary code
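
To make the segment arithmetic in the list concrete, here is a minimal Python sketch. It assumes the wrapper simply places a 16-bit segment register above the 16-bit address coming out of the unmodified core; the exact mapping is not spelled out in this post, and the function name is made up for illustration.

```python
SEGMENT_BITS = 16  # width of each segment register, as listed above
OFFSET_BITS = 16   # width of the address bus of the unmodified 6502 core


def physical_address(segment: int, offset: int) -> int:
    """Concatenate a 16-bit segment with a 16-bit 6502 address (assumed mapping)."""
    if not 0 <= segment < (1 << SEGMENT_BITS):
        raise ValueError("segment must fit in 16 bits")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset must fit in 16 bits")
    return (segment << OFFSET_BITS) | offset


# 64K segments of 64 Kbyte each cover the whole 4 Gbyte space.
TOTAL_BYTES = (1 << SEGMENT_BITS) * (1 << OFFSET_BITS)

if __name__ == "__main__":
    print(hex(physical_address(0x0001, 0x0000)))  # first byte of segment 1
    print(TOTAL_BYTES == 4 * 1024**3)
```

With this mapping the last byte of one segment is immediately followed by the first byte of the next, so peripherals would see one flat 32-bit address.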

The wrapper would have a 6502-like pipeline working in parallel to the 6502 one. The wrapper pipeline will execute only the new instructions.

Of course, this is not intended as a project to create a powerful processor but, instead, as a fun project to extend the original 6502's capabilities, maybe even running a simple uClinux (it may be slow, but ...).

I would appreciate your comments, so that I know whether I'm going to waste my time or do something that could be useful, at least inside the 6502 fanatics community :-)

Thanks anyway for being patient enough to read this long email.

Cheers
Pipettas


PostPosted: Wed Mar 21, 2007 4:19 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
This has been discussed countless times before, but the Terbium processor is so long overdue that we're all wondering if it will ever come out, so I'm game to dream again.  I haven't seen one of these discussions in a couple of years.  The price of memory keeps coming down, and other factors keep changing as well, affecting design philosophy.

Much of what you want is already available in the 65816, although it only addresses 16MB instead of 4GB (or 4 giga words).  The '816 is actually easier to program in many respects: 16-bit numbers can be handled all at once instead of only 8 bits at a time, having more instructions available further reduces the number of steps required to do various things, it's better suited for multitasking and relocatable code, etc.  The price is hardly any more than the '02.  If you're building a new computer, there's almost no reason to choose the '02 over the '816.

There's a good reason the '816 used banking.  Not having to load all 24 bits of every absolute address saved memory when memory was much more expensive.  It also saved clock cycles, because operands were read 8 bits at a time.  When I made my first computer in 1985 (after the '816 was out), I remember wishing I could afford the 8Kx8 SRAM instead of something smaller, but Jameco sold 8Kx8's for over $40, back when $40 was a lot more money than it is today.  Before I finished my plans, it came down to $8, so I bought a couple.  The price would not have come down so fast if the Japanese hadn't been dumping; but today memory is cheap enough to store a movie in, so there's no problem with going to a 32-bit data bus (except making all those connections!), and grabbing the whole address at once instead of one byte at a time would save time.  (I know-- someone will say, "Why not use 8 bits for the op code and the other 24 for an address in the current 16MW bank?"  It's worth considering, but then we're starting down that slippery slope to a too-complex machine.)
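
A rough illustration of that saving, as a Python sketch. The byte and cycle counts are the 65816's LDA encodings with an 8-bit accumulator (absolute: 3 bytes, 4 cycles; absolute long: 4 bytes, 5 cycles); the number of accesses is invented purely for the example.

```python
# (bytes, cycles) for LDA on the 65816 with an 8-bit accumulator
LDA_ABSOLUTE = (3, 4)       # opcode + 16-bit address; bank comes from the data bank register
LDA_ABSOLUTE_LONG = (4, 5)  # opcode + full 24-bit address


def cost(encoding, count):
    """Total (bytes, cycles) for `count` instructions of the given encoding."""
    size, cycles = encoding
    return size * count, cycles * count


def banking_savings(count):
    """Bytes and cycles saved by using banked absolute instead of long addressing."""
    banked_bytes, banked_cycles = cost(LDA_ABSOLUTE, count)
    long_bytes, long_cycles = cost(LDA_ABSOLUTE_LONG, count)
    return long_bytes - banked_bytes, long_cycles - banked_cycles


if __name__ == "__main__":
    # Hypothetical program containing 1000 such loads.
    print(banking_savings(1000))
```

One byte and one cycle per access does not sound like much, but it applies to every absolute-addressed instruction in the program.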

I wouldn't say the 6502 has no future though, as more 6502's are being sold every year now than at any time in history-- hundreds of millions per year.  We just don't see them, because they're almost all embedded in custom ICs that control various products, instead of going into desktop computers.

You should read Bill Mensch's philosophy on RISC and CISC.  He's the designer of the 65c02 and 65816, which he says are intentionally neither RISC nor CISC.  As you seem to say near the end, I really doubt that we need a 6502-flavored Pentium.

There are other threads on the follies of software bloat where cheap memory and storage and multi-GHz processors have allowed software companies to get extremely wasteful in their programming.  The penalty, as I see it, is not just needing more of what's cheap, i.e., RAM and disc space, but also longer load times, slower execution, and most of all, bugs from sloppy, hurried programming.  [Edit, 1/4/24:  I just found out this is called "Wirth's law."  The Wikipedia article starts with, "Wirth's law is an adage on computer performance which states that software is getting slower more rapidly than hardware is becoming faster.  The adage is named after Niklaus Wirth, a computer scientist who discussed it in his 1995 article 'A Plea for Lean Software'."]  Programs could be written to do just as much with far fewer resources, and I favor efforts in that direction.  To me, the only reason to have a ton of memory is for data, like for digital recording and pictures.

I would like to take a simple processor like the 6502 and, without forfeiting the simplicity, make the busses and registers 32-bit and add more instructions to make relative addressing more practical for a given relocatable program's tables, variables, data, etc.  With the data bus as wide as the address bus, basically everything is in zero page (or direct page), and calculating full addresses can be done in single instructions.  Relative branches can branch anywhere in the 4GigaWord address space.  With 32-bit index registers, indexing into a 1.2GB table could take place with a single 2-word, 3- or 4-clock LDA addr,X for example.  I would like it to have the 65816's instructions for things like pushing a relative offset address onto the stack, moving memory, etc.  Some of the 65816's instructions relating to memory banks would become irrelevant, because everything now goes in the same 4GW (16GB) memory bank.  Everything would also be in direct page, although the direct-page register could point to different locations as the beginning or "zero".
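
A minimal Python sketch of the address arithmetic such a machine would do for LDA addr,X and for a relative branch. It assumes word addressing, wrap-around at 2^32, and a signed 32-bit branch offset counted from the address of the next instruction; those details, and the function names, are assumptions rather than anything fixed in the paragraph above.

```python
MASK32 = 0xFFFFFFFF


def indexed_address(base: int, x: int) -> int:
    """Effective address of 'LDA base,X' with a 32-bit base and 32-bit X."""
    return (base + x) & MASK32


def to_signed32(value: int) -> int:
    """Interpret a 32-bit word as a two's-complement signed number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def branch_target(next_pc: int, offset_word: int) -> int:
    """Target of a relative branch whose operand is one signed 32-bit word."""
    return (next_pc + to_signed32(offset_word)) & MASK32


if __name__ == "__main__":
    print(hex(indexed_address(0x10000000, 0x20000000)))
    print(hex(branch_target(0x00001000, 0xFFFFFFFE)))  # branch back two words
```

Because the operand is as wide as the address space, one add reaches any location, which is why no bank or segment registers are needed in this scheme.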

Making the input clock a multiple of the phase-2 bus rate should make it possible to eliminate all the dead bus cycles and make certain additional instructions like a divide very fast.  My interest here is in better performance with large data arrays, higher-level languages, and even DSP, while keeping it one of the simplest processors around.  Only the Reset vector should be in ROM, with the other vectors in RAM, initialized by the Reset routine.  This allows booting up from slower ROM; then, once the Reset routine is finished, one of the first instructions executed from faster RAM would divide the clock by a smaller number in order to kick the clock speed way up.  This has been done with separate hardware in the clock circuit of course, but it would be nice to have it onboard the processor itself.
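
The boot-slow-then-speed-up idea can be sketched in a few lines of Python. The input clock and the memory access times below are hypothetical numbers chosen only for illustration, and the model ignores setup and hold margins; it just picks the smallest divisor whose phase-2 period is at least the memory's access time.

```python
def min_divisor(input_clock_hz: int, access_time_ns: int) -> int:
    """Smallest integer divisor giving a phase-2 period >= the memory access time."""
    # Ceiling division in integers: access_time / input_period, rounded up.
    return max(1, -(-access_time_ns * input_clock_hz // 10**9))


def bus_rate_hz(input_clock_hz: int, divisor: int) -> float:
    """Phase-2 bus rate obtained by dividing the input clock."""
    if divisor < 1:
        raise ValueError("divisor must be at least 1")
    return input_clock_hz / divisor


if __name__ == "__main__":
    INPUT_CLOCK_HZ = 100_000_000  # hypothetical 100 MHz input clock
    ROM_NS, RAM_NS = 150, 20      # hypothetical access times
    boot_div = min_divisor(INPUT_CLOCK_HZ, ROM_NS)
    run_div = min_divisor(INPUT_CLOCK_HZ, RAM_NS)
    print(boot_div, bus_rate_hz(INPUT_CLOCK_HZ, boot_div))
    print(run_div, bus_rate_hz(INPUT_CLOCK_HZ, run_div))
```

In this picture the Reset routine runs with the larger divisor and writes the smaller one as soon as it is executing from RAM.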

I personally had some trouble getting interested in the 65gz032 32-bit 6502 project, since the designers were going for deep pipelining, branch prediction, caches, and all kinds of things that are outside the realm of simpler processors.

Quote:
Thanks anyway for being patient enough to read this long email.
Your post is pretty short compared to some on this forum.  My own here is not very well written, but I've been interested in a 32-bit 6502 especially in the last couple of months.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Wed Mar 21, 2007 8:29 pm

Joined: Sun Aug 24, 2003 7:17 pm
Posts: 111
I think this subject is quite a bit out of scope for "6502.org". I do not think anybody reading this forum has the means to design new MC chips. And no chip designer would design a new chip having as main market the hobbyist reading this forum! But obviously, dreams are allowed!

Anyway, for "main computers" the world is obviously on the Intel (Pentium) track (including software-compatible competitors)! Personally I think this is unfortunate, as for example the basic Motorola architecture (which is the architecture of the 6502) is quite a bit more elegant. The Intel developments were always hampered by the need of being backwards compatible with earlier models and this resulted at the end in quite strange designs. But these designs also work and now have a "quasi-monopoly" all over the world (MS + Linux systems). This development is not reversible! Possibly (hopefully!) the (excellent!) Sparc processors from Sun will find a durable place in the market but to enter this market with a new design would be complete madness and no management controlling a sufficiently capable company is "completely mad".

As everybody knows, this is all a result of IBM's decision to use an Intel processor for its first PC (and to let Microsoft write the operating system). They could have chosen Motorola instead but they did not.


PostPosted: Wed Mar 21, 2007 9:32 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
I mostly agree, but I would say it's not out of the question if done with programmable logic, probably meaning an FPGA. The per-processor price may be quite high ($25? I really don't know) and the power consumption may be too high for any commercial products, but it's certainly doable for a serious hobbyist.


PostPosted: Wed Mar 21, 2007 10:48 pm

Joined: Thu Sep 09, 2004 10:13 pm
Posts: 8
Hi,

first of all, let me apologize to all of you if this is not the right forum in which to discuss this topic. I supposed that 6502.org was a good place to ask for opinions about new ideas around 6502 cores. You are the specialist 6502 users, hence the right people to ask.

Let me also clarify that my proposal is aimed only at hobbyist applications where FPGAs are used. More and more people today are using cheap FPGAs to build systems. There is no intention to make a product or to compete with Intel; it is just a modest idea to increase the fun, with the aim of donating the synthesizable RTL code.

Given that you can easily find at least two or three synthesizable 6502 cores on the internet that work perfectly (tested on silicon), I was wondering whether extending these cores to be 32-bit capable (without reinventing the wheel) would be useful for 6502-based designers or not. That's it.

I hope that the scope is clearer now.

BTW, I want to thank Garth Wilson for his suggestions. You are explaining a philosophy close to my own and to the intention of my proposal: "keep it simple, stupid" = KISS. I was initially attracted by the 65GZ032 project too (probably a very good processor), but it is too far from the original 6502 simplicity.

You are basically considering 65xx improvements in four directions:
- 32-bit data bus access
- 32-bit data management
- performance
- ISA extension

Well, with my proposal I'm going to address the first two topics. Performance is achieved by exploiting FPGA speed: a 6502 implementation in an FPGA can easily reach 50-60 MHz. ISA extension is not considered for the moment, just to keep things simple and to avoid having to design new tools such as an assembler and compiler.

These 6502 core modifications will let you easily manage the 4 Gbyte of memory as segments of 64 Kbyte each, inside of which the standard 6502 ISA works. JSR, JMP, NMI and BRK opcodes/events will be intercepted, and they will cause a segment switch according to the content of a 16-bit Code Segment register (suitably loaded before the jump is invoked).
The address and data busses will be 32 bits wide even though the core is 8-bit based. 32-bit accesses will be done with a single instruction thanks to the 32-bit data bus and a dedicated 32-bit register (memory will be 32 bits wide and 8-bit accessible). This register will then be handled 8 bits at a time by the core. You will also be able to preload a Data Segment and a Stack Segment register in order to keep data and stack in different segments. Finally, a second Data Segment register should make it possible to point to a second data zone in order to make 32-bit memory-to-memory data moves very fast. There is also the possibility of remapping the zero page (direct page) and of having one direct page for each segment.
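
A small Python model of the segment handling just described, as a sketch under assumptions: the wrapper is taken to know whether each access is an opcode/operand fetch, a data access or a stack access, and a preloaded Code Segment value is latched when a JSR/JMP/NMI/BRK is seen. The class and method names are hypothetical, and how RTS/RTI would restore the previous segment is not described in this post, so it is left out.

```python
from dataclasses import dataclass


@dataclass
class SegmentWrapper:
    cs: int = 0          # active Code Segment
    ds: int = 0          # Data Segment
    ss: int = 0          # Stack Segment
    pending_cs: int = 0  # Code Segment value preloaded before a jump

    def on_control_transfer(self) -> None:
        """Called when the wrapper sees JSR/JMP/NMI/BRK: switch to the preloaded CS."""
        self.cs = self.pending_cs & 0xFFFF

    def translate(self, addr16: int, kind: str) -> int:
        """Map a 16-bit core address to 32 bits; kind is 'fetch', 'data' or 'stack'."""
        segment = {"fetch": self.cs, "data": self.ds, "stack": self.ss}[kind]
        return ((segment & 0xFFFF) << 16) | (addr16 & 0xFFFF)


if __name__ == "__main__":
    wrapper = SegmentWrapper(ds=0x0002, ss=0x0003, pending_cs=0x0005)
    print(hex(wrapper.translate(0x8000, "fetch")))  # still in segment 0
    wrapper.on_control_transfer()                   # e.g. a JSR was intercepted
    print(hex(wrapper.translate(0x8000, "fetch")))  # now in segment 5
```

The 8-bit core never sees anything but 16-bit addresses; all of the 32-bit behaviour lives in the translation step.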

The possibility of directly accessing 32-bit memory (like a RISC processor) allows the reuse of 32-bit peripherals. You can find many of these peripherals on the internet (memory controller, USB, Ethernet, VGA controller, etc.).

Once all of these things are working, a potential second step is to add sixteen 32-bit registers and a 32-bit ALU in order to allow 32-bit data manipulation and 32-bit load/store from/to memory using single instructions. In the end, the new core will have both 8-bit and 32-bit computational capability, to be used according to the specific application.

If my previous email was long, this email is a sleeping pill :-)

Thanks again


PostPosted: Thu Mar 29, 2007 12:52 am

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
Mats wrote:
I think this subject is quite a bit out of scope for "6502.org". I do not think anybody reading this forum has the means to design new MC chips. And no chip designer would design a new chip having as main market the hobbyist reading this forum! But obviously, dreams are allowed!


http://www.opencores.org/pnews.cgi/list ... o_loop=yes

Yes, that's right, a complete RISC engine (that looks really juicy too! Has many features from MIPS and PowerPC, but is unique in its own right), fully open source, and fully synthesizable in either FPGA or VLSI.

Quote:
The Intel developments were always hampered by the need of being backwards compatible with earlier models and this resulted at the end in quite strange designs.


As with Motorola's designs (which are not based on the 6502; the reverse is actually more correct, but even here, the 6502 departs heavily from the 6800 and 68000 lineups). Anyone trying to port an OS from a 68020 to a 68040 to a 68060 will tell you a few horror stories.

Quote:
place in the market but to enter this market with a new design would be complete madness and no management controlling a sufficiently capable company is "completely mad"


Unfortunately:

* Sparc is completely mad, because all member companies are, well, companies.

* Sparc is an inferior architecture compared to PowerPC, which itself is now also open (or at least WAS) in the same sense as Sparc. All those register windows? Have you any idea how long it takes to perform a thread switch because of those (potentially hundreds) of registers?

I would much prefer a completely open source design, based on a high-performance interconnect like Wishbone. My vote is for OpenRISC.

But, why invest all those transistors for a register-based machine when you can use a stack architecture instead? I'll posit that a 64-bit stack machine will use about the same number of transistors as a 32-bit register-based machine, and can run software faster (no pipeline to stall, instantaneous interrupt response, etc.).

Quote:
They could have chosen Motorola instead but they did not.


That was Motorola's fault. As usual, Motorola dropped the ball by (a) not having a stable supply in time, and (b) not believing enough in the market to give a damn. Not only that, but IBM had stock in Intel, so it was a natural no-brainer.


PostPosted: Fri Mar 30, 2007 5:44 am

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
kc5tja wrote:
Mats wrote:
...
But, why invest all those transistors for a register-based machine when you can use a stack architecture instead? I'll posit that a 64-bit stack machine will use about the same number of transistors as a 32-bit register-based machine, and can run software faster (no pipeline to stall, instantaneous interrupt response, etc.).
...


This is like saying "Why don't we eliminate memory, because then we can eliminate memory latency".

I would really recommend you read Hennessy's book on computer architecture.

Toshi


PostPosted: Fri Mar 30, 2007 5:59 am

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
Oh, you mean that book I have sitting on my shelf? Yeah, the one that describes the MIPS-I clone in a low-level detailed fashion. The one I read three times over when I first got it. How about this instead: I recommend you read "Stack Computers: The New Wave" by Philip Koopman.

Your response was a wholesale non sequitur, and utterly out of line, I'm afraid. I stand by my arguments.

Just face it -- stack CPUs take fewer resources to do the same amount of work. Period. Smaller instructions make for smaller RAM consumption (so, for a given bus width, higher effective instruction throughput is possible). Fewer transistors make for lower power consumption.

The laws of physics cannot be broken, and the math doesn't lie. I really don't know where you came up with that memory strawman -- you're free to defend a flawed architecture if you wish, but if you feel the need to, please do so with cogent arguments.


PostPosted: Tue May 29, 2007 11:52 pm

Joined: Tue May 29, 2007 1:29 pm
Posts: 25
This is a slight tangent, but there's a straightforward way to extend the 6502's addressing without using messy segmentation: simply add new addressing modes which use 3 (or more) bytes for an address. The existing modes would continue to address the lower 64K of memory. No need for new registers, new modes, or anything - very simple and elegant.

For code density, it would also be desirable to have PC-relative addressing modes (at least jsr/jmp).
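
A minimal Python sketch of what decoding such instructions could look like. It assumes the longer address is stored little-endian like existing 6502 operands, that a PC-relative jump carries a signed 16-bit offset counted from the next instruction, and a 24-bit address space; the opcode value 0xAF for the long form is borrowed from the 65816 purely as a placeholder.

```python
def read_address(code: bytes, pc: int, width: int) -> int:
    """Read a little-endian address of `width` bytes following the opcode at `pc`."""
    operand = code[pc + 1 : pc + 1 + width]
    if len(operand) != width:
        raise ValueError("operand runs past the end of the code")
    return int.from_bytes(operand, "little")


def pc_relative_target(pc: int, length: int, offset: int, addr_bits: int = 24) -> int:
    """Target of a PC-relative jump: signed 16-bit offset from the next instruction."""
    offset &= 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000
    return (pc + length + offset) & ((1 << addr_bits) - 1)


SHORT_FORM = bytes([0xAD, 0x34, 0x12])       # LDA $1234, existing 2-byte address
LONG_FORM = bytes([0xAF, 0x56, 0x34, 0x12])  # placeholder opcode with a 3-byte address

if __name__ == "__main__":
    print(hex(read_address(SHORT_FORM, 0, 2)))
    print(hex(read_address(LONG_FORM, 0, 3)))
    print(hex(pc_relative_target(0x012000, 3, 0x0010)))
```

Only instructions that actually need to reach beyond the lower 64K pay for the extra operand byte, which is the code-density argument made above.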


PostPosted: Wed May 30, 2007 2:57 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
VBR, the things you're talking about are a couple of the already-existing attractions to the 65816. It has a lot more too though. See the second post near the top.


PostPosted: Sat Jun 02, 2007 10:49 pm

Joined: Thu Sep 09, 2004 10:13 pm
Posts: 8
GARTHWILSON wrote:
VBR, the things you're talking about are a couple of the already-existing attractions to the 65816. It has a lot more too though. See the second post near the top.


VBR, first of all thank you for your feedback.

Yes, you are right: the possibility of addressing a linear memory space is always to be preferred, but this would require adding new addressing modes and enlarging instructions (up to 4 bytes), and this is just what I would like to avoid. Since I'm designing this core to be used in FPGA or ASIC projects, code density and core simplicity are a must.

Memory and data will be addressed using address or data extension registers (like the DBR and the PBR of the 65816), but the instruction and data bus will be 32 bits wide, allowing real 32-bit access to memory (this is the real advantage). This way, the external address seen by peripherals will look like a "linear" address. Blocks of data could be moved on a per-word basis, increasing the data throughput by a factor of four at the same clock speed. At the same time, the system bus will be locked by the CPU for one fourth of the time compared with a normal 6502, leaving the bus free for other masters such as a DMA controller.
Having a 32-bit bus (with no data/address multiplexing) will also make it possible to save the whole 6502 register bank (A, X, Y, P) to memory in one clock cycle (a single memory access), with an obvious performance gain during subroutine calls and context changes.
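
Both claims can be checked with a little arithmetic. The Python sketch below packs the four 8-bit registers into one 32-bit word and counts bus accesses for a block move; the byte order inside the word (A in the low byte) and the assumption of word-aligned blocks are illustrative choices, not something specified above.

```python
def pack_registers(a: int, x: int, y: int, p: int) -> int:
    """Pack A, X, Y and P into one 32-bit word (A in the low byte, assumed order)."""
    return ((p & 0xFF) << 24) | ((y & 0xFF) << 16) | ((x & 0xFF) << 8) | (a & 0xFF)


def unpack_registers(word: int):
    """Inverse of pack_registers: returns (A, X, Y, P)."""
    return word & 0xFF, (word >> 8) & 0xFF, (word >> 16) & 0xFF, (word >> 24) & 0xFF


def bus_accesses(n_bytes: int, bus_bytes: int) -> int:
    """Number of bus accesses needed to move n_bytes over a bus of the given width."""
    return -(-n_bytes // bus_bytes)  # ceiling division


if __name__ == "__main__":
    word = pack_registers(0x11, 0x22, 0x33, 0x44)
    print(hex(word), unpack_registers(word))
    print(bus_accesses(1024, 1), bus_accesses(1024, 4))  # 8-bit bus vs 32-bit bus
```

A 1 Kbyte block needs a quarter of the accesses on the 32-bit bus, which is where both the factor-of-four throughput and the freed-up bus time come from.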

Memory segmentation will make it possible to manage each 64 Kbyte segment independently of the others so that, for example, each segment has its own local "zero page" and local data stack.

The other advantage would be the possibility of using the 32-bit peripherals available on the internet for free (see www.opencores.org) and of assembling a standard platform from RISC peripherals.

In the first version the ALU will remain 8 bits wide (even though memory access is 32-bit capable), but in the second version the ALU will also be widened to 32 bits in order to gain a factor of four in data handling as well. A set of 16 registers (32 bits each) will be added too, in order to minimize memory accesses as much as possible.

Cheers,
Stefano


PostPosted: Sat Jun 02, 2007 11:56 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Quote:
Just face it -- stack CPUs take fewer resources to do the same amount of work. Period. Smaller instructions make for smaller RAM consumption (so, for a given bus width, higher effective instruction throughput is possible). Fewer transistors make for lower power consumption.

This is getting a bit O.T., but are there any stack processors in production now? I kept a magazine page from 20 years ago, kind of a description but not an ad, of a SBC using a Harris RTX-2000 that routinely did 16MIPS @ 12MHz (peak MIPS was more), with a maximum of four clocks of interrupt latency and 0 clocks for return-from-interrupt. Incredible performance for the day, with a rather simple processor.

Pipettas, it's hard for me to tell how well you know the '816. It still sounds like you might be dreaming of things the '816 already offers (except, of course, that it does not have a 4-GigaWord address space).

On the matter of registers however, I would mention that when someone says, "The 6502 is very limited in registers!", I just respond that essentially all 256 bytes of zero page (or the relocatable direct page for the '816) is processor registers. The way it is set up makes for much faster interrupt response time and context switches than the processors with lots of registers can deliver. For this reason I'm not too fond of the idea of having 32 processor registers. The exception I would like to see is where the registers are used in onboard stacks for a stack processor so it does not have to go out on the bus(es) for stack operations. The 6502 is quite efficient at multitasking in Forth, although its ZP and page-1 sizes for data and return stacks are small enough that you'd have to be careful with stack space if you had more than about four or five tasks set up at once.


PostPosted: Sun Jun 03, 2007 6:32 am

Joined: Thu Sep 09, 2004 10:13 pm
Posts: 8
GARTHWILSON, thank you for your reply.

I agree that the 65816 has a lot of improvements with respect to the 6502, but it is still a 16-bit processor. With real 32-bit access, things should go faster, shouldn't they?

Can someone tell me the performance improvement when moving from a standard 6502 to a 65816 (same clock speed)? It would be very interesting for me.


PostPosted: Sun Jun 03, 2007 2:57 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Quote:
Can someone tell me the performance improvement when moving from a standard 6502 to a 65816 (same clock speed)?

My indirect-threaded '816 Forth is two to three times as fast at the same clock speed as my 6502 Forth. (As usual, the exact speed difference depends on what you're doing.) The '816 allowed more optimization.


PostPosted: Sun Jun 03, 2007 4:58 pm

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
GARTHWILSON wrote:
This is getting a bit O.T., but are there any stack processors in production now?


PTSC-1000, by Patriot Scientific. IntellaSys multi-core arrays (so-called SEAforth) as well.

Nowadays, with the sheer simplicity of stack architectures, I doubt they'd actually sell well because it's so easy to reverse-engineer a competing design and stuff it into an FPGA.

Quote:
For this reason I'm not too fond of the idea of having 32 processor registers.


Having all those registers is OK for embedded environments because you need only save/restore affected registers. For desktop use, however, context switch overhead can be substantial.

The IA-64 architecture (aka Itanium) actually had status bits in a control register that indicated which (sets of) registers have been modified since last context switch, so it could intelligently write them back into RAM. Most RISCs don't bother with this scheme of course.

Also, most RISC ABIs (Application Binary Interfaces) map out various registers for various purposes. Not all registers may be written back to a thread's state (for example, some registers may be reserved JUST for ISR use, some JUST for kernel use, etc). This mapping is, as you might expect, 100% analogous to a systems software designer attempting to create a memory map of zero-page.

For example, compare a hypothetical zero-page layout like this:

Code:
$00-$7F is kernel
  $00-$2F is floating point routines
  $30-$6F is for BASIC
  $70-$7F is for kernel I/O routines
$80-$FF is for applications


to a typical, but still hypothetical RISC ABI register usage convention like this:

Code:
R0 is always zero [sometimes enforced in the hardware]
R1-R4 are for kernel use
R5-R6 are for return values
R7-R14 are for input parameters for functions
R15-R22 are for temporaries [not guaranteed to be saved across function calls]
R23-R25 for ISR use [never use these as their values can change at random!]
R26-R29 are for structured exception handling
R30 is the user stack pointer
R31 is reserved for future allocation


Knowing the ABI's preferred register layout allows you to optimize context switching by saving only those registers that are really important.
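
As a sketch of that optimization in Python, the table below encodes the hypothetical register convention listed above and derives which registers a user-thread context switch would have to save. Which roles count as "must save" is an assumption made for this example (the zero register, kernel, ISR and reserved registers are skipped); a real ABI would spell this out.

```python
# (first, last, role, save on a user-thread context switch?)
REGISTER_ROLES = [
    (0, 0, "always zero", False),
    (1, 4, "kernel use", False),
    (5, 6, "return values", True),
    (7, 14, "function input parameters", True),
    (15, 22, "temporaries", True),
    (23, 25, "ISR use", False),
    (26, 29, "structured exception handling", True),
    (30, 30, "user stack pointer", True),
    (31, 31, "reserved", False),
]


def registers_to_save():
    """Register numbers that belong to the thread's state under the assumed policy."""
    return [
        reg
        for first, last, _role, save in REGISTER_ROLES
        if save
        for reg in range(first, last + 1)
    ]


if __name__ == "__main__":
    saved = registers_to_save()
    print(len(saved), "of 32 registers saved:", saved)
```

Under these assumptions the switch stores 23 registers instead of 32; the same bookkeeping applies to a zero-page map, where only the application's half would need to be swapped.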

Quote:
The exception I would like to see is where the registers are used in onboard stacks for a stack processor so it does not have to go out on the bus(es) for stack operations.


I prefer in-RAM stacks, but backed by specially engineered data caches to achieve the same level of performance. Although this requires more circuitry (and hence, more bus interface complexity inside the chip), when it comes time to do a context switch for multitasking, you don't end up having to write out both data and return stacks explicitly. Also, doing this stack flush will necessarily require two data stack elements to hold the memory base address and index. Therefore, if your chip has 16-deep stacks, you can only reliably support 14-deep stacks in a multitasking environment. Yucky.

Quote:
The 6502 is quite efficient at multitasking in Forth, although its ZP and page-1 sizes for data and return stacks are small enough that you'd have to be careful with stack space if you had more than about four or five tasks set up at once.


With an external MMU that detects and re-directs accesses to $00xx and $01xx to other pages, multitasking becomes even more trivial. Commodore 128's MMU could do this, for example, but for the life of me, I'm not sure why nobody has, to date, employed this capability.
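
The redirection itself is simple enough to sketch in Python. This is a simplified model and not a description of any real MMU's register interface: accesses to page 0 and page 1 are sent to per-task pages, everything else passes through. The page assignment in task_pages is a hypothetical layout chosen for the example.

```python
def relocate(addr16: int, zp_page: int, stack_page: int) -> int:
    """Redirect $00xx to zp_page and $01xx to stack_page; leave other pages alone."""
    page, low = addr16 >> 8, addr16 & 0xFF
    if page == 0x00:
        page = zp_page
    elif page == 0x01:
        page = stack_page
    return (page << 8) | low


def task_pages(task: int):
    """Hypothetical layout: each task gets two consecutive pages starting at $1000."""
    return 0x10 + 2 * task, 0x11 + 2 * task


if __name__ == "__main__":
    for task in range(3):
        zp, stack = task_pages(task)
        print(task, hex(relocate(0x0042, zp, stack)), hex(relocate(0x01FF, zp, stack)))
```

A task switch then amounts to writing two page numbers, and every task gets a full 256-byte zero page and hardware stack to itself.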

