
All times are UTC




Posted: Thu Apr 02, 2020 2:48 am

Joined: Fri Apr 06, 2018 4:20 pm
Posts: 94
32-bit successor to 6502

There have been many attempts to design both direct and “spiritual” successors to the legendary 6502 processor. In most cases these designs fall into two broad categories:

A. Designs that simply widen the existing registers to 32 bits, with some even retaining an 8-bit-wide bus (WDC's proposed 65832 is an example). These designs rarely add more than one or two registers and adhere fairly closely to the original accumulator-memory design.

B. Expanded designs with more general purpose registers, but having a broadly similar instruction set with familiar mnemonics and additional instructions. These designs tend to end up looking like a “CISC-lite” design with clever instruction encoding and only a few instruction lengths.

Within these two categories you will find that the proposals differ greatly on the following points:

• Instruction width and encoding

• Number of additional registers, if any

• Number of new addressing modes, if any

• New instructions, if any

• Fixed purpose vs. general purpose registers

• External bus width

• Addressing modes

• Width of the registers

• Level of backwards compatibility, if any

• Byte addressable or Word addressable

• Pipelining, if any

• Caches, if any

Floating point and memory management also vary between designs, but many proposals either leave them out entirely or state that they can be added later.

Variations in design are also driven by a number of factors:

• Designer preference

• Target language for primary users (Forth, C, Assembly, etc.)

• General purpose computing vs. embedded computing

• Linux Support

• Multitasking support

• Target cost point

• Optimization for low-latency interrupts and context switching

With so much to consider, it is no wonder there are so many divergent ideas on how to move forward with 65XX.

My Opinion

In my opinion, approaches that adhere more closely to the original accumulator-memory design of the 6502 are more interesting and potentially more applicable to the embedded market.

Adding a bunch of general purpose registers adds size and complexity (dual-port, triple-port, etc.) to the register file. Turning the 6502 into a clever but efficient CISC architecture feels less interesting to me and would lump the 6502 in with other efficient CISC architectures. Why build another Motorola ColdFire?

I know folks love their registers and their C compilers, but that is not for me.

But what if we could design a 65XX architecture that was easy to target with C, but still adhered to a design with five 32-bit registers?

My Design (from 1000 feet) – YA326502

A ..................... 32-bit Accumulator
X ..................... 32-bit Index/Data Register
Y ..................... 32-bit Index/Data Register
SP0, SP1, SP2, SP3 .... 32-bit (4 x 8-bit stack pointers)
PC .................... 32-bit Program Counter
SR .................... 8-bit Status Register

There are also 4 byte-addressable “fast page” areas that take up the first kilobyte of memory (4 x 256 bytes).

The stacks are byte addressable, so the 4 stacks take up the second kilobyte of memory (4 x 256 bytes).
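A minimal sketch of the low-memory layout just described may help picture it. All names here are hypothetical, and the packing of SP0 through SP3 into one 32-bit register (SP0 in the low byte) is an assumption, not something the design states:

```python
# Hypothetical model of the first 2K of memory: four fast pages, then four stacks.
FAST_PAGE_BASE = 0x000   # fast pages occupy 0x000-0x3FF
STACK_BASE = 0x400       # stacks occupy 0x400-0x7FF
PAGE_SIZE = 0x100        # each fast page and each stack is 256 bytes


def fast_page_address(page, offset):
    """Byte address of `offset` (0-255) inside fast page `page` (0-3)."""
    if not 0 <= page < 4:
        raise ValueError("page must be 0-3")
    return FAST_PAGE_BASE + page * PAGE_SIZE + (offset & 0xFF)


def stack_address(stack, sp):
    """Byte address referenced by 8-bit stack pointer `sp` of stack 0-3."""
    if not 0 <= stack < 4:
        raise ValueError("stack must be 0-3")
    return STACK_BASE + stack * PAGE_SIZE + (sp & 0xFF)


def unpack_stack_pointers(sp_register):
    """Split a 32-bit SP register into [SP0, SP1, SP2, SP3].

    Assumption: SP0 lives in the least significant byte.
    """
    return [(sp_register >> (8 * n)) & 0xFF for n in range(4)]
```

For example, `stack_address(3, 0xFF)` gives 0x7FF, the last byte of the second kilobyte.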

Byte addressing is very important to the embedded market and makes string handling easier.

Every instruction is a 16-bit opcode plus an optional operand (16, 24, 32, or 48 bits long in total). Single operands or offsets only. If you run a 16- or 32-bit operation on the stack or a "fast page", it operates on the word or longword starting at the byte referenced.
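The instruction lengths and the "starting at the byte referenced" rule can be sketched as follows. The operand sizes of 0, 8, 16, and 32 bits are inferred from the stated totals, and little-endian byte order is an assumption carried over from the 6502; neither is spelled out in the design:

```python
OPCODE_BITS = 16
OPERAND_BITS = (0, 8, 16, 32)  # inferred from the 16/24/32/48-bit totals


def instruction_length_bits(operand_bits):
    """Total instruction length for a given operand size."""
    if operand_bits not in OPERAND_BITS:
        raise ValueError("operand must be 0, 8, 16 or 32 bits")
    return OPCODE_BITS + operand_bits


def read_value(memory, address, width_bytes):
    """Read a 1-, 2- or 4-byte value starting at any byte address.

    Assumption: little-endian, as on the 6502.
    """
    return int.from_bytes(memory[address:address + width_bytes], "little")


memory = bytes([0x11, 0x22, 0x33, 0x44, 0x55])
word = read_value(memory, 1, 2)       # 16-bit value starting at byte 1
longword = read_value(memory, 1, 4)   # 32-bit value starting at byte 1
```

Note that nothing in this sketch requires the address to be aligned, which matches the byte-addressable description above.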

****

Why not have a single, flat 32-bit stack? First of all, this is boring. Secondly, stack operations will be slow without a lot of cache, because you have to go out to main memory. I would like to run this at a few hundred MHz. My design has 1K of stack, which can be trivially included on-die. Same with the 4 “fast pages.”

Having 4 small stacks also allows you to easily implement threaded languages. Chuck Moore would think 256-deep stacks are luxurious :-) In fact if I were to add instructions, I might include some that work on stack pairs explicitly as data and return stacks.
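As a rough illustration of using two of the four stacks as a Forth-style data/return pair, here is a small model. The push direction (descending, as on the 6502), the 32-bit cell size, and the little-endian layout are all assumptions, and the class and constant names are made up for the example:

```python
class StackFile:
    """Hypothetical model of four 256-byte on-die stacks with 8-bit pointers."""

    BASE = 0x400  # stacks occupy the second kilobyte of memory

    def __init__(self):
        self.memory = bytearray(0x800)
        self.sp = [0, 0, 0, 0]  # SP0..SP3, each wraps within its own page

    def _address(self, stack):
        return self.BASE + stack * 0x100 + self.sp[stack]

    def push(self, stack, value):
        # Pre-decrement by one 32-bit cell, wrapping at 8 bits.
        self.sp[stack] = (self.sp[stack] - 4) & 0xFF
        addr = self._address(stack)
        self.memory[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def pop(self, stack):
        addr = self._address(stack)
        value = int.from_bytes(self.memory[addr:addr + 4], "little")
        # Post-increment by one cell, wrapping at 8 bits.
        self.sp[stack] = (self.sp[stack] + 4) & 0xFF
        return value


DATA, RETURN = 0, 1  # one stack pair; stacks 2 and 3 remain free for a second pair

stacks = StackFile()
stacks.push(DATA, 1)
stacks.push(DATA, 2)
stacks.push(RETURN, 0xDEADBEEF)
```

With 32-bit cells, each 256-byte stack holds 64 entries before the pointer wraps.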

Having 4 “fast pages” makes implementing a C compiler much easier.

Having 4 small stacks and 4 “fast pages” also makes “small multitasking” easy to implement, allowing you to run a couple of tasks concurrently without having to swap in and out of memory.

Finally, I can envision the processor also having what I would call a “Fast interrupt” mode. In this mode only 1 stack and 1 “fast page” are exposed to the programmer. In the event of an interrupt that requires a context switch, the other stacks and “fast pages” would allow you to go three levels deep on context switching without going out to main memory.

If a wide memory bus (64/128bits) were implemented in the 2K of on-die stack/fast page memory, and you reserved some fast page to save the registers, you could switch context very, very quickly.

****

Of course an MMU could be added later which could isolate stacks between kernel and user space programs. It could also remap the 4 stacks and "fast pages" anywhere in memory, but then they aren’t as fast anymore. I really dislike it when the 65XX starts to look too much like just another “large system” processor.

The MMU could also prevent one program from overwriting the stack and fast page of another program, or decide when programs can share a fast page, etc. I think in this regard a simple MMU might be very useful. Sharing access to a direct page can be a great way to pass data between programs.

****

I haven’t decided how to tell the processor to switch stacks and fast pages. I could have addressing modes for each stack and fast page. With 16-bit opcodes there is more than enough encoding space for this approach, and a half-decent assembler with simple mnemonics would make it bone simple. This is what I am leaning toward.

Another approach would be to flip some bits in the status register, but I don’t like adding more state to the processor.

****

What do folks think, would this be an interesting design to pursue?

I think I’ve cooked up something that would be fun to program in assembly, easy to implement C and Forth, and would still fit very well into the embedded market where 65XX currently lives.


Posted: Thu Apr 02, 2020 7:49 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10822
Location: England
Certainly something a bit different! Some nice ideas there.

I'm a bit nervous of the stacks design: if 4 stack pointers is good, that's great, but it should be good because you feel it's good, not because you can put 1k of memory on a chip: it may well be good to put 1k of memory on a chip, but that's an implementation detail which you're feeding into the programmer's model and the instruction set architecture. So, that's two ideas which are part of different levels of the design, and it might be better to separate them. (Certainly two stacks is better than one, but it's not obvious to me that four stacks is better than two.)

On terminology: if you're building a 32 bit machine, please don't refer to 16 bits as a word! I find it hugely confusing. I know it's been done before, so perhaps it's just me.

It might be useful to headline some design decisions, for example about timing. Is this to be an architecture which allows for cycle counting, or does it allow for caches? Personally, given the choice, I'd go for caches, at least as a possibility, so I'd say upfront that timing will not be simply determined and cycle-counting isn't going to be an option. Some people would then find that a complete turn-off, but at least it wouldn't be a long-running argument and indecision. Or, of course, go the other way, in which case cycle-counting is in but caches are out.

Ruling out C is a good example of that: be very clear that this machine isn't meant to be a good target for a C compiler, if that's the case. Oh, except I see you close by saying it is meant to be, so that's me confused!


Posted: Thu Apr 02, 2020 9:30 am

Joined: Thu Mar 10, 2016 4:33 am
Posts: 170
I’m not sure how useful 4 stacks would be. I guess it depends on what language you’re working in. One larger stack could be more useful, as some languages targeting the 6502 implement their own stack so they can get more than 256 bytes.

Maybe some other registers would be useful for high level languages, such as a link register to hold return addresses using a BranchAndLink instruction and a corresponding return. A base register could be more useful than the data bank register of the 65816. And maybe some more instructions dedicated to managing stack frames could be useful? That and the two index registers and accumulator would provide a useful register set. Some architectures also provide a second accumulator, or a way to swap between two accumulators, although I’m not sure how useful that would be.


Posted: Thu Apr 02, 2020 9:45 am

Joined: Thu Mar 03, 2011 5:56 pm
Posts: 278
The ND-100 had 16 interrupt levels, with separate register sets for each interrupt level. Maybe that would be an idea? It should be possible to get better interrupt performance using this.

The Z80 had something similar, with an alternate register set; I once used that to improve interrupt performance.

As an alternative to having multiple stacks, it may be an idea to have a single, larger stack, with caching of the most recent elements (probably a lot harder to implement, though).


Posted: Thu Apr 02, 2020 10:20 am

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 693
Location: North Tejas
Why bother?

If you need 32 bits, use an ARM and be done with it.

That is what Acorn did.


Posted: Thu Apr 02, 2020 11:03 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10822
Location: England
We don't normally do that here BillG: If you don't see merit in a project, best to ignore it, and let those who are interested enjoy themselves.


Posted: Thu Apr 02, 2020 12:30 pm

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
Ok, so I have some comments. Keep in mind I'm no expert on the 6502 or even computer architectures in general, though I have read up a lot on them recently.

First, I'm a bit confused by your terminology regarding the "A" design as an "accumulator-memory design" and the "B" design as "CISC." From my understanding, both A and B are more at the CISC end of the CISC-RISC spectrum, where the RISC end usually uses the opposite of a register-memory architecture: a load-store architecture where all instruction operands (except for loads and stores themselves) must be in registers.

Perhaps you're really talking about keeping the register-memory architecture for both A and B, and instead the difference between the two is just in the number and "general-purposeness" of the registers? (That differentiation does make good sense to me.)

I notice that, rather than adding a Direct Page register DP (as used in the 6809 and 65816, among others), you instead fix the direct pages, albeit adding more of them. You also mention that they would be stored on-die rather than in RAM (presumably speeding access). That may seem like a good idea, but perhaps it is not only unnecessary but would also slow other things? The speed (and space) advantage of the direct page on current designs comes simply from needing to load only one address byte from RAM rather than two (for an instruction operand), and that's used in situations where on-die memory or registers (but see below) would not help, such as doing I/O. E.g., the (6809-based) Fujitsu FM-7 BIOS temporarily sets the direct page to the system's I/O page when doing I/O to/from the diskette, which speeds it up by some 20-30% by my guess.

But now that I think about it, the major use of the direct page on the 6502 seems to be to make up for the lack of registers, particularly the complete absence of full-address-width index registers. You've already fixed the latter problem in your design; I wonder if it would make more sense to focus on simply adding more registers, which is what zero-page addresses in on-board RAM that can't be used with off-board memory or I/O really are. (But that would go against your "keep it more like the 6502" approach, I suppose. And I must admit myself to finding the direct page idea more interesting. Hmm...I wonder about making it a cache....)

How are you going to handle having 10-bit rather than 8-bit addresses for the "fast pages"? Perhaps fix the lowest two bits of a fast-page address to 00 (addressing it only on fullword boundaries) thus making it fit into a byte?
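The suggestion of fixing the low two address bits at 00 can be sketched like this: an 8-bit operand selects one of 256 fullwords, which together span the whole 1K fast-page area. The function names are hypothetical:

```python
def fast_page_word_address(operand):
    """Map an 8-bit operand to a fullword-aligned address in the 1K area."""
    return (operand & 0xFF) << 2


def fast_page_operand(address):
    """Inverse mapping; only fullword-aligned addresses below 0x400 encode."""
    if not 0 <= address < 0x400 or address & 0x3:
        raise ValueError("address must be fullword-aligned and below 0x400")
    return address >> 2
```

The cost of this encoding is that individual bytes inside a fullword are no longer directly addressable through the short operand form.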

Four stacks is good, of course. It might be worth considering making those index registers, too, though.

-----

Contemplating the above further: here's an idea that would let you have register-speed (i.e., no direct-page RAM access for critical sections of code) "direct page" access, including improved indexing, while still preserving the 6502 zero-page feel.

Make the direct page a 1K block addressed on fullword boundaries that is stored in RAM when necessary, but otherwise cached in on-chip memory with a DP register value (call it a DP tag) and fullword value for each location. The first time you read a direct page location with a given DP register value, it loads the value from RAM into the cache and sets the DP tag for that location to the contents of the DP register. Reads and Writes stay in the cache so long as the DP tag is equal to the DP register. If it's different, the CPU must write the current value (if changed since load) to the DP location given by the tag and then reload from the new DP location, updating the value and tag. Does that make sense?

(This could even preserve cycle counting: any routine that relies on cycle counting could ensure strict access timing by merely reading the DP locations it's going to use before entering the part where cycles must be counted.)
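One way to read the tagged direct-page scheme is the sketch below. It models RAM as a dictionary of fullwords keyed by byte address and treats the DP register as the base address of the current direct page; both are simplifying assumptions, and every name is hypothetical:

```python
class DirectPageCache:
    """Hypothetical model of a 256-fullword direct page cached on-chip."""

    SLOTS = 256  # 1K block addressed on fullword boundaries

    def __init__(self, ram):
        self.ram = ram                      # dict: byte address -> 32-bit word
        self.dp = 0                         # DP register (base address of the page)
        self.tag = [None] * self.SLOTS      # DP value each slot was loaded under
        self.value = [0] * self.SLOTS
        self.dirty = [False] * self.SLOTS

    def _ensure_current(self, slot):
        if self.tag[slot] == self.dp:
            return  # hit: no RAM access needed
        if self.tag[slot] is not None and self.dirty[slot]:
            # Write the old value back to the page named by the old tag.
            self.ram[self.tag[slot] + 4 * slot] = self.value[slot]
        # Reload from the page named by the current DP register.
        self.value[slot] = self.ram.get(self.dp + 4 * slot, 0)
        self.tag[slot] = self.dp
        self.dirty[slot] = False

    def read(self, slot):
        self._ensure_current(slot)
        return self.value[slot]

    def write(self, slot, word):
        self._ensure_current(slot)
        self.value[slot] = word & 0xFFFFFFFF
        self.dirty[slot] = True


ram = {0x1000: 111, 0x2000: 222}
cache = DirectPageCache(ram)
cache.dp = 0x1000
first = cache.read(0)        # miss: loads 111 from RAM
cache.write(0, 999)          # stays in the cache, RAM still holds 111
cache.dp = 0x2000
second = cache.read(0)       # tag mismatch: writes 999 back, loads 222
```

Pre-reading the slots a routine will use, as suggested for cycle counting, corresponds to calling `read` on each slot once so that later accesses are all hits.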

Then add more (and more orthogonal) addressing modes using these direct page locations: (zp),X and (zp,Y) and so on, and ideally constant-offset ((zp,3) and maybe even (zp,3,Y)) modes as well, for easy and quick access into non-homogeneous data structures. Why not add auto-increment and auto-decrement as well, and then you can use all of the locations as stack pointers, too.

And this I think, though more expensive, can work a lot better and more easily than separate register sets for things like interrupts because it lets the programmer have a lot more control over spill handling, especially when it comes to how many "registers" need to be spilled to memory.
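The extra addressing modes mentioned above could compute their effective addresses roughly as follows. The notation in the post leaves some room for interpretation, so treat this as one possible reading: `(zp),X` adds the index after fetching the pointer, `(zp,Y)` adds it before, and `(zp,3)` is taken to mean pointer plus a constant field offset. The direct page is modelled as a list of 256 fullwords, and all function names are made up:

```python
MASK32 = 0xFFFFFFFF


def ea_indirect_indexed(direct_page, zp, index):
    """(zp),X : fetch the pointer at zp, then add the index register."""
    return (direct_page[zp] + index) & MASK32


def ea_indexed_indirect(direct_page, zp, index):
    """(zp,Y) : add the index to the zp location, then fetch the pointer."""
    return direct_page[(zp + index) & 0xFF]


def ea_constant_offset(direct_page, zp, offset, index=0):
    """(zp,3) or (zp,3,Y), read here as pointer + constant (+ index)."""
    return (direct_page[zp] + offset + index) & MASK32


def ea_auto_increment(direct_page, zp, size=4):
    """Use the pointer at zp, then advance it by `size` bytes."""
    address = direct_page[zp]
    direct_page[zp] = (address + size) & MASK32
    return address


dp = [0] * 256
dp[2] = 0x1000   # e.g. a pointer to a record
dp[5] = 0x2000   # another pointer, three slots further on
```

Under this reading, `ea_constant_offset(dp, 2, 3)` addresses the field at offset 3 of the record that slot 2 points to, which is the non-homogeneous data structure case.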

_________________
Curt J. Sampson - github.com/0cjs


Posted: Thu Apr 02, 2020 1:06 pm

Joined: Fri Apr 06, 2018 4:20 pm
Posts: 94
BigEd wrote:
Certainly something a bit different! Some nice ideas there.

I'm a bit nervous of the stacks design: if 4 stack pointers is good, that's great, but it should be good because you feel it's good, not because you can put 1k of memory on a chip: it may well be good to put 1k of memory on a chip, but that's an implementation detail which you're feeding into the programmer's model and the instruction set architecture. So, that's two ideas which are part of different levels of the design, and it might be better to separate them. (Certainly two stacks is better than one, but it's not obvious to me that four stacks is better than two.)

On terminology: if you're building a 32 bit machine, please don't refer to 16 bits as a word! I find it hugely confusing. I know it's been done before, so perhaps it's just me.

It might be useful to headline some design decisions, for example about timing. Is this to be an architecture which allows for cycle counting, or does it allow for caches? Personally, given the choice, I'd go for caches, at least as a possibility, so I'd say upfront that timing will not be simply determined and cycle-counting isn't going to be an option. Some people would then find that a complete turn-off, but at least it wouldn't be a long-running argument and indecision. Or, of course, go the other way, in which case cycle-counting is in but caches are out.

Ruling out C is a good example of that: be very clear that this machine isn't meant to be a good target for a C compiler, if that's the case. Oh, except I see you close by saying it is meant to be, so that's me confused!


All great points and questions.

The four stack decision was driven by my love of threaded languages with a data and return stack. Having four stacks has a couple of advantages in this case. You can have two VMs running on their own stack pair, or have an OS running on one stack pair and the USER application running on the second stack pair. It also allows you to experiment with threaded languages that have more than two stacks (some FORTHs have object stacks, etc.)

The decision to include some details about physical implementation, that it is expected for 2K of memory to be included on chip, was one that I did not consider to be controversial.

Some of the oldest processors, like the Z8 (2048 on board registers, making 2K of RAM) and some newer processors like the Parallax Propeller (requires 2K per cog in implementation) all explicitly define memory resources as part of the architecture.

This design is not meant to be cached. If it were meant to be cached, then the advantage of having multiple stacks and fast pages is mostly obviated by the cache. You could have one 32-bit stack pointer and the cache would be responsible for keeping the near elements readily available.

This design was meant to be cycle-counted.


Posted: Thu Apr 02, 2020 2:07 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1432
Location: Scotland
rpiguy2 wrote:
All great points and questions.

The four stack decision was driven by my love of threaded languages with a data and return stack. Having four stacks has a couple of advantages in this case. You can have two VMs running on their own stack pair, or have an OS running on one stack pair and the USER application running on the second stack pair. It also allows you to experiment with threaded languages that have more than two stacks (some FORTHs have object stacks, etc.)

The decision to include some details about physical implementation, that it is expected for 2K of memory to be included on chip, was one that I did not consider to be controversial.

Some of the oldest processors, like the Z8 (2048 on board registers, making 2K of RAM) and some newer processors like the Parallax Propeller (requires 2K per cog in implementation) all explicitly define memory resources as part of the architecture.

This design is not meant to be cached. If it were meant to be cached, then the advantage of having multiple stacks and fast pages is mostly obviated by the cache. You could have one 32-bit stack pointer and the cache would be responsible for keeping the near elements readily available.

This design was meant to be cycle-counted.


I'm sort of scratching my head here... And, yes, it would be too easy to say just use an existing 32-bit processor, but innovation and personal ideas and all that! Says the chap implementing a bytecode VM for a 32-bit machine on a 16/8 bit CPU... Which would be the end of the line for me as far as the 65xxx CPUs are concerned. I really will go to a (retro) 32-bit CPU if I look at doing this again.

On the threaded stuff... I've done a lot of work in my past with a 16/32-bit CPU that handled threading at the microcode level and had 2-4KB of fast static RAM on board. It's called a Transputer. You have effectively one stack per thread, and the thread descriptor forms what's essentially a linked list to the next thread as well as that thread's stack pointer. Multiple stacks sound interesting, but I wonder if it might actually prove too much to handle effectively or efficiently... However, if you have experience of it, then give it a go...

As for instructions - yes to byte addressing. Currently on the '816, reading or writing a byte in 16-bit mode is a bit awkward due to the need to drop down to 8-bit mode, do the operation, then switch back to 16-bit mode again. It's not a big deal, but it all adds extra cycles where a simple LDAB/STAB type instruction would be handy.

Cheers,

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Posted: Thu Apr 02, 2020 2:44 pm

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 693
Location: North Tejas
BigEd wrote:
We don't normally do that here BillG: If you don't see merit in a project, best to ignore it, and let those who are interested enjoy themselves.


Well, excuuuuuuuse me!

I did not see it as a project but was just answering his question:

rpiguy2 wrote:
32-bit successor to 6502
What do folks think, would this be an interesting design to pursue?


Posted: Thu Apr 02, 2020 4:37 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8198
Location: Midwestern USA
BillG wrote:
BigEd wrote:
We don't normally do that here BillG: If you don't see merit in a project, best to ignore it, and let those who are interested enjoy themselves.

Well, excuuuuuuuse me!

I did not see it as a project but was just answering his question:

I suppose you could have been a little more diplomatic about it. :wink:

That said, I actually agree with you. I interpreted the original post as being nothing more than a question. There were few details, so calling it a project seems to me to be a bit of a stretch.

It should be noted Bill Mensch had at one time a 32-bit successor to the 65C816 on the drawing board. It never materialized, which should inform anyone who has had visions of a 65832-type MPU. Mensch likely concluded that there would be no market for it, especially given the much greater capabilities of the ARM, which is also inexpensive.

If I were interested in a 32-bit MPU for hobby purposes or as part of a machine controller, I too would be looking at the ARM. Much as I like to monkey with old technology (and at nearly 75 years old, I'm definitely "old technology"), I know where to draw the line.

rpiguy2 wrote:
32-bit successor to 6502
What do folks think, would this be an interesting design to pursue?

Interesting? If it's interesting to you, by all means pursue it and please keep us informed with your progress.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Thu Apr 02, 2020 6:43 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8453
Location: Southern California
Most (all?) of this has already been discussed in the topics Ed indexed at viewtopic.php?f=1&t=4216 . I am always attracted to discussing them again, but re-reading what's already there will probably result in faster progress. (With that said, I do have a lot of comments I may post later. :lol: ) See also the many links to wider 65xx processors in this section of my links page.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Thu Apr 02, 2020 7:11 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10822
Location: England
It is interesting that there are many possible ways to go. But I think it's good to see what's been tried and what's been thought of before.


Posted: Thu Apr 02, 2020 10:13 pm

Joined: Fri Apr 06, 2018 4:20 pm
Posts: 94
BillG wrote:
BigEd wrote:
We don't normally do that here BillG: If you don't see merit in a project, best to ignore it, and let those who are interested enjoy themselves.


Well, excuuuuuuuse me!

I did not see it as a project but was just answering his question:

rpiguy2 wrote:
32-bit successor to 6502
What do folks think, would this be an interesting design to pursue?


No offense was taken. BigDumbDinosaur is right: this is barely a project, more of a fever dream that struck while New Jersey is under lockdown for COVID-19.

As for why take the 6502 all the way to 32 bits rather than use ARM or AVR32 or some other established architecture, I think the only answer, which is admittedly very subjective, is that a lot of people simply don't enjoy programming in assembly on these platforms as much as they do on simpler processors. Granted, if you are going for a 32-bit microcontroller, chances are you won't be working in assembly anyway, so the target audience would be very, very narrow indeed.

What I have seen is a resurgence of interest in gaming platforms built on the 65XX thanks to the 8-Bit Guy's Commander X16, the Foenix 256, and the Neon816. The folks writing games bemoan not being able to create large executables and the hoops you have to go through to use even the 65C816's memory above 64K.

So that would also be a small, but potentially interested target user.

***

I am fine with moving this into the other threads discussing advanced 65XX designs.

In fact I always thought there should be a subtopic dedicated to hypothetical 65XX designs, simply because finding old threads on the subject is not trivial if you don't already have them bookmarked (thank you Garth!)


Posted: Fri Apr 03, 2020 5:37 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
rpiguy2 wrote:
In fact I always thought there should be a subtopic dedicated to hypothetical 65XX designs, simply because finding old threads on the subject is not trivial if you don't already have them bookmarked (thank you Garth!)

Yes, I agree this would be a good idea.

_________________
Curt J. Sampson - github.com/0cjs

