"Improved" 6502
"Improved" 6502
Just out of curiosity, has anyone ever tried to build (in HDL or whatever) a 6502 that is compatible with the original but more efficient, using modern processor techniques?
1) Remove the dummy fetches in instructions that don't need them (such as the ROL and ROR memory instructions, etc.)
2) Have a separate data and address cache, so that it is possible to do the last reading/writing cycle of an instruction while fetching the next one (with some kind of pipeline)
I think it would be cool.
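To put rough numbers on point 1, here is a small Python sketch. The cycle counts are the documented ones for ROL/ROR on memory; the "improved" figure assumes that exactly one internal (modify) cycle per instruction could be dropped, which is only a guess at what an optimized core might manage. The function names are made up for illustration.

```python
# Documented 6502 cycle counts for ROL/ROR on a memory operand.
RMW_CYCLES = {
    "zp": 5,
    "zp,X": 6,
    "abs": 6,
    "abs,X": 7,
}


def cycles_without_dummy(mode: str) -> int:
    """Hypothetical count if the one internal modify cycle were removed."""
    return RMW_CYCLES[mode] - 1


def saving_percent(mode: str) -> float:
    """Percentage of cycles saved for this addressing mode (assumption-based)."""
    before = RMW_CYCLES[mode]
    after = cycles_without_dummy(mode)
    return 100.0 * (before - after) / before


for mode, before in RMW_CYCLES.items():
    print(f"ROL {mode:6s} {before} -> {cycles_without_dummy(mode)} cycles "
          f"({saving_percent(mode):.1f}% saved)")
```

The saving applies only to these instructions, so the gain for a whole program depends on how often it uses them.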
Re: "Improved" 6502
I'm not aware of an aggressive core. The caching would be the big win, not merely because of having better bandwidth over multiple busses to memory, but crucially because the FPGA cores are all able to run a lot faster than the typical off-chip RAM you'd use. So a faster core is going to be spinning its wheels until you can rig up a faster route to RAM.
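A back-of-envelope sketch of why the cache matters, in Python. It assumes the core makes one memory access per cycle (as a 6502 does), that cache hits complete at core speed, and that misses run at the external RAM speed. The clock rates and hit rates below are illustrative assumptions, not measurements of any real core.

```python
def effective_mhz(core_mhz: float, ram_mhz: float, hit_rate: float) -> float:
    """Average cycle rate when hits run at core speed and misses at RAM speed."""
    core_ns = 1000.0 / core_mhz          # cycle time on a cache hit
    ram_ns = 1000.0 / ram_mhz            # cycle time on a miss (off-chip RAM)
    avg_ns = hit_rate * core_ns + (1.0 - hit_rate) * ram_ns
    return 1000.0 / avg_ns


# Assumed figures: a 100 MHz core in front of 10 MHz external RAM.
for hit_rate in (0.0, 0.5, 0.9, 0.99, 1.0):
    print(f"hit rate {hit_rate:4.2f}: {effective_mhz(100, 10, hit_rate):6.1f} MHz")
```

With no cache hits the fast core gains nothing over the RAM speed, which is the "spinning its wheels" case.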
There's a list of cores at http://6502.org/homebuilt#HDL and a discussion at viewtopic.php?t=1673
Cheers
Ed
Re: "Improved" 6502
A nice idea would be to keep the entire zero page in local memory, with dual byte access, so you can do both ZP and ZP+1 fetches for (ZP),Y modes at the same time. If you keep them in LUT RAM, it can even be done in the same clock cycle.
One thing to keep in mind, though, is that most of these improvements will add extra logic, and therefore impact max clock speed.
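As a rough behavioural model of the zero-page idea (Python, not HDL): the class below stands in for a 256-byte local RAM with two read ports, so both pointer bytes for a (ZP),Y access come back together. The class and function names are hypothetical, and the "4 cycles" figure is simply the standard 5 cycles for LDA (ZP),Y minus the one pointer fetch that gets merged, assuming nothing else changes.

```python
class DualPortZeroPage:
    """Models zero page held in local RAM with two simultaneous read ports."""

    def __init__(self) -> None:
        self.ram = bytearray(256)

    def read_pair(self, zp: int) -> tuple[int, int]:
        """Return the bytes at zp and zp+1 (wrapping within page 0) together."""
        lo = self.ram[zp & 0xFF]
        hi = self.ram[(zp + 1) & 0xFF]
        return lo, hi


def indirect_indexed_address(zpram: DualPortZeroPage, zp: int, y: int) -> int:
    """Effective address for the (ZP),Y addressing mode."""
    lo, hi = zpram.read_pair(zp)
    return ((lo | (hi << 8)) + y) & 0xFFFF


STANDARD_LDA_IND_Y_CYCLES = 5                               # no page crossing
DUAL_PORT_LDA_IND_Y_CYCLES = STANDARD_LDA_IND_Y_CYCLES - 1  # assumed saving

zpram = DualPortZeroPage()
zpram.ram[0x10] = 0x34
zpram.ram[0x11] = 0x12
print(hex(indirect_indexed_address(zpram, 0x10, 5)))
```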
Re: "Improved" 6502
Same for page 1, I think. Not quite such a boost, but I think it would help.
You're right, of course, about the potential for slowdown, but whether it actually happens depends on what's on the critical path (and whether the present critical path could be helped in any way).
Re: "Improved" 6502
Ideas are nice, but the big problem is of course finding someone with enough time and motivation to actually sit down and do the hard work.
Personally, I have little motivation to work on anything like this. Running a plain 6502 at 100 MHz is good enough for nostalgic purposes. And for cases where compatibility with '80s designs is not an issue, much better results can be had by throwing away the whole design and starting from scratch with a 16- or 32-bit RISC.
Re: "Improved" 6502
It's all true!
- GARTHWILSON
- Forum Moderator
- Posts: 8774
- Joined: 30 Aug 2002
- Location: Southern California
Re: "Improved" 6502
Welcome!
The Commodore 65CE02 from 20+ years ago eliminated almost all the dead bus cycles, even having over 30 op codes that took only one clock instead of the normal minimum of two; so without re-writing code to take advantage of its new instructions, it still gave a speed-up of about 25%. It was only 10MHz though, whereas the current production ones are conservatively spec'ed for at least 14MHz and usually top out at 25MHz if the supporting parts can keep up.
The next step of course is to re-write the code to take advantage of the new instructions, or go with the 65816. I have a post on the huge difference you can get if you're constantly using 16-bit numbers (as in a higher-level language) at viewtopic.php?f=9&t=1505&p=9705#p9705. There's an example shown there where the '816 does in two instructions what the 6502 takes ten to do.
There was the 65GZ032 project with its own Yahoo forum which was for a modern 32-bit processor that could still run old 6502 code. It had a ton of registers, deep pipelining, branch prediction, onboard cache, etc., and ended up with something that has little resemblance to the 6502, but, after a lot of progress and even some working hardware, still fizzled out before it was done. I kind of lost interest when they went in directions that abandoned the 6502 flavor (outside of the 6502 emulation mode).
We were discussing an all-32-bit 65-family processor (the 65Org32), but as Arlet pointed out the problem is a shortage of time and motivation to do the hard work. ElEctric_EyE here is working toward the 65Org32 in steps, first to do a 16-bit NMOS 6502 equivalent. He's working on a video-chip project at the moment though.
A standard part of the program-structure words in Forth is DO...LOOP, with 16-bit (if you implement it on 6502) loop counter, index, and limit which are normally kept on the hardware stack in page 1. I did an equivalent 32-bit set of words (DO, ?DO, LOOP, +LOOP, I, BOUNDS, LEAVE, ?LEAVE, UNLOOP) for 6502, and the number of instructions it took was incredible. DO, which sets up the loop, took about 30 instructions (not cycles, but instructions), and LOOP which does the incrementing of the loop counter and compares it to the limit to see if it's time to exit the loop, took about 44 instructions (again, not cycles). With a 65Org32, it would be trivial, like doing a loop on 6502 with an 8-bit counter-- not even a half-dozen instructions total (plus whatever you actually do in the loop).
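To illustrate why the 32-bit loop words balloon on an 8-bit machine, here is a Python sketch that handles the loop index the way an 8-bit CPU has to: one byte at a time, with carry propagation. This is not the Forth code described above, just a model with made-up function names; each "byte op" stands for roughly one load/modify/store group on a 6502.

```python
def inc32_bytewise(counter: bytearray) -> int:
    """Increment a 4-byte little-endian counter in place; return byte ops used."""
    ops = 0
    for i in range(4):
        counter[i] = (counter[i] + 1) & 0xFF
        ops += 1
        if counter[i] != 0:      # no carry out of this byte, so we are done
            break
    return ops


def equal32_bytewise(a: bytes, b: bytes) -> tuple[bool, int]:
    """Compare two 4-byte values a byte at a time; return (equal, byte ops used)."""
    ops = 0
    for x, y in zip(a, b):
        ops += 1
        if x != y:
            return False, ops
    return True, ops


index = bytearray([0xFF, 0xFF, 0x00, 0x00])    # 0x0000FFFF, little-endian
limit = bytes([0x00, 0x00, 0x01, 0x00])        # 0x00010000
print(inc32_bytewise(index), equal32_bytewise(index, limit))
```

On a processor with 32-bit registers, the same increment and compare would each be a single operation.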
Additionally of course there would be multiply and divide instructions that would replace the long routines the 6502 requires, shown at viewtopic.php?f=9&t=689 and http://6502.org/source/integers/ummodfix/ummodfix.htm.
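For readers who haven't seen one, the software routines are typically some form of shift-and-add. The Python sketch below shows the general technique for an unsigned 16x16 to 32-bit multiply; it is not a transcription of the routines at the links above, and the function name is made up. On a 6502, every step of the loop turns into several 8-bit shifts and adds with carry.

```python
def umul16(a: int, b: int) -> int:
    """Unsigned 16x16 -> 32-bit multiply by shift-and-add, one bit per pass."""
    assert 0 <= a <= 0xFFFF and 0 <= b <= 0xFFFF
    product = 0
    multiplicand = a
    for _ in range(16):                  # one pass per multiplier bit
        if b & 1:                        # low bit set: add shifted multiplicand
            product = (product + multiplicand) & 0xFFFFFFFF
        multiplicand <<= 1               # next bit weighs twice as much
        b >>= 1
    return product


print(hex(umul16(0xFFFF, 0xFFFF)))
```

A hardware multiply instruction would replace the whole 16-pass loop.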
There are other things that can be done to get better performance with even old technology though, like the 16-bit look-up tables for accurately getting math functions, hundreds of times as fast as actually having to calculate them. These tables take a lot of memory, but the cost and size of memory have come way, way down to where it's somewhat practical now.
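A minimal sketch of the look-up-table approach in Python, assuming a sine table indexed by a 16-bit angle (65536 steps per full circle) with results scaled to signed 16-bit. The scaling, the angle units and the names are illustrative assumptions, not a description of any particular set of tables. At two bytes per entry such a table takes 128 KB.

```python
import math

TABLE_SIZE = 1 << 16     # one entry per possible 16-bit angle

# Entry i holds sin(2*pi*i/65536) scaled to the range -32767..32767.
SINE_TABLE = [round(math.sin(2.0 * math.pi * i / TABLE_SIZE) * 32767)
              for i in range(TABLE_SIZE)]


def sin16(angle: int) -> int:
    """Sine by table look-up: a single indexed read, no calculation."""
    return SINE_TABLE[angle & 0xFFFF]


print(sin16(0x0000), sin16(0x4000), sin16(0x8000), sin16(0xC000))
```

The table is built once ahead of time; at run time each result costs one memory read.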
Although Arlet is not wrong about throwing the whole thing out and going with a newer processor, my point is that huge, dramatic improvements in performance could still be gained with a true 65-family processor, and some of those can be had even with existing, off-the-shelf current production 65c02's and 65816's.
Quote:
1) Remove the dummy fetches in instructions that don't need it (such as ROL, ROR memory instructions, etc..)
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: "Improved" 6502
Oh yes, welcome!
Aside from the cache and the clock speed, the other big win, as Garth points out, could come from a multiply instruction. As with most 6502-related activities, it would be a labour of love: someone might yet do it, even if it isn't the shortest path to some practical goal.
Cheers
Ed
Re: "Improved" 6502
I think it would be good to make a distinction between staying 100% compatible with the original 65(C)02 while reducing the cycle counts, and adding new instructions. I think the first is more interesting, because it would directly speed up all existing software, without having to rewrite any of it. It also has a much more limited (and down-to-earth) scope. Coming up with additional instructions is trivial, especially if you don't have to do the work. Coming up with interesting (yet practical) ways to speed up existing code is more challenging.
My comment should be seen in the context of FPGA implementation. I agree that the 65816 offers a nice improvement, but it's not particularly FPGA-friendly in its design. To properly implement a 65816 on an FPGA would take more time than to design something better from scratch. Also, using the 65816 only really pays off if you're willing to rewrite the code.
Quote:
Although Arlet is not wrong about throwing the whole thing out and going with a newer processor, my point is that huge, dramatic improvements in performance could still be gained with a true 65-family processor, and some of those can be had even with existing, off-the-shelf current production 65c02's and 65816's.
- GARTHWILSON
- Forum Moderator
- Posts: 8774
- Joined: 30 Aug 2002
- Location: Southern California
Re: "Improved" 6502
Quote:
Also, using the 65816 only really pays off if you're willing to rewrite the code.
There are plenty of new applications being written though, and the benefit of sticking with the same processor family, even if you have new instructions or changed op codes, is the familiarity that makes the programmer more productive. I've been designing PIC16 microcontrollers into commercial products for 15 years, and yet last week I discovered another caveat that is not spelled out in the data books. It's the same reason I don't take it lightly when someone in the company wants to change op amps, switching-regulator ICs, etc. in one of our circuits. The ones I've worked with for years may not be the best, but through experience we have discovered secrets that are nebulous or non-existent in the data sheets to getting the performance we need without going through another long learning curve.
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: "Improved" 6502
GARTHWILSON wrote:
My (possibly incomplete) perception of the reason people want 100% compatibility is primarily so they can run vintage games on vintage computers (and they often even want the illegal op codes); but those games often resort to software timing loops anyway, meaning a big speed-up might not give a desirable net effect. Instead of giving smoother lines and movement, it would be just as blocky and jumpy, just faster.
There are plenty of new applications being written though, and the benefit of sticking with the same processor family, even if you have new instructions or changed op codes, is the familiarity that makes the programmer more productive...
Re: "Improved" 6502
(I still stand by multiply as a small and easy addition which gives a big performance win where applicable, but I completely agree that spooling out ideas for improvements is enormously easier than implementing them - especially as they also need to be implemented in a toolchain, and of course should also be documented!)
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: "Improved" 6502
I agree with that too Ed. I am almost at the point where I am going to make the .d core which is the .b core plus multiply opcodes. I started to explain it here.
I'm currently writing a plot routine to plot characters with the .b core in my parallel video board(s) project. It makes sense that after this is done, I'll have something to compare speeds and a reasonable base with which to experiment and prove that the multiplication opcodes work as expected.
Re: "Improved" 6502
This overview mentions a "65CX8 with eleven additional instructions such as MPY (multiply)", but the link returns an error.
- BitWise
- In Memoriam
- Posts: 996
- Joined: 02 Mar 2004
- Location: Berkshire, UK
Re: "Improved" 6502
Arlet wrote:
this overview mentions a "65CX8 with eleven additional instructions such as MPY (multiply)", but the link returns an error.
I can't from work - it's blocked from here
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs