All times are UTC




Posted: Wed Apr 12, 2017 5:54 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Quote:
There seems to be a lot of unfounded fear of the '816, a fear of the mode bits (which can be left alone in much of your code), fear of the bank byte (which you don't have to latch, decode, or use if a 64K memory map is enough), fear of added capabilities (as if you were forced to use them right from the start), etc.

For me, there's no fear, but I grew up with the 6502, not the 65816. After the 6502, I got a 68000 and right now I'm mostly working with ARM. The '816 would be a big step back.
Quote:
What application did you have in mind that needs so much (4K)?

Most of my modern embedded applications have a few kB stack per task, mostly for local buffers, for formatting/parsing data, file/network buffers, and just general local variables. The advantage of putting most of the variables on the stack is that they are automatically freed when you no longer need them, so you can reuse the memory for something else later. That's more efficient than using static data.


Posted: Wed Apr 12, 2017 6:36 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Quote:
Bill Mensch, the designer of the 65c02, said in an interview two years ago that he estimated that if it were made in the most modern deep-submicron geometry, it would do 10GHz.

Even if we believe that number, it doesn't really make sense to do that. You'd get a lot more performance out of a bigger CPU running at lower speed.


Posted: Wed Apr 12, 2017 7:34 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8520
Location: Southern California
Arlet wrote:
Quote:
What application did you have in mind that needs so much (4K)?

Most of my modern embedded applications have a few kB stack per task, mostly for local buffers, for formatting/parsing data, file/network buffers, and just general local variables. The advantage of putting most of the variables on the stack is that they are automatically freed when you no longer need them, so you can reuse the memory for something else later. That's more efficient than using static data.

That's in chapter 14 of my 6502 stacks treatise. Like you said, local variables' space is freed when you're done with them; so only some are taking room at any given time. Although I agree that the 6502's 256-byte stack space would be inadequate for the local variables of bigger applications, automatically assigning 4K or 8K sounds kind of like requiring every airliner to have the fuel capacity to fly 10,800 miles like the Boeing 777-200LR can. There are lots of commuter airlines that seldom fly more than a few hundred miles at a time. I suspect an '816 would seldom need that kind of stack space for a task. We won't be competing with an AMD Opteron processor.

When you know you're accessing the stacks constantly but don't know what the maximum depth is you're using, the tendency is to go overboard and keep upping your estimate, "just to be sure." Why not test and see how much you're using, in a situation that will take the max? I'm sure you'd be surprised at how little it is in most cases. Take the test result and add 50% or whatever margin makes you comfortable.
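One common way to run that kind of test is "stack painting": fill the stack area with a known byte before the program starts, run the worst-case workload, then count how many bytes were overwritten. Below is a minimal sketch of the idea in Python, simulating a single 256-byte stack page. The fill value, the function names, and the simulated workload are assumptions made for illustration; none of them come from this thread.

```python
import math

FILL = 0xA5        # hypothetical sentinel byte used to "paint" the stack
STACK_SIZE = 256   # one 6502 stack page ($0100-$01FF)

def paint(size=STACK_SIZE):
    """Return a fresh stack area filled with the sentinel byte."""
    return bytearray([FILL] * size)

def push(stack, sp, value):
    """Store a byte at sp, then move sp down, as a 6502 push does."""
    stack[sp] = value
    return (sp - 1) % len(stack)

def high_water_mark(stack):
    """Bytes used: scan up from the lowest address to the first overwritten byte."""
    for i, b in enumerate(stack):
        if b != FILL:
            return len(stack) - i
    return 0

def with_margin(used, margin=0.5):
    """Add a safety margin (50% by default) and round up to whole bytes."""
    return math.ceil(used * (1 + margin))

# Simulated worst-case workload: 100 bytes pushed onto an empty stack.
stack = paint()
sp = 0xFF
for i in range(100):
    sp = push(stack, sp, i & 0x7F)   # values 0..99, never equal to FILL

used = high_water_mark(stack)
print(used, with_margin(used))
```

One caveat of this approach: if the deepest byte written happens to equal the fill value, the scan undercounts by that byte, so an unusual fill pattern is the safer choice.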

Quote:
Even if we believe that number, it doesn't really make sense to do that. You'd get a lot more performance out of a bigger CPU running at lower speed.

It depends on what you need to do. If you need 3GIPS in an application that seldom deals with quantities of more than 8 bits, it might make good economic sense, achieving the needed computing power with less silicon real estate. People who are always in the 32- and 64-bit world forget that not everything will benefit from those. The microcontrollers I've used in products for work, for the most part, would not benefit from instructions that could handle more than 8 bits at a time, because it's all testing and twiddling bits in 8-bit I/O ports. Variables are mostly 8-bit too. Nor have I ever even had any use for a multiply instruction in any of these products. (I did need a division routine in the last one, but that was for human I/O in a 16x2 LCD, and there's no need for speed there.)

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Wed Apr 12, 2017 7:48 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
GARTHWILSON wrote:
Quote:
Even if we believe that number, it doesn't really make sense to do that. You'd get a lot more performance out of a bigger CPU running at lower speed.

It depends on what you need to do. If you need 3GIPS in an application that seldom deals with quantities of more than 8 bits, it might make good economic sense, achieving the needed computing power with less silicon real estate. People who are always in the 32- and 64-bit world forget that not everything will benefit from those. The microcontrollers I've used in products for work, for the most part, would not benefit from instructions that could handle more than 8 bits at a time, because it's all testing and twiddling bits in 8-bit I/O ports. Variables are mostly 8-bit too. Nor have I ever even had any use for a multiply instruction in any of these products. (I did need a division routine in the last one, but that was for human I/O in a 16x2 LCD, and there's no need for speed there.)

If you don't need more than 8 bits, you probably also don't need 10 GHz :)

The savings on silicon real estate aren't very useful, because the only way you can make a 10 GHz CPU is by putting the memory and peripherals all on the same chip, which, combined with the I/O pads and PLL, will take much more room than the actual core. At that point, it costs very little extra room to use a small ARM, which you can then let run at lower frequency for the same performance.


Posted: Wed Apr 12, 2017 1:30 pm
Joined: Tue Jul 24, 2012 2:27 am
Posts: 674
Arlet wrote:
For me, there's no fear, but I grew up with the 6502, not the 65816.

It's a crying shame that the 65816 didn't get used in more devices. Both the SNES and the Apple IIgs were sorely underclocked, in terms of showcasing the processor. The SNES wasn't user-programmable, and the IIgs just didn't have the reach to cement it as a familiar CPU you grew up with. Yeah, 68k was around, but the '816 would have been perfectly at home in more lower-cost computers and speed-competitive with the early 68k, making for a great value. Of course, this would have had to come from a company that didn't have a 68k platform that would be undermined...

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Posted: Wed Apr 12, 2017 1:38 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
White Flame wrote:
It's a crying shame that the 65816 didn't get used in more devices.

I didn't even know there was such a thing as the 65816 until a few years ago, when I joined this forum. I think it arrived on the market too late, a few years after the 68000.


Posted: Wed Apr 12, 2017 3:18 pm
Joined: Tue Apr 11, 2017 1:50 pm
Posts: 7
Location: UK
Some interesting comments and food for thought.
Thanks

If anyone has more it's welcome.


Posted: Wed Apr 12, 2017 5:01 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8403
Location: Midwestern USA
jds wrote:
I still get a bit hung up on the limitation of bank 0, in that you need to have both the stack and DP fit within 64k.

It's not ideal but that's the way it works. However, programmable logic could be used to sandbox direct pages and stacks so they appear to be in bank $00 but are not.

Quote:
I guess I'm also struggling to get used to the smaller resource requirements of these kinds of systems, but even with say a 4k stack you'll only have room for 16 processes.

It is highly unlikely you will need a stack larger than 512 bytes, let alone 4KB. I make extensive use of stack frames in 65C816 software I write and the deepest I've had the stack so far is 150-or-so bytes. Even after accounting for the stack frame created during interrupt processing (nested interrupts, at that) there's still plenty of headroom left with a 256 byte stack.
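The "16 processes" figure in the quote is just bank 0 divided by the stack size, so shrinking the per-task stack changes the picture a lot. Here is a rough sketch of that arithmetic; the function name and the example of reserving half of bank 0 for other uses are assumptions for illustration only.

```python
BANK0_BYTES = 64 * 1024   # stacks must live in bank $00 on the 65C816

def max_stacks(stack_bytes, reserved_bytes=0):
    """Number of stacks of the given size that fit in bank 0.

    reserved_bytes is whatever else must share bank 0
    (direct pages, I/O, vectors); 0 reproduces the quoted figure.
    """
    return (BANK0_BYTES - reserved_bytes) // stack_bytes

for size in (4096, 512, 256):
    print(f"{size:5d}-byte stacks: {max_stacks(size)}")

# Hypothetical layout: half of bank 0 kept for direct pages, I/O and ROM.
print(max_stacks(256, reserved_bytes=32 * 1024))
```

With nothing else reserved, 4KB stacks give 16 tasks, while 256-byte stacks give 256; even with half the bank set aside, 256-byte stacks still leave room for 128.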

Quote:
Given that we can now relatively cheaply fill up the address space of the 65C816 it seems that the main limit is on bank 0. This could be fixed with a MMU if someone were inclined to build one, but that may be overcomplicating the system.

There is extensive discussion around here on bank remapping and other aspects of 65C816 hardware management. I wouldn't see it as overcomplication, just the next step in putting the '816 to work in a hobby system.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Wed Apr 12, 2017 6:30 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
I happened to notice a related item mentioned on Hacker News today. It's a blog post regarding memory bandwidth, and it makes some comparisons between modern processors and older devices... such as the 6502! :)
Quote:
A 6502 was typically clocked at 1MHz and did a 1-byte memory access essentially every clock cycle, which are nice round figures to use as a baseline. [...] A large fraction of that bandwidth went simply into fetching instruction bytes.
Quote:
Absolute memory bandwidths in consumer devices have gone up by several orders of magnitude from the ~1MB/s of early 80s home computers, but available compute resources have grown much faster still, and the only way to stop bumping into bandwidth limits all the time is to make sure your workloads have reasonable locality of reference so that the caches can do their job.
Quote:
Not considered here is memory latency (and that’s a topic for a different post). The good news is absolute DRAM latencies have gone down since the 80s – by a factor of about 4-5 or so. The bad news is that clock rates have increased by about a factor of 3000 since then – oops.

The blog post itself is here. The related Hacker News discussion is here.
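For anyone who wants to see the quoted numbers worked through, here is a small sketch using only the figures in the quotes above (1MHz, one byte per cycle, a clock increase of about 3000, a DRAM latency improvement of about 4-5). The function names are made up for this example.

```python
def bandwidth_bytes_per_second(clock_hz, bytes_per_cycle=1):
    """Peak memory bandwidth if every cycle moves bytes_per_cycle bytes."""
    return clock_hz * bytes_per_cycle

def latency_growth_in_cycles(clock_factor, latency_speedup):
    """How much DRAM latency has grown when counted in CPU clock cycles.

    clock_factor:    how many times faster the clock has become
    latency_speedup: how many times shorter the latency is in nanoseconds
    """
    return clock_factor / latency_speedup

# 1MHz 6502, one byte per cycle: roughly the ~1MB/s quoted.
print(bandwidth_bytes_per_second(1_000_000))

# Clock up ~3000x, latency down only 4-5x.
for speedup in (4, 5):
    print(speedup, latency_growth_in_cycles(3000, speedup))
```

By those figures, a DRAM access costs somewhere around 600 to 750 times more clock cycles than it did in the early 80s, which is the "oops" in the last quote.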

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Wed Apr 12, 2017 6:33 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Not only have processors gone up in speed faster than memory bandwidth, but the increased memory bandwidth comes at a cost of much higher latency (measured in clock cycles, not nanoseconds).

edit: Hmm.. now I see you already posted that.


Posted: Wed Apr 12, 2017 6:55 pm
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8520
Location: Southern California
Quote:
If you don't need more than 8 bits, you probably also don't need 10 GHz :)

There shouldn't always be a link between the width of data that needs handling and the speed at which it needs to be handled.

Quote:
The savings on silicon real estate aren't very useful, because the only way you can make a 10 GHz CPU is by putting the memory and peripherals all on the same chip, which, combined with the I/O pads and PLL, will take much more room than the actual core. At that point, it costs very little extra room to use a small ARM, which you can then let run at lower frequency for the same performance.

I expect the ARM has a PLL onboard too. And for the microcontrollers, memory and I/O too. I know almost nothing about ARM, but I get the idea they're mostly microcontrollers. Correct me if I'm wrong. I started watching an intro lecture on ARM once, and it was clear as mud, perhaps due to the lecturer's ability to communicate. If a 32-bit ARM does 500MIPS though, that's not going to match the performance of a 3GIPS '02 in situations where you only need to handle 8 bits at a time and the 32-bitter's extra power is in areas that are not useful.

I'm committed to limiting my work to the smaller, simpler processors. A 65Org32 is almost an exception, although it's still quite simple by today's standards. I still see a lot of potential future progress with the 8-bitters and the 65 family, by improving the software methods, the supporting hardware, and the level of hardware integration, and moderately increasing the clock speeds. I'm against the bloatware which high-end processors are used to justify. I am constantly challenged by the article, "Low Fat Computing (A politically incorrect essay by Jeff Fox)." He and Chuck Moore (inventor of Forth), taking an entirely different programming philosophy, plus Forth hardware and software, have improved the compactness and speed of code by factors of 100 to 1000. I'm also kind of a scrooge about misapplications of technology. Remember the talking car of the 80's? It initially seemed like the most natural way for the car to communicate with you; but soon everybody hated it, and they were disconnecting it. I'm against the attitude that "more modern" always means "better."

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Wed Apr 12, 2017 7:47 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
GARTHWILSON wrote:
There shouldn't always be a link between the width of data that needs handling and the speed at which it needs to be handled.

Not necessarily, but today's technology can't do 10GHz over I/O pads, so that kind of speed is only useful if you do a bunch of internal calculations. And most real world calculations benefit from using 32 bit and some more registers.

Quote:
I expect the ARM has a PLL onboard too. And for the microcontrollers, memory and I/O too. I know almost nothing about ARM, but I get the idea they're mostly microcontrollers.

The ARM is just the core, just like a bare 6502, although they are typically made with peripherals and memory in the same die. The point is that the core is only a small part of the silicon, so keeping everything else the same, substituting an ARM for the 6502 doesn't increase silicon cost all that much. On a small design, on a modern process, silicon area is mostly determined by number of I/O pads, because these require huge transistors, and ESD protection and space for the bond wires.

Quote:
If a 32-bit ARM does 500MIPS though, that's not going to match the performance of a 3GIPS '02 in situations where you only need to handle 8 bits at a time and the 32-bitter's extra power is in areas that are not useful.

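Depends. In many cases, even solving 8 bit problems, you'll be using 16 bit pointers. A simple, common sequence like this:
Code:
        LDA (ptr),Y     ; 5 cycles (+1 if a page boundary is crossed)
        INY             ; 2 cycles
        BNE skip        ; 3 cycles when taken (the usual case), 2 when not
        INC ptr+1       ; 5 cycles, only when Y wraps to 0
skip: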

takes 10 cycles on the 6502 (and uses 2 out of 3 registers), but would only take 1 cycle on an ARM. Other things, like copying memory, benefit from 32 bit loads and stores, even if you're not doing calculations. The ARM has more registers, so less need to push/pop, or use slower memory. So, it's not so far-fetched that a 500 MIPS ARM could match a 3GIPS 6502 for speed, even doing mostly 8 bit stuff. And 32 bit isn't just for talking cars. A simple application that benefits from 32 bit could be a sensorless brushless DC motor controller for a home appliance.
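To put rough numbers on that comparison, here is a small sketch of the time per loop iteration. The 10GHz clock is the estimate quoted earlier in this thread, the 1-cycle ARM figure is the one stated above, and the 500MHz ARM clock is an assumption (500 MIPS at about one instruction per cycle). The cycle counts are the standard 6502 timings for the common path, where the branch is taken.

```python
# Common path of the sequence above: the branch is taken, INC is skipped.
CYCLES_6502 = {"LDA (ptr),Y": 5, "INY": 2, "BNE (taken)": 3}

def ns_per_iteration(cycles, clock_hz):
    """Time for one pass through the sequence, in nanoseconds."""
    return cycles * 1e9 / clock_hz

cycles_6502 = sum(CYCLES_6502.values())
t_6502 = ns_per_iteration(cycles_6502, 10e9)   # hypothetical 10GHz 6502
t_arm = ns_per_iteration(1, 500e6)             # assumed 500MHz ARM, 1 cycle

print(cycles_6502, t_6502, t_arm)
```

Under these assumptions the 20x clock advantage shrinks to about 2x for this one sequence (roughly 1ns against 2ns). Whether the rest of the gap closes would depend on how much of the workload benefits from wider loads and extra registers.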


Posted: Wed Apr 12, 2017 8:10 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Quote:
I'm committed to limiting my work to the smaller, simpler processors

I learned ARM assembly on ARM7TDMI, which has about 50 instructions and 5 addressing modes, all quite regular. Took me maybe 2 days until I could do 95% of the work without looking at the manual. The complexity is somewhere between 6502 and 65816. In many cases, having 32 bits and 16 registers makes things a lot easier, because most results (be it addresses or calculations) fit in a single register.


Posted: Tue Jun 20, 2017 6:57 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
jazz wrote:
I was wondering what a modern chip built with the 6502 philosophy and "feeling" would look like today.

If you had to design a 6502ish processor today, what would you do?

jazz wrote:
Some interesting comments and food for thought.
Thanks

If anyone has more it's welcome.


Somewhat related: elsewhere, in the "8 bit CPU challenge" thread, Arlet asked what kind of 8-bit CPU we could design to fit in the same resources as a 6502. That thread drew four or five machines as responses.

