Hmm, I don't think the idea is well-described as NUMA - for me, that's where different parts of the memory system are different distances from the core in question (usually because there are several cores and each is close to some part of the memory system.) Certainly it can make sense to have multiple memory busses, or multiple memory systems which can operate in parallel.
It seems we're agreed that the '816 takes about 5 times the source (C or HDL) to describe compared to the '02. That's not merely a lot more typing, but as you know, it's a lot more verification. And, AFAIK, we have no '816 test suite. We do have Bruce's document, but it's not quite executable.
http://www.6502.org/tutorials/65c816opcodes.html
65C816 vs 68000
Re: 65C816 vs 68000
Last edited by BigEd on Tue Dec 20, 2016 9:16 pm, edited 1 time in total.
Re: 65C816 vs 68000
kc5tja wrote:
cbmeeks wrote:
I still have my Amiga 500 and it still boots today. 
I have three in use. I have an A3000 doing MIDI work, and an A2000 doing video work. You can see them in some short YouTube videos I made. I get asked all the time why I don't just use this Mac or that PC.
I use my wife's A2000/060 to manage our finances. I also use it to back up our Android devices because it has USB.
I've got a dead A500, several dead A2000s, and a PPC-based A3000UX all waiting for me to fix them. But I'm not a collector, I'm a user.
Re: 65C816 vs 68000
kc5tja wrote:
Run Deluxe Paint on the IIgs and on the A500. Put the IIgs in 16-color, 640x200 mode, and likewise with the Amiga. Grab a brush that is about 1/4 the screen size on the IIgs and do the same on the Amiga. Now, on the IIgs, drag that brush around the screen, and note how, despite being clocked at 2.8MHz, it's quite capable of keeping up with the Amiga doing the same task at 7MHz.
kc5tja wrote:
This is because the 65816 doesn't have to fight the blitter for access to memory,
That's what I really meant when I said it felt snappier. I don't think you could do that with the IIgs. But I don't think that's the fault of the '816 at all. Just a different OS, different architecture, etc.
kc5tja wrote:
while the Amiga's custom chips, which gleefully allow 60fps HAM animations at 320x400, will cut into the CPU's processing power like a hot knife when driven at higher horizontal resolutions/bandwidths. Also, it doesn't help that the blitter, though clocked at 7MHz, can only touch memory no faster than 3.5 mega-transfers per second, and only during blanking periods at that. Ouch.
Keep in mind, I'm not insulting the IIgs. I love the IIgs.
kc5tja wrote:
The 65816 is not a slow CPU.
kc5tja wrote:
At 2.8MHz, the IIgs was *faster* at many kinds of graphics updates than the Mac Classic, and Jobs wasn't too happy about that.
kc5tja wrote:
(Running a IIgs at 8MHz was magical; it felt every bit as fast as an Amiga to me.)
kc5tja wrote:
GS/OS was, however, written in Pascal and largely based on MacOS System 1 code. I suspect that is where most of its sluggishness comes from.
kc5tja wrote:
I say this, BTW, as a die-hard Amiga fan. I still have my Amiga 500 and it still boots today. 
I consider the Amiga one of my "core" machines because I spent a summer bagging groceries as a 16-year-old kid so that I could buy one. I still have (and use) it to this day.
I'm also a huge fan of the IIgs. Like I said, it got a raw deal. I was mostly impressed with the sound system. 32 voices!!! And I would argue that a "mildly" expanded IIgs would give the Amiga a run for its money when it comes to audio.
Cat; the other white meat.
Re: 65C816 vs 68000
KC9UDX wrote:
But what do you do with it? There are Amiga collectors, and Amigans.
I have three in use. I have an A3000 doing MIDI work, and an A2000 doing video work. You can see them in some short YouTube videos I made. I get asked all the time why I don't just use this Mac or that PC.
I use my wife's A2000/060 to manage our finances. I also use it to back up our Android devices because it has USB.
I've got a dead A500, several dead A2000s, and a PPC-based A3000UX all waiting for me to fix them. But I'm not a collector, I'm a user.
But the line between collectors and users isn't as black-and-white as you paint it here.
I do *use* my Amigas. What do I do with them? Programming games, game demos, or playing games. It's all about Amiga games with me.
I don't do MIDI work. I don't do video work. I prefer my stock (well, 1MB expanded) Amiga 500 over my A1200. That's the one I actually use.
For the non-game stuff that I do in my daily life, none of *my* Amigas would be up to the task.
But that doesn't mean I'm not an Amiga user.
I am both. A user and a collector.
Re: 65C816 vs 68000
BigEd wrote:
Hmm, I don't think the idea is well-described as NUMA - that's where different parts of the memory system are different distances from the core in question
One core can easily have two bus masters: one for instruction fetch, and one for data access (in fact, that's the definition of a Harvard architecture system). Or, you can have two cores, with one general-purpose port each. Or you can have one core with six master ports: one for instruction fetch, three for integer load/store, and two for floating point (e.g., as might be found on a superscalar architecture).
The definition of core is both hazy and misleading. It's better to think in terms of bus masters and bus slaves.
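To make the bus-master view concrete, here is a minimal Python sketch that counts bus cycles for a made-up stream of accesses. It assumes one access per port per cycle, no wait states, and overlap only within an instruction (no pipelining across instructions); the `Step` type and the instruction mix are hypothetical. A core with separate instruction and data masters can overlap a fetch with a load/store, while a single general-purpose port has to serialize them.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One hypothetical instruction: bus accesses it needs on each side."""
    fetches: int  # instruction-fetch accesses
    data: int     # load/store accesses


def cycles_shared_port(steps):
    # One general-purpose master: every access takes its own bus cycle.
    return sum(s.fetches + s.data for s in steps)


def cycles_split_ports(steps):
    # Separate instruction and data masters (Harvard style): within one
    # instruction, fetch and data accesses overlap, so the instruction costs
    # as many cycles as its busier port needs.
    return sum(max(s.fetches, s.data) for s in steps)


# Hypothetical mix: 3-byte load, 1-byte register op, 3-byte store.
program = [Step(3, 1), Step(1, 0), Step(3, 1)]
print(cycles_shared_port(program))  # 9
print(cycles_split_ports(program))  # 7
```

Whether the two masters reach separate memories or a common pool through an arbiter is then a question for the circuitry outside the core.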
Quote:
It seems we're agreed that the '816 takes about 5 times the source (C or HDL) to describe compared to the '02.
Don't do that.
Verilog is a disgustingly bad language for expressing state machine decoders. It is so bad, in fact, that when I switched my instruction decode logic from nested case statements to the result of compiling my SMG code, not only did the lines of code written shrink by a factor of 4, but the number of logic cells consumed in the FPGA also dropped by over a thousand.
I'm willing to bet that if someone decapped a 65816 and studied the PLA, they'd find that most of the minterms are shared across all five modes of operation.
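A PLA's AND plane can be modeled as a list of minterms, each a (mask, value) pair over the input bits, with bits outside the mask treated as don't-cares. The Python sketch below is hypothetical: the input encoding and both minterms are invented for illustration and are not read off a real 65816 die, and it is not SMG's input format. It shows how a minterm that ignores the E, M and X bits fires in every one of the five modes, which is the kind of sharing the bet above relies on.

```python
# Hypothetical PLA input word: opcode in bits 0-7, then the X, M and E flags.
X_BIT, M_BIT, E_BIT = 1 << 8, 1 << 9, 1 << 10

# The five 65816 modes: emulation (E=1; M and X read as 1 there) plus the
# four native-mode combinations of the M and X width flags.
MODES = [E_BIT | M_BIT | X_BIT] + [m | x for m in (0, M_BIT) for x in (0, X_BIT)]

# Each minterm is (mask, value); bits outside the mask are don't-cares.
# Both are invented for illustration, NOT taken from real silicon.
MINTERMS = {
    # Group-1 absolute opcodes (ORA/AND/EOR/ADC/STA/LDA/CMP/SBC abs) all
    # match xxx01101; fetching their 16-bit address ignores E, M and X.
    "fetch_abs_address": (0x1F, 0x0D),
    # The same opcodes need an extra data cycle for the accumulator's high
    # byte only in native mode with M=0 (16-bit accumulator).
    "accumulator_high_byte": (0x1F | M_BIT | E_BIT, 0x0D),
}


def fires(name, inputs):
    """A minterm fires when every cared-about input bit equals its value."""
    mask, value = MINTERMS[name]
    return (inputs & mask) == value


def modes_where_fires(name, opcode):
    return [mode for mode in MODES if fires(name, mode | opcode)]


LDA_ABS = 0xAD
print(len(modes_where_fires("fetch_abs_address", LDA_ABS)))      # 5
print(len(modes_where_fires("accumulator_high_byte", LDA_ABS)))  # 2
```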
What I think is more significant is that you may lose precise timing compatibility with a real 65816, at least if you're targeting an FPGA. Because the 65816's circuitry is level-sensitive rather than edge-sensitive, it's difficult (possibly even impossible) to match its timing characteristics precisely. To get everything to match, you have two choices:
1. You'd need to make everything, even clocked logic, out of asynchronous gates, and FPGA tools definitely don't like to synthesize such designs, OR,
2. You need to clock your logic 2x as fast as a real 65816, bite the bullet, and implement your logic so that things happen on alternating even and odd cycles. (This is the approach taken by the 80386 processor, by the way; a 33MHz 80386 is really clocked at 66MHz on the motherboard.)
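The second option can be sketched as a two-phase sequencer. In this hypothetical Python model (the function name and the read-only bus are assumptions; it ignores the 65816's multiplexed bank byte and the rest of its timing), a fast clock runs at twice the CPU cycle rate. On even ticks the model plays the PHI2-low half-cycle and drives the address; on odd ticks it plays the PHI2-high half-cycle and samples the data. Every state change happens on a clock edge, which is what FPGA synthesis prefers.

```python
def run_bus_reads(addresses, memory):
    """Simulate CPU read cycles using a fast clock at 2x the CPU cycle rate."""
    latched = []
    address_bus = None
    for tick in range(2 * len(addresses)):
        cpu_cycle, phase = divmod(tick, 2)
        if phase == 0:
            # Even tick: PHI2-low half-cycle, the address goes out on the bus.
            address_bus = addresses[cpu_cycle]
        else:
            # Odd tick: PHI2-high half-cycle, data is sampled from memory.
            latched.append(memory[address_bus])
    return latched


# Two reads of an emulation-mode reset vector pointing at $8000.
memory = {0xFFFC: 0x00, 0xFFFD: 0x80}
lo, hi = run_bus_reads([0xFFFC, 0xFFFD], memory)
print(hex(hi << 8 | lo))  # 0x8000
```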
People here lament the lack of a 65816 core; I cannot speak for why others haven't made one, but there are several reasons why I did not make my own (and went the stack CPU route instead), and why I still wouldn't want to:
1. I didn't have a tool like SMG available at the time. Hand writing PLA logic in Verilog is a pain. This issue is now resolved, however.
2. Stack CPUs map naturally to the fully synchronous circuits that FPGAs prefer. My S16X4 is as fast as a 65816 in practice, but consumes only 300 lines of Verilog. Not even exaggerating.
3. I didn't want to walk all over WDC's sole source of income. I would never be able to look at myself in the mirror and feel comfortable with myself if I did. If I were to release a 65816 clone, it would receive a significant overhaul (see below), to the point where folks might not want to support WDC anymore. In particular, my 65816 clone would:
a. Throw away current timing constraints. You don't need them in practice. A subroutine call on the latest Intel CPUs no longer takes 25 cycles to complete; it's more like 2. Most RET instructions consume 0 cycles today.
b. Throw away the multiplexed bus. Unnecessary on an FPGA.
c. Throw away the 8-bit data bus, and replace it with a real 16-bit bus, minimum. I might, in fact, actually go with a 32-bit bus as well, making JML and JSL instructions faster to decode for free.
d. Support single-cycle misaligned accesses via a Motorola 68040-like bus, where all 24 address bits are exposed, and all bus transfers are tagged with a size tag (8-bit, 16-bit, 24-bit, 32-bit transfer, etc.). This leaves it up to external logic to split memory accesses into multiple cycles if it wants. If you have a data/instruction cache with long lines, this can really boost performance. It also lets me focus on what's relevant (proper separation of concerns). This is also the approach I take with my KCP53000 CPU, so I speak from experience here.
e. Split instruction and data memory fetches into separate bus masters. Again, it just makes implementing the CPU that much easier. It does push the complexity off into a subsequent stage of circuitry, but it's manageable.
Extra credit:
f. Provide better support for MMUs, privilege modes, hardware coprocessors, multiple cores, and so forth. It's so easy to instantiate multiple cores in Verilog that it'd be irresponsible of me to not consider these things.
g. Macro-op fusion to allow things like STA, DEX, BNE sequences to execute in a fraction of the time it'd normally take.
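Items c and d can be illustrated by counting bus beats. The Python sketch below is hypothetical (the function name and return format are made up): it assumes a bus of a given byte width where each beat can only move bytes within one aligned word, which is the kind of splitting that the external logic in item d would take on. A 4-byte JSL (opcode plus 24-bit address) needs four beats on an 8-bit bus, but only one on a 32-bit bus when aligned, and two when it straddles a word boundary.

```python
def split_transfer(address, size, bus_bytes):
    """Split one size-tagged transfer into aligned bus beats.

    Returns a list of (word_address, first_byte_lane, byte_count) tuples.
    """
    beats = []
    end = address + size
    while address < end:
        word = address - address % bus_bytes          # aligned word start
        lane = address - word                         # first byte lane used
        count = min(end, word + bus_bytes) - address  # bytes in this beat
        beats.append((word, lane, count))
        address += count
    return beats


# A 4-byte JSL instruction fetch at two alignments on three bus widths.
for bus_bytes in (1, 2, 4):
    for start in (0x1000, 0x1001):
        beats = split_transfer(start, 4, bus_bytes)
        print(f"{8 * bus_bytes:2}-bit bus, start ${start:04X}: {len(beats)} beat(s)")
```

This prints 4 and 4 beats for the 8-bit bus, 2 and 3 for the 16-bit bus, and 1 and 2 for the 32-bit bus. The CPU itself still issues a single tagged transfer; only the logic outside it cares how many beats that becomes.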
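Item g can be sketched as a peephole pass over a decoded instruction stream. This is a hypothetical Python model, not a hardware design: the (mnemonic, operand) tuple format and the `STA_DEX_BNE` macro-op name are invented. It looks for a STA, DEX, BNE triple and replaces it with a single fused op that carries both operands, so a back end could execute the whole loop step as one unit.

```python
def fuse(instrs):
    """Replace each STA, DEX, BNE run with one hypothetical fused macro-op."""
    out, i = [], 0
    while i < len(instrs):
        window = [op for op, _ in instrs[i:i + 3]]
        if window == ["STA", "DEX", "BNE"]:
            sta_operand = instrs[i][1]
            branch_target = instrs[i + 2][1]
            # One macro-op: store, decrement X, branch if X is non-zero.
            out.append(("STA_DEX_BNE", (sta_operand, branch_target)))
            i += 3
        else:
            out.append(instrs[i])
            i += 1
    return out


# Clear 16 bytes with a classic countdown loop ("loop" labels the STA).
loop = [("LDX", "#$10"), ("LDA", "#$00"),
        ("STA", "$2000,X"), ("DEX", None), ("BNE", "loop"),
        ("RTS", None)]
fused = fuse(loop)
print(len(loop), "->", len(fused))  # 6 -> 4
```

A static pass like this must also make sure nothing branches into the middle of a fused group; fusing in the decoder, on the dynamic instruction stream, sidesteps that problem.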
Re: 65C816 vs 68000
KC9UDX wrote:
kc5tja wrote:
cbmeeks wrote:
I still have my Amiga 500 and it still boots today. 
Quote:
I have three in use. I have an A3000 doing MIDI work, and an A2000 doing video work. You can see them in some short YouTube videos I made. I get asked all the time why I don't just use this Mac or that PC.
Quote:
But I'm not a collector, I'm a user.
-
White Flame
- Posts: 704
- Joined: 24 Jul 2012
Re: 65C816 vs 68000
kc5tja wrote:
This is called "Non-Uniform Memory Access", or "NUMA", in today's literature. Yes, it's doable, and many CPUs already have this facility. I think the latest Intel CPUs have something on the order of *6* channels to memory capable of operating independently. Wait states on any particular bus are introduced only when one channel needs to access another channel's memory.
Quote:
My homebrew RISC-V CPU works the same way; instruction and data fetch are on separate memory channels, but with the introduction of an external memory arbiter, may access a common memory pool.
Re: 65C816 vs 68000
White Flame wrote:
kc5tja wrote:
Quote:
My homebrew RISC-V CPU works the same way; instruction and data fetch are on separate memory channels, but with the introduction of an external memory arbiter, may access a common memory pool.
Also, I'm not sure what "extra hoops" refers to; virtually all CPUs with a pipeline have separate I- and D-channels for memory; it's just that the arbiter logic (the "Bus Interface Unit", as Intel would call it) is usually internal to the core. With my design, I provide the arbiter in the same repository, but it's a separate Verilog module, and it can be reused to support SMP as well.
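An external arbiter of this kind can be modeled in a few lines. The Python sketch below assumes a round-robin policy and one grant per cycle; it is a hypothetical illustration, not the Verilog module from the repository mentioned above. Each cycle, the arbiter grants the shared memory to the next requesting master after the one it served last, so an instruction-fetch port and a data port (or several cores, for SMP) each get a fair share, and a master that loses simply sees a wait state.

```python
class RoundRobinArbiter:
    """Grant a single shared memory port to one of N bus masters per cycle."""

    def __init__(self, n_masters):
        self.n = n_masters
        self.last = n_masters - 1  # so master 0 wins the first contested cycle

    def grant(self, requests):
        """requests: one bool per master. Returns the granted index, or None."""
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if requests[candidate]:
                self.last = candidate
                return candidate
        return None  # idle cycle: no master is requesting


# Master 0 = instruction fetch, master 1 = data access, both busy every cycle.
arb = RoundRobinArbiter(2)
print([arb.grant([True, True]) for _ in range(4)])  # [0, 1, 0, 1]
```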
White Flame
Re: 65C816 vs 68000
At that point, I think it comes down to what exactly you mean by "memory" and "bus". Obviously, within the CPU an L1 cache tends to have two memories and two buses, but I'm mostly talking about the largest, general-purpose tier of memory and the bus it sits on. For a SoC, that would still be close to the CPU but would encompass the complete memory footprint (128KB+ in my example regarding a 65816); otherwise it's the external RAM chips and the bus(es) they sit on. Modern x86 has multiple memory channels on the motherboard, but those aren't I- and D-paths. Of course, if you were to use fully dual-ported main memory, the point becomes moot: you'd have the parallelism of Harvard without the limitations of partitioning.
Re: 65C816 vs 68000
kc5tja wrote:
People here lament the lack of a 65816 core; I cannot speak for why others haven't made one, but there are several reasons why I did not make my own (and went the stack CPU route instead), and why I still wouldn't want to:
1. I didn't have a tool like SMG available at the time. Hand writing PLA logic in Verilog is a pain. This issue is now resolved, however.
2. Stack CPUs map naturally to the fully synchronous circuits that FPGAs prefer. My S16X4 is as fast as a 65816 in practice, but consumes only 300 lines of Verilog. Not even exaggerating.
3. I didn't want to walk over WDC's sole source of income. I would never be able to look at myself in the mirror and feel comfortable with myself if I did. If I were to release a 65816 clone, it would receive a significant overhaul (see below), to the point where folks might not want to support WDC anymore. In particular, my 65816 clone would:
a. Throw away current timing constraints. You don't need them in practice. A subroutine call on the latest Intel CPUs no longer takes 25 cycles to complete; it's more like 2. Most RET instructions consume 0 cycles today.
b. Throw away the multiplexed bus. Unnecessary on an FPGA.
c. Throw away the 8-bit data bus, and replace it with a real 16-bit bus, minimum. I might, in fact, actually go with a 32-bit bus as well, making JML and JSL instructions faster to decode for free.
d. Support single-cycle misaligned accesses via a Motorola 68040-like bus, where all 24 address bits are exposed, and all bus transfers are tagged with a size tag (8-bit, 16-bit, 24-bit, 32-bit transfer, etc.). This leaves it up to external logic to split memory accesses into multiple cycles if it wants. If you have a data/instruction cache with long lines, this can really boost performance. It also lets me focus on what's relevant (proper separation of concerns). This is also the approach I take with my KCP53000 CPU, so I speak from experience here.
e. Split instruction and data memory fetches into separate bus masters. Again, it just makes implementing the CPU that much easier. It does push the complexity off into a subsequent stage of circuitry, but it's manageable.
Extra credit:
f. Provide better support for MMUs, privilege modes, hardware coprocessors, multiple cores, and so forth. It's so easy to instantiate multiple cores in Verilog that it'd be irresponsible of me to not consider these things.
g. Macro-op fusion to allow things like STA, DEX, BNE sequences to execute in a fraction of the time it'd normally take.
I have been playing with the idea of putting a 64-bit bus on both sides of two cores as a way to speed things up without bloating the code too much. I still remember when IBM minisystems went full 68000 and all our programs got 3-5 times larger (without doing anything more), so keeping the instruction set minimalistic is also important. Expanding the address space through unused opcodes is the way to go: not to keep compatibility, but to keep 16-bit addressing for most of the application, which keeps the code compact.
As for the Amiga discussion, I have an A1000, A4000 and A1200. I want to use them, but mostly I end up using my Vic-20(!). It's unfortunate that the AmigaOS ended up being closed source (basically ending its development 23 years ago). But maybe that is what a 65x16 needs: a translated version of the old AmigaOS with its way of doing things.
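The size argument is easy to put numbers on. On the 65816, an absolute-addressed instruction such as LDA $1234 is three bytes, while its absolute-long form LDA $123456 is four, and JSR absolute versus JSL likewise goes from three bytes to four. The Python sketch below uses those real instruction lengths but a hypothetical instruction mix, so the resulting percentage is illustrative only.

```python
def code_bytes(mix, lengths):
    """Total bytes for a {mnemonic: count} mix, given {mnemonic: length}."""
    return sum(count * lengths[op] for op, count in mix.items())


# 65816 instruction lengths in bytes (opcode + address operand).
SHORT = {"LDA": 3, "STA": 3, "JSR": 3}  # absolute: 16-bit address
LONG = {"LDA": 4, "STA": 4, "JSL": 4}   # absolute long: 24-bit address

# Hypothetical counts for the same program written both ways.
short_mix = {"LDA": 500, "STA": 300, "JSR": 200}
long_mix = {"LDA": 500, "STA": 300, "JSL": 200}

short = code_bytes(short_mix, SHORT)
long_ = code_bytes(long_mix, LONG)
print(short, long_, f"{(long_ - short) / short:.0%} larger")  # 3000 4000 33% larger
```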
Re: 65C816 vs 68000
kakemoms wrote:
...As for the Amiga discussion, I have an A1000, A4000 and A1200.
I have several Amigas myself... I love the machines.
I have A500 x 4, A1000, A2000, A1200 and A600. 8 Amigas total.
kakemoms wrote:
I end up using my Vic-20(!)
Here's an idea... how about a Vic 20 built with modern parts? A 65C816, 512K, a couple of VIAs and a CPLD for video/audio? That would be a fun project!
kakemoms wrote:
But maybe that is what a 65x16 needs: a translated version of the old AmigaOS with its way of doing things.
I read the stories over at Folklore almost daily for inspiration. That must have been crazy fun times.
White Flame
Re: 65C816 vs 68000
cbmeeks wrote:
Here's an idea... how about a Vic 20 built with modern parts? A 65C816, 512K, a couple of VIAs and a CPLD for video/audio? That would be a fun project!
There's also a Commodore 64 that was built "modern": the 64DTV, as it's called, was a C64 reimplementation in an epoxy blob, sold in some direct-to-TV joysticks with C64 games bundled. It was used in a few other specific projects as well. It had expanded video modes, more memory, a selectably faster CPU, and pads on the board for hooking up a PS/2 keyboard and an IEC bus.