What if independent MOS Technology had survived?

BigEd · Post by **BigEd** » Thu May 04, 2017 12:05 pm

Here's a thought. If we think, for example, of how to perform a version of lda (n),y on the 6502 where both the n and the y are 16-bit quantities, that would take quite a few instructions, bytes, cycles. All we need to do in the instruction encoding for an improved machine is to beat that density and speed comfortably - we don't have to fit it into three bytes. Even five bytes would be a win, I think.
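To make the baseline concrete, here is a rough Python model of the work a stock 6502 has to do for such a load: fetch a 16-bit pointer from address n, add a 16-bit y one byte at a time with carry, then load through the result. The function names and the flat `mem` array are illustrative assumptions only, not a proposed encoding.

```python
def add16_bytewise(lo, hi, y):
    """Add 16-bit y to the pointer hi:lo the way an 8-bit ALU would:
    low bytes first, then high bytes plus the carry out of the low add."""
    low_sum = lo + (y & 0xFF)
    carry = low_sum >> 8
    high_sum = hi + (y >> 8) + carry
    return ((high_sum & 0xFF) << 8) | (low_sum & 0xFF)


def lda_indirect_y16(mem, n, y):
    """Hypothetical 'lda (n),y' with a 16-bit n and a 16-bit y.
    mem is a 64 KiB bytes-like object; n addresses a little-endian pointer."""
    lo = mem[n & 0xFFFF]
    hi = mem[(n + 1) & 0xFFFF]
    return mem[add16_bytewise(lo, hi, y & 0xFFFF)]


mem = bytearray(0x10000)
mem[0x1234] = 0x00   # pointer low byte
mem[0x1235] = 0x80   # pointer high byte, so the pointer is $8000
mem[0x81FF] = 0x42   # target byte at $8000 + $01FF
value = lda_indirect_y16(mem, 0x1234, 0x01FF)
```

Each line of `add16_bytewise` stands for at least one 6502 instruction, which is why a single wider encoding could plausibly beat the sequence even at five bytes.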

Arlet · Post by **Arlet** » Thu May 04, 2017 12:11 pm

Another interesting angle, maybe deserving of its own thread: given the amount of resources the 6502 designers had, try to come up with a better 8 bit design.

A modern version of the challenge could be to make it fit in the number of FPGA slices that the 6502 core requires.

BigEd · Post by **BigEd** » Thu May 04, 2017 12:19 pm

I might add a pointer here to our One Page Computing challenge over on anycpu - that's about fitting a design into limited resources. We've got a contender doing surprising things in an XC9572 CPLD.

I did idly wonder yesterday, what series of thoughts the 6502 designers had, in having an X and a Y and using them in the slightly different way they did. They had room for 3 bytes of registers, plus the accumulator, and they had an 8-bit ALU and of course a 16 bit PC with an incrementer. (The status byte is just a loose collection of flip flops.) That's a lot less visible state than the 6800 which they'd come away from, which has 6 bytes of state vs 4, and also has an internal temporary register.

litwr · Post by **litwr** » Thu May 04, 2017 4:22 pm

BigEd wrote:

Here's a thought. If we think, for example, of how to perform a version of lda (n),y on the 6502 where both the n and the y are 16-bit quantities, that would take quite a few instructions, bytes, cycles. All we need to do in the instruction encoding for an improved machine is to beat that density and speed comfortably - we don't have to fit it into three bytes. Even five bytes would be a win, I think.

I don't quite follow the idea. IMHO it is obvious that better code density means better speed, because of better cache performance. If we have a 24/32-bit address space then n and y should be 24/32-bit too. But if we have zp-registers, then what is it all for?
I also don't quite see the point of the idea that the 6502+ has some disadvantage compared with x86. The 6502+ shows that the 6502 had very big potential for further development and, with proper support, would have been the best right up to our time. Intel x86 is good, better than DEC PDP/VAX-11, Motorola 680x0, NS 320xx, ... but the 6502 was better (IMHO). Intel won because it had much more resources and Motorola was too stupid to support the better CPU and its better future.
BTW I can share yet another idea which could make the position of the 6502+ even better. The second accumulator would have double width.

It would be similar to the Z80's HL accumulator but with the full range of operations. It would be 16-bit in 8-bit data mode, 32-bit in 16-bit mode, ..., 128-bit in 64-bit mode. All operations with zp-registers would be very fast and short. Every ALU upgrade would make them much faster.

Arlet · Post by **Arlet** » Thu May 04, 2017 4:34 pm

Quote:

I also don't quite see the point of the idea that the 6502+ has some disadvantage compared with x86

The area where the 6502+ is lacking is the ability to perform operations on the zeropage and index registers, as well as complex addressing calculations using multiple registers and/or offsets. As I understand it, anything except INC/DEC still needs to go through the A/B register.

Something as simple as doing a memcpy() using 32-bit transfers requires +4 on the address pointer, which would require loading the pointer into A, adding 4, and moving it back.

You may want to try the memory allocation challenge with a 6502+ program, and compare it to my ARM Cortex code. I've posted a reference version in C.
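The memcpy() point can be sketched in Python as follows. This is only a model under stated assumptions: the pointers live in a dictionary standing in for zero page, and each pointer update is counted as three steps (load into A, add 4, store back), following the description above rather than any real 6502+ timing.

```python
import struct

MASK32 = 0xFFFFFFFF


def bump_pointer_via_a(zp, name, step=4):
    """Advance a zero-page pointer by routing it through the accumulator."""
    a = zp[name]                 # load the pointer into A
    a = (a + step) & MASK32      # add 4 in the accumulator
    zp[name] = a                 # move the result back to zero page
    return 3                     # assumed: three steps per pointer update


def memcpy32(mem, zp, nbytes):
    """Copy nbytes (a multiple of 4) from zp['src'] to zp['dst'] in 32-bit
    units. Returns the steps spent purely on pointer updates."""
    if nbytes % 4:
        raise ValueError("nbytes must be a multiple of 4")
    overhead = 0
    for _ in range(nbytes // 4):
        (word,) = struct.unpack_from("<I", mem, zp["src"])
        struct.pack_into("<I", mem, zp["dst"], word)
        overhead += bump_pointer_via_a(zp, "src")
        overhead += bump_pointer_via_a(zp, "dst")
    return overhead


mem = bytearray(range(16)) + bytearray(16)
zp = {"src": 0, "dst": 16}
overhead = memcpy32(mem, zp, 16)
```

In this model the bookkeeping costs six steps per 32-bit word moved, which is the overhead that an add-immediate on the pointer itself would remove.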

litwr · Post by **litwr** » Thu May 04, 2017 5:40 pm

Arlet wrote:

The area where the 6502+ is lacking is the ability to perform operations on the zeropage and index registers, as well as complex addressing calculations using multiple registers and/or offsets. As I understand it, anything except INC/DEC still needs to go through the A/B register.

Something as simple as doing a memcpy() using 32-bit transfers requires +4 on the address pointer, which would require loading the pointer into A, adding 4, and moving it back.

You may want to try the memory allocation challenge with a 6502+ program, and compare it to my ARM Cortex code. I've posted a reference version in C.

Thank you for the link to the interesting challenge. I have done such things before, but I am afraid I am too busy now to take an active part. I hope to find an opportunity later - this task is really very entertaining to me. BTW is the code for malloc and free present in cc65? IMHO it should be there.
You wrote about a later and very advanced 6502+ (1985), which indeed would have block copy instruction(s), as mentioned earlier, something like LDIR/LDDR of the Z80, MVP/MVN of the 65816, or REP MOVS of x86. Maybe it would be worth changing the behavior of INX/DEY/... according to the current data mode, so in 32-bit mode INX would mean X <- X + 4. This advanced version would have a lot of instructions with 2-byte opcodes, which could perform all the required operations. It would also feature a barrel shifter. Multi-byte opcodes would give new addressing modes etc. I can imagine a (zp),X,Y mode, which in Intel syntax means [R1+R2+R3]. All of this is just conjecture, but it would work.
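A small Python sketch of those two conjectures, purely for illustration: a mode-aware INX that steps by the current data width, and a (zp),X,Y effective address computed as the sum of three registers. Every name here, and the 32-bit address width, is an assumption.

```python
def inx(x, data_bytes, addr_bits=32):
    """Hypothetical mode-aware INX: step X by the current data width in
    bytes (1 in 8-bit mode, 4 in 32-bit mode), wrapping at the address width."""
    return (x + data_bytes) & ((1 << addr_bits) - 1)


def ea_zp_x_y(zp_regs, zp, x, y, addr_bits=32):
    """Hypothetical (zp),X,Y mode: pointer held in a zero-page register,
    plus X, plus Y, i.e. [R1+R2+R3] in Intel syntax."""
    return (zp_regs[zp] + x + y) & ((1 << addr_bits) - 1)


zp_regs = {0x10: 0x00020000}      # a pointer stored in zp-register $10
x = inx(0, data_bytes=4)          # 32-bit mode: X <- X + 4
ea = ea_zp_x_y(zp_regs, 0x10, x, 0x100)
```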

whartung · Post by **whartung** » Thu May 04, 2017 7:52 pm

Arlet wrote:

I think Intel was in a very good starting position because of their early design choices, allowing for a long and smooth upgrade path with regular performance increases, while keeping support for the existing software base. Try that with the 6502, like we've talked about in this thread, and you get stuck dealing with the bad baggage from the beginning.

I think you have to give a big hand to Zilog here, since Intel never really made a "better" 8080 during the era. The Z80 pretty much filled that role. Arguably the Z80 usurped the 8080 completely, but its compatibility with the 8080, and the popularity of CP/M and its lowest-common-denominator requirement to be an 8080, carried the legacy of the 8080 forward. Intel never had to make a faster, or more sophisticated 8080. Rather they were able to jump straight into the 8086.

Outside of the 8080's influence on the Z80, it's the Z80's legacy that carried forward to the collection of really nice and powerful Z80 derivatives that we have today.

The 8086 was not binary compatible with the 8080, but was it source code compatible? I don't recall. I don't think the mnemonics even matched, but I think you could have readily written an 8086 assembler that assembled 8080 mnemonics into 8086 binary, and that the semantics of the opcodes were close enough to make porting 8080 code pretty straightforward. But I don't know, I've never done it.

Arlet · Post by **Arlet** » Fri May 05, 2017 3:52 am

Quote:

Another interesting angle, maybe deserving of its own thread: given the amount of resources the 6502 designers had, try to come up with a better 8 bit design

I've started the challenge over on the AnyCPU forum. Even if you're not into FPGA design yourself, you're welcome to join the discussion and share ideas.

Arlet · Post by **Arlet** » Sat May 06, 2017 7:23 am

Quote:

I think you have to give a big hand to Zilog here, since Intel never really made a "better" 8080 during the era. The Z80 pretty much filled that role. Arguably the Z80 usurped the 8080 completely, but its compatibility with the 8080, and the popularity of CP/M and its lowest-common-denominator requirement to be an 8080, carried the legacy of the 8080 forward. Intel never had to make a faster, or more sophisticated 8080. Rather they were able to jump straight into the 8086.

Correct, and an interesting observation. I took a closer look at the 8080, and it's not nearly as well made as the 8086. It seems Intel benefited from bringing out the 8080, learning about its weaker spots, and improving them in the 8086, breaking binary compatibility just once. The 8080 came out in 1974, the 6502 in 1975, the Z80 in 1976, and the 8086 in 1978 (source: Wikipedia), so you are right that there was a sizeable gap between the 8080 and the 8086, and it's plausible that the Z80 helped to bridge that gap.

BigEd · Post by **BigEd** » Sat May 06, 2017 8:18 am

It's amusing to consider the 6502 as a second take on the 6800, also breaking binary compatibility! (Although in this case the team left Moto and reconvened within MOS Technology.)

Arlet · Post by **Arlet** » Sat May 06, 2017 8:31 am

Quote:

It's amusing to consider the 6502 as a second take on the 6800

Indeed, but I think their second take went in the direction of cost savings and simplicity. Which makes sense, I think, for a small company trying to compete. Intel, at that time, was probably looking for performance scaling.

Tor · Post by **Tor** » Mon May 08, 2017 9:22 am

whartung wrote:

The 8086 was not binary compatible with the 8080, but was it source code compatible? I don't recall. I don't think the mnemonics even matched, but I think you could have readily written an 8086 assembler that assembled 8080 mnemonics into 8086 binary, and that the semantics of the opcodes were close enough to make porting 8080 code pretty straightforward. But I don't know, I've never done it.

That's essentially how it was done - Intel provided a beast such as you describe. It worked just so-so, as one would expect.
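For a feel of what such a translator does, here is a toy Python sketch. It relies on the commonly documented register correspondence (A to AL, B/C to CH/CL, D/E to DH/DL, H/L to BH/BL, M to [BX]); the `translate` function itself is hypothetical, handles only three mnemonics, and ignores flags and anything that is not a one-to-one mapping. It is not Intel's tool.

```python
# Commonly documented 8080 -> 8086 register correspondence.
REG_MAP = {
    "A": "AL", "B": "CH", "C": "CL", "D": "DH",
    "E": "DL", "H": "BH", "L": "BL", "M": "[BX]",
}


def translate(line):
    """Translate one 8080 source line into 8086 syntax (toy subset only)."""
    parts = line.replace(",", " ").split()
    op = parts[0].upper()
    args = [a.upper() for a in parts[1:]]
    if op == "MOV" and len(args) == 2:
        return f"MOV {REG_MAP[args[0]]},{REG_MAP[args[1]]}"
    if op in ("ADD", "SUB") and len(args) == 1:
        # 8080 arithmetic implicitly targets the accumulator.
        return f"{op} AL,{REG_MAP[args[0]]}"
    raise NotImplementedError(line)


converted = [translate(s) for s in ("MOV A,B", "ADD M", "MOV H,E")]
```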

GARTHWILSON · Post by **GARTHWILSON** » Tue May 16, 2017 9:35 am

BigEd wrote:

GARTHWILSON wrote:

... I see the 6309 has a divide instruction, and that it takes a minimum of 25 clocks, so it takes more clocks to do just a divide than the '816 with tables takes to look up a trig or log function as I showed.

(It's worth noting that there are many ways to calculate log and trig, Taylor series being only one and not especially efficient. CORDIC, I think, doesn't use division. So, best not to assume that division is the limiting factor: it's slower than multiplication in every case I've seen, and so the people who design the algorithms for other calculations don't overuse it.)

I really would like to understand CORDIC, but in all my searching, none of the explanations I've found for CORDIC have been clear. I asked a friend who's a math teacher with a bachelor's degree in math, but he couldn't help me either. What I do find is that it still takes a step, with a small table lookup, for each bit or digit as you advance toward an answer; IOW, it won't be nearly as fast as a table lookup. One source said that the 8087 through 80486 used CORDIC, but that the Pentium went back to polynomial evaluation, infamous bug in the first iteration notwithstanding.
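For what it's worth, here is a minimal Python sketch of rotation-mode CORDIC for sine and cosine, written in floating point for readability. The idea is to rotate a vector towards the target angle by successively smaller fixed angles atan(2^-i), choosing only the direction at each step. In fixed-point hardware the multiplications by 2^-i would be shifts and the angle table would be precomputed; `N`, `ANGLES`, `GAIN`, and the function name are my own choices.

```python
import math

N = 32  # one iteration per bit of precision wanted

# Small lookup table: the fixed rotation angles atan(2^-i).
ANGLES = [math.atan(2.0 ** -i) for i in range(N)]

# Each micro-rotation stretches the vector; start with the inverse of the
# total stretch so the final vector has length 1.
GAIN = 1.0
for i in range(N):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_sin_cos(theta):
    """Return (sin, cos) of theta for |theta| <= pi/2 using only adds,
    subtracts, scaling by powers of two, and one table lookup per step."""
    x, y, z = GAIN, 0.0, theta
    for i in range(N):
        d = 1.0 if z >= 0 else -1.0     # rotate towards the remaining angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]              # angle still left to rotate
    return y, x


s, c = cordic_sin_cos(math.pi / 6)
```

This matches the description above: one step with one small table lookup per bit, and no division anywhere in the loop.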

BigEd · Post by **BigEd** » Tue May 16, 2017 10:10 am

(Continued over here.)
