GARTHWILSON wrote:
Several other desirables are already implemented in the 65816, like a long relative branch, stack-relative addressing, block move, push an indirect address or a literal on the stack, movable base page that doesn't have to be ZP, 16-bit stack pointer, etc..
I sent an instruction suggestion to WDC the other day and got an e-mail back from Bill Mensch himself. I was surprised at that.
Anyway, I recommended adding two instructions: ADX and ADY -- add a signed value to either X or Y. They would have the same addressing modes as CPX and CPY. Because the value added is sign-extended, there would be no immediate need for SBX/SBY (subtract from X/Y) instructions. Also, ADX/ADY would behave as if the carry flag were pre-cleared. Thus, doing a 16-bit or 32-bit addition with these instructions would become about as efficient as doing an 8-bit or 16-bit addition on the existing processors.
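To make the semantics concrete, here is a minimal C model of how I picture ADX behaving with 16-bit index registers (ADY would be identical for Y). The register and flag handling here are my own assumptions for illustration, not anything WDC has specified:

[code]
#include <stdint.h>
#include <stdio.h>

/* Minimal model of the proposed ADX with 16-bit index registers.
 * Assumptions (mine, not WDC's): the operand is treated as signed
 * (so a negative operand subtracts), the add behaves as if CLC had
 * just been executed, and C/N/Z are set from the 16-bit result. */
typedef struct {
    uint16_t x;        /* X register, 16-bit index mode */
    int c, n, z;       /* status bits of interest       */
} cpu_t;

static void adx(cpu_t *cpu, int16_t operand)
{
    uint32_t sum = (uint32_t)cpu->x + (uint16_t)operand;

    cpu->x = (uint16_t)sum;
    cpu->c = (sum >> 16) & 1;    /* carry out of bit 15 */
    cpu->n = (cpu->x >> 15) & 1;
    cpu->z = (cpu->x == 0);
}

int main(void)
{
    cpu_t cpu = { .x = 0x1234 };

    adx(&cpu, 0x0040);   /* bump an index by 64 bytes            */
    adx(&cpu, -2);       /* a negative operand stands in for SBX */

    printf("X = $%04X\n", (unsigned)cpu.x);   /* prints X = $1272 */
    return 0;
}
[/code]

The point of comparison is the usual TXA / CLC / ADC #n / TAX dance, which clobbers the accumulator; ADX/ADY would collapse that into one instruction.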
Quote:
Putting just a few of Forth's internals like NEXT , nest , and unnest in microcode would make a big difference in execution speed and I believe could be used for other higher-level languages as well.
While this is true, I wouldn't rank this as terribly high priority.
I would definitely consider, however, adding a rational math coprocessor to the chip. Why rational and not floating point? Because it works with integers and basically consists of an integer multiplier/divider, which takes a whole lot fewer transistors than a floating-point processor. The only really specialized piece of hardware in it would be the rational reducer, which would reduce a fraction to lowest terms.
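The reducer itself is conceptually simple. Here is a minimal C sketch of the operation (a hardware RPU would more likely use a binary shift-and-subtract GCD, and the types and names here are just for illustration):

[code]
#include <stdint.h>
#include <stdio.h>

/* A rational kept as a signed numerator over an unsigned, nonzero
 * denominator.  (Layout and names are just for illustration.) */
typedef struct { int32_t num; uint32_t den; } rational_t;

/* Euclid's algorithm; hardware would likely use a binary GCD. */
static uint32_t gcd_u32(uint32_t a, uint32_t b)
{
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* The "rational reducer": divide out the GCD so the fraction is in
 * lowest terms, e.g. 226575/10000 -> 9063/400. */
static rational_t reduce(rational_t r)
{
    uint32_t mag = (r.num < 0) ? 0u - (uint32_t)r.num : (uint32_t)r.num;
    uint32_t g   = gcd_u32(mag, r.den);

    if (g > 1) {
        r.num /= (int32_t)g;
        r.den /= g;
    }
    return r;
}

int main(void)
{
    rational_t r = reduce((rational_t){ 226575, 10000 });
    printf("%d/%u\n", (int)r.num, (unsigned)r.den);   /* 9063/400 */
    return 0;
}
[/code]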
In addition, rationals can express certain mathematical quantities more precisely than floating point in the same memory space (355/113 for pi, for example).
Note that the dynamic range of the RPU (for lack of a better term) might not appear, at first, to be as great as an FPU's, but there are three ways to mitigate this in software:
1. Use the RPU with an explicitly 65xx-managed exponent. This hybrid approach combines the advantages of both, at a slight cost in speed.
2. Take advantage of units. That is, why multiply a number like 25175000 by .0000009 when you know that the relative exponents of the two will cancel, and you need only multiply 25175 by 9 and scale the result to get the same value?
And finally,
3. Make the RPU scalable so it can handle multiple-precision integers in both the numerator and denominator of each rational (e.g., give it a carry/borrow bit, and maybe a few other flags, to let it handle integers larger than its natural word size -- see the sketch after this list). Even if this is done at the cost of speed, it'll still be much faster than the 65xx's own execution speed for this type of math.
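To make option 3 a little more concrete, here is a rough C sketch of the carry-chaining idea, using 32-bit limbs to stand in for the RPU's natural word size (the names and limb width are my own choices, purely for illustration):

[code]
#include <stdint.h>
#include <stddef.h>

/* Sketch of option 3: wider-than-word integers handled a limb at a
 * time, with the carry threaded from one limb to the next.  A real
 * RPU would do the same thing in hardware via its carry flag. */

/* Add two little-endian multi-limb integers of equal length.
 * Returns the final carry out (1 if the sum overflowed 'limbs' words). */
static uint32_t add_wide(uint32_t *dst, const uint32_t *a,
                         const uint32_t *b, size_t limbs)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < limbs; i++) {
        uint64_t s = (uint64_t)a[i] + b[i] + carry;
        dst[i] = (uint32_t)s;            /* low word of the partial sum */
        carry  = (uint32_t)(s >> 32);    /* carry into the next limb    */
    }
    return carry;
}

int main(void)
{
    /* 0xFFFFFFFF + 1, spread across two 32-bit limbs */
    uint32_t a[2] = { 0xFFFFFFFFu, 0 };
    uint32_t b[2] = { 1, 0 };
    uint32_t r[2];

    add_wide(r, a, b, 2);   /* r = { 0, 1 }, final carry = 0 */
    return 0;
}
[/code]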
This isn't as crazy as it sounds. When I was working for Hifn, I worked quite regularly with a "Bignum" coprocessor ironically called the 6500. The 6500 had 16 registers, each 1024 bits in size. Yes, 1024 bits. It could do addition (with or without carry), subtraction (with or without borrow), multiplication, division, exponentiation, logical shifting, or any of the above in modular arithmetic. While it wasn't used for rational math, lord knows it could easily have been (we actually used it to support real-time cryptography and very high confidence random number generation). It could complete most operations in linear time, depending on where the most significant bit of the operand was located (internally, it had a 32-bit ALU). And it was fast -- faster than a Pentium at working with such large quantities. Although a Pentium 4 did beat it ... finally.
I'm not sure whether Hifn is still making these chips as discrete components; I'm pretty sure they are, but they're probably sunsetting them, as the 6500's core has been integrated into other, vastly more sophisticated network-processor chips. Math coprocessors have never been Hifn's primary market anyway.
For anyone working on an existing 65816-based design with a real need for fast, high-precision math, the Hifn 6500 is a very viable option as an external rational math coprocessor.