John West wrote:
My interpretation: 00 and 11 are both "Unnormalized mantissa. Exponent = -128". 00 is for positive denormals (IEEE 754 notation feels more comfortable, if you'll excuse me) and 11 for negative. Using 00 or 11 with an exponent that isn't -128 is presumably considered an error.
It isn't an error, there's just a potential loss of precision. However, the major routines (FADD, FDIV, FMUL, FSUB, etc.) only return an unnormalized number when the exponent is -128.
kc5tja wrote:
Frankly, they are NEVER stored in variables. EVER. Because they violate the rules of floating point numbers, just as 125.4x10^1 violates scientific notation. So they can therefore ONLY occur WHILE performing a calculation. In short, they are created only as intermediate results.
One advantage of the Wozniak & Rankin representation is that it's perfectly fine to store unnormalized numbers in variables, since FP numbers in that format occupy the same amount of memory whether the number is in a floating point "accumulator" or stored elsewhere in memory. On the other hand, in the more common representation (used by EhBASIC), FP numbers are stored in unpacked format in a floating point "accumulator" and in packed format in a variable, so unnormalized numbers cannot be stored in a variable. More on this below.
kc5tja wrote:
Personally, I prefer how Commodore stored its floating point numbers over Apple's. The problem with Apple's is you lost 1 bit of precision when storing a positive value. It also keeps that silly '1' bit (which you always know is there) hanging around. Thus, in effect, you're losing 2 bits of precision, which can add up over many calculations, and is especially noticeable when using numbers like 1/3, 1/9, and irrational numbers.
I'm going to call the two representations "signed mantissa" (Wozniak & Rankin routines) and "positive mantissa" (EhBASIC, Applesoft). Since well over 99% of the FP calculations done on Apples used the Applesoft FP routines (the routines in the wozfp3.txt file were rarely used), calling the "signed mantissa" representation the "Apple" representation may be confusing.
Anyway, for a given number of mantissa bits, "signed mantissa" has only 1 less bit of precision than "positive mantissa", not 2 less. The Wozniak & Rankin routines use a 24-bit mantissa which can represent every integer from -8388608 to 8388607. EhBASIC also uses a 24-bit mantissa and can represent every integer from -16777215 to 16777215 (zero is handled by a special exponent value, not by a mantissa value), so that is only 1 less bit of precision for both positive and negative values. Either representation can be extended to as much precision as you wish, though.
Here is a more detailed comparison of the two representations:
24-bit "signed mantissa" representation: (Wozniak & Rankin routines)
Code:
X1 M1+0 M1+1 M1+2
EEEEEEEE SNMMMMMM MMMMMMMM MMMMMMMM
24-bit "positive mantissa" representation: (EhBASIC)
Unpacked format: in FAC (the floating point accumulator)
Code:
FAC1_e FAC1_1 FAC1_2 FAC1_3 FAC1_s
EEEEEEEE NMMMMMMM MMMMMMMM MMMMMMMM SXXXXXXX
Packed format: in a variable (memory)
Code:
addr+0 addr+1 addr+2 addr+3
EEEEEEEE SMMMMMMM MMMMMMMM MMMMMMMM
The bits are:
E = Exponent
M = Mantissa
N = Mantissa (also indicates whether the mantissa is normalized)
S = Sign
X = Don't care
In the "signed mantissa" representation, the mantissa is normalized when the "normalized" (N) bit is not equal to the sign bit (S). In the "positive mantissa" representation, the mantissa is normalized when the "normalized" bit is 1. When packing the number (i.e. converting from unpacked to packed format), the "normalized" bit is simply overwritten by the sign bit, which is why (a) an unnormalized number cannot be stored in a variable, and (b) there is one more bit of precision than in the "signed mantissa" representation.
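Here's a minimal sketch in C of those two normalization tests and of the packing step (the function and variable names are mine, chosen to echo the labels above, not the actual EhBASIC or Wozniak & Rankin source labels):
Code:
#include <stdint.h>

/* "signed mantissa": normalized when the S bit (bit 7) and the N bit
   (bit 6) of the high mantissa byte differ */
int sm_is_normalized(uint8_t m1_hi)
{
    return ((m1_hi >> 7) ^ (m1_hi >> 6)) & 1;
}

/* "positive mantissa": normalized when the N bit (bit 7 of FAC1_1) is 1 */
int pm_is_normalized(uint8_t fac1_1)
{
    return (fac1_1 >> 7) & 1;
}

/* Packing: the N bit (always 1 once normalized) is simply overwritten
   by the sign bit -- this is where the extra bit of precision comes from */
uint8_t pm_pack_high_byte(uint8_t fac1_1, uint8_t fac1_s)
{
    return (fac1_1 & 0x7F) | (fac1_s & 0x80);
}

/* Unpacking: restore the implied 1 bit (valid for nonzero, normalized
   numbers) and separate the sign */
void pm_unpack_high_byte(uint8_t packed, uint8_t *fac1_1, uint8_t *fac1_s)
{
    *fac1_1 = packed | 0x80;
    *fac1_s = packed & 0x80;
}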
I should point out that EhBASIC (and Applesoft, which uses a 32-bit mantissa but is otherwise the same) also has a rounding byte, FAC1_r, which is used as an extra byte of precision during calculations, and is ultimately rounded into FAC1. However, the absence or presence of the rounding byte depends on the floating point implementation, not on the representation used for FP numbers. An implementation of either representation could use a rounding byte, or could omit it.
The 24-bit "positive mantissa" representation:
unpacked format:
FAC1_e: an 8-bit unsigned integer = 0 to 255
U: a 24-bit unsigned integer = 0 to 16777215
FAC1_s: the sign of the mantissa
FAC1_1: the high byte of U
FAC1_2: the middle byte of U
FAC1_3: the low byte of U
the number represented in unpacked format:
when FAC1_e = 0: 0
when FAC1_e is non-zero and bit 7 of FAC1_s = 0: U * 2 ^ (FAC1_e - 152)
when FAC1_e is non-zero and bit 7 of FAC1_s = 1: -U * 2 ^ (FAC1_e - 152)
Range of negative values: -2 ^ 127 + 2 ^ 103 to -2 ^ -151
Range of positive values: 2 ^ -151 to 2 ^ 127 - 2 ^ 103
packed format:
FAC1_e: an 8-bit unsigned integer = 0 to 255
P: a 23-bit unsigned integer = 0 to 8388607
FAC1_1: bits 6 to 0 are bits 22 to 16 of P
FAC1_2: bits 15 to 8 of P
FAC1_3: bits 7 to 0 of P
The number represented in packed format:
when FAC1_e = 0: 0
when FAC1_e is non-zero and bit 7 of FAC1_1 = 0: (2 ^ 23 + P) * 2 ^ (FAC1_e - 152)
when FAC1_e is non-zero and bit 7 of FAC1_1 = 1: -(2 ^ 23 + P) * 2 ^ (FAC1_e - 152)
Range of negative values: -2 ^ 127 + 2 ^ 103 to -2 ^ -128
Range of positive values: 2 ^ -128 to 2 ^ 127 - 2 ^ 103
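To make those formulas concrete, here is a small sketch in C (my own code, not EhBASIC's) that decodes a 4-byte packed value into a C double:
Code:
#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* Decode a packed "positive mantissa" number:
   b[0] = FAC1_e, b[1..3] = FAC1_1..FAC1_3 */
double pm_decode_packed(const uint8_t b[4])
{
    if (b[0] == 0)
        return 0.0;                          /* exponent 0 means zero */
    uint32_t p = ((uint32_t)(b[1] & 0x7F) << 16)
               | ((uint32_t)b[2] << 8) | b[3];
    double m = 8388608.0 + (double)p;        /* 2^23 + P: implied 1 restored */
    double v = m * pow(2.0, (int)b[0] - 152);
    return (b[1] & 0x80) ? -v : v;           /* bit 7 of FAC1_1 is the sign */
}

int main(void)
{
    uint8_t one[4] = { 0x81, 0x00, 0x00, 0x00 };
    printf("%g\n", pm_decode_packed(one));   /* 2^23 * 2^(129-152) = 1 */
    return 0;
}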
Advantages of this representation:
1. There is 1 more bit of (mantissa) precision as compared to a 24-bit "signed mantissa" representation. This may sound minor, but round-off errors and other such issues can rapidly surface. Proper use of the extra bit can help alleviate some of the difficulties.
2. It's more common, so it's more widely understood. Unfortunately, there are a lot of subtleties and caveats when it comes to floating point, many of which the average programmer is unaware of. There is far too little documentation of floating point as it is, and a representation that has more information available can be a great benefit.
The 24-bit "signed mantissa" representation:
X1: the 8-bit unsigned number at X1 = 0 to 255
M1: the 24-bit twos complement number at M1 = -8388608 to 8388607
M1+0: the high byte of M1
M1+1: the middle byte of M1
M1+2: the low byte of M1
The number represented by X1 and M1 is: M1 * 2 ^ (X1 - 150)
Range of negative values: -2 ^ 128 to -2 ^ -150
Range of positive values: 2 ^ -150 to 2 ^ 128 - 2 ^ 105
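And the corresponding sketch in C for this representation (again my own code, not the actual routines):
Code:
#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* Decode a "signed mantissa" number: x1 = exponent byte,
   m[0..2] = M1 high byte to low byte */
double sm_decode(uint8_t x1, const uint8_t m[3])
{
    int32_t m1 = ((int32_t)m[0] << 16) | ((int32_t)m[1] << 8) | m[2];
    if (m1 & 0x800000)
        m1 -= 1L << 24;         /* sign-extend the 24-bit twos complement */
    return (double)m1 * pow(2.0, (int)x1 - 150);
    /* note: M1 = 0 decodes to zero with no special exponent check */
}

int main(void)
{
    uint8_t one[3] = { 0x40, 0x00, 0x00 };
    printf("%g\n", sm_decode(0x80, one));   /* 2^22 * 2^(128-150) = 1 */
    return 0;
}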
Advantages of this representation:
1. Unnormalized numbers can be stored in variables, so numbers closer to zero (such as 2 ^ -150) are not limited to intermediate results.
2. There aren't separate packed and unpacked formats, so routines to convert between the two aren't needed. In addition to requiring less space, there is also a slight (a very slight) speed increase when moving numbers from a variable to the floating point "accumulator" and vice versa.
3. Calculations can be performed faster than with the "positive mantissa" representation. Neither the Wozniak & Rankin routines nor EhBASIC is really optimized for speed, so this may not be so easy to see by comparing those two implementations. For example, both perform subtraction by negating one argument and performing an addition. It would be faster to write a special subtraction routine, of course.
Anyway, for addition and subtraction, both must shift one mantissa (to "align the decimal points") and finish by normalizing the result. In between, "signed mantissa" need only add the mantissas, whereas "positive mantissa" must see if the two mantissas have the same sign, adding them if their signs are the same, or subtracting them if their signs are different. In the latter case, if the subtraction resulted in a negative mantissa, the mantissa must be negated so that it is positive. In BASIC, addition is more common than it may seem, because the NEXT in a FOR-NEXT loop performs a floating point addition to update the loop variable. So in that case it makes sense to optimize for addition.
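A sketch of just that middle step, with C integers standing in for the 6502 multi-byte arithmetic (mantissas assumed already aligned, with carry and normalization of the result left to the surrounding steps described above):
Code:
#include <stdint.h>

/* "signed mantissa": the middle step of addition is a single add of
   twos complement mantissas */
int32_t sm_add_mantissas(int32_t m1, int32_t m2)
{
    return m1 + m2;            /* normalization happens afterwards */
}

/* "positive mantissa": the same step needs a sign comparison, an add
   or a subtract, and possibly a negation to keep the mantissa positive */
void pm_add_mantissas(uint32_t m1, int s1, uint32_t m2, int s2,
                      uint32_t *m_out, int *s_out)
{
    if (s1 == s2) {                           /* same sign: just add */
        *m_out = m1 + m2;
        *s_out = s1;
    } else {                                  /* different signs: subtract */
        int64_t d = (int64_t)m1 - (int64_t)m2;
        if (d < 0) {                          /* negative result: negate it */
            d = -d;
            s1 = s2;                          /* and take the other sign */
        }
        *m_out = (uint32_t)d;
        *s_out = s1;
    }
}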
It might seem like "signed mantissa" multiplication and division routines would be slower than their "positive mantissa" counterparts. In many implementations this will be true, but it does not have to be so. Probably well over 99% of signed multiplication routines perform the multiplication using an unsigned multiplication routine to calculate product = abs(mantissa1) * abs(mantissa2), negating that product if the signs of the two mantissas are different. However, it is possible to write a signed multiplication routine that does not need to calculate either absolute value or perform a final negation of the product. When all is said and done, the speed of that type of "signed mantissa" multiplication routine and a "positive mantissa" multiplication routine will be about the same. The above is also true for division.
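One way to see that (a sketch of the idea in C, not of any particular 6502 routine): in twos complement, bit 23 of a 24-bit mantissa carries weight -2^23 rather than +2^23, so a shift-and-add multiply can subtract the last partial product instead of taking absolute values first and negating afterwards.
Code:
#include <stdint.h>
#include <stdio.h>

/* Signed shift-and-add multiply with no abs() and no final negation:
   the sign bit's partial product is subtracted because its weight
   is -2^23 */
int64_t sm_mul24(int32_t a, int32_t b)  /* a, b: 24-bit, sign-extended */
{
    int64_t prod = 0;
    for (int i = 0; i < 23; i++)
        if (b & (1L << i))
            prod += (int64_t)a * ((int64_t)1 << i);
    if (b & (1L << 23))                 /* sign bit: negative weight */
        prod -= (int64_t)a * ((int64_t)1 << 23);
    return prod;
}

int main(void)
{
    printf("%lld\n", (long long)sm_mul24(-3, 5));         /* -15 */
    printf("%lld\n", (long long)sm_mul24(-8388608, -2));  /* 16777216 */
    return 0;
}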
Also, in "positive mantissa", zero is represented by a special exponent value, not by a mantissa value, so you must specifically check the exponent for zero. That takes a few extra cycles in the most common case, when none of the function arguments is zero.
The bottom line is that the best representation to use isn't clear cut. But that's often the case.