Floating point trouble

A forum for users of EhBASIC (Enhanced BASIC), a portable BASIC interpreter for 6502 microcomputers written by Lee Davison.
8BIT
Posts: 1787
Joined: 30 Aug 2002
Location: Sacramento, CA

Floating point trouble

Post by 8BIT »

Lee,

I discovered this yesterday on v2.09. I tried it on my SBC-2, SBC-3, and Simulator. I also downloaded v2.10 and tried it on my simulator and the 6502 simulator.

If you enter PRINT 41^3<enter>, you get 68921.1 vs 68921

There are other numbers that give this error.

try FOR X=40 TO 50: PRINT X^3:NEXT

I also found it on some X^4 values.

I have not dug into this at all, or tried more number ranges.

Any thoughts?

Daryl
dclxvi
Posts: 362
Joined: 11 Mar 2004

Post by dclxvi »

Note that PRINT 41^3 is not the same as PRINT 41*41*41. EhBASIC calculates X^Y using the formula EXP(Y*LOG(X)). Since LOG is base e (~2.718), ln X is mathematically irrational for any integer X > 1, so LOG(X) (as calculated by EhBASIC) will be an approximation, not an exact value, and the usual FP caveats (e.g. round-off error) go along with that.

So if L=LOG(X), Y*L is also an approximation, and if P=Y*L, then EXP(P) is an approximation as well. So X^Y will be an approximation, which may or may not be identical to the mathematically exact value after rounding. You can check each step of the calculation, i.e. is PRINT LOG(41) correct to however many digits it displays, is 3 * the previous result correct to the appropriate number of decimal places, etc. If so, the FP routines are probably okay; you're just running into the limitations of FP.
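To make that chain of approximations concrete, here is a minimal Python sketch (not EhBASIC's actual code) that evaluates the same EXP(Y*LOG(X)) steps while rounding each intermediate to IEEE single precision, which is roughly comparable in size to EhBASIC's float format:

```python
import math
import struct

def f32(x):
    # Round a double to IEEE single precision (24-bit significand),
    # used here as a rough stand-in for EhBASIC's float format.
    return struct.unpack('f', struct.pack('f', x))[0]

# EhBASIC evaluates X^Y as EXP(Y*LOG(X)); every step rounds.
x, y = 41.0, 3.0
l = f32(math.log(x))   # LOG(41): irrational, so already approximate
p = f32(y * l)         # 3*LOG(41): the error gets scaled and re-rounded
r = f32(math.exp(p))   # EXP amplifies any error in p by ~68921

print(r)               # close to, but not reliably equal to, 68921
print(41 * 41 * 41)    # 68921 exactly -- integer multiplies never round
```

In double precision the same chain is usually accurate to every displayed digit, which is why the effect is so visible at 4-byte-float precision.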

It's possible to modify the X^Y routine to handle integer Y specially. It's not a major change, though it's not a one- or two-liner either. The EXP(Y*LOG(X)) formula will still be necessary for things like PRINT 3^2.1.

(For the same reason, 9^.5 is not the same as SQR(9), even though .5 can be represented exactly in EhBASIC. The SQR routine in EhBASIC will generate an exact value in that case. It all depends on how many special cases you want X^Y to handle.)
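A sketch of that special case in Python (hypothetical code, not the actual EhBASIC routine): detect a small non-negative integer exponent and use exact repeated multiplication (exponentiation by squaring), falling back to EXP(Y*LOG(X)) for everything else.

```python
import math

def power(x, y):
    # Hypothetical special-casing of integer exponents.
    if y == int(y) and 0 <= y <= 64:
        # Exponentiation by squaring: only exact multiplies, no LOG/EXP.
        n, result, base = int(y), 1.0, float(x)
        while n:
            if n & 1:
                result *= base   # fold in base for each set bit of n
            base *= base
            n >>= 1
        return result
    # General case, the way EhBASIC does it: EXP(Y*LOG(X)).
    return math.exp(y * math.log(x))

print(power(41, 3))   # 68921.0 -- every product fits the significand exactly
```

Each intermediate product (41, 1681, 68921) is exactly representable, so the integer path introduces no rounding at all for this input.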
8BIT
Posts: 1787
Joined: 30 Aug 2002
Location: Sacramento, CA

Post by 8BIT »

OK, thanks for the great explanation. I did try 41*41*41 and saw that it worked. I wasn't sure if it was a limitation or a bug.

thanks

Daryl
leeeeee
In Memoriam
Posts: 347
Joined: 30 Aug 2002
Location: UK

Post by leeeeee »

Yes, I do know about this. It is, as dclxvi says, a consequence of the way some functions are calculated, and for some values with some functions it gives a result accurate to as few as 14.5 bits.

I did try rewriting a lot of the math functions to use CORDIC methods, but the code with its tables was bigger, it was as slow as or slower than the current routines, and there wasn't a huge gain in accuracy.

I would stick to the basic math operators as much as you can; this will give more accurate results and may even be quicker. I think that up to the 4th or 5th power it is faster to do N*N*N... than to use N^x.

One routine that I did make shorter, faster and more accurate is SQR(), which is good for 23+ bits of accuracy and, if it is out, always returns a result that is less than the true result by under 1 bit. I could have made it good for 23.5 bits with a +/- 0.5 bit maximum error, but as it stands it is shorter and faster, and I prefer to know that any error will always give a result smaller than the true answer.

EhBASIC 68K is a little better as I wrote that from scratch and used CORDIC methods but it still returns some results with as little as 18 bits of accuracy.


It's not a bug, it's an undocumented feature.

Lee.
8BIT
Posts: 1787
Joined: 30 Aug 2002
Location: Sacramento, CA

Post by 8BIT »

leeeeee wrote:
It's not a bug, it's an undocumented feature.

Lee.
Oh yes, I had forgotten that term. Thanks Lee!!!
dclxvi
Posts: 362
Joined: 11 Mar 2004

Post by dclxvi »

As a nod to one's reaction when first encountering it, it could be referred to as a Feature, Undocumented, Calculating Kind. We could even abbreviate it. :P
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Post by BigEd »

Purely for interest:

I tracked down a UK101 emulator, and the BASIC there gives the same result (it's a 1977 Microsoft BASIC with four-byte floats).

(Looks like BBC BASIC and Spectrum BASIC both use 5-byte floats, hopefully with transcendental algorithms to match.)

There's some speculation about Microsoft's effort here:
http://www.amigau.com/68K/dg/dg25.htm
(and an accuracy benchmark too)

Also there's some material on a double precision library here:
http://www.amigau.com/68K/dg/dg16.htm
- interesting to me but probably old news to Lee.
leeeeee
In Memoriam
Posts: 347
Joined: 30 Aug 2002
Location: UK

Post by leeeeee »

Yes, I remember that, but I had forgotten just how bad some of EhBASIC's functions are.

Running the accuracy benchmark with each function pair gives the following result ..

Code:

 TAN(ATN(a)) / a - 1      6.17789E-5
 EXP(LOG(a)) / a - 1      2.81708E-7
 (a * a)^.5 / a - 1       2.22382E-7    How the Microsoft code does SQR()
 SQR(a * a) / a - 1       0             My SQR() routine
Working the numbers out into bits of accuracy gives about 14 bits for TAN(ATN(a)), which is pretty poor, and about 21 bits for EXP(LOG(a)), which is better than I remembered.

x^y is good for about 22 bits, but only if x:y is close to 1. I seem to remember that it gets worse by LOG2(x:y), so 41^3 will only be good to about 18 bits, which is what we see.
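For anyone checking the arithmetic, "bits of accuracy" here is just -LOG2 of the relative error; a quick Python check against the table above (nothing EhBASIC-specific):

```python
import math

# bits of accuracy ~= -log2(relative error), using Lee's measured figures
for name, err in [("TAN(ATN(a))", 6.17789e-5),
                  ("EXP(LOG(a))", 2.81708e-7),
                  ("(a * a)^.5",  2.22382e-7)]:
    print(name, round(-math.log2(err), 1))
# gives roughly 14.0, 21.8 and 22.1 bits respectively
```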

At least I improved the SQR() function by all of 2 bits. 8^)=

Lee.
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Post by BigEd »

There's a nice article reversing the various versions of Microsoft's BASIC here: http://www.pagetable.com/?p=46

Lots of other interesting stuff there too - including sources for 7 versions, both 4byte and 5byte floats.
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Post by BigEd »

I haven't (quite) got an environment in which to run EhBASIC - I'm nearly there.

But for interest I ran the TAN(ATN(A)) benchmark on 3 environments in emulation:

Code:

Z88 BBC BASIC V3: 3.84825359E-7
C64 BASIC V2:      2.38180225E-07
BBC 1982 BASIC:   2.08041984E-7
(I see RTRussell's re-implementation of BBC BASIC differs from the 6502 original)

These all have 5-byte floats, according to

Code:

PRINT 2^33+11-2^33
        12
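A note on that probe: 2^33+11 needs 34 significant bits, so what survives the round trip reveals the significand width. A rough Python simulation (a sketch using simple round-to-nearest at a given width, ignoring the exact Microsoft float layout):

```python
import math

def round_sig(x, bits):
    # Round x to the nearest value with the given number of
    # significant bits (simplified round-to-nearest).
    e = math.floor(math.log2(abs(x))) - (bits - 1)
    return round(x / 2**e) * 2**e

# 32-bit significand (5-byte MS float): 2^33+11 rounds to 2^33+12
print(round_sig(2**33 + 11, 32) - 2**33)   # 12, as the BASICs above print
# 24-bit significand (4-byte float): the 11 is absorbed entirely
print(round_sig(2**33 + 11, 24) - 2**33)   # 0
```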
The benchmark was

Code:

   10  I=2500
   20  FORA=1TOI
   30    B= TAN(ATN(A))/A -1
   40    Z=Z+B*B
   50    NEXT
   60  PRINT "R.M.S. error = "; SQR(Z/I)
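Purely for comparison, the same benchmark ports readily to Python; rounding each intermediate to IEEE single precision roughly mimics a 4-byte-float BASIC (a sketch only: real BASIC TAN/ATN routines round differently, so expect the same ballpark as Lee's 6.17789E-5 figure, not the same digits):

```python
import math
import struct

def f32(x):
    # round to IEEE single precision (24-bit significand)
    return struct.unpack('f', struct.pack('f', x))[0]

# BigEd's RMS benchmark with single-precision intermediates
I, Z = 2500, 0.0
for A in range(1, I + 1):
    B = f32(f32(math.tan(f32(math.atan(A)))) / A) - 1
    Z += B * B
rms = math.sqrt(Z / I)
print("R.M.S. error =", rms)
```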
BigDumbDinosaur
Posts: 9427
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Post by BigDumbDinosaur »

I wonder if EhBASIC's math would be better served by using packed BCD. It wouldn't be as fast as excess-128 FP (which is what appears to be implemented in EhBASIC), but it would be dead accurate. The absolute range isn't as great as FP's, but unless you're an astronomer or Bill Gates calculating his net worth, the range probably wouldn't be critical.
paulrsm
Posts: 24
Joined: 24 Jul 2003

Post by paulrsm »

BCD does not eliminate floating point errors; it merely changes them to different ones. Both binary and BCD can represent 1/2 exactly, but neither can represent 1/3 exactly.
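Python's Decimal type (a BCD-like base-10 format) makes the point easy to demonstrate; the working precision is set to 10 digits just for the example:

```python
from decimal import Decimal, getcontext

getcontext().prec = 10   # 10 significant decimal digits

# Decimal represents every terminating decimal exactly,
# where binary floats slip:
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
print(0.1 + 0.2 == 0.3)                                    # False

# ...but 1/3 still truncates to the working precision:
print(Decimal(1) / Decimal(3))        # 0.3333333333
print(Decimal(1) / Decimal(3) * 3)    # 0.9999999999
```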
kc5tja
Posts: 1706
Joined: 04 Jan 2003

Post by kc5tja »

Actually, floating point errors are a serious problem even for more mundane applications. For starters, floating point addition is not associative. Adding a very small number to a very large number yields round-off error that can swallow the smaller operand entirely. Consider adding 3.14159 to 10e+30: the result will be 10e+30 in most floating point math implementations. Ouch!

This is a rather large problem because some equations have terms that don't get simplified for some reason (usually programmer oversight, but there are other reasons too). E.g., a+b-a will be horrifyingly in error if a=10e30 and b=pi.
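That absorption is easy to reproduce in any double precision environment, e.g. Python:

```python
a, b = 1e30, 3.14159

# b is far below one ulp of a, so (a + b) rounds straight back to a:
print(a + b - a)   # 0.0 -- b has vanished completely
print(a - a + b)   # 3.14159 -- same terms, just reassociated
```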

Fixed-point math is the only general purpose solution to this problem, and that requires a fair bit of memory to express some of these larger figures. You can sometimes work around this by storing numbers logarithmically, or choosing more appropriate units of measurement (for a practical example, see http://www.colorforth.com/simulation.html . For Chuck Moore's philosophical basis for this, see http://www.colorforth.com/binding.html)

To express figures such as 1/3 precisely, you need to implement rational numbers, where each "number" is internally stored as a numerator/denominator pair. Each component of the pair could be an integer, fixed-, or floating-point representation as appropriate for your needs, though most use integer. The PC/GEOS operating system, for example, utilized rationals extensively in its GUI, on an 8088 no less, to implement scaling, skewing, rotations, and other kinds of transforms.
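Python's standard library happens to ship such a rational type, which makes the idea easy to demonstrate:

```python
from fractions import Fraction

# Each value is a numerator/denominator pair, so 1/3 is stored exactly:
third = Fraction(1, 3)
print(third * 3 == 1)                     # True -- no 0.999... residue
print(third + third + third)              # 1

# Arithmetic stays exact as long as the values remain rational:
print(Fraction(1, 10) + Fraction(2, 10))  # 3/10
```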

Most impressive.
BigDumbDinosaur
Posts: 9427
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Post by BigDumbDinosaur »

paulrsm wrote:
BCD does not eliminate floating point errors; it merely changes them to different ones. Both binary and BCD can represent 1/2 exactly, but neither can represent 1/3 exactly.
I think you are confusing floating point math limitations with ASCII-to-floating-point conversion errors. Any ASCII representation of a floating point number can be converted exactly to BCD if the BCD implementation is able to maintain the required precision. That wouldn't necessarily be true with binary formats such as excess-128 or IEEE floating point.

More to the point are operations that produce a non-terminating result, such as 1 divided by 3 (0.333333...). The operands themselves may have made it through ASCII-to-internal-format conversion exactly, but when subjected to division or various transcendental operations they produce a result that no finite-precision format can hold. That limitation would apply to any computing device, regardless of the internal number format in use.
x86?  We ain't got no x86.  We don't NEED no stinking x86!