6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Thu Sep 19, 2024 10:31 pm

All times are UTC




Post new topic Reply to topic  [ 12 posts ] 
Author Message
 Post subject: Woz FP revisited
PostPosted: Sun Aug 11, 2019 6:24 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
Hello, friends!

In an earlier thread I indicated that I would be revisiting the 32-bit floating point package composed by Roy Rankin and the indomitable Steve Wozniak over four decades ago. It has been an interesting journey so far, and it is far from complete, but I wanted to share a snapshot of my work for discussion and critique. Please be brutally honest. I have much more "in progress" but it's not ready for prime-time.
Code:
      ...
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; floatyxa: convert a 24-bit signed integer in Y:X:A
;   (H:M:L) into a normalized 32-bit floating point
;   number in facc.
; Expects: the high byte in register Y, the mid byte
;   in register X and the low byte in register A.
; Returns: the normalized value in facc.
; 19 bytes
floatyxa:
      sty   facc+1      ;
      stx   facc+2      ; initialize significand
      sta   facc+3      ;
      ldx   #facc       ; point to facc
      ora   facc+2      ; is significand zero?
      ora   facc+1      ;
      beq   normz       ; yes: clear exp and return
      tya               ; no: load sig.h,
      ldy   #$96        ;   init exp to 2^22,
      bne   norm2       ;   normalize
                        ;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; fsub: subtract two floating point numbers.
; Expects: the minuend in farg and the subtrahend in
;   facc; both should be normalized prior to calling
;   fsub to retain as much precision as possible.
;   A subtrahend of -2^128 will trigger an overflow.
; Returns: the normalized difference (farg - facc) in
;   facc, "rounded" to contain the 24 most-significant
;   significand bits (including sign); may alter farg.
; 5 bytes
fsub:
      ldx   #facc       ; point to subtrahend
      jsr   fnegx       ; negate it and fall through
                        ;
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
; fadd: add two floating point numbers.
; Expects: the two addends in facc and farg; both
;   should be normalized prior to calling fadd to
;   retain as much precision as possible.
; Returns: the normalized sum (farg + facc) in facc,
;   "rounded" to contain the 24 most-significant
;   significand bits (including sign); may alter farg.
; 61 bytes
fadd:
      ldx   #farg       ;
      lda   facc        ; check significand alignment
      sec               ;
      sbc   farg        ; compare the exponents
      bcs   align       ; align farg to match facc
      ldy   farg        ;
      sty   facc        ;
      ldx   #facc       ; align facc to match farg
      eor   #$ff        ; negate shift counter
      adc   #1          ;
align:
      clc               ; pre-clear "guard bit"
      beq   aligned     ; aligned if 0 shift count
      tay               ; init shift counter
      lda   1,x         ; sig.h
      cpy   #26         ; no need to asr > 25 bits
      bcc   align2      ;
      ldy   #25         ;
align2:
      cmp   #$80        ; copy sign to carry
      ror               ;
      ror   2,x         ; asr significand
      ror   3,x         ;
      dey               ;
      bne   align2      ;
      sta   1,x         ;
aligned:
      lda   farg+3      ; add aligned significands
      adc   facc+3      ; ("guard bit" from the
      sta   facc+3      ;   alignment is in carry)
      lda   farg+2      ;
      adc   facc+2      ;
      sta   facc+2      ;
      lda   farg+1      ;
      adc   facc+1      ;
      ldx   #facc       ; point to facc, fall through
                        ;
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
; normx: normalize fp# in four consecutive bytes of ZP
;   (exp:sig.h:sig.m:sig.l).
; Expects: pointer to fp# in register X, sig.h in
;   register A, result of last arithmetic operation on
;   sig.h in C and V flags.
; Returns: normalized number in fp# if possible (punts
;   too large to ovfl & leaves too small denormalized).
; 29 bytes
normx:
      ldy   0,x         ; load exponent
      bvc   norm3       ; no overflow: just normalize
      ror               ; significand overflowed: ror
      ror   2,x         ;   significand, carry to msb
      ror   3,x         ;
      iny               ; increment exp
      beq   ovfl        ; result overflowed: punt
norm2:
      cmp   #$c0        ; sig.h b7 = b6 ? (neat!)
      bmi   norm4       ; no: done
      asl   3,x         ; yes:
      rol   2,x         ;   left shift significand
      rol               ;
      dey               ;   and decrement exp
norm3:
      bne   norm2       ; loop unless denormal
norm4:
      sta   1,x         ; store sig.h
normz:
      sty   0,x         ; store exp and return
      rts               ;
                        ;
      ...
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
; fabsx: replace fp# in four consecutive bytes of ZP
;   (exp:sig.h:sig.m:sig.l) with its normalized
;   absolute value.
;   An input of -2^128 will trigger an overflow.
; Expects: pointer to fp# in register X.
; Returns: normalized absolute value of fp#.
; 7 bytes
fabsx:
      clv               ;
      lda   1,x         ; is fp# negative?
      bpl   normx       ; no: just normalize
      inc   sgn         ; complement sign bit
                        ;
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
; fnegx: negate and normalize fp# in four consecutive
;   bytes of ZP (exp:sig.h:sig.m:sig.l).
;   An input of -2^128 will trigger an overflow.
; Expects: pointer to fp# in register X.
; Returns: negated and normalized value of fp#.
; 20 bytes
fnegx:
      sec               ; negate fp#
      lda   #0          ;
      sbc   3,x         ;
      sta   3,x         ;
      lda   #0          ;
      sbc   2,x         ;
      sta   2,x         ;
      lda   #0          ;
      sbc   1,x         ;
      jmp   normx       ; normalize (or asr if ovfl)
                        ;
      ...

Aside from detangling the original code in the interests of clarity and a modest speed increase, I have also added a "guard bit" in the carry flag to assist in the "rounding up" of results in which the alignment-shifted addend lost a "1" most recently off the right end of its significand. I realize that this may not be a rigorous (or even correct) method of maintaining precision, and I would definitely like your input on this matter as I continue to put the final touches on the multiply and divide subroutines, hopefully coming soon. After that, fix, log, log10, exp, and a custom sqr (assuming the stars align appropriately)!

Please note that these snippets are a work in progress, and are completely untested at this time.

Many thanks!

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
 Post subject: Re: Woz FP revisited
PostPosted: Mon Aug 12, 2019 1:31 am 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
A single guard-bit is an improvement over none at all. However, it only adds the ability to represent the exact midpoint between two values representable in the finally rounded format. It effectively lets you see some cases in which the rounding might be ambiguous, but doesn't really let you disambiguate them reliably. Nevertheless, the best way to use it is to round towards the nearest *even* representable value, ie. with the last bit zero, when the guard bit is set.

IEEE-754 uses three rounding bits - Guard, Round, and Sticky. The two individual bits are present, I think, to handle certain cases in the core algorithms where it is necessary to shift the whole significand left by a bit before final rounding. You may or may not think this is necessary to replicate in a compact homebrew system. More important I think is the Sticky bit, which indicates whether *any* bits to the right of the main rounding bits would be set in the infinitely-precise result. It's easier to determine this than you might think, even for transcendental functions (in which you can assume the Sticky bit is set except for certain exact special cases, such as log(1)==0). For addition, subtraction, multiplication and division, you can calculate it based on bits shifted out to the right, or the final value of the remainder.

What the Sticky bit gives you is a precise indication of whether the infinitely-precise result lies *exactly* on the representable or halfway values, or in the intervals between them. This is enough information to reliably perform correct rounding in any of the accepted modes.


Top
 Profile  
Reply with quote  
 Post subject: Re: Woz FP revisited
PostPosted: Mon Aug 12, 2019 3:17 am 
Offline
User avatar

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 452
Location: Canada
Quote:
Please be brutally honest.

I find this avenue of posts interesting. I had some interest in the Woz code a while back converting it to Verilog to get the ultimate performance boost.
Since you are re-writing the code anyway, I’d suggest sticking to the IEEE format (or how about an IEEE compatible build option) even though it may be a few cycles slower. 40 years ago, there were many different floating-point formats. Since then most everything has gone IEEE. Kinda killed the creative / originality aspect of things but a great boon to compatibility and correctness.
It would make it easier to port “foreign” software which is likely using the IEEE format. Is there a planned target for this code? You mention the code is likely to be 50% larger and a lot faster. The 50% larger code isn’t a problem for the ‘816, but might be for some 6502 systems. With ‘816 instructions the code could likely be a lot faster and shorter. A 16-bit exponent field would probably be desirable with ‘816 though, so maybe 80-bit double extended arithmetic would be easier.

I found writing I/O routines to print and input floating-point numbers challenging. It's likely several kBytes of code by itself. It maybe larger than providing the basic operations but probably necessary too.

_________________
http://www.finitron.ca


Top
 Profile  
Reply with quote  
 Post subject: Re: Woz FP revisited
PostPosted: Mon Aug 12, 2019 3:31 am 
Offline

Joined: Sat Aug 21, 2010 7:52 am
Posts: 231
Location: Arlington VA
I'm really glad to have spotted this here, as an alternative to the floating point package I was planning on PETTIL (a general purpose Commodore Forth).

4-byte values are a natural for Forth, and I wouldn't have to rely on the kludgy swappy approach I was going to go with, because BASIC's floating point area inconveniently overlaps my Forth data stack. It might be fun to compare your code, the original code, and then code golf this down and make it fit in my Forth


Top
 Profile  
Reply with quote  
 Post subject: Re: Woz FP revisited
PostPosted: Mon Aug 12, 2019 7:04 am 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
The problem with the IEEE-754 32-bit format is that it has an 8-bit exponent field that is not byte-aligned. That makes it just that little bit more awkward than it needs to be to use on an 8-bit machine.

If you swap the order of the exponent and sign fields, it should be a lot easier to work with. It's probably better to write a routine which does that only when you need to exchange values with another machine or store it in a file, rather than for every operation. This makes your internal format different from the IEEE format, but your core routines are faster and smaller.


Top
 Profile  
Reply with quote  
 Post subject: Re: Woz FP revisited
PostPosted: Fri Aug 16, 2019 8:19 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
Thank you all. I certainly recognize the benefits of IEEE-754, but that is simply not my focus here. Regarding the guard, round and sticky bits ... I believe I will experiment with different revisions and (perhaps with your help) analyze the trade-offs in code size, speed and accuracy. My present focus is to get the elementary functions working, then tweak the improvements into the mix as desired. If the accuracy improvements place the executable out of reach for a severely memory-constrained system, I want to provide an opportunity for the potential user to choose the best solution for him or her. Of course, I'm not going to beat the original densely-packed Woz code for size, so that'll remain as the base-line solution.
Rob Finch wrote:
I found writing I/O routines to print and input floating-point numbers challenging. It's likely several kBytes of code by itself. It maybe larger than providing the basic operations but probably necessary too.

I agree that this is a non-trivial coding exercise. Do you have any links to examples that could set me on a proper path? The BASIC sources I've browsed through so far haven't been particularly inspirational. [Edit: It looks like Gates and company coded FIN and FOUT in a little under 600 total bytes of code and tables.] Initially, I may just provide a patch that converts back-and-forth between Woz floats and MS floats, and use the MS I/O routines during my evaluation phases. My favorite test mule is the Apple ][+, so I will have a "capable" and familiar host for my experiments.

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
 Post subject: Re: Woz FP revisited
PostPosted: Wed Aug 28, 2019 3:39 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
Update: I have decided that FIN and FOUT are going to be my next attempts, since they are essential to visibly debugging the other operations, and because I'm interested in redesigning the old MS routines, which appear to have been written rather hurriedly. I am confident that I can do better, especially in the absence of the time pressure Gates and company no doubt were feeling in that distant era.

FIN seems simple enough ... consume buffered ASCII chars from the set { 0 1 2 3 4 5 6 7 8 9 . + - E } until an unexpected char is encountered. Convert ASCII digits to integer binary with an unsigned integer multiply/accumulate, keep track of the decimal point location, and treat exponential notation as an optional/additional method of shifting the decimal point location. As soon as an unexpected char is encountered, FP multiply or FP divide the accumulator by the appropriate power of ten, apply the sign and return.

In a bit more detail:

a) clear the sign flag, the main integer accumulator, the exponent accumulator and the decimal point location
b) grab a + or - if it's there and save it in the sign flag
c) grab any digits up to a . or E or unexpected, add each digit to the main accumulator*10
d) if . then continue grabbing and accumulating any following digits, decrement the decimal point location
e) if E then stop the main integer accumulation and form the signed decimal exponent in the smaller exponent accumulator
f) "float" the integer accumulator to a value between 0.0 and 8388607.0 [Q: What to do with an input like 838.8608 or 10000000?]
g) add the decimal point location to the exponent accumulator and use the result in an FP *10.0 or /10.0 loop, as appropriate
h) apply the sign and return

FOUT is a little bit more complex, because it needs to decide the thresholds between standard and exponential notations, remove leading and trailing zeros, and perform some rudimentary rounding. Perhaps my initial attempt will always output in the "+3.141593E+00" format, and I'll add the other niceties after some initial debugging is completed.

I'm already deep into FIN, but I'm not ready to share it yet especially since I just noticed that I'm gonna need to get FMUL and FDIV working as well ... just checking in with you all!

[Edited some grievous errors caused by posting after bed-time again!]

[Edit: it looks like I might be able to answer the question in f) above by keeping 24 unsigned bits in the accumulator and skipping the *+ for any accumulator high byte > #$18 while still consuming digits. If I'm discarding digits to the left of the decimal point I increment the decimal point location, and if I'm discarding digits to the right of the decimal point I just ignore them completely.]

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
 Post subject: Re: Woz FP revisited
PostPosted: Thu Aug 29, 2019 1:40 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Note sure if this is any help, but I was musing about how many significant digits need to be accepted by a decimal to float input routine. And how things should be rounded. It feels like it might be valid to accumulate the mantissa at much higher precision than the eventual target precision.

Here's BBC Basic in action - not sure if these results are as you'd expect, but it might be food for thought.
Attachment:
Screenshot 2019-08-29 at 14.38.16.png
Screenshot 2019-08-29 at 14.38.16.png [ 82.09 KiB | Viewed 1446 times ]


Top
 Profile  
Reply with quote  
 Post subject: Re: Woz FP revisited
PostPosted: Fri Aug 30, 2019 4:44 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
Here's how the Apple ][+ responds to your inputs using its MS 40-bit floats:

Attachment:
msfloat40.JPG
msfloat40.JPG [ 62.46 KiB | Viewed 1416 times ]
Similar, but not quite as "accurate" on average.

Here's how the TRS-80 Model III responds to the same inputs with its MS 32-bit floats:

Attachment:
ms32float.JPG
ms32float.JPG [ 40.56 KiB | Viewed 1415 times ]
There is an automatic promotion (including a brutal speed penalty) to MS 64-bit floats for longer input strings.

If I ever get to that point with Woz floats, I'll do the same experiment. In the meantime, what other kinds of inputs could you imagine that might shed some light on a few of the darker corner-cases?

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
 Post subject: Re: Woz FP revisited
PostPosted: Fri Aug 30, 2019 6:16 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8509
Location: Southern California
barrym95838 wrote:
In the meantime, what other kinds of inputs could you imagine that might shed some light on a few of the darker corner-cases?

http://www.rskey.org/~mwsebastian/miscprj/results.htm compares the accuracy of lots of different calculators and computers for:
arcsin (arccos (arctan (tan (cos (sin (9) ) ) ) ) )
in degrees mode.

Other pages that may be of (lesser) interest are:
https://www.hpmuseum.org/forum/thread-4619.html
https://www.hpmuseum.org/forum/thread-5940.html
https://www.hpmuseum.org/forum/thread-9744.html

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
 Post subject: Re: Woz FP revisited
PostPosted: Fri Aug 30, 2019 6:42 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
AFAIK, none of the 8-bit MS BASICs included arcsin(), arccos() or degree mode, but I tried the following Apple ][+ experiment:

Attachment:
ms40float.JPG
ms40float.JPG [ 43.29 KiB | Viewed 1406 times ]

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
 Post subject: Re: Woz FP revisited
PostPosted: Fri Aug 30, 2019 6:49 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
barrym95838 wrote:
In the meantime, what other kinds of inputs could you imagine that might shed some light on a few of the darker corner-cases?


I tried to find some references about this. I didn't quite find what I wanted, but I did find:


As explained in that final link, for a different question, round-trip tests are not quite what we're interested in here, but they might point to some interesting test cases.

Here's what BBC Basic did with a few tests:
Attachment:
Screenshot 2019-08-30 at 07.45.19.png
Screenshot 2019-08-30 at 07.45.19.png [ 149.15 KiB | Viewed 1404 times ]


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 12 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 20 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: