6502.org • View topic - 128 bit Floating Point 65C816 implementation

All times are UTC

128 bit Floating Point 65C816 implementation

granati

Post subject: Re: 128 bit Floating Point 65C816 implementation

Posted: Wed Oct 24, 2018 6:19 am

Joined: Mon Jun 24, 2013 8:18 am
Posts: 83
Location: Italy

GARTHWILSON wrote:

granati wrote:

p.s.: the source code is available in the downloadable archive in my site

Is that http://65xx.unet.bz/ ? (It's not in your profile or signature line, but it's the base of your link above which gives a 404 error message.) I cannot find the source code in that domain. It is archived though at https://web.archive.org/web/20170923012 ... 80/fpu.txt .

OK, the link is online again. I also added a link to download the source code archive (still in development) for my hardware. The archive contains a full working system (BIOS for video, keyboard, serial port, floppy disk and ATA disk, the floating-point implementation, a full-screen multi-window text editor, a monitor with assembler/disassembler, a first attempt at implementing FAT16, a real-time clock, and some library functions for strings and command-line parsing).
Note that it also contains my first implementation of the floating-point extended format (IEEE 80-bit), limited to the four basic operations.

The source code archive also contains an implementation of a "C64 simulator", still in development, but with a fully working Pascal compiler adapted from the "Oxford Pascal compiler for C64".

For those who asked whether code exists to implement IEEE 32- or 64-bit floating point, I answer: no, but... the code for 80- or 128-bit floating point can easily be adapted; just use the right number of mantissa bytes and the right biased exponent. In any case, all intermediate computations should be done in at least the 80-bit format to avoid losing precision; after rounding, the final result can be converted to the 64-bit format. As for the 32-bit format, in my opinion the Microsoft format that uses 5 bytes is better (even if its excess-128 biased exponent leaves no room for handling gradual underflow, but this can easily be changed).
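
To illustrate the "right biased exponent" part, here is a minimal sketch in ca65 syntax of re-biasing a binary128 exponent for binary64. The biases are the ones fixed by IEEE 754; the label `rebias128to64` and the direct-page word `exp` are hypothetical names chosen for this example and are not taken from fpu.txt. It assumes native mode with a 16-bit accumulator and that the sign bit has already been stripped from the exponent word.

```asm
.p816                           ; enable 65816 instructions in ca65
.a16                            ; this sketch assumes a 16-bit accumulator
.i16

; Exponent biases defined by IEEE 754
BIAS32  = 127                   ; binary32:   8-bit exponent,  23 fraction bits
BIAS64  = 1023                  ; binary64:  11-bit exponent,  52 fraction bits
BIAS128 = 16383                 ; binary128: 15-bit exponent, 112 fraction bits
                                ; (the 80-bit extended format uses the same bias)

exp     = $10                   ; HYPOTHETICAL direct-page word: biased exponent, sign removed

; Re-bias the exponent at `exp` from binary128 to binary64.
; Returns carry clear if the result is a normal binary64 exponent (1..2046).
; Returns carry set, with `exp` unchanged, if the value needs special handling
; (zero, subnormal, overflow, infinity or NaN).
rebias128to64:
        lda exp
        sec
        sbc #BIAS128-BIAS64     ; subtract the difference of the biases (15360)
        bcc @special            ; borrow: exponent too small for a normal binary64
        beq @special            ; 0 is reserved for zero and subnormals
        cmp #2047
        bcs @special            ; 2047 and above: overflow, infinity or NaN
        sta exp
        clc
        rts
@special:
        sec
        rts
```

Rounding the shortened mantissa can still carry into the exponent, so a real conversion would presumably run this range check after rounding rather than before.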

Marco

_________________
http://65xx.unet.bz/ - Hardware & Software 65XX family

GARTHWILSON

Post subject: Re: 128 bit Floating Point 65C816 implementation

Posted: Wed Oct 24, 2018 6:30 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California

Thanks. Is there a way we can give a URL that takes someone directly to various pages on the site, so we don't have to say, "Go to http://65xx.unet.bz/ and then click on [...] halfway down the left edge"?

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?

granati

Post subject: Re: 128 bit Floating Point 65C816 implementation

Posted: Wed Oct 24, 2018 6:44 am

Joined: Mon Jun 24, 2013 8:18 am
Posts: 83
Location: Italy

Of course, Garth:

http://65xx.unet.bz/fpu.txt - 128-bits Floating Point
http://65xx.unet.bz/strings.txt - Small Strings Library
http://65xx.unet.bz/mb01/B1606.zip - Full Source Code Archive
http://65xx.unet.bz/mb01/KEYB.zip - Keyboard Controller Firmware Source Code

Marco

BigDumbDinosaur

Post subject: Re: 128 bit Floating Point 65C816 implementation

Posted: Wed Oct 24, 2018 4:17 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8503
Location: Midwestern USA

granati wrote:

http://65xx.unet.bz/fpu.txt - 128-bits Floating Point

Which assembler did you use to put this together?

_________________
x86? We ain't got no x86. We don't NEED no stinking x86!

granati

Post subject: Re: 128 bit Floating Point 65C816 implementation

Posted: Wed Oct 24, 2018 5:35 pm

Joined: Mon Jun 24, 2013 8:18 am
Posts: 83
Location: Italy

BigDumbDinosaur wrote:

granati wrote:

http://65xx.unet.bz/fpu.txt - 128-bits Floating Point

Which assembler did you use to put this together?

The assembler and linker from the 2500AD assembler series (Avocet Systems, Inc.), but I think they are no longer available.
If you want more information, please contact me by private message.

GARTHWILSON

Post subject: Re: 128 bit Floating Point 65C816 implementation

Posted: Wed Oct 24, 2018 6:33 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California

Sure; but I mean all the html pages linked in the top half of the list along the left edge. If I click on any of them, it goes to them, but the URL in the URL bar does not change, so copy-and-pasting the link there to give to someone won't get them to the right page, only the front page. Very nice-looking boards, by the way!

Oh, and the 2500AD assembler was the first one I used. It was very good, but expensive. Avocet was a competitor at the time (approximately 1987), but apparently bought 2500AD later.

barrym95838

Post subject: Re: 128 bit Floating Point 65C816 implementation

Posted: Wed Oct 24, 2018 9:04 pm

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA

GARTHWILSON wrote:

... If I click on any of them, it goes to them, but the URL in the URL bar does not change, so copy-and-pasting the link there to give to someone won't get them to the right page, only the front page. Very nice-looking boards, by the way!

...

If I right-click on any of the left-column links and select "Copy link address" from the pop-up context menu then I have no problem pasting the specific page link. Using Chrome on Windoze.

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)

GARTHWILSON

Post subject: Re: 128 bit Floating Point 65C816 implementation

Posted: Wed Oct 24, 2018 11:38 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California

barrym95838 wrote:

If I right-click on any of the left-column links and select "Copy link address" from the pop-up context menu then I have no problem pasting the specific page link. Using Chrome on Windoze.

Interesting. I tried it and it works for me too, with FF under Linux. So why does the URL bar still continue to say only http://65xx.unet.bz/ ?

granati

Post subject: Re: 128 bit Floating Point 65C816 implementation

Posted: Thu Oct 25, 2018 5:30 am

Joined: Mon Jun 24, 2013 8:18 am
Posts: 83
Location: Italy

GARTHWILSON wrote:

barrym95838 wrote:

If I right-click on any of the left-column links and select "Copy link address" from the pop-up context menu then I have no problem pasting the specific page link. Using Chrome on Windoze.

Interesting. I tried it and it works for me too, with FF under Linux. So why does the URL bar still continue to say only http://65xx.unet.bz/ ?

Because of the site's frame structure: a left frame for the menu and a right frame for the target page. The URL bar reports only the URL of the page that creates the frames.

granati

Post subject: Re: 128 bit Floating Point 65C816 implementation

Posted: Thu Oct 25, 2018 5:08 pm

Joined: Mon Jun 24, 2013 8:18 am
Posts: 83
Location: Italy

Now all source code is browsable online at:

http://65xx.unet.bz/repos

I also put the source code for the GALs online (it was missing until now).

Marco

tmr4

Post subject: Re: 128 bit Floating Point 65C816 implementation

Posted: Wed Jul 20, 2022 6:36 pm

Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147

I've added this floating-point package into my system (after porting the source to the CA65 assembler). The package requires its own direct page, using almost the entirety of it, and thus requires some setup and recovery before and after calling a floating-point routine. This seems like a perfect situation for the COP instruction. It looks like this was the intention for the original package, but I couldn't find any indication that it was ever implemented for it.

I worked up my own COP service routine to handle the floating-point prep and post work but was surprised that it took more than double the cycles and bytes of simply calling separate prep and post routines from the body of the calling routine. Getting the COP signature byte added a good chunk of the extra cycles/bytes.

For example, here is a skeletonized snippet using COP taking 50 cycles and 21 bytes (not including common instructions with the comparison below):

Code:

; calling routine
        ...
        cop 1        ; 8 cycles, 2 byte(s)      COP vector: fp_csr
        ...

fp_csr:
        ...          ; fp_prep

        ; get COP signature byte
        lda ADDR,s   ; 5         2
        dec          ; 2         1
        sta F2       ; 4         2
        lda BB,s     ; 5         2
        sta F2+2     ; 4         2
        lda [F2]     ; 7         2
        and #$ff     ; 3         3
        cmp #1       ; 2         2
        beq @fp1     ; 3         2
        ...

@fp1:
        jsr fp1
        ...
        ...          ; fp_post

        rti          ; 7         1
                     ; 50        21

Alternatively, here is a snippet using a macro that automates the prep and post calls, taking only 24 cycles and 8 bytes:

Code:

.macro fp1_call
        jsr fp_prep  ; 6 cycles, 3 byte(s)
        jsr fp1
        jsr fp_post  ; 6         3
.endmacro

; calling routine
        ...
        fp1_call
        ...

fp_prep:
        ...
        rts          ; 6         1

fp_post:
        ...
        rts          ; 6         1
                     ; 24        8

To be fair, each floating-point call in the latter case takes 9 bytes (including the call to the floating-point routine) versus just 2 bytes for the former (not counting the single call in the COP service routine). Edit 7/21/2022: actually, it’s much more than 9 bytes as the JSR calls here won’t work for the prep and post work without pulling and replacing the return address on the stack which takes 39 cycles and 16 bytes. With that we might as well use the COP instruction. Thus, to maintain speed we need to make the prep/post work macros, dramatically increasing the size of each call to the floating-point library. But speed seems more important for a floating-point package, so the latter case seems better. Am I missing something? How have others handled calls to this floating-point package?

Last edited by tmr4 on Thu Jul 21, 2022 11:59 pm, edited 1 time in total.

BigEd

Post subject: Re: 128 bit Floating Point 65C816 implementation

Posted: Wed Jul 20, 2022 8:13 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England

I'm not sure I see the merit in trying to use COP for something performance critical. It's just another means of calling code, and if JSL works better for you, I'd suggest using it. Either way you need to define your ABI and stick to it.

BigDumbDinosaur

Post subject: Re: 128 bit Floating Point 65C816 implementation

Posted: Wed Jul 20, 2022 8:19 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8503
Location: Midwestern USA

tmr4 wrote:

I've added this floating-point package into my system... The package requires its own direct page, using almost the entirety of it, and thus requires some setup and recovery before and after calling a floating-point routine. This seems like a perfect situation... I worked up my own COP service routine to handle the floating-point prep and post work but was surprised that it took more than double the cycles and bytes of simply calling separate prep and post routines from the body of the calling routine. Getting the COP signature byte added a good chunk of the extra cycles/bytes.

My current firmware for my POC units uses a COP ISR to dispatch API services. The caller uses COP #<api_num> to select the service it desires. I monkeyed around with several techniques for fetching the COP signature in an effort to minimize execution time. Here is the first one I tried:

Code:

03808  00D238  C2 0C         icopa    rep #sr_bdm|sr_irq    ;decimal mode off, IRQs on  (3)
03809  00D23A  E2 30                  sep #m_setr           ;8-bit registers            (3)
03810  00D23C  3B                     tsc                   ;COP stack frame is...      (2)
03811  00D23D  5B                     tcd                   ;now direct page            (2)
03812  00D23E  A9 03                  lda #sr_zer|sr_car    ;C = 0 & Z = 0 is...        (2)
03813  00D240  14 0A                  trb cop_srx           ;default return             (6)
03814  00D242  C2 20                  rep #m_seta           ;16-bit accumulator         (3)
03815  00D244  C6 0B                  dec cop_pcx           ;point to API index, ...    (8)
03816  00D246  A7 0B                  lda [cop_pcx]         ;get it &...                (8)
03817  00D248  AA                     tax                   ;protect it                 (2)
03818  00D249  E6 0B                  inc cop_pcx           ;point to return address    (8)
03819  00D24B  A9 00 00               lda !#kerneldp        ;set kernel's...            (3)
03820  00D24E  5B                     tcd                   ;direct page                (2)
03821  00D24F  4B                     phk                   ;set kernel's...            (3)
03822  00D250  AB                     plb                   ;working bank               (4)
03823  00D251  8A                     txa                   ;restore API index          (2)
03824  00D252  E2 20                  sep #m_seta           ;8-bit accumulator          (3)
03825  00D254  F0 22                  beq .apierr           ;zero API index not allowed (2 or 3)
03826  ;         
03827  00D256  3A                     dec                   ;no, zero-align             (2)
03828  00D257  C9 15                  cmp #maxapi           ;API index in range?        (2)
03829  00D259  B0 1D                  bcs .apierr           ;no, error                  (2 or 3)
03830  ;
03831  00D25B  0A                     asl A                 ;yes, convert to service... (2)
03832  00D25C  AA                     tax                   ;lookup table offset        (2)
03833  00D25D  C2 10                  rep #m_setx           ;16-bit index               (3)
03834  00D25F  FC 35 E7               jsr (apifntab,x)      ;run API processor          (8)

Instruction execution times are in parentheses. Symbols such as cop_srx and cop_pcx refer to elements of the stack frame generated at the front end of the COP ISR.

The above uses the trick of pointing direct page to the stack. It does work pretty well, but I just knew there had to be a faster method. Deep pondering led to the following code, which includes the front and back ends of the COP ISR:

Code:

03796  00D22E  C2 30         icop     rep #m_setr           ;16-bit registers           (3) ——
03797  00D230  5A                     phy                   ;save MPU state             (4)   |
03798  00D231  DA                     phx                   ;                           (4)   |
03799  00D232  48                     pha                   ;                           (4)   | —> 28
03800  00D233  0B                     phd                   ;                           (4)   |   1.75
03801  00D234  8B                     phb                   ;                           (4)   |
03802  00D235  6C 04 01               jmp (ivcop)           ;take COP indirect vector   (5) ——
03803  ;
03804  ;   ———————————————————————————————————————————————————————————
03805  ;   COP processing re-enters here if IVCOP hasn't been altered.
03806  ;   ———————————————————————————————————————————————————————————
03807  ;
03808  00D238  C2 1C         icopa    rep #m_setx|sr_bdm|sr_irq ;                       (3) ——
03809  00D23A  E2 20                  sep #m_seta           ;8 bit accumulator          (3)   |
03810  00D23C  3B                     tsc                   ;COP stack frame is...      (2)   |
03811  00D23D  5B                     tcd                   ;direct page                (2)   |
03812  00D23E  A9 03                  lda #sr_zer|sr_car    ;C = 0 & Z = 0 is...        (2)   |
03813  00D240  14 0A                  trb cop_srx           ;default return             (6)   |
03814  00D242  A5 0D                  lda cop_pbx           ;get caller's bank &...     (4)   |
03815  00D244  A4 0B                  ldy cop_pcx           ;caller's return address    (5)   |
03816  00D246  48                     pha                   ;set caller's bank as...    (3)   |
03817  00D247  AB                     plb                   ;a temporary data bank      (4)   |
03818  00D248  C2 20                  rep #m_seta           ;16-bit accumulator         (3)   |
03819  00D24A  A9 00 00               lda !#kerneldp        ;return to kernel's...      (3)   |
03820  00D24D  5B                     tcd                   ;direct page                (2)   | —> 80
03821  00D24E  A9 00 00               lda !#0               ;clear .B                   (3)   |   5.00
03822  00D251  E2 20                  sep #m_seta           ;8-bit accumulator          (3)   |
03823  00D253  88                     dey                   ;point at API index &...    (2)   |
03824  00D254  B9 00 00               lda mm_ram,y          ;fetch it                   (4)   |
03825  00D257  4B                     phk                   ;return to...               (3)   |
03826  00D258  AB                     plb                   ;kernel's working bank      (4)   |
03827  00D259  3A                     dec A                 ;zero-align API index       (2)   |
03828  00D25A  C9 15                  cmp #maxapi           ;API index in range?        (3)   |
03829  00D25C  B0 11                  bcs .apierr           ;no, error                  (2+)  |
03830  ;
03831  00D25E  0A                     asl A                 ;yes, convert to service... (2)   |
03832  00D25F  AA                     tax                   ;lookup table offset        (2)   |
03833  00D260  FC 2C E7               jsr (apifntab,x)      ;run API processor          (8) ——
03834  00D263  E2 30                  sep #m_setr           ;assure 8-bit registers     (3) ——
03835  00D265  B0 0A                  bcs .pxerr            ;processing exception       (2+)  |
03836  ;
03837  00D267  C2 30         .done    rep #m_setr           ;16-bit registers           (3)   |
03838  00D269  AB                     plb                   ;restore MPU state          (4)   |
03839  00D26A  2B                     pld                   ;                           (5)   | —> 34
03840  00D26B  68                     pla                   ;                           (5)   |   2.13
03841  00D26C  FA                     plx                   ;                           (5)   |
03842  00D26D  7A                     ply                   ;                           (5)   |
03843  00D26E  40                     rti                   ;return to caller           (7) ——

Numbers to the right of the bracketed groups of instructions are the total number of clock cycles for that group and the theoretical execution time in microseconds.

With all that said, I don't think COP is the way to go. As Ed notes, floating point tends to be time-critical. For doing a direct page context switch when calling a floating point routine, I'd treat it as an in-line routine, not a subroutine or interrupt. Doing so makes the code slightly bigger, but also substantially faster. You should be able to just push the current value of DP (direct page register), load the register with the new direct page and be on your way. A common return for all user-called floating point routines would pull DP and then exit back to the caller.
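
A minimal sketch of that in-line approach, in ca65 syntax. The names `FP_DP`, `fp_call` and the `fp1` stub are made up for illustration and the address $0200 is an assumption; none of them come from the package. This variant uses PEA/PLD instead of loading the accumulator, so it leaves the accumulator untouched and works with either register width.

```asm
.p816                   ; enable 65816 instructions in ca65

FP_DP   = $0200         ; ASSUMED bank-0 address of the package's direct page

; fp_call: hypothetical wrapper macro, not part of the package
.macro  fp_call routine
        phd             ; 4 cycles, 1 byte   save the caller's direct page
        pea FP_DP       ; 5         3        push the package's direct page...
        pld             ; 5         1        ...and pull it into D
        jsr routine     ; 6         3        call the floating-point routine
        pld             ; 5         1        restore the caller's direct page
.endmacro               ; 25        9        (19 cycles, 6 bytes of overhead)

caller:
        fp_call fp1
        rts

fp1:                    ; stand-in for a real package entry point
        rts
```

Note that PLD sets N and Z from the value it pulls, so any N/Z result returned by the routine is lost after the final PLD; carry and overflow survive. The same PHD/PLD pair could instead sit at the entry and common exit of the floating-point routines themselves, which keeps each call site at a plain JSR.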

Incidentally, I wonder what happened to Granati, the author of this floating point package. He was last here in January 2021, and was a regular visitor. He lives in Italy and as you may know, Italy was especially hard-hit by COVID...

BigDumbDinosaur

Post subject: Re: 128 bit Floating Point 65C816 implementation

Posted: Wed Jul 20, 2022 8:41 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8503
Location: Midwestern USA

tmr4 wrote:

I've added this floating-point package into my system (after porting the source to the CA65 assembler). The package requires its own direct page, using almost the entirety of it...

Meant to mention this in my previous post. It appears quite a bit of the direct page usage is for storing odds and ends. At the expense of a slight reduction in performance, you might be able to relocate many of those variables to absolute storage and thus avoid having to map a special direct page just for doing floating point work.

tmr4

Post subject: Re: 128 bit Floating Point 65C816 implementation

Posted: Wed Jul 20, 2022 11:41 pm

Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147

Thanks all. Looks like I'm headed in the right direction. Still, it was nice to figure out how the COP instruction worked.

BigDumbDinosaur wrote:

It appears quite a bit of the direct page usage is for storing odds and ends. At the expense of a slight reduction in performance, you might be able to relocate many of those variables to absolute storage and thus avoid having to map a special direct page just for doing floating point work.

Good idea, but care is needed. I tried this without success when I found out my own direct page wasn't big enough. I tried to move some of the temporary float registers (which are 20 bytes each) off the direct page but found that some portions of the code assume they are contiguous. To keep the port simple, I just used a separate direct page. The comments do mention that the data bank register needs to be set to bank 0, implying that absolute addressing is used for some of the variables even though they're on the direct page.

And a funny aside from a 65816 newbie: I puzzled for a while when the linker failed with a range error when I tried to combine the floating-point direct page with mine. Since the direct page is movable, I hadn't really considered that it was still actually just a single page in length. Duh! With a single byte operand, what else would it be?

There might be other optimizations possible as well. The package is written using 8-bit registers, frequently switching to a 16-bit accumulator (and index registers in some cases) and then back again. It seems some efficiency might be gained by starting with 16-bit registers and switching to 8-bit registers only when needed. I'd probably just start from scratch, though, and streamline the whole package, as 128-bit precision is way more than I really need anyway. Still, porting is easier than writing from scratch, so I'll stick with this for a while. Thanks granati and 6502.org!
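
To make the trade-off concrete, here is a small ca65 sketch (not code from the package) that clears a 20-byte float register both ways. `FREG`, `clear8` and `clear16` are hypothetical names, and the sketch assumes native mode with 8-bit index registers on entry.

```asm
.p816                   ; enable 65816 instructions in ca65

FREG    = $20           ; HYPOTHETICAL 20-byte float register on the direct page

; 8-bit version: one byte per pass, 20 passes.
.a8
.i8
clear8:
        ldx #19
@loop:  stz FREG,x      ; clears 1 byte
        dex
        bpl @loop
        rts

; 16-bit version: one word per pass, 10 passes, plus the REP/SEP pair
; needed when the surrounding code expects an 8-bit accumulator.
clear16:
        rep #$20        ; 16-bit accumulator (3 cycles, 2 bytes)
.a16
        ldx #18
@loop:  stz FREG,x      ; clears 2 bytes
        dex
        dex
        bpl @loop
        sep #$20        ; back to 8-bit accumulator (3 cycles, 2 bytes)
.a8
        rts
```

The 16-bit loop makes half as many passes, but every REP/SEP pair costs 6 cycles and 4 bytes, so the gain depends on how much work is done between mode switches. Staying in 16-bit mode by default would remove most of those pairs.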

Edit (7/21/2022):

BigDumbDinosaur wrote:

The above uses the trick of pointing direct page to the stack. It does work pretty well, but I just knew there had to be a faster method. Deep pondering led to the following code ...

I suppose you're referring to the overall COP service routine rather than just obtaining the signature byte alone, because for that it seems the former is faster than the latter. Extracting the essence of this for each and leaving the signature byte in the accumulator, we have for the former:

Code:

        ; get COP signature byte
        tsc          ; 2 cycles  1 byte(s)
        tcd          ; 2         1
        dec ADDR     ; 8         2
        lda [ADDR]   ; 8         2
        and #$ff     ; 3         3
        inc ADDR     ; 8         2
                     ; 31        11

versus for the latter:

Code:

        ; get COP signature byte
        tsc          ; 2         1
        tcd          ; 2         1
        ldy ADDR     ; 5         2
        dey          ; 2         1
        sep #$20     ; 3         2
        lda BB       ; 4         2
        pha          ; 3         1
        plb          ; 4         1
        rep #$20     ; 3         2
        lda 0,y      ; 4         2
        and #$ff     ; 3         3
                     ; 35        18

And finally from mine above:

Code:

        ; get COP signature byte
        lda ADDR,s   ; 5         2
        dec          ; 2         1
        sta F2       ; 4         2
        lda BB,s     ; 5         2
        sta F2+2     ; 4         2
        lda [F2]     ; 7         2
        and #$ff     ; 3         3
                     ; 30        14

which is just a bit faster than the first, but longer. Now if we just had INC and DEC for the Stack Relative address mode (I've often wished for this):

Code:

        ; get COP signature byte
        dec ADDR,s   ; 8         2
        lda [ADDR]   ; 8         2
        and #$ff     ; 3         3
        inc ADDR,s   ; 8         2
                     ; 27        9

which is basically the first method without the switch of the direct page. Unfortunately, along those lines we have to settle for:

Code:

        ; get COP signature byte
        lda ADDR,s      ; 5         2
        dec             ; 2         1
        sta ADDR,s      ; 5         2
        lda [ADDR]      ; 8         2
        and #$ff        ; 3         3
        tax             ; 2         1
        lda ADDR,s      ; 5         2
        inc             ; 2         1
        sta ADDR,s      ; 5         2
        txa             ; 2         1
                        ; 39        17

which is worst of all (though you could save the TAX/TXA if you could wait until post to INC the return address to its original value). Still, it's not the best.
