6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Thu Mar 28, 2024 11:01 pm

All times are UTC




Post new topic Reply to topic  [ 1 post ] 
Author Message
PostPosted: Sun Jan 21, 2007 3:56 pm 
Offline

Joined: Wed Oct 22, 2003 4:07 am
Posts: 51
Location: Norway
I decided I should stop polluting that other thread, since no one there seemed to be interested... so here is a brand new thread dedicated to fast floating-point division :)

This is just meant to be an informative post/thread for anyone who might wonder how to do fast divisions.

It's possible to do really fast divisions by using logarithms (and multiply too, but division is usually a bigger problem) and subtract (or add for multiply). This is really basic stuff, but when I went to school they had stopped teaching this, and I had to figure it out myself. Floating-point numbers are especially suited because they are already halfway... logarithmic ??? (I don't know the proper terminology).

I have optimized my fast inaccurate floating-point routine further, the first version was a bit weird. It's now faster, uses less memory, and don't use selfmodifying code. It's now really simple: (In fact, it's so simple I'm not sure if it's worth posting it, but when I started figuring this out, it didn't seem so simple, so I'll post it in case anyone else feel the same way)

Code:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;                                                           ;
; ---Super fast approximated floating-point division---     ;
;                                                           ;
;  17 bytes of code                                         ;
; 256 byte lookup table                                     ;
;   6 bytes of zp variables                                 ;
;                                                           ;
; 26 cycles to execute, not including any call/return       ;
; 24 cycles if carry is known to be set                     ;
; 0% to ~6% error???                                        ;
;                                                           ;
; 16bit fp                                                  ;
; -8bit signed exponent (range: 3.4*10^38 to 2.9*10^-39?)   ;
; -8bit normalized mantissa (9th implied bit is always 1)   ;
; -no support for negative numbers                          ;
; -no way the represent 0                                   ;
; -no exponent overflow check                               ;
;                                                           ;
; Copyright 2007 Erik Lien                                  ;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;Ne   numerator signed exponent
;Nm   numerator normalised mantissa
;De   denominator signed exponent
;Dm   denominator normalised mantissa
;Re   result signed exponent
;Rm   result normalised mantissa

;-start of code-

   sec
   lda   Nm
   sbc   Dm
   tax
   lda   data,x
   sta   Rm
   lda   Ne
   sbc   De      ;no overflow check, for max speed...
   sta   Re


;   uncomment if optional sign byte is used, adds 9 cycles to execution time
;   lda   Ns
;   eor   Ds
;   sta   Rs

;-end of code-

data   ;the look-up table is still not optimized/checked for errors....
   db $00,$01,$01,$02,$03,$03,$04,$05,$06,$06,$07,$08,$08,$09,$0A,$0B
   db $0B,$0C,$0D,$0D,$0E,$0F,$10,$10,$11,$12,$13,$13,$14,$15,$16,$16
   db $17,$18,$19,$19,$1A,$1B,$1C,$1C,$1D,$1E,$1F,$1F,$20,$21,$22,$22
   db $23,$24,$25,$26,$26,$27,$28,$29,$29,$2A,$2B,$2C,$2D,$2D,$2E,$2F
   db $30,$31,$31,$32,$33,$34,$35,$36,$36,$37,$38,$39,$3A,$3A,$3B,$3C
   db $3D,$3E,$3F,$40,$40,$41,$42,$43,$44,$45,$45,$46,$47,$48,$49,$4A
   db $4B,$4C,$4C,$4D,$4E,$4F,$50,$51,$52,$53,$54,$54,$55,$56,$57,$58
   db $59,$5A,$5B,$5C,$5D,$5E,$5F,$5F,$60,$61,$62,$63,$64,$65,$66,$67
   db $68,$69,$6A,$6B,$6C,$6D,$6E,$6F,$70,$71,$72,$73,$74,$75,$76,$77
   db $78,$79,$7A,$7B,$7C,$7D,$7E,$7F,$80,$81,$82,$83,$84,$85,$86,$87
   db $88,$89,$8A,$8B,$8C,$8D,$8E,$90,$91,$92,$93,$94,$95,$96,$97,$98
   db $99,$9B,$9C,$9D,$9E,$9F,$A0,$A1,$A2,$A4,$A5,$A6,$A7,$A8,$A9,$AB
   db $AC,$AD,$AE,$AF,$B0,$B2,$B3,$B4,$B5,$B6,$B8,$B9,$BA,$BB,$BD,$BE
   db $BF,$C0,$C1,$C3,$C4,$C5,$C7,$C8,$C9,$CA,$CC,$CD,$CE,$D0,$D1,$D2
   db $D3,$D5,$D6,$D7,$D9,$DA,$DB,$DD,$DE,$E0,$E1,$E2,$E4,$E5,$E6,$E8
   db $E9,$EB,$EC,$ED,$EF,$F0,$F2,$F3,$F5,$F6,$F8,$F9,$FB,$FC,$FE,$FF
I mentioned earlier that you could do it without any LUTs at all. Just remove two lines and your left with this:
Code:
   sec
   lda   Nm
   sbc   Dm
   sta   Rm
   lda   Ne
   sbc   De   ;no overflow check, for max speed...
   sta   Re
A simple 16 bit subtract, that's all... It simply assumes that the numbers are... logarithmic. The exponent already are, but the mantissa is not, so the error is quite large, the first code uses a lookup to compensate somewhat for that error, but is still very inaccurate. If more precision is needed, the mantissas should first be converted to logarithms before the subtraction, then converted back afterwards. I have a version of the code that does that too, I'll post it if anyone is interested. (but I'll probably end up posting it even if nobody is interested anyway... :P)


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 1 post ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 3 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: