I decided I should stop polluting that other thread, since no one there seemed to be interested... so here is a brand new thread dedicated to fast floating-point division
This is just meant to be an informative post/thread for anyone who might wonder how to do fast divisions.
It's possible to do really fast divisions by using logarithms (and multiply too, but division is usually a bigger problem) and subtract (or add for multiply). This is really basic stuff, but when I went to school they had stopped teaching this, and I had to figure it out myself. Floating-point numbers are especially suited because they are already halfway... logarithmic ??? (I don't know the proper terminology).
I have optimized my fast inaccurate floating-point routine further, the first version was a bit weird. It's now faster, uses less memory, and don't use selfmodifying code. It's now really simple: (In fact, it's so simple I'm not sure if it's worth posting it, but when I started figuring this out, it didn't seem so simple, so I'll post it in case anyone else feel the same way)
Code:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; ;
; ---Super fast approximated floating-point division--- ;
; ;
; 17 bytes of code ;
; 256 byte lookup table ;
; 6 bytes of zp variables ;
; ;
; 26 cycles to execute, not including any call/return ;
; 24 cycles if carry is known to be set ;
; 0% to ~6% error??? ;
; ;
; 16bit fp ;
; -8bit signed exponent (range: 3.4*10^38 to 2.9*10^-39?) ;
; -8bit normalized mantissa (9th implied bit is always 1) ;
; -no support for negative numbers ;
; -no way the represent 0 ;
; -no exponent overflow check ;
; ;
; Copyright 2007 Erik Lien ;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;Ne numerator signed exponent
;Nm numerator normalised mantissa
;De denominator signed exponent
;Dm denominator normalised mantissa
;Re result signed exponent
;Rm result normalised mantissa
;-start of code-
sec
lda Nm
sbc Dm
tax
lda data,x
sta Rm
lda Ne
sbc De ;no overflow check, for max speed...
sta Re
; uncomment if optional sign byte is used, adds 9 cycles to execution time
; lda Ns
; eor Ds
; sta Rs
;-end of code-
data ;the look-up table is still not optimized/checked for errors....
db $00,$01,$01,$02,$03,$03,$04,$05,$06,$06,$07,$08,$08,$09,$0A,$0B
db $0B,$0C,$0D,$0D,$0E,$0F,$10,$10,$11,$12,$13,$13,$14,$15,$16,$16
db $17,$18,$19,$19,$1A,$1B,$1C,$1C,$1D,$1E,$1F,$1F,$20,$21,$22,$22
db $23,$24,$25,$26,$26,$27,$28,$29,$29,$2A,$2B,$2C,$2D,$2D,$2E,$2F
db $30,$31,$31,$32,$33,$34,$35,$36,$36,$37,$38,$39,$3A,$3A,$3B,$3C
db $3D,$3E,$3F,$40,$40,$41,$42,$43,$44,$45,$45,$46,$47,$48,$49,$4A
db $4B,$4C,$4C,$4D,$4E,$4F,$50,$51,$52,$53,$54,$54,$55,$56,$57,$58
db $59,$5A,$5B,$5C,$5D,$5E,$5F,$5F,$60,$61,$62,$63,$64,$65,$66,$67
db $68,$69,$6A,$6B,$6C,$6D,$6E,$6F,$70,$71,$72,$73,$74,$75,$76,$77
db $78,$79,$7A,$7B,$7C,$7D,$7E,$7F,$80,$81,$82,$83,$84,$85,$86,$87
db $88,$89,$8A,$8B,$8C,$8D,$8E,$90,$91,$92,$93,$94,$95,$96,$97,$98
db $99,$9B,$9C,$9D,$9E,$9F,$A0,$A1,$A2,$A4,$A5,$A6,$A7,$A8,$A9,$AB
db $AC,$AD,$AE,$AF,$B0,$B2,$B3,$B4,$B5,$B6,$B8,$B9,$BA,$BB,$BD,$BE
db $BF,$C0,$C1,$C3,$C4,$C5,$C7,$C8,$C9,$CA,$CC,$CD,$CE,$D0,$D1,$D2
db $D3,$D5,$D6,$D7,$D9,$DA,$DB,$DD,$DE,$E0,$E1,$E2,$E4,$E5,$E6,$E8
db $E9,$EB,$EC,$ED,$EF,$F0,$F2,$F3,$F5,$F6,$F8,$F9,$FB,$FC,$FE,$FF
I mentioned earlier that you could do it without any LUTs at all. Just remove two lines and your left with this:
Code:
sec
lda Nm
sbc Dm
sta Rm
lda Ne
sbc De ;no overflow check, for max speed...
sta Re
A simple 16 bit subtract, that's all... It simply assumes that the numbers are... logarithmic. The exponent already are, but the mantissa is not, so the error is quite large, the first code uses a lookup to compensate somewhat for that error, but is still very inaccurate. If more precision is needed, the mantissas should first be converted to logarithms before the subtraction, then converted back afterwards. I have a version of the code that does that too, I'll post it if anyone is interested. (but I'll probably end up posting it even if nobody is interested anyway...
)