6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sun Apr 28, 2024 8:36 am

All times are UTC




Post new topic Reply to topic  [ 26 posts ]  Go to page 1, 2  Next
Author Message
PostPosted: Sun Jan 07, 2007 6:14 am 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
I am in immediate need of a reasonable division routine for the 65816, and the first thing that popped into my head was:

http://6502.org/source/integers/ummodfix/ummodfix.htm

However, in looking at the code, it's all greek.

I need to understand how this code works inside and out so that I can write a set of unit tests for it, and so that I can adapt it primarily to the Kestrel's ABI (Application Binary Interface -- the rules and regulations governing how to pass parameters in and results out of functions and subroutines). However, after studying the aforementioned code, I'm left just as blind as I was before.

Garth, do you know of any websites that details how your code works at a more abstract level? The comments you provide are so mechanistic that it hides the actual intention of the meaning of the code, I think. I need more of the theory before I can understand what you've written.

Thanks.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Sun Jan 07, 2007 8:54 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
Quote:
Garth, do you know of any websites that details how your code works at a more abstract level?

No, I dug into this when I traced some problems in my trig routines down to a faulty UM/MOD in the 6502 public-domain material years ago. There were no comments at all in the code I started with, and I have not seen any good explanations on the web about it. I had to figure it out myself.

BTW, years later I traced an FFT problem down to a faulty UM* which again came from the 6502 public-domain material. The conditions under which the bug shows up are rare enough that I went for years without realizing there was a bug at all. Since multiplication is a little easier than division, that one was probably a little easier to figure out without comments in the original code.

Quote:
I need more of the theory before I can understand what you've written.

I'll try to get to it tonight (Sunday night). You'll probably have it figured out by then though. :wink:


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Sun Jan 07, 2007 6:54 pm 
Offline
User avatar

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
I sent optimized versions of Garth's routines to Mike a while back, but until Mike gets a chance to update the article (it makes sense for this to be low priority, since it's neither new material nor corrections, merely optimizations), I can e-mail it to you. Just drop me a line with an e-mail address to send it to. Not only are the routines both faster and smaller space-wise (especially the 65816 routine), but the article also describes why the original FIG-Forth routine doesn't work and why the new one does. It also goes through a short (8-bit divided by 4-bit to 4-bit quotient and 4-bit remainder) example to illustrate how the algorithm works. I also came up with a labelling convention that makes it easier to keep track of where things like the partial dividend, quotient, etc. are stored. With the original FIG-Forth routine, you kind of need a cheat sheet because an instruction like LDA 3,X isn't particularly conducive to remembering what 3,X represents. (Is it the quotient? Is it the remainder? What is it again?) I wound up replacing more of Garth's text than I had originally intended (sorry, Garth). It's an HTML file since it was intended to be an update to Garth's article.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Sun Jan 07, 2007 10:45 pm 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
As I lost all Internet connectivity last night, and I had nothing better to do, I decided to try and reverse engineer from first principles how binary division works. It was vastly simpler than I thought.

Here's my implementation of the division routine, coded for the Kestrel's firmware.

Code:
.export Integer_unsignedDivide16Bits
.proc Integer_unsignedDivide16Bits
    pha
    phd
    tsc
    tcd

    ; DP locations:
    ; 01 = previous D register
    ; 03 = Denominator
    ; 05 = return address
    ; 07 = Numerator (low) / Quotient
    ; 09 = Numerator (high) / Remainder
 
    phx

    ldx #16
step:
        asl 7
        rol 9
        sec
        lda 9
        sbc 3
        bcc skipSubtraction
        sta 9
        inc 7
skipSubtraction:
    dex
    bne step
   
    plx
    pld
    pla
    rts
.endproc


How to use it (suppose we want to divide 15360 / 150):

Code:
PEA 0
PEA 15360
LDA #150
JSR Integer_unsignedDivide16Bits
PLA
STA quotient  ; should be 102
PLA
STA remainder   ; should be 60


The code does not fully conform to the Kestrel's calling conventions, where you allocate stack space for the results, but then never alter the inputs. But that's an easy enough problem to fix. Right now I'm concentrating on correctness.

I generated a huge number of unit tests to exercise the division routine via the Forth primitive interface (the primitive just translates Forth's ABI to the Kestrel's own ABI and calls the above function); about 3000 assembly lines worth. I didn't hand write it; I used a Forth script to generate the tests and expected results. I'm pleased to say that it actually works, so far as my x86 PC's own division results are concerned. :-)

Comments?


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 08, 2007 1:40 am 
Offline
User avatar

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
Your routine has the "UM/MOD bug". (It's easy to get it wrong.) $80000000 / $0002 (or any other non-zero denominator, for that matter) gives a quotient of $0000 and a remainder of $0000. The high bit of $80000000 gets shifted into the carry by the ROL 9 instruction, then the subsequent SEC instruction overwrites the carry. After the ROL 9 instruction the routine has no way to tell if the numerator was $80000000 or $00000000. In short, you have to account for that carry. Not doing so results in the "UM/MOD bug".


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 08, 2007 2:50 am 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
Perhaps so, but dividing $80000000 by $0002 will overflow the division anyway (the result is still a 32-bit quotient). Basically, the numerator(high) must be less than the denominator in order for it to not overflow. Thus, given a denominator of $0002, the largest number you can feed it and not overflow is $0001FFFF.

However, I see your point; dividing $80000000/$FFFF will produce zero. I'll see if I can't independently arrive at a solution, and "cheat" by looking at the article you sent me only if I need to. :)


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 08, 2007 9:23 am 
Offline

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 990
Location: near Heidelberg, Germany
Code:
I am in immediate need of a reasonable division routine for the 65816, and the first thing that popped into my head was:

http://6502.org/source/integers/ummodfix/ummodfix.htm

However, in looking at the code, it's all greek.


The code basically does the normal thing you do when dividing on paper, but in binary.

1 shift the divisor left until it is larger then the dividend
2 shift divisor one right and check if it is larger then the dividend
- if it is larger or equal, substract shifted divisor from dividend add one to the result
3 shift result one left.
4 if divisor not yet has shifted the same #bits back right than it had been shifted left in step 1, go back to step 2.

Only the routine optimizes: in the first step it "shifts" the divisor left 16 bit by comparing it to the high word. and then shifting the dividend left instead of shifting the divisor right. When shifting the dividend left, it shifts in the result bit from the compare in step 2 (and so does step 3 in the same step)

Pretty slick, actually

Would have to be checked for boundary values though, as you already have discussed.

André


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Jan 08, 2007 6:28 pm 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
It's already been checked. The whole reason that article exists is because the previous version of UM/MOD apparently wasn't checked, and exhibited bugs. :-)


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Thu Jan 18, 2007 7:44 pm 
Offline

Joined: Sun Aug 24, 2003 7:17 pm
Posts: 111
As "divide" was discussed!

Divide and multiply normally only makes sense for floating point applications. But for 6502 boards dedicated to some some special application it is certainly not necessary to have a full floating point system, it is enough to have the "floating point spirit". If one for example gets a mesurement as an unsigned 4 byte integer that should be scaled by some factor like for example 0.93 one simply uses the fact that 0.93 is $EE147AE1/$100000000. It is therefore fine to make an integer multiplication of the measurement value with $EE147AE1 resulting in an 8 byte value of which the 4 most significant bytes represent the scaled byte value. $EE147AE1 is of coarse the mantissa of 0.93 as floating point number!

An abolutely clean and 100% reliable "multiply" routine:


Code:
;TEST PROGRAM
    *=$033A
     LDA #$FF
     LDX #$08
LA   STA $AF,X
     DEX
     BNE LA
     JSR MULT
     BRK
;SUBROUTINE TO MULTIPLY TWO 4 BYTE NUMBERS
;THE NUMBERS TO BE PUT IN B0,B1,B2,B3 AND IN
;B4,B5,B6,B7
;
;THE PRODUCT IN BC,.,C3
;
;IF THE TWO NUMBERS ARE IN THE FORM "FLOATING POINT
;MANTISSA", I.E LEFT JUSTIFIED WITH THE MSB SET THE
;MANTISSA OF THE PRODUCT IS IN BC,BD,BE,BF
;
;B8,B9,BA,BB IS WORKING AREA

MULT LDA #$00
     LDX #$0D
L0   STA $B7,X
     DEX
     BNE L0
     LDY #$20    ;NUMBER OF BITS IN 4 BYTES
L1   LDX #$F8    ;ROLL RIGHT 8 BYTES
     CLC
L2   ROR $BC,X
     INX
     BNE L2
L3   ASL $B3
     ROL $B2
     ROL $B1
     ROL $B0
     BCC L5     
     LDX #$08    ;ADD CONTRIBUTION
     CLC
L4   LDA $B3,X
     ADC $BB,X
     STA $BB,X
     DEX
     BNE L4
L5   DEY
     BNE L1
     RTS


Intead of dividing with 1/0.93=1.07527 one better multiply with 0.93. If one really has to divide with some number like for example 3.57 on a board one could in a similar way use that 3.57 =$E47AE147/$40000000. The measurement value is multiplied by $40000000 by adding 4 less significant bytes to make it an 8 byte value,

B3,B2,B1,B0,00,00,00,00

Shifting this 8 byte value 2 steps to the right and finally "integer divide" this 8 byte value with $E47AE147 the desired value results. "Floating point spirit"!

An integer divide routine similar to the multiply routine above can easily be written, an 8 byte value to be divided by a 4 byte value. The 8 byte value is typically the "mantissa" extende by 4 less significant zero bytes. But until now I have seen no need/use of such a routine.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Thu Jan 18, 2007 8:02 pm 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
I thoroughly disagree. Multiplication is very important when doing any kind of graphics works, or indeed, any kind of 2-D matrix manipulation (of which graphics is a specific subset of). Since the Kestrel is a graphical environment, it necessitates having a multiply routine.

Code:
addressContainingByte = (bitmapWidthInBytes * Y) + (X MOD 8) + baseAddressOfBitmapData


The MOD 8 is obviously trivial, but the multiply is still required. For fixed-width environments, this multiplication can be hardcoded. However, this is not the case for dealing with multiple, variable-width bitmaps.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Thu Jan 18, 2007 8:04 pm 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
kc5tja wrote:
I thoroughly disagree. Multiplication is very important when doing any kind of graphics works, or indeed, any kind of 2-D matrix manipulation (of which graphics is a specific subset of). Since the Kestrel is a graphical environment, it necessitates having a multiply routine.

Code:
addressContainingByte = (bitmapWidthInBytes * Y) + (X MOD 8) + baseAddressOfBitmapData


The MOD 8 is obviously trivial, but the multiply is still required. For fixed-width environments, this multiplication can be hardcoded. However, this is not the case for dealing with multiple, variable-width bitmaps.



I'm not sure to what post you're responding to, though --- the issue has never been multiplication (I have implemented a variation of Chuck Moore's multiply routine which handles both signed and unsigned numbers equally). The issue in question was that of division. SO many people just "leave it as an exercise to the reader" to figure out how to do division, but it's clear from my attempts, I fell into the same bugs that the FIG-Forth implementation does (despite my code being, in my opinion, much cleaner). So, as you can see, it's NOT as easy as one makes it out to be.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Fri Jan 19, 2007 2:23 pm 
Offline

Joined: Sun Aug 24, 2003 7:17 pm
Posts: 111
OK, here is the matching bug-free and cleanily programmed division routine. In this test $AA00 is multipied with $F851EB85 and then the product is divided by $F851EB85 to find back $AA00

Code:
;TEST PROGRAM
;
;MULTIPLY AN UN-SIGNED 2 BYTE INTEGER (HERE $AA00) WITH 0.97 = $F851EB85/$100000000
    *=$033A
;THE TWO-BYTE INTEGER TO BE MULTIPLIED LEFT-JUSTIFIED PADDED WITH ZEROS
     LDA #$AA
     STA $B0
     STA $B1
     LDA #$00
     STA $B2
     STA $B3
;THE INTEGER $F851EB85
     LDA #$F8
     STA $B4
     LDA #$51
     STA $B5
     LDA #$EB
     STA $B6
     LDA #$85
     STA $B7
     JSR MULT
;THE SCALED 2-BYTE UNSIGNED VALUE IN $BC,.,&BF
;
;THE FULL RESULT OF THE MULTIPLICATION IN $BC,.,$C3
;DIVIDE IT WITH THE INTEGER $F851EB85
;THE RESULT WILL APPEAR IN $B0,..,$B3
     LDA #$F8
     STA $B4
     LDA #$51
     STA $B5
     LDA #$EB
     STA $B6
     LDA #$85
     STA $B7
     JSR DIV
;THE QUOATION $AA00 BACK IN $B0-$B3
     BRK
;SUBROUTINE TO MULTIPLY TWO 4 BYTE NUMBERS
;THE NUMBERS TO BE PUT IN $B0,$B1,$B2,$B3 AND IN
;$B4,$B5,$B6,$B7
;
;THE PRODUCT IN $BC,.,$C3
;
;IF THE TWO NUMBERS ARE IN THE FORM "FLOATING POINT
;MANTISSA", I.E LEFT JUSTIFIED WITH THE MSB SET THE
;MANTISSA OF THE PRODUCT IS IN $BC,$BD,$BE,$BF
;
;$B8,$B9,$BA,$BB IS WORKING AREA

MULT LDA #$00
     LDX #$0D
L0   STA $B7,X   ;SET WORK AREA TO ZERO
     DEX
     BNE L0
     LDY #$20    ;NUMBER OF BITS IN 4 BYTES
L1   LDX #$F8    ;ROLL RIGHT 8 BYTES
     CLC
L2   ROR $BC,X
     INX
     BNE L2
L3   ASL $B3
     ROL $B2
     ROL $B1
     ROL $B0
     BCC L5     
     LDX #$08    ;ADD CONTRIBUTION
     CLC
L4   LDA $BB,X
     ADC $B3,X
     STA $BB,X
     DEX
     BNE L4
L5   DEY
     BNE L1
     RTS
; SUBROUTINE TO DIVIDE A 8 BYTE NUMBER WITH A 4 BYTE NUMBER
; THE 8 BYTE NUMBER TO BE PUT IN $BC,.,$C3
; THE 4 BYTE NUMBER TO BE PUT IN $B4,.,$B7
;
; THE RESULT TO $B0,.,$B3
;
; IF THE 4 BYTE MANTISSA OF A FLOATING POINT NUMBER IS PUT IN THE MORE SIGNIFICANT
; PART OF THE 8 BYTE FIELD PADDED WITH ZEROS AND THE MANTISSA OF ANOTHER FLOATING
; POINT NUMBER IS PUT IN $B4,..,$B7 THE MANTISSA OF THE QUOATION CAN BE FOUND IN
; $B0,..,$B3
;
; $B8,$B9,$BA,$BB IS WORKING AREA

DIV  LDA #$00
     LDX #$04
L6   STA $B7,X   ;SET WORK AREA TO ZERO
     DEX
     BNE L6
     LDY #$20    ;NUMBER OF BITS IN 4 BYTES
L7   LDX #$F8    ;ROLL RIGHT 8 BYTES
     CLC
L8   ROR $BC,X
     INX
     BNE L8
     LDX #$F8
L9   LDA $C4,X
     CMP $BC,X
     BCC LC
     BNE LA
     INX
     BNE L9
LA   LDX #$08
     SEC
LB   LDA $BB,X
     SBC $B3,X
     STA $BB,X
     DEX
     BNE LB
; CARRY IS SET AT THIS POINT!
LC   ROL $B3     ;ROLL IN BINARY DIGIT     
     ROL $B2
     ROL $B1
     ROL $B0
     DEY
     BNE L7
     RTS


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Fri Jan 19, 2007 5:11 pm 
Offline

Joined: Sun Aug 24, 2003 7:17 pm
Posts: 111
kc5tja writes:


Quote:
Perhaps so, but dividing $80000000 by $0002 will overflow the division anyway (the result is still a 32-bit quotient).


As "dividing" only really makes sense in the "floating point world", one should always first shift the denominator to have the most significant bit set. Then there is never any "overflow". The "exponent" is then handled outside the division routine!

kc5tja further writes:


Quote:
However, I see your point; dividing $80000000/$FFFF will produce zero. I'll see if I can't independently arrive at a solution, and "cheat" by looking at the article you sent me only if I need to


With my routine one gets $00 00 00 00 80 00 00 00 / $00 00 FF FF = $00 00 80 00 as should be ! (although not left justified as good practise would be!)


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Fri Jan 19, 2007 9:58 pm 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
I'm not sure where you get that dividing is applicable only in a floating point world. This is nonsense, and I would rather concentrate on the core issue at hand, rather than engage in a debate on applicability.

Forth specifies the presence of division operators (quite a few of them, actually), and even if it didn't, the */ operator still depends on division. Since */ is used extensively with fixed-point and scaled integer arithmetic, it would seem to disprove your accusation on division's domain of application.

Furthermore, I'm coding for a 65816, not a 6502. :-)

I will revisit this thread when I am ready for it. In the mean time, I am still in the process of moving and setting up my new apartment, so it might take some time.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Fri Jan 19, 2007 11:02 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
To respond to what I think Mats is saying-- It's nice when you can avoid dividing by multiplying by the reciprocal; but you don't usually know what either one will be, even in embedded computers used for controlling equipment, until the program runs-- and of course it may be different every time it runs. One thing I'd like to further investigate for efficiency is a 256KB look-up table of reciprocals for the '816 which has a 16MB address range. A 16-bit input and 32-bit output makes for 64K 4-byte results, maintaining the precision from one end of the range to the other. A 128KB multiplication look-up table allows an 8x8 multiply to use as a speedier building block for larger multiplies. Then of course there's the squaring method which would only require a 512-byte look-up table but more manipulation, and other non-obvious methods. I haven't compared the efficiencies yet. (That sounds like Bruce's specialty! :wink: ) The overhead of floating-point can usually be avoided with skillful use of scaled-integer math however, even in very complex operations.


Last edited by GARTHWILSON on Sat Aug 21, 2010 9:13 pm, edited 2 times in total.

Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 26 posts ]  Go to page 1, 2  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 24 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: