Page 2 of 5
Posted: Thu Oct 14, 2010 1:58 am
by dclxvi
Here it is in all it's "I just now got it working" glory, my 32-bit unsigned division code. My first thought is, it's gigantic.
A couple thoughts about this routine I never got around to posting:
1. Is it really necessary to save/restore zi, zi+1, etc. on the stack?
2. You can pre-shift the numerator left rather the denominator right. Then you're only subtracting two bytes inside the (divmod02) loop instead of four. See also the UM/MOD routine (though it doesn't do any pre-shifting) at:
http://6502org.wikidot.com/errata-software-figforth
Posted: Thu Oct 14, 2010 1:18 pm
by chitselb
1. Is it really necessary to save/restore zi, zi+1, etc. on the stack?
The zi...zi+3 is a 4-byte zeropage area where the innermost DO LOOP index and limit are kept, for speed. So division within a loop would require this. I also took another look at Blazin' Forth's UM/MOD and noticed that it gets the job done properly with a 32/16=16R16 approach (my test is -1 -1 UD. should print 4294967295)
Posted: Mon Oct 18, 2010 5:59 am
by GARTHWILSON
On a semi-related note, I am thinking about implementing a complete suite of floating point, based on the simple expedient of calling the PET ROM math routines and passing things around as 24-bit values on the stack.
I will mention that there are small (18- & 28-pin DIP & SOIC), serially interfaced 32-bit floating-point coprocessors avaiable at http://www.awce.com/pak1.htm .

Posted: Sat Feb 04, 2012 7:31 am
by GARTHWILSON
Doggone! How 'bout a comparison? I just got the equivalent code for UM/MOD for the PIC16 microcontrollers, ie, unsigned 32-bit/16-bit with 16-bit quotient and remainder, from Microchip's Embedded Control Handbook update: 237 instructions to do what the 6502 does in 35!
Re: integer division and multiplication
Posted: Thu Mar 05, 2015 10:54 pm
by chitselb
In the wake of folding Mike's excellent 40/MOD code and rewriting UM/MOD (unsigned 32/16 = rem16 quot16), I started giving the fisheye to UD/MOD (unsigned 32/16 = rem16 quot32) but couldn't figure out how to get rid of it because D. and DU. both require the 32-bit quotient to work.
Any ideas?
Re: integer division and multiplication
Posted: Fri Mar 06, 2015 5:11 am
by GARTHWILSON
Give it the fisheye? Does that mean get rid of it? It doesn't take much memory when defined in high-level, in terms of UM/MOD, as opposed to making it a primitive. I did it in assembly for the PIC16 for a big work project last year because I couldn't find what I needed in Microchip's pre-written toolbox. I kind of sprained my head, but I finally got it done. I found that you need the whole dividend to be the number of bits in the divisor plus the number of bits in the quotient; so for UD/MOD, you'll pad the input to get there. Something that may help, if indeed you want to do UD/MOD as a primitive for U. and friends, is that it's only used for dividing by 16 or less, never a number over 255.
Re: integer division and multiplication
Posted: Fri Mar 06, 2015 5:57 am
by barrym95838
Camel Forth (thanks to Dr. Brad):
Code: Select all
: UD/MOD ( ud1 u2 -- u3 ud4 ) \ 32/16->32 divide
>R 0 R@ UM/MOD ROT ROT R> UM/MOD ROT
;
Mike B.
[Edit: Looks kinda slow ... I'll see if I can conjure up a much faster primitive, but I doubt that it'll be as space-efficient. Are you pressed for space?]
[Edit: I'll give you a rough prediction of about 70-something bytes of native 6502 code for the primitive. Say the word, and I'll give it the old college try ... ]
Re: integer division and multiplication
Posted: Fri Mar 06, 2015 8:31 am
by chitselb
I should have said stinkeye meaning I would like to get rid of the UD/MOD (32-bit quotient) primitive. There will be a UM/MOD (16-bit quotient) primitive, but having both is overkill. I'll build / /MOD FM/MOD */MOD as secondaries on top of UM/MOD .
Implementing UD/MOD in high-level Forth would substantially slow down numeric output. Right now it takes 81 jiffies (60ths of a second) to perform : FOO 100 0 DO I . LOOP ; in PETTIL, and I haven't tried it in Blazin' Forth. ( JIFFY@ FOO JIFFY@ D- D. )
The *only* place in the code where UD/MOD is *ever* called is from within # (sharp) for numeric output. It's fair to require that 2<=BASE<=36 because [0..9A..Z] are the only characters to (reasonably) output a number in human-readable form on a PET. Of these, I commonly use only HEX and DECIMAL. A special-purpose helper primitive will probably do the trick. I'm calling it (#) ( ud1 base -- ud2 'digit' ) pronounced 'paren-sharp'. This inner (#) primitive will JSR to reuse the same code that's in the 40/MOD primitive, but with BASE @ (instead of 40) as the divisor. I'll have to go back to the fat version of 40/MOD in case the BASE is an odd number, but no biggie there. Here's the high-level code for sharp:
Code: Select all
: # (+d1 -- +d2 )
\ d1 is divided by BASE and the quotient is d2.
\ The remainder is appended to the numeric output string via HOLD
BASE @ ( +d1 base )
(#) ( +d2 'digit' )
HOLD ; ( +d2 )
And here's the (currently working) sharp code
: # BASE @ UD/MOD ROT >6502
TOS LDA,
SED,
10 # CMP,
ASCII 0 # ADC,
CLD,
TOS STA,
TOFORTH JSR,
HOLD ;
Re: integer division and multiplication
Posted: Fri Mar 06, 2015 12:21 pm
by theGSman
Camel Forth (thanks to Dr. Brad):
Code: Select all
: UD/MOD ( ud1 u2 -- u3 ud4 ) \ 32/16->32 divide
>R 0 R@ UM/MOD ROT ROT R> UM/MOD ROT
;
Wouldn't it be better to do it the other way around?
ie.
My own UM/MOD routine (shown below) pretty much did 32-bit division although it only pushed the lower 16-bit result on the stack. It would be a simple matter to make it a UD/MOD instead (and I am thinking of doing that).
BTW with Forth-79, Forth-83, Fig Forth and ANSI Forth (not to mention all the individuals who make up their own standards) there are too many standards to make it worth while to just pick one. I prefer to just provide UM* and UM/MOD (or UD/MOD) and leave it to the user to deal with signed multiplication and division.
Code: Select all
;
; UM/MOD - ( ud u1 -- u2 u3)
; An unsigned division of the 32 bit IACC (ud)
; by the 32 bit IOPR (u1 extended to 32 bits)
; REMAINDER IS u2 (IACCl)
; AND QUOTIENT IS u3 (IQUOTl moved to IACCh)
;
UMDMOD TXA ; +---------------+
SEC ; +12 | h udl |
SBC #6 ; +11 | l u2 | IACCl
TAX ; +---------------+
LDA #0 ; +10 | h udh |
STA STACK+1,X ; +9 | l | IACCh
STA STACK+2,X ; +---------------+
STA STACK+3,X ; +8 | h u1 |
STA STACK+4,X ; +7 | l | IOPRh
STA STACK+5,X ; +---------------+
STA STACK+6,X ; +6 | h 0 |
LDY #16 ; +5 | l | IOPRl
; IOPR shifted 16 bits already ; +---------------+
_SHIFTL LDA STACK+8,X ; +4 | h u3 |
BMI _LOOP ; +3 | l | IQUOTl
ASL STACK+7,X ; +---------------+
ROL STACK+8,X ; +2 | h 0 |
INY ; +1 | l | IQUOTh
BNE _SHIFTL ; +---------------+
_LOOP LDA STACK+10,X ; X-->| |
CMP STACK+8,X ;
; Check if IOPR > IACC ;
BCC _SHIFTR ;
BNE _SUBTR ;
LDA STACK+9,X ;
CMP STACK+7,X
BCC _SHIFTR
BNE _SUBTR
LDA STACK+12,X
CMP STACK+6,X
BCC _SHIFTR
BNE _SUBTR
LDA STACK+11,X
CMP STACK+5,X
BCC _SHIFTR
_SUBTR SEC ; If IOPR <= IACC THEN
LDA STACK+11,X ; subtract IOPR from IACC
SBC STACK+5,X ; and shift 1 into IQUOT
STA STACK+11,X
LDA STACK+12,X
SBC STACK+6,X
STA STACK+12,X
LDA STACK+9,X
SBC STACK+7,X
STA STACK+9,X
LDA STACK+10,X
SBC STACK+8,X
STA STACK+10,X
_SHIFTR ROL STACK+3,X ; else shift 0 into IQUOT
ROL STACK+4,X
ROL STACK+1,X
ROL STACK+2,X
LSR STACK+8,X ; shift IOPR right
ROR STACK+7,X
ROR STACK+6,X
ROR STACK+5,X
DEY ; continue counted shifts and subtracts
BPL _LOOP
LDA STACK+3,X ; copy IQUOT(L) TO right spot on stack
STA STACK+9,X ; (remainder already in right place)
LDA STACK+4,X
STA STACK+10,X
TXA ; CLEAN UP STACK
CLC
ADC #8
TAX
RTS
Re: integer division and multiplication
Posted: Fri Mar 06, 2015 4:27 pm
by barrym95838
Wouldn't it be better to do it the other way around?
ie.
That looks a lot cleaner, but would slow down UM/MOD considerably. If you have horsepower to spare, it's no big deal, but CH is using "quaint" hardware, and UD/MOD is a bit expensive.
Mike B.
Re: integer division and multiplication
Posted: Fri Mar 06, 2015 4:50 pm
by chitselb
The UD/MOD base primitive approach was what I first considered and implemented. I came across
this in VolksForth, which builds UD/MOD on top of UM/MOD , instead of the other way around. This is cool because, in my opinion, 32/32-bit division is seldom invoked and will be even more rare in my own application code, so I don't want to punish the user on every occurrence of division by crunching the whole 32-bits using UD/MOD as the basic primitive. Also UD/MOD isn't part of the Fig, Forth-79, Forth-83 or ANS standards, while UM/MOD is in the required wordset for Forth-83, aka U/MOD in Forth-79.
Code: Select all
: ud/mod ( ud1 u2 -- urem udquot )
>r 0 r@ um/mod r> swap >r um/mod r> ;
For general purpose division, I'm thinktanking an approach to UM/MOD that shifts the divisor left until it's >= the dividend, then do the usual trial-subtract from the dividend and right-shift the divisor thing until the divisor is back to its original alignment. Counting the divisor shifts and counting them back down. If something like this already exists, or anybody here is bored, or if this is a really dumb idea, I'm all ears. The reason for this is that numerator and denominator values are small values most of the time, and operating on all those leading zero bits is a waste of CPU. If somewhat bigger, more complicated code can avoid iterating through trial-subtract for each bit alignment every time through, I'd consider replacing the smaller, simpler UM/MOD code.
Most commonly I'll be using 40/MOD which is a specialty division by the constant 40, to index into the 40-column wide screen on a PET or C=64, and for unpacking text encoded in the
RADIX-50 scheme. Pictured numeric output e.g. <# #S #> also requires division, so I'll overload the 40/MOD primitive a bit to work with BASE @ as the divisor for that purpose. All the other standard general-purpose division and modulo words, including UD/MOD will be built on top of UM/MOD. I'm going to use floored division (remainders are always positive) as per the Forth-83 standard.
Re: integer division and multiplication
Posted: Fri Mar 06, 2015 5:26 pm
by theGSman
That looks a lot cleaner, but would slow down UM/MOD considerably.
Unfortunately, my routine already has that speed penalty since I couldn't think of a way to do the division without making the divisor 32 bits anyway.
The only slowdown for my code would be the 2 INX instructions (4 cycles) and the JSR/RTS pair (6+6 cycles) for a total of 16 cycles. The UD/MOD instruction would also have a additional two LDA/STA instructions adding a further 16 cycles. That's small bikkies when you consider all the compares and shifts involved in the division.
Re: integer division and multiplication
Posted: Fri Mar 06, 2015 5:34 pm
by theGSman
For general purpose division, I'm thinktanking an approach to UM/MOD that shifts the divisor left until it's >= the dividend, then do the usual trial-subtract from the dividend and right-shift the divisor thing until the divisor is back to its original alignment. Counting the divisor shifts and counting them back down. If something like this already exists, or anybody here is bored, or if this is a really dumb idea, I'm all ears. The reason for this is that numerator and denominator values are small values most of the time, and operating on all those leading zero bits is a waste of CPU.
That is pretty much how my routine works (see above) except that I shifted the divisor all the way to the MSB (simpler than than comparing the relative sizes after each shift and not much of a time penalty). As a bonus, the division is done entirely on the stack which improves re-entrancy.
Unfortunately, my approach makes the whole thing slower when dealing with small divisors.
Re: integer division and multiplication
Posted: Fri Mar 06, 2015 6:00 pm
by chitselb
Skipping over 8 leading 0-bits at a time during the initial shift-left phase is a matter of a well-placed BEQ instruction
Re: integer division and multiplication
Posted: Sat Mar 07, 2015 5:44 am
by barrym95838
Hey guys,
My initial stab at UD/MOD came out better than I expected: 56 bytes, ~18xx cycles (plus two ROTs)
This is completely untested!
Code: Select all
; - - - - - - - - - - - - - - - - - - - - - - - - - - -
; UD/MOD ( ud1 u1 -- u2 ud2 )
; Dividend ud1 and Quotient ud2 are 32-bits unsigned
; Divisor u1 and Remainder u2 are 16-bits unsigned
; Invalid inputs and/or outputs are silently ignored
;
udslashmod:
lda #0
sta N ;initialize partial remainder
sta N+1 ; in N, N+1
ldy #32 ;loop counter
udsm2:
asl stackl+1,x ;Dividend in NOS:3OS (h:l)
rol stackh+1,x ; is gradually replaced
rol stackl,x ; with the quotient
rol stackh,x
rol N ;N is gradually replaced
rol N+1 ; with the remainder
lda N
cmp TOS ;TOS holds divisor
lda N+1 ;partial remainder >= TOS?
sbc TOS+1
bcc udsm3
sta N+1 ; yes: update the partial
lda N ; remainder and set the
sbc TOS ; low bit in the partial
sta N ; quotient
inc stackl+1,x
udsm3:
dey
bne udsm2 ;loop 32 times
lda N
ldy N+1 ;remainder goes to TOS
jsr put
jsr rot ;ROT
jmp rot ;ROT ;
We could make it about 10% faster by copying NOS and 3OS to N+... and back, but that would make the code more than 10% longer, and I don't think that it's worth the effort.
Please let me know if I goofed up somewhere, and I'll correct it here.
Mike B.
[Edit: adjusted stackl and stackh offsets to the way I think CH is doing it.]
[Edit: fixed obvious N->TOS bug, borrowed CH's PUT, and adjusted code size claim to 56 bytes.]