chitselb wrote:
The UD/MOD base primitive approach was what I first considered and implemented. I came across this in VolksForth, which builds UD/MOD on top of UM/MOD, instead of the other way around. This is cool because, in my opinion, 32/32-bit division is seldom invoked and will be even rarer in my own application code, so I don't want to punish the user on every occurrence of division by crunching the whole 32 bits using UD/MOD as the basic primitive. Also, UD/MOD isn't part of the Fig, Forth-79, Forth-83 or ANS standards, while UM/MOD is in the required word set for Forth-83, aka U/MOD in Forth-79.
Code:
: ud/mod ( ud1 u2 -- urem udquot )
>r 0 r@ um/mod r> swap >r um/mod r> ;
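For anyone following the stack juggling, here is a rough trace of that same definition, one step per line. The name ud/mod-traced is made up for this sketch so it doesn't clash with an existing ud/mod. The point is that the second um/mod cannot overflow: its high cell is the remainder from the first division, which is always smaller than u2.

```forth
( ud/mod-traced is a hypothetical name. The body is the definition above, )
( split so each line can carry the stack picture after it runs. )
: ud/mod-traced ( ud1 u2 -- urem udquot )
   >r            ( lo hi              R: u2 )
   0 r@ um/mod   ( lo rem1 quot-hi    R: u2 )
   r> swap >r    ( lo rem1 u2         R: quot-hi )
   um/mod        ( urem quot-lo       R: quot-hi )
   r> ;          ( urem quot-lo quot-hi )

decimal
100000. 7 ud/mod-traced d. u.   ( 100000 = 7 * 14285 + 5 )
```

On a 16-bit system the first um/mod sees 1 as the high cell of 100000, giving quot-hi 0 and rem1 1. The second then divides the full value by 7, so the last line should display the quotient 14285 followed by the remainder 5.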
I came across an idea in Blazin' Forth where a primitive used by other primitives is implemented as a subroutine, the first six bytes being a JSR to the actual code of the routine followed by a JMP to NEXT.
Here is how to implement UD/MOD, for an ITC Forth, in terms of UM/MOD without incurring much of a speed penalty. First implement UM/MOD so its body is a subroutine. This only adds 12 cycles to its execution time (6 for the JSR and 6 for the RTS).
Code:
CODE UM/MOD ( UD U1 -- U2 U3 )
HERE 6 + JSR, NEXT JMP,
<actual code for UM/MOD algorithm>
.
.
.
RTS, END-CODE
Then call the actual code for the UM/MOD algorithm from UD/MOD.
Code:
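HEX
CODE UD/MOD ( UD1 U1 -- U2 UD2 )
DEX, DEX,                        ( make room for one more cell )
2 ,X LDA, N 2+ STA, 0 ,X STA,    ( U1 low byte to N+2 and new top )
3 ,X LDA, N 3 + STA, 1 ,X STA,   ( U1 high byte to N+3 and new top )
2 ,X STY, 3 ,X STY,              ( zero cell below; relies on Y = 0 )
' UM/MOD @ 6 + JSR,              ( 0:hi / U1 -> rem1 quot-hi )
0 ,X LDA, PHA,                   ( stash quot-hi: low byte, )
1 ,X LDA, PHA,                   ( then high byte )
N 2+ LDA, 0 ,X STA,              ( put U1 back on top, )
N 3 + LDA, 1 ,X STA,             ( from the N+2/N+3 copy )
' UM/MOD @ 6 + JSR,              ( rem1:lo / U1 -> rem quot-lo )
PLA,                             ( A = quot-hi high byte )
PUSH JMP, END-CODE               ( PUSH pulls the low byte itself )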
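The phrase ' UM/MOD @ 6 + appears twice above, so it could be factored out. This is only a sketch: SUB-ENTRY is a name I made up, and it assumes a Forth-83 style ITC system where ' returns the code field address. On a fig-Forth style system ' returns the parameter field address instead, so the fetch would not be wanted there.

```forth
( SUB-ENTRY is a hypothetical helper name, not part of any system. )
( Assumes an ITC Forth-83 where ' returns the code field address )
( and the word was built with the JSR/JMP header shown earlier. )
: SUB-ENTRY ( cfa -- addr )
   @        ( code field contents: start of the machine code )
   6 + ;    ( skip the 3-byte JSR and the 3-byte JMP )

( Inside a CODE definition the call would then read: )
( ' UM/MOD SUB-ENTRY JSR, )
```

It only works for primitives laid out with that six-byte header, so it is not safe to point at an arbitrary CODE word.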
I tried implementing UD/MOD as a primitive with its own division loop; it didn't use UM/MOD. What surprised me was that this way (the code above) was not only smaller, but faster as well. Hope this helps.
Cheers,
Jim
Edit: Oops, I forgot about your split stack.
Edit: I forgot to mention that UM/MOD does not use N+2 or N+3.