Page 3 of 5
Re: integer division and multiplication
Posted: Sun Mar 08, 2015 4:56 am
by barrym95838
Code: Select all
: # ( +d1 -- +d2)
base @ ud/mod rot dup 9 > 7 and + Ascii 0 + hold ;
If this is the
ONLY use you have for
UD/MOD in Pettil, it might be better to just code
# as a primitive, and embed a tailor-made (optimized just for
#) version of
UD/MOD. I think that's kind of what you were suggesting earlier, but it just now sank in. I'll whip up a primitive for
# and post it here when I get it at least half-baked.
[Update: Something like this (36 bytes, 11xx cycles):
Code: Select all
; - - - - - - - - - - - - - - - - - - - - - - - - - - -
; # ( ud1 -- ud2 )
; Extract least significant digit of ud1, convert to
; ASCII and save that digit in HOLD area.
; ud2 is the quotient of ud1 / BASE
; Undefined behavior for a BASE outside [2..20]
;
sharp:
lda #0 ;partial remainder
ldy #32 ;loop counter
sharp2:
asl stackl,x ;Dividend in TOS:NOS (h:l)
rol stackh,x ; is gradually replaced
rol TOS ; with the quotient
rol TOS+1 ;a is gradually replaced
rol ; with the remainder
cmp BASE ;partial remainder < BASE?
bcc sharp3 ; no: update the partial
sbc BASE ; remainder and partial
inc stackl,x ; quotient
sharp3:
dey
bne sharp2 ;loop 32 times
cmp #10
sed
adc #'0' ;adjust remainder to ASCII
cld
jsr pushya
jmp hold ;HOLD ;
UNTESTED!]
Mike B.
[Edit: It seems kind of silly for UD/MOD to end with two ROTs, especially after I find that its only application in the standard word set is followed by a third ROT!]
[Edit: Improved comments, corrected pushay -> pushya.]
Re: integer division and multiplication
Posted: Mon Mar 09, 2015 6:48 am
by theGSman
[Edit: It seems kind of silly for UD/MOD to end with two ROTs, especially after I find that its only application in the standard word set is followed by a third ROT!]
I'm surprised that you don't have a -ROT coded for such a purpose. I have found that I use this word so often that it is worthy of being a system word.
2PICK is also a worthy stack operator in its own right.
Re: integer division and multiplication
Posted: Mon Mar 09, 2015 7:25 am
by barrym95838
If by "system word" you mean machine-coded primitive, then you are most likely correct that it could be justified. I am quite new to Forth, so I don't really know how to juggle yet. My interest in Forth stemmed from my attempt to button-up the instruction set on my experimental hobby processor design (by writing a Forth for it). I must say that it's an interesting language, but I'm running into the old problem of "too many pans in the fire", so I need to pace myself or face a nervous breakdown from all the directions in which my brain is getting pulled.
Code: Select all
; DTC 65m32 Forth
; a = TOS, x = PSP, y = IP, s = RSP, u = UP
NOT_IMM '2PICK'
_2pick: ; ( x1 x2 x3 -- x1 x2 x3 x1 )
sta ,-x
lda 2,x
jmp (,y+) ; NEXT
NOT_IMM '-ROT'
_mrot: ; ( x1 x2 x3 -- x3 x1 x2 )
exa 1,x
jmp _swap
NOT_IMM 'PICK'
_pick: ; ( xu ... x2 x1 x0 u -- xu ... x2 x1 x0 xu )
add #,x
jmp _fetch
Mike B.
Re: integer division and multiplication
Posted: Mon Mar 09, 2015 12:21 pm
by theGSman
If by "system word" you mean machine-coded primitive, then you are most likely correct that it could be justified. I am quite new to Forth, so I don't really know how to juggle yet. My interest in Forth stemmed from my attempt to button-up the instruction set on my experimental hobby processor design (by writing a Forth for it). I must say that it's an interesting language, but I'm running into the old problem of "too many pans in the fire", so I need to pace myself or face a nervous breakdown from all the directions in which my brain is getting pulled.
I know what you mean about time management.
By "system word" I mean those words that are already available when you first fire up your forth system. You don't need to stress too much about which words you should include in your system since any words not already present can easily be coded. The makers of EFORTH claim that you can derive all your Forth words from a very small set of primitives. (For example, every single arithmetic operation can be derived from UM+ and XOR). You can even get away with not including an assembler since CODE words are just bytes COMMA'd in (the initial hand assembly might be a drag though).
So you can just have a skeletal Forth which loads additional words from files. I am thinking of going this way once my file management is completed.
Re: integer division and multiplication
Posted: Mon Mar 09, 2015 7:26 pm
by GARTHWILSON
The makers of EFORTH claim that you can derive all your Forth words from a very small set of primitives. (For example, every single arithmetic operation can be derived from UM+ and XOR).
True, but the performance will be terrible. They used to have such discussions back in the years when Forth Dimensions magazine was still being published, and the minimum seemed to be about 30 primitives; but again at a severe performance penalty. I believe the purpose was for maximum portability, ie, that there's little to re-write if you go to another processor. In the example theGSman gave above of -ROT, it's four times as fast in ITC as a primitive as it is as a secondary. For multiplication and division, I would expect an even much greater difference.
Re: integer division and multiplication
Posted: Mon Mar 09, 2015 8:59 pm
by theGSman
If I needed an arithmetic or stack operator word that didn't exist in my system then I would be coding it in assembly since it usually is almost as easy. Of course, knowing that you can quickly whip up a high level version of the word for testing purposes is still an advantage (and you may elect to keep it that way if time is not an issue).
The point was that you don't have to agonize over which words you include in your Forth system since Forth is so easily extendable.
Re: integer division and multiplication
Posted: Mon Mar 09, 2015 9:23 pm
by Brad R
They used to have such discussions back in the years when Forth Dimensions magazine was still being published, and the minimum seemed to be about 30 primitives; but again at a severe performance penalty. I believe the purpose was for maximum portability, ie, that there's little to re-write if you go to another processor.
Correct. (We used to have these discussions at Forth conferences, too.) eForth was designed to have only about 30-32 primitives, as I recall, for quickest porting. But its multiplication was extremely slow.
For CamelForth I wanted a small set of primitives, for ease of porting, but large enough to give good performance. So I ended up with about 70 primitives (including unsigned multiply and divide).
At the other extreme is gForth, which I think has hundreds and hundreds of primitives.
Re: integer division and multiplication
Posted: Tue Mar 10, 2015 2:06 am
by nyef
They used to have such discussions back in the years when Forth Dimensions magazine was still being published, and the minimum seemed to be about 30 primitives; but again at a severe performance penalty. I believe the purpose was for maximum portability, ie, that there's little to re-write if you go to another processor.
Correct. (We used to have these discussions at Forth conferences, too.) eForth was designed to have only about 30-32 primitives, as I recall, for quickest porting. But its multiplication was extremely slow.
For CamelForth I wanted a small set of primitives, for ease of porting, but large enough to give good performance. So I ended up with about 70 primitives (including unsigned multiply and divide).
At the other extreme is gForth, which I think has hundreds and hundreds of primitives.
As a data point, my current 32-bit x86 ITC Forth (not particularly optimized in any direction) has 73 CODE words, seven of which are for a Linux syscall interface.
Re: integer division and multiplication
Posted: Tue Mar 10, 2015 2:38 am
by barrym95838
Hey guys,
After staring at my UD/MOD for longer than I should have (because it hasn't been properly tested yet) I came up with this for my second stab ... 49 code bytes, and still roughly 1800 cycles:
Code: Select all
; - - - - - - - - - - - - - - - - - - - - - - - - - - -
; UD/MOD ( ud1 u1 -- u2 ud2 )
; Dividend ud1 and Quotient ud2 are 32-bits unsigned
; Divisor u1 and Remainder u2 are 16-bits unsigned
; Invalid inputs and/or outputs are silently ignored
;
udslashmod:
ldy #32 ; init loop counter
lda #0 ; init partial remainder
sta N ; in N:A (h:l)
udsm2:
asl stackl+1,x ; dividend in NOS:3OS (h:l)
rol stackh+1,x ; is gradually replaced
rol stackl,x ; with the quotient
rol stackh,x
rol ; N:A is gradually replaced
rol N ; with the remainder
pha
cmp TOS ; TOS holds the divisor
lda N ; partial remainder >= TOS?
sbc TOS+1
bcc udsm3
sta N ; yes: update the partial
pla ; remainder and set the
sbc TOS ; low bit in the partial
pha ; quotient
inc stackl+1,x
udsm3:
pla
dey
bne udsm2 ; loop 32 times
ldy N
jsr put
jsr rot
jmp rot
I golfed it up the best I could ... but it's still untested. Any volunteers?
Mike B.
Re: integer division and multiplication
Posted: Tue Mar 10, 2015 2:13 pm
by chitselb
Mike,
I'm looking at your code and my inner 6502 Lee Trevino is wondering if it's even possible to rearrange things a bit so that the asl stackl+1,x and inc stackl+1,x are combined into this single instruction: rol stackl+1,x (after setting up the C flag with the compare/subtract). I fooled around with it for a couple hours in the VICE PET debugger, no joy yet.
Re: integer division and multiplication
Posted: Tue Mar 10, 2015 3:11 pm
by barrym95838
Yeah, it's so tempting to combine the
inc and the
asl into an
rol, but I have found that it's harder than it looks, because ... well, you probably know, since you've been trying it too. I was actually successful in doing it in my
19-byte 40/MOD, but only by sacrificing the possibility of odd divisors.
I'm a Tom Watson fan, by the way. I hope that you don't hold that against me.
Mike B.
[Update: Now slightly faster, with a BIT zp naked op-code trick ... should be safe on >99% of the hardware out there in the field:
Code: Select all
; UD/MOD ( ud1 u1 -- u2 ud2 )
; Dividend ud1 and Quotient ud2 are 32-bits unsigned
; Divisor u1 and Remainder u2 are 16-bits unsigned
; Invalid inputs and/or outputs are silently ignored
;
udslashmod:
ldy #32 ; init loop counter
lda #0 ; init partial remainder
sta N ; in N:A (h:l)
udsm2:
asl stackl+1,x ; dividend in NOS:3OS (h:l)
rol stackh+1,x ; is gradually replaced
rol stackl,x ; with the quotient
rol stackh,x
rol ; N:A is gradually replaced
rol N ; with the remainder
pha
cmp TOS ; TOS holds the divisor
lda N ; partial remainder >= TOS?
sbc TOS+1
bcc udsm3
sta N ; yes: update the partial
pla ; remainder and set the
sbc TOS ; low bit in the partial
inc stackl+1,x ; quotient
.db $24 ; BIT zp naked opcode
udsm3:
pla
dey
bne udsm2 ; loop 32 times
ldy N
jsr put
jsr rot
jmp rot
]
Re: integer division and multiplication
Posted: Wed Mar 11, 2015 1:22 pm
by chitselb
Using BIT zp takes 3 cycles, 1 byte. Using PHA takes 3 cycles, 1 byte. Which more clearly describes the intent of the code? Just sayin'
Re: integer division and multiplication
Posted: Wed Mar 11, 2015 3:12 pm
by Dr Jefyll
Using BIT zp takes 3 cycles, 1 byte. Using PHA takes 3 cycles, 1 byte.
Hmmm, yes... But the BIT zp causes the following PLA not to execute, so the net result is a 4 cycle saving doing it Mike's way.
a BIT zp naked op-code trick ... should be safe on >99% of the hardware out there in the field
Right -- the safety issue arises because BIT zp touches an address which may contain a read-sensitive IO register (example: the IFR on a 6522). But, since it's just a single byte you're skipping over, not two, you can avoid the danger entirely -- and save 1 more cycle -- by substituting a different naked opcode. Use BIT
immediate (or CMP/CPX/CPY immediate if you prefer).

Re: integer division and multiplication
Posted: Wed Mar 11, 2015 3:26 pm
by barrym95838
Correct on both points, as usual, Jeff. The idea was to save 4 cycles for every
'1' bit in the quotient by doing a
BIT $68 instead of a
PHA PLA. That
.db $24 should be replaced with a
.db $c9 (CMP #) for an additional (very slight) speed increase, and
100% compatibility.
Thank you sirs,
Mike B.
[Edit: Unless someone feels like busting my balloon, I must say that it feels really good straying off the beaten (UD/MOD) path and finding a little nugget in the rough, on a processor that has been around for so long. It feels like I missed the fairway on my tee shot, but came up with a birdie anyway.]
[Edit 2: Looking at CH's
github source, it appears that I shouldn't
JSR to
put, because it ends with
NEXT, not
RTS. There is a
-ROT though, so the end should probably change from:
to
]
Re: integer division and multiplication
Posted: Mon Mar 16, 2015 4:06 pm
by chitselb
Here's the (nominally tested) UD/MOD code.
It weighs in at 53 or 67 bytes, depending on whether it uses PETTIL's extended Sweet-16 to push the results back on the stack. The inner loop executes 32.5 times (we start in the middle) at either [54] or [59] clocks, or [1836] clocks total (on average).
The one-time setup/teardown code includes an additional performance penalty of 17+27x2 = [71 clocks] for 'jsr locals', to move the Dividend off of the stack, but then the 'ROR zp' (x6 x33 x1 clock) saves a total of [196 clocks] more than making up the difference. To return everything in the proper order requires moving the Dividend out of the way anyway. It seems a little crazy to squeeze a bunch of clocks with a tight loop and then squander them on Sweet-16, but it saves 14 bytes vs. doing the one-time exit code in assembler.
Code: Select all
UD/MOD ( ud1 u2 – u3 ud4 )
udslashmod
ldy #2
jsr locals ; dividend to n0..n3
;clc
;ldy #0 ; locals does this for us
stx storex
ldx #33
tya
beq udslashmodb ; bra to initialization code
udslashmoda
rol n+4 ; [5] I don't know what this 2-bytes is called
rol n+5 ; [5] trial minuend or something like that
sec ; [2]
lda n+4 ; [3]
sbc tos ; [3]
tay ; [2]
lda n+5 ; [3]
sbc tos+1 ; [3]
bcc udslashmodc ; save or abandon the trial subtraction
udslashmodb
sty n+4 ; [2+3+3] or [3]?
sta n+5 ; treating this as [~6] clocks on average
udslashmodc
rol n+0 ; [5]
rol n+1 ; [5]
rol n+2 ; [5]
rol n+3 ; [5]
dex ; [2]
bne udslashmoda ; [3]
; [54] or [59] clocks for 33x main loop = [1839]
ldx storex
; this is the end [22-bytes]
ldy n+5
sty tos+1
lda n+4
sta tos ; put remainder on stack
ldy n+1
lda n+0
jsr push6502 ; push quotient(lo) on stack
ldy n+3
lda n+2
jmp pushya ; push quotient(hi) on stack
; alternate ending [8-bytes]
brk
.byt ld | N2
.byt st | TOS
.byt ld | N0
.byt push
.byt ld | N1
.byt push
.byt nxt