Page 1 of 3
Simple 32 bit multiply
Posted: Thu Dec 21, 2017 9:37 pm
by jsii
The following is a simple 32 bit multiply routine.
I'm sure there are better ways to approach this problem,
but I think material for comparison and study is always useful.
77 bytes.
Code: Select all
;=================================================================
;=================================================================
;=================================================================
;32bit multiply
;(not thoroughly tested)
;Uses .AXY +4
;
;Operation: C = A * B (actually, C = C + A * B)
; where C = 64 bit result
; A = 32 bit #1
; B = 32 bit #2
;
; *** #1 changed
;
;Howto:
;
;$1500 - $1503 = #1 (SET $1504-$1507 TO 00) ('A')
;$1510 - $1513 = #2 ('B')
;$1520 - $152f = 00 ('C')
;
; -> CT32X01 -> $1520 - $1527 = #1 * #2
;
;Ex.: (assuming $1500-152F = 00)
;
; LDA #$07
; LDX #$03
; STA $1500
; STX $1510
; JSR CT32X01
; LDA $1520
; LDX $1521
; LDY $1522
; ; .. Here, .Y, .X, .A = first 24 bits of result
; ; .. so that .Y * 65536 + .X *256 + .A = first 24 bits result number value
;
CT32X01:
LDX $1510
JSR ACX02
LDX $1511
JSR ACX02
LDX $1512
JSR ACX02
LDX $1513
ACX02:
LDY #$08
YMXE02:
TXA
LSR
TAX
BCC YX02
PHA
LDX #$00
CLC
PHP
XMXE02:
PLP
LDA $1500,X
ADC $1520,X
STA $1520,X
PHP
INX
CPX #$08
BNE XMXE02
PLP
PLA
TAX
YX02:
ASL $1500
TXA
PHA
LDX #$01
PHP
XMXE0202:
PLP
ROL $1500,X
PHP
INX
CPX #$08
BNE XMXE0202
PLP
PLA
TAX
DEY
BNE YMXE02
RTS
;
;=================================================================
;=================================================================
;=================================================================
;>>> END ROUTINE
Re: Simple 32 bit multiply
Posted: Fri Dec 22, 2017 10:25 am
by Klaus2m5
If you compare your routine to this one:
http://www.6502.org/source/integers/32muldiv.htm you'll find that it does the multiply top down. This eliminates the need to expand one of the operands to 64 bit and at the same time reduces the add routine to 32 bit. It is only larger than yours to avoid most indexed loops and improve the cycle count.
On the other hand not doing the 32 bit shift for the multiplier but dividing it into 8 bit single shift groups may actually be faster. However in the top down multiply the multiplier could be overlapped with the lower 32 bits of the product avoiding the extra multiplier shifts completely.
edit: to ilustrate I modified the top down multiply as follows.
Code: Select all
;32 bit multiply with 64 bit product
.data
PROD: .ds 8 ;64 bit product
MULR: .equ PROD ;32 bit multiplier overlaps product
MULND: .ds 4 ;32 bit multiplicand
.code
MULTIPLY: lda #$00
sta PROD+4 ;Clear upper half of
sta PROD+5 ;product
sta PROD+6
sta PROD+7
ldx #$21 ;Set binary count to 32+1
bne START_R
SHIFT_R:
; lsr MULR+3 ;Shift multiplyer right
; ror MULR+2 ; now part of product shift
; ror MULR+1
; ror MULR
bcc ROTATE_R ;Go rotate right if c = 0
lda PROD+4 ;Get upper half of product
clc ; and add multiplicand to
adc MULND ; it
sta PROD+4
lda PROD+5
adc MULND+1
sta PROD+5
lda PROD+6
adc MULND+2
sta PROD+6
lda PROD+7
adc MULND+3
ROTATE_R: ror a ;Rotate partial product
sta PROD+7 ; right
ror PROD+6
ror PROD+5
ror PROD+4
START_R: ror PROD+3 ;shift multiplier while shifting product
ror PROD+2
ror PROD+1
ror PROD
dex ;Decrement bit count and
bne SHIFT_R ; loop until 32 bits are done
; floating point only
; clc ;Add dps and put sum in MULXP2
; lda MULXP1
; adc MULXP2
; sta MULXP2
rts
edit2: of course the loop count in the modified code must be +1 for the initial multiplier shift!
Re: Simple 32 bit multiply
Posted: Fri Dec 22, 2017 11:44 am
by jsii
If you compare your routine to this one:
http://www.6502.org/source/integers/32muldiv.htm you'll find that it does the multiply top down. This eliminates the need to expand one of the operands to 64 bit and at the same time reduces the add routine to 32 bit. It is only larger than yours to avoid most indexed loops and improve the cycle count.
Thank you! All those zeroes did 'bother' me. Maybe I'll try this alternative some time. Your way is clearly better.
On the other hand not doing the 32 bit shift for the multiplier but dividing it into 8 bit single shift groups may actually be faster. However in the top down multiply the multiplier could be overlapped with the lower 32 bits of the product avoiding the extra multiplier shifts completely.
Very interesting. I thought of not using indexing and loops, but then I decided to aim for a lower byte count this time. I wonder how large an 'un-looped' alternative would be...
Thank you for your comment and very good looking SBC, by the way.
(I see the 'edit' to your post even as I type this reply. Thank you, very clear code.)
[corrected typing error: 'lower' instead of 'slower' around "...to aim for a lower byte count..."]
Re: Simple 32 bit multiply
Posted: Wed Dec 27, 2017 12:25 pm
by jsii
Considering the exchange above, here's a re-written 32-bit multiply routine.
38 bytes.
Code: Select all
;=================================================================
;=================================================================
;=================================================================
;32bit multiply -- CT32X02 -- 2017, jsii
;(not thoroughly tested)
;Uses .AXY +1
;
;Operation: C = A * B
; where C = 64 bit result
; A = 32 bit #1
; B = 32 bit #2
;
; *** #1 changed
;
;Howto:
;
;$C200 - $C203 = #1 (ALSO SET $C204-$C208 TO 00)
;$C210 - $C213 = #2
;
;
; -> CT32X02 -> $C200 - $C207 = #1 * #2
;
;Ex.: (assuming $C200-C213 = 00)
;
; LDA #$07
; LDX #$03
; STA $C200
; STX $C210
; JSR CT32X02
; LDA $C200
; LDX $C201
; LDY $C202
; ; .. Here, .Y, .X, .A = first 24 bits of result
; ; .. so that .Y * 65536 + .X *256 + .A = 24 bit result number
;
CT32X02:
LDY #$21
YMEX00:
LDX #$08
LSR $C208
XMEX01:
ROR $C1FF,X
DEX
BNE XMEX01
BCC YMEX01
CLC
PHP
XMEX00:
PLP
LDA $C204,X
ADC $C210,X
STA $C204,X
PHP
INX
CPX #$05
BNE XMEX00
PLP
YMEX01:
DEY
BNE YMEX00
RTS ;38 BYTES
;
;=================================================================
;=================================================================
;=================================================================
;>>> END ROUTINE
I suppose all that x86 programming finally shows its effect; must shape up. I'm sure there must be better ways to do this yet both in terms of byte count and speed. 37 bytes would be good.
Re: Simple 32 bit multiply
Posted: Wed Dec 27, 2017 1:59 pm
by jsii
Based on the routine placed just above, if we don't mind 'upsetting' the carry down below,
I think we can take it down to 37 bytes; 35 bytes if no 'user bytes' are needed.
37 bytes.
Code: Select all
;=================================================================
;=================================================================
;=================================================================
;32bit multiply -- CT32X02 -- 2017, jsii
;(not thoroughly tested)
;Uses .AXY +1
;
;Operation: C = A * B
; where C = 64 bit result
; A = 32 bit #1
; B = 32 bit #2
;
; *** #1 changed
;
;Howto:
;
;$C200 - $C203 = #1 (ALSO SET $C204-$C208 TO 00)
;$C210 - $C213 = #2
;
;
; -> CT32X02 -> $C200 - $C207 = #1 * #2
;
;Ex.: (assuming $C200-C213 = 00)
;
; LDA #$07
; LDX #$03
; STA $C200
; STX $C210
; JSR CT32X02
; LDA $C200
; LDX $C201
; LDY $C202
; ; .. Here, .Y, .X, .A = first 24 bits of result
; ; .. so that .Y * 65536 + .X *256 + .A = 24 bit result number
;
CT32X02:
LDY #$21
YMEX00:
LDX #$09
XMEX01:
ROR $C1FF,X
DEX
BNE XMEX01
BCC YMEX01
CLC
PHP
XMEX00:
PLP
LDA $C204,X
ADC $C210,X
STA $C204,X
PHP
INX
CPX #$05
BNE XMEX00
PLP
YMEX01:
DEY
BNE YMEX00
RTS ;35 BYTES
NOP ;USER OR F.E.
NOP ;USER OR F.E.
;37 BYTES.
;
;=================================================================
;=================================================================
;=================================================================
;>>> END ROUTINE
Definitely must shape up. I'm sure there must be better ways to do this yet both in terms of byte count and speed. 32 bytes would be good. "A 32 byte 32-bit multiply routine" sounds good to me.
Re: Simple 32 bit multiply
Posted: Wed Dec 27, 2017 4:06 pm
by Klaus2m5
How about 31 bytes?
Code: Select all
CT32X02:
CLC ;bugfix!
LDY #$21
YMEX00:
LDX #$09
XMEX01:
ROR $C1FF,X
DEX
BNE XMEX01
BCC YMEX01
LDX #$fc ;added to replace cpx #$05
CLC
; PHP ;carry save no longer needed
XMEX00:
; PLP ;carry save no longer needed
LDA $C204-$fc,X
ADC $C210-$fc,X
STA $C204-$fc,X
; PHP ;carry save no longer needed
INX
; CPX #$05 ;replaced by ldx #$fc
BNE XMEX00
; PLP ;carry save no longer needed
YMEX01:
DEY
BNE YMEX00
RTS ;32 BYTES after bugfix
Of course this will cause page crossing indexes. So the 7 cycles you gain by avoiding the PHP/PLP will be reduced by 2 cycles per add loop.
edit: Found a bug in the original code. Entering the subroutine with carry set will cause the result to be plus $80000000.
Re: Simple 32 bit multiply
Posted: Wed Dec 27, 2017 5:27 pm
by jsii
How about 31 bytes?
Code: Select all
CT32X02:
LDY #$21
YMEX00:
LDX #$09
XMEX01:
ROR $C1FF,X
DEX
BNE XMEX01
BCC YMEX01
LDX #$fc ;added to replace cpx #$05
CLC
; PHP ;carry save no longer needed
XMEX00:
; PLP ;carry save no longer needed
LDA $C204-$fc,X
ADC $C210-$fc,X
STA $C204-$fc,X
; PHP ;carry save no longer needed
INX
; CPX #$05 ;replaced by ldx #$fc
BNE XMEX00
; PLP ;carry save no longer needed
YMEX01:
DEY
BNE YMEX00
RTS ;31 BYTES
Of course this will cause page crossing indexes. So the 7 cycles you gain by avoiding the PHP/PLP will be reduced by 2 cycles per add loop.
There you go, well done, glad you caught it!
So now, how about 27 bytes?
Re: Simple 32 bit multiply
Posted: Wed Dec 27, 2017 6:58 pm
by jsii
Well anyway... 27 bytes.
Code: Select all
;=================================================================
;=================================================================
;=================================================================
;32bit multiply -- CT32X03 -- 2017, jsii
;(not thoroughly tested)
;Uses .AXY +0
;
;Operation: C = A * B
; where C = 64 bit result
; A = 32 bit #1
; B = 32 bit #2
;
; *** #1 changed
;
;Howto:
;
;$61 - $64 = #1 (ALSO SET $65-$69 TO 00)
;$6A - $6D = #2
;
;
; -> CT32X03 -> $61 - $68 = #1 * #2
;
;Ex.: (assuming $61-$6D = 00)
;
; LDA #$07
; LDX #$03
; STA $61
; STX $6A
; JSR CT32X03
; LDA $61
; LDX $62
; LDY $63
; ; .. Here, .Y, .X, .A = first 24 bits of result
; ; .. so that .Y * 65536 + .X *256 + .A = 24 bit result number
;
CT32X03:
LDY #$21
YMEX00:
LDX #$09
XMEX01:
ROR $60,X
DEX
BNE XMEX01
BCC YMEX01
LDX #$FC
CLC
XMEX00:
LDA $69,X
ADC $6E,X
STA $69,X
INX
BNE XMEX00
YMEX01:
DEY
BNE YMEX00
RTS ;27 BYTES
;
;=================================================================
;=================================================================
;=================================================================
;>>> END ROUTINE
21 Bytes?
Re: Simple 32 bit multiply
Posted: Wed Dec 27, 2017 9:37 pm
by Klaus2m5
Re: Simple 32 bit multiply
Posted: Wed Dec 27, 2017 11:08 pm
by jsii
Re: Simple 32 bit multiply
Posted: Fri Dec 29, 2017 5:09 am
by jsii
25 bytes.
Code: Select all
;=================================================================
;=================================================================
;=================================================================
;32bit multiply and 'variable bit' multiply -- CT32X04 -- 2017, jsii
;(not thoroughly tested)
;Uses .AXY +0
;
;Operation: C = A * B
; where C = 64 bit result (or 'variable bit' result)(see below)
; A = 32 bit #1 (or 'variable bit' #1)
; B = 32 bit #2 (or 'variable bit' #2)
;
; *** #1 changed
;
;Howto:
;
;$61 - $64 = #1 (ALSO SET $65-$69 TO 00)
;$6A - $6D = #2
;.Y = 33, 25, 17 or 9, for example. (33 -> 32 bit operation, 25 -> 24 bit operation,
; 17 -> 16 bit operation, 9 -> 8 bit operation)
;
;
;
; -> CT32X04 -> $61 - $68 = #1 * #2 (for a 32 bit operation)
; $62 - $67 = #1 * #2 (for a 24 '' '' )
; $63 - $66 = #1 * #2 (for 16 bits)
; $64 - $65 = #1 * #2 (for 8 bits)
;
;
;Ex.: (assuming $61-$6D = 00)
;
; LDA #$07
; LDX #$03
; STA $61
; STX $6A
; LDY #$21 ;32 bit operation
; JSR CT32X04
; LDA $61
; LDX $62
; LDY $63
; ; .. Here, .Y, .X, .A = first 24 bits of result
; ; .. so that .Y * 65536 + .X *256 + .A = 24 bit result number
;
;
;Ex.: (assuming $61-$6D = 00)
;
; LDA #$30
; LDX #$67
; STA $61
; STX $6A
; LDY #$09 ;8 bit operation
; JSR CT32X04
; LDA $64
; LDX $65
; ; .. Here, .X, .A = 16 bit result, .A=80 .X=19
;
CT32X04:
LDX #$09
XMEX01:
ROR $60,X
DEX
BNE XMEX01
BCC YMEX01
LDX #$FC
CLC
XMEX00:
LDA $69,X
ADC $6E,X
STA $69,X
INX
BNE XMEX00
YMEX01:
DEY
BNE CT32X04
RTS ;25 BYTES.
;
;=================================================================
;=================================================================
;=================================================================
;>>> END ROUTINE
21 Bytes?
Re: Simple 32 bit multiply
Posted: Fri Dec 29, 2017 6:55 am
by rwiker
If you're going to cheat, why not move the
out of the function, too? Further, if you inline this, you can get rid of the RTS (and everything else; the actual length of the subroutine becomes 0...)
Re: Simple 32 bit multiply
Posted: Fri Dec 29, 2017 7:17 am
by BigEd
(Can we keep this good-natured? Enjoy yourselves.)
Re: Simple 32 bit multiply
Posted: Fri Dec 29, 2017 10:53 am
by jsii
If you're going to cheat, why not move the
out of the function, too? Further, if you inline this, you can get rid of the RTS (and everything else; the actual length of the subroutine becomes 0...)
I hear that bile is good around the liver. Anyway, I'm not sure what you mean by 'cheat', by 'function' or by 'inline' for that matter. I wonder how a 6502 feels about cheating, about "well behaved" code, about pretty comments or about nifty indenting altogether. Cheers.
Re: Simple 32 bit multiply
Posted: Fri Dec 29, 2017 10:55 am
by jsii
I still say there must be better ways to do this yet both in terms of byte count and speed. 21 bytes? Cheers.