Building on the routine just above: if we don't mind 'upsetting' the carry further down,
I think we can get it down to 37 bytes, or 35 bytes if the two 'user bytes' aren't needed.
37 bytes.
Code:
;=================================================================
;=================================================================
;=================================================================
;32bit multiply -- CT32X02 -- 2017, jsii
;(not thoroughly tested)
;Uses .AXY +1
;
;Operation: C = A * B
; where C = 64 bit result
; A = 32 bit #1
; B = 32 bit #2
;
; *** #1 changed
;
;Howto:
;
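;$C200 - $C203 = #1 (ALSO SET $C204-$C208 TO 00)
;$C210 - $C213 = #2 (ALSO SET $C214 TO 00, IT IS READ BY THE 5-BYTE ADD)
;CARRY MUST BE CLEAR ON ENTRY (THE FIRST ROR ROTATES IT INTO $C208)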
;
;
; -> CT32X02 -> $C200 - $C207 = #1 * #2
;
;Ex.: (assuming $C200-$C214 = 00)
;
; LDA #$07
; LDX #$03
; STA $C200
; STX $C210
; CLC ; carry is rotated into $C208 by the first ROR
; JSR CT32X02
; LDA $C200
; LDX $C201
; LDY $C202
; ; .. Here, .Y, .X, .A = low 24 bits of result ($000015 = 21)
; ; .. so that .Y * 65536 + .X * 256 + .A = 24 bit result number
;
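CT32X02:
LDY #$21 ;33 passes: 32 multiplier bits + 1 final shift
YMEX00:
LDX #$09 ;rotate 9 bytes right, $C208 down to $C200
XMEX01:
ROR $C1FF,X ;$C1FF+X = $C208..$C200
DEX
BNE XMEX01 ;exits with X = 0, C = bit shifted out of $C200
BCC YMEX01 ;bit clear: nothing to add this pass
CLC
PHP ;keep carry on the stack, CPX below clobbers it
XMEX00:
PLP
LDA $C204,X
ADC $C210,X ;add #2 into $C204-$C208 (X=4 reads $C214)
STA $C204,X
PHP
INX
CPX #$05
BNE XMEX00
PLP ;balance the stack, C = carry out of the add
YMEX01:
DEY
BNE YMEX00
RTS ;35 BYTES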
NOP ;USER OR F.E.
NOP ;USER OR F.E.
;37 BYTES.
;
;=================================================================
;=================================================================
;=================================================================
;>>> END ROUTINE
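For anyone calling this with full 32-bit operands, here is a minimal sketch of a setup wrapper. It is not part of the routine above and is untested. The labels CT32PREP and CLRLP are made up for illustration. It assumes #1 is already in $C200-$C203 and #2 in $C210-$C213, and it clears the bytes the routine relies on being zero: $C204-$C208, plus $C214, which ADC $C210,X reads when X=4. It also clears the carry, since the first ROR rotates whatever is in it into $C208.

```asm
;Hypothetical wrapper -- CT32PREP/CLRLP are illustrative names only
;Assumes #1 in $C200-$C203 and #2 in $C210-$C213 on entry
CT32PREP:
LDA #$00
LDX #$04
CLRLP:
STA $C204,X ;clears $C208 down to $C204
DEX
BPL CLRLP ;loop while X = 3..0, exit when X wraps to $FF
STA $C214 ;fifth byte read by ADC $C210,X
CLC ;carry would otherwise end up in the result
JMP CT32X02 ;tail call, result in $C200-$C207
```

This costs 15 bytes on top of the routine, so it only makes sense if the caller can't guarantee those bytes and the carry are already clear.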
This definitely needs tightening up. I'm sure there are still better ways to do it, both in byte count and in speed. 32 bytes would be good. "A 32 byte 32-bit multiply routine" sounds good to me.