6502.org

Posted: **Sat Dec 22, 2012 7:31 pm**

This multiplication routine takes constant time, and uses no memory other than the stack. More detail here.

Quote:

Let me begin by saying that this is not the correct way to do multiplication on the 6502

; unsigned multiply by Daud Zoss
; as seen at http://swissembly.wordpress.com/2012/12/15/8-bit-multiply-in-place-on-6502/
;
; parameters in X and Y
; result (low byte only) in X

um8:
 pha
 tya
 pha
 txa
 tsx
 ldy #$08
loop1:
 pha
 dex
 asl
 dey
 bne loop1
 lda $0109,x
 ldy #$08
loop2:
 asl
 bcs skip1
 pha
 lda #$00
 sta $0101,x 
 pla
skip1: 
 inx
 dey
 bne loop2
 ldx #$00
 ldy #$08
loop3:
 txa
 tsx
 clc
 adc $0101,x
 tax
 pla
 dey
 bne loop3
 pla
 tay
 pla

I've tested this on easy6502 and seems to work. I haven't got my head around it!

Cheers
Ed

Posted: **Sat Dec 22, 2012 7:58 pm**

It's made from 3 loops. The first loop writes 8 values to the stack: x, 2*x, 4*x, 8*x, ... 128*x. The second loop checks 'y' for bits that are cleared. For each cleared bit, it zeroes out the corresponding 2^n*x value in the table. In the 3rd loop, the 8 values in the table are summed.

Posted: **Sat Dec 22, 2012 8:21 pm**

Thanks for the explanation!

Posted: **Sat Dec 22, 2012 10:17 pm**

Inspired by above code. Less code bytes, less stack usage, less cycles = less everything.

Code: Select all

; unsigned multiply using stack as operand storage - 6502.org style
;
; parameters in X and Y
; result (low byte only) in X

um8:    pha         ;save a, y is not altered

        tya         ;move operands to stack
        pha
        txa
        eor #$ff    ;invert carry of op1
        pha
        lda #0      ;init result
        tsx         ;get index to operands
loop:   lsr $101,x  ;test op1 2^0 to 2^7
        bcs skip
        adc $102,x  ;result =+ op2
skip:   asl $102,x  ;op2 =* 2
        bne loop    ;until nothing more to add
        tax         ;move result to x
        pla         ;discard operands
        pla
        
        pla         ;restore a
        rts

Posted: **Sun Dec 23, 2012 3:03 am**

Am I missing something, or could you omit the EOR #$FF and change the BCS to BCC like the following? (I like to apply the structure macros to make it a little more clear

Code: Select all

; unsigned multiply using stack as operand storage - 6502.org style
;
; parameters in X and Y
; result (low byte only) in X

um8:  PHA                  ; Save A.  Y is not altered.  A is saved
                           ; only in case the calling routine needed it.
      PHY                  ; Move operands to stack.
      PHX                  ; Now SP+2 has one multiplicand and SP+1 has the other.
      LDA  #0              ; Init result.
      TSX                  ; Get index to operands.
      BEGIN
          LSR  $101,X      ; Test op1 2^0 to 2^7.
          IF_CARRY_SET
              ADC  $102,X  ; Result =+ op2
          END_IF
          ASL  $102,X      ; op2 =* 2
      UNTIL_ZERO           ; until nothing more to add.
      TAX                  ; Move result to X.
      PLA                  ; Discard operands.
      PLA

      PLA                  ; Restore A.
      RTS
 ;----------------

Actually this is a pretty standard way to do multiplication, except maybe the part about putting the inputs on the hardware stack.

Posted: **Sun Dec 23, 2012 3:34 am**

BigEd wrote:

This multiplication routine takes constant time, and uses no memory other than the stack. More detail here.

Quote:

Let me begin by saying that this is not the correct way to do multiplication on the 6502

It is definitely a very curious way to do it!

Here's my processing of the original, using FOR_Y...NEXT_Y which I'll be adding to the structure-macro pages. I reduced it some by using CMOS instructions also. One series that was especially more efficient was the PHA LDA#0 STA addr,X PLA (ie, four instructions) being replaced by just STZ addr,X (a single instruction).

Code: Select all

; unsigned multiply by Daud Zoss
; as seen at http://swissembly.wordpress.com/2012/12/15/8-bit-multiply-in-place-on-6502/
;
; parameters in X and Y
; result (low byte only) in X

um8:   PHA
       PHY
       TXA
       TSX
       FOR_Y  8,  DOWN_TO,  0
           PHA
           DEX
           ASL
           DEY
       NEXT_Y

       LDA  $109,X
       FOR_Y  8,  DOWN_TO,  0
           ASL
           IF_CARRY_CLR
               STZ  $101,X
           END_IF
           INX
           DEY
       NEXT_Y

       LDX #0
       FOR_Y  8, DOWN_TO,  0
           TXA
           TSX
           CLC
           ADC  $101,X
           TAX
           PLA
       NEXT_Y
       PLY
       PLA

Posted: **Sun Dec 23, 2012 4:22 am**

I believe Klaus' EOR is used to avoid the CLC required before the ADC. When C is not set the ADC is performed, otherwise just a shift is performed. Saves 2 cycles per loop.

Code: Select all

loop:   lsr $101,x  ;test op1 2^0 to 2^7
        bcs skip
        adc $102,x  ;result =+ op2
skip:   asl $102,x  ;op2 =* 2

Posted: **Sun Dec 23, 2012 4:52 am**

Here's another way to multiply, and a very strange way, the Russian peasants' multiplication (see http://mathforum.org/dr.math/faq/faq.peasant.html). I have not tested it.

Code: Select all

     ORG  $0300

     LDA  $0340
     STA  00          ; Put 1st # in ZP reg 00

     LDA  $0341
     STA  01          ; and 2nd # in ZP reg 01

     STZ  02          ; Zero-out ZP reg's 02, 03, and 04.
     STZ  03
     STZ  04

     BEGIN
        LDA  00       ; Get the 1st #.
     WHILE_NOT_ZERO   ; As long as there's something left in it,
        AND  #1       ; see if it's odd.
        IF_NOT_ZERO   ; If it is odd,
           CLC
           LDA  01
           ADC  03    ; add the 2nd # low byte to ZP reg 03,
           STA  03

           LDA  02    ; and the 2nd # high byte to reg 04.
           ADC  04    ; (The only way this high byte gets anything
           STA  04    ; is if it gets rotated into it below.)
        THEN

        LSR  00       ; Halve  the 1st # and truncate the fractional part.
        ASL  01       ; Double the 2nd # low byte,
        ROL  02       ; and high byte with carry.
     REPEAT

     RTS
 ;----------------

Posted: **Sun Dec 23, 2012 4:54 am**

MichaelM wrote:

I believe Klaus' EOR is used to avoid the CLC required before the ADC. When C is not set the ADC is performed, otherwise just a shift is performed. Saves 2 cycles per loop.

Code: Select all

loop:   lsr $101,x  ;test op1 2^0 to 2^7
        bcs skip
        adc $102,x  ;result =+ op2
skip:   asl $102,x  ;op2 =* 2

Ah, of course.

Posted: **Sun Dec 23, 2012 11:05 am**

GARTHWILSON wrote:

Actually this is a pretty standard way to do multiplication, except maybe the part about putting the inputs on the hardware stack.

True, since standards usually describe the most resource efficient way to achieve something.

The swiss guy that wrote the original um8 code snippet wanted to familiarize himself with the direct use of the stack. So it serves more a learning purpose than a practical one.

The Russian peasants' multiply code snippet you posted is the most practical code, as it caters for a 16 bit result and can easily be expanded to any operand and result size that one would need.

Posted: **Sun Dec 23, 2012 12:01 pm**

GARTHWILSON wrote:

Here's another way to multiply, and a very strange way, the Russian peasants' multiplication (see http://mathforum.org/dr.math/faq/faq.peasant.html).

Thanks for the link, a good read. The method is not so strange really, it's just a refined version of what many people do when trying to multiply (or divide too) two numbers (particularly when not using paper): Make it easier by doubling one of the values and half the other, to see if those numbers are simpler to handle. The easiest example would be something like 'what's 20*40? Well, it's 10*80 so must be 800.'

-Tor

Posted: **Sun Dec 23, 2012 5:58 pm**

I like WhiteFlame's better.

Code: Select all

;------------------------
; 8bit * 8bit = 16bit multiply
; Multiplies "num1" by "num2" and stores result in .A (low byte, also in .X) and .Y (high byte)
; uses extra zp var "num1Hi"

; .X and .Y get clobbered.  Change the tax/txa and tay/tya to stack or zp storage if this is an issue.
;  idea to store 16-bit accumulator in .X and .Y instead of zp from bogax

; In this version, both inputs must be unsigned
; Remove the noted line to turn this into a 16bit(either) * 8bit(unsigned) = 16bit multiply.

 lda #$00
 tay
 sty num1Hi  ; remove this line for 16*8=16bit multiply
 beq enterLoop

doAdd:
 clc
 adc num1
 tax

 tya
 adc num1Hi
 tay
 txa

loop:
 asl num1
 rol num1Hi
enterLoop:  ; accumulating multiply entry point (enter with .A=lo, .Y=hi)
 lsr num2
 bcs doAdd
 bne loop

; 26 bytes

Posted: **Thu Dec 27, 2012 2:34 pm**

The OP's seems to be 8x8=8 bit, so my original one is even more applicable:

Code: Select all

; General 8bit * 8bit = 8bit multiply
; Multiplies "num1" by "num2" and returns result in .A

; by White Flame (aka David Holz) 20030207

; Input variables:
;   num1 (multiplicand)
;   num2 (multiplier), should be small for speed
;   Signedness should not matter

; .X and .Y are preserved
; num1 and num2 get clobbered

; Instead of using a bit counter, this routine ends when num2 reaches zero, thus saving iterations.

 lda #$00
 beq enterLoop

doAdd:
 clc
 adc num1

loop:
 asl num1
enterLoop: ;For an accumulating multiply (.A = .A + num1*num2), set up num1 and num2, then enter here
 lsr num2
 bcs doAdd
 bne loop

end:

; 15 bytes

However, that EOR trick is pretty cool. I'll use that in the future.

Posted: **Mon Mar 25, 2013 7:43 pm**

I've just read that early Wang calculators performed multiplication and division by computing logs and antilogs - an electronic slide rule! And yet they offered 10 displayed digits of precision. The earlier models didn't round results, which led to potentially confusing near-misses:
"An example would be multiplying 6 by 8. The LOCI-2 would give an answer of 47.99999999"
(http://www.oldcalculatormuseum.com/wang360.html)

For a description of how a few repeated adds and subtracts can compute a logarithm or antilogarithm, see the patent (granted 1968):
http://ed-thelen.org/comp-hist/Wang-patent.html

Cheers
Ed

Posted: **Tue Mar 26, 2013 8:14 pm**

BigEd wrote:

I've just read that early Wang calculators performed multiplication and division by computing logs and antilogs - an electronic slide rule! And yet they offered 10 displayed digits of precision. The earlier models didn't round results, which led to potentially confusing near-misses:
"An example would be multiplying 6 by 8. The LOCI-2 would give an answer of 47,999 999.99"
(http://www.oldcalculatormuseum.com/wang360.html)

For a description of how a few repeated adds and subtracts can compute a logarithm or antilogarithm, see the patent (granted 1968):
http://ed-thelen.org/comp-hist/Wang-patent.html

Cheers
Ed

Wang also used log and anti-log multiplication and division in their MVP BASIC. MVP BASIC also worked to 10 significant digits, a curious limitation, given that the MVP series of minicomputers was targeted to businesses, where that many significant digits was probably too small. Most MAI-compatible Business BASIC implementations (e.g., Thoroughbred, BBx, Uni-Basic, etc.) offer up to 16 significant digits.

6502.org

Daud Zoss' unusual multiplication routine

Daud Zoss' unusual multiplication routine

Re: Daud Zoss' unusual multiplication routine

Re: Daud Zoss' unusual multiplication routine

Re: Daud Zoss' unusual multiplication routine

Re: Daud Zoss' unusual multiplication routine

Re: Daud Zoss' unusual multiplication routine

Re: Daud Zoss' unusual multiplication routine

Re: Daud Zoss' unusual multiplication routine

Re: Daud Zoss' unusual multiplication routine

Re: Daud Zoss' unusual multiplication routine

Re: Daud Zoss' unusual multiplication routine

Re: Daud Zoss' unusual multiplication routine

Re: Daud Zoss' unusual multiplication routine

Re: Daud Zoss' unusual multiplication routine

Re: Daud Zoss' unusual multiplication routine