6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sat Nov 23, 2024 2:42 am

All times are UTC




Post new topic Reply to topic  [ 23 posts ]  Go to page 1, 2  Next
Author Message
PostPosted: Sat Dec 22, 2012 7:31 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
This multiplication routine takes constant time, and uses no memory other than the stack. More detail here.

Quote:
Let me begin by saying that this is not the correct way to do multiplication on the 6502


Code:
; unsigned multiply by Daud Zoss
; as seen at http://swissembly.wordpress.com/2012/12/15/8-bit-multiply-in-place-on-6502/
;
; parameters in X and Y
; result (low byte only) in X

um8:
 pha
 tya
 pha
 txa
 tsx
 ldy #$08
loop1:
 pha
 dex
 asl
 dey
 bne loop1
 lda $0109,x
 ldy #$08
loop2:
 asl
 bcs skip1
 pha
 lda #$00
 sta $0101,x
 pla
skip1:
 inx
 dey
 bne loop2
 ldx #$00
 ldy #$08
loop3:
 txa
 tsx
 clc
 adc $0101,x
 tax
 pla
 dey
 bne loop3
 pla
 tay
 pla


I've tested this on easy6502 and seems to work. I haven't got my head around it!

Cheers
Ed


Last edited by BigEd on Sun Dec 23, 2012 11:30 am, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Sat Dec 22, 2012 7:58 pm 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
It's made from 3 loops. The first loop writes 8 values to the stack: x, 2*x, 4*x, 8*x, ... 128*x. The second loop checks 'y' for bits that are cleared. For each cleared bit, it zeroes out the corresponding 2^n*x value in the table. In the 3rd loop, the 8 values in the table are summed.


Top
 Profile  
Reply with quote  
PostPosted: Sat Dec 22, 2012 8:21 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Thanks for the explanation!


Top
 Profile  
Reply with quote  
PostPosted: Sat Dec 22, 2012 10:17 pm 
Offline

Joined: Sat Jul 28, 2012 11:41 am
Posts: 442
Location: Wiesbaden, Germany
Inspired by above code. Less code bytes, less stack usage, less cycles = less everything.
Code:
; unsigned multiply using stack as operand storage - 6502.org style
;
; parameters in X and Y
; result (low byte only) in X

um8:    pha         ;save a, y is not altered

        tya         ;move operands to stack
        pha
        txa
        eor #$ff    ;invert carry of op1
        pha
        lda #0      ;init result
        tsx         ;get index to operands
loop:   lsr $101,x  ;test op1 2^0 to 2^7
        bcs skip
        adc $102,x  ;result =+ op2
skip:   asl $102,x  ;op2 =* 2
        bne loop    ;until nothing more to add
        tax         ;move result to x
        pla         ;discard operands
        pla
       
        pla         ;restore a
        rts

_________________
6502 sources on GitHub: https://github.com/Klaus2m5


Top
 Profile  
Reply with quote  
PostPosted: Sun Dec 23, 2012 3:03 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
Am I missing something, or could you omit the EOR #$FF and change the BCS to BCC like the following? (I like to apply the structure macros to make it a little more clear :)

Code:
; unsigned multiply using stack as operand storage - 6502.org style
;
; parameters in X and Y
; result (low byte only) in X

um8:  PHA                  ; Save A.  Y is not altered.  A is saved
                           ; only in case the calling routine needed it.
      PHY                  ; Move operands to stack.
      PHX                  ; Now SP+2 has one multiplicand and SP+1 has the other.
      LDA  #0              ; Init result.
      TSX                  ; Get index to operands.
      BEGIN
          LSR  $101,X      ; Test op1 2^0 to 2^7.
          IF_CARRY_SET
              ADC  $102,X  ; Result =+ op2
          END_IF
          ASL  $102,X      ; op2 =* 2
      UNTIL_ZERO           ; until nothing more to add.
      TAX                  ; Move result to X.
      PLA                  ; Discard operands.
      PLA

      PLA                  ; Restore A.
      RTS
 ;----------------

Actually this is a pretty standard way to do multiplication, except maybe the part about putting the inputs on the hardware stack.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Sun Dec 23, 2012 3:34 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
BigEd wrote:
This multiplication routine takes constant time, and uses no memory other than the stack. More detail here.

Quote:
Let me begin by saying that this is not the correct way to do multiplication on the 6502

It is definitely a very curious way to do it!

Here's my processing of the original, using FOR_Y...NEXT_Y which I'll be adding to the structure-macro pages. I reduced it some by using CMOS instructions also. One series that was especially more efficient was the PHA LDA#0 STA addr,X PLA (ie, four instructions) being replaced by just STZ addr,X (a single instruction).
Code:
; unsigned multiply by Daud Zoss
; as seen at http://swissembly.wordpress.com/2012/12/15/8-bit-multiply-in-place-on-6502/
;
; parameters in X and Y
; result (low byte only) in X

um8:   PHA
       PHY
       TXA
       TSX
       FOR_Y  8,  DOWN_TO,  0
           PHA
           DEX
           ASL
           DEY
       NEXT_Y

       LDA  $109,X
       FOR_Y  8,  DOWN_TO,  0
           ASL
           IF_CARRY_CLR
               STZ  $101,X
           END_IF
           INX
           DEY
       NEXT_Y

       LDX #0
       FOR_Y  8, DOWN_TO,  0
           TXA
           TSX
           CLC
           ADC  $101,X
           TAX
           PLA
       NEXT_Y
       PLY
       PLA

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Sun Dec 23, 2012 4:22 am 
Offline
User avatar

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
I believe Klaus' EOR is used to avoid the CLC required before the ADC. When C is not set the ADC is performed, otherwise just a shift is performed. Saves 2 cycles per loop.
Code:
loop:   lsr $101,x  ;test op1 2^0 to 2^7
        bcs skip
        adc $102,x  ;result =+ op2
skip:   asl $102,x  ;op2 =* 2

_________________
Michael A.


Top
 Profile  
Reply with quote  
PostPosted: Sun Dec 23, 2012 4:52 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
Here's another way to multiply, and a very strange way, the Russian peasants' multiplication (see http://mathforum.org/dr.math/faq/faq.peasant.html). I have not tested it.
Code:
     ORG  $0300

     LDA  $0340
     STA  00          ; Put 1st # in ZP reg 00

     LDA  $0341
     STA  01          ; and 2nd # in ZP reg 01

     STZ  02          ; Zero-out ZP reg's 02, 03, and 04.
     STZ  03
     STZ  04

     BEGIN
        LDA  00       ; Get the 1st #.
     WHILE_NOT_ZERO   ; As long as there's something left in it,
        AND  #1       ; see if it's odd.
        IF_NOT_ZERO   ; If it is odd,
           CLC
           LDA  01
           ADC  03    ; add the 2nd # low byte to ZP reg 03,
           STA  03

           LDA  02    ; and the 2nd # high byte to reg 04.
           ADC  04    ; (The only way this high byte gets anything
           STA  04    ; is if it gets rotated into it below.)
        THEN

        LSR  00       ; Halve  the 1st # and truncate the fractional part.
        ASL  01       ; Double the 2nd # low byte,
        ROL  02       ; and high byte with carry.
     REPEAT

     RTS
 ;----------------

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Sun Dec 23, 2012 4:54 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
MichaelM wrote:
I believe Klaus' EOR is used to avoid the CLC required before the ADC. When C is not set the ADC is performed, otherwise just a shift is performed. Saves 2 cycles per loop.
Code:
loop:   lsr $101,x  ;test op1 2^0 to 2^7
        bcs skip
        adc $102,x  ;result =+ op2
skip:   asl $102,x  ;op2 =* 2

Ah, of course. :oops:

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Sun Dec 23, 2012 11:05 am 
Offline

Joined: Sat Jul 28, 2012 11:41 am
Posts: 442
Location: Wiesbaden, Germany
GARTHWILSON wrote:
Actually this is a pretty standard way to do multiplication, except maybe the part about putting the inputs on the hardware stack.

True, since standards usually describe the most resource efficient way to achieve something.

The swiss guy that wrote the original um8 code snippet wanted to familiarize himself with the direct use of the stack. So it serves more a learning purpose than a practical one.

The Russian peasants' multiply code snippet you posted is the most practical code, as it caters for a 16 bit result and can easily be expanded to any operand and result size that one would need.

_________________
6502 sources on GitHub: https://github.com/Klaus2m5


Top
 Profile  
Reply with quote  
PostPosted: Sun Dec 23, 2012 12:01 pm 
Offline

Joined: Sun Apr 10, 2011 8:29 am
Posts: 597
Location: Norway/Japan
GARTHWILSON wrote:
Here's another way to multiply, and a very strange way, the Russian peasants' multiplication (see http://mathforum.org/dr.math/faq/faq.peasant.html).
Thanks for the link, a good read. The method is not so strange really, it's just a refined version of what many people do when trying to multiply (or divide too) two numbers (particularly when not using paper): Make it easier by doubling one of the values and half the other, to see if those numbers are simpler to handle. The easiest example would be something like 'what's 20*40? Well, it's 10*80 so must be 800.'

-Tor


Top
 Profile  
Reply with quote  
PostPosted: Sun Dec 23, 2012 5:58 pm 
Offline

Joined: Tue Nov 18, 2003 8:41 pm
Posts: 250
I like WhiteFlame's better.
Code:
;------------------------
; 8bit * 8bit = 16bit multiply
; Multiplies "num1" by "num2" and stores result in .A (low byte, also in .X) and .Y (high byte)
; uses extra zp var "num1Hi"

; .X and .Y get clobbered.  Change the tax/txa and tay/tya to stack or zp storage if this is an issue.
;  idea to store 16-bit accumulator in .X and .Y instead of zp from bogax

; In this version, both inputs must be unsigned
; Remove the noted line to turn this into a 16bit(either) * 8bit(unsigned) = 16bit multiply.

 lda #$00
 tay
 sty num1Hi  ; remove this line for 16*8=16bit multiply
 beq enterLoop

doAdd:
 clc
 adc num1
 tax

 tya
 adc num1Hi
 tay
 txa

loop:
 asl num1
 rol num1Hi
enterLoop:  ; accumulating multiply entry point (enter with .A=lo, .Y=hi)
 lsr num2
 bcs doAdd
 bne loop

; 26 bytes


Top
 Profile  
Reply with quote  
PostPosted: Thu Dec 27, 2012 2:34 pm 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
The OP's seems to be 8x8=8 bit, so my original one is even more applicable:

Code:
; General 8bit * 8bit = 8bit multiply
; Multiplies "num1" by "num2" and returns result in .A

; by White Flame (aka David Holz) 20030207

; Input variables:
;   num1 (multiplicand)
;   num2 (multiplier), should be small for speed
;   Signedness should not matter

; .X and .Y are preserved
; num1 and num2 get clobbered

; Instead of using a bit counter, this routine ends when num2 reaches zero, thus saving iterations.

 lda #$00
 beq enterLoop

doAdd:
 clc
 adc num1

loop:
 asl num1
enterLoop: ;For an accumulating multiply (.A = .A + num1*num2), set up num1 and num2, then enter here
 lsr num2
 bcs doAdd
 bne loop

end:

; 15 bytes


However, that EOR trick is pretty cool. I'll use that in the future.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Top
 Profile  
Reply with quote  
PostPosted: Mon Mar 25, 2013 7:43 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I've just read that early Wang calculators performed multiplication and division by computing logs and antilogs - an electronic slide rule! And yet they offered 10 displayed digits of precision. The earlier models didn't round results, which led to potentially confusing near-misses:
"An example would be multiplying 6 by 8. The LOCI-2 would give an answer of 47.99999999"
(http://www.oldcalculatormuseum.com/wang360.html)

For a description of how a few repeated adds and subtracts can compute a logarithm or antilogarithm, see the patent (granted 1968):
http://ed-thelen.org/comp-hist/Wang-patent.html

Cheers
Ed


Top
 Profile  
Reply with quote  
PostPosted: Tue Mar 26, 2013 8:14 pm 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8507
Location: Midwestern USA
BigEd wrote:
I've just read that early Wang calculators performed multiplication and division by computing logs and antilogs - an electronic slide rule! And yet they offered 10 displayed digits of precision. The earlier models didn't round results, which led to potentially confusing near-misses:
"An example would be multiplying 6 by 8. The LOCI-2 would give an answer of 47,999 999.99"
(http://www.oldcalculatormuseum.com/wang360.html)

For a description of how a few repeated adds and subtracts can compute a logarithm or antilogarithm, see the patent (granted 1968):
http://ed-thelen.org/comp-hist/Wang-patent.html

Cheers
Ed

Wang also used log and anti-log multiplication and division in their MVP BASIC. MVP BASIC also worked to 10 significant digits, a curious limitation, given that the MVP series of minicomputers was targeted to businesses, where that many significant digits was probably too small. Most MAI-compatible Business BASIC implementations (e.g., Thoroughbred, BBx, Uni-Basic, etc.) offer up to 16 significant digits.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 23 posts ]  Go to page 1, 2  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 11 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: