PostPosted: Wed Aug 29, 2012 8:53 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I've been thinking about 4-bit shifts by lookup table, so it's good to hear about that!

I don't disbelieve your numbers, but I can't quite see why the patterns shouldn't be evenly distributed. (Is it because of the overlap??)

In that Table 1, it's a Radix 4 approach, but there are 8 patterns because of the need to include the leading bit. 6 of those 8 do some work, and as it happens each does the same amount of work, a single operation, for an average of 0.75 ops per two bits. The two bits we've dealt with would conventionally have cost us an average of one op.

So, in Radix 8 we should have 16 patterns, and again all but 2 will need to do some work, but in this case sometimes we need more than one operation. It seems(*) to take 9 ops per 3 bits consumed, compared to 12 for longhand multiplication. That is, no gain... hmm. (In hardware, it's only 7 ops because plus or minus 3 is arranged to be done in a single cycle. In software plus or minus 3 takes two ops.)
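
Checking that tally in code (a quick Python sketch of mine; the cost model -- one op for any shifted copy of M, two ops for plus or minus 3 in software -- is the one assumed above):

Code:
# Radix-8 Booth: enumerate all 16 patterns (3 multiplier bits + guard)
# and total the add/subtract ops under the software cost model above.
from collections import Counter

digits = Counter()
for p in range(16):
    b = [(p >> i) & 1 for i in range(4)]          # b[0] is the guard bit
    digits[b[0] + b[1] + 2 * b[2] - 4 * b[3]] += 1

ops = sum(n * (2 if abs(d) == 3 else 0 if d == 0 else 1)
          for d, n in digits.items())
print(ops)   # 18 ops per 16 patterns = 9 ops per 8 groups (24 bits),
             # versus 12 ops longhand for the same 24 bits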

At least one of us is missing something! (The material in https://www.google.co.uk/search?q=%22Ra ... gorithm%22 seems helpful.)

Cheers
Ed
(*) See for example page 2 of http://www.ijvspa.net/docs/IJVSPA20120215.pdf


PostPosted: Wed Aug 29, 2012 10:36 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Quote:
So, in Radix 8 we should have 16 patterns, and again all but 2 will need to do some work, but in this case sometimes we need more than one operation. [...] (In hardware, it's only 7 ops because plus or minus 3 is arranged to be done in a single cycle. In software plus or minus 3 takes two ops.)

I'm a newcomer to the topic of Booth's method, and readily admit I may be missing something. But my conception of a software implementation is that it would match what you seem to be saying about a hardware implementation -- that each pattern will correspond to zero or one op, not more. This assumption is implicit in the numbers I presented.

If every pattern must be handled with at most one op, then this demands some pre-processing (some addition that takes place in advance) unless the radix relies only on powers of two (as with radix 4 and radix 2). As noted, radix 8 needs to have 3 times the multiplicand available -- as a single entity, not as a postponed obligation to add 1+2. The doc on the hardware models is pretty dense and hard to take in, but a quick perusal left me with the understanding that they use enough preliminary logic (including addition) that eventually there's a multiplexer which can instantly choose 3 times the multiplicand just as easily as any of the simpler numbers (ie, the powers of two) -- is that a reasonable way of putting it?

I left off working on the Booth radix 16 code because my understanding is I'd need to compute 3, 5, 6 and 7 times the multiplicand in advance, and buffer these values in ram so that -- as in hardware -- they are available on an equal footing with the power-of-two multiples. But in software that wasn't looking very attractive, due to the ram required (I would use zero-page) and due to the delay of pre-processing -- although the latter may be recouped, perhaps many times over, when multiplying very large numbers. For a 16 by 16 multiply on 6502 it didn't seem worth pursuing. I switched to a scheme that does shifting in advance (ie, the powers of two) but no addition in advance. So I buffer 1, 2, 4 and 8 times the multiplicand in ram but not 3, 5, 6 or 7. It'll be at least a few days before I can find the time to finish this and report back.
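
To make the shift-only scheme concrete, here is a sketch in Python (the names and the decomposition table are mine, with a dict standing in for the zero-page buffers): only 1, 2, 4 and 8 times the multiplicand are buffered, and digits 3, 5, 6 and 7 pay a second add/subtract at use time.

Code:
# Shift-only buffering for a radix-16 Booth digit: the table holds only
# power-of-two multiples, so most digits cost one op but 3, 5, 6, 7
# (and their negatives) cost two.
DECOMP = {0: (), 1: (1,), 2: (2,), 3: (2, 1), 4: (4,),
          5: (4, 1), 6: (4, 2), 7: (8, -1), 8: (8,)}

def apply_digit(acc, d, table):
    """Add d*M into the accumulator using buffered multiples of M.
    table = {1: M, 2: 2*M, 4: 4*M, 8: 8*M} (the zero-page buffers)."""
    sign = 1 if d >= 0 else -1
    for part in DECOMP[abs(d)]:
        acc += sign * (table[part] if part > 0 else -table[-part])
    return acc

M = 123
table = {k: k * M for k in (1, 2, 4, 8)}
assert apply_digit(0, -7, table) == -7 * M   # two ops: -8*M, then +M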

Cheers, Ed, and thanks for the never-ending supply of links :D

Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Thu Aug 30, 2012 12:58 am 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Jeff:

I developed a hardware Booth multiplier some time ago. My reference for the algorithm was "Computer Organization", Hamacher et al., McGraw-Hill Book Company, New York, NY, 1978, ISBN: 0-07-025681-0. It does a very nice job of laying out the 1 bit at a time algorithm. Like you, I found that some additional research on the web let me extend the Booth algorithm to any number of bits. I specifically put three Booth multiplier modules together: 1x, 2x, and 4x. The following two excerpts may be of help with regard to the Booth "recoding" tables that will feed a look-up table:

(Booth 2 bits at a time Multiplier)
Code:
// Additional Comments:
//
//  The basic operations follow those of the standard Booth multiplier except
//  that the transitions are being tracked across 2 bits plus the guard bit.
//  The result is that the operations required are 0, ±1, and ±2 times the
//  multiplicand (M). That is:
// 
//  Prod[2:0]   Operation
//     000      Prod <= (Prod + 0*M) >> 2;
//     001      Prod <= (Prod + 1*M) >> 2;
//     010      Prod <= (Prod + 1*M) >> 2;
//     011      Prod <= (Prod + 2*M) >> 2;
//     100      Prod <= (Prod - 2*M) >> 2;
//     101      Prod <= (Prod - 1*M) >> 2;
//     110      Prod <= (Prod - 1*M) >> 2;
//     111      Prod <= (Prod - 0*M) >> 2;
//
//  The operations in this table can be seen as direct extensions of the four
//  conditions used in the standard Booth algorithm. The first and last terms
//  simply indicate to skip over runs of 1s and 0s. The terms 001 and 110 are
//  indicative of the 01 (+1) and 10 (-1) operations of the standard Booth al-
//  gorithm. The terms 010 (+2,-1) and 101 (-2,+1) reduce to the operations
//  noted in the table, namely +1*M and -1*M, respectively. The terms 011
//  (+2,0) and 100 (-2,0) are the two new operations required for this
//  modified Booth algorithm.
//
//  The algorithm could be extended to any number of bits as required by noting
//  that as more bits are added to the left, the number of terms (operations)
//  required increases. Adding another bit, i.e. a 3-bit Booth recoding, means
//  that the operations required of the adder/partial product are 0, ±1, ±2,
//  and ±4. The guard bits provided for the sign bit accommodate the ±2 opera-
//  tion required for the 2-bit modified Booth algorithm. Adding a third bit
//  means that a third guard bit will be required for the sign bit to ensure
//  that there is no overflow in the partial product when ±4 is the operation.
//
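
A Python rendering of the eight-entry table may help anyone who would rather run it than read it (my sketch, not MichaelM's Verilog; it accumulates left-shifted partial products instead of shifting Prod right, which yields the same result):

Code:
# Radix-4 (2 bits at a time) Booth multiply driven by the table above.
def booth2_multiply(m, y, n=8):
    """Signed m * y; n must be even and both operands fit in n bits."""
    TABLE = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    y_ext = (y & ((1 << n) - 1)) << 1        # guard bit below bit 0
    prod = 0
    for step in range(n // 2):
        key = (y_ext >> (2 * step)) & 0b111  # 2 multiplier bits + guard
        prod += TABLE[key] * m << (2 * step) # weighted partial product
    return prod

# Exhaustive check over all signed 8-bit operand pairs.
assert all(booth2_multiply(a, b) == a * b
           for a in range(-128, 128) for b in range(-128, 128))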


(Booth 4 bits at a time multiplier)
Code:
// Additional Comments:
//
//  The basic operations follow those of the standard Booth multiplier except
//  that the transitions are being tracked across 4 bits plus the guard bit.
//  The result is that the operations required are 0, ±1, ±2, ±3, ±4, ±5, ±6,
//  ±7, and ±8 times the multiplicand (M). However, it is possible to reduce
//  the number of partial products required to implement the multiplication to
//  two. That is, ±3, ±5, ±6, and ±7 can be written in terms of combinations of
//  ±1, ±2, ±4, and ±8. For example, 3M = (2M + 1M), 5M = (4M + M), 6M = (4M
//  + 2M), and 7M = (8M - M). Thus, the following 32 entry table defines the
//  operations required for generating the partial products through each pass
//  of the algorithm over the multiplier:
// 
//  Prod[4:0]       Operation
//    00000      Prod <= (Prod + 0*M + 0*M) >> 4;
//    00001      Prod <= (Prod + 0*M + 1*M) >> 4;
//    00010      Prod <= (Prod + 0*M + 1*M) >> 4;
//    00011      Prod <= (Prod + 2*M + 0*M) >> 4;
//    00100      Prod <= (Prod + 2*M + 0*M) >> 4;
//    00101      Prod <= (Prod + 2*M + 1*M) >> 4;
//    00110      Prod <= (Prod + 2*M + 1*M) >> 4;
//    00111      Prod <= (Prod + 4*M + 0*M) >> 4;
//    01000      Prod <= (Prod + 4*M + 0*M) >> 4;
//    01001      Prod <= (Prod + 4*M + 1*M) >> 4;
//    01010      Prod <= (Prod + 4*M + 1*M) >> 4;
//    01011      Prod <= (Prod + 4*M + 2*M) >> 4;
//    01100      Prod <= (Prod + 4*M + 2*M) >> 4;
//    01101      Prod <= (Prod + 8*M - 1*M) >> 4;
//    01110      Prod <= (Prod + 8*M - 1*M) >> 4;
//    01111      Prod <= (Prod + 8*M + 0*M) >> 4;
//    10000      Prod <= (Prod - 8*M - 0*M) >> 4;
//    10001      Prod <= (Prod - 8*M + 1*M) >> 4;
//    10010      Prod <= (Prod - 8*M + 1*M) >> 4;
//    10011      Prod <= (Prod - 4*M - 2*M) >> 4;
//    10100      Prod <= (Prod - 4*M - 2*M) >> 4;
//    10101      Prod <= (Prod - 4*M - 1*M) >> 4;
//    10110      Prod <= (Prod - 4*M - 1*M) >> 4;
//    10111      Prod <= (Prod - 4*M - 0*M) >> 4;
//    11000      Prod <= (Prod - 4*M - 0*M) >> 4;
//    11001      Prod <= (Prod - 2*M - 1*M) >> 4;
//    11010      Prod <= (Prod - 2*M - 1*M) >> 4;
//    11011      Prod <= (Prod - 2*M - 0*M) >> 4;
//    11100      Prod <= (Prod - 2*M - 0*M) >> 4;
//    11101      Prod <= (Prod - 0*M - 1*M) >> 4;
//    11110      Prod <= (Prod - 0*M - 1*M) >> 4;
//    11111      Prod <= (Prod - 0*M - 0*M) >> 4;
//


I've successfully implemented these multipliers in a signal processing application. I believe that the recoding tables provided above are accurate.
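
As a mechanical cross-check of the 32-entry table (my sketch; I re-derive each pattern's weight from the bits as -8*b4 + 4*b3 + 2*b2 + b1 + b0 rather than transcribing the rows), the two-partial-product property -- every weight splits into at most two terms drawn from ±1, ±2, ±4, ±8 -- can be verified like so:

Code:
# Each 5-bit pattern's Booth weight, and its split into two partial
# products as in the table above (e.g. 7*M = 8*M - 1*M).
SPLIT = {0: (0, 0), 1: (0, 1), 2: (2, 0), 3: (2, 1), 4: (4, 0),
         5: (4, 1), 6: (4, 2), 7: (8, -1), 8: (8, 0)}

for p in range(32):
    b = [(p >> i) & 1 for i in range(5)]     # b[0] is the guard bit
    w = -8 * b[4] + 4 * b[3] + 2 * b[2] + b[1] + b[0]
    hi, lo = SPLIT[abs(w)]
    assert (hi + lo) * (-1 if w < 0 else 1) == w   # two adders suffice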

_________________
Michael A.


PostPosted: Thu Aug 30, 2012 6:02 am 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
MichaelM wrote:
Like you, some additional research on the web let me extend the Booth algorithm for any number of bits
Thanks for the post, Michael. Since I haven't yet put the Booth method into practice, it's nice to have input from anyone who has. As for Booth encoding, yes I did eventually grok the general rule for the weighting. I don't know how you would state it but in my words the rule is that, regardless of radix, the sequence begins with 0, 1, 1, 2, 2, 3, 3, etc. and ends with -3, -3, -2, -2, -1, -1, 0. For example,

  • with radix 16, the encodings are 00000 to 11111, weighted 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, -8, -7, -7, -6, -6, -5, -5, -4, -4, -3, -3, -2, -2, -1, -1, 0
  • with radix 8, the encodings are 0000 to 1111, weighted 0, 1, 1, 2, 2, 3, 3, 4, -4, -3, -3, -2, -2, -1, -1, 0
  • with radix 4, the encodings are 000 to 111, weighted 0, 1, 1, 2, -2, -1, -1, 0
  • with radix 2, the encodings are 00 to 11, weighted 0, 1, -1, 0

The last two examples may seem to break the rule, but radix 4 and radix 2 merely have sequences which are too short to fully and obviously show the pattern. They aren't good examples to learn from, which is why IMO it's unfortunate that the Wikipedia article confines itself to radix 2.
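
In code the rule is compact; a small generator (my sketch) reproduces the weight lists above for any radix 2^k:

Code:
# Booth weights for radix 2**k: each (k+1)-bit pattern b[k]..b[0]
# (b[0] is the guard bit) weighs the MSB negatively and the rest
# positively, plus the guard.
def booth_weights(k):
    out = []
    for p in range(1 << (k + 1)):
        b = [(p >> i) & 1 for i in range(k + 1)]
        w = b[0] + sum(b[j] << (j - 1) for j in range(1, k)) \
            - (b[k] << (k - 1))
        out.append(w)
    return out

print(booth_weights(2))  # radix 4: [0, 1, 1, 2, -2, -1, -1, 0]
print(booth_weights(3))  # radix 8: [0, 1, 1, 2, 2, 3, 3, 4,
                         #           -4, -3, -3, -2, -2, -1, -1, 0]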

Thanks for the reference to "Computer Organization." Between you and Ed the amount of info collected on the Booth topic is a little overwhelming.


Big Ed wrote:
I can't quite see why the patterns shouldn't be evenly distributed. (Is it because of the overlap??)
I might have to retract or dilute what I said about there not being even distribution. Still, it does seem that the leading bit might throw things off, since it's always zero and there's no corresponding bit that's always one. I admit I'm not yet clear and comfortable with the reasoning. A separate issue (?) apart from even distribution is the likelihood that an operation will do more work and show an advantage compared to conventional long multiplication. The numbers I presented a few posts back do show that higher radices result in higher efficiency -- that is, fewer add/subtract ops are required to accomplish the 65,536 multiplications in the hypothetical test. (Ops like 3 and 7 do more work but still count as a single op.)
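
One way to settle the distribution question empirically (a sketch; the digit extraction is mine) is to tally every radix-16 Booth digit produced by all 65,536 unsigned 16-bit multipliers. Note that the zero-padded top group can only ever yield 0 or +1, which is one concrete source of skew:

Code:
from collections import Counter

def digits16(y):
    """Five radix-16 Booth digits for an unsigned 16-bit multiplier."""
    bits = [0] + [(y >> i) & 1 for i in range(20)]  # guard + 16 bits + pad
    return [bits[4*g] + bits[4*g+1] + 2*bits[4*g+2]
            + 4*bits[4*g+3] - 8*bits[4*g+4] for g in range(5)]

tally = Counter(d for y in range(1 << 16) for d in digits16(y))
print(sorted(tally.items()))   # frequency of each digit -8..+8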

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Thu Aug 30, 2012 11:53 am 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Agree. The weightings that you describe match those that I sent you. Beyond the 1 bit Booth multiplier, I needed to add sign extension bits on the left that match the largest coefficient. For the 2x algorithm, the largest coefficient is ±2, so at least one sign extension bit is required on the product to prevent a 2's complement overflow in the partial sum adder. Similarly, three bits are required to accommodate the ±8 coefficient of the 4x Booth multiplier.
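
A quick numeric check of the guard-bit count (my sketch; it uses the largest positive M so as to sidestep the maximal-negative corner case):

Code:
# Extra sign (guard) bits needed so coef*M cannot overflow the partial
# sum adder: one for the +/-2 of radix 4, three for the +/-8 of radix 16.
n = 8                                       # width of the multiplicand M
for radix, coef in (("radix-4", 2), ("radix-16", 8)):
    worst = coef * ((1 << (n - 1)) - 1)     # |coef * M| at its largest
    extra = (worst.bit_length() + 1) - n    # +1 for the sign bit itself
    print(radix, "->", extra, "guard bit(s)")   # prints 1 and 3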

In the hardware implementation, I used two adders to create the partial sums, and that is why the 4x table is written as it is.

_________________
Michael A.


PostPosted: Thu Aug 30, 2012 6:49 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Just a quick note to acknowledge Jeff's point that pre-computing the non-power-of-two digits does make a difference to the analysis. It costs something to do that, so it's not quite as clear-cut as it might be.

Musing: in hardware we generally worry about carry propagation, and all this adding and pre-adding doesn't add much to that (especially if it happens in a previous clock cycle!) We also worry about component cost or chip area or power, which is why we like halving the number of partial products. There's also a speedup from reducing the number of partial products which have to be totted up and from reducing distances. (I'd be interested to hear from Michael as to whether this analysis holds up. I've made it sound like speedup is a secondary gain. I reckon the cost or area is better by 2x but the speed improvement somewhat less.)

In software every addition counts for some time, so the total number of ops is always of primary interest, together with the time overhead of marshalling data.

So, two rather different cost functions being applied to the same algorithmic innovation.

One thing I noticed while searching: people are still writing papers on improving hardware implementations. One idea seems to be that there are many possible and equivalent choices for the re-coded operand, and choosing the globally best one using only local information is a challenge (at higher radices, anyway).


PostPosted: Fri Aug 31, 2012 12:21 am 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Ed:

I'll post a reply about the HW implementation of the Booth multipliers over in the programmable logic thread you set up regarding HDLs.

_________________
Michael A.


PostPosted: Sun Sep 23, 2012 5:00 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Thanks (that's here)

On the original curiosity that radix-2 Booth's doesn't seem to save anything - I learnt from a conversation about the Elliott 803 that the Booth's implementation there allows for a signed multiply which can perform an early exit. The machine is able to determine at the appropriate point that all remaining higher-order bits are copies of the sign bit, and therefore the overall multiplication takes n/2 operations on average. The implication is that without Booth-encoding the input, this shortcut isn't possible (or, perhaps, it would only apply to positive operands)
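
A sketch of that early exit in Python (my reconstruction of the idea, not the 803's actual logic): the radix-2 Booth digit at position i is y[i-1] - y[i], so once the remaining high-order bits are all copies of the last bit consumed, every later digit is zero and the loop can stop -- for either sign of the multiplier.

Code:
def booth_mul_early_exit(m, y, n=16):
    """Signed m * y, one Booth digit per step, exiting early."""
    mask = (1 << n) - 1
    yu = y & mask
    prod, prev, steps = 0, 0, 0
    for i in range(n):
        bit = (yu >> i) & 1
        prod += (prev - bit) * m << i        # digit y[i-1] - y[i]
        prev = bit
        steps += 1
        rest = yu >> (i + 1)                 # bits not yet consumed
        if rest == (mask >> (i + 1)) * bit:  # all copies of bit i
            break                            # remaining digits are 0
    return prod, steps

prod, steps = booth_mul_early_exit(1234, -3)
assert prod == 1234 * -3 and steps == 3      # 16-bit multiply, 3 steps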


PostPosted: Sun Sep 23, 2012 5:38 pm 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Ed:

I read your comment on the Elliott 803. Kind of interesting that my MiniCPU-S is a serial ALU implementation like the Elliott 803. I have long wanted to investigate the use of serial implementations, and only a few commercial computers ever used the technique; one of the Data General NOVAs was among the highest-volume serial ALU computers produced. I was unaware of the Elliott machine before your post; very cool little demonstration video you referenced.

The Booth recoding performed in the Booth multiplier Verilog modules I posted on GitHub follows the Booth multiplication algorithm. However, the 1 bit at a time (1x) modules essentially perform multiplication from MSB to LSB using repeated shift and add/subtracts.

If you examine the Booth recoding table, the case statement around line 124 (Booth_Multiplier.v), you can see that no add/sub takes place for the conditions where the current and previous multiplier bits are 00 or 11. This is the key concept of the Booth algorithm. The kernel of the 1x implementation is such that only single bit right shifts are being performed, but any number of right shifts are possible in order to skip over repeated strings of 0s and/or 1s. (A similar condition applies to the 2x and 4x modules.)

To take greater advantage of the Booth algorithm, more shifts need to be allowed when runs of 0s and/or 1s are encountered/detected. I've thought about extending the modules that I posted to incorporate this feature. However, in addition to incorporating more logic for the shifter, the loop counter would have to be dynamically adjusted to account for the extra shifts that are being taken to skip over the runs of 1s and/or 0s.
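
In software the run-skipping might look like this (a Python sketch of mine, an illustration rather than a translation of the Verilog): one add/subtract per 0-to-1 or 1-to-0 transition, with the position counter jumping past each run in a single adjustment -- the "dynamically adjusted loop counter" mentioned above.

Code:
def booth_skip_runs(m, y, n=16):
    """Signed m * y; ops are spent only at bit transitions."""
    yu = (y & ((1 << n) - 1)) << 1            # guard bit below bit 0
    prod, i, ops = 0, 0, 0
    while i < n:
        prev, bit = (yu >> i) & 1, (yu >> (i + 1)) & 1
        if bit != prev:                        # transition: one add/sub
            prod += (prev - bit) * m << i
            ops += 1
        run = 1                                # measure the run of bits
        while i + run < n and ((yu >> (i + run + 1)) & 1) == bit:
            run += 1                           # equal to the current bit
        i += run                               # skip the whole run
    return prod, ops

assert all(booth_skip_runs(5, y, 8)[0] == 5 * y for y in range(-128, 128))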

The problem is that I don't have enough time to do all that I want to do for recreational purposes. Although it does not appear as if I am making much progress on my Minimal CPU in a CPLD project, I am devoting most of my spare time to that project. I hope to have a rough cut at the Program Control Unit (PCU) later today or early next week. When that's finished, I may switch to re-implementing it in an FPGA (as a parallel ALU). For that future project, I may want to add a fractional multiplier instruction, and an improvement to the Booth multipliers along the lines discussed above may be in the cards.

_________________
Michael A.


PostPosted: Sun Sep 23, 2012 5:58 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
An improved multiplier would be interesting, but the minimal CPU would be great!


PostPosted: Wed Dec 26, 2012 4:42 am 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Dr Jefyll:

There was an earlier discussion in this thread about a Booth multiplier implementation for the UM* function in 6502 Forths. BigEd posted a link to a novel multiplication routine for unsigned integers that generated quite a bit of interest and responses. I investigated its performance and that of the solution proposed by Klaus2m5.

Instead of joining in that effort, I decided to implement a signed multiplication subroutine using only variables allocated on the 6502 stack. My first attempt was a miserable failure, so I decided to implement a 1 bit at a time Booth multiplication subroutine with the same restriction that all of the variables are allocated on the 6502 stack.

I have completed that effort, including wrapping a self-checking test program around the subroutine. (The code can be found on GitHub.) I put it into my M65C02 core testbench and ran it. Within that core, operating from single cycle memory at 100 MHz, the simulation indicates that the average performance is about 3.28 µs (approximately 328 cycles/product) for a signed 8x8 multiply that yields a full 16-bit 2sC result.

I was wondering if you have any timings for any of the signed multiplication routines in Forth. I don't plan any further optimization of the subroutine I developed and tested, but it would be nice to know what cycle counts are reasonable to expect for such a multiplication routine. It certainly was fun to implement the routine using a Booth algorithm, and it certainly does point out the reason to support stack relative addressing in a 65C02 core implementation, or to add a multiplication instruction to the instruction set.

_________________
Michael A.


PostPosted: Wed Dec 26, 2012 7:37 am 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Hi Michael,

thanks for your interest and your contribution. I posted some info on multiply execution speeds earlier in this thread here, and, although it doesn't directly answer your question, perhaps you will find it helpful. Just bear in mind that the routines I tested do a 16 by 16 unsigned multiply with a 32-bit result, so they're not the best comparison with your signed 8 by 8 routine. I look forward to sitting down and having a good look at your code and at the items Klaus2m5 and BigEd posted but I haven't found the time as yet.

Also on my list but long overdue is some followup regarding the Booth coding I was yakking about. Probably some of my initial comments months ago were naive, but as a newcomer to the topic I hope I can be forgiven! It's not easy to concisely describe a software implementation, especially as it may or may not exactly mimic the hardware version. Another challenge is that Booth using Radix 2 is rather a different animal from Booth using a higher Radix such as 8 or 16; certainly there is a whole different set of advantages and drawbacks.

My grasp on all of this is perhaps still imperfect, but I did manage to code a 16 by 16 multiply using Radix 16. I know the results to be accurate, since the routine passed an exhaustive test (ie, 2^32 multiplications), but unfortunately the execution speed failed to improve on the non-Booth results mentioned above. There's room for further optimization, but my hunch is that, for a 6502 multiplying 16 by 16, Booth offers little or no potential for speed advantage. For larger numbers (a 32 by 32 multiply, say) I suspect Booth would prove worthwhile.

Quote:
It certainly was fun to implement the routine using a Booth algorithm
I absolutely agree -- it's a fascinating puzzle! I wanted to see about attempting Radix 256 but due to time constraints I had to tear myself away.

cheers,

Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Wed Dec 26, 2012 2:18 pm 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Thanks for the reply. I am interested in signed products. Is there a reason why there appears to be so much emphasis here and elsewhere on unsigned multiplication routines? Since floating point uses sign-magnitude for the mantissa in most formats, is the emphasis on unsigned integers used to support floating point arithmetic in Forth?

I followed the link provided and it was helpful. I have plans to expand the signed 8x8 multiplication case to a signed 16x16 case. If you examine the code I referred to, the signed multiplication algorithm it represents requires an arithmetic right shift operation. Since that is not a supported function of the 6502 instruction set, there are a number of spots in the code where an explicit test is made and several instructions are inserted to restore the sign bit into the most significant position.
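
For reference, the missing operation is easy to state (a Python sketch of an 8-bit arithmetic shift right; on the 6502 itself the usual two-instruction idiom is CMP #$80 : ROR A, which copies the sign bit into carry and rotates it back into bit 7):

Code:
def asr8(x):
    """Arithmetic shift right of one byte: shift, then re-insert sign."""
    return ((x >> 1) | (x & 0x80)) & 0xFF

assert asr8(0xF0) == 0xF8    # -16 -> -8
assert asr8(0x40) == 0x20    #  64 -> 32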

After reading your post, I am sure that just these additional tests and restore operations will reduce the performance of a signed multiply routine versus an unsigned multiply routine.

The question then arises, if you had to choose one multiplication instruction to add to a 6502, would you add a 16x16 or a 32x32?

_________________
Michael A.


PostPosted: Thu Dec 27, 2012 5:53 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
Michael, 6502 Forth normally starts with two multiply primitives (ie, code definitions).  One, * (pronounced "star"), is 16x16 with 16-bit result, and happens to work with both signed and unsigned.  (I had to verify that just now, as I use signed so seldom in my applications.)

The other one, UM* (pronounced "you-emm-star") is a 16x16 unsigned multiply primitive that gives a 32-bit result.  The 'U' in the name means it's unsigned, and the 'M' means it's mixed-precision.  This primitive is used in many secondaries (ie, colon definitions).  M* for example, which is signed, does the multiplication internally with UM* and handles the sign with other primitives inside the M* secondary.  If higher precisions (triple and quad) are implemented, they normally still use UM* in them.  You see so much emphasis on it because of the fact that this is the common building block.  It doesn't stop there though.  Nor does it really have anything to do with floating-point which can of course be done in Forth but is discouraged for most applications because of its inefficiency in processors that don't have a floating-point unit.
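
The shape of that layering is easy to sketch (Python stand-ins here; this mirrors the M*-from-UM* structure described above, not any particular Forth's actual code):

Code:
def um_star(a, b):
    """UM*: 16x16 unsigned multiply, 32-bit unsigned result."""
    return (a & 0xFFFF) * (b & 0xFFFF)

def m_star(a, b):
    """M*: 16x16 signed multiply built on UM* plus sign handling."""
    negative = (a < 0) != (b < 0)
    prod = um_star(abs(a), abs(b))
    return -prod if negative else prod

assert m_star(-300, 250) == -75000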

I was also just looking at the secondaries (also using UM*) for quad-precision signed and unsigned arithmetic published in Forth Dimensions magazine vol. 4 #1, and they're pretty lengthy.  Without trying it, it seems like doing it in primitives would not take any more code space (and of course would be faster).

As for an 8-bit multiplication, having the high 8 bits of a 16-bit input always 0 (or FF for negative numbers) would make it go faster, but not as fast as it would if you just wrote a separate routine that only looped half as many times and didn't have to handle as much data each time.  If you did an 8x8 often enough, certainly you could write the primitive, but then you would probably write the secondary for the signed multiplication too if you want a 16-bit result, and maybe the */ equivalent for 8-bit if there seems to be any use for it.

For just 8x8, you can get direct answers from the table lookups discussed along with scaled-integer methods at http://wilsonminesco.com/16bitMathTables/index.html .  The 8x8 table is at http://wilsonminesco.com/16bitMathTables/MULT.HEX .

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Dec 27, 2012 1:22 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Hi Michael, in a previous conversation, Finn quoted the article series "6809 The Design Philosophy" by Terry Ritter and Joel Boney as saying:
Quote:
Additionally, the unsigned multiply is easily capable of extension into multiple precision arithmetic. (Try that with a signed multiply!)

To me, that favours an unsigned, or both, for any hardware multiply instruction. For a software implementation, perhaps not so important.
(The other problem with signed multiply is that there's no headroom in positive numbers to represent the product of two maximal negative numbers) (edit: hang on, is that right? Perhaps only in fractional fixed point?)
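The Ritter and Boney point shows up clearly in a sketch (Python, with hypothetical 16-bit halves): an unsigned 32x32 product falls out of four unsigned 16x16 partial products combined with plain unsigned adds, a decomposition with no equally clean signed analogue.

Code:
def umul32(a, b):
    """Unsigned 32x32 -> 64-bit from four 16x16 unsigned products."""
    al, ah = a & 0xFFFF, a >> 16
    bl, bh = b & 0xFFFF, b >> 16
    return al * bl + ((al * bh + ah * bl) << 16) + (ah * bh << 32)

assert umul32(0xDEADBEEF, 0x12345678) == 0xDEADBEEF * 0x12345678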
Cheers
Ed

