Ed:
As you suspect, the amount of logic increases as the number of bits processed increases. My objective for the Booth multipliers was not efficiency as much as FPGA vendor independence. My project attempts to ensure that the IP used in the design is 100% unencumbered by vendor-specific IP so that it can be ported seamlessly from one FPGA vendor/family to another. Since some FPGA families do not include built-in hardware multipliers, I needed a relatively fast way of doing 40-bit fixed-point arithmetic. The 40 clock cycles required for the 1 bit Booth multiplier would be acceptable, but I wanted a bit more performance since I had the resources to spare. Unlike the fully pipelined arithmetic shifter which follows it in the arithmetic processor, the Booth multipliers are not pipelined. Like all of the arithmetic elements in the signal processor, they register the multiplicand and multiplier when an enable/start signal is asserted and provide a RDY output when the product is ready. Thus, if a future target is resource limited, the 1 bit Booth multiplier can be dropped in place of the 4 bit Booth multiplier. If this change were required, the signal processor microcode would not need to change, because the module interfaces provide the necessary timing control.
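To illustrate why the 1 bit and 4 bit versions are interchangeable behind the same start/RDY interface, here is a rough behavioural sketch in Python. It is not the Verilog from Booth_Multiplier.v; the function names are made up for illustration, and it assumes the standard Booth recoding with one group of k multiplier bits consumed per clock and no extra start-up cycles.

```python
def booth_digit(window, k):
    """Decode a (k+1)-bit Booth window into a signed digit.

    The low bit is the overlap bit from the previous group; the upper k bits
    are read as a k-bit two's-complement number.
    """
    overlap = window & 1
    group = window >> 1
    if group >= 1 << (k - 1):      # top bit of the group set: negative value
        group -= 1 << k
    return group + overlap


def booth_multiply(m, r, n=16, k=1):
    """Multiply two n-bit signed integers, k multiplier bits per iteration.

    Returns (product, iterations). Hypothetical model: one iteration stands
    in for one clock cycle of the hardware.
    """
    assert n % k == 0, "this sketch assumes k divides n"
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    assert lo <= m <= hi and lo <= r <= hi, "operands must fit in n bits"

    # n-bit two's-complement multiplier with the implicit 0 appended on the right
    r_bits = (r & ((1 << n) - 1)) << 1
    product = 0
    iterations = 0
    for i in range(0, n, k):
        window = (r_bits >> i) & ((1 << (k + 1)) - 1)
        product += (booth_digit(window, k) * m) << i   # add/subtract a multiple of M
        iterations += 1
    return product, iterations


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, booth_multiply(-1234, 567, n=16, k=k))
```

All three settings of k return the same product; only the iteration count changes (n/k), which is the part the RDY output hides from the microcode.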
I set the bit width to 16 for comparison purposes and synthesized three different multipliers. The following table provides a comparative look at the increased resource utilization and decreased performance that you referred to in the *UM Forth thread:
Code:
-----------------------------------------------
                |      x1 |      x2 |      x4 |
-----------------------------------------------
#FFs            |      87 |      87 |      87 |
#LUTs           |     129 |     183 |     647 |
#Slices         |      93 |     136 |     362 |
-----------------------------------------------
Constraint (ns) |       6 |       8 |      10 |
Achieved (ns)   |   5.719 |   7.118 |   9.960 |
-----------------------------------------------
As expected, the number of logic implementation elements (LUTs, for the Xilinx Spartan 3AN used) increases as the number of bits processed is increased. Also, the additional adders and multiplexers lengthen the achieved clock period from a base of 5.72 ns to 9.96 ns. That is an increase of approximately 74%, a significant decrease in clock rate. Even so, the clock period between the 1 bit and 4 bit multipliers did not degrade as much as the resources grew: 129 to 647 LUTs. The 5x increase in LUTs is significant.
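The ratios quoted above can be checked with a few lines of Python. The figures are copied from the table; the per-multiply latency is my own addition and assumes each multiplier needs WIDTH / bits-per-cycle clocks with no overhead cycles, which may not match the actual modules exactly.

```python
WIDTH = 16  # operand width used for the comparison

# Figures copied from the comparison table (key = bits processed per cycle)
RESULTS = {
    1: {"luts": 129, "slices": 93, "period_ns": 5.719},
    2: {"luts": 183, "slices": 136, "period_ns": 7.118},
    4: {"luts": 647, "slices": 362, "period_ns": 9.960},
}


def ratio(field, bits_per_cycle):
    """Value of `field` relative to the 1 bit multiplier."""
    return RESULTS[bits_per_cycle][field] / RESULTS[1][field]


def latency_ns(bits_per_cycle):
    """Time for one multiply, assuming WIDTH / bits_per_cycle clocks (assumption)."""
    cycles = WIDTH // bits_per_cycle
    return cycles * RESULTS[bits_per_cycle]["period_ns"]


if __name__ == "__main__":
    for k in sorted(RESULTS):
        print(f"x{k}: LUTs x{ratio('luts', k):.2f}, "
              f"period x{ratio('period_ns', k):.2f}, "
              f"multiply latency {latency_ns(k):.1f} ns")
```

Under that cycle-count assumption, the 4 bit multiplier finishes a 16-bit multiply in about 39.8 ns against about 91.5 ns for the 1 bit version, so the slower clock is more than offset by the smaller number of iterations.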
The supporting data is provided below.
The following is taken from the synthesis report for the 1 bit Booth multiplier. It will be used as the base for comparison purposes.
Code:
=========================================================================
* HDL Synthesis *
=========================================================================
Performing bidirectional port resolution...
Synthesizing Unit <Booth_Multiplier>.
Related source file is "Booth_Multiplier.v".
Found 32-bit register for signal <P>.
Found 1-bit register for signal <Valid>.
Found 17-bit register for signal <A>.
Found 5-bit down counter for signal <Cntr>.
Found 34-bit register for signal <Prod>.
Found 17-bit 4-to-1 multiplexer for signal <S>.
Found 17-bit adder for signal <S$addsub0000> created at line 91.
Found 17-bit subtractor for signal <S$addsub0001> created at line 92.
Summary:
inferred 1 Counter(s).
inferred 84 D-type flip-flop(s).
inferred 2 Adder/Subtractor(s).
inferred 17 Multiplexer(s).
Unit <Booth_Multiplier> synthesized.
=========================================================================
HDL Synthesis Report
Macro Statistics
# Adders/Subtractors : 2
17-bit adder : 1
17-bit subtractor : 1
# Counters : 1
5-bit down counter : 1
# Registers : 4
1-bit register : 1
17-bit register : 1
32-bit register : 1
34-bit register : 1
# Multiplexers : 1
17-bit 4-to-1 multiplexer : 1
=========================================================================
And the following is the MAP report for the 1 bit Booth Multiplier showing the final resource utilization used when meeting a 6 ns PERIOD constraint.
Code:
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Number of Slice Flip Flops: 87 out of 3,584 2%
Number of 4 input LUTs: 129 out of 3,584 3%
Logic Distribution:
Number of occupied Slices: 93 out of 1,792 5%
Number of Slices containing only related logic: 93 out of 93 100%
Number of Slices containing unrelated logic: 0 out of 93 0%
*See NOTES below for an explanation of the effects of unrelated logic.
Total Number of 4 input LUTs: 129 out of 3,584 3%
Number of bonded IOBs: 68 out of 195 34%
Number of BUFGMUXs: 1 out of 24 4%
The synthesis report for the 2 bit Booth multiplier shows the expected growth in the product register needed to guard against arithmetic overflows when adding/subtracting the ±2*M terms to the product register.
Code:
=========================================================================
HDL Synthesis Report
Macro Statistics
# Adders/Subtractors : 4
18-bit adder : 2
18-bit subtractor : 2
# Counters : 1
5-bit down counter : 1
# Registers : 5
1-bit register : 2
18-bit register : 1
32-bit register : 1
34-bit register : 1
# Multiplexers : 1
18-bit 8-to-1 multiplexer : 1
=========================================================================
And its MAP report:
Code:
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Number of Slice Flip Flops: 87 out of 3,584 2%
Number of 4 input LUTs: 183 out of 3,584 5%
Logic Distribution:
Number of occupied Slices: 136 out of 1,792 7%
Number of Slices containing only related logic: 136 out of 136 100%
Number of Slices containing unrelated logic: 0 out of 136 0%
*See NOTES below for an explanation of the effects of unrelated logic.
Total Number of 4 input LUTs: 183 out of 3,584 5%
Number of bonded IOBs: 68 out of 195 34%
Number of BUFGMUXs: 1 out of 24 4%
The reports for the 4-bit Booth multiplier follow:
Code:
=========================================================================
HDL Synthesis Report
Macro Statistics
# Adders/Subtractors : 16
20-bit adder : 8
20-bit subtractor : 8
# Counters : 1
5-bit down counter : 1
# Registers : 5
1-bit register : 2
20-bit register : 1
32-bit register : 1
36-bit register : 1
# Multiplexers : 1
20-bit 32-to-1 multiplexer : 1
=========================================================================
Code:
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Number of Slice Flip Flops: 87 out of 3,584 2%
Number of 4 input LUTs: 647 out of 3,584 18%
Logic Distribution:
Number of occupied Slices: 362 out of 1,792 20%
Number of Slices containing only related logic: 362 out of 362 100%
Number of Slices containing unrelated logic: 0 out of 362 0%
*See NOTES below for an explanation of the effects of unrelated logic.
Total Number of 4 input LUTs: 647 out of 3,584 18%
Number of bonded IOBs: 68 out of 195 34%
Number of BUFGMUXs: 1 out of 24 4%
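The adder widths (17, 18 and 20 bits) and multiplexer sizes (4-to-1, 8-to-1 and 32-to-1) in the three synthesis reports are what you would expect from the guard bits needed for the largest multiple of M. The sketch below reproduces those numbers. It assumes the usual Booth digit range of -2^(k-1) to +2^(k-1) for k bits per cycle and a selector made of the k multiplier bits plus one overlap bit; the function names are hypothetical.

```python
def signed_width(value):
    """Minimum two's-complement width that can hold `value`."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


def booth_adder_width(n, k):
    """Width needed to hold every multiple of an n-bit signed M used by a
    k-bit-per-cycle Booth multiplier (assumed digit range +/- 2^(k-1))."""
    max_digit = 1 << (k - 1)
    m_min, m_max = -(1 << (n - 1)), (1 << (n - 1)) - 1
    extremes = [d * m for d in (-max_digit, max_digit) for m in (m_min, m_max)]
    return max(signed_width(v) for v in extremes)


def booth_mux_inputs(k):
    """Selector is the k multiplier bits plus one overlap bit (assumption)."""
    return 1 << (k + 1)


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(f"x{k}: {booth_adder_width(16, k)}-bit adders, "
              f"{booth_mux_inputs(k)}-to-1 multiplexer")
```

The worst case is the most negative multiplicand times the most negative digit, e.g. -8 * -32768 for the 4 bit version, which is why the width grows by one bit per doubling of the largest multiple.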