kakemoms wrote:
Booth multipliers are more up to speed, but still pretty large. I still need to look more into them as a 8-bit*8-bit is only a sum of 8 shifted numbers, so there might be a more efficient solution lurking around somewhere.
If you examine the process that is implemented by the Booth algorithm, it basically starts by shifting one operand, the multiplier, to the right and determining from the bits shifted out whether the other operand, the multiplicand, must be added, subtracted, or left out.
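To make the add/subtract decision concrete, here is a minimal Python sketch of radix-2 Booth recoding. It is only an illustration: the function names `booth_digits` and `booth_value` are made up for this post, and it models the arithmetic, not any particular FPGA or 65C02 implementation. Each multiplier bit is compared with the bit to its right (an implicit 0 for the LSB): the pair 10 means subtract, 01 means add, and 00 or 11 means just shift.

```python
def booth_digits(multiplier, bits=8):
    """Radix-2 Booth digits (LSB first) of a two's-complement multiplier.

    Hypothetical helper for illustration: each digit is -1 (subtract),
    0 (shift only), or +1 (add).
    """
    m = multiplier & ((1 << bits) - 1)  # view the multiplier as raw bits
    prev = 0                            # implicit bit to the right of the LSB
    digits = []
    for _ in range(bits):
        cur = m & 1
        # (cur, prev) = 10 -> -1, 01 -> +1, 00 or 11 -> 0
        digits.append(prev - cur)
        prev = cur
        m >>= 1
    return digits


def booth_value(multiplicand, multiplier, bits=8):
    """Sum the shifted multiplicand, weighted by the Booth digits."""
    return sum(d * (multiplicand << i)
               for i, d in enumerate(booth_digits(multiplier, bits)))
```

For a multiplier of 3 (binary 00000011) the digits come out as -1, 0, +1 followed by zeros, i.e. 3 is treated as 4 - 1, so a run of ones costs one subtract and one add instead of one add per bit.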
To avoid propagating the carries from the low product register through the high product register, the Booth algorithm starts by adding the multiplicand only to the upper half of the product register. The essence of the Booth algorithm is that the effects of the least significant bits of the multiplier and the multiplicand on the upper half of the double-length product are computed first. The result of this approach is that the carry chain is cut in half, which can result in a significant increase in the speed of the partial product accumulator.
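The register-level behaviour described above can be sketched in Python as follows. This is a model under stated assumptions, not the linked code: `booth_multiply` and its variable names are hypothetical, and the upper half `a` is kept as an unbounded Python integer, which behaves like an upper-half register with a guard bit (a strictly n-bit upper half would need one extra bit to cope with a multiplicand of -2^(n-1)). Note that the add or subtract only ever touches `a`; the lower half `q` is filled by the right shift.

```python
def booth_multiply(multiplicand, multiplier, bits=8):
    """Signed multiply using a shift-right Booth product register (sketch)."""
    mask = (1 << bits) - 1
    a = 0                   # upper half of the product register (signed)
    q = multiplier & mask   # lower half, initially holds the multiplier
    q_prev = 0              # bit most recently shifted out of q
    for _ in range(bits):
        q0 = q & 1
        if q0 == 1 and q_prev == 0:
            a -= multiplicand      # start of a run of ones: subtract
        elif q0 == 0 and q_prev == 1:
            a += multiplicand      # end of a run of ones: add
        q_prev = q0
        # Arithmetic shift right of the combined a:q register by one bit.
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a >>= 1                    # Python's >> keeps the sign of a
    return (a << bits) | q         # reassemble the double-length product
```

Because the adder is only as wide as the upper half, the carry chain in a hardware version of this loop is roughly half the length of one spanning the full double-length product.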
As you've determined, for any multiplier/accumulator combination that exceeds the parameters of the basic MAC function in the FPGA, the Booth multiplier code that I linked to will result in about the same LUT resource utilization. As long as you stay within the parameters of the built-in functions, the FPGA resources needed to build a multiplier or multiplier/accumulator are minimized. Otherwise, you are almost better off trading off speed, area, or some combination thereof on your own.
One approach, which I experimented with several years ago, was to simulate the operation of the Booth algorithm in 65C02 assembler. I built a signed 8x8 multiplication routine. (You can find the assembler for it in this github.com repository: 6502-Code-Snippets. There is a thread here on the forum that instigated that effort, which also discussed some other multiplication algorithms that may be of interest to you.) One thing that I learned as part of the effort was that there are some primitive functions/instructions that could be implemented in programmable logic that would save a significant number of instructions if speed were not the overarching goal. Since I was focused on extending my 6502/65C02 core in other ways, I did not spend any more effort on this subject. I have left four opcodes free in my latest core in order to consider adding some type of multiplication based on the results of the Booth multiplier work I did.
Take a look at the code I linked to for you. It may provide some insights that will allow you to decide the best path forward for your 6502-inspired ANN processor. Keep us posted on your progress; we're always interested in new applications of the 6502.