PostPosted: Sun Jun 18, 2017 2:26 pm 
Joined: Fri Mar 31, 2017 7:52 pm
Posts: 45
Interesting project. I've been sketching out thoughts on a "from scratch" CPU in FPGA. That seems like a pretty good way to ramp up my knowledge and experience with them. I was looking at an all TTL CPU I built for my Digital 2 class in college back in 1985 (unfortunately I've only been able to find about half my notes and none include the final design) and the National Radio Institute NRI-832 "trainer" (also all TTL and from the early 1970s) for ideas on an instruction set. I had not considered using a subset of the 6502. I really like your idea of 16 bit instructions based on a subset.

Thanks,
Jim


PostPosted: Mon Jun 19, 2017 11:56 am 

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
jim30109 wrote:
Interesting project. I've been sketching out thoughts on a "from scratch" CPU in FPGA. That seems like a pretty good way to ramp up my knowledge and experience with them. I was looking at an all TTL CPU I built for my Digital 2 class in college back in 1985 (unfortunately I've only been able to find about half my notes and none include the final design) and the National Radio Institute NRI-832 "trainer" (also all TTL and from the early 1970s) for ideas on an instruction set. I had not considered using a subset of the 6502. I really like your idea of 16 bit instructions based on a subset.

Thanks,
Jim


Well, the idea of an 8-bit address bus was actually Big Ed's, but it fit nicely with my intention of making a multi-core solution for neural network simulation. And I have always loved the 6502 for its simplicity and down-to-earth architecture. Instead of bloating the instruction set to do all sorts of things, I needed a more specialized core that fits neural networks better, i.e. each "neuron" is a small, simple function, but there is a large number of nodes that can move data around at high speed. That is both more efficient and less costly than other solutions, so it kind of makes sense. :)

PS: You can probably widen the instruction bus and address bus if you need more memory, but at the cost of some speed. Since I split the instruction memory from the data memory, it is also possible to have a larger instruction memory as long as one keeps branches relative (as you probably noticed, there are no JMP or JSR instructions).

Renato


PostPosted: Sun Jun 25, 2017 12:30 pm 

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
Keeping the core small, retaining speed, and implementing multiplication all at once seems to be hard. I tried the standard accumulator*X-register multiply (which is then done by the interpreter), but it bloated the core with 110 LUTs and reduced the speed to 27MHz(!).

Booth multipliers do better on speed, but are still pretty large. I still need to look into them more, since an 8-bit*8-bit product is only a sum of 8 shifted numbers, so there might be a more efficient solution lurking somewhere.
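
To make the "sum of 8 shifted numbers" remark concrete, here is a minimal Python sketch of plain unsigned shift-and-add multiplication. It is only an illustration of the arithmetic, not code from the core; the function name mul8_shift_add is made up for this example.

```python
def mul8_shift_add(a: int, b: int) -> int:
    """Unsigned 8x8 -> 16-bit product built from shifted copies of a.

    Illustrative model only (hypothetical name, not taken from the core):
    each set bit i of b contributes a << i, so the product is a sum of at
    most eight shifted numbers.
    """
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    product = 0
    for bit in range(8):
        if (b >> bit) & 1:        # bit i of the multiplier selects a partial product
            product += a << bit   # partial product: a shifted left by i
    return product


if __name__ == "__main__":
    print(mul8_shift_add(13, 11))
```

In hardware the eight partial products are what cost LUTs (an adder per row in a fully combinational array) or cycles (one add per clock in a sequential design), which is the size/speed trade-off discussed in this post.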

For now I have settled on the squaring method, so the EOR opcodes have been replaced with SQR:
SQR A
SQR $ZP
SQR $ZP,X
Each takes 1 cycle.
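
The post does not spell out which identity the "squaring method" uses. A common choice is the quarter-square identity a*b = ((a+b)^2 - (a-b)^2)/4, and the Python sketch below assumes that one. The helper sqr() only stands in for a SQR instruction; its operand width and behavior in the real core are assumptions, and both function names are made up for this example.

```python
def sqr(x: int) -> int:
    """Stand-in for a hardware SQR instruction (assumed: exact square, no overflow)."""
    return x * x


def mul_via_squares(a: int, b: int) -> int:
    """Multiply using two squarings, one add, two subtracts, and a shift.

    Assumes the quarter-square identity:
        (a + b)^2 - (a - b)^2 = 4*a*b
    The difference is always an exact multiple of 4, so the division
    loses nothing.
    """
    return (sqr(a + b) - sqr(a - b)) // 4


if __name__ == "__main__":
    print(mul_via_squares(13, 11))
```

Note that a+b can need one bit more than the operands, so a real implementation has to either restrict the input range or widen the squarer.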

It gives a core size of 276 LUTs (including memory logic), and requires 2KB. It seems to be able to run above 100MHz.


PostPosted: Sun Jun 25, 2017 2:24 pm 
Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
kakemoms wrote:
Booth multipliers are more up to speed, but still pretty large. I still need to look more into them as a 8-bit*8-bit is only a sum of 8 shifted numbers, so there might be a more efficient solution lurking around somewhere.
If you examine the process implemented by the Booth algorithm, it starts by shifting one operand, the multiplier, right and determining whether an add or a subtract of the other operand, the multiplicand, is required.

To avoid propagating the carries from the low product register through the high product register, the Booth algorithm starts by adding the multiplicand only to the upper half of the product register. The essence of the Booth algorithm is that the effects of the least significant bits of the multiplier and the multiplicand on the upper half of the double-length product are computed first. The result of this approach is that the carry chain is cut in half, which can significantly increase the speed of the partial product accumulator.
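
The description above can be sketched as a textbook radix-2 Booth multiplier in Python. This is a generic model written for illustration, not the code from the linked repository; the function name and the variable names upper, lower and prev are made up here. The multiplicand is only ever added to or subtracted from the upper half of the product register, and the whole register is then shifted right by one bit.

```python
def booth_mul8(multiplicand: int, multiplier: int) -> int:
    """Signed 8x8 -> 16-bit multiply, textbook radix-2 Booth recoding.

    Illustrative sketch (hypothetical names). 'upper' is kept as a signed
    Python int; real hardware needs one guard bit beyond 8 bits so that
    subtracting a multiplicand of -128 does not overflow.
    """
    assert -128 <= multiplicand <= 127 and -128 <= multiplier <= 127
    upper = 0                   # high half of the product register (signed)
    lower = multiplier & 0xFF   # low half initially holds the multiplier
    prev = 0                    # bit shifted out of 'lower' on the previous step
    for _ in range(8):
        bit = lower & 1
        if bit == 0 and prev == 1:
            upper += multiplicand   # end of a run of ones: add
        elif bit == 1 and prev == 0:
            upper -= multiplicand   # start of a run of ones: subtract
        prev = bit
        # Arithmetic shift right of the combined upper:lower register.
        lower = (lower >> 1) | ((upper & 1) << 7)
        upper >>= 1                 # Python's >> on ints keeps the sign
    return (upper << 8) | lower


if __name__ == "__main__":
    print(booth_mul8(-7, 9))
```

Because only the upper half passes through the adder, the adder is 8 (plus guard) bits wide rather than 16, which is the shortened carry chain described above.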

As you've determined, for any multiplier/accumulator combination that exceeds the parameters of the basic MAC function in the FPGA, the Booth multiplier code I linked to will result in about the same LUT resource utilization. As long as you stay within the parameters of the built-in functions, the FPGA resources needed to build a multiplier or multiplier/accumulator are minimized. Otherwise, you are almost better off trading off speed, area, or some combination thereof on your own.

One approach, which I experimented with several years ago, was to simulate the operation of the Booth algorithm in 65C02 assembler. I built a signed 8x8 multiplication routine. (You can find the assembler for it in this github.com repository: 6502-Code-Snippets. There is a thread here on the forum that instigated that effort, which also discussed some other multiplication algorithms that may be of interest to you.) One thing that I learned as part of the effort was that there are some primitive functions/instructions that could be implemented in programmable logic that would save a significant number of instructions if speed was not the overarching goal. Since I was focused on extending my 6502/65C02 core in other ways, I did not spend any more effort on this subject. I have left four opcodes free in my latest core in order to consider adding some type of multiplication based on the results of the Booth multiplier work I did.

Take a look at the code I linked to for you. It may provide some insights that will allow you to decide the best path forward for your 6502-inspired ANN processor. Keep us posted on your progress; we're always interested in new applications of the 6502.

_________________
Michael A.

