6502.org • View topic - Conway's Life on 6502

View unanswered posts | View active topics

Board index » 6502.org Users Forum » Programming

All times are UTC

Conway's Life on 6502

Page 2 of 4

[ 50 posts ]

Go to page Previous 1, 2, 3, 4 Next

Previous topic | Next topic

Author

Message

Arlet

Post subject: Re: Conway's Life on 6502

Posted: Wed Nov 23, 2016 5:24 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

On Xilinx, you can configure a LUT as an SRL16 shift register. The cool trick is that you get 4 address lines that allow you to pick any of the 16 bits as output, but you also get a shift register that you can use to dynamically alter the table.

It's almost like a dynamically programmable LUT, but unfortunately for this application, you don't get access to the carry chain.

Edit: and of course, you also don't get to dynamically update the routing between various LUTs.

Attachments:

SRL16.png [ 30.63 KiB | Viewed 1229 times ]

Top

MichaelM

Post subject: Re: Conway's Life on 6502

Posted: Wed Nov 23, 2016 5:30 pm

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL

Alternatively, you could take a microprogrammed approach like the one that Dr Jefyll linked to on anycpu.org. You can use distributed RAM or Block RAM to implement microprogram control store in a writeable manner.

_________________
Michael A.

Top

BigEd

Post subject: Re: Conway's Life on 6502

Posted: Wed Nov 23, 2016 5:35 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England

Arlet wrote:

On Xilinx, you can configure a LUT as an SRL16 shift register

That's rather interesting - so we could reprogram a patch of the FPGA by serial shifting. We could presumably architect in some muxes which drive some adder or multiply resources if we wished.

For Life, we're only counting up to 3 or 4 (or 8 or 9, depending on how we do it) so a fast carry shouldn't be needed. But making this a general piece of reprogrammable fabric has an attraction.

(And of course, the serial shifting could be done automatically, taking inputs from byte-wide input registers, for easier programming by the host micro.)

MichaelM wrote:

Alternatively, you could take a microprogrammed approach

That's also interesting, although it will, I think, generally mean taking several cycles to run a microprogram to get something done. Sometimes this will be a good way to go.

Top

Arlet

Post subject: Re: Conway's Life on 6502

Posted: Wed Nov 23, 2016 5:37 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

You could also implement a 'fast forward' algorithm where you look at bigger squares and do several generations in one step.

Top

Arlet

Post subject: Re: Conway's Life on 6502

Posted: Wed Nov 23, 2016 5:39 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

BigEd wrote:

That's also interesting, although it will, I think, generally mean taking several cycles to run a microprogram to get something done. Sometimes this will be a good way to go.

If you have a 6502 pushing and pulling the data, you do have several cycles available.

Top

BigEd

Post subject: Re: Conway's Life on 6502

Posted: Wed Nov 23, 2016 5:44 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England

That's true. Much depends on what you're trying to compute - the balance of computation and communication, and the amount of intermediate state. I can imagine a crypto module which has to compute several rounds is well-suited to a microsequenced implementation running several iterations of a program, for example.

Top

Arlet

Post subject: Re: Conway's Life on 6502

Posted: Wed Nov 23, 2016 6:56 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

BigEd wrote:

Ah, of course - 24 bits in, 8 bits out, just a big mass of logic in between. But it's all quite simple stuff, so single clock cycle for sure.
Edit: even quicker if it's 32 bits in, 8 bits out!

Even quicker if it's 8 bits in, 8 bits out where you use a bit of FPGA memory to remember the surrounding area.

Top

BigEd

Post subject: Re: Conway's Life on 6502

Posted: Wed Nov 23, 2016 7:07 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England

Hmm, I'm not sure how to do that - the surrounding area is also being updated. It's true that in the present algorithm we tend to shift the window to the right, so the current generation's context usually inherits half its values from the previous generation. But sometimes we skip from one patch to a non-adjacent patch... maybe that's the same thing, as we shifted in zeros last time.

So, maybe we need to update 16 bits of context, and we inherit 16 bits from the previous iteration - and indeed, that's quite a saving.

(The above numbers are from Life42 - other approaches will of course differ.)

Top

Arlet

Post subject: Re: Conway's Life on 6502

Posted: Wed Nov 23, 2016 7:46 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

Yes, to calculate N^2 block of new generation you need to send over (N+2)^2 of old generation, but as N increases the ratio will get closer to 1.

Naturally, sparse representations of the field will have more overhead.

Top

BigEd

Post subject: Re: Conway's Life on 6502

Posted: Wed Nov 23, 2016 7:56 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England

The problem with larger tiles is that they tend to bring in more empty space - 4x2 is working pretty well, and 4x4 isn't too bad but we can't yet tell if it's better or worse overall.

The nice thing about what we've just noticed is that it breaks down into

16 bits of state inside the engine
16 bit of input
8 bits of output
A big (but shallow) cloud of logic to make the 8 bits from the 32 inputs
A trivial transformation to update the 16 bits of state from the 32 inputs

To figure out what kind of architecture would be useful to implement the big shallow cloud, without losing too much generality, I'd have to think a bit. Possibly Dave's dataflow diagram of Life88 will help.

We know that fundamentally we'll need 9 bits of input to go to each bit of output, for the case of Life. We don't need a fully general function of 9 bits (fortunately) so we implement one or two computation layers in between input and output. If we accumulate a four-bit sum for each output bit, that's 8 four-bit totals as intermediate results.

But is population count of 8 or 9 inputs very much logic? I don't know.

Top

Arlet

Post subject: Re: Conway's Life on 6502

Posted: Wed Nov 23, 2016 8:15 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

Just a quick and dirty 9 input saturating summation takes 11 LUTs on a Spartan-3 or Spartan-6.

Top

BigEd

Post subject: Re: Conway's Life on 6502

Posted: Wed Nov 23, 2016 8:29 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England

Thanks - in fact for Life there are two cases, so it's more of an 8 input count with the 9th input helping to reach the 1 bit conclusion. But clearly this gives an idea of the total amount of stuff in this shallow cloud.

Probably, by building a cloud which can implement a 4x2 life, we'll have something which is fairly local and has a bit of a 2-d flavour to it.

Top

Arlet

Post subject: Re: Conway's Life on 6502

Posted: Wed Nov 23, 2016 8:49 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

If I did it right, the total logic for 9 inputs, 1 live/dead output is also 11 LUTs.

Code:

module life(
    input self,      // center cell
    input [7:0] n,   // 8 neighbour cells
    output alive      // output
    );

// 3 partial sums, all fit in 2 bits
wire [1:0] sum12  = n[0] + n[1];
wire [1:0] sum345 = n[2] + n[3] + n[4];
wire [1:0] sum678 = n[5] + n[6] + n[7];

// saturated add of partial sums using 3 bits
wire [2:0] sum12345 = sum12 + sum345;
wire [2:0] sum = sum12345[1:0] + sum678;

// dead when 4 or more neighbours.
wire dead4 = (sum12345 >=4 ) | (sum >= 4);

// lower 2 bits of sum for 0..3 neighbours
wire [1:0] sum3 = sum[1:0];

assign alive = self ? (~dead4 & (sum3 >= 2) ) 
                    : (~dead4 & (sum3 == 3) );

endmodule

Actually the 11 LUTs was for Spartan-3. It's only 6 LUTs on Spartan6.

Top

BigEd

Post subject: Re: Conway's Life on 6502

Posted: Wed Nov 23, 2016 9:03 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England

Nicely done! The next step, I suppose, is to extract the connectivity of the 11 LUTs and make a general 'cloudlet' module which includes that connectivity and preferably extends it in a generally useful way by filling out unused connections and adding some small number of LUTs as appropriate. (It would be nice if it projected onto a two or three layer PLA kind of structure, then we know we're quite general.)

Then, 8 or more of these cloudlets make up the logic cloud.

I'm thinking this looks a bit like an image processing kernel. As a quite different possible application, would something of this shape be useful for ECC decoding or similar?

Top

Arlet

Post subject: Re: Conway's Life on 6502

Posted: Wed Nov 23, 2016 9:12 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

Adding 3 cells at a time is probably optimal because it can be done in a single LUT using the carry input and the carry output. I tried adding 2 groups of 4, and it took more LUTs.

Top

Page 2 of 4

[ 50 posts ]

Go to page Previous 1, 2, 3, 4 Next

Board index » 6502.org Users Forum » Programming

All times are UTC

Who is online

Users browsing this forum: No registered users and 27 guests

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum