6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sun Nov 24, 2024 6:01 pm

All times are UTC




Post new topic Reply to topic  [ 50 posts ]  Go to page Previous  1, 2, 3, 4  Next
Author Message
PostPosted: Wed Nov 23, 2016 10:36 pm 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Actually, just writing it in the most straightforward way, and letting the tools optimize a bit, works too:
Code:
module life(
    input self,      // center cell
    input [7:0] n,   // 8 neighbour cells
    output alive     // output
    );

wire [3:0] sum = n[0] + n[1] + n[2] + n[3] + n[4] + n[5] + n[6] + n[7];

assign alive = self ? (sum == 2 || sum == 3)
                    : (sum == 3);

endmodule

6 LUTs on Spartan-6, 10 LUTs on Spartan-3.


Top
 Profile  
Reply with quote  
PostPosted: Wed Nov 23, 2016 10:44 pm 
Online
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Great when simple things turn out better! I think I'd recast it more like Life88, something like this:
Code:
module life(
    input self,      // center cell
    input [7:0] n,   // 8 neighbour cells
    output alive     // output
    );

wire [3:0] sum = n[0] + n[1] + n[2] + n[3] + n[4] + n[5] + n[6] + n[7] +self;

assign alive = sum == 3 || (self && (sum == 4));

endmodule

Is that any different? (Looking at it, I like that the sum is over all 9 values, but I'm not so keen on the final expression.)


Top
 Profile  
Reply with quote  
PostPosted: Thu Nov 24, 2016 5:04 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
That's better on Spartan-3 with 8 LUTs, but it jumps to 16 LUTs on Spartan-6.

I also had an idea last night for an even smaller design, but either the tools don't understand it, or the connections inside the FPGA don't allow the carry chain output to be used that way.


Top
 Profile  
Reply with quote  
PostPosted: Thu Nov 24, 2016 5:34 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Anyway, while it is fun to optimize one cell, reality deals with bigger tiles, both offering opportunities for reuse of partial expressions as well as running into routing restrictions and congestion.


Top
 Profile  
Reply with quote  
PostPosted: Thu Nov 24, 2016 5:58 am 
Online
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Oh, yes, I'm still thinking in terms of generating 8 cells of output in one clock cycle, taking in 24 bits of input, which come from 32 bits of the life grid. 16 of those bits are reused from one generation to the next. (That's our life42 algorithm)

Because I'm sticking with a list life, which is a sparse array of tiles, there are diminishing returns to using larger tiles, because a larger tile picks up more dead cells and becomes less efficient.

The next step up in size would be life44, which needs:
64 bits of cells
32 of which are new and 32 are reused
36 of the 64 are inputs to a network of 16 cloudlets to produce 16 bits of output

So, instead of writing in 2 bytes and reading back 1, we'd be writing in 4 bytes and reading back 2. Empirically, the number of life44 tiles we'd process is about 70% the number of life42 tiles.

Edit: hang on, life44 is smaller than this - actually it would be:
48 bits of cells
24 of which are new and 24 reused
36 of the 48 are inputs to 16 cloudlets
and so it's 3 bytes in to get 2 bytes out.

But I would describe the network in terms of 8 or 16 cloudlets, expressed something like your code. That would be maintainable!

To use the reconfigurability idea would mean describing a network of the shift registers which is connected suitably to be configured as a cloudlet. As we happen to know that we have this cloudlet architecture, we can program all the cloudlets in parallel. We can hope to execute various 3x3 kernels over our 6x4 or 6x6 inputs to produce our 4x2 or 4x4 outputs.


Top
 Profile  
Reply with quote  
PostPosted: Mon Nov 28, 2016 6:20 pm 
Online
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Dave prompted me to think a bit more about the large-table approach to Life, which ought to be faster than the small-table approach we have (which itself has turned out to be faster than the no-table approach Dave wrote first.)

The small-table approach takes the 4x3 surround of a 2x1 target and delivers the 2 bits of result with a single lookup into a 4k table. We use this four times in Life42, which holds a 4x2 pattern of cells in a single byte. (I did mock up a Life44 which holds a 4x4 pattern of cells in a pair of bytes, and it looks workable but not necessarily compelling. Possibly converting from C to assembly would be worthwhile.)

The large table approach is to take a 4x4 surround of a 2x2 target and deliver 4 bits of result with a single lookup. We did have a look at some assembly code for this and it looks like it should be fast, but we didn't go so far as to make it fully functional. The table size is 32k bytes - will fit in a 64k machine but wouldn't leave a huge amount of space for the universe. Maybe 12k instead of 44k.

Can we do better than 32k bytes? I believe we can, by using the symmetry of the square to reduce the general 4x4 pattern down to a smaller number of patterns in a canonical form.
- By inspecting 2 bits on adjacent corners, and choosing whether or not to flip the pattern (and then the result) we reduce by 25% - from 32k down to 24k
- By inspecting 4 bits along an edge and choosing whether or not to flip we reduce by 6/16ths, so 32k becomes 20k - this seems worthwhile. It might increase the available space for the universe from 12k to 24k.
- By doing both - carefully! - at right angles, we can reduce from 32k to 15k. I think this is feasible but tricky. I think we actually need to look at 3 corners before the first flip, and then look at an edge.
- We could even apply the second technique twice, I think, to get down from 32k to 13k. I think this is past the point of diminishing returns.
- We haven't looked at rotations at all. As rotations move bits between the byte pairs in a more general way than either a left-right flip or top-bottom flip, it seems like it would be quite expensive to do the forward transformation. I don't expect to think further about this possibility.

If anyone has any bright ideas for more rapid calculation of the 2x2 future of a 4x4 patch in 6502 assembly, I'm all ears!


Top
 Profile  
Reply with quote  
PostPosted: Mon Nov 28, 2016 6:31 pm 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Can you grab 8 pixels from the corners/edges, and use a small table lookup to find out rotation/mirroring ?


Top
 Profile  
Reply with quote  
PostPosted: Mon Nov 28, 2016 6:43 pm 
Online
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
That's an idea! For the one-edge approach, we only have to look at one byte to determine the transformation, and indeed a table is used in the approach I have in mind. Looking at both bytes will clearly mean a bit more fiddling - but indeed, it might not be too bad at runtime, just difficult to think through.


Top
 Profile  
Reply with quote  
PostPosted: Sun Dec 11, 2016 4:58 pm 
Online
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Isaac Kuo wrote three very small life programs for the memory-constrained VIC-20 some years ago, and (with permission) I quote his comments here:

Quote:
My fast optimized unexpanded VIC game of life fit in 976 bytes (this includes a BASIC stub to start the machine code, but for purposes of the challenge the BASIC code can be ignored and the code directly started with a typed in SYS call.) For purposes of the rules in this challenge, I think an additional 8 bytes of ROM are utilized - the character ROM for "\" is used as a table to calculate 2^X (where X is from 0 to 7).

http://isaackuo.cloudapp.net/stuff/dynalife/

My first version actually fit in 641 bytes, including a pixel editor (both keyboard and joystick input) and code to dynamically allocate custom tiles where needed (the unexpanded VIC doesn't have enough RAM to provide a bitmap to the entire screen...I dynamically allocate and trash collect tiles where 1+ pixels are set).

But the routine for running the Life simulation was way too slow for my tastes. I kept improving things until I got what is perhaps the fastest full screen Life simulator possible on an unexpanded VIC. It does a lot of weird stuff in parallel, and crams various tables and arrays in RAM that the video chip doesn't have access to (including the weird nibble memory of the alternate color page).

It's generally annoying to use [the nibble-wide memory] because the upper 4 bits could be read as anything. I use it as a temporary store of the lowest two bits of a vertical strip of pixels. Writing to it is simply copying the currently processed byte to the nybble store...don't care that the upper 16 bits are discarded.

Then, when it gets read, an AND mask wipes everything except for the lowest two bits anyway. So there's no performance penalty for dealing with the random junk read in the upper nybble.

Basically, the program processes things in vertical strips, top to bottom. But it's not aligned how you might expect. The input is a 10 pixel wide strip including two pixels from the previous column. The output is an 8 pixel wide strip which includes one pixel from the previous column and only including 7 pixels from the current column. You might think that the most natural thing would be to output the 8 pixels aligned with the current column of tiles. But that turns out to be less efficient since it has to draw data from three different columns, and it's also a nightmare keeping track of last turn's pixels vs next turn's pixels, unless you have full double-buffering. On the unexpanded VIC, RAM is way too tight for full double-buffering.

Instead of double-buffering, I store a temporary copy of a 2 pixel wide strip. The 3x4 bytes used to store neighbor totals aren't corrupted by the flipping pixels because the currently written row was already read and summed via a couple table lookups. But that thin vertical strip needs to be saved so the previous turn's version of those pixels can be used for calculations by the time the "paint roller" moves on to the next column.

The most important optimization was the system which tracks "clean" tiles so they can be entirely skipped. But the various weird things done in parallel roughly double performance compared to a more naive optimized implementation.

I was trying to keep memory usage low and frame rate high because I had this idea for using Life as a basis for a trilogy of videogames:

LIFECOMMAND - a bioweapon/zombie apocalypse themed Missile Command variant, which I did actually revisit in HTML5 form: http://isaackuo.cloudapp.net/ijk/

LIFETEROIDS - after evacuating Earth, the generation ship Exodus must survive waves of aliens (Life spaceships)

LIFEBOTRON - Not really sure how this one would have worked, but it would have featured robotron style action defending the colonists from the aliens in their new world

Hmm...I found some of the opening title text I had started writing, designed to fit within the tight confines of the VIC20 screen. It would have ticked onto the screen character by character, like a teletype.

Opening/Ending of LIFECOMMAND
--------------------
IN YEAR 199X A.D.

THE ALIEN ASSAULT
THREATENS THE FEW
REMAINING CITIES.

STRATEGIC DEFENSE
PROTOTYPE MITHRIL
IS THE ONLY HOPE.
--------------------
IN YEAR 199X A.D.

STRATEGIC COMMAND
COMPLETES THE ARK
SPACESHIP EXODUS.

THE EARTH IS LOST
TO THE ALIENS BUT
MANKIND SURVIVES.
--------------------

Opening of LIFETEROIDS
--------------------
IN YEAR 199X A.D.

THE ALIEN ASSAULT
DESTROYED THE FEW
REMAINING CITIES.

THE ORBITAL ALIEN
ARMADA INTERCEPTS
SPACE ARK EXODUS.
--------------------

When I was coding this stuff, I didn't do any research into sophisticated Game of Life algorithms. I was just doing what was straightforward to me. Better to code something you make up yourself and understand, than to try and code based on something you don't understand.

And the unexpanded VIC's tight RAM prevented me from even considering the various algorithms that pop up when you do a cursory search on Game of Life algorithms. (They use huge table lookups.)


Top
 Profile  
Reply with quote  
PostPosted: Sun Dec 11, 2016 9:46 pm 
Offline

Joined: Sun Dec 11, 2016 9:29 pm
Posts: 2
I just registered, and I'm subbing this thread. I'm not necessarily up for much participation in a new forum right now, but I will be at some point! Previously, I had been active on VIC20 Denial, but that came to an abrupt end when my twin sons were born. I still have this backburner VIC20 project for getting Apple ][ resolution by exploiting the way the NTSC VIC20 does chroma signal timings, but it's...weird. And very VIC20 specific, not really a 6502 thing...

In principle I could make a game of life using it, but the limited graphics RAM of even an expanded VIC would greatly exacerbate the tile limitation. Essentially, each 14x16 pixel tile would consume 32 bytes rather than just 16 bytes, so only around half of the screen could be occupied with "dirty" tiles. Still, having 280x192 resolution instead of just 160x192 resolution on a VIC20 would be a major coup! (The way to get this resolution would be to use yellow or green tiles, and take advantage of the fact that on a monochrome monitor colors appear as alternating checkerboard patterns. This checkerboard mask is effectively ANDed with the tile bitmap; by altering the character set each field you can effectively get each 8x16 tile to display 14x16 with half of the pixels displayed each field.)


Top
 Profile  
Reply with quote  
PostPosted: Mon Dec 12, 2016 6:59 am 
Online
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Good to see you here! I think we've had one or two discussions here about unusual approaches to video output, so your NTSC idea might well be of general interest, if at some point you feel ready to explain. (Probably in a new topic though, as this one probably doesn't have much of an audience!)


Top
 Profile  
Reply with quote  
PostPosted: Tue Dec 13, 2016 1:46 am 
Offline

Joined: Sun Dec 11, 2016 9:29 pm
Posts: 2
Thanks so much for the warm welcome! I need to have some sort of gizmo to get files from my PC clones to the VIC20 and C128 before I get serious about the NTSC thing. My experiments had been strictly in the form of manually typed in POKEs which is...well a frustrating experience beyond a certain very small scale.


Top
 Profile  
Reply with quote  
PostPosted: Tue Dec 13, 2016 10:06 am 
Online
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I certainly moved forward when I got an SDCard gizmo... it's an interesting question, so I've started a thread at
viewtopic.php?f=1&t=4339


Top
 Profile  
Reply with quote  
PostPosted: Thu Dec 22, 2016 4:00 pm 
Offline

Joined: Tue Jun 08, 2004 11:51 pm
Posts: 213
Hi
Keeping 9 bits of information is redundant. You only need to
know if you had a neighbor you already know who you are.
You only keep track of neighbors in 8 bits.
The display can be picked up from a neighbor, like a row off.
8 bit and a lookup tables of 256 tell you if you create, maintain or destroy life.
Determining how to deal with edges is the more complicated problem.
Torus or fall of the edge are the two choises.
Since life is sparse, only calculate were neighbors are when you need to change life
status. You then update your neighbors.
Use simple powers of 2 for arrays. Old and new generations only need 3
more rows but wasting 1 extra row makes a nice binary rolling array.
The beginning of the array moves 4 rows on each generations.
Dwight


Top
 Profile  
Reply with quote  
PostPosted: Tue Dec 27, 2016 6:24 pm 
Offline

Joined: Wed Jun 25, 2014 4:28 pm
Posts: 21
I think the 256 byte version for the C64 by Ruk/P-a Bäckström was already mentioned - here is the link:
http://csdb.dk/release/?id=104384

It's pretty amazing. Alas, the link to the source code is broken and I haven't been able to find it anywhere else either.
Does anyone have the source code?

Kind regards,

Oscar.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 50 posts ]  Go to page Previous  1, 2, 3, 4  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 20 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: