6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sat Nov 23, 2024 11:28 am

All times are UTC




Post new topic Reply to topic  [ 45 posts ]  Go to page 1, 2, 3  Next
Author Message
PostPosted: Thu Apr 13, 2017 7:46 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Because I'm interested in a useful benchmark between different processors, and also because we were discussion dynamic memory allocators, I had an idea for a programming benchmark challenge.

The goal is to write a dynamic memory allocator, and then compare performance between different CPUs. Not only will this give a somewhat realistic benchmark, it will also result in useful code that people can adopt in their own programs, or use for studying purposes.

The rules:
  • The memory allocator is initialized with start/end address of a heap. The size of the heap is in relation to typical memory for the processor (64kB for 65C02/Z80/6809/STM8, 1MB+ for 65816/ARM/x86).
  • Two functions/subroutines must be provided:
    • An 'alloc' routine, which takes a size argument, and returns the address of a suitable block, or an error code (e.g. 0 value), when no suitable block is available.
    • A 'free' routine which takes the address of a previously allocated block, and adds it back to the heap.
  • The allocator must use a 'best fit' algorithm, which means that it should keep all free blocks in a list, find the one that is closest to the required amount, and return that. If the block is bigger than the required amount, the block should be split in two, and the remainder placed back on the free list for future allocations. When the 'free' function is called, little blocks should be combined into bigger blocks as much as possible.
  • Apart from the heap itself, the allocator should only use a small static amount of memory.
  • Code must run from ROM. Self-modifying code is allowed, but it needs to be copied to RAM first.
  • You must allow for the heap to be as big as all available memory on the chosen platform (ZP and stack excluded).
  • A single allocation may successfully request all free memory in one time, with maximum of 64 kB.

Performance will be measured by number of cycles for a (to be defined) test scenario, total code size, and estimated development time. Sharing of ideas and code is encouraged, and algorithms can be refined as much as possible.

Anybody interested ?


Last edited by Arlet on Sun Apr 16, 2017 12:13 pm, edited 2 times in total.

Top
 Profile  
Reply with quote  
PostPosted: Thu Apr 13, 2017 10:06 am 
Offline
User avatar

Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
Arlet wrote:
...
The rules:
  • The allocator must use a 'best fit' algorithm, which means that it should keep all free blocks in a list, find the one that is closest to the required amount, and return that. If the block is bigger than the required amount, the block should be split in two, and the remainder placed back on the free list for future allocations. When the 'free' function is called, little blocks should be combined into bigger blocks as much as possible.
  • Apart from the heap itself, the allocator should only use a small static amount of memory.
...
Performance will be measured by ... estimated development time.

Nice idea. But why do you wish to restrict the algorith to 'best fit'? I think there should no restriction upon the implementation strategie - e.g. a first fit (faster) could be more desireful within a multitasking environment where turnaround time is valuable.

The development time should not be a part of the performance evaluation. Too many variables influences that. Runtime performance (avarage and max response times (alloc/free), code and memory requirements) should be evaluated and tabulated. This way, everyone could choose between different implementations depending on the demands.

Perhaps a torture sequence of allocs and frees should be prepared for this as well :)


Top
 Profile  
Reply with quote  
PostPosted: Thu Apr 13, 2017 10:14 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
The main reason to "enforce" the best fit algorithm is make sure the different implementations are all doing the same thing, and we're not comparing first fit on one CPU to best fit on another, which would skew the results.

Of course, if you want to do first fit or something else, that's possible, and interesting, but I would recommend doing it as a second option, after finishing best fit.

Development time is just to get a rough idea.


Top
 Profile  
Reply with quote  
PostPosted: Thu Apr 13, 2017 10:32 am 
Offline
User avatar

Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
One thin'k' more: strategies may vary and perhaps(?) could benfit whether free() needs to do housekeeping in time or not. In a multitasking environment a separate regular scheduled housekeeping() could do the clean up perhaps better and free() becomes very quick.
Quote:
Of course, if you want to do first fit or something else, that's possible, and interesting, but I would recommend doing it as a second option, after finishing best fit.

I understand. ( :twisted: it's just a question how 'best' is interpreted :wink: )


Top
 Profile  
Reply with quote  
PostPosted: Thu Apr 13, 2017 11:16 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
GaBuZoMeu wrote:
it's just a question how 'best' is interpreted :wink:

In the context of dynamic memory allocation, "best fit" is a well known algorithm. It means that you search through all the free blocks, and get the one that is the closest match to the requested size.


Top
 Profile  
Reply with quote  
PostPosted: Thu Apr 13, 2017 12:50 pm 
Offline
User avatar

Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
I was merely joking.


Top
 Profile  
Reply with quote  
PostPosted: Thu Apr 13, 2017 5:29 pm 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8509
Location: Midwestern USA
Arlet wrote:
GaBuZoMeu wrote:
it's just a question how 'best' is interpreted :wink:

In the context of dynamic memory allocation, "best fit" is a well known algorithm. It means that you search through all the free blocks, and get the one that is the closest match to the requested size.

Are you requiring that the request for memory be exactly filled? For example, must the algorithm return 26 bytes if the programs requests 26 bytes, or are you allowing fixed sized blocks, e.g., 32 bytes in response to a 26 byte request?

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Top
 Profile  
Reply with quote  
PostPosted: Thu Apr 13, 2017 5:49 pm 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Quote:
Are you requiring that the request for memory be exactly filled? For example, must the algorithm return 26 bytes if the programs requests 26 bytes, or are you allowing fixed sized blocks, e.g., 32 bytes in response to a 26 byte request?

Increasing the block size by a modest amount is allowed, for instance to guarantee alignment (if applicable), to allow some room for bookkeeping information, or to prevent unusably small blocks from getting created. For instance, if you have a 32 byte free block, and there's a request for 26 bytes, it could be that the remaining 6 bytes are too small to be used, so you can hand out 32 bytes instead. Of course, when the block is free'd later, you have to free all 32 bytes.

The goal is to keep this overhead limited. I don't think a request for 129 bytes should be rounded up to 256.


Top
 Profile  
Reply with quote  
PostPosted: Thu Apr 13, 2017 11:24 pm 
Offline

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
I agree the "Best fit" requirement should be dropped. The problem is to dynamically allocate, and free memory as represented by the interface of two functions.

The task of the evaluator is to drive the software hard enough for their use cases to see which one best fits their needs.

Clearly there must be some disadvantage to a "first fit" vs a "best fit" algorithm, whether the evaluator will encounter that disadvantage will have to wait for the test results.

Most allocators work very well with a clean heap. It's when the heap is fragmented that things go awry, so it would behoove the evaluator to fragment the heap to test some "worst case" situations. But for an application that may just be allocating and free 1K blocks, an allocator that does that well, but, perhaps, performs poorly on allocating a similar amount of memory in 50 byte blocks could still be a perfectly useful allocator for their use case.

So, rather than specifying an implementation criteria, posit the testing and judging criteria instead and let the allocator be the black box it should be.


Top
 Profile  
Reply with quote  
PostPosted: Thu Apr 13, 2017 11:31 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
I wrote a garbage collector for the 6502 way back in the day. I was going to use it for a word processor that I never finished, and it was optimized for lines of <127 bytes, so it wouldn't be applicable to the current task anyway. Could a garbage collector be included in the specification though, to give an allocation request a second chance for success after an initial fail due to fragmentation?

Mike B.


Top
 Profile  
Reply with quote  
PostPosted: Fri Apr 14, 2017 2:10 am 
Offline
User avatar

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
Arrgh! It’s been so long since I looked at malloc() and free() that I can’t find a general purpose one I basically wrote myself. I just use the version in the ‘The Standard C Library’ by Plauger with a minor change to use semaphores during allocation and freeing. The idea was to make it thread safe.
Quote:
Could a garbage collector be included in the specification though, to give an allocation request a second chance for success after an initial fail due to fragmentation?

The challenge does say that free() should try and recombine blocks much as possible. Given that free would combine neighbouring blocks there’s little for a garbage collector to do unless you go outside of neighbouring blocks and that requires being able to move objects in memory around. The garbage collector would have to use handles to memory objects rather than pointers.

What I’d like to see is a simple paged memory management unit memory allocator / freer. I’m using something similar in design to a 6829 MMU but I haven’t got around to finishing off the driver for it.

_________________
http://www.finitron.ca


Top
 Profile  
Reply with quote  
PostPosted: Fri Apr 14, 2017 4:07 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Maybe I'm confused about the difference between a garbage collector and a defragmenter. My idea would involve merging all of the free blocks into a single block before giving up. And yes, moving active blocks would require an extra layer of indirection that may not be appropriate for a simple environment.

Mike B.


Top
 Profile  
Reply with quote  
PostPosted: Fri Apr 14, 2017 4:21 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Quote:
My idea would involve merging all of the free blocks into a single block before giving up

Yes, merging free blocks is required. I would do it right away as part of the free(), and not wait until allocation fails, otherwise you get much more fragmentation.

As far as other ideas, I want to avoid endless discussions about different kind of allocators or usage scenarios. We all love to argue, but I want to see some code. The challenge is "best fit", not because it's the ultimate best, but because it makes it easier to compare performance with a simple test. Do you want to participate in the challenge ? Then write a (or improve someone else's) "best fit" allocator.


Top
 Profile  
Reply with quote  
PostPosted: Fri Apr 14, 2017 9:07 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
My own attempt, on Cortex M4. With fresh heap, doing an alloc() takes 57 cycles. Same for repeated allocs. A free() also takes 57 cycles. Coalescing a free block with 2 neighbors takes 64 cycles. Allocating 8 blocks, and freeing 4 of them, followed by another malloc(), takes 99 cycles because it's doing 5 iterations to find the best fit. So, each iteration on the freelist takes about 10 cycles extra on top of regular alloc.

By the way, this is compiled C code. I've looked at the assembly, but there's nothing obvious that I can improve.

Edit: improved first time alloc to 52 cycles (1 cycle saved by going to assembly :D and removing unnecessary instruction)

Edit2: improved free to 54 cycles (non coalescing), 59 cycles (coalesce on one side), 65 cycles (coalesce on both sides)


Last edited by Arlet on Sat Apr 15, 2017 7:05 am, edited 2 times in total.

Top
 Profile  
Reply with quote  
PostPosted: Fri Apr 14, 2017 10:23 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
This is how my storage works:

The heap is divided in 'blocks', each block consisting of a number of 32 bit words. The first and last word of each block contain the size. For instance, when alloc is called with argument 40, a block of 10 words (40 bytes) is returned, but in reality a block of 12 words is allocated, with size fields hidden at both ends.
Code:
[1 | size][returned 40 byte data block][size]

When free() is called, the allocator can look up the original size by fetching the word at offset -1. But, also, it can easily find the blocks to left and to the right. To find the block on the right, simply add the length to the address. To find the block on the left, look at offset -2 where the length of the left block is stored, and subtract that length.

To find out if a block is in use or not, the MSB of the first size field is set to '1' for a used block, and set to '0' for a free block. All free blocks are also linked on a double-linked list, the 'free list'. When a block is on the free list, it looks like this:
Code:
[0 | size][prev][next][free data][size]

Part of the user data is used for 2 pointers. This puts the minimum block size at 4 words.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 45 posts ]  Go to page 1, 2, 3  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 16 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron