Strange, does WDC not provide the source code to their libraries so you can debug this sort of thing?
Rolling your own malloc() for a single process running on the 816 wouldn't be terribly difficult.
The simplest implementation I know of is a bump allocator. You keep one pointer to the next "free" point in memory, and you advance it by the number of bytes you wish to allocate.
Code:
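#include <stddef.h>

unsigned char *next_pointer = (unsigned char *)0x0012F11; // Or wherever you want to start from
extern unsigned char *far_heap_end; // End of the heap region, defined elsewhere

void *malloc(size_t size)
{
    unsigned char *rval = next_pointer;

    // Compare against the space left so the pointer arithmetic can't overflow
    if (size > (size_t)(far_heap_end - next_pointer))
        return NULL;
    next_pointer += size; // Set up for next allocation
    return rval;
}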
The disadvantage to this approach is that you can't really "free" memory after you've allocated it. I've used this method when I want to do some quick, simple OS development and don't want to write a full memory manager.
There are several different ways you can go about making a more complex allocator of course. Really depends on what you're looking to do though.
For the small memory area of the 816, I'd probably do it one of two ways:
1) Use a simple bitmap, and divide the memory into fixed-size chunks (e.g. a single page, 4 KiB on the Intel platform).
With 4 KiB chunks, the 816's 16 MiB address space comes to 4096 chunks, so the bitmap would need about 512 bytes of memory.
Each bit indicates whether that chunk is free (0) or allocated (1).
Nice thing is you can skip over several chunks at once by checking whether a whole word of the bitmap is all ones (UINT_MAX, e.g. 0xFFFF for a 16-bit word). You never allocate less than one chunk.
When you release memory, you flip a bit back to 0 in your bitmap.
This is pretty fast and easy to implement, but it can waste a lot of memory if your allocations are frequently much smaller than a chunk.
FreeBSD and Linux use something kinda sorta like this with their buddy allocators, which hand out whole pages. They then layer a slab allocator on top of that to make small allocations more efficient and fast.
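Here's a rough sketch of the bitmap idea from option 1, scaled down so it can be tried on a host machine. The names (chunk_alloc, chunk_free, heap, bitmap) and the sizes are made up for illustration, and it only hands out one chunk per call. A real version would search for a run of free bits to satisfy larger requests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes for illustration: 64 chunks of 256 bytes each. */
#define CHUNK_SIZE 256u
#define NUM_CHUNKS 64u
#define WORD_BITS  16u
#define NUM_WORDS  (NUM_CHUNKS / WORD_BITS)

static unsigned char heap[NUM_CHUNKS * CHUNK_SIZE]; /* stand-in for the real heap region */
static uint16_t bitmap[NUM_WORDS];                  /* bit set = chunk allocated */

/* Allocate one chunk. Returns NULL when every chunk is taken. */
void *chunk_alloc(void)
{
    size_t w;
    unsigned b;

    for (w = 0; w < NUM_WORDS; w++) {
        if (bitmap[w] == 0xFFFFu)
            continue; /* all 16 chunks in this word are in use, skip them at once */
        for (b = 0; b < WORD_BITS; b++) {
            uint16_t mask = (uint16_t)(1u << b);
            if ((bitmap[w] & mask) == 0) {
                bitmap[w] |= mask; /* mark the chunk as allocated */
                return &heap[(w * WORD_BITS + b) * CHUNK_SIZE];
            }
        }
    }
    return NULL;
}

/* Release a chunk previously returned by chunk_alloc(). */
void chunk_free(void *p)
{
    size_t index = (size_t)((unsigned char *)p - heap) / CHUNK_SIZE;

    bitmap[index / WORD_BITS] &= (uint16_t)~(1u << (index % WORD_BITS));
}
```

Freeing is just clearing one bit, so a freed chunk is immediately the first candidate the next time the scan reaches its word.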
2) Use a linked list of free blocks. Each block has a pointer to the next free block, and a size indicating how big it is.
You can then take a block and slice it down to the size you need and adjust the pointers as needed.
A bit more complicated to implement, but not horrible. It might leave your memory really fragmented, though for the 816 that's probably not as much of an issue, seeing that you're not trying to keep cache lines filled.
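A minimal first-fit sketch of the free-list approach might look like the following. Everything here (fl_init, fl_alloc, fl_free, the arena size) is a hypothetical name or value picked for the example. To keep it short, the caller passes the size back in when freeing, and adjacent free blocks are never merged, which is exactly where the fragmentation comes from.

```c
#include <stddef.h>

/* Header stored at the start of every free block. */
struct free_block {
    struct free_block *next; /* next free block, or NULL */
    size_t size;             /* bytes in this block, header included */
};

#define ARENA_UNITS 256u /* hypothetical arena size, in header-sized units */

static struct free_block arena[ARENA_UNITS]; /* stand-in for the real heap region */
static struct free_block *free_list;

/* Round a request up to a whole number of headers so any block can be re-listed. */
static size_t round_up(size_t size)
{
    size_t unit = sizeof(struct free_block);

    if (size == 0)
        size = 1;
    return (size + unit - 1) / unit * unit;
}

/* Start with one free block covering the whole arena. */
void fl_init(void)
{
    free_list = arena;
    free_list->next = NULL;
    free_list->size = sizeof arena;
}

/* First fit: take the first block that is big enough and slice off the front. */
void *fl_alloc(size_t size)
{
    struct free_block **link = &free_list;
    struct free_block *blk;

    if (size > sizeof arena)
        return NULL;
    size = round_up(size);

    for (blk = free_list; blk != NULL; link = &blk->next, blk = blk->next) {
        if (blk->size < size)
            continue;
        if (blk->size == size) {
            *link = blk->next; /* exact fit: unlink the whole block */
        } else {
            /* the remainder stays on the list in place of the original block */
            struct free_block *rest =
                (struct free_block *)((unsigned char *)blk + size);
            rest->next = blk->next;
            rest->size = blk->size - size;
            *link = rest;
        }
        return blk;
    }
    return NULL;
}

/* Put a block back at the head of the list. size must match the fl_alloc() request. */
void fl_free(void *p, size_t size)
{
    struct free_block *blk = p;

    if (p == NULL)
        return;
    blk->size = round_up(size);
    blk->next = free_list;
    free_list = blk;
}
```

Keeping the list sorted by address and merging neighbours in fl_free() would cut down on fragmentation, at the cost of a slower free.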