This is a jolly interesting topic, for lots of reasons. My little 8 bit OS (implemented on a 6809) uses the following simple strategy for its dynamic memory allocator.
Memory blocks are in a single-linked list, with a 5 byte header: Next pointer, a Length value, and a Free boolean. The allocator is a simple "first-fit" (a new term for me, but I like it). Once a block is found that is big enough, it is split with the first half being the new allocated block and the second half being a new free block, if there is sufficient space. I've not tried to implement coalescing of free blocks, but it should be fairly straightforward.
I also wrote a routine to return memory stats: total memory, free memory, and the largest chunk of free memory. That routine ended up being nearly a complex as the malloc()!
I was surprised by how well my simple allocator works, this being the first one (in any language) that I've written. I guess "real" allocators are complex due to handling all the corner cases.
One thing I've wondered is wether the allocator would work better if the "second" half of an existing block was made the allocated block, with the front half being the free remainder? This way the list would find a free block at the start of the list.
So a 6502-related question. My allocator makes extensive use of the 16bit index registers in the 6809, with 16bit comparisons and arithmetic. Writing the code was fairly simple thanks mostly to the 16bit registers. Assuming the kind of allocator Arlet described was written for the 6502, where the 16bit ops would have to be hand-coded, how would it perform? How much harder would it be write?
If anyone is interested in my allocator:
https://github.com/aslak3/maxi09os/blob ... memory.asmI'm really interested in seeing others with similar functionality to my own, especially for the 6502 and 65816.