WDC816CC and farmalloc

Programming the 6502 microprocessor and its relatives in assembly and other languages.
Post Reply
User avatar
AndrewP
Posts: 368
Joined: 30 Aug 2021
Location: South Africa

WDC816CC and farmalloc

Post by AndrewP »

I am struggling to get farmalloc to work using WDC's linker and C compiler and I suspect it's more of a linker problem than a farmalloc problem.

I have the 'C' code:

Code: Select all

void* far_heap_start = (void*)0x12f111;
void* far_heap_end = (void*)0x1fffff;

void main(void)
{
	void*				pvBackground;
	
	for (;;)
	{
		pvBackground = farmalloc(100);
		...
	}
Compiled and linked using:

Code: Select all

wdc816cc.exe  -bs -DUSING_816 -ml -wr -wu -so -sp -lt -pb -pp -px -o Main.obj Main.c
wdcln.exe -C10000 -D20000 -g -t -sz -HIE -obin\Video.hex Main.obj -lcl -lml
farmalloc initially returns $00000000 (a zero value 32bit pointer) which doesn't make sense as I would expect the first return to be somewhere above $0012f111. As it loops it allocates out successive pointers for a little less than 64KB before executing a BRK instruction. Which also doesn't make sense as I would expect it to be able to allocate about 850KB before it can't any more.

The WDC documentation indicates:
Quote:

Code: Select all

void *heap_start = (void * )0x28000, *heap_end = (void * )0x48000;
The example allocates 128K
So I would expect to be able to allocate out more than 64KB.

The sharp eyed among you will have already noticed that heap_start is not the same as far_heap_start however if I use heap_start and heap_end then I get the following linker error:

Code: Select all

Undefined symbol: ~~far_heap_end
Undefined symbol: ~~far_heap_start
And at this point I should mention I am compiling for the Large memory model (using -ml) and linking against the Large standard and math libraries (-lcl -lml).
~~far_heap_end and ~~far_heap_start are definitely referenced int the cl.lib file. So I think the WDC documentation is incorrect in their example.

If I look at the generated main.lst file I can see that ~~far_heap_start and ~~far_heap_end are correctly externally defined

Code: Select all

   701                        ;void* far_heap_start = (void*)0x12f111;
   702                        	data
   703                        	xdef	~~far_heap_start
   704                        ~~far_heap_start:
   705 00:0000: 11 F1 12 00  	dl	$12F111
   706 00:0004:              	ends
   707                        ;void* far_heap_end = (void*)0x1fffff;
   708                        	data
   709                        	xdef	~~far_heap_end
   710                        ~~far_heap_end:
   711 00:0004: FF FF 1F 00  	dl	$1FFFFF
   712 00:0008:              	ends
and the linker seems to link them into cl.lib.

Has anyone managed to use farmalloc before (I think so) and if so please could you post how you did it! I don't know if I'm an idiot or if farmalloc really is broken.
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: WDC816CC and farmalloc

Post by BigDumbDinosaur »

AndrewP wrote:
I am struggling to get farmalloc to work...Has anyone managed to use farmalloc before (I think so)...

The referenced topic concerned a malloc() implementation Marco Granati was working on for the 65C816.  His design apparently was a clean-sheet implementation with no relationship to the WDC compiler.  From what I was able to understand about it, the heap was limited to 64KB and could not span banks.

Speaking of Marco, he abruptly stopped visiting the forum in January of 2021 after having been fairly active for a number of years.  His native Italy was hit especially hard by COVID...
x86?  We ain't got no x86.  We don't NEED no stinking x86!
User avatar
AndrewP
Posts: 368
Joined: 30 Aug 2021
Location: South Africa

Re: WDC816CC and farmalloc

Post by AndrewP »

I really hope Marco is well and it's just the usual life obligations that have kept him off this forum; unfortunately his subversion repository hasn't been updated in a number of years too.
BigDumbDinosaur wrote:
a malloc() implementation Marco Granati was working on for the 65C816.
Thanks for pointing out it wasn't the WDC malloc Marco was using, I hadn't noticed that. The more I think about it the more it appears I must be getting something in the linker process wrong. I'll report back here if I solve it.
leepivonka
Posts: 167
Joined: 15 Apr 2016

Re: WDC816CC and farmalloc

Post by leepivonka »

I'm not familiar with the WDC package, so this is mostly a guess.

Is it expecting

Code: Select all

far_heap_start = $12f111
far_heap_end = $1fffff
instead of

Code: Select all

far_heap_start: dl $12f111
far_heap_end: dl $1fffff
User avatar
AndrewP
Posts: 368
Joined: 30 Aug 2021
Location: South Africa

Re: WDC816CC and farmalloc

Post by AndrewP »

leepivonka wrote:
Is it expecting

Code: Select all

far_heap_start = $12f111
far_heap_end = $1fffff
instead of

Code: Select all

far_heap_start: dl $12f111
far_heap_end: dl $1fffff
Thanks but unfortunately I don't have control over that, that's assembly emitted by WDC's C compiler.

I still haven't found a solution to this (well I have, I just wrote my own) but it's looking more and more like the cl.lib is incomplete.
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: WDC816CC and farmalloc

Post by BigDumbDinosaur »

AndrewP wrote:
I still haven't found a solution to this (well I have, I just wrote my own) but it's looking more and more like the cl.lib is incomplete.

I suspect Marco Granati had run into something like this, as I know he was using the WDC development tools with his projects.  Perhaps that is what prompted him to conjure a new-and-improved malloc().
x86?  We ain't got no x86.  We don't NEED no stinking x86!
User avatar
Yuri
Posts: 371
Joined: 28 Feb 2023
Location: Texas

Re: WDC816CC and farmalloc

Post by Yuri »

Strange, does WDC not provide the source code to their libraries so you can debug this sort of thing?

Rolling your own malloc() for a single process running on the 816, wouldn't be terribly difficult.

The simplest implementation I know of would be a simple bump allocator. You just have one pointer that points to the next "free" point in memory, and you increment it by the length you wish to allocate.

Code: Select all

void *next_pointer = (void)0x0012F11; // Or where ever you want to start from

void *malloc(size_t size)
{
    void *rval = next_pointer;

    if ((next_pointer + size) > far_heap_end)
        return NULL;

    next_pointer += size; // Set up for next allocation
    return rval;   
}
The disadvantage to this approach is that you can't really "free" memory after you've allocated it. I've used this method for when I want to do some quick simple OS development though and don't want to write out a full memory manager.


There are several different ways you can go about making a more complex allocator of course. Really depends on what you're looking to do though.

For the small memory area of the 816, I'd probably do it one of two ways:

1) Use a simple bitmap, and divide the memory into chunks. (E.g. a single page, 4KiB on the Intel platform)
This would be about 512 bytes of memory to track this.

Each bit indicates if the that chunk is free (0) or allocated (1)

Nice thing is you can skip over several chunks at once just checking for MAX_INT. You never allocate less than one chunk.

When you release memory, you flip a bit back to 0 in your bitmap.

This is pretty fast, and easy to implement, but can leave lots of large gaps in your memory if the structures of your allocations are frequently small.

FreeBSD, and Linux use something kinda sorta like this with their buddy allocators. They then pair this with a slab allocator that runs in the process space to make it more efficient and fast.


Another approach might be to use a linked list of free blocks. Each block has a pointer to the next free block, and a size indicating how big it is.

You can then take a block and slice it down to the size you need and adjust the pointers as needed.

A bit more complicated to implement, but not horrible. Might leave your memory really fragmented though; which for the 816 is probably not as much of an issue seeing that you're not trying to keep cache lines filled.
User avatar
drogon
Posts: 1671
Joined: 14 Feb 2018
Location: Scotland
Contact:

Re: WDC816CC and farmalloc

Post by drogon »

Yuri wrote:
Strange, does WDC not provide the source code to their libraries so you can debug this sort of thing?
Do they still make money selling it? If so then that's maybe why want to keep the recipes secret... Saying that, I'd suggest it's more than time to open source it all. The code improvements may well be worth it. However, I also suspect they may have outsourced the work elsewhere, so maybe there are other issues there. (Licensing, etc.)
Quote:
Rolling your own malloc() for a single process running on the 816, wouldn't be terribly difficult.
Memory allocators are a very well (and often) "solved" problem. Sometimes though it's a case of pick the least worst performing suitable for your needs...

Part of the issue with the '816 is the 64K banks of RAM, so catering for a heap that's > 64KB and an allocator to deal with chunks > 64KB is the challenge - but if the C compiler correctly gives you far/wide pointers and generates the code to manage it, it ought to be fairly straightforward.

When I was looking for an allocator, I chose the classic/standard "first fit" algorithm and initially wrote one in Sweet16 for my 65C02 system, but when I went to the '816 and wanted BCPL, I re-wrote it in BCPL.

It's a linked list and I keep a pointer to the end of the last chunk allocated, so allocating the next chunk should be fast - that fails when you run out of heap, then you re-start the scan from the start, looking for free spaces big enough and then split them into 2 chunks - one the size you want, the other marked as free. You keep doing this until you don't have a free chunk big enough, to split then you do a merge, trying to merge adjacent free chunks into bigger chunks. Lather rinse, repeat.

So in-theory, fragmentation can cause you to run out of memory, even when you have lots of free chunks. Other than deliberate stress testing, I've never had that happen, even when running my system for weeks at a time doing edits, compiles, tests and so on. I have 512KB of RAM in my '816 system.

Another thing I do is implement "guard" words in the allocated chunks - these are checked when I free a chunk to make sure they've not been overwritten. This has caught few program bugs and were worth the effort.

All this in BCPL which compiles to a bytecode which is interpreted at an effective instruction rate of somewhere between 100 to 300K instructions/sec. on my 16Mhz '816 but performance is more than adequate.

-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: WDC816CC and farmalloc

Post by BigDumbDinosaur »

drogon wrote:
Yuri wrote:
Strange, does WDC not provide the source code to their libraries so you can debug this sort of thing?
Do they still make money selling it?

I seem to recall the software is free for the downloading.

Quote:
Part of the issue with the '816 is the 64K banks of RAM, so catering for a heap that's > 64KB and an allocator to deal with chunks > 64KB is the challenge - but if the C compiler correctly gives you far/wide pointers and generates the code to manage it, it ought to be fairly straightforward.

Banks shouldn’t be an issue with data fetches and stores, as RAM can be treated as linear space by use of 24-bit pointers.¹  The real issue is in protecting one process’ heap from encroachment by another process.  That would, I’d think, be outside the purview of a memory allocator.

————————————————————
¹In a practical application, use of 32-bit pointers will result in faster-acting code, since register size changes during pointer manipulation can be largely avoided.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
User avatar
drogon
Posts: 1671
Joined: 14 Feb 2018
Location: Scotland
Contact:

Re: WDC816CC and farmalloc

Post by drogon »

BigDumbDinosaur wrote:
drogon wrote:
Yuri wrote:
Strange, does WDC not provide the source code to their libraries so you can debug this sort of thing?
Do they still make money selling it?
I seem to recall the software is free for the downloading.
Quote:
Part of the issue with the '816 is the 64K banks of RAM, so catering for a heap that's > 64KB and an allocator to deal with chunks > 64KB is the challenge - but if the C compiler correctly gives you far/wide pointers and generates the code to manage it, it ought to be fairly straightforward.
Banks shouldn’t be an issue with data fetches and stores, as RAM can be treated as linear space by use of 24-bit pointers.¹  The real issue is in protecting one process’ heap from encroachment by another process.  That would, I’d think, be outside the purview of a memory allocator.
Lets count the number of multi-tasking OSs for the '816 that are workable right now... I count 1. Mine. Guard words at the start and end of each chunk allocation do work well.

But even in systems like Unix, a process can spawn threads that see the same memory, so the problem doesn't do away there.

This is from the header of my BCPL getvec:

Code: Select all

 *      We have a heap which we split into chunks.
 *      Chunk format:
 *
 *            31 30-23 22     0
 *      +-------+-----+--------+
 *      | InUse | PID | Length |
 *      +-------+-----+--------+
 *      | Guard:      FEEDBEEF |
 *      +----------------------+
 *      | .....................|
 *      |      Data space      |
 *      | .....................|
 *      +----------------------+
 *      | Guard:      BEADCAFE |
 *      +----------------------+
 *
 * Each chunk has 3 words extra and the 23-bit length parameter includes this.
 *      InUse is a flag : top-bit -  with zero for free, top-bit set for in-use
 *      with the other 7 bits being used for the Imp PID.
 *
 *      The guard words are there to help detect heap/chunk corruption at free
 *      or merge times.
What you get back from the call is a pointer to the word after the first guard word.
Quote:
————————————————————
¹In a practical application, use of 32-bit pointers will result in faster-acting code, since register size changes during pointer manipulation can be largely avoided.
Which is exactly what my BCPL system does - everything is a 32-bit word. The trade-off is speed, but the advantages of having just one data type oughtweigh that for me.

-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
User avatar
Yuri
Posts: 371
Joined: 28 Feb 2023
Location: Texas

Re: WDC816CC and farmalloc

Post by Yuri »

drogon wrote:
Yuri wrote:
Strange, does WDC not provide the source code to their libraries so you can debug this sort of thing?
Do they still make money selling it? If so then that's maybe why want to keep the recipes secret... Saying that, I'd suggest it's more than time to open source it all. The code improvements may well be worth it. However, I also suspect they may have outsourced the work elsewhere, so maybe there are other issues there. (Licensing, etc.)
Quote:
Rolling your own malloc() for a single process running on the 816, wouldn't be terribly difficult.
Many professional software tool sets provide the source code so you can debug issues that come up. Microsoft has provided the code for their standard C library for quite some time, as did Borland, Watcom and many others back in the day as I recall.
Post Reply