I have written filing systems for various systems in the past - it's never been my main thing, but has been needed for stuff I've worked on - mostly experimental/prototype that never saw the light of day, and the one on my Ruby boards which is somewhat full-featured but written in C... however...
So what are the options? At one end, something simple enough that a cassette tape would do - one file per SD card (wasteful of resources) - and at the other, a full-blown classic Unix-style filesystem with superblocks, inodes and whatnot (huge software complexity, especially in ASM). What does that leave in between?
So how about this in a thinking out loud sort of way...
In a typical operating system there is a memory allocator. C users will be familiar with the malloc() function for example. My BCPL system has something similar called getvec(). (And I wrote one in Sweet16 a while back too for another project that didn't get too far)...
Behind the system call is the actual allocator - ranging from utterly trivial to hugely complex. The more RAM and the more CPU cycles you have at your disposal, the more complex an algorithm you can use to maximise available RAM, merge free blocks more efficiently and so on. There is no relocation or compaction: once you allocate a block of memory, that's where it stays. If you want it bigger you need to allocate a bigger block and copy the data over (see the realloc() library function in Unix).
The simplest (usable!) allocator is called "First Fit" and I'm thinking this would work well for SD cards (and other media like an SPI EEPROM where seek and access times are not a factor)
(It can also work on spinning rust, just might be slower)
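To make that concrete, here's a minimal first-fit sketch in C, using an array-of-structs stand-in for a real chunk list (the names here are made up for illustration, not taken from the BCPL code):

```c
#include <stdint.h>

/* Hypothetical in-memory model of the chunk list: each chunk records
 * whether it is in use and its whole length (header included). */
typedef struct {
    int      in_use;
    uint32_t length;     /* whole-chunk length in bytes */
} Chunk;

/* First fit: walk the list from the start and take the first free
 * chunk big enough.  (A real allocator would then split the chunk
 * if it is larger than needed.)  Returns the index, or -1. */
int first_fit(Chunk *heap, int nchunks, uint32_t want)
{
    for (int i = 0; i < nchunks; i++)
        if (!heap[i].in_use && heap[i].length >= want)
            return i;
    return -1;
}
```

The appeal for SD cards is exactly that single forward walk - no free-list sorting, no bitmap, just headers read in sequence.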
So how to use this on an SD card as a filing system?
Carrying on thinking out loud here...
You "format" the disc (and I'll use the term 'disc' here because...) as one big free chunk. There are markers at the start and end of each chunk as well as a pointer to the length. My BCPL implementation looks like:
Code:
*    31   30-23    22-0
* +-------+-----+--------+
* | InUse | PID | Length |
* +-------+-----+--------+
* | Guard: FEEDBEEF |
* +----------------------+
* | .....................|
* | Data space |
* | .....................|
* +----------------------+
* | Guard: BEADCAFE |
* +----------------------+
*
* Each chunk has 3 words extra and the 23-bit length parameter includes this.
* InUse is a flag : top-bit - with zero for free, top-bit set for in-use
* with the other 7 bits being reserved (for a PID in a multi-tasking
* system, if it ever sees the light of day)
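For what it's worth, that header word packs and unpacks like this in C (field positions taken straight from the diagram; the guard values are from the BCPL version, the function names are just for illustration):

```c
#include <stdint.h>

/* Bit 31 = InUse, bits 30..23 = PID, bits 22..0 = whole-chunk length. */
#define GUARD_HEAD 0xFEEDBEEFu
#define GUARD_TAIL 0xBEADCAFEu

#define HDR_INUSE      (1u << 31)
#define HDR_PID_SHIFT  23
#define HDR_PID_MASK   0xFFu
#define HDR_LEN_MASK   0x7FFFFFu   /* 23 bits */

static inline uint32_t hdr_make(int in_use, uint32_t pid, uint32_t len)
{
    return (in_use ? HDR_INUSE : 0)
         | ((pid & HDR_PID_MASK) << HDR_PID_SHIFT)
         | (len & HDR_LEN_MASK);
}

static inline int      hdr_in_use(uint32_t h) { return (h & HDR_INUSE) != 0; }
static inline uint32_t hdr_pid(uint32_t h)    { return (h >> HDR_PID_SHIFT) & HDR_PID_MASK; }
static inline uint32_t hdr_len(uint32_t h)    { return h & HDR_LEN_MASK; }
```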
Then, as part of the format procedure, you allocate a chunk big enough to hold the filename and pointer for each allocated file. So if you think you might want 100 files with 31-character names, you'll need 32 bytes per filename and 4 bytes per pointer; let's say 4 bytes for a timestamp (although the world has moved to a 64-bit time_t) and 4 bytes for 'attributes' (locked, text/binary, owner, etc.). That's 44 bytes per entry, or 4400 bytes in total.
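Summing those fields (32 + 4 + 4 + 4) gives 44 bytes per entry, so a 100-file directory needs 4400 bytes. As a hypothetical C struct (field names are my own, not from any real implementation):

```c
#include <stdint.h>

/* One directory entry as sized in the text.  With the usual 4-byte
 * alignment this packs to exactly 44 bytes with no padding. */
typedef struct {
    char     name[32];    /* 31 characters + NUL terminator */
    uint32_t chunk;       /* media offset of the file's chunk */
    uint32_t mtime;       /* 32-bit timestamp */
    uint32_t attrs;       /* locked, text/binary, owner, ... */
} DirEntry;
```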
If you ignore the "PID" field then you have 31-bits for the chunk size, that's 2GB, so that limits the size of the device (or partition) to 2GB, as well as the maximum file size to 2GB.
Personally, I feel that's more than enough for our little 8-bit systems (and even 16-bit 816's - my Ruby filing system can only handle 32MB per partition and I feel that's more than I'll ever need in the lifetime of this project)
The 'guard' words are there to protect, or at least warn of chunk overwrite, so a consistency check can quickly tell if it's corrupt or not...
In my BCPL OS, I perform a heap merge (i.e. concatenate adjacent free chunks) at the end of every program execution - some implementations do the merge at every free(). I keep a pointer to the end of the last chunk allocated to speed up searches, but if/when that doesn't find a free chunk, I reset the pointer to the start and traverse the list until it finds a free chunk large enough. Chunks that are larger than needed get split into smaller chunks.
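The merge pass is simple enough. Here's a sketch over the same kind of in-memory array model (illustrative only - the real thing would walk headers on the media):

```c
#include <stdint.h>

typedef struct {
    int      in_use;
    uint32_t length;   /* whole-chunk length, header included */
} Chunk;

/* Merge pass: concatenate each run of adjacent free chunks into one,
 * compacting the array in place.  Returns the new chunk count. */
int merge_free(Chunk *heap, int nchunks)
{
    int out = 0;
    for (int i = 0; i < nchunks; i++) {
        if (out > 0 && !heap[out - 1].in_use && !heap[i].in_use)
            heap[out - 1].length += heap[i].length;  /* absorb neighbour */
        else
            heap[out++] = heap[i];
    }
    return out;
}
```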
The upshot of that in a filing system is that you would need to pre-allocate the file size in advance if you wanted to write a file sequentially - otherwise you'd run off the end of the chunk. My guess is that almost all operations will be simple "load a file" or "store a file" to and from RAM; however, if you create a large enough file in the first place, then the usual read/write/seek/append operations would be possible within the length of that pre-allocated file.
Writing a "compact" type routine should be fairly straightforward, should the system become too fragmented, but I've found that I can run my BCPL OS for days/weeks compiling, running, testing code and never have issues with running out of RAM or not being able to find a free chunk of RAM big enough to use.
Is it prone to corruption? Yes. This does happen when I have an errant program that scribbles all over RAM - the guard words are checked at free() and merge() times. In a filing system? I feel overrunning the end of a chunk should never happen, as the underlying code should be checking the bytes written into the chunk against its actual size.
Other issues? It may not be possible to know how much data has actually been written - i.e. how big is the file? For a whole-file save operation (e.g. a BASIC program) this won't be an issue, as the chunk will be an exact size, but for a big file created for random access? Hard to know.
Then there is file data vs. the underlying block nature of the device - SD cards are block devices with a fixed block size of 512 bytes, so with this scheme you'll need some way to translate a 32-bit offset into a block number plus an offset within the block. That's not hard though, and on one level you could say it maximises usage, but on the other hand try buying an SD card under 8GB now...
In practical terms? There is a read/modify/write operation needed when a chunk straddles a block boundary - this is sub-optimal, so the easy thing is to round UP all file allocations so that they fit an exact number of media blocks (which for SD cards is 512 bytes). Then I'm thinking I'd add an extra word to the header to record the exact size - when known - of the saved data.
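With a power-of-two block size both jobs are a couple of instructions. A sketch, assuming byte offsets (word offsets would just be multiplied by 4 first):

```c
#include <stdint.h>

#define BLOCK_SIZE 512u   /* SD card sector size */

/* Translate a byte offset on the media into block number plus
 * offset within the block, and round sizes up to whole blocks. */
static inline uint32_t blk_num(uint32_t byte_off) { return byte_off / BLOCK_SIZE; }
static inline uint32_t blk_off(uint32_t byte_off) { return byte_off % BLOCK_SIZE; }

static inline uint32_t round_to_block(uint32_t nbytes)
{
    return (nbytes + BLOCK_SIZE - 1) & ~(BLOCK_SIZE - 1);
}
```

On a 6502/816, where BLOCK_SIZE is 512, the divide and modulus collapse to shifting and masking the high bytes, so this costs almost nothing.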
So you type in your little BASIC program and hit SAVE "work of art" and ... BASIC builds a block of data - pointer to a filename, start in RAM, end in RAM (or length) - and calls the operating system (this is how OSFILE works in Acorn's MOS and my Ruby OS). The OS searches for a free chunk big enough, rounded up to a whole number of media blocks; the header now carries an extra word with the real size as well as the whole chunk size. It writes the data, updates the directory and that's that. (And the search might just be a matter of returning the current 'end' pointer without doing the list traverse, so very fast. At boot time - and at file delete and merge time - you search the disc to find the last chunk, but that also ought to be relatively fast, then you know where you are.)
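As a toy end-to-end sketch of that save path - the in-memory "disc", the 'end' pointer and the function name are all illustrative stand-ins, and the header/directory writes are elided to comments:

```c
#include <stdint.h>
#include <string.h>

#define BLOCK 512u

static uint8_t  disc[8 * BLOCK];   /* stand-in for the SD card */
static uint32_t next_free = 0;     /* the 'end' pointer: next unused byte */

/* OSFILE-style whole-file save: claim a chunk at the end pointer,
 * rounded up to whole blocks, and copy the data in.  Returns the
 * media offset the file was stored at. */
uint32_t save_file(const uint8_t *ram, uint32_t len)
{
    uint32_t chunk = (len + BLOCK - 1) & ~(BLOCK - 1);  /* round up */
    uint32_t at    = next_free;

    /* Real code would write the header word (real size + chunk size),
     * the guard words, and a directory entry here, block by block. */
    memcpy(disc + at, ram, len);
    next_free += chunk;
    return at;
}
```

Deletes and merges would disturb the end pointer, which is why it gets recomputed at boot and after those operations.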
And so on.
This really ought to be easy to code up in 6502/816 ASM - I'd hope in under 4K plus the underlying block read/write code and off you go...
I might actually give it a go, but in BCPL as a bit of a proof of concept as I really do not think I'll be writing any more '816 code now that I can do everything in BCPL
Cheers,
-Gordon