Allocating a New Cluster
Allocating new clusters for data storage is a critical part of the file system, and it's not as simple as it looks. Break this, and you've irrecoverably borked your file system data.
The FAT is the only thing which knows where there is any space on the system, and ideally, you navigate that using a single system variable:
FSI_Nxt_Cluster. That tells you which cluster record to start searching; it seems to hold the address of the last allocated cluster in spite of its name. Another variable,
FSI_Free_Count, is intended to track how many clusters remain unallocated. However, both of those variables should be checked to ensure that they are smaller than the total number of clusters, and if not, assume them to be damaged and must be recalculated.
Those variables live in the FSI sector, which usually follows the VIB, so it is necessary to read that sector to access them (though my software maintains a local copy of
FSI_Nxt_Cluster, read during initialisation, and it must be written back to the FSI after every change).
Strategies
When the disc is newly formatted, the FAT entries (apart from entries 0 and 1) are zero. A zero value means 'this sector is not allocated' (the
position of the FAT entry defines the cluster it's managing; the
value of the entry manages the cluster chain).
So that leads to the highest level, and fastest, allocation strategy:
starting with FSI_Nxt_Cluster , step through the subsequent entries until you find one that is zero. If you get to the end of the FAT sectors, you're out of space, so quit with an error. On a lightly used disk, that first free entry will likely be the next entry in sequence. That's a very fast test, though of course you have to read and write a couple of sectors with the updated data.
Using that strategy, you will use all the available space on the disk - provided that no file is ever deleted. As mentioned earlier, deleting a file does not free its allocated space on the disk; it merely marks the directory entry as 'deleted'. So you can end up deleting a 100MB file and still having no free space.
An external mechanism to recover that space is to copy all the files using another OS and machine. Once you have copied the files, format the disk, and then copy them all back. That will result in a FAT with unallocated entries (zeros) which can now be allocated; the deleted files won't have been transferred back and forth. This is obviously not something one can do in real time. Whether it's worth it or not is a question which only you can answer; if you are creating and deleting a lot of files, the answer is possibly no.
A more nuanced strategy, then, is
to free and then reuse the clusters associated with a deleted file. To maintain the ability of undelete utilities to work, one would wish not to do that until as late as possible; ideally, not until they're actually required. In this strategy, we will use the entire disk, including the space released by deleted files.
But... there's always a but.
There is a huge question of how to find deleted files. A deleted file has all its original metadata with the exception of the first character of its name. We can't just zero out that entry if we reallocate its clusters; any subsequent use of the directory would be broken (later files would be invisible and potentially overwritten). However, we do need to mark it in some way to indicate that we've already re-used its clusters; otherwise, we might end up using them twice and that would not be good. However, recall that a newly created file has a file size of zero, and a cluster pointer pointing to zero. So we can modify the directory entry of a deleted file, once we reallocate, so that it reverts to looking like a file which has been created and deleted before any use. The directory entry remains FAT32 legal and we can identify that it's no longer got allocated clusters.
One other thing to remember is that we don't want to keep writing to the same sectors, if we can afford it. Solid state storage has a limited number of writes to a given section before it stops working, so we don't want to keep reusing the same sectors. CF cards have a greater or lesser amount of built-in wear leveling, mostly depending on price, but even so it would be bad practice to keep using the same sectors over and over.
A strategy to re-use 'deleted' clusters
- Recursively search the root directory and its children for the first deleted file with non-zero filesize and cluster address. If we get to the end of the FAT, we have no free space; quite with an error.
- Set the file length for the directory entry to zero.
- Note the cluster start address. Locate that cluster entry in the FAT.
- Follow the cluster chain in the FAT, setting each entry to zero (unallocated).
- Mark the directory entry cluster link to zero.
With this done, one can use the following allocation strategy:
starting with FSI_Nxt_Cluster , step through the subsequent entries until you find one that is zero. If you get to the end of the FAT sectors, perform a search for a deleted file that can be re-used as above. If you have been able to reallocate a cluster, reset FSI_Nxt_Cluster to the start of the FAT and search as previously.
Note that these strategies have not dealt with tracking available clusters in FSI_Free_Count ; this must also be done when clusters are deallocated and reallocated.
A Simpler Approach?
The strategies discovered above both have advantages and disadvantages. The first, simple approach is very fast, but does not allow all the disk to be used. The more complex second approach is much more complex and so error prone, but does allow the full use of the disk. It will also be susceptible to slower and slower searches as the disk fills, when it may be necessary eventually to search the entire FAT.
A third approach sacrifices the ability to use undelete utilities. Instead of simply marking deleted files as deleted, we reallocate the memory at delete time (marking the directory entry with zero for file size and cluster pointers). We don't change
FSI_Nxt_Cluster so we keep writing new files further along the FAT, but we do update
FSI_Free_Count.
What we're doing with this hybrid approach is making holes in the FAT as we delete files, but we continue to add new
clusters sequentially. So when we get to the end of the FAT, we can start at the beginning and look for the holes that deletion will have left (if we've deleted anything. If not, time to buy a bigger disk!)
This approach means that we know which clusters can be reallocated at delete time, so we don't need to do a deep search to find them at an inconvenient later time. The delete function takes a little longer, but allocating a new cluster is much faster.
Any thoughts? I think I like the third approach best, though for testing I might just start with the first. The second I think is how MS do it (and Linux) and it looks like a nightmare for an eight-bit machine with limited space for disk buffers.
Neil