Allocating a new cluster
(with apologies for the delay; I've been rebuilding a NAS that let the magic smoke out of its controller. I now know a lot more about NASes than I did a couple of weeks ago!)
Allocating a new cluster is one of those fundamental file operations that ought to be simple, easy, and straightforward. Guess what... it's not.
Let's recap the FAT32 file system.
- The FAT table consists of uint32_t pointers, so four bytes each, in sequential order in the FAT sectors.
- There is one FAT entry per cluster, whether it's allocated or not.
- There is a second copy of the FAT, immediately following the first (usually, and in my disks this is so; the parameters can tell you more about this and really you should check them)
- The position of the entry in the fat tells which cluster it refers to; the value of the entry is either a pointer to the next cluster in the chain, a magic number to say it's the end of the chain, or a zero to indicate that the cluster is not allocated.
- Every file except the root directory has a directory entry, either in the root directory or in a child of that directory. That directory includes a pointer to the first cluster allocated to it (but no others).
- When a file is created, no cluster is allocated to it and its allocated cluster number is also zero.
- When a file is deleted, I deallocate its clusters at the same time - so the process of creating and deleting files leads to a FAT with holes in it; new clusters are initially allocated sequentially in the table while deleted files occur behind that (at lower entry positions).
- The first sector in the partition - the VIB (Volume Information Block) contains most of the data you need to locate the various structures within the partition including the position of the FSI sector (File System Info) which holds two critical variables: FSI_Next_Cluster and FSI_Free_Count which must be updated after every filesystem write or erasure. The FSI is usually the sector following the VIB (it is on my disks).
(To avoid too many reads and writes to the disk, keep these two variables in memory).
- FSI_Next_Cluster appears to hold the last cluster allocated, not the next free space. It's where MS say we should start looking for new space. Actually telling us where it was would be too easy, right?
Bear in mind a couple of points: I am trying to do all the file manipulation (other than file open, read, and write in a program) through a single sector buffer;
transient. I am trying to keep the number of 4-byte variables to a minimum. I am also trying to avoid repeated writes to the disk. As a final complication, the 65c02 isn't really suited to the amount of 32-bit arithmetic involved. It's not that it
can't, but it's not small and tidy. One irritating thing is that the perfectly reasonable instructions
inc (ptr),y and
dec (ptr),y don't exist, which complicates adding
one.
Better to allocate than never...
This is the procedure you must do to allocate a new cluster. Note that it doesn't matter why you're allocating it, the procedure is always the same. But in some cases, you may already have something in
transient, so keep that in mind.
- Start by reading the value of FSI_Next_Cluster. This is the cluster number where we start looking.
- From that cluster number, obtain the FAT record containing its record; the FAT sector is FSI_Next_Cluster >> 7 and the record number is FSI_Next_Cluster & 0x7f
- Examine that 32-bit record value. If it's zero, we're ready to update things. If not...
- Increment the record counter to the next record, and go back to [3]. If incrementing the record counter takes us beyond $128, then we reached the end of the sector and we need to get the next sector into transient and try again.
- If we get to the end of the FAT without finding a record with value zero, we need to remember that there may be space further back in the FAT, before our starting point. We want to start with the oldest of those, the earliest, so we need to go back to the beginning of the FAT and start looking again. But we must be sure that we only do that once, or we're stuck in a loop...
- If by this time, we have not found a zero record, then the disk is full and we cannot allocate more space. Return an error code.
- Otherwise, we have found somewhere to allocate. There's an amount of housekeeping necessary to do before we can return the cluster number (which we calculate from the current FAT sector and sector record values).
- Change the value in the FAT record to indicate an end-of-chain marker
- Write the FAT record back to the disk
- Write the FAT record back to the second FAT
- Decrement FSI_Free_Count in memory
- Set FSI_Next_Cluster to the new cluster number
- Write the FSI back to the disk
Point 5 is interesting. Initially, I thought that I would have to compare the FAT sector LBA with the start of the second FAT sector (which I don't have precalculated) and also use some sort of flag to indicate that I was on the second search for earlier deallocated space. But it occurs to me that I _do_ have a precalculated value for the start of the clusters:
cluster_begin_lba and that the second FAT is an accurate copy of the first (we haven't updated anything at this point). The clusters, of course, follow directly after the second FAT.
That means that it is only necessary to start at
FSI_Next_Cluster in the first fat, and search for zero records until either we find one, or we get to
cluster_begin_lba; a much simpler process. So the sixth or seventh rewrite is on its way...
When do we need this?
Two main occasions: when we make a child directory, we need to allocate a cluster to hold that directory; and when we write to an existing file - either one which has just been created, and so has no allocated cluster, or one where we're writing past the end of an existing cluster.
We do need to bear in mind that this allocation routing comprehensively demolishes anything that might be in
transient. If we're building a file through a file pointer, we will have a dedicated sector buffer for that file, so that might not be a great issue. However, when we are creating a new directory, we will have to manage - in
transient - the sector from the parent's directory containing the reference we will have just inserted to the new directory and the sector from the new directory containing links to the new . and .. directories.
Buckle up. It could get exciting.
Neil