Joined: Thu May 28, 2009 9:46 pm Posts: 8485 Location: Midwestern USA
|
I've been very busy on other (revenue producing) things, but this filesystem thingie is still stewing somewhere in my cerebrum. I've actually had a little time to work out some algorithms for low-level functions that would be needed in a practical filesystem implementation. One of the more difficult functions is that of translating a filename into an inode number so the file can be accessed. Such a task is complicated by the fact that in a hierarchical filesystem the filename may be just a simple filename, a relative pathname or an absolute pathname:
Code: simple filename → inventory.idx relative pathname → data/inventory.idx absolute pathname → /bcs01/data/inventory.idx UNIX and Linux documentation that describes the kernel APIs that access the filesystem (e.g., the open() system call) refers to a filename as a pathname, even though it may not include a directory reference. I tend to use the word filename when talking about a file and pathname when talking about the sequence of directories that must be traversed to get to the filename.
In my hybrid S51K filesystem that I am developing, a filename is a character string that is cataloged into a directory (itself a file that is managed by the kernel), along with an inode number. A directory could have hundreds or thousands of filename entries, each with a corresponding non-zero inode number—an inode number of zero indicates a deleted file. The inode number is used to fetch the file's inode from the disk so the file can be accessed. In reality, inodes are the files, as they contain all of the administrative data needed to locate the file's data blocks on the disk, as well as other information indicating who owns the file, read-write permissions, time of last access, etc.
In addition to the filename entries that have been inserted by creating new files, a directory will have two special entries: . (a period, pronounced "dot") and .. (two consecutive periods, pronounced "dot-dot"). The . entry refers to the directory itself and the .. entry refers to the directory's parent. For example, if . refers to /bcs01/data then .. will refer to /bcs01. Within /bcs01, .. will refer to /, which is the root directory. As with other active filenames in the directory, associated with . and .. are non-zero inode numbers. The presence of these special filenames allows the kernel to find its location in the filesystem without actually knowing directory names.
Before continuing, and for the benefit of anyone reading this who is not familiar with the UNIX operating system (or Linux—the two are 95 percent functionally identical), I should mention something called the u-area (as in user area). The u-area is a contiguous section of RAM in the executing process' workspace that the kernel manages. The u-area will contain, among other things, a table listing the files that the process has opened, and will also contain two inode numbers, one corresponding to the process' notion of where the root of the filesystem begins (it doesn't have to be the actual root, as it can be changed with the chroot command), and the other being the inode number of the current (working) directory. The shell command pwd (print working directory) takes the working directory inode number and convert it into a human-readable absolute path starting at the root directory, producing, for example, /bcs01/data. Storing the root and working directory information as inode numbers keeps RAM consumption to a minimum—only four bytes are needed for this information, plus two additional byte to indicate to which filesystem each inode belongs. The alternative, storing this information as actual directory names, would consume a lot of static space, as the directory fields would have to be large enough to store the longest possible pathname (typically 512 bytes).
Getting back to the problem of translating a filename to an inode number, the above filename examples indicate that there are three possible scenarios to consider. The first one is the simplest: it refers to a file in the current working directory. If the working directory is /bcs01/data then it is inferred that reference to the simple filename inventory.idx really means /bcs01/data/inventory.idx. Let's suppose I want to open that file. I have a macro that expands to the open file assembly language mumbo-jumbo required by the kernel—the details of what the macro expands into are of no concern at this time:
Code: open "inventory.idx",o_read When the kernel processes an open request it calls upon the function namei to perform the task of converting any pathname into a filesystem number and corresponding inode. If namei succeeds, the matching inode will be loaded into workspace and locked pending further processing. A flowchart that summarizes what namei does to search for a filename is presented later on.
The core of namei's search is a parsing and comparison process, in which the desired filename character string is compared to each entry in the directory. As the above example is a simple filename, no virgules (/) are present and namei knows that the current working directory is to be searched, and that the filename delimiter will be a null ($00). As read into workspace, the desired directory entry would appear thusly in a machine language monitor memory dump:
Code: 16 3F 69 6E 76 65 6E 74 6F 72 79 2E 69 64 78 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00:inventory.idx.................. In my hybrid S51K filesystem, the maximum filename length is atomically defined as 30 bytes and an inode is atomically defined to be an unsigned 16 bit integer. Therefore, each "slot" in a directory is exactly 32 bytes in length, and as the filesystem uses 1KB logical disk blocks, exactly 32 directory slots can fit into a single block. Hence the above workspace is 33 bytes in length so that a full length filename can be terminated with a null for easier searching.
In the above example, 16 3F would be the inode number in little-endian format ($3F16), and 69 6E 76 65 6E 74 6F 72 79 2E 69 64 78 would be the filename (inventory.idx)—the kernel pads short filenames with nulls to achieve consistent lengths, 17 nulls in this example.
Since the kernel knows that it can confine its search to the working directory, the search conceptually devolves to a two step affair:
- A byte-for-byte filename comparison is made to each directory entry with a non-zero inode number. A match is deemed to have occurred if all non-null characters in the filename field of the directory entry match the desired filename.
- If a match didn't occur and more filenames remain in the directory repeat step 1 for the next filename. Otherwise, return a "file not found" error.
The search ends when the filename has been located or the full extent of the directory has been read. As noted, the search process skips directory entries whose inode number is zero. Searching is not as disk-intensive as it may seem, as the kernel caches disk blocks into buffers and then works on the buffered block image.
More complicated is the use of a partial or full (absolute) pathname, e.g., /bcs01/data/inventory.idx. If the first character in a pathname is / then the kernel starts at the root of the filesystem hierarchy, as defined by the root directory inode number in the process' u-area, which may or may not be the system's actual root directory. Otherwise, the search commences at the current location in the filesystem hierarchy, as defined by the current directory inode number in the process' u-area. The process becomes an iterative one, as depicted by the following flow chart, which shows a search commencing at the root directory:
Attachment:
File comment: namei algorithm
flowchart_filename_search.gif [ 127.8 KiB | Viewed 1521 times ]
The above algorithm, with a minor change, will work for all three search cases. If the search involves a partial pathname or a simple filename, the current working directory is initially set as the next pathname component in the first step of the algorithm. The rest of the search proceeds as shown.
During namei's search, intermediate pathname components (e.g., bcs01/ and data/) are expected to be directories (the trailing / indicates that this is the case), so not only must a name character string match occur, the matched entry's file type must be a directory, which is determined by loading its inode and examining the file type bit field. If the entry is not a directory then the search immediately terminates and a "file not found" error is returned to the caller. Otherwise, the inode becomes the next pathname component and the search reiterates.
It is clear from studying the above that namei has to scan each pathname component to see if it is terminated by /, as that is how namei "knows" that the path to the desired filename traverses yet another directory.* Furthermore, if namei determines that it has reached the end of the pathname string (determined when the terminating null is encountered) it must ignore the file type of that last pathname component, as that component could be a file or yet another directory (it will be up to the higher level function that instigated the search to decide if the file type is appropriate). Not shown are the steps that must be executed to actually read the directory contents from the disk—various kernel primitives take care of that, or the functions that parse the pathname into individual components, e.g., bcs00/, data/ and inventory.idx.
Aside from the mechanics of searching are those of security. My namei algorithm doesn't show the procedures where the kernel determines if the user has access rights to any component in the pathname. So part of the loop would be checks to examine the ownership and permissions of each pathname component and compare them to the user's. These checks are actually fairly simple, as every user and user group is identified by a unique word-sized (16 bit) number, and permissions are defined in a word-sized bit field. If the user doesn't have permission then the search will abort and return a "permission denied" error.
*Obviously, / can never be allowed in a file or directory name. If I attempt to create a file with the name data/customers.idx, the kernel will create the file customers.idx in the data/ subdirectory, assuming data/ exists—a "can't create" error would occur if data/ doesn't already exist.
_________________ x86? We ain't got no x86. We don't NEED no stinking x86!
Last edited by BigDumbDinosaur on Tue Mar 15, 2022 2:10 am, edited 1 time in total.
|
|