One of the first challenges is what to call this filesystem. It will be S51K-like in many ways (inodes, directories are files like everything else, etc.) but not all ways. So I can't call it S51K. Also, I don't want to have "816" in the name, since a filesystem structure is (should be) portable to any system capable of supporting it. Hmm... Anyone got a good name for a filesystem that works on a 65C816-powered computer but could be run on something else?
Anyhow, I started out by listing the salient features of this filesystem, aside from the basic one of being able to keep data organized:
- Maximum filesystem size of 64 GB.
- Maximum file size of 4 GB.
- Maximum of 65,535 files per filesystem.
- Support for 30 character filenames.
- Support for both hard and symbolic links.
The S51K filesystem supports 24 bit block addresses for files, which means a file could theoretically be as large as 16.7 GB. In practice, that size is unattainable because the bytes-in-file field in the inode is a 32 bit field, limiting the real file size to 4 GB. Hence my file size limit.
The S51K filesystem gets its name from the fact that it was first introduced in the Bell Labs UNIX System V operating system and that it uses a 1 KB logical block structure. On the DEC PDP-11 hardware (where much UNIX development took place), a physical disk block is 512 bytes in size. However, on the DEC VAX hardware, a disk block is 1024 bytes in size. In order to make the filesystem portable between different machines, the decision was made to standardize on a 1024 byte block size. The low level disk driver in the UNIX kernel took care of this detail by being specific to the hardware on which it was being run—it was one of the few areas in the kernel that was written in assembly language. As it turned out, the 1 KB block size produced better efficiency in the filesystem kernel code as compared to the same code using 512 byte blocks.
Structurally, the S51K filesystem consists of a superblock (supervisor block, the first block in the filesystem), an inode array, a data block array, and allocation maps to control usage of the inode and data block arrays. Each file has an inode that describes the file's size, access times, data block locations, type and permissions. The matching directory entry consists of the inode number (a 16 bit quantity) paired with a filename of 1 to 14 characters. Hard links to files can be created, as long as they point to a file in the same filesystem. No support is present for symbolic links (symlinks) or links to directories.
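For the curious, the two on-disk structures just described look roughly like this in C. This is a sketch from memory of the System V layout, not copied from any actual UNIX source; the field names are illustrative:

```c
#include <stdint.h>

/* An S51K directory entry: a 16 bit inode number plus a 14 character
   name, giving a fixed 16 byte entry (and the 14 character name limit). */
struct s51k_dirent {
    uint16_t d_ino;       /* inode number; 0 marks a free slot          */
    char     d_name[14];  /* filename, NUL-padded, not NUL-terminated   */
};

/* The inode records everything about the file except its name. */
struct s51k_inode {
    uint16_t i_mode;        /* file type and permission bits            */
    uint16_t i_nlink;       /* hard link count                          */
    uint16_t i_uid, i_gid;  /* owner and group                          */
    uint32_t i_size;        /* bytes in file: 32 bits, hence the 4 GB cap */
    uint8_t  i_addr[40];    /* 13 block addresses, 3 bytes (24 bits)
                               each: 10 direct + 3 indirect, plus one
                               pad byte to round out the field          */
    uint32_t i_atime;       /* last access time                         */
    uint32_t i_mtime;       /* last modification time                   */
    uint32_t i_ctime;       /* last inode change time                   */
};                          /* packs into the classic 64 byte inode     */
```

Opening a directory and reading these fixed-size 16 byte entries is exactly how utilities such as ls walk a directory.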
Directories are like ordinary files in that they take up space in the data block array, are referred to by an inode and (in the case of subdirectories) an entry in a directory (the parent directory). Like any other file, a directory can be opened and read, which is how UNIX utilities such as ls and find are able to do what they do (the UNIX system function fstat is able to retrieve data from an inode when given an open file descriptor). Only the kernel is permitted to write to a directory.
The inode and data block allocation maps in the S51K filesystem are a set of linked lists. Linked lists made sense on the relatively slow DEC PDP-11 hardware where S51K first came to life. Also, disks of the time were very small by today's standards, so a lot was done to keep filesystem overhead from using too much precious space. However, linked lists become a bottleneck on large filesystems and often lead to severe fragmentation if frequent file creation and deletion occurs. These limitations were addressed by the S51K's descendant, the Acer Fast Filesystem (AFS), which replaced linked lists with bitmaps. The Extended Acer Fast Filesystem (EAFS) added support for much longer filenames via a directory kludge.
My filesystem will incorporate/borrow/steal internal features from both S51K and AFS, as well as have some features that I conjured. Here's a pictorial representation of my filesystem:
NEW FILESYSTEM LAYOUT Logical Block Offset (LBO)
+=================================+ ==================================
| | starting LBO depends on the number
| data block array | of data blocks in filesystem
+—————————————————————————————————+ $??????
| block allocation map |
+—————————————————————————————————+ $00040a
| block allocation map tag block |
+—————————————————————————————————+ $000409
| inode array |
+—————————————————————————————————+ $000009
| inode allocation map |
+—————————————————————————————————+ $000001
| superblock |
+=================================+ $000000

Internally, the following are features that I plan to incorporate:
- 1024 byte logical block size, achieved by pairing contiguous 512 byte disk blocks.
- Logical block offset (LBO) addressing. Everything in the filesystem will be addressed with a number that is a 32 bit offset relative to the superblock. This feature makes the filesystem portable to any hardware that has a kernel capable of accessing the filesystem. It also simplifies the arithmetic operations required to address blocks and compute filesystem geometry values.
- Inodes structured like the S51K model but with an expanded data block address table for faster access. The S51K inode had a table with 10 slots for direct block addresses and three slots for indirect addresses, with all slots being 24 bits wide. I am going to increase the direct block address slots to 16, which means that data in the first 16 KB of any file can usually be read or written with a single block access. The indirect address structure of S51K will be retained. Table slot width will be increased to 32 bits to be compatible with the 64 GB filesystem size limit. The S51K's 24 bit address table slot size limits the maximum block address within the filesystem to 16,777,215, which would prevent full usage of a larger filesystem.
- 16 bit inode numbers, which is what sets the 65,535 maximum files limit. Once I have built POC V2, which will have much more RAM than POC V1, I will look at implementing a 24 bit inode number, which would expand the limit to 16,777,215 files per filesystem. I'm sticking with 16 bits for now to keep the inode allocation map structure down to a smaller size. See next.
- Bitmaps to control inode and data block usage, similar to those used in AFS. The inode allocation bitmap (IAM) is a fixed size of eight logical disk blocks. The data block allocation map (BAM) will be variable in size, depending on the size of the filesystem. The maximum filesystem size limitation of 64 GB is set by the maximum permissible size of the BAM, which is 8192 logical blocks.
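To make the LBO idea concrete, here is a sketch of the address arithmetic in C. The function names and the `part_start` parameter are mine, purely illustrative:

```c
#include <stdint.h>

/* LBO arithmetic: everything in the filesystem is addressed by a 32 bit
   offset relative to the superblock, in 1 KB logical blocks, each of
   which is a pair of contiguous 512 byte physical blocks. */

#define LOG_BLOCK_SIZE  1024u
#define PHYS_BLOCK_SIZE  512u
#define PHYS_PER_LOG    (LOG_BLOCK_SIZE / PHYS_BLOCK_SIZE)   /* 2 */

/* Byte offset of a logical block from the start of the filesystem:
   just a shift left by 10. */
static inline uint64_t lbo_to_byte_offset(uint32_t lbo)
{
    return (uint64_t)lbo * LOG_BLOCK_SIZE;
}

/* First of the two 512 byte sectors holding logical block `lbo`.
   `part_start` is the partition's first physical sector, an assumed
   parameter that is not part of the original design notes. */
static inline uint64_t lbo_to_phys_sector(uint64_t part_start, uint32_t lbo)
{
    return part_start + (uint64_t)lbo * PHYS_PER_LOG;  /* shift left by 1 */
}
```

The point of the shifts is that the 65C816 never needs a divide or multiply to find anything on the disk.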
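And a sketch of the proposed inode's expanded block address table (the names are mine, and the inode's other fields are omitted):

```c
#include <stdint.h>

/* The proposed table: 16 direct slots (covering the first 16 KB of a
   file) plus the S51K-style single, double and triple indirect slots,
   all widened to 32 bits. */

#define N_DIRECT   16
#define N_INDIRECT 3

struct inode_addr_table {
    uint32_t direct[N_DIRECT];      /* LBOs of the first 16 data blocks */
    uint32_t indirect[N_INDIRECT];  /* single/double/triple indirect    */
};

/* With 1 KB blocks and 4 byte slots, an indirect block holds 256
   addresses, so the table reaches:
     16 direct blocks                   =  16 KB
     + 256 single indirect              = 256 KB more
     + 256 * 256 double indirect        =  64 MB more
     + 256 * 256 * 256 triple indirect  =  16 GB more
   which comfortably covers the 4 GB file size limit imposed by the
   32 bit bytes-in-file field. */
```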
A problem that develops with large filesystems is that the BAM also becomes very large and the time required to track down and allocate a free data block greatly increases as the filesystem fills up. In a 1 KB logical block filesystem, there is one BAM block for every 8192 data blocks, so the BAM block count goes up proportionally with the size of the filesystem's data area.
Consider a maximum-size filesystem of 64 GB. The stats on this filesystem would be:
Filesystem Blocks: 67,108,864
Usable Blocks: 67,104,758
Data Blocks: 67,096,567
BAM Blocks: 8,191

Usable blocks means the blocks available for the BAM and data block array. Note the number of BAM blocks. There are enough blocks to provide one bit for each data block in the filesystem—the last block is partially used, since 67,096,567 bits won't fit into an integral number of blocks.
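For anyone who wants to check the arithmetic, here's a small C sketch that reproduces the quoted figures. The fixed overhead assumed here (1 superblock, 8 IAM blocks, a 4096 block inode array and 1 tag block) is inferred from the quoted numbers rather than stated explicitly above:

```c
#include <stdint.h>

/* Geometry of a maximum-size (64 GB) filesystem with 1 KB blocks. */

static uint64_t total_blocks(void)
{
    return (64ull << 30) / 1024;            /* 67,108,864 logical blocks */
}

static uint64_t usable_blocks(void)
{
    /* Assumed fixed overhead: superblock + IAM + inode array + tag block. */
    return total_blocks() - (1 + 8 + 4096 + 1);
}

/* Smallest BAM that covers the leftover data blocks: each BAM block
   holds 8192 bits, and we need bam * 8192 >= usable - bam, which
   works out to bam = ceil(usable / 8193). */
static uint64_t bam_blocks(void)
{
    return (usable_blocks() + 8192) / 8193;
}

static uint64_t data_blocks(void)
{
    return usable_blocks() - bam_blocks();
}
```

The division by 8193 rather than 8192 is what makes the BAM come out at 8,191 blocks with the last block only partially used.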
Now, suppose 75 percent of the data blocks have been allocated to files and no files have ever been deleted. There will be no fragmentation and 75 percent of the BAM will have cleared bits—a cleared bit meaning the corresponding data block is in use. Now, if a file is opened and data is added to it, it is probable that a new data block will have to be allocated. This means the BAM would have to be scanned until a set bit was found. However, in our hypothetical case, the first set bit will be in relative BAM block 6142, which means 6141 full block scans would have to be performed in search of that set bit—a lot of processing. The compute-bound activity would go fast enough, especially since I could use 16 bit loads and stores to manipulate the bitmap. However, a lot of time would be consumed in reading in the BAM from disk—as much as 45 seconds if the disk hasn't cached any tracks and nothing is buffered in core. That sort of performance would make a Commodore 1541 floppy drive seem lightning fast by comparison.
This is a significant problem that I mulled for some time before devising a solution, which was to add an extra block to the BAM I call a "tag block," a term I borrowed from the tag RAM that was used with SRAM caches on 80386 and 80486 motherboards. The following pictorial represents a BAM with a tag block:
+———————————————————————————+
| |
| BAM Tag Block — 8192 Bits |
------------------------ | | ------------------------
/ / +———————————————————————————* \ \
/ / | \ \
/ / | \ \
/ / | \ \
+—————————————————————+ +—————————————————————+ +—————————————————————+ ... +—————————————————————+ +—————————————————————+
| | | | | | | | | |
| BAM Block $0000 for | | BAM Block $0001 for | | BAM Block $0002 for | | BAM Block $1FFE for | | BAM Block $1FFF for |
| Data Blocks | | Data Blocks | | Data Blocks | | Data Blocks | | Data Blocks |
| $00000000—$00001FFF | | $00002000—$00003FFF | | $00004000—$00005FFF | | $03FFC000—$03FFDFFF | | $03FFE000—$03FFFFFF |
| | | | | | | | | |
+—————————————————————+ +—————————————————————+ +—————————————————————+ ... +—————————————————————+ +—————————————————————+

Each bit in the tag block indicates the status of one BAM block, with bit 0 of byte $0000 in the tag block representing BAM block $0000 (address is an LBO). If the tag bit is set, then the corresponding BAM block has at least one set bit. When all bits in a BAM block have been cleared because all corresponding data blocks have been allocated, the tag bit is likewise cleared. As there are 8192 bits in the tag block, a maximum of 8192 BAM blocks can be controlled, representing $04000000 (67,108,864) data blocks. The result is a filesystem size of slightly more than 64 GB, once overhead is taken into account.
With the use of the tag block, it isn't necessary to perform a time-consuming sequential scan of the entire BAM looking for a free data block. Instead, the process is reduced to first scanning the tag block to find the first BAM block that points to at least one free data block. Once that BAM block has been determined, it will be scanned to locate the free block. Bitmap scanning is a quick process, as no bit-twiddling is required until a non-zero word is encountered in the map. Shifting and masking is then used to pinpoint the first set bit. With the 65C816, 16 bits can be handled at a time, which means the worst-case situation (the last word in the block is the one with the set bit) amounts to 512 scan iterations, with only the last one requiring shifting and masking. As a full BAM scan is avoided, no more than two disk reads are required to get the relevant BAM block, followed by one write to update the BAM and another to update the tag block if it has changed. If the relevant blocks are already in core, then disk activity is reduced to one or two writes only. In a true multitasking environment, the write operations would be handled by a "flush dirty buffers" routine that would run at periodic intervals to keep the filesystem consistent.
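The two-level search can be sketched in C as follows. This is a minimal in-core model, not the eventual 65C816 implementation: `uint16_t` words stand in for the 16 bit loads, a set bit means "free", and `read_bam` represents a hypothetical buffer-cache read. All names and the full-filesystem sentinel are mine:

```c
#include <stdint.h>

#define WORDS_PER_BLOCK (1024 / 2)   /* 512 sixteen-bit words per 1 KB block */
#define BITS_PER_BLOCK  8192u        /* data blocks controlled per BAM block */

/* Find the first set bit in a 1 KB bitmap block; -1 if all bits clear.
   No bit-twiddling happens until a non-zero word is found. */
static int first_set_bit(const uint16_t *map)
{
    for (int w = 0; w < WORDS_PER_BLOCK; w++) {
        if (map[w] != 0) {
            uint16_t v = map[w];
            int b = 0;
            while (!(v & 1)) {       /* shift and mask to pinpoint the bit */
                v >>= 1;
                b++;
            }
            return w * 16 + b;
        }
    }
    return -1;
}

/* Two-level allocation search: scan the tag block for a BAM block with a
   free bit, then scan only that BAM block.  Returns the data block
   number, or 0xFFFFFFFF if the filesystem is full. */
uint32_t find_free_data_block(const uint16_t *tag,
                              const uint16_t *(*read_bam)(int))
{
    int t = first_set_bit(tag);      /* which BAM block has a free bit?   */
    if (t < 0)
        return 0xFFFFFFFFu;          /* filesystem is full                */

    int b = first_set_bit(read_bam(t));
    if (b < 0)
        return 0xFFFFFFFFu;          /* tag and BAM inconsistent          */

    return (uint32_t)t * BITS_PER_BLOCK + (uint32_t)b;
}
```

Clearing the allocated bit, and clearing the tag bit when the BAM block goes to all zeros, would follow the same word-at-a-time pattern.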
There are some other filesystem details I have to work out, but most of them are more about algorithms than actual design. One such algorithm that will demand some serious attention is the one that searches a directory for a filename and returns an inode number to the caller. That one should keep me thinking for a while.