This topic has languished for some time. However, Chad Burrow’s recent topic concerning methods of mass storage, along with some recent programming activities of mine, prompted me to revive this topic.
Some pages back, I described the architecture of the “816NIX” filesystem I want to implement on my POC units’ mass storage. Several things held me back. One was that machines prior to V1.3 had only bank $00 RAM, and realistically there was insufficient RAM to efficiently implement a kernel that included a filesystem driver, along with the necessary buffer space, whilst leaving adequate room for a user program. With V1.3 offering another 64KB, these concerns have been alleviated, as extended RAM can be used for user-space, leaving all of bank $00 for a kernel, stack, buffers, etc. This has spurred me to get back on this project.
Some recent code-pounding has focused on low-level functions that will be essential to building and managing a filesystem. One of them writes an initialized bitmap to the disk. Another clears areas of the disk to a known state, usually nulls. Past efforts used repetitive SCSI write transactions, along with some arithmetic to keep track of progress. In wall-clock time, it was a slow process. With no good means of speeding up SCSI read/write performance (no DMA controller), my enthusiasm was cooled by the prospect of processing that would be slower than a snake in a snowstorm.
Where things really bogged down was in initializing the filesystem’s inode structures, especially the inode array itself. A maximum of 16,777,216 inodes may be configured per filesystem (one inode per file or directory). Within the filesystem, disk space is allocated in 1KB “logical blocks,” which means the inode allocation bitmap (IAM) of a new filesystem would consist of 2048 contiguous blocks (8192 bits in a logical block) with all but one bit set to 1. Assuming an average write speed of 750KB/second,¹ it would take about three seconds to generate the IAM for a maxxed-out filesystem. Certainly tolerable, but...
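For anyone who wants to see the inverted-bitmap idea in concrete terms, here is a rough Python sketch (not the 65C816 code that runs on POC). The bit ordering within a byte and the choice of inode 0 as the one in-use inode are assumptions made purely for illustration, as are the function names.

```python
BLOCK_SIZE = 1024                 # bytes per logical block (from the post)
BITS_PER_BLOCK = BLOCK_SIZE * 8   # 8192 bits per logical block


def new_bitmap_block(used_bits=()):
    """Return one bitmap block with every bit set (free) except used_bits."""
    block = bytearray(b"\xff" * BLOCK_SIZE)
    for bit in used_bits:
        # Assumed ordering: bit 0 is the least significant bit of byte 0.
        block[bit // 8] &= ~(1 << (bit % 8)) & 0xFF
    return bytes(block)


def is_free(block, bit):
    """True if the bit is still 1, i.e., the item is unallocated."""
    return bool(block[bit // 8] & (1 << (bit % 8)))


# First block of a fresh IAM: one inode in use (hypothetically inode 0).
first_block = new_bitmap_block(used_bits=[0])

# Every other IAM block is identical: all bits set, nothing allocated.
fill_block = new_bitmap_block()
```

Only the first block differs from the rest, so everything after it is one block pattern repeated over and over.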
Each inode has an on-disk size of 128 bytes, which means eight inodes can fit into one logical block. In a maxxed-out filesystem, 2,097,152 contiguous logical blocks must be assigned to the inode array. If an inode is not in use it must be initialized to nulls so a filesystem checker will know it’s not allocated to a file. Hence part of generating a new filesystem will involve writing 2,097,152 logical blocks full of nulls. Again assuming an average write speed of 750KB/second, it would take nearly 50 minutes to initialize the inode array. In actuality, it took over an hour during testing I did on POC V1.1, whose Ø2 clock runs at 12.5 MHz, vs. 16 MHz with V1.3. Whether 50 minutes or an hour, it was much too slow.
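The arithmetic behind those numbers can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted above; the function names are made up for the example.

```python
INODE_SIZE = 128          # bytes per on-disk inode
BLOCK_SIZE = 1024         # bytes per logical block
MAX_INODES = 16_777_216   # inode limit per filesystem


def inode_array_blocks(inodes):
    """Logical blocks needed to hold the given number of inodes."""
    per_block = BLOCK_SIZE // INODE_SIZE      # 8 inodes per block
    return -(-inodes // per_block)            # ceiling division


def naive_write_seconds(blocks, kb_per_second=750):
    """Time to write the blocks one after another at a sustained KB/s rate."""
    kb_per_block = BLOCK_SIZE // 1024         # 1KB per logical block
    return blocks * kb_per_block / kb_per_second


blocks = inode_array_blocks(MAX_INODES)       # 2,097,152 logical blocks
seconds = naive_write_seconds(blocks)         # about 2796 seconds
minutes = seconds / 60                        # about 46.6 minutes
```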
What changed things was some information in a Seagate SCSI technical paper that was apparently meant only for internal use, a copy having fortuitously come my way. In this paper, mention is made of an obscure SCSI command called “write same.” When issued this command, the disk will replicate a block a variable number of times, up to a maximum of 65,535 times, to be exact. The nature of the command suggests it was intended for production testing of new units. However, I have also determined that the SBC-3 SCSI standard marks this command as mandatory for direct-access devices, i.e., disks.
Anyhow, the procedure involved is like that of writing multiple blocks, which normally would be SCSI opcode $2A, paired with a 16-bit number of blocks, referred to as the transfer size (NBLK). When initiated, the drive will keep requesting data during the data-out bus phase and the host will keep putting bytes on the bus until the drive has received enough.
The SCSI opcode for “write same” is $41 and the NBLK parameter is the total number of identical blocks to be written. The drive requests only one block of data from the host during the data-out bus phase. After buffering that block in its own RAM, the drive writes it at the requested logical block address (LBA) and then replicates it NBLK - 1 more times, advancing the LBA by one block after each write. The host machine can wait for the drive to finish or can go on to something else.
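To show how similar the two commands look on the wire, here is a Python sketch that packs a 10-byte command descriptor block (CDB) for either opcode. It assumes the standard 10-byte layout (opcode, flags, 32-bit big-endian LBA, group number, 16-bit big-endian block count, control) with flags, group number and control all left at zero. The helper name build_cdb10 is hypothetical, and for simplicity the sketch rejects a block count of zero.

```python
import struct

WRITE_10 = 0x2A        # ordinary multi-block write
WRITE_SAME_10 = 0x41   # "write same"


def build_cdb10(opcode, lba, nblk):
    """Pack a 10-byte CDB: opcode, flags, LBA, group, block count, control."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("LBA must fit in 32 bits")
    if not 1 <= nblk <= 0xFFFF:
        raise ValueError("block count must be 1..65535 in this sketch")
    # ">" selects big-endian: B = 1 byte, I = 4 bytes, H = 2 bytes.
    return struct.pack(">BBIBHB", opcode, 0, lba, 0, nblk, 0)


# Same LBA and count, only the opcode differs between the two commands.
write_cdb = build_cdb10(WRITE_10, 0x1000, 65535)
same_cdb = build_cdb10(WRITE_SAME_10, 0x1000, 65535)
```

With WRITE_10 the host then has to supply all 65,535 blocks of data. With WRITE_SAME_10 it supplies one block and the drive does the rest.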
The block duplication is done entirely in the drive at whatever speed it can manage, and without host involvement. Some testing I did with a 15,000 RPM Seagate drive suggests an effective duplication rate around 97,000 (512 byte) blocks per second, far beyond the capabilities of POC V1.3 using repetitive writes, even with writes involving large buffers. I also tested with a slower 7200 RPM drive (an old Maxtor unit pulled out of a server years ago). The duplication rate was about 35,000 blocks per second, which is still well beyond what V1.3 could do writing in 32 KB chunks.
With this new-found knowledge, I now had a way to generate a filesystem in a reasonable amount of time. Testing with the 2,097,152 block count required to initialize the inode array of a maxxed-out filesystem showed that this step in filesystem generation would require only 43 seconds to complete. It goes without saying that is a huge improvement over 3000 seconds (50 minutes). Incidentally, writing the IAM using the “write same” technique took about 0.045 seconds to complete.
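Since one “write same” command tops out at 65,535 blocks, a region as large as the inode array has to be covered with a series of commands. A Python sketch of that bookkeeping follows. It assumes the drive uses 512-byte blocks (so each 1KB logical block is two drive blocks) and a starting LBA of zero, both chosen only for the example; the generator name is hypothetical.

```python
MAX_NBLK = 65_535          # largest block count one command can carry
DRIVE_BLOCKS_PER_LOGICAL = 2   # assumed: 512-byte drive blocks, 1KB logical


def write_same_plan(start_lba, total_blocks, max_nblk=MAX_NBLK):
    """Yield (lba, nblk) pairs that together cover total_blocks drive blocks."""
    lba = start_lba
    remaining = total_blocks
    while remaining > 0:
        nblk = min(remaining, max_nblk)
        yield lba, nblk
        lba += nblk
        remaining -= nblk


inode_array_logical = 2_097_152
drive_blocks = inode_array_logical * DRIVE_BLOCKS_PER_LOGICAL   # 4,194,304
plan = list(write_same_plan(0, drive_blocks))                   # 65 commands

# At the roughly 97,000 blocks/second seen on the 15,000 RPM drive:
estimated_seconds = drive_blocks / 97_000                       # about 43
```

Sixty-four full-sized commands plus one short one cover the whole inode array, and the estimate lines up with the 43 seconds seen in testing.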
Another relatively large structure that has to be initialized during filesystem generation is the data block allocation map (BAM). As with the IAM, the BAM is an inverted bitmap, with each bit assigned to a data block. The 816NIX filesystem can have a maximum data area size of 64GB, which given the 1KB logical block size, means the data area will have 67,108,864 data blocks in a maxxed-out filesystem. Fortunately, the data blocks don’t have to be initialized to any particular state, so the approximately 23 minutes that would be required to do so (32 times as long as the inode array took) can be avoided.² However, the BAM has to be initialized, which in a maxxed-out filesystem, consists of 8192 logical blocks. That is four times the size of the IAM, so using “write same,” I’d expect that generating the BAM would take around 0.18 seconds to complete, which would be an almost-imperceptible period of time. I have not empirically determined that, but given what I have determined with other testing, expect it to be a realistic number.
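The sizes of the two allocation maps fall out of the same calculation, which can be sketched in Python as follows. The constant and function names are mine, invented for the example.

```python
BLOCK_SIZE = 1024                    # bytes per logical block
BITS_PER_BLOCK = BLOCK_SIZE * 8      # one bit per tracked item
MAX_INODES = 16_777_216              # inode limit per filesystem
MAX_DATA_AREA = 64 * 1024 ** 3       # 64GB data area limit, in bytes


def bitmap_blocks(items):
    """Logical blocks needed for a bitmap with one bit per item."""
    return -(-items // BITS_PER_BLOCK)   # ceiling division


data_blocks = MAX_DATA_AREA // BLOCK_SIZE    # 67,108,864 data blocks
iam_blocks = bitmap_blocks(MAX_INODES)       # 2048 logical blocks
bam_blocks = bitmap_blocks(data_blocks)      # 8192 logical blocks
```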
All-in-all and using the above numbers, I’d expect it would take under a minute to generate a maxxed-out, 64GB filesystem, with a root directory containing . and .. entries. Definitely tolerable!
————————————————————
¹POC V1.3 can theoretically do 800KB/second, but there is bus protocol management overhead that is unavoidable.
²I might make clearing the data blocks to nulls an option in the filesystem generator, although I would think it would be a feature with limited value.
_________________ x86? We ain't got no x86. We don't NEED no stinking x86!
Last edited by BigDumbDinosaur on Wed Dec 06, 2023 8:38 am, edited 3 times in total.