I haven't posted in this topic for a while, as I didn't have much about which to bloviate. Also, I decided to redo some structural elements of my filesystem design after some previously posted commentary.
BigDumbDinosaur wrote:
Rob Finch wrote:
Just wanted to say I had some interest in this topic. I'd like to port a filesystem to my system at some point. My system has an HD SD-Card interface, so there's potentially many GBs of storage. One thing I noticed is with potentially 24 bit inode pointers, perhaps the filenames could be limited to 29 characters? So that there are three bytes available for the node pointer.

Yes, 24 bit inode numbers could allow up to 16,777,215 files per filesystem with only a trivial reduction in filename size, as you note. Given that really high capacity mass storage is readily available, your suggestion has merit and has me giving the 816NIX filesystem some more thought.

I redid the design so that a maximum of 16,777,215 inodes per filesystem is possible (the minimum is 8, which is really too small to be useful), and increased the maximum filename length to 60 bytes. The number of inodes can be set when the filesystem is created, unlike before, when the number of inodes per filesystem was fixed at 65,535. Other elements of the filesystem design are as before: the maximum data area size is 64GB and the logical block size is 1KB. I'm in the process of reworking my mkfs program to take advantage of these changes.
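Just to visualize what those numbers imply, a directory entry carrying a 24 bit inode number and a filename of up to 60 bytes could be laid out something like the C struct below. To be clear, this is only an illustration; the field order and the pad byte are assumptions made for this sketch, not the actual 816NIX on-disk layout.

Code:
#include <stdint.h>

/* Illustrative directory entry: a 24 bit inode number plus a filename of
   up to 60 bytes.  Field order and the pad byte are assumptions. */
struct dirent816 {
    uint8_t inode[3];   /* 24 bit inode number; $000000 is never a real inode */
    char    name[60];   /* filename, NUL-padded if shorter than 60 bytes      */
    uint8_t pad;        /* assumed pad byte, bringing the entry to 64 bytes   */
};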
BigDumbDinosaur wrote:
Also to be considered is that the inode allocation bitmap would be substantially larger and hence more complicated to manage. Assuming the 16,777,215 file limit, that's 16,777,216 bits to be managed (inode $000000, while assigned a spot in the bitmap, does not actually exist). A total of 2,097,152 bitmap bytes would be required (not counting overhead), or 2048 logical (1024 byte) disk blocks. Where the complication comes in is that when a file is created, the bitmap has to be scanned to find an available inode...In order to avoid a lot of disk gyrations, a tag map would have to be used to determine which of the bitmap blocks has at least one available inode. Since there would be 2048 bitmap blocks, the tag map would need 2048 bits, or 256 bytes. I could probably shoehorn that into the tail end of the filesystem's superblock, practical because the 816NIX filesystem uses a 1K logical block size and my design doesn't fill up the superblock with a linked list like the old S51K filesystem did (note that when a filesystem has been mounted, the superblock will always be resident in RAM).

The inode tag map does fit in the tail end of the superblock, so that concern is gone. Since the superblock has to be in RAM at all times while the filesystem is mounted, the tag map scan only involves compute-bound activity, which will go real fast.
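In C terms, that scan boils down to something like the sketch below. The names and the sense of the bits (set means the corresponding bitmap block still has at least one free inode) are illustrative assumptions on my part; the real code is 65C816 assembly language.

Code:
#include <stdint.h>

#define TAGMAP_BYTES 256                      /* 2048 bits -> 2048 bitmap blocks */

/* Return the number of the first inode bitmap block that still has at
   least one free inode, or -1 if the filesystem is out of inodes.  The
   tag map lives in the in-RAM copy of the superblock, so no disk I/O
   happens here. */
int find_bitmap_block(const uint8_t tagmap[TAGMAP_BYTES])
{
    for (int i = 0; i < TAGMAP_BYTES; i++) {
        if (tagmap[i] != 0) {                 /* this byte covers 8 bitmap blocks  */
            for (int b = 0; b < 8; b++) {
                if (tagmap[i] & (1u << b))
                    return i * 8 + b;         /* block to fetch and scan in detail */
            }
        }
    }
    return -1;
}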
———————————————
Date and Time-Stamping
———————————————

One of the many things that has to happen as files are created, read and written (and deleted) is the generation of time-stamps. Each file's inode has three time-stamps, tracking the last time the file was accessed (atime), the last time it was modified (mtime, which is the date and time usually displayed by ls -l on a Linux or UNIX system) and the last time the inode itself was modified (ctime). Each filesystem also has a time-stamp, which tracks when changes are made to the filesystem itself. That time-stamp is stored in the superblock.
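In rough C terms, the per-inode portion of that amounts to three fields along these lines. The names mirror their UNIX counterparts, but the type and layout shown here are placeholders for illustration, not the actual 816NIX inode format (the real time-stamp format is described below).

Code:
#include <stdint.h>

typedef uint64_t fs_time_t;   /* placeholder for the filesystem time-stamp type */

/* Illustration only, not the actual 816NIX inode layout. */
struct inode_times {
    fs_time_t atime;   /* last access to the file's data                    */
    fs_time_t mtime;   /* last modification of the data (what ls -l shows)  */
    fs_time_t ctime;   /* last change to the inode itself                   */
};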
The traditional UNIX time-stamp is a field that represents the number of seconds that have elapsed since midnight UTC on January 1, 1970, a point in time referred to as the "epoch." There are some good reasons for time-stamping in this fashion, not the least of which is that it is independent of time zones. Another is that forward or backward date computations are reduced to basic integer arithmetic. For example, one could determine a date 30 days into the future by taking the current time value and adding 86400 × 30 to it.
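In standard C, that computation is nothing more than the following (ordinary library calls, not 816NIX code):

Code:
#include <time.h>

int main(void)
{
    time_t now    = time(NULL);            /* seconds elapsed since the epoch   */
    time_t in_30d = now + 86400L * 30;     /* the same clock time, 30 days out  */
    (void)in_30d;                          /* quiet "unused variable" warnings  */
    return 0;
}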
Linux and UNIX maintain the system's notion of the time of day in an integer that is incremented once per second by the jiffy interrupt. System time is synchronized to UTC, and reference to a time zone occurs only when an application requests the time and date for the purposes of display. When something has to be time-stamped the current value of the system time is copied over to whatever it is that is being time-stamped—it's not at all complicated.
Prior to the development of the 64 bit versions of UNIX and Linux, the time-stamp would be held in a 32 bit signed integer of type time_t, with dates prior to the epoch being represented by a negative time value. There are several problems with this. The first one is the need to perform signed arithmetic to convert negative time_t values to a "broken-down" (i.e., human readable) date and time, or vice versa. More significantly, 32 bits worth of seconds is good for a bit more than 136 years. Since any date after the epoch is a positive time_t value (meaning bit 31 is zero), the farthest out this timekeeping method can go before rollover occurs is a little more than 68 years, giving rise to the "year 2038" problem.
The 64 bit versions of UNIX and Linux have eliminated the "year 2038" problem by redefining time_t to be a 64 bit signed integer, still based on the midnight UTC January 1, 1970 epoch. This change increases the date range to approximately 292 billion years into the future, which is well beyond the projected life of the solar system.
Practically speaking, it isn't necessary to track time out to the end of time, nor is it necessary to go real far into the past. However, I didn't want the "year 2038" problem to pop up, just in case I was still able to tinker with my toys 23 years from now (I'd be in my nineties at that time). So I decided to use a 48 bit unsigned integer for time_t, which can track up to 281,474,976,710,655 seconds, equal to more than 8,919,403 years. The routine I've written to convert the broken-down date and time to time_t accepts 16 bit unsigned integers in each of the input fields (e.g., month, hour, etc.), which limits the maximum possible year to 65,535. However, the end-of-century leap year algorithm implemented by the Gregorian calendar is faulty for years beyond 9999, so the practical maximum date is Friday, December 31, 9999. I'm quite certain that neither I nor any POC computer will be around at that time.
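On disk, a 48 bit time-stamp occupies six bytes. As a sketch of what storing and fetching such a value could look like in C (the byte order and names here are assumptions for illustration, not a description of the actual inode layout):

Code:
#include <stdint.h>

#define TIME48_MAX 0xFFFFFFFFFFFFULL       /* 281,474,976,710,655 seconds */

/* Pack the low 48 bits of t into six bytes, least significant byte first. */
void time48_store(uint8_t field[6], uint64_t t)
{
    for (int i = 0; i < 6; i++)
        field[i] = (uint8_t)(t >> (8 * i));
}

/* Rebuild the 48 bit value from its six-byte on-disk form. */
uint64_t time48_load(const uint8_t field[6])
{
    uint64_t t = 0;
    for (int i = 5; i >= 0; i--)
        t = (t << 8) | field[i];
    return t;
}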
In addition to defining a more compact time_t, I decided to make midnight UTC on October 1, 1752 the epoch. Since my version of time_t is unsigned, my epoch had to be much earlier than the UNIX epoch if it was going to be possible to store dates before 1970. However, there was another consideration that led to that date.
The British Empire converted from the Julian calendar to the Gregorian calendar in 1752, with the actual switch occurring on September 3. As the Julian calendar had accumulated errors due to an improper leap year algorithm, 11 days were excised from September 1752 to get things back into sync. I didn't want to have to deal with a month that was missing nearly two weeks, so I decided to start at October 1. Hence converting midnight UTC on October 1, 1752 will produce a time_t value of $000000000000.
The algorithm I have devised for converting between broken-down time and time_t is based upon the algorithm used to compute a Julian Day number (not to be confused with the Julian calendar—the two are unrelated), as originally implemented in FORTRAN. The computation steps are as follows:
Code:
a      = (14 – MONTH) ÷ 12
y      = YEAR + 4800 – a
t      = HOUR × 3600 + MINS × 60 + SECS

time_t = (DOM + (153 × (MONTH + a × 12 – 3) + 2) ÷ 5
          + y × 365 + y ÷ 4 + y ÷ 400 – y ÷ 100
          – 2393284) × 86400 + t

where YEAR, MONTH, DOM, HOUR, MINS and SECS are the broken-down date and time values.
This algorithm is readily programmed in assembly language using 64 bit arithmetic functions, which are implemented with little difficulty on the 65C816. All division is integer division; since all operands are positive, the result is the same as floor() would produce in ANSI C. The usual algebraic precedence applies.
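For anyone who would rather see it in C than in assembly language, the following is a sketch of the same computation using 64 bit unsigned arithmetic. The function and type names are mine for illustration only; the real thing is 65C816 code.

Code:
#include <stdint.h>

typedef uint64_t time48_t;    /* only the low 48 bits are ever significant */

time48_t bdt_to_time48(uint16_t year, uint16_t month, uint16_t dom,
                       uint16_t hour, uint16_t mins, uint16_t secs)
{
    uint64_t a = (14 - month) / 12;              /* 1 for Jan and Feb, else 0 */
    uint64_t y = (uint64_t)year + 4800 - a;
    uint64_t t = hour * 3600UL + mins * 60UL + secs;

    /* Days elapsed since October 1, 1752, via the Julian Day computation. */
    uint64_t days = dom
                  + (153 * ((uint64_t)month + a * 12 - 3) + 2) / 5
                  + y * 365 + y / 4 + y / 400 - y / 100
                  - 2393284;

    return days * 86400ULL + t;                  /* seconds since the epoch   */
}

As a sanity check, midnight UTC on October 1, 1752 comes out to $000000000000, and midnight UTC on January 1, 1970 works out to 6,855,753,600 (79,349 days), the fixed offset between this epoch and the UNIX epoch.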
Technically, the algorithm above is incorrect for any date prior to 1582, as the formula treats any earlier date as belonging to the proleptic Gregorian calendar. However, since I am not considering dates prior to October 1, 1752, the errors that would occur in converting pre-Gregorian dates are of no consequence.