I haven't posted in this topic for a while, as I didn't have much about which to bloviate. Also, I decided to redo some structural elements of my filesystem design after some previously posted commentary.
BigDumbDinosaur wrote:
Rob Finch wrote:
Just wanted to say I had some interest in this topic. I'd like to port a filesystem to my system at some point. My system has an HD SD-Card interface, so there's potentially many GBs of storage. One thing I noticed is with potentially 24 bit inode pointers, perhaps the filenames could be limited to 29 characters? So that there are three bytes available for the node pointer.

Yes, 24 bit inode numbers could allow up to 16,777,215 files per filesystem with only a trivial reduction in filename size, as you note. Given that really high capacity mass storage is readily available, your suggestion has merit and has me giving the 816NIX filesystem some more thought.

I redid the design so that a maximum of 16,777,215 inodes per filesystem is possible (the minimum is 8, which is really too small to be useful), and increased the maximum filename length to 60 bytes. The number of inodes can be set when the filesystem is created, unlike before, when the number of inodes per filesystem was fixed at 65,535. Other elements of the filesystem design are as before: the maximum data area size is 64GB and the logical block size is 1KB. I'm in the process of reworking my mkfs program to take advantage of these changes.
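Just to visualize what those numbers imply, a directory entry carrying a 24 bit inode number and a filename of up to 60 bytes could be laid out something like the C struct below. To be clear, this is only an illustration; the field order and the pad byte are assumptions made for this sketch, not the actual 816NIX on-disk layout.

Code:
#include <stdint.h>

/* Illustrative directory entry: a 24 bit inode number plus a filename of
   up to 60 bytes.  Field order and the pad byte are assumptions. */
struct dirent816 {
    uint8_t inode[3];   /* 24 bit inode number; $000000 is never a real inode */
    char    name[60];   /* filename, NUL-padded if shorter than 60 bytes      */
    uint8_t pad;        /* assumed pad byte, bringing the entry to 64 bytes   */
};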
BigDumbDinosaur wrote:
Also to be considered is that the inode allocation bitmap would be substantially larger and hence more complicated to manage. Assuming the 16,777,215 file limit, that's 16,777,216 bits to be managed (inode $000000, while assigned a spot in the bitmap, does not actually exist). A total of 2,097,152 bitmap bytes would be required (not counting overhead), or 2048 logical (1024 byte) disk blocks. Where the complication comes in is that when a file is created, the bitmap has to be scanned to find an available inode...In order to avoid a lot of disk gyrations, a tag map would have to be used to determine which of the bitmap blocks has at least one available inode. Since there would be 2048 bitmap blocks, the tag map would need 2048 bits, or 256 bytes. I could probably shoehorn that into the tail end of the filesystem's superblock, practical because the 816NIX filesystem uses a 1K logical block size and my design doesn't fill up the superblock with a linked list like the old S51K filesystem did (note that when a filesystem has been mounted, the superblock will always be resident in RAM).

The inode tag map does fit in the tail end of the superblock, so that concern is gone. Since the superblock has to be in RAM at all times while the filesystem is mounted, the tag map scan only involves compute-bound activity, which will go real fast.
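In C terms, that scan boils down to something like the sketch below. The names and the sense of the bits (set means the corresponding bitmap block still has at least one free inode) are illustrative assumptions on my part; the real code is 65C816 assembly language.

Code:
#include <stdint.h>

#define TAGMAP_BYTES 256                      /* 2048 bits -> 2048 bitmap blocks */

/* Return the number of the first inode bitmap block that still has at
   least one free inode, or -1 if the filesystem is out of inodes.  The
   tag map lives in the in-RAM copy of the superblock, so no disk I/O
   happens here. */
int find_bitmap_block(const uint8_t tagmap[TAGMAP_BYTES])
{
    for (int i = 0; i < TAGMAP_BYTES; i++) {
        if (tagmap[i] != 0) {                 /* this byte covers 8 bitmap blocks  */
            for (int b = 0; b < 8; b++) {
                if (tagmap[i] & (1u << b))
                    return i * 8 + b;         /* block to fetch and scan in detail */
            }
        }
    }
    return -1;
}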
———————————————
Date and Time-Stamping
———————————————

One of the many things that has to happen as files are created, read and written (and deleted) is the generation of time-stamps. Each file's inode has three time-stamps, tracking the last time the file was accessed (atime), the last time it was modified (mtime, which is the date and time usually displayed by ls -l on a Linux or UNIX system) and the last time the inode itself was modified (ctime). Each filesystem also has a time-stamp, which tracks when changes are made to the filesystem itself. That time-stamp is stored in the superblock.
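In rough C terms, the per-inode portion of that amounts to three fields along these lines. The names mirror their UNIX counterparts, but the type and layout shown here are placeholders for illustration, not the actual 816NIX inode format (the real time-stamp format is described below).

Code:
#include <stdint.h>

typedef uint64_t fs_time_t;   /* placeholder for the filesystem time-stamp type */

/* Illustration only, not the actual 816NIX inode layout. */
struct inode_times {
    fs_time_t atime;   /* last access to the file's data                    */
    fs_time_t mtime;   /* last modification of the data (what ls -l shows)  */
    fs_time_t ctime;   /* last change to the inode itself                   */
};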
The traditional UNIX time-stamp is a field that represents the number of seconds that have elapsed since midnight UTC on January 1, 1970, a point in time referred to as the "epoch." There are some good reasons for time-stamping in this fashion, not the least of which is that it is independent of time zones. Another is that forward or backward date computations are reduced to basic integer arithmetic. For example, one could determine a date 30 days into the future by taking the current time value and adding 86400 × 30 to it.
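In standard C, that computation is nothing more than the following (ordinary library calls, not 816NIX code):

Code:
#include <time.h>

int main(void)
{
    time_t now    = time(NULL);            /* seconds elapsed since the epoch   */
    time_t in_30d = now + 86400L * 30;     /* the same clock time, 30 days out  */
    (void)in_30d;                          /* quiet "unused variable" warnings  */
    return 0;
}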
Linux and UNIX maintain the system's notion of the time of day in an integer that is incremented once per second by the jiffy interrupt. System time is synchronized to UTC, and reference to a time zone occurs only when an application requests the time and date for the purposes of display. When something has to be time-stamped the current value of the system time is copied over to whatever it is that is being time-stamped—it's not at all complicated.
Prior to the development of the 64 bit versions of UNIX and Linux, the time-stamp would be held in a 32 bit signed integer of type time_t, with dates prior to the epoch being represented by a negative time value. There are several problems with this. The first one is the need to perform signed arithmetic to convert negative time_t values to a "broken-down" (i.e., human readable) date and time, or vice versa. More significantly, 32 bits worth of seconds is good for a bit more than 136 years. Since any date after the epoch is a positive time_t value (meaning bit 31 is zero), the farthest out this timekeeping method can go before rollover occurs is a little more than 68 years, giving rise to the "year 2038" problem.
The 64 bit versions of UNIX and Linux have eliminated the "year 2038" problem by redefining time_t to be a 64 bit signed integer, still based on the midnight UTC January 1, 1970 epoch. This change increases the date range to approximately 292 billion years into the future, which is well beyond the projected life of the solar system.
Practically speaking, it isn't necessary to track time out to the end of time, nor is it necessary to go real far into the past. However, I didn't want the "year 2038" problem to pop up, just in case I was still able to tinker with my toys 23 years from now (I'd be in my nineties at that time). So I decided to use a 48 bit unsigned integer for time_t, which can track up to 281,474,976,710,655 seconds, equal to more than 8,919,403 years. The routine I've written to convert the broken-down date and time to time_t accepts 16 bit unsigned integers in each of the input fields (e.g., month, hour, etc.), which limits the maximum possible year to 65,535. However, the end-of-century leap year algorithm implemented by the Gregorian calendar is faulty for years beyond 9999, so the practical maximum date is Friday, December 31, 9999. I'm quite certain that neither I nor any POC computer will be around at that time.
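On disk, a 48 bit time-stamp occupies six bytes. As a sketch of what storing and fetching such a value could look like in C (the byte order and names here are assumptions for illustration, not a description of the actual inode layout):

Code:
#include <stdint.h>

#define TIME48_MAX 0xFFFFFFFFFFFFULL       /* 281,474,976,710,655 seconds */

/* Pack the low 48 bits of t into six bytes, least significant byte first. */
void time48_store(uint8_t field[6], uint64_t t)
{
    for (int i = 0; i < 6; i++)
        field[i] = (uint8_t)(t >> (8 * i));
}

/* Rebuild the 48 bit value from its six-byte on-disk form. */
uint64_t time48_load(const uint8_t field[6])
{
    uint64_t t = 0;
    for (int i = 5; i >= 0; i--)
        t = (t << 8) | field[i];
    return t;
}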
In addition to defining a more compact time_t, I decided to make midnight UTC on October 1, 1752 the epoch. Since my version of time_t is unsigned, my epoch had to be much earlier than the UNIX epoch if it was going to be possible to store dates before 1970. However, there was another consideration that led to that date.
The British Empire converted from the Julian calendar to the Gregorian calendar in 1752, with the actual switch occurring on September 3. As the Julian calendar had accumulated errors due to an improper leap year algorithm, 11 days were excised from September 1752 to get things back into sync. I didn't want to have to deal with a month that was missing nearly two weeks, so I decided to start at October 1. Hence converting midnight UTC on October 1, 1752 will produce a time_t value of $000000000000.
The algorithm I have devised for converting between broken-down time and time_t is based upon the algorithm used to compute a Julian Day number (not to be confused with the Julian calendar—the two are unrelated), as originally implemented in FORTRAN. The computation steps are as follows:
Code:
a      = (14 – MONTH) ÷ 12
y      = YEAR + 4800 – a
t      = HOUR × 3600 + MINS × 60 + SECS

time_t = (DOM + (153 × (MONTH + a × 12 – 3) + 2) ÷ 5
          + y × 365 + y ÷ 4 + y ÷ 400 – y ÷ 100
          – 2393284) × 86400 + t

where YEAR, MONTH, DOM, HOUR, MINS and SECS are the broken-down date and time values.
This algorithm is readily programmed in assembly language using 64 bit arithmetic functions, which are implemented with little difficulty on the 65C816. All division is integer division; since all operands are positive, the result is the same as floor() would produce in ANSI C. The usual algebraic precedence applies.
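For anyone who would rather see it in C than in assembly language, the following is a sketch of the same computation using 64 bit unsigned arithmetic. The function and type names are mine for illustration only; the real thing is 65C816 code.

Code:
#include <stdint.h>

typedef uint64_t time48_t;    /* only the low 48 bits are ever significant */

time48_t bdt_to_time48(uint16_t year, uint16_t month, uint16_t dom,
                       uint16_t hour, uint16_t mins, uint16_t secs)
{
    uint64_t a = (14 - month) / 12;              /* 1 for Jan and Feb, else 0 */
    uint64_t y = (uint64_t)year + 4800 - a;
    uint64_t t = hour * 3600UL + mins * 60UL + secs;

    /* Days elapsed since October 1, 1752, via the Julian Day computation. */
    uint64_t days = dom
                  + (153 * ((uint64_t)month + a * 12 - 3) + 2) / 5
                  + y * 365 + y / 4 + y / 400 - y / 100
                  - 2393284;

    return days * 86400ULL + t;                  /* seconds since the epoch   */
}

As a sanity check, midnight UTC on October 1, 1752 comes out to $000000000000, and midnight UTC on January 1, 1970 works out to 6,855,753,600 (79,349 days), the fixed offset between this epoch and the UNIX epoch.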
Technically, the algorithm above is incorrect for any date prior to 1582, as the formula treats any earlier date as belonging to the proleptic Gregorian calendar. However, since I am not considering dates prior to October 1, 1752, the errors that would occur in converting pre-Gregorian dates are of no consequence.