PostPosted: Sat Feb 05, 2022 2:16 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8426
Location: Southern California
In Aug 2016, I started a topic on super-simple file systems for flash memory, which has seven posts at https://bradsprojects.com/forum/viewtop ... 233&p=6739 on Brad's Projects forum.  I should have just done it here, but didn't because it's not particularly 65-related.  The forum has software problems now, and Brad is not maintaining it (hopefully that's temporary), as he is not pushing it at the moment for his electronics students.  I have not been able to get hold of him.  [Edit, a couple of months later:  I got hold of Brad, and he says he doesn't know how to fix it.]  It was suggested that I copy the entire topic over here, as Chad (forum name "sburrow") started the topic "External Storage Options."  So here it is.  The nameless friend mentioned there is BDD, who gave me his permission to repeat publicly here what he wrote in email.

I wrote:

Quote:
Does anyone here know of a super-simple file system for flash memory that has already been worked out?  I can use a search engine as well as anyone, but reading a ton of web pages takes a lot of time, and I was hoping someone here was already familiar with one or more and could give quick recommendations for or against.  With thoughts of coming up with my own, I've been listing criteria and drawing diagrams of how file headers might be constructed, but so far I'm not very satisfied with what I come up with.

Ideally I'd just use SD cards; but I have found the details of handling the various densities, block and sector sizes, CRC error-detection methods, security methods, fragmenting, wear-leveling (if any), FAT systems and subdirectory structures, and so on, to be daunting.  I'm not going to try anymore unless there's a really simple explanation somewhere.  I think the things I need to transfer between the home-made workbench computer and the PC could all go over RS-232, including with the FTDI RS232-to-USB adapter if it's on the newer PC that doesn't have an RS-232 port.

I want something that takes very little code to run.  How did the C64 and other home computers of its era handle the files on floppy discs?  (I know the C64 disc drives had their own microprocessor in them.)  I tentatively plan on starting with 4MB SPI flash memories in tiny SO-8 packages that I have built onto postage-stamp-sized plug-in cards, then maybe moving up to 16MB, or remotely possibly even 64MB, but definitely not more.  [Edit: They're available on the front page of my site now.]  This is not for photos, videos, music, bloatware, or any of the other things that take up a lot of storage, but rather for workbench-control source code, assembled or compiled programs, measurements taken, etc.  I'm thinking of something like this (although these are not set in concrete):

  • up to dozens, maybe hundreds of files, max (256?), not tens of thousands
  • no directories (everything in root)
  • no fragmentation.  If an edited file doesn't fit where it came from, a longer available area is found for it.  If there is none, one or more other files are moved to get enough contiguous blank space
  • not necessarily having a FAT.  I can imagine various ways to do this.
  • names can be long enough to be meaningful, unlike DOS's 8.3.  A 31-character limit is reasonable.  (Is that what C64 had?)

Ideas?

and:

Quote:
Going totally without a file system might be something like the cassette tapes of the home computers of the early 80's when disc drives sometimes cost more than the computer, except that the flash memory would be around a million times as fast as tape for "record" and "play" and perhaps a billion times as fast for "fast forward" and "rewind."  On the cassettes, you wrote the names of your "files" (usually programs) on the label, and if there was more than one on the tape, you wrote the tape counter's value where each "file" started, and optionally where it ended.  Leaving safety space between programs was fine, but there was no safety catch if you accidentally let one program overwrite the beginning or end of another.  You kept track of it by hand.

Certainly there's an in-between step that could be implemented to ease the job of file management with minimal code.

Brad replied with:
Quote:
Sorry but I can't help out at all with this one Garth.  I do however like how you brought up the tape days of the 80's.  I was actually experimenting with an Amstrad CPC464 just the other week which had an inbuilt tape drive.

A tape-type memory system would certainly seem to be the easiest way to go in terms of circuitry and coding, I would think; but of course if you were to delete files, you would need to manually shift things around to avoid fragmentation should you want to load new, larger files.

It's been a few days since you first posted this - have you been able to find any ideas on the internet?


My next post:
Quote:
brad wrote:
It's been a few days since you first posted this - have you been able to find any ideas on the internet?

Here's something a friend sent, which I'm sure he wouldn't mind my posting:
Quote:
I guess it will depend on how fancy you want to get.  You could emulate the relatively simple filesystem found on Commodore floppy disks, although the general structure and operation are not real efficient when applied to high-capacity storage media.  The layout of a Commodore floppy filesystem is as follows:

  • "Super block."  This is a single disk block (256-byte blocks on all Commodore floppy drives) that carries some basic information about the filesystem, such as its name, the total number of data blocks, the number of data blocks that are free, the number of data blocks in use, etc.  The super block also has a 16-bit ID field to uniquely identify each disk.  That field is used in the Commodore DOS to recognize when one disk is removed from the drive and a different one is inserted.
  • BAM.  The block allocation map (BAM) is a multi-block bitmap that indicates the status of the data blocks that are part of the filesystem.  There is a one-for-one correspondence between each bit in the BAM and a data block, hence there are as many bits in the BAM as data blocks in the filesystem.  If a bit is set, the corresponding data block is free, otherwise the block is in use.  As a file is written, the BAM is consulted when a data block is needed.  When a file is deleted the corresponding data blocks must be freed.
  • Directory.  This is a list of the files in the filesystem.  The Commodore directory structure is a sequential list of 32 byte slots.  Sixteen bytes are reserved for the filename (padded with trailing nulls), a byte is used for the file type (with $00 meaning the slot is free and may be claimed for cataloging a new file), a pair of bytes is used to maintain a count of the data blocks consumed by the file, and a pair of bytes is used to point to the track and sector address of the file's first data block.
  • Data blocks.  These blocks contain whatever data has been written to a file.  Bytes $00 and $01 in each block are pointer bytes, whose function depends on whether or not the current data block is the final block in the file:
    • If another data block follows the current block, bytes $00 and $01 point to the track and sector address of the next data block.
    • If the current block is the final data block in the file, bytes $00 and $01 indicate how many data bytes are in the current block.  As Commodore disks were formatted to 256 bytes, byte $01 always contained $00.

    Track and sector addresses are expressed as single byte integers, with the first track being $00 and the first sector in each track also being $00.

You could readily adapt the above, changing track and sector pointers into absolute block addresses, and using 32-bit fields to accommodate the full extent of the storage medium.  It's probably about as simple as you are going to get and still achieve organized storage.

That said, I'm sure you can see some obvious problems.  If you want to append data to a file you have to read the chain of data blocks to get the address of the last block.  Similarly, if you want to delete a file, you have to read the chain of blocks in order to find out which blocks to free up.  Random access likewise requires sequential reading of the chain of blocks to get to the desired data.

These limitations are why simple filesystems are not appropriate as storage gets more capacious.  When Fiscal Information developed the Lt. Kernal hard drive subsystem for the Commodore 64 and 128, they basically had to throw out the Commodore methods and adopt a minicomputer operating system storage model.  You will probably have to do something similar, unless you don't care about flexibility and performance.  As programming exercises go, on a scale of 1 to 10, I'd give it a 7 or 8 in difficulty.

I was thinking along similar lines, so to hear that they did it this way is a little bit reassuring (although I hope I can make it simpler).  This definitely gives me more to think about though.
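As a rough sketch (not from his email - the field names and widths here are assumptions for illustration only), the adapted layout with track/sector pointers replaced by 32-bit absolute block addresses might come out something like this in C:

Code:
/* Illustrative only: a Commodore-style catalog adapted to absolute
   32-bit block addresses, as suggested above.  Names and sizes are
   assumptions, not a worked-out format. */
#include <stdint.h>

#define BLOCK_SIZE   256u          /* bytes per data block (assumed)     */
#define NAME_LEN      16u          /* Commodore-style name length        */
#define END_OF_FILE  0xFFFFFFFFu   /* "no next block" marker (assumed)   */

struct dir_entry {
    char     name[NAME_LEN];   /* filename, padded with trailing nulls   */
    uint8_t  type;             /* $00 = slot free, otherwise file type   */
    uint8_t  pad[3];
    uint32_t block_count;      /* data blocks consumed by the file       */
    uint32_t first_block;      /* absolute block address of first block  */
};

struct data_block {
    uint32_t next;             /* next block, or END_OF_FILE on the last */
    uint32_t used;             /* valid data bytes in this block         */
    uint8_t  data[BLOCK_SIZE - 8];
};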

In a later email, he said,

Quote:
  • Fragmentation is supported, but fragmentation isn't much of an issue in low-capacity floppy discs.
  • The Commodore DOS, which runs in the drive itself, not the computer, had a validate command that would try to fix up disk errors.  It was nothing like modern filesystem checkers like fsck on Linux or chkdsk on Windows.


Then Brad wrote:
Quote:
I'm now realising just how complicated it must be to program a good file system.  There are so many things to think about - not just storing files but all the other variables that go along with it.

I'm quite interested to see what you end up going with here.  I smiled at the end of the post where he mentioned it was about a 7 or 8 on the difficulty scale!


Ed's post:
Quote:
Might be worth looking at Acorn's Rom Filing System - it uses a data layout very like their cassette filing system, but with links from one file to the next to allow a rapid scan.  The format officially allows names up to 10 chars, but they are zero-terminated, so there's no fundamental name-length limit.  It is of course a read-only file system, but it must be easy to append to it, and if you adopt a convention of deleting files by mangling their names or flagging them in some way, you have something where you can write a new file.  Random access for read/write might be another matter, but each file is a list of data blocks, so maybe not so bad.

See page 14 of http://www.sprow.co.uk/bbc/library/sidewrom.pdf
and page 349 of http://stardot.org.uk/mirrors/www.bbcdo ... 0Guide.pdf

Edit: also this design document for a different flash file system might have useful information:
https://github.com/pellepl/spiffs/blob/ ... /TECH_SPEC


My last post, and this is as far as the topic got:
Quote:
This 12-minute video is relevant.  It's not a "super-simple file system" (like this forum topic title calls for), but it does describe folders and fragmentation rather simply.
https://www.youtube.com/watch?v=KN8YgJn ... o&index=22

This is in a sequence of videos.  The next one after it is about data compression to reduce file size.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Sat Feb 05, 2022 2:21 pm
Joined: Fri Dec 21, 2018 1:05 am
Posts: 1076
Location: Albuquerque NM USA
I am hoping to hear some thoughts about a simple file system.  I have a goodish number of DT28F320 (4meg x8 or 2meg x16) Intel StrataFlash parts.  Each has 32 128KB erase blocks, so my thought on a simple file system is having 32 files such that each file contains up to 128KB of contiguous data.  The first file is a directory containing the names of the 31 remaining files and their true sizes.  I can add a file name/size to the directory without erasure; I can delete a file name by writing all zeros; I can update a file size by zeroing out the previous size and writing the new size; but to erase and rewrite the directory I need to save its contents elsewhere, sector-erase, then restore the directory.
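A minimal sketch of that update-in-place trick, assuming NOR-type flash where a write can only clear bits (1 -> 0) and only a whole block can be erased; the slot count, field sizes and the flash_program() routine below are invented for illustration:

Code:
#include <stdint.h>

#define NAME_LEN    24
#define SIZE_SLOTS  8             /* room for 8 size updates per file    */
#define ERASED32    0xFFFFFFFFu   /* erased flash reads back as all 1s   */

struct dir_entry {
    char     name[NAME_LEN];      /* all-zero name = deleted entry       */
    uint32_t size[SIZE_SLOTS];    /* 0xFFFFFFFF = unused, 0 = superseded */
};

/* Stand-in for the real programming routine: NOR flash can only turn
   1 bits into 0 bits, which an AND models well enough here. */
static void flash_program(void *dst, const void *src, unsigned len)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    while (len--)
        *d++ &= *s++;
}

/* Current size = first slot that is neither superseded nor unused. */
uint32_t file_size(const struct dir_entry *e)
{
    for (int i = 0; i < SIZE_SLOTS; i++)
        if (e->size[i] != 0 && e->size[i] != ERASED32)
            return e->size[i];
    return 0;
}

/* Record a new size: zero the live slot, program the next free one.
   Returns -1 when the slots are used up, i.e. time to copy, erase and
   rewrite the whole directory block. */
int update_size(struct dir_entry *e, uint32_t new_size)
{
    uint32_t zero = 0;
    for (int i = 0; i < SIZE_SLOTS; i++) {
        if (e->size[i] == ERASED32) {
            flash_program(&e->size[i], &new_size, sizeof new_size);
            return 0;
        }
        if (e->size[i] != 0)
            flash_program(&e->size[i], &zero, sizeof zero);
    }
    return -1;
}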

It turns out a 128KB file size also works very well with an equivalent simple file system on a CompactFlash disk.  CF has 512-byte erasable blocks, so updating is even easier.

This is just a thought experiment, I haven't done anything else.
Bill


PostPosted: Sat Feb 05, 2022 2:54 pm
Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
I've thought about this a bit in the past but not really moved forward, because I decided I cared more about ease of writing from modern PCs, so I implemented 6502 exFAT and FAT drivers for SD cards and used those.

One thing I had thought about though, and was tempted by, was that given that the size of modern media dwarfs the capabilities of the average 6502-based system, we could easily use fixed size records for file storage, accept a hard limit on file size, and do away with any real need for fragmentation and complex allocation tables. If you are willing to limit the total file count to 256, you can still allow a maximum file size of many megabytes on even the smallest SD card. I think this is similar to Bill's suggestion.

Acorn's DFS format essentially stores an alphabetically-sorted list of filenames in the first 256-byte sector, with corresponding metadata in the second sector. The metadata includes a start sector and size in bytes, and this is enough to manage the rest of the disc surface. Searching for an empty space requires a little work, but is a rare operation; and if in our case all files take up exactly one (very large) cluster, it's just a matter of finding the first unused cluster.

If you're willing to accept a limit of 256 files on the device, cluster numbers are just bytes, which is great for a 6502, and it would be easy to use the "empty" directory entries to track unused cluster numbers.
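A sketch of how little machinery that needs, assuming an empty name marks a free directory slot and every file owns exactly one fixed-size cluster (names, field sizes and layout are illustrative only, not Acorn DFS's actual format):

Code:
#include <stdint.h>

#define MAX_FILES  256            /* one byte is enough for a cluster #  */
#define NAME_LEN   16

struct dir_entry {
    char     name[NAME_LEN];      /* empty string = slot unused          */
    uint8_t  cluster;             /* which (large) cluster holds the data */
    uint8_t  flags;
    uint32_t size;                /* bytes actually used in the cluster  */
};

struct directory {                /* fits in a handful of 512-byte sectors */
    struct dir_entry entry[MAX_FILES];
};

/* Find a cluster that no existing file owns: 0-255, or -1 if full. */
int find_free_cluster(const struct directory *d)
{
    uint8_t used[MAX_FILES] = { 0 };
    for (int i = 0; i < MAX_FILES; i++)
        if (d->entry[i].name[0] != '\0')
            used[d->entry[i].cluster] = 1;
    for (int c = 0; c < MAX_FILES; c++)
        if (!used[c])
            return c;
    return -1;
}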


PostPosted: Sat Feb 05, 2022 3:15 pm
Joined: Fri Dec 21, 2018 1:05 am
Posts: 1076
Location: Albuquerque NM USA
I agree: CF or SD have so much storage that the filing system can be greatly simplified by trading space for complexity.  A 1GB CF/SD is not much more expensive than a 32MB CF/SD (if 32MB can even be found).  A 32MB CF can store 256 contiguous 128KB files, and a 1GB card 8192 such files.  We really don't need a complicated filing system when such an abundance of storage space is cheaply available.
Bill


PostPosted: Sat Feb 05, 2022 3:38 pm
Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1395
Location: Scotland
I have written filing systems for various systems in the past - it's never been my main thing, but has been needed for stuff I've worked on - mostly experimental/prototype that never saw the light of day, and the one on my Ruby boards which is somewhat full-featured but written in C... however...

So, something simple - the range runs from where a cassette tape is usable, or one file per SD card (wasteful of resources), up to a full-blown classic Unix-style filesystem with superblocks, inodes and whatnot (huge software complexity, especially in ASM).  What does that leave us with?

So how about this in a thinking out loud sort of way...

In a typical operating system there is a memory allocator. C users will be familiar with the malloc() function for example. My BCPL system has something similar called getvec(). (And I wrote one in Sweet16 a while back too for another project that didn't get too far)...

Behind the system call is the actual allocator - ranging from utterly trivial to hugely complex - the more RAM you have and the more CPU cycles you have at your disposal, the more complex algorithm you can use which maximises available RAM and more efficiently merges free blocks and so on. There is no fragmentation in that if you allocate a block of memory then that's it. If you want it bigger you need to allocate a bigger block and copy the data over (see the realloc() system call in Unix)

The simplest (usable!) allocator is called "First Fit" and I'm thinking this would work well for SD cards (and other media like an SPI EEPROM where seek and access times are not a factor)

(It can also work on spinning rust, just might be slower)

So how to use this on an SD card as a filing system?

Carrying on thinking out loud here...

You "format" the disc (and I'll use the term 'disc' here because...) as one big free chunk. There are markers at the start and end of each chunk as well as a pointer to the length. My BCPL implementation looks like:

Code:
 *            31 30-23 22     0
 *      +-------+-----+--------+
 *      | InUse | PID | Length |
 *      +-------+-----+--------+
 *      | Guard:      FEEDBEEF |
 *      +----------------------+
 *      | .....................|
 *      |      Data space      |
 *      | .....................|
 *      +----------------------+
 *      | Guard:      BEADCAFE |
 *      +----------------------+
 *
 * Each chunk has 3 words extra and the 23-bit length parameter includes this.
 *      InUse is a flag : top-bit -  with zero for free, top-bit set for in-use
 *      with the other 7 bits being reserved (for a PID in a multi-tasking
 *      system, if it ever sees the light of day)


Then, as part of the format procedure, you allocate a chunk big enough to hold the filename and pointers for each allocated file.  So if you thought you might want 100 files of 31-character name length, then you'll need 32 bytes per filename and 4 bytes per pointer; let's say 4 bytes for a timestamp (although the world has moved to a 64-bit time_t) and 4 bytes for 'attributes' (locked, text/binary/owner/etc.), then you need 44 bytes per entry, or 4400 bytes for the lot.

If you ignore the "PID" field then you have 31-bits for the chunk size, that's 2GB, so that limits the size of the device (or partition) to 2GB, as well as the maximum file size to 2GB.

Personally, I feel that's more than enough for our little 8-bit systems (and even 16-bit 816's - my Ruby filing system can only handle 32MB per partition and I feel that's more than I'll ever need in the lifetime of this project)

The 'guard' words are there to protect, or at least warn of chunk overwrite, so a consistency check can quickly tell if it's corrupt or not...

In my BCPL OS, I perform a heap merge (i.e. concatenate adjacent free chunks) operation at the end of every program execution - some implementations will do the merge operation at every free() call.  I keep a pointer to the end of the last chunk allocated to speed up searches, but if/when that doesn't find a free chunk, I reset the pointer to the start and traverse the list until it finds a free chunk large enough.  A free chunk that is larger than needed gets split into smaller chunks.
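A sketch of how a first-fit search over that chunk layout might look - written in C and walking an array in RAM for clarity (on disc the same walk would be block reads); the split policy and function names are assumptions:

Code:
#include <stdint.h>

#define IN_USE      0x80000000u
#define LEN_MASK    0x007FFFFFu   /* 23-bit chunk length, in words       */
#define GUARD_HEAD  0xFEEDBEEFu
#define GUARD_TAIL  0xBEADCAFEu
#define OVERHEAD    3u            /* header word plus two guard words    */

static uint32_t *heap;            /* the whole "disc", chunks back to back */
static uint32_t  heap_words;

/* "Format": one big free chunk spanning everything. */
void format(uint32_t *mem, uint32_t words)
{
    heap = mem;
    heap_words = words;
    heap[0] = words;              /* free: top bit clear                 */
    heap[1] = GUARD_HEAD;
    heap[words - 1] = GUARD_TAIL;
}

/* Allocate 'want' data words, first fit; returns the word index of the
   data area, or -1 if no free chunk is big enough.  Assumes a
   well-formed chunk list. */
long alloc_first_fit(uint32_t want)
{
    uint32_t need = want + OVERHEAD;
    uint32_t p = 0;
    while (p < heap_words) {
        uint32_t len = heap[p] & LEN_MASK;
        if (!(heap[p] & IN_USE) && len >= need) {
            if (len > need + OVERHEAD) {          /* split off the rest  */
                heap[p + need] = len - need;      /* remainder is free   */
                heap[p + need + 1] = GUARD_HEAD;
                heap[p + len - 1] = GUARD_TAIL;
                len = need;
            }
            heap[p] = IN_USE | len;
            heap[p + 1] = GUARD_HEAD;
            heap[p + len - 1] = GUARD_TAIL;
            return (long)(p + 2);                 /* data starts here    */
        }
        p += len;
    }
    return -1;
}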

The upshot of that in a filing system is that you would need to pre-allocate the file size in advance if you wanted to sequentially write a file - otherwise you'll run out of space. My guess is that almost all operations will be simple "load a file" or "store a file" to and from RAM, however if you create a large enough file in the first place, then the usual read/write/seek/append type operations would be possible within the length of that pre-allocated file.

Writing a "compact" type routine should be fairly straightforward, should the system become too fragmented, but I've found that I can run my BCPL OS for days/weeks compiling, running, testing code and never have issues with running out of RAM or not being able to find a free chunk of RAM big enough to use.

Is it prone to corruption? Yes.  This does happen when I have an errant program that scribbles all over RAM; the guard words are checked at free() and merge() times.  In a filing system?  I feel overrunning the end of a chunk should never happen, as the underlying code should be checking the bytes written into that chunk vs. the actual size.

Other issues?  It may not actually be possible to know how much data has actually been written, i.e. how big a file is.  For a whole-file (e.g. a BASIC program) save operation this won't be an issue, as the chunk will be an exact size, but for a big file created for random access?  Hard to know.

Then there will be file data vs. the underlying block nature of the device - SD cards are block devices with a fixed block size of 512 bytes, so using this scheme you'll need some sort of way to translate a 32-bit offset into a block number plus an offset within the block - that's not hard though, and at one level you can say it actually maximizes usage; but on the other hand, try buying an SD card < 8GB now...

In practical terms?  There is the read/modify/write operation needed when a chunk splits a block boundary - this is perhaps sub-optimal, so the easy thing is to round UP all file allocations so that they fit inside an exact number of media blocks (which for SD cards is 512 bytes).  Then I'm thinking I'd add an extra word into the header to say the exact size - when known - of the saved data.
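Something like these two little helpers is all that translation amounts to (a sketch only - 512-byte blocks and byte offsets assumed):

Code:
#include <stdint.h>

#define MEDIA_BLOCK 512u

/* Round a requested allocation up to a whole number of media blocks. */
uint32_t round_up_to_block(uint32_t bytes)
{
    return (bytes + MEDIA_BLOCK - 1) & ~(MEDIA_BLOCK - 1);
}

/* Turn a byte offset within a chunk into (block number, byte in block). */
void split_offset(uint32_t chunk_start_block, uint32_t offset,
                  uint32_t *block, uint32_t *within)
{
    *block  = chunk_start_block + offset / MEDIA_BLOCK;
    *within = offset % MEDIA_BLOCK;
}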

So you type in your little BASIC program and hit SAVE "work of art" and ... BASIC builds a block of data - pointer to a filename, start in RAM, end in RAM (or length) - and calls the operating system (this is how OSFILE works in Acorn and my Ruby OS).  The OS searches for a free chunk big enough, rounded up to an even size for the media; the header now has an extra word with the real size as well as the whole chunk size.  It writes the data, updates the directory and that's that.  (And the search might just be a matter of returning the current 'end' pointer without doing the list traverse, so very fast.  At boot time (and file delete and merge time), you search the disc to find the last chunk, but that also ought to be relatively fast; then you know where you are.)

And so on.

This really ought to be easy to code up in 6502/816 ASM - I'd hope in under 4K plus the underlying block read/write code and off you go...

I might actually give it a go, but in BCPL as a bit of a proof of concept as I really do not think I'll be writing any more '816 code now that I can do everything in BCPL ;-)

Cheers,

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


PostPosted: Sat Feb 05, 2022 6:41 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8140
Location: Midwestern USA
drogon wrote:
I have written filing systems for various systems in the past...So, something simple - the range runs from where a cassette tape is usable, or one file per SD card (wasteful of resources), up to a full-blown classic Unix-style filesystem with superblocks, inodes and whatnot (huge software complexity, especially in ASM).  What does that leave us with?

As is so often the case with this sort of stuff, Joe's solution won't necessarily be palatable to Mary, and Mary's solution might be too complicated for George. Designing a new filesystem presents you with a trio of options analogous to what faced early adopters of railways as a means of transportation. In their case, the options were comfort, safety and speed — you can make two choices. :D

A UNIX-like filesystem can be complicated, although I don't know that doing it in assembly language — at least on the 65C816 — is necessarily more complicated than in another language suitable for system-level programming (at times, C seems clumsy to me, especially in handling bit fields — a friend once referred to C as “assembly language for dummies”). Usually, a project such as this would be designed top-down. However, already knowing the general modus operandi of filesystems and use of mass storage, I started bottom-up by writing and testing the functions needed to generate and manage the filesystem structure, e.g., creation of on-disk bitmaps. Most of the code seemed to be straight-forward and not particularly difficult to implement. There's just a lot of it, which can be mitigated with macros.

One thing that is de rigueur in filesystem code is efficient integer arithmetic functions. Along the way, I wrote a four-function library that can do 64-bit integer arithmetic (relatively easy with the 65C816), allowing my nascent filesystem code to work with high-capacity storage in a device-agnostic way. Device-agnosticism is produced by keeping the mechanics of managing a filesystem separate from the mechanics of managing the mass storage.

Everything in my filesystem code that has to perform a disk access does so through “black box” function calls, to which the caller passes four parameters: the storage device's ID, logical block address (LBA), a data transfer size and a buffer address. The LBA is a 32-bit integer that progressively addresses storage in arbitrarily-sized blocks. The block size for a device and the maximum number of blocks it can store are obtained with another “black box” call. Beyond knowing the storage device's block size and total number of blocks, the caller doesn't have to know anything about how to access the storage device. The “black box” function works out access details and makes a call to a BIOS API primitive that “talks” to the storage device.
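For illustration only, such a “black box” layer might be declared along these lines in C - the names and exact parameter types here are assumptions, not the actual API:

Code:
#include <stdint.h>

struct dev_geometry {
    uint32_t block_size;      /* bytes per block for this device        */
    uint32_t total_blocks;    /* device capacity in blocks              */
};

/* Each call returns 0 on success or a nonzero error code. */
int dev_get_geometry(uint8_t dev_id, struct dev_geometry *geo);
int dev_read (uint8_t dev_id, uint32_t lba, uint32_t blocks, void *buf);
int dev_write(uint8_t dev_id, uint32_t lba, uint32_t blocks, const void *buf);

/* The filesystem layer only ever calls the functions above; whether a
   block arrives by SCSI, SD/SPI, CF or a RAM disc is the driver's
   problem behind these calls. */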

In my case, I'm using SCSI, which means the basic method of primitive device access is the same regardless of the device type (currently my API supports random access devices, aka block devices, which includes hard drives, CD/DVD drives, and similar). While the SCSI primitive itself is complicated, it too is a black box like the higher level functions that call it.

The one problem I see with using some of the simpler filesystem proposals is they will quickly “run out of steam” as the size of the storage device increases. That gets back to the railway trichotomy I mentioned above. In this case it will be simplicity, capacity and speed — you can make two choices. :D

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Sat Feb 05, 2022 9:09 pm
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1373
I think many of us have considered a small filesystem for the 8-bit CPUs.  The ones that exist from long ago may or may not be something you want to use.  As far as the storage media goes, I think you could simply ignore the concept of a flash device and just focus on some storage device that has some number of blocks that can be randomly accessed using a block number, hence LBA.  That would be the preferred level for the (or any) filesystem to work at.  BDD's "black box" approach would make the filesystem usable with a wider range of devices.

Next would be defining and validating a set of requirements.  This would range from functional requirements such as API calls, e.g., load/save, rename/delete, create/open/close, record-level access, search, etc.  Other (non-functional) requirements would include things like how much memory (ROM and RAM) the filesystem should be limited to, supported storage size, allocated block size, how many directory entries (and how large they are), maximum number of files, flat model, number of supported drives/devices, etc.

As simple as the above appears, it won't be (simple) to come up with a set of either that a larger number of folks will agree upon.  Still, I think this becomes a useful discussion and perhaps might result in something tangible at some point.  For a starting point, I would suggest looking at an existing set of function calls... as I have it handy, I'll just list the PEM calls in DOS/65:

Code:
cmdtbl  .dw     xwboot          ;warm boot (x=0)
        .dw     xcnsin          ;console input with echo (x=1)
        .dw     sndchr          ;console output (x=2)
        .dw     simram+21       ;tape reader (x=3)
        .dw     simram+18       ;tape punch (x=4)
        .dw     simram+15       ;printer output (x=5)
        .dw     getcon          ;console input w/o echo (x=6)
        .dw     xgtios          ;read i/o status (x=7)
        .dw     xstios          ;set i/o status (x=8)
        .dw     sndstr          ;print buffer (x=9)
        .dw     bufinp          ;read buffer (x=10)
        .dw     kbdsts          ;test console ready (x=11)
        .dw     simram+45       ;read list status (x=12)
        .dw     xintds          ;initialize system (x=13)
        .dw     xchgdr          ;log in drive (x=14)
        .dw     xopen           ;open file (x=15)
        .dw     xclose          ;close file (x=16)
        .dw     xfndfr          ;find first match (x=17)
        .dw     xfndnx          ;find next match (x=18)
        .dw     xdltfl          ;delete file (x=19)
        .dw     xread           ;read record (x=20)
        .dw     xwrite          ;write record (x=21)
        .dw     xmake           ;create file (x=22)
        .dw     xrenme          ;rename file (x=23)
        .dw     xintlg          ;interrogate log in status (x=24)
        .dw     xintdr          ;interrogate current drive (x=25)
        .dw     chgdma          ;set buffer address (x=26)
        .dw     xrdalv          ;read allocation map start (x=27)
        .dw     setron          ;set r/w status (x=28)
        .dw     xrdros          ;read r/w status (x=29)
        .dw     setlst          ;set list echo status (x=30)
        .dw     lststs          ;read list echo status (x=31)
        .dw     xrtclo          ;read low clock (x=32)
        .dw     xrtchi          ;read high clock (x=33)
        .dw     xrddcb          ;read dcb address (x=34)
        .dw     simram+51       ;translate sector (x=35)
        .dw     xgsusr          ; get or set user code (x=36)


A total of 37, which includes console, reader/punch and user calls that could likely be deleted from a simple filesystem.  There are likely other calls that can be eliminated as well.  I'm not saying to use DOS/65 in a heavily modified manner, but just to get an idea of whether the PEM calls are a logical starting point.
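Pared down to just the file-handling portion, a minimal call set might look something like this (expressed as C prototypes purely for illustration; the names, argument lists and error convention are placeholders, not an existing API):

Code:
#include <stdint.h>

typedef int8_t  fs_err;       /* 0 = OK, negative = error (assumed)     */
typedef uint8_t fs_handle;

fs_err fs_open  (const char *name, fs_handle *fh);
fs_err fs_create(const char *name, fs_handle *fh);
fs_err fs_close (fs_handle fh);
fs_err fs_read  (fs_handle fh, void *buf, uint16_t len);
fs_err fs_write (fs_handle fh, const void *buf, uint16_t len);
fs_err fs_seek  (fs_handle fh, uint32_t offset);
fs_err fs_delete(const char *name);
fs_err fs_rename(const char *oldname, const char *newname);
fs_err fs_find_first(const char *pattern, char *name_out);
fs_err fs_find_next (char *name_out);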

If not, perhaps Daryl's CF flash storage system might be a starting point:
https://sbc.rictor.org/ide.html

Code:
   C0      Close all files   
   C1      list directory   
   C2      make directory   
   C3      change directory   
   C4      erase file/directory   
   C5      erase all files and empty directories in current directory   
   C6      protect file (read only attribute)   
   C7      unprotect file (read only attribute)   
   C8      rename file/directory/vol label   
   C9      open File
   CA      Read File
   CB      Write File
   CC       Set pointer
   CD      Get pointer
   CE      Close File
   CF      Test if File Exists
   D0      File Size
   D1      Read Sector    (Direct access mode)
   D2      Write Sector   (Direct access mode)
   E0      list date/time   
   E1      set date/time   
   E2      get rtc reg   
   F0      list volume label   
   F1      list free space   
   F2      format drive   
   F3      list drive info   
   F4      Mount drive (and Initialize FAT file System)


Gotta start somewhere...

_________________
Regards, KM
https://github.com/floobydust


PostPosted: Sun Feb 06, 2022 8:49 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8140
Location: Midwestern USA
floobydust wrote:
I think many of us have considered a small filesystem for the 8-bit CPUs.  The ones that exist from long ago may or may not be something you want to use.  As far as the storage media goes, I think you could simply ignore the concept of a flash device and just focus on some storage device that has some number of blocks that can be randomly accessed using a block number, hence LBA.  That would be the preferred level for the (or any) filesystem to work at.  BDD's "black box" approach would make the filesystem usable with a wider range of devices.

As I said: “simplicity, capacity and speed.” Which two would you like?

While my “black box” philosophy won't fit all needs, it does insulate the higher-level aspects of filesystem management from the relative uncouthness of the low-level system calls that access the storage device(s). One of the reasons I adopted that approach was to “future-proof” my design as the nature of mass storage changes. Right now, I'm using parallel SCSI to address my disks. Parallel SCSI has all-but-disappeared in contemporary hardware, although the standard itself is very much alive. Due to the widespread use of SCSI in servers and high-performance workstations, parallel SCSI drives continue to be available (in fact, I recently purchased one for upgrading a client's mail server). So I don't feel like I'm on a path to obsolescence...yet.

At some point, my parallel SCSI setup will have to go the way of the dodo bird.  Replacing it with SATA hardware would be a good move, if only access could be gotten to the SATA protocol specs without having to pay big bucks to join a SIG.

Quote:
Next would be defining and validating a set of requirements...

This is usually where would-be designers run into difficulty. Some requirements are obvious, some are obscure.

Quote:
For a starting point, I would suggest looking at an existing of set of function calls... as I have it handy, I'll just list the PEM calls in DOS/65...

Some of those calls are not related to filesystem management and in any case, the filesystem implemented in DOS/65 is only suitable for relatively small storage devices. However, the list does highlight some of the facilities that would be needed to support a more robust filesystem that works well with (relatively) high-capacity storage.

A few of the PEM calls are device access primitives. Since those are necessarily tightly-bound to whatever is being accessed, they don't really count as part of a filesystem. They are things that need to be insulated from the filesystem code by higher-level functions that abstract device access.

As for punching a tape...I junked my Teletype machine decades ago. :D

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Mon Feb 07, 2022 6:41 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8426
Location: Southern California
There's a lot of great material here to chew on, and I have not had time to chew on it as much as I would like to yet, but I will.  Keep the ideas coming (with an emphasis on the "super-simple" part of the topic title).  Thanks everyone.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Tue Feb 22, 2022 7:28 pm
Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
GARTHWILSON on Sat 5 Feb 2022 wrote:
  • up to dozens, maybe hundreds of files, max (256?), not tens of thousands
  • no directories (everything in root)
  • no fragmentation. If an edited file doesn't fit where it came from, a longer available area is found for it. If there is none, one or more other files are moved to get enough contiguous blank space
  • not necessarily having a FAT. I can imagine various ways to do this.
  • names can be long enough to be meaningful, unlike DOS's 8.3. A 31-character limit is reasonable. (Is that what C64 had?)


I've spent too long working on filing systems.  SheepFS1 is unsuitable for your purposes because it is intended to be packed with a virtual machine into 32KB or less.  The highlight of that system is that directory entries may be five bytes: one byte for a block pointer and four bytes for a Radix-40, six-character file name.  40 * 40 * 40 = 64000, and therefore it is possible to store three characters in two bytes.  I adapted this from DEC Radix 50 encoding where 50 * 50 * 50 = 125000.  I subsequently discovered that a 6502 disassembler uses a similar encoding and that it was copied by two or more other disassemblers.  I am perplexed that such a common algorithm has not been shared with a filing system.
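For anyone who hasn't met it, packing three characters of a 40-symbol alphabet into one 16-bit word goes roughly like this (the alphabet order and the handling of unknown characters are assumptions, not SheepFS1's actual table):

Code:
#include <stdint.h>
#include <string.h>

/* 40-symbol alphabet; 40 * 40 * 40 = 64000, which fits in 16 bits. */
static const char r40[] = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-!";

static uint8_t to_r40(char c)
{
    const char *p = strchr(r40, c);
    return (p && *p) ? (uint8_t)(p - r40) : 0;   /* unknown -> space     */
}

/* Pack three characters into one word. */
uint16_t r40_pack(const char c[3])
{
    return (uint16_t)((to_r40(c[0]) * 40 + to_r40(c[1])) * 40 + to_r40(c[2]));
}

/* Unpack a word back into three characters. */
void r40_unpack(uint16_t w, char out[3])
{
    out[2] = r40[w % 40]; w /= 40;
    out[1] = r40[w % 40]; w /= 40;
    out[0] = r40[w % 40];
}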

Anyhow, I believe that SheepFS2 meets your specification. It is suitable for 16MB-1TB of storage. It is very loosely based on Acorn DFS and the Eris 2010 filing system. This arrangement assumes compatibility with primary partition tables and therefore all sector references will be offset by 64 or significantly more.

gfoot on Sat 5 Feb 2022 wrote:
given that the size of modern media dwarfs the capabilities of the average 6502-based system, we could easily use fixed size records for file storage, accept a hard limit on file size, and do away with any real need for fragmentation and complex allocation tables. If you are willing to limit the total file count to 256, you can still allow a maximum file size of many megabytes on even the smallest SD card. ... Acorn's DFS format essentially stores an alphabetically-sorted list of filenames in the first 256-byte sector, with corresponding metadata in the second sector. ... If you're willing to accept a limit of 256 files on the device, cluster numbers are just bytes, which is great for a 6502, and it would be easy to use the "empty" directory entries to track unused cluster numbers.


Divide a partition into buckets of 64KB or larger. Bucket size determines maximum file size. So, nominally, maximum file size is 64KB. There is a 1:1 mapping between file content and bucket. There is also a 1:1 mapping between bucket number and meta-data. Specifically, bucket zero is special and contains 255 (or more) pages of 256 bytes. Each page has the meta-data for one bucket/file. No fragmentation. No FAT. No BAM. Furthermore, you may note the similarity with 65816 where bank zero is special. This is intentional to aid understanding. The major difference is that bucket size >= bank size for the purpose of using more than 16MB storage.

The first half of each meta-data page is the file name in Radix-40 encoding.  (Hey, I like Radix-40 encoding.)  This allows 192-character names to be stored in 128 bytes.  The second half of each meta-data page is 32-bit fields or thereabouts.  This includes 32 bits for house-keeping, 32 bits for an optional fragment number, 32 bits for an optional previous record, 32 bits for an optional next record, 32 bits for an optional payload checksum, 32 bits for an optional meta-data checksum, 24 bits for flags, 40 bits for an optional time-stamp, 32 bits for an optional user number, 32 bits for an optional group number, 32 bits for file length, 32 bits for an optional load address and 32 bits for an optional execute address.  (I hope these meta-data fields are sufficient for BigDumbDinosaur and drogon.)  Obviously, all fields are in little-endian order.
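Read literally, one of those 256-byte meta-data pages could map onto a structure like this (a sketch of one reading of the description only - the field order and the trailing padding are assumptions, and real on-disc code should serialize fields explicitly rather than trust compiler struct layout):

Code:
#include <stdint.h>

struct sheepfs2_meta {            /* one 256-byte page per bucket/file   */
    uint8_t  name_r40[128];       /* up to 192 chars, Radix-40 packed    */
    uint32_t housekeeping;
    uint32_t fragment;            /* optional                            */
    uint32_t prev_record;         /* optional, for display ordering      */
    uint32_t next_record;         /* optional, for display ordering      */
    uint32_t payload_cksum;       /* optional                            */
    uint32_t meta_cksum;          /* optional                            */
    uint8_t  flags[3];            /* 24 bits of flags                    */
    uint8_t  timestamp[5];        /* 40-bit time-stamp, optional         */
    uint32_t user;                /* optional                            */
    uint32_t group;               /* optional                            */
    int32_t  length;              /* negative = entry not valid          */
    uint32_t load_addr;           /* optional                            */
    uint32_t exec_addr;           /* optional                            */
    uint8_t  reserved[76];        /* pads the page out to 256 bytes      */
};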

If meta-data entry M describes negative length then the name is not valid and bucket M does not contain useful data. If meta-data entry M describes zero length then the name is valid and the file has zero length. If meta-data entry M has positive length then the name is valid and bucket M contains useful data. If length exceeds bucket size then meta-data is corrupted.

Meta-data entry zero maps to bucket zero and this bucket is the meta-data table itself. Therefore, page zero of bucket zero has meta-meta-data which includes a unique "magic number" to identify the class of filing system, a unique number to identify the instance of filing system, version number for the filing system format and bucket size (in bits where 16 means 64KB).

In the default case, where buckets are 64KB, 256 buckets * 64KB per bucket = 16MB. This fits on many small, crufty CF cards. When bucket size is doubled, maximum file size is doubled. However, when bucket size doubles, this includes meta-data bucket zero. Therefore, the number of bucket references also doubles. Overall, each additional bit of bucket address adds two additional bits to filing system size. For example, 512 buckets * 128KB per bucket = 64MB. 1024 buckets * 256KB per bucket = 256MB. Most squanderously, 65536 buckets * 16MB per bucket = 1TB. This covers a wide range of volume sizes with relative efficiency.

In the default case, actions such as directory listing, file creation or file deletion may require a linear scan of 64KB. For larger filing systems, this may increase significantly. This may be a problem if alphabetical listings are required. It is for this reason that the meta-data includes non-critical previous/next fields. This allows ordering of the records to be maintained for display. However, it is not critical if this chain is corrupted or absent.

In the case of smaller volumes, I suggest an optional extension where 16MB of text may be stored between bucket zero and bucket one. This would allow lines of repetitive text to be stored with 3 byte references. In the most trivial case, where 85 pointers do not span a page boundary, 21760 references may be made to lines of any length without exceeding a 64KB file limit. This would be incredibly useful for source code where the majority of text does not change between versions. Furthermore, it is possible to implement this in a manner which remains compatible with a GETCHR/PUTCHR interface. Unfortunately, insertions are O(n^2) and this will slow as storage increases. However, unless we collectively intend to type more than 16MB of unique lines, we will never exhaust this area. This also means we never have to compact it.

I previously implemented 3 byte line references on Commodore Amiga for the purpose of saving space on 880KB floppy disks. The trivial implementation was unbelievably slow. This could be significantly improved by computing an 8 bit hash of a line and reducing the problem by a factor of 256. Specifically, this would guarantee that each linear scan is 64KB or less.

Despite this being a trivial filing system, there are multiple places where an integrity check may be performed. For example, sequential write to file may occur in parallel to a rigorous, 32 bit checksum calculation. Likewise, the optional line reference extension stores lines by hash value. Therefore, any line with a mis-matched hash is obviously in error.

I don't think that any read operation requires more than four sectors to be resident in memory. In particular, files may be statelessly accessed by numerical reference to protocol multiplex, device number, partition number, file number and file offset. This allows all files on all volumes to be "open" simultaneously without increasing the minimum number of sectors resident in memory.

If this filing system is of interest and we can decide the final arrangement for meta-data fields, I can write an archiving utility to collate or extract files. I don't expect an archiver to be a command line script larger than 10KB.

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!


PostPosted: Sun Feb 27, 2022 2:13 pm
Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
Sheep64 on Tue 22 Feb 2022 wrote:
sequential write to file may occur in parallel to a rigorous, 32 bit checksum calculation.


I have a technique to implement optional LDPC on random access files. It requires significant space for the checksum. This wouldn't be a problem if a file name is restricted to 96 characters or less.

I also have a technique for optional sub-directories. One bucket contains a name-space of files with corresponding buckets for file content. One bucket contains a name-space of directory names without corresponding buckets. Directories have numbers. Directories belong to other directories and directory zero indicates membership of the top level directory. Files also have directory numbers where zero indicates inclusion in the root directory.

Assuming that none of the file names clash (and assuming that directory names do not clash with file names), a simple system will see all of the file names collapsed down to one flat directory. Whereas, a more advanced system will see the optional directory structure. Overall, it is possible to migrate a filing system from a flat table of contents to a hierarchical scheme via named inodes.
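A sketch of how those directory numbers might be stored (field sizes are guesses; the point is only that a flat-directory reader can ignore the extra bucket entirely):

Code:
#include <stdint.h>

struct dir_name_entry {           /* lives in the directory-name bucket  */
    uint8_t  name_r40[128];       /* directory name, Radix-40 packed     */
    uint32_t dir_number;          /* this directory's own number         */
    uint32_t parent;              /* directory it belongs to, 0 = root   */
};

struct file_dir_ref {             /* added to each file's meta-data      */
    uint32_t dir_number;          /* 0 = file lives in the root          */
};

/* A simple system ignores the directory bucket and sees one flat name
   space; a fancier one rebuilds the hierarchy from dir_number/parent. */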

Anyhow, this scheme requires one optional bucket. It might be worthwhile to define a set of optional buckets. This would provide upward/downward compatibility and allow simple systems to skip optional, advanced features.

drogon on Sat 5 Feb 2022 wrote:
FEEDBEEF ... BEADCAFE


SheepFS1 has CAFEFACE.  I previously worked in an Internet cafe.  It is also a variant of Boaty McBoatface or, my favorite, Floaty McFloatface.

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!


PostPosted: Tue Mar 22, 2022 12:18 pm
Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
I forgot to apply network principles to filing systems. Specifically, it is possible to implement an optional parity check on sequential files and random access files by using a Hamming code and its dual. For 2^N words of storage, it requires 2N+1 words of checksum. 16MB buckets require 49 byte checksum. This is quite verbose but file names could be reduced to 64 bytes or so. If you want to add fancy features in the future, the checksum is compatible with sparse files, de-duplicated files and encrypted filing systems.

I also considered a Radix101 encoding where it is possible to store six Roman, Greek or Cyrillic characters within five bytes. I know that's a marginal saving but I considered it.

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!

