Adventures in FAT32 with 65c02

Programming the 6502 microprocessor and its relatives in assembly and other languages.
Post Reply
hoglet
Posts: 367
Joined: 29 Jun 2014

Re: Adventures in FAT32 with 65c02

Post by hoglet »

barnacle wrote:
And so, after that minor mathematical misdemeanour, I tried it with the call to putchar removed. It now takes ten seconds to complete (mechanically timed with an analogue stopwatch!) so I'm reading 33.6kB/second. Which feels like it could be fast enough for something like virtual memory for a file editor or similar, and certainly not an issue for file save/load for files that will fit into the 64k memory space.
Is this at a 65c02 clock speed of 1.8432 MHz?
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

Yes
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

The inner loop reads each byte of the sector, updates the file pointer, and compares it with the file length, before sending it to serial. (Approximate timings).
  1. with everything included, 30 seconds
  2. with the call to putchar deleted, 10 seconds
  3. with the entire inner loop removed, about 3 seconds
That includes following the whole FAT chain whenever I need a new cluster. There's effectively no seek time on a solid state drive, so we're looking something around 110kB/second for raw read times.

Neil
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: Adventures in FAT32 with 65c02

Post by BigDumbDinosaur »

barnacle wrote:
There's effectively no seek time on a solid state drive, so we're looking something around 110kB/second for raw read times.
Seems like a pretty good speed.

If it is doing 110 KB/sec at 1.8432 MHz, extrapolating that to the 16 MHz at which my POC unit is running would theoretically improve the raw transfer rate to about 954KB/sec.  That’s better than the 710 KB/sec I am seeing with POC V1.3’s SCSI subsystem, although disk characteristics are not a factor with that.  The host adapter (HBA) has to be wait-stated and with two accesses to the HBA hardware per byte transferred, I figure I’m losing about 25 percent of the transfer rate that would be possible without wait-states.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

With my current design, I feel I'm constrained to 2MHz for the UART's speed limit. I have only a simple design, with nothing by way of clock stretching. Though I may stick a 3.6864MHz oscillator in; I have some but I don't expect success. There's also the speed at which the CF might be accessed; the datasheet I have says 300ns for 5v, but it's a very old datasheet. More modern systems would use a DMA subsystem, too, and sixteen bit access...

Meanwhile, I'm thinking about how to rewrite things so I can use an allocated buffer rather than the static transient, and how much the double indirection is going to slow things down. As I said previously, there's an awful lot of 32 bit arithmetic, and at the moment it all uses a hard coded lba uint32 as a source or a target.

You can't easily do

Code: Select all

uint32 arithmetic (uint32 * p1, uint32 * p2)
{
}
in 65c02 land.

Neil
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: Adventures in FAT32 with 65c02

Post by BigDumbDinosaur »

barnacle wrote:
There's also the speed at which the CF might be accessed; the datasheet I have says 300ns for 5v, but it's a very old datasheet.
If you had a logic analyzer handy, it would be a simple matter to determine how quickly the CF responds to an access.
Quote:
More modern systems would use a DMA subsystem, too, and sixteen bit access...
DMA would run no faster than the CF can run, but would still be faster than banging it with the C02 as you are doing.  The lack of DMA in my POC unit has a lot to do with the limited SCSI transfer rate it achieves.  The bus itself can run at 3.5 MB.sec in async mode, and up to 10 MB/sec in sync mode.  A typical DMA controller reads on one cycle and writes on the next.  In my POC application, that could theoretically produce 5 MB/sec throughput, disregarding device latency.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

Another advantage of a 16 bit system would also be enough spare memory to load an entire cluster in one hit.

Code: Select all

cf_r2:
			lda CFREG0
			sta (trans_ptr),y
			iny
			bne cf_r2
is about as tight as it can go for the innermost loop, but it's still 4+6+5+2 = 17 clock cycles per byte, eight us.

Neil
User avatar
Dr Jefyll
Posts: 3525
Joined: 11 Dec 2009
Location: Ontario, Canada
Contact:

Re: Adventures in FAT32 with 65c02

Post by Dr Jefyll »

Hmmm.. wouldn't the cycle count be more like 4+6+2+3 = 15 clock cycles per byte? But I know it's late at night where you are, Neil, so maybe you're getting sleepy. :wink:

As for "about as tight as it can go," there is still some wiggle room, as you imply. For example, it's probably permissible to unroll the loop so that the jump back to the top executes only once for every 2 (or 4 or 8 ) bytes rather than once for every byte.

Code: Select all

cf_r2:
			lda CFREG0       ; get a byte and move it
			sta (trans_ptr),y
			iny
			lda CFREG0       ; get another byte and move it
			sta (trans_ptr),y
			iny
			bne cf_r2        ; *now* test for exit
One could also save one cycle per byte by mapping CFREG0 into Zero page, although not everyone has oodles of free space in Z-pg. (I do.)

-- Jeff
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: Adventures in FAT32 with 65c02

Post by BigDumbDinosaur »

Dr Jefyll wrote:
Hmmm.. wouldn't the cycle count be more like 4+6+2+3 = 15 clock cycles per byte? But I know it's late at night where you are, Neil, so maybe you're getting sleepy. :wink:...
Another possibility might be to use self-modifying code and before entering the loop, writing the base address into what is right now sta (trans_ptr),y.  The main loop would be able to use somewhat-faster absolute-indexed addressing instead of indirect-indexed addressing.  I use that technique in the DMA transfer loop in my POC unit’s SCSI driver.
Quote:
...it's probably permissible to unroll the loop so that the jump back to the top executes only once for every 2 (or 4 or 8 ) bytes rather than once for every byte.
On paper, that would seem to be a good idea. However, it assumes that the transfer quantity will be powers-of-two, which won’t always be the case.  For example, getting status from the device might return an odd number of bytes.
Last edited by BigDumbDinosaur on Wed Dec 24, 2025 6:56 am, edited 1 time in total.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

Yeah, obviously sleepy time. I have no idea where the last numbers came from, I can't even see them this morning on the table I was looking at...

This is code that lives in ROM space. But it's always going to be a power of two, so unrolling is a possibility. I think it will take a lot of unrolling to make a significant difference, but every little helps (premature optimisation!).

The good news is that I think I have an easy way to pass in a source/destination for the data transfer, so I'm not restricted to transient. The bad news is that I'm stuck with the 32-bit file pointer increment and compare I mentioned earlier, for f_xxx routines that need to know about file position.

Oh well.

There is little pain moving to the next sector, just an increment for lba, but the next cluster requires converting sector to cluster, going back to the FAT (so at least one and possibly two sectors read there) and converting back to a sector before actually reading it. Navigating backwards requires tracing the FAT from the start of the file, which I'm not looking forwards to at all.

Neil
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

Rip 'em up and Start Again
I've currently arrived at the stage of considering ripping things up and starting again... er, I mean, refactoring to reconsider some basic assumptions. There will likely be an amount of moving fast and breaking things...

Where I am at the moment:
  • I can create a named file
  • I can delete a named file
  • I can list a given directory
  • I can cat a named file (ideally a text file, but anything works)
which means basically I can manipulate the FAT sectors with speed and despatch. It does take rather more 32-bit arithmetic than I would like, but it all seems to work.

But...

This is fine as long as I only ever want to open one file at a time. Unfortunately I can see at least a handful of use cases where that's not enough:
  • Copying from one file into another
  • Operating e.g. a text editor with a working file larger than available memory
  • Anything where it might be handy to have a .bak or .tmp file
  • Something like an assembler, which might require multiple .asm files as input, with .hex and .lst as outputs
There are no doubt others. So there seems merit in C's FILE * system; before a named file is used, it must be opened for reading or writing and a file structure populated defining all sorts of information about the file in question. That suggests that multiple buffers be used into or from which data is transferred during file reads and writes.

I _could_ simply move from transient to a named buffer each time a sector is read (and the reverse, obviously), but that has two serious implications: firstly, a simple read of a buffer takes twice as long since, and secondly, every access to a file requires reloading at least the current sector, since we don't know what else has been done in the meantime.

So cf_read changes so that instead of using a fixed address of transient as a target, a pointer is set before each call. (I also ask cf_read to do the cf_set_lba before each read, since it's required every time and I previously had it called before the cf_read call anyway (when I remembered)). cf_write changes similarly. Indeed, those two routines are so similar I may combine them and simply set a flag to decide whether a read or write is desired. I'm not yet sure whether it's worth it.

At the moment, f_create, f_del, f_cat and f_dir don't require FILE structures and can work entirely within transient, and all the FAT record tracing works within transient, and I don't see a reason to change that.

FILE structure
This structure is still very tentative. It's also likely to be quite large, so it will have to live in main memory, not zero page. However, first thoughts indicate that it should contain the following pointers/flags:
  • Read/write flag
  • Dirty flag (set if the sector has been written to and so requires a rewrite to disc) (note that both these flags are contained in the directory entry's attribute byte)
  • Buffer address (uint16, though if I insist on a page boundary alignment it could be just one byte)
  • File Pointer (the position we're currently reading/writing from/to)
  • Current cluster (uint32)
  • Current sector (uint32) which also provides a useful indicator if we need a new cluster
  • Copy of the directory entry - 32 bytes, but has lots of useful stuff in it, at least some of which I'll need like the filename and starting cluster
f_open
So f_open has to perform a number of tasks.
  1. Check whether a file is already open, and fail if it is (or maybe not; I can see where it might be handy to open the same file for both read _and_ write operations, but I'm not sure I want to get into that yet.)
  2. Check whether the file exists, and fail if it doesn't (or call f_create?)
  3. Copy the file's directory entry into the FILE structure
  4. Zero the file pointer entry and current cluster
  5. Allocate a memory buffer, and note its address
  6. Finally, load the sector into memory
I'm sure more will occur to me later, but the first step is to test that everything works with modified cf_read and cf_write routines.

Neil
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: Adventures in FAT32 with 65c02

Post by BigDumbDinosaur »

barnacle wrote:
Rip ’em up and Start Again
I’ve currently arrived at the stage of considering ripping things up and starting again... er, I mean, refactoring to reconsider some basic assumptions.  There will likely be an amount of moving fast and breaking things...
Funny how that procedure seems to pop up time and time again in software development.  :D  When I was developing my SCSI driver, I think I filled up the metaphoric wastebasket (“bin” to you guys) several times with the remanents of the previous night’s code.  :shock:
Quote:
Where I am at the moment:

  1. I can create a named file
  2. I can delete a named file
  3. I can list a given directory
  4. I can cat a named file (ideally a text file, but anything works)


which means basically I can manipulate the FAT sectors with speed and despatch.  It does take rather more 32-bit arithmetic than I would like, but it all seems to work.
You are much further along with this little exercise than you might think.  The above procedures cover much of what is routinely done in a filesystem.  Many other operations can be built up from combinations of those procedures.  For example, you can copy a file by using procedures A (I modified your list to make it display as ABC, etc.), part of C, D, and a write-to-file procedure.  Similarly, you can rename a file using A and B, and crosslinking the new filename to the old, followed by deleting the old name from the directory without releasing the clusters belonging to it.
Quote:
This is fine as long as I only ever want to open one file at a time. Unfortunately I can see at least a handful of use cases where that’s not enough:

  • Copying from one file into another
  • Operating e.g. a text editor with a working file larger than available memory
  • Anything where it might be handy to have a .bak or .tmp file
  • Something like an assembler, which might require multiple .asm files as input, with .hex and .lst as outputs


...a named file...must be opened for reading or writing and a file structure populated defining all sorts of information about the file in question.
Managing multiple open files isn’t too tough—it’s just an exercise in organizing some data.

The key structures are multiple “file control blocks” (FCB) and a table of 16-bit pointers (FCBTAB) to point to in-use FCBs.  The FCBTAB would be a fixed size, which when divided by the size of a pointer, determines the maximum number of files that may be simultaneously opened.  You would, of course, also have to have as many FCBs as slots in the FCBTAB.  With care, an FCB can be kept down to a reasonable size...that will be important in a 65C02 system, which is inherently memory-challenged.

Quote:
At the moment, f_create, f_del, f_cat and f_dir don’t require FILE structures and can work entirely within transient, and all the FAT record tracing works within transient, and I don’t see a reason to change that.
However, you will likely find it easier to work with a buffer pool to be used with open files and a buffer dedicated to manipulation of the FAT itself, keeping in mind that FAT manipulation must be atomic.  Such an arrangement has the potential to reduce the number of medium accesses needed to maintain the file system as different files get read and written.

Usually, there are two layers to this sort of thing: the disk buffer pool and the filesystem buffer pool.  In a uni-tasking system, the disk buffer pool can be small—two buffers would be more than adequate, whereas the filesystem buffer pool would have to have enough buffers to work with the maximum number of files that may be simultaneously opened, unless you are prepared let open files share buffers.

Quote:
FILE structure
This structure is still very tentative. It’s also likely to be quite large, so it will have to live in main memory, not zero page. However, first thoughts indicate that it should contain the following pointers/flags:

  • Read/write flag
  • Dirty flag (set if the sector has been written to and so requires a rewrite to disc) (note that both these flags are contained in the directory entry’s attribute byte)
  • Buffer address (uint16, though if I insist on a page boundary alignment it could be just one byte)
  • File Pointer (the position we’re currently reading/writing from/to)
  • Current cluster (uint32)
  • Current sector (uint32) which also provides a useful indicator if we need a new cluster
  • Copy of the directory entry - 32 bytes, but has lots of useful stuff in it, at least some of which I’ll need like the filename and starting cluster
In the above list, items which would be in each open file’s FCB, the file attributes (the DIR_Attr directory field), read/write and dirty flags can be in a bit field, so only one byte.  You will also need the timestamp fields (DIR_CrtTimeTenth, DIR_CrtTime, etc.) if writing to the file, and the DIR_FileSize so you can update the file’s size as it is modified.  I see no need for having the filename in the FCB, as once the file has been opened, your system will (should) assign each open file a number for use by the user program.

The file pointer should be file pointerS, as the read and write positions will likely be different for a file that has been opened for reading and writing.  You would need to track both to avoid accidentally mangling a file when a read at one location is immediately followed by a write to another.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

Some food for thought, there, BDD; thanks.

In the interest of speed, I'm envisaging a system whereby once a file has been opened for reading, it will load the first sector into the buffer. That lets me move rapidly forwards through it (a likely use case, I think) with 'instant' access to the first sector, 'speedy' access to the next seven sectors (which are sequential), and reasonably quick access to the next cluster. Movement backwards within a cluster is also similarly speedy, though it gets slower when I get to the start of the cluster and have to start again from the first.

For writing, I think I'm looking by default as 'append' and will start with the last sector loaded in the buffer, and the file position pointer pointing at the first available space. It would still be possible to move the file position point to other points in the file for overwriting existing data.

But in either case, I need that every open file has its own buffer. But I think I like the list of pointers to FCBs.

One reason for keeping a copy of the file's directory entry in the FCB is that it holds pretty much all I need by way of control data for the file (and for when it's time to write it back if things change). Conveniently, that holds the filename in the correct format for searching through the directory so where I know where the data goes back (if required).

Neil (still digesting)
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: Adventures in FAT32 with 65c02

Post by BigDumbDinosaur »

barnacle wrote:
Some food for thought, there, BDD; thanks.
Think of it as a followup to Christmas dinner.  :D  Here’s some music to inspire your programming journeys, as well as aid your digestion.  :D
Quote:
In the interest of speed, I'm envisaging a system whereby once a file has been opened for reading, it will load the first sector into the buffer...
I expect that most of the compute-bound processing will go plenty fast, even though you have to do your arithmetic a byte at a time.  The bottlenecks will be in bridging the chasm between the most fundamental characteristic of a file—it’s an array of bytes—and the fundamental characteristic of a disk-like storage device—it’s an array of blocks.

This whole thing begs for a layered approach.  You have a disk driver which knows about the mechanics of accessing the storage medium—the array of blocks, but knows nothing about how that array of blocks is being used.  You also have a filesystem driver which knows how the array of blocks is being used, but knows nothing about the mechanics of accessing the storage medium.  The key difference is the FAT filesystem sees mass storage in terms of cardinally numbered clusters, but the computer sees mass storage in terms of cardinally number blocks.

The disk driver is always subservient to the filesystem driver, the latter which tells the former what to do by passing four pieces of information:

  • Logical block address (LBA).
     
  • Number of contiguous blocks to be accessed at that LBA.
     
  • Buffer pointer.
     
  • Operation: read or write.


In order to give the disk driver that information, your FAT filesystem driver must translate an open file’s read/write pointer into a cluster and an offset within the cluster.  Below that step will be another one that converts the cluster’s cardinal number within the filesystem into LBA and number-of-blocks parameters which, along with a buffer pointer and the read/write flag, gets fed to the disk driver.  The disk driver does its business and when finished, tells the filesystem driver that all is well...or not.  The SCSI library I wrote for my POC unit basically works in that fashion—the caller has no idea what is going on inside the driver, so much so, in fact, that none of the temporary workspace used in the driver is even visible (it’s all on the stack, as are the parameters fed to the driver by the caller).

With this sort of segregation, your FAT filesystem package can be made to work with virtually any random-access storage medium.  The disk driver is specific to the medium type, and the filesystem driver only knows what it knows about how the storage is being used.  With the right disk driver, you could even support the old C-H-S method of disk addressing.

Quote:
But in either case, I need that every open file has its own buffer. But I think I like the list of pointers to FCBs.
The FCBTAB method makes looking up an open file straightforward task—just double the file number and you have an index into the table.  Copy what is in the table at that index to zero page and you have a pointer to the FCB.  Easy-peasy, as they say.  :wink:

Obviously, each open file having its own buffer (as well as the FCB) could get expensive, memory-wise.  Having more buffers potentially means less-frequent medium accesses, but more clock cycles being eaten up with buffer management.  Then there is the potential data loss if a lot of buffers are dirty and the machine goes belly-up...fsck to the rescue!

Compromises, compromises!  :?
x86?  We ain't got no x86.  We don't NEED no stinking x86!
jgharston
Posts: 181
Joined: 22 Feb 2004

Re: Adventures in FAT32 with 65c02

Post by jgharston »

barnacle wrote:
FILE structure
This structure is still very tentative. It's also likely to be quite large, so it will have to live in main memory, not zero page. However, first thoughts indicate that it should contain the following pointers/flags:
  • Read/write flag
  • Dirty flag (set if the sector has been written to and so requires a rewrite to disc) (note that both these flags are contained in the directory entry's attribute byte)
  • Buffer address (uint16, though if I insist on a page boundary alignment it could be just one byte)
  • File Pointer (the position we're currently reading/writing from/to)
  • Current cluster (uint32)
  • Current sector (uint32) which also provides a useful indicator if we need a new cluster
  • Copy of the directory entry - 32 bytes, but has lots of useful stuff in it, at least some of which I'll need like the filename and starting cluster
The 6502 filing system I wrote has the following 24-byte info block for each open file:

Code: Select all

24-bit channel workspace:
 00   01 02 03 04  05 06 07 08  09 0A 0B 0C  0D 0E 0F   10 11  12 13 14   15  16   17
Flags <--Start-->  <---PTR--->  <---EXT--->  <--DIR--> <DskID> <-Alloc->  <fptr> <Buff>
  |             \--drive
  \-- OUT:IN:EOF:D:W:R:written:read
You may not need a copy of the directory entry, I use a pointer to the directory entry: <DIR> which is the directory ID and <fptr> which is the offset into the directory.
Quote:
f_open
So f_open has to perform a number of tasks.
  1. Check whether a file is already open, and fail if it is (or maybe not; I can see where it might be handy to open the same file for both read _and_ write operations, but I'm not sure I want to get into that yet.)
The standard principle is "read many, write once", so if a file is open for reading, it can be open again for reading as many times as you have resources, but only for reading. But any file that is open for writing blocks any further opening of any type. So, opening for update (read *and* write) blocks that file from being opened again as it is open for writing.
Quote:
[*] Check whether the file exists, and fail if it doesn't (or call f_create?)
You make the action dependent on what the caller wants. Open for input - just tell me there's no file. Open for update - also just tell me there's no file there. Open for output - create the file if it's not there. You have choice of open for output either returning saying "can't, there's already a file" and requiring the user to delete it, or doing the deletion itself. Take your guidence from pre-existing FAT filesystem APIs. In general, you should minimise how much of your calls fail, they should just pass a result back to the caller so that the caller can decide what to do.
Post Reply