Page 2 of 14

Re: Adventures in FAT32 with 65c02

Posted: Sat Dec 06, 2025 10:52 pm
by BigDumbDinosaur
barnacle wrote:
BigDumbDinosaur wrote:
In general, to send an instruction to the CF, registers are preset with required data and then a command is issued to CFREG7.  Often is is necessary to wait for the CF to complete a previous instruction and/or to become ready...I’d be inclined to write that as...
Well, they do say that premature optimisation is generally not a good idea, but I take your point about speeding this part up.
I usually subscribe to the notion that premature optimization is...er...premature.  :mrgreen:  I get the code working first and then go on an optimization spree.  However, when I write code that has to touch I/O hardware, I seem to immediately look at it for succinctness and write what appears at the time to be the most optimized code.
Quote:
Though I get the feeling that the BIT instruction is one that confuses some (many?) people... it certainly confuses me.  Since BIT ANDs A with the target, don't you need to set A first?
BIT does AND memory with .A, which operation conditions z in SR (status register).  However, BIT on memory also copies bits 7 and 6 of memory to the n and v bits, respectively, in SR.  Hence BIT CFREG7 implicitly sets or clears n and v according to CFREG7’s state.  The automatic conditioning of n and v feature makes BIT very attractive for use in device drivers.

The 65C02 (and 65C816) also offers BIT immediate, which is handy for non-destructively testing whatever is loaded into .ABIT immediate doesn’t affect n or v, only z.

Quote:
(I have a suspicion that a 6502 is not going to be waiting for a CF capable of tens of megabytes per second!)
Maybe not, but, needless to say, that’s not something you should rely on in your driver design.
Quote:
One easy way to speed this up is to inline the test with a macro; that's what, ten cycles saved on the call/return? And I still need to test further to see if the bit 6 test is actually required. But at the moment, I will settle for clarity.
The JSR - RTS combination costs 4 bytes and 12 cycles.  In-lining the CF_WAIT function via a macro could save quite a few clock cycles during a block read/write, at the minor expense of 8 extra bytes per call.  Assuming a block is 512 bytes, the JSR - RTS combination to run CF_WAIT would add 6144 cycles to the total “cost” of a block read or write transaction.

However, going that route this early in the project would make you guilty of premature optimization.  :D

Re: Adventures in FAT32 with 65c02

Posted: Sun Dec 07, 2025 7:01 am
by AndrewP
Just thought I'd mention I've posted a (C++) FAT32 implementation here on the forum. Don't feel compelled to read it unless you've run into a bug that the FAT32 spec is not helpful on - I know how much energy it takes to read someone else's code.

Re: Adventures in FAT32 with 65c02

Posted: Sun Dec 07, 2025 9:40 am
by barnacle
Thanks Andrew; there is certain to be an amount of trial and error here and further references are always useful. However, I'm not greatly familiar with C++, so we'll see :mrgreen:

@BDD: it is a premature optimisation, true, but as you point out inlining it just in the sector transfer block saves an easy three and a half milliseconds per transfer (at my conservative 1.8MHz clock). That might end up as a noticeable difference (particularly if it turns out to be unnecessary to test bit 6, as well). I'll probably do some informal timing tests at some stage.

An idle (very lazy, without looking at the loop code!) calculation gives an estimate for the transfer of perhaps six or seven, call it ten to be safe, milliseconds per sector; so a hundred sectors per second, 500k baud rate equivalent? Faster than Kansas City CUTS format, then :)

Neil

Re: Adventures in FAT32 with 65c02

Posted: Sun Dec 07, 2025 11:28 am
by barnacle
The Volume Information Block
In the last installment, we followed the disc structure from the MBR (master boot record) to the first partition and the Volume Information Block (VIB) which is always the first sector in the partition. There's usually some unused space between the MBR and the actual partitions; I don't know why, but on my CF cards formatted as FAT32 by Linux, the first (and only) partition started at sector $800. That's nearly a MB of unused space, shocking! But your mileage may vary; that's why you need to check the MBR to find out.

Caveat regarding error checking
You will have noticed a complete lack of error checking in the code I've shown so far, whether it's the error return from CF accesses, or validation of the on-CF structures. That's because I formatted the discs; I know what's on them; and for _my_ safety I've confirmed what I need with examination of the disc image.

This is not good programming practice.


You should make those checks. You should know whether you have a CF installed or not, whether it's the one you started with, whether it has the file system you expect, and so on. I haven't made those checks here for a handful of reasons: (a) I'm lazy; (b) I'm trying to show the basics of the system, and test code can add a lot of visual clutter at assembler level; and (c) I'm lazy.

You have been warned. I've tried to point out the obvious check places as we go along, but in the filesystem I _do_ make a lot of assumptions about sizes and options because I know what I formatted.

Decoding the VIB
To be able to interract with the file system, we have to know where everything lives. To save directory space, at the cost of more potentially wasted space in the file storage, files are stored in clusters - sequential groups of sectors. The number of sectors in a cluster is variable but always a power of two; for FAT32 it is almost always eight. The cluster is the smallest unit of storage you can allocate to a file; if the file is one byte long it will occupy an entire cluster of 4kB. If it's 4097 bytes, it will occupy two clusters, and so on.

The FAT32 system consists of the MBR (which we've discussed), one to four partitions (of which we're using only one, though the others work the same way - though I'm not going into child partitions), and then the filesystem itself. Within the partition we find, in order, the VIB, some reserved sectors, (usually) two copies of the File Allocation Table (after which the format is named) and then the clusters that hold the data. There may be a handful of unused sectors at the end if things don't quite add up.

The VIB contains the information you need to get to a file; to things you have to calculate are the LBA of the start of the first FAT and the LBA of the start of the clusters.

Offset $b is a UINT16 word telling you the number of bytes per sector. This has been 512 ($0200) for every CF I've looked at, but there may be cases where it isn't, so you should check.
Offset $d is a UINT8 byte giving the number of FATs. Again, I assume this to be true and don't actually look.
Offset $2c is a UINT32 word indicating the FAT record number of the root directory. I don't know why they do this because the spec says the root directory record is always 2, so that's what I assume.

Now we get to the bits we have to calculate...

Offset $0e has a UINT16 word indicating the number of reserved sectors
Offset $24 has a UINT32 word with the number of sectors per Fat.

Code: Select all

bytes_per_sector	equ 512		; assumed - offset 0x0b-c
fat_copies			equ 2		; assumed - offset 0x0d
root_sector			equ 2		; assumed - offset 0x2c
sectors_per_fat_ptr	equ transient + 0x24	; uint32
reserved_ptr		equ transient + 0x0e	; uint16


	bss
; global static variables
partition_begin_lba	ds 4		; holds the partition address on entry
;sectors_per_fat		ds 4    ; we might need this later, but not quite yet
fat_begin_lba:		ds 4
cluster_begin_lba:	ds 4
We need to know the LBA of the start of the first FAT, and the LBA of the start of the clusters.
The start of the first FAT can be found by adding the reserved sectors to the start of the partition:

Code: Select all

	; add reserved sectors to partition start to get the FAT start
	clc
	lda reserved_ptr
	adc partition_begin_lba
	sta fat_begin_lba
	lda reserved_ptr+1
	adc partition_begin_lba+1
	sta fat_begin_lba+1
	lda #0
	adc partition_begin_lba+2
	sta fat_begin_lba+2
	lda #0
	adc partition_begin_lba+3
	sta fat_begin_lba+3	
Now we know where the FAT starts, we can advance to the clusters. For that, we need to multiply the number of clusters per FAT by two, and add it to fat_begin_lba. Remember that there could be any number of FAT copies, but the spec says just two, so that's what I'm going with...

Code: Select all

	; the data clusters being sectors_per_fat * number of fats later
	lda sectors_per_fat_ptr
	sta cluster_begin_lba
	lda sectors_per_fat_ptr+1
	sta cluster_begin_lba+1
	lda sectors_per_fat_ptr+2
	sta cluster_begin_lba+2
	lda sectors_per_fat_ptr+3
	sta cluster_begin_lba+3
	; that's sectors per fat
	; we should really check the actual count, but assume two fats
	asl cluster_begin_lba
	rol cluster_begin_lba+1
	rol cluster_begin_lba+2
	rol cluster_begin_lba+3		; multiply by two
	; now add the cluster start
	clc
	lda cluster_begin_lba
	adc fat_begin_lba
	sta cluster_begin_lba
	lda cluster_begin_lba+1
	adc fat_begin_lba+1
	sta cluster_begin_lba+1
	lda cluster_begin_lba+2
	adc fat_begin_lba+2
	sta cluster_begin_lba+2
	lda cluster_begin_lba+3
	adc fat_begin_lba+3
	sta cluster_begin_lba+3		; and we have the start of the clusters
And on my system, that returns the three values:

Code: Select all

 
partition_begin_lba 00000800
fat_begin_lba       00000820
cluster_begin_lba   00000FC0
Neil

Re: Adventures in FAT32 with 65c02

Posted: Sun Dec 07, 2025 12:16 pm
by BigDumbDinosaur
barnacle wrote:
@BDD: it is a premature optimisation, true, but as you point out inlining it just in the sector transfer block saves an easy three and a half milliseconds per transfer (at my conservative 1.8MHz clock). That might end up as a noticeable difference (particularly if it turns out to be unnecessary to test bit 6, as well). I'll probably do some informal timing tests at some stage.

An idle (very lazy, without looking at the loop code!) calculation gives an estimate for the transfer of perhaps six or seven, call it ten to be safe, milliseconds per sector; so a hundred sectors per second, 500k baud rate equivalent? Faster than Kansas City CUTS format, then :)
When I was doing logic analysis on my new-and-improved™ SCSI host adapter last year, one of the tests was to empirically determine the raw transfer speed that the SCSI driver’s read/write loops could achieve.  It turned out the time to complete one loop, which would fetch a byte, was 1.375 µseconds with the MPU running at 16 MHz. That time included the delay from two wait-states, one caused by polling the host adapter’s status (a BIT instruction) and the other caused by reading a byte from the host adapter.  The code involved is in-line, with one branch always taken—a host adapter interrupt breaks the loop when the transfer is complete.

Assuming Ø2 was 1 MHz, the above fetch loop time would be 22 µsecs—it would be slight faster without wait-stating the host adapter accesses.  All transfer loop code is eight-bit, which makes me bold enough to say you should be able to achieve similar performance with the 65C02 if your code is tight enough.

Quote:
The FAT32 system consists of the MBR (which we've discussed), one to four partitions...
Actually, the MBR is unrelated to the filesystem—it will be present on any disk that has been made bootable with FDISK.  It is entirely possible to have a bootable disk with no partitions, which means there would be no filesystems as well.  Of course, such a disk wouldn’t be very useful.  :mrgreen:

Furthermore, it is possible to have multiple filesystems within a single partition, a feature of disks that were formatted under SCO UNIX or SCO OpenServer.  Such disks only have one partition and use a division table to segregate up to eight filesystems in that partition.  The partitioning setup I have devised for my POC unit incorporates that feature.

Re: Adventures in FAT32 with 65c02

Posted: Sun Dec 07, 2025 3:10 pm
by BigDumbDinosaur
BTW, if anyone has ever wondered why FAT filesystems are prone to corruption (a “feature” that made Peter Norton filthy rich), reread what Neil posted about the monkey motion required to read a directory and access a file.  There is also a lot of monkey motion involved in maintaining filesystem consistency as files are created, updated and deleted, which creates a fertile environment for developing bugs.

Much of that is due to baggage that Microsoft has been dragging around since the days of the IBM PC-XT and its 20 MB ST412 disks.  As the FAT12 filesystem of the time was superbly unsuited to dealing with capacious mass storage, Micro$oft’s brilliant code jockeys patched the daylights out FAT12 to handle it, resulting in FAT16.  However, there was a problem.

The PC-XT was a relatively slow machine even when running purely compute-bound, plus there was a lot of competition for RAM space, which was expensive stuff back then.  The structures that had to be in core to keep track of what was going on in the filesystem had to be made as small as practical to reduce the amount of grunt work needed to maintain them.  Using small filesystem structures might leave enough room to actually, y’know, be able to run a program that could do something useful.  :shock:

Ergo rather than allocate disk space a block at a time, as was done with floppy disks (and was being done in some contemporary versions of UNIX), Micro$oft came up with the cluster approach to help constrain the size of the data structures required to track disk usage.  Now disk space was being consumed in 4096 byte chunks instead of 512 byte pieces, and unsurprisingly, users did occasionally run out of space on their expense hard drives, mostly because a lot of files tended to be small, and small files are inefficient in a filesystem that uses up disk space the way politicians use up tax dollars.

Not only did Micro$oft come up with a filesystem that made CP/M disk functions look efficient, they “accidentally” planted a little time-bomb in the FAT16 code, which bug was a significant factor in Mr. Norton earning his millions.

Due to a procedural error in the part of the code that creates a new file, a cluster would be assigned to the file but the directory entry would be written (with the cluster number) before the cluster was actually allocated.  Should the machine go down before the disk copy of the FAT was updated to mark the cluster as allocated, there would be a directory entry for the new file pointing to an unallocated cluster.

After the system was restarted and another new file was created, it, likely as not, would be assigned that same cluster and now there would be two directory entries pointing to the same place on the disk, resulting in crosslinked files.  Opening one of the crosslinked files and writing to it, followed by opening the other crosslinked file and writing to it, could, and often would, corrupt data.  In some cases, the filesystem itself would become corrupted and inaccessible, especially when one of the crosslinked files was deleted.

Eventually, someone at Micro$oft found that little boo-boo and fixed it, but not before countless MS-DOS users had countless files trashed.  Norton’s utilities had, among other things, a filesystem checker that could detect such a problem and on occasion, fix it.  Needless to say, just about every PC user on the planet who had been bitten by the crosslink bug bought that software.

Aside from the FAT filesystem’s fragility, cluster-based allocation can cause a lot of disk thrashing, especially once usage exceeds around 60 percent of available filesystem space.  Adding insult to injury, FAT’s allocation scheme uses linked lists to track cluster usage, which can be tedious to maintain, and are also prone to corruption.

The inefficiency and fragility of FAT-based filesystems, as well as their difficulty in scaling as disk capacity rapidly increased in the 1990s, was was one of several factors that motivated Micro$oft to develop the NT filesystem (NTFS) that has been standard since Windows XP—not that it is that much more efficient than FAT32.  At least, NTFS is much harder to corrupt.  :?

Re: Adventures in FAT32 with 65c02

Posted: Sun Dec 07, 2025 3:46 pm
by barnacle
The File Allocation Table
Here's the first sector of the FAT that I have on this CF card. I copied a handful of text files to the card, and then for testing and to make sure it was larger than one cluster, copied a pile of mostly .pdf files across. There's also a subdirectory, which we'll see later.

Code: Select all

0200  F8 FF FF 0F FF FF FF 0F D4 8A 00 00 FF FF FF 0F ................          
0210  05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00 ................          
0220  09 00 00 00 0A 00 00 00 0B 00 00 00 FF FF FF 0F ................          
0230  FF FF FF 0F 0E 00 00 00 0F 00 00 00 10 00 00 00 ................          
0240  11 00 00 00 12 00 00 00 13 00 00 00 14 00 00 00 ................          
0250  FF FF FF 0F 16 00 00 00 17 00 00 00 18 00 00 00 ................          
0260  19 00 00 00 1A 00 00 00 1B 00 00 00 1C 00 00 00 ................          
0270  1D 00 00 00 1E 00 00 00 1F 00 00 00 FF FF FF 0F ................          
0280  21 00 00 00 22 00 00 00 23 00 00 00 FF FF FF 0F !..."...#.......          
0290  FF FF FF 0F 26 00 00 00 FF FF FF 0F FF FF FF 0F ....&...........          
02A0  29 00 00 00 2A 00 00 00 FF FF FF 0F FF FF FF 0F )...*...........          
02B0  2D 00 00 00 2E 00 00 00 2F 00 00 00 30 00 00 00 -......./...0...          
02C0  FF FF FF 0F 32 00 00 00 33 00 00 00 34 00 00 00 ....2...3...4...          
02D0  35 00 00 00 36 00 00 00 37 00 00 00 38 00 00 00 5...6...7...8...          
02E0  39 00 00 00 3A 00 00 00 3B 00 00 00 3C 00 00 00 9...:...;...<...          
02F0  3D 00 00 00 3E 00 00 00 3F 00 00 00 40 00 00 00 =...>...?...@...          
0300  41 00 00 00 42 00 00 00 43 00 00 00 44 00 00 00 A...B...C...D...          
0310  45 00 00 00 46 00 00 00 FF FF FF 0F 48 00 00 00 E...F.......H...          
0320  49 00 00 00 4A 00 00 00 4B 00 00 00 4C 00 00 00 I...J...K...L...          
0330  4D 00 00 00 4E 00 00 00 4F 00 00 00 50 00 00 00 M...N...O...P...          
0340  51 00 00 00 52 00 00 00 53 00 00 00 54 00 00 00 Q...R...S...T...          
0350  55 00 00 00 56 00 00 00 57 00 00 00 58 00 00 00 U...V...W...X...          
0360  59 00 00 00 5A 00 00 00 5B 00 00 00 5C 00 00 00 Y...Z...[...\...          
0370  5D 00 00 00 5E 00 00 00 5F 00 00 00 60 00 00 00 ]...^..._...`...          
0380  61 00 00 00 62 00 00 00 63 00 00 00 64 00 00 00 a...b...c...d...          
0390  65 00 00 00 66 00 00 00 67 00 00 00 68 00 00 00 e...f...g...h...          
03A0  69 00 00 00 6A 00 00 00 6B 00 00 00 6C 00 00 00 i...j...k...l...          
03B0  6D 00 00 00 6E 00 00 00 6F 00 00 00 70 00 00 00 m...n...o...p...          
03C0  71 00 00 00 72 00 00 00 73 00 00 00 74 00 00 00 q...r...s...t...          
03D0  75 00 00 00 76 00 00 00 77 00 00 00 78 00 00 00 u...v...w...x...          
03E0  79 00 00 00 7A 00 00 00 7B 00 00 00 7C 00 00 00 y...z...{...|...          
03F0  7D 00 00 00 7E 00 00 00 7F 00 00 00 80 00 00 00 }...~...........  
Again, remember that my sectors are transferred to a transient area at $200. This listing is all of the first sector.

A FAT32 entry is four bytes long. The meanings of the values are:
  • $00000000 - the cluster is not used
  • $00000001 - reserved for internal use
  • $00000002 - $0FFFFFEF - the cluster is fully used and extends to a new cluster with this number.
  • $0FFFFFF0 - $0FFFFF6 - generally reserved
  • $0FFFFFF7 - cluster contains a bad sector and should not be used
  • $0FFFFFF8 - $0FFFFFFF - end-of-chain marker: this is the last cluster in the chain
The top four bits are reserved and should be ignored. The first two records are also reserved and should not be touched. The first record that refers to the clusters is record #2, at offset $08, and it refers to cluster zero... obvious, right?

This is important.
  • The FAT record entry number is its position in the FAT sectors divided by four.
  • All FAT record entry numbers, except for the special values above, refer to FAT locations.
  • Each FAT record entry, except for the special values above, refers to a single cluster. The number of that cluster is the FAT record entry number, minus two.
  • Each cluster contains a group of contiguous sectors, assumed here to be eight (though really you should grab that data from the VIB and confirm).
So taking all that together: FAT records 0 and 1 are at offsets $00 and $04, and are ignored. At offset $08 we find record # 2, and it refers to cluster zero. Cluster zero can be found as the first cluster in the data section, starting with the sector at cluster_begin_lba. The value of record 2 is D4 8A 00 00, or $8AD4, and that is the record number of the next cluster of that file.

To find the location of record $8AD4, recall that there are 128 four-byte records in a sector. Dividing $8AD4 by 128 gives us $115, so the record will be in the sector fat_begin_lba + $115 and it will be at offset (($8AD4 AND $7F) * 4) or $150 within that sector.

I'm not going to look at the cluster yet; instead, have a look at offset $0C. That has a value of FF FF FF 0F which tells us that everything on that file fits within that single cluster, which lives at cluster_begin_lba + ($0C / 4) - 2, or cluster_begin_lba + 1.

So that one's not very interesting; how about the next record at offset $10? That one starts at cluster cluster_begin_lba + ($10 / 4) - 2, (cluster_begin_lba + 2) which isn't big enough to hold it; it extends to record 5; from there to records 6, 7, 8, 9, $0A, and finally ends in $0B, as indicated by the FF FF FF 0F value of record $0B.

Record $0C, at offset $30, is another short file which can be found in cluster cluster_begin_lba + 10, using the same maths as before. And so it goes...

On this half-gigabyte CF, the format program has determined (and we have shown by examination) that the two copies of the FAT are $7A0 (1952) sectors each, giving space for 124,928 records; at 4k per cluster that's 511,705,088 bytes storage. And they said 640k would be enough for anyone...

Neil

Re: Adventures in FAT32 with 65c02

Posted: Mon Dec 08, 2025 2:24 pm
by SamCoVT
If you want some more speed, remove your call to cf_wait from inside your loop reading sector bytes. The CF card has an internal sector buffer. When you issue command $20, you do need to wait for the command to complete - but only once. The CF card will be copying your requested sector into its sector buffer while it is "BUSY". Once that is complete, you can read all 512 bytes very quickly as the CF card is just copying from it's internal buffer at that point.
If transferring multiple sectors, you only need to wait for "BUSY" in between sectors as the card fetches the next sector into its buffer.

This sequence is described on page 66 of the Sandisk CompactFlash Memory Card Product Manual posted earlier.

You may also want to check the error conditions before reading the sector data, but I was also lazy in my own implementation and assumed it just worked with good success.

Re: Adventures in FAT32 with 65c02

Posted: Mon Dec 08, 2025 2:50 pm
by barnacle
That's a good thought, Sam. Thanks.

I do think it unlikely that most of the error conditions and wait states that occur because spinny discs have long access times and analogue interfaces are unlikely with CF hardware.

Currently trying to work out how much code I can share between 'find_first' and 'find_next'...

Neil

Re: Adventures in FAT32 with 65c02

Posted: Tue Dec 09, 2025 6:43 am
by barnacle
The Directory structure
In the last gripping installment, our hero had decoded the VIB, located the FAT, and from that, he now knows where the treasure is. Or at least, he knows where there is a map to the treasure. It wouldn't be a good quest without a cryptic and confusing map to pore over, right?

But he knows where the root directory is. It's always the first cluster in the data section. (Actually, you don't need the FAT to find it, the VID data told us where to look, but trust me, he'll need the FAT later.) On my example CF, it's at sector

Code: Select all

00000FC0
Here's what the first sector looks like:

Code: Select all

0200  41 65 00 63 00 75 00 2E 00 63 00 0F 00 0B 00 00 Ae.c.u...c......          
0210  FF FF FF FF FF FF FF FF FF FF 00 00 FF FF FF FF ................          
0220  45 43 55 20 20 20 20 20 43 20 20 20 00 8A 21 60 ECU     C   ..!`          
0230  74 5B 74 5B 00 00 A8 78 EA 52 04 00 F2 7C 00 00 t[t[...x.R...|..          
0240  E5 4E 54 49 54 4C 7E 31 20 20 20 10 00 3E 08 60 .NTITL~1   ..>.`          
0250  74 5B 74 5B 00 00 08 60 74 5B 03 00 00 00 00 00 t[t[...`t[......          
0260  41 74 00 65 00 73 00 74 00 00 00 0F 00 32 FF FF At.e.s.t.....2..          
0270  FF FF FF FF FF FF FF FF FF FF 00 00 FF FF FF FF ................          
0280  54 45 53 54 20 20 20 20 20 20 20 10 00 3E 08 60 TEST       ..>.`          
0290  74 5B 74 5B 00 00 39 60 74 5B 03 00 00 00 00 00 t[t[..9`t[......          
02A0  41 65 00 65 00 70 00 72 00 6F 00 0F 00 92 6D 00 Ae.e.p.r.o....m.          
02B0  2E 00 63 00 00 00 FF FF FF FF 00 00 FF FF FF FF ..c.............          
02C0  45 45 50 52 4F 4D 20 20 43 20 20 20 00 96 21 60 EEPROM  C   ..!`          
02D0  74 5B 74 5B 00 00 98 78 EA 52 0C 00 14 0D 00 00 t[t[...x.R......          
02E0  41 67 00 72 00 61 00 70 00 68 00 0F 00 C2 69 00 Ag.r.a.p.h....i.          
02F0  63 00 73 00 2E 00 63 00 00 00 00 00 FF FF FF FF c.s...c.........          
0300  47 52 41 50 48 49 43 53 43 20 20 20 00 A1 21 60 GRAPHICSC   ..!`          
0310  74 5B 74 5B 00 00 91 78 EA 52 0D 00 C8 78 00 00 t[t[...x.R...x..          
0320  41 69 00 31 00 38 00 6E 00 2E 00 0F 00 FE 63 00 Ai.1.8.n......c.          
0330  00 00 FF FF FF FF FF FF FF FF 00 00 FF FF FF FF ................          
0340  49 31 38 4E 20 20 20 20 43 20 20 20 00 AC 21 60 I18N    C   ..!`          
0350  74 5B 74 5B 00 00 86 78 EA 52 15 00 80 A7 00 00 t[t[...x.R......          
0360  41 6D 00 61 00 69 00 6E 00 2E 00 0F 00 41 63 00 Am.a.i.n.....Ac.          
0370  00 00 FF FF FF FF FF FF FF FF 00 00 FF FF FF FF ................          
0380  4D 41 49 4E 20 20 20 20 43 20 20 20 00 B8 21 60 MAIN    C   ..!`          
0390  74 5B 74 5B 00 00 A5 4C F1 52 20 00 49 3D 00 00 t[t[...L.R .I=..          
03A0  41 73 00 65 00 72 00 69 00 61 00 0F 00 80 6C 00 As.e.r.i.a....l.          
03B0  2E 00 63 00 00 00 FF FF FF FF 00 00 FF FF FF FF ..c.............          
03C0  53 45 52 49 41 4C 20 20 43 20 20 20 00 C3 21 60 SERIAL  C   ..!`          
03D0  74 5B 74 5B 00 00 E5 78 EA 52 24 00 ED 0C 00 00 t[t[...x.R$.....          
03E0  41 73 00 73 00 64 00 31 00 33 00 0F 00 0B 30 00 As.s.d.1.3....0.          
03F0  39 00 2E 00 63 00 00 00 FF FF 00 00 FF FF FF FF 9...c........... 
Once again, this is a view of the transient area. The file names are obvious, right? Capital letters, look like sensible file names, start in convenient columns? There's one at offset $20, something a bit odd at offset $40, one at offset $80, and so on. But there are actually directory entries every 32 bytes (that is, a directory entry is structure 32 bytes long).

The mystery items are MS's long file name structures, and I'm ignoring them completely. They always precede the default 8.3 filenames that we're interested in, they're complex to decode (and encode), they use MS's wide character set, and most significantly we can work exclusively in 8.3 filenames. There's nothing to stop us using long file names, and maybe some time in the future, but I'm trying to keep the code down to a manageable (and understandable size).

However we do have to look at them, because they have one thing in common with an 8.3 filename, and that's the attribute byte.

Code: Select all

; directory record field offsets:
dir_name		equ 0		; eleven chars, space filled
dir_attrib		equ 11		; one byte
dir_frstcluhi	equ 20		; two bytes
dir_wrttime		equ 22		; two bytes
dir_wrtdate		equ 24		; two bytes
dir_frstcllo	equ 26		; two bytes
dir_size		equ 28		; four bytes
; (other fields are ignored)

; the values of the attribute byte
attr_ro			equ 1		; this is a read only file
attr_hid		equ 2		; this is a hidden file (ignored)
attr_sys		equ 4		; this is a system file (ignored)
attr_vol		equ 8		; this is a volume name (ignored; there
							; should be only one file on a volume with
							; this attribute and its cluster fields are
							; both set to zero)
attr_dir		equ 16		; this is a directory
attr_arc		equ 32		; this bit is set when a file is created, 
							; written to, or renamed to indicate to a
							; backup utility that things have changed
attr_lfn		equ attr_ro + attr_hid + attr_sys + attr_vol
							; indicates that this is a long file name
							; long file name data precedes the default
							; 8.3 filename record and is ignored here
For practical purposes, I'm ignoring most of the bits in the attribute field. If the attr_lfn bits are set, I know I'm looking at a long file name and I don't need to worry about that. Equally, I show without fear or favour files that are hidden, system files (this flag is intended to indicate to defragmentation programs that a particular file should remain in a particular location on the disc, apparently so that the OS can find it without jumping through all the hoops of the FAT system!), and (for now) the volume name. I do care about the attr_dir flag, though not immediately.

The purpose of a file system is to be able to find a file. Sadly, while FAT32 can do that, it doesn't do it terribly easily or quickly. The files records in the directory, for example, are not sorted alphabetically, and that means that to find a particular file you have to check every entry until you find it (or run out of directory). I've chosen to use a system with two calls: fs_find_first sets up the internal pointers, and returns with those pointers indicating the first file entry's location, and fs_find_next which returns subsequent files.

For fs_find_first we enter with lba holding the sector number of the first sector of the directory, and when we return, transient will contain directory data, and fs_dir_ptr will point to the data in the transient section.

fs_find_next uses the same structures as prepared by fs_find_first and returns the same data. It will reload sectors and clusters as they are required.

Code: Select all

	bss
fs_dir_number	ds 1		; entry $0 to $7f
fs_dir_ptr		ds 2		; pointer to the entry in transient
fs_dir_sector	ds 1		; the sector in the cluster

	code
fs_find_first:
	jsr cf_set_lba			; set the current directory
	jsr cf_read				; and load the first sector
	stz fs_dir_number		; start with record zero
	stz fs_dir_sector		; and the first sector
	SHOWTRANS
fs_ff_01:
	lda fs_dir_number
	sta fs_dir_ptr+1		; we need to multiply the dir number by 32
	stz fs_dir_ptr			; so we * $100 and / 8
	lsr fs_dir_ptr+1
	ror fs_dir_ptr
	lsr fs_dir_ptr+1
	ror fs_dir_ptr
	lsr fs_dir_ptr+1
	ror fs_dir_ptr			; now add transient

	clc
	lda fs_dir_ptr
	adc # lo transient
	sta fs_dir_ptr
	lda fs_dir_ptr+1
	adc # hi transient
	sta fs_dir_ptr+1		; we are pointing to the desired record

	ldy #dir_attrib
	lda (fs_dir_ptr),y		; get the attribute byte
	and #attr_lfn
	cmp #attr_lfn			; is it a lfn fragment?
	beq fs_find_next		; yes, so try the next			
		; it's not a long file name, so what is it?
		lda (fs_dir_ptr)		; the first character of the name
		cmp #$e5
		beq fs_find_next		; it's deleted
		cmp #$05	
		beq fs_find_next		; it's deleted and Kanji
		cmp #$0
		bne fs_fn_20			; no more entries
		; we've found a record or got to the end and can stop
fs_find_next:
	; else get the next record
	inc fs_dir_number		; have we reached record $10?
	lda fs_dir_number
	cmp #$10
	bne fs_ff_01			; try the next entry
		inc fs_dir_sector		; otherwise we need the next sector
		lda fs_dir_sector	
		cmp #$8					; have we run out of cluster?
		bne fs_fn_12
		; FIXME load new cluster
		; meanwhile, just wait forever
fs_fn_11:
		bra fs_fn_11
fs_fn_12:
		; increment the lba
		inc lba		
		bne fs_fn_15
			inc lba+1
			bne fs_fn_15
				inc lba+2
				bne fs_fn_15
					inc lba+3
fs_fn_15:
		; load the new sector
		LYA lba		
		jsr hexuint32
		jsr crlf

		jsr cf_set_lba
		jsr cf_read
		stz fs_dir_number
		jmp fs_ff_01
fs_fn_20:
	; we're done
	rts
This is reasonably straightforward. The first part of the code loads the transient area with a sector, and sets up the pointer to the first entry. Then we look at the first character of the file name; there are some special characters hidden there too.

Specifically, if the value is $E5 then the file has been deleted. It's not physically deleted from the disc unless its cluster has been overwritten by a later file write, and replacing that $E5 will miraculously restore the file to life. I don't fully understand the $05 value: apparently, $E5 is a valid character value in Kanji encoding, so I guess you're only likely to see this on a Japanese file system, but it's an easy test. I assume some mechanism would indicate to the file system code whether the $E5 or $05 is used to indicate deletion, but here I just assume either.

Another special character is the value $00. That's possibly the most useful indicator, since it tells us that (a) this isn't a file entry, and (b) there aren't any more and you can stop looking now. But that's not checked here, it's up to whatever is calling fs_find_next to stop asking.

So if the file entry is indicating that we have a valid filename, we stop and return; we've found a file. Otherwise, we handle moving the file pointers a further 32 bytes and if necessary, getting the next sector. This is the entry point for fs_get_next.

One important point: the code has a problem if it runs out of cluster before it runs out of files. We don't yet know how to find the next cluster; that can be the next adventure. For now, we'll just spin in place if that happens.

Here's the code I'm using to get each directory entry

Code: Select all

	; call fs_find_first with fs_dir_cluster as directory wanted
	lda cluster_begin_lba
	sta lba
	lda cluster_begin_lba+1
	sta lba+1
	lda cluster_begin_lba+2
	sta lba+2
	lda cluster_begin_lba+3
	sta lba+3
	jsr fs_find_first
dir:
		lda (fs_dir_ptr)	; check the first character of name
		beq done			; quit if it's zero
		lda fs_dir_ptr
		sta put_ptr
		lda fs_dir_ptr+1
		sta put_ptr+1
		jsr putmemline		; show the first half of the record
		
		jsr fs_find_next
		bra dir				; until no more records
and here are the results:

Code: Select all

0220  45 43 55 20 20 20 20 20 43 20 20 20 00 8A 21 60 ECU     C   ..!`          
0280  54 45 53 54 20 20 20 20 20 20 20 10 00 3E 08 60 TEST       ..>.`          
02C0  45 45 50 52 4F 4D 20 20 43 20 20 20 00 96 21 60 EEPROM  C   ..!`          
0300  47 52 41 50 48 49 43 53 43 20 20 20 00 A1 21 60 GRAPHICSC   ..!`          
0340  49 31 38 4E 20 20 20 20 43 20 20 20 00 AC 21 60 I18N    C   ..!`          
0380  4D 41 49 4E 20 20 20 20 43 20 20 20 00 B8 21 60 MAIN    C   ..!`          
03C0  53 45 52 49 41 4C 20 20 43 20 20 20 00 C3 21 60 SERIAL  C   ..!`          
00000FC1                                                                        
0200  53 53 44 31 33 30 39 20 43 20 20 20 00 07 22 60 SSD1309 C   .."`          
0240  53 59 53 4D 45 4D 20 20 43 20 20 20 00 12 22 60 SYSMEM  C   .."`          
02A0  46 41 54 5F 46 49 7E 31 50 44 46 20 00 7B 1D 77 FAT_FI~1PDF .{.w          
0300  53 4C 49 44 45 53 7E 31 50 44 46 20 00 87 1D 77 SLIDES~1PDF ...w          
0380  55 4E 49 54 31 30 7E 31 50 44 46 20 00 9C 1D 77 UNIT10~1PDF ...w          
03E0  53 4C 49 44 45 53 7E 32 50 44 46 20 00 AD 1D 77 SLIDES~2PDF ...w          
00000FC2                                                                        
0260  55 4E 49 54 31 30 7E 32 50 44 46 20 00 C1 1D 77 UNIT10~2PDF ...w          
02E0  53 52 41 4D 5F 42 7E 31 50 44 46 20 00 0A 20 77 SRAM_B~1PDF .. w          
0340  43 46 5F 34 34 5F 7E 31 50 44 46 20 00 22 20 77 CF_44_~1PDF ." w          
03A0  43 46 5F 34 34 5F 7E 32 50 44 46 20 00 2E 20 77 CF_44_~2PDF .. w          
00000FC3                                                                        
0200  36 35 30 32 5F 43 7E 31 50 44 46 20 00 3A 20 77 6502_C~1PDF .: w          
0260  43 46 5F 42 4F 41 7E 31 50 44 46 20 00 4E 20 77 CF_BOA~1PDF .N w          
02E0  36 35 30 32 41 4E 7E 31 50 44 46 20 00 5E 20 77 6502AN~1PDF .^ w          
0340  4E 45 4F 4E 36 35 7E 31 41 53 4D 20 00 6A 20 77 NEON65~1ASM .j w          
0380  49 44 45 20 20 20 20 20 50 44 46 20 00 74 20 77 IDE     PDF .t w          
03E0  46 41 54 33 32 46 7E 31 50 44 46 20 00 82 20 77 FAT32F~1PDF .. w          
00000FC4                                                                        
0220  46 20 20 20 20 20 20 20 50 44 46 20 00 95 20 77 F       PDF .. w          
0280  48 45 58 43 4F 4E 7E 31 50 44 46 20 00 AA 20 77 HEXCON~1PDF .. w          
02C0  31 20 20 20 20 20 20 20 50 44 46 20 00 C0 20 77 1       PDF .. w          
0320  36 35 30 32 5F 43 7E 32 50 44 46 20 00 04 21 77 6502_C~2PDF ..!w          
0380  5A 38 30 2D 44 4F 7E 31 50 44 46 20 00 18 21 77 Z80-DO~1PDF ..!w          
03C0  57 36 35 43 32 32 20 20 50 44 46 20 00 27 21 77 W65C22  PDF .'!w          
00000FC5                                                                        
0200  49 53 36 32 43 32 35 36 50 44 46 20 00 3E 21 77 IS62C256PDF .>!w          
0260  53 59 4E 45 52 54 7E 31 50 44 46 20 00 4A 21 77 SYNERT~1PDF .J!w          
02A0  53 4E 37 34 48 43 7E 31 50 44 46 20 00 7D 21 77 SN74HC~1PDF .}!w          
0320  56 49 56 41 4C 44 7E 31 44 45 42 20 00 B5 21 77 VIVALD~1DEB ..!w          
0360  38 31 34 30 34 39 20 20 50 44 46 20 00 92 2B 77 814049  PDF ..+w          
03C0  41 50 58 38 32 33 7E 31 50 44 46 20 00 A8 2B 77 APX823~1PDF ..+w          
00000FC6                                                                        
0200  42 55 34 38 58 58 7E 31 50 44 46 20 00 C4 2B 77 BU48XX~1PDF ..+w          
0260  44 41 54 41 53 48 7E 31 50 44 46 20 00 19 2C 77 DATASH~1PDF ..,w          
02E0  44 41 54 41 53 48 7E 32 50 44 46 20 00 3D 2C 77 DATASH~2PDF .=,w          
0360  36 35 30 32 2D 4B 7E 31 5A 49 50 20 00 5B 2C 77 6502-K~1ZIP .[,w          
03C0  31 30 31 44 53 45 7E 31 50 44 46 20 00 67 2C 77 101DSE~1PDF .g,w          
00000FC7                                                                        
0220  31 30 31 44 2D 54 7E 31 50 44 46 20 00 78 2C 77 101D-T~1PDF .x,w          
0280  46 43 49 2D 31 30 7E 31 50 44 46 20 00 88 2C 77 FCI-10~1PDF ..,w          
02E0  31 30 31 44 53 45 7E 32 50 44 46 20 00 96 2C 77 101DSE~2PDF ..,w          
0360  4E 45 57 53 4C 45 7E 31 50 44 46 20 00 A6 2C 77 NEWSLE~1PDF ..,w          
03E0  5A 47 36 35 30 32 7E 31 50 44 46 20 00 0C 2D 77 ZG6502~1PDF ..-w          
And we did indeed run out of cluster; recalling the last episode you will see that the FAT record for the cluster, at offset $08, had the address of another FAT record. There's more.

But for now, we can see that we have isolated only those files which are undeleted, and have valid 8.3 file names, and live in the first cluster.

Neil

Re: Adventures in FAT32 with 65c02

Posted: Tue Dec 09, 2025 4:57 pm
by BigDumbDinosaur
barnacle wrote:
In the last gripping installment, our hero had decoded the VIB, located the FAT, and from that, he now knows where the treasure is. Or at least, he knows where there is a map to the treasure. It wouldn't be a good quest without a cryptic and confusing map to pore over, right?
Three cheers for our hero and a big BOO HISS for Microsoftness.  If nothing else, your (mis)adventures in decoding the tangled web that is FAT32 make me glad that the native filesystem for Linux has a lot less FAT than the mess you are messing with.  :D

In what you have presented so far, it becomes quite clear why versions of Windows that relied on FAT32 tended to be slow when it came to reading and writing files.  The combination of having to negotiate the FAT structure and then having to fetch/store clusters instead of individual blocks makes for a lot of disk I/O, even when the calling program only wants a few bytes from a file.  Ugh!

Re: Adventures in FAT32 with 65c02

Posted: Tue Dec 09, 2025 7:45 pm
by barnacle
To be fair, I think the sectors come from the original spinny disk layout... though it does seem that the system was designed by someone who may (allegedly) have been under the influence of recreational chemicals.

Here's a fun bit, to be more explored when I've got more software running: to get from one cluster to the next, you have to derive the FAT record relating to your current cluster, find the sector where that record lives, read the sector, get the link value, calculate the cluster number from that link, calculate the cluster's first sector from the cluster number, and Robert is your mother's brother.

It's not a lot of arithmetic, but it's all thirty-two bits, and Mr 65c02 isn't terribly quick at that, nor particularly code dense while doing it. (I've got 0x00000001 and 0x00000002 as manifest constants in the rom to help with code size.)

One way to avoid all the arithmetic might be to track the cluster number throughout, but it seems to me that if I'm trying to find a file, I shouldn't have to manage that; it should be hidden from the caller. What I want is something along the lines of
  • fopen()
  • fread()
  • fwrite()
  • fseek()
  • getcwd() (perhaps)
  • mkdir()
  • chdir()
  • rename()
and that's about it. But we'll see.

Neil

Re: Adventures in FAT32 with 65c02

Posted: Tue Dec 09, 2025 8:41 pm
by drogon
barnacle wrote:

One way to avoid all the arithmetic might be to track the cluster number throughout, but it seems to me that if I'm trying to find a file, I shouldn't have to manage that; it should be hidden from the caller. What I want is something along the lines of
  • fopen()
  • fread()
  • fwrite()
  • fseek()
  • getcwd() (perhaps)
  • mkdir()
  • chdir()
  • rename()
and that's about it. But we'll see.

Neil
open, read, write,close will get you a long way (ignore the semantics of 'f' open vs. open to start with). You forgot close above.

If you want Posix compliance you'll need permissions - basic read/write is good enough to have 'something' to stop accidental deletion or overwriting...

seek is the challenge. What happens if you seek past the end of a file... Is the file extended? Are holes allowed in the file (sparse files)? How do you copy a sparse file? Ho do you know where the next 'record' is? (FAT32 doesn't allow sparse files, so that might be moot anyway)

Filing system are fun... ;-)

-Gordon

Re: Adventures in FAT32 with 65c02

Posted: Tue Dec 09, 2025 8:56 pm
by BigDumbDinosaur
drogon wrote:
If you want Posix compliance you'll need permissions - basic read/write is good enough to have 'something' to stop accidental deletion or overwriting...

Being FAT32, I don’t think POSIX would get into the picture.  Mapping POSIX file attributes onto FAT32 is a murky endeavor—much of the complication in Samba comes from reconciling POSIX permissions with the much-more-limited equivalents in FAT filesystems.  Even more fun is working with POSIX file types that have no analog in FAT, e.g., symbolic links.

Re: Adventures in FAT32 with 65c02

Posted: Tue Dec 09, 2025 9:42 pm
by barnacle
I tend to agree, I'm not seeking POSIX compliance, just basic load, store, and suchlike. The attr_ro should provide a minimal level of safety from accidental delete, but with no concept of file ownership - and this is very definitely a single-user, single program design - there's no obvious way to include it.

Yes, I did forget fclose() - it was just a quick overview which will be amended later, but that's all the next level up. Hopping up and down the FAT cluster chain is occupying me at the moment. I probably need fflush(), too, at some level.

Seek past the end of a file? Return an error message, set the pointer to the last byte. Sparse files aren't as far as I know part of the FAT system and I have no intention of forcing them in.

Neil