Adventures in FAT32 with 65c02

Programming the 6502 microprocessor and its relatives in assembly and other languages.
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

Hopefully it won't be too painful. I have a handful of conversion routines - clus_to_1st_sec(), sec_to_cluster(), clus_to_fat_rec(), and fs_next_cluster() - which mediate the various addresses and counts, so once they're changed, it's _just_ a question of finding all the places I've had to add or subtract that offset and sorting them out. Hopefully not too many places.
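For reference, the arithmetic hiding in clus_to_fat_rec() - sketched in Python rather than the actual routine, assuming 512-byte sectors and four-byte FAT32 records (fat_begin_lba is a hypothetical name here for the LBA of the first FAT sector):

```python
# Each 512-byte FAT sector holds 128 four-byte FAT32 records.
ENTRIES_PER_SECTOR = 512 // 4  # 128

def clus_to_fat_rec(cluster, fat_begin_lba):
    """Return (sector LBA, byte offset) of the FAT record for `cluster`."""
    sector = fat_begin_lba + cluster // ENTRIES_PER_SECTOR
    offset = (cluster % ENTRIES_PER_SECTOR) * 4
    return sector, offset
```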

Neil
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

Code: Select all

NeoDOS32 v0.0215
It looks like only two routines needed changing...

Code: Select all

;-----------------------------------------------------------------------
; clus_to_1st_sec
;
; Returns the LBA of the first sector of a cluster
;
; On entry: lba contains the cluster number (i.e. the FAT record number)
; On exit: lba contains the first sector of that cluster
; NOTE: assumes eight sectors per cluster; does not check
; 1st_sector = ((cluster - 2) * 8) + cluster_begin_lba

clus_to_1st_sec:
	LYA two
	jsr u32_sub				; lba = record minus two	
	ldy #3					; we're assuming 8 sectors per cluster
fs_ct_1:					; really we should check the variable
	asl lba
	rol lba+1
	rol lba+2
	rol lba+3				; multiply by two
	dey						; three times
	bne fs_ct_1				; lba = lba * 8	
	LYA cluster_begin_lba
	jsr u32_add				; and that should be that, result in lba
	rts

;-----------------------------------------------------------------------
; sec_to_cluster
;
; Return the cluster number for any sector within that cluster
; On entry: lba contains the sector's LBA
; On exit: lba contains the cluster number
;
; cluster = ((lba - cluster_begin_lba) / 8) + 2

sec_to_cluster:
	LYA cluster_begin_lba
	jsr u32_sub
	ldy #3					; again, assuming 8 sectors per cluster
fs_stc_1:
	lsr lba+3
	ror lba+2
	ror lba+1
	ror lba					; divide by two
	dey						; three times
	bne fs_stc_1			; lba = lba / 8
	LYA two
	jsr u32_add				; the first cluster is #2
	rts
That's with a subtraction of two from the cluster number at the start of clus_to_1st_sec() and an addition of two at the end of sec_to_cluster() to keep things in order.
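For the curious, here are the same two conversions as a Python cross-check of the shift arithmetic - a sketch only, with the eight-sectors-per-cluster assumption baked in exactly as in the assembly:

```python
SECTORS_PER_CLUSTER = 8  # assumed, as in the assembly; should really be read from the VIB

def clus_to_1st_sec(cluster, cluster_begin_lba):
    # 1st_sector = ((cluster - 2) * 8) + cluster_begin_lba
    return (cluster - 2) * SECTORS_PER_CLUSTER + cluster_begin_lba

def sec_to_cluster(lba, cluster_begin_lba):
    # cluster = ((lba - cluster_begin_lba) / 8) + 2
    return (lba - cluster_begin_lba) // SECTORS_PER_CLUSTER + 2
```

The two functions are inverses for any sector within a cluster, which is exactly the property the directory and FAT code relies on.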
Microsoft wrote:
There is considerable confusion over exactly how this works, which leads to many “off by 1”, “off by 2”, “off by 10”, and “massively off” errors.
Gee, thanks. Now I can get on with trying to allocate a new cluster, which might just have started to work on its own.

Neil
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: Adventures in FAT32 with 65c02

Post by BigDumbDinosaur »

barnacle wrote:
Microsoft wrote:
There is considerable confusion over exactly how this works, which leads to many “off by 1”, “off by 2”, “off by 10”, and “massively off” errors.

Gee, thanks. Now I can get on with trying to allocate a new cluster, which might just have started to work on its own.
Darn nice of them to tell you what you already know.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

I've rewritten the interrupt timer tick to update an ms_time structure directly, rather than updating H:M:S and converting. Because MS is involved, it's a bit convoluted; the 65c02 is not really suited to bitfield manipulation in 16-bit variables. Yet here we are...

Code: Select all

rtc_update:
	bit VIAT1CL			; clear the T1 interrupt flag
	pha					; acc and x the only registers used
	phx
	inc fiftieths
	lda fiftieths
	cmp #100			; not yet two seconds?
	bne rtc_u_x			; nope
		stz fiftieths
		inc ms_time			; seconds count is bits 4-0
		lda ms_time
		and #%00011111		; isolate seconds
		cmp #30				; not a minute yet?
		bne rtc_u_x			; then we're done
			lda ms_time
			and #%11100000		; else clear the seconds
			clc
			adc #%00100000		; increment the minutes
			sta ms_time
			lda ms_time+1
			adc #0				; 60 mins = xxxx x111 1000 0000
			sta ms_time+1
			lda ms_time
			cmp #$80
			bne rtc_u_x			; not up to sixty yet
				lda ms_time+1		; else test the high byte
				and #%00000111
				cmp #%00000111
				bne rtc_u_x			; it's not an hour yet
					stz ms_time			; seconds are zero anyway
					lda ms_time+1
					and #%11111000		; clear the minutes
					clc
					adc #%00001000		; increment the hours
					sta ms_time+1
					cmp #$c0			; 24 << 3
					bne rtc_u_x			; not 24 yet
						stz ms_time+1	; else zero it
			
rtc_u_x:
	plx
	pla					; restore acc
	rti					; restores binary mode etc
ms_time uses the top five bits of a sixteen-bit variable for the hour; the next six bits for the minute; and the final five bits for the seconds divided by two.
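In Python terms, the packing works out like this (a sketch of the layout only, not driver code; this is the standard FAT time word):

```python
def pack_ms_time(hours, minutes, seconds):
    # hours in bits 15-11, minutes in bits 10-5, seconds/2 in bits 4-0
    return (hours << 11) | (minutes << 5) | (seconds // 2)

def unpack_ms_time(t):
    # reverse the packing; seconds come back to even-numbered resolution
    return (t >> 11) & 0x1F, (t >> 5) & 0x3F, (t & 0x1F) * 2
```

It also confirms the magic constants in the assembly: 60 minutes shifted into place is $0780, i.e. a low byte of $80 with %111 in the low bits of the high byte.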

The curious will have observed that it does not update the date. That's next on the agenda, but what with leap years and all, it's a bit ticklish. I think 2100 will have to live with being a leap year, even though it's not; the whole system crashes in 2107 anyway.

As an aside, I'm writing up a tl;dr version of what I've discovered so far about FAT32. At the moment it's formatted as a web page and the source is seven hundred lines, so it's quite long itself. When I'm happy, I'll stick it up as a pdf for reference.

Neil
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: Adventures in FAT32 with 65c02

Post by BigDumbDinosaur »

barnacle wrote:
I've rewritten the interrupt timer tick to update an ms_time structure directly, rather than updating H:M:S and converting.  Because MS is involved, it's a bit convoluted; the 65c02 is not really suited to bitfield manipulation in 16-bit variables.
Micro-$oft’s rendition of time in machine format is just plain stupid, not to mention brain-dead.  Why they didn’t adopt the seconds-since-long-ago style of timekeeping is completely unfathomable, to me, at least.
Quote:
The curious will have observed that it does not update the date. That's next on the agenda, but what with leap years and all, it's a bit ticklish. I think 2100 will have to live with being a leap year, even though it's not...
You can easily check for a leap year with two right-shifts of the LSB of the year field.  You can check for a centurial leap year, e.g., 2400, by noting if the LSB is $00 and if so, right-shifting the MSB twice.  It’s somewhat annoying to have to do that, since the machine representation of the date and time shouldn’t be formed out of discrete fields.  It’s just bad design on Micro-$oft’s part.
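For completeness, the full Gregorian rule those shift tests approximate, in Python:

```python
def is_leap(year):
    # divisible by 4, except centuries, except every 400 years
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
```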
Quote:
...the whole system crashes in 2107 anyway.
You’ve got 81 years to devise a workaround.  :D
x86?  We ain't got no x86.  We don't NEED no stinking x86!
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

The obvious way would be the way many sorted y2k: dates less than some boundary are considered to be in the new epoch. But I doubt it will be an issue for me.

Brain dead indeed, but that's the format required for the time fields in the directory records. So easier (and saving six zero page bytes) to do it this way. By cunning manipulation of the comparison of fiftieths, I have confirmed that it works for every two seconds of a day and then resets as advertised.

Neil

(actually, it crashes at the last midnight of 2107).
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: Adventures in FAT32 with 65c02

Post by BigDumbDinosaur »

barnacle wrote:
(actually, it crashes at the last midnight of 2107).
Oh, so you actually have 82 years to devise a solution!  :D
x86?  We ain't got no x86.  We don't NEED no stinking x86!
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

_I_ don't have to devise a solution. MS have to devise a solution. Which is backwards compatible...

Neil
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: Adventures in FAT32 with 65c02

Post by BigDumbDinosaur »

barnacle wrote:
_I_ don't have to devise a solution. MS have to devise a solution. Which is backwards compatible...
That ain’t gonna happen.  :shock:  Micro-$oft is in the business of creating problems, not solving them.  :P
x86?  We ain't got no x86.  We don't NEED no stinking x86!
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

Allocating a New Cluster
Allocating new clusters for data storage is a critical part of the file system, and it's not as simple as it looks. Break this, and you've irrecoverably borked your file system data.

The FAT is the only thing which knows where there is any free space on the volume, and ideally you navigate it using a single system variable: FSI_Nxt_Cluster. That tells you which cluster record to start searching; despite its name, it seems to hold the number of the last allocated cluster. Another variable, FSI_Free_Count, is intended to track how many clusters remain unallocated. However, both of those variables should be checked against the total number of clusters; if either is larger, it must be assumed damaged and recalculated.

Those variables live in the FSI sector, which usually follows the VIB, so it is necessary to read that sector to access them (though my software maintains a local copy of FSI_Nxt_Cluster, read during initialisation, and it must be written back to the FSI after every change).

Strategies
When the disc is newly formatted, the FAT entries (apart from entries 0 and 1) are zero. A zero value means 'this cluster is not allocated' (the position of the FAT entry defines the cluster it's managing; the value of the entry manages the cluster chain).

So that leads to the highest-level, and fastest, allocation strategy: starting with FSI_Nxt_Cluster, step through the subsequent entries until you find one that is zero. If you get to the end of the FAT sectors, you're out of space, so quit with an error. On a lightly used disk, that first free entry will likely be the next entry in sequence. That's a very fast test, though of course you have to read and write a couple of sectors with the updated data.
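That first-fit scan, sketched in Python (fat stands in for the in-memory view of the FAT records, not real code):

```python
FREE = 0  # an unallocated FAT entry holds zero

def find_free_cluster(fat, start):
    """First-fit scan from `start` (the FSI_Nxt_Cluster hint); returns
    the first free cluster number, or None if the FAT is exhausted."""
    for cluster in range(start, len(fat)):
        if fat[cluster] == FREE:
            return cluster
    return None  # out of space
```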

Using that strategy, you will use all the available space on the disk - provided that no file is ever deleted. As mentioned earlier, deleting a file does not free its allocated space on the disk; it merely marks the directory entry as 'deleted'. So you can end up deleting a 100MB file and still have no free space.

An external mechanism to recover that space is to copy all the files using another OS and machine. Once you have copied the files, format the disk, and then copy them all back. That will result in a FAT with unallocated entries (zeros) which can now be allocated; the deleted files won't have been transferred back and forth. This is obviously not something one can do in real time. Whether it's worth it or not is a question which only you can answer; if you are creating and deleting a lot of files, the answer is possibly no.

A more nuanced strategy, then, is to free and then reuse the clusters associated with a deleted file. To preserve the chances of an undelete utility working, one would wish to delay that as long as possible; ideally, until the clusters are actually required. In this strategy, we will use the entire disk, including the space released by deleted files.

But... there's always a but.

There is a huge question of how to find deleted files. A deleted file retains all its original metadata except the first character of its name. We can't just zero out that entry if we reallocate its clusters; any subsequent use of the directory would be broken (later files would be invisible and potentially overwritten). We do, however, need to mark it in some way to indicate that we've already re-used its clusters; otherwise, we might end up using them twice, and that would not be good. Recall that a newly created file has a file size of zero and a cluster pointer of zero. So once we reallocate its clusters, we can modify the directory entry of a deleted file so that it reverts to looking like a file which was created and deleted before any use. The directory entry remains FAT32-legal, and we can identify that it no longer has allocated clusters.

One other thing to remember is that we don't want to keep writing to the same sectors if we can avoid it. Solid-state storage has a limited number of writes to a given section before it stops working. CF cards have a greater or lesser amount of built-in wear levelling, mostly depending on price, but even so it would be bad practice to keep using the same sectors over and over.

A strategy to re-use 'deleted' clusters
  • Recursively search the root directory and its children for the first deleted file with a non-zero file size and cluster address. If we exhaust the directory tree without finding one, there is no space to reclaim; quit with an error.
  • Set the file length for the directory entry to zero.
  • Note the cluster start address. Locate that cluster entry in the FAT.
  • Follow the cluster chain in the FAT, setting each entry to zero (unallocated).
  • Set the directory entry's cluster link to zero.
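The chain-freeing step can be sketched in Python (fat is an in-memory stand-in for the FAT records; EOC is the usual end-of-chain threshold, $0FFFFFF8 and above; the directory-entry updates are left out):

```python
EOC = 0x0FFFFFF8  # FAT32 entries at or above this mark end-of-chain

def free_chain(fat, first_cluster):
    """Walk the chain from first_cluster, zeroing each entry.
    Returns the number of clusters freed (to adjust FSI_Free_Count)."""
    freed = 0
    cluster = first_cluster
    while cluster < EOC:
        next_cluster = fat[cluster]  # remember the link
        fat[cluster] = 0             # mark this cluster unallocated
        freed += 1
        cluster = next_cluster       # follow the chain
    return freed
```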
With this done, one can use the following allocation strategy: starting with FSI_Nxt_Cluster, step through the subsequent entries until you find one that is zero. If you get to the end of the FAT sectors, perform a search for a deleted file that can be re-used, as above. If you have been able to reclaim clusters, reset FSI_Nxt_Cluster to the start of the FAT and search as before.

Note that these strategies have not dealt with tracking available clusters in FSI_Free_Count; this must also be updated whenever clusters are deallocated and reallocated.

A Simpler Approach?
The strategies described above both have advantages and disadvantages. The first is very fast, but does not allow all of the disk to be used. The second allows full use of the disk, but is much more complex and so more error-prone. It is also susceptible to slower and slower searches as the disk fills, when it may eventually be necessary to search the entire FAT.

A third approach sacrifices the ability to use undelete utilities. Instead of simply marking deleted files as deleted, we deallocate their clusters at delete time (marking the directory entry with zero for the file size and cluster pointers). We don't change FSI_Nxt_Cluster, so we keep writing new files further along the FAT, but we do update FSI_Free_Count.

What we're doing with this hybrid approach is making holes in the FAT as we delete files, but we continue to add new clusters sequentially. So when we get to the end of the FAT, we can start at the beginning and look for the holes that deletion will have left (if we've deleted anything; if not, it's time to buy a bigger disk!).

This approach means that we know which clusters can be reallocated at delete time, so we don't need to do a deep search to find them at an inconvenient later time. The delete function takes a little longer, but allocating a new cluster is much faster.
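The hybrid allocator sketched in Python, assuming delete-time deallocation has already punched the holes (next_hint plays the role of FSI_Nxt_Cluster; the wrap-around is a single pass back from cluster 2):

```python
def allocate_cluster(fat, next_hint, first_data_cluster=2):
    """Return a free cluster, scanning forward from next_hint and
    wrapping once back to the start of the data area; None if full."""
    forward = range(next_hint, len(fat))
    wrapped = range(first_data_cluster, next_hint)
    for cluster in list(forward) + list(wrapped):
        if fat[cluster] == 0:
            return cluster
    return None  # genuinely out of space
```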

Any thoughts? I think I like the third approach best, though for testing I might just start with the first. The second I think is how MS do it (and Linux) and it looks like a nightmare for an eight-bit machine with limited space for disk buffers.

Neil
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: Adventures in FAT32 with 65c02

Post by BigDumbDinosaur »

barnacle wrote:
A more nuanced strategy, then, is to free and then reuse the clusters associated with a deleted file.
In principle, that is what happens with most UNIX-type filesystems, although with a little more activity due to it being possible for a file to have multiple links (directory names).  Before the file is irretrievably destroyed, the link count (which is in the inode associated with the file) is decremented.  If it is not zero, the file stays.

Assuming the file’s link count is reduced to zero, the inode’s table of contents is scanned to identify and deallocate the data blocks that were in use by the file.  Once that has been done, the inode itself is returned to the free inode pool.  As filesystem disk access is buffered, the required medium changes are likely to occur sometime in the future.  Such behavior tends to give the impression that the entire operation happens more quickly than it does.

Quote:
One other thing to remember is that we don't want to keep writing to the same sectors, if we can afford it. Solid state storage has a limited number of writes to a given section before it stops working, so we don't want to keep reusing the same sectors. CF cards have a greater or lesser amount of built-in wear leveling, mostly depending on price, but even so it would be bad practice to keep using the same sectors over and over.
FAT, of course, was devised long before solid-state disks came on the scene, so there was no thought given to scattering cluster usage to reduce wear-out.  The problem with trying to cluster-scatter is severe fragmentation will develop from delaying cluster deallocation following file deletion.

I think that tailoring a FAT filesystem driver to scatter clusters is largely unwarranted effort.  If the system is using a mechanical disk, wearout is a non-problem, but severe fragmentation can result.  On the other hand, the producers of solid-state storage are well-aware of the wearout problem and have already devised solutions that are more-or-less effective in delaying its onset.  That being the case, cluster-scattering will have no particular value, although the resulting fragmentation will become a non-problem.

Either way, the extra monkey motion required of the filesystem driver to scatter clusters will negatively affect performance.  Don’t forget you are doing this with an eight-bit MPU in a system with little-to-no room for a disk buffer pool.

Quote:
A strategy to re-use 'deleted' clusters...
Sounds slower than a snake in a snowstorm.  When you delete a file, you already are in the area of the filesystem structure that will point you to the clusters to be deallocated.  Immediate deallocation is certainly a faster route to travel than walking the filesystem looking for clusters to free up.
Quote:
A third approach sacrifices the ability to use undelete utilities.
Outside of the various FAT filesystems, one doesn’t find much support for undelete.  The fact is, the undelete capability of FAT is more accidental than intentional—it’s an artifact of how FAT works.

Outside of FAT, an undelete capability, if present, usually involves “remembering” the details of the most recently deleted file to re-establish its presence.  Of course, if other filesystem activity occurs during the period between the delete and undelete operations, there’s no telling what may happen.  So while an undelete feature may seem to be a desirable thing, you need to temper that with the reality of the significant grunt work required to scan the FAT some time in the future to reclaim clusters belonging to deleted files.

Quote:
I think I like the third approach best, though for testing I might just start with the first. The second I think is how MS do it (and Linux) and it looks like a nightmare for an eight-bit machine with limited space for disk buffers.
Yeah, trying to do it the Linux way with an unbuffered filesystem is going to be an exercise in patience for the (impatient) user as the MPU churns its way through a sea of clusters, pointers and what-have-you.  I’ve messed with this sort of thing in the past on eight-bit hardware (not with FAT, however) and eventually realized that there are some tasks that require use of a bigger hammer.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

That was basically my conclusion. When you're deleting a file, you're already looking at its directory entry and so you have the necessary cluster pointers handy.

The CF documentation I have states explicitly that you should not attempt wear leveling at a sector level; let the CF look after it.

And as you point out, this has to work on a slow 8-bit system with limited space for buffers. In essence, the file system will be unbuffered for everything except actively open files which will be rewritten as and when necessary; each open file will only have a single sector available to it. There's a separate sector - transit - which is used at system level and for 'user operations' that don't require a dedicated sector. Like it or not, there's a lot of juggling going on.

Neil
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

Hmm. Simultaneously juggling four byte pointers and four byte values while following the chain and minimising writes to the FAT is proving, um, challenging... further thinking is required.

Neil
User avatar
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: Adventures in FAT32 with 65c02

Post by BigDumbDinosaur »

barnacle wrote:
Hmm. Simultaneously juggling four byte pointers and four byte values while following the chain and minimising writes to the FAT is proving, um, challenging... further thinking is required.
I can imagine it.

You are constrained by fundamental limitations of eight-bit hardware running in a comparatively small address space.  Any filesystem, even a simple one like that on a Commodore 1541 floppy drive, has a lot of moving parts that must be correctly juggled in the correct order.  That, along with the tedium associated with picking apart 32-bit LBAs, pointers and what-have-you for byte-at-a-time manipulation, creates multiple opportunities to cook up a big bowl of spaghetti code.  :shock:

When I was mucking about with this sort of thing on the 65C02 years ago, the number one difficulty I had was in organizing and manipulating the required pointer sets.  There always seemed to be more data to be stored than places to put it.  At the time, it was my first foray into doing a filesystem, and there was much to be learned.  I will hasten to add that slogging my way through it all gave me newfound respect for the pioneers who had to do it without the benefit of prior art, modern programming tools and near-instantaneous feedback during testing.

The number one takeaway I got from that experience is a filesystem driver is best structured as a set of “black box” functions that localize primitive operations.  That’s tough to do with the 65C02—no easy way to use the stack for parameter passing or as ephemeral storage.  Adding to that, everyone and his uncle is competing for zero page space.  However, it is possible to achieve some compartmentalization with careful analysis and top-down planning.

I’m sure you will work it out, but achieving good throughput will remain a challenge, at least until you are able to run your system at double-digit clock rates.  However, I think once your code comes together and you can do all required filesystem ops, the satisfaction of achievement will be immense.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: Adventures in FAT32 with 65c02

Post by barnacle »

It's getting there, I think.

I agree with you completely about black boxes; the problem is handling black boxes that need two 32-bit input variables. I've dealt with that with the u32_ operations, where one parameter is in lba and the other is pointed to by Y:A; I may be able to extend that to deal with the current issues. Speed is not a _huge_ issue for a deallocate operation; it occurs only when a file is deleted. But I don't want to have to do 128 separate writes of the same FAT sector, as might be the case for a long file...

This may do the trick; I'm still thinking about it so comments only at the moment, not proper code:

Code: Select all

;	convert cluster to fat record location
;	extract fat record to temp
;	write zero to fat record
;	temp is next in chain; is it in the same sector?
;		if no, write segment back to fat1 and fat2
;	is temp end-of-chain?
;		if no, cluster = temp and repeat
;		else, write segment back to fat1 and fat2
temp is a 32-bit temporary variable, ideally in zero page (though I may have to do some register colouring; I'm already up to $36 plus 27 bytes of string space) and I want to leave as much as possible for user programs. And I hate to think how deep the stack is getting, pushing and popping the 32-bit lba.

The line that's going to be tricky is

Code: Select all

;	temp is next in chain; is it in the same sector?
It may well need further variables.
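The 'same sector?' test boils down to comparing FAT sector numbers, i.e. record // 128 for 512-byte sectors. Here's a Python sketch of the comment outline, flushing the buffered FAT sector only when the chain leaves it (write_sector is a hypothetical stand-in for writing the sector back to both FAT copies):

```python
ENTRIES_PER_SECTOR = 128   # 512-byte sectors, four-byte FAT32 records
EOC = 0x0FFFFFF8           # entries at or above this mark end-of-chain

def free_chain_min_writes(fat, first_cluster, write_sector):
    """Zero a cluster chain, calling write_sector(n) only when the
    chain leaves FAT sector n, or at the end of the chain."""
    cluster = first_cluster
    while True:
        temp = fat[cluster]            # extract FAT record to temp
        fat[cluster] = 0               # write zero to the record
        if temp >= EOC:                # temp is end-of-chain:
            write_sector(cluster // ENTRIES_PER_SECTOR)  # final flush
            return
        if temp // ENTRIES_PER_SECTOR != cluster // ENTRIES_PER_SECTOR:
            write_sector(cluster // ENTRIES_PER_SECTOR)  # next link lies elsewhere
        cluster = temp                 # follow the chain
```

For a chain that stays within one FAT sector, this costs exactly one write per FAT copy, however long the chain is.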

This would be _so_ much easier with a 32-bit processor.

Neil
Post Reply