The Directory structure
In the last gripping installment, our hero had decoded the VIB, located the FAT, and from that, he now knows where the treasure is. Or at least, he knows where there is a map to the treasure. It wouldn't be a good quest without a cryptic and confusing map to pore over, right?
But he knows where the root directory is. It's normally the first cluster in the data section. (Actually, you don't need the FAT to find it; the volume ID data told us where to look. But trust me, he'll need the FAT later.) On my example CF, it's at sector
Here's what the first sector looks like:
0200 41 65 00 63 00 75 00 2E 00 63 00 0F 00 0B 00 00 Ae.c.u...c......
0210 FF FF FF FF FF FF FF FF FF FF 00 00 FF FF FF FF ................
0220 45 43 55 20 20 20 20 20 43 20 20 20 00 8A 21 60 ECU     C   ..!`
0230 74 5B 74 5B 00 00 A8 78 EA 52 04 00 F2 7C 00 00 t[t[...x.R...|..
0240 E5 4E 54 49 54 4C 7E 31 20 20 20 10 00 3E 08 60 .NTITL~1   ..>.`
0250 74 5B 74 5B 00 00 08 60 74 5B 03 00 00 00 00 00 t[t[...`t[......
0260 41 74 00 65 00 73 00 74 00 00 00 0F 00 32 FF FF At.e.s.t.....2..
0270 FF FF FF FF FF FF FF FF FF FF 00 00 FF FF FF FF ................
0280 54 45 53 54 20 20 20 20 20 20 20 10 00 3E 08 60 TEST       ..>.`
0290 74 5B 74 5B 00 00 39 60 74 5B 03 00 00 00 00 00 t[t[..9`t[......
02A0 41 65 00 65 00 70 00 72 00 6F 00 0F 00 92 6D 00 Ae.e.p.r.o....m.
02B0 2E 00 63 00 00 00 FF FF FF FF 00 00 FF FF FF FF ..c.............
02C0 45 45 50 52 4F 4D 20 20 43 20 20 20 00 96 21 60 EEPROM  C   ..!`
02D0 74 5B 74 5B 00 00 98 78 EA 52 0C 00 14 0D 00 00 t[t[...x.R......
02E0 41 67 00 72 00 61 00 70 00 68 00 0F 00 C2 69 00 Ag.r.a.p.h....i.
02F0 63 00 73 00 2E 00 63 00 00 00 00 00 FF FF FF FF c.s...c.........
0300 47 52 41 50 48 49 43 53 43 20 20 20 00 A1 21 60 GRAPHICSC   ..!`
0310 74 5B 74 5B 00 00 91 78 EA 52 0D 00 C8 78 00 00 t[t[...x.R...x..
0320 41 69 00 31 00 38 00 6E 00 2E 00 0F 00 FE 63 00 Ai.1.8.n......c.
0330 00 00 FF FF FF FF FF FF FF FF 00 00 FF FF FF FF ................
0340 49 31 38 4E 20 20 20 20 43 20 20 20 00 AC 21 60 I18N    C   ..!`
0350 74 5B 74 5B 00 00 86 78 EA 52 15 00 80 A7 00 00 t[t[...x.R......
0360 41 6D 00 61 00 69 00 6E 00 2E 00 0F 00 41 63 00 Am.a.i.n.....Ac.
0370 00 00 FF FF FF FF FF FF FF FF 00 00 FF FF FF FF ................
0380 4D 41 49 4E 20 20 20 20 43 20 20 20 00 B8 21 60 MAIN    C   ..!`
0390 74 5B 74 5B 00 00 A5 4C F1 52 20 00 49 3D 00 00 t[t[...L.R .I=..
03A0 41 73 00 65 00 72 00 69 00 61 00 0F 00 80 6C 00 As.e.r.i.a....l.
03B0 2E 00 63 00 00 00 FF FF FF FF 00 00 FF FF FF FF ..c.............
03C0 53 45 52 49 41 4C 20 20 43 20 20 20 00 C3 21 60 SERIAL  C   ..!`
03D0 74 5B 74 5B 00 00 E5 78 EA 52 24 00 ED 0C 00 00 t[t[...x.R$.....
03E0 41 73 00 73 00 64 00 31 00 33 00 0F 00 0B 30 00 As.s.d.1.3....0.
03F0 39 00 2E 00 63 00 00 00 FF FF 00 00 FF FF FF FF 9...c...........
Once again, this is a view of the transient area. The file names are obvious, right? Capital letters, look like sensible file names, start in convenient columns? There's one at offset $20, something a bit odd at offset $40, one at offset $80, and so on. But there are actually directory entries every 32 bytes (that is, a directory entry is a structure 32 bytes long).
The mystery items are MS's long file name structures, and I'm ignoring them completely. They always precede the default 8.3 filenames that we're interested in, they're complex to decode (and encode), they use MS's wide character set, and most significantly we can work exclusively in 8.3 filenames. There's nothing to stop us using long file names, and maybe some time in the future I will, but I'm trying to keep the code down to a manageable (and understandable) size.
However we do have to look at them, because they have one thing in common with an 8.3 filename, and that's the attribute byte.
; directory record field offsets:
dir_name        equ 0       ; eleven chars, space filled
dir_attrib      equ 11      ; one byte
dir_frstcluhi   equ 20      ; two bytes
dir_wrttime     equ 22      ; two bytes
dir_wrtdate     equ 24      ; two bytes
dir_frstcllo    equ 26      ; two bytes
dir_size        equ 28      ; four bytes
                            ; (other fields are ignored)

; the values of the attribute byte
attr_ro         equ 1       ; this is a read only file
attr_hid        equ 2       ; this is a hidden file (ignored)
attr_sys        equ 4       ; this is a system file (ignored)
attr_vol        equ 8       ; this is a volume name (ignored; there
                            ; should be only one file on a volume with
                            ; this attribute and its cluster fields are
                            ; both set to zero)
attr_dir        equ 16      ; this is a directory
attr_arc        equ 32      ; this bit is set when a file is created,
                            ; written to, or renamed to indicate to a
                            ; backup utility that things have changed
attr_lfn        equ attr_ro + attr_hid + attr_sys + attr_vol
                            ; indicates that this is a long file name
                            ; long file name data precedes the default
                            ; 8.3 filename record and is ignored here
For practical purposes, I'm ignoring most of the bits in the attribute field. If the attr_lfn bits are set, I know I'm looking at a long file name and I don't need to worry about that. Equally, I show without fear or favour files that are hidden, system files (this flag is intended to indicate to defragmentation programs that a particular file should remain in a particular location on the disc, apparently so that the OS can find it without jumping through all the hoops of the FAT system!), and (for now) the volume name. I do care about the attr_dir flag, though not immediately.
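To make the field offsets concrete, here is a small host-side sketch in Python (it is not part of the 65C02 code) that picks the same fields out of one 32-byte record. The function name decode_entry and the dictionary keys are invented for illustration; the sample bytes are the ECU record at $0220 in the dump above.

```python
import struct

# Field offsets and attribute bits, copied from the equ table above
DIR_NAME, DIR_ATTRIB = 0, 11
DIR_FRSTCLUHI, DIR_FRSTCLLO, DIR_SIZE = 20, 26, 28
ATTR_RO, ATTR_HID, ATTR_SYS, ATTR_VOL, ATTR_DIR, ATTR_ARC = 1, 2, 4, 8, 16, 32
ATTR_LFN = ATTR_RO | ATTR_HID | ATTR_SYS | ATTR_VOL


def decode_entry(entry: bytes) -> dict:
    """Pick the fields of interest out of one 32-byte directory record."""
    attrib = entry[DIR_ATTRIB]
    (hi,) = struct.unpack_from("<H", entry, DIR_FRSTCLUHI)   # little-endian word
    (lo,) = struct.unpack_from("<H", entry, DIR_FRSTCLLO)
    (size,) = struct.unpack_from("<I", entry, DIR_SIZE)      # little-endian long
    return {
        "name": entry[DIR_NAME:DIR_NAME + 11].decode("ascii", "replace"),
        "is_lfn": (attrib & ATTR_LFN) == ATTR_LFN,
        "is_dir": bool(attrib & ATTR_DIR),
        "first_cluster": (hi << 16) | lo,
        "size": size,
    }


# The ECU record from offset $0220 of the dump above
ecu = bytes.fromhex(
    "45 43 55 20 20 20 20 20 43 20 20 20 00 8A 21 60"
    "74 5B 74 5B 00 00 A8 78 EA 52 04 00 F2 7C 00 00"
)
print(decode_entry(ecu))
```

Run against that record, it reports a first cluster of 4 and a size of $7CF2 bytes, with neither the long-file-name nor the directory test firing.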
The purpose of a file system is to be able to find a file. Sadly, while FAT32 can do that, it doesn't do it terribly easily or quickly. The file records in the directory, for example, are not sorted alphabetically, and that means that to find a particular file you have to check every entry until you find it (or run out of directory). I've chosen to use a system with two calls:
fs_find_first sets up the internal pointers, and returns with those pointers indicating the first file entry's location, and
fs_find_next, which returns subsequent files.
For fs_find_first we enter with lba holding the sector number of the first sector of the directory, and when we return, transient will contain directory data, and fs_dir_ptr will point to the data in the transient section.
fs_find_next uses the same structures as prepared by fs_find_first and returns the same data. It will reload sectors and clusters as they are required.
        bss
fs_dir_number   ds 1            ; entry $0 to $f within the current sector
fs_dir_ptr      ds 2            ; pointer to the entry in transient
fs_dir_sector   ds 1            ; the sector in the cluster

        code
fs_find_first:
        jsr cf_set_lba          ; set the current directory
        jsr cf_read             ; and load the first sector
        stz fs_dir_number       ; start with record zero
        stz fs_dir_sector       ; and the first sector
        SHOWTRANS
fs_ff_01:
        lda fs_dir_number
        sta fs_dir_ptr+1        ; we need to multiply the dir number by 32
        stz fs_dir_ptr          ; so we * $100 and / 8
        lsr fs_dir_ptr+1
        ror fs_dir_ptr
        lsr fs_dir_ptr+1
        ror fs_dir_ptr
        lsr fs_dir_ptr+1
        ror fs_dir_ptr          ; now add transient
        clc
        lda fs_dir_ptr
        adc # lo transient
        sta fs_dir_ptr
        lda fs_dir_ptr+1
        adc # hi transient
        sta fs_dir_ptr+1        ; we are pointing to the desired record
        ldy #dir_attrib
        lda (fs_dir_ptr),y      ; get the attribute byte
        and #attr_lfn
        cmp #attr_lfn           ; is it a lfn fragment?
        beq fs_find_next        ; yes, so try the next
                                ; it's not a long file name, so what is it?
        lda (fs_dir_ptr)        ; the first character of the name
        cmp #$e5
        beq fs_find_next        ; it's deleted
        cmp #$05                ; $05 stands in for a real leading $E5 (Kanji)
        beq fs_find_next        ; skipped here too, though strictly it's a live file
        bra fs_fn_20            ; a live record, or $00 (no more entries):
                                ; either way we can stop and return it
fs_find_next:
                                ; else get the next record
        inc fs_dir_number       ; have we reached record $10?
        lda fs_dir_number
        cmp #$10
        bne fs_ff_01            ; try the next entry
        inc fs_dir_sector       ; otherwise we need the next sector
        lda fs_dir_sector
        cmp #$8                 ; have we run out of cluster?
        bne fs_fn_12
                                ; FIXME load new cluster
                                ; meanwhile, just wait forever
fs_fn_11:
        bra fs_fn_11
fs_fn_12:
                                ; increment the lba
        inc lba
        bne fs_fn_15
        inc lba+1
        bne fs_fn_15
        inc lba+2
        bne fs_fn_15
        inc lba+3
fs_fn_15:
                                ; load the new sector
        LYA lba
        jsr hexuint32
        jsr crlf
        jsr cf_set_lba
        jsr cf_read
        stz fs_dir_number
        jmp fs_ff_01
fs_fn_20:
                                ; we're done
        rts
This is reasonably straightforward. The first part of the code loads the transient area with a sector, and sets up the pointer to the first entry. Long file name fragments are skipped on the strength of their attribute byte. Then we look at the first character of the file name; there are some special values hidden there too.
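The pointer set-up is easy to check on a host machine. This Python sketch mimics the three lsr/ror pairs (record number in the high byte, shifted right three places) and confirms the result is the record number times 32 plus the buffer address. TRANSIENT = $0200 is an assumption read off the addresses in the dump, and entry_pointer is a made-up name.

```python
TRANSIENT = 0x0200  # assumed base of the sector buffer, as the dump addresses suggest


def entry_pointer(number: int) -> int:
    """Mirror the assembly: number * $100, then three 16-bit right shifts."""
    hi, lo = number & 0xFF, 0
    for _ in range(3):
        carry = hi & 1                  # lsr drops bit 0 of the high byte into carry
        hi >>= 1
        lo = (carry << 7) | (lo >> 1)   # ror pulls carry into bit 7 of the low byte
    return ((hi << 8) | lo) + TRANSIENT


for n in range(16):
    print(f"record {n:2d} -> ${entry_pointer(n):04X}")
```

Record 1 comes out at $0220 and record 15 at $03E0, which matches where the 8.3 names sit in the dump.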
Specifically, if the value is $E5 then the file has been deleted. It's not physically deleted from the disc unless its cluster has been overwritten by a later file write, and replacing that $E5 will miraculously restore the file to life. The $05 value is a Kanji wrinkle: $E5 is a valid lead byte in Kanji encoding, so a file whose name really begins with $E5 is stored with $05 in its place. Strictly, then, $05 marks a live file rather than a deleted one, and you're only likely to see it on a Japanese file system. Here I simply skip either value, which is an easy test but would hide such a file.
Another special character is the value $00. That's possibly the most useful indicator, since it tells us that (a) this isn't a file entry, and (b) there aren't any more and you can stop looking now. But that's not acted on here; it's up to whatever is calling fs_find_next to stop asking.
So if the file entry is indicating that we have a valid filename, we stop and return; we've found a file. Otherwise, we handle moving the file pointers a further 32 bytes and, if necessary, getting the next sector. This is the entry point for fs_find_next.
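The filtering rules can be tried out away from the hardware. The Python sketch below walks one sector in 32-byte steps and applies the same tests in the same order: long file name fragment, then $E5 or $05, then $00. Unlike the assembly, it stops on $00 itself rather than leaving that to the caller. The name live_entries is invented; the sample data is the first five records of the dump above, padded with zeros.

```python
ATTR_LFN = 0x0F      # attr_ro + attr_hid + attr_sys + attr_vol
ENTRY_SIZE = 32


def live_entries(sector: bytes):
    """Yield the 11-byte short names of the undeleted 8.3 records in a sector."""
    for offset in range(0, len(sector), ENTRY_SIZE):
        entry = sector[offset:offset + ENTRY_SIZE]
        if (entry[11] & ATTR_LFN) == ATTR_LFN:
            continue                  # long file name fragment
        first = entry[0]
        if first in (0xE5, 0x05):
            continue                  # skipped, as the assembly does
        if first == 0x00:
            return                    # end of directory: stop asking
        yield entry[:11].decode("ascii", "replace")


# Records $0200-$029F of the dump above, padded with zeros to a full sector
SAMPLE = bytes.fromhex("""
    41 65 00 63 00 75 00 2E 00 63 00 0F 00 0B 00 00
    FF FF FF FF FF FF FF FF FF FF 00 00 FF FF FF FF
    45 43 55 20 20 20 20 20 43 20 20 20 00 8A 21 60
    74 5B 74 5B 00 00 A8 78 EA 52 04 00 F2 7C 00 00
    E5 4E 54 49 54 4C 7E 31 20 20 20 10 00 3E 08 60
    74 5B 74 5B 00 00 08 60 74 5B 03 00 00 00 00 00
    41 74 00 65 00 73 00 74 00 00 00 0F 00 32 FF FF
    FF FF FF FF FF FF FF FF FF FF 00 00 FF FF FF FF
    54 45 53 54 20 20 20 20 20 20 20 10 00 3E 08 60
    74 5B 74 5B 00 00 39 60 74 5B 03 00 00 00 00 00
""").ljust(512, b"\x00")

names = list(live_entries(SAMPLE))
print(names)
```

Only ECU and TEST survive: the two long file name fragments and the deleted ?NTITL~1 record are passed over, and the zero padding ends the scan.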
One important point: the code has a problem if it runs out of cluster before it runs out of files. We don't yet know how to find the next cluster; that can be the next adventure. For now, we'll just spin in place if that happens.
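The bookkeeping behind that limit can be written down in a few lines. This Python sketch assumes 512-byte sectors (16 records each, the cmp #$10 above) and 8 sectors per cluster (the cmp #$8 above; a real volume reports its own figure). The function name locate is hypothetical, and the first sector $0FC0 is inferred from the sector numbers printed in the results listing, so treat it as an assumption too.

```python
ENTRIES_PER_SECTOR = 16    # 512-byte sector / 32-byte record
SECTORS_PER_CLUSTER = 8    # assumed, matching the cmp #$8 in the assembly


def locate(first_lba: int, index: int):
    """Return (lba, record number within that sector) for the index-th record."""
    sector, number = divmod(index, ENTRIES_PER_SECTOR)
    if sector >= SECTORS_PER_CLUSTER:
        # This is where the assembly spins: the FAT must supply the next cluster
        raise IndexError("past the end of the cluster")
    return first_lba + sector, number


print(locate(0x0FC0, 0))     # first record of the first sector
print(locate(0x0FC0, 127))   # last record the cluster can hold
```

With these figures one cluster holds 128 records, long file name fragments and deleted entries included, which is why a directory of a few dozen long-named files can overflow it.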
Here's the code I'm using to get each directory entry:
        ; call fs_find_first with lba holding the first sector
        ; of the directory wanted
        lda cluster_begin_lba
        sta lba
        lda cluster_begin_lba+1
        sta lba+1
        lda cluster_begin_lba+2
        sta lba+2
        lda cluster_begin_lba+3
        sta lba+3
        jsr fs_find_first
dir:
        lda (fs_dir_ptr)        ; check the first character of name
        beq done                ; quit if it's zero
        lda fs_dir_ptr
        sta put_ptr
        lda fs_dir_ptr+1
        sta put_ptr+1
        jsr putmemline          ; show the first half of the record
        jsr fs_find_next
        bra dir                 ; until no more records
done:                           ; the directory has been listed
and here are the results:
0220 45 43 55 20 20 20 20 20 43 20 20 20 00 8A 21 60 ECU     C   ..!`
0280 54 45 53 54 20 20 20 20 20 20 20 10 00 3E 08 60 TEST       ..>.`
02C0 45 45 50 52 4F 4D 20 20 43 20 20 20 00 96 21 60 EEPROM  C   ..!`
0300 47 52 41 50 48 49 43 53 43 20 20 20 00 A1 21 60 GRAPHICSC   ..!`
0340 49 31 38 4E 20 20 20 20 43 20 20 20 00 AC 21 60 I18N    C   ..!`
0380 4D 41 49 4E 20 20 20 20 43 20 20 20 00 B8 21 60 MAIN    C   ..!`
03C0 53 45 52 49 41 4C 20 20 43 20 20 20 00 C3 21 60 SERIAL  C   ..!`
00000FC1
0200 53 53 44 31 33 30 39 20 43 20 20 20 00 07 22 60 SSD1309 C   .."`
0240 53 59 53 4D 45 4D 20 20 43 20 20 20 00 12 22 60 SYSMEM  C   .."`
02A0 46 41 54 5F 46 49 7E 31 50 44 46 20 00 7B 1D 77 FAT_FI~1PDF .{.w
0300 53 4C 49 44 45 53 7E 31 50 44 46 20 00 87 1D 77 SLIDES~1PDF ...w
0380 55 4E 49 54 31 30 7E 31 50 44 46 20 00 9C 1D 77 UNIT10~1PDF ...w
03E0 53 4C 49 44 45 53 7E 32 50 44 46 20 00 AD 1D 77 SLIDES~2PDF ...w
00000FC2
0260 55 4E 49 54 31 30 7E 32 50 44 46 20 00 C1 1D 77 UNIT10~2PDF ...w
02E0 53 52 41 4D 5F 42 7E 31 50 44 46 20 00 0A 20 77 SRAM_B~1PDF .. w
0340 43 46 5F 34 34 5F 7E 31 50 44 46 20 00 22 20 77 CF_44_~1PDF ." w
03A0 43 46 5F 34 34 5F 7E 32 50 44 46 20 00 2E 20 77 CF_44_~2PDF .. w
00000FC3
0200 36 35 30 32 5F 43 7E 31 50 44 46 20 00 3A 20 77 6502_C~1PDF .: w
0260 43 46 5F 42 4F 41 7E 31 50 44 46 20 00 4E 20 77 CF_BOA~1PDF .N w
02E0 36 35 30 32 41 4E 7E 31 50 44 46 20 00 5E 20 77 6502AN~1PDF .^ w
0340 4E 45 4F 4E 36 35 7E 31 41 53 4D 20 00 6A 20 77 NEON65~1ASM .j w
0380 49 44 45 20 20 20 20 20 50 44 46 20 00 74 20 77 IDE     PDF .t w
03E0 46 41 54 33 32 46 7E 31 50 44 46 20 00 82 20 77 FAT32F~1PDF .. w
00000FC4
0220 46 20 20 20 20 20 20 20 50 44 46 20 00 95 20 77 F       PDF .. w
0280 48 45 58 43 4F 4E 7E 31 50 44 46 20 00 AA 20 77 HEXCON~1PDF .. w
02C0 31 20 20 20 20 20 20 20 50 44 46 20 00 C0 20 77 1       PDF .. w
0320 36 35 30 32 5F 43 7E 32 50 44 46 20 00 04 21 77 6502_C~2PDF ..!w
0380 5A 38 30 2D 44 4F 7E 31 50 44 46 20 00 18 21 77 Z80-DO~1PDF ..!w
03C0 57 36 35 43 32 32 20 20 50 44 46 20 00 27 21 77 W65C22  PDF .'!w
00000FC5
0200 49 53 36 32 43 32 35 36 50 44 46 20 00 3E 21 77 IS62C256PDF .>!w
0260 53 59 4E 45 52 54 7E 31 50 44 46 20 00 4A 21 77 SYNERT~1PDF .J!w
02A0 53 4E 37 34 48 43 7E 31 50 44 46 20 00 7D 21 77 SN74HC~1PDF .}!w
0320 56 49 56 41 4C 44 7E 31 44 45 42 20 00 B5 21 77 VIVALD~1DEB ..!w
0360 38 31 34 30 34 39 20 20 50 44 46 20 00 92 2B 77 814049  PDF ..+w
03C0 41 50 58 38 32 33 7E 31 50 44 46 20 00 A8 2B 77 APX823~1PDF ..+w
00000FC6
0200 42 55 34 38 58 58 7E 31 50 44 46 20 00 C4 2B 77 BU48XX~1PDF ..+w
0260 44 41 54 41 53 48 7E 31 50 44 46 20 00 19 2C 77 DATASH~1PDF ..,w
02E0 44 41 54 41 53 48 7E 32 50 44 46 20 00 3D 2C 77 DATASH~2PDF .=,w
0360 36 35 30 32 2D 4B 7E 31 5A 49 50 20 00 5B 2C 77 6502-K~1ZIP .[,w
03C0 31 30 31 44 53 45 7E 31 50 44 46 20 00 67 2C 77 101DSE~1PDF .g,w
00000FC7
0220 31 30 31 44 2D 54 7E 31 50 44 46 20 00 78 2C 77 101D-T~1PDF .x,w
0280 46 43 49 2D 31 30 7E 31 50 44 46 20 00 88 2C 77 FCI-10~1PDF ..,w
02E0 31 30 31 44 53 45 7E 32 50 44 46 20 00 96 2C 77 101DSE~2PDF ..,w
0360 4E 45 57 53 4C 45 7E 31 50 44 46 20 00 A6 2C 77 NEWSLE~1PDF ..,w
03E0 5A 47 36 35 30 32 7E 31 50 44 46 20 00 0C 2D 77 ZG6502~1PDF ..-w
And we did indeed run out of cluster; recalling the last episode you will see that the FAT record for the cluster, at offset $08, had the address of another FAT record. There's more.
But for now, we can see that we have isolated only those files which are undeleted, and have valid 8.3 file names, and live in the first cluster.
Neil