Re: About the humble Floppy Disk…
Posted: Wed Jan 30, 2019 3:19 pm
To inform more precisely what the hardware needs to look like, I decided to think about the software (which until now has been handwaved). I'm assuming here that we've got a captive 6502 running at a fairly high speed, as in the Commodore drives, making the drive an "intelligent peripheral" that could be interfaced to many different machines. Since the mechanical drive has a TTL interface, it makes sense to run this captive CPU at 5V which should enable a 12MHz clock (trivially divided from the 24MHz master). Our job is to encode and decode EFM and locate sync marks as close to realtime as is practical.
Encoding EFM should be easy, so long as a suitable method of calculating pre-compensation can be found (and I have little doubt that it can be). A simple table of delta-time values can be built and iterated over. On the outer tracks at least, the raw table is likely to work without applying pre-compensation. Reading this table should be fast enough to do in realtime, with only a small FIFO to cover accidents.
The EFM sync mark is a 24-symbol sequence which uniquely contains two maximum-length gaps between flux reversals. The simplest and fastest code to recognise this pattern could be: This assumes that flux gaps longer than those in the sync sequence, as would be found on unformatted sections of the track, read as zeroes, which might not be true - but it requires just eight cycles to identify the second gap as completing the sync sequence, which at 12MHz is considerably shorter than the shortest flux gap the medium can support immediately after the sync sequence (ie. 2µs, or 24 cycles at 12MHz). This bodes well, as we can add a check for an over-length gap (another 4 cycles) and still be ready before the next pulse arrives, even if we reduce the clock speed to 8MHz. So it becomes reasonable to have the pulse detector indicate "timeout with no pulse" by returning 255, which should be straightforward to arrange in hardware.
So we can precisely identify the moment at which a sync mark arrives, without having to do a "track read" from the index pulse. In principle we could also recognise any other single symbol in a similar manner.
Implementing a decoding state machine in (effectively) the Program Counter is possible and potentially very fast, but rather wasteful of code space. There's a maximum of 5 flux reversals per decoded byte, so theoretically we could anticipate 1280 states in the decoder, each with at least 7 bytes of code to identify the next state to transition to (some of which will be error states). It might be more efficient to implement a data-driven state machine once past the sync mark.
Such a state machine would have 256 or so "leaf states", one for each valid data byte value, one each to identify the two metadata markers defined in EFM, three to handle the two merging symbols (which may be both blank, or at most one containing a reversal), and at least one error state in which a recovery operation can be scheduled. Such recovery might include deeper examination of the delta-time values to obtain a maximum-likelihood decoding, application of an error-correcting code (Reed-Solomon would be applicable; since EFM is a fixed-length code, the bytes won't easily get entirely out of step, and the errored byte can be marked as an erasure), or simply a second attempt at reading the whole sector.
At a valid-data-byte leaf state, it is only necessary to record the decoded byte, advance the pointer and check it against the sector length, then subtract the applicable gap value from the current delta-time (which will overlap the end of the 14-symbol group) and proceed to the merging-symbol states to align with the next group. From the merging-symbol states, a properly trimmed delta-time gap is used to re-enter the root state of the machine. The other leaf states require similar housekeeping, except that there is no decoded byte to record.
The inner states of the machine (including the root) have an entirely different function; they must choose between multiple following states based on the flux gap presently in question. Valid states may include gaps of 1-11 symbols - the shortest of these only occurring after trimming from a previous leaf state - with longer gaps being explicitly excluded from the encoding and constituting error states. Since each symbol is nominally 16 clocks long, it seems feasible to mask the value and implement this decision in an interleaved lookup table: This code however requires 36 CPU cycles, which means it will fall behind the denser sequences of flux reversals which can appear every 24 CPU cycles at 12MHz. Unless further optimisations can be found to make this code run in realtime, it will be necessary to implement a very deep read FIFO (ie. a whole-track buffer) to ensure that pathological data can be read correctly. As noted previously, this can be done with a readily-available 256Kx8b SRAM.
Another option is to exploit the 16-bit capabilities of a 65816 in place of a 6502, which would avoid the awkward shuffling through the X register to avoid disturbing the state pointer while loading the second half of its new value, as well as the second use of the expensive indirect-indexed addressing mode. That appears to save 6 cycles in a direct translation, which isn't quite enough. However, we don't need to use indirect-indexed with the '816, as we can use X directly as a 16-bit pointer, and the Direct Page Register of all things as the index! This abuse results in the following: …which takes 25 cycles. This may be close enough to permit a smaller FIFO, provided the leaf-state handling can be assured to be very fast. Further, slightly less clean optimisations can be envisaged, such as unrolling the loop so as to alternate the use of X and Y, and encoding the "leaf" flag into the high-order bit so as to elide the CPX/Y instructions - that would bring it down to 20 cycles per state transition, which is comfortably within requirements.
With that said, there is one key advantage to having a full-sized track buffer; the ability to perform a "forensic read" of a track, from index pulse to index pulse. That's the sort of capability normally reserved for the likes of the Catweasel FDC, which is reputed to be expensive and has closed firmware. So it may be sufficient to merely run the state machine fast enough to not be visibly slow.
Encoding EFM should be easy, so long as a suitable method of calculating pre-compensation can be found (and I have little doubt that it can be). A simple table of delta-time values can be built and iterated over. On the outer tracks at least, the raw table is likely to work without applying pre-compensation. Reading this table should be fast enough to do in realtime, with only a small FIFO to cover accidents.
The EFM sync mark is a 24-symbol sequence which uniquely contains two maximum-length gaps between flux reversals. The simplest and fastest code to recognise this pattern could be:
Code: Select all
FindSync:
LDA rfifo
CMP #gap10_11
BCC FindSync
LDA rfifo
CMP #gap10_11
BCC FindSync
FoundSync:
So we can precisely identify the moment at which a sync mark arrives, without having to do a "track read" from the index pulse. In principle we could also recognise any other single symbol in a similar manner.
Implementing a decoding state machine in (effectively) the Program Counter is possible and potentially very fast, but rather wasteful of code space. There's a maximum of 5 flux reversals per decoded byte, so theoretically we could anticipate 1280 states in the decoder, each with at least 7 bytes of code to identify the next state to transition to (some of which will be error states). It might be more efficient to implement a data-driven state machine once past the sync mark.
Such a state machine would have 256 or so "leaf states", one for each valid data byte value, one each to identify the two metadata markers defined in EFM, three to handle the two merging symbols (which may be both blank, or at most one containing a reversal), and at least one error state in which a recovery operation can be scheduled. Such recovery might include deeper examination of the delta-time values to obtain a maximum-likelihood decoding, application of an error-correcting code (Reed-Solomon would be applicable; since EFM is a fixed-length code, the bytes won't easily get entirely out of step, and the errored byte can be marked as an erasure), or simply a second attempt at reading the whole sector.
At a valid-data-byte leaf state, it is only necessary to record the decoded byte, advance the pointer and check it against the sector length, then subtract the applicable gap value from the current delta-time (which will overlap the end of the 14-symbol group) and proceed to the merging-symbol states to align with the next group. From the merging-symbol states, a properly trimmed delta-time gap is used to re-enter the root state of the machine. The other leaf states require similar housekeeping, except that there is no decoded byte to record.
The inner states of the machine (including the root) have an entirely different function; they must choose between multiple following states based on the flux gap presently in question. Valid states may include gaps of 1-11 symbols - the shortest of these only occurring after trimming from a previous leaf state - with longer gaps being explicitly excluded from the encoding and constituting error states. Since each symbol is nominally 16 clocks long, it seems feasible to mask the value and implement this decision in an interleaved lookup table:
Code: Select all
StateTrans:
LDA rfifo
PHA ; use stack as circular history buffer for error handling
AND #$F8 ; retains one fractional bit for rounding
TAY
LDA (state),Y
TAX
INY
LDA (state),Y
STX state
STA state+1
CMP #maxinnerstate
BCC StateTrans
LeafState:
Another option is to exploit the 16-bit capabilities of a 65816 in place of a 6502, which would avoid the awkward shuffling through the X register to avoid disturbing the state pointer while loading the second half of its new value, as well as the second use of the expensive indirect-indexed addressing mode. That appears to save 6 cycles in a direct translation, which isn't quite enough. However, we don't need to use indirect-indexed with the '816, as we can use X directly as a 16-bit pointer, and the Direct Page Register of all things as the index! This abuse results in the following:
Code: Select all
StateTrans:
; A is 8-bit, high half pre-cleared, indexes are 16-bit
LDA (rfifo) ; absolute, not indirect
PHA ; use stack as history buffer for error handling - must reset stack pointer regularly
AND $F8 ; immediate 8-bit
TCD
LDY {0+XX} ; direct-page indexed, not indirect
TYX
CPX maxinnerstate ; immediate 16-bit
BCC StateTrans
LeafState:
With that said, there is one key advantage to having a full-sized track buffer; the ability to perform a "forensic read" of a track, from index pulse to index pulse. That's the sort of capability normally reserved for the likes of the Catweasel FDC, which is reputed to be expensive and has closed firmware. So it may be sufficient to merely run the state machine fast enough to not be visibly slow.