I has the code. Took me a long while to understand precisely how the canonical code works, but when you see the codes laid out it becomes clear how to figure out to do the decoding without having to scan the entire code table.
Here's a blog post where I go into a bit more detail.
https://github.com/djangojames/exploratory/blob/master/blog.md#722018-efficient-huffman-decoding-in-8-bitsThis code only supports 16 bit Huffamn codes, but it can be made to support 24 bits by adding another byte to the accumulator, another ROL, CMP, SBC and the extra bytes in the table at the end.
I apologise in for the code addresses not being annotated inline (my disassembler doesn't do that...)
To read a symbol we do JSR huffman-next until we read our specific EOS indicator. Before we do any of that, we need to set the address, HUFFMAN-PTR, to where we want to read from and additionally set HUFFMAN-BITS to 1.
It has been unit tested with a bunch of strings round-trip fashion so it does work... I'm going to port my adventure game to use it over the next few days.
Code:
OUTPUT-STRING-INDEX -> 0000
HUFFMAN-POP-TABLE -> 0001
HUFFMAN-PTR -> 0003
HUFFMAN-BITS -> 0005
DECODER:ACC-HI -> 000C
DECODER:ACC-LO -> 000D
DECODER:NEXT-BYTE -> 000E
Code:
HUFFMAN-NEXT 070E A000 LDY #$00
0710 840C STY $0C ;clear accumulator
0712 840D STY $0D
0714 88 DEY ;Start at zero in pop table, following the first INY
DECODER:FETCH-BIT 0715 C605 DEC $05
0717 D014 BNE $072D ;DECODER:GOT-BITS
;New byte required
;Save our pop table index
0719 98 TYA
071A AA TAX
071B A000 LDY #$00
071D B103 LDA ($03),Y
071F 850E STA $0E
0721 E603 INC $03 ;PTR
0723 D002 BNE $0727 ;L1:INC16-DONE
0725 E604 INC $04 ;PTR + 1
L1:INC16-DONE 0727 A908 LDA #$08
0729 8505 STA $05
;Restore the pop table index
072B 8A TXA
072C A8 TAY
;Lets rotate a bit from the next byte
;into the accumulator
DECODER:GOT-BITS 072D 060E ASL $0E
072F 260D ROL $0D
0731 260C ROL $0C
0733 C8 INY
0734 B101 LDA ($01),Y ;Any symbols at this length?
0736 F0DD BEQ $0715 ;DECODER:FETCH-BIT
0738 C8 INY ;Skip row indicator byte
;We have some symbols- does the accumulator hold one?
;It does if the accumulator is less than the max-code
;for codes of this length. So let's compare it.
0739 B101 LDA ($01),Y
073B C50C CMP $0C
073D 3014 BMI $0753 ;DECODER:GT1 ;acc-hi > max-code(len)-hi
073F C8 INY
0740 B101 LDA ($01),Y
0742 C50D CMP $0D
0744 300E BMI $0754 ;DECODER:GT2 ;acc-lo > max-code(len)-lo
0746 C8 INY
;Now acc<=max-code(len), this means we have a symbol woo!
;To get the symbol index we subtract the offset
;which has kindly been pre-computed in the pop table
0747 38 SEC
0748 A50D LDA $0D
074A F101 SBC ($01),Y ;- offset lo
074C AA TAX
074D C8 INY
074E A50C LDA $0C
0750 F101 SBC ($01),Y ;- offset hi
;Return to caller with index-hi in A and index-lo in X
0752 60 RTS
;Skip rest of row and get next bit
DECODER:GT1 0753 C8 INY
DECODER:GT2 0754 C8 INY
0755 C8 INY
0756 D0BD BNE $0715 ;DECODER:FETCH-BIT
;String Table
;General strings huffman population
;0 symbols of length 1
;Max Code: NIL Offset: NIL
STRING-POP-TABLE 0758 00 DB $00
;1 symbols of length 2
;Max Code: 0000 Offset: 0000
0759 FF0000.. DB $FF, $00, $00, $00, $00
;0 symbols of length 3
;Max Code: NIL Offset: NIL
075E 00 DB $00
;6 symbols of length 4
;Max Code: 0009 Offset: 0003
075F FF0009.. DB $FF, $00, $09, $03, $00
;6 symbols of length 5
;Max Code: 0019 Offset: 000D
0764 FF0019.. DB $FF, $00, $19, $0D, $00
;7 symbols of length 6
;Max Code: 003A Offset: 0027
0769 FF003A.. DB $FF, $00, $3A, $27, $00
;8 symbols of length 7
;Max Code: 007D Offset: 0062
076E FF007D.. DB $FF, $00, $7D, $62, $00
;4 symbols of length 8
;Max Code: 00FF Offset: 00E0
0773 FF00FF.. DB $FF, $00, $FF, $E0, $00
END 0778 00 BRK ;HUFFMAN-PTR ;HUFFMAN-PTR + 1