Huffman decoding on a 6502
Posted: Fri Aug 18, 2017 7:34 am
So, I successfully wrote a PC Huffman encoder in C++. I use canonical codes and only store the "huffman tree" as a 256 byte table where each byte indicate the code length of the corresponding index (or $00 if a symbol unused). Cannonical modes means that if several symbols use the same bit length, then their encoding is sorted in order. For example if 'A', 'B' and 'C' are encoded in 5-bits (and nothing else is), then A is necessarily "11100", B "11101" and C "11110". Symbols coded on 6 and more bits will always start with 5 1s, and symbols starting with less than 3 1s are encoded in less than 5-bits.
I am looking at how to implement Huffman decoding in 6502. It doesn't have to be fast, but if it happen to be, then it's nice. Currently the C++ code is as follows :
My assembly code so far looks like this (untested).
Read bit from stream is a separate subroutine only for clarity, but it could easily be inlined in the main loop.
I wonder whether this decoding code can be (significantly) improved for speed and/or size. An obvious improvement is that if codes do never surpass 16-bits (they rarely do) then the 24-bit variables can be collapsed to 16-bits only, however it is not rare that some code lengths are above 16-bit if some symbols are very rare.
A problem with this approach is that the whole 256 byte table is traversed for every possible length (including unused lengths). So if we decode a symbol encoded in, say, 17 bits, the loops looks for the whote 256 byte table 17 times before it can decode the symbol, neededless to say it's slow. However it's slow for the rare symbols and faster for the frequent symbols, as such it's not that bad.
I am looking at how to implement Huffman decoding in 6502. It doesn't have to be fast, but if it happen to be, then it's nice. Currently the C++ code is as follows :
Code: Select all
static block_t decode_block(block_t & block, block_t const& depth_table, uint16_t size)
{
block_t out_buff;
BitArrayRead b(block);
for(uint16_t i=0; i<size; i++)
{
//Decode the input bitstream
//This decoding method is slow, however it requires no extra RAM or variables,
//which makes it interesting for embedded applications.
unsigned int nbits = 0;
//This hodls the currently read code, and the code we'll compare it to
long code = 0, matchcode = 0;
bool found = false;
uint8_t value = 0;
do
{
//We do the "clever algorithm" again but this time we search to which symbol
//the code we have belongs to
nbits++;
code <<= 1;
if(b) code |=1;
matchcode <<= 1;
//Search if the read code corresponds to something...
for(unsigned int j = min; !found && j <= max; j++)
{
unsigned int match_len = depth_table[j-min];
if(match_len != nbits) continue;
if(matchcode == code)
{
found = true;
value = j;
}
matchcode++;
}
//No -> continue searching among longer codes
}
while(!found);
//Yes -> we decoded a byte
out_buff.push_back(value);
}
return out_buff;
}
Code: Select all
; Decode one symbol from a cannonical huffman encoded bistream
; (To decode multiple symbols this have to be called within a loop)
; Note that some variables must be inintialized before decoding the 1st symbol
Get_Huff_Value
lda #$00
sta nbits
sta code_l
sta code_m
sta code_h ; Needed only if some codes are longer than 16-bit
sta match_code_l
sta match_code_m
sta match_code_h ; Needed only if some codes are longer than 16-bit
main_loop:
inc nbits
jsr Read_bit_from_stream ;Read one bit and shift in the current code variable
rol code_l
rol code_m
rol code_h
asl match_code_l
rol match_code_m
rol match_code_h
; Search if the read code correspond to something...
ldy #$00
search_loop:
lda Huffman_length_tbl, Y ; Search for a node that has the current length
cmp nbits
bne not_found
found_length:
lda match_code_l
cmp code_l
bne code_wrong
lda match_code_m
cmp code_m
bne code_wrong
lda match_code_h
cmp code_h
bne code_wrong
found:
tya ; Value in Y
rts
code_wrong:
inc match_code_l ; If the code is not this one, match_code is incremented by '1' and continue searching
bne not_found
inc match_code_m
bne not_found
inc match_code_h
not_found:
iny ; Next code
bne search_loop ; Continue for searching for same length
beq main_loop ; Searching for a longer length
Read_bit_from_stream: ; Read a single bit from input stream and return it in C flag
dec HuffBitCtr
bpl _not_next_byte
lda #$07
sta HuffBitCtr
inc HuffPointerL
bne +
inc HuffPointerH
+ lda (HuffPointerL),Y ; Read next byte in shift register
sta HuffShiftReg
_not_next_byte
lsr HuffShiftReg
rts
I wonder whether this decoding code can be (significantly) improved for speed and/or size. An obvious improvement is that if codes do never surpass 16-bits (they rarely do) then the 24-bit variables can be collapsed to 16-bits only, however it is not rare that some code lengths are above 16-bit if some symbols are very rare.
A problem with this approach is that the whole 256 byte table is traversed for every possible length (including unused lengths). So if we decode a symbol encoded in, say, 17 bits, the loops looks for the whote 256 byte table 17 times before it can decode the symbol, neededless to say it's slow. However it's slow for the rare symbols and faster for the frequent symbols, as such it's not that bad.