I seem to recall seeing Woz's code published somewhere around here as well.
Edit: Hmm, these could surely be improved for the 816, maybe even by changing the formats to help. A byte-oriented layout won't work as well as a 16-bit word-oriented one.
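To put a rough number on the layout point, here is a small Python model (not 65816 code) that counts add-with-carry steps for a mantissa addition. The 24-bit width, the padding to 32 bits for the word layout, and the names `split`, `join` and `add_limbs` are all assumptions made for illustration; none of this is taken from Woz's routines.

```python
def split(value, total_bits, limb_bits):
    """Split an unsigned integer into little-endian limbs of limb_bits each."""
    mask = (1 << limb_bits) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, limb_bits)]

def join(limbs, limb_bits):
    """Recombine little-endian limbs into one unsigned integer."""
    return sum(limb << (i * limb_bits) for i, limb in enumerate(limbs))

def add_limbs(a, b, limb_bits):
    """Add two equal-length limb lists the way a chain of ADCs would.

    Returns (result_limbs, carry_out, number_of_add_steps).
    """
    mask = (1 << limb_bits) - 1
    carry = 0
    out = []
    for x, y in zip(a, b):
        total = x + y + carry
        out.append(total & mask)
        carry = total >> limb_bits
    return out, carry, len(out)

a, b = 0x123456, 0x0FEDCB

# Byte-oriented layout: a 24-bit mantissa is three 8-bit limbs.
byte_sum, byte_carry, byte_steps = add_limbs(split(a, 24, 8), split(b, 24, 8), 8)

# Word-oriented layout (hypothetical): mantissa padded to 32 bits, two 16-bit limbs.
word_sum, word_carry, word_steps = add_limbs(split(a, 32, 16), split(b, 32, 16), 16)

print(byte_steps, word_steps)
print(hex(join(byte_sum, 8)), hex(join(word_sum, 16)))
```

In this model a three-byte mantissa sits awkwardly on a 16-bit accumulator, at one and a half words, so saving an add step means padding the format out to a whole number of words. That is only a step count, not a cycle count.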
At one time, I looked at implementing Woz's FP stuff on the 65C816 but set the idea aside for two reasons. Fully adapting the code to the 65C816 would almost require a rewrite, as merely running the code as is on the '816 would gain little to nothing. Also, the code would become somewhat cumbersome, as constant switching of the accumulator size in order to take advantage of 16-bit operations would be necessary.
The other reason is that Woz's FP format is limited to no more than seven significant digits, due to the use of a three-byte mantissa. Also, converting fractional values from ASCII to binary will introduce errors.
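Both points can be checked with a little arithmetic. The Python sketch below estimates the decimal digits a binary mantissa can carry and rounds 1.2345 to a 24-bit mantissa using exact fractions. Whether the usable width is 23 or 24 bits depends on how the sign is stored, so the code takes the width as a parameter. The helpers `decimal_digits` and `round_to_mantissa` are made up for this illustration and handle positive values only.

```python
import math
from fractions import Fraction

def decimal_digits(mantissa_bits):
    """Approximate decimal digits carried by a binary mantissa of the given width."""
    return mantissa_bits * math.log10(2)

def round_to_mantissa(value, mantissa_bits):
    """Round a positive Fraction to the nearest value with mantissa_bits significant bits."""
    low = Fraction(2) ** (mantissa_bits - 1)
    high = Fraction(2) ** mantissa_bits
    scaled, exponent = value, 0
    # Normalize so that low <= scaled < high, tracking the power of two.
    while scaled < low:
        scaled *= 2
        exponent -= 1
    while scaled >= high:
        scaled /= 2
        exponent += 1
    return Fraction(round(scaled)) * Fraction(2) ** exponent

print(decimal_digits(23), decimal_digits(24))   # roughly 6.9 and 7.2

exact = Fraction("1.2345")            # the decimal value as typed
stored = round_to_mantissa(exact, 24) # nearest binary value with a 24-bit mantissa
error = stored - exact
print(float(error))                   # small, but not zero
```

The error is nonzero because 1.2345 = 2469/2000, and a denominator containing a factor of 5 has no finite binary expansion, whatever the mantissa width.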
My humble opinion is that it would be more profitable to implement IEEE double-precision on the '816. That is not a simple feat, but it is not as onerous as it would be on a 65C02.
I've tried out that one but it's not very speedy. I don't think trying to run a graphics program using FP BCD would work out well. However, being BCD, it does eliminate ASCII-FP conversion errors. 1.2345 ends up being 1.2345.
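To illustrate why BCD sidesteps the conversion error, here is a minimal Python sketch that packs the typed digits two per byte and unpacks them again. The layout (packed digits plus a count of fractional digits) and the function names `to_packed_bcd` and `from_packed_bcd` are assumptions for the example, not the format of any particular BCD package, and only unsigned values are handled.

```python
from decimal import Decimal

def to_packed_bcd(text):
    """Pack an unsigned decimal string into (packed BCD bytes, fractional digit count)."""
    whole, _, frac = text.partition(".")
    digits = whole + frac
    if len(digits) % 2:
        digits = "0" + digits  # pad to an even number of digits
    packed = bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )
    return packed, len(frac)

def from_packed_bcd(packed, frac_digits):
    """Unpack the bytes produced by to_packed_bcd back into a decimal string."""
    digits = "".join(f"{b >> 4}{b & 0x0F}" for b in packed)
    digits = digits.lstrip("0") or "0"
    if frac_digits:
        digits = digits.rjust(frac_digits + 1, "0")
        return digits[:-frac_digits] + "." + digits[-frac_digits:]
    return digits

packed, frac_digits = to_packed_bcd("1.2345")
print(packed.hex(), frac_digits)
print(from_packed_bcd(packed, frac_digits))

# For contrast, a binary double does not hold 1.2345 exactly.
print(Decimal(1.2345) == Decimal("1.2345"))
```

Each decimal digit is stored as itself, so nothing is rounded on the way in or out. The price is the slower digit-at-a-time arithmetic mentioned above.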
