The code is under the MIT license, so feel free to use it or add it to your benchmark.
Thanks. I had a go at optimising your routine (I hope that's OK) by calculating initial upper and lower bounds, which gives a smaller range to search. This improves the performance, but as it stands my sqrt10.a is still a little faster in general, and smaller (no tables).
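To illustrate the idea of narrowing the search range, here is a minimal sketch in Python rather than 65xx assembly. It is not the routine discussed above: it assumes an integer square root found by binary search, and the function name `isqrt_bounded` and the choice of deriving the bounds from the bit length of the input are assumptions made for illustration. If n has b significant bits, then floor(sqrt(n)) lies between 2^((b-1)//2) and 2^((b+1)//2) - 1, so the search can start from that range instead of the full one.

```python
def isqrt_bounded(n: int) -> int:
    """Integer square root by binary search over a pre-narrowed range.

    Hypothetical illustration only; not the routine from the thread.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < 2:
        return n  # sqrt(0) = 0, sqrt(1) = 1

    bits = n.bit_length()
    # n has `bits` significant bits, so 2^(bits-1) <= n < 2^bits.
    # Taking square roots gives these initial bounds on floor(sqrt(n)).
    lo = 1 << ((bits - 1) // 2)
    hi = (1 << ((bits + 1) // 2)) - 1

    # Standard binary search for the largest value whose square is <= n.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo
```

For a 16-bit input the initial range is never wider than a factor of two (for example 128 to 255 when the top bit is set), so the search needs at most about 7 iterations instead of 8. On a 65xx the saving would depend on how cheaply the bounds can be computed, which this sketch does not model.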
Here's the performance of my optimised version of your routine in red:
After briefly struggling with the nuts and bolts, I have tentatively concluded that Bruce's subroutine is the likely destination for my attempts to "integerize" MJM's subroutine. The probability of me being able to out-golf Bruce is low enough to steer me toward spending my increasingly limited attention elsewhere, after congratulating everyone for an entertaining and informative thread. Cheers!
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!