The slowest of my basic math building blocks in Forth is UM/MOD, which takes a 32-bit unsigned dividend and a 16-bit unsigned divisor, and gives a 16-bit quotient and a 16-bit remainder. Any overflow condition, including division by zero, results in both quotient and remainder being FFFF, which is easy to catch and handle as an error condition, or in some cases, even use as-is. This is not a problem, since UM/MOD is unsigned. In my 65816 Forth, UM/MOD takes 59 microseconds @ 16MHz to do 26,274,859/7458 (expressed here in decimal). I haven't measured it yet on the 65c02. Logically, the multiply and especially the add and subtract will be much faster. Beyond that, my Forth words for log, trig, square root, random numbers, multiple precision, etc. are secondaries, as the speed penalty is minimal once you have the basics done as primitives, which are hand-optimized in assembly language. Forth secondaries are of course not processor-specific. Some of my functions like square root are not really optimized for speed; but I think that for my next workbench computer, I'll have large 64K-word look-up tables in memory. If your system can afford the memory requirements, this of course eliminates the speed-versus-accuracy tradeoff, giving you the best of both worlds. What I have not taken the time to figure out yet is whether a 1/X table can be made practical to use for division. [
Edit, years later: What I did for the 1/X table was to make it give 32-bit answers to 16-bit inputs, in order to get full resolution at both ends of the scale. This table is then 256KB, instead of 128KB like most of the other tables in the set. (That's not to say you have to use all 32 bits every time if you don't want to.)]
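To make the division behavior above concrete, here is a minimal sketch in Python (not Forth, and not the actual 65816 primitive). The names um_mod, RECIP and div_by_table are made up for illustration, and the table scaling (floor of 2^32/x, clamped to 32 bits) is only an assumption about how a 32-bit-wide 1/X table might be laid out.

```python
# Sketch only: a Python model of the behavior described in the text,
# not the author's assembly-language primitive.

def um_mod(ud, u):
    """Model of UM/MOD: 32-bit unsigned dividend, 16-bit unsigned divisor.

    Returns (quotient, remainder), each 16 bits wide. If the quotient cannot
    fit in 16 bits (which includes division by zero), both are 0xFFFF.
    """
    ud &= 0xFFFFFFFF
    u &= 0xFFFF
    # The quotient overflows exactly when the high 16 bits of the dividend
    # are >= the divisor; a zero divisor always satisfies this.
    if (ud >> 16) >= u:
        return 0xFFFF, 0xFFFF
    return ud // u, ud % u


# Hypothetical 1/X table: 65536 entries of 32 bits each (256KB as raw data).
# ASSUMED scaling: RECIP[x] = floor(2**32 / x), clamped to 32 bits.
# Entry 0 is only a placeholder, since 1/0 has no value.
RECIP = [0xFFFFFFFF] + [min(0xFFFFFFFF, (1 << 32) // x) for x in range(1, 0x10000)]


def div_by_table(n, d):
    """Divide 16-bit n by 16-bit d (d != 0) with a multiply and a table look-up."""
    q = (n * RECIP[d]) >> 32   # estimate: the true quotient, or one less
    if n - q * d >= d:         # a single correction step removes the off-by-one
        q += 1
    return q


quotient, remainder = um_mod(26274859, 7458)   # the example division from the text
```

With this scaling the table estimate for 16-bit operands is never more than one too low, which is why one compare-and-increment is enough. Whether the multiply plus correction actually beats a real UM/MOD depends on the processor.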
I've collected a few algorithms over the years, and found some really surprising things. For example, there is a way to get square root with no multiplication or division, if you can wait a long, long time for the answer. I think one of the next ones I want to write is an FFT word. [
Edit, years later: Done. It takes the 6502 five seconds to do a 2K complex FFT in 16-bit scaled-integer, without the large tables mentioned above. It took the original IBM PC about a hundred times as long to do one half that size in GWBASIC.] That would really come in handy sometimes on my workbench computer. 16-bit output would be more than adequate since the input will usually be an array of numbers read from 8-bit A/D converters.
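The post does not say which square-root method is meant. One well-known method that fits the description (no multiply, no divide, and slow) is to subtract successive odd numbers and count how many fit. The sketch below is in Python for illustration only, and the name isqrt_by_odds is made up.

```python
def isqrt_by_odds(n):
    """Integer square root using only add, subtract and compare.

    Relies on 1 + 3 + 5 + ... + (2k-1) = k*k, so the number of odd numbers
    that can be subtracted from n is floor(sqrt(n)).
    """
    root = 0
    odd = 1
    while n >= odd:
        n -= odd       # take away the next odd number
        odd += 2
        root += 1      # one more odd number fit
    return root


r = isqrt_by_odds(65535)   # needs 255 passes through the loop
```

The loop runs once per unit of the answer, so a 32-bit input can take up to 65,535 passes, which is where the long wait comes from.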
About 10 years ago I began finding out how little you really need floating-point arithmetic for. FP really slows things down. It's necessary for scientific calculators, but has little use in embedded computers that control machines and processes, take data, even calculate graphics, and so on. For speed, use scaled integers, even for things like trig and log functions. If you really need more resolution, you can usually use multiple-precision scaled integers and avoid FP.
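As a rough illustration of what scaled integers look like in practice, here is a small sketch in Python (again not Forth). The function names are made up; star_slash imitates the idea of Forth's */ word, which multiplies and then divides with a double-width intermediate, and the 1.15 fraction format in q15_mul is just one common choice of scaling, not necessarily the one used in the routines discussed here.

```python
def star_slash(n, num, den):
    """Scale n by the ratio num/den, in the spirit of Forth's */ word.

    In a 16-bit Forth the intermediate product is kept in 32 bits so it
    cannot overflow; Python integers are unbounded, so that comes for free.
    Only non-negative values are considered in this sketch.
    """
    return (n * num) // den


def q15_mul(a, b):
    """Multiply two fractions held as scaled integers where 0x8000 means 1.0."""
    return (a * b) >> 15


# pi is approximated by the ratio 355/113, so no floating point is needed.
circumference = star_slash(10000, 355, 113)   # diameter 10000 units -> 31415

# 0.5 * 0.5 with 0x4000 representing 0.5 gives 0x2000, representing 0.25.
quarter = q15_mul(0x4000, 0x4000)
```

The point is that the scale factor lives in the programmer's head (or the comments) rather than in an exponent field, so every operation is a plain integer multiply, divide or shift.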
Jack Crenshaw has a new book out called "Math Toolkit for Real-Time Programming" (Lawrence, KS: CMP Books, 2000). He has his own website at www.ganssle.com. I do intend to get it myself. [
Edit, years later: Our wonderful computer-science daughter-in-law got it for me. It's definitely a good book.] It's not at all specific to the 6502 family, but has all kinds of algorithms that are well explained without getting any more heavily into theory than necessary. I understand another good book is John Hart's "Computer Approximations" (Malabar, FL: Robert E. Krieger Publishing Company, 1968); but while Crenshaw lays out the welcome mat, Hart's explanations will require a Ph.D. in math to understand. (That means it's not for me!)
As for my own code, you're welcome to it, except that I have a small problem now. It's on the DOS computer I use most, which has neither Windoze nor internet access; and when I put things on floppy disc, this other computer I'm on right now is unable to find and E-mail those files. It swallows and says thank you, but then nothing happens. [
Edit, years later: After more FD problems on newer PCs, I finally got an IDE SD-card interface for the DOS computer, so file transfer between computers is more trouble-free.] If you're interested, I'll have to mail you a floppy or a printout. The routines are explained in detail, but do assume at least a little knowledge of Forth. I hope to get a website going possibly later this year, and this will definitely be on it. [
Edit: I finally got the website going 10+ years later and have the LUT material there and other major features, and have far more I want to post as time permits.]
Edit, 6/25/12: added link to the look-up tables for super fast, accurate 16-bit math. More edits done later.