It was not about getting a kind of checksum but getting a CRC32 computed as fast as possible on the 6502.
That's why I spoke up: to see whether a "lower quality" checksum would be acceptable if it ran much faster. I suspect a CRC32 (or a CRC16, for that matter) will be much slower than mine, which is why I posted my speed for comparison. I haven't done extensive testing with my method, but it appears to have good entropy as long as you feed it at least a few bytes. The digest shows less entropy when given only 1-2 bytes, since it's not a "block digest" like the more popular ones are. But then, I suppose one can't expect much variety in a 16-bit output when you're only feeding it 8 or 16 bits!
I can post my code here if they (or anyone else) are interested. The working goal behind my code is something of a staple of hashing: "toggling any one bit in the input stream should result in approximately half of the bits in the output toggling". Mine seems to do that well, with no detectable patterns being produced.
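
For anyone who wants to check that "half the output bits toggle" property on their own routine, here is a rough sketch of how it could be measured on a PC. It is only a test harness, not my checksum: `avalanche_score` and `standin_hash16` are names made up for this example, and the stand-in hash is just MD5 truncated to 16 bits so the script runs as-is. Swap in whatever 16-bit function you want to test.

```python
import hashlib


def avalanche_score(hash_fn, data, out_bits=16):
    """Mean fraction of output bits that flip when one input bit is toggled.

    hash_fn must take bytes and return an int that fits in out_bits bits.
    A result near 0.5 means roughly half the output bits change per flip.
    """
    if not data:
        raise ValueError("need at least one input byte")
    base = hash_fn(data)
    flipped_total = 0
    trials = 0
    for byte_index in range(len(data)):
        for bit in range(8):
            mutated = bytearray(data)
            mutated[byte_index] ^= 1 << bit          # toggle exactly one input bit
            diff = base ^ hash_fn(bytes(mutated))    # output bits that changed
            flipped_total += bin(diff).count("1")
            trials += 1
    return flipped_total / (trials * out_bits)


def standin_hash16(data):
    # Placeholder only (assumption): MD5 truncated to 16 bits.
    # This is NOT the checksum discussed in this post.
    return int.from_bytes(hashlib.md5(data).digest()[:2], "big")


message = bytes(range(32))
score = avalanche_score(standin_hash16, message)
print(f"mean fraction of output bits flipped: {score:.3f}")
```

A single average hides per-bit patterns, so for a closer look you could also tally how often each individual output bit flips and check that none of them sits far from 50%.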