Mr. Weaver knows more assembly languages than I can name, but it appears that he didn't expend much effort on the 6502 version, for either size or speed. In fact, many of his coding techniques (like the zp_save stuff above) look like a crude translation from a different processor's source. I'm sure that optimization priorities differed from processor to processor, and I have a feeling that the 6502 wasn't the only victim here, but dang, that 6502 source is crammed full of sub-optimal code!
I believe that I could get identical results with identical inputs on an Apple 2-anything in under 800 (780!?!?!) bytes, but I have too much on my plate right now to prove that claim in a timely fashion. I'll just add it to my lengthy to-do list, and get to it when I can. Hopefully, someone awesome here can beat me to the punch, because this challenge should not go unanswered.
Mike B.
[Edit: Important footnote from here:
Quote:
* the 6502 results were adjusted to match the code present in other architectures (i.e., not counting the graphical routines)
There is a lot of low-hanging fruit in the graphics routines, which wouldn't count, but I believe that there is considerable room for improvement elsewhere, which does count.
]