It wasn't terribly well thought-out to be honest, and I'm now second-guessing a lot of how setbits works!
The "setbits" code mostly works by scanning the next n addresses after addr, in case one of the eight bits in that byte represents a multiple of n - in fact we expect this to be true once for each bit because these remainders modulo 30 form a field (I believe - it's decades since I studied maths). When each bit matches up for an address, we mark it as non-prime (in case n<30) and then can quickly add n to the address - adding 30*n to the number we just found - and mark that as non-prime as well, repeating until we reach the end of the sieve. Many of these will already be marked non-prime but some may not be.
For n>30, when we find each bit's next mulitple of n, it will already be marked as non-prime because it is a multiple of a number smaller than n. But the value of finding these multiples is that we can then eliminate the higher multiples later in the sieve. My original test here was really checking whether there actually were any higher multiples within the sieve - in a very generous way - and not bothering with the scan at all if there were not going to be any.
The square root check is much better though, speeding things up by a factor of 2 to about 650ms - I tested against 1024 as it's a nice round number:
Code:
; If n is really large, we don't need to update the sieve
; Compare against 1024 as that's more than the square root of the sieve size
lda zp_n+2 : bne continue
lda #>1024 : cmp zp_n+1 : bcc continue ; inverted test so that carry is clear if we branch
There's also an sbc in the next instruction which should be modified as the carry will now be in the opposite state to what it was before.