The BCC loop1 was correct. I somehow erased the loop1 label. It has been added back. That should fix the skewed lines...
It does correct the problem.
...There are two sections with comments " dx,dy optimization." Try commenting these sections out to see if the sloppiness goes away. Cycles will increase again, but I have another possible option to fix that...
486,231 cycles with your optimizations and 15,787,440 cycles without. Everything looks good!
Two questions about the .b core.
1) Can you branch (BCC, BNE, etc.) beyond 128 bytes now?
...
2) Does the BIT command now place bit15 and bit14 in the N and V flags?
...
1) Everything that used to be 256 bytes on the 6502 is now 65536 bytes on the 65O16.x. So it follows that branches can reach half of that in either direction, as you have pointed out, i.e. 32K.
2) I've never used the BIT command when I've programmed the 6502. These opcodes remain untested in the .b core, but it won't be difficult to put them to the test!
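For anyone who wants to check the numbers, here is a rough Python sketch of both points. It is not taken from the core: the function names are made up for illustration, and the 16-bit BIT behaviour is only an assumption that the core scales the 6502 rule (N from bit 7, V from bit 6) up to bits 15 and 14, which, as said above, is still untested.

```python
def branch_range(width):
    """Signed range of a relative branch operand that is `width` bits wide."""
    half = 1 << (width - 1)
    return -half, half - 1


def branch_target(pc_after, operand, width):
    """Sign-extend `operand` and add it to the address after the branch."""
    operand &= (1 << width) - 1
    if operand & (1 << (width - 1)):  # top bit set means a backward branch
        operand -= 1 << width
    return pc_after + operand


def bit_flags(a, m, width):
    """Return (N, V, Z) for BIT.

    width=8 is the documented 6502 behaviour. width=16 is an ASSUMPTION
    about the .b core (N from bit 15, V from bit 14), not a tested fact.
    """
    mask = (1 << width) - 1
    n = bool(m & (1 << (width - 1)))
    v = bool(m & (1 << (width - 2)))
    z = (a & m & mask) == 0
    return n, v, z


print(branch_range(8))    # 6502 operand
print(branch_range(16))   # 65O16 operand, if it is one 16-bit byte
print(bit_flags(0xFFFF, 0x4000, 16))
```

With an 8-bit operand the range is -128 to +127, and with a 16-bit operand it is -32768 to +32767, which is the 32K figure in either direction. A test program for the core's BIT opcodes could compare the real N, V and Z against what `bit_flags(..., 16)` predicts.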