Just found this page in the NESdev wiki:
Assembly Optimisations
Contents:
1 Optimise both speed and size of the code
1.1 Avoid a jsr + rts chain
1.2 Split word tables in high and low components
1.3 Use Jump tables with RTS instruction instead of JMP indirect instruction
1.4 Use a macro instead of a subroutine which is only called once
1.5 Arithmetic shift right
1.6 Easily test 2 upper bits of a variable
1.7 Negating a value without temporaries
1.8 Avoiding the need for CLC/SEC with ADC/SBC
1.9 Test bits in decreasing order
1.10 Test bits in increasing order
1.11 Test bits without destroying the accumulator
1.12 Use opposite rotate instead of a great number of shifts
2 Optimise speed at the expense of size
2.1 Use identity look-up table instead of temp variable
2.2 Use look-up table to shift left 4 times
3 Optimise code size at the expense of cycles
3.1 Use the stack instead of a temp variable
3.2 Use an "intelligent" argument system
(Via exploration of pages like this)