In response to jgharston, testing NMOS/CMOS/65816 is very easy. It is only complicated if you wish to cover obscure cases. As noted by Chromatix, NMOS is unable to follow BRA and this separates it from everything else. 65816 in emulation mode is only faithful to 65C02 if 65C02 instructions are executed. The extended instructions are always available, although register sizes are restricted in emulation mode.
In response to jeffythedragonslayer (and being needlessly lambasted again by BigDumbDinosaur), and after considering the use of 65816 REP to clear multiple flags simultaneously (decimal, carry, interrupt), I find that REP can also be used to determine execution on 65816. This works in emulation mode or native mode.
SEC // REP #1 will clear carry on 65816 but leave carry set on 65C02. Unfortunately, while BRA ... SEC // REP #1 // BCC ... splits NMOS/CMOS/65816 into three chronological cases, it fails horribly on obscure processor variants, such as 65CE02. In this case, 65CE02 follows BRA. However, REP and SEP are interpreted as direct page, 16-bit decrement or increment (DEW and INW). The immediate value of REP is interpreted as the direct page address of the lower byte of the 16-bit decrement. Thankfully, carry is unaffected. However, DEW $01 may be incompatible with your operating system's allocation of direct page. In this case, REP can be used with higher values. On 65816, it will clear additional flags. On 65CE02, it will decrement at a higher address. Specifically: +$80 to clear negative flag, +$40 to clear overflow flag, +$08 to clear decimal flag. Or you can get fancy with +$04, which clears the interrupt disable flag and therefore re-enables interrupts at the end of an atomic operation. Unfortunately, it is highly inadvisable to use +$20 to set accumulator to 16-bit or +$10 to set index registers to 16-bit because 65816 native mode cannot be assumed.
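For reference, the three-way split described above might be laid out as follows. This is only a sketch: the labels (detect_cpu, cmos_or_816, on_nmos, on_65c02, on_65816) are hypothetical, and it assumes an assembler which accepts 65C02 and 65816 mnemonics. If yours does not, BRA and REP would have to be emitted as raw bytes.

```
; Sketch only. All labels are hypothetical placeholders.
detect_cpu:
        BRA cmos_or_816   ; NMOS does not follow BRA and falls through
        JMP on_nmos       ; case 1: NMOS
cmos_or_816:
        SEC               ; set carry before the probe
        REP #$01          ; 65816 clears carry, 65C02 leaves carry set
        BCC on_65816      ; case 3: carry clear, so 65816
        JMP on_65c02      ; case 2: carry still set, so 65C02
```

RegA and RegX are not touched on any of the three paths. If direct page address $01 is a concern on 65CE02, the immediate value can be raised as described above. For example, REP #$09 would additionally clear the decimal flag on 65816.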
Do not abbreviate cases by omitting BRA. An NMOS/65816 test without BRA seems very concise and convenient but it will cause chaos on 6510 and 6509, where REP #1 is the illegal instruction LAX (dp,X), which loads RegA and RegX simultaneously. This leaves carry unchanged but may use memory location $0001 as the lower byte of a source address. On 6510 and 6509, LAX ($01,X) may strobe a bank latch before possibly strobing I/O. It is quicker and easier to test BRA rather than guard LAX (dp,X). It also preserves RegA and RegX across all three cases.
The obvious 65816 test is to invoke XCE. That is great if the test passes. However, XCE on NMOS becomes the three-byte illegal instruction ISC abs,Y, which is INC followed by SBC. In this case, XCE // NOP // NOP will become ISC $EAEA,Y. Worse, XCE on 65CE02 becomes PLZ (pull top of stack to RegZ). Overcoming that side-effect requires PHA // TSX // CLC // XCE // NOP // NOP // TXS // PLA // BCS. This is longer and slower, requires a two-byte instruction sequence which doubles as a safe address, destroys the contents of RegX, fails to distinguish NMOS/CMOS (which is partly why we do not use PHX/PLX and preserve RegX) and lacks the option to clear flags on 65816. XCE is otherwise a good choice.
Detecting CMOS/65816 with XBA // DEC // XBA is cunning but it doesn't generalize well. XBA on NMOS becomes the three-byte illegal instruction DCP abs,Y, which is DEC followed by CMP. This requires two instances of a two-byte instruction sequence which double as safe addresses. XBA on 65CE02 becomes an abs, 16-bit shift operation.