WDM is also a correct solution (since it is a 2-byte, 2-cycle no-op), but the downside is that a simulator might, for convenience, use opcode $42 to perform an operation such as I/O. An example of a 6502 simulator (rather than a 65C816 simulator) that does this is Easy6502, which uses DCB $42,0 to output a character.
There is a solution that will work on any 65C816 system (regardless of the memory map) and any 65C816 simulator.
This solution involves some hardware.
The '816 outputs the M and X status on its MX pin, so to branch on M or X, the status could be latched in a register and the register then tested with a BIT instruction. Assuming the register presents M on bit 7 and X on bit 6, BPL/BMI can then branch on M and BVC/BVS on X.
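A small Python model of the latch-plus-BIT idea may help. Everything about the latch here (the helper names `latch_value` and `bit_absolute`, M on bit 7 and X on bit 6, and the latch also answering at the following address) is a hypothetical assumption for illustration, not a real board design. One detail the model brings out: when M=0, BIT reads a 16-bit operand and takes N and V from bits 15 and 14, i.e. from the byte at the next address, so the latch would presumably need to be visible there as well.

```python
# Hypothetical model: the latch and its bit layout (M on bit 7, X on bit 6)
# are assumptions for illustration only.

def latch_value(m_flag: int, x_flag: int) -> int:
    """Byte a hypothetical M/X latch would return when read."""
    return (m_flag << 7) | (x_flag << 6)


def bit_absolute(a: int, lo: int, hi: int, m_flag: int) -> dict:
    """N, V and Z as set by BIT with a memory operand on the 65C816.

    With M=1 the operand is the byte at the address (N = bit 7, V = bit 6).
    With M=0 the operand is 16 bits wide (N = bit 15, V = bit 14), so N and V
    come from the byte at the *next* address (`hi`).
    """
    if m_flag:
        operand, n_bit, v_bit, mask = lo, 7, 6, 0xFF
    else:
        operand, n_bit, v_bit, mask = lo | (hi << 8), 15, 14, 0xFFFF
    return {
        "N": (operand >> n_bit) & 1,
        "V": (operand >> v_bit) & 1,
        "Z": int((a & mask & operand) == 0),
    }


if __name__ == "__main__":
    for m in (0, 1):
        for x in (0, 1):
            byte = latch_value(m, x)
            # Assumption: the latch is mirrored at the following address too.
            flags = bit_absolute(a=0, lo=byte, hi=byte, m_flag=m)
            print(f"M={m} X={x}: BMI taken={bool(flags['N'])}, "
                  f"BVS taken={bool(flags['V'])}")
```

With the latch mirrored, N always equals M and V always equals X, whichever accumulator width is in effect when BIT executes.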
My $0.02
Quote: "WDM is also a correct solution (since it is a 2 byte, 2 cycle no-op) ... There is a solution that will work on any 65C816 system (regardless of the memory map) and any 65C816 simulator."
The #0 is the deal breaker here, Ed, because it would be treated as an opcode in the 8-bit [Edit: er, 16-bit] register case. Valid operands for REP or SEP would be %xx0000xx, but I don't see any opcodes in that group that I would consider to be of any help to the problem at hand.
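The %xx0000xx constraint can be enumerated mechanically. The short Python sketch below lists the 16 byte values that leave D, I, X and M (bits 3, 2, 4 and 5 of P) untouched when used as a REP/SEP operand, next to what each byte would do if it were executed as an opcode instead. The mnemonic table is typed in by hand from the 65C816 opcode matrix, so treat it as a convenience rather than a reference.

```python
# REP/SEP operands that leave D, I, X and M alone: bits 5..2 must be zero,
# i.e. the pattern %xx0000xx.
SAFE_MASK = 0b0011_1100

# What each of those bytes means when executed as a 65C816 opcode
# (entered by hand from the opcode matrix).
OPCODE_NAMES = {
    0x00: "BRK", 0x01: "ORA (dp,X)", 0x02: "COP", 0x03: "ORA sr,S",
    0x40: "RTI", 0x41: "EOR (dp,X)", 0x42: "WDM", 0x43: "EOR sr,S",
    0x80: "BRA", 0x81: "STA (dp,X)", 0x82: "BRL", 0x83: "STA sr,S",
    0xC0: "CPY #", 0xC1: "CMP (dp,X)", 0xC2: "REP", 0xC3: "CMP sr,S",
}


def safe_operands() -> list[int]:
    """All byte values matching %xx0000xx."""
    return [b for b in range(256) if (b & SAFE_MASK) == 0]


if __name__ == "__main__":
    for b in safe_operands():
        print(f"${b:02X}  %{b:08b}  {OPCODE_NAMES[b]}")
```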
So, we have White Flame's 18 89 xx 38 (CLC / BIT #xx / SEC), which disturbs ^z and copies ^m to ^c. It takes advantage of the fact that bit# only affects ^z.
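To see why 18 89 xx 38 copies M into C, here is a hypothetical Python decoder (`run` is an illustrative name) that models only CLC ($18), SEC ($38) and BIT # ($89). With an 8-bit accumulator, BIT # takes one operand byte and the $38 is executed as SEC; with a 16-bit accumulator, the $38 is swallowed as the high byte of the operand, so the CLC stands.

```python
def run(m_flag: int, xx: int = 0x00) -> tuple[int, list[str]]:
    """Decode and execute the bytes 18 89 xx 38 with a fixed M flag.

    Returns (final carry, instruction trace). BIT # changes only Z,
    so its flag effect is ignored here.
    """
    code = [0x18, 0x89, xx, 0x38]
    pc, carry, trace = 0, None, []
    while pc < len(code):
        op = code[pc]
        if op == 0x18:  # CLC
            carry = 0
            trace.append("CLC")
            pc += 1
        elif op == 0x38:  # SEC
            carry = 1
            trace.append("SEC")
            pc += 1
        elif op == 0x89:  # BIT #imm: operand width follows M
            width = 1 if m_flag else 2
            value = int.from_bytes(bytes(code[pc + 1:pc + 1 + width]), "little")
            trace.append(f"BIT #${value:0{2 * width}X}")
            pc += 1 + width
        else:
            raise ValueError(f"opcode ${op:02X} not modelled")
    return carry, trace


if __name__ == "__main__":
    for m in (1, 0):
        carry, trace = run(m)
        print(f"M={m}: {' / '.join(trace)} -> C={carry}")
```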
But what about ^x? From what I can tell, our best chance at a short solution makes use of cpx# or cpy#, but they both mess with ^nzc. If we had sev, the job would be done in four bytes, but we don't, so ... I'm mentally trapped inside that box.
Mike B.
Last edited by barrym95838 on Tue May 08, 2018 10:49 pm, edited 1 time in total.
Yeah, it has to revolve around CPX#/CPY#. LDX#/LDY# are obviously too destructive. Since using the stack is allowed, another effect of the x flag is that PHX moves S by 2 bytes instead of 1, but I think that's also too big. N and Z generally require loading a register, which again is not desired, so I think C really is what's needed for branching. CPX #0 always sets the carry, so that really points to it being half of the solution, but it might end up being the wrong tree to bark up. dclxvi did say that his solution was faster than BIT, so I don't think it's going to be more single-byte instructions, but rather a pair of 2-byte instructions for the 8-bit case.
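The CPX #0 observation can be made concrete. In this sketch, `cpx_immediate` is a hypothetical helper modelling only CPX # ($E0): its operand width follows the X flag, and, as with CMP, C is set when the register is greater than or equal to the operand (unsigned), so a zero operand always sets C. The bytes E0 00 38 show the catch: with 8-bit index registers they are CPX #$00 (carry set) followed by $38 as an opcode, but with 16-bit index registers the $38 becomes the operand's high byte and the carry depends on X.

```python
def cpx_immediate(x_reg: int, x_flag: int, code: bytes) -> tuple[int, int]:
    """Return (instruction length, carry) for a CPX # at the start of code.

    X=1 (8-bit index registers): 1-byte operand, compares the low byte.
    X=0 (16-bit index registers): 2-byte little-endian operand.
    Carry = register >= operand (unsigned), i.e. no borrow.
    """
    if code[0] != 0xE0:
        raise ValueError("expected CPX # ($E0)")
    width = 1 if x_flag else 2
    operand = int.from_bytes(code[1:1 + width], "little")
    value = x_reg & (0xFF if x_flag else 0xFFFF)
    return 1 + width, int(value >= operand)


if __name__ == "__main__":
    code = bytes([0xE0, 0x00, 0x38])
    for x_flag in (1, 0):
        length, carry = cpx_immediate(0x0000, x_flag, code)
        print(f"X={x_flag}: length={length}, C={carry}")
```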
After reading White Flame's concise work, it occurred to me that testing two mode bits and branching on each test requires 10 bytes. It also occurred to me that overlapping instructions may save a byte or more. Working backwards, on approximately the fifth attempt, I found that A9 00 A2 A2 AA 8A decodes four different ways, depending on the M and X flags:
M=1, X=1: LDA #0 // LDX #$A2 // TAX // TXA
M=1, X=0: LDA #0 // LDX #$AAA2 // TXA
M=0, X=1: LDA #$A200 // LDX #$AA // TXA
M=0, X=0: LDA #$A200 // LDX #$8AAA
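The accumulator is zero only in the legacy case (M=1, X=1: 8-bit accumulator and index registers), so only there does the final instruction leave Z set for a BEQ. Including the branch, this requires eight bytes. Using accepted techniques, it may be possible to remove another byte or two from this sequence.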
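The four decodings can be checked mechanically. The Python sketch below is a hypothetical mini-simulator (`run` and `CODE` are illustrative names) that models only the four opcodes in the sequence: LDA # ($A9), LDX # ($A2), TAX ($AA) and TXA ($8A). It steps through A9 00 A2 A2 AA 8A under each M/X combination and reports the trace, the final 16-bit accumulator, and the Z flag that a following BEQ would test. The starting accumulator value ($1234) is arbitrary; it only shows that an 8-bit LDA leaves the high byte alone.

```python
CODE = bytes([0xA9, 0x00, 0xA2, 0xA2, 0xAA, 0x8A])


def run(code: bytes, m_flag: int, x_flag: int, a_init: int = 0x1234):
    """Decode and execute code with fixed M/X flags.

    Returns (trace, final 16-bit accumulator C, final Z flag).
    """
    a, x, z = a_init, 0, None
    a_mask = 0xFF if m_flag else 0xFFFF
    x_mask = 0xFF if x_flag else 0xFFFF
    trace, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op in (0xA9, 0xA2):  # LDA # / LDX #: width follows M / X
            width = 1 if (m_flag if op == 0xA9 else x_flag) else 2
            if pc + 1 + width > len(code):
                raise ValueError("operand runs past the end of the code")
            value = int.from_bytes(code[pc + 1:pc + 1 + width], "little")
            trace.append(f"{'LDA' if op == 0xA9 else 'LDX'} #${value:0{2 * width}X}")
            if op == 0xA9:
                a = (a & ~a_mask & 0xFFFF) | value  # 8-bit LDA keeps the high byte
            else:
                x = value
            z = int(value == 0)
            pc += 1 + width
        elif op == 0xAA:  # TAX: width follows X
            x = a & x_mask
            z = int(x == 0)
            trace.append("TAX")
            pc += 1
        elif op == 0x8A:  # TXA: width follows M
            a = (a & ~a_mask & 0xFFFF) | (x & a_mask)
            z = int((a & a_mask) == 0)
            trace.append("TXA")
            pc += 1
        else:
            raise ValueError(f"opcode ${op:02X} not modelled")
    return trace, a, z


if __name__ == "__main__":
    for m in (1, 0):
        for x in (1, 0):
            trace, a, z = run(CODE, m, x)
            print(f"M={m} X={x}: {' // '.join(trace):<36} Z={z}")
```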
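PHP          ; push P; its M and X bits are now at 1,S
LDA ##$EA00  ; A9 00 EA: 8-bit A gets $00 and $EA runs as NOP; 16-bit A gets $EA00
EOR ##$EA20  ; for M bit, or $EA10 to test X bit; A is now $20 (8-bit) or $0020 (16-bit)
AND 1,S      ; AND with the pushed P: zero iff the tested bit is clear
BEQ a16      ; or BNE a8 (the code at the branch target must also PLP)
PLP          ; rebalance the stack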
This is 12 bytes, counting the PLP that rebalances the stack afterwards. The main problem addressed is how to get a known 8-bit constant into the accumulator when the M bit is initially unknown. Another approach is to force the M bit to the favourable state:
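PHA          ; save A (pushes 1 or 2 bytes, depending on M)
PHP          ; save P; its M and X bits are now at 1,S
SEP #$20     ; 8-bit accumulator
LDA 1,S      ; A = pushed P
BIT #$20     ; or $10; Z set iff the tested bit is clear
BEQ a16      ; or BNE a8 (the code at the branch target must also PLP and PLA)
PLP          ; restores the original M, so that...
PLA          ; ...PLA pulls the same number of bytes PHA pushed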
This is also 12 bytes, including restoring the status and accumulator afterwards. It could be 10 bytes if you don't care about the accumulator value, but then you might as well use LDA ##$EA00 followed by a branch, which gives a 5-byte test of M without touching the stack.
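The $EA trick used above can be checked with a small Python model. It is a sketch only: it models just LDA # ($A9), EOR # ($49) and NOP ($EA), plus the `AND 1,S` step, and the helper names (`run`, `and_pushed_p`) and the starting accumulator value are made up for illustration. It shows that LDA ##$EA00 on its own sets Z only with an 8-bit accumulator (the $EA then executes as NOP, which leaves the flags alone), and that after EOR ##$EA20 the accumulator holds $20 or $0020 in either width, so AND 1,S isolates the pushed M bit.

```python
def run(code: bytes, m_flag: int, a_init: int = 0xFFFF) -> tuple[int, int]:
    """Execute LDA #, EOR # and NOP with a fixed M flag.

    Returns (16-bit accumulator C, Z). With M=1 only the low byte changes.
    """
    mask = 0xFF if m_flag else 0xFFFF
    width = 1 if m_flag else 2
    a, z, pc = a_init, None, 0
    while pc < len(code):
        op = code[pc]
        if op == 0xEA:  # NOP: flags unchanged
            pc += 1
        elif op in (0xA9, 0x49):  # LDA # / EOR #: width follows M
            value = int.from_bytes(code[pc + 1:pc + 1 + width], "little")
            result = value if op == 0xA9 else (a & mask) ^ value
            a = (a & ~mask & 0xFFFF) | result
            z = int(result == 0)
            pc += 1 + width
        else:
            raise ValueError(f"opcode ${op:02X} not modelled")
    return a, z


def and_pushed_p(a: int, m_flag: int, p: int, above: int) -> int:
    """Z after AND 1,S, where 1,S holds the pushed P and 2,S holds `above`."""
    if m_flag:
        return int((a & 0xFF & p) == 0)
    return int((a & (p | (above << 8))) == 0)


LDA_EA00 = bytes([0xA9, 0x00, 0xEA])                     # LDA ##$EA00
LDA_EOR_M = bytes([0xA9, 0x00, 0xEA, 0x49, 0x20, 0xEA])  # ... EOR ##$EA20
```

For example, `run(LDA_EA00, 1)` gives Z=1 while `run(LDA_EA00, 0)` gives Z=0, which is the 5-byte test when followed by a branch.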