Since I've now got my 65020 running on an FPGA, I thought I'd have a go. It could definitely be improved. 46 ... somethings. I'm still not sure what counts as a 'byte' on this system.
Code:
0000e99c:                 tree
0000e99c: 00a2 000f           ldr x0, #15           ; nibble mask
0000e99e: 20a2 0014           ldr x1, #20           ; centre column of the tree
0000e9a0: 40a2 0020           ldr x2, #32           ; space
0000e9a2: 60a2 002a           ldr x3, #'*
0000e9a4: 20a0 000d           ldr y1, #13           ; carriage return
0000e9a6: 02a0 4221 0643      ldr.l y0, #$06424321  ; top rows, one nibble per row
0000e9a9: 90f0 0003           bra.l branches        ; call: draw the top rows
0000e9ab: 02a0 2d48 02a7      ldr.l y0, #$022da748  ; bottom rows and trunk, then fall through
0000e9ae:                 branches
0000e9ae: d0a8                mov a2, y0
0000e9af: 503c                and a2, x0            ; a2 = next nibble n
0000e9b0: 00f0 0017           beq done              ; a zero nibble ends the table
0000e9b2: 248a                mov a1, x1
0000e9b3: a8fc                sub a1, a2            ; 20 - n leading spaces
0000e9b4: 088a                mov a0, x2
0000e9b5: 90f0 000d           bra.l row

0000e9b7: c87c                add a2, a2
0000e9b8: 18ca                dec a2                ; 2n - 1 stars
0000e9b9: a8a8                mov a1, a2
0000e9ba: 0c8a                mov a0, x3
0000e9bb: 90f0 0007           bra.l row

0000e9bd: 94a8                mov a0, y1
0000e9be: 90f0 07a9           bra.l chrout          ; end the line
0000e9c0: 2242 0004           lsr.l y0, #4          ; move on to the next nibble
0000e9c2: 80f0 00ea           bra branches
0000e9c4:                 row
0000e9c4: 90f0 07a3           bra.l chrout          ; print a0, a1 times
0000e9c6: 14ca                dec a1
0000e9c7: 00d0 00fb           bne row
0000e9c9:                 done
0000e9c9: 0060                rts
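For anyone who doesn't want to trace the listing by hand, here is a rough Python sketch of what the routine appears to do, read off the listing rather than run on real hardware. Each nibble n of the two 32-bit constants becomes one row of 20 - n spaces followed by 2n - 1 stars, and a zero nibble ends the table. The function name `tree_rows` and its parameters are made up for illustration, and the rows are joined with newlines here where the original sends character 13 to chrout.

```python
def tree_rows(tables=(0x06424321, 0x022DA748), centre=20):
    """Yield the text rows the listing appears to print (hypothetical helper)."""
    for table in tables:
        while table & 15:                # beq done: a zero nibble ends the table
            n = table & 15               # and a2, x0
            spaces = centre - n          # sub a1, a2
            stars = 2 * n - 1            # add a2, a2 / dec a2
            yield " " * spaces + "*" * stars
            table >>= 4                  # lsr.l y0, #4


if __name__ == "__main__":
    print("\n".join(tree_rows()))
```

The first constant draws the upper tiers, and the second draws the wider lower tiers plus a two-row trunk.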
It's a 6502 with 16 bit 'bytes'. If the top half of the opcode is all 0, it behaves like the original 6502 instruction. The extra bits are used to select data size, data register, index register, and alternate operations (for example, turning ADC into ADD without carry). I've renamed many of the instructions, so it looks a lot less 6502ish than it really is (although having lots of registers and register-to-register operations goes the other way, making it feel even less like a 6502).
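The "top half all 0" rule can be checked against a few opcode words from the listing. The sketch below only splits each word into its top and bottom halves and looks the bottom half up in a small table of standard 6502 opcodes. It assumes the listing prints each 16-bit word most significant half first, and it makes no attempt to say what the individual extension bits mean. `split_opcode` and `describe` are names invented for this example.

```python
# A few standard 6502 opcodes that turn up in the bottom half of listing words.
BASE_6502 = {
    0xA2: "LDX #imm",
    0xA0: "LDY #imm",
    0xF0: "BEQ",
    0xD0: "BNE",
    0x60: "RTS",
}


def split_opcode(word):
    """Return (top half, bottom half) of a 16-bit opcode word."""
    return word >> 8, word & 0xFF


def describe(word):
    ext, base = split_opcode(word)
    name = BASE_6502.get(base, "?")
    kind = "plain 6502" if ext == 0 else f"extended, top half ${ext:02x}"
    return f"{word:04x}: {name} ({kind})"


if __name__ == "__main__":
    # Words taken from the listing: ldr x0..x3, beq, bra.l, bra, rts.
    for w in (0x00A2, 0x20A2, 0x40A2, 0x60A2, 0x00F0, 0x90F0, 0x80F0, 0x0060):
        print(describe(w))
```

Under that reading, `ldr x0, #15` is exactly the old LDX immediate, and `bra` / `bra.l` are BEQ with extra bits set in the top half.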
'LDR' is "Load Register", and includes the old LDA, LDX, and LDY. It can load any of the 12 general-purpose registers A0-A3, X0-X3, and Y0-Y3. An optional suffix on the instruction controls the size of the data, with the default being 8 bits. LDR.L is a 32-bit load.
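One detail visible in the listing is how the two LDR.L immediates are laid out: `#$06424321` is stored as the words `4221 0643`, and `#$022da748` as `2d48 02a7`. That fits a layout where the first operand word carries bytes 0 and 2 of the value and the second carries bytes 1 and 3. This is inferred from those two lines only, not from any documentation, and the helper names below are hypothetical.

```python
def unpack_imm32(w0, w1):
    """Rebuild a 32-bit immediate from two 16-bit operand words (assumed layout)."""
    b0, b2 = w0 & 0xFF, w0 >> 8      # first word: byte 0 low, byte 2 high
    b1, b3 = w1 & 0xFF, w1 >> 8      # second word: byte 1 low, byte 3 high
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)


def pack_imm32(value):
    """Inverse of unpack_imm32: split a 32-bit value into two operand words."""
    b0, b1, b2, b3 = (value >> shift & 0xFF for shift in (0, 8, 16, 24))
    return b0 | (b2 << 8), b1 | (b3 << 8)


if __name__ == "__main__":
    print(hex(unpack_imm32(0x4221, 0x0643)))
    print(hex(unpack_imm32(0x2D48, 0x02A7)))
```

If the layout is right, a 16-bit load would use just the bottom halves of the same two words, as on the original 6502, with the top halves holding the extra bytes of a 32-bit value.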
One of the extra bits on the branch instructions is used to select a different set of conditions, and one of those conditions is 'always'. Another bit (signalled by .L on the instruction) makes it push the current PC before branching. That gives an alternative to JSR with short (16-bit) offsets.
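The branch operands in the listing can be checked with a little arithmetic. The sketch below assumes each branch takes two words (opcode plus operand) and that the offset is relative to the address of the following instruction. Going by the listing, the .L branches work out with a 16-bit offset, while the plain `bra` and `bne` only land on their labels if the bottom 8 bits of the operand are treated as a signed offset, as on the original 6502. All of this is inferred from the listing, and `branch_target` is a made-up helper.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def branch_target(addr, operand, offset_bits):
    """Target of a two-word branch at addr (assumed: relative to the next instruction)."""
    next_pc = addr + 2
    return next_pc + sign_extend(operand, offset_bits)


if __name__ == "__main__":
    print(hex(branch_target(0xE9A9, 0x0003, 16)))   # bra.l branches
    print(hex(branch_target(0xE9C2, 0x00EA, 8)))    # bra branches
    print(hex(branch_target(0xE9C7, 0x00FB, 8)))    # bne row
    print(hex(branch_target(0xE9BE, 0x07A9, 16)))   # bra.l chrout
```

As a consistency check, both `bra.l chrout` lines resolve to the same address under these assumptions.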
As there are lots more registers than on the 6502, I've added register-to-register versions of the instructions that could use them. The various TAX, TXS, etc. instructions have all been renamed MOV, and extended to allow any register to be copied to any other.