The answer is somewhere in between. Our backend doesn't have anything resembling real 6502 assembly until very late in the compilation process, one or two passes before the final machine code is generated. All of the compiler's work happens on internal representations that only gradually come to resemble 6502 assembly. So we can teach LLVM generic patterns, but it's difficult to teach it flat instruction sequences.
WARNING: A way-too-detailed look at what LLVM is doing follows.
The code that LLVM sees starts nearly as vague and high level as C; each LLVM pass either makes the sequence faster or makes the sequence more specific.
The advantage of doing things this way is that you avoid needing to go back and correct poor decisions later; instead, the compiler tries to avoid making decisions at all until the point where those decisions are relatively clear. This can't be done perfectly, but there are also hard limits to how many bad decisions you can clean up at the end.
For example, the DEC function under discussion starts out as:
Code:
define i32 @dec_i32(i32 %a) {
entry:
%0 = sub i32 %a, 1
ret i32 %0
}
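First, the calling convention is implemented: the 32-bit argument arrives as four bytes in $a, $x, $rc2, and $rc3, and the result goes back out the same way: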
Code:
liveins: $a, $x, $rc2, $rc3
%1:_(s8) = COPY $a
%2:_(s8) = COPY $x
%3:_(s8) = COPY $rc2
%4:_(s8) = COPY $rc3
%0:_(s32) = G_MERGE_VALUES %1(s8), %2(s8), %3(s8), %4(s8)
%5:_(s32) = G_CONSTANT i32 1
%6:_(s32) = G_SUB %0, %5
%7:_(s8), %8:_(s8), %9:_(s8), %10:_(s8) = G_UNMERGE_VALUES %6(s32)
$a = COPY %7(s8)
$x = COPY %8(s8)
$rc2 = COPY %9(s8)
$rc3 = COPY %10(s8)
RTS implicit $a, implicit $x, implicit $rc2, implicit $rc3
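To make the merge/unmerge step concrete, here is a small Python model of the dump above. It assumes the operands of G_MERGE_VALUES are ordered from least to most significant byte, so $a carries the low byte and $rc3 the high byte. The function names are made up for illustration and are not part of LLVM.

```python
def merge_values(b0, b1, b2, b3):
    """Model of G_MERGE_VALUES: four s8 values -> one s32 (b0 assumed to be the low byte)."""
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)


def unmerge_values(value):
    """Model of G_UNMERGE_VALUES: one s32 -> four s8 values, low byte first."""
    return tuple((value >> shift) & 0xFF for shift in (0, 8, 16, 24))


def dec_i32_abi(a, x, rc2, rc3):
    """Take the argument in $a/$x/$rc2/$rc3, subtract 1, return it in the same registers."""
    value = merge_values(a, x, rc2, rc3)
    result = (value - 1) & 0xFFFFFFFF  # G_SUB on s32 wraps modulo 2**32
    return unmerge_values(result)


# 0x00010000 - 1 = 0x0000FFFF
print(dec_i32_abi(0x00, 0x00, 0x01, 0x00))  # prints (255, 255, 0, 0)
```

Nothing here is 6502-specific yet: the subtraction is still a single 32-bit operation, which is exactly what the next step has to break up.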
Now, there is an operation in here that you can't do natively on the 6502: the 32-bit subtraction. Operations like this are replaced with sequences that can plausibly be performed on the 6502 (legalization):
Code:
bb.1.entry:
liveins: $a, $x, $rc2, $rc3
%1:_(s8) = COPY $a
%2:_(s8) = COPY $x
%3:_(s8) = COPY $rc2
%4:_(s8) = COPY $rc3
%16:_(s8) = G_CONSTANT i8 1
%17:_(s8) = G_SUB %1, %16
%18:_(s8) = G_CONSTANT i8 0
%64:_(s1) = G_CONSTANT i1 true
%75:_(s8), %76:_(s1), %77:_, %78:_, %79:_ = G_SBC %1, %18, %64
%19:_(s1) = COPY %79(s1)
%45:_(s8) = G_SUB %2, %16
%70:_(s8), %71:_(s1), %72:_, %73:_, %74:_ = G_SBC %2, %18, %64
%46:_(s1) = COPY %74(s1)
%60:_(s8) = G_SUB %3, %16
%65:_(s8), %66:_(s1), %67:_, %68:_, %69:_ = G_SBC %3, %18, %64
%61:_(s1) = COPY %69(s1)
%62:_(s8) = G_SUB %4, %16
%63:_(s8) = G_SELECT %61(s1), %62, %4
%56:_(s8) = G_SELECT %46(s1), %60, %3
%57:_(s8) = G_SELECT %46(s1), %63, %4
%37:_(s8) = G_SELECT %19(s1), %45, %2
%38:_(s8) = G_SELECT %19(s1), %56, %3
%39:_(s8) = G_SELECT %19(s1), %57, %4
$a = COPY %17(s8)
$x = COPY %37(s8)
$rc2 = COPY %38(s8)
$rc3 = COPY %39(s8)
RTS implicit $a, implicit $x, implicit $rc2, implicit $rc3
This is the pattern that I mentioned for decrementing: always decrement the low byte, and decrement each higher byte if and only if all of the bytes below it were previously zero.
G_SELECT is like a C ternary: G_SELECT %61, %62, %4 = %62 if %61, otherwise %4.
G_SBC is a generalized SBC operation: it provides an output and all the possible flags, and takes two arguments and a carry in. Depending on what's used, this covers the behavior of the various comparison and subtraction instructions.
Note that at this phase, we haven't picked instructions or decided which registers things are going to go into; nothing like that has happened at all. Only the basic plan of how it's going to decrement exists.
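As a sanity check on the legalized form, here is a rough Python model of it. It assumes the five G_SBC outputs are result, carry, N, V, Z in that order (inferred from the last output being used as the "was zero" condition) and that the arithmetic is binary-mode 6502 SBC. The Python function names are hypothetical; the comments map each value back to the virtual register in the dump.

```python
def g_sbc(lhs, rhs, carry_in):
    """6502-style subtract with carry: lhs - rhs - (1 - carry_in).

    Returns (result, carry_out, n, v, z); this output order is an assumption.
    """
    diff = lhs - rhs - (1 - int(carry_in))
    result = diff & 0xFF
    carry_out = diff >= 0  # carry set means "no borrow"
    n = bool(result & 0x80)
    v = bool((lhs ^ rhs) & (lhs ^ result) & 0x80)
    z = result == 0
    return result, carry_out, n, v, z


def g_sub(lhs, rhs):
    return (lhs - rhs) & 0xFF


def g_select(cond, if_true, if_false):
    return if_true if cond else if_false


def dec32_legalized(b0, b1, b2, b3):
    """Mirror of the legalized dump; b0..b3 are %1..%4, low byte first."""
    r0 = g_sub(b0, 1)               # %17
    z0 = g_sbc(b0, 0, True)[4]      # %19: was the low byte zero?
    d1 = g_sub(b1, 1)               # %45
    z1 = g_sbc(b1, 0, True)[4]      # %46
    d2 = g_sub(b2, 1)               # %60
    z2 = g_sbc(b2, 0, True)[4]      # %61
    d3 = g_sub(b3, 1)               # %62
    s63 = g_select(z2, d3, b3)      # %63
    s56 = g_select(z1, d2, b2)      # %56
    s57 = g_select(z1, s63, b3)     # %57
    r1 = g_select(z0, d1, b1)       # %37
    r2 = g_select(z0, s56, b2)      # %38
    r3 = g_select(z0, s57, b3)      # %39
    return r0, r1, r2, r3


print(dec32_legalized(0x00, 0x56, 0x34, 0x12))  # prints (255, 85, 52, 18)
```

Every decrement and every select is computed unconditionally here, just like in the dump; only the selects decide which values survive.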
From there, we lower the G_SELECT instructions to (still totally generic) branches.
EDIT: Surprisingly, the lowering below looks alright to me. I'm either missing something, or the problem actually lies elsewhere...
Code:
bb.1.entry:
successors: %bb.2(0x40000000), %bb.3(0x40000000)
liveins: $a, $x, $rc2, $rc3
%1:_(s8) = COPY $a
%2:_(s8) = COPY $x
%3:_(s8) = COPY $rc2
%4:_(s8) = COPY $rc3
%16:_(s8) = G_CONSTANT i8 1
%17:_(s8) = G_SUB %1, %16
%18:_(s8) = G_CONSTANT i8 0
%64:_(s1) = G_CONSTANT i1 true
%75:_(s8), %76:_(s1), %77:_, %78:_, %79:_ = G_SBC %1, %18, %64
G_BRCOND_IMM %79(s1), %bb.2, 1
G_BR %bb.3
bb.2.entry:
successors: %bb.5(0x40000000), %bb.6(0x40000000)
%45:_(s8) = G_SUB %2, %16
%70:_(s8), %71:_(s1), %72:_, %73:_, %74:_ = G_SBC %2, %18, %64
G_BRCOND_IMM %74(s1), %bb.5, 1
G_BR %bb.6
bb.5.entry:
successors: %bb.8(0x40000000), %bb.9(0x40000000)
%60:_(s8) = G_SUB %3, %16
%65:_(s8), %66:_(s1), %67:_, %68:_, %69:_ = G_SBC %3, %18, %64
G_BRCOND_IMM %69(s1), %bb.8, 1
G_BR %bb.9
bb.8.entry:
successors: %bb.10(0x80000000)
%62:_(s8) = G_SUB %4, %16
G_BR %bb.10
bb.9.entry:
successors: %bb.10(0x80000000)
G_BR %bb.10
bb.10.entry:
successors: %bb.7(0x80000000)
%63:_(s8) = G_PHI %62(s8), %bb.8, %4(s8), %bb.9
G_BR %bb.7
bb.6.entry:
successors: %bb.7(0x80000000)
G_BR %bb.7
bb.7.entry:
successors: %bb.4(0x80000000)
%57:_(s8) = G_PHI %63(s8), %bb.10, %4(s8), %bb.6
%56:_(s8) = G_PHI %60(s8), %bb.10, %3(s8), %bb.6
G_BR %bb.4
bb.3.entry:
successors: %bb.4(0x80000000)
G_BR %bb.4
bb.4.entry:
%39:_(s8) = G_PHI %57(s8), %bb.7, %4(s8), %bb.3
%38:_(s8) = G_PHI %56(s8), %bb.7, %3(s8), %bb.3
%37:_(s8) = G_PHI %45(s8), %bb.7, %2(s8), %bb.3
$a = COPY %17(s8)
$x = COPY %37(s8)
$rc2 = COPY %38(s8)
$rc3 = COPY %39(s8)
RTS implicit $a, implicit $x, implicit $rc2, implicit $rc3
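The block structure above is easier to follow as nested conditionals. The sketch below is a hand translation into Python, assuming that `G_BRCOND_IMM %c, %bb.N, 1` means "branch to bb.N when %c is 1" and that each G_PHI simply picks the value from whichever predecessor was taken. The function name is made up for illustration.

```python
def dec32_branches(b0, b1, b2, b3):
    """Control-flow version of the dump; b0..b3 are the bytes in $a, $x, $rc2, $rc3."""
    r0 = (b0 - 1) & 0xFF        # bb.1: the low byte is always decremented (%17)
    r1, r2, r3 = b1, b2, b3     # what the PHIs pick when a block is skipped
    if b0 == 0:                 # bb.1 -> bb.2
        r1 = (b1 - 1) & 0xFF    # %45
        if b1 == 0:             # bb.2 -> bb.5
            r2 = (b2 - 1) & 0xFF    # %60
            if b2 == 0:         # bb.5 -> bb.8
                r3 = (b3 - 1) & 0xFF    # %62
    return r0, r1, r2, r3


print(dec32_branches(0, 0, 5, 9))  # prints (255, 255, 4, 9)
```

Unlike the select form, a higher byte's decrement is now only computed on the path where it is actually needed.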
From this, we select idealized versions of the 6502 instructions to use. These still aren't actual 6502 opcodes, but they capture the kinds of constraints that exist on the 6502 about where you can put things: for example, you can only natively compare when the LHS is A, X, or Y (not an imaginary register), and the RHS has to be either an immediate or an imaginary register.
Code:
bb.1.entry:
successors: %bb.2(0x40000000), %bb.3(0x40000000)
liveins: $a, $x, $rc2, $rc3
%1:gpr = COPY $a
%2:gpr = COPY $x
%3:gpr = COPY $rc2
%4:gprimag8 = COPY $rc3
%17:gprimag8 = DEC %1
%89:cc = CMPImmTerm %1, 0, implicit-def $nz
BR %bb.2, $z, 1
JMP %bb.3
bb.2.entry:
successors: %bb.5(0x40000000), %bb.6(0x40000000)
%45:gprimag8 = DEC %2
%88:cc = CMPImmTerm %2, 0, implicit-def $nz
BR %bb.5, $z, 1
JMP %bb.6
bb.5.entry:
successors: %bb.8(0x40000000), %bb.9(0x40000000)
%60:gprimag8 = DEC %3
%87:cc = CMPImmTerm %3, 0, implicit-def $nz
BR %bb.8, $z, 1
JMP %bb.9
bb.8.entry:
successors: %bb.10(0x80000000)
%62:gprimag8 = DEC %4
JMP %bb.10
bb.9.entry:
successors: %bb.10(0x80000000)
JMP %bb.10
bb.10.entry:
successors: %bb.7(0x80000000)
%63:anyi8 = PHI %62, %bb.8, %4, %bb.9
JMP %bb.7
bb.6.entry:
successors: %bb.7(0x80000000)
JMP %bb.7
bb.7.entry:
successors: %bb.4(0x80000000)
%57:anyi8 = PHI %63, %bb.10, %4, %bb.6
%56:anyi8 = PHI %60, %bb.10, %3, %bb.6
JMP %bb.4
bb.3.entry:
successors: %bb.4(0x80000000)
JMP %bb.4
bb.4.entry:
%39:anyi8 = PHI %57, %bb.7, %4, %bb.3
%38:anyi8 = PHI %56, %bb.7, %3, %bb.3
%37:anyi8 = PHI %45, %bb.7, %2, %bb.3
$a = COPY %17
$x = COPY %37
$rc2 = COPY %38
$rc3 = COPY %39
RTS implicit $a, implicit $x, implicit $rc2, implicit $rc3
Note that at this stage, LLVM still hasn't decided where to put anything! DEC instructions are generalized: they can operate on A, X, Y, or any ZP register. The A variant uses CLC and ADC, and will later use DEC A on the 65C02.
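For anyone wondering how an add can decrement A: adding $FF with carry clear is the same as subtracting 1 modulo 256. The operand $FF is an inference here (only CLC and ADC are named above), and the Python helpers are purely illustrative.

```python
def adc(a, operand, carry_in):
    """6502 ADC in binary mode: returns (result, carry_out)."""
    total = a + operand + carry_in
    return total & 0xFF, total > 0xFF


def dec_a(a):
    """Decrement A via CLC; ADC #$FF (the operand is assumed, not stated above)."""
    result, _carry = adc(a, 0xFF, 0)  # CLC clears carry before the add
    return result


# Holds for every possible value of A, including the wrap from 0 to 255.
assert all(dec_a(a) == (a - 1) & 0xFF for a in range(256))
```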
And then, much later, after a ton of additional manipulations, LLVM allocates registers and decides where to put everything:
Code:
bb.0.entry:
successors: %bb.1(0x40000000), %bb.6(0x40000000)
liveins: $a, $x, $rc2, $rc3
renamable $y = COPY $x
renamable $x = COPY renamable $a
renamable $x = DEC killed renamable $x
renamable $rc4 = COPY killed renamable $x
dead renamable $c = CMPImmTerm killed renamable $a, 0, implicit-def dead $nz, implicit-def $z
BR %bb.1, killed $z, 1
JMP %bb.6
bb.1.entry:
successors: %bb.2(0x40000000), %bb.5(0x40000000)
liveins: $y, $rc2, $rc3, $rc4
renamable $x = COPY renamable $y
renamable $x = DEC killed renamable $x
dead renamable $c = CMPImmTerm killed renamable $y, 0, implicit-def dead $nz, implicit-def $z
BR %bb.2, killed $z, 1
JMP %bb.5
bb.2.entry:
successors: %bb.3(0x40000000), %bb.4(0x40000000)
liveins: $x, $rc2, $rc3, $rc4
renamable $y = COPY killed renamable $rc2
renamable $rc2 = COPY renamable $y
renamable $rc2 = DEC killed renamable $rc2
renamable $a = COPY killed renamable $rc4
dead renamable $c = CMPImmTerm killed renamable $y, 0, implicit-def dead $nz, implicit-def $z
BR %bb.3, killed $z, 1
JMP %bb.4
bb.3.entry:
successors: %bb.7(0x80000000)
liveins: $a, $x, $rc2, $rc3
renamable $rc3 = DEC killed renamable $rc3
JMP %bb.7
bb.4.entry:
successors: %bb.7(0x80000000)
liveins: $a, $x, $rc2, $rc3
JMP %bb.7
bb.5.entry:
successors: %bb.7(0x80000000)
liveins: $x, $rc2, $rc3, $rc4
renamable $a = COPY killed renamable $rc4
JMP %bb.7
bb.6.entry:
successors: %bb.7(0x80000000)
liveins: $y, $rc2, $rc3, $rc4
renamable $x = COPY killed renamable $y
renamable $a = COPY killed renamable $rc4
JMP %bb.7
bb.7.entry:
liveins: $a, $x, $rc2, $rc3
RTS implicit $a, implicit $x, implicit $rc2, implicit $rc3
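One way to convince yourself that the allocated code still computes a 32-bit decrement is to step through it by hand. The Python below is a block-by-block transcription of the dump above, assuming `BR %bb.N, $z, 1` means "branch if Z is set" and that CMPImmTerm against 0 sets Z exactly when the register is zero. The function names are hypothetical.

```python
def _dec(value):
    """8-bit decrement with wraparound."""
    return (value - 1) & 0xFF


def run_allocated(a, x, rc2, rc3):
    """Step through the register-allocated blocks; returns ($a, $x, $rc2, $rc3)."""
    # bb.0
    y = x
    x = _dec(a)
    rc4 = x
    if a == 0:                  # BR %bb.1
        # bb.1
        x = _dec(y)
        if y == 0:              # BR %bb.2
            # bb.2
            y = rc2
            rc2 = _dec(y)
            a = rc4
            if y == 0:          # BR %bb.3
                rc3 = _dec(rc3)
            # otherwise bb.4, which has nothing to do
        else:
            a = rc4             # bb.5
    else:
        x = y                   # bb.6
        a = rc4
    return a, x, rc2, rc3       # bb.7


print(run_allocated(0x00, 0x00, 0x01, 0x00))  # prints (255, 255, 0, 0)
```

Note how much of the work is just shuffling values between A, X, Y, and $rc4 so that each compare has its operand in a register the 6502 can actually compare.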