Update:
I've done a ton of work over the last month, and I have absolutely nothing to show for it!
Well, except for this 318K (text) compiler intermediate file (attached).
Toy examples are all well and good, but I'll never be able to call this a compiler unless it has some degree of standards conformance.
The best means at my disposal to measure standards conformance is LLVM's enormous end-to-end test suite. However, even running the simplest of those tests requires a working printf() implementation, since that's how those tests output their results for evaluation.
So, to begin working towards feature completeness and standards conformance, the first step is to get a working printf. I chose the single-file printf at https://github.com/mpaland/printf, since it can easily be hooked up to the character-output hook provided by any of a number of existing 6502 simulators.
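For context, mpaland/printf leaves exactly one function for the user to supply: void _putchar(char character), through which every character printf() emits is funneled. Below is a minimal sketch of that hook. The memory-mapped port address and the TARGET_6502 build flag are hypothetical placeholders (each simulator picks its own output mechanism), and the hosted branch just captures output in a buffer so the hook can be tested off-target.

```c
/* mpaland/printf declares `void _putchar(char character);` and expects the
   user to define it; every character printf() produces goes through here. */

#ifdef TARGET_6502 /* hypothetical build flag for the 6502 target */
/* Hypothetical memory-mapped output port; the real address depends on the
   simulator being used. */
#define CHAR_OUT_PORT ((volatile unsigned char *)0xFFF9)

void _putchar(char character) {
  *CHAR_OUT_PORT = (unsigned char)character;
}
#else
/* Hosted stand-in for testing: append each character to a buffer,
   keeping it NUL-terminated and silently dropping overflow. */
static char captured[256];
static unsigned captured_len = 0;

void _putchar(char character) {
  if (captured_len + 1 < sizeof captured) {
    captured[captured_len++] = character;
    captured[captured_len] = '\0';
  }
}
#endif
```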
I've discovered that going from toy to printf is quite a journey; when you break it all the way down to 8 bits, a full printf implementation (sans float support) covers a pretty wide swath of integer bitwise, arithmetic, and comparison instructions.
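To give a flavor of what "breaking it all the way down to 8 bits" means, here's a small C model (an illustration, not the compiler's actual lowering code) of two 32-bit operations split into byte-sized steps: an add becomes CLC followed by four ADCs chained through the carry, and a 1-bit left shift becomes ASL on the low byte followed by ROL on each higher byte. Bytes are little-endian, as on the 6502.

```c
#include <stdint.h>

/* 32-bit add as four 8-bit adds chained through carry,
   mirroring CLC followed by ADC on each byte, low to high. */
static void add32_bytewise(const uint8_t a[4], const uint8_t b[4], uint8_t out[4]) {
  unsigned carry = 0; /* CLC */
  for (int i = 0; i < 4; i++) {
    unsigned sum = (unsigned)a[i] + b[i] + carry; /* ADC */
    out[i] = (uint8_t)sum; /* keep the low 8 bits as the result byte */
    carry = sum >> 8;      /* carry out feeds the next byte's ADC */
  }
}

/* 32-bit 1-bit left shift: ASL on the low byte (shifts in 0), then ROL on
   each higher byte, which shifts in the bit that fell out of the byte below. */
static void shl1_32_bytewise(uint8_t v[4]) {
  unsigned carry = 0; /* ASL shifts a 0 into bit 0 */
  for (int i = 0; i < 4; i++) {
    unsigned shifted = ((unsigned)v[i] << 1) | carry;
    carry = v[i] >> 7;       /* old bit 7 becomes the new carry */
    v[i] = (uint8_t)shifted; /* keep the low 8 bits */
  }
}
```

Every multi-byte operation in printf fans out like this, which is why even "just integers" ends up touching so much of the instruction set.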
LLVM operates in passes. When I started, the very first of our own (6502-specific) passes choked on printf. After gradually massaging and generalizing the code generator, I got it through that pass, and the next, and the next.
Getting the code through the register allocator was particularly tricky business. Unlike in my toy examples, hundreds and hundreds of live values fly around inside printf, each needing to be in A, X, Y, the zero page, or the stack at various times, in an intricate dance. Large function calls threw a wrench into that dance: the calling sequence I naively emitted pinned the A, X, and Y registers right at the beginning of the call. Since the 6502 has no memory-to-memory move, every value written to the zero page has to pass through one of those three registers, so nothing was left free to load the zero-page registers with the rest of the arguments. I had to teach the instruction scheduler that A, X, and Y are precious resources, and that it needs to work very diligently to keep them free around calls.
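Here's a toy sketch of the ordering problem, assuming a hypothetical calling convention where the first three argument bytes go in A, X, and Y and the rest go in consecutive zero-page slots starting at a made-up ZP_ARG_BASE. It doesn't model LLVM's scheduler at all; it just hard-codes the good ordering: store the zero-page arguments first, using A as scratch, and only pin A, X, and Y right before the JSR. The argument operands are plain hypothetical memory labels.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical convention: argument bytes 0-2 go in A, X, Y; bytes 3+ go in
   zero-page slots starting here. */
#define ZP_ARG_BASE 0x10

/* Append `line` to `out` if it fits; otherwise drop it. */
static void append(char *out, size_t cap, const char *line) {
  if (strlen(out) + strlen(line) < cap) strcat(out, line);
}

/* Emit a call sequence as text. Zero-page arguments are stored first while
   A is still free to act as scratch (the 6502 can't move memory to memory);
   A, X, and Y are loaded only at the very end, right before the JSR. */
static void emit_call(char *out, size_t cap, const char *callee,
                      const char *const *args, int nargs) {
  static const char *const load_op[3] = {"LDA", "LDX", "LDY"};
  char line[64];
  out[0] = '\0';
  for (int i = 3; i < nargs; i++) { /* step 1: zero-page args via A */
    snprintf(line, sizeof line, "LDA %s\nSTA $%02X\n", args[i], ZP_ARG_BASE + (i - 3));
    append(out, cap, line);
  }
  for (int i = 0; i < nargs && i < 3; i++) { /* step 2: pin A, X, Y last */
    snprintf(line, sizeof line, "%s %s\n", load_op[i], args[i]);
    append(out, cap, line);
  }
  snprintf(line, sizeof line, "JSR %s\n", callee);
  append(out, cap, line);
}
```

Emitting in the naive order (A, X, Y first) would leave the zero-page stores with no register to route through, which is exactly the corner the scheduler has to avoid.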
Along the way I've had to add support for generalized multi-byte signed and unsigned comparisons; AND, OR, and XOR; certain 1-bit rotations and shifts; the V (overflow) flag; and using SBC as a comparison operator, since CMP doesn't set V.
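Why SBC: CMP, CPX, and CPY set N, Z, and C but leave V alone, and a signed "less than" needs N XOR V. A multi-byte comparison can instead be done as SEC followed by an SBC on each byte pair, low to high, throwing away the results and keeping only the flags. The C model below is a sketch of that idea (the function names are made up for illustration): after the last byte, C clear means unsigned less-than and N != V means signed less-than. Z isn't tracked, since it only reflects the last byte and can't answer equality for the whole value.

```c
#include <stdbool.h>
#include <stdint.h>

/* The 6502 flags that matter for ordered comparisons. */
typedef struct { bool c, n, v; } Flags;

/* One binary-mode 6502 SBC: A - M - (1 - C), updating C, N, and V. */
static uint8_t sbc(uint8_t a, uint8_t m, Flags *f) {
  int diff = (int)a - (int)m - (f->c ? 0 : 1);
  uint8_t r = (uint8_t)diff;
  f->c = diff >= 0;                         /* C set means no borrow */
  f->n = (r & 0x80) != 0;                   /* N = bit 7 of the result */
  f->v = ((a ^ m) & (a ^ r) & 0x80) != 0;   /* signed overflow */
  return r;
}

/* Compare two n-byte little-endian integers like a SEC / SBC chain. */
static Flags compare_bytes(const uint8_t *a, const uint8_t *b, int n) {
  Flags f = {true, false, false}; /* SEC: no borrow into the low byte */
  for (int i = 0; i < n; i++) sbc(a[i], b[i], &f); /* results discarded */
  return f;
}

static bool unsigned_less(const uint8_t *a, const uint8_t *b, int n) {
  return !compare_bytes(a, b, n).c; /* final borrow means a < b */
}

static bool signed_less(const uint8_t *a, const uint8_t *b, int n) {
  Flags f = compare_bytes(a, b, n);
  return f.n != f.v; /* N XOR V */
}
```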
So, I'm now all the way through to stack access emission, which is more than two-thirds of the way through the pipeline. The generated code so far is also pretty terrible; I'm leaving hundreds of easy optimization opportunities on the table in a race to completeness. Even so, there's maybe only a 20% chance that printf actually works at the end of this process; after that will likely come a truly grueling debugging phase to get the reams of generated 6502 code to actually work. But once it does, the project will have gone from "toy compiler" to "unfinished compiler." Even better, we'll be able to quantify just how unfinished it is, then track progress objectively towards "terrible finished compiler."