LLVM 6502 Codegen

BigEd · Post by **BigEd** » Sun Jun 20, 2021 7:14 am

Sounds like a great position to get to - well done - as noted, much work ahead. But that's why they are called milestones!

mysterymath · Post by **mysterymath** » Thu Jul 01, 2021 1:29 am

The test suite now passes at -O0, -Oz, and -O3. Doing so uncovered another dozen or so issues.

There's one known outstanding issue relating to when LLVM detects a "saturating arithmetic" pattern in the source language. Otherwise, the compiler works in all configurations we've tried, albeit with unrestricted use of the zero page.

Given that the saturating arithmetic issue was found by one of our users, and not by the end-to-end test suite, there's likely quite a few bugs still in there. But such is life.

My immediate next step will be to get Github automation working for these tests; they need to stay passing, and they're time-consuming to run. Afterwards, I'll be looking at testing different zero-page configurations.

From there, I'll begin auditing each phase of the compiler to ensure that the C99 standards are followed (to the best of my abilities), and all known cases from the previous pass are handled.

richard.broadhurst · Post by **richard.broadhurst** » Thu Jul 22, 2021 8:09 pm

I'm looking forward to seeing where this goes, as it looks like it is off to a great start.
It will also be interesting to compare to http://4corn.co.uk/articles/gccforbbc/.

mysterymath · Post by **mysterymath** » Sun Aug 01, 2021 12:16 am

Project Update!

Well, it's been another month! There's two interesting developments.

Continuous End-To-End Tests
First, the boring one. We now have continous nightly runs of LLVM's end-to-end test suite running and passing at each optimization level and allowed number of imaginary pointer registers (from 5 to 127).

Interrupt Handling
Next, the interesting one. It's come to my attention that interrupt handling is actually really important to programming on 6502 machines. I'd previously been attempting to defer handling this past our first release, but it didn't seem likely that we'd be able to shoehorn-in a good solution after the fact. So I've spent much of my time finding a way to incorporate good interrupt handling into llvm-mos.

There are two chief obstacles: the large number of zero page registers, and static stack allocation.

Zero Page Registers
First, it's just not practical to save the entire zero page when an interrupt happens. But if most of the registers were caller-saved, we'd have to do so if the interrupt handler called even a single C function. But we don't want to make most of the zero page registers callee-saved, since it makes it that much harder for the compiler to actually use them. And if a program doesn't involve interrupt handlers, then most of the registers can be caller-saved without issue.

So, if a program has any interrupt handlers that can call C, we require the C function entry points to bear one of two annotations: "interrupt" or "interrupt_norecurse." I'll get to the distinction between the two in a moment. Having such a function anywhere in the program will cause the calling convention to globally change. All but the first 5 zero page pointers become callee-saved, which allows the interrupt handler to only need to save 2 of them. This along with A, X, Y, and a handful of compiler temporaries leads to an okay-ish interrupt save/restore sequence.

Static Stack Allocation
Second, interrupt handling really futzes with static stack allocation, since interrupt handlers can come along and call anything at any time. Worse, an interrupt handler could itself be interrupted by the same interrupt handler, making everything possibly callable by that handler possibly recursive.

Accordingly, if a function is marked "interrupt", we walk the call graph, find anything that might possibly be called by it, and mark it possibly recursive.

However, in practice, a lot of interrupt handlers can be guaranteed non-recursive, that is, once an invocation of the handler becomes active, you can turn off the source of interrupts until it finishes. This is what the "interrupt_norecurse" annotation is for. If a function can be called by two *different* interrupt_norecurse, or by an interrupt_norecurse function and main, then it's marked as possibly recursive. Judicious use of this annotation should allow interrupt handlers to use static stack a lot of the time.

If you don't want to have C handle all of the save, restore, and RTI logic, you can also mark a function as "no_isr", which keeps the above semantics for static stacks, but makes the function use the original calling convention.

It's quite a complicated set of concepts, but it should allow writing interrupt handlers without completely disabling optimization opportunities like static stacks and zero page register function parameters. Eventually, I'm hoping to improve the zero-page register usage of interrupt functions by automatically promoting their static stacks to the zero page. This would automate the common practice of reserving zero-page registers for interrupt handler usage, ensuring that they don't conflict with main program functionality. We can even pass arguments directly via these locations when the callee is statically known to the caller.

A more thorough discussion is at https://llvm-mos.org/wiki/C_interrupts, and as always, I'm available to answer questions about the approach.

Until next time!

GARTHWILSON · Post by **GARTHWILSON** » Sun Aug 01, 2021 2:02 am

I have not been paying very much attention to this topic, and I don't speak C (I'm done trying) or Clang (and the link in the head post, https://github.com/mysterymath/clang6502, seems to be dead); but it seems like you're making the interrupt & stack stuff more difficult than it needs to be. When the 6502 gets an interrupt request, it sets the interrupt-disable flag as part of the interrupt sequence, so there's no risk of getting further IRQs before you're done unless you explicitly clear the interrupt-disable flag before the RTI. (RTI will pull the old status off the stack which will have the interrupt-disable flag clear unless you intentionally altered this status byte on the stack inside the ISR.) NMI can still hit, but NMI has a different vector. Also, stack operations are, by nature, re-entrant, and there's no need to save stack items or set up another stack. I go into this and much, much more in my 6502 stacks treatise which has 19 chapters plus appendices.

mysterymath · Post by **mysterymath** » Sun Aug 01, 2021 2:30 am

Re the link, the project has since moved to http://github.com/llvm-mos.

The issue is the notion of a "local variable" in C. C routines regularly allocate a reasonably large amount of memory for themselves on a per-invocation basis. These need to be kept separate for each invocation of the function, otherwise they would conflict and overwrite one another.

The size of these locations is regularly on the order of a kilobyte or so, so it's not reasonable to place them on the hard stack or a single-page data stack. So the usual C way of doing this is to maintain a stack pointer and do all accesses by indirecting through it. That works fine so far as it goes, but it's a lot worse than the usual way of doing assembly language programming on the 6502, since the indirect addressing modes are much slower than the absolute ones.

Assembly programmers can reason that two invocations of the function can't *actually* be active at the same time, so the local values don't need to be dynamically allocated; they can be placed somewhere statically in RAM. A really fancy assembly language programmer will also reason that if two functions can never simultaneously be active, their statically-allocated local regions can freely overlap.

One of the goals of this compiler project is to perform that type of analysis in the compiler itself. This requires the compiler to have a basic understanding of the semantics of interrupts. It's not *necessary* for a compiler to do this; the cc65 stack pointer model works fine, but it's not necessary for a compiler to do any optimization at all really. A big part of the fun of this project is to see just how much of the "usual" assembly language programming practice can be automated. Automation of this type can be far from perfect while still being useful; it the work is tiresome and error-prone, then compilers can through sheer tirelessness and inability to get bored do a more consistent job of applying these techniques to a large program than a human can.

It's worth noting that I picked a particular "style" of assembly language programming to target; there are many others. Forth, SWEET16 or ArcheronVM would all be interesting execution and runtime models to target with a compiler, but they're a bit less interesting to me, maybe just because they're a layer removed from the machine. The code density for any of these would likely be quite better than the approach I've taken though.

BigEd · Post by **BigEd** » Sun Aug 01, 2021 8:40 am

(Following this with great interest!)

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sun Aug 01, 2021 10:50 am

mysterymath wrote:

Re the link, the project has since moved to http://github.com/llvm-mos.

Don’t forget to update the lead post with the new link.

Quote:

The issue is the notion of a “local variable” in C. C routines regularly allocate a reasonably large amount of memory for themselves on a per-invocation basis. These need to be kept separate for each invocation of the function, otherwise they would conflict and overwrite one another.

The size of these locations is regularly on the order of a kilobyte or so, so it’s not reasonable to place them on the hard stack or a single-page data stack. So the usual C way of doing this is to maintain a stack pointer and do all accesses by indirecting through it. That works fine so far as it goes, but it’s a lot worse than the usual way of doing assembly language programming on the 6502, since the indirect addressing modes are much slower than the absolute ones.

You are encountering one of the reasons C was never popular on the 6502.

C is a stack-hungry language whose use of memory is not very efficient. 6502 assembly language tends to be stack-frugal and if written by someone who knows what they are doing, uses memory quite efficiently. Trying to duplicate 6502 assembly language’s efficiency with C is akin to trying to make a tugboat perform at the level of a cigarette boat.

The use of zero page (direct page on the 65C816) as an extension to the MPU’s registers is a basic 6502 idiom that is like death and taxes. It is difficult to write any practical 6502 assembly language program without using indirection, which means zero page will get into the picture. Considering the recursive nature of C, managing zero page becomes a complicated task. You have to snap-shot zero page with each function call, which means a loop has to be executed to copy the current zero page to somewhere else in memory. When the function exits you have reverse the process. Given that many 6502 systems don’t have a lot of RAM, a program that uses too much recursion can run out of places to stash copies of zero page.

Aside from the RAM consumption caused by making snap-shots of zero page, performance on function calls won’t be anything to get excited about, due to all that copying, as well as managing the places you’ve assigned to store the snap-shots. A function call will entail a lot more than just executing a JSR, followed later on with an RTS. And then there will be the issue of stack management. The two 6502 C compilers that I’ve monkeyed with used a software stack for parameter passing into and out of functions, and limited hardware stack usage to saving return addresses and temporary copies of registers.

Way back in the mid-1980s, there was a C compiler for the 6502 that was written by the author of the C-128 Buddy assembler (the latter a really fine piece of software). He supplied me with a copy to try out. At the time, my understanding of C was of the K&R variety as run on a typical UNIX system, so I wasn’t working totally blind. Several weeks of fooling around with the compiler convinced me C was just not the 6502’s forte—the programs ran only slightly faster than something written in BASIC. I was able to write smaller and much faster code in assembly language, and in about the same time.

Like Garth, I’m not enamored with C, although for different reasons. C’s algebraic notation doesn’t bother me nearly as much as its mostly-stupid syntax. C is best described as an abstraction of assembly language. Or, as a friend of mine who has written tons and tons of C and 68K assembly language once opined (correctly, I think), C is assembly language dumbed down.

Quote:

Assembly programmers can reason that two invocations of the function can’t *actually* be active at the same time, so the local values don’t need to be dynamically allocated; they can be placed somewhere statically in RAM. A really fancy assembly language programmer will also reason that if two functions can never simultaneously be active, their statically-allocated local regions can freely overlap.

That is one of the real strengths of assembly language. There isn’t something else trying to manage memory for you. You get to decide where things go and if a location is only used part of the time, you can use it for something else the rest of the time (kind of like using UNION in C). You don’t have to be a “fancy” programmer to do this. All you have to do is know what is used by your subroutines and when.

Speaking of which, the 65C816 is much better suited to compiled languages, mainly due to its 16-bit stack pointer and relocatable direct (zero) page. In particular, the 816’s ability to relocate direct page allows each function to have its own direct page that comes and goes when the function is called and later exited. My 816 string library makes extensive use of this capability and although I have not tried it, all of its functions should be recursive.

Quote:

One of the goals of this compiler project is to perform that type of analysis in the compiler itself. This requires the compiler to have a basic understanding of the semantics of interrupts. It’s not *necessary* for a compiler to do this; the cc65 stack pointer model works fine, but it’s not necessary for a compiler to do any optimization at all really. A big part of the fun of this project is to see just how much of the “usual” assembly language programming practice can be automated. Automation of this type can be far from perfect while still being useful; it the work is tiresome and error-prone, then compilers can through sheer tirelessness and inability to get bored do a more consistent job of applying these techniques to a large program than a human can.

I’ve been writing 6502 assembly language programs for 44 years and before that, did bare-metal development on post office mail-sorting computers (although calling one of those contraptions a “computer” was stretching the term). I’ve never seen any compiler that could optimize to the extent that can be accomplished by a skilled assembly language programmer. To be sure, the margin of optimization between the code emitted by a compiler and that written by a skilled assembly language programmer gets smaller as the target MPU gets more complicated. That, however, doesn’t describe any member of the 6502 family.

Having said all that, best of luck with your project.

BigEd · Post by **BigEd** » Sun Aug 01, 2021 11:08 am

I do like the idea of a compiler having a global view of the call graph and the total set of allocations.

I'm a bit less clear on the merits of a big switch which changes the calling convention - but I may have missed something.

On the ARM, the ISRs effectively have a few private registers, and then save as many of the others as they need to. Could the LLVM 6502 Codegen do the same kind of thing?

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sun Aug 01, 2021 12:21 pm

BigEd wrote:

On the ARM, the ISRs effectively have a few private registers, and then save as many of the others as they need to. Could the LLVM 6502 Codegen do the same kind of thing?

I'm having a hard time envisioning that sort of thing with the 6502. The stumbling block is zero page, which the ISR has to respect and, if necessary, preserve. The only practical way to do that would be making a copy somewhere in absolute RAM and then restore it before the ISR finishes. With an arrangement such as that, nested IRQs would be an nightmare to correctly implement.

BigEd · Post by **BigEd** » Sun Aug 01, 2021 12:26 pm

No need to save all of zero page, only the few 'registers' which the ISR needs to use. As I say, having a global view is very helpful.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sun Aug 01, 2021 12:36 pm

BigEd wrote:

No need to save all of zero page, only the few 'registers' which the ISR needs to use. As I say, having a global view is very helpful.

Yes, I know that, although 'few' could be 'many' if an operating system of any magnitude is running. If the number of ZP locations is small, stashing them on the stack would be feasible. That said, recall my comment about nested ISRs.

That 256 byte stack could suddenly become too small if re-entrancy is a requirement.

MichaelM · Post by **MichaelM** » Sun Aug 01, 2021 12:38 pm

mysterymath wrote:

It's quite a complicated set of concepts, but it should allow writing interrupt handlers without completely disabling optimization opportunities like static stacks and zero page register function parameters.

After reading your description of some of the issues that you're considering for interrupt handling in C, I think Garth's assessment is on target: you're overcomplicating it.

Interrupt handling in C should be as simple possible. Handling all of the complex interactions that you're considering should be left to the programmer. I have written a fair share of interrupt handlers in C for small 8-bit microprocessors, and using C for the interrupt handler is only for convenience and because the handler's functionality or the interrupt rate is not all that critical.

I think that you're assigning more functionality to the interrupt handler than good practice and reality would allow. In other words, your effort to tag the routine as reentrant, and then perform the analysis described in your post is unnecessary. In a typical use case, an interrupt handler cannot be called multiple times. That event, i.e., multiple calls to the same interrupt handler, typically signifies a major system problem: your interrupt handler cannot perform the required processing in the available amount of time. The solution to this problem is not reentrant C-level interrupt handlers, but a faster processor, or slower, less burdensome interrupt-serviced events / functions.

In my opinion, I think tagging a C function as "interrupt @xxxx", (or something similar where @xxxx signifies the location where the "vector" to the function is stored,) should inform your toolchain that all call tree analysis is disabled. Application of any optimizations other than those normally applied should be disabled for any C interrupt handlers, especially for microprocessors / microcomputers like the 6502/65C02 ISA, since they don't include software interrupts / traps in the ISA like the x86 and other such ISAs. If you allow a C function to be tagged as an interrupt handler, you should assume that the programmer knows what he's doing, i.e., he's programming at a level beyond which normal compiler-implemented global optimizations and analysis fails.

In my opinion, if a programmer is using C for interrupt handling, he's got to understand the "system", and so would the compiler. That is too big a job for the compiler: (1) what is being handled in the function, (2) what is the maximum rate at which the function can be vectored to, (3) what happens if a second event queues before the function completes, etc. I did not see in your post on the analysis that you were considering your compiler to support for interrupt handlers any mention of the issues like the ones I've just listed. Without information like that, which is really outside of the knowledge base that your compiler is constructing for the program, your compiler's ability to "correctly" analyze any interrupt handler's call tree / graph is not possible.

Hence. Garth's assessment that you're overcomplicating the interrupt handler compilation issue for your compiler. In the past whenever I've written interrupt handlers in C, I performed the analysis required, outside of the compiler, and once convinced that the C function was capable enough to handle the interrupt's functionality, then and only then did I include the C interrupt handler in my program. Until then I implemented the interrupt handler in assembler, and either linked it into my program, or I developed a custom runtime library for my small microprocessor / microcomputer system. Modifying the runtime for small embedded systems, although more difficult than usual, is not outside of the tasks that an embedded systems programmer expects to perform.

The 6502/65C02 ISA is difficult enough as it is to map a C VM onto, and your example optimizations have been very instructive. I have certainly read and reviewed many of your posts on this thread, and I've enjoyed them. My recommendation with respect to interrupt handling in C is to disable global optimizations / analysis, and defer the system analysis to the programmer using your toolchain.

BigEd · Post by **BigEd** » Sun Aug 01, 2021 12:54 pm

(Just to be clear - I think you mean disabling global analysis on behalf of the ISR function, but retaining it for the main part of the application, which is to say all other functions.)

MichaelM · Post by **MichaelM** » Sun Aug 01, 2021 1:01 pm

BigEd wrote:

(Just to be clear - I think you mean disabling global analysis on behalf of the ISR function, but retaining it for the main part of the application, which is to say all other functions.)

Yes.

LLVM 6502 Codegen

Re: LLVM 6502 Codegen

Re: LLVM 6502 Codegen

Re: LLVM 6502 Codegen

Re: LLVM 6502 Codegen

Re: LLVM 6502 Codegen

Re: LLVM 6502 Codegen

Re: LLVM 6502 Codegen

Re: LLVM 6502 Codegen

Re: LLVM 6502 Codegen

Re: LLVM 6502 Codegen

Re: LLVM 6502 Codegen

Re: LLVM 6502 Codegen

Re: LLVM 6502 Codegen

Re: LLVM 6502 Codegen

Re: LLVM 6502 Codegen