LLVM 6502 Codegen
- mysterymath
- Posts: 79
- Joined: 10 Jan 2021
Re: LLVM 6502 Codegen
Believe it or not, it's actually quite a bit easier to have a "modern register based" calling convention, since LLVM has excellent libraries for it. I have a fixed amount of "weirdness points" I can spend; every time I try to do something like "a ZP data stack", I have to teach LLVM how to do it from scratch. Whereas if I just tell it "these zp regs are callee saved", it'll handle basically all of the rest automatically.
- GARTHWILSON
- Forum Moderator
- Posts: 8775
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: LLVM 6502 Codegen
mysterymath wrote:
If you squint at it the right way, a ZP data stack is another type of contract between caller and callee.
It says: no function is allowed to touch the zero page before this dynamic top of stack pointer. Functions are totally free to write to any region below the top of stack pointer. Functions must bump the pointer to protect themselves against interrupts.
The downside is of course that all accesses are relative to a dynamic location, so they require an indexed addressing mode. Still better than a full 16-bit soft stack, but not as good as direct addressing. It places additional pressure on the X and Y registers, too. But, I'll have to think about it. Worth noting that you could reserve all zp regs above a certain point as a data stack, so long as it were contiguous (which we don't currently require our zero page to be).
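The contract quoted above can be sketched as a small C simulation. Everything here is a hypothetical illustration (the array, the `zp_sp` name, the frame helpers), not llvm-mos code:

```c
#include <stdint.h>

/* Simulated zero page; the data stack occupies the region above zp_sp
 * and functions claim frames by bumping it. All names are hypothetical. */
static uint8_t zp[256];
static uint8_t zp_sp = 0x80;  /* dynamic top-of-stack pointer */

/* A function claims frame space by bumping the pointer FIRST, so an
 * interrupt arriving mid-body cannot clobber its locals. */
static uint8_t claim_frame(uint8_t size) {
    uint8_t base = zp_sp;
    zp_sp += size;            /* bump before any writes: interrupt-safe */
    return base;
}

static void release_frame(uint8_t size) { zp_sp -= size; }

/* An ISR is free to scribble anywhere at or above the current top of
 * stack; it never disturbs frames already claimed below it. */
static void simulated_isr(void) { zp[zp_sp] = 0xEE; }
```

A caller that claims a 4-byte frame, writes a local, and takes a simulated interrupt finds its local intact afterwards, since the ISR only touched bytes above the bumped pointer.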
No protection against interrupts is needed; because the ISR never disturbs the stuff that was already on the stack, only adds whatever it needs to in order to do its job, and then cleans up after itself before returning.
The data-stack idea is of course not as efficient as doing everything with direct addressing; but it does open up a lot of possibilities to do things that otherwise would result in tying us up in knots and getting us into situations we can't get out of without starting over. For iterative things that will require a ton of accessing, for example a division routine, Forth moves them from the data stack to a ZP area of about eight bytes, called N, so you can save a cycle or two with every access. N never needs to be pushed or saved, because it is never used to pass parameters from one routine to another; IOW, its use is atomic. When the routine is done with those data, they are moved back to the data stack.
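The N-area technique described above can be sketched in C. This is a hypothetical model of the idea (the names `N`, `um_div`, and the stack shape are invented for illustration, not taken from any particular Forth):

```c
#include <stdint.h>

/* Hypothetical model of Forth's N scratch area: a small fixed
 * zero-page region used only within a single primitive, never to
 * pass data between routines. */
static uint16_t data_stack[32];
static int dsp = 0;                 /* data-stack pointer */
static uint16_t N[4];               /* fixed scratch: direct addressing */

static void push(uint16_t v) { data_stack[dsp++] = v; }
static uint16_t pop(void)    { return data_stack[--dsp]; }

/* A division primitive: move the operands into N so the inner loop
 * uses cheap direct accesses, then push the results back. */
static void um_div(void) {
    N[0] = pop();                   /* divisor */
    N[1] = pop();                   /* dividend */
    N[2] = 0;                       /* quotient */
    while (N[1] >= N[0]) {          /* iterative work hits only N */
        N[1] -= N[0];
        N[2]++;
    }
    push(N[1]);                     /* remainder */
    push(N[2]);                     /* quotient */
}
```

After `push(17); push(5); um_div();` the stack holds remainder 2 under quotient 3; N is left dirty, which is fine precisely because its use never crosses a routine boundary.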
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: LLVM 6502 Codegen
mysterymath wrote:
Believe it or not it's actually quite a bit easier to have a "modern register based" calling convention, since LLVM has excellent libraries for it.
P.S. I'm sure that Garth and I (and others) would be delighted to see a hyper-efficient C to FORTH translator, but it would be foolish to assume that something like that is your goal.
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!
Mike B. (about me) (learning how to github)
Re: LLVM 6502 Codegen
As the OS (and potentially other things) will already have a static allocation of some part of zero page - and this will vary in size and location between OSes, I think it's necessary to think not about zero page as a whole but about the application's share of zero page.
And as indirecting into zero page is only natural in limited circumstances, most accesses will be absolute rather than relative. It's conventional that some locations in zero page will be globals, or static locals. And that leaves some to be treated somewhat like a register file: some of those are local or scratch or parameters, and some are preserved across calls.
And because we like interrupts to be handled efficiently, and saving chunks of zero page is expensive, I'd be strongly inclined to follow ARM's lead and have a subset of zero page set aside for the register file, or part of the register file, used in an interrupt context.
Which would mean, I think, some specific handling of subroutines or functions called from within interrupt handlers, as they need to use a different section of zero page.
One might ask how common it is for an ISR to use subroutines, and how common for those subroutines also to be used in application context, and how common for that special set to be large and complex. It might (or might not) be simplest to have two implementations of such special special cases.
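The partitioning described in this post might look like the following sketch. Every address and name here is invented for illustration; real layouts vary between OSes, and this is not an llvm-mos convention:

```c
/* A hypothetical carve-up of zero page: the OS keeps its share, and
 * the application's share splits into statics, a "register file", and
 * a separate bank reserved for interrupt context (ARM-style banking). */
enum {
    ZP_OS_BASE       = 0x00, ZP_OS_END       = 0x1F, /* OS-reserved   */
    ZP_GLOBALS_BASE  = 0x20, ZP_GLOBALS_END  = 0x5F, /* globals/statics */
    ZP_REGS_BASE     = 0x60, ZP_REGS_END     = 0x7F, /* register file */
    ZP_IRQ_REGS_BASE = 0x80, ZP_IRQ_REGS_END = 0x8F, /* ISR-only regs */
};

/* A routine callable from both contexts would be compiled (twice, or
 * parameterized) against the appropriate bank. */
static int zp_reg(int n, int in_isr) {
    return (in_isr ? ZP_IRQ_REGS_BASE : ZP_REGS_BASE) + n;
}
```

The point of the disjoint banks is that an ISR entering C code need not save the application's register file at all; it simply addresses its own.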
Re: LLVM 6502 Codegen
BigDumbDinosaur wrote:
Regarding interrupt handling, a basic rule of any ISR running on a 6502 is it should be small and succinct. "Small and succinct" doesn't describe the machine code generated by a typical C compiler. A fat and slow ISR may not be able to meet deadlines for servicing multiple interrupt sources, leading to inadvertent reentrancy. Reentrancy during interrupt processing on the 6502 can quickly exhaust resources and crash the machine due to stack wrap (been there, done that, long ago).
BigDumbDinosaur wrote:
I'm with Garth and Mike: you are over-complicating it. If an ISR written in C needs use of some zero page, that space should be statically allocated to the ISR only—foreground tasks should never touch it. Frankly, I think attempting to write an ISR in C is not a good idea. There are some things that are best done in assembly language on a 6502, and this is one of them.
- mysterymath
- Posts: 79
- Joined: 10 Jan 2021
Re: LLVM 6502 Codegen
Quick Update:
Following the discussions about interrupt handling here, I started playing around with making the "interrupt-friendly" calling convention the default. This seems to have been a good decision, so I've made the change, and we'll see what breaks in continuous integration.
The short version is that C functions now make an attempt to "leave no trace" when they execute: they're allowed to freely overwrite only 4 bytes of the zero page. They can touch any of the rest of the zero page given to the compiler, but they have to put it back the way they found it before they return. This means that interrupt handlers only need to push the 4 scratch bytes to the stack, along with A, X, and Y (and a handful of compiler temporaries I'm getting rid of). The C calling convention implicitly handles the rest.
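The "leave no trace" rule can be modeled with a small simulation. The scratch size of 4 matches the post; everything else (names, addresses, the particular save/restore choreography) is an invented illustration, not compiler output:

```c
#include <stdint.h>
#include <string.h>

/* Simulated zero page: 4 freely-clobberable scratch bytes, everything
 * else callee-saved under the convention described above. */
static uint8_t zp[256];
enum { SCRATCH_BASE = 0, SCRATCH_SIZE = 4 };

/* A compiled C function may trash the scratch bytes, but must restore
 * any other zero-page byte it uses before returning. */
static void compiled_fn(void) {
    uint8_t saved = zp[0x20];   /* callee-saved: save...           */
    zp[0x20] = 0xAA;            /* ...use freely in the body...    */
    zp[SCRATCH_BASE] = 0xBB;    /* scratch needs no save           */
    zp[0x20] = saved;           /* ...and restore on exit          */
}

/* An ISR therefore only needs to stack A, X, Y, and the 4 scratch
 * bytes before calling into C code; the convention covers the rest. */
static void isr(void) {
    uint8_t save[SCRATCH_SIZE];
    memcpy(save, &zp[SCRATCH_BASE], SCRATCH_SIZE);
    compiled_fn();
    memcpy(&zp[SCRATCH_BASE], save, SCRATCH_SIZE);
}
```

After `isr()` returns, every zero-page byte the foreground was using, scratch included, is exactly as it was, even though the C function ran in between.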
Code quality took somewhat of a hit from this, but it's not nearly as bad as I'd feared. The obvious downside is that functions begin by saving off one zero page register after another to the soft stack. Quite a lot: from 10 to 50 locations. The upshot is that once this is done, the bodies of the functions operate almost entirely in the zero page, as you might expect, while still being fully reentrant. If the function can be proven non-recursive, then the soft stack is just an absolute memory region, so the function copies out of the zero page into another page. Which is a bit silly.
That silliness was quite illuminating, though, and the problem isn't quite as serious as I'd feared.
Say you have the following function:
Code: Select all
int foo(int a, int b) {
  int c = a + b;
  bar();
  return c;
}
In order to return a+b, you have to stash it somewhere safe from "bar".
LLVM knows two techniques for this:
1) Save it to the stack before the function call, then reload it from the stack afterwards
2) Save a callee-saved (zero page) register at function entry, and restore it at function exit. Then, just compute the value right into the callee-saved register; bar has already promised not to touch those.
Without any more information, 2 is almost strictly superior to 1, no matter what function you ask the question about. They each produce the exact same number of stores and reloads, so they have the same downside. However, once you do 2), the register is also free throughout the whole function to use for all kinds of things, so 2 has a unique upside.
It's not quite that straightforward in our case though. Using static stacks can be almost as cheap as the zero page, and zero page accesses themselves are about a third of the cost of a soft stack access. And the reload in 1) can often be folded into whatever computation uses the value, e.g., LDY #42; EOR (SP),Y. This becomes dramatically more likely (and cheaper!) if static stacks are used: EOR sstk+42. So in a number of cases we'd really want to do 1) instead, and those are precisely the cases where it'd be really silly to have a prologue that frees up a bunch of the zero page. That's what the prior caller-save-heavy calling convention was actually buying us: it was forcing the compiler to choose 1 instead of 2 by limiting the number of callee-saved registers available.
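For concreteness, here are the nominal NMOS 6502 base timings behind that comparison, encoded as constants (the helper name is mine; page-crossing penalties and any amortization of the Y load across several accesses are ignored):

```c
/* Nominal NMOS 6502 cycle counts for the access patterns mentioned
 * above (base timings; page-crossing penalties ignored). */
enum {
    CYC_ZP      = 3,  /* EOR zp: direct zero-page access          */
    CYC_ABS     = 4,  /* EOR sstk+42: static-stack (absolute)     */
    CYC_LDY_IMM = 2,  /* LDY #42                                  */
    CYC_IND_Y   = 5,  /* EOR (sp),Y: indirect indexed             */
};

/* A soft-stack reload that must first load the index costs
 * LDY #imm followed by EOR (sp),Y. */
static int soft_stack_reload_cycles(void) {
    return CYC_LDY_IMM + CYC_IND_Y;
}
```

So a folded soft-stack reload runs 7 cycles against 4 for a static stack and 3 for zero page, which is the ordering the paragraph above is weighing.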
So we can just teach LLVM how to effectively choose between 1) and 2) on MOS, and that should make the compiler quite tactical about when it's willing to save and restore a zero page register for internal use. We can also do all of the other whole-program zero page allocation discussed previously; this would just have the effect of making the use of reserved zero page registers free for the functions they were reserved for. Other functions could still use them, but they'd have to pay to save/restore them.
So, at the very least, I think there's a relatively clear path from here to good, performant code. Once this change percolates through testing, I'll be back to my end-to-end audit of the compiler for correctness.
Re: LLVM 6502 Codegen
Sounds great!
- BigDumbDinosaur
- Posts: 9428
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Re: LLVM 6502 Codegen
johnwbyrd wrote:
From this comment I can tell definitively that you have never examined the output of LLVM-MOS.
Yep! I've never seen such output.
Quote:
Please review the Findings section of https://llvm-mos.org/wiki/Rationale .
I did read it some time ago, and am of the opinion that what I read are claims, not proofs. I didn't see any code that could be examined and compared to what a competent assembly language programmer would write to perform an identical computation.
Incidentally, in the "code generation overview" page of the LLVM-MOS wiki, there is the following statement:
- The original architects of the NMOS 6502 compensated for the 6502's small number of registers by providing 256 bytes of memory called Zero page, which could be accessed relatively quickly and cheaply by the processor.
x86? We ain't got no x86. We don't NEED no stinking x86!
- GARTHWILSON
- Forum Moderator
- Posts: 8775
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: LLVM 6502 Codegen
What does LLVM stand for? The wiki says it doesn't stand for anything; but I expect the "VM" is the usual "virtual machine," and the "LL" must have come from something! I suppose I would feel just slightly less disoriented if I knew.
I've spent quite a lot of time reading the various wiki pages. LLVM-MOS appears to be quite a noble undertaking, but one we should view as being far from complete, and know that much will be learned by the author(s) as the effort progresses. (Hopefully we all learn something from every project.)
Hopefully it will stir up more excitement when there's a published collection of examples comparing its output to that of other compilers.
I wrote earlier, "I suspect that a C compiler could be written that would be smart enough to understand the execution goal and completely rework the approach to fit the way the 6502 does things, and optimize accordingly, but that it just hasn't been done because there's not the necessary market to justify the many man-years it would take to do it. I do suspect it's possible though" (as addressed briefly as the fifth of the common assumptions on the "Rationale" page, and for the reason BDD and others correctly point out, that the '02 is not a compiler-friendly processor). Without such a market, the huge effort might be one of love rather than business, which is commendable nonetheless.
And elsewhere, I wrote, "I would really like for someone to improve upon cc65. As it is, it generates horribly inefficient code. Anyone with any experience at all can do much better in assembly." and: "I just came across this page about benchmarking the various C compilers for the 6502. CC65 produced much slower, more bloated code than the other C options, although it was more solid."
Somewhat related, there's also NutStudio 6502 Tools Guide, to which BillG responded, "Thanks for that. Though I have no interest in programming the 6502 family in C or writing a C compiler, the manual is interesting reading as to ideas for generating good code for the 6502."
It might be safe to say that even those of us who won't be using it still look forward to whatever promotion it may give the 65's. Is there a plan to eventually extend it to the CMOS instruction set? Using only the NMOS is fine for something like the C64, but imposes limitations that will undoubtedly make the project more difficult.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: LLVM 6502 Codegen
LLVM is a massive and relatively new project, relative to GCC, which is a massive and relatively old project. In both cases, the result is a very flexible and powerful compiler with a variety of front-ends, for different high level languages, and a variety of back-ends, to target different processors. The shared middle can contain various optimisation transformations which serve to improve the output for all inputs and all outputs.
Because LLVM is relatively new and a fresh start, it had a chance to be cleaner and simpler and to incorporate new ideas in the field which post-date GCC's beginnings.
Unfortunately, being a massive piece of software, LLVM isn't easy for the beginner developer to contribute to - at least, it's far beyond me.
I find it very exciting and promising that someone - in this case mysterymath - is tackling the task of targeting the 6502. This is a serious development project, and very much to be welcomed. It doesn't detract from other, standalone, 6502 compiler projects, or indeed from efforts to get GCC to target the 6502.
From the end-user perspective, the thing to watch is the approach of a C compiler for the 6502. Anyone who appreciates 6502 code and the difficulties in generating code efficiently by program could be interested in what's going on here. Anyone who knows about the choices to be made and the difficulties of implementing a compiler might be interested in what's going on here.
(As ever, anyone not interested in this thread should pass on by.)
Re: LLVM 6502 Codegen
GARTHWILSON wrote:
What does LLVM stand for?
Quote:
This paper describes LLVM (Low Level Virtual Machine), a compiler framework designed to support transparent, lifelong program analysis and transformation for arbitrary programs, by providing high-level information to compiler transformations at compile-time, link-time, run-time, and in idle time between runs. LLVM defines a common, low-level code representation in Static Single Assignment (SSA) form, with several novel features: a simple, language-independent type-system that exposes the primitives commonly used to implement high-level language features; an instruction for typed address arithmetic; and a simple mechanism that can be used to implement the exception handling features of high-level languages (and setjmp/longjmp in C) uniformly and efficiently.
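The SSA form mentioned in the quoted abstract can be illustrated with a trivial hand-written example (my own illustration, not actual LLVM output): in SSA, every value is assigned exactly once, so a reassigned source variable becomes a family of numbered values.

```c
/* Source form:        SSA form (one definition per name):
 *   x = a + b;          x1 = a0 + b0
 *   x = x * 2;          x2 = x1 * 2
 *   return x;           ret x2
 */
static int ssa_example(int a, int b) {
    int x1 = a + b;   /* each SSA value has exactly one definition */
    int x2 = x1 * 2;  /* re-assignment becomes a fresh name        */
    return x2;
}
```

The single-definition property is what makes the analyses and transformations the abstract describes cheap: every use of a value points unambiguously at the one place it was computed.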
Michael A.
Re: LLVM 6502 Codegen
(Perhaps important to note, the Virtual Machine mentioned here is not for execution, but as an intermediate target for an intermediate code, which can be optimised before translation to an actual hardware instruction set. And then possibly optimised again.)
Re: LLVM 6502 Codegen
BigEd wrote:
I find it very exciting and promising that someone - in this case mysterymath - is tackling the task of targetting the 6502. This is a serious development project, and very much to be welcomed. It doesn't detract from other, standalone, 6502 compiler projects, or indeed from efforts to get GCC to target the 6502.
BigEd wrote:
From the end-user perspective, the thing to watch out for is the approach of a C compiler for the 6502. Anyone who appreciates 6502 code and the difficulties in generating code efficiently by program could be interested in what's going on here. Anyone who knows about the choices to be made and the difficulties of implementation of a compiler might be interested in what's going on here.
So "good enough" is good enough for me (in the case of cc65, today). I was quite impressed that I could compile and run my editor (a nano-like screen editor) almost unchanged from the projects it's part of on my Ruby 6502 boards; it's about 1600 lines of C and compiles to about 15KB of code with cc65. So who knows what it will compile to with LLVM...
And to compare... Would I write it in 6502 assembler? No. Not a chance. Sure, it would be smaller, faster, etc. but it would also take me far too long to write, debug and test. Same goes for my multi-tasking OS (viewtopic.php?f=1&t=6736) which runs under the '816, although that's written in BCPL - write it in '816 assembler? No thanks, but in a high level language which I can edit and compile directly on the target - keeps me happy!
There is huge value in high level languages - especially if you can use small snippets of assembler in ultra-critical places. But these days - well, good enough is often more than enough - probably for the same reason that we're now overclocking the 6502/816 to 16, 20, 30, more? MHz... Is clock speed a substitute for code efficiency? Looks like it might be...
It would also be interesting to know if LLVM could target the Cintcode VM that my BCPL runs under... That might be quite impressive...
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Re: LLVM 6502 Codegen
GARTHWILSON wrote:
What does LLVM stand for? The wiki says it doesn't stand for anything; but I expect the "VM" is the usual "virtual machine," and the "LL" must have come from something! I suppose I would feel just slightly less disoriented if I knew.
I've spent quite a lot of time reading the various wiki pages. LLVM-MOS appears to be quite a noble undertaking, but one we should view as being far from complete, and know that much will be learned by the author(s) as the effort progresses. (Hopefully we all learn something from every project.)
Hopefully it will stir up more excitement when there's a published collection of examples comparing its output to that of other compilers.
And LLVM-MOS is vastly more standards compliant than any other 6502 C compiler.
No, I don't expect this information will stir up much excitement. We've still got a lot of work to do on the SDK.
Re: LLVM 6502 Codegen
It certainly is exciting, for those who like to watch the journey! Those who are mainly interested in the destination may have to wait.