LLVM for MOS update 2020.08.19

laoo · Post by **laoo** » Wed Sep 09, 2020 2:25 pm

johnwbyrd wrote:

I don't understand what a "section target specific flag" is.

Obviously I'm not an expert. I've just spotted that the section flags are pretty extensible. Note that there are placeholders for target-specific flags and processor-specific flags. There are already X86-, MIPS- and ARM-specific flags. Additionally, the .section pseudo-op in GAS has a flags argument, a string of letters that are mapped to the above-mentioned enumeration. I was just thinking about adding a new MOS-specific flag here, rather than depending on an arbitrary list of section names that denote zero-page sections.
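To make the flag idea concrete, here is a minimal Python sketch of how a flags string could be mapped to ELF section flag bits. The `SHF_WRITE`, `SHF_ALLOC`, `SHF_EXECINSTR` and `SHF_MASKPROC` values come from the ELF specification, and the letters `a`, `w`, `x` are the ones GAS accepts. The `z` letter, the name `SHF_MOS_ZEROPAGE` and its value are assumptions made up for illustration, not something taken from the actual patch.

```python
# ELF section flag bits (values from the ELF specification).
SHF_WRITE = 0x1
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4
SHF_MASKPROC = 0xF0000000  # bits reserved for processor-specific flags

# Hypothetical MOS flag: both the name and the value are assumptions.
SHF_MOS_ZEROPAGE = 0x10000000

# Letters in the .section flags string. "a", "w", "x" follow GAS;
# "z" is an assumed letter for the hypothetical zero-page flag.
FLAG_LETTERS = {
    "a": SHF_ALLOC,
    "w": SHF_WRITE,
    "x": SHF_EXECINSTR,
    "z": SHF_MOS_ZEROPAGE,
}


def parse_section_flags(letters):
    """OR together the flag bits named by each letter in the string."""
    flags = 0
    for ch in letters:
        if ch not in FLAG_LETTERS:
            raise ValueError(f"unknown section flag letter: {ch!r}")
        flags |= FLAG_LETTERS[ch]
    return flags


def is_zero_page(flags):
    """True if the section carries the hypothetical zero-page flag."""
    return bool(flags & SHF_MOS_ZEROPAGE)
```

With something like this, a section would be recognized as zero page by its flags (for example `parse_section_flags("awz")`) rather than by matching its name against a list.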

johnwbyrd · Post by **johnwbyrd** » Wed Sep 09, 2020 3:06 pm

cjs wrote:

Clever comments aside, this is really great work, and I'm really happy to see it all easily available on GitHub. The only suggestion I would make there is to move all that great documentation in the wiki into `.md` files in your repo instead, so that pulling the repo gets the documentation, too, and so it's easier to track changes to it and to keep the code and documentation updates together. (Markdown files in a repo can be browsed and searched in pretty much the same way as the Wiki; see, e.g., my sedoc repo for an example.)

Thank you for the encouragement. I take it where I can get it.

I'm aiming to make this port part of LLVM proper someday, and as such I am trying to rigidly follow LLVM coding and documentation standards during development. LLVM uses Sphinx internally for documentation, not Markdown.

Things are in such flux with llvm-mos right now that I figured it would be best just to document it wiki-style until it becomes feature complete.

I intend to be fairly loosey goosey about letting others make changes to the wiki, if anyone feels that they need write access.

White Flame · Post by **White Flame** » Thu Sep 10, 2020 9:08 am

Does LLVM have different classes of registers that it can handle differently? Because maybe you could define zeropage as a set of 256 registers. Just a random thought.

BigEd · Post by **BigEd** » Thu Sep 10, 2020 9:12 am

(Somewhat parenthetically: the small fly in the ointment of 'zero page as 256 registers' is that zero page is 256 bytes. And yet commonly zero page is used to store addresses, for the use of indexed modes, and there's room for 128 of those. And then, they don't need to be aligned. So it's a bit more subtle than a big byte-wide register file. Not that I've ever dealt with the internals of a compiler.)

White Flame · Post by **White Flame** » Thu Sep 10, 2020 10:02 am

Sure, but some other architectures deal with somewhat related issues, like AX overlapping with AH & AL on 8086. But yes, that's different than the 16-bit vector (zp04) overlapping with (zp03) and (zp05), and the fact that a single 8-bit zp location is part of 2 simultaneous vectors.
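A small Python sketch of the overlap being discussed, assuming 2-byte pointers that do not wrap around the end of zero page. The helper names are made up for illustration. It counts the possible pointer slots with and without alignment, and lists which pointers a given byte belongs to.

```python
ZP_SIZE = 256   # zero page is 256 bytes
PTR_SIZE = 2    # a 6502 address occupies two consecutive bytes


def pointer_slots(aligned):
    """Start addresses of all 2-byte pointers that fit in zero page."""
    step = PTR_SIZE if aligned else 1
    return list(range(0, ZP_SIZE - PTR_SIZE + 1, step))


def pointers_containing(byte_addr, aligned):
    """Start addresses of every pointer slot that includes byte_addr."""
    return [s for s in pointer_slots(aligned)
            if s <= byte_addr < s + PTR_SIZE]


print(len(pointer_slots(aligned=True)))    # 128 aligned slots
print(len(pointer_slots(aligned=False)))   # 255 unaligned slots
print(pointers_containing(0x04, aligned=False))  # [3, 4]
print(pointers_containing(0x04, aligned=True))   # [4]
```

Without alignment, byte $04 is the high byte of the pointer at $03 and the low byte of the pointer at $04 at the same time, which is the case that has no counterpart in the AX/AH/AL style of sub-registers.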

BigEd · Post by **BigEd** » Thu Sep 10, 2020 10:57 am

On reflection, it wouldn't hurt too much, I think, to insist that pointers should be aligned.

laoo · Post by **laoo** » Thu Sep 10, 2020 12:42 pm

The other thing that might be a problem when treating zp as a register file: when programming in assembly by hand we very seldom treat it as registers. There are virtually no scenarios where data from memory would be transferred to zp, manipulated and then written back - and that's the way registers work. We don't even have special instructions that operate on zp explicitly. It's just memory that is faster and has an additional use: storing pointers. So I'm afraid that tuning the compiler to treat zp as registers could produce ridiculous scenarios of transferring data back and forth between zp and memory.

johnwbyrd · Post by **johnwbyrd** » Thu Sep 10, 2020 9:51 pm

laoo wrote:

johnwbyrd wrote:

I don't understand what a "section target specific flag" is.

Obviously I'm not an expert. I've just spotted that the section flags are pretty extensible. Note that there are placeholders for target-specific flags and processor-specific flags. There are already X86-, MIPS- and ARM-specific flags. Additionally, the .section pseudo-op in GAS has a flags argument, a string of letters that are mapped to the above-mentioned enumeration. I was just thinking about adding a new MOS-specific flag here, rather than depending on an arbitrary list of section names that denote zero-page sections.

You're touching on an important point here and I want to discuss it in some detail with you.

Obviously it's possible to do what you are describing; that's not in question.

However, the documentation at https://sourceware.org/binutils/docs/as ... ml#Section shows that there are no processor-specific flags currently present in LLVM.

My ultimate goal with this project is to submit it upstream for eventual inclusion in LLVM proper.

To do that, it will be absolutely necessary not to change the semantics of existing functionality in the assembler or the compiler.

This is entirely possible. Please see the AVR backend in LLVM, which took great pains not to break or change anything in the main LLVM body of code.

Adding a new processor-specific flag in the .section command makes it less likely that the work will ever be upstreamed.

Please give the .zeropage section flag a try, and let me know how it works for you. If that does not work for some reason, let's figure out a solution together that does not affect current gas semantics.

Please see also https://github.com/johnwbyrd/llvm-mos/wiki/Philosophy

johnwbyrd · Post by **johnwbyrd** » Thu Sep 10, 2020 9:55 pm

White Flame wrote:

Does LLVM have different classes of registers that it can handle differently? Because maybe you could define zeropage as a set of 256 registers. Just a random thought.

In general, the notion of different sized pointers is alien to the gcc and llvm way of doing things.

We're going to sidestep this whole problem by treating a contiguous range of zero page as LLVM virtual registers.

https://github.com/johnwbyrd/llvm-mos/w ... n-thoughts

johnwbyrd · Post by **johnwbyrd** » Thu Sep 10, 2020 10:23 pm

laoo wrote:

The other thing that might be a problem when treating zp as a register file: when programming in assembly by hand we very seldom treat it as registers. There are virtually no scenarios where data from memory would be transferred to zp, manipulated and then written back - and that's the way registers work. We don't even have special instructions that operate on zp explicitly. It's just memory that is faster and has an additional use: storing pointers. So I'm afraid that tuning the compiler to treat zp as registers could produce ridiculous scenarios of transferring data back and forth between zp and memory.

For most compilers, this is an entirely valid concern. However, LLVM has a fairly advanced algorithm for keeping active variables in registers. Our strategy is to try to get LLVM to keep as much as possible in virtual registers (i.e. zero page) at all times.

In practice this does mean that LLVM generated code will move all code from 16-bit memory to 8-bit memory before doing math operations on it and then pushing it back to 16 bit memory. However, I'm expecting that giving LLVM a plentiful range of virtual registers, should allow it to be smarter about what to cache there, and will hopefully avoid thrashing.

Really, this is no different from what most modern 6502 programmers do: they move memory to zero page before multiplying or doing multi-byte operations anyway. Although it seems wasteful to move memory back and forth, it works out to a net performance win because the instructions are so much faster and smaller when working on zero page.

If it does turn out that the memory thrashing is pessimal in some usage cases, I'm sure we can create an LLVM pass to reduce it. But we should deal with that problem when it becomes a bottleneck and not before. Premature optimization and all.

Link-time code generation will, I strongly suspect, be incredibly important for the 6502. As many others have recognized, C stack operations are performance killers on the 6502. In principle, LLVM can simply optimize most or all of them away, if it's able to see the entire program at once at link time.

johnwbyrd · Post by **johnwbyrd** » Thu Sep 10, 2020 10:31 pm

White Flame wrote:

Does LLVM have different classes of registers that it can handle differently? Because maybe you could define zeropage as a set of 256 registers. Just a random thought.

https://github.com/johnwbyrd/llvm-mos/w ... n-thoughts

laoo · Post by **laoo** » Fri Sep 11, 2020 8:54 am

johnwbyrd wrote:

In practice this does mean that LLVM generated code will move all code from 16-bit memory to 8-bit memory before doing math operations on it and then pushing it back to 16 bit memory. However, I'm expecting that giving LLVM a plentiful range of virtual registers, should allow it to be smarter about what to cache there, and will hopefully avoid thrashing.

Really, this is no different from what most modern 6502 programmers do: they move memory to zero page before multiplying or doing multi-byte operations anyway. Although it seems wasteful to move memory back and forth, it works out to a net performance win because the instructions are so much faster and smaller when working on zero page.

Yeah, you're right. When there is no other choice and some extensive data crunching must be done, doing it on zero page should amortize the need to copy some data in and out. The main win, though, for lighter but more common algorithms would be to keep temporaries on the zero page instead of on some software stack.
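A rough back-of-the-envelope check of the amortization, as a Python sketch. It uses the standard NMOS 6502 cycle counts for `LDA`/`STA` in zero page and absolute modes, and assumes each later access to the byte is a plain load or store, which saves one cycle in zero page. Indexed and indirect modes, and code size, are deliberately ignored, so treat the numbers as an illustration only.

```python
import math

# NMOS 6502 cycle counts for plain loads and stores.
CYCLES = {
    ("LDA", "zp"): 3, ("LDA", "abs"): 4,
    ("STA", "zp"): 3, ("STA", "abs"): 4,
}


def copy_round_trip_cost():
    """Cycles to copy one byte into zero page and back out again."""
    copy_in = CYCLES[("LDA", "abs")] + CYCLES[("STA", "zp")]
    copy_out = CYCLES[("LDA", "zp")] + CYCLES[("STA", "abs")]
    return copy_in + copy_out


def saving_per_access():
    """Cycles saved each time the byte is accessed in zp, not abs."""
    return CYCLES[("LDA", "abs")] - CYCLES[("LDA", "zp")]


def break_even_accesses():
    """Accesses needed before the round-trip copy pays for itself."""
    return math.ceil(copy_round_trip_cost() / saving_per_access())


print(copy_round_trip_cost())   # 14 cycles per byte
print(break_even_accesses())    # 14 accesses
```

Under these assumptions a byte has to be touched about 14 times before copying it in and out breaks even on cycles, whereas a temporary that lives only in zero page pays no copy cost at all.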

johnwbyrd · Post by **johnwbyrd** » Sun Sep 13, 2020 8:22 pm

laoo wrote:

johnwbyrd wrote:

In practice this does mean that LLVM generated code will move all code from 16-bit memory to 8-bit memory before doing math operations on it and then pushing it back to 16 bit memory. However, I'm expecting that giving LLVM a plentiful range of virtual registers, should allow it to be smarter about what to cache there, and will hopefully avoid thrashing.

Really, this is no different from what most modern 6502 programmers do: they move memory to zero page before multiplying or doing multi-byte operations anyway. Although it seems wasteful to move memory back and forth, it works out to a net performance win because the instructions are so much faster and smaller when working on zero page.

Yeah, you're right. When there is no other choice and some extensive data crunching must be done, doing it on zero page should amortize the need to copy some data in and out. The main win, though, for lighter but more common algorithms would be to keep temporaries on the zero page instead of on some software stack.

The compiler will need to be able to handle the case where operands really are on the 16-bit stack, as a pessimal case. How's the assembler going over there for you?

After reviewing how clean your section specific zero page flag implementation was, I changed my mind and admitted it into the repository. Clean code is always the best possible argument.

johnwbyrd · Post by **johnwbyrd** » Sat Oct 10, 2020 12:33 am

BigEd wrote:

On reflection, it wouldn't hurt too much, I think, to insist that pointers should be aligned.

As you know, alignment is not strictly necessary on the 65xx series. I'm not sure whether I'll require it or not. Aligning 16- and 32-bit values has the convenient advantage that we don't have to deal with 6502 idiosyncrasies like the JMP (indirect) page-wrap bug, as well as timing discrepancies due to page crossings. I have no particular religion in this regard -- let's see how codegen goes.
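For reference, a minimal Python sketch of the NMOS 6502 `JMP (addr)` behaviour: when the pointer's low byte sits at $xxFF, the high byte is fetched from $xx00 instead of from the next page. The function name and the memory layout are made up for illustration.

```python
def jmp_indirect_target(memory, ptr, nmos_bug=True):
    """Return the address JMP (ptr) would jump to.

    With nmos_bug=True the high byte address wraps within the page,
    as on the NMOS 6502. With nmos_bug=False it is ptr + 1.
    """
    lo = memory[ptr]
    if nmos_bug:
        hi_addr = (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)
    else:
        hi_addr = (ptr + 1) & 0xFFFF
    hi = memory[hi_addr]
    return lo | (hi << 8)


memory = bytearray(0x10000)
memory[0x10FF] = 0x34   # low byte of the intended target $1234
memory[0x1100] = 0x12   # high byte of the intended target
memory[0x1000] = 0x56   # byte the buggy fetch reads instead

print(hex(jmp_indirect_target(memory, 0x10FF, nmos_bug=False)))  # 0x1234
print(hex(jmp_indirect_target(memory, 0x10FF, nmos_bug=True)))   # 0x5634
```

A pointer stored at an even address never has its low byte at $xxFF, so 2-byte alignment sidesteps the problem entirely.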

fachat · Post by **fachat** » Wed Dec 23, 2020 9:25 pm

johnwbyrd wrote:

laoo wrote:

The other thing that might be a problem when treating zp as a register file: when programming in assembly by hand we very seldom treat it as registers. There are virtually no scenarios where data from memory would be transferred to zp, manipulated and then written back - and that's the way registers work. We don't even have special instructions that operate on zp explicitly. It's just memory that is faster and has an additional use: storing pointers. So I'm afraid that tuning the compiler to treat zp as registers could produce ridiculous scenarios of transferring data back and forth between zp and memory.

For most compilers, this is an entirely valid concern. However, LLVM has a fairly advanced algorithm for keeping active variables in registers. Our strategy is to try to get LLVM to keep as much as possible in virtual registers (i.e. zero page) at all times.

In practice this does mean that LLVM generated code will move all code from 16-bit memory to 8-bit memory before doing math operations on it and then pushing it back to 16 bit memory. However, I'm expecting that giving LLVM a plentiful range of virtual registers, should allow it to be smarter about what to cache there, and will hopefully avoid thrashing.

I assume you're meaning "all data" is being moved?

Quote:

Really, this is no different from what most modern 6502 programmers do: they move memory to zero page before multiplying or doing multi-byte operations anyway. Although it seems wasteful to move memory back and forth, it works out to a net performance win because the instructions are so much faster and smaller when working on zero page.

Is that what "modern" programmers do? Maybe I'm old.

On the other hand, when I have an optimized library routine that uses zeropage, you'd have to copy data there.

I'd keep the "fast variables" in zeropage, without needing to copy them back and forth. Maybe there is a way to "pin" these "registers" so backing storage in 16-bit memory is not needed?
Other stuff that does not need to be as fast, I simply keep in 16-bit memory, without the need for copying it around.

Quote:

If it does turn out that the memory thrashing is pessimal in some usage cases, I'm sure we can create an LLVM pass to reduce it. But we should deal with that problem when it becomes a bottleneck and not before. Premature optimization and all.

Link-time code generation will, I strongly suspect, be incredibly important for the 6502. As many others have recognized, C stack operations are performance killers on the 6502. In principle, LLVM can simply optimize most or all of them away, if it's able to see the entire program at once at link time.

I'm looking forward to the results of your analysis! Should be very interesting!
