LLVM for MOS update 2020.08.19
Hi folks, I just wanted to give you an update on the status of my MOS backend for the LLVM compiler suite, as of 2020.08.10. The code is very raw at this point, but the LLVM assembler has successfully assembled a hello-world-type program for three 6502 targets, which provides a proof of concept.
The LLVM assembler, llvm-mc, understands and assembles all NMOS 6502 opcodes. The assembler correctly understands symbols, and it's possible to use them as branch targets, do pointer math on them, and the like. Fixups work as expected at link time. The assembler correctly handles 6502 relative branches: BEQ, BCC, etc. all calculate PC-relative offsets in the unusual 6502 convention, in the range [-126, +129] bytes measured from the address of the branch instruction itself. Since llvm-mc is GNU assembler compatible, you can use all GNU assembler features while writing 65xx code, including macros, ifdefs, and similar.
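To make the [-126, +129] range concrete, here is a standalone Python sketch of how a branch operand byte can be computed. It is not llvm-mc code, and `branch_offset` is a hypothetical name. The 6502 adds the signed 8-bit offset to the address of the following instruction, which is two bytes past the branch opcode.

```python
def branch_offset(branch_addr: int, target: int) -> int:
    """Return the 8-bit operand byte for a 6502 relative branch.

    The CPU adds the signed offset to the address of the next
    instruction (branch_addr + 2), so, measured from the branch opcode
    itself, reachable targets span -126..+129 bytes.
    """
    delta = target - (branch_addr + 2)
    if not -128 <= delta <= 127:
        raise ValueError(
            f"branch target ${target:04X} out of range from ${branch_addr:04X}"
        )
    return delta & 0xFF  # two's-complement byte


# Example: BEQ at $C000 branching back to itself
print(f"{branch_offset(0xC000, 0xC000):02X}")  # FE
```

The range check is what turns an unreachable target into an error instead of a silently wrapped offset.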
The assembler is capable of intelligently figuring out whether symbols should refer to zero page or 16-bit locations, at the time of compilation. If, at compile time, you place a symbol in a section named ".zeropage", ".directpage", or ".zp", then that symbol will be assumed to be located in zero page; otherwise, it will be assumed to refer to a 16-bit address.
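A minimal Python sketch of that rule, assuming only the three section names listed above. The function and constant names are made up for illustration; this is not llvm-mc's code. For brevity the symbol's resolved address is passed in directly, whereas a real assembler would typically only choose the 2-byte or 3-byte form here and leave the address to a fixup. The opcodes are the standard NMOS 6502 encodings of LDA zero page ($A5) and LDA absolute ($AD).

```python
# Hypothetical model of the section-name rule; not llvm-mc source.
ZERO_PAGE_SECTIONS = {".zeropage", ".directpage", ".zp"}

LDA_ZEROPAGE = 0xA5  # LDA $nn   (2 bytes)
LDA_ABSOLUTE = 0xAD  # LDA $nnnn (3 bytes)


def encode_lda(address: int, section: str) -> bytes:
    """Encode LDA <symbol>, choosing the mode from the symbol's section."""
    if section in ZERO_PAGE_SECTIONS:
        if not 0 <= address <= 0xFF:
            raise ValueError("zero-page symbol must resolve to $00-$FF")
        return bytes([LDA_ZEROPAGE, address])
    # Any other section: assume a full 16-bit address, stored little-endian.
    return bytes([LDA_ABSOLUTE, address & 0xFF, (address >> 8) & 0xFF])


print(encode_lda(0x80, ".zp").hex())      # a580
print(encode_lda(0x1234, ".data").hex())  # ad3412
```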
The assembler and linker both understand that $ is a legal prefix for hexadecimal constants. Much existing 6502 assembly code depends on this older convention. Everything that depends on the lexer (which is almost everything in LLVM) can now recognize 6502 format hexadecimal constants. The modern 0x prefix works fine as well.
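Here is a Python sketch of accepting both prefixes when reading an integer token. `parse_int_literal` is a hypothetical helper, not part of LLVM's lexer. It only illustrates the $-prefix, 0x-prefix, and plain decimal forms described above.

```python
import string

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F


def parse_int_literal(token: str) -> int:
    """Parse an integer token in 6502 ($FF), C (0xFF), or decimal style."""
    if token.startswith("$"):
        digits = token[1:]
    elif token[:2].lower() == "0x":
        digits = token[2:]
    else:
        return int(token, 10)
    # Reject empty or non-hex digit strings such as "$" or "$G1".
    if not digits or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"bad hexadecimal constant: {token!r}")
    return int(digits, 16)


print(parse_int_literal("$D020"), parse_int_literal("0xD020"))  # 53280 53280
```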
Both the assembler and the linker support the ELF format, for both object files and executables. The ELF format has been extended with a machine type of 6502 (naturally) to permit storing 65xx code in ELF files, and it now includes support for 65xx-specific relocations and fixups.
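Because the machine type lives in the standard ELF header, even a tiny script can recognize a MOS object. The Python sketch below reads `e_machine` (a 16-bit field at offset 18, in the byte order given by `e_ident[EI_DATA]`) and compares it with 6502. It assumes that the machine number is literally the decimal value 6502, as stated above; `EM_6502` and `elf_machine` are names chosen for this example.

```python
import struct

EM_6502 = 6502  # machine number described above; the constant name is ours


def elf_machine(header: bytes) -> int:
    """Return e_machine from the first bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    order = {1: "<", 2: ">"}.get(header[5])
    if order is None:
        raise ValueError("unknown ELF data encoding")
    # e_type (2 bytes) sits at offset 16, e_machine (2 bytes) at offset 18
    (machine,) = struct.unpack_from(order + "H", header, 18)
    return machine


# A synthetic 20-byte header: ELFCLASS32, little-endian, ET_EXEC
ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
hdr = ident + struct.pack("<HH", 2, EM_6502)
print(elf_machine(hdr) == EM_6502)  # True
```

For a real file, passing the first 20 bytes read from it is enough.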
Because the 6502 assembler and linker both work with ELF files, you can use any of your favorite ELF tools to inspect ELF files generated by the LLVM tools. llvm-readobj, llvm-objdump, llvm-objcopy, llvm-strip, and likely the other command-line tools work as expected. Generic tools that work on ELF files can also read and dump basic information about MOS executables.
Hello-world type programs have been proven to compile, and work as expected, on emulated Commodore 64, VIC-20, and Apple II machines.
C support is unimplemented as of this writing. Don't try to compile C code. You will be sad.
I'm interested in finding people with LLVM experience who want to work on this project with me, either by developing new features or by beating out bugs. LLVM is a huge code base, and the barrier to being productive in it is quite high; but this backend should be usable now as a gas-compatible assembler and linker for MOS and MOS clones.
Please do not post this information to social media. The code is absolutely not ready for mass consumption yet.
UPDATE 2021.06.21: This work has been merged with Daniel Thornburgh's work to create a functional C compiler. Please see https://www.llvm-mos.org for an overview.
- Attachments
- hello-c64.png (98.1 KiB)
- hello-vic20.png (48.68 KiB)
- hello-apple2.png (6.41 KiB)
Last edited by johnwbyrd on Thu Jun 17, 2021 6:00 am, edited 5 times in total.
Re: LLVM for MOS update 2020.08.19
Wow - this seems like a major step forward!
Re: LLVM for MOS update 2020.08.19
BigEd wrote:
Wow - this seems like a major step forward!
White Flame
- Posts: 704
- Joined: 24 Jul 2012
Re: LLVM for MOS update 2020.08.19
Nice! It'll be interesting to see how a C ABI will map to it, and how much existing optimization can be reused from LLVM to have it take good advantage of zeropage etc.
Re: LLVM for MOS update 2020.08.19
White Flame wrote:
Nice! It'll be interesting to see how a C ABI will map to it, and how much existing optimization can be reused from LLVM to have it take good advantage of zeropage etc.
Conveniently, the latter calling convention was designed for me by the other more intelligent members of this board, on this thread: viewtopic.php?f=2&t=6181
Here is a concept sketch of the calling conventions and how they are similar: https://github.com/johnwbyrd/llvm-mos/w ... n-thoughts
Because of the way that LLVM optimizes code, I think that the calling convention (while it must of course exist) will be less important than it is for cc65. I strongly suspect that the default use case will enable link-time code generation, to maximize the opportunities for intelligent reuse of zero page. LLVM registers correspond to zero-page locations. This idea is not original to me. LLVM spends a lot of time trying to figure out how to max out register usage.
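To make "registers correspond to zero-page locations" concrete, here is a Python sketch of one possible layout. Everything in it is an assumption for illustration: the register names rcN and rsN, the base address $80, and the pairing rule are hypothetical, not the backend's actual register definitions. The one hardware fact it relies on is that (zp),Y addressing reads a pointer from two adjacent zero-page bytes, low byte first.

```python
from typing import Dict, Tuple


def zero_page_registers(base: int, count: int) -> Dict[str, int]:
    """Map hypothetical 8-bit registers rc0..rc{count-1} onto zero page."""
    if base < 0 or base + count > 0x100:
        raise ValueError("register file must fit inside zero page ($00-$FF)")
    return {f"rc{i}": base + i for i in range(count)}


def pointer_register(regs: Dict[str, int], n: int) -> Tuple[int, int]:
    """Hypothetical 16-bit pointer register rs{n}: the pair rc{2n}, rc{2n+1}.

    Because the pair is adjacent in zero page, it can serve directly as
    the pointer operand of (zp),Y addressing.
    """
    return regs[f"rc{2 * n}"], regs[f"rc{2 * n + 1}"]


regs = zero_page_registers(0x80, 16)
print(hex(regs["rc3"]))          # 0x83
print(pointer_register(regs, 1))  # (130, 131)
```

Under such a scheme, the register allocator's work of keeping values in registers translates directly into keeping values in zero page.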
Now's a great time for comments or advice, by the way.
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: LLVM for MOS update 2020.08.19
Quote:
To implement something like:
    LOAD rdest, (rsource)
We would just:
    load_indirect_short:
        lda ($1,x)
        sta $0001,y
    load_indirect_char:
        lda ($0,x)
        sta $0000,y
The desired behavior is certainly possible, but not nearly as clean, because what you really want is ($0,x),$1 (which unfortunately is not provided natively). I'll give you some time to think of your own solution to get that $1 outside the parentheses ... I can imagine a few different ways, but none of them are pretty.
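A small Python model of the addressing mode shows why the quoted `lda ($1,x)` doesn't fetch the second byte. `(zp,X)` adds X to the operand before the pointer is read, so `($1,x)` reads a different pointer (the bytes at X+1 and X+2) rather than the byte after the one `($0,x)` points at. The function names are made up for this sketch, and `mem` is a flat 64 KiB array standing in for the address space.

```python
def zp_pointer(mem: bytearray, zp_addr: int) -> int:
    """Read a little-endian pointer from zero page; addresses wrap at $FF."""
    lo = mem[zp_addr & 0xFF]
    hi = mem[(zp_addr + 1) & 0xFF]
    return lo | (hi << 8)


def lda_indexed_indirect(mem: bytearray, operand: int, x: int) -> int:
    """Model of LDA (operand,X): X is added before the pointer is fetched."""
    return mem[zp_pointer(mem, operand + x)]


def wanted_second_byte(mem: bytearray, x: int) -> int:
    """The desired (non-existent) ($0,X),1: follow the pointer, then add 1."""
    return mem[(zp_pointer(mem, x) + 1) & 0xFFFF]


mem = bytearray(0x10000)
mem[0x10], mem[0x11] = 0x00, 0x20      # pointer at $10/$11 -> $2000
mem[0x2000], mem[0x2001] = 0x34, 0x12  # 16-bit value $1234 stored at $2000

x = 0x10
print(hex(lda_indexed_indirect(mem, 0x00, x)))  # 0x34: low byte, as intended
print(hex(wanted_second_byte(mem, x)))          # 0x12: the byte actually wanted
print(hex(lda_indexed_indirect(mem, 0x01, x)))  # 0x0: pointer taken from $11/$12
```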
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!
Mike B. (about me) (learning how to github)
Re: LLVM for MOS update 2020.08.19
johnwbyrd wrote:
The assembler and linker both understand that $ is a legal prefix for hexadecimal constants.... The modern 0x prefix works fine as well.
Quote:
C support is unimplemented as of this writing. Don't try to compile C code. You will be sad.
Clever comments aside, this is really great work, and I'm really happy to see it all easily available on GitHub. The only suggestion I would make there is to move all that great documentation in the wiki into `.md` files in your repo instead, so that pulling the repo gets the documentation, too, and so it's easier to track changes to it and to keep the code and documentation updates together. (Markdown files in a repo can be browsed and searched in pretty much the same way as the Wiki; see, e.g., my sedoc repo for an example.)
Curt J. Sampson - github.com/0cjs
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
Re: LLVM for MOS update 2020.08.19
cjs wrote:
johnwbyrd wrote:
The assembler and linker both understand that $ is a legal prefix for hexadecimal constants.... The modern 0x prefix works fine as well.
0x___ is C language. Did it even exist before that? C first appeared in '72; so if $ first appeared in '74, the $ is newer. I sure don't like 0x____. In any other context, x represents a digit that either you don't know or don't care about, or can have more than one value, like x86 and 680x0 in microprocessor families. I suppose a similar thing could be said about $; but assemblers usually will accept ___H as well, or in the case of Microchip's MPASM which I've used a lot, H'__' for example H'7E'.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: LLVM for MOS update 2020.08.19
GARTHWILSON wrote:
cjs wrote:
johnwbyrd wrote:
The assembler and linker both understand that $ is a legal prefix for hexadecimal constants.... The modern 0x prefix works fine as well.
0x___ is C language. Did it even exist before that? C first appeared in '72; so if $ first appeared in '74, the $ is newer. I sure don't like 0x____. In any other context, x represents a digit that either you don't know or don't care about, like x86 and 680x0 in microprocessor families. I suppose a similar thing could be said about $; but assemblers usually will accept ___H as well.
The H suffix is typically Intel (as far as I'm aware), and I've only ever seen/used it in 8080 (and Z80) code.
Acorn used & as a hexadecimal prefix for constants (e.g. in BBC BASIC).
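For comparison, here is one value written in each of the notations raised in this thread, as a small Python sketch. The style labels are informal. The leading-zero rule for the Intel H suffix (0FFH rather than FFH, so the constant isn't read as a symbol name) is the usual convention in Intel-style assemblers, not something specified in this thread.

```python
from typing import Dict


def hex_notations(value: int) -> Dict[str, str]:
    """Spell a non-negative value in the hex notations mentioned above."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = f"{value:X}"
    intel = digits + "H"
    if not digits[0].isdigit():
        intel = "0" + intel  # Intel style needs a leading digit
    return {
        "6502": "$" + digits,
        "C": "0x" + digits,
        "Intel": intel,
        "Acorn": "&" + digits,
        "MPASM": f"H'{digits}'",
    }


for style, text in hex_notations(0x7E).items():
    print(f"{style:6} {text}")
```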
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Re: LLVM for MOS update 2020.08.19
I must say that I've been looking for a project like this for a while now, because I currently need an assembler that produces reasonably relocatable code. Something that produces ELF files is just perfect!
I have zero LLVM experience. I only managed to compile your sources (and I don't think I did it correctly, because I can't find llvm-mc among the executables), but I've already spotted a bug and opened an issue.
I'll try to read the code and help somewhere.
Thanks for that 6502 LLVM implementation!
Re: LLVM for MOS update 2020.08.19
Will this eventually get merged into the mainline LLVM codebase? Or, will it remain a fork?
Re: LLVM for MOS update 2020.08.19
Hi!
You treat a section as a zero-page section if its name is ".zp", ".zeropage" or ".directpage".
How about introducing a new target-specific section flag? For example, it could be the letter "z" in the section declaration, and I think it could be done in the implementation.
Re: LLVM for MOS update 2020.08.19
load81 wrote:
Will this eventually get merged into the mainline LLVM codebase? Or, will it remain a fork?
Last edited by johnwbyrd on Wed Sep 09, 2020 10:28 pm, edited 2 times in total.
Re: LLVM for MOS update 2020.08.19
laoo wrote:
How about introducing new section target specific flag? It could have a letter "z" in section declaration for example and I think it can be done in the implementation.
If you're saying that you'd like to change the meaning of the .section directive to add a modifier indicating that the section goes into zero page, I'll probably have to say no to that, because LLVM uses gas-format assembly, and the gas format has been fixed for decades. AFAIK the GNU assembler doesn't know anything implicitly about pointer lengths per platform. When I get to clang, all pointers will be 16 bits, because ANSI C assumes a consistent pointer size.
The .zeropage special section name retains backward compatibility, and it allows the assembly programmer to choose flexibly between 8- and 16-bit addresses (and even to change between them easily if needed).
The old-school method of hard-coding addresses as variables obviously works fine as well.
I tried to think all this through at https://github.com/johnwbyrd/llvm-mos/wiki/Zero-page . Perhaps you can improve things though.
Last edited by johnwbyrd on Wed Sep 09, 2020 2:59 pm, edited 4 times in total.