LLVM for MOS update 2020.08.19

Programming the 6502 microprocessor and its relatives in assembly and other languages.
johnwbyrd
Posts: 89
Joined: 01 May 2017

Post by johnwbyrd »

Hi folks, I just wanted to give you an update on the status of my MOS backend for the LLVM compiler suite, as of 2020.08.10. The code is very raw at this point, but the LLVM assembler successfully assembled a hello-world type program for three 6502 targets, thus providing a proof of concept.

The LLVM assembler, llvm-mc, understands and assembles all NMOS 6502 opcodes. It handles symbols correctly: you can use them as branch targets, do pointer math on them, and the like. Fixups work as expected at link time. The assembler also deals correctly with 6502 relative branches: BEQ, BCC, etc. all calculate PC-relative offsets in the unusual 6502 convention, which gives a range of [-126,+129] measured from the branch instruction itself. Since llvm-mc is GNU assembler compatible, you can use all GNU assembler features while writing 65xx code, including macros, ifdefs, and similar.
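As a rough illustration of where the [-126,+129] range comes from, here is a minimal Python sketch of the offset calculation. It is not taken from the llvm-mos sources; the function name `encode_branch_offset` is hypothetical, and it assumes the standard 6502 rule that the signed 8-bit operand is added to the address of the instruction following the two-byte branch.

```python
def encode_branch_offset(branch_addr: int, target_addr: int) -> int:
    """Return the operand byte for a 6502 relative branch (BEQ, BCC, ...).

    The CPU adds the signed operand to the address of the instruction
    *after* the two-byte branch, so -128..+127 from there is -126..+129
    when measured from the branch opcode itself.
    """
    displacement = target_addr - (branch_addr + 2)
    if not -128 <= displacement <= 127:
        distance = target_addr - branch_addr
        raise ValueError(f"branch target out of range: {distance:+d}")
    return displacement & 0xFF  # two's-complement byte


if __name__ == "__main__":
    # A branch to itself is a two-byte loop whose operand is $FE.
    print(hex(encode_branch_offset(0x1000, 0x1000)))
```

The two extremes, 129 bytes forward and 126 bytes back from the branch, encode as $7F and $80 respectively.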

The assembler can work out, at assembly time, whether a symbol refers to a zero page location or a 16-bit location. If you place a symbol in a section named ".zeropage", ".directpage", or ".zp", then that symbol will be assumed to be located in zero page; otherwise, it will be assumed to refer to a 16-bit address.
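The effect of that rule can be sketched in a few lines of Python. This is only a model, not the actual llvm-mos logic: the helper names are hypothetical, and matching the section name exactly is an assumption (the real assembler may be more lenient). The two LDA opcodes are the standard NMOS 6502 encodings.

```python
# Section names the post says are treated as zero page.
ZERO_PAGE_SECTIONS = {".zeropage", ".directpage", ".zp"}

LDA_ZERO_PAGE = 0xA5  # LDA zp:  opcode + 1 address byte
LDA_ABSOLUTE = 0xAD   # LDA abs: opcode + 2 address bytes, little-endian


def is_zero_page_section(section_name: str) -> bool:
    """Exact-match check; an assumption made for this sketch."""
    return section_name in ZERO_PAGE_SECTIONS


def encode_lda(section_name: str, address: int) -> bytes:
    """Pick the 8-bit or 16-bit form of LDA from the symbol's section."""
    if is_zero_page_section(section_name):
        if not 0 <= address <= 0xFF:
            raise ValueError("zero page symbol must resolve to $00..$FF")
        return bytes([LDA_ZERO_PAGE, address])
    return bytes([LDA_ABSOLUTE, address & 0xFF, (address >> 8) & 0xFF])


if __name__ == "__main__":
    print(encode_lda(".zp", 0x10).hex())    # two-byte form
    print(encode_lda(".data", 0x10).hex())  # three-byte form
```

The same address assembles to a two-byte or three-byte instruction depending only on the section the symbol was declared in.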

The assembler and linker both understand that $ is a legal prefix for hexadecimal constants. Much existing 6502 assembly code depends on this older convention. Everything that depends on the lexer (which is almost everything in LLVM) can now recognize 6502 format hexadecimal constants. The modern 0x prefix works fine as well.
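For readers unfamiliar with the two conventions, the following Python sketch shows the behaviour being described: both `$` and `0x` introduce a hexadecimal constant. It is an illustration only and says nothing about how the LLVM lexer is actually written; `parse_int_literal` is a hypothetical name.

```python
def parse_int_literal(text: str) -> int:
    """Parse a numeric constant with a 6502-style or C-style hex prefix."""
    text = text.strip()
    if text.startswith("$"):
        return int(text[1:], 16)   # 6502 convention: $C000
    if text.lower().startswith("0x"):
        return int(text[2:], 16)   # C convention: 0xC000
    return int(text, 10)           # plain decimal


if __name__ == "__main__":
    print(parse_int_literal("$C000") == parse_int_literal("0xC000"))
```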

Both the assembler and the linker support the ELF format, for both object files and executables. The ELF format has been extended with a machine type of 6502 (naturally) to permit storing code for 65xx compatible processors in ELF files, and it includes support for 65xx specific relocations and fixups.

Because the 6502 assembler and linker both work with ELF files, you can use any of your favorite ELF tools to inspect or understand ELF files generated by the LLVM tools. The llvm-readobj, llvm-objdump, llvm-objcopy, and llvm-strip tools, and likely the other command line tools as well, work as expected. This also means that generic tools that work on ELF files can read and dump basic information about MOS executables.
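As a sketch of what "generic tools can read basic information" means in practice, the Python below pulls the machine type out of an ELF header using only the standard library. The field offsets follow the standard ELF header layout; the constant name `EM_MOS` and the helper names are assumptions made for this example, with the value 6502 taken from the post above.

```python
import struct

EM_MOS = 6502  # machine type mentioned in the post; the name is assumed


def elf_machine(header: bytes) -> int:
    """Return e_machine from the first 20 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] is EI_DATA: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return machine


def make_test_header(machine: int) -> bytes:
    """Build a minimal little-endian 32-bit header for demonstration."""
    e_ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)  # 16 bytes in total
    return e_ident + struct.pack("<HH", 1, machine)     # e_type = ET_REL


if __name__ == "__main__":
    print(elf_machine(make_test_header(EM_MOS)) == EM_MOS)
```

On a real object file you would read the first 20 bytes in binary mode and pass them to `elf_machine`.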

Hello-world type programs have been shown to assemble and run as expected on emulated Commodore 64, VIC-20, and Apple II machines.

C support is unimplemented as of this writing. Don't try to compile C code. You will be sad.

I'm interested in finding people with LLVM experience who want to work on this project with me, either by developing new features or beating out bugs. LLVM is a huge code base, and the barrier to being productive in it is quite high; but LLVM should be usable now as a gas-compatible assembler and linker for MOS and MOS clones.

Please do not post this information to social media. The code is absolutely not ready for mass consumption yet.

UPDATE 2021.06.21: This work has been merged with Daniel Thornburgh's work to create a functional C compiler. Please see https://www.llvm-mos.org for an overview.
Attachments: hello-c64.png, hello-vic20.png, hello-apple2.png
Last edited by johnwbyrd on Thu Jun 17, 2021 6:00 am, edited 5 times in total.
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England

Post by BigEd »

Wow - this seems like a major step forward!
johnwbyrd
Posts: 89
Joined: 01 May 2017

Post by johnwbyrd »

BigEd wrote:
Wow - this seems like a major step forward!
A baby step only. But this is the first actual 6502 LLVM project that does more than merely take up disk space.
White Flame
Posts: 704
Joined: 24 Jul 2012

Post by White Flame »

Nice! It'll be interesting to see how a C ABI will map to it, and how much existing optimization can be reused from LLVM to have it take good advantage of zeropage etc.
johnwbyrd
Posts: 89
Joined: 01 May 2017

Post by johnwbyrd »

White Flame wrote:
Nice! It'll be interesting to see how a C ABI will map to it, and how much existing optimization can be reused from LLVM to have it take good advantage of zeropage etc.
Well, there will be two calling conventions to the ABI. The lowest level is how the LLVM instructions will be implemented and lowered to 6502 code. I foresee LLVM being able to generate speed-optimized and size-optimized code. If the code generated is speed optimized, then LLVM will lower all LLVM instructions directly to 65xx. However, if LLVM is asked to generate size-optimized code, then LLVM will lower to a 6502-specific bytecode which is parsed by a run-time interpreter.

Conveniently, the latter calling convention was designed for me by the other more intelligent members of this board, on this thread: viewtopic.php?f=2&t=6181

Here is a concept sketch of the calling conventions and how they are similar: https://github.com/johnwbyrd/llvm-mos/w ... n-thoughts

Because of the way that LLVM optimizes code, I think that the calling convention (while it must of course exist) will be less important than it is for cc65. I strongly suspect that the default use case will enable link time code generation, to maximize the possibilities for intelligent reuse of zero page. In this scheme, LLVM registers map to zero page locations; this idea is not original to me. LLVM spends a lot of time trying to figure out how to max out register usage.

Now's a great time for comments or advice, by the way.
barrym95838
Posts: 2056
Joined: 30 Jun 2013
Location: Sacramento, CA, USA

Post by barrym95838 »

Quote:
To implement something like:

    LOAD rdest, (rsource)
We would just:

load_indirect_short:
    lda ($1,x)
    sta $0001,y
load_indirect_char:
    lda ($0,x)
    sta $0000,y
Err ... that'll do something, but I don't think it's what you hope it'll do ... [Hint: ($1,x) is no bueno in this perceived context]

The desired behavior is certainly possible, but not nearly as clean, because what you really want is ($0,x),$1 (which unfortunately is not provided natively). I'll give you some time to think of your own solution to get that $1 outside the parentheses ... I can imagine a few different ways, but none of them are pretty.
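To make the objection concrete, here is a small Python model of the addressing mode, offered as a sketch rather than as anyone's proposed fix. It assumes the standard 6502 behaviour for `(zp,x)`: the operand is added to X first (wrapping within zero page), and the 16-bit pointer is fetched from that location. The helper names and memory contents are made up for the example.

```python
def read_pointer(mem: bytearray, zp_addr: int) -> int:
    """Fetch a little-endian 16-bit pointer from zero page (with wrap)."""
    lo = mem[zp_addr & 0xFF]
    hi = mem[(zp_addr + 1) & 0xFF]
    return lo | (hi << 8)


def lda_indexed_indirect(mem: bytearray, operand: int, x: int) -> int:
    """Model of lda (operand,x): the operand moves the pointer's location."""
    return mem[read_pointer(mem, operand + x)]


def lda_pointer_plus_offset(mem: bytearray, x: int, offset: int) -> int:
    """The hoped-for ($0,x),offset: pointer at X, offset applied afterwards."""
    return mem[(read_pointer(mem, x) + offset) & 0xFFFF]


mem = bytearray(0x10000)
mem[0x10] = 0x00    # "register" at $10 holds the pointer $2000 ...
mem[0x11] = 0x20
mem[0x12] = 0x30    # ... so the bytes at $11/$12 read as the pointer $3020
mem[0x2000] = 0xAA  # low byte of the value we want to load
mem[0x2001] = 0xBB  # high byte of the value we want to load
mem[0x3020] = 0xCC  # unrelated byte

if __name__ == "__main__":
    print(hex(lda_indexed_indirect(mem, 0x01, 0x10)))     # unrelated byte
    print(hex(lda_pointer_plus_offset(mem, 0x10, 1)))     # intended byte
```

With X = $10, `lda ($1,x)` follows a misaligned pointer assembled from $11 and $12 and returns the unrelated byte, whereas the intended high byte lives at the original pointer plus one.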
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)
cjs
Posts: 759
Joined: 01 Dec 2018
Location: Tokyo, Japan

Post by cjs »

johnwbyrd wrote:
The assembler and linker both understand that $ is a legal prefix for hexadecimal constants.... The modern 0x prefix works fine as well.
Actually, I think that the `$` prefix may be more "modern" than the `0x` prefix. :-) The former was not introduced until 1974, by Motorola, right?
Quote:
C support is unimplemented as of this writing. Don't try to compile C code. You will be sad.
Given that I am sad when compiling C code even if it successfully compiles and runs, "working as intended"? :-)

Clever comments aside, this is really great work, and I'm really happy to see it all easily available on GitHub. The only suggestion I would make there is to move all that great documentation in the wiki into `.md` files in your repo instead, so that pulling the repo gets the documentation, too, and so it's easier to track changes to it and to keep the code and documentation updates together. (Markdown files in a repo can be browsed and searched in pretty much the same way as the Wiki; see, e.g., my sedoc repo for an example.)
Curt J. Sampson - github.com/0cjs
GARTHWILSON
Forum Moderator
Posts: 8773
Joined: 30 Aug 2002
Location: Southern California

Post by GARTHWILSON »

cjs wrote:
johnwbyrd wrote:
The assembler and linker both understand that $ is a legal prefix for hexadecimal constants.... The modern 0x prefix works fine as well.
Actually, I think that the `$` prefix may be more "modern" than the `0x` prefix. :-) The former was not introduced until 1974, by Motorola, right?

0x___ is C language. Did it even exist before that? C first appeared in '72; so if $ first appeared in '74, the $ is newer. I sure don't like 0x____. In any other context, x represents a digit that either you don't know or don't care about, or can have more than one value, like x86 and 680x0 in microprocessor families. I suppose a similar thing could be said about $; but assemblers usually will accept ___H as well, or in the case of Microchip's MPASM which I've used a lot, H'__' for example H'7E'.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
drogon
Posts: 1671
Joined: 14 Feb 2018
Location: Scotland

Post by drogon »

GARTHWILSON wrote:
cjs wrote:
johnwbyrd wrote:
The assembler and linker both understand that $ is a legal prefix for hexadecimal constants.... The modern 0x prefix works fine as well.
Actually, I think that the `$` prefix may be more "modern" than the `0x` prefix. :-) The former was not introduced until 1974, by Motorola, right?

0x___ is C language. Did it even exist before that? C first appeared in '72; so if $ first appeared in '74, the $ is newer. I sure don't like 0x____. In any other context, x represents a digit that either you don't know or don't care about, like x86 and 680x0 in microprocessor families. I suppose a similar thing could be said about $; but assemblers usually will accept ___H as well.
BCPL, c1966, uses #x as a hexadecimal prefix, or just # as an octal prefix. It's easy to see how that can become 0x for heX (and 0b for Binary and just 0 on its own for Octal) prefixes, given that BCPL led to B which led to C. It makes parsing a constant number easier if nothing else.

The H suffix is typically Intel (as far as I'm aware) and I've only ever seen/used it on 8080 (and Z80) code.

Acorn used & as a hexadecimal prefix for constants. (in e.g. BBC Basic)

-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
BillG
Posts: 710
Joined: 12 Mar 2020
Location: North Tejas

Post by BillG »

laoo
Posts: 9
Joined: 20 Jan 2020
Location: Wrocław, Poland

Post by laoo »

I must say that this is a project I have been looking for for a while now, because I'm currently in need of an assembler that will produce reasonably relocatable code. Something that produces ELF files is just perfect!
I have zero LLVM experience. So far I've only managed to compile your sources (and I don't think I did that correctly, because I can't find llvm-mc among the executables), but I've already spotted a bug and filed an issue.
I'll try to read the code and help somewhere.

Thanks for that 6502 LLVM implementation!
load81
Posts: 71
Joined: 16 Nov 2018

Post by load81 »

Will this eventually get merged into the mainline LLVM codebase? Or, will it remain a fork?
laoo
Posts: 9
Joined: 20 Jan 2020
Location: Wrocław, Poland

Post by laoo »

Hi!
You check whether a section is a zero-page section by its name: ".zp", ".zeropage" or ".directpage".
How about introducing a new section target specific flag? It could have a letter "z" in the section declaration, for example, and I think it can be done in the implementation.
johnwbyrd
Posts: 89
Joined: 01 May 2017

Post by johnwbyrd »

load81 wrote:
Will this eventually get merged into the mainline LLVM codebase? Or, will it remain a fork?
Depends on how much traction it gets over time. The LLVM project has an incubator project that I'll probably introduce llvm-mos into at some point. I'd like to firm up a basic SDK with it, and maybe get some feedback from some old school 6502 coders about its stability. Work in progress and all. But yes, I'd like to see it become LLVM_EXPERIMENTAL someday.
Last edited by johnwbyrd on Wed Sep 09, 2020 10:28 pm, edited 2 times in total.
johnwbyrd
Posts: 89
Joined: 01 May 2017

Post by johnwbyrd »

laoo wrote:
How about introducing a new section target specific flag? It could have a letter "z" in the section declaration, for example, and I think it can be done in the implementation.
I don't understand what a "section target specific flag" is.

If you're saying that you'd like to change the meaning of the .section command to add a modifier for indicating that that section goes into zero page, I'll probably want to give a no on that, because LLVM uses gas format assembly, and the gas format has been fixed for decades. AFAIK the gas assembler doesn't know anything implicitly about pointer lengths per platform. When I get to clang, all pointers will be 16 bits, because ANSI C assumes a consistent pointer size.

The .zeropage special section name retains backward compatibility, and it allows the assembly programmer to choose flexibly between 8 vs. 16 bit addresses (and even easily change between them if needed).

The old school method of hard-coding addresses as variables obviously works fine as well.

I tried to think all this through at https://github.com/johnwbyrd/llvm-mos/wiki/Zero-page . Perhaps you can improve things though.
Last edited by johnwbyrd on Wed Sep 09, 2020 2:59 pm, edited 4 times in total.