
All times are UTC




31 posts (page 1 of 3)
Posted: Mon Aug 10, 2020 7:47 pm

Joined: Mon May 01, 2017 7:13 am
Posts: 82
Hi folks, I just wanted to give you an update on the status of my MOS backend for the LLVM compiler suite, as of 2020.08.10. The code is very raw at this point, but the LLVM assembler and linker successfully built a hello-world type program for three 6502 targets, which provides a proof of concept.

The LLVM assembler, llvm-mc, understands and assembles all NMOS 6502 opcodes. The assembler correctly understands symbols, and it's possible to use them as branch targets, do pointer math on them, and the like. Fixups work as expected at link time. The assembler correctly deals with 6502 relative branches: BEQ, BCC, etc. all correctly calculate PC-relative offsets in the unusual 6502 convention, where the signed 8-bit displacement is added to the address of the following instruction, so the reachable range, measured from the branch opcode itself, is [-126, +129] bytes. Since llvm-mc is compatible with the GNU assembler (gas), you can use all GNU assembler features while writing 65xx code, including macros, ifdefs, and similar.

The assembler figures out, at assembly time, whether a symbol should refer to a zero-page or a 16-bit location. If you place a symbol in a section named ".zeropage", ".directpage", or ".zp", then that symbol is assumed to be located in zero page; otherwise, it is assumed to refer to a 16-bit address.

The assembler and linker both understand that $ is a legal prefix for hexadecimal constants. Much existing 6502 assembly code depends on this older convention. Everything that depends on the lexer (which is almost everything in LLVM) can now recognize 6502 format hexadecimal constants. The modern 0x prefix works fine as well.

Both the assembler and the linker support the ELF format, for both object files and executables. The ELF format has been extended with a machine type of 6502 (naturally) to permit storing 65xx code in ELF files. The extension also covers 65xx-compatible processors and includes support for 65xx-specific relocations and fixups.

Because the 6502 assembler and linker both work with ELF files, you can use any of your favorite ELF tools to inspect ELF files generated by the LLVM tools. llvm-readobj, llvm-objdump, llvm-objcopy, llvm-strip, and likely the other command-line tools as well, work as expected. This also means that generic tools that work on ELF files can read and dump basic information about MOS executables.

Hello-world type programs have been proven to compile, and work as expected, on emulated Commodore 64, VIC-20, and Apple II machines.

C support is unimplemented as of this writing. Don't try to compile C code. You will be sad.

I'm interested in finding people with LLVM experience who want to work on this project with me, either by developing new features or by beating out bugs. LLVM is a huge code base, and the barrier to being productive in it is quite high; but LLVM should be usable now as a gas-compatible assembler and linker for MOS and MOS clones.

Please do not post this information to social media. The code is absolutely not ready for mass consumption yet.

UPDATE 2021.06.21: This work has been merged with Daniel Thornburgh's work to create a functional C compiler. Please see https://www.llvm-mos.org for an overview.


Attachments:
hello-c64.png
hello-vic20.png
hello-apple2.png


Last edited by johnwbyrd on Thu Jun 17, 2021 6:00 am, edited 5 times in total.
Posted: Mon Aug 10, 2020 7:55 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Wow - this seems like a major step forward!


Posted: Mon Aug 10, 2020 8:15 pm

Joined: Mon May 01, 2017 7:13 am
Posts: 82
BigEd wrote:
Wow - this seems like a major step forward!


A baby step only. But this is the first actual 6502 LLVM project that does more than merely take up disk space.


Posted: Wed Aug 12, 2020 11:38 pm

Joined: Tue Jul 24, 2012 2:27 am
Posts: 672
Nice! It'll be interesting to see how a C ABI will map to it, and how much existing optimization can be reused from LLVM to have it take good advantage of zeropage etc.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Posted: Fri Aug 14, 2020 8:06 pm

Joined: Mon May 01, 2017 7:13 am
Posts: 82
White Flame wrote:
Nice! It'll be interesting to see how a C ABI will map to it, and how much existing optimization can be reused from LLVM to have it take good advantage of zeropage etc.


Well, there will be two calling conventions to the ABI. The lowest level is how the LLVM instructions will be implemented and lowered to 6502 code. I foresee LLVM being able to generate speed-optimized and size-optimized code. If the code generated is speed optimized, then LLVM will lower all LLVM instructions directly to 65xx. However, if LLVM is asked to generate size-optimized code, then LLVM will lower to a 6502-specific bytecode which is parsed by a run-time interpreter.

Conveniently, the latter calling convention was designed for me by the other more intelligent members of this board, on this thread: viewtopic.php?f=2&t=6181

Here is a concept sketch of the calling conventions and how they are similar: https://github.com/johnwbyrd/llvm-mos/w ... n-thoughts

Because of the way that LLVM optimizes code, I think that the calling convention (while it must of course exist) will be less important than it is for cc65. I strongly suspect that the default use case will enable link-time code generation, to maximize the possibilities for intelligent reuse of zero page. LLVM registers map to zero page. This idea is not original to me. LLVM spends a lot of time trying to figure out how to max out register usage.

Now's a great time for comments or advice, by the way.


Posted: Sun Aug 16, 2020 1:18 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1926
Location: Sacramento, CA, USA
Quote:
To implement something like:
Code:
    LOAD rdest, (rsource)

We would just:
Code:
load_indirect_short:
    lda ($1,x)
    sta $0001,y
load_indirect_char:
    lda ($0,x)
    sta $0000,y

Err ... that'll do something, but I don't think it's what you hope it'll do ... [Hint: ($1,x) is no bueno in this perceived context]

The desired behavior is certainly possible, but not nearly as clean, because what you really want is ($0,x),$1 (which unfortunately is not provided natively). I'll give you some time to think of your own solution to get that $1 outside the parentheses ... I can imagine a few different ways, but none of them are pretty.

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Posted: Sun Aug 16, 2020 4:30 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
johnwbyrd wrote:
The assembler and linker both understand that $ is a legal prefix for hexadecimal constants.... The modern 0x prefix works fine as well.

Actually, I think that the `$` prefix may be more "modern" than the `0x` prefix. :-) The former was not introduced until 1974, by Motorola, right?

Quote:
C support is unimplemented as of this writing. Don't try to compile C code. You will be sad.

Given that I am sad when compiling C code even if it successfully compiles and runs, "working as intended"? :-)

Clever comments aside, this is really great work, and I'm really happy to see it all easily available on GitHub. The only suggestion I would make there is to move all that great documentation in the wiki into `.md` files in your repo instead, so that pulling the repo gets the documentation, too, and so it's easier to track changes to it and to keep the code and documentation updates together. (Markdown files in a repo can be browsed and searched in pretty much the same way as the Wiki; see, e.g., my sedoc repo for an example.)

_________________
Curt J. Sampson - github.com/0cjs


Posted: Sun Aug 16, 2020 4:58 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
cjs wrote:
johnwbyrd wrote:
The assembler and linker both understand that $ is a legal prefix for hexadecimal constants.... The modern 0x prefix works fine as well.

Actually, I think that the `$` prefix may be more "modern" than the `0x` prefix. :-) The former was not introduced until 1974, by Motorola, right?

0x___ is C language. Did it even exist before that? C first appeared in '72; so if $ first appeared in '74, the $ is newer. I sure don't like 0x____. In any other context, x represents a digit that either you don't know or don't care about, or can have more than one value, like x86 and 680x0 in microprocessor families. I suppose a similar thing could be said about $; but assemblers usually will accept ___H as well, or in the case of Microchip's MPASM which I've used a lot, H'__' for example H'7E'.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Sun Aug 16, 2020 6:15 am

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1398
Location: Scotland
GARTHWILSON wrote:
cjs wrote:
johnwbyrd wrote:
The assembler and linker both understand that $ is a legal prefix for hexadecimal constants.... The modern 0x prefix works fine as well.

Actually, I think that the `$` prefix may be more "modern" than the `0x` prefix. :-) The former was not introduced until 1974, by Motorola, right?

0x___ is C language. Did it even exist before that? C first appeared in '72; so if $ first appeared in '74, the $ is newer. I sure don't like 0x____. In any other context, x represents a digit that either you don't know or don't care about, like x86 and 680x0 in microprocessor families. I suppose a similar thing could be said about $; but assemblers usually will accept ___H as well.


BCPL, c1966, uses #x as a hexadecimal prefix, or just # as an octal prefix. It's easy to see how that can become 0x for heX (and 0b for Binary and just 0 on its own for Octal) prefixes, given that BCPL led to B which led to C. It makes parsing a constant number easier if nothing else.

The H suffix is typically Intel (as far as I'm aware) and I've only ever seen/used it in 8080 (and Z80) code.

Acorn used & as a hexadecimal prefix for constants (e.g. in BBC BASIC).

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Posted: Mon Aug 17, 2020 8:15 pm

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 690
Location: North Tejas
Interesting tangent...

https://en.wikipedia.org/wiki/Hexspeak


Posted: Tue Sep 01, 2020 2:56 pm

Joined: Mon Jan 20, 2020 11:22 am
Posts: 9
Location: Wrocław, Poland
I must say that this is a project I've been looking for for a while now, because I'm currently in need of an assembler that produces reasonably relocatable code. Something that produces ELF files is just perfect!
I have zero LLVM experience. So far I've only managed to compile your sources (though I don't think I did it correctly, because I can't find llvm-mc among the executables), but I've already spotted a bug and filed an issue.
I'll try to read the code and help where I can.

Thanks for that 6502 LLVM implementation!


Posted: Tue Sep 01, 2020 5:29 pm

Joined: Fri Nov 16, 2018 8:55 pm
Posts: 71
Will this eventually get merged into the mainline LLVM codebase? Or, will it remain a fork?


Posted: Thu Sep 03, 2020 10:46 am

Joined: Mon Jan 20, 2020 11:22 am
Posts: 9
Location: Wrocław, Poland
Hi!
You decide whether a section is a zero-page section by checking whether its name is ".zp", ".zeropage" or ".directpage".
How about introducing a new section target specific flag? For example, it could be the letter "z" in the section declaration, and I think it could be done in the implementation.


Posted: Wed Sep 09, 2020 1:52 pm

Joined: Mon May 01, 2017 7:13 am
Posts: 82
load81 wrote:
Will this eventually get merged into the mainline LLVM codebase? Or, will it remain a fork?


Depends on how much traction it gets over time. The LLVM project has an incubator project that I'll probably introduce llvm-mos into at some point. I'd like to firm up a basic SDK with it, and maybe get some feedback from some old school 6502 coders about its stability. Work in progress and all. But yes, I'd like to see it become LLVM_EXPERIMENTAL someday.


Last edited by johnwbyrd on Wed Sep 09, 2020 10:28 pm, edited 2 times in total.

Posted: Wed Sep 09, 2020 1:57 pm

Joined: Mon May 01, 2017 7:13 am
Posts: 82
laoo wrote:
How about introducing new section target specific flag? It could have a letter "z" in section declaration for example and I think it can be done in the implementation.


I don't understand what a "section target specific flag" is.

If you're saying that you'd like to change the meaning of the .section command to add a modifier for indicating that that section goes into zero page, I'll probably want to give a no on that, because LLVM uses gas format assembly, and the gas format has been fixed for decades. AFAIK the gas assembler doesn't know anything implicitly about pointer lengths per platform. When I get to clang, all pointers will be 16 bits, because ANSI C assumes a consistent pointer size.

The .zeropage special section name retains backward compatibility, and it allows the assembly programmer to choose flexibly between 8-bit and 16-bit addresses (and even easily change between them if needed).

The old school method of hard-coding addresses as variables obviously works fine as well.

I tried to think all this through at https://github.com/johnwbyrd/llvm-mos/wiki/Zero-page . Perhaps you can improve things though.


Last edited by johnwbyrd on Wed Sep 09, 2020 2:59 pm, edited 4 times in total.
