
All times are UTC




PostPosted: Mon May 11, 2020 10:43 pm 

Joined: Mon May 01, 2017 7:13 am
Posts: 83
Hi all,

I've been experimenting with a skunkworks port of the LLVM tools to the 65xx. I'm at the point where I need to choose standard relocation types for the compiler and linker. LLVM clearly prefers ELF style executables and libraries, and I'm able to emit compiled assembly code plus fixups now.

One thing that's still up in the air is a canonical set of ELF relocation types for the 65xx series. Because arbitrary choices sometimes get set in stone, I figured I'd poll for a sanity check instead of declaring them unilaterally.

EDIT 2021.09.13: A version of this specification is viewable at:

https://llvm-mos.org/wiki/ELF_specification

Comments and improvements are solicited.


Last edited by johnwbyrd on Mon Sep 13, 2021 2:25 pm, edited 4 times in total.

PostPosted: Tue May 12, 2020 7:52 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Welcome! A good idea all round, I think, and good to have read up on previous work.

The only thing I can think of is stack addressing - not sure if that's a different case
TSX
LDA 01xx, X
It's an absolute two byte address, not a relocatable two byte address.

Actually, sometimes the same thing will happen with zero page: for some reason code might use a two byte address to refer to zero page.


PostPosted: Tue May 12, 2020 11:41 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8385
Location: Midwestern USA
johnwbyrd wrote:
One thing that's still currently up in the air, is a canonical set of ELF relocation types for the 65xx series...I've read Fachat's .o65 document. If there are any other well-established conventions for 6502 relocation types, I would appreciate being redirected.

André's o65 format is the only one of which I'm aware. It's well thought out and were I seeking to build relocatable 65xx binaries o65 is what I would likely implement.

That said, neither the 6502 nor the 65C02 is particularly well-suited to this sort of thing. The number one problem is that the 65xx instruction set is really geared to fixed addressing; the 6502 was not really conceived as a general-purpose microprocessor. Complicating matters is the stack being hard-wired to page 1 and the heavy dependence on zero page for indirection. It's not impossible, but it is very tricky to implement well.

The 65C816 is substantially better suited for this sort of thing due to its movable zero page (aka direct page), 16-bit stack pointer and availability of more flexible addressing modes (e.g., stack relative).

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Tue May 12, 2020 12:54 pm
Joined: Thu Mar 12, 2020 10:04 pm
Posts: 702
Location: North Tejas
BigDumbDinosaur wrote:
johnwbyrd wrote:
One thing that's still currently up in the air, is a canonical set of ELF relocation types for the 65xx series...I've read Fachat's .o65 document. If there are any other well-established conventions for 6502 relocation types, I would appreciate being redirected.

André's o65 format is the only one of which I'm aware. It's well thought out and were I seeking to build relocatable 65xx binaries o65 is what I would likely implement.


I have also been frustrated by the relocation types in the ELF format. It appears there are different sets for each processor type and they are not well documented without having to find source code which has to deal with it.

I am avoiding things like o65 and x86 OMF out of fear they may not be well suited for other processors.

BigDumbDinosaur wrote:
That said, neither the 6502 nor the 65C02 is particularly well-suited to this sort of thing. The number one problem is that the 65xx instruction set is really geared to fixed addressing; the 6502 was not really conceived as a general-purpose microprocessor. Complicating matters is the stack being hard-wired to page 1 and the heavy dependence on zero page for indirection. It's not impossible, but it is very tricky to implement well.


I get the distinct impression he is looking for a linkable file format and not a relocatable binary image.


PostPosted: Tue May 12, 2020 2:10 pm
Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
In my linker I export the required expression wrapped in an element that determines the target size. It lets you do some interesting things:
Code:
   .EXTERN LABA
   .EXTERN LABB
   .EXTERN LABC
   
LABD    .EQU (LABA-LABB)*2+LABC/3

   jmp   LABD

assembles into (my object code format is XML)
Code:
<?xml version='1.0'?>
<module target='65XX' endian='little' byteSize='8' name='test.obj'>
<section name='.code' size='3408'>4C<word><add><mul><sub><ext>LABA</ext><ext>LABB</ext></sub><val>2</val></mul><div><ext>LABC</ext><val>3</val></div></add></word>...
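
A linker consuming such an object file has to evaluate the expression tree once the external symbols are resolved. The following is only a minimal sketch of that step in C, not the actual linker described above: the `Node` type, `eval_expr`, and `emit_word` are hypothetical names, and the symbol addresses are assumed to be already resolved and stored in the `<ext>` nodes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node kinds mirroring the <val>, <ext>, <add>, <sub>,
 * <mul> and <div> elements in the XML object code shown above. */
typedef enum { N_VAL, N_EXT, N_ADD, N_SUB, N_MUL, N_DIV } NodeKind;

typedef struct Node {
    NodeKind kind;
    long value;                   /* N_VAL: constant; N_EXT: resolved symbol address */
    const struct Node *lhs, *rhs; /* operands of binary nodes, NULL for leaves */
} Node;

/* Recursively evaluate an expression tree. Division by zero is not checked. */
long eval_expr(const Node *n)
{
    switch (n->kind) {
    case N_VAL:
    case N_EXT: return n->value;
    case N_ADD: return eval_expr(n->lhs) + eval_expr(n->rhs);
    case N_SUB: return eval_expr(n->lhs) - eval_expr(n->rhs);
    case N_MUL: return eval_expr(n->lhs) * eval_expr(n->rhs);
    case N_DIV: return eval_expr(n->lhs) / eval_expr(n->rhs);
    }
    return 0;
}

/* The <word> wrapper: truncate to 16 bits and store little-endian. */
void emit_word(uint8_t *out, long v)
{
    unsigned long u = (unsigned long)v;
    out[0] = (uint8_t)(u & 0xFF);
    out[1] = (uint8_t)((u >> 8) & 0xFF);
}
```

With LABA = $2000, LABB = $1000 and LABC = $0300 (values chosen purely for illustration), the expression evaluates to $2100, so the operand bytes after the 4C opcode would be 00 21.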

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


PostPosted: Tue May 12, 2020 9:00 pm
Joined: Mon May 01, 2017 7:13 am
Posts: 83
Quote:
The only thing I can think of is stack addressing - not sure if that's a different case
TSX
LDA 01xx, X
It's an absolute two byte address, not a relocatable two byte address.


Perhaps I wasn't entirely clear in my original post... Although this work may have future applications to dynamic loaders, and relocations on loading and such, I'm really just focusing on extending the ELF format sufficiently to encompass all the kinds of fixups that would happen during ordinary linking of 65xx executables. ELF provides a great deal of flexibility for a target to declare its own relocation types. A short overview of ELF relocation types for x64 and x86 is at https://intezer.com/blog/elf/executable-and-linkable-format-101-part-3-relocations/.


PostPosted: Tue May 12, 2020 9:04 pm
Joined: Mon May 01, 2017 7:13 am
Posts: 83
Quote:
André's o65 format is the only one of which I'm aware. It's well thought out and were I seeking to build relocatable 65xx binaries o65 is what I would likely implement.


At this particular moment, I'm not intending to create another 6502 dynamic loadable format. This could certainly be done at some point, but I'm merely trying to extend ELF .lib and .o formats to encompass all the types of relocations that would normally be necessary, during LLVM compilation and linking of MOS targets. I'm sure that other authors here have thought through all the possible fixup types in their own linkers. Thoughts?


Last edited by johnwbyrd on Tue May 12, 2020 9:39 pm, edited 2 times in total.

PostPosted: Tue May 12, 2020 9:14 pm
Joined: Mon May 01, 2017 7:13 am
Posts: 83
BitWise wrote:
In my linker I export the required expression wrapped in an element that determines the target size. It lets you do some interesting things:
Code:
   .EXTERN LABA
   .EXTERN LABB
   .EXTERN LABC
   
LABD    .EQU (LABA-LABB)*2+LABC/3

   jmp   LABD



Thanks for the use case. LLVM seems to have support for complicated expression types built in: https://llvm.org/doxygen/classllvm_1_1MCExpr.html So it seems to me that LLVM should be able to handle complex linker expressions already, and I don't personally have to do anything special to support it.

You probably already know that LLVM uses GNU as (gas) syntax as its assembly format. I've extended LLVM to understand that $ is a hex prefix, as so much existing 6502 code depends on that convention. The more modern 0x convention works as well, just because that's what LLVM does anyway.
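
To illustrate the two hex conventions side by side, here is a minimal sketch of a literal parser in C. It is not LLVM's lexer; `parse_int_literal` is a hypothetical helper, and overflow is deliberately left unchecked to keep the example short.

```c
#include <stdbool.h>

/* Parse "$C000", "0xC000" or decimal "49152" into *out.
 * Returns false on an empty or malformed literal. Overflow is not checked. */
bool parse_int_literal(const char *text, unsigned long *out)
{
    unsigned base = 10;

    if (text[0] == '$') {                 /* traditional 6502 hex prefix */
        text += 1;
        base = 16;
    } else if (text[0] == '0' && (text[1] == 'x' || text[1] == 'X')) {
        text += 2;                        /* C-style hex prefix */
        base = 16;
    }
    if (*text == '\0')
        return false;                     /* prefix with no digits, or empty */

    unsigned long v = 0;
    for (; *text != '\0'; text++) {
        char c = *text;
        unsigned d;
        if (c >= '0' && c <= '9')      d = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f') d = (unsigned)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F') d = (unsigned)(c - 'A') + 10;
        else return false;
        if (d >= base)
            return false;                 /* e.g. 'a' in a decimal literal */
        v = v * base + d;
    }
    *out = v;
    return true;
}
```

Both `$C000` and `0xC000` yield the same value here, which is the behaviour described above.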


PostPosted: Tue May 12, 2020 9:38 pm
Joined: Mon May 01, 2017 7:13 am
Posts: 83
Quote:
I have also been frustrated by the relocation types in the ELF format. It appears there are different sets for each processor type and they are not well documented without having to find source code which has to deal with it.


Absolutely. I think you can appreciate my concern about checking with the community, before I run off and implement Yet Another Relocation Convention. Whichever one we go with, I'll make sure it is sanely documented somewhere.

Quote:
I am avoiding things like o65 and x86 OMF out of fear they may not be well suited for other processors.


Agreed. ELF contains a superset of information contained in those formats, and it might be possible at some point to write an elf2o65 converter; but premature optimization is still the root of all evil.

BillG wrote:
I get the distinct impression he is looking for a linkable file format and not a relocatable binary image.


BillG, you are entirely correct. The linkable file format that LLVM prefers is good old ELF. ELF must be extended per platform to support exactly the types of fixups that can exist for that particular platform during normal linking; but everyone seems to write these relocations without consulting anyone else, or documenting their work.


PostPosted: Wed May 13, 2020 2:56 am
Joined: Thu Mar 10, 2016 4:33 am
Posts: 176
I've been working on an assembler that currently outputs ELF as one of its output formats. I do intend to implement linking as well, but currently it only outputs a .o file. One of my next steps is relocations, so I could use the same constants that you define. Eventually I'll probably also need a relocating loader. Potentially I could use the LLVM linker instead?

So far readelf is able to read my output file and make sense of it. My next step was adding relocations but I haven't started that yet.

I wasn't too concerned with ELF compatibility as I didn't think that there was anything else out there to be compatible with, but I've been using the linux tools for testing, readelf etc. and it's quite helpful to maintain ELF compatibility so that these tools can be used.

ELF doesn't have standard constants for 6502, 65816 etc. as far as I could find. It would be good to document these too so that everyone uses the same values. I've made my own up currently, below is what I have been using.

The machine architecture. This is problematic for standard ELF as I started the numbering again, so these values would need to change if we had any sort of standard:

Code:
/* Legal values for e_machine (architecture).  */

#define EM_NONE          0              /* No machine */
#define EM_816           1              /* 65C816 */
#define EM_C02           2              /* 65C02 */
#define EM_02            3              /* NMOS 6502 */
#define EM_RC02          4              /* Rockwell 65C02 */
#define EM_RC19          5              /* Rockwell C19 */
#define EM_NUM           6


I have these (future) OS definitions. If there was such a thing as standard calling conventions we could use these constants to identify the calling convention standards.

Code:
/* Index into e_ident and legal values for e_ident[EI_OSABI] (OS ABI).  */

#define EI_OSABI                7       /* OS ABI identification */
#define ELFOSABI_OS16           66      /* OS 16 */
#define ELFOSABI_OS8            65      /* OS 8 */
#define ELFOSABI_STANDALONE     255     /* Standalone (embedded) application */


I've started work on relocations, and it looks like you have all the values covered for 6502 up to 65816 for simple cases. There could be more if, for example, you allowed for linking Program Counter Relative Long values across code modules. But we probably don't need that sort of complexity.

I have used Direct Page as a name in my code instead of Zero Page as it's more generalised. If you are working on 6502 output for LLVM, then targeting the 65C816 as well would be worthwhile, as it should generate significantly better code.


PostPosted: Wed May 13, 2020 6:33 am
Joined: Mon May 01, 2017 7:13 am
Posts: 83
Code:
#define EM_NONE          0              /* No machine */
#define EM_816           1              /* 65C816 */
#define EM_C02           2              /* 65C02 */
#define EM_02            3              /* NMOS 6502 */
#define EM_RC02          4              /* Rockwell 65C02 */
#define EM_RC19          5              /* Rockwell C19 */
#define EM_NUM           6


Hmm, if you're defining e_machine then I think those values are not compatible with the current ELF standard. My understanding is that there should be a single value for everything under the MOS umbrella (EM_MOS, let's say) and there should be architecture-specific e_flags values for each of the various instruction sets. So, for example, there should be exactly one e_machine value for everything related to MOS (shall we say 6502?), and there should be e_flags that look something like:

Code:
  EF_MOS_ARCH_6502    = 0x10,
  EF_MOS_ARCH_6502X   = 0x20,
  EF_MOS_ARCH_65SC02  = 0x30,
  EF_MOS_ARCH_65C02   = 0x40,
  EF_MOS_ARCH_SWEET16 = 0x50,
  EF_MOS_ARCH_65816   = 0x60


Here is how I'm currently doing it: https://github.com/johnwbyrd/llvm-mos/blob/mos/master/llvm/include/llvm/BinaryFormat/ELF.h
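
As a rough illustration of the "one e_machine, many e_flags" layout, the sketch below reads both fields from a little-endian ELF32 header held in memory. The offsets (e_machine at byte 18, e_flags at byte 36, header size 52) come from the standard ELF32 header layout; `EM_MOS_PROPOSED` is only the value floated above, not a registered machine number, and `mos_flags_from_elf32` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { EM_MOS_PROPOSED = 6502 };  /* assumption: the value suggested in this post */

enum {                            /* e_flags values as listed in this post */
    EF_MOS_ARCH_6502    = 0x10,
    EF_MOS_ARCH_6502X   = 0x20,
    EF_MOS_ARCH_65SC02  = 0x30,
    EF_MOS_ARCH_65C02   = 0x40,
    EF_MOS_ARCH_SWEET16 = 0x50,
    EF_MOS_ARCH_65816   = 0x60
};

static uint16_t rd16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 and stores e_flags if hdr is a little-endian ELF32 header
 * whose e_machine matches the proposed MOS value; returns -1 otherwise. */
int mos_flags_from_elf32(const uint8_t *hdr, size_t len, uint32_t *e_flags)
{
    if (len < 52)                               /* size of an ELF32 header */
        return -1;
    if (memcmp(hdr, "\x7f" "ELF", 4) != 0)      /* ELF magic */
        return -1;
    if (hdr[4] != 1 || hdr[5] != 1)             /* ELFCLASS32, ELFDATA2LSB */
        return -1;
    if (rd16le(hdr + 18) != EM_MOS_PROPOSED)    /* e_machine */
        return -1;
    *e_flags = rd32le(hdr + 36);                /* e_flags */
    return 0;
}
```

A tool would then compare the returned flags against the `EF_MOS_ARCH_*` values to pick the instruction set, instead of switching on several different e_machine numbers.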

Quote:
I have these (future) OS definitions. If there was such a thing as standard calling conventions we could use these constants to identify the calling convention standards.


I do have some ideas about how a calling convention will behave with LLVM, but of course nothing is set in stone. Please feel free to review and comment on https://github.com/johnwbyrd/llvm-mos/wiki/To-do, specifically the Calling Convention section.

Code:
/* Index into e_ident and legal values for e_ident[EI_OSABI] (OS ABI).  */

#define EI_OSABI                7       /* OS ABI identification */
#define ELFOSABI_OS16           66      /* OS 16 */
#define ELFOSABI_OS8            65      /* OS 8 */
#define ELFOSABI_STANDALONE     255     /* Standalone (embedded) application */


I'm not sure how an eabi even makes sense for 6502. I'm not aware of an "OS 16", are you? It's not like it's Linux versus Win32 out there for 6502. I suggest just using eabi-none, unless you can think of a compelling reason not to.

Quote:
I have used Direct Page as a name in my code instead of Zero Page as it's more generalised. If you are working on 6502 output for llvm then targeting the 65C816 would be worthwhile as well as it should generate significantly better code.


At the moment I'm simply referring to it as 8-bit addressing. "Direct page" to me implies that the other 16 bit memory addresses are accessed indirectly, and indirect addressing has another meaning on the 65xx.

I certainly want to leave the door open to '816 compatible output in the future, so yes, we might as well get those constants solidified early on.


PostPosted: Wed May 13, 2020 7:23 am
Joined: Thu Apr 23, 2020 5:04 pm
Posts: 50
johnwbyrd wrote:
ADDR8 A zero-page address.
ADDR16 A 16-bit address.
ADDR16_B0 The low byte of a 16-bit address.
ADDR16_B1 The high byte of a 16-bit address.
ADDR24 A 24-bit address: 16 bits plus an 8-bit segment.
ADDR24_B0 The lowest byte of a 24-bit address.
ADDR24_B1 The middle byte of a 24-bit address.
ADDR24_B2 The segment byte of a 24-bit address.
ADDR32 A 32-bit address (future expansion)
ADDR32_B0 Byte 0 of a 32-bit address (future expansion)
ADDR32_B1 Byte 1 of a 32-bit address (future expansion)
ADDR32_B2 Byte 2 of a 32-bit address (future expansion)
ADDR32_B3 Byte 3 of a 32-bit address (future expansion)


PC-relative addressing probably should be added.

I am not sure whether additional relocations might be useful for non-base models of the 6502 series.
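
To make the list above concrete, here is a hedged sketch of how a linker might apply a few of these relocation kinds, including an 8-bit PC-relative one for branches. The enum values, the `RelocKind` and `apply_reloc` names, and the range-checking policy are all assumptions for illustration; nothing here is a settled numbering.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the relocation kinds discussed above. */
typedef enum {
    R_ADDR8, R_ADDR16, R_ADDR16_B0, R_ADDR16_B1, R_ADDR24, R_PCREL8
} RelocKind;

/* Patch image[offset...] so it refers to `target`. `place` is the run-time
 * address of the byte being patched (only used for R_PCREL8).
 * Returns 0 on success, -1 if the value does not fit or is out of bounds. */
int apply_reloc(uint8_t *image, size_t image_len, size_t offset,
                RelocKind kind, uint32_t target, uint32_t place)
{
    size_t width = (kind == R_ADDR16) ? 2 : (kind == R_ADDR24) ? 3 : 1;
    if (offset > image_len || image_len - offset < width)
        return -1;

    uint8_t *p = image + offset;
    switch (kind) {
    case R_ADDR8:                       /* zero-page address */
        if (target > 0xFF) return -1;
        p[0] = (uint8_t)target;
        return 0;
    case R_ADDR16:                      /* little-endian 16-bit address */
        if (target > 0xFFFF) return -1;
        p[0] = (uint8_t)(target & 0xFF);
        p[1] = (uint8_t)(target >> 8);
        return 0;
    case R_ADDR16_B0:                   /* low byte, e.g. for an immediate load */
        p[0] = (uint8_t)(target & 0xFF);
        return 0;
    case R_ADDR16_B1:                   /* high byte */
        p[0] = (uint8_t)((target >> 8) & 0xFF);
        return 0;
    case R_ADDR24:                      /* 16 bits plus bank byte */
        if (target > 0xFFFFFF) return -1;
        p[0] = (uint8_t)(target & 0xFF);
        p[1] = (uint8_t)((target >> 8) & 0xFF);
        p[2] = (uint8_t)(target >> 16);
        return 0;
    case R_PCREL8: {                    /* branch: relative to the byte after the operand */
        int32_t disp = (int32_t)target - (int32_t)(place + 1);
        if (disp < -128 || disp > 127) return -1;
        p[0] = (uint8_t)disp;
        return 0;
    }
    }
    return -1;
}
```

The PC-relative case is the one that needs `place`: a branch whose operand byte sits at $1001 and targets $1000 gets the displacement $FE.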


PostPosted: Wed May 13, 2020 12:45 pm
Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
With respect to sub-models of the 65xx family, I think there are three major generations of the architecture, with some degree of upward compatibility.

1: The NMOS 6502. Supports the common minimum instruction set. Also supports some extended, "undocumented" opcodes that do not exist on later models. There are some bugs which a compiler and/or linker needs to be aware of and avoid.

2: The CMOS 6502. Supports an improved instruction set and fixes most of the noticeable bugs in the NMOS design. There are two common extensions to this architecture model which are unlikely to be used by compilers: the Rockwell instructions and the WDC WAI/STP instructions.

3: The 65816. As a compiler target, this is a very different animal from the earlier CPUs, with much more flexible addressing modes and better support for 16-bit operations. Is incompatible with the Rockwell instruction set extension, but usually implements WDC WAI/STP. Technically upwards compatible with earlier models, but only in Emulation mode; most systems will however run in Native mode.

There are also some very rare variants, which are not useful to write a compiler backend for.

So an e_flags field might look like this:
Code:
EF_MOS_ARCH_6502  = 0x01,  // basic architecture as documented in MOS datasheets
EF_MOS_ARCH_65C02 = 0x03,  // superset of 6502
EF_MOS_ARCH_65816 = 0x07,  // superset of 65C02, considerably enhanced
EF_MOS_ARCH_6502_NMOS     = 0x10,  // undocumented quirks/bugs of early CPUs
EF_MOS_ARCH_6502_ROCKWELL = 0x20,  // RMB, SMB, BBR, BBS instructions
EF_MOS_ARCH_6502_WDC      = 0x40,  // WAI and STP instructions
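
One consequence of encoding the base architectures as nested bit masks is that a compatibility check reduces to a single AND. A minimal sketch, assuming the flag values above and a hypothetical helper `mos_flags_compatible`:

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    EF_MOS_ARCH_6502          = 0x01,
    EF_MOS_ARCH_65C02         = 0x03,  /* includes the 6502 bit */
    EF_MOS_ARCH_65816         = 0x07,  /* includes the 65C02 bits */
    EF_MOS_ARCH_6502_NMOS     = 0x10,
    EF_MOS_ARCH_6502_ROCKWELL = 0x20,
    EF_MOS_ARCH_6502_WDC      = 0x40
};

/* An object can run on a CPU if every feature bit the object requires
 * is also present in the CPU's feature set. */
bool mos_flags_compatible(uint32_t cpu_flags, uint32_t object_flags)
{
    return (cpu_flags & object_flags) == object_flags;
}
```

For example, a 65816 with WAI/STP (0x07 | 0x40) accepts plain 65C02 objects (0x03), while a plain 65C02 rejects an object that needs the Rockwell bit instructions (0x03 | 0x20).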


PostPosted: Wed May 13, 2020 7:07 pm
Joined: Mon May 01, 2017 7:13 am
Posts: 83
Quote:
PC-relative addressing probably should be added.


D'oh! Of course, thank you very much.


PostPosted: Thu May 14, 2020 3:32 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8385
Location: Midwestern USA
Chromatix wrote:
2: The CMOS 6502. Supports an improved instruction set and fixes most of the noticeable bugs in the NMOS design. There are two common extensions to this architecture model which are unlikely to be used by compilers: the Rockwell instructions and the WDC WAI/STP instructions.

Just to be clear, the "Rockwell instructions" (BBR, SMB, etc.) were backported into the core 65C02 design by WDC shortly after conception (it's a not-uncommon misconception that the 65C02 was a Rockwell design; Rockwell was licensed by WDC to produce the 'C02). It is extremely unlikely one will encounter a 65C02 that doesn't have them.

That said, and as often noted, BBR, SMB and the like are limited in value, generally not being of much use unless device registers appear in zero page. If I were designing a 65C02 compiler I would ignore those instructions, since the more flexible TRB and TSB instructions can do what BBR, BBS, RMB and SMB do.
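
For readers less familiar with these opcodes, the following C sketch models the memory effect of TSB/TRB next to SMB/RMB. It is a behavioural illustration only, with a hypothetical `Cpu` struct holding just the accumulator and Z flag. TSB and TRB take their mask from the accumulator and set Z from A AND memory, whereas SMB and RMB encode a single bit number in the opcode and affect no flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal CPU state: accumulator and zero flag only. */
typedef struct {
    uint8_t a;
    bool z;
} Cpu;

/* TSB: Z is set from (A AND M) before the update, then M = M OR A. */
void tsb(Cpu *cpu, uint8_t *mem)
{
    cpu->z = ((cpu->a & *mem) == 0);
    *mem |= cpu->a;
}

/* TRB: Z is set from (A AND M) before the update, then M = M AND NOT A. */
void trb(Cpu *cpu, uint8_t *mem)
{
    cpu->z = ((cpu->a & *mem) == 0);
    *mem &= (uint8_t)~cpu->a;
}

/* SMBn / RMBn: set or clear one bit (0..7) of a zero-page byte; no flags change. */
void smb(uint8_t *mem, unsigned bit)
{
    *mem |= (uint8_t)(1u << bit);
}

void rmb(uint8_t *mem, unsigned bit)
{
    *mem &= (uint8_t)~(1u << bit);
}
```

Loading A with $80 and executing TSB has the same effect on memory as SMB7, at the cost of an accumulator load but with absolute addressing available as well as zero page.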


