6502.org Forum

All times are UTC




[ 13 posts ]
Posted: Fri Oct 30, 2020 4:47 am

Joined: Mon May 01, 2017 7:13 am
Posts: 83
Hi all,

My LLVM port proceeds apace, and the assembler portion of things is stabilizing. One of the biggest questions I'm dealing with now is how LLVM decides whether to use the 8-bit or the 16-bit address form of an instruction. I'd like your feedback on my current approach. I'm hoping it will cover everyone's use cases, now and in the future.

The 65xx series of processors has multiple types of instructions that can be encoded differently, depending on the size of the target address. For example, consider:

Code:
lda hello,x


In the 6502 case, this instruction could be encoded as 0xb5, indicating that hello is an 8-bit (zero page) address. Alternately, it might be encoded as 0xbd, indicating that hello is a 16-bit address.
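A minimal sketch of that choice, assuming the final value of hello is already known (which, as explained next, is exactly what may not be true at assembly time). The opcodes are the 6502's LDA zp,X ($B5) and LDA abs,X ($BD); the function name is illustrative only.

```python
LDA_ZP_X = 0xB5   # LDA zp,X: opcode + 1-byte zero-page address
LDA_ABS_X = 0xBD  # LDA abs,X: opcode + 2-byte little-endian address

def encode_lda_x(address: int) -> bytes:
    """Encode `lda address,x`, preferring the shorter zero-page form when it fits."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("6502 addresses are 16 bits")
    if address <= 0xFF:
        return bytes([LDA_ZP_X, address])
    return bytes([LDA_ABS_X, address & 0xFF, address >> 8])
```

So encode_lda_x(0x3F) gives B5 3F, while encode_lda_x(0x1234) gives BD 34 12. One behavioral difference worth remembering: on the 6502, zp,X wraps the effective address within page zero, whereas abs,X does not, so the two encodings are not always interchangeable.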

To determine which encoding to use, the assembler must either (1) calculate the final value of the hello symbol, or (2) receive a hint about which address space hello should live in.

LLVM's assembler has a feature called relaxation, in which individual instructions may be replaced with larger instructions, depending on whether the target address can be encoded in the smaller instruction or not. However, the exact encoding of an instruction depends on the value that the symbol resolves to. And the value of that symbol might not be resolved until link time. By the time that the linker is running, the relaxation step of the assembler is well over.
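Relaxation is typically a fixpoint loop: lay out the code with the smallest encodings, grow any instruction whose operand does not fit, and repeat until nothing changes. Below is a minimal sketch that uses the 65C02's 2-byte BRA versus 3-byte JMP as the instruction being relaxed. The program representation and function names are hypothetical and are not LLVM's MC API.

```python
def fits_rel8(offset: int) -> bool:
    """True if a branch displacement fits in a signed byte."""
    return -128 <= offset <= 127

def relax(program, origin=0):
    """program: list of ("label", name), ("jump", target) or ("bytes", count).

    Every jump starts as a 2-byte BRA; any whose target is out of range grows to
    a 3-byte JMP. Layout is repeated until no size changes. Sizes only ever
    grow, so the loop terminates.
    """
    sizes = [2 if kind == "jump" else arg if kind == "bytes" else 0
             for kind, arg in program]
    while True:
        # Assign addresses using the current size estimates.
        addrs, labels, pc = [], {}, origin
        for (kind, arg), size in zip(program, sizes):
            addrs.append(pc)
            if kind == "label":
                labels[arg] = pc
            pc += size
        changed = False
        for i, (kind, arg) in enumerate(program):
            # A BRA displacement is relative to the address after the instruction.
            if kind == "jump" and sizes[i] == 2 and not fits_rel8(labels[arg] - (addrs[i] + 2)):
                sizes[i] = 3  # relax BRA -> JMP absolute
                changed = True
        if not changed:
            return sizes, labels
```

With [("label", "top"), ("bytes", 100), ("jump", "top"), ("jump", "far"), ("bytes", 200), ("label", "far")], the backward jump stays a 2-byte BRA and the forward one becomes a JMP. The zero-page case is harder because a symbol defined in another object file has no address at all while this loop runs.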

This chicken-and-egg problem has multiple solutions. It might be possible to create different pseudo opcodes, such as lda8 and lda16, that map directly to one specific encoding. But this solution is incompatible with all existing 6502 code. It also might be possible to modify the LLVM linker to rerun the assembly relaxation step once all memory addresses are finalized during linking. The gcc toolchain has some support for this. But as of this writing, this idea is still novel for the LLVM toolchain.

The solutions I've gone with provide multiple ways to tell the assembler and the linker that you want to put a symbol in zero page.

A symbol will resolve to an 8-bit address, and instruction encoding will occur under that assumption, if at least one of the following is true:

1. the value of the symbol resolves, at assembly time, to an 8-bit non-zero constant; or
2. the symbol is previously defined in a section with one of the following names: .zp, .zeropage, and .directpage; or
3. the symbol is defined in a section marked with the special z flag.

If none of these conditions apply, then the symbol will refer to a 16-bit address, and instruction encoding will take place under that assumption.
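The three rules can be modeled as a single predicate. This is a minimal sketch: the Symbol record and its fields are hypothetical stand-ins for whatever the assembler actually tracks, and the "previously defined" ordering requirement of rule 2 is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

ZP_SECTION_NAMES = {".zp", ".zeropage", ".directpage"}

@dataclass
class Symbol:
    name: str
    value: Optional[int] = None    # constant value known at assembly time, if any
    section: Optional[str] = None  # section the symbol was defined in, if known
    section_flags: str = ""        # ELF-style flag string, e.g. "z" or "ax"

def is_zero_page(sym: Symbol) -> bool:
    # Rule 1: resolves at assembly time to an 8-bit non-zero constant.
    if sym.value is not None and 0 < sym.value <= 0xFF:
        return True
    # Rule 2: defined in one of the specially named sections.
    if sym.section in ZP_SECTION_NAMES:
        return True
    # Rule 3: defined in a section carrying the special z flag.
    return "z" in sym.section_flags

def operand_size(sym: Symbol) -> int:
    """Address bytes the instruction will carry: 1 for zero page, otherwise 2."""
    return 1 if is_zero_page(sym) else 2
```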

So, one way to force a zero page access is fairly straightforward:

Code:
low_addr = 55 + 2 * 4
lda low_addr,x


Just define the address as a constant expression, and the assembler will deduce that a zero-page opcode is required. This is the classic solution, and most 8-bit programmers will be comfortable with it. The downside to this method is that you have to do all the memory management yourself, when ELF and the linker already have the information they need to assign those 8-bit addresses for you.
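Working through the example: the expression evaluates to 63 ($3F), which is non-zero and fits in a byte, so rule 1 applies and the 2-byte zero-page,X form is chosen. A quick check, using the LDA zp,X ($B5) and LDA abs,X ($BD) opcodes:

```python
low_addr = 55 + 2 * 4      # the constant expression from the example
assert low_addr == 0x3F    # 63 decimal: fits in 8 bits and is non-zero

# Zero-page,X form chosen by the assembler: opcode $B5 plus a 1-byte address.
zp_form = bytes([0xB5, low_addr])
# The absolute,X form it avoids: opcode $BD plus a 2-byte little-endian address.
abs_form = bytes([0xBD, low_addr & 0xFF, low_addr >> 8])

print(zp_form.hex(" "), "vs", abs_form.hex(" "))  # b5 3f vs bd 3f 00
```

Each such reference saves one byte of code.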

Another way to force a zero-page access is to tell the assembler your intention by placing the symbol in one of the specially named zero-page sections:

Code:
.section .zp
low_short:
.byte 0x00, 0x00
.section .text
high_short:
.byte 0x01, 0x00
lda low_short,x


A third way to force a zero page access is to mark the section with the special z flag:

Code:
.section .lowmemory,"z",@nobits
adrlowmemory: .ds.b 1


Here, the assembler understands that the adrlowmemory symbol will eventually be located in zero page, so every subsequent reference to it needs only a one-byte address.

This solution lets the linker figure out the exact location in zero-page memory where the symbol can go. It's up to the linker script to choose a reasonable zero-page location, on a per-target basis, for symbols marked as described above. Meanwhile, the assembler gets the hint it needs to make the lda instruction reference zero page, not 16-bit memory.
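How a linker might assign those addresses, as a minimal sketch: pack the zero-page sections one after another into a window and fail when the window is full. The window bounds ($02 to $FF here) are an assumption standing in for whatever a per-target linker script reserves; on a 6510, for instance, $00 and $01 are the on-chip I/O port. The function name is hypothetical.

```python
def place_zero_page(sections, start=0x02, end=0xFF):
    """Pack (name, size) zero-page sections into [start, end], in order.

    start/end are placeholders for the range a target's linker script reserves.
    """
    placement, pc = {}, start
    for name, size in sections:
        if pc + size > end + 1:
            raise MemoryError(f"zero page exhausted while placing {name} ({size} bytes)")
        placement[name] = pc
        pc += size
    return placement
```

For example, place_zero_page([(".zp", 2), (".lowmemory", 1)]) puts .zp at $02 and .lowmemory at $04. Only after this step is the final 8-bit value of each symbol known, which is why the assembler needs the hint up front.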

If you don't use any of these features, then by default all symbols end up in 16-bit memory. This is the safest, but most memory-hungry, option, and it should work fine for most applications. But people who want to mess with zero page directly have a choice of weapons in their arsenal to get there.

I should also mention that all this information gets serialized into ELF objects and executables, so that future tools can do other transforms or analysis on 6502 binary code directly.

Comments or thoughts or improvements?


Posted: Fri Oct 30, 2020 5:23 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8539
Location: Southern California
I think the linker scenario is the only one where this is relevant. Do you really need a linker if you're the only one working on a project? (I truly don't know, but I've never needed one myself, so this is all musing on my part. Assemblers running on modern computers are so fast that it's no problem to assemble the entire 6502 project every time with INCLude files. Having various parts pre-assembled and then linking them would not save any substantial amount of time. Perhaps there's something I'm not considering.) You can also tell from the addressing mode, as some modes are ZP-only. Otherwise, commercial assemblers I've used allow you to force the ZP mode by writing, for example, STA <foobar,X, where the < means "take the low byte and discard the high byte." Since the result is 8-bit, you'll get the ZP version. (Since I have not needed a linker, I can only hope this would work.)
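What the < operator does to the operand, sketched in Python. STA zp,X is $95 and STA abs,X is $9D on the 6502; the function names are illustrative. Note that < silently truncates: if foobar actually lived at $1242, STA <foobar,X would store into $42,X instead.

```python
STA_ZP_X, STA_ABS_X = 0x95, 0x9D  # STA zp,X and STA abs,X opcodes

def low_byte(value: int) -> int:
    """The `<` operator: keep the low 8 bits and discard the high byte."""
    return value & 0xFF

def encode_sta_x(operand: int) -> bytes:
    """Pick the zero-page form whenever the operand fits in one byte."""
    if operand <= 0xFF:
        return bytes([STA_ZP_X, operand])
    return bytes([STA_ABS_X, operand & 0xFF, operand >> 8])

foobar = 0x1242
print(encode_sta_x(foobar).hex(" "))            # 9d 42 12: absolute,X
print(encode_sta_x(low_byte(foobar)).hex(" "))  # 95 42: zero page,X, but address $42
```

In this sketch the encoder only sees numbers. The property that matters for linking is that an assembler can know the result of < is 8-bit even before foobar's value is known.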

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Fri Oct 30, 2020 8:32 am

Joined: Thu Apr 23, 2020 5:04 pm
Posts: 50
johnwbyrd wrote:
The solutions I've gone with provide multiple ways to tell the assembler and the linker that you want to put a symbol in zero page.

A symbol will resolve to an 8-bit address, and instruction encoding will occur under that assumption, if at least one of the following is true:

1. the value of the symbol resolves, at assembly time, to an 8-bit non-zero constant; or
2. the symbol is previously defined in a section with one of the following names: .zp, .zeropage, and .directpage; or
3. the symbol is defined in a section marked with the special z flag.

In vasm I use option 1 for constants (btw, why non-zero?), and there is a directive "zpage" that tells the assembler that a symbol is zero-page-addressable. I don't know your framework, but in my toolchain the assembler usually does not know which section an external symbol is defined in. Therefore, basing the decision on section names or flags does not work most of the time.


Posted: Fri Oct 30, 2020 9:21 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8481
Location: Midwestern USA
GARTHWILSON wrote:
I think the linker scenario is the only one where this is relevant. Do you really need a linker if you're the only one working on a project? (I truly don't know; but I've never needed one myself, which means this is all musings on my part. Assemblers running on modern computers are so fast it's no problem to assemble the entire 6502 project every time with INCLude files. Having various parts pre-assembled, then linking them, would not save any substantial amount of time...)

I'm with Garth on this one. Unless the assembler is running on the target machine and the target machine is slow or the assembly address on the target machine overlaps currently-running code, I too question the point behind linking as part of the object code generation step. Linking makes sense with compiled languages, such as C, mostly because external libraries that are not part of the source code being compiled are being referenced.

In the case of assembly language, at least in the 6502 universe, libraries are often source code as well, and thus have to be assembled along with the main program, even if not edited. Most 6502 software is cross-assembled on machines that are many times faster than the target system. Hence assembly time is really a non-consideration at this point. For example, the firmware for my POC V1.2 unit has nearly 15,000 lines of source code spread out over 61 files. Assembly time runs to about one second. So there is no point to selective assembly and linking—it would take longer than just assembling the entire source code.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Sat Oct 31, 2020 12:46 am

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
Quote:
Linking makes sense with compiled languages, such as C, mostly because external libraries that are not part of the source code being compiled are being referenced.
Agreed. Do you ever have a situation where you are reusing an assembly file but only use some of the functions in it? What do you do in that case? I haven't found a good solution for that other than manually commenting out the unused parts of the file. A linker might be able to help with that.


Posted: Sat Oct 31, 2020 5:06 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8481
Location: Midwestern USA
Druzyek wrote:
Quote:
Linking makes sense with compiled languages, such as C, mostly because external libraries that are not part of the source code being compiled are being referenced.
Agreed. Do you ever have a situation where you are reusing an assembly file but only use some of the functions in it? What do you do in that case? I haven't found a good solution for that other than manually commenting out the unused parts of the file. A linker might be able to help with that.

In a case such as you describe, I'd use conditional assembly directives.

Code:
.if MPU == 6502
    ...assemble this code...
.else
    .if MPU == 65C02
        ...assemble this code...
    .else
        .if MPU == 65C816
            ...assemble this code...
        .else
            .error "symbol MPU: bogus value!"
        .endif
    .endif
.endif

Posted: Sun Nov 01, 2020 3:51 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
Druzyek wrote:
Agreed. Do you ever have a situation where you are reusing an assembly file but only use some of the functions in it? What do you do in that case? I haven't found a good solution for that other than manually commenting out the unused parts of the file. A linker might be able to help with that.

Yeah, all the time. Mostly I just split files into ever smaller files, though now that I've read BDD's post above it occurs to me that conditional assembly is probably less messy yet still as reliable if you default all but the always-common code not to assemble. (Reliability, which means not only avoiding a successful assembly of code that won't work, but not accidentally including unused code, is achieved—one hopes—in both cases by not including code by default and relying on assembler errors to tell you about anything you forgot to include.)

This is clearly not nearly as reliable or nice as a linker, which can do this automatically, but having a separate link stage rather than doing whole-program assembly (or compilation) has its own issues, particularly when trying to produce highly optimized code. (E.g., dealing optimally with what gets assigned to the zero page in a large program is a lot easier if you can do it from a global view, as we see in this thread.)

So that's why I switched from linking (ASxxxx) to whole-program assembly (AS) a while back; the benefits of the latter outweigh those of the former for me. But it would be nice to be able to somehow bring some of the advantages of linking into the whole-program assembly world.

_________________
Curt J. Sampson - github.com/0cjs


Posted: Sun Nov 01, 2020 4:28 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8539
Location: Southern California
cjs wrote:
Yeah, all the time. Mostly I just split files into ever smaller files, though now that I've read BDD's post above it occurs to me that conditional assembly is probably less messy yet still as reliable if you default all but the always-common code not to assemble. (Reliability, which means not only avoiding a successful assembly of code that won't work, but not accidentally including unused code, is achieved—one hopes—in both cases by not including code by default and relying on assembler errors to tell you about anything you forgot to include.)

This is something I have not really thought about until now, but I'll blurt it out anyway. There's a ton of stuff you can do with conditional assembly. If the source code can test for various assembler status and error conditions, you could have conditional assembly ask the assembler if a certain thing is undefined after the first pass (meaning you'd also have to ask it which pass number it's on), and if it is, INCLude a particular source-code file.
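A rough model of that idea: after each pass, any symbol that is still undefined triggers an INCLude of the file that defines it, and passes continue until everything resolves. The file names, the symbol-table shape, and the function are hypothetical; a real assembler would do this with its own pass counter and "is defined" test.

```python
def resolve_includes(main_refs, library):
    """Repeat 'passes' until every referenced symbol is defined.

    library maps a file name to (symbols it defines, symbols it references).
    A file is included only when it defines something still undefined.
    """
    included, defined, needed = [], set(), set(main_refs)
    while needed - defined:
        progress = False
        for fname, (defs, refs) in library.items():
            if fname not in included and (needed - defined) & set(defs):
                included.append(fname)  # the conditional INCLude
                defined |= set(defs)
                needed |= set(refs)     # the new file may need more files
                progress = True
        if not progress:
            raise NameError(f"still undefined: {sorted(needed - defined)}")
    return included
```

With a library of print.s (defines print, calls putc), putc.s, and mul16.s, a program that calls only print pulls in print.s and putc.s and never assembles mul16.s.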

Posted: Sun Nov 01, 2020 4:55 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
GARTHWILSON wrote:
If the source code can test for various assembler status and error conditions, you could have conditional assembly ask the assembler if a certain thing is undefined after the first pass (meaning you'd also have to ask it which pass number it's on), and if it is, INCLude a particular source-code file.

Oh, wow, this is a great idea! AS does have a symbol indicating the current pass and, indeed, will do as many passes as necessary to complete an assembly, so one might actually be able to build something quite useful out of this. Thanks for the tip!

This does remind me of another issue that had occurred to me; on the 6800, when especially pressed for space it's nice to be able to locate some subroutines such that a maximal number of callers can use BSR (2 bytes) instead of JSR (3 bytes) to call them. But I don't think that any 6502 variants had a BSR instruction, did they? I know that the 6502 did not, nor many of the 65C02s. This kind of thing might arise with some BRA vs. JMP situations on the 65C02, but it seems less likely to me....

Posted: Sun Nov 01, 2020 5:16 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8539
Location: Southern California
cjs wrote:
This does remind me of another issue that had occurred to me; on the 6800, when especially pressed for space it's nice to be able to locate some subroutines such that a maximal number of callers can use BSR (2 bytes) instead of JSR (3 bytes) to call them. But I don't think that any 6502 variants had a BSR instruction, did they? I know that the 6502 did not, nor many of the 65C02s. This kind of thing might arise with some BRA vs. JMP situations on the 65C02, but it seems less likely to me....

It can be synthesized, but not economically as far as the number of bytes goes. On the '816, it would take seven bytes, but it would reach anywhere in the 64K bank (i.e., it's not limited to a -128 to +127 range). In my own experience, only a tiny percentage of subroutine calls have been to subroutines such a short distance away.
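For reference, the -128 to +127 limit works like this for 6502 conditional branches and the 65C02's BRA, and for the 6800's 2-byte BSR, which uses the same 8-bit relative form. A minimal sketch with illustrative function names:

```python
from typing import Optional

def rel8_offset(instr_addr: int, target: int, instr_len: int = 2) -> Optional[int]:
    """Signed 8-bit displacement for a relative branch at instr_addr, or None if out of range.

    The displacement is measured from the first byte after the instruction,
    so the reachable window is instr_addr + instr_len - 128 ... + 127.
    """
    offset = target - (instr_addr + instr_len)
    return offset if -128 <= offset <= 127 else None

def rel8_byte(offset: int) -> int:
    """Two's-complement byte that actually goes into the instruction."""
    return offset & 0xFF
```

Counting how many call sites get a non-None result for a candidate subroutine address is one way to quantify the placement question raised above.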

Posted: Sun Nov 01, 2020 3:02 pm

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
cjs wrote:
This does remind me of another issue that had occurred to me; on the 6800, when especially pressed for space it's nice to be able to locate some subroutines such that a maximal number of callers can use BSR (2 bytes) instead of JSR (3 bytes) to call them. But I don't think that any 6502 variants had a BSR instruction, did they? I know that the 6502 did not, nor many of the 65C02s. This kind of thing might arise with some BRA vs. JMP situations on the 65C02, but it seems less likely to me....


BSR is only truly useful for hand-optimized code. While a compiler with a good peephole analyzer may be able to replace some JSR instructions, it needs to be able to move a subroutine close to where it is called to really benefit.

It is interesting to note that while the Z80 added relative forms of the absolute and some conditional jumps to the 8080 instruction set, it did not add a relative call.

I have found BRA to be much more useful than BSR.


Posted: Sun Nov 01, 2020 3:24 pm

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
GARTHWILSON wrote:
cjs wrote:
Yeah, all the time. Mostly I just split files into ever smaller files, though now that I've read BDD's post above it occurs to me that conditional assembly is probably less messy yet still as reliable if you default all but the always-common code not to assemble. (Reliability, which means not only avoiding a successful assembly of code that won't work, but not accidentally including unused code, is achieved—one hopes—in both cases by not including code by default and relying on assembler errors to tell you about anything you forgot to include.)

This is something I have not really thought about until now, but I'll blurt it out anyway. There's a ton of stuff you can do with conditional assembly. If the source code can test for various assembler status and error conditions, you could have conditional assembly ask the assembler if a certain thing is undefined after the first pass (meaning you'd also have to ask it which pass number it's on), and if it is, INCLude a particular source-code file.


My home-grown tools do not currently include relocatable object files or a linker.

I have implemented an alternative I call the Poor Man's Linker. It relies on conditional assembly to control whether to include the various library files and which features within each file to activate. A small example can be seen here:

viewtopic.php?p=77431#p77431

It is actually more powerful than linking in some ways, in that individual library subroutines can be somewhat customized by the compiler.


Posted: Mon Nov 23, 2020 9:32 pm

Joined: Mon May 01, 2017 7:13 am
Posts: 83
GARTHWILSON wrote:
I think the linker scenario is the only one where this is relevant. Do you really need a linker if you're the only one working on a project?


Yes, you would probably only care about late address-size resolution if you were linking your code. You could probably build simple assembly programs without a linker, but for high-level languages, I suspect that lacking a linker in some form would be a non-starter. You'll need one for things like dead-code stripping or link-time optimization, as LLVM does.
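Dead-code stripping, which also addresses the earlier question about using only part of a reusable assembly file, amounts to a reachability walk: keep everything referenced, directly or indirectly, from the entry point and drop the rest. GNU ld and LLD do something similar with --gc-sections at section granularity; the sketch below works on a plain name graph, and the names in it are hypothetical.

```python
def live_set(entry, refs):
    """Return every name reachable from entry; refs maps a name to the names it references."""
    live, stack = set(), [entry]
    while stack:
        name = stack.pop()
        if name not in live:
            live.add(name)
            stack.extend(refs.get(name, ()))
    return live

refs = {"main": ["print"], "print": ["putc"], "putc": [], "mul16": ["putc"]}
live = live_set("main", refs)
dead = set(refs) - live  # {"mul16"}: never called, so it can be discarded
```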

GARTHWILSON wrote:
Commercial assemblers I've used allow you to force the ZP mode by saying for example STA <foobar,X, where the < means "take the low byte and discard the high byte." Since the result is 8-bit, you'll get the ZP version.


In llvm parlance, an operation taking a range of bits from a possibly larger address is referred to as a modifier. The llvm-mos project already supports a variety of these, including the < modifier you describe:

https://github.com/johnwbyrd/llvm-mos/wiki/Modifiers-for-assembly

Some test cases:

https://github.com/johnwbyrd/llvm-mos/blob/master/llvm/test/MC/MOS/modifiers.s

