6502.org Forum
All times are UTC
PostPosted: Thu Jul 11, 2019 11:04 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
I'm fairly sure Acorn had an approach to relocating code at one point: I think the ROM image was followed by a list of patch-up addresses. I'm not sure though that there's any approach in Acorn land to run code where the load address is not known until load time - it would be useful if there were, because in Acorn land the free RAM, after OS use, can start (and end) at a variety of places. Sometimes an application might load at 400, or at E00, or 1900, or 8000, or B800, depending. Other places too.

What would be nice then, would be an approach and a format which will run anywhere, possibly relocating itself. At least two things are needed:
- an executable header which is position independent and able to discover its location
- a means of patching up absolute addresses in the code
and possibly
- a means of block-moving the loaded and patched code to a different location

For discovering location, I had this thought:
- Use a few bytes just below 0200, having first saved those values on the stack
- place a routine there, call it, and it will examine the stack and report back

For example, this code
Code:
PLA
TAX
PLA
TAY
PHA
TXA
PHA
RTS
would pick its own return address (-1) from the stack. Being 8 bytes, we'd place it at 01f8, and we'd need first to save those 8 bytes from the bottom of the stack, on the stack. Something like
Code:
LDX #7
LDA $01F8,X ; copy $01FF down to $01F8
PHA
DEX
BPL *-5     ; back to the LDA

(Edit: we need first to be sure the stack is already at least 8 bytes deep! Maybe check S, or just subtract from S, or do a bunch of pushes.)

Then we'd need to store the code using immediate values
Code:
LDA #$68  ; PLA
STA $01F8
STA $01FA
LDA #$AA  ; TAX
STA $01F9
LDA #$A8  ; TAY
STA $01FB
LDA #$48  ; PHA
STA $01FC
STA $01FE
LDA #$8A  ; TXA
STA $01FD
LDA #$60  ; RTS
STA $01FF

Now we can call the code, and then repair the stack.

(We will of course need some workspace: in Acorn land there are known areas of zero page available to applications. Possibly it's worth somehow parameterising which section of zero page we use. We can use the stack to save anything we might intend to trample on temporarily.)

How to patch up absolute addresses? One approach is to have a list of addresses to patch, or possibly delta-encode that list. Or possibly use a byte-stuffing tactic and have a special value to stand for each address MSB.
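As a rough illustration of the delta-encoded patch list, here is a host-side Python model. It is a sketch under assumptions that are not in the post: relocation is by whole pages, so only the MSB of each absolute address changes; each gap between patched bytes fits in one byte; and a zero byte ends the list. The names `delta_encode` and `apply_patches` are made up for the example.

```python
def delta_encode(offsets):
    """Turn ascending patch offsets into one-byte gaps from the previous offset."""
    deltas, prev = [], 0
    for off in offsets:
        gap = off - prev
        if not 0 < gap < 256:
            raise ValueError("gap does not fit in one byte (assumed limit)")
        deltas.append(gap)
        prev = off
    return bytes(deltas)


def apply_patches(image, deltas, page_delta):
    """Add page_delta to every address MSB named by the delta-encoded list."""
    out = bytearray(image)
    pos = 0
    for gap in deltas:
        if gap == 0:  # assumed end-of-list marker
            break
        pos += gap
        out[pos] = (out[pos] + page_delta) & 0xFF
    return bytes(out)


# LDA $1907 / JMP $1906 / RTS / one data byte, assembled for $1900
image = bytes([0xAD, 0x07, 0x19, 0x4C, 0x06, 0x19, 0x60, 0x2A])
table = delta_encode([2, 5]) + b"\x00"             # MSBs live at offsets 2 and 5
moved = apply_patches(image, table, 0x0E - 0x19)   # relocate $1900 -> $0E00
```

On the 6502 itself the same walk is a short loop that adds each gap to a pointer and adjusts the byte it lands on.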

(If I've made any small mistakes, please let me know by PM and I'll correct them.)

But more interesting: have I missed some good way to tackle this, or not noticed some obstacle? Or prior art?


PostPosted: Thu Jul 11, 2019 12:36 pm
Joined: Sun Jun 29, 2014 5:42 am
Posts: 352
Interesting topic.

If you can exclude the possibility of NMI, then I think you can determine the load address of the code in a more efficient way that doesn't involve copying code to the stack.
Code:
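start:
SEI
LDA #$60  ;; RTS
STA $80
JSR $0080 ;; start + $07 is pushed to the stack
TSX       ;; X = S; the stale return address sits at $00FF+X and $0100+X
SEC
LDA $00FF,X ;; low byte; must assemble as absolute,X, not zero page,X
SBC #$07
STA $80
LDA $0100,X ;; high byte
SBC #$00
STA $81
CLI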

You should end up with the address of start in $80, $81.

I've not tested this so there could be bugs, but you get the idea....
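To check the arithmetic, here is a small Python model of what JSR leaves behind and how the subtraction recovers the load address. It is only a sketch: the function names are invented, page 1 is modelled as a 256-byte array, and it assumes S is at least 1 on entry so that the low byte really is found at $00FF+X.

```python
def jsr(page1, sp, jsr_addr):
    """Model JSR: push the address of the JSR's last byte, high byte first."""
    ret = (jsr_addr + 2) & 0xFFFF
    page1[sp] = ret >> 8
    page1[(sp - 1) & 0xFF] = ret & 0xFF
    return (sp - 2) & 0xFF


def rts(sp):
    """Model RTS as far as the stack pointer is concerned."""
    return (sp + 2) & 0xFF


def where_am_i(page1, sp, jsr_offset=5):
    """Rebuild the start address from the stale return address below/at S."""
    if sp == 0:
        raise ValueError("model assumes S >= 1")
    lo = page1[sp - 1]   # LDA $00FF,X
    hi = page1[sp]       # LDA $0100,X
    return (((hi << 8) | lo) - (jsr_offset + 2)) & 0xFFFF


def discover(start, sp=0xF0):
    page1 = bytearray(256)
    sp_in_call = jsr(page1, sp, start + 5)   # the JSR sits at start+5
    sp_back = rts(sp_in_call)
    return where_am_i(page1, sp_back)
```

For example, `discover(0x18FB)` exercises the borrow from the low byte into the high byte, since the pushed value is $1902.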

Dave


PostPosted: Thu Jul 11, 2019 12:55 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
Ah yes, that's quite nice.

BTW I realised we can stash 8 bytes from the bottom of the stack even if the stack has fewer bytes on it, if we're careful about ordering. Instead of
BigEd wrote:
Code:
LDX #7
LDA $01F8,X
PHA
DEX
BPL *-5


something like
Code:
LDX #$F8
LDA $0100,X ; copy $01F8 up to $01FF
PHA
INX
BNE *-5     ; back to the LDA; X wraps to 0 after $FF
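One reading of why the order matters: with a shallow stack, the live bytes are the ones nearest $01FF, and whichever bytes are pushed first land highest, possibly inside $01F8-$01FF where the routine is about to be written. The Python model below is a sketch of that reasoning only. The function name and the fill pattern are invented, and it checks just the save step, not the later restore.

```python
# PLA TAX PLA TAY PHA TXA PHA RTS
ROUTINE = bytes([0x68, 0xAA, 0x68, 0xA8, 0x48, 0x8A, 0x48, 0x60])


def live_bytes_survive(sp, ascending):
    """True if every live byte in $01F8-$01FF still has an intact saved copy
    after the 8-byte routine has been written over $01F8-$01FF."""
    page1 = bytearray((a * 7 + 3) & 0xFF for a in range(256))  # distinct filler
    original = bytes(page1)
    live = range(sp + 1, 0x100)          # bytes already in use above S
    order = range(0xF8, 0x100) if ascending else range(0xFF, 0xF7, -1)
    saved_at = {}
    s = sp
    for addr in order:                   # LDA $0100,X / PHA
        page1[s] = page1[addr]
        saved_at[addr] = s
        s -= 1
    page1[0xF8:0x100] = ROUTINE          # install the routine
    return all(page1[saved_at[a]] == original[a] for a in live if a >= 0xF8)


ascending_ok = all(live_bytes_survive(sp, True) for sp in range(0xF7, 0x100))
descending_shallow_ok = live_bytes_survive(0xFD, False)
```

In this model the ascending copy keeps the live bytes safe for every stack depth from empty to eight bytes, while the descending copy loses them when only two bytes are on the stack.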


PostPosted: Thu Jul 11, 2019 1:51 pm
Joined: Thu May 14, 2015 9:20 pm
Posts: 155
Location: UK
The approach that Dave showed was what I was expecting to see.

Do the later CMOS 6502 versions offer any advantages in getting the return address off the stack?

Mark


PostPosted: Thu Jul 11, 2019 3:19 pm
Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Well, at minimum you have access to PLX/Y and PHX/Y instead of having to transfer through A:
Code:
PLX
PLY
PHY
PHX
STX $FE
STY $FF
But an NMOS compatible method is to use the stack pointer directly as an index:
Code:
TSX
INX
LDA $100,X
STA $FE
INX
LDA $100,X
STA $FF
On the '816 you can directly do stack-relative addressing:
Code:
REP #$30    ; assume native mode, set 16-bit A,X,Y
LDA 1,S
STA $FE
But then again, on the '816 you also have PER, which directly allows you to determine your current execution address or any 16-bit offset from it - so you can directly obtain a pointer to the reloc table, and another to your load base. You can then do stack-indirect addressing to rewrite the relevant places, without having to move the pointer to Direct Page first. This on top of the fact that you're less reliant on absolute addresses in the first place.
Code:
 REP #$30   ; 16-bit everything
 PHB ; save current data bank
 PHK
 PLB ; data access to program bank
 PER relocTable
 PLX
 PER loadBase
loop:
 LDY $0000,X
 BEQ out
 LDA (1,S),Y
 CLC
 ADC 1,S
 STA (1,S),Y
 INX
 INX
 BRA loop
out:
 PLX ; restore stack
 PLB ; back to old data bank
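For readers without an '816 to hand, the loop above can be modelled in Python. This is a sketch, not a translation checked on hardware: the image is indexed from the load base, the table holds 16-bit little-endian offsets from that base and ends with a zero word, and the name `relocate` is invented.

```python
import struct


def relocate(image, table_offset, load_base):
    """Add load_base to each 16-bit word whose offset is listed in the table."""
    out = bytearray(image)
    x = table_offset
    while True:
        (y,) = struct.unpack_from("<H", out, x)        # LDY $0000,X
        if y == 0:                                     # BEQ out
            break
        (word,) = struct.unpack_from("<H", out, y)     # LDA (1,S),Y
        struct.pack_into("<H", out, y, (word + load_base) & 0xFFFF)
        x += 2                                         # INX / INX
    return bytes(out)


# JMP $0003 / RTS, then a table saying "the word at offset 1 needs fixing"
image = bytes([0x4C, 0x03, 0x00, 0x60, 0x01, 0x00, 0x00, 0x00])
fixed = relocate(image, table_offset=4, load_base=0x8000)
```

One consequence of the zero terminator is that a word at offset 0 cannot be listed, which is harmless if the image starts with an opcode.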


Last edited by Chromatix on Thu Jul 11, 2019 3:35 pm, edited 1 time in total.

PostPosted: Thu Jul 11, 2019 3:23 pm
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
The stack isn't the first place I would choose to store executable code. A bit of carnal knowledge of the system should be able to scare up eight contiguous bytes elsewhere, I would imagine. Michael J. Mahon (among others in Apple ][ land) just JSR to a known RTS in ROM then quickly examine the ghost of the return address, presumably before it gets overwritten by an IRQ or NMI.

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B.


PostPosted: Thu Jul 11, 2019 3:52 pm
Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
If we *were* to use the stack to store temporary executable code, this might be a safer way - no interrupt avoidance necessary:
Code:
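TSX             ; X = stack pointer on entry, saved for the final TXS
LDA #RTS        ; RTS, STY_zp etc. stand for the opcode values
PHA             ; the routine is pushed last byte first
LDY #$FF
PHY             ; operand of STY $FF
LDA #STY_zp
PHA
DEY
LDA #STX_zp
PHY             ; operand of STX $FE
PHA
LDA #PHX
PHA
LDA #PHY
PHA
LDA #PLY
PHA
LDA #PLX
PHA
PHX             ; keep the entry stack pointer just below the routine
TSX
INX
INX             ; X = low byte of the routine's address in page 1
STX $FE
LDA #1
STA $FF         ; high byte: the routine is in the stack page
LDA #JMP_abs
STA $FD         ; $FD-$FF now hold JMP $01xx
JSR $00FD
PLX
TXS ; restore stack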
This does still require three bytes in ZP to get a jump to the correct stack offset, because the 'C02 doesn't have JSR (abs,X).
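Because pushes fill memory downwards, the routine has to be pushed last byte first. The Python model below is a sketch of the intended nine-byte routine (PLX PLY PHY PHX STX $FE STY $FF RTS) being built that way, followed by the saved stack pointer. The opcode values are the standard 65C02 ones; the function name and the starting S of $F0 are assumptions for the example.

```python
PLX, PLY, PHY, PHX = 0xFA, 0x7A, 0x5A, 0xDA
STX_ZP, STY_ZP, RTS = 0x86, 0x84, 0x60


def build_on_stack(sp):
    """Push the routine in reverse, then the entry S; return memory, S, entry."""
    page1 = bytearray(256)
    entry_sp = sp

    def push(value):
        nonlocal sp
        page1[sp] = value
        sp = (sp - 1) & 0xFF

    for value in (RTS, 0xFF, STY_ZP, 0xFE, STX_ZP, PHX, PHY, PLY, PLX):
        push(value)
    push(entry_sp)                        # PHX after the initial TSX
    entry = 0x0100 + ((sp + 2) & 0xFF)    # TSX / INX / INX
    return page1, sp, entry


page1, sp, entry = build_on_stack(0xF0)
routine = bytes(page1[entry - 0x0100:entry - 0x0100 + 9])
```

Reading `routine` from low to high address gives the instructions in execution order, and the byte just below it holds the S value that the final PLX / TXS restores.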


PostPosted: Thu Jul 11, 2019 4:30 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
Just found this from my records:
Quote:
The centerpiece of Apple Computer's ProDOS Assembler Tools is EDASM, a powerful, disk-based 65(c)02 macro assembler. One of its least used features is the ability to generate relocatable object modules using the REL pseudo-operation code. The feature is integrated with Applesoft Basic using the relocating loader tools, RBOOT and RLOAD.


PostPosted: Thu Jul 11, 2019 4:56 pm
Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Honestly, this seems like a great thing to integrate into your OS kernel, if it plans to allow loading user programs at arbitrary addresses. Since the OS itself knows where it loaded the code, many of these shenanigans can be avoided. Or it can be a subroutine called by the subject program at a documented entry point, also allowing relocation to be redone when overlays are loaded.

Bonus functionality: dynamic library linking.


PostPosted: Thu Jul 11, 2019 5:03 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
Yes, it would be a good feature of a loader. But failing that, it's a preamble. Fitting in one page for bonus points! (I would personally also assume that the load address and eventual start address were page-aligned.)


PostPosted: Thu Jul 11, 2019 6:17 pm
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8540
Location: Southern California
Is the goal specifically to fix absolute addresses at load time, or just to make it so code is not stuck at the addresses determined at assembly time? Doing the latter and making it movable, even after it has been loaded is, as we all know, an inefficient proposition on the '02; but it can be done. Chapter 12 of my 6502 stacks treatise, on "Where-am-I?" routines, addresses this. About 3/4 of the way down the page, starting where it says, "However, there's something you can do to gain a small improvement in performance", it shows ways to cut some of the overhead by requiring movement to be in increments of 256 bytes so at least the low byte can be fixed at assembly time rather than having to be calculated at run time. The minimal price you pay to get this added performance is some memory waste with gaps between programs, averaging about 128 bytes. Having for example five relocatable programs in memory at once then (with four gaps) would, on the average, result in somewhere around two pages wasted, or less than 1% of the 6502's memory map space (although a somewhat higher percentage of available RAM).
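The "about 128 bytes" and "less than 1%" figures can be checked with a few lines of Python. This sketch assumes program sizes are spread evenly over all remainders modulo 256, which is an assumption of the example rather than a measurement.

```python
PAGE = 256


def padding_for(size):
    """Bytes wasted after a program of `size` bytes so the next starts on a page."""
    return (-size) % PAGE


# Average over one full cycle of sizes, 1..256 bytes
average_gap = sum(padding_for(n) for n in range(1, PAGE + 1)) / PAGE

# Five programs have four gaps between them
expected_waste = 4 * average_gap
fraction_of_64k = expected_waste / 65536
```

Under that assumption the average gap is 127.5 bytes, so four gaps come to about two pages, a little under 0.8% of the 64K address space.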

The following chapter is about the 65816's instructions and capabilities relevant to stacks and relocatable code, and 65c02 code which partially synthesizes some of them. Again, it's inefficient on the '02, but it can be done.

On my workbench computer with onboard Forth, any code I load gets compiled (if Forth) or assembled (if assembly language) on the fly, from source code every time; so although there's nothing specifically geared toward making code relocatable, it turns out to be anyway. (I just can't move it after it's loaded.)

For an impressive 6502 OS that allows program relocation at the time of loading (but not after), see André Fachat's GeckOS scalable preëmptive multitasking/multithreading OS which has Unix-like features, dynamic memory management, relocatable file format, a standard library, internet support, virtual consoles, and remote login, and runs on a Commodore 64 and other 6502 platforms. Undefined address references are solved at load time. http://6502.org/users/andre/osa/index.html I have not taken the time to learn and understand it myself.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Jul 11, 2019 6:42 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
Ah, yes, thanks for the pointer to GeckOS.

I am imagining an assembly-time process which detects the bytes which will need to be changed at load time, instruments the executable and prepends the loader. (One idea I've seen is to assemble twice, to different target addresses, and compare the results, as a way to find the bytes which will need to be changed at load time.)
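The assemble-twice idea is easy to prototype on the host. The sketch below assumes the two builds target bases that differ by a whole number of pages, so every difference should be an address MSB shifted by the same amount; anything else is reported as an error. The function name and the sample bytes are made up for illustration.

```python
def find_relocations(image_a, image_b, base_a, base_b):
    """Return offsets of bytes that differ between two builds of one source."""
    if len(image_a) != len(image_b):
        raise ValueError("builds differ in length")
    delta = (base_b - base_a) & 0xFFFF
    if delta & 0xFF:
        raise ValueError("bases assumed to differ by whole pages")
    page_delta = delta >> 8
    patches = []
    for offset, (a, b) in enumerate(zip(image_a, image_b)):
        if a != b:
            if (b - a) & 0xFF != page_delta:
                raise ValueError("unexpected difference at offset %d" % offset)
            patches.append(offset)
    return patches


# LDA $xx07 / JMP $xx06 / RTS / data, assembled at $1900 and again at $0E00
build_1900 = bytes([0xAD, 0x07, 0x19, 0x4C, 0x06, 0x19, 0x60, 0x2A])
build_0e00 = bytes([0xAD, 0x07, 0x0E, 0x4C, 0x06, 0x0E, 0x60, 0x2A])
patches = find_relocations(build_1900, build_0e00, 0x1900, 0x0E00)
```

A comparison like this also picks up immediate operands that hold the high byte of a label, which is usually what is wanted.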

In Acorn land, Basic programs are always loaded at PAGE, the lower boundary of usable RAM, wherever that might be. So a loader which is, at least initially and functionally, a Basic program, would naturally land in a reasonably convenient location. This might be similar to the Apple II case.


PostPosted: Fri Jul 12, 2019 8:17 pm
Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Whether or not it's integrated into the OS, it strikes me that a loader that runs independently of the subject program may have an easier time of performing the relocation. If you have a filesystem - especially one that supports random access, but this is not strictly necessary - then you could structure program binaries so that they are not blindly loaded as a block into some area of RAM, but instead consist of a header containing a relocation table, followed by the code and data. A separate loader program would process this file with full advance knowledge of the load address.
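A header-plus-payload file and a separate loader might look like the Python sketch below. The layout is purely hypothetical (a 16-bit count, that many 16-bit offsets, then the code and data), as are the names `pack_program` and `load_program`. The payload is assumed to be assembled for address 0, so the loader adds the load address to each listed 16-bit word.

```python
import struct


def pack_program(payload, word_offsets):
    """Build a file: count, offsets of 16-bit words to fix, then the payload."""
    header = struct.pack("<H", len(word_offsets))
    header += b"".join(struct.pack("<H", off) for off in word_offsets)
    return header + payload


def load_program(file_bytes, memory, load_addr):
    """Copy the payload to load_addr and relocate the listed words in place."""
    (count,) = struct.unpack_from("<H", file_bytes, 0)
    offsets = struct.unpack_from("<%dH" % count, file_bytes, 2)
    payload = file_bytes[2 + 2 * count:]
    memory[load_addr:load_addr + len(payload)] = payload
    for off in offsets:
        (word,) = struct.unpack_from("<H", memory, load_addr + off)
        struct.pack_into("<H", memory, load_addr + off,
                         (word + load_addr) & 0xFFFF)
    return load_addr


# JMP $0003 / RTS assembled for address 0; the word at offset 1 is an address
program = pack_program(bytes([0x4C, 0x03, 0x00, 0x60]), [1])
memory = bytearray(65536)
load_program(program, memory, 0x1234)
loaded = bytes(memory[0x1234:0x1238])
```

Because the loader knows the load address before it touches the payload, nothing in the program itself has to discover where it is.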

Yet another approach, which lends itself well to modular programming techniques, is to "pre-link" the binaries so that they can be loaded (in any reasonable combination) at pre-chosen addresses, without needing relocation or symbol resolution at runtime. This may require generating extra copies of each binary with the modifications already performed, alongside the original which retains the data necessary to do it again.


PostPosted: Fri Jul 12, 2019 8:41 pm
Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
Quote:
I am imagining an assembly-time process which detects the bytes which will need to be changed at load time, instruments the executable and prepends the loader. (One idea I've seen is to assemble twice, to different target addresses, and compare the results, as a way to find the bytes which will need to be changed at load time.)
Were you thinking of something like an ELF format file and ELF loader? ELF is a generic format that supports relocatable files.

_________________
http://www.finitron.ca


PostPosted: Sat Jul 13, 2019 9:26 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
I wasn't thinking of a structured file, although that would be a solution, but it would need a loader. If writing an OS, one would write a loader, and hopefully give it useful capabilities! (The Amiga loader also springs to mind: it can relocate as it loads, which is needed in the Amiga world of multitasking without an MMU.)

But, I was thinking of an executable which can be loaded and run by a dimwitted loader, but which contains a set of things: an executable relocater, the binary, the information needed to patch the binary.

It might be that the best form for the initial or outermost section is a Basic stub: in Acorn, and probably Apple, and maybe Commodore, worlds, Basic will load a program into the lowest free space, wherever that might be. The Basic stub can call the payload which is appended to it, which is the set of things listed above. Or one could write the relocater in Basic, but that would probably be larger and certainly slower.

There's certainly room for two discussions on this: the prepended "loader" which is loaded by a dimwitted OS, which was my original train of thought; and the design of a full-featured loader which an OS might usefully contain.

