6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Thu Jun 27, 2024 9:28 am

All times are UTC




Post new topic Reply to topic  [ 12 posts ] 
Author Message
 Post subject: Assembly optimizer
PostPosted: Thu Aug 15, 2019 2:35 am 
Offline
User avatar

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
Last year I was wondering about assigning zero page usage automatically in another thread, and I have made some progress doing it with a Python script.

I've been using Macroassembler AS, since it can output the source after the first pass after all macros have been expanded, so my script doesn't have to do the expansion. I already have two macros called FUNC and LOCAL that provide a very small amount of abstraction without using the optimizer script. For example:
Code:
FUNC apple
   LOCAL foo
   STZ foo
   ...
ENDFUNC

FUNC banana
   LOCAL foo
   STA foo
   ...
ENDFUNC

FUNC main
   LOCAL foo
   LOCAL ptr,2
   JSR apple
   JSR banana
   ...
ENDFUNC

FUNC creates a label, assigns some temporary symbols, and begins a scope block but doesn't lay down any instructions. ENDFUNC closes the scope block and lays down RTS. Each occurence of LOCAL assigns the next zero page address to that symbol and keeps it local to the function. In the example above, apple and banana each need 1 byte for locals and main needs 3 for a total of 5 zero page bytes.

Turning on a flag in the source file makes the LOCAL macro generate symbols that tell the optimizer to assign variable addresses rather than having the macro assign them. After expansion, the file is read in by the optimizer script, which analyzes the flow of the program to see if any functions can share memory. In this example, the locals in apple and banana can be assigned the same zero page address since they never call each other. The local variable usage will fit into 4 bytes instead of 5.

Non-local labels are considered possible functions and any subsequent variables assigned with LOCAL are in scope until the next non-local label, the same way local labels work. The script looks for JSRs to determine which functions call other functions to generate the call graph. The FUNC macro can optionally take a list of attributes to describe the function like "FUNC main, begin" which tells the scipt to begin the call graph with main. I'd like to add other attributes like "interrupt" and "re-entrant" since the memory for those needs to be handled differently. It should also be possible to detect re-entrant functions in the call graph and either generate an error if they aren't marked re-entrant or mark them automatically and handle them differently. Because it only looks for JSRs and not RTSs, it isn't a problem to embed a string after a JSR and use the return address to point to the data then return to the calling function with a calculated jump. Functions that can be reached with JMP (indirect) will also need to be labeled if they use LOCAL but should work too. It's not possible to tell at compile time what functions the indirect JMP might target to make sure they have all been labeled correctly, but I could output information about them and load that into my emulator to help debug.

My plan is to add as few abstractions as possible so that the project stays closer to a small set of macros than a programming language. FUNC and LOCAL are both optional in the optimizer script. You can use them to manage a chunk of zero page memory and allocate the rest how you're used to doing it. Because I have everything set up to read in, tokenize, and analyze assembly, I would like to implement a few other optimizations eventually too.

In this test example I have 38 bytes worth of local variables that fit into only 19 bytes of zero page:


Attachments:
Call graph.png
Call graph.png [ 28.52 KiB | Viewed 2166 times ]
Locals.png
Locals.png [ 48.41 KiB | Viewed 2166 times ]
Top
 Profile  
Reply with quote  
 Post subject: Re: Assembly optimizer
PostPosted: Sun Dec 08, 2019 5:43 pm 
Offline
User avatar

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
The code is now up on GitHub: https://github.com/JoeyShepard/65C02_Assembly_Optimizer


Top
 Profile  
Reply with quote  
 Post subject: Re: Assembly optimizer
PostPosted: Mon Dec 09, 2019 2:41 am 
Offline
User avatar

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
Druzyek wrote:
The code is now up on GitHub....

Thanks for this! Too much stuff gets done and lost because it never got posted, so even if this never proceeds further it's still a benefit to the world.

I'm not at a stage yet where I can try this out, but it does seem to address some issues that I've been thinking about lately.

Now the following is a bit of a vague, arm-wavy idea, but I wonder if it would be useful to split things into generating and checking assignments, with the generation producing a separate assembly file that would be included at build time. The idea is that to give you several options for builds of your software:
  • No optimization (all ZP assignments separate), which might be good for e.g. a debugging version.
  • Initial optimization, where the ZP assignments are all done for you, in an optimal way.
  • Checks of changes to an existing program where you want to keep at least a subset the existing ZP assignments for backwards compatibility.

_________________
Curt J. Sampson - github.com/0cjs


Top
 Profile  
Reply with quote  
 Post subject: Re: Assembly optimizer
PostPosted: Mon Dec 09, 2019 3:59 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8460
Location: Southern California
Another way to approach the problem is to use a ZP data stack, using X as the stack pointer. Although you have the extra cycle of overhead for indexing, you reap the benefits given in the 4th, 5th, and 6th bullet points near the top of the 6502 stacks treatise's "Virtual stacks" page. More description picks up about half way down the page. Later pages in the treatise further develop the matters of stack addressing, passing parameters, local variables and environments, recursion, and other things.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
 Post subject: Re: Assembly optimizer
PostPosted: Mon Dec 09, 2019 4:10 pm 
Offline
User avatar

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
Quote:
The idea is that to give you several options for builds of your software
I think this is a good idea. The goal is to impose as little dependency on the script and macros as possible so you can turn them completely off and still run your program. The catch is that when zero page usage is way more than the existing memory, you have to rely on memory optimizing to even get it to run.

Quote:
Another way to approach the problem is to use a ZP data stack, using X as the stack pointer.
The shortcomings of this approach are why I started this project. I personally don't like the slowdown and especially don't like losing the use of the X register.

From your page:
Point 4:
Quote:
The resulting easier stack operations reduce the number of variables needed along with the risk of overwriting data that another pending routine expects to find still intact.
You can accomplish this with the script as well but it wastes less cycles.

Point 5:
Quote:
It improves nestability and re-entrancy of routines.
Good point. I would have the script either generate a virtual stack as you describe or save zero page to another stack to free it up while the recursive or re-entrant code was running.


Top
 Profile  
Reply with quote  
 Post subject: Re: Assembly optimizer
PostPosted: Mon Dec 09, 2019 8:38 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8460
Location: Southern California
Druzyek wrote:
Quote:
Another way to approach the problem is to use a ZP data stack, using X as the stack pointer.
The shortcomings of this approach are why I started this project. I personally don't like the slowdown and especially don't like losing the use of the X register.

I find that Forth seldom uses X for anything else; so saving and restoring it is something you don't have to do very often. This would be true also if you do things in a Forth-like way, even if you're not actually running Forth. You'll want to leave all methods available though, rather than being tied strictly to one; so you'd have to consider how often you go from one to another. As the page says, no one method will be best for all situations, nor does the use of one method preclude the use of all others.

Quote:
From your page:
Point 4:
Quote:
The resulting easier stack operations reduce the number of variables needed along with the risk of overwriting data that another pending routine expects to find still intact.
You can accomplish this with the script as well but it wastes less cycles.

As I was reading earlier posts here yesterday, one thing that came to mind is that besides looking all the way down the chain of nested subroutines for local variable uses, the optimizer will need to consider what happens when a routine's address is picked from a table at run time, by conditions not foreseen at assembly time. There may also be ISRs installed on the fly, at run time.

Quote:
Point 5:
Quote:
It improves nestability and re-entrancy of routines.
Good point. I would have the script either generate a virtual stack as you describe or save zero page to another stack to free it up while the recursive or re-entrant code was running.

Fortunately you don't have to copy the entire page out. Copying whatever material you do need to copy however gets back to the undesired slowdown, requiring an evaluation of how it compares to the overhead of indexing.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
 Post subject: Re: Assembly optimizer
PostPosted: Tue Dec 10, 2019 6:30 am 
Offline
User avatar

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
Druzyek wrote:
The shortcomings of this [the ZP stack] approach are why I started this project. I personally don't like the slowdown and especially don't like losing the use of the X register.

I don't find I'm using the X register that much (in some routines it actually just sits around holding 0 for the odd load and 0-offset index operation here and there, which is still nice to have), but what kills me is that most of my zero-page variables so far are pointers into the heap where I need to index them with [ptr],y. So X free or not it doesn't matter: because there's no [ptr,x],y indexing mode I have to hardcode to the ZP addresses used for those pointers.

_________________
Curt J. Sampson - github.com/0cjs


Top
 Profile  
Reply with quote  
 Post subject: Re: Assembly optimizer
PostPosted: Tue Dec 10, 2019 2:25 pm 
Offline
User avatar

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3366
Location: Ontario, Canada
cjs wrote:
because there's no [ptr,x],y indexing mode I have to hardcode to the ZP addresses used for those pointers.

Just a reminder... [ptr,x],y address mode can easily be provided by Self-Modifying Code. Garth shows a similar example halfway down this page of his article on SMC. He says, "Here's an LDA table,X,Y equivalent, ie, double-indexed."

The address mode you're asking for includes indirection, but is just as valid if not more so. The revised code snippet in your case will be...
Code:
        STX  label+1
label:  LDA  (placeholder_value),Y

SMC entails certain caveats, but the overall tradeoff can be overwhelmingly positive. :shock:

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Top
 Profile  
Reply with quote  
 Post subject: Re: Assembly optimizer
PostPosted: Tue Dec 10, 2019 2:59 pm 
Offline
User avatar

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
Dr Jefyll wrote:
Just a reminder... [ptr,x],y address mode can easily be provided by Self-Modifying Code....
Ah, of course I knew that, but thanks for the reminder, particularly the snippet showing how simple the double-indexing can be. I can see myself using that, actually.

The ROM caveat is usually my main concern, since generally envision putting my code in ROM on those machines that have it. But I will probably one day end up building the framework I need to deal with that, for fun if for nothing else. The debugging difficulty isn't really much of a deal for me since I rarely debug code. :-P (Well, I guess fixing unit tests could be considered "debugging.")

_________________
Curt J. Sampson - github.com/0cjs


Top
 Profile  
Reply with quote  
 Post subject: Re: Assembly optimizer
PostPosted: Tue Dec 10, 2019 4:00 pm 
Offline
User avatar

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3366
Location: Ontario, Canada
cjs wrote:
The ROM caveat is usually my main concern, since generally envision putting my code in ROM on those machines that have it.
Anything you build nowadays probably has RAM that's much faster than ROM; hence the ROM will be wait-stated or clock-stretched. So, fast RAM and SMC are both incentives to copy your code from ROM to RAM upon bootup.

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Top
 Profile  
Reply with quote  
 Post subject: Re: Assembly optimizer
PostPosted: Tue Dec 10, 2019 4:23 pm 
Offline
User avatar

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
Dr Jefyll wrote:
Anything you build nowadays probably has RAM that's much faster than ROM....So, fast RAM and SMC are both incentives to copy your code from ROM to RAM upon bootup.
Sure, but my targets are 70s-era computers where not only is the ROM just as fast as the RAM, but you'd also run out of RAM about half-way through copying the ROM into it.

_________________
Curt J. Sampson - github.com/0cjs


Top
 Profile  
Reply with quote  
 Post subject: Re: Assembly optimizer
PostPosted: Wed Dec 11, 2019 10:06 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10835
Location: England
It was pretty common back in the day for the ROM to copy a short routine into RAM when that was needed. The bottom of page 1 is one favourite place.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 12 posts ] 

All times are UTC


Who is online

Users browsing this forum: Bergrobb and 6 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: