6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Fri Nov 01, 2024 3:33 am

All times are UTC




Post new topic Reply to topic  [ 79 posts ]  Go to page 1, 2, 3, 4, 5, 6  Next
Author Message
PostPosted: Wed Mar 19, 2014 7:04 am 
Offline

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
Is there any way to compress 6502 executable code while compiling it ?

like this :

Step 1) Assemble the part of the code that is to be compressed
Step 2) Compress it
Step 3) Assemble the whole code, including the previously generated compressed parts

The problem is for labels. For any RAM-based variable, it's simple enough as it's possible to hard-core it's address somewhere instead of having the assembler determine it dynamically.

The problem is for ROM labels. Any use of a lookup table, a jsr or a jmp in the compressed code should use a label which is unknown (because it exists only on step 3) and that is a problem.


Last edited by Bregalad on Wed Mar 19, 2014 8:29 pm, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 19, 2014 7:42 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8534
Location: Southern California
Are you talking about optimization or something else? What language?

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 19, 2014 8:43 am 
Offline
User avatar

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
Normally you would either:

- compile/assemble the whole program in one go so that forward references can be resolved in the second (or later) pass. This is the approach taken by absolute assemblers.

- compile/assemble the program in a number of sections using an output format that indicates where forward references are in the generated code that will need patching during a final linking phase. This is the approach taken by relocatable assemblers and high level compilers languages like languages C.

You can build multi-module programs with absolute assemblers by standardising the key subroutine addresses to fixed locations, for example by using jump tables.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 19, 2014 10:44 am 
Offline

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
Sorry for not being clear enough, I'm not talking about optimisation or anything, and the language is 6502 assembly of course.

I am limited to 32kb of ROM, I'm soon running out of space, and after doing profiling, it turns out unfortunately the code is what takes (by far) the most ROM space, so if I have to compress something that'd have to be the code itself (which is, unfortunately, also the hardest thing to compress, (before white noise haha) ).

The difficulty doesn't come from the compressability of the data itself (common opcodes are more frequent than others so this can be exploited), but from the fact you want to pack in a mix of compressed and uncompressed code together in the same ROM where each one can reference the other.

Quote:
- compile/assemble the whole program in one go so that forward references can be resolved in the second (or later) pass. This is the approach taken by absolute assemblers.

Could you please explain more how you do this ? Unless your assembler is specifically made for this kind of applications (compressing while compiling) I don't think it's possible to do this in one pass.

Quote:
compile/assemble the program in a number of sections using an output format that indicates where forward references are in the generated code that will need patching during a final linking phase. This is the approach taken by relocatable assemblers and high level compilers languages like languages C.

This sounds like what WLA-DX (the assembler I'm using) is doing. Yes, I know, it's not a very good one, however I made the (bad) decision of using it when I started my project long ago, so I'm pretty much stuck to it unless I spend hours converting assembly code from an assembler to another (I could do it if I have to).

So I'd have to compile the compressed code into a .o object file, then adapt my copressor program so it converts this .o into another compressed .o which still have the references, and finally compile the remaining (normal, not compressed) code into another .o and link both .o together ?

Sounds good enough, but I'd have to find how .o files works internally (as well as adapt my compressor to work with those files). Sounds like a great challenge.

PS :

After some thoughts, working with .o files would not fix the fundamental problem in any way. The problem is cross-referencing, and there is (likely) no true solution for the following reasons :

1) When the main code references anything within the compressed code, it either refers to something different because it's either scrambled (compressed) data in ROM, or a location in RAM after decompression. Fortunately the only thing that really needs to be referenced is the start address of the routines (where it's jsr-ed) so this is solved by having some kind of compression management routine, that will either decompress + call the routine in RAM, or call it if it's already there. If any other kind of reference is needed, then the code practically can't be compressed

2) When the compressed code references anything within the main itself, things are all right as soon as it's a relative reference (such as the branches instructions) but will go horribly wrong if an absolute jmp or jsr is present. This can be solved if (and only if) we know the absolute address where the code is going to be decompressed in RAM, and we can make the jmp and jsr directly point in this RAM area.

3) When the compressed code makes references to the main (not compressed) code or lookup tables, things are going horribly wrong. This is a snake eating it's tail kind of problem, as we need the code to be compressed to be known and fully defined, so the arguments of instructions with absolute referneces (jmp, jsr, lda or whathever) should be known, but at the same time we should compile the main (not compressed) code after the compressed one, which is a contradiction.

So 1) and 2) are manageable although it's some additional work, but 3) is a real problem, and even with all tools in the world I can't think there is a practical solution. If anyone has a bright idea here I'd be infinitely grateful.


Last edited by Bregalad on Wed Mar 19, 2014 11:08 am, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 19, 2014 11:00 am 
Offline

Joined: Thu Mar 03, 2011 5:56 pm
Posts: 284
Have you considered using sweet-16 for parts of your code? I think Woz once stated that he would be able to get a 20% reduction in code on his Integer Basic with little speed degradation by selectively using Sweet-16...

Ref: http://www.6502.org/source/interpreters/sweet16.htm


Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 19, 2014 11:27 am 
Offline

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
Wow I didn't know about this ! Sounds extremely interesting, and pure genius, to interpret a more compact form of assembly.

Unfortunately, I'd have to completely rewrite enormous parts of my engine from 6502 to SWEET16 and that'd be very time consuming, without knowing in advance how many bytes I'll save, considering only the non-time critical parts should be translated. I of course use some 16-bit arithmetics here and there, but by far not all the time.

With the compression at least I don't need a complete code rewrite.

(also I edited my above post while you were posting this)


Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 19, 2014 11:44 am 
Offline
User avatar

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
I'm not sure that decompressing code on the fly is that feasible on a 6502. Decompressing is (relatively) slow and the best compression schemes need lots of memory for decoding tables.

When I used to write 16K ROM applications for the BBC I frequently ran out of space and had to stop and analyse the code so see if I could do things more compactly. I'd try refactoring common code sequences, using compact looping alogrithms instead of 'unrolled' code, ensuring frequently accessed variables are on zero page, using SWEET-16 or compiled FORTH style execution for some primitives and looking for ways to compress data and text strings before trying to get too clever with compression.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 19, 2014 12:09 pm 
Offline
User avatar

Joined: Fri Oct 31, 2003 10:00 pm
Posts: 200
Bregalad wrote:
Is there any way to compress 6502 executable code while compiling it ?

like this :

Step 1) Compile the part of the code that is to be compressed
Step 2) Compress it
Step 3) Compile the whole code, including the previously generated compressed parts

The problem is for labels. For any RAM-based variable, it's simple enough as it's possible to hard-core it's address somewhere instead of having the assembler determine it dynamically.

The problem is for ROM labels. Any use of a lookup table, a jsr or a jmp in the compressed code should use a label which is unknown (because it exists only on step 3) and that is a problem.


Compression is used often in games, to make it all it fit in a ROM. Mostly for data (graphics, text, tables) but code is also possible. The compression is mostly done on a PC. 6502 decompression routines, unpacking it to RAM, can be small and efficient.

Better search with google for C64 game developers methods for example.


Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 19, 2014 1:07 pm 
Offline

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
Quote:
I'm not sure that decompressing code on the fly is that feasible on a 6502. Decompressing is (relatively) slow and the best compression schemes need lots of memory for decoding tables.

Yes, I didn't mention it but it is perfectly feasible, I made my own tool that accepts both binary and assembly style data, can split data in arbitrary sections of any lenght with labels, and output assembly data with the same sections/labels. It tries quite a few algorithms and gives the compression ratio for each one of them, and all of them except one only requires RAM for only a few variables and the decompression buffer. Some of them could be slow of course.

I already used this to compress my levels.

I also did optimize all my game engine to be as tight as possible, but the engine is still the bottleneck when it comes to ROM usage.

Now I realize that because of this :

Quote:
2) When the compressed code references anything within the main itself, things are all right as soon as it's a relative reference (such as the branches instructions) but will go horribly wrong if an absolute jmp or jsr is present. This can be solved if (and only if) we know the absolute address where the code is going to be decompressed in RAM, and we can make the jmp and jsr directly point in this RAM area.

It basically means there can't be more than one "slot" for a decompressed routine in RAM. If there's more than one "slot" it means that the decompression routine would have to detect jsr and jmp opcodes and re-locate their destination in function of the slot address, something not only boring but dangerous to do (if $4c or $20 are used as an argument to another instruction for instance).

So only one routine will be decompressed at a time, which means a compressed routine cannot normally call another compressed routine from a different area (this would be technically possible but especially annoying to implement and slow at runtime).
Then, only leaf routine and almost leaf routines (routines which only calls a small routine that is not called form anywhere else) can be compressed, if that's not the case then there is a risk of accidentally calling another compressed routine and that'd not work.
This coincides to the fact that even if we wanted to call or jump to something outside of it, we couldn't reference it (unless we use an explicit jump table).

So I come to the following conclusions :

1) I can compress only leaf and almost-leaf routines (almost-leaf = only calls a local routine)
2) There is no problem with cross-referencing as those don't reference anything. If they HAVE TO reference something, a jump table should be used (then, I should be 100% sure to not call another compressed routine)
4) Only a routine, it's local subroutine and it's local lookup tables can be compressed in the same "pack", in order to make it accessible
5) The good news is that most routines are leaf routines, which means a lot of stuff can potentially be compressed
6) The bad news is that leaf routines are also the most often called, so it means a lot of CPU time wasted while decompressing
7) All compressed routines should be assembled separatedly, with .ORG at the start of the RAM buffer where the decompressed code will lie

It sounds like a good challenge, but I don't know how well it'll compress. I think I'll wait to be SURE I'm running out of space before implementing all this mess.


Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 19, 2014 1:45 pm 
Offline

Joined: Sun Nov 08, 2009 1:56 am
Posts: 408
Location: Minnesota
Certainly you can compress executable code, though data compression is more common. It's probably a better idea, though - in the sense of faster and easier - to just let an assembler do its normal thing on each routine you want to compress and use a separate program to do the actual compression of the binary (and then some sort of batch or make file to pull everything together to create a ROM image).

But first you might experiment with one or two of the routines you propose to compress to see just how effective that might be in terms of size versus how much code it takes to do the de-compression, and how long it takes to do it. With the idea of trying to answer the question "Is the trade-off worth it?" IIRC, a Huffman decompressor is reasonably fast and not too large in 6502 (assuming only one common decompression table created by scanning all the routines being compressed).

The idea of using Sweet-16 or Forth or a byte-code interpreter (eg., the way UCSD Pascal was implemented) isn't a bad one. After a point, all of these tend to produce smaller final programs than pure 6502 can.


Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 19, 2014 3:08 pm 
Offline

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
OK, I just did a detailed profiling of all my game (phew...), and turns out that compressing leaf routines is not a very good idea. Most of those are extremely small anyway, so it would lead nowhere. My estimation is, for a shrink rate of 35% (the programs are shrank by 35% - that is, they take 65% of their original size) I'd save about 1.5k of data, 2k if being extremely optimistic (that's a code size reduction of about than 10%).

Most really big routines are the main program and those close to it, and they are the bottleneck. I should rather compress these (either using one of the technique I mentioned) or rewrite them using Sweet-16 or any other kind of byte code interpreter.

Is sweet-16 open and free for use ?

Quote:
It's probably a better idea, though - in the sense of faster and easier - to just let an assembler do its normal thing on each routine you want to compress and use a separate program to do the actual compression of the binary (and then some sort of batch or make file to pull everything together to create a ROM image).

Mmh, now that you mention it, it sounds so simple.
1) Compile an intermediate binary file with extra data in a dummy sector (for example $6000-$7fff when ROM normally lies at $8000-$ffff)
2) Compress the data in the dummy sector with your own program
3) Remove this sector entierely from the binary file and insert compressed data where it should go

The problem is making a program that can locate this "dummy sector", and that can locate free space to insert data back afterwards, and that can fix all the references to it from the main code.


Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 19, 2014 10:49 pm 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
Since this is a game, I think you need to look at this in terms of "overlays", if possible. Hopefully, not all of your code needs to be accessable at a time, thus you can "swap in" blocks of code to RAM out of your compressed ROM representation for different coarse phases of your game state.

The problem with compressed machine code is that all your currently executing decompressed code and all of the compressed code coexist in your memory space, which can be a very large load, compared to bytecodes & an interpreter.

The source code for Sweet 16 has been publicly published by Woz. I don't know if there has been a direct legal statement saying it's unambiguously free to use, but I believe it's considered so by Woz and Sw16 users. The nice thing about Sweet 16 is how small its interpreter is, just a bit over 1 page. It's not the fastest, and it doesn't have as many operations as other assembly or bytecoded languages (no bitwise ops, mul/div, etc). It's also made to be embeddable, unlike many Forths which assume OS-like control over your whole memory space.

I built another embeddable VM that was originally based off Sweet16 called Acheron VM which does a lot more, is faster, and supports adding custom instructions pretty easily, but the interpreter itself takes over 1K. It's also heavily tied with the ca65 development environment at the moment.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Top
 Profile  
Reply with quote  
PostPosted: Thu Mar 20, 2014 2:25 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
Acheron VM looks pretty cool, White Flame! I don't have time to check it out as thoroughly as I'd like right now (my plate is already overflowing), but I've bookmarked it for future reference.

Mike

P.S. Is the 'ch' pronounced in the Italian, Spanish, or French way?


Top
 Profile  
Reply with quote  
PostPosted: Thu Mar 20, 2014 6:52 am 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
I haven't touched it in a while, but there is a new version that discards the "4bit op + 4bit reg" encoding, which makes things more uniform and faster, at a slight expense to code size overall.

Plus, the description is a bit hard to read, seeing it fresh again. Basically, the sliding register window is like stack frames. Consider no "CPU registers" at all, just directly working with local regs in the stack frames.

The word is Greek-derived, so I believe it's like the German "ch", not knowing whichever of the 3 you listed that happens to be closest to. It's the name of the river bordering hell in Dante's Inferno. Prior versions of my VM ideas were completely impossible for humans to write code for. This VM brings those ideas more to the cusp of human usability. ;)

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Top
 Profile  
Reply with quote  
PostPosted: Thu Mar 20, 2014 7:00 am 
Offline
User avatar

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
Bregalad wrote:
I am limited to 32kb of ROM, I'm soon running out of space, and after doing profiling, it turns out unfortunately the code is what takes (by far) the most ROM space, so if I have to compress something that'd have to be the code itself (which is, unfortunately, also the hardest thing to compress, (before white noise haha) ).


Can you use a bank-switched ROM ?

It sounds like you are maxing out the system's capabilities. Have you considered using an upgraded system ? (eg 65816)
It sounds like a lot of work to come up with a custom compression/decompression routine and address linking. Even if the compression works this time, the app may run out of room again in the future.

_________________
http://www.finitron.ca


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 79 posts ]  Go to page 1, 2, 3, 4, 5, 6  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 16 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: