Compressing during the compilation process

Bregalad · Post by **Bregalad** » Thu Mar 20, 2014 3:38 pm

I did a few tests, and it turns out 6502 executable code's compressibility is very low anyway. So if I run out of space I'll probably have to use a SWEET16 like scripting if I run out of space. The problem will then be to convert all the code to be compatible with CA65 since it's the major modern assembler that supports this... oh well I hope I won't need it after all. I still have 17% of free ROM space.

As for the ROM yes I can have bank switching, but I'd like to avoid this whenever possible, because it's challenging to fit everything in 32k.

GARTHWILSON · Post by **GARTHWILSON** » Thu Mar 20, 2014 8:24 pm

Bregalad wrote:

I did a few tests, and it turns out 6502 executable code's compressibility is very low anyway. So if I run out of space I'll probably have to use a SWEET16 like scripting if I run out of space. The problem will then be to convert all the code to be compatible with CA65 since it's the major modern assembler that supports this... oh well I hope I won't need it after all. I still have 17% of free ROM space. [...] it's challenging to fit everything in 32k.

I haven't used SWEET16 myself, but it looks impressive in terms of the density of the resulting code and also the small size of the kernel (or whatever it's called in this case). The down sides are the speed and also the fact that there are many kinds of functions it doesn't have at all.

If you're game for SWEET16, it sounds like A., it's not critical that your application use assembly throughout for maximum performance, and B., you're game for re-writing it in a different language. A Forth kernel takes several KB minimum, but at some point it definitely pays for itself in the overall size of the code, and everything past that (ie, as the application grows) is a further savings of ROM space compared to having done it in assembly. I have an old note here saying an assembly program of 40KB might be replaced with a Forth program half that size or better, including the Forth kernel. I don't remember where that came from, but I suspect it's referring to the case where Forth headers are left in, and compiling headerless code would cut another 25-30% off the length. If the end application doesn't need to compile or command-line-interpret, it doesn't need to be able to look up the individual words, so their headers (name fields and link fields) can be left out, and of course you can leave out all the words that were only there only for compiling or for interpreting a command line. I haven't done any complete applications in both assembly and Forth just to compare apples to apples in overall size to confirm or adjust the size ratio claim. Forth Inc. claims a minimum compactness improvement of 4:1 over C, and that program development is typically close to 10 times as fast. As for performance, Forth still lets you do speed-critical parts in assembly— and as they say, a little assembly goes a long way. Any Forth word or runtime can be written in high-level to quickly test the concept, then re-written in assembly without having to modify any of the words that use it. The FORTH words INLINE and END-INLINE even let you break into assembly for a portion in the middle of a high-level definition.

White Flame · Post by **White Flame** » Fri Mar 21, 2014 4:25 am

In my opinion: Measuring "compressability" of code in terms of using regular byte compression schemes is not a good measuring stick. The compressability of executable code, in my mind, has to do with finding commonly entwined operations and creating macro-operations that can be expressed in smaller forms, by using subroutines, defining data structures which the same small code block can work over, or a byte code architecture which contains your macro operations. These reduce the amount of code that you need to state to accomplish the same behavior. For instance, if you have a lot of 16-bit operations, then Sweet16/Forth/AcheronVM should get you far greater size gains over data-compressed 6502 code.

But again, a very major problem you have in the data compression view is that you need to also have room for decompressed copies of the code to execute in at runtime. That footprint needs to be taken into account, and likely overwhelms any benefits you might have gained from the compression. All the other solutions generally work in-place.

Bregalad · Post by **Bregalad** » Fri Mar 21, 2014 2:04 pm

For an interesting record, here is the visualisation of the routine's size (the largest one, at the left, is the main program).

We observe that by reducing the size of only a few parts (the main program and other routines close to it "high level in the call stack") we could reduce code size dratiscally. This is also, for the most, part the less time critical routines.

Because they do a lot of boring stuff like initializing variables and call functions (sometimes with more than 3 arguments so they have to be copied to zero-page), they tend to eat a lot of space.

Quote:

A Forth kernel takes several KB minimum [...]

You say this as if it was obvious that everyone knowns perfectly what Forth is, but I basically knew nothing except the name, which I was under the impression it was a old high level programming language.
Anyways, the game engine's code (which, as I already said, is the thing that eats the morst space) takes 10kb, so a kernel that "takes several KB at minimum" is out of the question.

Sounds like SWEET-16 could be improved a little to be more intermixable with 6502 : Namely it should be possible to call SWEET-16 from 6502 and 6502 from SWEET-16 easily, and the registers should be directly "converted" instead of being saved. (i.e. A = SWEET16's R0, X = SWEET16's R1 and Y = SWEET16's R2, they are "converted" whenever a mode switch happens).

I still don't know enough to judge how SWEET16 would be useful in my case though.

Quote:

But again, a very major problem you have in the data compression view is that you need to also have room for decompressed copies of the code to execute in at runtime. That footprint needs to be taken into account, and likely overwhelms any benefits you might have gained from the compression. All the other solutions generally work in-place.

Actually I have a lot of unused RAM space, so this is not much of a problem.

Quote:

For instance, if you have a lot of 16-bit operations, then Sweet16/Forth/AcheronVM should get you far greater size gains over data-compressed 6502 code.

The problem is, I don't have quite a lot of 16-bit operations. Just a few of them here and there.

BitWise · Post by **BitWise** » Fri Mar 21, 2014 3:01 pm

Bregalad wrote:

Quote:

A Forth kernel takes several KB minimum [...]

You say this as if it was obvious that everyone knowns perfectly what Forth is, but I basically knew nothing except the name, which I was under the impression it was a old high level programming language.
Anyways, the game engine's code (which, as I already said, is the thing that eats the most space) takes 10kb, so a kernel that "takes several KB at minimum" is out of the question.

FORTH like SWEET-16 gives you a virtualized CPU so the code that uses it will often be much smaller than the native code version. Your 10K of 6502 probably contains lots of repeated code sequences which would be smaller as compiled FORTH (e.g. a FORTH word call is 2-bytes compared to a 3-byte JSR, a 16-bit add is 2-bytes compared to 13-bytes in 6502 [CLC/LDA/ADC/STA/LDA/ADC/STA], a 16-bit literal is 4 bytes compared to 8 in 6502 [LDA#/STA/LDA#/STA]). So even when the kernel is added in the total may still be less than 10K.

BitWise · Post by **BitWise** » Fri Mar 21, 2014 3:03 pm

What is the target platform for the game?

Bregalad · Post by **Bregalad** » Fri Mar 21, 2014 5:27 pm

So, Forth looks like the Java of the 70s : A language + a bytecode, and multiple ports of the virtual machine for each plaform, right ?

Anyways SWEET16 looks better (in this case) as it's 300 bytes interpreter and only 32 bytes of memory required (though they have to be Z-PAGE, but that isn't much of a problem).

I target the NES.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Fri Mar 21, 2014 6:24 pm

Bregalad wrote:

So, Forth looks like the Java of the 70s : A language + a bytecode, and multiple ports of the virtual machine for each plaform, right ?

Sort of, except that Forth is the Teddy Roosevelt of operating environments: walks softly (small kernel footprint) and carries a big stick (a lot of performance and flexibility). In my opinion, Java is a classic form of bloatware.

Quote:

Anyways SWEET16 looks better (in this case) as it's 300 bytes interpreter and only 32 bytes of memory required (though they have to be Z-PAGE, but that isn't much of a problem).

Of course, you can always get all of the benefits of Sweet16, and then some, with a 65C816—without having to use up valuable direct page space in the process.

GARTHWILSON · Post by **GARTHWILSON** » Fri Mar 21, 2014 7:16 pm

We have a good, not-too-long topic "What is Forth?" at viewtopic.php?f=9&t=777 with a link to Leo Brodie's book "Starting Forth" which has been kind of the standard, now available to read for free, on Forth, Inc.'s site, with updates for ANS Forth.

BigEd · Post by **BigEd** » Fri Mar 21, 2014 8:13 pm

Maybe that thread should be a sticky Garth?

GARTHWILSON · Post by **GARTHWILSON** » Fri Mar 21, 2014 9:25 pm

Bregalad wrote:

Quote:

For instance, if you have a lot of 16-bit operations, then Sweet16/Forth/AcheronVM should get you far greater size gains over data-compressed 6502 code.

The problem is, I don't have quite a lot of 16-bit operations. Just a few of them here and there.

When you count the address operations, like for indexing into arrays and tables and data structures of more than 256 bytes, you might have more than initially meet the eye; and having a powerful, efficient way to define and use these (like Forth affords) might further open up possibilities for encapsulation and abstraction which might help you do more with less code, and develop faster and keep better control of the project at the same time. I agree however that for a lot of small, embdeed applications, most operations may be 8-bit. I have no idea about games though.

Once an application is well on its way and 90% developed like it sounds like this one is, it might be a bit unrealistic to expect you to start over in another language, especially one that's totally new to you. It's probably much more realistic to just present the idea for future projects, and you can approach learning it at whatever pace you choose.

Quote:

Maybe that thread should be a sticky Garth?

Good idea. I think we'll do that.

Bregalad · Post by **Bregalad** » Fri Mar 21, 2014 10:02 pm

Quote:

Sort of, except that Forth is the Teddy Roosevelt of operating environments: walks softly (small kernel footprint) and carries a big stick (a lot of performance and flexibility). In my opinion, Java is a classic form of bloatware.

I agree, I used to use Java but now I sweared to never use it again for new development, and I wasn't insinuating Forth was bloatware. Just the idea of a language and interpreatable bytecode tied to it.

And yes I saw that thread but didn't mention it because you guys sound pretty religious about Forth... Honestly it doesn't look like a very appealing language to me. I don't like to think in terms of stacks, because this is very unnatural for me. Usually when people are claiming a language is the best in the world and that it should be used to program every thing thing in the world TM, it turns on my scepticism.

Quote:

Once an application is well on its way and 90% developed like it sounds like this one is, it might be a bit unrealistic to expect you to start over in another language, especially one that's totally new to you. It's probably much more realistic to just present the idea for future projects, and you can approach learning it at whatever pace you choose.

Yeah I don't intend to rewrite everything in another language.

GARTHWILSON · Post by **GARTHWILSON** » Fri Mar 21, 2014 10:27 pm

Bregalad wrote:

And yes I saw that thread but didn't mention it because you guys sound pretty religious about Forth... Honestly it doesn't look like a very appealing language to me. I don't like to think in terms of stacks, because this is very unnatural for me. Usually when people are claiming a language is the best in the world and that it should be used to program every thing thing in the world TM, it turns on my scepticism.

I was pretty sceptical myself at first, but it made a convert out of me, and now I don't want to use an algebraic language ever again. I brought a couple of engineers into it too, kind of kicking and scraming inside, but then they totally fell in love with it. You could write algebraic parsers in Forth, but there's no reason to when stacks make it so much more efficient. I'm slowly working on a treatise on stacks to post on my website, and I anticipate 15 or more chapters even though it's mostly about assembly-language usage. There are tons of tricks to be pulled with stacks. Edit, after it was done: It's at http://wilsonminesco.com/stacks/, with 19 sections plus appendices, starting with the definition and gradually reaching a stage a little bit past intermediate use, mostly ignoring the loftier subjects like multi-user, multithreading, and multitasking systems (which the 6502 is not very well suited for anyway). The 19 sections / pages / chapters are:

definition and very basics
subroutine return addresses and nesting
interrupts (plus link to interrupts primer)
virtual stacks and various ways to implement them
stack addressing, both hardware and virtual
passing parameters, and comparison of methods
having a subroutine find inlined data, using the return address
doing math and other operations by stacks in RPN
RPN efficiency
65c02's added instructions that are useful in stacks
using RTS, RTI, and JSR to synthesize other instructions
where-am-I routines, for self-relocatable code
a peek at the 65816's new instructions and capabilities that are relevant to stacks, and 65c02 code which partially synthesizes some of them
local variables and environments
recursion
enough stack space?
compiling or assembling program structures
stack potpourri
for further reading

White Flame · Post by **White Flame** » Sat Mar 22, 2014 4:16 am

Forth has a fair number of commonalities with Lisp. It's in some respects both a high-level and low-level language at the same time. There's reasonable code generating facilities in Forth (whereas Lisp's is much easier to use), ending up in lots of very precisely targeted roll-your-own environments that are built up. It is relatively daunting at first, especially when it comes to heap-style memory management. Because it's such a different environment, it's difficult to see the usages & benefits until you get a few "Ah hah!" moments. Then things start to fall into place. (But Lisp still reigns supreme

)

teamtempest · Post by **teamtempest** » Sat Mar 22, 2014 4:24 am

Quote:

Because they do a lot of boring stuff like initializing variables and call functions (sometimes with more than 3 arguments so they have to be copied to zero-page), they tend to eat a lot of space.

Uh huh. Lots of repeated operations, hmm? Same code, different data? Perhaps it might be worthwhile to step back a bit and ask yourself if there's a way to generalize what you're doing to the point of "one code for all data".

F'rinstance, why does every routine with more than three arguments have to copy data to zero page? How about making the callee, not the caller, do the copying? Speed's not an issue at this point, I understand.

Here's an outline of what I mean (I didn't invent this, BTW):

Code: Select all

    [...]
    jsr myFuncWith3Args ; or whatever
    dw arg1
    dw arg2
    dw arg3
    jsr myFuncWith4Args ; or whatever
    dw arg1
    dw arg2
    dw arg3
    dw arg4
    [...]

myFuncWith3Args
    pla            ; pull return address
    sta ptr        ; use it as a pointer
    pla
    sta ptr+1
    ldy #3*2      ; or whatever the total size in bytes is
:   lda (ptr),y   ; move args to zero page
    dey
    sta zarg1,y
    bne :-
    clc
    lda ptr         ; update return address to skip args
    adc #3*2
    tay
    lda ptr+1
    adc #0
    pha
    tya
    pha
    [...]       ; the actual work, whatever it is
    rts

; and so on

If you really wanted to get tricky you could write just one "copy-and-call" function along these lines. That version would have one extra argument: the address of the routine being called. That would be pushed on the stack after the updated return address is, then the "copy-and-call" function would execute RTS to get to it (or just JMP (function address) using whatever location the real function address was moved to). The calls to "myFunctionxxx" would all be replaced by "jsr myFunctionDispatcher".

Compressing during the compilation process

Re: Compressing during the compilation process

Re: Compressing during the compilation process

Re: Compressing during the compilation process

Re: Compressing during the compilation process

Re: Compressing during the compilation process

Re: Compressing during the compilation process

Re: Compressing during the compilation process

Re: Compressing during the compilation process

Re: Compressing during the compilation process

Re: Compressing during the compilation process

Re: Compressing during the compilation process

Re: Compressing during the compilation process

Re: Compressing during the compilation process

Re: Compressing during the compilation process

Re: Compressing during the compilation process