This won't work, I'm afraid. I've reverse engineered _a lot_ of programs and based on that experience I can only say that such a task cannot be done for several reasons.
First of all, if you take a look at the examples on that website, you soon see the great disadvantage of this approach: it does not add any semantics. It just exchanges one obscurity for another. Looking at the C code, you still don't know what it means. Even worse, what might have been readable for an experienced 6502 programmer turns into cryptic gibberish. You do not gain meaning, you lose it. So why do it?
The very idea of a decompiler is based on an (old) misconception about how language works and how syntax and semantics relate to each other. While this, sorry to say so, is typical of a computer scientist, a linguist will only shake his head over this rather naive idea. While there is Turing to help us emulate program code, there is also Gödel, who tells us there are unavoidable pitfalls in the way referencing is done. Try to translate the following sentence into French: 'This sentence will have a different meaning when translated into French.' And now translate the following expression into C:
lda #$60
sta label
label: jmp label
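To show why those three instructions resist a static translation, here is a rough Python sketch of a toy interpreter that knows only the four opcodes involved. The load address `ORG` and the helper names are my own assumptions for illustration; only the opcode bytes are standard 6502.

```python
# Toy interpreter for just the opcodes in the snippet above (illustrative only).
LDA_IMM, STA_ABS, JMP_ABS, RTS = 0xA9, 0x8D, 0x4C, 0x60

ORG = 0x1000        # assumed load address, not from the original post
LABEL = ORG + 5     # address of "label: jmp label"

PROGRAM = bytes([
    LDA_IMM, 0x60,                       # lda #$60   ($60 is the RTS opcode)
    STA_ABS, LABEL & 0xFF, LABEL >> 8,   # sta label
    JMP_ABS, LABEL & 0xFF, LABEL >> 8,   # label: jmp label
])

def fresh_memory():
    """Return a zeroed 64 KB memory with PROGRAM loaded at ORG."""
    mem = bytearray(0x10000)
    mem[ORG:ORG + len(PROGRAM)] = PROGRAM
    return mem

def run(mem, pc, max_steps=10):
    """Execute from pc and return the list of mnemonics actually executed."""
    a = 0
    trace = []
    for _ in range(max_steps):
        op = mem[pc]
        if op == LDA_IMM:
            a = mem[pc + 1]
            trace.append("lda")
            pc += 2
        elif op == STA_ABS:
            mem[mem[pc + 1] | (mem[pc + 2] << 8)] = a
            trace.append("sta")
            pc += 3
        elif op == JMP_ABS:
            trace.append("jmp")
            pc = mem[pc + 1] | (mem[pc + 2] << 8)
        elif op == RTS:
            trace.append("rts")
            return trace
        else:
            raise ValueError(f"opcode ${op:02X} not handled by this sketch")
    trace.append("timeout")
    return trace

print(run(fresh_memory(), ORG))        # the store turns the JMP into an RTS
print(run(fresh_memory(), LABEL, 3))   # read in isolation, label loops forever
```

Read statically, `label` is an endless loop. Executed from the top, the JMP never runs at all, because its opcode byte has been replaced before the program counter gets there.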
Want some real examples? Here we go...
A lot of programs that deal with 3d use look-up tables for fast multiplication. If the program uses logarithm look-up tables, then how will the decompiler know what the following subroutine actually does?
clc
lda $4000, x
adc $4000, y
lda $4100, x
adc $4100, y
bcc label
tax
lda $4200, x
rts
label: lda #0
rts
A good conversion to C should result in:
A = (X * Y) >> 8;
I bet the decompiler won't do that.
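For readers who have not seen the trick, here is a minimal Python sketch of multiplication through logarithm tables. The table layout and the scale factor are my own assumptions and do not claim to match the tables at $4000, $4100 and $4200 in the listing above. The point is only that the routine adds table entries and looks up the result, and nothing in it says "multiply".

```python
import math

SCALE = 256  # assumed fixed-point scale: LOG[v] is about log2(v) * 256

# LOG[0] is a dummy entry; zero is handled separately because log(0) is undefined.
LOG = [0] + [round(math.log2(v) * SCALE) for v in range(1, 256)]

# EXP[s] undoes the logarithm and folds in the ">> 8" (a division by 2**8).
EXP = [int(2 ** (s / SCALE - 8)) for s in range(2 * LOG[255] + 1)]

def mul_hi(x, y):
    """Approximate (x * y) >> 8 for 8-bit x and y using additions and lookups."""
    if x == 0 or y == 0:
        return 0
    return EXP[LOG[x] + LOG[y]]

# Largest deviation from the exact result over all non-zero 8-bit operands.
worst = max(
    abs(mul_hi(x, y) - ((x * y) >> 8))
    for x in range(1, 256)
    for y in range(1, 256)
)
print("largest deviation:", worst)
```

Under these assumptions the lookup stays within one unit of the exact `(x * y) >> 8`. A decompiler that only sees two additions and an indexed load has no way of knowing that the three tables belong together, let alone that they encode a product.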
Then, of course, there is the problem of self-modifying code. Prince of Persia, for example, will patch its drawing routines with the instructions AND, ORA, EOR, or STA to generate different effects.
lda midOP, x; X is a shape index
sta OPACITY
...
; later in the subroutine
ldx OPACITY
lda OPCODE, x
sta :smod; patch instruction!
...
lda (IMAGE), y
:smod: ora (BASE), y
sta (BASE), y
OPCODE: dfb $31 ; and (oper),Y
dfb $11 ; ora
dfb $91 ; sta
dfb $51 ; eor
dfb $31 ; and
dfb $91 ; sta
Now how will the decompiler handle this?
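A human who understands the OPCODE table would probably rewrite the patched instruction as a dispatch table, roughly like the Python sketch below. The function names are hypothetical and the mapping is my own reading of the listing, not something a tool could derive from the code as written.

```python
import operator

# Opcode bytes copied from the OPCODE table in the listing above.
OPCODE = [0x31, 0x11, 0x91, 0x51, 0x31, 0x91]

# What each patched instruction does to A (image byte) and the screen byte.
# With STA patched in, A is stored unchanged, so the image byte simply wins.
PATCHED_OP = {
    0x31: operator.and_,                   # and (BASE),y
    0x11: operator.or_,                    # ora (BASE),y
    0x51: operator.xor,                    # eor (BASE),y
    0x91: lambda image, screen: image,     # sta (BASE),y
}

def draw_byte(image_byte, screen_byte, opacity):
    """Return the byte that 'sta (BASE),y' writes after the patched instruction."""
    op = PATCHED_OP[OPCODE[opacity]]
    return op(image_byte, screen_byte) & 0xFF

for opacity in range(len(OPCODE)):
    print(opacity, format(draw_byte(0b1100, 0b1010, opacity), "04b"))
```

A static decompiler, on the other hand, sees a fixed `ora (BASE),y` and an unrelated store into what it takes to be data. It would emit an OR every time and silently drop the other three effects.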
Furthermore, not all games are written in pure 6502 assembly language. There are adventure games that use their own action tables, and there are a lot of games that use some form of bytecode for their logic. Just to name a few:
- Zork I
- Maniac Mansion
- The Guild of Thieves
- Wizardry
- Dragon Wars
Infocom adventures use their own virtual machine known as the Z-machine.
Lucasfilm Games adventures come with SCUMM.
Magnetic Scrolls adventures are actually written in some kind of 68000 subset, so when the C64 version runs 'The Pawn', 'The Guild of Thieves', 'Jinxter', etc., it runs an emulator that basically emulates 68000 machine language (I'm not kidding).
Wizardry was written in UCSD Pascal and runs on a virtual stack machine, and Dragon Wars uses a virtual 8/16-bit processor.
Okay, you could say: that doesn't count, these are adventures, not 'real' games. How about the game 'Mercenary' (C64) then? It uses its own bytecode, which handles the 'Benson' messages and also lets you buy and sell things at different locations. This bytecode has direct access to the 6502 memory, and it uses this access to set flags, e.g. to flip the X values (used in 'The Second City'). And now? Even a tiny arcade game like 'Buck Rogers - Planet of Zoom' uses bytecode to move the aliens around the screen. This bytecode can do GOSUBs and GOTOs as well, which means that the address of the jump target is vital for the bytecode to function properly. To decompile this, you would have to understand the meaning of the bytecode: you need to know that certain bytes of it (which just look like data to the 6502 decompiler) are in fact pointers to different code sections.
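The pointer problem can be sketched with a made-up bytecode. Everything below (opcode numbers, the script, the addresses) is invented for illustration and is not the format used by 'Mercenary' or 'Buck Rogers'. It only shows what happens when a tool treats the script as plain data and moves it.

```python
# Hypothetical bytecode: opcode numbers and layout are invented for this sketch.
END, MOVE, GOTO, GOSUB, RET = 0, 1, 2, 3, 4

# Script assembled for address $2000; the GOSUB operand is an absolute address.
SCRIPT = bytes([
    GOSUB, 0x06, 0x20,   # $2000: gosub $2006
    MOVE, 1,             # $2003: x += 1
    END,                 # $2005: stop
    MOVE, 10,            # $2006: x += 10
    RET,                 # $2008: return to caller
])

def load(base):
    """Return a zeroed 64 KB memory with SCRIPT copied to base, bytes untouched."""
    mem = bytearray(0x10000)
    mem[base:base + len(SCRIPT)] = SCRIPT
    return mem

def run(mem, pc, max_steps=100):
    """Interpret the script and return the final X position of the 'alien'."""
    x = 0
    stack = []
    for _ in range(max_steps):
        op = mem[pc]
        if op == MOVE:
            x += mem[pc + 1]
            pc += 2
        elif op == GOTO:
            pc = mem[pc + 1] | (mem[pc + 2] << 8)
        elif op == GOSUB:
            stack.append(pc + 3)
            pc = mem[pc + 1] | (mem[pc + 2] << 8)
        elif op == RET:
            pc = stack.pop()
        elif op == END:
            return x
        else:
            raise ValueError(f"unknown bytecode {op}")
    raise RuntimeError("step limit reached")

print(run(load(0x2000), 0x2000))   # script at its original address
print(run(load(0x3000), 0x3000))   # same bytes moved, pointer left as-is
```

At its original address the script moves the alien by 11. Moved as an opaque data block, the GOSUB still points at $2006, which is now empty memory, so the script ends at once without any error. Only someone who knows that bytes 1 and 2 form a pointer can relocate it correctly.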
If you really want to handle this automatically without knowing the meaning of the bytecode, you somehow have to emulate the 64 KB memory of the 6502. But when you start emulating the memory, then why not emulate the whole machine? One must not forget: games were written for a specific machine in terms of video and sound. C64 games rely heavily on sprite collision detection or sprite multiplexing. Prince of Persia (Apple II) needs the high bit set in its shapes (possible on the Apple II, resulting in the four colours black, white, orange, and blue) so that it can shift and mirror the shapes easily. How should one decompile this?
Conclusion: sorry, but I can only say this: 'Decompiler? Impossible!'
Cheers
Miles