Crowdfunding pledge for a 6502 decompiler
Crowdfunding pledge for a 6502 decompiler
Hi there,
I have got a mail notification from the poster of that interesting project: a 6502 decompiler:
http://www.indiegogo.com/6502
to "decompile" 6502 machine code into C - even code that had been written purely in assembler.
with a demo here
http://6502.emulinks.de/
which explains some of the pitfalls the author wants to fix by working on the pledge.
I think the project is interesting and ambitious - I'm looking forward to the results!
I have got a mail notification from the poster of that interesting project: a 6502 decompiler:
http://www.indiegogo.com/6502
to "decompile" 6502 machine code into C - even code that had been written purely in assembler.
with a demo here
http://6502.emulinks.de/
which explains some of the pitfalls the author wants to fix by working on the pledge.
I think the project is interesting and ambitious - I'm looking forward to the results!
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Crowdfunding pledge for a 6502 decompiler
It looks like a very ambitious project. I'm partly thinking about all the spaghetti code that would have to not only be transferred to a higher-level language, but the decompiler would have to figure out entirely different approches to arrive at structures, if it is going to go so much further than just a disassembler. I will be anxious to see what secrets might be discovered. With such a huge software base, there's bound to be some outstanding method someone kept to themselves, either on purpose, or simply never got around to publishing it.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Crowdfunding pledge for a 6502 decompiler
This won't work, I'm afraid. I've reverse engineered _a lot_ of programs and based on that experience I can only say that such a task cannot be done for several reasons.
First of all, if you take a look at the examples on that website, you soon see the great disadvantage of this approach: it does not add any semantics. It just exchanges one obscurity for another. Looking at the C code you still don't know what it means, even worse, what might have been readable for an experienced 6502 programmer turns into some cryptic gibberish. You do not gain meaning, you lose it. So why do it?
The very idea of a decompiler is based on a (old) misconception how language works, how syntax and semantics relate to each other. While this, sorry to say so, is typical of a computer scientiest, a linguist will only shake his head over this rather naive idea. While there is Turing to help us emulating a program code there is also Goedel who tells us there are unavoidable pitfalls in the way referencing is done. Try to translate the following sentence into French: 'This sentence will have a different meaning when translated into French.' And now translate the following expression into C:Want some real examples? Here we go...
A lot of programs that deal with 3d use look-up tables for fast multiplication. If the program uses logarithm look-up tables, then how will the decompiler know what the following subroutine actually does?A good conversion to C should result in
A = (X * Y) >> 8;
Bet, the decompiler won't do that.
Then, of course, there is the problem of self-modifying code. Prince of Persia, for example, will patch its drawing routines with instructions AND, ORA, EOR, or STA to generate different effects.Now how will the decompiler handle this?
Furthermore, not all games are written in pure 6502 assembly language. There are adventure games that use their own action tables, and there are a lot of games that use some form of bytecode for their logic. Just to name a few:
- Zork I
- Maniac Mansion
- The Guild of Thieves
- Wizardry
- Dragon Wars
Infocom adventures use their own virtual machine known as the Z-machine.
Lucasfilm Games adventures come with SCUMM.
Magnetic Scrolls adventures are actually written in some kind of 68000 subset, so when the C64 version runs 'The Pawn', 'The Guild of Thieves', 'Jinxter' etc it runs an emulator that basically emulates 68000 machine language (I'm not kidding).
Wizardry was written in UCSD-Pascal and runs on a virtual stack machine, and Dragon Wars uses a virtual 8/16 bit processor.
Okay, you could say: that doesn't count, these are adventures, not 'real' games. How about the game 'Mercenary' (C64) then? It uses its own bytecode which handles the 'Benson' messages and will also enable you to buy and sell things on different locations. This bytecode has direct access to the 6502 memory and it uses this access to set flags, e.g. to flip the X values (used in 'The Second City'). And now? Even a tiny arcade game like 'Buck Rogers - Planet of Zoom' uses bytecode to move the aliens around the screen. And this bytecode can do GOSUBs and GOTOs as well. And this means that the address of the jump target is vital for the bytecode to function properly. To decompile this, you would have to understand the meaning of this bytecode. You need to know that certain bytes of the bytecode (which just look like data to the 6502 decompiler) are in fact pointers to different code sections.
If you really want to handle this automatically without knowing the meaning of the bytecode, you somehow have to emulate the 64kb memory of the 6502. But when you start emulating the memory, then why not emulate the whole machine? One may not forget: games were written for a specific machine in terms of video and sound. C64 games rely heavily on sprite collision detection or sprite multiplexing. Prince of Persia (AppleII) needs the high bit set in its shapes (possible on the AppleII resulting in four colours black, white, orange, and blue) so that it can shift and mirror the shapes easily. How should one decompile this?
Conclusion: sorry, but I only can say this: 'Decompiler? Impossible!'
Cheers
Miles
First of all, if you take a look at the examples on that website, you soon see the great disadvantage of this approach: it does not add any semantics. It just exchanges one obscurity for another. Looking at the C code you still don't know what it means, even worse, what might have been readable for an experienced 6502 programmer turns into some cryptic gibberish. You do not gain meaning, you lose it. So why do it?
The very idea of a decompiler is based on a (old) misconception how language works, how syntax and semantics relate to each other. While this, sorry to say so, is typical of a computer scientiest, a linguist will only shake his head over this rather naive idea. While there is Turing to help us emulating a program code there is also Goedel who tells us there are unavoidable pitfalls in the way referencing is done. Try to translate the following sentence into French: 'This sentence will have a different meaning when translated into French.' And now translate the following expression into C:
Code: Select all
lda #$60
sta label
label: jmp label
A lot of programs that deal with 3d use look-up tables for fast multiplication. If the program uses logarithm look-up tables, then how will the decompiler know what the following subroutine actually does?
Code: Select all
clc
lda $4000, x
adc $4000, y
lda $4100, x
adc $4100, y
bcc label
tax
lda $4200, x
rts
label: lda #0
rts
A = (X * Y) >> 8;
Bet, the decompiler won't do that.
Then, of course, there is the problem of self-modifying code. Prince of Persia, for example, will patch its drawing routines with instructions AND, ORA, EOR, or STA to generate different effects.
Code: Select all
lda midOP, x; X is a shape index
sta OPACITY
...
; later in the subroutine
ldx OPACITY
lda OPCODE, x
sta :smod; patch instruction!
...
lda (IMAGE), y
:smod: ora (BASE), y
sta (BASE), y
OPCODE: dfb $31 ; and (oper),Y
dfb $11 ; ora
dfb $91 ; sta
dfb $51 ; eor
dfb $31 ; and
dfb $91 ; sta
Furthermore, not all games are written in pure 6502 assembly language. There are adventure games that use their own action tables, and there are a lot of games that use some form of bytecode for their logic. Just to name a few:
- Zork I
- Maniac Mansion
- The Guild of Thieves
- Wizardry
- Dragon Wars
Infocom adventures use their own virtual machine known as the Z-machine.
Lucasfilm Games adventures come with SCUMM.
Magnetic Scrolls adventures are actually written in some kind of 68000 subset, so when the C64 version runs 'The Pawn', 'The Guild of Thieves', 'Jinxter' etc it runs an emulator that basically emulates 68000 machine language (I'm not kidding).
Wizardry was written in UCSD-Pascal and runs on a virtual stack machine, and Dragon Wars uses a virtual 8/16 bit processor.
Okay, you could say: that doesn't count, these are adventures, not 'real' games. How about the game 'Mercenary' (C64) then? It uses its own bytecode which handles the 'Benson' messages and will also enable you to buy and sell things on different locations. This bytecode has direct access to the 6502 memory and it uses this access to set flags, e.g. to flip the X values (used in 'The Second City'). And now? Even a tiny arcade game like 'Buck Rogers - Planet of Zoom' uses bytecode to move the aliens around the screen. And this bytecode can do GOSUBs and GOTOs as well. And this means that the address of the jump target is vital for the bytecode to function properly. To decompile this, you would have to understand the meaning of this bytecode. You need to know that certain bytes of the bytecode (which just look like data to the 6502 decompiler) are in fact pointers to different code sections.
If you really want to handle this automatically without knowing the meaning of the bytecode, you somehow have to emulate the 64kb memory of the 6502. But when you start emulating the memory, then why not emulate the whole machine? One may not forget: games were written for a specific machine in terms of video and sound. C64 games rely heavily on sprite collision detection or sprite multiplexing. Prince of Persia (AppleII) needs the high bit set in its shapes (possible on the AppleII resulting in four colours black, white, orange, and blue) so that it can shift and mirror the shapes easily. How should one decompile this?
Conclusion: sorry, but I only can say this: 'Decompiler? Impossible!'
Cheers
Miles
-
White Flame
- Posts: 704
- Joined: 24 Jul 2012
Re: Crowdfunding pledge for a 6502 decompiler
I wholeheartedly concur with Miles. Having done a ton of 6502 reverse engineering as well, I noted the repetitiveness of what I was doing in order to try to understand the code, and sought automation. It is absolutely an AI problem, not one of merely "smart" conversion.
I actually don't consider it impossible for a large majority of cases (including selfmod code examples listed above), and would like to eventually get a proper AI framework set up in order to tackle conceptualization of assembly language in context of a full machine (I build commercial AI frameworks for a living), but this is nothing that $48k will hope to touch. It's a massive undertaking, and nothing on his page indicates that he has any breakthrough that would allow him to get through it, nor does he even acknowledge the difficulty of automated reverse engineering. The output of a straight asm-to-C conversion will be as unreadable (or moreso) than the original asm.
edit: I had a look at one of the example disassemblies, and yeah that's pretty much the case. There's simply no way to directly convert the stack return address trickery used, nor clever usages of the carry flag & bit-shifting instructions in C, as noted in his comments. He ties their solution to specific buzzwords, but that's only the tip of the iceberg regarding the challenges he faces.
I actually don't consider it impossible for a large majority of cases (including selfmod code examples listed above), and would like to eventually get a proper AI framework set up in order to tackle conceptualization of assembly language in context of a full machine (I build commercial AI frameworks for a living), but this is nothing that $48k will hope to touch. It's a massive undertaking, and nothing on his page indicates that he has any breakthrough that would allow him to get through it, nor does he even acknowledge the difficulty of automated reverse engineering. The output of a straight asm-to-C conversion will be as unreadable (or moreso) than the original asm.
edit: I had a look at one of the example disassemblies, and yeah that's pretty much the case. There's simply no way to directly convert the stack return address trickery used, nor clever usages of the carry flag & bit-shifting instructions in C, as noted in his comments. He ties their solution to specific buzzwords, but that's only the tip of the iceberg regarding the challenges he faces.
Re: Crowdfunding pledge for a 6502 decompiler
Eh. It's fair to criticize a decompiler for it's potential lack of understanding of 6502 nuances, but blaming a 6502 decompiler for not being able to decompile byte code is a bit much. Of course it can't.
What you'd end up with is a decompilation of the run time, and blocks of data representing the byte code. The decompiler would be mostly savvy enough to capture the distinction between "code" and "data", even if the data itself is also "code" (just not 6502 code).
A basic decompiler would have a mess of a time decompiling a Forth interpreter. Especially automatically. You'd be lucky if it got past decompiling the most basic primitives, and yet it would still miss a majority of the code words since at start up, Forth is pretty much entirely data driven.
But reverse engineering never works alone, there's typically a person guiding it and using the tools as appropriate.
What you'd end up with is a decompilation of the run time, and blocks of data representing the byte code. The decompiler would be mostly savvy enough to capture the distinction between "code" and "data", even if the data itself is also "code" (just not 6502 code).
A basic decompiler would have a mess of a time decompiling a Forth interpreter. Especially automatically. You'd be lucky if it got past decompiling the most basic primitives, and yet it would still miss a majority of the code words since at start up, Forth is pretty much entirely data driven.
But reverse engineering never works alone, there's typically a person guiding it and using the tools as appropriate.
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Re: Crowdfunding pledge for a 6502 decompiler
Miles J. wrote:
This won't work, I'm afraid....'Decompiler? Impossible!'
I seldom use the word "impossible" when it comes to computers, but I'd have to agree with Miles that this would be a very tough nut to crack. I can think of many instances where a "decompiler" would fail. A simple example would the use of the PRIMM subroutine in the Commodore C-128 kernel (also posted in the code archives in several forms):
Code: Select all
jsr primm
.byt 'Text printed by PRIMM.',$00How would the decompiler know that the bytes following the PRIMM call are text, not code? It wouldn't unless it was able to recognize a call to the C-128 kernel when it "sees" one. If that were the case, then it would no longer be a general purpose tool.
The fact is, many 6502 programs were written with data tables scattered everywhere, and few have them neatly ensconced at the end of the program (the Commodore kernel is full of such code). Again, how would the decompiler know this unless it already had detailed knowledge of the machine code and the system on which it is/was run? Or, what about the use of one of the BIT opcodes to produce "blind" jumps over pieces of code? What would that look like to a decompiler? In order for it to get that sort of information, someone (human) would have had to have written it into the decompiler. If someone is going to expend that kind of effort and has that level of programming talent, they might as well produce a commented disassembly of the program of interest and publish the results so others can, if they're so inclined, write the equivalent in C (which should be less difficult than what it took to develop the original).
While I would never discourage anyone from working on whatever it is that floats their boat, I stop at the point where funding is requested for something that will most likely prove to be an exercise in futility.
BTW, the author at that site said this:
- Machine code is really hard to grasp because it is entirely void of any sort of structure. It is simply not designed to be understandable.
Machine code is not "designed" to be anything, other than executable. I had to chuckle when I read that, since I, as well as numerous others, obviously do understand this stuff.
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: Crowdfunding pledge for a 6502 decompiler
White Flame wrote:
Having done a ton of 6502 reverse engineering as well
whartung wrote:
Eh. It's fair to criticize a decompiler for it's potential lack of understanding of 6502 nuances, but blaming a 6502 decompiler for not being able to decompile byte code is a bit much. Of course it can't.
whartung wrote:
What you'd end up with is a decompilation of the run time, and blocks of data representing the byte code. The decompiler would be mostly savvy enough to capture the distinction between "code" and "data", even if the data itself is also "code" (just not 6502 code).
Code: Select all
@label 'vertex_rotate' := $87e4
@collect $4000
@checklabel
@set $800
@write $bfff
whartung wrote:
A basic decompiler would have a mess of a time decompiling a Forth interpreter.
whartung wrote:
But reverse engineering never works alone, there's typically a person guiding it and using the tools as appropriate.
BTW unfortunately I'm not a millionaire. Otherwise I would like to spend $3000 on this:
Ulrich Hecht wrote:
$3,000 Custom reversing
You'll get updates, a note, a 6502, git access, and you can name one historical 6502 program that I will personally reverse-engineer completely: I will decompile it, give functions and variables sensible names, and add comments, until the result is as good as hand-written C code.
You'll get updates, a note, a 6502, git access, and you can name one historical 6502 program that I will personally reverse-engineer completely: I will decompile it, give functions and variables sensible names, and add comments, until the result is as good as hand-written C code.
Cheers
Miles
Re: Crowdfunding pledge for a 6502 decompiler
@White Flame
Sorry, just noticed you joined today. Welcome to the forum!
Only for fun I fed the prototype decompiler with the following codeThe result was:
However, what the code does is:
What does it tell us? The prototype decompiler does not detect self-modifying code yet(!), and, to be frank, I don't see how even the most sophisticated decompiler could handle this piece of mean, ugly, nasty code properly. As you can see there is no added data involved, just pure 6502 code. And yet, under the surface (or rather on top of it) there is another layer of threaded code with the address values of the routines used as opcode values. Since assembly language is an interpreted language, code that at first glance looks static (with the exception of the patched JMP instruction) can easily produce a continuous state change. The generated static C code in contrast depends on one single state of the memory. If it wants to imitate the state change of the 6502 program, it has to convert the code flow somehow into some data flow which (in case it will ever be successful) will result in ... exactly, you've guessed it: an emulator. So, what was the point exactly?
Oh well.. Enough of this silly rant... Where is the weekend when you need it...?
Cheers
Miles
Sorry, just noticed you joined today. Welcome to the forum!
Only for fun I fed the prototype decompiler with the following code
Code: Select all
$0: a0 0b ldy #$b
$2: a2 00 ldx #0
$4: 84 09 sty $9
$6: 86 0a stx $a
$8: 4c 0b 00 jmp $b
$b: a0 11 ldy #$11
$d: e0 01 cpx #1
$f: 90 f3 bcc $4
$11: a9 12 lda #$12
$13: e8 inx
$14: d0 ee bne $4
$16: 4c 00 00 jmp $0
$19: 60 rts
; insert some other code here
$1a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$2a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$3a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$4a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$5a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$6a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$7a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$8a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$9a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$aa: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$ba: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$ca: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$da: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$ea: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$fa: 00 00 00 00 00 00
$100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
$110: 00
$111: 69 34 adc #$34
$113: a0 19 ldy #$19
$115: ca dex
$116: 4c 04 00 jmp $4
Code: Select all
extern uint8_t mem_000a;
extern uint8_t mem_0009;
void start_0000()
{
uint8_t y;
uint8_t x;
label_0000:
y = 11;
x = 0;
do {
do {
mem_0009 = y;
mem_000a = x;
y = 17;
} while (x < 1);
x = x + 1;
} while (x);
goto label_0000;
}
Code: Select all
clc
lda #$12
adc #$34
rts
Oh well.. Enough of this silly rant... Where is the weekend when you need it...?
Cheers
Miles
- BitWise
- In Memoriam
- Posts: 996
- Joined: 02 Mar 2004
- Location: Berkshire, UK
- Contact:
Re: Crowdfunding pledge for a 6502 decompiler
If the decompiler performs a static analysis of the code it will never correctly figure out what is going on in self-modifying programs. You'd have to emulate the CPU and its hardware to build up a more dynamic picture of how the code executes and changes over time.
However, judging from the video the main target for this decompiler is ROM based code in game cartridges so its not likely to come across anything too horrible. It should be able to reverse engineer a high percentage of most games.
However, judging from the video the main target for this decompiler is ROM based code in game cartridges so its not likely to come across anything too horrible. It should be able to reverse engineer a high percentage of most games.
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
Re: Crowdfunding pledge for a 6502 decompiler
This is indeed a very open-ended project. I can't even think how one might specify exactly what the perfect decompiler would do. The aim, clearly, is to move incrementally to ever-better decompilers, presumably picking off the tractable problems found in common code. Much like static analysis for defects, it's not something which is ever finished. But with a budgeted 9 months of full-time effort, could be some significant result.
Note that Ulrich has already helpfully provided 3 links for people arguing that it can't be done: he's not unaware of these points.
The big problem seems to me to be the budget: he needs quite a number of supporters willing to donate.
(The demo is interesting. Note that Ulrich is responsible for tcc-65816)
Cheers
Ed
Note that Ulrich has already helpfully provided 3 links for people arguing that it can't be done: he's not unaware of these points.
The big problem seems to me to be the budget: he needs quite a number of supporters willing to donate.
(The demo is interesting. Note that Ulrich is responsible for tcc-65816)
Cheers
Ed
Re: Crowdfunding pledge for a 6502 decompiler
BigDumbDinosaur wrote:
Miles J. wrote:
This won't work, I'm afraid....'Decompiler? Impossible!'
I seldom use the word "impossible" when it comes to computers, but I'd have to agree with Miles that this would be a very tough nut to crack. I can think of many instances where a "decompiler" would fail. A simple example would the use of the PRIMM subroutine in the Commodore C-128 kernel (also posted in the code archives in several forms):
Code: Select all
jsr primm
.byt 'Text printed by PRIMM.',$00I assume that PRIMM take the return address from the stack, prints out the text following it up to the $00 byte and then JMPs back to the end.
That wouldn't necessarily be a difficult platform specific rule to add to a decompiler.
Just like if the compiler sees a block of text that's not JMPd, JSRd, or falls in to (via the PC just running in to it), but also all happens to be ASCII, ending with either a hi-bit ascii or a null, that perhaps that's a 'Some String' vs a blob of hex codes. It's fair guess lacking any other information.
Over time, more generic rules as well as platform specific rules "Hey I saw this address passed to an IO routine, it's probably an IO block structure".
Lots of stuff can be done statically. If a simulator is added, then even more flow analysis can be done. And it doesn't have to be done on the entire program all at once. The user can point it at a piece of code and say "try this out and see what you can learn", and away it goes.
Do not underestimate what a tool like this can be capable of given time. One of their nice features is that once they learn something, they never forget. They just need to be guided by an experienced hand.
Finally, not all of the code is written obscurely. It's meant to be understood by humans as well, and folks tend to write clearer code over obscure code when they possibly can. Unlike the machines, we forget stuff all the time. Things like "now, how did this work again?".
-
White Flame
- Posts: 704
- Joined: 24 Jul 2012
Re: Crowdfunding pledge for a 6502 decompiler
Miles J. wrote:
White Flame wrote:
Having done a ton of 6502 reverse engineering as well
I also started on WFDis back in 1996 or so (wow). It wasn't great but is at least interactive and did a lot of things that I'd not seen any other disasm do back then. I've since done a lot of analysis work on what it would take to have the machine perform reverse engineering, instead of just displaying information for a human to reverse engineer from. That's actually what drove me to get into AI.
BigEd wrote:
This is indeed a very open-ended project. I can't even think how one might specify exactly what the perfect decompiler would do. The aim, clearly, is to move incrementally to ever-better decompilers, presumably picking off the tractable problems found in common code. Much like static analysis for defects, it's not something which is ever finished. But with a budgeted 9 months of full-time effort, could be some significant result.
At some point, adding in more knowledge via explicit tests and exceptions turns even a slick, working proof of concept into a buggy, unwieldy mess.
However, there is value to a system that only goes somewhat smarter than most disassemblers. It just can't be expected to grow into a fully aware decompiler, especially for the idiosyncracies of the platforms involved.
edit:
whartung wrote:
Simple, you tell the decompiler that the binary is for a C-128 so that it can apply known heuristics to the process (such as "what PRIMM does").
...
Over time, more generic rules as well as platform specific rules "Hey I saw this address passed to an IO routine, it's probably an IO block structure".
...
Over time, more generic rules as well as platform specific rules "Hey I saw this address passed to an IO routine, it's probably an IO block structure".
Since it's also converting to another language, any misunderstanding it has will lose a ton of information about what the original program actually did, instead of just having a confusing 1:1 copy of the original pass through.
Re: Crowdfunding pledge for a 6502 decompiler
whartung wrote:
Simple, you tell the decompiler that the binary is for a C-128 so that it can apply known heuristics to the process (such as "what PRIMM does").
Code: Select all
$7b3a: jsr $7439
hex f4,d2,68,98,22,89,83,29
hex 89,51,90,52,68,9b,61,43
hex d4,e4,07,fa,52,1f,d4,af
hex b3,bf,ae,03,e2,bc,37,7d
hex 03,65,49,21,9b,c3,31,2c
hex c1,9a,4c,df,f1,de,80
whartung wrote:
It's fair guess lacking any other information.
whartung wrote:
Finally, not all of the code is written obscurely.
White Flame wrote:
The problem is that the very architecture he's working on won't get him any farther than anyone else has gone, regardless of time spent.
BigEd wrote:
Note that Ulrich has already helpfully provided 3 links for people arguing that it can't be done: he's not unaware of these points.
White Flame wrote:
it needs some very fundamental and powerful conceptualization basis
White Flame wrote:
Plus, it must "know" some useful description of the system
games & scene stuff, helped out a lot with others trying to figure out copy protections, compression algorithms, etc.see there are still people actively supporting the C64!
I also started on WFDis back in 1996 or so (wow). It wasn't great but is at least interactive and did a lot of things that I'd not seen any other disasm do back then. I've since done a lot of analysis work on what it would take to have the machine perform reverse engineering, instead of just displaying information for a human to reverse engineer from. That's actually what drove me to get into AI.Sorry, I've never come across that program. Well, I bet you know more about decompiling then than Mr Hecht.
Cheers
Miles
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Re: Crowdfunding pledge for a 6502 decompiler
Miles J. wrote:
Good to see there are still people actively supporting the C64!
I have a C-128D here with a Lt. Kernal hard drive subsystem here, and it all works. And I'm still impressed with what Lloyd Sponenburgh and Roy Southwick accomplished in developing that subsystem. I'd love to see someone try to diassemble, let alone, decompile the Lt. Kernal DOS.
There is a lot of obfuscation in that system, some of which was necessary to prevent piracy (there were two piracy attempts in the late 1980s that I know of—neither of them successful). However, a lot of the obfuscation had to do with trying to cram a minicomputer-like operating system into an eight bit machine with only 8 KB in which to do it. The inevitable result was a lot of self-modifying code, overlays read in from specific disk locations, tons of hardware bit-twiddling to map shadow RAM (where the LK-DOS lived) in and out of processor space, etc. Adding insult to injury, the Lt. Kernal was a SCSI system and the SCSI bus had to be bit-banged—there weren't any SCSI ASIC's around back then to off-load the bus protocol. The core of the Lt. Kernal substantially emulated the Point 4 IRIS operating environment, which ran on one of the first minicomputers to use SASI to communicate with disks and tapes. The LK even supported keyed index files, themselves devilishly difficult to implement on a 8 bit MPU running at 1 MHz.
While I wouldn't say turning all that back into assembly language is impossible, the question I would have is, "Where do you start?" It isn't like a typical computer program, where there is a beginning, middle and end. Just the startup alone, which involved looking for a disk, figuring out the disk geometry (it was all ST-506/412 hardware in the early days), loading the DOS one piece at a time, and then making the entire mess not get in the way of the BASIC interpreter but still respond to typed commands, such as copying a file, running a program, etc., was terribly complex—far more so than the most complicated game ever written for 8 bit Commodore hardware.
Now, suppose someone did succeed in rendering the LK-DOS into C source code. What good would it do? Due to the small space in which the LK-DOS has to run, it's unlikely the most efficient C compiler on the planet could produce machine code that would be sufficiently succinct to work. Back in the day, I was very much involved with Lt. Kernel database development and had access to information that Fiscal Information was normally unwilling to disseminate. I also had many phone conversations with developers Sponenburgh and Southwick and heard all the accounts of what they went through to get it to work on the C-64, let alone on the C-128. They often spent days on one routine trying to contract it to where it would fit in the available space and run as fast as that 6510/8502 could run it.
I seriously doubt anyone will ever be able to reduce the LK-DOS to C—same for a lot of other "outer edge" C-64/C-128 programs written to squeeze the last drop of performance from the system. There is way more abstractness in it than can be imagined and despite heroic efforts by many brilliant people, machine intelligence is still at the level of a well-trained donkey.
x86? We ain't got no x86. We don't NEED no stinking x86!
-
Nightmaretony
- In Memoriam
- Posts: 618
- Joined: 27 Jun 2003
- Location: Meadowbrook
- Contact:
Re: Crowdfunding pledge for a 6502 decompiler
Am willing to bet this project is being put forth by a C programmer type who is not so familiar with the basics and nuances of assembly language and is merely seeking an easy reverse engineering by automate means. This would be impossible since programing is very much of an art with each programmer as style. An automated decompiler translator would not do so well, as BDD's example above shows nicely.
Miles: Thanks on bringing up Becky. did music for her Dragon Wars and Ultima. I owe my own programming skills to her.
And for my own bit of reverse engineering of software, will be translating over a slot machine program over to my pinball board (a vegas slot machine, not a video version) The original is written in 8039 but is well commented. I plan to use it as a jumping off point for understanding the various functions the original program does then write my own version.
Miles: Thanks on bringing up Becky. did music for her Dragon Wars and Ultima. I owe my own programming skills to her.
And for my own bit of reverse engineering of software, will be translating over a slot machine program over to my pinball board (a vegas slot machine, not a video version) The original is written in 8039 but is well commented. I plan to use it as a jumping off point for understanding the various functions the original program does then write my own version.
"My biggest dream in life? Building black plywood Habitrails"