6502.org • View topic - Static recompilation of 6502 machine code from NES ROMs

View unanswered posts | View active topics

Board index » 6502.org Users Forum » Programming

All times are UTC

Static recompilation of 6502 machine code from NES ROMs

Page 3 of 3

[ 36 posts ]

Go to page Previous 1, 2, 3

Previous topic | Next topic

Author

Message

TMorita

Post subject: Re: Static recompilation of 6502 machine code from NES ROMs

Posted: Thu Aug 22, 2013 7:49 am

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214

White Flame wrote:

The other argument you bring up is that these techniques aren't always possible, depending on the situation. Well, that doesn't stop anybody from writing optimization strategies, which would also generate very incorrect code if their constraints were ignored.
...

You seem to consider yourself very knowledgeable about compiler optimizations.
Let me briefly describe my compiler optimization experience.
I would be interested in hearing about your experiences in the field of compiler
optimzation, and invite you to list your experience in return.

From 1993 to about 2004 I was a contributor to the GNU C codebase. If you search the
GNU C Changelogs, you will find references to my name because I submitted numerous patches.

From 2001 to 2004 I worked at Renesas and one of my responsibilities was to manage
GNU C development worldwide for Renesas. I designed both new compiler optimizations
and enhancements to existing compiler optimizations, and we had a team in India
at HCL Technologies that implemented the optimiations. We impelemented about a dozen
new compiler optimizations, and four of the projects resulted in technical papers which were
presented at the GCC Developer's Summit:

"Alias Analysis for Intermediate Code" in Proceedings of the 2003 GCC Developer's Summit
Sanjiv K. Gupta, HCL Technologies and Naveen Sharma, HCL Technologies
http://ols.fedoraproject.org/GCC/Reprin ... alysis.pdf

"Optimal Stack Assignment in GCC" in Proceedings of the 2003 GCC Developer's Summit
Naveen SHarma, HCL Technologies and Sanjiv K. Gupta, HCL Techologies
http://ols.fedoraproject.org/GCC/Reprin ... Assign.pdf

"Register Rematerialization in GCC" in Proceedings of the 2004 GCCC Developer's Summit
Mukta Punjani, Cadence
(Mukta was hired away by Cadence just before the paper was submitted,
thus the Cadence affiliation)
http://www.netgull.com/gcc/summit/2004/ ... zation.pdf

"Addressing Mode Selection" in Proceedings of the 2004 GCC Developer's Summit
Rakesh Kumar, HCL Technologies
ftp://ftp.lip6.fr/pub/gcc/summit/2004/A ... ection.pdf

If you read the papers, you'll see that I am mentioned in the Acknowledgements section
in three of the papers.

I was also on the paper review committee of the GCC Developer's Summit for a few years
and reviewed compiler papers to be presented at the summit.

I am also listed as inventor or co-inventor on five patents, of which one is in the
field of compiler optimization:

"System and Method for delaying Indirect Register Offset Resolution"
Assignee: Sega of America
U.S. Patent 6,275,984 (Date of Patent: Aug 14, 2001) (sole inventor)
http://www.google.com/patents/US6275984

This is a patent for a compiler which implements an optimization for processor
targets which have a small number of offset bits for the indirect addressing mode.

I also have other compiler experience such as designing and implementing two custom
programming languages, implementing a MIPS JIT, compiler front-end work, Javascript JIT
work, etc but I won't go into detail since they're not optimization-related.

So hopefully now that you understand that I have some experience in the field
of compiler optmization, you will take my ramblings a little more seriously.

The problem that I see is that you have latched onto this idea that this is a problem
which can be solved by a compiler optimization, even after I have pointed out the holes
in the proposed solution. You keep insisting that it's possible to solve it in even more
complex ways. This is a slippery slope.

First, you think cprop will solve the problem. I point out it doesn't handle the PRIMM case.
Your solution is to simulate execution of the code in order to derive the correct values.
I will point out (for some cases) that values are dependent on values in NES hardware registers.
Your solution will be to simulate the NES hardware during recompilation.
I will point out that some NES hardware registers values are timing dependent.
Your solution will be to simulate the timing of the NES hardware during recompilation.
Some values are dependent on player input. Your solution will be to simulate
player input during recompilation. The solution keeps getting more complicated because the
original concept (static analysis can yield branch target of JMP (x)) is flawed.
.
This is why I started thinking about non-compiler-optimization-related solutions, and
developed the compiled-code-in-a-switch-statement solution described earlier.

Toshi

Top

White Flame

Post subject: Re: Static recompilation of 6502 machine code from NES ROMs

Posted: Thu Aug 22, 2013 12:29 pm

Joined: Tue Jul 24, 2012 2:27 am
Posts: 672

TMorita wrote:

My original point is that static code analysis problem on the 6502 is completely different from the static code analysis problem with Java bytecodes, because the Java bytecodes were designed for specifically to allow static analysis.

And I agreed with that. The particulars of what JSR, RTS, JMP, etc do are completely different than what the invoke and return (or jsr/ret) bytecodes do, as are the boundaries of functions and data (Java = explicit, 6502 = nonexistent in machine code), low-level memory access, return address access, etc.

The very simplistic detection of enumerations, however, is NOT analysis of machine- or bytecode. It is simply looking at the set of stores to & usage of a variable, and seeing which values are known compile-time constants. This is at instruction-level IR, not 6502 or Java specific, and does not need to consider any larger block structure like function boundaries. Therefore it is a static code analysis technique that can work on (and is agnostic to) both 6502 and Java.

The problems with analyzing the PRIMM/PUTSTRI function on the 6502 are of discovering what the parameters are, and how the function returns. If those are found, then the function's model can match that of Java and it can be expressed cleanly in a Java method call, passing a proper parameter and returning properly, (and explicitly performing any additional side effects that were implicitly performed on the 6502, of which there were none in this case), without any hackery... if the transformations involved are within the bounds of the system's capability. (Consider a simple case where the JSR is always followed by a single 1-byte parameter on one end, vs followed by an interpreted bytecode stream on the impossible end.)

Are these statements agreeable?

Quote:

You insisted there was no difference between the two.

For analysis that can work at an IR level or above, there is no difference. As we both know, LLVM is built on the entire concept that once the level is raised, analysis becomes portable, and GCC of course exploits this as well. I stated that one particular analysis, which works on the generalized concepts of the presence of variable access with constants, is indifferent to architecture.

Quote:

In order to implement PRIMM natively in Java, this requires popping the return address off the stack and storing it in a stack slot or local variable.

That is incorrect. There are other ways to implement it, and it is obviously impossible to do so in your proposed way in Java, because the language doesn't even have operators for that (ie, for the reasons you repeated). Let me quote myself: "the inter-method return address is not even visible to the bytecode language. Intra-method subroutine return addresses are technically 'visible', but immutable [should be 'opaque'] from the bytecode", and yes I understand the type system and what jsr/ret do with returnAddress values.

Converting it into a proper method call with a proper parameter, and transforming the calls to match, is really the sensible and ideal way to implement PRIMM natively in Java, but is constrained to when it's possible to do so. The highly-restrictive JVM will put a quick stop to attempts to work around it otherwise.

Again, I said straight up that Java is not a good target platform to assume reasonable native translation from 6502 code, and I care more about figuring out what unknown 6502 code does than how it's expressed in other languages. Porting 6502 to the JVM is your bone to pick here, not mine. The particular example you brought up worked out in expressability, which is a separate issue than typecast attempts.

Quote:

1) To demonstrate the JMP (x) problem is not solvable via constant propogation.

It is not always solvable via constant propagation. I never claimed it was, and pointed out your misrepresentation of this multiple times. If the variable is treated by the code under scrutiny as a simple enumeration of literal addresses, then in that particular case the problem of what x can hold is solvable. (again, replace "address" with "type" or whatever, and the concept is platform-agnostic)

Quote:

2) To demonstrate that certain 6502 code sequences are not natively implementable in Java bytecode because Java bytecode is limited, because it is designed to allow static code analysis.

...which is exactly why non-literal transformation would be required. Such transformation is potentially, but not always, possible with complexities approaching the halting problem. The particular example presented did allow for algorithmic transformation of 6502 stack hackery into a clean Java method call, and is a great case for when such analysis is fruitful.

Quote:

You seem to consider yourself very knowledgeable about compiler optimizations.

I mentioned them in passing reference. Garth already pointed to your resume, which I read, and thought optimization would be an apt parallel to the issue of constraints surrounding the usability of a technique, which you listed as a negative trait. Code analysis and potential recompilation as discussed here are non-optimizing transformations, as far as I'm concerned. My focus is reverse-engineering.

Quote:

The problem that I see is that you have latched onto this idea that this is a problem which can be solved by a compiler optimization, even after I have pointed out the holes in the proposed solution.

1) Constness is simply a predicate, not an optimization as used here. 2) You correctly point out holes in things that I didn't say, and I agree.

Quote:

First, you think cprop will solve the problem. I point out it doesn't handle the PRIMM case.
Your solution is to simulate execution of the code in order to derive the correct values.
I will point out (for some cases) that values are dependent on values in NES hardware registers.
Your solution will be to simulate the NES hardware during recompilation... recompilation... recompilation...

At every step, I noted the boundaries of tractability. The point is that if it doesn't work everywhere, some progress on analyzing a piece of code is better than no progress. Those who refuse to do any such analysis because its applicability is contextual make no progress on automatically reversing any particular code sample. Too many people simply throw their hands up and walk away from the concept in general, but some immediate chipping can be made at these barriers.

Complete coverage of everything needed to resolve everything would be nice. This is feasible on the low end and known to be impossible on the high end of complexity. The point is to raise that low end, not to claim all is possible.

Plus, again, I'm talking about reverse engineering analyses that might be useful in recompilation, not about the end goal of recompilation and all it would require.

(I also realized that I made reference to the crowdfunding guy in a prior post; sorry, wrong thread)

PS. I continue in good faith of interesting technical discussion and resolving misunderstandings, not argumentative rage.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor

Top

TMorita

Post subject: Re: Static recompilation of 6502 machine code from NES ROMs

Posted: Thu Aug 22, 2013 11:08 pm

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214

White Flame wrote:

I'm under the impression that that's exactly how emulator JITs structure their output. Interesting take on having a static C representation of it with a switch to build the jump table for you.

I don't understand how it would handle self-modifying code, though. If the opcode (or operand thereof) changed at an already-known address, it would still jump to the statically defined version. It would seem to require getting the PC to a non-statically defined address within the routine in order for the interpreter to kick in and see the selfmods.

The self-modifying code problem can be split into to problems: ROM and RAM.

The self-modiifying code problem does not exist for ROM, because ROM contents cannot be modified.
So code in ROM can be safely statically compiled because there is no danger of self-modification.
So the giant switch() statement only handles instructions in ROM for this reason.

The self-modifying code problem is handled for the RAM case by using the 6502 interpreter.
If the code jumps to RAM, then by definition it is running dynamically generated code because
the contents of RAM are undefined after reset.
When the interpreter executes a branch, it returns back to the main code loop to allow the switch
statement to resume execution of statically compiled code if applicable.

Toshi

Top

TMorita

Post subject: Re: Static recompilation of 6502 machine code from NES ROMs

Posted: Fri Aug 23, 2013 2:09 am

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214

WhiteFlame wrote:

Tmorita wrote:

My original point is that static code analysis problem on the
6502 is completely different from the static code analysis problem
with Java bytecodes, because the Java bytecodes were designed for
specifically to allow static analysis.

No. because this "simplistic detection of enumerations" algorithm will fail
for many cases due to failure to consider spatial/temporal locality of usage.

Here's an example of this algorithm leading to the wrong conclusion
for the 6502 case:

1) Store of 0 to address 4
2) Store of 0 to address 5
3) Store of $ed to address 4
4) Store of $fd to address 5
5) jmp ($0004) parsed

So the algorithm you described would conclude the JMP instruction may
jump to either $0000 or $fded. However, an actual examination of the
code might reveal that address 4/5 are used as a counter by one routine,
and as a jump target pointer by another routine.

Here's an example of this algorithm faling for the Java case:

for (i=0; i<8; i++) {
func(i)...
}
for (i=9; i<16; i++) {
...
}

So "simply looking at the stores to" the variable i would lead to the
incorrect conclusion that func() can be called with a parameter value of
zero through fifteen.

In order to make this work, you need two other compiler features:

1. Single Static Assignment form. SSA form treats every write to a variable as a separate instance of the variable.
2. In order to implement SSA form, it requires control flow analysis in order to build the use-def chains required for generating SSA form.

After these analysis routines are added, the optimization is the equivalent of "sparse conditional constant propagation". If you Google for these terms and use them, you can avoid retyping the description of the algorithm every time you want to describe it, and also avoid the hassle of describing it algorithmically every time it's referenced.

WhiteFlame wrote:

TMorita wrote:

You insisted there was no difference between the two.

For analysis that can work at an IR level or above, there is no difference.
As we both know, LLVM is built on the entire concept that once the level is raised,
analysis becomes portable, and GCC of course exploits this as well. I stated that
one particular analysis, which works on the generalized concepts of the presence
of variable access with constants, is indifferent to architecture.

The code may be portable, but that doesn't mean the two cases are identical.
As you agreed previously, the 6502 static code analysis case has special cases
that do not exist for the Java bytecode static analysis case.

So the set of features required to perform 6502 static code analysis intersects
the set of features required to perform Java static code analysis, but the
two feature sets are not equal.

WhiteFlame wrote:

TMorita wrote:

In order to implement PRIMM natively in Java, this requires popping the
return address off the stack and storing it in a stack slot or local variable.

I appreciate that you're finally conceding one point, although you've prefaced it
with, "That is incorrect."

Toshi

Top

TMorita

Post subject: Re: Static recompilation of 6502 machine code from NES ROMs

Posted: Fri Aug 23, 2013 5:17 am

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214

White Flame wrote:

...
Again, I said straight up that Java is not a good target platform to assume reasonable native translation from 6502 code, and I care more about figuring out what unknown 6502 code does than how it's expressed in other languages. Porting 6502 to the JVM is your bone to pick here, not mine.
...

The original poster of this thread asked about static recompilation of 6502 code from NES ROMs.
This "recompilation" part implies a target language which is unspecified, of which Java is a possibility.

If you only want to discuss what unknown 6502 code does and not how it's expressed in other languages, then you should create a separate thread to discuss your personal interests instead of hijacking fachat's thread about recompilation of 6502 code from NES ROMs.

WhiteFlame wrote:

...
The particular example you brought up worked out in expressability, which is a separate issue than typecast attempts.
...

The "little picture" or microscopic view of why PRIMM cannot be implemented with adjacent invoke* and ASCII data is the proper operators do not exist.

The underlying reason the relevant operators do not exist and are purposely not implemented is because they would violate the type safety model of Java.

So the "big picture" or macroscopic view of why PRIMM cannot be implemented with adjacent invoke* and ASCII data is because it creates a type-safety issue.

Toshi

Top

White Flame

Post subject: Re: Static recompilation of 6502 machine code from NES ROMs

Posted: Sat Aug 24, 2013 9:50 am

Joined: Tue Jul 24, 2012 2:27 am
Posts: 672

TMorita wrote:

No. because this "simplistic detection of enumerations" algorithm will fail
for many cases due to failure to consider spatial/temporal locality of usage.

The question was if this analysis is processor/platform agnostic. You applied similar failure cases to both 6502 and Java, so it seems that you agree with what I asked in particular. Yes, the conditions appear greater on the 6502, but only because they're already met on the JVM. As you show they're still fundamentally present across platforms.

I was going to bring up the particular example of spurious writes to vectors, but decided to keep it shorter instead of getting into postconditions, and since they can potentially be proven to be irrelevant. Regarding your particular simplified example (if that's considered to be one block of code), obviously tracing the program can show that the $0000 value is never called, but it's not guaranteed that that's always discernable. However, that's why systems can hypothesize and iterate, focusing analysis on trouble cases. If $0000 doesn't end up being valid code or causes other issues elsewhere, that is known to stem from the combination of the JMP($0004) and the 2 store operations. If by happenstance it does end up being valid but uncalled code, one could argue if the extra baggage is innocuous, or if various runtime conditions could actually call those vectors that don't seem involved and aren't spurious in reality. This is why I'm using symbolic AI methods to track chains of assumptions, weight analysis directions, and keep all analyses feeding back into the common knowledge base instead of working in isolation. They need feedback and verification from each other in order to solve more complex cases.

Quote:

If you only want to discuss what unknown 6502 code does and not how it's expressed in other languages, then you should create a separate thread to discuss your personal interests instead of hijacking fachat's thread about recompilation of 6502 code from NES ROMs.

My goal is to raise the level of 6502 code (or other implementation languages) to express platform-specific tricks in terms of what they're accomplishing more generally. This can be better reported for human understanding (my fundamental purpose), but also appropriate for recompilation.

Often, there's quite a finite toolbox of hacks that assembly coders and compilers rely on, so this is a compilation of known techniques as revealed by what analysis doesn't yet work at all, but keeping the strategies as generalized as possible for broad applicability. I've always brought up the limitations, boundaries, tractability issues, etc, and I know that running recompiled code requires a case of complete assurance for that particular piece of code, and/or fallback emulation strategies.

In this example of finding out what PRIMM is doing with its parameters, raising the level it's expressed at involves the discovery that the return pointer is also being used as a data pointer and copying it out to be an explicit function parameter, giving a much more describable and compatible functional unit as opposed to blindly reporting/translating stack mashing instructions and inlined data. This is what we humans do in both reading and writing platform-trick code, and much of it is rote or discoverable iteratively.

Another example would be loop detection. If 6502 were being converted to Java source code instead of byte code, and if the loop counters were known to be limited in use in the 6502 case to just maintaining a simple loop, a 'for' construct could be created in Java instead of individual register comparison operations and dispatch, giving more information to both human readers and the JIT as the bytecode would match common situations it's geared for. The more information that can be extracted from the code the better.

Quote:

I appreciate that you're finally conceding one point, although you've prefaced it with, "That is incorrect."
...
The "little picture" or microscopic view of why PRIMM cannot be implemented with adjacent invoke* and ASCII data is the proper operators do not exist.
...
So the "big picture" or macroscopic view of why PRIMM cannot be implemented with adjacent invoke* and ASCII data is because it creates a type-safety issue.

In my very first reply to that issue, I said it was pointless and impossible to do it that way, and gave a way that worked for that example. Why do you require that it be implemented with adjacent invoke* and ASCII? I don't concede my point: It is possible to implement PRIMM in Java byte code, but obviously not with additional completely arbitrary and impossible constraints of requiring that particular instruction layout.

(woohoo, getting shorter!)

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor

Top

Page 3 of 3

[ 36 posts ]

Go to page Previous 1, 2, 3

Board index » 6502.org Users Forum » Programming

All times are UTC

Who is online

Users browsing this forum: No registered users and 11 guests

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum