TMorita wrote:
My original point is that static code analysis problem on the 6502 is completely different from the static code analysis problem with Java bytecodes, because the Java bytecodes were designed for specifically to allow static analysis.
And I agreed with that. The particulars of what JSR, RTS, JMP, etc do are completely different than what the invoke and return (or jsr/ret) bytecodes do, as are the boundaries of functions and data (Java = explicit, 6502 = nonexistent in machine code), low-level memory access, return address access, etc.
The very simplistic detection of enumerations, however, is NOT analysis of machine- or bytecode. It is simply looking at the set of stores to & usage of a variable, and seeing which values are known compile-time constants. This is at instruction-level IR, not 6502 or Java specific, and does not need to consider any larger block structure like function boundaries. Therefore it is a static code analysis technique that can work on (and is agnostic to) both 6502 and Java.
The problems with analyzing the PRIMM/PUTSTRI function on the 6502 are of discovering what the parameters are, and how the function returns. If those are found, then the function's model can match that of Java and it can be expressed cleanly in a Java method call, passing a proper parameter and returning properly, (and explicitly performing any additional side effects that were implicitly performed on the 6502, of which there were none in this case), without any hackery... if the transformations involved are within the bounds of the system's capability. (Consider a simple case where the JSR is always followed by a single 1-byte parameter on one end, vs followed by an interpreted bytecode stream on the impossible end.)
Are these statements agreeable?
Quote:
You insisted there was no difference between the two.
For analysis that can work at an IR level or above, there is no difference. As we both know, LLVM is built on the entire concept that once the level is raised, analysis becomes portable, and GCC of course exploits this as well. I stated that one particular analysis, which works on the generalized concepts of the presence of variable access with constants, is indifferent to architecture.
Quote:
In order to implement PRIMM natively in Java, this requires popping the return address off the stack and storing it in a stack slot or local variable.
That is incorrect. There are other ways to implement it, and it is obviously impossible to do so in your proposed way in Java, because the language doesn't even have operators for that (ie, for the reasons you repeated). Let me quote myself: "the inter-method return address is not even visible to the bytecode language. Intra-method subroutine return addresses are technically 'visible', but immutable [should be 'opaque'] from the bytecode", and yes I understand the type system and what jsr/ret do with returnAddress values.
Converting it into a proper method call with a proper parameter, and transforming the calls to match, is really the sensible and ideal way to implement PRIMM natively in Java, but is constrained to when it's possible to do so. The highly-restrictive JVM will put a quick stop to attempts to work around it otherwise.
Again, I said straight up that Java is not a good target platform to assume reasonable native translation from 6502 code, and I care more about figuring out what unknown 6502 code does than how it's expressed in other languages. Porting 6502 to the JVM is your bone to pick here, not mine. The particular example you brought up worked out in expressability, which is a separate issue than typecast attempts.
Quote:
1) To demonstrate the JMP (x) problem is not solvable via constant propogation.
It is not always solvable via constant propagation. I never claimed it was, and pointed out your misrepresentation of this multiple times. If the variable is treated by the code under scrutiny as a simple enumeration of literal addresses, then in that particular case the problem of what x can hold is solvable. (again, replace "address" with "type" or whatever, and the concept is platform-agnostic)
Quote:
2) To demonstrate that certain 6502 code sequences are not natively implementable in Java bytecode because Java bytecode is limited, because it is designed to allow static code analysis.
...which is exactly why non-literal transformation would be required. Such transformation is potentially, but not always, possible with complexities approaching the halting problem. The particular example presented did allow for algorithmic transformation of 6502 stack hackery into a clean Java method call, and is a great case for when such analysis is fruitful.
Quote:
You seem to consider yourself very knowledgeable about compiler optimizations.
I mentioned them in passing reference. Garth already pointed to your resume, which I read, and thought optimization would be an apt parallel to the issue of constraints surrounding the usability of a technique, which you listed as a negative trait. Code analysis and potential recompilation as discussed here are non-optimizing transformations, as far as I'm concerned. My focus is reverse-engineering.
Quote:
The problem that I see is that you have latched onto this idea that this is a problem which can be solved by a compiler optimization, even after I have pointed out the holes in the proposed solution.
1) Constness is simply a predicate, not an optimization as used here. 2) You correctly point out holes in things that I didn't say, and I agree.
Quote:
First, you think cprop will solve the problem. I point out it doesn't handle the PRIMM case.
Your solution is to simulate execution of the code in order to derive the correct values.
I will point out (for some cases) that values are dependent on values in NES hardware registers.
Your solution will be to simulate the NES hardware during recompilation... recompilation... recompilation...
At every step, I noted the boundaries of tractability. The point is that if it doesn't work everywhere, some progress on analyzing a piece of code is better than no progress. Those who refuse to do any such analysis because its applicability is contextual make no progress on automatically reversing any particular code sample. Too many people simply throw their hands up and walk away from the concept in general, but some immediate chipping can be made at these barriers.
Complete coverage of everything needed to resolve everything would be nice. This is feasible on the low end and known to be impossible on the high end of complexity. The point is to raise that low end, not to claim all is possible.
Plus, again, I'm talking about reverse engineering analyses that might be useful in recompilation, not about the end goal of recompilation and all it would require.
(I also realized that I made reference to the crowdfunding guy in a prior post; sorry, wrong thread)
PS. I continue in good faith of interesting technical discussion and resolving misunderstandings, not argumentative rage.