6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sun Nov 24, 2024 9:09 pm

All times are UTC




Post new topic Reply to topic  [ 36 posts ]  Go to page 1, 2, 3  Next
Author Message
PostPosted: Sat Jun 15, 2013 9:56 am 
Offline

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
Hi there,

I recently found a really nice article on static recompilation of 6502 code into LLVM compiler intermediate representation, and the difficulties that come with it.

Here's my write up https://plus.google.com/108984290462000 ... PHzWoosXYS and here's the original (lengthy, but worthy) article: http://andrewkelley.me/post/jamulator.html

What I find interesting is how to resolve the static recompilation issues posed by indirect addressing modes, stack-based jumps (like pushing the address of a function from a table onto the stack and then RTS). His final words on that are that it's probably easier to use just-in-time compilation, but I wonder if there still is a way for static recompilation - be it with manually added hints, or simulation are analysis time to find all possible code paths.

What do you think?
André

_________________
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/


Top
 Profile  
Reply with quote  
PostPosted: Sat Jun 15, 2013 1:41 pm 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
Seeking to reverse engineer 6502 code via static analysis is what bootstrapped my AI career. :)

What we see when we look at code and think "Ok, this is what this is doing at this time" is something that I believe is a rational process that's technically describable to a computer, but it won't be feasibly done with standard imperative/OO programming. Working with declarative and inference systems seems to be a lot closer to be able to tackle such problems. However, any solution using this approach is also still a long way off.


Simulation and using feedback is a good approach, and is exactly how the JVM gets its speed so high, running both sim and compiled systems in a single runtime. I don't think that the notion of including a 6502 emulator should kill his idea; I think it should be the source of information as to how to run the compiler.

Having the emulator track which types of stack accesses happen (manual pushes/pops vs auto-effects from JSR/RTS/RTI/IRQ/NMI/BRK), and noting where there is a mismatch, would point to places in the code where hackish indirect jumps and caller dereferences are happening. Watching indirect JMPs allows you to build up a vector table for that particular dispatch, and get those entry points compiled if they haven't already been. Watching code getting selfmodded can help you build up a switch case for handling the enumerated modifications it's seen.

Now, this is no longer static recompilation, but does end up generating an optimized recompiled static footprint of the application that no longer uses emulation once complete code coverage is achieved. If he used the runtime features of LLVM during gameplay to build up these fast paths as they were discovered (as the JVM does), then he'd be golden. Maybe even save an executable image of all the fast paths that have been built up, and discarding the emulation portion if possible. I know that LLVM has built-in profiling systems that can feed back into the compiler to help it out, too, so this model is up LLVM's alley. Plus, giving up and saying "We could compile before, but now during gameplay we can't compile anymore" is anathema to my Lisp-infected brain. ;)


I am surprised that he didn't show a solution to the issue of jumping into the middle of instructions. It seems pretty easy to normalize, by injecting a JMP after the "outer" instruction that bypasses the "inner" instruction to remove the overlap.

Code:
; Before
Label_8220:
    LDY #$00
    .db $2c
Label_8223:
    LDY #$04

; After
Label_8220:
    LDY #$00
    BIT $04a0  ; bytes from LDY #$04 duplicated into 2 separate instructions, here ...
    JMP skip
Label_8223:
    LDY #$04   ; ... and here
skip:

(Note that this is an AST transformation, so the added instructions don't reshuffle the addresses of the existing instructions. Ideally the JMP wouldn't exist as a 6502 instruction, but as a low-level instruction to the compiler, so it doesn't consume any emulated 6502 clock cycles.)

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Top
 Profile  
Reply with quote  
PostPosted: Sun Jun 16, 2013 1:27 am 
Offline

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
Are either of these techniques really viable for the emulation use case? Most of the really good emulators strive for perfect cycle and circuitry performance. If the CPU is "fast enough" and can meet the other criteria, I don't know if there is a need to optimize it even more. If the CPU is not fast enough, then you still would have the issue of potentially "too slow" instructions, or padding (or waiting) the "too fast" instruction.

Other than that it's a novel idea, simulation and JITting can capture the edge cases and degenerate CPU hackery that goes on in these systems.


Top
 Profile  
Reply with quote  
PostPosted: Mon Jun 17, 2013 9:54 pm 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
Optimization of a running, working system is always a good thing. ;) There is also a lot of weak hardware out there, including mobile processors and virtual machines. Plus, the faster it runs, the more time there is to manage ultra-accurate details. With a JIT strategy, the parts of the system that run without reliance on such accuracy (should be the majority) can run much faster, leaving plenty of time to focus on the problematic parts.

Most emulators sync up to realtime at the frame level. They run emulated cycles as fast as possible until it fills one frame of video and 1/60th (or 1/50th) of a second worth of audio buffer, output those, then sleep if it's running fast. In the inner workings, it needs to know at what emulated cycle reads & writes happen so that hardware registers behave properly as they interplay with the running code, but that part doesn't watch the realtime clock at all.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Top
 Profile  
Reply with quote  
PostPosted: Mon Jul 22, 2013 3:48 am 
Offline

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
Static recompilation of NES code is a bad idea.

I worked on two NES games, and the code I wrote used JMP (x) (where X is a RAM address) very heavily. This obviously causes huge problems for static recomplation.

Toshi


Top
 Profile  
Reply with quote  
PostPosted: Mon Jul 22, 2013 5:28 pm 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
Not really, if the writes to those addresses can be determined as part of the state transition that includes the jump, and the values written there can be statically determined. Generally, there will only be a small handful of addresses written to those vectors, and that effectively can be turned into an enum.

I'm just saying it's tractable, but not that it's common.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Top
 Profile  
Reply with quote  
PostPosted: Thu Jul 25, 2013 7:27 am 
Offline

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
White Flame wrote:
Not really, if the writes to those addresses can be determined as part of the state transition that includes the jump, and the values written there can be statically determined. Generally, there will only be a small handful of addresses written to those vectors, and that effectively can be turned into an enum.

I'm just saying it's tractable, but not that it's common.


Have you actually tried to do this?

I've written a 65816 disassembler which traced code to determine the status of the mode bits.
The same problem occurs, and I came to the conclusion it's not easily solvable.

You should try doing this, and then you realize how many different ways there are for the RAM location to be modified, and you will most likely give up.

Another way to think of this is: you're trying to solve the halting problem.

A machine halts when it enters a halting state when given a set of inputs. The halting state is the state of the machine, e.g. the values in memory and registers.

If you can statically determine all possible values in a particular memory location, then you can solve the halting problem.

Toshi


Top
 Profile  
Reply with quote  
PostPosted: Fri Aug 16, 2013 2:00 am 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
TMorita wrote:
If you can statically determine all possible values in a particular memory location, then you can solve the halting problem.

If you can statically determine the few possible values one particular memory location takes by literal memory-set instructions in the code, then you can solve the jump vector enumeration problem, without bothering about the entire halting problem.

For this particular situation (detecting what values go into a jump vector), the modified values are generally LDA#lit,STA vec[+1] pairs setting the vector to a literal value at some point in the code. Of course, this admits that overly clever ways of updating the jump vectors would not be as tractable.

The point is, you do not have to solve the general halting problem. It is perfectly fine to have "unknowns" instead of taking all-or-nothing approaches about analysis. Partial knowledge based on constant propagation to determine which values various points in the program literally set to fixed address which are used as jump vectors are tractable. And yes, I have done this type of analysis with JVM byte code, but not 6502 code yet. Again, you do not have to solve everything in order to get an answer to one small part of a common usage. For this particular case, shallow analysis can cover many if not most practical cases.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Top
 Profile  
Reply with quote  
PostPosted: Fri Aug 16, 2013 4:04 am 
Offline

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
White Flame wrote:
TMorita wrote:
If you can statically determine all possible values in a particular memory location, then you can solve the halting problem.

If you can statically determine the few possible values one particular memory location takes by literal memory-set instructions in the code, then you can solve the jump vector enumeration problem, without bothering about the entire halting problem.


I just explained why you can't reliably statically determine the possible values in a memory location.

If you can solve this problem, then the halting problem would be solvable.
The halting problem is not solvable.
Therefore, this problem is not solvable.

The point is not to solve the halting problem. The halting problem is just a piece of the explanation.

Toshi


Top
 Profile  
Reply with quote  
PostPosted: Fri Aug 16, 2013 8:56 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
"Those who say it can't be done shouldn't stand in the way of those doing it"
Toshi: I know you're sure of yourself, but having put across your point, you don't have to keep hammering at it. The other person in the conversation may be wrong, or may be solving a different problem, or may know something you don't. Allow yourself to be heard, and believe yourself to be right, but please don't engage in endless debate.
http://xkcd.com/386/


Top
 Profile  
Reply with quote  
PostPosted: Fri Aug 16, 2013 7:33 pm 
Offline

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
White Flame wrote:
TMorita wrote:
If you can statically determine all possible values in a particular memory location, then you can solve the halting problem.

If you can statically determine the few possible values one particular memory location takes by literal memory-set instructions in the code, then you can solve the jump vector enumeration problem, without bothering about the entire halting problem.

For this particular situation (detecting what values go into a jump vector), the modified values are generally LDA#lit,STA vec[+1] pairs setting the vector to a literal value at some point in the code. Of course, this admits that overly clever ways of updating the jump vectors would not be as tractable.

The point is, you do not have to solve the general halting problem. It is perfectly fine to have "unknowns" instead of taking all-or-nothing approaches about analysis. Partial knowledge based on constant propagation to determine which values various points in the program literally set to fixed address which are used as jump vectors are tractable. And yes, I have done this type of analysis with JVM byte code, but not 6502 code yet. Again, you do not have to solve everything in order to get an answer to one small part of a common usage. For this particular case, shallow analysis can cover many if not most practical cases.


Top
 Profile  
Reply with quote  
PostPosted: Fri Aug 16, 2013 7:33 pm 
Offline

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
White Flame wrote:
...
The point is, you do not have to solve the general halting problem. It is perfectly fine to have "unknowns" instead of taking all-or-nothing approaches about analysis. Partial knowledge based on constant propagation to determine which values various points in the program literally set to fixed address which are used as jump vectors are tractable. And yes, I have done this type of analysis with JVM byte code, but not 6502 code yet. Again, you do not have to solve everything in order to get an answer to one small part of a common usage.


The Java case and the 6502 case are totally different cases.

Java was originally designed for downloading code to consumer devices. Therefore, the original intent of the Java bytecode instruction set was to be as secure as possible. The Java class loader has a bytecode verifier which performs static code analysis to ensure that only legitimate instructions are executed. The reason it can do this is: the bytecodes instruction set is specifically designed for this purpose.

The 6502 instruction set (like most instruction sets) has unrestricted jump instructions, such as JMP (ZP), RTS, etc. These instruction causes problems for utilities which statically analyze the control flow of the code because the branch target is unrestricted. Therefore, Java bytecode instruction set only contains restricted jump instructions.

There are basically three classes of branch instructions in Java bytecode instruction set:

1) Branch target is explicitly specified and restricted

Theese instruction explicitly encode the branch target address in the instruction either via a PC-relative offset or by an absolute address. This set includes if_eq, if_ne, goto, goto_w, jsr, jsr_w, tableswitch, etc.

2) Branch target is implicitly specified and restricted

The target address is not encoded in the instruction but is known. This set includes: invokestatic, invokevirtual, etc.

3) Branch target is arbitrary but restricted

The instructions in this class include return, lreturn, ret, etc. These instructions look like unrestricted jumps, however this loophole is closed by the bytecode verifier by only allowing the caller's address to be stored to the stack or in the variable.

Also see: http://gallium.inria.fr/~xleroy/publi/b ... on-JAR.pdf

So, there are no unrestricted jump instructions in Java bytecode.

Some other notable differences between Java bytecodes and 6502 instructions which complicate
static code analysis:

1) The Java class file format has semantic information about code and data, which
simplifies static analysis. 6502 file formats are usually binary dumps and have no
semantic information.

2) Self-modifying code is not allowed in Java bytecode, which simplifies analysis.
Self-modifying code is allowed on 6502 machines.

So the Java static code analysis problem is completely different from the 6502 static code analysis problem.

White Flame wrote:
For this particular case, shallow analysis can cover many if not most practical cases


My experience differs. I wrote a 65816 disassembler which follows control flow to discover the state of the M and X bits. Most of the cases I found were not amenable to shallow analysis.

Also, it's not very useful IMHO to cover "many if not most" of the cases.

Imagine you have a C compiler, and the optimization algorithm is theoretically faulty and only produces correct code for "many if not most" of the cases. If you compile a million lines of code with this compiler, then the chance of triggering the problem cases multiple times is fairly high, and then you spend days/weeks/months to determine why the code doesn't run correctly.

White Flame wrote:
The point is, you do not have to solve the general halting problem. It is perfectly fine to have "unknowns" instead of taking all-or-nothing approaches about analysis.


Given the instruction sequence:

JMP (zp)

where the contents are not known, and a 4 megabit NES cartridge such as Final Fantasy III, the number of possible branch destinations is 512K.

So, with your approach, you would have to analyze 512K entry points into the code and check if each one executes valid instruction sequences, and if so, generate the corresponding equivalent code for each one.

Another complication: The JMP (zp) could jump to code in RAM. This code would not be there in a ROM image, so the static analyzer would need to determine this. So not only would the static recompiler need to analyze and possibly generate 512k entry points into ROM code, it would also need to detect code copied from ROM to RAM.

Yet another complication: Probably the most common reason why code would be copied from ROM to RAM in a NES environment is to use self-modifying code, such as a fast memory copy routine. So not only would the static recompiler need to analyze and possibly generate 512k entry points into ROM code, and detect code copied from ROM to RAM, it would also need to generate all the possible self-modifying code variations which could be generated in RAM.

Yet another complication: The static recompiler detects code copied from ROM to RAM, and it detects four byte writes to the code in RAM (memory copy), but it cannot determine all possible values which can be written to the self-modifying code. So in this case the number of possible permutations is 2^32 or about 4 billion possible permutations, so the static recompiler would need to generate 4 billion routines, each representing a possible permutation of the self-modifying code.

Yet another complication: The self-modifying routine in RAM copies from source to dest address. However, on a cartridge with an MMC such as MMC1 or MMC3, the memory map changes depending on the contents of the bank register. From the NESdev wiki, for an MMC1:

CPU $6000-$7FFF: 8 KB PRG RAM bank, fixed
CPU $8000-$BFFF: 16 KB PRG ROM bank, either switchable or fixed to the first bank
CPU $C000-$FFFF: 16 KB PRG ROM bank, either fixed to the last bank or switchable

So the additional complications are:

1. The static recompiler must understand the MMC on the cartridge.
2. It must track the state of the bank register to determine the correct memory mapping
3. For every memory read operation, the static recompiler must determine the correct bank,
and read the data from the correct ROM bank. If the correct bank is unknown, then the static
recompiler must generate multiple copies of the current routine, one for each possible bank
configuration. So instead of 2^32, the number of possible variations becomes 2^(32 +
number of MMC bank configurations).

So IMHO it's not practical to allow unknowns in a 6502 static recompiler.

Toshi


Last edited by TMorita on Fri Aug 16, 2013 8:31 pm, edited 2 times in total.

Top
 Profile  
Reply with quote  
PostPosted: Fri Aug 16, 2013 8:26 pm 
Offline

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
Good post, Toshi. Very interesting.


Top
 Profile  
Reply with quote  
PostPosted: Sat Aug 17, 2013 12:16 am 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
TMorita wrote:
The Java case and the 6502 case are totally different cases.

The Java case was analyzing the derived types which would be stored in a particular variable, then called through a particular interface-dispatching point in code. There is a limited enumeration of types that end up there, and the setting of them was direct. This is quite similar to analyzing a 16-bit integer in 6502 that is manually used as a jump vector.

It has nothing to do with selfmod or type-safety or verification or branch targets internal to Java functions, it's purely a problem of "what values/derived-types can this location, which is directly set from literals in other places in code, hold?" which is a very limited problem.


Quote:
So, there are no unrestricted jump instructions in Java bytecode.

Would you consider the invoke* instructions or calling a Runnable.run() to be unrestricted? Those are the closest equivalent of a JMP(ind) in a working 6502 program. The java manifestation of the exact same analysis problem is on a higher level than on the 6502.


Quote:
My experience differs. I wrote a 65816 disassembler which follows control flow to discover the state of the M and X bits. Most of the cases I found were not amenable to shallow analysis.

And I agree with that. Note that enumerating the values of a vector does not determine when those values hold, just that they are some values that are put there, thus is not state tracking except the tiny bit between the LDA and STA instructions which write to the vector, to propagate the constant. Your flag example is one of state tracking across large swaths of code, which is a much larger problem.


Quote:
Imagine you have a C compiler, and the optimization algorithm is theoretically faulty and only produces correct code for "many if not most" of the cases.

That is the wrong way of looking at it. A better analogy would be the optimizer only performs optimizations where it can statically determine certain features, and not optimizing when it can't.

Analysis brings forth known, known-complete, and/or known-incomplete information about code which adds to the system's understanding of it. Any individual bit of analysis doesn't necessarily bring all truth about a system on its own, and if you deny the use of any input indicators beyond "proven fact" then you end up with a simplistic and underpowered analysis. Too many people are simply afraid of or don't know how to work with "unknown"s and "might be"s, which is disappointing when it comes to fields like this which really depend on them. Converging unknowns is a late-stage process; resolving types and enumerations is earlier and iterates based on dependencies and new information.


Quote:
Given the instruction sequence:

JMP (zp)

where the contents are not known, and a 4 megabit NES cartridge such as Final Fantasy III, the number of possible branch destinations is 512K.

So, with your approach, you would have to analyze 512K entry points into the code and check if each one executes valid instruction sequences, and if so, generate the corresponding equivalent code for each one.

Then you have not read my approach at all. I agree that what you say would be a dumb approach, however such brute-force analysis can be useful (see link at the end of the post).

My approach does the equivalent of the following:
  • Note what locations JMP(ind) instructions use as vectors. (JMPs found in known code areas)
  • Note any stores into those vector locations. (Stores found in known code areas)
  • See if basic constant propagation gives that store a known value.
  • Associate low & high byte sets together if they easily chain.
Setting vectors, as well as self-modifying opcodes, are generally unparameterized pieces of code and the vast majority are done with a simple LDA #imm, STA vec pair. This is a very common pattern for setting vectors in the 6502 (I'm no expert on idiomatic 65816 code or how they're set there, and the complications that the banking registers bring). If there are other writes to that zp location which are not a determinable propagated constant, then that would be a known-incomplete enumeration set, or the vector could be used for other purposes (LDA zp[+1] would be a good indicator towards that).

Again, the reason this is tractable is because programmatic calculation of jump vector values (and selfmod opcodes) on 6502 is rare, and would very obviously break this method, but in ways that note that unknown stores are being done to the vector. Forth interpreters are an example where this wouldn't work, but the common dynamic dispatch via indirect jump used in assembly programs would.


Quote:
Another complication: The JMP (zp) could jump to code in RAM. This code would not be there in a ROM image, so the static analyzer would need to determine this. So not only would the static recompiler need to analyze and possibly generate 512k entry points into ROM code, it would also need to detect code copied from ROM to RAM.

That is correct. The issue of knowing what's in code space is different from the issue of determining the values of an enumeration which only has a few possible values that are all set literally. However, if the zp location is only used as a jump vector, static analysis could at least determine the jump vector values themselves even if the code in that place in RAM is unknown.


Quote:
Yet another complication: Probably the most common reason why code would be copied from ROM to RAM in a NES environment is to use self-modifying code, such as a fast memory copy routine. So not only would the static recompiler need to analyze and possibly generate 512k entry points into ROM code, and detect code copied from ROM to RAM, it would also need to generate all the possible self-modifying code variations which could be generated in RAM.

An interesting feature there is that LDA #imm, STA zp remains the same readable code regardless of where in memory it is, so a pattern match could hunt for those occurrences (also LDY/STY and LDX/STX obviously), though they're not guaranteed to be part of actual executed code and don't take advantage of constant propagation analysis.

Again, getting clues and insight to possibilities is a whole lot more valuable than simply saying "It's not exact, don't do any of it ever!" You need to be able to teach the computer how to try things and put partial information & indicators together into real results.


Quote:
Yet another complication: The static recompiler detects code copied from ROM to RAM, and it detects four byte writes to the code in RAM (memory copy), but it cannot determine all possible values which can be written to the self-modifying code. So in this case the number of possible permutations is 2^32 or about 4 billion possible permutations, so the static recompiler would need to generate 4 billion routines, each representing a possible permutation of the self-modifying code.

Self-modifying opcodes is generally done via LDA #imm, STA abs patterns as well, where abs is a known-executing point in memory. Hunting those is exactly the same as hunting literal jump vectors: They're a limited set of enumerated, literal values being stored into one suspect location. Self-modifying operands are the equivalent of RAM variables (in 6502 where operands are always bytes or words, unlike other archs where operands are bitwise embedded into code words).


Quote:
Yet another complication: The self-modifying routine in RAM copies from source to dest address. However, on a cartridge with an MMC such as MMC1 or MMC3, the memory map changes depending on the contents of the bank register. From the NESdev wiki, for an MMC1:

Yes, like relocating code in the first place, that is a separate problem of figuring out where your executable code ends up.



Quote:
So IMHO it's not practical to allow unknowns in a 6502 static recompiler.

I think this is your problem here. I never said that I'm writing a static recompiler, nor think it's a generally tractable problem (for many other issues than just the semantic discovery). I'm just saying that a few analysis features that people say are impossible aren't. You said "[JMP (x) where X is a RAM address] obviously causes huge problems for static recomplation", and I disagree that it's a huge problem. There are other huge problems though, like the banking, code copying, speedcode/JIT generation, code decompression, CPU & other chip state tracking, etc. :)

Here's a great talk from last year (slides at the bottom of the page) about doing analysis of binary code where the instruction set & encoding is unknown! There's a ton that you can do with analysis that can reveal some very definite indicators about what's what in code that seem "impossible".


(also apologies if some of this is confusing, as I'm still dealing with medical issues. Just ask and I'll try to explain better)

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Top
 Profile  
Reply with quote  
PostPosted: Sat Aug 17, 2013 2:02 am 
Offline

Joined: Sun Sep 15, 2002 10:42 pm
Posts: 214
White Flame wrote:
TMorita wrote:
The Java case and the 6502 case are totally different cases.

The Java case was analyzing the derived types which would be stored in a particular variable, then called through a particular interface-dispatching point in code. There is a limited enumeration of types that end up there, and the setting of them was direct. This is quite similar to analyzing a 16-bit integer in 6502 that is manually used as a jump vector.


The 6502 case the Java case are totally different.

In the Java type case, the worst-case behavior is the same as the unoptimized case, basically you need to execute generic code.

In the 6502 case, the worst-case behavior is much worse, because the jump target is potentially every single instruction in the program.

WhiteFlame wrote:
TMorita wrote:
So, there are no unrestricted jump instructions in Java bytecode.

Would you consider the invoke* instructions or calling a Runnable.run() to be unrestricted? Those are the closest equivalent of a JMP(ind) in a working 6502 program. The java manifestation of the exact same analysis problem is on a higher level than on the 6502.


You need to read the actual definition of the invokestatic opcode here:

http://en.wikipedia.org/wiki/Java_bytec ... n_listings

The invokestatic opcode is followed by two bytes representing a 16-bit index into the constant pool.

The index is constant, because it is encoded into the instruction, and Java does not allow self-modifying code. The constant pool contains constant values. This is why it's called a constant pool.

Therefore, the invokestatic opcode is not an unrestricted branch. It can only branch to one particular method, and not arbitrary address calculated at runtime.

If you read the rest of the invoke* opcodes, such as invokevirtual, invokedynamic, etc the other instructions are encoded the same way.

As I mentioned before, the Java bytecodes were specifically designed to allow static code analysis by the bytecode verifiier in the class loader. The Java bytecode case is not the same as the 6502 case.

WhiteFlame wrote:
Tmorita wrote:
Given the instruction sequence:

JMP (zp)

where the contents are not known, and a 4 megabit NES cartridge such as Final Fantasy III, the number of possible branch destinations is 512K.

So, with your approach, you would have to analyze 512K entry points into the code and check if each one executes valid instruction sequences, and if so, generate the corresponding equivalent code for each one.

Then you have not read my approach at all. I agree that what you say would be a dumb approach, however such brute-force analysis can be useful (see link at the end of the post).

My approach does the equivalent of the following:
[list][*]Note what locations JMP(ind) instructions use as vectors. (JMPs found in known code areas)
[*]Note any stores into those vector locations. (Stores found in known code areas)
[*]See if basic constant propagation gives that store a known value.
[*]Associate low & high byte sets together if they easily chain.


This algorithm trivially fails if the JMP (x) instruction is dynamically generated in RAM.
This algorithm trivially fails if the stores to X are dynamically generated in RAM.

Constant propagation and value range propagation only work well if the code flow is nice and neat, like from a structured language. Hand-written 6502 assembly code does not necessarily have an orderly code flow.

There are no "known code areas" on a Nintendo cartridge. The program ROM contains both code and data, and it is unknown which sections are code and which are data until the code is executed or traced.

WhiteFlame wrote:
Setting vectors, as well as self-modifying opcodes, are generally unparameterized pieces of code and the vast majority are done with a simple LDA #imm, STA vec pair.


You are only considering the easy cases, and failing to consider the difficult cases.
There are nearly an unlimited number of ways to set a vector to a particular value.

WhiteFlame wrote:
Again, the reason this is tractable is because programmatic calculation of jump vector values (and selfmod opcodes) on 6502 is rare, and would very obviously break this method, but in ways that note that unknown stores are being done to the vector.


You're making huge assumptions based on your own experience.

The way many games are written on the C64 is all the code logic runs during the interrupt by setting jump vectors during the interrupt. So there are probably hundreds of jump vectors called per second during game execution.

Many of the Nintendo NES programmers were ex-C64 programmers since the basic system architecture was simillar (character graphics and sprites). So many of the Nintendo NES code was written in the same style.

WhiteFlame wrote:
An interesting feature there is that LDA #imm, STA zp remains the same readable code regardless of where in memory it is, so a pattern match could hunt for those occurrences (also LDY/STY and LDX/STX obviously), though they're not guaranteed to be part of actual executed code and don't take advantage of constant propagation analysis.


Again, you are considering only the simplest case.

WhiteFlame wrote:
Self-modifying opcodes is generally done via LDA #imm, STA abs patterns as well, where abs is a known-executing point in memory.


Again, you are considering only the simplest case.

WhiteFlame wrote:
TMorita wrote:
]So IMHO it's not practical to allow unknowns in a 6502 static recompiler.

I think this is your problem here. I never said that I'm writing a static recompiler, nor think it's a generally tractable problem (for many other issues than just the semantic discovery).


I never implied that you were writing a 6502 recompiler. The original poster Andrew Fachat asked if it was feasible to do a static recompilation of 6502 machine code from NES ROMs. I replied to him. Then asserted that this was feasible, so I replied to your posts.

WhiteFlame wrote:
I'm just saying that a few analysis features that people say are impossible aren't. You said "[JMP (x) where X is a RAM address] obviously causes huge problems for static recomplation", and I disagree that it's a huge problem.
...


We will have to agree to disagree, then.

Toshi


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 36 posts ]  Go to page 1, 2, 3  Next

All times are UTC


Who is online

Users browsing this forum: Google [Bot] and 9 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: