First off, I never claimed that every use of JMP(ind) was analyzable through that particular algorithm. It's for analyzing what targets exist when indirect jumps are for an enumerated program flow state selection, which is the most common occurrence of JMP(ind) in the software that I've seen (including interrupt handlers and simple software indirection).
Stack return address manipulation is a separate field of indirection study, and thus this post has nothing to do with my prior specific point. I am not interested in 6502->Java conversion, as the idioms are too different to bother with automation. However, what you ask here is still possible. This PUTSTRI style is also very familiar to Commodore coders, as a routine like this is even in the C128's ROM, there called PRIMM (print immediate).
The final equivalent Java bytecode translation would put the character data into a parameter to not bother mucking with (or, more particularly to
avoid the invalid operation of mucking with) the frame state.* Depending on implementation decision, and what types of callers are used, this parameter could either be an array (or string) or an index into a global memory array. The bytecode would thus end up idiomatic. I'm assuming an integer type to hold the memory offset, to avoid signedness issues, and I avoid using char type for non-char values, so 16-bit wrapping has to be inserted. The JIT's smart enough to collapse a lot of it down anyway.
(* = If I remember correctly, the inter-method return address is not even visible to the bytecode language. Intra-method subroutine return addresses are technically 'visible', but immutable from the bytecode.)
Code:
; Locals: 0=this, 1=int-param, using a passed index into a global 6502 memory array.
loop:
; Get the current byte as memory[ptr+1], as ldy#1,lda(DPL),y does with its pointer
; Performing static transforms on the 6502 code first might eliminate the various +1/-1 offsets
getstatic <global memory byte array>
iload_1
iconst_1
iadd
baload
dup
; Increment pointer & mask
iinc param1, 1
iload_1
ldc <0xffff>
iand
istore_1
; Test the dup'd memory value
ifeq exit
;output it, platform-specific transform of jsr CHAROUT -> System.out.print(char)
getstatic <fieldref "java/lang/System" "out">
swap
i2c
invokevirtual <methodref "java/io/PrintStream" "print" "(c)v">
goto loop
exit:
pop
; The final increment can be elided, as its value is no longer used
return
(Note: I know most java bytecode compilers wouldn't use dup/swap/pop as I do to retain the memory byte, but whatever, I'm hand-rolling this.
Plus, static transforms of the 6502 code can avoid some of the +1 offset & postinc duplication; IOW, it's equivalent to inc DPL ... ldy #0, lda (DPL),y without the ora. However, my goals in analysis are on the original machine code, not recompilation.)
Now, the question is how to get to that point?
Loop detection works on PSINB - PSIX1 (of course, contingent on no additional entry points from external code), and variable propagation of .A and the Z flag through the lda/ora/beq instructions indicate the loop exit condition, so the meat of the function is relatively straightforward.
The code does a 16-bit pull off the stack, which is a nice clean transfer. Now, that means what happens is that in cases where PUTSTRI are entered is dependent on what's on the stack. For each case of JSR PUTSTRI, we take the knowledge of both the routine and the JSR's location (since PC+2 is pushed on the stack) to transform the JSR itself. Each other entry of PUTSTRI needs known values on the stack, or else it's intractable.
The bug in the ointment is that the pointer is transformed between the JSR PUTSTRI and the JMP back. In this case, the loop (increment until exit condition mem[ptr+1]==0) and final increment are the totality of transforms applied. This is simple enough to apply to the data per calling point, and is the meat of the transform which allows conversion to "normal" languages:
Code:
From:
jsr PUTSTRI
.byte "Hello", 0
lda #00 ; code afterwards
...
To:
jsr PUTSTRI' ; PC+2 is known to be pushed onto the stack
[jmp _cont] ; this is not an injected instruction, but an annotation on the JSR that the return location is overridden. Since some JSRs don't return, etc, they need annotation capability anyway.
.byte "Hello", 0 ; return address transformation is applied knowing that the pointer starts at *-1 here
[_cont:] ; the location of _cont comes from applying the transform
lda #00 ; code afterwards
...
This is all tractable transformation. The only "difficulty" is creating an ontology which can express the pointer transformation sufficient for this case (ie, includes loops), and that's squarely in the known realms of symbolic AI, lambda calculus, and compiler intermediate representations.
Now, this still can fall apart if the data in the .byte statement is self-modified, or if PUTSTRI is or can be further self-modified from the outside, or if the transformation applied is not tractable given the analysis tools available, etc. However, the system can detect when such cases
exist, but not what their outcomes would be, making it known-intractable.
The halting problem does not dictate that every single program (+input) is intractable to complete analysis; it only dictates the impossibility of a single system to analyze all possible programs. This approach simply builds up tractable scenarios for the subset of techniques that are actually encountered in the wild, not every possible theoretical program.
It would theoretically be possible to extend analysis to hypothetical, parametric RAM states to cover some of the problems I listed, but that's still pretty far out there.
To reiterate, this sort of problem must take advantage of whole-program analysis, not just looking at snippets of code in isolation. The strategies that are useful per piece of code depend on what is known from elsewhere, and the completeness of that knowledge.
Also to reiterate, I'm more about reverse engineering than recompilation, but in the case of recompilation there still isn't lenience for allowing incorrect code to be generated. Any final transformations are constrained to no known-unknowns, including disallowing specific transformation on the presence of completely separate known-unknown sections of code which might call or modify the code in question in different ways. Of course, these are tunable decisions depending on your "get
something running" vs stability tradeoffs, and the platform-specific conventions (which on the 6502 are pretty much completely unconstrained, except maybe the decision to analyze NMOS illegal opcodes or just consider them non-code to avoid wasting time).
Also to reiterate further, the original fundraiser's presentation didn't include anything like this, and I do not hold that the technology he presented was sufficient to meet his goals, and that he seemed unaware of the limitations of his chosen approach. I do not state that development such that I present is complete in and of itself, but acknowledge that it needs human involvement to add general knowledge & strategies when it encounters cases it currently deems intractable with its tools, and adding software-specific knowledge & strategies is generally not a good idea. The concepts of "jump through a manipulated stack return address" vs "jump indirect to put a branch in the state flow" vs "jump using pha pha rts" etc are in my opinion general enough idioms to be described to a system for good reusability. Plus, the complexity of creating something capable of representing the required information to handle really bad situations can become human-intractable, and/or approach the halting problem.