I chose Robotron 2084 as a target for reverse engineering because I play it reasonably well and have very fond memories of it. An added advantage, which might come in handy later in the project, is that Robotron is a well-researched game. Source code is available for the Williams coin-op system (where the game was first released) as well as for the Atari 7800 and other platforms, but not for the Apple II.
From reverse engineering projects in other domains I know that it is rarely advisable to try to understand a target system completely. This rule certainly applies here, because
a) I basically don't know anything about the 6502
b) I owned an Apple II in the 80s but never understood the hardware
c) I don't have a clue about game design
Apart from those rather damning constraints, Robotron is a big game, with a lot of stuff happening on screen. I have to be prepared for the inner workings of the software to be arcane and convoluted.
The starting point for the project was James Tauber's Apple II emulator (http://jtauber.com/applepy), written in Python. At this point I should add another constraint:
d) I am a Python newbie
I need Python knowledge for a different project, expected to start towards the end of 2019. I think it is easier to learn a language with a good project, and right now I can't think of a better one than reverse engineering a 6502 game!
Reverse engineering generates a lot of data which has to be collected and correlated as easily as possible, which maybe makes the choice of Python a bit more sensible. Note that I am interested neither in a high-fidelity recreation of Apple II artefacts nor in execution speed. For both I use the excellent AppleWin (https://en.wikipedia.org/wiki/AppleWin).
I had to port ApplePy to Python 3, but otherwise nothing else was needed to get a "first light" with Robotron running from the RAM image.
Over the last couple of days I did the following:
* read relevant sections of the Apple II Reference Manual (Woz, 1979)
* understand 6502 opcodes and addressing modes
* collect execution traces
* collect info on memory locations (jump from/to info, touch counts etc.)
* disassemble code with automatic labels and other jump info (who calls what)
* intercept self-modifying code
* link the emulator to Excel (via https://www.xlwings.org/)
* save/load the complete state of the 6502 and memory, as well as the Apple II display and soft switches
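To make the "touch counts" and "jump from/to" items above more concrete, here is a minimal sketch of the kind of collector I mean. The class and hook names (`TraceCollector`, `on_execute`, `on_read`, `on_write`, `on_jump`) are made up for illustration and are not part of ApplePy; the assumption is that the emulator's CPU loop can be patched to call them.

```python
from collections import Counter, defaultdict


class TraceCollector:
    """Accumulates per-address statistics from hypothetical emulator hooks."""

    def __init__(self):
        self.executed = Counter()              # byte address -> times fetched as code
        self.reads = Counter()                 # byte address -> data reads
        self.writes = Counter()                # byte address -> data writes
        self.jumps_to = defaultdict(Counter)   # dst -> {src: count}
        self.jumps_from = defaultdict(Counter) # src -> {dst: count}

    def on_execute(self, pc, length):
        # Mark opcode AND operand bytes, because self-modifying code
        # usually patches the operands rather than the opcode.
        for addr in range(pc, pc + length):
            self.executed[addr & 0xFFFF] += 1

    def on_read(self, pc, addr):
        self.reads[addr] += 1

    def on_write(self, pc, addr):
        self.writes[addr] += 1

    def on_jump(self, src, dst):
        # Called for JMP, JSR, RTS and taken branches.
        self.jumps_to[dst][src] += 1
        self.jumps_from[src][dst] += 1

    def self_modified(self):
        # Candidates only: bytes that were both written and executed.
        return sorted(addr for addr in self.writes if addr in self.executed)
```

Note that `self_modified()` only yields candidates: a byte written once while the game is loading and executed later is flagged as well, so the list still needs a manual look.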
The immediate goal is to map the execution flow for the very first part of the game. So far that has turned out to be considerably more difficult than expected, mainly because of "subroutines" that PLA their return addresses off the stack (i.e. mixing JSR and JMP), or that PHA different addresses and then use RTS as a jump rather than as a return from subroutine. Purely static analysis of the 6502 code seems a bit flawed; at the very least it has to be complemented with runtime information.
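One way to get that runtime information is a "shadow stack" kept next to the emulated one, which classifies every RTS as either a genuine return or a disguised jump. The sketch below is only an outline under some assumptions: the hooks `on_jsr` and `on_rts` are hypothetical names that the emulator would have to call, and wrap-around of the stack pointer is ignored. What it relies on is standard 6502 behaviour: JSR pushes the address of its own last byte, and RTS pulls an address and adds 1.

```python
def rts_target(lo, hi):
    """Address at which execution continues after RTS pulls lo, then hi."""
    return (((hi << 8) | lo) + 1) & 0xFFFF


class ShadowStack:
    """Classifies each RTS as 'return' or 'jump' (hypothetical hooks)."""

    def __init__(self):
        self.frames = []  # (stack pointer after JSR, expected return address)

    def on_jsr(self, pc, sp_after):
        # JSR is 3 bytes long, so a normal return resumes at pc + 3.
        self.frames.append((sp_after, (pc + 3) & 0xFFFF))

    def on_rts(self, sp_before, target):
        # The stack grows downwards. Frames recorded at a lower SP than the
        # current one had their return address pulled (PLA) and discarded.
        while self.frames and self.frames[-1][0] < sp_before:
            self.frames.pop()
        if self.frames and self.frames[-1][0] == sp_before:
            _, expected = self.frames.pop()
            # Same stack slot, but was the return address left untouched?
            return "return" if expected == target else "jump"
        # Nothing was pushed by a JSR at this stack depth: PHA/PHA/RTS trick.
        return "jump"
```

With every RTS tagged this way, the "jump" ones can be logged as JMP-like edges with their actual targets, and only the "return" ones are treated as the end of a call.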
Still, it has been possible to identify a bunch of "compact" subroutines: contiguous stretches of executable 6502 code that are entered at the top by one or more JSRs, exit via a normal RTS, and have no jumps from anywhere else into the middle of the stretch, i.e. what I would call a "subroutine" in the normal sense. The call trees of those compact subroutines are directed acyclic graphs, which in turn will help me understand the execution flow.
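The "compact" test can be written down as a check over the recorded control-transfer edges. This is a sketch, not my actual code: the function name `is_compact` and the edge format `(src, dst, kind)` are assumptions, and `"rts"` here means an RTS that really behaves as a return, not one used as a jump.

```python
def is_compact(start, end, edges):
    """Check whether the stretch [start, end] is a 'compact' subroutine.

    edges: iterable of (src, dst, kind), kind in {"jsr", "jmp", "branch", "rts"},
    where src is the address of the transferring instruction.
    """
    for src, dst, kind in edges:
        src_in = start <= src <= end
        dst_in = start <= dst <= end
        if dst_in and not src_in:
            if kind == "rts":
                continue  # a callee returning into the stretch is fine
            if not (kind == "jsr" and dst == start):
                return False  # entered in the middle, or not by JSR
        elif src_in and not dst_in:
            if kind not in ("jsr", "rts"):
                return False  # leaves by JMP or branch instead of call/return
    return True
```

Because the edges come from execution traces, a `True` result only means "compact as far as observed"; a rarely taken path can still break it later.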
The next step will be to establish which stretches of code read and write which memory locations, together with the addressing modes used. Currently I am far too weak in 6502 to understand the code just by reading it, so I need more hints on what a stretch of instructions does. Of particular interest are the indirect indexed and indexed indirect instructions, because they shed light not only on the organisation of the overall memory but also on the meaning of zero-page addresses.
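For reference, this is how the two addressing modes resolve their effective address, written as a small Python sketch. The function names are mine, and `mem` is assumed to be any indexable 64K byte array; the wrap-around within the zero page is standard 6502 behaviour.

```python
def indexed_indirect(mem, zp, x):
    """(zp,X): X selects the zero-page pointer, the pointer is the address."""
    ptr = (zp + x) & 0xFF                           # wraps within the zero page
    return mem[ptr] | (mem[(ptr + 1) & 0xFF] << 8)


def indirect_indexed(mem, zp, y):
    """(zp),Y: the zero-page pointer gives a base address, Y is added to it."""
    base = mem[zp] | (mem[(zp + 1) & 0xFF] << 8)
    return (base + y) & 0xFFFF


# Example: the pointer at $10/$11 holds $2000
mem = bytearray(0x10000)
mem[0x10], mem[0x11] = 0x00, 0x20
print(hex(indirect_indexed(mem, 0x10, 5)))      # 0x2005
print(hex(indexed_indirect(mem, 0x0E, 2)))      # 0x2000
```

Logging the `zp` operand together with the resulting effective address shows which zero-page pairs act as pointers and which memory regions they walk through.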
At this point I am looking for general advice on reverse engineering 6502 code.
* How to generate sensible call trees for code which uses RTS as a table-driven jump mechanism?
* Are there reverse engineering tools for 6502 code?
* Are there any good books or other documentation about reverse engineering 6502 out there?
* What are common pitfalls when analysing 6502 code?
* What are structures to look out for?
* Is there value in trying to identify blocks of instructions and grouping them into macros, to speed up the understanding?
I intend to come back to this forum topic from time to time and add info on my progress. The project could very well be too ambitious for me, but I am looking forward to the challenges on the way.