sark02, fantastic info, thanks for taking the time. I really appreciate your help. This is so motivating! I will certainly come back to your post many times.
I have been peeking into the game for a couple of days now. My understanding is, well, limited, even more so because I have never programmed the 6502 before. I'd still like to give you some feedback on how I am working my way towards the "big picture".
Quote:
My approach to the big-picture disassembly was:
My disassembly follows the execution path, i.e. I decide where the opcodes, the operands and the data are on the basis of the instructions that were actually executed.
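As an illustration only, this is roughly how trace-driven classification could look in Python. It is not the actual disassembler: the function name `classify`, the trace format (a list of program-counter values of executed instructions) and the tiny `OPCODE_LEN` table are assumptions, and the table only covers the opcodes that appear in the listing further down.

```python
# Instruction lengths for a few 6502 opcodes only (LDX #, TXS, LDA #, LDY #, JSR).
# A real tool needs the complete opcode table.
OPCODE_LEN = {0xA2: 2, 0x9A: 1, 0xA9: 2, 0xA0: 2, 0x20: 3}

def classify(memory, executed_pcs):
    """Mark each byte touched by an executed instruction.

    memory: {address: byte}, executed_pcs: addresses where an instruction
    was actually fetched. Addresses missing from the result are either data
    or code that the sample run never reached.
    """
    kinds = {}
    for pc in executed_pcs:
        length = OPCODE_LEN[memory[pc]]
        kinds[pc] = "opcode"
        for offset in range(1, length):
            kinds[pc + offset] = "operand"
    return kinds

# First bytes of the listing: LDX #$3f, TXS, LDA #$2e
memory = {0x4000: 0xA2, 0x4001: 0x3F, 0x4002: 0x9A, 0x4003: 0xA9, 0x4004: 0x2E}
kinds = classify(memory, [0x4000, 0x4002, 0x4003])
```

Because only fetched addresses are trusted as opcodes, data embedded in the code cannot be misread as instructions; the price is that unexecuted code stays unclassified.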
At this point in time my disassembler creates the following file:
Code:
.L0 < $2dfd (jmp)
$4000  a2 3f     LDX #$3f
$4002  9a        TXS
$4003  a9 2e     LDA #$2e
$4005  a2 0c     LDX #$0c
$4007  a0 08     LDY #$08
$4009  20 9d 44  JSR .closed01
.L4 < $44c1 (rts)
$400c  a9 3a     LDA #$3a
$400e  a2 06     LDX #$06
$4010  a0 01     LDY #$01
$4012  20 9d 44  JSR .closed01
.L5 < $44c1 (rts)
$4015  a2 3f     LDX #$3f
$4017  9a        TXS
$4018  20 2f 51  JSR .closed13
.L21 < .L20 (rts)
$401b  a9 ea     LDA #$ea
$401d  a2 14     LDX #$14
The disassembler assigns a label to every address that execution leaps to (via branches, JSR, absolute/indirect JMP, or RTS). Consequently, I can show the addresses from which a label is reached with "<", together with the type of leap instruction. An RTS which does not match a calling JSR (e.g. because the stack was manipulated) is treated as its own type of leap. Branch instructions are marked according to whether the branch was sometimes, always or never taken.
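The two classifications described here can be sketched in Python as follows. This is a guess at one possible implementation, not the real tool: the function names, the "shadow stack" of pending JSR addresses and the string tags are all assumptions. The only 6502 fact relied on is that a normal RTS resumes at the address of the JSR plus 3.

```python
def classify_branch(outcomes):
    """outcomes: list of booleans, one per execution (True = branch taken)."""
    if not outcomes:
        raise ValueError("branch was never executed in the sample run")
    taken = sum(outcomes)
    if taken == 0:
        return "never"
    if taken == len(outcomes):
        return "always"
    return "sometimes"

def classify_rts(shadow_stack, target):
    """shadow_stack: addresses of JSR instructions that have not returned yet.

    A matching RTS lands on JSR address + 3; anything else suggests the
    stack was manipulated, so it gets its own leap type.
    """
    if shadow_stack and shadow_stack[-1] + 3 == target:
        shadow_stack.pop()
        return "rts"
    return "rts-unmatched"

# JSR at $4009 returns to $400c, as in the listing above.
pending = [0x4009]
kind = classify_rts(pending, 0x400C)
```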
I then take the code and group consecutive instructions. The code (seen as a list of consecutive instructions in memory) is broken into "tiles" at instructions which leap away and at instructions which are leaped to from elsewhere. Tiles are then connected automatically using rules, e.g. for sequential execution, for a JSR that returns normally, for a branch instruction that could branch but never did, etc.
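A minimal sketch of the tile-splitting step, assuming the instructions are available as `(address, mnemonic)` pairs in memory order and the leap targets as a set of addresses. The names `LEAVING` and `split_into_tiles` are made up for the example, and the mnemonic set is not a complete list of 6502 control-flow instructions.

```python
# Mnemonics that leap away (or may leap away) from sequential execution.
LEAVING = {"JMP", "JSR", "RTS", "RTI",
           "BCC", "BCS", "BEQ", "BNE", "BMI", "BPL", "BVC", "BVS"}

def split_into_tiles(instructions, leap_targets):
    """Cut the instruction list before every leap target and after every leap."""
    tiles, current = [], []
    for addr, mnemonic in instructions:
        if current and addr in leap_targets:
            tiles.append(current)      # someone leaps here: start a new tile
            current = []
        current.append((addr, mnemonic))
        if mnemonic in LEAVING:
            tiles.append(current)      # this instruction leaps away: close the tile
            current = []
    if current:
        tiles.append(current)
    return tiles

# Addresses and mnemonics taken from the listing above.
instructions = [
    (0x4000, "LDX"), (0x4002, "TXS"), (0x4003, "LDA"), (0x4005, "LDX"),
    (0x4007, "LDY"), (0x4009, "JSR"), (0x400C, "LDA"), (0x400E, "LDX"),
    (0x4010, "LDY"), (0x4012, "JSR"), (0x4015, "LDX"),
]
tiles = split_into_tiles(instructions, {0x4000, 0x400C, 0x4015})
```

With this input the list is cut after each JSR, so the tiles start at $4000, $400c and $4015, which are exactly the labelled addresses in the listing.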
These "stretches" of tiles are a pretty good first approximation of subroutines. For each stretch I collect its JSRs and record which other stretches/subroutines they leap to and return from.
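Joining tiles into stretches might look roughly like this. It is a simplified sketch under several assumptions: tiles are given in memory order and are contiguous, each tile carries a hypothetical `exit` tag describing how execution left it, and only three joining rules are modelled. The sample tiles are illustrative, not actual output.

```python
# Exit kinds after which execution continues into the next tile in memory.
JOINING = {"sequential", "jsr-returned", "branch-never"}

def build_stretches(tiles):
    """tiles: list of {'start': address, 'exit': kind} dicts in memory order."""
    stretches = []
    for tile in tiles:
        if stretches and stretches[-1][-1]["exit"] in JOINING:
            stretches[-1].append(tile)   # previous tile falls through into this one
        else:
            stretches.append([tile])     # previous tile leaps away for good
    return stretches

tiles = [
    {"start": 0x4000, "exit": "jsr-returned"},
    {"start": 0x400C, "exit": "jsr-returned"},
    {"start": 0x4015, "exit": "jsr-returned"},
    {"start": 0x401B, "exit": "rts"},           # illustrative exit kind
    {"start": 0x449D, "exit": "rts"},           # illustrative exit kind
]
stretches = build_stretches(tiles)
starts = [stretch[0]["start"] for stretch in stretches]
```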
So far I have emulated and disassembled only a couple of seconds into the intro of the game. The disassembly therefore has a lot of "holes", because certain parts of the intro have not been executed yet. Still, the game intro runs, so there has to be some structural logic, even if the coverage of code by the execution path is not complete. I'd rather take it slowly in the beginning.
Quote:
With the known-function-list you can create a call-graph. There will likely be many call graphs - they won't all neatly form one big tree. The call-graph lets you see structure. There will be many shared nodes on the graph. These help identify utility functions.
I know of course what's happening in the game, but I am not yet able to relate it to the disassembly. No surprise there; a steep learning curve is to be expected. The structural information gained by executing the code is enough, though, to get a first glimpse of the call-graph.
Attachment:
tree0.jpg
The attached graph shows the flow between the stretches of code, each identified by the label of its first instruction. The numbers show how often each stretch was entered (in the sample execution path). => The helper subroutines are clearly discernible, even with this very crude analysis.
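For illustration, counting entries and spotting shared nodes could be done along these lines in Python. The input format (a list of `(caller, callee)` label pairs taken from the trace) and the function names are assumptions, and the sample calls are invented, not taken from the attached graph.

```python
from collections import Counter, defaultdict

def build_call_graph(calls):
    """calls: list of (caller_label, callee_label), one entry per executed JSR."""
    edges = Counter(calls)                              # how often each edge was used
    entries = Counter(callee for _, callee in calls)    # how often each stretch was entered
    callers = defaultdict(set)
    for caller, callee in calls:
        callers[callee].add(caller)
    return edges, entries, callers

def shared_nodes(callers, min_callers=2):
    """Stretches called from several places are likely helper subroutines."""
    return sorted(n for n, who in callers.items() if len(who) >= min_callers)

# Invented sample trace using labels in the style of the listing.
calls = [(".L0", ".closed01"), (".L0", ".closed01"),
         (".L0", ".closed13"), (".closed13", ".closed01")]
edges, entries, callers = build_call_graph(calls)
helpers = shared_nodes(callers)
```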
The call-graph as well as the underlying stretches and their tiles are automatically generated from the memory map and disassembly information. The resulting call-graph covers most of the execution paths through the code. There are a few locations which the algorithm cannot stitch together correctly, e.g. the cyclical calling in the right sub-call-graph. It offers enough information, though, to guide the next steps.
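One way to at least flag such cyclical calling automatically is a depth-first search for a back edge. The sketch below is a standard cycle check, not part of the tool described here; the graph format (`{label: [callee labels]}`) and the labels in the example are hypothetical.

```python
def find_cycle(graph):
    """Return one cycle as a list of labels (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2     # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for callee in graph.get(node, []):
            state = colour.get(callee, WHITE)
            if state == GREY:        # callee is already on the path: back edge
                return path[path.index(callee):] + [callee]
            if state == WHITE:
                found = visit(callee)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None

cycle = find_cycle({"a": ["b"], "b": ["c"], "c": ["b"]})
```

The recursion is fine for a call graph of this size; a very deep graph would need an explicit stack instead.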
Quote:
The hardest thing with reverse-engineering is figuring out intent. Not _what_ something does, mechanically, but _why_.
I cannot even imagine how hard this is going to be. It will take ages until I make good progress on the game. The info from your post and the posts of the others in this thread is a fantastic help, thanks!