Reverse Engineering Cartridge to Source, Need Feedback
Posted: Sun Mar 29, 2020 6:29 pm
I've finally returned to my "learning 6502 via disassembly and reverse engineering a cartridge game" project. My goal is to learn 6502 Assembly and be able to reliably reconstruct working source code to stand in for sources that have been lost to time.
Getting this far required some research on my part, because it wasn't immediately obvious that the first byte after the header was anything other than an instruction. My first clue that I had done something wrong was that, several instructions in, things stopped making sense and I started hitting invalid opcodes.
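To illustrate why the first bytes are not instructions, here is a minimal Python sketch. It assumes the file is a raw ROM image mapped at $8000 that follows the standard C64 autostart layout: a cold-start vector, a warm-start vector, and the five-byte "CBM80" signature, nine bytes in all. The function name `parse_cart_header` and the example image are hypothetical, made up for illustration only.

```python
import struct

# "CBM80" as the KERNAL expects it at $8004: C, B, M have bit 7 set.
CBM80 = bytes([0xC3, 0xC2, 0xCD, 0x38, 0x30])

def parse_cart_header(rom: bytes, base: int = 0x8000) -> dict:
    """Split the first nine bytes of a raw ROM image into vectors and signature.

    Assumes the standard autostart layout; other cartridge types differ.
    """
    if len(rom) < 9:
        raise ValueError("image too short to hold the autostart header")
    # Two little-endian 16-bit vectors: cold start, then warm start.
    cold, warm = struct.unpack_from("<HH", rom, 0)
    return {
        "cold_start": cold,
        "warm_start": warm,
        "signature_ok": rom[4:9] == CBM80,
        "first_byte_after_header": base + 9,
    }

# Hypothetical image: both vectors point at $8009, then a single sei ($78).
example = bytes([0x09, 0x80, 0x09, 0x80]) + CBM80 + bytes([0x78])
info = parse_cart_header(example)
for key, value in info.items():
    shown = value if isinstance(value, bool) else f"${value:04X}"
    print(f"{key}: {shown}")
```

If the signature check fails, the image probably has a different layout (or an extra file header in front), and disassembly should not start at offset 9.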
This has been my approach in a nutshell:
- Get a hex dump of the file that is one byte wide. I used xxd for this.
- Use a counter with bash and sed to place the memory offset (file location + start of execution) in a comment off to the right.
- Prefix all hex with .byte instructions, so that 64tass can theoretically assemble it as-is.
- Create a center column of comments that is my manual disassembly.
- When I'm done (I'm about 1/10th of the way through) awk the file so that my disassembly and offset comments are the only thing left.
- Build that.
- Find and fix errors until the whole thing assembles, runs, and has the same hash as the original file.
- Trace all of the branches and jumps and figure out what's really going on at those memory locations.
- Go back and make macros and functions where it makes sense to do so.
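The listing, stripping, and hash-check steps above can be sketched in Python. This is not the xxd/bash/sed/awk pipeline itself, only a rough equivalent under assumptions: the function names are hypothetical, operand lines are assumed to be marked `op` in the middle column, and SHA-256 stands in for whatever hash is actually used.

```python
import hashlib

def make_listing(data: bytes, base: int) -> str:
    """One .byte line per input byte, with an empty middle comment column
    for the hand disassembly and the memory address on the right."""
    lines = []
    for offset, value in enumerate(data):
        lines.append(f".byte ${value:02x} ; ;${base + offset:04X}")
    return "\n".join(lines)

def strip_listing(listing: str) -> str:
    """Keep only the disassembly and address columns.

    Lines whose middle column is empty or 'op' (an operand byte that
    belongs to the previous instruction) are dropped.
    """
    kept = []
    for line in listing.splitlines():
        parts = [part.strip() for part in line.split(";")]
        if len(parts) != 3:
            continue  # not a three-column listing line
        _, mnemonic, address = parts
        if mnemonic in ("", "op"):
            continue
        kept.append(f"{mnemonic} ; {address}")
    return "\n".join(kept)

def same_hash(original: bytes, rebuilt: bytes) -> bool:
    """True when the rebuilt binary matches the original byte for byte."""
    return hashlib.sha256(original).digest() == hashlib.sha256(rebuilt).digest()

annotated = "\n".join([
    ".byte $78 ; sei ;$8009",
    ".byte $20 ; jsr $ff84 ;$800A",
    ".byte $84 ; op ;$800B",
    ".byte $ff ; op ;$800C",
])
print(make_listing(bytes([0x78, 0x20]), 0x8009))
print(strip_listing(annotated))
```

Keeping the address as a trailing comment means the stripped file can still be compared against the original dump line by line when the hashes disagree.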
Code:
.byte $78 ; sei ;$8009
.byte $20 ; jsr $ff84 ;$800A
.byte $84 ; op ;$800B
.byte $ff ; op ;$800C
.byte $20 ; jsr $ff87 ;$800D
.byte $87 ; op ;$800E
.byte $ff ; op ;$800F
.byte $20 ; jsr $ff8a ;$8010
.byte $8a ; op ;$8011
.byte $ff ; op ;$8012
.byte $20 ; jsr $ff81 ;$8013
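The grouping of an opcode with its operand bytes, as done by hand in the listing above, can be sketched as a tiny table-driven decoder in Python. This is only an illustration: the table covers just eight opcodes, the function name `disassemble` is hypothetical, and anything not in the table falls back to a `.byte` line. Relative branch targets are computed as the address of the branch plus 2 plus the signed offset.

```python
# A deliberately tiny opcode table: mnemonic and addressing mode.
OPCODES = {
    0x78: ("sei", "implied"),
    0x60: ("rts", "implied"),
    0xEA: ("nop", "implied"),
    0xA9: ("lda", "immediate"),
    0x20: ("jsr", "absolute"),
    0x4C: ("jmp", "absolute"),
    0xD0: ("bne", "relative"),
    0xF0: ("beq", "relative"),
}
# Instruction length in bytes for each addressing mode used above.
LENGTHS = {"implied": 1, "immediate": 2, "relative": 2, "absolute": 3}

def disassemble(data: bytes, base: int) -> list:
    """Return (address, text) pairs; unknown or truncated bytes become .byte."""
    out = []
    i = 0
    while i < len(data):
        addr = base + i
        entry = OPCODES.get(data[i])
        if entry is None or i + LENGTHS[entry[1]] > len(data):
            out.append((addr, f".byte ${data[i]:02x}"))
            i += 1
            continue
        name, mode = entry
        if mode == "implied":
            text = name
        elif mode == "immediate":
            text = f"{name} #${data[i + 1]:02x}"
        elif mode == "absolute":
            # Operands are little-endian: low byte first, then high byte.
            target = data[i + 1] | (data[i + 2] << 8)
            text = f"{name} ${target:04x}"
        else:
            # Relative: signed 8-bit offset from the next instruction.
            raw = data[i + 1]
            delta = raw - 256 if raw >= 0x80 else raw
            text = f"{name} ${(addr + 2 + delta) & 0xFFFF:04x}"
        out.append((addr, text))
        i += LENGTHS[mode]
    return out

rom = bytes([0x78, 0x20, 0x84, 0xFF, 0x20, 0x87, 0xFF, 0x20, 0x8A, 0xFF])
for addr, text in disassemble(rom, 0x8009):
    print(f"{text:<12} ;${addr:04X}")
```

A decoder like this walks the bytes linearly, so it will happily decode data tables as instructions; tracing from known entry points is still needed to tell code from data.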
Even though this is tedious as hell, I'm learning a lot just by doing it. While I could use the likes of radare2 to disassemble this for me, going through this by hand is teaching me a lot. In an effort to better understand what I'm seeing I'm also reading C64 Machine Language for the Absolute Beginner by Danny Davis. Some of Mansfield's stuff is next.
My questions are:
- In terms of reverse engineering process/workflow and bearing in mind that my end product is reconstructed source code, am I doing this right?
- At a certain point, I want to do this to fairly large multi-disk C64 programs. At what point (if any) is it best to partially automate disassembly?
- Is anybody else doing this sort of thing in the C64 space? It seems more common among the Atari 2600 crowd.
- Is there a good "6502 ASM style guide" of sorts anywhere? I want the end result to be readable so that even a very fresh student of ASM can begin to follow it.
- Is there an unofficial "standard library" of sorts for C64 projects that will work with 64tass? I know a lot of the A2600-types use a consistent set of headers for dasm.
- What's with the first instruction being sei? I very dimly recall reading about this as a common thing to do for cartridges, but I don't know why this is done.