6502.org Forum

All times are UTC




PostPosted: Sun Nov 15, 2020 1:20 pm 
Joined: Fri Nov 16, 2018 8:55 pm
Posts: 71
Once I've disassembled a game and successfully re-assembled it, I'm no longer sure what the best git workflow is. In an ordinary project, taking the common "feature branch" approach makes sense once you've got the essence of your project committed to code. But in my experience, reverse engineering games to maintainable source code does not work that way.

If you're adding features, you've already completed the disassembly of the game and understand its ins and outs. Fixing bugs or adding a feature the original designers didn't have time to implement before going to market (e.g., high scores) is the tail end of the process.

At the point you have your initial re-assembly working, it's likely that you still have whole segments of .byte $nn sleds because the exact boundary between data and instructions is not (yet) clear. The process of going from minimal re-assembly to fully commented code isn't exactly linear the way I thought it would be. Yes, you start with a known vector and work your way through it. You may not fully understand the code on the first pass, due to either (a) deficiencies in your understanding of assembly or, just as likely, (b) the code being incomprehensible without more context from elsewhere in the code. There is also the dreaded option (c): non-obvious junk bytes that are neither data nor code but somehow made it into the final product.
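
To make the ".byte $nn sled" idea concrete, here is a minimal Python sketch of how a region that is not yet classified could be emitted so the re-assembly still reproduces the original bytes. The function name byte_sled, the eight-values-per-line layout, and the sample bytes are all made up for illustration; the `.byte $nn` syntax follows the post above and may need adjusting for a particular assembler.

```python
def byte_sled(data: bytes, per_line: int = 8) -> list:
    """Render raw bytes as `.byte $nn` directives, per_line values per row."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        values = ", ".join(f"${b:02X}" for b in chunk)
        lines.append(f"    .byte {values}")
    return lines


if __name__ == "__main__":
    # Hypothetical unclassified region: could be code, data, or junk bytes.
    region = bytes([0xA9, 0x00, 0x8D, 0x20, 0xD0, 0x60, 0xFF, 0xFF, 0x00])
    for line in byte_sled(region):
        print(line)
```

As more of the region is understood, parts of the sled can be replaced with real instructions or labelled data tables, and the rebuilt binary should stay byte-identical throughout.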

None of this lends itself to an obvious git workflow. What is your approach?


PostPosted: Sun Nov 15, 2020 1:33 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10827
Location: England
(I'm just going to pitch in here, without having experience... my feeling is that git, or other revision control system, gives you a series of snapshots of progress, running from the first raw hex dump to the final fully annotated source. It's not clear to me that you need much by way of branches, or even labels, so long as you commit reasonably often and use meaningful commit messages. Then you can, for example, reassemble a series of versions hoping and intending to get the same binary every time, and if you don't you can bisect to find the first point of divergence. As you have a full history, you don't need to keep anything which isn't current in the sources: you can cut things out, rather than commenting them out, or conditionally assembling them.)
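
The bisect idea could be sketched roughly as follows, assuming a Python check script is acceptable. `git bisect run` treats exit code 0 as "good", 125 as "skip this commit", and other codes from 1 to 127 as "bad". The script name check.py, the argument order, and the build command are hypothetical, not something from this thread.

```python
import subprocess
import sys
from pathlib import Path

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def bisect_status(build_ok: bool, built: bytes, reference: bytes) -> int:
    """Map a build result to a bisect verdict."""
    if not build_ok:
        return SKIP  # commit cannot be assembled, so it cannot be judged
    return GOOD if built == reference else BAD


def main(argv):
    # Hypothetical usage: check.py ORIGINAL BUILT BUILD-COMMAND...
    reference_path, built_path, build_cmd = argv[1], argv[2], argv[3:]
    build = subprocess.run(build_cmd)
    build_ok = build.returncode == 0 and Path(built_path).is_file()
    built = Path(built_path).read_bytes() if build_ok else b""
    return bisect_status(build_ok, built, Path(reference_path).read_bytes())


if __name__ == "__main__":
    if len(sys.argv) >= 4:
        sys.exit(main(sys.argv))
    # With too few arguments, just print a usage hint.
    print("usage: check.py ORIGINAL BUILT BUILD-COMMAND...", file=sys.stderr)
```

With something like this in place, a session might look like `git bisect start <bad> <good>` followed by `git bisect run python check.py original.bin build/game.bin make`, where the file names and the use of make are again placeholders.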


PostPosted: Sun Nov 15, 2020 2:11 pm 
Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
I'm feeling that my thoughts here may not be all that applicable to the kind of work the OP is doing, but I'll post them here anyway just in case there's anything helpful in them.

My viewpoint might be slightly different because I generally reverse-engineer system and BASIC ROMs and the like, rather than games, so I generally have no need to get new builds of the code working soon after starting, if at all.

But I feel that it's pretty awkward to be reworking large chunks of .byte directives or misaligned disassembly in source code, not to mention handling new or renamed labels for those; I'd rather just run it back through the disassembler. So I generally just stick with adding all my material to the disassembler's info files (and re-running the disassembly) until I'm reasonably convinced that I've got the vast majority of the code worked out to at least the level of what's code and what's data. (This can be tricky in some instances; for example the National/Panasonic JR-200 BIOS ROM is full of code and data that's never called or used in the BIOS ROM itself, but only used from the BASIC ROM. Also, sometimes labels are never used anyway, such as with routines that take a function number and then look up the address to call from a table.)

That said, this approach is not without its own pain. I had to develop techniques to quickly re-disassemble and switch between the info files and the disassembly output (a script that watches for changes to the files and automatically re-runs the disassembly, and setting my Vim buffer for the disassembly output to automatically reload when the disassembly output changes) and also write some post-processing to make the output format of f9dasm a bit nicer. It's also not as nice editing f9dasm's info files as it is editing source code directly, though it's not been quite annoying enough yet for me to just go write my own disassembler.
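
A watcher of the kind described above might look something like this minimal polling sketch. It is not the script from this post: the function names, the half-second interval, and the idea of passing the disassembler invocation in as a ready-made command list are all assumptions, and the actual f9dasm command line is deliberately left to the caller.

```python
import os
import subprocess
import time


def snapshot(paths):
    """Map each existing path to its last-modified time in nanoseconds."""
    return {p: os.stat(p).st_mtime_ns for p in paths if os.path.exists(p)}


def changed(paths, previous):
    """Return (paths whose mtime differs from `previous`, new snapshot)."""
    current = snapshot(paths)
    dirty = [p for p in paths if current.get(p) != previous.get(p)]
    return dirty, current


def watch(paths, command, interval=0.5):
    """Poll `paths` forever; re-run `command` whenever one of them changes."""
    seen = snapshot(paths)
    while True:
        dirty, seen = changed(paths, seen)
        if dirty:
            # The caller supplies the real disassembler command line.
            subprocess.run(command)
        time.sleep(interval)
```

A call such as `watch(["B1.info"], ["sh", "redisassemble.sh"])` would then re-run a (hypothetical) wrapper script each time the info file is saved. Polling is crude but portable; a platform file-notification API would avoid the delay.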

The Git workflow for all of this is simple enough, though; do a bit of hacking and commit. I don't use branches and the like nearly as extensively in disassembly as I do when writing or modifying my own code, though I do regularly use branches local to my repo for short periods of time when I'm working something out. (These generally live for only a few hours, at most, and never leave my local repo.) I do commit the disassembly output, even though it's an object file, which means I need to take slightly more care when doing a commit to avoid it getting out of sync, but this is necessary so that I can just point people to the repo on the web (in GitHub or GitLab or whatever) to read the disassembled code, rather than making them pull the repo and run the disassembly themselves. (As an example of the source info file and the disassembly output you can have a look at Bn-BIOS/B1.info and Bn-BIOS/B1.dis. Note that that's 6800 code.)

BigEd wrote:
Then you can, for example, reassemble a series of versions hoping and intending to get the same binary every time, and if you don't you can bisect to find the first point of divergence.

Going back through commits to figure out where your assembly output diverged from the original binary generally isn't a thing so long as you have a script that does the assembly and compare for you and you run it on every commit before pushing. I suppose it can be handy to be able to go back to specific points in history and find out where you went wrong in assigning labels or adding comments or whatever, but honestly I don't do that much when reverse-engineering, instead just fixing things in the current version and moving on. However, if you end up with more than one developer working on the code simultaneously, Git will be invaluable in helping you merge your changes together. (Sadly, I have always ended up working on this stuff alone, to date.)
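
For the "assemble and compare" step, the comparison half could be as small as the sketch below. It only covers comparing two byte strings that have already been read from disk; the function names and the optional load_address parameter are invented for illustration, and running the assembler is left out.

```python
def first_mismatch(built: bytes, original: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(built, original)):
        if a != b:
            return offset
    if len(built) != len(original):
        # One image is a prefix of the other; they diverge where it ends.
        return min(len(built), len(original))
    return None


def report(built: bytes, original: bytes, load_address: int = 0) -> str:
    """Describe the comparison result, translating the offset to an address."""
    offset = first_mismatch(built, original)
    if offset is None:
        return "OK: rebuilt binary matches the original"
    address = load_address + offset
    return f"MISMATCH at offset ${offset:04X} (address ${address:04X})"
```

Reporting the address rather than a bare yes/no makes it quicker to find the label or directive that was just edited, which fits the fix-it-now-and-move-on approach described above.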

_________________
Curt J. Sampson - github.com/0cjs

