6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sat Nov 23, 2024 2:05 pm

All times are UTC




Post new topic Reply to topic  [ 98 posts ]  Go to page Previous  1, 2, 3, 4, 5 ... 7  Next
Author Message
PostPosted: Tue Feb 26, 2019 8:35 pm 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Cleverness in the very early stages of a game may be part of a copy-protection scheme, checking for example that it's been loaded at the correct address and verifying a simple checksum. Such things are easy to defeat by overwriting with NOPs, as long as they don't induce side-effects elsewhere, and this is often done to port them to environments that are almost, but not entirely, like the one originally intended.

I would advise looking at a "leaf function" first, perhaps one of the subroutines that does nothing or only simple things with the stack, and try to work out what it does. Especially you want to look at leaf functions that hit frequently at runtime (since you have traces, you can identify those easily). You might not get much of a high-level overview at first, but the gestalt will begin to emerge over time.

You could also statistically correlate accesses to memory areas by sections of code. Do this at the byte level for zero-page and device addresses, and the page level (high byte of the address) elsewhere, then drill down as required. For zero-page in particular, you might distinguish between addressing modes used for access, as indirect accesses signify use of that location as a pointer, while direct accesses may just be for temporary storage (but also for updating pointers). For example, this should help you identify code and zero-page pointers associated with the screen drawing process.

Also pay attention to subroutines that call into the ROM. Those routines will be documented in Apple literature and offer more clues as to what is intended by the game's programmer. Modify your tracer to log the parameters passed to (and maybe results returned from) these standard routines, while skipping the actual instructions making them up.


Last edited by Chromatix on Tue Feb 26, 2019 8:54 pm, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Tue Feb 26, 2019 8:46 pm 
Offline
User avatar

Joined: Wed Aug 17, 2005 12:07 am
Posts: 1250
Location: Soddy-Daisy, TN USA
FWIW, the comp.sys.apple2 newsgroup is an excellent place for all things Apple II. Probably even better than the AtariAge forum (which can also be good).

Those guys really know the A2 game scene. Of course, for expert level 6502 information, you can't beat this place.

_________________
Cat; the other white meat.


Top
 Profile  
Reply with quote  
PostPosted: Tue Feb 26, 2019 9:01 pm 
Offline

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
cbmeeks wrote:
FWIW, the comp.sys.apple2 newsgroup is an excellent place for all things Apple II. Probably even better than the AtariAge forum (which can also be good).

Those guys really know the A2 game scene. Of course, for expert level 6502 information, you can't beat this place.

It may very well be worthing pinging those groups to see if anyone has done this already, if for no other reason than to get a leg up.

I would focus on the largest, clearest chunks of code as I can to (ideally) eliminate them from the prospective code base of "unknown code". The more you know earlier, the better it will go, and it can very well help in understanding the complicated code.

One of the hardest, yet in many ways very important, things to suss out is the use of the global variables, since those will likely point to the important internal data structures. From there you can work on the code the manipulates those structures, and then the code above that.

You can also start at the inputs (I have no idea how Robotron worked on the Apple II), and go from there. What's reading the joysticks? What do they do with the data?

Lots of different approaches to poking the black box.


Top
 Profile  
Reply with quote  
PostPosted: Tue Feb 26, 2019 9:12 pm 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
A note about addressing modes - these can broadly be divided into three major categories on the 6502:

1: Implied and Immediate - nothing happens outside of the program bytestream. Arguably the stack instructions also go here, though their effect is also bounded.

2: Zero Page, Absolute, and Indexed. These are "direct" accesses; an address is read from the program bytestream, possibly modified by the X or Y index registers, and used directly to locate the operand. Indexing is always by the X register except for LDX and STX, which can index on the Y register instead. These are reasonably fast memory accesses, especially to Zero Page since one less byte has to be loaded.

3: Indirect modes. For JMP this is a non-indexed indirection through a vector located anywhere, and thus referred to by a 16-bit address. For all other instructions there is a pre-indexed indirect in which a zero-page address is modified by the X register to locate the pointer to the operand, and a post-indexed indirect in which the pointer in zero-page memory is modified by the Y register. These are quite slow accesses since the CPU must access several bytes of memory and perform pointer arithmetic just to locate the operand - however the flexibility is very valuable.

Self-modifying code might seek to increase speed by replacing the operand bytes of instructions to gain the flexibility of Indirect addressing modes without the performance penalty normally associated with them. Detect this by watching for writes to addresses that have previously been executed.


Top
 Profile  
Reply with quote  
PostPosted: Tue Feb 26, 2019 9:25 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
fschuhi wrote:
Is there a consensus on which 6502 book is the best? In school we were pointed to Zaks' "Programming the 6502", but it's quite verbose. Any love for Scanlon's "6502 Software Design"?


I've started a thread, over here. Let's hope!


Top
 Profile  
Reply with quote  
PostPosted: Wed Feb 27, 2019 8:22 am 
Offline
User avatar

Joined: Sat Feb 23, 2019 9:11 am
Posts: 38
Thank you all for your advice! Really valuable, and motivating.

This is my plan:
    * Study 6502 assembler. I am not versed enough to read 6502, structures do not stand out yet. My reading list starts w/ "Assembly Lines" (Wagner 1982), because it's 6502 + Apple II centric, then "6502 Assembly Language Subroutines" (Leventhal 1982).
    * Study the Apple II. After browsing a couple of books I've settled on "Understanding the Apple II" (Sather 1985), together with the Woz-1979 reference manual.
    * Stick to the Robotron game intro. I'm in no hurry to get to the really difficult stuff. First I need to have a reasonably small workbench with simple and deterministic execution flow. In addition, from comp.sys.apple2 I learned about a patched version of Robotron with different intro screen elements and menus. This will be helpful to guide my understanding.
    * Derive a (graphical) call tree. Needed for both understanding what the leaves do, as well as "zooming out" to see the bigger picture. I'll use Graphviz to picture the tree.
    * Collect memory info for subroutines. One of the recurrent themes from your posts is the importance to understand memory access, both direct and indirect. I'll also try to devise ways to automatically identify global variables.

This should keep me occupied for a while. I'll resurface when I have something tangible to show.


Top
 Profile  
Reply with quote  
PostPosted: Wed Feb 27, 2019 8:44 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Great - do keep us updated with progress!

(The Apple II disk system is another twisty maze with marvellous possibilities, but hopefully you can sidestep all that, if the program just loads once and doesn't use the drive again.)


Top
 Profile  
Reply with quote  
PostPosted: Wed Feb 27, 2019 1:36 pm 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
Can you post a binary of the specific Apple II program version you're using, so we can look at it as well?

Regarding WFDis, it is an in-progress front-end prototype with a simple tracing disassembler included. There's a smarter back-end server that does heavier analysis, but is waiting on new front-end features to support it better.

The biggest advantage I see in an interactive disassembler like this is the ability to rename things on the fly. Applying a label, then seeing all instances of the code using it turn from "LDA $81" into "LDA playerAnimFrameVector + 1" is a huge help in interactively exploring the codebase, even if the names are temporary and hold partial understanding (they can always be renamed just as easily). Plopping in comments and being able to follow address references via hyperlinks and back/forward browser history is really nice. Even with the relatively basic feature set in there now, I find this workflow incredibly quick.

Discovering code vs data is actually quite a small part of the work, though it's nice when it's automatic. Figuring out why a piece of code/data is in there and what it's doing is the real reverse engineering.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Top
 Profile  
Reply with quote  
PostPosted: Wed Feb 27, 2019 8:35 pm 
Offline
User avatar

Joined: Sat Feb 23, 2019 9:11 am
Posts: 38
I've downloaded the version I use from https://www.myabandonware.com/game/robo ... Apple%20II

I pulled the image from the disk with CiderPress, it's ROBOTRON.BIN, to be loaded and started at $2DFD.

White Flame, I've dabbled a bit with your WFDis, it's really impressive, and easy to use. I actually took the feature of on-the-fly renaming as an inspiration for my workbench. That workbench runs in Excel, which might sound weird but it's actually a really nice free-form way to work with different kinds of data. Supplying labels is of course not as WYSIWIG as WFDis, but reasonably convenient for now.

Please keep in mind that I run the code in an Apple II emulator. This gives me a lot of ways to hook into the CPU and the memory. It also enables me to distinguish RTS from PHA/PHA/RTS, not by looking at the code but simply running it and continuously inspecting the stack.

Quote:
Discovering code vs data is actually quite a small part of the work, though it's nice when it's automatic.

With an emulator the execution path determines what to disassemble, of course. I wouldn't even call it "automatic", the disassembly is just a byproduct. Where it gets difficult, though, is to unentangle the execution flow itself. Which parts do what? I wouldn't be able to answer this question by tearing into the 6502 disassembly, because I'm simply not good enough in 6502. When I get more proficient then the WFDis frontend could actually become valuable for me.

From what I've gleaned from a comment in comp.sys.apple2, Robotron might not be easy to reverse engineer. From the few things I've learned so far, it doesn't look like intentional obfuscation. But already the intro section is really convoluted, with parts of subroutines are stiched together with JMP, there is some recursive calling, tabled RTS, self-modified code, moving around of data, code is executed in weird places.

I've been able to learn about all this by collecting runtime info and doing some first analysis on it. So far the algorithms for this have been adhoc. From the feedback I got in this thread I decided to focus first on "call trees" for the 6502. I'm currently working on a concept and first prototypes, but it'll take some time before I resurface. In the meantime it would be awesome if you shared some of your reverse engineering know-how, maybe even running your backend on the Robotron binary? :)


Top
 Profile  
Reply with quote  
PostPosted: Wed Feb 27, 2019 9:08 pm 
Offline
User avatar

Joined: Wed Aug 17, 2005 12:07 am
Posts: 1250
Location: Soddy-Daisy, TN USA
FYI, over on comp.sys.apple2, there is a user named qkumba who is an absolute legend in porting Apple II games from DOS to PRODOS. He has done literally hundreds. He might be a good resource on learning the tricks to Robotron's disk system.

_________________
Cat; the other white meat.


Top
 Profile  
Reply with quote  
PostPosted: Thu Feb 28, 2019 8:52 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
White Flame wrote:
Applying a label, then seeing all instances of the code using it turn from "LDA $81" into "LDA playerAnimFrameVector + 1" is a huge help...

Really good example: an operand byte might be best viewed as an expression, which you'll only be able to conjure up as you get to appreciate the code. It's not too unusual for an index not to be zero-based - if the valid values for an index are 10-25 then the index might well be used to point a little below the data in question.

Another 6502 feature: in most situations, zero page addresses wrap around within zero page. Therefore a negative byte does act as a negative offset. (Being very clear about byte values and their interpretation either as signed or unsigned numbers, or as an enumerated type, or a character, or other things, could be an important watershed.)

Likewise, the idioms for performing arithmetic or comparisons on two-byte or multi-byte values, the idioms for using the carry bit as a boolean variable or as an accessory in multi-byte shifts, are important to understand. They are not exactly macros, but they are idioms larger than single instructions.


Top
 Profile  
Reply with quote  
PostPosted: Sat Mar 02, 2019 1:14 pm 
Offline
User avatar

Joined: Sat Feb 23, 2019 9:11 am
Posts: 38
BigEd, your points show how important it is to understand the code not only broadly but also on the scale of what you call "idioms". I think that's also what White Flame alluded to, and I totally agree. Furthermore, understanding idioms just by reading them is much harder than using them in own software. That was a good point of Whartung in one of the earlier posts: it's easier to understand things by making them, not just reading about them. I'd like to spend my time efficiently, so I hope that there is some balancing point between such a "micro" approach, and the more macro of looking at the "whole picture".


Top
 Profile  
Reply with quote  
PostPosted: Sat Mar 02, 2019 2:01 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I'm a bit of a reader of code, myself! Studying some of Bruce's code, or Woz', or maybe Tom Pittman's TinyBasic, might be rather illuminating.

But if you want to write code, here's an idea: write a routine to clear a 1000 byte block of memory, the address being known at assembly time. You can probably write it three or four ways, and doing so might itself be a good exercise. In fact, this might be a good idea for a thread...

... Edit: here's the new thread.


Top
 Profile  
Reply with quote  
PostPosted: Sat Mar 02, 2019 7:36 pm 
Offline

Joined: Tue Nov 10, 2015 5:46 am
Posts: 230
Location: Kent, UK
I've had a background project over the past decade of disassembling the original Williams Defender code. I tend to work on this about 3 or 4 half-days a year, when I'm on vacation - between doing other stuff. So although a decade sounds like a long time (I think it might actually be 12 or 13), I've only just begun. It's not a serious thing, just a little fun when I'm absolutely completely bored.

Defender was written by Eugene Jarvis, the same guy who wrote the original Robotron 2084. The main game loop iterates over object lists: linked-lists of data structures which have references to sprite objects, "think" (behavior) functions, timers and other things (TBD!). Sprite objects reference pixel data, and have pointers to draw and erase functions. Then there's the think functions for the on-screen objects, input, overall game logic, etc.

The 6502 version of Robotron is likely going to be structured similarly, even if it's an independent reimplementation by a different programmer. This "object oriented" style is a pretty natural way of writing games with many different "actors" that have their own lifespan, behavior, and graphics/animation.

My approach to the big-picture disassembly was:
- Start with a two-pass disassembly that creates labels for all the trivial branch/jump destinations. This identified the 'direct-call' subroutines (called via JSR) and the intrafunction logic (control flow / loops).
- Make a list of all the JSR destinations. This is the "known function list".
- For each RTS instruction that isn't followed by a known function or an intrafunction label, take the address of the next byte and look through the binary for (LO-BYTE, HI-BYTE) references to that address. This becomes the "possible function list". If there a multiple bytes between the RTS and the next known-function, search for those too. The locations of the (LO, HI) references are "possible function pointers". Keep those.
- For each "possible function", disassemble it and see if it looks like garbage or an actual function. If you know 6502 then it's pretty easy to tell the difference.

One difference between 6809 and 6502 is that 6809 natively handles 16-bit values, so (LO, HI) pairs are often directly stored in contiguous memory addresses. When I used to write small games on the 6502 (back in the early 80s) I would spread object structure fields over arrays indexed by X. So, for example:

Code:
struct object {
    struct object *next;
    uint8 xpos;
    uint8 ypos;
    uint16 thinkfunc;
};


In 6809 this would be a 6-byte object (2 bytes for the 'next' pointer), whereas in 6502 I might have structured it like this:

Code:
obj_next:     .ds 128   ; support 128 objects
obj_xpos:     .ds 128
obj_ypos:     .ds 128
obj_thinkhi:  .ds 128
obj_thinklo:  .ds 128

obj_head: .ds 1    ; head index

obj_runloop:
    lda   obj_head
obj_loop:
    tax
    ;  call the think function
    lda   obj_thinklo,x
    pha
    lda   obj_thinkhi,x
    pha
    rts
objloop_next:    ; think function JMPs back to here
   lda   obj_next,x
   bne   obj_loop
   rts


If 6502 Robotron structures things like this then finding these (LO, HI) pairs might be a challenge. You can find loops like the above example to find the arrays, and then adjust your analysis accordingly.

With the known-function-list you can create a call-graph. There will likely be many call graphs - they won't all neatly form one big tree. The call-graph lets to see structure. There will be many shared nodes on the graph. These help identify utility functions.

- Using a graphical utility, locate the sprite patterns for the program. I wrote my own utility for this, by studying how Defender draws its sprites and therefore what order the data had to appear in memory. Different sprites are different widths and heights, so the utility was interactive I could move though memory and adjust the x and y strides. Find all the sprite data and record the start address of each pattern.

- Find all the draw and erase functions. Defender did not use a double-buffered screen, rather it would erase an object from its old position just before drawing it at its new one. The draw and erase functions in 6809 are pretty neat as they use two stack pointers to efficiently move memory in blocks.

- Reference to the draw/erase functions should be in your (LO, HI) function pointer lists, so with that you should be able to find the sprite descriptors (data structures that describe a sprite, what it looks like, how wide and high it is, and where its draw/erase functions are).

- Find the object lists. Objects in Defender come in different sizes and are allocated and free using utlity functions. Want a new alien? Call the allocate function for a "type 1" object, and then fill in its fields. The object comes to life on the next iteration of the main loop.

- Find the hardware manipulations. These are reads/writes to hardware registers, which you find out by studying the Apple II reference manuals. Every time there's a register read, understand what that does. Every time there's a write, know what it does. It is changing a color register? Is it starting a timer? Is it clearing an interrupt? Is the read getting a joystick value? A key value? Find all these and try to understand their enclosing function.

- Find the interrupt handlers. If the Apple II uses interrupts then find the vector and start trying to figure out its big-picture. Interrupt functions end with RTI, but the Apple II ROM may take care of that and the interrupt might be invoked via a software vector. Go read that vector and see where it points to.


Being able to BREAK into an emulator makes a lot of this easier as you can see the state of the machine while the game is actually running. For Defender, the MAME emulator was invaluable in this initial study as the game uses ROM paging, and just figuring out which banks are the game proper vs. the menus and diagnostic pages wouldn't have been easy without prior knowledge.

The hardest thing with reverse-engineering is figuring out intent. Not _what_ something does, mechanically, but _why_. Why is the code comparing this to that? What's the significance of the literal value #$CE in this CMP? With an emulator you can change these values and see if you observe a difference. That's playing on hard mode, for sure... but you can do it.

I hope this gives you some ideas. Some people have a natural talent at reverse-engineering. I'm not one of them, and I'm sure I'll never make significant progress with Defender, but hey... everyone needs a hobby... even if it's only a handful of hours a year.

Good luck!


Top
 Profile  
Reply with quote  
PostPosted: Sat Mar 02, 2019 7:40 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Oh, that's a good point, sark02, the use of two tables for the high and low bytes of pointers (or other values) so the same index can be used for both bytes - that's a fairly common idiom, and not an obvious one.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 98 posts ]  Go to page Previous  1, 2, 3, 4, 5 ... 7  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 7 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: