6502.org

Posted: **Fri Mar 08, 2019 10:47 pm**

fschuhi wrote:

Thank you for putting in the time, that was a great intro into how to work with your tool. Very much appreciated.

I need to work more on, and document more about, WFDis. Any use of it automatically gives me a little kick to do so for its own purpose as well.

Quote:

The adhoc emulation with Shift-R is awesome. I didn't know that WFDis can do that. With regard to the task at hand, it was helpful to see how a subroutine address on the stack can be used to access data immediately after the JSR. That's an important idiom to know.

Yes. When it comes to either pre-run snapshots (static analysis) or post-run snapshots (emulation), it's important to understand there is a flow of behavior beyond just the byte dump you happen to be seeing at one time. There might have been intermediate states in the memory footprint before/after, and that requires knowing what the code is actually doing, and why. Just looking at 1 state of memory might not reveal everything.
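
As a sketch of the idiom from the quote (not the actual Robotron routine), here is a small Python model. It assumes a zero-terminated string and a flat 64 KiB memory, and the function names are hypothetical. JSR pushes the address of its own last byte. The called routine reads the inline bytes starting one past that address, then leaves the address of the terminator on the stack, so RTS (which adds 1) resumes just after the data.

```python
# Minimal model of the 6502 "inline data after JSR" idiom.
# Assumptions (hypothetical, not taken from Robotron): the text is
# zero-terminated and memory is a flat 64 KiB bytearray.

def jsr_return_address(jsr_addr: int) -> int:
    """JSR pushes the address of its own last byte: jsr_addr + 2."""
    return (jsr_addr + 2) & 0xFFFF


def rts_target(pulled: int) -> int:
    """RTS pulls an address and resumes at that address + 1."""
    return (pulled + 1) & 0xFFFF


def print_inline(mem: bytearray, jsr_addr: int) -> tuple[str, int]:
    """Emulate a routine that reads the bytes following its JSR.

    Returns the decoded text and the address where execution resumes
    after the routine's RTS."""
    ptr = jsr_return_address(jsr_addr)   # value the routine pulls off the stack
    chars = []
    while True:
        ptr = (ptr + 1) & 0xFFFF         # step to the next inline byte
        b = mem[ptr]
        if b == 0:                       # terminator: ptr now points at it
            break
        chars.append(chr(b & 0x7F))      # Apple II text usually has bit 7 set
    # The routine pushes ptr back; RTS then continues just past the terminator.
    return "".join(chars), rts_target(ptr)


mem = bytearray(0x10000)
mem[0x6000:0x6003] = bytes([0x20, 0x00, 0x70])        # JSR $7000
mem[0x6003:0x6007] = bytes([0xC8, 0xC9, 0xA1, 0x00])  # "HI!" with bit 7 set, then 0
text, resume = print_inline(mem, 0x6000)
print(text, hex(resume))  # HI! 0x6007
```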

Quote:

Prepending labels with L and S is sensible, so I have added this right away to my workbench. I would have expected that jumps were labeled with 'J', but you stick to 'L' here as well. Any particular reason why?

No particular reason. "L" was the default, then I added "S" (I used "Sub" originally, but it ended up being annoyingly long on earlier pre-HTML displays). I agree that separate prefixes for subroutines, branch/jump code, and data accesses would be reasonable.

Quote:

I had originally lowercased mnemonics and addresses (as is common for x86). Most of the snippets on this forum are uppercase, though. In WFDis you decided to lowercase everything. What is your reasoning behind deviating from the common "style guide"?

Old code written way back in the 1900s was often on platforms with only uppercase available, or only uppercase by default, so there was little choice, and it created an ad-hoc standard. Most modern code bases, I believe, use lowercase. However, when posting code snippets and mentioning code inline in text (like here, talking about LDA label,Y), uppercase visually helps to distinguish the code from normal labels & prose, as ye olde code did. Automated output sometimes keeps uppercase to distinguish it from human-written code.

Back in the day there were also studies of single-case text showing that lowercase is easier to read than uppercase, but lowercase-only was deemed too disrespectful for certain names and titles which should be Proper Cased, so they went with upper-only. I tend to find lowercase more readable as well when looking at full listings.

Quote:

I lack the skill of keeping the structure of even smaller stretches of code in mind.

In order to get any level of this type of work done, you need to be able to just look at a few instructions at a time to see what they do, independent of the code that surrounds it. Each piece performs some deterministic mechanical step, and doesn't necessarily require the full structure to understand. The small basic blocks that I split up in the video are only like 2-4 instructions long, and are pretty representative of what you need to determine the lowest granularity of documentable functionality. At that point, you should start looking at those small blocks as whole units that interact with each other, instead of always at the instruction level.

Quote:

For this reason I have put most of the work so far into automated structural analysis of the code. I do this in Python, because the emulator is written in Python, too. BTW Excel doesn't play a big role in the project, for me it's just a convenient notebook with additional intelligence and easy interface to the emulator and tracer.

I so far find structural analysis to be a bit of a red herring with 8-bit code. There is no ABI or anything that code needs to follow, except maybe calls to ROM routines. The control flow, especially in games, is convoluted for speed, not clarity, and most code takes shortcuts to make it easier to write as well. Timer- or video-based interrupts can do weird mutations to code and data, because they need to be fast, and obviously don't fit in clean control flow graphs. Since there are so few registers, lots of transient state is passed around in memory, reusing memory locations for multiple purposes at multiple times.

So it doesn't make a lot of sense to me to try to map all of this hand-wrangled bit banging into some clean single model of execution. But that also depends on what you mean by "structural analysis".

Quote:

I think using a manual disassembler is compatible with my reverse engineering approach. I lean on the ideas in Don Lancaster's "Tearing Into Machine-Language Code". He advises against using tools to disentangle the code and rather advocates "Do the dull stuff yourself!". But his method also assumes that those who want to reverse engineer should be versed in 6502. A bit more automated structure discovery is necessary to help my learning.

As I mentioned above, I don't believe that the code/data separation is a significant portion of the work. Static tracing or emulation traces really help with those steps, though neither is without faults (both can miss code areas, for instance). Still, it's relatively easy to deduce which portions of the data are code:

"Holes" of uncalled data surrounded by large code areas are often also code.
A9 is "LDA #xx" which very commonly starts code paths.
In my video example where one of the pointers went to an area with something like "00 00 10 20 30 ff ff ff" I assumed that it was not code, because it appears more structured like bitwise data.
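
A rough Python sketch of the "hole" heuristics listed above. It assumes `executed` holds every byte address covered by a traced instruction (opcode and operand bytes); the 32-byte limit for a "small" hole is an arbitrary, hypothetical threshold.

```python
MAX_HOLE = 32  # hypothetical threshold for a "small" hole


def find_holes(executed, start, end):
    """Yield (first, last) address ranges in [start, end) that were never executed."""
    addr = start
    while addr < end:
        if addr in executed:
            addr += 1
            continue
        first = addr
        while addr < end and addr not in executed:
            addr += 1
        yield first, addr - 1


def classify_hole(mem, executed, first, last):
    """Apply the rules of thumb: starts with LDA #, or a small hole flanked by code."""
    if mem[first] == 0xA9:
        return "likely code (starts with LDA #)"
    flanked = (first - 1) in executed and (last + 1) in executed
    if flanked and last - first + 1 <= MAX_HOLE:
        return "likely code (small hole inside code)"
    return "unknown, inspect by hand"


mem = bytearray(0x30)
mem[0x20] = 0xA9                       # LDA #imm at the start of the second hole
executed = set(range(0x00, 0x10)) | set(range(0x14, 0x20))
for first, last in find_holes(executed, 0x00, 0x30):
    print(f"${first:04X}-${last:04X}: {classify_hole(mem, executed, first, last)}")
```

Anything left as "unknown" still needs a human look.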

Static tracing from some starting point (or an emulation trace) saves a ton of tedium, but the majority of time is still spent trying to glean understanding from the code, not on the actual code/data separation.

What I find the most useful by far is giving names to things, especially variables and subroutines. Even when they're just guesstimates, once you name something and look at its various uses, you can piece together a picture of what it's for, in a somewhat bottom-up fashion; but you need to focus on the things that will reveal the most. There are a few good starting points that can anchor some of this understanding:

Accesses to well-known I/O addresses (keyboard/joystick inputs, video registers)
Accesses to screen memory
Functions that are called from many places (usually indicates main loop or small utility functions)
Writes to known system or software vectors (which will generally point to code)
Calls into ROM
Initialization code can reveal the overall memory layout

These can help to name some of the primitives that the program is constructed from, which can start to make the overall structure more visible/readable.
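
A minimal Python sketch of harvesting some of these anchors from a disassembly. It assumes a hypothetical list of (pc, mnemonic, operand) tuples with absolute operands already resolved. The I/O names are the usual Apple II soft switch labels; treating every JSR target at $D000 and above as a ROM call assumes the ROM (not the language card) is banked in.

```python
from collections import Counter, defaultdict

# A few well-known Apple II I/O locations (soft switches).
APPLE2_IO = {
    0xC000: "KBD", 0xC010: "KBDSTRB", 0xC030: "SPKR",
    0xC050: "TXTCLR", 0xC051: "TXTSET", 0xC052: "MIXCLR", 0xC053: "MIXSET",
    0xC054: "LOWSCR", 0xC055: "HISCR", 0xC056: "LORES", 0xC057: "HIRES",
    0xC061: "BUTN0", 0xC062: "BUTN1", 0xC064: "PADDL0", 0xC070: "PTRIG",
}


def anchors(listing):
    """Collect I/O references, call-site counts and ROM calls."""
    io_refs = defaultdict(list)     # I/O label -> pcs touching it
    callers = Counter()             # JSR target -> number of call sites
    rom_calls = set()
    for pc, mnem, operand in listing:
        if operand is None:
            continue
        if operand in APPLE2_IO:
            io_refs[APPLE2_IO[operand]].append(pc)
        if mnem == "JSR":
            callers[operand] += 1
            if operand >= 0xD000:   # assumes ROM is banked in
                rom_calls.add(operand)
    return io_refs, callers, rom_calls


listing = [
    (0x6000, "LDA", 0xC000),   # read keyboard
    (0x6003, "BPL", 0x6000),
    (0x6005, "STA", 0xC010),   # clear keyboard strobe
    (0x6008, "JSR", 0x7000),
    (0x600B, "JSR", 0x7000),
    (0x600E, "JSR", 0xFDED),   # monitor COUT
]
io_refs, callers, rom_calls = anchors(listing)
print(dict(io_refs), callers.most_common(3), sorted(rom_calls))
```

Subroutines with the highest call counts are good candidates for the first names.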

Quote:

I'm going to continue working on the analysis over the weekend, let's see if I am able to share interim results. I would certainly consider transferring at least some of the automatically generated info to the listing in WFDis (e.g. caller/callee), so it would be nice if your next version re-enables inline comments. But that's just a nice to have, the tool is certainly powerful as it is. Thanks again for the help!

It can import label files, but not currently comments. The internal rewrite I'm still working on shouldn't change the current appearance or functionality much, but is necessary for future expansion and things like standalone and multi-line comments (right now all comments are tied to a single line of code as well). Cross-reference data is easily stored, but still remains on the TODO list for exposing graphically.

Posted: **Fri Mar 08, 2019 11:09 pm**

White Flame wrote:

fschuhi wrote:

I had originally lowercased mnemonics and addresses (as is common for x86). Most of the snippets on this forum are uppercase, though. In WFDis you decided to lowercase everything. What is your reasoning behind deviating from the common "style guide"?

Old code written way back in the 1900s was often on platforms with only uppercase available, or only uppercase by default, so there was little choice, and it created an ad-hoc standard. Most modern code bases, I believe, use lowercase. However, when posting code snippets and mentioning code inline in text (like here, talking about LDA label,Y), uppercase visually helps to distinguish the code from normal labels & prose, as ye olde code did. Automated output sometimes keeps uppercase to distinguish it from human-written code.

Back in the day there were also studies of single-case text showing that lowercase is easier to read than uppercase, but lowercase-only was deemed too disrespectful for certain names and titles which should be Proper Cased, so they went with upper-only. I tend to find lowercase more readable as well when looking at full listings.

Slightly O.T.; but since it came up, I'll add this for those who don't have habits firmly formed yet.

In prose, the lines on the page exist because a book is more manageable than a mile-long ribbon with a lump here and there for pictures. In programming however, the separation into lines becomes very significant for visual factoring, and having ascenders and descenders (as lower-case does) blurs those divisions. Similarly, numerals 0-9 are always "capitals." Do not mix lower-case a-f into them in hEx nUmBErS! Write for example $3EA9, not $3ea9. Here on the forum where the non-code sections are in proportional spacing, when someone writes for example 3fff, in whatever font is used here, it initially looks like 3111 to me because the f's are so narrow. (Actually now in the preview, they're showing up even narrower than 1's, which is backwards.) It's no harder to type capitals with the caps lock on. (I do take it off for comments.) If you feel like the computer is yelling at you, then turn the font size down. Easy enough. People who look through my code have said they like my style.

Posted: **Fri Mar 08, 2019 11:21 pm**

GARTHWILSON wrote:

Here on the forum where the non-code sections are in proportional spacing, when someone writes for example 3fff, in whatever font is used here, it initially looks like 3111 to me because the f's are so narrow. (Actually now in the preview, they're showing up even narrower than 1's, which is backwards.)

In most proportional fonts, the digits are still uniform width, so a 1's character cell is still as wide as a 0's. Plus, kerning can mush 'f's even closer together. However, a lot of these textual display issues are why I chose HTML output to make it render very cleanly and legibly compared to most bitmap font environments that other disassemblers stick with, even if it means a quite large memory footprint.

(Also, I do know people have complained about the colors before, which were copied over from a black-background prior version. Those colors will be darkened in the future as well, for better contrast in the white-background version.)

Posted: **Sat Mar 09, 2019 12:25 am**

I actually spent some time last week making a proportional bitmap font - in which the uppercase A-F are purposely the same, constant width as the digits 0-9 (and thus slightly wider than most other capitals).

Posted: **Sun Mar 10, 2019 12:31 am**

GARTHWILSON wrote:

Do not mix lower-case a-f into them in hEx nUmBErS! Write for example $3EA9, not $3ea9.

To provide an opposing view: I prefer lower case hex over upper case, as it's easier for me to differentiate e and f (vs. E and F).

There's no wrong answer, just personal preference.

I learned all my 8-bit assembly (Z80, 6502, 6809) exclusively upper-case, but everything since has been lower-case.

Posted: **Sun Mar 10, 2019 12:39 am**

sark02 wrote:

I prefer lower case hex over upper case, as it's easier for me to differentiate e and f (vs. E and F).

Then how about a versus e, as in $3EA9 and $3ea9? I think personal preference is your key.

Posted: **Sun Mar 10, 2019 4:05 am**

GARTHWILSON wrote:

sark02 wrote:

I prefer lower case hex over upper case, as it's easier for me to differentiate e and f (vs. E and F).

Then how about a versus e, as in $3EA9 and $3ea9? I think personal preference is your key.

Those are all equally readable to me, as long as they're in a fixed-width font. I don't try to impose my preferences on others, and kind of "go with the flow". If I'm replying to a post with suggestions of my own, I usually make a cursory effort to match the style of the original source. If I'm coding for myself, I almost always use lower case if it's available, and I have recently switched exclusively to spaces instead of tabs for my own creations, but that's just how I roll.

Posted: **Sun Mar 10, 2019 11:57 am**

White Flame wrote:

Quote:

I lack the skill of keeping the structure of even smaller stretches of code in mind.

In order to get any level of this type of work done, you need to be able to just look at a few instructions at a time to see what they do, independent of the code that surrounds it. Each piece performs some deterministic mechanical step, and doesn't necessarily require the full structure to understand. The small basic blocks that I split up in the video are only like 2-4 instructions long, and are pretty representative of what you need to determine the lowest granularity of documentable functionality. At that point, you should start looking at those small blocks as whole units that interact with each other, instead of always at the instruction level.

This seems to be a common theme: think in idioms or building blocks which interact with each other. 6502 with its small number of statements takes up much vertical space in the disassembler listing. Scrolling takes time and destroys the immediate context. In order to trace what's happening mentally, one must therefore be able to see those small repeating patterns standing out.

My statement was meant a little differently, though. I still focus on just the first seconds of the intro. That's just 500 instructions which are of course not consecutive in memory but organised in "stretches" of different length in various memory locations. For the purpose of reverse engineering, even the best tools cannot replace the ability to combine those stretches into some form of spatio-temporal mental model of the code.

Quote:

So it doesn't make a lot of sense to me to try to map all of this hand-wrangled bit banging into some clean single model of execution. But that also depends on what you mean by "structural analysis".

"Structural analysis" is the part of the static analysis of the code that discovers how parts of the code hang together, at different levels of granularity. Its success depends a bit on how rationally the code is structured.

Most of the machine code reverse engineering nowadays is probably happening in the anti-virus space. I suspect that at least some of the virus code out there is heavily obfuscated, on top of the usual bit banging which is part of the territory when dealing with machine language. Still, specialists regularly break into that virus code with a number of tools, combining static and dynamic analysis.

It should be possible to attack an 8-bit game like Robotron in a comparable fashion. Most of them were handcoded by single developers who needed to keep a mental model of the whole thing in their memory. In this case that was originally Eugene Jarvis, a developer known for rational internal design of his games. Comparing the Williams Arcade and the Atari versions, they are different but seem to implement a comparable execution flow. I suspect that Steve Hays, who ported the game to the Apple II, used at least some of the original ideas. It should therefore be possible to derive quite a bit of what I call "structure" automatically.

Only a bit of the work can be automated, of course, but some of it can. If I've understood you correctly, that's also what's happening on the server version of WFDis: some automated discovery which complements the always-needed interactive work with the code.

Quote:

a few good starting points to look at that can anchor some of this understanding:

My analysis is helped by the fact that the Apple II is a very simple system (e.g. no interrupts). The non-intuitive parts (especially hires graphics) are well documented, so it's easy to know which parts of the code are responsible for graphics output. The softswitches for keyboard access and game controller input are another easy part of the Apple II; I've already added the relevant softswitch addresses as labels. And Robotron doesn't reload anything from disk, so it's benign from that point of view, too.

All the automated structure discovery is necessary but far from sufficient for understanding what the code does and, especially, why. I really like the 6502, though, so I will eventually manage to peek behind one curtain or another.

Posted: **Sun Mar 10, 2019 12:12 pm**

I knew before that the people in this forum are both knowledgeable and helpful, but experiencing it first-hand is something different.

Many thanks to all of you who have contributed to this topic so far. I feel I have made so much more progress than if I had fiddled with the task on my own. Your posts cover a lot of breadth and depth; they are a great resource.

Congrats on having built such a community!

Posted: **Sun Mar 10, 2019 5:41 pm**

Very nice to hear those thoughts Frank, thanks for expressing them! You're on a great journey, please keep up with questions and observations and progress reports.

(BTW I think perhaps the 'basic block' might be found in any code, which is to say straight-line sections in between entry points, branches, and returns. Backward branches are always good to see because they are often loops, and studying the setup and loop body can be quite illuminating.)
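
Spotting backward branches only needs the relative-offset arithmetic: a signed byte added to the address of the instruction following the 2-byte branch. A small Python sketch, assuming a hypothetical decoded form of (pc, opcode, operand bytes):

```python
BRANCH_OPCODES = {0x10, 0x30, 0x50, 0x70, 0x90, 0xB0, 0xD0, 0xF0}  # BPL..BEQ


def branch_target(pc: int, offset_byte: int) -> int:
    """Target of a 2-byte relative branch at pc."""
    offset = offset_byte - 0x100 if offset_byte >= 0x80 else offset_byte
    return (pc + 2 + offset) & 0xFFFF


def backward_branches(decoded):
    """decoded: iterable of (pc, opcode, operand_bytes).
    Returns (target, branch_pc) pairs; the loop body spans target..branch_pc."""
    loops = []
    for pc, opcode, operand in decoded:
        if opcode in BRANCH_OPCODES:
            target = branch_target(pc, operand[0])
            if target <= pc:
                loops.append((target, pc))
    return loops


decoded = [
    (0x1000, 0xA2, b"\x08"),  # LDX #$08
    (0x1002, 0xCA, b""),      # DEX
    (0x1003, 0xD0, b"\xFD"),  # BNE $1002
    (0x1005, 0x60, b""),      # RTS
]
print([(hex(t), hex(b)) for t, b in backward_branches(decoded)])
```

The setup for such a loop is usually in the few instructions right before the target address.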

Posted: **Mon Mar 11, 2019 10:17 am**

Attached an Excel workbook with a disassembly listing ("asm") built from tracing the execution of a couple of seconds of the Robotron intro. The emulator tracks the leaps from and to instructions, then runs an algorithm to structure the code logically. I've implemented a number of suggestions I got from above posts, thanks for the inspiration!

As a complement to "asm" I've added "map" (condensed picture of the memory of the code) and "ann" (annotations to addresses). For reference purposes I have added the complete listing on "raw" (generated by a static disassembler). Not included in the workbook are the Excel macros which drive the workbench, so it's just a snapshot.

Some explanations to make reading the listing on sheet "asm" easier:

The addresses shown are consecutive in memory. This leaves big swaths of .byte data. Some of the data can be identified as ASCII; these are further screens of the intro which have not been touched in this particular run. I intend to approach those further parts of the code piecemeal, always trying to identify some structures before moving on to discover new stretches of code.
Most of the labels are auto-generated (S for JSR entry points, L for branches, J for JMP targets).
There are some manual labels which are supplied to the emulator before it starts, e.g. chromatix01 (multiplication routine as pointed out by chromatix) or copyPagesAtoY (explained in White Flame's video).
Some special labels show access to Apple II softswitches and ROM routines.
The comments contain a lot of '>' and '<' info. These are the addresses this particular instruction leaps to (>), or where other instructions leap from into this instruction (<).
Branch instructions can always branch (+), never branch (-) or sometimes branch (0). An always-taken branch can create holes behind the branch instruction if execution never gets there by other means.
Blank lines are added automatically, at "sensible" points in the listing (as determined by the "tiles" of code, explained below).
"touch" shows the number of times an instruction was executed. This helps me to identify the small-scale structure because loops stand out with higher touch counts. It also makes it easy to identify initialisation code, which is executed only once.
"first" and "last" are cycle counts. The emulator does not store every step of the execution path (which would be excessive) but rather builds a static map from its action.

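The per-instruction columns above can be kept as a small static record per address instead of a full execution log. A hedged Python sketch, assuming a hypothetical emulator hook that reports the PC, the current cycle count and, for branches, whether the branch was taken:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstrStats:
    touch: int = 0                 # times the instruction was executed
    first: Optional[int] = None    # cycle count at first execution
    last: Optional[int] = None     # cycle count at last execution
    taken: int = 0                 # branch taken
    not_taken: int = 0             # branch fell through

    def record(self, cycle: int, branched: Optional[bool] = None) -> None:
        self.touch += 1
        if self.first is None:
            self.first = cycle
        self.last = cycle
        if branched is True:
            self.taken += 1
        elif branched is False:
            self.not_taken += 1

    def marker(self) -> str:
        """'+' always branched, '-' never branched, '0' sometimes."""
        if self.taken + self.not_taken == 0:
            return ""              # not a branch, or never reached
        if self.not_taken == 0:
            return "+"
        if self.taken == 0:
            return "-"
        return "0"


stats = defaultdict(InstrStats)    # pc -> InstrStats
# e.g. called from the emulator's step loop (hypothetical hook):
stats[0x1003].record(cycle=120, branched=True)
stats[0x1003].record(cycle=131, branched=False)
print(stats[0x1003].touch, stats[0x1003].marker())
```
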
Apart from the labels and leap targets in the comments, the main work in terms of automatically structuring the code is done with combining vertical parts of the code into bigger logical entities:

Instructions are combined to "tiles". Conceptually, the listing is cut immediately behind a leap (JSR, RTS, JMP, Bxx) and immediately before the address where leaps enter. As a result, all small loops have their body and the Bxx on the same tile. Because tiles break the stream of instructions at points where the code leaps in or out, a tile is a local path of sequential execution. Tiles are indicated by a dotted gray line above their first instruction.
Tiles can be combined into "stretches". For example, a tile with a JSR at the end can be connected to the following tile if the latter's only entry point is the return from the matching RTS. The execution path also crosses a tile boundary when a branch is not taken, so those tiles can be connected, too. A handful of rules guide how tiles are connected to grow stretches.
Some stretches are "compact": they start with a tile that is only leaped to by one or more JSRs, contain no jumps, and only JSR to other stretches which return normally (i.e. normal stack behaviour). Such a compact stretch is marked with a solid red line in the listing, other stretches with a dotted red line.
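
The tile rule can be sketched in a few lines of Python. This is one reading of the description above, not the workbench code. It assumes a list of executed (pc, length) pairs sorted by address, plus the sets of leap sources (JSR, RTS, JMP, Bxx) and leap targets taken from the trace; cutting at holes between non-adjacent executed instructions is an added assumption.

```python
def split_into_tiles(instrs, leap_sources, leap_targets):
    """Split executed instructions into tiles.

    instrs: list of (pc, length) for executed instructions, sorted by pc.
    leap_sources: pcs of executed JSR/RTS/JMP/Bxx instructions.
    leap_targets: pcs entered by a leap rather than by falling through.
    Returns a list of tiles, each a list of pcs.
    """
    tiles, current, prev_end = [], [], None
    for pc, length in instrs:
        # Cut before an entry point, or at a gap in the executed code.
        if current and (pc in leap_targets or pc != prev_end):
            tiles.append(current)
            current = []
        current.append(pc)
        prev_end = pc + length
        if pc in leap_sources:   # cut right after a leap
            tiles.append(current)
            current = []
    if current:
        tiles.append(current)
    return tiles


# LDX #8 / loop: DEX / BNE loop / JSR sub / RTS ($1008 is where sub's RTS lands)
instrs = [(0x1000, 2), (0x1002, 1), (0x1003, 2), (0x1005, 3), (0x1008, 1)]
tiles = split_into_tiles(instrs,
                         leap_sources={0x1003, 0x1005, 0x1008},
                         leap_targets={0x1002, 0x1008})
print([[hex(pc) for pc in t] for t in tiles])
```

The DEX/BNE loop body ends up on one tile, as described above.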

Compactness is currently defined in a way that it is possible for a stretch to be compact even if it JSRs to a non-compact stretch. Still, compact stretches are very good candidates for regular subroutines. I am well aware that my definition of compactness has a serious flaw: A stretch can be marked as compact even if there are other stretches which leap into other tiles than the head of the stretch, or if grandchildren in the call order are non-compact. I will add this deeper stretch info as soon as I have an idea how to present it on the Excel workbench.

I currently need a lot of time to understand even superficially what a compact stretch does, mechanically. In order to speed up the understanding from a bird's eye perspective, though, I can of course just switch off the execution of a stretch by poking an RTS at the entry point. Repeatedly re-running the software in the emulator then shows which stretch does what. It took just a couple of minutes to identify the routines which print the stats (score and lives) on the right side of the screen, even though I don't know how the 6502 code achieves that.
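
A minimal sketch of the RTS trick in Python, assuming the emulator's memory is a plain bytearray (hypothetical). Restoring the original byte in a `finally` keeps experiments from leaking into each other. The trick only makes sense for stretches entered via JSR, which is why compact stretches are natural candidates; a caller that relies on registers or flags returned by the stretch may still misbehave.

```python
from contextlib import contextmanager

RTS = 0x60  # 6502 opcode for RTS


@contextmanager
def stubbed(mem: bytearray, entry: int):
    """Temporarily replace the first byte at `entry` with RTS."""
    original = mem[entry]
    mem[entry] = RTS
    try:
        yield
    finally:
        mem[entry] = original   # always undo the patch


mem = bytearray(0x10000)
mem[0x6000] = 0x20              # pretend a stretch starts with a JSR here
with stubbed(mem, 0x6000):
    patched = mem[0x6000]
    # ... run the emulator for a while and watch the screen (hypothetical call) ...
restored = mem[0x6000]
print(hex(patched), hex(restored))
```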

In this way I am able to make progress and continuously engage with the 6502 machine language by looking at the stretches and the smaller tiles. As long as there is no way to add comments or blank lines manually, the tool is not useful for that kind of reverse engineering anyways. I haven't made up my mind yet how to implement it because I want to focus on the discovery of broader structures anyways, at least for now.

Visually, there is still a bit too much happening in the listing now. I will probably remove the tile separators, because the blank lines already do the trick. A lot of the leap to/from info in the comments will turn out to be not helpful for understanding the code. That's fine, I can always reduce the information when I start to "see" the branch patterns and idioms.

The list of stretches can be shown as a call graph (generated automatically):

Remarks:

The address or label shown is the address of the first instruction in the first tile of the stretch.
Compact stretches are shown as boxes, non-compact ones as ellipses. Some of the stretches float around disconnected. That's a nice TODO list for further refactoring, because it shows where the algorithm is not yet able to connect enough tiles to form meaningful stretches.
The arrow is not the call itself but rather pointing from one stretch to another one. Any JSR from inside any of the tiles of a stretch will be shown as just an outbound arrow and any leaps into the body of a stretch would show as an inbound arrow.
Consequently, the call graph is a condensed picture of the JSR action happening between the stretches.
The call graph does not show JMP yet.
I detect RTS instructions which are not balanced by a JSR. The RTS at $51CA is such an "RTS*", because we return to an address after the stash which contains the text to be printed, as explained above for atariPresents and doneAtari. The instruction is marked as RTS* on the "map" sheet, but I don't do anything with it yet.
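
A common way to detect these RTS* cases is a shadow stack: remember where each JSR should come back to, and flag any RTS that lands somewhere else. A Python sketch, assuming a hypothetical trace of (kind, pc, destination) events; this is a standard binary-analysis technique, not necessarily how the workbench does it. The RTS-as-computed-jump trick (push an address minus one, then RTS) gets flagged the same way.

```python
def find_unbalanced_rts(events):
    """Return pcs of RTS instructions that did not return to the
    instruction after the most recent outstanding JSR (the RTS* cases).

    events: sequence of (kind, pc, dest), kind being "JSR" or "RTS" and
    dest the address where execution continued."""
    shadow = []                               # expected return addresses
    starred = set()
    for kind, pc, dest in events:
        if kind == "JSR":
            shadow.append((pc + 3) & 0xFFFF)  # JSR is 3 bytes long
        elif kind == "RTS":
            expected = shadow.pop() if shadow else None
            if dest != expected:
                starred.add(pc)
    return starred


events = [
    ("JSR", 0x6000, 0x7000),  # call a text printer; 5 bytes of inline text follow
    ("RTS", 0x7010, 0x6008),  # returns past the text, not to $6003
    ("JSR", 0x6008, 0x7100),
    ("RTS", 0x7105, 0x600B),  # normal return
]
print(sorted(hex(pc) for pc in find_unbalanced_rts(events)))
```

Code that discards a return address with PLA/PLA would need extra resynchronisation logic on top of this.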

With the total freedom one has over the stack and execution flow in general, the automated analysis will always fail in certain spots. Currently something like atariPresents is a degenerate stretch: just one tile with one JSR to showText, because the text to show is in the stash below the JSR. Knowing that this is a well-established idiom, the algorithm should be able to build a bridge over atariPresents / stash / doneAtari. This would

reduce the visual cluttering by eliminating the red dotted lines,
make the listing more organised with more solid red lines, and
help to cut down the call tree.

Furthermore, sometimes the algorithm tries to be too intelligent. For example, one part of the stretch detection algorithm connects tiles ending in an always-branch (+) with the following tile. The assumption behind this is that the branch instruction will fall through eventually. This is usually true, of course, but an SEC/BCS combination (twice in the code) can never fall through. This shows how progress can be made with the emulator:

identify a weird situation in the listing or call tree
help the algorithm manually to do the right thing
check the newly grouped listing and the call tree
implement the rule as part of the automated rule set

For other occurrences, the algorithm then already knows how to interpret the situation correctly, thus showing correct stretches and making it easier for me to understand what's happening.
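
The SEC/BCS rule generalises to a small peephole check: CLC/BCC and CLV/BVC behave the same way, and a load immediate fixes Z and N. A Python sketch with hypothetical names; it only looks at the single instruction before the branch and gives up when the branch is itself a leap target, because another path may arrive with different flags.

```python
# Flag tested by each branch, and the flag value that makes it branch.
BRANCH_TESTS = {"BCS": ("C", True), "BCC": ("C", False),
                "BVS": ("V", True), "BVC": ("V", False),
                "BEQ": ("Z", True), "BNE": ("Z", False),
                "BMI": ("N", True), "BPL": ("N", False)}


def branch_outcome(prev, branch, branch_is_entry=False):
    """prev: (mnemonic, immediate operand or None) of the instruction just
    before the branch. Returns True (always taken), False (never taken),
    or None (depends on run-time data)."""
    if branch_is_entry:
        return None              # another path may arrive with other flags
    mnem, imm = prev
    known = {}
    if mnem == "SEC":
        known["C"] = True
    elif mnem == "CLC":
        known["C"] = False
    elif mnem == "CLV":
        known["V"] = False
    elif mnem in ("LDA", "LDX", "LDY") and imm is not None:
        known["Z"] = imm == 0    # load immediate sets Z and N from the value
        known["N"] = imm >= 0x80
    flag, wanted = BRANCH_TESTS[branch]
    if flag not in known:
        return None
    return known[flag] == wanted


for prev, br in [(("SEC", None), "BCS"), (("LDA", 0x00), "BNE"), (("INX", None), "BNE")]:
    print(prev[0], br, branch_outcome(prev, br))
```

A True result means the tile after the branch should not be glued on as a fall-through successor.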

The algorithm will change anyway as soon as I introduce post-execution disassembling, i.e. deviating from the rule "only use executed instructions". This will surface lone RTSs like the one at $4EA6 and fill in missing parts of loop structures (e.g. the INC at $4C65, as pointed out by chromatix). There are also a number of Bxx marked '-' which never branched in this run but obviously do branch at least sometimes. It is not possible to get complete coverage without recursively traversing the code. That's a bigger change in the algorithm, though; I will have to plan a bit before changing the current one.
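
Recursive traversal is usually done with a worklist: start at known entry points, decode, and queue every successor (both sides of a branch, JMP/JSR targets, the instruction after a JSR) until nothing new turns up. A Python sketch under stated assumptions: the opcode table is a tiny hypothetical subset, JSR is assumed to return normally (which the inline-text idiom violates), and JMP (ind) is not followed.

```python
# Tiny illustrative opcode subset: mnemonic and instruction length.
# A real tool needs all 151 documented opcodes.
OPCODES = {
    0xA2: ("LDX", 2), 0xCA: ("DEX", 1), 0xF0: ("BEQ", 2), 0xD0: ("BNE", 2),
    0xEE: ("INC", 3), 0x4C: ("JMP", 3), 0x20: ("JSR", 3), 0x60: ("RTS", 1),
}
BRANCHES = {"BEQ", "BNE"}


def rd(mem, addr):
    return mem[addr & 0xFFFF]


def traverse(mem, entry_points):
    """Recursive-descent disassembly using an explicit worklist.
    Returns (code, bad): pc -> mnemonic for every reachable instruction,
    and the set of pcs where an unknown opcode was reached."""
    code, bad = {}, set()
    work = list(entry_points)
    while work:
        pc = work.pop()
        if pc in code or pc in bad:
            continue
        op = rd(mem, pc)
        if op not in OPCODES:
            bad.add(pc)
            continue
        mnem, length = OPCODES[op]
        code[pc] = mnem
        nxt = (pc + length) & 0xFFFF
        if mnem in BRANCHES:
            off = rd(mem, pc + 1)
            off = off - 0x100 if off >= 0x80 else off
            work += [nxt, (nxt + off) & 0xFFFF]   # both outcomes, taken or not
        elif mnem in ("JMP", "JSR"):
            work.append(rd(mem, pc + 1) | (rd(mem, pc + 2) << 8))
            if mnem == "JSR":
                work.append(nxt)   # assumes a normal return, not inline data
        elif mnem != "RTS":
            work.append(nxt)
    return code, bad


mem = bytearray(0x10000)
mem[0x1000:0x100C] = bytes([
    0xA2, 0x03,        # $1000 LDX #$03
    0xCA,              # $1002 DEX
    0xF0, 0x06,        # $1003 BEQ $100B
    0xEE, 0x00, 0x20,  # $1005 INC $2000
    0x4C, 0x02, 0x10,  # $1008 JMP $1002
    0x60,              # $100B RTS
])
code, bad = traverse(mem, [0x1000])
print({hex(pc): m for pc, m in sorted(code.items())}, bad)
```

Both sides of every branch are visited, so an instruction like the INC is found even if a particular trace never executed it.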

A huge help for understanding the 6502 details will be fine-grained LD/ST tracking. The issue is less how to do it in the emulator than how to constrain it. One reason is that Python is simply too slow for this kind of on-the-fly data collection. But the main reason is that the amount of data generated by recording all LD/ST is massive, which would actually hurt understanding. There are certainly many design choices for approaching this conundrum and finding a good balance. Any ideas are welcome.
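
One way to constrain the data, sketched in Python under assumptions (a hypothetical emulator hook called on every memory read and write): only watch chosen address ranges, and aggregate instead of logging, so the output is one entry per (instruction, address) pair regardless of how long the emulation runs. Zero page is often a natural first range. The range check still runs on every access, so this reduces data volume more than it reduces Python overhead.

```python
from collections import Counter


class AccessTracker:
    """Aggregate loads/stores for watched address ranges only.

    Instead of logging every access, keep one counter per
    (instruction pc, data address, 'R' or 'W') triple."""

    def __init__(self, watch_ranges):
        # Inclusive (lo, hi) pairs, e.g. [(0x00, 0xFF)] for zero page.
        self.watch = [range(lo, hi + 1) for lo, hi in watch_ranges]
        self.counts = Counter()

    def on_access(self, pc, addr, is_write):
        """Call from the emulator's memory read/write path (hypothetical hook)."""
        if any(addr in r for r in self.watch):
            self.counts[(pc, addr, "W" if is_write else "R")] += 1

    def writers_of(self, addr):
        return sorted({pc for (pc, a, kind) in self.counts if a == addr and kind == "W"})

    def readers_of(self, addr):
        return sorted({pc for (pc, a, kind) in self.counts if a == addr and kind == "R"})


tracker = AccessTracker([(0x00, 0xFF)])       # watch zero page only
tracker.on_access(0x6000, 0x10, is_write=False)
tracker.on_access(0x6000, 0x10, is_write=False)
tracker.on_access(0x6005, 0x10, is_write=True)
tracker.on_access(0x6008, 0x4000, is_write=True)  # outside the watch list, ignored
print(tracker.writers_of(0x10), tracker.readers_of(0x10), len(tracker.counts))
```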

Posted: **Mon Mar 11, 2019 10:45 am**

If there's an easy way for you to print to PDF and attach that, it would be helpful!

fschuhi wrote:

In order to speed up the understanding from a bird's eye perspective, though, I can of course just switch off the execution of a stretch by poking an RTS at the entry point. Repeatedly re-running the software in the emulator then shows which stretch does what. It took just a couple of minutes to identify the routines which print the stats (score and lives) on the right side of the screen, even though I don't know how the 6502 code achieves that.

That's a nice idea! The opposite of instrumentation, removing code.

Quote:

For example, a part of the stretch detection algorithm is to connect tiles ending in an always-branch (+) with the following tile. The assumption behind this is that the branch instruction would fall through eventually. This is usually true, of course, but an SEC/BCS combination (twice in the code) does really never fall through by definition.

Might be worth pointing out that skilled 6502 programmers will often be able to make an always-branch by using higher-order understanding of the preceding code. For example, knowing that something is never zero, or never negative, or that carry will always (or will never) be set by some preceding operation.

It's probably clear to you by now, but one aspect of the 6502 flags is that they are not updated by all instructions. Therefore, a branch being taken or not taken can be a consequence of something which happened several instructions earlier, possibly prior to a branch or even a call. So, there are four bits of state, in effect, which kind of act like short-lived variables. The Z flag is relatively often affected, the V flag relatively rarely. When the flag is affected prior to an RTS and then affects control flow in the parent routine, it's being used as a return code - but that's just one example. If you ever see PHP and PLP then of course that makes the flag values persist for much longer - perhaps around a complex section of code, or even around some subroutine calls.
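
Because flags persist until something changes them, a useful helper walks backwards from a branch to the instruction that last set the tested flag. A Python sketch with hypothetical names, using the documented NMOS flag effects (the I and D flags are ignored, since no branch tests them); stopping at a JSR reflects the return-code case.

```python
# Flags that each mnemonic can change (documented NMOS 6502 behaviour).
NZ_SETTERS = {"LDA", "LDX", "LDY", "TAX", "TAY", "TXA", "TYA", "TSX", "PLA",
              "INC", "DEC", "INX", "INY", "DEX", "DEY", "AND", "ORA", "EOR",
              "ADC", "SBC", "CMP", "CPX", "CPY", "ASL", "LSR", "ROL", "ROR", "BIT"}
C_SETTERS = {"ADC", "SBC", "CMP", "CPX", "CPY", "ASL", "LSR", "ROL", "ROR",
             "SEC", "CLC"}
V_SETTERS = {"ADC", "SBC", "BIT", "CLV"}
RESTORE_ALL = {"PLP", "RTI"}    # reload every flag from the stack

SETTERS = {"N": NZ_SETTERS, "Z": NZ_SETTERS, "C": C_SETTERS, "V": V_SETTERS}
TESTED_FLAG = {"BPL": "N", "BMI": "N", "BVC": "V", "BVS": "V",
               "BCC": "C", "BCS": "C", "BNE": "Z", "BEQ": "Z"}


def flag_source(block, branch_index):
    """Find the instruction that last set the flag tested by a branch.

    block: list of (pc, mnemonic) for one straight-line stretch of code.
    Returns (pc, mnemonic) of the setter, a JSR whose subroutine leaves the
    flag behind as a return code, or None if the flag comes from before the
    block (another path, or the caller)."""
    flag = TESTED_FLAG[block[branch_index][1]]
    for pc, mnem in reversed(block[:branch_index]):
        if mnem in SETTERS[flag] or mnem in RESTORE_ALL or mnem == "JSR":
            return pc, mnem
    return None


block = [(0x1000, "LDA"), (0x1002, "CLC"), (0x1003, "STA"), (0x1006, "BNE")]
print(flag_source(block, 3))                               # Z comes from the LDA
print(flag_source([(0x2000, "JSR"), (0x2003, "BCS")], 1))  # carry as a return code
```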

Posted: **Mon Mar 11, 2019 11:16 am**

BigEd, I've saved the "asm" sheet as PDF to retain the colors, the other sheets as csv.

Posted: **Mon Mar 11, 2019 11:55 am**

Thanks! I notice very opcode-like byte values in the stretches of as-yet-undisassembled code. It might be worthwhile to apply some statistical test to see whether each block looks like code, or ascii, or other.

Posted: **Mon Mar 11, 2019 12:14 pm**

BigEd wrote:

Thanks! I notice very opcode-like byte values in the stretches of as-yet-undisassembled code. It might be worthwhile to apply some statistical test to see whether each block looks like code, or ascii, or other.

Yes, most of the .byte sections are code, which is also evident from the disassembler listing in the "raw" csv. That output was generated by a static disassembler, starting at $2DFD, the load address of the binary.

For a quick statistical test of whether a block of bytes is code or not, I would just point the static disassembler at the first byte of the block, disassemble until the last byte, and then check for any unknown opcodes. Optional checks might be multiple BRKs or ORAs, or any byte repeated more than 4 times in a row. Are there any standard approaches?
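
The checks described above can be combined into a simple score. A Python sketch, assuming only the documented NMOS opcodes count as valid (undocumented opcodes do appear in some real code) and using hypothetical thresholds. Since a block might start mid-instruction, it can be worth scoring it again from offsets 1 and 2.

```python
from itertools import groupby

# Instruction lengths of the 151 documented NMOS 6502 opcodes.
LEN1 = bytes.fromhex(
    "00 08 18 28 38 40 48 58 60 68 78 88 8A 98 9A A8 AA B8 BA C8 CA D8 E8 EA F8"
    " 0A 2A 4A 6A")                                             # implied, accumulator
LEN2 = bytes.fromhex(
    "09 29 49 69 A0 A2 A9 C0 C9 E0 E9"                          # immediate
    " 05 06 24 25 26 45 46 65 66 84 85 86 A4 A5 A6 C4 C5 C6 E4 E5 E6"  # zero page
    " 15 16 35 36 55 56 75 76 94 95 B4 B5 D5 D6 F5 F6 96 B6"    # zp,X and zp,Y
    " 01 21 41 61 81 A1 C1 E1 11 31 51 71 91 B1 D1 F1"          # (zp,X) and (zp),Y
    " 10 30 50 70 90 B0 D0 F0")                                 # relative branches
LEN3 = bytes.fromhex(
    "0D 0E 20 2C 2D 2E 4C 4D 4E 6D 6E 8C 8D 8E AC AD AE CC CD CE EC ED EE"  # absolute
    " 1D 1E 3D 3E 5D 5E 7D 7E 9D BC BD DD DE FD FE"             # abs,X
    " 19 39 59 79 99 B9 BE D9 F9 6C")                           # abs,Y and JMP (ind)
LENGTH = {**{op: 1 for op in LEN1}, **{op: 2 for op in LEN2}, **{op: 3 for op in LEN3}}


def code_score(block: bytes) -> dict:
    """Linear-sweep a byte block as if it were code and collect red flags."""
    pc = decoded = unknown = brk_ora = 0
    while pc < len(block):
        op = block[pc]
        if op not in LENGTH:
            unknown += 1
            pc += 1                  # resync on the next byte
            continue
        decoded += 1
        if op in (0x00, 0x01):       # BRK, ORA (zp,X): typical of zero-filled data
            brk_ora += 1
        pc += LENGTH[op]
    longest_run = max((len(list(g)) for _, g in groupby(block)), default=0)
    return {
        "decoded": decoded,
        "unknown": unknown,
        "brk_ora": brk_ora,
        "longest_run": longest_run,
        "runs_off_end": pc > len(block),   # last instruction truncated
        # Hypothetical thresholds; tune them against known code and data.
        "looks_like_code": unknown == 0 and brk_ora <= 1 and longest_run <= 4,
    }


print(code_score(bytes.fromhex("A2 08 CA D0 FD 60")))
print(code_score(bytes.fromhex("00 00 10 20 30 FF FF FF")))
```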

reverse engineering Robotron 2084 for the Apple II
