The JavaScript version is a "tracing" disassembler, meaning it tracks a set of addresses to disassemble, beginning with a given start address. It reads the instruction at each address, calculates literal jump/branch destinations, and determines whether the following byte should also be treated as an instruction. Those addresses are added to the set of addresses to disassemble from, and it loops until that set is empty. It's sort of like a flood-fill algorithm. Instructions like BRK, RTS, RTI, STP or unknown opcodes don't produce any new traces, but callers like JSR create both a trace into the subroutine and a trace to the next instruction after the JSR, which is where the RTS would return to anyway in the common case.
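To make that concrete, here's a minimal sketch of the tracing loop. It's not the actual client code; the `successors()` helper is hypothetical and just stands in for "decode the instruction here and figure out where execution could go next".

```js
// Minimal sketch of the tracing/flood-fill loop (illustrative, not the real client code).
// successors(memory, addr) is a hypothetical helper that decodes the instruction
// at addr and returns the addresses execution could continue to.
function trace(memory, startAddr) {
  const toVisit = new Set([startAddr]);   // addresses still to disassemble
  const visited = new Set();              // addresses already decoded
  while (toVisit.size > 0) {
    const addr = toVisit.values().next().value;
    toVisit.delete(addr);
    if (visited.has(addr)) continue;
    visited.add(addr);
    for (const next of successors(memory, addr)) {
      if (!visited.has(next)) toVisit.add(next);
    }
  }
  return visited;                         // every address reached by the flood fill
}
```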
So this sort of disassembly handles straightforward code pretty well, but doesn't understand any indirection or related trickery at all. The worst cases are always-taken branches, JSRs that don't return, and code that is always interrupted without returning. In these cases, the tracer believes the bytes that follow are still instructions to be executed, and wanders off into la-la land through random data. Cases like JMP (abs) are a lot cleaner: the tracer simply throws up its hands and doesn't trace any new code, conservatively leaving more unanalyzed bytes that can then be manually poked through with new human-provided traces.
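Continuing the sketch above, the per-opcode successor rules might look something like this, assuming a hypothetical `decode()` that returns the mnemonic, length, and any literal target. The weak spot described above is visible here: a branch always gets a fall-through successor even if, at runtime, it is always taken.

```js
// Sketch of the successor rules described above (illustrative only).
// decode(memory, addr) is assumed to return { mnemonic, length, target }.
function successors(memory, addr) {
  const insn = decode(memory, addr);
  const next = addr + insn.length;
  switch (insn.mnemonic) {
    case undefined:                    // unknown/illegal opcode: stop tracing
    case 'BRK': case 'RTS': case 'RTI': case 'STP':
      return [];                       // flow ends, or goes somewhere unknowable
    case 'JMP':
      // JMP abs has a literal target; JMP (abs) is indirect, so give up.
      return insn.target !== undefined ? [insn.target] : [];
    case 'JSR':
      return [insn.target, next];      // into the subroutine, and past the call
    case 'BCC': case 'BCS': case 'BEQ': case 'BNE':
    case 'BMI': case 'BPL': case 'BVC': case 'BVS':
      return [insn.target, next];      // taken and not-taken paths, always both
    default:
      return [next];                   // everything else just falls through
  }
}
```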
The data structures holding the disassembled instruction information do support "embedded" instructions, like a BIT $00A9 containing an LDA #$00, but if such an instruction is traced, it currently isn't displayed well. I might leave handling that properly to the server.
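For illustration, this is what that embedded instruction looks like in the byte stream, along with one possible (not the actual) shape for a record that nests it:

```js
// "Embedded" instruction example: the operand bytes of one instruction are
// themselves a valid instruction when traced one byte in. Field names here
// are illustrative, not the real data model.
//
//   $C000: 2C A9 00   BIT $00A9     ; traced from $C000
//   $C001:    A9 00   LDA #$00      ; the same bytes, traced from $C001
//
const example = {
  address: 0xC000,
  bytes: [0x2C, 0xA9, 0x00],
  mnemonic: 'BIT',
  operand: 0x00A9,
  // An instruction discovered at $C001 starts inside this one's operand bytes;
  // a renderer has to decide how to show both.
  embedded: {
    address: 0xC001,
    bytes: [0xA9, 0x00],
    mnemonic: 'LDA',
    operand: 0x00,
  },
};
```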
The server-based disassembler incorporates a form of "symbolic execution": analyzing the hypotheticals, unknowns, timings, reachability, and runtime state as execution might pass through a block of code. Its memory footprint is ... very large. In practice, my task is to encode what I would do as I look at a block of code, and get the machine to go through the same "thought process". Reverse engineering is what got me into AI in the first place, because it's very laborious for a human to do, and after a certain amount of time doing the same tedious mental tasks over and over, you start to get ideas for how to specify and automate it. After all, isn't offloading work from humans the entire point of computers?
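As a very rough, illustrative sketch of what "symbolic" means here: values get carried around as either known constants or tagged unknowns, and states that disagree at a join point collapse to unknown. None of these names match the real implementation.

```js
// Illustrative only: a value is either a known constant or an unknown with a note
// about where it came from.
const knownA   = { kind: 'const', value: 0x00 };                 // e.g. after LDA #$00
const unknownA = { kind: 'unknown', origin: 'read of $DC01' };   // e.g. after LDA $DC01

// At a join point where two paths meet, states that disagree collapse to unknown.
function merge(a, b) {
  if (a.kind === 'const' && b.kind === 'const' && a.value === b.value) return a;
  return { kind: 'unknown', origin: 'merge' };
}
```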
I consider it a fundamental requirement that the server version disassemble straightforward implementations of fixed jump tables, indirection, and PHA/PHA/RTS dispatch, and that it track mode state such as the 65816 width bits and banking, among other things. By modeling the state I believe I track mentally when discovering these sorts of things, I hope to give it the tools to grok more complex cases. How much that knowledge will compound is a big unknown, but it's far more likely to be useful than trying to discover such tricks by matching rote code patterns. Plus, it's already a good way there in terms of discovering and tracking much of the data it needs for these cases.
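As one hedged example of that mode-state tracking, here's roughly what following the 65816 M/X width bits through REP/SEP looks like. The flags object and helper names are illustrative, but the underlying point is real: immediate-operand sizes depend on those bits, so losing track of them derails every subsequent decode.

```js
// Sketch of tracking the 65816 M (accumulator width) and X (index width) bits.
// SEP #imm sets the named status bits (8-bit widths); REP #imm clears them (16-bit).
function stepFlags(flags, insn) {
  switch (insn.mnemonic) {
    case 'SEP':
      return {
        m: insn.operand & 0x20 ? 1 : flags.m,
        x: insn.operand & 0x10 ? 1 : flags.x,
      };
    case 'REP':
      return {
        m: insn.operand & 0x20 ? 0 : flags.m,
        x: insn.operand & 0x10 ? 0 : flags.x,
      };
    default:
      return flags;
  }
}

// The length of LDA #imm depends on the tracked M bit:
function ldaImmediateLength(flags) {
  return flags.m === 1 ? 2 : 3;   // opcode plus 1 or 2 operand bytes
}
```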
A more internal requirement is that it needs to understand that, in the C64 case for example, $D000 doesn't address just a single specific byte, but can address ROM, RAM, I/O, etc., so I don't handle addresses as naked 16-bit numbers internally, even in the client. That's been the biggest challenge in all of this: creating a substrate capable of representing the information I need to extract from it. Most disassembler architectures simply tag addresses with a fixed set of fields, and that doesn't cut it, especially in older systems' code where there's banking, loaded overlays, no cleanly separated code/data areas, and so on.
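A simplified illustration of why a naked 16-bit number isn't enough: the same CPU address resolves to different things depending on machine state. This mapping is deliberately rough (it ignores cartridge lines, for instance) and the region names are made up.

```js
// Illustrative only: on the C64, the $01 port bits (LORAM/HIRAM/CHAREN) select
// what is actually visible at $D000-$DFFF, so an "address" needs more context
// than the 16-bit value alone.
function resolveC64(addr, port01) {
  if (addr >= 0xD000 && addr <= 0xDFFF) {
    const ramOnly = (port01 & 0x03) === 0;     // LORAM=HIRAM=0: RAM underneath
    const charen  = (port01 & 0x04) !== 0;     // CHAREN selects I/O vs char ROM
    if (ramOnly) return { region: 'RAM', offset: addr };
    return charen ? { region: 'IO', offset: addr }
                  : { region: 'CHAR_ROM', offset: addr - 0xD000 };
  }
  return { region: 'RAM', offset: addr };      // grossly simplified elsewhere
}
```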
Of course, any analysis system is defeatable; I just think that when some portion of deeper disassembly is possible, it's stupid to give up because 100% solutions are impossible, as many have done. It should seek to analyze the tractable cases, and potentially make more complicated cases tractable as it matures.
But I also know it's a "put up or shut up" situation in terms of believability, especially considering the history of others' claims and attempts, and I haven't put up yet. That's part of why I want the JavaScript version out there: to be a human-driven way of triggering and representing these features (representation is especially challenging and very incomplete), and to be practical without the spectre of the larger claims. It has the facility to interact with and render the output of the server, but it can be fed by its own simpler disassembler as well.