The JavaScript version is a "tracing" disassembler, meaning it tracks a set of addresses to disassemble, beginning with a given start address. It reads the instruction at each address, calculates literal jump/branch destinations, and determines whether the following byte should also be treated as an instruction. Those addresses are added to the set of addresses to disassemble from, and it loops until that set is empty. It's sort of like a flood-fill algorithm. Instructions like BRK, RTS, RTI, STP or unknown opcodes don't produce any new traces, but callers like JSR create both a trace into the subroutine and a trace to the next instruction after the JSR, which is where the RTS would return to anyway in the common case.
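To make that concrete, here's a minimal sketch of the tracing loop. It's not the actual client code; the `successors()` helper is hypothetical and just stands in for "decode the instruction here and figure out where execution could go next".

```js
// Minimal sketch of the tracing/flood-fill loop (illustrative, not the real client code).
// successors(memory, addr) is a hypothetical helper that decodes the instruction
// at addr and returns the addresses execution could continue to.
function trace(memory, startAddr) {
  const toVisit = new Set([startAddr]);   // addresses still to disassemble
  const visited = new Set();              // addresses already decoded
  while (toVisit.size > 0) {
    const addr = toVisit.values().next().value;
    toVisit.delete(addr);
    if (visited.has(addr)) continue;
    visited.add(addr);
    for (const next of successors(memory, addr)) {
      if (!visited.has(next)) toVisit.add(next);
    }
  }
  return visited;                         // every address reached by the flood fill
}
```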
So this sort of disassembly handles straightforward code pretty well, but doesn't understand any indirection or related trickery at all. The worst cases are always-taken branches, JSRs that don't return, and code that is always interrupted without returning. In these cases, the tracer believes the bytes that follow are still instructions to be executed, and wanders off into la-la land through random data. Cases like JMP (abs) are a lot cleaner: the tracer simply throws up its hands and doesn't trace any new code, conservatively leaving more unanalyzed bytes that can then be manually poked through with new human-provided traces.
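Continuing the sketch above, the per-opcode successor rules might look something like this, assuming a hypothetical `decode()` that returns the mnemonic, length, and any literal target. The weak spot described above is visible here: a branch always gets a fall-through successor even if, at runtime, it is always taken.

```js
// Sketch of the successor rules described above (illustrative only).
// decode(memory, addr) is assumed to return { mnemonic, length, target }.
function successors(memory, addr) {
  const insn = decode(memory, addr);
  const next = addr + insn.length;
  switch (insn.mnemonic) {
    case undefined:                    // unknown/illegal opcode: stop tracing
    case 'BRK': case 'RTS': case 'RTI': case 'STP':
      return [];                       // flow ends, or goes somewhere unknowable
    case 'JMP':
      // JMP abs has a literal target; JMP (abs) is indirect, so give up.
      return insn.target !== undefined ? [insn.target] : [];
    case 'JSR':
      return [insn.target, next];      // into the subroutine, and past the call
    case 'BCC': case 'BCS': case 'BEQ': case 'BNE':
    case 'BMI': case 'BPL': case 'BVC': case 'BVS':
      return [insn.target, next];      // taken and not-taken paths, always both
    default:
      return [next];                   // everything else just falls through
  }
}
```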
The data structures holding the disassembled instruction information do support "embedded" instructions, like a BIT $00A9 containing an LDA #$00, but if such an instruction is traced, it currently isn't displayed well. I might leave handling that properly to the server.
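For illustration, this is what that embedded instruction looks like in the byte stream, along with one possible (not the actual) shape for a record that nests it:

```js
// "Embedded" instruction example: the operand bytes of one instruction are
// themselves a valid instruction when traced one byte in. Field names here
// are illustrative, not the real data model.
//
//   $C000: 2C A9 00   BIT $00A9     ; traced from $C000
//   $C001:    A9 00   LDA #$00      ; the same bytes, traced from $C001
//
const example = {
  address: 0xC000,
  bytes: [0x2C, 0xA9, 0x00],
  mnemonic: 'BIT',
  operand: 0x00A9,
  // An instruction discovered at $C001 starts inside this one's operand bytes;
  // a renderer has to decide how to show both.
  embedded: {
    address: 0xC001,
    bytes: [0xA9, 0x00],
    mnemonic: 'LDA',
    operand: 0x00,
  },
};
```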
The server-based disassembler incorporates a form of "symbolic execution": analyzing the hypotheticals, unknowns, timings, reachability, and runtime state as execution might pass through a block of code. Its memory footprint is ... very large. In practice, my task is to encode what I would do as I look at a block of code, and get the machine to go through the same "thought process". Reverse engineering is what got me into AI in the first place, because it's very laborious for a human to do, and after a certain amount of time doing the same tedious mental tasks over and over, you start to get ideas for how to specify and automate it. After all, isn't offloading work from humans the entire point of computers?
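As a very rough, illustrative sketch of what "symbolic" means here: values get carried around as either known constants or tagged unknowns, and states that disagree at a join point collapse to unknown. None of these names match the real implementation.

```js
// Illustrative only: a value is either a known constant or an unknown with a note
// about where it came from.
const knownA   = { kind: 'const', value: 0x00 };                 // e.g. after LDA #$00
const unknownA = { kind: 'unknown', origin: 'read of $DC01' };   // e.g. after LDA $DC01

// At a join point where two paths meet, states that disagree collapse to unknown.
function merge(a, b) {
  if (a.kind === 'const' && b.kind === 'const' && a.value === b.value) return a;
  return { kind: 'unknown', origin: 'merge' };
}
```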
I consider it a fundamental requirement that the server version disassemble straightforward implementations of fixed jump tables, indirection, and PHA/PHA/RTS dispatch, and that it track mode state such as the 65816 width bits and banking, among other things. By modeling the state I believe I track mentally when discovering these sorts of things, I hope to give it the tools to grok more complex cases. How much that knowledge will compound is a big unknown, but it's far more likely to be useful than trying to discover such tricks by matching rote code patterns. Plus, it's already a good way there in terms of discovering and tracking much of the data it needs for these cases.
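As one hedged example of that mode-state tracking, here's roughly what following the 65816 M/X width bits through REP/SEP looks like. The flags object and helper names are illustrative, but the underlying point is real: immediate-operand sizes depend on those bits, so losing track of them derails every subsequent decode.

```js
// Sketch of tracking the 65816 M (accumulator width) and X (index width) bits.
// SEP #imm sets the named status bits (8-bit widths); REP #imm clears them (16-bit).
function stepFlags(flags, insn) {
  switch (insn.mnemonic) {
    case 'SEP':
      return {
        m: insn.operand & 0x20 ? 1 : flags.m,
        x: insn.operand & 0x10 ? 1 : flags.x,
      };
    case 'REP':
      return {
        m: insn.operand & 0x20 ? 0 : flags.m,
        x: insn.operand & 0x10 ? 0 : flags.x,
      };
    default:
      return flags;
  }
}

// The length of LDA #imm depends on the tracked M bit:
function ldaImmediateLength(flags) {
  return flags.m === 1 ? 2 : 3;   // opcode plus 1 or 2 operand bytes
}
```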
A more internal requirement is that it needs to understand that, in the C64 case for example, $D000 doesn't address just a single specific byte, but can address ROM, RAM, I/O, etc., so I don't handle addresses as naked 16-bit numbers internally, even in the client. That's been the biggest challenge in all of this: creating a substrate capable of representing the information I need to extract from it. Most disassembler architectures simply tag addresses with a fixed set of fields, and that doesn't cut it, especially in older systems' code where there's banking, loaded overlays, no cleanly separated code/data areas, and so on.
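A simplified illustration of why a naked 16-bit number isn't enough: the same CPU address resolves to different things depending on machine state. This mapping is deliberately rough (it ignores cartridge lines, for instance) and the region names are made up.

```js
// Illustrative only: on the C64, the $01 port bits (LORAM/HIRAM/CHAREN) select
// what is actually visible at $D000-$DFFF, so an "address" needs more context
// than the 16-bit value alone.
function resolveC64(addr, port01) {
  if (addr >= 0xD000 && addr <= 0xDFFF) {
    const ramOnly = (port01 & 0x03) === 0;     // LORAM=HIRAM=0: RAM underneath
    const charen  = (port01 & 0x04) !== 0;     // CHAREN selects I/O vs char ROM
    if (ramOnly) return { region: 'RAM', offset: addr };
    return charen ? { region: 'IO', offset: addr }
                  : { region: 'CHAR_ROM', offset: addr - 0xD000 };
  }
  return { region: 'RAM', offset: addr };      // grossly simplified elsewhere
}
```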
Of course, any analysis system is defeatable; I just think that when some portion of deeper disassembly is possible, it's stupid to give up because 100% solutions are impossible, as many have done. It should seek to analyze the tractable cases, and potentially make more complicated cases tractable as it matures.
But I also know it's a "put up or shut up" situation in terms of believability, especially considering the history of others' claims and attempts, and I haven't put up yet. That's part of why I want the JavaScript version out there: to be a human-driven way of triggering and representing these features (representation is especially challenging and very incomplete), and to be practical without the spectre of the larger claims. It has the facility to interact with and render the output of the server, but it can be fed by its own simpler disassembler as well.