Looking for a disassembler and suitable assembler

Dennis · Post by **Dennis** » Fri Mar 10, 2017 2:10 pm

After spending endless hours trying different asm/dasmx/cc tools, I'm finally giving up and asking you for help:
I have a 6502 binary (for testing purposes I'm using the CBM BASIC V2 binary, since its disassembly is very well documented) and want to run a disassembler
whose output is then fed into an assembler, which should produce the exact same binary (without manually changing any source lines). I know the starting address (e.g. $A000 for CBM BASIC), but that's all. A Windows command-line binary is preferred.
I totally failed at this task: either the disassembler spits out statements the assembler doesn't know, or I really don't know how to achieve it with the tool (e.g. with the cc65 package).

Could anyone help me please? Thanks, Dennis

Tor · Post by **Tor** » Fri Mar 10, 2017 2:35 pm

It's unlikely that you can disassemble a binary and expect the unmodified output to assemble correctly. The binary will probably include data too, and that won't disassemble into instructions. Unless you tell the disassembler exactly where the data is, it won't produce output you can assemble unmodified. But some disassemblers can take 'hint' files as supporting input, I believe. There was a discussion about disassembling code here in this forum a short time ago - it should cover the issues better than I can describe them.
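To make the "hint file" idea concrete, here is a minimal Python sketch (not any real tool's hint format): the caller passes address ranges known to be data, and those bytes come out as `.byte` lines instead of being decoded. The opcode subset, the `.byte`/`$` output syntax and the half-open `(start, end)` range convention are all assumptions for illustration.

```python
# Hint-driven disassembly sketch. The opcode subset, the ".byte"/"$" output
# syntax and the (start, end) half-open range format are illustrative assumptions.

# opcode -> (mnemonic, operand length in bytes, operand format)
OPCODES = {
    0xA9: ("LDA", 1, " #${:02X}"),
    0x8D: ("STA", 2, " ${:04X}"),
    0x20: ("JSR", 2, " ${:04X}"),
    0x60: ("RTS", 0, ""),
}


def disassemble(code: bytes, org: int, data_ranges: list) -> list:
    """Decode `code` loaded at `org`; addresses inside any data range become .byte."""

    def is_data(addr: int) -> bool:
        return any(start <= addr < end for start, end in data_ranges)

    lines, pc = [], 0
    while pc < len(code):
        op = code[pc]
        entry = OPCODES.get(op)
        # Data hint, unknown opcode, or truncated instruction: emit one raw byte.
        # (An instruction whose operand runs into a data range is not detected here.)
        if is_data(org + pc) or entry is None or pc + 1 + entry[1] > len(code):
            lines.append(f".byte ${op:02X}")
            pc += 1
            continue
        mnemonic, length, fmt = entry
        operand = int.from_bytes(code[pc + 1:pc + 1 + length], "little")
        lines.append(mnemonic + fmt.format(operand))
        pc += 1 + length
    return lines
```

For example, `disassemble(bytes([0xA9, 0x01, 0x60, 0x60, 0x41]), 0xC000, [(0xC003, 0xC005)])` decodes `LDA #$01` and `RTS`, then emits the two hinted bytes as `.byte $60` and `.byte $41`, even though $60 would otherwise decode as RTS.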

BigEd · Post by **BigEd** » Fri Mar 10, 2017 2:40 pm

(See this thread for that recent discussion.)

Bear in mind that there are several different syntaxes, almost as many as there are assemblers, so you will need to find a suitable pair of tools.

White Flame's online disassembler is in active development, which is an advantage, if you can feed back any difficulties you have with it.

White Flame · Post by **White Flame** » Fri Mar 10, 2017 3:34 pm

This exact scenario is next up for my disassembler (WFDis). There's a ton of assemblers out there, so the main issue is connecting all the nuances of what can be held in the disassembled state to the input expectations of assemblers. Things like LDA $00FE, which is (for timing or selfmod reasons) encoded as absolute addressing instead of zeropage, need to be indicated to the assembler so it doesn't reencode it to use zeropage. For undefined NMOS opcodes, there are multiple mnemonic sets, or the assembler might not support them. Labels need to be carefully used in such a way that if somebody edits the .asm file to insert or remove instructions, the pointers don't all misalign. Text strings are nasty, especially on retro computers where they had their own character sets, and assemblers have weird conversions to handle text literals from ASCII .asm files. And then there's a bunch of little dumb stuff like "LSR" vs "LSR A" vs "LSR @" which I saw once.

But yeah, the notion of getting a binary to disassemble and reassemble (without touching anything) into the exact same byte-for-byte binary is certainly doable. There are just a ton of edge cases to cover (all of them doable), and of course the disassembler has to target a particular assembler very specifically.

Do you have a particular assembler that you prefer?
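The "LSR" vs "LSR A" vs "LSR @" point is a good small example of why the disassembler has to target one assembler. A Python sketch of rendering the same accumulator-mode opcode per dialect; the dialect names are made up, and which form a given assembler actually accepts has to be checked in its manual.

```python
# Accumulator-mode shift/rotate opcodes (NMOS 6502)
ACCUMULATOR_OPS = {0x0A: "ASL", 0x2A: "ROL", 0x4A: "LSR", 0x6A: "ROR"}

# Hypothetical dialect names -> spelling of accumulator mode
DIALECTS = {
    "bare": "{m}",          # LSR
    "explicit_a": "{m} A",  # LSR A
    "at_sign": "{m} @",     # LSR @
}


def render_accumulator(opcode: int, dialect: str) -> str:
    """Spell an accumulator-mode instruction the way the target dialect expects."""
    return DIALECTS[dialect].format(m=ACCUMULATOR_OPS[opcode])
```

A tool aiming at several assemblers would presumably keep one such table per target (also covering data directives, undocumented-opcode mnemonics and text conversion) rather than hard-coding one syntax.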

whartung · Post by **whartung** » Fri Mar 10, 2017 6:55 pm

White Flame wrote:

Things like LDA $00FE, which is (for timing or selfmod reasons) encoded as absolute addressing instead of zeropage, need to be indicated to the assembler so it doesn't reencode it to use zeropage.

Yea, that's a real trick. That will disassemble fine, I mean, why not, it says so in the code. But on assembly, it seems common for the assembler to assume that a zero page address is, in fact, zero page. You might be able to force this by something like:

```
        LDA ZPADDR      ; ZPADDR is not defined yet, so its size is unknown here
ZPADDR = $00FE
```

In this case, the assembler may have to assume that ZPADDR is a 2 byte value, since it's undefined at that point. (I honestly don't even know what mine would do with this, probably make it a normal absolute load).

But it would take a particularly knowledgeable disassembler to see that the absolute address might be confused with a zero page address.
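The check itself can be mechanical: whenever an absolute-mode opcode that also has a zero-page form carries an operand below $0100, emit an explicit size override. A Python sketch; the `a:` prefix is ca65's absolute address-size override, and other assemblers spell this differently (or not at all), so treat the output syntax as an assumption.

```python
# Absolute-mode opcodes whose instruction also has a zero-page form (subset).
# JMP/JSR are absent: they have no zero-page mode, so they can't be shortened.
ABSOLUTE_WITH_ZP_FORM = {0xAD: "LDA", 0x8D: "STA", 0xAE: "LDX", 0xAC: "LDY", 0x2C: "BIT"}


def render_absolute(opcode: int, lo: int, hi: int) -> str:
    """Render an absolute-mode instruction so that it reassembles to 3 bytes."""
    mnemonic = ABSOLUTE_WITH_ZP_FORM[opcode]
    address = lo | (hi << 8)
    if address < 0x100:
        # Plain "LDA $00FE" would likely be re-encoded as 2-byte zero page;
        # the "a:" prefix (ca65 syntax) forces the absolute encoding.
        return f"{mnemonic} a:${address:04X}"
    return f"{mnemonic} ${address:04X}"
```

For instance, the bytes `AD FE 00` come out as `LDA a:$00FE`, while `8D 00 A0` stays a plain `STA $A000`.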

It doesn't help that most disassemblers aren't designed to produce output that can be reassembled.

None of the other problems really bother me, though. The disassembler doesn't have to know about data blocks and such; it simply has to know whether something is a valid instruction or not. Having instructions interlaced with a bunch of DB statements is, yea, ugly, but it should reassemble back OK. The same goes for undefined opcodes, as long as the disassembler remains ignorant of them.

If the goal is semi-readable assembly that can be reassembled, that's not really a tough nut to crack. You could probably pick an assembler you like and whip up an adequate disassembler with simple labels in a couple of hours. Finding data blocks and the like is where the real time goes.
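As a rough illustration of such a quick disassembler: a two-pass linear sweep in Python that labels JSR/JMP targets and falls back to `.byte` for anything it doesn't recognize, so its output stays assemblable. Only a handful of opcodes are included, the `L_xxxx` label names and the `* =`/`.byte` syntax are assumptions, and it ignores the zero-page shortening problem discussed above.

```python
# opcode -> (mnemonic, addressing mode); a small subset for illustration
OPCODES = {
    0xA9: ("LDA", "imm"), 0xA5: ("LDA", "zp"), 0xAD: ("LDA", "abs"),
    0x85: ("STA", "zp"), 0x8D: ("STA", "abs"),
    0x20: ("JSR", "abs"), 0x4C: ("JMP", "abs"),
    0xE8: ("INX", "imp"), 0xEA: ("NOP", "imp"), 0x60: ("RTS", "imp"),
}
LENGTH = {"imp": 1, "imm": 2, "zp": 2, "abs": 3}


def decode(code: bytes, org: int):
    """Linear sweep: yield (address, opcode or None, operand) for each item."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        entry = OPCODES.get(op)
        if entry is None or pc + LENGTH[entry[1]] > len(code):
            yield org + pc, None, op  # unknown or truncated: keep as a raw byte
            pc += 1
            continue
        n = LENGTH[entry[1]]
        yield org + pc, op, int.from_bytes(code[pc + 1:pc + n], "little")
        pc += n


def disassemble(code: bytes, org: int) -> list:
    items = list(decode(code, org))
    starts = {addr for addr, _, _ in items}
    # Pass 1: label JSR/JMP targets that land on the start of an emitted line.
    labels = {operand: f"L_{operand:04X}" for _, op, operand in items
              if op in (0x20, 0x4C) and operand in starts}
    # Pass 2: render, using labels where we have them.
    out = [f"* = ${org:04X}"]
    for addr, op, operand in items:
        prefix = f"{labels[addr]}: " if addr in labels else ""
        if op is None:
            out.append(f"{prefix}.byte ${operand:02X}")
            continue
        mnemonic, mode = OPCODES[op]
        if mode == "imp":
            text = mnemonic
        elif mode == "imm":
            text = f"{mnemonic} #${operand:02X}"
        elif mode == "zp":
            text = f"{mnemonic} ${operand:02X}"
        else:
            text = f"{mnemonic} " + labels.get(operand, f"${operand:04X}")
        out.append(prefix + text)
    return out
```

For example, the bytes `20 04 C0 60 E8 60 02` at $C000 come out as `JSR L_C004`, `RTS`, `L_C004: INX`, `RTS`, `.byte $02`; targets outside the image simply stay as numeric addresses.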

Dennis · Post by **Dennis** » Mon Mar 13, 2017 8:01 am

Tor wrote:

The binary will probably include data too, and that won't disassemble into instructions.

I don't care about that data: if the disassembler can't produce code, it should just print a "db 0x55" or whatever. (That's one problem: some print "db", sometimes with "," separators, sometimes with only spaces allowed, some print "hex", and the assembler that worked very well up to the data only accepts the other form.)
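The directive mismatch described here is purely a formatting choice on the disassembler's side. A Python sketch rendering the same bytes three ways; the style names are made up, and which assembler accepts which form (separators, `0x` vs `$`) has to be checked against its documentation.

```python
def render_data(data: bytes, style: str) -> str:
    """Render a run of data bytes in one of three illustrative directive styles."""
    if style == "db_comma":      # db 0x55, 0xAA
        return "db " + ", ".join(f"0x{b:02X}" for b in data)
    if style == "byte_dollar":   # .byte $55,$AA
        return ".byte " + ",".join(f"${b:02X}" for b in data)
    if style == "hex_spaced":    # hex 55 AA
        return "hex " + " ".join(f"{b:02X}" for b in data)
    raise ValueError(f"unknown style: {style}")
```

The point is that disassembler and assembler must agree on one of these; mixing tools gives exactly the "worked until the data" failure.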

Quote:

Do you have a particular assembler that you prefer?

a working one

I really don't understand why the dis->asm->bin route wouldn't be possible.

Especially since there is the article "Create your own Version of Microsoft BASIC for 6502" (http://www.pagetable.com/?p=46), which claims "The source can be assembled into byte-exact versions" with the CC65 package (it doesn't work for me, though).

BigEd · Post by **BigEd** » Mon Mar 13, 2017 8:09 am

I'm sure it can be done, it's all about the details. Be patient, provide good feedback, and you'll probably get what you want.

White Flame · Post by **White Flame** » Mon Mar 13, 2017 12:47 pm

Unfortunately, there's quite a lot of "It didn't work!" without any further detail, which isn't really conducive to help. For instance, that pagetable link has a .zip with a make.sh that builds everything automatically. Did the built binaries not match expectations? Did the build itself fail? Is your environment set up for it, or did it look like a bug in their posted code?

It sounds like you tried da65/ca65, which is probably a bit complex, but did you try dxa65/xa65? Both claim to create disassemblies that can be fed straight back into their respective assemblers, with dxa65 specifically saying the result should be byte-for-byte identical, if you just want something quick and dirty. (I'm not writing a quick-and-dirty tool, so mine is going to have to wait a bit longer.)

Dennis · Post by **Dennis** » Mon Mar 13, 2017 4:17 pm

Sorry, I was so frustrated because a trivial task had wasted that much time.

I just tried dxa/xa on FreeBSD:

```
dxa65 -g a000 basic.901226-01.bin >basic.src
```

and

```
xa65 basic.src
```

This puts the starting address ($a000) in the first 2 bytes of the output; I couldn't find how to omit them.

After stripping those two bytes with

```
dd bs=2 skip=1 if=a.o65 of=basic.bin
```

and then comparing the result with

```
cmp basic.901226-01.bin basic.bin
```

cmp shows that the files are identical.

Thank you! Thanks for the hint about the xa65/dxa65 package!
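The strip-and-compare steps above can also be scripted. A Python sketch; it assumes, as observed above, that the assembler output starts with a 2-byte load address, and further assumes that address is stored low byte first (as in C64 .prg files).

```python
from typing import Optional


def strip_load_address(image: bytes, expected_org: int) -> bytes:
    """Drop a leading 2-byte little-endian load address after checking it."""
    if len(image) < 2:
        raise ValueError("image too short to contain a load address")
    load = image[0] | (image[1] << 8)
    if load != expected_org:
        raise ValueError(f"load address ${load:04X} != expected ${expected_org:04X}")
    return image[2:]


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """0-based offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # Equal so far: either identical, or one file is a prefix of the other.
    return None if len(a) == len(b) else min(len(a), len(b))
```

With the file names from the post (and `from pathlib import Path`), `first_difference(Path("basic.901226-01.bin").read_bytes(), strip_load_address(Path("a.o65").read_bytes(), 0xA000))` should return `None` for an exact match. Note that cmp itself reports 1-based byte numbers.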

GARTHWILSON · Post by **GARTHWILSON** » Mon Mar 13, 2017 6:56 pm

Dennis wrote:

Tor wrote:

The binary will probably include data too, and that won't disassemble into instructions.

I don't care about that data: if the disassembler can't produce code, it should just print a "db 0x55" or whatever. (That's one problem: some print "db", sometimes with "," separators, sometimes with only spaces allowed, some print "hex", and the assembler that worked very well up to the data only accepts the other form.)

I suspect this is what Tor is getting at. Some of the data may look like valid instructions, fooling the disassembler. When you get to the first real instruction again, the disassembler may think it's an operand for the previous byte, which appeared to be an instruction but was really data. Here's an example of the general form:

```
        JSR   <output_the_following_data>
        BYTE  <yada yada yada>             ; (any number of bytes, in any range)
        CLI                                ; Next op code could even look like ASCII data,
        LDA   <variable>                   ; meaning the disassembler can't even take a hint.
```

The subroutine uses the return address on the stack to get the address of the data. The subroutine may already know how long the data field is, or it may get that information from a byte in the data field itself. Regardless, it adjusts the return address past the data field. Since the disassembler won't know this, it is pretty much guaranteed to be lost at that point, with little hope of recovering. I think a successful disassembler would have to be hand-guided, not automated.
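To see how this pattern derails a linear-sweep disassembler, here is a Python sketch of the form above. The routine address $C100 and the single inline data byte are made up; the data byte happens to be $A9, the LDA-immediate opcode.

```python
# opcode -> (mnemonic, total length, operand format); tiny subset for the demo
OPCODES = {
    0x20: ("JSR", 3, " ${:04X}"),
    0xA9: ("LDA", 2, " #${:02X}"),
    0xA5: ("LDA", 2, " ${:02X}"),
    0x58: ("CLI", 1, ""),
}


def linear_sweep(code: bytes) -> list:
    """Decode byte after byte with no knowledge of inline data."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        entry = OPCODES.get(op)
        if entry is None or pc + entry[1] > len(code):
            out.append(f".byte ${op:02X}")
            pc += 1
            continue
        mnemonic, length, fmt = entry
        operand = int.from_bytes(code[pc + 1:pc + length], "little")
        out.append(mnemonic + fmt.format(operand))
        pc += length
    return out


program = bytes([
    0x20, 0x00, 0xC1,  # JSR $C100: hypothetical routine that consumes the inline byte
    0xA9,              # inline data byte, skipped at run time via the return address
    0x58,              # CLI: execution actually resumes here
    0xA5, 0x10,        # LDA $10
])
```

`linear_sweep(program)` gives `JSR $C100`, `LDA #$58`, `LDA $10`: the CLI has been swallowed as the operand of a phantom `LDA #$58`. The bytes would still reassemble identically, but the listing no longer shows what actually runs.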

Tor · Post by **Tor** » Mon Mar 13, 2017 7:41 pm

Yep, there was a lot of that in the code I used to write back when I used an Apple II at work, e.g. for string output, as in your example. You can always disassemble into .db, but then you might as well "disassemble" it all (every single byte of the program) into one big .db statement, and assemble that. But why disassemble then? The point of disassembling code, the way I see it, would be to get code that makes sense to look at for analysis and modification.

BigEd · Post by **BigEd** » Mon Mar 13, 2017 7:48 pm

But I think it's reasonable for a disassembly to be able to round-trip. If it sometimes gets code or data misidentified, that shouldn't stop the round-trip accuracy. If it only writes out db statements, that's a pretty big fail!
It doesn't even need to worry about branches into the middle of instructions (the BIT trick) so long as it outputs something which is valid and produces the same bytes when assembled.
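The BIT trick in byte form, as a Python sketch: code that falls through executes `BIT $01A9`, which just reads a byte and sets flags, while a branch to the second byte executes `LDA #$01`. A disassembler that only prints the `BIT` still round-trips, because that instruction reassembles to the same three bytes. The address $01A9 is simply what the example bytes produce.

```python
# opcode -> (mnemonic, total length, operand format)
OPCODES = {0x2C: ("BIT", 3, " ${:04X}"), 0xA9: ("LDA", 2, " #${:02X}")}


def decode_at(code: bytes, offset: int) -> str:
    """Decode the single instruction that starts at `offset`."""
    mnemonic, length, fmt = OPCODES[code[offset]]
    operand = int.from_bytes(code[offset + 1:offset + length], "little")
    return mnemonic + fmt.format(operand)


def assemble_bit_absolute(address: int) -> bytes:
    """Encode BIT abs: opcode $2C, then the address, low byte first."""
    return bytes([0x2C, address & 0xFF, address >> 8])


code = bytes([0x2C, 0xA9, 0x01])
fall_through = decode_at(code, 0)   # path that falls into the BIT
branch_target = decode_at(code, 1)  # path that branches to the second byte
```

Here `fall_through` is `"BIT $01A9"`, `branch_target` is `"LDA #$01"`, and `assemble_bit_absolute(0x01A9) == code` holds.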

Tor · Post by **Tor** » Mon Mar 13, 2017 8:02 pm

I don't disagree with that, it's just that I think the reason for disassembling the code mostly goes away unless there's some amount of interesting or useful code to read. If one just wants the same code it's easier to just copy the original binary!

But it's certainly possible to structure the original code in such a way that a disassembler would be able to output all the instructions correctly, with no hints, until finally hitting the data (at the end, for example) and starting to output gibberish and/or db statements. It would still have problems setting a useful 'org', I presume, depending on what it starts with (e.g. a binary on disk, or directly from a position in memory).
In practice, most programs tend to have intermixed instructions and data though. (That wasn't much of a problem when I wrote a disassembler for the Norsk Data NORD-500 minicomputer, it used a Harvard architecture where instructions and data were completely separated.)

BigEd · Post by **BigEd** » Mon Mar 13, 2017 8:04 pm

If a disassembler can pass the round trip test, then its output is useful for making changes to the code.

I think what it comes down to is what the disassembler does when it runs out of code, or falls out of code into data. If it makes an effort to keep outputting valid syntax, it can pass the test. If it starts printing ??? then it will fail the test.
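The round-trip test, sketched in Python with a toy assembler that accepts exactly the syntax the toy disassembler produces. With the `.byte` fallback the undefined NMOS opcode $02 survives the trip; with `???` the output no longer assembles. The three-opcode table, the `.byte` syntax and the toy assembler are assumptions; a real test would feed the listing to an actual assembler and compare the files, as was done with xa65 above.

```python
# opcode -> (mnemonic, total length); LDA is immediate mode only in this subset
OPCODES = {0xA9: ("LDA", 2), 0xE8: ("INX", 1), 0x60: ("RTS", 1)}
BY_MNEMONIC = {mnemonic: (op, length) for op, (mnemonic, length) in OPCODES.items()}


def disassemble(code: bytes, unknown: str = "byte") -> list:
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op in OPCODES and pc + OPCODES[op][1] <= len(code):
            mnemonic, length = OPCODES[op]
            out.append(mnemonic if length == 1 else f"{mnemonic} #${code[pc + 1]:02X}")
            pc += length
        else:
            # The choice that decides the test: keep emitting valid syntax, or give up.
            out.append(f".byte ${op:02X}" if unknown == "byte" else "???")
            pc += 1
    return out


def assemble(lines: list) -> bytes:
    """Toy assembler for exactly the syntax produced above."""
    out = bytearray()
    for line in lines:
        if line.startswith(".byte $"):
            out.append(int(line[len(".byte $"):], 16))
            continue
        mnemonic, _, operand = line.partition(" ")
        if mnemonic not in BY_MNEMONIC:
            raise ValueError(f"cannot assemble: {line!r}")
        op, length = BY_MNEMONIC[mnemonic]
        out.append(op)
        if length == 2:
            out.append(int(operand.lstrip("#$"), 16))
    return bytes(out)


def round_trips(code: bytes, unknown: str = "byte") -> bool:
    """Disassemble, reassemble, and compare with the original bytes."""
    try:
        return assemble(disassemble(code, unknown)) == code
    except ValueError:
        return False
```

For `bytes([0xA9, 0x05, 0xE8, 0x02, 0x60])` the listing is `LDA #$05`, `INX`, `.byte $02`, `RTS`, and `round_trips` returns `True`; with `unknown="???"` it returns `False`.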

whartung · Post by **whartung** » Tue Mar 14, 2017 12:21 am

It all depends on the sophistication of the disassembler. Many a simple disassembler has been successfully used for small patches of code on an ad hoc basis. If you want more of a reverse engineering tool, one that lets you identify data segments, tag memory locations symbolically, etc., and has some better built-in heuristics, that may be a completely different task.

All depends on the design of the tool. "Disassembler" is a pretty broad term.
