6502.org Forum

All times are UTC




Posted: Fri Mar 10, 2017 2:10 pm
Joined: Sun Mar 05, 2017 11:48 am
Posts: 6
After spending endless hours trying different asm/dasmx/cc tools, I finally give up and ask you for help:
I have a 6502 binary (for testing purposes I'm using the cbm basic v2.bin, since its disassembly is very well documented) and want to run a disassembler whose output is then fed into an assembler, which should produce the exact same binary (without manually changing any source lines). I do know the starting address (e.g. $A000 for the cbmbasic), but that's all. A Windows binary (command line) is preferred.
I totally failed at that task: either the disassembler spits out statements the assembler doesn't know, or I really don't know how to achieve it with the tool (e.g. with the cc65 package).

Could anyone help me please? Thanks, Dennis


Posted: Fri Mar 10, 2017 2:35 pm
Joined: Sun Apr 10, 2011 8:29 am
Posts: 597
Location: Norway/Japan
It's unlikely that you can disassemble a binary and expect the unmodified output to assemble correctly. The binary will probably include data too, and that won't disassemble into instructions. Unless you tell the disassembler exactly where the data is, it won't produce output you can assemble unmodified. But some disassemblers can take 'hint' files as supporting input, I believe. There was a discussion about disassembling code here in this forum a short time ago - it should cover the issues better than I can describe them.


Posted: Fri Mar 10, 2017 2:40 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
(See this thread for that recent discussion.)

Bear in mind that there are several different syntaxes, almost as many as there are assemblers, so you will need to find a suitable pair of tools.

White Flame's online disassembler is in active development, which is an advantage, if you can feed back any difficulties you have with it.


Posted: Fri Mar 10, 2017 3:34 pm
Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
This exact scenario is next up for my disassembler (WFDis). There's a ton of assemblers out there, so the main issue is connecting all the nuances of what can be held in the disassembled state to the input expectations of assemblers. Things like LDA $00FE, which is (for timing or selfmod reasons) encoded as absolute addressing instead of zeropage, need to be indicated to the assembler so it doesn't reencode it to use zeropage. For undefined NMOS opcodes, there are multiple mnemonic sets, or the assembler might not support them. Labels need to be carefully used in such a way that if somebody edits the .asm file to insert or remove instructions, the pointers don't all misalign. Text strings are nasty, especially on retro computers where they had their own character sets, and assemblers have weird conversions to handle text literals from ASCII .asm files. And then there's a bunch of little dumb stuff like "LSR" vs "LSR A" vs "LSR @" which I saw once.

But yeah, the notion of getting a binary to disassemble & reassemble (without touching anything) into the exact same byte-for-byte binary is certainly doable. There's just a ton of doable edge cases to cover, and of course the disassembler has to target a particular assembler very specifically.

Do you have a particular assembler that you prefer?

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Posted: Fri Mar 10, 2017 6:55 pm
Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
White Flame wrote:
Things like LDA $00FE, which is (for timing or selfmod reasons) encoded as absolute addressing instead of zeropage, need to be indicated to the assembler so it doesn't reencode it to use zeropage.


Yea, that's a real trick. That will disassemble fine, I mean, why not, it says so in the code. But on assembly, it seems common for the assembler to assume that a zero page address is, in fact, zero page. You might be able to force this by something like:
Code:
LDA ZPADDR
ZPADDR = $00FE


In this case, the assembler may have to assume that ZPADDR is a 2 byte value, since it's undefined at that point. (I honestly don't even know what mine would do with this, probably make it a normal absolute load).

But that would take a particularly knowledgeable disassembler to see that the absolute address might be confused with a zero page address.
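A rough Python sketch of the kind of check this would take, not how any particular disassembler does it: flag every absolute-mode instruction whose operand has a zero high byte, since those are the ones an assembler might silently shrink to zero page. The opcode table is a small hand-picked subset, and `needs_abs_hint` is a hypothetical name.

```python
# Absolute-mode opcodes that also have a shorter zero-page form.
# Only a small illustrative subset of the 6502 opcode map.
ABS_WITH_ZP_FORM = {
    0xAD: ("LDA", 0xA5),  # LDA abs -> LDA zp
    0x8D: ("STA", 0x85),  # STA abs -> STA zp
    0xAE: ("LDX", 0xA6),  # LDX abs -> LDX zp
    0xEE: ("INC", 0xE6),  # INC abs -> INC zp
}

def needs_abs_hint(opcode: int, lo: int, hi: int) -> bool:
    """True if a 3-byte absolute instruction addresses page zero."""
    return opcode in ABS_WITH_ZP_FORM and hi == 0x00

# LDA $00FE encoded as absolute: AD FE 00
print(needs_abs_hint(0xAD, 0xFE, 0x00))  # True
# LDA $A0FE: no zero-page form could replace it
print(needs_abs_hint(0xAD, 0xFE, 0xA0))  # False
```

How the hint is then written out (a forced-absolute prefix, a late label definition as in the snippet above, or raw bytes) depends entirely on the target assembler.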

It doesn't help that most disassemblers' output isn't designed to be reassembled.

None of the other problems bother me though, really. The disassembler doesn't have to know about data blocks and such, it simply has to know whether it's a valid instruction or not. Having instructions interlaced with a bunch of DB statements is, yea, ugly, but it should reassemble back ok. Same with undefined op codes, as long as the disassembler remains ignorant of them.

If the goal is semi-readable assembly that can be reassembled, that's not really a big nut. You can probably pick a favorite assembler that you like and whip up an adequate disassembler in a couple hours, really, with simple labels. Finding data blocks and the like is where the time goes.
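To give a feel for how small such a disassembler can be, here is a minimal Python sketch. It assumes a made-up target syntax (`.byte $xx` for raw bytes) and knows only four opcodes; a real tool would need the full opcode map and whatever directive the chosen assembler expects. Anything it does not recognise is emitted as a raw byte, so the output stays reassemblable.

```python
# Tiny subset of the 6502 opcode map: opcode -> (mnemonic, operand format, length)
OPCODES = {
    0xA9: ("LDA", "#${:02X}", 2),   # immediate
    0x8D: ("STA", "${:04X}", 3),    # absolute
    0x20: ("JSR", "${:04X}", 3),    # absolute
    0x60: ("RTS", "", 1),           # implied
}

def disassemble(data: bytes) -> list:
    lines = []
    i = 0
    while i < len(data):
        entry = OPCODES.get(data[i])
        # Unknown opcode, or instruction cut off by end of input: emit a raw byte
        if entry is None or i + entry[2] > len(data):
            lines.append(f".byte ${data[i]:02X}")
            i += 1
            continue
        mnemonic, fmt, length = entry
        # Operand bytes are little-endian; empty for implied instructions
        operand = int.from_bytes(data[i + 1:i + length], "little")
        lines.append(f"{mnemonic} {fmt.format(operand)}".rstrip())
        i += length
    return lines

for line in disassemble(bytes([0xA9, 0x00, 0x8D, 0x20, 0xD0, 0x02, 0x60])):
    print(line)
```

The byte $02 in the sample input is not in the table, so it comes out as `.byte $02` between the STA and the RTS.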


Posted: Mon Mar 13, 2017 8:01 am
Joined: Sun Mar 05, 2017 11:48 am
Posts: 6
Tor wrote:
The binary will probably include data too, and that won't disassemble into instructions.

I don't care about the data: if the disassembler can't produce code, it should just print a "db 0x55" or whatever. (That's one problem: some print "db", sometimes with "," and sometimes only spaces are allowed, some print "hex", and the assembler that worked very well up to that data accepts only the other form.)


Quote:
Do you have a particular assembler that you prefer?

a working one ;-)

I really don't understand why the dis->asm->bin way wouldn't be possible.

After all, there is http://www.pagetable.com/?p=46, "Create your own Version of Microsoft BASIC for 6502", claiming "The source can be assembled into byte-exact versions" with the CC65 package (it doesn't work for me, though).


Posted: Mon Mar 13, 2017 8:09 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I'm sure it can be done, it's all about the details. Be patient, provide good feedback, and you'll probably get what you want.


Posted: Mon Mar 13, 2017 12:47 pm
Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
Unfortunately, there's quite a lot of "It didn't work!" without any further detail, which isn't really conducive to help. For instance, that pagetable link has a .zip with a make.sh that builds everything automatically. Did the built binaries not match expectations? Did the build itself fail? Is your environment set up for it, or did it look like a bug in their posted code?

It sounds like you tried da65/ca65, which is probably a bit complex, but did you try dxa65/xa65? Both disassemblers claim to create output that can be fed straight back into their respective assemblers, with dxa65 specifically saying the result should be byte for byte, if you just want something quick and dirty. (I'm not writing a quick-and-dirty tool, so it's going to have to wait a bit longer. ;) )

Posted: Mon Mar 13, 2017 4:17 pm
Joined: Sun Mar 05, 2017 11:48 am
Posts: 6
Sorry, I was so frustrated because a trivial task wasted that much time.

Just tried dxa/xa on FreeBSD:
Code:
dxa65 -g a000 basic.901226-01.bin >basic.src

and
Code:
xa65 basic.src

puts the starting address ($a000) in the first 2 bytes of the output; I can't find how to omit them ;-)

after running a
Code:
dd bs=2 skip=1 if=a.o65 of=basic.bin

and then the cmp
Code:
cmp basic.901226-01.bin basic.bin


says the files are identical.

Thank you! Thanks for the hint with xa65/dxa65 package!
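For anyone on Windows without dd and cmp, the same check can be sketched in a few lines of Python. This is only an illustration: `round_trips` is a hypothetical helper, and it assumes the assembler output really does start with a 2-byte little-endian load address, as observed above.

```python
def round_trips(original: bytes, assembled: bytes, load_addr: int) -> bool:
    """Compare assembler output against the original binary.

    Assumes the assembled file starts with a 2-byte little-endian
    load address (the same two bytes that dd skips above).
    """
    header, body = assembled[:2], assembled[2:]
    return header == load_addr.to_bytes(2, "little") and body == original

# Stand-in bytes for demonstration, not the real ROM contents.
original = bytes([0x94, 0xE3, 0x7B, 0xE3])
assembled = bytes([0x00, 0xA0]) + original   # $A000 prefix, low byte first
print(round_trips(original, assembled, 0xA000))

# With real files, something like:
#   from pathlib import Path
#   round_trips(Path("basic.901226-01.bin").read_bytes(),
#               Path("a.o65").read_bytes(), 0xA000)
```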


Posted: Mon Mar 13, 2017 6:56 pm
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Dennis wrote:
Tor wrote:
The binary will probably include data too, and that won't disassemble into instructions.

I don't care about the data: if the disassembler can't produce code, it should just print a "db 0x55" or whatever. (That's one problem: some print "db", sometimes with "," and sometimes only spaces are allowed, some print "hex", and the assembler that worked very well up to that data accepts only the other form.)

I suspect this is what Tor is getting at. Some of the data may look like valid instructions, fooling the disassembler. When you get to the first real instruction again, the disassembler may think it's an operand for the previous byte which appeared to be an instruction but was really data. Here's an example form:
Code:
        JSR   <output_the_following_data>
        BYTE  <yada yada yada>             ; (any number of bytes, in any range)
        CLI                                ; Next op code could even look like ASCII data,
        LDA   <variable>                   ; meaning the disassembler can't even take a hint.

The subroutine uses the return address on the stack to get the address of the data. The subroutine may already know how long the data field is, or it may get that information from a byte in the data field itself. Regardless, it adjusts the return address past the data field. Since the disassembler won't know this, it is pretty much guaranteed to be lost at that point, with little hope of recovering. I think a successful disassembler would have to be hand-guided, not automated.
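A small Python sketch of what goes wrong, under made-up assumptions: the routine at $C000 is a hypothetical print-inline-data subroutine, the data bytes are chosen so that one of them happens to look like a 3-byte instruction, and the length table covers only the opcodes used. A naive linear sweep then swallows the real CLI as an operand.

```python
# Instruction lengths for the few opcodes used below (subset of the 6502 map).
LENGTHS = {
    0x20: 3,  # JSR abs
    0x48: 1,  # PHA
    0x0D: 3,  # ORA abs
    0x58: 1,  # CLI
    0xA5: 2,  # LDA zp
}

def linear_sweep(code: bytes) -> list:
    """Return the offsets a naive linear-sweep disassembler treats as opcodes."""
    starts, i = [], 0
    while i < len(code):
        starts.append(i)
        i += LENGTHS.get(code[i], 1)  # unknown bytes count as 1-byte data
    return starts

program = bytes([
    0x20, 0x00, 0xC0,   # JSR $C000 (hypothetical inline-data routine)
    0x48, 0x0D, 0x00,   # inline data: "H", CR, terminator
    0x58,               # CLI   <- real code resumes here, at offset 6
    0xA5, 0x10,         # LDA $10
])

print(linear_sweep(program))  # [0, 3, 4, 7]
```

Offset 6 never shows up as an instruction start: the CR byte ($0D) decodes as ORA absolute and takes the terminator and the CLI as its operand. The output would still reassemble to the same bytes, but it no longer shows the code that actually runs.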

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Mon Mar 13, 2017 7:41 pm
Joined: Sun Apr 10, 2011 8:29 am
Posts: 597
Location: Norway/Japan
Yep, there was a lot of that in the code I used to write back when I used an Apple II at work - e.g. for string output, as in your example. You can always disassemble into .db, but then you might as well "disassemble" it all (every single byte of the program) into one big .db statement, and assemble that. But why disassemble then? The point of disassembling code, the way I see it, would be to get code that makes sense to look at for analysis and modification.


Posted: Mon Mar 13, 2017 7:48 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
But I think it's reasonable for a disassembly to be able to round-trip. If it sometimes gets code or data misidentified, that shouldn't stop the round-trip accuracy. If it only writes out db statements, that's a pretty big fail!
It doesn't even need to worry about branches into the middle of instructions (the BIT trick) so long as it outputs something which is valid and produces the same bytes when assembled.


Posted: Mon Mar 13, 2017 8:02 pm
Joined: Sun Apr 10, 2011 8:29 am
Posts: 597
Location: Norway/Japan
I don't disagree with that, it's just that I think the reason for disassembling the code mostly goes away unless there's some amount of interesting or useful code to read. If one just wants the same code it's easier to just copy the original binary! :)
But it's certainly possible to structure the original code in such a way that a disassembler would be able to output all the instructions correctly, with no hints, until it finally hits the data (at the end, for example) and starts outputting gibberish and/or db statements. It would still have problems setting a useful 'org' I presume, depending on what it starts with (e.g. binary on disk, or directly from a position in memory).
In practice, most programs tend to have intermixed instructions and data though. (That wasn't much of a problem when I wrote a disassembler for the Norsk Data NORD-500 minicomputer, it used a Harvard architecture where instructions and data were completely separated.)


Posted: Mon Mar 13, 2017 8:04 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
If a disassembler can pass the round trip test, then its output is useful for making changes to the code.

I think what it comes down to is what the disassembler does when it runs out of code, or falls out of code into data. If it makes an effort to keep outputting valid syntax, it can pass the test. If it starts printing ??? then it will fail the test.


Posted: Tue Mar 14, 2017 12:21 am
Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
It all depends on the sophistication of the disassembler. Many a simple disassembler has been successfully used for small patches of code on an ad hoc basis. If you want more of a reverse engineering tool, one that lets you identify data segments, tag memory locations symbolically, etc., and has some better inbuilt heuristics, that may be a completely different task.

All depends on the design of the tool. "Disassembler" is a pretty broad term.

