
All times are UTC




PostPosted: Sun Sep 24, 2017 2:52 am 

Joined: Tue Jul 24, 2012 2:27 am
Posts: 672
Yep, that's why over the years (the first incarnation of the project started around 1996) I've always been doing various AI implementation research. A purely rote "algorithm" doesn't suffice to understand the code; it also takes exploration, recognition, getting a "sense" for when things aren't working or when more information is needed, etc. Of course, where it's not yet cognizant enough the human steps in to provide the rest, but because both manual and AI work can be supported in one platform (as opposed to batch/offline static analysis), the boundary between manual and automatic can be moved incrementally.

While I have some of the more rational AI working in my internal versions, I do really want the front end much more complete first, so that I can visualize the conclusions of the AI as they're being constructed, instead of just looking at a pile of numbers that are supposed to imply results. And that front end development has a nice effect of being a complete human-driven toolkit for others in the meantime.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


PostPosted: Sun Sep 24, 2017 3:34 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
The better way to do it might be to have the disassembler actually execute the machine code while it's trying to generate assembly-language code from it, starting with the reset routine or the entry point of the application. Then it won't get executed in order, but when it gets to any given part, it can see how much gets treated as data and where the return address gets adjusted to, and things like that. It should still have some human interaction, so you can give meaningful label names to things as you recognize what they're being used for.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Sun Sep 24, 2017 4:33 am 

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
GARTHWILSON wrote:
The better way to do it might be to have the disassembler actually execute the machine code while it's trying to generate assembly-language code from it, starting with the reset routine or the entry point of the application.

You would need the entire system in this case. Or at least initial hints in order to ascertain which memory values are just values, vs I/O (thus changing values outside the scope of the executing code).

But once established, you might then just start feeding random values as input and "see what happens". I would imagine the code will sort itself out rather quickly. But you can also see how difficult it might be if, say, the memory reference is a keyboard, and it's looking for several values, in order (such as the four ASCII values L, I, S, T) before it actually operates on something.

It would be an interesting exercise to say the least.


PostPosted: Sun Sep 24, 2017 6:49 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
You could have something which symbolically executes some limited number of instructions, perhaps following all branches... that would of course be complex, but we're already talking about AI.
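A minimal sketch of the "follow all branches for a limited number of instructions" idea, in Python. It is not how WFDis or any other tool mentioned here works; the opcode table is a deliberately tiny subset, and the names `OPCODES`, `trace` and `PROGRAM` are made up for illustration. It only tracks control flow, not register or memory contents.

```python
# Hypothetical subset of the 6502 instruction set:
# opcode -> (mnemonic, length in bytes, control-flow kind)
OPCODES = {
    0xA9: ("LDA #imm", 2, "next"),
    0x8D: ("STA abs", 3, "next"),
    0xEA: ("NOP", 1, "next"),
    0xD0: ("BNE rel", 2, "branch"),
    0xF0: ("BEQ rel", 2, "branch"),
    0x4C: ("JMP abs", 3, "jump"),
    0x20: ("JSR abs", 3, "call"),
    0x60: ("RTS", 1, "stop"),
}


def trace(memory, base, entry, max_steps=1000):
    """Return the addresses that start an instruction reachable from entry.

    memory[0] lives at address `base`. Both sides of every conditional
    branch are followed; bytes never reached stay classified as data.
    """
    code = set()
    work = [entry]
    steps = 0
    while work and steps < max_steps:
        pc = work.pop()
        off = pc - base
        if pc in code or not (0 <= off < len(memory)):
            continue  # already visited, or outside the image
        info = OPCODES.get(memory[off])
        if info is None:
            continue  # opcode not in the table: stop this path
        _, length, kind = info
        if off + length > len(memory):
            continue  # operand would run past the end of the image
        code.add(pc)
        steps += 1
        nxt = pc + length
        if kind == "next":
            work.append(nxt)
        elif kind == "branch":
            rel = memory[off + 1]
            if rel >= 0x80:
                rel -= 0x100  # signed 8-bit displacement
            work.append(nxt)                   # branch not taken
            work.append((nxt + rel) & 0xFFFF)  # branch taken
        elif kind == "jump":
            work.append(memory[off + 1] | (memory[off + 2] << 8))
        elif kind == "call":
            work.append(memory[off + 1] | (memory[off + 2] << 8))
            work.append(nxt)  # assumption: the subroutine returns normally
        # kind == "stop": RTS ends this path
    return code


# $2000 LDA #$00 / BEQ $2007 / JMP $2000 / JSR $200B / RTS / NOP / RTS / "HI"
PROGRAM = bytes([0xA9, 0x00, 0xF0, 0x03, 0x4C, 0x00, 0x20,
                 0x20, 0x0B, 0x20, 0x60, 0xEA, 0x60, 0x48, 0x49])
reachable = trace(PROGRAM, 0x2000, 0x2000)
```

The "assumption" comment on JSR is exactly where the return-address tricks mentioned earlier in the thread would break a naive tracer, which is one reason human annotations stay useful.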

I'm reminded of the situation with static analysers (expensive and large software which checks high level code, typically C code) - they need to support annotations, such that you have a codebase, which changes as work proceeds, and you have a set of annotations, which gets larger as time goes on, and which informs the tool as to how to proceed with various parts of the code. In some cases the annotations can be made in the source as structure comments, but in other cases the annotations are external to the source.

I'm sure there's more than one disassembler which already supports annotations, in the form of a symbol table and/or markers of code vs data. The next step is to provide an IDE where the annotations can be tweaked interactively. Perhaps WFDis already offers this!


PostPosted: Sun Sep 24, 2017 10:20 am 

Joined: Sun Nov 27, 2011 12:03 pm
Posts: 229
Location: Amsterdam, Netherlands
GARTHWILSON wrote:
The better way to do it might be to have the disassembler actually execute the machine code while it's trying to generate assembly-language code from it, starting with the reset routine or the entry point of the application.

Like I said earlier on, that is what my analyzer does/would have done. Although you don't actually execute it, you sort of emulate emulating it :D


PostPosted: Sun Sep 24, 2017 10:30 am 

Joined: Sun Nov 27, 2011 12:03 pm
Posts: 229
Location: Amsterdam, Netherlands
whartung wrote:
GARTHWILSON wrote:
The better way to do it might be to have the disassembler actually execute the machine code while it's trying to generate assembly-language code from it, starting with the reset routine or the entry point of the application.

You would need the entire system in this case.


You don't. You only need some knowledge about what the calls that cross the boundary do. What goes in, what comes out.

whartung wrote:
Or at least initial hints in order to ascertain which memory values are just values, vs I/O (thus changing values outside the scope of the executing code).

But once established, you might then just start feeding random values as input and "see what happens". I would imagine the code will sort itself out rather quickly. But you can also see how difficult it might be if, say, the memory reference is a keyboard, and it's looking for several values, in order (such as the four ASCII values L, I, S, T) before it actually operates on something.

It would be an interesting exercise to say the least.

You have to deal explicitly with uncertainty.

I once made an optimizer for an assembler for an SIMD graphics processor, when freelancing for an old employer. This dealt explicitly with uncertainty in moving the many sub-instructions to earlier SIMD instructions. Sometimes, the uncertainty still allows the move. Sometimes not, but what it is uncertain about can at least be reported.

Coming back to the topic, if a call goes across the boundary of the code in hand, you can simply deal with the uncertainty in a similar way: you consider what is known, and iterate over the uncertain value ranges, emulating the path ahead multiple times.
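To make "iterate over the uncertain value ranges" concrete, here is a small Python sketch. It assumes the uncertain value is a single byte in the accumulator (say, read from a keyboard register) and models only one `CMP #imm` / `BEQ` pair; the function name and the addresses are hypothetical.

```python
def explore_compare(possible_values, immediate, taken_target, fallthrough):
    """Model `CMP #immediate` followed by `BEQ taken_target`.

    possible_values is the set of byte values the accumulator might hold.
    Returns {next_address: set of values that lead there}, so each path
    ahead can be emulated with a narrower set of candidates.
    """
    outcomes = {}
    for value in possible_values:
        nxt = taken_target if value == immediate else fallthrough
        outcomes.setdefault(nxt, set()).add(value)
    return outcomes


unknown_input = set(range(256))  # nothing is known about the byte yet
paths = explore_compare(unknown_input, ord("L"), 0x3000, 0x2004)
```

After the split, the path at the (made-up) address `0x3000` knows the input was `L`, while the fall-through path knows only that it was not. What remains uncertain on each path can be reported rather than guessed.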


PostPosted: Sun Sep 24, 2017 7:59 pm 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1927
Location: Sacramento, CA, USA
I can imagine that I/O (memory-mapped or otherwise) and interrupts would be the largest sources of uncertainty, but with a bit of carnal knowledge of the system and the expected use-cases it shouldn't always be an insurmountable problem.

Mike B.


PostPosted: Sun Sep 24, 2017 8:21 pm 

Joined: Sun Nov 27, 2011 12:03 pm
Posts: 229
Location: Amsterdam, Netherlands
barrym95838 wrote:
I can imagine that I/O (memory-mapped or otherwise) and interrupts would be the largest sources of uncertainty, but with a bit of carnal knowledge of the system and the expected use-cases it shouldn't always be an insurmountable problem.

True. As unpredictability goes, transient events are probably the worst. Wherever the 6502 vectors are deducible, you'd nevertheless have the start address of the handler(s).
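For reference, a short Python sketch of pulling the handler start addresses out of a ROM image. The vector locations ($FFFA/B for NMI, $FFFC/D for RESET, $FFFE/F for IRQ/BRK, low byte first) are standard 6502 behaviour; the function name and the requirement that the image ends exactly at $FFFF are assumptions made to keep the example short.

```python
VECTORS = {"NMI": 0xFFFA, "RESET": 0xFFFC, "IRQ": 0xFFFE}


def read_vectors(image, base):
    """Return {vector name: handler address} for an image loaded at `base`.

    Assumes the image covers the top of the address space, i.e. its last
    byte sits at $FFFF and its first byte is at or below $FFFA.
    """
    if base + len(image) != 0x10000 or base > 0xFFFA:
        raise ValueError("image must cover $FFFA-$FFFF")
    handlers = {}
    for name, address in VECTORS.items():
        off = address - base
        handlers[name] = image[off] | (image[off + 1] << 8)  # little-endian
    return handlers


# A 16-byte fragment loaded at $FFF0, with made-up handler addresses.
fragment = bytes([0] * 10 + [0x00, 0x80, 0x00, 0xE0, 0x10, 0x80])
handlers = read_vectors(fragment, 0xFFF0)
```

Each address found this way is a natural starting point for a trace, though on systems that copy or bank-switch the vectors the ROM values are only the initial ones.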


PostPosted: Tue Sep 26, 2017 3:22 am 

Joined: Tue Jul 24, 2012 2:27 am
Posts: 672
BigEd wrote:
I'm sure there's more than one disassembler which already supports annotations, in the form of a symbol table and/or markers of code vs data. The next step is to provide an IDE where the annotations can be tweaked interactively. Perhaps WFDis already offers this!
Yeah, that's kind of the definition of an interactive disassembler. ;) Although the "annotations" are behind-the-scenes metadata: the display simply shows the result of bytes being declared as code or data (or graphics, or structs, or whatever) and updates as you edit that metadata to declare something different. Give it a spin with a little Commodore .prg, or a .bin/.rom image that covers the $fffx vectors. When loading a BASIC .prg (i.e., one starting at $0801), it loads in the C64 symbol table. Click & cursor around the bytes, and the major keys are:
  • Shift-A to spawn a new assembly trace at the cursor
  • Shift-L to name a label at the cursor
  • Semicolon for comments
  • Ctrl-Z to undo.
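The "display is just a view of the metadata" idea can be sketched in a few lines of Python. This is not WFDis's internal representation (which the post does not describe); the `kinds` and `labels` dictionaries and the output format are assumptions for illustration only.

```python
from itertools import groupby


def render(memory, base, kinds, labels):
    """Render one line per run of bytes that share the same declared kind.

    kinds maps address -> declared kind; undeclared bytes count as "data".
    labels maps address -> name; in this sketch a label is only shown when
    it falls on the first byte of a run.
    """
    lines = []
    addresses = range(base, base + len(memory))
    for kind, run in groupby(addresses, key=lambda a: kinds.get(a, "data")):
        run = list(run)
        if run[0] in labels:
            lines.append(f"{labels[run[0]]}:")
        hex_bytes = " ".join(f"{memory[a - base]:02X}" for a in run)
        lines.append(f"${run[0]:04X} {kind} {hex_bytes}")
    return lines


memory = bytes([0xA9, 0x00, 0x60, 0x48, 0x49])
kinds = {0x2000: "code", 0x2001: "code", 0x2002: "code"}
labels = {0x2000: "start"}
before = render(memory, 0x2000, kinds, labels)

# Editing the metadata and re-rendering is all an "edit" amounts to here.
kinds[0x2003] = kinds[0x2004] = "text"
after = render(memory, 0x2000, kinds, labels)
```

Because the bytes themselves never change, undo in such a design only has to restore earlier metadata.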

Heck, give me a little binary you're familiar with and I'll demonstrate walking through all the steps, if you haven't used a tool like this before. :) There's still a lot to be done even in that list above, like easily defining labels that are being referenced instead of where the cursor literally is, "stop-signs" in the code so asm traces don't wander off into la-la land when they are intended to take an unexpected exit, etc.


As far as I/O and such goes, there's a big nut to crack around dynamic system state, i.e. which code assumes which system state (hardware and/or software) for which "time". Some things are static during a block of code, or even throughout the entire program, then are overwritten by a new overlay or level loading, and now the same locations are used for a different "static" purpose. So it's all about encapsulating the life cycle of the various meanings.

PostPosted: Fri Oct 20, 2017 11:47 am 

Joined: Wed Oct 18, 2017 10:08 am
Posts: 2
Thank you for this fantastic piece of software.

Here are some points I found while using it:

1. I disassembled an old commercial file where I stripped the range $0000-$2000 because it is RAM-mapped, and gave WFDis the start address of $2000. There are many points where the program accesses the range $0000-$2000, but WFDis does not fill up this range with labels automatically.

3. It would be fantastic if an ASCII representation were shown as a column, at least after data sections.

4. the IRQ-, Reset- and NMI Vectors at the end could be named and labeled ... only the target addresses are labeled right now.

regards
Carsten


PostPosted: Fri Oct 20, 2017 11:56 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
(Welcome Carsten! I hope you like the forum, and stick around, and maybe one day Introduce Yourself.)


PostPosted: Sat Oct 21, 2017 7:07 pm 

Joined: Tue Jul 24, 2012 2:27 am
Posts: 672
DarkStar wrote:
Thank you for this fantastic piece of software.

Thank you for using it! :)

Quote:
1. I disassembled an old commercial file where I stripped the range $0000-$2000 because it is RAM-mapped, and gave WFDis the start address of $2000. There are many points where the program accesses the range $0000-$2000, but WFDis does not fill up this range with labels automatically.

The current workflow is to put the cursor on a reference, and press Shift-C to create a new byte in the address space to hold that referenced value. For instance, if there's a "lda $1003" in the code, putting the cursor on the "$1003" and pressing Shift-C will manifest a byte of value "??" at location $1003. You can then press enter to follow the link and re-label that location.

This is a stopgap, as I'm currently in a major rewrite of how the displayed lines are internally represented, to allow much more flexibility. This is also why updates have been slow. Once I get away from single keystroke commands (because there are simply too many), labeling a referred-to location from an absolute/zeropage reference will be a direct operation. I don't think it's currently the best solution to automatically spam up the disassembly with every byte that's directly referenced, especially since some of those references are intended to cross banking/overlay boundaries, and some are pre-baked offsets against tables.

Quote:
3. It would be fantastic if an ASCII representation were shown as a column, at least after data sections.

I absolutely agree. However, the main issue is which character set to use. On the C64, which is kind of the initial testbed for WFDis reverse engineering, there are two fonts (uppercase/graphics vs lowercase/uppercase), as well as two character code mappings (PETSCII vs direct screen codes), so I'm still figuring out where & how to switch character sets to explore all this.

Plus, there are things like having the high bit set for end-of- or beginning-of-word markers, and games/demos that use even different character code mappings. That's probably not applicable to the "unknown byte" parallel display, which would mimic CLI memory monitor output, just showing standard text decodings on the side. The main idea I'm going to try is having a different color for 0-31, so escape codes show up as their respective letters, and render it in reverse if the high bit is set. Then most of the bytes should be displayed using standard 32-127 ascii characters. But even then, I'd rather have options, especially per platform, for varying that display.
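One possible reading of that side-column scheme, sketched in Python. The post does not say how a byte with the high bit set is decoded, so masking to the low 7 bits first is an assumption here, as are the placeholder for 127 (DEL has no glyph) and the function names.

```python
def side_char(byte):
    """Decode one byte for a monitor-style text column.

    Returns (character, is_control, is_reversed):
    - is_reversed: the high bit was set (to be drawn in reverse video)
    - is_control:  the low 7 bits are 0-31, shown as the matching letter
                   (0x0D -> 'M') and meant to get a different colour
    """
    is_reversed = bool(byte & 0x80)
    low = byte & 0x7F  # assumption: decode the low 7 bits
    if low < 32:
        return chr(low + 64), True, is_reversed
    if low == 127:
        return ".", False, is_reversed  # assumption: placeholder for DEL
    return chr(low), False, is_reversed


def side_column(data):
    """Plain-text version of the column, dropping the colour/reverse flags."""
    return "".join(side_char(b)[0] for b in data)


column = side_column(b"LIST\r")
```

A per-platform option, as suggested above, could then swap `side_char` for a PETSCII or screen-code decoder without touching the rest of the display.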

Quote:
4. the IRQ-, Reset- and NMI Vectors at the end could be named and labeled ... only the target addresses are labeled right now.

Yes, that's a remaining oversight. The initial destinations of those vectors are labeled as entry points, but that was done before I had an "address" type for data, so I couldn't yet represent the $fffx vectors themselves. I will get that in soon.

PostPosted: Sun Oct 22, 2017 11:58 am 

Joined: Wed Oct 18, 2017 10:08 am
Posts: 2
Hi BigEd,

BigEd wrote:
(Welcome Carsten! I hope you like the forum, and stick around, and maybe one day Introduce Yourself.)


I like it and found out that real names are not common in this forum :-)
To minimize my effort to introduce myself, here is a link
http://plus.google.com/+CarstenMeyer

regards
Carsten


PostPosted: Wed Nov 21, 2018 9:29 am 

Joined: Tue Sep 11, 2007 8:48 am
Posts: 27
Location: Pruszków/Poland
I have been using WFDis recently (looking into the TIB PLC DD-01 ROM) and I really like it; it helps a lot when analysing a binary.
Background info: the TIB PLC DD-01 is a cartridge for the C64 with a 37c65 (improved uPD765) floppy disk controller plus a 3.5 inch floppy drive.

There is one little thing I miss though: for a raw (*.bin) file we do not get the labels for kernel/Basic/VIC/SID etc. After a few minutes with wfdis.js I had it working (I also added tables for the CIAs and another table for some of the DD-01 labels as they were listed in the user manual). OK, I needed them for this particular target; I know we may be using WFDis for quite different 6502 systems, and in some cases (e.g. when on the C64 we access the RAM below the ROMs) this may even lead to wrong output.
Could such an extra label-address table (extending the built-in tables) possibly be imported from a file? In some cases, e.g. when we first analyse the hardware and know the I/O addresses, we could prepare a file with labels up front.
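As an illustration of what such a label file could look like, here is a Python sketch that parses a made-up format: one `NAME = $HEX` entry per line, with `;` starting a comment. This is not a format WFDis is known to accept; the format, the function name and the label names are all hypothetical.

```python
import re

# Hypothetical line format: NAME = $HEXADDR   ; optional comment
LINE = re.compile(
    r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*\$([0-9A-Fa-f]{1,4})\s*(?:;.*)?$"
)


def parse_labels(text):
    """Parse label definitions into {address: name}.

    Blank lines and lines starting with ';' are skipped; anything else
    that does not match the format raises ValueError with its line number.
    """
    labels = {}
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            continue
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"line {number}: cannot parse {line!r}")
        labels[int(match.group(2), 16)] = match.group(1)
    return labels


SAMPLE = """\
; C64 I/O registers (example names)
VIC_BORDER = $D020
SID_VOLUME = $D418 ; volume and filter mode
"""
table = parse_labels(SAMPLE)
```

Keeping the table outside the tool also addresses the concern about wrong output: a file written for one target is simply not loaded for another.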

_________________
Practice safe HEX !


PostPosted: Sat Nov 24, 2018 2:04 am 

Joined: Tue Jul 24, 2012 2:27 am
Posts: 672
Yes. I had been putting that off in order to gather a list of file formats suitable for such things, and to see if there could be some smart way of inferring the label format. But given that people are actively using WFDis, it does need some "good enough" features tossed in to support some basic use cases like that.

I've been struggling through the internal rewrite, both in complexity & available time, but I still do occasionally publish bugfixes for the current branch. I'll prioritize the input label files.
