6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sun Nov 10, 2024 7:41 pm

All times are UTC




Post new topic Reply to topic  [ 98 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6, 7  Next
Author Message
PostPosted: Sat Mar 02, 2019 10:20 pm 
Offline

Joined: Tue Nov 10, 2015 5:46 am
Posts: 230
Location: Kent, UK
sark02 wrote:
Some people have a natural talent at reverse-engineering.
If anyone is interested in Williams Robotron and its object management, a chap called Scott Tunstall has done a very detailed disassembly of the code.
Find it via Sean Riddle's (very interesting) website: https://seanriddle.com/robomame.asm


Top
 Profile  
Reply with quote  
PostPosted: Sun Mar 03, 2019 1:26 am 
Offline
User avatar

Joined: Sat Feb 23, 2019 9:11 am
Posts: 38
sark02, fantastic info, thanks for taking the time. I really appreciate your help. This is so motivating! I will certainly come back to your post many times.

I have peeked into the game for a couple of days now. My understanding is, well, limited - - even more so because I've never coded the 6502 so far. I'd still like to give you some feedback on how I work my way towards the "big picture".

Quote:
My approach to the big-picture disassembly was:

My disassembly follows the execution path, i.e. I decide on the basis of actually executed instruction where the opcode, the operands and the data are.

At this point in time my disassembler creates the following file:
Code:
.L0 < $2dfd (jmp)
$4000  a2 3f                 LDX #$3f     
$4002  9a                    TXS         
$4003  a9 2e                 LDA #$2e     
$4005  a2 0c                 LDX #$0c     
$4007  a0 08                 LDY #$08     
$4009  20 9d 44              JSR .closed01
.L4 < $44c1 (rts)

$400c  a9 3a                 LDA #$3a     
$400e  a2 06                 LDX #$06     
$4010  a0 01                 LDY #$01     
$4012  20 9d 44              JSR .closed01
.L5 < $44c1 (rts)

$4015  a2 3f                 LDX #$3f     
$4017  9a                    TXS         
$4018  20 2f 51              JSR .closed13
.L21 < .L20 (rts)

$401b  a9 ea                 LDA #$ea     
$401d  a2 14                 LDX #$14     


The disassembler assigns labels to every point where we leap to (branches, jsr, absolute/indirect jmp, rts). Consequently, I can show the addresses from where I leap to a label w/ "<", together with the type of leap instruction. RTS which do not match a calling JSR (e.g. stack was manipulated) are treated as an own type of leap. The branch instructions are marked depending if the branch was sometimes, always or never executed.

I then take the code and group consecutive instructions. The code (seen as a list of consecutive instructions in memory) is broken into "tiles" at instructions which leap away or instructions which are leaped to from elsewhere. Tiles are then connected automatically using rules, e.g. for sequential execution, normally returning JSR, branch instruction which never branch but which could etc.

Those "stretches" of tiles are a first pretty good approximation of subroutines. I collect the JSR in each stretch and collect to which other stretch/subroutine they leap and return.

So far I have emulated and disassembled only a couple of seconds into the intro of the game. The disassembly therefore has a lot of "holes", because certain parts of the intro have not been executed yet. Still, the game intro runs, so there has to be some structural logic, even if the coverage of code by the execution path is not complete. I'd rather take it slowly in the beginning :)

Quote:
With the known-function-list you can create a call-graph. There will likely be many call graphs - they won't all neatly form one big tree. The call-graph lets to see structure. There will be many shared nodes on the graph. These help identify utility functions.


I know of course what's happening in the game, but I am not yet able to relate it to the disassembly. No surprise there; a steep learning curve is to be expected. The structural information by executed the code is enough, though, to get a first glimpse on the call-graph.

Attachment:
tree0.jpg
tree0.jpg [ 30.58 KiB | Viewed 1892 times ]


The attached graph shows the flow between the stretches of the code, identified by the label of the first instruction of the stretch. The numbers show the count how often the stretch was entered (from the sample execution path). => The helper subroutines are clearly discernible, even with this very crude analysis.

The call-graph as well as the underlying stretches and their tiles are automatically generated from the memory map and disassembly information. The resulting call-graph covers most of the execution paths through the code. There are few locations which the algorithm cannot stitch together correctly, e.g. the cyclical calling in the right sub-call-graph. It offers enough information, though, to guide the next steps.

Quote:
The hardest thing with reverse-engineering is figuring out intent. Not _what_ something does, mechanically, but _why_.

I cannot even imagine how hard this is going to be. It will take ages until I will make good progress into the game. The info from your post and the posts of the others in this thread is a fantastic help, thanks!


Attachments:
into0.asm [21.51 KiB]
Downloaded 60 times
Top
 Profile  
Reply with quote  
PostPosted: Sun Mar 03, 2019 3:13 am 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Perhaps I can give you a head start by walking you through the subroutine labelled "closed03"; this has the highest call count in the above graph. Please feel free to have this on in the background.
Code:
.closed03 < $5148 (jsr)
$4c4f  85 e0                 STA $e0     
$4c51  86 e1                 STX $e1     
$4c53  a9 00                 LDA #$00     
$4c55  85 e2                 STA $e2     
$4c57  a2 08                 LDX #$08     
$4c59  0a         .L8        ASL         
$4c5a  26 e2                 ROL $e2     
$4c5c  06 e0                 ASL $e0     
$4c5e  90 07                 BCC .L7      ; 0
$4c60  18         .L9        CLC         
$4c61  65 e1                 ADC $e1     
$4c63  90 02                 BCC .L7      ; +
$4c67  ca         .L7        DEX         
$4c68  d0 ef                 BNE .L8      ; 0
$4c6a  aa         .L10       TAX         
$4c6b  a5 e2                 LDA $e2     
$4c6d  60                    RTS         
One of the first things to note here is that the addresses shown are not a complete listing; two bytes at $4c65 are missing. I bet, if you examine the code by hand, they'll turn out to be E6 E2 (INC $e2) - because this is an 8x8 multiplication routine, exactly the sort of thing you'd expect to be frequently executed as a leaf function, given a CPU without a multiply instruction of its own.

I think it should be made clearer in your listings when bytes are skipped, as there may be rarely-executed instructions or useful inline data there. Inline data can be particularly helpful for determining the purpose of a block of code.

The two 8-bit operands are provided in A and X, and the first thing the routine does is stuff them into zero-page at $E0 and $E1 respectively, then clear A and a workspace byte at $E2. The Y register isn't touched at all in the whole routine, so it is preserved for the caller to use as an index register of its own (which it does).

X is then initialised to 8 and used as a loop counter. The loop structure consists of the three instructions LDX #8 ; […] ; DEX ; BNE .L8 (backwards branch) and encompasses the majority of the subroutine body. This is an idiom that you should learn to recognise instantly.

After the loop, note that X is not restored from $E1 (as it would be if the author intended to preserve it), but is instead transferred from A, and then A is loaded from the workspace bytes previously initialised. This indicates that both A and X are used to return data to the caller (in this case, a 16-bit product, with X holding the low-order half). We should try to work out what A and $E2 end up with inside the loop.

Returning to .L8, we immediately see ASL ; ROL $E2. This is a two-byte multiprecision left shift, effectively transferring the top bit of A into the bottom bit of $E2 via the Carry flag, and the former top bit of $E2 ends up in the Carry flag. This is further confirmation that they're being treated as a single wide value. But on the first iteration both are still zero, and remain so for the time being.

The following ASL $E0 is where the real action begins. Recall that $E0 contains the former value of A on entry; we've just popped out the top bit of it into the Carry. The very next instruction branches on the Carry, skipping all the rest of the loop body if it is cleared. Clearly the individual bits of the first operand, most-significant first, are crucial to the algorithm.

What happens when a bit is found to be set? Following on from .L9, we see the idiomatic pair CLC ; ADC $E1, adding the second operand to the accumulator. (You could reasonably contract CLC ; ADC pairs into ADD macros, and SEC ; SBC into SUB - since the 6502 has no add-without-carry nor subtract-without-borrow instructions, those pairings are very common indeed.) So one operand is added to the shifted workspace for each set bit in the other operand; this is a classic multiplication algorithm!

The Carry flag now indicates unsigned overflow from the addition, and there is a branch testing it immediately afterwards - which happens to be always-taken for the moment. But the sensible thing to do if the Carry is set is to propagate it into the high half of the workspace at $E2, which can be done with INC $E2 as noted above.

Hopefully if you can follow the above explanation, much more will make sense from now on.


Top
 Profile  
Reply with quote  
PostPosted: Sun Mar 03, 2019 8:32 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10976
Location: England
(Good point about the two missing bytes: it demonstrates that a tracing disassembler can only reach code which is executed, so we become interested in code coverage. It's not enough to run the game, it might be necessary to trigger every possible action and interaction, including every possible carry from one byte into two: so actions which are within 256 coordinate points, or 256 energy points, or 256 rotation units, need to be exercised until their range goes beyond one byte. Assuming they are not single-byte values, of course. It might be enough to annotate every branch which is always taken, or never taken, as something to be investigated manually.)


Top
 Profile  
Reply with quote  
PostPosted: Sun Mar 03, 2019 11:48 am 
Offline
User avatar

Joined: Sat Feb 23, 2019 9:11 am
Posts: 38
Chromatix wrote:

Mess with the best, die like the rest! :)

Quote:
I bet, if you examine the code by hand, they'll turn out to be E6 E2 (INC $e2)

Yes! Nice.

Chromatix, you just jumpstarted my in-depth analysis of the code. Thank you! Your description of the algorithm is very clear. I got a good grasp on what this piece of code is doing.

Chromatix wrote:
I think it should be made clearer in your listings when bytes are skipped,

BigEd wrote:
It's not enough to run the game, it might be necessary to trigger every possible action and interaction

The multiplication algorithm highlights that relying on a disassembler which only covers executed code is a time sink. In order to speed up the analysis I need to fill in the blanks left by the execution thread. This is also necessary for the structural analysis because suddenly formerly disjunct parts of the code might show up as actually hanging together.

I have attached the complete disassembly of the binary data, simply running it linearly through my disassembler. Robotron starts/ w/ a JMP to $4000 which is the part I had included in the earlier post. $2DFD and the first bytes at $4000 are overwritten w/ NOP as part of the initialisation.

Now that I know that .closed03 is 8-bit multiplication I did a bit of research. There are a number of different algorithms, and of course the Robotron one is is among them. No surprise that this forum has it, too: http://forum.6502.org/viewtopic.php?f=2&t=5415&p=65393&hilit=multiplication#p65389 (where you contributed in Jan19) points to "6502 Assembly Language Programming" by Leventhal. Originally, I had wanted to read his "6502 Assembly Language Subroutines" as a 6502 primer, but I'll switch to the predecessor book now.

The multiplication algorithm on https://www.lysator.liu.se/~nisse/misc/6502-mul.html looks faster, but w/o a workbench to actually run and cycle count those algorithms it's difficult to make a judgement call. I haven't started w/ a repository of 6502 idioms and algorithms yet, maybe it's time to contemplate that.


Attachments:
Robotron (raw).asm [373.16 KiB]
Downloaded 59 times
Top
 Profile  
Reply with quote  
PostPosted: Sun Mar 03, 2019 11:53 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10976
Location: England
> I haven't started w/ a repository of 6502 idioms and algorithms yet, maybe it's time to contemplate that.

I can imagine that could be a very useful resource for future adventurers.


Top
 Profile  
Reply with quote  
PostPosted: Sun Mar 03, 2019 12:27 pm 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Quote:
The multiplication algorithm on https://www.lysator.liu.se/~nisse/misc/6502-mul.html looks faster…

Yes, that's a nicer looking algorithm. It saves a branch and increment by sucking the overflow carry bit back into the accumulator as soon as it's produced, which in turn is made possible by processing everything LSB first instead of MSB first. But it's still fundamentally performing an add of one operand for each set bit in the other operand, so is recognisable as a multiply on the same basis.


Top
 Profile  
Reply with quote  
PostPosted: Sun Mar 03, 2019 7:12 pm 
Offline

Joined: Tue Nov 10, 2015 5:46 am
Posts: 230
Location: Kent, UK
If anyone is curious about what Robotron 2084 looks like on the Apple II, and how it compares to the arcade, see this "Let's Compare" video:
https://www.youtube.com/watch?v=ejA0w-PmBZY

The video starts with a clip from the Williams arcade version (6809-based, and uses a hardware blitter to draw the sprites; unlike earlier games like Defender, which drew sprites using software). The Apple II version starts at 8:25.

I remembered that a few years ago Atari released, as open source, the code for a number of games for their 6502-based Atari 7800 system. I just found the dump on github: https://github.com/OpenSourcedGames/Atari-7800/tree/master/ROBOTRON. In the Let's Compare video, this version can be seen at 13:49.

I didn't find the multiply subroutine that Chromatix analyzed, so it was likely an independent implementation. Still, it's an excellent, documented, reference for a real 6502 implementation of the game.


Top
 Profile  
Reply with quote  
PostPosted: Thu Mar 07, 2019 5:20 am 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
Haven't had much time lately, and I'm too tired to do this (I say "16 byte" instead of "16 bit" too often), but I recorded a video anyway, fiddling around with some of the earlier Robotron routines in WFDis. The password is wfdis.

I decided to give vimeo a shot, since they're much less invasive than youtube, but unfortunately it doesn't have 1.5x or 2x playback speed, so you'll have to sit through it all at normal speed, which I can't stand doing anymore. ;)

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Top
 Profile  
Reply with quote  
PostPosted: Thu Mar 07, 2019 8:19 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Thank you so much for that very informative demonstration, WF! One question regarding your stack picture:

Attachment:
short stack.JPG
short stack.JPG [ 13.32 KiB | Viewed 1778 times ]


It looks like your simulation is inferring that $013d and $013e are being altered by the lda pha lda pha at the end of the copypage subroutine, but as I understand it, the 65xx post-decrements S for pushes and pre-increments S for pulls, causing $013e and $013f to be altered when given an initial S of $3f before the jsr.

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
PostPosted: Thu Mar 07, 2019 3:38 pm 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
Yeah, the emulator is really primitive and probably has some edge case bugs in it. Usually the stack effects while running don't need to be recorded in the output, so it hasn't been visible before. This is a weird case of overlapping features, with both JSR/RTS and STAs overwriting each other. But it still works well enough to figure out what goes on there.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Top
 Profile  
Reply with quote  
PostPosted: Fri Mar 08, 2019 8:08 am 
Offline
User avatar

Joined: Sat Feb 23, 2019 9:11 am
Posts: 38
White Flame wrote:
Haven't had much time lately, and I'm too tired to do this (I say "16 byte" instead of "16 bit" too often), but I recorded a video anyway, fiddling around with some of the earlier Robotron routines in WFDis.

Thank you for putting in the time, that was a great intro into how to work with your tool. Very much appreciated.

The adhoc emulation with Shift-R is awesome. I didn't know that WFDis can do that. With regard to the task at hand, it was helpful to see how a subroutine address on the stack can be used to access data immediately after the JSR. That's an important idiom to know.

Prepending labels with L and S is sensible, so I have added this right away to my workbench. I would have expected that jumps were labeled with 'J', but you stick to 'L' here as well. Any particular reason why?

I had originally lowercased mnemonics and addresses (like seen for x86). Most of the snippets on this forum are uppercase, though. In WFDis you decided to lowercase everything. What is your reasoning behind deviating from the common "style guide"? )

The smooth scrolling is a neat idea. I can totally understand why this can help to understand how the different parts of the code hang together when there is much jumping around (as it is with Robotron). I lack the skill of keeping the structure of even smaller stretches of code in mind. For this reason I have put most of the work so far into automated structural analysis of the code. I do this in Python, because the emulator is written in Python, too. BTW Excel doesn't play a big role in the project, for me it's just a convenient notebook with additional intelligence and easy interface to the emulator and tracer.

I think using a manual disassembler is compatible with my reverse engineering approach. I lean on the ideas in Don Lancaster's "Tearing Into Machine-Language Code". He advises against using tools to disentangle the code and rather advocates "Do the dull stuff yourself!". But his method also assumes that those who want to reverse engineer should be versed in 6502. A bit more automated structure discovery is necessary to help my learning.

I'm going to continue working on the analysis over the weekend, let's see if I am able to share interim results. I would certainly consider transferring at least some of the automatically generated info to the listing in WFDis (e.g. caller/callee), so it would be nice if your next version re-enables inline comments. But that's just a nice to have, the tool is certainly powerful as it is. Thanks again for the help!


Attachments:
Tearing Into Machine-Language Code (Lancaster 84).pdf [180.99 KiB]
Downloaded 67 times
Top
 Profile  
Reply with quote  
PostPosted: Fri Mar 08, 2019 9:43 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8539
Location: Southern California
fschuhi wrote:
White Flame wrote:
it was helpful to see how a subroutine address on the stack can be used to access data immediately after the JSR. That's an important idiom to know.

That, and lots more, is in the 6502 stacks treatise which I called "6502 STACKS: More than you thought," this part being in section 7, "Inlining subroutine data."

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Fri Mar 08, 2019 10:24 am 
Offline
User avatar

Joined: Sat Feb 23, 2019 9:11 am
Posts: 38
GARTHWILSON wrote:
fschuhi wrote:
White Flame wrote:
it was helpful to see how a subroutine address on the stack can be used to access data immediately after the JSR. That's an important idiom to know.

That, and lots more, is in the 6502 stacks treatise which I called "6502 STACKS: More than you thought," this part being in section 7, "Inlining subroutine data."

I had already found your treatise a couple of weeks ago! - - Very well explained and thus easy to understand, even for newcomers. In section 11 you say: "JSR and RTS are jumps with special effects on the hardware stack." When I read that it clicked.


Top
 Profile  
Reply with quote  
PostPosted: Fri Mar 08, 2019 10:44 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10976
Location: England
fschuhi wrote:
I lean on the ideas in Don Lancaster's "Tearing Into Machine-Language Code".


That's an amusing read, thanks!
Quote:
You should have a tractor-feed printer along with some heavy white paper, preferably 20-pound. Naturally, you will also need a plastic 6502 Programming Card and, of course, the 6502 Programming Manual.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 98 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6, 7  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 0 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: