reverse engineering Robotron 2084 for the Apple II

BigEd · Post by **BigEd** » Mon Mar 11, 2019 12:37 pm

It's a good point - if you have a disassembler already, then that's a way to check if a block is code or not. (You might need to be a little fuzzy - 90% good disassembly means it's code, for example.) Nice idea on spotting too many BRKs. I'm not aware of any guidance on doing this.

White Flame · Post by **White Flame** » Mon Mar 11, 2019 7:07 pm

fschuhi wrote:

This is usually true, of course, but an SEC/BCS combination (twice in the code) does really never fall through by definition.

As you're talking about a stretch, and not a tile in your specific vocabulary, the above is not technically true. If the BCS is a leap target, then even though a SEC immediately precedes it, it only does so in one code path and thus doesn't always branch.

Quote:

A huge helper for understanding the 6502 details will be fine-grained LD/ST tracking. The issue is less how to do it in the emulator but rather how to constrain it. One of the reasons for this is that Python is simply too slow for doing this kind of on the fly data collection. But the main reason is that the amount of data generated from recording all LD/ST is massive which would actually hurt the understanding. There are certainly many design choices how to approach this conundrum and find a good balance - - any ideas are welcome.

I do this in the server in my tools (btw, the server based one isn't in a running state, the TODO list is large and has many codependent items that need to come together), but LD/ST isn't the only way that things in the CPU state get set. Things like TAX are a write to .X, and INX both reads the old state of .X and writes a new one as well. It's also very useful to track status register flags in the same way. It's handy as a cross-reference list, and isn't that bad when displaying it as extra context off to the side of each instruction.

fschuhi wrote:

For a quick statistical test to check if a block of bytes s code or not I would just point the static disassembler to the first byte of the block, disassemble until the last byte and then check for any unknown ops. Optional checks might be multiple BRK or ORA, or any repetition of bytes >4 times. Are there any standard approaches?

That's probably reasonable. Any BRK or RTS should probably end the attempt, and then see if it's worth keeping. Personally, I think this is a reasonable place for a neural net, as it's literally a classification problem.

fschuhi · Post by **fschuhi** » Mon Mar 11, 2019 8:46 pm

White Flame wrote:

fschuhi wrote:

This is usually true, of course, but an SEC/BCS combination (twice in the code) does really never fall through by definition.

As you're talking about a stretch, and not a tile in your specific vocabulary, the above is not technically true. If the BCS is a leap target, then even though a SEC immediately precedes it, it only does so in one code path and thus doesn't always branch.

Correct, I need to keep that in mind, the SEC needs to be on the same tile. Furthermore, as explained by BigEd: An SEC can be anywhere on the execution path before the BCS, making the BCS branch always. The situation cannot be resolved clearly.

Quote:

but LD/ST isn't the only way that things in the CPU state get set. Things like TAX are a write to .X, and INX both reads the old state of .X and writes a new one as well.

I had only the memory load/store in mind, but of course there is also the whole class of bit operations which operate on the memory. I can hook into the read/write of the simulated memory and collect the info. Even that can quickly lead to just too many references, especially with indexed access, so I need to condense it, maybe grouping by page.

Quote:

Any BRK or RTS should probably end the attempt, and then see if it's worth keeping.

Also RTI, because the Apple II doesn't work with interrupts (only for serial cards or mouse cards, nothing I have to care about here).

Quote:

Personally, I think this is a reasonable place for a neural net, as it's literally a classification problem.

My first reaction was "that would be an overkill"

but depending on the task it might actually be a good idea to train the nn on detecting the edges between code and non-code. Does your server use machine learning on the code?

For the task at hand I think I can get to good results by flood filling .byte sections from known entry points, e.g. branches immediately before (i.e. falling through into) the .byte block, or a branch into the .byte block which never executed. The stop condition would then be an (unmatched, see below) RTS, because even newly disassembled absolute JMPs and JSRs could be treated as hints where to continue to disassemble. Indirect JMP needs to be treated as a stop condition, too, because it's not possible to resolve it statically.

You use flood-filling in your server, right? How do you treat JSR? Most of the JSR will return with a regular RTS, so spidering the subroutine(s, recursively) is sensible. On the other hand, some JSR return to a point after the params which follow immediately behind the JSR, e.g. the into stuff you disassembled in the video. I will probably start w/ JSR as stop condition for the spidering, then relax this with JSRs to stretches known to be compact.

Tor · Post by **Tor** » Tue Mar 12, 2019 9:01 am

For me, with my particular vision issues, $03ea9 is easier to read than $3EA9. Likewise $4abc vs $4ABC
The problem (for me) with the capital variant is that capital letters tend to lean more vertically up on the digits (e.g. $03E, vs $03e), and even letter-to-letter, so it's slightly harder for me to read. In other words, it's faster and more accurate for me to read lower case hex numbers. I don't do well with vertically close lines.

White Flame · Post by **White Flame** » Tue Mar 12, 2019 9:21 am

fschuhi wrote:

My first reaction was "that would be an overkill"

but depending on the task it might actually be a good idea to train the nn on detecting the edges between code and non-code. Does your server use machine learning on the code?

I haven't done anything with NNs looking at code bytes yet. It's obviously very platform-specific, too, and I'd want to be fully cross-platform.

Quote:

You use flood-filling in your server, right? How do you treat JSR?

The target for the server is strongly associated with symbolic execution. Every byte of code needs an impetus for being declared as code, rooted by the file format's entry points and the user's declaration of starting points. Automatically disassembling totally unused code in the file is something I'm not going to tackle soon; finding out how to find all the code paths (indirection, stack games, loaded overlays, interrupts, runtime codegen, self-modification) is what I want it to do instead.

BigEd · Post by **BigEd** » Tue Mar 12, 2019 9:30 am

It might be interesting to note that the usual unsophisticated approach is to run the disassembler over everything, and then to examine the failures, which might be code blocks or misaligned disassembly, and only eventually to distinguish any code which might be unreachable. (I would think unreachable code would be rather unlikely, and data which successfully disassembles also unlikely. But both could happen, especially on a fine-grained view.)

fschuhi · Post by **fschuhi** » Sat Mar 16, 2019 8:02 pm

I've continued working on a way to generate the "big picture" of what's happening. As an example, here is the screen after a couple of seconds of execution:

This is the corresonding auto-generated call-tree.

JSR = green
JMP = red
branchings (if taken) = blue
RTS if not paired with JSR

The paired RTS are not shown, if they did they would return to the sources of the green arrows. The showgext sub displays "Atari presents" at the top of the screen. The string is supplied as param immediately after the JSR to showtext, therefore the RTS does not match that JSR and is shown in yellow.

The nodes are "stretches" of code which the automatic analysis shows to be related. For the nodes with a box shape the analysis determines that it is a regular subroutine. Arrows shown are between stretches, not between the actually leaping instruction (which is somewhere in a stretch) and the target it leaps to (which is in the same or a another stretch). This pruning of the call tree is important, especially for bigger trees, because with otherwise the huge number of arrows and nodes gets confusing and unwieldly. As a side effect, complex stretches with many branches catch the eye (init01, showtext with JMP-loops inside the stretch).

The call tree gives a good impression of the overall structure and gives hints on the execution flow. For small call trees that's fine, but for the bigger ones I am currently trying to understand it is difficult to discern what comes first. I've therefore added the following alternative view:

Here the nodes are ordered vertically by the cycle count of the first execution of that particular stretch. (The nodes are randomly ordered in horizontal direction, to fill the available space as best as possible). You can follow the init01 doing its work in the beginning, then the intro starts with the title scene which contains a part which shows the stats on the right side of the screen, then the "Atari presents" and then the noisily dotted "Robotron" animation.

printChar is used twice: first to show the score, later to show the "Atari presents". A node is located where its stretch executed for the first time, so there is an upward JSR arrow from showtext to printChar. Showing nodes only once is another type of pruning which has turned out really helpful for bigger trees.

The second alternative also shows major loops. If I press space then the intro screen changes to the control selection:

If I press escape then the intro starts over with the "Atari presents" and the "Robotron" visualisation. The corresponding call trees:

The call tree shows the return to the intro sequence, but using the cycle-timed layout the flow of execution becomes more evident, with the yellow RTS jumps showing how the code is stitched together:

Together with the load/save snapshot feature of the workbench this has really given my understanding a good push forward. But it's of course just the superficial picture of the executing code. More insight will have to come from not only seeing the flow but also understanding which stretches read and manipulate memory. I haven't started with this topic yet, though.

White Flame · Post by **White Flame** » Sat Mar 16, 2019 10:50 pm

fschuhi wrote:

Here the nodes are ordered vertically by the cycle count of the first execution of that particular stretch.

That's a great idea. I've personally never gotten much value out of call graphs, but this effectively turns it into a readable flowchart. Nice!

Chromatix · Post by **Chromatix** » Sun Mar 17, 2019 1:23 am

Now might be a good time to mention another idiom, not restricted to the 6502: a subroutine may be entered using JMP instead of using JSR immediately followed by RTS. This is known as a tail-call optimisation. On the 6502 it saves 1 byte and 6 cycles.

GARTHWILSON · Post by **GARTHWILSON** » Sun Mar 17, 2019 2:50 am

Chromatix wrote:

Now might be a good time to mention another idiom, not restricted to the 6502: a subroutine may be entered using JMP instead of using JSR immediately followed by RTS. This is known as a tail-call optimisation. On the 6502 it saves 1 byte and 6 cycles.

actually 9 cycles, since JMP is 3 cycles shorter than JSR (3 versus 6), and the saved RTS is another 6.

fschuhi · Post by **fschuhi** » Sun Mar 17, 2019 10:55 am

GARTHWILSON wrote:

Chromatix wrote:

Now might be a good time to mention another idiom, not restricted to the 6502: a subroutine may be entered using JMP instead of using JSR immediately followed by RTS. This is known as a tail-call optimisation. On the 6502 it saves 1 byte and 6 cycles.

actually 9 cycles, since JMP is 3 cycles shorter than JSR (3 versus 6), and the saved RTS is another 6.

That's a nice one, thanks for pointing that out. So far I've only encountered simple loops and "exceptions" (PLA/PLA/JMP), but the main code is riddled with JMP, there might be the one or the other tail call among them.

BigEd · Post by **BigEd** » Sun Mar 17, 2019 1:16 pm

This reminds me of another thing, or three...

- a routine can fall through into the routine that it would have tail-called, if it is positioned right afterwards. Just the same, but without the JMP.
- a routine which needs to be run twice, such as a nibble-processing routine, can be called immediately before falling through:

Code: Select all

... some code
JSR dothething
dothething:
... the thing to be done twice
RTS

- and even again for something to be done four times (and I do believe I've seen this done, perhaps to deal with 4 bytes of a 32-bit value):

Code: Select all

... some code
JSR dotwothings
dotwothings:
JSR dothething
dothething:
... the thing to be done twice or even four times
RTS

fschuhi · Post by **fschuhi** » Thu Mar 21, 2019 7:46 am

I've managed to understand some important subroutines completely, on my own, for the first time ever

(Don't worry, I won't bother you every time when I understand anything new, it's just a heads-up

)

This is a snippet which is part of the game's intro screen:

Code: Select all

$456c  atariPresents JSR showtext      
                     ; $0d marks new line, followed by x/y coord
                     ; x in chars (0-39), y in hires pixel lines
$456f                .byte 0d 0c 12 "ATARI PRESENTS:" 00
$4582  doneAtari     RTS

showText is pretty straightforward:

Code: Select all

                                                              
$51b8  showtext      PLA               ; save return address
$51b9                STA $22           ;   will be incremented while reading chars
$51bb                PLA               
$51bc                STA $23           
                               
$51be  J51be         JSR readNextChar  ; main loop, char in A
$51c1  R51c1         TAX               
$51c2                BNE L51cb         ; is it #$00?
                               
$51c4  L51c4         LDA $23           ; yes, we reached the end
$51c6                PHA               ; point stack to last byte of buffer (#$00)
$51c7                LDA $22           
$51c9                PHA               
$51ca                RTS               ; RTS to address after the stash ending #$00
                               
$51cb  L51cb         CMP #$0d          ; start new line?
$51cd                BNE L51dc         
                               
$51cf  L51cf         JSR readNextChar  ; yes, load x-position (in chars)
$51d2  R51d2         STA $e0           ; $e0 = x (0-39)
$51d4                JSR readNextChar  ; load y-position (hires lines)
$51d7  R51d7         STA $e1           ; $e1 = y (0-192)
$51d9                JMP J51be         ; start reading the actuals chars
                               
$51dc  L51dc         LDX $e0           ; read char is #$00 or #$0d
$51de                LDY $e1           ;   => display actual char (in A) @ $e0/$e1
$51e0                JSR printChar     
$51e3  R51e3         INC $e0           ; move output pos to the right for next char
$51e5                JMP J51be         
                               
$51e8  readNextChar  INC $22           ; next char in stash (after JSR)
$51ea                BNE L51ee         
$51ec                INC $23           ; stash can be > 255 bytes
                               
$51ee  L51ee         LDY #$00          
$51f0                LDA ($22),Y       ; load char from stash
$51f2                RTS

Understanding the printChar, on the other hand, needed info about how the Apple II deals with hires graphics. The way the addressing of pixels works was optimised for needing as little hardware as possible. Certainly ingenious at the time, but from today's vantage point it resides in the "Nostalgia" board of this forum for a reason, I suppose.

TL;DR: The layout of the pixels on screen differs substantially from the layout of bits in screen memory.

If I remember correctly, this was the point in high school where I stopped dealing with Apple II lowlevel stuff and branched off to Turbo Pascal etc. Today I have access to many excellent primers on the net, including PDFs of Apple II manuals, so "understanding on my own" doesn't really hit the mark, of course.

Looking at printChar, it's not difficult at all to deal with the arcance addressing scheme:

Code: Select all

$5108  printChar     STY $fc           ; Y has hires line to print to (below)
$510a                STY $fe           ;   ($fc) = base for hires area
$510c                LDY #$40          ; will be ROLd => $8000 first offset
$510e                STY $09           
$5110                SEC               
$5111                SBC #$20          ; ascii table begins w/ <space>
$5113                ASL               ; A has char, each has 8 pixel lines
$5114                ASL               ; 3x ASL + ROL = x8 (2 bytes)
$5115                ASL               
$5116                ROL $09           ; ($08) has point to char pixel bytes
$5118                STA $08           
$511a                LDY #$07          ; each char has 8 pixel lines (BPL)
                               
$511c  L511c         LDA ($fc),Y       ; determine offset into hires for line
$511e                STA $5129         
$5121                LDA ($fe),Y       
$5123                STA $512a         
$5126                LDA ($08),Y       ; load char pixel line
$5128                STA $2900,X       
$512b                DEY               
$512c                BPL L511c         BPL goes from #$0-7, i.e. correct first offset
                               
$512e  L512e         RTS

A neat feature is the self-modifying code in L511c. $fd contains $12 and $ff contains $13, so the tables in $1200 and $1300 are the lo and hi bytes of the start addresses of the pixel lines. What remains is adding the horizontal character position.

: hi.jpg (11.48 KiB) Viewed 3531 times

BigEd · Post by **BigEd** » Thu Mar 21, 2019 8:30 am

A milestone - well done!

White Flame · Post by **White Flame** » Thu Mar 21, 2019 7:48 pm

fschuhi wrote:

I've managed to understand some important subroutines completely, on my own, for the first time ever

(Don't worry, I won't bother you every time when I understand anything new, it's just a heads-up

)

Congratulations! I certainly don't mind you spewing your discoveries on this thread; it's basically your thread, and a nice log for posterity as others might be doing the same thing. I didn't spend that much time in the code, so it's good to see that you're not hitting any barriers in tackling it for real. As a port, hopefully it'll not be super tied into any hardware specifics, though it doesn't look like the Apple II has too many hardware specifics to begin with.

reverse engineering Robotron 2084 for the Apple II

Re: reverse engineering Robotron 2084 for the Apple II

Re: reverse engineering Robotron 2084 for the Apple II

Re: reverse engineering Robotron 2084 for the Apple II

Re: reverse engineering Robotron 2084 for the Apple II

Re: reverse engineering Robotron 2084 for the Apple II

Re: reverse engineering Robotron 2084 for the Apple II

Re: reverse engineering Robotron 2084 for the Apple II

Re: reverse engineering Robotron 2084 for the Apple II

Re: reverse engineering Robotron 2084 for the Apple II

Re: reverse engineering Robotron 2084 for the Apple II

Re: reverse engineering Robotron 2084 for the Apple II

Re: reverse engineering Robotron 2084 for the Apple II

Re: reverse engineering Robotron 2084 for the Apple II

Re: reverse engineering Robotron 2084 for the Apple II

Re: reverse engineering Robotron 2084 for the Apple II