Trying to understand why disassembler output seems incorrect

gwpt
Posts: 2
Joined: 21 Apr 2021

Post by gwpt »

Hi everyone,
This is my first post, so please forgive me if this is a really dumb question! :)

I am trying to reverse engineer the code for a Commodore CBM2022/2023/3022 printer. I have the ROM image and I have used various disassemblers.
For the most part the output makes sense, but I am having trouble understanding some of the branch addresses, as they seem to point into the middle of instructions, which from my understanding is not correct.

Here are some examples:

From http://www.white-flame.com/wfdis/:

Example 1. This one appears not to have disassembled the BIT instructions correctly, so as to force the branch addresses onto valid instruction boundaries ($2C and $24 appear to be the first bytes of BIT instructions):

Sf6ba               lda $c5
Sf6bc               cmp #$0d
                    beq Lf6d3
                    cmp #$8d
                    beq Lf6d3
                    cmp #$0a
                    bne Lf6d5
                    lda $dd
                    bne Lf6d0
                    lda #$0d
                    bne Lf6d3
Lf6d0               lda #$00
                    2c 
Lf6d3               clc
                    24 
Lf6d5               sec
                    rts

Example 2. This time the label is written differently from the usual format (I assume 'Lf9f8 = * + 1' means the label points one byte into the following instruction):

Sf930               ldx #$09
                    lda #$30
Lf934               sta $b4,x
                    dex
                    bne Lf934
                    stx $bf
                    stx $be
                    beq Lf942
Lf93f               jsr Sfa0d
Lf942               jsr Sf6ba
                    bcs Lf94a
                    jmp Lf9f8
                    
Lf94a               cmp #$21

...
                    adc #$01
Lf9f1               sta $be
                    jsr Sfa0d
Lf9f6               sec
Lf9f8 = * + 1       
                    bit $18
                    rts

I tried a different disassembler, https://www.masswerk.at/6502/disassembler.html, and got a different output:

			06BA   A5 C5      LDA $C5
			06BC   C9 0D      CMP #$0D		
			06BE   F0 13      BEQ $06D3
			06C0   C9 8D      CMP #$8D		
			06C2   F0 0F      BEQ $06D3
			06C4   C9 0A      CMP #$0A	
			06C6   D0 0D      BNE $06D5
			06C8   A5 DD      LDA $DD
			06CA   D0 04      BNE $06D0
			06CC   A9 0D      LDA #$0D
			06CE   D0 03      BNE $06D3
			
			
			06D0   A9 00      LDA #$00
			06D2   2C 18 24   BIT $2418
			06D5   38         SEC
			06D6   60         RTS
			06D7   2C 00 02   BIT $0200
			06DA   50 14      BVC $06F0
			06DC   A5 C6      LDA $C6

This time it disassembled the code correctly (this tool shows the addresses as $06xx rather than $F6xx), but there are branches to address 06D3, which is in the 'middle' of the BIT instruction at 06D2.

I then tried BeeDis and got a similar result:

.LF6BC
        CMP     #$0D
        BEQ     LF6D3

        CMP     #$8D
        BEQ     LF6D3

        CMP     #$0A
        BNE     LF6D5

        LDA     L00DD
        BNE     LF6D0

        LDA     #$0D
        BNE     LF6D3

.LF6D0
        LDA     #$00
.LF6D2
        BIT     L2418
LF6D3 = LF6D2+1
.LF6D5
        SEC
        RTS

So, I'm stumped. I thought that branches should always jump to the start of an instruction, not into the middle of one.
Does anyone know what's going on here?
Thank you :)

The ROM is available here: https://www.commodore.ca/manuals/funet/ ... dex-t.html
BillG
Posts: 710
Joined: 12 Mar 2020
Location: North Tejas

Post by BillG »

Jumping into the middle of an instruction is a trick often used to reduce the number of branches in code.

In your first code example, jumping to Lf6d5 sets the carry flag and returns.

Jumping to Lf6d3 clears the carry flag and returns.

Jumping to Lf6d0 clears the accumulator and returns, leaving the carry flag alone.
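A small Python sketch can check this by decoding the same bytes linearly from each entry point. The bytes A9 00 2C 18 24 38 60 are taken from the masswerk listing and placed at $F6D0; the 38 24 18 60 sequence behind 'Lf9f8 = * + 1' in Example 2 is decoded the same way. The opcode table only covers the handful of opcodes in these fragments, and `decode_from` is a hypothetical helper, not part of any of the disassemblers mentioned.

```python
# Decode 6502 bytes linearly from a chosen entry point.
# Assumption: only the opcodes in these fragments are modelled; anything
# else is shown as a raw .byte.
OPCODES = {
    0xA9: ("LDA", "#${:02X}", 2),  # LDA immediate
    0x2C: ("BIT", "${:04X}", 3),   # BIT absolute
    0x24: ("BIT", "${:02X}", 2),   # BIT zero page
    0x18: ("CLC", "", 1),
    0x38: ("SEC", "", 1),
    0x60: ("RTS", "", 1),
}


def decode_from(rom: bytes, base: int, entry: int) -> list[str]:
    """Decode from `entry` until an RTS or the end of `rom` (loaded at `base`)."""
    lines = []
    pc = entry
    while pc - base < len(rom):
        op = rom[pc - base]
        if op not in OPCODES:
            lines.append(f"{pc:04X}  .byte ${op:02X}")
            pc += 1
            continue
        name, fmt, size = OPCODES[op]
        operand = rom[pc - base + 1 : pc - base + size]
        value = int.from_bytes(operand, "little")  # 6502 operands are little-endian
        lines.append(f"{pc:04X}  {name} {fmt.format(value)}".rstrip())
        pc += size
        if name == "RTS":
            break
    return lines


# Bytes from the listings: $F6D0-$F6D6 (Example 1) and $F9F6-$F9F9 (Example 2).
ROM_F6D0 = bytes([0xA9, 0x00, 0x2C, 0x18, 0x24, 0x38, 0x60])
ROM_F9F6 = bytes([0x38, 0x24, 0x18, 0x60])

if __name__ == "__main__":
    for entry in (0xF6D0, 0xF6D3, 0xF6D5):
        print(f"entry ${entry:04X}:", "; ".join(decode_from(ROM_F6D0, 0xF6D0, entry)))
    for entry in (0xF9F6, 0xF9F8):
        print(f"entry ${entry:04X}:", "; ".join(decode_from(ROM_F9F6, 0xF9F6, entry)))
```

Entering at $F6D3 gives CLC / BIT $38 / RTS, and entering at $F9F8 gives CLC / RTS. Note that decoding from $F6D0 shows BIT $2418 consuming the $18 and $24 bytes as its operand, so that path still reaches the SEC.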
gwpt
Posts: 2
Joined: 21 Apr 2021

Post by gwpt »

Ah ha! Thanks BillG, That makes sense now.

So if I wanted to create source code that I could then compile, from my disassembled code, what would I do in this case? (I assume I would need to expand it out to more than one branch. And that this code is the result of a compiler optimisation trick?)
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Post by BigDumbDinosaur »

gwpt wrote:
Ah ha! Thanks BillG, That makes sense now.

Something else that can trip up disassembly is the presence of data tables in the middle of code. Commodore's firmware often will have a section of code, followed by a block of data which is used by the preceding code, followed by another section of code, followed by more data, etc. Disassembly will result in gibberish when a data block is encountered.
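One common way around this is a recursive-traversal (flow-following) disassembler: instead of decoding every byte in order, it starts from known entry points and follows branches, jumps and calls, so bytes that no code path reaches (such as a data table) are left as data. Below is a minimal Python sketch, assuming a tiny hypothetical code fragment at $1000 and an opcode table that covers only the opcodes it uses; `trace_code` and `data_bytes` are illustrative names.

```python
# Recursive-traversal sketch: follow control flow from known entry points
# and treat every byte no path reaches as data.
# Assumption: only the opcodes below occur on reachable paths.
OPS = {
    0xA9: (2, "next"),    # LDA #imm
    0xC9: (2, "next"),    # CMP #imm
    0xF0: (2, "branch"),  # BEQ rel
    0xD0: (2, "branch"),  # BNE rel
    0x20: (3, "call"),    # JSR abs
    0x4C: (3, "jump"),    # JMP abs
    0x60: (1, "stop"),    # RTS
}


def trace_code(rom: bytes, base: int, entries: list[int]) -> set[int]:
    """Return the addresses of every instruction start reachable from `entries`."""
    starts: set[int] = set()
    todo = list(entries)
    while todo:
        pc = todo.pop()
        while base <= pc < base + len(rom) and pc not in starts:
            op = rom[pc - base]
            if op not in OPS:
                break  # unmodelled opcode: stop following this path
            size, kind = OPS[op]
            starts.add(pc)
            nxt = pc + size
            if kind == "branch":
                offset = rom[pc - base + 1]
                if offset >= 0x80:
                    offset -= 0x100  # relative offset is a signed byte
                todo.append(nxt + offset)  # taken path; fall-through continues below
            elif kind in ("call", "jump"):
                # Queue the target; a JSR also falls through to the next instruction.
                todo.append(int.from_bytes(rom[pc - base + 1 : pc - base + 3], "little"))
                if kind == "jump":
                    break
            elif kind == "stop":
                break
            pc = nxt
    return starts


def data_bytes(rom: bytes, base: int, starts: set[int]) -> list[int]:
    """Addresses not covered by any traced instruction (candidate data)."""
    covered = set()
    for pc in starts:
        size = OPS[rom[pc - base]][0]
        covered.update(range(pc, pc + size))
    return [addr for addr in range(base, base + len(rom)) if addr not in covered]


# Hypothetical fragment at $1000: code, a 3-byte table at $1005, then more code.
ROM = bytes([
    0xA9, 0x00,        # $1000 LDA #$00
    0x4C, 0x08, 0x10,  # $1002 JMP $1008
    0x10, 0x20, 0x30,  # $1005 data table
    0xC9, 0x01,        # $1008 CMP #$01
    0xF0, 0x01,        # $100A BEQ $100D
    0x60,              # $100C RTS
    0x60,              # $100D RTS
])

if __name__ == "__main__":
    starts = trace_code(ROM, 0x1000, [0x1000])
    print("code at:", [f"${a:04X}" for a in sorted(starts)])
    print("data at:", [f"${a:04X}" for a in data_bytes(ROM, 0x1000, starts)])
```

A plain linear sweep of the same bytes would read the table's $10 $20 as BPL and fall out of step, never decoding the CMP at $1008. Real code still needs some hand-marking, since targets of indirect jumps such as JMP ($xxxx) can't be followed this simply.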

Quote:
So if I wanted to create source code that I could then compile, from my disassembled code, what would I do in this case? (I assume I would need to expand it out to more than one branch. And that this code is the result of a compiler optimisation trick?)

To be pedantic, one does not compile assembly language. :D

The use of BIT opcodes as described by Bill is a design feature incorporated into the source code by the programmer. An assembler essentially maintains a one-for-one correspondence between an assembly language instruction in the source code and a machine code instruction in the resulting binary. Unlike, say, a C compiler, an assembler generally does not attempt to optimize anything. If you write source code that is inefficient, the assembler will assemble it as written.
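On the practical question of getting source that reassembles to the same bytes: one approach, which the white-flame output in the first post already hints at, is to emit the BIT opcode as a plain data byte so every label lands on a real instruction. The directive name varies by assembler (`.byte`, `!byte`, `DFB` and so on). As a sanity check, here is a toy Python assembler (hypothetical, handling only LDA #imm, CLC, SEC, RTS and `.byte`, with labels in column 1) that assembles that source and compares the result with the ROM bytes.

```python
# Toy assembler for just enough 6502 to rebuild the $F6D0 fragment.
# Assumption: hypothetical syntax; labels start in column 1, ';' starts a comment.
IMPLIED = {"CLC": 0x18, "SEC": 0x38, "RTS": 0x60}
IMMEDIATE = {"LDA": 0xA9}

SOURCE = """
LF6D0   LDA #$00
        .byte $2C   ; BIT abs opcode: the next two bytes become its operand
LF6D3   CLC
        .byte $24   ; BIT zp opcode: the next byte becomes its operand
LF6D5   SEC
        RTS
"""


def assemble(source: str, origin: int) -> tuple[bytes, dict[str, int]]:
    """Assemble `source` at `origin`; return the bytes and the label addresses."""
    out = bytearray()
    labels: dict[str, int] = {}
    for raw in source.splitlines():
        code = raw.split(";", 1)[0]          # drop any comment
        if not code.strip():
            continue                         # blank or comment-only line
        tokens = code.split()
        if not code[0].isspace():            # anything in column 1 is a label
            labels[tokens.pop(0)] = origin + len(out)
        if not tokens:
            continue                         # label on its own line
        op = tokens[0].upper()
        if op == ".BYTE":
            out.append(int(tokens[1].lstrip("$"), 16))
        elif op in IMPLIED:
            out.append(IMPLIED[op])
        elif op in IMMEDIATE and len(tokens) > 1 and tokens[1].startswith("#$"):
            out += bytes([IMMEDIATE[op], int(tokens[1][2:], 16)])
        else:
            raise ValueError(f"unsupported line: {raw!r}")
    return bytes(out), labels


ROM_F6D0 = bytes([0xA9, 0x00, 0x2C, 0x18, 0x24, 0x38, 0x60])  # from the listings

if __name__ == "__main__":
    code, labels = assemble(SOURCE, 0xF6D0)
    print(code.hex(" ").upper(), "matches ROM:", code == ROM_F6D0)
    print({name: f"${addr:04X}" for name, addr in labels.items()})
```

The labels come out at $F6D0, $F6D3 and $F6D5, so branches to them assemble normally while the output stays byte-identical to the ROM.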
BillG
Posts: 710
Joined: 12 Mar 2020
Location: North Tejas

Post by BillG »

That is most likely hand-written assembly language code. Compilers tend not to jump into the middle of instructions like that.

I would comment your code something like this:

.LF6D0
        LDA     #$00
.LF6D2
        BIT     L2418
LF6D3 = LF6D2+1   ; Jumps into middle of above instruction
                  ; Sees clc
                  ;      bit $38
                  ;      rts
.LF6D5
        SEC
        RTS
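I previously missed the fact that the first BIT instruction uses absolute mode, so it consumes both the $18 (CLC) and the $24 bytes as its operand and execution continues at SEC; jumping to LF6D0 therefore also sets the carry flag.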
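To double-check this reading, a tiny Python sketch can step through the same seven bytes from each label and track just the accumulator and the carry. It models only the opcodes in this fragment; BIT is treated as changing only N, V and Z (as the 6502 defines it), and the memory read BIT performs is ignored. The function name `run_from` is hypothetical.

```python
# Step through the $F6D0 fragment from a given entry, tracking only A and carry.
# Assumption: just these six opcodes are modelled; N/V/Z flags and the memory
# read that BIT performs are ignored.
ROM_F6D0 = bytes([0xA9, 0x00, 0x2C, 0x18, 0x24, 0x38, 0x60])
BASE = 0xF6D0


def run_from(entry: int, a: int, carry: int) -> tuple[int, int]:
    """Return (A, carry) at the RTS when execution starts at `entry`."""
    pc = entry
    while True:
        op = ROM_F6D0[pc - BASE]
        if op == 0xA9:      # LDA #imm
            a = ROM_F6D0[pc - BASE + 1]
            pc += 2
        elif op == 0x2C:    # BIT abs: carry untouched; skip 2 operand bytes
            pc += 3
        elif op == 0x24:    # BIT zp: carry untouched; skip 1 operand byte
            pc += 2
        elif op == 0x18:    # CLC
            carry = 0
            pc += 1
        elif op == 0x38:    # SEC
            carry = 1
            pc += 1
        elif op == 0x60:    # RTS
            return a, carry
        else:
            raise ValueError(f"opcode ${op:02X} not modelled")


if __name__ == "__main__":
    for entry in (0xF6D0, 0xF6D3, 0xF6D5):
        for carry_in in (0, 1):
            a, c = run_from(entry, a=0x41, carry=carry_in)
            print(f"${entry:04X} carry in={carry_in}: A=${a:02X} carry out={c}")
```

Whatever the carry was on entry, LF6D0 returns A=0 with carry set, LF6D3 returns carry clear, and LF6D5 returns carry set, with A unchanged in the last two cases.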
BillG
Posts: 710
Joined: 12 Mar 2020
Location: North Tejas

Post by BillG »

BigDumbDinosaur wrote:
Something else that can trip up disassembly is the presence of data tables in the middle of code. Commodore's firmware often will have a section of code, followed by a block of data which is used by the preceding code, followed by another section of code, followed by more data, etc. Disassembly will result in gibberish when a data block is encountered.
System calls in Apple ProDOS are like that:

    jsr     $BF00
    db      function code
    dw      parameter list address
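The called routine finds that inline data through the return address JSR leaves on the stack. JSR pushes the address of its own last byte, so the data starts one byte after the stacked value, and the routine has to bump the stacked value past the data before RTS (which adds 1 to the address it pulls). Here is a Python sketch of that arithmetic, assuming the usual MLI layout of one command byte followed by a two-byte parameter-list pointer. The memory contents and helper names are hypothetical, and this is not a claim about how the ProDOS code itself is written.

```python
# Return-address arithmetic for a routine that takes inline data after its JSR.
# Assumption: MLI-style layout of 1 command byte + 2-byte pointer; the memory
# contents and helper names are hypothetical.
MEMORY = {
    0x2000: 0x20, 0x2001: 0x00, 0x2002: 0xBF,  # $2000 JSR $BF00
    0x2003: 0xC8,                              # command byte (placeholder value)
    0x2004: 0x00, 0x2005: 0x30,                # pointer to parameter list, lo/hi
}


def jsr_pushed_address(jsr_addr: int) -> int:
    """JSR pushes the address of its own last byte, i.e. jsr_addr + 2."""
    return (jsr_addr + 2) & 0xFFFF


def read_inline(memory: dict[int, int], pushed: int, count: int) -> tuple[list[int], int]:
    """Read `count` inline bytes after the JSR; return them and the adjusted stacked address."""
    data = [memory[(pushed + 1 + i) & 0xFFFF] for i in range(count)]
    return data, (pushed + count) & 0xFFFF


def rts_target(pushed: int) -> int:
    """RTS pulls the stacked address and resumes one byte after it."""
    return (pushed + 1) & 0xFFFF


if __name__ == "__main__":
    pushed = jsr_pushed_address(0x2000)
    (command, lo, hi), new_pushed = read_inline(MEMORY, pushed, 3)
    print(f"command ${command:02X}, parameter list at ${(hi << 8) | lo:04X}")
    print(f"resumes at ${rts_target(new_pushed):04X}")
```

With the JSR at $2000, the data sits at $2003-$2005 and execution resumes at $2006, just past the inline bytes.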
Another common technique is putting a text string inline:

    jsr     PrintInline
    db      'Hello world.'
    db      0
Turbo Pascal for CP/M does this extensively.
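For disassembly, the catch is that the tool has to be told which routines take inline data; otherwise it decodes the string as instructions. Below is a minimal Python sketch of a linear sweep that knows about one such routine. The address $9000 for PrintInline, the tiny opcode table and the `sweep` function are all hypothetical.

```python
# Linear-sweep sketch that knows some routines are followed by an inline,
# zero-terminated string. Assumption: PrintInline lives at a hypothetical $9000,
# and only the opcodes in SIZES appear in the code being swept.
SIZES = {0x20: 3, 0x60: 1, 0xA9: 2}  # JSR abs, RTS, LDA #imm


def sweep(rom: bytes, base: int, inline_string_routines: set[int]) -> list[tuple[int, str, bytes]]:
    """Return (address, kind, raw bytes) items, where kind is 'insn' or 'text'."""
    items = []
    pc, end = base, base + len(rom)
    while pc < end:
        i = pc - base
        op = rom[i]
        size = SIZES[op]  # KeyError for anything outside the modelled subset
        items.append((pc, "insn", rom[i:i + size]))
        pc += size
        if op == 0x20 and int.from_bytes(rom[i + 1:i + 3], "little") in inline_string_routines:
            stop = rom.index(0, pc - base)           # the string's zero terminator
            items.append((pc, "text", rom[pc - base:stop]))
            pc = base + stop + 1                     # code resumes after the terminator
    return items


PRINT_INLINE = 0x9000  # hypothetical address of PrintInline
ROM = bytes([0x20, 0x00, 0x90]) + b"Hello world.\x00" + bytes([0x60])  # at $0800

if __name__ == "__main__":
    for addr, kind, raw in sweep(ROM, 0x0800, {PRINT_INLINE}):
        shown = raw.decode("ascii") if kind == "text" else raw.hex(" ").upper()
        print(f"${addr:04X}  {kind:4}  {shown}")
```

The string is reported as text at $0803 and the sweep picks up the RTS at $0810, right after the terminating zero.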