6502.org Forum

All times are UTC




Posted: Mon Apr 26, 2021 2:33 am

Joined: Wed Apr 21, 2021 7:29 am
Posts: 2
Hi everyone,
This is my first post, so please forgive me if this is a really dumb question! :)

I am trying to reverse engineer the code for a Commodore CBM2022/2023/3022 printer. I have the ROM image and I have used various disassemblers.
For the most part the output makes sense, but I am having trouble understanding some of the branch addresses: they seem to point into the middle of instructions, which, from my understanding, is not correct.

Here are some examples:

From http://www.white-flame.com/wfdis/:

Example 1: The BIT instructions appear not to have been disassembled correctly; the disassembler seems to have left them as raw bytes so that the branch addresses land on valid locations ($2C and $24 appear to be the opcode bytes of BIT instructions):

Code:
Sf6ba               lda $c5
Sf6bc               cmp #$0d
                    beq Lf6d3
                    cmp #$8d
                    beq Lf6d3
                    cmp #$0a
                    bne Lf6d5
                    lda $dd
                    bne Lf6d0
                    lda #$0d
                    bne Lf6d3
Lf6d0               lda #$00
                    2c
Lf6d3               clc
                    24
Lf6d5               sec
                    rts


Example 2: This time the address is marked differently from the usual format (I assume 'Lf9f8 = * + 1' means a 1-byte offset from where the label would normally be):

Code:
Sf930               ldx #$09
                    lda #$30
Lf934               sta $b4,x
                    dex
                    bne Lf934
                    stx $bf
                    stx $be
                    beq Lf942
Lf93f               jsr Sfa0d
Lf942               jsr Sf6ba
                    bcs Lf94a
                    jmp Lf9f8
                   
Lf94a               cmp #$21

...
                    adc #$01
Lf9f1               sta $be
                    jsr Sfa0d
Lf9f6               sec
Lf9f8 = * + 1       
                    bit $18
                    rts


I tried a different disassembler, https://www.masswerk.at/6502/disassembler.html,
and got a different output:

Code:
         06BA   A5 C5      LDA $C5
         06BC   C9 0D      CMP #$0D      
         06BE   F0 13      BEQ $06D3
         06C0   C9 8D      CMP #$8D      
         06C2   F0 0F      BEQ $06D3
         06C4   C9 0A      CMP #$0A   
         06C6   D0 0D      BNE $06D5
         06C8   A5 DD      LDA $DD
         06CA   D0 04      BNE $06D0
         06CC   A9 0D      LDA #$0D
         06CE   D0 03      BNE $06D3
         
         
         06D0   A9 00      LDA #$00
         06D2   2C 18 24   BIT $2418
         06D5   38         SEC
         06D6   60         RTS
         06D7   2C 00 02   BIT $0200
         06DA   50 14      BVC $06F0
         06DC   A5 C6      LDA $C6


This time it disassembled the code correctly, but there are branches to address 06D3, which is in the 'middle' of the BIT instruction.
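The raw offset bytes in the masswerk listing make it easy to confirm that 06D3 really is a branch target and not a disassembler glitch. On the 6502, a relative branch adds its signed offset byte to the address of the instruction that follows the 2-byte branch. The Python sketch below (`branch_target` is a hypothetical helper, not part of any tool mentioned here) applies that rule to the branches above, using the addresses exactly as masswerk printed them.

```python
def branch_target(opcode_addr: int, offset: int) -> int:
    """Destination of a 6502 relative branch (BEQ, BNE, BCS, ...).

    The offset byte is signed (two's complement) and is added to the
    address just past the 2-byte branch instruction.
    """
    signed = offset - 0x100 if offset >= 0x80 else offset
    return (opcode_addr + 2 + signed) & 0xFFFF


# (address, offset byte) pairs copied from the masswerk listing
branches = [
    (0x06BE, 0x13),  # BEQ
    (0x06C2, 0x0F),  # BEQ
    (0x06C6, 0x0D),  # BNE
    (0x06CA, 0x04),  # BNE
    (0x06CE, 0x03),  # BNE
]

for addr, off in branches:
    print(f"{addr:04X}: offset ${off:02X} -> ${branch_target(addr, off):04X}")
```

Three of the five branches resolve to 06D3, the $18 byte inside `BIT $2418`, so the disassembler is reporting the targets faithfully.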

I then tried BeeDis and got a similar result:

Code:
.LF6BC
        CMP     #$0D
        BEQ     LF6D3

        CMP     #$8D
        BEQ     LF6D3

        CMP     #$0A
        BNE     LF6D5

        LDA     L00DD
        BNE     LF6D0

        LDA     #$0D
        BNE     LF6D3

.LF6D0
        LDA     #$00
.LF6D2
        BIT     L2418
LF6D3 = LF6D2+1
.LF6D5
        SEC
        RTS



So, I'm stumped. I thought that branches should always jump to the start of an instruction, not into the middle of one.
Does anyone know what's going on here?
Thank you :)

The ROM is available here: https://www.commodore.ca/manuals/funet/ ... dex-t.html


Posted: Mon Apr 26, 2021 2:55 am

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
Jumping into the middle of an instruction is a trick often used to reduce the number of branches in code.

In your first code example, jumping to Lf6d5 sets the carry flag and returns.

Jumping to Lf6d3 clears the carry flag and returns.

Jumping to Lf6d0 clears the accumulator and returns, leaving the carry flag alone.


Posted: Mon Apr 26, 2021 3:21 am

Joined: Wed Apr 21, 2021 7:29 am
Posts: 2
Ah ha! Thanks BillG, That makes sense now.

So if I wanted to create source code that I could then compile, from my disassembled code, what would I do in this case? (I assume I would need to expand it out to more than one branch. And that this code is the result of a compiler optimisation trick?)


Posted: Mon Apr 26, 2021 4:10 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8484
Location: Midwestern USA
gwpt wrote:
Ah ha! Thanks BillG, That makes sense now.

Something else that can trip up disassembly is the presence of data tables in the middle of code. Commodore's firmware often will have a section of code, followed by a block of data which is used by the preceding code, followed by another section of code, followed by more data, etc. Disassembly will result in gibberish when a data block is encountered.
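To make the "gibberish" point concrete, here is a hypothetical Python sketch of a linear-sweep disassembler run over a made-up fragment: a routine that reads a 2-byte table with `LDA $C006,X`, the table itself, then more code. The addresses and bytes are invented for illustration, and the opcode table is deliberately tiny (anything outside it is printed as `.byte`, which a real disassembler would not do). Because the first table byte happens to be $20, the naive sweep decodes it as JSR, swallows the opcode of the `LDA #$01` that follows, and loses that instruction; telling the sweep where the data lives restores the alignment.

```python
# Hypothetical opcode table covering only the instructions used below.
# Anything not listed is printed as .byte (a real disassembler knows them all).
OPCODES = {
    0x20: ("JSR ${:04X}", 3),
    0x60: ("RTS", 1),
    0xA2: ("LDX #${:02X}", 2),
    0xA9: ("LDA #${:02X}", 2),
    0xBD: ("LDA ${:04X},X", 3),
}


def sweep(mem, origin, data_ranges=()):
    """Linear-sweep disassembly; addresses inside data_ranges become .byte."""
    out, i = [], 0
    while i < len(mem):
        addr = origin + i
        in_data = any(lo <= addr <= hi for lo, hi in data_ranges)
        entry = None if in_data else OPCODES.get(mem[i])
        if entry is None or i + entry[1] > len(mem):
            out.append(f"{addr:04X}  .byte ${mem[i]:02X}")
            i += 1
            continue
        fmt, length = entry
        operand = int.from_bytes(mem[i + 1 : i + length], "little")  # 0 if none
        out.append(f"{addr:04X}  " + fmt.format(operand))
        i += length
    return out


MEM = bytes([
    0xA2, 0x00,        # C000 LDX #$00
    0xBD, 0x06, 0xC0,  # C002 LDA $C006,X  (reads the table)
    0x60,              # C005 RTS
    0x20, 0xA9,        # C006 two-byte data table
    0xA9, 0x01,        # C008 LDA #$01
    0x60,              # C00A RTS
])

print("\n".join(sweep(MEM, 0xC000)))                      # naive
print("\n".join(sweep(MEM, 0xC000, [(0xC006, 0xC007)])))  # data marked
```

This is why disassemblers such as the ones tried above let you mark address ranges as data.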

Quote:
So if I wanted to create source code that I could then compile, from my disassembled code, what would I do in this case? (I assume I would need to expand it out to more than one branch. And that this code is the result of a compiler optimisation trick?)

To be pedantic, one does not compile assembly language. :D

The use of BIT opcodes as described by Bill is a design feature incorporated into the source code by the programmer. An assembler essentially maintains a one-for-one correspondence between an assembly language instruction in the source code and a machine code instruction in the resulting binary. Unlike, say, a C compiler, an assembler generally does not attempt to optimize anything. If you write source code that is inefficient, the assembler will assemble it as written.
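On the reassembly question: assuming the goal is a byte-identical ROM, one option is to keep the trick and emit the skip opcodes as data, exactly as the wfdis output in Example 1 does with its bare `2c` and `24` lines (in source this is typically `.byte $2C`, `db $2C` or similar, depending on the assembler). Rewriting the routine with extra branches would also work, but it would make the code longer and shift the addresses of everything after it. The hypothetical Python sketch below is a toy assembler for just the instructions at $F6D0-$F6D6; it illustrates the one-for-one mapping from source lines to bytes and checks the result against the bytes in the masswerk listing.

```python
# Hypothetical toy assembler for just the instructions in $F6D0-$F6D6.
# Each source line maps to a fixed byte sequence; nothing is optimised.
def assemble(lines):
    out = bytearray()
    for line in lines:
        op, _, arg = line.strip().partition(" ")
        arg = arg.strip()
        if op == ".byte":
            out.append(int(arg.lstrip("$"), 16))
        elif op == "LDA" and arg.startswith("#$"):
            out += bytes([0xA9, int(arg[2:], 16)])  # LDA immediate
        elif op == "CLC":
            out.append(0x18)
        elif op == "SEC":
            out.append(0x38)
        elif op == "RTS":
            out.append(0x60)
        else:
            raise ValueError(f"unsupported line: {line!r}")
    return bytes(out)


source = [
    "LDA #$00",   # LF6D0 entry point
    ".byte $2C",  # BIT abs opcode: swallows the next two bytes ($18 $24)
    "CLC",        # LF6D3 entry point
    ".byte $24",  # BIT zp opcode: swallows the next byte ($38)
    "SEC",        # LF6D5 entry point
    "RTS",
]

rom = bytes([0xA9, 0x00, 0x2C, 0x18, 0x24, 0x38, 0x60])  # from the listing
print(assemble(source) == rom)
```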



Posted: Mon Apr 26, 2021 4:11 am

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
That is most likely hand-written assembly language code. Compilers tend not to jump into the middle of instructions like that.

I would comment your code something like this:

Code:
.LF6D0
        LDA     #$00
.LF6D2
        BIT     L2418
LF6D3 = LF6D2+1   ; Jumps into middle of above instruction
                  ; Sees clc
                  ;      bit $38
                  ;      rts
.LF6D5
        SEC
        RTS


I previously missed the fact that the first BIT instruction is in absolute mode: it takes both the $18 and the $24 as its operand, so execution continues at the SEC, and jumping to LF6D0 also sets the carry flag.
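One way to double-check this corrected reading is to execute the bytes from each entry point. The same trick appears in the OP's Example 2: there `*` is the address of `bit $18` ($F9F7), so `Lf9f8 = * + 1` names its operand byte $18, which is the CLC opcode. The Python sketch below is a hypothetical micro-emulator that understands only the six opcodes involved; the Example 1 bytes come from the masswerk listing, the Example 2 bytes are the standard encodings of SEC, BIT zp and RTS, and the addresses follow the wfdis/BeeDis labels.

```python
# Hypothetical micro-emulator for the few opcodes in these two routines.
# It tracks only A and the carry flag: BIT sets N, V and Z from memory but
# never changes A or C, so the bytes BIT reads are not modelled at all.
LENGTH = {0xA9: 2, 0x2C: 3, 0x24: 2, 0x18: 1, 0x38: 1, 0x60: 1}
NAME = {0xA9: "LDA #", 0x2C: "BIT abs", 0x24: "BIT zp",
        0x18: "CLC", 0x38: "SEC", 0x60: "RTS"}


def run(rom, origin, entry, a=0x5A, carry=None):
    """Run from `entry` until RTS and return (trace, A, C).

    a=0x5A is an arbitrary stand-in for the caller's accumulator;
    carry=None means "whatever the caller left in C"."""
    pc, trace = entry, []
    while True:
        op = rom[pc - origin]
        trace.append(NAME[op])
        if op == 0xA9:
            a = rom[pc - origin + 1]
        elif op == 0x18:
            carry = 0
        elif op == 0x38:
            carry = 1
        elif op == 0x60:
            return trace, a, carry
        pc += LENGTH[op]


EX1 = bytes([0xA9, 0x00, 0x2C, 0x18, 0x24, 0x38, 0x60])  # $F6D0-$F6D6
EX2 = bytes([0x38, 0x24, 0x18, 0x60])                    # $F9F6-$F9F9
LF9F8 = 0xF9F7 + 1  # "Lf9f8 = * + 1", where * is the BIT $18 at $F9F7

for rom, origin, entries in ((EX1, 0xF6D0, (0xF6D0, 0xF6D3, 0xF6D5)),
                             (EX2, 0xF9F6, (0xF9F6, LF9F8))):
    for entry in entries:
        trace, a, c = run(rom, origin, entry)
        print(f"${entry:04X}: {' / '.join(trace)} -> A=${a:02X} C={c}")
```

Expected output:

$F6D0: LDA # / BIT abs / SEC / RTS -> A=$00 C=1
$F6D3: CLC / BIT zp / RTS -> A=$5A C=0
$F6D5: SEC / RTS -> A=$5A C=1
$F9F6: SEC / BIT zp / RTS -> A=$5A C=1
$F9F8: CLC / RTS -> A=$5A C=0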


Posted: Mon Apr 26, 2021 4:23 am

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
BigDumbDinosaur wrote:
Something else that can trip up disassembly is the presence of data tables in the middle of code. Commodore's firmware often will have a section of code, followed by a block of data which is used by the preceding code, followed by another section of code, followed by more data, etc. Disassembly will result in gibberish when a data block is encountered.

System calls in Apple ProDOS are like that:
Code:
    jsr     $BF00           ; ProDOS MLI entry point
    db      cmd             ; one-byte call number, placed inline after the JSR
    dw      parms           ; two-byte address of the call's parameter list

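How does the called routine find data placed after the JSR? On the 6502, JSR pushes the address of its own last byte (the return address minus one), and RTS pulls that value and adds one. A routine like the MLI can therefore pull the saved address, read its inline bytes starting one past it, and push an adjusted value so that RTS lands after them. The Python sketch below models only that arithmetic; the helper names, the $0800 call site, the $40 call number and the $1234 parameter-list address are hypothetical placeholders, not real ProDOS values, and the real MLI does considerably more.

```python
# Hypothetical model of the 6502 return-address arithmetic behind inline
# arguments. The stack is a Python list of 16-bit values here, not page 1.
def jsr(mem, stack, pc):
    """JSR abs at pc: push pc+2 (the JSR's last byte), jump to the operand."""
    stack.append(pc + 2)
    return mem[pc + 1] | (mem[pc + 2] << 8)


def rts(stack):
    """RTS: pull the saved address and resume one byte after it."""
    return stack.pop() + 1


def take_inline_args(mem, stack, count):
    """What an MLI-style callee does: read `count` bytes after the JSR and
    adjust the saved address so RTS resumes just past them."""
    saved = stack.pop()
    args = bytes(mem[saved + 1 : saved + 1 + count])
    stack.append(saved + count)
    return args


mem = bytearray(0x10000)
mem[0x0800:0x0806] = bytes([
    0x20, 0x00, 0xBF,  # JSR $BF00
    0x40,              # call number (placeholder value)
    0x34, 0x12,        # parameter list at $1234 (placeholder value)
])
stack = []
entered = jsr(mem, stack, 0x0800)       # now "inside" the routine at $BF00
args = take_inline_args(mem, stack, 3)
parms = args[1] | (args[2] << 8)
pc = rts(stack)
print(f"call ${args[0]:02X}, parms ${parms:04X}, resume at ${pc:04X}")
```

Execution resumes at $0806, six bytes after the JSR, which is also why a disassembler that does not know this convention will try to decode the three inline bytes as instructions.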

Another common technique is putting a text string inline:
Code:
    jsr     PrintInline
    db      'Hello world.'
    db      0

Turbo Pascal for CP/M does this extensively.
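For a disassembler, the inline-string convention means that after each `jsr PrintInline` it has to treat everything up to and including the zero byte as data and resume decoding after it. Otherwise "Hello world." is decoded as code; 'H' ($48), for example, comes out as PHA. The Python sketch below assumes the zero-terminated layout from the example above; `read_inline_string`, the $2000 origin and the $3000 address for PrintInline are all hypothetical.

```python
def read_inline_string(mem, origin, jsr_addr):
    """For a `jsr PrintInline` at jsr_addr, return (text, resume_addr).

    Assumes a zero-terminated string starts right after the 3-byte JSR
    and that code resumes on the byte after the $00 terminator."""
    start = jsr_addr - origin + 3
    end = mem.index(0, start)  # offset of the $00 terminator
    return mem[start:end].decode("ascii"), origin + end + 1


# Made-up fragment at $2000; PrintInline is assumed to live at $3000.
MEM = bytes([0x20, 0x00, 0x30]) + b"Hello world." + bytes([0x00, 0x60])

text, resume = read_inline_string(MEM, 0x2000, 0x2000)
print(repr(text), f"code resumes at ${resume:04X}")
```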

