
All times are UTC




Posted: Tue Nov 17, 2020 6:27 pm

Joined: Fri Nov 16, 2018 8:55 pm
Posts: 71
I'm using 64tass as my assembler. I've unexpectedly run into a situation where my labeled branches are getting treated as TARGET_OFFSET -2. I could swear I have run into some version of this issue before and got things working. I've checked my notes and forum post history. If it's there, I can't seem to find it.

I've been very slowly reverse engineering a cartridge game. I've successfully built cartridge files up to this point using a Makefile. Things "just work." The cartconv tool provided by VICE isn't quite what I need. So, for now I'm just using a macro that generates a bunch of .byte directives. Maybe it's the purist in me but I want to fully separate concerns between the file type header (required by the C64 emulator) and the underlying code.

Well, this issue has come to the forefront a bit more quickly than I expected.

There is a chunk of code in the game I'm working on that I don't fully understand. I decided the best way to understand the code is to copy it out of the main file, get it to assemble in isolation, and try different combinations of inputs until I think I have it. You know, the tried and true "poke at it with a stick until it gives up its secrets" approach.

Since I'm now building a PRG file instead of a cartridge I need a BASIC loader. No problem, if I build one by hand using .byte directives and build the file I get what I expect. Everything works. Then I thought, "hey, let's experiment with building headers outside of the assembler — this will come in handy later."

I thought it would be easy and take all of 5 minutes consisting of the following steps in a Makefile:

  • Have petcat tokenize the BASIC loader: the result is a two byte PRG header followed by "10 sys 2062" correctly tokenized.
  • Have 64tass process the assembly code. The -b option has been passed to ensure no two-byte header is generated.
  • Have cat merge the files with $ cat basic_loader.bin raw_program.bin > program.prg.
  • The resulting file should "just work" identically to a file with the load address and tokenized BASIC spelled out in .byte directives.
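
The steps above can be sketched in Python for anyone who wants to check the arithmetic without running the tools. This is only an illustration: the byte values are hand-written stand-ins for what petcat and 64tass would produce, it assumes the space in "sys 2062" is kept as a literal $20 byte, and `build_prg` is a hypothetical helper name.

```python
def build_prg(basic_loader: bytes, raw_program: bytes) -> bytes:
    """Join a tokenized BASIC loader (including its 2-byte load address)
    with headerless machine code, as `cat a b > c` would."""
    return basic_loader + raw_program


# Hand-written stand-in for the tokenizer output of "10 sys 2062".
basic_loader = (
    bytes([0x01, 0x08])        # PRG load address $0801
    + bytes([0x0C, 0x08])      # link to next line ($080C)
    + bytes([0x0A, 0x00])      # line number 10
    + bytes([0x9E])            # SYS token
    + b" 2062"                 # space plus the digits
    + bytes([0x00])            # end of line
    + bytes([0x00, 0x00])      # end of program
)
raw_program = bytes([0x60])    # stand-in for the assembler output: a lone RTS

prg = build_prg(basic_loader, raw_program)
load_address = prg[0] | (prg[1] << 8)
# The load address itself is not loaded into memory, hence the "- 2".
entry = load_address + len(basic_loader) - 2
print(f"load ${load_address:04x}, code starts at {entry}")
```

If the address printed here does not match the SYS argument in the BASIC line, the two halves of the file disagree about where the code lives.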

No such luck. This does indeed work as expected EXCEPT the branch instructions are now TARGET_OFFSET -2. I can reproduce this reliably even with a simple "HELLO WORLD" program. What am I doing wrong?

EDIT: Updated subject to include 64tass.


Last edited by load81 on Tue Nov 17, 2020 7:01 pm, edited 1 time in total.

Posted: Tue Nov 17, 2020 6:39 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
(Might be worth adding [64tass] to your subject line/topic title)

I have no idea! If absolute addresses were off-by-two I might suspect the 2 byte PRG header. But it's hard to see how that would affect branches, because they are relative.


Posted: Tue Nov 17, 2020 8:27 pm

Joined: Mon Nov 18, 2019 8:08 pm
Posts: 9
Please post example source code for such a simple hello world program and its Makefile. If it's short, a hex dump of the expected result would be nice as well.

Thanks!


Posted: Wed Nov 18, 2020 3:54 am

Joined: Fri Nov 16, 2018 8:55 pm
Posts: 71
Okay, so I did quite a lot of digging after my previous post. I fixed one issue and ran into another one. The "I'm off by 2 bytes" issue was my own fault. I'll spell out my screw-up before launching into the current problem.

My "known good" code looks like this:

Code:
; vim: cursorline columns=140 tabstop=4 syntax=64tass nu
; HELLO.asm
; PTR
CHROUT=$ffd2

*=    $0801
      .byte $0c, $08, $0a, $00, $9e      ; 00010   SYS
      .byte $32, $30, $36, $31, $00      ;         2061 ($080d)
      .word $0000                        ;         EOF

*=    $080d                              ; Next byte after BASIC EOF
      ldx #$00
loop
      lda message,x
      jsr CHROUT                         ; KERNAL API call to print output.
      inx
      cmp #$00                           ; Is .A == 0?
      bne loop                           ; While .A != 0, loop.
      rts                                ; Else, exit to BASIC.
       
message
      .null "hello world!"


The above works as expected. I had made an error with the -b flag, I think, which caused some confusion early on. I inherited this setting from my cartridge project's makefile due to a copy/paste error. My current flags are: -a --mos6502 and that's it. At this point, I can rule out 64tass as the culprit.

Here is where things get messed up and will not build correctly... I have a petcat formatted BASIC header that looks like this:

Code:
;==0801==
10 sys2061


The ASM file is identical to the one above from position $080d onward, so I won't duplicate it. I have an xxd dump that illustrates the issue. The relevant portion of the known-good program (defined by assembly .byte directives) looks like this:

Code:
07ff: 01  .   ; PRG LOAD ADDR LO
0800: 08  .   ; PRG LOAD ADDR HI
0801: 0c  .   ; BASIC NEXT LINE PTR LO
0802: 08  .   ; BASIC NEXT LINE PTR HI
0803: 0a  .   ; BASIC LINE LN LO
0804: 00  .   ; BASIC LINE LN HI
0805: 9e  .   ; SYS
0806: 32  2   ; 2061 ($080d)
0807: 30  0   ;
0808: 36  6   ;
0809: 31  1   ;
080a: 00  .   ; END OF LINE
080b: 00  .   ; END OF FILE (2 bytes)
080c: 00  .   ;
080d: a2  .   ; ldx #$00
080e: 00  .   ; ...


The broken file differs from the above at a single byte position in an area generated by petcat. It's the only difference in the entire file:

Code:
07ff: 01  .
0800: 08  .
0801: 0b  .   ; <= One less than its working counterpart ($0c).
0802: 08  .


The petcat tokenizer is invoked as follows: $ petcat -w2 -o header_hello.bin header_hello.bas. While this single-byte difference seems to break things when merged with raw assembly, if it's loaded directly into VICE with -autostartprgmode 1 header_[header_binary] it will at least LIST without issue. Files are merged as follows: $ cat header_hello.bin hello.bin > hello.prg.

So the question becomes: is this a VICE petcat bug?

For completeness, here is the xxd -c1 -o 0x7ff dump of the broken file:

Code:
07ff: 01  .
0800: 08  .
0801: 0b  .
0802: 08  .
0803: 0a  .
0804: 00  .
0805: 9e  .
0806: 32  2
0807: 30  0
0808: 36  6
0809: 31  1
080a: 00  .
080b: 00  .
080c: 00  .
080d: a2  .
080e: 00  .
080f: bd  .
0810: 1b  .
0811: 08  .
0812: 20   
0813: d2  .
0814: ff  .
0815: e8  .
0816: c9  .
0817: 00  .
0818: d0  .
0819: f5  .
081a: 60  `
081b: 48  H
081c: 45  E
081d: 4c  L
081e: 4c  L
081f: 4f  O
0820: 20   
0821: 57  W
0822: 4f  O
0823: 52  R
0824: 4c  L
0825: 44  D
0826: 21  !
0827: 00  .


Posted: Wed Nov 18, 2020 5:23 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
Locations $0801 and $0802 containing the word $080B look correct to me, since it's locations $080B and $080C, not $080C and $080D, that contain the $0000 word indicating the end of BASIC program text. In other words, the petcat output looks correct, and your .byte statements doing the same thing look incorrect in that respect.
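
As a cross-check on that arithmetic, here is a small Python sketch that builds the one-line SYS stub and derives the link value from the line's length rather than typing it by hand. The helper name `basic_sys_line` is made up for this example; the layout (link word, line number word, tokens, $00, then a $0000 end marker) is the one shown in the dumps in this thread.

```python
import struct

SYS_TOKEN = 0x9E  # the SYS token byte seen in the dumps


def basic_sys_line(load_addr: int, line_no: int, target: int) -> bytes:
    """Build '<line_no> SYS<target>' plus the end-of-program marker."""
    body = bytes([SYS_TOKEN]) + str(target).encode("ascii") + b"\x00"
    # The link points just past this line: link word + line-number word + body.
    next_line = load_addr + 2 + 2 + len(body)
    return struct.pack("<HH", next_line, line_no) + body + b"\x00\x00"


stub = basic_sys_line(0x0801, 10, 2061)
link = struct.unpack_from("<H", stub)[0]
first_code_byte = 0x0801 + len(stub)
print(f"link=${link:04x}, code starts at ${first_code_byte:04x} ({first_code_byte})")
```

Under these assumptions the link comes out as $080B and the first byte after the stub is $080D (2061), which agrees with the petcat output rather than the hand-written $0C.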

load81 wrote:
The broken file differs from the above at a single byte position in an area generated by petcat. It's the only difference in the entire file.

If both files are exactly the same except for that one byte, I fail to see how there can be a difference in the offsets of the branch instructions; if they're the same, they're the same. So I'm rather missing something here. Did you expect these not to be the same? (Even if the code were moved, the branch offsets should still be the same so long as the length of the code doesn't change.)
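
To make the point about relative branches concrete, here is a minimal Python sketch of how a 6502 branch operand is derived. The function names are invented for illustration; the addresses are taken from the hex dump earlier in the thread (BNE at $0818, loop label at $080F).

```python
def branch_operand(branch_addr: int, target_addr: int) -> int:
    """Operand byte for a 6502 relative branch located at branch_addr."""
    # The offset is measured from the instruction after the 2-byte branch.
    offset = target_addr - (branch_addr + 2)
    if not -128 <= offset <= 127:
        raise ValueError("branch target out of range")
    return offset & 0xFF


def branch_target(branch_addr: int, operand: int) -> int:
    """Inverse of branch_operand: where a branch with this operand lands."""
    offset = operand - 256 if operand >= 0x80 else operand
    return branch_addr + 2 + offset


original = branch_operand(0x0818, 0x080F)
shifted = branch_operand(0x0818 + 2, 0x080F + 2)  # whole program moved by 2
print(f"${original:02x} ${shifted:02x}")
```

Both calls give the same operand ($F5, as in the dump), so shifting the whole program by two bytes cannot by itself change a branch. Only absolute references, such as the SYS argument or the address in `lda message,x`, would notice the move.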

One technique I use in complex, multi-file situations like this is to put together a small Git repo that actually demonstrates the problem. (You can look at my vic20cc65 repo for an example of this; you'll note that there's a top-level Test script that builds the program and runs the emulator on it. Here unfortunately I merely show the expected output for human comparison rather than make the script do an automated test showing that the problem exists, but the latter is better if you can manage it without too much difficulty.) One advantage of this technique is that doing the work to demonstrate the problem in this way can often lead you to the problem yourself, without even having to post for help.

_________________
Curt J. Sampson - github.com/0cjs


Posted: Wed Nov 18, 2020 8:17 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
(I think the branch offset question is solved now, we're now only looking at the 'next line' pointer in the header. It might even be worth a new thread - but certainly a new title, again!)


Posted: Wed Nov 18, 2020 9:05 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
BigEd wrote:
(I think the branch offset question is solved now, we're now only looking at the 'next line' pointer in the header. It might even be worth a new thread - but certainly a new title, again!)

Ah, upon re-reading I see that removing the -b flag from the 64tass options was believed to have fixed it. But according to this manual (http://singularcrew.hu/64tass/), -b suppresses "the 2 or 3 byte starting address before the resulting binary," which doesn't make sense; if you're concatenating the assembler output to the output of petcat, I'd think you'd want that header stripped because petcat has already supplied it, or you'd end up with a second starting address after the BASIC program but before the machine-language, which would mess up your SYS call. (Unless your SYS argument was already wrong by that amount, and this by chance fixed it.)
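
To illustrate the concern about a second starting address, here is a small Python sketch with hand-written stand-in bytes (not real tool output). It compares where the first code byte would end up when the assembler output is concatenated with and without its own two-byte load address. `first_code_address` is a hypothetical helper.

```python
# Stand-ins: the tokenized "10 sys2061" stub and the first instruction (ldx #$00).
petcat_part = bytes.fromhex("0108" "0b080a009e3230363100" "0000")
code = bytes.fromhex("a200")

headerless = code                             # assembler output without a start address
with_header = bytes.fromhex("0d08") + code    # same output with a $080D start address


def first_code_address(prg: bytes, marker: bytes = code) -> int:
    """Memory address at which `marker` lands once the PRG is loaded."""
    load = prg[0] | (prg[1] << 8)
    # Skip the PRG's own load address, which is not stored in memory.
    return load + prg.index(marker, 2) - 2


ok = first_code_address(petcat_part + headerless)
shifted = first_code_address(petcat_part + with_header)
print(ok, shifted)
```

In this model the headerless variant puts the code at 2061, matching the SYS argument, while the variant with an embedded start address pushes it to 2063 and leaves two stray bytes at the SYS target.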

Given the number of odd-appearing things going on here, I personally wouldn't try to split this problem into smaller pieces or separate topics because I really do not think that the problem has even been clearly described, much less localized. Given that the "broken" file looks correct, this could even be as simple as a case of being confused about which file is working and which file isn't. (I know that I've certainly spent plenty of time debugging the wrong data in situations like this.) A script that produces both files under appropriate names and diffs their hexdumps, along with a manual test to confirm that the names produce their matching results, should fairly quickly confirm that that is or isn't the case.
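
A rough sketch of the comparison half of such a script, in Python. The two byte strings below are typed in from the dumps in this thread; in a real check they would be read from the two build products. The names `asm_built`, `petcat_built` and `diff_bytes` are placeholders chosen for this example.

```python
def diff_bytes(a: bytes, b: bytes, base: int = 0x07FF):
    """Return (address, byte_in_a, byte_in_b) for every differing position.

    `base` mirrors the `xxd -o 0x7ff` offset used above, so reported
    addresses line up with C64 memory locations.
    """
    if len(a) != len(b):
        raise ValueError(f"lengths differ: {len(a)} vs {len(b)}")
    return [(base + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# First 16 bytes of each file, copied from the dumps in this thread.
asm_built = bytes.fromhex("01080c080a009e32303631000000a200")
petcat_built = bytes.fromhex("01080b080a009e32303631000000a200")

differences = diff_bytes(asm_built, petcat_built)
for addr, x, y in differences:
    print(f"{addr:04x}: {x:02x} -> {y:02x}")
```

Naming the inputs after how they were built, rather than "good" and "bad", avoids baking an unverified conclusion into the script.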

_________________
Curt J. Sampson - github.com/0cjs


Posted: Wed Nov 18, 2020 9:27 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
To try to confirm the behaviour of the bad code, I re-created the "bad" file from the xxd dump above using the following (the second verifies that a dump of the re-created file with the same parameters produces the same output):

Code:
sed -e 's/^....: //' -e 's/ .*//' bad.xxd | xxd -r -p > bad.prg
xxd -c1 -o 0x7ff bad.prg | sed -e 's/^0000//' | diff -u bad.xxd -

But when I run it in VICE with x64 bad.prg it seems to work fine.

_________________
Curt J. Sampson - github.com/0cjs


Posted: Wed Nov 18, 2020 11:20 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
(You may well be right, Curt, this needs a holistic investigation.)


Posted: Wed Nov 18, 2020 12:30 pm

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 325
If I'm understanding the current state of things correctly, you have a one byte difference in the files produced by two different methods. You describe the "everything in the .asm file" version as "known good", and want to know why the other method produces a different result?

cjs is right: petcat is producing the correct result, and your "known good" .asm file is in fact wrong. That byte should be $0b.

So why does the $0c version appear to be correct? If you run the program, it works fine. If you list the program, you get only "10 SYS2061" and no garbage lines after it. My guess is that running and listing a program don't use the line links. Operations that do use the links might fail in interesting ways (although I couldn't get anything to happen when I tried).

How is the $0b version failing? What happens when you run it? I created a file containing the $0b version (from your xxd -c1 -o 0x7ff dump), loaded it into my C64 simulator, and it was fine.


Posted: Wed Nov 18, 2020 1:10 pm

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
I've just checked in VICE x64, and immediately after a NEW and a LOAD "*",8,1 (without doing a LIST or RUN), PEEK(2049) returns 11 ($0B), not 12 ($0C). So it looks as if BASIC may be "fixing" obviously broken addresses in the file?

There's certainly code in BASIC to deal with rewriting those pointers, since LOAD "...",8,0 ignores the start address in the first two bytes of the file and loads a BASIC program at the standard start address, presumably to allow compatibility with PET BASIC programs. With the first line's next-line link pointing to $80C, that makes the next link $A200, which is outside of BASIC's RAM area, so perhaps it just decides at that point to write a $0000 end-of-text marker after the last good line and stop rewriting there.
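
The $A200 figure can be reproduced by following the link by hand. A small Python sketch, using the first bytes of the dump posted earlier with $0C at $0801 (`word_at` is just a helper defined for this example):

```python
LOAD = 0x0801
# Memory from $0801 onward, per the dump, with the hand-written $0c link.
image = bytes.fromhex("0c080a009e32303631000000a200bd1b08")


def word_at(addr: int) -> int:
    """Little-endian 16-bit word stored at a C64 address inside `image`."""
    i = addr - LOAD
    return image[i] | (image[i + 1] << 8)


first_link = word_at(0x0801)        # where the first line claims the next line is
second_link = word_at(first_link)   # the "link" found at that bogus location
print(f"${first_link:04x} -> ${second_link:04x}")
```

With $0C the chain goes $080C and then $A200, because the word at $080C straddles the end marker and the LDX opcode. With the correct $0B, the word at $080B is $0000 and the chain ends cleanly.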

I have carefully checked outside the emulator that the file I'm loading has a $0C at that point, and that the filename in the emulator matches the (new) filename I'm using for this outside the emulator. I'd like to be able to do a BLOAD that would guarantee the BASIC interpreter doesn't poke at the code it's loading in the hope it's a BASIC program, but I don't think that CBM BASIC offers such a thing, does it?

_________________
Curt J. Sampson - github.com/0cjs


Posted: Wed Nov 18, 2020 10:28 pm

Joined: Fri Nov 16, 2018 8:55 pm
Posts: 71
I found the culprit! CJS is indeed correct: petcat has the right linked-list value. My manually created BASIC header was wrong. I think it only worked because there wasn't a second line for the bad linked-list pointer to have to deal with.

I manually re-checked as much as I could. I got very suspicious when CJS reconstructed the "bad" file and it worked for him. (That's some excellent use of sed and xxd, BTW.) So, I manually invoked the "bad" file instead of relying on "make run" to do it for me.

Code:
$ x64sc -autostartprgmode 1 bad-hello.prg


This now "just works." It seems I had a bug in my Makefile. I thought the two "good" and "bad" Makefiles were identical but they were not. I must have adjusted something in one and not the other while tired and the two went out of sync. Because they looked identical on casual inspection I got "tunnel vision" on the single byte difference. I'm a bit annoyed with myself.

Now I'm going to go see if I can get the .binary directive to work. Ideally, I'd prefer to import the tokenized BASIC loader instead of relying on cat and output redirection and file concatenation.

Thanks for the help!


Posted: Thu Nov 19, 2020 6:50 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
load81 wrote:
I found the culprit! CJS is indeed correct: petcat has the right linked-list value. My manually created BASIC header was wrong. I think it only worked because there wasn't a second line for the bad linked-list pointer to have to deal with.

I have further confirmation of what was going on here.

After some examination of the VICE autostart parameters I discovered that there are two modes of interest: -autostartprgmode 1 (inject mode) loads the contents of the .prg directly into RAM, and -autostartprgmode 2 (disk image mode) creates an empty disk image and copies the .prg into it. When not explicitly specified, the previously used mode for that type of file (.prg or .d64 image) will be used. I tested these with -autoload file.prg to suppress the RUN command.

In disk image mode, a LOAD "*",8,1 command will be generated. In this case, loading an actual bad file with $801 = $C still produces PEEK(2049) = 11 (i.e., $0B). However, in inject mode I have seen PEEK(2049) = 12 and additional rubbish after the first line when doing LIST, indicating that it was the program file loader that had fixed that address. (For some reason now when I try to replicate this instead I get odd errors, such as a clear screen and READY prompt from the PEEK and UNCTIO/VER on the same line when I type LIST; this is still clearly indicating that something is corrupt, but I don't know why the behaviour has changed. It may be something to do with another parameter I used when testing; VICE seems to change some of its default parameters based on command-line options you gave in previous runs.)

Quote:
(That's some excellent use of sed and xxd, BTW.)

Thanks! That I still use so much shell scripting even decades after we've had better scripting languages available is actually rather a source of shame for me, but it does on rare occasions produce a clean result. (It's also worth mentioning that if you're distributing things via Git, Bash 3.x or higher is the one scripting language virtually guaranteed to be on everyone's system because it's included in Git for Windows.)

Quote:
It seems I had a bug in my Makefile. I thought the two "good" and "bad" Makefiles were identical but they were not. I must have adjusted something in one and not the other while tired and the two went out of sync. Because they looked identical on casual inspection I got "tunnel vision" on the single byte difference. I'm a bit annoyed with myself.

Well, I can tell you from long experience that being annoyed with yourself by this sort of thing will not be productive; you need to accept that this kind of thing is common human error that cannot be fixed by telling anyone (including yourself), "be more careful." The solution is to set up systems that expect and mitigate this.

My strategies include careful naming of things at all stages (to minimize the chance of confusion), committing unmodified copies of code from elsewhere before committing changes (so I can use git diff to verify what changes I've made, or whether I've even made them!), and automating procedures (e.g., as with my sed/Bash scripting above) both to avoid typing mistakes and to more clearly document and describe exactly what I am doing. (That sed/bash script above went through several versions in my local repo before I figured out how to do it correctly and in the clearest way.)

This requires a fair amount of discipline, but does not require heavyweight process; in fact the more discipline you can bring in (basically, always having an attitude that you develop everything with thought of human error in mind), the more casual your process can be. Though I put absolutely everything in Git and my commit messages always have a clear explanation of exactly why the change does things the way it does, I also almost never use pull requests (just sort it by directly talking to your fellow developers), web-based code review (again, grab another developer and just do it via pairing), ticketing systems (just fix it now), and so on.

Quote:
Now I'm going to go see if I can get the .binary directive to work. Ideally, I'd prefer to import the tokenized BASIC loader instead of relying on cat and output redirection and file concatenation.

Importing it is definitely the right idea. I take it that your current thinking is that one of your source files would be the petcat input, and you'd generate the binary from that before assembling the program that includes it? Another option, if you have difficulty there, is to post-process the petcat output with xxd/sed/etc. to produce a file in assembler source format (i.e., with .byte statements) that you could bring in with a regular .include.
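
As one possible shape for that second option, here is a Python sketch (in place of xxd/sed) that turns a tokenized PRG into .byte lines suitable for an .include. It assumes the input starts with a two-byte load address that should be dropped, and the sample bytes are typed in from the dump earlier in the thread rather than read from a real petcat output file. `to_byte_directives` is a made-up name.

```python
def to_byte_directives(prg: bytes, per_line: int = 8) -> str:
    """Render a PRG's payload (load address stripped) as .byte lines."""
    body = prg[2:]  # the assembler source sets the origin itself
    lines = []
    for i in range(0, len(body), per_line):
        chunk = body[i:i + per_line]
        lines.append("      .byte " + ", ".join(f"${b:02x}" for b in chunk))
    return "\n".join(lines)


# Stand-in for the tokenized "10 sys2061" header, load address included.
header = bytes.fromhex("01080b080a009e32303631000000")
listing = to_byte_directives(header)
print(listing)
```

The generated text would be written to a file by the Makefile before the assembler runs, so the assembler only ever sees ordinary source.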

_________________
Curt J. Sampson - github.com/0cjs


Posted: Wed Nov 25, 2020 8:21 am

Joined: Sun Nov 08, 2009 1:56 am
Posts: 395
Location: Minnesota
Late to the party, but IIRC, whenever a C64/C128/Vic20 BASIC program is loaded via the LOAD command, the line links are rebuilt by the interpreter after the load is complete but before control is returned to the user. So that would "fix" any link errors the loaded image might contain.
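
Here is a simplified Python model of what such a relink pass might do. It is not a transcription of the ROM routine: it assumes the pass stops when the high byte of a link is zero and otherwise scans from the fifth byte of the line for the terminating $00. The `relink` name and the in-memory layout are for illustration only.

```python
def relink(image: bytearray, start: int = 0x0801) -> None:
    """Rewrite each line's link to point just past that line's $00 terminator."""
    addr = start
    while image[addr - start + 1] != 0:   # zero high byte of link: end of program
        i = addr - start + 4              # skip link word and line-number word
        while image[i] != 0:              # find the end-of-line marker
            i += 1
        nxt = start + i + 1               # next line begins after the marker
        image[addr - start] = nxt & 0xFF
        image[addr - start + 1] = nxt >> 8
        addr = nxt


# Memory from $0801 with the hand-written (wrong) $0c link.
memory = bytearray.fromhex("0c080a009e32303631000000a200")
relink(memory)
print(f"PEEK(2049) would be {memory[0]}")
```

In this model the stored link is ignored apart from its high byte, which would explain why a file with $0C at $0801 still shows 11 after a LOAD.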

But I see the assembler you used has a "word" pseudo op (not surprising :-) ). Why not take more advantage of it and reduce the chances of making an error in the first place?

Code:
; vim: cursorline columns=140 tabstop=4 syntax=64tass nu
; HELLO.asm
; PTR
CHROUT=$ffd2

*=    $0801

      .word nextLine                     ;   link to next line
      .word 10                           ;   line#
      .byte $9e, $32, $30, $36, $31      ;   SYS 2061 ($080d)
      .byte $00                          ;   EOL
nextLine
      .word $0000                        ;   EOP

*=    $080d                              ; Next byte after BASIC EOF
      ldx #$00
loop
      lda message,x
      jsr CHROUT                         ; KERNAL API call to print output.
      inx
      cmp #$00                           ; Is .A == 0?
      bne loop                           ; While .A != 0, loop.
      rts                                ; Else, exit to BASIC.
       
message
      .null "hello world!"

