Just one note before I start in: "CRT" very frequently means "cathode ray tube monitor" in the computing community, and I've rarely or never heard "CRT" used for cartridge. You might want to change the title of your initial post to make it easier for those browsing the board to know what this thread is about. (From the title, I had assumed that this was about hacking on an old video monitor.)
load81 wrote:
At a certain point, I want to do this to fairly large multi-disk C64 programs. At what point (if any) is it best to partially automate disassembly?
In my experience, right at the very start. Basic disassembly is mechanical and, for large amounts of code, very tedious, which is exactly the kind of job that computers are suited for. A disassembler will also identify locations that are the targets of jumps and create labels for them, which is very handy since you can look through that list of labels for entry points at which to start segments of disassembly. (It will also mis-label addresses that look like targets but are actually the result of incorrectly disassembling data; these you want to clean up fairly early on in the process.)
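To make the "targets of jumps become labels" idea concrete, here is a minimal Python sketch of the pass a disassembler makes. It is not taken from any particular tool: the opcode table covers only a handful of 6502 instructions for illustration, and the names `LENGTHS`, `BRANCHES` and `find_targets` are made up for this example. A real disassembler needs the full opcode table.

```python
# Lengths for a small, illustrative subset of 6502 opcodes (not a full table).
LENGTHS = {
    0x78: 1,  # SEI
    0xD8: 1,  # CLD
    0x9A: 1,  # TXS
    0x60: 1,  # RTS
    0xA2: 2,  # LDX #imm
    0xA9: 2,  # LDA #imm
    0x85: 2,  # STA zp
}
# Relative branches: BPL, BMI, BVC, BVS, BCC, BCS, BNE, BEQ
BRANCHES = {0x10, 0x30, 0x50, 0x70, 0x90, 0xB0, 0xD0, 0xF0}
JSR_ABS, JMP_ABS = 0x20, 0x4C


def find_targets(code: bytes, origin: int) -> list[int]:
    """Sweep linearly through code and collect JSR/JMP/branch targets."""
    targets = set()
    pos = 0
    while pos < len(code):
        op = code[pos]
        if op in BRANCHES:
            if pos + 1 >= len(code):
                break
            offset = code[pos + 1]
            if offset >= 0x80:          # operand is a signed 8-bit offset
                offset -= 0x100
            # Offset is relative to the address of the next instruction.
            targets.add((origin + pos + 2 + offset) & 0xFFFF)
            pos += 2
        elif op in (JSR_ABS, JMP_ABS):
            if pos + 2 >= len(code):
                break
            targets.add(code[pos + 1] | (code[pos + 2] << 8))  # little-endian
            pos += 3
        elif op in LENGTHS:
            pos += LENGTHS[op]
        else:
            break  # opcode outside this sketch's table: stop the sweep
    return sorted(targets)


# Made-up test bytes, assembled by hand for this example.
sample = bytes.fromhex("78 D8 A2 FF 9A 20 10 80 D0 FB 4C 30 80")
labels = [f"L{addr:04X}" for addr in find_targets(sample, 0x8030)]
```

Run the same sweep over a data table and it will cheerfully produce "targets" from whatever bytes happen to look like `JMP` or `JSR`, which is where the bogus labels mentioned above come from.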
So what I normally do is start with a disassembler that takes a binary file and generates a text file containing the disassembly. (I don't generally use graphical tools; I've found that the hassle they introduce, especially with their custom data files and with interfacing to a build system, isn't worth it.) These disassemblers generally take an annotation file as a second input where you can specify what areas to treat as code vs. data and so on.
I start with annotations for the start address of the disassembly, the CPU's system vectors (reset, IRQ, etc.) if those addresses are present, and any other entry points I know about (e.g., the C64 cartridge entry point, in your case). I then examine the disassembly and work from there, adding annotations for obvious data areas, examining the disassemblies for jump targets as they appear to see if they look sensible or not, and so on. I also start assigning my own label names at this point, though these are usually still tentative, and adding comment annotations to have the disassembler insert blank lines between major blocks. (I may start adding actual comments about the structure of the code, too.)
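Finding those first entry points can itself be scripted. The following is a rough Python sketch, assuming a standard C64 autostart cartridge image mapped at $8000 (cold-start vector at $8000, warm-start vector at $8002, "CBM80" signature at $8004) and, for the CPU vectors, a ROM image whose last byte sits at $FFFF. The function names are invented for this example, and the output is a plain dictionary rather than any disassembler's annotation format.

```python
# "CBM80" as stored in cartridge ROM: the three letters have bit 7 set.
CBM80 = bytes([0xC3, 0xC2, 0xCD, 0x38, 0x30])


def word(image: bytes, offset: int) -> int:
    """Read a little-endian 16-bit value from the image."""
    return image[offset] | (image[offset + 1] << 8)


def cartridge_entry_points(image: bytes) -> dict[str, int]:
    """Entry points of a C64 autostart cartridge image mapped at $8000."""
    if image[4:9] != CBM80:
        raise ValueError("no CBM80 signature at $8004")
    return {"coldstart": word(image, 0), "warmstart": word(image, 2)}


def cpu_vectors(image: bytes) -> dict[str, int]:
    """6502 hardware vectors, assuming the image ends at $FFFF."""
    end = len(image)
    return {
        "nmi": word(image, end - 6),    # $FFFA
        "reset": word(image, end - 4),  # $FFFC
        "irq": word(image, end - 2),    # $FFFE
    }


# Made-up images for illustration only.
cart = bytes([0x30, 0x80, 0x5E, 0xFE]) + CBM80 + bytes(7)
rom_tail = bytes(10) + bytes([0x43, 0xFE, 0xE2, 0xFC, 0x48, 0xFF])
for name, addr in cartridge_entry_points(cart).items():
    print(f"{name}: ${addr:04X}")
```

Each address this returns is a place to tell the disassembler "this is code, start here".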
Once this is done I have in the past moved forward with further extensive annotations, trying to continue using the automated disassembler as long as possible, but I'm finding that the annotation file formats and the control they give me over the appearance of the disassembly output are not all that great. So I think I'm going to switch sooner, rather than later, from the disassembly being a target file (the output of the disassembler) to it being a source file, and stop using the disassembler tool from that point on. Before making this change you of course want to add your build system to build an output binary from the source and confirm that it matches the original input binary.
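The "confirm that it matches" step is worth automating from day one. Here is a minimal Python sketch of such a check; it is not the script from any of my repos, and `first_difference` and `check_rebuild` are names made up for this example. Reporting the offset of the first mismatch (rather than just "files differ") tells you where in the source to start looking.

```python
from pathlib import Path
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One file is a prefix of the other; they diverge where it ends.
        return min(len(a), len(b))
    return None


def check_rebuild(original: Path, rebuilt: Path) -> bool:
    """Compare the assembled output against the original binary."""
    diff = first_difference(original.read_bytes(), rebuilt.read_bytes())
    if diff is None:
        print(f"OK: {rebuilt} matches {original}")
        return True
    print(f"MISMATCH at offset ${diff:04X}: {rebuilt} vs {original}")
    return False
```

You would call `check_rebuild()` with the paths of the original dump and the freshly assembled file as the last step of the build, and fail the build if it returns `False`.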
All of this is done in a Git repo, where I keep the source binary (unless there are copyright concerns), all my scripts and the tools I've written, often submodules for disassembly tools (for which I write appropriate build scripts), and usually the target disassembly during the automated stage to make it easier for someone browsing GitHub or GitLab to just go and read it. At the top level I always include a script called Test that checks out and builds any necessary tools, runs the disassembly (if still at that stage), builds the source, and compares the assembled output to the original binary. A developer with a fresh clone of the repo should simply be able to type ./Test at the top level of his checkout and end up with either a working system ready for further development or some error messages indicating fairly clearly what's wrong.
You can see some examples of this in the various repos in the retroabandon group on GitLab. The one that's seen the most work is the Retrocomputing SE Mystery Board Disassembly, which is 6809 code but otherwise no different from doing a 6502 disassembly. I'd suggest walking through the commits from beginning to end (git log --reverse --stat --patch): the commit messages and diffs document my journey from a raw ROM dump for a machine I knew almost nothing about to a mostly-complete disassembly, including the tools I built and used. (This will probably take a few hours, but from it you can learn much more quickly a lot of what took me many dozens of hours of work to learn.)
Quote:
Is anybody else doing this sort of thing in the C64 space? It seems more common among the Atari 2600 crowd.
Well, I do it for all sorts of machines, including C64. You can have a look at the start of a disassembly of the Epyx Fastload cartridge that a friend and I started a while back; it's still in the very early stages (and at least temporarily abandoned, at this point), but you might find the infrastructure and tools there useful. (We use da65 from the cc65 suite to disassemble the code.) Again, it will probably be more comprehensible if you walk forward through the commits, since that's mainly where we documented what the various files are and so on.
Quote:
Is there a good "6502 ASM style guide" of sorts anywhere? I want the end result to be readable so that even a very fresh student of ASM can begin to follow it?
There's no such thing as one "good style": as with any other kind of writing, good style varies (often drastically) depending on the target audience, and improving code presentation for one audience often makes it worse for others. (I go into this in some detail in this answer on the Software Engineering StackExchange.)
Take, for example, this 6502 routine to convert an ASCII character to a binary number. It's only nine instructions, and not particularly tricky by the standards of an expert 6502 programmer. I annotate these 19 words (labels, ops and operands) of code with over 400 words of commentary on how they work because this particular routine is one I use as an example for programmers who are not yet very experienced with 6502 assembly and/or tricky bit manipulation. But all that verbosity makes it harder for even me to read, and if this were code in a project I was working on with experienced 6502 coders such as Dr. Jefyl or chromatix, I would most certainly trim it down to just some key hints about what's going on, because that would make it easier for them to read (obviously at the expense of the current target audience).
So when you're writing code and considering the style, ask yourself "Who will be reading this and what do they already know?" Try to tell them what they don't know, but also try to avoid telling them obvious things that they do know because that will just make it harder for them to read.
Quote:
Is there an unofficial "standard library" of sorts for 64 projects that will work with 64tass? I know a lot of the A2600-types use a consistent set of headers for dasm.
I don't know of one (because I've never used 64tass), but I do suggest that you use names and symbols from the C64 KERNAL source and popular disassemblies so that people familiar with those will more quickly understand your code. The standard library that comes with cc65 does this kind of thing.
Quote:
What's with the first instruction being sei? I very dimly recall reading about this as a common thing to do for cartridges, but I don't know why this is done.
Most likely the cart is doing some setup of the hardware, and it's important when touching anything that could generate interrupts that interrupts are not generated when you are not yet prepared to handle them. In the normal situation the C64 KERNAL ROM has already disabled interrupts and decimal mode and set up the stack, but the cartridge code doesn't know that this has actually run: perhaps someone took over the machine, set decimal mode, enabled interrupts and put the stack pointer at $00 before doing a JMP ($8000).
Sure, you could say, "well, we don't support that situation" and that wouldn't be unreasonable. But often (as in this case) it's easier for you as the developer simply to handle pathological situations in a nice way rather than breaking, because in the end you're usually going to be the one debugging any problems that arise. (Even if you say, "we don't support that," confirming that the situation is indeed that one you don't support can be many hours of debugging.)
So most cartridges start with the standard 6502 setup-from-reset code that I mention in my notes on CBM cartridge startup. Here, for example, is the code from the Epyx Fast Load cartridge:
Code:
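008030  1               coldstart:
008030  1  78                   sei             ; no IRQs until we're ready for them
008031  1  D8                   cld             ; binary (not decimal) arithmetic
008032  1  A2 FF                ldx #$FF
008034  1  9A                   txs             ; stack pointer to top of page 1
008035  1  A9 27                lda #$27
008037  1  85 01                sta PIO         ; $01: processor port
008039  1  A9 2F                lda #$2F
00803B  1  85 00                sta $00         ; $00: port data direction register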
...
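If the two constants in that listing look opaque, it can help to decode them bit by bit. Below is a small Python sketch that does so, assuming (as the `85 01` bytes suggest) that `PIO` is the 6510 processor port at $01 and that $00 is its data direction register. The bit names follow the usual C64 documentation; `PORT_BITS` and `set_bits` are names invented for this example.

```python
# Bits 0-5 of the 6510 on-chip I/O port as wired in the C64.
PORT_BITS = [
    "LORAM",           # bit 0: BASIC ROM banking
    "HIRAM",           # bit 1: KERNAL ROM banking
    "CHAREN",          # bit 2: character ROM vs. I/O
    "cassette write",  # bit 3: datasette data output
    "cassette sense",  # bit 4: datasette button sense (an input)
    "cassette motor",  # bit 5: datasette motor control (1 = motor off)
]


def set_bits(value: int) -> list[str]:
    """Names of the port bits that are 1 in value."""
    return [name for bit, name in enumerate(PORT_BITS) if value & (1 << bit)]


print("$01 <- $27, bits high:", set_bits(0x27))
print("$00 <- $2F, outputs:  ", set_bits(0x2F))
```

Read this way, $2F makes every line except the cassette sense line an output, and $27 drives the three banking lines high and leaves the cassette motor off.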