Here's what I wound up doing.
I defined a macro that inserts a BRK followed by a data byte into the code:
Code:
.macro debug value
brk
.byte value
.endmacro
Then I pointed the IRQ/BRK vector at $FFFE to a handler that calls fprintf to print the data byte and the values of A, X, Y and SP to stderr. The handler saves and restores all the cc65 zero page registers around the call, since I use them in my assembly code too.
Code:
.include "zeropage.inc"
.import _fprintf, _stderr
.import pushax, pusha0 ; cc65 runtime helpers used below
.zeropage
save_pc: .res 2
.bss
save_a: .res 1
save_x: .res 1
save_y: .res 1
save_sp: .res 1
save_cc65_regs: .res zpsavespace
.code
format: .byte "$%02X: A=%02X X=%02X Y=%02X SP=%02X", $0A, $00
; Prints the register values to stderr.
debug_handler:
cld ; Clear decimal flag (just in case)
sta save_a ; Save 6502 registers
stx save_x
sty save_y
tsx ; Get stack pointer into X
stx save_sp ; Save it so we can print it
ldy $102,x ; PC low byte
sty save_pc
ldy $103,x ; PC high byte
dey ; Subtract 256 from PC; we will index with Y = 255 to get PC-1
sty save_pc+1
ldx #0 ; Prepare to save cc65 registers
@save_reg:
lda sp,x ; sp is the first register
sta save_cc65_regs,x
inx
cpx #zpsavespace ; zpsavespace is a constant, so compare immediate
bne @save_reg
lda _stderr ; fprintf(stderr, ...
ldx _stderr+1
jsr pushax
lda #<format ; format, ...
ldx #>format
jsr pushax
ldy #$FF
lda (save_pc),y ; id, ...
jsr pusha0
lda save_a ; A, ...
jsr pusha0
lda save_x ; X, ...
jsr pusha0
lda save_y ; Y, ...
jsr pusha0
lda save_sp ; SP)
jsr pusha0
ldy #14 ; 14 bytes on the C stack
jsr _fprintf
ldx #0 ; Prepare to restore cc65 registers
@restore_reg:
lda save_cc65_regs,x
sta sp,x
inx
cpx #zpsavespace ; zpsavespace is a constant, so compare immediate
bne @restore_reg
lda save_a ; Restore 6502 registers
ldx save_x
ldy save_y
rti
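For completeness, here is one way the handler could be installed at run time. This is only a sketch: it assumes the target has writable RAM at $FFFE/$FFFF (on a ROM-based system the vector would have to be set in the ROM image or linker config instead), and install_debug_handler is just an illustrative name.
Code:
; Hypothetical helper: point the IRQ/BRK vector at debug_handler.
; Assumes $FFFE/$FFFF are writable RAM on the target.
install_debug_handler:
lda #<debug_handler ; Low byte of the handler address
sta $FFFE
lda #>debug_handler ; High byte of the handler address
sta $FFFF
rts
Note that the 6502 uses the same vector for IRQ and BRK, so this only works as-is if no hardware interrupts are in use.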
Then I just sprinkle the debug macro around problem parts of the code so I can see what's going on. Using BRK is nice because it's only 2 bytes (so less likely that the presence of the debug line will cause some branch to go out of range) and because I can stuff the extra value after the BRK, which I print along with the register values so I can tell which debug line generated the output.
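As an illustration of how a debug line looks in use (the register values and the ID $01 are made up for the example):
Code:
lda #$12
ldx #$34
ldy #$56
debug $01 ; Expands to BRK followed by .byte $01
With the handler installed, this should print a line beginning with "$01: A=12 X=34 Y=56", followed by SP= and the stack pointer as seen inside the handler. That value is three lower than at the debug line, because BRK pushes the return address and the status register before the handler runs.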