Well, progress is being made! I've got a simple breadboard set up with a 6551 and a 6522 for testing purposes. Since I'm still using the QXB board, I removed the resistor packs across the port 7 LEDs and soldered to CS0, CS3, and CS4. SRAM and flash work just fine, but I'm having some trouble capturing and using the CS0 signal.
From the datasheet, CS0 is linked to the DF00-DF1F address range. So, I used a 3-to-8 decoder with its select inputs on A2-A4, enabled when CS0 is low, which should give eight CS signals of four addresses each. In the circuit, however, it doesn't seem to be working as it should. As an example, I have my logic analyzer watching A2-A4 and the CS0 lines as I attempt to write to port 4 (should be $df10-$df13). I can see CS0 go low for $df11, $df12, and $df13, but for $df10 it stubbornly refuses to go low. A0-A2 are acting appropriately.
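For reference, here is a rough model of the decode I'm expecting, written as a small Python sketch rather than anything that runs on the board. The function name and the Y0-Y7 output labels are just illustrative, and it assumes A2 is wired to the decoder's least significant select input.

```python
# Sketch of the expected decode: CS0 covers $DF00-$DF1F, and A2-A4 pick
# one of the decoder outputs, each spanning four consecutive addresses.
# Assumption: A2 is the least significant select input of the decoder.

CS0_BASE = 0xDF00  # start of the CS0 window (datasheet range quoted above)
CS0_SIZE = 0x20    # $DF00-$DF1F is 32 bytes


def decoder_output(addr):
    """Return the decoder output number selected by A2-A4, or None outside CS0."""
    if not (CS0_BASE <= addr < CS0_BASE + CS0_SIZE):
        return None
    return (addr >> 2) & 0b111  # drop A0-A1, keep A2-A4


# Print the address block that each decoder output should respond to.
for base in range(CS0_BASE, CS0_BASE + CS0_SIZE, 4):
    print(f"${base:04X}-${base + 3:04X} -> Y{decoder_output(base)}")
```

Under these assumptions, all of $df10-$df13 land on the same output, so $df10 should behave exactly like $df11-$df13.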
A further look at the datasheet shows a couple of notes:
Note 2: When on-chip ROM, CS3B and/or CS4B are enabled, then CS5B decode is reduced by the addresses used by same. CS0B and CS1B address space never appears in CS2B, CS4B or CS5B decoded space.
Note 4: CS0B is inactive when 0xDF00-0xDF07 are used for internal I/O register select (BCR0=0) when (BCR0=1) external memory bus is enabled CS0B is active for addresses 0xDF00-0xDF1F.
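My reading of Note 4 can be written down as a small Python sketch. This is only one interpretation of the note: in particular, treating $DF08-$DF1F as still selecting CS0B when BCR0=0 is an assumption on my part, and the function name is made up for illustration.

```python
# One reading of Note 4 (an interpretation, not taken verbatim from the datasheet):
#   BCR0 = 1: CS0B is active for $DF00-$DF1F.
#   BCR0 = 0: $DF00-$DF07 select internal I/O registers, so CS0B stays inactive
#             there; the rest of the window is ASSUMED to still select CS0B.

def cs0_active(addr, bcr0):
    """Return True if CS0B should be asserted for this address under the reading above."""
    if not (0xDF00 <= addr <= 0xDF1F):
        return False  # outside the CS0 window entirely
    if bcr0 == 0 and addr <= 0xDF07:
        return False  # internal I/O register select takes priority
    return True


# $DF10 sits above the internal I/O range, so BCR0 should not matter for it.
for bcr0 in (0, 1):
    print(f"BCR0={bcr0}: CS0 active for $DF10? {cs0_active(0xDF10, bcr0)}")
```

If that reading is right, the note by itself doesn't explain a missing CS0 at $df10, whichever way BCR0 is set.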
Since I'm not using CS5 (yet) and it isn't enabled anyway (only CS0, CS3, and CS4 are enabled), the only thing that should matter is BCR0. The monitor ROM sets it to 1 after a reset, and I've verified that it doesn't get changed. Setting BCR3 (emulation mode) as well makes no difference: CS0 still doesn't go low for $df10.
I've sent a follow-up email to David Gray about this, but I thought that if someone here knows the answer, that would put me a little further ahead.
EDIT: I connected to all of the decoder's pins and ran a tight loop that writes every address from $df1d down to $df00 and then repeats, and I see all sixteen signals on the proper lines. This has to be a code issue.