Hi agsb,
It *is* easy (or it can be!)
Just be sure that ROM_CS\ and DEV_CS\ can't both be low at the same time, or two devices will try to drive the data bus at once.
One other thing you might want to think about is that 6522s and 6551s have two chip-select lines each, an active high one and an active low one. Depending on the complexity of your system, you might not need to decode them much at all. You could, e.g., connect A4 and A5 to CS1 and CS2\ of your VIA, and then connect them the other way round to CS0 and CS1\ of your ACIA.
Code:
xxxx xxxx xx01 0000 <- VIA addresses
. . .
xxxx xxxx xx01 1111
xxxx xxxx xx10 0000 <- ACIA addresses
. . .
xxxx xxxx xx10 1111
This works kind of like XOR: reading the bits in the same order as the table above, your IO devices will activate on A5/A4 = 01 (VIA) and 10 (ACIA), so you still have A5/A4 = 00 and 11 for RAM and ROM addresses. This kind of decoding will mirror your IO devices throughout your entire address space (all those 'don't care' x's), but depending on what you're building that might be OK! There are some areas of the address space you probably don't want to mirror your IO in, though. How would your address space look if you used A5 and A6 or some other pair of address lines instead of A4 and A5?
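If you want to poke at that question without drawing memory maps by hand, here is a rough Python sketch of the partial decode. It assumes the wiring described above (low line to the VIA's active high select and the ACIA's active low select, high line the other way round). The function names and the three regions checked (zero page, stack, and the vectors at $FFFA-$FFFF) are my own picks for illustration, so treat it as a sketch rather than a design.

```python
def partial_select(addr, lo_bit=4, hi_bit=5):
    """Return the device selected by two address lines, or None.

    Assumed wiring: VIA is selected when lo=1 and hi=0,
    ACIA is selected when lo=0 and hi=1. All other address
    lines are ignored ("don't care"), which causes mirroring.
    """
    lo = (addr >> lo_bit) & 1
    hi = (addr >> hi_bit) & 1
    if lo == 1 and hi == 0:
        return "VIA"
    if lo == 0 and hi == 1:
        return "ACIA"
    return None


def count_io_hits(start, end, lo_bit=4, hi_bit=5):
    """Count addresses in [start, end] that would select an IO device."""
    return sum(
        1 for a in range(start, end + 1)
        if partial_select(a, lo_bit, hi_bit) is not None
    )


# Regions a 6502 cares about (chosen here for illustration).
REGIONS = {
    "zero page": (0x0000, 0x00FF),
    "stack": (0x0100, 0x01FF),
    "vectors": (0xFFFA, 0xFFFF),
}

# Compare the A4/A5 pair with the A5/A6 pair from the question above.
for lo_bit, hi_bit in [(4, 5), (5, 6)]:
    for name, (start, end) in REGIONS.items():
        hits = count_io_hits(start, end, lo_bit, hi_bit)
        print(f"A{lo_bit}/A{hi_bit} {name}: {hits} IO addresses")
```

Changing the pairs in the last loop lets you try other address lines. The higher the pair, the bigger the contiguous blocks of RAM/ROM and IO become.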
Another way is to use your DEV_CS\ signal to drive both the VIA/ACIA active low selects (CS2\ for the VIA and CS1\ for the ACIA), and then use A4 and (NOT A4) to select between them. Depending on how you generate your DEV_CS\ signal, this can give you a more fully decoded address space. (You could use a 74x138 to generate your DEV_CS\ and park your IO at $8000 like you mentioned in your first post.)
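A rough Python model of this second scheme, in case it helps. Two things in it are my assumptions, not anything fixed by the above: DEV_CS\ is taken to be one output of a 74x138 fed with A13-A15, so it is low for $8000-$9FFF, and A4 goes to the VIA while (NOT A4) goes to the ACIA (you could just as well swap them).

```python
def dev_cs_n(addr):
    """Hypothetical active low DEV_CS: 0 when A15..A13 == 100 ($8000-$9FFF).

    Models one output of a 74x138 whose inputs are A13, A14 and A15.
    """
    return 0 if ((addr & 0xFFFF) >> 13) == 0b100 else 1


def decoded_select(addr):
    """Return the device selected at addr, or None outside the IO window.

    Assumed wiring: DEV_CS drives both active low selects;
    A4 drives the VIA's active high select and NOT A4 drives the ACIA's.
    """
    if dev_cs_n(addr) != 0:
        return None
    a4 = (addr >> 4) & 1
    return "VIA" if a4 == 1 else "ACIA"


for addr in (0x0010, 0x8000, 0x8010, 0x9FFF, 0xA010, 0xFFFC):
    print(f"${addr:04X}: {decoded_select(addr)}")
```

With this arrangement nothing below $8000 or above $9FFF selects IO, but inside the window the two devices still repeat every 32 bytes, because A5-A12 are not decoded.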
Some useful logic chips often used for address decoding are:
74x30 8 input NAND gate
74x688 8 bit magnitude comparator (it only tests for equality, which is all you need to match an address)
74x139 dual 2 to 4 line decoder / demux
74x138 3 to 8 line decoder / demux
The traditional way of fully decoding an address space was to use cascading 74x138s. This will still work fine if you're building a slow system, but every '138 in the chain adds its own propagation delay, and that adds up quickly!
One last thought: to me, setting aside 8k for IO is wasteful. That's room for *hundreds* of VIAs and ACIAs! Depending on your project, there might be tradeoffs that make that waste acceptable. However, if you're building a kind of "standard" system with a 32k SRAM and 32k EEPROM, your address space is already totally full of RAM and ROM. It's not like the old days, when your system might come from the store with 4k or even 2k of RAM. In this scenario, decoding IO is a problem of *stealing bytes* from the RAM or (in your case) the ROM area of your address space.
If you haven't already, I'd again suggest reading the two articles I linked in my previous post. Garth's page has a good discussion of why you might or might not care about mirroring your IO, and the COMPUTE II article talks about IO decoding as byte stealing.
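To put rough numbers on the 8k figure: a 6522 has four register-select lines (16 addresses) and a 6551 has two (4 addresses). A quick back-of-the-envelope check in Python, with variable names made up for the purpose:

```python
IO_WINDOW = 8 * 1024   # bytes set aside for IO
VIA_REGS = 16          # 6522: RS0-RS3 select 16 registers
ACIA_REGS = 4          # 6551: RS0-RS1 select 4 registers

via_slots = IO_WINDOW // VIA_REGS     # how many VIAs would fit with no gaps
acia_slots = IO_WINDOW // ACIA_REGS   # how many ACIAs would fit with no gaps

print(via_slots, acia_slots)
```

That works out to 512 VIAs or 2048 ACIAs packed end to end, which is why a window that size feels generous for a system with one or two IO chips.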