Last week I was playing with a PDP-8 replica (see here, here and here) and noticed that it has just a 12 bit address space - 4k words - but with a scheme for up to 8 banks of memory (which it calls fields). It seemed like a very simple scheme which might need minimal hardware, so I started thinking about how to do the same thing for a 6502 or 65C02. [The 65816 already has a solution for extended memory addressing!]
As it happens, I misunderstood what the PDP-8 offers: it has two bank registers, one for indirect addressing and the other for everything else. This is a little like the 6509, which has one register for opcodes B1 and 91 (that is, LDA and STA (zp),Y) and another for everything else. We discussed that here. It's also a little like Acorn's in-house extended-address machine, which we discussed here - in that case, it modifies all 16 opcodes which use the (zp) and (zp),Y addressing modes to add a third byte to the indirect address.
(Let me mention in passing previous ideas of using a TTL register file to act as an MMU. The tgl6502 does something similar, in physical emulation. Acorn's Beeb and the C64 and Apple II all have some distinguished memory range which can map any one of several devices - much like plugging in a different chip or cartridge which always appears at the same place in memory. The Apple III has a much more complex scheme, implemented with the help of a custom chip. Jeff's own KimKlone has a few high-address registers which come into play using custom opcodes or opcode prefixes.)
In discussion with Jeff, he pointed out some useful distinctions. For example, between three kinds of schemes:
- where some opcodes are treated specially
- where different address ranges are treated specially
- where different types of memory access are treated specially
Jeff also pointed out that using a wide pointer into a window of size 8k or 16k - or anything other than 256 bytes or a full 64k - requires some shifting and masking to assemble the right physical address.
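As a rough sketch of the shifting and masking Jeff means, here is a small Python model of a windowed scheme. Everything specific in it is an assumption for illustration only: an 8k window, a hypothetical window base of 0x8000 in the CPU's address space, and a `page` register that selects which 8k chunk of physical memory shows through the window.

```python
WINDOW_BITS = 13                 # assumed 8k window: 2**13 bytes
WINDOW_SIZE = 1 << WINDOW_BITS
WINDOW_BASE = 0x8000             # hypothetical, must be aligned to WINDOW_SIZE


def window_to_physical(cpu_addr, page):
    """Map a CPU address inside the window to a physical address."""
    if not (WINDOW_BASE <= cpu_addr < WINDOW_BASE + WINDOW_SIZE):
        raise ValueError("address is outside the window")
    offset = cpu_addr & (WINDOW_SIZE - 1)    # mask: keep the low 13 bits
    return (page << WINDOW_BITS) | offset    # shift: page supplies the high bits


def physical_to_window(phys_addr):
    """Split a wide pointer into (page register value, CPU address in the window)."""
    page = phys_addr >> WINDOW_BITS
    offset = phys_addr & (WINDOW_SIZE - 1)
    return page, WINDOW_BASE + offset
```

With a 256 byte or full 64k window the split falls on a byte boundary of the pointer, so the wide pointer's bytes can be used as they are; with 8k or 16k the page number and offset share a byte and have to be separated as above.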
Anyway, back to my thinking about a 6502 with minimal mapping hardware. I was thinking of just two memory extension registers, one applying to code and the other to data, each simply adding high address bits and therefore banking the whole 64k memory at once. By code I mean both the opcode and the operands, but nothing else. I wanted to avoid counting cycles and modelling the different instruction lengths as much as possible.
(I should confess I've done no thinking about interrupts. If the hardware needs to use interrupts, there's a need to preserve and restore the bank(s) before and after, and a need to map in the vectors at the right time, or to replicate them in each bank. Replication of vectors and handlers might be enough... but that does puncture the larger address space, if each bank has some fixed mapping in it, which then makes it harder to point at data structures which span bank boundaries.)
The minimum address extension would be a single 8 bit latch, half of which provides 4 bits of address extension for code accesses and the other half providing the high nibble for data accesses. For a little increase in complexity we could have two latches and offer up to 8 bits of address extension for each case.
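To make the single-latch version concrete, here is a minimal Python sketch of how the extended address might be formed. Which half of the latch serves code and which serves data is not decided above, so the split used here (high nibble for code, low nibble for data) is purely an assumption, as are the function and parameter names.

```python
def extend_address(latch, cpu_addr, is_code):
    """Form a 20 bit physical address from an 8 bit latch and a 16 bit CPU address.

    Assumption: the high nibble of the latch is the code bank and the
    low nibble is the data bank.
    """
    code_bank = (latch >> 4) & 0x0F
    data_bank = latch & 0x0F
    bank = code_bank if is_code else data_bank
    # The bank simply sits above the 16 address bits: no masking of cpu_addr
    # is needed beyond keeping it to 16 bits.
    return (bank << 16) | (cpu_addr & 0xFFFF)
```

Four extension bits give 16 banks of 64k, so 1MB in total. With two separate 8 bit latches the same concatenation would give a 24 bit address.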
But how to distinguish code cycles from other cycles without counting? It turns out it's enough to detect opcodes in columns 0 and 8 of the opcode table, which is just 3 bits of decode, with one caveat: we also need to detect page one accesses. (For the 65C02 we also need to decode column A, which takes a couple more gates I think.)
So, we need to use SYNC (and RDY if that's in play) to latch at least a few bits of the opcode, and we need to decode the top byte of the address bus to detect page one. (As an enhancement, we could also detect page zero accesses, and treat those specially. And, thinking about the interrupt problem, perhaps page FF accesses too.)
Oh, and we also always need to map the second access, the one after the opcode, to the code bank, because that's always either a dummy or an operand byte. But unlike the Acorn and Apple III schemes, we don't need to count up to six to intercept the final cycles of indexed addressing modes.
So, we've got code accesses satisfied from one bank, and data accesses satisfied by another, or by the same one, depending on how we've set the latches. We've got a single page one for the stack always in a fixed bank, and optionally we've got a single page zero too, always in the same bank.
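The resulting mapping can be summarised in a short Python model. This is only a sketch under stated assumptions: it takes the code/data decision as an input (`is_code_cycle`) rather than modelling the opcode decode, it assumes the fixed bank for page one and page zero is bank 0, and it assumes the page check takes priority over the code/data choice. None of those details are settled above.

```python
FIXED_BANK = 0  # assumption: stack page (and optional zero page) live in bank 0


def select_bank(cpu_addr, is_code_cycle, code_bank, data_bank, fix_zero_page=False):
    """Pick the bank for one bus cycle."""
    page = (cpu_addr >> 8) & 0xFF           # top byte of the 16 bit address
    if page == 0x01:
        return FIXED_BANK                   # stack always in the fixed bank
    if fix_zero_page and page == 0x00:
        return FIXED_BANK                   # optional: a single shared page zero
    return code_bank if is_code_cycle else data_bank


def banked_address(cpu_addr, is_code_cycle, code_bank, data_bank, fix_zero_page=False):
    """Concatenate the selected bank with the 16 bit CPU address."""
    bank = select_bank(cpu_addr, is_code_cycle, code_bank, data_bank, fix_zero_page)
    return (bank << 16) | (cpu_addr & 0xFFFF)
```

Setting `code_bank` and `data_bank` to the same value gives an ordinary flat 64k bank, apart from the fixed pages.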
Question is, is this useful? It's a very simple and very coarse banking scheme.
(As a sketch of how coarse banking might be useful, I think John Kortink's port of BBC Basic to the 816 allows for the Basic program's data to be in bank 1 and for everything else to be in bank 0. So that allows for programs to be up to 40k or so with data up to 64k. The bulk of the OS and all the I/O is in another 64k bank, in fact on another computer. So that's a sketch of three banks in play, all with mostly 16 bit pointers: OS and base machine, user program, user data.)