A few decades ago, Acorn were using "6502 Second Processors" to develop large programs like their operating systems and the early ARM systems - these were essentially 65C02 computers with 64K of RAM and no I/O, connected up to a BBC Micro to provide the missing I/O (keyboard, monitor, disk drives, networking, etc). They had a version of their BASIC that ran on this, and an in-house assembler called MASM. But after a while 64K wasn't enough, so they created a 256K version, and extended BASIC and MASM to support that.
The way the banking worked was interesting: they made all indirect,Y and indirect operations read an additional byte from page 3, at the address corresponding to the zero page address in the instruction, then took the bottom two bits of that byte and banked in the corresponding RAM for the remainder of the instruction. So rather than indirecting through a two-byte address in zero page, you're effectively indirecting through a three-byte address, with the top byte coming from page 3. In the indirect,Y mode, Y is still added on by the CPU afterwards, of course. For a slight performance penalty, they got very transparent access to more RAM.
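As a rough model of that addressing, here is a Python sketch that computes the 18-bit address an indirect,Y access ends up using. It assumes, as in the PLD design later in this post, that the bank byte is read from page 3 at the offset of the pointer's high byte (so $0324 for `lda ($23),y`). The function name `turbo_address` and the flat `mem` list are hypothetical, just for illustration.

```python
def turbo_address(mem: list, zp: int, y: int) -> int:
    """Return the 18-bit address used by an indirect,Y access through zero page `zp`.

    Hypothetical model: `mem` is a flat list covering at least pages 0-3.
    """
    lo = mem[zp & 0xFF]                   # pointer low byte
    hi_zp = (zp + 1) & 0xFF               # pointer high byte wraps within zero page
    hi = mem[hi_zp]
    bank = mem[0x300 | hi_zp] & 0x03      # only the bottom two bits select a 64K bank
    base = ((hi << 8) | lo) + y           # the CPU still adds Y as usual
    # In this model a carry out of bit 15 is dropped rather than reaching the bank bits
    return (bank << 16) | (base & 0xFFFF)


mem = [0] * 0x400
mem[0x23], mem[0x24] = 0x00, 0x12         # pointer at $23/$24 -> $1200
mem[0x324] = 0x72                         # bottom two bits = 2 -> third bank
print(hex(turbo_address(mem, 0x23, 0x10)))  # 0x21210
mem[0x324] = 0x00                         # zero bank byte: plain 6502 behaviour
print(hex(turbo_address(mem, 0x23, 0x10)))  # 0x1210
```

Note how a zero byte in page 3 gives exactly the normal 16-bit result, which is why existing code keeps working unchanged.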
You can read more about it on stardot if you're interested:
https://stardot.org.uk/forums/viewtopic.php?f=3&t=11366
There's also a first-hand account from one of the programmers who used this system when working on the BBC Master operating system code:
https://stardot.org.uk/forums/viewtopic ... 34#p287134
Reading about this, I thought it might be possible to fit all the logic into an ATF22V10 PLD, so I gave it a go; below is what I ended up with. I haven't tried it in hardware and it might not work, but at least the simulation works in WinCUPL, and I think the instruction cycles are correct now. Any comments on that would be most welcome.
There's also a big comment in the PLD code explaining how it works, the instruction timing, and what it needs to do.
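To make the instruction timing from that comment concrete, here is a small Python sketch that lists the bus reads for `lda ($zp),y` with and without Turbo, following its two cycle tables. It is a hypothetical illustration: `lda_ind_y_trace` is an invented name, it treats the page 3 read as its own cycle (as the Turbo table does, even though the PLD actually produces it by stretching the CPU clock), and it only notes the possible page-crossing cycle without modelling its address.

```python
from typing import Dict, List, Optional, Tuple


def lda_ind_y_trace(pc: int, zp: int, y: int, mem: Dict[int, int],
                    turbo: bool) -> List[Tuple[Optional[int], str]]:
    """Return (address, purpose) for each bus read of `lda (zp),y`.

    Hypothetical model: `mem` maps addresses to bytes, missing entries read as 0.
    """
    hi_zp = (zp + 1) & 0xFF              # pointer high byte wraps within zero page
    lo = mem.get(zp, 0)
    trace = [(pc, "opcode"), (pc + 1, "zero page operand"), (zp, "pointer low byte")]
    bank = 0
    if turbo:
        # Extra read: the next cycle's address with A8 and A9 forced high
        bank_addr = 0x300 | hi_zp
        trace.append((bank_addr, "bank byte from page 3"))
        bank = mem.get(bank_addr, 0) & 0x03
    trace.append((hi_zp, "pointer high byte"))
    target = (((mem.get(hi_zp, 0) << 8) | lo) + y) & 0xFFFF
    trace.append(((bank << 16) | target, "data read into A"))
    if lo + y > 0xFF:
        # Listed as in the comment ("possibly read again"); its address is not modelled
        trace.append((None, "extra cycle: adding Y carried into the high byte"))
    return trace


mem = {0x23: 0x00, 0x24: 0x12, 0x324: 0x02}   # ($23) points at $1200, bank byte 2
for turbo in (False, True):
    print("Turbo" if turbo else "Normal")
    for cyc, (addr, what) in enumerate(lda_ind_y_trace(0x2000, 0x23, 0x10, mem, turbo)):
        shown = "-----" if addr is None else f"${addr:05X}"
        print(f"  cyc {cyc}: {shown}  {what}")
```

The Turbo trace is one read longer, and the only new address is $0324, which differs from the following $0024 just in A8 and A9.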
Pinout:
          ______________
         |    turbo     |
 CLK x---|1           24|---x Vcc
SYNC x---|2           23|---x C0
  D0 x---|3           22|---x C1
  D1 x---|4           21|---x C2
  D2 x---|5           20|---x LA17
  D3 x---|6           19|---x LA16
  D4 x---|7           18|---x PHI0
  D5 x---|8           17|---x A17
  D6 x---|9           16|---x A16
  D7 x---|10          15|---x A9
 CA8 x---|11          14|---x A8
 GND x---|12          13|---x CA9
         |______________|
PLD source (WinCUPL):
Name turbo ;
PartNo 00 ;
Date 28/01/2023 ;
Revision 01 ;
Designer George Foot ;
Company gfoot360 ;
Assembly None ;
Location None ;
Device g22v10 ;
/* Revision 01 - incorporating feedback from cmorley, and inserting the idle
* cycle as it's not already something the CPU does
*/
/* Acorn's "Turbo" modification for the BBC Micro 6502 Second Processor
* extended the indirect,Y and indirect instructions to add an idle cycle
* and make it read an extra byte of data from a page 3 address corresponding
* to the accessed address in page 0, then used the bottom two bits of that
* data value to add two more address bits to the 16-bit address bus, allowing
* access to 256K of RAM rather than just the usual 64K, at the cost of that
* one extra cycle per indirect access. If the relevant
* byte in page 3 was zero, then the instruction would naturally behave just
* like a normal indirect,Y operation.
*
* This relies on cycle 3 being a read cycle, and on the 6502 holding the same
* address on the address bus for the whole of a stretched cycle, so only A8
* and A9 need to be overridden to turn part of that cycle into a read from
* page 3.
*
* Example: lda ($23),y
* cyc 0 : read from PC data = opcode
* cyc 1 : read from PC+1 data = $23
* cyc 2 : read from $0023 data = $NN (low byte of final address)
* cyc 3 : read from $0024 data = $MM (high byte of final address)
* cyc 4 : read from $MMNN+y data = value to load into A register
* cyc 5 : possibly read again if adding 'y' caused a carry
*
* With Turbo, we need to insert a cycle with address bus bits A8 and A9 set,
* causing a read from an address in page 3 corresponding to the zero page
* address used in cycle 3. To do this we need to insert it before cycle 3,
* not after it, so that the CPU is putting the right address on most of the
* address bus already.
*
* We can add the extra cycle using clock stretching or maybe RDY but clock
* stretching is simpler.
*
* Turbo example: lda ($23),y
* cyc 0 : read from PC data = opcode
* cyc 1 : read from PC+1 data = $23
* cyc 2 : read from $0023 data = $NN (low byte of final address)
* cyc 3 : read from $0324 data = $LL (very high byte of final addr)
* cyc 4 : read from $0024 data = $MM (high byte of final address)
* cyc 5 : read from $LLMMNN+y data = value to load into A register
* cyc 6 : possibly read again if adding 'y' caused a carry
*
* So we need to:
* * Watch for PHI2 falling with SYNC high and an indirect,Y opcode
* (maybe plain indirect too)
* * Wait until cycle 3
* * During cycle 3, while PHI2 is high force A8 and A9 high
* * During cycle 3, while PHI2 is high capture the data bus state
* * On cycle 3, when the input clock falls, still hold PHI2 high
* * From cycle 5 onwards, present the low bits of the captured data bus
* value as a high address byte (A16+) until the next SYNC
*
* indirect,Y opcodes are all of the form nnn10001. indirect are nnn10010.
* It is possible we can actually ignore the bottom two bits so long as any
* conflicting opcodes are for instructions with less than 6 cycles. But we
* need to spy on the bottom two bits of the data bus anyway so perhaps it's
* easy enough to pick out these opcodes accurately.
*
* Acting on the falling edge of the clock is hard for a simple PLD because it
* only supports one clock signal. We could probably use combinatorial logic
* instead of registered though for this bit.
*/
/* Inputs required:
*
* * CLK clock input - will be inverted
* * SYNC so we can coordinate spotting the opcode
* * D0-D7 so we can spot the opcodes we care about, and capture the
* low bits in the dummy read
* * CA8-CA9 CPU address bits 8 and 9 so we can pass them through most of
* the time
*
* Outputs required:
*
* * A8-A9 usually mirroring CA8-CA9 but sometimes forced to be 3
* * A16-A17 to access higher memory banks
* * PHI0 clock output to CPU to allow cycle stretching
*
* Intermediates:
*
* * C0-C2 cycle count for current instruction
* * LA16-LA17 latch the high address bus bits for the next cycle
*
* Hmm it looks like this might actually fit in a 22V10.
*
* The LA bits could maybe just be one bit, which enables/disables external
* logic like a multiplexer to decide whether to pass them through or not.
*
* D5-D7 are not actually required for the opcode check, so they could be left
* out if there's something else worth passing in as an input. I'm not sure what
* would help at this stage, though; as usual, it's outputs and buried logic
* that are scarce.
*/
pin 1 = CLK;
/* inputs - pins 2-11, 13, and whichever of 14-23 aren't used elsewhere */
pin 2 = SYNC;
pin [ 3..10 ] = [ D0..D7 ]; /* D5..D7 unused */
pin [ 11, 13 ] = [ CA8..CA9 ];
/* outputs - pins 14-23 */
pin [ 14..15 ] = [ A8..A9 ];
pin [ 16..17 ] = [ A16..A17 ];
pin 18 = PHI0;
/* intermediates */
pin [ 23, 22, 21 ] = [ C0..C2 ];
pin [ 19..20 ] = [ LA16..LA17 ];
Field DATA = [ D7..D0 ];
Field COUNT = [ C2..C0 ];
/* C0..C2 count cycles since the magic opcode was spotted. A SYNC without the
* magic opcode sets them to all bits set. When they reach all bits set, they
* do not wrap around, so this is an idle state. So the logic is, each should
* transition to set iff:
*
* SYNC and not the right opcode
* or !SYNC and all three were already set (stop at 7)
* or !SYNC and this bit was already set and there's no carry
* or !SYNC and this bit was not set but all lower bits were set
*/
/* This precise test could be relaxed if necessary to reduce product term usage */
wrong_op = !D4 # D3 # D2 # !D1 & !D0; /* 4 product terms */
at_limit = C0 & C1 & C2;
C0.d = SYNC & wrong_op # !SYNC & (at_limit # !C0 ); /* 6 PTs */
C1.d = SYNC & wrong_op # !SYNC & (at_limit # C1 & !( C0) # !C1 & ( C0)); /* 7 PTs */
C2.d = SYNC & wrong_op # !SYNC & (at_limit # C2 & !(C1 & C0) # !C2 & (C1 & C0)); /* 8 PTs */
/* For registered logic, the count will be one step behind. Combinatorial
* will see up-to-date count. */
[C0..C2].ar = 'b'0;
[C0..C2].sp = 'b'0;
/* PHI0 is an inverted copy of CLK except during cycle 3 when it stays high throughout */
PHI0 = !CLK # COUNT:'h'3;
/* During cycle 3 we force A8 and A9 high. Note COUNT=2 during cycle 3. */
[A9..A8] = [CA9..CA8] # COUNT:'h'2;
/* At the end of cycle 3, clock the data bits into LA16 and LA17 for later
* reference, clearing it on sync. Note that COUNT=2 prior to the clock edge. */
[LA17..LA16].d = !SYNC & !(COUNT:'h'2) & [LA17..LA16] # COUNT:'h'2 & [D1..D0];
[LA17..LA16].ar = 'b'0;
[LA17..LA16].sp = 'b'0;
/* For cycles 5 or later, output the latched address bits. Note COUNT is one
* lower prior to the clock edge, and we subtract one more because it also
* indicates the previous cycle not the current cycle - so this applies to
* COUNT values of 3 or greater. Also don't do this if SYNC because that's
* the next instruction fetch. */
[A17..A16].d = !SYNC & (COUNT:'h'3 # C2) & [LA17..LA16];
[A17..A16].ar = 'b'0;
[A17..A16].sp = 'b'0;
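The product-term equations for `wrong_op` and C0..C2 are easy to get subtly wrong, so here is a Python sketch that transcribes them term by term (my own transcription, not anything WinCUPL generates) and checks them exhaustively against the behaviour the comments describe. `next_count` and `intended_count` are names invented for this sketch.

```python
def wrong_op(d: int) -> bool:
    """Transcription of: wrong_op = !D4 # D3 # D2 # !D1 & !D0"""
    d4, d3, d2, d1, d0 = ((d >> n) & 1 == 1 for n in (4, 3, 2, 1, 0))
    return (not d4) or d3 or d2 or (not d1 and not d0)


def next_count(c: int, sync: bool, d: int) -> int:
    """Transcription of the C0.d, C1.d and C2.d equations (COUNT = C2..C0)."""
    c0, c1, c2 = bool(c & 1), bool(c & 2), bool(c & 4)
    to_idle = sync and wrong_op(d)
    at_limit = c0 and c1 and c2
    n0 = to_idle or (not sync and (at_limit or not c0))
    n1 = to_idle or (not sync and (at_limit or (c1 and not c0) or (not c1 and c0)))
    n2 = to_idle or (not sync and (at_limit or (c2 and not (c1 and c0))
                                   or (not c2 and c1 and c0)))
    return int(n0) | int(n1) << 1 | int(n2) << 2


def intended_count(c: int, sync: bool, d: int) -> int:
    """What the comment above C0..C2 says the counter should do."""
    if sync:
        return 7 if wrong_op(d) else 0   # new instruction: idle unless it's one we want
    return min(c + 1, 7)                 # otherwise count up, stopping at 7


# Exhaustive check of the counter equations against the intended behaviour
for c in range(8):
    for sync in (False, True):
        for d in range(256):
            assert next_count(c, sync, d) == intended_count(c, sync, d)

# Opcodes that start the count: bits 4..2 = 100 and bits 1..0 not both 0
matching = [op for op in range(256) if not wrong_op(op)]
assert matching == [op for op in range(256)
                    if (op & 0x1C) == 0x10 and (op & 0x03) != 0]
print(len(matching), " ".join(f"${op:02X}" for op in matching))
```

Run as-is, the asserts pass, and the printed list shows that the test as written also lets through the nnn10011 opcodes ($13, $33, ... $F3) alongside the indirect,Y and indirect ones, 24 opcodes in all.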
Simulation vectors (WinCUPL):
Name turbo;
PartNo 00;
Date 28/01/2023;
Revision 01;
Designer George Foot;
Company gfoot360;
Assembly None;
Location None;
Device g22v10;
ORDER: CLK, %1, PHI0, %1, SYNC, %1, DATA, %2, COUNT;
VECTORS:
P 0 0 '00' '0'
$msg "Test PHI0 and COUNT behaviour - non-triggering instructions";
0 H 1 '00' "7"
1 L 1 '00' "7"
0 H 0 '00' "7"
1 L 0 '00' "7"
0 H 0 '12' "7"
1 L 0 '12' "7"
0 H 1 '56' "7"
1 L 1 '56' "7"
0 H 0 '11' "7"
1 L 0 '11' "7"
$msg "Test PHI0 and COUNT behaviour - triggering instruction";
0 H 0 '00' "7"
1 L 0 '00' "7"
0 H 1 '12' "7" /* cyc 0 ph2 */
1 L 1 '12' "0"
0 H 0 '00' "0" /* cyc 1 ph2 */
1 L 0 '00' "1"
0 H 0 '00' "1" /* cyc 2 ph2 */
1 L 0 '00' "2"
0 H 0 '00' "2" /* cyc 3 ph2 */
1 H 0 '00' "3"
0 H 0 '00' "3" /* cyc 4 ph2 */
1 L 0 '00' "4"
0 H 0 '00' "4" /* cyc 5 ph2 */
1 L 0 '00' "5"
0 H 0 '00' "5"
1 L 0 '00' "6"
0 H 0 '00' "6"
1 L 0 '00' "7"
0 H 0 '00' "7"
1 L 0 '00' "7"
$msg "Test PHI0 and COUNT behaviour - odd state recovery";
0 H 1 'f1' "7"
1 L 1 'f1' "0"
0 H 0 '00' "0"
1 L 0 '00' "1"
0 H 0 '00' "1"
1 L 0 '00' "2"
0 H 1 'f1' "2"
1 L 1 'f1' "0"
0 H 0 '00' "0"
1 L 0 '00' "1"
0 H 0 '00' "1"
1 L 0 '00' "2"
0 H 1 '00' "2"
1 L 1 '00' "7"
ORDER: CLK, %1, PHI0, %1, SYNC, %1, DATA, %2, COUNT, %2, CA9, CA8, %1, A9, A8, %2, LA17, LA16, %1, A17, A16;
VECTORS:
P 0 0 '00' '0' 00 00 11 11
$msg "Test A8 and A9 passthrough";
C H 0 '00' "7" 00 LL LL LL
C H 0 '00' "7" 01 LH LL LL
C H 0 '00' "7" 10 HL LL LL
C H 0 '00' "7" 11 HH LL LL
$msg "Test cycle 3 read from page 3, latching LA16..LA17, setting A16..A17";
C H 1 '00' "7" 00 LL LL LL
C H 1 '12' "0" 00 LL LL LL
C H 0 '00' "1" 00 LL LL LL
0 H 0 '00' "1" 00 LL LL LL /* cycle 2 ph2 - read from ZP */
1 L 0 '00' "2" 00 HH LL LL /* cycle 3 ph1 */
0 H 0 '72' "2" 00 HH LL LL /* cycle 3 ph2 - read from page 3+ZP+1 */
1 H 0 '72' "3" 00 LL HL LL /* cycle 4 ph1 - stretched clock */
0 H 0 '10' "3" 00 LL HL LL /* cycle 4 ph2 - read from ZP+1 */
1 L 0 '00' "4" 10 HL HL HL /* cycle 5 ph1 */
0 H 0 '00' "4" 10 HL HL HL /* cycle 5 ph2 */
1 L 0 '00' "5" 11 HH HL HL /* cycle 6 ph1 */
0 H 0 '00' "5" 11 HH HL HL /* cycle 6 ph2 */
1 L 0 '00' "6" 11 HH HL HL /* cycle 7 ph1 - shouldn't ever get this far */
0 H 0 '00' "6" 11 HH HL HL /* cycle 7 ph2 */
C H 0 '00' "7" 00 LL HL HL
C H 0 '00' "7" 00 LL HL HL
$msg "Test that it resets correctly on next SYNC";
C H 1 '00' "7" 00 LL LL LL
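As a cross-check on the vectors above, here is a rough Python model of the PLD that steps through the second vector set and prints rows in the same layout. It is only a sketch under my own reading of the simulator: registered signals load on each CLK rising edge from the inputs on that row, a 'C' row is a complete 0-1-0 clock pulse, the preload row is replaced by simply starting in the idle state the following rows expect, and the counter uses its intended behaviour (as described in the comment above C0..C2) rather than the raw product terms. `TurboModel`, `hl` and `ROWS` are names invented for this sketch.

```python
def wrong_op(d: int) -> bool:
    # wrong_op = !D4 # D3 # D2 # !D1 & !D0
    return (not (d & 0x10)) or bool(d & 0x08) or bool(d & 0x04) or not (d & 0x03)


def hl(value: int, bits: int = 2) -> str:
    """Format a value as H/L levels, most significant bit first (like the vectors)."""
    return "".join("H" if (value >> b) & 1 else "L" for b in reversed(range(bits)))


class TurboModel:
    """Hypothetical half-cycle model of the turbo PLD equations.

    COUNT, LA17..LA16 and A17..A16 are registered and load on each rising edge
    of CLK from the inputs on that row; PHI0 and A9..A8 are combinatorial.
    """

    def __init__(self) -> None:
        # Idle state: COUNT=7, latches low, CLK low
        self.count, self.la, self.a_hi, self.clk = 7, 0, 0, 0

    def _rising_edge(self, sync: bool, data: int) -> None:
        c = self.count
        if sync:
            nxt = 7 if wrong_op(data) else 0   # new instruction: arm only for our opcodes
        else:
            nxt = min(c + 1, 7)                # count cycles, sticking at 7
        # [LA17..LA16].d = !SYNC & !(COUNT:2) & LA # COUNT:2 & [D1..D0]
        la = (self.la if not sync and c != 2 else 0) | (data & 3 if c == 2 else 0)
        # [A17..A16].d = !SYNC & (COUNT:3 # C2) & LA, using LA from before the edge
        a_hi = self.la if not sync and (c == 3 or c >= 4) else 0
        self.count, self.la, self.a_hi = nxt, la, a_hi

    def step(self, clk: str, sync: int, data: int, ca: int):
        """Apply one vector row; clk is '0', '1' or 'C' (a complete 0-1-0 pulse)."""
        if clk == "C" or (clk == "1" and self.clk == 0):
            self._rising_edge(bool(sync), data)
        self.clk = 0 if clk == "C" else int(clk)
        phi0 = (not self.clk) or self.count == 3      # PHI0 = !CLK # COUNT:3
        a98 = ca | (0b11 if self.count == 2 else 0)   # [A9..A8] = [CA9..CA8] # COUNT:2
        return phi0, self.count, a98, self.la, self.a_hi


ROWS = [  # (CLK, SYNC, DATA, CA9..CA8) from the second vector set
    ("C", 0, 0x00, 0b00), ("C", 0, 0x00, 0b01), ("C", 0, 0x00, 0b10), ("C", 0, 0x00, 0b11),
    ("C", 1, 0x00, 0b00), ("C", 1, 0x12, 0b00), ("C", 0, 0x00, 0b00),
    ("0", 0, 0x00, 0b00), ("1", 0, 0x00, 0b00), ("0", 0, 0x72, 0b00), ("1", 0, 0x72, 0b00),
    ("0", 0, 0x10, 0b00), ("1", 0, 0x00, 0b10), ("0", 0, 0x00, 0b10), ("1", 0, 0x00, 0b11),
    ("0", 0, 0x00, 0b11), ("1", 0, 0x00, 0b11), ("0", 0, 0x00, 0b11),
    ("C", 0, 0x00, 0b00), ("C", 0, 0x00, 0b00), ("C", 1, 0x00, 0b00),
]

model = TurboModel()
for clk, sync, data, ca in ROWS:
    phi0, count, a98, la, a_hi = model.step(clk, sync, data, ca)
    print(clk, "H" if phi0 else "L", sync, f"'{data:02x}'", f'"{count}"',
          f"{ca:02b}", hl(a98), hl(la), hl(a_hi))
```

Under these assumptions every printed row matches the expected PHI0, COUNT, A9..A8, LA17..LA16 and A17..A16 columns of the corresponding vector, including the stretched PHI0 on the COUNT=3 row and the bank bits appearing from COUNT=4 onwards.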