6502.org Forum

All times are UTC




Posted: Sat Jan 28, 2023 11:10 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
A few decades ago Acorn were using "6502 Second Processors" to develop large programs like their operating systems and the early ARM systems - these were essentially 65C02 computers with 64K of RAM and no I/O, connected up to a BBC Micro to provide the missing I/O - keyboard, monitor, disk drives, networking, etc. They had a version of their BASIC that ran on this, and an in-house assembler called MASM. But after a while 64K wasn't enough, so they created a 256K version, and extended BASIC and MASM to support that.

The way the banking worked was interesting - they made all indirect,Y and indirect operations read an additional byte from page 3 at an address corresponding to the zero page address in the instruction. They then took the bottom two bits of that byte and used them to select which 64K bank of RAM was used for the remainder of the instruction. So rather than indirecting through a two-byte address in zero page, you're effectively indirecting through a three-byte address, with the top byte coming from page 3. In the indirect,Y mode, Y is still added on by the CPU afterwards, of course. For a slight performance penalty they got very transparent access to more RAM.
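
As a rough illustration, here's a Python sketch of the 18-bit address that such an access ends up using. The function name and the `bank_offset` parameter are hypothetical. The default of 1 matches the PLD design below, which reads the bank byte from $0324 for `lda ($23),y`, but which page 3 byte Acorn's own hardware used is not confirmed here. Zero page and page 3 are assumed to live in bank 0.

```python
def turbo_effective_address(bank0: bytes, zp: int, y: int = 0, bank_offset: int = 1) -> int:
    """18-bit address for a Turbo-style (zp),Y access (hypothetical helper).

    bank0       -- the 64K of bank 0, which holds zero page and page 3
    zp          -- zero page operand of the instruction, e.g. 0x23
    y           -- Y register (use 0 for plain (zp) indirect)
    bank_offset -- assumption: the bank byte lives at $0300 + zp + bank_offset
    """
    lo = bank0[zp & 0xFF]
    hi = bank0[(zp + 1) & 0xFF]                      # pointer wraps within page 0
    bank = bank0[0x300 | ((zp + bank_offset) & 0xFF)] & 0b11  # bottom two bits only
    # The CPU does its normal 16-bit add; in this sketch the carry never
    # reaches the two extra bank bits.
    addr16 = (((hi << 8) | lo) + y) & 0xFFFF
    return (bank << 16) | addr16


mem = bytearray(0x10000)
mem[0x23], mem[0x24] = 0x00, 0x40   # pointer $4000
mem[0x324] = 0x72                   # bottom bits %10 select bank 2
print(hex(turbo_effective_address(mem, 0x23, y=0x10)))  # prints 0x24010
```

If the page 3 byte is zero, the result is just the ordinary 16-bit (zp),Y address.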

You can read more about it on stardot if you're interested: https://stardot.org.uk/forums/viewtopic.php?f=3&t=11366
including a first-hand account from one of the programmers who used this system when working on the BBC Master operating system code: https://stardot.org.uk/forums/viewtopic ... 34#p287134

Reading about this, I thought it might be possible to fit all the logic into an ATF22V10 PLD, so I gave it a go - below is what I ended up with. I haven't tried it in hardware and it might not work, but at least it passes simulation in WinCUPL, and I think the instruction cycles are correct now. Any comments on that would be most welcome.

There's also a big comment in the PLD code explaining how it works, the instruction timing, and what it needs to do.

Code:
                               ______________
                              |    turbo     |
                      CLK x---|1           24|---x Vcc                     
                     SYNC x---|2           23|---x C0                       
                       D0 x---|3           22|---x C1                       
                       D1 x---|4           21|---x C2                       
                       D2 x---|5           20|---x LA17                     
                       D3 x---|6           19|---x LA16                     
                       D4 x---|7           18|---x PHI0                     
                       D5 x---|8           17|---x A17                     
                       D6 x---|9           16|---x A16                     
                       D7 x---|10          15|---x A9                       
                      CA8 x---|11          14|---x A8                       
                      GND x---|12          13|---x CA9                     
                              |______________|


Code:
Name     turbo ;
PartNo   00 ;
Date     28/01/2023 ;
Revision 01 ;
Designer George Foot ;
Company  gfoot360 ;
Assembly None ;
Location None ;
Device   g22v10 ;

/* Revision 01 - incorporating feedback from cmorley, and inserting the idle
 *               cycle as it's not already something the CPU does
 */

/* Acorn's "Turbo" modification for the BBC Micro 6502 Second Processor
 * extended the indirect,Y and indirect instructions to add an extra cycle
 * that reads a byte of data from a page 3 address corresponding to the
 * accessed address in page 0, then used the bottom two bits of that data
 * value to add two more address bits to the 16-bit address bus, allowing
 * access to 256K of RAM rather than just the usual 64K at a slight
 * performance cost.  If the relevant byte in page 3 was zero, the
 * instruction would behave just like a normal indirect,Y operation.
 *
 * The CPU does not reliably have a spare idle cycle to borrow (for loads it
 * only has one when adding Y carries into the high byte), so this design
 * inserts one by stretching the clock.  The inserted cycle is a read, and
 * during it the 6502 is already driving the zero page address of the
 * pointer's high byte, so only A8 and A9 need to be forced.
 *
 * Example:  lda ($23),y
 *    cyc 0 :   read from PC       data = opcode
 *    cyc 1 :   read from PC+1     data = $23
 *    cyc 2 :   read from $0023    data = $NN (low byte of final address)
 *    cyc 3 :   read from $0024    data = $MM (high byte of final address)
 *    cyc 4 :   read from $MMNN+y  data = value to load into A register
 *    cyc 5 :   possibly read again if adding 'y' caused a carry
 *
 * With Turbo, we need to insert a cycle with address bus bits A8 and A9 set,
 * causing a read from an address in page 3 corresponding to the zero page
 * address used in cycle 3.  To do this we need to insert it before cycle 3,
 * not after it, so that the CPU is putting the right address on most of the
 * address bus already.
 *
 * We can add the extra cycle using clock stretching or maybe RDY but clock
 * stretching is simpler.
 *
 * Turbo example:  lda ($23),y
 *    cyc 0 :   read from PC         data = opcode
 *    cyc 1 :   read from PC+1       data = $23
 *    cyc 2 :   read from $0023      data = $NN (low byte of final address)
 *    cyc 3 :   read from $0324      data = $LL (very high byte of final addr)
 *    cyc 4 :   read from $0024      data = $MM (high byte of final address)
 *    cyc 5 :   read from $LLMMNN+y  data = value to load into A register
 *    cyc 6 :   possibly read again if adding 'y' caused a carry
 *
 * So we need to:
 *    * Watch for PHI2 rising with SYNC high and an indirect,Y opcode
 *         (maybe plain indirect too)
 *    * Wait until cycle 3
 *    * During cycle 3, while PHI2 is high force A8 and A9 high
 *    * During cycle 3, while PHI2 is high capture the data bus state
 *    * On cycle 3, when the input clock falls, still hold PHI2 high
 *    * From cycle 5 onwards, present the low bits of the captured data bus
 *         value as a high address byte (A16+) until the next SYNC
 *
 * indirect,Y opcodes are all of the form nnn10001.  indirect are nnn10010.
 * It is possible we can actually ignore the bottom two bits so long as any
 * conflicting opcodes are for instructions with less than 6 cycles.  But we
 * need to spy on the bottom two bits of the data bus anyway so perhaps it's
 * easy enough to pick out these opcodes accurately.
 *
 * Acting on the falling edge of the clock is hard for a simple PLD because it
 * only supports one clock signal.  We could probably use combinatorial logic
 * instead of registered though for this bit.
 */

/* Inputs required:
 *
 *    * CLK       clock input - will be inverted
 *    * SYNC      so we can coordinate spotting the opcode
 *    * D0-D7     so we can spot the opcodes we care about, and capture the
 *                    low bits in the dummy read
 *    * CA8-CA9   CPU address bits 8 and 9 so we can pass them through most of
 *                    the time
 *
 * Outputs required:
 *
 *    * A8-A9     usually mirroring CA8-CA9 but sometimes forced to be 3
 *    * A16-A17   to access higher memory banks
 *    * PHI0      clock output to CPU to allow cycle stretching
 *
 * Intermediates:
 *
 *    * C0-C2       cycle count for current instruction
 *    * LA16-LA17   latch the high address bus bits for the next cycle
 *
 * Hmm it looks like this might actually fit in a 22V10.
 *
 * The LA bits could maybe just be one bit, which enables/disables external
 * logic like a multiplexer to decide whether to pass them through or not.
 *
 * D5-D7 are not actually required for the opcode check, so could be left out,
 * if there's something else worth passing in as an input - not sure what
 * would help with anything though at this stage, as usual it's outputs and
 * buried logic that's scarce.
 */

pin 1 = CLK;

/* inputs - pins 2-11, 13, and whichever of 14-23 aren't used elsewhere */
pin 2 = SYNC;
pin [ 3..10 ] = [ D0..D7 ];     /* D5..D7 unused */
pin [ 11, 13 ] = [ CA8..CA9 ];

/* outputs - pins 14-23 */
pin [ 14..15 ] = [ A8..A9 ];
pin [ 16..17 ] = [ A16..A17 ];
pin 18 = PHI0;

/* intermediates */
pin [ 23, 22, 21 ] = [ C0..C2 ];
pin [ 19..20 ] = [ LA16..LA17 ];


Field DATA = [ D7..D0 ];
Field COUNT = [ C2..C0 ];

/* C0..C2 count cycles since the magic opcode was spotted.  A SYNC without the
 * magic opcode sets them to all bits set.  When they reach all bits set, they
 * do not wrap around, so this is an idle state.  So the logic is, each should
 * transition to set iff:
 *
 *    SYNC and not the right opcode
 * or !SYNC and all three were already set (stop at 7)
 * or !SYNC and this bit was already set and there's no carry
 * or !SYNC and this bit was not set but all lower bits were set
 */

/* This precise test could be relaxed if necessary to reduce product term usage */
wrong_op = !D4 # D3 # D2 # !D1 & !D0; /* 4 product terms */
at_limit = C0 & C1 & C2;

C0.d = SYNC & wrong_op # !SYNC & (at_limit #                   !C0            ); /* 6 PTs */
C1.d = SYNC & wrong_op # !SYNC & (at_limit # C1 & !(     C0) # !C1 & (     C0)); /* 7 PTs */
C2.d = SYNC & wrong_op # !SYNC & (at_limit # C2 & !(C1 & C0) # !C2 & (C1 & C0)); /* 8 PTs */

/* For registered logic, the count will be one step behind.  Combinatorial
 * will see up-to-date count. */
[C0..C2].ar = 'b'0;
[C0..C2].sp = 'b'0;

/* PHI0 is an inverted copy of CLK except during cycle 3 when it stays high throughout */
PHI0 = !CLK # COUNT:'h'3;

/* During cycle 3 we force A8 and A9 high.  Note COUNT=2 during cycle 3. */
[A9..A8] = [CA9..CA8] # COUNT:'h'2;

/* At the end of cycle 3, clock the data bits into LA16 and LA17 for later
 * reference, clearing them on SYNC.  Note that COUNT=2 prior to the clock edge. */
[LA17..LA16].d = !SYNC & !(COUNT:'h'2) & [LA17..LA16] # COUNT:'h'2 & [D1..D0];
[LA17..LA16].ar = 'b'0;
[LA17..LA16].sp = 'b'0;

/* For cycles 5 or later, output the latched address bits.  Note COUNT is one
 * lower prior to the clock edge, and we subtract one more because it also
 * indicates the previous cycle not the current cycle - so this applies to
 * COUNT values of 3 or greater.  Also don't do this if SYNC because that's
 * the next instruction fetch. */
[A17..A16].d = !SYNC & (COUNT:'h'3 # C2) & [LA17..LA16];
[A17..A16].ar = 'b'0;
[A17..A16].sp = 'b'0;
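
For checking the sequencing outside WinCUPL, here's a minimal Python behavioural model of the equations above. It is a sketch, not a substitute for the simulator. The names (`TurboPLD`, `rising_edge`, `next_count`, `is_trigger`) are hypothetical. Each `rising_edge()` call stands for one rising edge of CLK, i.e. the end of a PHI0-high phase, and every register samples the state from before the edge. Propagation delays are not modelled.

```python
from dataclasses import dataclass


def is_trigger(data: int) -> bool:
    """!wrong_op, where wrong_op = !D4 # D3 # D2 # !D1 & !D0.

    Accepts nnn10001 ((zp),Y) and nnn10010 ((zp)), and also nnn10011,
    exactly as the CUPL expression does.
    """
    d = [bool((data >> n) & 1) for n in range(8)]
    wrong_op = (not d[4]) or d[3] or d[2] or (not d[1] and not d[0])
    return not wrong_op


def next_count(count: int, sync: bool, data: int) -> int:
    """Bit-by-bit copy of the C0.d, C1.d and C2.d equations."""
    c0, c1, c2 = bool(count & 1), bool(count & 2), bool(count & 4)
    at_limit = c0 and c1 and c2
    set_all = sync and not is_trigger(data)          # SYNC & wrong_op
    n0 = set_all or (not sync and (at_limit or not c0))
    n1 = set_all or (not sync and (at_limit or c1 != c0))
    n2 = set_all or (not sync and (at_limit or c2 != (c1 and c0)))
    return int(n0) | (int(n1) << 1) | (int(n2) << 2)


@dataclass
class TurboPLD:
    count: int = 7  # C2..C0; 7 is the idle state
    la: int = 0     # LA17..LA16, latched page 3 bits
    a_hi: int = 0   # A17..A16, registered output

    def phi0(self, clk: int) -> int:
        """PHI0 = !CLK # COUNT:'h'3 (held high to stretch the cycle)."""
        return int(not clk or self.count == 3)

    def a9a8(self, ca: int) -> int:
        """[A9..A8] = [CA9..CA8] # COUNT:'h'2 (forced to page 3)."""
        return (ca | 0b11) if self.count == 2 else ca

    def rising_edge(self, sync: bool, data: int) -> None:
        """One rising edge of CLK: every .d equation sees the old state."""
        count, la = self.count, self.la
        self.count = next_count(count, sync, data)
        # [LA17..LA16].d = !SYNC & !(COUNT:'h'2) & LA # COUNT:'h'2 & [D1..D0]
        if count == 2:
            self.la = data & 0b11
        else:
            self.la = la if not sync else 0
        # [A17..A16].d = !SYNC & (COUNT:'h'3 # C2) & LA, i.e. COUNT >= 3
        self.a_hi = la if (not sync and count >= 3) else 0


pld = TurboPLD()
pld.rising_edge(sync=True, data=0x12)   # end of opcode fetch: count -> 0
pld.rising_edge(sync=False, data=0x00)  # count -> 1
pld.rising_edge(sync=False, data=0x00)  # count -> 2: A8/A9 forced high
print(pld.a9a8(0b00))                   # prints 3
pld.rising_edge(sync=False, data=0x72)  # latch %10 from page 3; count -> 3
print(pld.phi0(clk=1), pld.la)          # prints 1 2 (PHI0 held high)
pld.rising_edge(sync=False, data=0x10)  # count -> 4: bank appears on A17..A16
print(pld.a_hi)                         # prints 2
```

The values follow the same sequence as the triggering test vectors in the simulation file below. A8/A9 are high while COUNT=2, PHI0 is held high while COUNT=3, and the latched bits appear on A17..A16 from COUNT=4.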


Code:
Name     turbo;
PartNo   00;
Date     28/01/2023;
Revision 01;
Designer George Foot;
Company  gfoot360;
Assembly None;
Location None;
Device   g22v10;


ORDER: CLK, %1, PHI0, %1, SYNC, %1, DATA, %2, COUNT;

VECTORS:
P 0 0 '00' '0'

$msg "Test PHI0 and COUNT behaviour - non-triggering instructions";
0 H 1 '00' "7"
1 L 1 '00' "7"
0 H 0 '00' "7"
1 L 0 '00' "7"
0 H 0 '12' "7"
1 L 0 '12' "7"
0 H 1 '56' "7"
1 L 1 '56' "7"
0 H 0 '11' "7"
1 L 0 '11' "7"


$msg "Test PHI0 and COUNT behaviour - triggering instruction";
0 H 0 '00' "7"
1 L 0 '00' "7"
0 H 1 '12' "7"  /* cyc 0 ph2 */
1 L 1 '12' "0"
0 H 0 '00' "0"  /* cyc 1 ph2 */
1 L 0 '00' "1"
0 H 0 '00' "1"  /* cyc 2 ph2 */
1 L 0 '00' "2"
0 H 0 '00' "2"  /* cyc 3 ph2 */
1 H 0 '00' "3"
0 H 0 '00' "3"  /* cyc 4 ph2 */
1 L 0 '00' "4"
0 H 0 '00' "4"  /* cyc 5 ph2 */
1 L 0 '00' "5"
0 H 0 '00' "5"
1 L 0 '00' "6"
0 H 0 '00' "6"
1 L 0 '00' "7"
0 H 0 '00' "7"
1 L 0 '00' "7"

$msg "Test PHI0 and COUNT behaviour - odd state recovery";
0 H 1 'f1' "7"
1 L 1 'f1' "0"
0 H 0 '00' "0"
1 L 0 '00' "1"
0 H 0 '00' "1"
1 L 0 '00' "2"
0 H 1 'f1' "2"
1 L 1 'f1' "0"
0 H 0 '00' "0"
1 L 0 '00' "1"
0 H 0 '00' "1"
1 L 0 '00' "2"
0 H 1 '00' "2"
1 L 1 '00' "7"


ORDER: CLK, %1, PHI0, %1, SYNC, %1, DATA, %2, COUNT, %2, CA9, CA8, %1, A9, A8, %2, LA17, LA16, %1, A17, A16;

VECTORS:
P 0 0 '00' '0' 00 00 11 11

$msg "Test A8 and A9 passthrough";
C H 0 '00' "7" 00 LL LL LL
C H 0 '00' "7" 01 LH LL LL
C H 0 '00' "7" 10 HL LL LL
C H 0 '00' "7" 11 HH LL LL

$msg "Test cycle 3 read from page 3, latching LA16..LA17, setting A16..A17";
C H 1 '00' "7" 00 LL LL LL
C H 1 '12' "0" 00 LL LL LL
C H 0 '00' "1" 00 LL LL LL
0 H 0 '00' "1" 00 LL LL LL   /* cycle 2 ph2 - read from ZP */

1 L 0 '00' "2" 00 HH LL LL   /* cycle 3 ph1 */
0 H 0 '72' "2" 00 HH LL LL   /* cycle 3 ph2 - read from page 3+ZP+1 */

1 H 0 '72' "3" 00 LL HL LL   /* cycle 4 ph1 - stretched clock */
0 H 0 '10' "3" 00 LL HL LL   /* cycle 4 ph2 - read from ZP+1 */

1 L 0 '00' "4" 10 HL HL HL   /* cycle 5 ph1 */
0 H 0 '00' "4" 10 HL HL HL   /* cycle 5 ph2 */
1 L 0 '00' "5" 11 HH HL HL   /* cycle 6 ph1 */
0 H 0 '00' "5" 11 HH HL HL   /* cycle 6 ph2 */
1 L 0 '00' "6" 11 HH HL HL   /* cycle 7 ph1 - shouldn't ever get this far */
0 H 0 '00' "6" 11 HH HL HL   /* cycle 7 ph2 */
C H 0 '00' "7" 00 LL HL HL
C H 0 '00' "7" 00 LL HL HL

$msg "Test that it resets correctly on next SYNC";
C H 1 '00' "7" 00 LL LL LL


Posted: Sun Jan 29, 2023 12:43 am

Joined: Fri Dec 21, 2018 1:05 am
Posts: 1076
Location: Albuquerque NM USA
Very interesting technique. I’ll need to study it more to see if it can be used to expand graphic memory to support more color and resolution.
Bill


Posted: Sun Jan 29, 2023 1:34 am

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
Another thought I had while working on this was that instead of adding the idle cycle and using page 3 you could just reuse the low byte of the usual address as the page number.

So for example lda ($23),y would normally read $NN from $23 and $MM from $24, then fetch from $MMNN+y into A. But we could spy on the $NN value and use that as the page number, so read from $NNMMNN+y. Aside from the extreme case where $MMNN+y wraps past $FFFF, this is the same as reading from $NNMM00+NN+y, i.e. $NNMM00+y with a fixed offset of NN bytes within each bank, which is easier to think about.

So essentially we'd lose the ability to address within a page based on numbers in zero page - we must use Y for that - but gain an extra 8 bits on the address bus in exchange.
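
A quick Python sketch of the address this variant would generate, with a hypothetical function name. The low byte NN also appears at the bottom of the address. So, when the 16-bit addition doesn't wrap, the result is $NNMM00 + NN + y. The CPU's 16-bit addition never carries into the bank bits.

```python
def low_byte_bank_address(zero_page: bytes, zp: int, y: int = 0) -> int:
    """Hypothetical (zp),Y variant where the pointer's low byte also drives A16-A23."""
    nn = zero_page[zp & 0xFF]                  # low byte, reused as the bank number
    mm = zero_page[(zp + 1) & 0xFF]            # high byte (wraps within page 0)
    addr16 = (((mm << 8) | nn) + y) & 0xFFFF   # the CPU's normal 16-bit result
    return (nn << 16) | addr16


page0 = bytearray(256)
page0[0x23], page0[0x24] = 0x05, 0x40
print(hex(low_byte_bank_address(page0, 0x23, y=0x10)))  # prints 0x54015
```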


Posted: Sun Jan 29, 2023 10:32 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
I was mildly confused about the presence or absence of the idle cycle - you are of course right, in that it's absent for a load which doesn't carry, otherwise present. But for the record, here's my visual6502 run, using this code assembled in easy6502:
Code:
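LDX #4
STX 2          ; low byte of the pointer at $02
INX
STX 3          ; high byte: pointer at $02/$03 is now $0504
LDA #9
LDY #$01
STA ($02),Y    ; store to $0505 (no page crossing)
LDA #7
LDA ($02),Y    ; load from $0505 (no page crossing)
LDY #$FF
STA ($02),Y    ; store to $0603 (page crossing)
LDA #7
LDA ($02),Y    ; load from $0603 (page crossing)
NOP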
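
For reference, here's a small Python sketch of the standard 6502 cycle counts for (zp),Y that the program above exercises. Loads take 5 cycles, plus 1 if adding Y crosses a page. Stores always take 6. The function name is hypothetical. The `turbo` flag adds the single clock-stretched cycle of the PLD design, which is an assumption based on reading its equations rather than a measured figure.

```python
def ind_y_cycles(pointer: int, y: int, is_store: bool, turbo: bool = False) -> int:
    """Cycle count for a (zp),Y instruction given the 16-bit pointer and Y."""
    crosses_page = (pointer & 0xFF) + y > 0xFF
    cycles = 6 if is_store else 5 + int(crosses_page)  # stores always pay the extra cycle
    return cycles + (1 if turbo else 0)                # assumed clock-stretch cost


pointer = 0x0504
for y in (0x01, 0xFF):
    target = (pointer + y) & 0xFFFF
    print(f"Y=${y:02X} -> ${target:04X}: "
          f"STA {ind_y_cycles(pointer, y, True)}, LDA {ind_y_cycles(pointer, y, False)} cycles")
```

This prints 6 and 5 cycles for the $0505 accesses and 6 and 6 for the $0603 accesses, so the extra cycle is missing only for the load that doesn't cross a page.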

You're right that RDY could be used, even on an NMOS 6502, as the affected cycle is a read. But you found clock stretching simpler.

Very nice piece of work, anyhow!

