Usually 65xx Forth implementations map the Interpretive Pointer into a memory location, which unfortunately limits performance. (IP must be accessed and updated during NEXT.) But where else could IP reside? dclxvi proposes that, on an '816, IP could reside in the 16-bit stack pointer, S. In the thread
65816 indirect threaded NEXT Bruce discusses an 11-cycle NEXT that's just 2 instructions
and in the thread
65816 direct threaded NEXT he discusses
a 6-cycle NEXT that's just one instruction Using S is a slick idea for some smokin' fast Forth, but, as Bruces notes, there's a tradeoff, and it's a doozy.
dclxvi wrote:
anything that pushes onto the stack will clobber your program, PHA, JSR, interrupts, etc. For instructions like JSL you can simply temporarily change the stack pointer [...]. However interrupts are far trickier.
This thread deals with the interrupt problem and a hardware solution for it. Ironically, Garth later summarized it better than I did, so I'll quote him here.
Quote:
The circuit below detects RTS opcodes and forces the resulting "stack" memory accesses to remain in the program bank instead of going to Bank 0. Basically, Forth runs in a non-0 bank, and most of bank 0 must be left available for the hardware stack, since it will not be known what the value in the Forth program counter (IP, held in register S) is when an interrupt hits. Stack operations relating to the interrupt will write to and read from bank 0, while RTS serving as Forth's NEXT reads the bank that the program is in, rather than bank 0 where the hardware stack resides.
/edit
Attachment:
Animation10.gif [ 27.72 KiB | Viewed 3754 times ]
The problem lies in
getting SP to do the job of two registers. We want to use SP conventionally as a pointer to short-term read/write storage (for interrupts and for explicit pushes and pulls), and (for the fast NEXT) we also want to use SP as a pointer into a list of pseudo-subroutine addresses -- a list that mustn't be written to. This puzzle has been teasing my imagination, and I wondered what a hardware solution might look like. I ended up with two approaches (both untested). The one shown here is the simplest. If I've overlooked anything I hope someone'll mention.
The '816 is capable of addressing memory as multiple banks, 64KB max per bank. (
dclxvi operates using the simplest model, where all banks map to the same 64K.) We can't have two SP registers, but we
can trick SP into addressing two different banks. All we have to do is tweak the CPU's policy of steering all stack-related operations to Bank 0.
For ease of discussion let's assume we have a machine with 128K of memory -- a single 128KB chip, perhaps. The RAM's A0 - A15 inputs are fed straight from the CPU, and A16 comes from an
Address Latch. This is S.O.P. All '816 systems using multiple banks require an Address Latch to capture the bank-address information that appears on the data bus during the first half of every bus cycle.
Our goal is to use memory "at" SP in Bank 0 for short-term read/write storage, and to use memory "at" SP in Bank 1 for the Forth address lists. So, we reserve a large stack area in Bank 0. We set the '816's Program Bank Register to 1 and we load the address lists into Bank 1 (and the callee machine-code routines, too). We set SP to point to the desired address list.
By using two banks we've provided two separate storage areas, as required. Normally RTS would be unable to access the address list, due to the CPU's policy of steering all stack-related operations to Bank 0. RTS will cause the CPU to do a pull, but it would normally be from Bank 0. To correct this,
the circuit above detects RTS opcodes and forces the resulting "stack" memory accesses to remain in the Program Bank instead of going to Bank 0.
The '574 octal register captures eight signals at the end of every bus cycle. No action results unless the bus cycle is an opcode fetch (SYNC =1). If SYNC =1 the flip-flop may change state in the cycle following the opcode fetch.
If RTS (opcode $60) is detected then the D input will be low and the flip-flop will be cleared.
Clearing the flip-flop has the effect of discontinuing pulses on ALE, the Address Latch Enable signal. The latch ceases to update cycle-by-cycle as it normally would, and the most recent update (indicating the Program Bank) "sticks." All accesses are forced to the Program Bank, and we get the desired effect of RTS pulling an address from the Program Bank rather than Bank 0.
Note-
- The circuit as shown is a little sloppy in that it's triggered by opcodes $70, $E0 and $F0 as well as $60. That turns out not to make any difference, since those other instructions access only the Program Bank anyway.
- The XOR gate protects against an obscure hazard: the false opcode fetch that occurs whenever an interrupt is recognized. There's a risk that an RTS opcode will get fetched but not executed. If this happens the circuit mustn't respond; ie, the flip-flop needs to remain set. To recognize the beginning of the cpu's interrupt sequence we look for the unique circumstance of the address bus failing to increment in the cycle following an opcode fetch. (See Table 5-7 of the '816 Data Sheet) No increment means the least-significant address line, A0, will fail to toggle. The output of the XOR gate will be low and the flip-flop will remain set (allowing a successful interrupt with machine state pushed to stack in Bank 0 where it belongs).
So! The upside is that 6-cycle NEXT is all that we'd hoped: we can aim SP at a list of addresses and use RTS to rapidly "Call" each routine in sequence. JSR opcodes are not required, slashing Call overhead from 12 cycles to 6. Explicit pushes & pulls can be used for short-term storage, and interrupts are fully functional. The downsides are:
- you need hardware mod's or a built-from-scratch machine
- The large (and mostly unused) stack area that must be reserved in Bank 0. That's required because for all addresses pointed to by SP in the Program Bank, those corresponding addresses (and some additional bytes below) in Bank 0 are subject to being written at any time by interrupts. Also SP must avoid addresses corresponding to Direct-page, I/O ports and any other sensitive areas in Bank 0.
- Forth's Return Pointer will have to reside in Y or in memory.
When coding, you need to bear in mind that SP gets altered every time an RTS (ie, NEXT) executes! Nevertheless,
pushes and pulls (PHA & PLA, for example) are permitted provided that you put SP back where you found it before the next NEXT occurs. The Bank 0 allocation for stack will be wastefully large. Luckily, memory is cheap -- and there's nothing to prevent us populating additional banks.
-- Jeff