I've been wondering if it would make sense to add an external hardware stack to my 6502 to help with Forth -- the idea would be to get the Data Stack (DS) out of Zero Page and free up the X Register. So far, I'd have to say "no", but maybe somebody has an idea I missed. I've gotten this far:
Concept
The idea would be to have an address you could just write to (STA $FF00 for example) that would push a byte on the "external" stack, while reading it would pop a byte (LDA $FF00). The processor wouldn't have to deal with any of the messy stack management stuff at all, and (in the case of Tali Forth) the X Register wouldn't be fixed as the DS pointer.
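To make the intended behaviour concrete, here is a minimal Python model of such a port. It is only a sketch: the class name StackPort is made up for illustration, and pushing the high byte first (so the low byte pops first) is an arbitrary assumption, not something the hardware would dictate.

```python
class StackPort:
    """Behavioural model of a push-on-write, pop-on-read address (hypothetical)."""

    def __init__(self):
        self._bytes = []

    def write(self, value):
        # Corresponds to STA $FF00: every write pushes one byte.
        self._bytes.append(value & 0xFF)

    def read(self):
        # Corresponds to LDA $FF00: every read pops one byte.
        return self._bytes.pop()


port = StackPort()

# Push the 16-bit cell $1234 as two bytes, high byte first (assumed order).
port.write(0x12)
port.write(0x34)

# Pop it again: the low byte comes back first, then the high byte.
lsb = port.read()
msb = port.read()
cell = (msb << 8) | lsb
print(hex(cell))  # 0x1234
```

The model raises an error on an empty stack; what real hardware would do on underflow depends entirely on how it is built.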
Hardware solutions (A CPLD is considered cheating in this project)
First, it turns out there actually was a LIFO chip out there for a while, the M66250E (http://www.alldatasheet.com/datasheet-p ... 66250.html), with 5120 8-bit words. It's not being produced any more, of course. There are lots of FIFO chips out there, but no LIFOs. Sigh.
Second, since what we're looking for is a bidirectional shift register, we could use the 74HCT194 (http://pdf1.alldatasheet.com/datasheet- ... CT194.html), one for each of the eight "slices" of the byte to be saved; that would give us a stack depth of four entries with eight ICs. If we want to be able to hold four DOUBLE numbers at once (a stack 16 bytes deep), we'd be using 32 chips. Ouch. The 74HCT299 (http://pdf1.alldatasheet.com/datasheet- ... CT299.html) can handle 8 bits, but that's still 16 chips. Both of these have serial-to-parallel functions; there doesn't seem to be a "pure" bidirectional shift register on the market anywhere.
Third, we could rig up some system where some extra RAM is indexed by a register that has binary count up, count down functionality, something like two 74HCT193s. That would use far fewer chips, but the timing might be tricky, and it would need much more glue logic.
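A rough Python model of the counter-plus-RAM variant, to show where the ordering matters. Everything here is an assumption for illustration: the name CounterLifo, the 256-byte RAM (two cascaded 4-bit counters give an 8-bit address), and the convention that the counter points at the next free byte.

```python
class CounterLifo:
    """RAM addressed by an 8-bit up/down counter (hypothetical sketch)."""

    def __init__(self, size=256):
        self.size = size
        self.ram = [0] * size
        self.counter = 0  # assumed convention: points at the next free byte

    def push(self, value):
        # Write at the current address first, then count up.
        self.ram[self.counter] = value & 0xFF
        self.counter = (self.counter + 1) % self.size

    def pop(self):
        # Count down first, then read at the new address.
        self.counter = (self.counter - 1) % self.size
        return self.ram[self.counter]


stack = CounterLifo()
for byte in (0x11, 0x22, 0x33):
    stack.push(byte)

popped = [stack.pop(), stack.pop(), stack.pop()]
print([hex(b) for b in popped])  # ['0x33', '0x22', '0x11']
```

Note the asymmetry: a push counts after the access, a pop counts before it. In hardware that probably means the count-down has to settle before the RAM is read within the same bus cycle, which may be part of why the timing looks tricky. Popping an empty stack just wraps the counter around in this model.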
(I briefly played with the notion of building a bidirectional shift register out of D flipflops -- it's actually not that complicated, it turns out -- but that would use even more chips and space.)
Problems with the 6502
For the sake of the discussion, let's just assume we have one of those solutions. Would it really be worth it? Remember, on the 6502 we have two bytes for every 16-bit stack entry, so something simple like DUP would end up as
Code:
LDX $FF00
LDA $FF00
STA $FF00
STX $FF00
STA $FF00
STX $FF00
Which not only looks stupid, but takes 18 bytes and 24 clock ticks. The current code for Tali Forth is:
Code:
DEX
DEX
LDA 3,x
STA 1,x
LDA 4,x
STA 2,x
That's 10 bytes and 20 cycles. So on the 6502, this kind of hardware support seems to slow things down.
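The byte and cycle counts for both versions of DUP can be checked with a quick tally. This sketch uses the standard 6502 figures for the addressing modes involved (absolute: 3 bytes, 4 cycles; zero page,X: 2 bytes, 4 cycles; implied DEX: 1 byte, 2 cycles); the variable and function names are just for illustration.

```python
# (bytes, cycles) per instruction for the addressing modes used above
ABSOLUTE = (3, 4)     # LDA/LDX/STA/STX $FF00
ZERO_PAGE_X = (2, 4)  # LDA/STA n,x
IMPLIED = (1, 2)      # DEX


def tally(instructions):
    """Return (total bytes, total cycles) for a list of (bytes, cycles) pairs."""
    total_bytes = sum(size for size, _ in instructions)
    total_cycles = sum(cycles for _, cycles in instructions)
    return (total_bytes, total_cycles)


# DUP via the external stack: two pops, then four pushes, all absolute
external_dup = [ABSOLUTE] * 6

# DUP as in the zero page version: DEX, DEX, then four indexed accesses
zero_page_dup = [IMPLIED] * 2 + [ZERO_PAGE_X] * 4

print(tally(external_dup))   # (18, 24)
print(tally(zero_page_dup))  # (10, 20)
```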
Use with the 65816?
What about the 65816? Since we have freed up the X Register and it can be 16 bits wide, we can use it as the TOS (this won't work with the 6502 because we can't keep one whole stack entry in the register). So we can reduce DUP to STX $FF00, and DROP becomes LDX $FF00, when in 16-bit mode. Now that's more like it.
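One wiring detail that might be worth checking before building anything: a 16-bit STX writes the low byte to $FF00 and then the high byte to $FF01, and a 16-bit LDX reads them in the same low-then-high order. The sketch below assumes a byte-wide LIFO that answers at both addresses (my assumption, not part of the description above; the names are hypothetical) and shows what a DUP followed by a DROP would do to X in that case.

```python
class ByteLifo:
    """Byte-wide LIFO assumed to respond at both $FF00 and $FF01."""

    def __init__(self):
        self.cells = []

    def write(self, value):
        self.cells.append(value & 0xFF)

    def read(self):
        return self.cells.pop()


def stx16(port, x):
    # 16-bit store: low byte goes out first, then the high byte.
    port.write(x & 0xFF)
    port.write(x >> 8)


def ldx16(port):
    # 16-bit load: the first byte read becomes the low byte.
    low = port.read()
    high = port.read()
    return (high << 8) | low


port = ByteLifo()
x = 0x1234       # TOS kept in the 16-bit X register
stx16(port, x)   # DUP: push a copy of TOS
x = ldx16(port)  # DROP: pop it back into X
print(hex(x))    # 0x3412
```

Under that assumption the two bytes come back swapped, because the last byte pushed is the first one popped. A stack that is 16 bits wide, with the address line selecting the half, would not have this problem.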
However, I'm not sure how this stacks up (pun intended, though weak) compared to all the other ways of creating stacks on the 65816. Could this be worth the effort? Or should I wait for my own CPU project to build stacks from scratch?