
Why oh why can't we use the 65816 stack as the Data Stack?

Posted: Wed Feb 15, 2017 10:41 am
by scotws
For weeks, nay, months now I have been trying to figure out a sane way to design a Forth for the 65816 that uses the MPU's stack not as the Return Stack (RS), but as the Data Stack (DS). Combined with using A (or Y) as Top of Stack, this would make a lot of things so much easier: CONSTANT would not need a subroutine jump to DOCONST, but a simple PEA (phe.# for me) instruction, DROP becomes PLA (or PLY), DUP becomes PHA, etc. But I can't figure out a workable way to deal with the RS. This is frustrating.
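A minimal Python sketch (not 65816 code) of the TOS-in-A idea, assuming 16-bit cells. Register A holds TOS and the hardware stack at S holds everything below it, so DUP is a single PHA and DROP a single PLA. The class and method names are hypothetical. `lit` (PHA followed by LDA #n) is our own assumption about how an inline literal might look in this layout and does not come from the post.

```python
class TosInA:
    """Hypothetical model: Data Stack on the 65816 hardware stack, TOS cached in A."""

    def __init__(self, tos=0):
        self.a = tos & 0xFFFF   # accumulator (C) caching top of stack
        self.hw = []            # cells below TOS; hw[-1] is NOS, where S points

    def pha(self):
        self.hw.append(self.a)  # push A onto the hardware stack

    def pla(self):
        self.a = self.hw.pop()  # pull the hardware stack into A (IndexError = underflow)

    # Forth words expressed with the instructions above
    def dup(self):
        self.pha()              # DUP: TOS copied into NOS, A unchanged

    def drop(self):
        self.pla()              # DROP: NOS becomes the new TOS

    def lit(self, n):
        self.pha()              # assumption: PHA then LDA #n pushes a literal
        self.a = n & 0xFFFF

    def view(self):
        return self.hw + [self.a]   # deepest cell first, TOS last
```

For example, `TosInA(3)` followed by `dup()` and `lit(4)` gives the stack `[3, 3, 4]`. A following `drop()` leaves `[3, 3]`. Neither word touches anything but A and S.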

My least horrible solution so far: keep one stack pointer in X and one in A, so that (say) TXS would come before a DS access and TCS before an RS access. Each word in an STC design would then include:

 TSC  ; save RSP (S -> A)
 TXS  ; get DSP (X -> S)
 ; ( ... actual word code, which must leave A alone ... )
 TSX  ; save DSP (S -> X)
 TCS  ; restore RSP (A -> S)
 RTS  ; return through the Return Stack
But that's an additional 4 bytes and 8 cycles for each word, and you're using two registers. Another idea was dealing with the RS by hand with a combination of X as the RS pointer and JMP (addr, X) instructions, but workable this is not. You could also limit the stacks to 256 bytes each, keep one pointer in A and the other in B, and then attempt some trickery with XBA and TCS for whichever one you need, but the overhead gets bad as well. All of these would make wickedly fast compiled Forth words, because you could strip out the housekeeping stuff rather easily. However, the penalty for anything interactive and for high-level words would be brutal.
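The prologue/epilogue overhead can be checked with a small Python model of the four transfer instructions. The model assumes a 16-bit accumulator and 16-bit index registers, so every transfer copies the full pointer. Each of TSC, TXS, TSX and TCS is a one-byte, two-cycle implied instruction, which accounts for the 4 bytes and 8 cycles. The `Regs` class and the `run_word` and `push_one_cell` helpers are hypothetical. The word body must not disturb A, because A is holding the Return Stack pointer.

```python
from dataclasses import dataclass

# (bytes, cycles) for each implied-mode transfer used in the prologue/epilogue
COST = {"TSC": (1, 2), "TXS": (1, 2), "TSX": (1, 2), "TCS": (1, 2)}


@dataclass
class Regs:
    a: int = 0  # C accumulator: parks the Return Stack pointer during the body
    x: int = 0  # parks the Data Stack pointer between words
    s: int = 0  # hardware stack pointer S


def execute(r: Regs, op: str) -> None:
    if op == "TSC":
        r.a = r.s      # save RSP
    elif op == "TXS":
        r.s = r.x      # get DSP
    elif op == "TSX":
        r.x = r.s      # save DSP
    elif op == "TCS":
        r.s = r.a      # restore RSP
    else:
        raise ValueError(op)


PROLOGUE = ("TSC", "TXS")
EPILOGUE = ("TSX", "TCS")


def run_word(r: Regs, body) -> tuple:
    """Run prologue, body (which treats S as DSP), epilogue; return (bytes, cycles) overhead."""
    ops = PROLOGUE + EPILOGUE
    for op in PROLOGUE:
        execute(r, op)
    body(r)
    for op in EPILOGUE:
        execute(r, op)
    return sum(COST[op][0] for op in ops), sum(COST[op][1] for op in ops)


def push_one_cell(r: Regs) -> None:
    r.s -= 2  # stands in for a PHA/PEA in the body: S moves down one 16-bit cell
```

Starting from `Regs(x=0x0300, s=0x01F0)`, `run_word(r, push_one_cell)` reports `(4, 8)`. Afterwards S is back at the Return Stack pointer 0x01F0, and X has recorded the moved Data Stack pointer 0x02FE.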

I can't be the first one to bang his head against this wall. Is the consensus that it really isn't worth the effort, or am I missing something?

Re: Why oh why can't we use the 65816 stack as the Data Stac

Posted: Wed Feb 15, 2017 4:27 pm
by barrym95838
It's still work-in-progress for me with my 65m32 Forth, and I haven't finished exploring all of the dark corners of the defining words, but it seems that I can use s as DSP and x as RSP without any epic failures ... yet. I'm still JSRing to doCONST and doLIST:

\======================================================== doCON
\   .dw  DOLIT-6, COMPO+5,'doCON'
DOCON: \ ( -- a )           \ Run time routine for CONSTANT
\                           \   VARIABLE and CREATE
    exa  ,s                 \ swap TOS with jsr DOCON's tucked
    bra  AT                 \   return addr in NOS, then fetch
\
\======================================================= doLIST
\   .dw  EMIT-5, 6,'doLIST'
DOLST: \ ( -- )             \ Process colon list
    sly  ,-x                \ old IP to R:, new IP nipped from
    $NEXT                   \   NOS (jsr DOLST's return addr)
\
Despite that comment (copied from Dr. Ting's most recent eForth source), I don't think that CONSTANT, VARIABLE and CREATE can all share doCONST ... that will need to be examined further. The "sly ,-x" instruction in doLIST pushes y (IP) onto stack X: (RS) and pulls (actually NIPs) a fresh y from the implied stack S: (DS). I will try to come up with 65c816 equivalents tonight after work.
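Here is a Python model of the two 65m32 routines above, under these stated assumptions:

* TOS lives in register a.
* The stack at s is the Data Stack, and its last element is NOS.
* The stack at x is the Return Stack.
* y is the IP.
* JSR pushes the address of the cell that follows it, with no off-by-one.

The `Model` class and its methods are hypothetical stand-ins for the instructions named in the comments. They are not a description of the 65m32 itself.

```python
class Model:
    def __init__(self, memory, tos=0, ip=0):
        self.mem = dict(memory)  # address -> cell contents
        self.a = tos             # TOS register
        self.ds = []             # stack at s (Data Stack); ds[-1] is NOS
        self.rs = []             # stack at x (Return Stack)
        self.y = ip              # Forth IP

    def jsr(self, return_addr):
        # assumption: JSR tucks the address of the following cell into NOS
        self.ds.append(return_addr)

    def docon(self):
        # exa ,s : swap TOS with the return address JSR tucked into NOS
        self.a, self.ds[-1] = self.ds[-1], self.a
        # bra AT : @ fetches the constant through that address
        self.a = self.mem[self.a]

    def dolst(self):
        # sly ,-x : push the old IP onto the Return Stack ...
        self.rs.append(self.y)
        # ... and nip the new IP (JSR DOLST's return address) out of NOS
        self.y = self.ds.pop()
        # $NEXT would follow here
```

With `Model({0x100: 42}, tos=7, ip=0x50)`, calling `jsr(0x100)` then `docon()` leaves 42 in TOS and the old TOS 7 in NOS, which is the ( -- a ) effect. A later `jsr(0x200)` and `dolst()` sets the IP to 0x200 and moves 0x50 to the Return Stack, leaving the data stack as it was.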

[Edit: Yeah, I think that CONSTANT needs the fetch at run-time, but VARIABLE doesn't ... I haven't investigated CREATE yet.
So, I think that doCONST is a SWAP @ and doVARIABLE is just a SWAP ... it works to a coding advantage when the address is tucked into NOS automagically by the JSR, at least for the 65m32 ... ]

[Edit #2: Thinking about it more, if TOS was in RAM at s, then doCONST could be reduced to @ and doVARIABLE could be reduced to a NOP, just like CELLS and ALIGN already are!! doLIST wouldn't change, but a lot of other primitives would, probably for a net loss ... my primitives are so short that even one added machine instruction blows up the percentage of increase ... ]

[Edit #3: I checked a CamelForth source, and rediscovered that doVARIABLE is supposed to end with a fetch ... the reason I was confused is that the JSR provides an address, but it's the address of an address, and the latter needs to be fetched in the same way that a constant would be ... still pondering the doCREATE mechanism ... ]

[Edit #4: Well, my brain is a bit mushy tonight (in case you haven't already noticed), but I think that if the 65c816 has TOS in RAM, it looks like : doCONST 1+ @ ; should work. The 1+ is due to the 65xx return address being one short of the return target. If TOS is in a register, then it seems to me that you would have : doCONST SWAP 1+ @ ; because the JSR return address would appear in NOS. I am most familiar with DTC ... I can't comment authoritatively on whether ITC or STC would be significantly different, so as usual, YMMV ... ]
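A byte-level Python sketch of why the 1+ is needed on a 65xx. It assumes a DTC layout in which a constant's code field is a 3-byte `JSR doCONST`, immediately followed by the constant's 16-bit little-endian value. JSR pushes the address of its own last byte, so the pushed address is one short of the value. Both variants from the edit are shown:

* TOS in RAM: the return address is TOS.
* TOS in a register: the return address lands in NOS, hence the SWAP.

The helper names and addresses are hypothetical.

```python
JSR_ABS = 0x20  # 65xx opcode for JSR absolute (3 bytes)


def fetch16(mem, addr):
    return mem[addr] | (mem[addr + 1] << 8)  # little-endian @


def make_constant(mem, at, doconst, value):
    """Lay down `JSR doconst` at `at`, followed by the constant's value."""
    mem[at] = JSR_ABS
    mem[at + 1] = doconst & 0xFF
    mem[at + 2] = doconst >> 8
    mem[at + 3] = value & 0xFF
    mem[at + 4] = (value >> 8) & 0xFF


def jsr_abs(pc, mem, stack):
    """Execute JSR abs at pc: push pc+2 (the JSR's last byte), return the target."""
    stack.append(pc + 2)
    return fetch16(mem, pc + 1)


def doconst_tos_in_ram(mem, stack):
    # : doCONST 1+ @ ;   return address is TOS
    stack.append(fetch16(mem, stack.pop() + 1))


def doconst_tos_in_reg(mem, stack, tos):
    # : doCONST SWAP 1+ @ ;   return address is NOS, TOS is in a register
    tos, stack[-1] = stack[-1], tos   # SWAP
    return fetch16(mem, tos + 1)      # 1+ @ gives the new TOS
```

For a constant built at 0x2000, the JSR pushes 0x2002 rather than 0x2003. Only after the 1+ does the fetch land on the stored value. In the register variant, the previous TOS ends up back in NOS.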

[Edit #5: I slept on the ideas, and woke up with the undeniable opinion that you would have to be a brutal masochist to try implementing s as data stack pointer in a 65c816 STC Forth ... the threading technique and the data would constantly be locking horns, and I can't imagine any efficient way around it ... ITC and DTC are a different story, however ... ]

Re: Why oh why can't we use the 65816 stack as the Data Stac

Posted: Fri Feb 17, 2017 12:58 am
by Dr Jefyll
barrym95838 wrote:
ITC and DTC are a different story, however ...
That's where I ended up, too, Mike. Scot, I like the idea of using S as Forth's DSP -- you make a good case for it. And your "least horrible" solution isn't all that bad, speed-wise. But, as you say, it ties up another register.

That being so, maybe that register would be better used for something else. It'd be pretty cool to use the "S as DSP" idea with Direct Threaded Code (DTC). X could hold Forth's IP, and NEXT would be based on JMP (0, X).
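A toy Python model of Direct Threaded Code with the roles suggested above. The hardware stack at S is the Data Stack and X is the IP. NEXT jumps through the cell X points at, which is the job JMP (0,X) would do, and advances X by one 2-byte cell. Colon definitions enter through a JSR-style push onto the Data Stack, followed by a doLIST that moves the old IP to the Return Stack, in the same way as the 65m32 routine earlier in the thread. All addresses and names are hypothetical. The point at which X is advanced is an assumption: the model does not say how real 65816 code would step X.

```python
CELL = 2  # 16-bit cells


class DTC:
    def __init__(self):
        self.mem = {}    # address -> cell (code address or inline data)
        self.code = {}   # code address -> callable standing in for machine code
        self.x = 0       # IP
        self.ds = []     # Data Stack (hardware stack at S)
        self.rs = []     # Return Stack (kept elsewhere)
        self.running = False

    def next(self):
        target = self.mem[self.x]  # cell at IP holds a code address
        self.x += CELL
        self.code[target](self)    # "jump" to it

    def run(self, ip):
        self.x, self.running = ip, True
        while self.running:
            self.next()


def lit(m):
    m.ds.append(m.mem[m.x])   # inline literal follows LIT in the thread
    m.x += CELL

def dup(m):
    m.ds.append(m.ds[-1])

def add(m):
    b, a = m.ds.pop(), m.ds.pop()
    m.ds.append((a + b) & 0xFFFF)

def exit_(m):
    m.x = m.rs.pop()

def halt(m):
    m.running = False

def dolst(m):
    m.rs.append(m.x)       # old IP to the Return Stack
    m.x = m.ds.pop()       # new IP from the address JSR pushed onto S

def colon(m, cfa, body_addr, body):
    """Install a colon definition whose code field acts like 'JSR doLIST'."""
    def enter(mm):
        mm.ds.append(body_addr)  # what JSR would push (65xx off-by-one ignored)
        dolst(mm)
    m.code[cfa] = enter
    for i, cell in enumerate(body):
        m.mem[body_addr + i * CELL] = cell


def demo():
    """Run the thread LIT 21 DOUBLE HALT, where : DOUBLE DUP + ;"""
    m = DTC()
    LIT, DUP, ADD, EXIT, HALT = 0x8000, 0x8010, 0x8020, 0x8030, 0x8040
    for addr, fn in [(LIT, lit), (DUP, dup), (ADD, add), (EXIT, exit_), (HALT, halt)]:
        m.code[addr] = fn
    DOUBLE = 0x0100
    colon(m, DOUBLE, 0x0104, [DUP, ADD, EXIT])
    for i, cell in enumerate([LIT, 21, DOUBLE, HALT]):
        m.mem[0x0010 + i * CELL] = cell
    m.run(0x0010)
    return m
```

`demo()` finishes with 42 on the Data Stack and an empty Return Stack. The JSR-pushed address and the data share S only for the instant between the code field and doLIST.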

Re: Why oh why can't we use the 65816 stack as the Data Stac

Posted: Sat Feb 18, 2017 12:13 pm
by scotws
The only thing I have been able to come up with that might make this remotely workable would be extra hardware for the Return Stack - something where you could write to a certain address and magical hardware elves would push the value to an external stack, with the reverse for reading. STA/LDA are each 5 cycles for the 16-bit 65816, which is one cycle more than PHA/PLA. For the moment, though, I'm too busy facing my software demons to play with the hardware elves...
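A Python sketch of the "hardware elves" idea, assuming a single hypothetical I/O address `RS_PORT`. A store to that address pushes onto an external LIFO and a load from it pops. Under this scheme >R would compile to STA RS_PORT and R> to LDA RS_PORT, leaving S free for the Data Stack. The class, port address and depth are invented for illustration. The thread does not describe any such device.

```python
RS_PORT = 0xDF00  # hypothetical memory-mapped address of the external Return Stack


class ElfBus:
    """Toy bus: ordinary RAM plus one address backed by a LIFO."""

    def __init__(self, depth=256):
        self.ram = {}
        self.lifo = []
        self.depth = depth

    def store(self, addr, value):          # what STA addr would do
        value &= 0xFFFF
        if addr == RS_PORT:
            if len(self.lifo) >= self.depth:
                raise OverflowError("external return stack full")
            self.lifo.append(value)
        else:
            self.ram[addr] = value

    def load(self, addr):                  # what LDA addr would do
        if addr == RS_PORT:
            if not self.lifo:
                raise IndexError("external return stack empty")
            return self.lifo.pop()
        return self.ram.get(addr, 0)
```

Two stores to `RS_PORT` followed by two loads return the values in reverse order, just as >R >R R> R> would. Ordinary addresses behave like plain RAM.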

Re: Why oh why can't we use the 65816 stack as the Data Stac

Posted: Sat Feb 18, 2017 7:44 pm
by GARTHWILSON
scotws wrote:
The only thing I have been able to come up with that might make this remotely workable would be extra hardware for the Return Stack - something where you could write to a certain address and magical hardware elves would push the value to an external stack, with the reverse for reading.
That's Jeff's specialty! :D

Quote:
STA/LDA are each 5 cycles for the 16-bit 65816, which is one cycle more than PHA/PLA.

If you have A in 16-bit mode, PLA takes 5 cycles and PHA takes 4. (In 8-bit mode they're 4 and 3.) For those unfamiliar with the '816: you can choose 8- or 16-bit mode for A, and 8- or 16-bit mode for the index registers (X and Y always share the same width, whichever you choose), all while in native mode. Having 8-bit registers and memory accesses does not require being in 6502-emulation mode. You can put the processor in native mode in the reset routine and leave it there permanently, even though you occasionally change the register widths.
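To make the width-dependent numbers concrete, here is a small Python table. It uses the PHA/PLA counts above plus the usual plain-absolute LDA/STA counts: 4 cycles, one more with a 16-bit accumulator. The table also models the status-register bits that select the widths:

* M (bit 5) sets the width of the accumulator and memory accesses.
* X (bit 4) sets the width of both index registers together.

SEP sets these bits and REP clears them. The function names are hypothetical, and the timing assumes plain absolute addressing with no extra-cycle cases.

```python
M_FLAG = 0x20  # P bit 5: 1 = 8-bit accumulator/memory, 0 = 16-bit
X_FLAG = 0x10  # P bit 4: 1 = 8-bit X and Y together, 0 = 16-bit

# Native-mode cycle counts as (8-bit, 16-bit accumulator)
CYCLES = {"PHA": (3, 4), "PLA": (4, 5), "LDA abs": (4, 5), "STA abs": (4, 5)}


def rep(p, mask):
    return p & ~mask & 0xFF   # REP #mask clears the selected bits


def sep(p, mask):
    return (p | mask) & 0xFF  # SEP #mask sets them


def acc_is_16bit(p):
    return not (p & M_FLAG)


def cycles(op, p):
    return CYCLES[op][1 if acc_is_16bit(p) else 0]


def port_penalty(p):
    """Extra cycles of STA/LDA to a memory-mapped port versus PHA/PLA: (push, pull)."""
    return (cycles("STA abs", p) - cycles("PHA", p),
            cycles("LDA abs", p) - cycles("PLA", p))
```

Starting from P = $30 (everything 8-bit), REP #$20 gives a 16-bit accumulator while X and Y stay 8-bit. In either width, the counts give a penalty of `(1, 0)`: the push through a port costs one cycle more than PHA, while the pull costs the same as PLA.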

Re: Why oh why can't we use the 65816 stack as the Data Stac

Posted: Sun Feb 19, 2017 1:46 pm
by Dr Jefyll
scotws wrote:
magical hardware elves
GARTHWILSON wrote:
That's Jeff's specialty! :D

Hardware elves? Oh, gosh -- not me. I just do hardware smoke & mirrors. TBH my elf chops are pretty well non-existent. :oops:

Re: Why oh why can't we use the 65816 stack as the Data Stac

Posted: Sun Feb 19, 2017 8:08 pm
by BigDumbDinosaur
Dr Jefyll wrote:
scotws wrote:
magical hardware elves
GARTHWILSON wrote:
That's Jeff's specialty! :D

Hardware elves? Oh, gosh -- not me. I just do hardware smoke & mirrors. TBH my elf chops are pretty well non-existent. :oops:
Best to concentrate on the mirrors. That other stuff stinks up the shop. :D