Liara Forth, an "ANSI(ish) initial Forth" for the W65C265SXB

scotws · Post by **scotws** » Sun Mar 27, 2016 3:46 pm

So it's time to stop fooling around with blinking LEDs and write a Forth for the 265SXB.

From what I've seen, other people here such as Andrew (viewtopic.php?f=9&t=3612) are already working on large Forths for the 65816. My aim is to create a "first" or "initial" Forth for the 265SXB: The one you can download as a binary and install immediately after you've bought the board and some Flash memory. There should be tools to access expanded memory, but it will work "out of the box" with the 32 KB RAM and 24 KB Flash memory, which is the simplest (useful) configuration. As such, it will access the Mensch Monitor routines (at least at first), which is where the other 8 KB are.

Obviously I'll be drawing a lot of inspiration from Tali Forth for the 65c02. There are two things that didn't work out so well there I'll be changing:

Tailored specifically for the 265SXB. Tali Forth did have a hardware basis, but it was sorta, kinda supposed to run on more general hardware, which, to put it politely, led to some lack of structure. This time, the machine is exactly defined. Yes, there will be words to blink the LED and write to Flash.

Dictionary headers and code are kept separate. That allows fallthroughs and all other kinds of neat tricks with the code (see viewtopic.php?p=3331#p3331 for examples) that Tali can't do. (I should probably do a complete rewrite of Tali Forth, but as other people here have pointed out, once you have gone 16 bit with the 65816, it's pretty hard to go back.)

Like Tali, Liara will be based on subroutine threaded code (STC). Again, other people here are working on ITC and DTC versions. And I would still argue that because there are so few registers available, this makes a better fit. Also, like Tali, the ratio of primitives to threaded words will be rather high, for the added speed (fast the 265SXB ain't), the optimizations, and simply because I enjoyed all that coding. STC is good for these things.

(Footnote: I'm probably going to get a RPi 3 soon, which has the ARM-A53 64-bit CPU where they cleaned up the assembler ("AArch64", see https://www.element14.com/community/ser ... Manual.pdf). If I ever write a Forth for that, I promise to take a more serious look at DTC, because then I'll have about 30 64-bit registers to play with.)

The Return Stack will be the system stack, starting at 03FF and growing down. Direct Page (DP) would start at 0200, avoiding the first two pages completely where the Mensch Monitor does its thing. The Data Stack would start after whatever space the DP variables take and use X as the DSP. More on that in a later post, because I'm considering doing something weird with the stacks.

The other stuff is pretty obvious: 16-bit cell size, max code size 24 KB, terminal access at first via the USB power jack, but with the option of the serial port as an alternative input source. I'm still considering cooperative ("PAUSE") multitasking. There should be enough space for a small editor of some sort, which then should be able to cope with any extra RAM.

I've started a (rather empty) GitHub repository at https://github.com/scotws/LiaraForth . I expect things to be slow going for a while, first because I'll probably be finding bugs galore with my assembler and emulator, second because I'm going to get chased out in the garden a lot in the next few weeks.

scotws · Post by **scotws** » Sun Mar 27, 2016 4:35 pm

For Liara, I'm considering a different configuration of the stacks: Having them grow towards each other. I'm calling this the "Königskinder" (King's Children) design.

We'd put the Direct Page (DP) at 00:0200 to avoid the default Direct Page and Stack use of the Mensch Monitor in the first two pages. The Data Stack (DS) begins in this area after any variables that are used. It then grows "up" (towards 00:FFFF), not down. The Data Stack Pointer (DSP) is X, and points to the top entry on the Data Stack (TOS). The Return Stack (RS) is the normal system stack. It starts at 00:03FF and grows "down" (towards 00:0000) and points to the next free entry.

Code:

    00:200 -> +-------------------------+ <- Direct Page start
              |                         |
              | Direct Page Variables   |                     
              |                         |
 (unknown) -> +-------------------------+ <- Data Stack Pointer Start (DSP0)
              |                   |     |
              | Data Stack        |     |
              |                   V     | <- DSP (X)
              |                         |

              /~~~~~~~~~~~~~~~~~~~~~~~~~/

              |                         |
              |                   ^     | <- RSP (S)
              | Return Stack      |     |
              |                   |     |
    00:3FF -> +-------------------------+ <- Stack Pointer (RSP0)

This configuration means that both Stacks grow towards each other, eating up a pool of free space between them. Though the DS is limited to 128 16-bit entries (minus variables), in theory, the Return Stack could keep growing on the 65816 and crash into the DS. This potentially gives the RS more space, and we can test for a collision of the stacks -- an overflow of either the RS or DS -- before it happens by simply testing if DSP == RSP, because one points to its next entry, the other to its current entry.
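As a rough illustration of the collision check, here is a minimal Python model of the two pointers, not Liara's actual code. The start address `DSP0`, the empty-stack convention for X, and the class and method names are all assumptions made up for this sketch. It checks the size of the remaining gap rather than strict equality, because with these assumed start addresses and 16-bit pushes X stays even while S stays odd.

```python
# Hypothetical model of the "Königskinder" layout (names are illustrative).
DSP0 = 0x0220   # assumed: first Data Stack address after the DP variables
RSP0 = 0x03FF   # from the post: initial system stack pointer
CELL = 2        # 16-bit cells


class Stacks:
    def __init__(self):
        # X points to the current TOS cell; assumed to sit one cell
        # below DSP0 while the Data Stack is empty.
        self.dsp = DSP0 - CELL
        # S points to the next free byte of the Return Stack.
        self.rsp = RSP0

    def free_bytes(self):
        # Bytes from just above the TOS cell up to and including S.
        return self.rsp - (self.dsp + CELL) + 1

    def push_data(self):
        # Data Stack grows "up": check the gap, then move X.
        if self.free_bytes() < CELL:
            raise OverflowError("Data Stack would hit the Return Stack")
        self.dsp += CELL

    def push_return(self):
        # Return Stack grows "down": check the gap, then move S.
        if self.free_bytes() < CELL:
            raise OverflowError("Return Stack would hit the Data Stack")
        self.rsp -= CELL
```

In this model the pool is used up when `free_bytes()` reaches zero, which is the point where a real implementation would have to raise its overflow error.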

More to the point, depending on how many DP entries are required for variables, it might be possible to get away with only using one page for both stacks (S would be 00:02FF), making multitasking less of a memory hog. There doesn't seem to be any reason that DP and stack areas can't overlap, and I have the feeling that most stacks are far too large.

(The math: Assume half of the 256 bytes go to the Return Stack, which leaves us 128 bytes of the page. Assume that half of that goes to variables (far too much, but the math is easier). This leaves us with 64 bytes for the Data Stack, or a stack depth of 32 cells at a cell size of 16 bit. That sounds like a very large number for one Forth interpreter thread on a single-user system.)
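The same budget can be written as a small Python helper, which makes it easy to try other splits. The function name and parameters are hypothetical; the default numbers are the ones from the paragraph above.

```python
PAGE_BYTES = 256   # one 65816 page
CELL_BYTES = 2     # 16-bit cells


def data_stack_cells(return_bytes=128, variable_bytes=64):
    """Cells left for the Data Stack in a single shared page."""
    data_bytes = PAGE_BYTES - return_bytes - variable_bytes
    if data_bytes < 0:
        raise ValueError("Return Stack and variables exceed one page")
    return data_bytes // CELL_BYTES


print(data_stack_cells())          # the split assumed in the post
print(data_stack_cells(128, 16))   # fewer variables, deeper Data Stack
```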

The name, BTW, comes from the German folk song "Es waren zwei Königskinder" (https://de.wikipedia.org/wiki/Es_waren_ ... nigskinder) about two children of a king who really liked each other, but could never meet because the water was too deep.

Quote:

Es waren zwei Königskinder,
die hatten einander so lieb,
sie konnten beisammen nicht kommen,
das Wasser war viel zu tief

The people you hear in the background clearing their throats are the Germans here who remember how the story ends (both die). That, dear children, is why you always remember to check for overflow.

barrym95838 · Post by **barrym95838** » Sun Mar 27, 2016 5:38 pm

Hi Scot,

I believe that the main reason for the data stack growing "down" is because Forth does a lot of indexing to NOS and 3OS in the primitives like OVER, SWAP, ROT, etc. If you try to do this on an "up" growing stack, you end up using a lot of negative indices, which (although not impossible) might require you to use "long" addressing to get the necessary wrap-around on the '816 (or a lot of DEX;DEX/INX;INX combos). Garth uses negative data-stack addressing in a spot or two in his '802 Forth, but he only has a bank zero, so 16-bit negative indices are all that are needed. Everywhere else, he uses fast and compact 8-bit indexing, just like the '02 Forths.

I might be wrong about this "bank-bleeding", but I think that you should consider it before you get too far. Are you going to limit your execution tokens to 16-bit, or use 24-bit, or padded 32-bit?

Mike B.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sun Mar 27, 2016 6:25 pm

Why not start the hardware stack very high in memory and not worry about a data stack/hardware stack collision? On my POC unit, the hardware stack starts at $00CBFF, which is right below the (fixed) TIA-232 FIFOs. Programs load at $000400, making the likelihood of a collision very remote. With the arrangement you are proposing, you will be consuming clock cycles in checking for free RAM every time an entry is added to the data stack.

scotws · Post by **scotws** » Mon Mar 28, 2016 11:49 am

BigDumbDinosaur wrote:

Why not start the hardware stack very high in memory and not worry about a data stack/hardware stack collision?

... and that is what I should probably do. Mind you, the guys in the PR department will be upset, they really liked the name.

Mike - I'll reverse the stack back to normal as well, though I'm not sure negative indexing would be that much of a problem for the machine (thinking about it might hurt my head). I'll be staying on Bank 00, because this is an "initial" Forth, so XTs will be the native cell size, 16 bits. A second, "big" Forth with full memory range would be 32 bits with padding, I think.

As an aside: I briefly considered a "shifted" addressing scheme: Addresses are 24-bits long, but segmented so that the last four bits will always be zero, wasting 8 bytes on average. But they can now be packed in a 16-bit space with one nibble of the Bank Byte.

Code:

Variant A:
  16-bit address stored in code: AAAB
  24-bit unpacked real address: 1B:AAA0

Variant B:
  16-bit address stored in code: BAAA
  24-bit unpacked real address: 1B:AAA0

(You can use anything else besides the 1 for the "most significant nibble" (MSN) of the Bank Byte). This at least gives you, what, a MB of addressing space?

Of course, the whole unpacking part breaks your neck on the 65816; all the unmasking and shifting just takes too much time. It might work on an ARM processor because of the "free" shifting, but there you don't need it. Ah well, it was an interesting exercise.
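To make the two variants concrete, here is a small Python sketch of the packing and unpacking. It only models the bit layout described above and is not code from Liara; the function names are made up, and `BANK_MSN` is the fixed upper nibble of the Bank Byte (1 in the example).

```python
BANK_MSN = 0x1  # fixed "most significant nibble" of the Bank Byte


def unpack_a(packed):
    """Variant A: AAAB -> 1B:AAA0 (bank nibble in the low four bits)."""
    bank = (BANK_MSN << 4) | (packed & 0x000F)
    offset = packed & 0xFFF0
    return (bank << 16) | offset


def unpack_b(packed):
    """Variant B: BAAA -> 1B:AAA0 (bank nibble in the high four bits)."""
    bank = (BANK_MSN << 4) | (packed >> 12)
    offset = (packed & 0x0FFF) << 4
    return (bank << 16) | offset


def pack_a(addr):
    """Inverse of unpack_a; addr must be 16-byte aligned."""
    if addr & 0x000F:
        raise ValueError("address is not 16-byte aligned")
    return (addr & 0xFFF0) | ((addr >> 16) & 0x0F)


def pack_b(addr):
    """Inverse of unpack_b; addr must be 16-byte aligned."""
    if addr & 0x000F:
        raise ValueError("address is not 16-byte aligned")
    return (((addr >> 16) & 0x0F) << 12) | ((addr & 0xFFFF) >> 4)


print(hex(unpack_a(0x1234)))  # AAA = 123, B = 4
print(hex(unpack_b(0x4123)))  # B = 4, AAA = 123
```

Even in this form the masking and shifting is visible in every function, which is the cost the 65816 would have to pay on each address fetch.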

barrym95838 · Post by **barrym95838** » Mon Mar 28, 2016 3:18 pm

I wasn't trying to stunt your creativity, Scot. The best way to try it out is to code a few primitives each way, and see how it pans out. Keeping in mind the '816's "Lovecraftian" bank wrapping and bank crossing personalities for its different execution states.

http://6502.org/tutorials/65c816opcodes.html

Mike B.

scotws · Post by **scotws** » Tue Dec 27, 2016 3:30 am

Ha! Milestone: It worked in the emulator, and now it works on the machine.

Now, all it does is print some strings and then echo what is typed, but since that proves the calls to PUT_CHR and GET_CHR work as intended, that takes care of most interface problems with the Mensch Monitor.

(Of course, I could have had this working four hours earlier if I hadn't stupidly forgotten to re-enable interrupts before calling PUT_CHR. What a difference a byte makes, right?)

scotws · Post by **scotws** » Wed Jan 04, 2017 12:09 pm

Remember when I wrote that PUT_CHR and GET_CHR were working? Well, not so much.

For some reason, I just couldn't get them to work in the FIND-NAME loop, and I got fed up after a day of wasted effort and used Andrew's I/O code from the w65c265sxb-hacker (https://github.com/andrew-jacobs/w65c265sxb-hacker) instead. His routines now live in a file named kernel.tasm (to isolate the hardware dependencies and for licensing reasons) and provide a much shorter, faster, and above all working version of put_chr and get_chr. There is still some weird problem with the Backspace character, possibly related to the terminal program, but the basic loop worked immediately. (I'll push the code when I've fixed the BS thing, pun intended.)

Half of the problem is the headache that comes from trying to figure out what the Mensch Monitor routines actually do. I'm perfectly willing to believe that the problem was my fault, but in the end, the hassle is not worth it if there is a simpler alternative. The MM at least needs better documentation, though since the version shipped with the board is from 1995 (if I remember correctly), a major update might be in order?

So, thank you, Andrew.

scotws · Post by **scotws** » Sat Feb 04, 2017 10:28 am

There comes a time when you have to admit that your Very Clever Idea might in fact be Very Clever, but in the end is simply impractical and therefore A Bit Too Clever. In my case, it is keeping the top of the data stack (TOS) in Y.

It's not that it doesn't work. On the contrary, stuff like TYA and INY speed lots of words up very nicely. The problem is that it makes thinking about the code harder and some things a lot more complicated. Put differently, keeping everything "on X" might be slower and use more space, but it is the cleaner design. Put even more differently than that, this makes my brain hurt too much.

As an example of the problems you run into, take Forth words such as DEPTH and .S. Both need to know how many elements are on the stack. This is pretty straightforward with X as the stack pointer. With Y as TOS, you have various cases: Stack is completely empty; there is one element on the stack in Y; and there is more than one element, in Y and on the Direct Page via X. Now, this can be done: If X is equal to the initial value (which I called "dsp0"), we know the stack is empty. If X is dsp0+2, there is one element on the stack in Y. Starting dsp0+4, we have two or more elements on the stack.

Yes, you can build a system this way, and it's faster and smaller than just using X. But this shows that it gets complicated because you keep running into situations where you have to proceed case-by-case. This in turn makes debugging harder - usually my bugs are of the "stupid typo" class, quickly found; with Liara, I'm running into more complicated bugs where Y was passed to A for something and then I got confused where to put it back on the stack. And this is with code where a lot of it is just adapting Tali Forth's 8-bit stuff to 16-bit. Makes me wonder which bugs I haven't found because the logic is more complicated. Be first, but first be right, as journalists say.

So I'll be rewriting Liara with the "common" stack model - slower, longer, but easier to understand - and put a sign with the letters KISS over my desk.

barrym95838 · Post by **barrym95838** » Sat Feb 04, 2017 4:13 pm

scotws wrote:

... As an example of the problems you run into, take Forth words such as DEPTH and .S. Both need to know how many elements are on the stack. This is pretty straightforward with X as the stack pointer. With Y as TOS, you have various cases: Stack is completely empty; there is one element on the stack in Y; and there is more than one element, in Y and on the Direct Page via X. Now, this can be done: If X is equal to the initial value (which I called "dsp0"), we know the stack is empty. If X is dsp0+2, there is one element on the stack in Y. Starting dsp0+4, we have two or more elements on the stack ...

You completely ruined it by overthinking it Scot ... it's actually so much simpler than the way you describe, but not quite simple enough for me to explain without probably making myself late for work ... I'll come back tonight.

Mike B.

White Flame · Post by **White Flame** » Sat Feb 04, 2017 10:53 pm

Besides, how often are DEPTH and .S used?

If it makes the most commonly used words faster, then it's a net win for performance.

If it makes most words easier to implement (ie, TOS-oriented ones), then it's a net win for simplicity.

If it makes most words shorter, but just a few like these longer, then it's a net win in size.

GARTHWILSON · Post by **GARTHWILSON** » Sat Feb 04, 2017 11:20 pm

White Flame wrote:

Besides, how often are DEPTH and .S used?

Not much, except in development. I show an interesting use of DEPTH at viewtopic.php?f=9&t=2940&p=33124#p33124 . The May/Jun 1997 issue of Forth Dimensions had SET_OF to use like CASE_OF in a CASE structure, and that used DEPTH. You can say in essence, "In the case of the number on the top of the stack being equal to any of these, do this..." (where you could dictate any number of possible matches).

barrym95838 · Post by **barrym95838** » Sun Feb 05, 2017 6:30 am

Okay, I'm back. The best way to look at the two methods is to put them side by side (with a bonus TOS-in-a thrown in for good measure). In these methods, I am assuming 65c802/816 in full 16-bit register mode, SP is in x, decrement before push, increment after pull, two address units per cell:

Code:

    TOS in $0,x             TOS in y                TOS in a         |      65m32 TOS in a
    ------------            ------------            ------------     |      --------------
    NOS in $2,x             NOS in $0,x             NOS in $0,x      |      NOS in $0,s
    RSP in s                RSP in s                RSP in s         |      RSP in x (!!!)

This is how we init the stacks :
    ldx  #RP0               ldx  #RP0               ldx  #RP0        |      ldx  #RP0
    txs                     txs                     txs              |      lds  #SP0
    ldx  #SP0               ldx  #SP0               ldx  #SP0

This is how we DUP :
    lda  0,x                dex                     dex              |      pha
    dex                     dex                     dex              |      $NEXT
    dex                     sty  0,x                sta  0,x
    sta  0,x                $NEXT                   $NEXT
    $NEXT

This is how we DROP :
    inx                     ldy  0,x                lda  0,x         |      pla
    inx                     inx                     inx              |      $NEXT
    $NEXT                   inx                     inx
                            $NEXT                   $NEXT

This is how we OVER :
    lda  2,x                dex                     dex              |      pda  ,s
    dex                     dex                     dex              |      $NEXT
    dex                     sty  0,x                sta  0,x
    sta  0,x                ldy  4,x                lda  4,x
    $NEXT                   $NEXT                   $NEXT

This is how we SWAP :
    lda  0,x                lda  0,x                ldy  0,x         |      exa  ,s
    ldy  2,x                sty  0,x                sta  0,x         |      $NEXT
    sta  2,x                tay                     tya
    sty  0,x                $NEXT                   $NEXT
    $NEXT

This is how we NIP :
    lda  0,x                inx                     inx              |      ins
    inx                     inx                     inx              |      $NEXT
    inx                     $NEXT                   $NEXT
    sta  0,x
    $NEXT

This is how we @ :
    lda  (0,x)              lda  00,y               tay              |      lda  ,a
    sta  0,x                tay                     lda  00,y        |      $NEXT
    $NEXT                   $NEXT                   $NEXT

This is how we ! :
    lda  2,x                lda  0,x                tay              |      sla  #,b
    sta  (0,x)              sta  00,y               lda  0,x         |      sla  ,b
    inx                     inx                     sta  00,y        |      $NEXT
    inx                     inx                     inx
    inx                     ldy  0,x                inx
    inx                     inx                     lda  0,x
    $NEXT                   inx                     inx
                            $NEXT                   inx
                                                    $NEXT

This is how we + :
    lda  0,x                tya                     clc              |      add  ,s+
    inx                     clc                     adc  0,x         |      $NEXT
    inx                     adc  0,x                inx
    clc                     tay                     inx
    adc  0,x                inx                     $NEXT
    sta  0,x                inx
    $NEXT                   $NEXT

This is how we - :
    lda  2,x                tya                     eor  #$ffff      |      sub  ,s+
    sec                     eor  #$ffff             sec              |      cdd  #1
    sbc  0,x                sec                     adc  0,x         |      $NEXT
    inx                     adc  0,x                inx
    inx                     tay                     inx
    sta  0,x                inx                     $NEXT
    $NEXT                   inx
                            $NEXT

This is how we >R :
    lda  0,x                phy                     pha              |      sla  ,-x
    inx                     ldy  0,x                lda  0,x         |      $NEXT
    inx                     inx                     inx
    pha                     inx                     inx
    $NEXT                   $NEXT                   $NEXT

This is how we R> :
    pla                     dex                     dex              |      pda  ,x+
    dex                     dex                     dex              |      $NEXT
    dex                     sty  0,x                sta  0,x
    sta  0,x                ply                     pla
    $NEXT                   $NEXT                   $NEXT

This is how we TUCK :
    lda  0,x                lda  0,x                ldy  0,x         |      exa  ,s
    ldy  2,x                sty  0,x                sta  0,x         |      pda  ,s
    sta  2,x                dex                     dex              |      $NEXT
    sty  0,x                dex                     dex     
    dex                     sta  0,x                sty  0,x
    dex                     $NEXT                   $NEXT
    sta  0,x
    $NEXT

This is how we PICK :
    txa                     phx                     phx              |      add  #,s
    asl  0,x                tya                     asl              |      lda  ,a
    adc  0,x                asl                     adc  1,s         |      $NEXT
    tay                     adc  1,s                tax
    lda  02,y               tax                     lda  0,x
    sta  0,x                ldy  0,x                plx
    $NEXT                   plx                     $NEXT
                            $NEXT

This is how we DEPTH :
    txa                     txa                     dex              |      pda  #-1,s
    eor  #$ffff             eor  #$ffff             dex              |      cdd  #SP0
    sec                     sec                     sta  0,x         |      $NEXT
    adc  #SP0               adc  #SP0               txa
    lsr                     lsr                     eor  #$ffff
    dex                     dex                     sec
    dex                     dex                     adc  #SP0-2
    sta  0,x                sty  0,x                lsr
    $NEXT                   tay                     $NEXT
                            $NEXT

The only clear winner for TOS in $0,x is DROP, which is in all fairness a very commonly executed word [Edit: ! (store) is also a winner for TOS in RAM]. The other two methods tie or edge it out in machine code size and execution speed for almost everything else, though.
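For readers who want to check that the first two columns really behave the same, here is a minimal Python model of a few of the words above. It is only a sketch: memory is a dictionary, `SP0` is an arbitrary assumed value, and the class and method names are invented for illustration. The comments name the instructions each method stands for.

```python
SP0 = 0x0100  # assumed initial data stack pointer


class TosInMemory:
    """TOS in $0,x: every cell lives in memory."""
    def __init__(self):
        self.mem, self.x = {}, SP0

    def push(self, v):            # dex dex / sta 0,x
        self.x -= 2
        self.mem[self.x] = v & 0xFFFF

    def dup(self):                # lda 0,x / dex dex / sta 0,x
        a = self.mem[self.x]
        self.x -= 2
        self.mem[self.x] = a

    def drop(self):               # inx inx
        self.x += 2

    def swap(self):               # lda 0,x / ldy 2,x / sta 2,x / sty 0,x
        a, y = self.mem[self.x], self.mem[self.x + 2]
        self.mem[self.x + 2], self.mem[self.x] = a, y

    def items(self):              # logical stack, TOS first
        return [self.mem[a] for a in range(self.x, SP0, 2)]


class TosInY:
    """TOS in y: x points to NOS, y caches the top cell."""
    def __init__(self):
        self.mem, self.x, self.y = {}, SP0, 0   # y is nonsense while empty

    def push(self, v):            # dex dex / sty 0,x / ldy #v
        self.x -= 2
        self.mem[self.x] = self.y
        self.y = v & 0xFFFF

    def dup(self):                # dex dex / sty 0,x
        self.x -= 2
        self.mem[self.x] = self.y

    def drop(self):               # ldy 0,x / inx inx
        self.y = self.mem[self.x]
        self.x += 2

    def swap(self):               # lda 0,x / sty 0,x / tay
        a = self.mem[self.x]
        self.mem[self.x] = self.y
        self.y = a

    def items(self):              # skip the nonsense cell at SP0-2
        if self.x == SP0:
            return []
        return [self.y] + [self.mem[a] for a in range(self.x, SP0 - 2, 2)]
```

Running the same sequence of words on both classes and comparing `items()` is a quick way to catch the kind of "where did Y go" mistakes discussed earlier in the thread.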

Regarding DEPTH ... you'll notice that it doesn't matter whether x is pointing to TOS or NOS; if (x == SP0), then the stack is empty, and if (x == SP0-2) then the stack contains one cell. The reason for this is that every word (like DUP ) which grows the stack does DEX DEX somewhere inside it, and every word (like DROP ) which shrinks the stack does INX INX somewhere inside.

The strange #SP0-2 in the TOS-in-a version of DEPTH is there to counteract the necessity of performing the machine language equivalent of DUP before trashing the accumulator for the depth calculation. If the stack was empty and DEPTH was called, the DUP would have pushed nonsense, but that's not a problem, as long as the nonsense doesn't enter into the calculation. In fact, it doesn't matter which of the three methods you use; if you DUP an empty stack you're DUPing nonsense. It's just bad luck that TOS-in-a has to DEX DEX before the TXA to maintain non-empty stack integrity, that's all.

Let's look at a specific example:

SP0 = 9
x = 9
a = 1234

The 1234 is just nonsense, but it doesn't matter:

Code:

    dex         \ x is now 8
    dex         \ x is now 7
    sta  0,x    \ $07 now contains 1234
    txa         \ a is now 7
    eor  #$ffff \ a is now -8
    sec
    adc  #SP0-2 \ a is now -8+1+9-2 = 0
    lsr         \ a is still 0, which is the correct depth

So for the TOS-in-register methods, as long as you always remember that x points to NOS instead of TOS, it doesn't matter if either or both of them contain uninitialized nonsense, as long as you don't try to use the uninitialized nonsense by allowing x to increment above SP0.
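The DEPTH arithmetic can also be checked in a few lines of Python. This is only a sketch that mirrors the instruction sequences step by step on 16-bit values; the function name and the `tos_in_a` flag are made up for illustration.

```python
MASK = 0xFFFF  # 16-bit registers


def depth(x, sp0, tos_in_a=False):
    """Mirror the DEPTH sequences above on 16-bit values."""
    base = sp0
    if tos_in_a:
        x = (x - 2) & MASK        # dex / dex (A is saved before the txa)
        base = (sp0 - 2) & MASK   # the "strange" #SP0-2
    a = x                         # txa
    a ^= MASK                     # eor #$ffff
    a = (a + base + 1) & MASK     # sec / adc #base
    return a >> 1                 # lsr


print(depth(9, 9, tos_in_a=True))   # the worked example: empty stack
print(depth(0x00FC, 0x0100))        # x is two cells below SP0
```

The first call reproduces the SP0 = 9 example, and the second shows that the result is the number of cells between x and SP0 whichever column is used.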

Hope this helps,

Mike B.

[Edit: Added in a column for my 65m32, with a hard line to separate fact from fantasy. Maybe I'll break through this year ...]
[Edit #2: Added code samples for >R R> TUCK and PICK ; fixed ! ]

GARTHWILSON · Post by **GARTHWILSON** » Sun Feb 05, 2017 6:56 am

Thanks for doing that, Mike. So without counting stack initialization (since that's only done before we get to work), the overall length of TOS in 0,X and TOS in Y is only different by less than 2% which is insignificant, and TOS in A is a little over 7% shorter than TOS in Y, which is still minor. I haven't counted cycles yet, but it looks like the difference will be rather minor there, too. As they say, sometimes the best experience to have is someone else's, and now I've found out without doing it myself!

scotws · Post by **scotws** » Sun Feb 05, 2017 9:43 am

Thanks to everybody for the input. Mike, now I feel bad, I should have pointed out that I actually did a whole bunch of comparisons like that before I started out with Liara - see https://github.com/scotws/LiaraForth/bl ... ariants.md for a table of the results including words such as R> and their cycle counts (which I am more interested in than the size on the 65816). Some of the code is in https://github.com/scotws/LiaraForth/bl ... narios.txt though I did most of them on paper while on public transport. I didn't do DEPTH at the time, because I considered it too rare; for the record, your version is in fact better than mine.

There is no question that TOS-A and TOS-Y are superior in size and speed to TOS-in-X, which is why I started out coding that way. TYA is something of a "killer instruction" with TOS-Y at one byte and two cycles in this regard, because its savings add up for things like testing if TOS is zero or a negative number.

The problem is what White Flame touched on in his second point: Complexity. SWAP and even DEPTH are trivial to understand in any version. FIND-NAME and PARSE-NAME are not anymore; for example, you have to use Y as an index at some point and have to keep straight when Y is being used for what. With TOS-in-X, if you are not touching X, you are not touching the stack. To my surprise, I have found that overall complexity is higher than I had expected with TOS-Y.

The question for me now is if the speed and size gains are really worth the added complexity. Added complexity means more bugs and more difficult bugs, but you could argue that this is a one-time investment, because once it works, it works forever. It makes it harder for other people to understand the code, but - let's be realistic here - there are at least three Forths for the 65816 now, so I'm mostly writing this for myself. Those arguments fall in the "dude, stop whining and fix your damn code" range.

One big advantage of using TOS-in-X would be that I could use a lot of the same code for Liara (65816) and a complete and very necessary rewrite of Tali (65c02). There is no sense in using Y as half-of-TOS for the 8-bit machine (though to be honest, I have never actually done the code, that might be an interesting exercise). However, the idea of a bare-metal Forth is that it fits the machine as best as it can, and TOS-Y works very well on the 65816. And I did set out to make Liara fast(ish).

The good news is that I'm not on the clock. Thanks to the magic of Git, I've opened a TOS-in-X branch (not pushed to master for now, so not on GitHub) and am going to see how much of a difference that actually makes. So far, as Garth has pointed out, not that much, though I haven't calculated the cycle counts yet (my most important metric). Once I have TOS-in-X versions of PARSE-NAME and FIND-NAME working, I should be able to make a final decision.
