M65C02A Forth VM Support

MichaelM · Post by **MichaelM** » Wed Dec 31, 2014 11:42 pm

Jeff:

As usual, you are likely correct on this issue, but I would not have gotten the fig-FORTH kernel running if it was not relocatable. Since to be relocatable requires relative offsets, I think that is a benefit to the fig-FORTH model. Most older processors like the 6502/65C02 do not provide good support for relative addressing. What support for relative addressing they do provide is in a form that's not generally available for direct use by the FORTH VM, i.e. native 8-bit relative conditional/unconditional branch instructions.

When I looked at the BRANCH and 0BRANCH words of the fig-FORTH model, there are a lot of FORTH VM operations being performed to synthesize the relative branches used in the fig-FORTH kernel. That process is expensive in terms of native machine instruction cycles due to the repeated use of DOCOL (ENTER) and NEXT to advance the FORTH VM through the code.

Some time ago, I extended the M65C02A instruction set to include the BRA rel16 and the PHR rel16 instructions. Thus, with the incorporation of the FORTH VM registers, IP and W, (per your recommendation, or more accurately, with your prodding) into the core and the pre-existing support logic for 16-bit relative addressing, all of the logic is in place in the M65C02A core to add support for IP-relative branching. IOW, I believe all that will be required are changes to the microprogram in order to use IP instead of PC for the base and as a pointer to the 16-bit offset.

I have been considering implementing this support by overloading the IND prefix instruction. If IND is applied to the native Bcc rel8 instructions, the microprogram will implement Bcc [IP++] rather than the normal Bcc [PC++].
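For readers following along, here is a minimal Python sketch of what an IP-relative branch does. It is not the M65C02A microprogram. The 16-bit little-endian signed offset, the choice of the post-incremented IP as the base (mirroring how native 6502 branches treat PC), and the helper names `fetch16`, `to_signed16`, and `branch_ip` are all assumptions made for illustration.

```python
def fetch16(mem, addr):
    """Read a little-endian 16-bit word from a bytes-like memory image."""
    return mem[addr] | (mem[addr + 1] << 8)


def to_signed16(value):
    """Interpret a 16-bit value as two's complement."""
    return value - 0x10000 if value & 0x8000 else value


def branch_ip(mem, ip, taken):
    """Model of a branch whose operand is fetched through IP (hypothetical).

    The offset is read at IP, IP steps past it, and the offset is added
    only when the branch is taken. Returns the new IP.
    """
    offset = to_signed16(fetch16(mem, ip))
    ip = (ip + 2) & 0xFFFF          # the [IP++] part
    if taken:
        ip = (ip + offset) & 0xFFFF
    return ip


mem = bytearray(0x10000)
mem[0x1000:0x1002] = (0xFFFC).to_bytes(2, "little")   # offset of -4

print(hex(branch_ip(mem, 0x1000, taken=True)))
print(hex(branch_ip(mem, 0x1000, taken=False)))
```

A not-taken branch simply skips the in-line offset, which is what fig-FORTH's 0BRANCH does when the flag on the stack is non-zero.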

PS: Thanks for the congratulations.

All:

Have a safe and happy New Year's and we'll communicate again next year.

Dr Jefyll · Post by **Dr Jefyll** » Thu Jan 01, 2015 5:17 pm

MichaelM wrote:

Have a safe and happy New Year's and we'll communicate again next year.

Whoops, is it "next year" already? That was fast!

MichaelM wrote:

Since to be relocatable requires relative offsets, I think that is a benefit to the fig-FORTH model.

barrym95838 wrote:

The link addresses and CFAs are all absolute

To run from a different place in memory, Forth would have to be reassembled to fix all absolute references, as Mike noted. IOW branch destinations are not the only issue. Perhaps we are using the word relocatable in not quite the same way. It seems there's some misunderstanding somewhere.

I admit being perplexed by FIG Forth's use of IP-relative addressing. Regarding relocatability, half a solution is the same as no solution. And, with relocatability ruled out, I see no benefit to justify the slight IP-relative performance hit. New 65xx implementations such as yours may reduce or eliminate the hit, and perhaps there is some value in allowing flawed legacy code to run faster. I just hope you're not devoting a lot of resources to this. It seems to me a fairly trivial rewrite of the FIG code would eliminate IP-relative addressing and its performance hit. If I'm mistaken, or if IP-relative branches offer a benefit not yet mentioned, I hope someone will point it out.
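The "half a solution" point can be illustrated with a small Python sketch. The cell layout, the addresses, and the function names below are hypothetical and are not taken from the fig-FORTH source; the sketch only shows that relative branch offsets survive a move while absolute CFAs and link addresses still need a fixup.

```python
def cells_needing_fixup(cells):
    """Return the indexes of cells that hold absolute addresses."""
    return [i for i, (kind, _) in enumerate(cells) if kind == "abs"]


def relocate(cells, delta):
    """Shift every absolute cell by delta; relative offsets are untouched."""
    return [
        (kind, (value + delta) & 0xFFFF if kind == "abs" else value)
        for kind, value in cells
    ]


# Hypothetical fragment of a threaded definition assembled at $2000.
thread = [
    ("abs", 0x2100),   # CFA of some word
    ("abs", 0x2140),   # CFA of 0BRANCH
    ("rel", 0x0004),   # in-line branch offset
    ("abs", 0x2180),   # CFA of another word
]

moved = relocate(thread, 0x4000 - 0x2000)
print(cells_needing_fixup(thread))
print([(kind, hex(value)) for kind, value in moved])
```

Only one of the four cells is position independent, so moving the image still requires reassembly or a relocation pass.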

cheers,
Jeff

MichaelM · Post by **MichaelM** » Tue Jul 21, 2015 11:18 pm

It's been a long time in coming, but I've finally begun testing the Forth VM support I've included in the M65C02A core. The following diagram shows the execution of two M65C02A FORTH instructions which can be used to implement the DTC FORTH EXE or EXIT functions. This functionality is implemented using a 16-bit pull instruction, PLI, and a 3-cycle NEXT instruction, NXT.

[Figure: Forth VM PLI NXT instructions]

The test code is a simple DTC FORTH-like Word code sequence where the code field address is populated by the 65C02 BRA rel instruction, so it's only two bytes. The CFA is at address $F22C, and IP is pointing to address $F228.

Code:

F222: F428F2                phw #$F228                  ; test DTC EXE
F225: 6B                    pli
F226: 7B                    nxt
F227: DB                    stp
F228: 2CF2                  dw  *+4
F22A: 0000                  brk #0
F22C: 8000                  bra *+2
F22E: A9FF                  lda #$FF
F230: 8D0002                sta $200
;
F233: EA                    nop
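
The effect of the pli / nxt pair in this test can be traced with a short Python model. This is a sketch of the conventional DTC NEXT (W = [IP]; IP += 2; jump to W), not a description of the core's internals. The memory bytes are copied from the listing; the function names and the use of a Python list for the stack are assumptions.

```python
mem = bytearray(0x10000)
# Bytes copied from the listing, addresses $F228-$F22D.
mem[0xF228:0xF22E] = bytes([0x2C, 0xF2, 0x00, 0x00, 0x80, 0x00])


def read16(addr):
    """Read a little-endian 16-bit word."""
    return mem[addr] | (mem[addr + 1] << 8)


def pli(stack):
    """PLI: pull a 16-bit value from the stack into IP."""
    return stack.pop()


def nxt_dtc(ip):
    """DTC NEXT: W = [IP]; IP += 2; execution continues at W."""
    w = read16(ip)
    return (ip + 2) & 0xFFFF, w, w   # new IP, W, new PC


stack = [0xF228]                     # effect of phw #$F228
ip = pli(stack)
ip, w, pc = nxt_dtc(ip)
print(f"IP=${ip:04X} W=${w:04X} PC=${pc:04X}")
```

Under these assumptions the model ends with IP at $F22A and both W and PC at $F22C, the CFA holding the bra instruction.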

Dr Jefyll · Post by **Dr Jefyll** » Wed Jul 22, 2015 4:02 pm

3-cycle Next, eh?

And pli is handy for popping a value into IP. I see the code snippet tests pli and nxt.

Quote:

the code field address is populated by the 65C02 BRA rel instruction, so it's only two bytes.

I'm not clear why you began the executable code with the BRA. The following lda and sta seem meaningful as a placeholder to represent hypothetical user code. But why the BRA? Does it assist with testing?

cheers,
Jeff

barrym95838 · Post by **barrym95838** » Wed Jul 22, 2015 5:11 pm

Yeah, when I switched my 65m32 Forth from ITC to DTC, starting my machine code directly in (at?) the CFA was a nice space saver (and time saver) since many of my primitives are only two or three machine instructions. scotws seems to be doing the BRA thing in his 65c02 STC Forth as well ... maybe he could provide some insights?

Although I didn't directly intend it to be that way, it seems that many 6809 coding strategies can be applied to my 65m32 with similar benefit. I still prefer the 65xx over the 68xx, for other reasons.

Mike B.

[Edit: scotws explains his BRA strategy a bit here.]

MichaelM · Post by **MichaelM** » Wed Jul 22, 2015 10:25 pm

Dr Jefyll wrote:

I'm not clear why you began the executable code with the BRA. The following lda and sta seem meaningful as a placeholder to represent hypothetical user code. But why the BRA? Does it assist with testing?

As I've said before, I'm a complete noobie when it comes to Forth. In my attempt to incorporate some special support for FORTH, I decided to use the prefix instructions to support both an ITC and a DTC VM. As such, I followed a model where the parameter field of a primitive or secondary requires two bytes. Therefore, in my test case, I am just retaining the code field as a reminder of where the ITC CFA pointer must be inserted. I simply used the bra *+2 as a two-byte placeholder for the code field; I kind of like how the machine code representation of that instruction, $80 $00, looks in the instruction stream.

When I set up the test for the DTC ENTER instruction, the bra *+2 will be replaced with the single-byte (DTC) ent ($7B) instruction.

Both you and Mike are correct in your assessment that a primitive DTC FORTH word could be implemented with machine code in the code field. The two cycles saved by having no code field in a DTC FORTH word probably should be the preferred implementation.

I hope to complete the testing of the FORTH VM support instructions soon: nxt, pli, ini, phi, ent, and lda (ip,I++). The ITC version of the fundamental operations will require the ind prefix instruction: ENTER => ind ent; NEXT => ind nxt; EXE => pli ind nxt; EXIT => pli ind nxt;
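The difference between the plain nxt and the ind nxt form comes down to one extra memory fetch. The Python sketch below uses the textbook definitions of DTC and ITC NEXT; it is not derived from the M65C02A microcode, and the dictionary-based memory, the addresses, and the function names are assumptions for illustration.

```python
def next_dtc(mem16, ip):
    """DTC NEXT: the cell at IP is the address of machine code."""
    w = mem16[ip]
    return ip + 2, w, w             # new IP, W, new PC


def next_itc(mem16, ip):
    """ITC NEXT: the cell at IP points to a code field, and that code
    field in turn holds the address of machine code."""
    w = mem16[ip]
    return ip + 2, w, mem16[w]      # new IP, W, new PC


# Hypothetical 16-bit cells: the thread at $1000 names the word at $2000,
# whose code field contains $3000.
mem16 = {0x1000: 0x2000, 0x2000: 0x3000}

print([hex(x) for x in next_dtc(mem16, 0x1000)])
print([hex(x) for x in next_itc(mem16, 0x1000)])
```

In both cases W ends up pointing at the code field; only the final jump target differs.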

The auxiliary stack usage may add another prefix instruction (osx) to the instruction sequences in either a DTC or an ITC implementation. With the osx prefix, the auxiliary stack, accessed using X, can be used for the FORTH VM's Return Stack (RS), and the system stack can be used as the parameter stack (PS).

Dr Jefyll · Post by **Dr Jefyll** » Wed Jul 22, 2015 11:57 pm

barrym95838 wrote:

scotws explains his BRA strategy a bit here.]

Thanks, Mike. And I hope I didn't seem critical, Michael. The truth is I'm rather a noob myself when it comes to Forth other than ITC. This notion with DTC of starting the code with a jump or bra seems like it might serve a purpose -- a substitute form of indirection, perhaps -- but I hesitate to speculate. If you got the idea from something you read, I'd be interested to hear if that source also offers an explanation.

MichaelM · Post by **MichaelM** » Thu Jul 23, 2015 12:18 am

Not at all, Jeff. I simply have an OCD-like streak that goes for a certain amount of symmetry: a two byte branch and a two byte ENTER sequence just appealed to me. I used the branch instruction to enter/start the simulated primitive because an ENTER into a secondary would require 1 byte if S => RS, or 2 bytes if X => RS. I'd like to imagine that when I got around to optimizing for performance, I would consider eliminating the code field to get back the two cycles of the bra *+2 sequence.

However, being a Forth noob, I might keep the code field because Brad Rodriguez, in his Moving Forth articles, emphasized the importance of having W point to the code field. When he recommended having W pointing to the code field, I don't think he made a distinction between ITC or DTC implementations, or between primitive and secondary words. Since I do not have a full understanding of Forth's fine points, I will try to stay on the path described by Brad and others for the time being.

enso · Post by **enso** » Thu Jul 23, 2015 1:31 am

Bravo, Michael. I can't wait to try it out.

MichaelM · Post by **MichaelM** » Thu Jul 23, 2015 1:07 pm

I have tested a number of other instructions: INI, INW (IND INI), PLW (IND PLI), and ENT (ENTER). In the attached figure, I've annotated the PLW instruction followed by the ENT instruction. External behavior is as I expect, and it shows that the DTC ENT instruction requires only 5 cycles.

I only need to verify the ITC mode of the NXT and ENT instructions and the LDA ip,I++ instruction before declaring the ForthVM module tested.

[Figure: DTC Forth VM ENTER (DOCOLON)]

Edit: removed indirection parentheses around ip,I++ operand. IND, SIZ, ISZ prefixes are supported for this instruction. MAM, 15K15.

MichaelM · Post by **MichaelM** » Fri Jul 24, 2015 1:10 pm

The attached figure shows the LDA ip,I++ Forth VM instruction used to load the accumulator with a 16-bit in-line literal such as used in the LIT and CLIT figFORTH words. The instruction autoincrements IP past the embedded literal/constant so that interpretation can continue seamlessly. Currently, the offset operand is ignored, but support for it can be readily implemented with a simple microcode change rather than a logic change.
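
As a rough Python model of the load described above (a sketch only): the accumulator receives the 16-bit cell at IP and IP then steps past it. Little-endian byte order, the ignored offset operand, and the name `lda_ip_postinc` are assumptions.

```python
def lda_ip_postinc(mem, ip):
    """Return (A, new IP): load the 16-bit literal at IP, then skip it."""
    a = mem[ip] | (mem[ip + 1] << 8)
    return a, (ip + 2) & 0xFFFF


mem = bytearray(0x10000)
mem[0x3000:0x3002] = (0x1234).to_bytes(2, "little")   # in-line literal

a, ip = lda_ip_postinc(mem, 0x3000)
print(f"A=${a:04X} IP=${ip:04X}")
```

This is the core of a LIT-style word: the value lands in the accumulator and IP already points at the next cell of the thread.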

Following the LDA ip,I++ instruction is a 16-bit comparison instruction. I've implemented a change to the normal behavior of the CMP/CPX/CPY instructions such that 16-bit comparisons affect the V flag. This change enables the implementation of more powerful conditional branch instructions. PDP11-like signed and unsigned conditional branches will be performed when the normal conditional branch instructions are preceded by the SIZ prefix instruction. I've also set it up such that if the conditional branch instructions are preceded by the ISZ prefix instruction, then the conditional branch is relative to IP. I expect these two simple modifications to significantly improve the efficiency of comparisons for FORTH and standard languages.
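
Why the V flag matters for signed comparisons can be shown with a short Python sketch. The flag computation follows ordinary two's complement subtraction with the 6502 carry convention (C set means no borrow). The branch conditions are the standard PDP-11 style ones; which conditions and mnemonics the M65C02A actually provides is not stated here, so the predicate names are hypothetical.

```python
def cmp16(a, m):
    """Flags for a 16-bit compare A - M, with V set as for subtraction."""
    a &= 0xFFFF
    m &= 0xFFFF
    diff = (a - m) & 0xFFFF
    return {
        "N": bool(diff & 0x8000),
        "Z": diff == 0,
        "C": a >= m,                                   # no borrow
        "V": bool((a ^ m) & (a ^ diff) & 0x8000),      # signed overflow
    }


def signed_lt(f):
    return f["N"] != f["V"]


def signed_ge(f):
    return f["N"] == f["V"]


def signed_gt(f):
    return not f["Z"] and f["N"] == f["V"]


def signed_le(f):
    return f["Z"] or f["N"] != f["V"]


def unsigned_hi(f):
    return f["C"] and not f["Z"]


def unsigned_lo(f):
    return not f["C"]


flags = cmp16(0x8000, 0x0001)       # -32768 compared with 1
print(flags, signed_lt(flags), unsigned_hi(flags))
```

Without V, the N flag alone would report -32768 as not less than 1, because the subtraction overflows; N xor V gives the correct signed answer.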

[Figure: IP-relative LDA with autoincrement]

Edit: removed indirection parentheses around ip,I++ operand. IND, SIZ, ISZ prefixes are supported for this instruction. MAM, 15K15.
