Michael, thanks for the clarification of the register display. That does help. For what it's worth, your "rot" will be known to most Forth programmers as "-rot" (rotate backwards), since the Forth ROT does TOS <= 3OS <= NOS <= TOS (where 3OS is "third on stack"). Yours looks more like the "roll down" on my HP calculator.
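To make the ROT / -ROT distinction concrete, here is a small Python sketch. It assumes the stack is modeled as a Python list with TOS as the last element; the function names rot and minus_rot are just illustrative.

```python
def rot(stack):
    """Forth ROT ( a b c -- b c a ): the third item comes to the top."""
    a, b, c = stack[-3:]
    stack[-3:] = [b, c, a]


def minus_rot(stack):
    """Forth -ROT ( a b c -- c a b ): the top item drops to third place."""
    a, b, c = stack[-3:]
    stack[-3:] = [c, a, b]


s = [1, 2, 3]      # 3 is TOS, 2 is NOS, 1 is 3OS
rot(s)             # s is now [2, 3, 1]

t = [1, 2, 3]
minus_rot(t)       # t is now [3, 1, 2], like "roll down" on a 3-level stack
```

Applying rot and then minus_rot to the same stack returns it to its starting order, which is a quick way to check which of the two an instruction implements.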
I'm a bit handicapped reading your test code, because I don't know what opcodes you have assigned. Here's my paraphrase of what I think your test code contains, with [ xxx ] referring to a 16-bit cell and [ xxx / xxx ] referring to the two bytes of a 16-bit cell:
0x200: [ PLI NXT / 0x00 ]
... opcode 0x3B is PLI NXT, yes?
0x202: [ 0x205 ] [ 0x00 / --- ]
... seems to be a partial thread with only one word address (0x205); I don't know the purpose of the "header #2 (partial)"
0x205: [ ENT / 0x00 ] [ 0x20d ] [ 0x000 ] [ 0x000 ]
... seems to be the beginning of a secondary definition containing only one word address (0x20d)
0x20d: [ bra $+2 ] [ machine code ... ]
Strictly speaking, in DTC the "bra $+2" is not necessary, but I saw your comment in the other topic about using it as a placeholder.
To start the simulation, you appear to be putting 0x202 in IP and then doing the action of NXT. If I'm reading your trace correctly, your machine reads the contents of 0x202 into W, making W = 0x205 (the address of a Forth word, in this case a secondary word), then copies W to PC and reads an opcode at 0x205. That opcode 0x7B is ENT, which pushes the IP (currently 0x204), sets the IP = W+2 = 0x207, and then does the action of NXT, which reads the contents of 0x207 into W, making W = 0x20D, then copies W to PC and reads the opcode at 0x20D and branch offset at 0x20E.
All of that looks correct. Arguably the W register is superfluous for DTC, but if you're implementing a dual ITC/DTC instruction set I can see why you'd want to keep W. I presume at the end of your secondary thread there will be an address 0x200, which will cause the primitive at 0x200 to be executed (the Forth word EXIT), which will do the PLI NXT, which will cause interpretation to continue at 0x204.
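Here is a rough Python sketch of the DTC sequence as I read it, only to check the bookkeeping of IP, W and the return stack. Several things in it are my assumptions, not your design: memory is modeled as two dicts rather than bytes, the opcode names "ENT", "PLI_NXT" and "PRIM" are placeholder strings, the cell at 0x209 holds 0x200 (the EXIT I presumed above, where your listing has 0x000), and the loop simply stops when IP returns to 0x204.

```python
# Thread cells (16-bit word addresses), keyed by the address they live at.
DTC_CELLS = {
    0x202: 0x205,  # top-level thread: address of the secondary
    0x207: 0x20D,  # secondary's thread: address of the primitive
    0x209: 0x200,  # ASSUMED: address of EXIT ends the secondary's thread
}

# First opcode found at each word address (placeholder names).
DTC_CODE = {
    0x200: "PLI_NXT",  # EXIT: pull IP from the return stack, then NXT
    0x205: "ENT",      # enter a secondary
    0x20D: "PRIM",     # stand-in for "bra $+2" plus machine code ending in NXT
}


def run_dtc(ip, stop_ip):
    """Run the DTC inner interpreter until IP comes back to stop_ip."""
    rstack, trace = [], []
    while ip != stop_ip:
        # NXT: W <= (IP), IP <= IP + 2, PC <= W
        w = DTC_CELLS[ip]
        ip += 2
        pc = w
        op = DTC_CODE[pc]
        trace.append((pc, op))
        if op == "ENT":
            rstack.append(ip)   # push the caller's IP (0x204 here)
            ip = w + 2          # thread starts one cell past the word address
        elif op == "PLI_NXT":
            ip = rstack.pop()   # resume the caller's thread
        # "PRIM" does nothing in this sketch and falls through to NXT
    return trace, rstack


dtc_trace, dtc_rstack = run_dtc(0x202, 0x204)
# dtc_trace is [(0x205, "ENT"), (0x20D, "PRIM"), (0x200, "PLI_NXT")]
```

Note that W is only used by ENT here, which is the sense in which it is nearly superfluous for DTC.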
ITC is essentially the same except that instead of copying W to PC, you fetch from memory @W to the PC. At 0x20d you now have
0x20d: [ 0x20f ] [ machine code ... ]
which is what I would expect, but I don't see how the initial NXT works (when you have IP = 0x202). The IENT trace appears to read the contents of 0x202 into W, making W = 0x205 (ok so far), but then it reads opcodes at 0x205 and 0x206, rather than fetching from memory @W into the PC. For an ITC implementation I would expect the word at 0x205 to be something like
0x205: [ 0x207 ] [ IND / ENT ] [ 0x20d ] [ 0x000 ]
and IND/ENT must be modified to set IP = W+4 rather than W+2. (In ITC, the first cell of a Forth word is the address of machine code.)
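A short Python sketch of the ITC steps I would expect, using the layout I suggested above. The dict-based memory and the function names itc_next and itc_enter are my own illustration, not your instruction set.

```python
ITC_CELLS = {
    0x202: 0x205,  # top-level thread: address of the secondary
    0x205: 0x207,  # code field of the secondary -> its IND/ENT machine code
    0x209: 0x20D,  # secondary's thread: address of the primitive
    0x20D: 0x20F,  # code field of the primitive -> its machine code
}


def itc_next(ip):
    """ITC NXT: W <= (IP), PC <= (W), IP <= IP + 2."""
    w = ITC_CELLS[ip]
    pc = ITC_CELLS[w]      # the extra indirection that DTC does not have
    return w, pc, ip + 2


def itc_enter(w, ip, rstack):
    """ITC ENT: push IP, then IP <= W + 4 (skip code field and IND/ENT cell)."""
    rstack.append(ip)
    return w + 4


itc_rstack = []
w1, pc1, ip1 = itc_next(0x202)           # w1 = 0x205, pc1 = 0x207, ip1 = 0x204
ip2 = itc_enter(w1, ip1, itc_rstack)     # ip2 = 0x209, return stack holds 0x204
w2, pc2, ip3 = itc_next(ip2)             # w2 = 0x20D, pc2 = 0x20F, ip3 = 0x20B
```

The point of the sketch is the first call: with IP = 0x202 the PC should end up at 0x207 (fetched from memory @W), not at 0x205.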
As an aside, if ENT is going to assume that the W register contains the address of the word (as your table of critical Forth operations suggests), you need to handle the Forth word EXECUTE properly. EXECUTE performs a Forth word whose address is given on the parameter stack, not taken from a thread. You will need the ability to copy that address to the W register, and then either jump to that word (DTC) or fetch the PC from that address (ITC).
Code:
                       ITC                                      DTC
================================================================================
EXECUTE: W  <= (PSP++) -- Ld *Code_Fld       ; W  <= (PSP++) -- Ld *Code_Fld
         PC <= (W)     -- Jump Dbl Indirect  ; PC <= W       -- Jump Indirect
If W is a "hidden" (internal) register, you'll need an instruction to set it to a given value. (Or just an EXECUTE machine instruction.)
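The same two-step EXECUTE as a minimal Python sketch. It assumes the parameter stack is a list with TOS last and reuses the example addresses 0x205 and 0x207 from above; the function name and the itc flag are hypothetical.

```python
def execute(pstack, cells, itc):
    """EXECUTE: W <= (PSP++), then PC <= (W) for ITC or PC <= W for DTC."""
    w = pstack.pop()                 # word address comes from the stack, not a thread
    pc = cells[w] if itc else w      # ITC fetches the code address from the code field
    return w, pc


exec_cells = {0x205: 0x207}          # ITC code field of the word at 0x205

dtc_w, dtc_pc = execute([0x205], exec_cells, itc=False)   # (0x205, 0x205)
itc_w, itc_pc = execute([0x205], exec_cells, itc=True)    # (0x205, 0x207)
```

Either way W ends up holding the word address, so an ENT that computes the new IP from W works the same whether the word was reached through NXT or through EXECUTE.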