M65C02A Forth VM ENTER Implementation Check Request

MichaelM · Post by **MichaelM** » Wed Feb 20, 2019 1:36 am

I'm wrapping up the unit tests for the implicit addressing mode instructions of the M65C02A. I have 3 remaining instructions to do. I would appreciate it if I could get a sanity check on the implementation of the DTC Forth VM's ENTER opcode.

I've constructed a test scenario that consists of a partial EXIT, i.e. the NXT instruction of the PLI NXT instruction sequence where PLI pulls IP from the return stack. (Note: in my implementation, the return stack is implemented using X, and the parameter stack is implemented using S. It is also possible to swap these two stacks, but that instruction sequence has yet to be tested, although the logic for swapping default and alternate stack pointers has already been tested many times by other unit tests.)

Thus, at memory location 0x200 is the NXT instruction, which reads the pointer at location 0x202; that pointer refers to a "secondary" WORD starting at location 0x205. Control is transferred to location 0x205, which holds the ENT instruction. I am expecting that all "secondary" WORDs will have the ENT instruction as their first instruction. ENT will push the value of the IP register (0x204) onto the RS, and then execute a NXT using the first pointer in the "secondary", which in this particular case points to a "primitive" that begins with a bra $+2. So the "primitive" located at 0x20D begins by branching unconditionally to 0x20F, whereupon the test is complete.

As previously discussed in the M65C02A or a related thread (Programmable Logic), the form of the Forth WORDs has been chosen to follow the structure described by Brad Rodriguez. That form expects the code to start at offset 2 from the "header". A pointer to the "header" is deposited in the W register by the NXT operation, so ENT always "finds" the code field as W + 2.

It's been a while since the implementation of NXT and ENT was discussed, so I would appreciate it if someone would give me a sanity check on the implementation. When py65 completes the test, the total number of cycles for the sequence is 10. NXT is 3 cycles, and ENT is 5 cycles. Breaking ENT down further: 1 cycle for opcode fetch, 2 cycles for PHI, and 2 cycles to read the address of the next WORD. The last 2 cycles are for the BRA $+2 instruction in the "header".
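For readers who want to step through the sequence by hand, here is a minimal Python sketch of the DTC NXT and ENT operations as described above. It is not the py65 model: the `Vm` class, the `rd16` helper, and the byte order of the push (high byte first, X post-decremented) are assumptions chosen to agree with the instruction trace further down.

```python
# Minimal sketch of DTC NXT / ENT; names and structure are illustrative only.
class Vm:
    def __init__(self, mem, ip, x):
        self.mem = mem      # dict: address -> byte
        self.ip = ip        # Forth instruction pointer
        self.w = 0          # working register, set by NXT
        self.x = x          # return stack pointer (X is the default RS here)
        self.pc = 0

    def rd16(self, addr):
        # little-endian 16-bit read
        return self.mem.get(addr, 0) | (self.mem.get(addr + 1, 0) << 8)

    def nxt(self):
        # w <= (ip++); pc <= w
        self.w = self.rd16(self.ip)
        self.ip += 2
        self.pc = self.w

    def ent(self):
        # (RSP--) <= ip, high byte first (assumed from the trace)
        self.mem[self.x] = self.ip >> 8
        self.mem[self.x - 1] = self.ip & 0xFF
        self.x -= 2
        # ip <= w + 2, followed by the embedded NXT
        self.ip = self.w + 2
        self.nxt()

mem = {0x202: 0x05, 0x203: 0x02,   # thread cell -> secondary at 0x205
       0x205: 0x7B, 0x206: 0x00,   # ENT opcode and pad byte
       0x207: 0x0D, 0x208: 0x02}   # first cell of secondary -> primitive at 0x20D
vm = Vm(mem, ip=0x202, x=0x17F)
vm.nxt()   # the NXT at 0x200
vm.ent()   # the ENT at 0x205
print(hex(vm.ip), hex(vm.w), hex(vm.pc), hex(vm.x))
```

The sketch stops with PC at the primitive's bra $+2; the branch itself is not modelled.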

The following is the Python Unit Test function for this instruction sequence.

    def test_Forth_VM_enter_using_default_stack(self):
        stdout = StringIO()
        mon = Monitor(stdout = stdout)
        mpu = mon._mpu
        codeFieldPtr = 0x202
        mpu.ip = codeFieldPtr       # Codefield Pointer
        mpu.x = {0 : 0x17F, 1 : 0x4444, 2 : 0x2222}

        mpu.memory[0x200] = 0x3B    # NXT - Primary Forth WORD Exit (pli nxt)
        mpu.memory[0x201] = 0x00
        # Forth WORD - Secondary Code Field
        mpu.memory[0x202] = 0x05    # Pointer to header #1
        mpu.memory[0x203] = 0x02
        mpu.memory[0x204] = 0x00    # Pointer to header #2 (partial)
        # Forth WORD - Secondary
        mpu.memory[0x205] = 0x7B    # header: ENT (Secondary WORD)
        mpu.memory[0x206] = 0x00
        mpu.memory[0x207] = 0x0D    # Pointer to header #1
        mpu.memory[0x208] = 0x02
        mpu.memory[0x209] = 0x00    # Pointer to header #2 (not implemented)
        mpu.memory[0x20A] = 0x00
        mpu.memory[0x20B] = 0x00    # Pointer to header #3 (not implemented)
        mpu.memory[0x20C] = 0x00
        # Forth WORD - Primitive
        mpu.memory[0x20D] = 0x80    # header: bra $+2
        mpu.memory[0x20E] = 0x00
        mpu.memory[0x20F] = 0x00 

        p = copy.copy(mpu.p)
        a = copy.copy(mpu.a)
        x = {0 : 0x17D, 1 : 0x4444, 2 : 0x2222}
        y = copy.copy(mpu.y)
        s = copy.copy(mpu.sp)
        i = 0x209
        w = 0x20D

        mon.do_goto('200')
        print('\n', mpu)
        print(mpu.processorCycles)
        
        self.assertEqual(0x20F, mpu.pc)
        self.assertEqual(p, mpu.p)
        for j in range(3):
            self.assertEqual(a[j], mpu.a[j])
            self.assertEqual(x[j], mpu.x[j])
            self.assertEqual(y[j], mpu.y[j])
        for j in range(2):
            self.assertEqual(s[j], mpu.sp[j])
        self.assertEqual(i, mpu.ip)
        self.assertEqual(w, mpu.wp)

The following is the instruction trace for this test. The processor state before and after the instruction sequence precedes / follows the instruction trace.

          PC   AC   XR   YR   SP   VM  NVMBDIZC
M65C02A: 020F 0000 017F 0000 01FF 0202 00110000
              0000 4444 0000 01FF 020D DL YXSIZ
              0000 2222 0000           10 00000

.g 200
   IR: 3B <= mem[0200]
 rdDM: 05 <= mem[0202]
 rdDM: 02 <= mem[0203]
   IR: 7B <= mem[0205]
 wrDM: 02 => mem[017F]
 wrDM: 04 => mem[017E]
 rdDM: 0D <= mem[0207]
 rdDM: 02 <= mem[0208]
   IR: 80 <= mem[020D]
 rdPM: 00 <= mem[020E]

          PC   AC   XR   YR   SP   VM  NVMBDIZC
M65C02A: 020F 0000 017D 0000 01FF 0209 00110000
              0000 4444 0000 01FF 020D DL YXSIZ
              0000 2222 0000           10 00000

barrym95838 · Post by **barrym95838** » Wed Feb 20, 2019 4:27 am

Hi Michael

I must admit that I'm in over my head with your explanation, but that's very likely my fault, not yours. I don't follow your "bra $+2" technique and I also don't see how "headers" should have anything to do with your ENT or NXT.

If I may just flop out my interpretation of Dr. Brad's explanation of how DTC works, then maybe we can find some common ground to proceed.

NEXT is
  JUMP (IP++) ; the address in IP is used to access the cell
              ;   containing the new address for PC (a Code
              ;   Field Address), then IP is incremented
              ;   to point to the next cell.  The CFA must
              ;   point to native machine code in DTC.

ENTER is
  PUSH IP     ; onto Forth's return stack
  IP = PC     ; * see explanation below
  NEXT

EXIT is
  PULL IP     ; from Forth's return stack
  NEXT

* If ENTER is a native machine instruction, then it is located
    at the beginning of the DTC code field, and PC should be
    pointing to the first DTC address immediately following it.

That's if you have a native machine instruction that can "do" ENTER. If you don't then you have to do something like stuff a JSR ENTER at the beginning of your code field, and the ENTER subroutine consumes the JSR's return address to load into IP (after pushing the old IP onto the Forth return stack) before performing a NEXT instead of an RTS.

I want to help, and I hope that I have, but if not, please accept my apology and go on about your business.

MichaelM · Post by **MichaelM** » Wed Feb 20, 2019 10:25 am

Michael:

Thanks for the reply. The pseudo-code for these operations, NXT and ENT, was discussed way back in Nov 2014 here. (Wow, how time flies. Good thing I'm not working on this project for a paycheck.) Your comments at that time helped correct a table that I developed to guide my implementation of these two operations based on my interpretations of Brad's descriptions.

It appears from your comments above, which I believe Jeff Laughton also shares, that my definition of the DTC ENTER operation should not be based on using W + 2 as the value loaded into IP after the current IP is pushed onto the return stack (RS). My intention in using that specific definition, IP = W + 2, versus the definition given above, IP = PC, was to enable these two instructions to support both DTC and ITC Forth VMs with only the addition of the IND prefix instruction.

As I see it at the moment, the developer / programmer has the option for primary words of starting the code in the first two bytes, i.e. the "header" I refer to above, or putting in a bra $+2 as I did. For a secondary word, the first two bytes should contain either the ENT 0x00 (DTC) or the IND ENT (ITC) instruction sequences. In either case, the first word pointer is located at offset 2. That definition contrasts with offset 1, i.e. immediately after the ENT instruction, for DTC implementations for the IP = PC definition you gave above.
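To make the offset difference concrete, here is a small Python sketch of where the first thread cell of a secondary would sit under each ENTER definition. The function names are made up for illustration, and the one-byte ENT length is taken from the unit test above (opcode 0x7B).

```python
# Illustrative only: location of a secondary's first thread cell under
# the two ENTER definitions discussed in this thread.
ENT_OPCODE_LEN = 1   # ENT is a single-byte opcode (0x7B in the unit test)

def first_cell_w_plus_2(word_addr):
    # IP <= W + 2: W holds the word's address, and the thread follows
    # a two-byte code field (ENT plus a pad byte, or IND ENT).
    return word_addr + 2

def first_cell_ip_eq_pc(word_addr):
    # IP <= PC: PC already points just past the ENT opcode.
    return word_addr + ENT_OPCODE_LEN

word = 0x205
print(hex(first_cell_w_plus_2(word)), hex(first_cell_ip_eq_pc(word)))
```

With the W + 2 form the same layout serves both DTC and ITC; the IP = PC form saves the pad byte in DTC.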

I'll post some traces of the other three ENTER instruction sequences, ent.s, ient, and ient.s, when I have some time later today. The execution times of these variations of the ENT instruction are only longer by 1 cycle.

Once again, thanks for your interest and time.

The table previously discussed defining the three critical Forth operations is reproduced below:

             ITC                                     DTC
================================================================================
NEXT:   W      <= (IP++) -- Ld *Code_Fld     ; W      <= (IP++) -- Ld *Code_Fld
        PC     <= (W)    -- Jump Dbl Indirect; PC     <= W      -- Jump Indirect
================================================================================
ENTER: (RSP--) <= IP     -- Push IP on RS    ;(RSP--) <= IP     -- Push IP on RS
        IP     <= W + 2  -- => Param_Fld     ; IP     <= W + 2  -- => Param_Fld
;NEXT
        W      <= (IP++) -- Ld *Code_Fld     ; W      <= (IP++) -- Ld *Code_Fld
        PC     <= (W)    -- Jump Dbl Indirect; PC     <= W      -- Jump Indirect
================================================================================
EXIT:
        IP     <= (++RSP) -- Pop IP frm RS   ; IP     <= (++RSP)-- Pop IP frm RS
;NEXT
        W      <= (IP++) -- Ld *Code_Fld     ; W      <= (IP++) -- Ld *Code_Fld
        PC     <= (W)    -- Jump Dbl Indirect; PC     <= W      -- Jump Indirect
================================================================================

barrym95838 · Post by **barrym95838** » Wed Feb 20, 2019 4:25 pm

Yeah, I suppose that a large part of my confusion revolves around the use of W, because I haven't run across an urgent need to utilize that register at all in any of my DTC adventures so far, so it feels like a fifth wheel to me. I'll take some alone time going back through your posts from here and from 2014, and ponder for a bit to see if I have anything useful to offer. Best wishes.

P.S. It would be undeniably awesome if Dr. Brad stopped in for a minute to offer his $.02 ... what do you think are the chances of that happening?

MichaelM · Post by **MichaelM** » Wed Feb 20, 2019 10:29 pm

Oh. I don't know, but a PM may be enough to get him to pop in for a look see. I'll send him one, and we'll see if he responds. He's always been gracious enough in the past to respond to my inquiries.

MichaelM · Post by **MichaelM** » Thu Feb 21, 2019 12:03 pm

The following instruction trace is for the ient instruction, which is represented as the ind (0x9B) ent (0x7B) instruction sequence in locations 0x205 and 0x206, respectively. Included in the trace are the initial and final register settings and the statistics for the instruction sequence. The bra $+2 instruction (0x80 0x00) in (0x20D, 0x20E) of the primitive has been replaced by a pointer to 0x20F for the indirection. W is left pointing to the "header" of the primitive, i.e. to the pointer to its parameter / code field.

          PC   AC   XR   YR   SP   VM  NVMBDIZC
M65C02A: 0200 0000 017F 0000 01FF 0202 00110000
              0000 4444 0000 01FF 0000 DL YXSIZ
              0000 2222 0000           10 00000

.g 200
   IR: 3B <= mem[0200]
 rdDM: 05 <= mem[0202]
 rdDM: 02 <= mem[0203]
   IR: 9B <= mem[0205]
   IR: 7B <= mem[0206]
 wrDM: 02 => mem[017F]
 wrDM: 04 => mem[017E]
 rdDM: 0D <= mem[0207]
 rdDM: 02 <= mem[0208]
 rdDM: 0F <= mem[020D]
 rdDM: 02 <= mem[020E]

          PC   AC   XR   YR   SP   VM  NVMBDIZC
M65C02A: 020F 0000 017D 0000 01FF 0209 00110000
              0000 4444 0000 01FF 020D DL YXSIZ
              0000 2222 0000           10 00000

.cycles
Total = 11, Num Inst = 2, Pgm Rd = 3, Data Rd = 6, Data Wr = 2, Dummy Cycles = 0
  CPI = 5.50, Avg Inst Len = 1.50
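
As a cross-check on the .cycles summary, the trace lines can be tallied with a few lines of Python. This is a sketch only: the mapping of `IR` and `rdPM` to program reads, `rdDM` to data reads, and `wrDM` to data writes is inferred from the trace format, not taken from py65, and `tally` is a made-up helper.

```python
# Tally bus cycles from a trace in the format shown above.
# Assumption: IR and rdPM are program reads, rdDM data reads, wrDM data writes.
TRACE = """\
   IR: 3B <= mem[0200]
 rdDM: 05 <= mem[0202]
 rdDM: 02 <= mem[0203]
   IR: 9B <= mem[0205]
   IR: 7B <= mem[0206]
 wrDM: 02 => mem[017F]
 wrDM: 04 => mem[017E]
 rdDM: 0D <= mem[0207]
 rdDM: 02 <= mem[0208]
 rdDM: 0F <= mem[020D]
 rdDM: 02 <= mem[020E]
"""

KINDS = {"IR": "pgm_rd", "rdPM": "pgm_rd", "rdDM": "data_rd", "wrDM": "data_wr"}

def tally(trace):
    counts = {"pgm_rd": 0, "data_rd": 0, "data_wr": 0}
    for line in trace.splitlines():
        tag = line.split(":", 1)[0].strip()   # text before the first colon
        if tag in KINDS:
            counts[KINDS[tag]] += 1
    counts["total"] = counts["pgm_rd"] + counts["data_rd"] + counts["data_wr"]
    return counts

result = tally(TRACE)
print(result)
```

With no dummy cycles reported, the three counts should add up to the total shown by .cycles.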

Brad R · Post by **Brad R** » Thu Feb 21, 2019 10:28 pm

barrym95838 wrote:

Yeah, I suppose that a large part of my confusion revolves around the use of W, because I haven't run across an urgent need to utilize that register at all in any of my DTC adventures so far, so it feels like a fifth wheel to me.

I'm still reading this thread -- and I'd better go back and read Michael's 2014 thread -- but I can address this point. Generally speaking, W is necessary on processors that can't do double indirection. A few CPUs, like the PDP-11 and 6809, don't need a W register (especially when doing DTC). Likewise with processors that have dedicated threaded-language instructions, like the Super8.

However, if you're going to use a W register -- even as an "internal" (not programmer-visible) register in your custom CPU implementation -- you can take advantage of the fact that the W register will contain the address (plus or minus a constant) of the word you are executing. This can get you the PFA of the word without having to pop it off the return stack.

More later; I have to finish reading this thread and the earlier thread.

MichaelM · Post by **MichaelM** » Fri Feb 22, 2019 1:26 am

Brad, thank you for joining us. Certainly appreciate you taking that time to come look over this and the previous thread.

For you and others, I think that I need to clarify the register display shown above.

The main registers, A, X, and Y, are implemented as three register stacks. The top line is the top-of-stack (TOS). Loading and storing these registers does not automatically push or pop the register stack. This is done so that programming the processor as a stock 6502/65C02 can be done transparently. Manipulating the stack is done by using three instructions: dup, swp, and rot. To push the stack, dup is used before the 6502/65C02 load instruction. To pop the stack, rot is used after the 6502/65C02 store instruction. The TOS and Next-On-Stack (NOS) registers can be swapped using the swp instruction. In much of the code generated by Mak Pascal, I use the swp instruction instead of the dup and rot instructions. The rot instruction performs the following operation: TOS <= NOS <= BOS <= TOS.
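
The push and pop idioms described above can be sketched in Python with a tuple `(tos, nos, bos)`. The `dup` and `swp` semantics are taken from the opcode table later in this thread; the `load` helper and the sample values are illustrative assumptions.

```python
# Sketch of the three-deep register stack; tuples are (tos, nos, bos).
def dup(s):
    tos, nos, bos = s
    return (tos, tos, nos)      # old bos is lost

def swp(s):
    tos, nos, bos = s
    return (nos, tos, bos)

def rot(s):
    tos, nos, bos = s
    return (nos, bos, tos)      # TOS <= NOS <= BOS <= TOS

def load(s, value):
    # a plain 6502/65C02 load only overwrites tos
    return (value,) + s[1:]

a = (0x11, 0x22, 0x33)
pushed = load(dup(a), 0x44)     # dup, then load: push 0x44
popped = rot(pushed)            # rot after a store: old tos moves to bos
print(pushed, popped)
```

Note that the pop is a rotation, so the stored value is still present in BOS afterwards.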

The M65C02A supports three hardware stack pointers. The default stack for Forth VM push / pull operations is the auxiliary stack implemented by the TOS register of the X register stack. The OSX prefix instruction changes the default stack used by push / pull / jsr / rts / rti instructions. The default stack address for push / pull operations other than the Forth VM ent, phi/phw, and pli/plw instructions is the system stack pointer. There are two system stack pointers: Sk, and Su. Sk is the kernel mode stack pointer, and Su is the user mode stack pointer. The operating mode defaults to kernel mode and can only be changed by an rti instruction executed in kernel mode; interrupts automatically switch to kernel mode.

In the register display, the top row shows the TOS of the three main registers, the middle row shows the NOS registers, and the third row shows the BOS registers. Under the VM label, the first row is the IP register and the second row is the W register. The PSW is broken down by bit, with the commonly accepted label for each bit. The M bit, bit 5, is the processor mode: 1 - Kernel, 0 - User. Under the SP label is the kernel mode stack pointer on the first row, and the user mode stack pointer on the second row. Su is accessible from kernel mode, but Sk is not accessible in user mode. Below the PSW, on the third row, are the various prefix instruction flags. Two labeled bits, D and L, are not prefix instruction flags. L is an override for the OSX prefix instruction flag for certain instructions and is set by the microcode. D is a py65 monitor flag used to enable the instruction traces shown above.

I hope this little description helps the readers. A more complete description is provided in the documentation for the core. Please note that I've not kept documentation current, and I've made substantial changes to the core since I first wrote that documentation. Finally, I am working with py65 to form a model by which I expect to generate test vectors for the HDL core. At this time, the HDL core is not in a running state.

Brad R · Post by **Brad R** » Fri Feb 22, 2019 9:24 am

Michael, thanks for the clarification of the register display. That does help. For what it's worth, your "rot" will be known to most Forth programmers as "-rot" (rotate backwards), since the Forth ROT does TOS <= 3OS <= NOS <= TOS (where 3OS is "third on stack"). Yours looks more like the "roll down" on my HP calculator.
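
The naming point can be checked with a short Python sketch. The Forth stack is a list with the top of stack last; the function names are made up for this example, and the M65C02A behaviour is the TOS <= NOS <= BOS <= TOS rotation quoted above.

```python
# Forth-style stack as a list, top of stack at the end.
def forth_rot(st):          # ( a b c -- b c a )
    a, b, c = st[-3:]
    return st[:-3] + [b, c, a]

def forth_minus_rot(st):    # ( a b c -- c a b )
    a, b, c = st[-3:]
    return st[:-3] + [c, a, b]

def m65c02a_rot(st):
    # {tos, nos, bos} <= {nos, bos, tos}, with st = [bos, nos, tos]
    bos, nos, tos = st
    return [tos, bos, nos]

stack = [1, 2, 3]           # 3 is the top of stack
print(forth_rot(stack), forth_minus_rot(stack), m65c02a_rot(stack))
```

On a three-deep stack the M65C02A rot and the Forth -rot give the same result.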

I'm a bit handicapped reading your test code, because I don't know what opcodes you have assigned. Here's my paraphrase of what I think your test code contains, with [ xxx ] referring to a 16-bit cell and [ xxx / xxx ] referring to the two bytes of a 16-bit cell:

0x200: [ PLI NXT / 0x00 ]
... opcode 0x3B is PLI NXT, yes?

0x202: [ 0x205 ] [ 0x00 / --- ]
... seems to be a partial thread with only one word address (0x205); I don't know the purpose of the "header #2 (partial)"

0x205: [ ENT / 0x00 ] [ 0x20d ] [ 0x000 ] [ 0x000 ]
... seems to be the beginning of a secondary definition containing only one word address (0x20d)

0x20d: [ bra $+2 ] [ machine code ... ]

Strictly speaking, in DTC the "bra $+2" is not necessary, but I saw your comment in the other topic about using it as a place holder.

To start the simulation, you appear to be putting 0x202 in IP and then doing the action of NXT. If I'm reading your trace correctly, your machine reads the contents of 0x202 into W, making W = 0x205 (the address of a Forth word, in this case a secondary word), then copies W to PC and reads an opcode at 0x205. That opcode 0x7B is ENT, which pushes the IP (currently 0x204), sets the IP = W+2 = 0x207, and then does the action of NXT, which reads the contents of 0x207 into W, making W = 0x20D, then copies W to PC and reads the opcode at 0x20D and branch offset at 0x20E.

All of that looks correct. Arguably the W register is superfluous for DTC, but if you're implementing a dual ITC/DTC instruction set I can see why you'd want to keep W. I presume at the end of your secondary thread there will be an address 0x200, which will cause the primitive at 0x200 to be executed (the Forth word EXIT), which will do the PLI NXT, which will cause interpretation to continue at 0x204.

ITC is essentially the same except that instead of copying W to PC, you fetch from memory @W to the PC. At 0x20d you now have

0x20d: [ 0x20f ] [ machine code ... ]

which is what I would expect, but I don't see how the initial NXT works (when you have IP = 0x202). The IENT trace appears to read the contents of 0x202 into W, making W = 0x205 (ok so far), but then it reads opcodes at 0x205 and 0x206, rather than fetching from memory @W into the PC. For an ITC implementation I would expect the word at 0x205 to be something like

0x205: [ 0x207 ] [ IND / ENT ] [ 0x20d ] [ 0x000 ]

and IND/ENT must be modified to set IP = W+4 rather than W+2. (In ITC, the first cell of a Forth word is the address of machine code.)

As an aside, if ENT is going to assume that the W register contains the address of the word (as your table of critical Forth operations suggests), you need to handle the Forth word EXECUTE properly. EXECUTE performs a Forth word whose address is given on the parameter stack, not taken from a thread. You will need the ability to copy that address to the W register, and then either jump to that word (DTC) or fetch the PC from that address (ITC).

             ITC                                     DTC
================================================================================
EXECUTE: W     <= (PSP++) -- Ld *Code_Fld    ; W      <= (PSP++) -- Ld *Code_Fld
        PC     <= (W)    -- Jump Dbl Indirect; PC     <= W      -- Jump Indirect

If W is a "hidden" (internal) register, you'll need an instruction to set it to a given value. (Or just an EXECUTE machine instruction.)
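
A rough Python sketch of the EXECUTE table above, for both threading models. The parameter stack is modelled as a plain Python list (top of stack last) to sidestep stack-pointer direction, and `execute` and `rd16` are illustrative names, not M65C02A instructions.

```python
# Sketch of EXECUTE for both threading models; all names are illustrative.
def rd16(mem, addr):
    # little-endian 16-bit read from a dict of bytes
    return mem.get(addr, 0) | (mem.get(addr + 1, 0) << 8)

def execute(mem, pstack, itc):
    """Pop a word address from the parameter stack into W and return (w, pc)."""
    w = pstack.pop()                   # W <= (PSP++)
    pc = rd16(mem, w) if itc else w    # ITC: PC <= (W); DTC: PC <= W
    return w, pc

mem = {0x20D: 0x0F, 0x20E: 0x02}       # ITC code field at 0x20D -> code at 0x20F
itc_result = execute(mem, [0x20D], itc=True)
dtc_result = execute(mem, [0x20D], itc=False)
print(itc_result, dtc_result)
```

Either way W ends up holding the word's address, which is what ENT relies on.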

Brad R · Post by **Brad R** » Fri Feb 22, 2019 9:45 am

Ack. Posting at 4 am, I'm not at my best.

Quote:

For an ITC implementation I would expect the word at 0x205 to be something like

0x205: [ 0x207 ] [ IND / ENT ] [ 0x20d ] [ 0x000 ]

and IND/ENT must be modified to set IP = W+4 rather than W+2. (In ITC, the first cell of a Forth word is the address of machine code.)

A better solution would be for the word at 0x205 to be something like

0x205: [ 0x299 ] [ 0x20d ] [ 0x000 ] [ 0x000 ]
...
0x299: [ IND / ENT ]

and IND/ENT can set IP = W+2. There's nothing magical about the address 0x299; I just wanted to signify an address well removed from, and not connected to, the 0x205 word. All secondary words can share the single appearance of IND/ENT at 0x299.
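
Here is a minimal Python sketch of that layout, walking one inxt and one ient. The helper names and the state dictionary are made up for illustration, and the return-stack byte order (high byte first, pointer post-decremented from 0x17F) is an assumption borrowed from Michael's traces.

```python
# Sketch of ITC inxt / ient on the layout suggested above; illustrative only.
def rd16(mem, addr):
    # little-endian 16-bit read from a dict of bytes
    return mem.get(addr, 0) | (mem.get(addr + 1, 0) << 8)

def wr16(mem, addr, value):
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF

def inxt(mem, st):
    # w <= (ip++); pc <= (w)
    st["w"] = rd16(mem, st["ip"])
    st["ip"] += 2
    st["pc"] = rd16(mem, st["w"])

def ient(mem, st):
    # (RSP--) <= ip, high byte first (assumed); ip <= w + 2; then inxt
    mem[st["rsp"]] = st["ip"] >> 8
    mem[st["rsp"] - 1] = st["ip"] & 0xFF
    st["rsp"] -= 2
    st["ip"] = st["w"] + 2
    inxt(mem, st)

mem = {0x299: 0x9B, 0x29A: 0x7B}    # the single shared IND / ENT
wr16(mem, 0x202, 0x205)             # calling thread: cell -> secondary at 0x205
wr16(mem, 0x205, 0x299)             # secondary's code field -> shared ENTER
wr16(mem, 0x207, 0x20D)             # first thread cell -> primitive at 0x20D
wr16(mem, 0x20D, 0x20F)             # primitive's code field -> machine code

st = {"ip": 0x202, "w": 0, "pc": 0, "rsp": 0x17F}
inxt(mem, st)
entry_pc = st["pc"]                 # where the CPU would now fetch IND / ENT
ient(mem, st)
print(hex(entry_pc), {k: hex(v) for k, v in st.items()})
```

Because ient uses W rather than PC to find the thread, the code at 0x299 needs no knowledge of which secondary invoked it.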

MichaelM · Post by **MichaelM** » Fri Feb 22, 2019 11:47 pm

Brad:

Thanks for your time.

Understand your comment about the ient test code. Will modify the unit accordingly. Thanks.

On the ent, I started the test with a nxt at 0x200 instead of a pli nxt instruction sequence in 0x200, 0x201. I just set up ip to a value of 0x202, which is in the parameter field of some arbitrary secondary word. I apologize for any confusion my particular set up may have caused.

Quote:

I don't know the purpose of the "header #2 (partial)"

By the notation "header # 2 (partial)", I intended to signify that the list of pointers of the secondary Forth word was truncated for test purposes.

Quote:

I presume at the end of your secondary thread there will be an address 0x200, which will cause the primitive at 0x200 to be executed (the Forth word EXIT)

Well this comment clarifies for me how to terminate secondary Forth words: pointer to EXIT. Exit is the pli nxt code sequence, of which only the nxt opcode is included at address 0x200 for the unit test. I expect all primitives to include the EXIT instruction sequence.

So in summary, (1) DTC / ITC Forth will have a common EXIT sequence for secondaries reached by nxt using the last pointer in the parameter field; (2) DTC primitives will be terminated by the EXIT code sequence; (3) ITC Forth will have a common ENTER for secondaries reached by nxt; the "header" or codefield of DTC Forth is unnecessary if W is not used, or if W and not W + 2 is used for the parameter field address.

The following is a short table of the opcodes representing the register stack instructions, the Forth VM instructions, the prefix instructions, and the ip-relative instructions:

opcode mnemonic     description
 0x0B    dup        {tos, nos, bos} <= {tos, tos, nos}
 0x1B    swp        {tos, nos, bos} <= {nos, tos, bos}
 0x2B    rot        {tos, nos, bos} <= {nos, bos, tos}
 0x3B    nxt        w <= (ip++); pc <= w
 0x4B    phi        (RSP--) <= ip   -- x = default RSP
 0x5B    ini        ip += 1
 0x6B    pli        ip <= (++RSP)   -- x = default RSP
 0x7B    ent        (RSP--) <= ip; ip <= w + 2; w <= (ip++); pc <= w
 0x8B    osx        change default stack, override x with s
 0x9B    ind        add indirection to address mode, enable alternate function
 0xAB    siz        convert alu operation to 16 bits, enable alternate function
 0xBB    isz        combined ind and siz
 0xCB    osz        combined osx and siz
 0xDB    ois        combined osx, ind, and siz
 0xEB    oax        override / exchange a and x
 0xFB    oay        override / exchange a and y
 
 Alternate functions:
 
 0x9B 0x3B  inxt    w <= (ip++); pc <= (w)
 0x9B 0x4B  phw     (RSP--) <= w
 0x9B 0x5B  inw     w += 1
 0x9B 0x6B  plw     w <= (++RSP)
 0x9B 0x7B  ient    (RSP--) <= ip; ip <= w + 2; w  <= (ip++); pc <= (w)
 
 0x9B 0x0B  tai     ip <= a
 0xAB 0x0B  tia     a <= ip
 0xBB 0x0B  xia     a <> ip     -- exchange a and ip
 
 0x9B 0x1B  swb     a <= {a[7:0], a[15:8]}  -- swap bytes
 0x9B 0x2B  rev     a[15:0] <= a[0..15]     -- reverse bit order
 
 ip-relative instructions
 
 0x03    ora 0,I++  a |= (ip++)
 0x13    asl 0,I++  arithmetic shift left, 0 shifted into lsb
 0x23    and 0,I++  a &= (ip++)
 0x33    rol 0,I++  rotate left through c, c rotated into lsb
 0x43    xor 0,I++  a ^= (ip++)
 0x53    lsr 0,I++  logical shift right into c, 0 shifted into msb
 0x63    adc 0,I++  a <= a + (ip++) + c
 0x73    ror 0,I++  rotate right through c, c rotated into msb
 0x83    sta 0,I++  (ip++) <= a
 0x93    tsb 0,I++  test and set bit indirectly addressed by ip
 0xA3    lda 0,I++  a <= (ip++)
 0xB3    trb 0,I++  test and reset bit indirectly addressed by ip
 0xC3    cmp 0,I++  compare a with byte indirectly addressed by ip
 0xD3    dec 0,I++  decrement byte indirectly addressed by ip
 0xE3    sbc 0,I++  a <= a + ~(ip++) + c
 0xF3    inc 0,I++  increment byte indirectly addressed by ip
 
 support ind, siz, isz, oax, and oay prefix instructions
 
 prefix instructions are not interruptible by either NMI or IRQ.

MichaelM · Post by **MichaelM** » Sat Feb 23, 2019 3:45 pm

Instruction trace for ient per Brad's suggestions. I modified the test to use ITC exit, next, and enter as common instruction sequences. The trace provided below places ITC Exit, pli ind nxt, in locations 0x200-0x202, and ITC Enter, ind ent, in locations 0x203-0x204. A return into a secondary ITC Forth word is simulated by putting a pointer to 0x205 on the RS.

The trace shows pli being executed and pulling the ip from the stack, followed by the ITC Next which performs two indirect reads, and the ITC Enter which pushes ip and then does two indirect reads, and transfers control to the code located at 0x20F, a brk instruction for the purpose of the test.

This sequence of three instructions requires 17 cycles: pli - 3, inxt - 6, ient - 8.

          PC   AC   XR   YR   SP   VM  NVMBDIZC
M65C02A: 0000 0000 017D 0000 01FF 0000 00110010
              0000 0000 0000 01FF 0000 DL YXSIZ
              0000 0000 0000           10 00000

.g 200
   IR: 6B <= mem[0200]
 rdDM: 05 <= mem[017E]
 rdDM: 02 <= mem[017F]
   IR: 9B <= mem[0201]
   IR: 3B <= mem[0202]
 rdDM: 07 <= mem[0205]
 rdDM: 02 <= mem[0206]
 rdDM: 03 <= mem[0207]
 rdDM: 02 <= mem[0208]
   IR: 9B <= mem[0203]
   IR: 7B <= mem[0204]
 wrDM: 02 => mem[017F]
 wrDM: 07 => mem[017E]
 rdDM: 0D <= mem[0209]
 rdDM: 02 <= mem[020A]
 rdDM: 0F <= mem[020D]
 rdDM: 02 <= mem[020E]

          PC   AC   XR   YR   SP   VM  NVMBDIZC
M65C02A: 020F 0000 017D 0000 01FF 020B 00110010
              0000 0000 0000 01FF 020D DL YXSIZ
              0000 0000 0000           10 00000

.cycles
Total = 17, Num Inst = 3, Pgm Rd = 5, Data Rd = 10, Data Wr = 2, Dummy Cycles = 0
  CPI = 5.67, Avg Inst Len = 1.67

The following instruction trace highlights the ient.s instruction, which uses S instead of X as the stack onto which ip is pushed. A pli.s instruction can also be used, but in this instruction trace, the default stack pointer, X, was used for the pli.

          PC   AC   XR   YR   SP   VM  NVMBDIZC
M65C02A: 020F 0000 017D 0000 01FF 020B 00110010
              0000 0000 0000 01FF 020D DL YXSIZ
              0000 0000 0000           10 00000

.g 200
   IR: 6B <= mem[0200]
 rdDM: 05 <= mem[017E]
 rdDM: 02 <= mem[017F]
   IR: 9B <= mem[0201]
   IR: 3B <= mem[0202]
 rdDM: 07 <= mem[0205]
 rdDM: 02 <= mem[0206]
 rdDM: 03 <= mem[0207]
 rdDM: 02 <= mem[0208]
   IR: DB <= mem[0203]
   IR: 7B <= mem[0204]
 wrDM: 02 => mem[01FF]
 wrDM: 07 => mem[01FE]
 rdDM: 0D <= mem[0209]
 rdDM: 02 <= mem[020A]
 rdDM: 0F <= mem[020D]
 rdDM: 02 <= mem[020E]

          PC   AC   XR   YR   SP   VM  NVMBDIZC
M65C02A: 020F 0000 017F 0000 01FD 020B 00110010
              0000 0000 0000 01FF 020D DL YXSIZ
              0000 0000 0000           10 00000

.cycles
Total = 17, Num Inst = 3, Pgm Rd = 5, Data Rd = 10, Data Wr = 2, Dummy Cycles = 0
  CPI = 5.67, Avg Inst Len = 1.67
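
The only difference between the two traces is which pointer the ip push goes through. A small Python sketch of that selection follows; `push_ip` and the register dictionary are hypothetical names, and the byte order (high byte first, then a post-decrement past both bytes) is read off the wrDM lines above.

```python
# Sketch of the stack selection for the ip push in ient / ient.s.
def push_ip(mem, regs, ip, use_s=False):
    sp = "s" if use_s else "x"          # ient.s overrides X with S
    mem[regs[sp]] = (ip >> 8) & 0xFF    # high byte first
    mem[regs[sp] - 1] = ip & 0xFF       # then low byte
    regs[sp] -= 2                       # pointer moves past both bytes

# State after pli and inxt in both traces: X = 0x17F, S = 0x1FF, ip = 0x207.
mem_x, regs_x = {}, {"x": 0x17F, "s": 0x1FF}
push_ip(mem_x, regs_x, 0x207)                 # ient

mem_s, regs_s = {}, {"x": 0x17F, "s": 0x1FF}
push_ip(mem_s, regs_s, 0x207, use_s=True)     # ient.s
print(regs_x, regs_s)
```

This matches the final register displays: X ends at 0x17D for ient, while for ient.s X stays at 0x17F and the kernel stack pointer drops to 0x1FD.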

Brad R · Post by **Brad R** » Sun Feb 24, 2019 5:40 am

MichaelM wrote:

On the ent, I started the test with a nxt at 0x200 instead of a pli nxt instruction sequence in 0x200, 0x201. I just set up ip to a value of 0x202, which is in the parameter field of some arbitrary secondary word. I apologize for any confusion my particular set up may have caused.

Oops, yes, I should have figured that out. A PLI would have had no function (actually would have been a problem) there. I was going by your comment that said (pli nxt).

Quote:

So in summary, (1) DTC / ITC Forth will have a common EXIT sequence for secondaries reached by nxt using the last pointer in the parameter field; (2) DTC primitives will be terminated by the EXIT code sequence; (3) ITC Forth will have a common ENTER for secondaries reached by nxt; the "header" or codefield of DTC Forth is unnecessary if W is not used, or if W and not W + 2 is used for the parameter field address.

Well, not exactly.
(1) Yes. Both DTC and ITC Forth have a common EXIT word, which is a primitive that does PLI NXT. Secondaries are terminated with the address of this word.
(2) DTC (and ITC) primitives are terminated by NXT, not EXIT.
(3) Whether or not W is used, a DTC primitive does not require a "header" or codefield. Primitives by definition do not have a parameter field*, and so don't need the value in W. Only secondaries and "defined words" (more on this in a moment) need the parameter field address.

* One could argue that the "parameter field" of a primitive is its machine code. But that doesn't change the fact that a primitive doesn't need to know the address of its own parameter field (its own machine code).

In my previous reply I mentioned that you need to be able to set W as part of EXECUTE. You also need the ability to read W, in order to implement "defined words" like CONSTANT and VARIABLE (and anything made with CREATE..DOES>). E.g., the code field for a constant will point to** machine code that takes the parameter field address (W or W+2, depending on implementation), fetches the contents of that address (the constant's value), and pushes that on the stack. You have this to a limited extent with PHW and PLW, but these will require additional machine instructions to get the desired result, and you might want to add some W instructions "optimized" for the most common defined words.

** For ITC, the code field contains the address of the machine code. For DTC, it commonly contains a subroutine call to the machine code, unless you want to create dedicated machine instructions for the different classes of defined words.
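
As a rough illustration of why a defined word needs to read W, here is a Python sketch of the run-time actions of CONSTANT and VARIABLE. The word address 0x300, the helper names, and the two-byte code field (so PFA = W + 2) are assumptions made for the example; the parameter stack is a plain list with the top of stack last.

```python
# Sketch of the run-time action of two defined words; names are illustrative.
CODE_FIELD_LEN = 2   # assumption: PFA = W + 2

def rd16(mem, addr):
    # little-endian 16-bit read from a dict of bytes
    return mem.get(addr, 0) | (mem.get(addr + 1, 0) << 8)

def do_constant(mem, w, pstack):
    # fetch the cell stored in the parameter field and push it
    pstack.append(rd16(mem, w + CODE_FIELD_LEN))

def do_variable(mem, w, pstack):
    # push the parameter field address itself
    pstack.append(w + CODE_FIELD_LEN)

mem = {0x302: 0x34, 0x303: 0x12}   # hypothetical word at 0x300, value 0x1234
pstack = []
do_constant(mem, 0x300, pstack)
do_variable(mem, 0x300, pstack)
print([hex(v) for v in pstack])
```

Both actions start from W, which is why an instruction that can read W (or W plus a constant) shortens them.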

Thanks for the opcode table; that will be helpful.

I'll look at your new traces next.

Brad R · Post by **Brad R** » Sun Feb 24, 2019 7:35 am

Okay, let me see if I understand your first trace.

PLI at 0200. That fetches the value 0205 into IP.
INXT at 0201-0202. That fetches location 0205, value 0207, into W. Then fetches location 0207, value 0203, into PC.
IENT at 0203-0204. That pushes 0207 (?) onto return stack, then I assume sets IP to 0209 (W+2). Then reads location 0209 -- the first address in a secondary definition -- value 020D, into W. Then reads location 020D, value 020F, into PC.
Primitive machine code at 020F, I assume.

(?): I had a bit of confusion because the IENT pushes 0207 onto the return stack, and 0207 is the address of your secondary word. But that's just because you have an incomplete thread at 0205, and IP was left with the value 0207.

So that much looks ok.

MichaelM · Post by **MichaelM** » Sun Feb 24, 2019 12:22 pm

Brad:

Thanks again for your time. Your interpretation of the instruction trace for ient matches what I thought was required.

Quote:

Well, not exactly.
(1) Yes. Both DTC and ITC Forth have a common EXIT word, which is a primitive that does PLI NXT. Secondaries are terminated with the address of this word.
(2) DTC (and ITC) primitives are terminated by NXT, not EXIT.
(3) Whether or not W is used, a DTC primitive does not require a "header" or codefield. Primitives by definition do not have a parameter field*, and so don't need the value in W. Only secondaries and "defined words" (more on this in a moment) need the parameter field address.

I don't know how I had gotten the idea that all Forth Words ended in Exit, but your statement clarifies that for me. I was having a bit of confusion looking at the figForth implementation for the 6502 that started me down this path of including support for Forth VMs in the M65C02A core.

Quote:

In my previous reply I mentioned that you need to be able to set W as part of EXECUTE. You also need the ability to read W, in order to implement "defined words" like CONSTANT and VARIABLE (and anything made with CREATE..DOES>). E.g., the code field for a constant will point to** machine code that takes the parameter field address (W or W+2, depending on implementation), fetches the contents of that address (the constant's value), and pushes that on the stack. You have this to a limited extent with PHW and PLW, but these will require additional machine instructions to get the desired result, and you might want to add some W instructions "optimized" for the most common defined words.

I'll have to take a deeper look into CONSTANT and VARIABLE. I did look into EXECUTE, at least in figForth, which is the only Forth implementation I currently have that I can inspect. For the figForth EXECUTE, I think that there is a way to carry out the operation without requiring improved access to the internal W register, but I will need to do further analysis of the other "defined words" and the behaviors of the CREATE..DOES> words.

From figForth, Execute uses the following code, where x is the PSP for the PS in page 0:

Code: Select all

EXEC            .wrd  $+2               
                lda   0,x               
                sta   W                 
                lda   1,x               
                sta   W+1               
                inx                     
                inx                     
                jmp   W-1               ;to JMP (W) in z-page

Since I think that S will be the PSP, the following code should provide the same functionality. Since I can't at the moment see how to support the same behavior for a DTC Forth implementation, your suggestion of additional instructions using W has high merit.

Code: Select all

EXEC            .wrd  $+2               
                pla.w            ; siz pla
                jmp (0,A)        ; oax jmp (abs,X)
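
The difference between the two EXEC listings can be seen in a small model. This is only a sketch in Python under stated assumptions: the figForth version is modeled as "pop the execution token into W, then jump through W", and the proposed version as "pop into A, then jump through the address in A", leaving W untouched. The function names and the test addresses are hypothetical.

```python
def read16(mem, addr):
    """Read a little-endian 16-bit word."""
    return mem[addr] | (mem[addr + 1] << 8)


def execute_figforth(mem, ps, w):
    """Model of the figForth listing: W receives the popped token, then JMP (W)."""
    w = ps.pop()
    pc = read16(mem, w)
    return pc, w


def execute_proposed(mem, ps, w):
    """Model of pla.w / jmp (0,A): same jump target, but W keeps its old value."""
    a = ps.pop()
    pc = read16(mem, a)
    return pc, w


mem = bytearray(0x10000)
mem[0x0300], mem[0x0301] = 0x00, 0x04  # code field at 0300 holds 0400

old_w = 0x1234  # arbitrary stale value in W before EXECUTE
fig_result = execute_figforth(mem, [0x0300], old_w)
new_result = execute_proposed(mem, [0x0300], old_w)
print([hex(x) for x in fig_result], [hex(x) for x in new_result])
```

If the model is right, both versions reach the same machine code, but only the figForth form leaves W pointing at the executed word, which matters for words whose code needs the parameter field address.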

I thought that the IP-relative instructions might provide the means by which CONSTANT and VARIABLE could be implemented using pointers embedded in the threads. For example, if a pointer to the constant (or the constant itself) follows the pointer to CONSTANT in the thread, then the following instruction sequence would put the value of the constant onto the parameter stack:

Code: Select all

CONST           .wrd  $+2               
                lda.w  (0,I++)  ; isz lda 0,I++
                pha.w           ; siz pha
                inxt            ; ind nxt

(If the constant is embedded directly in the thread, then simply leave off the ind prefix for the lda 0,I++ base instruction, and if the constant is a byte instead of a word, leave off the siz prefix as well. When a byte is loaded, the upper 8 bits are loaded with 0x00.)
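
The two variants described above (pointer in the thread versus constant in the thread) can be sketched as follows. This is a rough Python model, assuming that the word-sized `0,I++` access advances IP by 2 and that the ind prefix adds one more level of indirection; `const_from_thread` and the addresses used are hypothetical.

```python
def read16(mem, addr):
    """Read a little-endian 16-bit word."""
    return mem[addr] | (mem[addr + 1] << 8)


def const_from_thread(mem, ip, ps, indirect=True):
    """Model of lda.w 0,I++ (optionally with ind) followed by pha.w.

    Returns the updated IP; the fetched value is pushed onto ps.
    """
    operand = read16(mem, ip)  # the cell following the pointer to CONSTANT
    value = read16(mem, operand) if indirect else operand
    ps.append(value)           # pha.w: push the 16-bit value
    return (ip + 2) & 0xFFFF   # post-increment of IP past the cell (assumed +2)


mem = bytearray(0x10000)
mem[0x0300], mem[0x0301] = 0x00, 0x04  # thread cell at 0300 holds 0400
mem[0x0400], mem[0x0401] = 0x2A, 0x00  # the constant's value, 002A, stored at 0400

ps = []
ip = const_from_thread(mem, 0x0300, ps, indirect=True)
print(hex(ip), [hex(v) for v in ps])
```

With `indirect=False` the same call pushes the thread cell itself (0400 here), which corresponds to the constant being embedded directly in the thread.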

Once again, thanks for your time. It's certainly appreciated.
