 Post subject: M65C02A Forth VM Support
Posted: Tue Aug 26, 2014 5:31 pm

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
This post describes some planned enhancements to the M65C02A core. These enhancements, beyond those already implemented, are intended to enable the efficient implementation of a Forth VM on the M65C02A core. The overarching goal is to maintain binary compatibility with 6502/65C02 processors such as the W65C02S, unlike the R65C19/29/39 microprocessors. That is, no fundamental changes to the standard instructions will be considered. I think that it is important to be able to execute any existing code that does not take advantage of, or depend in any way on, the behavior of the unimplemented instructions. All M65C02A enhancements will be implemented using the unimplemented opcodes of the W65C02S microprocessor.

First, I have begun adding the address mode modification prefix instruction, IND, to the core. It is expected to allow the programmer to specify indirect addressing for those instructions not already supporting indirect addressing. For example, instructions such as TSB zp, TRB zp, BIT zp, INC zp, etc., if preceded by the IND prefix instruction, would be converted to TSB (zp), TRB (zp), BIT (zp), INC (zp), etc. Application of the IND prefix instruction to the absolute address mode of these instructions would generate TSB (abs), TRB (abs), etc. A fairly simple modification to the microprogram controller (MPC) or microprogram sequencer (MPS) allows the address mode modification feature to be easily added to the M65C02A instruction set.

Second, I also have plans to support a SIZ prefix instruction which would convert an operation from 8 bits to 16 bits. Unlike the IND prefix instruction, supporting both 8-bit and 16-bit operations will require some changes to the registers and the addition of some logic to the ALU so that the 8-bit ALU can perform cascaded operations when performing 16-bit operations. Rather than extending the width of the ALU, since the external bus width is not changing, the plan is to perform two cascaded 8-bit operations when the SIZ prefix precedes an ALU instruction. At this time, SIZ is expected to be allowed for the following operations: ORA/AND/EOR/ADC/STA/LDA/CMP/SBC/INC/DEC/ASL/ROL/LSR/ROR. I am also thinking about extending X and Y to 16 bits. If that happens, then SIZ would be allowed for the X and Y specific instructions as well.
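
A minimal sketch of the cascading idea, written in Python rather than as a description of the core's actual logic: a 16-bit add is built from two passes through an 8-bit adder, with the carry out of the low byte feeding the high byte. The function names `adc8` and `adc16_cascaded` are hypothetical and exist only for this illustration.

```python
def adc8(a, b, carry_in):
    """One 8-bit add-with-carry step, as an 8-bit ALU would perform it."""
    total = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)
    return total & 0xFF, total >> 8          # (result byte, carry out)


def adc16_cascaded(a, b, carry_in=0):
    """16-bit add built from two cascaded 8-bit adds, low byte first."""
    lo, carry = adc8(a & 0xFF, b & 0xFF, carry_in)
    hi, carry = adc8((a >> 8) & 0xFF, (b >> 8) & 0xFF, carry)
    return (hi << 8) | lo, carry             # (16-bit result, final carry)


result, carry = adc16_cascaded(0x12FF, 0x0001)
print(hex(result), carry)
```

The same two-pass pattern would apply to the other SIZ-capable operations, with the logical operations simply ignoring the carry between passes.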

Third, I am also considering how to change the destination register for some instructions: ORA/AND/EOR/ADC/STA/LDA/CMP/SBC/INC/DEC/ASL/ROL/LSR/ROR. Specifically, I am looking into the possibility of allowing these instructions to be applied directly to X and Y. As part of these ruminations, I am looking at how to implement X and Y as stack pointers so that the stack operations can be performed using S, X, and Y.

I think that I've worked out how to implement a single instruction Next, NXT, that will support a direct threaded Forth. Without implementing any new internal registers, I think that I can implement a DTC Next using the currently available registers in the M65C02A ALU and Address Generator. The Interpretive Pointer (IP) would be specified by any pair of page zero locations.

As I've worked it out, NXT zp, would perform the DTC operation JMP (IP++) in approximately 8 clock cycles. I've looked around, and I have not been able to determine if the cycle count for Next is reasonable. Does anyone have a DTC Next implemented for the 65C02 that they'd be willing to share?
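
For readers unfamiliar with the operation, here is a small Python model of what JMP (IP++) does when IP lives in a pair of page zero locations. It is only a sketch under stated assumptions: the pointer is stored little-endian, it is incremented by 2 after the fetch, and the name `nxt_dtc` and the addresses in the example are made up for illustration.

```python
def nxt_dtc(mem, ip_zp):
    """Model of NXT zp: fetch the 16-bit address at IP, advance IP by 2, return the new PC."""
    ip = mem[ip_zp] | (mem[ip_zp + 1] << 8)           # read IP from page zero
    pc = mem[ip] | (mem[(ip + 1) & 0xFFFF] << 8)      # fetch the jump target stored at IP
    ip = (ip + 2) & 0xFFFF                            # post-increment past the 2-byte cell
    mem[ip_zp] = ip & 0xFF                            # write the updated IP back
    mem[ip_zp + 1] = ip >> 8
    return pc


mem = bytearray(0x10000)
mem[0x10], mem[0x11] = 0x00, 0x20        # IP = $2000, held at $10/$11 (hypothetical location)
mem[0x2000], mem[0x2001] = 0x34, 0x12    # the thread's first cell holds $1234
print(hex(nxt_dtc(mem, 0x10)))
```

The reads and writes of the page zero pointer in this model are the memory traffic that an internal IP register would avoid.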

_________________
Michael A.


Posted: Tue Aug 26, 2014 6:49 pm

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
MichaelM wrote:
... As I've worked it out, NXT zp, would perform the DTC operation JMP (IP++) in approximately 8 clock cycles. I've looked around, and I have not been able to determine if the cycle count for Next is reasonable ...

That seems like an efficient implementation, especially if the zp pointer is double-incremented. If I'm not mistaken, ITC could also use it as the last instruction of its two- or three-instruction NEXT, with a JMP (W++). I don't have them all in front of me, but I think that Jeff's hardware-assisted ITC NEXT on his KimKlone is the only "real" one around with 6502 flavor that could give yours a run for its money (1 byte, 9 cycles), and his is not as flexible.

Mike


Posted: Wed Aug 27, 2014 2:55 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
MichaelM wrote:
Application of the IND prefix instruction to the absolute address mode of [instructions such as TSB zp, TRB zp, BIT zp, INC zp, etc] would generate TSB (abs), TRB (abs), etc.
These are intriguing developments, Michael. If the IND prefix instruction is applied to the absolute address mode of ORA/AND/EOR/ADC/STA/LDA/CMP/SBC, would those likewise generate absolute indirect? The notion of being able to LDA and STA (for example) via a pointer residing anywhere in memory is quite compelling! :shock:

The SIZ prefix sounds good, too, assuming it's applicable to PHA & PLA as well. Hmmm... In the context of an Interrupt Service Routine which doesn't require use of the register high-byte, you could save time by using PHA on entry, then later PLA on exit, without the SIZ prefix.

Quote:
I am looking into the possibility of allowing [ORA/AND/EOR/ADC/STA/LDA/CMP/SBC/INC/DEC/ASL/ROL/LSR/ROR] to be applied directly to X and Y.
As I mentioned in the other thread, I'm in favor of this sort of functionality. For example, ASL applied to X is a great way to transform a byte index so it can index 2-byte objects, 4-byte, 8-byte etc. And X and Y would be handy as alternative accumulators for logical operations, too (AND OR EOR). Of course STA/LDA/CMP/INC/DEC applied to X and Y are superfluous, since we already have STX/LDX/CPX/INX/DEX and STY/LDY/CPY/INY/DEY. But that's neither here nor there -- it's not a disadvantage.

FWIW, I'll mention another approach, and that is to allow register-to-register exchanges, or "swaps." So, just as we have TAX and TAY, we'd also have SAX and SAY. (Do you have enough data paths to do these in a single clock?) Again using the example of transforming a byte index in X into a word index, that could be coded as SAX / ASL A / SAX.

Quote:
I think that I've worked out how to implement a single instruction Next, NXT, that will support a direct threaded Forth. Without implementing any new internal registers, [...]
Is it out of the question to implement IP as an internal register, then? I realize that'd consume 16 more flipflops on chip, and you'd be obliged to work out ways & means of getting data in & out of the new reg. (Reminder: an on-chip reg can also be memory-mapped.) But DTC NEXT could occur in as little as 3 cycles!

  • cycle 1: fetch NEXT opcode
  • cycle 2: fetch byte at IP
  • cycle 3: fetch byte at IP+1. Internally update IP.
  • ( commence execution of target code snippet )

Hmmm, OK, maybe there'd have to be a decode cycle following the opcode fetch. But in any case IP on-chip is still a big win. Without it, NEXT will need 2 extra cycles to read IP and 2 more (or one if you're lucky) to store the incremented IP off chip again. Finally, as noted in the other thread, it'll cost you yet another cycle if the NEXT instruction features an operand, i.e. if the location of IP is explicit. (But if there's a decode delay, that'd hide the operand fetch.)
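
The cycle arithmetic in the last two paragraphs can be tallied in a few lines of Python. This is only a bookkeeping sketch of the figures quoted above (it ignores the possible extra decode cycle), and the function and parameter names are invented for the example.

```python
def next_cycles(ip_on_chip, explicit_operand=False, store_cycles=2):
    """Tally the cycles for a DTC NEXT using the per-step costs quoted in this post."""
    cycles = 3                       # opcode fetch + fetch of the two bytes at IP
    if not ip_on_chip:
        cycles += 2                  # read the 16-bit IP from memory
        cycles += store_cycles       # write the incremented IP back (2, or 1 if lucky)
        if explicit_operand:
            cycles += 1              # fetch the operand that says where IP lives
    return cycles


print(next_cycles(ip_on_chip=True))
print(next_cycles(ip_on_chip=False, explicit_operand=True))
```

With IP off chip and an explicit operand, the tally lands on the roughly 8 cycles Michael estimated for NXT zp.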

barrym95838 wrote:
[KimKlone NEXT] is not as flexible.
I agree there's a loss of flexibility due to the location of IP being implicit rather than explicit -- is that what you meant? That decision was forced on me, but I don't regret it. Shaving a cycle off of NEXT is a much bigger advantage than being able to relocate IP, or maintain multiple IPs. (BTW thanks for the link, Mike! :) )

cheers,
Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Wed Aug 27, 2014 2:00 pm

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Jeff:

I've partitioned the microprogram implementation of the M65C02A into two essentially independent halves. One half controls the sequence through an instruction cycle, and the other half controls the operation of the ALU and the registers. A side-effect of this organization is that the sequencing half of the microprogram implements the complex addressing modes. There are a number of short micro-sequences within the microprogram that implement the various addressing modes in accordance with the type of operation required: Read-Only, Write-Only, and Read-Modify-Write. Those addressing modes that don't fit these general categories, i.e. push, pop, jsr, jmp, branch, rti, and rts, are implemented in dedicated micro-sequences. A side benefit of this approach has been that if the ALU control (fixed) microprogram is verified independently using a self-checking testbench, then the addressing mode sequences can be verified without regard to the various instructions to which the mode applies.

That being said, applying IND to the ORA/AND/EOR/ADC/STA/LDA/CMP/SBC abs instructions should be expected to convert the abs, abs,X, and abs,Y addressing modes into the (abs), (abs,X), and (abs),Y addressing modes. Whether I can accomplish this within the available microprogram space is the issue. One thing is sure, though: if an addressing mode microsequence is modified to support the IND prefix instruction, then all instructions that use that addressing mode will be able to use the addressing mode conversion implied by IND.

An exception to this statement applies to the zero page addressing modes. The multi-way branch used to support IND is generally inserted for zero page addressing modes in the first microword after the opcode fetch. This means that if the multi-way branch is not inserted in that microword, then that opcode/instruction will not support the IND prefix. Since the ORA/AND/EOR/ADC/STA/LDA/CMP/SBC instructions support all of the zero page addressing modes, there is no need to apply IND to these instructions when they are used with zero page addressing modes.
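
To make the abs-mode conversions concrete, here is a hedged Python sketch of the effective-address calculation with and without the IND prefix. It models only the mapping named above (abs to (abs), abs,X to (abs,X), abs,Y to (abs),Y); the function names and the little-endian pointer layout are assumptions for illustration, not a description of the microprogram.

```python
def rd16(mem, addr):
    """Read a little-endian 16-bit value from a 64 KiB byte memory."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)


def effective_address(mem, mode, operand, x=0, y=0, ind=False):
    """Return the address an instruction would access for the given mode and IND flag."""
    if mode == "abs":                       # abs    -> (abs)
        return rd16(mem, operand) if ind else operand
    if mode == "abs,X":                     # abs,X  -> (abs,X): index first, then indirect
        base = (operand + x) & 0xFFFF
        return rd16(mem, base) if ind else base
    if mode == "abs,Y":                     # abs,Y  -> (abs),Y: indirect first, then index
        if ind:
            return (rd16(mem, operand) + y) & 0xFFFF
        return (operand + y) & 0xFFFF
    raise ValueError("unsupported mode: " + mode)


mem = bytearray(0x10000)
mem[0x1234], mem[0x1235] = 0x00, 0x40       # pointer at $1234 -> $4000
print(hex(effective_address(mem, "abs", 0x1234, ind=True)))
```

Under this model, IND LDA abs reads through a pointer located anywhere in memory, which is the capability Jeff asked about.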

I think that I've resolved how to implement SIZ for a broad class of instructions such as PHA and PLA, which use implied addressing. I've also determined how to combine the IND and SIZ prefix instructions so that the internal flags each sets remain set until a non-prefix instruction is executed. (Note: the prefix instructions are uninterruptible, including while NMI is asserted, so that they can be combined with the action instruction as a single indivisible operation.) However, the method for combining IND and SIZ for a single instruction, one of the primary goals of this effort, has not yet revealed itself to me.

I've given a lot of thought to implementing register swaps. I agree with your suggestion/implication that register swaps may be better than register transfers in some cases. This is particularly true if X or Y were combined with A to form a 16-bit accumulator for extended precision arithmetic. I gave a lot of thought to the current microarchitecture with the idea of implementing register swaps as you suggested. However, as you correctly surmise, the microarchitecture is limited by the number of data paths and controls available. Because continuing to work on this problem was delaying the completion of the core to the point where I could solicit comments such as the ones you are providing, I decided to leave that feature to a future date. Both of us are in agreement as to the utility of that feature, and I think that your suggestion for register swaps should be added to the list of features to be included, if possible, in the core.

I think that the method I've been considering for implementing the destination register swap/modification feature will avoid the redundant operation that you point out with respect to the X and Y specific instructions: STX/LDX/CPX/INX/DEX and STY/LDY/CPY/INY/DEY. But as you point out, if my approach ends up creating a redundant version of these instructions, there's really no loss in generality.

I agree that a dedicated 16-bit IP register would allow the reduction of the number of cycles of Next as you suggest. (Your assumption that it would result in a 3 cycle Next within the current microarchitecture is also correct, i.e. no separate dedicated instruction decode cycle is necessary.) For the moment, I've worked out how to implement it completely within the resources available in the current core. With the 16-bit registers in the core, Next can be implemented in 7 or 8 cycles. It will be 7 cycles if a +2 operation is included in the address generator module, and 8 if only the current +1 increment capability is included.

The primary limiting factor in implementing separate 16-bit registers for any Forth VM registers like IP, W, PSP, and RSP is that there is no good way to increase the width of the microprogram memory without taking block RAM memory away from the microprocessor. Thus, for the moment, I am trying to fit the VM into the current microarchitecture without reducing the amount of BRAM available to the microprocessor as program/data memory: 28kB for program/data memory, and 4kB for microprogram memory. Performance suffers if I return to using encoded control fields, and at this point, I just don't want to do that. 30 MHz in a Spartan 3A -4 part and 40 MHz in a Spartan 6 -2 part appear to be the practical limits of the core (plus memory and peripherals) in single cycle mode. For me these limits represent a good compromise between performance and features for the core.

Time to go to work. :(

_________________
Michael A.


Last edited by MichaelM on Sat Nov 15, 2014 6:18 pm, edited 1 time in total.

Posted: Tue Sep 02, 2014 2:21 am

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
I have released an update of the M65C02A core. I have added four prefix instructions, and fully implemented the IND prefix instruction. It converts direct addressing modes into indirect addressing modes for those instructions which do not already support indirect addressing modes. See the list of affected instructions provided in the Readme file of the core's GitHub repository.

_________________
Michael A.


Posted: Wed Sep 03, 2014 7:02 pm

Joined: Mon Apr 16, 2012 8:45 pm
Posts: 60
If Forth support is the core here, have you looked at the 65EL02 design by Eloraam?
http://www.eloraam.com/nonwp/redcpu.php
http://www.eloraam.com/blog/2012/04/22/ ... internals/
http://integratedredstone.wikispaces.co ... er+Control
http://integratedredstone.wikispaces.com/65EL02


The design appears to have CPU support for 2 stacks, and there is a Forth available for this design.


Posted: Wed Sep 03, 2014 11:52 pm

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
I found the RP65EL02 a year or more ago. The links you provide are to the same site that I visited. I followed your links again, and I was unable to find any FPGA code. The RP65EL02 appears to be an emulated software-only processor definition that is used to implement games in a simulated environment. Thus, my take is that no actual hardware implementation of the RP65EL02 actually exists; it is simply an emulated processor for use in the Minecraft environment.

My goal is to complete an implementation of the M65C02A, with FORTH VM support, that can be implemented directly in a small, low cost FPGA like the Spartan 3A XC3S200A-4VQG100I FPGA which I am working with at the moment. While extending the instruction set, I want to maintain compatibility with the W65C02S processor to the greatest extent possible. The goal is to make the M65C02A core transparent to common 6502/65C02 tools, and allow its extended capabilities to be accessed using macros or extended tools.

As currently defined, the RP65EL02 provides a significant level of compatibility with the 6502. More importantly, from discussions regarding its errata, which appear to be minor, it is intended to be used as a native mode 65816 emulation instead of a 65C02 emulation. Given its intended purpose as a game engine, that is probably the way to most effectively use it.

Thanks for providing the links.

_________________
Michael A.


Posted: Fri Nov 21, 2014 4:39 am

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
The college football season is almost over, so I've had some time recently to get back to my FORTH VM issue. After some noodling on this problem, I have settled on the approach that I'm going to take. I will add an internal module dedicated to providing internal IP and W registers. Not all of the issues have been resolved, but as Jeff suggested, including the IP and W registers internally will provide the best performance. It may lower the overall clock rate marginally because some of the additional multiplexers required are in the critical address path. However, the savings in terms of clock cycles more than offsets the small additional address path delay they introduce.

On the mapping of the FORTH VM registers onto the registers of the M65C02A, I've made the following decisions:

(1) I've decided that IP and W will be implemented internally as additional 16-bit registers.

(2) The Y register will be augmented with a modulo 64 stack pointer capability and used to implement the return stack. To mechanize this feature, a prefix instruction OSY will be used to override the normal stack pointer S with Y. Thus, everywhere OSY precedes a stack instruction, PHA, PLA, etc., the Y register will be used instead of S. The additional clock cycle penalty imposed by the prefix instruction is not particularly onerous given that the parameter/data stack is much more frequently used. The post-indexed stack relative indirect instructions will be modified to use X instead of Y as the index register. Y was chosen as the RSP because it is not as commonly used as the X register. Keeping Y as part of the stack-relative indexed indirect addressing mode would result in a lot of unnecessary saving and restoring of the RSP. This change uses the X register in a previously unsupported manner, but the stack-relative indexed indirect addressing mode is unique to the M65C02A. Therefore, with no existing tools supporting the mode for 65C02-compatible parts, there's no reason not to make the change.

(3) The normal stack pointer S will be used as the PSP.

(4) Neither the A nor the X register will serve as a dedicated FORTH VM register. This allows X to be used as it has been in 6502/65C02 programming, and A to be used in a normal manner.

I've almost worked out how to extend the operation of the ALU to support 16-bit functions. As a consequence, I've defined two prefix instructions, SIZ and ISZ, to define an operation as a 16-bit load, store or ALU operation. SIZ will extend the instruction from 8 to 16 bits. ISZ will extend the instruction from 8 to 16 and also convert the addressing mode to an indirect mode. These two new prefix instructions supplement the IND prefix instruction implemented for the last release of the M65C02A core. Two other prefix instructions, OAX and OAY, allow A to be overridden by X or Y. These two prefix instructions allow the full capabilities of the ALU to be applied to the X and Y registers. An override of this type applied to the S was considered, but the lack of opcode space precluded its incorporation into the plan.

I am a bit partial to the stack-oriented ALU of the Inmos Transputers. I have designed several ALUs for use in my work that utilize that stack architecture. I've decided to at least incorporate a triple level stack in place of the M65C02A's A register. Depending on the effectiveness of that evaluation stack, I may add a triple level stack for X and Y as well. I expect the stack to reduce the number of load and store operations that are needed to perform computations. In support of this anticipated evaluation stack, I've reserved three opcodes, DUP, SWP, and ROT, to manipulate the evaluation stack. Each LDA will push the stack, and each STA will pop the stack. Arithmetic operations will operate on TOS and memory as currently supported by the 6502/65C02 ISA.

The FORTH VM will be supported by a number of single byte instructions: NXT, ENT, PHI, PLI, INI. NXT will perform the inner interpreter function. ENT will push IP onto the RS (PHI) and set IP as W + 2. ENT followed by NXT will complete the entry of the FORTH VM into the primitive. Exit from a primitive will consist of pulling IP from the RS (PLI) followed by NXT.

When used without the IND prefix, NXT will perform as required for DTC FORTH VM. When preceded by IND, NXT will perform as required for an ITC FORTH VM.

The stack-relative addressing modes should make the implementation of many of the fundamental FORTH functions much easier. Thus, supporting additional FORTH primitives with dedicated instructions is not required given the expanded instruction set of the M65C02A.

I am interested in any feedback. The following two figures (transposed into the more common format) provide the planned expansion of the M65C02A instruction set to support ITC and DTC FORTH VMs:

Attachment: M65C02A-InstructionSetMatrix(x0-x7)-RevA.JPG

Attachment: M65C02A-InstructionSetMatrix(x8-xF)-RevA.JPG

_________________
Michael A.


Posted: Fri Nov 21, 2014 6:55 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
It sounds very interesting, Michael, but I'm having some difficulty seeing a clear picture of how your modifications can be used in a programming context.

Would it be possible for you to share some sample code snippets, showing how your new instructions and addressing modes can be used to implement a simple benchmark, like Sieve of Eratosthenes, or some Forth core primitives, like (for example) + - AND ENTER EXIT @ ! DUP SWAP ...

Thanks,

Mike


Posted: Fri Nov 21, 2014 9:59 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Very interesting work, Michael. I'd make one comment: for LDA and for arithmetic, converting registers into evaluation stacks is transparent. But a destructive STA seems quite a change. With your proposal, one would have:
STA is destructive
DUP; STA is the conventional store.
I would suggest retaining the conventional store, so you'd instead have
STA; POP is destructive
STA is the conventional store.
That would make it much easier to start programming the machine, from a 6502 background.

Having said which, I see there is no POP and also no room for it... and you'd need PPX and PPY as well.

Hmm. Perhaps you need to rename STA as SAP...

Cheers
Ed


Posted: Fri Nov 21, 2014 2:50 pm

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Ed:

Good point regarding the automatic adjustment of an evaluation stack. Pushing and popping elements from the hidden registers of the evaluation stack should be performed explicitly in order to preserve the expected behavior of the LDA and STA instructions. Therefore, pushing should be performed using the DUP LDA instruction sequence, and popping should be performed using the STA ROT instruction sequence. These two sequences could be wrapped in appropriately named macros to make using the evaluation stack a bit easier. Both DUP and ROT are single byte, single cycle instructions, and when used this way, they will essentially function like some of the prefix instructions.
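
A small Python model may help picture the DUP LDA and STA ROT sequences. It is a sketch under assumptions that the post does not spell out: the stack is held as [TOS, NOS, BOS], DUP copies TOS downward and discards the bottom entry, and ROT moves NOS up to TOS while the old TOS drops to the bottom. The class and method names are hypothetical.

```python
class EvalStack:
    """Three-level evaluation stack standing in for the A register (illustrative model)."""

    def __init__(self):
        self.regs = [0, 0, 0]            # [TOS, NOS, BOS]; TOS is what 6502 code sees as A

    def lda(self, value):
        self.regs[0] = value & 0xFF      # plain LDA overwrites TOS only

    def sta(self):
        return self.regs[0]              # plain STA reads TOS and leaves the stack alone

    def dup(self):
        tos, nos, _ = self.regs
        self.regs = [tos, tos, nos]      # copy TOS down; the old bottom entry is lost

    def swp(self):
        self.regs[0], self.regs[1] = self.regs[1], self.regs[0]

    def rot(self):
        tos, nos, bos = self.regs
        self.regs = [nos, bos, tos]      # assumed direction: NOS becomes the new TOS

    def push(self, value):               # the DUP LDA sequence
        self.dup()
        self.lda(value)

    def pop(self):                       # the STA ROT sequence
        value = self.sta()
        self.rot()
        return value


stack = EvalStack()
for v in (1, 2, 3):
    stack.push(v)
print(stack.regs, stack.pop(), stack.regs)
```

In this model a plain LDA or STA behaves exactly as on a stock 65C02, which is the property Ed asked to preserve.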

Mike:

I've read Loeliger's and Brodie's books and Brad's tutorials, but that is about as close as I've ever come to FORTH. Perhaps I've missed something obvious, so I'm looking to you, Jeff, and others to clear up any misconceptions that I've had while working to determine the best instructions to design and implement to support a FORTH VM with the M65C02A.

From Brad's website and the Heart of FORTH page, the three core operations of a DTC/ITC FORTH inner interpreter are defined as shown below:
Code:
             ITC                                     DTC
================================================================================
NEXT:   W      <= (IP++) -- Ld *Code_Fld     ; W      <= (IP++) -- Ld *Code_Fld
        PC     <= ((W))  -- Jump Dbl Indirect; PC     <= (W)    -- Jump Indirect
================================================================================
ENTER: (RSP--) <= IP     -- Push IP on RS    ;(RSP--) <= IP     -- Push IP on RS
        IP     <= W + 2  -- => Param_Fld     ; IP     <= W + 2  -- => Param_Fld
;NEXT
        W      <= (IP++) -- Ld *Code_Fld     ; W      <= (IP++) -- Ld *Code_Fld
        PC     <= ((W))  -- Jump Dbl Indirect; PC     <= (W)    -- Jump Indirect
================================================================================
EXIT:
        IP     <= (++RSP)-- Pop IP frm RS    ; IP     <= (++RSP)-- Pop IP frm RS
;NEXT
        W      <= (IP++) -- Ld *Code_Fld     ; W      <= (IP++) -- Ld *Code_Fld
        PC     <= ((W))  -- Jump Dbl Indirect; PC     <= (W)    -- Jump Indirect
================================================================================
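
The table above can be exercised with a toy Python model of the IP, W, and RSP registers. This is a sketch, not the planned hardware: it assumes little-endian cells, a byte-wide return stack that grows downward, and it reads the ITC jump as PC = mem16[W] and the DTC jump as PC = W, which is the usual textbook meaning but is an assumption about the ((W)) and (W) notation. All names and addresses are made up for the example.

```python
class ForthVM:
    """Toy model of NEXT, ENTER and EXIT over a 64 KiB byte memory."""

    def __init__(self, rsp=0x01FF):
        self.mem = bytearray(0x10000)
        self.ip = 0
        self.w = 0
        self.rsp = rsp

    def rd16(self, addr):
        return self.mem[addr & 0xFFFF] | (self.mem[(addr + 1) & 0xFFFF] << 8)

    def wr16(self, addr, value):
        self.mem[addr & 0xFFFF] = value & 0xFF
        self.mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF

    def next_(self, itc=True):
        self.w = self.rd16(self.ip)               # W <= (IP++)
        self.ip = (self.ip + 2) & 0xFFFF
        return self.rd16(self.w) if itc else self.w   # new PC

    def enter(self, itc=True):
        self.mem[self.rsp] = self.ip >> 8         # (RSP--) <= IP, high byte first
        self.rsp = (self.rsp - 1) & 0xFFFF
        self.mem[self.rsp] = self.ip & 0xFF
        self.rsp = (self.rsp - 1) & 0xFFFF
        self.ip = (self.w + 2) & 0xFFFF           # IP <= W + 2 (parameter field)
        return self.next_(itc)                    # ENTER completes through NEXT

    def exit_(self, itc=True):
        self.rsp = (self.rsp + 1) & 0xFFFF        # IP <= (++RSP), low byte first
        lo = self.mem[self.rsp]
        self.rsp = (self.rsp + 1) & 0xFFFF
        hi = self.mem[self.rsp]
        self.ip = (hi << 8) | lo
        return self.next_(itc)                    # EXIT completes through NEXT


vm = ForthVM()
vm.wr16(0x2000, 0x3000)    # calling thread: first cell names a secondary at $3000
vm.wr16(0x3000, 0xE000)    # secondary's code field holds the address of ENTER
vm.wr16(0x3002, 0x4000)    # secondary's parameter field names a primitive at $4000
vm.wr16(0x4000, 0xF000)    # primitive's code field holds its machine-code address
vm.ip = 0x2000
print(hex(vm.next_()), hex(vm.enter()))
```

Stepping through it shows the point made in the table: ENTER and EXIT differ only in how IP is set up before the shared NEXT.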

What I've noticed is that NEXT is the instruction sequence through which both ENTER and EXIT complete. In Loeliger's book, he goes to great pains to terminate ENTER/EXIT through NEXT. In the preceding side-by-side pseudo-code description of these operations, it is clear that the only difference between the ITC NEXT and the DTC NEXT is the double indirection on W to reach the machine code that must be executed. I have reasoned that the M65C02A ENT instruction pushes IP onto the RS (return stack) and advances W to point to the param field. I am assuming the following FORTH dictionary structure, for which the Code Field is a two byte field for both ITC and DTC.
Code:
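    struct FORTH_word_t {
        uint8_t     Len;                    /* length of the word's name */
        uint8_t     Name[Max_Name_Len];     /* name characters */
        uint16_t    *Link;                  /* link to the previous dictionary entry */
        uint8_t     Code_Fld[2];            /* two-byte code field (ITC and DTC) */
        uint8_t     Param_Fld[Code_Len];    /* parameter field; Code_Len varies per word */
    };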

It is my understanding that the ITC code field points to ENTER if the word is a secondary or it points to NEXT if it is a primitive. In either case the double indirection on the code field locates the correct inner interpreter operation/routine/instruction. In the ITC case, the inner interpreter's NEXT is built from the IND NXT instruction sequence, and ENTER is built from ENT IND NXT.

It is my understanding that the DTC code field contains either CALLs/JMPs to ENTER (secondaries) or into the machine code in the param field (primitives). Thus, the DTC code field is filled with ENT NXT (secondaries) or NXT NOP (primitives).

In either case, EXIT from a primitive can be implemented as OSY PLI NXT or OSY PLI IND NXT (Pull IP from RS followed by NEXT). (The OSY prefix instruction forces the use of Y register instead of S for stack operations. I think that FORTH's EXECUTE needs to be able to pull IP from the parameter/data stack (PS/DS) (where S is the PSP), so I've opted to use the OSY prefix instruction to select the RS instead of the PS/DS.)

I've not implemented the IP/W module, but I expect ENT to be implemented much like a normal 16-bit push instruction, with the IP <= W + 2 operation taking place in parallel using a dedicated increment by 2 function within the module. Thus, I expect the cycle count for ENT to come in at 3 cycles. I expect NXT to use 2 cycles for each level of indirection and to use the same increment by 2 function to adjust IP. Therefore, I expect DTC NEXT (NXT) to come in at 5 cycles and ITC NEXT (IND NXT) to come in at 8 cycles.

The preceding discussion should explain my thinking on the FORTH VM inner interpreter support instructions: NXT, ENT, PLI. Any comments or corrections of misconceptions on my part are welcome.

From my work with Transputers and my stack machines, I think that many of the parameter stack operations can be performed more efficiently using the stack relative addressing modes. I realized that I had declared two new instructions, INW dp and DEW dp. With the SIZ prefix implemented by the ALU and microprogram, there is no need for these two instructions. I think that I will re-purpose them as instructions for adjusting S. In other words, INW dp may be better used as ASP #imm (Add #imm to S) and DEW dp may be better used as SSP #imm (Subtract #imm from S).

As I understand it, FORTH @ is used to read a memory location whose address is the pointer in the TOS element of the PS/DS. The pointer is removed from the PS/DS and the value read is pushed onto the PS/DS in its place. Using stack relative addressing, FORTH @ can be implemented as follows:
Code:
    SIZ LDA (0,S)
    SIZ STA 0,S

As I understand it, FORTH ! is used to write a memory location whose address is the pointer in the TOS element of the PS/DS with the data taken from the NOS element. Both the pointer and the data are removed from the PS/DS. Using the SIZ prefix instruction, FORTH ! can be implemented as follows:
Code:
    SIZ PLX         ; Pull pointer from PS/DS
    SIZ STX dp      ; Store into zero page pointer location
    SIZ PLA         ; Pull data from PS/DS
    SIZ STA (dp)    ; Store data using zero page pointer
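
The @ and ! fragments above can be checked against a simple Python model. This is a sketch that abstracts the hardware stack into a list of 16-bit cells (top of stack at the end of the list) and assumes little-endian memory; the class and method names are hypothetical, and the comments map each step back to the instruction it stands for.

```python
class DataStackModel:
    """Toy model of the parameter/data stack and memory for FORTH @ and !."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.ps = []                          # parameter stack; last element is TOS

    def rd16(self, addr):
        return self.mem[addr & 0xFFFF] | (self.mem[(addr + 1) & 0xFFFF] << 8)

    def wr16(self, addr, value):
        self.mem[addr & 0xFFFF] = value & 0xFF
        self.mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF

    def fetch(self):                          # FORTH @
        value = self.rd16(self.ps[-1])        # SIZ LDA (0,S): load through the pointer at TOS
        self.ps[-1] = value                   # SIZ STA 0,S: overwrite TOS in place

    def store(self):                          # FORTH !
        addr = self.ps.pop()                  # SIZ PLX, then SIZ STX dp
        data = self.ps.pop()                  # SIZ PLA
        self.wr16(addr, data)                 # SIZ STA (dp)


m = DataStackModel()
m.ps = [0xBEEF, 0x2000]                       # data in NOS, address in TOS
m.store()
m.ps.append(0x2000)
m.fetch()
print(hex(m.ps[-1]))
```

Note that @ needs no push or pull at all in this scheme, since the fetched value simply replaces the pointer in the top cell.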

I will provide some additional code fragments later today, but I've got to head off to work now. :) :( take your pick.

_________________
Michael A.


Posted: Fri Nov 21, 2014 6:33 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Ah, using ROT for POP, that's good. Not sure about LDA needing a DUP - but I know that HP changed their ideas about automatic stack lift, in the early days of their RPN calculators. There's more than one way to do it!


Posted: Fri Nov 21, 2014 6:36 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Lively discussion! :) Right now, Michael, I'll comment on your previous post only.

Kudos to you, and let me say how resoundingly right it feels to see serious priority given to Forth -- a chronically under-publicized language / programming style! Speaking of publicity, I'll grab this chance to trumpet the fact that the Philae lander for the ESA's recent comet mission is powered by a computer that runs Forth natively in hardware! There's a thread following this on the Anycpu.org forum (thanks, Ed).

MichaelM wrote:
(1) I've decided that IP and W will be implemented internally as additional 16-bit registers.
Since I advocated for this, naturally I'm in favor of IP and W being internal rather than living in external RAM. Although the decision costs a few ns every clock cycle, the payback occurs not in ns but in much larger increments -- such as saving 4 entire bus cycles every time IP is updated (i.e. no need to read then write this 16-bit value from/to RAM). And updates to IP are extremely frequent -- it's the VM's "program counter," after all.

Quote:
(2) The Y register will be augmented with a modulo 64 stack pointer capability and used to implement the return stack. To mechanize this feature, a prefix instruction OSY will be used to override the normal stack pointer S with Y.
I'm not 100% clear on the modulo 64 part, but I certainly endorse having a prefix to allow stack operations via a register other than S. But do circumstances force you to use Y? Would it be possible to create a new register instead? That would allow you to avoid the (to me) unsettling changes in the usage for Y.

Quote:
OAX and OAY, allow A to be overridden by X or Y. These two prefix instructions allow the full capabilities of the ALU to be applied to the X and Y registers. An override of this type applied to the S was considered, but the lack of opcode space precluded its incorporation into the plan.
As with internal IP & W, obviously I am in favor. But, wow, it sure would complete the picture if the capabilities of the ALU could be applied to all three -- to S as well as to X & Y! 8)

It seems to me there's always a way to free up an opcode. The trick is to dream up as many possibilities as you can, then select the option that's best (aka least unpalatable :) ). I haven't had time to give this much thought, but here's the beginning, at least, of a list. To free up opcodes,....

  • What if SED, CLD and maybe a few others ceased to use the "prime space" opcodes they do, and instead were demoted so as to require a prefix? Or,
  • are there instructions you added very early in the project whose function is now largely supplanted by the cool new prefixes?
  • have you wrung all possible functionality from the prefixes?

As a reminder I'll elaborate on that last point. If there are lots of combos (prefix plus target instruction) which are nonsensical or unimplemented, that's a sign of wasted potential. One remedy for that is to allow a prefix to have one meaning in regard to one group of target instructions, but a different meaning when applied to a different group of target instructions. That's the way each of the 8 bits of a legacy opcode works -- each bit is a chameleon, with a meaning that varies dramatically. If you begin with a conception about what every prefix "does" then there's a risk of overlooking useful possibilities, so I'd suggest trying to keep the approach general. The prefix need be no more specific than a 9th bit of opcode.
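A minimal sketch of the "9th bit" view, assuming the prefix is reduced to a single flag that is concatenated with the 8-bit opcode to index one decode table. The table holds only a few illustrative entries: $E6 (INC zp), $04 (TSB zp) and $D8 (CLD) are the standard 65C02 encodings, the prefixed meanings are the IND conversions described at the start of this thread, and the `decode` helper is hypothetical.

```python
# Decode table indexed by a 9-bit value: (prefix flag << 8) | opcode.
# Only a handful of entries are shown; this is not the M65C02A microprogram.
DECODE = {
    0x0E6: "INC zp",     # plain opcode $E6
    0x1E6: "INC (zp)",   # same opcode with the prefix bit set
    0x004: "TSB zp",     # plain opcode $04
    0x104: "TSB (zp)",   # same opcode with the prefix bit set
    0x0D8: "CLD",        # plain opcode $D8, no prefixed meaning assigned here
}

def decode(opcode: int, prefixed: bool) -> str:
    """Look up an instruction by opcode plus the prefix flag as a ninth bit."""
    key = (int(prefixed) << 8) | (opcode & 0xFF)
    return DECODE.get(key, "unassigned")

print(decode(0xE6, False))  # INC zp
print(decode(0xE6, True))   # INC (zp)
print(decode(0xD8, True))   # unassigned
```

Every "unassigned" result for a prefixed opcode is a slot that could carry a completely unrelated meaning, which is the wasted potential referred to above.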

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Last edited by Dr Jefyll on Fri Nov 21, 2014 6:41 pm, edited 2 times in total.

PostPosted: Fri Nov 21, 2014 6:39 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
(CLV is underused too! It's worth noting that these instructions could perhaps be used as prefixes without losing their usual function. See my other post.)


PostPosted: Sat Nov 22, 2014 12:00 am 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Ed/Jeff:

Thanks for the thought-provoking feedback. I want to maintain full compatibility with Klaus Dormann's test programs.

I can't see how redefining the behavior of standard 6502/65C02 instructions will allow that compatibility to be maintained. As has been demonstrated many times by many implementers, Klaus' functional test programs are an invaluable test for both FPGA cores and emulators. His test programs have helped me uncover several obscure errors that my functional test program failed to identify. Testing and validating the additional instructions of the M65C02A will be difficult enough, so I see regression testing with Klaus' programs as necessary.
Dr Jefyll wrote:
MichaelM wrote:
(2) The Y register will be augmented with a modulo 64 stack pointer capability and used to implement the return stack. To mechanize this feature, a prefix instruction OSY will be used to override the normal stack pointer S with Y.
I'm not 100% clear on the modulo 64 part, but I certainly endorse having a prefix to allow stack operations via a register other than S. But do circumstances force you to use Y? Would it be possible to create a new register instead? That would allow you to avoid the (to me) unsettling changes in the usage for Y.
I recently read somewhere, I'm thinking on the Heart of Forth website, that the FORTH standards recommend/require a minimum of 32 cells (64 bytes) on the return stack and 64 cells (128 bytes) on the parameter stack. Given that the RS would be allocated in zero page, I reasoned that making the stack mechanism perform in a modular manner similar to the system/parameter stack would be beneficial. Such behavior will tend to protect the zero page memory not allocated to the return stack. Thus, I decided that the least significant 6 bits of Y would increment in a modulo 64 fashion. The upper two bits would be available to define which of the four 64-byte pages in zero page is allocated to the return stack.
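The modulo 64 behavior can be sketched as follows. This is only a behavioral model under the assumptions stated in the comments: the helper name `step_y` is invented, and the direction in which the return stack grows is not fixed here, so the step is passed in as +1 or -1.

```python
INDEX_MASK = 0x3F   # low 6 bits of Y: position within the 64-byte block
BLOCK_MASK = 0xC0   # high 2 bits of Y: which of the four zero page blocks

def step_y(y: int, delta: int) -> int:
    """Adjust Y as a modulo 64 stack pointer; the upper two bits never change."""
    return (y & BLOCK_MASK) | ((y + delta) & INDEX_MASK)

# Return stack placed in the second block, $40..$7F.
print(hex(step_y(0x7F, +1)))  # wraps to the bottom of the same block
print(hex(step_y(0x40, -1)))  # wraps to the top of the same block
```

Because the block-select bits are preserved, an overflow or underflow of the return stack wraps within its own 64 bytes rather than running into the rest of zero page.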
Dr Jefyll wrote:
- are there instructions you added very early in the project whose function is now largely supplanted by the cool new prefixes?
With regard to this, I've moved to redefine two of those instructions, and they are to be used for adjusting the system stack pointer (S). There are of course the bit-oriented Rockwell instructions. Redefining them remains an option, but not if the M65C02A is to be instruction set compatible with the existing W65C02S microprocessor. I've included all four of the Rockwell instructions (32 opcodes) as well as WAI and STP. Since I made the Rockwell instructions support the IND prefix, there are actually eight instructions now supporting two addressing modes: zp direct and zero page indirect. I think that the new addressing mode will make these instructions much more useful. Also, keep in mind that I'm trying to keep the M65C02A as standard as possible in order to make use of available tools such as assemblers, compilers, and interpreters. Access to the features enabled by the prefix instructions will likely lag until someone (probably not me) adds the new instructions to a tool.
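For anyone counting, the 32 opcodes are the four Rockwell mnemonics times eight bit positions. The sketch below enumerates them using the standard Rockwell/WDC encoding (columns $x7 and $xF of the opcode matrix); it assumes the M65C02A keeps those encodings, which follows from the W65C02S compatibility goal but is not spelled out in this post.

```python
def rockwell_opcodes() -> dict:
    """Map RMBn/SMBn/BBRn/BBSn (n = 0..7) to their standard opcode bytes."""
    table = {}
    for bit in range(8):
        table[f"RMB{bit}"] = (bit << 4) | 0x07        # $07, $17, ... $77
        table[f"SMB{bit}"] = ((bit + 8) << 4) | 0x07  # $87, $97, ... $F7
        table[f"BBR{bit}"] = (bit << 4) | 0x0F        # $0F, $1F, ... $7F
        table[f"BBS{bit}"] = ((bit + 8) << 4) | 0x0F  # $8F, $9F, ... $FF
    return table

ops = rockwell_opcodes()
print(len(ops))
print(hex(ops["SMB7"]))
```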
Dr Jefyll wrote:
- have you wrung all possible functionality from the prefixes?
The quick answer I'm sure is no. I've considered doing what you point out below, but I will defer that until after the new definition is implemented and released.
Dr Jefyll wrote:
One remedy for that is to allow a prefix to have one meaning in regard to one group of target instructions, but a different meaning when applied to a different group of target instructions. That's the way each of the 8 bits of a legacy opcode works -- each bit is a chameleon, with a meaning that varies dramatically.

--Edit: changed code block into quote block

_________________
Michael A.


Last edited by MichaelM on Sat Nov 22, 2014 1:33 am, edited 1 time in total.
