
All times are UTC




 Post subject: Re: M65C02A Core
PostPosted: Sat Aug 02, 2014 2:14 am 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3349
Location: Ontario, Canada
Quote:
I would be interested in any feedback on my plans for this core.

Hi Michael -- congratulations on your work. I'm envious, as I wish I had time to do a project like this myself. 8) It's certainly a major improvement to implement the bit-manipulation instructions, and that goes for the new stack address modes and many other features as well.

Offhand my only suggestion is that perhaps you haven't taken full advantage of the Prefix idea. IOW I think you should consider defining several prefixes rather than just one. Here are some possible effects different prefixes could have on the following instruction:

  • Let the role of A be assumed by X instead. This would allow powerful maneuvers for adjusting -- even scaling! -- X. (See below.)
  • Let the role of A be assumed by Y instead. Ditto to above.
  • Let the role of X be assumed by S instead (and let Zero-page be the stack page instead). This is a different way to achieve the new stack addressing modes. It requires no new opcodes; instead you'd just use legacy (Z-pg,X) or Z-pg,X modes but with a prefix.
  • Let the role of X be assumed by Y, and let the role of Y be assumed by X. (One prefix is sufficient for both.) This would give, for example, (ind,Y) and (ind),X modes -- not to mention instructions such as JMP (ind,Y).

I realize that adding a handful of prefixes is wasteful in the sense that you'd open the door for a thousand or more Prefix-Opcode combinations, many of which would remain unimplemented. There's also a one-cycle penalty for fetching the prefix. But my hunch is that this approach might be a lot easier to implement, since there are no new instructions... merely new register assignments. But maybe I'm making it sound too easy -- certainly I'm no HDL expert.

By way of a post-script, maybe there are a few new instructions worth introducing -- for example, add without carry. ADD would be equivalent to the sequence CLC ADC. Of course that would be handy as the first part of a multi-precision addition. What's less obvious is its use with a prefix (as noted above) to adjust X or Y. Example: instead of INX INX INX INX you'd have PFXn ADD #4.
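
To make the "PFXn ADD #4" idea concrete, here is a minimal Python sketch (not from the thread). The register dictionary and the `target` argument are illustrative assumptions standing in for the prefix's register aliasing; they are not part of any real core.

```python
# Hypothetical model of ADD (= CLC, ADC) with a prefix that lets X play A's role.
# The dict layout and the 'target' parameter are assumptions for illustration.

def adc(regs, operand, target="A"):
    """8-bit add with carry into regs[target]; updates the carry flag C."""
    total = regs[target] + operand + regs["C"]
    regs[target] = total & 0xFF
    regs["C"] = 1 if total > 0xFF else 0

def add(regs, operand, target="A"):
    """ADD as described above: clear carry, then ADC."""
    regs["C"] = 0
    adc(regs, operand, target)

regs = {"A": 0x10, "X": 0xFE, "Y": 0x00, "C": 1}
add(regs, 4, target="X")  # models "PFXn ADD #4" with A aliased to X
```

In this model X wraps from $FE to $02 and the carry ends up set, which is one visible difference from four INX instructions: INX leaves the carry flag alone.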

Quote:
To support the MMU and other 16-bit IO page operations, I need two opcodes for moving 16-bit values in an indivisible manner from zero page to the IO page and vice-versa.
I like the idea of 16-bit moves a lot! Is there a way they could be made more general -- I mean so they'd be available for other uses besides the MMU?

One way to achieve 16-bit moves would be with a "double-length accumulator" prefix. Even assuming you limit the prefix to LDA and STA operations, it'd still be very worthwhile. To be clear, PFXn LDA followed by PFXn STA would move 16 bits -- and any of the LDA/STA address modes could be used. So it could handle your MMU issue and much more.

cheers,
Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


 Post subject: Re: M65C02A Core
PostPosted: Sun Aug 03, 2014 5:06 pm 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Jeff:

Thanks for the feedback.

You are probably right that I have not explored the potential of the prefix instruction as fully as I should. However, the potential explosion in the number of instructions that have to be tested is somewhat scary. Furthermore, since I have changed many of the microprogram control fields from encoded to one-hot, it's not obvious at the moment how I might implement some of the register aliasing that you suggest. The one-hot control fields have reduced the combinatorial path lengths, and adding logic to induce register aliasing is likely to result in longer combinatorial control paths, defeating some of the performance gains I managed to include in the latest release of the core. Thus, after I've completed the basic set of new instructions given above, I will take a longer look at how to implement some of your register aliasing suggestions.

On the other hand, I would like to hear from you and others on some suggested instructions that can be used to support a DTC/ITC FORTH VM like those that you implemented for your KIM clone.

_________________
Michael A.


 Post subject: Re: M65C02A Core
PostPosted: Sun Aug 03, 2014 9:15 pm 

Joined: Sun Nov 08, 2009 1:56 am
Posts: 387
Location: Minnesota
Quote:
RMBx/SMBx zp
BBRx/BBSx zp,rel
PEA abs
PEI zp
PER rel16
REP/SEP #imm
ORA/AND/EOR/ADC/LDA/STA/CMP/SBC sp,S
ORA/AND/EOR/ADC/LDA/STA/CMP/SBC (sp,S),Y



I'd still call it "BBCx" rather than "BBRx".

PEA and PEI can be one mnemonic with two address modes. As with "BBRx", I imagine the only reason to keep the names is to provide backward compatibility with existing source.

The main use of REP/SEP has always been to switch register sizes between eight and 16 bits on the '816. I don't recall if you've implemented that on your core. If dual-size registers have not been implemented, what use are these instructions? And would any of those uses be so frequent as to be preferable to PHP, adjust (using the following "S" relative instructions, perhaps), and PLP?

Quote:
JMP (zp)
JSR (zp)
JSR (abs,X)



I would limit these to "JSR (abs)" and "JSR (abs,x)". It's true that if "abs" turns out to be a zero page location the assembled result will turn out to be one byte longer and one (?) cycle slower than a "(zp)" form, but that's always been true of "JMP (abs)". I know I've never been bothered by that; if I used a zero page location for "JMP (abs)" it's because it was faster to set up the vector than in a non-zero page location. But I never did it so often I longed for a specific "(zp)" form.

Quote:
MWT zp,(Y)
MWF zp,(Y)


I'm having trouble visualizing exactly what is meant by "(Y)". Even if it does mean something special, aren't the mnemonics themselves enough of a clue? Particularly since, AFAICT, no other instruction would use a "(Y)" mode.

Quote:
JMP rel16
JSR rel16


I love relative addressing modes for these instructions, but there is a problem with using these mnemonics. There seems no way to tell, just by examining the operand, whether it is meant to be absolute or relative or (if you go ahead and implement it) zero page. An assembler could make a "best guess" based perhaps on whether or not the destination address is within a signed 16-bit range, but that might not always match what the programmer had in mind.

Or that might actually be the defined behavior an assembler should by default follow, in which case it would be what the programmer should always expect. If there's a time penalty for using a relative form rather than an absolute form that might not be the best, however.

You might also consider using "BRA" and "BSR" for these instead, matching the "B--" of other relative branch instructions, with the proviso that these are 16-bit ranges, not eight-bit.


 Post subject: Re: M65C02A Core
PostPosted: Sun Aug 03, 2014 9:26 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3349
Location: Ontario, Canada
MichaelM wrote:
On the other hand, I would like to hear from you and others on some suggested instructions that can be used to support a DTC/ITC FORTH VM like those that you implemented for your KIM clone.
Hmmm... IMO the best approach is to target operations involving the IP. (For the uninitiated, that's "Interpretive Pointer.") The IP is the high-level (i.e., Forth) program counter. The main operation involving IP is NEXT -- which is very frequently executed, and thus an excellent candidate for optimization.

NEXT is basically an indirect jump via IP -- nothing terribly fancy. A Forth program is just a list of pointers, and IP indicates your current position in the list. The indirect jump in NEXT vectors execution to the 65xx code snippet that simulates the desired high-level instruction. IP needs to be incremented after the fetch, just like any program counter, so the complete definition of NEXT for 65xx is JMP (IP++). (What I've described is DTC, or direct-threaded code, used by many modern Forth implementations. Older implementations such as FIG Forth use indirect-threaded code, aka ITC, for which NEXT is defined as JMP ((IP++)).)
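
As a rough illustration of the two definitions of NEXT (not from the thread), here is a small Python model over a flat 64K byte array. The function names and the example addresses are made up for the sketch; cells are assumed to be 16-bit and little-endian, as usual on 65xx.

```python
# Minimal model of DTC and ITC NEXT. Names and addresses are illustrative only.

def read16(mem, addr):
    """Read a little-endian 16-bit cell."""
    return mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)

def next_dtc(mem, ip):
    """JMP (IP++): jump to the cell IP points at, then step IP to the next cell."""
    return read16(mem, ip), (ip + 2) & 0xFFFF

def next_itc(mem, ip):
    """JMP ((IP++)): same, but the cell points at a code field holding the jump target."""
    w = read16(mem, ip)
    return read16(mem, w), (ip + 2) & 0xFFFF

mem = bytearray(0x10000)
mem[0x2000:0x2002] = bytes([0x00, 0x30])  # thread cell at $2000 holds $3000
mem[0x3000:0x3002] = bytes([0x00, 0x40])  # code field at $3000 holds $4000

dtc_pc, dtc_ip = next_dtc(mem, 0x2000)
itc_pc, itc_ip = next_itc(mem, 0x2000)
```

With the same thread, DTC lands on $3000 directly while ITC takes one more memory read and lands on $4000; IP advances by one cell either way.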

Unoptimized 65xx Forth maps IP as a pair of bytes in zero page, and uses legacy 65xx instructions for all accesses to IP. ITC NEXT consumes almost 40 cycles. You could entirely bypass z-pg by adding IP to the M65C02A register set -- and then add NEXT to the instruction set and reap a huge speedup. Unfortunately, you'd also need new instructions for a dozen or so incidental operations involving IP, and that gets complicated.

My KimKlone uses an ambidextrous approach! :) IOW, I created an IP register and added JMP ((IP++)) to the 65c02 instruction set -- with an execution time of just 9 cycles! :mrgreen: But I also arranged that IP is accessible as a pair of bytes in zero page, and that allows incidental lower-priority operations to use ordinary 65xx instructions to manipulate IP.

cheers,
Jeff

ps- I agree with teamtempest's observation that new instructions present a challenge in regard to mnemonics. For the KK I threw up my hands and resigned myself to a mish-mash of made-up mnemonics that are ugly & lengthy -- but descriptive.

Edit: subsequent to this post I gave the KK registers new (and hopefully less confusing) names. That's right -- register renaming! -- but not in the usual sense. :) The doc on my web site was suitably updated, and you can find the list of new instructions here.

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Last edited by Dr Jefyll on Sun May 24, 2015 8:37 pm, edited 1 time in total.

 Post subject: Re: M65C02A Core
PostPosted: Mon Aug 04, 2014 4:34 am 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8155
Location: Midwestern USA
Dr Jefyll wrote:
For the KK I threw up my hands and resigned myself to a mish-mash of made-up mnemonics that are ugly & lengthy -- but descriptive.

LDNCPZPG,X -- Geesh! :lol:

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


 Post subject: Re: M65C02A Core
PostPosted: Mon Aug 04, 2014 5:15 am 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3349
Location: Ontario, Canada
Yep! I warned you -- ugly & lengthy! But, despite appearances, not cryptic. Every KimKlone programmer in the world (ahem! :oops: ) is sure to know about NCP, the New Code Pointer register. LDNCPZPG,X is just LoaD NCP using ZPG,X address mode.

(It could've been simply LDNCP, but my quick 'n' dirty assembler needs to have the address mode spelled out explicitly. :roll: )

In the case of Michael's project, hopefully there'll be a better assembler that can infer address modes in the usual way. Even so, with all the new instructions, I suspect three-letter mnemonics won't be adequate to describe the action in a way humans will readily identify. I suspect mnemonics of four letters or more will be required.

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


 Post subject: Re: M65C02A Core
PostPosted: Mon Aug 04, 2014 5:31 pm 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1927
Location: Sacramento, CA, USA
I agree whole-heartedly with teamtempest's bra/bsr recommendation. Or, would it be brl and bsl?

I agree with Dr. J that a thoughtful treatment of the NEXT mechanism is where the best gains can be made, even if it turns out to be little more than an efficient 16-bit memory increment by one or two. If you have a 16-bit pei (for DOCOLON and >R), then that's even better.

I don't agree with his 4+ char mnemonic idea, though. It just wouldn't feel 65xx enough for me.

Mike

P.S. How hard would auto-increment ( a la 6809 ) be to incorporate?


 Post subject: Re: M65C02A Core
PostPosted: Tue Aug 05, 2014 12:57 am 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Thanks for your responses and suggestions.

I too realized that it would be difficult to distinguish between a JMP/JSR abs and a JMP/JSR rel16. So I wholeheartedly agree that BRA/BSR are much better mnemonics for these two opcodes. I think that it should be easy to resolve whether the branch target is within an 8-bit or a 16-bit range and select the appropriate opcode. Therefore, I don't think it is necessary to use a mnemonic like BRL instead of BRA. (BTW, it may be possible to modify the conditional branch instructions with the prefix instruction to implement a 16-bit relative branch.)
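
A sketch of how an assembler might pick between the short and long forms of BRA (not from the thread; Python is used only for illustration). It assumes the offset is measured from the address of the following instruction, as with existing 65xx branches, and that the long form is a three-byte instruction; both the function name and those sizes are assumptions.

```python
# Hypothetical encoding choice for BRA: 8-bit relative if it reaches, else 16-bit.
# Instruction lengths (2 and 3 bytes) are assumed, not taken from the core's docs.

def pick_branch(pc, target):
    """Return (form, encoded_offset) for a branch located at address pc."""
    off8 = target - (pc + 2)              # relative to the next instruction (short form)
    if -128 <= off8 <= 127:
        return "rel8", off8 & 0xFF
    off16 = (target - (pc + 3)) & 0xFFFF  # long form; wraps modulo 64K
    return "rel16", off16

near_fwd = pick_branch(0x1000, 0x1010)
near_back = pick_branch(0x1000, 0x0FF0)
far = pick_branch(0x1000, 0x2000)
```

A real assembler would need more than one pass here, because choosing the longer form moves every later label and can push other branches out of 8-bit range.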

barrym95838 wrote:
P.S. How hard would auto-increment ( a la 6809 ) be to incorporate?

I don't think that it will be too difficult to implement a NEXT a la PDP-11/MC6809 with an auto-increment of the virtual IP.

I was thinking that I would implement NEXT as a single byte instruction. However, as Jeff pointed out above, NEXT is a jump indirect via IP (Interpretive/Instruction Pointer) with auto-increment. I have been thinking about Jeff's suggestion, and will implement the instruction he suggested with the IP in zero page: JMP (zp++). (Thanks very much Jeff. :) ) That instruction also suggests using zero page for implementing the other FORTH VM registers: W (Working Register), and PSP (Parameter Stack Pointer). The M65C02A page 1 stack can be used for the RS (Return Stack), and the PEI/PEA/PER instructions and stack relative addressing modes can be used for manipulating the return stack.

Can someone comment on whether the RS and PS (Parameter Stack) are best implemented in the 6502 processor stack or not?

teamtempest wrote:
I'm having trouble visualizing exactly what is meant by "(Y)". Even if it does mean something special, aren't the mnemonics themselves enough of a clue? Particularly since, AFAICT, no other instruction would use a "(Y)" mode.
The (Y) notation was intended to indicate that these two instructions are two-address instructions, in contrast to all other instructions. The first address is provided by the zp operand and the second address is the contents of register Y. The contents of register Y will index the IO page, which in the M65C02A is the 256-byte page 0xFF00-0xFFFF. Perhaps a notation closer to that used for the stack relative instructions might be clearer, but the generally accepted single address/single operand syntax of the 6502 makes it difficult to convey the two-address nature of these two instructions.
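
The two-address behaviour can be modelled in a few lines of Python (an illustrative sketch only). The function names are made up, which mnemonic maps to which direction is not stated here, and wrapping the indices within their pages is an assumption.

```python
# Hypothetical model of a 16-bit move between zero page and the IO page.
# zp is the zero-page operand; y indexes the IO page at $FF00-$FFFF.

IO_PAGE = 0xFF00

def move_word_zp_to_io(mem, zp, y):
    """Copy the two bytes at zero page zp, zp+1 to IO page offsets y, y+1."""
    for i in range(2):
        mem[IO_PAGE + ((y + i) & 0xFF)] = mem[(zp + i) & 0xFF]

def move_word_io_to_zp(mem, zp, y):
    """Copy the two bytes at IO page offsets y, y+1 to zero page zp, zp+1."""
    for i in range(2):
        mem[(zp + i) & 0xFF] = mem[IO_PAGE + ((y + i) & 0xFF)]

mem = bytearray(0x10000)
mem[0x20], mem[0x21] = 0x34, 0x12
move_word_zp_to_io(mem, 0x20, 0x10)  # zero page $20/$21 -> $FF10/$FF11
move_word_io_to_zp(mem, 0x40, 0x10)  # $FF10/$FF11 -> zero page $40/$41
```

The model says nothing about indivisibility; in hardware that property would come from the instruction completing both byte transfers before any interrupt is taken.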

You, BDD, and others have suggested changing the mnemonics for PEI and PEA on another thread. I don't disagree with the points that you have made. I only want the results. I think it has been suggested that these instructions be defined as:

    PHW #imm16
    PHW dp

Perhaps it would be advantageous to add a third instruction: PHW abs?

I am not sure that PER would serve much purpose if BRA/BSR rel16 were available unless it was also possible to perform these two operations based on the top two locations of the stack. Therefore, what would you say if REP/SEP #imm were not implemented, as you suggested, and instead BRA/BSR (sp,S) were implemented?

I like the idea of implementing JMP/JSR (zp), but I see your point regarding the extra cycle: it's really not that critical in the overall scheme. Thus, I will not implement those two instructions, and reserve the opcodes for other instructions.

I can see implementing some instructions which are the complements of the PHW instructions:

    PLW zp
    PLW abs

_________________
Michael A.


 Post subject: Re: M65C02A Core
PostPosted: Tue Aug 05, 2014 1:20 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
MichaelM wrote:
Can someone comment on whether the RS and PS (Parameter Stack) are best implemented in the 6502 processor stack or not?

The 6502's hardware stack works well for the Forth return stack, while ZP works well for the parameter stack.

As I originally learned it nearly 25 years ago, "W" in Forth is the word pointer, and "IP" was the instruction pointer (since compiled code does not get interpreted).

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


 Post subject: Re: M65C02A Core
PostPosted: Tue Aug 05, 2014 2:40 am 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8155
Location: Midwestern USA
MichaelM wrote:
You, BDD, and others have suggested changing the mnemonics for PEI and PEA on another thread. I don't disagree with the points that you have made. I only want the results. I think it has been suggested that these instructions be defined as:

    PHW #imm16
    PHW dp

Perhaps it would be advantageous to add a third instruction: PHW abs?

Too bad the 65C816 doesn't have that instruction. It has to be synthesized with REP #%00100000 -- LDA ABS_ADDR -- PHA.

Quote:
I am not sure that PER would serve much purpose if BRA/BSR rel16 were available unless it was also possible to perform these two operations based on the top two locations of the stack.

PER is very useful, because the value that ends up on the stack is computed at run-time, making fully relocatable code possible. For example, a reference to a data table would become portable if PER were used to push the data table's address, rather than have the address set at assembly time. If all such references are generated by PER, then the program can be loaded anywhere and it will run without alteration.
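
A small Python model of why PER gives relocatable references (a sketch, not from the thread). It follows the 65C816 behaviour: PER is a three-byte instruction that pushes the address of the next instruction plus a 16-bit displacement, high byte first. The page 1 stack with an 8-bit pointer and the example addresses are assumptions.

```python
# Model of PER: push (pc + 3 + displacement) mod 64K onto a page 1 stack.
# Example addresses are made up for illustration.

def per(mem, sp, pc, disp16):
    """Execute PER located at pc; return the new stack pointer and the pushed address."""
    addr = (pc + 3 + disp16) & 0xFFFF
    mem[0x0100 + sp] = addr >> 8       # high byte first
    sp = (sp - 1) & 0xFF
    mem[0x0100 + sp] = addr & 0xFF     # then low byte
    sp = (sp - 1) & 0xFF
    return sp, addr

# Assembled once: PER at offset 0 of the program, data table at offset $100.
disp = 0x0100 - 3

mem = bytearray(0x10000)
sp_a, table_a = per(mem, 0xFF, 0x2000, disp)  # program loaded at $2000
sp_b, table_b = per(mem, 0xFF, 0x8000, disp)  # same bytes loaded at $8000
```

The displacement is fixed at assembly time, yet the pushed address follows the load address, which is the property that makes the code position-independent.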

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


 Post subject: Re: M65C02A Core
PostPosted: Tue Aug 05, 2014 3:20 pm 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1927
Location: Sacramento, CA, USA
MichaelM wrote:
... I was thinking that I would implement NEXT as a single byte instruction. However, as Jeff pointed out above, NEXT is a jump indirect via IP (Interpretive/Instruction Pointer) with auto-increment. I have been thinking about Jeff's suggestion, and will implement the instruction he suggested with the IP in zero page: JMP (zp++). (Thanks very much Jeff. :) ) That instruction also suggests using zero page for implementing the other FORTH VM registers: W (Working Register), and PSP (Parameter Stack Pointer). The M65C02A page 1 stack can be used for the RS (Return Stack), and the PEI/PEA/PER instructions and stack relative addressing modes can be used for manipulating the return stack.

The (zp++) idea sounds cool (the PDP-8 was doing something similar 45 years ago), but it might take some careful thought to implement. Do you want to use an auto-increment address mode, like the 6809, or do you want to make certain zp address ranges increment (or double increment :!: ) automatically when used by the indirect modes? The PDP-8 did the latter, but this method could be a bit confusing for beginners. Also, does the pointer need to be avoided by the next machine instruction, to give it time to finish incrementing without holding up the instruction stream?

Quote:
Can someone comment on whether the RS and PS (Parameter Stack) are best implemented in the 6502 processor stack or not?

Garth will be the resident expert on that, and he will certainly tell you to put the parameter stack in page 0 (with X for a pointer), and the return stack in page 1 (with S for a pointer). dclxvi might have something to say about this too, and he has had several intriguing ideas in the past that fiddled with this protocol, but IMO you can't go wrong with the tried-and-true standard.

I'm excited that you have been able to make such significant progress on your project, and will be (a bit enviously) following it with interest.

Mike


 Post subject: Re: M65C02A Core
PostPosted: Wed Aug 06, 2014 5:24 am 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3349
Location: Ontario, Canada
MichaelM wrote:
I have been thinking about Jeff's suggestion, and will implement the instruction he suggested with the IP in zero page: JMP (zp++).
Glad you like the idea, Michael. Would your JMP (zp++) instruction include an operand following the opcode (to specify which zero-page address to use)? The operand can be omitted (and one cycle saved) if it's implicit that JMP (zp++) will always use a certain pair of z-pg addresses. That's the route I took for my KK NEXT -- it is a one-byte instruction, and it's implicit that $48 and $49 is where IP resides.

Of course it's fair to ask whether it'd be better to have something more general. (Mike asks almost the same question in his last post: is it an address mode, or is it special behavior of a designated area of memory?) Without wishing to sway you one way or the other, here's an observation to consider. If the Interpretive Pointer is physically on-chip, there's no need to consume 2 extra bus cycles fetching those 16 bits of IP -- an obvious point in favor of IP being on-chip. But if the JMP (zp++) instruction uses an operand to specify where IP resides, then saving those 2 extra bus cycles means you must map all of zero-page on-chip. At stake is a delay of (maybe) up to 3 cycles altogether.

MichaelM wrote:
Can someone comment on whether the RS and PS (Parameter Stack) are best implemented in the 6502 processor stack or not?
Certainly the historical 65xx usage is X for the Parameter-stack pointer and S for the Return-stack pointer. It never occurred to me there'd ever be a reason to change that; but, as sometimes happens, a naive comment from a novice triggered some unexpected insight.

X is actually sub-optimal as a P-stack pointer in that the sequence DEX then STA 0,X takes twice as long as PHA, for example. We do a lot of pushing (and pulling) in Forth, so the matter isn't trivial. But X's poor push/pull performance is outweighed by the immense utility of z-pg,X and (z-pg,X) address modes. So, for 6502 and 65c02, the justification for X as a P-stack pointer is clear.
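
The "twice as long" figure can be checked with a quick tally in Python (illustrative only). The cycle counts are the standard 6502/65C02 figures for one byte; a 16-bit cell on an 8-bit part costs more, but the ratio is the point.

```python
# Cycle tally for pushing and pulling one byte via X versus via S.
# Counts are the documented 6502/65C02 timings for these instructions.

CYCLES = {"DEX": 2, "INX": 2, "STA zp,X": 4, "LDA zp,X": 4, "PHA": 3, "PLA": 4}

push_via_x = CYCLES["DEX"] + CYCLES["STA zp,X"]  # DEX then STA 0,X
push_via_s = CYCLES["PHA"]
pull_via_x = CYCLES["LDA zp,X"] + CYCLES["INX"]  # LDA 0,X then INX
pull_via_s = CYCLES["PLA"]
```

That gives 6 cycles against 3 for a push and 6 against 4 for a pull.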

The insight is this. Now that sp,S and (sp,S),Y address modes are available (on the '816 and Michael's M65C02A), X no longer has vastly greater utility than S -- in fact, (sp,S),Y surpasses X in a manner that Forth can use to good advantage. The longstanding 6502 tradeoff (tolerating slow P-stack push/pulls via X) is now clearly open to review. I don't advocate that anyone should rewrite an existing Forth. But IMO any new '816 Forth should break with tradition and use S for the P-stack pointer and X for the R-stack pointer! :idea: :evil: :D

Is there a gotcha I've overlooked? Am I reinventing someone else's idea? (And am I de-railing Michael's thread? :oops: )

cheers,
Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


 Post subject: Re: M65C02A Core
PostPosted: Wed Aug 06, 2014 8:40 am 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1927
Location: Sacramento, CA, USA
The best way to find out is to throw some code out there and see what happens, doc!

The '816 will have a clear advantage over Michael's 'C02A for a traditional 16-bit cell Forth, because the '816 has 16-bit registers. For example:

Code:
; 65c816 in 16-bit accumulator mode
fetch: ; psp in x
        PRIMITIVE
        lda  (0,x)
        sta  0,x
        NEXT

fetch: ; psp in s
        PRIMITIVE
        ldy  #0
        lda  (1,s),y
        sta  1,s
        NEXT

For a simple but popular word like @ (fetch), the psp in x looks like a good choice for the '816.

How about Michael's core?
Code:
; m65c02a has an 8-bit accumulator
fetch: ; psp in x
        PRIMITIVE
        lda  (0,x)
        pha
        inc  0,x
        bne  *+4        ; skip the 2-byte inc 1,x unless the low byte wrapped
        inc  1,x
        lda  (0,x)
        sta  1,x
        pla
        sta  0,x
        NEXT

fetch: ; psp in s
        PRIMITIVE
        ldy  #0
        lda  (1,s),y
        pha
        iny
        lda  (2,s),y
        sta  3,s
        pla
        sta  1,s
        NEXT

I am not familiar with Michael's core yet, and I may have missed an optimization, but it looks like a slight win for psp in s.

I realize that this is a very isolated example, but I think that it suggests that any difference in efficiency from the choice of psp register is dwarfed by the difference resulting from registers that can hold an entire cell and ones that can't.
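
For reference, what every version of @ above has to accomplish can be written as a short Python model (a sketch, not from the thread; the function name and addresses are made up). It treats memory as a byte array, so the two separate byte reads that an 8-bit accumulator forces are visible.

```python
# Model of Forth @ (fetch): replace the address on top of the parameter stack
# with the 16-bit cell stored at that address. Addresses are illustrative.

def fetch_cell(mem, tos_addr):
    """tos_addr is where the top-of-stack cell lives (low byte first)."""
    ptr = mem[tos_addr] | (mem[tos_addr + 1] << 8)
    lo = mem[ptr]                    # first byte-wide read
    hi = mem[(ptr + 1) & 0xFFFF]     # second byte-wide read
    mem[tos_addr] = lo
    mem[tos_addr + 1] = hi

mem = bytearray(0x10000)
mem[0x80], mem[0x81] = 0x00, 0x30      # TOS holds the address $3000
mem[0x3000], mem[0x3001] = 0xCD, 0xAB  # the cell at $3000 holds $ABCD
fetch_cell(mem, 0x80)
```

Both bytes must be read before either byte of the pointer is overwritten, which is why the 8-bit versions park the low byte with pha until the high byte has been fetched.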

Mike


 Post subject: Re: M65C02A Core
PostPosted: Wed Aug 06, 2014 9:35 am 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3349
Location: Ontario, Canada
Hmmm.. not a bad start, Mike. It takes effort to choose some examples and work through 'em, so thank you for that. I agree that's how the truth will be revealed.

Quote:
I realize that this is a very isolated example [...]
Yes -- and unfortunately @ neither grows nor shrinks the stack, making it an exception to the biggest advantage I was touting for psp in S. Where I expect to save a few cycles each is with words that do grow/shrink the stack, such as DOCON DOVAR LITERAL DUP OVER + - AND OR R@ 0BRANCH etc. But I was glad to see you used (sp,S),y mode... illustrating how that mode's non-optional post-indirection indexing via Y is sometimes a powerful plus and other times just a nuisance! :roll:

Quote:
any difference in efficiency from the choice of psp register is dwarfed by the difference resulting from registers that can hold an entire cell and ones that can't.
Right, although there's no debate over whether cell-wide registers are better -- they definitely are a major advantage! I don't claim the gains of psp in S would be similarly profound. But they're readily sufficient to overturn the traditional compromise of placing psp in X.

Code:
; 65c816 in 16-bit accumulator mode
plus:  ; psp in X
        clc
        lda  0,x        ;3 instructions to pop
        inx             ;a pair of bytes
        inx             ;from P-stack
        adc  0,x
        sta  0,x

plus:  ; psp in S
        clc
        pla             ;1 instruction to pop a pair of bytes from P-stack
        adc  1,s
        sta  1,s
The example here ( + ) shrinks the stack, and words that grow the stack will show similar improvement. Other words show little gain or loss (like @ as we've seen). AFAIK the only downside to having the P-stack pointer in S is that X's slow push/pop performance ends up hindering the R-stack instead. But the R-stack carries less activity than the P-stack, particularly on inner loops where performance gains are more meaningful. So, IMO psp in S is the best choice for anyone writing a new Forth specific to the '816 (or M65C02A).
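
A rough cycle tally for the two versions of + (a Python sketch, not from the thread). The per-instruction counts are as I recall them from the WDC 65C816 data sheet, with one extra cycle for each 16-bit accumulator access and the direct page register assumed to be page-aligned; please verify them before relying on the totals. NEXT is left out because it is the same in both cases.

```python
# Assumed 65C816 cycle counts with a 16-bit accumulator (m=0), direct page aligned.

M16 = 1  # extra cycle per accumulator memory access when A is 16 bits wide

plus_psp_in_x = [("clc", 2), ("lda 0,x", 4 + M16), ("inx", 2), ("inx", 2),
                 ("adc 0,x", 4 + M16), ("sta 0,x", 4 + M16)]
plus_psp_in_s = [("clc", 2), ("pla", 4 + M16),
                 ("adc 1,s", 4 + M16), ("sta 1,s", 4 + M16)]

def total(seq):
    """Sum the cycle counts of an instruction sequence."""
    return sum(cycles for _, cycles in seq)

cycles_x = total(plus_psp_in_x)
cycles_s = total(plus_psp_in_s)
```

Under those assumptions the psp-in-S version comes out 4 cycles shorter (17 against 21) and two instructions smaller.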

cheers,
Jeff

[edit: add code and final paragraph. Then misc edits]

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Last edited by Dr Jefyll on Thu Aug 07, 2014 12:29 am, edited 2 times in total.

 Post subject: Re: M65C02A Core
PostPosted: Wed Aug 06, 2014 1:12 pm 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Thanks very much for the continuation of the discussion. It is clear that if the core could be coerced to support 16-bit operations using a dedicated instruction or by using a prefix code, then there would be substantial savings in the number of opcode/operand fetch cycles. At this moment I am not convinced that the microprogram architecture of the M65C02A core is capable of performing the necessary operations without dropping back occasionally to standard 6502/65C02 instructions.

However, I have substantially modified the microprogram architecture of the core such that it can support the programmatic use of the ALU control ROM. What this means is that a microprogram sequence can use the ALU, the programmer visible registers (A, X, Y, S, and P), and several internal temporary registers (OP1, OP2) on a microcycle basis. Although there are a limited number of internal data paths, the address generator, which consists of an independent 16-bit adder (with microprogrammed controlled carry input), the 16-bit PC, the 8-bit stack pointer register S (with its built-in increment/decrement unit), and a 16-bit temporary memory address register (MAR), can sequence through memory without requiring the use of the ALU. This allows the microprogram to use the ALU in an independent manner.

A quick summary from your responses so far is that it is definitely advantageous to implement NEXT as a microprogram sequence. At this moment I prefer to implement IP as an external location in zero page. Jeff's suggestion, from his KimKlone work, that the location of IP be implicit has merit and is something that the microprogram architecture can support. (This capability is in the current implementation and was recently enhanced to support 16-bit relative addressing.) It also appears that there are several other key FORTH words that would be worth implementing as microsequences: ENTER, EXIT, @ (load (TOS)), and ! (store (TOS),NOS). It also appears that it would be nice to support the common operations like addition (+), subtraction (-), AND, ORA, EOR, etc. from the parameter stack for cell-sized (16-bit) operations.

Is this a good summary of your recommendations to date?

I also gather from the discussions so far that the issue of where to allocate the parameter stack and which processor register to use is not yet resolved. It does appear that it would be advantageous to be able to support stack operations with either X or S. From your comments, and those in Brad Rodriguez's Moving Forth articles, the two stack pointers of the 6809 processor provided that processor a definite advantage as a FORTH engine over competitors such as the 8051, Z80, or the 6502. Although not supported by the M65C02A microarchitecture at the moment, it may be possible to add this feature without impacting performance too much.

I have to run off to work, but I do have a couple of questions regarding the implementations of @ (fetch) provided above by Mike. I will formulate those questions later today after work.

Again thanks to all for continuing the discussions.

_________________
Michael A.

