M65C02A Core

barrym95838 · Post by **barrym95838** » Wed Aug 06, 2014 8:40 am

The best way to find out is to throw some code out there and see what happens, doc!

The '816 will have a clear advantage over Michael's 'C02A for a traditional 16-bit cell Forth, because the '816 has 16-bit registers. For example:

Code:

; 65c816 in 16-bit accumulator mode
fetch: ; psp in x
        PRIMITIVE
        lda  (0,x)
        sta  0,x
        NEXT

fetch: ; psp in s
        PRIMITIVE
        ldy  #0
        lda  (1,s),y
        sta  1,s
        NEXT

For a simple but popular word like @ (fetch), the psp in x looks like a good choice for the '816.

How about Michael's core?

Code:

; m65c02a has an 8-bit accumulator
fetch: ; psp in x
        PRIMITIVE
        lda  (0,x)
        pha
        inc  0,x
        bne  *+4        ; skip inc 1,x unless the low byte wrapped
        inc  1,x
        lda  (0,x)
        sta  1,x
        pla
        sta  0,x
        NEXT

fetch: ; psp in s
        PRIMITIVE
        ldy  #0
        lda  (1,s),y
        pha
        iny
        lda  (2,s),y
        sta  3,s
        pla
        sta  1,s
        NEXT

I am not familiar with Michael's core yet, and I may have missed an optimization, but it looks like a slight win for psp in s.

I realize that this is a very isolated example, but I think that it suggests that any difference in efficiency from the choice of psp register is dwarfed by the difference resulting from registers that can hold an entire cell and ones that can't.

Mike

Dr Jefyll · Post by **Dr Jefyll** » Wed Aug 06, 2014 9:35 am

Hmmm.. not a bad start, Mike. It takes effort to choose some examples and work through 'em, so thank you for that. I agree that's how the truth will be revealed.

Quote:

I realize that this is a very isolated example [...]

Yes -- and unfortunately @ neither grows nor shrinks the stack, making it an exception to the biggest advantage I was touting for psp in S. Where I expect to save a few cycles each is with words that do grow/shrink the stack, such as DOCON DOVAR LITERAL DUP OVER + - AND OR R@ 0BRANCH etc. But I was glad to see you used (sp,S),y mode... illustrating how that mode's non-optional post-indirection indexing via Y is sometimes a powerful plus and other times just a nuisance!

Quote:

any difference in efficiency from the choice of psp register is dwarfed by the difference resulting from registers that can hold an entire cell and ones that can't.

Right, although there's no debate over whether cell-wide registers are better -- they definitely are a major advantage! I don't claim the gains of psp in S would be similarly profound. But they're readily sufficient to overturn the traditional compromise of placing psp in X.

Code:

; 65c816 in 16-bit accumulator mode
plus:  ; psp in X
        clc
        lda  0,x        ;3 instructions to pop
        inx             ;a pair of bytes
        inx             ;from P-stack
        adc  0,x
        sta  0,x

plus:  ; psp in S
        clc
        pla             ;1 instruction to pop a pair of bytes from P-stack
        adc  1,s
        sta  1,s

The example here ( + ) shrinks the stack, and words that grow the stack will show similar improvement. Other words show little gain or loss (like @ as we've seen). AFAIK the only downside to having the P-stack pointer in S is that X's slow push/pop performance ends up hindering the R-stack instead. But the R-stack carries less activity than the P-stack, particularly on inner loops where performance gains are more meaningful. So, IMO psp in S is the best choice for anyone writing a new Forth specific to the '816 (or M65C02A).

cheers,
Jeff

[edit: add code and final paragraph. Then misc edits]

MichaelM · Post by **MichaelM** » Wed Aug 06, 2014 1:12 pm

Thanks very much for the continuation of the discussion. It is clear that if the core could be coerced to support 16-bit operations using a dedicated instruction or by using a prefix code, then there would be substantial savings in the number of opcode/operand fetch cycles. At this moment I am not convinced that the microprogram architecture of the M65C02A core is capable of performing the necessary operations without dropping back occasionally to standard 6502/65C02 instructions.

However, I have substantially modified the microprogram architecture of the core such that it can support the programmatic use of the ALU control ROM. What this means is that a microprogram sequence can use the ALU, the programmer-visible registers (A, X, Y, S, and P), and several internal temporary registers (OP1, OP2) on a microcycle basis. Although there is a limited number of internal data paths, the address generator, which consists of an independent 16-bit adder (with a microprogram-controlled carry input), the 16-bit PC, the 8-bit stack pointer register S (with its built-in increment/decrement unit), and a 16-bit temporary memory address register (MAR), can sequence through memory without requiring the use of the ALU. This allows the microprogram to use the ALU in an independent manner.

A quick summary from your responses so far is that it is definitely advantageous to implement NEXT as a microprogram sequence. At this moment I prefer to implement IP as an external location in zero page. Jeff's suggestion, from his KimKlone work, that the location of IP be implicit has merit and is something that the microprogram architecture can support. (This capability is in the current implementation and was recently enhanced to support 16-bit relative addressing.) It also appears that there are several other key FORTH words that would be worth implementing as microsequences: ENTER, EXIT, @ (load (TOS)), and ! (store (TOS),NOS). It also appears that it would be nice to support the common operations like addition (+), subtraction (-), AND, ORA, EOR, etc. from the parameter stack for cell-sized (16-bit) operations.

Is this a good summary of your recommendations to date?

I also gather from the discussions so far that the issue of where to allocate the parameter stack and which processor register to use is not yet resolved. It does appear that it would be advantageous to be able to support stack operations with either X or S. From your comments, and those in Brad Rodriguez's Moving Forth articles, the two stack pointers of the 6809 processor provided that processor a definite advantage as a FORTH engine over competitors such as the 8051, Z80, or the 6502. Although not supported by the M65C02A microarchitecture at the moment, it may be possible to add this feature without impacting performance too much.

I have to run off to work, but I do have a couple of questions regarding the implementations of @ (fetch) provided above by Mike. I will formulate those questions later today after work.

Again thanks to all for continuing the discussions.

teamtempest · Post by **teamtempest** » Wed Aug 06, 2014 3:05 pm

Quote:

teamtempest wrote:
I'm having trouble visualizing exactly what is meant by "(Y)". Even if it does mean something special, aren't the mnemonics themselves enough of a clue? Particularly since, AFAICT, no other instruction would use a "(Y)" mode.

The (Y) notation was intended to indicate that these two instructions are two-address instructions, in contrast to all other instructions. The first address is provided by the zp operand and the second address is the contents of register Y. The contents of register Y will index the IO page, which in the M65C02A is the 256-byte page 0xFF00:FFFF. Perhaps a notation closer to that used for the stack relative instructions might be clearer, but the generally accepted single address/single operand syntax of the 6502 makes it difficult to convey the two-address nature of these two instructions.

I think I've got what's supposed to happen figured out. It's really more like this:

Code:

  MWT zp,io,y
  MWF zp,io,y

where "io" is just a placeholder for a fixed address.

Two operand instructions are not unknown. It's exactly what all the "BBSx" and "BBRx" instructions are. It's just expected that the programmer will know that the first is a zero page address and the second is relative.

I'm a little curious how you expect to use indexing, though. Presumably in a loop, but how will that work with a fixed zero page address? If anything, in a vague I-haven't-really-thought-about-it way I would have expected the indexing to occur on the zero page address, not the io address.

barrym95838 · Post by **barrym95838** » Wed Aug 06, 2014 10:09 pm

MichaelM wrote:

... I do have a couple of questions regarding the implementations of @ (fetch) provided above by Mike. I will formulate those questions later today after work ...

I hope that I didn't mess up; it was after my bed-time, and I sometimes come up with some real whoppers when I'm tired.

Here's an example that may lead you to think that I'm completely batty ... it involves extending the width of the accumulator with a carry register instead of a carry flag. In the case of my 65m32, it allows 64-bit arithmetic and bit-field manipulation with a minimum of op-code space but excellent machine-code density, and it could theoretically give the m65c02a some neat 16-bit arithmetic capabilities as well.

I have been struggling with the details for some time (months now), but I can't get past the feeling that I could be onto something epic. It's right at the edge of my abilities to define and express, and I don't want to give up on it just to be done ... I want to design something special, and I can't shake the feeling that this detail could be "the icing on the cake". It just needs to be developed by someone a little bit smarter than me ... unless I get a tiny bit lucky, soon.

Anybody got any spare epiphanies for sale?

Mike

MichaelM · Post by **MichaelM** » Wed Aug 06, 2014 11:22 pm

teamtempest wrote:

I'm a little curious how you expect to use indexing, though. Presumably in a loop, but how will that work with a fixed zero page address? If anything, in a vague I-haven't-really-thought-about-it way I would have expected the indexing to occur on the zero page address, not the io address.

The IO page address is 0xFF00. Thus the address within the page is simply the value of the Y register. Perhaps the addressing modes could be described in a different manner, but the intent is that Y provides the register offset in the IO page.

Mike:

I thought that there may have been an error with the inc 0,x instruction following the pha in your code example. However, after more study, it is apparent that after you save on the stack the low byte of the cell accessed by lda (0,x), it is necessary to increment the 16-bit pointer with the inc 0,x instruction in order to reach the high byte of the cell, and to propagate any carry with the inc 1,x instruction.
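For anyone who wants to check the pointer arithmetic, here is a minimal Python sketch of that byte-at-a-time fetch with psp in X. It is only a model under stated assumptions: memory is a flat 64 KiB bytearray, zero-page indexing wraps within page zero as on a stock 65C02, and the name `fetch_8bit` is purely illustrative, not taken from any actual Forth.

```python
def fetch_8bit(mem, x):
    """Model of @ (fetch) with psp in X and an 8-bit accumulator.

    mem is a mutable 64 KiB bytearray; x is the zero-page offset of the
    top-of-stack cell (low byte at x, high byte at x+1).
    Assumption: zp,x indexing wraps inside page zero.
    """
    lo_slot = x & 0xFF
    hi_slot = (x + 1) & 0xFF

    ptr = mem[lo_slot] | (mem[hi_slot] << 8)
    low = mem[ptr]                                 # lda (0,x) ; pha

    mem[lo_slot] = (mem[lo_slot] + 1) & 0xFF       # inc 0,x
    if mem[lo_slot] == 0:                          # bne skips the next step
        mem[hi_slot] = (mem[hi_slot] + 1) & 0xFF   # inc 1,x (carry into high byte)

    ptr = mem[lo_slot] | (mem[hi_slot] << 8)
    high = mem[ptr]                                # lda (0,x)

    mem[hi_slot] = high                            # sta 1,x
    mem[lo_slot] = low                             # pla ; sta 0,x
    return low | (high << 8)
```

A pointer such as $12FF is the interesting case, because the low byte wraps to $00 and the carry has to reach the high byte before the second load.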

GARTHWILSON · Post by **GARTHWILSON** » Thu Aug 07, 2014 3:02 am

Dr Jefyll wrote:

I don't claim the gains of psp in S would be similarly profound. But they're readily sufficient to overturn the traditional compromise of placing psp in X.

Code:

; 65c816 in 16-bit accumulator mode
plus:  ; psp in X
        clc
        lda  0,x        ;3 instructions to pop
        inx             ;a pair of bytes
        inx             ;from P-stack
        adc  0,x
        sta  0,x

plus:  ; psp in S
        clc
        pla             ;1 instruction to pop a pair of bytes from P-stack
        adc  1,s
        sta  1,s

That's 19 versus 17 clocks in 16-bit mode, so after you add NEXT, the difference in speed is insignificant. It would be nice to have a single-cycle double INX and double DEX. That would take the first version above down to 16 clocks, faster (but again insignificantly so) than the second version.

barrym95838 · Post by **barrym95838** » Thu Aug 07, 2014 6:47 am

Dr Jefyll wrote:

... AFAIK the only downside to having the P-stack pointer in S is that X's slow push/pop performance ends up hindering the R-stack instead. But the R-stack carries less activity than the P-stack, particularly on inner loops where performance gains are more meaningful. So, IMO psp in S is the best choice for anyone writing a new Forth specific to the '816 (or M65C02A).

Once again, I say let the code do the talking:

Code:

; m65c02a, x=PSP              ; m65c02a, s=PSP              ; 65m32, x=PSP, y=IP, a=TOS

ENTER:                        ENTER:                        ENTER:
        lda  IP+1                     dex                           pdy  #0,u
        pha                           dex                           jmp  NEXT
        lda  IP                       lda  IP+1
        pha                           sta  1,x
        lda  W+1                      lda  IP
        sta  IP+1                     sta  0,x
        lda  W                        lda  W+1
        sta  IP                       sta  IP+1
        jmp  NEXT                     lda  W
                                      sta  IP
                                      jmp  NEXT

EXIT:                         EXIT:                         EXIT:
        PRIMITIVE                     PRIMITIVE                     PRIMITIVE
        pla                           lda  0,x                      ply
        sta  IP                       sta  IP                       jmp  NEXT
        pla                           lda  1,x
        sta  IP+1                     sta  IP+1
        jmp  NEXT                     inx
                                      inx
                                      jmp  NEXT

drop: ; ( x --  )             drop: ; ( x --  )             drop: ; ( x --  )
        PRIMITIVE                     PRIMITIVE                     PRIMITIVE
        inx                           pla                           lda  0,x+
        inx                           pla                           jmp  NEXT
        jmp  NEXT                     jmp  NEXT

nip: ; ( x1 x2 -- x2 )        nip: ; ( x1 x2 -- x2 )        nip: ; ( x1 x2 -- x2 )
        PRIMITIVE                     PRIMITIVE                     PRIMITIVE
        lda  0,x                      lda  1,s                      inx
        sta  2,x                      sta  3,s                      jmp  NEXT
        lda  1,x                      lda  2,s
        sta  3,x                      sta  4,s
        bra  drop+2                   bra  drop+2

dup: ; ( x -- x x )           dup: ; ( x -- x x )           dup: ; ( x -- x x )
        PRIMITIVE                     PRIMITIVE                     PRIMITIVE
        dex                           lda 2,s                       sta  0,-x
        dex                           pha                           jmp  NEXT
        lda  2,x                      lda 2,s
        sta  0,x                      pha
        lda  3,x                      jmp  NEXT
        sta  1,x
        jmp  NEXT

swap: ; ( x1 x2 -- x2 x1 )    swap: ; ( x1 x2 -- x2 x1 )    swap: ; ( x1 x2 -- x2 x1 )
        PRIMITIVE                     PRIMITIVE                     PRIMITIVE
        lda  0,x                      pla                           exa  0,x
        pha                           sta  N                        jmp  NEXT
        lda  1,x                      pla
        pha                           tay
        lda  2,x                      lda  2,s
        sta  0,x                      pha
        lda  3,x                      lda  2,s
        sta  1,x                      pha
        pla                           tya
        sta  3,x                      sta  4,s
        pla                           lda  N
        sta  2,x                      sta  3,s
        jmp  NEXT                     jmp  NEXT

fetch: ; ( addr -- x )        fetch: ; ( addr -- x )        fetch: ; ( addr -- x )
        PRIMITIVE                     PRIMITIVE                     PRIMITIVE
        lda  (0,x)                    ldy  #0                       lda  0,a
        pha                           lda  (1,s),y                  jmp  NEXT
        inc  0,x                      pha
        bne  fetch2                   iny
        inc  1,x                      lda  (2,s),y
fetch2: lda  (0,x)                    sta  3,s
        sta  1,x                      pla
        pla                           sta  1,s
        sta  0,x                      jmp  NEXT
        jmp  NEXT

plus: ; ( n1 n2 -- n3 )       plus: ; ( n1 n2 -- n3 )       plus: ; ( n1 n2 -- n3 )
        PRIMITIVE                     PRIMITIVE                     PRIMITIVE
        lda  0,x                      pla                           add  0,x+
        clc                           clc                           jmp  NEXT
        adc  2,x                      adc  2,s
        sta  2,x                      sta  2,s
        lda  1,x                      pla
        adc  3,x                      adc  2,s
        sta  3,x                      sta  2,s
        bra  drop+2                   jmp  NEXT

store: ; ( x addr --  )       store: ; ( x addr --  )       store: ; ( x addr --  )
        PRIMITIVE                     PRIMITIVE                     PRIMITIVE
        lda  2,x                      ldy  #0                       ldb  0,x+
        sta  (0,x)                    lda  3,s                      stb  0,a
        inc  0,x                      sta  (1,s),y                  bra  drop+1
        bne  store2                   iny
        inc  1,x                      lda  4,s
store2: lda  3,x                      sta  (1,s),y
        sta  (0,x)                    pla
        inx                           pla
        inx                           bra  drop+2
        bra  drop+2

Just as Garth asserted that the choice of PSP register seems insignificant for the '816, it appears to have negligible effect for Michael's m65c02a as well, unless I'm missing some opportunities to optimize. BTW, I included the equivalent words from my 65m32 Forth to show how much shorter (in source form, at least) the primitives can be with cell-width registers, TOS and IP in registers, and auto-increment/decrement.
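As a sanity check on the stack-relative offsets in the s=PSP version of plus, here is a small Python model of that exact instruction sequence. It is a sketch under assumptions: the stack is a Python list whose last element is the byte at 1,s, cells sit with the low byte on top of the high byte, and `plus_psp_in_s` is a made-up name for illustration.

```python
def plus_psp_in_s(stack):
    """Byte-wise model of plus with psp in S on an 8-bit accumulator.

    stack[-1] is the byte at 1,s and stack[-2] is the byte at 2,s.
    Each cell is stored low byte on top, high byte beneath it.
    """
    a = stack.pop()                     # pla        low byte of n2
    total = a + stack[-2]               # clc
    carry = total >> 8                  # adc 2,s    low byte of n1
    stack[-2] = total & 0xFF            # sta 2,s

    a = stack.pop()                     # pla        high byte of n2 (carry untouched)
    total = a + stack[-2] + carry       # adc 2,s    high byte of n1
    stack[-2] = total & 0xFF            # sta 2,s
    return stack
```

With n1 = $12FF and n2 = $0001 the list starts as `[0x12, 0xFF, 0x00, 0x01]` and ends as `[0x13, 0x00]`, that is $1300 with its low byte on top. The same 2,s offset works for both halves because each pla moves the remaining bytes one slot closer to the top.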

[Edit: fixed code snippet for 65m32's ENTER]
[Edit: fixed code snippets for m65c02a's ENTER]

Dr Jefyll · Post by **Dr Jefyll** » Thu Aug 07, 2014 9:12 am

Golly, Mike -- up past your bedtime again?

I am, so forgive me if I just skim what you posted. I'm glad to see the 65m32 flame is still burning. As for the m65c02a snippets, eyeballing the code shows no surprises but we don't have clock counts available, so I'd prefer to kick the "X vs S" ball around using '816. The focus is on PHA vs DEX DEX STA 0,X and the equivalent Pull operations. Which Forth stack should use which register??

GARTHWILSON wrote:

That's 19 versus 17 clocks in 16-bit mode, so after you add NEXT, the difference in speed is insignificant. It would be nice to have a single-cycle double INX and double DEX. That would take the first version above down to 16 clocks, faster (but again insignificantly so) than the second version.

I don't arrive at the same numbers or the same interpretation.

In fairness I grant that everything hinges on my premise that a fast P-stack is more important than a fast R-stack. My reasoning -- half-baked or maybe 2/3 or 3/4 baked -- is as follows: Forth code that's written for maximum performance does minimal nesting & un-nesting (i.e., R-stack pushing & pulling) in the inner loops that are the hot spots. Of course "performance" might mean simply lowest latency responding to an input. Nevertheless, if it's written for speed then the code that really matters probably doesn't spend much time on the overhead of nesting & un-nesting; instead it'll be busy doing real work.

As for the numbers, I find the "psp in S" approach saves 4 or 5 clocks, not 2 (19 versus 17). Admittedly Table 5-7 of the '816 datasheet is dodgy to read, so an extra set of eyes would be welcome. I put my rundown in the attached text file. BTW the saving is 4 clocks for the exact code example I posted, but certain similar examples save 5 clocks. It depends whether the example is a word that grows the stack (and uses PHA) or a word that shrinks the stack (and uses PLA, a slightly slower instruction). But 4.5 is the average saving.

Quote:

after you add NEXT, the difference in speed is insignificant

Of course the speedup percentage can be diminished by taking a broader view (such as by including NEXT), but does that improve the discussion? For a broader picture I'd rather ask,

how many words are affected, and are they ones which are commonly executed? And,
what's the speedup per word?

The speedup is about 4.5 clocks for each word, based strictly on swapping PHA vs DEX DEX STA 0,X or swapping PLA vs LDA 0,X INX INX. Notable words affected include DOCON DOVAR LITERAL DUP OVER + - AND OR XOR R@ 0BRANCH. The price is a corresponding 4.5 clock slowdown for R-stack push-pull words because now they're forced to use X. These words notably include ENTER and EXIT (nest & un-nest).

So, the question becomes: In the small but critical code sections which are the only spots where performance actually matters, how many ENTERs and EXITs occur, versus the "do actual work" words DOCON DOVAR LITERAL DUP OVER + - AND OR R@ and 0BRANCH ? Other things being equal, and if I were writing a new Forth specific to the '816, I'd opt for psp in S. The historical advantage for psp in X is decimated; even (depending on the code) reversed.
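To make that trade-off concrete, here is a rough Python sketch that scores a trace of executed words. The 4.5-clock figure and the two word lists come from the estimate above; the function name and the sample trace are hypothetical, and real code would need a measured trace rather than a guess.

```python
# Words that push or pop the P-stack (they gain when psp is in S).
P_STACK_WORDS = {"DOCON", "DOVAR", "LITERAL", "DUP", "OVER", "+", "-",
                 "AND", "OR", "XOR", "R@", "0BRANCH"}
# Words that push or pop the R-stack (they lose, since rsp moves to X).
R_STACK_WORDS = {"ENTER", "EXIT"}
AVG_DELTA = 4.5  # clocks per affected word, the average quoted above


def net_clocks_saved(trace):
    """Clocks saved by psp in S relative to psp in X over a word trace."""
    saved = 0.0
    for word in trace:
        if word in P_STACK_WORDS:
            saved += AVG_DELTA
        elif word in R_STACK_WORDS:
            saved -= AVG_DELTA
        # Words such as @ neither gain nor lose in this model.
    return saved
```

For an invented trace such as `["ENTER", "DUP", "+", "LITERAL", "AND", "EXIT"]` the model gives 4 x 4.5 - 2 x 4.5 = 9.0 clocks in favour of psp in S. The sign flips as soon as ENTER and EXIT outnumber the stack-changing words.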

(I also like psp in S because that'd let me pop the high-byte of a long address straight into DBR! But if I start onto that topic I'll definitely be de-railing Michael's thread! )

cheers,
Jeff

teamtempest · Post by **teamtempest** » Thu Aug 07, 2014 2:05 pm

Quote:

teamtempest wrote:
I'm a little curious how you expect to use indexing, though. Presumably in a loop, but how will that work with a fixed zero page address? If anything, in a vague I-haven't-really-thought-about-it way I would have expected the indexing to occur on the zero page address, not the io address.

The IO page address is 0xFF00. Thus the address within the page is simply the value of the Y register. Perhaps the addressing modes could be described in a different manner, but the intent is that Y provides the register offset in the IO page.

Sorry I wasn't clear - I do get that. My question is more along the lines of, what do you expect to be able to do with that capability?

Me, I imagine a block of I/O registers and think, "Self, it's difficult to see why I'd want to loop over them to write the same value to every register in that block (except, possibly, for initialization purposes - once). It's also difficult to see why I'd want to read whatever values are in all those registers and put them all in the same zero page location. So...in between all those reads and writes something has to be done about whatever is in that zero page location, no?"

Whereas if the indexing was the other way around, I'd think "Self, that might be a way to copy a block of memory to an output register or read a block of memory from an input register."

barrym95838 · Post by **barrym95838** » Thu Aug 07, 2014 3:08 pm

Dr Jefyll wrote:

... As for the m65c02a snippets, eyeballing the code shows no surprises but we don't have clock counts available, so I'd prefer to kick the "X vs S" ball around using '816. The focus is on PHA vs DEX DEX STA 0,X and the equivalent Pull operations. Which Forth stack should use which register??

It seems that you may have a defensible point with respect to the '816, but I think that Michael wanted to talk about the 'c02a, and you and I are trying to mix in a grapefruit and a cantaloupe with the discussion of his orange! His core has a bunch of single-cycle instructions, which should balance out the cycle count differences that you illustrated for the '816. And my stuff came completely from left field ... I hope that Michael has a good sense of humor!

Mike

MichaelM · Post by **MichaelM** » Thu Aug 07, 2014 4:22 pm

teamtempest wrote:

Sorry I wasn't clear - I do get that. My question is more along the lines of, what do you expect to be able to do with that capability?

I am working toward implementing a Kernel/User mode using bit 5 of the PSW: 1 - Kernel (default), 0 - User. This will require using the PFX instruction with the TSX/TXS instructions to access the user mode stack pointer, SU. Further, the MMU will have a total of 32 mapping registers: 16 Kernel, 16 User. Future work would be to extend the MMU to support access controls, and generate an ABORT. Eventually, I would like to support some form of virtual memory simply as an exercise. But I have also thought about using the virtual memory capability to support some sort of cache memory and the SPI interface to access the external virtual memory. (There have been some discussions on the Arduino forum that discuss how a 6502 emulator was constructed using a serial EPROM interface and a SW managed cache for normal program execution. That discussion triggered some of these thoughts.)

As these variations to the basic M65C02A core objectives have gathered some momentum, it is clear that using a large section of zp for these MMU registers is not going to be appropriate. Therefore, I concluded that only a couple of locations would likely be used for the source and destination of the privileged MWT/MWF instructions. A PFX instruction can be used to convert the zp location into an indirect address to a location in regular memory.

At the present time, these ideas are on the back burner until I finish testing the new instructions that I've described above.

MichaelM · Post by **MichaelM** » Thu Aug 07, 2014 4:37 pm

barrym95838 wrote:

but I think that Michael wanted to talk about the 'c02a, and you and I are trying to mix in a grapefruit and a cantaloupe with the discussion of his orange!

Go Tigers. On a more serious note, I find these discussions very instructive. Although I am looking for a recommendation such as implement FORTH @, !, etc. as M65C02A native instructions, the current discussions have provided the pseudo code for whatever operations may become the most critical as native instructions. The result is that when the time comes to implement the instructions in microcode, there is an existing outline for the desired behavior.

GARTHWILSON · Post by **GARTHWILSON** » Thu Aug 07, 2014 5:54 pm

Dr Jefyll wrote:

GARTHWILSON wrote:

That's 19 versus 17 clocks in 16-bit mode, so after you add NEXT, the difference in speed is insignificant. It would be nice to have a single-cycle double INX and double DEX. That would take the first version above down to 16 clocks, faster (but again insignificantly so) than the second version.

I don't arrive at the same numbers or the same interpretation.

Code:

[from the attached file] Code example with annotations.
The column with 19a 16a etc indicates entries in Table 5-7 of the '816 data sheet, Sept 13 2010

; 65c816 in 16-bit accumulator mode
plus:  ; psp in X
        clc        19a   2~
        lda  0,x   16a   5~     ;3 instructions to pop
        inx        19a   2~     ;a pair of bytes
        inx        19a   2~     ;from P-stack
        adc  0,x   16a   5~
        sta  0,x   16a   5~
                         21~ total
plus:  ; psp in S
        clc        19a   2~
        pla        22b   5~     ;1 instruction to pop a pair of bytes from P-stack
        adc  1,s   23    5~
        sta  1,s   23    5~
                         17~ total

I wish I could say I was up too late, but I can't use that excuse. Perhaps I forgot the CLC? Anyway, a single-cycle double INX then would bring 21 down to 18, and the difference between 18 and 17 is insignificant.
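The arithmetic is easy to reproduce with a short Python tally. The per-instruction cycle counts are the ones in the annotated listing above; the `dinx` mnemonic and its one-cycle cost are hypothetical, standing in for the single-cycle double INX being wished for.

```python
# (mnemonic, cycles) pairs copied from the annotated listing above.
PSP_IN_X = [("clc", 2), ("lda 0,x", 5), ("inx", 2), ("inx", 2),
            ("adc 0,x", 5), ("sta 0,x", 5)]
PSP_IN_S = [("clc", 2), ("pla", 5), ("adc 1,s", 5), ("sta 1,s", 5)]


def total_cycles(listing):
    """Sum the cycle column of a listing."""
    return sum(cycles for _, cycles in listing)


def with_double_inx(listing, cycles=1):
    """Replace each adjacent inx/inx pair with one hypothetical 'dinx'."""
    out = []
    i = 0
    while i < len(listing):
        pair = [name for name, _ in listing[i:i + 2]]
        if pair == ["inx", "inx"]:
            out.append(("dinx", cycles))
            i += 2
        else:
            out.append(listing[i])
            i += 1
    return out
```

Here `total_cycles(PSP_IN_X)` is 21 and `total_cycles(PSP_IN_S)` is 17, while `total_cycles(with_double_inx(PSP_IN_X))` is 18, matching the figures quoted.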

Dr Jefyll · Post by **Dr Jefyll** » Thu Aug 07, 2014 11:55 pm

barrym95838 wrote:

[MichaelM's M65C02A] core has a bunch of single-cycle instructions, which should balance out the cycle count differences that you illustrated for the '816

Oops --- I wasn't aware of that. If he can INX in one cycle -- or, better yet, "DINX" in one cycle, as Garth suggested -- then it's true: the push/pull speed of X and S become virtually identical. In that case I don't see any reason to deviate from the traditional mapping of psp in X. And it'd be easy to take an existing Forth and tailor it for the M65C02A. But that's the only argument against psp in S -- which is otherwise just as viable as psp in X.

barrym95838 wrote:

IMO you can't go wrong with the tried-and-true standard.

In this case I agree -- you won't go wrong exactly... But when the landscape changes (as with a new processor), a person is at risk for missed opportunities if the mindset is, "We've always done it that way." (Sorry Mike, I twisted your words a little bit in order to make a point.) I'm usually on the lookout for ways to change Forth. Or maybe I'm simply a troublemaker!

But I confess I was eager to give the "psp in S" idea an airing. It's one I've been incubating for a while -- and, for '816, I believe its time has come.

MichaelM wrote:

Although I am looking for a recommendation such as implement FORTH @, !, etc. [...]

We keep colliding with the question of generality, and that's not surprising I guess. I love the idea of frequently-used Forth words as native instructions, but sober reflection tells me to look at how those words are presently coded, and see what they have in common that can be factored into new instructions that're useful even in a non-Forth context. But!...

Michael, I'm interested in the micro-programmability you mentioned. Here's a question that's probably completely ridiculous, but I have to ask: would the core's resources allow end-user microcode to be loaded at runtime?? I realize it's hardly trivial -- a whole new datapath would have to be created, not to mention the documentation you'd need to provide. But the prize would be enormous!! Specifically, it blows away the constraints I just mentioned about generality, and trying to guess what your users need. Loadable microcode would turn your core into a Swiss army knife!!! (And I know what a thrill it is to write microcode.)

<tries to wipe big grin off his face>
<hits Submit>
