M65C02A Core

BigEd · Post by **BigEd** » Tue Jul 01, 2014 9:32 pm

Thanks - I see what you mean now. One of my very few bookmarks is the opcode table at
http://www.llx.com/~nparker/a2/opcodes.html
and I see your listings are in effect going down the columns.

Cheers
Ed

MichaelM · Post by **MichaelM** » Wed Jul 30, 2014 10:18 pm

Just posted an update to GitHub for this core. I created a complete microcomputer targeting a Xilinx Spartan 3A XC3S200A-4VQG100I FPGA which I have used to build two development boards.

The microcomputer implementation provided utilizes all of the Block RAMs to provide 28kB of internal RAM allocated in three blocks: (1) 16kB RAM from 0x0000-3FFF; (2) 8kB ROM/RAM from 0xD000-0xEFFF; and (3) 3968 Bytes ROM/RAM from 0xF000-FEFF plus 32 Bytes from 0xFFE0-0xFFFF. (The I/O page is taken out of the topmost 128 bytes of a 4kB ROM/RAM, but the top-most 32 bytes represent an expanded interrupt/trap vector table which is mapped back into Block RAM instead of being implemented in LUTs.) The remaining 36864 bits of Block RAM (2 BRAMs) are used for a 512x72 microprogram memory array.
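The Block RAM split above can be checked with a little arithmetic. This is only a sketch: it assumes the part has 16 Block RAMs of 18 Kbit each, with each one usable either as 2 kB of byte-wide data or as 18,432 raw bits when the parity bits are included. Those figures and the function name are assumptions for illustration, not taken from the post.

```python
# Assumption: 18 Kbit Block RAMs, each giving 2 KiB of byte-wide data,
# or 18,432 raw bits when the parity bits are used as well.
BRAM_DATA_BYTES = 2048
BRAM_TOTAL_BITS = 18 * 1024


def bram_budget(total_brams=16, microprogram_brams=2):
    """Return (RAM bytes, microprogram bits) for the stated split."""
    ram_bytes = (total_brams - microprogram_brams) * BRAM_DATA_BYTES
    microprogram_bits = microprogram_brams * BRAM_TOTAL_BITS
    return ram_bytes, microprogram_bits


ram_bytes, ucode_bits = bram_budget()
print(ram_bytes // 1024, "kB of RAM/ROM")
print(ucode_bits, "bits of microprogram store")
print("matches 512x72:", ucode_bits == 512 * 72)
```

Under those assumptions the numbers agree with the post: 14 Block RAMs give 28 kB, and the remaining two give exactly the 36864 bits needed for a 512x72 array.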

A rudimentary interrupt/trap vector controller is included. The normal vectors for NMI, RST, and IRQ/BRK are supported, but an additional 13 vectors are also supported. There are an additional 8 maskable interrupt vectors to support the internal peripherals. Five other vectors are also supported for additional traps: ABRT, INV, SYS, COP, and BRK. ABRT is intended to support MMU access controls in a future upgrade to the rudimentary MMU included in the released microcomputer implementation. INV is intended to allow the trapping of invalid opcodes. It is not presently connected in the current release. SYS/COP are intended to support specialized instructions in a future release of the M65C02A microprogram. These traps could be used for emulation of other instructions.

The peripherals provided in the implementation are 1 SPI Master (with support for at least two Slave Selects) and 2 UARTs. The peripherals are buffered by 16 deep Tx/Rx FIFOs. The FIFOs are parameterized so it is easy to increase the depth of any of the FIFOs as needed.

In the targeted FPGA, the large number of internal busses reduces the maximum operating speed to 30 MHz. The same M65C02A-based microcomputer targeted to a Spartan-6 XC6SLX9-3FTG256I FPGA will operate in excess of 40 MHz.

I will now focus on getting this microcomputer to run on my Arduino UNO-compatible Chameleon Board. As provided, the application uses only 53% of the logic resources of the XC3S200A-4VQG100I FPGA. This will now allow me to implement another serial port and a slave SPI port in order to make the Chameleon an intelligent slave device to Arduino-based systems.

MichaelM · Post by **MichaelM** » Fri Aug 01, 2014 11:32 pm

Since posting an update to the M65C02A core yesterday, I've been working to add in the following instructions:

RMBx/SMBx zp
BBRx/BBSx zp,rel
PEA abs
PEI zp
PER rel16
REP/SEP #imm
ORA/AND/EOR/ADC/LDA/STA/CMP/SBC sp,S
ORA/AND/EOR/ADC/LDA/STA/CMP/SBC (sp,S),Y

In addition, I have also decided to add in a co-processor emulation instruction:

COP #imm
COP (zp)

COP will generate a trap like BRK. I've assigned the COP vector at location 0xFFE6:FFE7. I decided that I did not need to preserve the WDM op instruction reserved by WDC. Instead I decided to implement two versions of the COP instruction. The first will load the A or X register with the immediate operand, and the second will load the A or X register with a value from zero page. I initially thought of loading the operand into A. I am now thinking that given the JMP (abs,X) instruction, loading the operand into the X register would set up the service routine to use JMP (abs,X) to go to the desired operation quickly.
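To picture the dispatch idea in the paragraph above, here is a small Python model of JMP (abs,X) reading a jump table. It is a sketch only: the table address, the handler addresses, and the convention that the COP operand is an even byte offset into the table are all hypothetical.

```python
def read_word(mem, addr):
    """Read a 16-bit little-endian word, the way the 6502 family stores addresses."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)


def jmp_abs_x(mem, table_base, x):
    """Model JMP (abs,X): the target is the word stored at table_base + X."""
    return read_word(mem, table_base + x)


# Hypothetical layout: a jump table at 0x0300 holding two service routines.
mem = bytearray(0x10000)
TABLE = 0x0300
handlers = [0x1234, 0x5678]
for i, target in enumerate(handlers):
    mem[TABLE + 2 * i] = target & 0xFF
    mem[TABLE + 2 * i + 1] = target >> 8

# If COP #2 leaves X = 2, the trap handler needs only one JMP (TABLE,X).
print(hex(jmp_abs_x(mem, TABLE, 2)))
```

With the operand already in X, the service routine needs no transfer or shift before the indirect jump, which is the attraction of loading X rather than A.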

Furthermore, I decided that I would round out the JSR and JMP instructions with some of the missing addressing modes:

JMP (zp)
JSR (zp)
JSR (abs,X)

These instructions leave 19 opcodes not assigned. To support the MMU and other 16-bit IO page operations, I need two opcodes for moving 16-bit values in an indivisible manner from zero page to the IO page and vice-versa. I have decided that I will use opcodes 0x44 and 0x54 for this purpose. The architecture of the basic core would require too many changes to support the '816 block move instructions MVN/MVP which use these opcodes. I have designated these instructions as MWT (Move Word To IO Page) and MWF (Move Word From IO Page). The Y register functions as the register index in the IO page and a zp operand provides the source/destination of the 16-bit value being moved.

MWT zp,(Y)
MWF zp,(Y)

I will use opcode 0xFB (the XCE instruction in the '816) for the instruction prefix/escape opcode, PFX, that will allow the M65C02A microprogram to extend/modify the behavior of selected instructions. For example, if the PFX instruction is used before the JSR abs instruction, I expect the instruction to be modified to perform a JSR (abs) instruction. Similarly, when prefixed by PFX, I plan on modifying the RMBx/SMBx zp instructions to perform as RMBx/SMBx (zp). (I expect to apply these types of modifications to instructions where the indirection makes sense. In other situations, I expect to use the PFX to change the index register, or to access the user mode stack pointer from kernel mode, etc.)

As part of these efforts, I realized that it would be easy to support 16-bit relative addressing. So I've made some relatively minor changes to the operand register data paths and to the relative address conditional multiplexer in the address generator. The M65C02A core can now support both 8-bit and 16-bit relative addressing. (I've posted an update to GitHub that includes this modification.) Thus, I will implement the following two instructions to use the new 16-bit relative addressing mode just added to the core:

JMP rel16
JSR rel16

These two instructions should allow fully relocatable code for the M65C02A core.
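A rough model of why a 16-bit relative operand makes code relocatable is sketched below. It assumes a 3-byte instruction whose offset is measured from the address of the following instruction (the convention the '816 uses for BRL and PER); the post does not say which convention the M65C02A uses, so treat that and the function names as assumptions.

```python
def sign_extend16(value):
    """Interpret a 16-bit operand as a signed displacement."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def rel16_target(opcode_addr, offset, length=3):
    """Target address; offset is taken relative to the next instruction (assumption)."""
    return (opcode_addr + length + sign_extend16(offset)) & 0xFFFF


def rel16_offset(opcode_addr, target, length=3):
    """Operand an assembler would emit to reach target from opcode_addr."""
    return (target - (opcode_addr + length)) & 0xFFFF


# The same operand reaches the same routine wherever the code is loaded.
for base in (0x2000, 0x8000):
    operand = rel16_offset(base + 0x100, base + 0x400)
    print(hex(base), hex(operand), hex(rel16_target(base + 0x100, operand)))
```

Because the address arithmetic wraps at 64 kB, a 16-bit displacement can reach any location in the address space, so no absolute address ever needs to be patched when the code moves.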

With these five additional instructions, there are now only 14 unused opcodes. I expect that this should be sufficient to implement a good set of DTC/ITC primitives to support FORTH or another threaded code compiler.

I would be interested in any feedback on my plans for this core. Any suggestions for the remaining 14 opcodes, as they might apply to a threaded code interpreter, would be welcome.

My first vacation in two years away from home is coming to an end this weekend.

So progress on this project will likely slow down again to just the weekends.

Football season will soon be here, and I've got season tickets once again.

Taking time off from work to go to the games last season did make the pressure at work much more tolerable.

Edit: Added missing words and a blank line after each instruction list.

Dr Jefyll · Post by **Dr Jefyll** » Sat Aug 02, 2014 2:14 am

Quote:

I would be interested in any feedback on my plans for this core.

Hi Michael -- congratulations on your work. I'm envious, as I wish I had time to do a project like this myself.

It's certainly a major improvement to implement the bit-manipulation instructions, and that goes for the new stack address modes and many other features as well.

Offhand my only suggestion is that perhaps you haven't taken full advantage of the Prefix idea. IOW I think you should consider defining several prefixes rather than just one. Here are some possible effects different prefixes could have on the following instruction:

Let the role of A be assumed by X instead. This would allow powerful maneuvers for adjusting -- even scaling! -- X. (See below.)
Let the role of A be assumed by Y instead. Ditto to above.
Let the role of X be assumed by S instead (and let Zero-page be the stack page instead). This is a different way to achieve the new stack addressing modes. It requires no new opcodes; instead you'd just use legacy (Z-pg,X) or Z-pg,X modes but with a prefix.
Let the role of X be assumed by Y, and let the role of Y be assumed by X. (One prefix is sufficient for both.) This would give, for example, (ind,Y) and (ind),X modes -- not to mention instructions such as JMP (ind,Y).

I realize that adding a handful of prefixes is wasteful in the sense that you'd open the door for a thousand or more Prefix-Opcode combinations, many of which would remain unimplemented. There's also a one-cycle penalty for fetching the prefix. But my hunch is that this approach might be a lot easier to implement, since there are no new instructions... merely new register assignments. But maybe I'm making it sound too easy -- certainly I'm no HDL expert.

By way of a post-script, maybe there are a few new instructions worth introducing -- for example, add without carry. ADD would be equivalent to the sequence CLC ADC. Of course that would be handy as the first part of a multi-precision addition. What's less obvious is its use with a prefix (as noted above) to adjust X or Y. Example: instead of INX INX INX INX you'd have PFXn ADD #4.
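The relationship between ADD and ADC can be sketched in a few lines of Python. This models binary mode only and tracks nothing but the carry flag; the function names are made up for illustration.

```python
def adc(a, operand, carry):
    """Binary-mode ADC on 8-bit values: returns (result, carry_out)."""
    total = (a & 0xFF) + (operand & 0xFF) + (1 if carry else 0)
    return total & 0xFF, total > 0xFF


def add(a, operand):
    """Proposed ADD: behaves like CLC followed by ADC."""
    return adc(a, operand, False)


# First step of a multi-precision add: 0x01FF + 0x0001.
lo, carry = add(0xFF, 0x01)        # low bytes, no CLC needed
hi, carry = adc(0x01, 0x00, carry)  # high bytes take the carry
print(hex((hi << 8) | lo))

# With a hypothetical "A becomes X" prefix, PFXn ADD #4 replaces four INX.
x, _ = add(0x10, 4)
print(hex(x))
```

Unlike four INX instructions, the prefixed ADD would also update the carry flag, which may or may not be wanted when X is only being used as an index.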

Quote:

To support the MMU and other 16-bit IO page operations, I need two opcodes for moving 16-bit values in an indivisible manner from zero page to the IO page and vice-versa.

I like the idea of 16-bit moves a lot! Is there a way they could be made more general -- I mean so they'd be available for other uses besides the MMU?

One way to achieve 16-bit moves would be with a "double-length accumulator" prefix. Even assuming you limit the prefix to LDA and STA operations, it'd still be very worthwhile. To be clear, PFXn LDA followed by PFXn STA would move 16 bits -- and any of the LDA/STA address modes could be used. So it could handle your MMU issue and much more.

cheers,
Jeff

MichaelM · Post by **MichaelM** » Sun Aug 03, 2014 5:06 pm

Jeff:

Thanks for the feedback.

You are probably right that I've not explored the potential of the prefix instruction as fully as I should. However, the potential explosion in the number of instructions that have to be tested is somewhat scary. Furthermore, since I have changed many of the microprogram control fields from encoded to one-hot, it's not obvious at the moment how I might implement some of the register aliasing that you suggest. The one-hot control fields have reduced the combinatorial path lengths, and adding logic to induce register aliasing is likely to result in longer combinatorial control paths, defeating some of the performance gains I managed to include in the latest release of the core. Thus, after I've completed the basic set of new instructions given above, I will take a longer look at how to implement some of your register aliasing suggestions.

On the other hand, I would like to hear from you and others on some suggested instructions that can be used to support a DTC/ITC FORTH VM like those that you implemented for your KIM clone.

teamtempest · Post by **teamtempest** » Sun Aug 03, 2014 9:15 pm

Quote:

RMBx/SMBx zp
BBRx/BBSx zp,rel
PEA abs
PEI zp
PER rel16
REP/SEP #imm
ORA/AND/EOR/ADC/LDA/STA/CMP/SBC sp,S
ORA/AND/EOR/ADC/LDA/STA/CMP/SBC (sp,S),Y

I'd still call it "BBCx" rather than "BBRx".

PEA and PEI can be one mnemonic with two address modes. As with "BBRx", I imagine the only reason to keep the names is to provide backward compatibility with existing source.

The main use of REP/SEP has always been to switch register sizes between eight and 16 bits on the '816. I don't recall if you've implemented that on your core. If dual-size registers have not been implemented, what use are these instructions? And would any of those uses be so frequent as to be preferable to PHP, adjust (using the following "S" relative instructions, perhaps), and PLP?

Quote:

JMP (zp)
JSR (zp)
JSR (abs,X)

I would limit these to "JSR (abs)" and "JSR (abs,x)". It's true that if "abs" turns out to be a zero page location the assembled result will turn out to be one byte longer and one (?) cycle slower than a "(zp)" form, but that's always been true of "JMP (abs)". I know I've never been bothered by that; if I used a zero page location for "JMP (abs)" it's because it was faster to set up the vector than in a non-zero page location. But I never did it so often I longed for a specific "(zp)" form.

Quote:

MWT zp,(Y)
MWF zp,(Y)

I'm having trouble visualizing exactly what is meant by "(Y)". Even if it does mean something special, aren't the mnemonics themselves enough of a clue? Particularly since, AFAICT, no other instruction would use a "(Y)" mode.

Quote:

JMP rel16
JSR rel16

I love relative addressing modes for these instructions, but there is a problem with using these mnemonics. There seems no way to tell, just by examining the operand, whether it is meant to be absolute or relative or (if you go ahead and implement it) zero page. An assembler could make a "best guess" based perhaps on whether or not the destination address is within a signed 16-bit range, but that might not always match what the programmer had in mind.

Or that might actually be the defined behavior an assembler should by default follow, in which case it would be what the programmer should always expect. If there's a time penalty for using a relative form rather than an absolute form that might not be the best, however.

You might also consider using "BRA" and "BSR" for these instead, matching the "B--" of other relative branch instructions, with the proviso that these are 16-bit ranges, not eight-bit.

Dr Jefyll · Post by **Dr Jefyll** » Sun Aug 03, 2014 9:26 pm

MichaelM wrote:

On the other hand, I would like to hear from you and others on some suggested instructions that can be used to support a DTC/ITC FORTH VM like those that you implemented for your KIM clone.

Hmmm... IMO the best approach is to target operations involving the IP. (For the uninitiated, that's "Interpretive Pointer.") The IP is the high-level (ie; Forth) program counter. The main operation involving IP is NEXT -- which is very frequently executed, and thus an excellent candidate for optimization.

NEXT is basically an indirect jump via IP -- nothing terribly fancy. A Forth program is just a list of pointers, and IP indicates your current position in the list. The indirect jump in NEXT vectors execution to the 65xx code snippet that simulates the desired high-level instruction. IP needs to be incremented after the fetch, just like any program counter, so the complete definition of NEXT for 65xx is JMP (IP++). (What I've described is DTC, or direct-threaded code, used by many modern Forth implementations. Older implementations such as FIG Forth use indirect-threaded code, aka ITC, for which NEXT is defined as JMP ((IP++)).)
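The two flavours of NEXT described above can be modelled in Python as follows. This is a sketch: memory is a flat 64 kB array, IP is passed in and returned as a plain integer rather than living in zero page or a register, and the addresses in the example are invented.

```python
def read_word(mem, addr):
    """Read a 16-bit little-endian word."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)


def write_word(mem, addr, value):
    """Store a 16-bit little-endian word."""
    mem[addr & 0xFFFF] = value & 0xFF
    mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF


def next_dtc(mem, ip):
    """DTC NEXT, JMP (IP++): the thread holds addresses of machine code."""
    return read_word(mem, ip), (ip + 2) & 0xFFFF


def next_itc(mem, ip):
    """ITC NEXT, JMP ((IP++)): the thread holds pointers to code addresses."""
    pointer = read_word(mem, ip)
    return read_word(mem, pointer), (ip + 2) & 0xFFFF


mem = bytearray(0x10000)
write_word(mem, 0x4000, 0x5000)  # one cell of the Forth thread
write_word(mem, 0x5000, 0x6000)  # the word's code field (used by ITC only)

print([hex(v) for v in next_dtc(mem, 0x4000)])
print([hex(v) for v in next_itc(mem, 0x4000)])
```

Each function returns the new program counter and the incremented IP; the only difference between the two schemes is the second memory read in the ITC case.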

Unoptimized 65xx Forth maps IP as a pair of bytes in zero page, and uses legacy 65xx instructions for all accesses to IP. ITC NEXT consumes almost 40 cycles. You could entirely bypass z-pg by adding IP to the M65C02A register set -- and then add NEXT to the instruction set and reap a huge speedup. Unfortunately, you'd also need new instructions for a dozen or so incidental operations involving IP, and that gets complicated.

My KimKlone uses an ambidextrous approach!

IOW, I created an IP register and added JMP ((IP++)) to the 65c02 instruction set -- with an execution time of just 9 cycles!

But I also arranged that IP is accessible as a pair of bytes in zero page, and that allows incidental lower-priority operations to use ordinary 65xx instructions to manipulate IP.

cheers,
Jeff

ps- I agree with teamtempest's observation that new instructions present a challenge in regard to mnemonics. For the KK I threw up my hands and resigned myself to a mish-mash of made-up mnemonics that are ugly & lengthy -- but descriptive.

Edit: subsequent to this post I gave the KK registers new (and hopefully less confusing) names. That's right -- register renaming! -- but not in the usual sense.

The doc on my web site was suitably updated, and you can find the list of new instructions here.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Mon Aug 04, 2014 4:34 am

Dr Jefyll wrote:

For the KK I threw up my hands and resigned myself to a mish-mash of made-up mnemonics that are ugly & lengthy -- but descriptive.

LDNCPZPG,X -- Geesh!

Dr Jefyll · Post by **Dr Jefyll** » Mon Aug 04, 2014 5:15 am

Yep! I warned you -- ugly & lengthy! But, despite appearances, not cryptic. Every KimKlone programmer in the world (ahem!) is sure to know about NCP, the New Code Pointer register. LDNCPZPG,X is just LoaD NCP using ZPG,X address mode.

(It could've been simply LDNCP, but my quick 'n' dirty assembler needs to have the address mode spelled out explicitly.)

In the case of Michael's project, hopefully there'll be a better assembler that can infer address modes in the usual way. Even so, with all the new instructions, I suspect three-letter mnemonics won't be adequate to describe the action in a way humans will readily identify. I suspect mnemonics of four letters or more will be required.

-- Jeff

barrym95838 · Post by **barrym95838** » Mon Aug 04, 2014 5:31 pm

I agree whole-heartedly with teamtempest's bra/bsr recommendation. Or, would it be brl and bsl?

I agree with Dr. J that a thoughtful treatment of the NEXT mechanism is where the best gains can be made, even if it turns out to be little more than an efficient 16-bit memory increment by one or two. If you have a 16-bit pei (for DOCOLON and >R), then that's even better.

I don't agree with his 4+ char mnemonic idea, though. It just wouldn't feel 65xx enough for me.

Mike

P.S. How hard would auto-increment ( a la 6809 ) be to incorporate?

MichaelM · Post by **MichaelM** » Tue Aug 05, 2014 12:57 am

Thanks for your responses and suggestions.

I too realized that it would be difficult to distinguish between a JMP/JSR abs and a JMP/JSR rel16. So I wholeheartedly agree that BRA/BSR are much better mnemonics for these two opcodes. I think that it should be easy to resolve whether the branch target is within an 8-bit or a 16-bit range and select the appropriate opcode. Therefore, I don't think it is necessary to use a mnemonic like BRL instead of BRA. (BTW, it may be possible to modify the conditional branch instructions with the prefix instruction to implement a 16-bit relative branch.)
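The range test an assembler would apply to pick between the short and long branch forms might look like the sketch below. It assumes a 2-byte rel8 form and a 3-byte rel16 form, both with offsets measured from the following instruction; the instruction lengths and the function name are assumptions, not something the core's documentation is quoted on here.

```python
def choose_branch(opcode_addr, target):
    """Pick the shortest branch form that reaches target.

    Returns (form, operand), where operand is the encoded offset.
    """
    near = target - (opcode_addr + 2)  # offset if the 2-byte form is used
    if -128 <= near <= 127:
        return "rel8", near & 0xFF
    far = target - (opcode_addr + 3)   # offset if the 3-byte form is used
    return "rel16", far & 0xFFFF


print(choose_branch(0x2000, 0x2010))  # close by, short form
print(choose_branch(0x2000, 0x2100))  # too far for 8 bits, long form
```

This only covers the case where the target address is already known. For forward references the choice changes the instruction length, which in turn moves later labels, so a real assembler would need an extra pass or a pessimistic default.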

barrym95838 wrote:

P.S. How hard would auto-increment ( a la 6809 ) be to incorporate?

I don't think that it will be too difficult to implement a NEXT a la PDP-11/MC6809 with an auto-increment of the virtual IP.

I was thinking that I would implement NEXT as a single byte instruction. However, as Jeff pointed out above, NEXT is a jump indirect via IP (Interpretive/Instruction Pointer) with auto-increment. I have been thinking about Jeff's suggestion, and will implement the instruction he suggested with the IP in zero page: JMP (zp++). (Thanks very much Jeff.) That instruction also suggests using zero page for implementing the other FORTH VM registers: W (Working Register), and PSP (Parameter Stack Pointer). The M65C02A page 1 stack can be used for the RS (Return Stack), and the PEI/PEA/PER instructions and stack relative addressing modes can be used for manipulating the return stack.

Can someone comment on whether the RS and PS (Parameter Stack) is best implemented in the 6502 processor stack or not?

teamtempest wrote:

I'm having trouble visualizing exactly what is meant by "(Y)". Even if it does mean something special, aren't the mnemonics themselves enough of a clue? Particularly since, AFAICT, no other instruction would use a "(Y)" mode.

The (Y) notation was intended to indicate that these two instructions are two-address instructions, in contrast to all other instructions. The first address is provided by the zp operand and the second address is the contents of register Y. The contents of register Y will index the IO page, which in the M65C02A is the 256 byte page 0xFF00:FFFF. Perhaps a notation closer to that used for the stack relative instructions might be clearer, but the generally accepted single address/single operand syntax of the 6502 makes it difficult to convey the two-address nature of these two instructions.
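The two-address behaviour of MWT and MWF can be modelled roughly as below. The sketch assumes the low byte is moved first and that both addresses wrap within their own page; neither detail is stated in the thread. The indivisibility of the real instructions is not modelled beyond each move being a single function call.

```python
IO_PAGE = 0xFF00  # IO page base as given in the post


def mwt(mem, zp, y):
    """Move Word To IO page: copy two bytes from zero page to IO_PAGE + Y."""
    for i in range(2):
        mem[IO_PAGE + ((y + i) & 0xFF)] = mem[(zp + i) & 0xFF]


def mwf(mem, zp, y):
    """Move Word From IO page: copy two bytes from IO_PAGE + Y to zero page."""
    for i in range(2):
        mem[(zp + i) & 0xFF] = mem[IO_PAGE + ((y + i) & 0xFF)]


mem = bytearray(0x10000)
mem[0x10], mem[0x11] = 0x34, 0x12  # 16-bit value 0x1234 in zero page

mwt(mem, 0x10, 0x20)               # write it to a hypothetical register at 0xFF20
mwf(mem, 0x40, 0x20)               # read it back into another zero-page pair
print(hex(mem[0xFF20] | (mem[0xFF21] << 8)), hex(mem[0x40] | (mem[0x41] << 8)))
```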

You, BDD, and others have suggested changing the mnemonics for PEI and PEA on another thread. I don't disagree with the points that you have made. I only want the results. I think it has been suggested that these instructions be defined as:

PHW #imm16
PHW dp

Perhaps it would be advantageous to add a third instruction: PHW abs?

I am not sure that PER would serve much purpose if BRA/BSR rel16 were available unless it was also possible to perform these two operations based on the top two locations of the stack. Therefore, what would you say if REP/SEP #imm were not implemented as you suggested and instead BRA/BSR (sp,S) were implemented?

I like the idea of implementing JMP/JSR (zp), but I see your point regarding the extra cycle: it's really not that critical in the overall scheme. Thus, I will not implement those two instructions, and reserve the opcodes for other instructions.

I can see implementing some instructions which are the complements of the PHW instructions:

PLW zp
PLW abs

GARTHWILSON · Post by **GARTHWILSON** » Tue Aug 05, 2014 1:20 am

MichaelM wrote:

Can someone comment on whether the RS and PS (Parameter Stack) is best implemented in the 6502 processor stack or not?

The 6502's hardware stack works well for the Forth return stack, while ZP works well for the parameter stack.

As I originally learned it nearly 25 years ago, "W" in Forth is the word pointer, and "IP" was the instruction pointer (since compiled code does not get interpreted).

BigDumbDinosaur · Post by **BigDumbDinosaur** » Tue Aug 05, 2014 2:40 am

MichaelM wrote:

You, BDD, and others have suggested changing the mnemonics for PEI and PEA on another thread. I don't disagree with the points that you have made. I only want the results. I think it has been suggested that these instructions be defined as:

PHW #imm16
PHW dp

Perhaps it would be advantageous to add a third instruction: PHW abs?

Too bad the 65C816 doesn't have that instruction. It has to be synthesized with REP #%00100000 -- LDA ABS_ADDR -- PHA.

Quote:

I am not sure that PER would serve much purpose if BRA/BSR rel16 were available unless it was also possible to perform these two operations based on the top two locations of the stack.

PER is very useful, because the value that ends up on the stack is computed at run-time, making fully relocatable code possible. For example, a reference to a data table would become portable if PER were used to push the data table's address, rather than have the address set at assembly time. If all such references are generated by PER, then the program can be loaded anywhere and it will run without alteration.
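The run-time computation that PER performs can be sketched as follows, using the '816 behaviour as the model: a 3-byte instruction that pushes the address of the next instruction plus a 16-bit displacement, high byte first. The page 1 stack matches what Michael describes for the M65C02A; the load addresses and table offset are invented for the example.

```python
def per(mem, sp, opcode_addr, offset):
    """Model PER rel16: push (next instruction address + offset).

    Returns (new stack pointer, pushed value).
    """
    value = (opcode_addr + 3 + offset) & 0xFFFF
    mem[0x0100 + sp] = value >> 8    # high byte is pushed first
    sp = (sp - 1) & 0xFF
    mem[0x0100 + sp] = value & 0xFF  # then the low byte
    sp = (sp - 1) & 0xFF
    return sp, value


# The same PER encoding, executed at two different load addresses.
for base in (0x2000, 0x8000):
    mem = bytearray(0x10000)
    sp, pushed = per(mem, 0xFF, base, 0x0100)
    print(hex(base), hex(pushed), hex(sp))
```

The pushed address follows the load address, so a data table that sits a fixed distance from the PER instruction is found correctly wherever the program ends up.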

barrym95838 · Post by **barrym95838** » Tue Aug 05, 2014 3:20 pm

MichaelM wrote:

... I was thinking that I would implement NEXT as a single byte instruction. However, as Jeff pointed out above, NEXT is a jump indirect via IP (Interpretive/Instruction Pointer) with auto-increment. I have been thinking about Jeff's suggestion, and will implement the instruction he suggested with the IP in zero page: JMP (zp++). (Thanks very much Jeff.) That instruction also suggests using zero page for implementing the other FORTH VM registers: W (Working Register), and PSP (Parameter Stack Pointer). The M65C02A page 1 stack can be used for the RS (Return Stack), and the PEI/PEA/PER instructions and stack relative addressing modes can be used for manipulating the return stack.

The (zp++) idea sounds cool (the PDP-8 was doing something similar 45 years ago), but it might take some careful thought to implement. Do you want to use an auto-increment address mode, like the 6809, or do you want to make certain zp address ranges increment (or double increment) automatically when used by the indirect modes? The PDP-8 did the latter, but this method could be a bit confusing for beginners. Also, does the pointer need to be avoided by the next machine instruction, to give it time to finish incrementing without holding up the instruction stream?

Quote:

Can someone comment on whether the RS and PS (Parameter Stack) is best implemented in the 6502 processor stack or not?

Garth will be the resident expert on that, and he will certainly tell you to put the parameter stack in page 0 (with X for a pointer), and the return stack in page 1 (with S for a pointer). dclxvi might have something to say about this too, and he has had several intriguing ideas in the past that fiddled with this protocol, but IMO you can't go wrong with the tried-and-true standard.

I'm excited that you have been able to make such significant progress on your project, and will be (a bit enviously) following it with interest.

Mike

Dr Jefyll · Post by **Dr Jefyll** » Wed Aug 06, 2014 5:24 am

MichaelM wrote:

I have been thinking about Jeff's suggestion, and will implement the instruction he suggested with the IP in zero page: JMP (zp++).

Glad you like the idea, Michael. Would your JMP (zp++) instruction include an operand following the opcode (to specify which zero-page address to use)? The operand can be omitted (and one cycle saved) if it's implicit that JMP (zp++) will always use a certain pair of z-pg addresses. That's the route I took for my KK NEXT -- it is a one-byte instruction, and it's implicit that $48 and $49 is where IP resides.

Of course it's fair to ask whether it'd be better to have something more general. (Mike asks almost the same question in his last post: is it an address mode, or is it special behavior of a designated area of memory?) Without wishing to sway you one way or the other, here's an observation to consider. If the Interpretive Pointer is physically on-chip, there's no need to consume 2 extra bus cycles fetching those 16 bits of IP -- an obvious point in favor of IP being on-chip. But if the JMP (zp++) instruction uses an operand to specify where IP resides, then saving those 2 extra bus cycles means you must map all of zero-page on-chip. At stake is a delay of (maybe) up to 3 cycles altogether.

MichaelM wrote:

Can someone comment on whether the RS and PS (Parameter Stack) is best implemented in the 6502 processor stack or not?

Certainly the historical 65xx usage is X for the Parameter-stack pointer and S for the Return-stack pointer. It never occurred to me there'd ever be a reason to change that; but, as sometimes happens, a naive comment from a novice triggered some unexpected insight.

X is actually sub-optimal as a P-stack pointer in that the sequence DEX then STA 0,X takes twice as long as PHA, for example. We do a lot of pushing (and pulling) in Forth, so the matter isn't trivial. But X's poor push/pull performance is outweighed by the immense utility of z-pg,X and (z-pg,X) address modes. So, for 6502 and 65c02, the justification for X as a P-stack pointer is clear.
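The "twice as long" figure can be checked against the standard 6502/65C02 cycle counts. The table below lists only the three instructions involved, and the helper name is made up for illustration.

```python
# Standard 6502/65C02 cycle counts for the instructions in question.
CYCLES = {"DEX": 2, "STA zp,X": 4, "PHA": 3}


def push_cost(sequence):
    """Total cycles for a sequence of instructions."""
    return sum(CYCLES[op] for op in sequence)


via_x = push_cost(["DEX", "STA zp,X"])  # push one byte on an X-indexed stack
via_s = push_cost(["PHA"])              # push one byte on the hardware stack
print(via_x, via_s, via_x / via_s)
```

A Forth cell is 16 bits, so a real push needs two of each sequence, but the ratio stays the same.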

The insight is this. Now that sp,S and (sp,S),Y address modes are available (on the '816 and Michael's M65C02A), X no longer has vastly greater utility than S -- in fact, (sp,S),Y surpasses X in a manner that Forth can use to good advantage. The longstanding 6502 tradeoff (tolerating slow P-stack push/pulls via X) is now clearly open to review. I don't advocate that anyone should rewrite an existing Forth. But IMO any new '816 Forth should break with tradition and use S for the P-stack pointer and X for the R-stack pointer!

Is there a gotcha I've overlooked? Am I reinventing someone else's idea? (And am I de-railing Michael's thread?)

cheers,
Jeff
