6502.org

Posted: **Sat Jun 28, 2014 4:40 pm**

I have released the M65C02A processor core on GitHub. It provides significant enhancements to the microprogram control structure and the processor logic.

The microprogram structure has been modified to reduce the amount of decode logic required to generate the control signals. As a consequence, the microcode now directly controls most logic structures in the processor. The decoders needed to generate control signals to the logic have been eliminated in most cases, which results in a smaller logic footprint. It also improves the overall speed of the core.

More importantly, the changes to the microprogram structure have allowed the inclusion of additional multiplexers in the data path. The faster control paths gained by moving to unencoded control fields in the microprogram have been traded for additional multiplexers that allow the microprogram direct control of the various address/data path registers. This will allow the microprogram to implement custom functions such as FORTH interpreter functions like NEXT, ENTER, EXIT, etc.

The changes to the logic structures are such that the stack relative addressing mode of the W65C816 can be easily added to the instruction repertoire of the M65C02A core. (These additions can be made without re-synthesizing the core.) The M65C02 core had a dedicated functional unit to support the Rockwell instructions: SMBx/RMBx and BBSx/BBRx. With the new logic structure, these instructions and others, such as the W65C816 PEA/PEI instructions, are directly supported by the new LU (Logic Unit) module. The reduced number of functional units in the M65C02A core's ALU has improved the overall speed. Some of the speed gains have been traded, as indicated above, for additional multiplexers to allow the microprogram more control of the address/data path registers.

I have added two features present on standard processors that I did not previously support. First, I have added support for the SOB (nSO) input pin that standard processors provide to allow external logic to set the V flag on the falling edge of the pin. That function now performs the action defined, but the falling edge detector must be provided by logic external to the core itself. Second, I have included the reset behavior of the 65SC02. To do this, I removed system reset from the PSW and the PC, and initialized the stack pointer to 2 when an external RST is asserted. The reset microprogram will now push the PC, followed by the PSW, to the stack. The three stack pushes place the PC in locations 0x102 (PCH) and 0x101 (PCL), and the PSW in location 0x100. The third push advances the stack pointer to point to location 0x1FF.

Thus, the stack pointer, S, is at the top of the stack page when the processor restarts. The D and I bits are cleared on entry to any of the four traps/interrupts. This modification allows the reset button to be pressed to recover a runaway processor. The monitor/debugger can be used to examine the three lowest locations in the stack page to determine where the processor was when RST was asserted.
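The reset sequence described above can be modeled in a few lines. This is only a sketch of the arithmetic, not the core's microcode; the function name and the sample PC/PSW values are made up for illustration.

```python
STACK_PAGE = 0x0100

def reset_pushes(pc, psw, s=0x02):
    """Model the three reset-time pushes; return (memory writes, final S).

    Hypothetical helper: S starts at 2, as the post says RST initializes it.
    """
    writes = {}
    for value in ((pc >> 8) & 0xFF, pc & 0xFF, psw & 0xFF):  # PCH, PCL, PSW
        writes[STACK_PAGE | s] = value
        s = (s - 1) & 0xFF  # 8-bit stack pointer wraps from 0x00 to 0xFF
    return writes, s

if __name__ == "__main__":
    writes, s = reset_pushes(pc=0xC123, psw=0x34)  # sample values only
    for addr in sorted(writes, reverse=True):
        print(f"{addr:#06x} <- {writes[addr]:#04x}")
    print(f"S = {s:#04x}")
```

A monitor would then read 0x102/0x101 for the interrupted PC and 0x100 for the PSW.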

The result is a processor core which is more flexible. I expect to use this core to implement the stack relative and PEA/PEI instructions of the W65C816. Following that, I've been reviewing Brad Rodriguez' Moving FORTH articles and Loeliger's Threaded Interpretive Languages book for ideas that will lead to the implementation of the core of a DTC FORTH interpreter directly in the microcode of the M65C02A core.

I have completed the verification of the core using my test programs. I plan to run Klaus' functional test programs on the core soon as a final check. Klaus' programs helped identify some errors in my stack addressing logic in the M65C02 core. I've incorporated those changes in the current core, and modified my test bench test program to test for those specific conditions. Therefore, I am fairly confident that passing Klaus' functional tests will be pro forma, but I've thought that before as well.

I would appreciate any feedback anyone may have on the M65C02A core.

Posted: **Sat Jun 28, 2014 7:45 pm**

Smaller, faster, and with more features - well done!
Ed

Posted: **Sat Jun 28, 2014 10:31 pm**

Awesome work!

How difficult do you think it would be to make the M65C02A's data bus 16-bits wide? (I've not gotten around to checking your code out yet)

EDIT: Also, what Xilinx? device have you been targeting and what hardware development platform have you been using to develop your core?

Posted: **Sun Jun 29, 2014 11:03 am**

EEyE:

You, BigEd, and others have defined a byte as 16 bits and converted a core (Arlet's) to operate on the bigger word size. That core's instruction sequencing logic is unaffected by the change in the word size. The resulting 65Org16 core was then extended with a large number of registers.

I suppose the same approach could be used with the M65C02/M65C02A cores. I haven't taken any pains to parameterize the basic word size of the core. Therefore, to make the change to a 16 bit word size, a la 65Org16, would simply require changing the width definitions of the relevant registers and wires in the M65C02/M65C02A cores' modules.

However, the M65C02/M65C02A cores only decode 8-bit instructions. Further, these cores decode instructions whose high and low nibbles have been swapped. I chose this approach in order to more fully overlap the execution and instruction fetch phases. It also allowed me to eliminate dead cycles in read-modify-write instructions, reduce all branches to two cycle instructions, and reduce the cycle count for many instructions (~40%) by one.

If I recall correctly, the 65Org16 core also retains its basic instruction definition as 8-bit opcodes. The remainder of the 16-bit instruction is used to address the registers rather than defining additional instructions. If this is really the approach taken by the 65Org16 core, then modifying the M65C02/M65C02A cores to operate on 16-bit data rather than 8-bit data should be workable.

You will likely need to add at least one register, the instruction register (IR), as that register is not required by the M65C02/M65C02A cores for instruction decoding. This register is supported by the control logic, meaning that there exists a write enable control for the register in the microprogram, but since the register is not required, I removed it altogether from the M65C02A core and it gets trimmed from the M65C02 core.

I am currently using two boards which I designed for myself: (1) the M65C02/M16C5x Development Board, and (2) the Chameleon Arduino-compatible FPGA Shield Board. Both can be fitted with either the XC3S50A-xVQ100 or the XC3S200A-xVQ100 FPGAs from Xilinx.

I currently prefer to use the Xilinx ISE 10.1i SP3 toolset. I have synthesized both cores for either board using the ISE 14.4 WEBPack toolset, but I don't generally find that the newer toolset provides any better performance for the Spartan 3A family. I have also synthesized the cores for the Spartan 6 LX9/LX4 parts. I don't currently have a Spartan 6 development board, but will soon have a commercial product using the XC6S25T part. Thus, I may convert one of the prototypes for my use as a development platform.

Posted: **Sun Jun 29, 2014 8:33 pm**

Cool. I was just curious about the bus width extension. You're correct about the 65Org16 details...

I like your approach of modifying your core to support FORTH directives. It's sort of analogous to a (video) hardware accelerator, except your M65C02A retains usage of assemblers made for the 6502, for the most part?

Posted: **Mon Jun 30, 2014 5:23 am**

ElEctric_EyE wrote:

I like your approach of modifying your core to support FORTH directives. It's sort of analogous to a (video) hardware accelerator, except your M65C02A retains usage of assemblers made for the 6502, for the most part?

Right, Sam -- the assembly language would be mostly unchanged. It's usually the case that Forth is implemented as a virtual Forth machine running on a host CPU. Each Forth instruction is simulated by the host -- usually with a performance penalty, because simulating one Forth instruction typically requires multiple host instructions. It's the same as the projects you see here on the forum where people use an Atmel or whatever to simulate a 6502 -- except now the shoe is on the other foot (it's the 6502 doing the simulating).

Michael's approach accelerates things by having a few new 6502 instructions that map more or less directly to some frequently used Forth operations. So it's a hybrid -- mostly 6502, but with some Forth genes in there too.

Needless to say, I heartily approve! Did you plan on mapping Forth's IP register on-chip, Michael? Congratulations on your progress with this and the other features!

MichaelM wrote:

The M65C02 core had a dedicated functional unit to support the Rockwell instructions: SMBx/RMBx and BBSx/BBRx. With the new logic structure, these instructions and others, such as the W65C816 PEA/PEI instructions, are directly supported by the new LU (Logic Unit) module.

Again I approve, since I know from personal experience how powerful SMBx/RMBx and BBSx/BBRx are. I started a topic in February, and humbly draw folks' attention to recently-added material in the lead post, here.

cheers,
Jeff

Posted: **Mon Jun 30, 2014 11:49 am**

Dr Jefyll:

I have considered including one or more of the registers for a FORTH virtual machine within the core. However, at this point, I have decided to place these virtual registers in ZP memory. The additional capabilities I've embedded in the core's logic should ease the implementation of +2 operations on the IP, and ±2 operations on the PSP and RSP. After NEXT, ENTER, and EXIT are implemented using ZP, it may be appropriate to consider how to pull these virtual registers into the core and implement the arithmetic operations required in single cycle structures.
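For readers less familiar with the Forth inner interpreter, here is a rough model of NEXT, ENTER, and EXIT with the virtual registers held in ZP. Everything specific here is an assumption for illustration: the ZP locations, a return stack that grows downward, and the helper names. It only shows where the +2 and ±2 operations come from.

```python
mem = bytearray(0x10000)   # flat 64 kB memory model
IP, RSP = 0x00, 0x02       # hypothetical ZP locations of the virtual registers

def rd16(addr):
    """Read a little-endian 16-bit cell."""
    return mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)

def wr16(addr, value):
    """Write a little-endian 16-bit cell."""
    mem[addr] = value & 0xFF
    mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF

def next_():
    """NEXT: fetch the cell IP points at, do IP += 2, return the jump target."""
    ip = rd16(IP)
    target = rd16(ip)
    wr16(IP, (ip + 2) & 0xFFFF)
    return target

def enter(body):
    """ENTER: RSP -= 2, save IP on the return stack, point IP at the new thread."""
    rsp = (rd16(RSP) - 2) & 0xFFFF
    wr16(rsp, rd16(IP))
    wr16(RSP, rsp)
    wr16(IP, body)

def exit_():
    """EXIT: restore IP from the return stack, RSP += 2."""
    rsp = rd16(RSP)
    wr16(IP, rd16(rsp))
    wr16(RSP, (rsp + 2) & 0xFFFF)
```

Each of these is a handful of 16-bit read-modify-write steps on ZP cells, which is the kind of sequence the added data path multiplexers are meant to speed up.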

Following your suggestions in the other thread, I have considered how to implement the Rockwell instructions using either ZP or ABS addressing modes. My original thought was to provide microcode for one or the other. However, I've been giving the subject some more thought, and I think that I can see several ways to allow the programmer to determine the addressing mode of these instructions. One way may be to use an emulation bit, a la '816, to determine the length of the address operand. Another way may be to use a pre-byte to override the default addressing mode. A third way may be to implement ZP indirect rather than absolute, and use an emulation bit to determine if ZP direct or ZP indirect should be used.

I have also developed an M65C02 Duo, a dual core implementation. For that implementation, I developed a simple MMU that can be used to decode the address space in 4kB blocks as well as expand the address space to 1MB+. To support that peripheral, I have resorted to using an unused opcode in order to write the mapping data to the MMU in an indivisible manner. (IOW, no SEI/CLI instructions are required to prevent potential critical region conflicts.) I am reworking that peripheral for the M65C02A to use ZP direct or ZP indirect, like I'm considering for the Rockwell instructions. In the current implementation, the 24-bit value required for each entry in the mapping RAM is first loaded into the registers, {A, X, Y}, and the instruction then transfers them in a single cycle into the mapping RAM of the MMU. I am now contemplating using only the A and a 16-bit location in ZP for the 24-bit value. The A would be set with the MMU register index and the number of wait states required, and the ZP location would contain the extended address and the chip enable map. It is possible to compress the data required to 20 bits if I reduce the number of CEs from 8 to 4, and keep the number of wait states that are programmable to another 4-bit field.

Any thoughts on these issues are appreciated.

Posted: **Mon Jun 30, 2014 2:38 pm**

Lots to cover here, Michael!

I suppose it'd be nice in some circumstances to have SMB/RMB and BBS/BBR available with an additional address mode (a mode other than zero-page). But these instructions are primarily useful for I/O, and my other topic makes the case that often it's best to have I/O in zero-page anyway. The W65C134S features I/O in zero-page, as do several similar commercial products (and every 65xx computer I've ever built). But I grant that an additional mode might be nice in some circumstances.

Quote:

I think that I can see several ways to allow the programmer to determine the addressing mode of these instructions. One way may be to use an emulation bit, a la '816, to determine the length of the address operand. Another way may be to use a pre-byte to override the default addressing mode. A third way may be to implement ZP indirect rather than absolute, and use an emulation bit to determine if ZP direct or ZP indirect should be used.

Mode bits are a double-edged sword. On the plus side, they let you specify implicit information that would otherwise have to be explicitly encoded (as with a prefix) for every instance of your alternative-mode instruction. But, unless you set them once and leave 'em alone thereafter, mode bits result in bugs when they're not in the state the programmer thinks they are in. The more mode bits you have, the more potential there is for grief -- even the 65816 suffers from this. In Tracy Kidder's book, The Soul Of A New Machine (which I highly recommend) there's a humorous and disparaging remark about mode bits. Ed deCastro was firmly opposed, noting that, "you always get tied up in your own underwear!" I think you might want to consider the prefix option instead. A lot depends on who you expect to be writing code for the M65C02A, and what sort of code.

Quote:

I have considered including one or more of the registers for a FORTH virtual machine within the core. However, at this point, I have decided to place these virtual registers in ZP memory.

That's reasonable, since ZP is highly accessible, and you'll automatically get to use existing instructions to manipulate the Forth registers. FWIW, the same benefit would apply if you did decide to have, say, IP as part of the core. IOW an on-chip register could be arranged to respond as if it were ZP memory. That might be the best of both worlds.

More to follow... haven't even gotten to the dual-core business! Cheers,
Jeff

ps-

Quote:

Dr Jefyll:

I hope no-one thinks I'm a real doctor!

My handle is a pun on my own name and the name of Robert Louis Stevenson's famous character, Dr Jekyll -- a hacker of sorts who had a madman within!

Posted: **Mon Jun 30, 2014 2:43 pm**

My view of the bit-twiddling ops is that they are strictly optional, but clearly save a few cycles and a few bytes. The systems one might build with or in an FPGA are not likely to be short of a few bytes, and as they run at some tens of MHz it's not even clear that a few cycles matter. But, if a few cycles do matter, it makes sense to support the fastest possible application, not the most general - so for me, zero-page is the right way to go.

I think it's important not to apply sub-MHz value judgements to systems running at tens of MHz.

Cheers
Ed

Posted: **Mon Jun 30, 2014 11:11 pm**

MichaelM wrote:

...I have also developed an M65C02 Duo, a dual core implementation. For that implementation, I developed a simple MMU that can be used to decode the address space in 4kB blocks as well as expand the address space to 1MB+...

Dr Jefyll wrote:

Lots to cover here, Michael!...
More to follow... haven't even gotten to the dual-core business! Cheers,
Jeff...

I'm curious about the dual core as well. I didn't want to derail the thread though. But I'll ask anyway:
How do they interact? What was the goal? How fast were they running? Were they using external memory?

Posted: **Tue Jul 01, 2014 2:38 am**

MichaelM wrote:

I developed a simple MMU that can be used to decode the address space in 4kB blocks as well as expand the address space to 1MB+.

If you haven't already seen it, maybe Hudson Soft's HuC6280 cpu will give you food for thought. It has opcodes that "talk" directly to the MMU. There's ingenuity in their approach, and IMO some good insight there. And of course they've broken the 64K barrier.

OTOH it's a scheme which can't rapidly make use of long (eg: 24 bit) addresses which are computed at run time -- for example when navigating a large data object. Imagine code that...

- must randomly index into a 1 MB array according to a continuous series of addresses being calculated, or
- must follow a linked list when each link contains a full 24-bit address

Tasks like these may only grab 2 or 3 bytes before it's time to look elsewhere in the 24-bit space. That means the MMU becomes a bottleneck unless it can accept an entirely new long address in just a few cycles. So, I think the 65816 and the flawed but fixable 6509 would be good models to contemplate, and the HuC6280 not so much. The former use 64K banks, sidestepping the slow, awkward masking & shifting that arise when juggling non-64K banks. Probably I'm not explaining that very well, but try writing some code to quickly do random accesses (as in the examples) and the challenge will become evident. (Or maybe there's a solution I've overlooked.)
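To make the masking-and-shifting point concrete, here is a small sketch working on the three bytes of a computed 24-bit address, the way an 8-bit CPU would see them. The function names are made up for illustration, and the 4 kB page size is just one example of a non-64K bank.

```python
def split_64k_bank(lo, mid, hi):
    """64 kB banks: the bank is the top address byte, untouched."""
    return hi, (mid << 8) | lo

def split_4k_page(lo, mid, hi):
    """4 kB pages: the 12-bit page number straddles the middle byte."""
    page = (hi << 4) | (mid >> 4)       # shift and merge across a byte boundary
    offset = ((mid & 0x0F) << 8) | lo   # mask off the bits that went to the page
    return page, offset

if __name__ == "__main__":
    lo, mid, hi = 0xCD, 0xAB, 0x12      # the address 0x12ABCD, byte by byte
    print(split_64k_bank(lo, mid, hi))
    print(split_4k_page(lo, mid, hi))
```

With 64K banks the high byte can be written to the bank register as-is; with smaller pages every new address costs several shifts and masks before the MMU can even be loaded.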

I guess it depends what goals you set for the project. The wait-state and chip-select features you mentioned are certainly valuable, because they help avoid external delays that might otherwise limit clock speed. Can the same features be reworked to fit into an '816- or 6509-like scheme?

cheers,
Jeff

Posted: **Tue Jul 01, 2014 6:35 am**

True MMUs do more than bank switching. So if you have a soft-core you could as well add some of the other features that make MMUs so valuable, e.g. page protection and variable offsets into banks. When I think of MMUs for microcontrollers I always think the features of the PDP-11 MMU would be nice.

cheers
Peter

Posted: **Tue Jul 01, 2014 12:34 pm**

Jeff:

Good point on the cumbersomeness and management issues with a mode bit. I was leaning toward an escape/prefix approach like the segment override prefixes of the x86 architecture. Your suggestion sort of seals it in favor of a prefix approach. So in planning for this type of modification, I think that the prefix byte would only remain in effect until the following Rockwell instruction completed. The prefix byte would not be interruptible because it would be considered part of the following opcode.

One thought that I had as I was writing this reply was that the prefix approach could be used to select more than just the ZP indirect addressing mode. There are enough unused opcodes that many of the other zp addressing modes could be added to instructions such as the Rockwell instructions, TRB/TSB, BIT, etc. using different prefix bytes to represent the desired addressing mode. However, I am thinking that only a single prefix, IND, to add indirection (if applicable) to the addressing mode of the following instruction would really be necessary. Adding pre-indexed ZP indirect and post-indexed ZP indirect addressing may be nice but not truly "necessary".

I've not completely sorted out how I would implement such a mechanism in the microprogram. I am thinking of adding a hidden bit to the PSW that only gets set when the IND prefix is executed and cleared otherwise. In the microprogram, if the IND bit of the PSW is set, a conditional branch to the corresponding indirect addressing mode sequence is made. The IND bit will clear automatically when the following instruction completes, i.e. when Sync asserts.

If the IND prefix is applied to an instruction that already supports indirect addressing, then it shouldn't have an effect. I see the IND prefix applying to instructions such as TRB dp/TSB dp to convert the instructions to TRB (dp)/TSB (dp). Similarly, TRB abs/TSB abs would be converted to TRB (abs)/TSB (abs). Applying the IND prefix to JSR abs should convert it to JSR (abs). (This may be a good instruction to add to support a FORTH interpreter).
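A toy model of the IND behavior described above, to check the intended semantics. This is not microcode: the instruction stream is just (mnemonic, mode) pairs, and the mode table is an assumption covering only the dp and abs cases mentioned in this post.

```python
# Modes the IND prefix is assumed to convert; anything else passes through.
INDIRECT = {"dp": "(dp)", "abs": "(abs)"}

def decode(stream):
    """Return the effective (mnemonic, mode) list for a stream with IND prefixes."""
    result = []
    ind = False                      # the hidden PSW bit
    for mnemonic, mode in stream:
        if mnemonic == "IND":
            ind = True               # prefix only sets the bit, then falls through
            continue
        if ind:
            mode = INDIRECT.get(mode, mode)  # no effect on already-indirect modes
        result.append((mnemonic, mode))
        ind = False                  # cleared when the instruction completes
    return result

if __name__ == "__main__":
    program = [("IND", None), ("TSB", "dp"), ("TSB", "dp"),
               ("IND", None), ("JSR", "abs"),
               ("IND", None), ("LDA", "(dp)")]
    for insn in decode(program):
        print(insn)
```

Because the bit is cleared after exactly one instruction, the second TSB in the sample stream stays zero page direct.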

BigEd is probably right with regard to the Rockwell instructions, at least for the SMBx/RMBx instructions. The BBSx/BBRx instructions may be more useful if the IND prefix is applied. With indirect addressing added to instructions such as BIT and TSB/TRB, using the IND prefix, the need for the Rockwell instructions may be eliminated. (If setting/clearing multiple bits in an I/O register using a mask in the Accumulator is not required, then the Rockwell instructions may be of benefit. Similarly, if setting/clearing a bit is required without otherwise affecting a register or the PSW, then the Rockwell SMBx/RMBx instructions are useful. Otherwise their functionality could be replaced by the BIT and TSB/TRB instructions followed by an optional branch instruction.) I will probably restore the Rockwell instructions simply to provide an instruction set compatible with the WDC 65C02S microprocessor. Their functionality can certainly be emulated by the remainder of the instruction set.
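The trade-off between the two instruction groups is easier to see side by side. Below is a behavioral sketch of the standard 65C02 semantics: SMBx/RMBx touch one bit and leave A and the flags alone, while TSB/TRB use A as a mask and report in Z whether A AND memory was zero. The Python function names and the memory model are for illustration only.

```python
def smb(mem, addr, bit):
    """SMBx: set one bit in memory; no register or flag is affected."""
    mem[addr] |= 1 << bit

def rmb(mem, addr, bit):
    """RMBx: clear one bit in memory; no register or flag is affected."""
    mem[addr] &= ~(1 << bit) & 0xFF

def tsb(mem, addr, a):
    """TSB: Z reflects (A AND M) before the update, then M |= A. Returns Z."""
    z = (mem[addr] & a) == 0
    mem[addr] |= a
    return z

def trb(mem, addr, a):
    """TRB: Z reflects (A AND M) before the update, then M &= ~A. Returns Z."""
    z = (mem[addr] & a) == 0
    mem[addr] &= ~a & 0xFF
    return z
```

So a TSB/TRB replacement for SMBx/RMBx costs a load of the mask into A and a changed Z flag, which is the case where the Rockwell forms still earn their keep.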

EEyE:

I have implemented the two cores to share a single microprogram memory, and communicate using the last internal Block RAM. I have built a prototype using a Spartan 3A XC3S200A FPGA. I included a simple interrupt handler, an MMU, and a buffered UART on each core. I am working to share a single SPI Master between them, but at the moment the SPI Master is assigned to only one core. One core is allocated 16 kB of block RAM for its use, and the other is allocated 8 kB RAM for its use. The cores are expected to share a common 2kB boot ROM, and 2kB of DPRAM. The remaining 4kB of block RAM are shared between the cores as dual-ported microprogram ROMs. The DPRAM allows for the transfer of data between the cores in a master-slave configuration.

The cores also share an external memory interface. I have had some issues getting the cores to share the external memory interface. I don't plan to release the dual core implementation until I've worked out that interface.

For simplicity, I used my 4 cycle microprogram sequencer. The cores ran at ~60MHz with the M65C02 core as the basic component. I suspect that the M65C02A core will provide single cycle operation at 40 MHz+. With the 4 cycle microprogram sequencer the equivalent execution speed is less than 15 MHz. Still plenty fast enough for many applications, but slower than a single cycle core running at a lower clock rate. The four cycle microprogram sequencer also makes it much easier to implement a 6502-compatible external memory interface.

cbscpe:

Although I have a fond regard for the PDP-11 memory management scheme, the scheme I have been exploring is focused on using the FPGA resources in a way that reduces the decode time. I don't have plans for the M65C02 cores that include adding mode bits like those found on the PDP-11 for kernel, supervisor, and user modes. As such, any OS for the M65C02/M65C02A cores would not be able to provide the type of memory protection that the PDP-11 MMU offered.

Restricting read/write/execute on a page was not an objective primarily because I do not generally view the 6502/65C02 architecture as being suitable (for my purposes) for a general purpose operating system. Instead, I view the 6502/65C02 as ideal bare metal microprocessors/microcontrollers which in my way of looking at things should not be burdened by an OS. They may have a BIOS/Monitor maybe but not an OS. Furthermore, I am of that school of thought that an OS is to be avoided in real-time applications.

My specific focus has been to find a way to reduce the number of logic levels needed to decode the address bus. As such, the M65C02_MMU is based on a fundamental characteristic of the Spartan 3A Look-Up Tables (LUTs), namely the 4-bit address into the LUT. If this characteristic is utilized efficiently, then passing the core's address output through the MMU will only increase the combinatorial path delay in the address path in a minor way.

I have reasoned that if external devices are attached and the operational objective is to operate at the highest clock rate possible, then a partially decoded chip enable output from the FPGA will reduce the address bit width that any external decoders will be required to process. This will allow the maximum clock rate to be used while allowing one or two levels of external decode logic.

For example, I have blocked the address into 16 4kB pages. Each MMU register currently has 4 bits that define the number of base wait states for that page; 8 bits to define 8 chip enables that may be used to select internal or external devices; and 8 bits to extend the address space from 4 kB to 1 MB. These address extension bits, when coupled with the chip enables, define the potential address space to be as high as 8 MB. However, I am currently using one of the chip enables to define internal versus external memory in order to reduce the amount of logic needed to multiplex between the external and internal memory or I/O devices. I use four of the chip enables in my M65C02/M16C5x Development Board for external devices, and the remaining 3 chip enables for internal memory and on-chip I/O devices. (The additional decode logic required to avoid loss of address space to I/O devices generally requires the addition of a wait state to compensate for the additional path delays introduced by the I/O device decode logic.)
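As a rough illustration of the mapping just described, the lookup can be sketched like this. The field widths come from the paragraph above, but the bit positions inside the packed entry are an assumption for the sketch, not the actual M65C02_MMU register layout.

```python
def translate(mmu, vaddr):
    """Map a 16-bit core address to (extended address, chip enables, wait states).

    mmu is a list of 16 packed entries, one per 4 kB page. Assumed packing:
    bits [19:16] wait states, [15:8] chip enables, [7:0] address extension.
    """
    entry = mmu[(vaddr >> 12) & 0xF]         # top 4 address bits pick the page
    wait_states = (entry >> 16) & 0xF
    chip_enables = (entry >> 8) & 0xFF
    extension = entry & 0xFF
    paddr = (extension << 12) | (vaddr & 0x0FFF)   # 8 + 12 = 20 bits, i.e. 1 MB
    return paddr, chip_enables, wait_states

if __name__ == "__main__":
    mmu = [0] * 16
    mmu[0xC] = (2 << 16) | (0x04 << 8) | 0x3F      # sample entry for page 0xC
    paddr, ce, ws = translate(mmu, 0xC123)
    print(f"paddr={paddr:#07x} ce={ce:#04x} ws={ws}")
```

The page index is a plain 4-bit lookup, which is what lets the mapping RAM sit in a single level of 4-input LUTs on the Spartan 3A.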

It is also possible to cascade the MMUs. The second level MMUs can be simpler in that they only generate additional chip enables. The additional combinatorial path delay in the address output path due to cascaded MMUs will reduce the overall operating speed attainable for any specific FPGA family. Unlike the Spartan 3A, the Spartan 6 FPGA family has 6 bit LUTs. This feature of the Spartan 6 LUTs allows the address space of the M65C02/M65C02A cores to be more finely paged in 1kB pages. Further, a two level decode allows the address space to be resolved to 16 locations. In contrast, to resolve the address space to this level in a Spartan 3A requires at least three decode levels.
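The level counts above follow from simple arithmetic, sketched here under a simplifying assumption: each decode level consumes as many address bits as the LUT has inputs, and nothing else (enables, routing) is counted. The function name is hypothetical.

```python
def decode_levels(block_size, lut_inputs, addr_bits=16):
    """Cascaded LUT levels needed to resolve the address space to block_size bytes.

    Simplified model: block_size must be a power of two, and every level
    decodes lut_inputs address bits.
    """
    bits_to_decode = addr_bits - (block_size.bit_length() - 1)
    return -(-bits_to_decode // lut_inputs)   # ceiling division

if __name__ == "__main__":
    for family, k in (("Spartan 3A", 4), ("Spartan 6", 6)):
        print(family, "16-byte resolution:", decode_levels(16, k), "levels")
```

Resolving to 16 locations means decoding 12 address bits: three levels of 4-input LUTs, or two levels of 6-input LUTs.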

Dr Jefyll has suggested on another thread that I/O be mapped to zero page. There is a lot of merit to his suggestion. The M65C02/M65C02A cores are capable of being programmed to indicate that particular instructions are being executed, or if zero page or the stack page is being accessed. I currently have only 3 bits defined in the microcode for this purpose. My current definitions focus on defining INV (invalid or undefined) opcodes, VAL (valid) opcodes, COP instruction, XCE instruction, MMU instruction, BRK instruction, STP instruction, and WAI instruction. Instead of these definitions, I can assign one of the instruction mode codes to indicate whether ZP is being accessed. In that case, a dedicated zero page I/O decode unit can easily generate, in one level of logic, the segmentation of the data page to 16 bytes.
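A sketch of what such a zero page I/O decoder computes, assuming a single "ZP access" indication from the microcode and one select line per 16-byte block. The names are illustrative only.

```python
def zp_io_select(addr, zp_access):
    """Return a 16-bit one-hot select, one line per 16-byte block of zero page.

    zp_access stands in for the instruction mode code that would flag a
    zero page access; with it asserted, only address bits [7:4] are decoded.
    """
    if not zp_access:
        return 0
    return 1 << ((addr >> 4) & 0xF)

if __name__ == "__main__":
    print(f"{zp_io_select(0x0037, True):#06x}")
```

Only four address bits take part in the decode, which is why it fits in one level of 4-input LUT logic.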

Posted: **Tue Jul 01, 2014 2:53 pm**

MichaelM wrote:

... these cores decode instructions whose high and low nibbles have been swapped...

Hi Michael
Thanks for your detailed responses! I can't guess what you mean by the comment about nibble swapping, and I confess I haven't had a look at your code. (The only thing I can think of is that you've found it useful to concatenate one or more bits from the top of the byte with one or more from the bottom.) Please enlighten me, if you can!

(For my own part, I'm happy with just 8 bits of opcode. The idea of making a 32-bit machine with single-word fetches comprising an 8-bit opcode with an optional 24-bit operand is presently more appealing to me than the idea of enlarging the register set and using more bits for the opcode, although that's obviously a valid thing to do. When all addresses are a single fetch in size, a few opcodes become freed up anyway because zero page and absolute are the same thing.)

Cheers
Ed

Posted: **Tue Jul 01, 2014 9:25 pm**

Ed:

Below is the first row (LSN == 0x0) of the decode table:

--------------------------------------------------------------------------------
-- Row 0 : 0x00-0xF0 (All Bcc/JMP/JSR/RTS/RTI implemented as uninterruptable)
--  I   BA, Wt, En, NA, PC, IO, DI, SP, Reg_WE, ISR
--------------------------------------------------------------------------------
_BRK_imm:    BRV2    _Brk,0,1, Stk,, WR, PCH, Psh, WE_P      -- Start Break Handler
_BPL_rel:    BRV0    _Rel,0,1,, Rel, IF, OP1                 -- Read rel Value
_JSR_abs:    BRV0    _JSR,0,1,, Pls, IF, OP1                 -- Read Dst Ptr Lo
_BMI_rel:    BRV0    _Rel,0,1,, Rel, IF, OP1                 -- Read rel Value
_RTI_imp:    BRV0    _RTI,0,1, Stk,, RD, OP1, Pop            -- Read PSW from Stack
_BVC_rel:    BRV0    _Rel,0,1,, Rel, IF, OP1                 -- Read rel Value
_RTS_imp:    BRV0    _RTS,0,1, Stk,, RD, OP1, Pop            -- Read PCL from Stack
_BVS_rel:    BRV0    _Rel,0,1,, Rel, IF, OP1                 -- Read rel Value
_BRA_rel:    BRV0    _Rel,0,1,, Rel, IF, OP1                 -- Read rel Value
_BCC_rel:    BRV0    _Rel,0,1,, Rel, IF, OP1                 -- Read rel Value
_LDY_imm:    BMW     _Imm,0,1,, Pls, IF, OP1                 -- Read #imm Value
_BCS_rel:    BRV0    _Rel,0,1,, Rel, IF, OP1                 -- Read rel Value
_CPY_imm:    BMW     _Imm,0,1,, Pls, IF, OP1                 -- Read #imm Value
_BNE_rel:    BRV0    _Rel,0,1,, Rel, IF, OP1                 -- Read rel Value
_CPX_imm:    BMW     _Imm,0,1,, Pls, IF, OP1                 -- Read #imm Value
_BEQ_rel:    BRV0    _Rel,0,1,, Rel, IF, OP1                 -- Read rel Value

and below is the opcode decode table for the fifth row (LSN = 0x5):

--------------------------------------------------------------------------------
-- Row 5 : 0x05-0xF5
--  I   BA, Wt, En, NA, PC, IO, DI, SP, Reg_WE, ISR
--------------------------------------------------------------------------------
_ORA_dp:     BRV0    _RO_DP,0,1,, Pls, IF, OP1               -- Read DP
_ORA_dpX:    BRV0    _RO_DPX,0,1,, Pls, IF, OP1              -- Read DP
_AND_dp:     BRV0    _RO_DP,0,1,, Pls, IF, OP1               -- Read DP
_AND_dpX:    BRV0    _RO_DPX,0,1,, Pls, IF, OP1              -- Read DP
_EOR_dp:     BRV0    _RO_DP,0,1,, Pls, IF, OP1               -- Read DP
_EOR_dpX:    BRV0    _RO_DPX,0,1,, Pls, IF, OP1              -- Read DP
_ADC_dp:     BRV0    _RO_DP,0,1,, Pls, IF, OP1               -- Read DP
_ADC_dpX:    BRV0    _RO_DPX,0,1,, Pls, IF, OP1              -- Read DP
_STA_dp:     BRV0    _WO_DP,0,1,, Pls, IF, OP1               -- Read DP
_STA_dpX:    BRV0    _WO_DPX,0,1,, Pls, IF, OP1              -- Read DP
_LDA_dp:     BRV0    _RO_DP,0,1,, Pls, IF, OP1               -- Read DP
_LDA_dpX:    BRV0    _RO_DPX,0,1,, Pls, IF, OP1              -- Read DP
_CMP_dp:     BRV0    _RO_DP,0,1,, Pls, IF, OP1               -- Read DP
_CMP_dpX:    BRV0    _RO_DPX,0,1,, Pls, IF, OP1              -- Read DP
_SBC_dp:     BRV0    _RO_DP,0,1,, Pls, IF, OP1               -- Read DP
_SBC_dpX:    BRV0    _RO_DPX,0,1,, Pls, IF, OP1              -- Read DP

What you should notice is that when the decode table is organized in this fashion the opcodes are more closely related. This makes it much easier to create and verify the microcode. The first example defines most of the program control opcodes, and the second example defines the opcodes for zero page direct and zero page pre-indexed direct instructions. There are some inconsistencies, but the vast majority of the opcodes are related by type or by addressing mode when the opcodes are indexed by the MSN first and then the LSN. This means that the LSN and MSN are swapped.

Edit: deleted "the" from "defines the most" in the third sentence of last paragraph.
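To spell out the nibble swap for anyone following along, here is a small sketch that maps a standard opcode to its position in a table organized as above (row = LSN, entry within the row = MSN). The function name is made up, and ROW0 simply lists the mnemonics from the row 0 listing.

```python
def table_index(opcode):
    """Swap the nibbles of an 8-bit opcode: {MSN, LSN} becomes {LSN, MSN}."""
    return ((opcode & 0x0F) << 4) | (opcode >> 4)

# Mnemonics of row 0, in the order they appear in the listing above.
ROW0 = ["BRK", "BPL", "JSR", "BMI", "RTI", "BVC", "RTS", "BVS",
        "BRA", "BCC", "LDY", "BCS", "CPY", "BNE", "CPX", "BEQ"]

if __name__ == "__main__":
    for opcode in (0x00, 0x20, 0xA0, 0xB5):
        idx = table_index(opcode)
        print(f"opcode {opcode:#04x} -> row {idx >> 4:X}, entry {idx & 0xF:X}")
```

So LDY #imm (0xA0) lands in row 0, entry A, next to the branches, and LDA dp,X (0xB5) lands in row 5, entry B, with the other zero page instructions.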
