Jeff:
Good point on the cumbersomeness and management issues with a mode bit. I was leaning toward an escape/prefix approach like the segment override prefixes of the x86 architecture. Your suggestion sort of seals it in favor of a prefix approach. So in planning for this type of modification, I think that the prefix byte would only remain in effect until the following Rockwell instruction completed. The prefix byte would not be interruptible because it would be considered part of the following opcode.
One thought that I had as I was writing this reply was that the prefix approach could be used to select more than just the ZP indirect addressing mode. There are enough unused opcodes that many of the other ZP addressing modes could be added to instructions such as the Rockwell instructions, TRB/TSB, BIT, etc. using different prefix bytes to represent the desired addressing mode. However, I am thinking that only a single prefix, IND, to add indirection (if applicable) to the addressing mode of the following instruction would really be necessary. Adding pre-indexed ZP indirect and post-indexed ZP indirect addressing may be nice but not truly "necessary".
I've not completely sorted out how I would implement such a mechanism in the microprogram. I am thinking of adding a hidden bit to the PSW that only gets set when the IND prefix is executed and cleared otherwise. In the microprogram, if the IND bit of the PSW is set, a conditional branch to the corresponding indirect addressing mode sequence is made. The IND bit will clear automatically when the following instruction completes, i.e. when Sync asserts.
If the IND prefix is applied to an instruction that already supports indirect addressing, then it shouldn't have an effect. I see the IND prefix applying to instructions such as TRB dp/TSB dp to convert the instructions to TRB (dp)/TSB (dp). Similarly, TRB abs/TSB abs would be converted to TRB (abs)/TSB (abs). Applying the IND prefix to JSR abs should convert it to JSR (abs). (This may be a good instruction to add to support a FORTH interpreter.)
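A rough behavioral sketch of the IND prefix as described above, in Python rather than microcode. The `Core` class, the `execute` method, and the mode strings are hypothetical names made up for illustration. The sketch only assumes what is stated above: the hidden bit is set by the prefix, it redirects dp/abs modes to their indirect forms, it has no effect on modes that are already indirect, and it clears when the following instruction completes.

```python
# Hypothetical model of the IND prefix; all names here are illustrative only.
INDIRECT_FORM = {
    "dp": "(dp)",    # e.g. TRB dp  -> TRB (dp)
    "abs": "(abs)",  # e.g. JSR abs -> JSR (abs)
}


class Core:
    def __init__(self):
        self.ind = False  # hidden PSW bit, set only by the IND prefix

    def execute(self, mnemonic, mode=None):
        """Return the addressing mode actually used for this instruction."""
        if mnemonic == "IND":
            # The prefix is treated as part of the next opcode,
            # so it is not an interruptible instruction boundary.
            self.ind = True
            return None
        # Modes with no entry (already indirect, implied, ...) pass through.
        effective = INDIRECT_FORM.get(mode, mode) if self.ind else mode
        self.ind = False  # cleared when the instruction completes (Sync)
        return effective


core = Core()
core.execute("IND")
print(core.execute("TRB", "dp"))  # prefixed instruction
print(core.execute("TRB", "dp"))  # next instruction, prefix no longer active
```

The second TRB runs with its normal dp mode because the hidden bit only survives for one instruction.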
BigEd is probably right with regard to the Rockwell instructions, at least for the SMBx/RMBx instructions. The BBSx/BBRx instructions may be more useful if the IND prefix is applied. With indirect addressing added to instructions such as BIT and TSB/TRB, using the IND prefix, the need for the Rockwell instructions may be eliminated. (If setting/clearing multiple bits in an I/O register using a mask in the Accumulator is not required, then the Rockwell instructions may be of benefit. Similarly, if setting/clearing a bit is required without otherwise affecting a register or the PSW, then the Rockwell SMBx/RMBx instructions are useful. Otherwise their functionality could be replaced by the BIT and TSB/TRB instructions followed by an optional branch instruction.) I will probably restore the Rockwell instructions simply to provide an instruction set compatible with the WDC 65C02S microprocessor. Their functionality can certainly be emulated by the remainder of the instruction set.
EEyE:
I have implemented the two cores to share a single microprogram memory, and communicate using the last internal Block RAM. I have built a prototype using a Spartan 3A XC3S200A FPGA. I included a simple interrupt handler, an MMU, and a buffered UART on each core. I am working to share a single SPI Master between them, but at the moment the SPI Master is assigned to only one core. One core is allocated 16 kB of block RAM for its use, and the other is allocated 8 kB of block RAM for its use. The cores are expected to share a common 2 kB boot ROM, and 2 kB of DPRAM. The remaining 4 kB of block RAM are shared between the cores as dual-ported microprogram ROMs. The DPRAM allows for the transfer of data between the cores in a master-slave configuration.
The cores also share an external memory interface. I have had some issues getting the cores to share the external memory interface. I don't plan to release the dual core implementation until I've worked out that interface.
For simplicity, I used my 4 cycle microprogram sequencer. The cores ran at ~60 MHz with the M65C02 core as the basic component. I suspect that the M65C02A core will provide single cycle operation at 40 MHz+. With the 4 cycle microprogram sequencer the equivalent execution speed is less than 15 MHz (~60 MHz divided by 4). Still plenty fast enough for many applications, but slower than a single cycle core running at a lower clock rate. The four cycle microprogram sequencer also makes it much easier to implement a 6502-compatible external memory interface.
cbscpe:
Although I have a fond regard for the PDP 11 memory management scheme, the scheme I have been exploring is focused on using the FPGA resources in a way that reduces the decode time. I don't have plans to add mode bits to the M65C02 cores like those found on the PDP 11 for kernel, supervisor, and user modes. As such, any OS for the M65C02/M65C02A cores would not be able to provide the type of memory protection that the PDP 11 MMU offered.
Restricting read/write/execute on a page was not an objective, primarily because I do not generally view the 6502/65C02 architecture as being suitable (for my purposes) for a general purpose operating system. Instead, I view the 6502/65C02 as ideal bare metal microprocessors/microcontrollers which, in my way of looking at things, should not be burdened by an OS. They may have a BIOS/Monitor, but not an OS. Furthermore, I am of that school of thought that an OS is to be avoided in real-time applications.
My specific focus has been to find a way to reduce the number of logic levels needed to decode the address bus. As such, the M65C02_MMU is based on a fundamental characteristic of the Spartan 3A Look-Up Tables (LUTs), namely the 4-bit address into the LUT. If this characteristic is utilized efficiently, then passing the core's address output through the MMU will only increase the combinatorial path delay in the address path in a minor way.
I have reasoned that if external devices are attached and the operational objective is to operate at the highest clock rate possible, then a partially decoded chip enable output from the FPGA will reduce the address bit width that any external decoders will be required to process. This will allow the maximum clock rate to be used while allowing one or two levels of external decode logic.
For example, I have blocked the address space into 16 4 kB pages. Each MMU register currently has 4 bits that define the number of base wait states for that page; 8 bits to define 8 chip enables that may be used to select internal or external devices; and 8 bits to extend the address space from 4 kB to 1 MB. These address extension bits, when coupled with the chip enables, define the potential address space to be as high as 8 MB. However, I am currently using one of the chip enables to define internal versus external memory in order to reduce the amount of logic needed to multiplex between the external and internal memory or I/O devices. I use four of the chip enables in my M65C02/M16C5x Development Board for external devices, and the remaining 3 chip enables for internal memory and on-chip I/O devices. (The additional decode logic required to avoid loss of address space to I/O devices generally requires the addition of a wait state to compensate for the additional path delays introduced by the I/O device decode logic.)
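To make the register layout above concrete, here is a minimal Python sketch of the page lookup. The `MmuEntry` and `translate` names are hypothetical, and how the three fields are packed into the actual MMU register is not specified above, so they are kept as separate fields here. The sketch only assumes that the top 4 address bits select one of 16 entries and that the 8 extension bits replace them as physical address bits 19..12.

```python
# Illustrative sketch only: names and field packing are assumptions.
from dataclasses import dataclass

PAGE_BITS = 12  # 4 kB pages, 16 of them in the 64 kB core address space


@dataclass
class MmuEntry:
    wait_states: int   # 4 bits: base wait states for the page
    chip_enables: int  # 8 bits: chip enable mask for the page
    ext: int           # 8 bits: physical address bits 19..12


def translate(mmu, vaddr):
    """Map a 16-bit core address to (chip_enables, 20-bit address, wait_states)."""
    entry = mmu[(vaddr >> PAGE_BITS) & 0xF]
    paddr = (entry.ext << PAGE_BITS) | (vaddr & 0xFFF)
    return entry.chip_enables, paddr, entry.wait_states


# Identity map on chip enable 0, then remap page 0xC to a slower device.
mmu = [MmuEntry(0, 0x01, page) for page in range(16)]
mmu[0xC] = MmuEntry(wait_states=2, chip_enables=0x10, ext=0xAB)

ce, paddr, ws = translate(mmu, 0xC123)
print(hex(ce), hex(paddr), ws)
```

With 12 offset bits plus 8 extension bits, each chip enable spans 2^20 bytes (1 MB), and 8 chip enables give the 8 MB upper bound mentioned above.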
It is also possible to cascade the MMUs. The second level MMUs can be simpler in that they only generate additional chip enables. The additional combinatorial path delay in the address output path due to cascaded MMUs will reduce the overall operating speed attainable for any specific FPGA family. Unlike the Spartan 3A, the Spartan 6 FPGA family has 6 bit LUTs. This feature of the Spartan 6 LUTs allows the address space of the M65C02/M65C02A cores to be more finely paged in 1 kB pages. Further, a two level decode allows the address space to be resolved to 16 locations. In contrast, to resolve the address space to this level in a Spartan 3A requires at least three decode levels.
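The page sizes and decode level counts above follow from simple arithmetic on the LUT input width. The sketch below is a simplified model, not a description of the actual MMU logic: it assumes each decode level consumes as many address bits as the LUT has inputs, and it ignores any inputs spent on chaining enables between levels. The function names are made up for this example.

```python
import math

ADDR_BITS = 16  # 6502 address bus width


def page_size(lut_inputs):
    """Bytes per page when a single LUT level decodes the top address bits."""
    return 1 << (ADDR_BITS - lut_inputs)


def levels_needed(lut_inputs, resolution_bytes):
    """LUT levels needed to resolve the address space to resolution_bytes.

    resolution_bytes is assumed to be a power of two.
    """
    offset_bits = resolution_bytes.bit_length() - 1
    return math.ceil((ADDR_BITS - offset_bits) / lut_inputs)


print(page_size(4), levels_needed(4, 16))  # Spartan 3A style 4-input LUTs
print(page_size(6), levels_needed(6, 16))  # Spartan 6 style 6-input LUTs
```

Under these assumptions the 4-input case gives 4096-byte pages and three levels to reach 16 locations, while the 6-input case gives 1024-byte pages and two levels, matching the figures quoted above.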
Dr Jefyll has suggested on another thread that I/O be mapped to zero page. There is a lot of merit to his suggestion. The M65C02/M65C02A cores are capable of being programmed to indicate that particular instructions are being executed, or whether zero page or the stack page is being accessed. I currently have only 3 bits defined in the microcode for this purpose. My current definitions focus on defining INV (invalid or undefined) opcodes, VAL (valid) opcodes, COP instruction, XCE instruction, MMU instruction, BRK instruction, STP instruction, and WAI instruction. Instead of these definitions, I can assign one of the instruction mode codes to indicate whether ZP is being accessed. In which case a dedicated zero page I/O decode unit can easily generate, in one level of logic, the segmentation of the data page into 16-byte blocks.
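As a behavioral illustration of that last point, the sketch below assumes a hypothetical `zp_access` flag derived from the instruction mode code, and assumes that address bits 7..4 pick the 16-byte block within zero page. Neither detail is specified above, so treat this as one possible reading rather than the actual decode unit.

```python
def zp_io_select(zp_access, addr):
    """Return the index (0-15) of the selected 16-byte block, or None.

    zp_access is a hypothetical flag meaning "this cycle accesses zero page".
    """
    if not zp_access:
        return None
    return (addr >> 4) & 0xF


print(zp_io_select(True, 0x0034))   # zero page access to $34
print(zp_io_select(False, 0x1234))  # not a zero page access
```

Because the flag already says "zero page", the decoder never needs to look at the upper address byte, which is what keeps the decode shallow.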