Arlet:
Sorry to have missed your earlier post.
My objective in the core implementation was for the basic synchronous logic to operate at 100 MHz. The only way I found to achieve this was to implement extra adders and decrementers for the next address (i.e. the program counter) and the stack pointer, and to add some pipelining. Thus, the core uses several of these additional functional units to perform several operations in parallel. As a consequence, the core is not nearly as LUT-efficient as the one you developed.
In the main core file at lines 670 and 695, the left and right operands, respectively, for the next address are produced by 16:1 multiplexers. I assigned the operand select codes for the next address generator in a fashion that allows some logic reduction. When I first formulated these multiplexers, I created a combinatorial logic loop through the RAM. I broke that loop by requiring the operands for the next address to be sourced from a register: PC, OP2, OP1, or S (StkPtr). In parallel with the NA adder at line 720, the next PC value is computed in the always block at line 740. A multiplexer in that always block performs the operand selection and next PC computation. In contrast to NA, the next PC value may include the current input from the RAM, DI, as an operand. (DI is the memory input bus, and is used to drive the IR, OP1, and OP2 registers, and the address bus of the two microcode block RAMs during the cycle when the instruction opcode is present on the bus.)
Instruction execution is completed in a single cycle (except for BCD mode ADC/SBC instructions) during the instruction fetch cycle of the following instruction. Thus, the condition code status flag which affects the branch instruction is present on the cycle during which the branch address offset is being fetched. At line 750, the PC value is computed using the state of the selected CC flag, and is either the next sequential address (branch not taken) or the current PC + offset + 1. For RTI/RTS instructions, the next instruction's address is computed at line 749.
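To make the branch case concrete, here is a small behavioral sketch in Python (not the Verilog from the core). It assumes the PC holds the address of the offset byte while that byte is on DI, and that the offset is a signed 8-bit value as on a standard 65C02; the function names are made up for illustration.

```python
def sign_extend8(value: int) -> int:
    """Interpret the low 8 bits of value as a two's-complement number."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def next_pc_for_branch(pc: int, offset_byte: int, cc_flag_true: bool) -> int:
    """Model of the branch next-PC selection (illustrative names only).

    pc           -- assumed to be the address of the offset byte being fetched
    offset_byte  -- the byte currently on DI
    cc_flag_true -- state of the selected condition code flag
    """
    if not cc_flag_true:
        # Branch not taken: next sequential address.
        return (pc + 1) & 0xFFFF
    # Branch taken: current PC + offset + 1, wrapped to 16 bits.
    return (pc + sign_extend8(offset_byte) + 1) & 0xFFFF
```

For example, a branch-to-self with the opcode at 0x1000 and an offset byte of 0xFE at 0x1001 yields 0x1000 when taken and 0x1002 when not taken.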
These two additional adder functions allow the core to trim 1 clock cycle from branches and interrupt/subroutine returns. With respect to stack operations, the stack pointer is implemented with an add/sub at the S register input, and an adder at the S register output. The adder on the output is used during stack pops to eliminate a dead cycle, since the 65C02 stack pointer points to the next free location. This characteristic of the 65C02 is great for push operations, but it would otherwise cause pops to incur a one clock cycle penalty, as the S register must be pre-incremented before the stack read can be performed.
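The stack pointer arrangement can be sketched behaviorally as follows. This is a Python model under my own assumptions, not the core's Verilog: the class and method names are hypothetical, and the stack is placed in page 1 (0x0100 to 0x01FF) as on a standard 65C02.

```python
class StackModel:
    """Behavioral model of a 65C02-style stack with an output adder on S."""

    def __init__(self) -> None:
        self.s = 0xFF                    # S points at the next free location
        self.mem = bytearray(0x10000)

    def push(self, value: int) -> int:
        # S already addresses the free slot, so it is used directly.
        addr = 0x0100 | self.s
        self.mem[addr] = value & 0xFF
        self.s = (self.s - 1) & 0xFF     # add/sub at the S register input
        return addr

    def pop(self) -> int:
        # The adder on the S output forms S + 1 for the read address,
        # so the read does not wait for S to be incremented first.
        addr = 0x0100 | ((self.s + 1) & 0xFF)
        self.s = (self.s + 1) & 0xFF     # S is updated alongside the read
        return self.mem[addr]
```

In this model the read address and the new S value are both derived from the old S, which is the point of having a second adder on the register output.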
The MAR register is used to capture the NA adder output, and is used as the basis for the NA during absolute, indexed absolute, and post-indexed indirect addressing.
I ran into a problem with interrupt servicing. Since the core performs the NA and next PC calculations in parallel, when an interrupt is taken the PC is already pointing to the next instruction instead of the last byte of the instruction being interrupted. To correct this issue without changing the PC always block, another register was added to capture the previous value of the PC whenever the PC is modified (line 762), and two in-line multiplexers were added at lines 838 and 839. These multiplexers select the PC's past value instead of the PC's current value whenever an interrupt push operation is performed as a consequence of an interrupt rather than a BRK instruction.
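A rough behavioral sketch of that fix, again in Python rather than the core's Verilog. The names here are hypothetical, and I am assuming for illustration that the two multiplexers correspond to the high and low bytes of the pushed PC.

```python
class PCTracker:
    """Keeps the current PC plus a copy of its previous value."""

    def __init__(self, reset_pc: int = 0) -> None:
        self.pc = reset_pc & 0xFFFF
        self.prev_pc = reset_pc & 0xFFFF  # hypothetical name for the extra register

    def update(self, new_pc: int) -> None:
        # Capture the old PC whenever the PC is modified.
        self.prev_pc = self.pc
        self.pc = new_pc & 0xFFFF

    def push_bytes(self, is_interrupt: bool, is_brk: bool) -> tuple:
        # Select the past PC only for an interrupt push that is not a BRK.
        use_prev = is_interrupt and not is_brk
        value = self.prev_pc if use_prev else self.pc
        # Assumed split: one mux for the high byte, one for the low byte.
        return (value >> 8) & 0xFF, value & 0xFF
```

With the PC updated from 0x1234 to 0x1235, an interrupt push in this model returns the bytes of 0x1234, while a BRK push returns the bytes of 0x1235.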
Hope this explanation answers your question.
_________________ Michael A.
Last edited by MichaelM on Sat May 05, 2012 2:38 pm, edited 1 time in total.