Dajgoro:
Andre Fachat makes a good suggestion regarding the inverter before the ALU to implement subtraction.
I was in the process of writing a missive on the CPLD architecture, and lost it. I won't go back and reconstruct it, but I do suggest a thorough familiarization with the architecture of a CPLD. It is best thought of as a set of macrocells, each built from logic cells in which an AND array feeds an OR gate which feeds into a configurable FF. The macrocells generally consist of 18 cells. (Xilinx's data sheets call the group of 18 a function block and each cell within it a macrocell; I use macrocell for the group and cell for its members here.) There are several advantages to the CPLD over a RAM-based FPGA, but the generally accepted distinguishing feature is the ability of a CPLD to implement wider logic functions in fewer logic levels. Ignoring the fact that most FPGA logic cells have sub-nanosecond performance, fewer logic levels will result in less delay.
The disadvantage that a CPLD has with respect to an FPGA is the lack of FFs and routing resources. Vendors have worked hard to include routing resources in CPLD architectures, and in most cases there are no constraints on pin placement. However, the basic architecture of a CPLD logic cell is still that of the 22V10 PAL first introduced by Monolithic Memories and AMD in the early 80s. That means that the CPLD places its emphasis on the AND-OR array in order to implement wide logic functions in two levels of logic, for which a RAM-based FPGA may require as many as 10-20 levels.
That advantage is not as significant as you'd like, particularly when implementing counters, ALUs, and bidirectional busses. Consider the number of terms required by each bit in a simple counter. The first bit is a simple toggle of the output, i.e. an inverter. The second bit is a bit more complicated, but it is at least two inputs wide. As the counter grows, the number of inputs to each stage grows and is equal to its bit number. Simple functions don't make effective use of the AND-OR array, which can't generally be shared with functions requiring more OR terms. Complex functions such as counters greater than 8 bits in width require more OR terms than are generally available, and so must be expanded in some manner in order that they may be correctly implemented.
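To put rough numbers on the counter example, here is a small Python sketch. It assumes plain D-type flip-flops (no T-FF or XOR option in the cell) and an assumed limit of 8 AND terms per OR gate; the function names are mine and purely illustrative. It writes out the sum-of-products for each counter bit and checks it against a straightforward increment.

```python
from itertools import product

OR_WIDTH = 8  # assumed number of AND terms one cell's OR gate accepts


def counter_bit_terms(n):
    """Sum-of-products terms for the D input of bit n of a binary up-counter.

    Each term maps a bit index to the value that bit must have.
    """
    # Hold a 1 while any lower bit is still 0 (one term per lower bit).
    terms = [{n: 1, k: 0} for k in range(n)]
    # Set the bit when it is 0 and every lower bit is 1.
    terms.append({**{k: 1 for k in range(n)}, n: 0})
    return terms


def eval_sop(terms, state):
    """Evaluate the OR of the AND terms against a tuple of bit values."""
    return int(any(all(state[i] == v for i, v in t.items()) for t in terms))


def check_bit(n):
    """Confirm the terms reproduce bit n of (count + 1) for every count."""
    terms = counter_bit_terms(n)
    for state in product((0, 1), repeat=n + 1):
        count = sum(bit << i for i, bit in enumerate(state))
        if eval_sop(terms, state) != ((count + 1) >> n) & 1:
            return False
    return True


if __name__ == "__main__":
    for n in range(10):
        terms = counter_bit_terms(n)
        widest = max(len(t) for t in terms)
        verdict = "fits" if len(terms) <= OR_WIDTH else "needs expansion"
        print(f"bit {n}: {len(terms)} terms, widest AND {widest} inputs, {verdict}")
```

Under these assumptions bit n needs n + 1 terms, so the ninth bit (n = 8) is the first one that no longer fits in an 8-term OR gate. A cell with a toggle or XOR option would need far fewer terms, which is why it pays to know what the target part provides.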
A CPLD generally has an OR gate that allows it to accept 8 AND terms. In terms of your minimal CPU, this means that you need to ensure that your logic functions are 8 (or whatever the architecture provides) terms wide or less. If not, the synthesizer will require the use of terms imported from adjacent cells, which will generally limit the logic functions that those adjacent cells can implement.
I am not a particular fan of either VHDL or Verilog (the better of the two) because they tend to hide the architecture from the designer. Generally this allows the designer to focus on the more important architectural elements of the design. However, I've yet to work with a synthesizer that informs the designer of the effects his/her code has on the underlying architecture.
I misunderstood the thrust of your earlier post. The block diagram that you posted earlier clears up your objectives very well. The issue I see for you is that you have an external bidirectional bus that you want to implement. It's been my past experience with the 95144, 95108, and 9536 parts that you have to keep the number of (internal/external) busses down to a minimum. The inter-macrocell routing resources are at a premium in these parts. Within a macrocell, the routing resources are quite extensive, and you can readily implement any function that stays within the OR width limit and the cell limit of the macrocell, i.e. 8 and 18, respectively. (In the 9572XL, there are 54 inputs into each macrocell of 18 cells, and 18 outputs from the macrocell into the fast interconnect matrix.)
Thus, from your block diagram, I think that you will need to try to limit the ALU to fit into a single macrocell, or 18 logic cells. Similarly, the register bank should fit into a macrocell. The AR and PC will need to be implemented in the remaining macrocells. The register bank, 4x4 or 16 logic cells, should be fairly easy to constrain to fit as suggested. It appears that you want the PC and AR to be 8-bit registers. If my count is correct, you can expect to require 16 registers (logic cells) for the register array, and 16 registers for the PC/AR combination. This leaves approximately 36 cells (two macrocells) for the ALU and the sequencer. I am assuming that the 4 cells left over in the other macrocells will not be available.
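The cell budget above can be written down as a few lines of Python. This is only bookkeeping under the assumptions in this post (a 72-cell part arranged as 4 groups of 18, a 4x4 register bank, 8-bit PC and AR, and the leftover cells next to the registers treated as unusable); the names are illustrative.

```python
CELLS_PER_GROUP = 18  # cells per group of 18, as described above
GROUPS = 4            # 4 x 18 = 72 cells in a 9572


def cell_budget(n_regs=4, data_width=4, addr_width=8):
    """Return a rough logic-cell budget for the block diagram discussed."""
    total = GROUPS * CELLS_PER_GROUP
    register_bank = n_regs * data_width   # one FF (cell) per register bit
    pc_and_ar = 2 * addr_width            # PC and AR, one cell per bit
    # Each of those two groups uses 16 of its 18 cells; assume the
    # leftovers are too constrained to be useful.
    stranded = 2 * (CELLS_PER_GROUP - 16)
    remaining = total - register_bank - pc_and_ar - stranded
    return {
        "total": total,
        "register_bank": register_bank,
        "pc_and_ar": pc_and_ar,
        "stranded": stranded,
        "alu_and_sequencer": remaining,
    }


if __name__ == "__main__":
    for name, cells in cell_budget().items():
        print(f"{name:>18}: {cells}")
```

Changing `n_regs` to 2 shows the 8 cells that an accumulator-style design would free up.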
The key to reducing the number of cells required for the ALU is to minimize the functions you require. For example, an ADD instruction is vital, but subtraction is not because it can be accomplished by adding the complement of one input with the carry-in set. Similarly, AND and OR functions are vital but NAND and NOT are not. I would recommend XOR instead, since XOR with all ones gives you NOT.
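As a quick illustration of subtraction by complement-and-add, here is a Python sketch of a 4-bit adder. The 4-bit width matches the register bank discussed here; the function names and the carry convention (carry out of 1 means no borrow, as on the 6502) are my assumptions for the example.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def add(a, b, carry_in=0):
    """WIDTH-bit add; returns (sum, carry_out)."""
    total = (a & MASK) + (b & MASK) + carry_in
    return total & MASK, total >> WIDTH


def sub(a, b):
    """a - b using only the adder: a + ~b + 1 (two's complement).

    carry_out == 1 means no borrow occurred.
    """
    return add(a, ~b & MASK, carry_in=1)


def bitwise_not(a):
    """NOT obtained from XOR with all ones, so no separate NOT is needed."""
    return (a ^ MASK) & MASK


if __name__ == "__main__":
    print(sub(5, 3))          # result and carry for 5 - 3
    print(sub(3, 5))          # wraps modulo 16, carry cleared
    print(bitwise_not(0b1010))
```

The only hardware added for subtraction is the inverter on one ALU input, which is the point of Andre's suggestion.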
Thus, I recommend that you reconsider your instruction set with the idea of minimizing the ALU to fit within 8-12 cells: four cells for ADD; four cells for AND, OR, and XOR; and four cells for ASR, ROL. After this is achieved, I would then consider simply adding an increment (INC) instruction for PC management.
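A behavioral sketch of such a reduced ALU, in Python, might look like the following. The opcode names come from the list above, but the encodings, the 4-bit width, and the choice to rotate within the word rather than through carry are assumptions made for the example, not a statement about your design.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)


def alu(op, a, b=0):
    """Minimal 4-bit ALU: ADD, AND, OR, XOR on (a, b); ASR, ROL on a."""
    a &= MASK
    b &= MASK
    if op == "ADD":
        return (a + b) & MASK
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    if op == "XOR":
        return a ^ b
    if op == "ASR":
        # Arithmetic shift right: the sign bit is copied back into the top.
        return (a >> 1) | (a & SIGN)
    if op == "ROL":
        # Rotate left within the word (assumed: not through carry).
        return ((a << 1) | (a >> (WIDTH - 1))) & MASK
    raise ValueError(f"unknown op: {op}")


if __name__ == "__main__":
    for op in ("ADD", "AND", "OR", "XOR"):
        print(op, format(alu(op, 0b1100, 0b1010), "04b"))
    for op in ("ASR", "ROL"):
        print(op, format(alu(op, 0b1001), "04b"))
```

Writing the function table out like this before touching the HDL makes it easier to see which operations share terms and which ones will cost a cell of their own.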
Following that effort, I would look to defining the load and store instructions. I recommend that you have two loads, load from memory (LD) and load constant (LDC), and a single store (ST) instruction.
I see where you desire to have a general register-based architecture, which is laudable. But for such a minimal CPU, I recommend that you reconsider an accumulator-based architecture. The 65C02 architecture is a good example. If there were exchange instructions instead of transfer instructions, then the architecture would be more flexible. The problem with your register-based architecture in the small space of the 9572 is the number of logic cells that will be needed to construct the number of multiplexers required for the two-operand architecture that your block diagram implies. The implicit address of the accumulator means that only a single multiplexer is generally required on the ALU inputs since the accumulator is always one of the inputs.
Furthermore, I did not see a subroutine call and return mechanism. Without it, I can't think that you will be satisfied with your CPU. (Even the microprogram controller that I used in implementing the MAM65C02 core (see GitHub) implements a subroutine/return mechanism, even though the MAM65C02 microcode did not make use of that capability. The point is that subroutines are the key feature of any useful computer/controller.) If you opt for an accumulator-based architecture, then I recommend reducing the register bank from 4 registers to 2 registers, and using the 8 cells gained to provide at least a single subroutine level.
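To show what a single subroutine level amounts to, here is a Python sketch of an 8-bit PC with one 8-bit return register (the 8 cells mentioned above). The class and method names are hypothetical, and the sketch assumes the return address is simply the address following the call.

```python
ADDR_MASK = 0xFF  # 8-bit PC, as assumed from the block diagram


class Sequencer:
    """Program counter with a single-level return (link) register."""

    def __init__(self):
        self.pc = 0
        self.link = 0  # the 8 extra cells: one saved return address

    def step(self):
        """Advance to the next sequential address."""
        self.pc = (self.pc + 1) & ADDR_MASK

    def call(self, target):
        """Save the address after the call, then jump to the subroutine."""
        self.link = (self.pc + 1) & ADDR_MASK
        self.pc = target & ADDR_MASK

    def ret(self):
        """Return to the saved address."""
        self.pc = self.link


if __name__ == "__main__":
    seq = Sequencer()
    seq.pc = 0x10
    seq.call(0x80)
    seq.step()          # execute one instruction in the subroutine
    seq.ret()
    print(hex(seq.pc))  # back at the instruction after the call
```

With only one link register, a nested call overwrites the saved address, so the subroutine itself cannot call anything unless it first stores the link somewhere.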
With all of that said, the sequencer may be a significant challenge to get to fit into the remaining 24 logic cells, and I've not even accounted for the PSW that will be more than likely necessary to allow you to extend your machine to support multi-word arithmetic operations. (One thought would be to create a BCD mode adder since carry would be matched to the ALU operand size of your machine.)
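For the multi-word arithmetic mentioned here, the carry bit in the PSW is what lets a 4-bit ALU add wider numbers one word at a time. The Python sketch below assumes 4-bit words stored least significant first; the helper names are illustrative.

```python
WORD_BITS = 4
WORD_MASK = (1 << WORD_BITS) - 1


def add_word(a, b, carry_in):
    """One pass through a 4-bit adder; returns (sum, carry_out)."""
    total = (a & WORD_MASK) + (b & WORD_MASK) + carry_in
    return total & WORD_MASK, total >> WORD_BITS


def add_multiword(a_words, b_words):
    """Add two equal-length numbers given as 4-bit words, low word first.

    The carry flag is cleared before the first add and then fed from
    each word into the next, as an add-with-carry instruction would do.
    """
    carry = 0
    result = []
    for a, b in zip(a_words, b_words):
        s, carry = add_word(a, b, carry)
        result.append(s)
    return result, carry


if __name__ == "__main__":
    # 0x3C + 0x2B, split into low and high nibbles.
    print(add_multiword([0xC, 0x3], [0xB, 0x2]))
```

The cost in the CPLD is one flip-flop for the carry flag plus whatever terms are needed to route it back into the adder's carry input.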
Andre's suggestion of an implementation in an XC95108 would be worth considering. As I stated above, you need to keep the number of busses to a minimum in order to successfully implement the processor in such a small CPLD. It is a good challenge, and I would recommend its completion because it would certainly hone your HDL skills and give you a high degree of satisfaction upon its completion.
_________________ Michael A.