Hi Everyone!
Finally, after 4 billion years, I have returned with an (at least somewhat) functional 65CE02 softcore.
The reason I'm only posting about this now is that I wanted to get the project to a point where I can throw the CPU on a real FPGA and have it actually run code.
Using my Altera DE2 dev board (Cyclone II), I'm able to run this 2700-logic-element beast of a CPU at 25 MHz with a simple incrementing program (both ROM and RAM are internal for testing purposes).
From what I know, this is basically the second 65CE02-based softcore ever made, right after the GS4510 used in the Mega65.
I personally don't like the way the GS4510 adds new instructions. IIRC it does it by having sequences of existing instructions activate some extended function, kinda like a cheat code in a game.
In this core I went with the Z80 way of adding instructions: a single opcode (AUG/MAP) is used as a prefix byte to access a separate opcode table with 256 potential instructions (only 52 are actually used right now).
This means that as long as a program stays away from the AUG opcode, there is no chance of accidentally executing an extended instruction.
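To make the prefix idea concrete, here is a minimal Python sketch of a two-table decoder. Only a handful of entries from the instruction listings in this post are filled in, and the names (`decode`, `BASE_TABLE`, `EXT_TABLE`) are made up for illustration; this is not how the actual core is wired internally.

```python
AUG = 0x5C  # prefix opcode: AUG on the 65CE02 (MAP on the 4510)

# Small excerpts of both opcode tables, taken from the listings in this post.
# A real decoder would have all 256 entries in each table.
BASE_TABLE = {0x09: "ORA #nn", 0x29: "AND #nn", 0x49: "EOR #nn", 0x69: "ADC #nn"}
EXT_TABLE = {0x09: "MUL #nn", 0x29: "MLH #nn", 0x49: "MOD #nn", 0x69: "DIV #nn",
             0x3B: "RND", 0xEF: "WAI", 0xFF: "STP"}


def decode(code: bytes, pc: int = 0) -> tuple[str, int]:
    """Return (mnemonic, opcode bytes consumed) for the instruction starting at pc.

    Operand bytes are not counted; "???" marks an unassigned slot.
    """
    op = code[pc]
    if op != AUG:
        return BASE_TABLE.get(op, "???"), 1
    # Only the byte after the prefix can index the separate extended table.
    return EXT_TABLE.get(code[pc + 1], "???"), 2
```

An unprefixed byte never reaches `EXT_TABLE`, which is exactly the guarantee described above.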
Anyways, time to talk about the actual CPU itself.
I want to start with the pins. While I was procrastinating... I mean working on this core, I looked a bit into the 65816 as well, and I really liked its VPA, VDA, and Abort pins.
So I've added them to this core, along with some additional V?A pins!
Code:
Name  Full Name                  Direction
VPA   Valid Program Address      Output
VDA   Valid Data Address         Output
VSA   Valid Stack Address        Output
VBA   Valid Base Page Address    Output
VVA   Valid Vector Address       Output
ABT   Abort Instruction          Input
Function:
VVA VBA VSA VDA VPA
 0   0   0   0   0   Internal Operation
 0   0   0   0   1   Opcode Fetch
 0   0   0   1   1   Operand Fetch
 0   0   0   1   0   Normal Data Read/Write
 0   1   0   1   0   Base Page Read/Write
 0   0   1   1   0   Stack Read/Write
 1   0   1   1   0   RESET/ABT/NMI/IRQ/BRK Stack Write or RTI Stack Read
 1   0   0   1   0   Vector Read
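Something like a bus monitor or logic-analyzer script could decode these pins straight from the table. The following Python sketch does that; the function names are hypothetical, and any pin combination not in the table is simply reported as unused.

```python
# Bus-cycle type keyed by (VVA, VBA, VSA, VDA, VPA), copied from the table above.
CYCLE_TYPES = {
    (0, 0, 0, 0, 0): "Internal Operation",
    (0, 0, 0, 0, 1): "Opcode Fetch",
    (0, 0, 0, 1, 1): "Operand Fetch",
    (0, 0, 0, 1, 0): "Normal Data Read/Write",
    (0, 1, 0, 1, 0): "Base Page Read/Write",
    (0, 0, 1, 1, 0): "Stack Read/Write",
    (1, 0, 1, 1, 0): "Interrupt Stack Write or RTI Stack Read",
    (1, 0, 0, 1, 0): "Vector Read",
}


def classify(vva: int, vba: int, vsa: int, vda: int, vpa: int) -> str:
    """Name the current bus cycle from the five status pins (1 = high)."""
    return CYCLE_TYPES.get((vva, vba, vsa, vda, vpa), "Unused combination")


def is_memory_cycle(vda: int, vpa: int) -> bool:
    """Every row except Internal Operation has VDA or VPA high."""
    return bool(vda or vpa)
```

As on the 65816, VDA and VPA alone already tell you whether a cycle touches memory at all; the extra pins only refine what kind of access it is.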
I have also added the MLB pin from the 65C02, though VPB was not added, as it's basically just an inverted version of VVA in this case.
I'm hoping that this will allow for some rather complex memory systems. For example, you could give the stack its own 64K of RAM and have it completely separate from regular memory. Not only would that give you more stack space than you would ever need, it would also prevent the stack from accidentally overwriting your regular non-stack memory.
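The separate-stack idea boils down to treating VSA as one extra address line. Here is a minimal Python model under that assumption; the class name and the two flat 64K arrays are illustrative only, not part of the core.

```python
class SplitMemory:
    """Hypothetical memory system: every cycle with VSA high goes to a private 64K stack RAM."""

    def __init__(self) -> None:
        self.main = bytearray(0x10000)   # regular memory
        self.stack = bytearray(0x10000)  # stack-only RAM, selected by VSA

    def _select(self, vsa: int) -> bytearray:
        return self.stack if vsa else self.main

    def read(self, addr: int, vsa: int) -> int:
        return self._select(vsa)[addr & 0xFFFF]

    def write(self, addr: int, value: int, vsa: int) -> None:
        self._select(vsa)[addr & 0xFFFF] = value & 0xFF
```

One consequence of this wiring: a normal LDA/STA to a stack address (VSA low) cannot see what was pushed, so anything that reads stack data has to use stack cycles, such as the SP,Y addressing mode described further down, which also raises VSA.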
And of course there is also the ABT pin, with one major difference from its 65816 cousin: it doesn't wait for the current instruction to finish. After an abort is received, the CPU will start the interrupt sequence after the next falling edge of PHI2. Obviously, pulling ABT low right before the falling edge of PHI2 will likely cause problems, so I'd recommend pulling it low on the rising edge of PHI2 and then pulling it high again at the next rising edge.
The abort vector is also located at 0xFFF8 and 0xFFF9, like on the 65816 in emulation mode.
Now let's talk about instructions! All of the 65CE02's instructions are implemented, and most of the cycle times are the same as the original (since it already removes most dummy cycles), but I was still able to speed up some instructions. The following list shows all instructions I was able to speed up by a single cycle; the original cycle count is next to the opcode in brackets:
PHA (3),
PHX (3),
PHY (3),
PHZ (3),
PHP (3),
PLA (3),
PLX (3),
PLY (3),
PLZ (3),
PLP (3),
CLE (2),
SEE (2),
SEI (2),
ASR A (2),
NEG A (2),
BBR (4),
BBS (4),
BRK (7),
RTS (4),
RTI (5),
LDA (d,SP),Y (6),
STA (d,SP),Y (6)
RTN # was sped up from 7 to 4 cycles, which kinda scares me, because it makes me think I implemented the instruction incorrectly if I was able to shave off that many cycles.
But from what I can tell, it's literally just an RTS instruction with an immediate value used as an unsigned offset into the stack...
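Read that way, RTN #n can be modeled as an ordinary RTS followed by adding n to the stack pointer. This Python sketch assumes the return address is pulled first and the offset is applied afterwards, with an 8-bit stack pointer in page 1; the 65CE02's 16-bit stack mode and the cycle timing are not modeled, and the function names are hypothetical.

```python
STACK_PAGE = 0x0100  # assumption: 8-bit stack pointer in page 1, as on the 6502


def rts(mem: bytearray, sp: int) -> tuple[int, int]:
    """Pull the return address like RTS; returns (new PC, new SP)."""
    lo = mem[STACK_PAGE | ((sp + 1) & 0xFF)]
    hi = mem[STACK_PAGE | ((sp + 2) & 0xFF)]
    return ((hi << 8 | lo) + 1) & 0xFFFF, (sp + 2) & 0xFF


def rtn(mem: bytearray, sp: int, n: int) -> tuple[int, int]:
    """RTN #n sketch: an RTS, then drop n more bytes (e.g. the caller's parameters)."""
    pc, sp = rts(mem, sp)
    return pc, (sp + n) & 0xFF
```

In this model, dropping the parameters is a single addition to SP with no extra memory accesses.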
Now to the new instructions! Here is the whole opcode table:
[Attachment: image of the full extended opcode table]
I tried to place the instructions in such a way that they match existing instructions. For example, taking ORA and slapping AUG in front of it turns it into MUL (assuming the addressing mode used also exists for the extended instruction).
And here is the full list of new instructions (note that both bytes and cycles have a +1 because the prefix opcode is executed before the actual extended instruction):
Code:
MUL/MLL - "Multiply"/"Multiply Low" Multiplies A by a value from Memory, result (low byte) stored in A. (the Carry being 1 indicates that the High byte (aka MLH result) is not equal to 0)
N V S B D I Z C
+ - - - - - + +
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C 09   Immediate        MUL #nn         2+1    2+1
5C 05   Base Page        MUL nn          2+1    3+1
5C 15   Base Page X      MUL nn,X        2+1    3+1
5C 0D   Absolute         MUL nnnn        3+1    4+1
5C 1D   Absolute X       MUL nnnn,X      3+1    4+1
5C 19   Absolute Y       MUL nnnn,Y      3+1    4+1
MLH - "Multiply High" Multiplies A by a value from Memory, result (high byte) stored in A (the Carry being 1 indicates that the Low byte (aka MLL result) is not equal to 0)
N V S B D I Z C
+ - - - - - + +
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C 29   Immediate        MLH #nn         2+1    2+1
5C 25   Base Page        MLH nn          2+1    3+1
5C 35   Base Page X      MLH nn,X        2+1    3+1
5C 2D   Absolute         MLH nnnn        3+1    4+1
5C 3D   Absolute X       MLH nnnn,X      3+1    4+1
5C 39   Absolute Y       MLH nnnn,Y      3+1    4+1
MOD - Divides a value from Memory by A, remainder stored in A. (the Carry being 1 indicates that the High byte (aka DIV result) is not equal to 0)
N V S B D I Z C
+ - - - - - + +
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C 49   Immediate        MOD #nn         2+1    2+1+?
5C 45   Base Page        MOD nn          2+1    3+1+?
5C 55   Base Page X      MOD nn,X        2+1    3+1+?
5C 4D   Absolute         MOD nnnn        3+1    4+1+?
5C 5D   Absolute X       MOD nnnn,X      3+1    4+1+?
5C 59   Absolute Y       MOD nnnn,Y      3+1    4+1+?
DIV - Divides a value from Memory by A, result stored in A (the Carry being 1 indicates that the Low byte (aka MOD result) is not equal to 0)
N V S B D I Z C
+ - - - - - + +
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C 69   Immediate        DIV #nn         2+1    2+1+?
5C 65   Base Page        DIV nn          2+1    3+1+?
5C 75   Base Page X      DIV nn,X        2+1    3+1+?
5C 6D   Absolute         DIV nnnn        3+1    4+1+?
5C 7D   Absolute X       DIV nnnn,X      3+1    4+1+?
5C 79   Absolute Y       DIV nnnn,Y      3+1    4+1+?
LWR - "Logic shift Word Right", Shifts all bits 1 to the right, 0 into bit 15, and bit 0 into Carry
N V S B D I Z C
+ - - - - - + +
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C CA   Absolute         LRW nnnn        3+1    7+1
AWR - "Arithmetic shift Word Right", Shifts all bits 1 to the right, bit 15 into bit 14, and bit 0 into Carry
N V S B D I Z C
+ - - - - - + +
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C CB   Absolute         ARW nnnn        3+1    7+1
RWR - "Rotate Word Right", Shifts all bits 1 to the right, Carry into bit 15, and bit 0 into Carry
N V S B D I Z C
+ - - - - - + +
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C EB   Absolute         RRW nnnn        3+1    7+1
ICC - "Increment with Carry", Adds 0 + C to a Value in Memory
N V S B D I Z C
+ - - - - - + +
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C E6   Base Page        ICC nn          2+1    4+1
5C F6   Base Page X      ICC nn,X        2+1    4+1
5C EE   Absolute         ICC nnnn        3+1    5+1
5C FE   Absolute X       ICC nnnn,X      3+1    5+1
DCC - "Decrement with Carry", Subtracts 1 + C from a Value in Memory
N V S B D I Z C
+ - - - - - + +
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C C6   Base Page        DCC nn          2+1    4+1
5C D6   Base Page X      DCC nn,X        2+1    4+1
5C CE   Absolute         DCC nnnn        3+1    5+1
5C DE   Absolute X       DCC nnnn,X      3+1    5+1
SWP - "Swap Nibble", Swaps the High and Low Nibble of the A Register.
N V S B D I Z C
- - - - - - - -
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C 4B   Implied          SWN             1+1    1+1
SXY - "Swap X and Y", Swaps the contents of the X and Y Register.
N V S B D I Z C
- - - - - - - -
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C 5B   Implied          SXY             1+1    2+1
SXZ - "Swap X and Z", Swaps the contents of the X and Z Register.
N V S B D I Z C
- - - - - - - -
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C 6B   Implied          SXZ             1+1    2+1
SYZ - "Swap Y and Z", Swaps the contents of the Y and Z Register.
N V S B D I Z C
- - - - - - - -
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C 7B   Implied          SYZ             1+1    2+1
LDA - Load Memory into Accumulator
N V S B D I Z C
+ - - - - - + -
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C E2   Stack Y          LDA SP,Y        1+1    2+1
5C E1   Double Indirect  LDA (nn,X),Y    2+1    5+1
5C F1   PC Relative      LDA PC,XY       1+1    2+1
STA - Store Accumulator into Memory
N V S B D I Z C
- - - - - - - -
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C 82   Stack Y          STA SP,Y        1+1    2+1
5C 81   Double Indirect  STA (nn,X),Y    2+1    5+1
5C 91   PC Relative      STA PC,XY       1+1    2+1
CHB - "Convert Hex to BCD", Converts a Binary Value 0x00 - 0x63 to a BCD Value 0x00 - 0x99
N V S B D I Z C
+ - - - - - + +
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C E8   Implied          CHB             1+1    1+1
CBH - "Convert BCD to Hex", Reverse of CHB
N V S B D I Z C
+ - - - - - + +
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C C8   Implied          CBH             1+1    1+1
RND - "Random", Loads a Byte from the LFSR into the Accumulator
N V S B D I Z C
+ - - - - - + -
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C 3B   Implied          RND             1+1    1+1
PHR - "Push Registers", Pushes the Z, Y, X, B, A Registers onto the Stack in that Order
N V S B D I Z C
- - - - - - - -
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C 48   Implied          PHR             1+1    6+1
PLR - "Pull Registers", Pulls the A, B, X, Y, Z Registers from the Stack in that Order
N V S B D I Z C
- - - - - - - -
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C 68   Implied          PLR             1+1    6+1
WAI - "Wait", Halts the CPU until an Interrupt (IRQ (if enabled), NMI, ABT, RESET) occurs
N V S B D I Z C
- - - - - - - -
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C EF   Implied          WAI             1+1    ∞
STP - "Stop", Halts the CPU until an Interrupt (ABT, RESET) occurs
N V S B D I Z C
- - - - - - - -
Opcode  Addressing       Mnemonic        Bytes  Cycles
------------------------------------------------------
5C FF   Implied          STP             1+1    ∞
Now a bit more detail about the instructions.
MUL, MLH, MOD, and DIV are pretty straightforward; their placement in the opcode table matches ORA, AND, EOR, and ADC, in that order.
The actual division circuitry is not really ready yet, though (which is why the cycles have a +? next to them). I'll likely end up making a separate state machine that does a simple subtraction loop.
It won't be fast, but it will still be faster and more compact than doing it in software.
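For reference, here is how the flag rules in the listing can be read, as a small Python model. MUL/MLL and MLH both form the full 16-bit product and differ only in which half lands in A; the division uses the simple subtraction loop mentioned above. N/Z flags, decimal mode, and the behavior for A = 0 are not covered, and all function names are hypothetical.

```python
def mul16(a: int, m: int) -> tuple[int, int]:
    """8x8 -> 16-bit product, split into (low byte, high byte)."""
    product = (a & 0xFF) * (m & 0xFF)
    return product & 0xFF, product >> 8


def mll(a: int, m: int) -> tuple[int, bool]:
    """MUL/MLL: A = low byte, C = high byte non-zero."""
    lo, hi = mul16(a, m)
    return lo, hi != 0


def mlh(a: int, m: int) -> tuple[int, bool]:
    """MLH: A = high byte, C = low byte non-zero."""
    lo, hi = mul16(a, m)
    return hi, lo != 0


def div_by_subtraction(m: int, a: int) -> tuple[int, int]:
    """Memory / A by repeated subtraction; returns (quotient, remainder).

    The loop body runs `quotient` times, so the run time depends on the operands.
    """
    if a == 0:
        # The post doesn't say what the core does here; a naive loop would never end.
        raise ZeroDivisionError("divisor (A) is zero")
    quotient = 0
    while m >= a:
        m -= a
        quotient += 1
    return quotient, m


def div(m: int, a: int) -> tuple[int, bool]:
    """DIV: A = quotient, C = remainder non-zero."""
    q, r = div_by_subtraction(m, a)
    return q, r != 0


def mod(m: int, a: int) -> tuple[int, bool]:
    """MOD: A = remainder, C = quotient non-zero."""
    q, r = div_by_subtraction(m, a)
    return r, q != 0
```

With this approach the extra cycles depend on the operands: 255 / 1 needs 255 loop iterations. A shift-and-subtract (restoring) divider would always finish in 8 steps, if a fixed cycle count ever matters more than size.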
LWR, AWR, and RRW: not much to say either. They are just counterparts to the existing ASW/AWL and ROW/RLW, and are therefore placed at the same opcodes (the exception being LWR, as it has no counterpart).
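The three right shifts differ only in what goes into bit 15. A minimal Python sketch of each, returning the new word and the new carry; the little-endian operand layout in `load_word` is an assumption carried over from the existing ASW/ROW, and the function names simply follow the headings in the listing.

```python
def lwr(word: int) -> tuple[int, int]:
    """LWR: shift right, 0 into bit 15, bit 0 into carry. Returns (word, carry)."""
    word &= 0xFFFF
    return word >> 1, word & 1


def awr(word: int) -> tuple[int, int]:
    """AWR: shift right, bit 15 stays put (so it is copied into bit 14), bit 0 into carry."""
    word &= 0xFFFF
    return (word >> 1) | (word & 0x8000), word & 1


def rwr(word: int, carry: int) -> tuple[int, int]:
    """RWR/RRW: shift right, old carry into bit 15, bit 0 into carry."""
    word &= 0xFFFF
    return (word >> 1) | ((carry & 1) << 15), word & 1


def load_word(mem: bytes, addr: int) -> int:
    """Assumed little-endian 16-bit operand at addr and addr + 1, like ASW and ROW."""
    return mem[addr] | mem[(addr + 1) & 0xFFFF] << 8
```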
ICC and DCC: interesting instructions that allow you to increment/decrement multi-byte values in memory without using branches or long LDA/ADC/STA sequences. Note that, like ADC/SBC, they are affected by the D flag. Their placement in the opcode table matches INC and DEC.
SWP, SXY, SXZ, and SYZ: again pretty straightforward. Their placement in the opcode table matches TAZ, TAB, TZA, and TBA, in that order.
Also note that none of them update any flags.
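Going back to ICC for a moment, here is a Python sketch of the branch-free multi-byte increment it enables. It assumes ICC sets C when the byte wraps from 0xFF to 0x00 (the flag row shows C as affected) and only models binary mode, since D also affects these instructions; `increment_counter` is a hypothetical helper standing in for an unrolled SEC, ICC, ICC, ... sequence.

```python
def icc(value: int, carry: int) -> tuple[int, int]:
    """ICC in binary mode (D = 0): add C to a byte; returns (byte, carry out)."""
    result = (value & 0xFF) + (carry & 1)
    return result & 0xFF, result >> 8


def increment_counter(mem: bytearray, addr: int, length: int) -> int:
    """Multi-byte increment: SEC, then one ICC per byte, lowest byte first."""
    carry = 1  # SEC
    for offset in range(length):  # in real code: ICC addr, ICC addr+1, ... (no branches)
        mem[addr + offset], carry = icc(mem[addr + offset], carry)
    return carry  # C ends up set only if the whole counter wrapped to zero
```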
LDA/STA SP,Y, LDA/STA (bp,X),Y, and LDA/STA PC,XY: some crazy-looking addressing modes. Unlike the STA versions, the LDA instructions don't directly line up with the existing LDA opcodes.
SP,Y is the simplest one: it accesses memory at the location of the current stack pointer with Y added as an unsigned offset. Note that the stack pointer is not modified during this, and that this counts as a stack read/write operation, so VSA will be pulled high.
(bp,X),Y, or "double indirect", is just both of the regular X/Y indirect addressing modes mashed together. I've heard on here that this is useful for things like accessing an array of data through a table of pointers, all in a single instruction.
PC,XY is the weirdest mode: it accesses memory at the location of the program counter (i.e. the opcode of the next instruction) with the 16-bit XY pair added as an offset (X = high byte, Y = low byte).
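The three effective-address calculations can be sketched in Python as follows. The function names are hypothetical, `b` stands for the 65CE02's base page register, all sums are assumed to wrap at 16 bits, and the pointer fetch is assumed to wrap within the base page like the existing (bp,X) mode; those edge cases are guesses, not taken from the core.

```python
def ea_stack_y(sp: int, y: int) -> int:
    """SP,Y: stack pointer plus Y as an unsigned offset; SP itself is not changed."""
    return (sp + (y & 0xFF)) & 0xFFFF


def ea_double_indirect(mem: bytes, b: int, nn: int, x: int, y: int) -> int:
    """(bp,X),Y: fetch a pointer at base page B, offset nn + X, then add Y to it."""
    zp = (nn + x) & 0xFF  # assumed to wrap within the base page
    lo = mem[(b << 8) | zp]
    hi = mem[(b << 8) | ((zp + 1) & 0xFF)]
    return ((hi << 8 | lo) + y) & 0xFFFF


def ea_pc_xy(pc_next: int, x: int, y: int) -> int:
    """PC,XY: address of the next opcode plus the 16-bit X:Y pair (X = high byte)."""
    return (pc_next + ((x << 8) | y)) & 0xFFFF
```

For a table of 2-byte pointers at base-page offset nn, X = 2 * index selects the pointer and Y walks through the array it points to.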
CHB and CBH allow you to convert back and forth between binary and BCD; Carry gets set if the input was invalid.
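A Python sketch of the two conversions, with the carry used as the error flag. What A contains after an invalid input isn't stated, so leaving it unchanged is an assumption, and the N/Z updates shown in the flag rows are not modeled.

```python
def chb(value: int) -> tuple[int, int]:
    """CHB: binary 0x00-0x63 -> BCD 0x00-0x99. Returns (new A, carry)."""
    value &= 0xFF
    if value > 99:
        return value, 1  # invalid input: C set; leaving A unchanged is an assumption
    return (value // 10) << 4 | value % 10, 0


def cbh(value: int) -> tuple[int, int]:
    """CBH: BCD 0x00-0x99 -> binary 0x00-0x63. Returns (new A, carry)."""
    value &= 0xFF
    tens, ones = value >> 4, value & 0x0F
    if tens > 9 or ones > 9:
        return value, 1  # invalid BCD digit: C set; A unchanged (assumption)
    return tens * 10 + ones, 0
```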
RND: not much to explain either. The CPU has a built-in LFSR that is ALWAYS running as long as there is a clock signal connected to the CPU; it completely ignores anything else going on in the CPU, like resets, RDY being pulled low, etc.
It's not really intended to be used as an RNG itself, but it can be useful to provide a seed for an actual RNG function.
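The post doesn't say how wide the LFSR is or which taps it uses, so this Python sketch just picks a common 16-bit maximal-length Galois configuration (taps 0xB400) to show the idea; the class name is hypothetical. Because the state advances on every clock, the value RND sees depends on exactly when it executes, which is what makes it usable as a seed.

```python
class FreeRunningLFSR:
    """Hypothetical 16-bit Galois LFSR that steps on every clock edge, no matter what."""

    TAPS = 0xB400  # x^16 + x^14 + x^13 + x^11 + 1, a maximal-length polynomial

    def __init__(self, seed: int = 0xACE1) -> None:
        if seed & 0xFFFF == 0:
            raise ValueError("an all-zero LFSR never leaves zero")
        self.state = seed & 0xFFFF

    def clock(self) -> None:
        """One step; in the core this would happen even during reset or with RDY low."""
        lsb = self.state & 1
        self.state >>= 1
        if lsb:
            self.state ^= self.TAPS

    def rnd(self) -> int:
        """What RND would load into A: here, simply the low byte of the state."""
        return self.state & 0xFF
```

With these taps the sequence visits all 65535 non-zero states before repeating.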
PHR and PLR: not much to say; I just thought these would be useful for interrupt routines or for debugging something.
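The push and pull orders mirror each other, so PLR undoes PHR exactly. A Python sketch with an assumed 6502-style stack (8-bit SP in page 1, decremented after each push); the function names and the register dictionary are just for illustration.

```python
PUSH_ORDER = ("Z", "Y", "X", "B", "A")  # PHR order from the listing; PLR is the reverse


def phr(mem: bytearray, sp: int, regs: dict[str, int]) -> int:
    """Push Z, Y, X, B, A onto a page-1 stack; returns the new SP."""
    for name in PUSH_ORDER:
        mem[0x0100 | sp] = regs[name] & 0xFF
        sp = (sp - 1) & 0xFF
    return sp


def plr(mem: bytearray, sp: int) -> tuple[dict[str, int], int]:
    """Pull A, B, X, Y, Z; returns (registers, new SP)."""
    regs = {}
    for name in reversed(PUSH_ORDER):
        sp = (sp + 1) & 0xFF
        regs[name] = mem[0x0100 | sp]
    return regs, sp
```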
WAI and STP: both of these function similarly to their 65C02 versions. Notable difference: WAI doesn't continue execution after itself if it receives an IRQ while the I flag is set; instead it just ignores it.
Also, because my aborts actually cancel the current instruction, they can be used to escape both WAI and STP.
And finally, the actual files. EDIT: I uploaded everything important to a GitHub repo; everything can be downloaded from there:
https://github.com/ProxyPlayerHD/65CE02-Softcore
Keep in mind that I'm horrible at writing documentation, so a lot of the files just kinda exist without explanation.
Also note that there could still be bugs hiding inside the CPU, and I will take some time to iron those out (specifically interrupts and edge cases).
My workflow still consists of using a logic simulator to generate Verilog code for me instead of writing it myself.
As for the actual usable Verilog files, they are in the "Digital" folder: "CE_M_CPU_TOP.v" is just the CPU with a bidirectional data bus, while "CE_M_CPU.v" is the same but with separate data input and output (for when the CPU is not the top module).
On my Cyclone II the entire CPU takes up around 2700 logic elements, which is pretty massive...
I plan on actually building a computer around this CPU and thought I'd use an iCE40HX 4K FPGA; with around 3500 logic elements it should be enough for the CPU, a VGA controller, and some extra logic.
The problem is that the Lattice Synthesis Engine doesn't like the CPU and just crashes when trying to synthesize it. It works fine with other circuits I've made with the logic simulator, and I can also synthesize each individual part of the CPU (ALU, control unit, registers) fine, but when they are all together it just won't work. It gives me a cryptic error message, too:
Code:
Done: error code -1073741819
I only found 2 matches on Google, and neither was really helpful.
I'll probably try to talk to the dev who made Digital (the logic simulator) to see if he has an idea of what is going on. If not, I'd either have to go with another FPGA (Xilinx or Intel) or contact Lattice directly and ask them to fix their outdated software... (they likely won't).
Anyways, next up I'll probably either make an interface to the external SRAM on the FPGA board or write a more thorough test program that goes through all instructions and also includes interrupts and such.
But that is for later; right now I just want to finally share the progress I have made so far. And obviously none of this is locked down or anything: if anyone wants to rewrite the whole thing in proper Verilog or just make their own core with a similar idea, go ahead!
While I doubt anyone is ever gonna use this core in an actual project (besides myself), I at least hope that it inspires some people to make their own extended cores.
An extended 65816 in the style of what I have done here would be interesting, with things like a supervisor/user mode, a 24-bit program counter option, a movable stack/direct bank, etc.
Then again, at that point you might as well just implement a 65k softcore.
And on a little side note, while working on this I also came up with some other circuits, one of which is a "Programmable Wait State Generator". I haven't seen anything similar to it on this forum, so if anyone is interested I can make a new thread about it. Here is a summary:
https://pastebin.com/J66WgDnH
Another circuit I came up with was a complete tile- or bitmap-based VGA controller that barely fits into a single ATF1508 CPLD.
Welp, that is the entire post. I don't know how to end these things lol. Tell me your thoughts, ideas, things I've missed about the CPU, etc.