I've mentioned my 65020 project here a few times in the past. It's finally ready(ish) for release:
https://github.com/John-McKenna/65020
The 65020 is an imaginary successor to the 6502, which might have competed with the 68000, but is (mostly) binary compatible with the 6502. It extends the width of each memory location from 8 bits to 16. That means opcodes, which are 8 bits on the 6502, are now 16 bit. Addresses, 16 bits (two locations) on the 6502, are now 32 bit. The three registers A, X, and Y have become 12: A0, A1, A2, A3, X0, X1, X2, X3, Y0, Y1, Y2, Y3. The P, SP, and PC registers are also accessible to many instructions (in particular, PC and SP can be index registers). All registers are 32 bit.
If the top 8 bits of the opcode are all 0, it will behave like the 6502 original: performing 8 bit operations on the low 8 bits of registers, and accessing the low 8 bits of each memory location. The new features are unlocked by setting the top 8 bits (the 'extension') of the opcode to other values.
Taking opcode $6D (ADC abs) as an example: On the 6502, this adds the contents of an 8 bit memory location to the 8 bit A register, along with the C flag. For this instruction the format of the opcode extension is PAAZZZDD. If P is 1, the instruction becomes ADD: add without carry. AA selects which of the four A registers is used: A0, A1, A2, or A3. ZZZ selects one of 8 index registers. Here the default is P (processor status), which always reads as 0 when used as an index. The other options are Z, SP, PC, or one of the four Y registers. Finally, DD selects the data width: 8, 16, or 32 bits. The 8 and 16 bit options read the operand from a single memory location. 32 bit reads from two.
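To make the PAAZZZDD layout concrete, here is a small Python sketch (not part of the 65020 sources) that splits a 16 bit opcode word into the 6502 opcode and the extension fields. It assumes the letters are read from the most significant bit down, and it leaves ZZZ and DD as raw selector values, since the exact encoding of each index register and data width is documented in 65020.pdf rather than here. The names split_adc_opcode and AdcExtension are made up for this example.

```python
from typing import NamedTuple


class AdcExtension(NamedTuple):
    no_carry: int   # P: 1 turns ADC into ADD (add without carry)
    a_reg: int      # AA: which of A0..A3 is used
    index_sel: int  # ZZZ: raw 3 bit index-register selector (not decoded here)
    width_sel: int  # DD: raw 2 bit data-width selector (not decoded here)


def split_adc_opcode(word: int) -> tuple[int, AdcExtension]:
    """Split a 16 bit opcode word into (6502 opcode, extension fields).

    Assumption: PAAZZZDD is written most significant bit first.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("opcode word must fit in 16 bits")
    base = word & 0xFF   # low 8 bits: the original 6502 opcode
    ext = word >> 8      # high 8 bits: the extension
    fields = AdcExtension(
        no_carry=(ext >> 7) & 1,
        a_reg=(ext >> 5) & 3,
        index_sel=(ext >> 2) & 7,
        width_sel=ext & 3,
    )
    return base, fields
```

With an all-zero extension every field comes out as 0, which is the plain 6502 ADC abs case.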
32 bit values are stored in an unusual way. The low 8 bits of the first location form the low 8 bits of the operand. The low 8 bits of the second location are the next 8 bits of the operand. Then comes the high 8 bits of the first location, and finally the high 8 bits of the second. If the two memory locations contain AABB and CCDD, the resulting number is CCAADDBB. This is to keep compatibility with 16 bit addresses used by the 6502: if the top 8 bits of each location are 0, the result will be a 16 bit value.
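The byte interleaving is easier to see in code. This is a Python sketch of the layout just described, not code from the project; the function names combine_long and split_long are invented for the example.

```python
def combine_long(first: int, second: int) -> int:
    """Build a 32 bit value from two 16 bit memory locations."""
    b0 = first & 0xFF           # low 8 bits of the first location
    b1 = second & 0xFF          # low 8 bits of the second location
    b2 = (first >> 8) & 0xFF    # high 8 bits of the first location
    b3 = (second >> 8) & 0xFF   # high 8 bits of the second location
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)


def split_long(value: int) -> tuple[int, int]:
    """Inverse of combine_long: return (first, second) memory locations."""
    first = (value & 0xFF) | (((value >> 16) & 0xFF) << 8)
    second = ((value >> 8) & 0xFF) | (((value >> 24) & 0xFF) << 8)
    return first, second


print(hex(combine_long(0xAABB, 0xCCDD)))  # 0xccaaddbb
print(hex(combine_long(0x0034, 0x0012)))  # 0x1234, a plain 6502 address
```

The second call shows the compatibility case: with the top 8 bits of both locations clear, the result is the ordinary little-endian 16 bit value.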
The 65020 re-interprets some of the 6502's instructions to make them more general. In the above example, a non-indexed addressing mode becomes indexed, with an index of 0. Bits in the opcode extension can be used to select other index registers, giving Y, SP, and PC as possible index registers on previously non-indexed instructions.
A more drastic example of re-interpretation is the CLC, CLD, CLI, and CLV instructions. Each of these clears a different bit of the P register. The opcode extension gives each a choice of 8 bits to clear, in any of the 16 registers. Between them they form the CBT (Clear BiT) instruction, which can clear any bit of any register. CBT is also given an indexed addressing mode (with 32 bit base register) for clearing bits in memory. Finally, if the top bit of the opcode extension is set, CBT becomes TBT, which sets the Z flag according to the state of the bit. Similarly, SEC, SED, SEI, and SEV (the last of which didn't exist on the 6502) combine to make the SBT instruction, which becomes XBT (toggle bit) if its top bit is set.
The TAX, TXA, TAY, ... instructions have gone through a similar extension, together allowing any register to be copied to any other register.
A few other instructions use the top bit of the extension to provide a different operation. LSR becomes ASR (arithmetic shift right), which duplicates the high bit instead of shifting in 0. ASL becomes ESL, which duplicates the low bit. ROR and ROL become RRB and RLB, which shift within the 8, 16, or 32 bit word (the bit shifted out still goes into the C flag, but the bit shifted in comes from the other end of the operand). All of these can shift by any number of bits, and the shift amount can be given in a register.
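As a rough model of what these variants do, here is a Python sketch of each one shifting by a single bit. It is only an illustration, not taken from the VHDL. Each function returns (result, carry). For ASR and ESL the carry is assumed to be the bit shifted out, as with LSR and ASL on the 6502; multi-bit and register-supplied shift counts are not modelled.

```python
def asr(value: int, width: int) -> tuple[int, int]:
    """Arithmetic shift right by one: the high bit is duplicated."""
    value &= (1 << width) - 1
    sign = value & (1 << (width - 1))
    return (value >> 1) | sign, value & 1


def esl(value: int, width: int) -> tuple[int, int]:
    """Shift left by one, duplicating the low bit instead of shifting in 0."""
    mask = (1 << width) - 1
    value &= mask
    carry = (value >> (width - 1)) & 1
    return ((value << 1) & mask) | (value & 1), carry


def rlb(value: int, width: int) -> tuple[int, int]:
    """Rotate left within the word: the bit shifted out re-enters at bit 0."""
    mask = (1 << width) - 1
    value &= mask
    carry = (value >> (width - 1)) & 1
    return ((value << 1) & mask) | carry, carry


def rrb(value: int, width: int) -> tuple[int, int]:
    """Rotate right within the word: the bit shifted out re-enters at the top."""
    value &= (1 << width) - 1
    carry = value & 1
    return (value >> 1) | (carry << (width - 1)), carry
```

For example, asr(0x80, 8) gives 0xC0 where LSR would give 0x40, and rrb(0x0001, 16) gives 0x8000 with the carry set.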
Branch instructions get a lot of new options. In addition to 8 new conditions (including 'always'), they can select a short (16 bit) or long (32 bit) offset, any register as the base (including P, which is read as 0), a 'link' bit which tells it to push the current PC before branching (giving conditional JSR instructions), and finally an 'indirect' bit.
New versions of the arithmetic and logic instructions have been added which work on the X and Y registers. And there are versions that take another register as the second operand. These are small (1 memory location) and fast (1 cycle).
The implementation is in VHDL, and intended for the Spartan 6 FPGA. It uses a number of Spartan 6 primitives, and would need a bit of work to re-target to another family. No attention has been paid to speed: I wanted something that worked first. I don't even know how fast it will run, as the system it was created for is limited to 5MHz for other reasons. I have a few sketch-plans for a pipelined 65030 that would allow significantly higher clock speeds. But that's in the future.
Included is a MicrocodeBuilder which is used to generate the ROMs for 'microcode' and 'nanocode'. These names come from a very early design. 'Microcode' is more like instruction decoding, providing information about the instruction that applies on all cycles of its operation. 'Nanocode' is what is usually called microcode. It controls what happens on each cycle.
And there is the assembler. It has grown with the CPU, and isn't very pretty. Error handling is particularly weak. But it works. Labels can be anything that isn't recognised as an instruction, and can be followed by an optional :. The syntax of instructions is as given in 65020.pdf.
Expressions can contain the binary operators + (add), - (subtract), * (multiply), / (divide), % (remainder), ^ (bitwise xor), & (bitwise and), | (bitwise or), << (shift left), >> (shift right), and < > <= >= == != (comparisons, producing 1 for true and 0 for false), and the unary operators < (high byte - bits 8 to 15), > (low byte), - (negate), and ~ (bitwise not). The 'high byte' and 'low byte' operators are for assembling old 6502 code.
Numeric literals can be preceded with % for binary or $ for hex. Any other base can be used by putting a number before the $. For example, 8$135 = $5d = %1011101 = 93.
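The literal rules can be sketched in a few lines of Python. This is an illustration of the syntax described above, not the assembler's actual tokeniser; parse_literal is a hypothetical name, and Python's int() limits it to bases 2 through 36.

```python
def parse_literal(text: str) -> int:
    """Parse %binary, $hex, base$digits, or plain decimal."""
    text = text.strip()
    if text.startswith("%"):
        return int(text[1:], 2)
    if "$" in text:
        base_text, digits = text.split("$", 1)
        # A bare $ means hex; a number before the $ gives the base.
        base = int(base_text) if base_text else 16
        return int(digits, base)
    return int(text, 10)


for literal in ("8$135", "$5d", "%1011101", "93"):
    print(literal, "=", parse_literal(literal))  # each one prints 93
```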
Directives:
.include "<filename>" include the contents of another file.
.byte val, val, "string", val, ... inserts a sequence of 16 bit values as one memory location each. Strings are inserted one character per location, without a terminating 0.
.word val, val, ... inserts a sequence of 16 bit values as two memory locations each. It's mostly for assembling old 6502 code.
.long val, val, ... inserts a sequence of 32 bit values as two memory locations each.
.space n reserves n memory locations without storing data.
.section name starts or resumes a section of output with the given name. Only sections that contain code or .byte/.word/.long data will be placed in the output binary.
.org val sets the assembly address. It can also be written * = val
.if cond
.else
.endif allow conditional assembly. These can be nested.
Sorry, no macros or local labels yet.
Assembler command line options:
-o <filename> write a binary file
-om <filename> write the output in a format suitable for Xilinx's data2mem utility
-i <path> sets the path for .include files
-l <filename> writes a listing file
-x <filename> writes a cross-reference file
I still don't have a comprehensive test suite for any of this, so there will be many bugs. But it has been used to write and run a decent amount of code. I'm finding it a pleasant CPU to use.
Here's some sample code for a 32 bit unsigned integer division routine, taken from the listing file of my main development program:
Code:
                    .section code
                    ; 32 bit unsigned division
                    ; inputs:
                    ; a0: dividend
                    ; a1: divisor
                    ; outputs:
                    ; a0: quotient
                    ; a1: remainder
0000e5f3:           divide_32:
0000e5f3: 4248          psh.l a2
0000e5f4: 0244          psh.l x0
0000e5f5: 485c          eor a2, a2
0000e5f6: 00a2 0020     ldr x0, #32
0000e5f8: 020a          asl.l a0
0000e5f9:           divide_32_loop:
0000e5f9: 0a2a          rol.l a2              ; 1
0000e5fa: 46dc          cmp.l a2, a1          ; 1
0000e5fb: 0090 0001     bcc divide_32_less    ; 2
0000e5fd: c6fc          sub.l a2, a1          ; 1
0000e5fe:           divide_32_less:
0000e5fe: 022a          rol.l a0              ; 1
0000e5ff: 00ca          dex                   ; 1
0000e600: 00d0 00f7     bne divide_32_loop    ; 2 8/9 cycles per iteration
0000e602: aaa8          mov.l a1, a2
0000e603: 0264          pul.l x0
0000e604: 4268          pul.l a2
0000e605: 0060          rts
PSH.l A2 is the 6502 PHA instruction, with bits in the extension to say "32 bit" (the .l), and "A2 instead of A0". LDR X0, #32 is just a different way of spelling LDX #32. CMP.l A2, A1 uses a new opcode ($dc) for CMP with another register as operand.
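For anyone who wants to follow the algorithm without reading the assembly, here is a Python model of the same shift-and-subtract loop, with each step labelled with the instruction it stands for. It is a sketch only: it assumes the carry flag follows the usual 6502 conventions (CMP sets C when the register is greater than or equal to the operand, and a subtraction without borrow leaves C set), and like the routine it keeps everything in 32 bits.

```python
MASK32 = 0xFFFFFFFF


def divide_32(dividend: int, divisor: int) -> tuple[int, int]:
    """Return (quotient, remainder), mirroring the divide_32 routine."""
    a0 = dividend & MASK32
    a1 = divisor & MASK32
    a2 = 0                                  # eor a2, a2
    carry = a0 >> 31                        # asl.l a0: top bit goes to C
    a0 = (a0 << 1) & MASK32
    for _ in range(32):                     # ldr x0, #32 ... dex / bne
        a2 = ((a2 << 1) | carry) & MASK32   # rol.l a2: next dividend bit in
        carry = 1 if a2 >= a1 else 0        # cmp.l a2, a1
        if carry:                           # bcc skips the subtract
            a2 -= a1                        # sub.l a2, a1 (C stays set)
        next_carry = a0 >> 31               # rol.l a0: quotient bit in,
        a0 = ((a0 << 1) | carry) & MASK32   # next dividend bit out
        carry = next_carry
    return a0, a2                           # mov.l a1, a2


print(divide_32(100, 7))  # (14, 2)
```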