Hi all.
I'm new to the forum, but have been lurking for some time, and this is one of my favorite threads, because it is so thought-provoking. I decided to just pull the trigger and post it back near the top ... I hope that I don't upset anyone by doing so. I know that there are forums out there that frown on this, so I'm taking a chance ...
The idea of designing a more capable work-alike to the 6502 family is very appealing to me, if only from a "mental exercise" angle. My background is not in operating system design or hardware, just simple programs and algorithms as a hobby. The following excerpts are from an incomplete specification document that has gone through several revisions without actually being completed (kind of the story of my life).
My question to the group: Could I be on to something here, or am I barking up the wrong tree?
Code:
Proposed instruction set for the 65m32, by barrym95838.
I started with my understanding of Garth's proposal, and took it as far as I could, short of
writing an actual simulator.
I borrowed ideas from the pdp-11, Nova, 6809, and most of all, the 6502.
Proposed additions, simplifications, and/or criticisms are welcome.
Instruction bit format: oooo ooaa ffff rrri iiii iiii iiii iiii
15 bits specify the operation, and 17 bits provide an 'inherent' constant that can be used to
encode -65536 to 65535 without using a second word.
Addressing modes:
      rrr =   0      1      2      3      4      5      6      7
aa = 0        #,a    #,b    #,x    #,y    #,z    #,u    #,s    #,n
   = 1        $,a    $,b    $,x    $,y    $,z    $,u    $,s    $,n
   = 2        $,a+   $,b+   $,x+   $,y+   $,z+   $,u+   $,s+   $,n+
   = 3        $,-a   $,-b   $,-x   $,-y   $,-z   $,-u   $,-s   $,-n
There are eight registers, plus p. n is PC. z is permanently hardwired to zero, meaning that
'$,z' '$,z+' and '$,-z' are all equivalent. # and $ come from ...i iiii iiii iiii iiii (the
17-bit twos-complement number is extended to a full 32 bits before the operand calculation).
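As an illustration only, the bit layout above can be unpacked with a few shifts and masks. This is a minimal Python sketch, assuming the fields are packed most-significant-first exactly as written in the format line; the names `Fields`, `decode` and `REGS` are made up for the example, and the purpose of the `ffff` field is not described in this excerpt, so it is just carried along as a number.

```python
from typing import NamedTuple

# Register order for rrr = 0..7, taken from the addressing-mode table.
REGS = "abxyzusn"

class Fields(NamedTuple):
    op: int      # oooooo (6 bits)
    aa: int      # addressing mode (2 bits)
    ffff: int    # 4-bit field, meaning not given in this excerpt
    rrr: int     # index register number (3 bits)
    imm17: int   # raw 17-bit constant, not yet sign-extended

def decode(word: int) -> Fields:
    """Split one 32-bit word laid out as oooo ooaa ffff rrri iiii iiii iiii iiii."""
    return Fields(
        op=(word >> 26) & 0x3F,
        aa=(word >> 24) & 0x3,
        ffff=(word >> 20) & 0xF,
        rrr=(word >> 17) & 0x7,
        imm17=word & 0x1FFFF,
    )

f = decode(0x4809F000)
print(f, REGS[f.rrr])
```

The 6 + 2 + 4 + 3 bits add up to the 15 operation bits mentioned above, leaving 17 for the constant.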
An example opcode matrix:
Code:
     x0    x1    x2    x3    x4    x5    x6    x7    x8    x9    xa    xb    xc    xd    xe    xf
   +-----------------------------------------------------------------------------------------------
0x | ora   ora   ora   ora   and   and   and   and   eor   eor   eor   eor   bit   bit   bit   bit
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
1x | adc   adc   adc   adc   add   add   add   add   sub   sub   sub   sub   sbc   sbc   sbc   sbc
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
2x | mul   mul   mul   mul   div   div   div   div   mod   mod   mod   mod   ???   ???   ???   ???
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
3x | ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
4x | lda   lda   lda   lda   ldb   ldb   ldb   ldb   ldx   ldx   ldx   ldx   ldy   ldy   ldy   ldy
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
5x | ldz   ldz   ldz   ldz   ldu   ldu   ldu   ldu   lds   lds   lds   lds   ldn   ldn   ldn   ldn
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
6x | hla   hla   hla   hla   hlb   hlb   hlb   hlb   hlx   hlx   hlx   hlx   hly   hly   hly   hly
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
7x | hlz   hlz   hlz   hlz   hlu   hlu   hlu   hlu   brk   brk   brk   brk   hln   hln   hln   hln
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
8x | sta   sta   sta   sta   stb   stb   stb   stb   stx   stx   stx   stx   sty   sty   sty   sty
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
9x | stz   stz   stz   stz   stu   stu   stu   stu   sts   sts   sts   sts   stn   stn   stn   stn
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
ax | asl   asl   asl   asl   rol   rol   rol   rol   ror   ror   ror   ror   lsr   lsr   lsr   lsr
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
bx | exa   exa   exa   exa   ???   ???   ???   ???   dec   dec   dec   dec   inc   inc   inc   inc
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
cx | cmp   cmp   cmp   cmp   cpb   cpb   cpb   cpb   cpx   cpx   cpx   cpx   cpy   cpy   cpy   cpy
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
dx | cpz   cpz   cpz   cpz   cpu   cpu   cpu   cpu   cps   cps   cps   cps   psh   psh   psh   psh
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
ex | ???   ???   ???   ???   adb   adb   adb   adb   rep   rep   rep   rep   sep   sep   sep   sep
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
fx | ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   +-----------------------------------------------------------------------------------------------
     x0    x1    x2    x3    x4    x5    x6    x7    x8    x9    xa    xb    xc    xd    xe    xf
Some simple instruction translations:
Code:
------------- Examples that translate with minimal effort -------------------
----------- 65816 -------------   ----------- 65m32 -------------
:a90080    lda #32768             :40088000  lda #32768        (*)
:a200f0    ldx #-4096             :4809f000  ldx #-4096        (*)
:ac9a78    ldy $789a              :4d08789a  ldy $789a         (*)
:e8        inx                    :48040001  ldx #1,x
:88        dey                    :4c07ffff  ldy #-1,y
:48        pha                    :830c0000  sta ,-s
:68        pla                    :420c0000  lda ,s+
:60        rts                    :5e0c0000  ldn ,s+
:4c5476    jmp $7654              :5c087654  ldn #$7654        (*)
:6c5713    jmp ($1357)            :5d081357  ldn $1357         (*)
:8a        txa                    :40040000  lda #,x
:205634    jsr $3456              :7c083456  hln #$3456        (*)
:d00d      bne .+15               :5c6e000e  ldn [ne],#14,n
:10ce      bpl .-48               :5cafffcf  ldn [pl],#-49,n
:5c563412  jml $123456            :5e0e0000  ldn ,n+
                                  :00123456  .dw $123456
:22658709  jsl $98765             :7e0e0000  hln ,n+
                                  :00098765  .dw $98765
:fcefcd    jsr ($cdef,x)          :7d04cdef  hln $cdef,x
(*): There is a hidden ",z" within this instruction; this is assumed if
no other index register is specified in the operand field.
Notes: Binary code density obviously favors the 65816, but memory cycle
counts obviously favor the wider-bus 65m32.
These 65m32 assembly examples show the nuts-and-bolts for the sake of
illustration, but an assembler could easily allow macros and/or aliases
with more familiar mnemonics -- 'ldn ,s+' <-> 'rts', 'ldn [ne],# ...'
<-> 'bne ...', etc.
I am in the process of hand-translating FIGFORTH, and it looks like I'm able to do pretty much whatever I need to do in about 1/4 as many instructions as the NMOS 6502 requires, with the added benefit (?) that everything has been promoted to 32 bits.
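Hand-assembling like this is easy to mechanize. The hex words in the translation table above can be reproduced with a tiny packer; this is a minimal Python sketch, not a real assembler. The function name `encode` is hypothetical, `opbyte` is the two-digit value read off the opcode matrix (oooooo and aa together), and the `ffff` values 6 for [ne] and 0xa for [pl] are simply read back out of the hex in the table rather than taken from any condition-code list.

```python
# Register order for rrr = 0..7, as listed under "Addressing modes".
REGS = "abxyzusn"

def encode(opbyte: int, reg: str = "z", value: int = 0, ffff: int = 0) -> int:
    """Pack one 65m32 word from its matrix byte, index register and 17-bit constant."""
    if not -65536 <= value <= 65535:
        raise ValueError("constant does not fit in 17 bits")
    return (
        (opbyte << 24)
        | (ffff << 20)
        | (REGS.index(reg) << 17)
        | (value & 0x1FFFF)          # keep the low 17 bits (two's complement)
    )

# A few rows from the table; a missing register defaults to the hidden ",z".
print(f"{encode(0x40, value=32768):08x}")          # lda #32768
print(f"{encode(0x4C, 'y', -1):08x}")              # ldy #-1,y   (dey)
print(f"{encode(0x83, 's'):08x}")                  # sta ,-s     (pha)
print(f"{encode(0x5C, 'n', 14, ffff=0x6):08x}")    # ldn [ne],#14,n
```

The default `reg="z"` mirrors the (*) footnote: when no index register is written, z is assumed.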
What I believe is the true key to the 65m32's efficiency and ease of use is NOT its instruction repertoire, which is rather ordinary with few exceptions, but its flexible operand structure. Once one fully understands how this structure works, programming with it becomes natural and simple (at least for me). The way that it works is as follows:
ANY register except for the processor status register can be used as an index, including the accumulator and the instruction pointer.
There are two families of operand modes, immediate and absolute. The immediate mode is indicated by a leading # in the operand field, and means that the operand value is to be used at 'face-value'. The immediate value isn't just a static entity, though, because it is (with few exceptions) added to the contents of the specified index register (identified with a leading comma) before use. #1,x is always equal to the contents of register x, plus 1. To make the assembly language easier to type, I have specified that either the numeric part or the register name (but not both) can be omitted. A missing numeric is assumed to be 0, and a missing register name is assumed to be ,z.
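The immediate-mode rule boils down to one addition. Here is a minimal Python sketch of it; the function name `immediate_operand` and the dictionary of register values are invented for the example, and wrapping the sum to 32 bits is an assumption, since the text above doesn't say what happens on overflow.

```python
MASK32 = 0xFFFFFFFF

def immediate_operand(regs: dict, value: int = 0, reg: str = "z") -> int:
    """Value of '#value,reg': the constant plus the index register.

    The defaults mirror the shorthand rules: a missing numeric is 0,
    a missing register is z. The 32-bit wrap is an assumption.
    """
    return (value + regs[reg]) & MASK32

# Eight registers; z is always zero.
regs = dict.fromkeys("abxyzusn", 0)
regs["x"] = 41

print(immediate_operand(regs, 1, "x"))     # '#1,x'
print(immediate_operand(regs, 7))          # '#7'   (same as '#7,z')
print(immediate_operand(regs, reg="x"))    # '#,x'  (same as '#0,x')
```

Seen this way, `ldx #1,x` (inx) and `lda #,x` (txa) are the same operation with different constants and destinations.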
There are three absolute modes; they are indicated by the absence of a leading # in the operand field, and always imply an additional memory access (read, write, or read-modify-write). This is because the operand value (which is calculated in the same manner as it is for immediate mode) is used as a pointer to main memory. Automatic post-increment and pre-decrement options for the indicated index register should be self-explanatory.
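For anyone who prefers to see the three absolute modes spelled out, here is a minimal Python sketch using the pha/pla pair (`sta ,-s` and `lda ,s+`) as the test case. The function name `absolute_address` and the dictionary-based memory are made up for the example. It assumes a word-addressed machine where the increment and decrement step by 1, and it leaves z untouched because z is hardwired to zero.

```python
MASK32 = 0xFFFFFFFF

def absolute_address(regs, mode, value=0, reg="z"):
    """Address used by '$,r' (mode ""), '$,r+' (mode "+") or '$,-r' (mode "-").

    The index register is updated as a side effect. A step of 1 is an
    assumption (word addressing); z never changes.
    """
    if mode == "-" and reg != "z":
        regs[reg] = (regs[reg] - 1) & MASK32    # pre-decrement: before the access
    address = (value + regs[reg]) & MASK32
    if mode == "+" and reg != "z":
        regs[reg] = (regs[reg] + 1) & MASK32    # post-increment: after the access
    return address

regs = dict.fromkeys("abxyzusn", 0)
regs["s"] = 0x1000
regs["a"] = 0x1234
mem = {}                                        # sparse word-addressed memory

mem[absolute_address(regs, "-", reg="s")] = regs["a"]    # sta ,-s   (pha)
regs["a"] = 0
regs["a"] = mem[absolute_address(regs, "+", reg="s")]    # lda ,s+   (pla)

print(hex(regs["a"]), hex(regs["s"]))
```

Because z is skipped in both update branches, '$,z', '$,z+' and '$,-z' all yield the same address, as the spec says.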
The 65m32 is 32 bits all the way, and technically EVERY instruction is a single word. Of course, most instructions require operand data to specify an immediate value or an address, and it is impossible to fit a 32-bit operand and an op-code into 32 bits.
One way that the 65m32 gets around the problem is by promoting an embedded 17-bit numeric operand to 32 bits before using it, by duplicating bit 16 in bits 17 to 31 before adding it to the index. But that only works most of the time, depending on what you're doing with the operand. -65536 ... 65535 is a respectable range that can be used for small increments, constants and offsets, but doesn't enable the 65m32's full potential.
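The promotion step, and the reason it only works most of the time, can be shown in a few lines. This is a minimal Python sketch with made-up helper names (`sign_extend_17`, `fits_inline`): the first copies bit 16 into bits 17 to 31, the second checks whether a given 32-bit value survives the round trip through the 17-bit field.

```python
def sign_extend_17(field: int) -> int:
    """Copy bit 16 of a 17-bit field into bits 17..31, giving a 32-bit pattern."""
    field &= 0x1FFFF
    if field & 0x10000:
        field |= 0xFFFE0000
    return field

def fits_inline(value32: int) -> bool:
    """True if a 32-bit pattern can be carried in the embedded 17-bit constant."""
    value32 &= 0xFFFFFFFF
    return sign_extend_17(value32 & 0x1FFFF) == value32

print(hex(sign_extend_17(0x1F000)))   # the -4096 from 'ldx #-4096'
print(fits_inline(0x00008000))        # small constant
print(fits_inline(0x00FF0000))        # a bit-mask that needs more than 17 bits
```

A mask like 0x00FF0000 is the kind of operand that falls outside the inline range.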
The other way that the 65m32 gets around the problem is by treating the instruction pointer as just another index register. This allows in-lining a full 32-bit operand immediately after the instruction, and loading it using the instruction pointer in absolute addressing mode, with auto-increment (so the next op-code after the operand is executed next). The PDP-11 does this, and I think that it's quite elegant. When composing small (<64kW) programs, this technique is typically only needed for large constants, like bit-masks and such, since the inherent 17-bit constant provides plenty of reach for relative branch targets, increments, initializations, and more. While translating FigForth from 6502 to 65m32, I have so far only found two occasions in hundreds of instructions where this 'long-immediate' technique is necessary, and they were only necessary because of the four-char-per-word dictionary name storage convention that I've implemented.
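Here is a minimal Python sketch of that 'long-immediate' trick, limited to the single instruction `lda ,n+`. The word 0x420e0000 is derived by following the pattern in the translation table (matrix byte 42 for `lda $,r+`, rrr = 7 for n, constant 0), not quoted from it. The function name `run_long_immediate` is hypothetical, and the sketch assumes n already points at the word after the instruction when the operand is fetched, which is what the `ldn ,n+` / `.dw` pairs in the table imply.

```python
LDA_N_POSTINC = 0x420E0000    # 'lda ,n+' (derived encoding, see note above)

def run_long_immediate(mem, n):
    """Execute one 'lda ,n+' located at address n; return (a, new n)."""
    word = mem[n]
    n += 1                     # assumed: n advances past the instruction first
    if word != LDA_N_POSTINC:
        raise ValueError("this sketch only understands 'lda ,n+'")
    a = mem[n]                 # absolute mode: 0 + n is used as the pointer
    n += 1                     # post-increment steps over the inlined operand
    return a, n

# Program: lda ,n+ / .dw $12345678 / ldn ,s+  (the last word is 'rts' from the table)
mem = [LDA_N_POSTINC, 0x12345678, 0x5E0C0000]
a, n = run_long_immediate(mem, 0)
print(hex(a), n)
```

After the load, n points at the third word, so execution continues with the next op-code rather than trying to execute the data.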
Before I spend too many more hours on this, it would help me to know what the more experienced readers think about the design. Does it still have some of that 6502 flavor, or is it too polluted from its other influences to deserve to be called something like 65m32? I am a fully-grown man, and I can take the negative opinions with the positives, so I would like to ask that you don't pull any punches if you have a reasoned argument as to why it might suck in some way or another. I promise that I won't get butt-hurt and run away.
Sincerely,
Mike