Posted: Tue Aug 06, 2013 6:29 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
Hi all.

I'm new to the forum, but have been lurking for some time, and this is one of my favorite threads, because it is so thought-provoking. I decided to just pull the trigger and post it back near the top ... I hope that I don't upset anyone by doing so. I know that there are forums out there that frown on this, so I'm taking a chance ... :?

The idea of designing a more capable work-alike to the 6502 family is very appealing to me, if only from a "mental exercise" angle. My background is not in operating system design or hardware, just simple programs and algorithms as a hobby. The following excerpts are from an incomplete specification document that has gone through several revisions without actually being completed (kind of the story of my life).

My question to the group: Could I be on to something here, or am I barking up the wrong tree?

Code:
Proposed instruction set for the 65m32, by barrym95838.

I started with my understanding of Garth's proposal, and took it as far as I could, short of
  writing an actual simulator.
I borrowed ideas from the pdp-11, Nova, 6809, and most of all, the 6502.
Proposed additions, simplifications, and/or criticisms are welcome.

Instruction bit format:  oooo ooaa ffff rrri iiii iiii iiii iiii

15 bits specify the operation, and 17 bits provide an 'inherent' constant that can be used to
  encode -65536 to 65535 without using a second word.

Addressing modes:

rrr  =     0       1       2       3       4       5       6       7
aa   =0   #,a     #,b     #,x     #,y     #,z     #,u     #,s     #,n
     =1   $,a     $,b     $,x     $,y     $,z     $,u     $,s     $,n
     =2   $,a+    $,b+    $,x+    $,y+    $,z+    $,u+    $,s+    $,n+
     =3   $,-a    $,-b    $,-x    $,-y    $,-z    $,-u    $,-s    $,-n

There are eight registers, plus p.  n is PC.  z is permanently hardwired to zero, meaning that
  '$,z' '$,z+' and '$,-z' are all equivalent.  # and $ come from ...i iiii iiii iiii iiii (the
  17-bit twos-complement number is extended to a full 32 bits before the operand calculation).
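
To make the field layout above concrete, here is a minimal Python sketch of splitting one 32-bit word into its fields. The function name `decode` and the dictionary keys are made up for illustration, and since the excerpt does not spell out what the `ffff` field means, it is returned as a raw number.

```python
REGS = "abxyzusn"  # register names in rrr order 0..7, taken from the mode table

def decode(word):
    """Split a 32-bit word laid out as oooo ooaa ffff rrri iiii iiii iiii iiii."""
    imm = word & 0x1FFFF               # 17-bit inherent constant
    if imm & 0x10000:                  # bit 16 set: sign-extend to a negative value
        imm -= 0x20000
    return {
        "op": (word >> 26) & 0x3F,     # oooooo: 6-bit operation
        "aa": (word >> 24) & 0x3,      # aa: 0 = #, 1 = $, 2 = $,r+, 3 = $,-r
        "ffff": (word >> 20) & 0xF,    # ffff: raw 4-bit field (meaning assumed unknown here)
        "reg": REGS[(word >> 17) & 0x7],
        "imm": imm,
    }
```

For instance, `decode(0x4809F000)` yields operation 0x12, mode 0, register z and constant -4096, which lines up with the `ldx  #-4096` encoding `:4809f000` given in this post.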


An example opcode matrix:

Code:
     x0    x1    x2    x3    x4    x5    x6    x7    x8    x9    xa    xb    xc    xd    xe    xf
   +-----------------------------------------------------------------------------------------------
0x | ora   ora   ora   ora   and   and   and   and   eor   eor   eor   eor   bit   bit   bit   bit
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
1x | adc   adc   adc   adc   add   add   add   add   sub   sub   sub   sub   sbc   sbc   sbc   sbc
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
2x | mul   mul   mul   mul   div   div   div   div   mod   mod   mod   mod   ???   ???   ???   ???
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
3x | ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
4x | lda   lda   lda   lda   ldb   ldb   ldb   ldb   ldx   ldx   ldx   ldx   ldy   ldy   ldy   ldy
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
5x | ldz   ldz   ldz   ldz   ldu   ldu   ldu   ldu   lds   lds   lds   lds   ldn   ldn   ldn   ldn
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
6x | hla   hla   hla   hla   hlb   hlb   hlb   hlb   hlx   hlx   hlx   hlx   hly   hly   hly   hly
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
7x | hlz   hlz   hlz   hlz   hlu   hlu   hlu   hlu   brk   brk   brk   brk   hln   hln   hln   hln
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
8x | sta   sta   sta   sta   stb   stb   stb   stb   stx   stx   stx   stx   sty   sty   sty   sty
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
9x | stz   stz   stz   stz   stu   stu   stu   stu   sts   sts   sts   sts   stn   stn   stn   stn
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
ax | asl   asl   asl   asl   rol   rol   rol   rol   ror   ror   ror   ror   lsr   lsr   lsr   lsr
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
bx | exa   exa   exa   exa   ???   ???   ???   ???   dec   dec   dec   dec   inc   inc   inc   inc
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
cx | cmp   cmp   cmp   cmp   cpb   cpb   cpb   cpb   cpx   cpx   cpx   cpx   cpy   cpy   cpy   cpy
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
dx | cpz   cpz   cpz   cpz   cpu   cpu   cpu   cpu   cps   cps   cps   cps   psh   psh   psh   psh
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
ex | ???   ???   ???   ???   adb   adb   adb   adb   rep   rep   rep   rep   sep   sep   sep   sep
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   |-----------------------------------------------------------------------------------------------
fx | ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???   ???
   | #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r  #,r   $,r   $,r+  $,-r
   +-----------------------------------------------------------------------------------------------
     x0    x1    x2    x3    x4    x5    x6    x7    x8    x9    xa    xb    xc    xd    xe    xf


Some simple instruction translations:

Code:
------------- Examples that translate with minimal effort -------------------
--------- 65816 ----------------        --------- 65m32 ------------
:a90080         lda  #32768             :40088000         lda  #32768 (*)

:a200f0         ldx  #-4096             :4809f000         ldx  #-4096 (*)

:ac9a78         ldy  $789a              :4d08789a         ldy  $789a  (*)

:e8             inx                     :48040001         ldx  #1,x

:88             dey                     :4c07ffff         ldy  #-1,y

:48             pha                     :830c0000         sta  ,-s

:68             pla                     :420c0000         lda  ,s+

:60             rts                     :5e0c0000         ldn  ,s+

:4c5476         jmp  $7654              :5c087654         ldn  #$7654 (*)

:6c5713         jmp  ($1357)            :5d081357         ldn  $1357  (*)

:8a             txa                     :40040000         lda  #,x

:205634         jsr  $3456              :7c083456         hln  #$3456 (*)

:d00d           bne  .+15               :5c6e000e         ldn  [ne],#14,n

:10ce           bpl  .-48               :5cafffcf         ldn  [pl],#-49,n

:5c563412       jml  $123456            :5e0e0000         ldn  ,n+
                                        :00123456         .dw  $123456

:22658709       jsl  $98765             :7e0e0000         hln  ,n+
                                        :00098765         .dw  $98765

:fcefcd         jsr  ($cdef,x)          :7d04cdef         hln  $cdef,x

(*):  There is a hidden ",z" within this instruction; this is assumed if
      no other index register is specified in the operand field.

Notes:  Binary code density obviously favors the 65816, but memory cycle
  counts obviously favor the wider-bus 65m32.
  These 65m32 assembly examples show the nuts-and-bolts for the sake of
  illustration, but an assembler could easily allow macros and/or aliases
  with more familiar mnemonics -- 'ldn  ,s+' <-> 'rts', 'ldn  [ne],# ...'
  <-> 'bne ...', etc.


I am in the process of hand-translating FIGFORTH, and it looks like I'm able to do pretty much whatever I need to do with about 1/4 of the instructions required by the NMOS 6502, with the added benefit (?) that everything has been promoted to 32 bits.

What I believe is the true key to the 65m32's efficiency and ease of use is NOT its instruction repertoire, which is rather ordinary with few exceptions, but its flexible operand structure. Once one fully understands how this structure works, programming with it becomes natural and simple (at least for me). The way that it works is as follows:

ANY register except for the processor status register can be used as an index, including the accumulator and the instruction pointer.

There are two families of operand modes, immediate and absolute. The immediate mode is indicated by a leading # in the operand field, and means that the operand value is to be used at 'face-value'. The immediate value isn't just a static entity, though, because it is (with few exceptions) added to the contents of the specified index register (identified with a leading comma) before use. #1,x is always equal to the contents of register x, plus 1. To make the assembly language easier to type, I have specified that either the numeric part or the register name (but not both) can be omitted. A missing numeric is assumed to be 0, and a missing register name is assumed to be ,z.

There are three absolute modes; they are indicated by the absence of a leading # in the operand field, and always imply an additional memory access (read, write, or read-modify-write). This is because the operand value (which is calculated in the same manner as it is for immediate mode) is used as a pointer to main memory. Automatic post-increment and pre-decrement options for the indicated index register should be self-explanatory.
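
The two operand families can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name `read_operand`, the dictionary-based register file and memory, the step size of one word for auto-increment and pre-decrement, and reading 0 from unmapped memory are all choices made for the sketch, and only source (read) operands are modelled.

```python
MASK = 0xFFFFFFFF  # registers and addresses are 32 bits wide

def read_operand(regs, mem, aa, reg, const):
    """Evaluate a source operand; aa is 0 (#), 1 ($), 2 ($,r+) or 3 ($,-r)."""
    def get(name):
        return 0 if name == "z" else regs[name]   # z always reads as zero

    def put(name, value):
        if name != "z":                           # writes to z are dropped
            regs[name] = value & MASK

    if aa == 3:                                   # pre-decrement the index
        put(reg, get(reg) - 1)
    value = (get(reg) + const) & MASK             # constant + index register
    if aa == 0:
        return value                              # immediate: used at face value
    value = mem.get(value, 0)                     # absolute: one extra memory read
    if aa == 2:                                   # post-increment the index
        put(reg, get(reg) + 1)
    return value
```

With `regs = {"x": 5, "s": 0x100}` and `mem = {0x100: 42}`, the call `read_operand(regs, mem, 0, "x", 1)` models `#1,x` and returns 6, while `read_operand(regs, mem, 2, "s", 0)` models `,s+` and returns 42, leaving `s` at 0x101.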

The 65m32 is 32 bits all the way, and technically EVERY instruction is a single word. Of course, most instructions require operand data to specify an immediate value or an address, and it is impossible to fit a 32-bit operand and an op-code into 32 bits.

One way that the 65m32 gets around the problem is by promoting an embedded 17-bit numeric operand to 32 bits before using it, by duplicating bit 16 in bits 17 to 31 before adding it to the index. But that only works most of the time, depending on what you're doing with the operand. -65536 ... 65535 is a respectable range that can be used for small increments, constants and offsets, but doesn't enable the 65m32's full potential.

The other way that the 65m32 gets around the problem is by treating the instruction pointer as just another index register. This allows in-lining a full 32-bit operand immediately after the instruction, and loading it using the instruction pointer in absolute addressing mode, with auto-increment (so the next op-code after the operand is executed next). The PDP-11 does this, and I think that it's quite elegant. When composing small (<64kW) programs, this technique is typically only needed for large constants, like bit-masks and such, since the inherent 17-bit constant provides plenty of reach for relative branch targets, increments, initializations, and more. While translating FigForth from 6502 to 65m32, I have so far only found two occasions in hundreds of instructions where this 'long-immediate' technique is necessary, and they were only necessary because of the four-char-per-word dictionary name storage convention that I've implemented.
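
As a rough sketch of the choice an assembler would face here, the Python below picks between the inherent 17-bit constant and the in-line 'long-immediate' form. The helper names `to_signed32` and `load_constant` are hypothetical, and the emitted text simply imitates the `ldn  ,n+` / `.dw` pattern from the translation examples.

```python
def to_signed32(value):
    """Interpret any integer as a 32-bit twos-complement value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def load_constant(reg, value):
    """Return assembly lines that load a 32-bit constant into register reg."""
    signed = to_signed32(value)
    if -65536 <= signed <= 65535:
        # Fits the sign-extended 17-bit field: a single instruction word.
        return [f"ld{reg}  #{signed}"]
    # Otherwise place the full word after the instruction and fetch it via n.
    return [f"ld{reg}  ,n+", f".dw  ${value & 0xFFFFFFFF:x}"]
```

`load_constant("x", -4096)` stays a single word, while `load_constant("n", 0x123456)` produces the two-word form used for the `jml` translation.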

Before I spend too many more hours on this, it would help me to know what the more experienced readers think about the design. Does it still have some of that 6502 flavor, or is it too polluted from its other influences to deserve to be called something like 65m32? I am a fully-grown man, and I can take the negative opinions with the positives, so I would like to ask that you don't pull any punches if you have a reasoned argument as to why it might suck in some way or another. I promise that I won't get butt-hurt and run away.

Sincerely,

Mike

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Last edited by barrym95838 on Sat Aug 10, 2013 5:30 am, edited 3 times in total.

Posted: Tue Aug 06, 2013 7:30 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8509
Location: Southern California
I'm glad to see more discussion on it, but I might need some more explanation on the notation. I do understand some of it.

It does indeed lose some of the 6502 flavor, but apparently still deals with the same registers and mostly the same instructions-- just different notation, with some flexibility added.

If I understand the way you're doing it, then for your "LDN #$7654", you will want to be able to add the ",_" where the underscore is filled in with the name of the 32-bit registers that are the equivalent of the 65816's data and program banks, for 32-bit offsets. This will be necessary for address-agnostic code, including for when a program or an array it accesses gets moved after already being loaded. There are several reasons for wanting to be able to move it. I started writing up examples and then decided to leave it for later if it comes up.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Wed Aug 07, 2013 2:34 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
Thanks for responding, Garth. I was hoping that you, above all others, would ... excellent!

My operand notation is relatively simple, but maybe a little bit eccentric:

# usually indicates an immediate numeric value added to a register (exceptions below). A missing register name is replaced with z, and a missing immediate constant is replaced with 0. If you want to do y = x + 100, you code it as ldy #100,x. If you want to branch 1000 words forward, you code it as ldn #1000,n or its equivalent bra .+1001. The 6502's tsx would be ldx #,s or tsx, depending on how 6502-friendly the assembler is.

No # indicates that the operand will be used as an effective address. A missing register name is replaced with z, and a missing numeric offset is replaced with 0. If you want to load a with the first element of a table in memory pointed to by u, you code it as lda ,u. In my translation of FigForth, NEXT would be ldn ,y+ or its equivalent jmp (,y+).
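
The notation rules above are regular enough to parse mechanically. The following Python sketch is one possible reading of them; the function name `parse_operand`, the returned `(aa, register, constant)` tuple and the restriction to decimal or `$`-prefixed hex numbers are assumptions made for the example, and the parenthesised aliases such as `jmp (,y+)` are not handled.

```python
import re

# Optional '#', optional number, optional ",[-]reg[+]" suffix.
_OPERAND = re.compile(r"^(#?)([^,]*)(?:,(-?)([abxyzusn])(\+?))?$")

def parse_operand(text):
    """Return (aa, register, constant) for one operand string."""
    m = _OPERAND.match(text.strip())
    if m is None:
        raise ValueError(f"bad operand: {text!r}")
    hash_, number, dec, reg, inc = m.groups()
    number = number.strip()
    if not number and reg is None:
        raise ValueError("number and register cannot both be omitted")
    if hash_:
        if dec or inc:
            raise ValueError("no auto-increment/decrement in immediate mode")
        aa = 0                                   # '#' : immediate
    elif dec and inc:
        raise ValueError("cannot combine pre-decrement and post-increment")
    else:
        aa = 3 if dec else 2 if inc else 1       # ,-r / ,r+ / plain absolute
    if not number:
        const = 0                                # missing number means 0
    elif number.startswith("$"):
        const = int(number[1:], 16)
    else:
        const = int(number, 10)
    return aa, reg or "z", const                 # missing register means z
```

Under these assumptions, `parse_operand("#100,x")` gives `(0, "x", 100)` and `parse_operand(",y+")` gives `(2, "y", 0)`.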

My mnemonics have only a few irregularities, and plenty of op-code space for important stuff that has certainly slipped my notice ... lots of available bits means lots of flexibility!

The first that may be unfamiliar are hla ... hln. What they do is push the indicated register on the system stack before loading it with a new value, but after any operand side-effects like auto-increment. This allows short or long jsrs (with post-word) to push the correct return address. They are also an economical way to save a register and immediately begin using it, in one instruction. If you want to stack a table pointer address contained in a and load a with the second element from that table, you would code it as hla 1,a. Later, you would restore the pointer with a lda ,s+ or its equivalent pla.
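
The ordering described above (operand side effects first, then the push, then the load) can be modelled in Python as follows. This is a sketch, not a definition of the instruction: the name `push_load`, the dictionary-based registers and memory, and a downward-growing stack with pre-decrement push (chosen to match `sta ,-s` as `pha`) are assumptions.

```python
MASK = 0xFFFFFFFF

def push_load(regs, mem, target, aa, index, const):
    """Model 'hl<target>' with operand mode aa, index register and constant."""
    regs["z"] = 0                                # z is hardwired to zero
    # 1. Operand evaluation, including any auto-increment/decrement.
    if aa == 3:
        regs[index] = (regs[index] - 1) & MASK
    addr = (regs[index] + const) & MASK
    if aa == 0:
        value = addr                             # immediate
    else:
        value = mem[addr]                        # absolute: read memory
        if aa == 2:
            regs[index] = (regs[index] + 1) & MASK
    # 2. Push the register as it stands after those side effects.
    regs["s"] = (regs["s"] - 1) & MASK
    mem[regs["s"]] = regs[target]
    # 3. Load the new value.
    regs[target] = value
    regs["z"] = 0                                # keep z at zero whatever happened


# Long call 'hln ,n+' fetched from 0x100 with its post-word at 0x101:
regs = {"n": 0x101, "s": 0x8000}
mem = {0x101: 0x98765}
push_load(regs, mem, "n", 2, "n", 0)
print(hex(regs["n"]), hex(mem[regs["s"]]))       # 0x98765 0x102
```

The pushed return address is 0x102, the word after the in-line target, which is the behaviour the paragraph above asks for.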

What about the weird stuff like sta #,b or asl #,a? You can't store into or shift an immediate, but you can use sta # to transfer register-to-register without affecting the N or Z flags, which lda # would do (at least for a, b, x, and y). For asl #, the immediate constant would be a shift count: negative values could exclude the carry, and positive values could include it.

Regarding your questions about relocation, please share or link me to any examples that you can create ... I'm chomping at the bit to see if I can adapt my concept to a processor that someone would want to use in a more complex environment. My 6502 experience does not include writing any 65c816 programs, but I can certainly grok the need for additional address translation beyond what the original NMOS could do ... I just need appropriate examples to dissect!

Thanks,

Mike



Last edited by barrym95838 on Wed Aug 07, 2013 4:50 am, edited 1 time in total.

Posted: Wed Aug 07, 2013 4:50 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8509
Location: Southern California
It would be nice to have an additional register or two that could be used as stack pointers too (although interrupts and subroutines would still use S). Bruce's idea of just using bank 0 on the '816 for a DTC Forth program and RTS as a 6-clock NEXT is quite efficient but mostly forbids interrupts. His ITC Forth idea of using PLX, JMP(0,X) for a 13-clock NEXT also has that problem. I have had other situations too where a separate hardware stack would make things more efficient. If the pushing and pulling instructions were the same thing only specifying a different register, that would be nice.

The '816 has a program bank register (PBR, also called K) and a data bank register (DBR, also called B) which are only 8 bits but dictate which 64K bank the referenced program addresses and data accesses will take place in, minus things like DP and stack and vectors which are always in bank 0. Any given program can be using the normal 16-bit addresses but the system can assign what bank(s) it will use to make it happily coexist with other programs that get their turn to run in a multitasking environment. Every program can think that it starts at address 0 and owns the hill, but in reality they start at integer multiples of 65,536 apart. I would like these "bank" registers to be 32-bit, like everything else in the processor, and then of course there are no bank boundaries. Your method of specifying the registers does the same thing but adds flexibility. An example I'm thinking of is where you have data mixed in with the program, as well as a data array somewhere else, and data accesses for both could be done with the same instructions but specifying which register to add in, with no need to be changing those registers. In this example, instructions to fetch data mixed in with the program in the program area would specify K to use the PBR equivalent instead of automatically going with B to use the DBR equivalent just 'cause they're accessing data.

The '816 also has a 16-bit direct-page register (also called D) which is an offset in bank 0 for where the direct page starts. Direct page is like zero page except that it can be moved around, so different tasks can have their own ZPs without stepping on each other. In an all-32-bit version, it's kind of like the whole 4gigaword address space is in direct page, but a 32-bit offset register would still be valuable for the same reasons the PBR and DBR are.

I'm perhaps on thin ice talking at the limits of my knowledge, because I don't have any experience myself with using multiple banks on the '816 since I only have the '802 which is like a bank-0-only '816, and that's what I wrote and tested my '816 Forth on. BDD and others might be able to contribute more in this area.

If you haven't already, be sure your ideas accommodate actions like the 816's PEI, PER, and PEA. These can be synthesized in 65c02 also, but it takes a lot of instructions there to do what the '816 does in a single instruction.

What does the H stand for when it's the first letter of a mnemonic?

Although the above would be useful in multitasking, I have written several times that the kind of work I do with the workbench computer rules out normal multitasking OSs. I still have plenty of use for having multiple programs in memory at once though, and in fact I might like to do something like my HP-71 hand-held computer from the 80's does. It is not really a multitasking system but it does allow you to have hundreds of files, many if not most being program files, in file chains in non-volatile RAM, and they do not have to be loaded into another part of RAM to run, but can be run in place, even calling other programs and subprograms in other files, or even itself, with deep nesting. It supports local environments too, so the various programs that have been started don't step on each other's resources. Recursion is of course possible then too, and a pseudo-multithreading method allowed me to write a text editor for it that allowed me to have dozens of files open at once. If a program resizes a file that comes before it in the file chain, especially enlarging it, the system has to scoot things around to prevent fragmentation, and so all the free memory is together at one end. You can see that without the various offset registers I discuss above, that program would crash the first time there's a jump or return-from-subroutine after everything got moved and things are no longer where it expected them. In this case the system would update the offsets, and a return-from-subroutine for example goes to the right place even though the stack contents were not changed.
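
A toy Python model may help show why an offset register makes this safe. Everything here is hypothetical: memory is a dictionary, `effective_address` and `move_block` are invented names, and the "saved return address" is simply an offset that the system never rewrites.

```python
def effective_address(base, offset):
    """Address actually used: offset register plus the program-relative value."""
    return (base + offset) & 0xFFFFFFFF

def move_block(mem, old_base, new_base, length):
    """Relocate `length` words and return the new value for the offset register."""
    words = [mem.pop(old_base + i) for i in range(length)]   # lift the block out
    for i, word in enumerate(words):
        mem[new_base + i] = word                             # drop it at the new home
    return new_base


mem = {0x1000 + i: 100 + i for i in range(4)}   # a 4-word "program" at 0x1000
base = 0x1000
saved_return = 2                                # stored as an offset, not an absolute address
before = mem[effective_address(base, saved_return)]
base = move_block(mem, base, 0x5000, 4)         # system scoots the program elsewhere
after = mem[effective_address(base, saved_return)]
print(before == after)                          # True
```

The stacked value 2 is untouched by the move; only the offset register changes, so the return still lands on the same word.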

Posted: Wed Aug 07, 2013 5:47 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
Quote:
It would be nice to have an additional register or two that could be used as stack pointers too (although interrupts and subroutines would still use S).


Although my design is still accumulator-centric like the 6502, my addressing modes are brutally orthogonal, meaning that practically any registers besides a, z, and n can be stack pointers without penalty. The main ALU's destination is almost always a, but the operand adder can do all kinds of LEA-style things.

Did you notice my FigForth NEXT from above? It's a single 32-bit instruction: ldn ,y+ AKA jmp (,y+)
EXECUTE is jmp (,x+)
I didn't have to use y as the code pointer, but it looked more 6502-like than jmp (,u+) which reminds me more of the 6809.
It shouldn't be much of a surprise that it greatly resembles rts, which is jmp (,s+)
It also follows neatly that a long jump is jmp (,n+) and a long call is jsr (,n+)
In fact, I have already translated over 100 machine instructions from 6502 FigForth to 65m32 FigForth, and so far all except one are 32 bits long! Here's a brief hand-assembled list file excerpt. The binary translations are NOT correct, because I remapped the op-codes after assembling it, and I won't re-assemble until my op-code map is stable:

Code:
00001021:            43 ;Execute a word by its code field address on the
00001021:            44 ;  dstack
00001021: 00001025   45 EXEC    .dw  .+4
00001022: c5584543   46         .db  "E",'XEC'  ;EXECUTE
00001023: d5544500   47         .db  "U",'TE'
00001024: 00001000   48         .dw  LIT        ;link to LIT
00001025: 29700000   49         jmp  (,x+)
00001026:            50 ;Adjust IP by in-line literal
00001026: 0000102a   51 BRAN    .dw  .+4
00001027: c252414e   52         .db  "B",'RAN'  ;BRANCH
00001028: c3480000   53         .db  "C",'H'
00001029: 00001021   54         .dw  EXEC       ;link to EXECUTE
0000102a: 31000000   55         tya
0000102b: 38500000   56         add  ,y+        ;IP += literal
0000102c: 01300000   57         tay
0000102d: f170ffd8   58         bra  NEXT
0000102e:            59 ;If bottom of dstack is zero then branch
0000102e: 00001032   60 ZBRAN   .dw  .+4
0000102f: b0425241   61         .db  "0",'BRA'  ;0BRANCH
00001030: ce434800   62         .db  "N",'CH'
00001031: 00001026   63         .dw  BRAN       ;link to BRANCH
00001032: 60040000   64         lda  ,x+
00001033: f177fff6   65         beq  BRAN+4
00001034: 31300001   66 BUMP    iny             ;Skip IP over literal
00001035: f170ffd0   67         bra  NEXT
00001036:            68 ;Increment loop index, loop until >= limit
00001036: 0000103a   69 PLOOP   .dw  .+4
00001037: a84c4f4f   70         .db  "(",'LOO'  ;(LOOP)
00001038: d0290000   71         .db  "P",')'
00001039: 0000102e   72         .dw  ZBRAN      ;link to 0BRANCH
0000103a: 66f00000   73         inc  ,s         ;index
0000103b: 500c0001   74 PL1     lda  1,s        ;limit
0000103c: 67000000   75         cmp  ,s         ;index
0000103d: f17bffec   76 PL2     bmi  BRAN+4
0000103e: 61600002   77         lds  #2,s
0000103f: f170fff4   78         bra  BUMP


bra NEXT is replaced with jmp (,y+) after debugging.

Quote:
The '816 has a program bank register (PBR, also called K) and a data bank register (DBR, also called B) which are only 8 bits but dictate which 64K bank the referenced program addresses and data accesses will take place in, minus things like DP and stack and vectors which are always in bank 0. Any given program can be using the normal 16-bit addresses but the system can assign what bank(s) it will use to make it happily coexist with other programs that get their turn to run in a multitasking environment. Every program can think that it starts at address 0 and owns the hill, but in reality they start at integer multiples of 65,536 apart.


This is the kind of stuff that I would like to investigate further. My understanding is that the '816 doesn't have a privileged mode, so a misbehaving 16-bit program could pretty much walk all over the other processes by modifying B and K. So far, I have only investigated and partially implemented the process-view model here, where everything ORGs around zero. It would help greatly if I could figure out an efficient way to add a privileged 'bank' register that always adds to the effective address, and can only be accessed and modified in 'privileged' mode, but at this point, my only concern would be making sure that I leave enough op-code space available for those privileged instructions.

Quote:
If you haven't already, be sure your ideas accommodate actions like the 816's PEI, PER, and PEA. These can be synthesized in 65c02 also, but it takes a lot of instructions there to do what the '816 does in a single instruction.


Done, I think.
pei #10 would be psh #10
pea foo would be psh #foo or psh ,n+ : dw foo if foo is > 16-bits.
per foobar would be ... uhh ... I'd better look that up first, but I think that I can do it hassle-free.

Quote:
What does the H stand for when it's the first letter of a mnemonic?


Well, it's a little bit embarrassing, but I think that equal-length mnemonics are sexy. They make for aesthetically pleasing source listings, and all of my designs use them. My 65m32 exclusively uses three letters out of respect for the 6502, my m-824 exclusively uses four letters, and my 10-bit decimal design uses ... well, I'm not sure yet, but I'm sure that it will be equal-length.

hla and its siblings are a disappointing side-effect of this obsession. The name means pusH, then Load register A with something. PLA is too ambiguous, and SLA lingered for a while, but I'm not completely satisfied with any of them. Suggestions are welcome, as long as they're three letters.

More to come,

Mike

[EDIT 2013.10.07]: After some further translation efforts, I have discovered the embarrassing fact that jmp (,y+) is not compatible with a classic ITC Forth implementation, since IP points to the word's CFA, not the word's actual machine code. :oops: Since register u is available for use as W, I have decided that the best 65m32 ITC NEXT would look like this:
Code:
NEXT    ;(2 instructions, 2 machine words, 5 machine cycles)
        ldu  ,y+        ; W = (IP) , IP += 1   
        jmp  (,u+)      ; execute code @ (W) , W += 1
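
For readers who prefer to see the two-instruction NEXT as plain data movement, here is a small Python model. The function name `next_word` and the dictionary-based registers and memory are assumptions for the sketch; the addresses are borrowed from the EXEC entry in the listing above purely as sample data.

```python
def next_word(regs, mem):
    """One pass through the two-instruction NEXT; all addresses are word addresses."""
    regs["u"] = mem[regs["y"]]      # ldu ,y+   : W = (IP)
    regs["y"] += 1                  #             IP += 1
    regs["n"] = mem[regs["u"]]      # jmp (,u+) : jump to the code address stored at (W)
    regs["u"] += 1                  #             W += 1, now just past the code field


# Hypothetical memory image: a thread cell at 0x200 holding the CFA 0x1021,
# whose code field in turn holds the machine-code address 0x1025.
regs = {"y": 0x200, "u": 0, "n": 0}
mem = {0x200: 0x1021, 0x1021: 0x1025}
next_word(regs, mem)
print(hex(regs["n"]), hex(regs["u"]), hex(regs["y"]))   # 0x1025 0x1022 0x201
```

The extra level of indirection through `u` is what the single `jmp (,y+)` was missing.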

I have also revised my mnemonic table: hla (pusH-Load-A) is now pda (Push-loaD-A) ... I believe that this change is in the spirit of the 6502's pha and php mnemonics.


Last edited by barrym95838 on Tue Oct 08, 2013 5:37 am, edited 3 times in total.

Posted: Wed Aug 07, 2013 7:07 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8509
Location: Southern California
I'm not very concerned with privileged mode or memory protection, as we don't expect this to be used commercially. Instead, probably anyone who uses it will be developing their own software, or at least starting with someone else's source code. Either way, they'll have full control, and can debug it all and not be at the mercy of any software suppliers that are secretive or unresponsive or incompetent to fix their bugs. When I get done developing a piece of software for my workbench computer, it has no bugs at all--or should I say, none ever show up--and there's definitely no crashing. This cannot be said for commercial bloatware; but I don't see this processor being used that way, especially for consumer-type uses. (To go further, I can usually have it recover from a crash in a second or two and get back on the job without re-loading things, since most crashes seem to be a matter of being safely stuck in a loop, not scribbling all over memory. I tell about it in Tip #40.)

Quote:
Did you notice my [...]

Yes, but did not understand it all yet. It may still take awhile.

I do like that the 6502's and 816's mnemonics are all three letters. Many things could be given aliases to get them down to three letters also, like your JMP(,S+) being RTS. I do that with PIC16 programming too, using 6502 mnemonics to invoke a macro to do something the PIC needs multiple instructions for but the 6502 can do in a single instruction.

Quote:
I didn't have to use y as the code pointer, but it looked more 6502-like than jmp (,u+) which reminds me more of the 6809.

On the 6502 & '816 we use X for the data stack pointer in Forth (as you know). Although the '816 does not need Y in Forth in the same way the '02 does, it would be nice to keep X and Y available, not tied up. Even the '816 does sometimes require saving X as the data stack pointer to use X temporarily for something else, and then has to restore X before the end of the primitive.

Quote:
In fact, I have already translated over 100 machine instructions from 6502 FigForth to 65Org32b FigForth, and so far all except one are 32 bits long!

So is that one 64 bits long? (Since the data bus is 32 bits wide, there are no narrower options, just as the 6502 does not have a 4-bit load or store.) What does your @ (fetch) look like? I compare the 6502 @ (10 instructions) with the much, much shorter (2 instructions) '816 @ at viewtopic.php?f=9&t=1505&p=9705#p9705 . Does yours get this down to a single instruction also, in spite of being 32-bit? That would be great for performance! Lemme see... Can it be done with 1 clock for instruction read, one for operand read, one for data read, and one for data write? That would be four clocks to do the 32-bit job whereas the '816 takes 12 clocks to do the 16-bit job. I realize of course that so far it's only talk, with no hardware.
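The clock-count comparison above can be put into numbers with a short Python sketch. The figures are only the ones quoted in this post (a hoped-for 4 clocks for a 32-bit fetch, versus 12 clocks for the 16-bit '816 fetch); nothing here is measured, and the helper name `bits_per_clock` is made up for illustration.

```python
from fractions import Fraction

def bits_per_clock(width_bits, clocks):
    """Data bits moved per clock for one Forth @ (fetch)."""
    return Fraction(width_bits, clocks)

# Hoped-for 32-bit design: instruction read, operand read, data read, data write.
proposed = bits_per_clock(32, 4)
# '816 figure quoted above: 12 clocks for the 16-bit fetch.
w65816 = bits_per_clock(16, 12)

# How many times more data per clock the 32-bit design would move.
ratio = proposed / w65816
print(proposed, w65816, ratio)
```

On those assumed counts the 32-bit design would move six times as much data per clock, before any difference in clock speed is considered.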

My 32-bit 6502 DO...LOOP and associated words are at http://wilsonminesco.com/Forth/32DOLOOP.FTH and you can see how many, many instructions it takes to do it on the 6502. It will be fun to see that shortened to just a few instructions.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Wed Aug 07, 2013 7:26 am 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
Thanks for your interest Garth! I need to get some sleep now, but I'll find (or quickly write) @ and DO tomorrow after work, and post them here ... unless someone else beats me to it!

Mike

[Edit: I snuck away from work long enough to write @:]

Code:
lda ,x   ; get address
lda ,a   ; fetch value @ address
sta ,x   ; store value

One of the possible disadvantages with my design is that (direct,x) and (direct),y have to be synthesized. There is much more flexibility though. With no wait states, it's conceivable that my @ could execute in nine (or possibly even six) cycles.
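For readers who want to trace the three-instruction @, here is a rough Python model. It assumes that `,x` and `,a` mean "the memory cell the register points to" with a zero offset, that x is the data stack pointer with the top of stack at `mem[x]`, and that memory is word-addressed. The function name `fetch` and the dict-based memory are illustrative only and are not part of the 65m32 proposal.

```python
def fetch(mem, x):
    """Replace the address on top of the data stack with the value it points to."""
    a = mem[x]    # lda ,x   get address from top of stack
    a = mem[a]    # lda ,a   fetch value at that address
    mem[x] = a    # sta ,x   store value back to top of stack
    return a

# Hypothetical layout: the stack cell at 0x100 holds address 0x2000,
# and the cell at 0x2000 holds the value 1234.
mem = {0x100: 0x2000, 0x2000: 1234}
fetch(mem, 0x100)
print(mem[0x100])
```

The model only shows the data flow; it says nothing about how many cycles each step would take in hardware.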

No time for DO ... back to work!!

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Last edited by barrym95838 on Wed Aug 07, 2013 6:33 pm, edited 4 times in total.

PostPosted: Wed Aug 07, 2013 7:36 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8509
Location: Southern California
About your listing above-- I know you were copying out of the fig-Forth source, but zowee what a difference a good macro assembler makes! Here are similar pieces from my '816 Forth:
Code:
         HEADER "EXECUTE", NOT_IMMEDIATE   ; ( addr -- )
EXECUTE: PRIMITIVE
         LDA     0,X
 xeq1:   STA     W
         INX_INX
         JMP     W-1
 ;-------------------
         HEADER "PERFORM", NOT_IMMEDIATE   ; ( addr -- )
PERFORM: PRIMITIVE                         ; same as  @ EXECUTE
         LDA     (0,X)
         BRA     xeq1
 ;-------------------
         HEADER "EXIT", NOT_IMMEDIATE      ; ( -- )
EXIT:    DWL     unnest+2                  ; (primitive)
 ;-------------------
         HEADER "branch", NOT_IMMEDIATE    ; ( -- )
branch:  PRIMITIVE                         ; Set the IP to the absolute addr
         LDA     (IP)                      ; pointed to by the cell following the
         STA     IP                        ; execution token of branch. It's faster
         GO_NEXT                           ; this way not making it relative.
 ;-------------------
         HEADER "0branch", NOT_IMMEDIATE   ; ( n -- )
Zbranch: PRIMITIVE
         INX_INX
         LDA     $FFFE,X         ; (FFFE,X works for '802.)  Get the value that was at
         BEQ     branch+2        ; TOS before INX_INX .  Do the branch if TOS was 0.

 bump:   LDA     IP              ; bump (advance) the instruction pointer by two
         INA_INA                 ; LDA, INA, INA, STA  is faster than  INC INC.
         STA     IP
         GO_NEXT
 ;-------------------
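
The control flow of branch and 0branch in the listing above can be sketched in Python. This is a simplified model, not a translation of the '816 code: the data stack is a Python list rather than X-indexed memory, `mem` is a dict from address to cell value, and the names `branch`, `zbranch` and `CELL` are chosen here for illustration.

```python
CELL = 2  # bytes per cell in the 16-bit '816 Forth

def branch(mem, ip):
    """IP points at the cell after the branch token; that cell holds
    the absolute target address, which becomes the new IP."""
    return mem[ip]

def zbranch(mem, ip, stack):
    """Pop the top of stack; branch if it was zero, otherwise skip the address cell."""
    tos = stack.pop()           # INX_INX drops the top cell
    if tos == 0:
        return branch(mem, ip)  # BEQ branch+2
    return ip + CELL            # bump: advance IP past the address cell

# Hypothetical thread: the cell at 0x1002 holds the branch target 0x2000.
mem = {0x1002: 0x2000}
taken = zbranch(mem, 0x1002, [0])
not_taken = zbranch(mem, 0x1002, [7])
print(hex(taken), hex(not_taken))
```

Storing an absolute target means the taken path is a single load and store of IP, which matches the comment in the listing about it being faster than a relative branch.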

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Wed Aug 07, 2013 5:27 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
barrym95838 wrote:
... Does it still have some of that 6502 flavor, or is it too polluted from its other influences to deserve to be called something like 65Org32b?

Hi Mike, I'm happy to see the discussions of your core, but if I can express a preference, I'd prefer you don't call it the 65Org32b: the 65org32 is a particular thing, and if we make any variations like we did with 65Org16, they might be called 65Org32.b - so, if you could pick something like 65X32, for any letter of your choice other than O (or I, or B,...), that might help minimise future confusion.

Cheers
Ed


PostPosted: Wed Aug 07, 2013 6:20 pm 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
BigEd:

Message received, understood, and respected! I will go back and edit all of my posts ... my core will be 65m32.

Mike

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


PostPosted: Wed Aug 07, 2013 6:26 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Great - thanks!


PostPosted: Wed Aug 07, 2013 6:57 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8509
Location: Southern California
I tend to agree Ed. I was on the edge of thinking this is a little too different to call it a 65Org32, even with suffixes. He definitely has some good ideas though.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Wed Aug 07, 2013 7:58 pm 

Joined: Wed Jul 10, 2013 3:13 pm
Posts: 67
I would probably add more general purpose registers

_________________
JMP $FFD2


PostPosted: Wed Aug 07, 2013 8:52 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8509
Location: Southern California
James_Parsons wrote:
I would probably add more general purpose registers

This is a popular thing to do in more-complex processors. The 6502 is often seen by others as not having enough registers; yet in a way, all of zero page is registers. But BigEd observed,
Quote:
With 6502, I suspect more than one beginner has wondered why they can't do arithmetic or logic operations on X or Y, or struggled to remember which addressing modes use which of the two. And then the intermediate 6502 programmer will be loading and saving X and Y while the expert always seems to have the right values already in place.

Interestingly, even 30 years ago a Z80 had to have a clock speed of 4MHz to keep up with a 1MHz 6502; and Jack Crenshaw, an embedded-systems engineer who wrote regularly in Embedded Systems Programming magazine, said in the 9/98 issue that he still couldn't figure out why, benchmark after benchmark, the 6502 could outperform the Z80, which had more and bigger registers and a seemingly more powerful instruction set, and ran at higher clock rates.

The 65816 also outperforms the 68000 in the Sieve benchmark, even though the 68000 has eight 32-bit general-purpose data registers (D0-D7), and eight address registers (A0-A7). Also, the 68000's interrupt response speed is terrible compared to the 65 family's. (That's not to say the 68000 isn't a nice processor. It is.)

After having worked with the 65 family for so many years, I have to conclude there's very little I could add to the 65816 outside of widening everything to 32 bits for reasons given early in this topic. I have mentioned another register or two further up that I would like, and another one that comes to mind is that vectors could be loaded from ROM into registers during the reset sequence to further improve interrupt performance and not need the slow ROM at all once you get going. The interrupt vector registers could then be changed by software if necessary.
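
The vector-register idea in the last paragraph can be sketched as a toy Python model. Everything concrete here is an assumption made for illustration: the class name `VectorRegisters`, the vector names "irq" and "nmi", and the addresses are not from any 65Org32 document.

```python
class VectorRegisters:
    """Toy model: interrupt vectors are copied from ROM into registers at reset."""

    def __init__(self, rom_vectors):
        self._rom = dict(rom_vectors)  # slow ROM, read only during reset
        self.regs = {}
        self.reset()

    def reset(self):
        # Reset sequence: load every vector register from ROM once.
        self.regs = dict(self._rom)

    def set_vector(self, name, address):
        # Software may repoint an existing vector later without touching ROM.
        if name not in self.regs:
            raise KeyError(name)
        self.regs[name] = address

    def dispatch(self, name):
        # Interrupt entry reads the register, so ROM is not accessed here.
        return self.regs[name]


cpu = VectorRegisters({"irq": 0xFFFF0000, "nmi": 0xFFFF0100})
cpu.set_vector("irq", 0x00002000)
print(hex(cpu.dispatch("irq")))
```

In this model a later reset restores the ROM defaults, which is one possible design choice; the post does not say what should happen to software-modified vectors on reset.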

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Aug 08, 2013 3:39 am 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
James_Parsons wrote:
I would probably add more general purpose registers


Well, more is usually better. But you should consider that microprocessors from the 70s had limited silicon space, and trade-offs were almost always at the forefront for the design teams. The 6502 team appeared to have a certain philosophy in mind when they made it with a single accumulator and stack pointer, and two index registers. Its addressing modes seemed rather contrived to some of the newcomers of the time, but it quickly became apparent that they were incredibly useful when pushed to their true potential. When I try to think about improving something, I think about how to make it more efficient without changing its basic look and feel too much. How much is too much is a matter of opinion, but I think that replacing the single accumulator with a general-purpose register bank crosses the line, and doesn't just improve the 6502, but makes it something that feels completely different.

Its older step-brother, the 6800, had wider pointer registers and two accumulators, but certain things that happen all the time, like moving an arbitrary range of memory from one spot to another, are inefficient because its indexing address mode is crippled by a hard-coded 8-bit offset and a missing Y register. Most 6800 programmers lived with the inefficiencies, or employed unsafe tricks with the system stack pointer. The X register was 16 bits, which is nice, but CPX only correctly recognized equality, and the code to save and restore X to/from the system stack was clumsy and inefficient. Other qualities that annoy me are the way that it clears carry when you load an accumulator, and the way that it requires you to TST an accumulator that you have just pulled. It does okay on cycle counts, but cannot match the more efficient little-endian and pipelined 6502. The 6809 addressed many (but not all) of these issues, and is quite a capable little unit IMO, discounting the 'tacked-on' nature of some of its features. I respect the 6800/6809, but don't consider myself to be a true fan of either.

Its stiffest competition, the Z-80, had more (and wider) registers and a much richer instruction set, but it was hampered by inefficient memory accesses and some legacy issues with the old 8080 instruction set. I cannot speak from first-hand experience, but I have read that some of the neat-looking features are not-so-neat when actually put to work. The use of the index registers bloated and slowed the code, and many of the addressing modes were not available at the right moment. The way that it updates (or doesn't update) the condition flags is different from the 6800's, but just as irksome to me. Many of the improvements from the 8080 to the Z-80 had a "tacked-on" feel as well. Although I think that it has a few neat (and unique) features, I do not consider myself to be a fan of the Z-80.

What it boils down to IMO is not how impressive it looks on paper, but how it can be efficiently employed to do something useful. There are a lot of expert 8080 and Z-80 programmers out there who can make their processor dance beautifully, but it takes a lot of experience to do it quickly and efficiently. Something non-trivial, like a BASIC interpreter, can be full of inefficiencies and compromises. The 6502 versions of the same BASIC interpreters came a bit later than those of the 6800 and 8080, but were not translated directly from the 6800 or 8080; they were re-written from scratch to do the same thing, but by someone who clearly knew how to make the 6502 dance. At the hands of a true artist, any of these little machines can be made to do impressive things, but most of us here believe that, at its very best, even with its little quirks and limitations, the 6502 actually WAS the best, all things considered. And many performance benchmarks support this belief.

Okay, back to my original point. My attempt to improve on it starts out as another case of how it looks on paper. With all of the available op-code space, I could have included hundreds of registers, but it wouldn't have had that accumulator-centric look and feel that many of us here have grown to know and love. To help me get away from the static specs and decide how useful it is, I have attempted to rewrite code written for the 6502, to see how natural, easy, and efficient it turns out to be. Whether or not it could be done cheaply and efficiently in actual silicon is another matter, one which I am unfortunately not qualified to address at this time.

Take care,

Mike

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)

