PostPosted: Wed Aug 07, 2013 6:26 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10799
Location: England
Great - thanks!


PostPosted: Wed Aug 07, 2013 6:57 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California
I tend to agree, Ed. I was on the edge of thinking this is a little too different to call it a 65Org32, even with suffixes. He definitely has some good ideas, though.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Wed Aug 07, 2013 7:58 pm 

Joined: Wed Jul 10, 2013 3:13 pm
Posts: 67
I would probably add more general purpose registers

_________________
JMP $FFD2


PostPosted: Wed Aug 07, 2013 8:52 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California
James_Parsons wrote:
I would probably add more general purpose registers

This is a popular thing to do in more-complex processors. The 6502 is often seen by others as not having enough registers; yet in a way, all of zero page is registers. But BigEd observed,
Quote:
With 6502, I suspect more than one beginner has wondered why they can't do arithmetic or logic operations on X or Y, or struggled to remember which addressing modes use which of the two. And then the intermediate 6502 programmer will be loading and saving X and Y while the expert always seems to have the right values already in place.

Interestingly, even 30 years ago a Z80 had to have a clock speed of 4MHz to keep up with a 1MHz 6502; and Jack Crenshaw, an embedded-systems engineer who wrote regularly in Embedded Systems Programming magazine, said in the 9/98 issue that he still couldn't figure out why, benchmark after benchmark, the 6502 could outperform the Z80, which had more and bigger registers, a seemingly more powerful instruction set, and ran at higher clock rates.

The 65816 also outperforms the 68000 in the Sieve benchmark, even though the 68000 has eight 32-bit general-purpose data registers (D0-D7), and eight address registers (A0-A7). Also, the 68000's interrupt response speed is terrible compared to the 65 family's. (That's not to say the 68000 isn't a nice processor. It is.)

After having worked with the 65 family for so many years, I have to conclude there's very little I could add to the 65816 outside of widening everything to 32 bits for reasons given early in this topic. I have mentioned another register or two further up that I would like, and another one that comes to mind is that vectors could be loaded from ROM into registers during the reset sequence to further improve interrupt performance and not need the slow ROM at all once you get going. The interrupt vector registers could then be changed by software if necessary.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Aug 08, 2013 3:39 am 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1928
Location: Sacramento, CA, USA
James_Parsons wrote:
I would probably add more general purpose registers


Well, more is usually better. But you should consider that microprocessors from the 70s had limited silicon space, and trade-offs were almost always at the forefront for the design teams. The 6502 team appeared to have a certain philosophy in mind when they made it with a single accumulator and stack pointer, and two index registers. Its addressing modes seemed rather contrived to some of the newcomers of the time, but it quickly became apparent that they were incredibly useful when pushed to their true potential. When I try to think about improving something, I think about how to make it more efficient without changing its basic look and feel too much. How much is too much is a matter of opinion, but I think that replacing the single accumulator with a general-purpose register bank crosses the line, and doesn't just improve the 6502, but makes it something that feels completely different.

Its older step-brother, the 6800, had wider pointer registers and two accumulators, but certain things that happen all the time, like moving an arbitrary range of memory from one spot to another, are inefficient because its indexing address mode is crippled by a hard-coded 8-bit offset and a missing Y register. Most 6800 programmers lived with the inefficiencies, or employed unsafe tricks with the system stack pointer. The X register was 16 bits, which is nice, but CPX only correctly recognized equality, and the code to save and restore X to/from the system stack was clumsy and inefficient. Other qualities that annoy me are the way that it clears carry when you load an accumulator, and the way that it requires you to TST an accumulator that you have just pulled. It does okay on cycle counts, but cannot match the more efficient little-endian and pipelined 6502. The 6809 addressed many (but not all) of these issues, and is quite a capable little unit IMO, discounting the 'tacked-on' nature of some of its features. I respect the 6800/6809, but don't consider myself to be a true fan of either.

Its stiffest competition, the Z-80, had more (and wider) registers and a much richer instruction set, but it was hampered by inefficient memory accesses and some legacy issues with the old 8080 instruction set. I cannot speak from first-hand experience, but I have read that some of the neat-looking features are not-so-neat when actually put to work. The use of the index registers bloated and slowed the code, and many of the addressing modes were not available at the right moment. The way that it updates (or doesn't update) the condition flags is different from the 6800's, but just as irksome to me. Many of the improvements from the 8080 to the Z-80 had a "tacked-on" feel as well. Although I think that it has a few neat (and unique) features, I do not consider myself to be a fan of the Z-80.

What it boils down to IMO is not how impressive it looks on paper, but how it can be efficiently employed to do something useful. There are a lot of expert 8080 and Z-80 programmers out there that can make their processor dance beautifully, but it takes a lot of experience to do it quickly and efficiently. Something non-trivial, like a BASIC interpreter, can be full of inefficiencies and compromises. The 6502 versions of the same BASIC interpreters came a bit later than those of the 6800 and 8080, but were not translated directly from the 6800 or 8080; they were re-written from scratch to do the same thing, but by someone who clearly knew how to make the 6502 dance. At the hands of a true artist, any of these little machines can be made to do impressive things, but most of us here believe that, at its very best, even with its little quirks and limitations, the 6502 actually WAS the best, all things considered. And many performance benchmarks support this belief.

Okay, back to my original point. My attempt to improve on it starts out as another case of how it looks on paper. With all of the available op-code space, I could have included hundreds of registers, but it wouldn't have had that accumulator-centric look and feel that many of us here have grown to know and love. To help me get away from the static specs and decide how useful it is, I have attempted to rewrite code written for the 6502, to see how natural, easy, and efficient it turns out to be. Whether or not it could be done cheaply and efficiently in actual silicon is another matter, one which I am unfortunately not qualified to address at this time.

Take care,

Mike

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


PostPosted: Thu Aug 08, 2013 7:06 am 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1928
Location: Sacramento, CA, USA
GARTHWILSON wrote:
My 32-bit 6502 DO...LOOP and associated words are at http://wilsonminesco.com/Forth/32DOLOOP.FTH and you can see how many, many instructions it takes to do it on the 6502. It will be fun to see that shortened to just a few instructions.


Okay Garth, I translated your 2?do and 2do to 65m32. It looks like the 65c02 needs 52 instructions in 88 bytes to do it. The 65m32 needs ... [drum roll] ... 10 instructions in 10 words to do what appears to be the same thing (sorry for the missing binary and comments, it's late and I'm tired):

Code:
        ?do:
:41040000       lda  ,x         
:c1040001       cmp  1,x        ; (dTOS == d2OS)?
:5c6e0002       bne  do         ; no:  execute do
:4d060000       ldy  ,y         ; yes: transfer end-of-loop address to IP,
:5c0e0003       bra  pop2       ;     pop2 from dstack and proceed.
        do:
:de060000       psh  ,y+        ; push end-of-loop address on rstack & advance IP,
:dd040001       psh  1,x        ; push d2OS (limit) on rstack,
:dd040000       psh  ,x         ; push dTOS (index) on rstack,
                                ; and fall through to pop2
        pop2:
:48040002       ldx  #2,x       ; discard top 2 dstack items
:5e060000       jmp  (,y+)      ; proceed (NEXT).


This translation is (obviously) untested, but I think that it's correct. It shouldn't be hard to imagine that 2loop would display similar benefits.
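For readers not yet used to the 65m32 notation, the behaviour described by the comments in the ?do/do listing can be paraphrased in Python. This is only a rough sketch of the semantics, not a simulation of the processor: the function name q_do, the Python lists standing in for the data and return stacks, and the `code` list standing in for compiled threaded code are all assumptions made for illustration.

```python
def q_do(dstack, rstack, code, ip):
    """Paraphrase of ?do falling into do and pop2 (hypothetical helper).

    dstack[-1] is dTOS (the index), dstack[-2] is d2OS (the limit).
    code[ip] holds the end-of-loop address compiled right after ?do.
    Returns the IP from which NEXT would continue.
    """
    index, limit = dstack[-1], dstack[-2]
    if index == limit:
        # ldy ,y : load the end-of-loop address into IP, skipping the loop
        ip = code[ip]
    else:
        # do: psh ,y+ / psh 1,x / psh ,x
        rstack.append(code[ip])   # end-of-loop address
        ip += 1                   # advance IP past that address
        rstack.append(limit)      # d2OS
        rstack.append(index)      # dTOS
    # pop2: ldx #2,x discards the top two data-stack items
    del dstack[-2:]
    return ip
```

With a limit of 10 and an index of 0 the loop parameters end up on the return stack; with equal limit and index the return stack is untouched and IP jumps to the end-of-loop address.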

Did I do good, boss??

Mike

[Edit: I think that maybe pop4 should be pop2 instead, so I changed it above.]

[Edit: After replacing pop4 with pop2, it looked like I could use auto-increment to get rid of pop2 altogether, saving a word of code, but decided against it, since it would place the dstack in a temporary unsafe condition (for a few machine instructions), and a FORTH ISR that triggered at the wrong moment might corrupt TOS.]

[Edit: Added some machine hex-code and comments.]

[EDIT 2013.10.07]: jmp (,y+) is not a proper ITC NEXT, since IP points to the CFA, not the actual machine code. :oops: It needs to be replaced with:
Code:
NEXT    (2 instructions, 2 machine words, 5 cycles)
        ldu  ,y+        ; W = (IP) , IP += 1   
        jmp  (,u+)      ; execute code @ (W) , W += 1
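
The double indirection in that two-instruction NEXT can be written out in Python. This is a sketch only: memory is modelled as a word-addressed list, and the function name next_itc and its return tuple are assumptions, not part of the 65m32 definition.

```python
def next_itc(mem, ip):
    """Model of an indirect-threaded NEXT (hypothetical helper).

    mem is a word-addressed memory; ip points at a cell holding a CFA.
    Returns (pc, w, ip): where machine code resumes, the updated W,
    and the updated IP.
    """
    w = mem[ip]      # ldu ,y+   : W = (IP)
    ip += 1          #             IP += 1
    pc = mem[w]      # jmp (,u+) : jump to the code address stored at W
    w += 1           #             W += 1, now pointing past the code field
    return pc, w, ip
```

The point of the correction is visible here: IP leads to a CFA, and only the word stored at that CFA is the address of machine code.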

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Last edited by barrym95838 on Tue Oct 08, 2013 5:44 am, edited 8 times in total.

PostPosted: Thu Aug 08, 2013 7:57 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California
Very nice! :D That's partly what I'm after! 2do has 1/10th as many instructions. Hopefully it would be fewer clocks per instruction too, making it conceivably maybe 20 times as fast at a given clock speed. If it could run at even 20MHz, that would allow normal memory and off-the-shelf I/O ICs and still be several times as fast as a 100MHz FPGA 65c02 when constantly having to deal with the wider numbers. The names won't need the "2" as in "2do" to get 32 bits with the 65m32 though, as it's naturally 32-bit anyway. 2+loop, the internal compiled by 2+LOOP, has 59 instructions by my count. It would be fun to see what that comes down to.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Aug 08, 2013 4:41 pm 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1928
Location: Sacramento, CA, USA
GARTHWILSON wrote:
2+loop, the internal compiled by 2+LOOP, has 59 instructions by my count. It would be fun to see what that comes down to.


I'll get on it as soon as possible. Regarding my unsafe dstack comment above, it looks like your '802 Zbranch does the same thing. Correct me if I'm wrong, but if a FORTH ISR sneaks in between the INX_INX and the LDA $fffe,x then a nasty little bug pops up, and it might be very hard to trace down, due to its intermittent nature.

Mike

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


PostPosted: Thu Aug 08, 2013 6:53 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California
barrym95838 wrote:
Regarding my unsafe dstack comment above, it looks like your '802 Zbranch does the same thing. Correct me if I'm wrong, but if a FORTH ISR sneaks in between the INX_INX and the LDA $fffe,x then a nasty little bug pops up, and it might be very hard to trace down, due to its intermittent nature.

The zero-overhead Forth interrupt support is atomic in that although the interrupt is recorded when it hits, the currently executing primitive is allowed to finish before the ISR is invoked, just as a machine-language R-M-W instruction (or any other type of instruction) is allowed to finish so an ISR doesn't foul things up. The interrupt is recorded (without using X) so that next time NEXT runs, it routes to the ISR instead of to the next primitive in the normal program flow. A cool thing is that it actually takes less time to route to the ISR! :D And since the same stacks are used and there's nothing analogous to processor status and registers to save, there's also nothing analogous to the 7-clock interrupt sequence required in machine-language interrupts. It moves right into the ISR without any such sequence, and without any of the register-saving and other setups required by other methods of high-level-language interrupt service. The article is at http://6502.org/tutorials/zero_overhead ... rupts.html.

Edit: The comment above about NEXT routing things differently is how I did it in the 6502 Forth, and what I described in the article. On the '816, it actually makes a different (and shorter) version of NEXT run, because the address of the desired version of NEXT is in a direct-page variable for an indirect jump. The interrupt version of NEXT would only be three instructions long except that it has to restore that pointer to the normal version of NEXT for the next time, adding an extra LDA, STA.
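
The idea of recording the interrupt and letting NEXT do the routing can be illustrated with a toy inner interpreter in Python. This is a sketch of the concept only, not the code from the article: the class name VM, the irq_pending flag, and the representation of primitives as Python callables are all assumptions chosen for illustration.

```python
class VM:
    """Toy threaded-code interpreter with interrupts deferred to NEXT."""

    def __init__(self, program, isr):
        self.program = program      # list of primitives: callables taking the VM
        self.isr = isr              # high-level interrupt service routine
        self.ip = 0
        self.irq_pending = False
        self.log = []

    def raise_irq(self):
        # The interrupt only gets recorded here; nothing is pre-empted.
        self.irq_pending = True

    def next(self):
        # NEXT is the only place where the pending flag is examined.
        if self.irq_pending:
            self.irq_pending = False
            return self.isr
        prim = self.program[self.ip]
        self.ip += 1
        return prim

    def run(self):
        while self.ip < len(self.program) or self.irq_pending:
            self.next()(self)
```

Because a primitive always runs to completion before NEXT is reached, an ISR built this way never sees the data stack in a half-updated state, which is the property under discussion.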

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Aug 08, 2013 7:03 pm 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1928
Location: Sacramento, CA, USA
GARTHWILSON wrote:
The zero-overhead Forth interrupt support is atomic in that although the interrupt is recorded when it hits, the currently executing primitive is allowed to finish before the ISR is invoked, just as a machine-language R-M-W instruction (or any other type of instruction) is allowed to finish so an ISR doesn't foul things up. The interrupt is recorded (without using X) so that next time NEXT runs, it routes to the ISR instead of to the next primitive in the normal program flow.


Okay, if you put it that way, then my auto-increment modification would work:
Code:
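2?do:
        lda  ,x+        ; A = dTOS (index), popping it from dstack
        cmp  ,x+        ; (dTOS == d2OS)?  pops d2OS (limit) as well
        bne  2do        ; no:  execute 2do
        ldy  ,y         ; yes: transfer end-of-loop address to IP,
        jmp  (,y+)      ;     and proceed (NEXT).
2do:
        psh  ,y+        ; push end-of-loop address on rstack & advance IP,
        psh  -1,x       ; push old d2OS (limit) on rstack,
        psh  -2,x       ; push old dTOS (index) on rstack,
        jmp  (,y+)      ; proceed (NEXT).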


It still feels unsafe, somehow, but it's even shorter and quicker, and should work correctly as long as 2do isn't a branch target from somewhere besides 2?do.

Mike

BTW, there's an even shorter version possible, using conditional pushes, but it's not as clear what is going on, and doesn't look as clean as the above. I hesitate to share it.

[Edit: the original (and now commented) version is the only one of the three versions mentioned that preserves the correct functionality of do, so I have abandoned the other two optimizations, since they only concerned themselves with making ?do work, not do or pop2, which most certainly are valuable as their own entities.]

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Last edited by barrym95838 on Fri Aug 09, 2013 5:04 pm, edited 2 times in total.

PostPosted: Thu Aug 08, 2013 7:10 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California
I edited and added a little more to my last post above apparently while you were writing.

Until I and any others watching this get accustomed to your instructions and their notation, it would be good if you could comment them a lot to tell what each one does.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Aug 08, 2013 7:22 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California
Quote:
BTW, there's an even shorter version possible, using conditional pushes, but it's not as clear what is going on, and doesn't look as clean as the above. I hesitate to share it.

This would be for internal stuff that will be out of sight once it's working, but I always explain anything tricky in the comments anyway. I might need my own explanation later when I can't remember what I've done, and there have also been many times that I caught bugs while doing the profuse commenting, bugs that hadn't shown up yet even though the code initially worked. I would say to make it as efficient as possible, and explain it well in the comments. A nice thing about the DOS/ANSI [Edit: that should say IBM437] character set is that it lets you put smooth boxes and tables and diagrams in comments in a source-code text file.

It reminds me of the early days of programmable calculators, when memory was never enough. Since they weren't used for any realtime stuff, you could usually add calculation time if it let you save memory. They used to have contests to see who could write the shortest program to do something. Right about when you thought you had the shortest possible entry, in comes someone with a program only half as long, using a strange trick, and by that time you can't even figure out why it works at all!

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Aug 08, 2013 7:23 pm 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1928
Location: Sacramento, CA, USA
GARTHWILSON wrote:
Until I and any others watching this get accustomed to your instructions and their notation, it would be good if you could comment them a lot to tell what each one does.


Sorry about that, boss. I got too excited, and pulled the trigger a little early on those examples. I'm at work right now, but I'll go back tonight and add some (hopefully useful) comments.

Mike

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


PostPosted: Fri Aug 09, 2013 2:21 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3353
Location: Ontario, Canada
Hi, Mike. Nice proposal! :D Sorry for not speaking up sooner, but I've been on the bounce lately, and can't devote as much attention as I would like.
Quote:
I'll go back tonight and add some (hopefully useful) comments
IMO, what's as (or more) important than commenting the code is being more complete and orderly regarding the initial spec. For example:
Code:
Instruction bit format:  oooo ooaa ffff rrri iiii iiii iiii iiii

15 bits specify the operation, and 17 bits provide an 'inherent' constant that can be used to
  encode -65536 to 65535 without using a second word.

Addressing modes:

rrr  =     0       1       2       3       4       5       6       7
aa   =0   #,a     #,b     #,x     #,y     #,z     #,u     #,s     #,n
     =1   $,a     $,b     $,x     $,y     $,z     $,u     $,s     $,n
     =2   $,a+    $,b+    $,x+    $,y+    $,z+    $,u+    $,s+    $,n+
     =3   $,-a    $,-b    $,-x    $,-y    $,-z    $,-u    $,-s    $,-n

There are eight registers, plus p.  n is PC.  z is permanently hardwired to zero, meaning that
  '$,z' '$,z+' and '$,-z' are all equivalent.  # and $ come from ...i iiii iiii iiii iiii (the
  17-bit twos-complement number is extended to a full 32 bits before the operand calculation).

  • it would be good to explicitly name the eight registers
  • what is special about p ?
  • it would be good to explicitly state the nature of o, a, f, r, and i in the Instruction bit format (perhaps also listing the possible encodings)
  • what is the difference between # and $ ?

Okay, it's true that r and a are already reasonably clear; it's just that the comments regarding the bit format seem somewhat haphazard. Also, it's possible to infer what the eight registers are; and maybe the reader could answer other questions by poring over the opcode matrix. But my suggestion (offered on a constructive basis, of course) is that this doc can be improved, making the project easier to grasp and motivating folks to join the discussion. (BTW, I'm not suggesting that creating good doc is easy -- it isn't!! :| )
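
As an aid to reading the bit format quoted above, the field boundaries can be spelled out as a small Python decoder. This is a sketch under stated assumptions: the field widths (6, 2, 4, 3 and 17 bits) are read directly off the format line, the register order comes from the addressing-mode table, and the function name decode is made up here. The excerpt does not say what o and f mean, so the sketch only extracts them.

```python
REGS = "abxyzusn"   # rrr = 0..7, in the column order of the addressing-mode table

def decode(word):
    """Split a 32-bit word laid out as oooo ooaa ffff rrri iiii iiii iiii iiii."""
    return {
        "o": (word >> 26) & 0x3F,        # 6 bits
        "a": (word >> 24) & 0x3,         # 2 bits: row of the addressing-mode table
        "f": (word >> 20) & 0xF,         # 4 bits
        "r": REGS[(word >> 17) & 0x7],   # 3 bits: register name
        "i": word & 0x1FFFF,             # 17 bits, not yet sign-extended
    }
```

Applied to the hex words posted earlier in the thread, c1040001 (cmp 1,x) yields a=1, r='x', i=1, and de060000 (psh ,y+) yields a=2, r='y', i=0, which matches the mnemonics listed next to them.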

cheers,
Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Fri Aug 09, 2013 2:46 pm 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1928
Location: Sacramento, CA, USA
Thanks for the friendly encouragement, Jeff!

The complete, original specification document is sadly not ready for public view. I enclosed excerpts from it as a teaser, in the hopes of finding out if there was enough potential interest out there to motivate me into finishing it. You're right though ... the more that I look at my original post, the more that I can understand what you mean about "complete and orderly". I'm already an experienced 65m32 coder, so it was easy for me to go off half-cocked in my excitement. I'll try to rearrange and complete it for upload later (but soon).

Let me answer one of your points right now. The registers are all 32 bits wide: a, b, x, y, z, u, s, n, and p.
a is the accumulator.
b, x, y, and u are for indexing and temp storage.
z is permanently hard-wired to 0.
s is the system stack pointer, and its usage is implied in any description involving 'push' or 'pull', unless stated otherwise.
n is the system instruction pointer.
p is the processor status register.

I have to get to work, so I'll have to leave it at that for now. More later.

Mike

[Edit: I have more work to do on the original document before I'm comfortable about posting it in its entirety, but I can answer a few more of Jeff's points here while progress ... er ... progresses.]

What I believe is the true key to the 65m32's efficiency and ease of use is NOT its instruction repertoire, which is rather ordinary with few exceptions, but its flexible operand structure. Once one fully understands how this structure works, programming with it becomes natural and simple (at least for me). The way that it works is as follows:

ANY register except for the processor status register can be used as an index, including the accumulator and the instruction pointer.

There are two families of operand modes, immediate and absolute. The immediate mode is indicated by a leading # in the operand field, and means that the operand value is to be used at 'face-value'. The immediate value isn't just a static entity, though, because it is (with few exceptions) added to the contents of the specified index register (identified with a leading comma) before use. #1,x is always equal to the contents of register x, plus 1. To make the assembly language easier to type, I have specified that either the numeric part or the register name (but not both) can be omitted. A missing numeric is assumed to be 0, and a missing register name is assumed to be ,z.

There are three absolute modes; they are indicated by the absence of a leading # in the operand field, and always imply an additional memory access (read, write, or read-modify-write). This is because the operand value (which is calculated in the same manner as it is for immediate mode) is used as a pointer to main memory. Automatic post-increment and pre-decrement options for the indicated index register should be self-explanatory.
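
The operand rules described above can be condensed into a short Python sketch. It is an illustration under assumptions, not the specification: it assumes word addressing with a step of 1 for the + and - options, that pre-decrement happens before the address is formed and post-increment after the access, and it ignores the "few exceptions" mentioned for immediate mode. The function name operand and the dict-based register file and memory are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def operand(aa, reg, imm, regs, mem):
    """Evaluate an operand for mode aa (0..3) with index register `reg`.

    regs: dict of register name -> value (updated by the + and - modes)
    mem:  dict of word address -> value
    imm:  the numeric part of the operand, already sign-extended
    """
    def read(r):
        return 0 if r == "z" else regs[r]    # z always reads as zero

    def write(r, value):
        if r != "z":                          # writes to z are discarded
            regs[r] = value & MASK32

    if aa == 0:                               # #imm,r : immediate, no memory access
        return (imm + read(reg)) & MASK32
    if aa == 3:                               # imm,-r : pre-decrement (assumed step 1)
        write(reg, read(reg) - 1)
    addr = (imm + read(reg)) & MASK32         # same sum, now used as a pointer
    value = mem[addr]
    if aa == 2:                               # imm,r+ : post-increment (assumed step 1)
        write(reg, read(reg) + 1)
    return value
```

For example, with x holding 100, `#1,x` evaluates to 101 without touching memory, while `1,x` evaluates to whatever word is stored at address 101.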

The 65m32 is 32 bits all the way, and technically EVERY instruction is a single word. Of course, most instructions require operand data to specify an immediate value or an address, and it is impossible to fit a 32-bit operand and an op-code into 32 bits.

One way that the 65m32 gets around the problem is by promoting an embedded 17-bit operand to 32 bits before using it, by duplicating bit 16 in bits 17 to 31 before adding it to the index. But that only works most of the time, depending on what you're doing with the operand. -65536 ... 65535 is a respectable range that can be used for small increments, constants and offsets, but doesn't enable the 65m32's full potential.
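
The promotion step is ordinary sign extension, and a few lines of Python are enough to check the quoted range. The function names below are made up for this sketch.

```python
def sign_extend17(field):
    """Promote a 17-bit two's-complement field to a 32-bit word."""
    field &= 0x1FFFF
    if field & 0x10000:          # bit 16 set: the value is negative
        field |= 0xFFFE0000      # duplicate bit 16 into bits 17..31
    return field

def as_signed32(word):
    """Interpret a 32-bit word as a signed Python integer."""
    return word - (1 << 32) if word & 0x80000000 else word
```

The largest positive field, 0x0FFFF, stays 65535, and the most negative one, 0x10000, becomes 0xFFFF0000, which is -65536 as a signed 32-bit value.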

The other way that the 65m32 gets around the problem is by treating the instruction pointer as just another index register. This allows in-lining a full 32-bit operand immediately after the instruction, and loading it using the instruction pointer in absolute addressing mode, with auto-increment (so the next op-code after the operand is executed next). The PDP-11 does this, and I think that it's quite elegant. When composing small (<64kW) programs, this technique is typically only needed for large constants, like bit-masks and such, since the inherent 17-bit constant provides plenty of reach for relative branch targets, increments, initializations, and more. While translating FigForth from 6502 to 65m32, I have so far only found two occasions in hundreds of instructions where this 'long-immediate' technique is necessary, and they were only necessary because of the four-char-per-word dictionary name storage convention that I've implemented.
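
The 'long-immediate' technique amounts to reading through the instruction pointer with post-increment. A minimal Python sketch follows, assuming that n has already advanced past the op-code word by the time the operand is evaluated (as on the PDP-11) and that memory is word-addressed; the function name is hypothetical.

```python
def fetch_long_immediate(mem, n):
    """Read an in-line 32-bit operand through the instruction pointer.

    n points just past the current instruction, i.e. at the in-line operand.
    Returns (value, n), where the new n points at the next op-code.
    """
    value = mem[n]   # absolute mode through n: read the word after the op-code
    n += 1           # auto-increment: skip the operand so it is never executed
    return value, n
```

So for a memory image of [op-code, 0xDEADBEEF, next op-code], evaluating the operand with n = 1 returns the full 32-bit constant and leaves n = 2.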

More about my instruction encoding later ...

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Last edited by barrym95838 on Sat Aug 10, 2013 5:12 am, edited 3 times in total.
