All times are UTC




Posted: Thu Jun 27, 2013 4:28 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
dclxvi wrote:
GARTHWILSON wrote:
So are you suggesting/suspecting that having another register to do this with might make for enough possible speed increase in the instruction decoding to justify getting rid of (ZP,X)?

It might be nice to have another register available, but it's possible that you don't actually need another register. Y usually winds up being a scratch register in Forth, at least for me, so it's okay if it gets clobbered.

That's true if the processor can handle entire addresses in one gulp like the '816 can, but the '02 needs Y in NEXT for ITC Forth, and many primitives require that Y start as 0, then they increment it to 1 before they're done, getting two bytes of an address as they progress.

Quote:
GARTHWILSON wrote:
SKIP is also in common Forth usage for skipping over leading delimiters (usually spaces) in a string, so it gets heavy use during compilation.

I can't recall having seen PERFORM before, but I tend to use @ EXECUTE in those situations rather than define another word, so maybe I actually have encountered it before and simply forgotten.

In my '816 Forth:
Code:
         HEADER "EXECUTE", NOT_IMMEDIATE  ; ( addr -- )
EXECUTE: PRIMITIVE
         LDA   0,X
 xeq1:   STA   W
         INX_INX
         JMP   W-1
 ;-------------------
         HEADER "PERFORM", NOT_IMMEDIATE  ; ( addr -- )
PERFORM: PRIMITIVE                        ; same as  @ EXECUTE
         LDA   (0,X)
         BRA   xeq1
 ;-------------------


Quote:
I've tended to have PARSE and PARSE-WORD (or WORD) be single long-ish definitions, and not factor them into smaller words, simply because I don't wind up needing or using the smaller words. Using smaller words does seem to be a more common approach, but I hadn't remembered that word being called SKIP.

But parsing is another good example where the maximum possible speed isn't absolutely necessary, though obviously you don't want it to be too slow. The whole point of having an inner interpreter is that you compile "once" (well, once when you load the code) and execute many times. It's more important that the "many times" part be fast. Forth gets its performance from the fact that the inner interpreter is a lot faster than parsing and dictionary lookup.

True, but the compilation on my '02 ITC Forth is awfully slow, which is why I was asking about dictionary hashing at viewtopic.php?f=9&t=555, and then found that although my search was kind of slow, the real bottleneck is, as I wrote there,
Quote:
FIND is a primitive, but WORD is a pretty long secondary, as are EXPECT , QUERY , and INTERPRET , which hands the found word off to COMPILER , which is also a secondary. At least WORD uses primitives SKIP and SCAN for the iterative processes, but EXPECT has a loop in high level that runs once per character received.


Quote:
GARTHWILSON wrote:
I did not find anyone else's usage of ANDing or ORing bits of a byte, so I made up my own names. I use these a lot in I/O.

I/O locations tend to be at known (at compile (or load) time) locations and thus potentially subject to the optimizations above. I'm hard pressed to come up with any examples where an I/O address would actually change at run time.

The idea is to be able to use the same word with, in my case, the addresses of 60 different registers in the I/O ICs; so you want to be able to change a bit for example in the data-direction register of port B of VIA #3. It makes sense to take in the register address as one of the input parameters.

Quote:
GARTHWILSON wrote:
it doesn't have to INX INX and later DEX DEX again for each cell

This is one of the reasons why keeping TOS in the accumulator is tempting. It can get in the way with DTC/ITC since many NEXT implementations use the accumulator, but in STC, since next is RTS and nest and unnest are JSR and RTS, the accumulator is (more) available. There are a few simple benefits, e.g. 2* is ASL instead of ASL 0,X (both smaller and faster), and there are variations on the ! and @ optimization theme above, e.g. 1234 XOR can be optimized to a single EOR #1234 instruction. It's easy to get carried away daydreaming about optimizations. :)

That would bring a benefit in some cases, but I wonder if it would be enough to matter. The existing instruction set still doesn't let you do @ for example, LDA(A). In most cases, taking on a new cell still requires moving one to memory, but it will be the one in the accumulator; i.e., you have to first move the one in the accumulator to memory before you can put the new TOS in A. When you drop a cell, you then have to go to memory to re-load A with NOS which now becomes TOS. In both cases the stack pointer needs to be changed too.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Sun Jun 30, 2013 8:53 pm

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
GARTHWILSON wrote:
That's true if the processor can handle entire addresses in one gulp like the '816 can,


Unless I've forgotten one, all of the extended designs (implemented or not) either have 16-bit registers (e.g. 65org16) or data width equal to the address width (e.g. 65org32). The 8-bit data designs all are intended to mimic the 6502 to varying degrees, and if they omit anything it's likely to be decimal mode; certainly not (zp,X). So the 6502 concerns about Y aren't really relevant to a (theoretical) processor without (zp,X).

For example, I implemented 65org16 eForth like a 65C816 Forth. "Everything" lives in 64k (this is almost always plenty of space), and I added L! L@ LC! and LC@ to accommodate any need to access memory beyond 64k (the eForth kernel itself doesn't use these words).

GARTHWILSON wrote:
It makes sense to take in the register address as one of the input parameters.


It does, but the point is WHAT acts on the input parameters and WHEN. Usually, by the time the compiler encounters the C@ or C! the I/O register address will be known and fixed. Why not let the compiler optimize for this case instead of passing the input parameter along to the body of @ which executes ("many times") at run-time? Bind early! It doesn't matter how many I/O registers there are.
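
A minimal sketch of the "bind early" idea, written as a toy model in Python rather than as real Forth compiler code. The constant name VIA3_DDRB, its address $6022, and the op names LIT and CFETCH_ABS are all assumptions made up for the illustration. The compile step folds a known constant followed by C@ into one runtime op with the address built in, and falls back to the generic C@ only when the address really does arrive on the stack at run time.

```python
# Hypothetical I/O register constant; the name and address are illustrative only.
CONSTANTS = {"VIA3_DDRB": 0x6022}


def compile_words(words):
    """Translate source words to runtime ops, folding '<constant> C@' into one op."""
    ops = []
    i = 0
    while i < len(words):
        word = words[i]
        following = words[i + 1] if i + 1 < len(words) else None
        if word in CONSTANTS and following == "C@":
            # Address known at compile time: bind it now, no stack traffic later.
            ops.append(("CFETCH_ABS", CONSTANTS[word]))
            i += 2
        elif word in CONSTANTS:
            ops.append(("LIT", CONSTANTS[word]))
            i += 1
        else:
            ops.append((word, None))
            i += 1
    return ops


def execute(ops, memory):
    """Run the ops against a dict that stands in for the address space."""
    stack = []
    for op, arg in ops:
        if op == "LIT":
            stack.append(arg)
        elif op == "CFETCH_ABS":
            stack.append(memory[arg] & 0xFF)      # like LDA abs
        elif op == "C@":
            stack.append(memory[stack.pop()] & 0xFF)  # address taken from the stack
        elif op == "DUP":
            stack.append(stack[-1])
        else:
            raise ValueError(f"unknown op {op!r}")
    return stack


memory = {0x6022: 0x0F}
print(compile_words(["VIA3_DDRB", "C@"]))
print(execute(compile_words(["VIA3_DDRB", "C@"]), memory))
```

The number of registers does not change anything here: each use site gets its own folded op, which is the sense in which it doesn't matter how many I/O registers there are.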

GARTHWILSON wrote:
The existing instruction set still doesn't let you do @ for example, LDA(A).


Not in one instruction, but you can do it in two:

Code:
   TAY
   LDA 0,Y


That's actually a faster (and the same size) FETCH than this:

Code:
   LDA (0,X)
   STA 0,X


GARTHWILSON wrote:
In most cases, taking on a new cell still requires moving one to memory, but it will be the one in the accumulator; i.e., you have to first move the one in the accumulator to memory before you can put the new TOS in A. When you drop a cell, you then have to go to memory to re-load A with NOS which now becomes TOS. In both cases the stack pointer needs to be changed too.


Sure, in some instances it's the same instructions, just in different order. For example, the literal 1234:

Code:
; TOS in accumulator
;
   DEX
   DEX
   STA 0,X   ; push TOS
   LDA #1234 ; load literal into TOS


compared to:

Code:
; TOS in 0,X
;
   LDA #1234 ; put literal in accumulator
   DEX
   DEX
   STA 0,X   ; push the literal


so while there's no improvement from using TOS in the accumulator, there is no penalty either. To me, the question is where is it worse?

DROP with TOS in the accumulator has an additional zp,X instruction (an LDA 0,X to reload the accumulator with NOS). However, DUP has one less zp,X instruction (there is no LDA 0,X before the push). It depends on how many DUPs vs. DROPs there are; if they are equally common then it all balances out and there's no difference in speed.

With TOS in the accumulator, a pair of INXs is NIP instead of DROP. And having written and used several TOS-in-a-register Forths, I've found that I can arrange to use NIP instead of DROP in many cases. For example, a sequence like IF DROP FALSE EXIT THEN becomes IF FALSE NIP EXIT THEN (and in fact you can optimize FALSE NIP as a single LDA #0 instruction, with no INXs or DEXs -- in fact, since this is just another variation of the @ optimization, you can apply this to any literal or constant, e.g. 1234 NIP becomes LDA #1234). In that case (even without the parenthetical optimization) you get the benefit of a speed increase in DUP without a speed penalty in DROP. It's also possible to arrange to use NIP NIP instead of 2DROP sometimes.
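
The literal-then-NIP folding can be sketched as a small peephole pass. This is a toy model in Python, not code from any actual Forth; the op names LIT, NIP and SET_TOS are assumptions for the illustration, with SET_TOS standing for the single LDA #n that overwrites TOS and leaves the stack pointer alone.

```python
def peephole(ops):
    """Fold every ('LIT', n) followed by ('NIP', None) into ('SET_TOS', n)."""
    out = []
    for op in ops:
        if op == ("NIP", None) and out and out[-1][0] == "LIT":
            # ( a b -- a n ) needs no push or pop, just a new TOS value.
            out[-1] = ("SET_TOS", out[-1][1])
        else:
            out.append(op)
    return out


def run(ops, stack):
    """Reference semantics on a plain list (top of stack at the end)."""
    stack = list(stack)
    for name, arg in ops:
        if name == "LIT":
            stack.append(arg)
        elif name == "NIP":
            del stack[-2]
        elif name == "SET_TOS":
            stack[-1] = arg
        else:
            raise ValueError(f"unknown op {name!r}")
    return stack


original = [("LIT", 0), ("NIP", None)]      # FALSE NIP
optimized = peephole(original)
print(optimized, run(original, [7, 9]), run(optimized, [7, 9]))
```

Both sequences leave the same stack, so the fold is safe whenever the literal and the NIP are adjacent in the compiled stream.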

There are a few words that take an extra instruction, e.g. DROP, 2DROP and ZBRANCH (compiled by IF), but there are a lot more words that have fewer instructions and/or are faster, e.g. 1+ 1- 2/ CELL+ (same concept as the 2* ASL). SWAP is:

Code:
   LDY 0,X
   STA 0,X
   TYA


TYA replaces two zp,X instructions. Likewise for ROT. TUCK is similar. Non-optimized XOR is:

Code:
   EOR 0,X
   INX
   INX


which has two fewer zp,X instructions. Likewise for + - AND and OR.
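
The trade-offs above can be checked with a toy model that counts zp,X loads and stores per word. This is a sketch in Python under simplifying assumptions: cells are plain integers, only a handful of words are modelled, and the access counts per word are taken from the instruction sequences discussed above (the TOS-in-memory counts for NIP, SWAP and XOR are inferred from them).

```python
class MemTos:
    """Every cell, including TOS, lives in memory at 0,X and up."""

    def __init__(self):
        self.cells, self.accesses = [], 0

    def lit(self, v):  self.cells.append(v); self.accesses += 1          # STA
    def dup(self):     self.cells.append(self.cells[-1]); self.accesses += 2  # LDA, STA
    def drop(self):    self.cells.pop()                                  # INX INX only
    def nip(self):     top = self.cells.pop(); self.cells[-1] = top; self.accesses += 2
    def swap(self):
        self.cells[-1], self.cells[-2] = self.cells[-2], self.cells[-1]
        self.accesses += 4                                               # 2 loads, 2 stores
    def xor(self):     top = self.cells.pop(); self.cells[-1] ^= top; self.accesses += 3
    def items(self):   return list(self.cells)


class RegTos:
    """TOS is cached in the accumulator; NOS and below live in memory."""

    def __init__(self):
        self.a, self.cells, self.accesses = 0, [], 0   # a starts as a dummy cell

    def lit(self, v):  self.cells.append(self.a); self.a = v; self.accesses += 1  # STA
    def dup(self):     self.cells.append(self.a); self.accesses += 1              # STA
    def drop(self):    self.a = self.cells.pop(); self.accesses += 1              # LDA
    def nip(self):     self.cells.pop()                                           # INX INX only
    def swap(self):
        self.a, self.cells[-1] = self.cells[-1], self.a
        self.accesses += 2                                                        # LDY, STA
    def xor(self):     self.a ^= self.cells.pop(); self.accesses += 1             # EOR
    def items(self):   return self.cells[1:] + [self.a]   # skip the dummy cell


def run(stack, program):
    for word, *args in program:
        getattr(stack, word)(*args)
    return stack


program = [("lit", 5), ("lit", 3), ("dup",), ("xor",), ("swap",), ("nip",), ("lit", 7), ("drop",)]
mem, reg = run(MemTos(), program), run(RegTos(), program)
print(mem.items(), mem.accesses, reg.items(), reg.accesses)
```

The totals depend entirely on the word mix of the program, which is the point made above about DUP versus DROP frequency.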

The other thing to consider is which words do you have to add TAY and TYA or STA N and LDA N (or PHA and PLA) because TOS is in the accumulator?

One such word is RP@ (assuming the return stack is the 65xx stack). TSX clobbers X which is no good, and TSC clobbers A. (This is one reason why I've said that TSY and TYS would be helpful for Forth. For words like R@ stack,S addressing can be helpful.) Fortunately, RP@ is typically only needed for words like ABORT QUIT CATCH and THROW and thus it's not important that it be absolute maximum speed. So for RP@, TAY and TYA don't hurt much, and I'd rather live with this and get the benefits elsewhere.

However, in the DTC 6-cycle (RTS) NEXT Forth, having TOS in the accumulator caused some slight complications for DOCOLON and EXIT (both have a pair of instructions to preserve the accumulator). And any extra instructions in those two words are a concern (not necessarily a showstopper, but a serious concern).

Of course, it's always wise to evaluate an approach to see if it really makes sense rather than just blindly applying it, but I like to use TOS-in-a-register as my starting point.


Posted: Tue Jul 02, 2013 9:08 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
You definitely have some good ideas there.

dclxvi wrote:
GARTHWILSON wrote:
It makes sense to take in the register address as one of the input parameters.

It does, but the point is WHAT acts on the input parameters and WHEN. Usually, by the time the compiler encounters the C@ or C! the I/O register address will be known and fixed. Why not let the compiler optimize for this case instead of passing the input parameter along to the body of @ which executes ("many times") at run-time? Bind early! It doesn't matter how many I/O registers there are.

Hmmm... Since you mention it, although I don't currently have any compiler optimization, what I could do is use a headerless primitive, or use the INLINE...END-INLINE we talked about at viewtopic.php?f=9&t=550 and do it in assembly and even use a macro. (For other readers, yes, Forth makes it easy to have full macro capability in a tiny (<4KB) assembler.) Then the address would become a constant and would get handled directly without even going on the stack. That could optionally be the case with the mask as well. I would have to compare the performance and memory to see if there's any benefit at all to using INLINE...END-INLINE for this. (Again for other readers: If a primitive needs its own local variable space, it can use the N area (also in ZP) and avoid the indexing associated with the data stack. What remains in N after a primitive is finished running is absolutely irrelevant; so there are no conflicts.)



Posted: Mon Nov 04, 2013 3:10 pm

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1927
Location: Sacramento, CA, USA
Awesome thread, guys ... [bump] ...

I have been test-coding various sources to determine whether or not my 65m32 design has enough functionality to be 'fun' to program, like the 65xx is (at least to me). In typical 'old-school' fashion, I decided that an ITC Forth would be a good initial practice target, not because I know how to use it, but because I know how to translate it (with one or two embarrassing exceptions). The 65m32 only allows one built-in level of indirection (what a 65xx programmer would simply call 'direct' or 'absolute'), but it also has a couple more registers. I was intrigued by this TOS in accumulator idea, and coded up some tests to see how it might go:
Code:
Keeping dTOS in register a seems like an overall 'win'
 for ITC Forth on the 65m32 ... an analysis of some
 figFORTH primitives follows.  The 65m32 approaches the
 instruction density of the pdp-11 version, and makes
 everything 32-bits.

Register a is an important temporary for 'dTOS in (x)'
Register a is dTOS and (x) is d2OS for 'dTOS in a'
Register b is an important temporary for 'dTOS in a'
Register x is the data SP
Register y is IP
Register u is W
Register s is the return SP

NEXT, BUMP, BRANCH and ;S are identical for either
 version.  The code headers are also identical, and are
 therefore not included in this analysis.  Machine word
 and memory cycle counts are shown in parentheses.

It is assumed that R-M-W instructions and successful
 branches take an extra cycle to complete.

NEXT    (2; 5)
        ldu  ,y+        ; W = (IP) , IP += 1   
        jmp  (,u+)      ; execute code @ (W) , W += 1

BUMP    (2; 8)
        iny             ; IP += 1
        bra  NEXT       ; continue

BRANCH  (2; 9)
        ady  ,y         ; IP += (IP)
        bra  NEXT       ; continue

;S      (2; 9)
        ply             ; pop IP from rstack
        bra  NEXT       ; continue

When the choice presented itself, code density was
 given higher priority than execution speed for both
 versions.

First, the 'ties': ------------------------------------

Before: dTOS in (x)             After:  dTOS in a
-------------------             -----------------

PERFORM (3; 7)                  PERFORM (3; 7)
        lda  ,x+                        ldu  ,a
        ldu  ,a                         lda  ,x+
        jmp  (,u+)                      jmp  (,u+)

>R      (2; 10)                 >R      (2; 10)
        psh  ,x+                        pda  ,x+
        bra  NEXT                       bra  NEXT

R       (3; 11)                 R       (3; 11)
        lda  ,s                         sta  ,-x
        sta  ,-x                        lda  ,s
        bra  NEXT                       bra  NEXT

R>      (3; 11)                 R>      (3; 11)
        pla                             sta  ,-x
        sta  ,-x                        pla 
        bra  NEXT                       bra  NEXT

LIT     (3; 11)                 LIT     (3; 11)
        lda  ,y+                        sta  ,-x
        sta  ,-x                        lda  ,y+
        bra  NEXT                       bra  NEXT

I       (3; 11)                 I       (3; 11)
        lda  ,s                         sta  ,-x
        sta  ,-x                        lda  ,s
        bra  NEXT                       bra  NEXT

LEAVE   (3; 11)                 LEAVE   (3; 11)
        lda  ,s                         ldb  ,s
        sta  1,s                        stb  1,s
        bra  NEXT                       bra  NEXT

OVER    (3; 11)                 OVER    (3; 11)
        lda  1,x                        sta  ,-x
        sta  ,-x                        lda  1,x
        bra  NEXT                       bra  NEXT


(DO)    (3; 13)                 (DO)    (3; 13)
        psh  1,x+                       psh  ,x+
        psh  -1,x+                      pda  ,x+
        bra  NEXT                       bra  NEXT

(LOOP)  (6; 18 or 19)           (LOOP)  (6; 18 or 19)
        inc  ,s                         inc  ,s
        lda  ,s                         ldb  ,s
        cmp  1,s                        cpb  1,s
        bmi  BRANCH                     bmi  BRANCH
        lds  #2,s                       lds  #2,s
        bra  BUMP                       bra  BUMP


The 'winners': ----------------------------------------

1+      (2; 10)                 1+      (2; 8)
        inc  ,x                         lda  #1,a
        bra  NEXT                       bra  NEXT

2+      (2; 15)                 2+      (2; 8)
        inc  ,x                         lda  #2,a
        bra  1+                         bra  NEXT

1-      (2; 10)                 1-      (2; 8)
        dec  ,x                         lda  #-1,a
        bra  NEXT                       bra  NEXT

2-      (2; 15)                 2-      (2; 8)
        dec  ,x                         lda  #-2,a
        bra  1-                         bra  NEXT

PUT     (2; 9)                  PUTB    (2; 8)
        sta  ,x                         tba 
        bra  NEXT                       bra  NEXT

DUP     (2; 13)                 DUP     (2; 9)
        lda  ,x                         sta  ,-x
        bra  PUSH                       bra  NEXT

-DUP    (3; 10 or 13)           -DUP    (3; 9 or 10)
        lda  ,x                         tst  #,a
        bne  PUSH                       sta  [ne],-x
        bra  NEXT                       bra  NEXT

SWAP    (3; 16)                 SWAP    (2; 10)
        lda  ,x                         exa  ,x
        exa  1,x                        bra  NEXT
        bra  PUT

+       (3; 15)                 +       (2; 9)
        lda  1,x                        add  ,x+
        add  ,x+                        bra  NEXT
        bra  PUT

NOT     (3; 14)                 NOT     (2; 8)
        lda  ,x                         cdd  #0
        cdd  #0                         bra  NEXT
        bra  PUT

MINUS   (3; 14)                 MINUS   (2; 8)
        lda  ,x                         cdd  #1
        cdd  #1                         bra  NEXT
        bra  PUT

-       (3; 15)                 -       (3; 10)
        lda  1,x                        sub  #1
        sub  ,x+                        cdd  ,x+
        bra  PUT                        bra  NEXT

U*      (3; 15 + ???)           U*      (2; 9 + ???)
        lda  1,x                        mul  ,x+
        mul  ,x+                        bra  NEXT
        bra  PUT

AND     (3; 15)                 AND     (2; 9)
        lda  ,x+                        and  ,x+
        and  ,x                         bra  NEXT
        bra  PUT

OR      (3; 15)                 OR      (2; 9)
        lda  ,x+                        ora  ,x+
        ora  ,x                         bra  NEXT
        bra  PUT

XOR     (3; 15)                 XOR     (2; 9)
        lda  ,x+                        eor  ,x+
        eor  ,x                         bra  NEXT
        bra  PUT

ABS     (3; 14)                 ABS     (3; 9)
        lda  ,x                         tst  #,a
        cdd  [mi],#1                    cdd  [mi],#1
        bra  PUT                        bra  NEXT

@       (3; 15)                 @       (2; 9)
        lda  ,x                         lda  ,a
        lda  ,a                         bra  NEXT
        bra  PUT

0=      (4; 15)                 0=      (3; 9)
        lda  ,x                         cdd  #1
        cdd  #1                         cdc  #1,a
        cdc  #1,a                       bra  NEXT
        bra  PUT

0<      (4; 15)                 0<      (3; 9)
        lda  ,x                         rol  #1,a
        rol  #1,a                       cdc  #1,a
        cdc  #1,a                       bra  NEXT
        bra  PUT

=       (3; 21)                 =       (2; 13)
        lda  1,x                        sub  ,x+
        sub  ,x+                        bra  0=
        bra  0=




ROT     (6; 21)                 ROT     (5; 15)
        lda  2,x                        tab 
        ldb  1,x                        lda  ,x
        stb  2,x                        exa  1,x
        ldb  ,x                         stb  ,x
        stb  1,x                        bra  NEXT
        bra  PUT


The 'losers': -----------------------------------------

DROP    (2; 8)                  DROP    (2; 9)
        inx                             lda  ,x+
        bra  NEXT                       bra  NEXT

PUSH    (2; 9)                  PUSHB   (2; 12)
        sta  ,-x                        sta  ,-x
        bra  NEXT                       bra  PUTB

SP@     (2; 9)                  SP@     (2; 15)
        stx  ,-x                        txb 
        bra  NEXT                       bra  PUSHB

EXECUTE (2; 5)                  EXECUTE (3; 6)
        ldu  ,x+                        tau 
        jmp  (,u+)                      lda  ,x+
                                        jmp  (,u+)

0BRANCH (3; 13)                 0BRANCH (5; 15)
        tst  ,x+                        tab 
        bne  BRANCH                     lda  ,x+
        bra  BUMP                       tst  #,b
                                        bne  BRANCH
                                        bra  BUMP

(+LOOP) (13; ~24)               (+LOOP) (15; ~27)
        lda  ,s                         tab 
        add  ,x                         add  ,s
        sta  ,s                         sta  ,s
        tst  ,x+                        lda  ,x+
        bpl  2$                         tst  #,b
        lda  1,s                        bpl  2$
        cmp  ,s                         ldb  1,s
        bmi  BRANCH                     cpb  ,s
        bra  3$                         bmi  BRANCH
2$      cmp  1,s                        bra  3$
        bmi  BRANCH             2$      ldb  ,s
3$      lds  #2,s                       cpb  1,s
        bra  BUMP                       bmi  BRANCH
                                3$      lds  #2,s
                                        bra  BUMP


The 'mixed bag': --------------------------------------

+!      (5; 15)                 +!      (4; 17)
        lda  ,x+                        ldb  ,a
        ldb  ,a                         adb  ,x+
        adb  ,x+                        stb  ,a
        stb  ,a                         bra  DROP
        bra  NEXT

!       (4; 13)                 !       (3; 15)
        lda  ,x+                        ldb  ,x+
        ldb  ,x+                        stb  ,a
        stb  ,a                         bra  DROP
        bra  NEXT

I'm sure that there is a coding error or two in there, but it seems like an overall win. I also find myself wondering whether or not ITC is even the best choice for the 65m32. My quest for a completed spec doc continues ... still wandering around a bit, waiting for that burst of inspiration!
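
The per-word figures in the listing can be tallied with a few lines of Python. The cycle counts below are copied from the listing; ties, helper entry points (PUT/PUTB, PUSH/PUSHB) and words whose count is a range or includes an unknown term (-DUP, U*, (LOOP), (+LOOP)) are left out. Giving every word equal weight is an assumption: a real program uses some words far more often than others, so this is only a rough sanity check.

```python
# word: (cycles with dTOS in (x), cycles with dTOS in a), copied from the listing
CYCLES = {
    "1+": (10, 8), "2+": (15, 8), "1-": (10, 8), "2-": (15, 8),
    "DUP": (13, 9), "SWAP": (16, 10), "+": (15, 9), "NOT": (14, 8),
    "MINUS": (14, 8), "-": (15, 10), "AND": (15, 9), "OR": (15, 9),
    "XOR": (15, 9), "ABS": (14, 9), "@": (15, 9), "0=": (15, 9),
    "0<": (15, 9), "=": (21, 13), "ROT": (21, 15),
    "DROP": (8, 9), "SP@": (9, 15), "EXECUTE": (5, 6), "0BRANCH": (13, 15),
    "+!": (15, 17), "!": (13, 15),
}


def tally(cycles):
    """Return (faster words, slower words, cycles saved, cycles lost)."""
    faster = [w for w, (before, after) in cycles.items() if after < before]
    slower = [w for w, (before, after) in cycles.items() if after > before]
    saved = sum(before - after for before, after in cycles.values() if after < before)
    lost = sum(after - before for before, after in cycles.values() if after > before)
    return faster, slower, saved, lost


faster, slower, saved, lost = tally(CYCLES)
print(f"{len(faster)} words faster, {saved} cycles saved in total")
print(f"{len(slower)} words slower, {lost} cycles lost in total")
```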

Mike


Posted: Mon Nov 04, 2013 5:23 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3349
Location: Ontario, Canada
Quote:
My quest for a completed spec doc continues ...
Allow me to urge you (again) to persevere with the doc. Right now the readership for code examples (such as those above) is extremely limited, due to the amount of effort required for someone to figure out what your machine is all about. I hope that doesn't sound critical -- I'm actually encouraging you!

I suspect the "TOS in a register" idea is a clear win for any CPU that has a register available to support it -- and that includes 65m32 ! :D As for Direct Threaded Code (DTC) versus Indirect Threaded Code (ITC), I think ITC is more conceptually pure (and it saves memory), but the associated performance hit means that DTC remains attractive. Can your machine be modified to run ITC with speed approaching that of DTC? Probably, but there's a chance the compromises necessary to achieve that may be unappealing. But it's intriguing to contemplate -- an alluring idea, to be sure! Meanwhile, you need to ask whether there's enough on your plate already.
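
The ITC performance hit comes from the extra level of indirection in NEXT. A toy model in Python makes the difference visible by counting memory fetches per dispatched word. The memory layout, the addresses and the "code" strings are assumptions for the illustration, and the model covers primitives only: in DTC a colon definition also needs a jump or call at its start, which this sketch ignores.

```python
def run_itc(mem, ip, steps):
    """Indirect threading: thread cell -> code field -> machine code."""
    trace, fetches = [], 0
    for _ in range(steps):
        w = mem[ip]          # W = (IP): address of the word's code field
        fetches += 1
        ip += 1
        code = mem[w]        # second fetch: code field holds the code address
        fetches += 1
        trace.append(code)
    return trace, fetches


def run_dtc(mem, ip, steps):
    """Direct threading: the thread cell already is the code address."""
    trace, fetches = [], 0
    for _ in range(steps):
        code = mem[ip]
        fetches += 1
        ip += 1
        trace.append(code)
    return trace, fetches


# ITC: code fields at 100 and 101, thread at 0..2 pointing to them.
itc_mem = {0: 100, 1: 101, 2: 100, 100: "dup_code", 101: "plus_code"}
# DTC: the thread at 0..2 names the code directly.
dtc_mem = {0: "dup_code", 1: "plus_code", 2: "dup_code"}

print(run_itc(itc_mem, 0, 3))
print(run_dtc(dtc_mem, 0, 3))
```

Both threads dispatch the same sequence of primitives; the ITC version simply pays one more fetch per word.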

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Mon Nov 04, 2013 8:22 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
Also not to be forgotten (and it's on my links page) is STC (subroutine-threaded code). Bruce Clark explains how the faster-running STC Forth avoids the expected memory penalties. He gives 9 reasons, starting in the middle of his long post in the middle of the page. STC of course eliminates the need for NEXT, nest, and unnest, thus improving speed. With a 32-bit address bus and memory prices having come down so much though, saving a little memory for the program is not the big deal it was years ago.



Posted: Wed Nov 06, 2013 4:21 pm

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1927
Location: Sacramento, CA, USA
Dr Jefyll wrote:
Allow me to urge you (again) to persevere with the doc. Right now the readership for code examples (such as those above) is extremely limited, due to the amount of effort required for someone to figure out what your machine is all about. I hope that doesn't sound critical -- I'm actually encouraging you!

Thanks doc, for the encouragement and the private help so far (from Dieter and Garth as well).

I have been trying to let the design develop organically, by taking a full plate of 'stuff' and mixing things around on it to see what develops, but it can get really frustrating sometimes, with all of the real-life distractions, the lack of experience, and the writer's block. I suppose that I was hoping that someone 'super-extra-special-sharp' like Bruce would instantly and completely understand what I was trying to accomplish, and bust in with a useful tid-bit that would get me back on track. But that's wishful thinking, and counting on it would probably be a fatal mistake.

Are you a real doctor, or do you just play one on TV? If you are, could you hook me up with a two-month's supply of Ritalin or something? :shock: :wink:

Mike


Posted: Wed Nov 06, 2013 7:09 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8160
Location: Midwestern USA
barrym95838 wrote:
Dr Jefyll wrote:
Allow me to urge you (again) to persevere with the doc.
Are you a real doctor, or do you just play one on TV? If you are, could you hook me up with a two-month's supply of Ritalin or something? :shock: :wink:

Mike

Hey, Jeff, as long as you're writing prescriptions I could use a supply of those little blue pills... :lol:

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Wed Nov 06, 2013 9:02 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8160
Location: Midwestern USA
dclxvi wrote:
One thing that's come up once or twice (or more) in the various Programmable Logic threads is the value of having (zp,X) addressing...So, while (zp,X) addressing is certainly nice to have, and its functionality should be possible, I'm not (yet) convinced that it needs to be easy or fast in a 6502-like processor.

I belatedly realized that a reply to this from my perspective might be...er...useful. :?

Way back when the 6502 was new and exciting technology (now, it's old and exciting), (zp,X) was often discussed because of its seemingly limited usefulness. However, Leventhal and others quickly pointed out that (zp,X) would be useful to process I/O devices in a loop. Just such an application comes to mind with my next generation POC design.

POC V2.something's hardware complement will include the NXP 2698B octal UART (octart), which device contains four virtual 2692 DUARTs. All eight of the TIA-232 channels are identical in function and in fact, could be serviced by the same code block. Here's where (zp,X) could be applicable.

First, some code snippets from the POC V1.1 interrupt service routine, where the DUART's channel A receiver and transmitter are serviced:

Code:
         lda isr_92            ;DUART interrupt status
         bne .iirq010          ;DUART interrupting
;
         jmp iirq02            ;DUART not interrupting—skip ahead
;
;—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-
;
;DUART INTERRUPT PROCESSING
;
.iirq010 bit #u92arirq         ;ch A receiver interrupting?
         beq .iirq040          ;no, goto next channel
;
.iirq020 lda rxd_92a           ;read datum &...
         xba                   ;hold it
         ldx e32rixpa          ;buffer "put" index
         txa                   ;determine...
         ina                   ;next position
         and #e32xmask         ;deal with index wrap
         cmp e32rixga          ;check buffer "get" index
         beq .iirq030          ;buffer full, drop datum
;
         xba                   ;swap datum w/index &...
         sta e32bufra,x        ;buffer it
         xba
         sta e32rixpa          ;set new index
;
.iirq030 lda sr_92a            ;channel A status
         bit #u92rxdr          ;more data in FIFO?
         bne .iirq020          ;yes
;
         lda isr_92            ;no, goto next channel
;
.iirq040 ...above duplicated for channel B...
;
.iirq070 bit #u92atirq         ;transmitter interrupting?
         beq .iirq110          ;no, goto next channel
;
         ldx e32tixga          ;buffer "get" index
;
.iirq080 cpx e32tixpa          ;buffer "put" index
         beq .iirq090          ;no data in buffer
;
         lda sr_92a            ;channel A status
         bit #u92txdr          ;FIFO full?
         beq .iirq100          ;yes, defer next datum
;
         lda e32bufta,x        ;get datum from buffer &...
         sta txd_92a           ;write to FIFO
         txa                   ;current buffer position
         ina                   ;new buffer position
         and #e32xmask         ;deal with wrap
         sta e32tixga          ;set next position
         tax
         bra .iirq080          ;send another byte
;
.iirq090 lda #u92crtxd         ;disable xmitter so...
         sta ccr_92a           ;it quits interrupting
         lda #e32txsta         ;tell foreground that...
         tsb e32txst           ;xmitter is disabled
;
.iirq100 lda isr_92            ;reload ISR
;
          ...above duplicated for channel B...
;
iirq02   ...service next device...

Other than device port and buffer references, the above code is duplicated for channel B, which clearly doesn't make the best use of ROM space. However, as I'm not bumping up against the end of the ROM, I haven't made any effort to compact this code. It runs plenty fast, so speed isn't an issue either.
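
The buffer handling in the ISR above follows the usual put/get ring pattern, which can be modelled in a few lines of Python. This is a sketch, not a translation of the driver: the class and method names are made up, and it assumes that e32xmask is the buffer size minus one with the size a power of two, which the listing implies but does not state.

```python
class RingBuffer:
    """One receive or transmit ring with separate put and get indexes."""

    def __init__(self, size=128):
        # A power-of-two size lets (size - 1) act as the wrap mask.
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.mask = size - 1      # plays the role of e32xmask (assumed)
        self.data = [0] * size
        self.put = 0              # like e32rixpa / e32tixpa
        self.get = 0              # like e32rixga / e32tixga

    def store(self, datum):
        """Buffer a datum; return False (datum dropped) when the ring is full."""
        nxt = (self.put + 1) & self.mask
        if nxt == self.get:       # advancing would collide with the get index
            return False
        self.data[self.put] = datum
        self.put = nxt
        return True

    def fetch(self):
        """Return the oldest datum, or None when the ring is empty."""
        if self.get == self.put:
            return None
        datum = self.data[self.get]
        self.get = (self.get + 1) & self.mask
        return datum


rb = RingBuffer(8)
print([rb.store(i) for i in range(8)])
print([rb.fetch() for _ in range(8)])
```

One slot is always left unused so that "put equals get" can only mean empty, which is why the receive code compares the incremented put index against the get index before storing.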

Servicing the 2698 will be a different matter. With eight channels, it isn't going to be very efficient to have eight virtually identical code blocks lined up like ducks in the ISR and doing what one properly written code block could do eight times. With the 65C02, using (zp,X) addressing to connect to the correct device registers and read/write the corresponding buffers would be most efficient in terms of code space, although at a cost of zero page space. The 65C816 changes things, as now stack pointer relative indirect addressing, along with a lookup table to set the correct addresses, could complement or replace (zp,X) addressing.

Either way, the key to it is that the 2698 registers are arranged in such a way that the device can be "majorly" addressed by "block" number, 0-3, with each block corresponding to a virtual 2692, and "minorly" addressed by channel, with each channel corresponding to A or B (0 or 1) in a real 2692. In each block, each channel can be addressed as above. Here's what I've worked out to date on the 2698's internal architecture:

Code:
;NXP SCC2698B OCTAL ACIA (OCTART)
;
;   ——————————————————————————————————————————————————————————————————————
;   The NXP SCC2698B is functionally equivalent to 4 NXP 2692A DUARTs in a
;   single package.  The DUARTs are arranged in consecutive blocks so that
;   indexed  methods should make it possible for a single code block to be
;   used to service all 8 communications channels.
;   ——————————————————————————————————————————————————————————————————————
;
   .if !.def(irq_2698)
;
irq_2698 =io_irq5              ;default IRQ ID
iob_2698 =io_a                 ;default base I/O address
;
;   ————————————————————————————————————————————————————————
;   Change the above to suit the memory map & IRQ vectoring.
;   ————————————————————————————————————————————————————————
;
ncd_2698 =2                    ;channels per virtual DUART
nvd_2698 =4                    ;virtual DUARTs per device
nrc_2698 =8                    ;registers per channel
nrd_2698 =nrc_2698*ncd_2698    ;registers per virtual DUART
nr_2698  =nrd_2698*nvd_2698    ;registers per device
;
vd1_2698 =iob_2698             ;virtual DUART #1 base address
vd2_2698 =vd1_2698+nrd_2698    ;virtual DUART #2 base address
vd3_2698 =vd2_2698+nrd_2698    ;virtual DUART #3 base address
vd4_2698 =vd3_2698+nrd_2698    ;virtual DUART #4 base address
;
ch1_2698 =vd1_2698             ;channel #1 base address
ch2_2698 =vd1_2698+nrc_2698    ;channel #2 base address
ch3_2698 =vd2_2698             ;channel #3 base address
ch4_2698 =vd2_2698+nrc_2698    ;channel #4 base address
ch5_2698 =vd3_2698             ;channel #5 base address
ch6_2698 =vd3_2698+nrc_2698    ;channel #6 base address
ch7_2698 =vd4_2698             ;channel #7 base address
ch8_2698 =vd4_2698+nrc_2698    ;channel #8 base address


It should be clear from the above that some DP pointers could be used with (dp,X) to select the correct channel. The DP pointers would point to the virtual DUART and that base address would be modified by the channel number (0 or 1) to select the proper channel. Any resemblance to the concept of major/minor UNIX device identification is not purely coincidental. :shock:
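
The major/minor addressing in the equates above reduces to one line of arithmetic. A minimal sketch in Python, assuming a hypothetical base address of $D000 for io_a (the real value depends on the memory map); the function names are made up for the illustration, while the constants mirror ncd_2698, nvd_2698, nrc_2698 and nrd_2698.

```python
NCD_2698 = 2                      # channels per virtual DUART
NVD_2698 = 4                      # virtual DUARTs per device
NRC_2698 = 8                      # registers per channel
NRD_2698 = NRC_2698 * NCD_2698    # registers per virtual DUART


def channel_base(iob, block, chan):
    """Base address of a channel from its 'major' block (0-3) and 'minor' channel (0-1)."""
    if not 0 <= block < NVD_2698 or not 0 <= chan < NCD_2698:
        raise ValueError("block must be 0-3 and chan must be 0-1")
    return iob + block * NRD_2698 + chan * NRC_2698


def channel_base_by_number(iob, number):
    """Same thing for channel numbers 1-8, as in ch1_2698 .. ch8_2698."""
    block, chan = divmod(number - 1, NCD_2698)
    return channel_base(iob, block, chan)


IOB_2698 = 0xD000                 # hypothetical stand-in for io_a
for n in range(1, 9):
    print(f"ch{n}_2698 = ${channel_base_by_number(IOB_2698, n):04X}")
```

A table of these eight base addresses is what the DP pointers (or a stack-relative lookup) would be loaded from.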

It may be that a combination of (dp,X) and (<offset>,S),Y would work best. The only time penalty with stack pointer relative addressing is setting up the stack and then cleaning it up when through. With (dp,X), the addresses in DP would be set up once and then not changed. However, quite a bit of DP space would be consumed. Countering that with the '816 is the fact that DP can be defined anywhere in the first 64KB of RAM, which means code talking with the 2698 could have a private DP store.

When I get to the point where I'm ready to write the 2698 driver I'll take a closer look at which method would be best.

————————————————————————————————————————————————————————
EDIT: I further allude to the use of (<dp>,X) addressing in my running topic on POC V2, which is equipped with a QUART and thus is a prime candidate for more compact device driver code.


