6502.org Forum
All times are UTC




Posted by tmr4: Tue Jun 28, 2022 4:25 pm

Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147
Dr Jefyll wrote:
Now I'm wondering if you simply made a typo when you said, "I don't want to give up the direct page indexed Y modes." I think the modes used in the snippet above are called direct page indirect indexed.
By "modes" I meant the normal, indirect and indirect long indexed Y modes. I listed them all earlier in my post and figured (wrongly) that saying "modes" would make it clear I meant all of them.

Dr Jefyll wrote:
Also, I don't see why this code wouldn't work if D is being used as the data stack pointer. Wouldn't you want F2 and W1 to be items on the data stack? You could do something like this...
Code:
        lda [tos],y              ; compare dictionary entry with word in work buffer
        cmp (nos),y              ; matched so far?
... where tos is a named value meaning 0, and nos likewise is 4. Am I missing something still?
You're right. The code works fine if F2 and W1 are on the stack. What you lose with D being the data stack pointer is that F2 and W1 have to be on the data stack. There is no option for them to be static. No problem, right? Just put everything on the stack.

Unfortunately, this runs counter to a basic design I've used, dating back to my first Forth, which was token threaded. To keep the size small, given my SBC at the time had limited ROM, I factored code as much as possible and to keep things reasonably fast I bypassed error checking and stack effects for these factors. For example, to compile, I have (shown here is my 6502 TTC code):

Code:
; , ( x -- )
xt_comma:
        jsr underflow_1
        lda TOS,x
        ldy TOS+1,x
        jsr comma
        inx                     ; clear x from stack
        inx
        jmp NEXT

; compile A
comma:
        jsr c_comma             ; store lsb TOS item at DSP
        tya
        jsr c_comma             ; store msb TOS+1 item at DSP+1
        rts

The factor comma is called 15 times in my TTC Forth (the 65816 version is called 18 times in my current STC Forth). Primitive words that call it as an intermediate step avoid pushing the accumulator's value onto the stack only to drop it again, and they skip error checking, which isn't needed for internal use. It has a cost, though: the public word itself has to call the factor, incurring the cost of a subroutine call and return. But speed wasn't a concern in my TTC Forth.
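To make the factoring concrete, here is a small Python model (not 65816 code) of the split between the public word and its factor. The class and method names are hypothetical; the point is only that the public word does the underflow check and stack cleanup, while the factor compiles a 16-bit value low byte first with no checks at all.

```python
class MiniForth:
    """Python model of a public word calling an unchecked internal factor."""

    def __init__(self):
        self.data_stack = []   # hypothetical data stack, top of stack at the end
        self.dictionary = []   # bytes compiled so far

    def c_comma(self, byte):
        """Compile one byte at the dictionary pointer."""
        self.dictionary.append(byte & 0xFF)

    def comma(self, value):
        """Internal factor: compile a 16-bit cell, low byte first, no checks."""
        self.c_comma(value)        # low byte (c_comma keeps only the low 8 bits)
        self.c_comma(value >> 8)   # high byte

    def xt_comma(self):
        """Public word , ( x -- ): check for underflow, then call the factor."""
        if not self.data_stack:
            raise IndexError("stack underflow")
        self.comma(self.data_stack.pop())   # popping plays the role of INX / INX


forth = MiniForth()
forth.data_stack.append(0x1234)
forth.xt_comma()          # public word: checked
forth.comma(0xABCD)       # internal caller: no stack traffic, no checks
print([hex(b) for b in forth.dictionary])   # ['0x34', '0x12', '0xcd', '0xab']
```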

Maybe blindly, I've kept this design philosophy going forward. Generally, if a primitive word uses another non-trivial word as an intermediate, it calls a factor of that word's essential code, bypassing any stack effects and error checking. Parameters are passed via registers when possible, but at times I use statics when I need to pass more data. F2 and W1 are statics used in a factor of FIND. It's called 4 times in my TTC Forth and 5 times in my STC Forth.

I suppose I should reevaluate my design choice, given that I no longer have a memory constraint. However, while using the stack more effectively with the 65816's added address modes is appealing, the added cost of adjusting the data stack pointer gives me pause. I have roughly 140 words (probably fewer) that adjust the stack size. Using D as the data stack pointer would add about 560 cycles overall (about 4 extra cycles per adjustment: TDC, INC A, INC A, TCD takes 8 cycles versus 4 for INX, INX). I'm guessing that putting intermediate values on the stack as well would multiply that cost many times over, resulting in a generally less efficient Forth. I don't incur that cost using X as the data stack pointer, but, as you said, doing so may add cost in the long access words. Given that I haven't coded my long access words, or even considered much how I'll use them, I'll leave this design choice for another day.
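One way to arrive at the 560-cycle figure, assuming each of those words adjusts the stack by a single cell once per execution, is the quick Python check below. It uses the standard 65816 cycle counts for these implied-mode instructions (2 cycles each); the variable names are illustrative only.

```python
# 65816 cycle counts for the implied-mode instructions involved (2 cycles each).
CYCLES = {"INX": 2, "TDC": 2, "INC A": 2, "TCD": 2}

def cost(sequence):
    """Total cycles for a straight-line instruction sequence."""
    return sum(CYCLES[op] for op in sequence)

adjust_with_x = ["INX", "INX"]                    # drop one cell, X as stack pointer
adjust_with_d = ["TDC", "INC A", "INC A", "TCD"]  # drop one cell, D as stack pointer

extra_per_word = cost(adjust_with_d) - cost(adjust_with_x)
words_that_adjust = 140                           # tmr4's rough count
print(extra_per_word, extra_per_word * words_that_adjust)   # 4 560
```

This is a per-pass figure for every such word; in a running program the real cost scales with how often each word executes.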


Posted by Dr Jefyll: Thu Jun 30, 2022 5:00 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
tmr4 wrote:
You're right. The code works fine if F2 and W1 are on the stack. What you lose with D being the data stack pointer is that F2 and W1 have to be on the data stack. There is no option for them to be static.
Okay, got it. But perhaps the following suggestion will be helpful if you haven't thought of it already.

One of the many things the '816 can do which the 6502 can't is hold a 16-bit value in X or Y... and these are registers which are capable of addressing memory. Of course, Absolute,X mode obliges you to include a 16-bit offset, and the same is true for Absolute,Y mode. But you have the freedom to make the 16-bit offset equal to zero, in which case X or Y can point straight at the item you want to address. (Of course that's assuming the item is in the bank that's indicated by DBR.)

Admittedly, there's the overhead of loading X or Y beforehand, and indeed you may also need to save/restore the previous value. But at least...
  • it lets you reach any arbitrary address within 64K, and...
  • it doesn't require the usual 2-byte pointer in Direct Page.
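A small Python model of the two address calculations may help show why a zero offset works: with abs,Y the 24-bit address is simply DBR:offset plus Y, so an offset of 0 leaves Y holding the full 16-bit address within the data bank, while (dp),Y first has to fetch a 2-byte pointer from Direct Page. This is an address-formation sketch only, assuming native mode with 16-bit index registers; it ignores cycle counts and the other addressing modes, and the function names are hypothetical.

```python
def ea_abs_y(dbr, offset, y):
    """LDA abs,Y: effective address is DBR:offset + Y (may carry into the next bank)."""
    return ((dbr << 16) + offset + y) & 0xFFFFFF

def ea_dp_ind_y(mem, d, dp, dbr, y):
    """LDA (dp),Y: fetch a 16-bit pointer from bank 0 at D+dp, then DBR:pointer + Y."""
    addr = (d + dp) & 0xFFFF
    pointer = mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)
    return ((dbr << 16) + pointer + y) & 0xFFFFFF

# Pointer $1234 stored in Direct Page at offset $10 (D = 0), data bank 0.
direct_page = {0x10: 0x34, 0x11: 0x12}
print(hex(ea_dp_ind_y(direct_page, 0, 0x10, 0, 0)))   # 0x1234, via the DP pointer
print(hex(ea_abs_y(0, 0, 0x1234)))                    # 0x1234, Y holds the address
```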

Hope this helps...

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted by tmr4: Thu Jun 30, 2022 10:08 am

Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147
Dr Jefyll wrote:
But you have the freedom to make the 16-bit offset equal to zero, in which case X or Y can point straight at the item you want to address. (Of course that's assuming the item is in the bank that's indicated by DBR.)

Admittedly, there's the overhead of loading X or Y beforehand, and indeed you may also need to save/restore the previous value. But at least...
  • it lets you reach any arbitrary address within 64K, and...
  • it doesn't require the usual 2-byte pointer in Direct Page.

Thanks for the idea. I think I've read of that "trick" in an example of the special syntax needed to force the assembler to use absolute indexed addressing in that case (see the clarifying edit below). But I don't think I've thought of it as an alternative to the indirect indexed Y mode. It looks like it's a cycle faster as well. I'll keep it in mind for future use.

I don't think it helps me in most of my uses of the mode, though, since in a loop you'd need the ending address to compare against. I normally have a length, as in a counted string. I'll have to look for places where I calculate the length as an intermediate step; there could be some savings there. (Edit 2: But usually only when a comparison would be made in either case to end the loop. Using the length as an index gives more opportunity to end a loop on a flag check, avoiding the comparison, which would probably add at least 5 cycles to each loop iteration.)
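For illustration, here is a Python model of the kind of loop being described: comparing two counted strings with the length used as the index, counting down so the loop ends on the zero flag (DEY / BNE) instead of a compare against an end address. The function name is hypothetical, and the two byte strings stand in for the dictionary entry and the word in the work buffer.

```python
def counted_match(entry, word):
    """Compare two counted strings (length byte first), indexing from the
    length down to 1 the way a DEY / BNE loop would."""
    if entry[0] != word[0]:      # lengths differ: no match
        return False
    y = word[0]                  # index starts at the length
    while y != 0:                # loop exit is a flag test, no CPY needed
        if entry[y] != word[y]:  # like LDA [tos],Y / CMP (nos),Y quoted above
            return False
        y -= 1                   # DEY
    return True

print(counted_match(b"\x03DUP", b"\x03DUP"))    # True
print(counted_match(b"\x04DROP", b"\x04DRIP"))  # False
```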

Edit for future reference for ca65 assembler users: except for LDX and STX, special syntax is only needed for the absolute,X mode, as no other instructions have a dp,Y mode to confuse with absolute,Y. For example, the syntax LDA a:0,X gives absolute,X. LDA a:0,Y works for absolute,Y, but LDA 0,Y is sufficient. LDA 0,X or LDA $0000,X will get you the direct page indexed X opcode. The WDC syntax LDA !0,X doesn't work in ca65: since ! is the NOT operator there, !0 is evaluated as an expression and you get the direct page indexed X opcode on address 1.
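The rule in that edit can be summarized as a small Python model. It is a simplification covering only numeric operands like the ones above (ca65 itself also considers symbol types, forward references and its other address-size prefixes), the function name is hypothetical, and the opcode table lists only the standard 65xx encodings needed for these examples. The last line follows tmr4's observation that !0 yields address 1.

```python
# Standard 65xx opcodes for the modes discussed above.
OPCODES = {
    ("LDA", "dp,X"): 0xB5, ("LDA", "abs,X"): 0xBD, ("LDA", "abs,Y"): 0xB9,
    ("LDX", "dp,Y"): 0xB6, ("LDX", "abs,Y"): 0xBE,
}

def select_mode(mnemonic, value, index, force_abs=False):
    """Simplified model: a value below $100 picks the direct page form if
    the instruction has one, unless the a: prefix forces absolute."""
    dp_mode, abs_mode = f"dp,{index}", f"abs,{index}"
    if not force_abs and value < 0x100 and (mnemonic, dp_mode) in OPCODES:
        return dp_mode, OPCODES[(mnemonic, dp_mode)], value
    return abs_mode, OPCODES[(mnemonic, abs_mode)], value

print(select_mode("LDA", 0, "X"))                   # ('dp,X', 181, 0)   LDA 0,X
print(select_mode("LDA", 0, "X", force_abs=True))   # ('abs,X', 189, 0)  LDA a:0,X
print(select_mode("LDA", 0, "Y"))                   # ('abs,Y', 185, 0)  no dp,Y for LDA
print(select_mode("LDA", int(not 0), "X"))          # ('dp,X', 181, 1)   LDA !0,X
```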


Posted by BruceRMcF: Sun Jul 03, 2022 1:38 am

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217
With respect to Dr. Jefyll's remarks:
Dr Jefyll wrote:
... One of the many things the '816 can do which the 6502 can't is hold a 16-bit value in X or Y... and these are registers which are capable of addressing memory. Of course, Absolute,X mode obliges you to include a 16-bit offset, and the same is true for Absolute,Y mode. But you have the freedom to make the 16-bit offset equal to zero, in which case X or Y can point straight at the item you want to address. (Of course that's assuming the item is in the bank that's indicated by DBR.)

Admittedly, there's the overhead of loading X or Y beforehand, and indeed you may also need to save/restore the previous value. But at least...
  • it lets you reach any arbitrary address within 64K, and...
  • it doesn't require the usual 2-byte pointer in Direct Page.


... and the issue of register allocation, I looked back at the former member BitWise's intriguing "Direct Page Data Stack" concept, earlier in this thread:
BitWise wrote:
I've been working on a 65C816 based direct threaded Forth for a while and was following the 65(C)02 approach of keeping the data stack pointer in X but wasn't happy with the resulting code so at the weekend I decided to try a new approach.

I keep the processor in 16/16 mode with the Forth IP in Y and the data stack pointer in DP. This means stack operations incur a small overhead as DP must be moved to/from C for adjustment but gives you use of the full set of zero page addressing modes when accessing the stack. All my primitives are coded to keep the data and return stacks safe during interrupts. For example:
Code:
PLUS:
  CLC
  LDA <3
  ADC <1
  STA <3
  TDC
  INC A
  INC A
  TCD
  JMP NEXT

FETCH:
  LDX <1
  LDA !0,X
  STA <1
  JMP NEXT

Whilst this limits me to having data and return stacks on bank 0 I don't think that will be a big problem as neither needs to be that large.

...
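To make BitWise's direct-page stack concrete, here is a Python model of his PLUS and FETCH. It assumes D points one byte below the top of stack (so <1 is TOS and <3 is NOS), that the stack grows downward in bank 0, and that DBR is 0 for the !0,X access; the class name and the push method are hypothetical test helpers, not his code.

```python
class DirectPageStack:
    """Model of a data stack addressed through D: <1 is TOS, <3 is NOS."""

    def __init__(self, d=0x0100):
        self.mem = bytearray(0x10000)   # bank 0 only
        self.d = d                      # D points one byte below TOS

    def rd16(self, addr):
        return self.mem[addr & 0xFFFF] | (self.mem[(addr + 1) & 0xFFFF] << 8)

    def wr16(self, addr, value):
        self.mem[addr & 0xFFFF] = value & 0xFF
        self.mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF

    def push(self, value):              # hypothetical helper
        self.wr16(self.d - 1, value)    # new TOS lands at the new D+1
        self.d = (self.d - 2) & 0xFFFF

    def plus(self):
        # CLC / LDA <3 / ADC <1 / STA <3: the sum overwrites NOS
        self.wr16(self.d + 3, self.rd16(self.d + 3) + self.rd16(self.d + 1))
        # TDC / INC A / INC A / TCD: drop the old TOS
        self.d = (self.d + 2) & 0xFFFF

    def fetch(self):
        # LDX <1 / LDA !0,X / STA <1: replace TOS with the cell it points to
        x = self.rd16(self.d + 1)
        self.wr16(self.d + 1, self.rd16(x))

    def tos(self):
        return self.rd16(self.d + 1)


s = DirectPageStack()
s.push(2)
s.push(40)
s.plus()
print(s.tos(), hex(s.d))   # 42 0xfe
```

Note that the result is stored before D moves, so nothing live ever sits below D, which is what lets an interrupt use the space under the stack safely.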


By contrast, if you use the IP in X, for a JMP (0,X) based NEXT, you can keep Y as a "work" register, use the hardware stack as the data stack, and then using Dr Jefyll's point for @ you might have:

Code:
PLUS:
  CLC
  PLA
  ADC 1,S
  STA 1,S
  INX
  INX
  JMP (0,X)

FETCH:
  PLY
  LDA 0,Y
  PHA
  INX
  INX
  JMP (0,X)
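A Python model of the inner interpreter Bruce describes: X is the IP, each thread cell is two bytes wide, NEXT is INX / INX / JMP (0,X), and the hardware stack doubles as the data stack. Primitive names stand in for code addresses, and LIT and HALT are hypothetical helpers added only so the model can run.

```python
def run(thread, memory):
    """Direct-threaded inner interpreter model with the IP in X and the
    hardware stack as the data stack (stack[-1] is 1,S)."""
    stack = []
    x = 0                          # IP: byte offset of the current cell
    while True:
        op = thread[x // 2]        # JMP (0,X): dispatch on the current cell
        if op == "PLUS":           # PLA / ADC 1,S / STA 1,S
            a = stack.pop()
            stack[-1] = (stack[-1] + a) & 0xFFFF
        elif op == "FETCH":        # PLY / LDA 0,Y / PHA
            y = stack.pop()
            stack.append(memory[y])
        elif op == "LIT":          # hypothetical: push the inline cell that follows
            x += 2
            stack.append(thread[x // 2])
        elif op == "HALT":         # hypothetical: stop the model
            return stack
        else:
            raise ValueError(f"unknown primitive {op!r}")
        x += 2                     # INX / INX, then back to JMP (0,X)

print(run(["LIT", 2, "LIT", 40, "PLUS", "HALT"], {}))              # [42]
print(run(["LIT", 0x2000, "FETCH", "HALT"], {0x2000: 0xBEEF}))     # [48879]
```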


For working through the different register options for a processor, Brad Rodriguez in Moving Forth suggests the operations that Guy Kelly used in 1992 to compare 19 IBM-PC Forths: NEXT, ENTER, EXIT, DOVAR, DOCON, LIT, @, ! and +, with DODOES, SWAP, OVER, ROT, 0= and +! added to the list.

It will always be a juggling act, where absolutely optimizing for one operation will slow down one of the other ones (for instance, using Y as a work register means retaining the Rack index in a direct page location and loading it into Y for use), so when the comparison is close enough, two different people can look at the very same set of alternative register allocations, and arrive at different conclusions as to which is preferable.


Last edited by BruceRMcF on Sun Jul 03, 2022 1:51 pm, edited 1 time in total.

Posted by BigEd: Sun Jul 03, 2022 10:24 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
(Just to note, in case anyone is unaware, Andrew is unfortunately no longer with us:
Andrew Jacobs (BitWise) has passed away
but it's great that his ideas are still bearing fruit.)


Posted by BruceRMcF: Sun Jul 03, 2022 1:53 pm

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217
BigEd wrote:
(Just to note, in case anyone is unaware, Andrew is unfortunately no longer with us:
Andrew Jacobs (BitWise) has passed away
but it's great that his ideas are still bearing fruit.)


Quite. I posted in haste, after coming home from the end of a 12 hour overtime Saturday shift, and what sparked my comment was left entirely obscure. I've updated the comment to try to make it clearer that I was using BitWise's intriguing suggestion as a springboard in reacting to a point in Dr Jefyll's post.

It occurs to me just now that one could also flip BitWise's use of the direct page register around and use the direct page as the return stack, but I haven't sorted that one out at all.


Posted by barrym95838: Sun Jul 03, 2022 6:42 pm

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
BruceRMcF wrote:
By contrast, if you use the IP in X, for a JMP (0,X) based NEXT, you can keep Y as a "work" register, use the hardware stack as the data stack, and then using Dr Jefyll's point for @ you might have:

Code:
PLUS:
  CLC
  PLA
  ADC 1,S
  STA 1,S
  INX
  INX
  JMP (0,X)

FETCH:
  PLY
  LDA 0,Y
  PHA
  INX
  INX
  JMP (0,X)

That's a tidy bit of code. If I look at just that, I'm tempted to jump on a previously discussed strategy to keep TOS in A for an even tidier implementation, but a much larger sample would have to be studied to see if the idea has serious drawbacks:
Code:
PLUS:
  CLC
  ADC 1,S
  PLY        ; NIP
  INX
  INX
  JMP (0,X)

FETCH:
  TAY
  LDA 0,Y
  INX
  INX
  JMP (0,X)
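Sketched as a Python model (not 65816 code), keeping TOS in A means the accumulator holds the top item and 1,S holds the second. The class name and the push helper are hypothetical; push mirrors the PHA that any word producing a new value would need. Because A always holds something, the first push sends A's initial contents to the hardware stack as a placeholder.

```python
class TosInA:
    """Model: TOS cached in A, the rest of the data stack on the hardware stack."""

    def __init__(self, memory=None):
        self.a = 0                                           # accumulator = TOS
        self.s = []                                          # hardware stack; s[-1] is 1,S
        self.memory = memory if memory is not None else {}   # 16-bit cells by address

    def push(self, value):      # hypothetical helper: PHA, then load A
        self.s.append(self.a)
        self.a = value & 0xFFFF

    def plus(self):             # CLC / ADC 1,S / PLY (NIP)
        self.a = (self.a + self.s[-1]) & 0xFFFF
        self.s.pop()

    def fetch(self):            # TAY / LDA 0,Y
        self.a = self.memory[self.a]


vm = TosInA({0x2000: 0xBEEF})
vm.push(2)
vm.push(40)
vm.plus()
print(vm.a)                     # 42
vm.push(0x2000)
vm.fetch()
print(hex(vm.a))                # 0xbeef
```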

One thing is that return stack operations pay a significant price:
Code:
enter:
   ldy RSP
   dey
   dey
   stx 0,y     ; push old thread addr
   sty RSP
   plx        ; pop new thread addr -1
   inx
   jmp (0,x)

exit:
   ldy RSP
   ldx 0,y
   iny
   iny
   sty RSP
   inx
   inx
   jmp (0,x)
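The comment "pop new thread addr -1" suggests that each colon word's code field is a 3-byte JSR to enter, since JSR pushes the address of its own last byte. Under that assumption, here is a Python model of the pair with RSP as a memory variable and the return stack growing downward; the class name, addresses, and the jsr_enter helper are hypothetical.

```python
class MemoryRSP:
    """Model of enter/exit with the return stack pointer kept in memory (RSP)."""

    def __init__(self, rsp=0x0200):
        self.mem = {}    # 16-bit return stack cells, keyed by address
        self.rsp = rsp   # the memory variable RSP; the stack grows downward
        self.hw = []     # hardware stack, which is also the data stack here
        self.x = 0       # IP

    def jsr_enter(self, code_field):
        """A 3-byte JSR at code_field pushes code_field + 2 (return address - 1)."""
        self.hw.append(code_field + 2)
        self.enter()

    def enter(self):
        y = self.rsp - 2             # ldy RSP / dey / dey
        self.mem[y] = self.x         # stx 0,y : push old thread address
        self.rsp = y                 # sty RSP
        self.x = self.hw.pop() + 1   # plx / inx : first cell of the new thread

    def exit(self):
        y = self.rsp                 # ldy RSP
        self.x = self.mem[y]         # ldx 0,y : pop old thread address
        self.rsp = y + 2             # iny / iny / sty RSP
        self.x += 2                  # inx / inx : step past the calling cell


m = MemoryRSP()
m.x = 0x3000                 # IP is on the cell that names the colon word
m.jsr_enter(0x4000)          # code field at $4000, thread body starts at $4003
print(hex(m.x), hex(m.rsp))  # 0x4003 0x1fe
m.exit()
print(hex(m.x), hex(m.rsp))  # 0x3002 0x200
```

The cost Mike points out shows up here as the extra load and store of RSP on every enter and exit.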

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Posted by BruceRMcF: Mon Jul 04, 2022 2:45 am

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217
Precisely ... NEXT is fast, but either ENTER/EXIT pay a penalty, or the rack index is in Y and @ ! etc. pay a penalty.

But with the 65816 having a 16-bit LDA (zp) and STA (zp), maybe not so much of a penalty ... maybe:
Code:
FETCH:
  STA W
  LDA (W)
  INX
  INX
  JMP (0,X)

STORE:
  STA W
  PLA
  STA (W)
  PLA
  INX
  INX
  JMP (0,X)


ENTER:
  DEY
  DEY
  STX 0,Y
  PLX
  INX
  JMP (0,X)

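EXIT:
  LDX 0,Y      ; pop the caller's IP first: ENTER stored it at 0,Y
  INY          ; then drop the return stack cell
  INY
  INX          ; step past the cell that invoked this word
  INX
  JMP (0,X)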
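A Python model of FETCH and STORE with TOS in A and the scratch cell W, to check the stack effects: @ is ( addr -- x ) and ! is ( x addr -- ). The class name is hypothetical, and memory is a dictionary of 16-bit cells rather than a byte array.

```python
class TosInAWithW:
    """Model of @ and ! with TOS in A, NOS on the hardware stack, and W in Direct Page."""

    def __init__(self):
        self.a = 0         # accumulator = TOS
        self.s = []        # hardware stack; s[-1] is NOS
        self.w = 0         # direct-page scratch cell W
        self.memory = {}   # 16-bit cells keyed by address

    def fetch(self):       # @ ( addr -- x ): STA W / LDA (W)
        self.w = self.a
        self.a = self.memory[self.w]

    def store(self):       # ! ( x addr -- ): STA W / PLA / STA (W) / PLA
        self.w = self.a
        self.a = self.s.pop()
        self.memory[self.w] = self.a
        self.a = self.s.pop()


vm = TosInAWithW()
vm.s = [7, 0x1234]          # stack ( 7 $1234 $2000 ) with $2000 in A
vm.a = 0x2000
vm.store()                  # stores $1234 at $2000; 7 becomes TOS
print(hex(vm.memory[0x2000]), vm.a, vm.s)   # 0x1234 7 []
vm.s.append(vm.a)           # push $2000 as the new TOS (PHA, then load A)
vm.a = 0x2000
vm.fetch()
print(hex(vm.a), vm.s)      # 0x1234 [7]
```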


One other loss with TOS in A and retaining Y as the rack index is the slower SWAP:
Code:
; A in TOS, Y as a free work register
SWAP:
  PLY
  PHA
  TYA
  INX
  INX
  JMP (0,X)

; A as TOS, Y dedicated to the Rack index
SWAP:
  STA W
  PLA
  PEI W ; note some assemblers use PEI (W)
  INX
  INX
  JMP (0,X)
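The two SWAPs can be checked against each other with a small Python model (A as TOS, a list as the hardware stack with the last element at 1,S); the function names are hypothetical. Going by the usual 65816 cycle tables, with 16-bit registers and a page-aligned direct page, the first version costs about 11 cycles before NEXT (PLY 5, PHA 4, TYA 2) and the second about 15 (STA dp 4, PLA 5, PEI 6).

```python
def swap_free_y(a, stack):
    """SWAP with Y free: PLY / PHA / TYA. Returns the new (A, stack)."""
    stack = list(stack)
    y = stack.pop()      # PLY: NOS into Y
    stack.append(a)      # PHA: old TOS becomes NOS
    return y, stack      # TYA: old NOS becomes TOS

def swap_via_w(a, stack):
    """SWAP with Y reserved for the rack index: STA W / PLA / PEI W."""
    stack = list(stack)
    w = a                # STA W: park TOS in a direct page cell
    a = stack.pop()      # PLA: NOS into A
    stack.append(w)      # PEI W: push the 16-bit value stored at W
    return a, stack

print(swap_free_y(2, [9, 1]))   # (1, [9, 2])
print(swap_via_w(2, [9, 1]))    # (1, [9, 2])
```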


Posted by barrym95838: Mon Jul 04, 2022 6:31 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
I like how you riff, Dr. Bruce ... IP in X, RSP in Y, SP in S, maybe TOS in A, and UP in D, right? It all seems to work pretty cleanly in bank 0, but things get a bit messy elsewhere, and that's where my experience falls short, and the '816 starts to feel like a burr under my saddle, because the on-chip registers are simply 33% too narrow to do a proper job.



Posted by BruceRMcF: Mon Jul 04, 2022 12:00 pm

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217
barrym95838 wrote:
I like how you riff, Dr. Bruce ... IP in X, RSP in Y, SP in S, maybe TOS in A, and UP in D, right? It all seems to work pretty cleanly in bank 0, but things get a bit messy elsewhere, and that's where my experience falls short, and the '816 starts to feel like a burr under my saddle, because the on-chip registers are simply 33% too narrow to do a proper job.


I think how messy things get elsewhere depends on whether you are going for a large memory model or not. If you are going for a medium memory model, it's pretty clean. Put the Forth dictionary and return stack, along with a handful of far memory access buffers, in $010000-$01FFFF; put the data stack, direct page / User variables, system support code, interrupt processing, etc. in $000000-$00FFFF; and treat the RAM above as a RAMdisk, with dedicated words copying data into the far memory access buffers. That way all of the handling of the memory segment registers is wrapped inside the buffer fill words.

If there is going to be support for two to four preemptive multi-tasking threads, I could see Bank 1 as:
  • 256 bytes for the return stack, $01FF00-$01FFFF
  • 16 16-byte far RAM buffers, $01FE00-$01FEFF
  • 12 128-byte far RAM buffers, $01F800-$01FDFF
  • 6 1K BLOCK Buffers, $01E000-$01F7FF
  • 56K Forth Dictionary, $010000-$01DFFF

I guess this assumes that memory-mapped device I/O is in Bank 0, but if it's in Bank 1, just bump the Bank 1 layout above to Bank 2.
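A quick Python check of the proposed Bank 1 layout: each region's size matches its description, and together the regions tile $010000-$01FFFF with no gaps or overlaps. The table simply restates the list above.

```python
# Proposed Bank 1 layout: (start, end inclusive, description).
LAYOUT = [
    (0x01FF00, 0x01FFFF, "return stack"),
    (0x01FE00, 0x01FEFF, "16 x 16-byte far RAM buffers"),
    (0x01F800, 0x01FDFF, "12 x 128-byte far RAM buffers"),
    (0x01E000, 0x01F7FF, "6 x 1K BLOCK buffers"),
    (0x010000, 0x01DFFF, "Forth dictionary"),
]
EXPECTED = [256, 16 * 16, 12 * 128, 6 * 1024, 56 * 1024]

sizes = [end - start + 1 for start, end, _ in LAYOUT]
regions = sorted(LAYOUT)   # ascending by start address
contiguous = all(regions[i][1] + 1 == regions[i + 1][0] for i in range(len(regions) - 1))
covers_bank = regions[0][0] == 0x010000 and regions[-1][1] == 0x01FFFF

print(sizes == EXPECTED, contiguous, covers_bank)   # True True True
```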

