
All times are UTC




PostPosted: Fri Feb 12, 2016 11:56 am 

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
I've been working on a 65C816-based direct threaded Forth for a while, following the 65(C)02 approach of keeping the data stack pointer in X, but I wasn't happy with the resulting code, so at the weekend I decided to try a new approach.

I keep the processor in 16/16 mode with the Forth IP in Y and the data stack pointer in DP. This means stack operations incur a small overhead as DP must be moved to/from C for adjustment but gives you use of the full set of zero page addressing modes when accessing the stack. All my primitives are coded to keep the data and return stacks safe during interrupts. For example:
Code:
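PLUS:
  CLC
  LDA <3        ; second item on the stack (NOS)
  ADC <1        ; add the top of stack (TOS)
  STA <3        ; sum replaces NOS
  TDC           ; copy D (the data stack pointer) into C
  INC A
  INC A         ; drop one 16-bit cell
  TCD           ; write the new stack pointer back in one step
  JMP NEXT

FETCH:
  LDX <1        ; address taken from TOS
  LDA !0,X      ; absolute,X read of the 16-bit cell at that address
  STA <1        ; fetched value replaces TOS
  JMP NEXT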

Whilst this limits me to having data and return stacks on bank 0 I don't think that will be a big problem as neither needs to be that large.
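
As a further illustration of the same convention, here is a minimal sketch of a word that grows the stack. It assumes the layout used by PLUS and FETCH above (16-bit cells, TOS at <1, D moved down to allocate); the DUP body is not taken from the posted kernel. D is lowered before the new cell is written, so the cell is never below the stack pointer while it holds live data.

```asm
DUP:
  TDC           ; copy D (the data stack pointer) into C
  DEC A
  DEC A         ; make room for one 16-bit cell
  TCD           ; commit the new stack pointer first
  LDA <3        ; the old TOS now sits at offset 3
  STA <1        ; copy it into the new TOS
  JMP NEXT
```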

A useful side effect has been the freeing up of the X register so my inner interpreter has become:
Code:
NEXT:
  TYX           ; X = current IP
  INY
  INY           ; advance IP to the following cell
  JMP (0,X)     ; jump through the cell the old IP pointed at

This is only 3 bytes more than a traditional 'JMP NEXT' at the end of a primitive so I think I will inline it once I get a bit more of the environment working.

I have tried searching around to see how other people implemented Forth on the 65C816 but there are very few public references for this device and some of those discuss techniques (like using SP as the IP) which do not lend themselves to a general purpose implementation.

I'll push a snapshot of my code to GitHub this evening. I'm still working on the outer interpreter so it's not fully functioning yet.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


PostPosted: Fri Feb 12, 2016 4:04 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
I endorse the use of D as the data-stack pointer, for the reason you mention. It's a tradeoff, of course, but on the back burner I have pieces of an '816 Forth kernel that uses D this way. As you say, it allows use of all the "zero page" addressing modes when accessing the stack.

In particular I find it compelling that 24-bit addresses can be passed on stack and dereferenced in situ, bypassing unappealing alternatives such as gymnastics involving the Data Bank Register (DBR).
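
A minimal sketch of what dereferencing "in situ" could look like, assuming D is the data stack pointer with TOS at offset 1 (as in the PLUS and FETCH primitives quoted in this thread) and assuming a 24-bit address occupies two 16-bit cells with its low word on top and the bank byte in the low half of the cell beneath. FETCH_FAR is a hypothetical name, not a word from either kernel.

```asm
FETCH_FAR:      ; ( bank addr-lo -- x )  hypothetical far fetch
  LDA [1]       ; 24-bit pointer read straight from D+1..D+3
  STA <3        ; result overwrites the cell that held the bank
  TDC
  INC A
  INC A         ; drop one 16-bit cell
  TCD
  JMP NEXT
```

No scratch copy and no change to DBR is needed, because the direct page indirect long mode reads all three pointer bytes relative to D.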

BitWise wrote:
A useful side effect has been the freeing up of the X register so my inner interpreter has become:
Code:
NEXT:
  TYX
  INY
  INY
  JMP (0,X)

This is only 3 bytes more than a traditional 'JMP NEXT' at the end of a primitive so I think I will inline it once I get a bit more of the environment working.

Inline NEXT is nice, too. But is there a reason you don't save two cycles and simply let IP reside in X, like this? (Besides the slight speed boost, this also frees up Y.)
Code:
NEXT:
  INX
  INX
  JMP (0,X)

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Fri Feb 12, 2016 5:31 pm 

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
Dr Jefyll wrote:
Inline NEXT is nice, too. But is there a reason you don't save two cycles and simply let IP reside in X, like this? (Besides the slight speed boost, this also frees up Y.)
Code:
NEXT:
  INX
  INX
  JMP (0,X)

-- Jeff

I did consider it. DO_COLON would need adjustment, as would the branch addresses. Pre-incrementing the IP is not as easy to understand and the time saving is only a couple of cycles so I'm not sure it's worth it. X has better zero page instruction support than Y so it's more useful in primitives.



Last edited by BitWise on Fri Feb 12, 2016 8:18 pm, edited 1 time in total.

PostPosted: Fri Feb 12, 2016 7:55 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
BitWise wrote:
X has better zero page instruction support than Y so it's more useful in primitives.
You're referring to the four Z-pg indexed address modes, is that right? It strikes me those modes will be rarely used, given that D already indexes to TOS. That's one of the cool things about using D that way! It does take some getting used to.


Code:
FETCH:
  LDX <1
  LDA !0,X
  STA <1
  JMP NEXT
Can you remind me please what the < and ! do? It seems to me fetch can be coded more briefly as follows.
Code:
FETCH:
  LDA (1)
  STA <1
  JMP NEXT

PostPosted: Fri Feb 12, 2016 8:15 pm 

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
In WDC's assembler a '<' prefix forces zero page addressing and '!' is absolute.

You're right. I thought (zp) was limited to bank zero. I'll change that.

This potentially creates a very easy way to build a multi-threaded environment. Put a copy of the Forth runtime in each 64K bank assigned to a thread (say $C000-$FFFF) with the user variables located at a fixed address range (say $0000-$0020) and use the space between them for thread-specific code and variables. You just need a fairly simple interrupt handler to save and restore each thread's context from its return stack.
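
A rough sketch of such a handler, under several assumptions that are not from the posted code: the handler lives in bank 0, the CPU is in native mode, each thread has its own return stack in bank 0, and CURRENT, SAVED_S and NTHREADS are hypothetical names for a bank 0 word, a bank 0 table of saved stack pointers, and a constant. Acknowledging the interrupt source, building the initial stack frame for a new thread, and the assembler's register-width directives are all left out.

```asm
SWITCH:
  REP #$30          ; 16-bit A, X and Y (the old P is already on the stack)
  PHA
  PHX
  PHY
  PHD               ; D is this thread's data stack pointer
  PHB               ; DBR is this thread's data bank
  PHK
  PLB               ; DBR = handler's bank (bank 0) to reach the table
  LDX !CURRENT      ; byte offset of the running thread
  TSC
  STA !SAVED_S,X    ; remember its return stack pointer
  INX
  INX               ; step to the next thread, round-robin
  CPX #NTHREADS*2
  BCC PICKED
  LDX #0            ; wrap back to the first thread
PICKED:
  STX !CURRENT
  LDA !SAVED_S,X
  TCS               ; switch to the next thread's return stack
  PLB               ; restore that thread's DBR
  PLD               ; ... its data stack pointer
  PLY
  PLX
  PLA
  RTI               ; restores P, PC and PBR of the resumed thread
```

Because the whole context, including D and the bank registers, sits on the thread's own return stack, the only per-thread state the scheduler keeps is one saved stack pointer.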

PostPosted: Mon Feb 15, 2016 1:14 pm 

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Just out of curiosity, and because I'm (obviously) thinking about a Forth for the 265SXB, why direct threaded Forth? Given how few registers the CPU has, wouldn't a subroutine threaded version be more appropriate?


PostPosted: Mon Feb 15, 2016 2:23 pm 

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
scotws wrote:
Just out of curiosity, and because I'm (obviously) thinking about a Forth for the 265SXB, why direct threaded Forth? Given how few registers the CPU has, wouldn't a subroutine threaded version be more appropriate?

I wanted to try the DTC approach and it will give better code density on an unexpanded board.

For a lot of words I don't think the speed difference will be that great (especially when the NEXT sequence is inlined). In DTC the TYX/INY/INY/JMP (0,X) sequence to get to the next word is 2+2+2+6 = 12 cycles, and the subroutine threaded RTS/JSR pair is 6+6 = 12 cycles.

Subroutine threaded is quicker calling between compiled words but some primitives like >R, R> and the runtime part of DO/LOOP/+LOOP require complicated stack trickery unless you have three stacks.
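
To make the "stack trickery" concrete, here is a hedged sketch of >R as a called subroutine in an STC system. It assumes D is the data stack pointer with TOS at <1, 16-bit registers, and that X is free as scratch; TO_R is a hypothetical label. The word's own return address is on top of the hardware stack when it runs, so it has to be lifted out of the way and put back.

```asm
TO_R:               ; >R  ( x -- ) ( R: -- x )
  PLX               ; lift our own return address out of the way
  LDA <1            ; x = top of the data stack
  PHA               ; x goes onto the return stack
  TDC
  INC A
  INC A
  TCD               ; drop x from the data stack
  PHX               ; put the return address back on top
  RTS
```

In DTC the primitive is reached by a jump rather than a JSR, so there is no return address to juggle.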

PostPosted: Mon Feb 15, 2016 4:22 pm 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1925
Location: Sacramento, CA, USA
Based on my limited research and experience, I have chosen DTC as my firm favorite as well, at least for "retro"-style platforms. It provides a nice three-way balance between code density, speed, and complexity.

Mike B.


PostPosted: Mon Feb 15, 2016 6:21 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8427
Location: Southern California
I haven't tried this, but I'm intrigued. Bruce Clark explains how the faster-running STC Forth avoids the expected memory penalties. He gives 9 reasons, starting in the middle of his long post in the middle of the page. STC of course eliminates the need for NEXT, nest, and unnest, thus improving speed.

Also:
Bruce Clark's 2-instruction 65816 NEXT in ITC Forth
Bruce Clark's single-instruction, 6-clock 65816 NEXT in DTC Forth

For readers who may be new to Forth and curious: Here's an explanation of five different Forth threading methods, by Brad Rodriguez. There's a list of more of his related articles at http://www.bradrodriguez.com/papers/.

(These are all from the Forth section of my links page.)

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Sun Jun 26, 2022 9:14 pm 

Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147
Dr Jefyll wrote:
... As you say, it allows use of all the "zero page" addressing modes when accessing the stack.

In particular I find it compelling that 24-bit addresses can be passed on stack and dereferenced in situ, bypassing unappealing alternatives such as gymnastics involving the Data Bank Register (DBR).

I was wondering what the latest thinking is on register usage for 65816 Forth. Being able to use the direct page indirect long address mode on the Forth data stack would be nice.

Tantalized by the above quote I tried to find other direct page address modes that would benefit from using the direct page register, D, as the Forth data stack index. Unfortunately, I couldn't come up with any compared to using the X register as the stack index.

The dp and dp,x address modes are comparable as are the (dp) and (dp,x) modes. The remaining direct page address modes, direct page indexed Y, indirect indexed Y, and long indirect indexed Y are all useful when the X register is used as the stack index (at least I use them in my own code) but don't seem to have much value when using D as an index. To do something similar you'd have to manipulate the Forth stack which I've steered away from when possible in my primitive words. Why churn the stack unnecessarily?

Of course, there are other benefits of using D as the Forth stack index: it frees up the X register, and direct page instructions take 1 less cycle than the indexed equivalents. On the flip side, though, the cycle savings mostly go away, as the low byte of D usually won't be zero and incrementing the index is more involved than when using the X register as an index. Based on a very simplistic view of all words having equal frequency, I'd call these a wash, at least for my code. I want to think that the benefit swings toward using D as an index when word frequency is considered, but I haven't attempted such an analysis.
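
A side-by-side sketch of the tradeoff for a word that changes stack depth. The layouts are assumptions (16-bit cells; TOS at 0,X with D fixed at $0000 in the first version, TOS at offset 1 from D in the second), and the cycle counts are my reading of the WDC datasheet for 16-bit accumulator mode, so treat them as approximate.

```asm
; X as the data stack pointer, D page-aligned
PLUS_X:
  CLC               ; 2
  LDA 2,X           ; 5  (direct page,X)
  ADC 0,X           ; 5
  STA 2,X           ; 5
  INX               ; 2
  INX               ; 2   about 21 cycles before NEXT

; D as the data stack pointer
PLUS_D:
  CLC               ; 2
  LDA <3            ; 4, or 5 when the low byte of D is not zero
  ADC <1            ; 4 or 5
  STA <3            ; 4 or 5
  TDC               ; 2
  INC A             ; 2
  INC A             ; 2
  TCD               ; 2   about 22 to 25 cycles before NEXT
```

Words that leave the stack depth unchanged skip the TDC/TCD sequence, which is where the D version can come out ahead.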

So, is it worth using D as the Forth data stack index just to gain the indirect long mode for the Forth stack? And is it worth it to give up the direct page indexed Y modes?


PostPosted: Mon Jun 27, 2022 1:39 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
tmr4 wrote:
So, is it worth using D as the Forth data stack index just to gain the indirect long mode for the Forth stack? And is it worth it to give up the direct page indexed Y modes?
To me, direct page indexed Y mode doesn't seem important at all. But the first sentence in this quotation poses a valid question. And there are tradeoffs involved, so the "best" answer will vary according to prevailing circumstances and priorities.

I myself am drawn to the idea of using D as the Forth data stack pointer because the '816 has a large address space, and I want my code to be able to adroitly accommodate and manipulate large data sets. For example, a program may collect, search or compare multiple arrays, each of which exceeds 64K. (Being able to exceed 64K of code is, for me, not a priority.)

I'd say the main tradeoff is between...
  • what works best for the Forth words doing the Long accesses, and
  • what works best for all the other Forth words.
The latter won't benefit from using D as the Forth data stack pointer; in fact, there are definitely some speed bumps. For example -- as you mentioned -- incrementing the index is more involved.

OTOH, if you don't use D then the ugly, painful code ends up in the Long access words instead. To reach beyond 64K you'll need gymnastics involving the Data Bank Register. (Hm, or Self Modifying Code, perhaps.)

To summarize, do you want a Forth that's fairly fast overall but bogs down when large data sets are involved? If large data sets are not the main priority then a person can, quite reasonably, choose not to use D as the Forth data stack pointer.

I myself lean in the other direction, because I want an environment that encourages ambitious goals. I don't mind if some Forth words run a little slower if the payoff is being able to achieve maximum efficiency (and elegance) when reaching beyond 64K.

-- Jeff

PostPosted: Mon Jun 27, 2022 3:43 pm 

Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147
Dr Jefyll wrote:
To me, direct page indexed Y mode doesn't seem important at all.
I suppose it depends on one's focus. I use them all when working with strings and tables. Of course, there are always many different ways of doing things, but these modes seem most efficient for such things.

Dr Jefyll wrote:
... OTOH, if you don't use D then the ugly, painful code ends up in the Long access words instead.
Yes. I haven't implemented my long access words yet, but when I've needed the indirect long address mode, I have to copy the address to a scratch area. With the X register as a Forth data stack index, my long access words would have to do the same. Not very elegant!
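
For comparison, a sketch of the scratch-area approach with X as the stack index. It assumes 16-bit cells, the address low word at 0,X with the bank byte in the low half of the cell at 2,X, and a hypothetical 4-byte direct page location named SCRATCH.

```asm
  LDA 0,X           ; low 16 bits of the 24-bit address
  STA SCRATCH
  LDA 2,X           ; bank byte, in the low half of the next cell
  STA SCRATCH+2     ; 16-bit store, so SCRATCH+3 is written too
  LDA [SCRATCH]     ; direct page indirect long fetch
```

The two extra load/store pairs are the cost of not having a long indirect mode that is also indexed by X.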

Dr Jefyll wrote:
To summarize, do you want a Forth that's fairly fast overall but bogs down when large data sets are involved? If large data sets are not the main priority then a person can, quite reasonably, choose not to use D as the Forth data stack pointer.

I myself lean in the other direction, because I want an environment that encourages ambitious goals. I don't mind if some Forth words run a little slower if the payoff is being able to achieve maximum efficiency (and elegance) when reaching beyond 64K.
I figured "it depends" would be the current assessment. For me, while I really like the idea of using indirect long addressing on the Forth data stack, I don't want to give up the direct page indexed Y modes. I'll have to live with less efficient long access words, for now at least.


PostPosted: Mon Jun 27, 2022 7:08 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
tmr4 wrote:
I figured "it depends" would be the current assessment.
Yup. The "correct" answer will be determined by your circumstances and priorities.

tmr4 wrote:
For me, while I really like the idea of using indirect long addressing on the Forth data stack, I don't want to give up the direct page indexed Y modes.
Can you give an example of how you're using the direct page indexed Y modes? Somehow I'm drawing a blank on this.

-- Jeff

PostPosted: Mon Jun 27, 2022 8:36 pm 

Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147
Dr Jefyll wrote:
Can you give an example of how you're using the direct page indexed Y modes? Somehow I'm drawing a blank on this.

At the risk of exposing my poor coding skills to the world, here is a snippet from my 65816 STC Forth code that compares a dictionary entry against a word in the input buffer (suggested improvements appreciated):
Code:
        a8
@cloop:
        lda [F2],y              ; compare dictionary entry with word in work buffer
        cmp (W1),y              ; matched so far?
        bne @2                  ; not the same, continue
        dey                     
        bpl @cloop              ; continue
        a16

Y is initialized to the word length-1. F2 and W1 are direct page. F2 points to the word's ascii representation in its dictionary entry. My Forth dictionary is located in Bank 0. W1 points to the current parsed word in an input buffer which is in a bank dedicated to the current Forth instance.

The same thing can be done decrementing F2 and W1 directly, but it's not as efficient as just decrementing Y.


PostPosted: Tue Jun 28, 2022 12:22 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
Code:
        lda [F2],y              ; compare dictionary entry with word in work buffer
        cmp (W1),y              ; matched so far?

Now I'm wondering if you simply made a typo when you said, "I don't want to give up the direct page indexed Y modes." I think the modes used in the snippet above are called direct page indirect indexed.

Also, I don't see why this code wouldn't work if D is being used as the data stack pointer. Wouldn't you want F2 and W1 to be items on the data stack? You could do something like this...
Code:
        lda [tos],y              ; compare dictionary entry with word in work buffer
        cmp (nos),y              ; matched so far?
... where tos is a named value meaning 0, and nos likewise is 4. Am I missing something still?

-- Jeff
