
All times are UTC




Posted: Fri Dec 12, 2014 3:59 am
Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
Making a Forth seems interesting, now that R1 is actually somewhat underway. While I wait for parts, this'll keep me occupied, so... :P

I wanted this to take advantage of the '816's extra addressing modes, especially the movable direct page, so here are some of my thoughts. I'd like some feedback on this. I DO want to be a bit more creative than, say, "use X and Y solely for indirect calculations and make them otherwise volatile." Since I don't have many registers to play with, and since the '816 doesn't have a huge penalty for memory access, I might as well use X and Y to keep track of the two stacks in some manner that takes advantage of their unique addressing modes (as you can see, I haven't figured out how yet).

All registers are 16-bit, unless explicitly changed or an operation requires a char. All commands, internal or new ones, must start in 16-bit A/16-bit XY and finish in it.

SP: Pointer to the Data Stack. ?kb
DP: Always points to the Page where the SP currently resides. This is so DP modes can be used to quickly modify the stacks.
Y: Complements SP?
X: Pointer to Return Stack?
C: Math operations, of course!
DBR: Bank 1, Points to the dictionary. ?kB
PBR: Bank 0, Points to all code implemented by Forth Words. ?kB
Bank 2: Default bank for loading/storing data from the data stack when @/!
Other banks: Unspecified- can be compiled code, new Forth Words, or Data.

Threading Model:
Initially I was planning on what Wikipedia calls string threading (just use a hash table to quickly compare against the dictionary), but I defer to the experts: What's a good tradeoff between speed and ease of implementation? IIRC, JonesForth uses Indirect Threading, because otherwise constructs like "creating new Forth words from combining existing Forth Words" aren't possible with Direct.

Dictionary:
Hash table with words as keys and addresses in Bank 0 as values.* That is, if I can find a good hash and/or can be bothered to implement one in assembly :P.

*This does not handle "creating new Forth words from combining existing Forth Words". Again, if anyone can give advice, I'm all ears.


Posted: Fri Dec 12, 2014 4:46 am
Joined: Sun Jul 28, 2013 12:59 am
Posts: 235
cr1901 wrote:
Threading Model:
Initially I was planning on what Wikipedia calls string threading (just use a hash table to quickly compare against the dictionary), but I defer to the experts: What's a good tradeoff between speed and ease of implementation? IIRC, JonesForth uses Indirect Threading, because otherwise constructs like "creating new Forth words from combining existing Forth Words" aren't possible with Direct.

Your basic options for threading models are Indirect Threading, Direct Threading, Subroutine Threading, and Token Threading. Treat String Threading with the contempt that it deserves as a Forth implementation technique.

I've used Indirect Threading to good effect, and it is the classic model. I've seen a non-Forth stack VM (running on microcoded hardware) use Token Threading to good effect, and I imagine that it would be decent on an '02 or '816 as well. There have been a couple of threads around here on using Subroutine Threading to good effect on an '02 or '816. I don't really have anything to say about Direct Threading, though. For more information (including a discussion of tradeoffs and how to implement some of the tricky bits of the Forth model) have a look at http://www.bradrodriguez.com/papers/moving1.htm and its successor pages (the author is one of the members here, IIRC).

Also, take a lot of what JonesForth does and its commentary with not just a pinch of salt but an entire salt lick. I'm currently using what claims to be jonesforth.f 1.17 and jonesforth.S 1.45 (jonesforth version 45?) as a bootstrap host for my own x86/linux forth implementation (as of two days ago it finally can metacompile itself and produce a bit-identical image to what JonesForth produces, but it was a pain getting to that point). My "jonesforth-fixes" file contains a number of comments "Jonesforth has a non-conforming XXX. Conform it." for various values of XXX, and implements DOES> which the JonesForth source claims is impossible with their model.

Writing your own Forth can be great fun, and a great challenge. I wish you well with it.


Posted: Fri Dec 12, 2014 6:51 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8491
Location: Midwestern USA
cr1901 wrote:
SP: Pointer to the Data Stack. ?kb
DP: Always points to the Page where the SP currently resides. This is so DP modes can be used to quickly modify the stacks.
Y: Complements SP?
X: Pointer to Return Stack?
C: Math operations, of course!
DBR: Bank 1, Points to the dictionary. ?kB
PBR: Bank 0, Points to all code implemented by Forth Words. ?kB
Bank 2: Default bank for loading/storing data from the data stack when @/!
Other banks: Unspecified- can be compiled code, new Forth Words, or Data.

Keep in mind that regardless of where you set DP and SP, direct page and stack accesses can only occur in bank $00, the first 64KB of the address space. Also, interrupt vectors must be in bank $00 and at least some of the interrupt code must be there as well. Executable code can run anywhere as long as each routine fits into a single bank: there is no instruction that loads PB directly; it changes only as a side effect of long jumps and calls (JML, JSL), their returns (RTL, RTI), and interrupts.

DP should always point to the zeroth address in a page to maintain performance, meaning DP should always be $xx00, where xx is any page. If DP points to any other location in a page, there will be a one clock cycle penalty per direct page access, producing the same level of performance seen with 16-bit absolute accesses.
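
A minimal sketch of setting a page-aligned direct page, assuming native mode with 16-bit registers. The address $0200 is only a placeholder and does not come from this thread.
Code:
   rep #$30     ;16-bit accumulator and index registers
   lda #$0200   ;hypothetical page-aligned base: low byte is $00
   tcd          ;D = $0200, so direct page accesses avoid the extra cycle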

Your data structures can be anywhere without regard to bank boundaries, assuming you are using [<dp>] and/or [<dp>],Y addressing. You don't have to be concerned with DB when addressing memory in this way. However, keep in mind that when an interrupt hits, DB may not be pointing where you want it to be, requiring that you push DB during the ISR front end and load an alternate value into DB. Otherwise, your ISR has to use 24-bit addressing, which uses more clock cycles than 16-bit addressing.
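
As a rough illustration of such an ISR front end, the sketch below assumes native mode and a handler whose data lives in the bank it runs in, so DB is simply set equal to PB. The label is made up.
Code:
isr:             ;hypothetical native-mode interrupt handler
   rep #$30      ;force 16-bit registers; RTI restores the interrupted widths
   pha
   phx
   phy
   phb           ;save the interrupted code's DB
   phk
   plb           ;DB = PB, the bank this handler runs in
   ;... service the interrupt using 16-bit absolute addressing ...
   ;(registers must be 16-bit again before the pulls below)
   plb           ;restore the interrupted DB
   ply
   plx
   pla
   rti           ;pulls P, PC and PB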

Don't be too rigid with your memory mapping.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Wed Dec 31, 2014 3:24 pm
Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Threading models: The best introduction I know of is http://www.bradrodriguez.com/papers/moving1.htm by Brad Rodriguez (cyberstalking Brad is generally a good way to learn about Forth). The traditional view seems to be that Direct Threading (DTC) produces the best speed, Token Threading (TTC) gives you the smallest size, Subroutine Threading (STC) is the simplest to understand, and Indirect Threading (ITC) is mostly of historical importance.
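
To make the four models concrete, here is roughly how a definition such as : SQUARE DUP * ; could be laid out under each of them. This is only a layout sketch: every label is hypothetical and assumed to be defined elsewhere, and .dw and .db stand for the assembler's word and byte data directives (other assemblers spell them differently).
Code:
;ITC: the code field holds the ADDRESS of machine code
square_itc:
   .dw docol               ;code field: points at the shared DOCOL routine
   .dw dup_cfa, star_cfa   ;thread: code field addresses of DUP and *
   .dw semis_cfa           ;returns to the caller's thread

;DTC: the code field IS machine code
square_dtc:
   jsr docol_dtc           ;the pushed return address locates the thread that follows
   .dw dup_code, star_code ;thread: addresses of machine code
   .dw semis_code

;STC: the whole definition is machine code
square_stc:
   jsr dup_sub
   jsr star_sub
   rts

;TTC: the thread is a list of small tokens that index an address table
square_ttc:                ;(code field omitted)
   .db TOK_DUP, TOK_STAR, TOK_SEMIS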

However, this is all for processors that have enough registers for W and IP and stuff. We're stuck with A, X, and Y, so I would argue that SBC should be given more serious consideration than with, say, a 68000. This does away with various registers at the cost of size, which shouldn't be that big of a problem with a 65816. If this is your first Forth, SBC is by far the simplest to understand, and the system stack almost automatically becomes the Return Stack. This gives you X as the Data Stack Pointer on the Direct Page (Zero Page with the 65c02), and Y to fool around with.
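
A sketch of what that can look like, assuming 16-bit registers, a data stack that grows downward in the Direct Page with X pointing at the top cell, and made-up labels:
Code:
dup_sub:           ;DUP ( n -- n n )
   dex
   dex             ;make room for one cell
   lda 2,x         ;old top of stack
   sta 0,x
   rts

plus_sub:          ;+ ( n1 n2 -- n1+n2 )
   clc
   lda 0,x         ;n2
   adc 2,x         ;n2 + n1
   sta 2,x
   inx
   inx             ;drop one cell
   rts

double_sub:        ;: DOUBLE DUP + ;  compiles to plain subroutine calls
   jsr dup_sub
   jsr plus_sub
   rts             ;the hardware stack serves as the Return Stack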


Posted: Wed Dec 31, 2014 7:29 pm
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8541
Location: Southern California
Three of Bruce's relevant posts, from the Forth section of my links page:
viewtopic.php?t=584 Bruce Clark's 2-instruction 65816 NEXT in ITC Forth
viewtopic.php?t=586 Bruce Clark's single-instruction, 6-clock 65816 NEXT in DTC Forth
viewtopic.php?p=3335: Bruce Clark explains how the faster-running STC Forth avoids the expected memory penalties. He gives 9 reasons, starting in the middle of his long post in the middle of the page. STC of course eliminates the need for NEXT, nest, and unnest, thus improving speed.

Quote:
so I would argue that SBC should be given more serious consideration[...] SBC is by far the simplest to understand,

Autocorrect problem there? The only meaning I can think of for "SBC" is "single-board computer."

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Thu Jan 01, 2015 12:52 am
Joined: Sun Jul 28, 2013 12:59 am
Posts: 235
GARTHWILSON wrote:
Quote:
so I would argue that SBC should be given more serious consideration[...] SBC is by far the simplest to understand,

Autocorrect problem there? The only meaning I can think of for "SBC" is "single-board computer."

Odd, the first thing that comes to my mind is SuBtract with Carry.


Posted: Thu Jan 01, 2015 1:30 am
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
I'm almost positive that Scot meant "STC" ... it makes the most sense in the context of the 6502's paucity of hardware registers. IP is PC, and everything else he said falls into line if you re-read his post with STC in place of SBC.

Mike


Posted: Thu Jan 01, 2015 4:08 pm
Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Argh, yes, that would be STC of course. Can't even blame it on autocorrect, I'm on night shift, and that cuts my IQ by half. Thanks for the links, Bruce's discussion of STC (not SBC) was fascinating.

Oh, and happy New Year everybody!


Posted: Sat Jan 31, 2015 3:52 am
Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
Here's what I have so far... any critique? I should get this sorted out now before I implement too many commands. It is based upon what I saw in Brad's articles. Y might be usable for something else, as my current NEXT implementation is able to go without an X working register despite being ITC.

Code:
65816 Forth VM Model:
X- Data Stack Pointer
S- Return Stack Pointer
PB- Zero for ROM Code
DB- Zero for Dictionary (for now)
D- Arbitrary in Bank 0, page-aligned

C- "Cache" for W Work Register
Y- X Work Register, used for indirect calculations

IP- Direct Page
W- Direct Page
UP- Direct Page,

where:
W- Work Register (used for indirect)
IP- Forth Instruction Pointer
PSP- Parameter/Data Stack
RSP- Return Stack Pointer
X- Second Work Register (used for second indirect)
UP- User Pointer

The goal is to eventually implement an ANS Forth system with extensions for FAR store, FAR load, FAR execute, and FAR compile (with the restriction that code to be executed must be in the same bank as its dictionary entry), etc. Perhaps FAR compile might even be the default for colon.
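
For the FAR load, one possible shape is sketched below. It assumes 16-bit registers, X as the data stack pointer in the direct page, and a hypothetical 4-byte direct-page scratch area called FarPtr. The word name and stack effect are illustrative guesses, not a settled design. NEXT is the macro given further down.
Code:
;F@ ( addr bank -- x ): fetch a cell from bank:addr (hypothetical word)
far_fetch:
   lda 2,x         ;16-bit address within the bank
   sta FarPtr
   lda 0,x         ;bank number in the low byte
   sta FarPtr+2    ;16-bit store, so FarPtr+3 is overwritten as well
   lda [FarPtr]    ;direct page indirect long: uses all 24 bits, ignores DB
   inx
   inx             ;drop the bank cell
   sta 0,x         ;replace addr with the fetched value
   NEXT            ;continue with the next word in the thread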

Here are my NEXT, DOCOL, and SEMI_S implementations. ForthIP and ForthCodeAddr are in the direct page. I take advantage of direct page indirect addressing to get the required second indirection to jump to a new thread. A (currently unused) User Pointer is also in the direct page.
Code:
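;Pseudocode:
;MOV W, (IP)
;IP++
;MOV X, (W)
;JMP (X)
.MACRO NEXT
   lda (ForthIP)       ;W = (IP): code field address of the next word
   sta ForthCodeAddr   ;keep W in the direct page for DOCOL
   inc ForthIP
   inc ForthIP         ;Add two to IP
   lda (ForthCodeAddr) ;X = (W): machine-code address stored in the code field
   dec a               ;RTS resumes at the pulled address + 1
   pha
   rts                 ;Jump to the next code fragment, i.e. JMP (X)
.ENDM

;Pseudocode
;MOV (RSP), IP
;RSP++
;ADD W, 2
;MOV IP, W
;Next:
;MOV W, (IP)
;IP++
;MOV X, (W)
;JMP (X)
.MACRO DOCOL
   lda ForthIP
   pha                 ;Push IP on the return stack (hardware stack)
   lda ForthCodeAddr   ;W, left in the direct page by NEXT
   inc a
   inc a               ;W + 2 = parameter field of the word being entered
   sta ForthIP         ;IP = W + 2
   NEXT
.ENDM

;Pseudocode
;MOV IP, (RSP)
;RSP--
;Next:
;MOV W, (IP)
;IP++
;MOV X, (W)
;JMP (X)
.MACRO SEMI_S
   pla
   sta ForthIP         ;IP = saved IP (a plain store, not an indirect one)
   NEXT
.ENDM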


Posted: Sat Jan 31, 2015 6:03 am
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
I can't back up what I'm about to say with actual evidence or experience, just strong intuition ... You should use Y for IP. Your NEXT, DOCOLON and ;S get heavy use, and might benefit greatly from it (untested ITC snips):
Code:
.MACRO NEXT
   lda 0,y
   sta W
   iny
   iny             ;Add two to IP. W has address of next code address to execute
   jmp (W)
.ENDM

.MACRO DOCOL
   phy
   ldy W
   iny
   iny
   lda 0,y
   sta W
   iny
   iny             ;Add two to IP. W has address of next code address to execute
   jmp (W)
.ENDM

.MACRO SEMI_S
   ply
   lda 0,y
   sta W
   iny
   iny             ;Add two to IP. W has address of next code address to execute
   jmp (W)
.ENDM


Mike B.


Last edited by barrym95838 on Sat Jan 31, 2015 9:35 am, edited 1 time in total.

Posted: Sat Jan 31, 2015 9:25 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10980
Location: England
This might be blindingly obvious cr1901, but please could you clarify what mode you're using for the 816? I'd assume 16/16 but it might be best to be explicit. That is, I think you're in native mode with both mode bits clear:
Code:
CLC
XCE
REP #$30


Posted: Sat Jan 31, 2015 9:50 am
Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
BigEd wrote:
This might be blindingly obvious cr1901, but please could you clarify what mode you're using for the 816? I'd assume 16/16 but it might be best to be explicit. That is, I think you're in native mode with both mode bits clear:
Code:
CLC
XCE
REP #$30

Indeed, your assumption is correct. If I need to do byte access, I'll switch the accumulator to 8 bits briefly. Ideally, however, I should never need to change the width of the index registers.

I have an '816... might as well use it.
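
For instance, a byte fetch might look like the sketch below. It assumes X is the data stack pointer in the direct page, and that the assembler is told about the accumulator width changes wherever it does not track REP/SEP by itself.
Code:
;C@ ( addr -- char ), body only
   sep #$20        ;8-bit accumulator
   lda (0,x)       ;load one byte through the address on top of the stack
   rep #$20        ;back to 16-bit accumulator
   and #$00ff      ;the 8-bit LDA left the high byte of C unchanged, so clear it
   sta 0,x         ;replace addr with the character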


Posted: Sat Jan 31, 2015 9:59 am
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
BigEd wrote:
This might be blindingly obvious cr1901, but please could you clarify what mode you're using for the 816? I'd assume 16/16 but it might be best to be explicit. That is, I think you're in native mode with both mode bits clear:
Code:
CLC
XCE
REP #$30

Those were my assumptions in the code that I just edited in. I can't see any way to put an entire dictionary in a direct page, although I wouldn't be completely surprised if someone like Bruce could imagine a way, with split entries, multiple DPs and token threading or something equally intense ... 8)

Mike B.


Posted: Sat Jan 31, 2015 2:58 pm
Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
Something I just realized (Brad or someone else, correct me if I'm wrong): Forth primitives which are coded in ASM can use W as they wish, as NEXT will reload it based on the current Forth IP. The only primitive snippet of code that actually relies on W being a certain value on input (pointing to the address of the Code Field of the currently entered word) is DOCOL. I assume/hope DOCOL is always going to be executed after a NEXT that "refreshes" W, even when entered from the interpreter. Therefore, the A/C accumulator is available for use and should be W.

I have not figured out the best way to execute a single word and return to the interpreter loop. This means I may have to set W manually in the interpreter loop if I need to jump straight into DOCOL without executing NEXT.

To that end, I think my final register allocation should be the following:
A- W
Y- IP
X- PSP
S- RSP
Direct Page- User Pointer, Mirror of W that can be used to do the second indirection (pseudo-X).


Posted: Sat Jan 31, 2015 4:26 pm
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Code size shouldn't be a major concern on the '816, but if you decide to save some, you could adopt the convention that every CFA holds the address, minus one, of the machine code to which it points. Then STA W would become PHA and JMP (W) would become RTS in my examples above ... at least I think it could work that way.
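
Applied literally to the NEXT macro from the earlier Y-as-IP post, that substitution would give something like the following sketch. It is equally untested, and it assumes the fetched value is the target address minus one.
Code:
.MACRO NEXT
   lda 0,y         ;fetch the next cell: target address - 1 under this convention
   pha             ;was: sta W
   iny
   iny             ;Add two to IP
   rts             ;was: jmp (W); RTS pulls the address and adds 1
.ENDM

W is no longer stored in memory here, but A still holds the fetched value when the target code is entered.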

Mike B.

