6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Fri Nov 22, 2024 8:32 am

All times are UTC




Post new topic Reply to topic  [ 16 posts ]  Go to page 1, 2  Next
Author Message
PostPosted: Thu Dec 08, 2016 2:34 pm 
Offline

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
I'm trying to figure out a good way to check for over- and underflow with Liara Forth for the 265SXB (65186). Tali Forth pretty much just ignored the whole problem, but we're trying to up our game here somewhat, you know, like a real Forth or something. This while putting speed before size (within reason).

Part of the problem is that I'm keeping TOS in A, which makes a lot of words a lot faster. NOS is then on a Direct Page with X as the Data Stack Pointer (so lda.dx 00 / LDA $00,X is the second stack entry, lda.dx 02 / LDA $02,X third stack entry, etc).

So far, the only thing I can think of is creating a counter in the Direct Page (Data Stack Counter, DSC) and DEC/INC it parallel with stack operations. The count itself would start with zero, not one, so that $FFFF would signal an empty stack, which is easy to test for with help of the n flag. Overflow is a bit more tricky, but I'm worried less about that, having yet to ever have an actual overflow in real life Forth usage.

This way, DROP becomes
Code:
lda.dx 00 ; (LDA $00,X) move NOS to TOS
inx
inx
dec.d dsc ; (DEC DSC) decrease Data Stack Counter
rts
We check for underflow after we return to the main interpretation loop with something like
Code:
ldy.d dsc ; (LDY DSC) Y is used as scratchpad register
bmi underflow_error
What I don't like is that I'm getting into a bookkeeping nightmare where one wrong dec.d/inc.d would mess stuff up royally. Also, this seems somewhat inelegant, though still better than an outright comparison. This can't be a new problem - anybody have suggestions?


Top
 Profile  
Reply with quote  
PostPosted: Thu Dec 08, 2016 5:20 pm 
Offline
User avatar

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
I can think of some solutions involving hardware -- and not necessarily a lot of it. You could periodically STX to a port whose upper bits, if altered, trigger an interrupt. (Come to think of it, only one bit need be checked.) The lower bits would define the permissible stack space (a power of two, of course). Or you could have memory decoding that detects access to dedicated "no fly" zones, one above and one below the stack area, likewise triggering an interrupt. But I suspect you want a software-only solution.


I'd move away from the idea of testing the stack pointer within every word that grows or shrinks the stack. An alternative solution is to check less frequently (and less precisely). This is off the top of my head.

At key places the compiler could insert an actual Forth word *specifically* for checking the stack. The word would get inserted once within every DO LOOP and likewise within conditionals that can cause looping, such as BEGIN UNTIL and BEGIN WHILE REPEAT. Or -- better yet -- the branching code compiled by these conditionals could do the stack check. Any backward branch.

The premise is you're unlikely to greatly over- or underflow the stack during the course of in-line execution. You might exceed the stack limits by a few cells, but that's acceptable if the stack "limits" are made a bit smaller than the actual amount of RAM allocated. IOW, there'd be a cushion, and it's alright if enforcement of the limit is somewhat inexact.

Am I missing anything? It seems to me you'd need looping in order to produce an over- or underflow that exceeds the built-in tolerance. And we've got the looping aspect covered.

Edit: Recursion. Hmmm... You could have a stack-check word that you'd manually include in any word that calls itself. Better to have something automatic, though. Edit ps: besides the parameter stack, with recursion it'd be good to check the Return stack, too.

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Top
 Profile  
Reply with quote  
PostPosted: Thu Dec 08, 2016 8:48 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
scotws wrote:
This way, DROP becomes
Code:
lda.dx 00 ; (LDA $00,X) move NOS to TOS
inx
inx
dec.d dsc ; (DEC DSC) decrease Data Stack Counter
rts

If I counted right, including the JSR to it, that's 27 clocks, instead of just the four clocks for INX, INX. Are you sure you want to do that? It's seven times as slow. It seems like it's trying to solve a problem that doesn't exist. Just have QUIT show the stack depth with Ok { 0 } (or whatever the depth is) and have INTERPRETER use ABORT" Empty stack".

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 09, 2016 12:01 am 
Offline

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Dr Jefyll wrote:
The premise is you're unlikely to greatly over- or underflow the stack during the course of in-line execution. You might exceed the stack limits by a few cells, but that's acceptable if the stack "limits" are made a bit smaller than the actual amount of RAM allocated.
That's the way Tali Forth does things, with "flood plains" that the system could march all over without much harm being done. I'm hoping to be a bit more precise this time, but if this does turn out to be such a drag on speed (the higher priority, because I certainly have enough memory this time), I might go back to that ...


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 09, 2016 1:06 am 
Offline

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
GARTHWILSON wrote:
Are you sure you want to do that? It's seven times as slow.
Well, I sat down and compared some implementation speeds of common words for DP-only Data Stacks (X as DSP), and each of A or Y as TOS - see https://github.com/scotws/LiaraForth/bl ... ariants.md for a summary and https://github.com/scotws/LiaraForth/bl ... narios.txt for some of the implementations (the others I did on paper). DROP is pretty much the only case where DP-only is faster; note the times are without JSR/RTS because these short native coded commands will be inlined anyway. - Actually, I see there is an error in the summary, NIP is INX INX for A and Y as TOS, but then how often do you use NIP anyway?

(It is surprisingly hard to get a list of how frequently Forth words are used. I found https://users.ece.cmu.edu/~koopman/stac ... ec6_3.html which is very biased towards heavy math stuff, with Tower of Hanoi and the N Queens problem. Obviously, there is a lot of CALL and EXIT, which in my STC Forth is taken care of by JSR/RTS. What I have realized since then is that I should analyze the word frequency in my own Crude Emulator, because that reflects my coding preferences - like DEFERs all over the place - which is what realistically speaking will really count.)

So that is where the idea of TOS in either A or Y came from, but I hadn't thought about over- and underflow checking. Any stupid counter would destroy that speed advantage.

I think I'm going to have to go back to the drawing board though and rethink the Data Stack. Currently, X and Y are 16 bit by default, because, ooh, shiny new 65816! But the nice thing about a 8-bit Data Stack Pointer is that you can declare the stack as 128 bytes (as 64 cells is more than enough) and then just check the MSB of X for anything that falls out of that range. That, however, would remove the ability to quickly shuffle around the TOS with TAY/TYA (Y is currently the scratch register), and then the whole "A as TOS" might not be worth it any more.

It certainly adds complexity to the code, that I can already say.


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 09, 2016 7:19 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
scotws wrote:
It is surprisingly hard to get a list of how frequently Forth words are used.

You could add a mechanism to keep track of usage frequency in your own projects, and optimize for that.


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 09, 2016 7:44 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
If it's not STC, it wouldn't be too hard to put it in NEXT. So where would the count go? Ideally there might be a counter field in every header. If it's an assembly source, you could modify the header-forming macro, and otherwise, the definition of CREATE, so there's not too much work. Obviously that won't work if it's in ROM though.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 09, 2016 3:56 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
viewtopic.php?f=9&t=3293&hilit=ertl

Mike B.


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 09, 2016 5:17 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
> Dynamic trace results for three complex programs
Looks great - thanks for collecting and publishing the data!


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 09, 2016 6:17 pm 
Offline
User avatar

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Dr Jefyll wrote:
[...]
scotws wrote:
That's the way Tali Forth does things, with "flood plains" that the system could march all over without much harm being done. I'm hoping to be a bit more precise this time, but if this does turn out to be such a drag on speed (the higher priority, because I certainly have enough memory this time), I might go back to that ...
Just making sure we're on the same page here. "Flood plains"?? Nice imagery! :) But what you describe seems different from what I propose. Yes there'd be spaces added for padding, but they'd be small, and their role is secondary.

The gist of the idea is to avoid the need to be highly precise about checking. If you allow a bit of slop then the bounds check can be performed much less frequently (as outlined in my post). That's the key point -- less frequent checking so there'll be less drag on speed. The pad space would only need to be big enough to hold as many cells as are likely to be pushed or popped between checks. Apologies if I'm overlooking something.

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 09, 2016 8:51 pm 
Offline

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Okay, let me try to be a bit more systematic about the whole thing. The way I see it, the Data Stack can have four states, two of them legal, two of them errors:

    Empty: legal
    One or more entries: legal
    Overflow: error
    Underflow: error

(I think part of my problem so far was to consider "underflow" and "empty" to be a single case.)

A word can rightfully expect to start off with the Data Stack in a legal state, that is, either empty or with one or more entries. In fact, it doesn't need to check for over- or underflow at all, because those are fatal errors and we crash to ABORT anyway. As stated above, we can move the check for these outright error conditions to the interpreting loop after the word is completed, which saves space.

It's the legal (!) empty status that can be the real problem. Some words like 1+ only work if there is a non-empty stack; SWAP needs at least two entries, ROT even three. We can either accept those errors silently (which is fastest) or check in the word itself (which takes longer). After some thought, I think Liara Forth should be sophisticated enough to catch them, so I'm modifying the "speed first" goal by an old journalism rule:

Be first, but first be right!

So how do we do that? Using X as the Data Stack Pointer (DSP) pretty much seems the way to go because of the addressing modes on the Direct Page. With the help of wrapping, we can use the MSB of a byte to signal over- or underflow and have 128 bytes of space, which gives us a stack depth of 64 16-bit cells: Simply check the n flag of X after the word returns. We can even do this with a 16 bit X register with something like:
Code:
txa
and.# 0080 ; (AND #$0080) mask all but over/underflow bit
bne stack_out_of_bounds
(Note that by playing with the operand of the AND instruction, we can even adjust the legal depth of the Data Stack). AND is three cycles, which is an acceptable price for being able to keep X and Y 16 bit wide.

How to we signal an empty stack? The first instinct would be to use $0000 in X. However, we want something that triggers underflow for (say) DROP automatically, and INX INX just gives us $0002. So we should go with $007E as the "empty" value: INX INX produces $0080. For a word that requires one entry on the stack, we now include in its code the check
Code:
cpx.# 007e ; (CPX #$007E)
beq stack_empty
which costs us five cycles if the branch is not taken ("... but first be right").

With $007E in X signaling an empty stack, we shouldn't have to care if TOS is an register such as A or Y or part of the DS (I think). With Y as TOS, for 1+ we get
Code:
cpx.# 007e
beq error_stack_empty
iny
rts
and with TOS on DP we get
Code:
cpx.# 007e
beq error_stack_empty
inc.dx 00
rts
though we really don't want to do that because INC $00,X is brutal eight (!) cycles, while INY is only two. So TOS would probably be Y, not A, because we need A for the ANDing etc.

At least that's the idea at the moment. I'm going to have to do some walkthroughs with pencil and paper first before I trust this, though.


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 09, 2016 9:52 pm 
Offline

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
barrym95838 wrote:
http://forum.6502.org/viewtopic.php?f=9&t=3293&hilit=ertl
Ah, I had missed that. Thank you!


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 09, 2016 9:52 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
There's a joke, something like this:
Two mathematicians are sitting in a cafe watching the house on the other side of the street. They see two people going into the house. Time passes. After a while they notice three people coming out... "Now if one more person goes in, it will be empty."

[If you check after a function, it's not enough to say the stack is healthy at that point. Watch out for underflow.]


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 09, 2016 10:25 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
scotws wrote:
It's the legal (!) empty status that can be the real problem. Some words like 1+ only work if there is a non-empty stack; SWAP needs at least two entries, ROT even three. We can either accept those errors silently (which is fastest) or check in the word itself (which takes longer).

I left 8 bytes behind there so things like that wouldn't do any damage. After all the intended execution of a word being tested, any empty-stack condition is caught by INTERPRET with ABORT" Empty stack". This is only in development, and is never needed after development is completed, and has no effect on execution speed. You could still mess it up with something like 5 ROLL but how often are you going to do that and have inadequate depth? In my 25+ years of playing with this, I've never had the problem.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 09, 2016 11:43 pm 
Offline

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
GARTHWILSON wrote:
In my 25+ years of playing with this, I've never had the problem.
Good point. I've also redone some of the words with the checks included (because measuring stuff is better than thinking about it, right?), and the number of cycles this adds to the words is sickening. Take @ (FETCH):
Code:
           TOS DP             TOS Y REG

        === @ ( addr -- u ) ================

        cpx.# 007e   3/3
        bne a        2/3      (as TOS DS) 11/6
        lda.# e_code 3/(3)
        jmp error    3/(3)

      a ldy.dx 00    2/5    a lda.y 0000  3/5
        lda.y 0000   3/5      tay         1/2
        sta.dx 00    2/5

                    -----               ------
                    18/21                15/13
Left side is the TOS in Direct Page, right in the Y register, the numbers are bytes/cycles, speed is best case scenario. "First be right" is all well and good, but yikes, that is a pretty major hit. I believe Terry Pratchett had this quote about idealism not having much chance against large numbers ...


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 16 posts ]  Go to page 1, 2  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 2 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: