6502.org • View topic

All times are UTC

Extra stacks

GARTHWILSON

Post subject: Extra stacks

Posted: Fri Jul 09, 2004 7:06 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California

I've already discussed this briefly with Bruce but I'll open it up here to the forum.

BACKGROUND

When Forth does floating-point arithmetic, it often uses the regular data stack for FP operations. A FP representation might still be only two cells (like a double-precision number). In the case of the 6502 where a cell is normally 16 bits (so a double is 32 bits), such a FP representation might use three bytes for mantissa and one for exponent.

There have been times I want more precision than you normally get with the 16-bit cells and the 32-bit intermediate results of UM/MOD and M* etc.; but I don't want the complexity or performance penalty of FP, and I don't want the stack mess that comes with having several very long items on the regular data stack (like 4 cells per item). It makes it a nightmare if you have to reach back in the stack with ROLL, PICK, and so on, and keep things straight. I have triple- and quad-precision integer Forth words from Forth Dimensions magazine, but they use the regular data stack, causing the same stack-management difficulties.

THE IDEA

It's common to have a separate FP stack, especially if the FP representations are longer than a regular double; but has anyone ever thought of having a separate stack for double-precision scaled-integer / fixed-point math? A standard cell on this stack could be 4 bytes and a "double" (like the intermediate results of */ ) would be 8, instead of the 2 and 4 bytes of the normal data stack. 64 bits gives almost a ±1E19 range (about ±9.2E18 signed).
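A quick way to check those ranges, as a Python sketch rather than Forth. It assumes signed two's-complement cells, and the function name is made up for illustration:

```python
def signed_range(bits):
    """Return (min, max) of a two's-complement integer of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

# Normal data stack: 16-bit cell, 32-bit double.
# Proposed high-precision stack: 32-bit cell, 64-bit double.
for label, bits in [("cell", 16), ("double", 32), ("H cell", 32), ("H double", 64)]:
    lo, hi = signed_range(bits)
    print(f"{label:9s} {bits:2d} bits: {lo} .. {hi}")
```

The 64-bit line comes out a little over ±9.2E18, which is where the "almost ±1E19" figure comes from.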

Additionally, it could be a complex stack, with the imaginary part simply being ignored in real-only arithmetic. The various operators could even check for 0i in the involved cells and determine automatically whether complex-number arithmetic is necessary. Hmmm... that raises the issue of initialization.

NAMING CONVENTIONS?

The FP stack arithmetic words are always F* F/ F+ etc., and normal stack integer double-precision words are like D+, so maybe the higher-precision-stack words should start with H, like H+ H* H*/ HDROP HDUP H2DUP HROT etc., unless you know of something else that is already somewhat standardized. Then of course there would be words to transfer things between stacks, like >H H> etc.

WHERE IN 6502 MEMORY?

My original idea for 6502 implementation would be to have it start at the other end of available ZP (or DP, in the case of the '816) space from the regular data stack, and have them grow toward each other, so that all the free space is all together in the middle and you'll never have the situation where one stack is out of space while the other one has plenty of unused space.

Then Bruce suggested that since the numbers on the high-precision stack won't be addresses needing ZP for the indirect addressing modes, this stack could be kept anywhere in RAM; and that furthermore, the various bytes of a "cell" would not have to be kept together, so indexing can be made easier with for example TOSbyte1,X, TOSbyte2,X, TOSbyte3,X, etc., where the value in X is always 0 for TOS (top of stack), 1 for the next "cell", etc. That idea gets even better when considering a complex stack where the doubles really eat up memory (16 bytes each!)

If the stack were limited to 8 levels of complex doubles (i.e., same as 16 complex singles), it would take half a page of the 6502 memory map, which is not bad as long as it doesn't have to be in ZP. The greater consideration might be for the memory needed by the dozens of extra words (H>D, D>H, H*, H_OVER, HDUP, etc.).
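The split-byte layout and the half-page figure can be modeled roughly as follows. This is a Python sketch, not 6502 code; the names `planes`, `store`, and `fetch` and the 8-slot depth are illustrative assumptions:

```python
BYTES_PER_CELL = 4      # 32-bit high-precision cell (from the post)
DEPTH = 8               # hypothetical number of slots

# One array ("plane") per byte position, like TOSbyte1,X TOSbyte2,X ...
planes = [bytearray(DEPTH) for _ in range(BYTES_PER_CELL)]

def store(x, value):
    """Store a 32-bit value in stack slot x, low byte in plane 0."""
    for i in range(BYTES_PER_CELL):
        planes[i][x] = (value >> (8 * i)) & 0xFF

def fetch(x):
    """Read the 32-bit value back from slot x."""
    return sum(planes[i][x] << (8 * i) for i in range(BYTES_PER_CELL))

store(0, 0x12345678)    # x = 0 is TOS
store(1, 0xDEADBEEF)    # x = 1 is the next cell down

# Memory estimate from the post: 8 levels of complex doubles.
COMPLEX_DOUBLE_BYTES = 2 * 8        # real + imaginary, 8 bytes each
LEVELS = 8
total_bytes = LEVELS * COMPLEX_DOUBLE_BYTES   # half of a 256-byte page
```

Because each byte position lives in its own array, moving to the next stack item changes the index by 1 regardless of how wide the item is.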

WHY THE FUSS?

It seems like 16 bits (with the occasional 32 bits) ought to be enough for most applications. The higher-precision stack interests me now partly because of my problem with my 16-bit scaled-integer sine and cosine words that are producing unreasonably high distortion products in an FFT routine. With potentially thousands of calculations that go into a particular output cell, the round-off and truncation errors get compounded. I'm sure there's a way to calculate these more accurately with the resources that are already there, but I'm not the math specialist to figure it out. [Edit: I found the errors were due to a multiplication routine bug that only shows up under certain very limited circumstances, and I fixed it. Still, there remain many applications where the higher precision is needed. With this FFT, I can only do 2048 7-bit samples without overflowing the 16-bit cells.] When I get the large look-up tables implemented [Edit, 6/25/12: posted here], the sine and cosine problem will disappear since the look-ups will be accurate to all 16 bits [Edit, a year later, in 2005: My improved SIN & COS routines are now accurate usually in all 16 bits, and never off by more than one lsb]; but I know there will be something else later. Sometimes the multiple-precision wish is also just to make it easier to keep things within range and not have to worry about losing precision due to near-underflow conditions or getting totally wrong answers due to overflow conditions in intermediate calculations in a long string of them. It seems like this high-precision stack would be more efficient than FP in most respects, but I'm open to ideas.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?

Last edited by GARTHWILSON on Mon Mar 05, 2007 8:13 am, edited 2 times in total.

kc5tja

Post subject:

Posted: Fri Jul 09, 2004 4:56 pm

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706

In my FTS1001 simulations, I'm finding that I haven't even approached fully utilizing the 16-item data stack depth. 8 items should be plenty sufficient for the H-stack.

dclxvi

Post subject: Re: Extra stacks

Posted: Mon Jul 12, 2004 9:07 am

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362

GARTHWILSON wrote:

Then Bruce suggested that since the numbers on the high-precision stack won't be addresses needing ZP for the indirect addressing modes, this stack could be kept anywhere in RAM; and that furthermore, the various bytes of a "cell" would not have to be kept together, so indexing can be made easier with for example TOSbyte1,X, TOSbyte2,X, TOSbyte3,X, etc., where the value in X is always 0 for TOS (top of stack), 1 for the next "cell", etc. That idea gets even better when considering a complex stack where the doubles really eat up memory (16 bytes each!)

I'd like to add:

1. If the data stack pointer is the X register (i.e. the usual 6502 implementation), it will often be more convenient to use LDY H_STK_PTR and access the "high-precision" stack with H_STK_0,Y and H_STK_1,Y etc. than using LDX and H_STK_0,X etc. since you won't have to save and restore X in the former case. Of course abs,Y addressing isn't available for all instructions (e.g. ASL).

2. The "high-precision" stack pointer can be decremented with a DEC H_STK_PTR or incremented with an INC H_STK_PTR. Decrementing or incrementing the data stack pointer traditionally takes 4 cycles (a pair of DEXs or a pair of INXs), while an INC zp takes only 5 cycles and an INC abs takes only 6 cycles, so the performance hit is small.

3. For many instructions, abs,Y (or abs,X) takes the same number of cycles as zp,X so as long as the "high-precision" stack is placed in memory where abs,Y won't cross a page boundary, the performance hit is again small. (STA is one exception. STA zp,X takes 4 cycles, but STA abs,Y takes 5 cycles.)

4. There are LDX abs,Y and LDY abs,X instructions but no corresponding STX abs,Y and STY abs,X instructions, so you'll have to use a TXA STA sequence or a TYA STA sequence instead, which adds 2 cycles.
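The cycle arithmetic in points 2 to 4 can be tallied in a small Python sketch. The counts are the documented NMOS 6502 timings; the dictionary and variable names are only for illustration:

```python
# Documented NMOS 6502 cycle counts (no page crossing on indexed reads).
CYCLES = {
    "INX": 2, "DEX": 2,
    "INC zp": 5, "INC abs": 6,
    "LDA zp,X": 4, "LDA abs,Y": 4,   # abs,Y read: +1 if a page is crossed
    "STA zp,X": 4, "STA abs,Y": 5,   # the indexed absolute store is always 5
    "TXA": 2,
}

# Point 2: bumping the data stack pointer vs. the H stack pointer.
data_stack_bump = 2 * CYCLES["INX"]      # INX INX
h_stack_bump_zp = CYCLES["INC zp"]       # INC H_STK_PTR, pointer in zero page
h_stack_bump_abs = CYCLES["INC abs"]     # INC H_STK_PTR, pointer elsewhere

# Point 3: a load costs the same either way, a store costs one more.
load_penalty = CYCLES["LDA abs,Y"] - CYCLES["LDA zp,X"]
store_penalty = CYCLES["STA abs,Y"] - CYCLES["STA zp,X"]

# Point 4: storing X's value to the H stack needs TXA first.
store_x_via_a = CYCLES["TXA"] + CYCLES["STA abs,Y"]

print(data_stack_bump, h_stack_bump_zp, h_stack_bump_abs)
print(load_penalty, store_penalty, store_x_via_a)
```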

kc5tja

Post subject: Re: Extra stacks

Posted: Mon Jul 12, 2004 3:32 pm

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706

dclxvi wrote:

3. For many instructions, abs,Y (or abs,X) takes the same number of cycles as zp,X so as long as the "high-precision" stack is placed in memory where abs,Y won't cross a page boundary, the performance hit is again small. (STA is one exception. STA zp,X takes 4 cycles, but STA abs,Y takes 5 cycles.)

How is this possible? There are three opcode bytes to fetch instead of just two. This suggests that abs,Y ought to be one cycle more than zp,X.

Thowllly

Post subject: Re: Extra stacks

Posted: Tue Jul 13, 2004 12:52 pm

Joined: Wed Oct 22, 2003 4:07 am
Posts: 51
Location: Norway

kc5tja wrote:

How is this possible? There are three opcode bytes to fetch instead of just two. This suggests that abs,Y ought to be one cycle more than zp,X.

zp,X will load the zp address in one cycle and then add X to it in the next cycle. abs,X will load the low byte of the address in one cycle and in the next cycle it will both add X to the low byte and load the high byte of the address. Only if X+low byte overflows is another cycle needed to increment the high byte.
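The rule described here can be written down as a small Python sketch. The function names are hypothetical, and only read instructions such as LDA are modeled:

```python
def abs_indexed_read_cycles(base, index):
    """Cycles for a read such as LDA base,X or LDA base,Y on an NMOS 6502.

    The extra cycle is needed only when adding the index to the low
    address byte carries into the high byte (a page crossing).
    """
    crosses_page = ((base & 0xFF) + index) > 0xFF
    return 4 + (1 if crosses_page else 0)

def zp_indexed_read_cycles():
    """Cycles for a read such as LDA zp,X: always 4 (wraps within zero page)."""
    return 4

print(abs_indexed_read_cycles(0x0300, 0x10))   # same page
print(abs_indexed_read_cycles(0x03F8, 0x10))   # crosses into the next page
```

Placing the stack so that base plus the largest index stays inside one page keeps the indexed reads at the zp,X cost.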

kc5tja

Post subject: Re: Extra stacks

Posted: Tue Jul 13, 2004 2:40 pm

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706

Thowllly wrote:

Ah, I forgot about that. I thought that the CPU would feed the ZP byte directly to the ALU, while feeding X to it as well, thus saving a cycle. But, I guess it doesn't.

Thanks.

JimBoyd

Post subject: Re: Extra stacks

Posted: Sun Oct 04, 2020 8:18 pm

Joined: Fri May 05, 2017 9:27 pm
Posts: 895

Has anyone else implemented an extra stack ( or more) in Forth? How did it affect your programming?
I've implemented an extra stack in Fleet Forth that I mostly use for a spare data stack. I've defined the following parallels for the return stack words:

Code:

>R      >A
R>      A>
R@      A@
DUP>R   DUP>A
2>R     2>A
2R>     2A>

Since this extra stack, which I call the auxiliary stack, is right up against the return stack and right up against the area the C64 uses to keep track of which files are open, the words to move data to and from the auxiliary stack ( or aux stack ) test for overflow/underflow.
I usually use the aux stack to hold control flow data so I can do more with CODE words while keeping the source sane and avoiding hand calculated offsets. The aux stack words CS>A and A>CS move the control flow data on the control flow stack ( data stack ) to or from the aux stack. I also use the aux stack to hold temporary addresses to be resolved later, when I want one CODE word to branch or jump into another CODE word at a certain location.
I have also used these words in place of their return stack counterparts when defining a new word ( for my decompiler ) to test it. Once it was working, I changed the aux stack words to the faster return stack words.
It's even been helpful in hand tracing the execution of a system word I was modifying ( to make it easier to support more drive types ).
On another thread, SamCoVT mentioned:

SamCoVT wrote:

I'll also recommend avoiding >r and r> when easy/possible because they make the words harder to test. While they are sometimes the exact right tool for the job, they can only be used in word definitions while compiling.

Aux stack to the rescue!
First, some temporary redefinitions to make things a little safer, just in case:

Code:

: >R >A ; 
REDEFINE: >R
 OK
: 2>R 2>A ; 
REDEFINE: 2>R
 OK
: DUP>R DUP>A ; 
REDEFINE: DUP>R
 OK
: R> A> ; 
REDEFINE: R>
 OK
: 2R> 2A> ; 
REDEFINE: 2R>
 OK
: R@ A@ ; 
REDEFINE: R@
 OK

Here is modified source for one of Fleet Forth's system words:

Code:

// (DR/W)
HEX
NH 2 CONSTANT DSI
: (DR/W)  ( ADR BLK# R/WF CNT -- )
   1- SPLIT 2>R  T&S (IS) DSI
   R> 0
   ?DO
      >R  2OVER 2OVER R@ 100 SR/W
      2>R
      100 UNDER+  DSI + 2 PICK /MOD
      2R>
      ROT UNDER+  R>
   LOOP
   R> 1+ SR/W  DROP ;
' (DR/W) IS DR/W

And here is the log of tracing it by hand:

Code:

HEX  OK
2 DRIVE  OK
PAD 315 1 B/BUF  OK
.S 5934  315    1  400  OK
.AS EMPTY  OK
1- SPLIT  OK
.S 5934  315    1   FF    3  OK
2>A  OK
.S 5934  315    1  OK
T&S81  OK
.S   28 5934   24   50    0    1    1  OK
0 VALUE DSI  OK
TO DSI  OK
.S   28 5934   24   50    0    1  OK
A> 0  OK
.S   28 5934   24   50    0    1    3 
   0  OK
. . 0 3  OK
>A 2OVER 2OVER A@ 100  OK
: .SRW CR . . . . . . ;  OK
.S   28 5934   24   50    0 5934   24 
  50    0    1  100  OK
.A  OK
.S   28 5934   24   50    0 5934   24 
  50    0    1  100    A    0  OK
D. A  OK
.AS   FF    1  OK
.S   28 5934   24   50    0 5934   24 
  50    0    1  100  OK
.SRW 
100 1 0 50 24 5934  OK
2>A 100 UNDER+ DSI + 2 PICK /MOD  OK
.S   28 5A34   25    0  OK
2A> ROT UNDER+ A>  OK
.S   28 5A34   25   50    0    1  OK
>A 2OVER 2OVER A@ 100 .SRW 
100 1 0 50 25 5A34  OK
2>A 100 UNDER+ DSI + 2 PICK /MOD  OK
2A> ROT UNDER+ R>  OK
.S   28 5B34   26   50    0    1  OK
.AS   FF  OK
>A 2OVER 2OVER A@ 100 .SRW 
100 1 0 50 26 5B34  OK
2>A 100 UNDER+ DSI + 2 PICK /MOD  OK
2A> ROT UNDER+ A>  OK
.S   28 5C34   27   50    0    1  OK
R>  OK
.S   28 5C34   27   50    0    1   FF  OK
1+  OK
.SRW DROP 
100 1 0 50 27 5C34  OK
.S EMPTY  OK
CONSOLE 

There is one place in the log where I inadvertently typed .A instead of .AS , placing a double on the data stack rather than displaying the contents of the aux stack. I promptly removed it and continued tracing by hand.
Had I accidentally typed >R rather than >A ( because that is what the source has ) it would have been fine thanks to the temporary redefinitions. Accidentally typing ?DO or LOOP would not have caused a problem other than clearing all stacks when it aborted with the message "FOR COMPILING".

SamCoVT

Post subject: Re: Extra stacks

Posted: Sun Oct 04, 2020 10:17 pm

Joined: Sun May 13, 2018 5:49 pm
Posts: 255

JimBoyd wrote:

On another thread, SamCoVT mentioned:

SamCoVT wrote:

Aux stack to the rescue!

OK - This is pretty slick and something I wish I had in my brain earlier. For doing the kind of debugging work you show, it doesn't even have to be fast - a simple implementation using some space ALLOTted in the dictionary along with an index or pointer would work fine. Thanks for sharing!
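A simple version along those lines, modeled in Python rather than Forth. This is a sketch under assumptions: the class and method names are made up, the list stands in for ALLOTted dictionary space, and the bounds checks mirror the overflow/underflow tests Jim describes:

```python
class AuxStack:
    """A bounded LIFO with overflow/underflow checks."""

    def __init__(self, depth=40):        # depth is an illustrative choice
        self.cells = [0] * depth         # stands in for ALLOTted space
        self.index = 0                   # number of items currently held

    def to_a(self, value):               # plays the role of >A
        if self.index >= len(self.cells):
            raise OverflowError("aux stack overflow")
        self.cells[self.index] = value & 0xFFFF   # keep it a 16-bit cell
        self.index += 1

    def a_from(self):                    # plays the role of A>
        if self.index == 0:
            raise IndexError("aux stack underflow")
        self.index -= 1
        return self.cells[self.index]

    def a_fetch(self):                   # plays the role of A@
        if self.index == 0:
            raise IndexError("aux stack empty")
        return self.cells[self.index - 1]

aux = AuxStack()
aux.to_a(0xFF)
aux.to_a(3)
print(aux.a_fetch())    # top item, left in place
```

For hand tracing, speed is not a concern, so an index into a fixed array with explicit checks is enough.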

IamRob

Post subject: Re: Extra stacks

Posted: Mon Oct 05, 2020 4:03 am

Joined: Sun Apr 26, 2020 3:08 am
Posts: 357

Instead of >r and r>, or even variables, I started using free ZP locations for temporary storage. I call it Z! and Z@, which are defined as,

: Z! 0 ! ; (or any free ZP memory)
: Z@ 0 @ ;

The advantage of using memory locations compared to >R is it doesn't have to be DROP'd at the end, with R> DROP, if the value is not needed.
Another use for the ZP location is the loop variable doesn't get retained when LEAVE is encountered. So I will use: I Z! LEAVE in words that contain a loop that exits prematurely.

GARTHWILSON

Post subject: Re: Extra stacks

Posted: Mon Oct 05, 2020 4:07 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California

IamRob wrote:

Another use for the ZP location is the loop variable doesn't get retained when LEAVE is encountered. So I will use: I Z! LEAVE in words that contain a loop that exits prematurely.

How 'bout just having LEAVE store the loop index in a variable, all in the one primitive so it's faster? I think I'll do that myself.

GARTHWILSON

Post subject: Re: Extra stacks

Posted: Mon Oct 05, 2020 6:34 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California

Jim, you have quite a few words above that are neither part of any standard I know of, nor defined above. One I'll ask about however is REDEFINE:. It appears to edit the old word to redirect execution to the new one, for secondaries that are already compiled using it, so those secondaries don't need to be recompiled. Is that what's happening? I've had a way to do that but I like yours more.

JimBoyd

Post subject: Re: Extra stacks

Posted: Mon Oct 05, 2020 8:14 pm

Joined: Fri May 05, 2017 9:27 pm
Posts: 895

GARTHWILSON wrote:

No. Sorry, I should have been clear about that. This section:

Code:

: >R >A ; 
REDEFINE: >R
 OK
: 2>R 2>A ; 
REDEFINE: 2>R
 OK
: DUP>R DUP>A ; 
REDEFINE: DUP>R
 OK
: R> A> ; 
REDEFINE: R>
 OK
: 2R> 2A> ; 
REDEFINE: 2R>
 OK
: R@ A@ ; 
REDEFINE: R@
 OK

is part of the log of the interactive session where I hand traced the word (DR/W) . I had modified it to make it easier to support the 1581 disk drive as well as the others.
"REDEFINE: >R" is a message from the system letting me know I redefined >R . It's harder to tell that from the print dump than a live session so I may change that message to something like:
"YOU REDEFINED >R"
or even
">R EXISTS"
or maybe
">R REDEFINED"
or even
">R WAS REDEFINED"

Your comment does give me an idea and I'll have to give it some thought.
As for the other words, the source is from the source for my Forth kernel.
NH sets a flag so the metacompiler compiles the next word headerless. For interactive testing in Forth, not metacompiling, I redefine NH as a no-op:

Code:

: NH ;

In the log, the phrase "2 DRIVE" ( "10 DRIVE" would also work ) sets the current drive to drive 10 ( drive 8 being selected with "0 DRIVE" or "8 DRIVE" ) . Commodore 64 disk drives start at device 8 and go up from there.
SPLIT splits a cell into its low byte and high byte. It is seven bytes and is a really fast "$100 /MOD" .
Fleet Forth, like Blazin' Forth, uses direct access to drive sectors for block access ( on disks that are only supposed to be for blocks) .
T&S derives the starting track and sector from the block number.
(IS) is the primitive used by IS and TO to write the value on the data stack into the first cell of the parameter field of the following word in the definition and bump IP past said word.
SR/W is the sector read write word.
UNDER+ has the following stack diagram:
( N1 N2 N3 -- N1+N3 N2 )
(DR/W) is the vector for the deferred word DR/W , disk read write.
Either DR/W or RR/W ( ram read write ) is executed by R/W depending on the block number.
I've modified things since my latest upload. With a block number of $8000 and up, RR/W is executed. RR/W sees a block number $8000 less than the actual block number.
I hope this clarifies things.
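To make SPLIT and UNDER+ concrete, here is a Python model checked against values from the trace log above. The function names are stand-ins, and the Forth data stack is represented by a Python list:

```python
def split(cell):
    """SPLIT: return (low_byte, high_byte) of a 16-bit cell, high byte on top."""
    return cell & 0xFF, (cell >> 8) & 0xFF

def under_plus(stack):
    """UNDER+ ( N1 N2 N3 -- N1+N3 N2 ), operating on a Python list in place."""
    n3 = stack.pop()
    stack[-2] = (stack[-2] + n3) & 0xFFFF

# From the log: CNT = $400, and "1- SPLIT" leaves FF 3.
low, high = split(0x400 - 1)

# (DR/W) loops `high` times moving $100 bytes each,
# then finishes with one transfer of low+1 bytes.
total = high * 0x100 + (low + 1)

# From the log: "100 UNDER+" advances the address 5934 -> 5A34.
stack = [0x28, 0x5934, 0x24, 0x100]
under_plus(stack)
print([hex(n) for n in stack])
```

Decrementing the count before SPLIT is what lets an exact multiple of $100 bytes finish with a full $100-byte transfer instead of a zero-length one.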

JimBoyd

Post subject: Re: Extra stacks

Posted: Mon Oct 05, 2020 8:34 pm

Joined: Fri May 05, 2017 9:27 pm
Posts: 895

IamRob wrote:

Instead of >r and r>, or even variables, I started using free ZP locations for temporary storage. I call it Z! and Z@, which are defined as,

: Z! 0 ! ; (or any free ZP memory)
: Z@ 0 @ ;

One disadvantage is keeping track of which ZP locations you are using if you need more than one. The aux stack is an actual stack and my implementation is over 40 cells deep. ( memory the C64 wasn't using below screen memory ) .

Quote:

The advantage of using memory locations compared to >R is it doesn't have to be DROP'd at the end, with R> DROP, if the value is not needed.
Another use for the ZP location is the loop variable doesn't get retained when LEAVE is encountered. So I will use: I Z! LEAVE in words that contain a loop that exits prematurely.

Wouldn't you still need to initialize the storage with a sentinel value so you know if you left the loop prematurely? Something like 0 Z! or -1 Z! ?
In one of my system words, I leave a loop like this:

Code:

   ?DO
      DUP I >BT @ =
      IF  DROP I UNLOOP
      ELSE CS>A
   LOOP

UNLOOP discards the loop parameters and I branch out of the loop by moving the control flow data from ELSE to the aux stack. Further in the definition I resolve the ELSE by moving the control flow data back to the data stack ( the control flow stack ) like this:

Code:

A>CS THEN

Yes, the definition was a bit long. It could not be factored into non trivial smaller parts that got used more than once though.

GARTHWILSON

Post subject: Re: Extra stacks

Posted: Mon Oct 05, 2020 9:10 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California

JimBoyd wrote:

I hope this clarifies things.

Yes, that clears up a lot.

Quote:

SPLIT splits a cell into its low byte and high byte.

Yes, that one is standard. COMBINE is the complement.

IamRob

Post subject: Re: Extra stacks

Posted: Tue Oct 06, 2020 3:51 am

Joined: Sun Apr 26, 2020 3:08 am
Posts: 357

JimBoyd wrote:

IamRob wrote:

Instead of >r and r>, or even variables, I started using free ZP locations for temporary storage. I call it Z! and Z@, which are defined as,

: Z! 0 ! ; (or any free ZP memory)
: Z@ 0 @ ;

Quote:

Code:

   ?DO
      DUP I >BT @ =
      IF  DROP I UNLOOP
      ELSE CS>A
   LOOP

Code:

A>CS THEN

Yes, the definition was a bit long. It could not be factored into non trivial smaller parts that got used more than once though.

Yes, I have to initialize Z! to zero. I have thought of implementing something like UNLOOP, which is a cleaner exit and probably faster. Is your UNLOOP a primitive or a word?
