Extra stacks
- GARTHWILSON
- Forum Moderator
- Posts: 8774
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
I've already discussed this briefly with Bruce but I'll open it up here to the forum.
BACKGROUND
When Forth does floating-point arithmetic, it often uses the regular data stack for FP operations. An FP representation might still be only two cells (like a double-precision number). In the case of the 6502, where a cell is normally 16 bits (so a double is 32 bits), such an FP representation might use three bytes for mantissa and one for exponent.
There have been times I want more precision than you normally get with the 16-bit cells and the 32-bit intermediate results of UM/MOD and M* etc.; but I don't want the complexity or performance penalty of FP, and I don't want the stack mess that comes with having several very long items on the regular data stack (like 4 cells per item). It makes it a nightmare if you have to reach back in the stack with ROLL, PICK, and so on, and keep things straight. I have triple- and quad-precision integer Forth words from Forth Dimensions magazine, but they use the regular data stack, causing the same stack-management difficulties.
THE IDEA
It's common to have a separate FP stack, especially if the FP representations are longer than a regular double; but has anyone ever thought of having a separate stack for double-precision scaled-integer / fixed-point math? A standard cell on this stack could be 4 bytes and a "double" (like the intermediate results of */ ) would be 8, instead of the 2 and 4 bytes of the normal data stack. 64 bits gives a signed range of about ±9.2E18, i.e. almost ±1E19.
Additionally, it could be a complex stack, with the imaginary part simply being ignored in real-only arithmetic. The various operators could even check for 0i in the involved cells and determine automatically whether complex-number arithmetic is necessary. Hmmm... that raises the issue of initialization.
NAMING CONVENTIONS?
The FP stack arithmetic words are always F* F/ F+ etc., and normal stack integer double-precision words are like D+, so maybe the higher-precision-stack words should start with H, like H+ H* H*/ HDROP HDUP H2DUP HROT etc., unless you know of something else that is already somewhat standardized. Then of course there would be words to transfer things between stacks, like >H H> etc.
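As a rough illustration of how a few such words might look, here is a minimal sketch in standard Forth. It assumes 16-bit cells, so that one H-stack item is an ordinary Forth double (32 bits). The names HDEPTH, HSTACK, HSP, HTOS and H@ are made up for the example, and there is no overflow or underflow checking.

```forth
\ Hypothetical sketch of a separate "H" stack, not an existing implementation.
\ With 16-bit cells, one item here is a standard double (two cells, 32 bits).
8 CONSTANT HDEPTH                      \ assumed depth, in items
CREATE HSTACK  HDEPTH 2* CELLS ALLOT   \ room for HDEPTH doubles
VARIABLE HSP   0 HSP !                 \ number of items currently stacked

: HTOS  ( -- addr )  HSP @ 1- 2* CELLS HSTACK + ;  \ address of the top item
: >H    ( d -- )     1 HSP +!  HTOS 2! ;           \ data stack -> H stack
: H>    ( -- d )     HTOS 2@  -1 HSP +! ;          \ H stack -> data stack
: H@    ( -- d )     HTOS 2@ ;                     \ copy top item, keep it
: HDROP ( -- )       -1 HSP +! ;
: HDUP  ( -- )       H@ >H ;
: H+    ( -- )       H> H> D+ >H ;                 \ add the top two H items
```

With these loaded, `100000. >H 250000. >H H+ H> D.` adds the two values on the H stack and prints the sum. A real 6502 version would presumably do the arithmetic in place in primitives rather than shuttling the operands through the data stack as H+ does here.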
WHERE IN 6502 MEMORY?
My original idea for 6502 implementation would be to have it start at the other end of available ZP (or DP, in the case of the '816) space from the regular data stack, and have them grow toward each other, so that all the free space is all together in the middle and you'll never have the situation where one stack is out of space while the other one has plenty of unused space.
Then Bruce suggested that since the numbers on the high-precision stack won't be addresses needing ZP for the indirect addressing modes, this stack could be kept anywhere in RAM; and that furthermore, the various bytes of a "cell" would not have to be kept together, so indexing can be made easier with for example TOSbyte1,X, TOSbyte2,X, TOSbyte3,X, etc., where the value in X is always 0 for TOS (top of stack), 1 for the next "cell", etc. That idea gets even better when considering a complex stack where the doubles really eat up memory (16 bytes each!).
If the stack were limited to 8 levels of complex doubles (i.e., the same as 16 complex singles), it would take half a page of the 6502 memory map-- not bad, as long as it doesn't have to be in ZP. The greater consideration might be for the memory needed by the dozens of extra words (H>D, D>H, H*, H_OVER, HDUP, etc.).
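The split-byte layout can be modeled in high-level Forth to see how the indexing works. This is only a sketch: HLEVELS, HPLANES, HBYTE, HPUT and HGET are hypothetical names, and to stay portable it splits a 16-bit value across two byte planes; the same pattern would extend to four, eight or sixteen planes. On the 6502 itself this would of course be done with indexed addressing in primitives.

```forth
\ Hypothetical model of a stack whose bytes live in separate "planes".
\ Byte k of stack level n is stored at HPLANES + k*HLEVELS + n, so one
\ index (the level) reaches every byte of the same item.
8 CONSTANT HLEVELS                      \ assumed number of stack levels
CREATE HPLANES  HLEVELS 2 * ALLOT       \ two planes of HLEVELS bytes each

: HBYTE ( level plane -- c-addr )  HLEVELS * + HPLANES + ;

: HPUT ( u level -- )                   \ store the low 16 bits of u
   >R
   DUP 255 AND        R@ 0 HBYTE C!     \ low byte  -> plane 0
   8 RSHIFT 255 AND   R> 1 HBYTE C! ;   \ high byte -> plane 1

: HGET ( level -- u )                   \ reassemble the 16-bit value
   DUP 1 HBYTE C@ 8 LSHIFT
   SWAP 0 HBYTE C@ OR ;
```

For example, `HEX 1234 3 HPUT 3 HGET .` stores $1234 at level 3 and reads it back. The two bytes end up HLEVELS bytes apart in memory, which is the point of the layout: the level number alone serves as the index into every plane.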
WHY THE FUSS?
It seems like 16 bits (with the occasional 32 bits) ought to be enough for most applications. The higher-precision stack interests me now partly because of my problem with my 16-bit scaled-integer sine and cosine words that are producing unreasonably high distortion products in an FFT routine. With potentially thousands of calculations that go into a particular output cell, the round-off and truncation errors get compounded. I'm sure there's a way to calculate these more accurately with the resources that are already there, but I'm not the math specialist to figure it out. [Edit: I found the errors were due to a multiplication routine bug that only shows up under certain very limited circumstances, and I fixed it. Still, there remain many applications where the higher precision is needed. With this FFT, I can only do 2048 7-bit samples without overflowing the 16-bit cells.] When I get the large look-up tables implemented [Edit, 6/25/12: posted here], the sine and cosine problem will disappear since the look-ups will be accurate to all 16 bits [Edit, a year later, in 2005: My improved SIN & COS routines are now accurate usually in all 16 bits, and never off by more than one lsb]; but I know there will be something else later. Sometimes the multiple-precision wish is also just to make it easier to keep things within range and not have to worry about losing precision due to near-underflow conditions or getting totally wrong answers due to overflow conditions in intermediate calculations in a long string of them. It seems like this high-precision stack would be more efficient than FP in most respects, but I'm open to ideas.
Last edited by GARTHWILSON on Mon Mar 05, 2007 8:13 am, edited 2 times in total.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Extra stacks
GARTHWILSON wrote:
Then Bruce suggested that since the numbers on the high-precision stack won't be addresses needing ZP for the indirect addressing modes, this stack could be kept anywhere in RAM; and that furthermore, the various bytes of a "cell" would not have to be kept together, so indexing can be made easier with for example TOSbyte1,X, TOSbyte2,X, TOSbyte3,X, etc., where the value in X is always 0 for TOS (top of stack), 1 for the next "cell", etc. That idea gets even better when considering a complex stack where the doubles really eat up memory (16 bytes each!).
1. If the data stack pointer is the X register (i.e. the usual 6502 implementation), it will often be more convenient to use LDY H_STK_PTR and access the "high-precision" stack with H_STK_0,Y and H_STK_1,Y etc. than using LDX and H_STK_0,X etc., since you won't have to save and restore X in the former case. Of course abs,Y addressing isn't available for all instructions (e.g. ASL).
2. The "high-precision" stack pointer can be decremented with a DEC H_STK_PTR or incremented with an INC H_STK_PTR. Decrementing or incrementing the data stack pointer traditionally takes 4 cycles (a pair of DEXs or a pair of INXs), and an INC zp takes only 5 cycles and an INC abs takes only 6 cycles, so the performance hit is small.
3. For many instructions, abs,Y (or abs,X) takes the same number of cycles as zp,X, so as long as the "high-precision" stack is placed in memory where abs,Y won't cross a page boundary, the performance hit is again small. (STA is one exception. STA zp,X takes 4 cycles, but STA abs,Y takes 5 cycles.)
4. There are LDX abs,Y and LDY abs,X instructions but no corresponding STX abs,Y and STY abs,X instructions, so you'll have to use a TXA STA sequence or a TYA STA sequence instead, which adds 2 cycles.
Re: Extra stacks
dclxvi wrote:
3. For many instructions, abs,Y (or abs,X) takes the same number of cycles as zp,X, so as long as the "high-precision" stack is placed in memory where abs,Y won't cross a page boundary, the performance hit is again small. (STA is one exception. STA zp,X takes 4 cycles, but STA abs,Y takes 5 cycles.)
Re: Extra stacks
kc5tja wrote:
dclxvi wrote:
3. For many instructions, abs,Y (or abs,X) takes the same number of cycles as zp,X, so as long as the "high-precision" stack is placed in memory where abs,Y won't cross a page boundary, the performance hit is again small. (STA is one exception. STA zp,X takes 4 cycles, but STA abs,Y takes 5 cycles.)
Re: Extra stacks
Thowllly wrote:
zp,X will load the zp address in one cycle and then add X to it in the next cycle. abs,X will load the low byte of the address in one cycle and in the next cycle it will both add X to the low byte and load the high byte of the address. Only if X+low byte overflows is another cycle needed to increment the high byte.
Thanks.
Re: Extra stacks
Has anyone else implemented an extra stack ( or more) in Forth? How did it affect your programming?
I've implemented an extra stack in Fleet Forth that I mostly use for a spare data stack. I've defined the following parallels for the return stack words:
Code: Select all
>R >A
R> A>
R@ A@
DUP>R DUP>A
2>R 2>A
2R> 2A>
Since this extra stack, which I call the auxiliary stack, is right up against the return stack and right up against the area the C64 uses to keep track of which files are open, the words to move data to and from the auxiliary stack ( or aux stack ) test for overflow/underflow.
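For readers who want to try something similar on another system, the words listed above can be approximated in standard Forth along these lines. This is a sketch and not the Fleet Forth source: ADEPTH, ASTACK and ASP are made-up names, the depth of 40 cells is an assumption, and the stack lives in its own allotted buffer rather than next to the return stack.

```forth
\ Hypothetical high-level auxiliary stack with overflow/underflow tests.
40 CONSTANT ADEPTH                     \ assumed depth, in cells
CREATE ASTACK  ADEPTH CELLS ALLOT
VARIABLE ASP   0 ASP !                 \ number of cells currently stacked

: >A    ( x -- )
   ASP @ ADEPTH U< 0= ABORT" AUX STACK OVERFLOW"
   ASTACK ASP @ CELLS + !   1 ASP +! ;
: A>    ( -- x )
   ASP @ 0= ABORT" AUX STACK UNDERFLOW"
   -1 ASP +!   ASTACK ASP @ CELLS + @ ;
: A@    ( -- x )        A> DUP >A ;    \ copy the top item, keep it
: DUP>A ( x -- x )      DUP >A ;
: 2>A   ( x1 x2 -- )    SWAP >A >A ;   \ same ordering as 2>R
: 2A>   ( -- x1 x2 )    A> A> SWAP ;   \ same ordering as 2R>
```

Unlike >R and R> , these can be used interactively at the keyboard, since nothing the interpreter depends on is kept on this stack.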
I usually use the aux stack to hold control flow data so I can do more with CODE words while keeping the source sane and avoiding hand calculated offsets. The aux stack words CS>A and A>CS move the control flow data on the control flow stack ( data stack ) to or from the aux stack. I also use the aux stack to hold temporary addresses to be resolved later, when I want one CODE word to branch or jump into another CODE word at a certain location.
I have also used these words in place of their return stack counterparts when defining a new word ( for my decompiler ) to test it. Once it was working, I changed the aux stack words to the faster return stack words.
It's even been helpful in hand tracing the execution of a system word I was modifying ( to make it easier to support more drive types ).
On another thread, SamCoVT mentioned:
SamCoVT wrote:
I'll also recommend avoiding >r and r> when easy/possible because they make the words harder to test. While they are sometimes the exact right tool for the job, they can only be used in word definitions while compiling.
Aux stack to the rescue!
First, some temporary redefinitions to make things a little safer, just in case:
Code: Select all
: >R >A ;
REDEFINE: >R
OK
: 2>R 2>A ;
REDEFINE: 2>R
OK
: DUP>R DUP>A ;
REDEFINE: DUP>R
OK
: R> A> ;
REDEFINE: R>
OK
: 2R> 2A> ;
REDEFINE: 2R>
OK
: R@ A@ ;
REDEFINE: R@
OK
Here is modified source for one of Fleet Forth's system words:
Code: Select all
// (DR/W)
HEX
NH 2 CONSTANT DSI
: (DR/W) ( ADR BLK# R/WF CNT -- )
1- SPLIT 2>R T&S (IS) DSI
R> 0
?DO
>R 2OVER 2OVER R@ 100 SR/W
2>R
100 UNDER+ DSI + 2 PICK /MOD
2R>
ROT UNDER+ R>
LOOP
R> 1+ SR/W DROP ;
' (DR/W) IS DR/W
And here is the log of tracing it by hand:
Code: Select all
HEX OK
2 DRIVE OK
PAD 315 1 B/BUF OK
.S 5934 315 1 400 OK
.AS EMPTY OK
1- SPLIT OK
.S 5934 315 1 FF 3 OK
2>A OK
.S 5934 315 1 OK
T&S81 OK
.S 28 5934 24 50 0 1 1 OK
0 VALUE DSI OK
TO DSI OK
.S 28 5934 24 50 0 1 OK
A> 0 OK
.S 28 5934 24 50 0 1 3
0 OK
. . 0 3 OK
>A 2OVER 2OVER A@ 100 OK
: .SRW CR . . . . . . ; OK
.S 28 5934 24 50 0 5934 24
50 0 1 100 OK
.A OK
.S 28 5934 24 50 0 5934 24
50 0 1 100 A 0 OK
D. A OK
.AS FF 1 OK
.S 28 5934 24 50 0 5934 24
50 0 1 100 OK
.SRW
100 1 0 50 24 5934 OK
2>A 100 UNDER+ DSI + 2 PICK /MOD OK
.S 28 5A34 25 0 OK
2A> ROT UNDER+ A> OK
.S 28 5A34 25 50 0 1 OK
>A 2OVER 2OVER A@ 100 .SRW
100 1 0 50 25 5A34 OK
2>A 100 UNDER+ DSI + 2 PICK /MOD OK
2A> ROT UNDER+ R> OK
.S 28 5B34 26 50 0 1 OK
.AS FF OK
>A 2OVER 2OVER A@ 100 .SRW
100 1 0 50 26 5B34 OK
2>A 100 UNDER+ DSI + 2 PICK /MOD OK
2A> ROT UNDER+ A> OK
.S 28 5C34 27 50 0 1 OK
R> OK
.S 28 5C34 27 50 0 1 FF OK
1+ OK
.SRW DROP
100 1 0 50 27 5C34 OK
.S EMPTY OK
CONSOLE
There is one place in the log where I inadvertently typed .A instead of .AS , placing a double on the data stack rather than displaying the contents of the aux stack. I promptly removed it and continued tracing by hand.
Had I accidentally typed >R rather than >A ( because that is what the source has ) it would have been fine thanks to the temporary redefinitions. Accidentally typing ?DO or LOOP would not have caused a problem other than clearing all stacks when it aborted with the message "FOR COMPILING".
Re: Extra stacks
JimBoyd wrote:
On another thread, SamCoVT mentioned:
Aux stack to the rescue!
SamCoVT wrote:
I'll also recommend avoiding >r and r> when easy/possible because they make the words harder to test. While they are sometimes the exact right tool for the job, they can only be used in word definitions while compiling.
Re: Extra stacks
Instead of >r and r>, or even variables, I started using free ZP locations for temporary storage. I call it Z! and Z@, which are defined as,
: Z! 0 ! ; (or any free ZP memory)
: Z@ 0 @ ;
The advantage of using memory locations compared to >R is it doesn't have to be DROP'd at the end, with R> DROP, if the value is not needed.
Another use for the ZP location is the loop variable doesn't get retained when LEAVE is encountered. So I will use: I Z! LEAVE in words that contain a loop that exits prematurely.
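A small sketch of that loop pattern in standard Forth follows. To keep it runnable on a hosted Forth, a VARIABLE (the made-up name ZTEMP) stands in for the free ZP location, and FIND-BYTE is a hypothetical example word. A sentinel of -1 is stored first so the caller can tell an early exit from a loop that ran to completion.

```forth
VARIABLE ZTEMP                 \ stands in for a free zero-page cell
: Z! ( x -- )  ZTEMP ! ;
: Z@ ( -- x )  ZTEMP @ ;

\ Return the index of the first occurrence of char, or -1 if absent.
: FIND-BYTE ( c-addr u char -- index | -1 )
   -1 Z!                                    \ sentinel: not found yet
   SWAP 0 ?DO                               ( c-addr char )
      OVER I + C@ OVER =                    \ does this byte match?
      IF  I Z!  LEAVE  THEN                 \ save the index, exit early
   LOOP
   2DROP Z@ ;
```

For example, `S" HELLO" CHAR L FIND-BYTE .` prints 2, the index of the first L.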
- GARTHWILSON
Re: Extra stacks
IamRob wrote:
Another use for the ZP location is the loop variable doesn't get retained when LEAVE is encountered. So I will use: I Z! LEAVE in words that contain a loop that exits prematurely.
How 'bout just having LEAVE store the loop index in a variable, all in the one primitive so it's faster. I think I'll do that myself.
- GARTHWILSON
Re: Extra stacks
Jim, you have quite a few words above that are neither part of any standard I know of, nor defined above. One I'll ask about however is REDEFINE:. It appears to edit the old word to redirect execution to the new one, for secondaries that are already compiled using it, so those secondaries don't need to be recompiled. Is that what's happening? I've had a way to do that but I like yours more.
Re: Extra stacks
GARTHWILSON wrote:
Jim, you have quite a few words above that are neither part of any standard I know of, nor defined above. One I'll ask about however is REDEFINE:. It appears to edit the old word to redirect execution to the new one, for secondaries that are already compiled using it, so those secondaries don't need to be recompiled. Is that what's happening? I've had a way to do that but I like yours more.
No. Sorry, I should have been clear about that. This section:
Code: Select all
: >R >A ;
REDEFINE: >R
OK
: 2>R 2>A ;
REDEFINE: 2>R
OK
: DUP>R DUP>A ;
REDEFINE: DUP>R
OK
: R> A> ;
REDEFINE: R>
OK
: 2R> 2A> ;
REDEFINE: 2R>
OK
: R@ A@ ;
REDEFINE: R@
OK
is part of the log of the interactive session where I hand traced the word (DR/W) . I had modified it to make it easier to support the 1581 disk drive as well as the others.
"REDEFINE: >R" is a message from the system letting me know I redefined >R . It's harder to tell that from the print dump than a live session so I may change that message to something like:
"YOU REDEFINED >R"
or even
">R EXISTS"
or maybe
">R REDEFINED"
or even
">R WAS REDEFINED"
Your comment does give me an idea and I'll have to give it some thought.
As for the other words, the source is from the source for my Forth kernel.
NH sets a flag so the metacompiler compiles the next word headerless. For interactive testing in Forth, not metacompiling, I redefine NH as a no-op:
Code: Select all
: NH ;
In the log, the phrase "2 DRIVE" ( "10 DRIVE" would also work ) sets the current drive to drive 10 ( drive 8 being selected with "0 DRIVE" or "8 DRIVE" ) . Commodore 64 disk drives start at device 8 and go up from there.
SPLIT splits a cell into its low byte and high byte. It is seven bytes and is a really fast "$100 /MOD" .
Fleet Forth, like Blazin' Forth, uses direct access to drive sectors for block access ( on disks that are only supposed to be for blocks) .
T&S derives the starting track and sector from the block number.
(IS) is the primitive used by IS and TO to write the value on the data stack into the first cell of the parameter field of the following word in the definition and bump IP past said word.
SR/W is the sector read write word.
UNDER+ has the following stack diagram:
( N1 N2 N3 -- N1+N3 N2 )
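For anyone following along on a different Forth, high-level stand-ins for UNDER+ and SPLIT can be written as below. These are assumed equivalents built from the stack diagram and description given here, not the Fleet Forth definitions, which are faster code words.

```forth
\ Hypothetical high-level equivalents, for experimenting on other systems.

\ Add the top item to the third item, leaving the second item on top.
: UNDER+ ( n1 n2 n3 -- n1+n3 n2 )  ROT + SWAP ;

\ Split an unsigned cell into its low byte and the remaining high part.
\ With 16-bit cells the high part is exactly the high byte.
: SPLIT  ( u -- lo hi )  DUP 255 AND  SWAP 8 RSHIFT ;
```

In HEX, `1234 SPLIT` leaves 34 12 on the stack, and `10 20 5 UNDER+` leaves 15 20.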
(DR/W) is the vector for the deferred word DR/W , disk read write.
Either DR/W or RR/W ( ram read write ) is executed by R/W depending on the block number.
I've modified things since my latest upload. For a block number of $8000 and up, RR/W is executed. RR/W sees a block number $8000 less than the actual block number.
I hope this clarifies things.
Re: Extra stacks
IamRob wrote:
Instead of >r and r>, or even variables, I started using free ZP locations for temporary storage. I call it Z! and Z@, which are defined as,
: Z! 0 ! ; (or any free ZP memory)
: Z@ 0 @ ;
One disadvantage is keeping track of which ZP locations you are using if you need more than one. The aux stack is an actual stack and my implementation is over 40 cells deep. ( memory the C64 wasn't using below screen memory ) .
Quote:
The advantage of using memory locations compared to >R is it doesn't have to be DROP'd at the end, with R> DROP, if the value is not needed.
Another use for the ZP location is the loop variable doesn't get retained when LEAVE is encountered. So I will use: I Z! LEAVE in words that contain a loop that exits prematurely.
Wouldn't you still need to initialize the storage with a sentinel value so you know if you left the loop prematurely? Something like 0 Z! or -1 Z! ?
In one of my system words, I leave a loop like this:
Code: Select all
?DO
DUP I >BT @ =
IF DROP I UNLOOP
ELSE CS>A
LOOP
UNLOOP discards the loop parameters and I branch out of the loop by moving the control flow data from ELSE to the aux stack. Further in the definition I resolve the ELSE by moving the control flow data back to the data stack ( the control flow stack ) like this:
Code: Select all
A>CS THEN
Yes, the definition was a bit long. It could not be factored into non trivial smaller parts that got used more than once though.
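On a system without CS>A and A>CS , the nearest standard idiom is UNLOOP followed by EXIT, which leaves the whole word rather than branching to a later point in the same definition. The sketch below shows that substitute technique, not the Fleet Forth approach; FIND-CELL is a hypothetical example word.

```forth
\ Search n cells starting at addr for the value x.
\ UNLOOP discards the loop parameters so that EXIT can return safely.
: FIND-CELL ( x addr n -- index true | false )
   0 ?DO                                    ( x addr )
      OVER OVER I CELLS + @ =               \ compare x with cell number I
      IF  2DROP I TRUE  UNLOOP EXIT  THEN   \ found: clean up and leave
   LOOP
   2DROP FALSE ;                            \ loop completed: not found
```

Because the found and not-found cases return different stack pictures with a flag on top, no sentinel value is needed. The cost is that any code that must run after the loop in the found case has to go before the EXIT or into the caller.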
- GARTHWILSON
Re: Extra stacks
JimBoyd wrote:
I hope this clarifies things.
Quote:
SPLIT splits a cell into its low byte and high byte.
Re: Extra stacks
JimBoyd wrote:
IamRob wrote:
Instead of >r and r>, or even variables, I started using free ZP locations for temporary storage. I call it Z! and Z@, which are defined as,
: Z! 0 ! ; (or any free ZP memory)
: Z@ 0 @ ;
One disadvantage is keeping track of which ZP locations you are using if you need more than one. The aux stack is an actual stack and my implementation is over 40 cells deep. ( memory the C64 wasn't using below screen memory ) .
Quote:
The advantage of using memory locations compared to >R is it doesn't have to be DROP'd at the end, with R> DROP, if the value is not needed.
Another use for the ZP location is the loop variable doesn't get retained when LEAVE is encountered. So I will use: I Z! LEAVE in words that contain a loop that exits prematurely.
Wouldn't you still need to initialize the storage with a sentinel value so you know if you left the loop prematurely? Something like 0 Z! or -1 Z! ?
In one of my system words, I leave a loop like this:
Code: Select all
?DO
DUP I >BT @ =
IF DROP I UNLOOP
ELSE CS>A
LOOP
UNLOOP discards the loop parameters and I branch out of the loop by moving the control flow data from ELSE to the aux stack. Further in the definition I resolve the ELSE by moving the control flow data back to the data stack ( the control flow stack ) like this:
Code: Select all
A>CS THEN
Yes, the definition was a bit long. It could not be factored into non trivial smaller parts that got used more than once though.
Yes, I have to initialize Z! to zero. I have thought of implementing something like UNLOOP, which is a cleaner exit and probably faster. Is your UNLOOP a primitive or a word?