Faster loops

JimBoyd
Posts: 931
Joined: 05 May 2017

Re: Faster loops

Post by JimBoyd »

Dr Jefyll wrote:
To say it's superior at all times and in all circumstances is rather a sweeping statement. :roll: No way do I endorse talk like that. Perhaps it's a case of the Vocal Minority making the most noise. I hope no-one makes the mistake of supposing all Forth adherents are afflicted with such an unreasoning excess of zeal.

I certainly never made such a statement. I merely took exception to the perceived implication that Forth was not good for anything.
Druzyek
Posts: 367
Joined: 12 May 2014

Post by Druzyek »

JimBoyd wrote:
Dr Jefyll wrote:
To say it's superior at all times and in all circumstances is rather a sweeping statement. :roll: No way do I endorse talk like that. Perhaps it's a case of the Vocal Minority making the most noise. I hope no-one makes the mistake of supposing all Forth adherents are afflicted with such an unreasoning excess of zeal.

I certainly never made such a statement. I merely took exception to the perceived implication that Forth was not good for anything.
Right, I don't remember you saying anything like that. On the other hand, I don't remember anyone (including myself) ever saying here that Forth is not good for anything. As I pointed out in the thread on it, the article makes clear at the very beginning that I was doing the comparison so I could build a graphing calculator with hundreds of kilobytes of RAM. I don't think Forth is a good choice for that project, but that doesn't imply that Forth is not good for anything. The other calculator I'm working on for the 6507 contest does use Forth for the user interface since I'm trying to squeeze the firmware into an 8K ROM and only have 2K of RAM. Obviously in that case I do think Forth is good for something.

Post by JimBoyd »

SamCoVT wrote:
Rather than replace Tali2's do-loop words, these words are meant to complement them. They use the Y register similar to how one might in assembly to count down to zero, so only the upper limit is given, and it's limited to 8 bits (but you can get 256 loops by specifying 0 as the starting value). Because these words use the Y register, I just made word names with a "Y" prefix. After writing this, I realized that it doesn't really matter which register is used, and A could have been used just as well (and sometimes is used because Tali2 has a PUSH-A macro to get A on the top of the Forth data stack).
Quote:
' testingyyij cycle_test CYCLES: 5420625 ok
' testingdodoij cycle_test CYCLES: 11053940 ok
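
To make the counting rule in the quote concrete, here is a minimal sketch in plain standard Forth. It is only a high-level model of the behaviour described (count down to zero, 8-bit index, a start value of 0 giving 256 passes), not the Tali2 or Fleet Forth implementation, and YCOUNT is a made-up name used just for illustration.

```forth
\ High-level model only: standard Forth, NOT the Tali2 or Fleet Forth code.
\ YCOUNT is a hypothetical word that counts the passes a Y loop would make.
: YCOUNT ( b -- passes )
   255 AND  0 SWAP        \ keep the low 8 bits; pass counter underneath
   BEGIN
      SWAP 1+ SWAP        \ stand-in for the loop body: count one pass
      1- 255 AND          \ 8-bit decrement, wraps 0 -> 255
      DUP 0=              \ stop once the index reaches zero
   UNTIL DROP ;

5 YCOUNT .    \ 5    the index runs 5 4 3 2 1
0 YCOUNT .    \ 256  the index runs 0 255 254 ... 1
```

The body always runs at least once, and the index seen inside the loop never includes the value that ends it.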

I implemented these Y loops in Fleet Forth, an ITC Forth. Since it runs on a Commodore 64, the processor is the 6510, an NMOS version of the 6502, so there are no PLY and PHY instructions.
In my first attempt at coding (YLOOP) , I had to reset the Y index register to zero before jumping to BRANCH , because the words BRANCH and ?BRANCH expect it to be zero ( NEXT leaves zero in the Y index register).

Code:

CODE (YLOOP)
   PLA  TAY  DEY
   0= NOT IF
      TYA  PHA  0 # LDY
      ' BRANCH @ JMP
   THEN
   ' ?BRANCH @ 8 + JMP  END-CODE

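I think this version of (YLOOP) for the 6510 is better. It does the decrement in the accumulator, so the Y index register is never disturbed and does not need to be cleared again.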

Code:

CODE (YLOOP)
   PLA  SEC  1 # SBC
   0= NOT IF
      PHA
      ' BRANCH @ JMP
   THEN
   ' ?BRANCH @ 8 + JMP  END-CODE
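
For a rough idea of what the second version saves, the cycle counts on the loop-taken path can be added up. This is a back-of-the-envelope sketch in standard Forth, not a measurement: the instruction timings are the documented NMOS 6502 ones, but treating 0= NOT IF as a single BEQ (2 cycles when not taken) is an assumption about the assembler, and V1-LOOP and V2-LOOP are hypothetical names.

```forth
\ Cycles from entry to the JMP into BRANCH, loop-taken path.
\ Assumes  0= NOT IF  assembles to one BEQ that is not taken (2 cycles).
: V1-LOOP ( -- cycles )   \ PLA TAY DEY BEQ TYA PHA LDY# JMP
   4 2 + 2 + 2 + 2 + 3 + 2 + 3 + ;
: V2-LOOP ( -- cycles )   \ PLA SEC SBC# BEQ PHA JMP
   4 2 + 2 + 2 + 3 + 3 + ;

V1-LOOP .  V2-LOOP .  V1-LOOP V2-LOOP - .   \ 20 16 4
```

Under those assumptions the second version is 4 cycles shorter per pass. The exit path costs the same in both, and in the first version Y is already zero there because the DEY is what produced the zero.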

Here is all of it.

Code:

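CODE (YDO)  ( B -- )
   ( move the low byte of the top data stack cell to the return stack )
   0 ,X LDA  PHA  POP JMP  END-CODE
: YDO
   COMPILE (YDO)
   <MARK NEGATE ; IMMEDIATE
CODE (YLOOP)
   PLA  SEC  1 # SBC        ( pull the index and decrement it )
   0= NOT IF                ( not zero yet: )
      PHA                   ( put the index back )
      ' BRANCH @ JMP        ( and branch back to the loop start )
   THEN
   ( zero: loop done, enter ?BRANCH where it skips the branch address )
   ' ?BRANCH @ 8 + JMP  END-CODE
: YLOOP
   COMPILE (YLOOP)
   NEGATE <RESOLVE ; IMMEDIATE
CODE YI  ( -- B )
   PLA  PHA                 ( copy the top return stack byte )
   APUSH JMP  END-CODE
CODE YJ  ( -- B )
   XSAVE STX  TSX           ( save the data stack pointer, X = stack pointer )
   $102 ,X LDA              ( second byte on the return stack )
   XSAVE LDX                ( restore the data stack pointer )
   APUSH JMP  END-CODE
CODE YK  ( -- B )
   XSAVE STX  TSX
   $103 ,X LDA              ( third byte on the return stack )
   XSAVE LDX
   APUSH JMP  END-CODE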

: TESTING
   5 YDO
      3 YDO
         CR ." YI=" YI . ." YJ=" YJ .
      YLOOP
   YLOOP ;
: TESTING2
   2 YDO
      5 YDO
         3 YDO
            CR ." YI=" YI . ." YJ=" YJ . ." YK=" YK .
         YLOOP
      YLOOP
   YLOOP ;

: TESTINGYYIJ
   255 YDO 255 YDO  YI YJ 2DROP  YLOOP YLOOP ;
: TESTINGDODOIJ
   255 0 DO  255 0 DO  I J 2DROP  LOOP LOOP ;

I included YK because Fleet Forth has K in the system loader. I don't think triply nested DO LOOPs are a good idea, but they may be helpful during the prototyping phase to flesh out an idea.

I noticed that in your results testingyyij took slightly less than half the time of testingdodoij .
Since Fleet Forth is an ITC Forth, the gain is smaller: in my results TESTINGYYIJ took about 83-84 percent of the time of TESTINGDODOIJ .
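
As a quick check of the "slightly less than half" figure, the two cycle counts quoted above can be divided in standard Forth. This is just arithmetic on the quoted numbers; it needs a Forth with cells of at least 32 bits (a desktop Forth, not a 16-bit 6502 one), and */ is used because it keeps a double-width intermediate.

```forth
\ Ratio of the quoted Tali2 cycle counts, in tenths of a percent.
5420625 1000 11053940 */ .   \ 490, i.e. about 49.0 percent
```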

Post by JimBoyd »

barrym95838 wrote:
Quote:
didn't someone find out TaliForth 2 was pushing a 1 on the stack then falling through to +LOOP? It might be worth looking at 8 and 16-bit versions that don't do this if you're looking at new DO...LOOP versions.
If that's true, it might be partially my fault, since ISTR mentioning an idea like that to Scot a few years back, without considering the potential performance hit.

[edit: yep, guilty!]

I presented an idea for DO LOOPs in an STC Forth where LOOP does NOT fall through to +LOOP .
leepivonka shows how to enhance the performance a couple of posts down.
Now that I think about it, leepivonka has posted about improving TaliForth's DO LOOPs elsewhere on this forum.