Help on fig-FORTH 1.0 Bring Up

MichaelM · Post by **MichaelM** » Wed Dec 17, 2014 2:38 am

I finally punted on the fig-FORTH from the archives, fig-FORTH 1.1. I am too much of a newbie to FORTH to understand what's happening. I significantly increased the amount of information that the trace subroutine output to the console, but that did not appear to help me find the problem.

I finally compared it to the file that enso had posted, which is for fig-FORTH 1.0k. I removed all changes from my file that I could detect using WinMerge, but still no luck. I opted to use enso's source file, and just provided the three monitor routines required for character I/O.

It came right up without any issues. I still can't see which differences between the two source files cause the fig-FORTH 1.1 source to hang in the INTERPRET routine.

I've attached a capture of the terminal output from a working session:

Code:

65C02 Monitor Lite v5.1.5 (7-Dec-14) Ready

>0400G

 PC=3A1D  A=2A  X=B4  Y=FF  S=FF  P=73 (NVRBDIZC)=01110011

>D000G

fig-FORTH 1.0a
 OK
vlist
MON   VLIST   TRIAD   INDEX   LIST   ?   .   .R   D.   D.R   #S   #   SIGN   #>   <#   SPACES   WHILE   ELSE   IF   REPEAT   AGAIN   END   UNTIL   +LOOP   LOOP   DO   THEN   ENDIF   BEGIN   BACK   FORGET   '   RSW   -BCD   -->   LOAD   MESSAGE   .LINE   (LINE)   BLOCK   BUFFER   DR1   DR0   EMPTY-BUFFERS   FLUSH   UPDATE   +BUF   PREV   USE   M/MOD   */   */MOD   MOD   /   /MOD   *   M/   M*   MAX   MIN   DABS   ABS   D+-   +-   S->D   COLD   ABORT   QUIT   (   DEFINITIONS   FORTH   VOCABULARY   IMMEDIATE   INTERPRET   ?STACK   DLITERAL   LITERAL   [COMPILE]   CREATE   ID.   ERROR   (ABORT)   -FIND   NUMBER   (NUMBER)   UPPER   WORD   PAD   HOLD   BLANKS   ERASE   FILL      QUERY   EXPECT   ."   (.")   -TRAILING   TYPE   COUNT   DOES>   <BUILDS           (         DECIMAL   HEX   SMUDGE   ]   [   COMPILE   ?LOADING   ?CSP   ?PAIRS   ?EXEC   ?COMP   ?ERROR   !CSP   PFA   NFA   CFA   LFA   LATEST   TRAVERSE   -DUP   SPACE   ROT   >   <   U<   =   -   C,   ,   ALLOT   HERE   2+   1+   HLD   R#   CSP   FLD   DPL   BASE   STATE   CURRENT   CONTEXT   OFFSET   SCR   OUT   IN   BLK   VOC-LINK   DP   FENCE   WARNING   WIDTH   TIB   +ORIGIN   B/SCR   B/BUF   LIMIT   FIRST   C/L   BL   3   2   1   0   USER   VARIABLE   CONSTANT   ;   :   C!   !   C@   @   T   +!   DUP   SWAP   DROP   OVER   DMINUS   MINUS   D+   +   0<   0=   R   R>   >R   LEAVE        RP!   SP!   SP@   XOR   OR   AND   U/   U*   CMOVE   CR   ?TERMINAL   KEY   EMIT   ENCLOSE   (FIND)   DIGIT   I   (DO)   (+LOOP)   (LOOP)   0BRANCH   BRANCH   EXECUTE   CLIT   LIT   OK
: star 42 emit ; OK
: stars 0 do star loop ; OK
: margin cr 30 spaces ; OK
: blip margin star ; OK
: bar margin 5 stars ; OK
: f bar blip bar blip blip cr ; OK
f
                              *****
                              *
                              *****
                              *
                              *
OK
vlist
F   BAR   BLIP   MARGIN   STARS   STAR   MON   VLIST   TRIAD   INDEX   LIST   ?   .   .R   D.   D.R   #S   #   SIGN   #>   <#   SPACES   WHILE   ELSE   IF   REPEAT   AGAIN   END   UNTIL   +LOOP   LOOP   DO   THEN   ENDIF   BEGIN   BACK   FORGET   '   RSW   -BCD   -->   LOAD   MESSAGE   .LINE   (LINE)   BLOCK   BUFFER   DR1   DR0   EMPTY-BUFFERS   FLUSH   UPDATE   +BUF   PREV   USE   M/MOD   */   */MOD   MOD   /   /MOD   *   M/   M*   MAX   MIN   DABS   ABS   D+-   +-   S->D   COLD   ABORT   QUIT   (   DEFINITIONS   FORTH   VOCABULARY   IMMEDIATE   INTERPRET   ?STACK   DLITERAL   LITERAL   [COMPILE]   CREATE   ID.   ERROR   (ABORT)   -FIND   NUMBER   (NUMBER)   UPPER   WORD   PAD   HOLD   BLANKS   ERASE   FILL      QUERY   EXPECT   ."   (.")   -TRAILING   TYPE   COUNT   DOES>   <BUILDS           (         DECIMAL   HEX   SMUDGE   ]   [   COMPILE   ?LOADING   ?CSP   ?PAIRS   ?EXEC   ?COMP   ?ERROR   !CSP   PFA   NFA   CFA   LFA   LATEST   TRAVERSE   -DUP   SPACE   ROT   >   <   U<   =   -   C,   ,   ALLOT   HERE   2+   1+   HLD   R#   CSP   FLD   DPL   BASE   STATE   CURRENT   CONTEXT   OFFSET   SCR   OUT   IN   BLK   VOC-LINK   DP   FENCE   WARNING   WIDTH   TIB   +ORIGIN   B/SCR   B/BUF   LIMIT   FIRST   C/L   BL   3   2   1   0   USER   VARIABLE   CONSTANT   ;   :   C!   !   C@   @   T   +!   DUP   SWAP   DROP   OVER   DMINUS   MINUS   D+   +   0<   0=   R   R>   >R   LEAVE        RP!   SP!   SP@   XOR   OR   AND   U/   U*   CMOVE   CR   ?TERMINAL   KEY   EMIT   ENCLOSE   (FIND)   DIGIT   I   (DO)   (+LOOP)   (LOOP)   0BRANCH   BRANCH   EXECUTE   CLIT   LIT   OK
 OK

After downloading the FPGA, I ran Klaus' functional test program, and it passed as expected. I then booted fig-FORTH 1.0a, ran vlist to get a listing of the dictionary, and then typed in and ran a small program (from Leo Brodie's Starting Forth). The program ran fine and a second vlist shows the words of the program added to the dictionary.

Any suggestions for additional tests are appreciated. I did try to FORGET some of the words in my test program, but that didn't seem to work as I expected.

barrym95838 · Post by **barrym95838** » Wed Dec 17, 2014 4:20 pm

MichaelM wrote:

... I did try to FORGET some of the words in my test program, but that didn't seem to work as I expected.

I think that FORGET truncates the dictionary at the word you specify, effectively forgetting that word and every word defined later than it. Is that the behavior you noted?

Mike

P.S. I had a feeling that enso's binary would figure into the mix somehow.

MichaelM · Post by **MichaelM** » Wed Dec 17, 2014 7:49 pm

No. I may have misused it due to lack of experience, but it did not forget anything. I read that it will ignore any request to forget words below the "fence", but I attempted to remove words that I had recently added.
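As a rough illustration of the behavior being discussed (truncation at the named word, refusal below the fence), here is a toy Python model. It is only a sketch: the `Dictionary` class, its fields, and the word list are hypothetical, and it does not reproduce the actual fig-FORTH FORGET source.

```python
class Dictionary:
    """Toy model of a linear Forth dictionary (oldest word first)."""

    def __init__(self, words, fence):
        self.words = list(words)  # definition order, oldest first
        self.fence = fence        # index below which words are protected

    def forget(self, name):
        # Search from the newest definition backwards.
        for index in range(len(self.words) - 1, -1, -1):
            if self.words[index] == name:
                break
        else:
            raise KeyError(f"{name} not found")
        if index < self.fence:
            raise PermissionError(f"{name} is below the fence")
        # Truncate: the named word and everything defined after it go away.
        del self.words[index:]


d = Dictionary(["DUP", "SWAP", "STAR", "STARS", "MARGIN", "BLIP", "BAR", "F"], fence=2)
d.forget("MARGIN")
print(d.words)
```

In this model, forgetting MARGIN also removes BLIP, BAR and F, because they were defined later.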

I did not use the binary because of the differences in the memory map between the ChoChi board and my M65C02/M16C5x and Chameleon Development Boards. I only had to modify the memory defines, provide the addresses of the monitor routines for character I/O and carriage return output, and re-assemble. I did modify the character output call in the fig-FORTH kernel to mask off the msb of the character in order to provide only 7-bit ASCII to the console. This is required in order to use the console output routine with VLIST because fig-FORTH sets the msb of the last character in the name.
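To make the msb issue concrete, here is a small Python sketch of the masking idea. The helper names `name_chars` and `emit_7bit` are made up for illustration; only the "last character has its msb set" convention comes from the description above.

```python
def name_chars(name):
    """Return the name's characters with the msb set on the last one."""
    data = bytearray(name.encode("ascii"))
    data[-1] |= 0x80  # flag the final character of the name
    return bytes(data)


def emit_7bit(raw):
    """Mask off bit 7 of every byte before it reaches the console."""
    return bytes(b & 0x7F for b in raw).decode("ascii")


raw = name_chars("VLIST")
print(hex(raw[-1]))    # last byte has bit 7 set
print(emit_7bit(raw))  # masked output is plain 7-bit ASCII
```

Without the mask, a console that interprets 8-bit characters would show the wrong glyph for the final letter of every name.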

MichaelM · Post by **MichaelM** » Thu Dec 18, 2014 4:38 am

I have been experimenting a bit with the fig-FORTH implementation running on the 14.7456 MHz M65C02/M16C5x Development Board. I located a number of benchmarks. One in particular was a Sieve of Eratosthenes in FORTH.

I placed a loop around the sieve and ran it 10000 times. The sieve program/word was finding the primes between 2 and 100. With a stop watch I measured a time of 135 seconds. Since 135 s is the time for 10000 iterations of the inner prime factor routine, that time translates into 13.5 ms per iteration.

Does anyone know where to locate the timings for a similar benchmark run on a stock 6502/65C02?

BigEd · Post by **BigEd** » Thu Dec 18, 2014 8:42 am

There's an example program with timings around p21 of this PDF:
http://www.forth.org/TM-10656.pdf
(But it's seeking a different range of primes so the times won't be comparable to yours.)

Their code runs on an Apple II in 190 seconds, on a 5MHz PC in 70s and a 12MHz Z8 in 102s. I think the commentary is saying that the FORTH code in question isn't especially good, and it's then sped up by 3x.

MichaelM · Post by **MichaelM** » Thu Dec 18, 2014 2:38 pm

Thanks for the link.

Somewhere else I found a Forth version of the Sieve of Eratosthenes that used a search range of 100. I downloaded one program, but it didn't work at all. The second program that I downloaded yielded the same results as those given for a Fortran 90 version of the sieve after I modified the search range.

I had downloaded and read that paper earlier last evening and decided that the search range was probably bigger than my little board could support with internal memory.

I determined that for a $D000 fig-FORTH location, running the sieve with that large a search range eats the configuration. However, I found that for a $0400 fig-FORTH location, there is sufficient memory to allocate the flag array.

With the search range set to 16384 and 10 iterations, the 14.7456 MHz M65C02A executes the sieve benchmark (using the first working program) in a stop watch time of approximately 17.5 seconds. (Note: the program linked to by BigEd uses a value of 8190, but it applies one optimization: since 2 is known to be prime, all of its multiples are ignored. Only the odd numbers need a flag, so the search space is reduced by half, i.e. the maximum number of cells required is 8190.)

The time measured is 10.86x faster than the Apple II time given in the reference. Examining the differences between the sieve program that I've been using and the one given in the reference, I found that the program I used included an unnecessary multiplication. The program provided in the reference used the FORTH idiom "DUP +" to perform a multiply by 2 rather than using "*". With this change, the stop watch time for 10 iterations is approximately 10.5 seconds rather than 17.5 seconds. (A substantial improvement.)

The second measured time is 18.1x faster than the Apple II time given in the reference. The 65CE02 datasheet indicates that it is roughly 25% faster than a standard 65C02 because it eliminates the dead instruction/memory cycles of the 65C02. The M65C02A also eliminates all dead instruction/memory cycles like the 65CE02. At a base frequency of 14.7456 MHz, a 25% improvement due to dead cycle elimination would yield an effective clock rate of 18.432 MHz (14.7456 MHz x 1.25), or roughly 18.4x a 1 MHz 6502. This result is very close to the measured speed difference of 18.1x between the 1 MHz Apple II 6502 Sieve benchmark and the 14.7456 MHz M65C02A Sieve benchmark.
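For anyone who wants to check the arithmetic, the ratios can be reproduced with a few lines of Python. The inputs are the stop watch times and clock figures quoted above; the variable names are my own, and the 1 MHz Apple II clock is taken as stated rather than measured.

```python
APPLE_II_SECONDS = 190.0  # reference time quoted from the PDF
FIRST_RUN_SECONDS = 17.5  # sieve using "2 *"
SECOND_RUN_SECONDS = 10.5 # sieve using "DUP +"
BASE_CLOCK_MHZ = 14.7456  # M65C02A clock
DEAD_CYCLE_GAIN = 1.25    # 25% figure quoted from the 65CE02 datasheet

first_speedup = APPLE_II_SECONDS / FIRST_RUN_SECONDS
second_speedup = APPLE_II_SECONDS / SECOND_RUN_SECONDS
effective_mhz = BASE_CLOCK_MHZ * DEAD_CYCLE_GAIN

print(f"first run:  {first_speedup:.2f}x")
print(f"second run: {second_speedup:.2f}x")
print(f"effective clock: {effective_mhz:.3f} MHz")
```

Stop watch timing limits the precision here, so the last digit of each ratio should not be taken too seriously.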

I had previously contended that the M65C02/M65C02A cores would show an improvement over a standard, cycle-accurate core implementation, but had been unable to find a way to measure that difference with a commonly available program. This little exercise has allowed that feature of the M65C02/M65C02A cores to be demonstrated. All in all a good exercise.

Thanks BigEd.

Now on to adding the planned FORTH VM support instructions to the M65C02A. Before closing this post, I've included the source for the two FORTH Sieve of Eratosthenes programs below:

Code:

: 2DROP drop drop ;
16384 2 / CONSTANT maxp 
: SIEVE                             ( -- n )
  HERE maxp 1 FILL
  1                                 ( count, including 2 )
  maxp 0 DO
    I HERE + C@ IF
      I 2 * 3 + ( dup . ) DUP  I +  ( prime current )
      BEGIN  DUP maxp U<
      WHILE  0 OVER HERE + C!
             OVER +
      REPEAT
      2DROP 1+
    THEN
  LOOP ;
: PRIMES ." S" 10 1 do sieve sp! loop sp! ." E" ;

Code:

8192 constant size
0 variable flags size 1 - allot
: sieve
    flags size 1+ 1 fill
    0 size 0
    do flags i + c@
        if i dup + 3 + dup i +
            begin dup size <
            while 0 over flags + c! over + repeat
            drop drop 1+
        then
    loop ;
: primes ." S" 10 1 do sieve sp! loop sp! ." E" ;
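For readers less familiar with Forth, the odd-only sieve that both listings implement can be sketched in Python as follows. This is a port of the idea rather than a line-for-line translation: flag `i` stands for the odd number `2*i + 3`, the count starts at 0 as in the second listing (the first listing starts at 1 to include 2), and the function name and signature are my own.

```python
def sieve(size):
    """Count the primes among the odd numbers 2*i + 3 for i in range(size)."""
    flags = [True] * size
    count = 0
    for i in range(size):
        if flags[i]:
            prime = i + i + 3      # same as "i dup + 3 +"
            k = i + prime          # index of 3 * prime
            while k < size:
                flags[k] = False   # strike out odd multiples of prime
                k += prime
            count += 1
    return count


print(sieve(4))  # flags cover 3, 5, 7, 9; only 9 is struck out -> 3
```

Stepping the index by `prime` moves the represented number by `2 * prime`, which is why even multiples never need to be visited.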

GARTHWILSON · Post by **GARTHWILSON** » Thu Dec 18, 2014 6:59 pm

MichaelM wrote:

I found that the program I used included an unnecessary multiplication. The program provided in the reference used the FORTH idiom "DUP +" to perform a multiply by 2 rather than using "*".

2* ("two-star") is a standard Forth word which multiplies by two by shifting rather than by multiplying.

Refer also to our topic on multiplying which I started with my bug fix for UM* and then Bruce offered the first performance improvement and then there were other improvements, plus comparison of different methods' performance, at viewtopic.php?f=9&t=689 . And if that's not enough, I have the large tables for super fast, accurate 16-bit math at http://wilsonminesco.com/16bitMathTables/index.html .

MichaelM · Post by **MichaelM** » Thu Dec 18, 2014 10:04 pm

GARTHWILSON wrote:

2* ("two-star") is a standard Forth word which multiplies by two by shifting rather than by multiplying.

That may be for a standard FORTH dialect, but I don't find it listed in the off-the-shelf fig-FORTH dictionary of the implementation with which I am currently working. Without a primitive for 2* using an arithmetic left shift operator, I can imagine DUP + always being faster than 2 * even using a fast multiply algorithm or a table-driven multiplication algorithm, or am I missing something?

GARTHWILSON · Post by **GARTHWILSON** » Thu Dec 18, 2014 10:48 pm

I'm surprised to find (I just verified) that 2* is not in fig-Forth. I have never been aware of any Forth that didn't have it. You could add it:

Code:

        HEADER "2*", NOT_IMMEDIATE
_2STAR: ASL  0,X
        ROL  1,X
        JMP  NEXT

which will take just a few cycles plus NEXT, rather than nest, NEXT, DUP, NEXT, ADD, NEXT, unnest, NEXT, so it runs much faster than doing it as a colon definition.
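A quick way to convince yourself that the ASL/ROL pair is equivalent to DUP + (and to 2 *) on a 16-bit cell is to model it in Python. This is only a sketch of the byte-level arithmetic, assuming the cell is stored low byte first; the helper names are hypothetical.

```python
def two_star(lo, hi):
    """Model ASL on the low byte followed by ROL on the high byte."""
    carry = (lo >> 7) & 1            # ASL pushes bit 7 of the low byte into carry
    lo = (lo << 1) & 0xFF
    hi = ((hi << 1) | carry) & 0xFF  # ROL pulls the carry into bit 0 of the high byte
    return lo, hi


def cell_to_bytes(n):
    return n & 0xFF, (n >> 8) & 0xFF


def bytes_to_cell(lo, hi):
    return lo | (hi << 8)


for n in (0, 1, 0x0080, 0x1234, 0x7FFF, 0x8000, 0xFFFF):
    shifted = bytes_to_cell(*two_star(*cell_to_bytes(n)))
    assert shifted == (n + n) & 0xFFFF == (n * 2) & 0xFFFF
```

All three forms give the same 16-bit result; the difference between them on the real machine is purely in how many cycles and NEXT dispatches they cost.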

As for the look-up tables, a full look-up table for multiplying two inputs of 16 bits each would be prohibitively huge at 16 gigabytes (4G answers with 4 bytes each); so the multiplication table is 256x256 or 64K 16-bit cells, meaning 128KB for the one table, so yes, you would have to do one to four look-ups and piece it together. A 6502's smaller address space would need a window into a larger address space for the big tables (my whole set of tables combined takes 2MB, and I can provide them in a pair of 1MB EPROMs, free except for shipping), but the 65816 can address the tables directly, without such a window. The tables become more valuable for functions that are more complex, so a log or trig function that would require several divisions and multiplications to actually calculate is reduced to a single look-up, with no interpolation, because every single answer is there, pre-calculated, accurate to all 16 bits. They were calculated with a machine that uses 12 decimal digits in floating-point, plus 3 guard digits, then rounded for the 16-bit output and converted to hex. For division, you can use the inversion table which has 32-bit outputs, take the bytes of interest, and then multiply the numerator by the inverse of the denominator.
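The size figures and the "piece it together" step can be sketched in Python. This is a generic illustration of building a 16x16 product from 8x8 table look-ups, not a description of how the actual tables are laid out; `TABLE` and `mul16` are names I made up.

```python
# Size arithmetic from the paragraph above.
FULL_TABLE_BYTES = (2**16 * 2**16) * 4  # 4G answers, 4 bytes each
BYTE_TABLE_BYTES = 256 * 256 * 2        # 64K answers, 2 bytes each

# 256 x 256 table of 16-bit products.
TABLE = [[a * b for b in range(256)] for a in range(256)]


def mul16(x, y):
    """16 x 16 -> 32-bit multiply using four 8 x 8 look-ups."""
    xl, xh = x & 0xFF, x >> 8
    yl, yh = y & 0xFF, y >> 8
    return (TABLE[xl][yl]
            + ((TABLE[xl][yh] + TABLE[xh][yl]) << 8)
            + (TABLE[xh][yh] << 16))


print(FULL_TABLE_BYTES // 2**30, "GiB for a full 16 x 16 table")
print(BYTE_TABLE_BYTES // 1024, "KiB for the 8 x 8 table")
print(hex(mul16(0x1234, 0x5678)))
```

When one operand fits in a byte, the look-ups involving its zero high byte can be skipped, which is where the "one to four" range comes from.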

MichaelM · Post by **MichaelM** » Thu Dec 18, 2014 11:20 pm

There were a number of other words that the fig-FORTH implementation I am using did not support but that appeared to be expected as standard: 2DUP, 2DROP. These words were easy to add as colon definitions, although they would have been faster if defined as primitives.

Although I am quite a newbie to FORTH, I was pleasantly surprised at how easy some things are to incorporate/test. For example, I was able to configure the second serial port all in immediate FORTH mode, and to write some simple data to it. I can see why you use it extensively for developing your testers.

I still have quite a ways to go before being comfortable with much of the fig-FORTH kernel, but I can see that it might be a useful skill to have since it is relatively painless to include a working ITC FORTH kernel in a small amount of ROM. Such a built-in feature can be used to provide capabilities beyond the functions provided by the monitor in a very resource friendly manner.

I would be interested in reading about supporting interrupts, even as colon definitions, in FORTH. Do you have any recommended readings on this subject?

GARTHWILSON · Post by **GARTHWILSON** » Thu Dec 18, 2014 11:23 pm

MichaelM wrote:

I would be interested in reading about supporting interrupts, even as colon definitions, in FORTH. Do you have any recommended readings on this subject?

http://6502.org/tutorials/zero_overhead ... rupts.html

MichaelM · Post by **MichaelM** » Fri Dec 19, 2014 12:21 am

Very cool. Thanks. Pretty effective way to incorporate a very necessary element into a FORTH kernel. Have bookmarked and may consider expanding the fig-FORTH kernel to incorporate the concepts/words you described. Since the M65C02A interrupt controller provides 16 vectors, will have to expand your concepts to support the external IRQ and the 8 internal IRQ[7:0] maskable interrupt requests. (The other 7 requests support non-maskable interrupts and traps: NMI, COP, BRK, INValid instruction, ABORT, etc.)

barrym95838 · Post by **barrym95838** » Fri Dec 19, 2014 2:04 am

GARTHWILSON wrote:

MichaelM wrote:

I would be interested in reading about supporting interrupts, even as colon definitions, in FORTH. Do you have any recommended readings on this subject?

http://6502.org/tutorials/zero_overhead ... rupts.html

Garth, I hope that I'm not being hopelessly ignorant with these questions, but could you explain for me why you appear to be wasting three cycles with the JMP W-1 instructions in your versions of NEXT? Why not just JMP (W) straight from within NEXT, or, better yet, put all of NEXT in zero-page and self-modify an indirect JMP instruction that contains W inside its operand field? Am I somehow losing a level of indirection by thinking this way, and changing ITC to DTC?

Mike

[Edit: Thinking about it further, I have come to the realization that my first suggestion would be DTC, and only the zero-page NEXT method would be ITC. I think ... ]

GARTHWILSON · Post by **GARTHWILSON** » Fri Dec 19, 2014 3:33 am

barrym95838 wrote:

could you explain for me why you appear to be wasting three cycles with the JMP W-1 instructions in your versions of NEXT? Why not just JMP (W) straight from within NEXT, or, better yet, put all of NEXT in zero-page and self-modify an indirect JMP instruction that contains W inside its operand field? Am I somehow losing a level of indirection by thinking this way, and changing ITC to DTC?

Mike

[Edit: Thinking about it further, I have come to the realization that my first suggestion would be DTC, and only the zero-page NEXT method would be ITC. I think ... ]

I believe you're looking at a ROM version, with NEXT in ROM. And yes, it's ITC. I made my '816 Forth for ROM also; but at startup, it copies the NEXT image from ROM to ZP RAM so you can run it there more efficiently, using self-modifying code; i.e., the W variable is the operand of the JMP() instruction. IP is also done as part of self-modifying code, being the operand of an LDA abs instruction.
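For anyone following the ITC versus DTC question, the difference in indirection can be shown with a toy inner interpreter in Python. This is a textbook-style sketch, not the NEXT from the tutorial or from any of the kernels mentioned here: the memory layout, the fake code addresses (0x100 and up), and the function names are all hypothetical.

```python
def make_primitives(state):
    """Map fake 'machine code' addresses to Python routines."""
    def lit():   # push the cell that follows in the thread
        state["stack"].append(state["mem"][state["ip"]])
        state["ip"] += 1
    def dup():
        state["stack"].append(state["stack"][-1])
    def plus():
        b = state["stack"].pop()
        a = state["stack"].pop()
        state["stack"].append((a + b) & 0xFFFF)
    def halt():
        state["running"] = False
    return {0x100: lit, 0x110: dup, 0x120: plus, 0x130: halt}


def run(mem, start, indirect):
    state = {"mem": mem, "ip": start, "stack": [], "running": True}
    prims = make_primitives(state)
    while state["running"]:
        w = mem[state["ip"]]                    # NEXT: fetch the next thread cell into W
        state["ip"] += 1
        code_addr = mem[w] if indirect else w   # ITC adds one more fetch through W
        prims[code_addr]()
    return state["stack"]


# ITC: the thread at 20.. holds code-field addresses (10..13), which hold code addresses.
ITC_MEM = {10: 0x100, 11: 0x110, 12: 0x120, 13: 0x130,
           20: 10, 21: 21, 22: 11, 23: 12, 24: 13}
# DTC: the thread holds the code addresses directly.
DTC_MEM = {20: 0x100, 21: 21, 22: 0x110, 23: 0x120, 24: 0x130}

print(run(ITC_MEM, 20, indirect=True))   # LIT 21 DUP + -> [42]
print(run(DTC_MEM, 20, indirect=False))  # same thread, one fetch fewer per word
```

Both threads compute the same result; the only difference is whether the cell fetched by NEXT is jumped to directly or dereferenced once more first.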

barrym95838 · Post by **barrym95838** » Sat Dec 20, 2014 6:04 am

Got it. Okay, question #2:

Code:

setirq:              ; Use to record IRQ for NEXT.  Put this address in MIRQVEC.
     STZ  irqnot     ; Record that interrupt was req'ed by storing 0 in irqnot.
     STA  tempA      ; Temporarily save accumulator in tempA to restore below.
        PLA          ; Pull saved processor status byte off the μP stack,
        ORA  #4      ; set the bit corresponding to the interrupt disable,
        PHA          ; and push the revised status byte back on the stack.
     LDA  tempA      ; Restore the accumulator content.
     RTI             ; Return from interrupt.  μP status gets restored modified.

Could this be safely shortened to

Code:

setirq:              ; Use to record IRQ for NEXT.  Put this address in MIRQVEC.
     STZ  irqnot     ; Record that interrupt was req'ed by storing 0 in irqnot.
     PLP
     SEI
     PHP
     RTI             ; Return from interrupt.  μP status gets restored modified.

... without issues? Or is there a risk of another IRQ hitting between the PLP and SEI, mucking up the works?

Mike
