PostPosted: Thu Feb 06, 2020 4:37 pm 

Joined: Fri Apr 15, 2016 1:03 am
Posts: 140
Here are some replacement DO-loop words for Tali (DO, LEAVE, LOOP, +LOOP, I, J) that generate smaller inline code. They have only been lightly tested, and ?DO is not implemented yet.
The same code fragment as above is then compiled with the replacement words and disassembled with the current standard Tali.

Code:
Tali Forth 2 kernel for 65816s (5 Jan 2020)


Tali Forth 2 for the 65c02
Version 1.0 24. Jan 2020
Copyright 2014-2020 Scot W. Stevenson
Tali Forth 2 comes with absolutely NO WARRANTY
Type 'bye' to exit

A=0002 X=0076 Y=0000 S=01F9 EnvMXdIzc D=0000 B=00 4A90FB02 00e014 lsr a
.@f_zz2.txt
.g
\ Replacement DO words for Tali that generate smaller inline code  ok
\ 65c02 code done without Tali's assembler, so this will run on small Tali.  ok
  ok
decimal  ok
5 nc-limit !  \ inline code up to 5 bytes  ok
true strip-underflow !  \ remove underflow checking  ok
  ok
\ Labels for unlabelled Tali internal variables  ok
0 constant User0  ok
User0 36 + constant tmp1  \ temporary storage  ok
User0 38 + constant tmp2  \ temporary storage  ok
User0 40 + constant tmp3  \ temporary storage (especially for print)  ok
  ok
hex  ok
  ok
: Branch, ( addr opcode -- )  \ compile a short or long 6502 branch  compiled
  over 1- 1- here - dup ff7f u> if  compiled
    swap c, c, drop  \ opcode & displacement  compiled
   else  compiled
    drop 20 xor c, 3 c,  \ opcode with reversed sense to branch around jmp  compiled
    4c c, ,  \ jmp  compiled
   then  compiled
  ;  ok
  ok
: Do_Run [ ( limit start -- ) ( R: -- limit start )  \ DO runtime  ok
  7a68 , \ pla  ply   \ pop rts addr  ok
    1a c,  01d0 , c8 c, \ inc.a 1 bne iny  ok
    85 c, tmp2 c, 84 c, tmp2 1+ c, \ tmp2 sta.z tmp2 1+ sty.z  ok
  38 c, 00a9 , 02f5 , a8 c, \ sec  0 lda.#  2 sbc.zx  tay \ push limit  ok
    80a9 , 03f5 ,  85 c, tmp3 c, 5a48 , \ 80 lda.#  3 sbc.zx  tmp3 sta.z  pha  phy  ok
  1898 , 0075 , a8 c, \ tya  clc  0 adc.zx  tay  \ push index  ok
    01b5 , 65 c, tmp3 c, 5a48 , \ 1 lda.zx  tmp3 adc.z  pha  phy   ok
  e8e8 , e8e8 , \ inx  inx  inx  inx  \ discard params  ok
  6c c, tmp2 , \ tmp2 jmp.i  ok
  ] ;  ok
  ok
variable Do_Leave  \ current LEAVE chain anchor  ok
  ok
: Do   compiled
  \ https://forth-standard.org/standard/core/DO  compiled
  Do_Leave @  \ save old leave ptr  compiled
  0 Do_Leave !  \ init  compiled
  ['] Do_Run compile,  compiled
  here  \ save do_addr to patch LOOP later  compiled
  ; immediate compile-only redefined do  ok
  ok
: ?Do  compiled
  abort \ incomplete  compiled
  ; immediate compile-only redefined ?do  ok
  ok
: Leave  compiled
  4c c, here Do_Leave @ , Do_Leave ! ; immediate compile-only redefined leave  ok
  ok
: +Loop_Run [ ( n -- )  \ +LOOP runtime  ok
  bada , a88a , fa c, \  phx  tsx  txa  tay  plx  \ Y=S-1  ok
  b918 , 104 , 0075 , 99 c, 104 , \ clc  104 lda.y  0 adc.zx  104 sta.y  \ add to index  ok
    b9 c, 105 , 0175 , 99 c, 105 , \  105 lda.y  1 adc.zx  105 sta.y  ok
  e8e8 , \ inx  inx  \ drop n  ok
  ] ; never-native  ok
  ok
: Loop_Run [  \ LOOP runtime  ok
  bada , \ phx  tsx  ok
  bd18 , 104 , 0169 , 9d c, 104 , \ clc  104 lda.x  1 adc.#  104 sta.x  \ add to index  ok
    bd c, 105 , 0069 , 9d c, 105 , \  105 lda.x  0 adc.#  105 sta.x  ok
  fa c, \ plx  ok
  ] ; never-native  ok
  ok
: LoopEnd ( OldLeave do_addr -- )  compiled
  50 Branch,  \ bvc do_addr  compiled
  Do_Leave @ begin dup while   \ patch leave chain  compiled
    dup @ swap here swap ! repeat drop  compiled
  Do_Leave !  \ restore OldLeave  compiled
  6868 , 6868 ,  \ pla pla pla pla  discard loop variables  compiled
  ;  ok
:  Loop  compiled
  [']  Loop_Run compile,  LoopEnd ; IMMEDIATE COMPILE-ONLY redefined loop  ok
: +Loop  compiled
  ['] +Loop_Run compile,  LoopEnd ; IMMEDIATE COMPILE-ONLY redefined +loop  ok
  ok
: I [ ( -- n )  ok
  bada , \ phx tsx  ok
  bd38 , 104 , fd c, 106 , a8 c, \ sec 104 lda.x 106 sbc.x tay  ok
    bd c, 105 , fd c, 107 , \  105 lda.x 107 sbc.x  ok
  fa c, \ plx  ok
  caca , 0195 ,  0094 , \ dex dex 1 sta.zx 0 sty.zx  ok
  ] ; never-native redefined i  ok
  ok
: J [ ( -- n )  ok
  bada , \ phx tsx  ok
  bd38 , 108 , fd c, 10a , a8 c, \ sec 108 lda.x 10a sbc.x tay  ok
    bd c, 109 , fd c, 10b , \  109 lda.x 10b sbc.x  ok
  fa c, \ plx  ok
  caca , 0195 ,  0094 , \ dex dex 1 sta.zx 0 sty.zx  ok
  ] ; never-native redefined j  ok
  ok
decimal  ok
here ' User0 - .  \ display size of this part 598  ok
 eof
  ok

A=0002 X=0076 Y=0000 S=01F9 EnvMXdIzc D=0000 B=00 4A90FB02 00e014 lsr a
.@f_drozyak1.txt
.g
\ http://forum.6502.org/viewtopic.php?f=9&t=5911&start=30  ok
 5 nc-limit !  \ limit inline code to <=5 bytes per word  ok
 true strip-underflow !  \ omit underflow checking code  ok
  ok
: 3drop drop 2drop ;  ok
: r>> r> r> ;  ok
create tiles 20 allot  ok
create tile_colors 20 allot  ok
  ok
  ok
: TileID> ( ID -- addr)                \ look up tile address from ID. tiles is list of pointers to tile data  compiled
   cells tiles + @ ;  ok
see tileID>
nt: AD6  xt: AE5
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 12

0AE5  20 CC A3 20 95 0A 20 65  99 20 70 8F   .. .. e . p.

AE5   A3CC jsr      cells
AE8    A95 jsr      tiles
AEB   9965 jsr      +
AEE   8F70 jsr      @
 ok
        ok
: TileHW ( addr -- addr+2 w h)         \ generate pointer to pixel data. fetch width and height  compiled
   dup c@ >r 1+                        \ get width  compiled
   dup c@ >r 1+                        \ get height  compiled
   r>> swap ;  ok
see TileHW
nt: AF2  xt: B00
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 30

0B00  20 9F 8D 20 FB 85 20 EC  A2 20 9F 97 20 9F 8D 20   .. .. . . .. ..
0B10  FB 85 20 EC A2 20 9F 97  20 81 0A 20 2A A1  .. .. ..  .. *.

B00   8D9F jsr      dup
B03   85FB jsr      c@
B06   A2EC jsr      >r
B09   979F jsr      1+
B0C   8D9F jsr      dup
B0F   85FB jsr      c@
B12   A2EC jsr      >r
B15   979F jsr      1+
B18    A81 jsr      r>>
B1B   A12A jsr      swap
 ok
        ok
: ColorID>   ( ID -- addr )            \ look up color address from ID. tile_colors is list of pointers to color tables  compiled
   cells tile_colors + @ ;  ok
see ColorID>
nt: B1F  xt: B2F
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 12

0B2F  20 CC A3 20 BF 0A 20 65  99 20 70 8F   .. .. e . p.

B2F   A3CC jsr      cells
B32    ABF jsr      tile_colors
B35   9965 jsr      +
B38   8F70 jsr      @
 ok
  ok
: ColorTile ( tileID colorID -- )  compiled
   ColorID>                            \ get address of color table from ID  compiled
   1+ dup c@                           \ fetch length of color table  compiled
   swap 1+                             \ point to color pairs  compiled
   rot  compiled
   TileID> TileHW nip                  \ stack: colorsize colorpair_addr tileaddr height  compiled
   0 do                                \ loop through all rows  compiled
      begin  compiled
         dup c@                        \ get first byte of length,color pair  compiled
         swap 1+ swap                  \ increment tile pointer  compiled
      while  compiled
         rot dup >r -rot               \ get size of color table  compiled
         r> 0 do                       \ loop through pairs in color table. stack: colorsize colorpairs tileaddr  compiled
            2dup c@                    \ get color from tile  compiled
            swap i 2 * + dup >r c@     \ look up match color in color pair and save address  compiled
            = if                       \ if pair matches pixel from tile  compiled
               r> 1+ c@                \ get color to change pixel to from pair  compiled
               over c!                 \ store in tile  compiled
               leave                   \ color found so stop looping  compiled
            then  compiled
            r> drop                    \ get rid of unused address  compiled
         loop  compiled
      repeat  compiled
   loop   compiled
   3drop ;                             \ clean up stack  ok
see ColorTile
nt: B3C  xt: B4D
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 162

0B4D  20 2F 0B 20 9F 97 20 9F  8D 20 FB 85 20 2A A1 20   /. .. . . .. *.
0B5D  9F 97 20 D9 9A 20 E5 0A  20 00 0B 20 32 96 20 23  .. .. ..  .. 2. #
0B6D  A7 20 B2 08 20 9F 8D 20  FB 85 20 2A A1 20 9F 97  . .. ..  .. *. ..
0B7D  20 2A A1 20 04 92 E3 0B  20 D9 9A 20 9F 8D 20 EC   *. ....  .. .. .
0B8D  A2 20 5E 96 20 14 9A 20  23 A7 20 B2 08 20 34 A3  . ^. ..  #. .. 4.
0B9D  20 FB 85 20 2A A1 20 2A  0A 20 23 A3 20 DC A0 20   .. *. * . #. ..
0BAD  65 99 20 9F 8D 20 EC A2  20 FB 85 20 BE 8E 20 04  e. .. ..  .. .. .
0BBD  92 D2 0B 20 14 9A 20 9F  97 20 FB 85 20 15 98 20  ... .. . . .. ..
0BCD  05 86 4C DC 0B 20 14 9A  E8 E8 20 88 09 50 BE 68  ..L.. .. .. ..P.h
0BDD  68 68 68 4C 71 0B 20 88  09 50 89 68 68 68 68 20  hhhLq. . .P.hhhh
0BED  70 0A  p.

B4D    B2F jsr      ColorID>
B50   979F jsr      1+
B53   8D9F jsr      dup
B56   85FB jsr      c@
B59   A12A jsr      swap
B5C   979F jsr      1+
B5F   9AD9 jsr      rot
B62    AE5 jsr      TileID>
B65    B00 jsr      TileHW
B68   9632 jsr      nip
B6B   A723 jsr      0
B6E    8B2 jsr      do
                      begin
B71   8D9F jsr          dup
B74   85FB jsr          c@
B77   A12A jsr          swap
B7A   979F jsr          1+
B7D   A12A jsr          swap
B80   9204 jsr         while
B83        ?
B84        ?
B85   9AD9 jsr          rot
B88   8D9F jsr          dup
B8B   A2EC jsr          >r
B8E   965E jsr          -rot
B91   9A14 jsr          r>
B94   A723 jsr          0
B97    8B2 jsr          do
B9A   A334 jsr            2dup
B9D   85FB jsr            c@
BA0   A12A jsr            swap
BA3    A2A jsr            i
BA6   A323 jsr            2
BA9   A0DC jsr            *
BAC   9965 jsr            +
BAF   8D9F jsr            dup
BB2   A2EC jsr            >r
BB5   85FB jsr            c@
BB8   8EBE jsr            =
BBB   9204 jsr            if
BBE      B cmp.zi
BC0   9A14 jsr              r>
BC3   979F jsr              1+
BC6   85FB jsr              c@
BC9   9815 jsr              over
BCC   8605 jsr              c!
BCF    BDC jmp              leave
                           then
BD2   9A14 jsr            r>
BD5        inx            drop
BD6        inx
BD7    988 jsr           loop
BDA     BE bvc
BDC        pla
BDD        pla
BDE        pla
BDF        pla
BE0    B71 jmp         repeat
BE3    988 jsr       loop
BE6     89 bvc
BE8        pla
BE9        pla
BEA        pla
BEB        pla
BEC    A70 jsr      3drop
 ok
 eof
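
For readers who don't want to trace the stack juggling in Branch, by hand, here is a rough Python model of what it appears to lay down. This is only a sketch written from the listing above: the function name branch_bytes is made up, and it assumes the branch is always backward (which is how LoopEnd uses it).

```python
def branch_bytes(here: int, target: int, opcode: int) -> bytes:
    """Model of Branch, : bytes compiled at address `here` to reach `target`.

    Hypothetical helper, not part of Tali. Mirrors the listing:
    displacement = target - 2 - here, kept as an unsigned 16-bit value.
    """
    disp = (target - 2 - here) & 0xFFFF
    if disp > 0xFF7F:
        # Displacement is in -128..-1: a 2-byte relative branch is enough.
        return bytes([opcode, disp & 0xFF])
    # Otherwise flip the branch sense (xor $20), hop over a 3-byte JMP,
    # and let the JMP do the long jump.
    return bytes([opcode ^ 0x20, 3, 0x4C, target & 0xFF, target >> 8])


if __name__ == "__main__":
    # Inner LOOP of ColorTile: branch compiled at $0BDA back to $0B9A.
    print(branch_bytes(0x0BDA, 0x0B9A, 0x50).hex(" "))
    # A target too far away for a short branch.
    print(branch_bytes(0x0C00, 0x0B00, 0x50).hex(" "))
```

The short case matches the "50 BE" and "50 89" byte pairs visible in the ColorTile dump; the long case follows the same bvs-over-jmp shape that the built-in LOOP uses.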


PostPosted: Fri Feb 07, 2020 2:27 am 

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
leepivonka, thanks for posting this. I see that DO is smaller, but what is the speed trade-off?


PostPosted: Fri Feb 07, 2020 8:03 am 

Joined: Fri Apr 15, 2016 1:03 am
Posts: 140
Here are 2 quick benchmarks.

\ using built-in DO
: x cc@ 10000 0 do i drop loop cc@ d- d. ;
: x2 cc@ 10000 0 do i drop 2 +loop cc@ d- d. ;

\ using replacement DO
: y cc@ 10000 0 do i drop loop cc@ d- d. ;
: y2 cc@ 10000 0 do i drop 2 +loop cc@ d- d. ;

x runs in 1130236 cycles (about 113 cycles per loop)
y runs in 1110245 cycles (about 111 cycles per loop, about 2% faster)

x2 runs in 565236 cycles (about 113 cycles per loop)
y2 runs in 745245 cycles (about 149 cycles per loop, about 32% slower)
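
For anyone who wants to recheck the arithmetic, the per-pass and percentage figures can be reproduced with a few lines of Python. This is just a sketch: the helper names are made up, the cycle totals are the ones printed by the emulator, and the pass counts assume 10000 passes for LOOP and 5000 for 2 +LOOP. The totals also include the fixed cost of cc@ and the loop setup, so the per-pass numbers are slightly high.

```python
def per_pass(total_cycles: int, passes: int) -> float:
    """Average cycles per loop pass (includes fixed setup overhead)."""
    return total_cycles / passes


def percent_change(old: int, new: int) -> float:
    """Positive means `new` is slower than `old`."""
    return 100.0 * (new - old) / old


# name -> (total cycles reported, number of loop passes)
RUNS = {
    "x": (1130236, 10000),   # built-in DO ... LOOP
    "y": (1110245, 10000),   # replacement DO ... LOOP
    "x2": (565236, 5000),    # built-in DO ... 2 +LOOP
    "y2": (745245, 5000),    # replacement DO ... 2 +LOOP
}

if __name__ == "__main__":
    for name, (cycles, passes) in RUNS.items():
        print(f"{name}: {per_pass(cycles, passes):.1f} cycles per pass")
    print(f"y vs x:   {percent_change(RUNS['x'][0], RUNS['y'][0]):+.1f}%")
    print(f"y2 vs x2: {percent_change(RUNS['x2'][0], RUNS['y2'][0]):+.1f}%")
```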


Code:
Tali Forth 2 kernel for 65816s (5 Jan 2020)


Tali Forth 2 for the 65c02
Version 1.0 24. Jan 2020
Copyright 2014-2020 Scot W. Stevenson
Tali Forth 2 comes with absolutely NO WARRANTY
Type 'bye' to exit

A=0002 X=0076 Y=0000 S=01F9 EnvMXdIzc D=0000 B=00 4A90FB02 00e014 lsr a
.@f_tali_emul.txt
.g
\ 65816s emulator stuff for Tali  ok
  ok
hex  ok
6 nc-limit !  ok
true strip-underflow !  ok
  ok
: ic@ [ ( -- d ) \ get emulator instruction count  ok
  f602 ,  \ f6 cop  \ get emulator instruction count in YA  ok
  caca , caca , \ dex  dex  dex  dex  ok
  0295 , eb c, 0395 , \ 2 sta.dx  xba  3 sta.dx  ok
  0094 , 0174 ,  \ 0 sty.dx  1 stz.dx  ok
  ] ;  ok
  ok
: cc@ [ ( -- d ) \ get emulator cycle count  ok
  f502 ,  \ f5 cop  ok
  caca , caca , \ dex  dex  dex  dex  ok
  0295 , eb c, 0395 , \ 2 sta.dx  xba  3 sta.dx  ok
  0094 , 0174 ,  \ 0 sty.dx  1 stz.dx  ok
  ] ;  ok
  ok
decimal  ok
 eof
  ok
\ using built-in DO  ok
: x cc@ 10000 0 do i drop loop cc@ d- d. ;  ok
: x2 cc@ 10000 0 do i drop 2 +loop cc@ d- d. ;  ok
  ok
  ok

A=0002 X=0076 Y=0000 S=01F9 EnvMXdIzc D=0000 B=00 4A90FB02 00e014 lsr a
.@talido2.txt
.g
\ Replacement DO words for Tali that generate smaller inline code  ok
  ok
decimal  ok
5 nc-limit !  \ inline code up to 5 bytes  ok
true strip-underflow !  \ remove underflow checking  ok
  ok
\ Labels for unlabelled Tali internal variables  ok
0 constant User0  ok
User0 36 + constant tmp1  \ temporary storage  ok
User0 38 + constant tmp2  \ temporary storage  ok
User0 40 + constant tmp3  \ temporary storage (especially for print)  ok
  ok
hex  ok
  ok
: Branch, ( addr opcode -- )  \ compile a short or long 6502 branch  compiled
  over 1- 1- here - dup ff7f u> if  compiled
    swap c, c, drop  \ opcode & displacement  compiled
   else  compiled
    drop 20 xor c, 3 c,  \ opcode with reversed sense to branch around jmp  compiled
    4c c, ,  \ jmp  compiled
   then  compiled
  ;  ok
  ok
: Do_Run [ ( limit start -- ) ( R: -- limit start )  \ DO runtime  ok
  7a68 , \ pla  ply   \ pop rts addr  ok
    1a c,  01d0 , c8 c, \ inc.a 1 bne iny  ok
    85 c, tmp2 c, 84 c, tmp2 1+ c, \ tmp2 sta.z tmp2 1+ sty.z  ok
  38 c, 00a9 , 02f5 , a8 c, \ sec  0 lda.#  2 sbc.zx  tay \ push limit  ok
    80a9 , 03f5 ,  85 c, tmp3 c, 5a48 , \ 80 lda.#  3 sbc.zx  tmp3 sta.z  pha  phy  ok
  1898 , 0075 , a8 c, \ tya  clc  0 adc.zx  tay  \ push index  ok
    01b5 , 65 c, tmp3 c, 5a48 , \ 1 lda.zx  tmp3 adc.z  pha  phy   ok
  e8e8 , e8e8 , \ inx  inx  inx  inx  \ discard params  ok
  6c c, tmp2 , \ tmp2 jmp.i  ok
  ] ;  ok
  ok
variable Do_Leave  \ current LEAVE chain anchor  ok
  ok
: Do   compiled
  \ https://forth-standard.org/standard/core/DO  compiled
  Do_Leave @  \ save old leave ptr  compiled
  0 Do_Leave !  \ init  compiled
  ['] Do_Run compile,  compiled
  here  \ save do_addr to patch LOOP later  compiled
  ; immediate compile-only redefined do  ok
  ok
: ?Do  compiled
  abort \ incomplete  compiled
  ; immediate compile-only redefined ?do  ok
  ok
: Leave  compiled
  4c c, here Do_Leave @ , Do_Leave ! ; immediate compile-only redefined leave  ok
  ok
: +Loop_Run [ ( n -- )  \ +LOOP runtime  ok
  bada , a88a , fa c, \  phx  tsx  txa  tay  plx  \ Y=S-1  ok
  b918 , 104 , 0075 , 99 c, 104 , \ clc  104 lda.y  0 adc.zx  104 sta.y  \ add to index  ok
    b9 c, 105 , 0175 , 99 c, 105 , \  105 lda.y  1 adc.zx  105 sta.y  ok
  e8e8 , \ inx  inx  \ drop n  ok
  ] ; never-native  ok
  ok
: Loop_Run [  \ LOOP runtime  ok
  bada , \ phx  tsx  ok
  bd18 , 104 , 0169 , 9d c, 104 , \ clc  104 lda.x  1 adc.#  104 sta.x  \ add to index  ok
    bd c, 105 , 0069 , 9d c, 105 , \  105 lda.x  0 adc.#  105 sta.x  ok
  fa c, \ plx  ok
  ] ; never-native  ok
  ok
: LoopEnd ( OldLeave do_addr -- )  compiled
  50 Branch,  \ bvc do_addr  compiled
  Do_Leave @ begin dup while   \ patch leave chain  compiled
    dup @ swap here swap ! repeat drop  compiled
  Do_Leave !  \ restore OldLeave  compiled
  6868 , 6868 ,  \ pla pla pla pla  discard loop variables  compiled
  ;  ok
:  Loop  compiled
  [']  Loop_Run compile,  LoopEnd ; IMMEDIATE COMPILE-ONLY redefined loop  ok
: +Loop  compiled
  ['] +Loop_Run compile,  LoopEnd ; IMMEDIATE COMPILE-ONLY redefined +loop  ok
  ok
: I [ ( -- n )  ok
  bada , \ phx tsx  ok
  bd38 , 104 , fd c, 106 , a8 c, \ sec 104 lda.x 106 sbc.x tay  ok
    bd c, 105 , fd c, 107 , \  105 lda.x 107 sbc.x  ok
  fa c, \ plx  ok
  caca , 0195 ,  0094 , \ dex dex 1 sta.zx 0 sty.zx  ok
  ] ; never-native redefined i  ok
  ok
: J [ ( -- n )  ok
  bada , \ phx tsx  ok
  bd38 , 108 , fd c, 10a , a8 c, \ sec 108 lda.x 10a sbc.x tay  ok
    bd c, 109 , fd c, 10b , \  109 lda.x 10b sbc.x  ok
  fa c, \ plx  ok
  caca , 0195 ,  0094 , \ dex dex 1 sta.zx 0 sty.zx  ok
  ] ; never-native redefined j  ok
  ok
decimal  ok
here ' User0 - .  \ display size of this part 598  ok
 eof
\ using replacement DO  ok
: y cc@ 10000 0 do i drop loop cc@ d- d. ;  ok
: y2 cc@ 10000 0 do i drop 2 +loop cc@ d- d. ;  ok
  ok
x -1130236  ok
y -1110245  ok
  ok
x2 -565236  ok
y2 -745245  ok
  ok
see x
nt: 836  xt: 83F
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 120

083F  20 26 08 20 88 93 10 27  CA CA 74 00 74 01 A9 08   &. ...' ..t.t...
084F  48 A9 AD 48 38 A9 00 F5  02 95 02 A9 80 F5 03 95  H..H8... ........
085F  03 48 B5 02 48 18 B5 00  75 02 95 00 B5 01 75 03  .H..H... u.....u.
086F  48 B5 00 48 E8 E8 E8 E8  CA CA 86 2A BA 38 BD 01  H..H.... ...*.8..
087F  01 FD 03 01 A8 BD 02 01  FD 04 01 A6 2A 95 01 94  ........ ....*...
088F  00 E8 E8 20 8A 97 18 68  75 00 A8 B8 68 75 01 48  ... ...h u...hu.H
089F  98 48 E8 E8 70 03 4C 77  08 68 68 68 68 68 68 20  .H..p.Lw .hhhhhh
08AF  26 08 20 2E 8A 20 D4 8C   &. .. ..

83F    826 jsr      cc@
842   9388 jsr      10000
845     27 bpl
847        dex      0
848        dex
849      0 stz.zx
84B      1 stz.zx
84D      8 lda.#    do
84F        pha
850     AD lda.#
852        pha
853        sec
854      0 lda.#
856      2 sbc.zx
858      2 sta.zx
85A     80 lda.#
85C      3 sbc.zx
85E      3 sta.zx
860        pha
861      2 lda.zx
863        pha
864        clc
865      0 lda.zx
867      2 adc.zx
869      0 sta.zx
86B      1 lda.zx
86D      3 adc.zx
86F        pha
870      0 lda.zx
872        pha
873        inx
874        inx
875        inx
876        inx
877        dex        i
878        dex
879     2A stx.z
87B        tsx
87C        sec
87D    101 lda.x
880    103 sbc.x
883        tay
884    102 lda.x
887    104 sbc.x
88A     2A ldx.z
88C      1 sta.zx
88E      0 sty.zx
890        inx        drop
891        inx
892   978A jsr       loop
895        clc
896        pla
897      0 adc.zx
899        tay
89A        clv
89B        pla
89C      1 adc.zx
89E        pha
89F        tya
8A0        pha
8A1        inx
8A2        inx
8A3      3 bvs
8A5    877 jmp
8A8        pla
8A9        pla
8AA        pla
8AB        pla
8AC        pla
8AD        pla
8AE    826 jsr      cc@
8B1   8A2E jsr      d-
8B4   8CD4 jsr      d.
 ok
see y
nt: B9E  xt: BA7
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 37

0BA7  20 26 08 20 88 93 10 27  20 23 A7 20 ED 09 20 65   &. ...'  #. .. e
0BB7  0B E8 E8 20 C3 0A 50 F6  68 68 68 68 20 26 08 20  ... ..P. hhhh &.
0BC7  2E 8A 20 D4 8C  .. ..

BA7    826 jsr      cc@
BAA   9388 jsr      10000
BAD     27 bpl
BAF   A723 jsr      0
BB2    9ED jsr      do
BB5    B65 jsr        i
BB8        inx        drop
BB9        inx
BBA    AC3 jsr       loop
BBD     F6 bvc
BBF        pla
BC0        pla
BC1        pla
BC2        pla
BC3    826 jsr      cc@
BC6   8A2E jsr      d-
BC9   8CD4 jsr      d.
 ok
see x2
nt: 8B8  xt: 8C2
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 120

08C2  20 26 08 20 88 93 10 27  CA CA 74 00 74 01 A9 09   &. ...' ..t.t...
08D2  48 A9 30 48 38 A9 00 F5  02 95 02 A9 80 F5 03 95  H.0H8... ........
08E2  03 48 B5 02 48 18 B5 00  75 02 95 00 B5 01 75 03  .H..H... u.....u.
08F2  48 B5 00 48 E8 E8 E8 E8  CA CA 86 2A BA 38 BD 01  H..H.... ...*.8..
0902  01 FD 03 01 A8 BD 02 01  FD 04 01 A6 2A 95 01 94  ........ ....*...
0912  00 E8 E8 20 23 A3 18 68  75 00 A8 B8 68 75 01 48  ... #..h u...hu.H
0922  98 48 E8 E8 70 03 4C FA  08 68 68 68 68 68 68 20  .H..p.L. .hhhhhh
0932  26 08 20 2E 8A 20 D4 8C   &. .. ..

8C2    826 jsr      cc@
8C5   9388 jsr      10000
8C8     27 bpl
8CA        dex      0
8CB        dex
8CC      0 stz.zx
8CE      1 stz.zx
8D0      9 lda.#    do
8D2        pha
8D3     30 lda.#
8D5        pha
8D6        sec
8D7      0 lda.#
8D9      2 sbc.zx
8DB      2 sta.zx
8DD     80 lda.#
8DF      3 sbc.zx
8E1      3 sta.zx
8E3        pha
8E4      2 lda.zx
8E6        pha
8E7        clc
8E8      0 lda.zx
8EA      2 adc.zx
8EC      0 sta.zx
8EE      1 lda.zx
8F0      3 adc.zx
8F2        pha
8F3      0 lda.zx
8F5        pha
8F6        inx
8F7        inx
8F8        inx
8F9        inx
8FA        dex        i
8FB        dex
8FC     2A stx.z
8FE        tsx
8FF        sec
900    101 lda.x
903    103 sbc.x
906        tay
907    102 lda.x
90A    104 sbc.x
90D     2A ldx.z
90F      1 sta.zx
911      0 sty.zx
913        inx        drop
914        inx
915   A323 jsr        2
918        clc       +loop
919        pla
91A      0 adc.zx
91C        tay
91D        clv
91E        pla
91F      1 adc.zx
921        pha
922        tya
923        pha
924        inx
925        inx
926      3 bvs
928    8FA jmp
92B        pla
92C        pla
92D        pla
92E        pla
92F        pla
930        pla
931    826 jsr      cc@
934   8A2E jsr      d-
937   8CD4 jsr      d.
 ok
see y2
nt: BCD  xt: BD7
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 40

0BD7  20 26 08 20 88 93 10 27  20 23 A7 20 ED 09 20 65   &. ...'  #. .. e
0BE7  0B E8 E8 20 23 A3 20 9A  0A 50 F3 68 68 68 68 20  ... #. . .P.hhhh
0BF7  26 08 20 2E 8A 20 D4 8C   &. .. ..

BD7    826 jsr      cc@
BDA   9388 jsr      10000
BDD     27 bpl
BDF   A723 jsr      0
BE2    9ED jsr      do
BE5    B65 jsr        i
BE8        inx        drop
BE9        inx
BEA   A323 jsr        2
BED    A9A jsr       +loop
BF0     F3 bvc
BF2        pla
BF3        pla
BF4        pla
BF5        pla
BF6    826 jsr      cc@
BF9   8A2E jsr      d-
BFC   8CD4 jsr      d.
 ok
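
Both the built-in and the replacement DO words rely on the same trick, which may not be obvious from the disassembly: the limit is stored as $8000 minus the limit, the index is stored with that offset added, and the loop ends when adding the step causes a signed 16-bit overflow (hence the bvc/bvs after LOOP). Here is a small Python model of that bookkeeping. It is only a sketch under my reading of the listings, and the function name is made up.

```python
MASK = 0xFFFF


def do_loop_indices(limit: int, start: int, step: int = 1) -> list:
    """Return the values I would report on each pass of DO ... step +LOOP.

    Hypothetical model of the return-stack values, not Tali code.
    """
    fudge = (0x8000 - limit) & MASK      # what DO pushes as the "limit"
    index = (start + fudge) & MASK       # what DO pushes as the "index"
    step &= MASK
    seen = []
    while True:
        seen.append((index - fudge) & MASK)   # I = index - fudge
        new = (index + step) & MASK
        # 6502 V flag: operands have the same sign, result sign differs.
        overflow = (~(index ^ step) & (index ^ new) & 0x8000) != 0
        index = new
        if overflow:                     # LOOP falls through when V is set
            return seen


if __name__ == "__main__":
    print(do_loop_indices(5, 0))             # one pass per index below 5
    print(len(do_loop_indices(10000, 0)))    # passes in benchmark x / y
    print(len(do_loop_indices(10000, 0, 2))) # passes in benchmark x2 / y2
```

The point of the offset is that the end-of-loop test costs nothing extra: the overflow flag falls out of the addition that updates the index.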


PostPosted: Sat Feb 08, 2020 2:40 pm 

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
leepivonka, thanks again, but I think I'm going to stick with the built-in words for the extra speed. I think some other people who use Tali Forth will find your version useful.

Continuing with the strategy of mutating my Forth file with Python, I was able to save almost 600 bytes across the 40 or so byte-length variables I have. Before, I was using something like this:
Code:
: byte create 0 c, ;
byte display_X
byte display_Y
byte hero_X
byte hero_Y

5 hero_X c!

Now I use this:
Code:
begin_var
   var display_X
   var display_Y
   var hero_X
   var hero_Y
end_var

5 hero_X c!

begin_var resets an internal variable counter and lays down "create v0", where the 0 is a block number that goes up by one on each call to begin_var. Each var records the variable's name together with the current counter value, then increments the counter. end_var lays down the number of bytes allocated followed by "allot". Any later reference to a variable declared with var is replaced with "[ v0 y + ] literal", where y is the counter value recorded for that variable. The above code is converted to this:
Code:
create v0
4 allot

5 [ v0 2 + ] literal c!

This can only be used in compile mode but it saves a lot of space.
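
A rewrite along these lines could look roughly like the Python below. To be clear, this is not the actual script, just a minimal sketch of the transformation as described: the function name expand_vars is made up, it assumes one directive per line, and it does not skip Forth comments or strings, so a variable name inside a comment would be rewritten too.

```python
import re


def expand_vars(source: str) -> str:
    """Turn begin_var/var/end_var blocks into one create/allot pair and
    replace later references with computed addresses (illustrative only)."""
    out = []
    offsets = {}       # variable name -> (block label, byte offset)
    block = -1         # number of the current begin_var block
    counter = 0        # next free byte offset in the current block
    in_block = False

    def substitute(match):
        token = match.group(0)
        if token in offsets:
            label, offset = offsets[token]
            return f"[ {label} {offset} + ] literal"
        return token

    for line in source.splitlines():
        words = line.split()
        if words == ["begin_var"]:
            block += 1
            counter = 0
            in_block = True
            out.append(f"create v{block}")
        elif in_block and len(words) == 2 and words[0] == "var":
            offsets[words[1]] = (f"v{block}", counter)
            counter += 1
        elif in_block and words == ["end_var"]:
            out.append(f"{counter} allot")
            in_block = False
        else:
            # Only whole whitespace-delimited tokens are replaced.
            out.append(re.sub(r"\S+", substitute, line))
    return "\n".join(out)


if __name__ == "__main__":
    example = "\n".join([
        "begin_var",
        "   var display_X",
        "   var display_Y",
        "   var hero_X",
        "   var hero_Y",
        "end_var",
        "",
        "5 hero_X c!",
    ])
    print(expand_vars(example))
```

The saving comes from replacing 40 or so dictionary headers with one header per block; each reference compiles to a literal address instead of a call to a created word.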

Creating a headerless version is going to be more work than I thought, so I won't be going down the rabbit hole after all. Instead, I should be able to fit the remaining code into additional banks by copying the entire 12k Tali Forth 2 installation into each of those banks.


PostPosted: Sat Feb 15, 2020 2:54 pm 

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
Juggling several banks at the same time is getting tough. Can you help me understand how set-current and set-order work? This is what I have so far:

0000-3FFF - dictionary 1 (always enabled)
4000-BFFF - video memory or dictionary 2
C000-FFDF - Tali Forth and 4.8k free memory where dictionaries 4, 5, 6, and 7 will be

The plan is to put dictionary 4 in the 4.8k free in ROM above Tali (since this is just a simulation), then make copies of Tali in three 16k RAM banks. Dictionaries 5, 6, and 7 will go into the 4.8k spaces in those RAM banks. As it is now, I can point 4000-BFFF to either dictionary 2 or video memory, define words for dictionary 2, then switch between video memory and dictionary 2 when I need to use it. After copying Tali to the 16k RAM banks, I can run Tali from any of them and store dictionary 4 there. What I can't get to work is storing words from dictionary 5 in a different bank and then running them. I think I could figure it out if I understood set-order better.

Does it matter what order I enable the memory where a dictionary is versus enabling it with set-order? Does it matter where cp points after you are done compiling everything and just want to switch a bank in to run code? Does it matter if wordlists have overlapping address ranges?


PostPosted: Sat Feb 15, 2020 8:58 pm 

Joined: Sun May 13, 2018 5:49 pm
Posts: 255
Let me give you a brief overview of what set-order and set-current do and hopefully that will make your questions easier to answer. We'll start with set-current because that word is simpler.

SET-CURRENT is used to change which wordlist will be the "compilation" wordlist where new words are added. The selected "current" wordlist does not have to be in the search order, but of course you won't be able to use any of the new words until it is.

In Tali2, there is an array of wordlist pointers, with each pointer holding the address of the header of the most recent word added to that wordlist. As a word is added to a wordlist, a header is created for the new word. That header contains a link to the next word in the wordlist, and the new header is updated in this array of wordlist pointers to be the most recent word. This makes each wordlist a linked list where new words are added at the beginning of the list. SET-CURRENT just changes which wordlist pointer will be updated with the new word.

SET-ORDER fills in an array of indices into the above array of wordlist pointers. The order specified is the order that the wordlists will be searched when looking up a word. If a wordlist is not in the search order, it will not be searched at all. Forth will look through each wordlist until it finds the word. If it makes it through all of the wordlists without finding the word, it will try to turn the input into a number instead.
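
If it helps to see the two arrays side by side, here is a toy Python model of the scheme described above. It is only an illustration: the class and method names are invented, headers are plain tuples rather than Tali's real header layout, and the number fallback is reduced to int().

```python
class Dictionary:
    """Toy model: one head pointer per wordlist plus a search order."""

    def __init__(self, num_wordlists: int):
        # heads[wid] is the most recently defined header in wordlist wid.
        self.heads = [None] * num_wordlists
        self.current = 0     # compilation wordlist (SET-CURRENT)
        self.order = [0]     # wordlist ids to search, in order (SET-ORDER)

    def set_current(self, wid: int) -> None:
        self.current = wid

    def set_order(self, wids) -> None:
        self.order = list(wids)

    def define(self, name: str, xt: int) -> None:
        # New header links to the previous head, then becomes the head.
        self.heads[self.current] = (name, xt, self.heads[self.current])

    def find(self, name: str):
        for wid in self.order:
            header = self.heads[wid]
            while header is not None:
                if header[0] == name:
                    return header[1]
                header = header[2]
        return None

    def interpret(self, token: str):
        xt = self.find(token)
        if xt is not None:
            return ("word", xt)
        # Only after every wordlist in the order has been walked.
        return ("number", int(token))


if __name__ == "__main__":
    d = Dictionary(3)
    d.define("dup", 0x8D9F)
    d.set_current(1)
    d.define("tilehw", 0x0B00)
    print(d.find("tilehw"))      # not in the search order yet
    d.set_order([1, 0])
    print(hex(d.find("tilehw")))
    print(d.interpret("42"))
```

Note how a number such as 42 is only recognized after every wordlist in the order has been walked to its end, which matters for the banking questions below.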

Druzyek wrote:
Does it matter what order I enable the memory where a dictionary is versus enabling it with set-order? Does it matter where cp points after you are done compiling everything and just want to switch a bank in to run code? Does it matter if wordlists have overlapping address ranges?

You are likely to have problems if you don't enable the bank that contains words in a particular wordlist. The VERY NEXT WORD after SET-ORDER might access the enabled wordlist. Technically, it will stop as soon as it finds the word, but any numbers used end up going through all of the wordlists in the search order before they are turned into a number.

It generally does not matter where CP points while you are running code except under the following conditions:

1. You use HERE, which puts the address of CP on the stack.

2. You use the scratchpad PAD area, which Tali defines as an offset from CP.

3. You output a number. On Tali2, the text of the number is snuck in between CP and PAD while it is being converted to text, and then it is TYPE-ed from there.

For your final question, the wordlists can have overlapping addresses (I'm assuming you mean in your bank area, where the address space is reused) as long as you only search a wordlist that is present in memory.

Some bonus information that might be useful:

Tali currently just puts the header at CP and then moves it along and compiles the word right after the header. Tali doesn't require the word to be right after the header, however, meaning you could have the header in one memory area and the code in another. This currently would require some shenanigans with CP, but one might imagine making a separate HP (header pointer) and modifying the few words that create headers to use that instead of CP.

If you wanted to put the headers all in one bank (if they fit), you might modify Tali's lookup words to swap in that bank, do the lookup, and then restore the previous bank when done. Note that the headers are only accessed when you look up words using the Forth interpreter, so once you start running your program, you won't need that header bank at all and will only need to swap in the banks that have your executable code or data (which I think you have figured out, right?)
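
The swap-in, look up, restore idea is easier to see in a high-level sketch than in assembly. The Python below is purely illustrative: BankSwitch stands in for whatever bank-select register your simulator has, and the dict stands in for the header bank's contents. None of these names exist in Tali.

```python
from contextlib import contextmanager


class BankSwitch:
    """Stand-in for a bank-select register (hypothetical)."""

    def __init__(self):
        self.active = 0

    def select(self, bank: int) -> None:
        self.active = bank


@contextmanager
def bank_selected(hw: BankSwitch, bank: int):
    """Switch to `bank` for the duration of the block, then restore."""
    previous = hw.active
    hw.select(bank)
    try:
        yield
    finally:
        hw.select(previous)


def find_in_header_bank(hw: BankSwitch, header_bank: int, headers: dict, name: str):
    """Look up `name` while the header bank is mapped in."""
    with bank_selected(hw, header_bank):
        return headers.get(name)


if __name__ == "__main__":
    hw = BankSwitch()
    hw.select(5)                         # running code from bank 5
    xt = find_in_header_bank(hw, 7, {"dup": 0x8D9F}, "dup")
    print(hex(xt), hw.active)            # lookup result, original bank restored
```

The important part is the restore in the finally clause: the previous bank comes back even when the word is not found.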


PostPosted: Sat Feb 15, 2020 9:09 pm 

Joined: Sun May 13, 2018 5:49 pm
Posts: 255
SamCoVT wrote:
You are likely to have problems if you don't enable the bank that contains words in a particular wordlist. The VERY NEXT WORD after SET-ORDER might access the enabled wordlist. Technically, it will stop as soon as it finds the word, but any numbers used end up going through all of the wordlists in the search order before they are turned into a number.
This actually needs some clarification, because it's complicated. What I said above is true if you are using set-order in interpreted mode. You have more leeway when using it in a word definition (compiling mode) because set-order won't be called right then. The wordlists used will be those currently active while you are compiling. You can get away with a lot more in a compiled word because all of the lookups have been done (during compiling) by the time you go to actually run the new word, and the compiled code will just be JSRs.


PostPosted: Sat Feb 29, 2020 6:27 pm 

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
I just figured out a very weird bug after many hours of debugging. Trying to compile this line caused Tali Forth to jump to uninitialized memory and crash:
Code:
map_data + c@ MAP_ROCK <>               \ return true or false
It works fine if changed to any of these:
Code:
map_data + c@ MAP_ROCK <>               \ return true or
map_data + c@ MAP_ROCK <>               \ return true or fals
map_data + c@ MAP_ROCK <>               \ return true or false0
The first version crashes after executing about 67,000 instructions and trashes a lot of memory, while the working examples only execute about 27,000. It turns out that the dictionary pointers for the three removed wordlists were still being searched, even though they all now point to the base wordlist. This caused both the working and non-working versions to load bad dictionary pointers and look at random addresses for words. In the non-working version, one of the dictionary pointer addresses was in zero page and coincidentally loaded the length of the input string, which became part of the pointer to the next link in the chain. In the working examples, which have a different length, it also loaded a bad pointer, but because it pointed to a different address it must have reached a point in the chain with a 0000 link and halted. Changing the dictionary count from 4 to 1 and getting rid of the extra dictionary labels fixed everything.
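
To show why a one-character change in the input could flip the outcome, here is a toy Python model of walking a link chain from a bad head pointer. It is a simplification and not Tali's real header layout: it assumes the 16-bit little-endian link sits at the start of each header, and the addresses and byte values are invented for the demo.

```python
def walk_chain(memory: bytearray, head: int, max_steps: int = 1000):
    """Follow 16-bit little-endian links from `head` until a 0000 link.

    Returns (visited addresses, True if the walk ended on a 0000 link).
    Hypothetical model; max_steps stands in for "runs off and crashes".
    """
    visited = []
    addr = head
    while addr != 0 and len(visited) < max_steps:
        visited.append(addr)
        addr = memory[addr] | (memory[(addr + 1) & 0xFFFF] << 8)
    return visited, addr == 0


if __name__ == "__main__":
    mem = bytearray(0x10000)

    # A healthy chain: $0900 -> $0800 -> 0000.
    mem[0x0900:0x0902] = bytes([0x00, 0x08])
    print(walk_chain(mem, 0x0900))

    # A bad head pointer into zero page, where an unrelated byte
    # (say, an input length) happens to form the low half of the link.
    mem[0x0040] = 0x40                    # link points back at itself
    print(walk_chain(mem, 0x0040, max_steps=5))

    mem[0x0040] = 0x2F                    # different length, different link
    print(walk_chain(mem, 0x0040, max_steps=5))
```

With one value the walk never reaches a 0000 link; with another it lands on zeroed memory and stops after two steps, which is the same kind of luck that separated the crashing line from the working ones.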


PostPosted: Sat Feb 29, 2020 6:56 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
what a lovely bug!


PostPosted: Wed Mar 18, 2020 6:31 pm 

Joined: Sun May 13, 2018 5:49 pm
Posts: 255
I'm glad you were able to figure that out. Issues with the dictionary pointers can be quite frustrating/confusing. I suspect you have a much better idea of how wordlists work than you actually wanted to know.

I'm actually quite amazed that you got it working along with your banked memory system. You've made a lot of progress and put in a lot of effort for something that is a procrastination project.


PostPosted: Wed Mar 18, 2020 9:08 pm 

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
Ya it was a real head scratcher. Adding a feature to track cycles to my emulator was the key. Hmm, what do you mean procrastination project? I did consider writing some of the firmware for my calculator in Forth but wanted to try it before I committed if that's what you mean.


PostPosted: Thu Mar 19, 2020 3:05 pm 

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
Random thought but if anyone interested in Tali is also on IRC, it would be a really neat project to make an IRC bot that users can give Forth code to which the bot compiles, runs, and returns the results. There is one for C called geordi that does this which is pretty slick. I would put it on my todo list but I don't think I would get to it for a few years :)


PostPosted: Fri Mar 20, 2020 8:08 pm 

Joined: Sun May 13, 2018 5:49 pm
Posts: 255
I think procrastination was the wrong word to use -- it implies laziness, which is certainly not an issue that you have. It's more that you were investigating Forth for use in a calculator project (which it is quite a good fit for) and ended up porting a game (graphical!) you had already written, and rewrote parts of the Forth to deal with a banked memory system in a simulator (that you also wrote) to make everything fit. You're crazy (in the good way that is much appreciated here in these forums)! I suspect you know enough about Forth now for any calculator project you might undertake in the future.

If you get a Forthbot running, let us know!


PostPosted: Sat Mar 21, 2020 3:50 pm 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Maybe "tangential" is a better fit for what you were trying to describe? I'm a reluctant expert on that subject ...

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


PostPosted: Sat Mar 21, 2020 4:11 pm 

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
No objections :)

