All times are UTC




Posted: Mon Jan 27, 2020 8:00 pm

Joined: Sun May 13, 2018 5:49 pm
Posts: 249
Here are some extra tools for your toolbox (not sure if they are applicable to your particular problem).

There is :NONAME, used in place of : and without a name after it, for making unnamed words. It leaves the execution token of the new word (in Tali, its address) on the stack. The address can be stored in an array if you have a bunch of them, and then you just index the array to get to the one you want. The words can be executed by placing the address on the stack and using EXECUTE. This might work well if you have several different things that are "YELLOW" and you can arrange for the word YELLOW to be the same offset into each array and therefore reusable.
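For readers coming from C, a table of :NONAME words indexed by a shared offset plays roughly the same role as an array of function pointers. The sketch below is only that analogy; the handler names and the YELLOW/RED offsets are hypothetical and not taken from the game.

```c
/* C analogy for tables of :NONAME words: arrays of function pointers
   (each pointer plays the role of an xt) indexed by a shared offset. */

typedef int (*handler_fn)(void);          /* roughly what an xt is */

/* Hypothetical handlers; in Forth these would be :NONAME ... ; bodies. */
static int robot_yellow(void)   { return 10; }
static int robot_red(void)      { return 11; }
static int crystal_yellow(void) { return 20; }
static int crystal_red(void)    { return 21; }

/* YELLOW is the same offset in every table, so one constant serves all. */
enum { YELLOW = 0, RED = 1, COLOR_COUNT = 2 };

static const handler_fn robot_table[COLOR_COUNT]   = { robot_yellow, robot_red };
static const handler_fn crystal_table[COLOR_COUNT] = { crystal_yellow, crystal_red };

/* Roughly equivalent to  ( table index -- )  CELLS + @ EXECUTE  */
int dispatch(const handler_fn *table, int index)
{
    return table[index]();
}
```

The same YELLOW offset works for both tables, which is the reuse the paragraph above describes.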

Tali is a native compiling Forth, meaning that if a word being used in a definition is small, it will copy the opcodes for that word directly into the new word rather than using a JSR. This increases speed, but at the expense of memory. Because you are running into memory limits, you may want to set the variable NC-LIMIT to a low value like 5 (or even 0, which disables native compiling so that everything is compiled as a JSR). The NC-LIMIT variable sets the size limit (in bytes) for words that will be natively compiled. It defaults to 20 bytes.
Code:
5 nc-limit !
Set this before compiling any code. You can also set the variable STRIP-UNDERFLOW (a flag) to true to save a few bytes by removing the check for stack underflow (for words that remove things from the stack) when native-compiling a word. If you've set NC-LIMIT to 0, this won't have any effect because native compiling will be completely disabled.
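The size trade-off that NC-LIMIT controls can be estimated with a simplified model. This is a sketch under stated assumptions, not Tali's actual compiler logic: it assumes an inlined word costs its body size and any other word costs a 3-byte JSR, and it ignores underflow-check stripping and any special cases in the compiler.

```c
#include <stddef.h>

#define JSR_SIZE 3  /* a 65C02 JSR absolute instruction is 3 bytes */

/* Bytes one call site costs under a simplified model of native compiling:
   a word whose body (excluding its RTS) is no larger than nc_limit is
   copied inline, anything else becomes a 3-byte JSR.
   ASSUMPTION: the real cutoff test (< vs <=) and special cases in
   Tali's compiler are not modelled here. */
unsigned call_site_cost(unsigned body_size, unsigned nc_limit)
{
    return (body_size <= nc_limit) ? body_size : JSR_SIZE;
}

/* Total bytes for a definition that references n words with the given
   (hypothetical) body sizes. */
unsigned definition_cost(const unsigned *body_sizes, size_t n, unsigned nc_limit)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += call_site_cost(body_sizes[i], nc_limit);
    return total;
}
```

In this model, lowering the limit turns every larger body into a 3-byte call, which is where the savings come from.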


Posted: Tue Jan 28, 2020 5:50 pm

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
Quote:
Here are some extra tools for your toolbox (not sure if they are applicable to your particular problem).
Thanks. I think :NONAME especially will be useful at some point. nc-limit of 5 brings the RAM usage from 12k down to 9.8k, and nc-limit 0 saves 32 more bytes.

Splitting everything up into small words seems really non-intuitive at this point, although doing so would take more RAM and make the problem worse if the subwords are never reused. Writing long words seems like doing the inlining yourself possibly at the expense of readability. Should I split this up into smaller words?

Tiles are byte arrays that store compressed pixel data. The first byte is the width, the second is the height, then color pairs follow where the first byte is the length and the second byte is the color. For example, [7, 8, 3,1, 4,2, 0 ... ] would mean the tile is 7 pixels wide and 8 pixels high. The first line would have 3 pixels of color 1, then 4 pixels of color 2. The single zero means the row is finished (all 7 pixels were drawn), so move to the next line.
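Decoding the row format described above can be sketched in C as follows. The function name and the bounds check are illustrative additions, and the sketch assumes well-formed data whose runs add up to the width on every row.

```c
#include <stddef.h>

/* Expand one RLE tile ([width, height, (len, color)..., 0, ...]) into a
   flat width*height pixel buffer. Returns the number of tile bytes
   consumed, or 0 if out_size is too small.
   ASSUMPTION: each row's run lengths sum to the width. */
size_t decode_tile(const unsigned char *tile, unsigned char *out, size_t out_size)
{
    unsigned width = tile[0];
    unsigned height = tile[1];
    const unsigned char *p = tile + 2;          /* skip width and height */

    if ((size_t)width * height > out_size)
        return 0;

    for (unsigned row = 0; row < height; row++) {
        unsigned char *dst = out + (size_t)row * width;
        unsigned char len;
        while ((len = *p++) != 0) {             /* a 0 length ends the row */
            unsigned char color = *p++;
            while (len--)
                *dst++ = color;
        }
    }
    return (size_t)(p - tile);
}
```

For a 7x2 tile encoded as {7, 2, 3,1, 4,2, 0, 7,5, 0}, the function consumes all 10 bytes and fills the first row with 1,1,1,2,2,2,2 and the second with seven pixels of color 5.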

Color tables have pairs of colors where the first color is the color to find and the second is the color to change it to. The first byte is the count of pairs. [2, 1,15, 2,16] has two pairs. Any time the function below finds color 1 in a tile, it should change it to 15 and change color 2 to 16. Applied to the tile example above, it would change from [7, 8, 3,1, 4,2, 0 ... ] to [7, 8, 3,15, 4,16, 0 ... ].

Original C
Code:
int ColorTile(int tile, int color_index)
{
   register unsigned char *tile_ptr=tiles[tile];   //tile is ID number. tiles is list of pointers to tile data
   register unsigned char row_count;               //height of tile stored in tile data
   register unsigned char t_count, t_color;        //fetched pixel count and color from color table to transfer to tile
   register const unsigned char *color_table;      //table with pairs of colors. first is existing color and second is what to change it to
   register unsigned char c_size;                  //fetched from color table. how many pairs of colors the table has
   register unsigned char i;

   color_table=tile_colors[color_index];           //tile_colors is table of pointers to color tables
   row_count=tile_ptr[1];                          //second byte of tile is height
   c_size=color_table[1]*2;                        //size of color table is pair count * 2 since pairs are 2 bytes
   
   tile_ptr+=2;                                    //skip width and height and point to pixel data
   while(row_count)                                //loop through all rows of tile
   {
      t_count=*tile_ptr++;                         //first byte of each pair is count of pixels
      if (t_count==0)                              //0 count indicates row is finished      
      {
         row_count--;               
      }
      else
      {
         t_color=*tile_ptr;                        //color is second byte of pair
         if (t_color>COLOR_TRANSPARENT)            //only apply to colors above TRANSPARENT
         {
            for (i=0;i<c_size;i+=2)                //loop through each pair in color table
            {
               if (t_color==color_table[i+2])      //+2 to skip header and count of pair
               {
                  *tile_ptr=color_table[i+3];      //match found, write second color in pair to tile
                  break;                           //stop looping since found
               }
            }
         }
         tile_ptr++;                               //advance past color byte to next count/color pair
      }
   }
   return 0;
}

Forth
Code:
: TileID> ( ID -- addr)                \ look up tile address from ID. tiles is list of pointers to tile data
   cells tiles + @ ;
      
: TileHW ( addr -- addr+2 w h)         \ generate pointer to pixel data. fetch width and height
   dup c@ >r 1+                        \ get width
   dup c@ >r 1+                        \ get height
   r>> swap ;
      
: ColorID>   ( ID -- addr )            \ look up color address from ID. tile_colors is list of pointers to color tables
   cells tile_colors + @ ;

: ColorTile ( tileID colorID -- )
   ColorID>                            \ get address of color table from ID
   1+ dup c@                           \ fetch length of color table
   swap 1+                             \ point to color pairs
   rot
   TileID> TileHW nip                  \ stack: colorsize colorpair_addr tileaddr height
   0 do                                \ loop through all rows
      begin
         dup c@                        \ get first byte of length,color pair
         swap 1+ swap                  \ increment tile pointer
      while
         rot dup >r -rot               \ get size of color table
         r> 0 do                       \ loop through pairs in color table. stack: colorsize colorpairs tileaddr
            2dup c@                    \ get color from tile
            swap i 2 * + dup >r c@     \ look up match color in color pair and save address
            = if                       \ if pair matches pixel from tile
               r> 1+ c@                \ get color to change pixel to from pair
               over c!                 \ store in tile
               leave                   \ color found so stop looping
            then
            r> drop                    \ get rid of unused address
          loop
          1+                            \ skip the color byte so the next pass reads a length
       repeat
   loop
   3drop ;                             \ clean up stack


Posted: Thu Jan 30, 2020 2:05 am

Joined: Sun May 13, 2018 5:49 pm
Posts: 249
Druzyek wrote:
Splitting everything up into small words seems really non-intuitive at this point, although doing so would take more RAM and make the problem worse if the subwords are never reused. Writing long words seems like doing the inlining yourself possibly at the expense of readability. Should I split this up into smaller words?
You have the same problem/thoughts that I have. My Forth code looks very much like a C programmer wrote it. I sometimes just write my words out long and then look for the Forth code where I've repeated myself and factor it out into a word. It takes me 3-4 rounds before I'm generally satisfied, and some of my words still smell like C code. I also find that I don't mind rewriting a whole chunk when I see a better way to factor things.

The main advantage to the smaller words is testability. Once you are sure that a word works as you intend, you generally don't have to go back and look at it again and you only need to know the stack inputs/outputs. With longer words, creating a bug is easier and finding the bug is harder. With that said, if it really is a long sequence of things then you can just brute force your way through with a long word.

I'll also recommend avoiding >r and r> when easy/possible because they make the words harder to test. While they are sometimes the exact right tool for the job, they can only be used in word definitions while compiling. Code that uses only data stack manipulations can be run in interpreted mode, making it easier to debug or to try new things interactively. As an example, your TileHW could be written as:
Code:
: TileHW ( addr -- addr+2 w h)
   dup c@ swap 1+  \ Get width
   dup c@ swap 1+  \ Get height
   -rot ;          \ Arrange the results.
The difference here is that the entire body of the function can be run in interpreted mode if you want to try it "step by step". I'll use .s quite a bit to print out the current stack as I go along. Also, that DUP C@ SWAP 1+ sequence looks like it might be useful and should probably be a word.

I'm also curious about your R>> word. Did you POSTPONE two instances of R> (I don't think that works) or did you use ALWAYS-NATIVE (I do think that works) or is it a typo? In general, >R and R> shouldn't cross word boundaries (or anything that uses the return stack, like DO loops), so I'm curious what you did there.


Posted: Thu Jan 30, 2020 4:34 am

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
Quote:
You have the same problem/thoughts that I have. My Forth code looks very much like a C programmer wrote it. I sometimes just write my words out long and then look for the Forth code where I've repeated myself and factor it out into a word.
I suppose it's fine to write it in long form like that if there are no obvious advantages to doing it otherwise. I have also been going back and extracting words to save space (which I'm quickly running out of), but most of them do not make logical sense. They just happen to help save space. For example, on startup I print out the memory taken and free in the three memory areas I have. Using this word three times saves about 100 bytes:
Code:
: BytesTaken - u. ." bytes taken" CR 2 spaces ;
Logically, it isn't worth abstracting something this simple and obvious when written out, but you have to in order to save memory.
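A rough way to see when factoring a repeated phrase pays off, under a simplified cost model: assume the phrase compiles to n bytes wherever it appears, each use of the new word is a 3-byte JSR, the word ends in a 1-byte RTS, and h stands for the dictionary header overhead (left symbolic here, since it depends on the name and on Tali's header layout). With k uses:

```latex
\text{saved}(k, n) = \underbrace{k\,n}_{\text{inline}} - \underbrace{(h + n + 1 + 3k)}_{\text{factored}} = (k-1)\,n - 3k - h - 1
```

So factoring only helps when (k-1)n > 3k + h + 1. The model ignores native compiling: with a nonzero NC-LIMIT, a short factored word may be copied back inline at each use, which cancels the saving.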

Quote:
The main advantage to the smaller words is testability. Once you are sure that a word works as you intend, you generally don't have to go back and look at it again and you only need to know the stack inputs/outputs. With longer words, creating a bug is easier and finding the bug is harder. With that said, if it really is a long sequence of things then you can just brute force your way through with a long word.
Maybe I'm not using things exactly as intended. For complicated stack manipulations, I write it out in a file and send it to the emulator since that's the only way to save code. I suppose I could build the words interactively but it doesn't seem any faster that way. Just thinking all the stack changes through and documenting the stack picture every few lines works pretty well. Sometimes I feel like a human C compiler since half of the stack gymnastics stuff is what the language itself should generate for you. This word is really handy (break jumps to a BRK/RTS pair):
Code:
: halt .s cr break ;

Quote:
I'll also recommend avoiding >r and r> when easy/possible because they make the words harder to test. While they are sometimes the exact right tool for the job, they can only be used in word definitions while compiling
So far I have played around with 1-2 line words that use r> and >r to get things right, then moved them to the text file. It does take some experimenting. I gave in when my word for rotating one of the tiles described above by 90 degrees needed 6 local variables on the stack at once. The annoying thing is that when you rearrange them in a loop, you have to put them back in the same order at the end of the loop body so they can be rearranged again the next go-round. I know this can be simplified a little by putting them in the right order before entering the loop, but when you need to access all of them in the loop, there is no easy way to arrange them initially. Shoving everything onto the r stack long enough to arrange things before the next loop makes everything a lot easier. It almost feels like you are moving up and down a tape with a 3-item window rather than just operating on the top of a stack. Once the taboo of using the r stack was broken with that word, I let myself do it for all the rest of them.

Quote:
I'm also curious about your R>> word. Did you POSTPONE two instances of R> (I don't think that works) or did you use ALWAYS-NATIVE (I do think that works) or is it a typo? In general, >R and R> shouldn't cross word boundaries (or anything that uses the return stack, like DO loops), so I'm curious what you did there.
R>> is just R> R>. I had some trouble with 2R> since I assumed it would be R> R> but it actually reverses the order. Ya, the return address is on the R stack when you call a word like R>>, so I just pull it off, put my two values on the stack, then reinsert the return address. It works fine so far.
Code:
: >>r r> -rot >r >r >r ; compile-only
: >>>r r> swap >r -rot >r >r >r ; compile-only
: r>> r> r> r> rot >r ; compile-only
: r>>> r> r> r> rot r> swap >r ; compile-only
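The return-stack juggling in these words can be checked with a small C simulation of the two stacks. It is a sketch, not Tali's implementation: the caller's return address is modelled as an ordinary value on top of the simulated return stack when a colon word is entered, and 2R> is treated as a primitive (no return address involved) to show why it differs from R> R>.

```c
/* Two tiny stacks; depth is not checked in this sketch. */
typedef struct {
    int data[16]; int dsp;   /* data stack, dsp = number of items */
    int ret[16];  int rsp;   /* return stack */
} Machine;

static void push(Machine *m, int v) { m->data[m->dsp++] = v; }
static int  pop(Machine *m)         { return m->data[--m->dsp]; }
static void to_r(Machine *m)        { m->ret[m->rsp++] = pop(m); }   /* >r */
static void r_from(Machine *m)      { push(m, m->ret[--m->rsp]); }   /* r> */

static void rot(Machine *m)         /* ( a b c -- b c a ) */
{ int c = pop(m), b = pop(m), a = pop(m); push(m, b); push(m, c); push(m, a); }
static void minus_rot(Machine *m)   /* ( a b c -- c a b ) */
{ int c = pop(m), b = pop(m), a = pop(m); push(m, c); push(m, a); push(m, b); }
static void swap(Machine *m)        /* ( a b -- b a ) */
{ int b = pop(m), a = pop(m); push(m, b); push(m, a); }

/* : >>r  r> -rot >r >r >r ;   ( a b -- ) ( R: ret -- b a ret ) */
void two_to_r(Machine *m) { r_from(m); minus_rot(m); to_r(m); to_r(m); to_r(m); }

/* : r>>  r> r> r> rot >r ;    ( -- a b ) ( R: b a ret -- ret ) */
void r_two_from(Machine *m) { r_from(m); r_from(m); r_from(m); rot(m); to_r(m); }

/* Standard 2R> ( -- x1 x2 ) ( R: x1 x2 -- ) behaves like R> R> SWAP,
   which is why it looks "reversed" compared with plain R> R>. */
void two_r_from(Machine *m) { r_from(m); r_from(m); swap(m); }
```

Pushing 1 and 2, calling two_to_r and then r_two_from leaves 1 2 on the data stack and only the simulated return address on the return stack, so the pair round-trips as intended.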
I used your instructions for editing the headers and removed a lot of the words from native_words.asm. This got me an extra 1.8k. Since this is all simulated, I can treat the last part of ROM like RAM and put a third dictionary there. There is now 7.8k of free memory spread across the three dictionaries with a lot of functionality left to implement. Another round of abstracting words should save a few hundred more bytes.


Posted: Sun Feb 02, 2020 6:22 pm

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
It's starting to get really tight memory-wise on my game project. How hard would it be to strip out all interpretation parts and header and leave only the code? One idea would be to emulate tali forth and paste in the source file as key strokes then pause the emulation and use the header information in memory to determine what words are referenced. From there, you could use Python or even conditional compilation to reduce the source to only what is needed for that build. Another idea is to skip the emulation and just use Python to do the compiling but then you would need to see which words call other words, which is actually very trivial. It seems like there would be some caveats like changing the function of a word with DEFER or something I've never tried to do that might not work. Do you think that's feasible? I wanted to check before I try in case it's something that could never really work due to the way Forth works internally somehow.


Posted: Mon Feb 03, 2020 12:12 am

Joined: Sun May 13, 2018 5:49 pm
Posts: 249
Druzyek wrote:
It's starting to get really tight memory-wise on my game project. How hard would it be to strip out all interpretation parts and header and leave only the code? One idea would be to emulate tali forth and paste in the source file as key strokes then pause the emulation and use the header information in memory to determine what words are referenced. From there, you could use Python or even conditional compilation to reduce the source to only what is needed for that build. Another idea is to skip the emulation and just use Python to do the compiling but then you would need to see which words call other words, which is actually very trivial. It seems like there would be some caveats like changing the function of a word with DEFER or something I've never tried to do that might not work. Do you think that's feasible? I wanted to check before I try in case it's something that could never really work due to the way Forth works internally somehow.

Can you do that? I don't think it will be easy, but I think you can. You are essentially asking if you can metacompile (using one forth to target another system). This is often done to use forth on one system to compile itself for another (different) system, but it can also be used to generate a headerless executable. There have been Forths that can strip all of the header info and leave just a binary, but I haven't used any of them. I will warn you that you might be looking down a rather deep rabbit hole. Perhaps some of the others here will chime in if they have done anything like you are asking (compile a separate headerless binary).

While Tali2 supports deferred words, I don't believe it ships with any. We went with vectored words for things like character I/O and block I/O. You could certainly add some support to Tali's compiling routines to help you make a list of words that have been used. You had discussed adding some kind of auxiliary I/O to your simulator, and this might be a good use for that channel. If you just want to keep track of the addresses that are compiled and match them to names later, you might modify "cmpl_subroutine:" in taliforth.asm. If you want the names, they are determined in "interpret:" in "taliforth.asm", which handles interpretation and compiling. See the "_compile" label and note that "tmpbranch" has the current name token (header) address in it so you can get at the name if you want, while the TOS is the XT (eXecution Token = address of the start of the word).

You then might make a script that looks between the xt_xxxxx and z_xxxxx labels for each word (essentially looking through the assembly source for those words) for the pattern "jsr xt_", so you can see which words they already call and add those to your list of words to keep. Rinse and repeat until your list of words doesn't get any larger.
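The "rinse and repeat" step is a fixed-point computation over the call graph. A minimal C sketch follows, assuming the words have already been numbered and the "jsr xt_" references turned into an adjacency matrix (both hypothetical; a real script would build them from the assembly source). Helper routines that live outside the xt_/z_ labels would still need to be added by hand.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORDS 64

/* calls[i][j] is true when word i contains a "jsr" to word j.
   keep[] must be pre-seeded with the words the application references.
   Keeps adding callees of kept words until the set stops growing, then
   returns the number of words kept. */
size_t close_over_calls(bool calls[][MAX_WORDS], size_t n, bool keep[])
{
    bool changed = true;
    while (changed) {                       /* repeat until no new words */
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if (!keep[i])
                continue;
            for (size_t j = 0; j < n; j++) {
                if (calls[i][j] && !keep[j]) {
                    keep[j] = true;
                    changed = true;
                }
            }
        }
    }
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += keep[i];
    return count;
}
```

If word 0 calls word 1 and word 1 calls word 2, seeding only word 0 ends with words 0, 1 and 2 kept, while unrelated words stay out.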

Do be careful, as a few (perhaps 5-6) words have "helper routines" that live outside the xt_xxxxxx and z_xxxxxx labels. These are almost always found just after the word that needs them. They usually belong to the more complicated words (like ACCEPT) and to words that have a particular runtime behavior (usually with a label starting with "do", like "doexecute:").


Posted: Mon Feb 03, 2020 8:04 am

Joined: Thu Mar 03, 2011 5:56 pm
Posts: 279
Might it be useful to switch off (or severely reduce) native compilation?


Posted: Mon Feb 03, 2020 9:08 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10837
Location: England
I'd quite like to understand what's going on with this game which makes it so large: is it certain that the game can't be more compact? (Having said which, how much memory are we talking about?)


Posted: Tue Feb 04, 2020 1:38 am

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
Quote:
Can you do that? I don't think it will be easy, but I think you can. You are essentially asking if you can metacompile (using one forth to target another system). This is often done to use forth on one system to compile itself for another (different) system, but it can also be used to generate a headerless executable. There have been Forths that can strip all of the header info and leave just a binary, but I haven't used any of them. I will warn you that you might be looking down a rather deep rabbit hole. Perhaps some of the others here will chime in if they have done anything like you are asking (compile a separate headerless binary).
Ok, down the rabbit hole I go. Let's see how far I get.

Quote:
Might it be useful to switch off (or severely reduce) native compilation?
How do you mean? The inlining size?

Quote:
I'd quite like to understand what's going on with this game which makes it so large: is it certain that the game can't be more compact? (Having said which, how much memory are we talking about?)
My plan was to wait until all the versions were finished then show everything at once, but I can show what I have so far in case you have some suggestions for making things smaller. Here's a barebones GitHub repo with the original version in Python, the CC65 version and the Tali Forth 2 version that I'm still working on (documentation and more comments to follow later.)

You can try the CC65 version and Tali Forth 2 versions in your browser at my website.

Keys:
WASD - move
space - mine/harvest/attack
c - character menu
k - skill menu
r - resource menu

The C version takes 27.4k of ROM and a few k of RAM. Because only 16k of ROM is visible at a time, several k of graphics data is copied from ROM to RAM in addition to the few k the game uses. Also, the key handling logic is in the bank of ROM that is normally hidden which holds the graphics to be copied into RAM since any updates to the screen can be done after the key logic is finished and the bank with that code is switched back to pointing at graphics memory.

The Forth version has three dictionaries: one in main ram, one in bank 2 that normally points to graphics memory but can be banked in when needed (just like the CC65 version above) and a third one at the end of ROM in the left over space not used by Tali. When you go to the page, you will see that bank 2 shows only 2k free, but this can be increased by another 16k by banking in the other bank that usually points to graphics memory, so there is no danger of running out of room there. The problem is that the 18k for code there can't draw to the screen, so I need to implement the menu system you see by pressing c, k, or r in the C version in the 3.6k left in main ram and main rom in the Forth version. Note that all the graphics data is loaded in the Forth version, so I just need to squeeze the rest of the code in.

There are a couple of obvious places to save code size. There are two 800-byte arrays in each version (the map is 40x20) with one byte for each map square. One holds the index of a monster if it exists on that square and the other holds the index of crystals. These could be combined to save 800 bytes, but then I would need extra code to separate them and extra time to do the processing, which would make it hard to compare directly with the C version. If that 800 bytes were enough to hold the remaining code, I would consider it, but it will probably take a lot more than that, so I'll save that change for last. Also, the green selector box is technically the same 32x32 size as the robot and background tiles, with transparency on all sides. Eliminating the transparency, shrinking the size, and adding an offset would save maybe 50 bytes.
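One way the two map arrays could be merged is to pack both indexes into a single byte per square. This is a hypothetical sketch that assumes each index fits in 4 bits (0-15, with 0 meaning none); the post doesn't say how many monsters or crystals there are. The shifts and masks are the extra code and processing time mentioned above.

```c
#include <stdint.h>

#define MAP_W 40
#define MAP_H 20            /* 40 x 20 = 800 squares */

/* One byte per square: monster index in the high nibble, crystal index
   in the low nibble. ASSUMPTION: both indexes fit in 4 bits. */
static uint8_t map_cells[MAP_W * MAP_H];

static unsigned cell(unsigned x, unsigned y) { return y * MAP_W + x; }

uint8_t get_monster(unsigned x, unsigned y) { return map_cells[cell(x, y)] >> 4; }
uint8_t get_crystal(unsigned x, unsigned y) { return map_cells[cell(x, y)] & 0x0F; }

void set_monster(unsigned x, unsigned y, uint8_t monster)
{
    uint8_t *c = &map_cells[cell(x, y)];
    *c = (uint8_t)((*c & 0x0F) | ((monster & 0x0F) << 4));  /* keep crystal */
}

void set_crystal(unsigned x, unsigned y, uint8_t crystal)
{
    uint8_t *c = &map_cells[cell(x, y)];
    *c = (uint8_t)((*c & 0xF0) | (crystal & 0x0F));          /* keep monster */
}
```

Each setter preserves the other nibble, so the two values can be updated independently within the single 800-byte array.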

The graphics data is RLE encoded in pairs where one byte is the length and the next is the color. There is a 0 byte at the end of each row (i.e., a pair with length zero and no color byte) which shows that the row is finished. I could save a handful of bytes by eliminating those zeroes and changing the code to keep track of how many pixels have been drawn in the row, but I'm not sure it would be much of a gain. Some of the other data, like the skill icons, is stored with 1 bit per pixel.
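A decoder for the variant without row terminators could count pixels drawn against the width instead. This is a sketch of that idea, not the game's code; it assumes runs never cross a row boundary and that no run has length 0. The saving is exactly one byte per row, i.e. height bytes per tile (32 bytes for a 32-pixel-high tile).

```c
#include <stddef.h>

/* Decode [width, height, (len, color)...] with no 0 byte at row ends:
   a row is finished once `width` pixels have been written.
   ASSUMPTION: runs never cross a row boundary and len > 0.
   Returns the number of tile bytes consumed. */
size_t decode_tile_no_terminators(const unsigned char *tile, unsigned char *out)
{
    unsigned width = tile[0];
    unsigned height = tile[1];
    const unsigned char *p = tile + 2;          /* skip width and height */
    unsigned char *dst = out;

    for (unsigned row = 0; row < height; row++) {
        unsigned drawn = 0;
        while (drawn < width) {                 /* row ends by pixel count */
            unsigned char len = *p++;
            unsigned char color = *p++;
            for (unsigned char i = 0; i < len; i++)
                *dst++ = color;
            drawn += len;
        }
    }
    return (size_t)(p - tile);
}
```

The 7x2 example {7, 2, 3,1, 4,2, 7,5} now takes 8 bytes instead of 10, at the cost of a counter and a comparison per run.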

Please let me know if anything is unclear. I plan to add more comments when the project is finished.


Posted: Tue Feb 04, 2020 2:59 am

Joined: Fri Apr 15, 2016 1:03 am
Posts: 136
Below is a disassembly of compiled words from a code fragment from a previous post.

There are several places where Tali could be modified or extended to generate significantly smaller but slightly slower code:

Do currently takes 42 bytes of inline code for each use. It could be implemented in Tali as a call to a runtime routine instead of being completely inline.
For example
Code:
        lda #<end_addr          do
        ldy #>end_addr
        jsr Do_Run


The word I currently takes 25 bytes of inline code for each use. It could be implemented in Tali as a call to a runtime routine instead of being inline.
For example
Code:
        jsr xt_I


LOOP currently takes 28 bytes of inline code for each use. It could be implemented in Tali as a call to a runtime routine instead of being completely inline.
For example
Code:
        jsr Loop_Run
        bvs *+5
        jmp do_addr


The phrase 2 * takes 6 bytes for each use.
The application could be recoded to use 2* instead, which takes 3 or 4 bytes and runs much faster.
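Adding up the per-use figures gives a rough idea of what these changes would save in ColorTile (317 bytes in the listing below). The replacement sizes are read off the sketches in this post: DO as lda #, ldy #, jsr (7 bytes), I as a jsr (3), LOOP as jsr, bvs, jmp (8), and 2* called with a jsr (3). The use counts come from the listing (two DOs, one I, two LOOPs, one "2 *"). The one-time cost of the shared runtime routines such as Do_Run and Loop_Run is not included.

```c
/* Back-of-the-envelope estimate; per-use sizes as described above. */
typedef struct {
    const char *name;
    unsigned inline_bytes;   /* bytes per use in the current listing */
    unsigned call_bytes;     /* bytes per use with a runtime routine */
    unsigned uses;           /* occurrences in ColorTile */
} Construct;

unsigned bytes_saved(const Construct *c, unsigned n)
{
    unsigned total = 0;
    for (unsigned i = 0; i < n; i++)
        total += (c[i].inline_bytes - c[i].call_bytes) * c[i].uses;
    return total;
}

const Construct colortile[] = {
    { "DO",   42, 7, 2 },
    { "I",    25, 3, 1 },
    { "LOOP", 28, 8, 2 },
    { "2 *",   6, 3, 1 },
};
```

With these numbers the estimate comes to 135 bytes saved (70 + 22 + 40 + 3), leaving roughly 182 bytes for ColorTile before counting the shared routines, which are paid for only once.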


Disassembled sample words using an older hacked version of Tali:
Code:
Tali Forth 2 kernel for 65816s

Tali Forth 2 for the 65c02
Version ALPHA 17. June 2018
Copyright 2014-2018 Scot W. Stevenson
Tali Forth 2 comes with absolutely NO WARRANTY
Type 'bye' to exit

A=0002 X=0076 Y=0000 S=01F9 EnvMXdIzc D=0000 B=00 4A90FB02 00f01b lsr a
.@f_drozyak1.txt
.g
\ http://forum.6502.org/viewtopic.php?f=9&t=5911&start=30  ok
 5 nc-limit !  \ set native compiling size to 5 bytes ok
 true uf-strip !  \ omit compiled underflow checking code ok
  ok
: 3drop drop 2drop ;  ok
: r>> r> r> ;  ok
create tiles 20 allot  ok
create tile_colors 20 allot  ok
  ok
  ok
: TileID> ( ID -- addr)                \ look up tile address from ID. tiles is list of pointers to tile data  compiled
   cells tiles + @ ;  ok
see tileID>
 nt: A97  xt: AA6
 size (decimal): 12

0AA6  20 C4 96 20 56 0A 20 76  91 20 A5 8A

AA6  jsr 96C4 ( cells )
AA9  jsr 0A56 ( tiles )
AAC  jsr 9176 ( + )
AAF  jsr 8AA5 ( @ )
 ok
        ok

: TileHW ( addr -- addr+2 w h)         \ generate pointer to pixel data. fetch width and height  compiled
   dup c@ >r 1+                        \ get width  compiled
   dup c@ >r 1+                        \ get height  compiled
   r>> swap ;  ok
see TileHW
 nt: AB3  xt: AC1
 size (decimal): 30

0AC1  20 89 89 20 5C 83 20 C8  95 20 3C 90 20 89 89 20
0AD1  5C 83 20 C8 95 20 3C 90  20 42 0A 20 D9 94

AC1  jsr 8989 ( dup )
AC4  jsr 835C ( c@ )
AC7  jsr 95C8 ( >r )
ACA  jsr 903C ( char+ )
ACD  jsr 8989 ( dup )
AD0  jsr 835C ( c@ )
AD3  jsr 95C8 ( >r )
AD6  jsr 903C ( char+ )
AD9  jsr 0A42 ( r>> )
ADC  jsr 94D9 ( swap )
 ok
        ok

: ColorID>   ( ID -- addr )            \ look up color address from ID. tile_colors is list of pointers to color tables  compiled
   cells tile_colors + @ ;  ok
see ColorID>
 nt: AE0  xt: AF0
 size (decimal): 12

0AF0  20 C4 96 20 80 0A 20 76  91 20 A5 8A

AF0  jsr 96C4 ( cells )
AF3  jsr 0A80 ( tile_colors )
AF6  jsr 9176 ( + )
AF9  jsr 8AA5 ( @ )
 ok
  ok

: ColorTile ( tileID colorID -- )  compiled
   ColorID>                            \ get address of color table from ID  compiled
   1+ dup c@                           \ fetch length of color table  compiled
   swap 1+                             \ point to color pairs  compiled
   rot  compiled
   TileID> TileHW nip                  \ stack: colorsize colorpair_addr tileaddr height  compiled
   0 do                                \ loop through all rows  compiled
      begin  compiled
         dup c@                        \ get first byte of length,color pair  compiled
         swap 1+ swap                  \ increment tile pointer  compiled
      while  compiled
         rot dup >r -rot               \ get size of color table  compiled
         r> 0 do                       \ loop through pairs in color table. stack: colorsize colorpairs tileaddr  compiled
            2dup c@                    \ get color from tile  compiled
            swap i 2 * + dup >r c@     \ look up match color in color pair and save address  compiled
            = if                       \ if pair matches pixel from tile  compiled
               r> 1+ c@                \ get color to change pixel to from pair  compiled
               over c!                 \ store in tile  compiled
               leave                   \ color found so stop looping  compiled
            then  compiled
            r> drop                    \ get rid of unused address  compiled
         loop  compiled
      repeat  compiled
   loop   compiled
   3drop ;                             \ clean up stack  ok
see ColorTile
 nt: AFD  xt: B0E
 size (decimal): 317

0B0E  20 F0 0A 20 3C 90 20 89  89 20 5C 83 20 D9 94 20
0B1E  3C 90 20 9C 92 20 A6 0A  20 C1 0A 20 F0 8E 20 4C
0B2E  99 A9 0C 48 A9 47 48 38  A9 00 F5 02 95 02 A9 80
0B3E  F5 03 95 03 48 B5 02 48  18 B5 00 75 02 95 00 B5
0B4E  01 75 03 48 B5 00 48 E8  E8 E8 E8 20 89 89 20 5C
0B5E  83 20 D9 94 20 3C 90 20  D9 94 B5 00 15 01 E8 E8
0B6E  A8 D0 03 4C 2C 0C 20 9C  92 20 89 89 20 C8 95 20
0B7E  1F 8F 20 19 92 20 4C 99  A9 0C 48 A9 28 48 38 A9
0B8E  00 F5 02 95 02 A9 80 F5  03 95 03 48 B5 02 48 18
0B9E  B5 00 75 02 95 00 B5 01  75 03 48 B5 00 48 E8 E8
0BAE  E8 E8 20 1C 96 20 5C 83  20 D9 94 CA CA 86 26 BA
0BBE  38 BD 01 01 FD 03 01 A8  BD 02 01 FD 04 01 A6 26
0BCE  95 01 94 00 20 07 96 20  A7 94 20 76 91 20 89 89
0BDE  20 C8 95 20 5C 83 20 A9  89 B5 00 15 01 E8 E8 A8
0BEE  D0 03 4C 07 0C 20 19 92  20 3C 90 20 5C 83 20 6B
0BFE  90 20 6A 83 68 68 68 68  60 20 19 92 20 36 89 20
0C0E  23 90 18 68 75 00 A8 B8  68 75 01 48 98 48 E8 E8
0C1E  70 03 4C B0 0B 68 68 68  68 68 68 4C 59 0B 20 23
0C2E  90 18 68 75 00 A8 B8 68  75 01 48 98 48 E8 E8 70
0C3E  03 4C 59 0B 68 68 68 68  68 68 20 30 0A

B0E  jsr 0AF0 ( ColorID> )      ColorID>
B11  jsr 903C ( char+ )         1+
B14  jsr 8989 ( dup )           dup
B17  jsr 835C ( c@ )            c@
B1A  jsr 94D9 ( swap )          swap
B1D  jsr 903C ( char+ )         1+
B20  jsr 929C ( rot )           rot
B23  jsr 0AA6 ( TileID> )       TileID>
B26  jsr 0AC1 ( TileHW )        TileHW
B29  jsr 8EF0 ( nip )           nip
B2C  jsr 994C ( 0 )             0
B2F  lda.# 0C                   do
B31  pha
B32  lda.# 47
B34  pha
B35  sec
B36  lda.# 00
B38  sbc.zx 02
B3A  sta.zx 02
B3C  lda.# 80
B3E  sbc.zx 03
B40  sta.zx 03
B42  pha
B43  lda.zx 02
B45  pha
B46  clc
B47  lda.zx 00
B49  adc.zx 02
B4B  sta.zx 00
B4D  lda.zx 01
B4F  adc.zx 03
B51  pha
B52  lda.zx 00
B54  pha
B55  inx
B56  inx
B57  inx
B58  inx
                                  begin
B59  jsr 8989 ( dup )               dup
B5C  jsr 835C ( c@ )                c@
B5F  jsr 94D9 ( swap )              swap
B62  jsr 903C ( char+ )             1+
B65  jsr 94D9 ( swap )              swap
B68  lda.zx 00                     while
B6A  ora.zx 01
B6C  inx
B6D  inx
B6E  tay
B6F  bne 03
B71  jmp 0C2C
B74  jsr 929C ( rot )               rot
B77  jsr 8989 ( dup )               dup
B7A  jsr 95C8 ( >r )                >r
B7D  jsr 8F1F ( -rot )              -rot
B80  jsr 9219 ( r> )                r>
B83  jsr 994C ( 0 )                 0
B86  lda.# 0C                       do
B88  pha
B89  lda.# 28
B8B  pha
B8C  sec
B8D  lda.# 00
B8F  sbc.zx 02
B91  sta.zx 02
B93  lda.# 80
B95  sbc.zx 03
B97  sta.zx 03
B99  pha
B9A  lda.zx 02
B9C  pha
B9D  clc
B9E  lda.zx 00
BA0  adc.zx 02
BA2  sta.zx 00
BA4  lda.zx 01
BA6  adc.zx 03
BA8  pha
BA9  lda.zx 00
BAB  pha
BAC  inx
BAD  inx
BAE  inx
BAF  inx
BB0  jsr 961C ( 2dup )                2dup
BB3  jsr 835C ( c@ )                  c@
BB6  jsr 94D9 ( swap )                swap
BB9  dex                              i
BBA  dex
BBB  stx.z 26
BBD  tsx
BBE  sec
BBF  lda.x 0101
BC2  sbc.x 0103
BC5  tay
BC6  lda.x 0102
BC9  sbc.x 0104
BCC  ldx.z 26
BCE  sta.zx 01
BD0  sty.zx 00
BD2  jsr 9607 ( 2 )                   2
BD5  jsr 94A7 ( * )                   *
BD8  jsr 9176 ( + )                   +
BDB  jsr 8989 ( dup )                 dup
BDE  jsr 95C8 ( >r )                  >r
BE1  jsr 835C ( c@ )                  c@
BE4  jsr 89A9 ( = )                   =
BE7  lda.zx 00                        if
BE9  ora.zx 01
BEB  inx
BEC  inx
BED  tay
BEE  bne 03
BF0  jmp 0C07 ( ColorTile +F9 )
BF3  jsr 9219 ( r> )                    r>
BF6  jsr 903C ( char+ )                 1+
BF9  jsr 835C ( c@ )                    c@
BFC  jsr 906B ( over )                  over
BFF  jsr 836A ( c! )                    c!
C02  pla                                leave
C03  pla
C04  pla
C05  pla
C06  rts
                                       then
C07  jsr 9219 ( r> )                  r>
C0A  jsr 8936 ( drop )                drop
C0D  jsr 9023 ( 1 )                  loop
C10  clc
C11  pla
C12  adc.zx 00
C14  tay
C15  clv
C16  pla
C17  adc.zx 01
C19  pha
C1A  tya
C1B  pha
C1C  inx
C1D  inx
C1E  bvs 03
C20  jmp 0BB0 ( ColorTile +A2 )
C23  pla
C24  pla
C25  pla
C26  pla
C27  pla
C28  pla
C29  jmp 0B59 ( ColorTile +4B )    repeat
C2C  jsr 9023 ( 1 )              loop
C2F  clc
C30  pla
C31  adc.zx 00
C33  tay
C34  clv
C35  pla
C36  adc.zx 01
C38  pha
C39  tya
C3A  pha
C3B  inx
C3C  inx
C3D  bvs 03
C3F  jmp 0B59 ( ColorTile +4B )
C42  pla
C43  pla
C44  pla
C45  pla
C46  pla
C47  pla
C48  jsr 0A30 ( 3drop )         3drop
 ok
 eof
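
For readers following the listing, the DO / I / LOOP code above uses a fudge-factor trick. DO first pushes a LEAVE address ($0C47, which RTS turns into $0C48, where 3drop follows). It then pushes fudge = $8000 - limit, followed by index + fudge. I subtracts the fudge back out. LOOP adds 1 and exits when BVS sees a signed overflow, which happens when index + fudge steps from $7FFF to $8000. That is exactly the point where the index reaches the limit. The Python sketch below models only that 16-bit arithmetic. It is not Tali's source, and the function names are made up for illustration:

```python
# Model of the 16-bit arithmetic in Tali's DO / I / LOOP listing above.
# Hypothetical helper names; this is not Tali's source code.

MASK = 0xFFFF  # a 6502 Forth cell is 16 bits


def to_signed(v):
    """Interpret a 16-bit cell as a two's-complement signed number."""
    v &= MASK
    return v - 0x10000 if v & 0x8000 else v


def do_setup(limit, start):
    """What DO leaves on the return stack: (fudge, index + fudge)."""
    fudge = (0x8000 - limit) & MASK
    return fudge, (start + fudge) & MASK


def word_i(counter, fudge):
    """What I computes: subtract the fudge factor back out."""
    return (counter - fudge) & MASK


def loop_step(counter, step=1):
    """What LOOP does: add the step and report the V (signed overflow) flag."""
    total = to_signed(counter) + to_signed(step)
    overflow = not -0x8000 <= total <= 0x7FFF
    return (counter + step) & MASK, overflow


def run_do_loop(limit, start):
    """Collect every value I would return inside `limit start DO ... LOOP`."""
    fudge, counter = do_setup(limit, start)
    seen = []
    while True:
        seen.append(word_i(counter, fudge))
        counter, v_flag = loop_step(counter)
        if v_flag:  # BVS taken: fall out of the loop
            return seen


print(run_do_loop(5, 0))                           # [0, 1, 2, 3, 4]
print([to_signed(v) for v in run_do_loop(2, -3)])  # [-3, -2, -1, 0, 1]
```

With this scheme, a single overflow test handles both positive and negative indexes. As in a standard Forth DO, equal start and limit values wrap all the way around, giving 65536 iterations.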


Posted: Tue Feb 04, 2020 8:47 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10837
Location: England
Druzyek wrote:
My plan was to wait until all the versions were finished then show everything at once, but I can show what I have so far ...

You can try the CC65 version and Tali Forth 2 versions in your browser at my website.

Thanks for sharing your work in progress!


Posted: Tue Feb 04, 2020 2:48 pm

Joined: Sun May 13, 2018 5:49 pm
Posts: 249
leepivonka wrote:
Do currently takes 42 bytes of inline code for each use. It could be implemented in Tali as a call to a runtime routine instead of being completely inline.
DO loops are a bit tricky to turn into a JSR. The JSR's return address will be on the return stack, which is also where the loop values are kept, but it's certainly possible with a bit of shuffling. DO is also tricky because it bypasses Tali's native-compiling flags and always natively compiles itself.
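To make the "bit of shuffling" concrete, here is a hypothetical Python model (not Tali code) of a DO runtime entered with JSR. It pulls the two-byte return address off the stack first, pushes the fudge and index + fudge values, and then pushes the return address back so the final RTS still works. A real 6502 routine would have to park the address in zero page or in registers while doing this. The LEAVE address push and the return address value used here are simplifications or made up:

```python
# Hypothetical model of a DO runtime entered via JSR (not Tali code).
# The Python list stands in for the 6502 return stack; the end is the top.


def push_word(stack, value):
    """Push a 16-bit value high byte first, as both JSR and the inline DO do."""
    stack.append((value >> 8) & 0xFF)
    stack.append(value & 0xFF)


def pull_word(stack):
    """Pull a 16-bit value low byte first, as RTS does."""
    lo = stack.pop()
    hi = stack.pop()
    return (hi << 8) | lo


def do_runtime(stack, limit, start):
    """Body of a JSR'd DO: the return address is on top when we get here."""
    ret = pull_word(stack)                      # 1. move the return address aside
    fudge = (0x8000 - limit) & 0xFFFF
    push_word(stack, fudge)                     # 2. push the loop values...
    push_word(stack, (start + fudge) & 0xFFFF)
    push_word(stack, ret)                       # 3. ...and put the address back


rstack = []
push_word(rstack, 0x1234)      # a made-up return address pushed by JSR
do_runtime(rstack, 5, 0)
print(hex(pull_word(rstack)))  # 0x1234: the RTS still finds its address
print(rstack)                  # [127, 251, 127, 251]: fudge, then index + fudge
```

After the RTS, the loop values sit in the same order as with the inline DO, so the rest of the loop code would not need to change.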
Quote:
I currently takes 25 bytes of inline code for each use. It could be implemented in Tali as a call to a runtime routine instead of being inline.
The word I is a bit easier. Its code in native_words will need to be rewritten (really, just have its offsets adjusted) to account for the return address being on the return stack, and then its header in headers.asm can be modified to remove the AN (Always Native) flag. This flag forces native compilation even when it has been disabled by setting NC-LIMIT to zero.
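The "offset adjusted" remark can be illustrated with a small model of page 1. The inline I code does TSX and reads the loop values at $0101,X through $0104,X. Inside a JSR'd routine, the two return-address bytes sit on top, so the same values are found at $0103,X through $0106,X. The class and function names below are hypothetical and exist only for this Python sketch:

```python
# Hypothetical model of the 6502 page-1 stack, to show why I's offsets
# shift by two bytes when I runs as a subroutine. Not Tali code.


class Stack6502:
    def __init__(self):
        self.mem = [0] * 256  # stands in for $0100-$01FF
        self.s = 0xFF         # stack pointer; PHA writes $0100+S, then S -= 1

    def pha(self, byte):
        self.mem[self.s] = byte & 0xFF
        self.s = (self.s - 1) & 0xFF

    def push_word(self, value):
        self.pha(value >> 8)  # high byte first, like JSR and the inline DO
        self.pha(value)


def read_i(stk, extra=0):
    """Compute I as the inline code does, reading $0101,X..$0104,X after TSX.

    `extra` is the number of bytes sitting above the loop values:
    0 for inline code, 2 inside a JSR'd routine (the return address).
    """
    x = stk.s  # TSX
    m = stk.mem
    counter = m[x + 1 + extra] | (m[x + 2 + extra] << 8)
    fudge = m[x + 3 + extra] | (m[x + 4 + extra] << 8)
    return (counter - fudge) & 0xFFFF


stk = Stack6502()
stk.push_word(0x7FFB)        # fudge for a limit of 5
stk.push_word(0x7FFE)        # index + fudge, i.e. I = 3
print(read_i(stk))           # 3: inline offsets are correct
stk.push_word(0x1234)        # a JSR pushes a (made-up) return address
print(read_i(stk, extra=2))  # 3: offsets moved to $0103,X..$0106,X
print(read_i(stk) == 3)      # False: the old offsets now hit the return address
```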
Quote:
LOOP currently takes 28 bytes of inline code for each use. It could be implemented in Tali as a call to a runtime routine instead of being completely inline.
This is another of the "special" words that don't use Tali's native-compiling flags and instead directly compile their own runtime code. Again, this would be a matter of moving the runtime into a callable routine (ending in an RTS) that knows there is a return address on the return stack, and then compiling a JSR to that routine wherever the word is used. If there are a lot of loops, this could save a reasonable amount of space.
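Whether moving LOOP (or DO and I) into a shared routine pays off is simple arithmetic. Each call site saves the inline size minus the size of the call, and the routine itself is paid for once. The sketch below uses the per-use sizes quoted in this thread (DO 42, I 25, LOOP 28 bytes). The routine sizes in the examples are made-up placeholders. `call_overhead` defaults to a bare 3-byte JSR, but a LOOP call site would probably still need its own branch back to the loop start. The example therefore assumes it keeps the same BVS/JMP pair as the inline code (2 + 3 bytes):

```python
# Back-of-the-envelope model for moving inline runtime code into a shared
# subroutine. Inline sizes come from the thread; routine sizes are placeholders.

JSR_BYTES = 3  # opcode plus a 16-bit address

INLINE_BYTES = {"DO": 42, "I": 25, "LOOP": 28}  # per use, as quoted above


def bytes_saved(inline_bytes, uses, routine_bytes, call_overhead=JSR_BYTES):
    """Net saving from replacing `uses` inline copies with calls to one routine."""
    return uses * (inline_bytes - call_overhead) - routine_bytes


def break_even_uses(inline_bytes, routine_bytes, call_overhead=JSR_BYTES):
    """Fewest uses at which the shared routine is no larger overall (None if never)."""
    per_use = inline_bytes - call_overhead
    if per_use <= 0:
        return None  # the call is no smaller than the inline code
    return -(-routine_bytes // per_use)  # ceiling division


# Hypothetical 40-byte LOOP routine; call site = JSR + BVS + JMP (8 bytes).
print(break_even_uses(INLINE_BYTES["LOOP"], 40, call_overhead=8))  # 2
print(bytes_saved(INLINE_BYTES["LOOP"], 10, 40, call_overhead=8))  # 160
```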


Posted: Tue Feb 04, 2020 9:45 pm

Joined: Sun May 13, 2018 5:49 pm
Posts: 249
SamCoVT wrote:
The word I is a bit easier. Its code in native_words will need to be rewritten (really, just have its offsets adjusted) to account for the return address being on the return stack, and then its header in headers.asm can be modified to remove the AN (Always Native) flag. This flag forces native compilation even when it has been disabled by setting NC-LIMIT to zero.

Edit: Not only should the AN (Always Native) flag be removed, but the NN (Never Native) flag would need to be set instead. The new version would always expect the JSR's return address to be on the return stack (which just changes the offsets to the index), so it must always be compiled as a JSR.
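The interaction between NC-LIMIT and the AN/NN flags described in this thread can be summarized as a decision function. This is a hypothetical Python sketch of the rule as stated here, not Tali's actual compiler code. In particular, whether the size comparison is `<` or `<=` is an assumption:

```python
# Hypothetical summary of the compile-time choice described in the thread;
# not Tali's actual code. The `<=` comparison is an assumption.


def compile_as(word_size, flags, nc_limit):
    """Return "native" (inline copy) or "jsr" for a word of word_size bytes.

    flags may contain "NN" (Never Native) or "AN" (Always Native).
    """
    if "NN" in flags:
        return "jsr"     # must stay a subroutine call, e.g. a JSR'd I
    if "AN" in flags:
        return "native"  # inlined even when NC-LIMIT is 0, like today's I
    return "native" if word_size <= nc_limit else "jsr"


print(compile_as(25, {"AN"}, 0))    # native
print(compile_as(25, {"NN"}, 100))  # jsr
print(compile_as(10, set(), 20))    # native
print(compile_as(30, set(), 20))    # jsr
```

This is why simply dropping AN is not enough. Without NN, a word of 25 bytes would still be inlined whenever NC-LIMIT is 25 or more, and an inlined copy would read the wrong stack offsets.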


Posted: Wed Feb 05, 2020 10:41 am

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
If you are going all-out for the smallest code possible, and you are going to be writing different games (which means you'll keep running into the size problem), you might want to consider creating a Token Threaded Code (TTC) Forth instead of using Tali. See http://www.bradrodriguez.com/papers/moving1.htm, which is AFAIK the best introduction to the differences. It will be slower, possibly a lot slower, but you could strip the Forth down to the bare essentials and use single-byte tokens for fantastic size gains.

Tali was written for clarity (in other words, it needed to fit my Forth beginner's brain), not for maximum speed or minimum size, so you're always going to be fighting the code if you're in it for "as small as possible". I once considered writing a token-based Forth (the name would have been "Packrat") and even started some files, but there is nothing worth passing on, as I hadn't even gotten out of the design stage. I'm sure other people would be interested in this as well. Remember, Tali is in the public domain, so you should be able to copy a lot of the code one-to-one, which should save a lot of work.
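The size gains of token threading come from its inner interpreter. Each byte of the compiled code is an index into a table of primitives, so a reference to a word costs one byte instead of a 3-byte JSR or a longer inline copy. Here is a minimal Python sketch of such an interpreter. The token set and the one-byte literal format are invented for this example, and colon definitions (nesting and EXIT) are left out:

```python
# Minimal token-threaded inner interpreter (illustrative, not Packrat or Tali).
# Each byte of `code` is a token: an index into the PRIMITIVES table.


def p_lit(stack, code, ip):
    stack.append(code[ip])  # the next byte is an 8-bit literal
    return ip + 1


def p_dup(stack, code, ip):
    stack.append(stack[-1])
    return ip


def p_add(stack, code, ip):
    b, a = stack.pop(), stack.pop()
    stack.append((a + b) & 0xFFFF)
    return ip


def p_mul(stack, code, ip):
    b, a = stack.pop(), stack.pop()
    stack.append((a * b) & 0xFFFF)
    return ip


PRIMITIVES = [p_lit, p_dup, p_add, p_mul]
LIT, DUP, ADD, MUL = range(len(PRIMITIVES))


def run(code):
    """Fetch a token, dispatch through the table, repeat until the end."""
    stack, ip = [], 0
    while ip < len(code):
        token = code[ip]
        ip = PRIMITIVES[token](stack, code, ip + 1)
    return stack


program = bytes([LIT, 3, DUP, MUL, LIT, 4, ADD])  # 3 DUP * 4 +
print(run(program), len(program))                 # [13] 7
```

The price is the table lookup and dispatch on every token, which is where the slowdown mentioned above comes from.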


Posted: Wed Feb 05, 2020 1:29 pm

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
scotws, thanks for your input. I chose Tali because I wanted to generate the fastest Forth code possible, even if it's larger. I don't plan to change any of the words as leepivonka suggests. My ultimate goal is to write firmware for a calculator project; this game is just a one-off to get a feel for the different languages available for the 6502. For example, it's more fun to port this game than to write the same calculator algorithms in each language. I think the chances are good that it will all fit if I can get a headless version working.

