
All times are UTC




Posted: Sat Dec 31, 2022 11:06 pm

Joined: Sat Apr 30, 2022 7:13 pm
Posts: 159
Location: Devon. UK
Fantastic! Thanks for making this change, it makes the disassembler much more useful.

(I have to say I am really impressed by Forth in general. It's fast and compact.)


Posted: Sun Jan 01, 2023 4:09 pm

Joined: Sun May 13, 2018 5:49 pm
Posts: 255
TLDR Version: I've updated the disassembler further with string literal support so it doesn't try to disassemble the string data anymore.

VILR (Very Interesting... Let's Read!) Version: Tali inlines string data by compiling a jump over the string data, then a call to the string literal runtime handler, and then two cells that contain the address and length of the string:
Code:
 Strings are compiled into the dictionary like so:
           jmp a
           <string data bytes>
  a -->    jsr sliteral_runtime
           <string address>
           <string length>
Tali used to try to disassemble the string data after the jump and would sometimes gack on it. To fix this, I needed to recognize the JMP/JSR sliteral_runtime pattern. When the disassembler encounters a JMP instruction, it peeks ahead at the jump destination to see if the 3 bytes there are the JSR sliteral_runtime instruction. If they match, it adjusts the current disassembly location to skip over the string data and continue at the JSR. There is then a special handler for JSR sliteral_runtime that prints the string address and length that follow it and skips over those as well.

I thought about printing the string data, but Tali supports very long strings and they are shown in the memory dump when using SEE, so the address and length should be good enough in the disassembly. Here is an example of the new behavior, including typing in the address and length and TYPEing one of the strings.
Code:
: teststrings s" This is a string literal" 2drop ." This is a printed string" ;  ok
see teststrings
nt: 800  xt: 813
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 78

0813  4C 2E 08 54 68 69 73 20  69 73 20 61 20 73 74 72  L..This  is a str
0823  69 6E 67 20 6C 69 74 65  72 61 6C 20 8A A0 16 08  ing lite ral ....
0833  18 00 20 D5 D7 E8 E8 E8  E8 4C 57 08 54 68 69 73  .. ..... .LW.This
0843  20 69 73 20 61 20 70 72  69 6E 74 65 64 20 73 74   is a pr inted st
0853  72 69 6E 67 20 8A A0 3F  08 18 00 20 DE A4  ring ..? ... ..

813    82E jmp
82E   A08A jsr     SLITERAL 816 18
835   D7D5 jsr     STACK DEPTH CHECK
838        inx
839        inx
83A        inx
83B        inx
83C    857 jmp
857   A08A jsr     SLITERAL 83F 18
85E   A4DE jsr     type
 ok
$816 $18 type This is a string literal ok
The 2DROP got inlined as the stack depth check and the INX instructions.
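
The JMP lookahead described above can be sketched in a few lines of Python. This is only an illustration of the pattern matching, not Tali's actual implementation (which is 65C02 assembly). The names find_string_literals, is_sliteral_call, word_at and string_at are made up for this sketch, and the "treat every other opcode as one byte" fallback is a simplification that only holds for this sample. The bytes and the $A08A address of sliteral_runtime are copied from the dump above.

```python
# Sketch of the JMP / JSR sliteral_runtime lookahead (not Tali's real code).
# DUMP holds the bytes SEE printed for teststrings, starting at xt = $0813.
BASE = 0x0813
DUMP = bytes.fromhex(
    "4C 2E 08 54 68 69 73 20 69 73 20 61 20 73 74 72"
    " 69 6E 67 20 6C 69 74 65 72 61 6C 20 8A A0 16 08"
    " 18 00 20 D5 D7 E8 E8 E8 E8 4C 57 08 54 68 69 73"
    " 20 69 73 20 61 20 70 72 69 6E 74 65 64 20 73 74"
    " 72 69 6E 67 20 8A A0 3F 08 18 00 20 DE A4"
)

SLITERAL_RUNTIME = 0xA08A  # address of the runtime handler in this build
JMP_ABS = 0x4C             # 65C02 opcode: JMP absolute
JSR_ABS = 0x20             # 65C02 opcode: JSR absolute


def word_at(addr):
    """Read a little-endian 16-bit value from the dump."""
    off = addr - BASE
    return DUMP[off] | (DUMP[off + 1] << 8)


def is_sliteral_call(addr):
    """True if the 3 bytes at addr are JSR sliteral_runtime."""
    off = addr - BASE
    return (0 <= off <= len(DUMP) - 3
            and DUMP[off] == JSR_ABS
            and word_at(addr + 1) == SLITERAL_RUNTIME)


def find_string_literals():
    """Walk the code and return (address, length) for each string literal."""
    found = []
    pc = BASE
    end = BASE + len(DUMP)
    while pc < end:
        opcode = DUMP[pc - BASE]
        if opcode == JMP_ABS and is_sliteral_call(word_at(pc + 1)):
            pc = word_at(pc + 1)          # skip the inline string data
        elif opcode == JSR_ABS and is_sliteral_call(pc):
            found.append((word_at(pc + 3), word_at(pc + 5)))
            pc += 7                       # JSR plus the two data cells
        elif opcode in (JMP_ABS, JSR_ABS):
            pc += 3                       # ordinary 3-byte instruction
        else:
            pc += 1                       # simplification: only INX occurs here
    return found


def string_at(addr, length):
    """Decode the string stored in the dump at addr."""
    off = addr - BASE
    return DUMP[off:off + length].decode("ascii")


for addr, length in find_string_literals():
    print(f"{addr:X} {length:X} {string_at(addr, length)}")
```

A real disassembler needs a full opcode length table in place of the one-byte fallback; the point here is just the peek at the jump target before deciding whether the bytes after the JMP are code or data.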


Posted: Tue Jan 03, 2023 3:13 am

Joined: Fri May 05, 2017 9:27 pm
Posts: 895
SamCoVT wrote:
Note that SEE changes the base to hex


Although I have RB to restore BASE for those words where I need to change it, I've followed advice given by Charles Moore and don't change BASE in my tools. If I want HEX output from SEE , or DUMP , I'll set BASE to HEX first.


Posted: Wed Mar 15, 2023 3:36 pm

Joined: Sat Apr 30, 2022 7:13 pm
Posts: 159
Location: Devon. UK
Hello all.
I am struggling with some relatively simple math (maths in the UK :-) ).

I have a 16-bit number returned from a thermometer (an SHT31, if anyone is interested). It has to be converted using this formula:
temperature = 175 * (rawtemp / 65535) - 45
I have come up with the following, which works (sort of; see below). Values are in hex.

Code:
: convertT ( rT -- T )
445C ( 17500 decimal )
m*
swap drop
1194 - ( 4500 decimal )
;
This gives results like (in decimal) 1950 for 19.50 degrees C - which is fine. This code is in effect doing the following:

temperature = (rawtemp ** 17500) - 4500, where ** yields a 32-bit product whose lower 16 bits are thrown away, which acts as a division by 65536 (not 65535, but close!).

My problem is this:
The thermometer can give raw values anywhere from ~$0000 to ~$FFFF (corresponding to ~-45C up to ~130C). The Forth code above fails at inputs of $8000 and above, which is obviously to do with the sign of the number throwing off the m*.

An unsigned m* (um*) would probably do the trick, but Tali Forth doesn't have such a thing.

Can anyone (better at Forth than me, which is almost anyone) help me, please?
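
The failure at $8000 can be reproduced off-target with a small Python model of the arithmetic. This is a sketch under stated assumptions: convert_signed and convert_unsigned are hypothetical names, the constants are the decimal 17500 and 4500 from the post, and the model only mimics 16-bit cells and a 32-bit product rather than running any Forth.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement cell."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def convert_signed(raw):
    """Model of a signed 16x16 -> 32 multiply, keeping only the high cell."""
    product = to_signed16(raw) * 17500      # raw >= $8000 is seen as negative
    high_cell = (product >> 16) & 0xFFFF    # drop the low 16 bits
    return to_signed16(high_cell - 4500)


def convert_unsigned(raw):
    """Model of an unsigned 16x16 -> 32 multiply, keeping only the high cell."""
    product = (raw & 0xFFFF) * 17500
    high_cell = product >> 16               # same as dividing by 65536
    return to_signed16(high_cell - 4500)


for raw in (0x6000, 0x7FFF, 0x8000, 0xFFFF):
    print(f"${raw:04X}  signed: {convert_signed(raw):6d}  "
          f"unsigned: {convert_unsigned(raw):6d}")
```

Below $8000 the two versions agree (for example, $6000 gives 2062, i.e. 20.62 C). At $8000 the signed model returns -13250 while the unsigned one returns 4250, i.e. 42.50 C. Dividing by 65536 instead of 65535 and truncating costs a little over a hundredth of a degree at most.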


Posted: Wed Mar 15, 2023 8:46 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Are you sure?  I think that in most Forths, M* is a secondary that uses the UM* primitive in its definition.

BTW, SWAP DROP can be shortened to NIP.

I might write:
Code:
: convertT ( rT -- T)  [ DECIMAL ]
   17500  UM*  NIP
    4500   -           [   HEX   ]       ;

Let us know how it works out.  The only thing I've done for temperature is an LM335 circuit run into my A/D converter.  I've never measured humidity.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Wed Mar 15, 2023 9:18 pm

Joined: Sat Apr 30, 2022 7:13 pm
Posts: 159
Location: Devon. UK
Oops. You are right, there is a UM* ~blush~

Works perfectly now. Why I decided there was no um* I don't know.

Thanks for the NIP!


Posted: Tue Mar 28, 2023 12:46 am

Joined: Fri May 05, 2017 9:27 pm
Posts: 895
Does Tali Forth still inline DO LOOPs?


Posted: Tue Mar 28, 2023 5:55 pm

Joined: Sun May 13, 2018 5:49 pm
Posts: 255
JimBoyd wrote:
Does Tali Forth still inline DO LOOPs?
It does, and those words have special treatment so that they are always native compiled (inlined) when used, even if native compiling has been disabled by the user. It's a bit of a bear at 70 bytes for the two words, and it's not fast either. Druzyek and leepivonka have both done some investigating into the speed (or lack thereof) of Tali's DO LOOPs, and I also investigated using an 8-bit index (which results in loops that take about 1/4 of the time, or 1/2 the time if using I in them) here. I haven't personally run into an issue with speed, so I haven't spent too much time on making it better. Here is an example of all of the overhead of doing a DO LOOP. Anyone trying to follow the assembly needs to know that the ending value, and the "index", is subtracted from $8000 (this is sometimes called "fudge-factoring" the index) so that the oVerflow flag can be used to detect when to end the loop.
Code:
: test do loop ;  ok
see test
nt: 800  xt: 80C
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 70

080C  A9 08 48 A9 51 48 38 A9  00 F5 02 95 02 A9 80 F5  ..H.QH8. ........
081C  03 95 03 48 B5 02 48 18  B5 00 75 02 95 00 B5 01  ...H..H. ..u.....
082C  75 03 48 B5 00 48 E8 E8  E8 E8 20 E9 97 18 68 75  u.H..H.. .. ...hu
083C  00 A8 B8 68 75 01 48 98  48 E8 E8 70 03 4C 36 08  ...hu.H. H..p.L6.
084C  68 68 68 68 68 68  hhhhhh

80C      8 lda.#
80E        pha
80F     51 lda.#
811        pha
812        sec
813      0 lda.#
815      2 sbc.zx
817      2 sta.zx
819     80 lda.#
81B      3 sbc.zx
81D      3 sta.zx
81F        pha
820      2 lda.zx
822        pha
823        clc
824      0 lda.zx
826      2 adc.zx
828      0 sta.zx
82A      1 lda.zx
82C      3 adc.zx
82E        pha
82F      0 lda.zx
831        pha
832        inx
833        inx
834        inx
835        inx
836   97E9 jsr     1
839        clc
83A        pla
83B      0 adc.zx
83D        tay
83E        clv
83F        pla
840      1 adc.zx
842        pha
843        tya
844        pha
845        inx
846        inx
847      3 bvs
849    836 jmp
84C        pla
84D        pla
84E        pla
84F        pla
850        pla
851        pla
 ok
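
For anyone who would rather follow the "fudge factor" trick in a high-level language, here is a small Python model. It is a sketch based on reading the listing above, not Tali's source: the setup code appears to store $8000 minus the limit, add that value to the starting index, and then let a 16-bit signed add set the oVerflow flag when the fudged index crosses from $7FFF to $8000. The function name run_do_loop and the way I is recovered (fudged index minus fudge) are assumptions for illustration.

```python
def run_do_loop(limit, start, step=1):
    """Model `limit start DO ... step +LOOP` with a fudged 16-bit index.

    Returns the list of values that I would report on each pass.
    """
    step &= 0xFFFF
    fudge = (0x8000 - limit) & 0xFFFF       # computed once, at DO
    index = (start + fudge) & 0xFFFF        # fudged index kept by the loop
    seen = []
    while True:
        seen.append((index - fudge) & 0xFFFF)   # what I would return
        result = (index + step) & 0xFFFF
        # Signed overflow: both operands have the same sign bit and the
        # result's sign bit differs. This is what the 6502 V flag reports.
        overflow = bool((index ^ result) & (step ^ result) & 0x8000)
        index = result
        if overflow:
            break                           # the BVS exit in the listing
    return seen


print(len(run_do_loop(100, 0)))     # number of passes for: 100 0 do loop
print(run_do_loop(100, 0, 5)[:4])   # first few I values for: 100 0 do 5 +loop
```

The appeal of the scheme is that the end test needs no compare against the limit on each pass: the add that advances the index also produces the exit condition.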


Posted: Thu Apr 06, 2023 1:02 am

Joined: Fri May 05, 2017 9:27 pm
Posts: 895

Fleet Forth's DO LOOP's also use the $8000 "fudge factor" so they end when an overflow is detected.
I asked about the inlining because I've been working on some ideas for an STC version of Fleet Forth and have versions of the DO LOOP words which do not get inlined. The added overhead for (DO) and (?DO) would be minimal since they only run once per loop. I think the added overhead for (LOOP) and (+LOOP) would also be minimal, thanks to a suggestion from leepivonka.
Another benefit: there is a separate (LOOP) primitive compiled by LOOP .
STC versions of Fleet Forth's DO LOOP's.
The use of SUBR (subroutine) was so I could test these words with the current version of Fleet Forth, an ITC Forth.
I realize you may not want to change Tali Forth's DO LOOP's. If it's not broken, don't fix it; however, it might be worth trying if you have a really big application with lots of DO LOOP's and find memory getting a little tight.


Posted: Fri May 17, 2024 8:34 pm

Joined: Sun May 13, 2018 5:49 pm
Posts: 255
JimBoyd wrote:

I asked about the inlining because I've been working on some ideas for an STC version of Fleet Forth and have versions of the DO LOOP words which do not get inlined. The added overhead for (DO) and (?DO) would be minimal since they only run once per loop.
...
I realize you may not want to change Tali Forth's DO LOOP's. If it's not broken, don't fix it; however, it might be worth trying if you have a really big application with lots of DO LOOP's and find memory getting a little tight.

Patrick (pdragon here) has been working on Tali Forth's loops and has made them significantly smaller and faster. Indeed, moving to a non-inlined DO was worthwhile, but LOOP is still inlined. Patrick also added a change that holds (caches, really) the LSB of the current count in zero page, and moved the loop info off of the return stack (to 0x100 and growing upwards, but could be moved anywhere in RAM).

Patrick actually wrote three different alternative looping options and benchmarked them against each other. We ended up choosing the "Loop Control Block - Simple" (lcb simple) version because it offered the best speedup while still being relatively simple to implement. Tali's test suite has a cycle counter added to py65mon, so we were able to benchmark all three options and saw the following speedups (the numbers are 65C02 cycle counts (clocks)):

Code:
: do?word1 5 5 ?do loop ;
    original    322
    master      322     ; all about the same
    push/pull   325
    lcb cmplx   325
    lcb simple  325
   
: do?word2 100 0 ?do i drop loop ; 
    original   12836
    master      8384
*   push/pull   6052    ; -6 cycles for i
    lcb cmplx   7218    ; speculatively incrementing i
    lcb simple  6658   
   
: doword 100 0 do loop ;
    original    6700
    master      2148
*   push/pull   1304    ; simple one-level loop
    lcb cmplx   2651
*   lcb simple  1310    ; about the same
       
: dowordi 100 0 do i drop loop ;
    original   12700
    master      8248
    push/pull   5904    ; -6 cycles for i
    lcb cmplx   7169
    lcb simple  6511
       
: dodoword 100 0 do 10 0 do loop loop ;
    original    90500
    master      44748
    push/pull   42704
    lcb cmplx   51318
*   lcb simple  33410   ; 9294/100 => 90 cycles better for nested loops
       
: dodowordij 100 0 do 10 0 do i drop j drop loop loop ;
    original    210500
    master      165748
    push/pull   149704  ; not a huge difference when all native compiled
    lcb cmplx   158217
    lcb simple  156410  ; default J is a JSR; +6 cycles for i
*   lcb simple' 144410  ; forcing J native compile
   
: dodowordbigi 10 0 do 1024 0 do i drop loop loop ;
    original    1282730
    master      822298
*   push/pull   587424  ; -6 cycles for xt_i
    lcb cmplx   700317 
    lcb simple  648060  ; diff 60636 = +61440 for `i`, 804 better otherwise
   
   
: doword+loop 100 0 do 5 +loop ;
    original    2420
    master      2418
*   push/pull   2291
    lcb cmplx   2601
*   lcb simple  2213    ; faster when step `<256`

Patrick has made a few more small optimizations since then, but you can see that there was a lot of improvement to be had in many loop configurations. If you are interested in the "Loop Control Block" scheme that Tali now uses, let me know and I can provide more details as to the inner workings.
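
To put the cycle counts above in relative terms, the short Python snippet below divides the "original" count by the "lcb simple" count for each test word. The numbers are copied from the table; the CYCLES dictionary and the speedup helper are just names chosen for this sketch.

```python
# (original cycles, lcb simple cycles) per benchmark word, from the table above
CYCLES = {
    "do?word1":     (322, 325),
    "do?word2":     (12836, 6658),
    "doword":       (6700, 1310),
    "dowordi":      (12700, 6511),
    "dodoword":     (90500, 33410),
    "dodowordij":   (210500, 156410),
    "dodowordbigi": (1282730, 648060),
    "doword+loop":  (2420, 2213),
}


def speedup(name):
    """Ratio of original cycles to lcb simple cycles (above 1 means faster)."""
    original, lcb_simple = CYCLES[name]
    return original / lcb_simple


for name in CYCLES:
    print(f"{name:14s} {speedup(name):5.2f}x")
```

The empty loops gain the most (roughly 5x for doword and 2.7x for dodoword), while the loops that call I on every pass come out closer to 2x, and the zero-iteration ?DO case is essentially unchanged.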


Posted: Tue May 21, 2024 10:48 pm

Joined: Fri May 05, 2017 9:27 pm
Posts: 895
SamCoVT wrote:
If you are interested in the "Loop Control Block" scheme that Tali now uses, let me know and I can provide more details as to the inner workings.
It does sound interesting; however, the STC version of Fleet Forth has been put on a back burner for now and it could be a while before I can do anything with this information.

