6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Thu May 09, 2024 12:55 pm

All times are UTC




Post new topic Reply to topic  [ 28 posts ]  Go to page Previous  1, 2
Author Message
PostPosted: Mon Apr 07, 2014 6:55 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California
Ah yes, I see that now in my printed ANS 1994 version. Memory still holds strings as counted, not null-terminated, but they prefer that string functions put c-addr u on the stack. Older Forths did have COUNT which I defined in my '816 Forth as
Code:
        HEADER "COUNT", NOT_IMMEDIATE   ; ( ADR -- ADR+1 LEN )
COUNT:  PRIMITIVE
        LDA    (0,X)
        AND    #$00FF
        INC    0,X
        JMP    PUSH
 ;-------------------

It could be defined in Forth on a 6502/816 as
Code:
: COUNT   DUP 1+ SWAP C@  ;     ( adr1 -- adr2 u )
which is not only slower but takes more memory than the '816 primitive above.

My HP-71 hand-held computer's Forth in the Forth/Assembler module usually represents a string as c_addr u, and it seems to mostly be Forth-79; so it's strange that Forth-83 did a step back, and then ANS went to c_addr u again. The single cell on the stack is nevertheless nice sometimes though, just to reduce stack operations.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Mon Apr 07, 2014 2:18 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1928
Location: Sacramento, CA, USA
The method that you describe covers how strings are represented on the stack for use, but not how they are stored. I am thinking about implementing a format that allows packed or unpacked to co-exist peacefully (I hope):

The first word of the stored string is the key. If the most-significant 8-bits are zero, then it's an unpacked string up to 2^24-1 chars long, with one char per word.

If the most-significant 8-bits of the first word are non-zero, then it's a packed string up to 255 chars long, with up to four chars per word.

An alternate 'storage' format could use a different technique to save space, but I'm not worried about that yet.

Mike


Top
 Profile  
Reply with quote  
PostPosted: Mon Apr 07, 2014 5:16 pm 
Offline
User avatar

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3354
Location: Ontario, Canada
Quote:
If the most-significant 8-bits of the first word are non-zero, then it's a packed string
For most packed strings those MS 8-bits will automatically assume a non-zero value. But I guess you know very short packed strings will need special attention. If the string is only, say, 2 characters, remember to explicitly write a non-zero value into those MS 8-bits. :wink:

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Top
 Profile  
Reply with quote  
PostPosted: Mon Apr 07, 2014 6:34 pm 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8177
Location: Midwestern USA
I'm no Forth wiz (don't even know how to use Forth) but has anyone considered the idea of 2-for-3 string encoding? Said encoding relies on the fact that if case-insensitivity is acceptable, only five bits are required to represent all of the characters of the Roman alphabet, which means three ASCII characters can be compressed to two bytes. I use that principle in my 65C816 monitor to encode the mnemonics into a smaller space (e.g., LDA becomes $115a). The basic method can be theoretically extended to any string size, as long as the maximum unencoded string length is an exact multiple of 3. Eight bit 6502 code that runs in the Kowalski simulator follows.

Code:
   .opt proc6502,caseinsensitive
;===============================================================================
;
;PACK 65xx ASCII MNEMONIC INTO 2 BYTE BINARY
;
packmne  =$80                  ;packed mnemonic storage
;
basechar =$3f                  ;conversion constant
n_shift  =5                    ;bits per packed char
s_mnemon =3                    ;size of a 65xx mnemonic
ucmask   =@01011111            ;upper case conversion mask
;
         *=$2000
;
         ldx #0
         stx packmne           ;initialize
         stx packmne+1
;
l001     lda mnemon,x          ;get char
         and #ucmask           ;change to UC
         sec
         sbc #basechar         ;change to binary
         ldy #n_shift
;
l002     lsr
         ror packmne+1
         ror packmne
         dey
         bne l002
;
         inx
         cpx #s_mnemon
         bne l001              ;next char
;
         brk
;
;
mnemon   .byte "lda"           ;test mnemonic
;
   .end

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Top
 Profile  
Reply with quote  
PostPosted: Mon Apr 07, 2014 6:49 pm 
Offline

Joined: Tue Jan 07, 2014 8:40 am
Posts: 91
The problem is, so many strings require special characters.

Another possibility is radix-36 encoding, which if I recall correctly DEC used at one time in the PDP-11 family. Actually you can go up to radix-40 to encode 3 characters into 16 bits, which gives you letters, digits, and four other characters.

Edited to add: my mistake, DEC used radix-40 on the PDP-11. Though they called it radix-50 (octal). http://en.wikipedia.org/wiki/DEC_Radix-50

_________________
Because there are never enough Forth implementations: http://www.camelforth.com


Top
 Profile  
Reply with quote  
PostPosted: Tue Apr 08, 2014 1:53 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1928
Location: Sacramento, CA, USA
Dr Jefyll wrote:
Quote:
If the most-significant 8-bits of the first word are non-zero, then it's a packed string
For most packed strings those MS 8-bits will automatically assume a non-zero value. But I guess you know very short packed strings will need special attention. If the string is only, say, 2 characters, remember to explicitly write a non-zero value into those MS 8-bits. :wink:

I'm thinking about something like the following:

string: 65m32
packed: 0536356d 3332xxxx
unpacked: 00000005 00000036 00000035 0000006d 00000033 00000032

empty string:
packed: 00000000
unpacked: 00000000

string containing a single NUL:
packed: 0100xxxx
unpacked: 00000001 00000000

Mike


Top
 Profile  
Reply with quote  
PostPosted: Tue Apr 08, 2014 4:49 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1928
Location: Sacramento, CA, USA
BigDumbDinosaur wrote:
I'm no Forth wiz (don't even know how to use Forth) but has anyone considered the idea of 2-for-3 string encoding? Said encoding relies on the fact that if case-insensitivity is acceptable, only five bits are required to represent all of the characters of the Roman alphabet, which means three ASCII characters can be compressed to two bytes...


That method is quite efficient for the use that you specified, but not a good general-purpose solution. I think that Woz did something similar back in 1977 for the 'L' command in his Apple ][ monitor ROM ... maybe even a bit more aggressive than that, but I don't remember for sure.

Mike


Top
 Profile  
Reply with quote  
PostPosted: Wed Apr 09, 2014 5:22 am 
Offline

Joined: Sat Aug 21, 2010 7:52 am
Posts: 231
Location: Arlington VA
It seems to me that a few of us are implementing a Forth for 65xxCPUs, with different design goals in mind. I don't mean to hijack this thread, and I also don't know whether it's right to start a new thread for every design issue that comes up. But the ugly has to go somewhere, so here goes:

In my implementation (for 6502), the innermost DO LOOP counter and limit are stored in 4 bytes of zeropage. This is for the speed and size advantages of not having to twiddle the machine stack all day long. As I'm reimplementing NUMBER, it occurs to me that I can also use >R R@ R> pretty freely inside the body of an inner DO loop, without having to worry about the loop index I getting in the way. It's not 79, 83, ANS, or Fig to be able to write something where R@ and I are not synonymous, like this, but it would work! Would I still be able to include the word FORTH-83 in my dictionary, which doesn't seem to have a lot to say about this divergence between R@ and I ?
Code:
( There is a string that is probably a number stored from addrlow..addrhigh -- caveat -- this probably has some bugs )
: NUMBER  ( addr -- d ; convert a counted string to a signed double, keeping track of the decimal )
   0 0 ROT COUNT   ( 0 0 addr+1 len )
   OVER C@ [ ASCII - ] =   ( leading minus? )
   DUP >R  IF  1- SWAP 1+ SWAP  THEN
   OVER + 1+ SWAP   ( 0 0 addrhigh addrlow )
   DPL ON
   ?DO
      BASE @ >R
      I C@  R@  DIGIT   ( doublesubtotal (currdigit) flag )
      IF
         SWAP R@ UM*
         DROP ROT
         R@ UM* D+
         DPL @ 1+  IF  DPL 1+!  THEN
      ELSE
         I C@ [ ASCII . ] = 
         DPL @ 0<  AND
         IF
            DPL OFF
         ELSE
            3 FAIL ( performs "NOT FOUND" ABORT - do not pass go do not collect $200 )
         THEN
      THEN
      R> DROP
   LOOP
   R>  IF  DNEGATE  THEN ;

Another thing I'm considering is to dispense with VOCABULARY entirely, at least in the initial release of PETTIL. No CURRENT, no CONTEXT, no VOCABULARY, but there would be a LATEST. Otherwise doing FORGET CREATE FIND all becomes mind-boggling, at least for my easily boggled mind. This is an obvious dealbreaker for the standards committee who will no doubt judge me harshly and kick me out of their clubhouse until I fix it. While I'm at it, I'm probably going to be playing fast and loose with BLOCK, EMPTY-BUFFERS, SAVE-BUFFERS, and all of that too, because this is not 1970 and I am not Chuck Moore and there is already a nice filesystem available on my target hardware platform, so I plan to use it. I will say that I am enjoying this project more than I really ought to, particularly the code golf. It's just that doing these things that violate standards makes me feel so impure. OB6502: Here's 1+! which I'm proud of, a word from the Forth79 Referrence wordset
Code:
;--------------------------------------------------------------
;
;       1+!   ( n -- )
;
; increments the word addressed by n
;
oneplusstorelfa .byt $de,$ad ; link
                .byt (oneplusstore-*-1)|bit7 ; length
                .asc "1+","!"|bit7 ; name
; there is no CFA, there is only code
oneplusstore    ldy #$ff
                sec
oneplusstore01  iny
                lda (tos),y
                adc #0 ; add one the first time, add C the second time
                sta (tos),y
                tya ; to set the Z flag
                beq oneplusstore01 ; loop exactly twice
                jmp pops


Top
 Profile  
Reply with quote  
PostPosted: Wed Apr 09, 2014 6:12 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California
chitselb wrote:
It seems to me that a few of us are implementing a Forth for 65xxCPUs, with different design goals in mind. I don't mean to hijack this thread, and I also don't know whether it's right to start a new thread for every design issue that comes up. But the ugly has to go somewhere, so here goes:

One important thing is to be able to find stuff again later. Sometimes it helps to keep related things together even as a topic evolves, and other times it's good to separate it out and give it its own topic. The original poster's feelings on it should be respected, if he expresses or implies any.

Quote:
In my implementation (for 6502), the innermost DO LOOP counter and limit are stored in 4 bytes of zeropage. This is for the speed and size advantages of not having to twiddle the machine stack all day long.

It would probably be good then to push the ZP bytes before starting another loop, so you can nest loops. Actually, the standard word J reaches out to get the index value of the next loop out.

Quote:
Another thing I'm considering is to dispense with VOCABULARY entirely, at least in the initial release of PETTIL. No CURRENT, no CONTEXT, no VOCABULARY, but there would be a LATEST.

That's what I did. The volcabularies are particularly useful for when the main Forth and the assembler share word names, like AND; but in my assembler, I put the addressing mode on as part of the name, so there's no conflict anyway, like AND_ABS, AND#, AND_ZP, etc.. It made it possible to write the assembler in an evening and make it use the normal mnemonic-operand order, without having to do any parsing. (You just have to comma-in the operands, like LDA# 41 C, )

Quote:
While I'm at it, I'm probably going to be playing fast and loose with BLOCK, EMPTY-BUFFERS, SAVE-BUFFERS, and all of that too, because this is not 1970 and I am not Chuck Moore and there is already a nice filesystem available on my target hardware platform, so I plan to use it.

I used blocks briefly in about 1990 and absolutely hated them. I won't say they don't have any advantages, but I will say the disadvantages outweigh the advantages. I use a nice professional programmer's text editor, and can mark a block to try instantly, without leaving the editor. I don't have to work in a teensy window with a decrepit block editor.


Quote:
OB6502: Here's 1+! which I'm proud of, a word from the Forth79 Referrence wordset

1+! was called INCR by at least one commercial Forth supplier, so that's what I called mine. Here it is in my '816 Forth, along with the accompanying DECR:
Code:
        HEADER "INCR", NOT_IMMEDIATE    ; ( addr -- )
INCR:   PRIMITIVE
        LDA     (0,X)           ; Unfortunately there's no INC (DP,X)
        INA                     ; instruction.
 incr1: STA     (0,X)           ; This word probably won't get used enough to
        JMP     POP             ; justify using the POP1 macro here.)
 ;-------------------

        HEADER "DECR", NOT_IMMEDIATE    ; ( addr -- )
DECR:   PRIMITIVE
        LDA     (0,X)           ; Unfortunately, there's no DEC (DP,X)
        DEA                     ; instruction.
        BRA     incr1           ; Branch into INCR 's body.
 ;-------------------


Quote:
; there is no CFA, there is only code

With a few different projects going on here, it's hard to remember who's doing what; but in ITC Forth, the CFA of a primitive just points to the address after the CFA which is where the assembly-language code is-- in the parameter field, just to confuse things. :lol:

I haven't taken the time to look at the details of your NUMBER code (I really must get back to work!), but one thing I saw immediately was the first ) in "( doublesubtotal (currdigit) flag )" which would make Forth look up "flag". If you use \ (the backslash) for comments, you can put anything you want after it, including )...), since the entire remainder of the line will be ignored. Another usage example would be if you have f(1)-f(2) in the comments. Another one that popped out is that in the line " OVER C@ [ ASCII - ] = ( leading minus? )", you need LITERAL after the ] so the result will get compiled. Since [...] LITERAL is so common, I defined a word ]LIT to shorten it up.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Wed Apr 09, 2014 1:43 pm 
Offline

Joined: Tue Jan 07, 2014 8:40 am
Posts: 91
A few comments:

R@ and I have been different for a long time. Take a look at F83's implementation of DO..LOOP (or CamelForth's, which shamelessly adopted the same technique).

You can do non-standard things in your kernel, as long as they're implementing standard functions. Your definition of NUMBER, for example, takes advantage of the fact that the innermost loop index I is kept in a register, so >R won't interfere with it. Of course, your NUMBER probably won't work in any other ANS Forth. And application programs can't use that trick and remain portable.

FORGET has largely been deprecated. Implement ANS Forth's MARKER instead. It's easier to implement and more versatile (e.g. it can easily handle separate Program and Data dictionary pointers).

_________________
Because there are never enough Forth implementations: http://www.camelforth.com


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 12, 2014 2:49 pm 
Offline

Joined: Sat Aug 21, 2010 7:52 am
Posts: 231
Location: Arlington VA
Brad R wrote:
3 characters into 16 bits, which gives you letters, digits, and four other characters.

Edited to add: my mistake, DEC used radix-40 on the PDP-11. Though they called it radix-50 (octal). http://en.wikipedia.org/wiki/DEC_Radix-50
This would be extremely useful for compressing some text for cartoon speech bubbles, which I could limit to A-Z 0-9 and (say) space, comma, dash, question. It's a PET in graphics mode so I don't need upper/lowercase.

The trick is, how do you do unpack it quickly? Setup and two calls to UM/MOD would be a lot of overhead. Is there a quick divide-by-5 algorithm for 6502? I googled and the best I could come up with is a 7/32 (0.21875) or 13/64 (0.203125) approximation approach.


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 12, 2014 3:03 pm 
Offline

Joined: Tue Jan 07, 2014 8:40 am
Posts: 91
Given that you're limited to byte operations, I'm not sure this would be much faster than an ordinary 16-by-8 bit division, but here you go:

Integer division by constants
http://blogs.msdn.com/b/devdev/archive/ ... 02980.aspx

Code:
    q = (x >> 3) + (x >> 4);
    q = q + (q >> 4);
    q = q + (q >> 8);
    q = q + (q >> 16);
    r = x - 5*q;
    return q + ((11*r) >> 5);


Another useful approximation is 51/256 = 0.19922.

_________________
Because there are never enough Forth implementations: http://www.camelforth.com


Top
 Profile  
Reply with quote  
PostPosted: Mon Aug 04, 2014 10:26 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1928
Location: Sacramento, CA, USA
BigEd wrote:
Nice idea to allow the strings to start non-justified - but to keep the benefit of allowing any character in a counted-string implementation, there's no room for special values. So as Garth says, you need to find 2 bits for this.

Yikes! I was distracted by 100 other things when Garth offered me that wise bit of advice, and now I'm facing that exact conundrum. I think that my revised design will have to steal two more bits from the first word, which would then leave my maximum unpacked string length reduced to 2^22 - 1, while leaving the maximum packed length at 255.

It's either that, or a separate set of words for each type (SKIP/SKIPP, SCAN/SCANP, etc.) and I'm not convinced that it would be a good idea to open that potential can of worms.

Perhaps my initial implementation of Camel Forth will only support unpacked strings, and I'll just write custom primitive versions of FIND and friends to save space just in my Headers. Yeah, that's probably what I'll do ... that (and the double-cell math family) should be enough to provide me enough info to finally put the final touches on my machine instruction set.

Mike


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 28 posts ]  Go to page Previous  1, 2

All times are UTC


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: