A milliforth for 6502

agsb · Post by **agsb** » Sun Nov 12, 2023 4:37 am

The milliForth (https://github.com/fuzzballcat/milliForth) is a Forth implementation for Z80.

It uses less than 400 bytes.

It is based on sectorForth and uses getchar and putchar from system BIOS.

Most of the words are defined in a hello-world.fth file.

What could be the smallest implementation of it for a 6502?

Using the same hello-world.fth as the test case.

GARTHWILSON · Post by **GARTHWILSON** » Sun Nov 12, 2023 5:15 am

When I was a FIG (Forth Interest Group) member and getting the Forth Dimension magazine (and had a few articles of my own published), there was frequent discussion on how few primitives you could get away with, because of the interest in portability, where if you went to another processor, you'd want to re-write as little assembly language as possible. The accepted minimum seemed to be about 30; but that made for rather poor performance. I think someone had it down to about 12, making the performance ridiculously bad. MilliForth is even a lot less than that, and the "less than 400 bytes" part seems to be just those few primitives, and even hello_world.FORTH has to define some really basic stuff as secondaries (also called "colon definitions"), and often does so using more bytes than the primitive (also called "code definition," ie defined in assembly language) version would have been. These things are always interesting, seldom useful, but may nevertheless spark a useful idea here or there.

JimBoyd · Post by **JimBoyd** » Sun Nov 12, 2023 8:16 pm

This is why my Forth's kernel is as big as it is. First, the kernel of my Forth implements the entire required wordset for the Forth-83 Standard. There are words not in the Forth-83 Standard's required wordset which I decided to include in my Forth. 2>R and 2R> are two such words. Including them in the kernel makes the kernel larger; however, some words in the kernel can be written more efficiently. This results in a smaller overall system which is the kernel and the system loader.
The system is faster and more responsive because there are many primitives.

BruceRMcF · Post by **BruceRMcF** » Mon Nov 13, 2023 1:36 am

GARTHWILSON wrote:

When I was a FIG (Forth Interest Group) member and getting the Forth Dimension magazine (and had a few articles of my own published), there was frequent discussion on how few primitives you could get away with, because of the interest in portability, where if you went to another processor, you'd want to re-write as little assembly language as possible. The accepted minimum seemed to be about 30; but that made for rather poor performance. I think someone had it down to about 12, making the performance ridiculously bad. MilliForth is even a lot less than that, and the "less than 400 bytes" part seems to be just those few primitives, and even hello_world.FORTH has to define some really basic stuff as secondaries (also called "colon definitions"), and often does so using more bytes than the primitive (also called "code definition," ie defined in assembly language) version would have been. These things are always interesting, seldom useful, but may nevertheless spark a useful idea here or there.

It seems that if, rather than trying to have a portable platform, you are bootstrapping and willing to be very system specific, @ C@ ! C! and EXECUTE and an outer interpreter could probably get you going. If you know the "carnal details" of the hardware, you don't need that S@ structure thingie, you can make arithmetic words by poking machine code, you can write SP@ and RP@ in machine code, etc.

However, if the MilliForth was expanded to include an INCLUDE of some sort, then if you have "carnal knowledge" of how to build a primitive, you can define a CODE word on the command line with C! and !, and have a platform to bring up a 6502 forth on the basis of a very small ROM footprint. The key issue is having an efficient inner-interpreter, if you are going to build the full fledged Forth "on top of" Milliforth+INCLUDE.

BruceRMcF · Post by **BruceRMcF** » Tue Nov 14, 2023 2:45 pm

BruceRMcF wrote:

GARTHWILSON wrote:

When I was a FIG (Forth Interest Group) member and getting the Forth Dimension magazine (and had a few articles of my own published), there was frequent discussion on how few primitives you could get away with, because of the interest in portability, where if you went to another processor, you'd want to re-write as little assembly language as possible. The accepted minimum seemed to be about 30; but that made for rather poor performance. I think someone had it down to about 12, making the performance ridiculously bad. MilliForth is even a lot less than that, and the "less than 400 bytes" part seems to be just those few primitives, and even hello_world.FORTH has to define some really basic stuff as secondaries (also called "colon definitions"), and often does so using more bytes than the primitive (also called "code definition," ie defined in assembly language) version would have been. These things are always interesting, seldom useful, but may nevertheless spark a useful idea here or there.

It seems that if, rather than trying to have a portable platform, you are bootstrapping and willing to be very system specific, @ C@ ! C! and EXECUTE and an outer interpreter could probably get you going. If you know the "carnal details" of the hardware, you don't need that S@ structure thingie, you can make arithmetic words by poking machine code, you can write SP@ and RP@ in machine code, etc.

I will note that doing it with MilliForth / SectorForth (I would lean toward the latter) means that you HAVE the "Hello World", so even if it is slow, you can get to the command line interpreter really quickly and then incrementally replace the "slow" definitions with "fast" ones, based on hand coding the raw machine code of a primitive.

I am, indeed, tempted to do just that -- probably with the SectorForth wordset rather than the MilliForth one -- but with some modest adjustments. First, since I am not actually trying to fit it into an IBM-PC floppy disk boot sector, I would go ahead and have "ok" and "? DQP" error messages. Second, I need an INCLUDE and a SAVEIMAGE. Third, rather than skipping a number converter entirely, I will have a $FF $FFFF number converter built into "SectorForth+", deferring the full featured BASE based number converter to the Forth written in SectorForth+.

My INCLUDE though will be called something else ... perhaps SCRIPT ... because it will not take a name, it will be a "blockish" file that loads 64 bytes from the file into the first 64 bytes of the TIB, and pads it out with spaces. A program to take a text file and turn it into a script file would be a separate thing ... it could be in BASIC in a system which has a Basic, it just reads the lines of a text file and writes a stream of text padded by spaces under the "numeric name". Supporting SCRIPT is the \ comment word. Maybe by convention $00 SCRIPT is a "read me" or directory.
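The converter described above could be sketched as follows. This is a minimal Python illustration, not the author's tool (the post suggests BASIC would do just as well): the name `text_to_script`, the use of the 64-byte TIB record width, and the choice to reject over-long lines rather than wrap them are all assumptions made here.

```python
RECORD = 64  # bytes loaded into the TIB per script line, per the description above

def text_to_script(text: str, width: int = RECORD) -> bytes:
    """Turn text lines into fixed-width, space-padded records (hypothetical helper)."""
    out = bytearray()
    for n, line in enumerate(text.splitlines(), start=1):
        raw = line.rstrip().encode("ascii")
        if len(raw) > width:
            # Assumption: an over-long line is an error, not something to wrap.
            raise ValueError(f"line {n} is {len(raw)} bytes, limit is {width}")
        out += raw.ljust(width, b" ")  # pad the record out with spaces
    return bytes(out)

if __name__ == "__main__":
    image = text_to_script(": 2 1 1 + ;\n\\ a comment line\n")
    print(len(image), "bytes in", len(image) // RECORD, "records")
```

Because every record has the same width, a loader like SCRIPT never has to scan for an end-of-line character; it can copy one record straight into the TIB.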

SCRIPT would set the source back to the console if the file ended with the last character, so after the last line in the script you are ready to type in. Scripts will not have text names, they will have numbers and a specific file type, so $02 SCRIPT will work. The saveimage would not be called "SAVEIMAGE", which by convention takes a file name ... the file name is given at a specific magic RAM location in the Forth, edit it with C! to change the name it is saved under, so this is called SAVEME.

So that is 18 words:

@ ( addr -- u|n )
! ( u|n addr -- )
SP@ ( -- addr )
RP@ ( -- addr )
+ ( n1 n2 -- n3 )
NAND ( n1 n2 -- n3 )
0= ( n1 -- T/F )
EXIT ( r: addr -- )
KEY ( -- c )
EMIT ( c -- )
SCRIPT ( n -- )
\ ( -- )
SAVEME ( -- )
STATE ( -- addr )
TIB ( -- addr )
>IN ( -- u )
HERE ( -- addr )
LATEST ( -- addr )

BruceRMcF · Post by **BruceRMcF** » Wed Nov 15, 2023 2:10 pm

I was doing some tinkering last night, pulling things out of my (not yet debugged) xForth for the X16 project. It is running at the command line and compiling, but my CREATE DOES> does not work, so it's not yet really working. Still, SectorForth/milliForth doesn't need CREATE DOES> in 6502 assembly, so maybe it will work.

To get RP@ and SP@ when I use ,X indexing (or ,Y indexing) to access the return and data stacks, I have two vectors in zero page, RP and SP, and load the index register out of those for operations on the data stack and return stack. Only SP@ and RP@ use the high bytes, to form an address that the userland code can use. ... the system words ignore the high bytes, but the stacks are within a single page, so any user code that changes the high byte is already broken.

I have to check the Forth written in SectorForth to see how it defines >R and R>, to check whether those really are SP@ and RP@ ... I am not sure how you define those in userland without the CamelForth SP! and RP! words.

The X16 has 94 bytes of zero page free -- $00/$01 are banking latches for the HighRAM and ROM windows in the memory map, and $02--$21 are a 16-word API for Kernal calls that need inputs/outputs greater than the three registers plus the carry flag, and $80-$FF is used by the system/Basic. So I put zero page vectors and my JMP (LIST) for the DONEXT routine plus 3 User vectors U, V and W in $22-$2F, and allocated $30-$7F to the X-indexed data stack. I push the bottom of the hardware stack to $01AF, and use $01B0-$01B7 for the four system variables (>IN, LATEST, HERE and STATE), and the $01B8-$01FF is the space for the X or Y indexed return stack.
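The sizes implied by that memory map can be checked with a few lines of arithmetic. The Python sketch below only restates the address ranges given above; the region names are labels chosen here for illustration, and the hardware stack range assumes it occupies the rest of page 1 below $01B0.

```python
# Regions as (first, last) inclusive addresses, taken from the post above.
ZP_FREE      = (0x22, 0x7F)      # zero page not claimed by the X16 system
ZP_VECTORS   = (0x22, 0x2F)      # vectors, JMP (LIST), and U, V, W
DATA_STACK   = (0x30, 0x7F)      # X-indexed data stack
HW_STACK     = (0x0100, 0x01AF)  # assumed: what is left for the 6502 stack
SYS_VARS     = (0x01B0, 0x01B7)  # >IN, LATEST, HERE, STATE
RETURN_STACK = (0x01B8, 0x01FF)  # X or Y indexed return stack

def size(region):
    """Number of bytes in an inclusive (first, last) address range."""
    first, last = region
    return last - first + 1

if __name__ == "__main__":
    for name in ("ZP_FREE", "ZP_VECTORS", "DATA_STACK",
                 "HW_STACK", "SYS_VARS", "RETURN_STACK"):
        n = size(globals()[name])
        print(f"{name:13s} {n:4d} bytes  {n // 2:3d} cells")
```

With 16-bit cells this gives a 40-cell data stack and a 36-cell return stack, with four cells for the system variables.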

barrym95838 · Post by **barrym95838** » Wed Nov 15, 2023 4:26 pm

BruceRMcF wrote:

I push the bottom of the hardware stack to $01AF, and use $01B0-$01B7 for the four system variables (>IN, LATEST, HERE and STATE), and the $01B8-$01FF is the space for the X or Y indexed return stack.

Interesting twist. I am far from an expert, but all of the 65xx implementations with which I'm familiar just use the S register for the Forth return stack pointer, influenced by the paucity of registers and the favorable trade-offs that come from fusing the hardware and interpreter return stacks. Do you have some sample primitives you can share (possibly in a fresh thread) which utilize RSP?

BruceRMcF · Post by **BruceRMcF** » Wed Nov 15, 2023 5:07 pm

barrym95838 wrote:

BruceRMcF wrote:

I push the bottom of the hardware stack to $01AF, and use $01B0-$01B7 for the four system variables (>IN, LATEST, HERE and STATE), and the $01B8-$01FF is the space for the X or Y indexed return stack.

Interesting twist. I am far from an expert, but all of the 65xx implementations with which I'm familiar just use the S register for the Forth return stack pointer, influenced by the paucity of registers and the favorable trade-offs that come from fusing the hardware and interpreter return stacks. Do you have some sample primitives you can share (possibly in a fresh thread) which utilize RSP?

It doesn't need to be a separate thread, because I did it to address part of the SectorForth/MilliForth style model, where RP@ has to return an address and act "as if" it is running the return stack with a pointer. You can have all of the R-stack users dutifully update the return stack, but assembly language routines and calling routines from interrupt vectors are not necessarily going to respect that.

So I wouldn't do it in general, it's for cramming the square peg of 65C02 X/Y indexed stacks into the round hole of this Forth model which, AFAIU, wants something that works like an actual top-of-stack pointer. For the general CamelForth model which has RP@ RP! SP@ SP!, I just use the X index into my data stack and the S index into my Return stack and the SP@ and RP@ just clear the high byte while the SP! and RP! ignore it.

It's early days yet, and if I think the SectorForth approach is too clunky, I could well return to the CamelForth style words, though at the cost of having to rewrite the SectorForth HelloWorld script. So at the very least, I want to implement the SectorForth model -- with some extensions, but not with modifications ... so if I run the SectorForth script and it doesn't work, I know it is the s4th.prg binary that is wrong rather than the script. Whether or not I decide to "de-clunkify it" would then be a matter of available time and interest.

The code is at home and these are office hours, but I think I can recall the DOLIST.

The compiled forth word is:

    JSR DOLIST
    [address of 1st word]
    [address of 2nd word]
    ...
    [address of EXIT]

So the address of the start of the list, minus 1, is on the top of the hardware stack (JSR pushes the address of its own last byte).

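DOLIST:
    LDX RP ; the RP@ side requires it be maintained "as if" it is a return stack pointer
    DEX
    DEX         ; make room for one cell on the R stack
    STX RP
    CLC
    LDA #2
    ADC IP ; the model requires that the EXIT address on R: points to a list entry to start executing
    STA RL,X    ; low byte of IP+2
    LDA #0
    ADC IP+1    ; carry propagates into the high byte
    STA RH,X
    PLA         ; low byte of the JSR return address (list start minus 1)
    PLX         ; high byte
    INC         ; 65C02 accumulator increment: step to the first list entry
    BNE +
    INX         ; low byte wrapped, so bump the high byte
+   STA IP
    STX IP+1
    JMP DONEXT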

So the JSR DOLIST that extracts the address of the list of (direct threaded) executable words can push onto the R stack without interfering with the list subroutine return address on the stack.
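To make the address arithmetic concrete, here is a small Python model of that DOLIST sketch. It is an illustration under stated assumptions rather than the author's code: `jsr_pushes` and `dolist` are hypothetical names, the two stacks are plain Python lists, and the example addresses are made up.

```python
def jsr_pushes(jsr_addr):
    """Bytes a 6502 JSR at jsr_addr pushes: the address of its own
    last byte (jsr_addr + 2), high byte first, then low byte."""
    ret = (jsr_addr + 2) & 0xFFFF
    return [ret >> 8, ret & 0xFF]

def dolist(ip, rstack, hw_stack):
    """Model of DOLIST: save IP+2 on the separate R stack, then pull the
    JSR return address from the hardware stack and add 1 to reach the list."""
    rstack.append((ip + 2) & 0xFFFF)   # where EXIT will resume the caller
    lo = hw_stack.pop()                # PLA
    hi = hw_stack.pop()                # PLX
    return (((hi << 8) | lo) + 1) & 0xFFFF   # INC / BNE / INX, then STA/STX IP

if __name__ == "__main__":
    hw, r = [], []
    hw += jsr_pushes(0x08FD)           # hypothetical word whose JSR ends at $08FF
    new_ip = dolist(0x0850, r, hw)     # hypothetical caller IP
    print(f"IP = ${new_ip:04X}, R: ${r[-1]:04X}")
```

The example deliberately places the JSR so that the pulled low byte is $FF, which exercises the BNE/INX path that carries into the high byte.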

The other aspect is when SectorForth is used to intermix conventional assembly language and forth ... since you can define a wordset to move HERE to where you want an assembly language routine, and then restore it when you are done:

$0400 STARTCODE
CODE< $DE , $AD , $BE , $EF , >CODE
CODEHERE @ .
$0404 OK

... to put arbitrary data into an arbitrary location, so a script can inject a set of assembly language routines.

So this simplifies writing a helper routine for assembler language routines calling compiled Forth words. The helper routine injects the address of the return routine onto the Return stack, then returns to the caller, which can then call the Forth word as a subroutine ... and the helper can just return to the caller because the R stack and the hardware stack are separated:

; User
; ...
    JSR CALL4TH
    JSR SETUP_IO
; ...

CALL4TH:    ; untested
    LDX RP
    DEX
    DEX
    STX RP
    LDA #<(CALL4TH1)
    STA RL,X
    LDA #>(CALL4TH1)
    STA RH,X
    RTS    ; return from JSR CALL4TH
CALL4TH1: ; EXIT has called here
    LDX RP
    INX
    INX
    STX RP
    RTS    ; return from JSR SETUP_IO

BruceRMcF · Post by **BruceRMcF** » Wed Nov 15, 2023 6:07 pm

No original plan of mine ever encounters the initial assembly language sketch entirely intact (just as no original assembly language sketch survives execution and debugging entirely intact).

When I looked at my CamelForth routine for reading a line of text from the X16 keyboard Kernal routine, it became clear to me that I could just declare a size limit on the length of lines (79 characters) and read straight from a conventional text SEQ file.

In the X16, return in PETSCII mode is CR, while in ISO mode, return is LF, and for the built in text editor, the same thing happens with the text files it saves. So I just test for both CR and LF to detect the end of line from the console, and store a null value at that point.

I want to make most of the dictionary easily ROMable, so the ME name for the SAVEME file image and the space for a script filename being called comes right after the Commodore Basic stub for making a machine language routine that can RUN from the Basic command line, which is a single line 10 SYS $810 basic program followed by the three nulls that mark the end of basic code followed by the code you call to. That is a "JMP SETUP" command, and then after it comes the file name buffers. For "belt and suspenders", the file name buffer is the tail of the vocabulary list, with a $0000 link pointer, a $0 length field so it never matches, and then the image name buffer and the script filename buffer.

Scripts are planned to be (I haven't assembled any of this yet, but I am in the middle of writing the assembly language outer interpreter, so I haven't even written sketches of the SCRIPT and SAVEME words) 0.sfs ("Sector Forth Script") through f.sfs, so a total of 16 possible scripts, though the X16 SD card file system has subdirectories, so that is really a limit of 16 scripts accessible at the same time. My plan is to use 0.sfs as the "directory", so each script from 1.sfs to however many has a comment line. Each line in the 0.sfs begins with the "\" comment word that loads $50 into >IN, so "$00 SCRIPT " would get nothing but the echo of the script.

In console mode, the first location in the script filename has a $00 (NUL) character, which is how SCRIPT knows whether it was executed from the console or executed from inside a script.

SCRIPTs do not nest ... they do not even nest with following Forth words on the command line ... but they chain ... if SCRIPT is executed in a SCRIPT, it over-writes the filename used by the previous one. But first, it checks whether the location is NUL or a digit, and if it is a digit, it closes the currently open script file so it can re-use the same fileid#.

Then, whatever the previous status, SCRIPT writes the hexadecimal digit of the first sixteen bits of the value on stack into the first location in the SCRIPT filename buffer and uses that to open the script. When it reaches the end of file, it closes the file, puts a $0 (NUL) in the first location in the SCRIPT filename buffer, sets >IN to $50, and restores the default input to the keyboard.

If it is returning directly to the command line, the interpreter finds >IN is at the end of the TIB and so it gets another line of input from the user.

If there is chaining, it is the last callee in the chain that actually gets to the end of its file, closes its own file, restores CHRIN to the keyboard, and puts $00 into the first location in the script filename buffer.

For the callers, when SCRIPT finds that it is at the end of the TIB and needs another line from the file, first it checks the script filename buffer. If it is $00, then it knows that the current default input is no longer its file, so it simply returns.

If there is chaining, SCRIPT nests on the return stack, but only the last callee in the chain ever gets to its end of file ... all of the others are returned to from their call to SCRIPT and find out that they are finished already.

BruceRMcF · Post by **BruceRMcF** » Thu Nov 16, 2023 5:06 pm

I went through some of the SectorForth examples (and milliForth is really a dialect of SectorForth ... mostly, rather than having four system variables, it has an eight byte system variable block with a single word to give the base address of the block), and it really does use RP@ and SP@.

DUP and R@ are straightforward ... "SP@ @" and "RP@ @". It's RDROP or R> that is crazy.

: >REXIT \ ( addr1 R:addr2 -- R:addr1 )
    RP@ ! ;

: >R  \ ( x1 R:x2 -- R:x2 x1 ) 
    RP@ @  SWAP  RP@ !  ;

\ and swap is:
: dup ( x -- x x ) sp@ @ ;
: -1 ( x -- x -1 ) dup dup nand dup dup nand nand ;
: 0 -1 dup nand ;
: 1 -1 dup + dup nand ;
: 2 1 1 + ;
: 4 2 2 + ;
: 6 2 4 + ;
: over ( x y -- x y x ) sp@ 2 + @ ;
: swap ( x y -- y x ) over over sp@ 6 + ! sp@ 2 + ! ;

... so, if I get SectorForth up and running for the X16, in addition to $00 hexadecimal numbers, SCRIPT and SAVEME, I am going to have an RDROP and a SWAP.
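The constant definitions quoted above are easy to misread, so here is a small Python model that runs them on a 16-bit stack. It is only a sketch: `run` and `WORDS` are names invented here, and `dup` is treated as a primitive for brevity, whereas the SectorForth example builds it from `sp@ @`.

```python
MASK = 0xFFFF  # 16-bit cells

# Colon definitions copied from the listing above (stack comments dropped).
WORDS = {
    "-1": "dup dup nand dup dup nand nand",
    "0":  "-1 dup nand",
    "1":  "-1 dup + dup nand",
    "2":  "1 1 +",
    "4":  "2 2 +",
    "6":  "2 4 +",
}

def run(words, program, stack):
    """Execute a whitespace-separated program against a list used as the stack."""
    for tok in program.split():
        if tok == "dup":
            stack.append(stack[-1])
        elif tok == "nand":
            b, a = stack.pop(), stack.pop()
            stack.append(~(a & b) & MASK)
        elif tok == "+":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK)
        else:
            run(words, words[tok], stack)  # colon definition: run its body
    return stack

if __name__ == "__main__":
    for name in WORDS:
        print(name, "->", [hex(v) for v in run(WORDS, name, [0x1234])])
```

Note that `-1` begins with `dup`, so every one of these constants needs at least one item already on the stack to work from; the value of that item does not matter and it is left in place.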

I first thought from the README listing of primitives that ":" and ";" are handled directly by the outer interpreter, but while they are not listed as primitives, the assembly listing shows them being added to the vocabulary, so ":" and ";" are words -- ";" is the only immediate word among the starting set. However, despite example files making liberal use of ( ) and \ comments, there is no ( or \ comment word primitive ... because in SectorForth you do not load the example files, you read the example files and type them in. So with a SCRIPT load word, \ or ( ) is going to be needed. I am sticking with \ because it's simpler -- just put >IN at the end of the TIB and you are done.

BruceRMcF · Post by **BruceRMcF** » Fri Nov 17, 2023 2:57 pm

The $DEAD $BEEF literal converter that I am looking at is a bit cheeky. It has almost NO guardrails: the only syntax error it checks for is a $ alone without anything after in the token.

This is the classic model where if you cannot find the token in the wordlist, you attempt to convert it into a number. However, in this case, only hex numbers with leading "$" are supported.

'0'-'9' are %00110000 - %00111001
'A'-'F' are %01000001 - %01000110
'a'-'f' are %01100001 - %01100110

And the digit value needs to be in the top four bits of A to rotate into V and V+1 four bits at a time.

So shift the character left four times, "ASL : ASL : ASL : ASL"

If the carry flag is set after the fourth shift, treat that as '0' through '9', and it is %00000000-%10010000 so you are ready to shift it in.
If the carry flag is clear, AND if it is a well formed hex value, it is %00010000-%01100000, when it needs to be %10100000-%11110000, so add 9 to the digit, which in the top four bits means adding $90:
"BCS + : ADC #$90 : + ..." and then shift the value into V.

I am not finished writing the INTERPRET routine, but I can't wait to see if this works.
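The shift-and-adjust idea can be checked without an assembler. The Python sketch below models the four ASLs and the carry test on 8-bit values; `asl` and `hex_nibble_high` are hypothetical names, and the adjustment is written as $90 because the digit sits in the top four bits at that point, so adding 9 to the digit means adding $90 to A. Like the routine described, it has no guardrails and assumes a well formed hex character.

```python
def asl(a):
    """8-bit arithmetic shift left; returns (result, carry_out)."""
    return (a << 1) & 0xFF, (a >> 7) & 1

def hex_nibble_high(ch):
    """Digit value of the ASCII hex character `ch`, placed in the top four bits."""
    a, carry = ord(ch), 0
    for _ in range(4):          # ASL : ASL : ASL : ASL
        a, carry = asl(a)
    if not carry:               # BCS + skips the adjustment for '0'-'9'
        a = (a + 0x90) & 0xFF   # ADC with carry clear: +9 in the top nibble
    return a

if __name__ == "__main__":
    for ch in "09AFaf":
        print(ch, f"${hex_nibble_high(ch):02X}")
```

The carry left by the fourth shift is bit 4 of the original character, which is 1 for '0'-'9' and 0 for both 'A'-'F' and 'a'-'f', so one test covers upper and lower case.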

Another thing that emerged from starting to work on the sketch for this is that the "?? <word>" error routine and the STARTUP routine had a common factor for resetting the stacks and starting/restarting the interpreter ... and when I reorganized the common factor so that the "??" error could just jump to the appropriate point in the SETUP routine ... I realized that what I had was an ABORT. Since I am not trying to win any record in this for having the "smallest possible" Forth, I went ahead and added a vocabulary list header for the ABORT entry point which SETUP branches over to finish starting up the system.

Once I find out what a working version of the system assembles to, I will be able to work out a target size, and whether the finished version should focus on running in the X16's almost 38KB of LowRAM, within an 8KB segment of the 512KB of High RAM, or run with the base system primarily residing in ROM as a runtime for a forth that can run in the 1KB of Golden RAM at $0400-$07FF.

BruceRMcF · Post by **BruceRMcF** » Sat Nov 18, 2023 2:50 pm

agsb wrote:

The milliForth (https://github.com/fuzzballcat/milliForth) is a Forth implementation for ['86]

It uses less than 400 bytes. ...

What could be the smallest implementation of it for a 6502?

The rough sketch I have so far assembles to about 1.25KB, half of that the CBM "PRG" stub, initial system variable values and dictionary, and half of that the machine language runtime. A direct comparison would mean stripping out \ SCRIPT SAVEME and the support for hex literals, so certainly less than 1KB.

The machine language runtime is just 22 bytes over 0.5KB, so I am going to go over my sketch to see if I can factor out anything that is repeated to get that down to just two pages of runtime, then the image file would start out at just a bit over 0.5KB.

For a RAM runtime in the GoldenRAM at $0400-$07FF, where I need a magic number to check that it's there and doesn't need to be loaded, my magic number would be $53F4 ... 'S',$F4, for SectorForth.

BruceRMcF · Post by **BruceRMcF** » Sat Nov 18, 2023 10:18 pm

OK, I may have got the machine language runtime under 512 bytes ... I say "may have" because I haven't loaded it into the emulator yet to try to run it, and of course fixing a mistake in coding could well reclaim some of the bytes I've saved.

I squeezed hard ... the biggest chunk of bytes saved, maybe 10 bytes, was probably one spot where I rotated the value of the hexadecimal digits into one of my three ZP user vectors and then copied it onto the stack; it now rotates directly into the stack.

Other than that, it was finding an X save and restore around Kernal function calls that could be replaced by PHX/PLX to save two bytes, or:

...
    LDY #1
    STA (VHERE),Y
    CLC
    LDA #2
    ADC VHERE
    STA VHERE
    BCC +
    INC VHERE+1
+   JMP NEXTWORD
...

... is replaced by:

    LDY #1
    STA (VHERE),Y
    SEC
    TYA
    ADC VHERE
    STA VHERE
    BCC +
    INC VHERE+1
+   JMP NEXTWORD
...

... to save a single byte.
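The two sequences are equivalent because ADC adds the carry flag as well as the accumulator: 2 with carry clear is the same as 1 with carry set, and Y already holds 1. A short Python model can confirm this over every 16-bit pointer value; `adc`, `bump`, `bump_original` and `bump_shorter` are names made up for this sketch.

```python
def adc(a, operand, carry):
    """8-bit add with carry in; returns (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8

def bump(vhere, a, carry_in):
    """ADC VHERE : STA VHERE : BCC + : INC VHERE+1, on a 16-bit pointer."""
    lo, hi = vhere & 0xFF, vhere >> 8
    lo, carry = adc(a, lo, carry_in)
    if carry:                       # BCC + not taken
        hi = (hi + 1) & 0xFF        # INC VHERE+1
    return (hi << 8) | lo

def bump_original(vhere):
    return bump(vhere, a=2, carry_in=0)    # CLC : LDA #2  (1 + 2 bytes)

def bump_shorter(vhere):
    y = 1                                  # Y is still 1 from LDY #1
    return bump(vhere, a=y, carry_in=1)    # SEC : TYA     (1 + 1 bytes)

if __name__ == "__main__":
    same = all(bump_original(v) == bump_shorter(v) for v in range(0x10000))
    print("sequences agree for all pointers:", same)
```

The saving comes purely from encoding: TYA is a one-byte instruction while LDA with an immediate operand takes two.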

BruceRMcF · Post by **BruceRMcF** » Sun Nov 19, 2023 5:28 pm

Note:

agsb wrote:

The milliForth (https://github.com/fuzzballcat/milliForth) is a Forth implementation for Z80. ...

... it is, AFAICT, for the 8086, so that, just like SectorForth, it can be executed from a MS-DOS floppy boot sector. Its claim to fame is that it is a touch smaller than SectorForth, which is only important to people who care that SectorLisp came in at a smaller size than SectorForth.

A SectorForth for CP/M would be a much more aggressive challenge, since a CP/M boot sector is not hard-defined to be 512 bytes so, for instance, a CP/M boot sector that is defined to be two logical 128 byte sectors could only hold 256 bytes of code.

Regarding IP issues, this kind of means that I have a reasonably strong case that this is a "clean" re-implementation of SectorForth/milliForth, from the docs rather than from the source, since while I can read Z80 assembly language, I can rarely make out '86 assembly language -- even the more approachable '86 assembly language of the 8086-80386. But as SectorForth is open source and I am not fussed about releasing under the same open source license, that's just an observation rather than an IP issue.

Anyway, just as I anticipated, I've had to use some of that machine language runtime space again ... enough that I had to move the ABORT header out of the prospective runtime block into the loaded code to keep it under 2 binary pages ... because I messed up the outer interpreter.

It goes without saying that when the outer interpreter executes a compiled word, it finally ends in an [EXIT] vector at the end of the compiled word, and when it executes a primitive, it much more quickly ends in a "JMP DONEXT" at the end of the primitive, ...

... but it turns out that it doesn't go without *thinking* it through, and my NEXTWORD routine for the outer interpreter ended up without the IP pointing to somewhere that has a "next" vector for DONEXT to jump to.

Now what I have is a stub of compiled code called OUTER, which is a pair of vectors. The outer interpreter places the address of the found word in the first vector and the address of NEXTWORD in the second vector. NEXTWORD resets the Return stack to the bottom, places the address of OUTER in IP, and ends in an indirect jump to (OUTER). So when a primitive executes DONEXT, NEXTWORD is called; when compiled code calls EXIT, it exits to NEXTWORD; and either way the call to NEXTWORD resets IP for the next word to be executed.

This has to be in RAM because it is, in effect, self-modifying code, so the headless OUTER stub is put in the loadtime part of the code directly after the filename buffers to be used by SAVEME and SCRIPT.

BruceRMcF · Post by **BruceRMcF** » Mon Nov 20, 2023 4:08 pm

I just realized that I don't handle compiling literals in the compiler ... they are left on the stack.

One strategy, I can just leave $ literals as, effectively, immediate literals. Then I need [LIT,] in a script somewhere.

Second strategy, I implement a DOLIT internal primitive and compile DOLIT and the value if reaching the end of the hex value and state=1, which blows the runtime well past 0.5KB, but would be much less of a pain to work with.