A milliforth for 6502

barrym95838 · Post by **barrym95838** » Mon Nov 20, 2023 4:32 pm

BruceRMcF wrote:

Note:
... it is, AFAICT, for the 8086, so that, just like SectorForth, it can be executed from a MS-DOS floppy boot sector.

Why didn't anyone notice that earlier? I've been playing around a lot recently with a Z80 to 6502 project that is in a similar vein, but I'm not ready to share yet, and when and if I do, it'll be in a fresh thread. Does a tiny token-threaded 4th-cousin of Forth belong in the Forth sub-forum, or the generic programming sub-forum?

Quote:

... while I can read Z80 assembly language, I can rarely make out '86 assembly language -- even the more approachable '86 assembly language of the 8086-80386. But as SectorForth is open source and I am not fussed about releasing under the same open source license, that's just an observation rather than an IP issue.

I know what you mean, and I'm finding myself struggling to stay focused on the intent of the Z80 code I'm trying to grok. I really want to make my 6502 version look tight and efficient, not just a lazy translation. I'll know that I'm done when I no longer have any ZP variables called BC, DE, IX, IY and/or HL.

I wouldn't even attempt a project this large with x86 source ... my brain cells would mutiny in a matter of a few hours.

GARTHWILSON · Post by **GARTHWILSON** » Mon Nov 20, 2023 7:50 pm

barrym95838 wrote:

Does a tiny token-threaded 4th-cousin of Forth belong in the Forth sub-forum, or the generic programming sub-forum?

If there will be constant comparisons to Forth, I suppose the Forth section of the forum would be fine; but if it's just doing stack operations like Forth does but without actually using Forth, the Programming section might be more appropriate.

BruceRMcF · Post by **BruceRMcF** » Tue Nov 21, 2023 12:13 am

GARTHWILSON wrote:

barrym95838 wrote:

Does a tiny token-threaded 4th-cousin of Forth belong in the Forth sub-forum, or the generic programming sub-forum?

If there will be constant comparisons to Forth, I suppose the Forth section of the forum would be fine; but if it's just doing stack operations like Forth does but without actually using Forth, the Programming section might be more appropriate.

Token threaded, direct threaded, indirect threaded, subroutine threaded, bit threaded, bit-token threaded ... I don't see that that makes a big difference, but if it is a fourth-cousin of Forth, that would be more of an edge case than SectorForth, which is a "minimal Forth" in which one can define a full fledged Forth.

Of course, as noted further up in this thread, like all minimal Forths, it will sacrifice operating speed compared to a full fledged implementation, and since it is substantially more minimal than eForth, it is likely to be substantially slower than eForth as well.

Since I am not pursuing the "smallest possible" prize (and I believe those in the chase are doing it more for the honor of the prize than for the honorarium of $0.00), I don't really mind if I end up with something that takes a whole 2KB to get to the point of being able to interpret file scripts and to save the result in an executable image.

Edit: Even with a headless DOLIT in the loadtime section of my source, adding the STATE-smart handling of literals pushed the runtime back over 2 binary pages. However, the whole assembled binary remains under five binary pages.

BruceRMcF · Post by **BruceRMcF** » Tue Nov 21, 2023 4:30 pm

What I aim to do with this is to debug it to see if I can make it work, which won't be happening until after Thanksgiving week, so this brief flurry of activity will pause. If I can get it working, I'll convert it from the SectorForth model to something more like a subset of Camel Forth, with RP@ RP! SP@ and SP!, so that I can dump the clunky "stacks as indexes and also as pointers" approach that SectorForth requires.

Then I can convert to having a return stack in the ZP and a data stack with a TOS in the zero page and rest of the data stack on the hardware stack. That works for the DOLIST that I already have here, which relies on JSR DOLIST to get the list address onto the hardware stack and avoids juggling if it just pushes the IP into the return stack and then pops the return vector, incrementing it along the way to convert it to a jump vector. I can also add a more reasonable wordset for a more efficient minimal Forth ... including */ as a base for defining multiply and divide words when needed.

BruceRMcF · Post by **BruceRMcF** » Wed Feb 21, 2024 5:50 pm

I've picked this up again. I didn't get it up beyond showing the prompt, and wasn't interested enough in actually having a pure SectorForth model forth to dig into it.

So what I'm doing is abandoning the "pure" SectorForth, using rp@ rp! sp@ sp!, and aiming to climb up the compiled SectorForth model to define as primitives the things that required the return stack pointer and stack pointer to be actual integer pointers.

I am using a "JMP (addr,X)" direct threaded model (an addressing mode the 65C02 adds), where the operand of the JMP sits in the zero page and holds the Instruction Pointer "base" address. I walk X forward until it passes $7F, and update the IP base address only when it gets there ... but since the IP base address gets updated with every call to high level compiled code and every DOLIT operation, and well written Forth words are rarely over 127 bytes long anyway, it should only very rarely require the "half-page fix" part of the routine.

DONEXT can be reduced to 15 clock cycles, including the three clock jmp overhead, if the entire DONEXT routine is placed in the zero page.

An alternative model is the split pointer model, where the low byte of IP is $00 and X holds the low byte address, but the trade-off is that you need to check for page overflow with each INX, so you simplify and speed up EXIT and DOLIST but add two cycles to the DONEXT routine.

Code:

; in ZP
JMPIPX:
    JMP ($FFFC,X)

IP = JMPIPX+1
IPH = IP+1

; ...
; in Golden RAM, LowRAM or HighRAM

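DONEXT:    ; 18 clock cycles including JMP DONEXT in word
    INX
    INX
    BMI +       ; X has reached $80: take the half-page fix
-   JMP JMPIPX
+   TXA         ; fold the offset into the IP base
    CLC
    ADC IP
    STA IP
    LDX #0      ; restart the offset from the new base (carry is unaffected)
    BCC -
    INC IPH
    BNE -
    BRK         ; IPH wrapped to zero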

EXIT:
    LDY RNDX ; Y is a free register
    INC RNDX ; in Zero page
    LDA RL,Y
    STA IP
    LDA RH,Y
    STA IPH
    LDX #2
    JMP JMPIPX

DOLIST: ; JSR DOLIST precedes compiled list
    DEC RNDX ; pre-decrement Y
    LDY RNDX
    TXA
    CLC
    ADC IP
    STA RL,Y
    LDA #0
    ADC IPH
    STA RH,Y
    PLA
    STA IP
    PLA
    STA IPH
    LDX #1 ; return vector is call address -1
    JMP JMPIPX

; ...
AND:
    PLA
    AND T
    STA T
    PLA
    AND TH
    STA TH
    JMP DONEXT
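
As a rough check of the base-plus-offset walk described above, here is a small Python model of DONEXT. It is only a sketch: the function name and the starting address are made up for illustration, and it assumes X is reset to zero once it has been folded into IP, so that IP + X keeps advancing by one 2-byte cell per step.

```python
# Model of the "JMP (IP,X)" walk: ip is the 16-bit base held in the JMP operand,
# x is the 8-bit offset register. donext() is a hypothetical name for this sketch.

def donext(ip, x):
    """Advance to the next 2-byte cell; fold X into IP once X reaches $80."""
    x = (x + 2) & 0xFF                 # INX / INX
    if x & 0x80:                       # BMI: the offset has reached the upper half page
        ip += x                        # TXA / CLC / ADC IP / STA IP (+ INC IPH on carry)
        if ip > 0xFFFF:
            raise RuntimeError("IP wrapped: the listing executes BRK here")
        x = 0                          # assumption: offset restarts from the new base
    return ip, x

ip, x, fixes = 0x1234, 0, 0            # 0x1234 is an arbitrary example address
addresses = []
for _ in range(100):
    addresses.append(ip + x)           # cell that JMP (IP,X) would fetch from
    new_ip, x = donext(ip, x)
    fixes += new_ip != ip              # count how often the half-page fix ran
    ip = new_ip

print(fixes)  # 1
```

Over 100 consecutive cells the fix runs once, which fits the remark that it should only rarely be needed.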

I think the split-pointer version goes:

Code:

; in ZP
JMPIPX:
    JMP ($FFFC,X)

IP = JMPIPX+1
IPH = IP+1

; ...
; in Golden RAM, LowRAM or HighRAM

DONEXT:    ; 20 clock cycles including JMP DONEXT in word
    INX
    BEQ +
DONEXT0:
    INX
    BEQ ++
-   JMP JMPIPX
+  INX
++  INC IPH
    JMP JMPIPX

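EXIT:
    LDY RNDX ; Y is a free register
    INC RNDX ; RNDX in Zero page
    LDA RL,Y ; saved low byte of IP
    CLC
    ADC #2   ; step past the cell that called this word
    TAX      ; X holds the low byte of IP in this model
    LDA RH,Y
    ADC #0   ; carry from the low byte
    STA IPH
    JMP JMPIPX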

DOLIST: ; JSR DOLIST precedes compiled list
    DEC RNDX ; pre-decrement Y
    LDY RNDX
    STX RL,Y
    LDA IPH
    STA RH,Y
    PLX
    PLA
    STA IPH
    BRA DONEXT0
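
The cycle figures quoted for DONEXT can be tallied from standard 65C02 timings on the common path, where no branch is taken. This is just the arithmetic written out; the zero-page variant assumes DONEXT falls straight through into the JMP at JMPIPX, which is how the 15-cycle figure appears to be reached.

```python
# 65C02 cycle counts for the instructions on the no-fix path.
CYCLES = {
    "JMP abs": 3,         # JMP DONEXT at the end of a primitive, or JMP JMPIPX
    "INX": 2,
    "branch not taken": 2,
    "JMP (abs,X)": 6,     # the indexed indirect jump at JMPIPX
}

def total(path):
    return sum(CYCLES[op] for op in path)

# Base + offset model, DONEXT outside the zero page.
base_x = total(["JMP abs", "INX", "INX", "branch not taken", "JMP abs", "JMP (abs,X)"])
# Same routine placed in the zero page, falling through into JMPIPX (assumed).
base_x_zp = total(["JMP abs", "INX", "INX", "branch not taken", "JMP (abs,X)"])
# Split-pointer model: a page-overflow check after each INX.
split = total(["JMP abs", "INX", "branch not taken", "INX", "branch not taken",
               "JMP abs", "JMP (abs,X)"])

print(base_x, base_x_zp, split)  # 18 15 20
```

The split-pointer total comes out two cycles above the base-plus-offset total, matching the trade-off described for that model.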

BruceRMcF · Post by **BruceRMcF** » Thu Feb 22, 2024 6:39 pm

I've had a closer look at keeping it "more or less minimal" (as opposed to the bleeding edge minimalist of the SectorForth / MilliForth implementations for PC-DOS), but more closely aligned with the actual implementation model, and what I'm coming around to for the stack operations is to look at the R stack, S stack and TOS register as three distinct sources or destinations, and focus the primitives on the operations on one, or between two of them.

So, for example, >R is not a primitive, because with this stack implementation, you decrement the RNDX, copy from TOS to R, and then pull from S into TOS ... so all three are involved in the operation.

The primitive is "DUP>R", where you decrement RNDX, and copy from TOS to R, and that's that. Pulling from S to TOS is already provided by "DROP".

Code:

; operations on one stack
NIP: ; NIP ( a b -- b )
    PLA
    PLA
    JMP DONEXT

RDROP: ; ( R: a -- R: )
    INC RNDX
    JMP DONEXT

DROP:
    PLA
    STA T
    PLA
    STA TH
    JMP DONEXT

DUP:
    LDA TH
    PHA
    LDA T
    PHA
    JMP DONEXT

DUP2R: ; DUP>R ( a -- a R: a )
    DEC RNDX
    LDY RNDX
    LDA T
    STA RL,Y
    LDA TH
    STA RH,Y
    JMP DONEXT

DROPRTO: ; DROP-R@ ( a R: b -- b R: b )
    LDY RNDX
    LDA RL,Y
    STA T
    LDA RH,Y
    STA TH
    JMP DONEXT

SCND2R: ; 2ND>R ( a b R: -- b R: a )
    DEC RNDX
    LDY RNDX
    PLA
    STA RL,Y
    PLA
    STA RH,Y
    JMP DONEXT

RF2SCND: ; R@>2ND ( a R: b -- b a R: b )
    LDY RNDX
    LDA RH,Y
    PHA
    LDA RL,Y
    PHA
    JMP DONEXT

Then I'm thinking that you have something like:

: SWAP ( a b -- b a ) 2ND>R DUP DROP-R@ RDROP ;
: R@ ( R: a -- a R: a ) DUP DROP-R@ ;
: R> ( R: a -- a R: ) DUP DROP-R@ RDROP ;
: >R ( a -- R: a ) DUP>R DROP ;
: ROT ( a b c -- b c a ) DUP>R 2ND>R DROP R@>2ND RDROP R@>2ND RDROP ;
: 2DROP ( a b -- ) NIP DROP ;
: OVER ( a b -- a b a ) 2ND>R R@>2ND DUP DROP-R@ RDROP ;
: 2DUP ( a b -- a b a b ) 2ND>R R@>2ND DUP R@>2ND RDROP ;

... however, part of the point of the SectorForth approach is that if a compiled word is not needed in a specific application, it can simply be omitted, to reduce the total footprint.
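
One way to sanity-check compositions like these is to model the three locations (S, TOS and R) in a few lines of Python and run the word lists through it. This is a sketch of the semantics as read from the assembly listing, not of the real implementation: run() is a made-up helper, and the OVER sequence shown is one composition that works in this model.

```python
def run(words, data):
    """Run a string of primitives. data lists the data stack bottom-first; the last item is TOS."""
    s, t, r = list(data[:-1]), data[-1], []   # S = hardware stack, T = TOS register, R = return stack
    for w in words.split():
        if w == "NIP":       s.pop()              # discard the second item
        elif w == "RDROP":   r.pop()
        elif w == "DROP":    t = s.pop()          # pull from S into TOS
        elif w == "DUP":     s.append(t)          # push TOS onto S
        elif w == "DUP>R":   r.append(t)          # copy TOS to R
        elif w == "DROP-R@": t = r[-1]            # overwrite TOS with the top of R, R unchanged
        elif w == "2ND>R":   r.append(s.pop())    # move the second item to R
        elif w == "R@>2ND":  s.append(r[-1])      # copy the top of R under TOS
        else:
            raise ValueError(f"unknown primitive {w}")
    return s + [t], r

SWAP = "2ND>R DUP DROP-R@ RDROP"
ROT  = "DUP>R 2ND>R DROP R@>2ND RDROP R@>2ND RDROP"
OVER = "2ND>R R@>2ND DUP DROP-R@ RDROP"
DUP2 = "2ND>R R@>2ND DUP R@>2ND RDROP"            # 2DUP

print(run(SWAP, [1, 2]))     # ([2, 1], [])
print(run(ROT, [1, 2, 3]))   # ([2, 3, 1], [])
print(run(OVER, [1, 2]))     # ([1, 2, 1], [])
print(run(DUP2, [1, 2]))     # ([1, 2, 1, 2], [])
```

Each sequence leaves the return stack balanced, which is easy to lose track of when composing by hand.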

BruceRMcF · Post by **BruceRMcF** » Fri Feb 23, 2024 7:15 pm

Now, for logic and math:

Especially since I am going to have a "preliminary" hex literal evaluation if the token fails to be found as a word, I am fine with just "+" as the underlying primitive for 1+, 2+, -, 1- etc.
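
How the other arithmetic words fall out of "+" is not spelled out here. One common route, assumed for this sketch, is 16-bit two's complement: subtraction adds the inverted value plus one, and 1- is just adding the hex literal FFFF. The function names below are illustrative only.

```python
MASK = 0xFFFF                      # 16-bit cells

def plus(a, b):                    # the one arithmetic primitive
    return (a + b) & MASK

def invert(x):                     # bitwise NOT (available via NAND)
    return ~x & MASK

def one_plus(x):                   # 1 +
    return plus(x, 1)

def one_minus(x):                  # FFFF +   (hex literal, i.e. -1)
    return plus(x, 0xFFFF)

def negate(x):                     # NOT 1 +
    return plus(invert(x), 1)

def minus(a, b):                   # a + (-b)
    return plus(a, negate(b))

print(minus(5, 3), hex(minus(3, 5)), hex(one_minus(0)))  # 2 0xfffe 0xffff
```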

However, after working out / looking at the implementation of NOT, AND, OR and XOR using NAND alone, I decided that I am OK with:

: NOT ( x1 -- x2 ) DUP NAND ;
: AND ( x1 x2 -- x3 ) NAND DUP NAND ;

... but the definition of OR and XOR with NAND alone was just too much for me, so I've added "OR" as a primitive, which then also allows a simpler XOR since:

Code:

0 1 0 1 -- bit in x1
0 0 1 1 -- bit in x2
1 1 1 0 -- NAND(x1,x2)
0 1 1 1 -- OR(x1,x2)
0 1 1 0 -- x3 = AND(NAND(x1,x2),OR(x1,x2))

So rather than the theoretical minimum NAND (or alternatively NOR) from which to build all logic, I went with the more practical set of NAND and OR as primitives.
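
The derivations above are easy to confirm mechanically. The following Python sketch treats NAND and OR as the only primitives on 16-bit cells and builds NOT, AND and XOR from them exactly as in the definitions and truth table; the Python function names are illustrative.

```python
MASK = 0xFFFF                                  # 16-bit cells

def nand(a, b): return ~(a & b) & MASK         # primitive
def or_(a, b):  return (a | b) & MASK          # primitive

def not_(x):    return nand(x, x)              # : NOT DUP NAND ;
def and_(a, b): return not_(nand(a, b))        # : AND NAND DUP NAND ;
def xor(a, b):  return and_(nand(a, b), or_(a, b))   # AND(NAND(x1,x2),OR(x1,x2))

# The four columns of the truth table, packed into the low four bits.
x1, x2 = 0b0101, 0b0011
print(format(nand(x1, x2) & 0xF, "04b"),
      format(or_(x1, x2), "04b"),
      format(xor(x1, x2), "04b"))              # 1110 0111 0110
```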

Another big change I've made is going from an embedded dictionary to a block dictionary. Each entry has a Length+Flags byte up front, then as many characters as the length byte says, then the code field address, and is followed by the preceding entry in the dictionary. LATEST points to the bottom of the dictionary block, and adding a new entry is done by subtracting two from LATEST, writing the address in HERE there, then subtracting the length + 1 from LATEST, writing the counted string there, and finally setting any flags if need be. On loading, the dictionary block would be written to the top of RAM, so LATEST grows down from the top of RAM while HERE grows up toward the top of RAM. That copy is simplified because the dictionary doesn't contain any addresses located within the dictionary itself -- it has no link fields, so there is no issue of whether the link field is absolute or a slower relative offset.

I'm allowing up to 31 byte names, I've got an immediate flag, and I'm reserving a flag for compile-only, so I can use the third flag set with a 0 length to mark the end of the dictionary block. My notion is that if I ever need to handle situations where a simple block dictionary isn't enough, a non-zero length field with the third flag set can be used to indicate those special cases, but I'm not going to spend any time elaborating that at the moment ... this implementation will simply BRK on third flag set and a non-zero length field.

EDIT: Scratch that, it turns out that with this dictionary, I need the third flag as a smudge bit. However, it also turns out that a simple NUL works just fine as an end of dictionary marker. Further, it turns out that an immediate flag with a nul length works just fine for a "bridge" over arbitrary data embedded in the dictionary, with the following integer being the address of the next dictionary entry, so I'm going with that approach.

So I've got to rewrite the dictionary search to shift it from an embedded linked list to a vocabulary built on a dictionary block.
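
To make the block layout concrete, here is a rough Python model of adding to and searching such a dictionary, following the post-EDIT scheme (NUL as end marker, a flagged zero-length byte as a bridge). The class, the 4 KB memory size and the flag bit positions are assumptions made for the sketch; only the entry layout and the LATEST arithmetic come from the description.

```python
IMMEDIATE, COMPILE_ONLY, SMUDGE = 0x80, 0x40, 0x20   # assumed bit positions
LEN_MASK = 0x1F                                      # names of up to 31 bytes

class Dictionary:
    def __init__(self, size=0x1000):
        self.mem = bytearray(size)
        self.latest = size - 1      # mem[latest] == 0 is the NUL end marker
        self.here = 0               # code space pointer, set by the caller here

    def add(self, name, flags=0):
        """Store HERE as the code field, then the counted name below it."""
        raw = name.encode("ascii")
        assert 0 < len(raw) <= LEN_MASK
        self.latest -= 2
        self.mem[self.latest:self.latest + 2] = self.here.to_bytes(2, "little")
        self.latest -= len(raw) + 1
        self.mem[self.latest] = len(raw) | flags
        self.mem[self.latest + 1:self.latest + 1 + len(raw)] = raw

    def find(self, name):
        """Walk upward from LATEST; return (code field address, flags) or None."""
        raw = name.encode("ascii")
        p = self.latest
        while self.mem[p] != 0:
            head = self.mem[p]
            n = head & LEN_MASK
            if n == 0:              # flag set with zero length: bridge to the address that follows
                p = int.from_bytes(self.mem[p + 1:p + 3], "little")
                continue
            if not head & SMUDGE and self.mem[p + 1:p + 1 + n] == raw:
                cfa = int.from_bytes(self.mem[p + 1 + n:p + 3 + n], "little")
                return cfa, head & ~LEN_MASK & 0xFF
            p += n + 3              # length byte + name + 2-byte code field
        return None

d = Dictionary()
d.here = 0x0200; d.add("DUP")
d.here = 0x0210; d.add("IF", IMMEDIATE)
print(d.find("DUP"), d.find("IF"), d.find("NOPE"))   # (512, 0) (528, 128) None
```

Because the search only steps by entry length, the block can be copied to another address without fixing anything up, except for bridge entries, which do carry an absolute address.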

agsb · Post by **agsb** » Sat Aug 17, 2024 9:37 pm

The milliforth-6502 IS done.

Both the Direct Thread Code (640 bytes)
and Minimal Thread Code (623 bytes)
models are working with my_hello_world.FORTH.

Milliforth is a reduced sectorforth; both were written for x86, and milliforth is now ported to the 6502.

Built with the ca65 assembler and run on the run6502 emulator.

All tips are welcome.

https://github.com/agsb/milliforth-6502

The overhead of MTC over DTC IS about 0.23% for instructions and 1.59% for cycles, in absolute counters, compiling the file my_hello_world.FORTH.

BruceRMcF · Post by **BruceRMcF** » Thu Mar 27, 2025 9:10 pm

agsb wrote:

The milliforth-6502 IS done.

Both the Direct Thread Code (640 bytes)
and Minimal Thread Code (623 bytes)
models are working with my_hello_world.FORTH.

Milliforth is a reduced sectorforth; both were written for x86, and milliforth is now ported to the 6502.

Built with the ca65 assembler and run on the run6502 emulator.

All tips are welcome.

https://github.com/agsb/milliforth-6502

The overhead of MTC over DTC IS about 0.23% for instructions and 1.59% for cycles, in absolute counters, compiling the file my_hello_world.FORTH.

Good job. This might be my excuse to learn the ca65 assembly system.

agsb · Post by **agsb** » Tue May 13, 2025 9:36 pm

we are in hackaday

https://hackaday.com/2025/04/20/millifo ... nt-8120803)

BruceRMcF · Post by **BruceRMcF** » Thu May 15, 2025 1:28 pm

agsb wrote:

we are in hackaday

https://hackaday.com/2025/04/20/millifo ... nt-8120803)

Excellent, and well deserved.

I'm still looking at possibly porting that to the C64 and the Commander X16 project if I can get my restarted xForth implementation up and running.

It seems like it just needs the dictionary to start at $810 so that it can be loaded with a C64 style Basic stub. The data/return/tib space might move to one of the four Golden RAM pages at $400 for the CX16 and, for maximum dictionary space on the C64 after switching out the Basic ROM, perhaps to the end of C64 Golden RAM at $CF00.
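
For reference, a C64 style Basic stub in front of code at $810 amounts to a single tokenised line, "10 SYS 2064", padded out to the entry point. The Python sketch below builds those bytes under the usual C64 assumptions (Basic program memory at $0801, SYS token $9E); the function name and the line number are arbitrary.

```python
LOAD_ADDR = 0x0801      # start of Basic program memory on the C64
ENTRY = 0x0810          # where the machine code would start
SYS_TOKEN = 0x9E        # tokenised Basic keyword SYS

def basic_stub(entry=ENTRY, line_number=10):
    """Return the PRG bytes (load address + Basic line + padding) that precede the code."""
    digits = str(entry).encode("ascii")                  # "2064" for $0810
    body = line_number.to_bytes(2, "little") + bytes([SYS_TOKEN]) + digits + b"\x00"
    next_line = LOAD_ADDR + 2 + len(body)                # where the end-of-program link sits
    line = next_line.to_bytes(2, "little") + body + b"\x00\x00"
    padding = bytes(entry - LOAD_ADDR - len(line))       # zero fill up to the entry point
    return LOAD_ADDR.to_bytes(2, "little") + line + padding

stub = basic_stub()
print(len(stub), stub.hex(" "))
# 17 01 08 0b 08 0a 00 9e 32 30 36 34 00 00 00 00 00 00
```

The first two bytes are the PRG load address and are not loaded into memory, so the byte that follows the stub lands at $0810.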

Plus it already has 6502 and 65C02 versions, so I don't have to backport from 65C02 or port up to 65C02.

For C64 PRG files, a sector is 254 bytes, plus two bytes overhead in the first sector for the load address, so this would technically be a 2 Sector Forth.

v6ops · Post by **v6ops** » Tue Mar 24, 2026 1:26 pm

agsb wrote:

we are in hackaday

https://hackaday.com/2025/04/20/millifo ... nt-8120803)

FYI I took this repo, forked it and then tweaked it so that it would run on a Grant Searle inspired single board clone of the UK101 / Ohio Scientific Challenger 1E with CEGMON (my first computer 45 years ago). I had to tweak a lot of the memory locations to avoid conflicts with CEGMON variables, e.g. cursor position. Tests seem to run OK.

Result is at https://github.com/v6ops/milliForth-650 ... tree/uk101

regards,

BigEd · Post by **BigEd** » Mon Mar 30, 2026 3:36 pm

(Welcome!)
