A milliforth for 6502
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: A milliforth for 6502
BruceRMcF wrote:
Note:
... it is, AFAICT, for the 8086, so that, just like SectorForth, it can be executed from a MS-DOS floppy boot sector.
Quote:
... while I can read Z80 assembly language, I can rarely make out '86 assembly language -- even the more approachable '86 assembly language of the 8086-80386. But as SectorForth is open source and I am not fussed about releasing under the same open source license, that's just an observation rather than an IP issue.
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!
Mike B. (about me) (learning how to github)
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
Re: A milliforth for 6502
barrym95838 wrote:
Does a tiny token-threaded 4th-cousin of Forth belong in the Forth sub-forum, or the generic programming sub-forum?
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: A milliforth for 6502
GARTHWILSON wrote:
barrym95838 wrote:
Does a tiny token-threaded 4th-cousin of Forth belong in the Forth sub-forum, or the generic programming sub-forum?
Of course, as noted further up in this thread, like all minimal Forths, it will sacrifice operating speed compared to a full-fledged implementation, and since it is substantially more minimal than eForth, it is likely to be substantially slower than eForth as well.
Since I am not pursuing the "smallest possible" prize (and I believe those in the chase are doing it more for the honor of the prize than for the honorarium of $0.00), I don't really mind if I end up with something that takes a whole 2KB to get to the point of being able to interpret file scripts and to save the result in an executable image.
Edit: Even putting a headless DOLIT in the loadtime section of my source, adding the STATE smart handling of literals pushed the runtime back over 2 binary pages. However, the whole assembled binary remains under five binary pages.
Re: A milliforth for 6502
What I aim to do with this is to debug it to see if I can make it work ... which won't be happening until after Thanksgiving week, so this brief flurry of activity will pause ... and if I can, to convert it from the SectorForth to something more like a subset of Camel Forth, with RP@ RP! SP@ and SP! so that I can dump the clunky "stacks as indexes and also as pointer" approach that SectorForth requires.
Then I can convert to having a return stack in the ZP and a data stack with a TOS in the zero page and rest of the data stack on the hardware stack. That works for the DOLIST that I already have here, which relies on JSR DOLIST to get the list address onto the hardware stack and avoids juggling if it just pushes the IP into the return stack and then pops the return vector, incrementing it along the way to convert it to a jump vector. I can also add a more reasonable wordset for a more efficient minimal Forth ... including */ as a base for defining multiply and divide words when needed.
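For anyone following along, the JSR DOLIST trick can be modelled in a few lines of Python. This is only a sketch of the documented 6502 JSR behaviour (it pushes the address of the last byte of the JSR instruction, high byte first), not code from this project. The names `jsr` and `dolist_list_address` and the example address are made up for illustration.

```python
# Sketch only: models how JSR DOLIST leaves the list address (minus one)
# on the 6502 hardware stack. All names and addresses here are hypothetical.

def jsr(stack, pc_of_jsr):
    """Push the return vector the way a 6502 JSR does.

    JSR is three bytes long and pushes the address of its LAST byte
    (pc_of_jsr + 2), high byte first, then low byte.
    """
    ret = (pc_of_jsr + 2) & 0xFFFF
    stack.append(ret >> 8)      # high byte is pushed first
    stack.append(ret & 0xFF)    # low byte is pushed second, so it is pulled first


def dolist_list_address(stack):
    """Pull the return vector and add one to get the compiled list address."""
    lo = stack.pop()            # first PLA gives the low byte
    hi = stack.pop()            # second PLA gives the high byte
    return (((hi << 8) | lo) + 1) & 0xFFFF


hw_stack = []
jsr(hw_stack, 0x12FD)           # a word whose body starts with JSR DOLIST at $12FD
list_addr = dolist_list_address(hw_stack)
print(hex(list_addr))           # address of the first cell after the JSR
```

The "+1" is the increment mentioned above: the pulled value points at the last byte of the JSR, so the compiled list starts one byte later.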
Re: A milliforth for 6502
I've picked this up again. I didn't get it up beyond showing the prompt, and wasn't interested enough in actually having a pure SectorForth model forth to dig into it.
So what I'm doing is abandoning the "pure" SectorForth and using rp@ rp! sp@ sp! instead, and I aim to climb up the compiled SectorForth model, defining as primitives the things that required the return stack pointer and data stack pointer to be actual integer pointers.
I am using a "JMP (addr,X)" direct threaded model where the operand of the JMP is the zero page Instruction "base" address. I walk the X forward until it passes $7F, then update the IP base address if it gets there ... but since the IP base address gets updated with every call to high level compiled code and every DOLIT operation, and well written Forth words are rarely over 127 bytes long anyway, it should only very rarely require the "half-page fix" part of the routine.
DONEXT can be reduced to 15 clock cycles, including the three clock jmp overhead, if the entire DONEXT routine is placed in the zero page.
An alternative model is the split pointer model, where the low byte of IP is $00 and X holds the low byte address, but the trade-off is that you need to check for page overflow with each INX, so you simplify and speed up EXIT and DOLIST but add two cycles to the DONEXT routine.
Code: Select all
; in ZP
JMPIPX:
        JMP ($FFFC,X)   ; operand bytes double as the IP base
IP  = JMPIPX+1
IPH = IP+1
; ...
; in Golden RAM, LowRAM or HighRAM
DONEXT: ; 18 clock cycles including JMP DONEXT in word
        INX
        INX
        BMI +           ; X has passed $7F: do the half-page fix
-       JMP JMPIPX
+       TXA             ; fold X into the IP base
        CLC
        ADC IP
        STA IP
        LDX #0          ; X is now part of IP, so restart the offset
        BCC -
        INC IPH
        BNE -
        BRK
EXIT:
        LDY RNDX        ; Y is a free register
        INC RNDX        ; in Zero page
        LDA RL,Y
        STA IP
        LDA RH,Y
        STA IPH
        LDX #2          ; step past the token that called this word
        JMP JMPIPX
DOLIST: ; JSR DOLIST precedes compiled list
        DEC RNDX        ; pre-decrement RNDX
        LDY RNDX
        TXA             ; save IP+X on the return stack
        CLC
        ADC IP
        STA RL,Y
        LDA #0
        ADC IPH
        STA RH,Y
        PLA             ; return vector from the JSR becomes the new IP
        STA IP
        PLA
        STA IPH
        LDX #1          ; return vector is call address -1
        JMP JMPIPX
; ...
AND:
        PLA
        AND T
        STA T
        PLA
        AND TH
        STA TH
        JMP DONEXT
I think the split-pointer version goes:
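Code: Select all
; in ZP
JMPIPX:
        JMP ($FFFC,X)   ; low byte of operand stays $00, X is the low byte of IP
IP  = JMPIPX+1
IPH = IP+1
; ...
; in Golden RAM, LowRAM or HighRAM
DONEXT: ; 20 clock cycles including JMP DONEXT in word
        INX
        BEQ +           ; wrapped on the first step
DONEXT0:
        INX
        BEQ ++          ; wrapped on the second step
-       JMP JMPIPX
+       INX
++      INC IPH         ; carry the page overflow into IPH
        JMP JMPIPX
EXIT:
        LDY RNDX        ; Y is a free register
        INC RNDX        ; RNDX in Zero page
        LDA RL,Y
        CLC
        ADC #2          ; step past the token that called this word
        TAX
        LDA RH,Y
        ADC #0
        STA IPH
        JMP JMPIPX
DOLIST: ; JSR DOLIST precedes compiled list
        DEC RNDX        ; pre-decrement RNDX
        LDY RNDX
        STX RL,Y        ; RL must be in zero page for STX zp,Y
        LDA IPH
        STA RH,Y
        PLX             ; low byte of return vector (call address -1)
        PLA
        STA IPH
        BRA DONEXT0     ; the single INX converts it to the list address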
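As a sanity check on the two NEXT schemes, here is a small Python model. It is a sketch under my own assumptions rather than anything from the actual source: the function names are hypothetical, the half-page model assumes X is cleared once it has been folded into IP, and the cycle figures are the standard 65C02 timings for JMP abs, INX, an untaken branch and JMP (abs,X).

```python
# Sketch only: compares the two ways of stepping the instruction pointer.
# Function names are hypothetical; cycle figures are standard 65C02 timings.

CYCLES = {"JMP abs": 3, "INX": 2, "branch not taken": 2, "JMP (abs,X)": 6}


def next_half_page(ip, x):
    """IP base held in zero page plus X; fold X into IP once X passes $7F."""
    x = (x + 2) & 0xFF
    if x & 0x80:                      # BMI taken: X has reached $80 or more
        ip = (ip + x) & 0xFFFF        # add X into the 16-bit base
        x = 0                         # and restart the offset
    return ip, x


def next_split(iph, x):
    """Low byte of IP fixed at $00; X is the low byte, IPH the page."""
    for _ in range(2):                # two INX, each checked for wrap-around
        x = (x + 1) & 0xFF
        if x == 0:
            iph = (iph + 1) & 0xFF
    return iph, x


def trace(start, steps):
    """Token addresses visited by each model, starting from `start`."""
    ip, x = start, 0
    iph, sx = start >> 8, start & 0xFF
    half_page, split = [], []
    for _ in range(steps):
        ip, x = next_half_page(ip, x)
        iph, sx = next_split(iph, sx)
        half_page.append((ip + x) & 0xFFFF)
        split.append((iph << 8) | sx)
    return half_page, split


half_page_addrs, split_addrs = trace(0x20F0, 200)

# Common path of DONEXT, counting the JMP DONEXT that ends each primitive.
fast_path = (CYCLES["JMP abs"] + 2 * CYCLES["INX"] + CYCLES["branch not taken"]
             + CYCLES["JMP abs"] + CYCLES["JMP (abs,X)"])
in_zero_page = fast_path - CYCLES["JMP abs"]          # no JMP JMPIPX needed
split_path = fast_path + CYCLES["branch not taken"]   # one extra BEQ per NEXT

print(fast_path, in_zero_page, split_path)
```

Both models visit the same token addresses; the difference is only in where the bookkeeping cost lands, which matches the 18, 15 and "two cycles more" figures quoted above.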
Re: A milliforth for 6502
I've had a closer look at keeping it "more or less minimal" (as opposed to the bleeding edge minimalist of the SectorForth / MilliForth implementations for PC-DOS), but more closely aligned with the actual implementation model, and what I'm coming around to for the stack operations is to look at the R stack, S stack and TOS register as three distinct sources or destinations, and focus the primitives on the operations on one, or between two of them.
So, for example, >R is not a primitive, because with this stack implementation, you decrement the RNDX, copy from TOS to R, and then pull from S into TOS ... so all three are involved in the operation.
The primitive is "DUP>R", where you decrement RNDX, and copy from TOS to R, and that's that. Pulling from S to TOS is already provided by "DROP".
Code: Select all
; operations on one stack
NIP: ; NIP ( a b -- b )
        PLA
        PLA
        JMP DONEXT
RDROP: ; ( R: a -- R: )
        INC RNDX
        JMP DONEXT
DROP: ; ( a -- )
        PLA
        STA T
        PLA
        STA TH
        JMP DONEXT
DUP: ; ( a -- a a )
        LDA TH
        PHA
        LDA T
        PHA
        JMP DONEXT
; operations between two stacks
DUP2R: ; DUP>R ( a -- a R: a )
        DEC RNDX
        LDY RNDX
        LDA T
        STA RL,Y
        LDA TH
        STA RH,Y
        JMP DONEXT
DROPRTO: ; DROP-R@ ( a R: b -- b R: b )
        LDY RNDX
        LDA RL,Y
        STA T
        LDA RH,Y
        STA TH
        JMP DONEXT
SCND2R: ; 2ND>R ( a b R: -- b R: a )
        DEC RNDX
        LDY RNDX
        PLA
        STA RL,Y
        PLA
        STA RH,Y
        JMP DONEXT
RF2SCND: ; R@>2ND ( a R: b -- b a R: b )
        LDY RNDX
        LDA RH,Y
        PHA
        LDA RL,Y
        PHA
        JMP DONEXT
Then I'm thinking that you have something like:
: SWAP ( a b -- b a ) 2ND>R DUP DROP-R@ RDROP ;
: R@ ( R: a -- a R: a ) DUP DROP-R@ ;
: R> ( R: a -- a R: ) DUP DROP-R@ RDROP ;
: >R ( a -- R: a ) DUP>R DROP ;
: ROT ( a b c -- b c a ) DUP>R 2ND>R DROP R@>2ND RDROP R@>2ND RDROP ;
: 2DROP ( a b -- ) NIP DROP ;
: OVER ( a b -- a b a ) 2ND>R R@>2ND DUP DROP-R@ RDROP ;
: 2DUP ( a b -- a b a b ) 2ND>R R@>2ND DUP R@>2ND RDROP ;
... however, part of the point of the SectorForth approach is that if a compiled word is not needed in a specific application, it can simply be omitted, to reduce the total footprint.
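The colon definitions above can be sanity-checked with a toy Python model of the three stores (S stack, TOS register, R stack). This is just a sketch: the class and method names are hypothetical, and the primitives are modelled from their descriptions, with DROP-R@ and R@>2ND copying from R without dropping it. OVER is written here as 2ND>R R@>2ND DUP DROP-R@ RDROP, because in this model the body 2ND>R DUP DROP-R@ RDROP behaves as SWAP.

```python
# Sketch only: a toy model of the S stack, the TOS register and the R stack.
# Class, method and table names are hypothetical.

class Machine:
    def __init__(self, *items):
        # Items are given bottom-first; the last one lives in the TOS register.
        *rest, top = items
        self.s = list(rest)
        self.tos = top
        self.r = []

    def nip(self):               # NIP: discard the second item
        self.s.pop()

    def rdrop(self):             # RDROP: discard the top of R
        self.r.pop()

    def drop(self):              # DROP: pull from S into TOS
        self.tos = self.s.pop()

    def dup(self):               # DUP: push a copy of TOS onto S
        self.s.append(self.tos)

    def dup_to_r(self):          # DUP>R: copy TOS to R
        self.r.append(self.tos)

    def drop_rfetch(self):       # DROP-R@: overwrite TOS with the top of R
        self.tos = self.r[-1]

    def second_to_r(self):       # 2ND>R: move the second item to R
        self.r.append(self.s.pop())

    def rfetch_to_second(self):  # R@>2ND: copy the top of R in under TOS
        self.s.append(self.r[-1])


PRIMS = {
    "NIP": "nip", "RDROP": "rdrop", "DROP": "drop", "DUP": "dup",
    "DUP>R": "dup_to_r", "DROP-R@": "drop_rfetch",
    "2ND>R": "second_to_r", "R@>2ND": "rfetch_to_second",
}

WORDS = {
    "SWAP": "2ND>R DUP DROP-R@ RDROP",
    "ROT": "DUP>R 2ND>R DROP R@>2ND RDROP R@>2ND RDROP",
    "2DROP": "NIP DROP",
    "OVER": "2ND>R R@>2ND DUP DROP-R@ RDROP",
    "2DUP": "2ND>R R@>2ND DUP R@>2ND RDROP",
}


def run(body, *items):
    """Execute a string of primitives; return (data stack, return stack)."""
    m = Machine(*items)
    for name in body.split():
        getattr(m, PRIMS[name])()
    return m.s + [m.tos], m.r


results = {word: run(body, 1, 2, 3) for word, body in WORDS.items()}
for word, (stack, rstack) in results.items():
    print(word, stack, rstack)
```

Each word leaves the return stack empty again, which is the property that makes it safe to build them purely out of the two-store primitives.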
Re: A milliforth for 6502
Now, for logic and math:
Especially since I am going to have a "preliminary" hex literal evaluation if the token fails to be found as a word, I am fine with just "+" as the underlying primitive for 1+, 2+, -, 1- etc.
However, after working out / looking at the implementation of NOT, AND, OR and XOR using NAND alone, I decided that I am OK with:
: NOT ( x1 -- x2 ) DUP NAND ;
: AND ( x1 x2 -- x3 ) NAND DUP NAND ;
... but the definition of OR and XOR with NAND alone was just too much for me, so I've added "OR" as a primitive, which then also allows a simpler XOR since:
Code: Select all
0 1 0 1 -- bit in x1
0 0 1 1 -- bit in x2
1 1 1 0 -- NAND(x1,x2)
0 1 1 1 -- OR(x1,x2)
0 1 1 0 -- x3 = AND(NAND(x1,x2),OR(x1,x2))
So rather than the theoretical minimum NAND (or alternatively NOR) from which to build all logic, I went with the more practical set of NAND and OR as primitives.
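The truth table can be checked mechanically. The following Python sketch assumes 16-bit cells and uses hypothetical function names; only `nand` and `or_` stand in for primitives, and the rest are built from them the same way the colon definitions are.

```python
# Sketch only: NOT, AND and XOR built from NAND and OR on 16-bit cells.
# Function names are hypothetical stand-ins for the Forth words.

MASK = 0xFFFF  # assuming 16-bit cells


def nand(a, b):
    """Primitive NAND."""
    return ~(a & b) & MASK


def or_(a, b):
    """Primitive OR."""
    return (a | b) & MASK


def not_(a):
    """: NOT ( x1 -- x2 ) DUP NAND ;"""
    return nand(a, a)


def and_(a, b):
    """: AND ( x1 x2 -- x3 ) NAND DUP NAND ;"""
    t = nand(a, b)
    return nand(t, t)


def xor(a, b):
    """x3 = AND(NAND(x1,x2), OR(x1,x2)), as in the truth table."""
    return and_(nand(a, b), or_(a, b))


# The four bit combinations from the table, packed into one nibble each.
x1, x2 = 0b1010, 0b1100
print(format(xor(x1, x2), "04b"))
```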
Another big change I've made is going from an embedded dictionary to a block dictionary. Length+Flags byte up front, then as many characters as the length byte says, then the code field address, then the preceding entry in the dictionary. LATEST points to the bottom of the dictionary block, and adding a new entry is done by subtracting two from LATEST, writing the address in HERE there, then subtracting the length + 1 from LATEST, writing the counted string there, and finally setting any flags if need be. On loading, the dictionary block would be written to the top of RAM, so LATEST grows down from the top of RAM while HERE grows up toward the top of RAM. That copy is simplified because the dictionary contains no addresses located within the dictionary itself -- it has no link fields, so there is no issue in whether the link field is absolute or a slower relative offset.
I'm allowing up to 31 byte names, I've got an immediate flag, and I'm reserving a flag for compile-only, so I can use the third flag set with a 0 length to mark the end of the dictionary block. My notion is that if I ever need to handle situations where a simple block dictionary isn't enough, a non-zero length field with the third flag set can be used to indicate those special cases, but I'm not going to spend any time elaborating that at the moment ... this implementation will simply BRK on third flag set and a non-zero length field.
EDIT: Scratch that, it turns out that with this dictionary, I need the third flag as a smudge bit. However, it also turns out that a simple NUL works just fine as an end of dictionary marker. Further, it turns out that an immediate flag with a nul length works just fine for a "bridge" over arbitrary data embedded in the dictionary, with the following integer being the address of the next dictionary entry, so I'm going with that approach.
So I've got to rewrite the dictionary search to shift it from an embedded linked list to a vocabulary built on a dictionary block.
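To make the layout concrete, here is a rough Python sketch of a block dictionary that grows down from the top of RAM and is searched without link fields. It is not the actual implementation: the flag bit position, the RAM size, the class and method names and the example addresses are all assumptions, the code field address is stored low byte first as on a 6502, and the smudge bit and the "bridge" entries are left out.

```python
# Sketch only: a block dictionary growing down from the top of RAM.
# Flag bit position, RAM size and all names are assumptions for illustration.

LEN_MASK = 0x1F      # low five bits: name length, up to 31 bytes
IMMEDIATE = 0x80     # assumed position of the immediate flag


class BlockDictionary:
    def __init__(self, ram_size=0x1000):
        self.ram = bytearray(ram_size)
        self.latest = ram_size - 1    # points at a NUL: the end-of-dictionary marker

    def add(self, name, cfa, flags=0):
        """Prepend an entry laid out as [len+flags][name bytes][cfa lo][cfa hi]."""
        raw = name.encode("ascii")
        assert 0 < len(raw) <= LEN_MASK
        self.latest -= 2                                   # room for the code field address
        self.ram[self.latest:self.latest + 2] = cfa.to_bytes(2, "little")
        self.latest -= len(raw) + 1                        # room for the counted string
        self.ram[self.latest] = len(raw) | flags
        self.ram[self.latest + 1:self.latest + 1 + len(raw)] = raw

    def find(self, name):
        """Scan from the newest entry; return (cfa, is_immediate) or None."""
        raw = name.encode("ascii")
        p = self.latest
        while self.ram[p] != 0:                            # a NUL ends the block
            n = self.ram[p] & LEN_MASK
            if self.ram[p + 1:p + 1 + n] == raw:
                cfa = int.from_bytes(self.ram[p + 1 + n:p + 3 + n], "little")
                return cfa, bool(self.ram[p] & IMMEDIATE)
            p += 1 + n + 2                                 # no link field: step over the entry
        return None


d = BlockDictionary()
d.add("DUP", 0x0300)
d.add(";", 0x0340, IMMEDIATE)
d.add("DUP", 0x0400)          # a redefinition shadows the older DUP
print(d.find("DUP"), d.find(";"), d.find("NOPE"))
```

Because the next entry is always found by stepping over the current one, the whole block can be copied to a different address without patching anything inside it.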
Re: A milliforth for 6502
The milliforth-6502 is done.
Both the Direct Thread Code (640 bytes)
and Minimal Thread Code (623 bytes)
models are working with my_hello_world.FORTH.
Milliforth is a reduced sectorforth; both are for x86, and it is now ported to the 6502.
Built with the ca65 assembler and run on the run6502 emulator.
All tips are welcome.
https://github.com/agsb/milliforth-6502
The overhead of MTC over DTC is about 0.23% for instructions and 1.59% for cycles, in absolute counters, compiling the file my_hello_world.FORTH.
Re: A milliforth for 6502
agsb wrote:
The milliforth-6502 IS done.
Both Direct Thread Code (640 bytes)
and Minimal Thread Code (623 bytes)
models working with my_hello_work.FORTH.
Milliforth is a reduced sectorforth, both for x86, now ported for 6502.
Using ca65 compiler and run6502 emulator.
All tips are welcome.
https://github.com/agsb/milliforth-6502
The overhead of MTC over DTC IS about 0.23% for instructions and 1.59% for cycles, in absolute counters, compiling the file my_hello_world.FORTH.
Re: A milliforth for 6502
agsb wrote:
I'm still looking at possibly porting that to the C64 and the Commander X16 project if I can get my restarted xForth implementation up and running.
It seems like it just needs the dictionary to start at $810 so that it can be loaded with a C64-style Basic stub. The data/return/tib space might move to one of the four Golden RAM pages at $400 for the CX16 and, for maximum dictionary space on the C64 after switching out the Basic ROM, perhaps to the end of C64 Golden RAM at $CF00.
Plus it already has 6502 and 65C02 versions, so I don't have to backport from 65C02 or port up to 65C02.
For C64 PRG files, a sector is 254 bytes, plus two bytes overhead in the first sector for the load address, so this would technically be a 2 Sector Forth.
Re: A milliforth for 6502
agsb wrote:
Result is at https://github.com/v6ops/milliForth-650 ... tree/uk101
regards,
Re: A milliforth for 6502
(Welcome!)