I am fairly sure something similar has already been asked on this site, but I did not find it.
I am writing my own FORTH for the ATmega2560 and have to decide how to implement the stack. I want to keep TOS in register(s) for speed, but I am not sure how to handle the state when the stack is empty, as opposed to a stack holding exactly one value (registers only) or more values (registers plus the normal in-memory stack). I am able to write the code, but I do not know which approach I want/can/should choose, and I would like to read about how others solved this problem and why. (I know the ATmega2560 is not a 65*02, but my question is about FORTH and its philosophy; examples for the 6502 are welcome and readable for me.)
- I could keep a flag saying whether the stack is empty, but that means updating it on every push and pop, at a speed penalty.
- I could use an "off by one" slot "outside of the stack", which gets filled with a "nonexistent previous value" when the first push is done and is "popped in" when the last pop is done. That somewhat fakes all the stack checks.
- I could implement a circular stack and just pretend that the start is in the middle of something, so no underflow/overflow is even possible: just undefined values on underflow and loss of part of the history on overflow. (There would still be the problem of 3-byte values, which are not atomic, wrapping around the end of the buffer.)
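To make the second option concrete, here is a minimal host-side sketch in C of what I mean. It is only an illustration under my own assumptions, not working AVR code: `tos` stands in for the register(s), `cell_t` is a 32-bit stand-in for a 24-bit cell, and the names `spill`, `BASE`, `SLACK`, `MAX_DEPTH` and `stack_ok` are made up for this example. The idea is that push and pop never test for emptiness; the depth is derived from the stack pointer, and a check runs only occasionally (for example between words in the outer interpreter).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEPTH 8                      /* hypothetical largest depth accepted by the check */
#define SLACK     2                      /* cells below the bottom that absorb stray pops */
#define BASE      SLACK                  /* index of the guard slot */

typedef int32_t cell_t;                  /* stand-in for a 24-bit cell */

static cell_t spill[SLACK + 1 + MAX_DEPTH];
static int    sp  = BASE;                /* index of the next free spill slot */
static cell_t tos = 0;                   /* cached top of stack ("the register") */

/* Depth comes from sp alone, so no separate "empty" flag is maintained. */
static int depth(void) { return sp - BASE; }

/* Hot path, no emptiness test: the very first push spills the
   meaningless old tos into the guard slot spill[BASE]. */
static void push(cell_t v) {
    spill[sp++] = tos;
    tos = v;
}

/* Hot path, no emptiness test: the last pop reloads the junk
   value from the guard slot into tos. */
static cell_t pop(void) {
    cell_t v = tos;
    tos = spill[--sp];
    return v;
}

/* Cold path: meant to be called only now and then, not on every push/pop. */
static int stack_ok(void) { return sp >= BASE && sp <= BASE + MAX_DEPTH; }

/* Usage sketch. */
static void demo(void) {
    push(1);
    push(2);
    assert(pop() == 2 && pop() == 1);
    assert(depth() == 0 && stack_ok());
}
```

In this sketch an underflowing pop only reads a slack cell and moves `sp` below `BASE`, which `stack_ok` reports afterwards. The price is one wasted guard cell plus the slack, and the fact that the error is noticed late rather than at the faulty word.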
More background:
I am building my own 8-bit computer around the HD6309 (and maybe more variants with the 65*02 and Z80 too) and I want it to be able to use VGA and a keyboard, so I am building an extra graphics card with an ATmega2560 to provide this as simplified IO. I am developing the HW for this "graphics card" mainly here https://github.com/githubgilhad/MegaHomeFORTH and it can also work as an SBC on its own. I found it extremely useful to use FORTH for interactive testing and "poking legs" / manipulating IO pins.
I first implemented something inspired by JonesForth, written in a mix of C/C++/.ino/asm with a lot of debugging tools inside (memory dumps, range checking ...). It more or less works, but I constantly struggle with data sizes (cell = 16 bits, address = 24 bits, double = 32 bits; C uses __memx pointers, but C++ can only use uint32_t ...).
I am also trying to keep as many words as possible in FLASH (~ROM) to save as much RAM as possible.
On the ATmega, access to FLASH (~ROM) needs different instructions than access to RAM, and a program can be executed only from FLASH. So I am trying to write a new, better implementation of FORTH, mainly in assembler, where everything is 24 bits / 3 bytes in size and routines/macros manage all the important parts, that is, the places where I know what I want to do but C/C++ is not convenient for it: https://github.com/githubgilhad/memxFORTH-asm I want this one to run fast, as I hope to use it as part of the firmware for the graphics card, which could then be extended at runtime (the 6309/6502/Z80 would send routines, for graphics and other work, to the "ATmega2560 coprocessor" to run in parallel). Generating the VGA signal takes about 90% of the time, and only the blanking intervals and borders are left for "normal work", so speed is valued.
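To show what "everything is 3 bytes" means for a fetch, here is a rough host-side sketch in C. It assumes the same tagging that avr-gcc documents for `__memx` pointers (high bit of the 24-bit address set means RAM, clear means flash); the arrays `fake_flash` and `fake_ram`, their contents and all function names are made up for the illustration, and the little-endian cell layout is also just an assumption.

```c
#include <stdint.h>

typedef uint32_t addr24_t;               /* 24-bit address kept in the low three bytes */

#define RAM_TAG   0x800000u              /* assumption: bit 23 set = RAM, clear = flash */
#define ADDR_MASK 0x7FFFFFu

/* Host-side stand-ins for the two memories (hypothetical contents). */
static const uint8_t fake_flash[8] = { 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88 };
static uint8_t       fake_ram[8];

/* One byte fetch; the branch is where the AVR needs different instructions.
   The modulo only keeps this toy inside its tiny arrays. */
static uint8_t fetch_byte(addr24_t a) {
    if (a & RAM_TAG)
        return fake_ram[(a & ADDR_MASK) % sizeof fake_ram];     /* on the AVR: LD */
    return fake_flash[(a & ADDR_MASK) % sizeof fake_flash];     /* on the AVR: LPM/ELPM */
}

/* Stores only reach RAM; a store to a flash address is ignored in this sketch. */
static void store_byte(addr24_t a, uint8_t v) {
    if (a & RAM_TAG)
        fake_ram[(a & ADDR_MASK) % sizeof fake_ram] = v;
}

/* A 3-byte cell, assembled low byte first. */
static uint32_t fetch_cell(addr24_t a) {
    return (uint32_t)fetch_byte(a)
         | ((uint32_t)fetch_byte(a + 1) << 8)
         | ((uint32_t)fetch_byte(a + 2) << 16);
}

static void store_cell(addr24_t a, uint32_t v) {
    store_byte(a,     (uint8_t)(v & 0xFF));
    store_byte(a + 1, (uint8_t)((v >> 8) & 0xFF));
    store_byte(a + 2, (uint8_t)((v >> 16) & 0xFF));
}
```

Every cell access costs three byte accesses plus the address-space test, which is part of why I care so much about keeping TOS in registers.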
Later I will probably use FORTH on the main (6309/6502/Z80) processor too. It will of course be written anew for that processor, but probably on the proven principles of its many predecessors.
(I mean the variant for the ATmega328, memxFORTH-core, MegaHomeFORTH, the testing variant for PC, the memxFORTH-asm variant ..., continuing with a 6309 variant, a 6502 variant and so on.)