My very first embedded Forth was on the R6511AQ chip, where I bought and cloned the RSC-Forth ROMs and modified the machine code by hand. I think I could still read R6502 machine code to this day.
RSC-Forth was based on fig-Forth but used vectors for (EMIT) and (KEY), and also (KEY?) IIRC. I used this extensively in my POS terminal designs, where output would be redirected to displays, printers, and even video. When I eventually created my own fresh source years later, it was for the 65C816 (and then for the 65C02 for a volume product). Because I was used to vectors and they were so useful, I used the same technique in my own Forth. One thing I did was standardize the character coding so that, for example, a "clear screen" character would work the same across all output devices. I later enhanced this Forth to work with the M37702, which was a superset of the 65C816, and used this MCU in many products for many years.
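For anyone who hasn't used vectored I/O, here is a rough sketch of the idea in C rather than Forth. It is not the RSC-Forth code or my own source. The names (`emit_vector`, `display_emit`, `printer_emit`) and the `CH_CLS` value of 0x0C are made up for illustration, and the printer mapping "clear screen" to a new line is just an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared control code; the real value is not given in the post. */
#define CH_CLS 0x0C

typedef void (*emit_fn)(int ch);

/* Two stand-in output devices, each just a small text buffer. */
static char display_buf[32];
static size_t display_len;
static char printer_buf[32];
static size_t printer_len;

/* Display driver: the shared "clear screen" code wipes the buffer. */
static void display_emit(int ch) {
    if (ch == CH_CLS) {
        display_len = 0;
        display_buf[0] = '\0';
        return;
    }
    if (display_len + 1 < sizeof display_buf) {
        display_buf[display_len++] = (char)ch;
        display_buf[display_len] = '\0';
    }
}

/* Printer driver: same code, device-specific meaning (assumed: new line). */
static void printer_emit(int ch) {
    if (ch == CH_CLS) {
        ch = '\n';
    }
    if (printer_len + 1 < sizeof printer_buf) {
        printer_buf[printer_len++] = (char)ch;
        printer_buf[printer_len] = '\0';
    }
}

/* The vector: everything that prints goes through this one pointer. */
static emit_fn emit_vector = display_emit;

static void emit(int ch) {
    assert(emit_vector != NULL);
    emit_vector(ch);
}

static void emit_string(const char *s) {
    while (*s) {
        emit((unsigned char)*s++);
    }
}
```

Redirecting output is then just `emit_vector = printer_emit;` and the calling code that emits `CH_CLS` does not need to know which device is behind the vector.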
Nowadays I mainly work with the Parallax Propeller chip, especially the new P2, and this technique has been carried through from all my embedded products. I find, though, that some of the complicated command-channel stuff I used to do by sending data to the (KEY) vector (one such command was to actually read input) is no longer necessary. I just have (EMIT) and (KEY), but KEY always returns a null if there is no input, and if it is a raw binary device that could send a 00 then I simply OR the received byte with $100 so it can't be mistaken for "no input". So KEY is non-blocking and instant; it never waits for input.
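A minimal sketch of that non-blocking KEY convention, again in C and not taken from Tachyon. The ring buffer, `rx_push`, and the `KEY_NONE` / `KEY_BINARY` names are hypothetical; only the "zero means no input" rule and the $100 flag come from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define KEY_NONE   0x000  /* returned when nothing is waiting */
#define KEY_BINARY 0x100  /* ORed onto bytes from a raw binary device */

typedef int (*key_fn)(void);

/* Hypothetical receive ring buffer, filled by an interrupt or another core. */
static unsigned char rx_buf[16];
static size_t rx_head, rx_tail;

static void rx_push(unsigned char b) {
    size_t next = (rx_head + 1) % sizeof rx_buf;
    if (next != rx_tail) {          /* drop the byte if the buffer is full */
        rx_buf[rx_head] = b;
        rx_head = next;
    }
}

/* Raw binary device: never blocks, and a genuine 00 byte comes back as $100. */
static int raw_key(void) {
    if (rx_head == rx_tail) {
        return KEY_NONE;
    }
    unsigned char b = rx_buf[rx_tail];
    rx_tail = (rx_tail + 1) % sizeof rx_buf;
    return b | KEY_BINARY;
}

static key_fn key_vector = raw_key;

static int key(void) {
    assert(key_vector != NULL);
    return key_vector();
}
```

The caller tests the result against zero to see if anything arrived, and masks with `0xFF` to get the data byte back.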
With (EMIT) it is always a blocking emit, because I find that no matter how much buffering I might have, it is easy to fill that buffer. So most of my output stuff is unbuffered and way more efficient, especially if I use a high baud rate for serial etc.
So the main takeaway is that there is no KEY? word, just KEY, and if it returns a zero then there is no input. However, KEY and especially EMIT are normally two good places for PAUSE because most I/O is inherently slow.
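To show where PAUSE would sit on a single-CPU cooperative system, here is a small C sketch. Everything in it is a stand-in: `pause_task` only counts calls and fakes the hardware making progress, where a real PAUSE would switch to the next task. `key_wait` is a hypothetical blocking wrapper built on top of the non-blocking KEY.

```c
#include <assert.h>

static int pause_count;    /* how many times we yielded */
static int pending_key;    /* simulated input: 0 means nothing waiting */
static int tx_busy_ticks;  /* simulated transmitter: busy for this many polls */
static int tx_last;        /* last character "sent" */

/* Stand-in for PAUSE. Here it also simulates a key arriving on the third
   yield and the transmitter draining a little on every yield. */
static void pause_task(void) {
    pause_count++;
    if (pause_count == 3) {
        pending_key = 'x';
    }
    if (tx_busy_ticks > 0) {
        tx_busy_ticks--;
    }
}

/* Non-blocking KEY: returns 0 when there is no input. */
static int key(void) {
    int c = pending_key;
    pending_key = 0;
    return c;
}

/* A caller that does want to wait yields between polls. */
static int key_wait(void) {
    int c;
    while ((c = key()) == 0) {
        pause_task();
    }
    return c;
}

/* Blocking EMIT: yield while the device is busy, then send. */
static void emit(int ch) {
    while (tx_busy_ticks > 0) {
        pause_task();
    }
    tx_last = ch;
}
```

The point is only that the slow I/O waits are where other tasks get their turn, while KEY itself stays instant.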
So, one day I would like to revisit the 65C02/65C816 and write a brand new Forth that embodies many of the techniques I have learned over the years.
BTW, there is no PAUSE in my multi-core Tachyon Forth on the Parallax P1 and P2 simply because each task can have its own CPU.