The STA N/LDY N pairs are starting to add up and will eventually be more of a speed penalty than the TAY in NEXT for every word.
Do you have a reason for using N like that? What's preventing you from replacing those STA N/LDY N pairs with a simple TAY in the words that need it?
The STA N / LDY N pairs are only in the routines that require the use of both the Acc and the Y-reg. Also, any definition that requires a third input will always need N. There are also instances where the Acc holds an address, and to read from that address, the Acc has to be stored temporarily in N.
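As a rough illustration of the trade-off being discussed (a sketch, not the actual source), the two ways of getting the Acc into Y, and one way of reading through an address, might look like the following. It assumes an 8-bit 6502, that N is a two-byte zero-page scratch cell at a made-up location, and that the address arrives with its low byte in the Acc and its high byte in Y. None of that is confirmed in this thread. The cycle counts are the standard ones for zero-page STA/LDY and for TAY.

```asm
; Hypothetical sketch only. N and its location are assumptions.
N = $F0             ; two-byte zero-page scratch cell (assumed address)

; Option A: copy the Acc to Y by way of N
        STA N       ; 3 cycles, 2 bytes
        LDY N       ; 3 cycles, 2 bytes; N still holds a copy afterwards

; Option B: copy the Acc to Y directly
        TAY         ; 2 cycles, 1 byte; nothing is saved in memory

; Reading through an address: the 6502 can only dereference a pointer
; that lives in zero page, so the address has to be stored first.
        STA N       ; low byte of the address (assumed to be in the Acc)
        STY N+1     ; high byte of the address (assumed to be in Y)
        LDY #0      ; index 0 into the pointer
        LDA (N),Y   ; fetch the byte the address points to
```

The pair in Option A costs 6 cycles and 4 bytes against 2 cycles and 1 byte for TAY, but it also leaves a copy of the Acc in N, which TAY does not. That matters in a routine that needs the value again later or needs it as a pointer.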
I am far enough along with the source that I will start explaining a lot of these details in the other thread, "Converting Forth to Assembly". Any questions, comments, and suggestions can be directed over there. I am now fully confident I can convert Forth to Assembly in its entirety. Do I want to? Probably not, as some Assembly words will be much larger than their Forth counterparts. But I will have fun trying.