MichaelM wrote:
The sbc instruction is implemented very simply. r = a + ~m + c. Essentially the implementation saves on an inverter in and an inverter out. The absence of a carry out causes the next operation in an sbc chain to be r = a + ~m + 0. This is the same as r = a - m - 1.
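To make the arithmetic in the quote concrete, here is a minimal Python sketch of r = a + ~m + c. The 8-bit width and the function name sbc_6502 are assumptions made for illustration; this is not taken from the M65c02A source.

```python
MASK = 0xFF  # assumption: 8-bit operands, as on a 6502-style accumulator

def sbc_6502(a, m, c):
    """Return (result, carry_out) for r = a + ~m + c on 8-bit values."""
    total = a + (~m & MASK) + c   # ~m is the one's complement of m
    return total & MASK, total >> 8

# Carry set on entry means "no borrow": 5 - 3 = 2, and carry out stays 1.
print(sbc_6502(0x05, 0x03, 1))
# Carry clear on entry subtracts one more: 5 - 3 - 1 = 1.
print(sbc_6502(0x05, 0x03, 0))
# 3 - 5 needs a borrow: the result wraps to 0xFE and carry out is 0.
print(sbc_6502(0x03, 0x05, 1))
```

The carry out of one step feeds the next step in a multi-byte chain, so a carry out of 0 is what makes the next byte compute a - m - 1.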
In the TOYF I went with a 6800-style subtract in which CF is the borrow, rather than a 6502-style subtract in which the ~CF is the borrow.
I did this so I could have MOV 0,CF used to prep both ADC and SBC --- I don't have a MOV 1,CF at all --- this is one fewer instruction that I need.
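A rough Python sketch of why a single "clear CF" step is enough with a 6800-style borrow. The 8-bit width, the 16-bit chaining, and the names adc, sbc_borrow, add16 and sub16 are all made up for illustration; this is not the TOYF's actual instruction set.

```python
MASK = 0xFF  # assumption: 8-bit steps chained into a 16-bit operation

def adc(a, m, cf):
    """Add with carry: CF is the carry in and the carry out."""
    total = a + m + cf
    return total & MASK, total >> 8

def sbc_borrow(a, m, cf):
    """6800-style subtract: CF is the borrow in and the borrow out."""
    diff = a - m - cf
    return diff & MASK, 1 if diff < 0 else 0

def add16(x, y):
    cf = 0  # stands in for MOV 0,CF before the chain
    lo, cf = adc(x & MASK, y & MASK, cf)
    hi, cf = adc(x >> 8, y >> 8, cf)
    return (hi << 8) | lo, cf

def sub16(x, y):
    cf = 0  # the same MOV 0,CF works here, because CF means "borrow"
    lo, cf = sbc_borrow(x & MASK, y & MASK, cf)
    hi, cf = sbc_borrow(x >> 8, y >> 8, cf)
    return (hi << 8) | lo, cf

print(add16(0x01FF, 0x0001))  # low byte carries into the high byte
print(sub16(0x0200, 0x0001))  # low byte borrows from the high byte
```

With a 6502-style subtract the sub16 chain would have to start with CF set to 1 instead, which is the extra instruction being avoided.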
I'm willing to consider any kind of trick for reducing the complexity of the TOYF though --- I want it to fit in a small inexpensive FPGA --- as I said before, cost is often the only criterion that people have for choosing a processor (true when the 6502 was invented in the 1970s, and still true today).
On the PDP-11, JSR put the return-address in a register. This register could then access data compiled after the JSR in the calling function. This worked especially well with Forth DTC in which case that data was Forth threaded-code.
This didn't work on the 6502 because the return-address was on the return-stack rather than in a register, and it was one less than the address after the JSR in the calling function.
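Here is a small Python sketch of the off-by-one involved. It relies on the documented 6502 behavior that JSR is three bytes long and pushes the address of its last byte, and that RTS adds 1 to the pulled address. The memory layout, the addresses, and the helper names are hypothetical.

```python
def jsr_pushed_address(jsr_addr):
    """A 6502 JSR is 3 bytes and pushes the address of its last byte."""
    return jsr_addr + 2

def read_inline_word(memory, pushed):
    """Fetch a 2-byte little-endian word compiled right after the JSR.

    Returns the word and the adjusted return address to push back,
    so that RTS (which adds 1) resumes after the inline word.
    """
    data_addr = pushed + 1                       # undo the off-by-one
    word = memory[data_addr] | (memory[data_addr + 1] << 8)
    return word, pushed + 2                      # skip the 2 data bytes

memory = bytearray(0x10000)
memory[0x1000:0x1003] = bytes([0x20, 0x00, 0x20])  # JSR $2000
memory[0x1003:0x1005] = bytes([0x34, 0x12])        # inline data: $1234

pushed = jsr_pushed_address(0x1000)
word, new_pushed = read_inline_word(memory, pushed)
print(hex(pushed), hex(word), hex(new_pushed + 1))
```

On a real 6502 the called routine also has to pull the two address bytes off the stack into zero-page before it can index through them, which is where most of the cost comes from.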
The PDP-11 was a pretty good design for supporting DTC Forth --- each program was limited to a 64KB address space though --- this memory limitation was the primary reason why the PDP-11 died out.
Your M65c02A has support for both DTC and ITC with your IP and W registers. This is not a bad design. I would recommend however, that you have a 17-bit address-bus. Memory-access through IP would set the high-bit to 1 --- all other memory-access would set the high-bit to 0 --- this way all Forth threaded-code is in a 64KB bank above the main 64KB bank where the machine-code and application data are. You would likely only need 8KB for machine-code, so that leaves 56KB for application data, which is quite a lot. In your document you say that you are planning on some kind of memory-management scheme in the future --- I don't think this is necessary --- using the simple scheme I described above, you could support very large programs. Forth ITC code is pretty compact, and you can have as much as 64KB available for Forth threaded-code, so you can really have whopping big Forth programs. You would have 56KB of main-bank memory available for application data, which is adequate for pretty big programs.
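The 17-bit scheme can be sketched in a few lines of Python. The function name physical_address and the via_ip flag are hypothetical; they only illustrate the idea that the access path, not the program, supplies the high address bit.

```python
BANK_SIZE = 0x10000  # 64KB per bank

def physical_address(addr16, via_ip):
    """Map a 16-bit address to a 17-bit one.

    Accesses made through IP get high-bit 1 (threaded-code bank);
    every other access gets high-bit 0 (machine-code and data bank).
    """
    high_bit = 1 if via_ip else 0
    return (high_bit << 16) | (addr16 & 0xFFFF)

memory = bytearray(2 * BANK_SIZE)                  # 128KB in total
memory[physical_address(0x1234, via_ip=False)] = 0xAA   # main bank
memory[physical_address(0x1234, via_ip=True)] = 0x55    # Forth bank

print(hex(physical_address(0x1234, via_ip=False)))
print(hex(physical_address(0x1234, via_ip=True)))
```

The same 16-bit address 0x1234 lands in two different physical locations, so neither the threaded-code nor the application data eats into the other's 64KB.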
In some cases, the program does not need 32-bit registers and does not need blazing speed --- the ARM Cortex gets used simply because the program and data are too big to fit in the 64KB that 8-bit processors such as the 65c02 etc. provide. Given the scheme I described above though, the M65c02A could possibly be used instead.
An interesting application would be a stenotype machine --- I would expect that your M65c02A would be capable of doing this, using the memory-management scheme I described above. An ordinary 64KB 65c02 might even be capable, as the size of the document being generated isn't all that great, but the 65c02 might be too slow (especially if programmed in Forth) --- the 65c816 would be yet another option.
This might be a moot argument though --- the ARM Cortex is pretty inexpensive --- unless the M65c02A or 65c816 provide a pretty good cost benefit, they would not be considered.
It is not clear to me what your goal with the M65c02A is --- what kind of applications are you expecting to use it for? --- considering that the 65c816 is largely ignored these days, why do you expect your M65c02A to gain traction?
I think 8-bit processors still have a future --- some kind of memory-management scheme is needed though --- the 64KB limitation of the 65c02 is a big problem.
If the 6502 designers had been somewhat more forward-thinking and had provided a 128KB system (possibly code in one 64KB bank and data in another 64KB bank), the 6502 era could have continued much longer than it did --- people switched to MS-DOS primarily because they got to break through the 64KB limitation --- the 4.77 MHz 8088 was actually slower in many cases than the 1 MHz 6510 used in the Commodore-64. It is not like the 6502 had a shortage of available pins and couldn't support a 17-bit address-bus --- the 6502 used one pin to set the V flag --- that was pretty useless, so this pin could have been used instead as the high-bit of a 17-bit address-bus.