Your hardware and software are almost everything I wanted to achieve.
If you want further quick wins for acceleration, I suggest auto-identification of the host and automatic breakpoints for the host's integer and floating point multiply routines. In the trivial case, JSR to a fixed address would reduce to one multiply. However, that fixed address is specific to each version of each host and may be affected by page bank registers which also vary by host.
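A minimal sketch of the breakpoint idea, in Python for brevity rather than Teensy firmware. Everything named here is hypothetical: the host names, the hook addresses, the bank numbers and the calling convention (operands in A and X, 16 bit product returned in A and Y) are placeholders and not taken from any real ROM.

```python
# Hypothetical table: (host, bank) -> entry point of that host's 8x8 multiply.
# The addresses are placeholders; real values differ per host and ROM version.
MULTIPLY_HOOKS = {
    ("host_a", 0): 0xC000,
    ("host_b", 2): 0xD123,
}


class Cpu:
    """Just enough emulated 6502 state to show the hook."""

    def __init__(self, host, bank=0):
        self.host = host
        self.bank = bank      # current page bank register (assumed single value)
        self.a = 0
        self.x = 0
        self.y = 0
        self.pc = 0
        self.stack = []       # stands in for page 1


def jsr(cpu, target):
    """Emulate JSR at cpu.pc. Return True if the call was short-circuited."""
    if MULTIPLY_HOOKS.get((cpu.host, cpu.bank)) == target:
        # Breakpoint hit: do the multiply natively instead of running the ROM.
        product = cpu.a * cpu.x
        cpu.a = product & 0xFF           # assumed convention: low byte in A
        cpu.y = (product >> 8) & 0xFF    # assumed convention: high byte in Y
        cpu.pc = (cpu.pc + 3) & 0xFFFF   # step over the 3 byte JSR
        return True
    # Normal path: push the address of the last byte of the JSR, high byte first.
    ret = (cpu.pc + 2) & 0xFFFF
    cpu.stack.append(ret >> 8)
    cpu.stack.append(ret & 0xFF)
    cpu.pc = target
    return False
```

Keying the table on the bank as well as the host is one way to cope with page bank registers; a real implementation would also have to identify the host and ROM version before trusting any entry.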
You might want to implement bit-banged cell networking between 6502 hosts. This would be a spiritual successor to 6854 used in AppleTalk and Acorn's Econet (unsuccessfully licensed by Commodore). Indeed, it would be quite special to see a common network protocol running across Commodore, Apple, Atari and Acorn. It should be possible to implement a common network interface using, for example, JSR $FFFE as another hook. From here, it should be fairly easy to implement text chat or similar.
If you could make a USD50 Commander X16, Foenix C256 or any reasonable subset thereof, that would be extremely popular. The Commander has a trivial page banking scheme and is loosely compatible with VIC20. The Foenix has a memory mapped FPU. Your implementation is likely to be faster and cheaper.
MicroCoreLabs on Sat 9 Jan 2021 wrote:
I first explored using a Teensy4.0 which is a smaller board that has less IO's, but would have needed multiplexed address and/or data signals in addition to having a bidirectional data bus.
My first message to this forum contains an outline solution for the necessary multiplexing: a circuit and a state machine. (Apologies in advance for color highlights. The meta-data shows that I drew these diagrams six months before I joined the forum. I have subsequently seen superfluous use of color mentioned at least five times.)
Ignoring SYNC and RDY, the minimal implementation requires 11 microcontroller pins and seven chips: one buffer chip for interrupt lines, two (or more) latches for address, latch and buffer for data, one 74x138 to orchestrate and one glue logic chip. With one additional microcontroller pin, one 74x138 and one glue logic chip, it is possible to implement an additional 8 bit data bus with 64 bit address-space or larger. Four or more variants may use common firmware.
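To illustrate the general shape of such multiplexing, here is a rough Python model. It is not the circuit or state machine from my diagrams: the select codes, the strobe ordering and the idea of one shared 8 bit port plus a 3 bit select into the 74x138 are assumptions made for the sake of the example.

```python
# Hypothetical 74x138 select codes; each enables one latch or buffer.
IDLE, ADDR_LO, ADDR_HI, DATA_OUT, DATA_IN, IRQ_IN = range(6)


class MuxedBus:
    """Models latches and buffers sharing one 8 bit microcontroller port."""

    def __init__(self, memory):
        self.memory = memory    # dict standing in for the host's address space
        self.addr_lo = 0        # contents of the low address latch
        self.addr_hi = 0        # contents of the high address latch
        self.irq_lines = 0xFF   # interrupt buffer; all lines inactive (high)
        self.trace = []         # select codes issued, for inspection

    def address(self):
        return (self.addr_hi << 8) | self.addr_lo

    def strobe(self, select, value=0):
        """Put `value` on the shared port and pulse one 74x138 output."""
        self.trace.append(select)
        if select == ADDR_LO:
            self.addr_lo = value & 0xFF
        elif select == ADDR_HI:
            self.addr_hi = value & 0xFF
        elif select == DATA_OUT:
            self.memory[self.address()] = value & 0xFF
        elif select == DATA_IN:
            return self.memory.get(self.address(), 0xFF)
        elif select == IRQ_IN:
            return self.irq_lines
        return None


def write_byte(bus, address, value):
    bus.strobe(ADDR_LO, address & 0xFF)
    bus.strobe(ADDR_HI, address >> 8)
    bus.strobe(DATA_OUT, value)


def read_byte(bus, address):
    bus.strobe(ADDR_LO, address & 0xFF)
    bus.strobe(ADDR_HI, address >> 8)
    return bus.strobe(DATA_IN)
```

Under these assumptions every bus access costs three strobes, which is the price of the reduced pin count. Adding further address latches on spare 74x138 outputs would widen the address without using more port pins.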
Curiously, from your research, the larger Teensy 4.1 may be preferable for an 8 bit implementation and the smaller Teensy 4.0 may be preferable for a 32 bit or 64 bit extension. Either way, I strongly recommend your work to randyhyde as the basis of a 32 bit or 64 bit extension similar to my own.