I've just remembered one of the observations about the Teensy vs the Pico: the Teensy's pinout, in the versions we looked at, doesn't make it easy to drive or sample, say, a 16-bit bus in one go. There's more shifting and masking than would be ideal. That said, with a 600MHz processor overclockable to 900MHz, perhaps a few extra instructions aren't a big deal.
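To make the shifting and masking concrete, here is a minimal sketch in C. The bit layout is purely hypothetical (it is not the real Teensy 4.1 pin map), and the names gather16 and scatter16 are made up for illustration. The idea is only that a 16-bit bus spread over three non-adjacent bit groups of a 32-bit port value costs a handful of shift, mask and OR operations in each direction.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only (NOT the real Teensy 4.1 map):
 *   bus bits  3..0  live in port bits 19..16
 *   bus bits 11..4  live in port bits 31..24
 *   bus bits 15..12 live in port bits  5..2
 */

/* Collect the scattered port bits into one contiguous 16-bit bus value. */
static inline uint16_t gather16(uint32_t port)
{
    uint32_t lo  = (port >> 16) & 0x0Fu;   /* bus bits  3..0  */
    uint32_t mid = (port >> 24) & 0xFFu;   /* bus bits 11..4  */
    uint32_t hi  = (port >> 2)  & 0x0Fu;   /* bus bits 15..12 */
    return (uint16_t)(lo | (mid << 4) | (hi << 12));
}

/* Spread a 16-bit bus value back out to the port bit positions.
 * All other port bits are returned as zero. */
static inline uint32_t scatter16(uint16_t v)
{
    uint32_t lo  = (uint32_t)(v & 0x0Fu) << 16;
    uint32_t mid = (uint32_t)((v >> 4) & 0xFFu) << 24;
    uint32_t hi  = (uint32_t)((v >> 12) & 0x0Fu) << 2;
    return lo | mid | hi;
}
```

On real hardware the argument to gather16 would come from reading a GPIO input register, and the result of scatter16 would be merged into an output register under a mask so the unrelated pins are left alone. Those register accesses are left out here because they depend on the actual pin assignment.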
Yes, I used this shifting and masking on the MCL65+ to improve the Teensy 4.1's IO timing:
Thanks for pointing that out, Ed and Ted. Seeing Ted's code is actually encouraging for me: if one can get away with that, then it should not be a problem to use the Teensy CPU in an accelerator design which is meant to support different emulation targets with different pinouts.
Ted, I looked at your code for the MCL65+ on GitHub and noticed that you even do all the address shuffling after having detected the positive clock edge. For write cycles you then send the data byte as eight separate bits too. And that still works nicely with the Apple's 1 MHz clock, right?
That is quite reassuring. I would trust that with some optimization (preparing as much as possible before waiting for the clock edge, and using assembler code if required) I should be able to keep up with the faster host clocks used e.g. in many chess computers -- 5 MHz 65C02s were commonly used in the late 1980s.
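As a rough sanity check on the timing budget, the number of Teensy CPU clock cycles available per host clock period is just the ratio of the two frequencies. The helper below is a hypothetical name used only to show the arithmetic; it counts clock cycles, not instructions, and says nothing about GPIO access latency.

```c
#include <stdint.h>

/* CPU clock cycles that fit into one full host clock period.
 * Hypothetical helper, for back-of-the-envelope estimates only. */
static uint32_t cycles_per_host_period(uint32_t cpu_hz, uint32_t host_hz)
{
    return cpu_hz / host_hz;
}
```

With the figures mentioned in this thread, that gives 600 cycles per period for a 600 MHz Teensy on a 1 MHz host, 120 cycles at 5 MHz, and 180 cycles at 5 MHz if the Teensy is overclocked to 900 MHz. The usable window after a clock edge is only part of a full period, so the real budget is smaller than these numbers.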