Just received my Teensy 4.1. I will probably start with some simple timing experiments using embedded assembler in the Arduino environment -- with all the caching, pipelining and branch prediction in the M7, how quick can I actually react to an incoming clock edge? In parallel I will dig into the proper NXP documentation and IDE to figure out how to program this beast on a low level. Let's see how that goes; my memories of the first steps (and the secod and third steps...) with the STM32 IDE are not that fond.
I will certainly be back with more questions!