I now know that my problem is in the bank switching. It is switching as shown below, but the timing must be off. It's not reading $0000 correctly. The scope pattern never changes when I try to lengthen or shorten the loop reading from $0000. For now I rest. Tomorrow is a new day!
As far as I understand it, you're scoping the time between the STA $0000 and the LDA $0000: a write, three reads in ROM (the opcode and the two address bytes of the LDA), and a read. What you're not measuring is the time for the actual loop, as the loop happens in the part that's "off-screen" on your scope.
Actually ... at 2.5MHz, 5 cycles should be 5 times 400ns, i.e. 2us, not 1600ns. With 1600ns it looks like you're measuring 4 cycles: two RAM accesses with two other accesses between them.
Anyway, try to increase the time base to include the loop. A loop with 32 iterations is about 160 cycles, which at 2.5MHz makes about 65us (or about 16us at 10MHz). Both times, 65us and 16us respectively, are too long for the scope setting in the included pictures. If you increase the time base to show at least one full iteration, you can then decide whether the loop works or not.
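To double-check the numbers above, here is a small Python sketch of the arithmetic. The 5 cycles per iteration is only my estimate (160 cycles over 32 iterations), not something measured from your code, and the function names are made up for this example.

```python
def cycle_ns(clock_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def duration_us(cycles, clock_hz):
    """Time in microseconds that the given number of cycles takes."""
    return cycles * 1e6 / clock_hz


# Assumption: about 5 cycles per loop iteration (160 cycles / 32 iterations).
CYCLES_PER_ITERATION = 5
ITERATIONS = 32

if __name__ == "__main__":
    for clock_hz in (2_500_000, 10_000_000):
        total_cycles = CYCLES_PER_ITERATION * ITERATIONS
        print(
            f"{clock_hz / 1e6:.1f} MHz: cycle = {cycle_ns(clock_hz):.0f} ns, "
            f"loop = {duration_us(total_cycles, clock_hz):.0f} us"
        )
    # How many 2.5MHz cycles fit into the 1600ns seen on the scope?
    print("cycles in 1600 ns at 2.5 MHz:", 1600 / cycle_ns(2_500_000))
```

At 2.5MHz this gives 400ns per cycle and 64us for the whole loop, so the time base needs to cover a few tens of microseconds rather than a couple of microseconds.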
One thing that people sometimes do is to use an "EA" simulator. Put the CPU in an adapter and hard-wire the data bus to $EA, which is the NOP opcode (disconnect IRQ and NMI, maybe use r/-w to qualify). The CPU then counts through all addresses (two cycles for each address) and you can more easily scope the repetitive signals on the address lines. I haven't tried that yet ... (at least in hardware. My VHDL test actually uses this method :-)
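If the address really does advance once every two cycles in such a setup, each address line becomes a square wave whose frequency halves with every higher line. A rough Python sketch of what to expect on the scope (the function name and the `cycles_per_address` parameter are only for illustration):

```python
def address_line_hz(line, clock_hz, cycles_per_address=2):
    """Expected square-wave frequency on address line A<line>.

    Assumes the address increments once every `cycles_per_address` clock
    cycles, so A0 toggles on every increment and A<line> completes one
    full period every 2 ** (line + 1) increments.
    """
    return clock_hz / (cycles_per_address * 2 ** (line + 1))


if __name__ == "__main__":
    clock_hz = 2_500_000  # the 2.5MHz clock discussed in this thread
    for line in (0, 1, 8, 15):
        print(f"A{line}: {address_line_hz(line, clock_hz):.2f} Hz")
```

Under these assumptions, A0 runs at a quarter of the clock (625kHz at 2.5MHz) and A15 at roughly 19Hz, so the upper address lines need a much slower time base than the lower ones.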
Personally, I have an adapter that plugs into the CPU socket, where I can statically set the address bus and check the address decoding easily. It even includes a jumper for r/-w and a push button for phi2 :-) That's a very crude CPU simulator, but it might work.
On a side note: Recently I have used a 65816 board, plugged into the 6502 socket of a Commodore PET4032, that runs from the board's own RAM and ROM. So I can use decent test programs, loaded via the PET's I/O, to test the actual behavior of the (broken) 4032. Thinking about it, even earlier (in the '90s) I used my CPU emulator board for my CS/A computer to replace the CPU of a VC1541. The VC1541's memory just became a 64k window in the 1MB CS/A memory. I could replace and test ROMs, for example. I didn't have to fix the VC1541 then, though.
Hope that helps,
André