For a memory read, the data bus is driven by the memory, so the delay between the address stabilising and the new value being driven depends on the memory speed (and on the circuit designer's choice of how to control Output Enable).
In the case of visual6502, it looks like we're simulating a very fast memory, without any special OE control, so the data appears on the bus immediately. (In fact, visual6502 isn't trying at all hard to model the external circuit: this is just the simplest choice that works.)
A 6502 datasheet takes a different point of view: what matters there is when the 6502 is sensitive to the updated value on the data bus. In particular, a designer wants to know the latest that their memory system can deliver data and still have the 6502 accept it and use it in that cycle.
So a typical timing diagram shows the data arriving late, towards the end of the clock cycle, annotated with a constraint on how late it can arrive.
Hopefully, there's no conflict between those two points of view: the visual6502 runs with a very fast memory, and the timing diagrams will usually show a very slow memory.
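To make the two points of view concrete, here is a minimal sketch in Python. It assumes a simplified model in which the address is stable some time after the start of the cycle and the data must be stable a setup time before the end of the cycle. The class and field names are hypothetical, and the numbers are illustrative placeholders rather than figures from any particular datasheet.

```python
from dataclasses import dataclass


@dataclass
class ReadTiming:
    """Simplified read-cycle model; all times in ns from the start of the cycle.

    Hypothetical model for illustration only, not taken from a datasheet.
    """
    cycle_ns: float        # full clock period
    addr_valid_ns: float   # when the address bus becomes stable
    data_setup_ns: float   # data must be stable this long before end of cycle

    def deadline_ns(self) -> float:
        # Latest moment at which read data may still arrive.
        return self.cycle_ns - self.data_setup_ns

    def max_memory_delay_ns(self) -> float:
        # Slowest memory (address-valid to data-valid) that still works.
        return self.deadline_ns() - self.addr_valid_ns

    def meets_deadline(self, memory_delay_ns: float) -> bool:
        # Data appears memory_delay_ns after the address is stable.
        return self.addr_valid_ns + memory_delay_ns <= self.deadline_ns()


# Illustrative placeholder values (assumed, not from a datasheet).
timing = ReadTiming(cycle_ns=1000, addr_valid_ns=300, data_setup_ns=100)

fast_ok = timing.meets_deadline(0)      # "visual6502-style" instant memory
slow_ok = timing.meets_deadline(timing.max_memory_delay_ns())  # slowest allowed
too_slow_ok = timing.meets_deadline(timing.max_memory_delay_ns() + 1)
```

Both the instant memory and the slowest allowed memory pass the same check, which is why the simulation and the timing diagram don't contradict each other: they sit at opposite ends of the same allowed window.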
See this thread for more:
Timing Diagrams. Visualizing 65xx Timing.
(The particular wording you give from the hardware manual seems potentially confusing to me: that's no great surprise, because it's difficult to describe this area succinctly and accurately.)
Edit: Further to the right in your trace capture, we see the 'pd' value does indeed change in phi2: that's the predecode value, just before the new instruction gets captured in the IR. The predecode value is where a BRK can be jammed in when an interrupt is taken. I wrote more about that here:
Notes on the 6502 "predecode" block
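As a rough behavioural sketch of that BRK substitution (not a model of the actual circuit), the predecode stage can be thought of as passing the byte from the data bus through, unless an interrupt is being taken, in which case the BRK opcode (0x00) goes forward instead. The function and parameter names are hypothetical, and the sketch ignores the detail that this only matters on an opcode fetch cycle.

```python
BRK_OPCODE = 0x00  # the 6502 BRK instruction


def predecode(data_bus: int, interrupt_taken: bool) -> int:
    """Return the byte that would be forwarded towards the IR.

    Hypothetical simplification: a taken interrupt replaces the fetched
    opcode with BRK; otherwise the data bus value passes through.
    """
    return BRK_OPCODE if interrupt_taken else data_bus & 0xFF


normal_fetch = predecode(0xA9, interrupt_taken=False)     # LDA #imm passes through
interrupted_fetch = predecode(0xA9, interrupt_taken=True)  # replaced by BRK
```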