I found a link which indicates the 6502 at 1MHz gets 32 Dhrystones (2.1) per second (you'll need to scroll way down on the page to see the table), or 36 Dhrystones (1.1) per second.
We can thus ballpark the 65816's performance at 20MHz. If we execute the same 6502 binary as-is, and assume throughput scales linearly with clock speed, we should get about 640 Dhry(2.1)/s (32 x 20). Taking a conservative estimate of a 1.7x performance gain from switching to a compiler with the same overall strategy but which emits 65816-native instructions in 16-bit mode, we can reasonably expect about 1088 Dhry/s. With more aggressive optimizations and compilation techniques, you can probably reach higher figures.
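The arithmetic behind this estimate can be sketched as below. This is only a rough illustration, assuming throughput scales linearly with clock speed; the function name scale_dhrystones is made up for this example, and the 1.7 gain is the guess stated above, not a measurement.

```python
def scale_dhrystones(rate_per_s, from_mhz, to_mhz, native_gain=1.0):
    """Scale a Dhrystone rate to another clock speed.

    Assumption: performance is linear in clock speed (no bus or
    memory bottleneck). native_gain is a guessed multiplier for
    recompiling to 65816-native 16-bit code.
    """
    return rate_per_s * (to_mhz / from_mhz) * native_gain

# 6502 at 1MHz: 32 Dhrystone 2.1 per second (figure quoted above)
as_is = scale_dhrystones(32, 1.0, 20.0)                    # same 6502 binary at 20MHz
native = scale_dhrystones(32, 1.0, 20.0, native_gain=1.7)  # guessed 65816-native gain

print(round(as_is), round(native))
```

Changing native_gain is the quickest way to see how sensitive the final figure is to that guess.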
Update: Strange. According to the Apple II FAQ, the 65816 @ 2.8MHz achieves 236 Dhry(1.1)/s. This is some 6.5x better than a 1MHz 6502 (236 vs. 36). It seems my conservative estimate above may be too small in the general case; dividing out the 2.8x clock difference, the real scaling factor looks closer to 2.3. However, the 65816 accelerators discussed in the FAQ do not scale well, perhaps because they're operating with a caching mechanism over the 1MHz Apple II bus? It would seem to me that a 20MHz 65816 should be capable of 1685 Dhry/s (236 / 2.8 x 20), yet the FAQ implies smaller figures. In either case, though, the combination of 65816 and C yields much slower results than a 68000 and C. With an estimated 1685 Dhry/s for a 20MHz 65816 and an estimated 2848 Dhry/s for a 20MHz 68000 (based on the largest 12.5MHz 68000 rating), the 65816 seems about 1.7 times slower.
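For anyone who wants to check the figures in this update, here is a small sketch of the arithmetic. The variable names are mine, and the 20MHz numbers are linear extrapolations, not measurements.

```python
# Figures quoted in the text
BASE_6502 = 36.0      # Dhry(1.1)/s, 6502 at 1MHz
ACCEL_65816 = 236.0   # Dhry(1.1)/s, 65816 at 2.8MHz (Apple II FAQ)
ACCEL_MHZ = 2.8
M68K_20MHZ = 2848.0   # estimated Dhry/s for a 20MHz 68000

speedup = ACCEL_65816 / BASE_6502                   # raw gain over the 1MHz 6502
per_mhz_gain = speedup / ACCEL_MHZ                  # gain left after removing the clock difference
est_65816_20mhz = ACCEL_65816 / ACCEL_MHZ * 20.0    # linear extrapolation to 20MHz
ratio = M68K_20MHZ / est_65816_20mhz                # how far the 68000 is ahead

print(f"{speedup:.2f}x overall, {per_mhz_gain:.2f}x per MHz")
print(f"{est_65816_20mhz:.0f} Dhry/s estimated, 68000 ahead by {ratio:.2f}x")
```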
This near factor of two leads me to believe the gap comes from 16-bit vs. 32-bit operations. But we knew this was the case right from the start.
If anyone has a working C compiler for the 65816 target, I'd be very interested in seeing what confirmation or refutation comes from running the Dhrystone benchmark.
I recognize that Dhrystone has been obsoleted by more relevant benchmarks within the last decade or two. Nonetheless, Dhrystone remains useful for ballparking one CPU/compiler combination against another, at least for integer performance.