cjs wrote:
It would be interesting to know if, once one has global history, how useful static analysis would be as an addition to that. Does global history get enough right on most workloads (and, in particular, on VTL02C) that static analysis adds nothing to do that? Or does static analysis perhaps allow some savings in global history memory by allowing you to avoid recording certain branches that always have the same result?
I’m with you on this cjs, and thanks for the suggestion.
One thing to note is that always-taken, BRA-like branches are handled reasonably well by 2-bit prediction. If the initial assumption is wrong (as it would be for a forward branch), it will be quickly corrected by the consistent behaviour of the branch thereafter. We therefore get at most two mispredictions in the life of the branch, which is tolerable in the context of many iterations. (Consider that we are mainly concerned with branches that execute many times in the regular course of a program, since those that don’t won’t impact performance much.)
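To make the "at most two mispredictions" point concrete, here is a minimal Python sketch of a 2-bit saturating counter. The class and function names are made up for illustration, and the state encoding (0 and 1 predict not-taken, 2 and 3 predict taken) is an assumption rather than a description of the actual pipeline.

```python
class TwoBitCounter:
    """2-bit saturating counter: states 0-1 predict not-taken, 2-3 predict taken."""

    def __init__(self, state=0):
        # Assumption: state 0 (strongly not-taken) is the worst starting
        # point for a branch that turns out to be always taken.
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(counter, outcomes):
    """Run the counter over a sequence of branch outcomes and count misses."""
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


always_taken = [True] * 1000
worst_case = count_mispredictions(TwoBitCounter(0), always_taken)
weak_case = count_mispredictions(TwoBitCounter(1), always_taken)
```

Starting from the strongly not-taken state, the counter misses twice and is then right for the remaining iterations; starting from the weakly not-taken state, it misses only once.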
Still, we can try to do better. In general, we can observe that if the value of a flag is not changed by an instruction currently in the pipeline, then the flag is an excellent predictor of a branch which tests that flag. Conversely, all bets are off otherwise. Intuitively, it's not likely that the Z and N flags will survive very long in the pipeline unchanged, but the C and V flags should fare better. Still, it's hard to know how often things will play out favourably. We could build logic to check, of course, but as is usual in the pipeline, it's best to just go ahead and take our chances. So, rather than slowing down to make sure, we just predict based on the flag's value and move on.
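As a rough sketch of the idea, the prediction amounts to evaluating the branch condition against whatever the flags hold at the moment we sample them. The table below uses the standard 6502 conditional branches; the `flags` dictionary and the function name are hypothetical, chosen only for this example.

```python
# Each conditional branch tests one flag and is taken when the flag
# holds the listed value (standard 6502 branch semantics).
BRANCH_CONDITIONS = {
    "BCC": ("C", 0), "BCS": ("C", 1),
    "BNE": ("Z", 0), "BEQ": ("Z", 1),
    "BPL": ("N", 0), "BMI": ("N", 1),
    "BVC": ("V", 0), "BVS": ("V", 1),
}


def flags_predict(mnemonic, flags):
    """Predict taken/not-taken from the flag value sampled right now.

    No check is made for in-flight instructions that may still change
    the flag: we simply take our chances, as described above.
    """
    flag, taken_when = BRANCH_CONDITIONS[mnemonic]
    return flags[flag] == taken_when


sampled = {"C": 1, "Z": 0, "N": 0, "V": 0}
predict_bcs = flags_predict("BCS", sampled)  # carry set, so predict taken
predict_beq = flags_predict("BEQ", sampled)  # zero clear, so predict not taken
```

The prediction is only as good as the sampled flag: if an earlier instruction still in the pipeline rewrites it, the guess may well be wrong.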
As you suggest, this Flags predictor is probably good as a supplement to a main predictor. In effect, we run two predictions in parallel, and then track which delivers the more accurate results for a given branch. The Flags predictor will easily beat other approaches for those branches where the tested flag remains unchanged. Conversely, if there is no relationship between the eventual outcome of the branch and the value of the flag at the time we sample it, then this predictor will be duly demoted and quickly ignored. Either way, the overall prediction accuracy benefits from the combination.
And once again there is a general approach here, the so-called "tournament" or "hybrid" predictors. The observation is that different branches are better predicted differently, and there is no "best" scheme for all cases. So we run several predictors concurrently and pick a winner for each branch based on its specific dynamic behaviour. The "tournament" method learns soon enough what works best for each branch, and overall accuracy improves as a result. Moreover, we also adapt very well should circumstances change. All in all, it's an excellent approach.
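One common way to pick the winner is a small per-branch chooser counter that is nudged toward whichever predictor was right whenever the two disagree. The Python sketch below assumes just two component predictors, a 2-bit chooser, and a dictionary keyed by branch address; all of these are illustrative choices, not the design under test here.

```python
class TournamentChooser:
    """Per-branch 2-bit chooser: 0-1 trusts predictor A, 2-3 trusts predictor B."""

    def __init__(self):
        # Hypothetical storage: branch address -> chooser state.
        # Unseen branches start at 1 (weakly trusting A).
        self.state = {}

    def choose(self, pc, pred_a, pred_b):
        """Return the prediction of the currently favoured predictor."""
        return pred_b if self.state.get(pc, 1) >= 2 else pred_a

    def update(self, pc, pred_a, pred_b, taken):
        """Shift trust toward whichever predictor got this branch right."""
        if pred_a == pred_b:
            return  # both right or both wrong: nothing to learn
        s = self.state.get(pc, 1)
        if pred_b == taken:
            s = min(3, s + 1)
        else:
            s = max(0, s - 1)
        self.state[pc] = s


chooser = TournamentChooser()
pc = 0x1234  # arbitrary branch address for the example

# Predictor A says not-taken, predictor B says taken; the branch is taken.
first_pick = chooser.choose(pc, False, True)   # A is trusted initially
chooser.update(pc, False, True, True)          # B was right, so B gains trust
second_pick = chooser.choose(pc, False, True)  # B is now preferred
```

If B later starts losing for this branch, repeated updates walk the counter back toward A, which is the sense in which the scheme adapts when circumstances change.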
But of course nothing's free in this world. A tournament predictor implies a full implementation of several prediction schemes at once, as well as the logic to choose between them. This adds complexity, hardware and delay, so it's hard to know whether it's justified here. But we should know better soon enough. I'm currently running some tests to improve the pipeline's main predictor. Thereafter we will be in a better position to assess other supplementary approaches.
Once again, thanks cjs for these excellent suggestions and dialogue. I'm sure the end result will be much better for it.
Cheers for now,
Drass