GARTHWILSON wrote:
Why not let the programmer tell it if it's likely to branch or not.
I’ve been thinking about this actually. The existing predictor does offer up an initial static prediction for every branch, namely Backward-Taken-Forward-Not-Taken. It would be beneficial to have this initial prediction be more sophisticated, including perhaps taking a hint from the programmer.
Quote:
Maybe I'm not understanding the situation completely, but ISTM that you're asking for an instruction bit that simply can't exist unless you implement 9-bit bytes or something similar. You're instantly losing 65xx machine code compatibility no matter how you slice it, right?
Yes, the question is how to provide the CPU additional information while still maintaining compatibility. One way I’ve been considering is to allow for some mechanism whereby certain internal structures, like the btb and the prediction table, are pre-loaded. This would essentially be metadata that the CPU can really benefit from. We could pass initial predictions for various branches, for example, whether those are provided by the programmer or otherwise arrived at by profiling the code and observing how certain branches actually behave — or perhaps a combination of these two where code is profiled and the programmer can override the automatic hints.
A related thought here is that the simulator might eventually do double duty as a static
profiler/optimizer and generate metadata exactly for this purpose. It’s only a concept now, but probably worth mentioning given the discussion. One obvious challenge is how exactly to load this metadata into the CPU. Another is that performance during profiling will vary depending on the inputs to a given program, so one has use a “representative” input set. Probably easier said than done, but it’s interesting nevertheless.
Quote:
Well, that's partly what I was wondering about, if the code has to be 100% compatible with earlier versions and software that's already out there.
I have not quite given up on 100% compatibility. Self-modifying code is a pain, but certainly not an insurmountable problem. For example, the pipeline can check if any writes will modify instructions that are already loaded and flush them in that case. That would mean that self-modifying code would run slowly, but that’s better than not running at all. The
profiler/optimizer might then be a means of “fixing” self-modifying code on an optional basis, perhaps by flagging it and offering btb entries which can be either dynamically or statically applied to the code. Again, only a concept now.
Quote:
It’s amazing what a modern branch predictor can achieve
I am becoming increasingly convinced of this myself. After all, the current optimizer does reach 95% accuracy on this test despite the challenging branches. Yes, it needs 16k bytes to do it, but that’s still a great result. At minimum, I would say the approach is promising, and perhaps can be made even better when supplemented by good static analysis as described above.
Quote:
I had adapted a ‘C’ compiler at one point to accept an optional second parameter in if () statements.
Really cool Rob. Thanks for mentioning it.