Quote:
The first problem is in deciding which instructions can be executed in parallel. Huge amounts of chip space in both the Alpha 21264 and the Pentium II are dedicated to this task. The second problem is in finding enough instructions that can be executed in parallel. Most C code results in a branch instruction after an average of five sequential instructions, and this makes it impossible to make most code run in parallel.
For what it's worth, I think this quote that BigEd provided sums up the problem very well. In my opinion, the problem of effectively utilizing the pipelines of modern super-scalar processors is not going to be solved easily. Much of the software targeting these architectures is written in programming languages that provide no framework for expressing algorithms in a parallel form. In my personal experience, I first design and plan algorithms sequentially, and only after the algorithm and its program are working correctly do I look to see whether I can rewrite it as a parallel algorithm. In many cases, I find that temporal data dependencies, programming language limitations, etc. would make a parallel version more complicated than necessary. In addition, the sequential algorithm is more than fast enough for my needs, so eking out additional performance is not really worth the effort.
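To make the "temporal data dependencies" point concrete, here is a minimal sketch in Python. The function names and the smoothing filter are hypothetical, chosen only for illustration. The first loop has independent iterations. The second carries a value from one iteration to the next, so each step has to wait for the one before it.

```python
def scale_all(samples, gain):
    """Each output depends only on its own input, so iterations are independent."""
    return [gain * x for x in samples]


def scale_all_any_order(samples, gain, order):
    """Same computation as scale_all, but visiting the indices in a caller-chosen order."""
    out = [None] * len(samples)
    for i in order:
        out[i] = gain * samples[i]
    return out


def running_average(samples, alpha):
    """Exponential smoothing: each output needs the previous output first."""
    out = []
    prev = 0.0
    for x in samples:
        # Loop-carried dependency: this step cannot start until prev is known.
        prev = alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out


samples = [1.0, 2.0, 3.0, 4.0]
# Independent iterations give the same answer in any order, which is the
# property that lets hardware or a compiler overlap them.
assert scale_all_any_order(samples, 2.0, [3, 1, 0, 2]) == scale_all(samples, 2.0)
```

The second loop can still be parallelized with more elaborate techniques, but that is exactly the kind of rewrite that tends to end up more complicated than the sequential original.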
Again, FWIW, I think a heterogeneous design like that of the CDC 6600 and CDC 7600, which paired a high-performance central processor with several low-performance Peripheral Processors (PPs), is a better match to the modern processing problem. In other words, one or two multi-issue, super-scalar, out-of-order (OOO) processor cores matched with several single-issue barrel processors sharing the same instruction set on the same die would be a much better general-purpose solution. Far more independent threads of execution (processes/tasks) could be supported by such a beast. I suspect that such a beast would also provide better performance for all of those low-priority, I/O-bound tasks that seemingly always cause my media player to annoyingly skip.
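For anyone unfamiliar with the term, a barrel processor gives each hardware thread a fixed turn in a rotation and issues at most one instruction per turn. The following is a rough sketch in Python of that scheduling idea only. The function name `run_barrel` and the representation of a program as a list of instruction labels are my own assumptions, and it models no real machine's timing.

```python
def run_barrel(programs):
    """Simulate fixed-slot, single-issue, round-robin instruction issue.

    programs: one list of instruction labels per hardware thread (hypothetical format).
    Returns a trace of (cycle, thread_slot, instruction) tuples.
    """
    n = len(programs)
    pcs = [0] * n          # per-thread program counters
    trace = []
    cycle = 0
    while any(pcs[i] < len(programs[i]) for i in range(n)):
        slot = cycle % n   # the thread whose turn it is this cycle
        if pcs[slot] < len(programs[slot]):
            trace.append((cycle, slot, programs[slot][pcs[slot]]))
            pcs[slot] += 1
        # Otherwise the slot idles: a finished thread still owns its turn.
        cycle += 1
    return trace


for cycle, slot, instr in run_barrel([["a0", "a1"], ["b0"]]):
    print(cycle, slot, instr)
```

Because consecutive instructions from the same thread are issued a full rotation apart, a dependency between them has time to resolve without any out-of-order machinery, which is why such cores can stay small and simple.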