BigEd wrote:
I think perhaps Toshi's point is that Samuel implies that getting adequate bandwidth to memory by pipelining accesses is good enough, but in fact if the CPU is waiting for the data, it isn't.
I never claimed that pipelining would solve everything. But supporting 3 concurrent memory accesses with the usual 4-cycle access times is a great way of filling the gap, provided the processor can keep itself busy in the meantime, which it turns out it usually can.
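To make that concrete, here's a rough C sketch (mine, not taken from any real design): with a few independent accumulators in flight, the core always has arithmetic to do while earlier loads are still outstanding. The function name, the unroll factor of 3, and the array are purely illustrative.

Code:
/* Hypothetical sketch: keep three independent accumulators so three
   independent loads can be in flight at once, mirroring the "3
   concurrent accesses" figure above. */
#include <stddef.h>

long sum_interleaved(const long *data, size_t n)
{
    long a = 0, b = 0, c = 0;
    size_t i = 0;

    /* Independent accumulators break the dependency chain, so the CPU
       has useful work to do while earlier loads are still pending. */
    for (; i + 3 <= n; i += 3) {
        a += data[i];
        b += data[i + 1];
        c += data[i + 2];
    }
    for (; i < n; i++)      /* leftover elements */
        a += data[i];

    return a + b + c;
}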
Quote:
As we're entirely and completely off-topic now, I thought it interesting that the
recent "5 trillion digits of pi" calculation (using a single PC, for the most part) had this to say about ram sizes and bus speeds:
Quote:
Due to time constraints, most of the algorithms that are used for disk operations had been "forcibly ported" from the in-RAM equivalents with little or no regard to performance implications. Most of these performance penalties come in the form of sub-optimal cache use.
This sub-optimal cache use leads to excessive memory-thrashing, which explains why Shigeru Kondo noticed that 96 GB of RAM @ 1066 MHz had higher performance than 144 GB of RAM @ 800 MHz.
This amply illustrates my point. Memory access latencies are important, but if you abuse or misuse other performance-enhancing features of the system, it turns out you've got bigger problems.
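To show the sort of thing sub-optimal cache use means in practice, here's a contrived C example (mine, not Kondo's code): the same summation done in two access orders. For a matrix too large for the caches, the column-order walk touches a new cache line on nearly every access, so it thrashes the caches and ends up bound by RAM latency rather than RAM bandwidth. The size N is just an assumption, picked to overflow the caches.

Code:
#include <stddef.h>

#define N 4096              /* assumed size: 128 MB of longs, far past any cache */

long sum_row_order(const long (*m)[N])
{
    long s = 0;
    for (size_t r = 0; r < N; r++)
        for (size_t c = 0; c < N; c++)
            s += m[r][c];   /* sequential walk: one miss per cache line */
    return s;
}

long sum_column_order(const long (*m)[N])
{
    long s = 0;
    for (size_t c = 0; c < N; c++)
        for (size_t r = 0; r < N; r++)
            s += m[r][c];   /* 32 KB stride: roughly one miss per access */
    return s;
}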
Where I work, we use moderately powerful servers to host our web application software. But they're not mainframes, and even in the x86 world, they're hardly what I'd consider top-of-the-line hardware. Since we're only a medium-sized company, we must remain cost-conscious.
Our solutions to memory bottlenecks rarely involve swapping out a motherboard for a faster FSB or faster RAM sticks. Most of our performance gains come from decomposing problems differently, compensating for cache effects, or parallelizing across multiple machines. We service the same volume of customer traffic as Facebook using only 1/100th the number of servers. RAM speed is almost never an issue in contemporary high-performance software.
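As a sketch of what "compensating for cache effects" can look like, here's a blocked transpose in C. It's only illustrative: the block size is a guess you'd tune so a tile of the source and a tile of the destination both stay cache-resident, instead of the naive transpose missing on nearly every destination write.

Code:
#include <stddef.h>

#define BLOCK 64            /* illustrative tile edge; tune to the cache */

void transpose_blocked(double *dst, const double *src, size_t n)
{
    for (size_t ib = 0; ib < n; ib += BLOCK)
        for (size_t jb = 0; jb < n; jb += BLOCK)
            /* sweep one tile at a time so src and dst lines stay hot */
            for (size_t i = ib; i < ib + BLOCK && i < n; i++)
                for (size_t j = jb; j < jb + BLOCK && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}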
There are places where RAM speed absolutely matters, of course. Supercomputing comes to mind, but even there, properly built, industrial-grade vector processors (not the toys you see in Intel or PowerPC instruction sets) can stream data from SDRAM at bus speed. I have a friend who works for nVidia, for example, and their GPUs do all sorts of neat tricks streaming data from SDRAM at bus speed. To benefit from these optimizations, coders must place their texture (and other) data in memory in ways that fit the SDRAM bursting model, which isn't always the nice, clean, flat 2-D bitmaps we've all come to appreciate over the years.
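Purely as an illustration (I don't know nVidia's actual layouts), here's roughly what a tiled addressing scheme looks like in C: each 8x8 block of texels is stored contiguously, so fetching a block takes a few SDRAM bursts instead of one scattered access per row of the block. The tile size, the function name, and the requirement that the width be a multiple of the tile are all assumptions of the sketch.

Code:
#include <stddef.h>

#define TILE 8              /* assumed tile edge, in texels */

/* Map (x, y) in a width-wide image to an offset in a tiled layout.
   Tiles are laid out row-major, and texels are row-major within a tile.
   For this sketch, width must be a multiple of TILE. */
size_t tiled_index(size_t x, size_t y, size_t width)
{
    size_t tiles_per_row = width / TILE;
    size_t tile_x = x / TILE, in_x = x % TILE;
    size_t tile_y = y / TILE, in_y = y % TILE;

    return (tile_y * tiles_per_row + tile_x) * (TILE * TILE)
         + in_y * TILE + in_x;
}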
But on the whole, across the entire industry and in every sector, even where CPUs dwarf RAM in performance, we are finding that RAM is nonetheless faster than most of the software running on those CPUs. This is particularly the case on systems where the FSB approaches the internal core operating speed! The end result is a computer running software at woefully suboptimal speeds.
I stand by my statements: the problem is rarely RAM performance these days. It's how you
use the RAM that determines overall performance now. This isn't 1996 anymore.