Kevin Schofield points to some Jim Gray papers, including a new one (January 2005) that I hadn't read.
The paper discusses the challenges of working with multi-petabyte scientific datasets. There are some interesting approaches discussed (including Google's MapReduce, of course).
However, I often wonder why this little gem from Dan Creswell never got more attention. He has written extensions to JavaSpaces that allow the code (think queries and processing instructions) to migrate to the data, instead of the code downloading the data and processing it locally. On a prototype system:
I then ran a test with each version submitting ten objects and then removing them from the queue. The code-uploading version was nearly 7 times as fast, and I'm certain that as concurrency increases, the performance gap will get greater still as contention increases.
Code Downloading for Improved Performance
I'd love to have time to look at this more closely, but I imagine this is an approach that would work well.
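To make the idea concrete, here is a minimal sketch (not Dan Creswell's actual API, which I haven't studied; the `Query` and `DataNode` names are my own invention) of the "ship the code to the data" pattern: the client sends a small serializable query object to the node holding the data, and only the result crosses the wire, instead of the whole dataset.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of code-to-data migration.
// A Query is a small piece of serializable code that runs where the data lives.
interface Query<T, R> extends Serializable {
    R run(List<T> localData);
}

class DataNode<T> {
    private final List<T> data = new ArrayList<>();

    void add(T item) { data.add(item); }

    // Code-to-data: the query executes next to the data;
    // only the (small) result is returned to the caller.
    <R> R execute(Query<T, R> query) {
        return query.run(data);
    }

    // Data-to-code alternative: the client downloads the whole
    // dataset and processes it locally -- far more bytes on the wire.
    List<T> download() {
        return new ArrayList<>(data);
    }
}

public class CodeToData {
    public static void main(String[] args) {
        DataNode<Integer> node = new DataNode<>();
        for (int i = 1; i <= 1000; i++) node.add(i);

        // The lambda (a Query) migrates to the node; one long comes back.
        long sum = node.execute(localData ->
                localData.stream().mapToLong(Integer::longValue).sum());
        System.out.println(sum);
    }
}
```

In a real JavaSpaces setting the query object would be serialized and shipped across the network, so the win is in bandwidth and contention, not in this single-JVM toy.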