I'd like to start an open discussion on parallelization for NGS data. I noticed that Galaxy recently came out with a cloud-based interface using Amazon EC2. I've been trying to learn more about how NGS analysis algorithms (for alignment, assembly, etc.) are actually implemented in a parallel fashion, but I have had trouble finding specific documentation and resources describing how this works. Any direction or resources that people can provide would be much appreciated.

Also, I have seen some recent papers describing parallelization of specific algorithms (such as PASQUAL from Georgia Tech), but they all seem to operate on relatively "small" networks of distributed computing resources. Does anyone have any idea how far the parallelization and speedup of these analyses can be pushed? How difficult would it be to implement something that runs on a distributed network of, say, 100,000 computers, or even a million? Is there a bottleneck somewhere that would prevent that from being feasible for NGS analysis? Or would it make the analyses amazingly fast compared to what's available now?

I'm thinking of a system like the one the SETI@home project has set up for its distributed computing user base, and wondering what the limits are and how one could implement such a system if the user base is already in place.
Josh Tietjen