Extreme parallelization for NGS analysis

12 Aug 2012

I'd like to start an open discussion on the topic of parallelization for
NGS data. I noticed that Galaxy recently came out with a cloud-based
interface using Amazon EC2. I've been trying to learn more about how these
NGS analysis algorithms (for alignment, assembly, etc.) are actually
implemented in a parallel fashion, but I have had trouble finding specific
documentation and resources describing how the parallelization works in
practice. Any direction/resources that people can provide would be much
appreciated.

Also, I have seen some recent papers describing parallelization of
specific algorithms (such as PASQUAL from Georgia Tech), but they all seem
to operate on relatively "small" networks of distributed computing
resources. Does anyone have any idea how far the parallelization and
speedup of these analyses can be pushed? How difficult would it be to
implement something that runs on a distributed network of, say, 100,000
computers, or even a million? Is there a bottleneck somewhere that would
prevent that from being feasible for NGS analysis? Or would it make the
analyses dramatically faster than what's available now? I'm thinking of a
system like the one the SETI@home project has set up for its distributed
computing user base, and wondering what the limits are and how one could
implement such a system if the user base is already in place.
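
To make the bottleneck question concrete, here is a rough sketch of one way
to frame it, assuming Amdahl's law is a reasonable first approximation. The
function name and the 1% serial fraction are hypothetical choices for
illustration, not measurements from any NGS tool.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Upper bound on speedup under Amdahl's law.

    serial_fraction: share of the runtime that cannot be parallelized
                     (e.g. merging results, I/O on one node). Hypothetical
                     input; it would have to be measured for a real tool.
    workers:         number of machines working in parallel.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed 1% serial work; compare the network sizes asked about above.
    for workers in (100, 100_000, 1_000_000):
        print(workers, round(amdahl_speedup(0.01, workers), 2))
```

Under this model the speedup can never exceed 1 / serial_fraction, so with
even 1% serial work the gain from going from 100,000 to a million machines
would be small. The model also ignores the cost of shipping read data to
volunteer machines, which may matter more for NGS than for SETI@home-style
workloads.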

Josh Tietjen
