parallelizing an NGS mapping workflow
Hello,

I'd like to use Galaxy on our local beowulf cluster for NGS workflows. One typical use case we'd be replacing with Galaxy is a parallel BWA alignment of large fastq files. To distribute this across the cluster we split the fastq file into many parts, run each separately against the same reference, and then use samtools to merge the SAM output. It's not uncommon to end up with hundreds of parts after splitting.

How does Galaxy handle the parallelization of large NGS mappings? I've found the tools for fastq QC, mapping, and SAM merging, but couldn't find any set of tools that would control the parallelization. This trouble ticket (http://bitbucket.org/galaxy/galaxy-central/issue/197/starting-workflows-with...) would suggest this functionality hasn't been implemented yet, but it seems necessary for many (most?) Illumina or SOLiD runs to get a reasonable mapping turnaround time.

If this is already a feature it would be great if I could be pointed to the relevant docs and maybe it could be given a more prominent place in the wiki/interface. If it's not yet a feature, is there a timeline for when it will be added?

Thanks,
Chris

--
Chris Berthiaume
Center for Environmental Genomics
University of Washington
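For readers following the thread, the manual split/align/merge approach described above can be sketched roughly as follows. This is a minimal illustration outside Galaxy, not a Galaxy feature and not necessarily how the poster's cluster scripts work. The function names `split_fastq` and `scatter_gather`, the part file naming, and the `align_one` / `merge_all` callables are all hypothetical, and the splitter assumes the common four-lines-per-record FASTQ layout.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_fastq(fastq_path, out_dir, reads_per_part):
    """Split a FASTQ file into parts of at most reads_per_part records.

    Assumes 4 lines per record (no wrapped sequence lines).
    Returns the part paths in the order they were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with open(fastq_path) as handle:
        while True:
            # Read the next block of whole records (4 lines each).
            chunk = list(itertools.islice(handle, 4 * reads_per_part))
            if not chunk:
                break
            part = out_dir / f"part_{len(parts):04d}.fastq"  # hypothetical naming
            part.write_text("".join(chunk))
            parts.append(part)
    return parts


def scatter_gather(parts, align_one, merge_all, workers=4):
    """Apply align_one to every part concurrently, then merge the outputs.

    align_one and merge_all are caller-supplied placeholders: for example,
    a function that runs the aligner on one part, and a function that
    merges the per-part alignment files.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of parts in its results.
        outputs = list(pool.map(align_one, parts))
    return merge_all(outputs)
```

In practice `align_one` would launch the aligner (BWA in the use case above) on a single part, typically as a cluster job or subprocess, and `merge_all` would combine the per-part results with samtools. Those steps are left as placeholders here because the exact commands depend on the local setup.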
Hi Chris,

This is a long-standing feature request, tracked in this ticket:

http://bitbucket.org/galaxy/galaxy-central/issue/79

Unfortunately, there is still no timeline for when it will be implemented, but it is moving up the list of priorities.

--nate
participants (2)
- Chris Berthiaume
- Nate Coraor