Hello Ryan,

I'm in the exact same situation with my bowtie/tophat tools, going back and forth between outputting SAM, sorted SAM, BAM or sorted BAM, and I'm still not sure which is the best method.

Storage-wise you're correct: just saving the sorted BAM is best (even more so given that processing SAM files as text is so horrendous that I think almost no tool uses them directly, always requiring intervals or sorted BAM).

But one annoyance (for me) is that samtools (the program) is very inefficient: it uses only a single thread (and the sort part isn't doing a great job at that). So if I give the "mapping" tool as a whole 20 threads or more, and part of the running time (the samtools sort part) uses only a single thread, I'm wasting the other threads, as they sit idle waiting for the sort to finish.

I also tried sorting the SAM file directly using GNU sort (version 8.10 can use multiple threads, and the memory management actually works, as opposed to "samtools sort -m"), but I'm not sure it's worth the effort.

I didn't find an optimal solution that I like, and I'm interested to hear what others think.

-gordon

Ryan Golhar wrote, On 04/05/2011 01:08 PM:
Hi all - I find it redundant to hold on to SAM output from NGS mapping tools when I end up converting the SAM files to BAM files anyway. The cleanup scripts require the history items to be deleted, but I don't want to delete them yet as I want the entire workflow to be kept until we are done analyzing our data.
So, I was thinking of a way to remove the intermediate SAM files and thought about how I would do this on the command line: simply pipe the output of BWA to samtools to create a BAM file and never have a SAM file to deal with.
The BWA tool runner could be modified to pipe BWA output directly to samtools so a SAM file is never physically stored on disk. Has anyone done this? Does this seem like a good idea?
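To make the idea concrete, here is a rough sketch of what the piping might look like inside a Python wrapper. It is only an illustration under assumptions, not the actual tool runner code: the function names `pipe_to_file` and `bwa_to_bam` and all file arguments are hypothetical, and it assumes `bwa samse` writing SAM to stdout and `samtools view -bS -` reading SAM from stdin.

```python
import subprocess


def pipe_to_file(producer_cmd, consumer_cmd, out_path):
    """Run `producer_cmd | consumer_cmd > out_path` with no intermediate file."""
    with open(out_path, "wb") as out:
        producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
        consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout, stdout=out)
        # Close our copy of the pipe so the producer sees a broken pipe
        # if the consumer exits early, instead of blocking forever.
        producer.stdout.close()
        consumer_rc = consumer.wait()
        producer_rc = producer.wait()
    if producer_rc != 0 or consumer_rc != 0:
        raise RuntimeError(
            "pipeline failed (producer exit %d, consumer exit %d)"
            % (producer_rc, consumer_rc)
        )


def bwa_to_bam(ref, sai, fastq, bam_path):
    """Hypothetical helper: stream BWA's SAM output straight into a BAM file.

    Assumes single-end reads; all paths are placeholders supplied by the caller.
    """
    pipe_to_file(
        ["bwa", "samse", ref, sai, fastq],   # writes SAM to stdout
        ["samtools", "view", "-bS", "-"],    # reads SAM on stdin, writes BAM to stdout
        bam_path,
    )
```

Note that this only covers the SAM-to-BAM conversion; the resulting BAM is unsorted, so a sort step would still be needed afterwards if a sorted BAM is wanted.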
Ryan