Hi all,
I'm looking for a little advice on the pre-existing SAM/BAM filtering
tools already in the Galaxy Tool Shed (to avoid reinventing the wheel).
As I mentioned on another thread, I'm working on a wrapper for the
"samtools bam2fq" command (targeting samtools 1.1 which fixed
some bugs in this tool and added new functionality compared to
samtools 0.1.19), see:
https://github.com/peterjc/pico_galaxy/tree/master/tools/samtools_bam2fq
https://toolshed.g2.bx.psu.edu/view/peterjc/samtools_bam2fq
https://testtoolshed.g2.bx.psu.edu/view/peterjc/samtools_bam2fq
One of my motivating use cases is a workflow like this:
1. Upload paired end FASTQ files.
2. Map them against a known contaminant genome giving a BAM file
(note I need the mapper to report unmapped reads in the output).
3. Filter the BAM to get unmapped reads, plus reads whose partner is
unmapped (conversely, remove reads where both partners are mapped).
4. Convert the filtered BAM back into FASTQ (with samtools bam2fq).
5. Proceed with analysis (e.g. de novo assembly).
Assuming I have understood "samtools view" correctly, this filtering
step needs more than one command.
This would get the unmapped reads:
$ samtools view -f 0x4 ...
This would get reads with an unmapped partner:
$ samtools view -f 0x8 ...
However, combining both flags (4 + 8 = 12 decimal, i.e. 0xC in hex) would
only get unmapped reads with an unmapped partner:
$ samtools view -f 12 ...
i.e. samtools view allows logical AND, not logical OR, when combining
flag filters.
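
To make the AND versus OR distinction concrete, here is a minimal shell sketch of the two bit tests. The helper names has_all_bits and has_any_bits are made up for illustration (they are not samtools functions), and the FLAG values are just worked examples:

```shell
# Hypothetical helpers that mimic the flag tests; not part of samtools.

# has_all_bits FLAG MASK: succeeds only if every bit in MASK is set in FLAG,
# which is how "samtools view -f MASK" selects reads (logical AND).
has_all_bits() {
    [ $(( $1 & $2 )) -eq $(( $2 )) ]
}

# has_any_bits FLAG MASK: succeeds if at least one bit in MASK is set
# (the logical OR wanted for the contaminant filter).
has_any_bits() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# FLAG 73 = 1 + 8 + 64: paired, mate unmapped, first in pair (read itself mapped).
if has_all_bits 73 12; then echo "73 passes -f 12"; else echo "73 fails -f 12"; fi
if has_any_bits 73 12; then echo "73 is unmapped or has an unmapped mate"; fi

# Note that 0x12 is hexadecimal for 18 (0x10 + 0x2), not 4 + 8 = 12 (0xC).
echo $(( 0x12 ))
```

So a mapped read with an unmapped partner fails the combined -f test even though it is one of the reads the workflow needs to keep.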
So I believe that, using samtools directly, a two-stage filter is needed,
followed by a merge (and sort), taking care not to duplicate reads, perhaps:
$ samtools view -b -f 4 ... > unmapped.bam
$ samtools view -b -f 8 -F 4 ... > mapped_with_partner_unmapped.bam
$ samtools merge ... unmapped.bam mapped_with_partner_unmapped.bam
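
As one possible single-pass alternative to splitting and merging, the OR condition could be tested on the SAM text stream itself. This is only a sketch under stated assumptions: the function name is invented for illustration, it relies on a POSIX awk, and the in.bam and filtered.bam names in the usage comment are placeholders rather than files from this thread:

```shell
# Hypothetical filter: reads SAM text on stdin and writes to stdout the header
# lines plus any read that is unmapped (0x4) or whose mate is unmapped (0x8).
filter_unmapped_or_mate_unmapped() {
    # int(FLAG / 4) % 4 isolates the 0x4 and 0x8 bits using plain arithmetic,
    # so no bitwise awk extension is needed; non-zero means either bit is set.
    awk -F '\t' '
        /^@/                { print; next }   # header lines pass through
        int($2 / 4) % 4 != 0 { print }        # column 2 is the SAM FLAG
    '
}

# Possible usage (placeholder file names, assuming samtools 1.x on the PATH):
#   samtools view -h in.bam \
#     | filter_unmapped_or_mate_unmapped \
#     | samtools view -b - > filtered.bam
```

Because each record is either kept or dropped in place, the original read order is preserved and no read can be duplicated, at the cost of decoding the BAM to text and back.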
That could be repeated within Galaxy but is surprisingly complicated
with multiple steps in the history - so I do not want to go that route.
Have I overlooked a simple ToolShed solution using samtools?
As far as I could tell, the only other option on the current Tool Shed
is the Sambamba Filter tool (using "unmapped or mate_is_unmapped"),
which has a very capable looking filter system:
https://toolshed.g2.bx.psu.edu/view/lomereiter/sambamba_filter
@Artem - have you explored updating your tool_dependencies.xml
to download your pre-compiled binaries by default? That would
make deployment far easier, since D compilers are still rare, and
would mean we can see the test results on the Tool Shed :)
Please ask if you'd like advice on Tool Shed packaging.
Thanks,
Peter