Hi all,
I've got a Python script which divides a FASTQ file containing a mixture of paired, unpaired and orphan reads into valid pairs and single/orphan reads.
Such a situation can occur after applying quality filtering to a file of paired FASTQ reads. I've also had raw data supplied in this state from a sequencing center.
It works by looking at the read names, and understands the /1 and /2 convention used by Illumina, the .f and .r convention (which I understand is common), and the Sanger convention as well. The only requirement (so that only a single pass through the input is needed) is that the input be sorted such that any pairs come as consecutive entries (forward then reverse read).
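To make the single-pass idea concrete, here is a minimal sketch of the pairing logic. It is not the actual script: it uses a naive four-line FASTQ reader instead of Biopython, handles only the /1 and /2 and the .f and .r suffixes (the Sanger convention is left out), and the names SUFFIX_PAIRS, read_fastq, expected_reverse_name and divide are made up for illustration.

```python
import io

# Assumed suffix table (illustrative only): forward suffix -> reverse suffix.
SUFFIX_PAIRS = {"/1": "/2", ".f": ".r"}


def read_fastq(handle):
    """Yield (name, sequence, quality) from a simple four-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("Expected '@' at start of record: %r" % header)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:].split()[0], seq, qual


def expected_reverse_name(name):
    """Return the mate's name if this looks like a forward read, else None."""
    for fwd, rev in SUFFIX_PAIRS.items():
        if name.endswith(fwd):
            return name[: -len(fwd)] + rev
    return None


def divide(records):
    """Single pass over sorted records; return (pairs, singles)."""
    pairs, singles = [], []
    pending = None  # a forward read still waiting for its mate
    for rec in records:
        if pending is not None:
            if rec[0] == expected_reverse_name(pending[0]):
                pairs.append((pending, rec))
                pending = None
                continue
            singles.append(pending)  # the mate never arrived
            pending = None
        if expected_reverse_name(rec[0]) is not None:
            pending = rec  # forward read: hold it and look at the next record
        else:
            singles.append(rec)  # reverse read with no preceding forward read
    if pending is not None:
        singles.append(pending)
    return pairs, singles


example = (
    "@r1/1\nACGT\n+\nIIII\n"
    "@r1/2\nTTGA\n+\nIIII\n"
    "@r2/1\nGGCC\n+\nIIII\n"
    "@r3.f\nAAAA\n+\nIIII\n"
    "@r3.r\nCCCC\n+\nIIII\n"
    "@r4/2\nGATC\n+\nIIII\n"
)
pairs, singles = divide(read_fastq(io.StringIO(example)))
print([(f[0], r[0]) for f, r in pairs])
print([s[0] for s in singles])
```

Because only one forward read is ever held back, memory use stays constant regardless of file size, which is why the input has to be sorted so that mates are adjacent.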
Is there anything like this in Galaxy already that I have missed?
Would you consider merging such a tool into Galaxy? I haven't written a wrapper XML file yet, and it also currently uses Biopython for FASTQ parsing, but I could switch it to use the Galaxy FASTQ code instead.
Regards,
Peter
Hi Peter,
The tool is unique, but let's put it into the Tool Shed for now and consider incorporating it later on.
Thanks again for all of your input!!
Best,
Jen
Galaxy team
On 11/23/10 10:21 AM, Peter wrote:
> I've got a Python script which divides a FASTQ file containing a mixture of paired, unpaired and orphan reads into valid pairs and single/orphan reads.
> [...]
On Wed, Dec 8, 2010 at 1:15 PM, Jennifer Jackson jen@bx.psu.edu wrote:
> Hi Peter,
> The tool is unique, but let's put it into the Tool Shed for now and consider incorporating it later on.
> Thanks again for all of your input!!
> Best,
> Jen
Hi Jen,
I wrote the XML wrapper and converted the script from using Biopython to use Galaxy for FASTQ handling (which will make it a little easier for people to install the tool). I've just uploaded it to the 'Galaxy Tool Shed' for review under the tool name "Divide FASTQ file into paired and unpaired reads".
Regards,
Peter
Approved now - thanks Peter, again, for all of your contributions!
Jen
Galaxy team
On 12/15/10 9:08 AM, Peter wrote:
> I've just uploaded it to the 'Galaxy Tool Shed' for review under the tool name "Divide FASTQ file into paired and unpaired reads".
galaxy-dev@lists.galaxyproject.org