Hi Peter,

The tool is unique, but let's put it into the Tool Shed for now and consider incorporating it later on.

Thanks again for all of your input!!

Best,

Jen
Galaxy team

On 11/23/10 10:21 AM, Peter wrote:
Hi all,
I've got a Python script which divides a FASTQ file containing a mixture of paired, unpaired and orphan reads into valid pairs and single/orphan reads.
Such a situation can occur after applying quality filtering to a file of paired FASTQ reads. I've also had raw data supplied in this state from a sequencing center.
It works by looking at the read names, and understands the /1 and /2 convention used by Illumina, the .f and .r convention (which I understand is common), and the Sanger convention as well. The only requirement (so that only a single pass through the input is needed) is that the input be sorted such that any pairs come as consecutive entries (forward read, then reverse read).
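To make the single-pass idea concrete, here is a rough sketch of the pairing logic. It is not the actual script: it works on read names only (no FASTQ parsing, no Biopython), the names `SUFFIX_PAIRS`, `split_suffix` and `partition_reads` are made up for illustration, and it covers only the /1,/2 and .f,.r suffixes (the Sanger convention is left out). It assumes, as above, that any pair appears as consecutive entries, forward read first.

```python
# Hypothetical suffix table: forward suffix -> matching reverse suffix.
# Only the /1,/2 and .f,.r conventions are covered in this sketch.
SUFFIX_PAIRS = {"/1": "/2", ".f": ".r"}


def split_suffix(name):
    """Return (template, suffix), or (name, None) if no known suffix matches."""
    for fwd, rev in SUFFIX_PAIRS.items():
        for suffix in (fwd, rev):
            if name.endswith(suffix):
                return name[: -len(suffix)], suffix
    return name, None


def partition_reads(names):
    """Single pass over read names; returns (pairs, singles).

    Assumes the input is sorted so that the two reads of a pair are
    consecutive, with the forward read first.
    """
    pairs = []
    singles = []
    pending = None  # a forward read still waiting for its partner
    for name in names:
        template, suffix = split_suffix(name)
        if pending is not None:
            p_template, p_suffix = split_suffix(pending)
            if template == p_template and SUFFIX_PAIRS[p_suffix] == suffix:
                pairs.append((pending, name))
                pending = None
                continue
            # The next read was not the partner, so the pending read is an orphan.
            singles.append(pending)
            pending = None
        if suffix in SUFFIX_PAIRS:
            pending = name  # forward read: hold it until the next entry is seen
        else:
            singles.append(name)  # reverse read without a partner, or no suffix
    if pending is not None:
        singles.append(pending)
    return pairs, singles


example_names = ["a/1", "a/2", "b/1", "c/2", "d", "e.f", "e.r", "f/1"]
example_pairs, example_singles = partition_reads(example_names)
```

Only one read is ever held back at a time, which is why the sort-order requirement is enough to avoid loading the whole file or making a second pass.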
Is there anything like this in Galaxy already that I have missed?
Would you consider merging such a tool into Galaxy? I haven't written a wrapper XML file yet, and it also currently uses Biopython for FASTQ parsing, but I could switch it to use the Galaxy FASTQ code instead.
Regards,
Peter
--
Jennifer Jackson
http://usegalaxy.org