Read shuffler and code contributions
Hi,

I was wondering if there is a tool in Galaxy to put mate pair reads located in two files inside a single file? I made the error of believing that the FASTQ joiner does that, but it does not.

If this feature is not planned, I am willing to work on it. Which of the following would be better for integration into Galaxy?
* a clean Python implementation like the other utilities in tools/fastq/, i.e. fastq_groomer.py and fastq_paired_end_joiner.py
* a wrapper around the Velvet utilities, shuffleSequences_fastq.pl and shuffleSequences_fasta.pl, given that Velvet already has a wrapper in Galaxy

Regarding contributing code to the galaxy-central repository, what is the best way to get it done? Recently, I cloned the galaxy-central repository on Bitbucket, made some changes and requested the changes to be pulled, but I have not heard from the Galaxy Team yet. Let me know if you like to do things a different way!

Best,
Florent
On Wed, Dec 15, 2010 at 2:42 AM, Florent Angly <florent.angly@gmail.com> wrote:
> Hi,
> I was wondering if there is a tool in Galaxy to put mate pair reads located in two files inside a single file? I made the error of believing that the FASTQ joiner does that, but it does not.
> If this feature is not planned, I am willing to work on it. Which of the following would be better for integration into Galaxy?
> * a clean Python implementation like the other utilities in tools/fastq/, i.e. fastq_groomer.py and fastq_paired_end_joiner.py
> * a wrapper around the Velvet utilities, shuffleSequences_fastq.pl and shuffleSequences_fasta.pl, given that Velvet already has a wrapper in Galaxy
Are you asking for a tool to interleave two FASTQ or FASTA files with matching entries (matching names, in the same order) into one file that alternates forward and reverse reads? Would you prefer it with or without error checking?

I think the scripts in Velvet are fast but will fail badly on bad input. Note that a simple Biopython script to do this is already included with Velvet (a simple version with no error checking). I have written a more robust version too, although it looks like I haven't sent it to Daniel to include in Velvet.

Peter
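To make the interleaving question concrete, here is a minimal sketch in plain Python of an interleaver with basic error checking. It is not the Velvet script, the Biopython script, or any Galaxy tool; the function names `read_fastq`, `base_name` and `interleave` are hypothetical, and it assumes simple four-line FASTQ records with Illumina-style /1 and /2 name suffixes.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("Malformed FASTQ record near: %r" % header)
        if len(seq) != len(qual):
            raise ValueError("Sequence/quality length mismatch in: %r" % header)
        yield header[1:].rstrip("\n"), seq, qual


def base_name(name):
    """Return the read identifier without a trailing /1 or /2 (assumed convention)."""
    parts = name.split()
    ident = parts[0] if parts else name
    if ident.endswith("/1") or ident.endswith("/2"):
        return ident[:-2]
    return ident


def interleave(forward, reverse, out):
    """Write forward and reverse reads alternately; return the number of pairs."""
    rev_records = read_fastq(reverse)
    pairs = 0
    for fwd in read_fastq(forward):
        rev = next(rev_records, None)
        if rev is None:
            raise ValueError("Reverse file has fewer reads than forward file")
        if base_name(fwd[0]) != base_name(rev[0]):
            raise ValueError("Names do not match: %r vs %r" % (fwd[0], rev[0]))
        for name, seq, qual in (fwd, rev):
            out.write("@%s\n%s\n+\n%s\n" % (name, seq, qual))
        pairs += 1
    if next(rev_records, None) is not None:
        raise ValueError("Reverse file has more reads than forward file")
    return pairs


if __name__ == "__main__":
    fwd = io.StringIO("@r1/1\nACGT\n+\nIIII\n")
    rev = io.StringIO("@r1/2\nTTGG\n+\nIIII\n")
    out = io.StringIO()
    interleave(fwd, rev, out)
    print(out.getvalue(), end="")
```

The checks cover the failure modes mentioned above: records that are not well formed, files of different length, and entries whose names do not match in order.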
On 15/12/10 20:39, Peter wrote:
Hi Peter,
> Are you asking for a tool to interleave two FASTQ or FASTA files with matching entries (with matching names in the same order) into one file which alternates forward then reverse read?
Yes, indeed, this is what I am proposing.
> Would you prefer it with or without error checking?
Error checking is best.
> I think the scripts in velvet are fast but will fail horribly with bad input... note there is a simple Biopython script to do this included with velvet already (simple version with no error checking, I have written a more robust version too - it looks like I haven't sent it to Daniel to include in velvet though).
I rolled my own FASTQ paired read interlacer and deinterlacer today, using the Galaxy Python modules in lib/galaxy_utils/. I must say these modules made it quite convenient and efficient to implement error-checking in the (de)interlacing. You can find the scripts here if you're interested: http://bitbucket.org/fangly/galaxy-central

I'll make the XML wrappers tomorrow and test them. Hopefully after this is done, my changes can be pulled into the official Galaxy repository.

Best,
Florent
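As an illustration of the reverse operation, the following is a minimal sketch of a deinterlacer with error checking. It does not use the Galaxy modules in lib/galaxy_utils/ and is not the code in the repository linked above; the names `records` and `deinterlace` are hypothetical, and it assumes four-line FASTQ records whose identifiers end in /1 and /2.

```python
import io
from itertools import islice


def records(handle):
    """Yield FASTQ records as lists of four lines (simple four-line format assumed)."""
    while True:
        rec = [line.rstrip("\n") for line in islice(handle, 4)]
        if not rec:
            return  # end of file
        if len(rec) < 4 or not rec[0].startswith("@") or not rec[2].startswith("+"):
            raise ValueError("Malformed FASTQ record: %r" % rec)
        yield rec


def deinterlace(interleaved, out_fwd, out_rev):
    """Split an interleaved stream into forward and reverse streams; return pair count."""
    recs = records(interleaved)
    pairs = 0
    for fwd in recs:
        rev = next(recs, None)  # the mate must directly follow the forward read
        if rev is None:
            raise ValueError("Odd number of records; last read has no mate")
        fwd_id = fwd[0].split()[0]
        rev_id = rev[0].split()[0]
        if not (fwd_id.endswith("/1") and rev_id.endswith("/2")
                and fwd_id[:-2] == rev_id[:-2]):
            raise ValueError("Not a /1, /2 pair: %s and %s" % (fwd_id, rev_id))
        out_fwd.write("\n".join(fwd) + "\n")
        out_rev.write("\n".join(rev) + "\n")
        pairs += 1
    return pairs


if __name__ == "__main__":
    data = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r1/2\nTTGG\n+\nIIII\n")
    fwd_out, rev_out = io.StringIO(), io.StringIO()
    deinterlace(data, fwd_out, rev_out)
    print(fwd_out.getvalue(), end="")
    print(rev_out.getvalue(), end="")
```

Running the interlacer followed by the deinterlacer on valid input should reproduce the two original files, which makes a convenient round-trip test for the XML wrappers.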
> Hi Peter,
>> Are you asking for a tool to interleave two FASTQ or FASTA files with matching entries (with matching names in the same order) into one file which alternates forward then reverse read?
> Yes, indeed, this is what I am proposing.
>> Would you prefer it with or without error checking?
> Error checking is best.
I'd agree.
>> I think the scripts in velvet are fast but will fail horribly with bad input... note there is a simple Biopython script to do this included with velvet already (simple version with no error checking, I have written a more robust version too - it looks like I haven't sent it to Daniel to include in velvet though).
> I rolled my own FASTQ paired read interlacer and deinterlacer today, using the Galaxy Python modules in lib/galaxy_utils/. I must say these modules made it quite convenient and efficient to implement error-checking in the (de)interlacing. You can find the scripts here if you're interested: http://bitbucket.org/fangly/galaxy-central I'll make the XML wrappers tomorrow and test them. Hopefully after this is done, my changes can be pulled into the official Galaxy repository.
For the deinterlacer, I previously offered to write something like that for Galaxy and was told to submit it to the Tool Shed initially (although it may be merged into the official repository at some point). See "Divide FASTQ file into paired and unpaired reads" on http://community.g2.bx.psu.edu/ for my tool.

I also note you've changed the return behaviour of the Galaxy FASTQ library method get_paired_identifier. That API change could break other parts of Galaxy or third-party tools.

Looking at that Galaxy lib, perhaps I can offer some of my code for identifying Sanger read pairs and the .f and .r suffixes to enhance the class fastqJoiner (it looks like it only handles Illumina /1 and /2 right now, which I think is too narrow).

Peter
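The suffix handling suggested here can be sketched as follows. This is a hypothetical illustration only: it is neither Peter's code nor the Galaxy fastqJoiner API, and the names `SUFFIXES`, `split_pair_suffix` and `is_pair` are invented for the example. It assumes that /1 and .f mark forward reads and that /2 and .r mark reverse reads.

```python
# Assumed suffix conventions: Illumina-style /1 and /2, Sanger-style .f and .r.
SUFFIXES = {"/1": "forward", "/2": "reverse", ".f": "forward", ".r": "reverse"}


def split_pair_suffix(identifier):
    """Return (base name, direction); direction is None if no known suffix is found."""
    for suffix, direction in SUFFIXES.items():
        if identifier.endswith(suffix) and len(identifier) > len(suffix):
            return identifier[:-len(suffix)], direction
    return identifier, None


def is_pair(name_a, name_b):
    """True if the two identifiers share a base name and have opposite directions."""
    base_a, dir_a = split_pair_suffix(name_a)
    base_b, dir_b = split_pair_suffix(name_b)
    return base_a == base_b and {dir_a, dir_b} == {"forward", "reverse"}


if __name__ == "__main__":
    print(split_pair_suffix("read1/1"))
    print(is_pair("clone7.f", "clone7.r"))
```

Note that this sketch accepts mixed conventions (for example read1/1 paired with read1.r); a stricter version would also compare the suffix style.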