Combining the paired reads from an Illumina run
Hi, I have two FASTQ files with the forward (/1) and reverse (/2) paired reads. The reads are not in the same order in the two files, some pairs are missing, and the files are 8 GB each with about 30 million reads each. I am trying to pull out all the paired reads for which both the forward and reverse reads exist. Can I use a combination of FASTQ tools in Galaxy to do this? Thanks! -Surya
Are these Illumina or SOLiD reads? Tx, anton On Mar 29, 2011, at 11:29 AM, Surya Saha wrote:
Anton Nekrutenko http://nekrut.bx.psu.edu http://usegalaxy.org
You can try converting FASTQ to tabular (NGS: QC and Manipulation), joining (Join, Subtract and Group) the two files on IDs (provided they do not have /1 and /2), splitting into two files with Cut (Text Manipulation), and going back into FASTQ with Tabular to FASTQ (NGS: QC and Manipulation). With 30 million reads this will likely take some time though. Thanks, anton On Mar 29, 2011, at 11:38 AM, Surya Saha wrote:
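For readers working outside Galaxy, the convert, join, and split steps described above can be approximated in a short script. This is only a minimal sketch, not a Galaxy tool: the function names `read_id`, `index_fastq`, and `write_common_pairs` are made up for illustration, and it assumes plain four-line FASTQ records whose names end in /1 or /2. It indexes the reverse file by byte offset rather than loading it, but the index for 30 million reads will still need a few GB of memory, and the random seeks will be slow on large files.

```python
def read_id(header):
    """Return the read name from a FASTQ header line (bytes), without /1 or /2."""
    name = header[1:].split()[0]  # drop the leading "@" and any description
    if name.endswith((b"/1", b"/2")):
        name = name[:-2]
    return name


def index_fastq(handle):
    """Map read name -> byte offset of its record in a binary FASTQ handle."""
    index = {}
    while True:
        offset = handle.tell()
        header = handle.readline()
        if not header:
            break  # end of file
        for _ in range(3):  # skip sequence, "+" line, and quality line
            handle.readline()
        index[read_id(header)] = offset
    return index


def write_common_pairs(fwd_in, rev_in, fwd_out, rev_out):
    """Write reads present in both inputs, in forward-file order; return the pair count."""
    rev_index = index_fastq(rev_in)
    kept = 0
    while True:
        record = [fwd_in.readline() for _ in range(4)]
        if not record[0]:
            break  # end of forward file
        offset = rev_index.get(read_id(record[0]))
        if offset is None:
            continue  # no mate in the reverse file, drop this read
        rev_in.seek(offset)
        fwd_out.writelines(record)
        rev_out.writelines(rev_in.readline() for _ in range(4))
        kept += 1
    return kept
```

All four handles must be opened in binary mode (`"rb"` for the inputs, `"wb"` for the outputs) so that `tell()` and `seek()` work on byte offsets. Both output files come out in the same order, which most paired-end tools expect.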
Hi Anton, Thank you for the tip. The sequence names do end in /1 and /2, but that can be fixed using the Manipulate FASTQ tool, right? -Surya On Tue, Mar 29, 2011 at 3:46 PM, Anton Nekrutenko <anton@bx.psu.edu> wrote:
In a hacky way: you translate "/1" (and likewise "/2") into something else, such as two spaces "  ", or your favorite chemical element such as "He" ;) a. On Mar 29, 2011, at 4:00 PM, Surya Saha wrote:
Hi Surya, I made Galaxy scripts, FASTQ interlacer and de-interlacer, to do exactly what you are describing: https://bitbucket.org/fangly/galaxy-central/changeset/3fa11cf2730d The tools extend the Galaxy Python API and therefore need Galaxy to work. Unfortunately, FASTQ interlacer and de-interlacer are still waiting to be committed to the Galaxy development repository by a Galaxy maintainer. Florent On 30/03/11 01:29, Surya Saha wrote:
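To illustrate what interlacing means in general, here is a minimal sketch. It is not Florent's implementation and does not use the Galaxy API; the names `interlace` and `deinterlace` are hypothetical, and it assumes the forward and reverse records have already been matched and are in the same order.

```python
def interlace(fwd_records, rev_records):
    """Yield mates alternately: fwd1, rev1, fwd2, rev2, ...

    Assumes both inputs are already paired and in the same order;
    zip() stops at the shorter input.
    """
    for fwd, rev in zip(fwd_records, rev_records):
        yield fwd
        yield rev


def deinterlace(records):
    """Split an interlaced sequence back into (forward, reverse) lists."""
    fwd, rev = [], []
    for position, record in enumerate(records):
        # Even positions hold forward reads, odd positions their mates.
        (fwd if position % 2 == 0 else rev).append(record)
    if len(fwd) != len(rev):
        raise ValueError("odd number of records: a mate is missing")
    return fwd, rev
```

Each record here can be any object, for example the four lines of one FASTQ entry joined into a single string.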
participants (3)
- Anton Nekrutenko
- Florent Angly
- Surya Saha