Combining the paired reads from Illumina run

Hi,
I have two FASTQ files with the forward (/1) and reverse (/2) paired reads. The reads are not in the same order in the two files, some reads are missing their mate, and the files are 8 GB each with about 30 million reads each.
I am trying to pull out all the paired reads for which both forward and reverse reads exist. Can I use a combination of FASTQ tools in Galaxy to do this?
Thanks!
-Surya

Are these Illumina or SOLiD reads?
Tx,
anton
___________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
Anton Nekrutenko http://nekrut.bx.psu.edu http://usegalaxy.org

These are Illumina reads.
-S.

You can try converting FASTQ to tabular (NGS: QC and Manipulation), joining (Join, Subtract and Group) the two files on the read IDs (provided they do not have /1 and /2), splitting the result into two files with Cut (Text Manipulation), and going back to FASTQ with Tabular to FASTQ (NGS: QC and Manipulation). With 30 million reads this will likely take some time, though.
Thanks,
anton
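
For readers working outside Galaxy, the same join-on-ID idea can be sketched in plain Python. This is only an illustration under stated assumptions, not a Galaxy tool: the function names (`read_fastq`, `base_id`, `pair_reads`) are made up for this sketch, records are assumed to be unwrapped four-line FASTQ, and the reverse reads are held in a dictionary, which may need a lot of memory for files of this size.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) for each four-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is read and discarded
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def base_id(header):
    """Drop the leading '@' and a trailing /1 or /2 so both mates share one key."""
    name = header[1:].partition(" ")[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def pair_reads(fwd_handle, rev_handle, fwd_out, rev_out):
    """Write only reads whose mate exists, in the same order in both outputs."""
    # Assumption: all reverse reads fit in memory at once.
    reverse = {base_id(h): (h, s, q) for h, s, q in read_fastq(rev_handle)}
    kept = 0
    for h, s, q in read_fastq(fwd_handle):
        mate = reverse.get(base_id(h))
        if mate is None:
            continue  # forward read without a reverse mate is dropped
        fwd_out.write(f"{h}\n{s}\n+\n{q}\n")
        rev_out.write("{}\n{}\n+\n{}\n".format(*mate))
        kept += 1
    return kept


if __name__ == "__main__":
    # Tiny in-memory example; real use would pass open file handles instead.
    fwd = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGGG\n+\nIIII\n")
    rev = io.StringIO("@r1/2\nTTTT\n+\nIIII\n")
    out1, out2 = io.StringIO(), io.StringIO()
    print(pair_reads(fwd, rev, out1, out2))
```

The output follows the order of the forward file, so the two result files stay in step with each other even though the inputs were not.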

Hi Anton,
Thank you for the tip. The sequence names do end in /1 and /2, but that can be fixed using the Manipulate FASTQ tool, right?
-Surya

Yes, in a hacky way: you translate "/1" into something else, such as two spaces, or your favorite chemical element such as "He" ;)
a.

Hi Surya,
I made Galaxy scripts, FASTQ interlacer and de-interlacer, to do exactly what you are describing:
https://bitbucket.org/fangly/galaxy-central/changeset/3fa11cf2730d
The tools extend the Galaxy Python API and therefore need Galaxy to work. Unfortunately, FASTQ interlacer and de-interlacer are still waiting to be committed to the Galaxy development repository by a Galaxy maintainer.
Florent
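
To make the term concrete, interlacing means writing each forward read immediately followed by its reverse mate in a single file. The following is a rough sketch of that idea only; it is not the code from the changeset above and does not use the Galaxy Python API. The names `records`, `mate_key`, and `interlace` are hypothetical, unwrapped four-line FASTQ is assumed, and reads without a mate are sent to a separate output as one possible way to handle them.

```python
import io


def records(handle):
    """Yield each FASTQ record as a list of its four lines, newlines stripped."""
    while True:
        block = [handle.readline().rstrip("\n") for _ in range(4)]
        if not block[0]:
            return
        yield block


def mate_key(header):
    """Read name without the '@' and without a trailing /1 or /2."""
    name = header[1:]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def write_record(out, rec):
    """Write one four-line record back out."""
    out.write("\n".join(rec) + "\n")


def interlace(fwd, rev, paired_out, singles_out):
    """Alternate forward and reverse mates; unpaired reads go to singles_out."""
    # Assumption: the reverse file is small enough to hold in memory.
    waiting = {mate_key(rec[0]): rec for rec in records(rev)}
    for rec in records(fwd):
        mate = waiting.pop(mate_key(rec[0]), None)
        if mate is None:
            write_record(singles_out, rec)
        else:
            write_record(paired_out, rec)
            write_record(paired_out, mate)
    # Whatever is left are reverse reads whose forward mate never appeared.
    for rec in waiting.values():
        write_record(singles_out, rec)


if __name__ == "__main__":
    fwd = io.StringIO("@a/1\nAC\n+\nII\n@b/1\nGG\n+\nII\n")
    rev = io.StringIO("@c/2\nTT\n+\nII\n@a/2\nCA\n+\nII\n")
    paired, singles = io.StringIO(), io.StringIO()
    interlace(fwd, rev, paired, singles)
    print(paired.getvalue(), end="")
```

Splitting the interlaced file back into two files (de-interlacing) would then amount to sending alternate records to separate outputs.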

Hi Florent,
This looks great. Hope it gets committed into the repository soon.
Best,
Surya
Participants (3): Anton Nekrutenko, Florent Angly, Surya Saha