Re: [galaxy-user] Pre-processing of Illumina RNA-Seq paired end data

22 Feb 2012

Hi,

I think you need to first remove the adaptors and then trim the reads;
that is probably the correct order. As for the second part of the question,
you could try a rudimentary approach: search for each read's sequence header
in the other file. I have also seen these different sizes in the R1 and R2
read files, but taken together almost 90% of the reads turned out to be
true pairs.
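A rough sketch of the header-matching idea, assuming plain four-line FASTQ records and mate names that differ only by a trailing /1 or /2 (or only after the first space, as in newer Casava-style headers). The names `read_fastq`, `read_id`, `repair_pairs` and `write_pairs` are hypothetical helpers, not part of any Galaxy tool. It holds all of read 2 in memory, which may be too much for a full lane.

```python
def read_fastq(lines):
    """Yield (header, seq, plus, qual) tuples from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        yield header, seq, plus, qual


def read_id(header):
    """Mate-independent read name: first token after '@', minus any /1 or /2."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def repair_pairs(r1_lines, r2_lines):
    """Match reads across the two files by name.

    Returns (pairs, r1_singles, r2_singles); pairs follow read 1's order.
    """
    # Index all of read 2 by name (keeps the whole file in memory).
    r2_by_id = {read_id(rec[0]): rec for rec in read_fastq(r2_lines)}
    pairs, r1_singles = [], []
    for rec in read_fastq(r1_lines):
        mate = r2_by_id.pop(read_id(rec[0]), None)
        if mate is None:
            r1_singles.append(rec)  # no partner, e.g. its mate was dropped upstream
        else:
            pairs.append((rec, mate))
    # Whatever is left in the index had no partner in read 1.
    r2_singles = list(r2_by_id.values())
    return pairs, r1_singles, r2_singles


def write_pairs(pairs, out1, out2):
    """Write matched mates to two file-like objects in the same order."""
    for rec1, rec2 in pairs:
        out1.write("\n".join(rec1) + "\n")
        out2.write("\n".join(rec2) + "\n")
```

Writing `pairs` back out as two files, mate by mate in the same order, should give TopHat synchronized inputs; the singletons can be set aside or handled separately.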

Hope this helps,
Sameet

On Wed, Feb 22, 2012 at 12:29 PM, Ravi Karra <ravi.karra@gmail.com> wrote:
...
Hello,
I have Illumina 76bp paired end data for a zebrafish RNA-seq experiment
and am basically stuck while trying to pre-process my data prior to using
Tophat/CuffDiff.
For each sample, I have a read1 fastq file and a paired read2 fastq file.
After using FASTQ Groomer, I trimmed the ends using FASTQ quality trimmer
with a threshold quality score of 20 and a window size of 1 (I think that
will essentially lop off the end of the read until the quality score is >=
20). Next, I trimmed the adapters using Clip.
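To make the order of operations concrete, here is a minimal sketch assuming Sanger-encoded (Phred+33) qualities, the Groomer's usual output. `clip_adapter` is a simple exact-match stand-in, not how Galaxy's Clip tool works internally, and `ADAPTER` is a placeholder sequence. `trim_3prime` follows the reading of a window size of 1 given above: drop bases from the 3' end while their score is below 20. All function names here are hypothetical.

```python
def trim_3prime(seq, qual, threshold=20, offset=33):
    """Window-of-1 trim: drop 3' bases while their Phred score is below threshold."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


def clip_adapter(seq, qual, adapter, min_overlap=3):
    """Cut the read at the adapter (exact match only; a simplified stand-in).

    Also catches an adapter that runs off the 3' end, as long as at least
    min_overlap of its leading bases are present.
    """
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], qual[:idx]
    # Longest partial adapter prefix sitting at the very end of the read.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:len(seq) - k], qual[:len(seq) - k]
    return seq, qual


def preprocess(seq, qual, adapter):
    """Clip the adapter first, then quality-trim what is left."""
    seq, qual = clip_adapter(seq, qual, adapter)
    return trim_3prime(seq, qual)


ADAPTER = "TGGAATTC"  # placeholder; substitute your library's adapter sequence
```

Clipping first means the adapter is still intact when you search for it, whereas quality trimming first can leave only a short adapter fragment that an exact match misses. That is one plausible reason for the adapters-then-trim order, not a rule taken from the tools' documentation. Note also that a short `min_overlap` will sometimes clip genuine sequence that happens to end like the adapter.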
What I am left with is a modified read1 fastq file and a modified read2
file, where the pairs are not in the same order and some reads are left
without pairs. From what I have read, I don't think TopHat can incorporate
paired-end data that is out of order. I tried to get around the ordering
issue using FASTQ joiner, but this tool is not able to join the reads
(it returns 0 joined reads). I am not really sure why FASTQ joiner didn't
work for me and am looking for suggestions of what to try next.
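Whatever the joiner's exact matching rule, zero joined reads suggests that no identifiers matched as written. A quick check, sketched here as a hypothetical `pairing_report` helper (not a Galaxy tool), compares the header names in the two files with and without a trailing /1 or /2 and counts how many mates sit at the same position. It assumes plain four-line records with no blank lines.

```python
from itertools import islice


def headers(lines):
    """First whitespace-delimited token of each header (every 4th line), without the '@'."""
    return [line[1:].split()[0] for line in islice(lines, 0, None, 4)]


def strip_mate(name):
    """Drop a trailing /1 or /2 mate suffix if present."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def pairing_report(r1_lines, r2_lines):
    """Summarize how well the read names in two FASTQ files line up."""
    h1, h2 = headers(r1_lines), headers(r2_lines)
    n1 = [strip_mate(h) for h in h1]
    n2 = [strip_mate(h) for h in h2]
    return {
        "r1_reads": len(h1),
        "r2_reads": len(h2),
        "shared_exact": len(set(h1) & set(h2)),
        "shared_ignoring_mate_suffix": len(set(n1) & set(n2)),
        "same_position": sum(a == b for a, b in zip(n1, n2)),
    }
```

Open file objects can be passed directly, since iterating a file yields its lines. A high `shared_ignoring_mate_suffix` together with a low `same_position` would indicate the mates are all there but out of sync; if even the suffix-stripped count is near zero, the headers themselves may have been altered somewhere upstream.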
Thanks!
ravi
-- 
Sameet Mehta, Ph.D.,
Phone:  (301) 842-4791