Thanks, Jen. I asked the informatics scientists at Hutch to run the QC for me, and both R2 and R4 look OK in the FastQC analysis.

My question is: do I still need to use the Groomer in Galaxy and use the groomed data for further analysis such as TopHat? Should I skip the steps that compute quality statistics and draw boxplots from the groomed data?



-----Original Message-----
From: Jennifer Jackson
Sent: Fri 8/26/2011 7:19 PM
To: galaxy-user
Cc: Peng, Tao
Subject: Re: [galaxy-user] quality score

===>  Please use "Reply All" when responding to this email!<===

Hello Tao,

The tool "NGS: QC and manipulation -> FastQC" (last tool in group) may
be helpful for your project.

In general, sequence with quality scores this low would be considered
unusable. Perhaps double check the options used with the Fastq Groomer
tool? Or check/filter the data before grooming?
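
As a rough illustration of why the Groomer input option matters, here is a minimal Python sketch that decodes one FASTQ quality line under the two common ASCII offsets: 33 (Sanger) and 64 (Illumina 1.3-1.7). This is not Galaxy's code. The function name and the sample quality string are made up for illustration. Choosing the wrong offset shifts every score by 31, which is one way scores can end up looking implausibly low or high.

```python
def decode_quals(qual_line, offset):
    """Convert a FASTQ quality string to integer scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in qual_line]

# Hypothetical quality string, as it might appear in an Illumina 1.3-1.7 file.
quals = "ffffeedd"

as_illumina = decode_quals(quals, 64)  # interpreted as Phred+64
as_sanger = decode_quals(quals, 33)    # interpreted as Phred+33

print(as_illumina)
print(as_sanger)
```

Comparing the two printed lists against the range expected for the sequencer is a quick sanity check on the encoding before grooming.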

This may not be the case for your data, but just in case, please note
that CASAVA 1.8+ now produces both filtered and unfiltered results, and
its output would need the "Sanger" option in the "Fastq Groomer" tool.
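
For context, CASAVA 1.8 read headers carry a filter flag in the second group of fields (read:is_filtered:control:index), where "Y" marks a read that did not pass Illumina's filter. Below is a minimal Python sketch of dropping such reads before grooming, assuming plain four-line FASTQ records. The function names and the two sample records are hypothetical, and this is not how any Galaxy tool is implemented.

```python
def passes_filter(header):
    """Return True unless a CASAVA 1.8 style header marks the read as filtered ('Y')."""
    parts = header.split(" ")
    if len(parts) < 2:
        return True  # no flag present; keep the read
    fields = parts[1].split(":")
    return not (len(fields) >= 2 and fields[1] == "Y")

def keep_passing(lines):
    """Yield four-line FASTQ records whose header passes the filter."""
    for i in range(0, len(lines) - 3, 4):
        record = lines[i:i + 4]
        if passes_filter(record[0]):
            yield record

# Hypothetical two-read example: the second read is flagged as filtered.
fastq = [
    "@EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:ATCACG", "ACGT", "+", "IIII",
    "@EAS139:136:FC706VJ:2:2104:15344:197394 1:Y:18:ATCACG", "TTGA", "+", "####",
]
kept = list(keep_passing(fastq))
print(len(kept))
```

Headers without the second field group (older pipeline versions) are kept as-is in this sketch, since they carry no filter flag to test.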

This prior Q&A explains the filtering:

Hopefully this helps. Please send future questions directly to the
mailing list as the "to" recipient. There is no need to send directly
"to" or as "cc" any of the Galaxy team directly. This helps us to track
and address questions quickly and as a team.


Galaxy team

> Hi Jen, I followed the Galaxy webcast to check the quality of my RNA-seq
> data: one sample seems to have scores above 20 at most bases (R2), but the
> other one is around 6-8 at most bases (R4) (see the attached PDF files).
> Does this mean the R4 RNA-seq data are bad? What exactly does it mean anyway?
> Thanks for your help,
> tao
> -----Original Message-----
> From: Jennifer Jackson
> Sent: Thu 8/18/2011 3:46 PM
> To:
> Cc: Peng, Tao
> Subject: visualization of alignment
> Hello Tao,
> For the Bowtie results, the aligned results may be low because the data
> is RNA and not DNA. TopHat is generally considered a better choice for
> RNA since it allows for bridges over splice sites (introns). The full
> documentation for each program is on each tool's form and/or you can
> contact the tool authors with scientific questions at
> Also, a tutorial and FAQ are available here:
> For visualization, an update that allows the use of a user-specified
> fasta reference genome is coming out very soon. For now, you can view
> annotation by creating a custom genome build, but the actual reference
> will not be included. Use "Visualization -> New Track Browser" and
> follow the instructions for "Is the build not listed here? Add a Custom
> Build".
> Help for using the tool is available here:
> As stated before, please email the mailing list directly and not
> individual team members. Specifically, with a "to" to the mailing list
> (only) and not including team members as a "to" or "cc" unless asked to do
> so when sharing private data. Our internal tracking system and public
> archives rely on this method. Thank you for your future cooperation.
> Best,
> Jen
> Galaxy team
> On 8/18/11 3:15 PM, Peng, Tao wrote:
>  > Hi Jen, I have used Bowtie to align my RNA-seq reads to the HSV-2 genome;
>  > out of 35,000,000 lines, only 621 lines were left when I chose to keep
>  > mapped reads only. How can I visualize these aligned reads on the HSV-2
>  > genome?
>  >
>  > In the panel for the converted SAM to BAM dataset, I tried to use the
>  > data in Trackster, but I am not sure how to build an HSV genome as a
>  > reference.
>  >
>  > I appreciate your help,
>  >
>  >
>  > tao
>  >
> --
> Jennifer Jackson

Jennifer Jackson