FastQ Groomer and Compute Quality Statistics

On Wed, Jun 1, 2011 at 5:02 PM, John David Osborne <ozborn@uab.edu> wrote:
I noticed that for our new Illumina data (which is generated in Sanger format) the FastQ Groomer output is identical to the Illumina FastQ input file. I was hoping to use the raw FastQ files directly as input (saving disk space) for computing quality statistics to look at box plots, but the tool "Compute Quality Statistics" appears to require that the data have been run through FastQ Groomer first. Is there a way to get around this? Is this a bug, or, as I assume, some sort of safety measure built into the tool?
-John

If you know your data is already in Sanger FASTQ format, you can specify this when uploading the data into Galaxy. Alternatively, use the "pencil" icon to edit the dataset attributes and change the file type (to fastqsanger). This doesn't change the file itself on disk.
Peter
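If you are not sure whether a file is already Sanger FASTQ, a quick look at the range of quality characters usually tells you. The sketch below is a heuristic of our own, not the FastQ Groomer's code: it assumes plain 4-line FASTQ records, and the function names are hypothetical. It relies on Sanger using a Phred+33 offset and Illumina 1.3+ using Phred+64, so converting Illumina 1.3+ to Sanger is a shift of 31 in every quality character.

```python
# Minimal sketch (not Galaxy's FastQ Groomer code): guess the quality
# encoding of a FASTQ file and, if needed, shift Illumina 1.3+ (Phred+64)
# qualities to Sanger (Phred+33). Assumes plain 4-line FASTQ records.

def guess_encoding(lines):
    """Return 'sanger', 'illumina1.3+' or 'ambiguous' for FASTQ lines."""
    lowest = None
    for i, line in enumerate(lines):
        if i % 4 != 3:  # the 4th line of each record holds the qualities
            continue
        for ch in line.rstrip("\r\n"):
            code = ord(ch)
            if lowest is None or code < lowest:
                lowest = code
    if lowest is None:
        return "ambiguous"
    if lowest < 59:
        # Characters below ';' (ASCII 59) only occur with an offset of 33.
        return "sanger"
    if lowest >= 64:
        # Consistent with Phred+64, but a Sanger file whose qualities are
        # all >= Q31 would look the same, so treat this as a hint only.
        return "illumina1.3+"
    return "ambiguous"  # ASCII 59-63 suggests the older Solexa scale


def illumina13_to_sanger(quality_line):
    """Shift one Phred+64 quality string to Phred+33 (same Phred values)."""
    return "".join(chr(ord(ch) - 31) for ch in quality_line)
```

If guess_encoding reports "sanger", grooming leaves the qualities unchanged, which matches the identical groomer output described above; setting the datatype as Peter describes is then enough.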

You can avoid the space and time overhead of grooming and get comprehensive QC reports by using the new wrapper for FastQC (under NGS: QC). It accepts FASTQ of any flavour (and BAM), groomed or not, and produces a superset of the Compute Quality Statistics output without the need for an intermediate step. Highly recommended.
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Ross Lazarus MBBS MPH; Associate Professor, Harvard Medical School; Director of Bioinformatics, Channing Lab; Tel: +1 617 505 4850; Head, Medical Bioinformatics, BakerIDI; Tel: +61 385321444;
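To make concrete what a per-position quality box plot summarises, here is a minimal sketch that computes minimum, quartiles, median and maximum of the Phred scores at each read position directly from raw FASTQ. It is only an illustration, not the code behind Compute Quality Statistics or FastQC; it assumes Sanger (Phred+33) qualities and plain 4-line records, and all names in it are hypothetical. Quartiles use linear interpolation between sorted values.

```python
# Minimal sketch (not the Compute Quality Statistics or FastQC code):
# per-position quality summaries of the kind a quality box plot is drawn
# from. Assumes Sanger (Phred+33) qualities and plain 4-line FASTQ records.

def _quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def per_position_stats(lines, offset=33):
    """Return one dict of box-plot statistics per read position (1-based)."""
    columns = []  # columns[i] collects the Phred scores seen at position i
    for i, line in enumerate(lines):
        if i % 4 != 3:  # only the quality line of each record
            continue
        for pos, ch in enumerate(line.rstrip("\r\n")):
            if pos == len(columns):  # first read reaching this position
                columns.append([])
            columns[pos].append(ord(ch) - offset)
    stats = []
    for pos, scores in enumerate(columns):
        s = sorted(scores)
        stats.append({
            "pos": pos + 1,
            "count": len(s),
            "min": s[0],
            "q1": _quantile(s, 0.25),
            "median": _quantile(s, 0.5),
            "q3": _quantile(s, 0.75),
            "max": s[-1],
        })
    return stats
```

Because the offset is a parameter, the same function would work on Illumina 1.3+ data with offset=64, which is one way to see why grooming is not strictly needed just to summarise qualities.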

On Thu, Jun 9, 2011 at 10:12 AM, John David Osborne <ozborn@uab.edu> wrote:
Thanks Ross. I don't see it under my local install; are there any pre-written scripts to integrate it with a local Galaxy instance? I assume you are talking about this tool here: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
-John

Hi, John. It's on main and test, i.e. the FastQC wrapper is distributed with the current stable and central branches, so your local tool_conf.xml may be out of date, since it's not automagically refreshed from the distributed .sample file. If you diff your local tool_conf.xml against the current distributed sample, you should see the lines you need to add, which point to rgenetics/rgFastQC.xml:

    Thu,Jun 09 at 10:22am grep -i fastqc tool_conf.xml
    <label text="FastQC: fastq/sam/bam" id="fastqcsambam" />
    <tool file="rgenetics/rgFastQC.xml" />

Like everything else, you'll want to install the jar locally so it can be found by the cluster. The default location is tool-data/shared/jars/FastQC; installing it there lets the tool find the fastqc Perl script (yes, I know... but it's worth it!):

    <command interpreter="python">
      rgFastQC.py -i $input_file -d $html_file.files_path -o $html_file -n "$out_prefix" -f $input_file.ext -e ${GALAXY_DATA_INDEX_DIR}/shared/jars/FastQC/fastqc
    </command>

I hope this helps.
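The "diff against the distributed sample" step can be sketched in a few lines of Python with the standard difflib module. This is our own illustration, not a Galaxy utility: it assumes the sample file is named tool_conf.xml.sample next to tool_conf.xml, and the function names are hypothetical. It reports only the lines that exist on the sample side, optionally filtered by a keyword such as "fastqc", so local customisations (which show up as removed lines) are ignored.

```python
# Minimal sketch: list the lines a distributed tool_conf.xml.sample adds
# relative to a local tool_conf.xml, optionally filtered by a keyword,
# in the spirit of running `diff` on the two files.
import difflib


def missing_lines(local_lines, sample_lines, keyword=None):
    """Return stripped lines present only on the sample side of the diff."""
    added = []
    diff = difflib.unified_diff(local_lines, sample_lines, lineterm="")
    for line in diff:
        # '+' marks lines added on the sample side; skip the '+++' file header.
        if line.startswith("+") and not line.startswith("+++"):
            text = line[1:].strip()
            if keyword is None or keyword.lower() in text.lower():
                added.append(text)
    return added


def missing_from_files(local_path, sample_path, keyword=None):
    """File-based wrapper; both paths are whatever your install uses."""
    with open(local_path) as f_local, open(sample_path) as f_sample:
        local = f_local.read().splitlines()
        sample = f_sample.read().splitlines()
    return missing_lines(local, sample, keyword)
```

For example, missing_from_files("tool_conf.xml", "tool_conf.xml.sample", "fastqc") run from the Galaxy root would be expected to list the label and tool lines shown in the grep output above, if they are missing locally.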

Hi guys, We are trying to load Illumina data into our local Galaxy instance. The files are between 700 MB and 2.2 GB. Files below 2 GB load in less than 5 minutes, but files larger than 2 GB don't upload at all. We installed Galaxy locally because we thought loading files would be faster than on the public server. Any suggestions for solving this problem are highly appreciated.
Tilahun
participants (4)
- John David Osborne
- Peter Cock
- Ross
- Tilahun Abebe