Re: [galaxy-user] FASTQ splitter produced empty dataset, please help

10 Aug 2012

Hello Jianguang,

Your data is paired end, but it was already split into forward and 
reverse reads when extracted in FASTQ format from the SRA. The tool 
'FASTQ splitter' is not needed (this tool literally cuts a joined 
sequence record into two). What you most likely want to do instead is 
sort out the forward and reverse reads into separate datasets.

The tool 'Manipulate FASTQ' in the same tool group would be a good 
choice. Sequences whose identifiers end in ".1" are forward reads, and 
those whose identifiers end in ".2" are reverse reads. Run the tool 
twice on your dataset, once to select each set.
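
For reference, the sorting logic can be sketched outside of Galaxy in a few lines of Python. This is only an illustration of the idea, not how 'Manipulate FASTQ' is implemented; the function names (read_fastq, mate_number, split_mates) are made up for the example, and it assumes four-line records with SRA-style identifiers like the ones in your dataset.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, quality


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping the same iterator four times groups the lines into fours.
    yield from zip(stripped, stripped, stripped, stripped)


def mate_number(header: str) -> str:
    """Return the text after the last '.' of the identifier, e.g. '1' or '2'."""
    identifier = header.split()[0]  # e.g. '@SRR192532.1.1'
    return identifier.rsplit(".", 1)[-1]


def split_mates(lines: Iterable[str]) -> Tuple[List[Record], List[Record]]:
    """Sort records into forward ('.1') and reverse ('.2') lists."""
    forward: List[Record] = []
    reverse: List[Record] = []
    for record in read_fastq(lines):
        mate = mate_number(record[0])
        if mate == "1":
            forward.append(record)
        elif mate == "2":
            reverse.append(record)
    return forward, reverse


# The two records quoted in the original message.
demo = [
    "@SRR192532.1.1 HWI-EAS269:1:4:655:110.1 length=35",
    "GTTTTCTGAGTGAGAAAAGGTGTGTTTGGAGTTTG",
    "+SRR192532.1.1 HWI-EAS269:1:4:655:110.1 length=35",
    "I28II;II*2/<5:++,(..*943F@I.('+.35'",
    "@SRR192532.1.2 HWI-EAS269:1:4:655:110.2 length=35",
    "AAAGATGTTAGTGTTTTATACGGAAAGGATATCTC",
    "+SRR192532.1.2 HWI-EAS269:1:4:655:110.2 length=35",
    "9+*9+7@?F1206,IGI+D122&/0++-.>+6/@?",
]

forward, reverse = split_mates(demo)
print(len(forward), "forward,", len(reverse), "reverse")
```

On a real file you would pass an open file handle instead of the demo list and write each record out as it is read rather than collecting everything in memory.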

You do not need to run 'FASTQ Groomer' on this data. According to the SRA 
report, the sequencing technology already produced Phred+33 scaled base 
calls. This means that you can simply assign the datatype to be 
"fastqsanger" to have it be recognized by the FASTQ tools. Do this as a 
first step, before 'Manipulate FASTQ', on the original data.
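
The quality scaling can also be sanity-checked from the data itself. The sketch below is a rough illustration with hypothetical function names, not part of any Galaxy tool. It relies on the fact that the "+64" encodings never use characters below ';' (ASCII 59), so any lower character points to Phred+33. It also simulates reading a Phred+33 string as Phred+64, under the assumption that negative scores are clamped to 0, which is consistent with the mostly '!' quality line in the groomed record quoted in your message.

```python
from typing import Iterable

# Quality line of the first record in the original (ungroomed) dataset.
ORIGINAL = "I28II;II*2/<5:++,(..*943F@I.('+.35'"
# Quality line of the same record after grooming as "Illumina 1.3-1.7".
GROOMED = "*!!**!**!!!!!!!!!!!!!!!!'!*!!!!!!!!"


def looks_like_phred33(quality_lines: Iterable[str]) -> bool:
    """True if any quality character is too low for a '+64' encoding.

    Solexa+64 starts at ';' (59) and Illumina 1.3+ Phred+64 at '@' (64),
    so characters below ';' can only come from a Phred+33 encoding.
    """
    lowest = min(min(q) for q in quality_lines if q)
    return ord(lowest) < 59


def reencode_as_if_phred64(quality: str) -> str:
    """Re-encode to Phred+33 while wrongly treating the input as Phred+64.

    Assumption: scores that come out negative are clamped to 0 ('!').
    """
    return "".join(chr(max(ord(c) - 64, 0) + 33) for c in quality)


simulated = reencode_as_if_phred64(ORIGINAL)
print("Phred+33 input?", looks_like_phred33([ORIGINAL]))
print("simulated:", simulated)
print("matches groomed line:", simulated == GROOMED)
```

This is why simply assigning "fastqsanger" is the right choice here: the scores are already on the scale the FASTQ tools expect, and grooming them as Illumina 1.3-1.7 discards almost all of the quality information.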

Are you working on the Galaxy Main instance at http://main.g2.bx.psu.edu 
(http://usegalaxy.org)? If you need more help, please share your history 
with the question. Use "Options (gear icon) -> Share or Publish", 
generate a share link, and then email the galaxy-bugs@bx.psu.edu mailing 
list instead (that list is internal to our team, so your history stays 
private).

Hopefully this helps to resolve the issues!

Jen
Galaxy team

On 8/10/12 7:21 AM, Du, Jianguang wrote:
...
I have a problem splitting a paired-end FASTQ dataset into two separate
datasets. To explain the problem clearly, I list the details of
what I did with my dataset:
Step 1) My aim is to compare datasets for differential alternative
splicing. I downloaded paired-end datasets in FASTQ format from the SRA at
NCBI as original data.
Below is part of the paired-end FASTQ dataset that I downloaded from the SRA
at NCBI. Does this dataset look OK?
@SRR192532.1.1 HWI-EAS269:1:4:655:110.1 length=35
GTTTTCTGAGTGAGAAAAGGTGTGTTTGGAGTTTG
+SRR192532.1.1 HWI-EAS269:1:4:655:110.1 length=35
I28II;II*2/<5:++,(..*943F@I.('+.35'
@SRR192532.1.2 HWI-EAS269:1:4:655:110.2 length=35
AAAGATGTTAGTGTTTTATACGGAAAGGATATCTC
+SRR192532.1.2 HWI-EAS269:1:4:655:110.2 length=35
9+*9+7@?F1206,IGI+D122&/0++-.>+6/@?
Step 2) Then I ran FASTQ Groomer with the following settings:
a) Input FASTQ quality scores type: Illumina 1.3-1.7
b) Advanced Options: Hide Advanced Options.
Did I choose the right settings for FASTQ Groomer? Should I use Advanced
Options? If yes, what are the settings for Advanced Options?
Below is part of the groomed dataset:
@SRR192532.1.1 HWI-EAS269:1:4:655:110.1 length=35
GTTTTCTGAGTGAGAAAAGGTGTGTTTGGAGTTTG
+SRR192532.1.1 HWI-EAS269:1:4:655:110.1 length=35
*!!**!**!!!!!!!!!!!!!!!!'!*!!!!!!!!
@SRR192532.1.2 HWI-EAS269:1:4:655:110.2 length=35
AAAGATGTTAGTGTTTTATACGGAAAGGATATCTC
+SRR192532.1.2 HWI-EAS269:1:4:655:110.2 length=35
!!!!!!!!'!!!!!*(*!%!!!!!!!!!!!!!!!!
Does the groomed data look right? Is the number representing the member of a
pair correct? Here they are ".1" and ".2"; should they be "/1" and "/2"?
Step 3) Then I ran FASTQ splitter on the groomed file. There are no
settings for the splitter. I chose the right groomed file and then clicked
"Execute". Below is the description of the split dataset:
empty
format:fastqsanger, database:hg19
Info: Split 0 of 15277248 reads (0.00%).
Please help me deal with this problem.
Thanks.
Jianguang Du
-- 
Jennifer Jackson
http://galaxyproject.org