Hello Jianguang,

Your data is paired-end, but it was already split into forward and reverse reads when it was extracted in FASTQ format from the SRA. The 'FASTQ splitter' tool is not needed (this tool literally cuts a joined sequence record into two). What you most likely want to do instead is sort the forward and reverse reads into separate datasets. The 'Manipulate FASTQ' tool in the same tool group would be a good choice. All of the sequences with identifiers ending in ".1" are forward reads, and those ending in ".2" are reverse reads. Run the tool twice on your dataset, once to select each set.

You do not need to run 'FASTQ Groomer' on this data. According to the SRA report, the sequencing technology already produced Phred+33 scaled quality scores. This means that you can simply assign the datatype "fastqsanger" to have the data recognized by the FASTQ tools. Do this as a first step, before 'Manipulate FASTQ', on the original data.

Are you working on the Galaxy Main instance at http://main.g2.bx.psu.edu (http://usegalaxy.org)? If you need more help, please share your history with the question. Use "Options (gear icon) -> Share or Publish", generate a share link, and then email it to the galaxy-bugs@bx.psu.edu mailing list instead (this keeps your history private, since that is an internal list for our team only).

But, hopefully this helps to resolve the issues!

Jen
Galaxy team

On 8/10/12 7:21 AM, Du, Jianguang wrote:
I have a problem splitting a paired-end FASTQ dataset into two separate datasets. To explain the problem clearly, I list the details of what I did with my dataset:
Step 1) My aim is to compare datasets for differential alternative splicing. I downloaded paired-end datasets in FASTQ format from the SRA at NCBI as the original data.
Below is part of the paired-end FASTQ dataset that I downloaded from the SRA at NCBI. Does this dataset look OK?
@SRR192532.1.1 HWI-EAS269:1:4:655:110.1 length=35
GTTTTCTGAGTGAGAAAAGGTGTGTTTGGAGTTTG
+SRR192532.1.1 HWI-EAS269:1:4:655:110.1 length=35
I28II;II*2/<5:++,(..*943F@I.('+.35'
@SRR192532.1.2 HWI-EAS269:1:4:655:110.2 length=35
AAAGATGTTAGTGTTTTATACGGAAAGGATATCTC
+SRR192532.1.2 HWI-EAS269:1:4:655:110.2 length=35
9+*9+7@?F1206,IGI+D122&/0++-.>+6/@?
Step 2) Then I ran FASTQ Groomer with the following settings:
a) Input FASTQ quality scores type: Illumina 1.3-1.7
b) Advanced Options: Hide Advanced Options.
Did I choose the right settings for FASTQ Groomer? Should I use Advanced Options? If yes, what are the settings for Advanced Options?
Below is part of the groomed dataset:
@SRR192532.1.1 HWI-EAS269:1:4:655:110.1 length=35
GTTTTCTGAGTGAGAAAAGGTGTGTTTGGAGTTTG
+SRR192532.1.1 HWI-EAS269:1:4:655:110.1 length=35
*!!**!**!!!!!!!!!!!!!!!!'!*!!!!!!!!
@SRR192532.1.2 HWI-EAS269:1:4:655:110.2 length=35
AAAGATGTTAGTGTTTTATACGGAAAGGATATCTC
+SRR192532.1.2 HWI-EAS269:1:4:655:110.2 length=35
!!!!!!!!'!!!!!*(*!%!!!!!!!!!!!!!!!!
Does the groomed data look right? Is the number representing the member of a pair correct? Here they are ".1" and ".2"; should they be "/1" and "/2"?
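The groomed quality strings above are what one would expect if Phred+33 data were read as Phred+64 (the "Illumina 1.3-1.7" setting) and then written back out as Phred+33. The following is a minimal sketch of that effect, not FASTQ Groomer's actual code: the function name `regroom_as_phred64` is hypothetical, and clamping negative scores to zero is an assumption chosen because it matches the groomed output quoted above.

```python
def regroom_as_phred64(quality: str) -> str:
    """Decode a quality string as Phred+64, then re-encode it as Phred+33.

    Assumption: scores that come out negative are clamped to 0, which is
    encoded as '!' in Phred+33.
    """
    out = []
    for ch in quality:
        score = max(ord(ch) - 64, 0)   # wrong offset for Phred+33 input
        out.append(chr(score + 33))    # re-encode as Phred+33 (fastqsanger)
    return "".join(out)


# Quality line of the first read in the original (ungroomed) data.
original = "I28II;II*2/<5:++,(..*943F@I.('+.35'"
groomed = regroom_as_phred64(original)
print(groomed)
```

Almost every character in the original string has an ASCII code below 64, so nearly all scores collapse to '!' (quality 0). This is why the data should be assigned the "fastqsanger" datatype directly rather than groomed with an Illumina 1.3+ input setting.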
Step 3) Then I ran FASTQ splitter on the groomed file. There are no settings for the splitter. I chose the correct groomed file and then clicked "Execute". Below is the description of the resulting split dataset:
empty format:fastqsanger, database:hg19 Info: Split 0 of 15277248 reads (0.00%).
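Since the reads in this file are already separate records, what is needed is a sort by mate rather than a split. Outside of Galaxy, that sorting could be sketched roughly as below. This is an illustration only, not how 'Manipulate FASTQ' is implemented; the helper names `read_fastq` and `split_by_mate` are hypothetical, and the sketch assumes strict four-line FASTQ records whose first identifier token ends in ".1" or ".2", as in the sample shown earlier.

```python
import io
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, plus line, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield four-line FASTQ records (assumes sequences are not line-wrapped)."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # skip stray blank lines between records
            continue
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, plus, qual


def split_by_mate(handle: TextIO) -> Tuple[List[Record], List[Record]]:
    """Sort records into forward (.1) and reverse (.2) lists by identifier suffix."""
    forward, reverse = [], []
    for record in read_fastq(handle):
        read_id = record[0].split()[0]  # e.g. "@SRR192532.1.1"
        if read_id.endswith(".1"):
            forward.append(record)
        elif read_id.endswith(".2"):
            reverse.append(record)
    return forward, reverse


# The two records quoted in the original message, used as sample input.
sample = io.StringIO(
    "@SRR192532.1.1 HWI-EAS269:1:4:655:110.1 length=35\n"
    "GTTTTCTGAGTGAGAAAAGGTGTGTTTGGAGTTTG\n"
    "+SRR192532.1.1 HWI-EAS269:1:4:655:110.1 length=35\n"
    "I28II;II*2/<5:++,(..*943F@I.('+.35'\n"
    "@SRR192532.1.2 HWI-EAS269:1:4:655:110.2 length=35\n"
    "AAAGATGTTAGTGTTTTATACGGAAAGGATATCTC\n"
    "+SRR192532.1.2 HWI-EAS269:1:4:655:110.2 length=35\n"
    "9+*9+7@?F1206,IGI+D122&/0++-.>+6/@?\n"
)
forward, reverse = split_by_mate(sample)
print(len(forward), "forward,", len(reverse), "reverse")
```

For a file of this size (about 15 million reads), a real script would write each record straight to one of two output files instead of collecting lists in memory.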
Please help me deal with this problem.
Thanks.
Jianguang Du
-- Jennifer Jackson http://galaxyproject.org