Re: [galaxy-user] need help to split paired-end dataset

10 Aug 2012

Hi Jianguang,

I took a screenshot to simplify the instructions; please see the 
attachment. The tool does two things:

   1 - filters for a match against the identifier with a regular 
expression

   2 - removes those matched reads, leaving the remainder

Run the tool twice, once with each regular expression. Remember that the 
output contains the reads that did NOT match, so the first expression 
below leaves the .2 reads and the second leaves the .1 reads.

.*\.1\sHWI.*  <- this is in the attached screenshot

.*\.2\sHWI.*

Other expressions would work; these are just examples that you can use 
right now, for your exact data. I tried not to be overly cryptic so that 
these could serve as a base for future queries.

@SRR192532.1.1 HWI-EAS269:1:4:655:110.1 length=35
             ^^^^^^

I am matching the characters marked with ^^: the end of the 
identifier, the first space, and the start of the description. The link 
to the regular expression help on the tool form is a good aid for 
understanding how and why this works.
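
If you want to check the expressions outside of Galaxy first, here is a 
minimal Python sketch of the same idea: drop every read whose header 
matches, keep the rest. It is not the tool's own code. The function name 
drop_matching, the second header, and the sequence and quality strings 
are made up for illustration, and it assumes plain four-line FASTQ 
records.

```python
import re

# The two expressions from this message.
FORWARD = re.compile(r".*\.1\sHWI.*")
REVERSE = re.compile(r".*\.2\sHWI.*")


def drop_matching(fastq_lines, pattern):
    """Keep only the records whose header line does NOT match pattern."""
    kept = []
    # Assumption: every record is exactly four lines
    # (header, sequence, '+', quality).
    for i in range(0, len(fastq_lines) - 3, 4):
        record = fastq_lines[i:i + 4]
        if not pattern.match(record[0]):
            kept.extend(record)
    return kept


# Only the first header comes from the message; everything else here is
# invented test data.
example = [
    "@SRR192532.1.1 HWI-EAS269:1:4:655:110.1 length=35",
    "A" * 35,
    "+",
    "I" * 35,
    "@SRR192532.1.2 HWI-EAS269:1:4:655:110.2 length=35",
    "C" * 35,
    "+",
    "I" * 35,
]

reverse_reads = drop_matching(example, FORWARD)  # .1 removed, .2 left
forward_reads = drop_matching(example, REVERSE)  # .2 removed, .1 left

print(reverse_reads[0])
print(forward_reads[0])
```

Each call mirrors one run of the tool: the expression names the reads to 
throw away, so the FORWARD expression produces the reverse-read dataset 
and vice versa.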

Hopefully this helps!

Jen
Galaxy team

On 8/10/12 12:43 PM, Du, Jianguang wrote:
...
I am new to NGS analysis. I need help to solve this problem.
As shown in my previous email/question below, I have some
paired-end datasets in FASTQ format, and I am having trouble splitting
each of these datasets into two datasets (one forward and one reverse).
Jennifer instructed me to assign the datatype to be fastqsanger first
and then run 'Manipulate FASTQ'.
I have two questions:
1) Now that the datasets were already split into forward and reverse
reads when extracted in FASTQ format from the SRA, can I use them just
as single end data?
2) If I do need to split each dataset into two datasets, how should I
choose the settings when I run "Manipulate FASTQ"?
Thanks.
Jianguang
-- 
Jennifer Jackson
http://galaxyproject.org