RNA-seq Galaxy workflow for PE barcoded samples?

20 Apr 2011

Hello,

I posted this question to the SEQanswers forum but have not received any feedback. I am working with RNA-seq Illumina data files in Galaxy (http://main.g2.bx.psu.edu/). The two files contain 100 bp paired-end reads, multiplexed with barcodes to distinguish samples. The barcodes are the first four bases of the sequences in the s_7_1_sequence.txt file.

Would the following Galaxy workflow be correct?

1. Upload both s_7_1_sequence.txt and s_7_2_sequence.txt to Galaxy with the reference genome selected
2. Run NGS: QC and manipulation --> FASTQ Groomer on each file to convert to Sanger FASTQ
3. Run NGS: QC and manipulation --> FASTQ joiner to combine the data from the two files
4. Run FASTX-TOOLKIT FOR FASTQ DATA --> Barcode Splitter to generate separate FASTQ files for each barcode group
5. Run NGS: RNA Analysis --> Tophat to map the reads from each group to the reference genome

The problem I am having is that if I select paired-end for the library in Tophat, it requests two FASTQ files. Would I have to use FASTQ Splitter to separate the joined FASTQ files? If there is a more standard way to handle these types of barcoded files, I would appreciate hearing about this workflow.
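To make the pairing issue concrete, here is a minimal Python sketch of what I am trying to achieve, written outside Galaxy. It is not how Barcode Splitter or any other Galaxy tool works internally. It assumes both files list the mates in the same order, matches barcodes exactly (no mismatches allowed), and trims the four barcode bases from read 1. The function names, the barcode "ACGT", and the sample name are made up for illustration.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        yield tuple(record)


def split_pairs_by_barcode(reads1, reads2, barcodes, barcode_len=4):
    """Group read pairs by the barcode at the start of read 1.

    barcodes maps barcode sequence -> sample name (hypothetical names).
    Returns {sample: ([read 1 records], [read 2 records])}. Pairs whose
    barcode is not listed are collected under "unmatched", untrimmed.
    Assumes both inputs hold the mates in the same order.
    """
    groups = {}
    for rec1, rec2 in zip(read_fastq(reads1), read_fastq(reads2)):
        header, seq, plus, qual = rec1
        sample = barcodes.get(seq[:barcode_len], "unmatched")
        if sample != "unmatched":
            # Trim the barcode bases and their quality scores from read 1.
            rec1 = (header, seq[barcode_len:], plus, qual[barcode_len:])
        mates1, mates2 = groups.setdefault(sample, ([], []))
        mates1.append(rec1)
        mates2.append(rec2)
    return groups


# Tiny in-memory example standing in for the two sequence files.
example_r1 = io.StringIO("@a/1\nACGTTTTT\n+\nIIIIIIII\n@b/1\nGGGGAAAA\n+\nIIIIIIII\n")
example_r2 = io.StringIO("@a/2\nCCCC\n+\nIIII\n@b/2\nTTTT\n+\nIIII\n")
example_groups = split_pairs_by_barcode(example_r1, example_r2, {"ACGT": "sample_A"})
```

Each barcode group ends up with two lists in matching order, which is the shape a paired-end mapper expects: one forward and one reverse file per sample.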

Thanks very much in advance,
jjw

P.S. Galaxy is an incredibly useful resource.  Thanks!

Whyte, Jeffrey

Jennifer Jackson
