Yes, you have this correct. Please use the Barcode Splitter and FASTQ
Splitter tools as you describe. Creating a workflow from your history
after running on one dataset (if you haven't already) would simplify
running the same analysis on future datasets.
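
For anyone who wants to see the logic outside Galaxy, here is a minimal
Python sketch of what the join / barcode split / split steps accomplish:
read pairs are grouped by the first four bases of read 1 while the mates
stay together. It is not the Galaxy or FASTX-Toolkit implementation. It
assumes plain four-line FASTQ records, both files in the same read order,
and exact barcode matches only. The function names, the barcode table, and
the choice to trim the barcode from read 1 are all hypothetical.

```python
from collections import defaultdict
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines.

    Assumes every record is exactly four lines long.
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return
        header, seq, _plus, qual = record
        yield header, seq, qual


def split_pairs_by_barcode(r1_lines, r2_lines, barcodes, barcode_len=4):
    """Group read pairs by the barcode at the start of read 1.

    `barcodes` maps a barcode string to a (hypothetical) sample name.
    The barcode is trimmed from read 1; read 2 is kept unchanged.
    Pairs with an unlisted barcode are collected under "unmatched".
    """
    groups = defaultdict(list)
    pairs = zip(read_fastq(r1_lines), read_fastq(r2_lines))
    for (h1, s1, q1), mate2 in pairs:
        tag = s1[:barcode_len]
        if tag in barcodes:
            key = barcodes[tag]
            mate1 = (h1, s1[barcode_len:], q1[barcode_len:])
        else:
            key = "unmatched"
            mate1 = (h1, s1, q1)
        groups[key].append((mate1, mate2))
    return dict(groups)
```

Called with the lines of s_7_1_sequence.txt and s_7_2_sequence.txt and a
table such as {"ACGT": "sample1"}, this returns one list of (mate 1,
mate 2) pairs per sample, which could then be written out as two FASTQ
files per barcode group for a paired-end Tophat run.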
Apologies for the delay in replying.
On 4/20/11 6:52 AM, Whyte, Jeffrey wrote:
I posted to the seqanswers forum, but have not received any feedback. I am working with
RNA-seq Illumina data files in Galaxy (http://main.g2.bx.psu.edu/). The two files are
100bp paired-end reads, multiplexed with barcoding to distinguish samples. The barcodes
are the first four bases of the sequences in the s_7_1_sequence.txt file.
Would the following Galaxy workflow be correct?
1. Upload both s_7_1_sequence.txt and s_7_2_sequence.txt to Galaxy with the reference
2. Run NGS: QC and manipulation --> FASTQ Groomer on each file to convert to Sanger format
3. Run NGS: QC and manipulation --> FASTQ joiner to combine the data from the two files
4. Run FASTX-TOOLKIT FOR FASTQ DATA --> Barcode Splitter to generate separate FASTQ
files for each barcode group
5. Run NGS: RNA Analysis --> Tophat to map the reads from each group to the reference
The problem I am having is that if I select paired-end for the library in Tophat, it
requests two FASTQ files. Would I have to use FASTQ Splitter to separate the joined FASTQ
files? If there is a more standard way to handle these types of barcoded files, I would
appreciate hearing about this workflow.
Thanks very much in advance,
P.S. Galaxy is an incredibly useful resource. Thanks!