Hello,
I posted to the seqanswers forum, but have not received any feedback. I am working with RNA-seq Illumina data files in Galaxy (http://main.g2.bx.psu.edu/). The two files are 100bp paired-end reads, multiplexed with barcoding to distinguish samples. The barcodes are the first four bases of the sequences in the s_7_1_sequence.txt file.
Would the following Galaxy workflow be correct?
1. Upload both s_7_1_sequence.txt and s_7_2_sequence.txt to Galaxy with the reference genome selected
2. Run NGS: QC and manipulation --> FASTQ Groomer on each file to convert to Sanger FASTQ
3. Run NGS: QC and manipulation --> FASTQ joiner to combine the data from the two files
4. Run FASTX-TOOLKIT FOR FASTQ DATA --> Barcode Splitter to generate separate FASTQ files for each barcode group
5. Run NGS: RNA Analysis --> Tophat to map the reads from each group to the reference genome
The problem I am having is that if I select paired-end for the library in Tophat, it requests two FASTQ files. Would I have to use FASTQ Splitter to separate the joined FASTQ files? If there is a more standard way to handle these types of barcoded files, I would appreciate hearing about this workflow.
Thanks very much in advance,
jjw
P.S. Galaxy is an incredibly useful resource. Thanks!
Hello Jeffrey,
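Yes, you have this correct. Please use the Barcode Splitter and then the FASTQ Splitter tool as you describe: joining the two files first keeps each read pair together while the reads are sorted by barcode, and splitting afterwards gives Tophat the two FASTQ files it requests for a paired-end library. Creating a workflow from your history (if you haven't already) after running the analysis on one dataset would simplify running the same analysis on future datasets.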
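For anyone who wants to see the idea outside of Galaxy, here is a minimal Python sketch of what the join, barcode-split, and split-again steps accomplish together: read pairs are grouped by the barcode at the start of read 1, and the two mates stay in step. This is not how the Galaxy tools are implemented. The function names, the `trim` option, the "unmatched" group, and the sample names are all made up for illustration, and it only handles exact barcode matches in two files that list the pairs in the same order.

```python
from collections import defaultdict
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from FASTQ lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def demultiplex_pairs(lines_1, lines_2, barcodes, barcode_len=4, trim=False):
    """Group read pairs by the barcode at the start of read 1.

    `barcodes` maps a barcode string to a (hypothetical) sample name.
    Returns {sample_name: [(record_1, record_2), ...]}; pairs whose
    barcode is not listed are collected under "unmatched".
    Assumes both inputs list the pairs in the same order; zip() stops
    silently at the shorter input.
    """
    groups = defaultdict(list)
    for rec1, rec2 in zip(read_fastq(lines_1), read_fastq(lines_2)):
        header, seq, plus, qual = rec1
        sample = barcodes.get(seq[:barcode_len], "unmatched")
        if trim and sample != "unmatched":
            # Optionally drop the barcode bases (and their qualities) from read 1.
            rec1 = (header, seq[barcode_len:], plus, qual[barcode_len:])
        groups[sample].append((rec1, rec2))
    return dict(groups)
```

With real data you would pass two open file handles (for example the groomed versions of s_7_1_sequence.txt and s_7_2_sequence.txt) and then write each group's first and second records to two separate FASTQ files, which is the pair of inputs Tophat asks for in paired-end mode.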
Apologies for the delay in reply,
Best,
Jen
Galaxy team
On 4/20/11 6:52 AM, Whyte, Jeffrey wrote:
Hello,
I posted to the seqanswers forum, but have not received any feedback. I am working with RNA-seq Illumina data files in Galaxy (http://main.g2.bx.psu.edu/). The two files are 100bp paired-end reads, multiplexed with barcoding to distinguish samples. The barcodes are the first four bases of the sequences in the s_7_1_sequence.txt file.
Would the following Galaxy workflow be correct?
- Upload both s_7_1_sequence.txt and s_7_2_sequence.txt to Galaxy with the reference genome selected
- Run NGS: QC and manipulation --> FASTQ Groomer on each file to convert to Sanger FASTQ
- Run NGS: QC and manipulation --> FASTQ joiner to combine the data from the two files
- Run FASTX-TOOLKIT FOR FASTQ DATA --> Barcode Splitter to generate separate FASTQ files for each barcode group
- Run NGS: RNA Analysis --> Tophat to map the reads from each group to the reference genome
The problem I am having is that if I select paired-end for the library in Tophat, it requests two FASTQ files. Would I have to use FASTQ Splitter to separate the joined FASTQ files? If there is a more standard way to handle these types of barcoded files, I would appreciate hearing about this workflow.
Thanks very much in advance, jjw
P.S. Galaxy is an incredibly useful resource. Thanks!