Hello Kristen,
Our RNA-seq tutorial and FAQ can help out with the general
workflow:
https://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise
https://main.g2.bx.psu.edu/u/jeremy/p/transcriptome-analysis-faq
An iGenomes reference annotation GTF dataset for mm9 is also
available in the Data Libraries (import "genes.gtf" into your
history; please ignore the other content, as it is under
revision):
http://usegalaxy.org -> Shared Data -> Data Libraries ->
iGenomes -> mm9
To address your questions: one key source of confusion may be
the difference between a "reference genome" and a "reference
annotation" dataset.
* "reference genome" = the genomic sequence (sourced in .fasta
format) that the reads are mapped against with TopHat, and that
serves as the scaffold for the downstream RNA-seq tools. Since
you are using mm9, selecting the "built-in index" for mm9 is an
appropriate choice. A reference genome provides no annotation
beyond genomic positional coordinates. Mapping tools, including
TopHat, have parameters that control whether only the best hit
or all hits are kept for each read; it sounds as if you need to
adjust these parameters in your run. The filter you ran
(question #2) may have removed most or all hits. (If it was set
to keep reads with the 0x100 "map is not primary" flag set,
only the secondary alignments of multi-mapped reads would
remain.) Check the output from the SAM filter: was it greatly
reduced or empty? If so, re-run TopHat with parameters that
keep only the best hit from the start, then go straight to
Cufflinks without filtering through SAMtools. Help is on the
tool form itself and in the links to the manual.
* "reference annotation" = known transcripts (sourced in .gtf
or .gff3 format) that are also mapped to the reference genome.
These transcript annotations are most useful when they contain
gene, transcript start site, and other key attributes that the
Cuff* tools can interpret. The annotation can guide assembly at
various levels (loosely or strictly), depending on how the tool
parameters are configured. The annotation MUST be mapped to the
exact same reference genome as your FASTQ datasets, with the
exact same chromosome naming (see the RNA-seq FAQ for details).
Help is also available on the Cuff* tool forms, including links
to the manuals.
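To check whether the SAM filter output was "greatly reduced or
empty", you can count alignment types yourself. In SAM, bit
0x100 of the FLAG field (column 2) marks a secondary (non-primary)
alignment. The Python sketch below is not Galaxy's filter tool;
it assumes SAM text (for example, the TopHat output after a
BAM-to-SAM conversion), and the function names and example reads
are made up for illustration. It counts primary, secondary and
unmapped records, and shows a filter that drops only the 0x100
records:

```python
from collections import Counter

# SAM FLAG bits (column 2 of each alignment line), per the SAM specification.
UNMAPPED = 0x4     # read did not map
SECONDARY = 0x100  # "map is not primary": an extra hit for a multi-mapped read


def classify(flag):
    """Label one alignment record from its integer FLAG value."""
    if flag & UNMAPPED:
        return "unmapped"
    if flag & SECONDARY:
        return "secondary"
    return "primary"


def is_record(line):
    """True for alignment lines; False for '@' header lines and blank lines."""
    return bool(line.strip()) and not line.startswith("@")


def count_alignments(sam_lines):
    """Count primary, secondary and unmapped records in SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if is_record(line):
            counts[classify(int(line.split("\t")[1]))] += 1
    return counts


def drop_secondary(sam_lines):
    """Yield header lines and only those records WITHOUT the 0x100 bit set."""
    for line in sam_lines:
        if not is_record(line):
            if line.strip():
                yield line  # keep header lines, skip blank lines
        elif not int(line.split("\t")[1]) & SECONDARY:
            yield line


# Made-up example: read1 has one hit; read2 has a primary hit plus a secondary one.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t50\t36M\t*\t0\t0\t*\t*",
    "read2\t0\tchr1\t200\t1\t36M\t*\t0\t0\t*\t*",
    "read2\t256\tchr1\t500\t1\t36M\t*\t0\t0\t*\t*",
]
print(dict(count_alignments(example)))     # {'primary': 2, 'secondary': 1}
print(len(list(drop_secondary(example))))  # 3 (header + two primary records)
```

Comparing these counts for the TopHat output and for the
filtered output shows quickly whether the filter left anything
usable. Note that dropping secondary records still keeps one
(primary) hit for each multi-mapped read; it does not by itself
remove multi-mapped reads.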
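The chromosome-naming and attribute requirements for the
annotation can also be checked outside Galaxy. The Python sketch
below assumes you have downloaded the GTF and a SAM version of
the TopHat output with its header lines; the function names are
hypothetical, not Galaxy tools. It collects reference names from
the @SQ header lines, reports GTF chromosome names that do not
appear there (for example Ensembl-style "1" against UCSC-style
"chr1"), and counts records missing gene_id, transcript_id or
tss_id. Treating those three attributes as the ones to check is
an assumption of this sketch; the Cuff* manuals list what each
tool actually reads.

```python
import re

# Attribute names ASSUMED (for this sketch) to be the ones worth checking for the Cuff* tools.
WANTED = ("gene_id", "transcript_id", "tss_id")

# Matches one key "value" pair in a GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def chroms_from_sam_header(sam_lines):
    """Collect reference sequence names from @SQ header lines (SN: tag)."""
    names = set()
    for line in sam_lines:
        if line.startswith("@SQ"):
            for tag in line.rstrip("\n").split("\t")[1:]:
                if tag.startswith("SN:"):
                    names.add(tag[3:])
    return names


def check_gtf(gtf_lines, genome_chroms):
    """Return (GTF chromosome names absent from the genome, records missing each attribute)."""
    missing_chroms = set()
    missing_attrs = {name: 0 for name in WANTED}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if cols[0] not in genome_chroms:
            missing_chroms.add(cols[0])
        attrs = dict(ATTR_RE.findall(cols[8]))
        for name in WANTED:
            if name not in attrs:
                missing_attrs[name] += 1
    return missing_chroms, missing_attrs


# Made-up inputs: a two-chromosome header and two GTF exon records.
header = ["@SQ\tSN:chr1\tLN:1000", "@SQ\tSN:chr2\tLN:1000"]
gtf = [
    'chr1\tsource\texon\t100\t200\t.\t+\t.\tgene_id "GeneA"; transcript_id "TxA"; tss_id "TSS1";',
    '1\tsource\texon\t300\t400\t.\t-\t.\tgene_id "GeneB"; transcript_id "TxB";',
]
chroms, attrs = check_gtf(gtf, chroms_from_sam_header(header))
print(chroms)  # {'1'}: "1" does not match "chr1"
print(attrs)   # {'gene_id': 0, 'transcript_id': 0, 'tss_id': 1}
```

An empty first result means every chromosome name in the GTF
matches a sequence name in the mapped reference.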
More help, including links to tool help, is on our wiki
(see "Tools on the Main server: Example: unexpected results
with RNA-seq analysis tools"):
http://wiki.g2.bx.psu.edu/Support#Interpreting_scientific_results
Hopefully this helps,
Jen
Galaxy team
On 6/13/12 7:07 AM, Kristen Roop wrote:
Hello,
Galaxy Main
1.) I am having trouble adding annotations to my
TopHat and Cufflinks runs.
I used the Mus musculus mm9 reference with the built-in index
for the TopHat mapping, but no annotations were
available in the output files.
I then tried converting the reference genome from
UCSC to a SAM file using SAMtools. TopHat would not
recognize this, but Cufflinks did. The Cufflinks output
file did not have the annotation either.
Any thoughts on the proper way to add annotations?
2.) I am also trying to separate the singly mapped
reads from the multiply mapped reads that resulted from
TopHat. After converting the TopHat output file, I
used the SAMtools filter tool, choosing
"0x100 map is not primary". Afterwards
I tried to run Cufflinks on the filtered output, only to
have it fail.
My ultimate goal is to look at RNA-seq gene
expression. I know that I have to: upload my files ->
groom using FASTQ Groomer -> download a reference
sequence from UCSC -> convert the reference genome
file to a usable format -> run TopHat for mapping
using the groomed file and the converted reference
annotation -> filter the singly mapped reads ->
run Cufflinks using the filtered singly mapped reads
from TopHat.
From here I will continue with some other statistical
analysis but right now I need to get this basic pipeline
to work.
Thanks,
Kristen Roop
--
Jennifer Jackson
http://galaxyproject.org